FoswikiCache

A pluggable caching service and built-in HTML page cache

Discuss at FoswikiCacheTalk.

Previous work

There have been a couple of attempts to implement caching for Foswiki on different levels. These solutions share no code and differ considerably in how they are used. Some are

There are even plugins that do their own caching of downloaded or finished operations like:

There are certainly more plugins out there that need to store objects temporarily and retrieve them instead of recomputing them. I've looked at all of these implementations for their most useful and interesting features, leaving out what can be done better from within the core engine than from the "outside". Some of the above caching solutions will not profit directly from a central caching service, but indirectly, as the burden on them is relieved a bit. Some are still of great value in easing the "first hit", when the page is not found in the cache and has to be computed.

Requirements

So the basic requirement here is to have some sort of caching service offered by the Foswiki engine itself and made available to subsidiary components and extensions. This caching service should relieve plugin authors from having to implement the caching store themselves by offering a simple API to a centralized cache. The caching service can be configured to use different backends, chosen according to the given hosting environment. For example, in a hosted environment it is quite unlikely that you will be allowed to run a memcached daemon. Nevertheless, you are able to use a file-based store. A solution somewhere in the middle would be a (size-aware) memory cache that keeps objects for the timespan of the Foswiki request. The caching effect of this solution can be increased by using perl accelerators like mod_perl or speedy-cgi that keep the memory cache alive for more than one request.
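The idea of one simple API in front of interchangeable backends can be sketched as follows. This is only an illustration in Python (Foswiki itself is written in Perl), and the names CacheBackend, MemoryLRUBackend and make_backend are hypothetical, not the actual Foswiki API. The sketch also assumes that "size" means the number of cached entries.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict


class CacheBackend(ABC):
    """Hypothetical minimal interface every backend would implement."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def set(self, key, value): ...

    @abstractmethod
    def delete(self, key): ...


class MemoryLRUBackend(CacheBackend):
    """Size-aware in-memory backend: evicts the least recently used entry."""

    def __init__(self, max_size=10000):
        self.max_size = max_size          # assumption: size counts entries
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)      # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)   # drop the oldest entry

    def delete(self, key):
        self._items.pop(key, None)


def make_backend(name, **options):
    """Pick a backend by its configured name (the registry is an assumption)."""
    registry = {"MemoryLRU": MemoryLRUBackend}
    return registry[name](**options)


cache = make_backend("MemoryLRU", max_size=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")       # touch "a" so that "b" becomes the eviction candidate
cache.set("c", 3)    # exceeds max_size, so "b" is evicted
```

A plugin would only ever talk to the interface, so switching from a memory cache to a file-based or memcached backend would be a configuration change rather than a code change.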

There are different areas where there's obviously a need for caching even within the core engine itself, e.g. for preferences or users. Last but not least, the produced HTML page itself may be cached so that it is available if the same page is requested again. These different requirements for caching objects establish a system of multiple levels of caching, with different levels of granularity with regard to the handled objects. The outermost and most coarse-grained level is that of caching the complete HTML page. This is also the level where caching has the greatest effect and where objects become invalid most frequently.

Cache maintenance

This leads to the question of when objects in the cache are actually outdated. This question can't be answered in general and for all levels of caching. The knowledge of which information has been used to create a certain object may be buried in the depths of some plugin, and there is no way for the cache to know when it has to forget an object without help from that plugin.

Tracking dependencies

Typically, content management systems deal with cache invalidation by recording "dependencies" between objects and "firing" them on demand. Firing a dependency effectively removes the affected objects from the cache, so that they are recreated the next time they are requested. So the interfaces needed here are addDependency() and fireDependency(), which inform the "dependency tracker" about what the cache actually stores.
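A minimal sketch of these two interfaces, written in Python for illustration only. The method names mirror addDependency() and fireDependency() from the text, but the class, its signatures and the dict-based cache are assumptions, not the actual Foswiki implementation.

```python
from collections import defaultdict


class DependencyTracker:
    """Hypothetical tracker: remembers which cached objects used which source."""

    def __init__(self, cache):
        self.cache = cache                     # any dict-like store of cached objects
        self._dependents = defaultdict(set)    # source -> objects built from it

    def add_dependency(self, obj, source):
        """Record that cached object `obj` was built using `source`."""
        self._dependents[source].add(obj)

    def fire_dependency(self, source):
        """`source` changed: drop every cached object that used it."""
        fired = self._dependents.pop(source, set())
        for obj in fired:
            self.cache.pop(obj, None)
        return fired


cache = {
    "Main.WebHome": "<html>home</html>",
    "Main.Report": "<html>report</html>",
}
tracker = DependencyTracker(cache)
tracker.add_dependency("Main.WebHome", "Main.WebPreferences")
tracker.add_dependency("Main.Report", "Main.ReportItem1")

# Saving Main.WebPreferences invalidates only the pages that read it.
fired = tracker.fire_dependency("Main.WebPreferences")
```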

So the hard part of the FoswikiCache is actually not implementing a pluggable caching service but getting the dependency tracking right, in particular for the HTML page cache.

There are a couple of different kinds of dependencies.
  1. automatically detected,
  2. external,
  3. manually added and
  4. temporal dependencies.

A lot of dependencies can be detected by the engine itself while rendering a page. Foremost, these are added by reading in other files and topics, like WebPreferences and so on. Basically, everything that goes through Foswiki::Store::readTopicRaw() adds a dependency to the currently rendered page. But even small things such as WikiWords linking to existing or not yet existing topics add a dependency. Dependencies like these can be recorded automatically.

External dependencies are added to the current page by some plugin because it says so. These may also be automatic dependencies, but they are not recorded/recordable by the core engine itself. The event that fires such a dependency may come from outside, emitted by some update to an external database. We can't do much about it from inside the core engine other than offer the API to hook into the dependency tracker.

Manual dependencies are those that the author of a topic may want to add to it. The author may decide to
  • prevent the current page from being cached at all (Set CACHEABLE = no)
  • invalidate a couple of other topics whenever the current one is changed (Set DEPENDENCIES = web.topic, web.topic)
  • invalidate a couple of topics when any topic in the web has changed (Set WEBDEPENDENCIES = ...)

The latter, for example, makes it possible to invalidate a topic that has a SEARCH in it on every edit in a web, so that the output of the SEARCH is rendered again, as the hit set may have changed due to the topic changes.

Last but not least, you may want objects in the cache to expire automatically; a topic author may request this by adding a Set CACHEEXPTIME = preference value.

Firing dependencies

Once all dependencies of an object are known, it can be purged if one of them "fires". Dependencies are fired by the Foswiki::Store when

  • a topic has been saved,
  • a topic has moved (if the topic is moved to a new web, the target webdependencies have to be fired also)
  • a new attachment is added or
  • an attachment has been moved (source and target topic are firing their dependencies).

When a dependency fires, all objects it points to will be removed from the cache. As a consequence, a single event might invalidate a wide range of objects that used the current object as an ingredient. That is, dependencies are fired "backwards" using the reverse relation. See "Notes on the implementation" below.

As can be seen, most cache maintenance overhead happens during saving and renaming objects, not during viewing them. The only overhead added during view is actually storing the objects into the cache and recording new dependencies. And this only happens on the "first hit".

Dirty areas

Sometimes caching complete pages is really too coarse-grained. Some parts of a page change much too often, while other parts of the same page never change but are nevertheless computed at non-zero cost. In that case the FoswikiCache can be told not to cache certain parts of the topic, called "dirty areas": it keeps the FoswikiMarkup inside them as it is and patches in the information computed during the request. In a way, cached pages with dirty areas resemble templates.
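The template-like behaviour of dirty areas can be sketched like this. It is an illustrative Python sketch under assumptions: the cached page is taken to still contain the raw markup between <dirtyarea> tags, and expand_markup is a hypothetical stand-in for the wiki's macro expansion with a made-up fixed result.

```python
import re

# Non-greedy match so that several dirty areas on one page stay separate.
DIRTY_AREA = re.compile(r"<dirtyarea>(.*?)</dirtyarea>", re.DOTALL)


def render_cached_page(cached_html, expand):
    """Fill every dirty area of a cached page at request time."""
    return DIRTY_AREA.sub(lambda match: expand(match.group(1)), cached_html)


def expand_markup(markup):
    # Hypothetical stand-in for macro expansion; the value is invented.
    return markup.replace("%TIMESINCE%", "3 hours ago").strip()


cached = "<h1>Status</h1><p>Last change: <dirtyarea> %TIMESINCE% </dirtyarea></p>"
page = render_cached_page(cached, expand_markup)
```

Everything outside the dirty area comes straight from the cache; only the marked fragment is computed per request.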

CacheManager Backends

The page cache makes use of a so-called "CacheManager", which does the real work of storing the page, while the page cache itself takes care of dependencies, rendering of dirty areas, and so on.

The current code implements backends based on
  • Cache::Memcached
  • Cache::FileCache
  • Cache::MemoryLRU

These backends have proven to be the most useful. There are other, more experimental backends that are of less value for various reasons.

Page variations

A single topic may be cached several times distinguished by
  • url parameters,
  • wikiuser,
  • interface language,
  • servername and port and
  • session values.

These are "variations" on the same topic. If a topic is invalidated in cache, all of its variations will be deleted at the same time.

This means that every user has a separate set of cached topics. This is needed to cope with different user-level preferences as a site can look and behave completely differently for every user. This also means that a new copy of the current topic will be cached if it has been called with different url parameters.

In some cases those variants don't differ in their result, but this extra space is worth spending for the sake of correctness. The same argument holds for values stored on the server side inside the session objects. Unfortunately, not all session values are worth distinguishing, and some may result in cache thrashing. There is a list of session values that are excluded. There might be a need for a clear specification of how to name session values so that they are excluded from the cache logic (e.g. all session values starting with an underscore are ignored).
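How variations of one topic could be keyed and invalidated together is sketched below in Python, for illustration only. The function names and the hashing scheme are assumptions. The underscore rule for session values is the naming convention suggested above as a possibility, not an established specification.

```python
import hashlib

buckets = {}  # topic -> {variation key -> cached html}


def variation_key(url_params, user, language, host, session):
    """Build one key from everything that distinguishes a variation."""
    # Assumed convention: session values whose name starts with an
    # underscore do not influence the key.
    relevant = {k: v for k, v in session.items() if not k.startswith("_")}
    parts = [
        repr(sorted(url_params.items())),
        user,
        language,
        host,
        repr(sorted(relevant.items())),
    ]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def store(topic, key, html):
    buckets.setdefault(topic, {})[key] = html


def invalidate(topic):
    """Delete all variations of a topic at once."""
    buckets.pop(topic, None)


host = "wiki.example.com:80"
k1 = variation_key({"skin": "print"}, "JohnDoe", "en", host, {"_nonce": "1"})
k2 = variation_key({"skin": "print"}, "JohnDoe", "en", host, {"_nonce": "2"})
k3 = variation_key({"skin": "print"}, "JaneDoe", "en", host, {})
store("Main.WebHome", k1, "<html>for John</html>")
store("Main.WebHome", k3, "<html>for Jane</html>")
```

Here k1 and k2 are the same variation because they differ only in an ignored session value, while a different user yields a separate cache entry.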

Reverse dependencies

Reverse dependencies are created to ease firing dependencies; they replicate the normal forward dependency in the target. They are a kind of backlink. So when a topic gets cached, it not only adds the variant to the correct page bucket but also updates the buckets of all topics it depends on and adds the reverse dependencies. The reverse relation is used to fire dependencies, while the forward relation is only used to establish its reverse counterpart. This also means that there might already be a record for a topic even if the topic does not yet exist. This is needed to invalidate those entries that contain pages with NewWikiWord links. When the NewWikiWord topic comes into existence, it updates its page bucket, finds the reverse dependencies and fires them recursively.
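The recursive firing over the reverse relation can be sketched as follows. This is an illustrative Python sketch with hypothetical names; page variations are omitted for brevity, and the guard against cyclic dependencies is an addition of the sketch, not something the text specifies.

```python
class PageCache:
    """Hypothetical page cache that keeps only the reverse relation."""

    def __init__(self):
        self.pages = {}      # topic -> cached html
        self.reverse = {}    # topic -> topics whose cached page used it

    def store(self, topic, html, depends_on):
        self.pages[topic] = html
        for source in depends_on:
            # A record may be created for a topic that does not exist yet.
            self.reverse.setdefault(source, set()).add(topic)

    def fire(self, topic, _seen=None):
        """Invalidate `topic` and, recursively, everything built from it."""
        seen = set() if _seen is None else _seen
        if topic in seen:        # guard against cyclic dependencies
            return seen
        seen.add(topic)
        self.pages.pop(topic, None)
        for dependent in self.reverse.pop(topic, set()):
            self.fire(dependent, seen)
        return seen


pc = PageCache()
pc.store("Main.Index", "<html>index</html>", ["Main.NewWikiWord"])
pc.store("Main.Summary", "<html>summary</html>", ["Main.Index"])
pc.store("Main.Other", "<html>other</html>", ["Main.WebPreferences"])

# Main.NewWikiWord is created: every page that linked to it is purged,
# and so is every page that in turn depended on those pages.
fired = pc.fire("Main.NewWikiWord")
```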

Configuration

  • Select the caching backend:
    $Foswiki::cfg{CacheManager} = 'Foswiki::Cache::FileCache';
         
  • Enable/Disable the cache manager:
    $Foswiki::cfg{Cache}{Enabled} = 0|1; #default off
         
  • Enable/Disable debug output:
    $Foswiki::cfg{Cache}{Debug} = 0|1; #default on
         
  • Enable/Disable encoding content in gzip/deflate format:
    $Foswiki::cfg{Cache}{Compress} = 0|1; # default off
         
  • Options specific for size-aware cache backends:
    • max size of cache:
      $Foswiki::cfg{Cache}{MaxSize} = 10000;
               
  • Memcached-specific options (see the documentation of memcached for more information):
    • Address and port of the memcached servers (comma separated):
      $Foswiki::cfg{Cache}{Servers} = '127.0.0.1:11211';
             
  • File-based cache-specific options:
    • root directory of the cache file system (will be created automatically if not yet present):
      $Foswiki::cfg{Cache}{RootDir} = '/tmp/Foswiki_cache';
             

Usage

Preventing the current topic from being cached:
  * Set CACHEABLE = no

Auto-expire caching of the current topic (not implemented):
   * Set CACHEEXPTIME = <seconds>

Prevent certain parts of a topic being cached, rendering it during request time:
  • pointless example:
    <dirtyarea> Don't cache this. </dirtyarea>
  • using TimeSincePlugin to display the age of the current topic
    <dirtyarea> %TIMESINCE% </dirtyarea>
  • never cache the result of this SEARCH:
    <dirtyarea> %SEARCH{...}% </dirtyarea>

Manually add dependencies to other topics. This will invalidate the cache for the listed topics if the current one is changed:
   * Set DEPENDENCIES = Main.ListofAllEmployees, AllOpenActions
Best practice: on topics that are regularly found by a SEARCH, add a manual dependency pointing to the topic that contains the SEARCH. Example: there's one ReportTopic that dynamically lists ReportItem topics. So add the following line to all ReportItem topics (preferably by using a topic template when creating the ReportItem topics):
   * Set DEPENDENCIES = ReportTopic
Editing any of the ReportItem topics will then cause ReportTopic to be recomputed automatically.

Manually add dependencies from all topics of a web to a list of other topics. This will invalidate the cache for the listed topics if any topic in the current web changes (best used in WebPreferences):
   * Set WEBDEPENDENCIES = WebHome, Main.WebHome, People.ListofAllEmployees

You may force the current topic to be recomputed by adding a refresh=on url parameter. The complete cache can be cleared using refresh=all (not implemented for Foswiki::Cache::Memcached as this operation is not supported by the backend).

Server-side caching helps client-side caching

Whenever a page is cached, its ETag is stored along with it, computed from its last modification time, that is, the time it was added to the server cache. Based on this information, ETag and Last-Modified headers are added to the normal HTTP response headers. So whenever a page is requested again and the browser transmits the If-None-Match and/or If-Modified-Since request headers, and these match the values in the server cache, Foswiki will answer with a 304 Not Modified response and an empty body. The browser on the other end will then reuse the page as it was last stored in its own client-side cache.
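The exchange described here can be sketched as below. This is an illustrative Python sketch: the function names, the way the ETag is derived, and the plain string comparison of the validators are all assumptions, not the actual Foswiki code.

```python
import hashlib
from email.utils import formatdate


def make_validators(topic, cached_at):
    """Derive ETag and Last-Modified from the time the page entered the cache."""
    digest = hashlib.sha256(f"{topic}:{int(cached_at)}".encode("utf-8")).hexdigest()
    etag = '"%s"' % digest[:16]
    last_modified = formatdate(int(cached_at), usegmt=True)
    return etag, last_modified


def respond(request_headers, topic, cached_at, body):
    """Return (status, headers, body) for a page found in the server cache."""
    etag, last_modified = make_validators(topic, cached_at)
    headers = {"ETag": etag, "Last-Modified": last_modified}

    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_none_match is not None:
        # If-None-Match takes precedence when both headers are sent.
        unchanged = if_none_match == etag
    elif if_modified_since is not None:
        # Simplification: exact match instead of a real date comparison.
        unchanged = if_modified_since == last_modified
    else:
        unchanged = False

    if unchanged:
        return 304, headers, b""      # browser reuses its own copy
    return 200, headers, body


html = b"<html>cached page</html>"
first = respond({}, "Main.WebHome", 1244505600, html)
second = respond({"If-None-Match": first[1]["ETag"]}, "Main.WebHome", 1244505600, html)
stale = respond({"If-None-Match": first[1]["ETag"]}, "Main.WebHome", 1244509200, html)
```

The third call simulates the page having been re-cached later, so the old ETag no longer matches and the full body is sent again.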

Compressed content encoding

Modern browsers understand gzip-encoded HTML and indicate this capability using an Accept-Encoding header in the request. That is, instead of sending the browser plain-text HTML, you gzip it and add an appropriate HTTP header to the response that tells the browser to uncompress the message before parsing it. While this is normally done for static files only, like JavaScript and CSS, by sending their ...js.gz counterpart, it also makes sense to send compressed dynamic content. This pays off when using server-side caching like FoswikiCache, where the same page is compressed only once and then transmitted gzip-encoded on each cache hit.
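The "compress once, serve many times" idea can be sketched like this. It is an illustrative Python sketch with hypothetical names; the Accept-Encoding parsing is deliberately simplified and ignores quality values such as gzip;q=0.

```python
import gzip


class CompressedPageCache:
    """Hypothetical cache that stores pages already gzip-compressed."""

    def __init__(self):
        self._pages = {}
        self.compressions = 0     # counts how often a page was compressed

    def store(self, key, html):
        self._pages[key] = gzip.compress(html.encode("utf-8"))
        self.compressions += 1

    def serve(self, key, accept_encoding):
        """Return (extra headers, body) for a cache hit."""
        body = self._pages[key]
        # Simplified parsing: q-values are ignored in this sketch.
        offered = [t.split(";")[0].strip().lower() for t in accept_encoding.split(",")]
        if "gzip" in offered:
            return {"Content-Encoding": "gzip"}, body
        # Client cannot handle gzip: fall back to the plain page.
        return {}, gzip.decompress(body)


pages = CompressedPageCache()
pages.store("Main.WebHome", "<html>hello</html>")
headers_a, body_a = pages.serve("Main.WebHome", "gzip, deflate")
headers_b, body_b = pages.serve("Main.WebHome", "gzip;q=1.0, identity")
headers_c, body_c = pages.serve("Main.WebHome", "")
```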

Status: added to SVN/trunk on 2009-06-09

-- Contributors: MichaelDaum

I think we need an update on the progress of this, its likelihood of being useful in the 1.1 timeframe, and a testing and validation plan. I was talking to Michael on irc, and he said he found some difficult problems with it - which need doccoing so that anyone that wants to work towards 1.1 can help work on them (or remove the troublesome bits to further the release of 1.1)

-- SvenDowideit - 01 Mar 2010

Michael?

I'm concerned that we have a lot of code in trunk for this, but we don't have any user facing documentation, no idea of status, and from what I can see, no unit tests ensuring that the cache implementations actually work in a consistent and manageable fashion.

-- SvenDowideit - 14 Mar 2010

I am not happy with the current implementation. The current dependency tracker wastes too much time on updating reverse dependencies. The solution is not to use non-database backends to store dependencies and manually create their reverse index. I will move the current dependency tracker into a subclass of its own and implement one based on dbi.

-- MichaelDaum - 14 Mar 2010

I share Sven's concern.

There is no end user documentation written.

I have no clue what this cache feature does.

And I lack commitment that this code will be finished and stable in time for 1.1. I have seen statements on IRC claiming that the code is not stable and can slow down Foswiki over time.

If we cannot get a commitment that the code will be made stable within the time schedule in the ReleasePlan, then we'd better start removing the code now and let further development happen in a scratch branch.

-- KennethLavrsen - 14 Mar 2010

This proposal is 3 years old. Its predecessor is still running on beijing.

Now you tell me you don't know what this is all about? What about the docu on this topic? Did you ever try enabling the page cache?

It works fine, btw. However it still has room for optimizations. Leave the code in the core, please, so that we can continue to work on it.

-- MichaelDaum - 14 Mar 2010

We are not discussing if the feature is wanted or not. We are way past that. We want it. Consensus has been reached in this proposal. No one is talking about rejecting it. What Sven and I are concerned about is the code in trunk and release 1.1.

Let me repeat the questions that both Sven and I have.

  • Will you please write the user documentation so we users can know how to use the feature?
  • Will you commit to solve the problems within the schedule for 1.1? Problems that YOU yourself have claimed are still in the code. You are the one that said on IRC that the code slows down Foswiki when it has run for a while. You yourself are the source of this FUD.

-- KennethLavrsen - 14 Mar 2010

I just tried to enable the cache again. Last time I tried it crashed. And guess what. It still does.

Can't locate Cache/FileCache.pm in @INC (@INC contains: /var/www/foswiki/core/lib . /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib/perl5/5.8.8/i386-linux-thread-multi /usr/lib/perl5/5.8.8 /var/www/foswiki/core/lib/CPAN/lib//arch /var/www/foswiki/core/lib/CPAN/lib//5.8.8/i386-linux-thread-multi /var/www/foswiki/core/lib/CPAN/lib//5.8.8 /var/www/foswiki/core/lib/CPAN/lib/) at /var/www/foswiki/core/lib/Foswiki/Cache/FileCache.pm line 35.
 at /var/www/foswiki/core/lib/Foswiki/Cache/FileCache.pm line 35
   Foswiki::Cache::FileCache::BEGIN() called at Cache/FileCache.pm line 35
   eval {...} called at Cache/FileCache.pm line 35
   require Foswiki/Cache/FileCache.pm called at (eval 16) line 2
   Foswiki::PageCache::BEGIN() called at Cache/FileCache.pm line 35
   eval {...} called at Cache/FileCache.pm line 35
   eval 'use Foswiki::Cache::FileCache

And I do not understand half the settings one can enable in configure. Even when reading the help text I have no idea what it means. I did not change any of the expert settings. I just enabled it.

I guess most of us have not tested your feature much because you have always said it was not finished and your Tasks.Item3695 is still open.

-- KennethLavrsen - 14 Mar 2010

There is a series of cache implementations. One of them is Cache::FileCache. Another is Cache::Memcached. The default is BerkeleyDB, as you can see in Foswiki.spec. Yet another is MemoryLRU ... only for testing purposes ... only works with persistent perl impls. See the docu in Foswiki.spec for when to use which implementation.

If you decide to use Cache::Memcached or Cache::FileCache, make sure that this perl module is installed on your computer.

-- MichaelDaum - 14 Mar 2010

Attachments
  • TWikiCache.patch (55 K, 02 Mar 2007 - 17:21, MichaelDaum): against MAIN, revision 13018
  • untitled.draw (3 K, 24 Apr 2008 - 17:27, UnknownUser): TWiki Draw draw file
  • untitled.gif (8 K, 24 Apr 2008 - 17:28, UnknownUser): TWiki Draw GIF file

Topic revision: r15 - 01 Feb 2017, MichaelDaum