FoswikiCacheTalk

Good idea - not sure that the all-encompassing term 'TWiki cache' should be used for this though, as it could easily be confused with the many types of content cache that people have built as plugins for TWiki, caching rendered pages, variables, or database information. The top hit for Google:TWiki+cache is one of the cache plugins. Maybe this should be TWiki::Cache::Internal or something, leaving scope for TWiki::Cache::Headers for CacheControlHeaders (also includes links to cache plugins).

When I read this I thought 'great, Sven is going to address content caching'... If you make these caches persistent, or ModPerl memory resident, they could actually help with internal caching, though - so perhaps I should have waited till you fleshed out the concept.

Sounds like this is mainly about 'internal caching' as in VarCachePlugin but covering a lot more objects. Does sound like you could speed things up quite a lot if there's a persistent FoswikiCache implementation.

-- RichardDonkin - 26 Feb 2007

Actually, I have already implemented the TWiki::Cache as outlined here. We discussed it several times on the WikiRing before getting it out to the community. It is actively used to speed up the WikiRing blog.

The description above is not accurate and in fact does not describe the foremost feature of the current code. And yes, Richard, your first expectations are right: the TWiki::Cache is a content cache that does the job some of the plugins you mentioned tried to do. Note that I commented on the limitations of a plugin like the TWikiCacheAddOn years ago. So I decided to do it, but do it right.

So what is the TWiki::Cache?
  • It caches the content using memcached before sending it out to the browser.
  • It retrieves content from the cache instead of rendering the same page once again.
  • It tracks all dependencies of each cached page and invalidates entries as needed.
  • It offers an API for TWikiApplications and plugins to hook into the dependency tracker to provide additional knowledge about content dependencies (see the sketch after this list).
  • It can invalidate content automatically based on a timer.
  • It allows individual pages to be excluded from caching.
  • It allows parts of a page to be protected from caching (dirty areas); these are re-rendered after the page is retrieved from the cache, before it is sent to the browser.
  • It does all this in the most efficient way possible.
  • It reduces server load and rendering time considerably, though I did not have the time to do thorough testing.
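
A minimal sketch of what such a dependency hook might look like from a plugin; the cache accessor and the addDependency method are assumptions for illustration, not names taken from the actual patch:

sub commonTagsHandler {
    my ($text, $topic, $web) = @_;
    # hypothetical: get hold of the cache service, if enabled
    my $cache = $TWiki::Plugins::SESSION->{cache} or return;
    # tell the dependency tracker that the page currently being rendered
    # also depends on a topic this plugin reads behind the scenes
    $cache->addDependency($web, 'WebPreferences');
}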

In combination with speedy-cgi, pages can be sent out in under one second each, no matter how complicated the TWikiApplication behind them is. The overhead of maintaining the cache is only incurred during save, when dependencies are fired and the new page is stored.

Memcached is a highly efficient distributed in-memory caching service. Unfortunately it isn't available in hosted environments, which is why a filesystem fallback mechanism is desirable.

The caching services described above are only a by-product. The most important part of the TWiki::Cache is tracking the dependencies of html pages. Frankly, offering a plain caching service isn't of great value if you can't tell when the cached content is outdated.

The weak point of the current implementation is that it relies on memcached and its perl API being installed on the system. This can in fact be made pluggable to use different storage mechanisms in the background, e.g. modules that implement the Cache::Cache interface.
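
A minimal sketch of such a backend adapter; the class name matches the configuration example further down, while the constructor and method names the core would call are assumptions:

package TWiki::Cache::FileCache;

use strict;
use Cache::FileCache;

sub new {
    my ($class, $session) = @_;
    my $this = bless {}, $class;
    $this->{handler} = Cache::FileCache->new({
        namespace          => 'TWikiCache',
        default_expires_in => 'never',   # entries are purged by dependency firing
    });
    return $this;
}

# the three operations a page cache needs from its storage backend
sub set    { my ($this, $key, $obj) = @_; $this->{handler}->set($key, $obj) }
sub get    { my ($this, $key) = @_; return $this->{handler}->get($key) }
sub remove { my ($this, $key) = @_; $this->{handler}->remove($key) }

1;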

The current code is available as a patch against the MAIN branch. I am about to maintain a similar patch for TWiki-4.1.x, as some of my clients already use this code on intranet sites.

I'm going to refactor the above introduction to this page asap.

-- MichaelDaum - 26 Feb 2007

Most interesting, I'm looking forward to trying it out. You say this can't be used on hosting - you mean shared hosting, I assume? Is that because memcached needs its own process?

What would be great to have is a framework that would allow you to implement your own cache mechanism fairly easily, based on memcached or whatever else you wish to use. But I see you already mention that, Michael.

-- StephaneLenclud - 26 Feb 2007

Sounds very interesting - is it extensible to support CacheControlHeaders? The content invalidation is the hard part, once you have that the HTTP cache headers should be quite easy, and a big win for enterprises with internal proxy caches as well as ISPs that provide proxy caches, such as AOL.

-- RichardDonkin - 26 Feb 2007

I've reworked the description of this FeatureRequest and implemented the other cache backends.

-- MichaelDaum - 27 Feb 2007

TreePlugin for instance certainly ought to make use of such a cache mechanism.

Also look at mod_cache even though it's probably unusable for our purpose.

-- StephaneLenclud - 28 Feb 2007

The key that makes third party caches (like mod_cache) usable is that we need a way to fire dependencies, that is, an API to actively invalidate cache entries. And this must be available in perl. That's why mod_cache and also varnish are probably out. These http accelerators are simply too "uninformed" about what's going on in the content management system behind them. However, in a more complex server setup, it may still make sense to add another level of caching.

-- MichaelDaum - 28 Feb 2007

I've got the code ready to check it in to the MAIN branch.

-- MichaelDaum - 28 Feb 2007

What about attaching the diff to this topic or to Bugs:Item3695 so that we can have a look at the code and documentation? Of course, I am interested in caching, but I'd guess it needs some installation and configuration items, fallbacks (if neither memcached nor Cache::Cache are installed), and admin and author guides (e.g. which TWiki variables should be avoided for better caching performance).

-- HaraldJoerg - 28 Feb 2007

There's no other fallback than no caching if neither memcached nor Cache::Cache is installed. That's ok for now, as there are no other parts of the engine that depend on the cache service being there. Here is the patch against MAIN, revision 12994. Apply it, install Cache::Cache, add

$TWiki::cfg{CacheManager} = 'TWiki::Cache::FileCache';
$TWiki::cfg{Cache}{Enabled} = 1;

to your LocalSite.cfg and see if it works for you.

-- MichaelDaum - 28 Feb 2007

Just found this http://cpan.robm.fastmail.fm/cache_perf.html ... and implemented a Cache::FastMmap backend. But unfortunately it fails to cache large html pages and bloats the memory requirements of the view process on large mmap files.

-- MichaelDaum - 28 Feb 2007

Implemented a DB_File based cache backend as a fallback. I think a sensible set of cache backends has been implemented now. In fact, the SizeAwareFileCache, SizeAwareMemoryCache and MemoryCache are not worth it and may even degrade performance. Cache::FastMmap needs to be checked to see whether there's an error in the adapter class or whether it is too buggy/resource hungry.

For now I'd propose to only keep
  • Cache::Memcached (for high end sites)
  • Cache::FileCache
  • DB_File (if installing Cache::FileCache is no option)

configure could check for Cache::FileCache and fall back to DB_File if it is not available. DB_File comes with perl, but I am not sure if it was in perl-5.6.1 already.
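
The check itself would only be a few lines; a sketch, assuming the DB_File backend is registered under the name used below (an assumption):

# fall back to DB_File if Cache::FileCache is not installed
my $manager = eval { require Cache::FileCache; 1 }
    ? 'TWiki::Cache::FileCache'
    : 'TWiki::Cache::DB_File';    # hypothetical backend name
$TWiki::cfg{CacheManager} = $manager;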

-- MichaelDaum - 28 Feb 2007

Another alternative to Cache::Cache is the caching library by Chris Leishman. Its docs say it is a complete reimplementation of Cache::Cache ... but not whether it performs better or worse.

-- MichaelDaum - 28 Feb 2007

Thanks Micha for tackling the caching question on a system-wide level. This is good stuff!

On CPAN dependencies, we must make sure that we ship with a default that "just works out of the box."

On caching a single topic several times: I am wondering what the right balance is between caching everything and caching important stuff only. Caching also brings overhead; it can be counter-productive performance-wise if content gets cached and is never used later. It is possibly good to cache only content that is accessed frequently, such as topics viewed by the TWikiGuest user without any URL params. This simple approach could also simplify the design. As a data point, 96% of the view traffic on twiki.org is by TWikiGuest.
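
That policy would boil down to a small predicate in the view path; a hedged sketch (the session and user attribute names are assumptions):

sub isWorthCaching {
    my ($session, $query) = @_;
    my @params = $query->param();    # names of all url parameters
    return 0 if @params;             # parameterized views stay uncached
    return $session->{user} eq 'TWikiGuest';    # cache guest views only
}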

The name <dirtyarea> assumes the reader knows that this is part of caching. Most users editing a topic have no idea what this word means. Possibly name it <nocaching> or the like to give a hint that it is cache related.

-- PeterThoeny - 01 Mar 2007

In general, you'd like to cache as much as possible. Some cache implementations do slow down the fuller they get (collisions getting more frequent, cache directories holding several hundreds of entries). But that can be dealt with at the cache implementation level using different removal strategies, e.g. FIFO (removing the oldest entry first) or LRU (removing the least recently used entry first).
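
As a toy illustration of the two strategies (not code from the cache itself):

# evict one entry from a full cache; for FIFO, %$times maps key =>
# insertion time, for LRU it maps key => last access time
sub evictOne {
    my ($cache, $times) = @_;
    my ($victim) = sort { $times->{$a} <=> $times->{$b} } keys %$cache;
    delete $cache->{$victim};
    delete $times->{$victim};
}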

In addition, the dependencies fired on a save reduce the cache size automatically. Even if the view/save ratio is rather high, a single save is very likely to delete lots of cached pages. So in effect, the size of the cache will not grow as dramatically as one would expect.

About CPAN dependencies: does anybody know which perl versions do not ship DB_File? For now, as there are no subsystems that rely on the caching services being there, TWiki still runs as normal without any html page caching. If, however, we'd like to make use of the caching services more extensively in other areas, you are right that we need to be sure there is always a low-level option ... though DB_File does quite fine wrt performance. There's also the option to bundle TWiki with a CPAN package it can fall back to.

-- MichaelDaum - 01 Mar 2007

Updated patch for latest MAIN branch. Added TDB_File backend.

-- MichaelDaum - 01 Mar 2007

Re DB_File - seems that Perl 5.6 does include this, but Perl 5.8 does not - see the http://search.cpan.org/~nwclark/perl-5.8.8/ perl core modules list.

As for caching a lot vs a little - I think it's best to cache as much as possible, subject to memory and disk space limits of course. Most TWiki sites are not that large, and as Peter says public Internet TWikis have a lot of guest access, so caching the whole site on disk should be no problem, and even caching in memory is feasible.

Since this module is the first to address the whole area of cache invalidation/maintenance, it would be quite easy to plug in HTTP CacheControlHeaders support, which would make this even more useful for large-scale use across WAN links within enterprises, and on the Internet, by enabling browsers and proxies to cache more effectively. You've done all the hard work, or at least are doing it!

One thing to watch out for is that we don't re-introduce any existing issues solved through cache headers and per-edit-operation URLs, i.e. BackFromPreviewLosesText and RefreshEditPage.

-- RichardDonkin - 03 Mar 2007

Preview and Edit pages are not cached. Only view pages are cached.

Hm, I've downloaded the perl-5.8.8.tar.gz and DB_File definitely is in there.

Wrt small vs big sites: yes, you have to decide on the caching backend and choose one that is appropriate. Small sites can use one that is not "size aware", e.g. DB_File or Cache::FileCache. Large sites should use the memcached backend, which is size aware. I still haven't tried the caching library by Chris Leishman, which might provide a viable size aware file-based caching backend. The one I've tried so far (Cache::SizeAwareFileCache) was way too slow.

Wrt CacheControlHeaders, this is quite a different issue and could be addressed independently, working towards better proxy and client-side cache behavior (is it so bad?). This is rather likely to be frustrating work, as the caches you are addressing are not under your control, and implementations don't always follow the specs on cache invalidation. In principle, proxies and browsers can't know when their content is out of date. They only cache based on the assumption that - hopefully, within a certain time frame - there's most probably no need to fetch a fresh copy. TWiki can't tell them in advance either. It can only tell them not to come back any sooner than x seconds/minutes/.... But if you can sacrifice content validity for speed - and this is the case for some kinds of sites - the metadata on which proxies and browsers base their caching should be as cooperative as possible. I don't see that the current FoswikiCache implementation is a specific facilitator for continuing work on CacheControlHeaders.

-- MichaelDaum - 04 Mar 2007

Re DB_File, this seems to be in the http://search.cpan.org/~nwclark/perl-5.8.8/lib/AnyDBM_File.pm AnyDBM_File package as part of perl-5.8.8.

I agree that CacheControlHeaders is a separate feature that could be added later, possibly as a plugin if there are suitable core APIs to determine the true 'last modified' date, cacheability, etc, for a given page built using FoswikiCache.

FoswikiCache would be useful as an enabler for HTTP CacheControlHeaders, because it provides HTML page caching, already provides much of the required data, and handles the whole area of cache invalidation/maintenance. In fact many of the concepts are similar, e.g. the HTTP/1.1 ETag is a unique ID for the page variants that you store within a FoswikiCache pageBucket, so there is a lot of synergy.

The expiry time is really a policy decision to be controlled by the TWiki site administrator - e.g. the site could use these headers to prevent or minimise most HTTP caching, if that's preferred. Sites that want to make use of HTTP proxy and browser caching could set parameters that allow most pages to be cached (e.g. all view pages but not those with significant embedded searches).

Cache control headers may also be important for security - proxy caches will often cache items for longer than they should, and in some cases can cache personalised content, but cache control headers provide a way to control this. Use of explicit freshness information (expiry date) and 'validators' (unique URLs and ETags that flag that a particular page version is unique, as with RefreshEditPage) are a good idea for many pages, and in particular for Edit and Preview pages - by using unique URLs and ETags (basically a unique ID for a page/object), together with a long expiration time, browser Back buttons will keep working for Edit and Preview, while proxy caches can be told not to cache Edit and Preview pages. At the very least, using a unique ETag for personalised pages should guarantee that HTTP/1.1 caches will not cache anything that is personalised - this has been a problem with at least one http://www.securityfocus.com/bid/12716/discuss Squid information disclosure bug relating to cookies.

It's also possible to tell proxy caches never to cache a page, using a header such as Cache-Control: no-store, which forces the proxy cache to go direct to the web server each time. Another option is to allow caching but force re-validation of the URL+ETag on every client request, i.e. a much shorter request that could perhaps be served by a lookup into FoswikiCache.
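
For illustration, the raw response headers behind the two policies just mentioned - standard HTTP/1.1 semantics, not code from the patch:

if ($mustNotStore) {    # $mustNotStore: hypothetical per-page flag
    # policy 1: nobody may store this page
    print "Cache-Control: no-store\r\n";
} else {
    # policy 2: may be stored, but must be revalidated on every request;
    # the ETag lets the server answer with a cheap '304 Not Modified'
    print "Cache-Control: no-cache\r\n";
    print "ETag: \"$pageDigest\"\r\n";    # $pageDigest: hypothetical unique page ID
}
print "Content-Type: text/html\r\n\r\n";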


-- RichardDonkin - 04 Mar 2007

Why is the work that has already been done not available as source?

-- KennethLavrsen - 18 Mar 2007

It is. See patch below. Unfortunately I was not able to attend the last two release meetings to get this work formally accepted as a 4.2 feature.

This work is not finished. The following things have to be done:

  1. implement a size-aware file-based cache backend based on the caching library by Chris Leishman
  2. implement a cronjob that purges cache entries; this might ease the runtime of Cache::SizeAwareFileCache and similar cache backends, provided you can bypass their purge-during-set/get behavior
  3. delayed write access in maintenance cycles to reduce cache-twiki communications
  4. debugging the re-rendering of dirty areas: rendering might be sensitive to the context a dirty area was found in, so there might be a need to capture the context of a dirty area, as otherwise the re-rendering result might differ
  5. extensive performance measurements to evaluate different aspects of the implementation, i.e. growing cache maintenance overhead on large caches

-- MichaelDaum - 20 Mar 2007

There are for sure some more things that need to be done - and maybe there are still some design challenges:
  1. The cache has user interfaces which are yet undocumented:
    1. For authors: They need to know where and when to introduce <dirtyarea>. In my opinion a cache should not have such an interface at all, or at least the typical culprits (like %SEARCH%, %ACTIONSEARCH%) should imply "don't cache this".
    2. For readers: They need to know when to use the reload url parameter and when to use the reload button in the browser.
  2. A cache entry of the complete rendered page is done (needs to be done) "per user". This gives bad performance characteristics on sites where every access is authenticated (happens in my intranet). Our most frequent scenario of people clicking on URLs reported by WebNotify will inevitably give one new cache entry per view, but rarely a cache re-use.
  3. Caching "on view" instead of "on write" gives bad performance characteristics if you have a search engine periodically visiting your site (happens in my intranet) unless your cache can accommodate all topics for this particular "user".
  4. The cache is "pluggable" with respect to backends, but not with respect to caching techniques: This is "cache complete pages on read", whereas previous ideas went for "cache compiled templates".
  5. Finally, some test cases would be in order, too.

-- HaraldJoerg - 20 Mar 2007

Re (1.1): SEARCHing is the most expensive operation in a TWiki. Most of the time a search is performed, the engine finds the same things again. There's no need to do so unless there was a change within the scope of the search. That means, whenever a topic is edited/created/renamed/deleted within that scope (e.g. a single web), the cached search results need to be invalidated and recomputed. But not if nothing changed. All TWikiApplications are based on some sort of SEARCH; if SEARCH implied "don't cache this", none of them would be cacheable.

Re (1.2): no, readers shall never need to reload a page explicitly. That's what the dependency tracker is for. There is a way to refresh a page manually using "More actions".

Re (3,4): Caching on "view" is fine. You simply keep what you just computed. That's the normal way caches work. Pre-computing html even if you might not need it does seem to be a much bigger waste conceptually: "Don't think about things beforehand, but remember what you just found out in case someone asks again."

Re (1,5): self-evident

Re (2): Right, you get notified about a changed topic. As it was changed by some user, that user invalidated the cache for the new revision. If you click on it and need to authenticate to view it, then the page needs to be recomputed for you, because someone else changed it. But if you visit it again and nothing changed, you get the cached page. If you don't need to authenticate to view the page and some other guest has already visited it, then you get a cached version too. Pre-computing (versions of) that topic for all known users is no option imho. Not sure if that's doable at all.

-- MichaelDaum - 20 Mar 2007

I am going to check the current code into MAIN - if nobody disagrees - in the hope of getting some more testers.

-- MichaelDaum - 26 Mar 2007

yes please, I am tending too many patches to keep applying major ones like this by hand. But once it's in, I can start to 'just use it', and to code without breaking it.

-- SvenDowideit - 26 Mar 2007

Nothing was implemented before the deadline for 4.2.0.

So this is deferred to GeorgetownRelease

-- KennethLavrsen - 03 Jun 2007

This is implemented and used on a daily basis on the WikiRing. However, there are some issues I had no time to figure out before the feature freeze. This deadline came too early to get code like this into 4.2.0 in a safe way. I will keep working on it and might release the code as a TWikiCacheContrib independently. In fact, some of my clients want a backport of the FoswikiCache for 4.1.2. So there are some sponsors already to fund this work, though at a lower priority than a couple of other projects.

-- MichaelDaum - 03 Jul 2007

Michael, you really did great work. I decided to put my own (quite close, actually) approach into code as PublicCacheAddOn. It has not exactly the same goals, and is definitely less polished or complete, but I hope to be able to use it to take a different look at the issues, to help you and the various performance efforts on TWiki.

-- ColasNahaboo - 13 Jan 2008

It does have the same goals! The core of the FoswikiCache is (a) the PageCache for full page caching and (b) a dependency tracker to automatically invalidate cache entries if a dependency of a page gets fired. The rest is generic infrastructure one could reuse otherwise.

Too bad you never contacted me or commented here before, and started YACI (yet another cache implementation) :-(

-- MichaelDaum - 14 Jan 2008

Sorry Michael, I was not planning to hurt you. I had only hunches and no clear experience on this subject, so I wanted to get my hands dirty personally to gain experience. Otherwise, I would have just made comments on this page backed up by no real knowledge, which would not have helped you. And I am more and more wary of saying "I am going to do this", as experience proves that I am often sidetracked and fail to deliver on promises. Moreover I am a very bad perl programmer, with no intimate knowledge of Freetown; letting me loose in your code would have been, like we say in French, "an elephant in a china shop". Also I think I tried a different approach from yours, due to slightly different goals (do not cache different versions of pages), that I really wanted to try. Besides, this is in my free time, and I needed something fun to do, fun for me now being getting bare to the metal. Anyway, I sincerely think our approaches are quite complementary in implementation and that we can gain insight by comparing the two approaches.

-- ColasNahaboo - 14 Jan 2008

After some interesting IRC discussions with Michael, http://koala.ilog.fr/twikiirc/bin/irclogger_log/twiki?date=2008-01-14,Mon&sel=1175-1334#l1171 I think the best way for me is just to avoid taking the same approaches as FoswikiCache. For instance, I will drop my idea of analyzing the topic contents to find dependencies, and, once I have a working strategy, try to implement it as varnish rules to see if I can just replace my front end code by varnish.

-- ColasNahaboo - 15 Jan 2008

I can't find any code for this extension. Is any of it public?

-- ArthurClemens - 18 Mar 2008

There is a patch below but it's from a year ago. Maybe Michael can comment?

-- RichardDonkin - 18 Mar 2008

I am gonna check that in to trunk asap.

-- MichaelDaum - 19 Mar 2008

Here's a silly idea: why implement caching in TWiki? Implementing rfc-compliant HTTP caching is quite tricky, and is imho better left to dedicated software such as varnish, squid, or whatever.

Rather than reinventing the above, wouldn't it make sense to provide the proper HTTP headers such that an upstream cache can properly cache content? This is more scalable too.

Next to the headers (Cache-Control, Expires, Last-Modified, ETags etc.) it'd be nice if TWiki could emit proper PURGE requests to upstream caches when content needs to be refreshed, of course.

I.e. what I would like is not caching but better cache control.

-- KoenMartens - 04 Apr 2008

Yep, my current FoswikiCache patches do include ETags and gzip compression (if the browser supports it). That enhances upstream caching a lot, as well as reducing bandwidth.

The main reason to implement caching in TWiki is its dependency tracker. Only TWiki itself can track and purge its caches. If no dependencies were fired, it will return the same page again and never do the same thing twice. This by no means supersedes more upstream caching, i.e. using a reverse proxy. FoswikiCache follows the idea: never do the same thing twice, i.e. never render the same page twice. External upstream caches only get a caching effect if they sacrifice cache correctness, that is, return the same page for a certain timespan and ask the backend for updates less frequently. FoswikiCache does not do that. It gets its caching effect because nothing changed and there's no reason to render exactly the same page twice.
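
In pseudo-perl, the view-time fast path described here looks roughly like this (the cache accessor and method names are assumptions):

sub view {
    my ($session, $web, $topic) = @_;
    # serve from cache if no dependency has fired since the page was stored
    if (my $cached = $session->{cache}->getPage($web, $topic)) {
        print $cached;    # never render the same page twice
        return;
    }
    my $html = renderPage($session, $web, $topic);    # the expensive path
    $session->{cache}->cachePage($web, $topic, $html);
    print $html;
}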

-- MichaelDaum - 04 Apr 2008

Are there patches for this for 4.2?

-- KenGoldenberg - 09 Apr 2008

I would be interested in the 4.1.2 patch if you ever managed to do that backport Michael.

-- StephaneLenclud - 10 Apr 2008

Michael, proper cache control means that the app tracks what has changed and what has not, and notifies the upstream cache accordingly - either by sending an http PURGE to the cache or by answering properly to HEAD requests from the cache. Anyway, as I've been losing interest in committing code I should not complain or say how things should be done :-)

-- KoenMartens - 14 Apr 2008

I always felt TWiki shouldn't cache entire processed pages but instead pre-calculate tags and wiki markup.

Suppose we add a mechanism for common tags to specify if the result of the tag is static, somewhat dynamic or very dynamic. Then TWiki could calculate a pre-processed topic text on save with the static entries and wiki markup replaced, leaving only the dynamic tags.

The difference between "somewhat" and "very" dynamic tags would be that the first only gets updated every x minutes and the second on every page load. For example, a search would probably be "somewhat" dynamic but a user tag would be very dynamic. In contrast, the result of a TeX math formula or syntax highlighting will always be the same.

The "somewhat" dynamic tags could return a list of topics they depend on instead of just relying on a timed update, which ties into the dependency tracking above.

The advantages of this approach are:
  • Fully transparent and backwards-compatible
  • The look and feel (skins etc) are dynamic, but the expensive topic text conversions are optimized
    • If the look and feel are optimized towards client-side processing, they become static and fast as well
  • Most processing is done on save, which is a slow event anyway
  • Tag handlers can be updated one-by-one, concentrating on the most CPU-expensive ones first
  • Plugin writers get an extra incentive to use the common tags handler instead of regexing

Thoughts?

-- WoutMertens - 23 Apr 2008

I agree, Wout :-) and there is even a patch here in Codev somewhere that caches template evaluations - it makes a measurable difference, and I'll be picking it up again soon. The others in your list - yup, most have been proven to make a big difference, but doing a complete, releasable and compatible change needs work - the largest amount in defining more unit tests, so we can be sure to have changed TWiki as little as possible.

-- SvenDowideit - 23 Apr 2008

The FoswikiCache already supports partial caching, in the sense that it allows certain areas to be excluded from the cache; these are re-computed for each request instead, like this:

...
static content
...
<dirtyarea>
...
non cacheable content
...
</dirtyarea>
...
static content
...
<dirtyarea>
...
non cacheable content
...
</dirtyarea>
...
static content
...

Nevertheless, I agree that templates could in theory be pre-compiled, although this is a much tougher job than what the FoswikiCache, i.e. its page cache, does. That's because templates can be very dynamic.

Modern CMS systems do caching on multiple levels. The FoswikiCache's page cache - caching the full html - is just one of these levels, sitting somewhere in between.

-- MichaelDaum - 23 Apr 2008

Wout, all you describe is interesting, but you must remember that all this costs CPU. For instance, in my tests on a slow machine, the big topic TWikiVariables takes 6 seconds to render without cache, 3s with FoswikiCache, and... 0.06s with PublicCacheAddOn, which caches the final fully processed html. So there is no free lunch; you must take care that the time to compute sophisticated algorithms does not eat up all your efficiency gains...

-- ColasNahaboo - 24 Apr 2008

Colas, I agree that any processing at all will eat CPU, but you also save processing. I feel that caching the full processed html kills the flexibility that TWiki provides: users are no longer able to have personal skin settings etc.

Caching should be a transparent process resulting in a gain in speed without losing accuracy and flexibility.

Sven, Michael: Thinking more about the template caching, it might be that the real problem is the skin. After all, 99% of the topics on a typical wiki site are completely static, making them prime candidates for regular page caching, but the skin adds site trees, username expansions etc. If we agreed that a complicated skin is only needed/supported on a full-scale browser, skins could use javascript and iframes to load the static topics separately from the current page. Browsers that don't support iframes would have to use a plainer or slower skin.

An example of how this would work:
  • User requests TopicA
  • The skin delivers an html page with nothing but 3 iframes. These iframes are each wiki topics, but with the skin=subskin parameter.
  • The iframes would be the header of the twiki site, the navigation bar and the topic text. Of these, only the navigation bar would typically be dynamic.
  • The navigation bar could even contain a bunch of javascript that loads the topic tree separately, making that iframe static as well.
  • FoswikiCache would be able to cache each iframe according to its own dynamics
  • The user would see the static content almost immediately, and faster because of connection parallelism


-- WoutMertens - 24 Apr 2008

Interesting idea, Wout. I'd love to see this explored in reality! You could even use an AjaxSkin instead of (i)frames, as that would be more flexible when the three regions "interact" in some way.

Wrt user settings being impossible with FoswikiCache: not true. Each page is stored using a sophisticated key that takes a couple of things into account. Besides the plain url - which is the only key a normal upstream cache would use - there are the url params, session values and user identity. Some of these information bits, i.e. the session values, are only available from within TWiki. These are used to calculate so-called "page variations". All page variations are stored in one bucket for the url. A purge will always empty a complete bucket, including all of the page variations. I've been experimenting with more fine-grained purging on a per-variation basis, but that turned out to be too complicated in terms of CPU and code maintenance.
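
A sketch of how such a variation key could be derived; the actual computation is not shown in this topic, and the preference names below are assumptions:

use Digest::MD5 qw(md5_hex);

sub variationKey {
    my ($session, $query) = @_;
    my @parts = ($session->{user} || 'guest');
    # url parameters, sorted so equivalent requests hash identically
    foreach my $name (sort $query->param()) {
        push @parts, "$name=" . ($query->param($name) || '');
    }
    # session values that influence rendering, e.g. skin and language
    push @parts, map { $_ . '=' . ($session->{prefs}{$_} || '') }
        qw(SKIN LANGUAGE);
    return md5_hex(join('|', @parts));
}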

So you can see that FoswikiCache by no means reduces the flexibility of TWiki wrt user settings whatsoever.

Colas, topics like TWikiVariables, or even worse TWikiDocumentation, should be banned. They are too complex for every "system" on the way: twiki, bandwidth, browser and, last but not least, the user, who is simply overwhelmed by such an amount of information. Even the first hit, that is when the page isn't cached yet, is too expensive. If someone wants such a page, e.g. to print it all out, he should be able to ask for it using a special separate link or button. Any such monster page thwarts the workflow of someone who just wants to look up the documentation of a variable quickly.

Still, your benchmarks are great! Well done.

-- MichaelDaum - 25 Apr 2008

Michael, I see. Hmmm. Does that mean that each topic gets generated at least once per user, or is the code smart enough to notice that some topics are invariant to any or all internal values?

Given that most users will visit a certain page only a few times, not taking such invariance into account means a lot of CPU and disk waste.

If we implement a static/dynamic scheme for plugin execution, that takes care of the invariance, plus it moves a lot of the processing to save time, making things faster at view time.

Of course, any caching done transparently and correctly is better than no caching at all :-)

-- WoutMertens - 25 Apr 2008

So basically what I'm proposing (as someone who doesn't have time to code it, sorry :-( ) is to automatically generate a "precompiled" version of a topic with the dirty areas marked at save time.

FoswikiCache can then be used on top of that at view time.

Templates can be left out of the equation by optimizing the skin as explained above. FoswikiCache would then automatically be in the position to cache the proper parts of the page.

Page processing time goes down, user satisfaction goes up.

One interesting metric that should be looked at is the variability in page generation time given "normal" topics, unlike TWikiVariables ;-)

-- WoutMertens - 25 Apr 2008

I've considered this several times, but dropped it for a simple reason: security and complexity (again). Any page in TWiki can suddenly show information that is only visible because the user is authorized to see it. Search results of a FormattedSearch or an INCLUDE are all filtered through TWiki's access control. As far as I investigated it, getting this right is much more complicated than one would think at first. Last but not least, I even prefer to render an invariant topic twice and be sure that no unauthorized information is disclosed by a user suddenly seeing page fragments of another, more authorized user. I took this issue so seriously that there deliberately is no sharing of cached pages among registered users at all. Sure, anonymous users will all see the pages as they were cached for the TWikiGuest.

While it seems a pity not to share fragments among registered users, the main reason pages in the cache get purged and need to be recomputed is that wiki content is highly interdependent: a single mention of a WikiWord linking to another page creates a dependency on this page. So a couple of edits on strategic topics, e.g. WebPreferences, will purge large amounts of the cache. As a consequence, on a long running FoswikiCache the number of currently valid pages in the cache won't be as high as one might expect: it takes time to capture pages again, but lots are purged with a single edit.

Remember: the more complex the cache algorithm (i.e. its dependency tracker) gets, the lower the net value of caching, and the more likely it is that fundamental flaws in the code lead to unwanted information disclosure.

-- MichaelDaum - 25 Apr 2008

You bring up a good point with the security, I hadn't considered it. However I disagree that it means we shouldn't consider precompilation.

All an %INCLUDE% tag handler needs to do is mark the result of including an access-controlled page as dynamic (possibly per user). Also note that tag handlers that don't return a static/dynamic identifier would be treated as dynamic.

But the most important thing to remember is that precompilation is at a different layer than FoswikiCache. Even if FoswikiCache would not use the static/dynamic hints that would be provided, it would still get a boost from decreased topic compilation times.

So in summary:
  • FoswikiCache strives to be correct at all times, at the expense of CPU and storage
  • Letting tag handlers return static/dynamic hints about their results would enable precompilation
  • Precompilation would speed up regular topic compilation times, which would speed up cache-misses for FoswikiCache
  • Optionally, FoswikiCache could query the staticness of a topic to know if:
    • A cache deletion is really necessary
    • The topic would be invariant to users and therefore could be shared

Right?

Either precompilation or FoswikiCache would work without the other. I'm just really curious what the speedup is. Some profiling would tell us whether precompilation is worth pursuing.

-- WoutMertens - 26 Apr 2008

So as a fairly newbie (just installed TWiki 4.2), I've got several questions:
  1. Why does TWiki not support caching out of the box? Clearly (from my own experience) TWiki is a little sluggish -- even with the help of mod_perl. Supporting and focusing on a fully functional caching solution would be outstanding.
  2. I think this is the right add-on for me -- I'm running a protected TWiki on the public internet on my own server for a small team of people. Everything is protected via .htaccess methods with the TWiki guest function turned off. PublicCacheAddOn looks like another great implementation, but it seems to sacrifice user individuality and other features (although I'm not sure why that sacrifice was made -- simplicity?)
  3. Most importantly: How do I install this addition? Is there a linked page with better installation instructions for 4.2?
    1. What does $TWiki::cfg mean? This isn't a command line call... is it a TWiki configure call of some sort? What do some of the rest of the calls mean?

I look forward to the response(s). Thanks

-- RedByer - 10 May 2008

On PublicCacheAddOn:
  • If your TWiki is just write-protected, clearly it is a good solution for you. If it is read-protected via TWiki Access control statements, you cannot use it. However, if you are protecting it with .htaccess, I think you can use it: just allow anonymous access from the local machine (via its IP) and it should work (the cache gets the pages by wget from the local machine; it does not save a particular view of a user).
  • on "sacrifice": remember that the web work because everybody sees the same thing at the same URL. Personalisation goes against the fundamental web architecture (no google could exist if google saw different things from users), and in my opinion is evil and should be banned. For instance if you keep things "right" you can get efficient caching (100x) otherwise you will get only a 2x speedup from my tests. I regret deeply that people were not more aware of this and let themselves prisoner from these "features" and try to obey them rather than ditch them, which explains why there is no caching out of the box: it is too hard to do trying to accomodate these "features". A personalised left bar should for instance be forbidden in the TWiki engine. If you really want this, use non "core web" techniques like javascript.

On $TWiki::cfg: it is perl code, used in TWiki add-ons or plugins to access the variables set by bin/configure.
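
For example, the two settings quoted earlier in this topic are plain assignments into that hash:

# in LocalSite.cfg, written by bin/configure (or added by hand):
$TWiki::cfg{CacheManager} = 'TWiki::Cache::FileCache';
$TWiki::cfg{Cache}{Enabled} = 1;

# plugin or add-on code reads the very same hash:
my $cachingOn = $TWiki::cfg{Cache}{Enabled};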

-- ColasNahaboo - 11 May 2008

If this were a public site, I'd be all over PublicCacheAddOn, but for now we're hosting a private workgroup of about 5 users. There is to be no public access or anonymous viewing. The controls are done with Apache .htaccess controls to block off access to all the sections (including /pub). The personalization of the webpages provides the link at the top left of the web bar that sends the user to their homepage.

Still not sure I understand how to apply this patch -- not written for us noobs.

-- RedByer - 12 May 2008

Red, sorry that the code isn't released yet. TWiki-5.0 will have this kind of caching built in, out of the box.

Colas, personalized or role based web content is quite a common thing. Nothing wrong per se with it, although it hinders different users from sharing fully cached pages, obviously.

I wouldn't go so far as to ban personalization from TWiki just because it is hard to cache. For example, people simply get different content because a query returns a different hit set based on their access rights. Banning personalized/role-based content from TWiki would also put an end to any workflow feature where content has different states of clearance etc.

-- MichaelDaum - 17 May 2008

Although I cannot add anything technical to the discussion, I just wanted to throw in a "thumbs up!" for the caching efforts. I believe that this will help large TWiki implementations especially.

-- MartinSeibert - 18 May 2008

Updated docu.

-- MichaelDaum - 08 Jun 2009