Item10772: VarCachePlugin issues

Priority: Normal
Current State: New
Released In: n/a
Target Release: n/a
Applies To: Extension
Component: VarCachePlugin
Branches:
Reported By: TimotheLitt
Waiting For:
Last Change By: PaulHarvey
I implemented VarCachePlugin on my (sigh, still TWiki) server and found a number of issues in VarCachePlugin and in TablePlugin.

Patches for the TWiki version are at http://www.twiki.org/cgi-bin/view/Plugins/VarCachePluginDev.

Looking at SVN trunk: the TablePlugin issue is already fixed in Foswiki.

VarCachePlugin still has the following issues that I fixed for TWiki:
  • mod_perl global variable problem
  • URL parameter filter is not working as intended (it should treat params other than varcache as varcache=no, but checks for 'refresh' instead).
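
A minimal sketch of the intended URL parameter behaviour, as I read the second point. The helper name cache_action and its return values are hypothetical, not the plugin's code; the parameter values "refresh" and "no" are the ones mentioned in this report.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of VarCachePlugin): decide how to treat a
# request, given its URL parameters as a hash reference.
# Returns 'refresh', 'bypass' or 'cache'.
sub cache_action {
    my ($params) = @_;

    # Any parameter other than "varcache" may change the rendered output,
    # so treat the request as if varcache=no had been given.
    my @others = grep { $_ ne 'varcache' } keys %$params;
    return 'bypass' if @others;

    my $mode = $params->{varcache};
    $mode = '' unless defined $mode;
    return 'refresh' if $mode eq 'refresh';
    return 'bypass'  if $mode eq 'no';
    return 'cache';
}

print cache_action( {} ), "\n";
print cache_action( { varcache => 'refresh' } ), "\n";
print cache_action( { skin => 'print' } ), "\n";
```

The point of the sketch is the order of the checks: the test for "other" parameters comes first, so a request carrying any extra parameter never reads or refreshes the cache.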

I also chose to create one sub-directory of the working directory per web for the cache data; for large installations, this will keep the directory size down (vs. the Foswiki port's web_topic naming directly in the plugin's working directory).
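
Roughly what the per-web layout looks like, as a sketch only. The helper name cache_file and the ".cache" suffix are made up for illustration; the actual file names are in the patch.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build the cache file path for a topic, using one
# sub-directory per web below the plugin's working directory.
sub cache_file {
    my ( $workdir, $web, $topic ) = @_;

    # Assumption: nested webs ("Main/Sub" or "Main.Sub") become nested
    # directories, so two different webs can never share a directory.
    my @webparts = split m{[/.]}, $web;

    return File::Spec->catfile( $workdir, @webparts, "$topic.cache" );
}

print cache_file( 'working', 'Main', 'WebHome' ), "\n";
```

With a flat web_topic scheme every cached topic of every web ends up in one directory; here a directory only ever holds the topics of a single web.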

Note also that the TWiki version has been updated since the Foswiki port - it now handles caching HTML Headers; we probably should take those changes.

Not clear if you're better off re-porting the Plugin or selectively merging the changes...

Enjoy.

-- TimotheLitt - 19 May 2011

Cool, thanks for the heads up!

On Foswiki it's probably better to just use the built-in PageCaching feature, which tracks dependencies (per-user) so it can automatically expire a topic's cache if some other topic involved in the rendering has changed. It's been a (very!) long time, but I don't remember VarCachePlugin doing that.

The built-in PageCaching doesn't handle the case where newly created topics influence a SEARCH etc.; in that case you still need to refresh manually.

-- PaulHarvey - 19 May 2011

I believe we should fix the VarCachePlugin.

I have used it in the past. And I have also tried our new PageCaching. I gave up on PageCaching.

Our PageCaching has several problems.

  • It slows down Foswiki significantly, even if you enable the plugin and disable the cache. It seems that even when the feature does not cache a topic, it still builds the cache files, which costs time on every view.
  • In applications where you create new topics, PageCaching often ends up showing an old search result and omitting the new topics. With VarCachePlugin I could better control the refreshing manually.

So the two overlap in scope but cannot replace each other. Our PageCaching feature for sure does a lot of stuff that the VarCachePlugin cannot do well.

-- KennethLavrsen - 19 May 2011

Yeah, we really need to make the PageCache add dependencies on ResultSets, and expire any cached topic whose ResultSet would be affected by a newly created topic.

-- PaulHarvey - 19 May 2011

I noticed one other problem after posting this - omitting the {} (as one would do if all the defaults are acceptable) fails to recognize the macro. I posted the fix for that as an additional patch section.

(Simply, make the {(.*?)} into (?:{(.*?)})? in beforeCommonTagsHandler)
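
To illustrate the effect of that change, here is a small sketch. The surrounding pattern is an assumption (I have only written the %VARCACHE ... % framing around the two fragments quoted above), and varcache_args is a hypothetical helper, not the plugin's handler.

```perl
use strict;
use warnings;

# Illustrative patterns only; the plugin's real expressions may differ.
my $old = qr/%VARCACHE\{(.*?)\}%/;         # braces required
my $new = qr/%VARCACHE(?:\{(.*?)\})?%/;    # braces optional

# Hypothetical helper: return the macro's argument string, '' when the
# macro is written without braces, or undef when there is no macro.
sub varcache_args {
    my ($text) = @_;
    return undef unless $text =~ $new;
    return defined $1 ? $1 : '';
}

for my $text ( '%VARCACHE{"48"}%', '%VARCACHE%' ) {
    my $old_hit = $text =~ $old ? 'match' : 'no match';
    my $new_hit = $text =~ $new ? 'match' : 'no match';
    print "$text: old=$old_hit, new=$new_hit\n";
}
```

Note that with the optional group the capture is undefined when the braces are omitted, so whatever consumes it has to cope with undef.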

There is another issue that I haven't addressed -- if you delete or move a topic, nothing cleans up the cache. Same thing happens if you remove the %VARCACHE% macro from a topic.

-- TimotheLitt - 19 May 2011

Oh, and with respect to PageCaching - I just read the documentation. Fixing the case where SEARCH is tracking topics will be hard.

Consider this SEARCH, which is what started me down this path. It's nothing but a nested enumeration of topic parents, but the same applies where the names match a regex. So every topic create/add/move/rename of any topic would have to invalidate every page containing such a search. And, to complicate matters, there are dozens of such searches - which are INCLUDEd from template topics.

%SEARCH{ search= "parent.name = '%INCLUDINGTOPIC%'" 
   web="%INCLUDINGWEB%"
   scope="topic"
   type="query"
   nonoise="on"
   casesensitive="on"
   reverse="off"
   separator=" $n"
   nofinalnewline="off"
   format="| $web.$topic | $percntSEARCH{ \"parent.name = \"$topic\"\" web=\"%INCLUDINGWEB%\" nofinalnewline=\"on\" type=\"query\" scope=\"topic\" nonoise=\"on\" nosearch=\"on\" casesensitive=\"on\" format=\"$dollarweb.$dollartopic\" separator=\"  \" }$nop% |"
 }%

These searches are the ones that I needed to cache...so marking the searches as dirty areas would be counterproductive :-)

Seems like we're stuck with both until someone has a clever, efficient idea for fixing this.

One last thought - if PageCaching can someday be made a superset of the plugin, it would be nice to have it handle the plugin's macro compatibly - I already have thousands of pages marked.

Well, hopefully one of these months I'll migrate to Foswiki so I'll have that problem.

-- TimotheLitt - 19 May 2011

So every topic create/add/move/rename of any topic would have to invalidate every page containing such a search. And, to complicate matters, there are dozens of such searches - which are INCLUDEd from template topics.

Well, the good news is that the PageCache is supposed to handle this "deep dependency tracking" automatically. All that happens is, when you rename/modify a topic, the PageCache can look up which (cached) topics have dependencies on the renamed/modified topic, and mark those as invalid.

So if any child topic in your search above is modified, that should invalidate the cache of any topic that uses this search (or an INCLUDE of this search, or an INCLUDE of an INCLUDE of this search...).

All topics that were used to render a page are counted as dependencies of the cached HTML. But the cache isn't keyed on just the Web.Topic name: the authenticated username and URL params are additional dimensions, which can result in hundreds of versions of a single topic being held in the cache.
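
To make the "additional dimensions" concrete, a sketch of how one variant of a page could be identified. This is not how the PageCache actually computes its keys; variant_key and the MD5 choice are assumptions for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical key function: one cache entry per combination of topic,
# authenticated user and URL parameters.
sub variant_key {
    my ( $web, $topic, $user, $params ) = @_;

    # Sort the parameter names so that ?a=1&b=2 and ?b=2&a=1 share an entry.
    my $query = join '&', map { "$_=$params->{$_}" } sort keys %$params;

    return md5_hex( join "\n", "$web.$topic", $user, $query );
}

print variant_key( 'Main', 'WebHome', 'guest', {} ), "\n";
print variant_key( 'Main', 'WebHome', 'guest', { skin => 'print' } ), "\n";
```

Every distinct user and every distinct parameter set yields another entry for the same topic, which is where the multiplication comes from.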

This generates a lot of data. On our (publicly accessible) site, without a proper cache expiry setting, we had ~2GB of cache data accumulate in a couple of weeks.

But as I said, only edits and renames on already tracked topics can invalidate a cache right now. To handle the case where a new topic matches some cached search, or a previously un-tracked topic that has been modified to match a cached search, this requires us to add query expressions as dependencies on a piece of cached content. When a topic is modified or created, we'd have to pass it by a series of these queries to see if any of them match, and then invalidate any cached topics that had the query/ResultSet as a dependency. I guess this could be a plugin saveHandler, but the Store Listener is another layer that could be used...

For what it's worth, I'd really love to find the time to better understand the PageCache code and think of how to use it to cache things at the macro-level, rather than fully rendered HTML page. And it could be a good excuse to try out neo4j... :-)
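
A rough sketch of that idea, under heavy assumptions: the registry %query_deps, the page names and pages_to_invalidate are all invented, and plain Perl predicates stand in for real query expressions. Nothing like this exists in the PageCache today.

```perl
use strict;
use warnings;

# Hypothetical registry: each cached page lists the predicates (stand-ins
# for query expressions) that its rendering depended on.
my %query_deps = (
    'Main.ProjectIndex' => [ sub { ( $_[0]{parent} // '' ) eq 'ProjectIndex' } ],
    'Main.BugList'      => [ sub { $_[0]{name} =~ /^Bug\d+$/ } ],
);

# Run one saved or created topic past every registered predicate and
# return the cached pages that would need to be invalidated.
sub pages_to_invalidate {
    my ($topic) = @_;
    my @stale;
    for my $page ( sort keys %query_deps ) {
        push @stale, $page if grep { $_->($topic) } @{ $query_deps{$page} };
    }
    return @stale;
}

my @stale = pages_to_invalidate( { name => 'Bug42', parent => 'WebHome' } );
print "invalidate: @stale\n";
```

The loop visits every registered query on every save, which is the cost being discussed; the set each query is evaluated against is just the one topic.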

-- PaulHarvey - 19 May 2011

It's also worth noting that, IIRC, the PageCaching feature probably requires a persistent Perl environment to avoid the overheads Kenneth has mentioned.

I do agree that it's not quite ready for many typical Foswiki installations just yet, but even a 1 hour cache timeout can reduce server load noticeably when you're being hammered with a lot of traffic from bots.

-- PaulHarvey - 19 May 2011

Yup - but the point of my example is that it's ONLY based on what topics exist. So every topic create/rename/move would have to be run by this search (and any search in any web that depends on what topics exist). You can optimize the case of a topic that disappears by keeping a dependency tree for which topics contain a search result that included it. I don't see an optimization for when a topic appears.

Seem to have a streak running. There's yet another bug in VarCachePlugin - patched in the latest update to my TWiki note. A refresh argument of zero (which is supposed to mean infinite) is overridden with the default of 24 hours. Usual Perl coding bug.
             my $refresh = TWiki::Func::extractNameValuePair( $theArgs )
-                       || TWiki::Func::extractNameValuePair( $theArgs, "refresh" )
-                       || TWiki::Func::getPreferencesValue( "\U$pluginName\E_REFRESH" ) || 24;
+                       || TWiki::Func::extractNameValuePair( $theArgs, "refresh" );
+            $refresh = TWiki::Func::getPreferencesValue( "\U$pluginName\E_REFRESH" ) unless( defined $refresh );
+            $refresh = 24 unless( defined $refresh );
             $refresh *= 3600;
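
For anyone who hasn't been bitten by this before: in Perl both 0 and the empty string are false, so a chain of || skips a legitimate zero. A self-contained sketch of the pitfall follows; first_set and refresh_seconds are hypothetical names. It assumes a missing argument may arrive as an empty string rather than undef, which is why it tests length as well as definedness.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first candidate that is defined and
# non-empty. Unlike ||, a value of "0" counts as set.
sub first_set {
    for my $candidate (@_) {
        return $candidate if defined $candidate && length $candidate;
    }
    return;
}

# Refresh period in seconds: macro argument first, then the preference
# value, then the default of 24 hours.
sub refresh_seconds {
    my ( $arg, $pref ) = @_;
    my $hours = first_set( $arg, $pref, 24 );
    return $hours * 3600;
}

my $buggy = 0 || 24;    # the zero is false, so the default wins
print "with ||:        ", $buggy * 3600, "\n";
print "with first_set: ", refresh_seconds( 0, undef ), "\n";
```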

-- TimotheLitt - 20 May 2011

That's what I mean about adding a ResultSet as a dependency - yes, every create/rename/modify would have to be tested against any query expression that might include it.

However there are some mitigating factors:
  • Save/rename/modify operations are far outnumbered by page views. IMHO it's okay to have expensive saves for the sake of quick views.
  • If we don't want to add several seconds to a save operation, we could make an async batch/background task that processes recently created/renamed/modified topics to invalidate caches based on query dependencies that include them.
  • There may be hundreds of queries to evaluate, but the query set size is 1 - just the topic being created/renamed/modified.
  • Hopefully your server has plenty of spare CPU, as it will have an awesome cache system ;-)

Anyway, I'm sorry to drag this task so far off-topic. I do worry that we should be unifying our efforts into a coherent caching strategy, with proper dependency tracking, but I guess for now we all have higher priorities.

-- PaulHarvey - 21 May 2011

No problem - as long as someone takes the patches for the plugin in the meantime :-)

Nit: the query set size is 2 for a rename/move (old name/old location & new name/new location).

A background task seems much more sensible - I certainly don't want to add seconds to, say, clicking "add comment". Or editing that 1-character typo. It's really important to keep Wikis interactive and responsive - otherwise users resist using them (even more than normally.)

Cache refills (views) also get slower - you don't just list the viewed topic's dependencies, you have to intercept all the TML operators that create them, e.g. SEARCH expressions, or any IF expression that results in a topic-exists test or builds a topic name from a macro, preference etc. (Consider %IF %servertime% is "afternoon" THEN "AfterNoonActivities" else "MorningMeetings") Then, any plugin that does any of these behind the scenes needs an API - and if we're clever, some hooks (e.g. Foswiki::Func::topicExists could make a dependency graph entry - handling a simple case, but it wouldn't know how the argument got constructed).

Nothing insurmountable - but as I said, this seems hard to get completely right.

I do like several of the features of the page caching design; marking areas, involving the browser with etags and so on. So I don't mean to discourage. It's just that it seems hard to get right.

VarCachePlugin, besides being on TWiki, has the advantage of being small, simple, fast and effective for a useful subset of cases, at the expense of being visible to the user via the "if you're confused, click refresh" link. It's "good enough" and "quick" vs. "complete" and "elegant". I'd be happy to see it superseded by page caching - when all the engineering issues are resolved.

Meantime, I wonder how hard it would be to teach VarCachePlugin to etag/304...

Perhaps this exchange should be refactored into a page caching topic and (the original) VarCachePlugin bug report...

-- TimotheLitt - 21 May 2011

FWIW, Foswiki::Meta already adds a dependency to the currently viewed page for any topic that is checked with topicExists() or load()'ed. If a plugin does deeper things than this, the API is Foswiki::Meta::addDependency() and Foswiki::Meta::fireDependency(). WebPreferences topics (and a configurable list of others) are automatically dependencies of all topics in a web.

I'm sure a user of VarCachePlugin will commit the patch, I just wanted to bring attention to the fact that we should be unifying these features... one day :-)

-- PaulHarvey - 22 May 2011
 
Topic revision: r10 - 22 May 2011, PaulHarvey
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere.