Feature Proposal: Deprecate auto attachments, and make UpdateAttachmentsPlugin a default

Motivation

Auto-attachments are terribly inefficient, requiring a directory scan and meta merge for every topic view. It also directly ties Foswiki to the file store, making implementation of the Store Listeners and CDN enabling much more difficult.

Description and Documentation

Deprecate the configuration parameter $Foswiki::cfg{RCS}{AutoAttachPubFiles} and add UpdateAttachmentsPlugin as a default plugin. (Note that the plugin needs to be fixed for Foswiki 1.1. Tasks.Item9322)

Move the auto attachment function into a Meta and Store validation API so that it can be called on demand from a new UpdateAttachmentsPlugin

This is not a proposal to eliminate auto attachments. It moves the auto-attach overhead out of the readTopic path and into an explicit API. Auto attachments will still function, but will be changed to warn in configure as a deprecated option, and ongoing development like the Store Listeners will ignore auto-attached files. The only way to keep attachments synchronized with the listeners will be to use the revised UpdateAttachmentsPlugin.

Examples

Impact

UpdateAttachmentsPlugin should probably go under release control even if not shipped as default. By moving the attachment validation into the Meta and Store API, UpdateAttachmentsPlugin will have a much better chance of remaining compatible with new releases.

%WHATDOESITAFFECT%
edit

Implementation

API Changes

  • Add a Meta::getValidatedMeta( $topicObject ) - returns a new validated topicObject.
    • New attachments discovered added to Meta
    • Removed attachments removed from Meta
    • Updated attachments (check file date/time & size) have updated Metadata
    • Validate rev so that the Topic rev. matches the rcs rev. (TBD - Unsure how/where to implement this validation.)

  • Add Meta::isValidated() returns true if the Meta was validated
    • implies that there may be changes due to validation.

  • Meta::getDifferences currently uses the Store and RCS system to compare two revisions in store.
    • Test if topicObject is "dirty" and
    • Add an option to use Algorithm::Diff to compare the in-memory Object with the Store.

  • Meta::isModified() Meta needs to record / report if it has bee modified.
    • Problem - Meta hash can be externally accessed.
    • Deferred - we don't have a good solution for this yet. See AddAReadOnlyMetaClass

  • Add VC::Store::validateAttachments( $topicObject )
    • Performs the auto-attach like function to synchronize Meta with Store
    • Updates the passed topicObject with the synchronized Attachment data.
    • Records insert/update/remove status in the topicObject for later query & reporting
      • May not be needed if processing Meta::getDifferences would be sufficient.
      • Would an option to restrict getDifferences to Metadata only make sense?
    • Refactored out from the VC::Store::readTopic autoattach code.

  • Update VC::Store::saveTopic
    • If TopicObject has been validated before save, report any attachment changes to the Listeners, referencing status recorded in validateAttachments
      • TBD - process results of Meta::getDifferences when Meta::isValidated() is true.

  • The UpdateAttachmentsPlugin is simplified
    • Meta::readTopic - reads in the topic to be synchronized
    • getValidatedMeta() - new topic object with updated attachments
    • saveTopic - Saves new object which triggers Listeners called for updated attachments and topic.
    • Report differences between the meta objects.
-- Contributors: GeorgeClark - 08 Dec 2010

Discussion

AutoAttachPubFiles is already optional, and off by default. What gain would you get by replacing the functionality with a plugin?

-- ArthurClemens - 08 Dec 2010

Directory reading is very inefficient. Also it results in "inconsistent" Meta updating. So readTopic returns meta that is not actually on disk. It may or may not be present depending on whether or not someone else has updated the topic. Saving a topic makes the meta information become permanent, (and attributed to the user who saved, not the user who actually added the topic iirc.) Also the Store listeners added to mirror topics into the DBI Store will "miss" the auto-attached files. It really is a bad hack.

The plugin handles it much more efficiently:
  • The rest call can be issued to correctly update the Metadata or..
  • cron scripts can be run to incur the overhead of scanning the storage for inconsistent attachments.
-- GeorgeClark - 09 Dec 2010

more important is what we lose - we lose a weird situation where the meta that we're GET ing is different from whats on disk, which means that we're breaking the user's expectation any time they do a query that thinks down to grep.

it means that if you have a high load server, every GET does more work - when we need to reduce the workload.

I'm not 100% sure, but i suspect if you do a query search on an autoattach file, it won't be there - because we're thining out the results using grep, and then loading and testing

sooooo - if you do a query, you either force the dir and file lookup foreach topic in the considered set, or you miss anything that happens via autoattach.

this feature was only implemented to cater for showing in the BASETOPIC's attachment table, and pretty much is broken everywhere else
(11:26:51 PM) SvenDowideit: ok, a little more added to the fet req
(11:26:58 PM) SvenDowideit: short version
(11:27:09 PM) SvenDowideit: imagine a query over a web of 10000 topics
(11:27:14 PM) SvenDowideit: if there are no attachments
(11:27:26 PM) SvenDowideit: you are still foing 10000 dir lookups
(11:27:51 PM) SvenDowideit: and if there are 5 attachments in every topic, even if they are all in meta, you're doing.... 60000 lookups
(11:27:54 PM) SvenDowideit: and so on
(11:28:20 PM) SvenDowideit: and there endeth the madness

-- SvenDowideit - 09 Dec 2010

This has my support.

I am not even sure we need to have UpdateAttachmentsPlugin as default. Just updating it so it is working and well documented may be enough.

-- KennethLavrsen - 09 Dec 2010

Deprecating autoattach makes sense, but it does not follow that we need UpdateAttachmentsPlugin as a default plugin. Those that need it can install it. Deprecating autoattach effectively moves functionality out of the core, which does not need to be there.

-- MichaelTempest - 09 Dec 2010

Off topic question: Is there any chance that FW will support attachments with spaces in their file names some day in the 21th century? wink

-- FranzJosefGigler - 09 Dec 2010

AutoAttach and UpdateAttachmentsPlugin both have a limitation in that they update topic Metadata (AutoAttach - in memory only, UpdateAttachments - in the topic file) But neither actually drives the Store API to recognize an attachment.

-- GeorgeClark - 10 Dec 2010

FranzJosefGigler, support of spaces in topic and attachment names is old legacy restrictions and will be difficult to resolve. All names are filtered with the same {NameFilter}, and Space removal for attachment names is hard coded in Foswiki::Sandbox::sanitizeAttachmentName, I've done a bit of testing. Spaces in topic names runs into various traps were pages are not parsed correctly, etc. The fundamental support works - it's possible to create a topic with a space in the name, but rendering has issues. Same problem with attachment names. The attachment rev. table has problems, rename and copy have strange results, etc. A lot of this is all entwined down in the regular expressions used in topic parsing. Space is a delimiter for a lot of operations, so permitting them to be embedded will require some significant effort.

Need to write up a development proposal and find a developer who can tackle the effort. It doesn't appear to be trivial.

-- GeorgeClark - 10 Dec 2010

IRC Discussions for approach to resolving the Meta issues.

[Moved the API notes up to implementation]

-- GeorgeClark - 10 Dec 2010

Another not that off topic question: Would it make sense to somehow use CRC checksums on attachments? This would make it possible to identify two identic attachments although they may have different names and possibly be stored in separate directories, wouldn't it?

And thanks for the analysis George, I feared so.

-- FranzJosefGigler - 10 Dec 2010

Since this is a formal feature proposal that requires a community decision - it is desirable that we stay on topic. Spaces and CRC checks are not relevant to the proposal.

The plugins available, so those that need the autoattach feature AFTER we have removed the feature, are very important and relevant.

The discussion I have seen here and on IRC confirms to me that the auto-attach is a resource disaster that harms more than it helps. And the discussion has also confirmed my statement that we do not need to add a new default plugin to do the similar feature. But I think we should ensure that we have a working plugin available that can attach files to topics from command line or similar. And do it via official API so we do not have to check that the plugin works each time we release Foswiki.

-- KennethLavrsen - 10 Dec 2010

Regarding the Checksum question, that's maybe a bit more on topic - the question is how to determine if an attachment has changed during the scan of attachments. For local auto-attach, I don't believe it's as important, and the overhead of doing an MD5 on each attachment would outweigh the extra storage of an occasional duplicate attachment. My intention during the Auto-attach pass is to verify the size and file system timestamp. I think that should be adequate. Other options for autoattach might be to use something like FAM or inotify on Linux to watch for changes. But this ties us to Linux, and is also designed to watch individual files or inodes which implies a large number of watched objects - one per directory that might receive an attachment. Too complex for little added benefit.

However for the proposal EnableCloudStorageForAttachments, the MD5 might make more sense, as the cost of the external storage, and bandwidth for the mirroring are tangible costs. I have not explored the cloud storage capabilities yet, but one thought is to store unique files keyed by their md5 or some other key, and maintain a mapping from Web/Topic/Attachment to the stored file. However discussions about that belong in that topic, not here.

-- GeorgeClark - 10 Dec 2010

I've refactored this topic a bit, moved the API changes up to Implementation section, and reflect the apparent consensus that UpdateAttachmentsPlugin need not become a default plugin.

-- GeorgeClark - 10 Dec 2010

Has my support in principal (I wrote AutoAttach - I don't care when it happens as long as it can happen).

-- MartinCleaver - 04 Jan 2011

UpdateAttachmentsPlugin has been updated under Item9322.


Setting this to a discarded proposal. Enough of the internals have changed that it needs redesign.

-- GeorgeClark - 18 Nov 2015
Topic revision: r21 - 19 Nov 2015, GeorgeClark
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement. Creative Commons License    Legal Imprint    Privacy Policy