Feature Proposal: Delegate More Processing To Search Algorithm

Motivation

During my development of the Kino Search Algorithm in KinoSearchContrib, it became obvious that the Foswiki core needs to delegate more choices to the Search Algorithm.

This work may be interwoven with some of the ResultSets and ExtractAndCentralizeFormattingRefactor work.

Description and Documentation

In TWiki 4.2.2, when a SEARCH happens, we call a very naive pluggable function once per web:

SearchAlgorithm::search ( $searchString, $topics, $options, $sDir, $sandbox, $web )

where $options contains only scope, type, casesensitive, and wordboundaries, and $topics is a (painfully) created list of topics.

This function then returns a hash of topic name to 'extract', which the Search rendering then throws away, keeping only the topicname list.
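
To make the current flow concrete, here is a minimal sketch. It is written in Python for brevity, whereas the real code is Perl. `naive_search`, `render_hits`, and the in-memory `webs` data are hypothetical stand-ins, not the TWiki API. It only illustrates the per-web call and the extract being discarded:

```python
# Hypothetical stand-in for the pluggable search function: it is called once
# per web and returns a mapping of topic name -> extract (first matching line).
def naive_search(search_string, topics, options, web_topics):
    case_sensitive = options.get("casesensitive", False)
    needle = search_string if case_sensitive else search_string.lower()
    hits = {}
    for name in topics:
        for line in web_topics[name].splitlines():
            haystack = line if case_sensitive else line.lower()
            if needle in haystack:
                hits[name] = line  # the 'extract'
                break
    return hits


def render_hits(webs, search_string, options):
    result = []
    for web, web_topics in webs.items():
        # The caller builds the topic list itself, one web at a time...
        topics = sorted(web_topics)
        hits = naive_search(search_string, topics, options, web_topics)
        # ...and keeps only the topic names, discarding the extracts.
        result.extend(f"{web}.{name}" for name in sorted(hits))
    return result


webs = {
    "Main": {"WebHome": "Welcome to the Main web", "SvenDowideit": "Works on search"},
    "Sandbox": {"TestTopic": "Search me\nsecond line"},
}
print(render_hits(webs, "search", {"casesensitive": False}))
```

The extracts are computed for every hit and never reach the output, which is the wasted work this proposal wants to remove.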

KinoSearchContrib (like the Xapian engine I'm working on) can return, incredibly quickly, all the meta information for a topic, including a contextual extract. On top of that, it can return non-topics, such as attachments and other external data, which I would love to use.

Impact

API, Performance, Refactoring, Search

Implementation

So: I propose to refactor the TWiki::Store::SearchAlgorithms and TWiki::Store::QueryAlgorithms APIs (which I understand only Crawford and I have worked with; please pipe up if I've missed you) to:
  1. bring them into one API, where multiple SearchAlgorithms can register themselves as capable of processing a search type (or list of types)
  2. create the UI elements to dynamically add support for enabled 'types' in the WebSearch topic (so we can have attachment, external doc, and Google search checkboxes)
  3. pass the SearchAlgorithms all the known settings that might allow them to optimise a query (including the format string)
  4. use any information that SearchAlgorithms return in the output rendering, thus leveraging advanced improvements
For backwards compatibility, the existing search types and scopes will be required to return results identical to those of previous versions of TWiki. This implies that scope=all will not in fact search all data types, but only topic name and topic text.
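
As a rough illustration of points 1, 3, and 4, a unified registry might look something like the sketch below. It is written in Python for brevity, whereas a real implementation would be Perl. Every name in it (`SearchRegistry`, `register`, `SearchHit`, and the `keyword` and `attachment` types) is hypothetical and not part of any existing TWiki or Foswiki API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class SearchHit:
    # Everything an algorithm can report about a hit; the renderer uses what is present.
    web: str
    name: str
    kind: str = "topic"  # e.g. "topic" or "attachment"
    extract: str = ""
    meta: Dict[str, str] = field(default_factory=dict)


# An algorithm receives the query plus all known settings (including the format string).
Algorithm = Callable[[str, Dict[str, str]], Iterable[SearchHit]]


class SearchRegistry:
    def __init__(self) -> None:
        self._by_type: Dict[str, List[Algorithm]] = {}

    def register(self, types: Iterable[str], algorithm: Algorithm) -> None:
        """Record that `algorithm` can process each of the given search types."""
        for search_type in types:
            self._by_type.setdefault(search_type, []).append(algorithm)

    def enabled_types(self) -> List[str]:
        """Types that a search UI could offer as checkboxes."""
        return sorted(self._by_type)

    def search(self, query: str, options: Dict[str, str]) -> List[SearchHit]:
        """Dispatch to every algorithm registered for the requested type."""
        search_type = options.get("type", "keyword")
        hits: List[SearchHit] = []
        for algorithm in self._by_type.get(search_type, []):
            hits.extend(algorithm(query, options))
        return hits


def toy_attachment_search(query, options):
    # Toy algorithm: pretends an index returned one attachment with an extract.
    yield SearchHit(web="Main", name="report.pdf", kind="attachment",
                    extract=f"...{query}...", meta={"size": "12k"})


registry = SearchRegistry()
registry.register(["attachment"], toy_attachment_search)
hits = registry.search("budget", {"type": "attachment", "format": "$topic: $summary"})
for hit in hits:
    print(hit.kind, f"{hit.web}.{hit.name}", hit.extract)
```

In this sketch the existing types would simply be registered by the default algorithm, so their results stay unchanged, while new types such as attachments only appear once an algorithm registers for them.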

-- Contributors: SvenDowideit - 19 Aug 2008

Discussion

Great Initiative, Sven!!!

From my studies of TWiki performance, I realized that search and store are the worst bottlenecks. I was planning to try out Xapian (it seems to be very fast).

TWiki-5 will fly :-)

-- GilmarSantosJr - 19 Aug 2008

Sounds excellent, Sven. The devil is in the detail; it sounds like you will be doing a lot of refactoring in Search.pm (to get rid of those topic lists, for a start).

Ideally I'd like the API fixes to climb higher up the tree so that I can perform multiple-web searches with one call; though that may be a refactoring too far.

-- CrawfordCurrie - 19 Aug 2008

It would be so cool to make it a modern interface using iterators over result sets. I can imagine that most of the current Search.pm simply goes over the fence.

-- MichaelDaum - 19 Aug 2008

Please remember to put a date in the Date of Commitment field so the proposal app can work. Added today's date.

-- KennethLavrsen - 11 Sep 2008

The options are now passed on to the Search Algorithm, which can ignore them as needed. The MongoDBContrib work validated parts of this, and when Foswiki 1.1 is released I'll continue work on that.

-- SvenDowideit - 14 May 2010

Any documentation for the new API?

What is the MongoDBContrib all about?

-- JulianLevens - 14 Jun 2010

Docco: sorry, like the store API on the whole, the source is still moving. I should really write something ASAP.

MongoDBContrib is the latest in my attempts to provide a modern backend for Foswiki. It's in svn, but I think it's broken right now. You should see some commits to update it to the current state of 1.1 very soon.

:-( Sorry, I've dropped working on 'future' things to focus on getting 1.1 out.

-- SvenDowideit - 16 Jun 2010
 
Topic revision: r8 - 06 Dec 2010, GeorgeClark