You are here: Foswiki>Tasks Web>Item9893 (24 Apr 2011, PaulHarvey)Edit Attach

Item9893: mongo throws an exception if you're sorting on a key that is not indexed?

pencil
Priority: Urgent
Current State: Closed
Released In: n/a
Target Release: n/a
Applies To: Extension
Component: MongoDBPlugin
Branches:
Reported By: SvenDowideit
Waiting For:
Last Change By: PaulHarvey
in an annoying wtf, it seems that MongoDB has issues with adhoc queries on large datasets :/

Could not perform search. Error was: query error: too much data for sort() with no index at /home/sven/src/foswiki/core/lib/Foswiki/Search/MongoDBInfoCache.pm line 56. at /home/sven/src/foswiki/core/lib/Foswiki/Search/MongoDBInfoCache.pm line 56 Foswiki::Search::MongoDBInfoCache::next('Foswiki::Search::MongoDBInfoCache=HASH(0x4bd2778)') called at /home/sven/src/foswiki/core/lib/Foswiki/Search/ResultSet.pm line 93 Foswiki::Search::ResultSet::hasNext('Foswiki::Search::ResultSet=HASH(0x4bd2370)') called at /home/sven/src/foswiki/core/lib/Foswiki/Search.pm line 357 Foswiki::Search::searchWeb('Foswiki::Search=HASH(0x4a20ae8)', 'search', 'ZonePlugin', 'basetopic', 'ZonePlugin', 'reverse', 'on', '_RAW', '"ZonePlugin" limit="10" scope="topic" web="System" nonoise="o...', ...) called at /home/sven/src/foswiki/core/lib/Foswiki/Macros/SEARCH.pm line 32 Foswiki::__ANON__() called at /usr/share/perl5/Error.pm line 416 eval {...} called at /usr/share/perl5/Error.pm line 408 Error::subs::try('CODE(0x24961a0)', 'HASH(0x4a20b00)') called at /home/sven/src/foswiki/core/lib/Foswiki/Macros/SEARCH.pm line 41 Foswiki::SEARCH('Foswiki=HASH(0x2a11550)', 'Foswiki::Attrs=HASH(0x2496278)', 'Foswiki::Meta=HASH(0x35d1688)') called at /home/sven/src/foswiki/core/lib/Foswiki.pm line 3015 Foswiki::_expandMacroOnTopicRendering('Foswiki=HASH(0x2a11550)', 'SEARCH', '"ZonePlugin" limit="10" scope="topic" web="System" nonoise="o...', 'Foswiki::Meta=HASH(0x35d1688)') called at /home/sven/src/foswiki/core/lib/Foswiki.pm line 2905 Foswiki::_processMacros('Foswiki=HASH(0x2a11550)', '

which is going to make life interesting

-- SvenDowideit - 25 Oct 2010

Goodness, SERVER-790 is curious :/

-- PaulHarvey - 26 Oct 2010

there's more. you can only define 40 indexes, and compound indexes only seem to help sometimes

a couple of possibilities:
  1. only add indexes when this issue is noticed (ie, when a mongoDB query returns this error, create an index on the requested sort, and if we run out, either tough, or delete one..)
  2. use one collection per DataForm definition- with the limitaion that you cannot have ~30 fields (as we need to index built in things too)
  3. perhaps sharding mitigates this?
  4. map-reduce
  5. work harder on reducing document size, as the 1Meg limit is based on document size of the result

-- SvenDowideit - 29 Oct 2010

So, we want mongodb to sort because paging through large results requires it to be done at the db server, otherwise we lose the whole point of having the db do the work instead of Foswiki.

Would it be possible to make a mongo query with sort/skip+limit parameters to build a result where each item was purely ObjectID (12 bytes) and the sorted field (I assume you need the sorted field in order for sort() to be able to do its thing)? A 2nd query from Foswiki would then obtain the full documents (or the pieces of them it wants, anyway).

OTOH as you said on IRC I think collection-per-form is probably aligned with the nosql strategy mongo peeps seem to be advocating anyway. Also, many of my wiki-apps start with a "form.name='BlahForm'" anyway, as the form indicates the topic 'type'.

I think the most fields we have on our scariest form is ~20, and I'm working on normalising-out that one so it'll be back down to 12 or so. Sharding might mitigate this but not to the extent we need to have a solution... we have 2 shards and I hope that's enough for now.

The other reason I'm warming to the idea of collection-per-form is that we could enhance DataForms to specify index types. Maybe. One day smile For example, I see that there's a mongodb 1.7 patch which allows polygon (rather than box, circle) 2d geospatial search.

Hm. I guess collection-per-form could be a bit more fiddly for a 3rd-party (non-foswiki) app to work with. OTOH it might actually make it easier - all the 3rd-party app ideas we have for talking directly to mongodb without Foswiki in the middle, would probably benefit from having the topics already grouped by form - as they will want to be working on a particular type of (data) topic.

Well, I don't think I actually contributed to a solution, I'll stop typing now...

-- PaulHarvey - 29 Oct 2010

Hmmm, mongodb.org needs to update its docs. SERVER-1140 seems to imply that 1.6 can have 64 indexes on a collection.

-- PaulHarvey - 31 Oct 2010

Okay, another thought. SemanticLinksPlugin wants efficient SEARCHes too. It registers a few META:BLAH types, and I guess they will want to be indexed as well.

So the problem with indexes seems to be mounting. Hmmmmm.

Maybe MongoDBPlugin is going to need to split things up into collection per form; and dedicated collections per META:TYPE?

This will require some tedious book keeping. Collection-per-form isn't too bad, the topic object will exist only once.

But splitting up pieces of the topic into constituent bits means (I guess) the 'topic' document maintaining dbrefs into the other META:TYPE collections, and of course the various collections per META:TYPE will have a dbref back to the 'topic' object.

Is it really tedious to eval js on the server so that Foswiki can obtain a 'complete' topic object in just one request? Or is this approach doomed to fire n + 1 requests to completely fetch a topic, where n is the number of different META:TYPEs in use on the topic?

/end ramble

-- PaulHarvey - 19 Nov 2010

As of now, this item is the urgentest. Anecdotally for us it gets "too large" to sort somewhere between 1000-1500 topics

-- PaulHarvey - 24 Mar 2011

MongoDBPlugin:d8285c0b83fd is actually for this task, and makes a significant improvement on my tests

-- SvenDowideit - 31 Mar 2011

This still:
%SEARCH{
  "form.name='Lauries.GlossaryForm'"
  web="Lauries/GlossaryData"
  type="query"
  order="formfield(Base)"
  pager="on"
}%

Produces this: Could not perform search. Error was: query error: too much data for sort() with no index. add an index or specify a smaller limit

I can however, see that we can sort TaxonProfile/Definitions now. So I guess we'll keep working on that

-- PaulHarvey - 01 Apr 2011

ok, with the commits we have now, order=formfield(Base) works on my system - can you confirm?

-- SvenDowideit - 08 Apr 2011

I'm having trouble with the insects web - here's Insect000000-Insect078517, can you load it and tell me how many pages the SEARCH pager gives you with the following expression:
%SEARCH{"1"
  type="query"
  web="System.MigrationScriptsInsectsDemo"
  pager="on"
  pagesize="10"
}%

http://wiki.trin.org.au/Sandbox/TestTopic12 only gives me 1937 pages, indicating that my sort tests are only sorting on ~19,370 topics..

-- PaulHarvey - 23 Apr 2011

So, after doing a db.dropDatabase on the web, and then doing another updatweb... I found that mongo wasn't actually loading the topics! Instead, I had to re-load the root (System) web.

-- PaulHarvey - 23 Apr 2011

so you mean to say, it works, the bug is fixed, but we have some kind of (new task) issue with loading?

-- SvenDowideit - 23 Apr 2011

Indeed

-- PaulHarvey - 23 Apr 2011

Okay, I'm able to sort the insects DB (first 162,000 anyway!) on any formfield in around 3s typical. Nice smile I think we can close this.

-- PaulHarvey - 24 Apr 2011
 

ItemTemplate edit

Summary mongo throws an exception if you're sorting on a key that is not indexed?
ReportedBy SvenDowideit
Codebase trunk
SVN Range
AppliesTo Extension
Component MongoDBPlugin
Priority Urgent
CurrentState Closed
WaitingFor
Checkins MongoDBPlugin:54ba499e4ef4
TargetRelease n/a
ReleasedIn n/a
Topic revision: r18 - 24 Apr 2011, PaulHarvey
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement. Creative Commons License    Legal Imprint    Privacy Policy