Item10596: MongoDBPlugin milestone 4

Priority: Normal
Current State: Closed
Released In: n/a
Target Release: n/a
Applies To: Extension
Component: MongoDBPlugin
Branches:
Reported By: SvenDowideit
Waiting For:
Last Change By: SvenDowideit
This one's about redoing the schema so that each web is in a separate 'database', with a 'current' collection, a history collection and other optimisation-based collections.

| Id | Applies To | Component | Summary | State | Last Edit | By |
| Item10532 | Extension | MongoDBPlugin | WIBNIF we allowed queries to go to slaves (slave_ok) | Needs Developer | 8 years | SvenDowideit |
| Item10409 | Extension | MongoDBPlugin | query on createdate | Closed | 12 years | PaulHarvey |
| Item10663 | Extension | MongoDBPlugin | renaming a web moves the web on disk, but doesn't rename the DB in mongo. | Closed | 12 years | SvenDowideit |
| Item10664 | Extension | MongoDBPlugin | Having trouble getting MongoDB to notice newly created webs - i.e. it seems I have to explicitly load a web before the DB gets created for it | Closed | 12 years | SvenDowideit |
| Item9893 | Extension | MongoDBPlugin | mongo throws an exception if you're sorting on a key that is not indexed? | Closed | 12 years | PaulHarvey |
| Item10611 | Extension | MongoDBPlugin | MongoDB sort order=createdate and order=modified weirdness | Closed | 13 years | SvenDowideit |
| Item10654 | Extension | MongoDBPlugin | SeaSlug results in MongoDb not identical to bruteforce | Closed | 13 years | SvenDowideit |
Number of topics: 7

-- SvenDowideit - 05 Apr 2011

So, the most urgent one is sorting (Item9893), and the following tasks are "important but not urgent" (roughly in order):

  • Item10611 - order="modified" and order="created" weirdness, but I'm still trying to understand this bug
  • Item10409 - this might be an easy win that our users would appreciate, however, if it makes more sense to tackle this at a later stage then that's cool too
  • Item10532 - slave_ok madness

-- PaulHarvey - 08 Apr 2011

er, and obviously delegating ACLs is probably an m5 thing

-- PaulHarvey - 08 Apr 2011

moved m5 tasks to Item10652

added magic list to make my life easier

-- SvenDowideit - 19 Apr 2011

Hi Sven, sorry about the delay testing m4. Hrm. Before, our /var/lib/mongodb/set was ~4-5GB. MongoDB's chunk size is 200MB, so per-web this is the first allocation size - ie. minimum 200MB overhead per web/subweb. So now we've got 23GB. Our standard VM setup is 60GB disk, so I'm having to do some tidying up now.

Initially it looks like we only have ~1GB RES, so I hope this is just disk overhead (not RAM). We'll see how it goes..

I'm concerned that DB-per-web will kill off any wiki app that makes heavy use of subwebs, although I don't use that pattern atm.

But the good news: we can sort Lauries now :) Performance is about the same as the m3+ code we were on.

-- PaulHarvey - 20 Apr 2011

Also having trouble getting MongoDB to notice newly created webs - i.e. it seems I have to explicitly load a web before the DB gets created for it

Item10664

-- PaulHarvey - 20 Apr 2011

Also, it doesn't work if I load the new subweb by itself: it seems that SEARCHes can't see the new data unless I re-load the whole root web

-- PaulHarvey - 20 Apr 2011

MigrationScriptsContrib now has a script to load all insect names from AFD

-- PaulHarvey - 20 Apr 2011

I think --noprealloc --smallfiles can save the day. It has nothing to do with "chunk size" (I think that's a sharding thing); our massive disk usage is just the prealloc overhead.

-- PaulHarvey - 20 Apr 2011

Here are the results:

| Web | Size | Config |
| System | 209M | |
| Sandbox | 417M | |
| System | 65M | --noprealloc --smallfiles --directoryperdb |
| System + Sandbox | 97M | --noprealloc --smallfiles --directoryperdb |

Without the extra options, adding the Sandbox web after System cost 208MB.

With the extra options, the cost was 32MB.

So now instead of 20GB for 100 webs, we'll have an overhead of 3.2GB. Much better :) (will test on production tonight).
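
For anyone who wants to redo the projection with their own numbers, here is a rough Python sketch of the arithmetic. It assumes the 417M row is the total after adding Sandbox to System (which matches the 208MB figure), that every further web costs about the same as Sandbox did, and that 1GB = 1000MB; the function names are made up for illustration.

```python
# Sizes in MB, copied from the results table in this comment.
DEFAULT_BEFORE_MB, DEFAULT_AFTER_MB = 209, 417  # no extra options
TUNED_BEFORE_MB, TUNED_AFTER_MB = 65, 97        # --noprealloc --smallfiles --directoryperdb

WEB_COUNT = 100  # number of webs used for the projection


def per_web_cost_mb(before_mb, after_mb):
    """Disk growth caused by adding one more web (here: Sandbox)."""
    return after_mb - before_mb


def projected_overhead_gb(cost_mb, web_count):
    """Assumes each web costs the same; uses decimal GB (1000 MB)."""
    return cost_mb * web_count / 1000


default_cost = per_web_cost_mb(DEFAULT_BEFORE_MB, DEFAULT_AFTER_MB)
tuned_cost = per_web_cost_mb(TUNED_BEFORE_MB, TUNED_AFTER_MB)

print(f"default: {default_cost}MB per web, "
      f"{projected_overhead_gb(default_cost, WEB_COUNT):.1f}GB for {WEB_COUNT} webs")
print(f"tuned:   {tuned_cost}MB per web, "
      f"{projected_overhead_gb(tuned_cost, WEB_COUNT):.1f}GB for {WEB_COUNT} webs")
```

Real webs will differ in size, so this only estimates the fixed per-database overhead, not total disk usage.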

-- PaulHarvey - 20 Apr 2011

yes, I presumed that the disk size was mostly prealloc - mostly a tuning thing, though I have to admit that my test server only has an 80GB disk, so :}

-- SvenDowideit - 21 Apr 2011

Okay, a full re-load with directoryperdb, smallfiles & noprealloc on each mongod sees 6.2GB on replSet members a & b, but 5.4GB on member c. Member c is running Ubuntu 10.04 LTS whereas a & b are on Ubuntu 9.10; still, it's strange that the overhead differs. The mongod PRIMARY seems to stabilise at ~2.8GB RES memory usage.

-- PaulHarvey - 22 Apr 2011

This query is now taking ~4s at the mongodb end according to our timing headers (82k topics):
%SEARCH{"form.name='System.MigrationScriptsInsectsDemo.InsectsDemoForm'"
  type="query"
  web="System.MigrationScriptsInsectsDemo"
  pager="on"
  pagesize="10"
}%

However,
db.current.find({"FORM" : {"name" : "System.MigrationScriptsInsectsDemo.InsectsDemoForm"}}).explain()
{
   "cursor" : "BasicCursor",
   "nscanned" : 82888,
   "nscannedObjects" : 82888,
   "n" : 82872,
   "millis" : 165,
   "nYields" : 0,
   "nChunkSkips" : 0,
   "isMultiKey" : false,
   "indexOnly" : false,
   "indexBounds" : {
      
   }
}

That indicates ~165ms (0.165s), since "millis" is in milliseconds. What are the headers actually reporting on? Or does the foswiki query issue something more complex?

-- PaulHarvey - 23 Apr 2011

milestone 4 is now cooked, all it needs is testing and bug fixing, which should be handled in separate tasks - closing.

the header question above - can you put the headerinfo into a task?
  • the answer is that the MongoDB header entry is a list of time taken for query (in mongo+roundtrip), and
  • the DebugLog one is the measured time to run the entire perl code (variation due to where I can put hooks)
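
To illustrate why the two numbers differ, here is a minimal Python sketch (not the plugin's Perl code) of nested timers: an inner one around the query round trip and an outer one around the whole request. `timed`, `fake_query` and `handle_request` are hypothetical names, and the sleep merely stands in for the MongoDB call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the elapsed wall-clock seconds for the wrapped block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def fake_query():
    """Hypothetical stand-in for the database query plus network round trip."""
    time.sleep(0.01)
    return ["TopicA", "TopicB"]


def handle_request(results):
    # Outer timer: the entire request, comparable to the DebugLog figure.
    with timed("whole_request", results):
        # Inner timer: only the query, comparable to the MongoDB header entry.
        with timed("query_roundtrip", results):
            topics = fake_query()
        # Everything after the query (rendering etc.) only shows up in the outer timer.
        rendered = ", ".join(topics)
    return rendered


timings = {}
output = handle_request(timings)
print(output, timings)
```

The outer figure always includes the inner one, so a large gap between them points at work done outside the database call.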

and yes, it's not fastcgisafe - he says, looking for the task :P

-- SvenDowideit - 04 May 2011
 
Topic revision: r24 - 04 May 2011, SvenDowideit