You are here: Foswiki>Tasks Web>Item9813 (06 Jul 2015, GeorgeClark)Edit Attach

Item9813: Inputset sorting order need more work

pencil
Priority: Urgent
Current State: Proposal Required
Released In: n/a
Target Release: major
Applies To: Engine
Component: SEARCH
Branches:
Reported By: SvenDowideit
Waiting For:
Last Change By: GeorgeClark
Looking at the Fn_SEARCH unit tests, that sounds like I changed the implicit ordering that as you say is documented as default(order) = 'Sort by topic name'.

I even left myself notes to look into it.

      $this->{test_topicObject}->expandMacros( $search.'order="editby" reverse="on" format="$topic ($wikiname)"}%');
#    $this->assert_str_equals( "QueryTopic (simon),QueryTopicTwo (admin),OkATopic (WikiGuest),OkBTopic (WikiGuest),OkTopic (WikiGuest),TestTopicSEARCH (WikiGuest),WebPreferences (WikiGuest),QueryTopicThree (Gerald)", $result );
     $this->assert_str_equals( "QueryTopic (simon),QueryTopicTwo (admin),OkTopic (WikiGuest),OkBTopic (WikiGuest),WebPreferences (WikiGuest),TestTopicSEARCH (WikiGuest),OkATopic (WikiGuest),QueryTopicThree (Gerald)", $result );
#TODO: why is this different from 1.0.x?

mind you, this only occurs if the user's requested order is not the topic - in the above eg, ordering by editorname..

not sure its urgent, just annoying.

-- SvenDowideit - 09 Oct 2010

Not at all sure I know what you are reporting here; as you know, I changed trunk to always respect alphanumeric ordering by topic name. I'm not sure that was the right decision, as the current "order by topic=" allows a sequence that isn't achievable any other way. I'm quite tempted to revert it and restore (and document) the "order by input set" functionality - but iff that set is a complete, unambiguous specification (no wildcards)

-- CrawfordCurrie - 23 Oct 2010

This is a tough call

  • I perfectly see the need to sort in the order given by the topic parameter
  • My intuitive thought of how the order feature works when you say order by topic is by alphanum of the topic name so I understand why Crawford changed the code.

I am sure may have been puzzled why the sorting by topic did not work when a list of topics was given. On the other hand I never noticed myself, probably because the situation that is most common is that the topic parameter is the result of a nested search that returns a comma separated list of topic names and there you will typically have the inner SEARCH already sorted by topic name. So in these common nested searches it does not really matter how it works. Same when the topic is given a wild card. How would we sort topic="Web*, MyTopic, Other*" ?

But in a multiple type search where the application builder wants to dig out some information from say 3 specific topics, he may very well want to control the order of the information.

So we really need both.

We need a solution that is compatible with 1.0 (not sorting if given list of topics) but sorting if you have wildcards in the topic="" parameter

I think is the best.

-- KennethLavrsen - 23 Oct 2010

the thing i'm reporting here, is that due to store implementation assumptions, we have sometimes had an implied sort="userspecified,topic", and something in the refactor changed that. However, I also noticed while developing stores over the years, that its hard (unlikely, and inefficient) for us to insist that the implied result order (or even requested order) is always identical across different OS's, store implementations and search implementations.

a simple example is the ordering that comes from the filesystem - OSX&Window's orders are subtley different due to not-quite case-sensitive-ness..

And then the more complex ones are due to different collations in databases, or different implementation decisions in the indexers.

So with the intention to leverage the fastest paths of whatever store/search backend a user chooses:

I think we have no choice - we have to be flexible wrt the 'correct' order - and forcing inputsets to be presorted by the application (foswiki) into alphanumeric isn't helpful for this.

Personally, I liked being able to use the inputset order, including with wildcard - but I am just as happy for that to require topic="Sandbox.Web* Main.User* System" order="inputset"

just be aware that inputset order is only a small part of the issue - I think we need to formally define the result order as being slightly flexible to allow for realities and speed.

-- SvenDowideit - 23 Oct 2010

What does the two of you propose short term for 1.1.1?

-- KennethLavrsen - 24 Oct 2010

This bug is the only one that prevents a 1.1.1 release as of 24 Oct 2010

-- KennethLavrsen - 24 Oct 2010

After talking to Sven last night I think we have mixed in other things in the discussion above.

In a 1.1.1 context the question is if it is OK that the sequence of results has changed in the unit test?

Is that a new bug, or is it a bug that was fixed, or is it a bug in the unit test?

All the new stuff you are talking about Crawford is in trunk and checked in after we branched off 1.1.

-- KennethLavrsen - 25 Oct 2010

AFAIK the sequence has not changed between 1.0.10 and 1.1.1 - I only checked code into 2.0 (trunk) and as I said above, am seriously considering reverting it. I don't think this task should not block 1.1.1 - at least, not until we have agreed a strategy.

Sven, you are basically saying the the ordering of results is non-deterministic unless you explicitly specify an ordering, because the OS delivers results in different orders and the fastest path is to accept whatever ordering the OS suggests.

At the moment, in the 1.1 code, if you don't explicitly ask for an ordering, you get whatever the underlying store returns. In the case of the default search algorithm, that is majorkey=(web parameter | alphanumeric) minorkey=(topic parameter | alphanumeric) where alphanumeric is different on different configurations (e.g. selecting utf8 causes the collation order to change). in the case of SQL, it's ORDER BY (web, topic) - which in itself is susceptible to variation according to the collation sequence of the underlying RDBMS.

So, what options do we have?
  1. Keep things as they are, document the results from the underlying algorithm as unordered (and treat them as such in the code and tests).
    • Only those things explicitly ordered are sorted.
  2. Normalise the reporting order, and compromise the fastpath.
    • Note that this is potentially very bad; you can't normalise the sort order until you have all the results, which makes incremental searching and map/reduce much less achievable.
  3. Make the fastpath/normalised tradeoff explicit (e.g. in a configuration option).
    • We already have too many obscure options.
  4. Make the default non-deterministic, but only if no ordering was otherwise specified
    • i.e. if I express one sort key, then all orderings must be normalised. Thus if I order by web, then that always implies alphanumeric ordering of topics. If I order by topic, then that implicitly orders by web (i.e. sorts on web.topic name).
If we elect to go with any type of non-determinism, then we must eliminate the assumption in the code that the underlying store delivers results in a specific order. I have already had to code up a partition of the SQL results into webs in order to be consistent with the default store, which signals that there is an ordering assumption somewhere.

Right now, my personal inclination is to go with (1)
  • Clearly document that the sort order is non-deterministic for any key where you have not explicitly dictated an ordering
  • Fix the unit tests to reflect this
  • Fix the code to ensure that no ordering assumptions are exported from the search algorithm

-- CrawfordCurrie - 25 Oct 2010

Good. Agree. I will change this to normal and defer it to 2.0 so we have time to think. A quick fix in this area could ruin more than it fixes.

-- KennethLavrsen - 25 Oct 2010

Changing this to needs proposal.

-- Main.GeorgeClark - 06 Jul 2015 - 03:28

 

ItemTemplate edit

Summary Inputset sorting order need more work
ReportedBy SvenDowideit
Codebase 1.1.0, 1.1.0 beta1
SVN Range
AppliesTo Engine
Component SEARCH
Priority Urgent
CurrentState Proposal Required
WaitingFor
Checkins
TargetRelease major
ReleasedIn n/a
CheckinsOnBranches
trunkCheckins
masterCheckins
ItemBranchCheckins
Release01x01Checkins
Topic revision: r12 - 06 Jul 2015, GeorgeClark
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement. Creative Commons License    Legal Imprint    Privacy Policy