Item11886: Solr Query too many Results

Priority: Normal
Current State: Closed
Released In: n/a
Target Release: n/a
Applies To: Extension
Component: SolrPlugin
Branches:
Reported By: OliverSchaub
Waiting For: OliverSchaub
Last Change By: OliverSchaub
A search for a username (tschedanoff) returns over 1300 results, but only four topics actually contain this name. All the other hits found by SolrPlugin are ALL the topics of one particular web. Three of the four correct hits are in this web; the fourth is his user topic.
This is a fairly new user, so he has not left many footprints yet.

When we fire the same query directly at Solr (without involving the wiki or SolrPlugin), the correct 4 topics are found. So the index itself seems to be OK.

We have SolrPlugin 1.10 installed and run a delta index every 5 minutes and a full index every weekend. The bug has been consistent for over 5 days (across several delta runs and one full index run).

-- OliverSchaub - 22 May 2012

This is 99.99% certain a configuration issue.

You must check the following things:

  • which query type are you using in both cases: standard vs dismax vs edismax
  • how are your query handlers configured in solrconfig.xml, i.e. which fields are queried, how is the search query processed?
  • which group is that new user in?
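To check the first point, one way is to send the same query to Solr once per query parser and compare the hit counts. The sketch below is a minimal, hypothetical helper, not part of SolrPlugin: it assumes Solr is reachable at `SOLR_URL` and that the fields in `QF` exist in your schema; adjust both to your installation. It uses the standard Solr request parameters `q`, `defType`, `qf`, `rows` and `wt`.

```python
"""Compare hit counts for one query across Solr query parsers.

Assumptions (hypothetical, adjust to your setup): Solr runs at
SOLR_URL, and the field names in QF exist in your schema.
"""
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_URL = "http://localhost:8080/solr"  # hypothetical base URL
QF = "title^5 text"  # hypothetical field list for the dismax parsers


def build_select_url(query, def_type, base=SOLR_URL, qf=QF):
    """Return a /select URL that asks only for the hit count (rows=0)."""
    params = {"q": query, "defType": def_type, "rows": 0, "wt": "json"}
    if def_type in ("dismax", "edismax"):
        # qf is a dismax/edismax parameter; the standard (lucene) parser ignores it
        params["qf"] = qf
    return base + "/select?" + urlencode(params)


def num_found(url):
    """Fetch the URL and return response.numFound from the JSON reply."""
    with urlopen(url) as reply:
        return json.load(reply)["response"]["numFound"]
```

For example, calling `num_found(build_select_url("tsched", p))` for `p` in `"lucene"`, `"dismax"` and `"edismax"` shows whether the large hit count only appears with one of the dismax parsers.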

Please check the Solr documentation for more info.

-- MichaelDaum - 22 May 2012

We actually use SolrPlugin "out of the box", with the included Solr WAR file and its XML files. We haven't changed any of these files except for a UTF-8 patch in the servers.xml file.

-- OliverSchaub - 23 May 2012

We still have this issue. I could narrow it down to the string "tsched", which is "found" in over 1800 topics but actually appears in NONE of them. As mentioned above, Solr itself (without SolrPlugin) does NOT return these results; they appear only in combination with the wiki/SolrPlugin.

-- OliverSchaub - 22 Aug 2012

When you tested "Solr itself", you were most probably using the standard query handler, which processes the query differently from the dismax and edismax query handlers, which are also "Solr itself". The standard query handler expects a valid Lucene query in proper Lucene syntax. The two dismax handlers are much more forgiving and have many more configuration options: they can take different properties of Lucene documents into account when computing the hit set (phonetic, substring, or whatever you configure), and then weight each property when computing the hit score per document. See http://wiki.apache.org/solr/ExtendedDisMax for more documentation.
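The effect of querying several fields at once can be illustrated with a toy model of a disjunction-max query: a document is a hit if ANY queried field matches, and its score is the best (boosted) field score plus a tie-breaker fraction of the others. This is a simplified sketch, not Lucene's actual scoring code; the field names (`title`, `text`, `substr`) and the per-field scores are made up for illustration.

```python
"""Toy illustration of how a DisMax-style query combines per-field scores.

This is a simplified model, not Lucene's actual scoring: the per-field
scores are made-up numbers standing in for what each field's analyzer
(exact, phonetic, substring, ...) would contribute.
"""


def dismax_score(field_scores, boosts, tie=0.0):
    """Combine per-field scores the way a disjunction-max query does.

    field_scores: field name -> raw score for this document (0 = no match)
    boosts: field name -> qf boost (fields missing from boosts are ignored)
    tie: tie-breaker; 0.0 means "best field only", 1.0 means "sum of fields"
    """
    weighted = [field_scores.get(f, 0.0) * b for f, b in boosts.items()]
    if not weighted:
        return 0.0
    best = max(weighted)
    # The best field counts fully; every other field adds tie * its score
    return best + tie * (sum(weighted) - best)


def is_hit(field_scores, boosts):
    """A document is in the hit set if ANY queried field matches."""
    return any(field_scores.get(f, 0.0) > 0 for f in boosts)
```

With hypothetical boosts `{"title": 5.0, "text": 1.0, "substr": 0.5}`, a document that matches only in a loose `substr` field still counts as a hit, just with a low score (0.4 for a raw score of 0.8). This is why adding a lenient field to the queried fields can enlarge the hit set even though the top-ranked results still look sensible.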

SolrPlugin has been configured to make a reasonable choice out of the box, as it is not always easy to balance the parameters so that results make sense to your users.

I've experimented with these settings in solrconfig.xml a couple of times, as other people like you have been getting unexpected results. The best I could come up with is now part of SolrPlugin as checked in to svn. Please have a look at SolrPlugin in http://svn.foswiki.org/trunk/SolrPlugin/. I'd strongly recommend using this instead of the one on http://foswiki.org/Extensions/SolrPlugin, as there are lots of fixes in there.

Furthermore, you are strongly encouraged not to use the old Solr binaries attached there. Instead, use the official Solr distribution available at http://lucene.apache.org/solr/ (currently shipping version 3.6.1).

From that point, please try your test query again. Also have a look at the way the edismax handler is configured in solrconfig.xml and try optimizing it. I'd be very interested in any findings that lead to better parameters there.
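When tuning the edismax handler, it can help to try several parameter combinations against the same test query before editing solrconfig.xml. Request parameters normally override the values in a handler's `defaults` section (but not its `invariants`), so candidates can be tested per request. The sketch below only generates the candidate parameter sets; the `qf` and `mm` values, including the `substr` field, are hypothetical and should be replaced with your own fields and settings.

```python
"""Enumerate candidate edismax parameter sets to test against one query.

The qf and mm values below are hypothetical examples; replace them with
the fields and settings from your own solrconfig.xml.
"""
from itertools import product
from urllib.parse import urlencode

QF_CANDIDATES = ["title^5 text", "title^5 text^2 substr^0.5"]  # hypothetical
MM_CANDIDATES = ["100%", "2<75%"]  # minimum-should-match variants


def candidate_params(query, qf_options=QF_CANDIDATES, mm_options=MM_CANDIDATES):
    """Yield one edismax parameter dict per (qf, mm) combination."""
    for qf, mm in product(qf_options, mm_options):
        yield {"q": query, "defType": "edismax", "qf": qf, "mm": mm,
               "rows": 0, "wt": "json"}


def to_query_string(params):
    """Encode a parameter dict for a Solr /select request."""
    return urlencode(params)
```

Each encoded string can be appended to your Solr `/select?` URL; comparing `numFound` across the combinations shows which fields or `mm` settings let unrelated topics into the hit set.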

-- MichaelDaum - 22 Aug 2012

This error seems to have disappeared since our upgrade to the latest SolrPlugin.

-- OliverSchaub - 20 Sep 2013
 

ItemTemplate

Summary Solr Query too many Results
ReportedBy OliverSchaub
Codebase 1.1.5
SVN Range
AppliesTo Extension
Component SolrPlugin
Priority Normal
CurrentState Closed
WaitingFor OliverSchaub
Checkins
TargetRelease n/a
ReleasedIn n/a
CheckinsOnBranches
trunkCheckins
Release01x01Checkins
Topic revision: r7 - 20 Sep 2013, OliverSchaub
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere.