This question about Developing extensions (plugins skins etc.): Asked

Solr WordDelimiterFilterFactory splitOnNumerics ?

We are using Solr 5.5.5 and SolrPlugin 7.30 on the latest Foswiki.

It seems that the Solr WordDelimiterFilterFactory default setting splitOnNumerics=1 leads to poor search results, at least in our context.

For example, when searching for acc09 (a well-known abbreviation in our context), the top-ranked results are .xlsx attachments that have acc and 09 as separate parts somewhere in the filename; the documents themselves do not contain acc09.

On the other hand, when searching for the quoted phrase "acc09" or using the topic facet, we get many topics that contain acc09 exactly (the expected result).

Two questions:
  1. Why are topics that contain the exact term multiple times not ranked at the top?
  2. Would it not be more intuitive to have splitOnNumerics=0 as the default?
Reference: http://lucene.apache.org/core/5_5_5/analyzers-common/org/apache/lucene/analysis/miscellaneous/WordDelimiterFilterFactory.html

I think I will adjust foswiki_configs/conf/schema.xml and use splitOnNumerics=0.
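
For illustration, the change might look roughly like the following. This is only a sketch: the field type name text_example, the tokenizer, and all filter attributes other than splitOnNumerics are assumptions and are not copied from the schema.xml shipped with SolrPlugin, so the real analyzer chain will differ.

```xml
<!-- Hypothetical excerpt: the field type name and surrounding analyzer chain
     are placeholders, not the actual SolrPlugin schema. -->
<fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splitOnNumerics="0" keeps "acc09" as a single token
         instead of splitting it into "acc" and "09" -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            splitOnCaseChange="1"
            splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Since the filter also runs at index time, existing content would presumably need to be reindexed before the change shows up in search results.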

-- UlrichLeodolter - 04 Dec 2019

Could you report back whether this setting gives better results? If so, I'll integrate it into the next release. Thanks.

-- MichaelDaum - 04 Dec 2019
 

QuestionForm

Subject Developing extensions (plugins skins etc.)
Extension SolrPlugin
Version Foswiki 2.1.6
Status Asked
Topic revision: r2 - 04 Dec 2019, MichaelDaum