highlight - Solr PatternReplaceFilterFactory gives unmatching values


I set up a field type configured as follows:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="#(\w+)" replacement="htag.$1 $1"/>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="#(\w+)" replacement="htag.$1"/>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

My aim is to index both words and #words, so that #usopen is indexed as both usopen and #usopen.

The query parameters, in addition to the usual ones, include these highlighting parameters: hl.fl=text&hl.fragsize=0&hl.simple.pre=<tag>&hl.simple.post=</tag>&hl.requireFieldMatch=true.

When I query usopen, the highlighted text value is shown as #usope<tag>n</tag>, and when I query #usopen, the text value is shown as <tag>usope</tag>n.

What is the issue in the above configuration, and how can I fix it?

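The charFilter changes the offsets in the indexed string, so they no longer match the original text. The highlighter applies those offsets to the stored original value, which is why the tags land in the wrong place.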

Try using a token filter instead (PatternReplaceFilterFactory).

Also, it may make more sense to normalize #something => something both during indexing and searching, rather than trying to keep both forms. As long as it matches, that is probably what you care about.
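
A minimal sketch of that suggestion, under some assumptions of mine: the field type name text_hashtag is hypothetical, and I swapped in WhitespaceTokenizerFactory because, as far as I know, the UAX29 tokenizer drops the leading # before any token filter could see it. The stop and common-grams filters are left out for brevity. The idea is that a token filter rewrites only the token text, while the token's start and end offsets still point at #usopen in the original string.

```xml
<!-- Sketch only: "text_hashtag" is a hypothetical name; the tokenizer swap is an assumption -->
<fieldType name="text_hashtag" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer, so index time and query time normalize the same way -->
  <analyzer>
    <!-- Split on whitespace so the leading # survives tokenization -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Strip the # from the token text; the token offsets are left untouched -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^#(\w+)$" replacement="$1" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a setup like this, both usopen and #usopen should reduce to the same term, and the highlight should wrap the whole original token. The trade-off is that whitespace tokenization keeps trailing punctuation attached to tokens (for example "usopen,"), so you may need further filters for that. Any change to the index analyzer requires reindexing.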

