Solr highlighting: PatternReplaceCharFilterFactory gives mismatched highlight values
I set up a field type configured as follows:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="#(\w+)" replacement="htag.$1 $1"/>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="#(\w+)" replacement="htag.$1"/>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
My aim is to index both plain words and #words. For example, #usopen should be indexed as both usopen and #usopen.
The query parameters, in addition to the query itself, include these hl parameters: hl.fl=text&hl.fragsize=0&hl.simple.pre=<tag>&hl.simple.post=</tag>&hl.requireFieldMatch=true.
When I query usopen, the highlighted text value is shown as #usope<tag>n</tag>, and when I query #usopen, the text value is shown as <tag>usope</tag>n.
What is the issue in the above configuration? Please help me fix it.
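The charFilter rewrites the text before tokenization, so the token offsets computed on the rewritten string do not match the original text. The highlighter applies those offsets to the original stored value, which is why the tags land in the wrong place.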
Try using a token filter instead (PatternReplaceFilterFactory).
Also, it may make more sense to normalize #something => something both during indexing and searching, and not try to keep both forms. As long as it matches, that is probably all you care about.
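A minimal sketch of that suggestion follows. It is not a tested drop-in replacement. The field type name text_hashtag is hypothetical, and the switch to WhitespaceTokenizerFactory is my assumption: a standard-style tokenizer such as UAX29URLEmailTokenizerFactory treats # as punctuation and drops it, so a token filter placed after it would never see the # at all. A single analyzer is used for both indexing and querying, so #usopen and usopen normalize to the same term.

```xml
<!-- Hypothetical field type name; a sketch under the assumptions stated above -->
<fieldType name="text_hashtag" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Assumption: split on whitespace only, so "#usopen" reaches the filter as one token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Strip a leading '#' from the token text; the token's offsets still refer to the original input -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^#" replacement="" replace="first"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because a token filter changes only the term text and leaves the start and end offsets alone, the highlighter should wrap the whole original token, including the #. The trade-off of the whitespace tokenizer is that trailing punctuation (for example "usopen,") stays attached to the token unless further filters remove it.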