Solr Deduplication (dedupe) giving all zeros in signatureField
I've followed the examples listed in the documentation here: http://wiki.apache.org/solr/deduplication , https://cwiki.apache.org/confluence/display/solr/de-duplication
However, when analyzing the results, every signatureField is returned like so: 0000000000000000
I can't seem to figure out why a unique signature isn't being generated.
Relevant config sections:
solrconfig.xml
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <!-- See below for information on defining updateRequestProcessorChains
       that can be used by name on each update request -->
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>
...
<!-- Deduplication
     An example dedup update processor that creates the "id" field on the fly
     based on the hash code of some other fields. This example has
     overwriteDupes set to false since we are using the id field as the
     signatureField and Solr will maintain uniqueness based on that anyway. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signatureField</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
schema.xml
<fields>
  <!-- Valid attributes for fields:
       name: mandatory - the name for the field
       type: mandatory - the name of a previously defined type from the
         <types> section
       indexed: true if this field should be indexed (searchable or sortable)
       stored: true if this field should be retrievable
       multiValued: true if this field may contain multiple values per document
       omitNorms: (expert) set to true to omit the norms associated with
         this field (this disables length normalization and index-time
         boosting for the field, and saves some memory). Only full-text
         fields or fields that need an index-time boost need norms.
         Norms are omitted for primitive (non-analyzed) types by default.
       termVectors: [false] set to true to store the term vector for a
         given field.
         When using MoreLikeThis, fields used for similarity should be
         stored for best performance.
       termPositions: Store position information with the term vector.
         This will increase storage costs.
       termOffsets: Store offset information with the term vector.
         This will increase storage costs.
       default: a value that should be used if no value is specified
         when adding a document.
  -->
  <field name="signatureField" type="string" stored="true" indexed="true" multiValued="false" />
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
  <field name="name" type="text_general" indexed="true" stored="true"/>
  <field name="alphaNameSort" type="alphaOnlySort" indexed="true" stored="false"/>
  <field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
  <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
  ... etc
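For reference, here is a minimal sketch of the kind of update document the dedupe chain above is configured to hash. It assumes documents are posted as XML to /update and that they actually carry the name, features, and cat fields listed in the chain's fields parameter. The id and all field values are hypothetical, made up for illustration. As far as I understand it, Lookup3Signature starts from a hash of zero and only changes it as configured fields are fed in, so an all-zero signature would be consistent with none of those three fields being present in the indexed documents, though that is an assumption and not something confirmed here.

```xml
<!-- Hypothetical test document; the id and all values are illustrative only. -->
<add>
  <doc>
    <field name="id">test-doc-1</field>
    <!-- The three fields below are the ones named in the chain's "fields" parameter. -->
    <field name="name">Example product</field>
    <field name="features">First example feature</field>
    <field name="features">Second example feature</field>
    <field name="cat">electronics</field>
  </doc>
</add>
```

If a document like this gets a non-zero signatureField while the real documents still get zeros, comparing the field names in the real documents against name, features, and cat would be a reasonable next check.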
I'm wondering if anyone can steer me in the right direction?