Solr integration with Nutch
Aug 20, 2009 in Search, nutch, solr
“requestHandler” notes for the solrconfig.xml file:
– The fields to highlight are listed in hl.fl:
<str name="hl.fl">text features name</str>
– Per-field highlighting options are set with f.<field>.hl.* parameters:
<str name="f.name.hl.alternateField">name</str>
<str name="f.name.hl.fragsize">0</str>
<str name="f.text.hl.fragmenter">regex</str>
– The alternate ‘nutch’ configuration is:
(See http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/)
– Fields:
<str name="hl.fl">title url content</str>
– Field values:
<str name="f.content.hl.fragmenter">regex</str>
<str name="f.title.hl.alternateField">title</str>
<str name="f.title.hl.fragsize">0</str>
<str name="f.url.hl.alternateField">url</str>
<str name="f.url.hl.fragsize">0</str>
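For context, here is a rough sketch of how those 'nutch' highlighting parameters might sit inside a requestHandler in solrconfig.xml. The handler name "/nutch" and the class solr.SearchHandler are assumptions for illustration, not taken from my actual config; only the hl.* entries come from the notes above.

```xml
<!-- Sketch only: the handler name and class are assumed. -->
<requestHandler name="/nutch" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Fields to highlight -->
    <str name="hl.fl">title url content</str>
    <!-- content: use the regex fragmenter -->
    <str name="f.content.hl.fragmenter">regex</str>
    <!-- title and url: fragsize 0 highlights the whole field value;
         alternateField is returned when no snippet can be generated -->
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.url.hl.fragsize">0</str>
  </lst>
</requestHandler>
```

Note that these settings only take effect when highlighting is switched on, either by passing hl=true with the request or by adding it to the defaults.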
– To map a parser to a file type:
– Map the file's mime type to a parser plugin in conf/parse-plugins.xml.
– Define the mime type in conf/mime-types.xml if it is new.
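As a rough illustration of the mapping step, an entry in conf/parse-plugins.xml could look something like the sketch below. The mime type "application/x-example", the plugin id "parse-example" and the parser class name are all made up for the example; substitute the real values for your file type and plugin.

```xml
<!-- Sketch only: mime type, plugin id and class are hypothetical. -->
<parse-plugins>
  <!-- Route documents of this mime type to the named parser plugin -->
  <mimeType name="application/x-example">
    <plugin id="parse-example" />
  </mimeType>

  <!-- Tie the plugin id to the extension that implements the parser -->
  <aliases>
    <alias name="parse-example"
           extension-id="org.example.nutch.parse.ExampleParser" />
  </aliases>
</parse-plugins>
```

The plugin also has to be enabled through plugin.includes in the Nutch configuration, otherwise the mapping has nothing to resolve to.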
