Author: techfox9
Solr integration with Nutch ..
Thursday, August 20th, 2009 @ 2:29 pm
“requestHandler” notes for the solrconfig.xml file:
— The fields to highlight are listed here:
<str name="hl.fl">text features name</str>
— Per-field highlighting settings are defined here:
<str name="f.name.hl.alternateField">name</str>
<str name="f.name.hl.fragsize">0</str>
<str name="f.text.hl.fragmenter">regex</str>
— The alternate ‘nutch’ configuration is:
(See http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/)
— Fields to highlight:
<str name="hl.fl">title url content</str>
— Per-field highlighting settings:
<str name="f.content.hl.fragmenter">regex</str>
<str name="f.title.hl.alternateField">title</str>
<str name="f.title.hl.fragsize">0</str>
<str name="f.url.hl.alternateField">url</str>
<str name="f.url.hl.fragsize">0</str>
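For context, here is a rough sketch of how the 'nutch' settings might sit inside a requestHandler in solrconfig.xml. The handler name "/nutch" and the hl=true default are assumptions for illustration and do not come from the notes above; only the hl.fl and f.*.hl.* lines do.

```xml
<!-- Sketch only: the handler name "/nutch" and hl=true are assumptions -->
<requestHandler name="/nutch" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- turn highlighting on by default (assumed) -->
    <str name="hl">true</str>
    <!-- fields to highlight -->
    <str name="hl.fl">title url content</str>
    <!-- title and url: fragsize 0 uses the whole field value as the snippet;
         alternateField is the fallback when no highlighted snippet is produced -->
    <str name="f.title.hl.fragsize">0</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <!-- content: build snippets with the regex fragmenter -->
    <str name="f.content.hl.fragmenter">regex</str>
  </lst>
</requestHandler>
```

Putting the parameters under "defaults" means a query can still override any of them per request.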
— To map a parser to a file type:
— Map the mime type for the file to a parser plugin in conf/parse-plugins.xml.
— Define a new mime type for the file in conf/mime-types.xml.
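As an illustration of the mapping step, a conf/parse-plugins.xml entry might look roughly like the sketch below. The mime type "application/x-example", the plugin id "parse-example", and the class name are all hypothetical placeholders. The matching mime-types.xml entry is not shown because its format depends on the Nutch version.

```xml
<!-- conf/parse-plugins.xml (sketch; mime type, plugin id, and class name are hypothetical) -->
<parse-plugins>
  <!-- send documents of this mime type to the named parser plugin -->
  <mimeType name="application/x-example">
    <plugin id="parse-example" />
  </mimeType>

  <aliases>
    <!-- tie the plugin id to the extension that implements the parser -->
    <alias name="parse-example"
           extension-id="org.example.nutch.parse.ExampleParser" />
  </aliases>
</parse-plugins>
```

In an existing file, the mimeType and alias elements would be merged into the parse-plugins and aliases elements that are already there.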