This project provides a collection of Apache Solr plugins that collaborate with NLP4L/framework.
This project uses Maven. To build it, run:
$ mvn package
Copy the project jar file to the lib directory of the Solr webapp, which is usually ${solr_install_dir}/server/solr-webapp/webapp/WEB-INF/lib.
$ cp target/nlp4l-solr-VERSION.jar ${solr_install_dir}/server/solr-webapp/webapp/WEB-INF/lib
This project also requires the Typesafe config library. Copy its jar file to the same directory.
$ cp config-1.3.1.jar ${solr_install_dir}/server/solr-webapp/webapp/WEB-INF/lib
The fileReceiver servlet needs to be registered in web.xml.
<servlet>
  <servlet-name>fileReceiver</servlet-name>
  <servlet-class>org.nlp4l.solr.servlet.FileReceiver</servlet-class>
  <init-param>
    <param-name>root_path</param-name>
    <param-value>/path/to/solr_core</param-value>
  </init-param>
</servlet>
<servlet-mapping>
  <servlet-name>fileReceiver</servlet-name>
  <url-pattern>/nlp4l/receive/file</url-pattern>
</servlet-mapping>
FeaturesRequestHandler needs to be registered in solrconfig.xml.
<requestHandler name="/features" class="org.nlp4l.solr.ltr.FeaturesRequestHandler" startup="lazy">
  <lst name="defaults">
    <str name="conf">ltr_features.conf</str>
  </lst>
</requestHandler>
Here, ltr_features.conf is located in the same directory as solrconfig.xml. Its structure looks like this:
{
  "features": [
    {
      "name": "TF in title",
      "class": "org.nlp4l.solr.ltr.FieldFeatureTFExtractorFactory",
      "params": { "field": "title" }
    },
    {
      "name": "TF in body",
      "class": "org.nlp4l.solr.ltr.FieldFeatureTFExtractorFactory",
      "params": { "field": "body" }
    },
    {
      "name": "IDF in title",
      "class": "org.nlp4l.solr.ltr.FieldFeatureIDFExtractorFactory",
      "params": { "field": "title" }
    },
    {
      "name": "IDF in body",
      "class": "org.nlp4l.solr.ltr.FieldFeatureIDFExtractorFactory",
      "params": { "field": "body" }
    },
    {
      "name": "TF*IDF in title",
      "class": "org.nlp4l.solr.ltr.FieldFeatureTFIDFExtractorFactory",
      "params": { "field": "title" }
    },
    {
      "name": "TF*IDF in body",
      "class": "org.nlp4l.solr.ltr.FieldFeatureTFIDFExtractorFactory",
      "params": { "field": "body" }
    }
  ]
}
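Before deploying a features file, it can help to check that every entry has the keys used in the sample above. The following is a minimal sketch, not part of this project: `check_features` is a hypothetical helper, it assumes the file is strict JSON (as the sample is), and the requirement that `params` contains `field` is inferred from the sample only, so other extractor factories may need different parameters.

```python
import json

# Keys that every feature entry carries in the sample configuration.
REQUIRED_KEYS = ("name", "class", "params")


def check_features(conf_text):
    """Return a list of problems found in a features config given as JSON text."""
    conf = json.loads(conf_text)
    features = conf.get("features") if isinstance(conf, dict) else None
    if not isinstance(features, list) or not features:
        return ['top-level "features" must be a non-empty list']

    problems = []
    for i, feature in enumerate(features):
        if not isinstance(feature, dict):
            problems.append(f"feature #{i}: must be an object")
            continue
        for key in REQUIRED_KEYS:
            if key not in feature:
                problems.append(f"feature #{i}: missing {key!r}")
        # Assumption drawn from the sample: field-based extractors need "field".
        params = feature.get("params")
        if not isinstance(params, dict) or "field" not in params:
            problems.append(f"feature #{i}: params has no 'field'")
    return problems


SAMPLE = """
{
  "features": [
    {
      "name": "TF in title",
      "class": "org.nlp4l.solr.ltr.FieldFeatureTFExtractorFactory",
      "params": { "field": "title" }
    },
    {
      "name": "IDF in body",
      "class": "org.nlp4l.solr.ltr.FieldFeatureIDFExtractorFactory",
      "params": { "field": "body" }
    }
  ]
}
"""

# An empty list means no problems were found.
print(check_features(SAMPLE))
```

The helper only inspects the structure. It does not verify that the named classes exist or that the fields are defined in the Solr schema.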
To start feature extraction, POST a query file to the handler with command=extract:
$ curl -X POST -H "Content-type: text/json" -d @examples/ltr-queries.json "http://localhost:8983/solr/collection1/features?command=extract"
The response contains a procId, which you use to check the progress and download the result.
Using procId, access the following URL to see the progress:
http://localhost:8983/solr/collection1/features?command=progress&procId=<proc id>
Using procId, access the following URL to download the result:
http://localhost:8983/solr/collection1/features?command=download&procId=<proc id>
The extraction produces a temporary file, so the client that requested the extraction should delete it when it is no longer needed:
http://localhost:8983/solr/collection1/features?command=delete&procId=<proc id>
Alternatively, you can delete it as part of the download by adding the optional parameter "delete=true":
http://localhost:8983/solr/collection1/features?command=download&procId=<proc id>&delete=true
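The progress, download and delete URLs above differ only in their query parameters, so a client can build them from one helper. This is a small sketch under assumptions: `features_url` is a hypothetical name, the base URL is copied from the examples above, and the procId value shown is a placeholder because its real format comes from the extract response.

```python
from urllib.parse import urlencode

# Base URL taken from the examples above; adjust host, port and core name.
BASE = "http://localhost:8983/solr/collection1/features"


def features_url(command, proc_id, delete=False, base=BASE):
    """Build a /features URL for the progress, download or delete command."""
    params = {"command": command, "procId": proc_id}
    if delete:
        # Optional parameter described above; intended for command=download.
        params["delete"] = "true"
    return base + "?" + urlencode(params)


# "12345" is a placeholder procId, not a value returned by a real server.
print(features_url("progress", "12345"))
print(features_url("download", "12345", delete=True))
print(features_url("delete", "12345"))
```

The sketch only builds URLs. Fetching them, for example with `urllib.request.urlopen`, and interpreting the responses is left out because the response format is not described here.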
When you use RankingSVM, register LinearWeightQParserPlugin in solrconfig.xml:
<queryParser name="linearWeight" class="org.nlp4l.solr.ltr.LinearWeightQParserPlugin">
  <lst name="settings">
    <str name="features">collection1/conf/ltr_features.conf</str>
    <str name="model">collection1/conf/linearweight_model.conf</str>
  </lst>
</queryParser>
When you use PRank, register PRankQParserPlugin in solrconfig.xml:
<queryParser name="prank" class="org.nlp4l.solr.ltr.PRankQParserPlugin">
  <lst name="settings">
    <str name="features">collection1/conf/ltr_features.conf</str>
    <str name="model">collection1/conf/prank_model.conf</str>
  </lst>
</queryParser>
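To rerank search results with a registered parser, reference it in the reRankQuery. With LinearWeightQParserPlugin:
http://localhost:8983/solr/collection1/select?indent=on&q=your_query&rq={!rerank reRankQuery=$rqq}&rqq={!linearWeight}your_query&debugQuery=on
With PRankQParserPlugin:
http://localhost:8983/solr/collection1/select?indent=on&q=your_query&rq={!rerank reRankQuery=$rqq}&rqq={!prank}your_query&debugQuery=on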
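The rerank URLs above contain characters such as `{`, `}`, `$` and spaces, which need percent-encoding when sent from a program rather than typed into a browser. The following is a rough sketch of one way to build them: `rerank_url` is a hypothetical helper, and the parser names are the ones registered in the queryParser examples above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Select handler URL taken from the examples above; adjust to your setup.
SELECT = "http://localhost:8983/solr/collection1/select"


def rerank_url(query, parser, base=SELECT):
    """Build a select URL that reranks `query` with the named query parser."""
    params = [
        ("indent", "on"),
        ("q", query),
        # Solr's rerank parser reads the actual rerank query from $rqq.
        ("rq", "{!rerank reRankQuery=$rqq}"),
        ("rqq", "{!" + parser + "}" + query),
        ("debugQuery", "on"),
    ]
    # urlencode percent-encodes the local-params syntax and the query text.
    return base + "?" + urlencode(params)


url = rerank_url("your_query", "linearWeight")
print(url)

# Decoding the query string recovers the original local-params expression.
print(parse_qs(urlsplit(url).query)["rqq"][0])
```

Passing "prank" instead of "linearWeight" produces the PRank variant of the same request.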