nutch

ApacheConEU - part 07

November 16, 2012
nutch, Apache Con, ApacheCon, Lucene, gora, apacheconeu, Solr

Julien Nioche shared some details on the Nutch crawler. Being the mother of all Hadoop projects (as in Hadoop was born out of developments inside of Nutch), the project has become rather quiet, with a steady stream of development in the recent past. Julien himself uses Nutch to gather crawled data for several customer projects, feeding this data into an NLP pipeline based on Behemoth that glues Mahout, UIMA and GATE together. ...