BigDataCon


Several years ago, together with Uwe Schindler, I published a series of
articles on Apache Lucene in Software and Support Media's Java Mag. Earlier this
year S&S kindly invited me to their BigDataCon, co-located with JAX, to give a
talk of my choosing that at least touches upon Lucene.


After going back and forth on which topic to cover, I settled on a talk about
how easy text classification with Mahout becomes when you rely on Apache Lucene
for text analysis, tokenisation and token filtering. Essentially all the classes
needed to integrate Lucene Analyzers with Mahout vector generation are already
in place. That vector generation step is needed e.g. as pre-processing for
classification or text clustering.
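To give a rough idea of what that pre-processing step does, here is a minimal sketch in plain Java. It deliberately uses neither the Lucene nor the Mahout API: the class name `TextVectorSketch`, the tiny stop-word list and the vector size are all assumptions made up for illustration. The `analyze` method stands in for what a Lucene Analyzer would do (tokenise, lowercase, filter stop words), and `vectorize` stands in for Mahout's vector generation by hashing each token into a fixed-size term-frequency vector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative stand-in only: not the Lucene Analyzer or Mahout encoder API.
final class TextVectorSketch {

    // Hypothetical, very small stop-word list; a real analyzer ships a larger one.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to"));

    // Tokenise on non-letter/non-digit characters, lowercase, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Hash every token into one of `cardinality` slots and count occurrences.
    // Distinct tokens may collide in the same slot; that is inherent to hashing.
    static double[] vectorize(String text, int cardinality) {
        double[] vector = new double[cardinality];
        for (String token : analyze(text)) {
            int index = Math.floorMod(token.hashCode(), cardinality);
            vector[index] += 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox and the lazy dog";
        System.out.println(analyze(text));
        System.out.println(Arrays.toString(vectorize(text, 16)));
    }
}
```

The resulting array is the kind of fixed-length numeric input a classifier or clustering algorithm can work with. In the real setup the analysis chain would come from Lucene and the vector type from Mahout.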


Feel free to check out some of my sandbox code over at <a
href="http://github.org/MaineC/sofia">github</a>.


After attending the conference I can only recommend that everyone interested in
Java programming and able to understand German buy a ticket. It's really well
executed: a great selection of talks (though the sponsored keynotes usually
aren't particularly interesting), tasty meals, and interesting people to chat
with.