GSoC at Mahout
GSoC 2009 is about to finish: final evaluations are through, most of the code submitted by Mahout’s students has been
committed to svn, and code samples are on their way to Google.
In Mahout, we had three students joining the project:
Robin worked on an HBase-based Naive Bayes extension and on frequent itemset discovery, David contributed a
distributed LDA implementation, and Deneche worked on a Random Forest implementation. All three of them have done
great work this summer, contributing not only code but also valuable input on the project’s mailing lists. As
a result, all three have been given committer status by the end of GSoC.
Apart from three new additions to the code base, the summer also brought quite a bit of traffic to the user list - not
only in terms of subscriptions but also in terms of developers contributing to the discussions online. Currently, it
looks like the project is really gaining momentum, as also noted in Grant Ingersoll’s post.
Discussions on the dev list about the future road map of Mahout clearly showed that the
developers share the vision of a scalable, potentially distributed, stable machine learning library, and that the focus
should be on production-ready code under a commercially friendly license rather than bleeding-edge research
implementations. Last but not least, the goal is to build a lively, diverse community around the project to guarantee
further development and user support.
2009 brought quite a few talks on the topic of Mahout in both Germany and the US (besides all the events on Hadoop,
scalable databases and cloud computing in general), with an ApacheCon US talk introducing Mahout in Oakland
still to come.
Yesterday, a great article introducing Apache Mahout with hands-on examples was published on IBM developerWorks by Grant Ingersoll.
Check it out if you want to learn more about Mahout and machine learning in general.