Mahout in Action
While flying to Atlanta, I finally had a few hours to finalize my review of the Mahout in Action MEAP edition. The
book is intended for potential users of Apache Mahout, a project focussing on implementing scalable algorithms for
machine learning.
Describing machine learning algorithms and their application to practitioners is a non-trivial task:
Usually there is more than one algorithm available for seemingly identical problem settings. In addition, each
algorithm usually comes with multiple parameters for fine-tuning its behaviour to the problem setting at hand.
Sean Owen does an awesome job explaining the basic concepts behind building recommender systems in this book. Using
one example setting taken from a real-world problem (parents buying music CDs for their children, based on more or
less background information), he explains in a very intuitive way the properties of each available recommender
algorithm and its options.
The second section of the book covers the
available implementations for clustering documents, that is, grouping documents by similarity. This problem is very
common when it comes to grouping texts into topics and detecting emerging topics in a stream of publications. Robin
Anil and Ted Dunning make it very easy to understand what clustering is all about, and explain how to configure and
use the current implementations in Mahout in various practical settings.
The book looks very promising. It is well
suited for engineers looking for an explanation of how to successfully use Mahout to solve real-world problems. In
contrast to existing publications, it makes it easy to grasp the basic concepts even without wading through complicated
computations. The book is specifically targeted at Mahout users. However, it also gives the background information on
the available algorithms that is needed to decide exactly which implementation and which configuration to use.
Looking forward to the last section on classification algorithms.