Mahout in Action

2010-11-30 21:07
Flying to Atlanta I finally had a few hours of time to finalize the review of the Mahout in Action MEAP edition. The book is intended for potential users of the Apache Mahout, a project focussing on implementing scalable algorithms for machine learning.
Describing machine learning algorithms and their application to practioners is a non-trivial task: Usually there is more than one algorithm available for seemingly identically problem settings. In addition each algorithm usually comes with multiple parameters for fine-tuning its behaviour to the problem setting at hand.
Sean Owen does an awesome job explaining the basic concepts behind building recommender systems in that book. In a very intuitive way he highlights the properties of each algorithm and its options. Based on one example setting taken from a real world problem (parents buying music Cds for their children based on more or less background information) he highlights the properties of each available recommender algorithm.
The second section of the book highlights available implementations for clustering documents, that is grouping documents by similarity – a problem that is very common when it comes to grouping texts into topics and detecting upcoming new topics in a stream of publications. Robin Anil and Ted Dunning make it very easy to understand what clustering is all about, explain how to use, configure and use the current implementations in Mahout in various practical settings.
The book looks very promising. It is well suited for engineers looking for an explanation of how to successfully use Mahout to solve real world problems. In contrast to existing publications it makes it easy to grasp the basic concepts event without wading through complicated computations. The book is specially targeted to Mahout users. However it does give important background information on the algorithms available that is needed to decide on exactly which implementation and which configuration to use. Looking forward to the last section on classification algorithms.