Apache Mahout 0.4 release

Apache Mahout 0.4 release #

On last Sunday the Apache Mahout project published the 0.4 release. Nearly every piece of the code has been refactored and improved since the last 0.3 release. The release was timed to happen exactly before Apache Con NA in Atlanta. As such it was published on October 31st - the Halloween release, sort-of.

Especially mentionable are the following improvements:


  • Model refactoring and CLI changes to improve integration and consistency
  • Map/Reduce job to compute the pairwise similarities of the rows of a matrix using a customizable similarity measure
  • Map/Reduce job to compute the item-item-similarities for item-based collaborative filtering
  • More support for distributed operations on very large matrices
  • Easier access to Mahout operations via the command line
  • New vector encoding framework for high speed vectorization without a pre-built dictionary
  • Additional elements of supervised model evaluation framework
  • Promoted several pieces of old Colt framework to tested status (QR decomposition, in particular)
  • Can now save random forests and use it to classify new data


New features and algorithms include:

  • New ClusterEvaluator and CDbwClusterEvaluator offer new ways to evaluate clustering effectiveness
  • New Spectral Clustering and MinHash Clustering (still experimental)
  • New VectorModelClassifier allows any set of clusters to be used for classification
  • RecommenderJob has been evolved to a fully distributed item-based recommender
  • Distributed Lanczos SVD implementation
  • New HMM based sequence classification from GSoC (currently as sequential version only and still experimental)
  • Sequential logistic regression training framework
  • New SGD classifier
  • Experimental new type of NB classifier, and feature reduction options for existing one


There were many, many more small fixes, improvements, refactorings and cleanup. Go check out the new release, give the new features a try and report back to us on the user mailing list.