Apache Hadoop in Debian Squeeze

2010-07-17 12:04 More posts about Debian Hadoop hbase Zookeeper
After using Mandrake for quite a while (still blaming my boyfriend Thilo for infecting not only my computer but also myself first with that system, then with the more general idea of Free Software - but that's another story.) after finishing my master's thesis I started using GNU Debian Linux (back then in the version code-named Woody). Since I always had a GNU Debian on my private box as my main operating system - even installed it on my MacBook following the steps in the Debian Wiki.

As I am also an Apache Mahout committer, closely related to the Apache Hadoop project, I always found it kind of sad that there were no Hadoop packages in the official Debian repositories. I tried multiple times to find some time to get into Debian packaging myself, I learned what "debian/rules" is all about and discovered some of the intricacies of packaging Java based software. However I have to admit that I never was able to find enough time to really finish that task.

A few weeks before this year's FOSDEM I learned on the Apache Hadoop as well as on the Debian Java lists that a guy called Thomas Koch was working on solving bug 535861 - ITP to package Hadoop. We met at FOSDEM where I tried to raise some attention in the audience for Thomas' plans (back then he was in need for help with a few last missing pieces). In addition I invited him for Berlin Buzzwords to get in touch with other Hadoop developers and users for further input.

I am really happy that by now Hadoop has made it into the official Debian package repositories - as soon as Debian Squeezeapt-get install [Hadoop component you need]: Debian package search.

Squeeze

If you want to speed up the process of Squeeze being released as stable version: Help fixing the remaining bugs in that distribution. There are various Debian Bug Squashing Parties being organised around the world. Next one in Berlin is on next Monday, the one for Munich is running this weekend. Just got the information that Fefe posted in his blog a link to the Mozilla bug bounty:

The packages are based on the upstream Apache Hadoop distribution, being comparably new they are intended for development machines at the moment. If you are using Debian and want to work with Hadoop - this is a great opportunity to help making the packages more stable by simply using them and reporting your experiences back to the Debian community.

In addition Debian now also provides packages for Zookeeper as well as HBase - though the HBase version is not yet production ready as the HDFS-append patch is still missing.

To follow the general state and progress of these packages feel free to follow the packages pages for Hadoop, HBase, Zookeeper respectively.

Thomas currently plans to work more closely with upstream e.g. to tidy up the chaos in the start-up scripts and other minor glitches. So watch out for further improvements.

In addition I just saw another interesting ITP in the Debian bugtracker: Wishlist: katta. I am sure there are quite a few others as well.