Apache Con Europe 2009 - part 1

Apache Con Europe 2009 - part 1 #

The past week members, committers and users of Apache software projects gathered in Amsterdam for another Apache Con EU

and to celebrate the 10th birthday of the ASF. One week dedicated to the development and use of Free Software and the Apache Way.

Monday was BarCamp day for me, the first BarCamp I ever attended. Unfortunately not all participants proposed talks. So some of the atmosphere of an unconference was missing. The first talk by Danese Cooper was on “HowTo: Amsterdam Coffee Shops”. She explained the ins and outs of going to coffee shops in Amsterdam, gave both legal and practical advise. There was a presentation of the Open Street Map project, several Apache projects. One talk discussed transfering the ideas of Free Software to other parts of life. Ross Gardler started a discussion on how to advocate contributions to Free Software projects in science and education.

Tuesday for me meant having some time for Mahout during the Hackathon. Specifically I looked into enhancing matrices with meta information. In the evening there were quite a few interesting talks at the Lucene Meetup: Jukka gave an overview of Tika, Grant introduced Solr. After Grant’s talk some of the participants shared numbers on their Solr installations (number of documents per index, query volumn, machine setup). To me it was extremely interesting to gain some insight into what people actually accomplish with Solr. The final talk was on Apache Droids, a still incubating crawling framework.

The Wednesday tracks were a little unfair: The Hadoop track (videos available online for a small fee) was right in parallel to the Lucene track. The day started with a very interesting keynote by Raghu from Yahoo! on their storage system PNUTS. He went into quite some technical detail. Obviously there is interest in publishing the underlying code under an open source license.

After the Mahout introduction by Grant Ingersoll I changed room to the Hadoop track. Arun Murthy shared his experience on tuning and debugging Hadoop applications. After lunch Olga Natkovich gave an introduction to Pig - a higher language on top of Hadoop that allows for specifications of filter operations, joins and basic control flow of map reduce jobs in just a few lines of Pig Latin code. Tom White gave an overview of what it means to run Hadoop on the EC2 cloud. He compared several options for storing the data to process. Today it is very likely that there will soon be quite a few more providers of cloud services in addition to Amazon.

Allen Wittenauer gave an overview of Hadoop from the operations point of view. Steve Lougran finally covered the topic of running Hadoop on dynamically allocated servers.

The day finished with a pretty interesting BOF on Hadoop. There still are people that do not clearly see the differences of Hadoop based systems to database backed applications. Best way to find out whether the model fits: Set up a trial cluster and do experiment yourself. Noone can tell which solution is best for you except for yourself (and maybe Cloudera setting up the cluster for you :) ).

After that the Mahout/UIMA BOF was scheduled - there were quite a few interesting discussions on what UIMA can be used for and how it integrates with Mahout. One major take home message: We need more examples integrating both. We developers do see the clear connections. But users often do not realize that many Apache projects should be used together to get the biggest value out.