Apache Con Europe 2009 - part 1

March 29th, 2009 at 6:41pm

This past week, members, committers and users of Apache software projects gathered in Amsterdam for another Apache Con EU - and to celebrate the 10th birthday of the ASF. It was one week dedicated to the development and use of Free Software and the Apache Way.

Monday was BarCamp day for me, the first BarCamp I ever attended. Unfortunately not all participants proposed talks, so some of the atmosphere of an unconference was missing. The first talk, by Danese Cooper, was on “HowTo: Amsterdam Coffee Shops”. She explained the ins and outs of going to coffee shops in Amsterdam and gave both legal and practical advice. There were also presentations on the Open Street Map project and on several Apache projects. One talk discussed transferring the ideas of Free Software to other parts of life. Ross Gardler started a discussion on how to advocate contributions to Free Software projects in science and education.

Tuesday for me meant having some time for Mahout during the Hackathon. Specifically I looked into enhancing matrices with meta information. In the evening there were quite a few interesting talks at the Lucene Meetup: Jukka gave an overview of Tika, Grant introduced Solr. After Grant’s talk some of the participants shared numbers on their Solr installations (number of documents per index, query volume, machine setup). To me it was extremely interesting to gain some insight into what people actually accomplish with Solr. The final talk was on Apache Droids, a crawling framework that is still incubating.

The Wednesday tracks were a little unfair: The Hadoop track (videos available online for a small fee) was right in parallel to the Lucene track. The day started with a very interesting keynote by Raghu from Yahoo! on their storage system PNUTS. He went into quite some technical detail. Obviously there is interest in publishing the underlying code under an open source license.

After the Mahout introduction by Grant Ingersoll I changed rooms to the Hadoop track. Arun Murthy shared his experience on tuning and debugging Hadoop applications. After lunch Olga Natkovich gave an introduction to Pig - a higher-level language on top of Hadoop that allows filter operations, joins and the basic control flow of map reduce jobs to be specified in just a few lines of Pig Latin code. Tom White gave an overview of what it means to run Hadoop on the EC2 cloud. He compared several options for storing the data to process. It seems very likely that there will soon be quite a few more providers of cloud services in addition to Amazon.

Allen Wittenauer gave an overview of Hadoop from the operations point of view. Finally, Steve Loughran covered the topic of running Hadoop on dynamically allocated servers.

The day finished with a pretty interesting BOF on Hadoop. There still are people who do not clearly see the differences between Hadoop-based systems and database-backed applications. The best way to find out whether the model fits: Set up a trial cluster and experiment yourself. No one can tell which solution is best for you except for yourself (and maybe Cloudera setting up the cluster for you :) ).

After that the Mahout/UIMA BOF was scheduled - there were quite a few interesting discussions on what UIMA can be used for and how it integrates with Mahout. One major take-home message: We need more examples integrating both. We developers do see the clear connections. But users often do not realize that many Apache projects should be used together to get the most value out of them.

  1. hannescarlmeyer
    April 20th, 2009 at 14:02 | #1

    Thanks for those three comments on the conference. Unfortunately I was not able to attend this year…

    One question here regarding the UIMA topic. Was there real interest in UIMA from the audience, and was it popular/known among them? How aware do you think the Lucene community is of UIMA?

    Bests

    Hannes

  2. April 20th, 2009 at 15:33 | #2

    There was some interest - not only at Apache Con but also at the Hadoop User Group UK in London. The problem is that UIMA does not really have visibility inside the community: Usually few, if any, UIMA committers are available at the Meetups. There was no UIMA talk scheduled either at Apache Con or at the Lucene Meetup on Tuesday after the trainings. I think what we need is a) people explaining more about what UIMA is all about and b) a viable example demonstrating that UIMA and various Apache projects (Lucene, Solr, Nutch, Mahout…) are a good fit.

  3. hannescarlmeyer
    April 21st, 2009 at 16:09 | #3

    Thanks for your assessment. I made the decision three years ago to integrate UIMA into one of our text mining products, ICE (intelligent content engineering), and UIMA helped a lot. But on the other hand, UIMA was hard to explain since there were no usable examples out there and no infrastructure. Let's see what will happen to it…
