Archive for December, 2009

With a little help from my friends

December 31st, 2009

The end of the year 2009 is quickly approaching. To me it feels a little like it ran away far too quickly. So instead of taking part in the annual review of past events, I would like to use it as an opportunity to say thank you: The past twelve months were a lot of fun with lots of interesting, nice people from all over the world. I got the chance to meet quite a bit of the Mahout community, and I got lots and lots of new developers from all over Germany - or more precisely the EU - to attend the Apache Hadoop Get Together in Berlin. The interest in Mahout has grown tremendously over the past year.

All of this would not have been possible without the help of many people: First of all I’d like to thank Thilo Fromm - for making me happy whenever I was disappointed, for consoling me when I was sad, for patiently listening to me nervously whining before each and every talk, for kindly reviewing my slides and last but not least for helping me fix some of the problems that bugged me. Oh - and thanks for helping me fix, within minutes, the issue in the ZooKeeper C client that had puzzled me for days.

Another big thanks goes to my family, first and foremost my mum, who kindly took care of organizing quite a bit of my paperwork and kept me on schedule with so many “unimportant” tasks like getting an appointment with some hospital to finally get the screws taken out of my knee ;)

A special thanks goes to the growing Mahout community as well as to the Lucene people - you know who you are - keep up the great work: You rock!

Furthermore there are students at TU Berlin who have shown that with Mahout it is “dead-simple” to write an application that, given a stream of documents, groups them by topic and makes the result searchable in Solr. Thanks to you for solving the minor and major problems, for talking to the community, and for being transparent about the problems you ran into. Looking forward to continuing to work with you next year.

Finally a big thank you to all of the speakers, sponsors and attendees of the Apache Hadoop Get Together, the NoSQL conference and the Apache Dinner Berlin - without you these events would never have been possible. Looking forward to seeing you again in January/March 2010!

I hope I didn’t forget too many people - just in case: I am pretty grateful for all the input, help and feedback I got this year.

PS: Another thanks to the spaceboyz visiting Berlin for 26C3 for helping Thilo tidy up our apartment after Congress was over this year ;)

Hadoop, Lucene, Mahout

First December Apache Hadoop Berlin video online

December 31st, 2009

The video of Nikolaus Pohle’s talk at the December Apache Hadoop Get Together Berlin is online already - more to come soon.

hadoop nikolaus pohle from Isabel Drost on Vimeo.

Thanks to Martin from newthinking for videotaping and uploading. Thanks to StudiVZ for sponsoring the video.

Apache Hadoop Get Together Berlin

Screws are out

December 26th, 2009

Before: Some time in between: After:


On December 22nd those screws got taken out of my knee: I had to be at the hospital early in the morning (early as in: arrive at 6:45am). In return I was allowed to go home the same afternoon: Finally some time for reading and refining MAHOUT-85 ;)

Freetime

Winter arrived in Berlin

December 18th, 2009

Finally winter seems to have arrived in Berlin as well:



Looks a little like Christmas is drawing closer. Only disadvantage of the weather: One of the brakes on my bike froze after just a few minutes. Luckily for me, my bike has one of those old-fashioned back pedal brakes ;)

Freetime

Summary - December Get Together

December 16th, 2009

Today the seventh Apache Hadoop Get Together took place in Berlin. The room was again packed with more than 40 people from various companies with and without practical experience with Hadoop: There were people from Nokia Gate 5, Sun, nurago, StudiVZ, Dawanda, Last.fm, nugg.ad. There were also people from academia, e.g. HPI Potsdam, and a few freelancers interested in the topic or providing help with Hadoop.

We had three very interesting talks. The first one was given by Richard Hutton from nugg.ad on their usage of Hadoop. They provide targeted advertisement services to their clients. Naturally they need to process lots of user interactions to be able to draw reliable conclusions. nugg.ad started out with a traditional system setup: Erlang loggers in front, data fed into well-known data warehouse infrastructure, analysed, and the results pushed back to the frontends. However this architecture would scale only so far. So at the beginning of 2009 they started migrating their systems over to Hadoop. (A thanks from the speaker to Tom White for publishing the Hadoop book at O’Reilly, which obviously helped the developers a lot.) Today, nugg.ad has cut the time needed for analysis from one to two days down to one to two hours. I will link the slides of the talk as soon as I have the pdf version available.

The second talk was given by Jörg Möllenkamp on what Sun is doing with Hadoop. Sun does have “special hardware” - special in that they have systems with up to 512 virtual processors on one chip. With Solaris they have an operating system that scales to that architecture. But now they are looking for applications that can use such hardware efficiently as well. Hadoop is well suited for distributing computations - so it looked like a great fit for Sun. Slides are available online.




The last talk was given by Nikolaus Pohle from nurago. They switched to Hadoop only recently. Coming from online market analysis, they have to analyse lots of user interaction data. Currently they are moving away from a MySQL-based architecture to a distributed system based on HDFS and Map/Reduce. To make writing M/R jobs easier for their employees, they built their own abstract language on top of Hadoop that helps formulate recurring jobs. That does sound a lot like what Pig or Cascading already do - but it is specifically targeted at the type of jobs they have. Slides are available online. There is also a pdf version for users who prefer open formats.

If anyone should be interested in it, I also put my introductory slides online.

Next meetup will be in March 2010. It will feature a talk by Zanox on their Hadoop usage, one talk by eCircle from Munich as well as one talk by Nokia. You are very welcome to join us. If you would like to give a presentation yourself - please do contact me. If you would like to sponsor the event, please send me an e-mail.

A big Thank You to all the speakers - Nikolaus Pohle from nurago, Jörg Möllenkamp from Sun and Richard Hutton from nugg.ad - without you, the event would not be possible. Another big Thank You to newthinking for providing the venue for free. And, last but not least, another big Thank You to StudiVZ for sponsoring the videos. They will be linked to from here as well as from the StudiVZ blog as soon as they are available.

Apache Hadoop Get Together Berlin

On Thursday: Open Hadoop User Group Munich

December 16th, 2009

If one evening of Apache Hadoop is not enough for you: The next Christmas Meetup in Germany takes place one day later in Munich.

  • When: Thursday, December 17, 2009 at 5:30pm, open end
  • Where: eCircle AG, Nymphenburger Straße 86, 80636 München (“Bruckmann” Building, “U1 Mailinger Str”, map in German http://www.ecircle.com/de/kontakt/anfahrt.html and look for the signs)

Talks scheduled by Bob and Lars:

Bob Schulze from eCircle will be giving the first presentation on how eCircle is planning to use the Hadoop stack.

Dave Butlerdi will be giving an overview of his usage of Hadoop.

Lars George will give a state of affairs of the HBase project: what it is, what it does and how he has been using it since early 2008.

There is a quick train connection from Berlin to Munich. So if you are attending the Berlin Get Together, it is very easy to travel south to Munich one day later and visit the Munich event as well.

*Camp, Hadoop

On Wednesday: December Apache Hadoop @ Berlin

December 14th, 2009

This week on Wednesday at 5 p.m. the December Hadoop Get Together takes place at the newthinking store in Berlin.

Talks scheduled so far:

  • Richard Hutton (nugg.ad): “Moving from five days to one hour.”
  • Jörg Möllenkamp (Sun): “Hadoop on Sun”
  • Nikolaus Pohle (nurago): “M/R for MR - Online Market Research powered by Apache Hadoop. Enable consultants to analyze online behavior for audience segmentation, advertising effects and usage patterns.”

Videos of the talks will be available after the Meetup is over, linked to by StudiVZ (thanks for sponsoring).

As this is the last Meetup before Christmas there will be cookies waiting for you.

If you want to get notifications of future events on Apache Hadoop, NoSQL, Apache Lucene - be it trainings, meetups or conferences - feel free to subscribe to the mailing list or join the Xing Group that accompanies the Berlin Get Together.

Apache Hadoop Get Together Berlin

Photos the traditional way

December 13th, 2009

After one year of taking pictures at various occasions and at various places the pile of photos grew frighteningly large:



I am not counting the images taken at ApacheCon US in Oakland - they are not yet developed. All other photos were taken either with an old-fashioned Olympus µ-zoom or a Praktica Nova 1st.

Several hours of work later, I ended up with one more book containing memories of an exciting year…

Freetime

“Schneeflöckchen, Weißröckchen”

December 13th, 2009

First snow seen this morning - seems like finally it’s winter:



Freetime

Apache Hadoop at FOSDEM 2010

December 11th, 2009

Though the official schedule is not yet online: I will be giving an introductory talk about Apache Hadoop at next year’s FOSDEM (Free and Open source Software Developers' European Meeting) in Brussels. This will be the 10th birthday of the event - looking forward to a fun event, meeting other free and open source software developers from all over Europe.



If you are an Apache Hadoop developer and would like me to include some particular topic in the talk - please feel free to contact me. If you are an Apache Hadoop user and would like to learn more about the project, please come to the talk and ask questions. If you are an Apache Hadoop newbie - feel free to join us.

In addition there will be a NoSQL Dev Room at FOSDEM as well. The call for presentations is up already. So if you are doing fun stuff with CouchDB, HBase and friends or are a developer of these projects - submit a talk and join us in early February in Brussels.

Events, Free Software, Hadoop