Archive

Archive for the ‘Hadoop’ Category

With a little help from my friends

December 31st, 2009 at 11:55pm

The end of the year 2009 is quickly approaching. To me it feels a little like it ran away far too quickly. So instead of taking part in the annual review of past events, I would like to use it as an opportunity to say thank you: The past twelve months were a lot of fun with lots of interesting, nice people from all over the world. I got the chance to meet quite a bit of the Mahout community, I got lots and lots of new developers from all over Germany - or more precisely the EU - to attend the Apache Hadoop Get Together in Berlin. The interest in Mahout has grown tremendously over the past year.

All of this would not have been possible without the help of many people: First of all I’d like to thank Thilo Fromm - for making me happy whenever I was disappointed, for solacing me when I when I was sad, for patiently listening to me nervously whining before each and every talk, for kindly reviewing my slides and last but not least for helping me fix some of the problems that bugged me. Oh - and, thanks for helping me fix the issue in the zookeeper c-client within minutes that puzzled me for days.

Another big Thanks goes to family, first and foremost my mum, who kindly took care of organizing quite a bit of my paperwork and kept me on schedule with so many “unimportant” tasks like getting an appointment with some hospital to finally get the screws taken out of my knee ;)

A special thanks goes to the growing Mahout community as well as to the Lucene people - you know, who you are - keep up the great work: You rock!

Furthermore there are students at TU Berlin who have shown that with Mahout it is “dead-simple” to write an application that, given a stream of documents, groups them by topic and makes the result searchable in Solr. Thanks to you for solving the minor and major problems, for communicating with the community, for transparently communicating problems. Looking forward to continue working together with you next year.

Finally a big thank you to all of the speakers, sponsors and attendees of the Apache Hadopp Get Together, the NoSQL conference and the Apache Dinner Berlin - without you these events would never have been possible. Looking forward to seeing you again in January/ March 2010!

I hope I didn’t forget too many people - just in case: I am pretty grateful for all the input, help and feedback I got this year.

PS: Another thanks to the spaceboyz visiting Berlin for 26C3 for helping Thilo tidy up our apartment after Congress was over this year ;)

Hadoop, Lucene, Mahout , , , , ,

On Thursday: Open Hadoop User Group Munich

December 16th, 2009 at 6:06am

If one evening of Apache Hadoop is not enough for you: The next Christmas Meetup in Germany takes place one day later in Munich.

  • When: Thursday December 17, 2009 at 5:30pm open end
  • Where: eCircle AG, Nymphenburger Straße 86, 80636 München (”Bruckmann” Building, “U1 Mailinger Str”, map in German http://www.ecircle.com/de/kontakt/anfahrt.html and look for the signs)

Talks scheduled by Bob and Lars:

Bob Schulze from eCircle will be giving the first presentation on how eCircle is planning to use the Hadoop stack.

Dave Butlerdi will be giving an overview of his usage of Hadoop.

Lars George will give a state of affairs of the HBase project. What is it, what does it do and how he is using it (since early 2008).

There is a quick connect via train from Berlin to Munich. So if you are attending the Berlin Get Together, it is very easy to travel south to Munich one day later and visit the Munich event as well.

*Camp, Hadoop , ,

Apache Hadoop at FOSDEM 2010

December 11th, 2009 at 9:19am

Though the official schedule is not yet online: I will be giving an introductory talk about Apache Hadoop at next year’s FOSDEM (Free and Open Source Developer European Meeting) in Brussles. This will be the 10th birthday of the event - looking forward to a fun event, meeting other free and open source software developers from all over Europe.



If you are a Apache Hadoop developer and would like me to include some particular topic in the talk - please feel free to contact me. If you are an Apache Hadoop user and would like to learn more on the project, please come to the talk and ask questions. If you are an Apache Hadoop Newbie - feel free to join us.

In addition there will be a NoSQL Dev Room at FOSDEM as well. The call for presentations is up already. So if you are doing fun stuff with CouchDB, HBase and friends or are a developer of these projects - submit a talk and join us in early-February in Brussles. Read more…

Free Software, General, Hadoop , ,

Mahout 0.2 released

November 18th, 2009 at 10:52am

Apache Mahout 0.2 has been released and is now available for public download at http://www.apache.org/dyn/closer.cgi/lucene/mahout

Up to date maven artifacts can be found in the Apache repository at
https://repository.apache.org/content/repositories/releases/org/apache/mahout/

Apache Mahout is a subproject of Apache Lucene with the goal of delivering scalable machine learning algorithm implementations under the Apache license. http://www.apache.org/licenses/LICENSE-2.0

Mahout is a machine learning library meant to scale: Scale in terms of community to support anyone interested in using machine learning. Scale in terms of business by providing the library under a commercially friendly, free software license. Scale in terms of computation to the size of data we manage today.

Built on top of the powerful map/reduce paradigm of the Apache Hadoop project, Mahout lets you solve popular machine learning problem settings like clustering, collaborative filtering and classification
over Terabytes of data over thousands of computers.

Implemented with scalability in mind the latest release brings many performance optimizations so that even in a single node setup the library performs well.

The complete changelist can be found here:

http://issues.apache.org/jira/browse/MAHOUT/fixforversion/12313278

New Mahout 0.2 features include

  • Major performance enhancements in Collaborative Filtering, Classification and Clustering
  • New: Latent Dirichlet Allocation(LDA) implementation for topic modelling
  • New: Frequent Itemset Mining for mining top-k patterns from a list of transactions
  • New: Decision Forests implementation for Decision Tree classification (In Memory & Partial Data)
  • New: HBase storage support for Naive Bayes model building and classification
  • New: Generation of vectors from Text documents for use with Mahout Algorithms
  • Performance improvements in various Vector implementations
  • Tons of bug fixes and code cleanup

Getting started: New to Mahout?

For more information on Apache Mahout, see http://lucene.apache.org/mahout

A very BIG Thank You to all those who made this release happen!

Hadoop , , ,