Archive for the ‘Get Together’ Category

Apache Hadoop Get Together - Hand over

November 2nd, 2011 at 4:20pm

Apache Hadoop receives a lot of attention from large US corporations that use the project to scale their data processing pipelines:

“Facebook uses Hadoop and Hive extensively to process large data sets. [...]” (Ashish Thusoo, Engineering Manager at Facebook), “Hadoop is a key ingredient in allowing LinkedIn to build many of our most computationally difficult features [...]” (Jay Kreps, Principal Engineer, LinkedIn), “Hadoop enables [Twitter] to store, process, and derive insights from our data in ways that wouldn’t otherwise be possible. [...]” (Kevin Weil, Analytics Lead, Twitter). These quotes were found on the Yahoo developer blog.

However, the system’s use is not limited to large corporations: with 101tec, Zanox, nugg.ad and nurago, local German players are also using the project to enable new applications. Add components like Lucene, Redis, CouchDB, HBase and UIMA to the mix and you end up with a set of major open source components that allow developers to rapidly build systems that until a few years ago were possible only in Google-like companies or in research.

The Berlin Apache Hadoop Get Together, started in 2008, made it possible to learn more about how the average local company leverages this software. It is a platform to get in touch informally and to exchange knowledge and best practices across corporate boundaries.

After three years of organising the event it is time to hand it over to new caring hands: David Obermann from Idealo kindly volunteered to take over the organisation. He is a long-term attendee of the event and will continue it in roughly the same spirit as before: technical talks on success stories by users and on new features by developers - not restricted to Hadoop but also taking related projects into account.

A huge Thank You goes to David for taking up the work of co-ordinating the event and finding a venue and a sponsor for the videos! If any of you attending the event think that you have an interesting story to share, would like to support the event financially or would just like to help out, please get in touch with David.

Looking forward to the next Apache Hadoop Get Together Berlin. Watch this space for updates on when and where it will take place.

Get Together

Video is up: Paolo Negri on scaling by one order of magnitude

February 23rd, 2011 at 9:23pm

Video is up - Simon Willnauer on Lucene 4 Performance improvements

February 22nd, 2011 at 9:21pm

Video is up - Josh Devins on Apache Hadoop at Nokia

February 21st, 2011 at 9:19pm

Slides of yesterday’s Apache Hadoop Get Together

January 28th, 2011 at 9:40pm

This time with a little less than 24 hours’ delay - the usual, and by some impatiently awaited, summary of the Apache Hadoop Get Together. The meetup took place at the Zanox Event Campus. The room was well filled with some forty attendees from various companies, with Hadoop experience ranging from interested beginners to experienced users.

Slides of all presentations:

The first presentation was given by Josh Devins from Nokia in Berlin, who works closely with the Ovi Maps team. He started with a general overview of the cluster setup and some information on the machines they run Hadoop on. Currently Hadoop is used mostly to process log data and aggregate information from it. For that task Scribe is used for log collection, with standard Ganglia and Nagios for monitoring and graphing. When starting to process and aggregate log data, the main challenge is a mixture of transforming the logs into a reasonably consistent format, cleaning noisy data out of the logs and, in some cases, initiating the storage of further information from various services. Nokia is a heavy - and happy - user of Pig, though they are looking into Hive for making data accessible to business analysts, who are usually more familiar with SQL-like languages.

As an example, the results of a few simple jobs analysing location-based searches were shown: looking at where in the greater Berlin area searches for “Ikea” were issued, at least Ikea Tempelhof and Spandau were easy to make out. In a more serious use case, similar information could be used for automatically detecting traffic jams. Currently Nokia is only scratching the surface of all the information that could possibly be extracted, so there is quite some interesting work ahead.
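The clean-then-aggregate pattern described in the talk can be illustrated with a small sketch. This is not Nokia's pipeline (which, according to the talk, runs on Hadoop with Pig); it is plain Python, and the tab-separated log format, the district names and the function names are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical tab-separated log format: timestamp, district, query.
# All sample lines are invented for this sketch.
RAW_LOG_LINES = [
    "2011-01-20T10:15:00\tTempelhof\tIkea",
    "2011-01-20T10:16:30\tSpandau\tikea ",
    "malformed line without tabs",
    "2011-01-20T10:17:10\tTempelhof\tIKEA",
    "2011-01-20T10:18:45\tMitte\tpizza",
    "2011-01-20T10:19:00\t\tikea",
]


def parse_line(line):
    """Return (district, normalised query), or None for noisy input."""
    fields = line.split("\t")
    if len(fields) != 3:
        return None  # wrong number of columns
    _timestamp, district, query = (field.strip() for field in fields)
    if not district or not query:
        return None  # a required field is missing
    return district, query.lower()


def count_searches_by_district(lines, term):
    """Count how often `term` was searched for in each district."""
    wanted = term.lower()
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip lines that could not be cleaned
        district, query = record
        if query == wanted:
            counts[district] += 1
    return counts


ikea_counts = count_searches_by_district(RAW_LOG_LINES, "ikea")
print(ikea_counts.most_common())
```

On a real cluster the parsing step would correspond to the map side and the per-district counting to the reduce side of a job; here both simply run in one process.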

In the second presentation Simon Willnauer gave a deep-dive introduction to the various stunning performance improvements of Lucene 4 - the not yet released, not backwards compatible trunk version of Apache Lucene. For more flexible indexing, column stride fields have been integrated. With the introduction of an automaton implementation, fuzzy query performance was improved significantly, reducing complexity from n to log n. In addition Simon had a great surprise to share with the audience: he proudly announced that Ted Dunning (you know, that guy who is active on nearly every Hadoop mailing list and shares a lot of in-depth theoretical knowledge backed by proven practical experience) and Doug Cutting (founder of Lucene, Hadoop and many other Apache projects) are going to be keynote speakers at Berlin Buzzwords.

In the third presentation Paolo Negri shared some insights into how Wooga’s Ruby on Rails/MySQL-based system was scaled. Spoiler: Redis played a major role in upping performance.

Videos will be published as soon as they are processed - thanks again to Cloudera for supporting the event by sponsoring the video recording.

Get Together

WiFi at the Apache Hadoop Get Together

January 18th, 2011 at 8:40pm

Just a brief reminder: The next Apache Hadoop Get Together is scheduled to take place on Thursday, January 27th at 6 p.m. at the Zanox Event Campus at Media Spree Berlin.

We have three very interesting talks. Thirty guests have registered already, but we still have a few free seats. Head over to the Xing event page to register if you have not done so yet.

If you would like to have access to the local WiFi please let me know - I need to register your mail address with the venue two days before the event.

A huge thanks to Zanox for providing the location for free, another huge thanks to Cloudera for sponsoring video taping of the event.

Get Together

Apache Hadoop Get Together Berlin - January 2011

December 28th, 2010 at 4:31pm

This is to announce the next Apache Hadoop Get Together, sponsored by Cloudera and Zanox, that will take place at the Zanox Event Campus in Berlin.

When: January 27th 2011, 6 p.m.

Where: zanox Event Campus (Please note the changed event location.)



As always there will be slots of 30 minutes each for talks on your Hadoop topic. After each talk there will be plenty of time for discussion. We will head over to a bar after the event for some beer and something to eat.

Talks scheduled so far:

Simon Willnauer: “Lucene 4 - Revisiting problems for speed”

Abstract: This talk presents a brief case study of long-standing problems in Lucene and how they have been approached to gain sizable performance improvements. Each of the presented problems will come with a brief introduction, the implemented solution and the resulting performance improvements. This talk might be interesting even for non-Lucene folks.

Josh Devins: “Hadoop at Nokia”
Abstract: In this talk, Josh will outline some of the ways in which Nokia is using Hadoop. We will start by having a quick look at the practical side of getting started with Hadoop and outline cluster hardware and configuration and management with tools like Puppet. Next we’ll dive head first into how Hadoop and its ecosystem are being utilized on a daily basis to perform business analytics, drive machine learning and help build data-driven products. We will also touch on how we go about collecting metrics from dozens of applications distributed in multiple data centers around the world. An open Q&A session will follow.

Paolo Negri: “The order of magnitude challenge: from 100K daily users to 1M”
Abstract: “Social games backends share many aspects of normal web applications, but exacerbate scaling problems. Follow this talk to see how we evolved a plain Ruby on Rails app to sustain 5000 reqs/sec, moved part of our data from SQL to NoSQL to reach 5 million queries per minute, and what we learned from this experience.”

Please do indicate on Upcoming or Xing if you are coming so we can plan capacity more reliably.

A big Thank You goes to zanox for providing the venue for our event for free, as well as to Cloudera for sponsoring the video recording of the presentations.

Looking forward to seeing you in Berlin,
Isabel

Get Together

Apache Hadoop - Trainings by Cloudera in Berlin

December 22nd, 2010 at 11:53pm

Cloudera is offering trainings for both administrators and developers early next year in Berlin. If you are getting started with Apache Hadoop this might be a great option to get your developers and operations staff up to speed with the framework. If you are a regular of the local Apache Hadoop Get Together, a discount code should have been sent to you by mail.

Get Together, Hadoop

Video: Max Heimel on sequence tagging w/ Apache Mahout

October 26th, 2010 at 7:58pm

Some time ago Max Heimel from TU Berlin gave a presentation on the new HMM support in the Mahout 0.4 release at the Apache Hadoop Get Together in Berlin:

Mahout Max Heimel from Isabel Drost on Vimeo.

Thanks to JTeam for sponsoring video taping, thanks to newthinking for providing the location and thanks to Martin Schmidt from newthinking for producing the video.

Get Together

Video: Sebastian Schelter on Recommendation w/ Apache Mahout

October 21st, 2010 at 1:55pm

A few weeks ago we had the autumn edition of the Apache Hadoop Get Together at the newthinking store in Berlin. I am glad to announce that the first video is online:

Mahout Sebastian Schelter from Isabel Drost on Vimeo.

Thanks to JTeam for sponsoring video taping, thanks to newthinking for providing the location and thanks to Martin Schmidt from newthinking for producing the video.

Stay tuned for the second video to be published next week.

Get Together, Mahout