August 26, 2013
Wonder if you should switch from your RDBMS to Apache Hadoop: Don’t! # Last weekend I spent a lot of fun time at FrOSCon* in Sankt Augustin - always great to catch up with friends in the open source space. As always there were quite a few talks on NoSQL and Hadoop, but also really solid advice on tuning your system for stuff like MySQL (including a side note on PostgreSQL and Oracle) from Kristian Köhntopp.
...
May 18, 2013
JAX: Hadoop overview by Bernd Fondermann # After breakfast was over the first day started with a talk by Bernd on the
Hadoop ecosystem. He did a good job selecting the most important and
interesting projects related to storing data in HDFS and processing it with Map
Reduce. After the usual "what is Hadoop", "what does the general architecture
look like", "what will change with YARN" Bernd gave a nice overview of which
...
May 16, 2013
Hadoop Summit Amsterdam # About a month ago I attended the first European Hadoop Summit, organised by
Hortonworks in Amsterdam. The two day conference brought together both vendors
and users of Apache Hadoop for talks, exhibition and after conference beer
drinking.
Russell Jurney kindly asked me to chair the Hadoop applied track during
Apache Con EU. As a result I had a good excuse to attend the event. Overall
...
May 14, 2013
ApacheConNA: Hadoop metrics # Have you ever measured the general behaviour of your Hadoop jobs? Have you
sized your cluster accordingly? Do you know whether your work load really is IO
bound or CPU bound? Legend has it no one except Allen Wittenauer over at
LinkedIn, formerly Y!, ever did this analysis for his clusters.
Steve Watt gave a pitch for actually going out into your datacenter measuring
what is going on there and adjusting the deployment accordingly: In small
...
May 8, 2013
Apache Hadoop Get Together Berlin # This evening I joined the group over at Immobilienscout 24 for today’s Hadoop Get Together. David Obermann had invited Dr. Falk-Florian Henrich from CeleraOne to talk about their real-time analytics on live data streams.
Their system is being used by the New York Times Springer’s Die Welt for traffic analysis. The goal is to identify recurring users that might be willing to pay for the content they want to read.
...
February 17, 2013
Notes on storage options - FOSDEM 05 # On MySQL
Second day at FOSDEM for me started with the MySQL dev room. One thing that made me smile was in the MySQL new features talk: The speaker announced support for “NoSQL interfaces” to MySQL. That is kind of fun in two dimensions: A) What he really means is support for the memcached interface. Given the vast number of different interfaces to databases today, announcing anything as “supports NoSQL interfaces” sounds kind of silly.
...
January 16, 2013
Linux vs. Hadoop - some inspiration? # This (even by my blog’s standards) long-ish blog post was inspired by a talk given late last year at Apache Con EU as well as by discussions around what constitutes “Apache Hadoop compatibility” and how to make extending Hadoop easier. The post is based on conversations with at least one guy close to the Linux kernel community and another developer working on Hadoop.
...
November 14, 2012
ApacheConEU - part 05 # The afternoon featured several talks on HBase - both its implementation as well as schema optimisation. One major issue in schema design is the choice of key. The simplest recommendation is to design keys such that, when reading data, load is evenly distributed across all nodes to prevent region-server hot-spotting. General advice here is to hash or reverse URLs.
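As a rough illustration of the key-design advice above, here is a minimal Python sketch of both ideas. The function names, the `|` separator and the prefix length are assumptions made for this example and are not taken from the talks; which form of "reversing" the speakers meant is also an assumption (here: reversing the host labels of the URL).

```python
import hashlib
from urllib.parse import urlsplit


def reversed_host_key(url: str) -> str:
    """Build a row key with the host labels reversed.

    Hypothetical helper: www.example.com/a becomes com.example.www/a.
    """
    parts = urlsplit(url)
    host = ".".join(reversed(parts.hostname.split(".")))
    return host + parts.path


def hashed_key(url: str, prefix_len: int = 4) -> str:
    """Prefix the URL with a short hash so that lexicographically
    adjacent URLs end up in different parts of the key space.

    Hypothetical helper: prefix length and separator are arbitrary choices.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return digest[:prefix_len] + "|" + url


if __name__ == "__main__":
    url = "http://www.example.com/a/b"
    print(reversed_host_key(url))
    print(hashed_key(url))
```

The hash prefix trades away range scans over neighbouring URLs in exchange for a more even spread, so which variant fits depends on the access pattern.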
When it comes to running your own HBase cluster make sure you know what is going on in the cluster at any point in time:
...
November 13, 2012
ApacheConEU - part 04 # The second talk I went to was the one on the dev@hadoop.a.o insights given by Steve Loughran. According to Steve, Hadoop has turned into what he calls an operating system for the data center - similar to Linux in that its development is not driven by a vendor but by its users: Even though Hortonworks, Cloudera and MapR each have full time people working on Hadoop (and related projects), this work usually is driven by customer requirements, which ultimately means that someone is running a Hadoop cluster that he has trouble with and wants to have fixed.
...
August 28, 2012
Video: Stefan Hübner on Cascalog # Stefan Hübner: "Introducing Cascalog: Functional Data Processing for Hadoop" from David Obermann on Vimeo.