Hadoop

Wonder if you should switch from your RDBMS to Apache Hadoop: Don't!

August 26, 2013

Last weekend I spent a lot of fun time at FrOSCon* in Sankt Augustin - always great to catch up with friends in the open source space. As always there were quite a few talks on NoSQL and Hadoop, but also really solid advice on tuning your system for stuff like MySQL (including a side note on PostgreSQL and Oracle) from Kristian Köhntopp. ...

JAX: Hadoop overview by Bernd Fondermann

May 18, 2013

BigDataCon, Hadoop, JAX, Event

After breakfast was over, the first day started with a talk by Bernd on the Hadoop ecosystem. He did a good job selecting the most important and interesting projects related to storing data in HDFS and processing it with MapReduce. After the usual "what is Hadoop", "what does the general architecture look like" and "what will change with YARN", Bernd gave a nice overview of which ...

Hadoop Summit Amsterdam

May 16, 2013

amsterdam, Hadoop, hadoopsummit, Event

About a month ago I attended the first European Hadoop Summit, organised by Hortonworks in Amsterdam. The two-day conference brought together both vendors and users of Apache Hadoop for talks, exhibition and after-conference beer drinking. Russell Jurney kindly asked me to chair the Hadoop applied track during Apache Con EU. As a result I had a good excuse to attend the event. Overall ...

ApacheConNA: Hadoop metrics

May 14, 2013

sizing, ApacheConNA, ApacheCon, Apache Con, Hadoop

Have you ever measured the general behaviour of your Hadoop jobs? Have you sized your cluster accordingly? Do you know whether your workload really is I/O-bound or CPU-bound? Legend has it that no one except Allen Wittenauer over at LinkedIn, formerly Y!, ever did this analysis for his clusters. Steve Watt gave a pitch for actually going out into your datacenter, measuring what is going on there, and adjusting the deployment accordingly: In small ...

Apache Hadoop Get Together Berlin

May 8, 2013

Hadoop

This evening I joined the group over at Immobilienscout 24 for today’s Hadoop Get Together. David Obermann had invited Dr. Falk-Florian Henrich from CeleraOne to talk about their real-time analytics on live data streams. Their system is being used by the New York Times Springer’s Die Welt for traffic analysis. The goal is to identify recurring users that might be willing to pay for the content they want to read. ...

Notes on storage options - FOSDEM 05

February 17, 2013

Free Software, Hadoop, MySQL, hbase, Fosdem, Event

On MySQL: The second day at FOSDEM started for me with the MySQL dev room. One thing that made me smile was in the MySQL new features talk: The speaker announced support for “NoSQL interfaces” to MySQL. That is kind of fun in two dimensions: A) What he really means is support for the memcached interface. Given the vast number of different interfaces to databases today, announcing anything as “supports NoSQL interfaces” sounds kind of silly. ...

Linux vs. Hadoop - some inspiration?

January 16, 2013

Hadoop, brainstorming, standardisation, Linux, desgin, Hacking

This blog post (long-ish even by my blog’s standards) was inspired by a talk given late last year at Apache Con EU as well as by discussions around what constitutes “Apache Hadoop compatibility” and how to make extending Hadoop easier. The post is based on conversations with at least one guy close to the Linux kernel community and another developer working on Hadoop. ...

ApacheConEU - part 05

November 14, 2012

hbase, ApacheCon, Apache Con, Hadoop, apacheconeu

The afternoon featured several talks on HBase - both its implementation and schema optimisation. One major issue in schema design is the choice of key. The simplest recommendation is to design keys such that, on reading, the data load is evenly distributed across all nodes to prevent region-server hot-spotting. General advice here is to hash or reverse URLs. When it comes to running your own HBase cluster, make sure you know what is going on in the cluster at any point in time: ...

ApacheConEU - part 04

November 13, 2012

ApacheCon, Apache Con, Hadoop, apacheconeu

The second talk I went to was the one on the dev@hadoop.a.o insights given by Steve Loughran. According to Steve, Hadoop has turned into what he calls an operating system for the data center - similar to Linux in that its development is not driven by a vendor but by its users: Even though Hortonworks, Cloudera and MapR each have full-time people working on Hadoop (and related projects), this work usually is driven by customer requirements, which ultimately means that someone is running a Hadoop cluster that he has trouble with and wants to have fixed. ...

Video: Stefan Hübner on Cascalog

August 28, 2012

GetTogether, Cascalog, Video, Hadoop, Get Together

Stefan Hübner: "Introducing Cascalog: Functional Data Processing for Hadoop" from David Obermann on Vimeo.