November 16, 2012
ApacheConEU - part 07 # Julien Nioche shared some details on the nutch crawler. Being the mother of all Hadoop projects (as in Hadoop was born out of developments inside of nutch) the project has become rather quite with a steady stream of development in the recent past. Julien himself uses the nutch for gathering crawled data for several customer projects - feeding this data into an NLP pipeline based on Behemoth that glues Mahout, UIMA and Gate together.
...
November 15, 2012
ApacheConEU - part 06 # For the next session I joined the Tomcat crowd in Marc Thomas’ to learn more on Tomcat reverse proxy configurations. One rather common setup is to have Tomcat connected to an httpd instance. One common issue encountered with this setup in particular when running httpd with the event mpm is the problem of thread exhaustion on tomcat’s side. Fixes include always having more active tomcat threads than there can be httpd threads at any one time and to disable persistent connections.
...
November 14, 2012
ApacheConEU - part 05 # The afternoon featured several talks on HBase - both it’s implementation as well as schema optimisation. One major issue in schema design in the choice of key. Simplest recommendation is to make sure that keys are designed such that on reading data load will be evenly distributed accross all nodes to prevent region-server hot-spotting. General advise here are hashing or reversing urls.
When it comes to running your own HBase cluster make sure you know what is going on in the cluster at any point in time:
...
November 13, 2012
ApacheConEU - part 04 # The second talk I went to was the one on the dev@hadoop.a.o insights given by Steve Loughran. According to Steve Hadoop has turned into what he calls an operating system for the data center - similar to Linux in that it’s development is not driven by a vendor but by its users: Even though Hortenworks, Cloudera and MapR each have full time people working on Hadoop (and related projects), this work usually is driven by customer requirements which ultimately means that someone is running a Hadoop cluster that he has trouble with and wants to have fixed.
...
November 12, 2012
ApacheConEU - part 03 # Tuesday started early with a plenary - run by the sponsor, not too many news there, except for the very last slide that raised a question that is being discussed often also within the ASF - namely how to define oneself compared to non-ASF projects. What is the real benefit for our users - and what is the benefit for people to go with the ASF.
...
November 11, 2012
ApacheCon EU - part 02 # For me the week started with the Monday Hackathon. Even though I was there early the room quickly filled up and was packed at lunch time. I really liked the idea of having people interested in a topic register in advance - it gave the organisers a chance to assign tables to topics and put signs on the tables to advertise the topic worked on.
...
November 10, 2012
ApacheConEU - part 01 # Apache Con EU in Germany - in November, in Sinsheim (in the middle of nowhere): I have to admit that I was more than skeptical whether that would actually work out very well. A day after the closing session it’s clear that the event was a huge success: Days before all tickets were sold out, there were six sessions packed with great talks on all things related to Apache Software Foundation projects - httpd, tomcat, lucene, open office, hadoop, apache commons, james, felix, cloud stack and tons of other projects were well covered.
...
October 29, 2012
Teddy in London # While I was at the conference – Teddy spent some time exploring the surroundings of the conference hotel. Looks like in particular Hyde park was attractive:
October 28, 2012
Strata EU - part 4 # The rest of the day was mainly reserved for more technical talks: Tom Wight introducing the merits of MR2, also known as YARN. Steve Loughran gave a very insightful talk on the various failure modes of Hadoop – though the Namenode is like the most obvious single point of failure there are a few more traps waiting for those depending on their Hadoop clusters: Hadoop does just find with single harddisks failing.
...
October 27, 2012
Strata EU - part 3 # The first Tuesday morning keynote put the hype around big data into historical context: According to wikipedia big data apps are defined by their capability of coping with data set sizes that are larger than can be handled with commonly available machines and algorithms. Going from that definition we can look back to history and will realize that the issue of big data actually isn’t that new: Even back in the 1950s people had to deal with big data problems.
...