November 18, 2012
ApacheCon EU - part 09 # In the Solr track Elastic Search and Solr Cloud went into competition. The comparison itself was slightly apples-and-oranges, as the speaker compared the current ES version based on Lucene 3.x with Solr Cloud based on Lucene 4.0. During the comparison it still turned out that both solutions are more or less comparable - so the choice again depends on your application. However I did like the conclusion: The speaker did not pick a clear winner in terms of projects.
...
November 17, 2012
ApacheConEU - part 08 # Jan Lehnardt’s talk covered the history of CouchDB - including lessons learnt along the way. The first issue he went into: Shipping 1.0 is hard! They spent a lot of effort and time in order to have a stable database that won’t lose your data - only to have a poor patch slip in for 1.0 that resulted in data loss. The flurry of action happening afterwards was truly amazing - people working in rolling shifts all over the planet to not only fix the issue but also provide recovery tooling for those affected by the bug.
...
November 16, 2012
ApacheConEU - part 07 # Julien Nioche shared some details on the Nutch crawler. Being the mother of all Hadoop projects (as in Hadoop was born out of developments inside of Nutch), the project has become rather quiet, with a steady stream of development in the recent past. Julien himself uses Nutch for gathering crawled data for several customer projects - feeding this data into an NLP pipeline based on Behemoth that glues Mahout, UIMA and GATE together.
...
November 15, 2012
ApacheConEU - part 06 # For the next session I joined the Tomcat crowd in Marc Thomas’ talk to learn more about Tomcat reverse proxy configurations. One rather common setup is to have Tomcat connected to an httpd instance. A common issue with this setup, in particular when running httpd with the event MPM, is thread exhaustion on Tomcat’s side. Fixes include always having more active Tomcat threads than there can be httpd threads at any one time, and disabling persistent connections.
...
November 14, 2012
ApacheConEU - part 05 # The afternoon featured several talks on HBase - both its implementation as well as schema optimisation. One major issue in schema design is the choice of key. The simplest recommendation is to make sure that keys are designed such that on reading data, load will be evenly distributed across all nodes to prevent region-server hot-spotting. General advice here is hashing or reversing URLs.
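To make the key advice a bit more concrete, here is a minimal sketch of both ideas in Python. It is not taken from the talks: the function names, the bucket count and the `|` separator are made up for illustration, and "reversing URLs" is read here as reversing the host name components, which is an assumption on my side.

```python
import hashlib
from urllib.parse import urlsplit


def reversed_host_key(url: str) -> str:
    """Build a row key with the host name components reversed.

    Hypothetical helper: "www.example.com/a" becomes "com.example.www/a".
    """
    parts = urlsplit(url)
    host = ".".join(reversed(parts.hostname.split(".")))
    return f"{host}{parts.path or '/'}"


def salted_key(key: str, buckets: int = 16) -> str:
    """Prefix a key with a hash-derived bucket number.

    The bucket is computed from the key itself, so the same key always
    maps to the same prefix, while different keys spread over the
    buckets. The bucket count of 16 is an arbitrary example value.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}|{key}"


if __name__ == "__main__":
    url = "http://www.example.com/index.html"
    print(reversed_host_key(url))
    print(salted_key(reversed_host_key(url)))
```

The trade-off with a hashed prefix is that neighbouring keys no longer sit next to each other, so a range scan has to visit every bucket.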
When it comes to running your own HBase cluster, make sure you know what is going on in the cluster at any point in time:
...
November 13, 2012
ApacheConEU - part 04 # The second talk I went to was the one on the dev@hadoop.a.o insights given by Steve Loughran. According to Steve, Hadoop has turned into what he calls an operating system for the data center - similar to Linux in that its development is not driven by a vendor but by its users: Even though Hortonworks, Cloudera and MapR each have full-time people working on Hadoop (and related projects), this work is usually driven by customer requirements, which ultimately means that someone is running a Hadoop cluster that he has trouble with and wants to have fixed.
...
November 12, 2012
ApacheConEU - part 03 # Tuesday started early with a plenary - run by the sponsor, not much news there, except for the very last slide, which raised a question that is also often discussed within the ASF - namely how to define oneself compared to non-ASF projects. What is the real benefit for our users - and what is the benefit for people to go with the ASF?
...
November 11, 2012
ApacheCon EU - part 02 # For me the week started with the Monday Hackathon. Even though I was there early the room quickly filled up and was packed at lunch time. I really liked the idea of having people interested in a topic register in advance - it gave the organisers a chance to assign tables to topics and put signs on the tables to advertise the topic worked on.
...
November 10, 2012
ApacheConEU - part 01 # ApacheCon EU in Germany - in November, in Sinsheim (in the middle of nowhere): I have to admit that I was more than skeptical whether that would actually work out very well. A day after the closing session it’s clear that the event was a huge success: All tickets were sold out days in advance, and there were six sessions packed with great talks on all things related to Apache Software Foundation projects - httpd, Tomcat, Lucene, OpenOffice, Hadoop, Apache Commons, James, Felix, CloudStack and tons of other projects were well covered.
...
September 15, 2012
Speaking at ApacheCon EU 2012 # I’ll be at ApacheCon EU in November. Looking forward to an interesting conference on all things Apache that is finally returning to Europe. Go there if you want to learn more about Tomcat, Hadoop, httpd, HBase, Camel, Open Office, Mahout, Lucene and more.
Now on to preparing the two talks I submitted:
“Choosing the right tool for your data analysis task - Apache Mahout in context”
...