Inductive Bias

Elastic Search meetup Berlin

November 28, 2012

Today Retresco hosted the (to my knowledge fourth) Elastic Search User Group Berlin - a group dedicated to using Lucene as part of Elastic Search. With roughly fifteen attendees the meetup attracted a decent crowd - most interestingly, many of the people there were already using the software either in production or for closed beta projects. The first talk was given by people from ferret-go - a company doing media monitoring for brands, focused on the German market. ...

ApacheConEU - part 11 (last part)

November 20, 2012

buildr, ApacheCon, Apache Con, log4j, apacheconeu

One of the last sessions covered logging frameworks for Java. Christian Grobmeier started by detailing the common requirements for all logging frameworks: Speed - developers do not want to pay a disproportionate penalty for using a logging framework. Fail-safety and reliability - under no circumstances should your logging framework kill your application. In addition, it would be most annoying to find that the one log message that would help you decipher the problem your application ran into is missing. ...

ApacheConEU - part 10

November 19, 2012

Lucene, tika, ApacheCon, Apache Con, apacheconeu

In the next session Jukka introduced Tika - a toolkit for parsing content from files, including a heuristics-based component for guessing the file type: based on file extension, magic and certain patterns in the file, the file type can be guessed rather reliably. Some anecdotes: not all MIME types are registered with IANA, there are of course conflicting file extensions, Microsoft Word not only localises its interface but also the magic in the file, ...

ApacheCon EU - part 09

November 18, 2012

tfidf, Apache Con, ApacheCon, Lucene, ap, apacheconeu, Solr

In the Solr track Elastic Search and Solr Cloud went into competition. The comparison itself was slightly apples-and-oranges, as the speaker compared the current ES version based on Lucene 3.x with Solr Cloud based on Lucene 4.0. Still, the comparison showed that both solutions are more or less comparable - so the choice again depends on your application. However, I did like the conclusion: the speaker did not pick a clear winner in terms of projects. ...

ApacheConEU - part 08

November 17, 2012

ApacheCon, Apache Con, community, CouchDB, apacheconeu

Jan Lehnardt’s talk covered the history of CouchDB - including lessons learnt along the way. The first issue he went into: Shipping 1.0 is hard! They spent a lot of effort and time in order to have a stable database that won’t lose your data - only to have a poor patch slip in for 1.0 that resulted in data loss. The flurry of action that followed was truly amazing - people working in rolling shifts all over the planet to not only fix the issue but also provide recovery tooling for those affected by the bug. ...

ApacheConEU - part 07

November 16, 2012

nutch, Apache Con, ApacheCon, Lucene, gora, apacheconeu, Solr

Julien Nioche shared some details on the nutch crawler. Being the mother of all Hadoop projects (as in Hadoop was born out of developments inside of nutch), the project has become rather quiet, with a steady stream of development in the recent past. Julien himself uses nutch for gathering crawled data for several customer projects - feeding this data into an NLP pipeline based on Behemoth that glues Mahout, UIMA and GATE together. ...

ApacheConEU - part 06

November 15, 2012

ApacheCon, Apache Con, tomcat, apacheconeu

For the next session I joined the Tomcat crowd in Mark Thomas’ talk to learn more about Tomcat reverse proxy configurations. One rather common setup is to have Tomcat connected to an httpd instance. A common issue with this setup, in particular when running httpd with the event MPM, is thread exhaustion on Tomcat’s side. Fixes include always having more active Tomcat threads than there can be httpd threads at any one time, and disabling persistent connections. ...

ApacheConEU - part 05

November 14, 2012

hbase, ApacheCon, Apache Con, Hadoop, apacheconeu

The afternoon featured several talks on HBase - both its implementation as well as schema optimisation. One major issue in schema design is the choice of key. The simplest recommendation is to design keys such that, on reading data, load will be evenly distributed across all nodes to prevent region-server hot-spotting. The general advice here is hashing or reversing URLs. When it comes to running your own HBase cluster, make sure you know what is going on in the cluster at any point in time: ...

ApacheConEU - part 04

November 13, 2012

ApacheCon, Apache Con, Hadoop, apacheconeu

The second talk I went to was the one on dev@hadoop.a.o insights given by Steve Loughran. According to Steve, Hadoop has turned into what he calls an operating system for the data center - similar to Linux in that its development is not driven by a vendor but by its users: even though Hortonworks, Cloudera and MapR each have full-time people working on Hadoop (and related projects), this work is usually driven by customer requirements, which ultimately means that someone is running a Hadoop cluster that they have trouble with and want to have fixed. ...

ApacheConEU - part 03

November 12, 2012

Httpd, ApacheCon, Apache Con, apacheconeu

Tuesday started early with a plenary run by the sponsor. There was not much news, except for the very last slide, which raised a question that is also often discussed within the ASF - namely how to define oneself compared to non-ASF projects. What is the real benefit for our users - and what is the benefit for people to go with the ASF? ...