Inductive Bias

Linux vs. Hadoop - some inspiration?

January 16, 2013
Hadoop, brainstorming, standardisation, Linux, design, Hacking

This long-ish (even by my blog’s standards) blog post was inspired by a talk given late last year at Apache Con EU as well as by discussions around what constitutes “Apache Hadoop compatibility” and how to make extending Hadoop easier. The post is based on conversations with at least one guy close to the Linux kernel community and another developer working on Hadoop. ...

ABC - die Katze lief im Schnee

January 11, 2013
Relocating to Berlin, Berlin, winter, snow

Seen this morning in Berlin: a little impression of what the city looked like in the weeks before it turned green at Christmas. For winter images from other years, see also previous posts. Title taken from a children’s song:

On Taming Text

January 1, 2013
Mahout, Lucene, review, book, Science

At this time of the year I would usually post pictures of my bicycle standing in the snow somewhere in Tierpark. This year, however, I was tricked into using public transport instead: a) After my husband found a new job, we now share some of the route to work - and he isn’t crazy enough to go by bike when it’s snowing. b) I got myself a Nexus7 earlier this month, which made it unnecessary to take paper books with me when using public transport. ...

Thanks for all the help

December 31, 2012
Free Software, Thanks, General

This year was a blast: It started with the ever-great FOSDEM in Brussels (see you there in 2013?) and an invitation to GeeCon in Poznan (if you ever get an invitation to speak there - do accept, the organisers do an amazing job at that event). In summer we had Berlin Buzzwords in Berlin for the third time with 700 attendees (to retain the community feel of the conference we decided to limit tickets in 2013, so make sure you get yours early). ...

RecSys Stammtisch Berlin - December 2012

December 30, 2012
Mahout, Science, recommendation, music, General

Earlier this month I attended the fourth Recommender Stammtisch in Berlin. The event was kindly hosted by Soundcloud - who, on top of organising the speakers, provided a really yummy buffet by Kochzeichen D. The evening started with Paul Lamere giving a very entertaining but also very packed talk on why music recommendation is special - or, put more generally, why all recommender systems are special: ...

Elastic Search meetup Berlin

November 28, 2012
Lucene, elasticsearch, Get Together

Today Retresco hosted the (to my knowledge fourth) Elastic Search User Group Berlin - a group dedicated to using Lucene as part of Elastic Search. With roughly fifteen attendees the meetup attracted a decent crowd - most interestingly, many of the people there were already using the software either in production or for closed beta projects. The first talk was given by people from ferret-go - a company doing media monitoring for brands, focused on the German market. ...

ApacheConEU - part 11 (last part)

November 20, 2012
buildr, ApacheCon, Apache Con, log4j, apacheconeu

One of the last sessions covered logging frameworks for Java. Christian Grobmeier started by detailing the common requirements for all logging frameworks: Speed - developers do not want to pay a disproportionate penalty for using a logging framework. Fail-safety and reliability - under no circumstances should your logging framework kill your application. In addition, it would be most annoying to find that the one log message that would help you decipher the problem your application ran into is missing. ...

ApacheConEU - part 10

November 19, 2012
Lucene, tika, ApacheCon, Apache Con, apacheconeu

In the next session Jukka introduced Tika - a toolkit for parsing content from files, including a heuristics-based component for guessing the file type: based on the file extension, magic and certain patterns in the file, the file type can be guessed rather reliably. Some anecdotes: not all mime types are registered with IANA, there are of course conflicting file extensions, Microsoft Word not only localises its interface but also the magic in the file, ...

ApacheCon EU - part 09

November 18, 2012
tfidf, Apache Con, ApacheCon, Lucene, ap, apacheconeu, Solr

In the Solr track Elastic Search and Solr Cloud went into competition. The comparison itself was slightly apples-and-oranges, as the speaker compared the current ES version, based on Lucene 3.x, with Solr Cloud, based on Lucene 4.0. During the comparison it still turned out that both solutions are more or less comparable - so the choice again depends on your application. However, I did like the conclusion: The speaker did not pick a clear winner in terms of projects. ...

ApacheConEU - part 08

November 17, 2012
ApacheCon, Apache Con, community, CouchDB, apacheconeu

Jan Lehnardt’s talk covered the history of CouchDB - including lessons learnt along the way. The first issue he went into: Shipping 1.0 is hard! They spent a lot of effort and time in order to have a stable database that won’t lose your data - only to have a poor patch slip in for 1.0 that resulted in data loss. The flurry of action happening afterwards was truly amazing - people working in rolling shifts all over the planet to not only fix the issue but also provide recovery tooling for those affected by the bug. ...