Lucene

Hello elasticsearch

December 2, 2013
Lucene, elasticsearch

First of all, a disclaimer: I had a little bit of time left during the last few weeks. As a result, my blog migrated from dynamic WordPress content to statically hosted pages. If anything looks odd, if you find any encoding issues, or if you miss specific functionality, please do let me know. I’ll switch from this beta URL back to the old sub-domain in a week or so unless there are major complaints. ...

Elastic Search meetup Berlin – January 2013

February 1, 2013
Lucene, tfidf, Free Software, elastic search

The first meetup I went to this year started with a large bag of good news for Elastic Search users. The meetup started at 7 p.m. last Tuesday in the offices of Sys Eleven (thanks for hosting). Simon Willnauer gave an overview of what to expect from the upcoming major release of Elastic Search: In all 0.20.x versions, ES features a shard allocator that is ignorant of which index shards belong to, of machine properties, and of usage patterns. ...

On Taming Text

January 1, 2013
Mahout, Lucene, review, book, Science

At this time of the year I would usually post pictures of my bicycle standing in the snow somewhere in Tierpark. This year, however, I was tricked into using public transport instead: a) After my husband found a new job, we now share some of the route to work - and he isn’t crazy enough to go by bike when it’s snowing. b) I got myself a Nexus 7 earlier this month, which made it unnecessary to take paper books with me when using public transport. ...

Elastic Search meetup Berlin

November 28, 2012
Lucene, elasticsearch, Get Together

Today Retresco hosted the (to my knowledge fourth) Elastic Search User Group Berlin - a group dedicated to using Lucene as part of Elastic Search. With roughly fifteen attendees the meetup attracted a decent crowd - most interestingly, many of the people there were already using the software either in production or for closed beta projects. The first talk was given by people from ferret-go - a company doing media monitoring for brands, focused on the German market. ...

ApacheConEU - part 10

November 19, 2012
Lucene, tika, ApacheCon, Apache Con, apacheconeu

In the next session Jukka introduced Tika - a toolkit for parsing content from files, including a heuristics-based component for guessing the file type: based on the file extension, magic bytes and certain patterns in the file, the file type can be guessed rather reliably. Some anecdotes: not all MIME types are registered with IANA, there are of course conflicting file extensions, Microsoft Word not only localises its interface but also the magic in the file, ...

ApacheCon EU - part 09

November 18, 2012
tfidf, Apache Con, ApacheCon, Lucene, ap, apacheconeu, Solr

In the Solr track, Elastic Search and Solr Cloud went into competition. The comparison itself was slightly apples-to-oranges, as the speaker compared the current ES version based on Lucene 3.x with Solr Cloud based on Lucene 4.0. During the comparison it still turned out that both solutions are more or less comparable, so the choice again depends on your application. However, I did like the conclusion: The speaker did not pick a clear winner in terms of projects. ...

ApacheConEU - part 07

November 16, 2012
nutch, Apache Con, ApacheCon, Lucene, gora, apacheconeu, Solr

Julien Nioche shared some details on the nutch crawler. Being the mother of all Hadoop projects (as in Hadoop was born out of developments inside of nutch), the project has become rather quiet, with a steady stream of development in the recent past. Julien himself uses nutch for gathering crawled data for several customer projects - feeding this data into an NLP pipeline based on Behemoth that glues Mahout, UIMA and GATE together. ...

GeeCon - Solr at Allegro

May 25, 2012
Lucene, geecon, Solr

One talk that was particularly interesting to me was on Solr usage at Allegro (the Polish eBay). In terms of numbers: They have 20 million offers in Poland and another 10 million active offers in partnering countries. In addition, their index holds 50 million inactive offers in Poland and 40 million closed offers outside that country. They serve 8 million updates a day, that is roughly 100 updates a second. Those are related to the start and end of the bidding phase, buy-now actions, cancelled bids, and the bids themselves. ...

Berlin Buzzwords Schedule online - book your ticket now

April 30, 2012
Lucene, Berlin, Hadoop, Berlin Buzzwords

As of the beginning of last week the Berlin Buzzwords schedule is online. The Program Committee has completed reviewing all submissions and set up the schedule containing a great lineup of speakers for this year’s Berlin Buzzwords program. Among the speakers we have Leslie Hawthorn (Red Hat), Alex Lloyd (Google), Michael Busch (Twitter) as well as Nicolas Spiegelberg (Facebook). Check out our program in the online schedule. ...

Apache Hadoop Get Together - Hand over

November 2, 2011
Scaling, NOSQL, Apache Hadoop Get Together, Hadoop, Lucene, Berlin, Get Together

Apache Hadoop receives lots of attention from large US corporations who are using the project to scale their data processing pipelines: “Facebook uses Hadoop and Hive extensively to process large data sets. […]” (Ashish Thusoo, Engineering Manager at Facebook), “Hadoop is a key ingredient in allowing LinkedIn to build many of our most computationally difficult features […]” (Jay Kreps, Principal Engineer, LinkedIn), “Hadoop enables [Twitter] to store, process, and derive insights from our data in ways that wouldn’t otherwise be possible. ...