Chris Male on spatial search with Lucene

2010-03-16 20:42
Last week the March 2010 Hadoop Get Together took place in Berlin. The last speaker was Chris Male, who talked about spatial search with Lucene and Solr. The video is now available online:

Lucene Chris Male from Isabel Drost on Vimeo.

Feel free to share and distribute the video to anyone who might be interested. Thank you Chris, for traveling over from Amsterdam for an awesome talk on spatial search.

If you want to learn more about what people over at Lucene and Solr are currently working on, head over to Berlin Buzzwords - a conference on scalable search, storage and data analysis. If you yourself have interesting projects - feel free to submit a talk.

Thanks to Nokia for sponsoring the video taping - and again as always thanks to newthinking for providing the location for free.

Slides are available

2010-03-11 00:49
Slides for the last Hadoop Get Together are available online:

Videos will follow as soon as they are ready. Watch this space for further updates.

Apache Hadoop Get Together March 2010

2010-03-11 00:40
Today (or more correctly, yesterday) the March 2010 Hadoop Get Together took place in newthinking store. I arrived rather early to have some time to do some planning for Berlin Buzzwords - got there nearly one hour before the meetup. However it did not take very long until the first guests came to the store. So I quickly got my introductory slides in place - Martin from newthinking already had the room set up, camera in place and audio working.

When starting the meetup the room was already packed with some 60 people - we ended up having over 70 people interested in the mix of talks on Hadoop, HBase and Spatial search with Lucene and Solr. Doing the regular "Who are you"-round, we learned that there were people from nurago, Xing, StudiVZ, *lots and lots* of people from Nokia, Zanox, eCircle, and many others.

The meetup was kindly supported by newthinking store (venue for free) and Nokia (sponsored the videos). Steffen Bickel took his chance during the introduction to give a brief overview of Nokia and - guess - explain that Nokia is a great place to work and yeah - they are hiring!

The first talk was given by Bob Schulze who joined the meetup coming from eCircle in Munich. Given his previous experience with scaling their infrastructure from a regular database/ datawarehouse setup he explained how HBase helped when processing really large amounts of data. Being an e-mail marketing provider, eCircle does have quite a bit of data to process. And yes, eCircle is hiring.

Second talk was by Dragan Milosevic from Zanox on scaling product search and reporting with Hadoop. Just as eCircle, Zanox came from a regular RDBMS setup that became too expensive and too complex to scale before switching over to a Hadoop/Lucene stack. He used his chance to make the Lucene developers aware of the fact that there are users who are actually using Lucene's compression features. Zanox, as well, is looking for people to hire.

Last talk was by Chris Male from JTeam in Amsterdam on the developments in Lucene and Solr to support spatial search. There are various development routes being followed: Cartesian tiers as well as numeric range searches. He also explained that most of the features are still under heavy development. He finished his talk with a demo on what can be done with spatial search in Lucene/ Solr. You already guessed so, JTeam is hiring as well ;)
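To give a rough idea of what the numeric range route means, here is a minimal sketch in plain Python. It is not Lucene or Solr code and not what Chris showed in his demo. It assumes documents carry `lat` and `lon` fields, cuts the candidates down with a cheap bounding-box range check, and only then computes the exact great-circle distance. All names (`bounding_box`, `search_nearby`, the sample cities and their coordinates) are made up for illustration, and the box calculation ignores the poles and the date line.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_km):
    """Lat/lon box that encloses the search circle.

    Assumption: the circle touches neither a pole nor the date line.
    """
    r = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(r)
    dlon = math.degrees(math.asin(math.sin(r) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def search_nearby(docs, lat, lon, radius_km):
    """Range filter first, exact distance check second."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    # Step 1: two numeric range checks, the part an index can answer cheaply.
    candidates = [d for d in docs
                  if min_lat <= d["lat"] <= max_lat
                  and min_lon <= d["lon"] <= max_lon]
    # Step 2: exact distance only for the few remaining candidates.
    hits = [(haversine_km(lat, lon, d["lat"], d["lon"]), d["name"])
            for d in candidates]
    return sorted((dist, name) for dist, name in hits if dist <= radius_km)


# Hypothetical sample data with approximate city coordinates.
docs = [
    {"name": "Berlin", "lat": 52.52, "lon": 13.405},
    {"name": "Amsterdam", "lat": 52.37, "lon": 4.90},
    {"name": "Munich", "lat": 48.137, "lon": 11.575},
]

for dist, name in search_nearby(docs, 52.52, 13.405, 550):
    print(f"{name}: {dist:.0f} km")
```

The point of the two steps is that the range check throws away most documents without any trigonometry, so the expensive distance formula only runs on what is left.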

After the talks we went to Cafe Aufsturz for beers, drinks and some food. People enjoyed talking to each other exchanging experiences. A Lucene focussed table quickly formed - main topics: Spatial search, Lucene/Solr merge threads, heavy committing, Mike McCandless (is this guy real or just an alter-ego of the Lucene community?).

At some time around 11p.m. the core of the guests (well - the Lucene part of the meetup, that is Simon, Uwe and the guys from JTeam) moved over to a bar close by next to cinema central for some more beer and drinks. At about 1a.m. it finally was time to head home.

I'd like to say thanks: First of all to the speakers. Without you the meetup would not be possible. Second to newthinking and Nokia for their support. And of course to all attendees for having grown the meetup to its current size.

I had a really nice evening with people from the Hadoop, HBase and Lucene community. Special thanks to you guys from JTeam for traveling 6h to Berlin just for a "little", though no longer that tiny, Hadoop meetup. Promise stands, to visit one of your next Lucene meetups in Amsterdam and present Mahout there - however I need some help finding affordable accommodation ;)

Hope to see you all in June at Berlin Buzzwords.

March 2010 Apache Hadoop Get Together Berlin

2010-01-29 08:40
This is to announce the next Apache Hadoop Get Together that will take place in newthinking store in Berlin.

  • When: March 10th, 4p.m.
  • Where: Newthinking store Berlin

As always there will be slots of 20min each for talks on your Hadoop topic. After each talk there will be a lot of time to discuss. You can order drinks directly at the bar in the newthinking store. If you like, you can order pizza. We will go to Cafe Aufsturz after the event for some beer and something to eat.


Talks scheduled so far:

Chris Male (JTeam/ Amsterdam): Spatial Search with Solr

Abstract: The rise in popularity of Google Maps and mobile devices with GPS has resulted in a trend in the search field. People are no longer content with finding results that match a text query, they also want to find results which are near a location. So-called spatial search differs considerably from traditional free text search in that it cannot be achieved through common search techniques such as inverted indexes. Instead, new algorithms and data structures had to be developed that achieve efficient and accurate spatial search, and that also allow spatial search to have a role in the determination of a result's relevance. This technology has primarily been found in proprietary closed source search applications, however in the last 12-18 months, considerable effort has been invested into bringing open source spatial search support to Apache Solr and Lucene. While much is still left to be done, this talk will introduce how spatial search is currently supported in Solr, what work is happening currently, and a roadmap for future developments.

Dragan Milosevic (zanox/ Berlin): Product Search and Reporting powered by Hadoop

Abstract: To efficiently process and index 80 million products, as well as store and analyse 30 million clicks and 500 million views daily, Zanox AG is using Hadoop HDFS and Map/Reduce technologies. This talk will present product-processing and reporting frameworks running on a 17-node Hadoop cluster, being able to (1) robustly store products and tracking data in a distributed manner, (2) rapidly consolidate, normalise and categorise products, (3) merge and aggregate tracking data and (4) efficiently build indexes for supporting distributed search and reporting, running in several search clusters.

Bob Schulze (eCircle/ Munich): Database and Table Design Tips with HBase

Abstract: Recurring design patterns for the BigTable/HBase storage model.

A big thanks goes to the newthinking store for providing a room in the center of Berlin for us. Another big thanks goes to Nokia Gate 5 for sponsoring videos of the talks. Links to the videos will be posted here.

Please do indicate on the following Upcoming event if you are planning to attend to make planning (and booking tables at Aufsturz) easier. Registration through Xing is possible as well.

Looking forward to seeing you in Berlin,