August 17, 2009
September 2009 Hadoop Get Together Berlin # The newthinking store Berlin is hosting the Hadoop Get Together user group meeting. It features talks on Hadoop, Lucene, Solr, UIMA, katta, Mahout and various other projects that deal with making large amounts of data accessible and processable. The event brings together leaders from the developer and user communities. The speakers present projects that build on top of Hadoop, case studies of applications being built and deployed on Hadoop.
...
July 10, 2009
AMQP Erlang user group talk # Last Wednesday at the Erlang user group Berlin Matthias Radestock from the RabbitMQ project gave a talk on RabbitMQ, AMQP and messaging in general. Slides are available online.
First Matthias motivated the need for an open standard for messaging: So far, their are a few provides of middleware systems like Tibco and IBM. But those solutions are usually closed, expensive, cumbersome to handle. In short they do not fit into a world where people rely on open standards for communication, free software for development and lightweight implementations.
...
July 2, 2009
Solr at AOL # Grant Ingersoll has posted a very interesting interview with Ian Holsman on Solr at Relegance, now AOL. It describes the business side of the decission to switch to an open source solution, provides some inside on the size of the installation and details which technological reasons have driven the decission to switch from a proprietary implementation to Solr:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and-Videos/Interview-Ian-Holsman-Relegence</ a>
June 30, 2009
Lucene slides online # The slides of the Lucene talk at the last Apache Hadoop Get Together Berlin are available online: Lucene Slides. Especially interesting to me are the last few slides which detail both index size and machine setup:
The installation is running on two standard PCs with 2 dual-core processors (usual speed, bought in January 2008 for about 4000 Euro). They have 32GB RAM, 24 GB are used as ramdisk for the index.
...
June 26, 2009
Data serialization # XML, JSON and others are currently standard data exchange formats. Being human-readable but still structured enough to be easily parsable by programs is their main benefit. Problems are overhead in size and parsing time. In addition at least xml is not really as human-readable as it could be.
An alternative are binary formats. Yet those often are not platform independent (either C++ or Java or Python bindings) or are not upgradable (what if your boss comes along and wants you to add yet another field?
...
June 23, 2009
Large Scalability - Papers and implementations # In recent years the Googles and Amazons on this world have released papers on how to scale computing and processing to terrabytes of data. These publications have led to the implementation of various open source projects that benefit from that knowledge. However mapping the various open source projects to the original papers and assigning tasks that these projects solve is not always easy.
...
June 21, 2009
June 2009 Apache Hadoop Get Together @ Berlin # Just a brief reminder: Next week on Thursday the next Apache Hadoop Get Together is scheduled to take place in Berlin. There are quite a few interesting talks scheduled:
Torsten Curdt: Data Legacy - the challenges of an evolving data warehouse
Christoph M. Friedrich, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI): “SCAIView - Lucene for Life Science Knowledge Discovery”.
...
June 21, 2009
Scrum Table Berlin # Last week I attended the scrum table Berlin. This time around Phillippe gave a presentation on “backlog colours”, that is types of work items tracked in the backlog.
The easiest type to track are features - that is items that generate revenue and are on the wishlist of the customer. Second type of items he sees are infrastructure items - that is, things needed to implement several features but invisible to the customer.
...
June 21, 2009
Open Street Map @ FSFE meetup # At the last meeting of the local FSFE group here in Berlin Sabine Stengel from cartogis gave a presentation on Open Street Map. But instead of focussing on the technical side she described the legal issues and showed the broad variety of commercial projects that are possible with this type of mapping information.
It was interesting to learn of how detailed and high quality the information provided by volunteers really is.
...
June 21, 2009
Keeping changesets small # One trick of successful and efficient software development is tracking changes in the sources in source code management systems, be it centralized systems like svn or perforce or decentralized systems like git or mercurial. I started working with svn while working on my Diploma thesis project in 2003, continued to use this systems while researcher at HU Berlin. Today I am using svn at work as well as for Apache projects and have come to like git for personal sandboxes.
...