Inductive Bias

Flying back home from Cologne

August 23, 2009

Mahout, Germany, Free Software, Software Foundation, FrOSCon

Last weekend FrOSCon took place in Sankt Augustin, near Cologne. FrOSCon is organized annually at the University of Applied Sciences in Sankt Augustin. It is a volunteer-driven event with the goal of bringing developers and users of free software projects together. This year, the conference featured five tracks, including cloud computing and Java. Unfortunately, the conference started with a little surprise for my boyfriend and me: as we were both speakers, we had booked a room in Hotel Regina via the conference committee. ...

Converting a git repo to svn

August 17, 2009

svn, Hacking, git

Unlikely though it may seem, there are cases in which one might want to convert a git repo to svn while keeping all revisions intact. The Google Open Source blog has a nice explanation of how to do that.

September 2009 Hadoop Get Together Berlin

August 17, 2009

JAQL, Hadoop, Software Foundation, Lucene, Event, Get Together

The newthinking store Berlin is hosting the Hadoop Get Together user group meeting. It features talks on Hadoop, Lucene, Solr, UIMA, Katta, Mahout and various other projects that deal with making large amounts of data accessible and processable. The event brings together leaders from the developer and user communities. The speakers present projects that build on top of Hadoop as well as case studies of applications built and deployed on Hadoop. ...

AMQP Erlang user group talk

July 10, 2009

Messaging, Hacking, Erlang, Free Software, General

Last Wednesday, at the Erlang User Group Berlin, Matthias Radestock from the RabbitMQ project gave a talk on RabbitMQ, AMQP and messaging in general. Slides are available online. First, Matthias motivated the need for an open standard for messaging: so far, there are a few providers of middleware systems, such as Tibco and IBM. But those solutions are usually closed, expensive and cumbersome to handle. In short, they do not fit into a world where people rely on open standards for communication, free software for development and lightweight implementations. ...

Solr at AOL

July 2, 2009

Solr, Hacking, Free Software, Software Foundation

Grant Ingersoll has posted a very interesting interview with Ian Holsman on Solr at Relegence, now AOL. It describes the business side of the decision to switch to an open source solution, provides some insight into the size of the installation and details which technological reasons drove the decision to switch from a proprietary implementation to Solr: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and-Videos/Interview-Ian-Holsman-Relegence

Lucene slides online

June 30, 2009

Lucene, Get Together, General

The slides of the Lucene talk at the last Apache Hadoop Get Together Berlin are available online: Lucene Slides. Especially interesting to me are the last few slides, which detail both the index size and the machine setup: the installation runs on two standard PCs with two dual-core processors (usual speed, bought in January 2008 for about 4000 Euro). They have 32 GB of RAM, 24 GB of which are used as a ramdisk for the index. ...

Data serialization

June 26, 2009

Avro, Data Serialization, General, Protocol Buffers, Etch, Get Together, Thrift

XML, JSON and similar formats are currently the standard for data exchange. Their main benefit is that they are human-readable yet structured enough to be parsed easily by programs. Their drawbacks are overhead in size and parsing time. In addition, XML at least is not as human-readable as it could be. Binary formats are an alternative. Yet those often are not platform-independent (bindings exist for only one of C++, Java or Python) or are not upgradable (what if your boss comes along and wants you to add yet another field? ...
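To make the size and upgradability trade-off a little more concrete, here is a minimal Python sketch. The record, its field names and the one-byte tag/length encoding are hypothetical illustrations only; this is not the wire format of Protocol Buffers, Thrift, Avro or Etch. It compares a JSON encoding with a fixed binary layout, then shows the general idea of tagging fields so that an older reader can skip a field that was added later.

```python
import json
import struct

# Hypothetical record used only for illustration.
record = {"user_id": 42, "score": 3.5}

# Text encoding: readable, but every record repeats the field names.
as_json = json.dumps(record).encode("utf-8")

# Fixed binary layout: little-endian uint32 followed by a float64.
# Compact, but reader and writer must agree on the exact layout;
# appending a field changes the size and makes FIXED.unpack fail on old readers.
FIXED = struct.Struct("<Id")
as_fixed = FIXED.pack(record["user_id"], record["score"])


def encode_tagged(fields):
    """Encode {tag: payload_bytes} as repeated (tag: uint8, length: uint8, payload).

    Toy format (an assumption for this sketch): tags and payload lengths
    must each fit into one byte (0-255).
    """
    out = bytearray()
    for tag, payload in fields.items():
        out += struct.pack("<BB", tag, len(payload))
        out += payload
    return bytes(out)


def decode_tagged(data, known_tags):
    """Return {name: payload} for known tags; skip unknown tags using their length."""
    result = {}
    pos = 0
    while pos < len(data):
        tag, length = struct.unpack_from("<BB", data, pos)
        pos += 2
        if tag in known_tags:
            result[known_tags[tag]] = data[pos:pos + length]
        pos += length
    return result


# Version 2 of the writer adds a field (tag 3) that version 1 readers do not know.
v2_message = encode_tagged({
    1: struct.pack("<I", record["user_id"]),
    2: struct.pack("<d", record["score"]),
    3: b"DE",  # hypothetical new field
})

# A version 1 reader only knows tags 1 and 2 and still decodes the message.
V1_TAGS = {1: "user_id", 2: "score"}
decoded = decode_tagged(v2_message, V1_TAGS)
user_id = struct.unpack("<I", decoded["user_id"])[0]
score = struct.unpack("<d", decoded["score"])[0]

print(len(as_json), len(as_fixed), len(v2_message))  # 29 12 20
print(user_id, score)  # 42 3.5
```

In this toy example the JSON record is more than twice the size of the fixed binary one, while the tagged layout pays two extra bytes per field in exchange for letting old readers ignore fields they do not know about.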

Large Scalability - Papers and implementations

June 23, 2009

search, Hacking, Free Software, Hadoop, Software Foundation

In recent years the Googles and Amazons of this world have released papers on how to scale computing and processing to terabytes of data. These publications have led to the implementation of various open source projects that benefit from that knowledge. However, mapping the various open source projects to the original papers and identifying which tasks these projects solve is not always easy. ...

June 2009 Apache Hadoop Get Together @ Berlin

June 21, 2009

Hadoop

Just a brief reminder: on Thursday next week, the next Apache Hadoop Get Together is scheduled to take place in Berlin. Quite a few interesting talks are scheduled: Torsten Curdt, “Data Legacy - the challenges of an evolving data warehouse”; Christoph M. Friedrich, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), “SCAIView - Lucene for Life Science Knowledge Discovery”. ...

Scrum Table Berlin

June 21, 2009

Scrum, Event, General

Last week I attended the Scrum Table Berlin. This time around, Phillippe gave a presentation on “backlog colours”, that is, the types of work items tracked in the backlog. The easiest type to track is features: items that generate revenue and are on the customer's wishlist. The second type of item he sees is infrastructure: things needed to implement several features but invisible to the customer. ...