Inductive Bias

Solr at AOL

July 2, 2009
Solr, Hacking, Free Software, Software Foundation

Grant Ingersoll has posted a very interesting interview with Ian Holsman on Solr at Relegence, now AOL. It describes the business side of the decision to switch to an open source solution, gives some insight into the size of the installation, and details the technological reasons that drove the switch from a proprietary implementation to Solr: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and-Videos/Interview-Ian-Holsman-Relegence

Lucene slides online

June 30, 2009
Lucene, Get Together, General

The slides of the Lucene talk at the last Apache Hadoop Get Together Berlin are available online: Lucene Slides. Especially interesting to me are the last few slides, which detail both index size and machine setup: The installation is running on two standard PCs with 2 dual-core processors each (usual speed, bought in January 2008 for about 4000 Euro). They have 32 GB RAM, of which 24 GB are used as a ramdisk for the index. ...

Data serialization

June 26, 2009
Avro, Data Serialization, General, Protocol Buffers, Etch, Get Together, Thrift

XML, JSON and others are currently the standard data exchange formats. Their main benefit is that they are human-readable yet structured enough to be parsed easily by programs. Their drawbacks are overhead in size and parsing time. In addition, XML at least is not really as human-readable as it could be. Binary formats are an alternative. Yet those are often not platform independent (offering only C++ or Java or Python bindings) or are not upgradable (what if your boss comes along and wants you to add yet another field? ...
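
The size and upgrade trade-offs mentioned here can be made concrete with a minimal sketch. The record, its field names and the fixed binary layout below are all invented for illustration; this is not how Avro, Thrift, Protocol Buffers or Etch encode data, only a hand-rolled layout using Python's standard `struct` module to show why such formats exist.

```python
import json
import struct

# Hypothetical record, invented for this illustration.
record = {"id": 12345, "score": 0.75, "active": True}

# Human-readable encoding: the field names travel with every record.
as_json = json.dumps(record).encode("utf-8")

# Hand-rolled binary layout (an assumption of this sketch): little-endian
# unsigned 32-bit int, 64-bit float, one-byte bool. Reader and writer must
# agree on this layout out of band.
LAYOUT = struct.Struct("<Id?")
as_binary = LAYOUT.pack(record["id"], record["score"], record["active"])

# The binary form is considerably smaller than the JSON form.
print(len(as_json), len(as_binary))


def decode_with_new_layout(data):
    """Try to read old data with a layout that has one extra int field.

    Returns None when the data does not match the new layout, which is
    the "not upgradable" problem of naive binary formats.
    """
    new_layout = struct.Struct("<Id?I")
    try:
        return new_layout.unpack(data)
    except struct.error:
        return None


# Old data can only be read with the layout it was written with.
rec_id, score, active = LAYOUT.unpack(as_binary)
```

Adding a field to the JSON record leaves old readers working, whereas the naive binary reader above fails as soon as the layout changes. Schema-based serialization frameworks aim to keep the compact encoding while making such changes manageable.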

Large Scalability - Papers and implementations

June 23, 2009
search, Hacking, Free Software, Hadoop, Software Foundation

In recent years the Googles and Amazons of this world have released papers on how to scale computing and processing to terabytes of data. These publications have led to the implementation of various open source projects that benefit from that knowledge. However, mapping the various open source projects to the original papers and identifying the tasks these projects solve is not always easy. ...

June 2009 Apache Hadoop Get Together @ Berlin

June 21, 2009
Hadoop

Just a brief reminder: Next week on Thursday the next Apache Hadoop Get Together is scheduled to take place in Berlin. There are quite a few interesting talks scheduled: Torsten Curdt: “Data Legacy - the challenges of an evolving data warehouse”; Christoph M. Friedrich, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI): “SCAIView - Lucene for Life Science Knowledge Discovery”. ...

Scrum Table Berlin

June 21, 2009
Scrum, Event, General

Last week I attended the Scrum Table Berlin. This time around Philippe gave a presentation on “backlog colours”, that is, the types of work items tracked in the backlog. The easiest type to track is features, that is, items that generate revenue and are on the wishlist of the customer. The second type of item he sees is infrastructure items, that is, things needed to implement several features but invisible to the customer. ...

Open Street Map @ FSFE meetup

June 21, 2009
Open Street Map, Free Software, FSFE, General

At the last meeting of the local FSFE group here in Berlin, Sabine Stengel from cartogis gave a presentation on Open Street Map. Instead of focussing on the technical side, she described the legal issues and showed the broad variety of commercial projects that are possible with this type of mapping information. It was interesting to learn how detailed and high quality the information provided by volunteers really is. ...

Keeping changesets small

June 21, 2009
svn, Hacking

One trick of successful and efficient software development is tracking changes to the sources in a source code management system, be it a centralized system like svn or perforce or a decentralized system like git or mercurial. I started using svn during my Diploma thesis project in 2003 and continued to use this system while working as a researcher at HU Berlin. Today I am using svn at work as well as for Apache projects and have come to like git for personal sandboxes. ...

Scrum Tisch

June 4, 2009
Scrum

Title: Scrum Tisch
Location: Divino FHain
Description: Philippe will present his speech from the Orlando Scrum Gathering where he will speak about backlog and time-box, about value versus cost, about visible features versus invisible features (and in particular software architecture), about defects and technical debt, and more generally about release planning and sprint planning for non-trivial and long-lived software development projects.
Start Time: 18:00 ...

Ken Schwaber in Berlin XBerg

May 24, 2009
software development, management, Scrum, General

Last week I attended a discussion meetup with Ken Schwaber in Berlin/Kreuzberg. The event was scheduled at rather short notice; still, quite a few developers and project managers from various companies in Berlin showed up. Ken started with a brief summary of the history of Scrum: Before there was such a thing as an IT industry, programming actually was a lot of fun. But somehow the creative job was quickly turned into something people tend to suffer from, as people tried to apply principles from manufacturing industries to software “production”. ...