FOSDEM - Sunday - smaller bits and pieces

2011-02-18 20:17

With WebODF the Office track featured a very interesting project that focusses on providing a means to open ODF documents in your favourite browser: Content and formatting are converted to a form that can easily be dealt with by using a combination of HTML and CSS. Advanced editing is then supported by using JavaScript.

With OpenStack the following talk focussed on an open cloud stack project that was started by NASA and Rackspace, as both simultaneously needed an open source, openly designed and openly developed cloud stack that strives for community inclusion. According to the speaker the goal is to be as ubiquitous a cloud project as Apache is for web servers - he probably was not quite aware of how close that development model is even to the foundation side of Apache.

The closing keynote dealt with the way kernel development takes place. There were a few very interesting pieces of information for contributors that are valid for any open source project really:

  • Out-of-tree code is invisible to the kernel developers and users. The longer code stays out of tree, the harder it becomes to actually go out there and feel the wind.
  • In contrast, open code means giving up control: Maintainership means responsibility, but it does not come with any power or control over the source code. Similarly, opening code up as a patch or as a separate project at Apache means giving up control - it means working towards turning the project into a community that can live on its own.
  • For kernel patches the general rule is to not break things and not go backward in quality: What is working for users today must be working with the next release as well. To be able to spot any compatibility issues it is necessary to take part in the wider discussion lists - not only in your limited development community. Developers should focus on coming up with a solution to the problem instead of on getting their original code into the project.

Or in short: The kernel is no research project; as such it must not break existing applications. Visionary brilliance really is no excuse for poor implementation. Conspiracy theories such as "hey, developer x declined my patch only because it is out of scope for his employer's goals" are not going to get you anywhere. Such things do happen, but in general kernel developers first think of themselves as kernel developers - being an employee somewhere only comes after that.

Keep in mind that the community remembers past actions. In the end you need not convince business people or users but the developers themselves, who might end up with the maintenance burden for your patch. To get your patch accepted it greatly helps to not express it in terms of the implementation needs only but to clearly formulate your requirements - independent of implementation. And as in any open source project, helping with cleanup (that is not only white space fixes, but real cleanup as in refactoring) does help build a positive attitude.

Why should you go for kernel development nevertheless? It's a whole lot of fun. It's a way to influence the kernel to support the features that you need. It's sort of like becoming part of an elite club - and which developer does not like the feeling of belonging to the elite changing the way the world looks tomorrow? In addition, as with any substantial open source involvement, being visible in the kernel community also most likely means being visible to your future employer.

FOSDEM - HBase at Facebook Messaging

2011-02-17 20:17

Nicolas Spiegelberg gave an awesome introduction not only to the architecture that powers Facebook messaging but also to the design decisions behind their use of Apache HBase as a storage backend. Disclaimer: HBase is being used for message storage; for attachments a different backend, Haystack, is used.

The reasons to go for HBase include its strong consistency model, support for auto failover, load balancing of shards, support for compression, atomic read-modify-write support and the inherent Map/Reduce support.
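
To make the "atomic read-modify-write" point a bit more tangible, here is a minimal sketch in Python. It is an in-memory toy, not an HBase client and not what Facebook runs: the class ToyRow, its methods and the column name "unread" are all made up for illustration. The idea shown is only that the store itself performs the read, the change and the write as one step, so concurrent writers cannot lose each other's updates.

```python
import threading


class ToyRow:
    """In-memory stand-in for a single row (hypothetical, not an HBase API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cells = {}

    def get(self, column):
        with self._lock:
            return self._cells.get(column)

    def check_and_put(self, column, expected, new_value):
        """Write new_value only if the cell still holds the expected value."""
        with self._lock:
            if self._cells.get(column) != expected:
                return False
            self._cells[column] = new_value
            return True

    def increment(self, column, amount=1):
        """Read, add and write back under one lock, so no update is lost."""
        with self._lock:
            value = self._cells.get(column, 0) + amount
            self._cells[column] = value
            return value


def count_in_parallel(row, column, workers=8, per_worker=1000):
    """Let several threads bump the same counter and return the final value."""

    def work():
        for _ in range(per_worker):
            row.increment(column)

    pool = [threading.Thread(target=work) for _ in range(workers)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return row.get(column)
```

With a plain "read the value, add one, write it back" done on the client side, two writers could read the same old value and one increment would silently disappear; doing the whole step inside the store avoids that.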

When going from MySQL to HBase some technological problems had to be solved: Coming from MySQL basically all data was normalised - so in an ideal world, migration would have involved one large join to port all the data over to HBase. As this is not feasible in a production environment, all data was instead loaded into an intermediary HBase table, joined via Map/Reduce and then imported into the target HBase instance. The whole setup was run in a dark launch - being fed with parallel live traffic for performance optimisation and measurement.
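
For readers who have not seen a join done via Map/Reduce, the following is a small sketch of the general technique (a reduce-side join), simulated in plain Python without any Hadoop involved. The two tables, their field names and the shape of the resulting row are assumptions for illustration only; the talk did not go into the actual schema or job.

```python
from collections import defaultdict

# Hypothetical normalised rows, as they might come out of two MySQL tables.
threads = [
    {"thread_id": 1, "subject": "FOSDEM plans"},
    {"thread_id": 2, "subject": "Lunch"},
]
messages = [
    {"message_id": 10, "thread_id": 1, "body": "Who is going?"},
    {"message_id": 11, "thread_id": 1, "body": "I am."},
    {"message_id": 12, "thread_id": 2, "body": "Pizza?"},
]


def map_phase(threads, messages):
    """Emit (join key, tagged record) pairs for both inputs."""
    for t in threads:
        yield t["thread_id"], ("thread", t)
    for m in messages:
        yield m["thread_id"], ("message", m)


def shuffle(pairs):
    """Group the tagged records by join key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all records sharing one key into a single denormalised row."""
    subject = None
    bodies = []
    for tag, record in values:
        if tag == "thread":
            subject = record["subject"]
        else:
            bodies.append((record["message_id"], record["body"]))
    bodies.sort()  # keep messages in message_id order
    return {"row_key": key, "subject": subject,
            "messages": [body for _, body in bodies]}


def join(threads, messages):
    """Run map, shuffle and reduce in-process and return the joined rows."""
    groups = shuffle(map_phase(threads, messages))
    return [reduce_phase(key, groups[key]) for key in sorted(groups)]
```

Each joined row could then be written to the target table in one go. The point of doing it this way is that the expensive join runs as a batch job next to the data instead of as one huge query against the production database.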

The goal was zero data loss in HBase - which meant using the Apache Hadoop append branch of HDFS. They re-designed the HBase master in the process to avoid having a single point of failure; backup masters are handled by ZooKeeper. Lots of bug fixes went back from Facebook's engineers to the HBase code base. In addition, for stability reasons, rolling restarts were added for upgrades, along with performance improvements and consistency checks.

The Apache HBase community received lots of love from Facebook for their willingness to work together with the Facebook team on better stability and performance. Work on improvements was shared between teams in an amazingly open and inclusive model of development.

One additional hint: FOSDEM videos of all talks including this one have been put online in the meantime.