Building online communities - from the 0MQ trenches

2013-11-13 21:38
Over the past couple of years I have seen several talks on how open source communities are organised at FOSDEM, on how to license open source software strategically at Chemnitzer Linuxtage and on how to nurture open source communities at Berlin Buzzwords. As a result, during the past year or so I've come to read quite a few articles and books on the art of building online communities. It all started with the now famous video on poisonous people, a talk given by Brian Fitzpatrick and Ben Collins-Sussman. Starting from there I went on to read their book "Team Geek" - a great read not only if you are working in the open source space but also if you have to deal with tech geeks and managers on a daily basis as part of your job.

I continued the journey by reading "Producing Open Source Software" - a book commonly recommended to those trying to understand how to run open source projects. Even though I started Apache Mahout back in 2008, first got in touch with the Nutch/Lucene community in 2004 and wrote my first mails to realtime Linux mailing lists to ask for help with some university assignment as far back as, I guess, 2001, the book still contained many ideas that were new and valuable to me. Most important of all, it presented most of the important aspects of running an open source project in a very concise, nicely presented format.

After going to a talk on engineering a collaborative culture in the midst of flame wars (including a side note on how to turn even trolls into valuable community members that help newcomers substantially), given by Kristian Koehntopp at FrOSCon earlier this year, I started reading a book that he recommended: "Building Successful Online Communities", published by MIT Press.

Many of these texts come from people that either have an Apache background one way or another - or are of a more general nature. Yesterday I was happy to take the ZeroMQ guide (also available on dead trees and as a GitHub project you can contribute to) that Pieter Hintjens had kindly given to my husband earlier this year during FOSDEM and find a whole chapter on how he manages ZeroMQ.

The text is unique in that iMatix got into a very influential position in the project very early on. However, based on decades of open source experience, Pieter managed to avoid many of the mistakes beginners make from the very outset. Also, having built several online communities before (ranging from open source projects to the NGO FFII), he deliberately designed the ZeroMQ development in a way that would encourage a healthy community.

There are several essential aspects that I find interesting:

The ZeroMQ development model is explicitly codified - they call this C4: After the painful experience of discussing rules that seemed obvious but had never been spelled out, the development team came up with a protocol for developing ZeroMQ - the wording of the protocol definition follows the conventions used for writing IETF RFCs. Many rules at Apache are not written down - especially when explaining how the Apache Way works to new projects in the incubator this becomes obvious again and again. Granted - apart from a handful of core values - Apache projects are essentially free to define their own way of working. However, even within one project your mileage may vary depending on who you ask how things are done. This makes it hard for newcomers to understand what's going on - but it can also become an issue when problems arise.

A concept that I find interesting about the way ZeroMQ works is the separation between maintainers and contributors: Maintainers are people who pull code into mainline - contributors are those doing the actual coding. Essentially this means that in order to get a patch in, at least two people need to look at it. This isn't too different from a review-then-commit policy - just enforced and written down as good practice. It helps avoid panic errors of people committing code in a hurry. But it also makes sure that those writing code actually get the positive feedback they deserve - which in turn might help avoid fast contributor burnout.

Also this kind of split in roles makes sure that there are no people with special privileges - just because someone has commit access to the main repository doesn't mean they can take any shortcuts process-wise: They still have to come up with a decent problem description, file a ticket, create a patch, submit the patch through a pull request and have it reviewed like anyone else. I found it interesting that, though ZeroMQ was initiated and is backed by iMatix, Pieter considers it very important to keep a balance of power and to delegate both coding and design decisions to non-iMatix contributors.
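The rules described above can be sketched as a small check. This is a hypothetical illustration based only on my reading of the text here, not part of C4 itself or of any ZeroMQ tooling: the names `PullRequest` and `can_merge` and all field names are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical record of a patch submitted for review."""
    author: str
    issue_id: str           # ticket describing the problem being solved
    problem_statement: str  # short description of that problem


def can_merge(pr: PullRequest, merger: str, maintainers: set[str]) -> bool:
    """Return True if `merger` may pull `pr` into mainline.

    Encodes three rules from the text above: only maintainers merge,
    every patch needs a ticket and a problem description, and nobody
    merges their own patch, so at least two people look at each change.
    """
    if merger not in maintainers:
        return False
    if not pr.issue_id or not pr.problem_statement.strip():
        return False
    return merger != pr.author
```

Note that in this sketch a maintainer who writes a patch is treated exactly like any other contributor: someone else has to pull it in.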

With iMatix being a small company, the decision to make ZeroMQ an LGPL-licensed project is a very clear one. It's the only way to ensure that downstream users cannot just take the project, make modifications to it, re-package and ship it to users without the accompanying source code under the same license. In turn this makes it much more likely that even capable users contribute to upstream. Of course taking the idea itself and turning it into some proprietary project would still be very possible. However, the one thing that sets ZeroMQ apart from other efforts is not the source code or the architecture alone - it's the way the community works and blossoms.

One part where this choice of license is particularly handy is the deliberate decision not to go through any copyright assignment process. Instead each patch gets licensed to the project under the regular LGPL terms. This means that even should iMatix one day be sold or change its mind, re-licensing the whole project would be extremely hard. The impact on the community is clear: It makes sure that contributors' patches remain their own - including all merit and praise that comes with them. This approach prevents re-licensing but encourages a sense of shared ownership. Essentially this model of copyright handling is not unlike the way the Linux kernel works.

The last point that I found important is the way the project itself is structured: Instead of having everyone work on one single project, ZeroMQ makes it easy to write extensions to the core library. There is a whole guide on how to write language bindings. Those writing these bindings aren't regulated at all - the bindings are hosted in their own repositories, with their own governance if they want - in the end it's up to the user to decide which ones are good and which ones will never become popular. In turn this led to many people contributing indirectly to the value of ZeroMQ in significant ways. This is not unlike other projects: Apache HTTPd provides APIs to write modules against. ElasticSearch provides a clean REST API that encourages people speaking other languages to develop plugins that will translate the REST API into whatever their preferred language is. Open/Libre Office deliberately encourages writing extensions and plugins - even providing hosting facilities where users can search for and download extensions from third parties.

I leave it as an exercise to the reader to check out the whole book. Even the community chapter alone contains several other interesting concepts: The experience ZeroMQ had with actively encouraging even developers with commit access to the main repository to work with forks instead of feature branches for experimental development, the trouble they went through with making backwards-incompatible changes to user-facing APIs way too often, and the exact definition of the C4 development process.

Overall a really interesting perspective on open source development from the trenches, with lots of experience to back the advice given. It is well worth reading if you are interested in learning more about how open source projects work - and if you are using any, you definitely should be; otherwise you are betting part of your business on something you do not understand, which generally isn't the best of ideas.

ApacheConNA: Meet the indian tribe

2013-05-08 20:10
ApacheCon is the "User Conference of the Apache Software Foundation". What
should that mean? If you are going to ApacheCon you have the chance of meeting
committers of your favourite projects as well as members of the foundation
itself. Though there are a lot of talks that are interesting from a technical
point of view the goal really is to turn you into an active member of the
foundation yourself. This is true for the North American version even more than
for the European edition.

Though why should you as a general user of Apache software be interested in
attending then? Pieter Hintjens put it quite nicely in an interview on his
latest ZeroMQ book with O'Reilly:

If you are using free software in particular in commercial setups you really do
want to know how the project is governed and what it takes to get active and
involved yourself. What would it take to move the project into a direction that
fits your business needs? How do you make sure features you need are actually
being added to the project instead of useless stuff?

ApacheCon is the conference to find out how Apache projects work internally,
the place to be to meet active people in person and put faces to names. Lots of
community building events focus on getting newbies in touch with long term

How to get your submission accepted at Berlin Buzzwords

2013-05-07 11:21
Disclaimer: Intentionally posting on my private blog - these are my own criteria, not general advice from the review committee.

Berlin Buzzwords is in its fourth year. Probably the most tedious task of all is having to select the talks that make it into the final schedule. With roughly 120 submissions and roughly 30 slots to fill, the result is that three quarters of all submissions have to be rejected. Last year I shared some details on how we do talk ranking once reviewers have provided their input.

Now that the mechanics of ranking are clear, people have asked me what goes into the reviews themselves. Here I can only speak for myself: After doing the reviews ourselves during the first two years, Simon, Jan and I decided to spread the work of reviewing submissions among a larger team of people. As nearly all of them had already attended Berlin Buzzwords in the past (or had at least followed the conference remotely) we could assume they were roughly familiar with what kind of content would be a good fit. As a result the review guidelines that we send out tend to be rather light:

Berlin Buzzwords is a conference from geeks for geeks: The goal is to get the people actively working in the field together to meet and exchange ideas. Content should have some technical depth - in particular pure marketing talks and obvious product placements without further technical value are not welcome. We usually invite both interesting case studies and talks highlighting the technical details a project is built upon.

In the end judgement is up to the individual reviewer - so I can speak only for myself when listing what you should do to get your talk accepted.

  • Be on topic. There's always a handful of submissions that look and sound like pure marketing, product placement or simply aren't related to software engineering at all. Those tend to be easy to spot and weed out.
  • Tell us what you are talking about. An abstract is there to provide some detail on your presentation - don't be just funny, promising overly generic content. In order to decide whether or not your talk is relevant please provide some details on which direction you'll be heading.
  • Don't be too detailed in the abstract either - there's no need to list the content of every slide. Make sure the abstract correctly summarizes your talk. Making it catchy and nice to read usually helps if the content is solid.
  • We try to find those speakers that not only have an interesting topic to talk about but are also a pleasure to listen to and can successfully get their point across. We cannot know every potential speaker in person though. As a result it helps if you list which conferences you've spoken at in the past; any videos of previous talks are helpful as well. As a general piece of advice: Choosing Berlin Buzzwords as the first conference you ever speak at usually is a great recipe for disaster. Get some practice at local meetups like the Berlin Hadoop Get Together, the data science day, the Java User Group Berlin Brandenburg, the RecSys Stammtisch Berlin or the MongoDB User Group Berlin, to name just a few.
  • Make sure your talk is novel - submitting the same topic in 2012 and 2013 is a great way to ensure getting rejected. Also it is fine to submit a talk you have given at another conference earlier. However if everyone in the Buzzwords audience is very likely to have watched the exact same version of your presentation earlier already, we are less likely to accept your talk.
  • Finally: When drafting your bio make sure to include details that explain why you are the perfect expert to talk about the topic at hand. As much as I'd like to I don't know every project's committer by name. Provide some help by pointing out explicitly what your contributions have been or in what context you have used the technology you are presenting. Don't be shy to list that you are a co-founder of a successful project. Not only does this information help with selecting talks, it also provides some background for the audience to judge the claims you make.

Two words on the role of free software at Buzzwords: There is no explicit requirement to only talk about software that is publicly available under a free software license. However, if some project or framework is presented, it helps if it is open source, as that raises the applicability for the audience. Most projects discussed at Berlin Buzzwords are developed openly. In order to get the maximum out of these projects it pays to know how they work internally, how to get active yourself, how to contribute. As a result discussions and talks on project governance are generally welcome.

A parting note: With way more than half of all submissions to reject, making a final decision will always be hard. Being rejected doesn't necessarily mean that your proposal was bad. Following the above advice may raise your chances of being accepted - however it is no guarantee. We could raise the number of accepted talks by extending the conference by another track or even another day - at the cost of raising the ticket price substantially. However we want not only "big corp representatives" but a diverse audience, attendees that get active themselves, that help shape the conference:

There's plenty of space and time to get active in addition to the main conference program. Use the time and space to shape the conference.


2009-03-04 08:55
This is the blog of Isabel. I am a committer at Apache Mahout. In my free time I am working on Apache Mahout, organising the Apache Hadoop Get Together in Berlin and speaking at various conferences explaining the ins and outs of Hadoop in general and Mahout in particular.

Disclaimer: I am writing this blog with my “Apache hat” on my head. The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way. Except when explicitly stated otherwise activities and events described are not related to my daytime job.