ApacheConEU - part 08

2012-11-17 20:53
Jan Lehnardt's talk covered the history of CouchDB - including lessons learnt along the way. The first issue he went into: Shipping 1.0 is hard! They spent a lot of effort and time in order to have a stable database that won't lose your data - only to have a poor patch slip in for 1.0 that resulted in data loss. The flurry of activity that followed was truly amazing - people working in rolling shifts all over the planet to not only fix the issue but also provide recovery tooling for those affected by the bug. The lessons learnt from that are as obvious as they are often neglected: Both test coverage and code review are crucial for any software project.

The second topic Jan went into was the distraction and tension that comes from having a company built around your favourite open source project. When going down this road keep in mind that the whole VC setup usually is very time consuming - the world starts revolving around the need to either gather more VC funding or make up a successful business case to support your company. All of this results in less time spent coding, and in friction around the fact that the corporate interests may not always be what is best for your open source project. In CouchDB the result was the explosion of the project founder, who eventually left the project. This hit CouchDB particularly badly as the project essentially was built around the idea of the one brilliant coder and relied on his information channels for marketing. The lesson learnt was that having communications centralised that way can easily turn against you - don't trust your benevolent dictator.

Usually it is quite ok for users to move on - in particular if the project no longer fits their needs. However, having multiple key people leave at the same time can be detrimental, in particular if they are the vocal ones. In terms of lessons learnt: Embrace the fact that your software will fail some people. Use the resulting knowledge about your application boundaries - or fix what failed them.

In terms of general advice: The world moved on after any of these cases. What does help is to ship what users need instead of running after the next big hype. Also, good ideas will stick - using JSON as format and JavaScript for query formulation did make it into many other applications, with the former also making it into the next SQL standard to be released in 2015. The goal should be to build stuff that is easy (and fun) to use.

In the meantime CouchDB grew up. Not only does it have another release and a new web site. It has turned into a project that is no longer a thing pushed forward by a single person but that moves on its own. The secret behind that development is to acknowledge that having just a few people in the leading position will burn them out - make sure to enable others and that your strong leaders get to lead. Oh, and like any Apache project, CouchDB is happy about any new contributor joining the project.

When it comes to communication the Apache incubation process made sure to burn the "everything happens on the mailing list" mantra into their minds. Still, IRC was a valuable way of communication for non-decision stuff like user support and community building. IRC is fun - in particular when you can train IRC bots based on earlier communication to automatically answer incoming user questions.

Another option CouchDB used to fix the community issues was to meet with people face to face - for three days in Boston, later in Dublin, later in Vienna. In addition they added a roadmap for the next 2 to 3 years including points like:

  • faster releases - they switched to time-based instead of feature-based releases, except for security patches
  • they are the first to use git@apache to make branching and merging easier
  • they are github lovers with pull requests ending up on their dev list
  • they enabled an Erlang beginners question list in order to be able to recruit new contributors in a world of lacking Erlang developers. A very specific result of that was that people are much more comfortable asking even simple questions - and on a more practical note one question for the bird's-eye view of CouchDB resulted in Jan spending an hour and a half drawing up that particular picture: Spending an hour on docs to get to really new people is time well spent.


In terms of PMC chair lessons learnt: The goal should be to get the right people to care about the right thing. Having people finish stuff helps - and is infectious.

In the end, as an open source project your biggest asset is your community. Motivating more people to join is key. If JIRA is one step too much for your target audience, talk to infra to figure out how to make things better (and help them with the solutions).

What is fascinating about CouchDB is the whole ecosystem around the project. CouchDB is not just a database project hosted at Apache. It comes with a really well working replication API. There are implementations in JavaScript running in browsers, there's BigCouch (Dynamo in Erlang on top of CouchDB), there is an iOS app, there is PouchDB (the couch for your pocket), TouchDB (iOS and Android implementations on top of SQLite). The fun part to watch is that the idea is bigger than the project at Apache. The bigger the ecosystem the better for the community - there's no need to fold everything into the original project.
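To give an idea of how small the surface of that replication API is, here is a minimal sketch in Python that builds (but does not send) a request to CouchDB's `_replicate` endpoint. This was not part of the talk; the server address, the database name `notes` and the backup host are made up for illustration, and the helper `build_replication_request` is hypothetical.

```python
import json
import urllib.request


def build_replication_request(server, source, target, continuous=False):
    """Build (but do not send) a POST request for CouchDB's /_replicate endpoint.

    `server`, `source` and `target` are illustrative values supplied by the
    caller; nothing here contacts a real CouchDB instance.
    """
    body = {"source": source, "target": target}
    if continuous:
        # Ask CouchDB to keep replicating as new changes arrive.
        body["continuous"] = True
    return urllib.request.Request(
        url=server.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical setup: replicate a local "notes" database to a backup server.
request = build_replication_request(
    "http://localhost:5984",
    source="notes",
    target="http://backup.example.org:5984/notes",
    continuous=True,
)
```

Passing `request` to `urllib.request.urlopen` would actually trigger the replication on a running server. Source and target are just database names or URLs, so anything that speaks the same protocol can sit on either end.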

And of course CouchDB is hiring, too.

Scalability

2010-06-23 11:17
For Berlin Buzzwords we concentrated pretty heavily on scalable systems and architectures: We had talks on Hadoop for scaling data analysis; HBase, Cassandra and Hypertable for scaling data storage; Lucene and Solr for scaling search.

A recurring pattern was people telling success stories about projects that involve either large amounts of data or growing user numbers. Of course the whole topic of scalability is extremely interesting for ambitious developers: Who would not be happy to solve internet-scale problems, have petabytes of data at their fingertips or tell others that their "other computer is a data center"?

There are however two aspects of scalability that people tend to forget pretty easily: First off, if you are designing a new system from scratch that implements a so far unknown business case - your problem most likely is not scalability. It's way more likely that you have to solve marketing tasks, just getting people to use your cool new application. Only after observing what users actually do and use do you have the slightest chance of spotting the real bottlenecks and optimising with clear goals in mind (e.g. reduce database load for user interaction x by 40%).

The second issue people tend to forget about scalability is that the term is about scaling systems - some developers easily mix that up with high performance. The goal is not to be able to deal with a high workload, but to build a system that can deal with an increasing (or decreasing) workload. Ultimately this means that it is not enough for your technology to be scalable: Any architecture can only scale to a certain load. The organisation building the system must be willing to continuously monitor the application they built - and be willing to revisit architectural decisions if the environment changes.

Jan Lehnardt had a very interesting point in his talk on CouchDB: When talking about scalability, people usually look into the upper right corner of the application benchmark. However, to be truly scalable one should also look into the lower left corner: Being scalable should not only mean being able to scale systems up - but also being able to scale them down. In the case of CouchDB this means that not only large installations at the BBC are possible - but running the application on mobile devices should be possible without problems as well. It's an interesting point in the current "high scalability" hype.

Dev House Berlin 2.0

2009-10-04 20:04
This weekend DevHouseBerlin took place in the Box119, kindly organized by Jan Lehnardt and sponsored by Upstream and StudiVZ. There were about 30 people gathered in Friedrichshain, hacking and discussing various projects: Mostly Python/Django, Ruby/Rails and Erlang people.

The first day was reserved for hacking and exchanging ideas. In the late afternoon attendees put together a list of talks that were then rated and ranked, with the top three chosen for presentation on Sunday. The list included topics on CouchDB, RestMS, Hadoop, Concurrency in Erlang, P2P CouchDB and many more. The first three topics were chosen by the participants for presentation.

During the time at DevHouse I finally got a list of topics and papers up at the Mahout TU project - now only the exact credit system for the Mahout course at TU is missing. I got some time to work on Mahout improvements and documentation. Unfortunately I was too tired today to complete the code review for MAHOUT-157 - I promise to do that early next week.

Spending one weekend with like-minded people and being able to pair with someone else on more complex problems made the weekend a great time for me. I am planning to be there again next year. Thanks to the sponsors and organisers for making this happen.