ApacheConEU - part 03

2012-11-12 20:27
Tuesday started early with a plenary run by the sponsor. There was not much news, except for the very last slide, which raised a question that is also discussed often within the ASF: how to define oneself compared to non-ASF projects. What is the real benefit for our users - and what is the benefit for people of going with the ASF? The speaker concentrated on pointing out the difference to github. Yes, tooling changes are turning into a real game changer - but that is nothing that the foundation could not adopt over time. What I personally find interesting is not so much what makes us different from others but rather what can be learnt from other projects - not only on github but also in a broader scope from the KDE, Python, Gnome, Debian and Open/Libre-Office communities, from people working on smaller but nonetheless successful projects as well as the larger foundations, maybe even the corporate-driven projects. Over time lots of wisdom has accumulated within and outside of the foundation on how to run successful open source projects. The question now is how to transfer that knowledge to relatively young projects, and whether that can help given the huge amount of commercial interest in open source - not only in using it but also in driving individual projects, including all the benefits (more people) and friction that come with it.

The first talk I went to was Rainer Jung’s presentation on the new Apache httpd release. Most remarkably, the event mpm, previously available as an “experimental” feature, is now marked as the default for the Apache distribution - though it is not being used for ssl connections. In addition there is support for async write completion and better support for sizing and monitoring. In particular when sizing the event mpm the new scoreboard comes in handy. When using it, remember to adjust the number of allowed open file handles as well.

In order to better support server-to-client communication, html5 web socket standardisation is on its way. If you are interested in that, check out the hybi standardisation list. Taking a look at Google’s SPDY could also be interesting.

Since 2.4 dynamically loadable modules are supported and easy to switch. When it comes to logging there is now support for sub-second timestamp precision and per-module log levels. Process and thread ids are kept in order to be able to untangle concurrent connection handling. There are unique message tokens in the error log to track requests. The error log format is also configurable - including trace levels one to eight, configurable per directory, location and module.
They’ve added lots of trace messages to core, correlation ids between error and access log entries (format entry %L). In addition there is a mod_log_debug module to help you log exactly what you want when you want.
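
To illustrate what such correlation ids make possible, here is a minimal sketch that groups error log lines under the access log line sharing the same id. The line layout is an assumption made up for this example (both logs are assumed to carry the id as a token of the form "id=<token>"); real formats depend entirely on how the log formats are configured, and the function name correlate is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: both logs were configured to emit the request log id
# as a token of the form "id=<token>". This layout is hypothetical.
LOG_ID = re.compile(r"\bid=(\S+)")


def correlate(access_lines, error_lines):
    """Map each access log line to the error log lines sharing its id."""
    errors_by_id = defaultdict(list)
    for line in error_lines:
        match = LOG_ID.search(line)
        if match:
            errors_by_id[match.group(1)].append(line)

    result = {}
    for line in access_lines:
        match = LOG_ID.search(line)
        if match:
            # Requests without any error log entries map to an empty list.
            result[line] = errors_by_id.get(match.group(1), [])
    return result
```

Error lines whose id does not show up in the access log are simply dropped in this sketch; a real tool would probably want to report them as well.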

Speaking of modules - in order to upgrade them from 2.2 to 2.4 it’s in general sufficient to re-compile. With the new version though not all modules are going to be loaded by default anymore. New features include dynamic configurations based on mod_lua. AAA was changed again, and there are filters to rewrite content before it’s sent out to clients (mod_substitute, mod_sed, mod_proxy_html). mod_remoteip helps to keep the original ip in your logs instead of the proxy ip.

When it comes to documentation better check the English documentation - or better yet provide patches to it. mod_rewrite and mod_proxy improved a lot. In addition the project itself now has a new service for its users: via comments.apache.org you can send documentation comments to the project without the need to register for a bugzilla account and provide documentation patches. In addition there is now syntax highlighting in the documentation. One final hint: the project is very open and actively looking for new contributors - though they may be slow to respond on the user and dev list, they definitely are not unfriendly ;)

ApacheCon EU - part 02

2012-11-11 20:26
For me the week started with the Monday Hackathon. Even though I was there early the room quickly filled up and was packed at lunch time. I really liked the idea of having people interested in a topic register in advance - it gave the organisers a chance to assign tables to topics and put signs on the tables to advertise the topic worked on. I'm not too new to the community anymore and can relate several faces to names of people I know are working on projects I'm interested in - however I would hope that this little bit of extra transparency made it easier for newcomers to figure out who is working on what. Originally I wanted to spend the day continuing to work on an example showing what sort of pre-processing is involved in order to get from raw html files to a prediction of which Berlin Buzzwords submission is going to be accepted. (Un-?)fortunately I quickly got distracted and drawn into discussions on what kind of hardware works best for running an Apache Hadoop cluster, how the whole Hadoop community works and where the problem areas are (e.g. constantly missing more helping hands to get all things on the todo list done).





The evening featured a really neat event: Committers and Hackathon participants were invited to the committer reception in the Sinsheim technical and traffic museum. One interesting observation: There's an easy way to stop geeks from rushing over to the beer, drinks and food: Just put some cars, motor cycles and planes in between them and the food ;)

ApacheConEU - part 01

2012-11-10 14:30
Apache Con EU in Germany - in November, in Sinsheim (in the middle of nowhere): I have to admit that I was more than skeptical whether that would actually work out very well. A day after the closing session it's clear that the event was a huge success: All tickets were sold out days in advance, and there were six sessions packed with great talks on all things related to Apache Software Foundation projects - httpd, tomcat, lucene, open office, hadoop, apache commons, james, felix, cloud stack and tons of other projects were well covered. In addition the conference featured a separate track on how the Apache community works.





The venue (the Hoffenheim soccer team home stadium) worked out amazingly well: The conference had four levels rented with talks hosted in the press room, a lounge and two talks on each of the first and second floor in an open space setup. That way entering a talk late or leaving early was way less of a hassle than when having to get out the door - sneaking into interesting talks on the second floor was particularly easy: From the third floor, which was reserved for catering, one could easily follow the talks downstairs. Speaking of catering: Yummy and available all the time - and that goes not only for water but also for snacks (e.g. cake between breaks), coffee, soft-drinks, tea etc. On top of that there was a tasty lunch buffet with all sorts of more or less typical regional food. You've set high standards for upcoming conferences ;)

Speaking at ApacheCon EU 2012

2012-09-15 12:47
I'll be at ApacheCon EU in November. Looking forward to an interesting conference on all things Apache that is finally returning to Europe. Go there if you want to learn more on Tomcat, Hadoop, httpd, HBase, Camel, Open Office, Mahout, Lucene and more.

Now on to prepare the two talks I submitted:


  • "Choosing the right tool for your data analysis task - Apache Mahout in context"
  • "I was voted to be committer. Now what?"


Looking forward to seeing you there.

Apache Con returns to Europe

2012-08-01 20:41
In November Apache Con will come back to Europe. The event will take place in Sinsheim inviting foundation members, project committers, contributors and users to meet, discuss and have fun during the one week event.



Several meetups will be held the weekend before the main conference kicks off, watch out for announcements on your favourite project mailing list.

ApacheCon is still open for submissions until August 3rd - head over to the Call for submissions for more information. The conference is split into several tracks that are being handled individually: Apache Daily - Tools, frameworks and components used on a daily basis, Apache Java Enterprise projects, Big Data, Camel in Action, Cloud, Linked Data, Lucene, Modular Java Applications, NoSQL Database, OFBiz (The Apache Enterprise Automation project), Open Office and finally Web Infrastructure (covering HTTPD, Tomcat and Traffic Server, the heart of many Internet projects).

Make sure to mark the date in your calendar to meet with the people behind the ASF projects, learn more on how the foundation works and what makes Apache projects so particular compared to others. Join us for a week of fun and dense talks on all things Apache.


The Apache Feather logo is a trademark of The Apache Software Foundation.

Talking people into submitting patches - results

2012-01-01 18:42
Back in November I gave a talk at Apache Con NA in Vancouver on talking friends and colleagues into contributing patches to open source projects. The intended audience for this talk were experienced committers to Apache projects, the goal was to learn more on their tricks for talking people into patching. First of all thanks for an interesting discussion on the topic - it was great to get into the room with barely enough slides to fill 10 min and still have a lively discussion 45min later.

For the impatient - the written feedback is available as a Google Doc. The most common advice I heard involved patience, teaching, explaining, fast feedback and reward.

One warning before going into more detail on the talk: All assumptions and observations stated are highly subjective, influenced by my personal experience or by whatever the experience of the audience was. Do not expect an objective, balanced, well-researched analysis of the problems in general. That said, let's start with the talk itself. Before the talk I decided to limit the scope to getting people in who have limited experience with open source. That intentionally excluded downstream projects depending on one's code. Though in particular interaction with common Linux distributions and their package maintainers is vital, that issue warrants a separate talk and discussion.

I divided those inexperienced with open source into three groups to keep discussion somewhat focused:

  • Students learning about open source projects during their education who have neither a background in software engineering nor in open source but are generally very eager to learn and open to new ideas.
  • Researchers learning about the concept as part of a research grant who have some software engineering experience, some experience with open source - in particular with using it - but in general do not have writing open source software as their main objective, but have to participate as part of their research grant.
  • Software engineers who have experience with software engineering, some experience with open source (in particular with using it), and in general both strong opinions on what the right way of doing things is and a strong position in their team - a position that helps them in no way when starting to contribute.


One very common way



To understand some of the issues below let me first highlight what seems to be the most common way to become involved with any Apache project: Usually it starts with using one of their software packages. After some time what is shipped no longer fits your needs, reveals bugs that stop you from reaching your goals or is missing one particular feature - even if that is just one particular method being protected instead of private.

People fix those issues. As the best software developers are utterly lazy they contribute stuff back to the project to avoid the work of having to maintain their private fork just for some simple modification. The more features of a project are being used, the more likely it gets that larger contributions become possible as well. Overall this way of selecting issues to fix has a lot to do with scratching your own itch. In the end this kind of issue prioritisation also influences the general direction of a project: Whatever is most important to those actively contributing is driving the design and development. So the only way to change a project's direction to better fit your needs is to start getting active yourself: Those that do are the ones that decide.

Students



Let's take a closer look at students aspiring to work on an open source project. They are very keen on contributing new stuff, learning the process and open to new ways of doing things. However for the most part they are not active users of the projects they selected so they do not directly see what is important to fix. In addition they have only limited software development experience - at least when looking at German universities, bug trackers, source version control, build systems, release management, maintaining backwards compatibility and unit test frameworks are not on any curriculum - and most likely shouldn't be either. So your average student has to learn to deal with checking out code, compiling it, getting it into their favourite editor, adding tests and making them pass.

Apart from teaching and giving even simple feedback, it helps to provide the right links to literature at the right times, and generally to mentor students actively. In addition it can be helpful to leave non-critical, easy to fix issues open and mark them as "beginner level" to make it easier for new-comers to get started. One last piece of advice: Get students to publish what they do as early and as often as possible. Back in the days I used to do projects at TU Berlin with the goal of getting students to contribute to Mahout. In the first semester I left the decision on when to open up the code to the students - they never went public. In the second semester I forced them to publish progress on a weekly basis (and made that part of how their final evaluation was done) - suddenly what was developed turned into a patch committed to the code base.

Researchers



A second group of people that has an increasing interest in open source projects are researchers. In particular for EU research grants the promise of providing results and software developed with the help of European tax-payers' money under an open source license has become an important plus when asking for project grants.

However before becoming all too optimistic it might make sense to take a closer look: Even though there is an open source check box on your average research grant that by no means leads to highly motivated, well educated new contributors for your project: With software development only being a means to reach the ultimate goal of influential publications researchers usually do not have the time and motivation to polish software to the level needed for a successful and useful contribution. In addition the concept of maintaining your contribution for a longer time usually does not fit the timeline and timeframe of a research project.

Apart from teaching and mentoring, projects themselves should start asking for the motivation of the contribution. There are a few popular arguments to contribute patches back. However not all of them really work for the research use case: The cost of maintaining a fork is close to zero if you intend to never upgrade to a new version and do not need security fixes. Another common argument is an improved visibility of your work and an improved reputation of yourself as a software developer. If software development for you is just a means to reach a much higher goal those arguments may not mean much to you. A third common argument is that of improving code quality by having more than one pair of eyes review it - and where would you get a better review than in the project bringing together the original code authors? However if ultimate stability, security and flexibility is not your goal then that may not mean much to you either.

Key is to find out where the interest for working on open source comes from and build up arguments from there.

Software engineers



The third group I identified was professional software developers - as clarified after a question from the audience: Yes, I consider people who are unable to create, read or apply patches to be professional software developers. If I excluded these people there would be no one left who earns a living with software development and does not already work on open source projects.

In contrast to the above groups these people have extensive software development experience. However that also means that after having seen a lot of stuff that works and that does not work they do have a strong position in their teams. Usually those fixing issues in libraries they use are the ones that have established work-flows that work for them very well and who are used to being pretty influential. When going into an open source community however no-one knows them. In general they are only judged based on their patch. They get open feedback - in the context of that project. Projects tend to have established coding guidelines, best practices, build systems - that may differ from what you are used to in your corporate environment.

Getting up to speed in such an environment can be intimidating, in particular if everything you do is public, searchable and findable by definition. That makes it all the more important to get involved and get feedback early, even by putting early sketches of your plan online.

However with everything being open there is also one major positive side to motivating contributors: Give credit where credit is due - add praise to the issue tracker by assigning issues to the one providing the patch, add the name of the contributor to your release notes. When substantial, mention the contribution with name in talks, presentations and publications.

Another important issue here is the influence of deadlines: If it takes half a year to get feedback on your particular improvement the reason why you made it may no longer exist - the project may have been cancelled, the developer moved to a different team, the patch applied internally as is fixing the existing issues. Fast feedback on new patches, in particular if they are clean and come with tests is vital. One positive example for providing feedback on formal issues quickly is the automated review bot at Apache Hadoop: It checks stuff like style, addition of tests, checks against existing tests and the like quickly after the patch is submitted in an automated way. Just one nitpick from the audience: The output of that bot could be either marked more clearly as "this is automated" or the text formulated a bit friendlier - if a human had done the review it would have mentioned the positive things first before criticising what is wrong.
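
To make the idea of such automated checks a bit more concrete, here is a minimal sketch of a patch pre-check. It is not how the Hadoop bot works - the function name review_patch, the two checks and the "[automated]" prefix are all assumptions made up for illustration. It takes the audience's nitpick into account by clearly marking its output as automated and listing the positive findings first.

```python
def review_patch(diff_text, max_line_length=100):
    """Run two rough checks on a unified diff and return the findings.

    Hypothetical heuristics: a patch "has tests" if any touched path
    contains "test", and added lines should not exceed max_line_length.
    """
    lines = diff_text.splitlines()
    # Paths of the files the patch modifies ("+++ b/path" header lines).
    touched = [line[4:].strip() for line in lines if line.startswith("+++ ")]
    # Content of added lines, without the leading "+" marker.
    added = [
        line[1:]
        for line in lines
        if line.startswith("+") and not line.startswith("+++ ")
    ]

    positives, problems = [], []
    if any("test" in path.lower() for path in touched):
        positives.append("patch touches test files")
    else:
        problems.append("patch does not touch any test files")

    too_long = [line for line in added if len(line) > max_line_length]
    if too_long:
        problems.append(
            f"{len(too_long)} added line(s) exceed {max_line_length} characters"
        )
    else:
        positives.append("all added lines within the length limit")

    # Mention the positive things first, then what needs fixing.
    return ["[automated] +1 " + p for p in positives] + [
        "[automated] -1 " + p for p in problems
    ]
```

Running something like this right after a patch is attached gives the contributor formal feedback within minutes and leaves the human reviewers free to concentrate on the substance of the change.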

Last but not least (applies to researchers as well), there may be legal issues lurking: Most if not all contracts entail that at least what you do during working hours belongs to your employer - so it's up to them what gets open sourced and what doesn't. Suddenly your very technical new contributor has to convince management, deal with legal departments and work his way through the employer's processes - most likely without deep prior knowledge on open source licenses - let alone contributor agreements (or did you know what the Apache CCLA entails, let alone being able to explain it to others before really getting active?)

General advice



To briefly summarise the most important points:


  • Give feedback fast - projects only run for so long, interest only lasts for so long. The faster a contributor is told what is not too great about his patch, the more likely those issues are fixed as part of the contribution. (Inspired by Avro and Zookeeper who were amazingly fast in providing feedback, committing and in the case of Avro even releasing a fixed version).
  • When it comes to new contributors be patient, remain friendly even when faced with seemingly stupid mistakes.
  • Give credit where credit is due - or could be due. Mention contributors in publications, press releases, release notes, the bug tracker. Let them know that you do. (Inspired by Drools, Tomcat, Zookeeper, Avro). Pro-tip: Make sure to have no typo in people's names even if checking takes one extra minute. (Learned from Otis).
  • Use any chance you get to teach the uninitiated about the whole patch process. I know that this seems trivial to those who work with open source on a daily basis. However when getting dependencies through Maven it may already be cumbersome to figure out where to get the source from. When used to git in the daily workflow it may be a hurdle to remember how to checkout stuff from svn ;) Back in June we had a Hadoop Hackathon in Berlin that was well attended - mostly by non-committers. Jakob Homan proposed a rather unusual but very well received format: In the Hadoop bug tracker there are several issues marked as trivial (typos in documentation and the like). Attendees were asked to choose one of these issues, checkout the source, create a patch and contribute it back to the project. Optionally it was explained to them how the process continues from there on the committer side of things. It may seem trivial to mechanically go through the patch process, however it helps lower the bar to first get accustomed to just how it works before you have a real issue to fix. If instead of contributing to Apache you are more into working on the Linux kernel I'd like to advise you to watch Greg Kroah-Hartman on writing and submitting your first Linux kernel patch (FOSDEM).
  • Last but not least make sure to lower the bar for contribution - do not require people to jump through numerous hoops; in general even just getting a patch ready is complicated enough. Provide a how to contribute page (e.g. see the how to contribute and how to become a committer pages in the Apache Mahout wiki).
  • In particular when your project is still very young lower the bar by turning contributors into committers quickly - even if they are "just" contributing documentation fixes - in my view one of the most important contributions there is, as only users spot areas for documentation improvement.
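
For those who have never seen what the "create a patch" step from the list above actually produces, here is a minimal sketch using Python's standard difflib module. In practice the patch would come from svn diff or git diff; the helper name make_patch and the file name in the usage note are made up for this example.

```python
import difflib


def make_patch(original, modified, path):
    """Return a unified diff turning `original` into `modified`."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        modified.splitlines(keepends=True),
        fromfile="a/" + path,
        tofile="b/" + path,
    )
    return "".join(diff)
```

Calling make_patch("teh docs\n", "the docs\n", "README.txt") yields a small text snippet that names the file and marks the removed line with a leading "-" and the added line with a leading "+". That snippet is what gets attached to the issue in the bug tracker.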


In case you yourself are thinking about contributing and need some additional advice as to why and for what purposes: Dr Dobbs has more information on reasons why developers tend to start to contribute to Apache software, Shalin explains why he contributes to open source, on the Mahout mailing list we had a discussion on why students should also consider contributing, and on the Apache community mailing list there was an interesting discussion on whether developers working on open source are happier than those that don't.

Apache Con Wrap Up

2011-11-16 20:45
First things first - slides, audio and abstracts of Apache Con are all online now on their Lanyrd page. So if you missed the conference or could not attend a session due to conflicting with another interesting session - that's your chance to catch up.

For those of you who are speaking German, there's also a summary of Apache Con available on heise Open. (If you don't speak German, I have been told that the Google Translate version of the site captures the gist of the article reasonably well.)

Apache Con NA

2011-10-25 10:50
Title: Apache Con NA
Location: Vancouver
Start Date: 2011-11-07
End Date: 2011-11-11

See you in Vancouver at Apache Con NA 2011

2011-10-24 13:49
Mid November Apache hosts its famous yearly conference - this time in Vancouver/Canada. They kindly accepted my presentations on Apache Mahout for intelligent data analysis (mostly focused on introducing the project to newcomers and showing what happened within the project in the past year - if you have any wish concerning topics you would like to see covered in particular, please let me know) as well as a more committer focused one on Talking people into creating patches (with the goal of highlighting some of the issues that newcomers to free software projects who want to contribute run into, and initiating a discussion on what helps to convince them to keep up the momentum and overcome any obstacles).

Looking forward to seeing you in Vancouver for Apache Con NA.

Talking people into submitting patches

2011-09-21 20:22
In November I am going to attend Apache Con NA. This year I decided to do a little experiment: I submitted a talk on talking people into contributing to free software projects. The format of the talk is a bit unusual: Drawing from my - admittedly limited and very biased - experience explaining free software to others and talking people into contributing patches, this talk tries to initiate a discussion on methods to get awesome developers to consider contributing their work back to free software projects.

As a precursor to the talk I have created a public Google docs document - it already contains the title of each slide I will use in my presentation. Of course the content does not get disclosed ahead of time.

If any of the readers of this blog post has experience with either explaining why they contribute, how to contribute, or issues and questions new users have - please feel free to add them to the above document. I'll try to integrate as much feedback as possible into my final slides.