Open development and inner source for fun and profit

2017-05-26 07:17
Last in a row if interesting talks at Adobe Open Source Summit was on Open Development/ Inner Source and how it benefits internal projects given by Michael Marth. Note: He knows there's subtle differences between inner source and open development, but mentioned to use the terms interchangeably in his talk.

So what is inner source all about? Essentially: Use all the tools and processes that already work for open source projects, just internally. (Company) public mailing lists, documentation, chat software, issue trackers. Taken to it's core this is very simplistic though. The more interesting aspects are waiting when looking at the people interaction patterns that emerge.

First off: The goal of making all interaction public and easy to follow for anyone is to attract more contributors. The richest source of contributors can be tapped into if your users are tech savvy as well. Being based on this assumption inner source works best when dealing with infrastructure software, or platform software where downstream users are developers themselves.

As a general rule of thumb: From 100 users, 10 will contribute, one will stick around and become a long term committer. This translates into a lot of effort for gaining additional hands for your project.

So - assuming what you want is a wildly successful open source project: You put your code on Github (or whatever the code hosting site of the day is), start some marketing effort - but still not magic happens, no stars, no unicorns, maybe there's 10 pull requests, but that's about it. What happened?

Architecting a community around an open source project is a long term investment: Over time you'll end up training numerous newbies, help people getting started and convince some of those to contribute back.

According to the speaker Michael Marth where that works best is for infrastructure projects: Where users can be turned into contributors and where projects can be turned into platform software that lasts for a decade and longer. In his opinion what is key are two factors: Enabling distributed decision making to let others participate, and a management style that lets the community take their own decisions instead of having one entity control the project. Usually what emerges from that is a distributed, peer-peer networked organisational structure with distributed teams, no calls, no standups, consensus based decision making.

In Michaels experience what works best it to adopt an open source working model from the very start. His recommendation for projects coming from comercial entities is to go to the Apache Software Foundation: There, proven principles and rules have been identified already. In addition going there gives the project much more credibility when it comes to convincing partners that decisions are actually being made in a distributed fashion without being controllable by one single entity. So telling a customer "We have to check this with the community first" as an answer to a feature request becomes much more credible.

The result of this approach are projects that under his guidance gained ten times as many people contributing to the project outside of the original entity than inside of it. The result Michael observed were partners that were much more likely to stick with the technology by means of co-owning it. Partners were participating in development. Also the project made for a lovely recruiting pipeline filled with people already familiar with the technology.

Async decision making

2017-05-16 06:45
This is the second in a series of posts on inner source/open source. Bertrand Delacretaz gave an interesting talk on how to avoid meetings by introducing an async way of making decisions.

He started off with a little anecdote related to Paul Graham's maker's vs. manager's schedule: Bertrand's father was a carpenter. He was working in the same house that his family was living in, so joining the family for lunch was common for him. However there was one valid excuse for him to skip lunch: "I'm glueing." How's that? Glueing together a chair is a process that cannot be interrupted once started without ruining the entire chair - it's a classical maker task that can either be completed in one go, or not at all.

Software development is pretty similar to that: People typically need several hours of focus to get anything meaningful done, in my personal experience at least two (for smaller bugs) or four (for larger changes). That's the motivation to keep forced interruptions low for development teams.

Managers tend to be on a completely different schedule: Context switching all day, communicating all day adding another one hour meeting to the schedule doesn't make much of a difference.

The implication of this observation: Adding one hour of meeting time to an engineer's schedule comes with an enourmous cost if we factor the interruption into the equation. Adding to the equation that lots of meetings actually fail for various reasons (lack of preparation, lack of participants getting prepared, bad audio or video quality, missing participants, delayed start time, bad summarisation) it seems valid to ask if there is a way to reduce the number of face to face meetings while still remaining operational.

As communication still remains key to a functional organisation, one approach taken by open development at Adobe (as well as at the Apache Software Foundation really) is to differentiate between things that can only be discussed in person and decisions that can be taken in an asynchronous fashion. While this doesn't reduce the amount of time communicating (usually quite the contrary happens) it does allow for participants to participate pretty much on their own schedule thus reducing the number of forced interruptions.

How does that work in practice? In Bertrand's experience decision making tends to be a four step process: Coming from an open brainstorming sessions, options need to be condensed, consensus established and finally a decision needs to be made.

In terms of tooling in his experience what works best is to have one and only one shared communication medium for brainstorming. At Apache those are good old mailing lists. In addition there is a need for a structured issue tracker/ case management tool to make options and choices obvious, decisions visible and archived.

When looking at tooling we are missing one important ingredient though: Each meeting needs extensive preparation and thourough post processing. As an example lets take the monthly Apache board of directors' meeting: It's scheduled to last no longer than two hours. Given each of hundreds of projects are required to report on a quarterly basis, given that executive officers need to provide reports on a monthly basis, given that each month at least one major decision item comes up and given that there is still day to day decisions about personel, budget and the like to be taken: How does reducing that to two hours work? The secret sauce is a text file in svn + a web frontend called whimsy to it.

Directors will read through those reports ahead of the meeting. They will add comments to them (which will be mailed automatically to projects), often those comments are used by directors to communicate with each other as well. They will pre-approve reports, they will mark them for discussion if there is something fishy. Some people will check projects' lists to match that up with what's being discussed in the report, some will focus on community stuff, some will focus on seeing releases being mentioned. If a report gets enough pre-approvals an no mark "to be discussed" they are not being shown or touched in the real meeting.

That way most of the discussion happens before the actual meeting leaving time for those issues that are truely contentious. As the meeting is open for anyone in te foundation to attend questions raised beforehand that could not be resolved in writing can be answered in the voice call fairly quickly.

Speaking of call: How does the actual meeting proceed? All participants dial in via good old telephone. Everyone is on a telephone so the issue of "discussions among people in the same room are hard to understand for remote participants" doesn't occur. In addition to telephone there's an IRC backchannel for background discussion, chatter, jokes and less relevant discussion. All discussion that has to be archived and that relates to discussions is kept on the voice channel though. During the meeting the board's chair is moderating through the agenda. In addition the secretary will make notes of who attended, which discussions were made and which arguments exchanged. Those notes are being shared after the meeting, approved at the following month's meeting and published thereafter. If you want to dig deeper into any project's history, there's tooling to drill down into meeting minutes per project until the very beginning of the foundation.

Does the above make decision making faster? Probably not. However it enables an asynchronous work mode that fits best with a group of volunteers working together in a global, distributed community where participants do not only live in different geographies and timezones but are on different schedules and priorities as well.

Notebook - OSS office at Adobe

2017-05-12 06:46
tl;dr: This post summarises what I learnt at the Adobe Open Source Summit last week about which aspects to think of when running an open source office. It's mainly a mental note for myself, hopefully others will find it useful as well.

Longer version:

This is another post in a series of articles on "random stuff I leant in Basel last week". When I was invited to Adobe's open source summit I had no idea what to expect from it. In retrospect I'm glad I accepted the invitation - it was a great mix of talks, a valuable insight into how others are adopting open source processes and their motivation for open sourcing code and helping out upstream.

The first in a series of talks gave an overview of Adobe's open source office. Currently the expertise collected there includes one human with a software development background, one human with a marketing background and one human with program management background showing the breadth of topics faced when interacting with open source projects.

The self-set goal of these people is to make contributing to open source as easy as possible. That includes making people comfortable internally with the way many open source projects work. It means making (e.g. legal, branding) review processes of contributions internally as easy as possible. It means supporting teams with checking contributions for license conformance, sensitive (e.g. personal) information that shouldn't be shared publicly. It also means supporting teams that want to run their own open source projects.

One idea that I found very intriguing was to run their team through a public github repository, in a sort-of eat-your-own-dogfood kind-of way.

One of the artifacts maintained in that repository is a launch checklist that people can follow which includes reminders for creating a blog post, publishing on the website, sharing information on social media. Most likely those are aspects that are often forgotten even in established projects when it comes to topics like releasing and promoting a new software release version.

Another aspect I found interesting was the creation of an OSS advisory board - a group of people intersted in open development and open source. A wider group of people tasked with figuring out how to best approach open development, how to get the organisation to move away from a private by default strategy towards an open by default way of collaborating across team boundaries.

Many of these topics fit well within what was shared earlier this year in the Linux foundation webcast on how to go about creating an open source office in organisations.

Talking people into submitting patches - results

2012-01-01 18:42
Back in November I gave a talk at Apache Con NA in Vancouver on talking friends and colleagues into contributing patches to open source projects. The intended audience for this talk were experienced committers to Apache projects, the goal was to learn more on their tricks for talking people into patching. First of all thanks for an interesting discussion on the topic - it was great to get into the room with barely enough slides to fill 10 min and still have a lively discussion 45min later.

For the impatient - the written feedback is available as Google Doc. Most common advise I heard involved patience, teaching, explaining, fast feedback and reward.

One warning before going into more detail on the talk: All assumptions and observations stated are highly subjective, influenced by my personal experience or by whatever the experience of the audience was. Do not expect an objective, balanced, well research analysis of the problems in general. That said, lets start with the talk itself. Before the talk I decided to limit scope to getting people in that have limited experience with open source. That intentionally excluded anyone downstream projects depending on one's code. Though in particular interaction with common Linux distributions and their package maintainers is vital, that issue warrants for a separate talk and discussion.

I divided those inexperienced with open source into three groups to keep discussion somewhat focused:

  • Students learning about open source projects during their education and have neither background in software engineering nor in open source but are generally very eager to lean and open to new ideas.
  • Researchers learning about the concept as part of a research grant who have some software engineering experience, some experience with open source - in particular with using it - but in general do not have writing open source software as their main objective, but have to participate as part of their research grant.
  • Software engineers having experience with software engineering, some experience in particular with using open source and in general both strong opinions on what the right way of doing things is and who have a strong position in their team that helps them in no way when starting to contribute.


One very common way



To understand some of the issues below let me first highlight what seems to be the most common way to become involved with any Apache project: Usually it starts with using one of their software packages. After some time what is shipped does no longer fit your needs, reveals bugs that stop you from reaching your goals or is missing one particular feature - even if that is just one particular method being protected instead of private.

People fix those issues. As the best software developers are utterly lazy the contribute stuff back to the project to avoid the work of having to maintain their private fork just for some simple modification. The more features of a project are being used, the more likely it gets that also larger contributions become possible. Overall this way of selecting issues to fix has a lot to do with scratching your own itch. In the end this kind of issue prioritisation also influences the general direction of a project: Whatever is most important to those actively contributing is driving the design and development. So the only way to change a project's direction to better fit your needs is to start getting active yourself: Those that do are the ones that decide.

Students



Lets take a closer look at students aspiring to work on an open source project. They are very keen on contributing new stuff, learning the process and open to new ways of doing things. However for the most part they are no active users of the projects they selected so they do not directly see what is important to fix. In addition they have only limited software development experience - at least when looking at German universities, bug trackers, source version control, build systems, release management, maintaining backwards compatibility, unit test frameworks are on no schedule - and most likely shouldn't be neither. So your average student has to learn to deal with checking out code, compiling it, getting it into their favourite editor, adding tests and making them pass.

Apart from teaching, giving even simple feedback it helps to provide the right links to literature at the right times, and generally mentor students actively. In addition it can be helpful to leave non-critical, easy to fix issues open and mark them as "beginner level" to make it easier for new-comers to get started. One last advise: Get students to publish what they do as early and as often as possible. Back in the days I used to do projects at TU Berlin with the goal of getting students to contribute to Mahout. In the first semester I left the decision on when to open up the code to the students - they never went public. In the second semester I forced them to publish progress on a weekly basis (and made that part of how their final evaluation was done) - suddenly what was developed turned into a patch committed to the code base.

Researchers



A second group of people that has an increasing interest in open source projects are researchers. In particular for EU project research grant the promise of providing results and software developed with the help of European tax-payers money under and open source license has become an important plus when asking for project grants.

However before becoming all too optimistic it might make sense to take a closer look: Even though there is an open source check box on your average research grant that by no means leads to highly motivated, well educated new contributors for your project: With software development only being a means to reach the ultimate goal of influential publications researchers usually do not have the time and motivation to polish software to the level needed for a successful and useful contribution. In addition the concept of maintaining your contribution for a longer time usually does not fit the timeline and timeframe of a research project.

Apart from teaching and mentoring projects themselves should start asking for the motivation of the contribution. There are a few popular arguments to contribute patches back. However not all of them really work for the research use case: The cost of maintaining a fork is close to zero if you intend to never upgrade to a new version and do not need security fixes. Another common argument is an improved visibility of your work and an improved reputation of yourself as software developer. If software development for you is just a means to reach a much higher goal those arguments may not mean much to you. A third common argument is that of improving code quality by having more than one pair of eyes review it - and where would you get a better review than in the project bringing together the original code authors? However if ultimate stability, security and flexibility is not your goal than also that may not mean much to you.

Key is to find out where the interest for working on open source comes from and build up arguments from there.

Software engineers



The third group I identified was professional software developers - as clarified after a question from the audience: Yes, I consider people who are unable to create, read, apply patches as professional software developers. If I would exclude these people there would be noone left who earns his living with software development and does not already work on open source projects.

In contrast to the above groups these people have extensive software development experience. However that also means that after having seen a lot of stuff that works and that does not work they do have a strong position in their teams. Usually those fixing issues in libraries they use re the ones that have established work-flows that work for them very well and who are used to being pretty influential. When going into an open source community however no-one knows them. In general they are only judged based on their patch. They get open feedback - in the context of that project. Projects tend to have established coding guidelines, best practices, build systems - that may differ from what you are used to in your corporate environment.

Getting up to speed in such an environment can be intimidating at best in particular if everything you do is public, searchable and findable by definition. All the more it is important to get involved and get feedback early by even putting online early sketches of what your plan is.

However with everything being open there is also one major positive side to motivating contributors: Give credit where credit is due - add praise to the issue tracker by assigning issues to the one providing he patch, add the name of the contributor to your release notes. When substantial, mention the contribution with name in talks, presentations and publications.

Another important issue here is the influence of deadlines: If it takes half a year to get feedback on your particular improvement the reason why you made it may no longer exist - the project may have been cancelled, the developer moved to a different team, the patch applied internally as is fixing the existing issues. Fast feedback on new patches, in particular if they are clean and come with tests is vital. One positive example for providing feedback on formal issues quickly is the automated review bot at Apache Hadoop: It checks stuff like style, addition of tests, checks against existing tests and the like quickly after the patch is submitted in an automated way. Just one nitpick from the audience: The output of that bot could be either marked more clearly as "this is automated" or the text formulated a bit friendlier - if a human had done the review it would have mentioned the positive things first before criticising what is wrong.

Last but not least (applies to researchers as well), there may be legal issues lurking: Most if not all contracts entail that at least what you do during working hours belongs to your employer - so it's up to them what gets open sourced and what doesn't. Suddenly your very technical new contributor has to convince management, deal with legal departments and work his way through the employers processes - most likely without deep prior knowledge on open source licenses - let alone contributor agreements (or did you know what the Apache CCLA entails, let alone being able to explain it to others before really getting active?)

General advise



To briefly summarise the most important points:


  • Give feedback fast - projects only run for so long, interest only lasts for so long. The faster a contributor is told what is not too great about his patch, the more likely those issues are fixed as part of the contribution. (Inspired by Avro and Zookeeper who were amazingly fast in providing feedback, committing and in the case of Avro even releasing a fixed version).
  • When it comes to new contributors be patient, remain friendly even when faced with seemingly stupid mistakes.
  • Give credit where credit is due - or could be due. Mention contributors in publications, press releases, release notes, the bug tracker. Let them know that you do. (Inspired by Drools, Tomcat, Zookeeper, Avro). Pro-tip: Make sure to have no typo in people's names even if checking takes one extra minute. (Learned from Otis).
  • Use any chance you get to teach the uninitiated about the whole patch process. I know that this seems trivial to those who work with open source on a daily basis. However when getting dependencies through Maven it may already be cumbersome to figure out where to get the source from. When used to git in the daily workflow it may be a hurdle to remember how to checkout stuff from svn ;) Back in June we had a Hadoop Hackathon in Berlin that was well attended - mostly by non-committers. Jakob Homan proposed a rather unusual but very well received format: In the Hadoop bug tracker there are several issues marked as trivial (typos in documentation and the like). Attendees were asked to choose one of these issues, checkout the source, create a patch and contribute it back to the project. Optionally they got explained how the process continues from there on the committer side of things. It may seem trivial to mechanically go through the patch process, however it help lower the bar in case you have a real issue to fix to first get accustomed to just how it works. If instead of contributing to Apache you are more into working on the Linux kernel I'd like to advise you to watch Greg Kroah Hartman on writing and submitting your first Linux kernel patch (FOSDEM).
  • Last but not least make sure to lower the bar for contribution - do not require people to jump through numerous loops, in general even just getting a patch ready is complicated enough. Provide a how to contribute page (e.g. see how to contribute and how to become a committer pages in the Apache Mahout wiki.
  • In particular when your project is still very young lower the bar by turning contributors into committers quickly - even if they are "just" contributing documentation fixes - in my view one of the most important contribution there is as only users spot areas for documentation improvement.


In case you yourself are thinking about contributing and need some additional advice as to why and for what purposes: Dr Dobbs has more information on reasons why developers tend to start to contribute to Apache software, Shalin explains why he contributes to open source, on the Mahout mailing list we hade a discussion on why also students should consider contributing, on the Apache community mailing list there was an interesting discussion on whether developers working on open source are happier than those that don't.

Some statistics

2010-08-11 20:03
Various research projects focus on learning more on how open source communities work:
  • What makes people commit themselves to such projects?
  • How much involvement from various companies is there?
  • Do people contribute during working hours or in their spare time?
  • Who are the most active contributors in terms of individuals and in terms of companies?


When asked to fill out surveys, especially in cases where that happens for the n-th time with n being larger than say 5, software developers usually are not very likely to fill out these questionairs. However knowing some of the processes of open source software development it soon becomes obvious there are way more extensive sources for information - albeit not trivial to evaluate and prone to at least some error.

Free software tends to be developed "in the open": Project members with various backgrounds get together to collaborate on a common subject, exchanging ideas, designs and thoughts digitally. Nearly every project with more then two members at least has mailing list archives and some sort of commit log to some version control system. Usually people also have bugtrackers that one can use as a source for information.

If we take the ASF as an example, there is a nice application to create various statistics from svn logs:


The caveats of this analysis are pretty obvious: Commit times are set according to the local of the server, however that may be far off compared to the actual timezone the developer lives in. Even when knowing each developer's timezone there is still some in-accuracy in the estimations as people might cross timezone bounderies when going off for vacation. Still the data available from that source should already provide some input as to when people are contributing, how many projects they work on, how much effort in general goes into each project etc.

Turning the analysis the other way around and looking at mailing list contributions, one might ask whether a company indeed is involved in free software development. One trivial, naive first shot could be to simply look for mailinglist postings that originate from some corporate mail address. Again the raw numbers displayed below have to be normalised. This time company size and fraction of developers vs. non-developers in a company has to be taken into consideration when comparing graphs and numbers to each other.

Yet another caveat are mailinglists that are not archived in the mail archiving service that one may have choosen as the basis for comparison. In addition people may contribute from their employer's machines but not use the corporate mail address (me personally I am one of these outliers, using the apache.org address for anything at some ASF project).


101tec


JTeam


Tarent



Kippdata


Lucid Imagination


Day



HP


IBM


Yahoo!



Nokia


Oracle


Sun




Easily visible even from that trivial 5min analysis however is general trending of involvement in free software projects. In addition those projects are displayed prominently projects that employees are working with and contributing to most actively - it comes as no surprise that for Yahoo! that is Hadoop. In addition if graphs go back in time far enough, one might even see the timeframe of when a company changed its open source strategy (or was founded (see the graph of Lucid), or got acquired (see Sun's graph), or acquired a company with a different stategy (see Oracle's graph) ;) ).

Sort of general advise might be to first use the data that is already out there as a starting point - in contrast to other closed communities free software developers tend to generate a lot of it. And usually it is freely available online. However when doing so, know your data well and be cautious to draw premature conclusions: The effect you are seeing may well be caused by some external factor.