ApacheCon - Keynotes

2010-11-25 23:20
The first keynote was given by Dana Blankenhorn, a journalist and blogger who regularly publishes tech articles with a clear focus on open source projects. The talk covered the evolution of open source projects, with particular attention to Apache.

Coming from a research background, I found the keynote given by Daniel Crichton from NASA very interesting: according to the speaker, scientists are facing challenges that are all too well known to large, distributed corporations. Most areas of science are becoming more and more dependent on data-intensive experiments. Examples include, but are not limited to:

  • The field of biology where huge numbers of experiments are needed to decipher the internal workings of proteins, or to be able to understand the fundamental concepts underlying data encoded in DNA.
  • In physics, hadron collider experiments generate huge amounts of data with each run. As facilities for running such experiments are expensive to build, and the amount of data generated is far too large to be analysed by just one team, groups of scientists are suddenly facing the issue of exchanging data with remote research groups. They run into the requirement of integrating their systems with those of other groups. All of a sudden data formats and interfaces have to somehow be standardised.
  • Running space missions used to be limited to a very small number of research institutions in a tiny number of countries. However, this is about to change as more countries gain the knowledge and facilities to run space missions. Again, this leads to the need to collaborate towards one common goal.

So far, not only are the software systems distinct and incompatible; even the data formats used are usually incompatible. As a result, scientists spend most of their time re-formatting, converting and importing datasets before being able to get any real work done. At the moment research groups are not used to working collaboratively in distributed teams. Usually experiments are run on specially crafted, one-off software that cannot easily be re-used, that does not adhere to any standards and that is re-written over and over again by every research group. Re-using existing libraries is oftentimes a huge cultural shift, as researchers seemingly are afraid of external dependencies and of giving up control over part of their system.

One step in the right direction was taken by NASA earlier this year: they released their decision making support system OODT under a free software license (namely the Apache Software License) and put the project under incubation at Apache. The project is currently about to graduate to its own top-level Apache project. This step is especially remarkable, as successfully going through the incubator also means having established a healthy community that is not only diverse but also open to accepting incoming patches and changes to the software. This means not only giving up control over your external dependencies but also having the project run in a meritocratic, community-driven model. For the contributing organisation, this boils down to no longer having total control over the future roadmap of the project. In return, this usually leads to higher community participation and higher adoption in the wild.