ApacheCon EU - part 02

2012-11-11 20:26
For me the week started with the Monday Hackathon. Even though I arrived early, the room quickly filled up and was packed by lunch time. I really liked the idea of having people interested in a topic register in advance - it gave the organisers a chance to assign tables to topics and put up signs advertising the topic worked on at each table. I'm not too new to the community anymore and can relate quite a few faces to the names of people I know are working on projects I'm interested in - but I would hope that this little bit of extra transparency made it easier for newcomers to figure out who is working on what. Originally I wanted to spend the day continuing to work on an example showing what sort of pre-processing is involved in getting from raw HTML files to a prediction of which Berlin Buzzwords submissions are going to be accepted. (Un-?)fortunately I quickly got distracted and drawn into discussions on what kind of hardware works best for running an Apache Hadoop cluster, how the whole Hadoop community works and where its problem areas are (e.g. the constant lack of helping hands to get everything on the todo list done).

The evening featured a really neat event: Committers and Hackathon participants were invited to the committer reception at the Sinsheim technical and traffic museum. One interesting observation: There is an easy way to stop geeks from rushing straight to the beer, drinks and food: Just put some cars, motorcycles and planes between them and the food ;)

Apache Mahout Hackathon Berlin

2011-03-21 21:39
Last year Sebastian Schelter from Berlin was added to the list of committers for Apache Mahout. With two committers in town, the idea was born to meet up some day and work on Mahout together. So why not announce that meeting publicly and invite others who might be interested in learning more about the framework? I got in touch with c-base - a hacker space in Berlin well suited to hosting a Hackathon - and quickly got their OK for the event.

As a result, the first Apache Mahout Hackathon took place at c-base in Berlin last weekend. We had about eight attendees arriving at varying times: I guess 11 a.m. simply is way too early for your average software developer to get up on a Saturday. The venue surprised a few people - especially those attending a Hackathon for the very first time, who had expected c-base to be some IT company ;)

We started the day with a brief collection of the ideas everyone wanted to work on. Some attendees needed help using Mahout; their topics included:

  • How to use Apache Mahout's collaborative filtering with complex models.
  • How to use Apache Mahout from within a web application (see the sketch after this list).
  • How to use classification, mostly focussed on using Naive Bayes from within web applications.
  • Is HBase a solution for scalable graph mining algorithms?
  • Is there a frequent itemset algorithm that respects temporal changes in patterns?

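Most of these questions come down to embedding Mahout as a library in an existing application. As a rough illustration of how little code a first experiment takes, here is a minimal sketch of user-based collaborative filtering with Mahout's Taste API as it looked around that time; the ratings file name and the user id are made up for the example, and a real web application would of course build its DataModel from its own data store.

```java
import java.io.File;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class RecommenderSketch {
  public static void main(String[] args) throws Exception {
    // "ratings.csv" is a placeholder: one "userID,itemID,preference" line per rating.
    DataModel model = new FileDataModel(new File("ratings.csv"));

    // Compare users by the Pearson correlation of their co-rated items.
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

    // Treat the ten most similar users as the neighbourhood.
    UserNeighborhood neighbourhood = new NearestNUserNeighborhood(10, similarity, model);

    // Classic user-based collaborative filtering.
    Recommender recommender = new GenericUserBasedRecommender(model, neighbourhood, similarity);

    // Top three recommendations for the (made-up) user with id 42.
    for (RecommendedItem item : recommender.recommend(42L, 3)) {
      System.out.println(item.getItemID() + " " + item.getValue());
    }
  }
}
```

A web application would typically construct the DataModel and Recommender once at startup and then call recommend() per request, since building the model is the expensive step.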

Those more into Mahout development proposed a slightly different set of topics:

  • PLSI and Map/Reduce?
  • Build customisable sampling strategies for distributed recommendations.
  • Come up with a more Java-API-friendly configuration scheme for Mahout's clustering implementations.
  • Complete the distributed SVD recommender.


Teams of two to three (and more) people formed quickly. Several user-side questions could be addressed first by pairing more experienced Mahout developers with new users. Apart from Mahout specifics we also discussed more basic ways of getting involved: contributing to the online documentation, answering questions on the mailing lists, or simply providing structured access to existing material that users generally have trouble finding.

Another topic that is all too often overlooked when asking users to contribute to a project is the process of creating, submitting, applying and reviewing patches itself: To someone deeply involved with free software projects, dealing with patches and the integration of the issue tracker and svn with the project mailing lists all seems very obvious. However, even this seemingly basic setup can look confusing and complex to regular users - which is common among, but not limited to, people who are just starting out as software developers.

Thanks to Thilo Fromm for taking the group picture.

In the evening people finally started hacking on more sophisticated tasks and working on their first patches. On Sunday only the really hard-core developers remained, leading to rather focussed work on Mahout improvements, which in the end resulted in the first patches being submitted from the Mahout Hackathon.

Apache Mahout Hackathon Berlin

2010-12-14 20:50
Early next year - on the weekend of February 19th/20th, to be more precise - the first Apache Mahout Hackathon is scheduled to take place at c-base. There will be plenty of time to hack on your favourite Mahout issue, to get in touch with two of the Mahout committers and to get your machine learning project off the ground.

Please contact isabel@apache.org if you are planning to attend, or register with the Xing event, so we can plan for enough space for everyone. If you have not registered for the event there is no guarantee you will be admitted.

If you'd like to support the event: We are still looking for sponsors for drinks and pizza.

ApacheCon – Hackathon days

2010-11-23 23:17
This year on Halloween I left for a trip to Atlanta, GA, where ApacheCon US took place, featuring two presentations on Apache Mahout – one by Grant Ingersoll explaining how to use Mahout to provide better search features in Solr, and one by myself giving a general introduction to the features Mahout provides, with a bit more detail on how to use Mahout for classification.

I spent most of Monday in Sally Khudairi's media training. In the morning session she explained the ins and outs of successfully marketing an open source project: One of the most important points is being able to provide a dense but still accessible explanation of what your project is all about and how it differs from other projects potentially in the same space. As a first exercise, attendees met in pairs and interviewed each other about their respective projects. When I summarised the information I had gathered, Sally quickly pointed out additional pieces of valuable information I had totally forgotten to ask about:


  • First of all, the full name of the interviewee, including the surname.
  • Second, the person's background with respect to the project: It seemed all too natural that someone you meet in a media training at ApacheCon is almost certainly either a founder of or a core committer to the project. Still, it is interesting to know how long they have been contributing and whether they perhaps even co-founded the project.


After that first exercise we went into detail on various publication formats. When releasing project information, the first format that comes to mind is the press release. For software projects at the ASF these are created in a semi-standardised format containing:

  • Background on the foundation itself.
  • Some general background on the project.
  • A few paragraphs on the news to be published on the project in an easily digestible format.
  • Contact information for more details.


Some of these parts can be re-used across different publications and occasions, so it makes sense to keep them as a set of boilerplate building blocks ready to use when needed.

After lunch Michael Coté from RedMonk visited us. Michael has a development background; these days he works as an analyst for RedMonk. Explaining technical projects to fellow developers is fairly simple. To give us some experience in explaining our projects to non-technical people as well, Sally invited Michael to interview us. At the end of the interview Michael asked each of us whether we had any questions for him. As it is not at all trivial to convey what machine learning can do for your average Joe programmer, I simply asked him for strategies for better explaining or showcasing our project. One option that came to his mind was to come up with one – or a few – example showcases where Mahout is applied to freely available datasets. Currently most data analysis examples are rather simple or based on only a very limited set of data. Showing with a few selected use cases what can be done with Mahout should be a good way to get the project quite some media attention.

During the remaining time of the afternoon I started working on a short explanation of Mahout and our latest release. The text was reviewed by the Mahout community and then published by Sally on the blog of the Apache Software Foundation. I also used it as the basis for an article on heise open that got published the same day.

The second day was reserved for a mixture of attending Barcamp sessions and hacking away at the Hackathon. Ross had talked me into giving an overview of various Hadoop use cases, as one of the attendees had requested that. However, it turned out he wasn't really interested in specific use cases: The discussion quickly turned to the more fundamental question of how far the ASF should go in promoting its projects. Should there be a budget for case studies? Should there even be some sort of marketing department? Clearly that is out of scope for the foundation - and it would also run contrary to the ASF being neutral ground for vendors to collaborate towards common goals while still separately making money providing consulting services, selling case studies and the like.

During the Hackathon I was made a mentor for Stanbol, a new project just now entering incubation. In addition I spent some time finally catching up with the Mahout mailing list.