Archive

Archive for the ‘Apache’ Category

Learning Machine Learning with Apache Mahout

December 13th, 2011 at 10:20pm

Once in a while I get questions like Where to start learning more on machine learning. Other than the official sources I think there is quite good coverage also in the Mahout community: Since it was founded several presentations have been given that give an overview of Apache Mahout, introduce special features or even go into more details on particular implementations. Below is an attempt to create a collection of talks given so far without any claim to contain links to all videos or lectures. Feel free to add your favourite in the comments section. In addition I linked to some online courses with further material to get you started.

When looking for books of course check out Mahout in Action. Also Taming Text and the data mining book that comes with weka are good starting points for practitioners.

Introductory, overview videos

Technical details

Further course material

Mahout, Science , ,

See you in Vancouver at Apache Con NA 2011

October 24th, 2011 at 1:49pm

Mid November Apache hosts its famous yearly conference - this time in Vancouver/Canada. They kindly accepted my presentations on Apache Mahout for intelligent data analysis (mostly focused on introducing the project to new comers and showing what happened within the project in the past year - if you have any wish concerning topics you would like to see covered in particular, please let me know) as well as a more committer focused one on Talking people into creating patches (with the goal of highlighting some of the issues new-comers to free software projects that want to contribute run into and initiating a discussion on what helps to convince them to keep up the momentum and over come and obstacles).

Looking forward to seeing you in Vancouver for Apache Con NA.

Apache Con, Free Software, Mahout ,

GoTo Con

October 10th, 2011 at 8:49pm

Location: Amsterdam
Link out: Click here
Start Date: 2011-10-12
End Date: 2011-10-14

This week late Tuesday night I am going to leave for GoTo con in Amsterdam. Train tickets are already booked - this is going to be my first trip with City Night line, will see how great they are.

GoTo Amsterdam features a special Apache track as well as several talk on scaling up, searching, but also includes stuff in general architectural decisions. If you have not registered yet - use dros200 as promotion code to get a discount on the registration prize.

Looking forward to seeing you in Amsterdam later this week.

General, Mahout , ,

Apache Mahout Hackathon Berlin

March 21st, 2011 at 9:39pm

Last year Sebastian Schelter from Berlin was added to the list of committers for Apache Mahout. With two committers in town the idea was born to meet some day, work on Mahout. So why not just announce that meeting publicly and invite others who might be interested in learning more about the framework? I got in touch with c-base - a hacker space in Berlin well suited to host a Hackathon - and quickly got their ok for the event.

As a result the first Apache Mahout Hackathon took place at c-base in Berlin last weekend. We had about eight attendees - arriving at varying times: I guess 11a.m. simply is way too early to get up for your average software developer on a Saturday. I got a few people surprised by the venue - especially those who were attending a Hackathon for the very first time and had expected c-base to be some IT company ;)

We started the day with a brief collection of ideas that everyone wanted to work on: Some needed help to use Mahout - topics included:

  • How to use Apache Mahout collaborative filtering with complex models.
  • How to use Apache Mahout via a web application?
  • How to use classification (mostly focussed on using Naive Bayes from within web applications).
  • Is HBase a solution for scalable graph mining algorithms?
  • Is there a frequent itemset algorithm that respects temporal changes in patterns?

Those more into Mahout development proposed a slightly different set of topics:

  • PLSI and Map/Reduce?
  • Build customisable sampling strategies for distributed recommendations.
  • Come up with a more Java API friendly configuration scheme for Mahout clusterings.
  • Complete the distributed SVD recommender.

Quickly teams of two to three (and more) people formed. First several user side questions could be addressed by mixing more experienced Mahout developers with newbie users. Apart from Mahout specifics also more basic questions of getting involved even by simply contributing to the online documentation, answering questions on the mailing lists or just providing structured access to existing material that users generally have trouble finding.

Another topic that is being overlooked all too when asking users to contribute to the project is the process of creating, submitting, applying and reviewing patches itself: Being deeply involved with free software projects dealing with patches, integration of issue tracker and svn with the project mailing lists all seems very obvious. However even this seemingly basic setup sometimes looks confusing and complex to regular users - that is very common but not limited to people who are just starting to work as software developers.

Thanks to Thilo Fromm for taking the group picture.

In the evening people finally started hacking more sophisticated tasks - working on the first project patches. On Sunday only the really hard core developers remained - leading to a rather focussed work on Mahout improvements which in the end led to first patches sent in from the Mahout Hackathon.

Hacking, Mahout , , ,

Apache Mahout Meetup Amsterdam

February 19th, 2011 at 8:18pm

Last week I was honoured to be invited as one of the two speakers on Apache Mahout at the Mahout meetup in Amsterdam at JTeams offices. After free beer, cola and pizza Frank Scholten gave an overview of Mahout’s clustering capabilities. After a brief introduction to Mahout itself he went into a little more detail on how clustering works in general. After that with a selection of Seinfeld scripts he used a fun data set to guide the audience through the process of choosing the right data preparation steps, coming up with good training parameters and finally evaluating clustering quality.

After that I gave a brief introduction to classification with Mahout - going into a little more detail when it comes to data preparation and quality evaluation. The audience seemed most interested in learning more on how data preparation works - after all that step cannot really be covered by Mahout itself (though we do have some support) but instead needs a lot of domain knowledge from the user side.

Judging from the brief round of self introductions the meetup was well visited by an intesting mixture of people coming from JTeam, Hippo, the dutch police working on data analytics, developers working at RIPE and many more.

If you are interested in more data analysis, search and data storage - do not miss registration for Berlin Buzzwords on June 6/7th 2011.

General, Mahout , ,

O’Reilly Strata Conference

January 22nd, 2011 at 4:34am

Title: O’Reilly Strata Conference
Location: Santa Clara
Link out: Click here
Description: Early next February O’Reilly is planning to put on a very interesting conference on the topic of data analysis and the business of generating value from raw digital data.


Strata 2011

I’m really glad to have received the acceptance notification for my presentation and travel sponsorship from the DICODE project. So see you in Santa Clara.
Start Date: 2011-02-01
End Date: 2011-02-03

If you are still unsure whether you should attend or not: Strata kindly handed out discount codes to speakers to share with their followers and readers. It saves you 25% of the registration cost - just use str11fsd during registration.

General, Mahout , , ,

Apache Hadoop - Trainings by Cloudera in Berlin

December 22nd, 2010 at 11:53pm

Cloudera is offering trainings both for Administrators as well as for Developers early next year in Berlin. If your are getting started in using Apache Hadoop this might be a great option to get your developers and operations up to speed with the framework. If you are a regular of the local Apache Hadoop Get Together a discount code should have been sent to you by mail.

Get Together, Hadoop , , , ,

Apache Mahout Hackathon Berlin

December 14th, 2010 at 8:50pm

Early next year - on February 19th/20th to be more precise - the first Apache Mahout Hackathon is scheduled to take place at c-base. The Hackathon will take one weekend. There will be plenty of time to hack on your favourite Mahout issue, to get in touch with two of the Mahout committers and get your machine learning project off the ground.

Please contact isabel@apache.org if you are planning to attend this event or register with the xing event so we can plan for enough space for everyone. If you have not registered for the event there is now guarantee you will be admitted.

If you’d like to support the event: We are still looking for sponsors for drinks and pizza.

Mahout , , , ,

Apache Mahout Podcast

December 13th, 2010 at 9:21pm

During Apache Con ATL Michael Coté interviewed Grant Ingersoll on Apache Mahout. The interview is available online as podcast. The interview covers the goals and current use cases of the project, goes into some detail on the reasons for initially starting it. If you are wondering what Mahout is all about, what you can do with it and which direction development is heading, the interview is a great option to find out more.

Apache Con, Mahout, Software Foundation , , ,

Apache Lunch Devoxx

December 11th, 2010 at 9:30pm

On Twitter I suggested to host an Apache dinner during Devoxx. Matthias Wesendorf of Apache MyFaces was so kind to take up the discussion carrying it over to the Apache community mailing-list. It quickly turned out that there was quite some interest with several members and committers attending Devoxx. We scheduled the meetup for Friday after the conference during lunch time.
I pinged a few Apache related people I knew would attend the conference (being a speaker and a committer at some Apache project almost certainly resulted in getting a ping). Steven Noels kindly made a reservation at a restaurant close by and announced time and geo coordinates on party.apache.org. Although several speakers had left already that very same morning, we turned out to be eleven people – including Stephen Coleburn, Mathias Wessendorf, Steven Noels, Martijn Dashorst of the Apache Wicked project. Was great meeting all of you – and being able to put some faces to names :)

General, Software Foundation , , ,