Archive for November, 2009

ScrumTisch Berlin - November

November 26th, 2009

This evening Marion organised another successful Scrumtisch. Usually we either meet for timeboxed discussions on Scrum and agile development questions, or a speaker is invited to present a favourite Scrum, Lean or Agile topic.

Today Markus Frick gave a presentation on how Scrum was introduced at SAP, a German software company. They started implementing Scrum in a “small” group of about 60 people, organised in about six to seven teams. The idea was to get people together who were already somewhat familiar with agile techniques and let them evaluate what works in the company’s context and what doesn’t. The conversion started with trainings, and teams were organised by features as opposed to components. The main goal was to get people to learn Scrum and then spread the idea across the whole company.

Soon upper management was fascinated by the methodology. Shortly after that, the goal was reset to converting 2500 employees, from developers to managers, working on diverse topics at four different locations (Bangalore, Berlin, Walldorf and Sofia/France) to agile methods. The question thus turned from scaling Scrum to quickly scaling the conversion process: Where do we get enough trainers? Where does Scrum expertise come from? How should communication be organised? How do we adapt our sales and governance processes?

The approach chosen back then was to use Scrum itself for the conversion process: introduce teams for training and conversion and let them work according to a backlog. Managers were also set up to participate by organising their work according to a backlog containing management tasks. At first this led to quite some confusion: Conversion does take time, and working according to sprint backlogs makes it obvious how much time it actually takes and how much time people really spend on these tasks. On the other hand, the whole process was made very transparent for everyone - and open for participation.

The process started about two years ago - it has not finished to date, and processes continue to evolve and get improved and refined as people go along. A very rough estimate was that it might take another three years to get to a stable, clean state. However, most - if not all - problems were not caused by Scrum. They were made visible and trackable by Scrum.

The main take home messages were that Scrum does bring improvement:

  • It makes goals transparent and communicates clearly the current state.
  • You get a short feedback cycle so people actually see where problems are.
  • It inherently allows for reflection and analysis of problems.
  • As introduced here it also made the work of managers transparent by making their backlog and velocity accessible to everyone.
  • Internal trainings helped to get feedback from teams who are already practicing what is introduced.

Among the people who were very skeptical there usually were quite a few from middle management. Uncertain about how future development would work, they usually feared a loss of influence. The most positive feedback came from developers themselves: After hearing what Scrum is all about, that it includes short release cycles and fast feedback, most developers who had been in the teams for quite some time reacted by stating that this basically resembled development “in the good old days” with a bit of development process added on top.

If you are interested in hearing more stories on how Scrum is or was introduced in companies of various sizes, I would like to recommend visiting the German Scrum Day in Düsseldorf. The talk by Thilo Fromm gives a nice overview of what a transition from traditional Waterfall to Scrum can look like. And agile42’s Andrea Tomasini will talk about the Scrum implementation in distributed teams at be2 ltd.

Update: This blog post was re-posted at the Agile42 blog.

Scrum

First Apache Dinner Berlin

November 25th, 2009

A few days ago, I received a mail from Torsten Curdt that read something like: “[...] For a long time now I wanted to organise an Apache Dinner Berlin. What do you think, when would be a good time for that?”. As that was about the third time I heard of that idea (and the third person mentioning it), I included some Berlin-based Apache people, asking whether they would be interested in having an Apache Dinner on November 24th in X-Berg. General answer: Yes! Sure!

The idea was to make it open to anyone interested in the ASF and send invitations to committers who are living in the greater-Berlin-area. Then book a table, have some food, get some drinks…

We met at Graefekiez - we, that is Torsten (Jakarta and Hadoop), Jan and Daniel (CouchDB), Simon+Vera (Lucene), oswald (xampp), Eric (Http Components) and myself - for a great “small menu” at La Buona Forchetta (Thanks to Torsten for coming up with that restaurant and booking the table). After that some of us moved over to a bar close to the restaurant.

After a long evening with lots of interesting (cross-project as well as non-technical) discussions, the general conclusion was to organize another Apache Dinner some time in January after Christmas-time is over.

Thanks guys for a great evening. Hope to see you all - as well as a few more Apache people from around Berlin - in January. Date and location to be set.

Final note to self: No Club Mate for Isabel after 02:00 a.m. …

Apache

Moving from Fast to Solr

November 19th, 2009

Sesat has published a nice in-depth report on why to move from Fast to Solr. The article also includes a description of the steps taken to move over as well as several statistics:

http://sesat.no/moving-from-fast-to-solr-review.html

On a related topic, the following article details where Apple is using Lucene/Solr to power its search. Spoiler: Look at Spotlight, their desktop search, as well as at the iTunes search with about 800 QPS.

Update: As the site above could not be reached for quite some time, you may want to look at the Google Cache version instead.

Free Software, Lucene

ApacheCon Oakland Roundup

November 19th, 2009

Two weeks ago ApacheCon US 2009 ended in Oakland, California. Shane published a set of links to articles that contain information on what happened at ApacheCon. Some of them are officially published by the Apache PRC project, others are write-ups by individuals on which talks they attended and which topics they considered particularly interesting.

Apache Con

Mahout 0.2 released

November 18th, 2009

Apache Mahout 0.2 has been released and is now available for public download at http://www.apache.org/dyn/closer.cgi/lucene/mahout

Up-to-date Maven artifacts can be found in the Apache repository at
https://repository.apache.org/content/repositories/releases/org/apache/mahout/

Apache Mahout is a subproject of Apache Lucene with the goal of delivering scalable machine learning algorithm implementations under the Apache license. http://www.apache.org/licenses/LICENSE-2.0

Mahout is a machine learning library meant to scale: Scale in terms of community to support anyone interested in using machine learning. Scale in terms of business by providing the library under a commercially friendly, free software license. Scale in terms of computation to the size of data we manage today.

Built on top of the powerful map/reduce paradigm of the Apache Hadoop project, Mahout lets you solve popular machine learning problem settings like clustering, collaborative filtering and classification over terabytes of data on thousands of computers.
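
To make the map/reduce idea a bit more concrete, here is a minimal single-process sketch in Python. It is not Mahout or Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `cooccurrence_counts`) and the toy shopping baskets are made up for illustration. It counts how often two items appear together, the kind of statistic a collaborative filtering job might start from.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(basket):
    """Emit ((item_a, item_b), 1) for every pair of distinct items in one basket."""
    for pair in combinations(sorted(set(basket)), 2):
        yield pair, 1


def shuffle(mapped):
    """Group emitted values by key, as a map/reduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum all counts that were emitted for one key."""
    return key, sum(values)


def cooccurrence_counts(baskets):
    """Run map, shuffle and reduce in a single process (hypothetical helper)."""
    mapped = (kv for basket in baskets for kv in map_phase(basket))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Toy input, invented for this example.
baskets = [["beer", "pizza"], ["beer", "pizza", "mate"], ["mate", "pizza"]]
counts = cooccurrence_counts(baskets)
```

Because every call to `map_phase` only looks at one basket and every call to `reduce_phase` only looks at one key, a framework like Hadoop can spread both phases over many machines; only the grouping step in between needs to move data around.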

Implemented with scalability in mind, the latest release brings many performance optimizations so that the library performs well even in a single-node setup.

The complete changelist can be found here:

http://issues.apache.org/jira/browse/MAHOUT/fixforversion/12313278

New Mahout 0.2 features include

  • Major performance enhancements in Collaborative Filtering, Classification and Clustering
  • New: Latent Dirichlet Allocation (LDA) implementation for topic modelling
  • New: Frequent Itemset Mining for mining top-k patterns from a list of transactions
  • New: Decision Forests implementation for Decision Tree classification (In Memory & Partial Data)
  • New: HBase storage support for Naive Bayes model building and classification
  • New: Generation of vectors from Text documents for use with Mahout Algorithms
  • Performance improvements in various Vector implementations
  • Tons of bug fixes and code cleanup

Getting started: New to Mahout?

For more information on Apache Mahout, see http://lucene.apache.org/mahout

A very BIG Thank You to all those who made this release happen!

Hadoop

Open Source Expo 09

November 16th, 2009

I spent last Sunday and the following Monday at Open Source Expo Karlsruhe - co-located with web-tech and php-conference organized by the Software-and-Support Verlag. Together with Simon Willnauer I ran the Lucene/Mahout booth at the expo.

So far the conference is still very small (about 400 visitors) compared to free software community events. However, the focus was set more on professional users; accordingly, several projects showed that free software can be used successfully for various business use cases. Visitors were invited to ask Sun about their free software strategy. Questions concerning OpenJDK or MySQL were not uncommon. Large distributors like SuSE or Mandriva were present as well, as were smaller companies, e.g. those providing support for Apache OFBiz.

The Apache Lucene project was invited as an exhibitor as well. Together with PRC and ConCom we organized an Apache banner. Lucid Imagination sponsored several Lucene T-Shirts to be distributed at the conference. At the very last minute, information (abstract, links to projects and mailing lists, and current users) was put together on flyers.

We arrived on Saturday, late in the evening. Together with a friend of mine we went for some Indian food at a really good restaurant close to the hotel. Big thanks to her for being our tourist guide - hope to see you back in Waldheim in December ;)

Sunday was pretty quiet - only a few guests arrived at the weekend. I was invited by David Zuelke to give a brief introduction to Mahout during his MapReduce Hadoop tutorial workshop. Thanks, David. Though lunch was served already, people did stay to hear my presentation on large scale machine learning with Mahout. I was contacted by one of the students of Katharina Morik who was pretty interested in the project. Back at her research group people are working on RapidMiner - a tool for easy machine learning. It comes with a graphical user interface that makes it simple to explore various algorithm configurations and data workflow setups. It would be interesting to see how this tool helps people to understand machine learning. It would also be very interesting to learn what form of contribution to Mahout might be appropriate for research groups. Maybe not code-wise but more in terms of discussions and background knowledge.

Monday was a bit busier, with more people attending the conferences. Simon got a slot to present Lucene at the Open Stage track and show off the new features of Lucene 2.9. Those already using Lucene could be tricked into telling their Lucene success story at the beginning of the talk. At the booth we had a wide variety of people: from students trying to find a crawling and indexing system for their information retrieval course homework up to professionals with various questions on the Apache Lucene project. The experience of people at the conference varied widely. That proved to be a pretty good reality check. Being part of the Lucene and the ASF community, one might be tempted to think that not knowing about Lucene is almost impossible. Well, it seems to be less impossible than at least I expected.

One last success: As the picture shows, Yacy now is powered by Lucene as well - at least in terms of T-Shirt ;)

Events, Lucene, Mahout

Apache Con US Wrap Up

November 16th, 2009

Some weeks ago I attended ApacheCon US 09 in Oakland, California. In the meantime, videos of one of the sessions have been published online.

You can find a wrap-up of the most prominent topics at the conference at heise (unfortunately German-only).

By far the largest topics at the conference:

  • Lucene - there was a meetup with over 100 attendees as well as two main tracks with Lucene focussed talks. New features of Lucene 2.9.* were at the center of interest: the new range search capabilities, segment search that improves caching, a new token stream API that makes annotating terms more flexible, as well as a lot of performance improvements. Shortly after the conference, Lucene 2.9.1 as well as Solr 1.4 were released, so end-users switching to the new versions now benefit from better performance and several new features.

  • Hadoop - large scale data processing currently is one of the biggest topics. Be it logfile analysis, business intelligence or ad-hoc analysis of user data. Hadoop was covered by a user meetup as well as one track on the first conference day. The track started with an introduction by Owen O’Malley and Doug Cutting. It continued with talks on HBase, Hive, Pig and other projects from the Hadoop ecosystem.

But also projects like Apache Tomcat and Apache HTTPD were well covered within one to two sessions each.

Currently a hot topic within the foundation is the challenge of bringing the community together face-to-face. Apache projects have become so numerous that covering them all within 3+2 days of conference and trainings no longer seems feasible. One way to mitigate these problems might be to motivate people to do more local meetups, potentially supported by ConCom, as has already happened in the Lucene and Hadoop communities. A related topic is the task of community building and community growth within the ASF. Google Summer of Code has been a great way to integrate new people. However, the model does not scale that well for the foundation. With ComDev a new project was founded with the goal of working on community development issues, talking to research, and getting students into open source early on. The project is largely supported by Ross Gardler, who already has experience with teaching and promoting open source and free software in the research context, being part of the open source watch project in the UK.

Apache Con US 09 brought together a large community of Apache software developers and users from all over the world who gathered in California, not only for the talks but also for face-to-face communication, coding together and exchanging ideas.

Update: Slides of my Mahout talk are now online.

Apache Con

December Apache Hadoop Get Together @ Berlin

November 15th, 2009

As announced at ApacheCon US, the next Apache Hadoop Get Together Berlin is scheduled for December 2009.

When: Wednesday, December 16, 2009 at 5:00pm
Where: newthinking store, Tucholskystr. 48, Berlin

As always there will be slots of 20min each for talks on your Hadoop topic. After each talk there will be plenty of time for discussion. You can order drinks directly at the bar in the newthinking store. If you like, you can order pizza. We will go to Cafe Aufsturz after the event for some beer and something to eat.

Talks scheduled so far:

Richard Hutton (nugg.ad): “Moving from five days to one hour.” - This talk explains how we made data processing scalable at nugg.ad. The company’s core business is online advertisement targeting. Our servers receive 10,000 requests per second resulting in data of 100GB per day.

As the classical data warehouse solution reached its limit, we moved to a framework built on top of Hadoop to make analytics speedy, data mining detailed and all of our lives easier. We will give an overview of our solution involving file system structures, scheduling, messaging and programming languages from the future.

Jörg Möllenkamp (Sun): “Hadoop on Sun”
Abstract: Hadoop is a well known technology inside of Sun. This talk wants to show some interesting use cases of Hadoop in conjunction with Sun technologies. The first showcase demonstrates how Hadoop can be used to load a massive multicore system with up to 256 threads in a single system to the max. The second use case shows how several mechanisms integrated in Solaris can ease the deployment and operation of Hadoop even in non-dedicated environments. The last use case will show the combination of the Sun Grid Engine and Hadoop. Talk may contain command-line demonstrations ;).

Nikolaus Pohle (nurago): “M/R for MR - Online Market Research powered by Apache Hadoop. Enable consultants to analyze online behavior for audience segmentation, advertising effects and usage patterns.”

We would like to invite you, the visitor, to also tell your Hadoop story. If you like, you can bring slides - there will be a beamer.

A big Thanks goes to the newthinking store for providing a room in the center of Berlin for us. Another big thanks goes to StudiVZ for sponsoring videos of the talks. Links to the videos will be posted here as well as on the StudiVZ blog.

Please do indicate on the following Upcoming event page if you are planning to attend, to make planning (and booking tables at Aufsturz) easier:

http://upcoming.yahoo.com/event/4842528/

Looking forward to seeing you in Berlin,
Isabel

Apache Hadoop Get Together Berlin

Lucene Meetup Oakland

November 4th, 2009

Though it is pretty late in the evening, the room is packed with some 100 people, most of them Solr or pure Lucene Java users. There are quite a few Lucene committers at the meetup from all over the world. Several have even heard about Mahout - some even used it :)

Some introductory questions on index sizes and query volume: 1 million documents seem pretty standard for Lucene deployments - several people run 10 million as well. Some people even use indexes with up to billions of documents in Lucene - but at low query volume. Usually people run projects with about 10 queries per second, but some go up to 500.

Eric’s presentation gives a nice introduction to what is going on with Lucene/Solr in terms of user interfaces. He starts with an overview of the problems that libraries face when building search engines - especially the faceting side of life. Solritas seems especially interesting - a Velocity response writer that makes it easy to render search responses not in XML but in simple templated output. He of course also included an overview of the features of LucidFind, the Lucid hosted search engine for all Lucene projects and sub-projects. Take home message: The interface is the application, as are the URLs. Facets are not just for lists.

The second talk is given by Uwe, who gives an overview of the implementation of numeric searches, range queries and numeric range filters in Lucene.

The third presenter is Stefan on Katta - a project on top of Lucene that adds index splits, load balancing, index replication, failover and distributed TF-IDF. The mission of Katta is to build a distributed Lucene for large indexes under high query load. The project heavily relies on ZooKeeper for coordination. It uses Hadoop IPC for search communication.
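
Why a split index needs "distributed TF-IDF" can be shown with a small Python sketch. This is not Katta's code or API; the function names, the dictionary layout of the shard statistics and the numbers are assumptions made up for illustration. The idea sketched here: each shard only knows its own document counts, so a coordinator sums them into one global IDF value (using a smoothed formula modelled on the classic Lucene one) to keep scores comparable across shards, and then merges the per-shard hit lists.

```python
import heapq
import math


def global_idf(shard_stats, term):
    """Combine per-shard statistics into a single IDF value for `term`."""
    total_docs = sum(s["num_docs"] for s in shard_stats)
    doc_freq = sum(s["doc_freq"].get(term, 0) for s in shard_stats)
    # Smoothed IDF: 1 + ln(N / (df + 1)); an assumption, not taken from Katta.
    return math.log(total_docs / (doc_freq + 1)) + 1.0


def merge_top_k(shard_hits, k):
    """Merge per-shard lists of (score, doc_id) into the k best hits overall."""
    all_hits = (hit for hits in shard_hits for hit in hits)
    return heapq.nlargest(k, all_hits)


# Hypothetical statistics reported by two index shards.
shard_stats = [
    {"num_docs": 1000, "doc_freq": {"hadoop": 9}},
    {"num_docs": 3000, "doc_freq": {"hadoop": 30}},
]
idf = global_idf(shard_stats, "hadoop")

# Hypothetical hits, already scored with the shared IDF.
top = merge_top_k([[(0.9, "a1"), (0.4, "a2")], [(0.7, "b1")]], k=2)
```

If each shard used only its local document frequency instead, the same term would get a different weight on every shard and the merged ranking would depend on how the index happened to be split.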

Lightning talks include:

  • Zoie: A realtime search extension for Lucene, developed inside of LinkedIn and now open sourced at Google Code.
  • Jukka proposed a new project: A Lucene-based content management system.
  • The next presenter highlighted the problem of document-to-document search. The problem here is that queries are not just one or two terms but more like 40 terms.
  • The next talk shared some statistics: more than 2s on average leads to a 40% abandonment rate for sites. The presenter is very interested in the Lucene Ajax project. Before using Solr the speaker set up projects with solutions like Endeca or Mercato. Solr to him is an alternative that supports faceting.
  • Andrzej gives an overview of index pruning in 5min - giving details on which approaches are currently being discussed in research as well as in the Lucene JIRA for index pruning.
  • The next talk was on Lucy - a Lucene port to C.
  • Another talk gave an overview of the findings from analysing the Lucene community.
  • One other lightning talk is by a guy using and deploying Typo3 pages. Typo3 does come with an integrated search engine. The presenter’s group built an extension to Typo3 that integrates the CMS with Solr search.
  • The very last talk is given by Grant Ingersoll on Mahout. Thanks for that!

Big Thanks to Lucid for sponsoring the meetup.

Apache Con

Hadoop Get Together Berlin @ Apache Con US Barcamp

November 3rd, 2009

This is my first real day at ApacheCon US 2009. I arrived yesterday afternoon and was kept awake by three Lucene committers until midnight: “Otherwise you will have a very bad jetlag”… Admittedly it did work out: I slept like a baby until about 08:00 a.m. the next morning and am not that tired today.

Today Hackathon, trainings and BarCamp Apache happen in parallel. Ross Gardler tricked me into doing a presentation on my experiences with running local user meetups. I put the slides online.

The general consensus was that it is actually not that hard to do such a meetup - at least if you have someone locally to help organize it or do it in a town you know very well. There are ways to get support from the ASF for doing such meetups - people help you get speakers, talk to potential sponsors or find a location. In my experience, if you are doing the event in your hometown, finding a location is not that hard: Either you are lucky enough to have someone like newthinking store around, or you can contact your local university or even your employer to find some conference room that you can use for free.

Getting the first two to three meetups up and running - especially finding speakers - is hard. However, you should be able to benefit from already being part of an Apache project: you probably know your community and know who would be willing to speak at one of those meetups. Once the meetup is well established, it should be possible to find sponsors to pay for video taping, free beer and pizza.

Keep in mind that having a fixed schedule ready in advance helps to attract people - it’s always good to know why one should travel to the meetup by train or plane. Don’t forget to plan for time for socializing after the event - having some beer and maybe food together makes it easy for people to connect after the meetup.

Apache Con, Apache Hadoop Get Together Berlin