Archive

Posts Tagged ‘Event’

Scrumtisch August Berlin

August 14th, 2010

Just seen it - the next Scrumtisch Berlin has been scheduled for 25th August 2010 at 18:30 Uhr. So far, no official talk has been scheduled, so please expect two topics on Scrum and its application to be selected for discussion according to Marion’s agile topic selection algorithm.

Please talk to Marion Eickmann if you would like to attend the next meetup.

Scrum , , ,

Apache Hadoop Event Blog

August 24th, 2009

As Apache Hadoop becomes ever more popular both in industry as well as in research, user groups, conferences and hacking days are being scheduled around the world. The goal of the event calendar blog hosted on wordpress.com is to provide a common space for organizers to announce their events and potential participants to look for new conferences.

Apache , ,

September 2009 Hadoop Get Together Berlin

August 17th, 2009

The newthinking store Berlin is hosting the Hadoop Get Together user group meeting. It features talks on Hadoop, Lucene, Solr, UIMA, katta, Mahout and various other projects that deal with making large amounts of data accessible and processable. The event brings together leaders from the developer and user communities. The speakers present projects that build on top of Hadoop, case studies of applications being built and deployed on Hadoop. After the talks there is plenty of time for discussion, some beer and food.

There is also a related Xing Group on the topic of building scalable information retrieval systems. Feel free to join and meet other developers dealing with the topic of building scalable solutions.

Agenda:

Please see upcoming page for updates.

  • Thilo Götz: JAQL
  • Uwe Schindler: Lucene 2.9
  • nugg.ad: Ad Recommendation with Hadoop
  • T. Schuett: Solving puzzles with Hadoop.

If you yourself would like to give a presentation: There are additional slots of 20 minutes each available. There is a beamer provided. Just bring your slides. To include your topic on this web site as well as the upcoming.org entry, please send your proposal to Isabel.

After the talks there will be time for an open discussion. We are going into a nearby restaurant after the event so there will be plenty of time for talking, discussing and new ideas.

Location

The Apache Hadoop Get Together takes place at the newthinking store Berlin:

newthinking store GmbH
Tucholskystr. 48
10117 Berlin


View Larger Map

Accomodation

  • Homeli - not exactly in walking distance, but only a few S-Bahn stations away. Very nice Bed and Breakfast hotel. (The offer is only valid if you stay for at least three nights.)
  • Circus Berlin is a combination of hostel and hotel close by.
  • Zimmer in Berlin is yet another Bed and Breakfast hotel.
  • House boat near Friedrichshain

Announcements

If you would like to be notified on news please subscribe to our mailinglist. The meetings usually are also announced on the project mailing lists as well as on the newthinking store website.

Contact

In case you have any trouble reaching the location or finding accomodation feel free to contact the organiser Isabel.

Past events

June 2009 Apache Hadoop Get Together @ Berlin

June 21st, 2009

Just a brief reminder: Next week on Thursday the next Apache Hadoop Get Together is scheduled to take place in Berlin. There are quite a few interesting talks scheduled:

  • Torsten Curdt: Data Legacy - the challenges of an evolving data warehouse

  • Christoph M. Friedrich, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI): “SCAIView - Lucene for Life Science Knowledge Discovery”.
  • Uri Boness from JTeam in Amsterdam: Solr - From Theory to Practice.

See http://upcoming.yahoo.com/event/2488959/ for more information.

For those interested in NOSQL Meetups, the discussion over at the NOSQL mailing list might be of interest to you: http://blog.oskarsson.nu/2009/06/nosql-debrief.html

Apache Hadoop Get Together Berlin, Events , ,

Scrum Table Berlin

June 21st, 2009

Last week I attended the scrum table Berlin. This time around Phillippe gave a presentation on “backlog colours”, that is types of work items tracked in the backlog.

The easiest type to track are features - that is items that generate revenue and are on the wishlist of the customer. Second type of items he sees are infrastructure items - that is, things needed to implement several features but invisible to the customer. Third type are bugs. Basically these are diminishing the value of features one had already classified as done earlier in the process. Fourth and last type are technical debt items - that is shortcuts taken or bad design choices (either knowingly as intentional decision made to meet some deadline or unintentional due to lack of experience).

A very simple classification could be the following matrix:

Name Value Cost
Feature Visible Positive Positive
Infrastructure Invisible Positive Positive
Bug Visible Negative Positive
Technical Debt Invisible Negative Positive

All four types of items exist in the real world. The interesting part is making these visible, assigning costs to each of them and scheduling these items in the regular sprint intervals.

The full presentation can be downloaded: http://scrumorlando09.pbworks.com/f/kruchten_backlog_colours.pdf

Events, Scrum ,

Tomcat Tuesday talk

May 21st, 2009

Since several months at neofonie we have a talk given by external or internal developers on various subjects each Tuesday. Usually these presentations are a nice way to get an overview of new emerging technologies, to get an overview of current conference topics or to gain insight into interesting internal projects.

This week we had Apache Tomcat Committer and PMC Peter Rossbach here at neofonie to talk about the Tomcat architecture and Tomcat clustering solutions. He gave two pretty in-depth presentations on the Tomcat internals, Tomcat optimization and extension points.

Some points that were especially interesting to me: The project started out in the late nineties, initiated by a bunch of developers who just wanted to see what it takes to write a web application container and that fullfills the spec. The goal basically was a reference implementation. Soon enough however users defined the resulting code as production ready and used it.

There are a few caveats from this history that are still visible. One is the lack of tests in the codebase. Sure, each release is tested agains the Sun TCK - but these tests cannot be opened to the general public. So if you as a developer make extensions or modifications to the code base there is no easy way of knowing whether you broke something or not.

For me as a developer it was interesting to see really how complex it quickly gets to cluster tomcat deployments and make them failure resistant. Some tools mentioned that help automatic with easier deployment are Puppet and FAI. One issue however that is still on the developer’s agenda is Tomcat monitoring.

To summarize: The conference room was packed with developers expecting two very interesting talks. Thanks to Peter Rossbach for coming to neofonie and explaining more on the internals of the Tomcat software, the project and the community behind.

Apache , , ,

Hadoop User Group UK

April 21st, 2009

On Tuesday the 14th the second Hadoop User Group UK took place in London. This time venue and pizza was sponsored by Sun. The room quickly filled http://www.thecepblog.com/2009/04/19/kmeans-clustering-now-running-on-elastic-mapreduce/with approximately 70 people.

Tom opened the session with a talk on 10 practical tips on how to get the most benefit from Apache Hadoop. The first question users should ask themselves is which type of programming language they want to use. There is a choice between structured data processing languages (PIG or Hive), dynamic languages (Streaming or Dumbo), or using Java which is closest to the system.

Tom’s second hint dealt with the size of files to process with Hadoop: Both - too large unsplittable and too small ones are bad for performance. In the first case, the workload cannot be easily distributed across the nodes in the latter case each unit of work is to small to account for startup and coordination overhead. There are ways to remedy these problems with sequence files and map files though. Another performance optimization would be to chain individual jobs - PIG and Hive do a pretty decent job in automatically generating such jobs. ChainMapper and ChainReducer can help with creating chained jobs.

Another important task when implementing map reduce jobs is to tell Hadoop the progress of your job. For once, this is important for long running jobs in order for them to remain alive and not be killed by the framework due to timeouts. Second, it is convenient for the user as he can view the progress in the web UI of Hadoop.

Usual suspects for tuning a job: Number of mappers and reducers, usage of combiners, compression customised data serialisation, shuffling tweaks. Of course there is always the option to let someone else do the tuning: Cloudera does provide support as well as pre-built packages init scripts and the like ;)

In the second talk I did a brief Mahout intro. It was surprising to me that half of the attendees already employed machine learning algorithm implementations in their daily work. Judging from the discussion after the talk and from questions I received after it the interest in the project seems pretty high. The slide I liked the most: The announcement of our first 0.1 release. Thanks to all Mahout committers and contributors who made this possible.

After the coffee break Craig gave an introduction to Terrier an extensible information retrieval plattform developed at the university of Glasgow. He mentioned a few other open IR platforms namely Tuple Flow, Zettair, Lemur/Indri, Xapian, as well as of course nutch/Solr/Lucene.

What does Terrier have to do with the HugUK? Well index creation in Terrier is now based on an implementation that makes use of Hadoop for parallelization. Craig did some very interesting analysis on scalability of the solution: The team was able to achieve scaling near linear in the number of nodes added (at least as long as more than reducer is used ;) ).

After the pizza Paolo described his experiences implementing the vanilla pagerank computation with Hadoop. One of his test datasets was the Citeseer citation graph. Interestingly enough: Some of the nodes in this graph have self references (maybe due to extraction problems), duplicate citations, and the data comes in an invalid xml format.

The last talk was on HBase by Michael Stack. I am really happy I attended HugUK as I missed that talk in Amsterdam at the ApacheCon. First Michael gave an overview of which features of a typical RDBMS are not supported by HBase: Relations, joins, and of course JDBC being among the limitations. On the pro site HBase offers a multiple node solutions that has scale out and replication built in.

HBase can be used as source as well as as sink for map reduce jobs and thus integrates nicely with the Apache Hadoop stack. The framework provides a simple shell for administrative tasks (surgery on sick clusters forced flushes non sql get scan and put methods). In addition the master comes with a UI to monitor the cluster state.

Your typical DBA work though differs with HBase: Data locality and physical layout do matter and can be configured. Michaels recommendation was to start out testing with the XL instance on EC2 and decrease instances if you find out that it is too large.

The talk finished with an outlook of the features in the upcoming release the issues on the todo list and an overview of companies already using HBase.

After talks were finished quite a few attendees went over to a pub close by: Drinking beer, discussing new directions and sharing war stories.

I would to thank Johan Oskarsson for organising the event. And a special thanks to Tom for letting me use his Laptop for the Apache Mahout presentation: the hard disk of mine broke exactly one day before.

Last but not least thank you to Sylvio and Susi for letting me stay at their place - and thanks to Helene for crying only during daytime when I was out anyway ;)

Hope to see at least some of the attendees again at the next Hadoop Meetup in Berlin. Looking forward to the next Hadoop User Group UK.

Apache, Events, Hadoop User Group UK , , ,

GSoC: Student applications.

April 5th, 2009

Title: GSoC: Student applications closed
Link out: Click here
Description: After this date no more student applications are accepted. Internal ranking at Apache starts 7 days earlier. The ranking process finishes at 16th of April.
Date: 2009-04-03

Apache, Events , ,

Students begin coding for their GSoC projects

April 5th, 2009

Title: Students begin coding for their GSoC projects
Link out: Click here
Description: GSoC time.
Date: 2009-05-26

Apache, Events , ,

Apache Con Europe 2009 - part 3

March 29th, 2009

Friday was the last conference day. I enjoyed the Apache pioneers panel with a brief history of the Apache Software Foundation as well as lots of stories on how people first got in contact with the ASF.

After lunch I went to the testing and cloud session. I enjoyed the talk on continuum and its uses by Wendy Smoak. She gave a basic overview of why one would want a CI system and provided a brief introduction to continuum. After that Carlos Sanchez showed how to use the cloud to automate interface tests with Selenium: The basic idea is to automatically (initiated through maven) start up AMIs on EC2, each configured with another operating system and run Selenium tests against the application under development in these. Really nice system for running automated interface tests.

The final session for me was the talk by Chris Anderson and Jan Lehnardt on CouchDB deployments.

The day ended with the Closing Event and Raffle. Big Thank You to Ross Gardler for including the Berlin Apache Hadoop Get Together in newthinking store in the announcements! Will sent the CfP to concom really soon, as promised. Finally I won one package of caffeinated sweets at the Raffle - does that mean less sleep for me in the coming weeks?

Now I am finally back home and had some time to do a quick writeup. If you are interested in the complete notes, go to http://fotos.isabel-drost.de (default login is published in the login window). Looking forward to the Hadoop User Group UK on 14th of April. If you have not signed up yet - do so now: http://huguk.org

Apache, Apache Con, Events , ,