Archive

Posts Tagged ‘Hadoop’

December Apache Hadoop Get Together Berlin

November 24th, 2011 at 8:14pm

First of all, please note that meetup organisation is being transitioned over to our Xing meetup group. To be notified of future meetings, make sure to join that group. Please also make sure to register for the December event: in contrast to past meetups, space will be limited this time, so grab a ticket early. If you cannot make it, please let the organiser know so he can issue additional tickets.

For those of you currently following this blog only for announcements:

When: December 7th 2011, 7 p.m.

Where: Smarthouse GmbH, Erich-Weinert-Str. 145, 10409 Berlin

Speaker: Martin Scholl
Title: On Firehoses and Storms: Event Thinking, Event Processing

Speaker: Douwe Osinga
Title: Overview of the Data Processing Pipeline at Triposo

Looking forward to seeing you at the next Apache Hadoop Get Together Berlin in December.

Get Together

Cloudera in Berlin

November 14th, 2011 at 8:24pm

Cloudera is hosting another round of trainings in Berlin in November this year. In addition to the trainings on Apache Hadoop this time around there will also be trainings on Apache HBase.

Register online via:

Get Together

Apache Hadoop Get Together - Hand over

November 2nd, 2011 at 4:20pm

Apache Hadoop receives lots of attention from large US corporations who are using the project to scale their data processing pipelines:

“Facebook uses Hadoop and Hive extensively to process large data sets. [...]” (Ashish Thusoo, Engineering Manager at Facebook), “Hadoop is a key ingredient in allowing LinkedIn to build many of our most computationally difficult features [...]” (Jay Kreps, Principal Engineer, LinkedIn), “Hadoop enables [Twitter] to store, process, and derive insights from our data in ways that wouldn’t otherwise be possible. [...]” (Kevin Weil, Analytics Lead, Twitter). Found on Yahoo developer blog.

However, the system’s use is not limited to large corporations: with 101tec, Zanox, nugg.ad and nurago, local German players are also using the project to enable new applications. Add components like Lucene, Redis, CouchDB, HBase and UIMA to the mix and you end up with a set of major open source components that allow developers to rapidly build systems that until a few years ago were possible only in Google-like companies or in research.

The Berlin Apache Hadoop Get Together, started in 2008, has made it possible to learn more about how the average local company leverages this software. It is a platform to get in touch informally and to exchange knowledge and best practices across corporate boundaries.

After three years of organising the event it is time to hand it over to new caring hands: David Obermann from Idealo kindly volunteered to take over organisation. He is a long-term attendee of the event and will continue it in roughly the same spirit as before: technical talks on success stories by users and on new features by developers, not restricted to Hadoop but also taking related projects into account.

A huge Thank You for taking up the work of co-ordinating, finding a venue and a sponsor for the videos goes to David! If any of you attending the event think that you have an interesting story to share, would like to support the event financially or just help out please get in touch with David.

Looking forward to the next Apache Hadoop Get Together Berlin. Watch this space for updates on when and where it will take place.

Get Together

Video is up - Simon Willnauer on Lucene 4 Performance improvements

February 22nd, 2011 at 9:21pm

CFP - Berlin Buzzwords 2011 - search, store, scale

January 26th, 2011 at 8:00am

This is to announce Berlin Buzzwords 2011, the second edition of the successful conference on scalable and open search, data processing and data storage in Germany, taking place in Berlin.


Call for Presentations Berlin Buzzwords

http://berlinbuzzwords.de

Berlin Buzzwords 2011 - Search, Store, Scale

6/7 June 2011

The event will comprise presentations on scalable data processing. We invite you to submit talks on the topics:

  • IR / Search - Lucene, Solr, katta or comparable solutions
  • NoSQL - like CouchDB, MongoDB, Jackrabbit, HBase and others
  • Hadoop - Hadoop itself, MapReduce, Cascading or Pig and relatives

Closely related topics not explicitly listed above are welcome. We are looking for presentations on the implementation of the systems themselves, real world applications and case studies.

Important Dates (all dates in GMT +2)

  • Submission deadline: March 1st, 2011, 23:59 CET
  • Notification of accepted speakers: March 22nd, 2011, CET.
  • Publication of final schedule: April 5th, 2011.
  • Conference: June 6/7, 2011

High quality, technical submissions are called for, ranging from principles to practice. We are looking for real world use cases, background on the architecture of specific projects and a deep dive into architectures built on top of e.g. Hadoop clusters.

Proposals should be submitted at http://berlinbuzzwords.de/content/cfp-0 no later than March 1st, 2011. Acceptance notifications will be sent out soon after the submission deadline. Please include your name, bio and email, the title of the talk and a brief abstract in English. Please indicate whether you want to give a lightning (10min), short (20min) or long (40min) presentation, and indicate the level of experience with the topic your audience should have (e.g. whether your talk will be suitable for newbies or is targeted at experienced users). If you’d like to pitch your brand new product in your talk, please let us know as well - there will be extra space for presenting new ideas, awesome products and great new projects.

The presentation format is short. We will be enforcing the schedule rigorously.

If you are interested in sponsoring the event (e.g. we would be happy to provide videos after the event, free drinks for attendees as well as an after-show party), please contact us.

Follow @berlinbuzzwords on Twitter for updates. News on the conference will be published on our website at http://berlinbuzzwords.de.

Program Chairs: Isabel Drost, Jan Lehnardt, and Simon Willnauer.

Schedule and further updates on the event will be published on http://berlinbuzzwords.de. Please re-distribute this CfP to people who might be interested.

Contact us at:

newthinking communications GmbH
Schönhauser Allee 6/7
10119 Berlin, Germany
Julia Gemählich
Isabel Drost
+49(0)30-9210 596

Berlin Buzzwords

WiFi at the Apache Hadoop Get Together

January 18th, 2011 at 8:40pm

Just a brief reminder: The next Apache Hadoop Get Together is scheduled to take place on Thursday, January 27th at 6 p.m. at the Zanox Event Campus at Media Spree Berlin.

We have three very interesting talks. Thirty guests have registered already, but we still have a few free seats. Head over to the Xing event page to register if you have not done so yet.

If you would like to have access to the local WiFi please let me know - I need to register your mail address with the venue two days before the event.

A huge thanks to Zanox for providing the location for free, another huge thanks to Cloudera for sponsoring video taping of the event.

Get Together

Apache Hadoop Get Together Berlin - January 2011

December 28th, 2010 at 4:31pm

This is to announce the next Apache Hadoop Get Together sponsored by Cloudera and Zanox that will take place in the Zanox Event Campus in Berlin.

When: January 27th 2011, 6 p.m.

Where: zanox Event Campus (Please note the changed event location.)


As always there will be slots of 30 minutes each for talks on your Hadoop topic. After each talk there will be plenty of time for discussion. After the event we head over to a bar for some beer and something to eat.

Talks scheduled so far:

Simon Willnauer: “Lucene 4 - Revisiting problems for speed”

Abstract: This talk presents a brief case study of long-standing problems in Lucene and how they have been approached to gain sizable performance improvements. For each of the presented problems there will be a brief introduction, the implemented solution and the resulting performance improvements. This talk might be interesting even for non-Lucene folks.

Josh Devins: “Hadoop at Nokia”
Abstract: In this talk, Josh will outline some of the ways in which Nokia is using Hadoop. We will start by having a quick look at the practical side of getting started with Hadoop and outline cluster hardware, configuration and management with tools like Puppet. Next we’ll dive head first into how Hadoop and its ecosystem are being utilized on a daily basis to perform business analytics, drive machine learning and help build data-driven products. We will also touch on how we go about collecting metrics from dozens of applications distributed in multiple data centers around the world. An open Q&A session will follow.

Paolo Negri: “The order of magnitude challenge: from 100K daily users to 1M”
Abstract: “Social game backends share many aspects of normal web applications, but exacerbate scaling problems. Follow this talk to see how we evolved a plain Ruby on Rails app to sustain 5000 reqs/sec, moved part of our data from SQL to NoSQL to reach 5 million queries per minute, and what we learned from this experience.”

Please do indicate on Upcoming or Xing if you are coming so we can more safely plan capacities.

A big Thank You goes to zanox for providing the venue for free for our event as well as to Cloudera for supporting videos being taped of the presentations.

Looking forward to seeing you in Berlin,
Isabel

Get Together

Apache Hadoop - Trainings by Cloudera in Berlin

December 22nd, 2010 at 11:53pm

Cloudera is offering trainings for both administrators and developers early next year in Berlin. If you are getting started with Apache Hadoop this might be a great option to get your developers and operations up to speed with the framework. If you are a regular of the local Apache Hadoop Get Together, a discount code should have been sent to you by mail.

Get Together, Hadoop

Apache Mahout Hackathon Berlin

December 14th, 2010 at 8:50pm

Early next year - on February 19th/20th to be more precise - the first Apache Mahout Hackathon is scheduled to take place at c-base. The Hackathon will take one weekend. There will be plenty of time to hack on your favourite Mahout issue, to get in touch with two of the Mahout committers and get your machine learning project off the ground.

Please contact isabel@apache.org if you are planning to attend this event, or register with the Xing event, so we can plan for enough space for everyone. If you have not registered for the event there is no guarantee you will be admitted.

If you’d like to support the event: We are still looking for sponsors for drinks and pizza.

Mahout

Devoxx – Day two – Hadoop and HBase

December 8th, 2010 at 9:24pm

In his session on the current state of Hadoop, Tom went into a little more detail on the features of the latest release and on the roadmap for upcoming releases (including Kerberos-based security, append support, warm standby namenode and others).
He also gave a very interesting view on the current Hadoop ecosystem. More and more projects are being created that either extend Hadoop or are built on top of it. Several of these are run as projects at the Apache Software Foundation, while some are available outside of Apache only. Using graphviz he created a graph of projects depending on or extending Hadoop, and from that provided a rough classification of these projects.
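To give an idea of how such a graph can be produced, here is a minimal sketch that writes Graphviz DOT text for some of the projects named in this post. The edges are my own illustrative assumptions about who builds on what, not the graph shown in the talk, and the names `ECOSYSTEM`, `to_dot` and `dependents_of` are made up for this example.

```python
# Hypothetical dependency map for projects named in this post.
# The edges are illustrative assumptions, not the graph from the talk.
ECOSYSTEM = {
    "HBase": ["HDFS", "ZooKeeper"],
    "Pig": ["MapReduce"],
    "Hive": ["MapReduce"],
    "Cascading": ["MapReduce"],
    "Sqoop": ["HDFS"],
    "Chukwa": ["HDFS"],
    "Flume": ["HDFS"],
    "Oozie": ["MapReduce"],
}


def to_dot(dependencies):
    """Render a project -> dependencies mapping as Graphviz DOT text."""
    lines = ["digraph hadoop_ecosystem {"]
    for project in sorted(dependencies):
        for target in sorted(dependencies[project]):
            lines.append(f'  "{project}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


def dependents_of(dependencies, target):
    """Return the projects that list `target` as a dependency, sorted by name."""
    return sorted(p for p, deps in dependencies.items() if target in deps)
```

Printing `to_dot(ECOSYSTEM)` into a file and running it through the `dot` command line tool (for example with `-Tpng`) should yield a picture; grouping projects by what they point at, as `dependents_of` does, gives a rough classification.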

As is to be expected, HDFS and Map/Reduce form the very basis of this ecosystem. Right next to them sits ZooKeeper, a distributed coordination and locking service.

Storage systems extending the capabilities of HDFS include HBase, which adds random read/write as well as realtime access to the otherwise batch-oriented distributed file system. With Pig, Hive and Cascading, three projects are making it easier to formulate complex queries for Hadoop. Among the three, Pig is mainly focussed on expressing data filtering and processing, with SQL support being added over time as well. Hive came from the need for SQL formulation on Hadoop clusters. Cascading goes a slightly different way, providing a Java API for easier query formulation. The new kid on the block is Plume, a project initiated by Ted Dunning that has the goal of coming up with a Map/Reduce abstraction layer inspired by Google’s FlumeJava publication.
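To illustrate why such query layers are welcome, here is a rough sketch of what a simple grouped count looks like when written by hand as map and reduce steps. It is plain Python run in memory, not Hadoop's API; the function names and the `user,country` log format are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in grouped.items()}


def count_by_country(log_lines):
    """Roughly: SELECT country, COUNT(*) FROM logs GROUP BY country."""
    def mapper(line):
        # Assumed record format: "user,country"
        _user, country = line.split(",")
        return [(country, 1)]

    def reducer(_key, values):
        return sum(values)

    return reduce_phase(shuffle(map_phase(log_lines, mapper)), reducer)
```

A single line of SQL in the docstring turns into a mapper, a reducer and the plumbing around them, which is the kind of boilerplate the projects above aim to hide.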

There are several projects for data import into HDFS. Sqoop can be used for interfacing with RDBMS. Chukwa and Flume deal with feeding log data into the filesystem. For general co-ordination and workflow orchestration there is Oozie, originally developed at Yahoo!, as well as support for workflow definition in Cascading.

When storing data in Hadoop it is a common requirement to find a compact, structured representation of the data to store. Though human readable, XML files are not very compact. However, when using any binary format, schema evolution commonly is a problem: adding, renaming or deleting fields in most cases means upgrading all code interacting with the data as well as re-formatting already stored data. With Thrift, Avro and Protocol Buffers there are three options available for storing data in a compact, structured binary format. All three projects come with support for schema evolution, allowing users not only to deal with missing data but also to map old fields to new ones and vice versa.
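As a minimal sketch of the idea, the following resolves a record written with an older schema against a newer reader schema, using defaults for added fields and aliases for renamed ones. It is loosely modelled on the defaults and aliases found in Avro reader schemas, but it is plain Python over dictionaries and not the API of Thrift, Avro or Protocol Buffers; `READER_SCHEMA`, `resolve` and the field names are hypothetical.

```python
# Hypothetical reader schema: field name -> default value and former names.
READER_SCHEMA = {
    "user_id": {"default": None, "aliases": ["uid"]},   # renamed from "uid"
    "country": {"default": "unknown", "aliases": []},   # added in a later version
}


def resolve(record, reader_schema):
    """Project a record written with an older schema onto the reader schema.

    Renamed fields are found via their aliases, missing fields get the
    default, and fields unknown to the reader are dropped.
    """
    resolved = {}
    for name, spec in reader_schema.items():
        for candidate in [name] + spec["aliases"]:
            if candidate in record:
                resolved[name] = record[candidate]
                break
        else:
            # Neither the field nor any alias was written: fall back to the default.
            resolved[name] = spec["default"]
    return resolved
```

With this in place, old records such as `{"uid": 42, "session": "abc"}` can stay on disk unchanged: only the reader schema is updated, and code written against the new field names keeps working.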

General