Archive

Posts Tagged ‘Fosdem’

FOSDEM - Sunday - smaller bits and pieces

February 18th, 2011 at 8:17pm

With WebODF the Office track featured a very interesting project that focusses on providing a means to open ODF documents in your favourite browser: Content and formatting are converted to a form that can easily be dealt with by using a combination of HTML and CSS. Advanced editing is then supported by using JavaScript.

With OpenStack the following talk focussed on an open cloud stack project that was started by NASA and Rackspace as both simultaneously needed support for an open source, openly designed, developed cloud stack that strives for community inclusion. According to the speaker the goal is to be as ubiquitous a cloud project as Apache is for web servers - he probably was not quite aware of how close to even the foundation side of Apache that development model is.

The closing keynote dealt with the way kernel development takes place. There were a few very interesting pieces of information for contributors that are valid for any open source project really:

  • Out of tree code is invisible to the kernel developers and users. As such the longer it remains out of tree code the harder it becomes to actually go out there and feel the wind.
  • In contrast open code means giving up control: Maintainership means responsibility but it does not come with any power or control over the source code. Similarly opening code up as patch or separate project at Apache means giving up control - means working towards turning the project into a community that can live on its own.
  • For kernel patches the general rule is to not break things and not go backward in quality: What is working for users today must be working with the next release as well. To be able to spot any compat issues it is necessary to take part in the wider discussion lists - not only in your limited development community. Developers should focus on coming up with a solution to the problem instead of on getting their original code into the project.

Or in short: The kernel is no research project, as such it must not break existing applications. Visionary brilliance really is no excuse for poor implementation. Conspiracy theories such as "hey, developer x declined my patch only because it is out of scope for his employer’s goals" are not going to get you anywhere. Such things do happen, but in general kernel developers first think of themselves as kernel developers - being an employee somewhere only comes after that.

Keep in mind that the community remembers past actions. In the end you need not convince business people or users but the developers themselves who might end up with the maintenance burden for your patch. To get your patch accepted it greatly helps to not express it in terms of the implementation needs only but to clearly formulate your requirements - independent of implementation. And as in any open source project, helping with cleanup (that is not only white space fixes, but real cleanup as in refactoring) does help build a positive attitude.

Why should you go for kernel development nevertheless? It’s a whole lot of fun. It’s a way to influence the kernel to support the features that you need. It’s sort of like becoming part of an elite club - and which developer does not like the feeling of belonging to the elite changing the way the world looks tomorrow? In addition, as with any substantial open source involvement, being visible in the kernel community also most likely means being visible to your future employer.

General

FOSDEM - HBase at Facebook Messaging

February 17th, 2011 at 8:17pm

Nicolas Spiegelberg gave an awesome introduction not only to the architecture that powers Facebook messaging but also to the design decisions behind their use of Apache HBase as a storage backend. Disclaimer: HBase is being used for message storage; for attachments a different backend, Haystack, is used.

The reasons to go for HBase include its strong consistency model, support for auto failover, load balancing of shards, support for compression, atomic read-modify-write support and the inherent Map/Reduce support.
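To make "atomic read-modify-write" a little more concrete, here is a minimal sketch of the idea in plain Python. The class `ToyRowStore`, its `check_and_put` method and the `unread` counter are hypothetical stand-ins invented for this illustration; they are not the HBase client API and were not shown in the talk.

```python
import threading


class ToyRowStore:
    """In-memory stand-in for a row store (hypothetical, not the HBase API)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._rows.get(key)

    def check_and_put(self, key, expected, new_value):
        """Write new_value only if the stored value still equals expected."""
        with self._lock:
            if self._rows.get(key) != expected:
                return False
            self._rows[key] = new_value
            return True


def increment(store, key):
    """Read, modify, and write back only if nobody interfered; else retry."""
    while True:
        current = store.get(key)
        updated = (current or 0) + 1
        if store.check_and_put(key, current, updated):
            return updated


def worker(store):
    for _ in range(100):
        increment(store, "unread")


store = ToyRowStore()
threads = [threading.Thread(target=worker, args=(store,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every write is conditional on the value that was read, concurrent writers cannot silently overwrite each other's updates; a writer that loses the race simply retries.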

When going from MySQL to HBase some technological problems had to be solved: Coming from MySQL basically all data was normalised - so in an ideal world, migration would have involved one large join to port all the data over to HBase. As this is not feasible in a production environment, all data was instead loaded into an intermediary HBase table, joined via Map/Reduce and then imported into the target HBase instance. The whole setup was run in a dark launch - being fed with parallel live traffic for performance optimisation and measurement.
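The join step can be pictured as a classic reduce-side join. The sketch below imitates the map, shuffle and reduce phases in plain Python; the table layout, field names and sample rows are assumptions made up for illustration, since the talk notes do not describe the actual schema or job.

```python
from collections import defaultdict

# Hypothetical rows exported from two normalised MySQL tables.
threads = [
    {"thread_id": 1, "subject": "FOSDEM"},
    {"thread_id": 2, "subject": "HBase"},
]
messages = [
    {"message_id": 10, "thread_id": 1, "body": "See you in Brussels"},
    {"message_id": 11, "thread_id": 1, "body": "Which track?"},
    {"message_id": 12, "thread_id": 2, "body": "Append branch works"},
]


def map_phase(threads, messages):
    """Emit (join key, tagged record) pairs, as a mapper would."""
    for row in threads:
        yield row["thread_id"], ("thread", row)
    for row in messages:
        yield row["thread_id"], ("message", row)


def shuffle(pairs):
    """Group the emitted records by key, imitating the shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine each thread with its messages into one denormalised row."""
    for key, values in sorted(groups.items()):
        thread = next(row for tag, row in values if tag == "thread")
        bodies = [row["body"] for tag, row in values if tag == "message"]
        yield {"row_key": key, "subject": thread["subject"], "messages": bodies}


denormalised = list(reduce_phase(shuffle(map_phase(threads, messages))))
```

Each resulting dictionary corresponds to one wide, denormalised row that could be written to the target table without any further joins.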

The goal was zero data loss in HBase - which meant using the Apache Hadoop append branch of HDFS. They re-designed the HBase master in the process to avoid having a single point of failure; backup masters are handled by ZooKeeper. Lots of bug fixes went back from Facebook's engineers to the HBase code base. In addition, for stability reasons, rolling restarts were added for upgrades, performance improvements, consistency checks.

The Apache HBase community received lots of love from Facebook for their willingness to work together with the Facebook team on better stability and performance. Work on improvements was shared between teams in an amazingly open and inclusive model of development.

One additional hint: FOSDEM videos of all talks including this one have been put online in the meantime.

General

FOSDEM - Django

February 16th, 2011 at 8:17pm

The languages/cloud computing track on Sunday started with the good, the bad and the ugly of Django’s architecture. Without much ado the speaker started by giving a high level overview of the general package layout of Django - unfortunately not going into too much detail on the architecture itself.

What he loves about Django are the model layer abstractions that really are not only an ORM - instead both relational and non-relational databases can be supported easily. Abstractions in Django are made by task solved - there are multiple implementations available for caching, mailing, session handling etc. There is great geo support with options for defining geo objects and querying single points on a map for all their overlaying geo objects. Being a community of test driven people, Django features awesome debugging and testing tools. To avoid cross-site request forgery Django comes with built in protection mechanisms.

There is multi database support for building applications. Being a small core implementation features can be turned on and off as needed. In addition the framework comes with great documentation: No feature addition is accepted unless it comes with decent documentation - which fits nicely with the common perception that anything that is untested and undocumented does not exist.

The bad things about Django according to the speaker? Well, the old CSRF protection implementation that might lead to token leakage. Schema changes and migrations currently really are hard to handle, though there is South to handle at least some of the migration pain. The templating implementation could use some improvement as well - being designed to make inclusion of logic in the templates hard, some use cases are just too clumsy to implement.

As for the ugly things: There is quite a bit of magic at work which generally leads to harder tracing of applications - that is about to get better. Too many parts of Django rely on unwieldy regular expressions. Anything that spans more than 4 lines on a screen probably is to be considered unmanageable and unchangeable. Authentication cannot really be customised - the information that is stored per user is hard coded and fixed.

What was learned over time: Refactoring cannot be avoided as requirements change. However being consistent in what you do makes it so much easier for users to pick up the framework. What helps with creating a great open source project: People that have the time to invest - never underestimate the time needed to really go from prototype to production ready.

General

FOSDEM - Saturday

February 15th, 2011 at 8:17pm

Day one at FOSDEM started with a very interesting and timely keynote by Eben Moglen: Starting with the example of Egypt he argued for de-centralized, distributed and thus harder to take over communication systems. In terms of tooling we are already almost there. Most use cases like micro blogging, social networking and real time communications can already be implemented in a distributed, fail safe way. So instead of going for convenience it is time to think about digital independence from very few central providers.

I spent most of the morning in the data dev room. The schedule was packed with interesting presentations ranging from introductory overview talks on Hadoop to more in depth treatment of the machine learning framework Apache Mahout. With an analysis of the Wikileaks cables the schedule also included case studies on what use cases can be implemented by thorough data analysis. The afternoon featured presentations on the background to more data analytics for better usability at Wikimedia as well as talks on building search applications.

In the lightning talks room a wide variety of projects was presented - in only ten minutes Pieter Hintjens explained the gist of using 0MQ for messaging. That talk included "Hintjens' law of concurrency": e = m * c^2, where e is the effort needed to implement and maintain, m is mass - that is the amount of code written - and c is complexity.
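A quick back-of-the-envelope calculation shows why the formula puts the emphasis on complexity rather than on code size. The numbers and units below are arbitrary assumptions chosen only to show the proportions; they do not come from the talk.

```python
def effort(mass, complexity):
    """Effort according to the formula quoted above: e = m * c^2."""
    return mass * complexity ** 2


# Arbitrary illustrative values, not measurements.
baseline = effort(mass=1000, complexity=2)
twice_the_code = effort(mass=2000, complexity=2)
twice_the_complexity = effort(mass=1000, complexity=4)
```

Doubling the amount of code doubles the effort, whereas doubling the complexity quadruples it.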

For me the day ended with a very interesting presentation by Matthias Kirschner/FSFE on one of their campaigns: pdfreaders.org has the very narrow and well scoped goal of getting links to unfree software off of governmental web pages. Using a really intuitive example they were able to convince officials to link to their vendor neutral list of PDF readers: "Just imagine a road in your city. At this road drivers will find a sign that tells them the road is well suited to be used by VW cars. Those cars can be obtained for test drive at the following address. Your government." As unthinkable as such a sign may be, that same text is included in nearly all governmental web pages linking to the Acrobat Reader.

What made pdfreaders successful is the combined effort of volunteers, its very narrow and clear scope and its scalability by nature: People were asked to submit "broken" web pages to a bug tracker, campaign participants would then go and send out paper letters to these institutions and mark the bugs fixed as soon as the links were changed. Letters were pre-written and well prepared. So all that was needed was money for toner, paper and stamps.

One final cute example of how that worked out can be seen at hamburg.de/adobe.

General

FOSDEM II 2011

January 23rd, 2011 at 3:46pm

It’s already sort of a nice little tradition for me to spend the first weekend in February in Brussels for FOSDEM. This year I am particularly happy that there will be a Data Analytics Dev Room at FOSDEM. A huge Thanks to @ogrisel and @nmaillot who have done most of the heavy lifting of getting the schedule in place.

I'm going to FOSDEM, the Free and Open Source Software Developers' European Meeting

Looking forward to an interesting Cloud Track, to meeting Pieter Hintjens who is going to give a talk on 0MQ, the DevOps presentation and lots of very interesting DevRooms. Looks like it’s again going to be tough to decide which presentations to go to at any one time.

General

CfP: Data Analysis Dev Room at Fosdem 2011

October 27th, 2010 at 6:56am

Call for Presentations: Data Analysis Dev Room, FOSDEM
http://fosdem.org
5 February 2011
1pm to 7pm
Brussels, Belgium

This is to announce the Data Analysis DevRoom co-located with FOSDEM: the first meetup on analysing and learning from data, taking place in Brussels, Belgium.

Important Dates (all dates in GMT +2):

  • Submission deadline: 2010-12-17
  • Notification of accepted speakers: 2010-12-20
  • Publication of final schedule: 2011-01-10
  • Meetup: 2011-02-05

Data analysis is an increasingly popular topic in the hacker community. This trend is illustrated by declarations such as:

“I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s? The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complimentary scarce factor is the ability to understand that data and extract value from it.”

– Hal Varian, Google’s chief economist

Topics
The event will comprise presentations on scalable data processing. We invite you to submit talks on the topics:

  • Information retrieval / Search
  • Large Scale data processing
  • Machine Learning
  • Text Mining
  • Computer vision
  • Linked Open Data

Sample list of related open source / data projects (not exhaustive):

  • http://lucene.apache.org
  • http://hadoop.apache.org (including MapReduce, Pig, Hive, …)
  • http://www.r-project.org/
  • http://scipy.org
  • http://mahout.apache.org
  • http://opennlp.sourceforge.net
  • http://nltk.org
  • http://opencv.willowgarage.com
  • http://mloss.org & http://mldata.org
  • http://dbpedia.org & http://freebase.com

Closely related topics not explicitly listed above are welcome.

High quality, technical submissions are called for, ranging from principles to practice.

We are looking for presentations on the implementation of the systems themselves, real world applications and case studies.

Submissions should be based on free software solutions.

Submission
Proposals should be submitted to fosdem.datadevroom@gmail.com no later than 2010-12-17. Acceptance notifications will be sent out on 2010-12-20.

Please include your name, bio and email, the title of the talk and a brief abstract in English. Please indicate the level of experience with the topic your audience should have (e.g. whether your talk will be suitable for newbies or is targeted at experienced users).

The presentation format is short: 30 minutes including questions. We will be enforcing the schedule rigorously.

Sponsoring
If you are interested in sponsoring the event (e.g. we would be happy to provide videos after the event, free drinks for attendees as well as an after-show party), please contact us. Note: “DataDevRoom sponsors” will not be endorsed as “FOSDEM sponsors” and hence not listed in the sponsors section on the fosdem.org website.

Announcements
Follow @DataDevRoom on twitter for updates. News on the conference will be published on our website at http://fosdem.org.

Program Chairs:

  • Olivier Grisel - @ogrisel
  • Isabel Drost - @MaineC
  • Nicolas Maillot - @nmaillot

Please re-distribute this CFP to people who might be interested.

General

FOSDEM - video recordings online

February 14th, 2010 at 8:32pm

As published in the FOSDEM blog the video recordings are available online - at least for the main track and the lightning talks. Happy video watching!

Freetime, General, Hadoop

FOSDEM 2010 - part 3

February 10th, 2010 at 9:02pm

Sunday started in Janson with Adrian Bowyer’s talk on RepRap machines, that is devices that can be used as manufacturing devices and are able to replicate themselves. After that I went over to the Mono dev room to listen to Miguel de Icaza on Mono Edge. A great talk on the history of Mono, the way the community interacts with Microsoft, the C# language itself and special features only available in Mono.

After this talk we went over to Janson for Andrew Tanenbaum’s talk on Minix. We knew quite a bit of the talk already from Froscon two years ago, however Andrew is an awesome speaker, so it’s always fun to catch up on the news on Minix.

The scalability talk started with an introduction to Hadoop by myself and continued with a talk on the Facebook infrastructure by David Recordon. According to feedback I got after the talk, laughing with Thilo helped quite a bit to calm me down. Before the talk I received one very good recommendation from one of the audio guys: Imagine you are giving the talk to one of your best friends - and forget about the microphone. Though I had way more slides than minutes to talk, we had enough time for the Q&A session after the talk. I started the talk by learning more about the audience - however this time not by handing the microphone to those listening (room too large) - I just asked them: Have you heard about Hadoop? - half of the audience. Are you Hadoop users? - one quarter maybe. How large are your clusters? - 10 to 100 nodes mostly. Have you heard of Zookeeper? - some, Hive - some more, Pig - a few, Lucene - a lot, Solr - a little less, Mahout - maybe 5, Mahout users: 1.

Turns out the Mahout user in the audience was Olivier: It’s so nice to meet people you know are active on the mailing lists for real and have a chat with them. Hope to see you more often on the lists - and meet you face to face again.

I used the chance to announce Berlin Buzzwords 2010, a two day event on search and scalability buzzwords like cloud computing, Hadoop, Lucene, NoSQL and more. It takes place on June 7th and 8th in the center of Berlin. Follow this blog for further information. Judging from the input I got after the announcement there is quite some need for such a conference in Europe.

The slides of my talk are soon to be available online.

After my talk I could stay in Janson: A talk on the Facebook infrastructure (not only the Hadoop side of things) followed. After that I met Lars George at the NoSQL dev room - unfortunately I did not manage to actually talk to Steven Noels, who organised the room.

The afternoon was reserved for Greg Kroah-Hartman on how to “Write and submit your first Linux Kernel Patch” - my personal conclusion: git is really awesome. I really, really need to find a few spare minutes to learn how to effectively use it.

In the evening we met with Pieter Hintjens for dinner - and to finalize an awesome weekend in Brussels and a great 10th anniversary FOSDEM. A huge Thank You to all volunteers and organisers of FOSDEM - you did a great job this year putting together an awesome schedule, you did a fantastic job making the now pretty huge event (with 306 talks and about 5000 hackers attending) run smoothly. Even the wireless was working from minute one. See you again at FOSDEM 2011.

Free Software, Freetime, General

FOSDEM 2010 - part 2

February 9th, 2010 at 9:00pm

The event itself featured 306 talks - so pretty hard to choose what to watch on two days. This time, not only the main tracks were awesome, but also several dev rooms featured very interesting talks by well known FOSS developers.

Saturday started with a FOSDEM birthday dance done by all attendees. The first keynote speaker Brooks Davis explained his experiences promoting open source methods at a large company. After that Richard Clayton gave an amazing talk on the evil on the internet. He explained not only how phishing works on a technical level but also included an explanation of the economics behind these attacks, explained how the money flow from victims to attackers works.

In the afternoon Bernard Li gave an introduction to the cluster monitoring tool Ganglia. Directly after that Lindsay Holmwood gave an overview of the monitoring and notification tools flapjack and cucumber-nagios.

The evening was filled with the speakers dinner. Thanks to the organisers for providing that. We had a really nice evening together with some of the organisers, Andrew Tanenbaum and Elena Reshetova at our table.

Free Software, Freetime, General

FOSDEM visitor seems to like my baby

February 9th, 2010 at 8:19am

Posted using Mobypicture.com

Another picture that was taken before the first session early in the morning:

Freetime