Archive for the ‘Hacking’ Category

Dorkbot Berlin

January 30th, 2012 at 11:18pm

c-base, 8 p.m. on a Monday evening: the room is packed (and pretty hazy as well). Time for Dorkbot, a short series of talks on “People doing strange things with electricity” hosted by Frank Rieger.

First up on stage was Gismo, presenting Raumfahrtagentur, a Berlin maker space located in Wedding. It grew out of the presenter’s interest in electric bikes, which brought together a group of ten people interested in hardware hacking. Projects include, but are not limited to, 3D printing, 3D scanning, textile hacking and a collaborative podcast. Essentially the idea is to provide room and infrastructure for a group of members to use collaboratively. Organisationally, the group is incorporated as a GmbH; however, none of the projects is primarily aimed at commercialisation: its main target groups are hobbyists, researchers and open hardware/software people. If you are interested: every Monday evening there is a “Sunday of the Kosmonauts” to which outside visitors are invited.

The second talk was on the project Drinkenlights (Klackerlaken), a way for children to learn the basics of electronics without any soldering (the hardware costs three euros at most). The experiences of handing the ingredients for these toys to children of various ages were interesting: kids of about five years simply played around, while ten- and eleven-year-olds who were already in school seemingly had to re-learn how to be creative without being given much direction or instruction on the task at hand.

In the third talk Martin Kaltenbrunner introduced his Tworse Key, a nice symbiosis of old technology (a Morse key) and new media (Twitter). Built on top of an Arduino Ethernet board, it turns Morse messages into tweets. Martin also gave a brief overview of related art projects and touched upon the changes that open source and open hardware bring to art: some projects open all designs and source code to the public to benefit from a wider distribution channel (without having to actually produce anything), to work on designs collaboratively and to get improvements back into the original project. All of this stands in stark contrast to the established idea of a single author whose contribution is a physical object that is then presented in exhibitions, and it brings artists both new possibilities and new challenges.

In the last presentation Milosch introduced his new project ETIB, whose goal is to bring hardware-hacking geeks together with textile geeks to work on integrating circuits into clothes.

If you are interested in hackerspaces in general and in what is happening in that direction in Berlin, mark this Friday in your calendar: c-base will be hosting a Hackerspace meetup. If you want to know how hackerspaces work or want to create one yourself, this event might be interesting to you.

Hacking

One day later

January 5th, 2012 at 11:57pm

Fun little new toy

January 3rd, 2012 at 11:48pm

Yesterday Thilo invited me to an “Electronics 101” workshop, including an introduction to soldering, scheduled to start at 7 p.m. this evening at the offices of IN-Berlin e.V. I have a little background in electronics from my university studies, but had never tried any serious soldering before (apart from fixing one of our audio cables), so I thought: why not?

The workshop turned out to be a lot of fun: the organisers Mitch Altman and Jimmie Rodgers had brought several pre-packaged kits for people to work on. Quite a few of them are based on Arduino, so after assembling them you can keep having fun writing little programs for them. After a brief but very well done, easy-to-understand introduction to digital electronics, Mitch showed attendees how to use a soldering iron (make sure to check out his comic “Soldering is Easy” if you want to know more) and got everyone started. Both Jimmie and Mitch did a great job answering questions, fixing issues and generally helping out with any problems. Even those who had never used a soldering iron before quickly got up to speed and in the end went home with the nice experience of having built something that you can not only program but also touch and hold in your hands.

I got myself a LoL Shield (still to be done) and a Diavolino. Still missing is the FTDI TTL-232R cable for hooking the board up to our laptops and re-programming it (though that will most likely be easier to find than the >1 GOhm resistor Thilo is looking for to calibrate his Geiger counter).

Results of my first session are below:

(Photos: The board; First pins attached; Last pins attached)

Thanks also to Sven Guckes for organising and announcing this workshop on short notice, and to Thilo for talking me into it.

Update: Images of the event are available online.

Hacking

#28c3

December 30th, 2011 at 2:07am

Restate my assumptions.

One: Mathematics is the language of nature.

Two: Everything around us can be represented and understood through numbers.

Three: If you graph the numbers of any system, patterns emerge. Therefore, there are patterns everywhere in nature.

The above is a quote from today’s “Hackers in movies” talk at 28c3, which among other things showed a brief snippet of the movie Pi. For several years I stayed well away from that one famous hacker conference in Berlin that takes place annually between Christmas and New Year: until today, 23C3 was the last congress I had attended. Though there were several fantastic talks and the mean presentation quality was pretty good, the standard deviation of talk quality was just too high for my taste. In addition, with four tracks and limited room sizes, space was quite often an issue.

In recent years much of that has changed: the maximum number of tickets is strictly enforced, there is an additional lounge area in a large tent next to the entrance, and the number of tracks was reduced to three for the sake of having larger rooms. Streaming works for the most part, making it possible for those who did not get one of the 3000 full conference tickets to follow the program from their preferred hackerspace. In addition, FeM does an amazing job of recording, mastering, encoding and pushing videos online: Hacker Jeopardy, a show that wasn’t over until early Thursday morning (about 3 a.m.?), was up on YouTube by 7 a.m. on Thursday, if not earlier.

Several nice geeks talked me into briefly joining the crowd this evening for the last three talks in “Saal 1” depicted above: you cannot be in Berlin during 28c3 and not see the so-called “fnord Jahresrückblick” (fnord review of the year) by Fefe and Frank Rieger, creators of the Alternativlos podcast.

Overall it is amazing to watch the BCC being invaded by a large group of hackers. It’s fun to spot quite a few of them on Alexanderplatz and to watch people having fun with a TV-B-Gone in front of large electronics stores. It’s great to be able to watch highly technical but also several political talks four days in a row, from 11 a.m. until at least 2 a.m. the following day, given by people who are very passionate about what they do and the projects they spend their time on.

If you are into tinkering, hacking, trying out sorting algorithms and generally having fun with technology, make sure you check out the 28c3 YouTube channel. If you want to learn more about hacker culture, mark the days between Christmas and New Year next year and attend 29c3. Don’t worry if you do not speak German: the majority of talks are in English, and most of the ones that aren’t are translated live by volunteers. If you are good at translating, feel free to volunteer for that task yourself. Speaking of volunteering: my respect to all angels (helping hands), heralds (those introducing speakers), NOC (network operation center), POC (phone operation center), the organisation team and anyone who helps make the event as enjoyable for attendees as it is.

Update: Thank you to the geeks who after staying in our apartment for #28c3 helped get it back to a clean state - actually cleaner than it was before. You rock!

General, Hacking

Apache Mahout Hackathon Berlin

March 21st, 2011 at 9:39pm

Last year Sebastian Schelter from Berlin was added to the list of committers for Apache Mahout. With two committers in town, the idea was born to meet some day and work on Mahout together. So why not announce that meeting publicly and invite others who might be interested in learning more about the framework? I got in touch with c-base, a hackerspace in Berlin well suited to host a Hackathon, and quickly got their OK for the event.

As a result, the first Apache Mahout Hackathon took place at c-base in Berlin last weekend. We had about eight attendees, arriving at varying times: I guess 11 a.m. simply is way too early on a Saturday for your average software developer. A few people were surprised by the venue, especially those attending a Hackathon for the very first time who had expected c-base to be some IT company ;)

We started the day by briefly collecting the ideas everyone wanted to work on. Some attendees needed help using Mahout; topics included:

  • How to use Apache Mahout collaborative filtering with complex models.
  • How to use Apache Mahout via a web application?
  • How to use classification (mostly focussed on using Naive Bayes from within web applications).
  • Is HBase a solution for scalable graph mining algorithms?
  • Is there a frequent itemset algorithm that respects temporal changes in patterns?

Those more into Mahout development proposed a slightly different set of topics:

  • PLSI and Map/Reduce?
  • Build customisable sampling strategies for distributed recommendations.
  • Come up with a more Java API friendly configuration scheme for Mahout clusterings.
  • Complete the distributed SVD recommender.

Teams of two to three (and more) people formed quickly. First, several user-side questions could be addressed by mixing more experienced Mahout developers with newbie users. Apart from Mahout specifics, we also discussed more basic ways of getting involved: simply contributing to the online documentation, answering questions on the mailing lists or providing structured access to existing material that users generally have trouble finding.

Another topic that is overlooked all too often when asking users to contribute to the project is the process of creating, submitting, applying and reviewing patches itself: to those deeply involved with free software projects, dealing with patches and the integration of issue tracker and svn with the project mailing lists all seem very obvious. However, even this seemingly basic setup sometimes looks confusing and complex to regular users; this is very common among, but not limited to, people who are just starting to work as software developers.

Thanks to Thilo Fromm for taking the group picture.

In the evening people finally started on more sophisticated tasks, working on the first project patches. On Sunday only the really hard-core developers remained, which led to rather focussed work on Mahout improvements and, in the end, to the first patches sent in from the Mahout Hackathon.

Hacking, Mahout

Note to self: svn:ignore usage

February 25th, 2011 at 8:47pm

Putting the information here to make retrieving it a bit easier next time.

When working with svn and some random IDE, I’d really like to avoid checking in any IDE-specific files (project configuration, classpath, etc.). The command to do that:

svn propedit svn:ignore $directory_to_edit

This opens your configured editor, where you enter the file patterns or directory names to ignore, one per line.

More detailed information in the official documentation on svn:ignore.
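To skip the interactive editor, the same property can also be set from a file with `svn propset -F`. The sketch below runs against a throwaway local repository so it is safe to try (it needs the `svn` and `svnadmin` command-line tools); the ignore patterns for IntelliJ, Eclipse and a Maven `target` directory are example assumptions, not part of the original note.

```shell
#!/usr/bin/env bash
# Sketch: set svn:ignore non-interactively (instead of `svn propedit`)
# in a scratch repository, so nothing real is touched.
set -euo pipefail

tmp=$(mktemp -d)
svnadmin create "$tmp/repo"                      # throwaway repository
svn checkout -q "file://$tmp/repo" "$tmp/wc"     # working copy of it
cd "$tmp/wc"

# One pattern per line; these IDE-specific names are example assumptions.
printf '%s\n' '.idea' '*.iml' '.classpath' '.project' '.settings' 'target' \
  > "$tmp/ignore-patterns.txt"

# -F reads the property value from a file; it applies to this directory only.
svn propset svn:ignore -F "$tmp/ignore-patterns.txt" .

# A file matching a pattern no longer shows up as unversioned ('?').
touch mahout.iml
svn status
```

Note that `svn:ignore` is not inherited by subdirectories; newer Subversion releases (1.8 and later) additionally offer an inherited `svn:global-ignores` property.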

Hacking

Devoxx – Day 2 HBase

December 9th, 2010 at 9:25pm

Devoxx featured several interesting case studies of how HBase and Hadoop can be used to scale data analysis back ends as well as data serving front ends.

Twitter

Dmitry Ryaboy from Twitter explained how to scale high-load, large-data systems using Cassandra. Looking at the sheer number of tweets generated each day, it becomes obvious that a site like this cannot be run on MySQL alone.

Twitter has released several of their internal tools under a free software license for others to re-use, some of them rather straightforward, others more involved. At Twitter each tweet is annotated with a user_id, a time stamp (OK if skewed by a few minutes) and a unique tweet_id. To generate the latter, they built a library called snowflake. Though the algorithm is rather simple, it even works in a cross-data-centre set-up: the first bits are composed of the current time stamp, the following bits encode the data centre, and after that there is room for a counter. The resulting tweet_ids are (roughly) ordered by time and distinct across data centres without the need for global synchronisation.
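The ID layout can be illustrated with plain shell arithmetic. This is a minimal sketch, not Twitter's snowflake code: the bit widths (41 bits of milliseconds since a custom epoch, 10 bits for the data-centre/worker id, 12 bits for the counter) follow the commonly cited layout, and both they and the epoch constant should be read as assumptions. A real generator also has to increment the counter within one millisecond and cope with clocks that go backwards.

```shell
#!/usr/bin/env bash
# Sketch of a snowflake-style 64-bit ID: [ timestamp | data centre | counter ].
# Assumed layout: 41 bits of milliseconds since EPOCH_MS, 10 bits for the
# data-centre/worker id, 12 bits for a per-millisecond counter.
set -euo pipefail

NODE_BITS=10
SEQ_BITS=12
EPOCH_MS=1288834974657   # assumed custom epoch (any fixed instant in the past works)

# make_id <unix_ms> <node_id 0..1023> <counter 0..4095>
make_id() {
  local ts=$(( $1 - EPOCH_MS ))
  echo $(( (ts << (NODE_BITS + SEQ_BITS)) | ($2 << SEQ_BITS) | $3 ))
}

# split_id <id>  ->  prints "<unix_ms> <node_id> <counter>"
split_id() {
  local id=$1
  local ts=$(( (id >> (NODE_BITS + SEQ_BITS)) + EPOCH_MS ))
  local node=$(( (id >> SEQ_BITS) & ((1 << NODE_BITS) - 1) ))
  local seq=$(( id & ((1 << SEQ_BITS) - 1) ))
  echo "$ts $node $seq"
}

a=$(make_id 1291900000000 1 0)   # issued in data centre 1
b=$(make_id 1291900000001 2 0)   # issued in data centre 2, 1 ms later
echo "$a $b"
split_id "$b"
```

Because the time stamp occupies the most significant bits, sorting IDs numerically sorts them by creation time (up to clock skew between machines), while the data-centre bits keep IDs issued in the same millisecond in different data centres distinct.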

With gizzard, Twitter released a rather general sharding implementation that is used internally to run distributed versions of Lucene, MySQL and Redis (which is to be introduced for caching tweet timelines because it explicitly supports lists as value data structures, which memcached does not).

Further tools include FlockDB for large-scale social graph storage and analysis, Rainbird for time series analysis (though with OpenTSDB something comparable is available for HBase) and Haplocheirus for message vector caching (currently based on memcached, soon to be migrated to Redis for its richer data structures). The queries available through the front end are rather limited, which makes it easy to provide pre-computed, optimised versions in the back end. As with caching, a trade-off between the hit rate on the pool of pre-computed items and storage cost can be made based on the observed query distribution.

In the back end of Twitter, various statistical and data mining analyses run on top of Hadoop and HBase, for example to compute potentially interesting followers for users or to extract potentially interesting products.
The final take-home message here: go from requirements to the final solution. In the space of storage systems there is no such thing as a silver bullet. Instead you have to carefully evaluate the features and properties of each solution as your data and load increase.

Facebook

When implementing Facebook Messaging (a new feature that was announced this week), Facebook decided to go for HBase instead of Cassandra. The requirements of the feature included massive scale, long-tail write access to the database (which more or less ruled out MySQL and comparable solutions) and a need for strict ordering of messages (which ruled out any eventually consistent system).

A team of 15 developers (including operations and frontend) worked on the system for one year before it was finally released. The feature integrates Facebook messages, IM, SMS and email into one single system, making it possible to group all messages by conversation no matter which device was originally used to send them. That way each user’s inbox turns into a social inbox.

Adobe

Cosmin Lehene presented four use cases of Hadoop at Adobe. The first one dealt with creating and evaluating profiles of Adobe Media Player users. Each user was associated with a vector describing which genres the media they consumed belonged to. These vectors were then used to generate recommendations for additional content to view, in order to increase consumption. Adobe built a clustering system that interfaced Mahout’s canopy and k-means implementations with their HBase back end to group users. Thanks, Cosmin, for including that information in your presentation!

A second use case focussed on learning more about how Flash is used on the internet. Using Google to search for Flash content was no good, as only the first 2000 results could be viewed, resulting in a highly skewed sample. Instead they used a combination of Nutch for retrieving the content and HBase for storage. The analysis covered various features of Flash movies, such as frame rates, and revealed a large gap between the perceived typical usage and the actual usage of Flash on the internet.

The third use case involved analysis of images and usage patterns on the Photoshop-in-a-browser edition of Photoshop.com. The fourth use case dealt with scaling the infrastructure that powers Business Catalyst, a turn-key online business platform including analytics, campaigning and more. When it was purchased by Adobe the system was very successful business-wise; however, the infrastructure was by no means able to cope with the load it had to accommodate. Moving to a back end based on HBase led to better performance and faster report generation.

General, Hacking, Mahout

Devoxx – Day two – Caching

December 7th, 2010 at 9:22pm

Day two started with a really good talk on caching architectures by Greg Luck. He first explained why caching works: even with SSDs now being available, there is still a huge performance gap between RAM access times and having to go to disk. The issue is even worse in distributed architectures that make frequent calls to remote systems.

When sizing systems for typical load, what is often forgotten is that there is no such thing as typical load: the load distribution observed over one day for a service used mainly in one time zone usually has the shape of an elephant, with most queries issued around lunch time (the head of the elephant) and another, smaller peak during the afternoon. This pattern repeats in the weekly distribution, and again in the yearly distribution. At the peak time of the peak day of the year, your load may be several orders of magnitude higher than the average load.

Although query volume may be high in most applications that turn to caching, these queries usually follow a power-law distribution: a few queries are issued very frequently, while most queries are rather rare. This pattern allows for high cache hit rates, substantially reducing load even during very busy times.
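A quick back-of-the-envelope calculation shows why a skewed distribution is so cache-friendly. The numbers here are assumptions, not figures from the talk: query popularity is taken to follow Zipf's law with exponent 1 (the i-th most popular query gets weight 1/i), there are 100,000 distinct queries, and the cache holds only the 1,000 most popular ones.

```shell
#!/usr/bin/env bash
# Worked example: share of requests served by caching only the most popular
# queries, assuming Zipf-distributed popularity (weight 1/i for rank i).
# N and K are illustrative assumptions.
set -euo pipefail

N=100000   # distinct queries
K=1000     # queries that fit into the cache (1% of N)

hit_rate=$(awk -v n="$N" -v k="$K" 'BEGIN {
  for (i = 1; i <= n; i++) {
    w = 1 / i              # Zipf weight of the i-th most popular query
    total += w
    if (i <= k) top += w   # requests the cache can answer
  }
  printf "%.2f", top / total
}')

echo "caching $K of $N queries serves about $hit_rate of all requests"
```

Under these assumptions the script reports a hit rate of about 0.62: caching just 1% of the distinct queries answers roughly 60% of all requests.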

The speaker went into more detail on different architectures. Usually projects start with one cache located directly on the frontend server. When scaling horizontally by adding more and more frontends, this leads to ever-increasing load on the database, as within the lifetime of a cached item each frontend has to fetch that item from the database separately. The first remedy usually employed is to link the caches to each other, increasing cache hit rates. The problem here is updates racing to the various caches when the same query is issued to the back end by more than one frontend. The usual next step is to go for a distributed remote cache such as memcached. Of course this has the drawback of requiring a network call for each cache access, slowing down response times by several milliseconds. Another problem with distributed caching systems is a theorem well known to people building distributed NoSQL databases: CAP says that you can get only two of the three desired properties consistency, availability and partition tolerance. Ehcache with a Terracotta back end lets you configure where your priority lies.

Hacking

Devoxx University – Productive programmer, HBase

December 4th, 2010 at 9:17pm

The first day at Devoxx featured several tutorials; most interesting to me was “The Productive Programmer”. The speaker is also the author of the book of the same name published by O’Reilly. The book grew out of the observation that developers today are more and more IDE-bound and no longer able to use the command line effectively. The result is developers who are unnecessarily slow when creating software. The goal was to bring the usage patterns of productive software development to juniors who grew up in a GUI-only environment. However, half-way through the book it became apparent that a book on command-line wizardry alone is barely interesting at all, so the focus was shifted to include more general productivity patterns.
The first goal was to accelerate development, mostly by avoiding time-consuming usage patterns (minimise mouse usage) and by automating repetitive tasks (computers are good at doing dull, repetitive tasks; that’s what they are made for).
The second goal was increasing focus. The main ingredient is switching off anything that disturbs the development flow: no more pop-ups, no more mail notifications, no more flashing side windows. If you have ever caught yourself thinking “So late already?” when your colleagues were heading out for lunch, then you know what is meant by being in the flow. It takes up to 20 minutes to get into this mode, but just a fraction of a second to be thrown out of it. With developers being significantly more productive in this state, it makes sense to reduce the risk of being thrown out.
The third goal was about canonicality, the fourth about automation.
During the morning I also hopped on and off the Hadoop tutorial: it was a great way to get into the system, and Tom White went into detail, also explaining several of the most common advanced patterns. Of course there was not that much new stuff if you sort-of know the system already :)

General, Hacking

First steps with git

October 30th, 2010 at 7:47pm

A few weeks ago I started to use git not only for tracking changes in my own private repository but also for Mahout development and for reviewing patches. My setup is probably a bit unusual, so I thought I’d describe it first before diving deeper into the specific steps.

Workflow to implement

For my development I wanted to follow Mahout trunk very closely, merging in upstream changes whenever I resumed work on the code. I wanted to be able to work from two different client machines located at two distinct physical locations. I was fine with publishing any changes or intermediate progress online.

The tools used

I set up a clone of the official Mahout git repository on GitHub as a place to check changes into and to publish my own changes.

On each machine, I cloned this GitHub repository. After that I added the official Mahout git repository as the upstream repository to be able to fetch and merge in any upstream changes.

Command set

After forking the official Mahout repository into my own GitHub account, I used the following commands on each client machine to clone and set up the repository. See also the GitHub help on forking git repositories.

#clone the github repository
git clone git@github.com:MaineC/mahout.git

#add upstream to the local clone
git remote add upstream git://git.apache.org/mahout.git

One additional piece of configuration that made life easier was setting up a list of files and file patterns for git to ignore.
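Such an ignore list is simply a `.gitignore` file in the repository. The patterns below (IntelliJ, Eclipse and Maven build output) are assumptions about what a Mahout setup might need, not the author's actual list; the sketch uses a throwaway repository so it can be run anywhere git is installed.

```shell
#!/usr/bin/env bash
# Sketch: keep IDE and build files out of git with a .gitignore file,
# shown in a throwaway repository. The patterns are example assumptions.
set -euo pipefail

tmp=$(mktemp -d)
cd "$tmp"
git init -q

cat > .gitignore <<'EOF'
# IntelliJ IDEA
.idea/
*.iml
# Eclipse
.classpath
.project
.settings/
# Maven build output
target/
EOF

# Files an IDE would leave behind, plus some real source.
mkdir -p .idea src
touch .idea/workspace.xml mahout-core.iml src/Main.java

# Only .gitignore and src/ should be reported as untracked.
git status --short
```

To apply such patterns to every repository on a machine, git also supports a global excludes file configured via `core.excludesFile`, for example `git config --global core.excludesFile ~/.gitignore_global`.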

Each distinct changeset (be it a code review, code style changes or steps towards my own changes) would then be done in its own local branch. To share it with other developers and make it accessible from my second machine, I used the following commands on the machine used for initial development:

#create the branch
git branch MAHOUT-666

#switch to it before committing any work
git checkout MAHOUT-666

#publish the branch on github
git push origin MAHOUT-666

To get all changes, both from my first machine and from upstream, onto the second machine, all that was needed was:

#select correct local branch
git checkout trunk

#get and merge changes from upstream
git fetch upstream
git merge upstream/trunk

#get changes from github
git fetch origin
git merge origin/trunk

#get branch from above
git checkout -b MAHOUT-666 origin/MAHOUT-666

Of course, pushing changes directly into an Apache repository is not possible. So I would still end up creating a patch, submitting it to JIRA for review and, in the end, applying and committing it via svn. As soon as these changes made it into the official trunk, all branches created earlier were rendered obsolete.
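For the JIRA step, a topic branch can be turned into a patch file with `git diff`. This is a minimal sketch, assuming the patch is later applied to an svn checkout with `patch -p0` (a common convention on Apache projects, not something the original post specifies): `--no-prefix` drops git's `a/` and `b/` path prefixes so the paths match. Repository, file and branch are throwaway examples.

```shell
#!/usr/bin/env bash
# Sketch: turn a local topic branch into a patch file for JIRA.
# Repository, file contents and branch name are throwaway examples.
set -euo pipefail

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/trunk   # call the main branch "trunk"
git config user.name "Demo"              # identity for the demo commits
git config user.email "demo@example.com"

echo "hello" > README
git add README
git commit -q -m "initial import"

# The change lives on its own branch.
git checkout -q -b MAHOUT-666
echo "hello, Mahout" > README
git commit -q -am "MAHOUT-666: change greeting"

# Diff from the point where the branch left trunk to the branch tip;
# --no-prefix drops git's a/ and b/ prefixes so `patch -p0` can apply it.
git diff --no-prefix trunk...MAHOUT-666 > "$tmp/MAHOUT-666.patch"
cat "$tmp/MAHOUT-666.patch"
```

The three-dot form `trunk...MAHOUT-666` compares against the merge base, so upstream changes merged into trunk after branching do not leak into the patch.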

What still makes me stick with git, especially for reviewing patches and working on multiple changesets, is its ability to create branches quickly and completely locally. This feature totally changed my established workflow for keeping changesets separate:

With svn I would create a separate checkout of the original repository from the remote server, then make my changes or just apply a patch for review. To speed things up, or to be able to work offline, I would keep one svn checkout clean, copy it to a different location and apply the patch only there.

Combined with an IDE, this workflow meant re-importing each checkout as a separate project. Even though both IntelliJ IDEA and Eclipse are reasonably fast at importing and setting up projects, it would still cost some time.

With git all I do is one clone. After that I can create branches locally without contacting the server again. I usually keep trunk free of any local changes: patches are applied to separate branches for review, and the same goes for any code modifications. That way all work can happen while disconnected from the version control server.
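Reviewing a patch this way boils down to a local branch plus `git apply`. In the sketch below the repository, the issue number MAHOUT-777 and the patch contents are hypothetical examples; `-p0` assumes an svn-style patch without `a/` and `b/` prefixes, and `--check` is a dry run that reports whether the patch applies cleanly before anything is touched.

```shell
#!/usr/bin/env bash
# Sketch: review a patch on its own local branch so trunk stays clean.
# Repository, file contents and the patch itself are throwaway examples.
set -euo pipefail

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/trunk
git config user.name "Demo"
git config user.email "demo@example.com"
echo "hello" > README
git add README
git commit -q -m "initial import"

# A patch as it might be attached to a JIRA issue (svn style: no a/ b/ prefixes).
cat > "$tmp/MAHOUT-777.patch" <<'EOF'
--- README
+++ README
@@ -1 +1 @@
-hello
+hello, Mahout
EOF

# Branching is purely local: no round trip to the server.
git checkout -q -b review/MAHOUT-777 trunk
git apply --check -p0 "$tmp/MAHOUT-777.patch"   # dry run: does it apply cleanly?
git apply -p0 "$tmp/MAHOUT-777.patch"
git commit -q -am "Apply MAHOUT-777 for review"

# Trunk is untouched; switch back and forth with git checkout.
git checkout -q trunk
cat README
```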

Combined with IntelliJ IDEA it gets even better: the IDE regularly scans the file system for updated files, so after each git checkout the IDE automatically adjusts to the changed source code, avoiding project re-creation. The same is of course possible with Eclipse; it just takes one additional click on the Refresh button.

For me git helped speed up my work processes and supported use cases that would otherwise have involved sending patches back and forth between separate mailboxes. That way working with patches and changesets felt much more natural and was better supported by the version control system itself. In addition, it is of course a great relief to be able to check in, diff, log, check out etc. even when disconnected from the network, which for me is still one of the biggest advantages of any distributed version control system.

Update
Lance Norskog recently pointed out one more step that is helpful:

You didn’t mention how to purge your project branch out of the github fork. From http://help.github.com/remotes/: Deleting a remote branch or tag

This command is a bit arcane at first glance… git push REMOTENAME :BRANCHNAME. If you look at the advanced push syntax above it should make a bit more sense. You are literally telling git “push nothing into BRANCHNAME on REMOTENAME”. And, you also have to delete the branch locally also.
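Put together, cleaning up an obsolete branch takes two commands. In this sketch a local bare repository stands in for the GitHub fork so it can be run offline; the branch name follows the example above. Current git versions also accept the more readable `git push origin --delete MAHOUT-666`.

```shell
#!/usr/bin/env bash
# Sketch: remove a topic branch from a remote and locally once it is obsolete.
# A local bare repository stands in for the GitHub fork ("origin").
set -euo pipefail

tmp=$(mktemp -d)
git init -q --bare "$tmp/fork.git"
git clone -q "$tmp/fork.git" "$tmp/work" 2>/dev/null   # hide "empty repository" warning
cd "$tmp/work"
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "initial commit"
git push -q origin HEAD

git branch MAHOUT-666
git push -q origin MAHOUT-666

# "Push nothing into MAHOUT-666 on origin": deletes the remote branch.
git push -q origin :MAHOUT-666

# The local branch has to be deleted separately (-d refuses unmerged work).
git branch -d MAHOUT-666
git ls-remote --heads origin
```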

Hacking