Keepers of secrets - FOSDEM 09

2013-02-20 20:49

The closing keynote was given by Leslie Hawthorn, whom I had the pleasure of meeting last year during Berlin Buzzwords. In her talk she shared insights into a topic commonly encountered in open source leadership that is talked about far less often than it should be: being in the role of a community leader, people will share all sorts of confidential information with you and ask you not to pass it on to others, no matter how beneficial that might be for both parties.

Essentially, if you've never been a community leader: the role is much less about technical skills and much more about strategy, marketing, developer events and, really, unpaid therapy.

Leslie first introduced the types of secrets:

  • There are lots of one-on-one communications. There are several small group conversations. After all this is what makes humans human. However no matter how much a community trusts the people meeting in small groups, ultimately someone will feel betrayed, someone will suspect evil things being drafted in those discussions – even though the conversation really may just involve the quality of the beer they had yesterday.
  • Being social entities we ultimately need input from our peers. This may mean that we need input on topics we know perfectly well we are not supposed to discuss with anyone.
  • There are secrets that are only secrets when told to the wrong person. There is information that is shared publicly – as in “on a website that requires no authentication whatsoever” – but that, given how people actually discover information, will never make it to the right person anyway.
  • Some things are innocuous.
  • Some things are blindingly apparent, but aren't told anyway.

All of this becomes all the more interesting once you become a community leader. Your ultimate goal is to foster empathy and inclusion. You have to understand not only what you communicate, but also how to say certain things.

One example: assume there is a contributor in a critical code path who is having a hard time privately and appears online less and less often. He told you the reason why, but asked you not to talk about it, for whatever reason. On the other hand the community – uninformed as they are – is losing trust in the community member, blaming him for stopping progress. How should you react? Well, the three solution paths are extremely obvious, but that doesn't make them any easier:

  • Encourage disclosure.
  • Ask for permission to disclose parts yourself.
  • Encourage the community to talk to the individual directly.

The worst you can do is to ignore the issue. Still that is what many people do, simply because it is the most comfortable solution. Go out of your comfort zone – your goal should be to make your project thrive.

What about the one person who just doesn't get that they are hurting the project? The good-hearted person whose actions slow the project down? People on your project will get cranky and waste cycles herding volunteer work if you avoid dealing with this person. There is no manual for dealing with frustration and feelings in open source projects – though “How Open Source Projects Survive Poisonous People” by Brian Fitzpatrick and Ben Collins-Sussman is a great introduction to the topic, as is their book “Team Geek”, published by O'Reilly.

Though these issues are messy and make you feel uncomfortable – do deal with them as quickly as you can, otherwise they will kill your project. Correct the educational issues of the person in question, suggest other ways for them to be effective, and ultimately be willing to kindly but sincerely ask the person to move on.

We negotiate every day – most of these negotiations we do not even notice, as they happen in the comfort zone of “I like the person and our interests are very well aligned.” The uncomfortable ones that we actually remember are the situations where we do not like the people involved but are well aligned, where we like the people involved but aren't well aligned, or, in the extreme case, where we neither like the people nor are aligned. Especially in the uncomfortable situations it makes sense to remember that negotiations really come in several stages:

  • Being willing to openly ask for what you need.
  • Asking for what you need.
  • Finding common ground and reaching agreement.
  • If that is impossible, finding the best alternative for both.
  • If that is still impossible, agreeing that no agreement could be reached.

Value honesty above all, but really do not be a tactless jerk. Diplomacy in order to reach your goals is ok: Ultimately you have to decide whether you want to be right or whether you want to win.

To summarise: make sure you care about your project – the people in it will need the most love exactly when you have the most reason to hate them.

One final recommendation, after an audience question on leader burnout: noticing burnout is as easy as observing that each morning you wake up with that “oh no, I don't want to do this, I want to walk away” feeling about work that used to be a lot of fun. First counter-measure: RUN AWAY! Take a vacation, turn off your electronics, hug a tree – get away from whatever is wearing you down. After returning, make sure you involve your peers in your work. If you cannot get on with your former pet project, find a successor. Nothing will kill your project faster than a burnt-out leader dragging it down. The reason for your burnout really can be as simple as having seen the same negative things over and over again, so that you do not want to deal with them yet again, and having seen the same positive things over and over again, so that they no longer give you any reward for your work. It may just be time to move on and do something else.

On making Libre Office suck less – a major refactoring effort - FOSDEM 08

2013-02-19 20:47
LibreOffice is currently in a phase of code cleanup and refactoring that turns the whole code base upside down. That means people need tooling to keep quality from going down and to allow new features in without too much risk. The project has had good experiences with using Gerrit for code review of patches, Tinderbox for fast integration testing, strict whitespace checks to avoid unintended mistakes, and Clang compiler plugins. They keep process light, which allows for change anywhere in any part of the code base.

There is an easy hacks page for people to get started quickly. I know that kind of thing from the Hadoop issue tracker and really appreciate having it to get new developers comfortable with the code base and all the tooling around it. They apply reply-header mangling to allow responses to go back to posters on their mailing list without prior subscription. They moved from their own dmake, which wasn't an industry standard, to standard make tooling to build the project. They are in the process of translating all the German comments – shout-out to all German-speaking readers of this blog: help the LibreOffice developers understand the code better by providing your German language skills for translation.

Some anecdotes: they found 4+ string classes in the code base and only recently managed to get rid of one of them. They are busy killing dead code, kicking out string macros, fixing cpplint warnings, refactoring code that was written before the STL into clean STL code, getting rid of obsolete libraries, fixing the Windows installer, killing proprietary translation services and replacing them with an open one, getting rid of cargo-cult componentisation, and getting rid of code duplication, e.g. in the import filter implementation. They are reducing structure sizes for Calc, switching to a layout-based frontend, and optimising redlining (change tracking) in Writer. The goal is to really have no no-go areas.

In order to retain quality in this fluid setup they opted for a drastic increase in unit and integration test coverage, using bug documents as sources for tests. Through the Bugzilla assistant they made it much easier even for non-experts and end-users to submit bug reports.

They are going for time-based releases every six months. Due to long build times they keep all past binary builds around for bisection purposes – currently in git, which most likely isn't the ideal choice.

What works is putting graphs of bugs created vs. fixed over time in front of developers to keep the number of bugs low.

With version 4.0 LibreOffice is shipping:

  • better interop features for Word documents with comments
  • RTF drawing import
  • improved RTF formula import
  • DOCX annotation support
  • CMIS support for better interaction with SharePoint, Alfresco and Nuxeo
  • more import filters, e.g. for Microsoft Publisher
  • complete Visio support
  • support for arbitrary XML-to-spreadsheet mappings
  • conditional formulae
  • stock option pricing support
  • Android remote control support for slides
  • LibreLogo integration for schools
  • improved image rendering, smoothing, re-sizing and scaling
  • better support for right-to-left Arabic scripts
  • style previews for fonts
  • better Unity integration

There even is an Android port in the works!

E17 - FOSDEM 07

2013-02-18 20:46
I'm really glad the NoSQL room was packed in the afternoon – otherwise I'd have missed an amazing talk by the people behind Enlightenment – a window manager that is older than Gnome, nearly as old as KDE, and has been my favourite choice for years and years (simply because it has sensible default configuration options: focus follows mouse, virtual desktops that switch when moving the mouse close to the screen edge, a menu that opens when clicking anywhere on the desktop background, options for remembering window placement and configuration across restarts, etc.).

Finally, in December last year, they actually did release E17 after more than a decade of work. It now features a tiling module, split desktops per screen, launchers, taskbars, systrays (*brrr*), screenshotting with multiple sharing options, and custom layout modules for desktop and mobile.

There's a full fledged file manager that is also used as file selector. There is a compositor with wayland client support, that works decently even on old or slow hardware (think raspberry pi).
Their main goal is not to build a window manager that even your grandma can use. Rather they focus on stuff for the geeks that just works, is efficient, has lots of eye candy and when run on a nexus7 instead of unity saves 200MB of RAM.

Their main goal is to be a base for touch and mobile development. The number of desktops is shrinking, giving way to more and more mobile devices. Fortunately the project is now sponsored (as in: paid developers) by Samsung as part of their Tizen efforts. E17 has worked as part of Tizen for years now; the only missing piece is a purchasable product running the software.

The goals for E18 (to be released end of 2013 – hear, hear) include going beyond the desktop, to polish things up, provide more default profiles for diverse devices, optimise battery and memory consumption, run without swap space, avoid going to memory instead of the cache to avoid draining the battery of mobile devices. There will be image and font sharing across processes, faster software rendering, async rendering with more threads. There's even thoughts to deal with different finger size issues on touch devices.

On using the composite manager as default: It made the code and optimisation a whole lot easier, though there are still issues with multiple screens that all switch compositing off in case of full screen games that cannot run with it turned on.

There will be work to integrate better with wayland, support for physics and sounds in themes, more compositing signals, improved gadget infrastructures, easier content sharing options – and all the cool stuff users can think of.

Systemd - FOSDEM 06

2013-02-18 20:45
As sort of a “go out of your comfort zone and discover new stuff” exercise I went to the “systemd – two years later” talk next. It's just plain amazing to see a machine boot in roughly one second (not counting the 7s the BIOS needs for initialisation). The whole project started as an init-only project but has since grown to serve a much larger purpose: an init platform ranging from mobile, embedded and desktop devices to servers – many of its features were simply overdue across the board.

Essentially the event-based system brings together what was split and duplicated before across things like ConsoleKit, sysvinit, initscripts, inetd, pm-utils, acpid, syslog, watchdog services, cgrulesd, cron and atd. It brings support for event-based container spawning, suspending and shutdown, which opens up whole new opportunities for optimisation. In addition, for the first time in the history of Linux there is the possibility of grouped resource management: instead of having nice levels bound to individual processes, you can now group services into cgroups and give them guaranteed resources (which makes resource management of, e.g., multiple Apache processes plus some MySQL instances all running on the same machine so much easier).
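As a sketch of what that grouped resource management looks like in practice: the directives below are the cgroup knobs systemd exposed around that time (newer systemd versions prefer CPUWeight=/MemoryMax=), and the service and file names are just examples.

```ini
# Illustrative drop-in, e.g. /etc/systemd/system/mysql.service.d/resources.conf
[Service]
CPUShares=1024      # relative CPU weight for the whole service cgroup
MemoryLimit=2G      # memory cap covering all processes of the service
BlockIOWeight=500   # relative disk I/O weight
```

The point is that the limits apply to the service as a group, no matter how many processes it forks.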

(Post kindly proof-read and corrected by Thilo Fromm)

Notes on storage options - FOSDEM 05

2013-02-17 20:43

Second day at FOSDEM for me started with the MySQL dev room. One thing that made me smile was in the MySQL new features talk: The speaker announced support for “NoSQL interfaces” to MySQL. That is kind of fun in two dimensions: A) What he really means is support for the memcached interface. Given the vast number of different interfaces to databases today, announcing anything as “supports NoSQL interfaces” sounds kind of silly. B) Given the fact that many databases refrain from supporting SQL not because they think their interface is inferior to SQL but because they sacrifice SQL compliance for better performance, Hadoop integration, scaling properties or others this seems really kind of turning the world upside-down.

As for new features – the new MySQL release improved the query optimiser and subquery support. When it comes to replication there were improvements along the lines of performance (multi-threaded slaves etc.), data integrity (replication checksums being computed, propagated and checked), agility (support for time-delayed replication), failover and recovery.
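Time-delayed replication, for instance, is configured on the replica with a single statement (MySQL 5.6 syntax; the one-hour delay here is just an example value):

```sql
-- On the slave: apply events no sooner than one hour
-- after the master executed them.
STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 3600;
START SLAVE;
```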

There were improvements along the lines of performance schemata, security, workbench features. The goal is to be the go-to-database for small and growing businesses on the web.

After that I joined the systemd in Debian talk. Looking forward to systemd support in my next Debian version.

HBase optimisation notes

Lars George's talk on HBase performance notes was pretty much packed – like any other of the NoSQL (and really also the community/marketing and legal dev room) talks.

Lars started by explaining that by default HBase is configured to reserve 40% of the JVM heap for the in-memory stores that buffer writes, 20% for the block cache that speeds up reads, and leaves the rest as breathing room.

On read, HBase will first locate the correct region server and route the request accordingly – this information is cached on the client side for faster access. Prefetching on boot-up is possible to save a few milliseconds on first requests. In order to touch as few files as possible when fetching, bloom filters and time ranges are used. In addition the block cache is queried to avoid going to disk entirely. A hint: leave as much memory as possible to the OS file cache for faster access. When monitoring reads, make sure to check the metrics exported by HBase, e.g. by tracking them over time in Ganglia.

The cluster size will determine your write performance: HBase files form so-called log-structured merge trees. Writes are first stored in memory and in the write-ahead log (WAL, stored – and as a result replicated – on HDFS). This information is flushed to disk periodically, either when there are too many log files around or when the system comes under memory pressure. WALs without pending edits are discarded.

HBase files are written in an append-only fashion. Regular compactions make sure that deleted records actually get removed.
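The write path described above can be sketched in a few lines – this is a toy illustration, not the HBase API, and all names here are made up: every put hits the append-only WAL first, then the in-memory store; once the memstore is big enough it is flushed to an immutable, sorted file.

```python
import os
import tempfile

class TinyMemstore:
    """Toy log-structured-merge write path: WAL first, memory second,
    sorted immutable files on flush. Illustrative only, not HBase code."""

    def __init__(self, wal_path, flush_size=4):
        self.wal = open(wal_path, "a")   # append-only write-ahead log
        self.mem = {}                    # in-memory store
        self.flush_size = flush_size
        self.flushed_files = []

    def put(self, key, value):
        self.wal.write(f"{key}={value}\n")   # durability first
        self.wal.flush()
        os.fsync(self.wal.fileno())
        self.mem[key] = value
        if len(self.mem) >= self.flush_size:
            self.flush()

    def flush(self):
        # Write a sorted, immutable file, then drop the memstore; the
        # WAL entries it covered could now be discarded.
        fd, path = tempfile.mkstemp(suffix=".hfile")
        with os.fdopen(fd, "w") as f:
            for k in sorted(self.mem):
                f.write(f"{k}={self.mem[k]}\n")
        self.flushed_files.append(path)
        self.mem = {}

store = TinyMemstore(tempfile.mkstemp(suffix=".wal")[1], flush_size=2)
store.put("row1", "x")
store.put("row2", "y")   # second put crosses flush_size -> flush to disk
```

Compactions would later merge several of those flushed files back into one, which is exactly where the "compaction storm" problem below comes from.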

In general the WAL file size is configured to be 64 to 128 MB. In addition, only 32 log files are permitted before a flush is forced. In periods of high write volume this file size or file count can be too small, which hurts in particular because flushes sync across all stores: large cells in one column family will cause a lot of writes.
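To get a feeling for those defaults, the maximum amount of edit data that can pile up before a flush is forced is simply the product of the two settings – a back-of-the-envelope sketch, not an HBase API:

```python
def max_unflushed_mb(wal_file_size_mb=128, max_wal_files=32):
    # Upper bound on un-flushed edits the WALs can hold before a
    # flush is forced, using the defaults quoted above.
    return wal_file_size_mb * max_wal_files

print(max_unflushed_mb())  # 4096 MB, i.e. about 4 GB of edits
```

At sustained write rates of tens of MB/s that buffer can fill in minutes, which is why the talk recommends tuning both knobs together.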

Bypassing the WAL is possible though not recommended as it is the only source for durability there is. It may make sense on derived columns that can easily be re-created in a co-processor on crash.

Too small WAL sizes can lead to compaction storms on your cluster: many small files that then have to be merged sequentially into one large file. Keep in mind that flushes happen across all column families even if just one family triggers the flush.

Some handy numbers to have when computing the write performance of your cluster and sizing the HBase configuration for your use case: HDFS has an expected throughput of 35 to 50 MB/s. Given different cell sizes, this is how that number translates to HBase write performance:

Cell size   Expected ops/s
0.5 MB      70-100
100 kB      250-500
10 kB       ~800 – less than the naive estimate, as HBase is not optimised for cells this small
1 kB        ~6000 – see above
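The large-cell rows are simply HDFS throughput divided by cell size; here is that arithmetic as a tiny sketch (not HBase code – and note the naive estimate overshoots for small cells, where per-operation overhead dominates):

```python
def expected_ops_per_s(cell_size_mb, hdfs_throughput_mb_s=(35, 50)):
    """Naive ops/s ceiling derived from the quoted HDFS throughput
    range; real numbers fall short for small cells (10 kB, 1 kB rows)."""
    low, high = hdfs_throughput_mb_s
    return low / cell_size_mb, high / cell_size_mb

print(expected_ops_per_s(0.5))  # (70.0, 100.0) -> the 70-100 row above
```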

As a general rule of thumb: have your memstore size be driven by the number of regions and the flush size. Have the number of allowed WAL logs before a flush be driven by fill and flush rates. The capacity of your cluster is driven by the JVM heap, region count and size, and key distribution (check the talks on HBase schema design). There might be ways to get around the Java heap restriction through off-heap memory, but that is not yet implemented.

Keep enough, and large enough, WAL logs; do not oversubscribe the memstore space; keep the flush size within the right boundaries; check WAL usage on your cluster. Use Ganglia for cluster monitoring. Enable compression, tweak the compaction algorithm to cap background I/O, keep unevenly sized column families in separate tables, and watch the metrics for block cache and memstore.

Affero GPL Panel discussion - FOSDEM 04

2013-02-16 20:41
The panel started with a bit of history of the AGPL: born in the age of growing ASP (application service provider) businesses, the AGPL tried to fix the hosting loophole in the GPL in the early 2000s. More than ten years later, it turns out the license hasn't quite gained traction: on the one hand the license has a few wording issues. In addition it is still rather young and used by few, so there is less trust – compared to the GPL or ASL – that it will hold up when put on trial. However, there's another reason for low adoption:

Those being targeted with the license – people developing web services – tend to prefer permissive licenses over copyleft ones (see Django or Rails, for example). Companies are still trying to secure strong positions when opening up their infrastructure, so there is a general preference for permissive licenses. Also, many more people now work on open source not as a hobby project but as their day job. As a result, the number of people backing projects that are infrastructure-only, company-driven and trying to establish de-facto standards through the availability of free software is growing.

Depressing for the founders of the AGPL are businesses that use the license to trick corporations into adopting their software as open source, only to later go after them with additional clauses in their terms and conditions to push subscription-based services.

Mozilla legal issues - FOSDEM 03

2013-02-15 22:39
In the next talk Gervase Markham talked about his experience working for Mozilla on legal and license questions. First the speaker summarized what kind of requests he gets most:

  • There are lots of technical support requests.
  • Next on the top list is the question for whether or not shipping Mozilla with a set of modifications is ok.
  • Next is an internal question, namely: Can I use this code?
  • Related to that is the “We have a release in two weeks, can we ship with this code?”
  • Another task is finding code that was used but is not ok.
  • Yet another one is getting code licensed or re-licensed.
  • Maintaining the about:license page is another task.
  • Dealing with ECCN/CCATS (export control classification) requests is another issue that comes up often.

However, there are also bigger tasks: there was the goal of tri-licensing Mozilla. The only issue was that they had accumulated enough individually copyrighted contributions to make that task really tricky. In the end they wrote a tool to pull out all contributor names and sent them mails asking for permission to tri-license. After a little over three years they had responses from all but 30 contributors. As a result, the “find this hacker” campaign was launched on /. and other news sites. In the end most could be found.

As another step towards easier licensing, MPL 2 was introduced to replace 1.1 – it fixes GPL/ASL license incompatibilities, notification and distribution requirements, the special treatment of initial developers, and the use of conditional/Jacobsen language.

There are still a few issues with source files lacking license headers (general advice that has never been tested in court is the concept of license bleeding: if there are files with and without license headers in one folder, most likely those without have the same license as those with – “ahem” ;) ).

There are lots of questions on license interpretation. This includes questions from people wanting to use Mozilla licensed software that wasn't even developed within the Mozilla foundation. Also there are lots of people who do not understand the concept of “free does not mean non-commercial use only”.

Sometimes there are license archeology tasks where people ask, “hey, is that old code yours, and is it under the Mozilla license?”

Another interesting case was a big, completely unknown blue company asking whether the hunspell module, having changed licenses so often (from BSD forked to GPL, changed to LGPL, to CC-Attr, to the tri license of Mozilla, including changed GPL stuff with the author's permission) really can be distributed by Mozilla under the MPL. After lots of digging through commit logs and change logs they could indeed verify that the code is completely clean.

Then there was the case of Firefox OS, which was a fast development effort involving copying lots of stuff from all over the internet just to get things running. A custom license scanner was finally written to verify all bits and pieces and used to give clearance on release. It found dozens of distinct versions of the Mozilla and BSD licenses (mainly because people are invited to add their own name to them when releasing code). As a result, there is now a discussion at the OSI about discouraging that behaviour, to keep the number of individual license files that ship with software down to a minimum.

The speaker's general recommendation for releasing small software projects under a non-copyleft license was to use CC0; for larger projects his recommendation was the ASL, due to its patent grant clauses. Even at Mozilla quite a few projects have switched over to Apache.
There also were a few license puzzlers:

  • OpenJDK asked for permission to use their root-store certificates. Unfortunately at the time of receiving them they had not been given any sort of contract under which they may use them. *ahem*
  • The case with search engine icons … really isn't … so much different.

There also tend to be some questions on the Firefox/Mozilla trademarks ranging from

  • “can I use your logo for purpose such'n'such”?
  • ”Do you have 'best viewed in...' button”? - Nope, as we generally appreciate developers writing web sites that comply with web standards instead of optimizing for one single browser only.
  • They did run into the subscription on download scam trap and could stop those sites due to trademark infringement.
  • Most of this falls under fair use – especially cases like Pearson asking for permission (with a two-page mail + pdf letter) to link to the mozilla web site...

In general when people ask for permission if they do not need to ask: Tell them so but give them permission anyway. This is in order to avoid an “always-ask-for-permission” culture, and really to keep the number of requests down to those that are really necessary. One thing that does need prior permission though is shipping Firefox with a bunch of plugins pre-installed as a complete package.

On Patents – Mozilla does not really have any and spends time (e.g. on OPUS) avoiding them. On a related note there sometimes even are IPO requests.

Trademarks and OSS - FOSDEM 02

2013-02-14 20:38
So the first talk I went to ended up being in the legal dev room on trademarks in open source projects. The speaker had a background mainly in US American trademark law and quite some background when it comes to open source licenses.

To start, Pamela first showed a graphic detailing the various types of trademarks: in the pool of generic names there is a large group of trademarks that are in use but not registered. The number of registered trademarks is actually rather small. The main goal of trademarks is to avoid confusing customers. This is best seen when thinking about scammers tricking users into downloading pre-built and packaged software from third-party servers, demanding a credit card number that is later charged based on a subscription service the user signed up for by clicking away the fine print on the download page. The canonical example seems to be the Firefox/Mozilla project, which was affected by this kind of scam. But other end-user software (think LibreOffice/OpenOffice, Gimp) could well be a target. These deceptive web pages can usually be taken down much faster with a cease-and-desist letter based on trademark infringement than based on the fraud they commit.

So when selecting trademarks – what should a project look out for? For one, the name should not be too generic, as that would make it unenforceable. It should not be too theme-y either, as themed names usually are already taken. The time spent researching should be weighed against the pain it would cost to rename the project in case of any difficulties.

There are few actual court decisions that relate to trademarks and OSS: in Germany it was decided that forking the ENIGMA project and putting it on set-top boxes while keeping the name was OK for as long as the core functionality was kept and third-party plugins would still work.

In the US there was a decision that keeping the brand name on refurbished spark plugs is OK for as long as it is clearly marked what to expect when buying them (in this case refurbished instead of newly made).

Another thing to keep in mind are naked trademarks – marks that were not enforced and have become too ubiquitous. In the US that would be marks lost through naked licensing; in Germany the recycling mark “Der Grüne Punkt” has become too ubiquitous to be treated as a trademark any more.

Trademark law already strains in multinational corporate setups with worldwide subsidiaries. It gets even worse with worldwide distributed open source projects. The questions of who owns the mark, who is allowed to enforce it, and who exercises control get harder the more development is distributed. When new people take over trademarks there should be a clear paper transfer document to avoid confusion.

Trademarks only deal with avoiding usage confusion: using the mark when talking about it is completely fine. Phrases like “I'm $mark compatible” or “I'm running on top of $mark” are completely OK. However, make sure to use as little as possible – there is no right to also use the logos, icons or design elements of the project you are talking about – unless you are talking about said logo, of course.

So to conclude: respect referential use; you can't exercise full control, but you should avoid exercising too little.

The open source community still lacks a consistent understanding of, and behaviour towards, trademarks. Now is the time to shape the law according to what open source developers think they need.

FOSDEM 2013 - 01

2013-02-13 22:38
On Friday morning our train left for this year's FOSDEM. Though a bit longish, I have a strong preference for going by train, as it gives more time and opportunity for hacking (in my case trying out Elasticsearch), reading (in my case the book “Team Geek”) and chatting with other FOSDEM visitors.

Monday morning was mostly busy with meeting people – at the FSFE, Debian and Apache OpenOffice booths, and generally in the hallways. And with getting some coffee to the Beaglebone booth, where my husband helped out. For really fun videos on the hardware they had there see:

if you want to get the hardware underneath talk to circuitco.

Unfortunately I didn't make it to the community and marketing room – too full during the talks that I wanted to see (as a general shout-out to people attending conferences: If you do not find a seat, move into the room instead of standing right next to the door, if you do have a seat and a free one just next to you, move to the seat next to you).

If you missed some of the talks you might want to try your luck with the FOSDEM video archive - it's really extensive featuring videos taken at previous editions as well and is a great resource to find talks of the most important tracks.

FrOSCon - understanding Linux with strace

2012-09-06 20:29
Being a Java child, I had only dealt with strace once before: trying to figure out whether any part of the Mahout tests happened to use /dev/random for initialisation, in a late-night debugging session with my favourite embedded Linux developer. strace itself is a great tool for seeing what your program is doing in terms of system calls, giving you the option to follow what is going on at a very detailed level.

In his talk Harald König gave a very easy to follow overview of how to understand Linux with strace. Starting with the basic use cases (tracing file access and program calls, replaying data, analysing time stamps, doing some statistics), he quickly moved on to more advanced tricks you can do with the little tool: finding syscalls that take surprisingly long vs. times when user code is doing long-running computations. Capturing and replaying e.g. networking-related calls to simulate problems the application runs into. Figuring out bottlenecks (or just plain weird stuff) in the application by finding the most frequent syscall. Figuring out which configuration files an application really touches – sorting them by last-modified date with a bit of shell magic might answer the common question of whether it was the last update, or the last time the user tinkered with the configuration, that made his favourite editor appear green instead of white. On the other hand it can also reveal when configurations have been deleted (in the presentation he moved the user's emacs configuration away; as a result emacs tried >30 times to find it for various configuration options during startup: Is it there? No. ... Is it now there? No. ... Maybe now? Nope. ... ;) ).
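A few invocations that cover these use cases – the flags are standard strace options, while program names, paths and PIDs below are placeholders:

```shell
strace -e trace=file emacs        # which files (configs!) does it try to open?
strace -c ls                      # summary statistics: counts and time per syscall
strace -T -o trace.log ./myapp    # log each call with the time spent inside it
strace -f -tt -p 1234             # attach to a running PID, follow forks, timestamp
```

Piping the `-e trace=file` output through `grep ENOENT` is the quick way to reproduce the "is it there? No." hunt for missing configuration files.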

When looking at strace you might also want to take a look at ltrace, which traces library calls – its output can be a bit more readable, since it shows not just system calls but also library calls. Remember though that tracing everything not only makes your app pretty slow but also quickly generates several gigabytes of information.