FrOSCon 2018

2018-08-29 16:34

A more general summary: of the conference written in German. Below a more detailed summary of the keynote by Lorena Jaume-Palasi.

In her keynote "Blessed by the algorithm - the computer says no!" Lorena detailed the intersection of ethics and technology when it comes to automated decision making systems. As much as humans with a technical training shy away from questions related to ethics, humans trained in ethics often shy away from topics that involve a technical layer. However as technology becomes more and more ingrained in everyday life we need people who understand both - tech and ethical questions.

Lorena started her talk detailing how one typical property of human decision making involves inconsistency, otherwise known as noise: Where machine made decisions can be either accurate and consistent or biased and consistent, human decisions are either inconsistent but more or less accurate or inconsistent and biased. Experiments that showed this level of inconsistency are plenty, ranging from time estimates for tasks being different depending on weather, mood, time of day, being hungry or not up to judges being influenced by similar factors in court.

One interesting aspect: While in order to measure bias, we need to be aware of the right answer, this is not necessary for measuring inconsistency. Here's where monitoring decisions can be helpful to palliate human inconsistencies.

In order to understand the impact of automated decision making on society one needs a framework to evaluate that - the field of ethics provides multiple such frameworks. Ethics comes in three flavours: Meta ethics dealing with what is good, what are ethical requests? Normative ethics deals with standards and principles. Applied ethics deals with applying ethics to concrete situations.

In western societies there are some common approaches to answering ethics related questions: Utilitarian ethics asks which outputs we want to achieve. Human rights based ethics asks which inputs are permissible - what obligations do we have, what things should never be done? Virtue ethics asks what kind of human being one wants to be, what does behaviour say about one's character? These approaches are being used by standardisation groups at e.g. DIN and ISO to answer ethical questions related to automation.

For tackling ethics and automation today there are a couple viewpoints, looking at questions like finding criteria within the context of designing and processing of data (think GDPR), algorithmic transparency, prohibiting the use of certain data points for decision making. The importance of those questions is amplified now because automated decision making makes it's way into medicine, information sharing, politics - often separating the point of decision making from the point of acting. One key assumption in ethics is that you should always be able to state why you took a certain action - except for actions taken by mentally ill people, so far this was generally true. Now there are many more players in the decision making process: People collecting data, coders, people preparing data, people generating data, users of the systems developed. For regulators this setup is confusing: If something goes wrong, who is to be held accountable? Often the problem isn't even in the implementation of the system but in how it's being used and deployed. This confusion leads to challenges for society: Democracy does not understand collectives, it understands individuals acting. Algorithms however do not understand individuals, but instead base decisions on comparing individuals to collectives and inferring how to move forward from there. This property does impact individuals as well as society.

For understanding which types of biases make it into algorithmic decision making systems that are built on top of human generated training data one needs to understand where bias can come from:

The uncertainty bias is born out of a lack of training data for specific groups amplifying outlier behaviour, as well as the risk for over-fitting. One-sided criteria can serve to reinforce a bias that is generated by society: Even ruling out gender, names and images from hiring decisions a focus on years of leadership experience gives an advantage to those more likely exposed to leadership roles - typically neither people of colour, nor people from poorer districts. One-sided hardware can make interaction harder - think face recognition systems having trouble identifying non-white humans, having trouble identifying non-male humans.

In the EU we focus on the precautionary principle where launching new technology means showing it's not harmful. This though proves more and more complex as technology becomes entrenched in everyday life.

What other biases do humans have? There's information biases, where humans tend to reason based on analogy, based on the illusion of control (overestimating oneself, downplaying risk, downplaying uncertainty), there's an escalation of committment (a tendency to stick to a decision even if it's the wrong one), there are single outcome calculations.

For cognitive biases are related to framing, criteria selection (we tend to value quantitative criteria over qualitative criteria), rationality. There's risk biases (uncertainties about positive outcomes typically aren't seen as risks, risk tends to be evaluated by magnitude rather than by a combination of magnitude and probability). There's attitude based biases: In experiments senior managers considered risk taking as part of their job. The level of risk taken depended on the amount of positive performance feedback given to a certain person: The better people believe they are, the more risk they are willing to take. Uncertainty biases relate to the difference between the information I believe I need vs. the information available - in experiments humans made worse decisions the more data and information was available to them.

General advise: Beware of your biases...

FrOSCon - on teaching

2012-09-09 08:17
The last talk I went to during FrOSCon was Selena's keynote on "Mistakes were made". She started by explaining how she taught computer science (or even just computer-) concepts to teachers herself - emphasizing how exhausting teaching can be, how many even trivial concepts were unknown to her students. After that Selena briefly sketched how she herself came to IT - emphasizing how providing mostly the information she needed to accomplish the current task at hand and telling how to get more information helped her make her first steps a great deal.

The main point of her talk however was to highlight some of the underlying causes for the lack of talented cs students. Some background literature is online at her piratepad on the subject.

The discussion that followed the keynote (and included contributions from two very interested, refreshingly dedicated teachers) was quite lively: People generally agreed that computer science/ computing or even just logical and statistical thinking plays a sadly minor role in current education. Students are mainly forced to memorize large amounts of facts by heart but are not taught to question their environment, discover relations or rate sources of information. The obvious question that seemed to follows was that on what to remove from the curriculum when introducing computing as a subject. My personal take on that is that maybe there is no need for removing anything - instead changing the way concepts are taught might already go a long way: Put arts, maths, natural sciences and music into context, have kids evaluate statistics and rate them not only in maths but also in e.g. biology by letting them examine some common statistical fallacies in the subject area.

Another problem stated was the common lack of technical understanding, the common lack of time for preparation and the common lack of understanding for the concept of open source or creative commons content. Taken together this makes sharing teaching material and improving it together with others incredibly hard.

Selena's call to action was for geeks to get involved and educate the people near and dear to them instead of giving up. On thing to add to that: Most German universities have some sort of visitors' days to prospective students - some even have collaborations with schools to do projects together with younger ones - make sure to check out your own university - you might well find out that teaching is not only exhausting but also particularly rewarding especially when teaching students that really want to know and participate in your project just because they want to.

If you know any teachers who are open to the idea of having externals take over some their lessons or at least provide input get them connected with your peers that are interested in educating others. Also keep in mind that most open source projects, hacker spaces and related organisations in Germany are so-called "gemeinnütziger e.V." - a status that in many cases was achieved by declaring the advancement of education as at least one of their goals.

FrOSCon - Git Goodies

2012-09-05 20:34
In his talk on Git Goodies Sebastian Harl introduced not only some of the lesser known git tooling but also gave a brief introduction as to how git organises its database. Starting with an explanation of how patches essentially are treated as blobs identified by SHA1 hashes (thus avoiding duplication not only in the local database but allover the git universe), pointed to by trees that are in turn generated and extended by commits that are in turn referenced by branches (updates on new commits) and tags (don't update on new commits). With that concept in mind it suddenly becomes trivial to understand that HEAD simply is a reference to wherever you next commit is going to in your working directory. It also becomes natural to understand that HEAD pointing just to a commit-id but not to a branch is called a de-tached head.

Commits in git are tracked in three spaces: In the repository (this is where stuff goes after a commit), in the index (this is where stuff goes after an add or rm) and in the working directory. Reverting is symetric: git checkout takes stuff from the repository and puts it into the current working copy. reset --mixed/--hard only touches the index.

When starting to work more with git start reading the man and help pages. They contain lots of goodies that make daily work easier: There are options that allow for colored diffs, setting external merge tools (e.g. vimdiff), setting the push default (just current branch or all matching branches). There are options to define aliases for commands (diff here has a large variety of options that can be handy like coloring only different words instead of lines). There are options to set the git-dir (where .git lies) as well as the working directory which makes it easy to track your website in git but not have the git directory lie in your public_html folder.

There is a git archive to checkout your stuff as tar.gz. When browsing the git history tig can come in handy - it allows for browsing your repository with an ncurses interface, show logs, diffs and the tree of all commits. You can ask it to only show logs that match a certain pattern.

Make sure to also look at the documentation of ref-parse that explains how to reference commits in an even more flexible manner (e.g. master@{yesterday}). Also checkout the git reflog to take a look at the version history of your versioning. Really handy if you ever mess up your repository and need to get back to a sane state. Also a good way to recover detached commits. Take a look at git-bisect to learn more on how to binary-search for commits that broke your build. Use a fine granular way to add changes to your repository with git add -p - do not forget to take a look at git stash as well as cherry-pick.

FrOSCon - Robust Linux embedded platform

2012-09-04 20:05
The second talk I went to at FrOSCon was given by Thilo Fromm on Building a robust embedded Linux platform. For more information on the underlying project see also projec HidaV on github. Slides of the talk Building a robust Linux embedded platform are already online.

Inspired by a presentation on safe upgrade prodedures in embedded devices by Arnaut Vandecappelle in the Embedded Dev Room FOSDEM earlier this year Thilo extended the scope of the presentation a bit to cover safe kernel upgrades as well as package updates in embedded systems.

The main goal of the design he presented was to allow for developing embedded systems that are robust - both in normal operation but also when upgrading to a new firmware version or a set of new packages - the design included support for upgrading and rolling back to a known working state in an atomic way. Having systems deployed somewhere in the wild to power a wind turbine, inside of busses and trains or even within satellites pretty much forbids relying on an admin to press the "reset button".

Original image

The reason for putting that much energy into making these systems robust also lies in the ways they are deployed. Failure vectors include not only your usual software bugs, power failures or configuration incompatibilities. Transmission errors, storage corruption, temperature, humidity add their share to increase the probability of failure.

Achieving these goals by building a custom system isn't too trivial. Building a platform that is versatile enough to be used by others building embedded systems adds to the challenges: Suddenly having easy to use build and debug tools, support for software life-cycle management and extend-ability are no longer nice-to-have features.

Thilo presented two main points to address the requirements: The first is to avoid trying to cater every use case. Setting requirements for a platform in terms of performance, un-brickability (see also urban dictionary, third entry as of this writing). Even setting a requirement for dual boot support or to the internal storage technology used. As a result designing the platform can become a lot less painful.

The second step is to harden the platform itself. Here that means that upgrading the system (both firmware and packages) is atomic, can be rolled-back atomically and thus no longer carries the danger of taking the device down for longer than intended: A device that does no longer perform it's task in the embedded world usually is considered broken and shipped back to the producer. As a result upgrading may be necessary but should never render the device useless.

One way to deal with that is to store boot configurations in a round robin manner - for each configuration a "was booted" (set by the bootloader on boot) and a "is healthy" (set by the system after either a certain time of stability or after running self tests) flags are needed. This way at each boot it is clear what the last healthy configuration was.

To do the same with your favourite package management system is slightly more complicated: Imagine running something like apt-get upgrade with the option to switch back to the previous state in an atomic way if anything goes wrong. One option to deal with that presented is to work with transparant overlay filesystems that allow for having a read-only base layer - and a "transparent" r/w layer on top. If a file does not exist in the transparent layer, the filesystem will return the original r/o version. If it does exist it will return the version in the transparent overlay. In addition there's also an option to mark files as deleted in the overlay.

With that upgrading becomes as easy as installing the upgraded versions into some directory in your filesystem and mounting said directory as transparent overlay. With that roll-back as well as snapshots are easy to do.

The third ingredient to achieving a re-usable platform presented was to use Open Embedded. Including an easy to extend layer-based concept, support for often recent software versions, versioning and dependency modelling, some BSP layers officially supported by hardware manufacturers building a platform on top of Open Embedded is one option to make it easily re-useable by others.

If you want to know more on the concepts described join HiDaV platform project - many of the concepts described are already - or soon to be - implemented.

FrOSCon 2012 - REST

2012-08-29 19:33
Together with Thilo I went to FrOSCon last weekend. Despite a few minor glitches and the "traditional" long BBQ line the conference was very well organised and again brought together a very diverse crowd of people including but not limited to Debian developers, OpenOffice people, FSFE representatives, KDE and Gnome developers, people with background in Lisp, Clojure, PHP, Java, C and HTML5.

The first talk we went to was given by JThijssen on REST in practice. After briefly introducing REST and going a bit into Myths and false believes about REST he explained how REST principles can be applied in your average software development project.

To set a common understanding of the topic he first introduced the four steps REST Maturity Model: Step zero means using plain old xml over http for rpc or SOAP. Nothing particularly fancy here - even to some extend breaking common standards related to http. Going one level up means modeling your entities as resources. Level two is as simple as using the http verbs for what they are intended - don't delete anything on the other side just by using a GET request. Level three finally means using hypermedia controls, HATEOS and providing navigational means to decide on what to do next.

Myths and legends

Rest is always http - well, it is transport agnostic. However mostly it using http for transport.

Rest equals CRUD - though not designed for that it is often used for that task in practice.

Rest scales - as a protocol yes, however of course that does not mean that the backend you are talking to does. All Rest does for you is to give you a means to horizontally scale without having to worry too much about server state.

Common mistakes

Using Http verbs - if you've ever dealt with web crawling you probably know those stories of some server's content being deleted just be crawling a public facing web site just because there was a "delete" button somewhere that would trigger a delete action through an innocent looking GET request. The lesson learnt of those: Use the verbs for what they are intended to be used. One commonly confused thing is the usage of PUT vs. POST. Common rule of thumb that also applies to the CouchDB REST API: Use PUT if you know what the resulting URL should be (e.g. when storing an entry to he database and you know the key that you want to use). Use POST if you do not care about which URL should result from the operation (e.g. if the database should automatically generate a unique key for you). Also make sure to use the error codes as intended - never return error code 2?? only to add an xml snippet to the payload that explains to the surprised user that an error occurred including an error code. If you really need an explanation of why this is considered bad practice if not plain evil, think about caching policies and related issues.

When dealing with resources a common mistake is to stuff as much information as possible into one single resource for one particular use case. This means transferring a lot of additional information that may not be needed for other use cases. A better approach could be to allow clients to request custom views and joins of the data instead of pre-generating them.

When it comes to logging in to your API - don't design around HTTP - use it. Sure you can give a session id into a cookie to the user. However than you are left with the problem of handling client state on the server - which was supposed to be stateless so clients can talk to any server. You could store the logged in information in the client cookie - signing and encrypting that might even make it slightly less weird. However the cleaner approach would be to authenticate individual requests and avoid state altogether.

When it comes to URL design keep in mind to keep them in a format that is easy to handle for caches. An easy check would be to try and bookmark the page you are looking at. Also think about ways to increase the number of cache hits if results are even slightly expensive to generate. Think about an interface to retrieve the distance from Amsterdam to Brussels. The URL could be /distance/to/from - however given no major road issues the distance from Amsterdam to Brussels should be the same as from Brussels to Amsterdam. One easy way to deal with that would be to allow for both requests but to send a redirect to the first version in case a user requests the second. The semantics would be slightly different when asking for driving directions - there the returned answers would indeed differ.

The speaker also introduced a concept for handling asynchronous updates that I found interesting: When creating a resource hand out a 202 accepted response including a queue ticket that can be used to query for progress. For as long as the ticket is not yet being actively dealt with it may even contain cancellation methods. As soon as the resource is created requesting the ticket URL will return a redirect to the newly created resource.

The gist of the talk for me was to not break the Rest constraints unless you really have to - stay realistic and pragmatic about the whole topic. After all, most likely you are not going to build the next Twitter API ;)

FrOSCon 2012

2012-07-31 20:25
On August 25th/26th the Free and Open Source Conference (FrOSCon) will again kick off in Sankt Augustin/ Germany.

The event is completely community organised, hosted by the FH Sankt Augustin. It covers a broad range of free software topics like Arduino microcontrollers, git goodies, politics, strace, open nebula, wireshark and others.

Three highlights that are on my schedule:

Looking forward to interesting talks and discussions at FrOSCon.

Flying back home from Cologne

2009-08-23 20:40
Last weekend FrOSCon took place in Sankt Augustin, near Cologne. FrOSCon is organized on a yearly basis at the university of applied sciences in Sankt Augustin. It is a volunteer driven event with the goal of bringing developers and users of free software projects together. This year, the conference featured 5 tracks, two examples being cloud computing and the Java track.

Unfortunately this year the conference started with a little surprise for me and my boyfriend: Being both speakers, we had booked a room in Hotel Regina via the conference committee. Yet on Friday evening we had to learn that the reservation never actually reached the hotel... So, after several minutes talking to the receptionist, calling the organizers we ended up in a room that was booked for Friday night by someone who was known to arrive no earlier than Saturday. Fortunately for us we have a few friends close by in Düsseldorf: Fnord was so very kind to let us have his guest couch for the following night.

Checkin time next morning: On the right hand side the regular registration booth. On the left hand side the entrance for VIPs only. The FSFE quickly realized it's opportunity: They soon started distributing flyers and stickers among the waiting exhibitors and speakers.

Set aside the organizational issues, most of the talks were very interesting and well presented. The Java track featured two talks by Apache Tomcat committer Peter Roßbach, the first one on the new Servlet 3.0 API, the second one on Tomcat 7. Too sad, my talk was in parallel to his Tomcat talk, so I couldn't attend that. I appreciate several of the ideas on cloud computing highlighted in the keynote: Cloud computing as such is not really new or innovative, it is several good ideas so far known for instance as utility computing that are now being improved and refined to make computation a commodity. At the very moment however cloud computing providers tend to sell their offers as new, innovative products. There is no standard API for cloud computing services. That makes switching from one provider to another extremely hard and leads to vendor-lockin for its users.

The afternoon was filled by my talk. This time I tried something, that so far I only have done in user groups of up to 20 people: I first gave a short introduction into who I am and than asked the audience to describe themselves in one sentence. There were about 50 people, after 10 minutes everyone had given is self-introduction. It was a nice way of getting detailed information of what knowledge to expect from people, and it was interesting to hear people from IBM and Microsoft being in the room.

After that I attended the RestMS talk by Thilo Fromm and Peter Hintjens. They showed a novel, community driven way to standards creation. RestMS is a messaging standard that is based on a restful way for communication. So far the standard itself is still in it's very early stages, still there are some very “alpha, alpha, alpha” implementations out there that can be used for playing around. According to Peter there are actually people who already use these implementations for production servers and send back bug reports.

Sunday started with an overview of the DaVinci VM by Dalibor Topic, the author of the OpenJDK article series in the German Java Magazin. Second talk of the day was an introduction to Scala. I already know a few details of the language, but the presentation made it easy to learn more: It was organised as an open question and answer session with live coding leading through the talk.

After lunch and some rest, the last two topics of interest were on details on the campaigns of FFII against software patents and an overview of the upcoming changes in gnome3.0.

This year's FrOSCon did have some organizational quirks but the quality of most of the talks was really good with at least one interesting topic in one of the sessions at nearly every time slot - though I must admit that that was easy in my case with Java and cloud computing being of interest to me.

Update: Videos are up online.