FrOSCon - Robust Linux embedded platform

2012-09-04 20:05
The second talk I went to at FrOSCon was given by Thilo Fromm on Building a robust embedded Linux platform. For more information on the underlying project see also projec HidaV on github. Slides of the talk Building a robust Linux embedded platform are already online.

Inspired by a presentation on safe upgrade prodedures in embedded devices by Arnaut Vandecappelle in the Embedded Dev Room FOSDEM earlier this year Thilo extended the scope of the presentation a bit to cover safe kernel upgrades as well as package updates in embedded systems.

The main goal of the design he presented was to allow for developing embedded systems that are robust - both in normal operation but also when upgrading to a new firmware version or a set of new packages - the design included support for upgrading and rolling back to a known working state in an atomic way. Having systems deployed somewhere in the wild to power a wind turbine, inside of busses and trains or even within satellites pretty much forbids relying on an admin to press the "reset button".

Original image

The reason for putting that much energy into making these systems robust also lies in the ways they are deployed. Failure vectors include not only your usual software bugs, power failures or configuration incompatibilities. Transmission errors, storage corruption, temperature, humidity add their share to increase the probability of failure.

Achieving these goals by building a custom system isn't too trivial. Building a platform that is versatile enough to be used by others building embedded systems adds to the challenges: Suddenly having easy to use build and debug tools, support for software life-cycle management and extend-ability are no longer nice-to-have features.

Thilo presented two main points to address the requirements: The first is to avoid trying to cater every use case. Setting requirements for a platform in terms of performance, un-brickability (see also urban dictionary, third entry as of this writing). Even setting a requirement for dual boot support or to the internal storage technology used. As a result designing the platform can become a lot less painful.

The second step is to harden the platform itself. Here that means that upgrading the system (both firmware and packages) is atomic, can be rolled-back atomically and thus no longer carries the danger of taking the device down for longer than intended: A device that does no longer perform it's task in the embedded world usually is considered broken and shipped back to the producer. As a result upgrading may be necessary but should never render the device useless.

One way to deal with that is to store boot configurations in a round robin manner - for each configuration a "was booted" (set by the bootloader on boot) and a "is healthy" (set by the system after either a certain time of stability or after running self tests) flags are needed. This way at each boot it is clear what the last healthy configuration was.

To do the same with your favourite package management system is slightly more complicated: Imagine running something like apt-get upgrade with the option to switch back to the previous state in an atomic way if anything goes wrong. One option to deal with that presented is to work with transparant overlay filesystems that allow for having a read-only base layer - and a "transparent" r/w layer on top. If a file does not exist in the transparent layer, the filesystem will return the original r/o version. If it does exist it will return the version in the transparent overlay. In addition there's also an option to mark files as deleted in the overlay.

With that upgrading becomes as easy as installing the upgraded versions into some directory in your filesystem and mounting said directory as transparent overlay. With that roll-back as well as snapshots are easy to do.

The third ingredient to achieving a re-usable platform presented was to use Open Embedded. Including an easy to extend layer-based concept, support for often recent software versions, versioning and dependency modelling, some BSP layers officially supported by hardware manufacturers building a platform on top of Open Embedded is one option to make it easily re-useable by others.

If you want to know more on the concepts described join HiDaV platform project - many of the concepts described are already - or soon to be - implemented.