Open Source Summit - Day 3

2017-10-29 08:35
Open Source Summit Wednesday started with a keynote by members of the Banks family telling a packed room how they approached raising a tech family. The first hurdle that Keila (the teenage daughter of the family) talked about was something I personally had never actually thought about: Communication tools like Slack that are in widespread use come with an age restriction excluding minors. So for a minor, trying to communicate with open source projects means entering illegality.

A bit more obvious was their advice for raising kids' engagement with tech: Try to find topics that they can relate to. What works fairly often are reverse engineering projects that explain how things actually work.

The Banks are working with a goal-based model where children get ten goals to pursue during the year, with regular quarterly reviews. An interesting twist though: Eight of these ten goals are chosen by the children themselves, two are reserved for parents to help with guidance. As obvious as this may seem, having clear goals and being able to influence them yourself is something that I believe is applicable in the wider context of open source contributor and project mentoring as well as employee engagement.

The speakers also talked about embracing children's fear. Keila told the story of how she was afraid to talk in front of adult audiences - in particular at the keynote level. The advice that her father gave that did help her: You can trip on the stage, you can fall, all of that doesn't matter as long as you can laugh at yourself. Also remember that no project is the perfect project - there's always something you can improve - and that's ok. This is fairly in line with the feedback given a day earlier during the Linux Kernel Panel, where people mentioned how today they would never accept the first patch they themselves had once written: Be persistent, learn from the feedback you get and seek feedback early.

Last but not least, the speakers advised to not compare your family to anyone, not even to yourself. Everyone arrives at tech via a different route. It can be hard to get people from being averse to tech to embrace it - start with a tiny little bit of motivation, from there on rely on self motivation.

The family's current project turned business supports L.A. schools in helping children get a handle on tech.

The Tao of HashiCorp

In the second keynote, Mitchell Hashimoto gave an overview of the Tao of HashiCorp - essentially the values and principles the company is built on. What I found interesting about the talk was the fact that these values were written down very early in the process of building up HashiCorp, when the company didn't have much more than five employees. They comprise vision, roadmap and product design pieces and have been applied to everyday decisions ever since.

The principles themselves cover the following points:
  • Workflows - not technologies. Essentially describing a UX first approach where tools are being mocked and used first before diving deeper into the architecture and coding. This goes as far as building a bash script as a mockup for a command line interface to see if it works well before diving into coding.
  • Simple, modular and composable. Meaning that tools built should have one clear purpose instead of piling features on top of each other in one product.
  • Communicating sequential processes. Meaning to have standalone tools with clear APIs.
  • Immutability.
  • Versioning through Codification. When having a question, the answer "just talk to X" doesn't scale as companies grow. There are several fixes to this problem. The one that Hashicorp decided to go for was to write knowledge down in code - instead of having a README.md detailing how startup works, have something people can execute.
  • Automate.
  • Resilient systems. Meaning to strive for systems that know their desired state and have means to go back to it.
  • Pragmatism. Meaning that the principles above shouldn't be applied blindly but adjusted to the problem at hand.


While the content itself differs, I find it interesting that HashiCorp decided to communicate in terms of their principles and values. This kind of setup reminds me quite a bit of the way the Amazon Leadership Principles are applied and used inside of Amazon.

Integrating OSS in industrial environments - by Siemens

The third keynote was given by Siemens, a 170-year-old German corporation with 350k employees, focussed on industrial appliances.

In their current projects they are using OSS in embedded projects related to power generation, rail automation (Debian), vehicle control, building automation (Yocto), medical imaging (xenomai on big machines).

Their reason for tapping into OSS more and more is to grow beyond their own capabilities.

A challenge in their applications relates to long-term stability, meaning supporting an appliance for 50 years and longer. Running their appliances unmodified for years is no longer feasible today due to policies and corporate standards that require updates in the field.

Trouble they are dealing with today is the cost of software forks - both self-inflicted and supplier-caused. The cost attached to these is one of the reasons for Siemens to think upstream-first, both internally as well as when choosing suppliers.

Another reason for this approach is found in trying to become part of the community, for three reasons: Keeping talent. Learning best practices from upstream instead of failing oneself. Better communication with suppliers through official open source channels.

One project Siemens is involved with at the moment is the so-called Civil Infrastructure Platform project.

Another huge topic within Siemens is software license compliance. Being a huge corporation they rely on FOSSology for compliance checking.

Linus Torvalds Q&A

The last keynote of the day was an on stage interview with Linus Torvalds. The introduction to this kind of format was lovely: There's one thing Linus doesn't like: Being on stage and giving a pre-created talk. Giving his keynote in the form of an interview with questions not shared prior to the actual event meant that the interviewer would have to prep the actual content. :)

The first question asked was fairly technical: Are RCs slowing down? The reason that Linus gave had a lot to do with proper release management. Typically the kernel is released on a time-based schedule, with one release every 2.5 months. So if some feature doesn't make it into a release it can easily be integrated into the following one. What's different with the current release is that Greg Kroah-Hartman announced it would be a long-term support release, so suddenly devs are trying to get more features into it.

The second question related to a lack of new maintainers joining the community. The reasons Linus sees for this are mainly related to the fact that being a maintainer today is still fairly painful as a job: You need experience to quickly judge patches so the flow doesn't get overwhelming. On the other hand you need to have shown the community that you are around 24/7, 365 days a year. What he wanted the audience to know is that despite occasional harsh words he loves maintainers, and the project does want more maintainers. What's important to him isn't perfection - but having people that will own up to their mistakes.

One fix to the heavy load mentioned earlier (which was also discussed during the kernel maintainers' panel a day earlier) revolved around the idea of having a group of maintainers responsible for any single sub-system in order to avoid volunteer burnout, allow for vacations to happen, share the load and ease hand-over.

Asked about kernel testing, Linus admitted to having been sceptical about the subject years ago. Today he's a really big fan of random testing/fuzzing in order to find bugs in code paths that are rarely if ever tested by developers.

Asked about what makes a successful project, his take was the ability to find commonalities that many potential contributors share, and the ability to find agreement - which seems easier for systems with less user visibility. An observation that reminded me of the bikeshedding discussions.

Also he mentioned that the problem you are trying to solve needs to be big enough to draw a large enough crowd. When it comes to measuring success though his insight was very valuable: Instead of focussing too much on outreach or growth, focus on deciding whether your project solves a problem you yourself have.

Asked about what makes a good software developer, Linus mentioned that the community over time has become much less homogeneous compared to when he started out in his white, male, geeky, beer-loving circles. The things he believes are important for developers are caring about what they do, and being able to invest in their skills for a long enough period to develop perfection (much like athletes train a long time to become really successful). Also having fun goes a long way (though in his eyes this is no different when trying to identify a successful marketing person).

While Linus isn't particularly comfortable interacting with people face-to-face, e-mail for him is different. He does have side projects beside the kernel. Mainly for the reason of being able to deal with small problems, actually provide support to end-users, do bug triage. In Linux kernel land he can no longer do this - if things bubble up to his inbox, they are bound to be of the complex type, everything else likely was handled by maintainers already.

His reason for still being part of the Linux kernel community: He likes the people, likes the technology, loves working on stuff that is meaningful, that people actually care about. On vacation he tends to check his mail three times a day so he doesn't lose track and get overwhelmed when he gets back to work. There are times when he goes offline entirely - however, typically after one week he's longing to be back.

Asked about what further plans he has, he mentioned that for the most part he doesn't plan ahead of time, spending most of his life reacting and being comfortable with this state of things.

Speaking of plans: It was mentioned that likely Linux 5.0 is to be released some time in summer 2018 - numbers here don't mean anything anyway.

Nobody puts Java in a container

Jörg Schad from Mesosphere gave an introduction to how container technologies like Docker really work and how that applies to software run in the JVM.

He started off by explaining the advantages of containers: Isolating what's running inside, supplying standard interfaces to deployed units, sort of the write once, run anywhere promise.

Compared to real VMs they are more lightweight, however with the caveat of using the host kernel - meaning that crashing the kernel crashes all container instances running on that host as well. In turn they are faster to spin up and need less memory and less storage.

So which properties do we need to look at when talking about having a JVM in a container? Resource restrictions (CPU, memory, device visibility, blkio etc.) are being controlled by cgroups. Process spaces for e.g. pid, net, ipc, mnt, users and hostnames are being controlled through libcontainer namespaces.

Looking at cgroups there are two aspects that are very obviously interesting for JVM deployments: For memory one can set hard and soft limits. However, much in contrast to the JVM, there is no OOM error thrown when the limit is exhausted - the process is simply killed. For the CPUs available there are two ways to configure limits: cpu-shares lets you give processes a relative priority weighting; cpusets lets you pin groups to specific CPUs.

General advice is to avoid cpusets as it removes one degree of freedom from scheduling and often leads to less efficiency. However, it's a good tool to avoid CPU-bouncing and to maximise cache usage.
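The relative nature of cpu-shares is easy to illustrate with a little arithmetic - a sketch with made-up container names, where 1024 is the conventional default weight: under full contention, each group gets its share count divided by the sum of all shares.

```python
# Toy illustration of cpu-shares weighting: shares only matter under
# contention, and each cgroup's slice is proportional to its weight.

def cpu_fraction(shares):
    """Return each cgroup's CPU fraction when all of them are busy."""
    total = sum(shares.values())
    return {name: weight / total for name, weight in shares.items()}

# "app-a" and "app-b" are made-up names; 1024 is the usual default weight.
fractions = cpu_fraction({"app-a": 1024, "app-b": 512})
print(fractions)  # app-a gets 2/3 of the CPU time, app-b gets 1/3
```

When the machine is idle, either group may still burst beyond its fraction - the weights only set relative priority, not an absolute cap.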

When trying to figure out the caveats of running JVMs in containers one needs to understand what the memory requirements for JVMs are: In addition to the well-known, configurable heap memory, each JVM needs a bit of native JRE memory, permgen/metaspace, JIT bytecode space, JNI and NIO space as well as additional native space for threads. With permgen space turned into native metaspace, that means class loader leaks are capable of maxing out the memory of the entire machine - one good reason to lock JVMs into containers.
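Adding these components up shows why a container's memory limit has to be noticeably larger than the heap alone. A back-of-the-envelope sketch - all numbers are illustrative assumptions, not JVM defaults:

```python
# Rough JVM memory footprint: the heap (-Xmx) is only one component.
# All default values below are illustrative assumptions.

def container_memory_budget_mb(heap_mb, metaspace_mb=256, code_cache_mb=240,
                               threads=200, stack_kb_per_thread=1024,
                               native_overhead_mb=128):
    """Estimate total JVM memory in MiB for sizing a container limit."""
    thread_stacks_mb = threads * stack_kb_per_thread / 1024
    return (heap_mb + metaspace_mb + code_cache_mb
            + thread_stacks_mb + native_overhead_mb)

# A 2 GiB heap needs well over 2 GiB of container memory:
print(container_memory_budget_mb(2048))  # 2872.0
```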

The caveats of putting JVMs into containers are related to JRE initialisation defaults being influenced by information like the number of cores available: It influences the number of JIT compilation threads, hotspot thresholds and limits.

One extreme example: When running ten JVM containers on a 32-core box this means that:
  • Each JVM believes it's alone on the machine, configuring itself for the maximally available CPU count.
  • Pre-Java-9, the JVM is not aware of cpusets, meaning it will think it can use all 32 cores even if configured to use fewer than that.
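
The same pitfall is easy to demonstrate outside the JVM. On Linux, Python exposes both the raw core count and the cpuset-restricted affinity mask; pre-Java-9 availableProcessors() behaved like the former (sched_getaffinity is Linux-only, so this sketch assumes a Linux host):

```python
import os

total_cores = os.cpu_count()                 # every core on the box
usable_cores = len(os.sched_getaffinity(0))  # honours cpuset restrictions

# In a container pinned to 2 of 32 cores, usable_cores would be 2
# while total_cores still reports 32.
print(f"cpu_count={total_cores}, affinity={usable_cores}")
```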


Another caveat: JVMs typically need more resources on startup, leading to a need for overprovisioning just to get it started. Jörg promised a blog post to appear on how to deal with this question on the DC/OS blog soon after the summit.

Also for memory, Java 9 provides the option to look at memory limits set through cgroups. The (still experimental) flag for that: -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap

As a conclusion: Containers don't hide the underlying hardware - which is both, good and bad.

Goal - question - metric approach to community measurement

In his talk on applying goals question metrics to software development management Jose Manrique Lopez de la Fuente explained how to successfully choose and use metrics in OSS projects.

He contrasted the OKR-based approach to goal setting with the goal question metric approach. In the latter one first thinks about a goal to achieve (e.g. "We want a diverse community."), goes from there to questions that help understand the path to that goal better ("How many people from underrepresented groups do we have?"), and then to actual metrics to answer each question.
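A minimal way to keep that chain explicit in data - the goal, question and metric names below are invented for illustration - is to store metrics only underneath the question they answer:

```python
# Tiny goal-question-metric registry; all names are illustrative.
gqm = {
    "goal": "We want a diverse community",
    "questions": [
        {
            "question": "How many people from underrepresented groups do we have?",
            "metrics": ["new_contributors_by_group", "retention_by_group"],
        },
    ],
}

def metrics_for(goal_entry):
    """Every metric stays traceable back to the goal it serves."""
    return [m for q in goal_entry["questions"] for m in q["metrics"]]

print(metrics_for(gqm))
```

A metric that cannot be listed this way - i.e. one with no question and goal above it - is exactly the kind of unlinked number the talk warned against collecting.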

Key to applying this approach is a cycle that integrates planning, making changes, checking results and acting on them.

Goals, questions and metrics need to be in line with project goals, involve management and involve contributors. Metrics themselves are only useful for as long as they are linked to a certain goal.

What it takes to make this approach successful is a mature organisation that understands the metrics' value and refrains from gaming the system. People will need training on how to use the metrics, as well as transparency about them.

Projects dealing with applying more metrics and analytics to OSS projects include GrimoireLab and CHAOSS (Community Health Analytics for OSS).

There are a couple of interesting books: Managing inner source projects, Evaluating OSS projects, as well as the Grimoire training, which are all freely available online.

Container orchestration - the state of play

In his talk Michael Bright gave an overview of current container orchestration systems. In his talk he went into some details for Docker Swarm, Kubernetes, Apache Mesos. Technologies he left out are things like Nomad, Cattle, Fleet, ACS, ECS, GKE, AKS, as well as managed cloud.

What became apparent from his talk was that the high-level architecture is fairly similar from tool to tool: Orchestration projects make sense where there are enough microservices that one can no longer treat them like pets, with manual intervention needed in case something goes wrong. Orchestrators take care of tasks like cluster management, microservice placement, traffic routing, monitoring, resource management, logging, secret management, and rolling updates.

Often these systems build a cluster that apps can talk to, with masters managing communication (coordinated through some sort of distributed configuration management system, maybe some RAFT based consensus implementation to avoid split brain situations) as well as workers that handle requests.

Going into details, Michael showed the huge uptake of Kubernetes compared to Docker Swarm and Apache Mesos, up to the point where even AWS joined the CNCF.

On Thursday I went to see Rich Bowen's keynote on the Apache Way at MesosCon. It was great to hear how people were interested in the greater context of what Apache provides to the Mesos project in terms of infrastructure and mentoring. At their MesosCon booth there were also quite a few questions about what that thing called The Apache Software Foundation actually is.

Hopefully the initiative started on the Apache Community development mailing list on getting more information out on how things are managed at Apache will help spread the word even further.

Overall, Open Source Summit, together with its sister events like KVM Forum and MesosCon as well as co-located events like the OpenWRT summit, was a great chance to meet up with fellow open source developers and project leads, and to learn about technologies and processes both familiar as well as new (in my case the QEMU on UEFI talk clearly was above my personal comfort zone for understanding things - here it's great to be married to a spouse who can help fill the gaps after the conference is over). There was a fairly broad spectrum of talks, from Linux kernel internals, to container orchestration, to OSS licensing, community management, diversity topics, compliance, and economics.

Open Source Summit Prague 2017 - part 1

2017-10-23 11:18
Open Source Summit, formerly known as LinuxCon, this year took place in Prague. Drawing some 2000 attendees to the lovely Czech city, the conference focussed on all things Linux kernel, containers, community and governance.

Keynotes

The first day started with three crowded keynotes. The first, by Neha Narkhede, was on Apache Kafka and the Rise of the Streaming Platform. The second, by Reuben Paul (11 years old), was on how hacking today really is just child's play: The hack itself might seem like toying around (getting into the protocol of children's toys in order to make them do things without using the app that was intended to control them). Taken into the bigger context of a world that is getting more and more interconnected - starting with regular laptops, over mobile devices, to cars and little sensors running your home - the lack of thought that goes into security when building systems today is both startling and worrying at the same time.

The third keynote of the morning was given by Jono Bacon on what it takes to incentivise communities - be it open source communities, volunteer run organisations or corporations. According to his perspective there are four major factors that drive human actions:

  • People strive for acceptance. This can be exploited when building communities: Acceptance is often displayed by some form of status. People are more likely to do what makes them proceed in their career, gain the next level on a leaderboard, or gain some form of real or artificial title.
  • Humans are a reciprocal species. Ever heard of the phrase "a favour given - a favour taken"? People who once received a favour from you are more likely to help in the long run.
  • People form habits through repetition - but it takes time to get into a habit: You need to make sure people repeat the behaviour you want them to show for at least two months until it becomes a habit that they themselves continue to drive without your help. If you are trying to roll out peer-review-based, pull-request-based working as a new model, it will take roughly two months for people to adopt this as a habit.
  • Humans have a fairly good bullshit radar. Try to remain authentic, instead of automated thank yous, extend authentic (I would add qualified) thank you messages.


When it comes to the process of incentivising people Jono proposed a three step model: From hook to reason to reward.

Hook here means a trigger. What triggers the incentivising process? You can look at how people participate - number of pull requests, amount of documentation contributed, time spent giving talks at conferences. Those are all action-based triggers. What's often more valuable is to look out for validation-based triggers: Pull requests submitted, reviewed and merged. He showed an example of a public hacker leaderboard that had its evaluation system published. While that's lovely in terms of transparency, IMHO it has two drawbacks: It makes it much easier to evaluate known, wanted contributions than things people might not have thought of as valuable contributions when setting up the leaderboard. With that it also heavily influences which contributions will come in and might invite a "hack the leaderboard" kind of behaviour.

When thinking about reason there are two types of incentives: The reason could be invisible up-front, Jono called this submarine rewards. Without clear prior warning people get their reward for something that was wanted. The reason could be stated up front: "If you do that, then you'll get reward x". Which type to choose heavily depends on your organisation, the individual giving out the reward as well as the individual receiving the reward. The deciding factor often is to be found in which is more likely authentic to your organisation.

In terms of reward itself: There are extrinsic motivators - swag like stickers, t-shirts, give-aways. Those tend to be expensive, in particular if shipping them is needed. Something that in professional open source projects is often overlooked are intrinsic rewards: A Thank You goes a long way. So does a blog post. Or some social media mention. Invitations help. So do referrals to ones own network. Direct lines to key people help. Testimonials help.

Overall, measurement is key. So is focusing on incentivising shared value.

Limux - the loss of a lighthouse



In his talk, Matthias Kirschner gave an overview of Limux - the Linux distribution rolled out across Munich's city administration: How it started, what went wrong during evaluation, and which way political forces were pulling.

What I found very interesting about the talk were the questions that Matthias raised at the very end:

  • Do we suck at desktop? Are there too many apps depending on it?
  • Did we focus too much on the cost aspect?
  • Is the community supportive enough to people trying to monetise open source?
  • Do we harm migrations by volunteering - as in single people supporting a project without a budget, burning out in the process instead of setting up sustainable projects with a real budget? Instead of teaching the pros and cons of going for free software so people are in a good position to argue for a sustainable project budget?
  • Within administrations: Did we focus too much on the operating system instead of freeing the apps people are using on a day to day basis?
  • Did we focus too much on one star project instead of collecting and publicising many different free software based approaches?


As a lesson from these events, the FSFE launched an initiative to drive developing code funded by public money under free licenses.

Dude, Where's My Microservice

In his talk Dude, Where's My Microservice?, Tomasz Janiszewski from Allegro gave an introduction to what projects like Marathon on Apache Mesos, Docker Swarm, Kubernetes or Nomad can do for your microservices architecture. While the examples given in the talk refer to specific technologies, they are intended to be general purpose.

Coming from a virtual-machine-based world where apps are tied to virtual machines, which themselves are tied to physical machines, what projects like Apache Mesos try to do is abstract that exact machine mapping away. As a first result of this decision, how to communicate between microservices becomes a lot less obvious. This is where service discovery enters the stage.

When running in a microservice environment one goal when assigning tasks to services is to avoid unhealthy targets. In terms of resource utilization instead of overprovisioning the goal is to use just the right amount of your resources in order to avoid wasting money on idle resources. Individual service overload is to be avoided.

Looking at an example of three physical hosts running three services in a redundant matter, how can assigning tasks to these instances be achieved?

  • One very simple solution is to go for a proxy based architecture. There will be a single point of change, there aren't any in-app dependencies to make this model work. You can implement fine-grained load balancing in your proxy. However this comes at the cost of having a single point of failure, one additional hop in the middle, and usually requires using a common protocol that the proxy understands.
  • Another approach would be to go for a DNS based architecture: Have one registry that holds information on where services are located, but talking to these happens directly instead of through a proxy. The advantages here: No additional hop once the name is resolved, no single point of failure - services can work with stale data, it's protocol independent. However it does come with in-app dependencies. Load balancing has to happen local to the app. You will want to cache name resolution results, but every cache needs some cache invalidation strategy.
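The cache-invalidation point in the DNS-based option can be sketched as a small TTL cache - the resolver callback and service name below are stand-in assumptions, not a real discovery API:

```python
import time

class TTLCache:
    """Cache name-resolution results, re-resolving after ttl seconds."""

    def __init__(self, resolve, ttl=30.0, clock=time.monotonic):
        self._resolve = resolve  # callback: name -> list of addresses
        self._ttl = ttl
        self._clock = clock
        self._entries = {}       # name -> (addresses, expiry timestamp)

    def lookup(self, name):
        entry = self._entries.get(name)
        now = self._clock()
        if entry is None or entry[1] <= now:
            addresses = self._resolve(name)
            self._entries[name] = (addresses, now + self._ttl)
        return self._entries[name][0]

# Stand-in resolver returning a fixed address for illustration:
cache = TTLCache(lambda name: ["10.0.0.1:8080"], ttl=30.0)
print(cache.lookup("orders-service"))
```

Expiry-based invalidation is the simplest strategy; serving the last known (possibly stale) addresses when a refresh fails upstream is what gives the DNS approach its resilience against registry outages.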


In both solutions you will also still have logic e.g. for de-registering services. You will have to make sure to register your service only once it has successfully booted up.

Enter the Service Mesh architecture, e.g. based on Linkerd or Envoy. The idea here is to have what Tomek called a sidecar added to each service that talks to the service mesh controller to take care of service discovery, health checking, routing, load balancing, authn/z, metrics and tracing. The service mesh controller will hold information on which services are available, available load balancing algorithms and heuristics, retries, timeouts and circuit breaking, as well as deployments. As a result the service itself no longer has to take care of load balancing, circuit breaking, retry policies, or even tracing.

After that high-level overview of where microservice orchestration can take you, I took a break, following a good friend to the Introduction to SoC+FPGA talk. It's great to see Linux support for these systems - even if not quite as stable as it would be in an ideal world.

Trolling != Enforcement

The afternoon for me started with a very valuable talk by Shane Coughlan on how trolling doesn't equal enforcement. This talk was related to what was published on LWN earlier this year. Shane started off by explaining some of the history of open source licensing, from times when it was unclear whether documents like the GPL would hold up in front of courts, to how projects like gpl-violations.org proved that these are indeed valid legal contracts that can be enforced in court.

What he made clear was that those licenses are the basis for equal collaboration: They are a common set of rules that parties not knowing each other agree to adhere to. As a result, following the rules set forth in those licenses creates trust in the wider community and thus leads to more collaboration overall. On the flipside, breaking the rules erodes this very trust. It leads to less trust in those companies breaking the rules. It also leads to less trust in open source if projects don't follow the rules as expected.

However, when it comes to copyright enforcement, the case of Patrick McHardy raises the question of whether all copyright enforcement is good for the wider community. To understand that question we need to look at the method Patrick McHardy employs: He will get in touch with companies for seemingly minor copyright infringements, ask for a cease and desist to be signed, and get a small sum of money out of his target. In a second step the process above repeats, except the sum extracted increases. Unfortunately what this has shown is that there is a viable business model here that hadn't been tapped into yet. So while the activities of Patrick McHardy probably aren't so bad in and of themselves, they do set a precedent that others might follow, causing way more harm. Clearly there is no easy way out. Suggestions include establishing common norms for enforcement and ensuring that hostile actors are clearly unwelcome.
For companies, steps that can be taken include understanding the basics of the legal requirements, understanding community norms, and having processes and tooling to address both. As one step, there is a project called OpenChain publishing material on the topic of open source copyright, compliance and compliance self-certification.

Kernel live patching

Following Tomas Tomecek's talk on how to get from Dockerfiles to Ansible Containers I went to a talk that was given by Miroslav Benes from SuSE on Linux kernel live patching.

The topic is interesting for a number of reasons: As early as 2008, MIT developed something called Ksplice, which uses jumps patched into functions for call redirection. The project was acquired by Oracle - and discontinued.

In 2014 SuSE came up with something called kGraft for Linux live patching, based on immediate patching but lazy migration. At the same time Red Hat developed kpatch, based on an activeness check.

In the case of kGraft the goal was to be able to apply limited scope fixes to the Linux kernel (e.g. for security, stability or corruption fixes), require only minimal changes to the source code, have no runtime cost impact, no interruption to applications while patching, and allow for full review of patch source code.

The way it is implemented is fairly obvious - in hindsight: It's based on re-using the ftrace framework. kGraft uses the tracer for interception but then asks ftrace to return to a different address, namely the start of the patched function. So far the feature is available for x86 only.

Now while patching a single function is easy, making changes that affect multiple functions gets trickier. This means a need for lazy migration that ensures function type safety based on a consistency model. In kGraft this is based on a per-thread flag that marks all tasks in the beginning and makes it possible to wait for them to be migrated.
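The per-thread flag idea can be modelled in a few lines of user-space code - a toy analogy of kGraft's lazy migration, nothing like the real kernel implementation:

```python
# Toy model of lazy migration: every task keeps a "migrated" flag and
# keeps calling the old function until it passes a safe point.

def old_fn():
    return "old behaviour"

def new_fn():
    return "patched behaviour"

class Task:
    def __init__(self, name):
        self.name = name
        self.migrated = False  # cleared for all tasks when patching starts

    def call(self):
        # Dispatch through the flag: old and new code coexist safely.
        return new_fn() if self.migrated else old_fn()

    def reach_safe_point(self):
        # e.g. returning to user space - now safe to switch to the new code
        self.migrated = True

tasks = [Task("t0"), Task("t1")]
tasks[0].reach_safe_point()
print([t.call() for t in tasks])  # t0 is patched, t1 still runs old code
```

The patch is fully applied only once every task has passed a safe point, which is exactly why the kernel needs a way to wait for (or nudge) stragglers.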

From 2014 onwards it took a year to get the ideas merged into mainline. What is available there is a mixture of both kGraft and kpatch.

What are the limitations of the merged approach? There is no way right now to deal with data structure changes, in particular when thinking about spinlocks and mutexes. Consistency reasoning right now is done manually. Architectures other than x86 are still an open issue. Documentation and better testing are open tasks.