GeeCon - Randomized testing

2012-05-21 08:02 More posts about Free Software geecon Hacking randomized Science testing
I arrived late during lunch time on Thursday for GeeCon – however just in time to listen to one of the most interesting talks when it comes to testing. Did you ever have the issue of writing code that runs well in your development environment but crashes as soon as it's rolled out at customers only to find out that their Locale setting was causing the issues? Ever had to deal with random test failure because against better advise your tests did depend on execution order that is almost guaranteed to be different on new JVM releases?

The Lucene community has encountered many similar issues. In effect they are faced with having to test a huge number of different configuration combinations in order to make sure that their software runs in all client setups. In recent months they developed an approach called randomised testing to tackle this problem: Essentially on each run “random tests” are run multiple times, each time with a slightly different configuration, input, in a different environment (e.g. Locale settings, time zones, JVMs, operating systems). Each of these configurations are pseudo random – however on test failure the framework will reveal the seed that was used to initialize that pseudo random number generator and thus allow you to reproduce the failure deterministically.

The idea itself is not new: published in a paper by Ntafos, used in fuzzers to identify security holes in applications this kind of technique is pretty well known. However applying it to write tests is a new idea used at Lucene.

The advantage is clear: With every new run of the test suite you gain confidence that your code is actually stable to any kind of user input. The downside of course is that you will discover all sorts of different issues and bugs not only in your code but also in the JVM itself. If your library is being used in all sorts of different setups fixing these issues upfront however is crucial to avoid users being surprised that it does not work well in their setup. Make sure to fix these failures quickly though – developers tend to ignore flickering tests over time. Adding randomness – and thereby essentially increasing the number of tests in your testsuite – will add the amount of effort to invest in fixing broken code.

Dawid Weiss gave a great overview of how random tests can be used to harden a code base. He introduced the testframework written at carrot search that isolated the random test features: It comes with a RandomizedRunner implementation that can be used to subsitute junit's own runner. It's capable of tracking test isolation by tracking spawned threads that might be leaking out of tests. In addition it provides utilities for instance for creating random strings, locals, numbers as well as annotations to denote how often a test should run and when it should run (always vs. nightly).

So when having tests with random input – how do you check for correctness? The most obvious thing to do is when being able to check the exact output. When testing a sorting method, not matter what the implementation and the input is – the output should always be sorted, which is easy enough to check. Also checking against simpler, but maybe in practice more expensive algorithms is an option.

A second approach is to do sanity checks: Math.abs() at least should always return positive integers. The third approach is to do no checking at all in some cases. Why would that help? You'd be surprised by how many failures and exceptions you get by actually using your API in unexpected ways or giving your program unexpected input. This kind of behaviour checking does not need any assertions.

Note: Really loved the APad/ iMiga that Dawid used to give his talk! Been such a long time since I last played with my own Amiga...