GeeCon - Testing hell and how to fix it

2012-05-26 08:08
The last regular talk I went to was on testing hell at Atlassian – in particular the JIRA project. What happened to JIRA might actually be known to developers who have to deal with huge legacy projects that predate the junit and dependency injection era: Over time their test base grew into a monster that was hard to maintain and didn't help at all with making developers confident on checkin time that they would not break anything.

On top of 13k unit tests they head accumulated 4k functional tests, several hundreds of selenium user interface tests in 65 maven modules depending on 554 dependencies that represented quite some technology mix from old to new, ranging across different libraries for solving the same task. They used 60+ remote agents for testing, including AWS instances that were orchestrated by a Bamboo installation, had different plants for every supported version branch, tested in parallel.

Most expensive were platform tests that were executed every two to four weeks before each release – those tested JIRA with differing CPU configurations, JVMs, Browsers, databases, deployment containers. Other builds were triggered on commit, by dependencies or nightly.

Problem was that builds would take for 15 min for unit tests, one hour for functional tests, several hours for all the rest – that means developers get feedback only after they are home essentially blocking other developers' work. For unit tests that resulted in fix turnaround times of several hours, for integration tests several days. Development would slow down, developers became afraid of commits, it became difficult to release – in summary morale went down.

Their problems: Even tiny changes caused test avalanches. As tests were usually red, noone would really care. Developers would not run tests for effort reasons and got feedback only after leaving work.
Some obvious mistakes:

Tests were separate from the code they tested – in their case in a separate maven module. So on every commit the whole suite has to run. Also back when the code was developed dependency injection only just started to catch up which meant the code was entangled, closely coupled and hard to test in isolation. There were opaque fixtures hard coded in xml configuration files that captured application scope but had to be maintained in the tests.

Their strategy to better testing:

  • Introduce less fragile UI tests based on the page objects pattern to depend less on the actual layout and more on the functionality behind.
  • They put test fixtures into the test code by introducing REST APIs for modification and an introduction of backdoors, only open in the test environment.
  • Flickering tests were put to quarantine and either fixed quickly or deleted – if noone fixes them, they are probably useless anyway.


After those simple measures they started splitting the software into multiple real modules to limit scope of development and raise responsibility of development teams. That comes with the advantage of having tests close to the real code. But it comes with the cost of a more complex CI hierarchy. However in well organised software in such a project hierarchy commits turned out to tend to go into leaves only – which did lessen the number of builds quite a bit.

There is a tradeoff between speed vs. control: Modularizing means you no longer have all in one workspace, in turn it means faster development for most of your tasks. For large refactorings noone will stop you to put all code in one idea workspace.

The goal for Atlassian was to turn the pyramid of tests upside down: Have most but fast unit tests, have less REST/html tests and even less Selenium tests. Philosophy was to only provide REST tests if there is no way at all to cover the same function in a unit test.

In terms of speeding up execution they started batching tests against one instance to avoid installation time, merged tests, used in-process databases, mocked IO and webservers where possible. Also putting more hardware in does help, so does avoiding sleeping in tests.

In terms of splitting code – in addition to responsibility that can also be done by maturity to keep what is evolving quickly close together until it is stable.

The day finished with a really inspiring keynote by Kevlin Henney on Cool Code – showing several both either miserably failing or incredibly cool pieces of software. His intention when reading code is to extend a coders vocabulary when it comes to programming. That's why even the obfuscated c code competition does make for an interesting read as it tells you things about language features you otherwise might never have learned about before. One very important conclusion from his talk: “If you don't have the time to read, you have neither time nor tools to write.” - though being made by Stephen King on literature this statement might as well apply to software, after all to some extend what we produce is some kind of art, is some kind of literature in it's own right.

GeeCon - TDD and it's influence on software design

2012-05-22 08:04
The second talk I went to on the first day was on the influence of TDD on software design. Keith Braithwaite did a really great job of first introducing the concept of cyclomatic complexity and than showing at the example of Hudson as well as many other open source Java projects that the average and mean cyclomatic complexity of all those projects actually is pretty close to one and when plotted for all methods pretty much follows a power law distribution. Comparing the properties of their specific distribution of cyclomatic complexities over projects he found out that the less steep the curve is, that is the more balance the distribution is, that is the less really complex pieces there are in the code the more likely are developers happy with the current state of the code. Not only that, also that distribution would be transformed into something more balanced after refactorings.

Now looking at a selection of open source projects he analyzed what the alpha of the distribution of cyclomatic complexity is for projects that have no tests at all, have tests and those that were developed according to TDD. Turns out that the latter ones were the ones with the most balanced alpha.