Scientific debugging - take 2

Scientific debugging - take 2 #

Back in - OMG was that really back in 2010? - 2010 I wrote a post on scientific debugging. Today I was reminded of this post as I actually had the pleasure of watching this principle carried out - except this was for a medical “bug” instead of one in a piece of software.

To quote the book Why programs fail the method of scientific debugging consists of 5 easy to follow steps:

Observe a failure (i.e., as described in the problem description).
Invent a hypothesis as to the failure cause that is consistent with the observations.
Use the hypothesis to make predictions.
Test the hypothesis by experiments and further observations:
If the experiment satisfies the predictions, refine the hypothesis.
If the experiment does not satisfy the predictions, create alternate hypothesis.
Repeat steps 3 and 4 until the hypothesis can no longer be refined.

In the past I’ve often seen developers jump immediately from a problem description in a user reported bug to implementing what looks like the obvious solution only to find out later that the actual problem underlying the problem description was something completely different.

I know that making sure one has found the true problem can take time. However today I was reminded a) just how long that time can be and b) just how important each step in that search can be:

So to start with the problem description (step 1 above): “On Saturday evening on my way into the bath tub some pretty bad pain hit me and forced me to lie down flat on the floor. After a few minutes the pain was gone, what remained until today’s Tuesday is a pain in my back that gets worse when I walk or lie down on my left side. As a side-note: I’m expecting a baby - not sure if that has anything to do with the above.”*

husband urged me to get to our general practitioner. So off I went - told him the story above. He couldn’t do a whole lot for me but to him it seemed like the typical sciatic pain syndrome (step 2 above) - so off he sent me to the orthopaedic (he needed an expert’s opinion to check his hypothesis apparently).

I called the one doctor I was recommended, was told to go there immediately, went there. Some 15 minutes later I told the doctor the story above (step 1). She followed the hypothesis - except for one tiny little problem: Given the location of the pain and my condition she knew of at least one more hypothesis that would also fit the observation (step 2). She made a manual check (step 3). Her observation was still consistent with the hypothesis (step 4). Now in order to reject said hypothesis she would have needed to do one more check - except she couldn’t do that one herself. So she sent me off to my gynecologist to get an expert’s opinion (and gave me an appointment for late this week/early next week in case really only the original hypothesis is valid).

Off I went, called the third doctor. I told her the whole story (step 1), she continued where the other doctor had left off and made the missing check (step 3) - and rejected the alternative hypotheses right away (step 4).

So off she went to step 5: Sometimes labour pains start in the back (step 2) - so she put me on a CTG (step 3) - after several minutes that hypothesis was rejected as well (step 4). So off she went to step 5:

The last non-trivial hypothesis would have been that something’s wrong with kidneys (step 2). So she checked with her medical ultrasonics (step 3) - she wasn’t 100% sure though it seemed to look ok so she send me off to one last expert for a second opinion.

So on my way home I got to tell the story above the fourth time (step 1). Again she continued with where the other doctor had left off - namely step 3: One more medical ultrasonic and a blood test later the last non trivial hypothesis was finally rejected altogether (step 4) So what was left was the initial hunch of there being some blockade in my back. Finally we can step from observing to acting, namely keeping warm (hot water bottle), relaxing and to keep moving regularly.

As annoying as the day for the system to be debugged was there’s at least two valuable lessons learnt in it:

Being absolutely certain about a certain problem cause can be expensive (in this case in particular time consuming for me, rest is something the doctors involved have to discuss with my public health insurance). However jumping directly from problem description to bugfix can turn out to be much more expensive if said bugfix takes a long time to implement (and show effects) but is treating the wrong problem.

Proving one hypothesis right sometimes involves ruling out options that are easy to check but have bad consequences if left untreated. Translated back - before jumping from problem description to bugfix it may make sense to stop for a second and think whether the particular problem you see might actually be caused by a bug far worse than what you anticipated.

Now off to search for my warm little elephant.

* Yeah, that’s why I didn’t make it to the Berlin Open Source meetup on Sunday that I organised - at least my husband had fun together with Lennart and others trying to explain to the non German speaker what on earth is the English translation for the term Hexenschuß.