I’d like to start with a "Getting to know the audience" type of question – What does it feel like to write a bug?
The words I usually expect to hear in answer to this question range from "Embarrassing", "Sad", "Annoying" through to extremes such as "Mortifying". As you can probably guess from the title of this talk, I don’t believe that at all – and it is my goal to make you see that not only are these the wrong answers but that subscribing to them can actually be harmful.
I’m going to start with a very famous bug – perhaps one of the most famous bugs of all (I’m not talking about the millennium bug, mainly because that wasn’t a bug at all but rather a design decision that turned bad).
For developers, this defect is an example of an integer overflow. Basically, an attempt was made to store the result of a calculation in an integer memory location that was too small to hold it. The software spotted this and threw an exception – not exactly rocket science.
So – to return to the question. What would it feel like to write that bug? You’d probably go with "Mortifying" again, but that would be wrong – mortifying is what it feels like to discover that bug (in this case, to discover the bug in production). When the developer wrote the line of code that threw the exception, they didn’t even know they had written a bug, so they would probably have felt somewhere between neutral and mildly elated.
Now, as developers, you all know how you would have prevented this bug from ever coming into being. Unit tests? Defensive programming? Code review? Test-driven development?
The Ariane 5 investigation findings were as follows:
- The launcher started to disintegrate at about H0 + 39 seconds because of high aerodynamic loads due to an angle of attack of more than 20 degrees that led to separation of the boosters from the main stage, in turn triggering the self-destruct system of the launcher.
- This angle of attack was caused by full nozzle deflections of the solid boosters and the Vulcain main engine.
- These nozzle deflections were commanded by the On-Board Computer (OBC) software on the basis of data transmitted by the active Inertial Reference System (SRI 2). Part of these data at that time did not contain proper flight data, but showed a diagnostic bit pattern of the computer of the SRI 2, which was interpreted as flight data.
- The reason why the active SRI 2 did not send correct attitude data was that the unit had declared a failure due to a software exception.
- The OBC could not switch to the back-up SRI 1 because that unit had already ceased to function during the previous data cycle (72 milliseconds period) for the same reason as SRI 2.
- The internal SRI software exception was caused during execution of a data conversion from 64-bit floating point to 16-bit signed integer value. The floating point number which was converted had a value greater than what could be represented by a 16-bit signed integer. This resulted in an Operand Error. The data conversion instructions (in Ada code) were not protected from causing an Operand Error, although other conversions of comparable variables in the same place in the code were protected.
- The error occurred in a part of the software that only performs alignment of the strap-down inertial platform. This software module computes meaningful results only before lift-off. As soon as the launcher lifts off, this function serves no purpose.
- The alignment function is operative for 50 seconds after starting of the Flight Mode of the SRIs which occurs at H0 - 3 seconds for Ariane 5. Consequently, when lift-off occurs, the function continues for approx. 40 seconds of flight. This time sequence is based on a requirement of Ariane 4 and is not required for Ariane 5.
- The Operand Error occurred due to an unexpected high value of an internal alignment function result called BH, Horizontal Bias, related to the horizontal velocity sensed by the platform. This value is calculated as an indicator for alignment precision over time.
- The value of BH was much higher than expected because the early part of the trajectory of Ariane 5 differs from that of Ariane 4 and results in considerably higher horizontal velocity values.
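To make the conversion finding concrete, here is a minimal sketch in Python (the flight software was written in Ada, so this is an illustration only and not the original code). The names `OperandError`, `to_int16_unprotected` and `to_int16_protected` are made up for this example, and so is the value given to the horizontal bias. Clamping is just one possible form of protection, assumed here for illustration; the findings do not say how the other conversions were protected.

```python
# Range of a 16-bit signed integer.
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Hypothetical stand-in for the exception raised by the failed conversion."""


def to_int16_unprotected(value: float) -> int:
    """Convert a finite float to a 16-bit signed integer, raising if it does not fit."""
    result = int(value)  # truncates towards zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OperandError(f"{value} does not fit in a 16-bit signed integer")
    return result


def to_int16_protected(value: float) -> int:
    """Convert a finite float to a 16-bit signed integer, clamping out-of-range values."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


horizontal_bias = 40000.0  # assumed value, chosen only because it exceeds INT16_MAX

print(to_int16_protected(horizontal_bias))  # 32767

try:
    to_int16_unprotected(horizontal_bias)
except OperandError as exc:
    print("conversion failed:", exc)
```

The unprotected version is perfectly reasonable as long as the input stays within the range the author had in mind. It only becomes a bug when the assumption behind it stops being true, which is what the last two findings describe.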
How Bugs Come To Be
So – as was said earlier – bugs are rarely, if ever, written deliberately. Yet here we are, educated and screened by layers of HR, writing bug after bug, day after day. I haven’t done decades of peer-reviewed, double-blind randomised trials, so you should treat my observations with some scepticism. Nevertheless, here are some of the observations I have made:
A Good Crop of Bugs Needs a Fertile Environment
Let me tell you a tale of two developers. One is fresh into work at the start of the week. It is 09:15 on a Monday. He or she has just had a fantastic first cup of coffee and is starting a new module with the business walk-through fresh in their memory.
Developer two is slightly less well off. It is 02:25 on Sunday morning, and he or she is on their 20th cup of coffee, with only the vaguest memory of the business requirements and a live Bengal tiger prowling in the office. (If this analogy seems too far-fetched, then you haven’t seen your MD undergoing the P1S1 transmogrification.)
I think it is fairly obvious that developer two is going to write more and worse bugs, right?
So here’s the reveal. Developers one and two are the same person – only their circumstances are different. (I have been both of those soldiers.) The conclusion should be that we don’t allow you to write code at 02:00 on a Sunday morning, but that is not how reality works. A more practical lesson is this: when you are developer one, write the code that helps developer two.
(This means, among other things: write clear code, write comments, write unit tests, be verbose, be nice.)
Analyzing Your Bugs Can Help Reduce Them
I used to assume that as I became more experienced, the number of bugs I wrote would tail off pretty rapidly to near zero, or that it would take only the most esoteric circumstances for me to still write one. My reality didn’t match that expectation, so I started keeping a journal. Whenever a bug was found in code that I had written or had a connection to, I went back to try and work out the root cause of that bug.
I was pretty amazed by two things that I discovered:
- Broadly speaking, the simpler the function of the code, the more likely it was to have bugs (pretty counter-intuitive).
- The most dangerous habit I had as a developer, responsible for by far the majority of my own bugs, came down to one single keypress: CTRL+V.
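To show what a CTRL+V bug tends to look like, here is a small made-up example in Python. It is not taken from my journal; the function names and the scenario (keeping a point inside a grid) are assumptions chosen purely for illustration. The second line of the buggy version was pasted from the first and only half edited.

```python
def clamp_point_buggy(x, y, width, height):
    """Keep a point inside a width-by-height grid (contains a paste error)."""
    x = max(0, min(x, width - 1))
    y = max(0, min(y, width - 1))  # pasted from the line above: "width" should be "height"
    return x, y


def _clamp(value, upper):
    """Clamp value into the range 0..upper."""
    return max(0, min(value, upper))


def clamp_point_fixed(x, y, width, height):
    """Same intent, with the repeated expression factored out so nothing is pasted."""
    return _clamp(x, width - 1), _clamp(y, height - 1)


print(clamp_point_buggy(5, 50, 100, 10))  # (5, 50): y escapes the 10-row grid
print(clamp_point_fixed(5, 50, 100, 10))  # (5, 9)
```

Note that the buggy version passes any test that happens to use a square grid, which is one reason simple-looking code like this can hide a bug for a long time.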
Conclusions
So what we end up with are some really pithy platitudes – but what else can you expect on a 45-minute path to enlightenment? They are:
- No matter how good you are, you are going to write bugs. Don’t be embarrassed, just do everything in your power to prevent them "getting out of the cage".
- You are going to be asked, pressured, cajoled and blackmailed into doing duct-tape things that will reduce code quality and increase the likelihood of creating bugs. Your responsibility is to know when not to do them.
- The bugs you (or your team) write will tell you quite a lot about the software you (or your team) have written. Keep a bug journal and you will learn from them.