Monday, 10 October 2011

The Question

I know why you're here. I know what you've been doing... why you hardly sleep, why you live alone, and why night after night, you sit by your computer. You're looking for him. I know because I was once looking for the same thing. And when he found me, he told me I wasn't really looking for him. I was looking for an answer. It's the question that drives us. It's the question that brought you here. You know the question, just as I did.

'Why do so many software projects fail?'

Two weeks back, before I moved house, I gave a presentation to the London Agile Practitioners 20x20 meetup about the "Dark side of metrics", which I've uploaded to the new Slideshare-y site SpeakerDeck here.

I realised by the time that I got to slide 4 that I was giving the wrong presentation, and yet in an ironically un-agile fashion I continued to give the presentation anyway. This was echoed by another presentation, in which the presenter showed a cheetah (more irony, this is the mascot of Pentaho's 'Agile BI' initiative), and argued that whilst most people viewed this as agile, in actual fact the gazelle is more agile because although not as fast in a straight line it is more manoeuvrable and can out-turn the cheetah. His point was that many people now equate 'agile' with 'velocity' (hence the popularity of the term 'sprint'), rather than the more correct 'responsiveness to change'.

In my defence, a Pecha Kucha night puts the emphasis on 'rush rush hurry hurry': no interruptions, the clock is running, no more than 20 seconds per slide, no less than 20 seconds per slide, so the pressure was on. But I learnt something that night, apart from the fact that Pecha Kucha isn't very 'Agile'! It is that, no matter what the pressure, we must have the courage to use Jidoka, or 'stop the line' in Lean terms.

The reason that the presentation in its current form is 'wrong' is that it tells the following story:

Beginning: We need to plan, so we use software metrics
Middle: Software metrics have unintended pernicious side-effects. Software development is hard, and success is unpredictable. Software is fundamentally different from Science, Engineering and Manufacturing. Algorithmic Complexity research shows that actually the problem of complexity estimation is insoluble, and hence so too are effort and cost estimation!
Ending: If accurate estimation is an insoluble problem, the only rational course is to abandon predictive methodologies for reactive ones, hence the ascension of Agile. QED.

I realised that everybody already knew Big Planning Up Front was bad, nobody in the room had the misfortune of working on government or other public-sector contracts, and everyone was using Agile planning and estimation methods.

Agile has 'crossed the chasm' and is now old news, but it is still widely misunderstood, as we have much 'cargo-cult' Agile (the subject of another presentation: 'Agile considered harmful'). The focus of a presentation on metrics should instead have been about what is wrong with Agile and software metrics as we currently have them, and why the emergence of a 'code agility metric' is "a bad thing", like "don't cross the streams" bad. There is currently a whole topic on it on LinkedIn's Agile Alliance forum, people have blogged about it, and one guy even has a whole article entitled "Metrics Driven by Agile Values and Principles" in Agile Record magazine.

The issue here is that as soon as there is a weight of opinion behind a metric, people will start using it, managers even, and giving a manager a metric is a bit like juggling chainsaws: whilst initially it seems impressive, it's really not clever and it's only a matter of time until someone gets hurt. Case in point: when I spoke to a senior manager at the BBC they admitted that the only time they use software metrics is when they want some ammo to help back up the decision to fire someone.

Douglas Hoffman, in his 1999 article "The Darker side of metrics", documents many cases of organisational pathology arising from the measurement effect: the act of measuring the system changes the system. As Hoffman says: "Whether or not our models are correct, and regardless of how well or poorly we collect and compute software metrics, people’s behaviours change in predictable ways to provide the answers management asks for when metrics are applied" - they create self-fulfilling prophecies. He tells how, in countless organisations, software quality metrics cause deleterious and often bizarre behaviours. For example, using 'ready to release' metrics based upon the ratio of bugs found to bugs fixed causes the following to happen:
  • To reduce the number of defects, twenty-five reports against a subsystem were all marked as “duplicates” of one new defect. The new defect report referred to each of the twenty-five for a description of the problem (because the only thing the twenty-five had in common was that they were reported against the same subsystem)
  • In an organization where defects didn’t get counted before initial screening and assignment, a dozen defects that hadn’t been resolved in more than four weeks were assigned to the developer “Unassigned,” and thus were not counted
  • In one case the testers withheld defect reports to befriend developers who were under pressure to get the open defect count down. In another case the testers would record defects when the developers were ready with a fix to reduce the apparent time required to fix problems
  • A test group took heat for not having found the problems sooner (to give the developers more time to fix the problems)
  • Developers only reported problems after they had been fixed (thus never making the ratio worse)
He writes: "Every software organization I have observed that has used metrics for more than a few years has had bizarre behaviours as a result. There is a decidedly 'dark side' to these metrics programs that impacts organizations all out of proportion to what is intended."
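To make the mechanics of that gaming concrete, here is a minimal sketch in Python. The `Defect` record, the `fix_ratio` function and all the numbers are hypothetical, invented for illustration and not taken from Hoffman's article. It assumes a 'ready to release' figure computed as fixed defects over counted defects, where duplicates and anything parked on "Unassigned" are not counted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Defect:
    ident: int
    fixed: bool = False
    duplicate_of: Optional[int] = None  # id of the report this one "duplicates"
    assignee: str = "dev"


def fix_ratio(defects: List[Defect]) -> float:
    """Hypothetical 'ready to release' metric: fixed / counted defects.

    Assumption: duplicates and reports parked on "Unassigned" are not counted.
    """
    counted = [d for d in defects
               if d.duplicate_of is None and d.assignee != "Unassigned"]
    if not counted:
        return 1.0
    return sum(d.fixed for d in counted) / len(counted)


# 40 reports, of which the first 10 have genuinely been fixed.
defects = [Defect(ident=i, fixed=(i < 10)) for i in range(40)]
honest = fix_ratio(defects)

# Gaming: mark 25 unfixed reports as duplicates of one new umbrella report.
umbrella = Defect(ident=100)
for d in defects[10:35]:
    d.duplicate_of = umbrella.ident
gamed = fix_ratio(defects + [umbrella])

print(f"honest: {honest:.3f}, gamed: {gamed:.3f}")
```

Not one line of product code changes between the two calls, yet the figure goes from 0.25 to 0.625.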

Hoffman uses research by Cem Kaner Ph.D. on measurement factors to highlight what we need to think about whenever we make a measurement:
  1. The purpose of the measure. What the measurement will be used for.
  2. The scope of the measurement. How broadly the measurement will be used.
  3. The attribute to be measured. E.g., a product’s readiness for release.
  4. The appropriate scale for the attribute. Whether the attribute’s mathematical properties are ratio, interval, ordinal, nominal, or absolute.
  5. The natural variation of the attribute. A model or equation describing the natural variation of the attribute. E.g., a model dealing with why a tester may find more defects on one day than on another.
  6. The instrument that measures the attribute. E.g., a count of new defect reports.
  7. The scale of the instrument. Whether the mathematical properties of measures taken with the instruments are ratio, interval, ordinal, nominal, or absolute.
  8. The variation of measurements made with this instrument. A model or equation describing the natural variation or amount of error in the instrument’s measurements.
  9. The relationship between the attribute and the instrument. A model or equation relating the attribute to the instrument.
  10. The probable side effects of using this instrument to measure this attribute. E.g., changes in tester behaviours because they know the measurement is being made.
In my experience, you're lucky if one or two of these have been considered, let alone most or all. Yet this is common practice in the sciences: my Physics lab work at university would have received short shrift if I hadn't calculated margins of error for all my measurements. How many times have you seen error bars on the effort or duration estimate for a work item on a Gantt chart? Thought so.
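For what it's worth, putting error bars on a plan is not much arithmetic. Here is a minimal sketch in Python, assuming (optimistically) that each estimate's error is independent and roughly symmetric, so the uncertainties add in quadrature as they would in a physics lab. The task names and numbers are made up.

```python
import math

# (task, estimate in days, one-sigma uncertainty in days) - illustrative values
tasks = [
    ("login page", 5.0, 2.0),
    ("payment integration", 8.0, 3.0),
    ("reporting", 3.0, 1.0),
]

total = sum(estimate for _, estimate, _ in tasks)
# Independent errors add in quadrature: sigma_total = sqrt(sum of sigma_i^2)
sigma = math.sqrt(sum(err ** 2 for _, _, err in tasks))

print(f"{total:.1f} +/- {sigma:.1f} days")
```

If the errors are correlated, as they may well be when the same team makes every estimate, the true spread is wider than this figure suggests.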

Although we naturally equate 'complex' with 'large' in terms of numbers of things, this is a misconception. The humble water molecule is achingly simple, just one oxygen atom and two hydrogen atoms, but it gives rise to extraordinarily complex behaviour, as any chemist knows: the phase transition diagram for water shows no fewer than 21 forms of ice (that we know about!).




Another example of hidden complexity is the three-body problem, namely: "the problem of taking an initial set of data that specifies directly or indirectly the positions, masses and velocities of three bodies for some particular point in time and then using that set of data to determine the motions of the three bodies and to find their positions at other times in accordance with the laws of classical mechanics: Newton's laws of motion and of universal gravitation."

For a video of this see here: three-body problem

This was first published in the 17th century by Isaac Newton. It turns out that for two bodies the problem is solvable in general, for all masses and initial conditions, yet for three bodies there is no general closed-form solution, and even minuscule changes in initial conditions can lead to wildly different outcomes.

This was the beginning of Chaos Theory, and it turns out that even something as conceptually simple as a coin toss, or three equal-sized masses in a vacuum can spawn undreamed-of complexity.
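A full three-body simulation is beyond a blog post, but the same sensitivity to initial conditions shows up in far simpler systems. The sketch below uses the logistic map x -> 4x(1 - x), a standard textbook example of chaos. It is a stand-in here, not the three-body problem itself, and the starting value and the size of the nudge are arbitrary choices.

```python
def logistic_orbit(x0: float, steps: int) -> list:
    """Iterate the logistic map x -> 4x(1 - x), returning steps + 1 values."""
    orbit = [x0]
    for _ in range(steps):
        x0 = 4.0 * x0 * (1.0 - x0)
        orbit.append(x0)
    return orbit


a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2 + 1e-10, 60)  # same start, nudged by one part in 10^10
gaps = [abs(p - q) for p, q in zip(a, b)]

for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

The gap between the two orbits starts at around one part in ten billion and grows roughly exponentially, until the two runs bear no resemblance to each other.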

The problem is... management. Managers still see the world through the lens of causal determinism, not just because that is what they've been taught, all the way back to Frederick Winslow Taylor and before, but because this is a natural function of the way that the human mind works. The human mind has evolved to see a cause behind everything: a bush rustles and we envisage a lion behind it, waiting to pounce! This is a function of evolutionary psychology, and is to be expected because of the imbalance of risk between the outcomes of:
  • False positive = we run from shadows, and waste a little time
  • False negative = we get eaten
Immanuel Kant posited that Causal Determinism is a necessary prerequisite for all scientific discovery, and it has served us well in its way. We learn linear equations in school and they help us solve problems, but we don't learn non-linear equations, because they are too hard to solve!
But let us not blame Taylor. He was the product of a time when the predominant belief, in words commonly attributed to Lord Kelvin in 1900, was: "There is nothing new to be discovered in physics now. All that remains is more and more precise measurement."

Yet Kurt Gödel knew in 1931 that there was trouble afoot, and his incompleteness theorem paved the way for a troubling new branch of science which had inexorably to admit that there were problems that even the brightest minds could never solve. Like ostriches we pushed these to one side, brushed them under the carpet to concentrate on problems that we could solve, and for decades the scientific community regarded this as a mere curiosity, like Fermat's Last Theorem (ironically itself solved in 1994).

No-one cared for the unsolvable, and this class of problem languished in the scientific doldrums. Yet Nobel prize-winning physicist Richard Feynman believed that the class of insoluble problems was not a mere footnote or curiosity, but composed the vast majority of all problems.

Gradually dynamical systems theory, chaos theory, social complexity theory and a variety of other 'complexity sciences' have sprung up to cover the gap in our knowledge, and even Stephen Hawking has referred to the 21st Century as the 'century of complexity'.

As Frederick Brooks wrote in "No Silver Bullet", software is inherently and essentially unpredictable, and no amount of 'Computer Science', or 'Software Engineering', or 'Software Metrics' (e.g. six-sigma) is going to change this. We've proved this already, whether by the Chaitin incompleteness theorem, Network Theory, Dynamical Systems Theory, Game Theory or Social Complexity Theory.

Now it is time for Management to catch up.

Welcome, to the real world!
