Ative at Work

Agile software development

The Waste of Defects - Bugs are Stop-the-Line Issues

"Don't clean it," my grandmother used to say; "keep it clean."

She probably learned it long before the computer era. Yet for some reason her advice did not spread to the software industry. We still have a tendency to build up a big mess and put off cleaning it up until much later. I am thinking about the waste of defects - the lean principle of preventing trouble from creeping in rather than struggling to get it out after the fact.

In software, as in lean, bugs are stop-the-line issues.
 
Once you find them you have to fix them before moving on. Period. 
 
I have heard many excuses for accepting mediocre quality, but no good ones. If it doesn't have to work, why do we build it at all? In fact, if we explicitly want to build something that does not have to work, the easiest way is not to build it at all!
 
So, if we want it built, it is fair to say that we want it to work. Therefore, we have to make sure that it works when we build it, and that it keeps working - namely, that we fix bugs as soon as they appear.
 
A side benefit of this is that we might not even need a bug tracking system - after all, it only exists to manage all the defects that we should not have allowed to linger in the system in the first place. If we fix the issues as soon as we find them, we can easily track the number of open bugs on a single post-it note.

We did this on a mainframe migration project. We built it test-driven and fixed the odd uncaught defect on the spot. In the end, we had some 7000 automated test cases to keep the system in a known good state. And no bug tracking system.
 
So, in many ways a bug tracking system could be considered an indicator of waste in the organisation. However, for an organisation with low software quality, the absence of a bug tracking system is an indicator of even greater problems.
 
The transition to treating defects as a stop-the-line issue will definitely be painful for many organisations. Remember that it took Toyota about a month to build the first car after introducing this concept in the NUMMI factory they acquired from General Motors. In software it is often worse.
 
First of all, some teams - namely the teams with the highest technical debt or lowest output quality - will appear not to produce anything. They will suddenly spend all their time cleaning up the house rather than adding new features.
 
The downside is that the low quality suddenly becomes painfully visible to the whole organisation. Since many organisations measure success by new features or delivery on schedule and shy away from measuring quality, this will create a sense of crisis.
 
The upside is that the quality becomes painfully visible. The teams that produce low quality will be stopped from producing more low-quality stuff whereas the teams with a higher output quality can dedicate a higher proportion of their effort to building new software. This creates a virtuous circle where all new software is produced by the teams that are capable of building the highest quality software. The other teams will be busy cleaning up their mess.
 
The net result is a legacy software base that improves and new software of higher quality.
 
The pain of transition is so great that many organisations shy away and prefer to run up their technical debt instead.

Sometimes they try to hide it by frequent changes of "strategic platform": when the mainframe was replaced in favour of Java, it initially appeared to be much more productive, but eventually the organisational bent to produce bugs rather than fix them built up enough technical debt that the Java platform deteriorated to the same level as the mainframe. Then came the time for another "strategic platform shift" to .NET - and development productivity soared, only to decline over time as the code-base atrophied, making the organisation ready for "the next big platform."

The underlying issue is that the technology or platform is not the root cause of the unacceptable productivity levels. It is the organisational culture of accepting low quality that is the cause. This cannot be helped by replacing the technology. The remedy is all about behaviour.
 
We can change this behaviour top-down or bottom-up, but we need to fix it. There is no excuse to wait. We have to stop the line, and stop building bugs into our software. Not only for the quality, but also for the productivity. This is what lean software development is all about.

Published jan 29 2007, 10:57 by Martin Jul

Comments

 

Kang Leay said:

Well said. "Keep it clean" is the habit our team pursues.

september 3, 2008 4:27
 

Victor Volle said:

What if:

a "feature" has been implemented and in some corner cases (which do not occur often) there is a bug. And there is a workaround for the users that takes them 10 minutes. Investigating the cause of the bug might take 2-40 hours and fixing it might take 5 hours. Would you fix it, if there are features that are not yet implemented at all?

So I think fixing everything immediately is a waste of time. You have to do some "triage". And therefore you probably will create technical debt.

september 24, 2008 2:06
 

Andrew Goddard said:

Victor,

The whole point of the article is to eliminate technical debt.

If you had a car that, in some corner cases, broke down when you turned a corner at a certain speed, so that you had to start your journey again, would you take it to your mechanic to get it fixed?

Think about it - if every user of your product has to spend 10 minutes on this workaround, multiplied by the number of users, how much productivity are they losing over time?

So is a 45 hour investment really a waste?  It could be a waste compared to doing nothing - sure - but I think you'd probably make the investment based on productivity loss and potential customer loss - not to mention having healthy software that can be continually built on at a continual pace.
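To put rough numbers on this trade-off, here is a minimal Python sketch using the figures from Victor's example (10 minutes per workaround, 2-40 hours to investigate, 5 hours to fix). The function name `break_even_uses` is hypothetical, and the sketch assumes that an hour of user time is worth the same as an hour of developer time, which will not hold everywhere.

```python
def break_even_uses(fix_hours: float, workaround_minutes: float = 10.0) -> float:
    """Number of workaround uses after which the fix has paid for itself.

    Assumption: user time and developer time are valued equally.
    """
    return fix_hours * 60.0 / workaround_minutes


# Best case: 2 hours of investigation plus 5 hours of fixing.
best_case = break_even_uses(2 + 5)
# Worst case: 40 hours of investigation plus 5 hours of fixing.
worst_case = break_even_uses(40 + 5)

print(best_case, worst_case)  # 42.0 270.0
```

Under these assumptions the fix pays for itself once users have hit the workaround somewhere between 42 and 270 times in total, across all users.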

The mindset is to move from a "work-around" culture to a stop-the-line culture.

september 26, 2008 8:11
 

Martin Jul said:

Hi Victor and Andrew

Thanks for taking part in the conversation.

If you look into Theory of Constraints you will find a good discussion about why stop-the-line makes sense.

The idea is that you look at the throughput of the whole organisation. In a sequential process there will be at least one step that has the lowest capacity, so it is the one limiting the total output.

Many management systems for production are focused on resource utilization - in TOC we care only about maximizing the utilization of the limiting resource and in fact it is recommended to have slack (less than full utilization) in the other steps.

The main principle then, is to look at how we utilize the limiting resource. We want to make sure that we are not wasting it by letting it be idle, by producing the wrong product or by having to do rework (see the lean wastes: community.ative.dk/.../Lean-Principle-Number-1-_2D00_-Eliminate-Waste.aspx). We want to do everything possible to get the most from it to improve the throughput of the whole system.

Let's assume that the limiting step is the development team. Imagine a stream of work flowing from idea to production software. Now, imagine that the development team emits a piece of defective software. What happens now is that the piece comes back and they have to do it again, and since the overall throughput of the whole process is governed by the development team's capacity it means that the defective piece has limited the output of the entire development process:

If it takes one unit of time to do one feature, and one more to fix the defect when it comes back it means we get only the value for one feature for the price of the two units. Since it governs the output of the entire system this is a very sizable reduction if the defect production rate is high.
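As a rough illustration of this arithmetic, the sketch below computes the bottleneck's effective throughput for a given defect rate. The function name `effective_throughput` and its parameters are hypothetical, and it assumes that each defective feature comes back exactly once and is then fixed for good.

```python
def effective_throughput(defect_rate: float,
                         feature_cost: float = 1.0,
                         rework_cost: float = 1.0) -> float:
    """Working features delivered per unit of time at the bottleneck.

    Assumption: a fraction `defect_rate` of features comes back once
    and costs `rework_cost` extra units of time to fix.
    """
    expected_cost = feature_cost + defect_rate * rework_cost
    return 1.0 / expected_cost


for rate in (0.0, 0.25, 1.0):
    print(rate, effective_throughput(rate))
```

With no defects the team delivers one feature per unit of time. If every feature comes back once, that drops to half a feature per unit of time, which is the "one feature for the price of two units" case described above.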

It is often a good idea to add a quality control step before the limiting step to make sure that it does not waste its capacity working on something defective that will require it to work on the same feature again later.

This is one of the reasons why Scrum planning works since we have the Conversation about the requirements with the Team and Product Owner and try to clarify the acceptance criteria BEFORE we start working. This is essentially a quality control gate to make sure that we do not waste the limited resource on producing the wrong thing.

As for Victor's example, Deming has a good discussion about quality control and a statistical model for calculating the cost/benefit of quality control (e.g. testing) in "Out of the Crisis". It would be interesting to relate this to Victor's example and the specific process and cost pattern.

The idea from TOC is that you always have to look at the whole system and optimize for the current bottleneck. This is not necessarily the same at every phase of the project (eg. at one point the developers could be the constraint, later in the project it could be the systems testing team's time that is most precious and should not be wasted).

Goldratt's The Goal is an excellent business novel that explains the principles of Theory of Constraints and Clarke Ching has just written a new book on the same theme for software development called Rolling Rocks Downhill (I have only read a preproduction manuscript of it so far, but it is very good).

Goldratt: en.wikipedia.org/.../Eliyahu_M._Goldratt

Ching: www.rollingrocksdownhill.com

september 29, 2008 12:47
 

Jen said:

Just for clarification

Lean talks about "Andon" and not "stop the line". "Stop the line" could be one state of Andon. Primarily "Andon" is a notification system of a quality / process problem. The alert can be activated manually by a worker using a pull cord or button. The alert is usually a signal light - yellow or red. Red could also mean that the line is automatically stopped - but that is not implicit. Therefore I would say that it should rather be called "Pull the line" because you can pull the line for a yellow signal which does not stop the line. Stopping the line is only an utmost action.

Nevertheless, bugs could be issues for the red signal. But not all bugs require stopping the line - they need focused attention.

september 21, 2010 2:43

About Martin Jul

Building better software faster is Martin's mission. He is a partner in Ative, and helps development teams implement lean/agile software development and improve their craftsmanship by teaching hands-on development practices such as iterative design and test-first. He is known to be a hardliner on quality and likes to get things done-done.
© Ative Consulting ApS