Wednesday, July 25, 2012

Fixing a bug is like catching a fish

Manager: So, how long will it take to fix this bug?

Inexperienced Programmer: An hour maybe? Two tops? I’ll get right on it!

Experienced Programmer: Well, how long will it take to catch a fish?

It’s hard to know how long it’s going to take to fix a bug, especially if you don’t know the code. James Shore points out in The Art of Agile Development that before you can fix something, you have to figure out what’s wrong. The problem is that you can’t accurately estimate how long it will take to find out what’s wrong. It’s only after you know what’s wrong that you can reasonably estimate how long it will take to fix it. But by then it’s too late. According to Steve McConnell:

“finding the defect – and understanding it – is usually 90 percent of the work.”

A lot of bug fixes are only one line changes. What takes the time is figuring out the right line to change – like knowing where to tap the hammer, or when and where the fish will be biting. Some bugs are easy to find and easy to fix. Some bugs are hard to find, but easy to fix. Other bugs are easy to find and hard to fix. And some bugs can’t be found at all, so they probably can’t be fixed. Unless you wrote the code recently, you probably have no idea which kind of bug you’re being asked to work on.

Finding and Fixing a Bug

Let’s look at what’s involved in finding and fixing a bug. In Debug It! Paul Butcher does a good job of describing the steps that you need to go through, in a structured and disciplined way that will be familiar to experienced programmers:

  1. Make sure that you know what you’re looking for. Review the bug report, see if it makes sense, make sure it really is a bug and that you have enough information to figure the problem out and to reproduce it. Check whether it duplicates a bug that has already been reported, and if so, what the guy before you did about it, if anything.
  2. Clear the decks – find and check out the right code, clean up your workspace.
  3. Set up your test environment to match. This can be trivial, or it can be impossible if the customer is running on a configuration that you don’t have access to.
  4. Make sure that you understand what the code is supposed to do, and that your existing test suite passes.
  5. Now it’s time to go fishing. Reproduce and diagnose the bug. If you can’t reproduce it, you can’t prove that you fixed it.
  6. Write new (failing) developer tests or fix existing tests to catch the bug.
  7. Make the fix – and make sure that you didn’t break anything else. This may include some refactoring work to understand the code better before you make the fix, so that you can do it safely. And regression testing afterwards to make sure that you didn’t introduce any new bugs.
  8. Try to make the code safer and cleaner if you can for the next guy, with some more step-by-step refactoring. At least make sure that you don’t make the code more brittle and harder to understand with your fix.
  9. Get the fix reviewed by somebody else to make sure that you didn’t do something stupid.
  10. Check the fix in.
  11. Check to see if this bug needs to be fixed in any other branches if you aren’t working from the mainline. Merge the change in, deal with differences in the code, and go through all of the same reviews and tests and other work again.
  12. Stop and think. Do you understand what went wrong, and why? Do you understand why your fix worked? Where else should you look for this kind of bug? In The Pragmatic Programmer, Andy Hunt and Dave Thomas also advise: “If it took a long time to fix this bug, ask yourself why.” What can you do to make debugging problems like this easier in the future? How can you improve the approach that you took, or the tools that you used? How deep you go depends on the impact and severity of the bug and how much time you have.
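
Steps 6 and 7 are easier to picture with a small example. What follows is only a sketch, not something taken from Debug It!: the `average` functions and the bug report they stand for (an empty list crashes instead of returning 0.0) are made up for illustration. The same two tests are run against the code as found and against the one-line fix, so you can see the new test fail first and then pass, while the older behaviour stays covered.

```python
import unittest


def average_buggy(values):
    """Hypothetical code as found: crashes on an empty list."""
    return sum(values) / len(values)


def average_fixed(values):
    """Hypothetical one-line fix: guard the empty case from the bug report."""
    return sum(values) / len(values) if values else 0.0


def make_regression_suite(average):
    """Build the same tests against whichever implementation is passed in."""

    class AverageRegressionTest(unittest.TestCase):
        def test_empty_list_returns_zero(self):
            # Step 6: reproduces the reported bug, so it fails before the fix.
            self.assertEqual(average([]), 0.0)

        def test_existing_behaviour_unchanged(self):
            # Step 7: make sure the fix did not break what already worked.
            self.assertEqual(average([2, 4, 6]), 4.0)

    return unittest.defaultTestLoader.loadTestsFromTestCase(AverageRegressionTest)


def run(average):
    """Run the suite quietly and report whether every test passed."""
    result = unittest.TestResult()
    make_regression_suite(average).run(result)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("before fix:", run(average_buggy))  # False: the new test catches the bug
    print("after fix: ", run(average_fixed))  # True: bug fixed, nothing else broken
```

In a real code base you would normally have one implementation and watch the test go from red to green across two commits; passing the function in as a parameter is just a way to show both states in one file.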

What takes longer, finding a bug, or fixing it?

The time needed to set up a test environment, reproduce the problem, or test the fix may far outweigh the amount of time that it takes to find the problem in the code and fix it. But for a small number of bugs, it’s not how long it takes to find it – it’s what’s involved in fixing it.

In the Making Software chapter Where Do Most Software Flaws Come From?, Dewayne Perry compares how hard it was to find a bug (understand it and reproduce it) with how long it took to fix it. The study found that most bugs (almost three quarters) were easy to understand and find and didn’t take long to fix: 5 days or less (this was on a large-scale real-time system with a heavyweight SDLC, lots of reviews and testing). But there’s a long tail of bugs that can take much longer to fix, even bugs that were trivial to find:

Find/Fix Effort                          | <=5 Days to Fix | >5 Days to Fix
Problem can be reproduced                | 72.5%           | 18.4%
Hard to reproduce or can't be reproduced | 5.9%            | 3.2%

So you can bet when you find a bug that it’s going to be easy to fix. And most of the time you’ll be right. But when you’re wrong, you can be a lot wrong.
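
To put rough odds on that bet, you can work out the share of slow fixes within each row of the table. This is a back-of-the-envelope sketch: the four percentages are the ones quoted above, but the row-by-row rates are my own arithmetic and are not figures reported in the study, and the names used here are made up for the example.

```python
# Percentages of all bugs, copied from the table above.
BUG_SHARES = {
    ("reproducible", "<=5 days"): 72.5,
    ("reproducible", ">5 days"): 18.4,
    ("hard to reproduce", "<=5 days"): 5.9,
    ("hard to reproduce", ">5 days"): 3.2,
}


def slow_fix_rate(find_effort: str) -> float:
    """Share of the bugs in one row that took more than 5 days to fix."""
    quick = BUG_SHARES[(find_effort, "<=5 days")]
    slow = BUG_SHARES[(find_effort, ">5 days")]
    return slow / (quick + slow)


def overall_slow_fix_rate() -> float:
    """Share of all bugs that took more than 5 days to fix."""
    total = sum(BUG_SHARES.values())
    slow = sum(share for (_, fix), share in BUG_SHARES.items() if fix == ">5 days")
    return slow / total


if __name__ == "__main__":
    for row in ("reproducible", "hard to reproduce"):
        print(f"{row}: {slow_fix_rate(row):.1%} took more than 5 days")
    print(f"overall: {overall_slow_fix_rate():.1%} took more than 5 days")
```

On these numbers, about one in five of the bugs that could be reproduced still took more than 5 days to fix, and for the bugs that were hard to reproduce it was closer to one in three.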

In subsequent posts, I am going to talk more about the issues and costs involved in reproducing, finding and fixing bugs, and how (or whether) to estimate bug fixes.
