Wednesday, August 15, 2012

Rugged Implementation Guide - Strawman

The people behind Rugged Software have quietly made available a Strawman release of a “Rugged Implementation Guide”.

This guide is supposed to offer practical guidance on how to build a secure and reliable application following application security best practices. It is based on the idea of a “security story” for a system – a framework that identifies the technology and organization security lifelines and strategies, the defenses that implement these strategies, and assurance that the defenses are implemented correctly.

The guide walks through an example security story for the different lifelines and strategies at an online bank. These examples include just about every good idea and good practice you can think of, even including a “Simian Army” (of course) and a bug bounty program (in a bank… really?). It’s overwhelming. If doing all of this is what it takes to build “rugged software”, then unfortunately I don’t think very many (any?) organizations are going to succeed.

The cynical part of me noticed that the framework and examples were heavy on verification, especially third party verification: third party architecture reviews, third party application and network pen testing and code reviews. And comprehensive developer training. And the use of ESAPI and of course lots of other tools. All of this is good, and all of this is especially good for appsec vendors and appsec consultants (e.g., the people who wrote the guide).

I tried to understand the Implementation Guide in context of the “Rugged Handbook” which was also quietly released at the same time. There are some interesting and useful ideas in the Handbook - if you get past the introduction (I wasn’t able to the first time I tried to read it), which unfortunately contains pages and pages of stuff like “Work Together like an Ant Colony” and “Defend Yourself Like a Honey Badger”… Towards the end of the document the authors explain how to get people together to write a security story and how different people in the organization (executives, project managers, analysts, developers) should think about and take responsibility for application security.

Although there’s nothing in the Rugged Handbook on the Rugged Implementation Guide or how it is to be used, I think I understand that development organizations need to go out and create a “security story” that explains what they need to protect, how they are going to protect it, and how they will prove that they have protected it, using these two documents as help.

The “security story” framework is a good way of organizing and thinking about security for a system - once you get past the silly marketing junk about badgers or whatever. It looks at security both in the design of a system and in how the system is built and supported, targeted I guess more towards developers than security professionals.

So now what?

With these documents from Rugged, development managers have another, still incomplete description of how to do secure development to go along with OpenSAMM and BSIMM and Microsoft’s SDL. The over-worked part of me isn't happy that now there are more things developers have to think about and try in order to do their jobs properly.

I know that there is more work to be done on these documents and ideas. But so far I don’t understand yet how or why this is going to change the way that developers build software – maybe that was in the part about the honey badgers that I didn’t read...

Tuesday, August 14, 2012

What can you get out of Kanban?

I’ve spent the last year or so learning more about Kanban and how to use it in software development and IT operations.

It’s definitely getting a lot of attention, and I want to see if it can help our development and operations teams work better.

What to Read, What to Read?

There’s a lot to read on Kanban – but not all of it is useful. You can be safe in ignoring anything that tries to draw parallels between Japanese manufacturing and software development. Anyone who does this doesn’t understand Lean manufacturing or software development. Kanban (big K) isn’t kanban (little k) and software development isn’t manufacturing.

Kanban in software development starts with David Anderson’s book. It’s heavy on selling the ideas behind Kanban, but unfortunately light on facts and examples – mostly because there wasn’t a lot of experience and data to draw on at the time the book was written. Everything is built up from two case studies in small maintenance organizations.

The first case study (which you can read here online) goes back to 2005, at a small offshore sustaining engineering team at Microsoft. This team couldn’t come close to keeping up with demand, because of the insanely heavyweight practice framework that they were following (CMMI and TSP/PSP) and all of the paperwork and wasted planning effort that they were forced to do. They were spending almost half of their time on estimating and planning, and half of this work would never end up being delivered because more work was always coming in.

Anderson and another Microsoft manager radically simplified the team’s planning, prioritization and estimation approach, cutting out a lot of the bullshit so that the team could focus instead on actually delivering what was important to the customers.

You can skip over the theory – the importance of Drum-Buffer-Rope – Anderson changed his mind later on the theory behind the work anyways. As he says:

“This is a case study about implementing common sense changes where they were needed”.

The second case study is about Anderson doing something similar with another ad hoc maintenance team at Corbis a couple of years later. The approach and lessons were similar, in this case focusing more on visual tracking of work and moving away from time boxing for maintenance and break/fix work, using explicit WIP limits instead as the forcing function to control the team’s work.

The rest of the book goes into the details of tracking work on task boards, setting and adjusting limits, and just-in-time planning. There’s also some discussion of effecting organizational change and continuous improvement, which you can get from any other Agile/Lean source.

Corey Ladas’ book Scrumban is a short collection of essays on Kanban and combining Scrum and Kanban. The book is essentially built around one essay, which, taking a Lean approach to save time and money, I suggest that you read here instead. You don’t need to bother with the rest of the book unless you want to try and follow along as Ladas works his ideas out in detail. The basic ideas are the same as Anderson (not surprising, since they worked together at Corbis):

Don’t build features that nobody needs right now. Don’t write more code than you can test. Don’t test more code than you can deploy.

David Anderson has a new book on Kanban coming out soon. Unfortunately, it looks like a rehash of some of his blog posts with updated commentary. I was hoping for something with more case studies and data, and follow-ups on the existing case studies to see if they were able to sustain their success over time.

There is other writing on Kanban, by excited people who are trying it out (often in startups or small teams), or by consultants who have added Kanban to the portfolio of what they are selling – after all, “Kanban is the New Scrum”.

You can keep up with most of this through the Kanban weekly Roundup, which provides a weekly summary of links and discussion forums and news groups and presentations and training in Kanban and Lean stuff. And there’s a discussion group on Kanban development which is also worth checking out. But so far I haven’t found anything, on a consistent basis anyways, that adds much beyond what you will learn from reading Anderson’s first Kanban book.

Where does Kanban work best?

Kanban was created to help break/fix maintenance and sustaining engineering teams. Kanban’s pull-based task-and-queue work management matches the unpredictable, interrupt-driven and specialized work that these teams do. Kanban puts names and a well-defined structure around common sense ideas that most successful maintenance and support and operations teams already follow, which should help inexperienced teams succeed. It makes much more sense to use Kanban than to try to bend Scrum to fit maintenance and support, or to follow a heavyweight model like IEEE 1219 or whatever is replacing it.

I can also see why Kanban has become popular with technology startups, especially web / SaaS startups following Continuous Deployment and a Lean Startup model. Kanban is about execution and continuous optimization, removing bottlenecks and reducing cycle time and reacting to immediate feedback. This is exactly what a startup needs to do once they have come up with their brilliant idea.

But getting control over work isn’t enough, especially over the longer-term. Kanban’s focus is mostly tactical, on the work immediately in front of the team, on identifying and resolving problems at a micro-level. There’s nothing that gets people to put their heads up and look at what they can and should do to make things better on a larger-scale – before problems manifest themselves as delays. You still need a layer of product management and risk management over Kanban, and you’ll also need a technical software development practice framework like XP.

I also don’t see how – or why – Kanban makes much of a difference in large-scale development projects, where flow and tactical optimization aren’t as important as managing and coordinating all of the different moving pieces. Kanban won’t help you scale work across an enterprise or manage large programs with lots of interdependencies. You still have to do project and program and portfolio management and risk management above Kanban’s micro-structure for managing day-to-day work, and you still need an SDLC.

Do you need Kanban?

Kanban is a tool to solve problems in how people get their work done. Anderson makes it clear that Kanban is not enough in itself to manage software development regardless of the size of the organization – it’s just a way for teams to get their work done better:

“Kanban is not a software development life cycle or a project management methodology…You apply a Kanban system to an existing software development process…”

Kanban can help teams that are being crushed under a heavy workload, especially support and maintenance teams. Combining Kanban with a triage approach to prioritization would be a good way to get through a crisis.

But I don’t see any advantage in using Kanban for an experienced development team that is working effectively.

Limiting Work in Progress? Time boxing already puts limits around the work that development teams do at one time, giving the team a chance to focus and get things done.

Although some people argue that iterations add too much overhead and slow teams down, you can strip overheads down so that time boxing doesn’t get in the way - so that you really are sprinting.

Getting the team to understand the flow of work, making delays and blockers visible and explicit is an important part of Kanban. But Microsoft’s Eric Brechner (er, I mean, “I. M. Wright”) explains that you don’t need Kanban and taskboards to see delays and bottlenecks or to balance throughput in an experienced development team:

“Good teams do this intuitively to avoid wasted effort. They ‘right-size’ their teams and work collaboratively to succeed together.”

And anybody who is working iteratively and incrementally is already doing or should be doing just-in-time planning and prioritization.

So for us, Kanban doesn’t seem to be worth it, at least for now. If your development team is inexperienced and can’t deliver (or doesn’t know what it needs to deliver), or you’re in a small maintenance or firefighting group, Kanban is worth trying. If Kanban can help operations and maintenance teams survive, and help some online startups launch faster – if that’s all that people ever do with Kanban – the world will still be a better place.

Thursday, August 9, 2012

Bug Fixing – to Estimate, or not to Estimate, that is the question

According to Steve McConnell in Code Complete (data from 1975-1992) most bugs don’t take long to fix. About 85% of errors can be fixed in less than a few hours. Some more can be fixed in a few hours to a few days. But the rest take longer, sometimes much longer – as I talked about in an earlier post.

Given all of these factors and uncertainty, how do you estimate a bug fix? Or should you bother?

Block out some time for bug fixing

Some teams don’t estimate bug fixes upfront. Instead they allocate a block of time, some kind of buffer for bug fixing as a regular part of the team’s work, especially if they are working in time boxes. Developers come back with an estimate only if it looks like the fix will require a substantial change – after they’ve dug into the code and found out that the fix isn’t going to be easy, that it may require a redesign or require changes to complex or critical code that needs careful review and testing.

Use a rule of thumb placeholder for each bug fix

Another approach is to use a rough rule of thumb, a standard place holder for every bug fix. Estimate ½ day of development work for each bug, for example. According to this post on Stack Overflow the ½ day suggestion comes from Jeff Sutherland, one of the inventors of Scrum.

This place holder should work for most bugs. If it takes a developer more than ½ day to come up with a fix, then they probably need help and people need to know anyways. Pick a place holder and use it for a while. If it seems too small or too big, change it. Iterate. You will always have bugs to fix. You might get better at fixing them over time, or they might get harder to find and fix once you’ve got past the obvious ones.

Or you could use the data from Capers Jones (referenced earlier) on how long it takes to fix a bug by the type of bug. A day or half a day works well on average, especially since most bugs are coding bugs (3 hours on average) or data bugs (6.5 hours). Even design bugs take, on average, only a little more than a day to resolve.

Collect some data – and use it

Steve McConnell, in Software Estimation: Demystifying the Black Art, says that it’s always better to use data than to guess. He suggests collecting time data for as little as a few weeks or maybe a couple of months on how long on average it takes to fix a bug, and using this as a guide for estimating bug fixes going forward.

If you have enough defect data, you can be smarter about how to use it. If you are tracking bugs in a bug database like Jira, and if programmers are tracking how much time they spend on fixing each bug for billing or time accounting purposes (which you can also do in Jira), then you can mine the bug database for similar bugs and see how long they took to fix – and maybe get some ideas on how to fix the bug that you are working on by reviewing what other people did on other bugs before you. You can group different bugs into buckets (by size – small, medium, large, x-large – or type) and then come up with an average estimate, and maybe even a best case, worst case and most likely for each type.
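
As a rough illustration, here’s a minimal sketch (in Java) of the kind of bucketing and summary statistics described above. The fix times, bucket boundaries and class names are all made up – the point is just that a few lines over your own tracker’s time data gets you best case / average / worst case numbers per bucket.

import java.util.*;
import java.util.stream.*;

// Minimal sketch: bucket historical bug fix times (in hours) and derive
// rough best case / average / worst case figures per bucket.
// The input data and bucket boundaries are made up for illustration.
public class BugFixStats {

    enum Bucket { SMALL, MEDIUM, LARGE, XLARGE }

    static Bucket bucketFor(double hours) {
        if (hours <= 2) return Bucket.SMALL;
        if (hours <= 8) return Bucket.MEDIUM;
        if (hours <= 24) return Bucket.LARGE;
        return Bucket.XLARGE;
    }

    public static void main(String[] args) {
        // Fix times (hours) pulled from your bug tracker's time-tracking fields.
        double[] fixTimes = {1.5, 3.0, 0.5, 12.0, 2.0, 40.0, 6.5, 1.0, 18.0, 4.0};

        Map<Bucket, List<Double>> byBucket = Arrays.stream(fixTimes).boxed()
                .collect(Collectors.groupingBy(BugFixStats::bucketFor));

        byBucket.forEach((bucket, times) -> {
            DoubleSummaryStatistics stats = times.stream()
                    .mapToDouble(Double::doubleValue).summaryStatistics();
            System.out.printf("%-7s count=%d best=%.1fh avg=%.1fh worst=%.1fh%n",
                    bucket, stats.getCount(), stats.getMin(),
                    stats.getAverage(), stats.getMax());
        });
    }
}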

Use Benchmarks

For a maintenance team (a sustaining engineering or break/fix team responsible for software repairs only), you could use industry productivity benchmarks to project how many bugs your team can handle. Capers Jones in Estimating Software Costs says that the average programmer (in the US, in 2009) can fix 8-10 bugs per month (of course, if you’re an above-average programmer working in Canada in 2012, you’ll have to set these numbers much higher). Inexperienced programmers can be expected to fix 6 a month, while experienced developers using good tools can fix up to 20 per month.
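
As a back-of-the-envelope illustration of what you can do with a benchmark like this, here’s a tiny sketch. The backlog size, incoming rate and team size are made-up numbers; the 8-10 fixes per programmer-month figure is Jones’s average from above.

// Back-of-the-envelope projection using the benchmark above.
// Backlog size, incoming rate and team size are made-up illustration numbers.
public class MaintenanceProjection {
    public static void main(String[] args) {
        int backlog = 240;                    // open bugs assigned to the team
        int newBugsPerMonth = 30;             // incoming rate
        int programmers = 5;
        double fixesPerProgrammerMonth = 9;   // Jones's average of 8-10

        double teamFixesPerMonth = programmers * fixesPerProgrammerMonth;
        double netBurnDown = teamFixesPerMonth - newBugsPerMonth;

        if (netBurnDown <= 0) {
            System.out.println("Team can't keep up with the incoming rate.");
        } else {
            System.out.printf("Roughly %.1f months to clear the backlog%n",
                    backlog / netBurnDown);
        }
    }
}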

If you’re focusing on fixing security vulnerabilities reported by a pen tester or a scan, check out the remediation statistical data that Denim Group has started to collect, to get an idea on how long it might take to fix a SQL injection bug or an XSS vulnerability.

So, do you estimate bug fixes, or not?

Because you can’t estimate how long it will take to fix a bug until you’ve figured out what’s wrong, and most of the work in fixing a bug involves figuring out what’s wrong, it doesn’t make sense to try to do an in-depth estimate of how long it will take to fix each bug as they come up.

Using simple historical data, a benchmark, or even a rough guess place holder as a rule-of-thumb all seem to work just as well. Whatever you do, do it in the simplest and most efficient way possible, don’t waste time trying to get it perfect – and realize that you won’t always be able to depend on it.

Remember the 10x rule – some outlier bugs can take up to 10x as long to find and fix as an average bug. And some bugs can’t be found or fixed at all – or at least not with the information that you have today. When you’re wrong (and sometimes you’re going to be wrong), you can be really wrong, and even careful estimating isn’t going to help. So stick with a simple, efficient approach, and be prepared when you hit a hard problem, because it's gonna happen.

Wednesday, August 8, 2012

Fixing Bugs that can’t be Reproduced

There are bugs that can’t be reproduced, or at least not easily: intermittent and transient errors; bugs that disappear when you try to look for them; bugs that occur as the result of a long chain of independent operations or cross-request timing. Some of these bugs are only found in high-scale production systems that have been running for a long time under heavy load.

Capers Jones calls these bugs “abeyant defects” and estimates that in big systems, as much as 10% of bugs cannot be reproduced, or are too expensive to try reproducing. These bugs can cost 100x more to fix than a simple defect – an “average” bug like this can take more than a week for somebody to find (if it can be found at all) by walking through the design and code, and up to another week or two to fix.

Heisenbugs

One class of bugs that can’t be reproduced are Heisenbugs: bugs that disappear when you attempt to trace or isolate them. When you add tracing code, or step through the problem in a debugger, the problem goes away.

In Debug It!, Paul Butcher offers some hope for dealing with these bugs. He says Heisenbugs are caused by non-deterministic behavior which in turn can only be caused by:

  1. Unpredictable initial state – a common problem in C/C++ code
  2. Interaction with external systems – which can be isolated and stubbed out, although this is not always easy
  3. Deliberate randomness – random factors can also be stubbed out in testing
  4. Concurrency – the most common cause of Heisenbugs today, at least in Java.

Knowing this (or at least making yourself believe it while you are trying to find a problem like this) can help you decide where to start looking for a cause, and how to go forward. But unfortunately, it doesn’t mean that you will find the bug, at least not soon.
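
For causes 2 and 3 in particular, the practical trick is to make the non-determinism injectable so that a test can pin it down. Here’s a minimal sketch of the idea – the class and method names are made up for illustration:

import java.time.Clock;
import java.time.Instant;
import java.util.Random;

// Minimal sketch of isolating two of the sources of non-determinism above
// (deliberate randomness and an external dependency - here, the system clock)
// so they can be stubbed out in a test. Class and method names are made up.
public class RetryPolicy {
    private final Random random;
    private final Clock clock;

    // Production code passes new Random() and Clock.systemUTC();
    // a test passes a seeded Random and Clock.fixed(...) so every run is identical.
    public RetryPolicy(Random random, Clock clock) {
        this.random = random;
        this.clock = clock;
    }

    // Jittered backoff: normally a Heisenbug breeding ground, deterministic under test.
    public long nextRetryDelayMillis(int attempt) {
        long base = (1L << attempt) * 100;
        return base + random.nextInt(100);
    }

    public Instant now() {
        return clock.instant();
    }

    public static void main(String[] args) {
        RetryPolicy underTest = new RetryPolicy(
                new Random(42L),
                Clock.fixed(Instant.parse("2012-08-08T00:00:00Z"), java.time.ZoneOffset.UTC));
        System.out.println(underTest.nextRetryDelayMillis(3));  // same value every run
        System.out.println(underTest.now());                    // same instant every run
    }
}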

Race Conditions

Race conditions used to be problems only for systems programmers and people writing communications handlers. But almost everybody runs into race conditions today – is anybody writing code that isn’t multi-threaded anymore? Races are synchronization errors that occur when two or more threads or processes access the same data or resource without a consistent locking approach.

Races can result in corrupted data – memory getting stomped on (especially in C/C++), changes applied more than once (balances going negative) or changes being lost (credits without debits) – inconsistent UI behaviour, random crashes due to null pointer problems (a thread references an object that has already been freed by another thread), intermittent timeouts, and actions executed out of sequence, including time-of-check/time-of-use security violations.

The results will depend on which thread or process wins the race at a particular time. Because races are the result of unlucky timing, and because the problem may not be visible right away (e.g., something gets stepped on but you don’t know until much later, when something else tries to use it), they’re usually hard to understand and hard to make happen.

You can’t fix (or probably even find) a race condition without understanding concurrency. And the fact that you have a race condition is a good sign that whoever wrote this code didn’t understand concurrency, so you’re probably not dealing with only one mistake. You’ll have to be careful in diagnosing and especially in fixing the bug, to make sure that you don’t turn the race into a stall, thread starvation, livelock or deadlock by getting the synchronization approach wrong.
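
To make this concrete, here’s a minimal sketch of the classic lost-update race – an unsynchronized read-modify-write on shared data – next to the same update done with consistent locking. The class, counters and iteration counts are made up for illustration:

// Minimal sketch of the kind of race described above: two threads doing an
// unsynchronized read-modify-write on shared data, so updates get lost.
public class LostUpdateDemo {
    private int balance = 0;                 // shared, unsynchronized
    private int safeBalance = 0;             // shared, guarded by a lock
    private final Object lock = new Object();

    void deposit() { balance = balance + 1; }             // read-modify-write: racy

    void safeDeposit() {                                   // consistent locking: no race
        synchronized (lock) { safeBalance = safeBalance + 1; }
    }

    public static void main(String[] args) throws InterruptedException {
        LostUpdateDemo demo = new LostUpdateDemo();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) { demo.deposit(); demo.safeDeposit(); }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();

        // Expect 200000 for both; the racy counter usually comes up short,
        // by a different amount each run - which is why these bugs are hard to reproduce.
        System.out.println("racy balance: " + demo.balance);
        System.out.println("locked balance: " + demo.safeBalance);
    }
}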

Fixing bugs that can’t be reproduced

If you think you’re dealing with a concurrency problem, a race condition or a timing-related bug, try introducing pauses between different threads – this will expand the window for races and timing-related problems to occur, which should make the problem more obvious. This is what IBM Research’s ConTest tool does (or did – unfortunately, ConTest seems to have disappeared from the IBM alphaWorks site), messing with thread scheduling to make deadlocks and races occur more often.
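
If you want to try the same trick by hand, a crude sketch of the idea is to temporarily drop a short random sleep into the suspected race window; the class and method names here are made up:

import java.util.concurrent.ThreadLocalRandom;

// A crude, temporary version of what ConTest automates: a short random sleep
// dropped into the suspected race window so the interleaving you're hunting
// for happens much more often. Remove it once the bug is understood.
public class RaceWidener {
    // Call this between the "check" and the "act", or between a read and a
    // write of shared state, while you are trying to reproduce the race.
    static void maybePause() {
        try {
            Thread.sleep(ThreadLocalRandom.current().nextInt(5));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}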

If you can’t reproduce the bug, that doesn’t mean that you give up. There are still some things to look at and try.

It’s often faster to find concurrency bugs, timing problems and other hard problems by working back from the error and stepping through the code – in a debugger or by hand – to build up your own model of how the code is supposed to work.

“Even if you can’t easily reproduce the bug in the lab, use the debugger to understand the affected code. Judiciously stepping into or over functions based on your level of “need to know” about that code path. Examine live data and stack traces to augment your knowledge of the code paths.” Jeff Vroom, Debugging Hard Problems

I’ve worked with brilliant programmers who can work back through even the nastiest code, tracing through what’s going on until they see the error. For the rest of us, debugging is a perfect time to pair up. While I'm not sure that pair programming makes a lot of sense as an everyday practice for everyday work, I haven’t seen too many ugly bugs solved without two smart people stepping through the code and logs together.

It’s also important to check compiler and static analysis warnings – these tools can help point to easily-overlooked coding mistakes that could be the cause of your bug, and maybe even other bugs. Findbugs has static concurrency bug pattern checkers that can point out common concurrency mistakes, as well as lots of other coding mistakes that are easy to miss on your own.

There are also dynamic analysis tools that are supposed to help find race conditions and other kinds of concurrency bugs at run-time, but I haven’t seen any of them actually work. If anybody has had success with tools like this in real-world applications I would like to hear about it.

Shotgun Debugging, Offensive Programming, Brute-Force Debugging and other ideas

Debug It! recommends that if you are desperate, just take a copy of the code, and try changing something, anything, and see what happens. Sometimes the new information that results from your change may point you in a new direction. This kind of undirected “Shotgun Debugging” is basically wishful thinking, relying on luck and accidental success, what Andy Hunt and Dave Thomas call “Programming by Coincidence”. It isn’t something that you want to rely on, or be proud of.

But there are some changes that do make a lot of sense to try. In Code Complete, Steve McConnell recommends making “offensive coding” changes: adding asserts and other run-time debugging checks that will cause the code to fail if something “impossible” happens, because something “impossible” apparently is happening.
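
A minimal sketch of what those offensive checks might look like in practice – the domain and names are made up, and the asserts only fire when the JVM is run with -ea:

// Minimal sketch of "offensive" checks: make the code fail loudly the moment
// an "impossible" state shows up, instead of limping along and corrupting data.
// The domain (order state) is made up for illustration.
public class OrderProcessor {

    void applyPayment(Order order, long amountCents) {
        // Enabled with the -ea JVM flag; cheap to leave in during debugging builds.
        assert order != null : "impossible: null order reached applyPayment";
        assert amountCents > 0 : "impossible: non-positive payment " + amountCents;

        if (order.balanceCents() < 0) {
            // Run-time check that stays on in production builds too.
            throw new IllegalStateException(
                    "impossible: negative balance " + order.balanceCents()
                    + " for order " + order.id());
        }
        // ... normal processing ...
    }

    interface Order {
        String id();
        long balanceCents();
    }
}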

Jeff Vroom suggests writing your own debugging code:

“For certain types of complex code, I will write debugging code, which I put in temporarily just to isolate a specific code path where a simple breakpoint won’t do. I’ve found using the debugger’s conditional breakpoints is usually too slow when the code path you are testing is complicated. You may hit a specific method 1000′s of times before the one that causes the failure. The only way to stop in the right iteration is to add specific code to test for values of input parameters… Once you stop at the interesting point, you can examine all of the relevant state and use that to understand more about the program.”
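
Here’s a sketch of the kind of throwaway debugging code Vroom is describing – the trigger condition and names are made up. The point is that a hand-written check costs almost nothing per call, where a conditional breakpoint evaluated thousands of times can be painfully slow:

// Temporary debugging code: a cheap hand-written check that fires only in the
// one interesting iteration out of thousands, where a debugger's conditional
// breakpoint would be too slow. Trigger condition and names are made up.
public class DebugTrap {
    static int hits = 0;

    static void checkpoint(String customerId, long amountCents) {
        hits++;
        if ("CUST-4711".equals(customerId) && amountCents < 0) {
            // Put a plain breakpoint on the next line, or dump whatever state you need.
            System.err.println("TRAP after " + hits + " calls: " + customerId
                    + " amount=" + amountCents);
        }
    }
}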

Paul Butcher suggests that before you give up trying to reproduce and fix a problem, see if there are any other bugs reported in the same area and try to fix them – even if they aren’t serious bugs. The rationale: fixing other bugs may clear up the situation (your bug was being masked by another one); and by working on something else in the same area, you may learn more about the code and get some new ideas about your original problem.

Refactoring or even quickly rewriting some of the code where you think the problem might be can sometimes help you to see the problem more clearly, especially if the code is difficult to follow. Or it could possibly move the problem somewhere else, making it easier to find.

If you can’t find it and fix it, at least add defensive code: tighten up error handling and input checking, and add some logging – if something is happening that can’t happen or won’t happen for you, you need more information on where to look when it happens again.
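
A minimal sketch of that kind of defensive fallback – the names are made up, and the logging framework choice (java.util.logging here) is incidental:

import java.util.logging.Logger;

// Minimal sketch of the defensive fallback described above: tighter input
// checking plus enough logging to tell you where to look the next time the
// "impossible" happens. Names are made up for illustration.
public class QuoteHandler {
    private static final Logger LOG = Logger.getLogger(QuoteHandler.class.getName());

    public void onQuote(String symbol, double price) {
        if (symbol == null || symbol.isEmpty()) {
            LOG.warning("Rejected quote with missing symbol, price=" + price);
            return;
        }
        if (price <= 0 || Double.isNaN(price)) {
            LOG.warning("Rejected bad price " + price + " for " + symbol
                    + " on thread " + Thread.currentThread().getName());
            return;
        }
        // ... normal processing ...
    }
}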

Question Everything

Finally, The Pragmatic Programmer tells us to remember that nothing is impossible. If you aren’t getting closer to the answer, question your assumptions. Code that has always worked might not work in this particular case. Unless you have a complete trace of the problem in the original report, it’s possible that the report is incomplete or misleading – that the problem you need to solve is not the problem that you are looking at.

If you're trapped in a real debugging nightmare, look outside of your own code. The bug may not be in your code, but in underlying third party code in your service stack or the OS. In big systems under heavy load, I’ve run into problems in the operating system kernel, web servers, messaging middleware, virtual machines, and the DBMS. Your job when debugging a problem like this is to make sure that you know where the bug isn’t (in your code), and to come up with as simple a test case as possible that shows this – when you’re working with a vendor, they’re not going to be able to set up and run a full-scale enterprise app in order to reproduce your problem.

Hard bugs that can't be reproduced can blow schedules and service levels to hell, and wear out even your best people. Taking all of this into account, let’s get back to the original problem of how or whether to estimate bug fixes, in the next post.

Tuesday, August 7, 2012

Fixing Bugs - if you can't reproduce them, you can't fix them

"Generally, if you can’t reproduce it, it’s almost impossible to fix". Anonymous programmer, Practices of Software Maintenance, Janice Singer
Fixing a problem usually starts with reproducing it – what Steve McConnell calls “stabilizing the error”.

Technically speaking, you can’t be sure you are fixing the problem unless you can run through the same steps, see the problem happen yourself, fix it, and then run through the same steps and make sure that the problem went away. If you can’t reproduce it, then you are only guessing at what’s wrong, and that means you are only guessing that your fix is going to work.

But let’s face it – it’s not always practical or even possible to reproduce a problem. Lots of bug reports don’t include enough information for you to understand what the hell the problem actually was, never mind what was going on when the problem occurred – especially bug reports from the field. Rahul Premraj and Thomas Zimmermann found in The Art of Collecting Bug Reports (from the book Making Software) that the two most important factors in determining whether a bug report will get fixed or not are:

  1. Is the description well-written, can the programmer understand what was wrong or why the customer thought something was wrong?
  2. Does it include steps to reproduce the problem, even basic information about what they were doing when the problem happened?

It’s not a lot to ask – from a good tester at least. But you can’t reasonably expect this from customers.

There are other cases where you have enough information, but don’t have the tools or expertise to reproduce a problem – for example, when a pen tester has found a security bug using specialist tools that you don’t have or don’t understand how to use.

Sometimes you can fix a problem without being able to see it happen in front of you – you come up with a theory on your own and trust your gut – especially if this is code that you recently worked on. But reproducing the problem first gives you the confidence that you aren’t wasting your time and that you actually fixed the right thing. Trying to reproduce the problem should almost always be your first step.

What’s involved in reproducing a bug?

What you want to do is to find, as quickly as possible, a simple test that consistently shows the problem, so that you can then run a set of experiments, trace through the code, isolate what’s wrong, and prove that it went away after you fixed the code.

The best explanation that I’ve found of how to reproduce a bug is in Debug It! where Paul Butcher patiently explains the pre-conditions (identifying the differences between your test environment and the customer’s environment, and trying to control as many of them as possible), and then how to walk backwards from the error to recreate the conditions required to make the problem happen again. Butcher is confident that if you take a methodical approach, you will (almost) always be able to reproduce the problem successfully.

In Why Programs Fail: A guide to Systematic Debugging, Andreas Zeller, a German Comp Sci professor, explains that it’s not enough just to make the problem happen again. Your goal is to come up with the simplest set of circumstances that will trigger the problem – the smallest set of data and dependencies, the simplest and most efficient test(s) with the fewest variables, the shortest path to making the problem happen. You need to understand what is not relevant to the problem, what’s just noise that adds to the cost and time of debugging and testing – and get rid of it. You do this using binary techniques to slice up the input data set, narrowing in on the data and other variables that you actually need, repeating this until the problem starts to become clear.
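
Here’s a much-simplified sketch of the binary-slicing idea (real delta debugging, Zeller’s ddmin, is considerably more thorough) – the failing condition below is a toy stand-in:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// A much-simplified sketch of Zeller's idea: binary-slice a failing input,
// keeping whichever half still triggers the failure, until no half does.
// Real delta debugging (ddmin) is more thorough; this is just the core loop.
public class InputSlicer {

    static <T> List<T> minimize(List<T> failingInput, Predicate<List<T>> stillFails) {
        List<T> current = new ArrayList<>(failingInput);
        boolean shrunk = true;
        while (shrunk && current.size() > 1) {
            shrunk = false;
            int mid = current.size() / 2;
            List<T> firstHalf = new ArrayList<>(current.subList(0, mid));
            List<T> secondHalf = new ArrayList<>(current.subList(mid, current.size()));
            if (stillFails.test(firstHalf)) {
                current = firstHalf;
                shrunk = true;
            } else if (stillFails.test(secondHalf)) {
                current = secondHalf;
                shrunk = true;
            }
        }
        return current;  // the smallest slice this simple strategy could find
    }

    public static void main(String[] args) {
        // Toy "bug": the failure triggers whenever the input contains 13.
        List<Integer> input = Arrays.asList(4, 8, 15, 13, 23, 42, 7, 99);
        Predicate<List<Integer>> fails = in -> in.contains(13);
        System.out.println("Minimal failing input: " + minimize(input, fails));
    }
}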

Code Complete’s chapter on Debugging is another good guide on how to reproduce a problem through a set of iterative steps, and how to narrow in on the simplest and most useful set of test conditions required to make the problem happen. It also lists common places to look for bugs: code that has been changed recently, code that has a history of other bugs, and code that is difficult to understand (if you find it hard to understand, there’s a good chance that the programmers who worked on it before you did too).

Replay Tools

One of the most efficient ways to reproduce a problem, especially in server code, is by automatically replaying the events that led up to the problem. To do this you’ll need to capture a time-sequenced record of what happened, usually from an audit log, and a driver to read and play the events against the system. And for this to work properly, the behavior of the system needs to be deterministic – given the same set of inputs in the same sequence, the same results will occur each time. Otherwise you’ll have to replay the logs over and over and hope for the right set of circumstances to occur again.
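
A minimal sketch of what such a replay driver might look like – the log format, file name and handler interface are all made up, and this only buys you anything if the code consuming the events is deterministic:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Minimal sketch of a replay driver: read a time-ordered event log and feed
// each event back through the same handler the production system uses.
// The log format, file name and handler interface are made up for illustration.
public class EventReplayer {

    interface EventHandler {
        void handle(String eventType, String payload);
    }

    static void replay(String logFile, EventHandler handler) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(logFile))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumed log format: timestamp|eventType|payload
                String[] fields = line.split("\\|", 3);
                if (fields.length == 3) {
                    handler.handle(fields[1], fields[2]);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        replay("inbound-events.log",
                (type, payload) -> System.out.println("replaying " + type + ": " + payload));
    }
}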

On one system that I worked on, the back-end engine was a deterministic state machine designed specifically to support replay. All of the data and events, including configuration and control data and timer events, were recorded in an inbound event log that we could replay. There were no random factors or unpredictable external events – the behavior of the system could always be recreated exactly by replaying the log, making it easy to reproduce bugs from the field. It was a beautiful thing, but most code isn’t designed to support replay in this way.

Recent research in virtual machine technology has led to the development of replay tools to snapshot and replay events in a virtual machine. VMWare Workstation, for example, included a cool replay debugging facility for C/C++ programmers which was “guaranteed to have instruction-by-instruction identical behavior each time.” Unfortunately, this was an expensive thing to make work, and it was dropped in version 8, at the end of last year.

Replay Solutions provides replay for Java programs, creating a virtual machine to record the complete stream of events (including database I/O, network I/O, system calls, interrupts) as the application is running, and then later letting you simulate and replay the same events against a copy of the running system, so that you can debug the application and observe its behavior. They also offer similar application record and replay technology for mobile HTML5 and JavaScript applications. This is exciting stuff, especially for complex systems where it is difficult to setup and reproduce problems in different environments.

Fuzzing and Randomness

If the problem is non-deterministic, or you can't come up with the right set of inputs, one approach to try is to generate random data inputs and watch what happens – hoping to happen upon a set of input variables that will trigger the problem. This is called fuzzing. Fuzzing is a brute force testing technique that is used to uncover data validation weaknesses that can cause reliability and security problems. It's effective at finding bugs, but it’s a terribly inefficient way to reproduce a specific problem.

First you need to setup something to fuzz the inputs (this is easy if a program is reading from a file, or a web form – there are fuzzing tools to help with this – but a hassle if you need to write your own smart protocol fuzzer to test against internal APIs). Then you need time to run through all of the tests (with mutation fuzzing, you may need to run tens of thousands or hundreds of thousands of tests to get enough interesting combinations) and more time to sift through and review all of the test results and understand any problems that are found.
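
As a rough illustration of what a home-grown mutation fuzzer looks like, here’s a toy sketch – the seed input and the parser it pokes at are made-up stand-ins:

import java.util.Random;

// A toy mutation fuzzer along the lines described above: take a known-good
// input, flip a few random bytes, and throw it at the parser to see what
// breaks. The target parser here is a made-up stand-in.
public class MiniFuzzer {

    public static void main(String[] args) {
        byte[] seed = "AMOUNT=100;CURRENCY=USD;ACCOUNT=12345".getBytes();
        Random random = new Random(1L);   // fixed seed so failing cases can be re-run

        for (int run = 0; run < 100_000; run++) {
            byte[] mutated = seed.clone();
            int flips = 1 + random.nextInt(3);
            for (int i = 0; i < flips; i++) {
                mutated[random.nextInt(mutated.length)] = (byte) random.nextInt(256);
            }
            try {
                parse(new String(mutated));   // the code under test
            } catch (RuntimeException e) {
                System.err.printf("run %d: %s on input %s%n",
                        run, e, new String(mutated));
            }
        }
    }

    // Stand-in for the real code under test.
    static void parse(String input) {
        if (!input.contains("AMOUNT=")) {
            throw new IllegalArgumentException("missing amount field");
        }
    }
}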

Through fuzzing you will get new information about the system to help you identify problem areas in the code, and maybe find new bugs, but you may not end up any closer to fixing the problem that you started on.

Reproducing problems, especially when you are working from a bad bug report (“the system was running fine all day, then it crashed… the error said something about a null pointer I think?”) can be a serious time sink. But what if you can’t reproduce the problem at all? Let’s look at that next…

Monday, August 6, 2012

Ask the Expert - John Steven on Threat Modeling

John Steven closes out a series of interviews with appsec experts on the challenges in security threat modeling and how teams can succeed with threat modeling. You can read the interview with John here.

Thursday, August 2, 2012

Fixing Bugs – there’s no substitute for experience

We've all heard that the only way to get good at fixing bugs is through experience – the school of hard knocks. Experienced programmers aren’t afraid, because they’ve worked on hard problems before, and they know what to try when they run into another one – what’s worked for them in the past, what hasn’t, what they’ve seen other programmers try, what they learned from them.

They’ve built up their own list of bug patterns and debugging patterns, and checklists and tools and techniques to follow. They know when to try a quick-and-dirty approach and use their gut, and when to be methodical and patient and scientific. They understand how to do binary slicing to reduce the size of the problem set. They know how to read traces and dump files. And they know the language and tools that they are working with.

It takes time and experience to know where to start looking, how to narrow in on a problem; what information is useful, what isn’t and how to tell the difference. And how to do all of this fast. We’re back to knowing where to tap the hammer again.

But how much of a difference does experience really make?

Steve McConnell’s Code Complete is about programmer productivity: what makes some programmers better than others, and what all programmers can do to get better. His research shows that there can be as much as a 10x productivity difference in the quality, amount and speed of work that top programmers can do compared to programmers who don't know what they are doing.

Debugging is one of the areas that really show this difference, that separates the men from the boys and the women from the girls. Studies have found a 20-to-1 or even 25-to-1 difference in the time it takes experienced programmers to find the same set of defects found by inexperienced programmers. That’s not all. The best programmers also find significantly more defects and make far fewer mistakes when putting in fixes.

What’s more important: experience or good tools?

In Applied Software Measurement, Capers Jones looks at 4 different factors that affect the productivity of programmers finding and fixing bugs:

  1. Experience in debugging and maintenance
  2. How good – or bad – the code structure is
  3. The language and platform
  4. Whether the programmers have good code management and debugging tools – and know how to use them.

Jones measures the debugging and bug fixing ability of a programmer by measuring assignment scope - the average amount of code that one programmer can maintain in a year. He says that the average programmer can maintain somewhere around 1,000 function points per year – about 50,000 lines of Java code.

Let’s look at some of this data (assignment scope, measured in function points maintained per programmer per year) to understand how much of a difference experience makes in fixing bugs.

Inexperienced staff, poor structure, high-level-language, no maintenance tools

Worst: 150   Average: 300   Best: 500

Experienced staff, poor structure, high-level language, no maintenance tools

Worst: 1150   Average: 1850   Best: 2800

This data shows roughly a 6:1 average difference between experienced programmers and inexperienced programmers on teams working with badly structured code and without good maintenance tools – and almost 20:1 if you compare the best experienced programmers against the worst inexperienced ones. Now let’s look at the difference good tools can make:

Inexperienced staff, poor structure, high-level language, good tools

Worst: 900   Average: 1400   Best: 2100

Experienced staff, poor structure, high-level language, good tools

Worst: 2100   Average: 2800   Best: 4500

Using good tools for code navigation and refactoring, reverse engineering, profiling and debugging can help to level the playing field between novice programmers and experts.

You’d have to be an idiot to ignore your tools (debuggers are for losers? Seriously?). But even with today’s good tools, an experienced programmer will still win out – 2x more efficient on average, 5x from best to worst case.

The difference can be effectively infinite in some cases. There are some bugs that an inexperienced programmer can’t solve at all – they have no idea where to look or what to do. They just don’t understand the language or the platform or the code or the problem well enough to be of any use. And they are more likely to make things worse by introducing new bugs trying to fix something than they are to fix the bug in the first place. There’s no point in even asking them to try.

You can learn a lot about debugging from a good book like Debug It! or Code Complete. But when it comes to fixing bugs, there’s no substitute for experience.