Thursday, June 21, 2012
Sooner or Later: Deliver Early or Minimize Waste
There’s an obvious but important tension in Lean/Agile development around when to make decisions: between the fundamental Agile position that we should do the most important and most risky work first, and the Lean argument that we should make decisions at the last possible moment. We need to decide early and try things out, iterating to minimize risk and time to market; but we should also put off decisions and avoid iterating to keep costs down.
Later is better than sooner
Lean Development says that in order to avoid waste, we should make decisions as late as possible, especially when we think the stakes may be high. The later you can make a decision, the more information you will have on hand to inform it, and the more options you will have to choose from. This means that you should be able to make better-quality decisions.
Deferred decision making is built into incremental/iterative planning and management. You don’t try to anticipate everything up front. You don’t try to work out all of the details and dependencies and resourcing at the beginning. Instead, you sketch out an outline of a plan with all of the critical dependencies and milestones that you know about, then focus on what’s in front of you as you move ahead. Detailed decisions about scheduling and implementation and re-prioritization are made when you are ready to do the work, just-in-time.
Deciding later costs less
In Agile Estimating and Planning, Mike Cohn says that
“The best way to reduce the cost of change is to implement a feature as late as possible – effectively when there is no more time for change”.
Because you are putting off work until later, there’s less chance that new information will come up and force you to change something that you’ve already decided on. Procrastination pays – or, at least, it saves some time and money.
You don’t want to put a lot of work into something until you know as much as you can about what you need to do, how you need to do it, or even if you really do need to do it. Build a prototype or run a technical spike first, build a first-of and maybe a second-of something and get the customer using it, get feedback and make sure you have a good idea of what you are doing before you do more typing. This is good common sense.
And there is other work that you can put off because you don’t have to worry about it – you have to do it, but you know what you need to do and you know that you can do it. You might as well wait until you have fleshed out more of the system first so that you can get it done quickly.
When should you decide sooner rather than later?
The kind of work that you can afford to put off safely is mostly obvious. And the kind of work that you can’t put off until later is also obvious, like figuring out the basic architecture, the platforms and languages that you are going to use, the basic approach that you are going to follow, proving out your core ideas. These are decisions that you need to make early on, while there is still time to learn, and while there is still time to change your mind and start again if you have to.
That’s why I was surprised by the example that Mike Cohn uses in his book:
“Adding support for internationalization today may take four weeks of effort; adding it in six months may take six weeks. So we should add it now, right? Maybe. Suppose we spend four weeks and do it now. Over the next six months, we may spend an additional three weeks changing the original implementation based on knowledge gained during that six months. In that case, we would have been better off waiting. Or what if we spend four weeks now and later discover that a simpler and faster implementation would have been adequate?”
But internationalization is a cross-cutting concern. This kind of work is often much harder to take care of than you want it to be or think it will be, impacting more of the design and code than you expect. Do you really want to wait six months before you find out that you have to change all of the forms and reports that you have developed to handle left-to-right (LTR) and right-to-left (RTL) languages simultaneously, or to properly lay out Japanese and Chinese characters? I’ve spent several weeks working with internationalization experts just trying to understand and solve problems with rendering honorific marks in Arabic names (a small thing, but a deal-killer if you are working with Middle Eastern royalty).
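To make this concrete, here is a minimal sketch (in Python, with made-up message strings and a hypothetical notification function) of one way the change spreads: any user-facing text built by concatenating fragments in English word order has to be reworked, while externalized full-sentence templates with named placeholders at least leave translators room to reorder words for other languages and reading directions.

```python
# A minimal sketch (made-up strings and locales) of one reason that
# internationalization is cross-cutting: every place that builds
# user-facing text is affected.

# Fragile: hard-codes English word order and left-to-right reading,
# so every call site like this has to be found and changed later.
def notification_hardcoded(name: str, unread: int) -> str:
    return "Hello " + name + ", you have " + str(unread) + " new messages"

# Less painful to retrofit: full-sentence templates, externalized per
# locale, with named placeholders so translators can reorder words.
# An Arabic or Hebrew entry would also need bidi-aware layout in the UI.
MESSAGES = {
    "en": "Hello {name}, you have {unread} new messages",
    "fr": "Bonjour {name}, vous avez {unread} nouveaux messages",
}

def notification(locale: str, name: str, unread: int) -> str:
    return MESSAGES[locale].format(name=name, unread=unread)

print(notification("fr", "Amal", 3))
```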
Other cross-cutting concerns may require much more fundamental and more expensive changes to implement later, changes that are hard to get right and hard to test: data validation, security, monitoring and instrumentation, caching, logging, error handling, persistence. Understanding this upfront and supporting it from the start is much less risky and less costly than trying to “add it in later”.
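As a sketch of what “cross-cutting” means in code (Python, with a hypothetical place_order function): a concern like instrumentation or error handling can live in one decorator if it is designed in from the start; bolting the same behaviour onto hundreds of existing call sites later is where the expense and the testing risk come in.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orders")  # hypothetical subsystem name

def instrumented(func):
    """Cross-cutting concern: log calls, time them, and record failures
    in one place instead of scattering this code across every feature."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            log.exception("%s failed", func.__name__)
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            log.info("%s took %.1f ms", func.__name__, elapsed_ms)
    return wrapper

@instrumented
def place_order(customer_id: int, amount: float) -> str:
    # hypothetical business logic
    return f"order for customer {customer_id}: ${amount:.2f}"

print(place_order(42, 99.95))
```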
When in doubt, sooner is better than later
Deciding to do something early means that you are probably going to make mistakes and you will have to throw some work away, and you’re going to have to make changes later when you get more information. All of this is waste.
When it comes to planning out details, putting off decisions to just-in-time makes sense. But when it comes to designing and implementing important parts of the system, especially technical plumbing and cross-cutting work, doing something sooner is almost always better than later. Managing risk upfront will save you more money in the end.
Thursday, June 14, 2012
Where do Security Requirements come from?
One of the problems in building a secure application is that it’s not always clear what the security requirements are and where they are supposed to come from. Are security requirements supposed to come from the customer? Are they specified in the regulatory and compliance environment? Or are they implicit in the type of application that you are building – an online web store, real-time industrial control, a payroll system, a military weapons command system, an enterprise mobile app – and in the platform and technologies that you are using to build the system?
The answer is: all of the above.
In a recent post on how to manage security requirements in Agile projects, Rohit Sethi offers a useful way of looking at security requirements for any kind of software development work, Agile or not. He breaks security requirements down into two types:
Security Stories
A need to do, or not do, something in the system, which can be expressed in a story or a change request like other features. There’s no one way to get these requirements. Some requirements will come directly from the customer or operations or infosec or compliance or a partner, stated explicitly – if not always completely. You need to look harder for other security requirements, take time to understand the compliance and regulatory and legal requirements for the business and wade through the legalese to understand what your obligations are – both in the design of the system, and in how you design and build and operate it.
Other requirements come from the basic patterns of the system that you are building. Any team developing an online web store or an enterprise mobile app should have a good understanding of the basic security problems that they may need to take care of – different kinds of systems have well-understood general requirements for authentication, authorization, auditing and logging, data privacy and confidentiality.
Security stories are tracked and managed and prioritized like the rest of your development work – the trick is to make sure that they are taken seriously by the team and by the customer, but at least they are visible to everyone on the project.
Technical Security Constraints
But there are other requirements that aren’t clear to the business and other stakeholders – they’re hidden from everyone outside of the development team, and sometimes hidden from people on the development team as well. These are the things that developers have to do, or not do, to design and build software in a secure way, given the architecture and the platform and technologies that they are using. Unlike requirements, which are part of the problem space, these constraints are part of the solution space – part of the job of developing software properly.
Some technical constraints are obvious to good, experienced developers who know how to write good, safe software: defensive programming, logging and auditing, not hard-coding passwords or storing secrets in the clear, using prepared statements for relational database access, and so on.
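For the prepared-statement point, a minimal sketch using Python’s built-in sqlite3 module (the table and data are made up): binding parameters keeps untrusted input out of the SQL text, where string formatting would open the door to SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

user_input = "alice' OR '1'='1"  # hostile input from a form field

# Dangerous: the input becomes part of the SQL text itself (SQL injection):
#   conn.execute(f"SELECT id, name FROM users WHERE name = '{user_input}'")

# Safe: a parameterized (prepared) statement treats the input purely as data.
rows = conn.execute(
    "SELECT id, name FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # [] - the hostile string matches no real user name
```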
Other technical constraints are much more obscure and specialised – developers may not understand or even be aware of these constraints unless they have training in secure software development and a good technical knowledge of the platform and language(s) and frameworks that they are working with – how to use them properly, and where their security weak points are.
Understanding Security requirements is part of your job as a Developer
In Beautiful Security: Forcing Firms to Focus, Jim Routh explains that software security, like software quality, is generally implied in requirements for an application. In the same way that we expect a burger to be served hot and fresh, not stale and at room temperature, nobody should have to explicitly state that the software has to work, that you can’t deliver junk that is full of bugs and isn’t reliable, or code that isn’t secure.
“… clearly, the omission of security from explicit requirements is no reason to believe that customers don’t care about it, or don’t understand its importance.”
It’s up to us as developers to understand the business problems that we’re trying to solve and what our customers and partners need from a system – including understanding and confirming their security requirements. It’s up to us as developers to understand what kind of system we are building, and what the security and reliability and safety requirements for that kind of system should be. It’s up to us as developers to understand the platform and tools that we’re using, and how to use them properly. It’s up to us to find and understand security requirements and security constraints for a system. We can’t depend on other people to tell us these things, to make all of the steps explicit. This is our job, it’s what we get paid for.
Friday, June 8, 2012
Are Agile plans Better because they are Feature-Based?
In Agile Estimating and Planning, Mike Cohn quotes Jim Highsmith on why Agile projects are better:
“One of the things I keep telling people is that agile planning is "better" planning because it utilizes features (stories, etc.) rather than tasks. It is easy to plan an entire project using standard tasks without really understanding the product being built. When planning by feature, the team has a much better understanding of the product.”
In the original post on a Yahoo mailing group, Highsmith also says
“Sometimes key people in the agile movement have exaggerated at times to get noticed, I've done it myself at times--gone a little overboard to make a point. People then take this up and push it too far.”
This is clearly one of those times.
Activity-based Planning vs. Feature-based Planning
The argument runs like this. Activity-based plans described in a WBS and Gantt charts are built up from “standardized tasks”. These tasks or activities are not directly tied to the features that the customer wants or needs – they just describe the technical work that the development team needs to do, work that doesn’t make sense to the other stakeholders. According to Highsmith, you can build a plan like this without understanding what the software that you are building is actually supposed to do, and without understanding what is important to the customer.
An Agile plan, working from a feature backlog, is better because it “forces the team to think about the product at the right level – the features”.
I don't think that I have worked on a software development project, planned any which way, where we didn’t think about and plan out the features that the customer wanted, or where we didn’t track and manage the activities needed to design, build and deliver software with these features, including the behind-the-scenes engineering work and heavy lifting: defining the architecture, setting up the development and build and test environments and tools, evaluating and implementing or building frameworks and libraries and other plumbing, defining APIs and taking care of integration with other systems and databases, security work and performance work and operations work and system and integration testing, and especially dealing with outside dependencies.
Some of this work won’t make sense to the customer. It’s not the kind of work that is captured in a feature list. But that doesn’t mean that you should pretend that it doesn’t need to be done and it doesn’t mean that you shouldn’t track it in your plans. Good project planning makes explicit the features that the customer cares about and when they will be worked on, and the important technical work that needs to get done. It has to reflect how the team thinks and works.
Activity-Based Planning is so Wrong in so very many Ways
In one of the first chapters, “Why Planning Fails”, Cohn enumerates the weaknesses of activity-based planning. First, most activity-based planners don’t bother to prioritize the work that needs to be done by what the customer wants or needs. This is because they assume that everything in the scope needs to be, and will be, done. So activities are scheduled in a way that is convenient for the development team. Which means that when the team inevitably realizes that they are over budget and won’t hit their schedule, they’ll have to cut features that are important to the customer – more important than the work that they’ve already wasted time working on.
Maybe. But there’s nothing stopping teams using activity-based planning from sequencing the work by customer priority and by technical dependencies and technical risk – which is what all teams, including Agile teams, have to do. This is how teams work when they follow a Spiral lifecycle, and teams that deliver work in incremental releases using Phased/Staged Delivery, or teams that Design and Build to Schedule, making sure that they get the high-priority work done early in order to hit a hard deadline. All of these are well-known software project planning and delivery approaches which are described in Steve McConnell’s Rapid Development and other books. Everyone that I know who delivers projects in a “traditional, plan-driven” way follows one of these methods, because they know that a pure, naïve, plan-everything-upfront serial Waterfall model doesn’t work in the real world. So we can stop pretending otherwise.
Another criticism of activity-based planning is that it isn’t possible to accurately and efficiently define all of the work and all of the detailed dependencies for a software development project far in advance. Of course it isn’t. This is what Rolling Wave planning is for – lay out the major project milestones and phases and dependencies, and plan the next release or next few months/weeks/whatever in detail as you move forward. Although Cohn does a good job of explaining Rolling Wave planning in the context of Agile projects, it’s been a generally-recognized good planning practice for any kind of project for a long time now.
Agile plans aren’t better because they are Feature-Based
These, and the other arguments against activity-based planning in this book, are examples of a tired rhetorical technique that Glen Alleman describes perfectly as: “Tell a story of someone doing dumb things on purpose and then give an example of how to correct the outcome using an agile method”.
Sure, a lot of Waterfall projects are badly run. And, yeah, sure, an Agile project has a better chance of succeeding than a poorly-planned, badly-managed, serial Waterfall project. But it’s not because Agile planning is feature-based or because activity-based planning is wrong. People can do dumb things no matter what approach they follow. The real power in Agile planning is in explicitly recognizing change and continuously managing uncertainty and risk through short iterations. Fortunately, that’s what the rest of Cohn’s book is about.
Tuesday, June 5, 2012
Agile Estimating: Story Points and Decay
I’m re-reading Mike Cohn’s Agile Estimating and Planning. It's the best book I've found on this and worth reading, even if he gets too Scrummy at times, and even if you don’t agree with everything he says. Which I don’t.
For example, I don’t agree with him that Story Points are better for estimation than Ideal Days. When we do estimates, we use Ideal Days, because thinking this way is more natural and more useful, and because these estimates are more meaningful to people inside and outside of the team.
Estimates in Ideal Days Decay
One of the reasons that Cohn recommends Story Points is that, unlike estimates in Ideal Days, Story Point estimates “have a longer shelf life” because they don’t decay – the meanings of the estimates don’t change over time as the team gets more information.
If you’re estimating using Ideal Days, then as you learn more about the tools and language and platform, and as you have more code to build on and work with, your understanding of how much work you can do in an Ideal Day will change. You may be able to get more work done – or less, because it turns out that you need to spend more time testing or whatever. As you learn and get better at how you work, your definition of an “Ideal Day” will change. You’re learning and getting more information, so why not use this information and make better-quality decisions as you go forward?
But this means the estimates that you did earlier aren’t as useful any more – they’ve decayed. You’ll have to re-estimate this work at some point, because you’re mixing before-the-fact and after-the-fact information.
At some point you may have to do this anyway, whether you use Ideal Days or Story Points, because relative-size estimates in Story Points also decay over time.
Story Points Decay too
Estimates in Ideal Days change as you learn more about the solution space: about how much work it actually takes to build something. Story Point estimates change as you learn more about the problem space: as you learn more about the domain and the problems that you need to solve, as you learn more about how big something really is in comparison to other things.
Take Cohn’s “Zoo Points” exercise, where people attempt to estimate the relative size of different wild animals. Everyone knows that a blue whale is much bigger than a rabbit or a fox. But if you’ve never seen a blue whale in real life, you have no real idea just how much bigger it is. 10x? 100x? 500x? 1000x? What about a tarsier or an okapi – some of the rarest animals in the world? You’re unlikely to ever see one. How the heck could you know how big they are relative to a rabbit, or to each other? You don’t know, so you’ll need to make a reasonable guess and use that for the time being.
Later, as you learn more about the problem space, especially if it’s a deeply technical domain, your understanding of size and complexity and risk will improve, and could change a lot.
So your idea of whether a thing is twice as big as something else, or about 1/5 as big as another thing, whether it's worth 5 points or 15 points, will change as you understand more about what these things actually are. As you continue to estimate the changes and new requirements that come in, your new estimates won’t align with earlier ones. The kind of work that you thought was 5 points earlier, you now know is 15 points. Unlike an Ideal Day, you haven’t changed what “5 points” means, but you have changed your understanding of what work is actually worth “5 points”.
On a project that’s run for a long time, people will join the team bringing new knowledge and experience, and people will leave taking knowledge and experience with them. Which means that estimates that you made earlier, before you learned something, or when you knew something that you’ve now lost, can't be the same as the estimates that you're making now, regardless of whether you’re sizing work in days or points. It’s unavoidable – you can’t not use what you have learned, and you can’t keep using what you’ve lost.
There may be reasons that Story Points are a better choice for estimating, for some teams at least, but “decay” isn’t one of them.
Wednesday, May 23, 2012
The pursuit of protection: How much testing is “enough”?
I’m definitely not a testing expert. I’m a manager who wants to know when the software that we are building is finished, safe and ready to ship.
Large-scale enterprise systems – the kinds of systems that I work on – are inherently hard to test. They have lots of rules and exceptions and lots of interfaces and lots of customization for different customers and partners, and lots of operational dependencies, and they deal with lots of data. We can’t test everything – there are tens of thousands or hundreds of thousands of different scenarios and different paths to follow.
This gets both easier and harder if you are working in Agile methods, building and releasing small pieces of work at a time. Most changes or new features are easy enough to understand and test by themselves. The bigger problem is in understanding the impact of each change on the rest of the system that has already been built – what side-effects the change may have, what might have broken. This gets harder if a change is introduced in small steps over several releases, so that some parts are incomplete or even invisible to the test team for a while.
People who write flight control software or medical device controllers need to do exhaustive testing, but the rest of us can’t afford to, and there are clearly diminishing returns. So if you can’t or aren’t going to “test everything”, how do you know when you’re done testing? One answer is that you’re done testing when you run out of time to do any more testing. But that’s not good enough.
You’re done testing when your testers say they’re done
Another answer is that you’re done when the test team says they’re done. When all of the static analysis findings have been reviewed and corrected. When all of the automated tests pass. When the testers have made sure that all features that are supposed to be complete were completed and secure, finished their test checklists, made sure that the software is usable and checked for fit-and-finish, tested for performance and stability, made sure that the deployment and rollback steps work, and completed enough exploratory testing that they’ve stopped finding interesting bugs, and the bugs that they have found (the important ones at least) have all been fixed and re-checked.
This of course assumes that they tested the right things – that they understood the business requirements and priorities, and found most of the interesting and important bugs in the system. But how do you know that they’ve done a good job?
What a lot of testers do is black box testing, which comes in two different forms:
- Scripted functional and acceptance testing, manual and automated – how good the testing is depends on how complete and clear the requirements are (which is a challenge for small Agile teams working through informal requirements that keep changing), and how much time the testers have to plan out and run their tests.
- Unscripted behavioural or exploratory manual testing – depends on the experience and skill of the tester, and on their familiarity with the system and their understanding of the domain.
With black box testing, you have to trust in the capabilities and care of the people doing the testing work. Even if they have taken a structured, methodical approach to defining and running tests they are still going to miss something. The question is – what, and how much?
Using Code Coverage
To know when you’ve tested enough, you have to stop testing in the dark. You have to look inside the code, using white box structural testing techniques to understand what code has been tested, and then look closer at the code to figure out how to test the code that wasn’t.
A study at Microsoft over 5 years involving thousands of testers found that with scripted, structured functional testing, testers could cover as much as 83% of the code. With exploratory testing they could raise this a few percentage points, to as high as 86%. Then, by looking at code coverage and walking through what was tested and what wasn’t, they were able to come up with tests that brought coverage up above 90%.
Using code coverage this way – instrumenting the code under test, then looking into the code to review and improve the tests that have already been written and to figure out what new tests to write – requires testers and developers to work together even more closely.
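One way to do this kind of instrumentation, sketched here for a Python codebase using the coverage.py library (the myapp package name and the tests/ directory are placeholders):

```python
# A minimal sketch of measuring coverage programmatically with coverage.py
# (pip install coverage). Most teams run the equivalent from the command
# line or their build server instead.
import unittest

import coverage

cov = coverage.Coverage(source=["myapp"])  # "myapp" is a placeholder package
cov.start()

suite = unittest.TestLoader().discover("tests")  # placeholder tests/ directory
unittest.TextTestRunner(verbosity=1).run(suite)

cov.stop()
cov.save()

# show_missing lists the untested lines - the starting point for testers and
# developers to walk through together and decide which gaps are worth closing
cov.report(show_missing=True)
```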
How much code coverage is enough?
If you’re measuring code coverage, the question that comes up is how much coverage is enough?
“What percentage of your code should be covered before you can ship? 100%? 90%? 80%? You will find a lot of different numbers in the literature and I have yet to find solid evidence showing that any given number is better than another.” – Cedric Beust, Breaking Away from the Unit Test Group Think
In Continuous Delivery, Jez Humble and David Farley set 80% coverage as a target for each of automated unit testing, functional testing and acceptance testing. Based on their experience, this should provide comprehensive testing.
Some TDD and XP advocates argue for 100% automated test coverage, which is a target to aim for if you are starting off from scratch and want to maintain high standards, especially for smaller systems. But 100% is unnecessarily expensive, and it’s a hopeless target for a large legacy system that doesn’t have extensive automated tests already in place. You’ll reach a point of diminishing returns as you continue to add tests, where each test costs more to write and finds less. The more tests that you write, the more tests will be bad tests – duplicate tests that seem to test different things but don’t, tests that don’t test anything important (even if they help make the code coverage numbers look a little better), tests that don’t work but look like they do. All of these tests, good or bad, need to run continuously, need to be maintained, and get in the way of making changes. The costs keep going up. How many shops can afford to achieve this level of coverage, and sustain it over a long period of time, or even want to?
Making Code Coverage Work for You
On the team that I manage now, we rely on automated unit and functional testing at around 70% (statement) coverage – higher in high-risk areas, lower in others. Obviously, automated coverage is also higher in areas that are easier to test with automated tools. We hit this level of coverage more than 3 years ago and it has held steady since then. There hasn’t been a good reason to push it higher – it gives us enough of a safety net for developers to make most changes safely, and it frees the test team up to focus on risks and exceptions.
Of course with the other kinds of testing that we do, manual functional testing and exploratory testing and multi-player war games, semi-automated integration testing and performance testing, and operational system testing, coverage in the end is much higher than 70% for each release. We’ve instrumented some of our manual testing work, to see what code we are covering in our smoke tests and integration testing and exploratory testing work, but it hasn’t been practical so far to instrument all of the testing to get a final sum in a release.
Defect Density, Defect Seeding and Capture/Recapture – Does anybody really do this?
In a 1997 IEEE Software “Best Practices” column, Steve McConnell talks about using statistical defect data to understand when you have done enough testing.
The first approach is to use Defect Density data (# of defects per KLOC or some other common definition of size) from previous releases of the system, or even other systems that you have worked on. Add up how many defects were found in testing (assuming that you track this data – some Lean/Agile teams don’t, we do) and how many were found in production. Then measure the size of the change set for each of these releases to calculate the defect density. Do the same for the release that you are working on now, and compare the results. Assuming that your development approach hasn’t changed significantly, you should be able to predict how many more bugs still need to be found and fixed. The more data, of course, the better your predictions.
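A back-of-the-envelope version of the calculation, with made-up numbers:

```python
# Hypothetical defect-density projection along the lines McConnell describes.
# All of the numbers below are invented for illustration.

past_releases = [
    # (defects found in testing + production, size of the change set in KLOC)
    (120, 40),
    (95, 30),
    (150, 55),
]

# Historical defects per KLOC across past releases
density = sum(defects for defects, _ in past_releases) / sum(
    kloc for _, kloc in past_releases
)

current_kloc = 25    # size of the change set in the current release
found_so_far = 60    # defects found in testing so far

expected_total = density * current_kloc
still_to_find = max(0.0, expected_total - found_so_far)

print(f"historical density: {density:.2f} defects/KLOC")
print(f"expected this release: {expected_total:.0f}, still to find: {still_to_find:.0f}")
```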
Defect Seeding, also known as bebugging, is where someone inserts bugs on purpose, and then you see how many of these bugs are found by other people in reviews and testing. The percentage of the known [seeded] bugs not found gives an indication of how many real bugs remain. Apparently some teams at IBM, HP and Motorola have used Defect Seeding, and it must come up a lot in interviews for software testing labs (Google “What is Defect Seeding?”), but it doesn’t look like a practical or safe way to estimate test coverage. First, you need to know that you’ve seeded the “right” kind of bugs, across enough of the code to be representative – you have to be good at making bugs on purpose, which isn’t as easy as it sounds. If you do a Mickey Mouse job of seeding the defects and make them too easy to find, you will get a false sense of confidence in your reviews and testing – if the team finds most or all of the seeded bugs, that doesn’t mean that they’ve found most or all of the real bugs. Bugs tend to be simple and obvious, or subtle and hard to find, and bugs tend to cluster in code that was badly designed or badly written, so the seeded bugs need to somehow represent this. And I don’t like the idea of putting bugs into code on purpose. As McConnell points out, you have to be careful in removing the seeded bugs, and then do still more testing to make sure that you didn’t break anything.
And finally, there is Capture/Re-Capture, an approach used to estimate wildlife populations (catch and tag fish in a lake, then see how many of the tagged fish you catch again later), which Watts Humphrey introduced to software engineering as part of TSP to estimate remaining defects from the results of testing or reviews. According to Michael Howard, this approach is sometimes used at Microsoft for security code reviews, so let’s explore this context. You have two reviewers. Both review the same code for the same kinds of problems. Add up the number of problems found by the first reviewer (A), the number found by the second reviewer (B), and separately count the common problems that both reviewers found, where they overlap (C). The total number of estimated defects: A*B/C. The total number of defects found: A+B-C. The total number of defects remaining: A*B/C – (A+B-C).
Using Michael Howard’s example, if Reviewer A found 10 problems, and Reviewer B found 12 problems, and 4 of these problems were found by both reviewers in common, the total number of estimated defects is 10*12/4=30. The total number of defects found so far: 18. So there are 12 more defects still to be found.
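The arithmetic is simple enough to script. A small sketch that reproduces Howard’s example:

```python
def capture_recapture(found_by_a: int, found_by_b: int, found_by_both: int) -> dict:
    """Estimate total and remaining defects from two independent reviews
    of the same code (Lincoln-Petersen style capture/recapture)."""
    estimated_total = found_by_a * found_by_b / found_by_both
    found_so_far = found_by_a + found_by_b - found_by_both
    return {
        "estimated_total": estimated_total,
        "found_so_far": found_so_far,
        "estimated_remaining": estimated_total - found_so_far,
    }

# Michael Howard's example: A finds 10 problems, B finds 12, 4 in common
print(capture_recapture(10, 12, 4))
# {'estimated_total': 30.0, 'found_so_far': 18, 'estimated_remaining': 12.0}
```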
I’m not a statistician either, so this seems like magic to me, and to others. But like the other statistical techniques, I don’t see it scaling down effectively. You need enough people doing enough work over enough time to get useful stats. It works better for large teams working Waterfall-style, with a long test-and-fix cycle before release. With a small number of people working in small, incremental batches, you get too much variability – a good reviewer or tester could find most or all of the problems that the other reviewers or testers found. But this doesn’t mean that you’ve found all of the bugs in the system.
Your testing is good enough until a problem shows that it is not good enough
In the end, as Martin Fowler points out, you won’t really know if your testing was good enough until you see what happens in production:
The reason, of course, why people focus on coverage numbers is because they want to know if they are testing enough. Certainly low coverage numbers, say below half, are a sign of trouble. But high numbers don't necessarily mean much, and lead to “ignorance-promoting dashboards”. Sufficiency of testing is a much more complicated attribute than coverage can answer. I would say you are doing enough testing if the following is true:
- You rarely get bugs that escape into production, and
- You are rarely hesitant to change some code for fear it will cause production bugs.
Test everything that you can afford to. Release the code. When problems happen in production, fix them, then use Root Cause Analysis to find out why they happened and to figure out how you’re going to prevent problems in the future, how to improve the code and how to improve the way you write it and how you test it. Keep learning and keep going.