Tuesday, February 22, 2011

Simple Software Security: are there any real solutions for small teams?

I’m heading down to the SANS AppSec 2011 conference in San Francisco March 7 and 8. I’ll be on a panel discussing the disconnect between software developers and the security community, and what steps we can take to bridge the gaps. It should be an interesting discussion; I’m looking forward to sharing ideas with people who care about how we can all build better, more secure software.

I’m also looking forward to talking to smart people about the problems that small teams face in writing secure software. I want to know more about tools, ideas, and practices that can be scaled down and used effectively by small, fast-moving development teams.

I’m concerned that too much of software security and appsec is focused on the enterprise: the big firms with the resources and a mandate for security. There aren’t enough practical, affordable, simple solutions for small teams – where most of us work today, building and maintaining a lot of the world’s software. I want to know more about what’s out there that small teams can understand, use, and rely on.

Are WAFs the answer?

Web Application Firewalls sound like the kind of solution that a small team needs. Don’t worry about fixing security problems in the application code: put in an appliance or another piece of software, set it up, and let it block attacks. But, of course, it’s not that simple. You need to find a good WAF solution, configure it, set up the right rules and exceptions, test these rules and exceptions thoroughly to make sure that the system still works with the firewall running, and then continuously update the rules as you update your software and as new problems are found. For some teams, especially teams that are not building simple, out-of-the-box web apps, and Agile teams following Continuous Delivery with frequent deployments to production, or Continuous Deployment updating production several times a day, that’s a lot of work.
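
To get a feel for what that rule-and-exception maintenance looks like, here is a toy request filter in Java (only a sketch, nowhere near a real WAF, with a made-up blacklist pattern and hypothetical exempted paths):

import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Toy "firewall": reject requests whose parameters match a crude blacklist,
// except on paths we had to exempt because the rule broke a real feature.
// Both lists have to change every time the application changes.
public class NaiveRequestFilter implements Filter {

    private static final Pattern BLACKLIST =
            Pattern.compile("(?i)(<script|--|\\bunion\\s+select\\b)");

    // Hypothetical paths that legitimately accept markup or SQL-looking text.
    private static final List<String> EXEMPT_PATHS =
            Arrays.asList("/content/editor", "/admin/reports");

    public void init(FilterConfig config) { }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        if (!EXEMPT_PATHS.contains(request.getServletPath())) {
            for (String[] values : request.getParameterMap().values()) {
                for (String value : values) {
                    if (value != null && BLACKLIST.matcher(value).find()) {
                        response.sendError(HttpServletResponse.SC_FORBIDDEN, "Request blocked");
                        return;
                    }
                }
            }
        }
        chain.doFilter(req, res);
    }

    public void destroy() { }
}

Even this toy version shows the problem: the blacklist misses real attacks, blocks some legitimate input, and every change to either list needs another round of testing.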

And WAFs add to operational cost and complexity, and there is a performance cost as well. And like a lot of the other appsec “solutions” available today, WAFs only protect you from some problems and leave others open.

Is ESAPI the answer?

OWASP’s ESAPI project promises a secure, enterprise API to handle the security functions for a web application. It’s open source, free under a BSD license, it has an active community behind it to answer questions and keep it moving ahead, and it looks like good technology – exactly the kind of solution that small teams could use. It has code to take care of most of the ugly security problems, like data validation (straightforward in concept, but the devil is in the details), encryption, and secure session management.
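
Here’s a minimal sketch of what using it looks like, assuming ESAPI 2.0’s Encoder and Validator interfaces and the stock validation rules that ship with it (the “Email” rule below is one of the defaults; treat the details as illustrative, not gospel):

import org.owasp.esapi.ESAPI;
import org.owasp.esapi.errors.ValidationException;

public class SignupHandler {

    // Validate untrusted input against a named rule from validation.properties.
    // 254 is the maximum length; false means empty input is not allowed.
    public String cleanEmail(String emailParam) throws ValidationException {
        return ESAPI.validator().getValidInput(
                "signup email", emailParam, "Email", 254, false);
    }

    // Encode untrusted data before writing it into an HTML page.
    public String safeDisplayName(String name) {
        return ESAPI.encoder().encodeForHTML(name);
    }
}

Simple enough on the page; the hard part is getting ESAPI.properties and validation.properties configured correctly, and that’s exactly where the gaps show up.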

But there’s a catch here too. ESAPI is good work, and it is one of OWASP’s flagship projects, but it is still a work in progress. A bunch of smart and committed volunteers are working away on releasing 2.0 for Java, but it’s not ready for the wide world yet. Although there are some nice slides about ESAPI and some overviews available, the programmer-level documentation is incomplete and out of date. If you have a security specialist on staff who has the understanding, time, and drive to work through the code, and who understands the problems well enough to know what to look for, you can get it to work. And big companies can pay the consultants who wrote it to help them get it to work. Small application development teams don’t have the time or money to hire consultants to make sure they understand the code and how to use it, and most of them don’t have the time or expertise to figure it out on their own.

For small teams to understand and trust ESAPI, it needs to come with clear and up-to-date documentation; simple, unequivocal guidelines and patterns on how to use it properly, because you have to get it right the first time; and tools to help test that you used it properly. For people to want to use ESAPI, it needs simpler and cleaner packaging (something that the ESAPI team is planning) and tutorials with code samples for the common scenarios, so that programmers can copy-and-paste to get started.

So ESAPI might be the answer someday, but not yet.

What else is there?

Other appsec solutions are in the “first, you need to know how to write secure code, and then we can help you check whether you got it mostly right” space.

You start by getting the team training in software security awareness, secure development, and security testing from the SANS Institute. Get the developers to use secure coding checklists and cheat sheets from OWASP to write secure code and to help guide code reviews. Then you can use tools and guidance from Microsoft’s SDL to try to build your own threat models and keep them up to date.

Then you can check your code with static source code analysis tools. But most static analysis tools for security are expensive to license, take time to get set up and tuned, and you need your best people to make time to understand what the tool is trying to say and to work through the duplicates and false positives. And these tools are not enough on their own – they only find some of the problems.
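
The kind of finding that eats up that time is easy to picture. Here is a sketch (not output from any particular tool) of code that a taint-tracking scanner will typically flag as SQL injection even though the caller has already constrained the input, next to the parameterized version that it will rightly leave alone:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class OrderQueries {

    // Typical false positive: many scanners flag this concatenation as SQL
    // injection even if the caller has already checked sortColumn against a
    // fixed whitelist of column names. Somebody still has to review it,
    // decide whether it is safe, and suppress or fix it.
    public ResultSet ordersSortedBy(Connection conn, String sortColumn) throws SQLException {
        Statement stmt = conn.createStatement();
        return stmt.executeQuery("SELECT id, total FROM orders ORDER BY " + sortColumn);
    }

    // The parameterized version that the same tools have no problem with.
    public ResultSet ordersForCustomer(Connection conn, long customerId) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(
                "SELECT id, total FROM orders WHERE customer_id = ?");
        stmt.setLong(1, customerId);
        return stmt.executeQuery();
    }
}

Multiply that review-and-decide work by a few hundred findings on an existing code base and you can see where the time goes.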

Dynamic analysis, web application vulnerability scanners… same story.

Fuzzing? To fuzz online apps, you need to write smart, protocol-aware fuzzers, which takes time and smarts. And you have to make time to set up and run the tests, and it’s a pain in the ass to work through the results and find real problems. It’s a good way to find fundamental weaknesses in data validation, but not a lot of small teams have the understanding or patience for this approach.
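
Even a dumb fuzzer takes real code. Here’s a bare-bones sketch in Java, throwing random printable junk at a single hypothetical GET parameter; it isn’t protocol-aware at all, and it still needs request plumbing, mutation logic, and result triage before it tells you anything:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Random;

public class DumbParameterFuzzer {

    // Hypothetical endpoint and parameter; point this at a test environment.
    private static final String TARGET = "http://localhost:8080/search?q=";
    private static final Random RANDOM = new Random();

    public static void main(String[] args) throws IOException {
        for (int i = 0; i < 1000; i++) {
            String payload = randomPayload();
            URL url = new URL(TARGET + URLEncoder.encode(payload, "UTF-8"));
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(5000);
            try {
                int status = conn.getResponseCode();
                // Server errors are the interesting results; everything else is
                // noise until a human digs through it.
                if (status >= 500) {
                    System.out.println("Possible problem: HTTP " + status + " for payload: " + payload);
                }
            } catch (IOException e) {
                // A dropped connection can also be a sign of a crash worth investigating.
                System.out.println("Request failed for payload: " + payload + " (" + e + ")");
            } finally {
                conn.disconnect();
            }
        }
    }

    // Random printable ASCII. A real fuzzer would mutate recorded valid requests
    // and understand the protocol and the application's state.
    private static String randomPayload() {
        int length = 1 + RANDOM.nextInt(64);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) (0x20 + RANDOM.nextInt(0x7f - 0x20)));
        }
        return sb.toString();
    }
}

Protocol-aware fuzzing of a real application, with sessions, tokens, and multi-step workflows, is a lot more work than this – which is the point.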

And if you do all this you still don’t know how secure you are (although you may know how secure you aren’t). So you need to pay for consulting help: get an expert code review, a threat model or secure architecture review, and you better schedule some pen testing too.

This is too much… and it’s not enough

It has to be simpler. It’s too hard to write secure software, too easy for even smart programmers to make bad mistakes – it’s like having a picnic in a minefield. The tools that we have today cost too much and find too little. Building secure software is expensive, inefficient, and there is no way to know when you have done enough.

There aren’t any easy answers, simple solutions. But I’m still going to look for them.

Tuesday, February 15, 2011

Zero Bug Tolerance Intolerance

It sounds good to say that you shall not and will not release code with bugs – that your team has “zero bug tolerance”. It makes for a nice sound bite. It sounds responsible. And it sounds right. But let’s look carefully at what this really means.

First, there are the logical arguments as to whether it is possible to build a perfect system of any level of complexity, and whether you can prove that a piece of nontrivial code is bug free. These are interesting questions, but I think the more important question is whether you should really try to.

I was guilty earlier in my career of pushing feature-set and schedule over quality, leaving too many bugs too late, and then having to deal with the aftermath. When I first started managing development projects a long time ago, I didn’t understand how “slowing down” to fix bugs would help get the project done faster. But I have learned, and I know (there is a difference between learning something and knowing something, really knowing it deep down) that fixing bugs helps keep costs down, and that it is possible to build good software quickly.

In Software Quality at Top Speed, Steve McConnell makes the case that short-changing design and writing bad code is stupid and will bite you in the ass, and that doing a responsible job on design and writing good code gets you to the end faster. At somewhere around the 90% defect removal rate you reach an optimal point:
“the point at which projects achieve the shortest schedules, least effort, and highest levels of user satisfaction” (Capers Jones, Applied Software Measurement: Assuring Productivity and Quality, 1991).
Most teams don’t get close to the optimal point. But aiming beyond this, towards 100% perfection, causes costs to skyrocket; you quickly reach the point of diminishing returns. Diminishing returns in the pursuit of perfect software is explored further by Andy Boothe in The Economics of Perfect Software:
“For example, imagine a program has 100 bugs, and we know it will take 100 units of effort to find and fix all 100 of those bugs. The Law of Diminishing Returns tells us that the first 40 units of effort would find the first 70 bugs, the next 30 units of effort would find the next 20 bugs, and the next 30 units of effort would find the last 10 bugs. This means that the first 70 bugs (the shallow bugs) are cheap to find and squash at only 40 / 70 = 0.571 units of work per bug (on average). The next 20 bugs (the deep bugs) are significantly more expensive at 30 / 20 = 1.5 units of effort per bug, and the final 10 bugs (the really deep bugs) are astronomically expensive at 30 / 10 = 3 units of effort per bug. The last 10 bugs are more than 5 times more time- and capital-intensive to eliminate per bug than the first 70 bugs. In terms of effort, the difference between eliminating most bugs (say 70%-90%) and all bugs is huge, to the tune of a 2x difference in effort and cost.

And in real life it’s actually worse than that. Because you don’t know when you’ve killed the last bug — there’s no countdown sign, like we had in our example — you have to keep looking for more bugs even when they’re all dead just to make sure they’re all dead. If you really want to kill all the bugs, you have to plan for that cost too.”
There's a cost to building good software

There’s a cost to put in place the necessary controls and practices, the checks and balances, and to build the team’s focus and commitment and discipline, and keep this up over time. To build the right culture, the right skills, and the right level of oversight. And there’s a cost to saying no to the customer: to cutting back on features, or asking for more time upfront, or delaying a release because of technical risks.

You also have to account for opportunity costs. Kent Beck and Martin Fowler have built their careers on writing high quality software and teaching other people how to do this. In Planning Extreme Programming they make it clear that it is important to write good software, but:
“For most software, however, we don’t actually want zero bugs. Any defect, once it is in there, takes time and effort to remove. That time and effort will take away from effort spent putting in features. So you have to decide what to do.”
That’s why I am concerned by right-sounding technical demands for zero bug tolerance. This isn’t a decision that can be made by developers or testers or project managers…or consultants. It’s bigger than all of them. It’s not just a technical decision – it’s also a business decision.

Long Tail of Bugs

Like many other problem spaces, the idea of the Long Tail also applies to bugs. There are bugs that need to be fixed and can be fixed right now. But there are other bugs that may never need to be fixed, or bugs that the customer may never see: minor bugs in parts of the system that aren’t used often, or problems that occur in unlikely configurations or unusual strings of events, or only under extreme stress testing. Bugs in code that is being rewritten anyway, or in a part of the system that is going to be decommissioned soon. Small cosmetic issues: if there was nothing better to do, sure, you would clean these up, but you do have something better to do, so you do that instead.

There are bugs that you aren’t sure are actually bugs, where you can’t agree on what the proper behavior should be. And there are bugs that can be expensive and time-consuming to track down and reproduce and fix. Race conditions that take a long time to find – and may take a redesign to really fix. Intermittent heisenbugs that disappear when you look at them, non-deterministic problems that you don’t have the information or time to fix right now. And WTF bugs that you don’t understand and don’t know how to fix yet, or where the only fix you can think of is scarier than the situation that you are already in.

Then there are bugs that you can’t fix yourself, bugs in underlying technology or third party libraries or partner systems that you have to live with or work around for now. And there are bugs that you haven’t found yet and won’t find unless you keep looking. And at some point you have to stop looking.

All of these bugs are sources of risk. But that doesn’t necessarily mean that they have to be fixed right now. As Eric Sink explains in My Life as a Code Economist, there are different questions that need to be answered to determine whether a bug should be fixed:

First there are the customer questions, basic questions about the importance of a bug to the business:
  • Severity: when this bug happens, how bad is the impact? Is it visible to a large number of customers? Could we lose data, or lose service to important customers or partners? What are the downstream effects: what other systems or partners could be impacted, and how quickly could the problem be contained or repaired? Could this violate service levels, or regulatory or compliance requirements?

  • Frequency: how often could this bug happen in production?
Then there are the developer questions, the technical questions about what needs to be done to fix the bug (there’s a rough sketch below of how these trade off against the customer questions):
  • Cost: how much work is required to reproduce, fix and test this bug, including regression testing? What about RCA: how much work should we do in digging deeper and fixing the root cause?

  • Risk: what is the technical risk of making things worse by trying to fix it? How well do I understand the code? How much refactoring do I need to do to make sure that the code, and the fix, is clear? Is the code protected by a good set of tests?
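
Here’s the rough sketch: a back-of-the-envelope comparison of the expected cost of living with a bug against the expected cost of fixing it now. The numbers and weightings are invented, and no team should follow a formula like this blindly, but it makes the trade-off concrete:

public class BugTriage {

    // Rough expected cost of leaving the bug in: how bad it is when it happens,
    // times how often we expect it to happen over the period we care about.
    // Units are whatever the team uses (days of effort, dollars, support calls).
    static double costOfNotFixing(double impactPerOccurrence, double expectedOccurrences) {
        return impactPerOccurrence * expectedOccurrences;
    }

    // Rough expected cost of fixing it now: the work to reproduce, fix, test and
    // regression-test, plus the chance the fix introduces a new defect times the
    // cost of dealing with that regression.
    static double costOfFixing(double fixEffort, double badFixProbability, double regressionCost) {
        return fixEffort + badFixProbability * regressionCost;
    }

    public static void main(String[] args) {
        // Hypothetical numbers for a minor bug in a rarely used report:
        // half a day of pain when it happens, maybe twice a year; three days to
        // fix and retest, with roughly a 7% chance of a bad fix (see Capers Jones below).
        double notFixing = costOfNotFixing(0.5, 2);
        double fixing = costOfFixing(3, 0.07, 5);

        System.out.printf("Cost of living with it: %.2f, cost of fixing it now: %.2f%n",
                notFixing, fixing);
        // Here the numbers say defer it. A data-loss bug with the same fix cost
        // would come out the other way.
    }
}

The point isn’t the arithmetic: it’s that severity and frequency are business inputs, cost and risk are technical ones, and both sides have to be in the conversation.
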
For some bugs, the decision is dead easy: simple, stupid mistakes that you or somebody else just made as part of a change or another fix, mistakes that are found right away in testing or review, and should be fixed right away. You know what to do, you don’t waste time: you fix it and you move on.

But for other bugs, especially bugs discovered in existing code, it’s sometimes not so easy. Zero bug tolerance naively assumes that it is always good, it’s always right, to fix a bug. But fixing a bug is not always the right thing to do, because with any fix you run the risk of introducing new problems:
“the bugs you know are better than introducing new bugs”
Fred Brooks first pointed out the regression problem in The Mythical Man-Month:
“…fixing a defect has a substantial (20-50%) chance of introducing another. So the whole process is two steps forward and one step back.”
The risk of introducing a bug when trying to fix another bug may have gone down since Fred Brooks’ time. In Geriatric Issues of Aging Software, Capers Jones provides some less frightening numbers:
“Roughly 7 percent of all defect repairs will contain a new defect that was not there before. For very complex and poorly structured applications, these bad fix injections have topped 20 percent”.
But the cost and risks are still real, and need to be accounted for.

Bars and Broken Windows

You, the team, and the customer all need to agree on how high to set the bar, what kind of bugs and risks can be accepted. And then you have to figure out what controls, checks, practices, tools, and skills, and how much more time, you need to consistently hit that bar, and how much it is going to cost. If you’re building safety-critical systems like the command software for the space shuttle (what are we going to use as an example when the space shuttles stop flying?), the bar is extremely high, and so of course are the costs. If you’re a social media Internet startup, short on cash and time and big on opportunity, then the bar is as low as you can afford it. For the rest of us, the bar is somewhere in between.

I get what’s behind Zero Bug Tolerance. I understand the “No Broken Windows” principle in software development: that if we let one bug through then what’s to stop the next one, and the next one and the next one after that. But it’s not that simple. No Broken Windows is about having the discipline to do a professional job. It’s about not being sloppy or careless or irresponsible. And there is nothing irresponsible about making tough and informed decisions about what can and should be fixed now.

Knowing when to stop fixing bugs, when you’ve reached the point of diminishing returns, when you should focus on more important work, isn’t easy. Knowing which bugs to fix and which ones not to, or which ones you can’t or shouldn’t fix now, isn’t easy. And you will be wrong sometimes. Some small problem that you didn’t think was important enough to look into further, some bug that you couldn’t justify the time to chase down, may come back and bite you. You’ll learn, and hopefully make better decisions in the future.

That’s real life. In real life we have to make these kinds of hard decisions all of the time. Unfortunately, we can't rely on simple, 100% answers to real problems.

Sunday, February 6, 2011

Sad State of Secure Software Maintenance

This is sad. No, it's not sad, it's sick. I'm looking for ideas and clear thinking about secure software maintenance. But I can't find anything beyond a couple of articles on Software Security in Legacy Systems by Craig Miller and Carl Weber at Cigital. I met Craig when he did some consulting work at a startup that I was running; he's a smart guy for sure. These papers offer some good advice to enterprises looking for where to start and how to get a handle on securing legacy systems and COTS packages. They are worth reading. But this is all I can find anywhere. And that's not good enough.

Most of us who make a career in software development will spend most of our careers maintaining and supporting software. If we're lucky, we will work on software that we had a hand in designing and writing; if we're not so lucky, software that we inherited from somebody else. Software that we don't understand and that we need to get control of.

Software maintenance is a risk management game. Understanding what's important to the business, trading off today's priorities against the long term view. Dealing with work that has to be done right now: what's needed for this customer, how fast that change can be done, what we need to do to fix this bug. How much we're spending, and where we can save. And making sure that we're not sacrificing tomorrow: keeping the team together, keeping them focused and motivated, helping them move forward. Keeping technical debt under control: taking on debt where it makes sense, paying it off when we can. And making sure that we're always dealing with what's important: service levels to customers, reliability, security, protecting customer data.

There's more to secure software maintenance than running static analysis checks on the code, an occasional vulnerability scan, and an application pen test. And most teams aren't even doing that much.

There aren't enough smart people taking on the problems of how to manage software maintenance properly. And there definitely aren't enough people thinking about software security and maintenance: where to start, how much to spend, why, what's important, what the next steps should be, where you get the most return. This has to change. It's too important to too many people. There's too much money being spent and wasted on doing a poor job at too many companies. There's too much at stake.