Building Real Software: Implementing Static Analysis isn't that easy

Static Analysis Testing (SAST) for software bugs and vulnerabilities should be part of your application security – and software quality – program. All that you need to do is run a tool and it will find bugs in the code, early in development when they are cheaper and easier to fix. Sounds easy.

But it takes more than just buying a tool and running a scan – or uploading code to a testing service and having them run the scans for you. You need the direct involvement and buy-in from developers, and from their managers. Because static analysis doesn't find bugs. It finds things in the code that might be bugs, and you need developers to determine what are real problems and what aren't.

This year’s SANS Institute survey on Appsec Programs and Practices which Frank Kim and I worked on found that use of static analysis ranks towards the bottom of the list of tools and practices that organizations find useful in their appsec programs.

This is because you need a real commitment from developers to make static analysis testing successful, and securing this commitment isn't easy.

You’re asking developers to take on extra work and extra costs, and to change how they do their jobs. Developers have to take time from their delivery schedules to understand and use the tools, and they need to understand how much time this is going to require. They need to be convinced that the problems found by the tools are worth taking time to look at and fix. They may need help or training to understand what the findings mean and how to fix them properly. They will need time to fix the problems and more time to test and make sure that they didn't break anything by accident. And they will need help with integrating static analysis into how they build and test software going forward.

Who Owns and Runs the Tools?

The first thing to decide is who in the organization owns static analysis testing: setting up and running the tools, reviewing and qualifying findings, and getting problems fixed.

Gary McGraw at Cigital explains that there are two basic models for owning and running static analysis tools.

In some organizations, Infosec owns and runs the tools, and then works with developers to get problems fixed (or throws the results over the wall to developers and tells them that they have a bunch of problems that need to be fixed right away). This is what McGraw calls a “Centralized Code Review Factory”. The security team can enforce consistent policies and make sure that all code is scanned regularly, and follows up to make sure that problems get fixed.

This saves developers the time and trouble of having to understand the tool and setting up and running the scans, and the Infosec team can make it even easier for developers by reviewing and qualifying the findings before passing them on (filtering out false positives and things that don’t look important). But developers don’t have control over when the scans are run, and don’t always get results when they need them. The feedback cycle may be too slow, especially for fast-moving Agile and Devops teams who rely on immediate feedback from TDD and Continuous Integration and may push out code before the scan results can even get back to them.

A more scalable approach is to make the developers directly responsible for running and using the tools. Infosec can help with setup and training, but it’s up to the developers to figure out how they are going to use the tools and what they are going to fix and when. In a “Self Service” model like this, the focus is on fitting static analysis into the flow of development, so that it doesn't get in the way of developers’ thinking and problem solving. This might mean adding automated scanning into Continuous Integration and Continuous Delivery toolchains, or integrating static analysis directly into developers’ IDEs to help them catch problems immediately as they are coding (if this is offered with the tool).

Disciplined development and Devops teams who are already relying to automated developer testing and other quality practices shouldn't find this difficult – as long as the tools are set up correctly from the start so that they see value in what the tools find.

Getting Developers to use Static Analysis

There are a few simple patterns for adopting static analysis testing that we've used, or that I have seen in other organizations, patterns that can be followed on their own or in combinations, depending on how much software you have already written, how much time you have to get the tools in, and how big your organization is

Drop In, Tune Out, Triage

Start with a pilot, on an important app where bugs really matter, and that developers are working on today. The pilot could be done by the security team (if they have the skills) or consultants or even the vendor with some help from development; or you could make it a special project for a smart, senior developer who understands the code, convince them that this is important and that you need their help, give them some training if they need it and assistance from the vendor, and get them to run a spike – a week or two should be enough to get started.

The point of this mini-project should be to make sure that the tool is installed and setup properly (integrate it into the build, make sure that it is covering the right code), understand how it provides feedback, make sure that you got the right tool, and then make it practical for developers to use. Don’t accept how the tool runs by default. Run a scan, see how long it takes to run, review the findings and focus on cutting the false positives and other noise down to a minimum. Although vendors continue to improve the speed and accuracy of static analysis tools, most static analysis tools err on the side of caution by pointing out as many potential problems as possible in order to minimize the chance of false negatives (missing a real bug). Which means a lot of noise to wade through and a lot of wasted time.

If you start using SAST early in a project, this might not be too bad. But it can be a serious drag on people’s time if you are working with an existing code base: depending on the language, architecture, coding style (or lack of), the size of the code base and its age, you could end up with hundreds or thousands of warnings when you run a static analysis scan. Gary McGraw calls this the “red screen of death” – a long list of problems that developers didn't know that they had in their code yesterday, and are now told that they have to take care of today.

Not every static analysis finding needs to be fixed, or even looked at in detail. It’s important to figure out what’s real, what’s important, and what’s not, and cut the list of findings down to a manageable list of problems that are worth developers looking into and maybe fixing. Each application will require this same kind of review, and the the approach to setup and tuning may be different.

A good way to reduce false positives and unimportant noise is by looking at the checkers that throw off the most findings – if you’re getting hundreds or thousands of the same kind of warning, it’s less likely to be a serious problem (let’s hope) than an inaccurate checker that is throwing off too many false positives or unimportant lint-like nitpicky complaints that can safely be ignored for now. It is expensive and a poor use of time and money to review all of these findings – sample them, see if any of them make sense, get the developer to use their judgement and decide whether to filter them out. Turn off any rules that aren't important or useful, knowing that you may need to come back and review this later. You are making important trade off decisions here – trade-offs that the tool vendor couldn't or wouldn't make for you. By turning off rules or checkers you may be leaving some bugs or security holes in the system. But if you don’t get the list down to real and important problems, you run the risk of losing the development team’s cooperation altogether.

Put most of your attention on what the tool considers serious problems. Every tool (that I've seen anyway) has a weighting or rating system on what it finds, a way to identify problems that are high risk and a confidence rating on what findings are valid. Obviously high-risk, high-confidence findings are where you should spend most of your time reviewing and the problems that probably need to be fixed first. You may not understand them all right away, why the tool is telling you that something is wrong or how to fix it correctly. But you know where to start.

Cherry Picking

Another kind of spike that you can run is to pick low hanging fruit. Ask a smart developer or a small team of developers to review the results and start looking for (and fixing) real bugs. Bugs that make sense to the developer, bugs in code that they have worked on or can understand without too much trouble, bugs that they know how to fix and are worth fixing. This should be easy if you've done a good job of setting up the tool and tuning upfront.

Look for different bugs, not just one kind of bug. See how clearly the tool explains what is wrong and how to correct it. Pick a handful and fix them, make sure that you can fix things safely, and test to make sure that the fixes are correct and you didn't break anything by accident. Then look for some more bugs, and as the developers get used to working with the tool, do some more tuning and customization.

Invest enough time to for the developers to build some confidence that the tool is worth using, and to get an idea of how expensive it will be to work with going forward. By letting them decide what bugs to fix, you not only deliver some real value upfront and get some bugs fixed, but you also help to secure development buy-in: “see, this thing actually works!” And you will get an idea of how much it will cost to use. If it took this long for some of your best developers to understand and fix some obvious bugs, expect it to take longer for the rest of the team to understand and fix the rest of the problems. You can use this data to build up estimates of end-to-end costs, and for later trade-off decisions on what problems are or aren't worth fixing.

Bug Extermination

Another way to get started with static analysis is to decide to exterminate one kind of bug in an application, or across a portfolio. Pick the “Bug of the Month”, like SQL injection – a high risk, high return problem. Take some time to make sure everyone understands the problem, why it needs to be fixed, how to test for it. Then isolate the findings that relate to this problem, figure out what work is required to fix and test and deploy the fixes, and “get er done”.

This helps to get people focused and establish momentum. The development work and testing work is simpler and lower risk because everyone is working on the same kind of problem, and everyone can learn how to take care of it properly. It creates a chance to educate everyone on how to deal with important kinds of bugs or security vulnerabilities, patch them up and hopefully stop them from occurring in the future.

Fix Forward

Reviewing and fixing static analysis findings in code that is already working may not be worth it, unless you are having serious reliability or security problems in production or need to meet some compliance requirement. And as with any change, you run the risk of introducing new problems while trying to fix old ones, making things worse instead of better. This is especially the case for code quality findings. Bill Pugh, the father of Findbugs, did some research at Google which found that

“many static warnings in working systems do not actually manifest as program failures.”

It can be much less expensive and much easier to convince developers to focus only on reviewing and fixing static analysis findings in new code or code that they are changing, and leave the rest of the findings behind, at least to start.

Get the team to implement a Zero Bug Tolerance program or some other kind of agreement within the development team to review and cleanup as many new findings from static scans as soon as they are found – make it part of their “Definition of Done”. At Intuit, they call this “No New Defects”.

Whatever problems the tools find should be easy to understand and cheap to fix (because developers are working on the code now, they should know it well enough to fix it) and cheap to test – this is code that needs to be tested anyway. If you are running scans often enough, there should only be a small number of problems or warnings to deal with at a time. Which means it won’t cost a lot to fix the bugs, and it won’t take much time – if the feedback loop is short enough and the guidance from the tool is clear enough on what’s wrong and why, developers should be able to review and fix every issue that is found, not just the most serious ones. And after developers run into the same problems a few times, they will learn to avoid them and stop making the same mistakes, improving how they write code.

To do this you need to be able to differentiate between existing (stale) findings and new (fresh) issues introduced with the latest check-in. Most tools have a way to do this, and some, like Grammatech's CodeSonar are specifically optimized to do incremental analysis.

This is where fast feedback and a Self-Service approach can be especially effective. Instead of waiting for somebody else to run a scan and pass on the results or running ad hoc scans, try to get the results back to the people working on the code as quickly as possible. If developers can’t get direct feedback in the IDE (you’re running scans overnight, or on some other less frequent schedule instead), there are different ways to work with the results. You could feed static analysis findings directly into a bug tracker. Or into the team’s online code review process and tools (like they do at Google) so that developers and reviewers can see the code, review comments and static analysis warnings at the same time. Or you could get someone (a security specialist or a developer) to police the results daily, prioritize them and either fix the code themselves or pass on bugs or serious warnings to whoever is working on that piece of code (depending on your Code Ownership model). It should only take a few minutes each morning – often no time at all, since nothing may have been picked up in the nightly scans.

Fixing forward gets you started quicker, and you don’t need to justify a separate project or even a spike to get going – it becomes just another part of how developers write and test code, another feedback loop like running unit tests. But it means that you leave behind some – maybe a lot of – unfinished business.

Come Back and Clean House

Whatever approach you take upfront – ignoring what’s there and just fixing forward, or cherry picking, or exterminating one type of bug – you will have a backlog of findings that still should be reviewed and that could include real bugs which should be fixed, especially security vulnerabilities in old code. Research on “The Honeymoon Effect” shows that there can be serious security risks in leaving vulnerabilities in old code unfixed, because this gives attackers more time to find them and exploit them.

But there are advantages to waiting until later to review and fix legacy bugs, until the team has had a chance to work with the tool and understand it better, and until they have confidence in their ability to understand and fix problems safely. You need to decide what to do with these old findings. You could mark them and keep them in the tool’s database. Or you could export them or re-enter them (at least the serious ones) into your defect tracking system.

Then schedule another spike: get a senior developer, or a few developers, to review the remaining findings, drop the false positives, and fix, or put together a plan to fix, the problems that are left. This should be a lot easier and less expensive, and safer, now that the team knows how the tool works, what the findings mean, what findings aren't bugs, what bugs are easy to fix, what bugs aren't worth fixing and what bugs they should be careful with (where there may be a high chance of introducing regression bug by trying to make the tool happy). This is also the time to revisit any early tuning decisions that you made, see if it is worthwhile to turn some checkers or rules back on.

Act and Think for the Long Term

Don’t treat static analysis testing like pen testing or some other security review or quality review. Putting in static analysis might start with a software security team (if your organization is big enough to have one and they have the necessary skills) or some consultants, but your goal has to be more than just handing off a long list of tool findings to a development lead or project manager.

You want to get those bugs fixed – the real ones at least. But more importantly, you want to make static analysis testing an integral part of how developers think and work going forward, whenever they are changing or fixing code, or whenever they are starting a new project. You want developers to learn from using the tools, from the feedback and guidance that the tools offer, to write better, safer and more secure code from the beginning.

In “Putting the Tools to Work: How to Succeed with Source Code Analysis” Pravir Chandra, Brian Chess and John Steven (three people who know a lot about the problem) list five keys to successfully adopting static analysis testing:

Start small – start with a pilot, learn, get some success, then build out from there.
Go for the throat – rather than trying to stomp out every possible problem, pick the handful of things that go wrong most often and go after them first. You’ll get a big impact from a small investment.
Appoint a champion – find developers who know about the system, who are respected and who care, sell them on using the tool, get them on your side and put them in charge.
Measure the outcome – monitor results, see what bugs are being fixed, how fast, which bugs are coming up, where people need help.
Make it your own – customize the tools, write your own application-specific rules. Most organizations don’t get this far, but at least take time early to tune the tools to make them efficient and easy for developers to use, and to filter out as much noise as soon as possible.

Realize that all of this is going to take time, and patience. Be practical. Be flexible. Work incrementally. Plan and work for the long term. Help people to learn and change. Make it something that developers will want to use because they know it will help them do a better job. Then you've done your job.

Building Real Software

Thursday, March 20, 2014

Implementing Static Analysis isn't that easy