Building Real Software: The Value of Static Analysis Tools

Thursday, June 25, 2009

The Value of Static Analysis Tools

Just how effective is static analysis, and what does it protect you from?

There is a lot of attention given to static analysis tools, especially from the software security community - and some serious venture capital money being thrown at static analysis tool providers such as Coverity.

The emphasis on using static analysis tools started with Cigital's CTO Gary McGraw in his definitive book on Software Security: Building Security In. In a recent interview with Jim Manico for OWASP (Jan 2009), Dr. McGraw went so far as to say that

“My belief is that everybody should be using static analysis tools today. And if you are not using them, then basically you are negligent, and you should prepare to be sued by the army of lawyers that have already hit the beach”.

Statements like this, from a thought leader in the software security community, certainly encourage companies to spend more money on static analysis tools, and of course should help Cigital’s ownership position in leading static analysis tool provider Fortify Software, which Cigital helped to found.

You can learn more about the important role that static analysis plays in building secure systems from Brian Chess, CTO of Fortify, in his book Secure Programming with Static Analysis.

Secure software development maturity models like SAMM and BSIMM emphasize the importance of code reviews to find bugs and vulnerabilities, but especially the use of static analysis tools, and OWASP has a number of free tools and projects in this area.

Now even Gartner has become interested in the emerging static analysis market and its players – evidence that the hype is reaching, or has reached, a critical point. In Gartner’s study of what they call Static Application Security Testing (SAST) suppliers (available from Fortify), they state that

“...enterprises must adopt SAST technology and processes because the need is strategic. Enterprises should use a short-term, tactical approach to vendor selection and contract negotiation due to the relative immaturity of the market.”

Well, there you have it: whether the products are ready or not, you need to buy them.

Gartner’s analysis puts an emphasis on full-service offerings and suites: principally, I suppose, because CIOs at larger companies, who are Gartner’s customers, don’t want to spend a lot of time finding the best technology and prefer to work with broad solutions from strategic technology partners, like IBM or HP (neither of which has strong static analysis technology yet, so watch for acquisitions of the independents to fill out their security tool portfolios, as they did in the dynamic analysis space). Unfortunately, this has led vendors like Coverity to spend their time and money on filling out a larger ALM portfolio, building and buying in technology for build verification, software readiness (I still don’t understand who would use this) and architecture analysis rather than investing in their core static analysis technology. On their site, Coverity proudly references a recent story "Coverity: a new Mercury Interactive in the making?", which should make their investors happy and their customers nervous – as a former Mercury Interactive, now HP customer, I can attest that while the acquisition of Mercury by HP may have been good for HP and good for Mercury’s investors, it was not good for Mercury’s customers, at least the smaller ones.

The driver behind static analysis is its efficiency: you buy a tool, you run it, it scans thousands and thousands of lines of code and finds problems, you fix the problems, now you’re secure. Sounds good, right?

But how effective is static analysis? Do these tools find real security problems?

We have had success with static analysis tools, but it hasn’t been easy. Starting in 2006, some of our senior developers started working with FindBugs (we’re a Java shop) because it was free, it was easy to get started with, and it found some interesting, and real, problems right away. It took time to understand the tool and how it worked, some cleanup, and a fair amount of effort from a smart, diligent and senior engineer to investigate false positives and set up filters on some of the checkers. After that, we added FindBugs checking to our automated build process, and it continues to be our first line of defense in static analysis. The developers are used to checking the results of the FindBugs analysis daily, and we take all of the warnings that it reports seriously.

Later in 2006, as part of our work with Cigital to build a software security roadmap, we conducted a bake-off of static analysis tool vendors including Fortify, Coverity, Klocwork (who would probably get more business if they didn't have a cutesy name that is so hard to remember), and a leading Java development tools provider whose pre-sales support was so irredeemably pathetic that we could not get their product installed, never mind working. We did not include Ounce Labs at the time because of pricing, and because we ran out of gas, although I understand that they have a strong product.

As the NIST SAMATE study confirms, working with different tool vendors is a confusing, challenging and time-consuming process: the engines work differently, which is good since they catch different types of problems, but there is no consistency in the way that warnings are reported or rated, and different terms are used by different vendors to describe the same problem. And there is the significant problem of dealing with noise: handling the large number of false positives that get reported by all of the tools (some are better than others), and understanding what to take seriously.

At the time of our initial evaluation, some of the tools were immature, especially the C/C++ tools that were being extended into the Java code checking space (Coverity and Klocwork). Fortify was the most professional and prepared of the suppliers. However, we were not able to take advantage of Fortify’s data flow and control flow analysis (one of the tool’s most powerful analysis capabilities) because of some characteristics of our software architecture. We verified with Fortify and Cigital consultants that it was not possible to take advantage of the tool’s flow analysis, even with custom rules extensions, without fundamentally changing our code. This left us with relying on the tool’s simpler security pattern analysis checkers, which did not uncover any material vulnerabilities. We decided that with these limitations, the investment in the tool was not justified.

Coverity’s Java checkers were also limited at that time. However, by mid-2007 they had improved the coverage and accuracy of their checkers, especially for security issues and race condition checking, and generally improved the quality of their Java analysis product. We purchased a license for Coverity Prevent, and over a few months worked our way through the same process of learning the tool, reviewing and suppressing false positives, and integrating it into our build process. We also evaluated an early release of Coverity’s Dynamic Thread Analysis tool for Java: unfortunately the trial failed, as the product was not stable – however, it has potential, and we will consider looking at it again in the future when it matures.

Some of the developers use Klocwork Developer for Java, now re-branded as Klocwork Solo, on a more ad hoc basis: for small teams, the price is attractive, and it comes integrated into Eclipse.

In our build process we have other static analysis checks, including code complexity checking and other metric trend analysis using an open source tool, JavaNCSS, to help identify complex (and therefore high-risk) sections of code, and we have built proprietary package dependency analysis checks into our build to prevent violation of dependency rules. One of our senior developers has now started working with Structure101 to help us get a better understanding of our code and package structure and how it is changing over time. And other developers use PMD to help clean up code, along with the static analysis checkers included in IntelliJ.
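To make the idea of a package dependency check concrete, here is a minimal sketch in Java. It is not our proprietary check: the class name DependencyRuleCheck, the rule format (a map from a package to the packages it must not import) and the com.example packages are all assumptions made up for illustration. It only reads package and import statements from source text, so it misses dependencies expressed through fully qualified names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of an import-based dependency rule check.
class DependencyRuleCheck {

    private static final Pattern PACKAGE =
            Pattern.compile("(?m)^\\s*package\\s+([\\w.]+)\\s*;");
    private static final Pattern IMPORT =
            Pattern.compile("(?m)^\\s*import\\s+(?:static\\s+)?([\\w.]+?)(?:\\.\\*)?\\s*;");

    // Returns one message per import that breaks a rule.
    // forbidden maps a package name to the packages it must not depend on.
    static List<String> violations(String source, Map<String, List<String>> forbidden) {
        List<String> found = new ArrayList<>();
        Matcher pkg = PACKAGE.matcher(source);
        if (!pkg.find()) {
            return found; // default package: no rules apply in this sketch
        }
        String from = pkg.group(1);
        Matcher imp = IMPORT.matcher(source);
        while (imp.find()) {
            String imported = imp.group(1);
            for (Map.Entry<String, List<String>> rule : forbidden.entrySet()) {
                if (!inPackage(from, rule.getKey())) {
                    continue;
                }
                for (String banned : rule.getValue()) {
                    if (inPackage(imported, banned)) {
                        found.add(from + " must not depend on " + banned
                                + " (imports " + imported + ")");
                    }
                }
            }
        }
        return found;
    }

    // True if name is the package itself or anything nested below it.
    static boolean inPackage(String name, String pkg) {
        return name.equals(pkg) || name.startsWith(pkg + ".");
    }

    public static void main(String[] args) {
        String source = "package com.example.domain;\n"
                + "import com.example.ui.Widget;\n"
                + "import java.util.List;\n"
                + "class Order {}\n";
        Map<String, List<String>> rules = Collections.singletonMap(
                "com.example.domain", Collections.singletonList("com.example.ui"));
        for (String v : violations(source, rules)) {
            System.out.println(v);
        }
    }
}
```

A build step could run a check like this over every source file and fail the build when the returned list is not empty. A real check would more likely work from compiled class files than from source text.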

Each tool takes different approaches and has different strengths, and we have seen some benefits in using more than one tool as part of a defense-in-depth approach. While we find by far the most significant issues in manual code reviews or exploratory testing or through our automated regression safety net, the static analysis tools have been helpful in finding real problems.

While FindBugs does only “simple, shallow analysis of network security vulnerabilities”, and analysis of malicious code vulnerabilities as security checks, it is good at finding small, stupid coding mistakes that escape other checks, and the engine continues to improve over time. This open source project deserves more credit: it offers incredible value to the Java development community, and anyone building code in Java who does not take advantage of it is a fool.

Coverity generally reports few false positives, and is especially good at finding potential thread safety problems and null pointer (null return and forward null) conditions. It also comes with a good management infrastructure for trend analysis and review of findings. Klocwork is the most excitable and noisiest of all of our tools, but it includes some interesting checkers that are not available in the other tools – although after manual code reviews and checks by the other static analysis tools, there is rarely anything of significance left for it to consider.
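For readers who have not met the null pointer terms, the following Java sketch shows roughly what they mean as I understand them. The class NullCheckExamples and its data are invented for illustration and do not come from our code or from any tool's documentation. A "null return" problem dereferences a value that a method may return as null; a "forward null" problem checks a variable for null and then uses it anyway on the same path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example; the class, methods, and data are invented for illustration.
class NullCheckExamples {

    private static final Map<String, String> STATUS_BY_ACCOUNT = new HashMap<>();
    static {
        STATUS_BY_ACCOUNT.put("1001", "ACTIVE");
    }

    // Returns null for an unknown account, so callers have to check.
    static String lookupStatus(String accountId) {
        return STATUS_BY_ACCOUNT.get(accountId);
    }

    // "Null return" style mistake: the possibly-null result is dereferenced.
    static boolean isActiveUnsafe(String accountId) {
        return lookupStatus(accountId).equals("ACTIVE");
    }

    // Fix: compare from the constant side, which tolerates a null status.
    static boolean isActive(String accountId) {
        return "ACTIVE".equals(lookupStatus(accountId));
    }

    // "Forward null" style mistake: s is checked for null, then used anyway.
    static int lengthUnsafe(String s) {
        if (s == null) {
            System.err.println("warning: missing value");
        }
        return s.length();
    }

    // Fix: the null path returns before the dereference.
    static int length(String s) {
        if (s == null) {
            return 0;
        }
        return s.length();
    }

    public static void main(String[] args) {
        System.out.println(isActive("1001"));
        System.out.println(isActive("9999"));
        System.out.println(length(null));
        try {
            isActiveUnsafe("9999");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from isActiveUnsafe");
        }
    }
}
```

Both unsafe methods compile cleanly and pass any test that only uses known accounts and non-null strings, which is why this kind of mistake tends to slip past reviews and gets left for a tool to catch.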

But more than the problems that the tools find directly, the tools help to identify areas where we may need to look deeper: where the code is complex, or too smarty pants fancy, or otherwise smelly, and requires followup review. In our experience, if a mature analysis tool like FindBugs reports warnings that don’t make sense, it is often because it is confused by the code, which in turn is a sign that the code needs to be cleaned up. We have also seen the number of warnings reported decline over time as developers react to the “nanny effect” of the tools’ warnings, and change and improve their coding practices to avoid being nagged. And the final benefit of using these tools is that this frees up the developers to concentrate on higher-value work in their code reviews: they don’t have to spend so much time looking out for fussy, low-level coding mistakes, because the tools have found them already, so the developers can concentrate on more important and more fundamental issues like correctness, proper input validation and error handling, optimization, simplicity and maintainability.

While we are happy with the toolset we have in place today, I sometimes wonder whether we should beef up our tool-based code checking. But is it worth it?

In a presentation at this year’s JavaOne conference, Prof. Bill Pugh, the father of FindBugs, says that

“static analysis, at best, might catch 5-10% of your software quality problems.”

He goes on to say, however, that static analysis is 80+% effective at finding specific defects and cheaper than other techniques for catching these same defects – silent, nasty bugs and programming mistakes.

Prof. Pugh emphasizes that static analysis tools have value in a defense-in-depth strategy for quality and security, combined with other techniques; that “each technique is more efficient at finding some mistakes than others”; and that “each technique is subject to diminishing returns”.

In his opinion, “testing is far more valuable than static analysis”, and “FindBugs might be more useful as an untested code detector than a bug detector”. If FindBugs finds a bug, you have to ask: “Did anyone test that code?” In our experience, Prof. Pugh’s FindBugs findings can be applied to the other static analysis tools as well.

As technologists, we are susceptible to the belief that technology can solve our problems – the classic “silver bullet” problem. When it comes to static analysis tools, you’d be foolish not to use a tool at all, but at the same time you’d be foolish to expect too much from them – or pay too much for them.

7 comments:

dre said...: Jim, Excellent article.

One point is that JavaNCSS (and McCabe's CC) might be somewhat useful as key indicators for quality -- however, these metrics do not translate to security.

Anyone who is validating or verifying security controls in codebases would be better without these McCabe CC numbers -- and it would be better for them to focus on security patterns (or lack thereof) through UML or simple domain level class/sequence diagrams. Or, they could also look at controls missing in the frameworks themselves, and concentrate on those areas.

PMD and the Eclipse plugin "Unnecessary Code Detector" can remove lots of code, thus making security control reviews easier, but they don't add a lot of immediate or direct benefit to reduction of risk.

FindBugs, Fortify, Ounce, Coverity, Parasoft, Klocwork, Microsoft, IBM, HP (and any other competitors in the Gartner-defined SAST market) are all missing that key ALM component that you mention. This is really key to integrating SAST with anything useful long-term.

HP and Fortify announced a new product, which will be available in October 2009 -- which integrates HP QC (i.e. ALM) with SAST. Today, HP can integrate AMP/QC with QAI and DI. DI is a SAST tool, but it's not as complete as Fortify, Microsoft CAT.NET, Ounce, or Checkmarx (for managed code) and it doesn't do any of the stuff that Coverity, Klocwork, or Microsoft PREfast/PREfix (for unmanaged code) does. Additionally, for managed code, DI is only relevant for .NET and Java Enterprise, so it doesn't even work with PHP.

The reality is that without ALM (HP QC or any competitor), these tools suffer key integration and become useless to almost anyone except application penetration-testers with a 400-pound Guerrilla-PhD hacker-ninja credential behind them. Even then, DAST has been proven (by Fortify during BlackHat Iron Chef) to find about as many bugs (but different types of bugs, as Prof Pugh alludes to) in the same amount of time or faster.

Microsoft is looking at pieces of the ALM with their Microsoft SDL Process Template, based on their original MPT (Microsoft Process Template) tool. Java Enterprise could accomplish the same using JFeature. MSPT combined with CAT.NET in the MSBuild tools is all free. JFeature combined with the FindBugs Ant build tools is also free.

The primary problem with DI is also that it's stuck in the IDE, although it does integrate back to AMP, and then to the HP QC ALM. QAI integrates directly with HP AMP and QC simultaneously, but QAI is purely DAST.

Additionally, there are problems with improving the coverage of both SAST and DAST through a single tool. EMMA (a Java Enterprise coverage reporter) will work on both static and runtime code, but I've never heard of anyone using it for this purpose.

In the future, you'll see IBM, Microsoft, Atlassian, Parasoft, and potentially other ALM solutions integrate SAST/DAST more effectively.

I see "WAF technology" integrating as purely DAST in test/inspection phases, instead of in release management or operations -- and it is another missing component even in the HP+Fortify AMP/QC + QAI+DI+360 "all-in-one" solution. For example, where does Microsoft AntiXSS Security Runtime Engine (SRE) fit into the Secure ALM architecture? Or the Microsoft AntiCSRF IHttpModule or the generic IIS7 IHttpModule security feature set?

Generic Java Enterprise and PHP projects today might utilize Secure Tomcat or Mod-Security as equivalents to the above, which are also a long way off from integrating with Secure ALM. We're a long way off from integrating the basic components that fill the big picture architecture, folks.; June 25, 2009 at 2:46 PM
Anonymous said...: Another option to consider: Instantiations offers CodePro AnalytiX for static analysis (plus JUnit testing and more). It includes over 1200 customizable audit rules, with 225 security rules based on OWASP standards.

http://www.instantiations.com/codepro/download-trial.html; June 26, 2009 at 7:22 AM
Andy said...: Good article. Thanks for the comparisons. Static analysis (SA) tools can be very helpful to improve the quality and security of software development. In practice, we've found that adoption suffers -- a few reasons:
* a good chunk of the defects reported fit the "I don't care" priority. The analysis may be pointing out a potential problem but it is actually intentional. With SA tools, because the bug reports are coding flaws, it's difficult to report on the effect of the defect to prioritize it well.
* most organizations don't invest the time/resource to institutionalize SA. Arguably, the developers most resistant to the tool are often the developers who need it the most.
* SA is new. There's no standardized way to do things and there are few out-of-the-box integrations with the existing pieces of the toolchain. Without a tried and true process, many organizations have to reinvent the wheel each time - and it can take years to develop an expertise in static analysis.; June 28, 2009 at 8:51 PM
James said...: Hi Jim,

Appreciated the article, thanks.

FYI - I made a similar comment, although not as comprehensive, in our blog "Static Analysis: Sooner rather than Later?" http://www.redlizards.com/blog/?p=139

Your C/C++ readers might be interested in our new Eclipse (MSVS next month) plug-in static analysis tool "Goanna" available for FREE trial download http://www.redlizards.com/download.html; June 28, 2009 at 11:11 PM
SM said...: Jim,

Could you share some details on which architectural characteristics prevented you from taking advantage of Fortify’s data flow and control flow analysis? Were they related to specific design patterns?

Thank you for the valuable article. I'm evaluating some of these same tools and this helps a lot.

Regards.; June 30, 2009 at 7:41 PM
Jim Bird said...: SM,

The problems with data flow and control flow analysis were specific to our implementation of the Command pattern. There is some indirection (intentional) in our implementation which confused the analyzer.; July 1, 2009 at 1:43 PM
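As an illustration only, here is a hypothetical Java sketch of the kind of indirection a Command pattern can introduce. It is not our implementation, and every name in it (Command, EchoCommand, CommandDispatcher, the "payload" key) is invented. The point is that the input travels through a generic context map and the target command is chosen by a string lookup at run time, so an analyzer has to resolve both before it can connect where the data enters to where it is used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of Command-pattern indirection; not the code discussed above.
class CommandIndirectionDemo {
    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        dispatcher.register("echo", new EchoCommand());
        // The caller never names EchoCommand directly.
        System.out.println(dispatcher.dispatch("echo", "hello"));
    }
}

interface Command {
    String execute(Map<String, Object> context);
}

class EchoCommand implements Command {
    @Override
    public String execute(Map<String, Object> context) {
        // The value is used here, far from where it entered the system.
        return "echo:" + context.get("payload");
    }
}

class CommandDispatcher {
    private final Map<String, Command> registry = new HashMap<>();

    void register(String name, Command command) {
        registry.put(name, command);
    }

    String dispatch(String name, String untrustedInput) {
        // The input is hidden inside an untyped map under a string key.
        Map<String, Object> context = new HashMap<>();
        context.put("payload", untrustedInput);
        // The concrete command is only known at run time.
        Command command = registry.get(name);
        if (command == null) {
            throw new IllegalArgumentException("unknown command: " + name);
        }
        return command.execute(context);
    }
}
```

With a direct call such as new EchoCommand().execute(...), the path from input to use is visible in the call graph. With the registry and context map in between, a tool needs custom rules or a model of the dispatcher to follow the same path.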
SM said...: Jim,

Thanks for sharing that piece of knowledge. I hope you will continue to cover this subject. There's a lot of market hype around SAST tools but few useful user-perspective reviews.

Regards.; July 2, 2009 at 9:54 PM