Saturday, October 17, 2009

The real cost of software security

There has been a lot of discussion in the blogosphere over the last few months on costs and ROI justifications for building secure software. Back in July, I responded to a post by Jeremiah Grossman, CTO at WhiteHat Security, which examined the end-to-end costs of software security: whether and how upfront investments in a secure SDLC mitigate downstream security costs and risks, a classic “pay me now or pay me (much more) later” problem. In my response to Jeremiah’s analysis, I tried to break out the costs of building secure software: direct, “hard” pure security costs, as opposed to indirect, “soft” supporting costs, the costs of building software properly in the first place. From a budgeting and ROI perspective, it is important to separate the costs of building software correctly in the first place, the foundational practices, from the real security costs.

An effective software security program has to rely on a foundation of software quality management practices and governance. Your quality management program can be lightweight and agile, but if you are hacking out code without care for planning, risk management, design, code reviews, testing, or incident management, then what you are trying to build is not going to be secure. Period. Coming from a software development background, I feel that the security community is trying to take on too much of the responsibility, and too much of the cost, for ensuring that these basic good practices are in place.

But it’s more than that. If you are already doing risk management, design reviews, code reviews, testing, and change and release management, then adding security requirements, perspectives, and controls to these practices can be done simply, incrementally, and at modest cost:
  1. Risk management: add security risk scenarios and threat assessments to your technical risk management program.
  2. Design reviews: add threat modeling where it applies, just as you should add failure mode analysis for reliability.
  3. Code reviews: first, you can argue, as I did earlier in my response, that a significant number of security coding problems are basic quality problems, simply a matter of writing crappy code: poor or missing input validation (a reliability problem, not just a security problem), lousy error handling (same), race conditions and deadlocks (same), and so on; see the input validation sketch after this list. If treating these kinds of mistakes as security problems (and security costs) helps get programmers to take them seriously, then go ahead, but it shouldn’t be necessary. What you are left with, then, are the costs of dealing with “hard” security coding problems, like improper use of crypto, secure APIs, authentication and session management, and so on.
  4. Static analysis: as I argued in an earlier post on the value of static analysis, static analysis checks should be part of your build anyway; see the build-gate sketch after this list. If the tools that you use, like Coverity or Klocwork, check for both quality and security mistakes, then your incremental costs are in properly configuring the tools and in understanding, and dealing with, the results for security defects.
  5. Testing: security requirements need to be tested along with other requirements, on a risk basis. Regression tests, boundary and edge tests, stress tests and soak tests, negative and destructive tests (thinking like an attacker), even fuzzing should be part of your testing practices for building robust and reliable software anyway.
  6. Change management and release management: ensure that someone responsible for security has a voice on your change advisory board. Add secure deployment checks and secure operations instructions to your release and configuration management controls.
  7. Incident management: security vulnerabilities should be tracked and managed as you would other defects. Handle security incidents as “level 1” issues in your incident management, escalation, and problem management processes.
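
To make item 3 concrete, here is a minimal sketch, in Python, of input validation and error handling done with care. The funds-transfer scenario and names are hypothetical, but the pattern holds in any language: validate untrusted input explicitly and fail loudly, because a weak check here is a reliability bug and a security bug at the same time.

```python
# Hypothetical example: parsing an untrusted "amount" field from a
# funds-transfer request. Weak checks here are a reliability problem
# and a security problem at the same time.
from decimal import Decimal, InvalidOperation

MAX_TRANSFER = Decimal("1000000.00")  # illustrative business limit

def parse_transfer_amount(raw):
    """Validate an untrusted string and return a Decimal amount,
    failing loudly instead of silently accepting garbage."""
    try:
        amount = Decimal(raw)
    except InvalidOperation:
        raise ValueError("amount is not a number: %r" % (raw,))
    # Decimal() happily parses "NaN" and "Infinity"; reject them.
    if not amount.is_finite():
        raise ValueError("amount must be a finite number")
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > MAX_TRANSFER:
        raise ValueError("amount exceeds the transfer limit")
    return amount

if __name__ == "__main__":
    for candidate in ["250.00", "-5", "NaN", "1e99", "DROP TABLE"]:
        try:
            print(candidate, "->", parse_transfer_amount(candidate))
        except ValueError as err:
            print(candidate, "-> rejected:", err)
```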
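And for item 4, here is a minimal sketch of a build step that treats high-severity static analysis findings as build breakers. The run-analyzer command and its exit-code convention are hypothetical stand-ins, not a real tool’s interface; Coverity, Klocwork, and friends each have their own command lines and reporting formats, so check your tool’s documentation. The point is only that the gate runs on every build, not just when somebody remembers to run the tool.

```python
# Sketch of a static analysis gate for a build script. "run-analyzer"
# is a hypothetical stand-in for your real tool's command line.
import subprocess
import sys

ANALYZER_CMD = ["run-analyzer", "--min-severity", "high"]

def static_analysis_gate():
    # Assumed convention: the analyzer exits with a non-zero status
    # when it finds defects at or above the requested severity.
    returncode = subprocess.call(ANALYZER_CMD)
    if returncode != 0:
        print("BUILD FAILED: static analysis found high-severity defects")
        sys.exit(returncode)
    print("Static analysis gate passed")

if __name__ == "__main__":
    static_analysis_gate()
```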
So what are the direct costs of a software security program? Looking over my own budget, these are the major cost items that I can find:
  1. Training managers, architects, developers, and testers on security concepts, issues, threats, and practices. You need to invest in training upfront, and then refresh the team: Microsoft’s SDL requires that technical staff be retrained annually, to make sure that the team is aware of changes to the threat landscape, to attack bad habits, and to reinforce good ones.
  2. As I described in an earlier post on our software security roadmap, we hired expert consultants to conduct upfront secure architecture and code reviews, to help define secure coding practices, and to work with our architects and development manager to plan out a roadmap for software security. Paying for a couple of consulting engagements was worthwhile and necessary to kickstart our software security program and to get senior technical staff and management engaged.
  3. Buying “black box” vulnerability scanning tools like IBM Rational AppScan and Nessus, and paying the costs of understanding and using these tools to check for common vulnerabilities in the application and the infrastructure.
  4. Penetration testing: hiring experts to conduct penetration tests of the application a few times per year, to check that we haven’t gotten sloppy and missed something in our internal checks, tests, and reviews, and to learn more about new attacks.
  5. Some extra management and administrative oversight of the security program, and of suppliers and partners.
The other incremental costs of building secure software, like the costs of building robust and reliable software, are now effectively burned into our SDLC: into how we plan, design, build, test, deploy and support software. I could break out the incremental cost burden of these security practices and controls, but the costs would be modest - most of the cost, and the work, is in building software properly. And by following an incremental, optimizing approach, starting small and continuously reviewing and improving, not only are the upfront costs of a software security program reduced, but the ROI is realized much faster. If you set your quality bar high enough, the real costs of secure software are surprisingly low.

Saturday, October 10, 2009

Dreaming in Code - A Failure in Leadership

Reading Scott Rosenberg’s Dreaming in Code gives you a sick feeling: the same sick feeling that you get watching a movie where the hero’s life is unraveling, or when you are involved in a project that is going nowhere fast, facing certain failure, with nothing you can do to change the outcome. I made myself read it twice – there are some hard lessons about managing software development in this book.

Dreaming in Code tells the story of the failed Chandler project started by Mitch Kapor, software industry visionary and founder of Lotus Development Corp, currently Chairman of the Mozilla Foundation. Chandler began as an ambitious, “change the world” project to design and build a radically different kind of personal information manager, an Outlook-and-Exchange-killer that would flatten silos between types of data and offer people new ways of thinking about information; provide programmers a cross-platform, extensible open source platform for new development; and create new ways to share data safely, securely, cheaply and democratically, in a peer-to-peer model.

The project started in the spring of 2001. Because of Mitch Kapor’s reputation and the project’s ambitions, he was able to assemble an impressive team of alpha geeks. Smart people led by a business visionary with a track record of success, interesting problems to solve, lots of money, and lots of time to play with new technology and chart a new course: a “dream project”.

But Dreaming in Code is a story of wasted opportunities. Scott Rosenberg, the book’s author, followed the project for three years starting in 2003, but gave up before the team had built something that worked – he had a book to publish, whether the story was finished or not. By January 2008, Mitch Kapor had called it quits and left the organization that he founded. Finally, in August 2008, the surviving team released version 1.0, scaled back to a shadow of its original goals and, of course, no longer relevant to the market.

It is interesting, but sad, to map this project against Construx’s Classic Mistakes list, to see the mistakes that were made:

Unclear project vision. Lack of project sponsorship. Requirements gold-plating. Feature creep. Research-oriented development. Developer gold-plating. Silver bullet syndrome. Insufficient planning. Adding people to a late project. Overly optimistic schedules. Wishful thinking. Unrealistic expectations. Insufficient risk management. Wasted time in the “fuzzy front end”: the team spent years wondering and wandering around, playing with technology, building tools, exploring candidate designs, trying to figure out what the requirements were - and never understood the priorities. Shortchanged quality assurance… hold on, quality was important to this project; it wasn’t going to play out like other Silicon Valley startups. Then why did they wait almost three years before hiring a test team (of one person), while facing continual problems from the start with unstable releases, unacceptable performance, and developers spending days or weeks bug hunting?

The book begins with a meeting in July 2003, when the team should be well into design and development, where the project manager announces that the team is doomed, that they cannot hope to deliver on their commitments – and not for the first time. This is met with… well, nothing. Nobody, not even Mitch Kapor, takes action or makes any decisions. It doesn’t get any better from there.

This project was going to be different: it was going to be done on a “design-first” basis. But months, even years, into the project, the team is still struggling to come up with a useful design. The team makes one attempt at time-boxed development, a single iteration, and then gives up. Senior team members leave because nothing is getting done. Volunteers stop showing up. People in the community stop downloading the software because it doesn’t work, or because it does less than earlier versions – the team is not just standing still, it is moving backwards.

The project manager quits after only a few months. Mitch Kapor takes over. A few months later, with the “fun draining out of his job”, he asks the team to redesign his job and come up with a new management structure. OK, to be fair, he is rich and doesn’t have to work hard at this, but then why start it all in the first place? Why put everyone involved, even those of us who would one day read the book, through all of this?

The new management team works through a real planning exercise for the first time, making real trade-offs and real decisions based on data. It doesn’t help. They fail to deliver – again. And again with an alpha release. They then spend another 21 months putting together a beta release, significantly cutting back on features. By the time they deliver 1.0, nobody cares.

It’s a sad story. It’s also boring at times – the team spends hours, even days, in fruitless, philosophical bull sessions; in meaningless, abstract, circular arguments; asking the same questions, confronting the same problems, facing (or avoiding) the same decisions again and again. As the book says, “it’s Groundhog Day, again”. I hope that the writer did not fully capture what happened in the design sessions and planning meetings – that, like the Survivor TV show, we only see what the camera wants us to see, or happens to catch: an incomplete picture. That these people were actually more focused, more serious, more responsible than they come across.

The project failed on so many levels. A failure to set understandable, achievable goals. A failure to understand, or even attempt to articulate, requirements. A failure to take advantage of talent. A failure to engage, to establish and maintain momentum. A failure to manage risks. A failure to learn from mistakes - their own, or others’.

Most fundamentally, it was a failure of leadership. Mitch Kapor failed to hold himself and the team accountable. He failed to provide the team with meaningful direction, failed to understand and explain what was important. He failed to create an organization where people understood what it took to deliver something real and useful, where people cared about results, and about the people who would, hopefully, someday, use what they were supposed to be building. And he gave up: he gave up well before 2008, when he left the company; he gave up almost from the start, when the hard, real work of building out his vision had actually begun.

Wednesday, October 7, 2009

A Joel Test for Software Security

Back in 2000, Joel Spolsky, software developer, entrepreneur, co-founder of Stack Overflow, and popular blogger on the business of building software, proposed a “highly irresponsible, sloppy test to rate the quality of a software team”, known as The Joel Test.

The Joel Test is a crude but effective tool for checking the maturity of a software development team, using simple, concrete questions to determine whether a team is following core best practices. The test could use a little sprucing up to reflect improvements in the state of the practice over the last 10 years, and to take into account some of the better ideas introduced with XP and Scrum. For example, “Do you make daily builds?” (question 3) should be updated to ask whether the team is following Continuous Integration. And you can argue that “Do you do hallway usability testing?” (question 12) should be replaced with a question that asks whether the team works closely and collaboratively with the customer (or a customer proxy) on requirements, product planning and prioritization, usability, and acceptance. And one of the questions should ask whether the team conducts technical (design and code) reviews (or pair programming).

A number of other people have considered how to improve and update the Joel Test. But all in all, it has proved useful and has stood the test of time. It is simple: easy to remember, easy to understand and apply; it is directly relevant to programmers and test engineers (the people who actually do the work); it is provocative; and it is fun. It makes you think about how software should be built, and about how you measure up.

How does the Joel Test work?

It consists of 12 concrete yes/no questions that tell you a lot about how a team works, how it builds software, and how disciplined it is. A yes-score of 12 is perfect (of course), 11 is tolerable, and a score of 10 or less indicates serious weaknesses. The test can be used by developers or managers to rate their own organization; by developers who are trying to decide whether to take a job at a company (ask the prospective employer the questions, to see how challenging or frustrating the job will be); or as a quick “smoke test” for due diligence.

A recent post by Thomas Ptacek of Matasano Security explores how to apply the Joel Test to network security and IT management. In the same spirit, I am proposing a “Joel Test” for software security: a simple, concrete, informal way to assess a team’s ability to build secure software. This is a thought experiment, a fun way of thinking about software security and what it takes to build secure software, following the example of the Joel Test, its principles, and its arbitrary 12-question framework. It is not, of course, an alternative to comprehensive maturity frameworks like SAMM or BSIMM, which I used as references in preparing this post, but I think a simple test like this can still provide useful information.

So, here is my attempt at the 12 questions that should be asked in a Software Security “Joel Test”:

1. Do you have clear and current security policies in place, so that developers know what they should be doing and what they should not be doing? Realistic, concrete expectations, not legalese or boilerplate. Guidelines that programmers can follow, and do follow, in building secure software.

2. Do you have someone (a person or a team) who is clearly responsible for software security? Someone who helps review design and code from a security perspective, who can coach and mentor developers and test engineers, provide guidelines and oversight, and make risk-based decisions regarding security issues. If everybody is accountable for security, then nobody is accountable for security. You need someone who acts as both coach and cop, with knowledge and authority.

3. Do you conduct threat modeling, as part of, or in addition to, your design reviews? This could be lightweight or formal, but some kind of structured security review needs to be done, especially for new interfaces and major changes.

4. Do your code reviews include checks for security and safety issues? If you have to ask, “ummm, what code reviews?”, then you have a lot of work ahead of you.

5. Do you use static analysis checking for security (as well as general quality) problems as part of your build?

6. Do you perform risk-based security testing? Does this include destructive testing, regular penetration testing by expert pen testers, and fuzz testing? (See the fuzzing sketch after these questions.)

7. Have you had an expert, security-focused review of your product’s architecture and design? To ensure that you have a secure baseline, or to catch fundamental flaws in your design that need to be corrected.

8. Do product requirements include security issues and needs? Are you, and your customers, thinking about security needs up front?

9. Does the team get regular training in secure development and defensive coding? Microsoft’s SDL recommends that team members get training in secure design, development and testing at least once per year to reinforce good practices and to stay current with changes in the threat landscape.

10. Does your team have an incident response capability for handling security incidents? Are you prepared to deal with security incidents? Do you know how to escalate, contain, and recover from security breaches, respond to security problems found outside of development, and communicate with customers and partners?

11. Do you record security issues and risks in your bug database / product backlog for tracking and follow-up? Are security issues made visible to team members for remediation?

12. Do you provide secure configuration and deployment and/or secure operations guidelines for your operations team or customers?
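
On question 6, fuzzing in particular is cheap to get started with. Here is a toy sketch of a random-input fuzz harness; parse_record is a hypothetical stand-in for whatever parsing code you want to harden, and a serious effort would use a real fuzzing tool, but even something this simple will shake out crashes and unhandled exceptions.

```python
# Toy fuzz harness: throw random strings at a parser and check that it
# only ever fails in controlled, expected ways.
import random
import string

def parse_record(data):
    # Hypothetical stand-in for the real parser under test.
    fields = data.split("|")
    if len(fields) != 3:
        raise ValueError("expected exactly 3 fields")
    return fields

def fuzz(iterations=10000, max_length=200):
    for _ in range(iterations):
        junk = "".join(random.choice(string.printable)
                       for _ in range(random.randint(0, max_length)))
        try:
            parse_record(junk)
        except ValueError:
            pass  # a controlled, documented failure is acceptable
        # Any other exception, crash, or hang escapes the harness -
        # that is a bug worth filing.

if __name__ == "__main__":
    fuzz()
    print("fuzz run completed without uncontrolled failures")
```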

These are the 12 basic, important questions that come to my mind. It would be interesting to see alternative lists, to find out what I may have missed or misunderstood.
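
For fun, here is a quick sketch that turns the list into a checklist you can score yourself against, in the spirit of the original test; the question wording is paraphrased from above. Score it the way Joel scores his: one point per “yes”, and anything much below 12 means you have work to do.

```python
# Score yourself on the 12 software security questions, one point
# per "yes", in the spirit of the original Joel Test.
QUESTIONS = [
    "Clear and current security policies?",
    "Someone clearly responsible for software security?",
    "Threat modeling as part of design reviews?",
    "Security checks in code reviews?",
    "Static analysis as part of the build?",
    "Risk-based security testing, pen testing, and fuzzing?",
    "Expert security review of architecture and design?",
    "Security in product requirements?",
    "Regular secure development training?",
    "Incident response capability?",
    "Security issues tracked in the bug database / backlog?",
    "Secure configuration, deployment, and operations guidelines?",
]

def main():
    score = 0
    for question in QUESTIONS:
        answer = input(question + " [y/n] ")
        if answer.strip().lower().startswith("y"):
            score += 1
    print("Score: %d out of %d" % (score, len(QUESTIONS)))

if __name__ == "__main__":
    main()
```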