Building Real Software: May 2011

All of us have known for a long time that code reviews find defects, and that reviews are cheaper and can be more effective than most kinds of testing. In Code Complete, Steve McConnell builds an overwhelming case for code reviews: disciplined code inspections can find between 45%-70% of all defects in code, while even fast, informal reviews can find 20%-30%. Studies at IBM, HP, Microsoft and other places show that it is several times cheaper to find bugs in code reviews than through testing. And evidence keeps coming in to support that code reviews work.

“Reviews catch more than half of a product’s defects regardless of the domain, level of maturity of the organization, or lifecycle phase during which they were applied”. What We Have Learned About Fighting Defects

Recent research into code review practices and advances in tools make reviews more effective and less expensive, and can change the way that we think of code reviews and the way that we do them.

Lightweight code reviews work

A lot of the literature on code reviews, including Code Complete, focus on formal, team-based code inspections, using a model defined by Michael Fagan at IBM in the 1970s. Heavyweight and high-ceremony, Fagan inspections typically involve a minimum of 4 people: the programmer, the reader, a moderator, and one or more reviewers. Each review meeting requires extensive preparation, lots of documentation, and can last several hours.

More people now are having success with lightweight but still disciplined code reviews; a middle ground between informal “hey pal can you take a look at this” over-the-shoulder code walkthroughs, and expensive full-on Fagan inspections. In the Modern Code Review chapter of Making Software, Jason Cohen explains that lightweight reviews can be almost as effective, and much less expensive than formal inspections. The formal review meeting in a Fagan inspection, which is expensive to run and difficult to arrange, is mostly a waste of time – 95% of defects are found before the meeting. The real purpose of the meeting is to communicate the findings and to make sure that the programmer understands them. If that’s the case, the meetings can be held in much less formal way, or not at all.

Tool-assisted reviews

Reviewers today can rely on tools to help make their work more effective and efficient. Static Analysis bug checkers like Findbugs, tools from Coverity and Klocwork, and even checkers built-in to IDEs like IntelliJ Idea make the job of code reviewers easier and more valuable. These tools catch subtle bugs and common mistakes in coding that get past compilers and that can get past reviewers too, they highlight crappy or questionable code, and can enforce style conventions and best practices in coding. This means that reviewers can focus on more fundamental issues like correctness and error handling, and safety and security issues (and these tools, and others like Fortify 360 help find security vulnerabilities too).

Formal review meetings are expensive and difficult to setup. A lot of smaller teams do reviews offline instead through email, with the occasional one-on-one meeting when the programmer or reviewer feels that it is necessary. Bigger, distributed teams can use peer review collaboration tools like Crucible or Review Board or SmartBear’s CodeCollaborator or Google Code Reviews (aka Rietveld) to share code review findings. These tools integrate with your version control system (and sometimes your bug tracker) and are especially useful if you are involving multiple reviewers in different locations and different timezones. Reviewers and the programmer can add annotations or comments directly in with the changes, cutting back on the need for review meetings.

Some teams use these tools to ensure that code reviews are done, and prove that they are being done for compliance: they can enforce workflows (that reviews are done before code is committed to the main code line) and managers (and auditors) can see the results of code reviews. New programmers can look at the review history for code that they are working on to understand more about the code and about how to do an effective review, and programmers and managers can go back over the results of reviews and find ways to improve them. Some of these tools collect code review metrics, and others, like Klocwork Inspect, integrate reviewer findings and static analysis findings into the same review space.

Another code review tool, this time to help with security reviews, is Agnitio from David Rook, the Security Ninja. Agnitio serves a different purpose: rather than helping developers collaborate on a code review, it structures and guides a reviewer through a security review, following a detailed code and design review checklist, and then records the results of each review.

Reviews find more than functional defects

Reviews can find important and fundamental design and implementation problems. This includes subtle logic problems and operational safety weaknesses that testers may not find because they are hard to test for, and reviewers have the “white box advantage” that they can see problems or omissions in the code. And code reviews are important for finding security vulnerabilities – often the only way to find vulnerabilities, except through exhaustive and expensive pen testing. This is why code reviews are a fundamental part of secure SDLC’s like Microsoft’s SDL.

But there’s even more to reviews than finding bugs and security vulnerabilities. An interesting study by Mantyla and Lassenius in 2009 “What types of defects are really discovered in code reviews?” builds on earlier research by Siy and Votta at Bell Labs in the late 1990s, to show that the majority of problems found by reviewers are not functional mistakes, but what the reviewers call evolvability defects: issues that make code harder to understand and maintain, more fragile and more difficult to modify and fix.

On average, between 60% and 75% of the defects found in code reviews fall into this class. Of these, around 1/3 are simple code clarity issues: improving element naming and comments, making sure that the code makes sense. Most of the rest of the findings are organizational problems where the code is poorly structured, or duplicate or unused code; or approach problems where the reviewer can see a much simpler and cleaner implementation, sometimes replacing hand-rolled code with built in language features or library calls. And reviewers can also find changes that don’t belong or aren’t required, copy-and-paste mistakes and inconsistencies.

These defects or recommendations feed back into refactoring and are important for future maintenance of the software, reducing complexity and making it easier to change or fix the code in the future. But it’s more than this: many of these changes also reduce the technical risk of implementation, offering simpler and safer ways to solve the problem, and isolating changes or reducing the scope of a change, which in turn will reduce the number of defects that could be found in testing or escape into the wild.

So why aren’t more people reviewing code?

A study in 2005 found that less than half of development teams are following code reviews.

WTF?

Sure, reviews are hard. One of my colleagues who spends a lot of his time reviewing code finds it exhausting, much harder than writing code himself. To do a good review, you have to spend the time to understand the code and what the change was supposed to do, and then put yourself in to the mind of the programmer and work through the solution approach that they decided on. It takes time and it takes discipline. But the return on investment is clear.

I’ll give a pass to XP-style teams that already follow disciplined pair programming – although the focus of pairing is on finding a good solution, rather than looking for problems, a lot of mistakes and structural issues will still be caught in pair programming. Other problems can be caught by a good static analysis tool. And most of these teams are also writing disciplined unit tests, probably following TDD. They’re not writing sloppy code.

Solo programmers have an obvious problem: if you’re coding on your own, who’s going to review your code? You’re limited to self-reviews: setting the code aside for a while and then reviewing it on your own. This is more effective than you would expect. Jason Cohen says that programmers reviewing their own code can find as many as half of the issues that another reviewer would find, just by taking a second look. And solo programmers can also rely on static analysis tools to find some problems.

Reviews don’t need to be a big deal, you don’t need formal review meetings. And there are tools to help make reviews cheaper, easier and more effective. So, what about the rest of you? Why aren’t you doing code reviews? What’s your excuse?

In Prerequisites for Continuous Deployment Dan Ackerson asked “What are your major obstacles to deploying continuously to your live servers”?

This must have been a rhetorical question, since my response is “awaiting moderation”. Why ask a question if you don’t want answers?

There are a number of obstacles to deploying meaningful changes continuously to live production servers, besides having a working Continuous Integration environment and following disciplined development practices.

Making interface changes and certifying them with partners. Making database schema and data changes. Security checks. Destructive testing and stress testing for performance-sensitive online transactional systems. And so on. Most of the changes that people talk about releasing continuously are trivial: minor tweaks, cosmetic fiddling or small bug fixes. Anything bigger has to be done more carefully.

Schema changes can’t be made continuously. Bigger functional changes can’t and shouldn’t be made continuously, even with dark launching. Etsy for example (one of the companies used as a poster child for Continuous Deployment), doesn’t continuously deploy bigger public-facing features. They take their time and design them and prototype them and test them and review them and plan them out with operations and customer support and product management like any sane organization. See John Allspaw’s keynote at Surge last year.

Ackerson talks about “sufficient” automated test coverage. Automated testing isn’t enough for every (any?) system or every (any?) company, certainly not automated testing at 35% coverage which is much much lower than say Jez Humble recommends in Continuous Delivery.

You also have to account for code reviews and security checks, which could be done before check-in. This is how some companies are able to achieve Continuous Deployment – they move some of the necessary and responsible steps like code reviews up before check-in, so you know the code is already pretty good.

Too many descriptions of Continuous Deployment make it sound too simple and too easy. It’s not, and even organizations that have a lot of experience with it continue to have security and reliability problems: Facebook, Wordpress, …

Yes there’s a lot to learn from Continuous Deployment, about streamlining and simplifying release and deployment, and reducing risk by breaking work down into smaller and smaller pieces and tying all of this together with ops monitoring and metrics. But it’s not the “Holy Grail of Devops”, or at least it shouldn’t be. There's a lot more to DevOps than Continuous Deployment, which is a good thing.

Building Real Software

Tuesday, May 17, 2011

Not doing Code Reviews? What’s your excuse?

Wednesday, May 4, 2011

Continuous Deployment is no Holy Grail

Monday, May 2, 2011

Checklists, software and software security

Syndications