Thursday, September 19, 2013

Code Reviews Change Over Time

We've been doing code reviews for about 4 years now.

Getting Started with Code Reviews

From the start, developers would help each other out, look at code when someone asked, or sometimes a lead or a senior developer would step in and review code if we were seeing problems in testing or if someone had just joined the team and we anticipated that they needed some extra oversight. We also hired an expert to come in and do a security code review.

After we launched the system, we decided to be proactive and started risk-based reviews: we asked anyone writing code in high-risk areas (like framework and security plumbing, APIs, core business logic, or areas where we had seen problems before) to get a code review. We could have stopped there, and got a lot of value out of code reviews. But we decided to keep going, and made code reviews a standard practice.

This didn't happen overnight. It wasn't difficult to convince the team that code reviews could help – they’d already seen good results from the risk-based reviews that we had done. But it was difficult to change the way that people worked, and to make sure that they had enough time to do reviews properly: time to schedule and carry out the reviews, and time to understand and act on the feedback. And it also took a while to come up with an effective code review process.

At first we asked developers to pick a buddy and arrange a review. The results were mixed. Sometimes a developer would shop around for someone nice or someone really busy who they figured would let them off lightly; or two developers would trade off (I’ll show you mine if you’ll show me yours) so they could get the reviews over with as quickly as possible. Because people didn't know how much time they needed to get reviews completed, reviews were often left until too late, and done after the code was already tested and released.

And because most people didn't have a lot of experience doing reviews, they weren't sure what they should look for, and how to give meaningful feedback. Developers were frustrated (he told me to do it this way, but on another change she told me that I should do it that way) and sometimes upset by negative criticism.

Eventually we decided that the leads would do most of the reviews. Although this added a lot more to the leads’ workload, and meant that they couldn't do as much coding themselves, it helped in a few ways. The lead developers usually had a much better understanding of the requirements and what the code was supposed to do than another developer who happened to be available, which meant that they had a better chance of finding real mistakes. And because the same person was doing most of their reviews, developers received consistent feedback.

How we do Code Reviews

How we do reviews has stayed pretty much the same over the years.

Non-trivial code changes are reviewed (either before or after the check-in) regardless of who wrote the code or what the code does. We don’t hold formal review meetings with a projector or printouts, and we haven’t found it necessary to use a review tool like Code Collaborator or Crucible or the internal tools that Google or Facebook use to manage and track reviews, although, looking back, adopting a tool like this early on might have helped the team start off better.

Sometimes reviews are done face-to-face, but most of the time they’re done offline: the reviewer and developer exchanging information and maybe patch files by email, because we've found this to be more convenient and easier for everyone to schedule (although people will meet to go through the review in detail if they decide it is necessary).

What has changed over time is what the reviewers look for, and what they find.

Reviews for Correctness

It seems like programmers spend more time arguing over what to look for in code reviews than they actually spend doing code reviews.

We started doing code reviews to improve code quality and to find problems that we weren't catching in testing. Not as a way of teaching inexperienced developers how to write better code, or a way to spread knowledge about how the code works across the team. These might be happy side effects, but reviews should be about making sure that the code works right.

Instead of using long checklists, reviewers started off by asking a small set of questions when they looked at code:

  1. Does the code do what it is supposed to, and does it look like the logic is correct?

    Are there any obvious coding mistakes or code that looks fishy? Is there anything obviously missing or incomplete? What has to be fixed to make sure that the code will work correctly?

    Reviewing for correctness is hard work – a reviewer has to spend some time understanding the requirements, and more time thinking about the code and trying to get into the head of the person who wrote it.

    But even without a lot of time or familiarity, a good reviewer can still see glaring logical mistakes and oversights, inconsistencies (you changed a, b, and d, but you didn't change c. Was this on purpose?), common coding mixups (looking out for < instead of <= or sometimes > in comparisons, off-by-one errors, using the wrong variables in calculations or comparisons – buyer instead of seller, inviter instead of invitee), debugging code left on by accident, redundant code and checks that don’t seem to be needed and other suspect code that doesn't look right or is confusing.

    Reviewers can also look out for basic security issues (access control rules/roles being applied properly, correct handling of sensitive data), backwards compatibility, and proper use of APIs and language features (especially for people new to the team).

    And while we don’t ask reviewers to spend a lot of time reviewing automated tests, they may also do spot checks to look for holes in the test suite – especially if they find an obvious coding mistake (which means that some tests are either missing or wrong).

  2. Does this code look the same and work the same as the rest of the code base?

    Does it follow the style and conventions that the team has agreed to (indentation, headers, capitalization and naming conventions, import sequence…)? Does it use standard templates and auto-formatting rules? We’re not talking about nitpicky perfection here, but we don’t want every piece of code to look different either.

    Architectural conventions. Does it use the framework and common libraries properly? Does it properly follow the big patterns (MVC etc) that we want people to understand and use?

    We wanted developers to work out issues around consistency and style early on while the team was coming together and everyone was still learning and exploring. This turned out to take longer and cause more trouble than we expected. Not only is it tedious to check for, but style can also be a near religious thing for some people. A developer won’t argue over whether something is a bug or not, at least not after they've understood what’s wrong and why. And most developers appreciate learning how to use the framework or language more effectively, as long as this is done in a polite and respectful way. But some people get seriously wound up over aesthetics and style, what kind of code looks cleaner or is more elegant.
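The kinds of coding mixups listed under the first question are easiest to see in code. A hypothetical sketch (the function names and the business rule are invented for illustration):

```python
# Hypothetical order-limit check, showing two mixups a reviewer would flag:
# a wrong comparison operator and the wrong variable in a calculation.

def can_trade_buggy(quantity, limit):
    # Off-by-one: an order exactly at the limit should be allowed,
    # but < rejects it.
    return quantity < limit

def can_trade(quantity, limit):
    # Corrected: orders up to and including the limit are allowed.
    return quantity <= limit

def buyer_total_buggy(price, buyer_fee, seller_fee):
    # Wrong variable: the buyer is charged the seller's fee by mistake.
    return price + seller_fee  # should be buyer_fee

def buyer_total(price, buyer_fee, seller_fee):
    # Corrected: the buyer pays the buyer's fee.
    return price + buyer_fee
```

Tools catch some of this (unused parameters, suspicious comparisons), but it usually takes a reviewer who understands the requirements to notice that the boundary or the variable is the wrong one.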

Reviewers could offer other feedback of course, advice and suggestions on how to simplify the solution and improve the code, but we left it up to the developer when or if to act on this advice, as long as the code was correct and followed conventions.

Reviewing for Understandability

Over time, what you look for and what you get out of code reviews should change. Because the code has changed. And the people doing the work – the developers and the reviewers – have changed too.

It got easier for reviewers to find bugs and bad code as developers learned more about how to use the code checkers in their IDEs and after we found some good static analysis tools to automatically check for common coding bugs and bad coding practices. This meant that reviewers could spend their time looking for more important design mistakes, or timing errors or sequence accounting errors or concurrency bugs like race conditions and potential deadlocks, and other problems that the tools can’t find.
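A race condition of the kind that tools usually miss can be sketched like this (a hypothetical shared counter; the class and names are invented for illustration):

```python
import threading

class Counter:
    """Hypothetical shared counter, to illustrate a race condition
    that static analysis tools typically can't find."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment_racy(self):
        # Read-modify-write without a lock: two threads can both read the
        # same value, and one update is lost. Nothing here looks wrong to
        # a checker; it takes a reviewer who knows this gets called from
        # multiple threads to spot it.
        self.value = self.value + 1

    def increment_safe(self):
        # Holding the lock makes the read-modify-write atomic.
        with self._lock:
            self.value = self.value + 1
```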

As the team gained more experience supporting the system in production, troubleshooting and fixing problems and working with Ops, reviewers started to look more closely at defensive programming and run-time safety: error and exception handling (especially exception conditions that are difficult to test for), data validation and limits checking, timeouts and retries, properly enforcing API contracts, and logging and alerting. Making sure that the system “wouldn't boink”.
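The retry-and-recover pattern reviewers look for can be sketched in a few lines (a minimal example; the attempt count, delay, and error type are assumptions for illustration, not our actual settings):

```python
import time

def call_with_retries(operation, attempts=3, delay_seconds=0.1):
    # Retry a flaky call a bounded number of times, then fail with a
    # clear error instead of hanging, or dying with an obscure one.
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as error:  # e.g. a transient network problem
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(
        f"operation failed after {attempts} attempts") from last_error
```

The point a reviewer checks is that the failure path is bounded and explicit: a limited number of retries, and a final error that tells Ops what actually went wrong.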

Because we worked out most of the consistency issues early, it stopped being necessary to review for consistency after a while. There was lots of code, so it was natural and easy for developers to build on what was already there, and to keep following the existing style and patterns.

But this also meant that there was a lot more code that had to be read and understood. So code reviewers became more concerned with anything that made the code harder to read and understand. Although fussing over details like method and variable names and whether comments are needed or make sense might not seem that important, all of this feeds back into correctness – reviewers can’t tell if the code is correct if they aren't sure that they really understand it.

Reviewers also started looking out more for unnecessary code duplication because clones create more opportunities for mistakes when you have to make changes, and for code that hasn't been updated to use new libraries or APIs, code that isn't needed any more (especially a problem if you use feature switches and don’t get around to removing them), or refactoring that didn't get finished or that got out of control.

Understanding, not Criticizing

Reviews are about understanding the code, and making sure that it works right, not criticizing it.

It’s important to stay out of arguments about “what I think good code is vs. what you think good code is”. A review should never run along the lines of “I know what good OO code is supposed to look like, and apparently you don’t, so let me show you.”

A review should only be about “I have to be able to understand this code well enough to make sure that it works, and in case I have to fix something in this code myself, even though I wouldn't have written it this way and wish that you hadn't (but I’ll keep this part to myself).”

When reviewers step across this line, when people’s pride and sense of professional self-worth are involved, things can get ugly fast.

Besides avoiding arguments, it’s a waste of time giving feedback and suggestions that nobody is going to act on because they’re too busy to get to it or they don’t agree or it isn't necessary. If you want to get good value out of reviews, keep them focused on what’s important.

The main questions that reviewers try to answer now when they look at code are:

  1. Does this code do what it is supposed to do? And will it keep working even if something goes wrong?

    Reviewers keep looking for the same kinds of obvious logic and functional problems as they did before. But they also ask: What happens if it runs into an error? Will Ops be able to tell what the problem was? What happens if some other part of the system that it relies on fails? Will it retry and recover, will it fail gracefully, or will it go boom? What’s the chance that this code could get garbage passed to it, and what would happen if it did? And can it pass garbage to some other part of the system?

  2. Can I be sure that this code does what it is supposed to do? Can I understand the code? Can I follow the changes that were made?

    Is the code maintainable? Could I fix it in production if something went wrong? Is there any code that is too cute or too clever for its own good? Are there any obvious, safe and simple things that can be done to make the code simpler and easier to understand and easier to change?
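The "garbage passed to it" question usually comes down to validation at the boundary. A minimal sketch (the field name and the limit are invented for illustration):

```python
MAX_QUANTITY = 1_000_000  # assumed business limit, for illustration

def parse_order_quantity(raw):
    # Reject garbage early, with an error Ops can act on, instead of
    # letting a bad value propagate into some other part of the system.
    try:
        quantity = int(raw)
    except (TypeError, ValueError):
        raise ValueError(f"quantity must be an integer, got {raw!r}")
    if not 0 < quantity <= MAX_QUANTITY:
        raise ValueError(
            f"quantity {quantity} out of range 1..{MAX_QUANTITY}")
    return quantity
```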

Some things get better with time, other things don’t

By looking at changes to code bases over time, Michael Feathers has found that most code is rarely or never changed once it is written; most changes are made over and over to the same small percentage of the code base.

Because reviewers keep seeing the same code being changed, it changes how they work:

  1. Reviewers who have seen the code before don’t have to spend as much time figuring out what’s going on; they understand the context and have a better chance of understanding the changes and whether they were done correctly.

    Looking at the same code again, reviewers may see legacy problems that they didn't notice before. And they should be able to offer more valuable, less superficial feedback on how the code can be simplified and improved.

    There’s a trade-off being made here too of course. Reviewers can also get stale looking at the same code, and stop looking closely enough. And the reviewer’s biases and blind spots will get amplified over time.

  2. As reviewers work with the same developers over and over, they will get to know each other better. Reviewers will learn what to look out for, what the developer’s strengths and weaknesses are, the developer’s tendencies and biases and blind spots, the kinds of mistakes and oversights that they tend to make.

    As developers get to know the reviewers better, feedback will get shorter and simpler, and there will be fewer misunderstandings.

    And because developers will know more about what to expect from a review, they will start to check their own code more carefully, and try to find the problems before the reviewer does. This is what Jason Cohen at SmartBear calls “the Ego Effect”: developers write better code because they know that somebody is going to be looking at it closely, especially if it’s someone they respect. Some people feel that this is the most important reason to do code reviews – to make developers work more carefully, so they don’t look foolish in front of somebody else; and to think about writing code that other people will read.

Avoiding Diminishing Returns

Like any practice in software development, you will eventually see diminishing returns from code reviews, especially if you keep doing the same things, looking for the same problems in the same way. You may even reach a point where code reviews aren't worth the cost or time.

This hasn't happened to us. We continue to get a lot of value out of reviews, maybe even more now.

Even as our tools have improved and as we've strengthened our testing capability we keep doing reviews because bug checking tools, reviews and testing find different problems and different kinds of problems. We've also found that reviews make testing more effective and efficient, because there’s less back-and-forth between developers and testers over mistakes that are caught and corrected earlier.

And now that reviews are an accepted part of how people work, we don’t have to spend time selling and re-selling the value of code reviews to the team or to the business. As long as code reviews are being taken seriously, and as long as you care about writing good code, they’re worth doing.


Brian Keith Blain said...

My former team was successful using Crucible for code reviews. Before formal reviews began the project already had in place code conventions configured into Eclipse, Sonar executed daily from Jenkins, and IA scanning with Fortify. Having these tools in place allowed the team to focus on code design over grammar and minor bugs.

A review was performed by 2 other randomly chosen developers for every commit (this was a 5-person team). I felt this provided a few benefits over another team that reviewed all code at the end of the sprint, and that having 2 reviewers increased the likelihood of those benefits.

- The code is fresh. Developers could explain their decisions to reviewers in detail.

- The commit is small. Understanding the changes was easier for reviewers, and a commit could be reviewed in 5 to 10 minutes, which meant roughly 30 minutes a day. If a commit wasn't small, it only took a few 1hr+ reviews for reviewers to complain.

- Developers shared knowledge about the code base and were synchronized with what the other developers were working on. If code was incidentally being duplicated it was caught early. Bus factor is reduced and velocity increased even if someone got sick or went on holiday. Quality and skill across the entire team increased.

- Having 2 reviewers meant there was a better chance of catching code smells earlier. Not all reviewers look for the same problems or may overlook problems.

Great article! Keep it up.

Alex Collins said...

Great stuff!
