Wednesday, August 20, 2014

Don’t waste time on Code Reviews

Less than half of development teams do code reviews and the other half are probably not getting as much out of code reviews as they should.

Here’s how to not waste time on code reviews.

Keep it Simple

Many people still think of code reviews as expensive formal code inspection meetings, with lots of prep work required before a room full of reviewers can slowly walk through the code together around a table with the help of a moderator and a secretary. Lots of hassles and delays and paperwork.

But you don’t have to do code reviews this way – and you shouldn’t.

There are several recent studies which prove that setting up and holding formal code review meetings add to development delays and costs without adding value. While it can take weeks to schedule a code review meeting, only 4% of defects are found in the meeting itself – the rest are all found by reviewers looking through code on their own.

At shops like Microsoft and Google, developers don’t attend formal code review meetings. Instead, they take advantage of collaborative code review platforms like Gerrit, CodeFlow, Collaborator, or ReviewBoard or Crucible, or use e-mail to request reviews asynchronously and to exchange information with reviewers.

These light weight reviews (done properly) are just as effective at finding problems in code as inspections, but much less expensive and much easier to schedule and manage. Which means they are done more often.

And these reviews fit much better with iterative, incremental development, providing developers with faster feedback (within a few hours or at most a couple of days, instead of weeks for formal inspections).

Keep the number of reviewers small

Some people believe that if two heads are better than one, then three heads are even better, and four heads even more better and so on…

So why not invite everyone on the team into a code review?

Answer: because it is a tragic waste of time and money.

As with any practice, you will quickly reach a point of diminishing returns as you try to get more people to look at the same code.

On average, one reviewer will find roughly half of the defects on their own. In fact, in a study at Cisco, developers who double-checked their own work found half of the defects without the help of a reviewer at all!

A second reviewer will find ½ as many new problems as the first reviewer. Beyond this point, you are wasting time and money. One study showed no difference in the number of problems found by teams of 3, 4 or 5 individuals, while another showed that 2 reviewers actually did a better job than 4.

This is partly because of overlap and redundancy – more reviewers means more people looking for and finding the same problems (and more people coming up with false positive findings that the author has to sift through). And as Geoff Crain at Atlassian explains, there is a “social loafing” problem: complacency and a false sense of security set in as you add more reviewers. Because each reviewer knows that somebody else is looking at the same code, they are under less pressure to find problems.

This is why at shops like Google and Microsoft where reviews are done successfully, the median number of reviewers is 2 (although there are times when an author may ask for more input, especially when the reviewers don’t agree with each other).

But what’s even more important than getting the right number of reviewers is getting the right people to review your code.

Code Reviews shouldn’t be done by n00bs – but they should be done for n00bs

By reviewing other people’s code a developer will get exposed to more of the code base, and learn some new ideas and tricks. But you can’t rely on new team members to learn how the system works or to really understand the coding conventions and architecture just by reviewing other developers’ code. Asking a new team member to review other people’s code is a lousy way to train people, and a lousy way to do code reviews.

Research backs up what should be obvious: the effectiveness of code reviews depend heavily on the reviewer’s skill and familiarity with the problem domain and with the code. Like other areas in software development, the differences in revew effectiveness can be huge, as much as 10x between best and worst performers. A study on code reviews at Microsoft found that reviewers from outside of the team or who were new to the team and didn’t know the code or the problem area could only do a superficial job of finding formatting issues or simple logic bugs.

This means that your best developers, team leads and technical architects will spend a lot of time reviewing code – and they should. You need reviewers who are good at reading code and good at debugging, and who know the language, framework and problem area well. They will do a much better job of finding problems, and can provide much more valuable feedback, including suggestions on how to solve the problem in a simpler or more efficient way, or how to make better use of the language and frameworks. And they can do all of this much faster.

If you want new developers to learn about the code and coding conventions and architecture, it will be much more effective to pair new developers up with an experienced team member in pair programming or pair debugging.

If you want new, inexperienced developers to do reviews (or if you have no choice), lower your expectations. Get them to review straightforward code changes (which don’t require in depth reviews), or recognize that you will need to depend a lot more on static analysis tools and another reviewer to find real problems.

Substance over Style

Reviewing code against coding standards is a sad way for a developer to spend their valuable time. Fight the religious style wars early, get everyone to use the same coding style templates in their IDEs and use a tool like Checkstyle to ensure that code is formatted consistently. Free up reviewers to focus on the things that matter: helping developers write better code, code that works correctly and that is easy to maintain.

“I’ve seen quite a few code reviews where someone commented on formatting while missing the fact that there were security issues or data model issues.”
Senior developer at Microsoft, from a study on code review practices

Correctness – make sure that the code works, look for bugs that might be hard to find in testing:

  • Functional correctness: does the code do what it is supposed to do – the reviewer needs to know the problem area, requirements and usually something about this part of the code to be effective at finding functional correctness issues
  • Coding errors: low-level coding mistakes like using <= instead of <, off-by-one errors, using the wrong variable (like mixing up lessee and lessor), copy and paste errors, leaving debugging code in by accident
  • Design mistakes: errors of omission, incorrect assumptions, messing up architectural and design patterns like MVC, abuse of trust
  • Safety and defensiveness: data validation, threading and concurrency (time of check/time of use mistakes, deadlocks and race conditions), error handling and exception handling and other corner cases
  • Malicious code: back doors or trap doors, time bombs or logic bombs
  • Security: properly enforcing security and privacy controls (authentication, access control, auditing, encryption)

Maintainability:

  • Clarity: class and method and variable naming, comments, …
  • Consistency: using common routines or language/library features instead of rolling your own, following established conventions and patterns
  • Organization: poor structure, duplicate or unused/dead code
  • Approach: areas where the reviewer can see a simpler or cleaner or more efficient implementation

Where should reviewers spend most of their time?

Research shows that reviewers find far more maintainability issues than defects (a ratio of 75:25) and spend more time on code clarity and understandability problems than correctness issues. There are a few reasons for this.

Finding bugs in code is hard. Finding bugs in someone else’s code is even harder.

In many cases, reviewers don’t know enough to find material bugs or offer meaningful insight on how to solve problems. Or they don’t have time to do a good job. So they cherry pick easy code clarity issues like poor naming or formatting inconsistencies.

But even experienced and serious reviewers can get caught up in what at first seem to be minor issues about naming or formatting, because they need to understand the code before they can find bugs, and code that is unnecessarily hard to read gets in the way and distracts them from more important issues.

This is why programmers at Microsoft will sometimes ask for 2 different reviews: a superficial “code cleanup” review from one reviewer that looks at standards and code clarity and editing issues, followed by a more in depth review to check correctness after the code has been tidied up.

Use static analysis to make reviews more efficient

Take advantage of static analysis tools upfront to make reviews more efficient. There’s no excuse not to at least use free tools like Findbugs and PMD for Java to catch common coding bugs and inconsistencies, and sloppy or messy code and dead code before submitting the code to someone else for review.

This frees the reviewer up from having to look for micro-problems and bad practices, so they can look for higher-level mistakes instead. But remember that static analysis is only a tool to help with code reviews – not a substitute. Static analysis tools can’t find functional correctness problems or design inconsistencies or errors of omission, or help you to find a better or simpler way to solve a problem.

Where’s the risk?

We try to review all code changes. But you can get most of the benefits of code reviews by following the 80:20 rule: focus reviews on high risk code, and high risk changes.

High risk code:

  • Network-facing APIs
  • Plumbing (framework code, security libraries….)
  • Critical business logic and workflows
  • Command and control and root admin functions
  • Safety-critical or performance-critical (especially real-time) sections
  • Code that handles private or sensitive data
  • Old code, code that is complex, code that has been worked on by a lot of different people, code that has had a lot of bugs in the past – error prone code
High risk changes:
  • Code written by a developer who has just joined the team
  • Big changes
  • Large-scale refactoring (redesign disguised as refactoring)

Get the most out of code reviews

Code reviews add to the cost of development, and if you don’t do them right they can destroy productivity and alienate the team. But they are also an important way to find bugs and for developers to help each other to write better code. So do them right.

Don’t waste time on meetings and moderators and paper work. Do reviews early and often. Keep the feedback loops as tight as possible.

Ask everyone to take reviews seriously – developers and reviewers. No rubber stamping, or letting each other off of the hook.

Make reviews simple, but not sloppy. Ask the reviewers to focus on what really matters: correctness issues, and things that make the code harder to understand and harder to maintain. Don’t waste time arguing about formatting or style.

Make sure that you always review high risk code and high risk changes.

Get the best people available to do the job – when it comes to reviewers, quality is much more important than quantity. Remember that code reviews are only one part of a quality program. Instead of asking more people to review code, you will get more value by putting time into design reviews or writing better testing tools or better tests. A code review is a terrible thing to waste.

Wednesday, August 6, 2014

Feature Toggles are one of the worst kinds of Technical Debt

Feature flags or config flags aka feature toggles aka flippers are an important part of Devops practices like dark launching (releasing features immediately and incrementally), A/B testing, and branching in code or branching by abstraction (so that development teams can all work together directly on the code mainline instead of creating separate feature branches).

Feature toggles can be simple Boolean switches or complex decision trees with multiple different paths. Martin Fowler differentiates between release toggles (which are used by development and ops to temporarily hide incomplete or risky features from all or part of the user base) and business toggles to control what features are available to different users (which may have a longer – even permanent – life). He suggests that these different kinds of flags should be managed separately, in different configuration files for example. But the basic idea is the same, to build conditional branches into mainline code in order to make logic available only to some users or to skip or hide logic at run-time, including code that isn’t complete (the case for branching by abstraction).

Using run-time flags like this isn't a new idea, certainly not invented at Flickr or Facebook. Using flags and conditional statements to offer different experiences to different users or to turn on code incrementally is something that many people have been practicing for a long time. And doing this in mainline code to avoid branching is in many ways a step back to the way that people built software 20+ years ago when we didn’t have reliable and easy to use code management systems.

Advantages and Problems of Feature Flags

Still, there are advantages to developers working this way, making merge problems go away, and eliminating the costs of maintaining and supporting long-lived branches. And carefully using feature flags can help you to reduce deployment risk through canary releases or other incremental release strategies, where you make the new code active for only some users or customers, or only on some systems, and closely check before releasing progressively to the rest of the user base – and turn off the new code if you run into problems. All of this makes it easier to get new code out faster for testing and feedback.

But using feature flags creates new problems of its own.

The plumbing and scaffolding logic to support branching in code becomes a nasty form of technical debt, from the moment each feature switch is introduced. Feature flags make the code more fragile and brittle, harder to test, harder to understand and maintain, harder to support, and less secure.

Feature Flags need to be Short Lived

Abhishek Tiwari does a good job of explaining feature toggles and how they should be used. He makes it clear that they should only be a temporary deployment/release management tool, and describes a disciplined lifecycle that all feature toggles need to follow, from when they are created by development, then turned on by operations, updated if any problems or feedback come up, and finally retired and removed when no longer needed.
Feature toggles require a robust engineering process, solid technical design and a mature toggle life-cycle management. Without these 3 key considerations, use of feature toggles can be counter-productive. Remember the main purpose of toggles is to perform release with minimum risk, once release is complete toggles need to be removed.

Feature Flags are Technical Debt – as soon as you add them

Like other sources of technical debt, feature flags are cheap and easy to add in the short term. But the longer that they are left in the code, the more that they will end up costing you.

Release toggles are supposed to make it easier and safer to push code out. You can push code out only to a limited number of users to start, reducing the impact of problems, or dark launch features incrementally, carefully assessing added performance costs as you turn on some of the logic behind the scenes, or run functions in parallel. And you can roll-back quickly by turning off features or optional behaviour if something goes wrong or if the system comes under too much load.

But as you add options, it can get harder to support and debug the system, keeping track of which flags are in which state in production and test can make it harder to understand and duplicate problems.

And there are dangers in releasing code that is not completely implemented, especially if you are following branching by abstraction and checking in work-in-progress code protected by a feature flag. If the scaffolding code isn't implemented correctly you could accidentally expose some of this code at run-time with unpredictable results.

…visible or not, you are still deploying code into production that you know for a fact to be buggy, untested, incomplete and quite possibly incompatible with your live data. Your if statements and configuration settings are themselves code which is subject to bugs – and furthermore can only be tested in production. They are also a lot of effort to maintain, making it all too easy to fat-finger something. Accidental exposure is a massive risk that could all too easily result in security vulnerabilities, data corruption or loss of trade secrets. Your features may not be as isolated from each other as you thought you were, and you may end up deploying bugs to your production environment”
James McKay

The support dangers of using – or misusing – feature flags was illustrated by a recent high-profile business failure at a major financial institution. The team used feature flags to contain operational risk when they introduced a new application feature. Unfortunately, they re-purposed a flag which was used by old code (code left in the system even though it hadn't been used in years).

Due to some operational mistakes in deployment, not all of the servers were successfully updated with the new code, and when the flag was turned on, old code and new code started to run on different computers at the same time doing completely different things with wildly inconsistent and, ultimately business-ending results. By the time that the team figured out what was going wrong, the company had lost millions of $.

As more flags get added, testing of the application becomes harder and more expensive, and can lead to an explosion of combinations: If a is on and b is off and c is on and d is off then… what is supposed to happen? Fowler says that you only need to test the combinations which should reasonably be expected to happen in production, but this demands that everyone involved clearly understand what options could and should be used together – as more flags get added, this gets harder to understand and verify.

And other testing needs to be done to make sure that switches can be turned on and off safely at run-time, and that features are completely and safely encapsulated by the flag settings and that behaviour doesn’t leak out by accident (especially if you are branching in code and releasing work-in-progress code). You also need to test to make sure that the structural changes to introduce the feature toggle do not introduce any regressions, all adding to testing costs and risks.

More feature flags also make it harder to understand how and where to make fixes or changes, especially when you are dealing with long-lived flags and nested options.

And using feature switches can make the system less secure, especially if you are hiding access to features in the UI. Adding a feature can make the attack surface of the application bigger, and hiding features at the UI level (for dark launching) won’t hide these features from bad guys.

Use Feature Flags with Caution

Feature flags are a convenient and flexible way to manage code, and can help you to get changes and fixes out to production more quickly. But if you are going to use flags, do so responsibly:

  • Minimize your use of feature flags for release management, and make the implementation as simple as possible. Martin Fowler explains that it is important to minimize conditional logic to the UI and to entry points in the system. He also emphasises that:
    Release toggles are a useful technique and lots of teams use them. However they should be your last choice when you're dealing with putting features into production.

    Your first choice should be to break the feature down so you can safely introduce parts of the feature into the product. The advantages of doing this are the same ones as any strategy based on small, frequent releases. You reduce the risk of things going wrong and you get valuable feedback on how users actually use the feature that will improve the enhancements you make later.
  • Review flags often, make sure that you know which flags are on and which are supposed to be on and when features are going to be removed. Create dashboards (so that everyone can easily see the configuration) and health checks – run-time assertions – to make sure that important flags are on or off as appropriate.
  • Once a feature is part of mainline, be ruthless about getting it out of the code base as soon as it isn't used or needed any more. This means carefully cleaning up the feature flags and all of the code involved, and testing again to make sure that you didn't break anything when you did this. Don’t leave code in the mainline just in case you might need it again some day. You can always go back and retrieve it from version control if you need to.
  • Recognize and account for the costs of using feature flags, especially long-lived business logic branching in code.

Feature toggles start off simple and easy. They provide you with new options to get changes out faster, and can help reduce the risk of deployment in the short term. But the costs and risks of relying on them too much can add up, especially over the longer term.