Monday, April 29, 2013

What does Code Ownership do to Code?

In my last post, I talked about Code Ownership models, and why you might want to choose one code ownership model (strong, weak/custodial or collective) over another. Most of the arguments over code ownership focus on managing people, team dynamics, and the effects on delivery. But what about the longer-term effects on the shape, structure and quality of code – does the ownership model make a difference? What are the long-term effects of letting everyone work on the same code, or of having 1 or 2 people working on the same pieces of code for a long time?

Collective Code Ownership and Code Quality

Over time, changes tend to concentrate in certain areas of code: in core logic and in and behind interfaces (listen to Michael Feathers’ fascinating talk Discovering Startling Things from your Version Control System). This means that the longer a system has been running, the more chances there are for people to touch the same code. Some interesting research backs up what should be obvious: the people who understand the code best are the people who work on it the most, and the people who know the code best make fewer mistakes when changing it.

In Don’t Touch my Code!, researchers at Microsoft (BTW, the lead author Christian Bird is not a relative of mine, at least not one that I know of) found that as more people touch the same piece of code, there are more opportunities for misunderstandings and more mistakes. Not surprisingly, people who hadn't worked on a piece of code before made more mistakes, and as the number of developers working on the same module increased, so did the chance of introducing bugs.

Another study, Ownership and Experience in Fix-Inducing Code tries to answer which is more important in code quality: “too many cooks spoil the broth”, or “given enough eyeballs, all bugs are shallow”? Does more people working on the same code lead to more bugs, or does having more people working on the code mean that there are more chances to find bugs early? This research team found that a programmer’s specific experience with the code was the most important factor in determining code quality – code that is changed by the programmer who does most of the work on that code is of higher quality than code written by someone who doesn't normally work on the code, even if that someone is a senior developer who has worked on other parts of the code. And they found that the fewer the people working on a piece of code, the fewer the bugs that needed to be fixed.

And a study on contributions to Linux reinforces this: as the number of developers working on the same piece of code increases, the chance of bugs and security problems increases significantly – code touched by more than 9 developers is 16x more likely to have security vulnerabilities, and more vulnerabilities are introduced by developers who are making changes across many different pieces of code.

Long-term Effects of Ownership Approach on Code Structure

I've worked at shops where the same programmers have owned the same code for 3 or 4 or 5 or even 10 years or sometimes even longer. Over that time, that programmer’s biases, strengths, weaknesses and idiosyncrasies are all amplified, wearing deep grooves in the code. This can be a good thing, and a bad thing.

The good thing is that with one person making most or all of the changes, internal consistency in any piece of code will be high. You can look at a piece of code written by that developer and, once you understand their approach and way of thinking and the patterns and idioms that they prefer, everything should be familiar and easy to follow. Their style and approach might have changed over time as they learned and improved as a developer, but you can generally anticipate how the rest of the code will work, and you’ll recognize what they are good at, where their blind spots are, and what kinds of mistakes they are prone to. As I mentioned in the earlier post, this makes the code easier to review and easier to test, and so makes bugs easier to find and fix.

If a developer tends to write good, clean, tight code, and if they are diligent about refactoring and keeping the code clean and tight, then most of the code will be good, clean, tight and easy to follow. Of course it follows that if they tend to write sloppy, hard-to-understand, poorly structured code, then most of it will be sloppy, hard-to-understand and poorly structured. Then again, even this can be a good thing – at least the bad code is isolated, and you know what you have to rewrite, instead of someone spreading a little bit of badness everywhere.

When ownership changes – when the primary contributor leaves, and a new owner takes over, the structure and style of the code will change as well. Maybe not right away, because a new owner usually takes some time to get used to the code before they put their stamp on it, but at some point they’ll start adapting it – even unconsciously – to their own preferences and biases and ways of thinking, refactoring or rewriting it to suit them.

If a lot of developers have worked on the same piece of code, they will introduce different ideas, techniques and approaches over time as they each do their part, as they refactor and rewrite things according to their own ideas of what is easy to understand and what isn't, what’s right and wrong. They will each make different kinds of mistakes. Even with clear and consistent shared team conventions and standards, differences and inconsistencies can build up over time, as people leave and new people join the team, creating dissonance and making it harder to follow a thought through the code, harder to test and review, and harder to hold on to the design.

Ownership Models and Refactoring

But as Michael Feathers has found through mining version control history, there is also a positive Ownership Effect on code as more people work on the same code.

Over time, methods and classes tend to get bigger, because it’s easier to add code to an existing method than to write a new method, and easier to add another method to an existing class than to create a new class. By correlating the number of developers who have touched a piece of code with method size, Feathers’ research shows that as the number of developers working on a piece of code increases, the average method size tends to get smaller. In other words, having multiple people working on a code base encourages refactoring and simpler code, because people who aren't familiar with the code have to simplify it first in order to understand it.
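
If you want to try the same kind of version control mining on your own code base, the first half of it is easy to sketch out. The rough Java program below (my own example, not Feathers' tooling) counts the distinct authors who have committed to each Java file in a git repository; correlating those counts with average method size is the follow-up step, and needs a source parser or at least a crude lines-per-method proxy.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class AuthorCounts {

        public static void main(String[] args) throws Exception {
            String repo = args.length > 0 ? args[0] : ".";
            // For every tracked Java file, count the distinct commit authors.
            for (String file : run(repo, "git", "ls-files", "*.java")) {
                Set<String> authors = new HashSet<>(run(repo, "git", "log", "--format=%an", "--", file));
                System.out.printf("%3d authors  %s%n", authors.size(), file);
            }
        }

        // Run a command in the repository directory and return its output lines.
        private static List<String> run(String dir, String... cmd) throws Exception {
            Process p = new ProcessBuilder(cmd)
                    .directory(new File(dir))
                    .redirectErrorStream(true)
                    .start();
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            p.waitFor();
            return lines;
        }
    }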

Feathers has also found that code behind APIs tends to be especially messy – because some interfaces are too hard to change, programmers are forced to come up with their own workarounds behind the scenes. Martin Fowler explains how this problem is made worse by strong code ownership, which inhibits refactoring and makes the code more internally rigid:

In strong code ownership, there's my code and your code. I can't change your code. If I want to change the name of one of my methods, and it's called by your code, I've got to get you to change the call into me before I can change my name. Or I've got to go through the whole deprecation business. Essentially any of my interfaces that you use become published in that situation, because I can't touch your code for any reason at all.

There's an intermediate ground that I call weak code ownership. With weak code ownership, there's my code and your code, but it is accepted that I could go in and change your code. There's a sense that you're still responsible for the overall quality of your code. If I were just going to change a method name in my code, I'd just do it. But on the other hand, if I were going to move some responsibilities between classes, I should at least let you know what I'm going to do before I do it, because it's your code. That's different than the collective code ownership model.

Weak code ownership and refactoring are OK. Collective code ownership and refactoring are OK. But strong code ownership and refactoring are a right pain in the butt, because a lot of the refactorings you want to make you can't make. You can't make the refactorings, because you can't go into the calling code and make the necessary updates there. That's why strong code ownership doesn't go well with refactoring, but weak code ownership works fine with refactoring.
(Design Principles and Code Ownership)
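
To make the renaming example concrete, here is a small, hypothetical Java illustration of the "deprecation business" under strong ownership: because the callers live in someone else's code and can't be touched, the old name has to be kept alive as a deprecated wrapper around the new one.

    import java.math.BigDecimal;

    public class PriceCalculator {

        // The better name I actually want to use.
        public BigDecimal calculateNetPrice(Order order) {
            // ... the real calculation would go here ...
            return BigDecimal.ZERO;
        }

        /**
         * Old name, kept only because code owned by other people still calls it
         * and I am not allowed to change their call sites.
         *
         * @deprecated use {@link #calculateNetPrice(Order)} instead
         */
        @Deprecated
        public BigDecimal calcPrice(Order order) {
            return calculateNetPrice(order);
        }
    }

    // Stub so the example stands on its own; in real code Order is a domain class.
    class Order {
    }

Under weak or collective ownership the rename and the caller updates could simply be done together in one pass, which is exactly Fowler's point.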

Ownership, Technical Debt or Deepening Insight

An individual owner has a higher tolerance for complexity, because after all it’s their code and they know how it works and it’s not really that hard to understand (not for them at least) so they don’t need to constantly simplify it just to make a change or fix something. It's also easy for them to take short cuts, and even short cuts on short cuts. This can build up over time until you end up with a serious technical debt problem – one person is always working on that code, not because the problem is highly specialized, but because the code has reached a point where nobody else but Scotty can understand it and make it work.

There’s a flip side to spending more time on code too. The more time that you spend on the same problem, the deeper you can see into it. As you return to the same code again and again you can recognize patterns, and areas that you can improve, and compromises that you aren't willing to accept any more. As you learn more about the language and the frameworks, you can go back and put in simpler and safer ways of doing things. You can see what the design really should be, where the code needs to go, and take it there.

There's also opportunity cost of not sticking to certain areas. Focusing on a problem allows you to create better solutions. Specifically, it allows you to create a vision of what needs to be done, work towards that vision and constantly revise where necessary... If you're jumping from problem to problem, you're more likely to create an inferior solution. You'll solve problems, but you'll be creating higher maintenance costs for the project in the long term.
Jay Fields, Taking a Second Look at Collective Code Ownership

So far I've found that the only way for a team to take on really big problems is by breaking the problems up and letting different people own different parts of the solution. This means taking on problems and costs in the short term and the long term, trading off quality and productivity against flexibility and consistency – not only flexibility and consistency in how the team works, but in the code itself.

What I've also learned is that whether you have a team of people who each own a piece of the system, or a more open custodian environment, or even if everyone is working everywhere all of the time, you can’t let people do this work completely on their own. It’s critical to have people working together, whether you are pairing in XP or doing regular egoless code reviews. To help people work on code that they’ve never seen before – or to help long-time owners recognize their blind spots. To mentor and to share new ideas and techniques. To keep people from falling into bad habits. To keep control over complexity. To reinforce consistency – across the code base or inside a piece of code.

Thursday, April 25, 2013

Code Ownership – Who Should Own the Code?

A key decision in building and managing any development team is agreeing on how ownership of the code will be divided up: who is going to work on what code; how much work can be, and should be, shared across the team; and who will be responsible for code quality. The approach that you take has immediate impact on the team’s performance and success, and a long-term impact on the shape and quality of the code.

Martin Fowler describes three different models for code ownership on a team:

  1. Strong code ownership – every module is owned exclusively by someone, developers can only change the code that they own, and if they need to change somebody else’s code, they need to talk to that owner and get the owner’s agreement first – except maybe in emergencies.

  2. Weak code ownership – where modules are still assigned to owners, but developers are allowed to change code owned by other people. Owners are expected to keep an eye on any changes that other people make, and developers are expected to ask for permission first before making changes to somebody else’s code.

    This can be thought of as a shared custody model, where an individual is forced to share ownership of their code with others; or Code Stewardship, where the team owns all of the code, but one person is held responsible for the quality of specific code, and for helping other people make changes to it, reviewing and approving all major changes, or pairing up with other developers as necessary. Brad Appleton says the job of a code steward is not to make all of the changes to a piece of code, but to “safeguard the integrity + consistency of that code (both conceptually and structurally) and to widely disseminate knowledge and expertise about it to others”.

  3. Collective Code Ownership – the code base is owned or shared by the entire team, and everyone is free to make whatever changes they need – or want – to make, including refactoring or rewriting code that somebody else originally wrote. This is a model that came out of Extreme Programming, where the Whole Team is responsible together for the quality and integrity of the code and for understanding and keeping the design.

Arguments against Strong/Individual Code Ownership

Fowler and other XP advocates such as Kent Beck don’t like strong individual code ownership, because it creates artificial barriers and dependencies inside the team. Work will stall and pause if you need to wait for somebody to make or even approve a change, and one owner can often become the critical path for the entire team. This could encourage developers to come up with their own workarounds and compromises. For example, instead of changing an API properly (which would involve a change to somebody else’s code), they might shoehorn in a change, like stuffing something into an existing field. Or they might take a copy of somebody’s code and add whatever they need to it, making maintenance harder in the future.

Other arguments against strong ownership are that it can lead to defensiveness and protectionism on the part of some developers (“hey, don’t touch my code!”), where they take any criticism of the code as a personal attack, creating tension on the team, discouraging reviewers from offering feedback, and discouraging refactoring efforts; and that it can lead to local over-optimization, if developers are given too much time to polish and perfect their precious code without thinking of the bigger picture.

And of course there is the “hit by a truck factor” to consider – the impact that a person leaving the team will have on productivity if they’re the only one who works on a piece of code.

Ward Cunningham, one of the original XPers, also believes that there is more pride of ownership when code is shared, because everyone’s work is always on display to everyone else on the team.

Arguments against Collective Code Ownership

But there are also arguments against Collective Code Ownership. A post by Mike Spille lists some problems that he has seen when teams try to “over-share” code:

  • Inconsistency. No overriding architecture is discernible, just individual solutions to individual problems. Lots of duplication of effort results, often leading to inconsistent behavior.
  • Bugs. People "refactoring" code they don't really understand break something subtle in the original code.
  • Constant rounds of "The Blame Game". People have a knee-jerk reaction to bugs, saying "It worked when I wrote it, but since Joe refactored it... well, that's his problem now."
  • Slow delivery. Nobody has any expertise in any given domain, so people are spending more time trying to understand other people's code, and less time writing new code.

Matthias Friedrich, in Thoughts on Collective Code Ownership, believes that Collective Code Ownership can only work if you have the right conditions in place:

  • Team members are all on a similar skill level
  • Programmers work carefully and trust each other
  • The code base is in a good state
  • Unit tests are in place to detect problematic changes (although unit tests only go so far)

Remember that Collective Code Ownership came out of Extreme Programming. Successful team ownership depends on everyone sharing an understanding of the domain and the design, and on maintaining a high level of technical discipline: not only writing really good automated tests as a safety net, but following consistent coding conventions and standards across the code base, and working in pairs, because hopefully one of you knows the code, or at least with two heads you can help each other understand it and make fewer mistakes.

Another problem with Collective Code Ownership is that ownership is spread so thin. Justin Hewlett talks about the Tragedy of the Commons problem: people will take care of their own yard, but how many people will pick up somebody else’s litter in the park, or on a street – even if they walk in that park or down that street every day? If the code belongs to everyone, then there is always “someone else” who can take care of it – whoever that “someone else” may be. As a developer, you’re under pressure, and you may never touch this piece of code again, so why not do whatever you need to do as quickly as possible, get on to the next thing on your list, and let "somebody else" worry about refactoring or writing that extra unit test or...?

Code Ownership in the Real World

I've always worked on or with teams that follow individual (strong or weak) code ownership, except for an experiment in pure XP and Collective Code Ownership on one team over 10 years ago. One person (or maybe two) owns each piece of the code and does all or most of the heavy lifting on that code, because it only makes sense to have the people who understand the code best do most of the work, or the most important work. It’s not just because you want the work “done right” – sometimes you don’t really have a choice over who is going to do the work.

As Ralf Sudelbucher points out, Collective Code ownership assumes that all coding work is interchangeable within a team, which is not always true.

Some work isn't interchangeable because of technology: different parts of a system can be written in different languages, with different architectures. You have to learn the language and the framework before you can start to understand the other problems that need to be solved.

Or it might be because of the problem space. Sure, there is always coding on any project that is “just typing”: journeyman work that is well understood, like scaffolding work or writing another web form or another CRUD screen or fixing up a report or converting a file format, work that has to be done and can be taken on by anyone who has been on the team for a while and who understands where to find stuff and how things are done – or who pairs up with somebody who knows this.

But other software development involves solving hard domain problems and technical problems that require a lot of time to understand properly – where it can take days, weeks, months or sometimes even years to immerse yourself in the problem space well enough to know what to do, and where not just anyone can jump in and start coding, or even be of much help in a pair programming situation.

The worst disasters occur when you turn loose sorcerers' apprentices on code they don't understand. In a typical project, not everyone can know everything - except in some mature domains where there have been few business paradigm shifts in the past decade or two.
Jim Coplien, Code Ownership

I met someone who manages software development for a major computer animation studio. His team has a couple of expert developers who did their PhDs and postgrad work in animating hair – that’s all that they do, and even if you are really smart you’ll need years of study and experience just to understand how they do what they do.

Lots of scientific and technical engineering domains are also like this – maybe not so deeply specialized, but they involve non-trivial work that can’t be easily or competently done by generalists, even competent generalists. Programming medical devices or avionics or robotics or weapons control; or any business domain where you are working at the leading edge of problem solving, applying advanced statistical models to big data analysis or financial trading algorithms or risk-management models; or supercomputing and high-scale computing and parallel programming, or writing an operating system kernel or solving cryptography problems or doing a really good job of User Experience (UX) design. Not everyone understands the problems that need to be solved, not everyone cares about the problems and not everyone can do a good job of solving them.

Ownership and Doing it Right

If you want the work done right, or need it to be done right the first time, it should be done by someone who has worked on the code before, who knows it, and who has proven that they can get the job done – not somebody who has only a superficial familiarity with the code. Research by Microsoft and others has shown that as more people touch the same piece of code, there is more chance of misunderstandings and mistakes – and that the people who have done the most work on a piece of code are the ones who make the fewest mistakes.

Fowler comes back to this in a later post about “Shifting to Code Ownership”, where he shares a story from a colleague who shifted a team from collective code ownership to weak individual code ownership, because weaker or less experienced programmers were making mistakes in core parts of the code and impacting quality, velocity and the team’s morale. They changed their ownership model so that anyone could work anywhere in the code base, but if they needed to change core code, they had to do it with the help of someone who knew that part of the code well.

In deciding on an ownership approach, you have to make a trade-off between flexibility and quality, team ownership and individual ownership. With individual ownership you can have siloing problems and dependencies on critical people, and you’ll have to watch out for trucks. But you can get more done, faster, better and by fewer people.

Thursday, April 18, 2013

Architecture-Breaking Bugs – when a Dreamliner becomes a Nightmare

The history of computer systems is also the history of bugs, including epic, disastrous bugs that have caused millions of dollars in damage and destruction and even death, as well as many other less spectacular but expensive system and project failures. Some of these appear to be small and stupid mistakes, like the infamous Ariane 5 rocket crash, caused by a one-line programming error. But a one-line programming error, or any other isolated mistake or failure, cannot cause serious damage to a large system without fundamental failures in architecture and design, and failures in management.

Boeing's 787 Dreamliner Going Nowhere
The Economist, Feb 26 2013

These kinds of problems are what Barry Boehm calls “Architecture Breakers”: where a system’s design doesn't hold up in the real world, when you run face-first into a fundamental weakness or a hard limit on what is possible with the approach that you took or the technology platform that you selected.

Architecture Breakers happen at the edges – or beyond the edges – of the design, off of the normal, nominal, happy paths. The system works, except for a “one in a million” exceptional error, which nobody takes seriously until that “one in a million” problem starts happening every few days. Or the system crumples under an unexpected surge in demand – demand that isn't going to go away, so you need to find a way to quickly scale the system to keep up, and if you can’t, you won’t have a demand problem any more, because those customers won’t be coming back. Or what looks like a minor operational problem turns out to be the first sign of a fundamental reliability or safety problem in the system.

Dreamliner is Troubled by Questions about Safety
NY Times, Jan 10, 2013

Finding Architecture Breakers

It starts off with a nasty bug or an isolated operational issue or a security incident. As you investigate and start to look deeper you find more cases, gaping holes in the design, hard limits to what the system can do, or failures that can’t be explained and can’t be stopped. The design starts to unravel as each problem opens up to another problem. Fixing it right is going to take time and money, maybe even going back to the drawing board and revisiting foundational architectural decisions and technology choices. What looked like a random failure or an ugly bug just turned into something much uglier, and much much more expensive.

Deepening Crisis for the Boeing 787
NY Times, Jan 17 2013

What makes these problems especially bad is that they are found late, way past design and even past acceptance testing, usually when the system is already in production and you have a lot of real customers using it to get real work done. This is when you can least afford to encounter a serious problem. When something does go wrong, it can be difficult to recognize how serious it is right away. It can take two or three or more failures before you realize – and accept – how bad things really are and before you see enough of a pattern to understand where the problem might be.

Boeing Batteries Said to Fail 10 Times Before Incident
Bloomberg, Jan 30 2013

By then you may be losing customers and losing money and you’re under extreme pressure to come up with a fix, and nobody wants to hear that you have to stop and go back and rewrite a piece of the system, or re-architect it and start again – or that you need more time to think and test and understand what’s wrong and what your options are before you can even tell them how long it might take and how much it could cost to fix things.

Regulators Around the Globe Ground Boeing 787s
NY Times, Jan 18 2013

What can Break your Architecture?

Most Architecture Breakers are fundamental problems in important non-functional aspects of a system:

  • Stability and data integrity: some piece of the system won’t stay up under load or fails intermittently after the system has been running for hours or days or weeks, or you lost critical customer data or you can’t recover and restore service fast enough after an operational failure.
  • Scalability and throughput: the platform (language or container or communications fabric or database – or all of them) is beautiful to work with, but can’t keep up as more customers come in, even if you throw more hardware at it. Ask Twitter about trying to scale out Ruby, or Facebook about scaling PHP, or anyone who has ever tried to scale out Oracle RAC.
  • Latency: requirements for real-time response-time/deadline satisfaction escalate, or you run into unacceptable jitter and variability (you chose Java as your run-time platform – what happens when GC kicks in?).
  • Security: you just got hacked and you find out that the one bug that an attacker exploited is only the first of hundreds or thousands of bugs that will need to be found and fixed, because your design or the language and the framework that you picked (or the way that you used it) is as full of security holes as Swiss cheese.

These problems can come from misunderstanding what an underlying platform technology or framework can actually do – what the design tolerances for that architecture or technology are. Or from completely missing, overlooking, ignoring or misunderstanding an important aspect of the design.

These aren’t problems that you can code your way out of, at least not easily. Sometimes the problem isn't in your code anyway: it’s in a third-party platform technology that can’t keep up or won’t stay up – the language itself, or an important part of the stack like the container, database or communications fabric, or whatever you are depending on for clustering and failover or to do some other magic. At high scale in the real world, almost any piece of software that somebody else wrote can and will fall short of what you really need, or of what the vendor promised.

Boeing, 787 Battery Supplier at Odds over Fixes
Wall Street Journal, Feb 27 2013

You’ll have to spend time working with a vendor (or sometimes with more than one vendor) and help them understand your problem, and get them to agree that it’s really their problem, and that they have to fix it, and if they can’t fix it, or can’t fix it quickly enough, you’ll need to come up with a Plan B quickly, and hope that your new choice won’t run into other problems that may be just as bad or even worse.

How to Avoid Architecture Breakers

Architecture Breakers are caused by decisions that you made early and got wrong – or that you didn't make early enough, or didn't make at all. Boehm talks about Architecture Breakers as part of an argument against Simple Design – that many teams, especially Agile teams, spend too much time focused on the happy path, building new features to make the customer happy, and not enough time on upfront architecture and thinking about what could go wrong. But Architecture Breakers have been around a lot longer than Agile and simple design: in Making Software (Chapter 10 Architecting: How Much and When), Boehm goes back to the 1980s when he first recognized these kinds of problems, when Structured Programming and later Waterfall were the “right way” to do things.

Boehm’s solution is more and better architecture definition and technical risk management through Spiral software development: a lifecycle with architecture upfront to identify risk areas, which are then explored through iterative, risk-driven design, prototyping and development in multiple stages. Spiral development is like today’s iterative, incremental development methods on steroids, using risk-based architectural spikes, but with much longer iterative development and technical prototyping cycles, more formal risk management, more planning, more paperwork, and much higher costs.

Bugs like these can’t all be solved by spending more time on architecture and technical risk management upfront – whether through Spiral development or a beefed up, disciplined Agile development approach. More time spent upfront won't help if you make naïve assumptions about scalability, responsiveness, reliability or security, or if you don’t understand these problems well enough to identify the risks. Architecture Breakers won’t be found in design reviews – because you won’t be looking for something that you don’t know could be a problem – unless maybe you are running through structured failure modelling exercises like FMEA (Failure Mode and Effects Analysis) or FMECA (Failure Mode, Effects and Criticality Analysis), which force you to ask hard questions, but which few people outside of regulated industries have even heard of.

And Architecture Breakers can't all be caught in testing, even extended longevity/soak testing and extensive fuzzing and simulated failures and fault injection and destructive testing and stress testing – even if all of the bugs that are found this way are taken seriously (which they often aren't, because these kinds of extreme tests are considered unrealistic).

You have to be prepared to deal with Architecture Breakers. Anticipating problems and partitioning your architecture using something like the Stability Patterns in Michael Nygard’s excellent book Release It! will at least keep serious run-time errors from spreading and taking an entire system out (these strategies will also help with scaling and with containing security attacks). And if and when you do see a “once in a million” error in reviews or testing or production, understand how serious it can be, and act right away – before a Dreamliner turns into a nightmare.
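
For reference, here is a minimal sketch of one of those Stability Patterns, the circuit breaker: after a run of failures, calls to a flaky downstream dependency fail fast for a cool-down period instead of piling up and dragging the rest of the system down with them. The thresholds and the Callable-based interface are illustrative choices, not Nygard's specific design.

    import java.util.concurrent.Callable;

    public class CircuitBreaker {

        private final int failureThreshold;
        private final long openMillis;
        private int consecutiveFailures = 0;
        private long openedAt = 0;

        public CircuitBreaker(int failureThreshold, long openMillis) {
            this.failureThreshold = failureThreshold;
            this.openMillis = openMillis;
        }

        public synchronized <T> T call(Callable<T> protectedCall) throws Exception {
            // While the breaker is open and the cool-down hasn't expired, fail fast.
            if (isOpen() && System.currentTimeMillis() - openedAt < openMillis) {
                throw new IllegalStateException("circuit open: failing fast");
            }
            try {
                T result = protectedCall.call();   // closed, or half-open after the cool-down
                consecutiveFailures = 0;           // success closes the circuit again
                return result;
            } catch (Exception e) {
                if (++consecutiveFailures >= failureThreshold) {
                    openedAt = System.currentTimeMillis();  // trip (or re-trip) the breaker
                }
                throw e;
            }
        }

        private boolean isOpen() {
            return consecutiveFailures >= failureThreshold;
        }
    }

Nygard's book covers the harder parts that a sketch like this leaves out: timeouts, bulkheads, and how breakers should interact with retries and monitoring.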

Thursday, April 11, 2013

Software Security Status Quo?

Veracode has released the company’s State of Software Security Report for 2012, the 5th in a series of annual reports that analyzes data collected from customers using Veracode’s cloud-based application security scanning services.

The Important Numbers

As Veracode’s data set continues to get bigger, with more customers and more apps getting scanned, the results get more interesting.

For Web apps, the state of vulnerabilities remains unchanged over the past 18 months:

  • 1/3 of apps remain vulnerable to SQL Injection
  • 2/3 of apps remain vulnerable to XSS, and at least half of all vulnerabilities found in scanning are XSS vulnerabilities

Over the last 3 years, there are no significant changes in the occurrence of different vulnerabilities that cannot be accounted for by changes and improvements in Veracode's technology or testing methods.

For mobile platforms (Android, iOS and Java ME), the most common vulnerabilities found are related to crypto: 64% of Android apps, 58% of iOS apps, and 47% of Java ME apps have crypto vulnerabilities. Outside of crypto, the vulnerability distributions for the different mobile platforms are quite different. It’s possible that these differences are due to fundamental strengths and weaknesses of each platform (different architectures, different APIs and default capabilities provided), but I think that it is still too early to draw meaningful conclusions from this data, as the size of the data set is still very small (although it continues to increase in size, from 1% of the total sample to 3% over the last 18 months).
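
For a sense of what these mobile crypto findings usually look like in code, here is a hedged Java illustration (my example, not taken from the report): a hardcoded key with ECB mode, the kind of thing scanners reliably flag, next to a more defensible version using a caller-supplied key and an authenticated mode. Real key management (keystores, rotation) is out of scope for a sketch like this.

    import java.nio.charset.StandardCharsets;
    import java.security.SecureRandom;
    import javax.crypto.Cipher;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    public class CryptoExamples {

        // Typical finding: the key is hardcoded in the source, and ECB mode
        // leaks patterns in the plaintext.
        static byte[] encryptWeak(byte[] plaintext) throws Exception {
            SecretKey key = new SecretKeySpec(
                    "0123456789abcdef".getBytes(StandardCharsets.UTF_8), "AES");
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            return cipher.doFinal(plaintext);
        }

        // Better: the caller supplies a properly generated and stored key, a random
        // IV is used per message, and GCM authenticates the ciphertext. The IV is
        // prepended so it can be recovered for decryption.
        static byte[] encryptBetter(SecretKey key, byte[] plaintext) throws Exception {
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            byte[] out = new byte[iv.length + ciphertext.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
            return out;
        }
    }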

But Security Vulnerabilities are Getting Fixed, Right?

The report includes some interesting data on remediation, based on Veracode customers resubmitting the same code base for subsequent scans. Almost half of their customers resubmit all or almost all of their apps for re-scanning, regardless of how critical the app is considered to the customer’s business. What’s interesting is which vulnerabilities people choose to fix – the bugs that are found in the first scan but don’t show up in later scans.

For Java, the bugs that are most often fixed are:

  1. Untrusted search path
  2. CRLF injection
  3. Untrusted initialization
  4. Session fixation
  5. Dangerous function

So the first bugs to be fixed seem to be the easiest ones for developers to understand and take care of – low-hanging fruit. Remediation decisions don’t seem to be based on risk, but on “let’s see what we can fix now and get the security guys off of our backs”. Security bugs are getting fixed, but it’s clear that SQL Injection and XSS bugs aren’t getting fixed fast enough, because there are too many of these vulnerabilities to fix, and because many developers still don’t understand these problems well enough to fix them or prevent them in the first place. PHP developers are much more likely to remediate SQL injection vulnerabilities than Java or .NET developers, but it’s not clear why.
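
For context on what fixing the most common web bug actually involves, here is a small, hypothetical Java example (table and column names made up, the JDBC Connection assumed to come from elsewhere): the SQL Injection fix is almost always replacing string concatenation with a parameterized query, which is mechanical once a developer understands why the first version is dangerous.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class CustomerDao {

        // Vulnerable: attacker-controlled input is spliced into the SQL text.
        public ResultSet findByNameUnsafe(Connection conn, String name) throws SQLException {
            String sql = "SELECT id, name FROM customers WHERE name = '" + name + "'";
            return conn.createStatement().executeQuery(sql);
        }

        // Fixed: the value is bound as a parameter and never parsed as SQL.
        // (Resource handling trimmed for brevity.)
        public ResultSet findByName(Connection conn, String name) throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT id, name FROM customers WHERE name = ?");
            ps.setString(1, name);
            return ps.executeQuery();
        }
    }

XSS fixes follow the same idea (encode output for the context it is rendered in), but there are many more output points in an app than query call sites, which may be part of why XSS hangs around.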

The Art and Science of Predictions

The report results were presented today in a webinar titled “We See the Future … and it’s Not Pretty”, which walked through the data and the predictions that Veracode drew from the data. While the findings seem sound, the predictions are less so: for example, that there will be higher turnover in security jobs (including CISO positions) because appsec programs are not proving effective, and security staff will give up – or get fired – as a result. I can’t see the thread that leads from the data to these conclusions. The authors should read (or re-read) The Signal and the Noise to understand what should go into a high-quality prediction, and what people should try to predict and what they shouldn't.

Tuesday, April 9, 2013

Penetration Testing Shouldn't be a Waste of Time

In a recent post on “Debunking Myths: Penetration Testing is a Waste of Time”, Rohit Sethi looks at some of the disadvantages of the passive and irresponsible way that application pen testing is generally done today: wait until the system is ready to go live, hire an outside firm or consultant, give them a short time to try to hack in, fix anything important that they find, maybe retest to get a passing grade, and now your system is 'certified secure'.

A test like this “doesn't tell you:

  • What are the potential threats to your application?
  • Which threats is your application “not vulnerable” to?
  • Which threats did the testers not assess your application for? Which threats were not possible to test from a runtime perspective?
  • How did time and other constraints on the test affect the reliability of results? For example, if the testers had 5 more days, what other security tests would they have executed?
  • What was the skill level of the testers and would you get the same set of results from a different tester or another consultancy?”

Sethi stresses the importance of setting expectations and defining requirements for pen testing. An outside pen tester will not be able to understand your business requirements or the internals of the system well enough to do a comprehensive job – unless maybe your app is yet another straightforward online portal or web store written in PHP or Ruby on Rails, something that they have seen many times before.

You should assume that pen testers will miss something, possibly a lot, and there’s no way of knowing what they didn't test or how good a job they actually did on what they did test. You could try defect seeding to get some idea of how careful and smart they were (and how many bugs they didn’t find), but this assumes that you know an awful lot about your system and about security and security testing (and if you’re this good, you probably don’t need their help). Turning on code coverage analysis during the test will tell you what parts of the code didn't get touched – but it won’t help you identify the code that you didn't write but should have, which is often a bigger problem when it comes to security.

You can’t expect a pen tester to find all of the security vulnerabilities in your system – even if you are willing to spend a lot of time and money on it. But pen tests are important because this is a way to find things that are hard for you to find on your own:

  1. Technology-specific and platform-specific vulnerabilities
  2. Configuration and deployment mistakes in the run-time environment
  3. Pointy-Hat problems in areas like authentication and session management that should have been taken care of by the framework that you are using, if it works and if you are using it properly
  4. Fussy problems in information leakage, object enumeration and error handling – problems that look small to you but can be exploited by an intelligent and motivated attacker with time on their side
  5. Mistakes in data validation or output encoding and filtering, that look small to you but…

And if you’re lucky, some other problems that you should have caught on your own but didn’t, like weaknesses in workflow or access control or password management or a race condition.

Pen testing is about information, not about vulnerabilities

The real point of pen testing, or any other kind of testing, is not to find all of the bugs in a system. It is to get information.

  1. Information on examples of bugs in the application that need to be reviewed and fixed, how they were found, and how serious they are.
  2. Information that you can use to calibrate your development practices and controls, to understand just how good (or not good) you are at building software.

Testing doesn't provide all possible information, but it provides some. Good testing will provide lots of useful information.
James Bach (Satisfice)

This information leads to questions: How many other bugs like this could there be in the code? Where else should we look for bugs, and what other kinds of bugs or weaknesses could there be in the code or the design? Where did these bugs come from in the first place? Why did we make that mistake? What didn't we know or what didn't we understand? Why didn't we catch the problems earlier? What do we need to do to prevent them or to catch them in the future? If the bugs are serious enough, or there are enough of them, this means going all the way through RCA and exercises like 5 Whys to understand and address Root Cause.

To get high-quality information, you need to share information with pen testers. Give the pen tester as much information as possible:

  • Walk through the app with the pen testers, highlight the important functions, and provide documentation
  • Take time to explain the architecture and platform
  • Share the results of previous pen tests
  • Provide access behind proxies, etc.

Ask them for information in return: ask them to explain their findings as well as their approach, what they tried and what they covered in their tests and what they didn't, where they spent most of their time, what problems they ran into and where they wasted time, what confused them and what surprised them. Information that you can use to improve your own testing, and to make pen testing more efficient and more effective in the future.

When you’re hiring a pen tester, you’re paying for information. But it’s your responsibility to get as much good information as possible, to understand it and to use it properly.

Thursday, April 4, 2013

How do you measure Devops?

If you’re trying to convince yourself (or the team or management) that your operations program needs to be changed for the better, and that trying a Devops approach makes sense – or that your operations organization is improving, and that whatever changes you have made actually make a difference – you have to measure something(s). But what?

Measuring Culture

John Clapham at Nokia suggests that you should try to measure how healthy your operations culture is. At the Devops Days conference this year in London he talked about ways to measure and monitor culture – behaviour, attitudes and values – to determine whether people were focused on the “right things”, and to assess the team’s motivation and satisfaction. Nokia had already started a Devops program, and wanted to see whether the momentum for change and improvement was still there after the initial push and evangelism had worn off. So they came up with a set of vital signs that they felt would capture the important behaviours and attitudes:

  1. Cycle time – time from development to deployment in production. Are we moving faster, or fast enough?
  2. Shared purpose – do people all share/believe in the same goals, believe in improving how development and ops work together?
  3. Motivation – does everyone care about what they are doing?
  4. Collaboration – are people working together willingly?
  5. Effectiveness – is everyone’s time being spent in a useful way? How much time is being wasted?

Cycle time is the only one of these that is relatively easy to measure and report. The rest are highly subjective and fuzzy. Nokia tried to collect this information through a questionnaire that asked questions like: Do you believe there are opportunities to improve ways of working? How much time do you spend on stability, overhead, improvements, innovation? What’s in your way: lack of time, pressure to focus on features, poor tools, lack of management support, nothing…?

Operations Vital Signs that you Can and Should Measure

Clapham’s closing question was: “What vital signs would you look for?”

I'm not convinced that you can measure an organization’s cultural effectiveness, or that it would be really useful if you could. A wishy-washy questionnaire can’t tell you whether change is making a real difference to the organization’s effectiveness and whether you are on the right track; it won’t help you understand what you need to change, or what the impact of change would be on the bottom line (or the top line). To do this you need concrete, results-based measurements that point out strengths and weaknesses, and that you can use to make a case for change, or to justify your decisions.

Puppet Labs and IT Revolution Press have recently published a “State of Devops Report”, which is full of interesting data. The report stresses the importance of metrics in understanding how your organization is performing and why a Devops program is, or would be, worthwhile. They provide a list of objective measures, broken down into two major types.

Agility and reliability metrics:

  1. Deployment rate/frequency
  2. Change lead time – how long it takes to get a change approved and into production
  3. Change failure rate (John Allspaw's brilliant presentation “Ops Meta-Metrics” explains the importance of correlating deployment frequency/size/type and failures – type and severity – in production)
  4. Mean time to recover (and mean time to detect)

Functional metrics:

  1. Test cycle time – how long does it take to test a change?
  2. Deployment time – how long does it take to roll out a new change once tested and approved?
  3. Defect rate in production (defect escape rate)
  4. Helpdesk ticket counts – how much time is spent firefighting?

There are two other important measures that are missing from this list:

  1. Operations costs
  2. Employee retention – a key measure of whether people are happy
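
Most of these numbers can be pulled from data that the team already has. As a rough illustration (the record shapes here are hypothetical; in practice the data would come from your deployment and ticketing tools), deployment rate, change failure rate and mean time to recover boil down to simple arithmetic over a deployment log and an incident log:

    import java.time.Duration;
    import java.time.Instant;
    import java.util.List;

    public class OpsMetrics {

        // Assumed shapes for the raw data, ordered by time.
        record Deployment(Instant at, boolean causedFailure) {}
        record Incident(Instant detectedAt, Instant resolvedAt) {}

        // Deployments per week over the period covered by the log.
        static double deploymentRate(List<Deployment> deploys) {
            Instant first = deploys.get(0).at();
            Instant last = deploys.get(deploys.size() - 1).at();
            double weeks = Math.max(1, Duration.between(first, last).toDays()) / 7.0;
            return deploys.size() / weeks;
        }

        // Fraction of deployments that caused a failure in production.
        static double changeFailureRate(List<Deployment> deploys) {
            long failed = deploys.stream().filter(Deployment::causedFailure).count();
            return (double) failed / deploys.size();
        }

        // Mean time to recover, in minutes, across recorded incidents.
        static double meanTimeToRecoverMinutes(List<Incident> incidents) {
            return incidents.stream()
                    .mapToLong(i -> Duration.between(i.detectedAt(), i.resolvedAt()).toMinutes())
                    .average()
                    .orElse(0);
        }
    }

The hard part isn't the arithmetic; it's collecting deployment and incident records consistently enough to trust the trend.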

Measuring the success of a DevOps program is simple:
If you aren't saving money
If you can’t make change easier and faster
If you don’t improve quality and reliability and your organization’s ability to respond to problems
If you can’t keep good people
... then whatever you’re doing is not working or you’re not doing it right. It doesn't matter if you are “doing DevOps” or using certain tools or if people seem to be more collaborative or believe that they have a greater sense of shared purpose. What matters is the outcome. Make sure that you’re measuring the right things – so that you know that you are doing the right things.

Tuesday, April 2, 2013

War Games, Pair Testing and Other Fun Ways to Find Bugs

I've already examined how important good testing is to the health of a project, a product and an organization. There’s a lot more to good testing than running an automated test suite in Continuous Integration and forcing someone to walk through functional test scripts and check lists. A good tester will spend time exploring the app, making sure that they really understand it and that the app actually makes sense, finding soft spots and poking them to uncover problems that nobody expects, providing valuable information and feedback to the team.

What’s better than a good tester? Two good testers working together…

Pair Testing – Two Heads are Better than One

Pair Testing is an exploratory testing approach where two testers work through scenarios together, combining their knowledge of the app and their unique skills and experience to duplicate hard-to-find bugs or to do especially deep testing of some part of a system. Like in pair programming, one person drives, defining the goals of the testing session, the time limit and the starting scenarios and providing the hands at the keyboard; and the other person navigates, observes, takes notes, advises, asks questions, double checks, challenges and causes trouble. As a pair they can help each other through misunderstandings and blocks, build on each other’s ideas to come up with new variations and more ways to attack the app, push each other to find more problems, and together they have a better chance of noticing small inconsistencies and errors that the other person might not consider important.

Pair testing can be especially effective if you pair developers and testers together – a good tester knows where to look for problems and how to break software; a good developer can use their understanding of the code and design to suggest alternative scenarios and variations, and together they can help each other recognize inconsistencies and identify unexpected behaviour. This is not just a good way to track down bugs – it’s also a good way for people to learn from each other about the app and about testing in general. In our team, developers and testers regularly pair up to review and test hard problems together, like validating changes to complex business rules or operational testing of distributed failover and recovery scenarios.

Pair testing, especially pairing developers and testers together, is a mature team practice. You need testers and developers who are confident and comfortable working together, who trust and respect each other, who understand the value and purpose of exploratory testing, and who are all willing to put the time in to do a good job.

War Games and Team Testing

If two heads are better than one, then what about four heads, or eight, or ten or …?

You can get more perspectives and create more chances to learn by running War Games: team testing sessions which put a bunch of people together and try to get as close as possible to recreating real-life conditions. In team testing, one person defines the goals, roles, time limit and main scenarios. Multiple people end up driving, each playing different roles or assuming different personas, some people trying crazy shit to see what happens, others being more disciplined, while somebody else shoulder surfs or looks through logs and code as people find problems. More people means more variations and more chances to create unexpected situations, more eyes to look out for inconsistencies and finishing details (“is the system supposed to do this when I do that?”), and more hands to try the same steps at the same time to test for concurrency problems. At worst, you’ll have a bunch of monkeys bashing at keyboards and maybe finding some bugs. But a well-run team test session is a beautiful thing, where people feed on each other’s findings and ideas and improvise in a loosely structured way, like a jazz ensemble.

Testing this way makes a lot of sense for interactive systems like online games, social networks, online stores or online trading: apps that support different kinds of users playing different roles with different configurations and different navigation options that can lead to many different paths through the app and many different experiences.

With so many people doing so many things, it’s important that everyone (or at least someone) has the discipline to keep track of what they are doing, and to make notes as they find problems. But even if people are keeping decent notes, sometimes all that you really know is that somebody found a problem, but nobody is sure exactly what they were doing at the time or what the steps are to reproduce it. It can be like finding a problem in production, so you need to use similar troubleshooting techniques, relying more on logs and error files to help retrace steps.

Team testing can be done in large groups, sometimes even as part of acceptance testing or field testing with customers. But there are diminishing returns: as more people get involved, it’s harder to keep everyone motivated and focused, and harder to understand and deal with the results. We used to invite the entire team into team testing sessions, to get as many eyes as possible on problems, and to give everyone an opportunity to see the system working as a whole (which is important when you are still building it, and everyone has been focused on their pieces).

But now we've found that a team as small as four to six people who really understand the system is usually enough, better than two people, and much more efficient than ten, or a hundred. You need enough people to create and explore enough options, but a small enough group that everyone can still work closely together and stay engaged.

Team testing is another mature team practice: you need people who trust each other and are comfortable working together, who are reasonably disciplined, who understand exploratory testing and who like finding bugs.

Let's Play a Game

We relied on War Games a lot when we were first building the system, before we had good automated testing coverage in place. It was an inefficient, but effective way to increase code coverage and find good bugs before our customers did.

We still rely on War Games today, but now it’s about looking for real-life bugs: testing at the edges, testing weird combinations and workflow chaining problems, looking closely for usability and finishing issues, forcing errors, finding setup and configuration mistakes, and hunting down timing errors and races and locking problems.

Team testing is one of the most useful ways to find subtle (and not so subtle) bugs and to build confidence in our software development and testing practices. Everyone is surprised, and sometimes disappointed, by the kinds of problems that can be found this way, even after our other testing and reviews have been done. This kind of testing is not just about finding bugs that need to be fixed: it points out areas where we need to improve, and raises alarms if too many – or any scary – problems are found.

This is because War Games only make sense in later stages of development, once you have enough of a working system together to do real system testing, and after you have already done your basic functional and regression testing. It’s expensive to get multiple people together, to set up the system for a group of people to test, to define the roles and scenarios, and then to run the test sessions and review the results – you don’t want to waste everyone’s time finding basic functional bugs or regressions that should have and could have been picked up earlier. So whatever you do find should be a (not-so-nice) surprise.

War Games can also be exhausting – good exploratory testing like this is only effective if everyone is intensely involved, and it takes energy and commitment. This isn’t something that we do every week or even every iteration. We do it when somebody (a developer or a tester or a manager) recognizes that we’ve changed something important in workflow or the architecture or business rules; or decides that it’s time – because we’ve made enough small changes and fixes over enough iterations, or because we’ve seen some funny bugs in production recently – to run through key scenarios together as a group and see what we can find.

What makes War Games work is that they are games: an intensity and competition builds naturally when you get smart people working together on a problem, and a sense of play.

“Framing something like software testing in terms of gaming, and borrowing some of their ideas and mechanics, applying them and experimenting can be incredibly worthwhile.”
Jonathan Kohl, Applying Gamification to Software Testing

When people realize that it’s fun to find more bugs and better bugs than the other people on the team, they push each other to try harder, which leads to smarter and better testing, and to everyone learning more about the system. It’s a game, and it can be fun – but it’s serious business too.