Wednesday, August 29, 2012

Contracting in Agile – You try it

One of the key values in the Agile Manifesto is

“Customer collaboration over contract negotiation”
Unfortunately, that means that if you’re trying to follow Agile methods, you’re left without useful guidelines for negotiating and structuring contracts that fit the way that Agile teams work.

Time-and-materials of course is a no-brainer, regardless of how the team works – do the work, track the time and other costs, and charge the customer as you go. But contracting gets especially challenging for people who have to work within structures such as fixed price / fixed scope, which is the way that many government contracts are awarded and the way that a number of large businesses still contract development work.

The advice for Agile teams usually runs something like: it’s up to you to convince the purchaser to change the rules and accept a fuzzier, more balanced way of contracting, with more give-and-take. Something that fits the basic assumptions of Agile development: that costs (mostly the people on the team) and schedule can be fixed, but the scope needs to be flexible and worked out as the project goes on.

But in many business situations the people paying for the work aren’t interested in changing how they think and plan – it’s their money and they want what they want when they want it. They are calling the shots. If you don’t comply with the terms of the bidding process, you don’t get the opportunity to work with the customer at all. And the people paying you (your management) also need to know how much it is going to cost and when it is going to be done and what the risks are so they know if they can afford to take the project on. This puts the developers in a difficult (maybe impossible) situation.

Money for Nothing and Change for Free

Jeff Sutherland, one of the creators of Scrum, proposes a contract structure called “Money for Nothing and your Change for Free”. The development team delivers software incrementally – if they are following Scrum properly, they should start with the work that is most important to the customer, and deliver what the customer needs the most as early as possible. The customer can terminate the contract at any point (because they’ve already got what they really need), and pay some percentage of the remainder of the contract to compensate the developer for losing the revenue that they planned to get for completing the entire project. So obviously, the payment schedule for the contract can’t be weighted towards the end of the project (no large payments on “final acceptance”, since it may never happen). That’s the “money for nothing” part.

“Change for free” means that the customer can’t add scope to the project, but can make changes as long as they substitute work still to be done in the backlog with work that is the same size or smaller. So new work can come up, the customer can change their mind, but the overall size of the project remains the same, which means that the team should still be able to deliver the project by the scheduled end date.

To do this you have to define, understand and size all of the work that needs to be done upfront – which doesn’t fit well with the iterative, incremental way that Agile teams work. And it ignores the fact that changes still carry a price: the developers have to throw away the time that they spent upfront understanding the work well enough to estimate it, and the effort that went into planning it, and they have to do more work to review and understand the change, estimate it and replan. Change is cheap in Agile development, but it’s not free. If the customer needs to make a handful of changes, the cost isn’t great. But it can become a real drag on delivery and add significant cost if a customer does this dozens or hundreds of times over a project.

Fixed Price and Fixed Everything Contracts

Fixed Price contracts, and especially what Alistair Cockburn calls Fixed-Everything contracts (fixed-price, fixed-scope and fixed-time too) are a nasty fact of business. Cockburn says that these contracts are usually created out of lack of trust – the people paying for the system to be developed don’t trust the people building the software to do what they need, and try to push the risk onto the development team. Even if people started out trusting each other, these contracts often create an environment where trust breaks down – the customer doesn’t trust the developers, the developers hide things from the customer, and the people who are paying the developers don’t trust anybody.

But it’s still a common way to contract work, because for many customers it is easier to plan around, and it makes sense for organizations that think of software development projects as engineering projects and want to treat them the same way as building a road or a bridge. This is what we told you we want, this is when we need it, that’s how much you said it was going to cost (including your risk and profit margin), we agree that’s what we’re willing to pay, now go build it and we’ll pay you when you get it done.

Cockburn does talk about a case where a team was successful in changing a fixed-everything contract into a time-and-materials contract over time, by working closely with the customer and proving that they could give the customer what they needed. After each delivery, the team would meet with the customer and discuss whether to continue with the contract as written or work on something that customer really needed instead, renegotiating the contract as they went on. I’ve seen this happen, but it’s rare, unless both companies do a lot of work together and the stakes of failure on a project are low.

Ken Schwaber admits that fixed price contracting can’t be done with Scrum projects (read the book). Again, the solution is to convince the customer to accept and pay for work in an incremental, iterative way.

Martin Fowler says that you can’t deliver a fixed price, fixed time and fixed scope contract without detailed, stable and accurate requirements – which he believes can’t be done. His solution is to fix the price and time, and then work with the customer to deliver what you can by the agreed end date, and hope that this will be enough.

The most useful reference I’ve found on contracting in Agile projects is the freely-available Agile Contracts Primer from Practices for Scaling Lean and Agile Development, by Arbogast, Larman and Vodde.

Their advice: avoid fixed-priced, fixed-scope (FPFS) contracts, because they are a lose-lose for both customer and supplier. The customer is less likely to get what they need because the supplier will at some point panic over delivery and be forced to cut quality; and if the supplier is able to deliver, the customer has to pay more than they should because of the risk premium that the supplier has to add. And working this way leads to a lack of transparency and to game playing on both sides.

But, if you have to do it:

  • Obviously it’s going to require up-front planning and design work to understand and estimate everything that has to get done – which means you have to bend Agile methods a lot.
  • You don’t have to allow changes – you can just work incrementally from the backlog that is defined upfront. Or you can restrict the customer to only changing their mind on priority of work to be done (which gives them transparency and some control), or allow them to substitute a new requirement for an existing requirement of the same size (Sutherland’s “Change for Free”).

To succeed in this kind of contract you have to:

  • Invest a lot to do detailed, upfront requirements analysis, some design work, thorough acceptance test definition and estimation – by experienced people who are going to do the work
  • Disallow changes in requirements or scope – allow only replacement / substitution
  • Increase the margin of the contract price
  • Make sure that you understand the problem you are working on – the domain and technology
  • Deliver important things early and hope that the customer will be flexible with you towards the end if you still can’t deliver everything.

PMI-ACP on Agile Contracting?

For all of the projects that have been delivered using Agile methods, contracting still seems to be a work in progress. There are lots of good ideas and suggestions, but no solid answers.

I’ve gone through the study guide materials for the PMI-ACP certification to see what PMI has to say about contracting in Agile projects. There is the same stuff about Sutherland’s “Money for Nothing and your Change for Free” and a few other options. It’s clear that the PMI didn’t take on contracting in Agile projects as a serious problem. This means that they missed another opportunity to help large organizations and people working with large organizations (the kind of people who are going to care about the PMI-ACP certification) understand how to work with Agile methods in real-life situations.

Tuesday, August 21, 2012

What’s better – Big Fat Tests or Little Tests?

Like most startups, we built a lot of prototypes and wrote and threw out a lot of code as we tried out different ideas. Because we were throwing out the code anyways, we didn't bother writing tests - why write tests that you'll just throw away too?

But as we ramped the team up to build the prototype out into a working system, we got into trouble early. We were pushing our small test team too hard trying to keep up with changes and new features, while still trying to make sure that the core system was working properly. We needed to get a good automated test capability in place fast.

The quickest way to do this was by writing what Michael Feathers calls “Characterization Tests”: automated tests – written at inflection points in an existing code base – that capture the behavior of parts of a system, so that you know if you’ve affected existing behavior when you change or fix something. Once you’ve reviewed these tests to make sure that what the system is doing is actually what it is supposed to be doing, the tests become an effective regression tool.
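
To make this concrete, here’s a minimal sketch of what one of these characterization tests can look like in JUnit. The InvoiceCalculator class and the expected value are hypothetical stand-ins – the point is that the expected value comes from running the existing code, not from a spec:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// A characterization test sketch (JUnit 4). InvoiceCalculator is a
// hypothetical stand-in for existing production code.
public class InvoiceCalculatorCharacterizationTest {

    @Test
    public void capturesCurrentRoundingBehavior() {
        InvoiceCalculator calc = new InvoiceCalculator();

        // 107.27 wasn't taken from a spec - we ran the existing code, saw
        // 107.27, and locked that behavior in. If a later change breaks this
        // test, we've changed behavior that somebody may be depending on.
        assertEquals(107.27, calc.totalFor("ORDER-42"), 0.001);
    }
}
```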

The tests that we wrote to do this are bigger and broader than unit tests – they’re fat developer-facing tests that run beneath the UI and validate a business function or a business rule involving one or more system components or subsystems. Unlike customer-facing functional tests, they don't require manual setup or verification. Most of these tests are positive, happy path tests that make sure that important functions in the system are working properly, along with tests that exercise key validation functions.

Using fat and happy tests as a starting point for test automation is described in the Continuous Delivery book. The idea is to automate high-value high-risk test scenarios that cover as much of the important parts of the system as you can with a small number of tests. This gives you a “smoke test” to start, and the core of a test suite.

Today we have thousands of automated tests that run in our Continuous Integration environment. Developers write small unit tests, especially in new parts of the code and where we need to test through a lot of different logical paths and variations quickly. But a big part of our automated tests are still fat, or at least chubby, functional component tests and linked integration tests that explore different paths through the main parts of the system.

We use code coverage analysis to identify weak spots, areas where we need to add more automated tests or do more manual testing. Using a combination of unit tests and component tests we get high (90%+) test coverage in core parts of the application, and we exercise a lot of the general plumbing of the system regularly.

It’s easy to test server-side services this way, using a common pattern: set up initial state in a database or memory, perform some action using a message or API call, verify the expected results (including messages and database changes and in-memory state) and then roll-back state and prepare for the next test.
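
A rough sketch of that pattern in JUnit – OrderService, TestDatabase and all of the method names here are hypothetical stand-ins for whatever your system provides:

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

// The setup / act / verify / rollback pattern (JUnit 4). OrderService,
// TestDatabase and their methods are hypothetical stand-ins.
public class OrderServiceComponentTest {

    private TestDatabase db;
    private OrderService service;

    @Before
    public void setUpInitialState() {
        db = TestDatabase.connect();
        db.beginTransaction();                   // everything rolls back afterwards
        db.insertCustomer("CUST-1", "Alice");    // known starting state
        service = new OrderService(db);
    }

    @Test
    public void placingAnOrderCreatesOrderLinesAndSendsAMessage() {
        service.placeOrder("CUST-1", "WIDGET", 3);                   // act: one API call

        assertEquals(3, db.countOrderLines("CUST-1"));               // verify db changes
        assertTrue(service.lastMessageSent().contains("WIDGET"));    // verify messages
    }

    @After
    public void rollBackState() {
        db.rollback();   // leave the database clean for the next test
    }
}
```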

We also have hundreds of much bigger and fatter integration and acceptance tests that test client UI functions and client API functions through to the server. These “really big fat” tests involve a lot more setup work and have more moving parts, are harder to write and require more maintenance, and take longer to run. They are also more fragile and need to be changed more often. But they test real end-to-end scenarios that can catch real problems like intermittent system race conditions as well as regressions.

What’s good and bad about fat tests?

There are advantages and disadvantages in relying on fat tests.

First, bigger tests have more dependencies. They need more setup work and more test infrastructure, they have more steps, and they take longer to run than unit tests. You need to take time to design a test approach and to create templates and utilities to make it easy to write and maintain bigger tests.

You’ll end up with more waste and overlap: common code that gets exercised over and over, just like in the real world. You’ll have to put in better hardware to run the tests, and testing pipelines so that more expensive testing (like the really fat integration and acceptance testing) is done later and less often.

Feedback from big tests isn’t as fast or as direct when tests fail. Gerard Meszaros points out that the bigger the test, the harder it is to understand what actually broke – you know that there is a real problem, but you have more digging to do to figure out where the problem is. Feedback to the developer is less immediate: bigger tests run slower than small tests and you have more debugging work to do. We’ve done a lot of work on providing contextual information when tests fail so that programmers can move faster to figuring out what’s broken. And from a regression test standpoint, it’s usually obvious that whatever broke the system is whatever you just changed, so….
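
One simple way to provide that context is to pack the relevant state into the assertion itself – a small sketch, with a hypothetical PriceEngine standing in for the component under test:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Sketch: put enough state into the assertion message to start digging.
// PriceEngine and its methods are hypothetical.
public class PriceEngineTest {

    @Test
    public void quoteMatchesExpectedPrice() {
        PriceEngine engine = new PriceEngine();
        double quote = engine.quote("ACME", 100);

        // On failure this reports the inputs and the engine's state, not just
        // "expected 101.5 but was X" - less digging for whoever it breaks on.
        assertEquals("quote for ACME x 100, engine state: " + engine.stateSummary(),
                101.5, quote, 0.001);
    }
}
```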

As you work more on a large system, it is less important to get immediate and local feedback on the change that you just made and more important to make sure that you didn’t break something else somewhere else, that you didn’t make an incorrect assumption or break a contract of some kind, or introduce a side-effect. Big component tests and interaction tests help catch important problems faster. They tell you more about the state of the system, how healthy it is. You can have a lot of small unit tests that are passing, but that won’t give you as much confidence as a smaller number of fat tests that tell you that the core functions of the system are working correctly.

Bigger tests also tell you more about what the system does and how it works. I don’t buy the idea that tests make for good documentation of a system – at least unit tests don’t. It’s unrealistic to expect a developer to pick up how a system works from looking at hundreds or thousands of unit tests. But new people joining a team can look at functional tests to understand the important functions of the system and what the rules of the system are. And testers, even non-technical manual testers, can read the tests and understand what test scenarios are covered and what aren’t, and use this to guide their own testing and review work.

Meszaros also explains that good automated developer tests, even tests at the class or method level, should always be black box tests, so that if you need to change the implementation in refactoring or for optimization, you can do this without breaking a lot of tests. Fat tests make these black boxes bigger, raising them to the component or service level. This makes it even easier to change implementation details without having to fix tests – as long as you don’t change public interfaces and public behavior (which are dangerous changes to make anyways), the tests will still run fine.

But this also means that you can make mistakes in implementation that won’t be caught by functional tests – behavior outside of the box hasn’t changed, but something inside the box might still be wrong, a mistake that won’t trip you up until later. Fat tests won’t find these kinds of mistakes, and they won’t catch other detailed mistakes like missing some validation.

It’s harder to write negative tests and to test error handling code this way, because the internal exception paths are often blocked at a higher level. You’ll need other kinds of testing, including unit tests and manual exploratory testing and destructive testing to check edge cases and catch problems in exception handling.

Would we do it this way again?

I’d like to think that if we started something brand new again, we’d start off in a more disciplined way, test first and all that. But I can’t promise. When you are trying to get to the right idea as quickly as possible, anything that gets in the way and slows down thinking and feedback is going to be put aside. It's once you’ve got something that is close-to-right and close-to-working and you need to make sure that it keeps working, that testing becomes an imperative.

You need both small unit tests and chubby functional tests and some big fat integration and end-to-end tests to do a proper job of automated testing. It’s not an either/or argument.

But writing fat, functional and interaction tests will pay back faster in the short-term, because you can cover more of the important scenarios faster with fewer tests. And they pay back over time in regression, because you always know that you aren’t breaking anything important, and you know that you are exercising the paths and scenarios that your customers are or will be following – the paths and scenarios that should be tested all of the time. When it comes to automated testing, some extra fat is a good thing.

Friday, August 17, 2012

Does the PMI-ACP set the bar high enough on Risk Management?

I’m trying to understand the PMI’s new certification for Agile Certified Practitioners, and what value the PMI brings to managing software development projects using Agile methods. So I bought RMC’s PMI-ACP Exam Prep Guide which is written by Mike Griffiths, a guy who understands a lot about project management and Agile methods, and who has been heavily involved in the PMI-ACP program.

How PMI-ACP looks at Risk

I started with how the PMI says risk management should be done in Agile projects. Unlike the PMBOK, the PMI-ACP does not treat risk management as a knowledge area. Instead, it integrates risk into the different practice domains and activities in Agile projects, from prioritization to delivery and problem management.

The first mention of risk management is in “Value-Driven Delivery”, treating risks as “anti-value” when considering what is important to the customer and the business. Fair enough.

Later in the same section there is a discussion of how risks need to be considered when managing the backlog – that you should schedule risk avoidance and risk mitigation activities early in the project, and explaining how to rank work by business value and risk. They suggest leveling the playing field by ranking all work (new features and changes and risks) by financial value, expressing everything in monetary terms. Risks have a negative financial return: risk impact in $ x risk probability in %. This only applies to risks that have avoidance / mitigation activities that can be scheduled and costed in the project – not to risks that are accepted or transferred.
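
A tiny worked example of this kind of ranking – the numbers are invented, but they show how a risk’s expected cost puts mitigation work on the same monetary scale as a feature:

```java
// A made-up example of putting features and risks on one monetary scale.
public class BacklogValue {
    public static void main(String[] args) {
        double featureValue = 50_000;   // estimated business value of a new feature

        // Risk "anti-value": impact ($) x probability (%).
        double riskImpact = 200_000;
        double riskProbability = 0.10;
        double riskExpectedCost = riskImpact * riskProbability;   // $20,000

        // Scheduling the mitigation removes that expected cost, so the
        // mitigation work ranks against the feature at $20,000 of value.
        System.out.printf("Feature:    $%,.0f%n", featureValue);
        System.out.printf("Mitigation: $%,.0f%n", riskExpectedCost);
    }
}
```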

I like the approach of managing risks the same as any other work, using the same costing and prioritization approach. It’s more consistent and more actionable than managing risks from separate lists.

Risk Management comes up one more time in Value-Driven Delivery, under a discussion of reporting tools and techniques, in this case how to create and use Risk Burn Down reports.

Then risk comes up again in Adaptive Planning – which makes sense. Risk assessment, like everything else in planning an Agile project, needs to be done incrementally and iteratively. But unfortunately there’s not a lot on how teams are supposed to identify risks in planning.

Later Griffiths suggests a collaborative game called Speedboat or Sailboat, to help the team come up with a list of risks and opportunities. This is Agile, so everything including risk management needs to be fun, and we don’t want to get people bummed out, so it’s important to spend time identifying opportunities too upfront. Team members post anchor (risk) and wind (opportunity) sticky notes around the picture of a boat on the water. Isn’t that nice…

Griffiths does say that

“For any project, we should engage the development team, sponsors, customers, and other relevant stakeholders in the process of risk identification. Their ideas, along with reviews of previous projects’ lessons learned lists, risk logs and industry risk profiles, should be used to identify the known and likely risks for the project.”
But you can only use “lessons learned lists” and “risk logs” from previous projects if somebody on a previous project created them – and there are no actions in the PMI-ACP description of risk management to make sure that this gets done. As part of Continuous Improvement, Agile teams do conduct lessons learned reviews in each iteration, rather than waiting until the end of the project (a step that is often skipped because time and money have run out). The point is to act on lessons learned immediately – not maybe on some other project in the future. This is good, but if nobody saves this information for future use, there’s nothing to draw on next time.

The last reference to risk management is under Problem Detection and Resolution – recommending running risk-based spikes early in the project to assess technical risks, emphasizing that it is better to find out and fail early if you run into technical problems or limitations.

Is integrated and implicit risk management enough?

The PMI-ACP emphasizes integrated and active risk management as part of incremental planning and delivery.

“Risk management should serve as a driver for work scheduling, moving high-risk activities into earlier iterations of the project and incorporating risk mitigation activities into the backlog.”
Because risk-management activities are treated the same as other backlog items, work is always being done on reducing or containing risk based on negative value. But because risk management is built-in to different practice domains and into different tools and techniques, there’s no one place to understand how risk management should be done in an Agile project, and to assess whether it is being done well or not. You need to look at each practice area and how risk applies in each context. The way that it is organized makes it difficult to get your head around how risk management should be done in an Agile way – which is a source of risk in itself.

My criticisms aren’t of the study guide, which is well-written. They are of the PMI and the PMI-ACP framework. The PMI-ACP does put more emphasis on risk management than other descriptions of Agile development that I have seen so far. But it’s disappointing that the PMI did not take the opportunity to shore up a fundamental weakness in the Agile approach to development and recommend making risk management explicit, adding risk management activities to planning and reviews as a standard practice.

Some of these ideas are described in the Software Project Manager’s Bridge to Agility, a book that maps Agile development to PMI’s project management framework, and one of the books referenced in the PMI-ACP. But in the PMI-ACP as it is described, like most Agile development today, there’s too much reliance on the kind of risk management that comes for free in iterative and incremental development. This is probably enough for small teams working on simple application development projects, but that’s not the audience for PMI certification. Anyone using Agile methods on a larger scale or in high-risk development will need to look someplace else for help.

Wednesday, August 15, 2012

Rugged Implementation Guide - Strawman

The people behind Rugged Software have quietly made available a Strawman release of a “Rugged Implementation Guide”.

This guide is supposed to offer practical guidance on how to build a secure and reliable application following application security best practices. It is based on the idea of a “security story” for a system – a framework that identifies the technology and organization security lifelines and strategies, the defenses that implement these strategies, and assurance that the defenses are implemented correctly.

The guide walks through an example security story for the different lifelines and strategies at an online bank. These examples include just about every good idea and good practice you can think of, even including a “Simian Army” (of course) and a bug bounty program (in a bank… really?). It’s overwhelming. If doing all of this is what it takes to build “rugged software”, then unfortunately I don’t think very many (any?) organizations are going to succeed.

The cynical part of me noticed that the framework and examples were heavy on verification, especially third party verification: third party architecture reviews, third party application and network pen testing and code reviews. And comprehensive developer training. And the use of ESAPI and of course lots of other tools. All of this is good, and all of this is especially good for appsec vendors and appsec consultants (e.g., the people who wrote the guide).

I tried to understand the Implementation Guide in context of the “Rugged Handbook” which was also quietly released at the same time. There are some interesting and useful ideas in the Handbook - if you get past the introduction (I wasn’t able to the first time I tried to read it), which unfortunately contains pages and pages of stuff like “Work Together like an Ant Colony” and “Defend Yourself Like a Honey Badger”… Towards the end of the document the authors explain how to get people together to write a security story and how different people in the organization (executives, project managers, analysts, developers) should think about and take responsibility for application security.

Although there’s nothing in the Rugged Handbook on the Rugged Implementation Guide or how it is to be used, I think I understand that development organizations need to go out and create a “security story” that explains what they need to protect, how they are going to protect it, and how they will prove that they have protected it, using these two documents as help.

The “security story” framework is a good way of organizing and thinking about security for a system - once you get past the silly marketing junk about badgers or whatever. It looks both at security in the design of a system and at how the system is built and supported, targeted I guess more towards developers than security professionals.

So now what?

With these documents from Rugged, development managers have another, still incomplete description of how to do secure development to go along with OpenSAMM and BSIMM and Microsoft’s SDL. The over-worked part of me isn't happy that now there are more things developers have to think about and try in order to do their jobs properly.

I know that there is more work to be done on these documents and ideas. But so far I don’t understand yet how or why this is going to change the way that developers build software – maybe that was in the part about the honey badgers that I didn’t read...

Tuesday, August 14, 2012

What can you get out of Kanban?

I’ve spent the last year or so learning more about Kanban and how to use it in software development and IT operations.

It’s definitely getting a lot of attention, and I want to see if it can help our development and operations teams work better.

What to Read, What to Read?

There’s a lot to read on Kanban – but not all of it is useful. You can be safe in ignoring anything that tries to draw parallels between Japanese manufacturing and software development. Anyone who does this doesn’t understand Lean manufacturing or software development. Kanban (big K) isn’t kanban (little k) and software development isn’t manufacturing.

Kanban in software development starts with David Anderson’s book. It’s heavy on selling the ideas behind Kanban, but unfortunately light on facts and examples – mostly because there wasn’t a lot of experience and data to draw on at the time the book was written. Everything is built up from two case studies in small maintenance organizations.

The first case study (which you can read here online) goes back to 2005, at a small offshore sustaining engineering team in Microsoft. This team couldn’t come close to keeping up with demand, because of the insanely heavyweight practice framework that they were following (CMMI and TSP/PSP) and all of the paperwork and wasted planning effort that they were forced to do. They were spending almost half of their time on estimating and planning, and half of this work would never end up being delivered because more work was always coming in.

Anderson and another Microsoft manager radically simplified the team’s planning, prioritization and estimation approach, cutting out a lot of the bullshit so that the team could focus instead on actually delivering what was important to the customers.

You can skip over the theory – the importance of Drum-Buffer-Rope – Anderson changed his mind later on the theory behind the work anyways. As he says:

“This is a case study about implementing common sense changes where they were needed”.

The second case study is about Anderson doing something similar with another ad hoc maintenance team at Corbis a couple of years later. The approach and lessons were similar, in this case focusing more on visual tracking of work and moving away from time boxing for maintenance and break/fix work, using explicit WIP limits instead as the forcing function to control the team’s work.

The rest of the book goes into the details of tracking work on task boards, setting and adjusting limits, and just-in-time planning. There’s also some discussion of effecting organizational change and continuous improvement, which you can get from any other Agile/Lean source.

Corey Ladas’ book Scrumban is a short collection of essays on Kanban and combining Scrum and Kanban. The book is essentially built around one essay, which, taking a Lean approach to save time and money, I suggest that you read here instead. You don’t need to bother with the rest of the book unless you want to try and follow along as Ladas works his ideas out in detail. The basic ideas are the same as Anderson (not surprising, since they worked together at Corbis):

Don’t build features that nobody needs right now. Don’t write more code than you can test. Don’t test more code than you can deploy.

David Anderson has a new book on Kanban coming out soon. Unfortunately, it looks like a rehash of some of his blog posts with updated commentary. I was hoping for something with more case studies and data, and follow-ups on the existing case studies to see if they were able to sustain their success over time.

There is other writing on Kanban, by excited people who are trying it out (often in startups or small teams), or by consultants who have added Kanban to the portfolio of what they are selling – after all, “Kanban is the New Scrum”.

You can keep up with most of this through the Kanban weekly Roundup, which provides a weekly summary of links and discussion forums and news groups and presentations and training in Kanban and Lean stuff. And there’s a discussion group on Kanban development which is also worth checking out. But so far I haven’t found anything, on a consistent basis anyways, that adds much beyond what you will learn from reading Anderson’s first Kanban book.

Where does Kanban work best?

Kanban was created to help break/fix maintenance and sustaining engineering teams. Kanban’s pull-based task-and-queue work management matches the unpredictable, interrupt-driven and specialized work that these teams do. Kanban puts names and a well-defined structure around common sense ideas that most successful maintenance and support and operations teams already follow, which should help inexperienced teams succeed. It makes much more sense to use Kanban than to try to bend Scrum to fit maintenance and support, or to follow a heavyweight model like IEEE 1219 or whatever it is being replaced by.

I can also see why Kanban has become popular with technology startups, especially web / SaaS startups following Continuous Deployment and a Lean Startup model. Kanban is about execution and continuous optimization, removing bottlenecks and reducing cycle time and reacting to immediate feedback. This is exactly what a startup needs to do once they have come up with their brilliant idea.

But getting control over work isn’t enough, especially over the longer-term. Kanban’s focus is mostly tactical, on the work immediately in front of the team, on identifying and resolving problems at a micro-level. There’s nothing that gets people to put their heads up and look at what they can and should do to make things better on a larger-scale – before problems manifest themselves as delays. You still need a layer of product management and risk management over Kanban, and you’ll also need a technical software development practice framework like XP.

I also don’t see how – or why – Kanban makes much of a difference in large-scale development projects, where flow and tactical optimization aren’t as important as managing and coordinating all of the different moving pieces. Kanban won’t help you scale work across an enterprise or manage large programs with lots of interdependencies. You still have to do project and program and portfolio management and risk management above Kanban’s micro-structure for managing day-to-day work, and you still need an SDLC.

Do you need Kanban?

Kanban is a tool to solve problems in how people get their work done. Anderson makes it clear that Kanban is not enough in itself to manage software development regardless of the size of the organization – it’s just a way for teams to get their work done better:

“Kanban is not a software development life cycle or a project management methodology…You apply a Kanban system to an existing software development process…”

Kanban can help teams that are being crushed under a heavy workload, especially support and maintenance teams. Combining Kanban with a triage approach to prioritization would be a good way to get through a crisis.

But I don’t see any advantage in using Kanban for an experienced development team that is working effectively.

Limiting Work in Progress? Time boxing already puts limits around the work that development teams do at one time, giving the team a chance to focus and get things done.

Although some people argue that iterations add too much overhead and slow teams down, you can strip overheads down so that time boxing doesn’t get in the way - so that you really are sprinting.

Getting the team to understand the flow of work, making delays and blockers visible and explicit is an important part of Kanban. But Microsoft’s Eric Brechner (er, I mean, “I. M. Wright”) explains that you don’t need Kanban and taskboards to see delays and bottlenecks or to balance throughput in an experienced development team:

“Good teams do this intuitively to avoid wasted effort. They ‘right-size’ their teams and work collaboratively to succeed together.”

And anybody who is working iteratively and incrementally is already doing or should be doing just-in-time planning and prioritization.

So for us, Kanban doesn’t seem to be worth it, at least for now. If your development team is inexperienced and can’t deliver (or doesn’t know what it needs to deliver), or you’re in a small maintenance or firefighting group, Kanban is worth trying. If Kanban can help operations and maintenance teams survive, and help some online startups launch faster, if that’s all that people ever do with Kanban, the world will still be a better place.

Thursday, August 9, 2012

Bug Fixing – to Estimate, or not to Estimate, that is the question

According to Steve McConnell in Code Complete (data from 1975-1992) most bugs don’t take long to fix. About 85% of errors can be fixed in less than a few hours. Some more can be fixed in a few hours to a few days. But the rest take longer, sometimes much longer – as I talked about in an earlier post.

Given all of these factors and uncertainty, how do you estimate a bug fix? Or should you bother?

Block out some time for bug fixing

Some teams don’t estimate bug fixes upfront. Instead they allocate a block of time, some kind of buffer for bug fixing as a regular part of the team’s work, especially if they are working in time boxes. Developers come back with an estimate only if it looks like the fix will require a substantial change – after they’ve dug into the code and found out that the fix isn’t going to be easy, that it may require a redesign or require changes to complex or critical code that needs careful review and testing.

Use a rule of thumb placeholder for each bug fix

Another approach is to use a rough rule of thumb, a standard placeholder for every bug fix. Estimate ½ day of development work for each bug, for example. According to this post on Stack Overflow, the ½ day suggestion comes from Jeff Sutherland, one of the inventors of Scrum.

This placeholder should work for most bugs. If it takes a developer more than ½ day to come up with a fix, then they probably need help and people need to know anyways. Pick a placeholder and use it for a while. If it seems too small or too big, change it. Iterate. You will always have bugs to fix. You might get better at fixing them over time, or they might get harder to find and fix once you’ve got past the obvious ones.

Or you could use the data earlier from Capers Jones on how long it takes to fix a bug by the type of bug. A day or half day works well on average, especially since most bugs are coding bugs (on average 3 hours) or data bugs (6.5 hours). Even design bugs on average take only a little more than a day to resolve.

Collect some data – and use it

Steve McConnell, in Software Estimation: Demystifying the Black Art, says that it’s always better to use data than to guess. He suggests collecting time data for as little as a few weeks or maybe a couple of months on how long on average it takes to fix a bug, and using this as a guide for estimating bug fixes going forward.

If you have enough defect data, you can be smarter about how to use it. If you are tracking bugs in a bug database like Jira, and if programmers are tracking how much time they spend on fixing each bug for billing or time accounting purposes (which you can also do in Jira), then you can mine the bug database for similar bugs and see how long they took to fix – and maybe get some ideas on how to fix the bug that you are working on by reviewing what other people did on other bugs before you. You can group different bugs into buckets (by size – small, medium, large, x-large – or type) and then come up with an average estimate, and maybe even a best case, worst case and most likely for each type.
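
A small sketch of what that bucketing might look like – the hours here are made up, and in practice the raw data would come from a Jira query or export rather than being hard-coded:

```java
import java.util.*;

// A sketch of mining time-tracking data into size buckets. The hours are
// invented; in practice they would come out of a Jira query or export.
public class BugFixStats {

    public static void main(String[] args) {
        Map<String, List<Double>> hoursByBucket = new HashMap<String, List<Double>>();
        record(hoursByBucket, "small", 1.5);
        record(hoursByBucket, "small", 3.0);
        record(hoursByBucket, "medium", 6.0);
        record(hoursByBucket, "large", 14.0);
        record(hoursByBucket, "large", 30.0);   // the occasional outlier

        for (Map.Entry<String, List<Double>> e : hoursByBucket.entrySet()) {
            List<Double> hours = e.getValue();
            double total = 0, worst = 0;
            for (double h : hours) {
                total += h;
                worst = Math.max(worst, h);
            }
            // Average for the estimate, worst case so nobody is surprised.
            System.out.printf("%-6s avg %.1fh, worst %.1fh (%d bugs)%n",
                    e.getKey(), total / hours.size(), worst, hours.size());
        }
    }

    private static void record(Map<String, List<Double>> byBucket, String bucket, double hours) {
        List<Double> list = byBucket.get(bucket);
        if (list == null) {
            list = new ArrayList<Double>();
            byBucket.put(bucket, list);
        }
        list.add(hours);
    }
}
```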

Use Benchmarks

For a maintenance team (a sustaining engineering or break/fix team responsible for software repairs only), you could use industry productivity benchmarks to project how many bugs your team can handle. Capers Jones in Estimating Software Costs says that the average programmer (in the US, in 2009), can fix 8-10 bugs per month (of course, if you’re an above-average programmer working in Canada in 2012, you’ll have to set these numbers much higher). Inexperienced programmers can be expected to fix 6 a month, while experienced developers using good tools can fix up to 20 per month.

If you’re focusing on fixing security vulnerabilities reported by a pen tester or a scan, check out the remediation statistical data that Denim Group has started to collect, to get an idea on how long it might take to fix a SQL injection bug or an XSS vulnerability.

So, do you estimate bug fixes, or not?

Because you can’t estimate how long it will take to fix a bug until you’ve figured out what’s wrong, and most of the work in fixing a bug involves figuring out what’s wrong, it doesn’t make sense to try to do an in-depth estimate of how long it will take to fix each bug as they come up.

Using simple historical data, a benchmark, or even a rough guess place holder as a rule-of-thumb all seem to work just as well. Whatever you do, do it in the simplest and most efficient way possible, don’t waste time trying to get it perfect – and realize that you won’t always be able to depend on it.

Remember the 10x rule – some outlier bugs can take up to 10x as long to find and fix than an average bug. And some bugs can’t be found or fixed at all – or at least not with the information that you have today. When you’re wrong (and sometimes you’re going to be wrong), you can be really wrong, and even careful estimating isn’t going to help. So stick with a simple, efficient approach, and be prepared when you hit a hard problem, because it's gonna happen.

Wednesday, August 8, 2012

Fixing Bugs that can’t be Reproduced

There are bugs that can’t be reproduced, or at least not easily: intermittent and transient errors; bugs that disappear when you try to look for them; bugs that occur as the result of a long chain of independent operations or cross-request timing. Some of these bugs are only found in high-scale production systems that have been running for a long time under heavy load.

Capers Jones calls these bugs “abeyant defects” and estimates that in big systems, as much as 10% of bugs cannot be reproduced, or are too expensive to try reproducing. These bugs can cost 100x more to fix than a simple defect – an “average” bug like this can take more than a week for somebody to find (if they can be found at all) by walking through the design and code, and up to another week or two to fix.

Heisenbugs

One class of bugs that can’t be reproduced are Heisenbugs: bugs that disappear when you attempt to trace or isolate them. When you add tracing code, or step through the problem in a debugger, the problem goes away.

In Debug It!, Paul Butcher offers some hope for dealing with these bugs. He says Heisenbugs are caused by non-deterministic behavior which in turn can only be caused by:

  1. Unpredictable initial state – a common problem in C/C++ code
  2. Interaction with external systems – which can be isolated and stubbed out, although this is not always easy
  3. Deliberate randomness – random factors can also be stubbed out in testing
  4. Concurrency – the most common cause of Heisenbugs today, at least in Java.

Knowing this (or at least making yourself believe it while you are trying to find a problem like this) can help you decide where to start looking for a cause, and how to go forward. But unfortunately, it doesn’t mean that you will find the bug, at least not soon.
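
One practical takeaway from causes 2 and 3 in the list above is to inject the sources of non-determinism instead of reaching for them directly, so that a test can pin them down. A minimal sketch, with hypothetical names:

```java
import java.util.Random;

// A minimal sketch of taming causes 2 and 3: wrap the time source and inject
// the Random, so a test can freeze both. All names here are hypothetical.
public class DeterministicService {

    interface Clock { long now(); }   // stand-in for an external dependency

    private final Random random;      // deliberate randomness, injected
    private final Clock clock;        // interaction with the outside world, injected

    DeterministicService(Random random, Clock clock) {
        this.random = random;
        this.clock = clock;
    }

    boolean offerFlashDiscount() {
        boolean businessHours = (clock.now() / 3_600_000L) % 24 >= 9;
        return businessHours && random.nextInt(100) == 0;
    }

    public static void main(String[] args) {
        // A fixed seed and a frozen clock make every run behave identically.
        DeterministicService svc =
                new DeterministicService(new Random(42L), () -> 1_344_211_200_000L);
        System.out.println(svc.offerFlashDiscount());   // same answer, every run
    }
}
```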

Race Conditions

Race conditions used to be problems only for systems programmers and people writing communications handlers. But almost everybody runs into race conditions today – is anybody writing code that isn’t multi-threaded anymore? Races are synchronization errors that occur when two or more threads or processes access the same data or resource without a consistent locking approach.
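
Here’s about the smallest race you can write in Java – two threads incrementing a shared counter with no locking. Run it a few times and the total usually comes up short, because `count++` is a read-modify-write and updates get lost:

```java
// A minimal lost-update race: two threads, one shared counter, no locking.
public class LostUpdateRace {

    static int count = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                count++;   // not atomic: read, add, write
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("Expected 200000, got " + count);   // usually less
    }
}
```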

Races can result in corrupted data – memory getting stomped on (especially in C/C++), changes applied more than once (balances going negative) or changes being lost (credits without debits) – inconsistent UI behavior, random crashes due to null pointer problems (a thread references an object that has already been freed by another thread), intermittent timeouts, and actions executed out of sequence, including time-of-check time-of-use security violations.

The results will depend on which thread or process wins the race a particular time. Because races are the result of unlucky timing, and because the problem may not be visible right away (e.g., something gets stepped on but you don’t know until much later, when something else tries to use it), they’re usually hard to understand and hard to make happen.

You can’t fix (or probably even find) a race condition without understanding concurrency. And the fact that you have a race condition is a good sign that whoever wrote this code didn’t understand concurrency, so you’re probably not dealing with only one mistake. You’ll have to be careful in diagnosing and especially in fixing the bug, to make sure that you don’t change the problem from a race into a stall, thread starvation or livelock, or deadlock instead, by getting the synchronization approach wrong.

Fixing bugs that can’t be reproduced

If you think you’re dealing with a concurrency problem, a race condition or a timing-related bug, try introducing long pauses between different threads – this will expand the window for races and timing-related problems to occur, which should make the problem more obvious. This is what IBM Research’s ConTest tool does (or did - unfortunately, ConTest seems to have disappeared off of the IBM alphaWorks site), messing with thread scheduling to make deadlocks and races occur more often.
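
A sketch of the pause trick by hand – a sleep dropped between the read and the write turns a once-in-thousands-of-runs lost update into one that happens almost every time (the example is contrived, and the pause comes out once the bug is understood):

```java
// Widening a race window by hand: the sleep between the read and the write
// makes the lost update happen almost every run instead of almost never.
public class WidenedRace {

    static int balance = 100;

    static void withdraw(int amount) throws InterruptedException {
        int read = balance;        // read...
        Thread.sleep(50);          // ...pause where we suspect the race is...
        balance = read - amount;   // ...write: the other thread's update is lost
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            try { withdraw(10); } catch (InterruptedException ignored) { }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println("Expected 80, got " + balance);   // almost always 90
    }
}
```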

If you can’t reproduce the bug, that doesn’t mean that you give up. There are still some things to look at and try.

It’s often faster to find concurrency bugs, timing problems and other hard problems by working back from the error and stepping through the code – in a debugger or by hand – to build up your own model of how the code is supposed to work.

“Even if you can’t easily reproduce the bug in the lab, use the debugger to understand the affected code. Judiciously stepping into or over functions based on your level of “need to know” about that code path. Examine live data and stack traces to augment your knowledge of the code paths.” Jeff Vroom, Debugging Hard Problems

I’ve worked with brilliant programmers who can work back through even the nastiest code, tracing through what’s going on until they see the error. For the rest of us, debugging is a perfect time to pair up. While I'm not sure that pair programming makes a lot of sense as an everyday practice for everyday work, I haven’t seen too many ugly bugs solved without two smart people stepping through the code and logs together.

It’s also important to check compiler and static analysis warnings – these tools can help point to easily-overlooked coding mistakes that could be the cause of your bug, and maybe even other bugs. Findbugs has static concurrency bug pattern checkers that can point out common concurrency bug patterns, as well as lots of other coding mistakes that you can miss finding on your own.

There are also dynamic analysis tools that are supposed to help find race conditions and other kinds of concurrency bugs at run-time, but I haven’t seen any of them actually work. If anybody has had success with tools like this in real-world applications I would like to hear about it.

Shotgun Debugging, Offensive Programming, Brute-Force Debugging and other ideas

Debug It! recommends that if you are desperate, just take a copy of the code, and try changing something, anything, and see what happens. Sometimes the new information that results from your change may point you in a new direction. This kind of undirected “Shotgun Debugging” is basically wishful thinking, relying on luck and accidental success, what Andy Hunt and Dave Thomas call “Programming by Coincidence”. It isn’t something that you want to rely on, or be proud of.

But there are some changes that do make a lot of sense to try. In Code Complete, Steve McConnell recommends making “offensive coding” changes: adding asserts and other run-time debugging checks that will cause the code to fail if something “impossible” happens, because something “impossible” apparently is happening.
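
A minimal sketch of what an offensive check can look like – the ledger domain here is hypothetical:

```java
// A sketch of an offensive check - the ledger domain is hypothetical.
public class AccountLedger {

    private long balanceCents;

    void post(long deltaCents) {
        balanceCents += deltaCents;

        // "Impossible" by design, because postings are validated upstream.
        // But we're chasing a bug that says otherwise, so fail loudly, right here.
        if (balanceCents < 0) {
            throw new IllegalStateException("Impossible state: balance "
                    + balanceCents + " after posting " + deltaCents);
        }
        // A plain assert works too, but only runs when the JVM gets the -ea flag:
        // assert balanceCents >= 0 : "negative balance " + balanceCents;
    }
}
```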

Jeff Vroom suggests writing your own debugging code:

“For certain types of complex code, I will write debugging code, which I put in temporarily just to isolate a specific code path where a simple breakpoint won’t do. I’ve found using the debugger’s conditional breakpoints is usually too slow when the code path you are testing is complicated. You may hit a specific method 1000's of times before the one that causes the failure. The only way to stop in the right iteration is to add specific code to test for values of input parameters… Once you stop at the interesting point, you can examine all of the relevant state and use that to understand more about the program.”

Paul Butcher suggests that before you give up trying to reproduce and fix a problem, see if there are any other bugs reported in the same area and try to fix them – even if they aren’t serious bugs. The rationale: fixing other bugs may clear up the situation (your bug was being masked by another one); and by working on something else in the same area, you may learn more about the code and get some new ideas about your original problem.

Refactoring or even quickly rewriting some of the code where you think the problem might be can sometimes help you to see the problem more clearly, especially if the code is difficult to follow. Or it could possibly move the problem somewhere else, making it easier to find.

If you can’t find it and fix it, at least add defensive code: tighten up error handling and input checking, and add some logging – if something is happening that can’t happen or won’t happen for you, you need more information on where to look when it happens again.
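
And a sketch of the defensive version, using java.util.logging – the handler and its checks are hypothetical, but the idea is that the next occurrence in production leaves evidence behind:

```java
import java.util.logging.Logger;

// The defensive version: tighten the input checks and record enough context
// that the next occurrence tells you where to look. OrderHandler is hypothetical.
public class OrderHandler {

    private static final Logger LOG = Logger.getLogger(OrderHandler.class.getName());

    void handle(String orderId, int quantity) {
        if (orderId == null || quantity <= 0) {
            // "Can't happen" - but when it does, capture the evidence.
            LOG.warning("Rejected bad order: id=" + orderId + " qty=" + quantity
                    + " thread=" + Thread.currentThread().getName());
            return;
        }
        // ... normal processing ...
    }
}
```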

Question Everything

Finally, The Pragmatic Programmer tells us to remember that nothing is impossible. If you aren’t getting closer to the answer, question your assumptions. Code that has always worked might not work in this particular case. Unless you have a complete trace of the problem in the original report, it’s possible that the report is incomplete or misleading – that the problem you need to solve is not the problem that you are looking at.

If you're trapped in a real debugging nightmare, look outside of your own code. The bug may not be in your code, but in underlying third party code in your service stack or the OS. In big systems under heavy load, I’ve run into problems in the operating system kernel, web servers, messaging middleware, virtual machines, and the DBMS. Your job when debugging a problem like this is to make sure that you know where the bug isn’t (in your code), and try to come up with as simple a test case as possible that shows this – when you’re working with a vendor, they’re not going to be able to set up and run a full-scale enterprise app in order to reproduce your problem.

Hard bugs that can't be reproduced can blow schedules and service levels to hell, and wear out even your best people. Taking all of this into account, let’s get back to the original problem of how or whether to estimate bug fixes, in the next post.

Tuesday, August 7, 2012

Fixing Bugs - if you can't reproduce them, you can't fix them

"Generally, if you can’t reproduce it, it’s almost impossible to fix". Anonymous programmer, Practices of Software Maintenance, Janice Singer
Fixing a problem usually starts with reproducing it – what Steve McConnell calls “stabilizing the error”.

Technically speaking, you can’t be sure you are fixing the problem unless you can run through the same steps, see the problem happen yourself, fix it, and then run through the same steps and make sure that the problem went away. If you can’t reproduce it, then you are only guessing at what’s wrong, and that means you are only guessing that your fix is going to work.

But let’s face it – it’s not always practical or even possible to reproduce a problem. Lots of bug reports don’t include enough information for you to understand what the hell the problem actually was, never mind what was going on when the problem occurred – especially bug reports from the field. Rahul Premraj and Thomas Zimmermann found in The Art of Collecting Bug Reports (from the book Making Software), that the two most important factors in determining whether a bug report will get fixed or not are:

  1. Is the description well-written, can the programmer understand what was wrong or why the customer thought something was wrong?
  2. Does it include steps to reproduce the problem, even basic information about what they were doing when the problem happened?
It’s not a lot to ask – from a good tester at least. But you can’t reasonably expect this from customers.

There are other cases where you have enough information, but don’t have the tools or expertise to reproduce a problem – for example, when a pen tester has found a security bug using specialist tools that you don’t have or don’t understand how to use.

Sometimes you can fix a problem without being able to see it happen in front of you – coming up with a theory on your own, trusting your gut – especially if this is code that you recently worked on. But reproducing the problem first gives you the confidence that you aren’t wasting your time and that you actually fixed the right thing. Trying to reproduce the problem should almost always be your first step.

What’s involved in reproducing a bug?

What you want to do is to find, as quickly as possible, a simple test that consistently shows the problem, so that you can then run a set of experiments, trace through the code, isolate what’s wrong, and prove that it went away after you fixed the code.

The best explanation that I’ve found of how to reproduce a bug is in Debug It! where Paul Butcher patiently explains the pre-conditions (identifying the differences between your test environment and the customer’s environment, and trying to control as many of them as possible), and then how to walk backwards from the error to recreate the conditions required to make the problem happen again. Butcher is confident that if you take a methodical approach, you will (almost) always be able to reproduce the problem successfully.

In Why Programs Fail: A Guide to Systematic Debugging, Andreas Zeller, a German Comp Sci professor, explains that it’s not enough just to make the problem happen again. Your goal is to come up with the simplest set of circumstances that will trigger the problem – the smallest set of data and dependencies, the simplest and most efficient test(s) with the fewest variables, the shortest path to making the problem happen. You need to understand what is not relevant to the problem, what’s just noise that adds to the cost and time of debugging and testing – and get rid of it. You do this using binary techniques to slice up the input data set, narrowing in on the data and other variables that you actually need, repeating this until the problem starts to become clear.
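
Here’s a very reduced sketch of the binary slicing idea – keep cutting the failing input in half, keeping whichever half still triggers the failure, until no smaller slice fails. Zeller’s ddmin algorithm is a smarter and more thorough version of this; the fails-on-13 “bug” is invented for illustration:

```java
import java.util.Arrays;

// A very reduced sketch of binary slicing over a failing input. (Zeller's
// ddmin algorithm also handles failures that need parts of both halves.)
public class InputMinimizer {

    interface FailureTest { boolean failsOn(int[] input); }

    static int[] minimize(int[] input, FailureTest test) {
        while (input.length > 1) {
            int mid = input.length / 2;
            int[] left = Arrays.copyOfRange(input, 0, mid);
            int[] right = Arrays.copyOfRange(input, mid, input.length);
            if (test.failsOn(left)) {
                input = left;            // the failure lives in the left half
            } else if (test.failsOn(right)) {
                input = right;           // or in the right half
            } else {
                break;                   // needs both halves - this sketch stops here
            }
        }
        return input;
    }

    public static void main(String[] args) {
        // Hypothetical bug: the code under test fails whenever the input contains 13.
        FailureTest failsOnThirteen = in -> Arrays.stream(in).anyMatch(v -> v == 13);
        int[] smallest = minimize(new int[] {4, 8, 13, 21, 42, 99}, failsOnThirteen);
        System.out.println(Arrays.toString(smallest));   // prints [13]
    }
}
```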

Code Complete’s chapter on Debugging is another good guide on how to reproduce a problem following a set of iterative steps, and how to narrow in on the simplest and most useful set of test conditions required to make the problem happen; as well as common places to look for bugs: checking for code that has been changed recently, code that has a history of other bugs, code that is difficult to understand (if you find it hard to understand, there’s a good chance that the programmers who worked on it before you did too).

Replay Tools

One of the most efficient ways to reproduce a problem, especially in server code, is by automatically replaying the events that led up to the problem. To do this you’ll need to capture a time-sequenced record of what happened, usually from an audit log, and a driver to read and play the events against the system. And for this to work properly, the behavior of the system needs to be deterministic – given the same set of inputs in the same sequence, the same results will occur each time. Otherwise you’ll have to replay the logs over and over and hope for the right set of circumstances to occur again.
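
The driver itself can be almost trivial if the system was built for it. A sketch, with a hypothetical event format, that reads a time-ordered log and pushes each event through the same handler the live system uses:

```java
import java.util.Arrays;
import java.util.List;

// A replay driver sketch: feed a time-ordered audit log back through the
// system's own event handler. The event format here is hypothetical, and
// this only reproduces bugs if the handler is deterministic.
public class ReplayDriver {

    interface EventHandler { void onEvent(String event); }

    static void replay(List<String> auditLog, EventHandler handler) {
        for (String event : auditLog) {
            handler.onEvent(event);   // same inputs, in the same order as production
        }
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "09:30:00|LOGIN|alice",
                "09:30:05|ORDER|alice|WIDGET|3",
                "09:30:06|CANCEL|alice|WIDGET");   // events leading up to the bug

        replay(log, event -> System.out.println("replaying: " + event));
    }
}
```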

On one system that I worked on, the back-end engine was a deterministic state machine designed specifically to support replay. All of the data and events, including configuration and control data and timer events, were recorded in an inbound event log that we could replay. There were no random factors or unpredictable external events – the behavior of the system could always be recreated exactly by replaying the log, making it easy to reproduce bugs from the field. It was a beautiful thing, but most code isn’t designed to support replay in this way.

Recent research in virtual machine technology has led to the development of replay tools to snapshot and replay events in a virtual machine. VMWare Workstation, for example, included a cool replay debugging facility for C/C++ programmers which was “guaranteed to have instruction-by-instruction identical behavior each time.” Unfortunately, this was an expensive thing to make work, and it was dropped in version 8, at the end of last year.

Replay Solutions provides replay for Java programs, creating a virtual machine to record the complete stream of events (including database I/O, network I/O, system calls and interrupts) as the application is running, and then letting you replay the same events against a copy of the running system so that you can debug the application and observe its behavior. They also offer similar record-and-replay technology for mobile HTML5 and JavaScript applications. This is exciting stuff, especially for complex systems where it is difficult to set up and reproduce problems in different environments.

Fuzzing and Randomness

If the problem is non-deterministic, or you can't come up with the right set of inputs, one approach is to feed the program random data and watch what happens, hoping to hit on a combination of inputs that triggers the problem. This is called fuzzing. Fuzzing is a brute-force testing technique used to uncover data validation weaknesses that can cause reliability and security problems. It's effective at finding bugs, but it’s a terribly inefficient way to reproduce a specific problem.

First you need to set up something to fuzz the inputs. This is easy if the program reads from a file or a web form (there are fuzzing tools to help with this), but a hassle if you need to write your own smart protocol fuzzer to test internal APIs. Then you need time to run through all of the tests (with mutation fuzzing, you may need tens of thousands or hundreds of thousands of runs to get enough interesting combinations), and more time to sift through the results and understand any problems that are found.
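To give a feel for how little machinery a basic mutation fuzzer needs, here’s a sketch for a program that reads its input from a file. The seed file, the output file names and the "target" command are all placeholders for your own setup:

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Random;

    // Bare-bones mutation fuzzer: flip a few random bits in a copy of a
    // known-good input file, run the target program against it, and save
    // any input that makes the target exit abnormally.
    public class MutationFuzzer {
        public static void main(String[] args) throws Exception {
            byte[] seed = Files.readAllBytes(Paths.get("seed.dat"));
            Random random = new Random();

            for (int run = 0; run < 100000; run++) {
                byte[] mutated = seed.clone();
                for (int i = 0; i < 4; i++) {   // flip a handful of random bits
                    mutated[random.nextInt(mutated.length)] ^= (byte) (1 << random.nextInt(8));
                }
                Files.write(Paths.get("fuzzed.dat"), mutated);

                Process target = new ProcessBuilder("target", "fuzzed.dat").start();
                if (target.waitFor() != 0) {    // non-zero exit: crash or error
                    Files.copy(Paths.get("fuzzed.dat"), Paths.get("crash-" + run + ".dat"));
                }
            }
        }
    }

Even a toy like this can shake out crashes in a file parser; the expensive part, as above, is running enough iterations and then making sense of what falls out.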

Through fuzzing you will get new information about the system that helps you identify problem areas in the code, and maybe find new bugs, but you may not end up any closer to fixing the problem that you started on.

Reproducing problems, especially when you are working from a bad bug report (“the system was running fine all day, then it crashed… the error said something about a null pointer I think?”), can be a serious time sink. But what if you can’t reproduce the problem at all? Let’s look at that next…

Monday, August 6, 2012

Ask the Expert - John Steven on Threat Modeling

John Steven closes out a series of interviews with appsec experts on the challenges in security threat modeling and how teams can succeed with threat modeling. You can read the interview with John here.

Thursday, August 2, 2012

Fixing Bugs – there’s no substitute for experience

We've all heard that the only way to get good at fixing bugs is through experience – the school of hard knocks. Experienced programmers aren’t afraid, because they’ve worked on hard problems before, and they know what to try when they run into another one: what’s worked for them in the past, what hasn’t, what they’ve seen other programmers try, and what they learned from them.

They’ve built up their own list of bug patterns and debugging patterns, and checklists and tools and techniques to follow. They know when to try a quick-and-dirty approach and trust their gut, and when to be methodical, patient and scientific. They understand how to do binary slicing to reduce the size of the problem set. They know how to read traces and dump files. And they know the language and tools that they are working with.

It takes time and experience to know where to start looking, how to narrow in on a problem; what information is useful, what isn’t and how to tell the difference. And how to do all of this fast. We’re back to knowing where to tap the hammer again.

But how much of a difference does experience really make?

Steve McConnell’s Code Complete is about programmer productivity: what makes some programmers better than others, and what all programmers can do to get better. The research that he draws on shows that there can be as much as a 10x difference in the quality, quantity and speed of work between the best programmers and programmers who don't know what they are doing.

Debugging is one of the areas that really show this difference, that separate the men from the boys and the women from the girls. Studies have found a 20-to-1 or even 25-to-1 difference in the time it takes experienced programmers to find the same set of defects found by inexperienced programmers. That’s not all. The best programmers also find significantly more defects and make far fewer mistakes when putting in fixes.

What’s more important: experience or good tools?

In Applied Software Measurement, Capers Jones looks at 4 different factors that affect the productivity of programmers finding and fixing bugs:

  1. Experience in debugging and maintenance
  2. How good – or bad – the code structure is
  3. The language and platform
  4. Whether the programmers have good code management and debugging tools – and know how to use them.

Jones measures the debugging and bug fixing ability of a programmer by measuring assignment scope – the average amount of code that one programmer can maintain in a year. He says that the average programmer can maintain somewhere around 1,000 function points per year – about 50,000 lines of Java code.

Let’s look at some of this data to understand how much of a difference experience makes in fixing bugs.

Inexperienced staff, poor structure, high-level language, no maintenance tools (function points maintained per programmer per year):

Worst: 150   Average: 300   Best: 500

Experienced staff, poor structure, high-level language, no maintenance tools:

Worst: 1,150   Average: 1,850   Best: 2,800

This data shows about a 6:1 difference on average between experienced and inexperienced programmers working with badly structured code and without good maintenance tools, and almost 20:1 if you compare the best experienced programmers (2,800) to the worst inexperienced ones (150). Now let’s look at the difference good tools can make:

Inexperienced staff, poor structure, high-level language, good tools:

Worst: 900   Average: 1,400   Best: 2,100

Experienced staff, poor structure, high-level language, good tools:

Worst: 2,100   Average: 2,800   Best: 4,500

Using good tools for code navigation and refactoring, reverse engineering, profiling and debugging helps to level the playing field between novice programmers and experts.

You’d have to be an idiot to ignore your tools (debuggers are for losers? Seriously?). But even with today’s good tools, an experienced programmer will still win out: 2x more productive on average (2,800 vs 1,400 function points per year), and 5x comparing the best case to the worst (4,500 vs 900).

The difference can be effectively infinite in some cases. There are some bugs that an inexperienced programmer can’t solve at all – they have no idea where to look or what to do. They just don’t understand the language or the platform or the code or the problem well enough to be of any use. And they are more likely to make things worse by introducing new bugs trying to fix something than they are to fix the bug in the first place. There’s no point in even asking them to try.

You can learn a lot about debugging from a good book like Debug It! or Code Complete. But when it comes to fixing bugs, there’s no substitute for experience.