Building Real Software: How long can this go on?

Our team delivers software iteratively and incrementally, and over the past 3 years we have experimented with longer (1-2 months) and shorter (1-2 week) iterations, adjusting to circumstances, looking for the proper balance between cost and control.

There are obvious costs in managing an iteration: startup activities (planning, prioritization, kickoff, securing the team's commitment to the goals of the release), technical overheads like source code and build management (branching and merging and associated controls), status reporting to stakeholders, end-to-end system and integration testing, and closure activities (retrospectives, resetting). We don’t just deliver “ship-quality” software at the end of an iteration: in almost every case we go all the way to releasing the code to production, so our costs also include packaging, change control, release management, security and operations reviews, documentation updates and release notes and training, certifications with partners, data conversion, rollback testing, and pre- and post-implementation operations support. Yep, that’s a lot of work.

All of these costs are balanced against control: our ability to manage and contain risks to the project, to the product, and to the organization. I explored how to manage risks through iterative, incremental development in an earlier post on risk management.

We’ve found that if an iteration is too long (a month or more), it is hard to defend the team from changes to priorities, to prevent new requirements from coming in and disrupting the team’s focus. And in a longer cycle, there are too many changes and fixes that need to be reviewed and tested, increasing the chance of mistakes or oversights or regressions.

Shorter releases are easier to manage because, of course, they are necessarily smaller. We can manage the pressure from the business-side for changes because of the fast delivery cycle (except for emergency hot fixes, we are usually able to convince our product owner, and ourselves, to wait for the next sprint since it is only a couple of weeks away) and it is easier for everyone to get their heads around what was changed in a release and how to verify it. And shorter cycles keep us closer to our customers, not only giving us faster feedback, but demonstrating constant value. I like to think of it as a “value pipeline”, continuously streaming business value to customers.

One of my favorite books on software project management, Johanna Rothman’s Manage It!, recommends making increments smaller to get faster feedback – feedback not just on the product, but on how you build it and how you can improve. The smaller the iteration, the easier to look at it from beginning-to-end and see where time is wasted, what works, what doesn’t, where time is being spent that isn’t expected.

“Shorter timeboxes will make the problems more obvious so you can solve them.”

Ms. Rothman recommends using the “Divide-by-Two Approach to Reduce Iteration Size”: if the iterations aren’t succeeding, divide the length in half, so 6 weeks becomes 3 weeks, and so on. Smaller iterations provide feedback - longer ones mask the problems.

Ms. Rothman also says that it is difficult to establish a rhythm for the team if iterations are too long. In “Selecting the Right Iteration Length for Your Software Development Process”, Mike Cohn of Mountain Goat Software examines the importance of establishing a rhythm in incremental development. He talks about the need for a sense of urgency: if an iteration is too long, it takes too much time for the team to “warm up” and take things seriously. Of course, this needs to be balanced against keeping the team in a constant state of emergency, and burning everyone out.

Some of the other factors that Mr. Cohn finds important in choosing an iteration length:
- how long can you go without introducing change – avoiding requirements churn during an iteration.
- if cycles are too short (for example, a week) small issues, like a key team member coming down with a cold, can throw the team’s rhythm off and impact delivery.

All of this supports our experience: shorter (but not too-short) cycles help establish a rhythm and build the team’s focus and commitment, constantly driving to delivering customer value. And shorter cycles help manage change and risk.

Now we are experimenting with an aggressive, fast-tracked delivery model: a 3-week end-to-end cycle, with software delivered to production every 2 weeks. The team starts work on designing and building the next release while the current release is in integration, packaging and rollout, overlapping development and release activities. Fast-tracking is difficult to manage, and can add risk if not done properly. But it does allow us to respond quickly to changing business demands and priorities, while giving us time for an intensive but efficient testing and release management process.

We'll review how this approach works over the next few months and change it as necessary, but we intend to continue with short increments. However, I am concerned about the longer-term risks, the potential future downsides to our rapid delivery model.

In The Decline and Fall of Agile James Shore argues that rapid cycling short cuts up-front design:

“Up-front design doesn't work when you're using short cycles, and Scrum doesn't provide a replacement. Without continuous, incremental design, Scrum teams quickly dig themselves a gigantic hole of technical debt. Two or three years later, I get a call--or one of my colleagues does. "Changes take too long and cost too much!" I hear. "Teach us about test-driven development, or pairing, or acceptance testing!" By that time, fixing the real problems requires paying back a lot of technical debt, and could take years.”

While Mr. Shore is specifically concerned about loose implementations of Scrum, and its lack of engineering practices compared with other approaches like XP (see also Martin Fowler of ThoughtWorks on the risks of incremental development without strong engineering discipline), the problem is a general one for teams working quickly, in short iterations: even with good engineering discipline, rapid cycling does not leave a lot of time for architecture, design and design reviews, test planning, security reviews... all of those quality gating activities that waterfall methods support. This is a challenge for secure software development, as there is little guidance available on effectively scaling software security SDLC practices to incremental, agile development methods, something that I will explore more later.

Trying to account for architecture and platform decisions and tooling and training in an upfront “iteration zero” isn’t enough, especially if your project is still going strong after 2 or 3 years. What I worry about (and I worry about a lot of things) is that, moving rapidly from sprint to sprint, the team cannot stop and look at the big picture, to properly re-assess architecture and platform technology decisions made earlier. Instead all the team has a chance to do is make incremental, smaller-scale improvements (tighten up the code here, clean up an interface there, upgrade some of the technology stack), which may leave fundamental questions unanswered, trading off short-term goals (deliver value, minimize the cost and risk of change) with longer-term costs and uncertainties.

One of the other factors that could affect quality in the longer term is the pressure on the team to deliver in a timebox. In Technical Debt: Warning Signs, Catherine Powell raises the concern that developers committing to a date may put schedule ahead of quality:

“Once you've committed to a release date and a feature set, it can be hard to change. And to change it because you really want to put a button on one more screen? Not likely. The "we have to ship on X because X is the date" mentality is very common (and rightly so - you can't be late forever because you're chasing perfection). However, to meet that date you're likely to cut corners, especially if you've underestimated how much time the feature really takes, or how much other stuff is going on.”

Finally, I am concerned that rapid cycling does not give the team sufficient opportunities to pause, to take a breath, to properly reset. If they are constantly moving heads down from one iteration to another, do team members really have a chance to reflect, understand and learn? One of the reasons that I maintain this blog is exactly for this: to explore problems and questions that my team and I face; to research, to look far back and far ahead, without having to focus on the goals and priorities of the next sprint.

These concerns, and others, are explored in Traps & Pitfalls of Agile Development - a Non-Contrarian View:

"Agile teams may be prone to rapid accumulation of technical debt. The accrual of technical debt can occur in a variety of ways. In a rush to completion, Iterative development is left out. Pieces get built (Incremental development) but rarely reworked. Design gets left out, possibly as a backlash to BDUF. In a rush to get started building software, sometimes preliminary design work is insufficient. Possibly too much hope is placed in refactoring. Refactoring gets left out. Refactoring is another form of rework that often is ignored in the rush to complete. In summary, the team may move too fast for it's own good."

Our team’s challenge is not just to deliver software quickly: like other teams that follow these practices, we’ve proven that we can do that. Our challenge is to deliver value consistently, at an extremely high level of quality and reliability, on a continual and sustainable basis. Each design and implementation decision has to be made carefully: if your customer's business depends on you making changes quickly and perfectly, without impacting their day-to-day operations, how much risk can you afford to take on to make changes today so that the system may be simpler and easier to change tomorrow, especially in today's business environment? It is a high stakes game we're playing. I recognize that this is a problem of debt management, and I'll explore the problems of technical debt and design debt more later.

The practices that we have followed have worked well for us so far. But is there a point where rapid development cycles, even when following good engineering practices, provide diminishing returns? When does the accumulation of design decisions made under time pressure, and conscious decisions to minimize the risk of change, add up to bigger problems? Does developing software in short cycles, with a short decision-making horizon, necessarily result in long-term debt?

Building Real Software

Wednesday, June 17, 2009

How long can this go on?

No comments:

Post a Comment