Thursday, October 10, 2013

Don't You Know that Support is the Most Important Part of a Developer’s Job?

Agile development – because you are building working software faster and delivering it incrementally – forces development teams to face a common, fundamental problem: how to balance the work of developing new software with the need to support a system that is already being used in production, whether it’s the legacy system that you’re replacing, or the system that you are still building – and sometimes both.

This is especially a problem for Agile teams following Scrum. On the one hand, in order for the team to meet Sprint goals and commitments and to establish a velocity for future planning, the team is not supposed to be interrupted while they are doing their work. On the other hand, the point of working iteratively and incrementally in Scrum is to deliver working software early and frequently to the customer, who will want to use this software as soon as they can, and who will then need support and help using the software – help and support that needs to come from the people who wrote the software.

At some point, often still early in developing a system, these teams have to stop working in a bubble with their pretend Customer, and start working in the real world with real customers who have real demands.

Supporting Customers and Still Building New Software

This means that teams have to find a way to juggle support and maintenance work with design and development, to deal with rapidly changing priorities and interruptions and complaints and questions and the stress of fire fighting when things break, while still trying to deliver good quality software and hit deadlines.

It’s not easy to balance two completely different kinds of work with directly opposed goals and incentives and metrics. As Don Schueler explains in the “The Fragile Balance between Agile Development and Customer Support”, development teams – even Agile teams working closely with their Customer – are mostly inward-looking, internally focused on delivery and velocity and cost and code quality and technical concerns. Support teams are outward-looking, focused on customer relationships and customer experience and completeness and minimizing operational risk.

Development is about being predictable and efficient: deliver to schedule and keep development costs down. Support is about being responsive and effective: listen to the customer, answer questions, fit in unplanned work, figure out problems and fix things right away. Development work is about flow, continuity, predictability, velocity, and, if managed correctly, is mostly under control of the team. Support and maintenance work is interrupt-driven, immediate, inconsistent and unpredictable – a completely different way of working and thinking. Development work requires the team to be drawn together so that they can collaborate on common goals and the design. Most maintenance and support work is disjointed and disconnected, smaller tasks that can be done by people working independently. Development, even in high pressure projects, is measured in weeks or months. Support and maintenance work needs to be done in days or hours or sometimes minutes.

Agile Support Models: Maintenance Victims

One way that teams try to handle support and maintenance is by sacrificing someone from the team: offering up a “maintenance victim” who takes on the support burden for the rest of the team temporarily, letting the others focus on design and development work. This includes taking calls from Ops or directly from customers, looking at logs, solving problems, fixing bugs. This could mean staying after hours to help troubleshoot or repairing a production problem or putting out a fix, and being on call after hours and on weekends.

The rest of the team tries to pretend that this victim doesn’t exist. If the victim isn’t busy working on support issues or fixing bugs found in production, they might work on fixing other bugs or maybe some other low-priority development work, but they are subtracted from the team’s velocity – nobody depends on them to deliver anything important.

Teams generally rotate someone through support and triage responsibilities for one or two Sprints. This way everyone at some point “shares the pain” and gets some familiarity with support problems and operational issues. There are also positive sides to being sacrificed to support. Developers get a chance to learn more about the system and stretch some of their technical skills, and get off of the hamster wheel of Sprint-after-Sprint delivery for a bit. And they get a chance to play the hero, step in and fix something important and make the customer happy.

Kent Beck and Martin Fowler in Planning Extreme Programming extend this idea to larger organizations by creating a small production support team: 2-4 developers who volunteer to focus on fixing bugs and dealing with production problems. Developers spend a couple of Sprints in production support, then rotate back to development work. Beck and Fowler recommend staggering rotations, making sure that at least one developer is in the first rotation and another in the second so that at least one member of the support team always knows about what is going on and what problems are being worked on.

Sacrificing a maintenance victim or a team makes it possible for most of the rest of the team to move forward on development, while still meeting support commitments. This approach assumes that anyone on the team is capable of figuring out and fixing any problem in the system – that everyone is a cross-functional generalist. And this means that whoever is on this support rotation has to be good enough and experienced enough that they can deal with most issues without bringing in the rest of the team - you can’t rotate newbies through support and maintenance work, at least not without someone senior backing them up.

And you also have to be prepared for problems that are too big or too urgent for your maintenance victim to take care of on their own. Even with a dedicated team you may still need to build in some kind of slack or buffer to deal with emergencies and general helping out, so that you don’t keep blowing up Sprints. You can come up with a reasonable allowance based on “yesterday’s weather”, on how much support work the team has had to do over the last few weeks or months. If you can't make this work, if the entire team is spending too much time on support and fire fighting and pushing hot fixes, then you are doing something wrong and you have to get things under control before you build more software any ways.

Kanban instead of – or inside of – Scrum

Rather than trying to shoe horn maintenance and support into time boxes, some teams have found that Kanban is much better structured than Scrum or XP is to balance support, maintenance, and operations with new development work.

Kanban’s queuing model and use of task boards makes it easy to see what work needs to be done, what work is being done, who is doing it, what’s getting in the way, and when anything changes.

Kanban makes it easier to track and manage different kinds of work that requires different kinds of skills and that don’t always fit nicely into a 1-week or 2-week time-box..

Kanban doesn’t pretend that you won’t be or can’t be interrupted – instead it helps you to manage interruptions and minimize their impact on the team. First, in Kanban you set limits on how much of different kinds of work the team can deal with at a time. This lets the team get control over work coming in, and stay focused on getting things done. Kanban’s queue-and-task model allows emergencies to pre-empt whatever work is in progress through escalation/priority lanes. And priorities can keep changing right up until the last minute – team members just pull the highest priority work item from the ready queue when they are free to take on more work, whether this is designing and developing a new feature, or fixing a bug, or dealing with a support issue.

Kanban helps teams focus more on immediate, tactical issues. It’s a better model to follow when you have more maintenance and support work than new design and development, or when you have to assert control over a major problem or manage something with a lot of moving pieces like the launch of a new system.

Devops Changes Everything

Devops, as followed by organizations like Etsy and Facebook and Netflix (where they go so far as to call it NoOps) tries to completely break down the boundaries between development, maintenance, support and operations. Devops engages developers directly and closely into the support, maintenance and operations of the systems that they build. Developers who work in these organizations are not just writing code – they are part of a team running an online service-based business, which means that support work is as important, and sometimes more important, than designing and writing more software.

In these organizations, developers are held personally responsible for their software, for getting it into production and making sure that it works. They are on call for problems with software that they worked on. They are actively involved in operations of the system, providing insight into how the system works and how it is running, in testing and configuring it and tuning it and troubleshooting problems.

Devops changes what developers work on and how they do it. They move away from project work and more towards fast feature development, fixing, tuning and hardening. Availability and reliability and performance and security and other operational factors become as important – or more important – than delivery schedules and velocity. Developers spend more time thinking about how to make the system work, how to simplify deployment and setup and about the information that people need to understand what’s going on inside the system, what metrics and tools might be useful, how to handle bad data and infrastructure failures, what could go wrong when they make a change and who they need to check with and what they need to test for.

Maintenance and Support – Responsibility and Feedback

Whether developers need to – or even should – take first line support calls from users, they at least need to be part of second level and third level support, where problems are investigated and solved.

This is not just because they are usually the only people who can actually figure out and fix many problems.

Putting aside moral hazard arguments about whether it’s ethically acceptable for developers not to take full responsibility for the consequences for their decisions and the quality of their work, there are compelling advantages to developers being directly involved in supporting and maintaining the software that they work on.

The most important is the quality of the feedback that developers get from supporting a real system – feedback that is too valuable for them to ignore.

Real feedback on what you did right in building the system, and what you got wrong. Feedback on what you thought the customer needed vs. what they really need. What features customers really find useful (and what they don`t). Where the design is weak. Where most of your problems are coming from (the 20% of the code where 80% of the bugs are hiding), where the weaknesses are in your testing and reviews, where you need to focus and where you need to improve. Valuable information into what you’re building and how you’re building it and how you plan and prioritize, and how you can get better.

When developers are called into fire fighting production incidents and Root Cause Analysis reviews they can learn enormous amounts about what it takes to build software for the real world. Thinking seriously about how problems happened and how to prevent them can change how you plan, design, build, test and deploy software; and how people work together as a team.

Farming all of this off to someone else, filtering it through a help desk or an offshore maintenance team, breaks these valuable feedback loops, with negative effects for everyone involved.

Peter Gillard-Moss explains how this happens:

In a startup, developers take care of problems themselves, well, because there isn`t anybody else to do it. But at some point things change:

“…managers decided that we were spending far too long investigating users’ problems and not long enough building the new features the business wanted. Developers needed to be more productive, and more productive meant developers developing more new features. To get developers to develop they need to be ‘in the zone’. They need headphones and big screens to glue their eyes to. They did not need petty interruptions like stupid users ringing up because they got a pop up saying their details will be resent when they tried to refresh.”

But by doing this, the development team became disconnected from the results of their work, and from their customers…

“A systems thinker would tell you this is wrong. You’ve gone from a system that connected a user to the team responsible with one degree of separation, to one that has three degrees of separation. Or think of it another way: the team producing the product, and responsible for improvements and fixes used to be one degree away from their end users, who use the product and are feeding back the product’s shortcomings and issues, but are now three degrees. And not even three degrees all of the time. The majority of the time the team won’t ever hear about most of the support issues. And most of the time the team won’t even have that much interaction with the team that does hear about most of the support issues.”

The result: Customers don’t get the support that they need. Developers don’t get the information that they need to understand how to make the system work better. A support team stuck in the middle with people just trying to keep things from getting worse and hoping to find a better job someday. It’s a self-reinforcing, negative spiral.

In our shop, support takes priority over development – always. Our senior developers work with operations to support the system, and are on call when we put new software in and on call if something goes wrong after hours. They can bring in anyone else from any team that they need for help. As a result, we have very few serious problems and these problems get fixed fast and fixed right. The experience that everyone gets from working in support helps them to design and write better, safer code. This has made the system more resilient and easier and less expensive to support and safer to setup and run and easier and safer to change. And it has made our organization better too. It’s brought developers and operations closer together, and closer to what’s important to the business.

Whether you call it “Agile” or not, there’s nothing more agile than a team that is working directly with customers, responding immediately to problems and changing requirements in a live system. While some developers and managers think of this as overhead, sustaining engineering and try to push it off to somebody else so that they can focus on “more strategic" work, others recognize that this is really the leading edge of software development, and the only way to run a successful software organization, and the only way to make software, and developers, better.

No comments:

Site Meter