Tuesday, July 29, 2014

Devops isn't killing developers – but it is killing development and developer productivity

Devops isn't killing developers – at least not any developers that I know.

But Devops is killing development, or the way that most of us think of how we are supposed to build and deliver software. Agile loaded the gun. Devops is pulling the trigger.

Flow instead of Delivery

A sea change is happening in the way that software is developed and delivered. Large-scale waterfall software development projects gave way to phased delivery and Spiral approaches, and then to smaller teams delivering working code in time boxes using Scrum or other iterative Agile methods. Now people are moving on from Scrum to Kanban, and to One-Piece Continuous Flow with immediate and Continuous Deployment of code to production in Devops.

The scale and focus of development continues to shrink, and so does the time frame for making decisions and getting work done. Phases and milestones and project reviews to sprints and sprint reviews to Lean controls over WIP limits and task-level optimization. The size of deliverables: from what a project team could deliver in a year to what a Scrum team could get done in a month or a week to what an individual developer can get working in production in a couple of days or a couple of hours.

The definition of “Done” and “Working Software” changes from something that is coded and tested and ready to demo to something that is working in production – now (“Done Means Released”).

Continuous Delivery and Continuous Deployment replace Continuous Integration. Rapid deployment to production doesn't leave time for manual testing or for manual testers, which means developers are responsible for catching all of the bugs themselves before code gets to production – or do their testing in production and try to catch problems as they happen (aka “Monitoring as Testing").

Because Devops brings developers much closer to production, operational risks become more important than project risks, and operational metrics become more important than project metrics. System uptime and cycle time to production replace Earned Value or velocity. The stress of hitting deadlines is replaced by the stress of firefighting in production and being on call.

Devops isn't about delivering a project or even delivering features. It’s about minimizing lead time and maximizing flow of work to production, recognizing and eliminating junk work and delays and hand offs, improving system reliability and cutting operational costs, building in feedback loops from production to development, standardizing and automating steps as much as possible. It’s more manufacturing and process control than engineering.

Devops kills Developer Productivity too

Devops also kills developer productivity.

Whether you try to measure developer productivity by LOC or Function Points or Feature Points or Story Points or velocity or some other measure of how much code is written, less coding gets done because developers are spending more time on ops work and dealing with interruptions, and less time writing code.

Time learning about the infrastructure and the platform and understanding how it is setup and making sure that it is setup right. Building Continuous Delivery and Continuous Deployment pipelines and keeping them running. Helping ops to investigate and resolve issues, responding to urgent customer requests and questions, looking into performance problems, monitoring the system to make sure that it is working correctly, helping to run A/B experiments, pushing changes and fixes out… all take time away from development and pre-empt thinking about requirements and designing and coding and testing (the work that developers are trained to do and are good at).

The Impact of Interruptions and Multi-Tasking

You can’t protect developers from interruptions and changes in priorities in Devops, even if you use Kanban with strict WIP limits, even in a tightly run shop – and you don’t want to. Developers need to be responsive to operations and customers, react to feedback from production, jump on problems and help detect and resolve failures as quickly as possible. This means everyone, especially your most talented developers, need to be available for ops most if not all of the time.

Developers join ops on call after hours, which means carrying a pager (or being chased by Pager Duty) after the day’s work is done. And time wasted on support calls for problems that end up not being real problems, and long nights and weekends on fire fighting and tracking down production issues and helping to recover from failures, coming in tired the next day to spend more time on incident dry runs and testing failover and roll-forward and roll-back recovery and participating in post mortems and root cause analysis sessions when something goes wrong and the failover or roll-forward or roll-back doesn’t work.

You can’t plan for interruptions and operational problems, and you can’t plan around them. Which means developers will miss their commitments more often. Then why make commitments at all? Why bother planning or estimating? Use just-in-time prioritization instead to focus in on the most important thing that ops or the customer need at the moment, and deliver it as soon as you can – unless something more important comes up and pre-empts it.

As developers take on more ops and support responsibilities, multi-tasking and task switching – and the interruptions and inefficiency that come with it – increase, fracturing time and destroying concentration. This has an immediate drag on productivity, and a longer term impact on people’s ability to think and to solve problems.

Even the Continuous Deployment feedback loop itself is an interruption to a developer’s flow.

After a developer checks in code, running unit tests in Continuous Integration is supposed to be fast, a few seconds or minutes, so that they can keep moving forward with their work. But to deploy immediately to production means running through a more extensive set of integration tests and systems tests and other checks in Continuous Delivery (more tests and more checks takes more time), then executing the steps through to deployment, and then monitoring production to make sure that everything worked correctly, and jumping in if anything goes wrong. Even if most of the steps are automated and optimized, all of this takes extra time and the developer’s attention away from working on code.

Optimizing the flow of work in and out of operations means sacrificing developer flow, and slowing down development work itself.

Expectations and Metrics and Incentives have to Change

In Devops, the way that developers (and ops) work change, and the way that they need to be managed changes. It’s also critical to change expectations and metrics and incentives for developers.

Devops success is measured by operational IT metrics, not on meeting project delivery goals of scope, schedule and cost, not on meeting release goals or sprint commitments, or even meeting product design goals.

  • How fast can the team respond to important changes and problems: Change Lead Time and Cycle Time to production instead of delivery milestones or velocity
  • How often do they push changes to production (which is still the metric that most people are most excited about – how many times per day or per hour or minute Etsy or Netflix or Amazon deploy changes)
  • How often do they make mistakes - Change / Failure ratio
  • System reliability and uptime – MTBF and especially MTTD and MTTR
  • Cost of change – and overall Operations and Support costs

Devops is more about Ops than Dev

As more software is delivered earlier and more often to production, development turns into maintenance. Project management is replaced by incident management and task management. Planning horizons get much shorter – or planning is replaced by just-in-time queue prioritization and triage.

With Infrastructure as Code Ops become developers, designing and coding infrastructure and infrastructure changes, thinking about reuse and readability and duplication and refactoring, technical debt and testability and building on TDD to implement TDI (Test Driven Infrastructure). They become more agile and more Agile, making smaller changes more often, more time programming and less on paper work.

And developers start to work more like ops. Taking on responsibilities for operations and support, putting operational risks first, caring about the infrastructure, building operations tools, finding ways to balance immediate short-term demands for operational support with longer-term design goals.

None of this will be a surprise to anyone who has been working in an online business for a while. Once you deliver a system and customers start using it, priorities change, everything about the way that you work and plan has to change too.

This way of working isn't better for developers, or worse necessarily. But it is fundamentally different from how many developers think and work today. More frenetic and interrupt-driven. At the same time, more disciplined and more Lean. More transparent. More responsibility and accountability. Less about development and more about release and deployment and operations and support.

Developers – and their managers – will need to get used to being part of the bigger picture of running IT, which is about much more than designing apps and writing and delivering code. This might be the future of software development. But not all developers will like it, or be good at it.

5 comments:

Unknown said...

Title should be : "Devops isn't killing developers – but .." instead of "Develops isn't..."

Jim Bird said...

Julien, thank you for spotting the typo :-)

Michael said...

Gripping read, I must admit!

cblin said...

A great article as usual.

But this time, I have to disagree with you on many points.

Rapid deployment to production doesn't leave time for manual testing or for manual testers
-> that is true if you implement continuous delivery as continuous delivery to production
to me, you can do continuous delivery to a test server and choose to deploy a given release to qa server that will go to production
the pipeline need to be automatic but manual testing is indeed possible in qa
the goal is to deploy to production as soon as possible

This means everyone, especially your most talented developers, need to be available for ops most if not all of the time.
-> indeed if your application keeps crashing that is the case :)
but in my life, I deal with ops problem maybe once or twice in a week
I am far more interrupted by testers than by ops because they constantly want tiny improvements :)

As more software is delivered earlier and more often to production, development turns into maintenance. Project management is replaced by incident management and task management. Planning horizons get much shorter – or planning is replaced by just-in-time queue prioritization and triage.
-> it seems to me that you are in a situation where you are constantly firefighting and having continous delivery increases the burden.

The point I agree with you :
This way of working isn't better for developers, or worse necessarily. But it is fundamentally different from how many developers think and work today.

that is true and to me this is mostly good because this is also true for ops : they now have less excuses to delay the delivery which has always been a burden to me and support teams ("hey dude, this bug was corrected 2 months ago and some clients still experienced it"

A_flj_ said...

A few ideas:

Giving up developer productivity metrics in favor of operational metrics is a good thing - after all, operational metrics can be directly related to business value, while developer productivity, regardless of how you measure it, less directly.

Not all devs are on call all the time. IME it's usually one of the team members on call at any guven time, and the whole team does pager duty by rotation. This should give a developer plenty of time to focus and work on difficult programming tasks which require exquisite focus and the absence of interruptions.

Even with interruptions, if you manage to train yourself in the disciplined way of development called TDD, dealing with interruptions becomes easy - the increments in which you work are tiny, and complex algorithms evolve, rather than being designed. (Of course, similar to agile development in general, nobody believes it before experiencing it first-hand.)

What you loose in perceived productivity you more than make up (IMO/IME) in working on the things that matter and building them in a way that works better. Having a strong split between devs and ops often leads to neither delivering what the others need to efficiently cover business needs - which is probably the reason why IT departments were often not taken seriously and considered plainly stupid by other departments in the enterprise before devops.

This being said, I think that productivity, measured as business value delivered by a devloper in a unit of time - this being the only relevant measure, actually - is likely to go up with devops, instead of down.

Site Meter