Wednesday, February 25, 2015

DevOps is not a Race

Most of what we read or hear about DevOps emphasizes speed. Continuous Deployment. Fast feedback. Fail fast, fail often.

How many times do we have to hear about how many times Amazon or Facebook or Netflix or Etsy deploy changes every day or every hour or every minute?

Software Development at the Speed of DevOps

Security at the Speed of DevOps

DevOps at the Speed of Google

Devops Explained: A Philosophy of Speed, Not Momentum

It’s all about the Speed: DevOps and the Cloud

Even enterprise DevOps conferences are about speed and more speed.

Speed is Sexy, but...

Speed is sexy. Speed sells. But speed isn’t the point.

Go back to John Allspaw’s early work at Flickr, which helped kick off DevOps. Actually, look at all of Allspaw’s work. Most of it is about minimizing the operational and technical risk of change. Minimizing the chance of making mistakes. Minimizing the impact of mistakes. Minimizing the time needed to detect, understand and recover from mistakes. Learning from mistakes when they happen and improving so that you don't make the same kind of mistakes again or so that you can catch them and fix them quicker. Breaking down silos between dev and ops so that they can work together to solve problems.

La de da, everything’s fine … change happens….OMGWTF OUTAGES!!!!!

Infrastructure as Code and eliminating snowflake servers. Not about maximizing speed.

Checking everything into version control – code, application configuration, server and network configurations… not about maximizing speed.

Breaking releases down into small change sets with fewer moving parts and fewer dependencies makes changes easier to understand, easier to review, easier to test, simpler and easier to deploy, and simpler and easier to roll back or fix. This is not about maximizing speed.

Executing automated tests in Continuous Integration…

Building out test environments to match production so that developers can test and learn how their system will work under real-world conditions…

Building automated integration and deployment pipelines to test and to production so that you can push out a change or a fix immediately…

Change controls based on transparency and peer reviews and repeatable automated controls instead of CCB meetings…

Auditing all of this so that you know what was changed, by whom, and when…

Developers talking to ops and learning and caring about run-time infrastructure and operations procedures….

Ops talking to developers and learning and caring about the application and how it is built and deployed and configured…

Wiring monitoring and metrics and alerting into the system from the beginning…

Running game days and testing your incident response capabilities with developers and ops…

Dev and ops working through Root Cause(s) Analysis in blameless post-mortems when something goes wrong so that they can learn and improve together…

Injecting automated security testing and checks into your build and deployment chain…

None of this is about speed. It is about building better communications paths and feedback loops between the business and developers and operations. About building a safe, open culture where people can confront mistakes and learn from them together. About building a repeatable, reliable deployment capability. Building better, more resilient software and a better, more resilient and responsive IT delivery and support organization.

DevOps is not a Race

Ignore the vendors who tell you that their latest “DevOps solution” will make your enterprise faster.

And unless you actually are an online consumer startup, ignore the hype about the Lean Startup and Continuous Deployment – this has nothing to do with running an enterprise.

DevOps is a lot of work. Don’t go into it thinking that it’s a race.

Tuesday, February 10, 2015

Don’t waste time tracking technical debt

For the last couple of years we’ve been tracking technical debt in our development backlog. We add debt payments to the backlog, make the cost and risk of technical debt visible to the team and to the Product Owner, and prioritize payments alongside other work. All of this is supposed to ensure that debt gets paid down.

But I am not convinced that it is worth it. Here’s why:

Debt that’s not worth tracking because it’s not worth paying off

Some debt isn’t worth worrying about.

A little (but not too much) copy-and-paste. Fussy coding-style issues picked up by some static analysis tools (does it really matter where the brackets are?). Poor method and variable naming. Methods which are too big. Code that doesn’t properly follow coding standards or patterns. Other inconsistencies. Hard coding. Magic numbers. Trivial bugs.

This is irritating, but it’s not the kind of debt that you need to track on the backlog. It can be taken care of in day-to-day opportunistic refactoring. The next time you’re in the code, clean it up. If you’re not going to change the code, then who cares? It’s not costing you anything. If you close your eyes and pretend that it’s not there, nothing really bad will happen.

Somebody else’s debt

Out-of-date Open Source or third-party software. The kind of things that Sonatype CLM or OWASP’s Dependency Check will tell you about.

Some of this is bad – seriously bad. Exploitable security vulnerabilities. Think Heartbleed. This shouldn’t even make it to the backlog. It should be fixed right away. Make sure that you know that you can build and roll out a patched library quickly and with confidence (as part of your continuous build/integration/delivery pipeline).

Everything else is low priority. If there’s a newer version with some bug fixes, but the code works the way you want it to, does it really matter? Upgrading for the sake of upgrading is a waste of time, and there’s a chance that you could introduce new problems, break something that you depend on now, with little or no return. Remember, you have the source code – if you really need to fix something or add something, you can always do it yourself.
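The triage rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Dependency` record, its fields, the library names and the `triage` function are all hypothetical, and in practice the vulnerability flag would come from a scanner report rather than being typed in by hand.

```python
from dataclasses import dataclass


@dataclass
class Dependency:
    # Hypothetical record; real data would come from a dependency scanner report.
    name: str
    has_known_vulnerability: bool
    newer_version_available: bool


def triage(dep: Dependency) -> str:
    """Apply the rule of thumb: patch vulnerabilities now, leave the rest alone."""
    if dep.has_known_vulnerability:
        return "fix now"        # should not even make it to the backlog
    if dep.newer_version_available:
        return "low priority"   # upgrade only if there is a concrete reason to
    return "nothing to do"


# Made-up example data, purely for illustration.
deps = [
    Dependency("tls-library", has_known_vulnerability=True, newer_version_available=True),
    Dependency("logging-library", has_known_vulnerability=False, newer_version_available=True),
    Dependency("date-library", has_known_vulnerability=False, newer_version_available=False),
]
decisions = {dep.name: triage(dep) for dep in deps}
```

The point of the sketch is that there are only two outcomes worth acting on, and neither of them is "add a ticket and think about it later".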

Debt you don’t know that you have

Some of the scariest debt is the debt that you don’t know you have. Debt that you took on unconsciously because you didn’t know any better… and you still don’t. You made some bad design decisions. You didn’t know how to use your application framework properly. You didn't know about the OWASP Top 10 and how to protect against common security attacks.

This debt can’t be on your backlog. If something changes – a new person with more experience joins the team, or you get audited, or you get hacked – this debt might get exposed suddenly. Otherwise it keeps adding up, silently, behind the scenes.

Debt that is too big to deal with

There’s other debt that’s too big to effectively deal with. Like the US National Debt. Debt that you took on early by making the wrong assumptions or the wrong decisions. Maybe you didn’t know you were wrong then, but now you do. You – or somebody before you – picked the wrong architecture. Or the wrong language, or the wrong framework. Or the wrong technology stack. The system doesn’t scale. Or it is unreliable under load. Or it is full of security holes. Or it’s brittle and difficult to change.

You can’t refactor your way out of this. You either have to put up with it as best as possible, or start all over again. Tracking it on your backlog seems pointless:

As a developer, I want to rewrite the system, so that everything doesn’t suck….

Fix it now, or it won’t get fixed at all

Technical debt that you can do something about is debt that you took on consciously and deliberately – sometimes responsibly, sometimes not.

You took shortcuts in order to get the code out for fast feedback (A/B testing, prototyping). There’s a good chance that you’ll have to rewrite it or even throw it out, so why worry about getting the code right the first time? This is strategic debt – debt that you can afford to take on, at least for a while.

Or you were under pressure and couldn’t afford to do it right, right then. You had to get it done fast, and the results aren’t pretty.

The code works, but it is a hack job. You copied and pasted too much. You didn’t follow conventions. You didn’t get the code reviewed. You didn’t write tests, or at least not enough of them. You left in some debugging code. It’s going to be a pain to maintain.

If you don’t get to this soon, if you don’t clean it up or rewrite it in a few weeks or a couple of months, then there is a good chance that this debt will never get paid down. The longer it stays, the harder it is to justify doing anything about it. After all, it’s working fine, and everyone has other things to do.

The priority of doing something about it will continue to fall, until it’s like silt, settling to the bottom. Eventually you’ll forget that it’s there. When you see it, it will make you a little sad, but you’ll get over it. Like the shoppers in New York City, looking up at the US National Debt Clock, on their way to the store to buy a new TV on credit.

And hey, if you’re lucky, this debt might get paid down without you knowing about it. Somebody refactors some code while making a change, or maybe even deletes it because the feature isn’t used any more, and the debt is gone from the code base. Even though it is still on your books.

Don’t track technical debt. Deal with it instead

Tracking technical debt sounds like the responsible thing to do. If you don’t track it, you can’t understand the scope of it. But whatever you record in your backlog will never be an accurate or complete record of how much debt you actually have – because of the hidden debt that you’ve taken on unintentionally, the debt that you don’t understand or haven’t found yet.

More importantly, tracking work that you’re not going to do is a waste of everyone’s time. Only track debt that everyone (the team, the Product Owner) agrees is important enough to pay off. Then make sure to pay it off as quickly as possible. Within 1 or 2 or maybe 3 sprints. Otherwise, you can ignore it. Spend your time refactoring instead of junking up the backlog. This isn’t being irresponsible. It’s being practical.
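As a rough sketch, the rule above could be applied to a backlog like this. Everything here is an assumption made for illustration: the `DebtItem` record, its fields, the `prune_debt` function and the sample items are hypothetical, and the limit of 3 sprints is simply the upper end of the range suggested above.

```python
from dataclasses import dataclass

MAX_SPRINTS_ON_BACKLOG = 3  # assumption: upper end of "1 or 2 or maybe 3 sprints"


@dataclass
class DebtItem:
    # Hypothetical backlog record for a single piece of technical debt.
    title: str
    agreed_worth_paying: bool   # the team and the Product Owner both agree
    sprints_on_backlog: int


def prune_debt(items):
    """Split debt items into those worth tracking and those to drop."""
    keep, drop = [], []
    for item in items:
        if item.agreed_worth_paying and item.sprints_on_backlog <= MAX_SPRINTS_ON_BACKLOG:
            keep.append(item)
        else:
            drop.append(item)
    return keep, drop


# Made-up example data, purely for illustration.
backlog = [
    DebtItem("Add tests around the rushed pricing change", True, 1),
    DebtItem("Rename badly named helper methods", False, 2),
    DebtItem("Clean up last year's prototype code", True, 9),
]
keep, drop = prune_debt(backlog)
```

Only the first item survives: the second was never agreed to be worth paying off, and the third has sat on the backlog long enough that it is unlikely to be paid down at all.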
