If you’re trying to convince yourself (or your team or management) that your operations program needs to change for the better and that a DevOps approach makes sense – or that your operations organization is improving and that the changes you have made actually make a difference – you have to measure something. But what?
John Clapham at Nokia suggests that you should try to measure how healthy your operations culture is. At the DevOps Days conference this year in London he talked about ways to measure and monitor culture – behaviour, attitudes and values – to determine whether people were focused on the “right things”, and to assess the team’s motivation and satisfaction. Nokia had already started a DevOps program, and wanted to see whether the momentum for change and improvement was still there after the initial push and evangelism had worn off. So they came up with a set of vital signs that they felt would capture the important behaviours and attitudes:
- Cycle time – time from development to deployment in production. Are we moving faster, or fast enough?
- Shared purpose – does everyone share the same goals and believe in improving how development and ops work together?
- Motivation – does everyone care about what they are doing?
- Collaboration – are people working together willingly?
- Effectiveness – is everyone’s time being spent in a useful way? How much time is being wasted?
Operations Vital Signs that you Can and Should Measure
Clapham’s closing question was: “What vital signs would you look for?”
I'm not convinced that you can measure an organization’s cultural effectiveness, or that it would be really useful if you could. A wishy-washy questionnaire can’t tell you whether change is making a real difference to the organization’s effectiveness or whether you are on the right track, and it can’t help you understand what you need to change or what impact that change would have on the bottom line (or the top line). For that you need concrete, results-based measurements that point out strengths and weaknesses, and that you can use to make a case for change or to justify your decisions.
Puppet Labs and IT Revolution Press have recently published a “State of DevOps Report”, which is full of interesting data. The report stresses the importance of metrics in understanding how your organization is performing and why a DevOps program is, or would be, worthwhile. They provide a list of objective measures, broken down into two major types.
Agility and reliability metrics:
- Deployment rate/frequency
- Change lead time – how long it takes to get a change approved and into production
- Change failure rate (John Allspaw's brilliant presentation “Ops Meta-Metrics” explains the importance of correlating deployment frequency/size/type and failures – type and severity – in production)
- Mean time to recover (and mean time to detect)
- Test cycle time – how long does it take to test a change?
- Deployment time – how long does it take to roll out a new change once tested and approved?
- Defect rate in production (defect escape rate)
- Helpdesk ticket counts – how much time is spent firefighting?
There are two other important measures that are missing from this list:
- Operations costs
- Employee retention – a key measure of whether people are happy
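To make a few of these measures concrete, here is a minimal sketch of how deployment frequency, change lead time, change failure rate and mean time to recover might be computed from a log of deployments. The `Deployment` record, its field names and the sample data are all hypothetical, made up for illustration and not taken from the report. Recovery time is measured from the moment of deployment, which is a simplifying assumption; in practice you would want the time the failure actually started or was detected.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical deployment record; field names are assumptions."""
    requested_at: datetime                  # when the change was submitted for approval
    deployed_at: datetime                   # when it reached production
    failed: bool                            # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def deployment_frequency(deploys: List[Deployment], period_days: int) -> float:
    """Average number of deployments per day over the given period."""
    return len(deploys) / period_days


def mean_change_lead_time(deploys: List[Deployment]) -> timedelta:
    """Average time from change request to production."""
    if not deploys:
        raise ValueError("no deployments to measure")
    total = sum((d.deployed_at - d.requested_at for d in deploys), timedelta())
    return total / len(deploys)


def change_failure_rate(deploys: List[Deployment]) -> float:
    """Fraction of deployments that caused a failure in production."""
    if not deploys:
        raise ValueError("no deployments to measure")
    return sum(1 for d in deploys if d.failed) / len(deploys)


def mean_time_to_recover(deploys: List[Deployment]) -> Optional[timedelta]:
    """Average time to restore service after a failed deployment.

    Assumes the failure begins at deployment time; returns None when
    there were no recovered failures in the log.
    """
    durations = [
        d.restored_at - d.deployed_at
        for d in deploys
        if d.failed and d.restored_at is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Made-up sample data: four deployments over an eight-day window.
sample = [
    Deployment(datetime(2013, 6, 3, 9), datetime(2013, 6, 4, 9), failed=False),
    Deployment(datetime(2013, 6, 4, 9), datetime(2013, 6, 6, 9), failed=True,
               restored_at=datetime(2013, 6, 6, 10)),
    Deployment(datetime(2013, 6, 5, 9), datetime(2013, 6, 8, 9), failed=False),
    Deployment(datetime(2013, 6, 8, 9), datetime(2013, 6, 10, 9), failed=False),
]

print("Deployments per day:", deployment_frequency(sample, period_days=8))
print("Mean change lead time:", mean_change_lead_time(sample))
print("Change failure rate:", change_failure_rate(sample))
print("Mean time to recover:", mean_time_to_recover(sample))
```

The point of tracking these together is that any one of them can be gamed or misread on its own: a higher deployment rate only counts as an improvement if the failure rate and recovery time are not getting worse at the same time.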
Measuring the success of a DevOps program is simple:
- If you aren’t saving money
- If you can’t make change easier and faster
- If you don’t improve quality and reliability and your organization’s ability to respond to problems
- If you can’t keep good people
... then whatever you’re doing is not working or you’re not doing it right. It doesn't matter if you are “doing DevOps” or using certain tools or if people seem to be more collaborative or believe that they have a greater sense of shared purpose. What matters is the outcome. Make sure that you’re measuring the right things – so that you know that you are doing the right things.