Tuesday, January 13, 2015

We can’t measure Programmer Productivity… or can we?

If you go to Google and search for "measuring software developer productivity" you will find a whole lot of nothing. Seriously -- nothing.
Nick Hodges, Measuring Developer Productivity

By now we should all know that we don’t know how to measure programmer productivity.

There is no clear cut way to measure which programmers are doing a better or faster job, or to compare productivity across teams. We “know” who the stars on a team are, who we can depend on to deliver, and who is struggling. And we know if a team is kicking ass – or dragging their asses. But how do we prove it? How can we quantify it?

All sorts of stupid and evil things can happen when you try to measure programmer productivity.

But let’s do it anyways.

We’re writing more code, so we must be more productive

Developers are paid to write code. So why not measure how much code they write – how many lines of code get delivered?

Because we've known since the 1980s that this is a lousy way to measure productivity.

Lines of code can’t be compared across languages (of course), or even between programmers using the same language working in different frameworks or following different styles. Which is why Function Points were invented – an attempt to standardize and compare the size of work in different environments. Sounds good, but Function Points haven’t made it into the mainstream, and probably never will – very few people know how Function Points work, how to calculate them and how they should be used.

The more fundamental problem is that measuring productivity by lines (or Function Points or other derivatives) typed doesn’t make any sense. A lot of important work in software development, the most important work, involves thinking and learning – not typing.

The best programmers spend a lot of time understanding and solving hard problems, or helping other people understand and solve hard problems, instead of typing. They find ways to simplify code and eliminate duplication. And a lot of the code that they do write won’t count anyways, as they iterate through experiments and build prototypes and throw all of it away in order to get to an optimal solution.

The flaws in these measures are obvious if we consider the ideal outcomes: the fewest lines of code possible in order to solve a problem, and the creation of simplified, common processes and customer interactions that reduce complexity in IT systems. Our most productive people are those that find ingenious ways to avoid writing any code at all.
Jez Humble, The Lean Enterprise

This is clearly one of those cases where size doesn’t matter.

We’re making (or saving) more money, so we must be working better

We could try to measure productivity at a high level using profitability or financial return on what each team is delivering, or some other business measure such as how many customers are using the system – if developers are making more money for the business (or saving more money), they must be doing something right.

Using financial measures seems like a good idea at the executive level, especially now that “every company is a software company”. These are organizational measures that developers should share in. But they are not effective – or fair – measures of developer productivity. There are too many business factors are outside of the development team’s control. Some products or services succeed even if the people delivering them are doing a lousy job, or fail even if the team did a great job. Focusing on cost savings in particular leads many managers to cut people and try “to do more with less” instead of investing in real productivity improvements.

And as Martin Fowler points out there is a time lag, especially in large organizations – it can sometimes take months or years to see real financial results from an IT project, or from productivity improvements.

We need to look somewhere else to find meaningful productivity metrics.

We’re going faster, so we must be getting more productive

Measuring speed of development – velocity in Agile – looks like another way to measure productivity at the team level. After all, the point of software development is to deliver working software. The faster that a team delivers, the better.

But velocity (how much work, measured in story points or feature points or ideal days, that the team delivers in a period of time) is really a measure of predictability, not productivity. Velocity is intended to be used by a team to measure how much work they can take on, to calibrate their estimates and plan their work forward.

Once a team’s velocity has stabilized, you can measure changes in velocity within the team as a relative measure of productivity. If the team’s velocity is decelerating, it could be an indicator of problems in the team or the project or the system. Or you can use velocity to measure the impact of process improvements, to see if training or new tools or new practices actually make the team’s work measurably faster.

But you will have to account for changes in the team, as people join or leave. And you will have to remember that velocity is a measure that only makes sense within a team – that you can’t compare velocity between teams.

Although this doesn't stop people from trying. Some shops use the idea of a well-known reference story that all teams in a program understand and use to base their story points estimates on. As long as teams aren't given much freedom on how they come up with estimates, and as long as the teams are working in the same project or program with the same constraints and assumptions, you might be able to do rough comparison of velocity between teams. But Mike Cohn warns that

If teams feel the slightest indication that velocities will be compared between teams there will be gradual but consistent “point inflation.”

ThoughtWorks explains that velocity <> productivity in their latest Technology Radar:

We continue to see teams and organizations equating velocity with productivity. When properly used, velocity allows the incorporation of “yesterday's weather” into a team’s internal iteration planning process. The key here is that velocity is an internal measure for a team, it is just a capacity estimate for that given team at that given time. Organizations and managers who equate internal velocity with external productivity start to set targets for velocity, forgetting that what actually matters is working software in production. Treating velocity as productivity leads to unproductive team behaviors that optimize this metric at the expense of actual working software.

Just stay busy

One manager I know says that instead of trying to measure productivity

“We just stay busy. If we’re busy working away like maniacs, we can look out for problems and bottlenecks and fix them and keep going”.

In this case you would measure – and optimize for – cycle time, like in Lean manufacturing.

Cycle time – turnaround time or change lead time, from when the business asks for something to when they get it in their hands and see it working – is something that the business cares about, and something that everyone can see and measure. And once you start looking closely, waste and delays will show up as you measure waiting/idle time, value-add vs. non-value-add work, and process cycle efficiency (total value-add time / total cycle time).

“It’s not important to define productivity, or to measure it. It’s much more important to identify non-productive activities and drive them down to zero.”
Erik Simmons, Intel

Teams can use Kanban to monitor – and limit – work in progress and identify delays and bottlenecks. And Value Stream Mapping to understand the steps, queues, delays and information flows which need to be optimized. To be effective, you have to look at the end-to-end process from when requests are first made to when they are delivered and running, and optimize all along the path, not just the work in development. This may mean changing how the business prioritizes, how decisions are made and who makes the decisions.

In almost every case we have seen, making one process block more efficient will have a minimal effect on the overall value stream. Since rework and wait times are some of the biggest contributors to overall delivery time, adopting “agile” processes within a single function (such as development) generally has little impact on the overall value stream, and hence on customer outcomes.
Jezz Humble, The Lean Enterprise

The down side of equating delivery speed with productivity? Optimizing for cycle time/speed of delivery by itself could lead to problems over the long term, because this incents people to think short term, and to cut corners and take on technical debt.

We’re writing better software, so we must be more productive

“The paradox is that when managers focus on productivity, long-term improvements are rarely made. On the other hand, when managers focus on quality, productivity improves continuously.”
John Seddon, quoted in The Lean Enterprise

We know that fixing bugs later costs more. Whether it’s 10x or 100+x, it doesn't really matter. And that projects with fewer bugs are delivered faster – at least up to a point of diminishing returns for safety-critical and life-critical systems.

And we know that the costs of bugs and mistakes in software to the business can be significant. Not just development rework costs and maintenance and support costs. But direct costs to the business. Downtime. Security breaches. Lost IP. Lost customers. Fines. Lawsuits. Business failure.

It’s easy to measure that you are writing good – or bad – software. Defect density. Defect escape rates (especially defects – including security vulnerabilities – that escape to production). Static analysis metrics on the code base, using tools like SonarQube.

And we know how to write good software - or we should know by now. But is software quality enough to define productivity?

Devops – Measuring and Improving IT Performance

Devops teams who build/maintain and operate/support systems extend productivity from dev into ops. They measure productivity across two dimensions that we have already looked at: speed of delivery, and quality.

But devops isn't limited to just building and delivering code – instead it looks at performance metrics for end-to-end IT service delivery:

  1. Delivery Throughput: deployment frequency and lead time, maximizing the flow of work into production
  2. Service Quality: change failure rate and MTTR

It’s not a matter of just delivering software faster or better. It’s dev and ops working together to deliver services better and faster, striking a balance between moving too fast or trying to do too much at a time, and excessive bureaucracy and over-caution resulting in waste and delays. Dev and ops need to share responsibility and accountability for the outcome, and for measuring and improving productivity and quality.

As I pointed out in an earlier post this makes operational metrics more important than developer metrics. According to recent studies, success in achieving these goals lead to improvements in business success: not just productivity, but market share and profitability.

Measure Outcomes, not Output

In The Lean Enterprise (which you can tell I just finished reading), Jez Jumble talks about the importance of measuring productivity by outcome – measuring things that matter to the organization – not output.

“It doesn't matter how many stories we complete if we don’t achieve the business outcomes we set out to achieve in the form of program-level target conditions”.

Stop trying to measure individual developer productivity.

It’s a waste of time.

Everyone knows who the top performers are. Point them in the right direction, and keep them happy.

Everyone knows the people who are struggling. Get them the help that they need to succeed.

Everyone knows who doesn't fit in. Move them out.

Measuring and improving productivity at the team or (better) organization level will give you much more meaningful returns.

When it comes to productivity:

  1. Measure things that matter – things that will make a difference to the team or to the organization. Measures that are clear, important, and that aren't easy to game.
  2. Use metrics for good, not for evil – to drive learning and improvement, not to compare output between teams or to rank people.

I can see why measuring productivity is so seductive. If we could do it we could assess software much more easily and objectively than we can now. But false measures only make things worse.
Martin Fowler, CannotMeasureProductivity

7 comments:

StathisD said...

Function points is the most stupid thing ever invented. It's irrelevant to modern programming. FP counting applies only to COBOL like programming languages. If you write a dialog that inputs 10 fields in a record it's 10 points. If you write a function block with 100000 lines that does some strange calculations its ZERO FPs. FP's is NOT a measure of code. It doesn't work on web applications , it doesnt work with RDBMS's.

Anonymous said...

We can measure programmer productivity. It is not perfect, to be sure, but we will not get better without measuring the right things.
FPs are not mainstream, I agree. LOC has its limitations as mentioned, but _most_ product teams will work with the same language, coding style (dictated), and framework(s), so it can be used for measurement, and it is easy to automate. Yes, someone could write software with a lot of needless LOCs, but wouldn't/shouldn't the team leader call bull$$$$ on that behavior. It is unprofessional and irresponsible. More LOCs (almost always) means resources to store and execute the code.
Escaped errors (i.e., defects) is the other key measure both during development and post-release. We can measure how much time spent and LOCs changed to fix the defects, then factor them into the initial productivity measured for the delivered LOCs.
Finally, programming productivity should not measure the time to write the code. Ultimately, the code is the manifestation of all the thinking, analyzing, and designing that the developer must perform. If you are a traditionalist, you can include the design documentation and the code comment density as well.
I am not the world's greatest programmer, but I can hold my own. I am not afraid to measured. Don't be a Wally. ;^)

Brian Balke said...

Harvard Business Review had a fascinating article on Morningstar Tomato Products that described how to establish job descriptions through direct negotiation between employees. Compensation is derived from the intensity of an employee's involvement in generating value for the people around them. That article had a fascinating accounting of the escalation procedures in the event of a contract dispute.

This is relevant because it's much harder to write, document and disseminate libraries that improve the productivity of an entire team than it is just to spew out the same code again and again as we go from one job to another. It's only in monitoring customer relationships that we can properly encourage that kind of work.

In creating their culture, Morningstar eliminated management. There are still those involved in critical decision making, but they do not have fixed job titles, and can be supplanted when their agenda diverges from the needs of their peers. And when compensation differentials arise, each individual can compare their network of associations with their peers, and decide whether they would benefit in restructuring their relationships to increase the perception of their value added. If not - well, maybe they deserve more!

Frank Vogelezang said...

Jim,

I totally agree with your final observations:
- Measure things that matter – things that will make a difference to the team or to the organization.
- Use metrics for good, not for evil – to drive learning and improvement

Individual programmer productivity is violating both observations. It will only make team members compete for the simplest tasks, rather than bring the whole team to a higher level. The best programmer is usually the one with the worst time/product ration - however measured - since he is doing the toughest jobs and helping less gifted team members.

Team productivity, if used for learning and improving, is a good thing to do as part of the team's own retrospective.

In a lot of cases function points can really serve as a product measure, but you have to watch out for your first observation. Efstatios illustrated that with some examples. If you write calculation or verification modules don't use function points, because they are not meant to measure that. If you build web applications, it is better to use COSMIC function points, because they are better equipped for that.

So look at the things that matter, and then choose your measures wisely . . and you will never end up measuring individual programmer productivity.

Serambi Kita said...

Nice article... very inspiring. But at the end, i still look for the best formulation to measure IT People Productivity (in general term). Thanks anyway...

Ben Thompson said...

Jim, nice summary of an interesting problem.

Agree with you that you can measure developer productivity.

GitPrime published some interesting data about work patterns of software engineers: it turns out that Prolific engineers take small bites there's a direct correlation between the frequency someone checks in code and their overall impact.

Thought you might be interested.

Jim Bird said...

@Ben Thompson,
Cool. Thank you for the data on check-in frequency and size. Makes sense.

Site Meter