Measuring Tech Performance: You’re Probably Doing It Wrong

Key Takeaways

  • While there's no silver bullet for software measurement, there are some good guidelines to follow and common mistakes to avoid 
  • Use measures that focus on outcomes, not output
  • Use measures that optimize for global or team outcomes, not local or individual outcomes
  • Many problems with measurement, and the resulting problems of incentives that come with misalignment, can often be traced back to having measures that don’t follow those two simple guidelines
  • When you focus on global outcomes, productivity follows

Hi there and welcome to the Road to Excellence! We’re all probably strapped in, trying our best to become high performers and measure our performance, and along the way identify just the right OKRs and KPIs and ABCs. 

Unfortunately, this can be a difficult thing to do, especially when we’re working in complex organizations, and often inheriting measurements from the Ghosts of Technology and Management Past.

While there’s no One True Metric That Matters (sorry), there are some great guidelines to follow and some all-too-common mistakes I see often. We outline the guidelines for good measurement in my new book, Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations, co-authored with Jez Humble and Gene Kim. These two simple rules are pretty straightforward, and they help us understand why the mistakes we often see in measurement fail so spectacularly.

Two Simple Guidelines for Software Measurement

  1. Use measures that focus on outcomes, not output, and
  2. Use measures that optimize for global or team outcomes, not local or individual outcomes. 

That’s it. So many problems with measurement, and the resulting problems of incentives that come with misalignment, can often be traced back to having measures that don’t follow those two simple guidelines. This is because metrics shape our incentives, which inform our behavior. So we need to start with the right measures.

Common Examples of Bad Software Measurement

Most attempts to measure performance in software have focused on productivity, and they usually ignore the two guidelines I outlined above. And now, for a look in the mirror and maybe a little bit of schadenfreude, let’s start with some “Don’ts” in measurement:

Lines of code. We’ve been attempting to measure productivity in terms of lines of code for a long time in software. Some companies even required developers to record the lines of code committed per week. (There's a good story about how the Apple Lisa team's management discovered that lines of code were meaningless as a productivity metric.) But honestly, we would prefer a 10-line solution to a 1,000-line solution to a problem. Rewarding developers for writing extra code just gives us bloated software that incurs higher maintenance costs and a higher cost of change. And what about the other extreme? Minimizing lines of code doesn’t work either. Accomplishing a task in a single line of code that no one else can understand gives us code that isn’t maintainable. Ideally, we should reward developers for solving business problems with the minimum amount of code – and it's even better if we can solve a problem without writing code at all or by deleting code (perhaps by a business process change). 

Lines of code as a productivity metric violates our guidelines by focusing on output and not outcomes. It just measures what people have created (lines of code) – often because it is easy to measure in automated ways – but usually ignores measures of what is being accomplished in terms of goals because those are much more difficult to articulate and measure. But isn’t that what we really care about? 

Velocity. Velocity is a metric that comes from the Agile movement: work is broken down into stories, and developers assign each story a number of points reflecting its relative effort. When the work is signed off by customers, the points completed in an iteration are the team’s velocity. Velocity, however, is a capacity planning tool for a team, and using it as a productivity metric has several flaws. First, velocity is a relative and team-dependent measure, not an absolute one. Teams usually have significantly different contexts, which makes comparing velocities inappropriate. (Seriously, don’t do this.) Second, when velocity is used as a productivity measure, teams are very likely to game it: they inflate their estimates and focus on completing as many stories as possible at the expense of collaboration with other teams (which might decrease their velocity and increase the other team's velocity, making them look bad). Not only does this destroy the utility of velocity for its intended purpose, it also inhibits collaboration between teams.
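
To make the mechanics concrete, here’s a minimal sketch in Python of how velocity is typically tallied; the story names and point values are made up for illustration, and the point is that the number only supports that team’s own planning.

# Velocity as a capacity planning tool: sum the points of stories accepted this iteration.
# Story names and point values below are invented for illustration.
completed_stories = {
    "add-login-audit-log": 3,
    "fix-invoice-rounding": 2,
    "migrate-report-export": 5,
}

velocity = sum(completed_stories.values())
print(f"Iteration velocity: {velocity} points")  # 10 points

# This number helps the same team plan its next iteration; comparing it across
# teams (or rewarding teams for inflating it) is exactly the misuse described above.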

Velocity as a productivity metric violates our guidelines by focusing on local measures and not global measures. This is particularly obvious in the second critique above: by (understandably) making choices to optimize their own velocity, teams will often not collaborate with other teams. This often results in subpar solutions for the organization because there isn’t a focus on global measures.

Utilization. Finally, many organizations measure utilization as a proxy for productivity. Unfortunately, the math just doesn’t work here, and many of us trapped under a huge to-do list will know what I’m about to describe: once utilization gets above a certain level, there is no spare capacity (or "slack") to absorb unplanned work, changes to the plan, or improvement work. This results in longer lead times to complete work. Queueing theory tells us that as utilization approaches 100%, lead times approach infinity – in other words, once you get to very high levels of utilization, it takes teams exponentially longer to get anything done. Since lead time – a measure of how fast work can be completed – is a productivity metric that doesn't suffer from the drawbacks of the other metrics we've seen, it's essential that we manage utilization to balance it against lead time in an economically optimal way.
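
If you want to see the math, here’s a small Python sketch using the classic single-queue (M/M/1) approximation, where average lead time grows in proportion to 1 / (1 - utilization). The service time of 1.0 is an arbitrary unit and real teams are messier than this model, but the shape of the curve is the point:

# Lead time blows up as utilization approaches 100% (M/M/1 approximation).
service_time = 1.0  # average time to finish one item, in arbitrary units

for utilization in (0.50, 0.80, 0.90, 0.95, 0.99):
    lead_time = service_time / (1 - utilization)
    print(f"utilization {utilization:.0%} -> average lead time {lead_time:.1f}x")

# utilization 50% -> average lead time 2.0x
# utilization 80% -> average lead time 5.0x
# utilization 90% -> average lead time 10.0x
# utilization 95% -> average lead time 20.0x
# utilization 99% -> average lead time 100.0x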

Utilization as a productivity metric violates our guidelines because it focuses on output and not outcomes. It also focuses on individual and not global measures. These come together in a bad way to look like we’re squeezing the most out of our people (yay work!), when in reality, we’re creating a situation where we just make it impossible to get work done (boooo burnout and no work!). 

Good Examples of Measurement in Technology

Not all hope is lost! We do have some good examples of how to measure productivity in technology. I’ll outline a few of them here.

Software. In the research presented in Accelerate, we use a measure of software development and delivery we call software delivery performance. It comprises four measures across two categories (a rough sketch of how they might be computed follows the list): 

  • Tempo: 
    • Delivery lead time: the time it takes to go from code committed to code successfully running in production
    • Deployment frequency: how often a team deploys code
  • Stability
    • Time to restore service: how long it generally takes to restore service for the primary application or service they work on when a service incident (e.g., unplanned outage, service impairment) occurs
    • Change fail rate: what percentage of changes for the primary application or service they work on either result in degraded service or subsequently require remediation (e.g., lead to service impairment or outage, require a hotfix, a rollback, a fix-forward, or a patch)
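
Here’s the rough sketch promised above: one way these four measures could be computed from deployment records, in Python. The record fields (committed_at, deployed_at, failed, restored_at) and the three-day window are hypothetical – this is an illustration, not the instrumentation used in the research.

# Hypothetical deployment records for one service over a three-day window.
from datetime import datetime
from statistics import median

deployments = [
    {"committed_at": datetime(2018, 5, 1, 9, 0), "deployed_at": datetime(2018, 5, 1, 15, 0),
     "failed": False, "restored_at": None},
    {"committed_at": datetime(2018, 5, 2, 10, 0), "deployed_at": datetime(2018, 5, 2, 12, 30),
     "failed": True, "restored_at": datetime(2018, 5, 2, 13, 10)},
    {"committed_at": datetime(2018, 5, 3, 11, 0), "deployed_at": datetime(2018, 5, 3, 11, 45),
     "failed": False, "restored_at": None},
]

# Tempo: delivery lead time (commit to running in production) and deployment frequency.
lead_time_hours = median(
    (d["deployed_at"] - d["committed_at"]).total_seconds() / 3600 for d in deployments
)
deploys_per_day = len(deployments) / 3  # 3 = days in the observation window

# Stability: change fail rate and time to restore service.
failures = [d for d in deployments if d["failed"]]
change_fail_rate = len(failures) / len(deployments)
restore_hours = [
    (d["restored_at"] - d["deployed_at"]).total_seconds() / 3600 for d in failures
]

print(f"Median delivery lead time: {lead_time_hours:.1f} hours")        # 2.5 hours
print(f"Deployment frequency:      {deploys_per_day:.1f} per day")      # 1.0 per day
print(f"Change fail rate:          {change_fail_rate:.0%}")             # 33%
print(f"Median time to restore:    {median(restore_hours):.1f} hours")  # 0.7 hours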

These measures follow our guidelines because they are both outcome measures and global measures – that is, they focus on getting software into production that can deliver value to the organization and customers, and not on small, localized pieces that can be gamed without helping the overall goal. We find that tempo and stability are achievable together, with high performers doing well at both in tandem. 

Our research has found that focusing on these measures drives organizational performance, with high performers being twice as likely to achieve profitability, productivity, and market share goals, as well as effectiveness, efficiency, and customer satisfaction goals. 

Database. Another great example comes in measuring database performance. This is tricky, because so often we (smartly, necessarily) get bogged down in the details: data in, data out, is the data safe, what does our telemetry look like, what aggregates are we choosing, etc. And all of these are necessary to get a good understanding of what our data and databases look like. However, if we want to think about database performance, we should take a step back and consider global measures and outcomes. 

This is where I love the guidance from Laine Campbell and Charity Majors in Database Reliability Engineering (where they dig into the details and the high level, by the way). They point out two key questions in their chapter on Operational Visibility: Is Your Service Up? And Are the Consumers in Pain? Here, they very smartly say that “end to end checks are the most powerful tool in your arsenal because they most closely reflect your customer experience” (p. 64). 
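
As a rough illustration (not anything from the book), here’s what a simple end-to-end check might look like in Python; the URL and latency threshold are placeholders:

# Probe the same path a customer would exercise, and judge health by the outcome
# (did it respond, correctly, and fast enough?) rather than by internal counters.
import time
import urllib.error
import urllib.request

CHECK_URL = "https://example.com/api/orders/health-check"  # hypothetical endpoint
MAX_LATENCY_SECONDS = 2.0

def service_is_up(url: str = CHECK_URL) -> bool:
    """Return True if the customer-facing path responds successfully and quickly."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=MAX_LATENCY_SECONDS) as response:
            ok = 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False
    return ok and (time.monotonic() - started) <= MAX_LATENCY_SECONDS

if __name__ == "__main__":
    print("service is up" if service_is_up() else "consumers may be in pain")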

I love their clear guidance and focus on these measures, because again, this is where you close the proverbial loop by bringing database and dev teams together to drive value and ensure quality software (application and database code) is delivered together. By the way, focusing on these measures is also what helps pull the database into the value conversation for your technology and offering. If your customers feel pain because your almost-cross-functional dev teams write applications that ignore your database team, who have no other option but to hand-deploy your database schema changes to keep up with application changes, maybe it’s time to kumbaya and start expanding that tent. 

Quality. A focus on quality is important to all organizations, and yet it is one of the most difficult measures to talk about universally. Why is that? This is because, in the words of software quality expert Jerry Weinberg, “Quality is value to some person.”[1] And as we all know, our organizations work in different contexts, serving different functions and different people. 

But “quality” – however you think of it in your context – is quite often a good productivity measure because it focuses on global measures and outcomes. We are usually thinking about our end users or customers, or the end state of our product. In our research, one example quality measure we have captured is the percentage of time spent on rework or unplanned work, including break/fix work, emergency software deployments and patches, responding to urgent audit documentation requests, and so forth. We found that the amount of time spent on new work, unplanned work or rework, and other kinds of work, was significantly different between high performers and low performers. Said another way, high performers were building quality in, and therefore had to spend less time fixing errors. Check out the figure below.
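
As a back-of-the-envelope sketch of how this measure can be tallied, here’s a short Python example; the hour figures are invented purely for illustration:

# Share of team time going to unplanned work or rework versus everything else.
time_spent_hours = {
    "new_work": 130,
    "unplanned_or_rework": 40,  # break/fix, emergency deployments, hotfixes
    "other_work": 30,           # audits, documentation requests, meetings
}

total_hours = sum(time_spent_hours.values())
rework_share = time_spent_hours["unplanned_or_rework"] / total_hours
print(f"Time on unplanned work or rework: {rework_share:.0%}")  # 20%

# High performers in the research spent a smaller share of their time here,
# because building quality in means less time fixing errors later.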

By focusing on these two things – global measures and outcomes – you’ll be well on your way to designing great measures to help you succeed. 

And when you focus on global outcomes like quality, productivity follows. Others have observed this as well. John Seddon said it well: “The paradox is that when managers focus on productivity, long-term improvements are rarely made. On the other hand, when managers focus on quality, productivity improves continuously.” 

About the Author

Nicole Forsgren is an IT impacts expert who shows leaders and tech professionals how to unlock the potential of technology change. She is a consultant, expert, and researcher in DevOps, IT adoption and impacts, and knowledge management. She is the co-founder, CEO and Chief Scientist at DevOps Research and Assessment (DORA), a venture with Gene Kim and Jez Humble. She is a member of ACM Queue's Editorial Board and an Academic Partner at Clemson University and Florida International University. Nicole holds a PhD in Management Information Systems and a Masters in Accounting, has published several peer-reviewed journal papers, and has been awarded public and private research grants (funders include NASA and the NSF).

References

[1] Weinberg, Gerald M. Quality Software Management, Volume 1: Systems Thinking. New York: Dorset House Publishing, 1992.
