Gergely Orosz, author of The Pragmatic Engineer Newsletter, recently published the article Measuring Developer Productivity: Real-World Examples, co-authored by Abi Noda, CEO at DX and co-creator of the DevEx framework. The article analyses Noda’s survey of the engineering metrics used across a broad spectrum of well-known tech companies. Noda found that rather than wholesale adoption of DORA or SPACE metrics, the indicators in use included many context-specific qualitative and quantitative metrics. Noda and Orosz provided guidance for defining such metrics by working backward from the outcomes sought by enablement teams.
Noda wrote that he "interviewed teams responsible for measuring developer productivity at 17 well-known tech companies." For the article, Noda and Orosz focused on four scales of organisation size, selecting Google at 100K staff, LinkedIn at 10K, Peloton at <10K, and scaleups such as Notion and Postman in the sub-1000 category. The indicators in use range from typical PR and CI measurements to methodically selected metrics at Google.
Noda observed that in practice "DORA and SPACE metrics are used selectively," rather than being adopted wholesale. He wrote that while the survey revealed that "every company has its own tailored approach," he believed that "any size (of organisation) can adopt Google's overall philosophy and approach." Noda wrote that Google's approach involves selecting indicators based on three classes of measurement relating to "speed, ease and quality." He wrote that "tensions" exist between these three dimensions, "helping to surface potential tradeoffs."
Noda wrote that Google uses "qualitative and quantitative measurements to calculate metrics," as this provides "the fullest picture possible." Noda cited a range of information acquisition approaches used by Google, from satisfaction surveys to "measuring using logs." He wrote:
Whether measuring a tool, process, or team, Google’s Developer Intelligence team subscribes to the belief that no single metric captures productivity. Instead, they look at productivity through the three dimensions of speed, ease, and quality.
Similarly, Noda and Orosz described how LinkedIn uses a combination of quarterly developer satisfaction surveys with quantitative metrics. Noda wrote about a range of metrics used by LinkedIn’s Developer Insights team to drive its mission to reduce "friction from key developer activities." The indicators used by this team include CI stability metrics, deployment success rates, as well as P50s and P90s for build times, code review response times, and the time for a commit to go through the CI pipeline. Noda described how the team bolsters such quantitative metrics with qualitative insights, using the example of comparing build time with "how satisfied developers are with their builds." Objective numerical metrics are also de-noised by LinkedIn, using a "winsorized mean":
What a winsorized mean does is it says: figure out your 99th percentile and instead of throwing away all the data points that are above the 99th percentile, clip them. So if your 99th percentile is a hundred seconds and you have a data point that’s 110 seconds, you cross out 110 and you write a hundred, and now you calculate your (winsorized) mean that results in a more useful number.
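For illustration, here is a minimal Python sketch of the one-sided winsorization Noda describes; the function name and sample data are assumptions for this example, not LinkedIn’s implementation:

```python
import numpy as np

def winsorized_mean(samples, upper_percentile=99):
    """One-sided winsorization: clip values above the chosen percentile
    to that percentile, then average - outliers still count, but only
    at the cap, instead of being discarded."""
    values = np.asarray(samples, dtype=float)
    cap = np.percentile(values, upper_percentile)
    return float(np.clip(values, None, cap).mean())

# Toy build times in seconds, with one pathological outlier. Real CI
# telemetry would have thousands of points, making a p99 cap meaningful;
# with only 8 samples we cap at p90 so the effect is visible.
build_times = [42, 38, 55, 47, 61, 39, 44, 900]

print(np.percentile(build_times, 50))    # P50 build time: 45.5s
print(np.percentile(build_times, 90))    # P90 build time: ~313s
print(np.mean(build_times))              # plain mean: ~153s, skewed by the outlier
print(winsorized_mean(build_times, 90))  # winsorized mean: ~80s
```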
Noda wrote that Peloton, representing the 3-4K-person organisation, has evolved from initially capturing "qualitative insights through developer experience surveys" to also incorporating quantitative metrics. For instance, objective proxies for velocity, such as lead time and deployment frequency, are used to measure speed. He wrote that Peloton’s metrics also include qualitative engagement scores, time to restore services, and code quality indicators covering "PRs under 250 lines, Line Coverage and Change Failure Rate."
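As a rough illustration of how such code-quality indicators might be computed, consider the following Python sketch; the PR records and field names are hypothetical, not Peloton’s actual tooling:

```python
# Hypothetical PR records; in practice these would come from a
# source-control or CI API (the field names here are invented).
prs = [
    {"id": 101, "lines_changed": 120, "caused_incident": False},
    {"id": 102, "lines_changed": 640, "caused_incident": True},
    {"id": 103, "lines_changed": 85,  "caused_incident": False},
    {"id": 104, "lines_changed": 230, "caused_incident": False},
]

# Percentage of PRs under 250 changed lines - a proxy for small,
# reviewable, incremental changes.
small_pr_rate = sum(pr["lines_changed"] < 250 for pr in prs) / len(prs)

# Change failure rate: the share of changes that led to an incident.
change_failure_rate = sum(pr["caused_incident"] for pr in prs) / len(prs)

print(f"PRs under 250 lines: {small_pr_rate:.0%}")        # 75%
print(f"Change failure rate: {change_failure_rate:.0%}")  # 25%
```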
Discussing smaller "scaleup" organisations such as Notion and Postman, Noda wrote that these often focus on measuring "movable metrics." He explained that a movable metric is a sensitive one that an enablement team can "move by impacting it positively or negatively with their work." An example is "ease of delivery." Noda explained that this metric reflects "cognitive load and feedback loops," and can be moved as it captures how "easy or difficult developers feel it is to do their job." Another common movable metric reported was the "percentage of developers’ time lost to obstacles" and friction. He explained that measuring friction is particularly powerful for leaders, as it has a direct financial impact:
This metric can be translated into dollars: a major benefit! This makes Time Loss easy for business leaders to understand. For example, if an organization with $10M in engineering payroll costs reduces time loss from 20% to 10% through an initiative, that translates into $1M of savings.
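The arithmetic behind that translation is simple enough to sketch; the function below just restates the figures from the quote, with illustrative names:

```python
def time_loss_savings(engineering_payroll, loss_before, loss_after):
    """Dollar value of reducing the share of developer time lost to friction."""
    return engineering_payroll * (loss_before - loss_after)

# Figures from the example above: $10M payroll, time loss cut from 20% to 10%.
print(time_loss_savings(10_000_000, 0.20, 0.10))  # 1000000.0, i.e. $1M saved
```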
Given the contextual nature of such engineering metrics, Noda recommends three steps for an organisation aiming to define its own indicators:
- Define your goals in a mission statement, explaining "why does your dev prod team exist?"
- "Work backwards from your goals to define top-level metrics" based on speed, ease, and quality
- Define "operational metrics" tied to "specific projects or objective key results," e.g. the adoption rate of a particular developer productivity-enhancing service
Noda pointed out that the selected metrics should be created with the dimensions of "speed, ease and quality" in mind. He illustrated this with an example: where the goal is to make it easy for developers to "deliver high quality software," the resulting metrics included "Perceived Delivery Speed," "Ease of Delivery" and "Incident Frequency."
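One hypothetical way a team might record this working-backward mapping from a goal to dimension-aligned metrics; the structure and the operational figure below are illustrative, not from the article:

```python
# Goal: make it easy for developers to deliver high-quality software.
# Top-level metrics chosen along the speed / ease / quality dimensions.
top_level_metrics = {
    "speed":   ["Perceived Delivery Speed"],  # survey-based
    "ease":    ["Ease of Delivery"],          # survey-based
    "quality": ["Incident Frequency"],        # from incident records
}

# Operational metrics tied to specific projects or OKRs, e.g. adoption of
# a productivity-enhancing service (hypothetical name and figure).
operational_metrics = {"internal_build_tool_adoption_rate": 0.62}

for dimension, metrics in top_level_metrics.items():
    print(f"{dimension}: {', '.join(metrics)}")
```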
Orosz and Noda’s article is a follow-up to Measuring developer productivity? A response to McKinsey, a previous collaboration with Kent Beck that challenged and examined McKinsey's Yes, you can measure software developer productivity. McKinsey's article proposed what it called "opportunity-focused" metrics, "to identify what can be done to improve how products are delivered and what those improvements are worth." The article included a discussion of developer productivity metrics to layer "on top of" DORA and SPACE. McKinsey’s piece included recommendations encouraging leaders to optimise for the efficiency of individual developer performance, an example area being "noncoding activities such as design sessions." The metrics proposed include tracking "contributions by individuals" and measuring "talent capability scores."
Warning of the dangers of measuring individual productivity rather than the outcomes delivered, Beck shared his experience of seeing such metrics used to "incentivize (with money & status) changes in the measures". He shared that while this can result in "behaviour change", it is also subject to gaming, incentivising "creative ways to improve those measurements." Beck and Orosz encouraged leaders to instead focus on measuring "impact" rather than "effort". Beck specifically recommended that such metrics should not be used for anything more than continuous-improvement feedback loops for those being measured. He also warned of the safety issues caused when metrics measuring individuals are misused beyond that:
Be clear about why you are asking & what your power relationship is with the person or people being measured. When the person with power is measuring the person without, you’re going to get distortions … Avoid perverse incentives by analyzing data at the same level as you collect the data. I can analyze my own data. My team can analyze its own aggregated data.
Noda also warned that when a CTO, VPE, or director of engineering is asked for developer performance metrics, it is better to ensure reporting happens at an appropriate level. Noda recommended selecting metrics representative of "business impact", "system performance", and "engineering organisation"-level "developer effectiveness", examples being project-level metrics, "user NPS" and "weekly time loss" to friction. Noda advised senior leaders:
In this case, my best advice is to reframe the problem. What your leadership team wants is less about figuring out the perfect productivity metrics, and much more about feeling confident that you’re being a good steward of their investment in engineering.
In their response to the McKinsey report, Orosz and Beck shared a poignant meme citing Goodhart’s Law as a reminder that "when a measure becomes a target, it ceases to be a good measure."