Key Takeaways
- Engaging teams, middle management (product managers, development managers, operations managers) and upper management (“head of” roles) in data-driven decision-making enables holistic continuous improvement of a software delivery organization
- Optimizing a software delivery organization holistically can be supported by establishing software delivery performance indicators at different granularities
- For each measurement dimension selected by the organization for optimization, indicators at three granularities can be created: team-level indicators, product-level indicators, and product line-level indicators
- The three indicator granularities have been used to facilitate data-driven decision-making at different levels of the organization: teams, middle management and upper management
- A dedicated process is required at each organizational level to regularly look at the indicators, analyze the data and take action
The Data-Driven Decision Making Series provides an overview of how the three main activities in the software delivery - Product Management, Development and Operations - can be supported by data-driven decision making.
Introduction
Optimizing a software delivery organization is not a process standardized in the software industry. A myriad of parameters can be measured, producing lots of measurement data. Getting the organization to analyze that data and act on it, however, is a difficult undertaking, and it is the most impactful step in the whole optimization process. If people in the organization do not act on the measurement data, no data-driven optimization of the organization takes place.
To advance the state of the art in this area, the "Data-Driven Decision Making" article series from 2020 provided a framework for how the three main activities in software delivery - Product Management, Development and Operations - can be supported by data-driven decision making. Since then, the 25-team organization running the Siemens Healthineers teamplay digital health platform and applications has gained several years of experience in optimizing itself using the framework. In this article, we present new and deeper insights into how a socio-technical framework for optimizing a software delivery organization has been set up and brought to the point of regular use.
Selecting measurement dimensions
In a software delivery organization, a myriad of things can be measured. These range from purely quantitative measurements that are easy to count, such as the number of bugs in the bug tracking system over a given period, to socio-technical process measurements that are difficult to quantify, such as outage duration (when does an outage really start and stop?).
Whatever the measurement, it is only effective if it leads to people taking regular action on it. Without periodically analyzing the measurement data and, based on the analysis results, making changes in the organization, the measurement process amounts to avoidable waste.
In a given organization, it might be difficult to agree on the measurement dimensions to use for optimizing software delivery. A guiding principle should be to only measure what the organization is potentially willing to act upon. Acting includes topic prioritization, process changes, organizational changes, tool changes and capital investments.
In recent years, some measurements have become popular. For example, measuring the stability and speed of a software delivery process using the DORA metrics enjoys popularity because it is rooted in a multi-year scientific study of what drives software delivery performance.
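As an illustration, several of the DORA-style speed and stability indicators can be derived from a list of production deployment records. The sketch below is a minimal, hypothetical example, not our actual implementation; the record fields and sample data are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median

@dataclass
class Deployment:
    commit_time: datetime   # when the change was committed (hypothetical field)
    deploy_time: datetime   # when the change reached production (hypothetical field)
    failed: bool            # whether the deployment degraded the service

def dora_indicators(deployments: list[Deployment], period_days: int) -> dict:
    """Compute simple DORA-style speed and stability indicators for one period."""
    lead_times = [d.deploy_time - d.commit_time for d in deployments]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "median_lead_time_for_changes": median(lead_times),
        "change_failure_rate": sum(d.failed for d in deployments) / len(deployments),
    }

# Example: three deployments over a 30-day period, one of which failed.
now = datetime(2024, 6, 1)
sample = [
    Deployment(now - timedelta(days=20, hours=30), now - timedelta(days=20), False),
    Deployment(now - timedelta(days=10, hours=8),  now - timedelta(days=10), True),
    Deployment(now - timedelta(days=2, hours=4),   now - timedelta(days=2),  False),
]
print(dora_indicators(sample, period_days=30))
```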
Another example is measuring reliability. Here, SRE is gaining popularity as a methodology employed by a growing number of organizations to run services at scale. It enables reliability measurements based on defined SLIs and SLOs and the tracking of the corresponding error budget consumption.
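For instance, error budget consumption for an availability SLO can be derived directly from counts of good and total events. The sketch below is a simplified model for illustration, not production SLO tooling.

```python
def error_budget_consumed(good_events: int, total_events: int, slo_target: float) -> float:
    """Fraction of the error budget consumed so far in the SLO period.

    slo_target is the SLO as a fraction, e.g. 0.995 for 99.5% availability.
    A result above 1.0 means the error budget for the period is exhausted.
    """
    if total_events == 0:
        return 0.0
    bad_fraction = 1.0 - good_events / total_events   # observed failure rate (1 - SLI)
    error_budget = 1.0 - slo_target                   # allowed failure rate
    return bad_fraction / error_budget

# Example: 1,000,000 requests, 2,000 of them failed, against a 99.5% availability SLO.
# The SLI is 99.8%, so 0.2% out of the allowed 0.5% -> roughly 40% of the budget is consumed.
print(error_budget_consumed(998_000, 1_000_000, 0.995))
```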
In terms of measuring value, there are movements in the software industry to establish leading indicators of value that teams can act upon, as opposed to the lagging indicators of value represented by revenue streams. Some organizations use hypothesis-driven development or the North Star framework for this. No industry-leading framework has emerged as dominant in this area yet.
As described in the article "Data-Driven Decision Making – Optimizing the Product Delivery Organization" from 2020, our organization running the Siemens Healthineers digital health platform and applications decided to measure value, speed, stability and reliability. In 2022, we added another measurement dimension: cloud cost. Why did we decide to measure these dimensions, and not others? The answers are in the table below.
| Measurement dimension | Reasoning for selecting the measurement dimension | Willingness to act on the measurement data | Measurement framework |
| --- | --- | --- | --- |
| Value | We sell subscriptions to digital services in the healthcare domain. This is a new market. Being able to measure the value of subscriptions to customers (not just the cost) enables us to steer product development using the new value trend feedback loop. | High willingness of the product owners to increase the value of subscriptions to customers in order to decrease subscription churn and encourage new subscriptions. | North Star framework |
| Cloud cost | The cost side of our business case depends on the recurring cost generated by using the Microsoft Azure cloud. Knowing the cloud cost by team, by product, by product line, by business unit and by cost center provides cloud cost transparency to various stakeholders on a daily basis, enabling well-informed decision-making (a minimal cost roll-up sketch follows the table). | High willingness of the budget owners and finance department to allocate the cloud cost properly to the involved product lines and cost centers. Further, high willingness to stay within the allocated budgets and not let the cloud cost inflate without prior agreement. | Homegrown FinOps |
| Speed | Speeding up feature delivery reduces the inventory cost, feature time to market, test automation debt and deployment automation gaps. | High willingness of the product owners and development teams to be able to release features every 2-4 weeks. | DORA framework |
| Stability | Deployment stability is important when increasing the speed of feature delivery to production. Higher frequency of feature delivery should lead to higher deployment stability (and not the other way around) because the size of deployed features is reduced. | High willingness of the development teams to deploy features in a way unnoticeable by customers using zero downtime deployments and other techniques. | DORA framework |
| Reliability | Providing a digital service that is reliable from the user point of view is fundamental to fostering long-term subscribers to digital health services. Quantifying important reliability aspects provides transparency into the overall service reliability in all production environments. | High willingness of the development and operations teams to improve reliability. | SRE framework |
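The cost roll-up mentioned in the cloud cost row above can be pictured as a simple aggregation over tagged cost records. The sketch below assumes daily cost rows tagged with team, product, product line and cost center; the tag names and values are made up for illustration and do not reflect the actual Azure cost export schema or our homegrown FinOps tooling.

```python
from collections import defaultdict

# Hypothetical daily cost rows; tag names and values are illustrative only.
cost_rows = [
    {"team": "team-a", "product": "product-1", "product_line": "line-x",
     "cost_center": "cc-100", "cost_eur": 412.50},
    {"team": "team-b", "product": "product-1", "product_line": "line-x",
     "cost_center": "cc-100", "cost_eur": 199.20},
    {"team": "team-c", "product": "product-2", "product_line": "line-y",
     "cost_center": "cc-200", "cost_eur": 87.10},
]

def cost_by(rows: list[dict], tag: str) -> dict[str, float]:
    """Aggregate daily cloud cost per value of the given tag."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[tag]] += row["cost_eur"]
    return dict(totals)

print(cost_by(cost_rows, "team"))          # team-level view for the teams
print(cost_by(cost_rows, "product"))       # product-level view for middle management
print(cost_by(cost_rows, "product_line"))  # product line-level view for budget owners
```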
So far, we decided not to measure any other dimensions. There are several reasons for this:
- We can optimize the organization in a way that is impactful to customers and the business using the current set of five measurement dimensions:
- a. Optimizing the subscription value makes our subscriptions more valuable to customers. The feature prioritization process becomes more evidently value-driven.
- b. Optimizing the cloud cost makes our business case stronger. The teams produce software architectures and designs that take cloud cost into consideration from the beginning.
- c. Optimizing deployment speed contributes to the ability to find a feature-market fit fast. The teams invest in proper feature breakdown, architecture with loose coupling, test automation, deployment automation etc.
- d. Optimizing deployment stability contributes to a good user experience on frequent updates. The teams invest in zero downtime deployments, API backward compatibility and other contributing technical capabilities.
- e. Optimizing reliability contributes to a consistently good user experience throughout the subscription duration. The teams invest in effective monitoring and incident response.
- We have seen that optimizing the current set of measurement dimensions implicitly drives other organizational improvements such as:
- a. Teams invest heavily in loose coupling, test automation and deployment automation by default. What is more, they constantly look for ways to optimize their deployment pipelines in these areas. This drives a very healthy engineering culture in the software delivery organization.
- b. The organization invests in continuous compliance capability by implementing a set of tools automating regulatory document creation. The tools run automatically on each deployment pipeline run.
- c. Teams regularly update their third-party framework and library dependencies.
- d. We have not yet seen the measurements impact security and data privacy practices, except that teams now run security penetration tests more frequently.
All in all, while we could introduce other measurement dimensions, the current ones seem to be sufficient to drive a reasonably holistic continuous improvement using a rather small optimization framework.
Setting up a measurement system
The measurement system we set up engages the three major organizational levels in data-driven decision-making: teams, middle management and upper management.
For each measurement dimension chosen above, we set up three indicator granularities: team-level indicators, product-level indicators and product line-level indicators. This is illustrated in the figure below.
[Figure: team-level, product-level and product line-level indicators per measurement dimension]
Following this, the data generated for each measurement dimension can be thought of as a cube spanned by three axes (a minimal data-model sketch follows the list):
- X axis: indicator granularity
- Team-level indicator
- Product-level indicator
- Product line-level indicator
- Y axis: the measurement dimension itself
- Value, or
- Cloud cost, or
- Stability, or
- Speed, or
- Reliability
- Z axis: organizational level
- Teams
- Middle management (product managers, development managers, operations managers)
- Upper management (‘head of’ roles)
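One simplified way to hold the data behind such a cube is a flat set of indicator records keyed by measurement dimension, granularity and scope, with the organizational level determining which slices are looked at. The sketch below is illustrative only; the field names and level-to-granularity mapping are assumptions, not our actual data model.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class IndicatorValue:
    dimension: str    # "value" | "cloud_cost" | "speed" | "stability" | "reliability"
    granularity: str  # "team" | "product" | "product_line"
    scope: str        # the concrete team, product or product line the value belongs to
    day: date
    value: float

def slice_cube(records: list[IndicatorValue], dimension: str, granularity: str) -> list[IndicatorValue]:
    """Return one face of a cube: a single measurement dimension at a single granularity."""
    return [r for r in records if r.dimension == dimension and r.granularity == granularity]

# Illustrative mapping of the Z axis: which granularities each organizational level
# typically works with (see the process table further below for the actual usage).
LEVEL_TO_GRANULARITIES = {
    "teams": ["team", "product"],
    "middle_management": ["product", "team"],
    "upper_management": ["product_line", "product"],
}
```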
This structure enables the entire product delivery organization to work with the data at the granularity that is right for the question at hand. Schematically, the data cubes are illustrated in the figure below.
[Figure: data cubes spanned by indicator granularity, measurement dimension and organizational level]
For example, a product manager analyzing speed (the blue cube above) can look at the delivery speed data across all relevant products using product-level indicators. They can go further and look at the speed data in aggregate using product-line level indicators. If the product manager has a technical background, they can look at the speed data in greater detail using the team-level indicators. The result of the analysis might lead to prioritization of technical features speeding up the delivery for products where the increased speed to market holds the promise of accelerating the search for a product-market fit.
Likewise, a team architect analyzing reliability (the red cube above) can look at the service reliability for all services owned by their team (team-level reliability indicators). They can then proceed by looking at the reliability of the services they depend upon that are owned by other teams (also team-level reliability indicators). When considering a new product to depend on, they can initially look at its aggregated historical reliability data (product-level reliability indicators). If the reliability at the product level seems reasonable, they can drill down to the reliability of the individual services making up the product (team-level reliability indicators). The result of the analysis might lead to data-driven conversations with the product owners about the prioritization of reliability features over customer-facing features.
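Continuing the illustrative data model sketched above, such a reliability drill-down could look like the following; the record set, product and team names are made up.

```python
from datetime import date

# Tiny made-up record set, reusing IndicatorValue and slice_cube from the sketch above.
records = [
    IndicatorValue("reliability", "product", "product-2", date(2024, 5, 1), 99.2),
    IndicatorValue("reliability", "team", "team-c", date(2024, 5, 1), 99.7),
    IndicatorValue("reliability", "team", "team-d", date(2024, 5, 1), 97.9),
]

# Product-level view first: how reliable has the product we might depend on been overall?
product_view = slice_cube(records, "reliability", "product")

# If the product-level trend looks reasonable, drill down to the services owned
# by the individual teams behind that product.
team_view = [r for r in slice_cube(records, "reliability", "team")
             if r.scope in ("team-c", "team-d")]
```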
Similarly, the leadership team consisting of a head of product, head of development and head of operations may analyze the cloud cost data. They can start by looking at the cost data at the product line level, analyzing the cost trends and correlating them with the corresponding value trends. For product lines where the cost trends correlate inversely with the value trends, the leadership team can drill down into the cost and value data at the product level. The result of the analysis might be conversations with the respective product managers about the projected break-even points of newer products and the revenue trends of mature products.
Setting up processes to act on data
Acting on the measurement data above across the organizational silos and levels requires dedicated processes to be set up. The processes are different for teams, middle management (product managers, development managers, operations managers) and upper management ("head of" roles). We found the following cadences and approaches useful in our organization.
| Organizational level | Data analysis cadence | Process description | Indicator granularities used |
| --- | --- | --- | --- |
| Teams | Generally every three months, sometimes more frequently | A dedicated one-hour meeting where the team looks at all available measurement dimensions (e.g. reliability, speed, stability, cloud cost and value), performs the data analysis together and derives action items | Team-level and product-level indicators |
| Product management | Roughly every three months | Together with the teams, see the cell above | Product-level indicators |
| Development management | Roughly every two months | In a scrum of scrums or a similar forum, the speed and stability data is analyzed, associated process changes are discussed and improvement conversations with other roles are prepared | Product-level indicators |
| Operations management | Roughly every month, aligned with SRE error budget periods | In an operations review or a similar forum, aggregated reliability data is analyzed, services with consistently low reliability in relevant SLIs (availability, latency, freshness etc.) are identified and improvement conversations with the service owner teams are prepared | Product-level and team-level indicators |
| Upper management | Every 4-6 months in a formal setting; more often in ongoing conversations and management standups | Relevant data points are used in ongoing conversations and management standups. Portfolio discussions and management offsite presentations draw on the relevant data. | Product line-level and product-level indicators |
A note on the data analysis cadence at the team level: although many teams chose to look at the indicator data every three months, some teams look at certain indicators more frequently. Specifically, the cloud cost data is sometimes watched on a daily basis, especially at times when a team is optimizing the architecture or design to reduce cloud cost. Additionally, build and deployment stability data is sometimes watched on a weekly basis when a team is working on stabilization after a bigger redesign.
Optimizing the organization
In this section, we present an example of how we managed to optimize the organization in terms of speed using the data-driven decision-making occurring simultaneously at different organizational levels.
Speed is a very relevant measurement dimension, as everybody wants to deliver features faster. When we started, there was a strong appetite to speed up across all our teams. At some point, the speed indicators were introduced and brought to maturity. This enabled the following data-driven workflows throughout the organization:
| Organizational level | Data-driven workflow |
| --- | --- |
| At the team level | The lead times between the pipeline environments became apparent. For example, in a pipeline made up of the environments Build agent → Accept1 → Accept2 → Sandbox → Production, the four mean lead times between the five environments could be seen at a glance: Build agent →time→ Accept1 →time→ Accept2 →time→ Sandbox →time→ Production. This gave the team owning the pipeline a new visual feedback loop into the speed of code movements between the environments. The team could optimize the speed by reducing the test suite runtimes (a sketch of the lead time computation follows the table). |
| At the middle management level | The product release lead times became apparent. For example, for a product X, graphs of the following form were drawn: 2021: Release1 →time→ Release2 →time→ Release3. The product and development managers could draw conclusions from the analysis of these graphs. |
| At the upper management level | The product line and product release lead times became apparent. The head of product and head of development saw the connection between the release cadences and the regulatory burden entailed with each release. They initiated a project between the product lines and the regulatory department to re-assess the regulatory framework in the organization. It turned out that the production of documentation required by regulations was largely manual. Semi-automation was possible. This insight led to an organization-wide decision to invest in the area. A couple of years in, the manual burden of producing release documentation required by the regulatory framework was greatly reduced. This paved the way to accelerating releases throughout the organization. |
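As referenced in the table, the mean lead times between adjacent pipeline environments can be derived from per-run timestamps of when a change reached each environment. The sketch below is a minimal illustration with made-up data, not our actual pipeline analytics.

```python
from datetime import datetime, timedelta

ENVIRONMENTS = ["Build agent", "Accept1", "Accept2", "Sandbox", "Production"]

# Hypothetical pipeline runs: for each run, when the change reached each environment.
runs = [
    {"Build agent": "2024-05-02T09:00", "Accept1": "2024-05-02T09:40",
     "Accept2": "2024-05-02T11:10", "Sandbox": "2024-05-03T08:00",
     "Production": "2024-05-06T10:00"},
    {"Build agent": "2024-05-09T14:00", "Accept1": "2024-05-09T14:35",
     "Accept2": "2024-05-09T16:20", "Sandbox": "2024-05-10T09:00",
     "Production": "2024-05-13T11:30"},
]

def mean_lead_times(runs: list[dict], environments: list[str]) -> dict[str, timedelta]:
    """Mean lead time between each pair of adjacent environments across all runs."""
    result = {}
    for src, dst in zip(environments, environments[1:]):
        deltas = [datetime.fromisoformat(run[dst]) - datetime.fromisoformat(run[src])
                  for run in runs]
        result[f"{src} -> {dst}"] = sum(deltas, timedelta()) / len(deltas)
    return result

for hop, lead_time in mean_lead_times(runs, ENVIRONMENTS).items():
    print(f"{hop}: {lead_time}")
```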
Summary
Optimizing a software delivery organization holistically is a complex endeavor. It can be supported well by providing measurement indicators at three granularities: team-level, product-level and product line-level indicators. This enables data-driven decision-making for the teams, middle management and upper management of the organization. Creating dedicated processes for analyzing the data and taking action at these three organizational levels facilitates holistic continuous improvement in the organization.
The Data-Driven Decision Making Series provides an overview of how the three main activities in the software delivery - Product Management, Development and Operations - can be supported by data-driven decision making.