Key takeaways
Our efforts to improve software development face the question of what to focus on. Should we govern for predictability without concern for value, maximizing cost-efficiency without concern for end-to-end responsiveness? Or should we do the opposite: govern for value over predictability, and focus on responsiveness over cost-efficiency?
That seems to be an ongoing discussion in the software development world. Here are some example opinions:
- “True agility – which means adopting a posture that allows you to respond rapidly to changing market conditions and customer demands – conflicts with predictability.” (Steve McConnell)
- “IT orgs need to govern for value over predictability, responsiveness over cost-efficiency.” (Martin Fowler)
Looking through the Cynefin lens we’ll show that’s a false dilemma – in reality we can’t be predictable if we are not adaptable. In order to improve software development what we really need is to be predictably adaptable.
What do predictability and adaptability mean?
There are three types of systems – ordered, chaotic and complex. In ordered systems a desired output can be predicted and achieved via planning based on historical data and analysis. A chaotic system is one with complete randomness or a lack of connections between the components of the system.
Our definitions of adaptability and predictability will come from the perspective of Complex Adaptive Systems (CAS). CAS are systems that have a large number of components, often called agents, that interact and adapt or learn [3]. Complex systems are non-linear, and cause and effect in them can only be understood in retrospect. We can only understand the system by engaging with it. The components and the system itself coevolve, so we cannot predict the future.
Human complex adaptive systems are different because human beings are different – we have intelligence, many identities and intent. Humans are capable of shifting a system from complexity to order and maintaining it there in such a way that it becomes predictable [1]. As Murray Gell-Mann, the American physicist who received the 1969 Nobel Prize in Physics, famously said: “The only valid model of a human system is the system itself”. An example of a CAS is a team developing software for a client.
The following are general characteristics of a CAS [4]:
- Acquires information about its environment and its own interaction with that environment, at a certain level of granularity.
- Identifies and separates regularities in the information it gathers. The regularities are not merely recorded in a lookup table but are compressed into a set of patterns.
- The results obtained by using the patterns in the real world then feed back as new information.
Adaptability is the ability of a system to adapt to changing market conditions and customer demands in order to be better fit for its purpose.
It is useful to distinguish between two levels of adaptation. The first level is when a CAS acts in the real world in ways prescribed by its current set of patterns. That is direct adaptation.
On the next level, the CAS may change or replace its patterns when the prevailing ones don’t give satisfactory results. That is complex adaptation.
Predictability is the degree to which a correct prediction of a system's future can be made.
For a CAS to have a future it needs to successfully adapt on both levels – to apply existing patterns and change or create new ones when needed.
Hence, if a CAS is to survive, let alone thrive, there must be confidence that it will adapt to changes in the environment. In other words, it needs to be predictably adaptable.
How to be predictably adaptable?
In order to be predictably adaptable we need to know when to allow the entrained patterns of past experience to facilitate our actions and when to gain new perspective because the old patterns no longer apply. To do that first we need to know what type of problems we are dealing with - ordered, complex or chaotic. The Cynefin framework can help us do that.
Cynefin
The Cynefin framework is a sense-making framework. Sense-making is the way humans choose between multiple possible explanations of sensory and other input as they seek to confirm their perceptions against reality in order to act in such a way as to determine or respond to the world around them [5]. In other words: how do we make sense of the world so that we can act in it? With that comes the concept of sufficiency – how do I know enough so that I can act?
The framework is first about contextual awareness, followed by contextually appropriate action.
As can be seen from Figure 1, the Cynefin framework has five domains, four of which are named, and a fifth central area, which is the domain of disorder. There’s also a little fold beneath the Obvious to Chaotic boundary, to show how easily obvious solutions can cause complacency and a fall into chaos.
The right-hand domains are those of order, and the left-hand domains those of un-order. All four domains have equal value. The key is to understand which domain we are in and, as importantly, how we plan to move between them.
Fig. 1
In the Obvious domain the level of constraints is so high that everything is completely predictable. Obvious problems have known solutions. The decision model is sense, then categorize and respond based on best practices.
In the Complicated domain the constraints are governing – looser than in the Obvious domain, but still providing predictability within them. Complicated problems have predictable solutions but require expertise to understand, so there are multiple good practices. The decision model is sense, then analyze and respond accordingly.
The Complex domain is the domain of complexity theory. There are cause and effect relationships, but both the number of agents and the number of relationships defy categorization or analysis. Order emerges from the interactions of the agents and the system over time, and the form of that order is unique to each emergence because it is path dependent: history matters, different starting conditions give different outcomes, and the system is irreversible. Emerging patterns can be perceived but not predicted – a phenomenon called retrospective coherence. Here the constraints are enabling. The decision model is probe via numerous safe-to-fail experiments, sense the results and respond by either amplifying or dampening. Running safe-to-fail experiments (catalytic probes) is a form of learning and adapting to reality.
It is changing the present without a predefined series of stepping stones, but with a sense of rough direction and a willingness to change it.
In the Chaotic domain there are no relationships between cause and effect. There is a potential for order, but few can see it, and those who can rarely act on it unless they have the courage to do so. The decision model is to act quickly and decisively, then sense and respond.
The central domain of disorder relates to inauthentic response to a problem or situation, namely a lack of awareness of which domain you are in.
Cynefin is not categorization but a sense-making framework. In order to enable sense-making in the unique context of a new software development project the Cynefin framework is created anew. That is called contextualization. In categorization the framework precedes the data. In contextualization the data precedes the framework and the boundaries between the Cynefin domains emerge from the data as a social process as presented in the next section.
Contextualization
Here we will use the Cynefin framework to negotiate meaning between the Client context and the Capability context (the delivering organization’s knowledge, skills and capacity) when working on a new project. The value of the contextualized Cynefin framework is that it provides a unique shared language, reflecting the context, which the Client and the developers can use to discuss perspectives and possible actions. We will create two contextualized Cynefin frameworks – one for the Client and another for our Capability.
The first thing we need when creating a contextualized Cynefin framework is to write down on hexagon cards the user stories to be delivered. Then we describe the four Cynefin domains. Then the cards are positioned relative to each corner of the overall Cynefin space – without boundaries. Clusters are allowed to form but are not required. This can be done in person or electronically – but must be done socially. Discussion is encouraged while placing the cards.
The uncertainty about the nature of the client requirements defines the Cynefin domains for the Client’s context. We will use only three cases:
- The client knows exactly what they need. They have it already defined. (Obvious)
- The client knows what they need but not exactly. They’ll need some expert analysis in order to define it. (Complicated)
- The client has just a vague idea about what they need. They’ll have to explore several alternatives before arriving at a definition. (Complex)
The uncertainty about whether our development capability matches the requirements defines the Cynefin domains for the Capability’s context. Again three cases:
- We have all the knowledge and skills required to do the job. (Obvious)
- We have some of the knowledge and skills required to do the job. The rest we’ll research and analyze because someone somewhere has done it before. (Complicated)
- We have none of the knowledge and skills required to do the job. We haven’t done it before and we’ll need to explore alternatives and build knowledge. (Complex)
When all stories are placed, lines are drawn between the user stories that are clearly in one domain leaving a central area of disorder. Then the lines are “pulled in” to make the distinctions between domains clearer. If a story is in the central area of disorder or it lies on a line it is split into two or more user stories that can be moved to a particular domain. This process involves discussion as consensus must be reached.
Eventually a Cynefin framework is created anew for each of the Capability and Client contexts.
We will use the contextualized Cynefin frameworks for categorization in our unique context.
Categorization
We match the user stories from each Cynefin framework indexed by their domain. The resulting Capability / Client matrix represents the Uncertainty about the work items we'll need to deliver.
| Capability Context \ Client Context | Complex | Complicated | Obvious |
| --- | --- | --- | --- |
| Complex | High | High | High |
| Complicated | High | Medium | Medium |
| Obvious | High | Medium | Low |
If a story falls in the Complex domain of either framework, it is considered Highly uncertain. If its most uncertain domain is Complicated, it has Medium uncertainty. Only user stories which are Obvious in both frameworks have Low uncertainty.
| User Story | Capability Context | Client Context | Uncertainty profile |
| --- | --- | --- | --- |
| #1 | Complex | Obvious | High |
| … | … | … | … |
| #5 | Complex | Complicated | High |
| … | … | … | … |
| #14 | Obvious | Obvious | Low |
| #15 | Obvious | Complicated | Medium |
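The categorization rule above can be sketched in code. This is an illustrative sketch, not part of the original process; the function and names are our own, and the rule simply takes the more uncertain of the two domains, per the matrix above.

```python
# Map each Cynefin domain to a rank: the higher the rank, the more uncertain.
# Illustrative sketch; names are hypothetical.
UNCERTAINTY_RANK = {"Obvious": 0, "Complicated": 1, "Complex": 2}
PROFILE = {0: "Low", 1: "Medium", 2: "High"}

def uncertainty_profile(capability_domain: str, client_domain: str) -> str:
    """Return the uncertainty profile of a user story, given its domain
    in the Capability and Client contextualized Cynefin frameworks."""
    worst = max(UNCERTAINTY_RANK[capability_domain],
                UNCERTAINTY_RANK[client_domain])
    return PROFILE[worst]

# Examples matching the table above:
print(uncertainty_profile("Complex", "Obvious"))      # High
print(uncertainty_profile("Obvious", "Complicated"))  # Medium
print(uncertainty_profile("Obvious", "Obvious"))      # Low
```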
There are a couple of important issues to be addressed about uncertainty categorization.
First - it is important to remember that in a complex system every intervention changes the thing observed. The agents and the system coevolve. Hence the initial categorization of the initial set of user stories is likely to change as we work through the project.
Second – we need to follow the same contextualization/categorization routine whenever we add new user stories to the project scope.
We are addressing the above issues by establishing a regular feedback loop to continuously revisit and update the categorization of the user stories in progress and categorize the newly added user stories.
Staffing
We need different people to work on different uncertainty profiles. The difference is the level of uncertainty they can handle both from Capability and Client perspectives.
For instance, if the problem is Low uncertainty we need a junior person who can read a recipe book and follow it. They will need stable requirements and the exact tools and technologies from the book.
If the problem is High uncertainty we need a senior person, who can cope without a recipe, with changing requirements and with whatever is available in terms of tools and technologies.
When we need to adapt, we need more diversity in the system, because diversity gives us more adaptive capacity. It is no coincidence that in living organisms mutation rates increase under stress – more diversity means more adaptability. The requisite diversity is the optimal level of diversity for the context we are in; the concept comes from Cybernetics. In IT, one example application is the “Staff Liquidity” concept. Increasing diversity means increasing the number of participants and the variety of perspectives. That may make the organization less efficient, but it is an inefficiency the organization needs to have: over-focused on efficiency, the organization loses adaptive capacity.
Scheduling
Risk is the negative effect of uncertainty on objectives. Schedule risk is exposure to loss from a project not meeting its schedule objectives.
The fundamental dynamic for software development is shifting a High uncertainty problem from complexity to order and maintaining it there in such a way that it becomes predictable.
That’s why we schedule the High uncertainty work items first. They need exploration, and there is a high probability they will negatively affect the schedule. At the same time, there is a chance we get lucky and finish them much earlier than expected. We employ numerous coherent, safe-to-fail, fine-grained experiments in parallel (coherent to the individual experiment rather than to the group as a whole, although the group must accept that each is a valid viewpoint), which provide feedback on what works and what doesn’t. We manage the experiments as a portfolio. In Agile software development we call such experiments “spikes”.
We schedule the Low and Medium uncertainty items after the High items. Doing otherwise would mean sweeping uncertainty under the carpet.
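The scheduling rule – High uncertainty items first – amounts to a simple priority sort. A minimal sketch, with a hypothetical backlog:

```python
# Order the backlog so High uncertainty items are explored first.
# Illustrative sketch; the story identifiers are hypothetical.
PRIORITY = {"High": 0, "Medium": 1, "Low": 2}

backlog = [
    {"story": "#14", "uncertainty": "Low"},
    {"story": "#1",  "uncertainty": "High"},
    {"story": "#15", "uncertainty": "Medium"},
    {"story": "#5",  "uncertainty": "High"},
]

# sorted() is stable, so ties keep their original relative order.
schedule = sorted(backlog, key=lambda item: PRIORITY[item["uncertainty"]])
print([item["story"] for item in schedule])  # ['#1', '#5', '#15', '#14']
```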
Execution
Different types of problems require different management approaches.
For High uncertainty problems the role of management is to create and relax constraints as needed, and then dynamically allocate resources where patterns of success emerge. We manage what we can manage – the boundary conditions, the safe-to-fail experiments, rapid amplification and dampening. By applying this process, we discover value at points along the way. Again, we can take “Staff Liquidity” as an example of how this is done in Agile software development.
As stated above, for the High uncertainty problems we employ experiments which will provide feedback on what works and what doesn’t. We manage the experiments as a portfolio. Some of the experiments must be oblique, some naïve, some contradictory, some high-risk/high-return types. When we run several probes in parallel, they will change the environment anyway so the logic is that all probes should be roughly of the same size and take roughly the same time.
Then we apply sense-making to the feedback and respond by continuing or intensifying the things that work, correcting or changing those that don’t.
As things start to work in a consistent way, we apply constraints such as designing an architecture and selecting the technologies to be used.
Now we’ve successfully moved into the Complicated domain. If we lose certainty, we should move back into the Complex domain. If the level of volatility is such that we never really get certainty, then it is possible to stay in the Complex domain. That Complex to Complicated cycle is essential in software development. Complicated is for exploitation, Complex is for exploration.
We will illustrate the above dynamic with a real software development project. In Fig. 2 the project delivery rate is visualized using a Cumulative Flow Diagram (CFD). The X-axis is the timeline of the project. The Y-axis shows the cumulative count of work items in different states of the development process. On the CFD below, the flow of work is represented by the cumulative number of arrivals – the top of the light blue band (Backlog) – and the cumulative number of departures – the top of the blue band (Done). The steeper the blue band, the more work items delivered per day of work.
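The delivery rate a CFD shows can be recovered from the cumulative Done counts: the daily rate is simply the day-to-day difference of the departures line. A minimal sketch with made-up numbers:

```python
# Daily delivery rate = day-to-day difference of the cumulative Done line.
# Illustrative sketch; the counts below are hypothetical.
cumulative_done = [0, 0, 1, 3, 6, 10, 15]  # one value per day

daily_rate = [today - yesterday
              for yesterday, today in zip(cumulative_done, cumulative_done[1:])]
print(daily_rate)  # [0, 1, 2, 3, 4, 5] - the Done band gets steeper each day
```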
Metrics such as delivery rate should be looked at in context. Measuring development progress on the assumption of order is a fundamental error. High uncertainty problems should not be measured as if they are Low uncertainty.
Fig.2
The delivery rate follows a sigmoid or Z-curve pattern.
The Z-curve can be divided into three parts, or legs. There is empirical evidence that for roughly the first 20% of the time the delivery rate will be slow. For the next 60% of the time we go faster – the “hyper-productivity” period. And for the final 20% we go slowly again. The exact numbers may vary depending on the context, but the basic principle of the three sections holds.
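As a rough planning aid, that split can be turned into leg durations for a given timeline. A sketch, assuming the rule-of-thumb percentages quoted above (the function is our own, not from the article):

```python
# Split a project timeline into the three Z-curve legs (20/60/20 rule of thumb).
# Illustrative sketch; adjust the percentages to your context.
def z_curve_legs(total_days: int, slow: float = 0.2, fast: float = 0.6):
    """Return the durations in days of the slow start, the hyper-productivity
    period, and the slow finish."""
    leg1 = round(total_days * slow)
    leg2 = round(total_days * fast)
    leg3 = total_days - leg1 - leg2  # whatever remains is the final leg
    return leg1, leg2, leg3

print(z_curve_legs(100))  # (20, 60, 20)
```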
Each leg of the Z-curve is characterized by:
- Different work type
- Different level of variation
- Different staffing in terms of headcount and level of expertise
Only the second leg of the Z-curve is representative of the system’s capability. It shows the common cause variation specific to each system. The first and third legs are project specific and are affected by special cause variation.
The first leg of the Z-curve is the time when the developers climb the learning curve and orient themselves to the new project. This leg of the Z-curve can also be used for:
- Conducting experiments to cover the riskiest work items
- Innovation!
- Setting up environments
- Adapting to the client’s culture and procedures
- Understanding a new business domain
- Mastering new technology
The second leg of the Z-curve is the productivity period. If the project is scheduled properly the system should be like clockwork – sustainable pace, no stress, no surprises…
The third leg of the Z-curve is when the team will clean up the battlefield, fix some outstanding defects and support the transition of the project deliverable into operation.
We can see that initially the delivery rate was low. That is because the team started by tackling the High uncertainty requirements first, e.g. running experiments in order to select the technology to be used and the system architecture to be implemented. Those experiments, and the decision making associated with them, took a significant amount of time. When, in mid-September, the team managed to move the problems into the Complicated domain, the work rate increased dramatically.
It is important to mention that the initial backlog did not contain all of the project work. New work items were added as the project progressed. As mentioned earlier in the article, the team followed the same contextualization/categorization routine whenever they added new user stories to the project scope.
The diagram does tell a story and that story is one of massive upfront uncertainty turned into predictable performance. A perfect example of being predictably adaptable!
Conclusion
Looking through the Cynefin lens, we showed that the Predictability vs. Adaptability dilemma is a false dichotomy – what we really need is to be predictably adaptable, that is, to be able to adapt as predictably as possible.
References
1. Kurtz, C.F., Snowden, D.J. (2003). “The New Dynamics of Strategy: Sense-making in a Complex and Complicated World”. IBM Systems Journal.
2. Snowden, D.J., Boone, M.E. (2007). “A Leader’s Framework for Decision Making”. Harvard Business Review.
3. Holland, J.H. (2006). “Studying Complex Adaptive Systems”. Journal of Systems Science and Complexity 19(1): 1–8.
4. Gell-Mann, M. (1994). “Complex Adaptive Systems”. In G. Cowan, D. Pines, & D. Meltzer (Eds.), Complexity: Metaphors, Models and Reality (pp. 17–45). Reading, MA: Addison-Wesley.
5. Snowden, D.J. (2005). “Multi-ontology Sense Making: A New Simplicity in Decision Making”. Informatics in Primary Care 13(1): 45–54.
About the Author
Dimitar Bakardzhiev is an expert in managing successful and cost-effective technology development. As a Lean-Kanban University (LKU)-Accredited Kanban Trainer (AKT) and avid, expert Kanban practitioner and Brickell Key Award 2015 Finalist, Dimitar puts lean principles to work every day when managing complex software projects.
LinkedIn https://bg.linkedin.com/in/dimitarbakardzhiev
Twitter @dimiterbak