Sustaining Fast Flow with Socio-Technical Thinking

Summary

Nick Tune shares principles and practices from the fields of DDD and Team Topologies that leaders can apply to create high-performing teams and sustainable flow throughout their organization.

Bio

Nick Tune works with technology leaders to map out business and technology landscapes, architect systems, and build high-performing CD teams. DDD and Team Topologies are the core tools in his enterprise design toolkit. Nick is the author of Patterns, Principles, and Practices of Domain-Driven Design (2014), and Architecture Modernization: Product, Domain, and Team-oriented (2022).

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.

Transcript

Tune: My name is Nick. I'm going to talk to you about sustaining fast flow with socio-technical thinking. I think I know what all of those words mean, but let's find out. I'd like to start by asking you a question. Does this sound familiar to you about where you work now, or any companies you've worked at before? This is a quote from a recent conversation I had with a marketing leader: "Something has gone horribly wrong in this company. All we're asking for is 2 textboxes to be put on a webpage, and it's going to take 3 months. Why is everything taking so long here now?" I feel like this phenomenon is so common. Every time I see it, I think of how we go from being a Formula One car to an old banger. When we start something new, we've got a brand new product or project with a fresh canvas. Everything's new, and there's nothing existing in the way. We can deliver a fast flow of changes very quickly. Almost any company can deliver a fast flow of changes very quickly, at the start of something. Over time, some weird, mysterious phenomenon means that everything starts taking so long. What took a couple of hours at the start, putting a textbox on a webpage, is taking three months one year later. What happens in that period of time?

When Flow Drops Off, the Whole Company Is Affected

I think this is such an important phenomenon. I'm going to share with you one of my experiences. Back in 2017, I worked for quite a large company. I was based in London. The majority of the other teams were based in Indianapolis. We got invited over there for a big engineering offsite for a large majority, or a big number of the company's engineers working in this part of the business. Overall, the company was huge, tens of billions of dollars revenue and stuff. This was just one part of the business with probably about a couple of million dollars revenue, so still fairly big. The purpose was to get the engineers together to talk about what's working, what can be improved, to learn from each other. To pull out all the stops, trying to build this feeling and culture of how can we become a high performing engineering company. For me, it was also my first visit to the States, so I had big expectations, had never been there before. For many reasons, I was super looking forward to this trip to Indianapolis.

For the first day, we spent some time in the office, met some colleagues in Indianapolis face to face for the first time. It was cool. Then we went to the offsite, it was in a big hotel. There was a games room. There was like some VR stuff, Xboxes and PlayStations, lots of nice food. Then in the big room where the event was taking place, there was a space in the middle, a main floor. It wasn't really a stage, it was ground level. There was seating all around it, I think, probably 200 to 300 people. This event was all about engineering. How can we become better engineers? How can we share ideas? The keynote was being held by the CEO of that part of the business, that business unit. He didn't waste any time getting his point across about what he wanted to say that day. He just said, "We're facing a lot of problems in the company right now. We've grown, which is successful, but the products are unreliable. We've got lots of bugs. Customers are complaining. Customers are leaving. It's taking us so long to get any work done. What's going wrong here?" Then, in no uncertain terms, he called out the engineers. He's like, "You engineers, you're not doing a good-enough job. This is all your fault. All of these problems, it's your fault." I was pretty shocked by that. Then he went on to say software engineering is not a 9:00 to 5:00 job. You got to put in the hours and the effort to make sure work is done on time to a high quality with no bugs. That's how bad things can get when the work slows down, everything turns into an old banger, pressure across everyone in the company.

Sustainable Fast Flow at 7digital (2012)

It doesn't have to be that way. In 2012, before that experience with a big company, I worked for a smaller company called 7digital. We didn't have any fancy offices, no big engineering offsites. We worked in a basement in London in this building called Zetland House, in between Old Street and Moorgate. We didn't even have any daylight, yet in this environment I got to experience a real high-performing team. No frills, just smart, passionate people wanting to do good work. The key thing is the work and the speed were sustained over the course of years. There wasn't this big drop-off. When I started, there were 6 teams deploying to production up to 25 times per day. I think on average between 15 and 25 per day. In fact, on my first day, I paired up with a senior engineer. We did some pair programming. We did some TDD. He wrote a test, I wrote some code to make it work. Did that a few times. Then one click, deploy to production the work we'd done, just one button click. It took about 5 or 10 minutes to get that code in production. No long testing processes. For comparison, at the big company, it took two hours just to build the code base. In those two hours at 7digital, we'd implemented a whole new feature, deployed it all the way to production, and customers were using it. It doesn't have to be that way. Performance and fast flow don't have to drop off, they can be sustained.

Sustainable Fast Flow at Scale

I was curious. After 7digital, I wondered, this was quite a small company. It was going from startup to scaleup. Two years later, they were building more teams, but still, in general, quite a small company. I wondered, can this work in larger companies? Since then, I've worked on a government project in 2015, 2016 at HMRC, a very large organization. At that time, there were 50-plus teams, all deploying to production on a daily or a near daily basis. Since then, I've had the opportunity to work with supermarkets and travel companies, and I've seen companies in a variety of different industries who've all managed to have this high rate of flow and sustain it over time.

How to Achieve Sustainable Fast Flow

I feel that this wasn't a fluke, or by chance. I feel that there are consistent things these companies have done that have allowed them to have this sustainable fast flow. Firstly, incentives. I'm not talking about money or bonuses. I'm talking about the way people are encouraged to build products. The kind of messages leadership sets down around quality, learning, and not just working to get work done as quickly as possible. The second thing is decoupling, splitting different areas of the business, different parts of the software, different teams, so that work can be done in parallel. When the coordination costs get too high, when changes start interfering with each other, that's a huge blocker to flow, and it doesn't scale as your company grows either. Thirdly, I think platforming is one of the key things that enables sustainable fast flow, by taking all of the things that block engineers and slow them down, moving all of that stuff into a platform, taking it away from them, and making sure it can't block them. You just completely preclude those things from blocking your teams. I think platforms are especially important as your organization scales. Once you get beyond six or seven teams, I think platforms really start to shine. When you've got 50-plus teams, it is a huge differentiator. I think also underpinning all of these things has been socio-technical thinking, from engineers to leadership. There's a focus on how we work, how we treat people, how we think about the social impacts of all the things we're doing from incentives to architecture, to culture, and balancing that with the technical concerns of building software systems and building organizations. I think, really, the socio-technical thinking has been present in all of those companies.

Part 1: Incentivizing Sustainable Fast Flow

The first part, I'm going to talk about incentivizing sustainable fast flow. I'm going to talk about this first, because I think this is the most important thing. This touches on the culture, behavior, and leadership. Without those things, I don't believe it's possible to create sustainable fast flow. Here's a quote from a company I worked at previously. What do you think about this? If you started a new job, and you spoke to a senior person in the company, and they said to you, "People who go home on time here, don't become managers in this company." How would you feel about that? How about this quote? You're setting up a new team, or a few teams, and the CTO of the company, who works in a different country, comes over to visit your teams in person and introduces himself. He says, "I've got a reputation for shouting at people, but don't take it personally. Anyway, if I don't shout at you, the CEO will shout at you louder." How would you feel if you worked in a company where you heard something like this? Maybe you do work in a company where you hear things like this. It probably won't surprise you to hear that the same company was also going through a situation where the investors weren't happy. In fact, they'd said to the CEO, "We're getting very frustrated that this company never delivers anything. You've got six months to deliver this project that hasn't been going anywhere for two years. Six months, and it has to be done or we're going to implement some big changes, and you Mr. CEO won't be part of those plans for the future."

I think those behaviors are a big sign that your company has a flow-destructive culture, where you're not emphasizing or addressing the social aspects needed to sustain fast flow over a long period of time. Always a rush to hit some deadline, pressure to deliver, engineers being stolen from one project and moved onto another. Those engineers are called resources or development resources. They're just numbers in a spreadsheet. We'll take naught point five of this developer, put him on that project for three days a week, and then the other naught point five of him can work on that project for two days a week. If you see a lot of that, I think you've got a flow-destructive culture. Also, in this company, there was a single Jira workflow that the entire company had to use. We're talking tens of engineering teams building stuff in the cloud, stuff on-premises, legacy, APIs, embedded software. They thought they could have this one Jira workflow that every team had to use, and, of course, the CTO who enjoys shouting at people. These behaviors, in my experience, will not allow you to have sustainable fast flow in your organization, because they totally neglect the social aspects of building high performing teams.

Also, that company had this situation where one of the principal engineers in my team was contacted by a sales rep. He's got a customer who's very angry. The customer is saying this API is not working. The sales rep couldn't find the team that owned the API to ask them to fix it. It turns out, no one owned the API anymore. It had just been abandoned. Some team was part of a project to build it, then they all got moved to another project. No one was looking after the API anymore. It just became a mess over time from not being cared for. In my experience, if you neglect the social aspects, and you've got these flow-destructive behaviors, like a single Jira workflow, hitting arbitrary deadlines, rotating developers like resources, like numbers in spreadsheets, and you've got a CTO who likes shouting at people, you are incentivizing software that is not maintainable. Of course, if that software is not maintainable, over time, you're going to have a huge drop-off. It will be harder to make changes. The compile times will take longer. Testing will take up more of your time. You'll have more bugs in production, and flow in your company will be catastrophically blocked. If you're wondering, why is a webpage so difficult to update with a textbox? Look for these behaviors in your company. They probably caused a whole buildup of legacy.

It takes a change in the mindset of leadership to introduce good behaviors and practices and a culture that supports sustainable fast flow. I recently heard a quote from a CEO. He told me this. We were talking about the challenges in his company, and he said, "The engineers here have got OCD. They're always waffling on about technical debt and rewriting stuff. I wish they would just focus more on delivery." That there is an example of flow-destructive leadership. You're not incentivizing good engineering practices that make your systems easier to sustain and evolve. Very often in these companies, the leadership team who like shouting at people, putting in place arbitrary deadlines, not wanting to address technical debt, they're the same people who are always looking for some quick technical fix. "We need to modernize our systems, but let's try some agile framework. We'll hire some consultants and they'll give us all the answers that we need." Unfortunately, those things are not going to pave over your cracks, or if they do, it will only be a small fix. It really won't address the deep fundamental problems. I think the only quick fix is to go back in time and incentivize good behaviors that create sustainable software systems.

At 7digital, it was a completely different mindset. We were encouraged to do TDD and pair programming. We weren't forced to rush and hit any deadlines. We had a lot of time during working hours to learn, two days every month to learn something new, in addition to a bunch of other time for team get-togethers, retrospectives. I just felt so incentivized to work in a sustainable way. We were even told, go home on time every day, which is not something you hear at many tech companies. Those were the things that incentivized well-designed systems, one-click deployment, few bugs in production, high test coverage, and little downtime. It was no surprise that every team in the company had this same incentivization from leadership. They were all deploying multiple times per day, over a long period of time with no drop-off.

I think there are two things that I really want to call out here, which just do not work. If you see these behaviors in your company, these leadership behaviors, you need to stamp them out immediately. The first one is this idea that we can move people around teams. An engineer, 20% here, 50% here, it doesn't work. It doesn't make sense. It doesn't inspire motivation and purpose in people who just feel like cogs in a machine. They're constantly context switching, with no sense of ownership. The second one is the idea that if we have this standard process, or this standard Jira workflow, that will make us all effective. If someone moves from one team to another, then we've got the same process, so they will quickly be up to speed. It does not make any sense. When you're an engineer, and you move from one team to another, it takes a few hours to learn the Jira workflow. It takes months to build relationships with people, to understand the code base, to understand the domain you're working in. A Jira workflow is not going to make moving people around any faster. I think that good companies, they don't just maintain F1 car speed, they get better over time. These incentives encourage people to build quality systems, to learn, to improve how they work, to improve their knowledge, and that F1 car can turn into a spaceship.

Part 2: Decoupling for Sustainable Fast Flow

Part two is decoupling. This is where we try and identify independent parts of our business so we can make changes and teams can work in parallel without tripping each other over, or blocking each other. It's not always easy to decouple parts of our business, especially if you're an established company with lots of systems and existing teams. I was having a conversation recently, and a CTO had a different perspective on what this involves. He basically said, "We know our teams are organized in the wrong way and we're not working efficiently. Can you just tell us what our value streams are? Then we can re-org all of our teams, and we'll work much more effectively." Unfortunately, no, I'm not proposing any quick fixes here. It's not that simple. The way we organize our teams, and align teams and software, definitely has a big impact on flow. I agree with that. But big re-orgs overnight don't really work, because you can't re-org your software as easily as you can re-org your teams. Also, identifying the value streams which your teams are aligned to, that's not a simple easy thing either. You need to understand how your company works, and get into the details to be able to make those decisions.

One of the key reasons for that is the relationship between how teams are organized, our team topology, and our software architecture. There's a concept called Conway's Law. Basically, the communication in your organization will be mirrored in the architecture of your software system. If you organize your teams in a certain way, your architecture will start to mirror your teams. That's a natural thing. Teams will organize their software in a way which makes it easiest for them to get their work done. Easiest is often not having to be blocked or depend on other people or other parts of the system. Over time, the way we shape the architecture then shapes us. If we want to re-org our teams in the future, we can't easily re-org our software architecture and refactor it, because refactoring code, moving responsibilities around network calls, it's very difficult and expensive to do that. The way we shape our software systems impacts how we can change our teams in the future. It's important to try and get the boundaries right. It's also important to have the flexibility. Again, that's where the incentives play a big part. If teams are incentivized to architect software systems well, maintain high quality, when you do want to change your teams, it's going to be much easier to do that. You'll get much more flexibility from your systems.

The approach I recommend is to understand your business domains, different parts of your business. Focus on a specific topic, or subject, or area. Basically, in each business domain, we try to understand, what are the user needs in this business domain? What expertise can we build for developing capabilities in this domain? If we organize our teams around true business domains, we'll be organizing teams around cohesive and related business concepts, parts of the product that change together, and that will naturally mean fewer dependencies. We want both our teams and their architecture to be aligned with our business domains. Business domains are things like tax calculation, journey planning, discovering new treatments, booking appointments. These are different areas of different businesses, and they're conceptually domains.

I've got an example here, of where identifying better domain boundaries had a big impact on the company. This is a company in the experiences industry. The problem they were having was a common problem. When one team makes a change, they're having to coordinate those changes in other teams. This was a frequent phenomenon. A lot of the work they were doing required multiple teams to all make changes to their parts of the system. They also had a manual process which involved 30 people, lots of Excel spreadsheets, emails, handing things over. It was taking up a lot of people's time. The company wasn't huge, it's like 30 people, that's a decent chunk of the company all involved in this manual process, which is slowing them down. After a bit of time, we realized there's a hidden domain here. There were different parts of this domain which have been scattered around, owned by different teams in different parts of different systems, different IT systems, and that's causing this problem. By decoupling those parts of the domain from the systems they're currently coupled to, and consolidating them into a single domain with a single code base, and organizing a team around that, changes will be much easier. That manual process can easily be automated in a centralized place.

One of the tools I recommend for this is the core domain chart, to map out the different domains at your company, and identify which domains allow your company to differentiate itself, to get some advantage in the market. In this example, we realized that step one was consolidating that hidden domain and reducing all of that manual complexity, even though it was only a supporting domain. It wasn't a big differentiator, but it was a big cost, a big blocker to changes. Consolidating the domain, simplifying all of that complexity and centralizing it, would make it much easier to add big improvements in their core domains, which would come after that, maybe three to six months later. The timelines weren't particularly clear, but it was on the scale of months. Thinking strategically about your core domains, and optimizing for flow in your core domains, that's definitely something I recommend you do.

There were other benefits. When you align your teams and your architecture with business domains, teams have a sense of purpose. They're working in a specific area. It's their job to identify value, and develop capabilities and own the technical solution. It's very motivating when you're solving the whole problem, and not just taking requirements. The team can also gain expertise in that part of the system. They can start to gain lots of domain knowledge and become more valuable to the business as they understand more clearly the problem that they're solving, and they can propose solutions and ideas. That's helped by working closely with domain experts. It also incentivizes teams. If they're going to own something and be responsible for the choices they make, that just incentivizes good long-term behaviors to keep the code sustainable and aligned to the business. On the technical side, in addition to the sustainability, the code will be much more closely aligned to the domain. When your business talks about features and concepts in your business, the code will match the conversation they're having, and it will be much easier to translate requirements into code because it's all using the same language. Also, there are the low coupling benefits. Parts of your business concepts that change together will live together in the same code base owned by a single team, so there are fewer dependencies and a lot of those coordination problems are reduced or even go away.
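One rough way to make the coordination problem visible is to count how many changes needed more than one team. The following is a minimal Python sketch under stated assumptions, not something shown in the talk: the module names, team names, and the `cross_team_ratio` helper are all hypothetical, and the change history is made-up data.

```python
from typing import Dict, List, Set


def teams_touched(change: List[str], owner_of: Dict[str, str]) -> Set[str]:
    """Return the set of teams that own at least one module in a change."""
    return {owner_of[module] for module in change}


def cross_team_ratio(changes: List[List[str]], owner_of: Dict[str, str]) -> float:
    """Fraction of changes that need more than one team to coordinate."""
    if not changes:
        return 0.0
    cross = sum(1 for change in changes if len(teams_touched(change, owner_of)) > 1)
    return cross / len(changes)


# Hypothetical ownership: one business concept scattered across two teams.
scattered = {
    "pricing_rules": "team_a",
    "pricing_export": "team_b",
    "booking": "team_b",
    "search": "team_c",
}
# Same modules after consolidating the pricing concept under a single team.
consolidated = dict(scattered, pricing_export="team_a")

# Hypothetical change history: each entry lists the modules one change touched.
changes = [
    ["pricing_rules", "pricing_export"],
    ["booking"],
    ["pricing_rules", "pricing_export"],
    ["search"],
]

print(cross_team_ratio(changes, scattered))     # share of changes needing coordination before
print(cross_team_ratio(changes, consolidated))  # share after consolidating ownership
```

With this made-up data, half of the changes cross team boundaries while the concept is scattered, and none do once the parts that change together are owned by one team. Real histories will be messier, so treat the ratio as a conversation starter rather than a target.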

I talked about empowering teams and purpose, and teams having ideas themselves about the product itself. There's a survey from Alpha UX, product manager insights, and they found that the best product ideas come from the whole team brainstorming. If you want to learn more about this, check out "Inspired" by Marty Cagan, and the work of Melissa Perri. These are people who are experts in product management, they're not engineers. They're saying that giving teams ownership and empowerment actually builds better products. If you want to know how to identify true domain boundaries, event storming is a great technique for collaboratively mapping out your business. I use this all the time. I highly recommend it. Domain message flow modeling is also useful to design end-to-end business processes, how your different domains collaborate to fulfill those user journeys. Then from "Team Topologies," Matthew and Manuel have created independent service heuristics, and some workshop formats, and a bunch of tips and ideas and clues for identifying true value stream boundaries. Definitely recommend all of those techniques. There are some links here to find out more.

Part 3: Platforming for Sustainable Fast Flow

Part three, I want to talk about leveraging platforms to enable sustainable fast flow. I think platforms are super important because you can take away responsibilities from engineering teams, put them in platforms, and make it so teams don't even have to think about these things. Therefore, the platform just prevents blockers from even existing in the first place. It's not easy to build good platforms. The platform itself can become a cause of blocked flow in an organization if it's not done really well, so a big effort, but also big risks. One of the examples I've got where a platform mindset wasn't in place is a financial services company in the UK. It's a very difficult experience being a developer in this company. I tried to go through the experience myself. I was given the company laptop. It took about three or four minutes to load up every day, just logging in. It was very slow, just generally, because it had so much stuff on there. I don't think anyone really thought of making it productive. It just seemed to be, lock everything down as much as possible. I know security matters, but there are better ways to do that. It doesn't need to be that extreme. It took me three tickets with the system operations team to get Docker installed. Developers weren't allowed to download packages from npm. Developers were forbidden from accessing production logs. They had to create a support ticket, and an ops team would extract the logs and give them a file on a file share which they could download. Very difficult to diagnose production issues. It's not surprising that after one year of a new project, nothing got delivered. They didn't even get one line of code to production. The crazy thing about this is developers on these teams probably spent 50% of their cognitive load, 50% of their time just dealing with all of this accidental complexity of local development machines, trying to get pipelines set up and code to production. That's 50% of more than 10 engineers' time spent adding no value to the product itself. That's crazy.

On a positive note, when I worked at HMRC, I got to see a very different experience. HMRC had a platform called MDTP, the Multi-channel Digital Tax Platform. The platform had a very slick paved road. Any team could get a new microservice pretty much all the way to production in a day or two. There was a security approval required to go to production, but, technically, you went into a file, added some configs, triggered some job, and it would spin up a whole new template for your application, with development, QA, and production environments. You got metrics, monitoring, logging, testing. You got everything you needed to build code, put it in production, and support it in production. The platform gave it to you. Developers spent nearly all of their time working on product and domain capabilities, and not fighting infrastructure and red tape. I've put some links to resources here about the MDTP. I just recommend these resources for learning more about the MDTP, and some of their experiences about building platforms.

Platforms are also a key concept in "Team Topologies." I think there's a couple of highlights here which get the point across. The goal of a platform is to enable stream-aligned teams to deliver work with autonomy. It's about giving them ownership and empowerment, to support applications in production. It's about taking cognitive load away from teams for things that they shouldn't be caring about anyway. If you build platforms that do all of these three things, you will have a greater chance of achieving sustainable fast flow, especially if you've got a good decoupling, and teams that are incentivized to maintain levels of quality. It's not easy to build good platforms, and it's not just a technical endeavor. This quote here is something I heard in a project a few years ago. Basically, I was meeting a director in the company, and we were talking about the platform. He said, "You have to work on their terms. You don't go to them with a solution, you go to them with the very problem, and they'll figure everything out for you. Don't try and do their job for them. If they don't want to talk to you on a certain day, you go away, and you come back when they're ready." That is not a good attitude for a platform team. If you're supposed to be helping developers, that's not how you want people to see you. That's not a good relationship.

Likewise, this one, in a traditional company with traditional operations, they were learning about team topologies and platforms. This idea that the platform team is there to enable development teams to be successful, to empower them, to support them. That just did not land well with the operations team. In this company, operations had always been responsible for delivering outcomes. They thought they were above the developers. This idea of building a platform to support the developers and make developers successful, that's like the king helping the peasants do their job. They could not comprehend the idea. If you're going to do platforms, and you want to do platforms well, and you want to create sustainable fast flow, those social aspects of building platforms are equally as important as all of the shiny technical stuff that enables you to spin up microservices in 10 minutes. Things like, firstly, the mindset. Platform teams have to see developers as their customers. How can we make engineering teams successful? That's got to be their mindset.

The platform team also has to reduce cognitive load. I worked with one company doing platforms, and each team had to create a huge Kubernetes file to get a new service set up. They had to learn all this stuff about Kubernetes, and how the company used it. That was slowing them down, not speeding them up. Good platforms minimize that. MDTP at HMRC, a few basic configurations for a new application, and that was it. There you go off to production with your code in just an hour or so. Developer experience again. Every interaction with the platform needs to be slick. Every suboptimal interaction slows down your teams and slows down your whole company. I think it's important to have a platform as a product mindset, things like surveys. The platform team going out to engineering teams. Are we serving you well? How can we improve the platform? What's your experience at the moment? In one company, we started doing joint retrospectives with the platform team and the engineering teams to improve that social relationship between them. I think it worked really well.

In terms of sustainable fast flow, I think there are some developer experience metrics to keep an eye on, which will give you a clue if you are working in a sustainable way. The first one I talked about: for a new service, how quickly can you put some new code in production? A couple of hours is good, one hour is good. Any longer than a week, that's a very bad experience. Likewise, how quickly does code go from a developer's laptop to production? We're talking minutes here. If it takes more than 10 minutes to put some code in production, you're not at the high level companies are operating at these days. Also, I think about the onboarding experience, how quickly until a new team member is productive? At 7digital, I deployed to production on my first day. I think that's the standard you should be aiming for in your company. Also, managing the number of tickets. If engineering teams are constantly having to create tickets with the platform team, that's not self-service, that's a potential big blocker to flow. Ideally, we don't want any tickets but we can accept one or two per month per team. Any more than that, it looks like the platform team is becoming a bottleneck. Don't tie your engineers up in red tape and bureaucracy. Roll the red carpet out for them, and this will give you sustainable fast flow.
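These rules of thumb can be written down as a simple check. The following is a minimal Python sketch, assuming you already collect these four numbers somewhere; the `DevExMetrics` type, its field names, and the `flow_warnings` function are hypothetical, and the thresholds are just the ones mentioned above (a week, 10 minutes, first day, two tickets per month per team).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DevExMetrics:
    new_service_hours: float         # time to get a brand new service into production
    laptop_to_prod_minutes: float    # time for a code change to reach production
    days_to_first_deploy: int        # day on which a new joiner first deploys (1 = first day)
    platform_tickets_per_month: int  # tickets one team raises with the platform team


def flow_warnings(m: DevExMetrics) -> List[str]:
    """Return a warning for each metric that misses the thresholds from the talk."""
    warnings = []
    if m.new_service_hours > 7 * 24:
        warnings.append("a new service takes longer than a week to reach production")
    if m.laptop_to_prod_minutes > 10:
        warnings.append("a code change takes more than 10 minutes to reach production")
    if m.days_to_first_deploy > 1:
        warnings.append("new team members do not deploy on their first day")
    if m.platform_tickets_per_month > 2:
        warnings.append("more than two platform tickets per month: possible bottleneck")
    return warnings


# Hypothetical teams, for illustration only.
healthy = DevExMetrics(2, 8, 1, 1)
struggling = DevExMetrics(400, 45, 10, 6)

for name, metrics in [("healthy", healthy), ("struggling", struggling)]:
    print(name, flow_warnings(metrics))
```

The check only flags the clearly bad end of each range; anything between "a couple of hours" and "a week" for a new service, for example, is left to judgment.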

Wrap-up: Socio-Technical Symbiosis

To wrap up, I'd like to propose looking to nature as a way to help think about balancing socio and technical needs. In nature, we have the concept of symbiosis, where two species live together and there's an association between them. For example, the gut microbes living inside us, we give them a home to live and to flourish, and they break down foods and help us to get energy from that food. There are different kinds of symbiosis. On one hand, there's mutualism, where two species coexist and help each other. On the other hand, there's parasitism, where one species lives off and harms the other. When we try and create organizations with sustainable fast flow, we need to have a mutualism approach where we're not constantly incentivizing bad behaviors that cause our software systems to become more unsustainable.

I presented three things which I think can help build sustainable fast flow, if you apply socio-technical thinking. Firstly, it's about incentivizing good long-term behaviors, like going home on time every day, and good technical practices. I think that's very important. Decoupling is important. Identify business domains, align your architecture and teams with those business domains. When you implement new features and concepts, you will have minimal dependencies and coupling between those things. You'll also get more empowered teams who build better products. Finally, especially in large companies, it's important to build good platforms. It's also very dangerous. It's important to have a socio-technical mindset, building a platform as a product, with platform teams having the right mindset.

Summary

Socio-technical thinking leads to socio-technical mutualism, leads to sustainable fast flow, leads to happier employees and better products. Here's three questions to ask yourself to help you take the next steps applying these ideas. What's stopping you applying socio-technical thinking in your company? What could you start doing differently tomorrow? Finally, what happens if you do nothing? Are things going to get better, or will they continue to get worse?

 

Recorded at:

Feb 03, 2023
