Bits, Bots, and Banter: a Deep Dive into How Tech Teams Work in a DevOps World

Summary

Brittany Woods discusses team structures and how they can make or break healthy teams, insights into how teams are collaborating, strategies for leaders, and advice on dealing with T-shaped engineers.

Bio

Brittany Woods is a Senior Engineering Manager based in London. As an industry advocate for DevOps, Brittany has worked to increase developer efficiency while improving developer experience in several organizations globally.

About the conference

Software is changing the world. QCon London empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Woods: What I'd like to explore is how teams are working in a DevOps world. We all know the best practices that have been shared throughout everyone's journey really into DevOps and what the ideal implementation looks like. What's not really written very often is what teams are actually doing. Having worked across several companies in various roles and in various stages of maturity, I wanted to take that and share what I've seen across those roles: some things that I've witnessed to work, some things that I've witnessed that didn't work. I'd like to think of myself as well-versed in the DevOps movement, with that varied experience. The reality is that every company is different.

Their needs are different, so their practices are going to look different. You're going to hear a lot of opinions and a lot of stories that may contradict the things that I'm about to tell you, and that's ok. The key takeaway is more so that DevOps is a flexible methodology, and it can work regardless of the industry that you're in. Do I have opinions on what the best way is? Yes, we all do.

I'm a senior engineering manager with The LEGO Group. Primarily, I'm focused on building out their platform team that's responsible for their e-commerce platform, lego.com. Before that, I worked for an American tax company. I was responsible for server automation, SRE, stuff like that. Then before that, I was an IC. I was a DevOps engineer, an automation engineer, and built out these practices from an engineering perspective.

DevOps, Abridged

What I really wanted to do was just start from the beginning. I mentioned that DevOps is a flexible methodology, and so everybody's taken the bits and bobs that work for them and implemented those. I wanted to level set on where I'm coming from with this, just so you were aware. The name of the game is to do this through a blend of collaboration and automation, and redefine what you consider a team. That's how you really implement DevOps. You redefine what those ownership lines looked like in the past. Through all of this change, you shift to using more of those methodologies among your teams, and that's going to dictate the change, really, that you have to have in your culture.

Culturally speaking, you have to think of things as more of a shared responsibility model. You also have to start thinking about things from a team health and delivery perspective, either through enablement or automation, or some combination of the two. That also means that lines are going to start to blur. Lines between teams are going to blur as you mature in your DevOps practices. You have to either build, lead, or steer your organization in a direction that's going to withstand that type of change. As I heard in ClearBank's talk, a point that rang absolutely true was that you have to understand that this happens iteratively and not overnight. There's only so much change that you can pump into an organization in a really short timeframe.

Remember that this is continuous. This is continuous improvement that I'm talking about here, and that's what I'm considering to be the standard that I'm going off of for this talk. You also have to start prioritizing education and learning, because DevOps teams need to be broader in their knowledge. I'll talk more about that later, whenever I talk about the dreaded T-shaped engineering. Just know that DevOps teams have to be broader in their knowledge. Then, the plus side to all of this is that, in theory, all of this focus on culture and learning pays off in spades, because you start to see the efficiencies that DevOps brings.

You're probably also going to ask how this helps teams. This help comes in the form of autonomy. In my opinion, autonomy is like the biggest measure of increased efficiency when you do DevOps and do it well. Through prioritizing learning and facilitating that cultural change, you're opening teams up to be able to control that destiny. Those historical siloed specialties don't exist whenever you do this correctly. Then, done incorrectly, you also have to be aware that new silos can form. The advertised improvements of frequently shipping code and changes and iterative development, they happen. They happen when you're not mature, and they happen when you are mature. Keep that in mind.

Then we'll talk more about measurement in some later slides where we talk about ways that I've seen to effectively measure teams' maturity. Then the last thing that I want to level set on with my point of view of DevOps is how I view SRE and platform engineering. Some have said that platform engineering and SRE are the new DevOps. It's something new. It's different. It has nothing to do with DevOps. I think a bit differently on this topic. I think that DevOps is a methodology, an umbrella methodology, and that what we're seeing with site reliability engineering and platform engineering is actually those more specialized team frameworks that are allowing you to address some of the problems that people were seeing as they implemented DevOps.

Maybe an iteration, but I think it's still all part of the DevOps movement. While DevOps is pushing to address the way that your technology organization operates as a whole, SRE and platform engineering are helping to define how those traditional operations teams go from doing the thing to enabling the thing. I also wanted to call out that with the change to DevOps, it's all teams that are changing, not just development teams. I mentioned the shift from doing to enabling; that happens on the operations side.

DevOps - What It Isn't

I want to share some opinions about what I don't think DevOps is. The first one is an alternative for standardization. A pitfall that I commonly see whenever teams are doing DevOps, or whenever they're moving along their DevOps journey, is they want to enable that autonomy I talked about as being one of the biggest measures. In doing that, they want teams to control their own destiny. In controlling that destiny, they promote the idea that you can use whatever you want, however you want, to get the job done. I agree that that is important, but there is a way that you can empower teams to do things autonomously while still having guidelines or something in place.

You can do this through tooling, process, permissions, automation, some combination of all of these things, but the absence of them causes chaos. That's one important thing that I don't think DevOps is. The next thing is a replacement for ops. We have that. There's a thing that replaces ops, and it's called NoOps, but it's not called DevOps. We're not going to be talking about NoOps. Then there's two more here. It's not a single team or a job title. This is a bit of a controversial opinion. I mentioned earlier that DevOps is interpreted in many different ways. A bit of a running line among DevOps practitioners and leadership is that DevOps is a methodology, not a job title.

There are some companies that will have DevOps engineers, and they'll have DevOps teams for whatever reason, either because that's their interpretation, or because they needed to create a new job title for pay bands, or whatever they had to do to attract new talent in the DevOps world. There's nothing wrong with, and no shame in, having those. My point here is more that it's just important to remember that DevOps is a collective effort and not a set of teams or engineers. It's many teams, many engineers across an organization.

Then last one, you can't purchase DevOps, which means DevOps is not a tool. There are many tools that will help you on your journey to achieving DevOps, and those tools have largely been considered by the industry to be DevOps tools. There's not a single tool set that we're going to talk about today, or probably ever, that is going to give you successful DevOps out of the bag, or in the can, or however you want to say it. You can use any number of those tools in different combinations. In order to have a successful practice, you have to think about the things that I mentioned earlier, like culture and responsibility lines, focus on learning, appetite for change, all of those things that are less tangible than a tool.

What's a Healthy Team?

I also keep referring to a team. There are a few facets I consider for healthy teams. I just wanted to talk about those. This is particular to healthy DevOps teams, and it's going to fall outside of the more traditional metrics that we'll talk about later. The first one is that I'm a firm believer in having a psychologically safe environment. The TLDR here is, no brilliant jerks. What I mean by that is we have this long-held myth in engineering that it's ok to not have good communication skills. When I was an engineer, that seemed to proliferate across the organization: that it's ok that Beth or Bob don't have great communication skills, because look at all the things they're doing for the team. Look at all the engineering effort they're putting forward.

The problem with that is, if you have a brilliant jerk on your team, generally speaking, you're creating a non-psychologically safe environment for everyone else on that team, so they're not going to feel safe sharing their ideas or opinions. You're going to lose that potential for innovation that you would have typically got from them. They're going to shut down, probably in meetings, and feel like they aren't going to be heard. They're not going to feel like they have a space to safely learn. It's important to shut that down so your team can continue to tout the experience that they have as a collective group. I also consider a healthy team autonomous.

Healthy teams should be able to use the tools, patterns, and practices that are in place to do what they do with the guardrails set out for the organization. If they need to build infrastructure, they should be able to do that. If they need to manage that infrastructure, they should be able to do that in a way that doesn't give them extra or additional overhead. If they need to deploy code, they should be able to do that without waiting a week for an approval or waiting for somebody to look at it. They should be able to use the practices and patterns in place to safely deliver that to end users or customers or whoever their stakeholders are, by themselves, without having to consult with multiple other teams. Then, engineers should be empowered to solve tough engineering problems without cognitive overload. Really, this can be interpreted in many different ways, but what I specifically mean is twofold.

First of all, you can't have an everything team. We'll talk a bit about what everything teams look like. There should never be a team that is a catchall or a dumping ground for things that don't fit anywhere else. This is going to inherently lead to that cognitive overload, and teams being too overwhelmed to solve the tough problems that you would like for them to use their big brains to solve. Second, just because a team is on board to solve those tough problems, do not try to add the overhead of, if you've touched it, you've owned it. What I mean by that is DevOps and having healthy teams is about facilitating innersourcing. DevOps helps you use innersourcing to be able to source the best ideas from across the organization. If you're fostering a culture of, you've touched it, you've owned it, nobody is going to want to help solve those tough problems or collaborate in those ways.

I think it's also important for teams to have a clear and understood remit. Basically, what I mean by this is, if teams don't have a clear understanding of where they fit into an organization, they're not set up for success. They feel demotivated. Their morale suffers. Sometimes this can be simply by teams owning too much, so those everything teams I talked about. Or when a remit is too wide, that cognitive overload of trying to do everything for everyone on everything just becomes too much. It's also important for teams to understand not just the things that they'll be focused on or working on in their day to day, but also how that plays into the bigger picture for your organization.

How does that roll up to the strategic plan for your company? I have a story here. I have a mentor. I meet with her once a month. We used to meet much more often, but she's in America, and I'm here. She shared with me the importance of this big picture, as we call it, of showing a team where they fit into the organizational plan, showing them that the things they're working on matter. Before she talked about it in that way, I never really, truly thought about it. I was like, we all know that our work is helping the company. That's what we're here for. That's why they sign our paycheck. It's more than that. It shows a team where their impact is, especially for the backend engineering teams or the operations teams that don't really deliver new shiny features to production that a customer uses in a platform.

Showing them, this is how you're actually helping us achieve that strategic direction, is incredibly important. Consistent work rituals are also a sign of a healthy team. I wanted to call this out simply because I think it's important for teams to have a routine and understand the expectations laid out for them, and understand when things are going to be happening to them or with them. In my team right now, we do the standard agile practices. We have epic refinement sessions at the same time, biweekly. On the off weeks, we do backlog review. We do sprint planning and sprint retro at the same time every two weeks. We do our standups every morning. This routine has really helped bring consistency to the team's work, to their understanding of delivery, to their understanding of our objectives, and has helped with delivery.

It also becomes more important to keep this consistent when you start actually gathering metrics for these teams. We'll talk about that more. Then, finally, on the topic of measurement, make sure you're being transparent. Several talks have covered the importance of transparency. As a leader, I advocate for being as transparent as possible in every scenario. The last talk covered being transparent about promotion cycles and process. I think you should do that. You should also be transparent about the measures that the team is being held to, the measures that you yourself are being held to, and have them understand what is actually expected of them, and what things you are looking for to gauge that.

How Are Teams Really Working?

We're not here for all of the background stuff. We're actually here to talk about how teams are really working today. I mentioned that I have experience across different industries and different levels of DevOps maturity, which means that I have insights into a couple different ways that teams are working today. This isn't just going to be a reflection of how my team today at The LEGO Group is working. It's going to be a collective reflection of all of the types of teams I've worked in. I'll call out some pros and cons of all of them. I also want to highlight that as we go through these, I'll talk about the ways that we're most successful either by helping the teams, by helping our delivery, or the best-case scenario, having the best of both worlds.

Team Structures

Team structures is the first thing I want to talk about. There's three of these that I'm going to take you through. The first structure I want to talk about, I'm calling the traditional structure. Even when doing DevOps, some teams have stayed in the structure. More than likely, you have product squads full of developers that are rolling up to some VP somewhere, or some director somewhere. Then you have an IT team that's completely separate, a different part of the organization. Whenever you look within, so within your IT department, for example, that's where I have the most experience, because that's where I've worked most of my career. You'll have IT engineers. You'll probably have a cloud team. You'll have the network engineers, and all of the things in between.

The obvious benefit with the traditional structure was always that there were really clear lines of who owned what, but it also had really deep silos, because everyone was largely encouraged to stay in their lane. Even with that move to DevOps, some organizations chose to keep this model and roll with this structure. What they did was turn those specialized functions into more consultative roles. What that looks like is essentially, say you have network engineers. Those network engineers would consult with the product squads on the things that they needed to build a network in the cloud, for example, and help enable those product squads, ultimately leaving the product squad to own that long term.

Through this lending out of expertise, those product squads gain knowledge of how to autonomously support their application or their product while taking that do the right thing first, or do the right hard thing first, approach, because they're leveraging the expertise of that part of their organization. The downside of this model is it's traditional, first of all, so you can see how silos could potentially form. As a leader, or really as an organization, you have to be really clear on what you hope for these teams, because it's super easy to fall back into the old traditional processes and practices where all these teams in the middle own all of the bits, and the development teams build the bits. Because you're still in that structure, facilitating that can become quite hard.

The next structure I want to talk about, I'm calling single team DevOps. On the surface, it looks a lot like what I just showed you. I just changed the words. Then when you zoom out, it looks a lot different. All of those specialty areas, gone. You have a DevOps team full of DevOps engineers. Everybody needs those DevOps engineers. Everybody wants a piece of what they're doing. Everyone needs their support. They're usually equally responsible for anything from delivery questions to network questions. They're probably considered owners of lots of things, but specifically they'll be owners of infrastructure more than likely. That can be actual containers, that can be the processes, that can be the entirety of your cloud environment. It can be any of the above. As the owners of those things, they're going to be on the hook for requests from auditors, for requests for compliance.

Again, requests coming from all sides. These are the everything engineers I was talking about earlier. They're very tired. Remember how we talked about team health? This model doesn't really promote any of those health markers. It doesn't advocate for the avoidance of cognitive overload. Having a remit that broad, and that much weight on a single team's shoulders, makes every day feel like just staying above water, if you can. Innovation is not happening in this model. These teams are often going to see tremendous amounts of churn, in my experience. You'll just see engineers cycle through. They'll burn out usually, or they'll move to other teams that have a clearer remit.

Ultimately, this team feels a bit like a dumping ground for everything on all the sides. They're the keepers of that infrastructure. They're the experts in DevOps for your organization. They're the admins of the environment, which means that they're the bottleneck for a lot of requests. It's also worth mentioning that I've never seen this model work at scale ever. In all of the organizations that I've worked with, ones that start with this usually transition to something else. Keep in mind the organizations that I've worked with have been on the larger side. They're probably not like massive Microsoft, Google, but they're big.

This probably works slightly better whenever you have less demand on those DevOps engineers' time, with smaller teams and smaller spaces. I would still say that for the long term, this model doesn't really work. There's always going to be too much to know. There's going to be too much to learn, too much to manage, and no clear way to really build or empower teams around you, so you just continue to perpetuate that bottleneck behavior.

Then the last one that I want to talk about is empowered DevOps. I'm calling it peak autonomy as well, and we'll talk about why. In this model, you have your product squads. You'll notice in the middle you now have platform engineers and site reliability engineers. With this model, your IT department is no longer made up of specialty areas, but it's also not made up of just a DevOps team doing all of the things for everyone anymore, either. The platform engineers, or the intent behind the platform engineers, is to focus on building platforms and patterns of practice that can be used by the product squads to enable them to do the things they need to do. That's those guidelines and guardrails I was talking about.

Through this enablement, you achieve the autonomy of those product teams in a safe way. I also want to add the caveat that you can have several platform teams in an organization. I know platform is singular, but you can have multiple platforms. There's likely never going to be a single platform that you provide for product squads, because just as you had many concentrations in the traditional model where you had that network expertise, you're also going to need to provide platforms to accomplish some of those tasks, whether that's modules in Terraform, for example, or something else. You're also going to have site reliability engineers as part of this. This too can have many different approaches depending on who you are, what company you're in, who you talk to.

For the sake of the day, we're just going to say that these SREs are responsible for the reliability of the services broadly. These SREs are going to ensure a solid mean time to acknowledge incidents. They're going to make sure that your mean time to resolve those incidents is generally getting better. Then they promote practices across the product squads that bring more reliable services. Things like observability, things like defining error budgets, SLIs and SLOs, all of the things that you do as an SRE. To put it simply, these SREs are there to help promote operational maturity across teams. They can do that in any number of ways. They can be embedded or they can be more consultative.
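To make the SRE measures mentioned above a bit more concrete, here is a minimal Python sketch of mean time to acknowledge, mean time to resolve, and the remaining share of an error budget. Nothing here comes from the talk itself: the incident timestamps, the request-based SLO of 99.9%, and the function names are all assumptions chosen for illustration, and real teams would normally pull these numbers from their incident and monitoring tooling.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: (opened, acknowledged, resolved).
INCIDENTS = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 9, 4), datetime(2024, 3, 1, 10, 0)),
    (datetime(2024, 3, 8, 14, 0), datetime(2024, 3, 8, 14, 6), datetime(2024, 3, 8, 14, 30)),
]


def mean_minutes(durations):
    """Average a list of timedelta values and return the result in minutes."""
    return mean(d.total_seconds() for d in durations) / 60


def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent for a request-based SLO.

    Assumes the SLI is "share of successful requests", which is only one
    of several ways to define an SLI.
    """
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures <= 0:
        raise ValueError("an SLO of 100% (or no traffic) leaves no error budget")
    return 1 - failed_requests / allowed_failures


# Mean time to acknowledge: opened -> acknowledged.
mtta_minutes = mean_minutes([ack - opened for opened, ack, _ in INCIDENTS])
# Mean time to resolve: opened -> resolved.
mttr_minutes = mean_minutes([resolved - opened for opened, _, resolved in INCIDENTS])

if __name__ == "__main__":
    print(f"MTTA: {mtta_minutes:.1f} min, MTTR: {mttr_minutes:.1f} min")
    remaining = error_budget_remaining(0.999, 1_000_000, 400)
    print(f"Error budget remaining: {remaining:.0%}")
```

Tracking the trend of these values over time, rather than any single reading, is closer to the "generally getting better" framing used above.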

Both models work, it just depends on how your teams are set up. Inversely, in this model, the product squads have a bit of a different role to play. Given the focus on autonomy, the product squads have the tools and patterns that they need that have been provided by the platform engineers to effectively build and own and then also maintain the applications from end-to-end. Both the traditional model and the empowered model promote that ownership of environments, living within the teams that built the application. This model is a bit more structured than the traditional method, though, and it promotes that centralized enablement coming from those platform teams.

It promotes those standards and that safe environment to learn that we mentioned before. It also promotes having a collective and unified approach, so you're not adding cognitive overhead to the product squads. We don't want to just move the problem from one spot, with the everything team of DevOps engineers, and go put it on the developers. That's not the intent. You have to have enablement material to make this work.

My sales pitch, why do empowered DevOps? This is the model that I've seen be most effective, pure and simple. It generally provides those key non-technical pillars of healthy teams that we talked about, like autonomy and the reduction of cognitive overload, while also allowing for that faster delivery, faster cycle times for teams. It puts the tech in the hands of the people that need it most, and who can effect change to it the quickest. You're allowing for that quicker iteration. It also gives you that platform mindset. It allows product squads to focus on what matters to them and not have to deal again with that cognitive overload and overhead of owning everything. They have things in place to help them own the new things that are coming to them, and a safe space to learn how to do that.

T-Shaped Engineering

Then I wanted to talk about the dreaded T-shaped engineering. We've probably all heard it. Maybe it was a couple years ago. Maybe it was yesterday, but we've all heard it come up. As an engineer myself, I know the skepticism that comes with the connotation around T-shaped engineering. As a leader, I think it's important to make it clear that you're not expecting everyone to learn to be an expert in everything. With the shift to this model, there's going to be a bit of a learning curve, whether you're learning the platforms provided by the platform engineering teams, or whether you're learning how to go from building and doing to supporting. It's all about learning a new engagement model and a new way of thinking.

This is also an illustration of something I talked about earlier, which is making that shift into an investment of learning. It's changing your culture to put value on that learning and growing, while you ensure you maintain a balance with that cognitive load on teams. Basically, this doesn't mean that you're an expert in everything. It actually means that you learn to be aware of how to use the tools around you, while broadening your understanding. You're broadening your understanding, not your expertise, in wider areas.

How Are Teams Collaborating?

The next thing I want to talk about is how teams are collaborating, and talk about what's working for them. There's a lot of tools to help with collaboration, I'm sure you're all aware. Some companies have multiple chat tools, for example, but the collaboration tool is the new DevOps tool. There is no single set that's going to give you the perfect collaboration for teams. With the new ways of working that come out of COVID and all that we've dealt with over the last four years now, these practices have morphed into something different than what we knew to be true back in 2019. I've worked in teams using a full Atlassian suite. I've worked in teams that are using everything in Azure.

In both cases, the biggest thing was, teams need a way to give real-time feedback in the written form. They need integrations to be able to automate their processes and tie them together. They need pipelines for delivery that encompass all of the checks and things that are important that we've talked about already. Teams that have these methods have been the most successful in being able to deliver quickly, safely, and with the least customer impact. This is also how teams stay connected. Another buzzword that happens a lot is having a connected culture, or a one team approach, or whatever your company has branded as their hashtag for the day. It doesn't have to rely solely on in-person experiences.

Many of us are working in global organizations. I've only ever worked in global organizations. That in-person team camaraderie is still important. There's still value in that. More so, building relationships in real time with everyone is key. Tools aren't necessarily going to give you that. What you should do and focus on is build something stable and consistent into your culture that's going to give you those things, that's going to give you that collaboration. Set communication standards with your team. Enable teams in other ways. Find ways to have fun with your whole team remotely, for example. That's what's going to build collaboration in your team.

When we talk about collaboration across multiple teams, there are many avenues that you can take here. Fireside chats are a good one, just to empower experts within your organization to use a forum to share what they know and what they've learned. While I was at H&R Block, I founded something called Block Bits. With Block Bits, basically, we would do two lightning talks, so we would have 30 minutes. It happened the second Wednesday of every month. You could sign up to do a lightning talk and just present, this is a cool thing that I learned, did, or my team built. It gave a place for engineers to get in front of the wider engineering community.

Usually, we had attendance between 300 and 500 engineers. It gave them that broader audience. Even though they didn't realize it at the time, it built connections. It also started other teams thinking, I can solve the problem in the same way that you did, and I wasn't thinking about it that way. That's a good way to collaborate across teams when you have a very big engineering organization. I'm also a big advocate for community user groups. I started my automation engineering career long ago in the configuration management space. Whenever I was trying to introduce configuration management and automation to an organization, I leaned really heavily on these community user groups. We had a Chef user group. We were using Chef at the time.

Basically, what this did was it gave users a way to feel like they had ownership in the thing we were doing. It gave the adoption of that platform a little bit of a kickstart. It was also a really good forum to understand the actual challenges those users were facing, so then we could build solutions for that into the platform itself. You can also encourage gatherings like engineering clubs to form. We have so many clubs within The LEGO Group. Most recently, a backend engineering club was founded, and all of the backend engineers get together and share solutions and problems that they're facing. It's a really good way to build camaraderie across your engineering organization.

Then, workgroups. I look at workgroups as a way to do important things for the organization that maybe would get deprioritized somewhere else, but also as a way for people to stretch outside of what they're doing every day, if they want to learn something or do something new. Because it's a safe environment for them to be able to pick something up.

There are also ways that I've found to improve collaboration within an individual team. I mentioned earlier the importance of routine. The typical team rituals that I talked about give the team a forum for sharing and a way to include each other in their work. In the case of retros, I view them as a bit of team therapy, a bit of a bonding experience. It depends on the week. As long as you foster that culture of sharing, and as long as your engineers feel safe in doing so, these retros are a really good place for them to be real with each other, for them to express things that are going well, and things that maybe didn't go so well over the last two weeks. If there's any kind of stress or underlying tension, usually we can work that out in a retro. From these sessions, we also do another thing, mob sessions, where the team gets together to solve hard problems.

Again, it's about building that camaraderie and trust within the team to promote finding that right solution, but it also helps promote the culture of learning that I talked about earlier. Within LEGO, we all have cute names, and then we have real names, and my team's cute name is Houston. Not for, Houston, we have a problem, though, sometimes that is the case. We do Houston meets. Anyone on the team can join. It's in our group channel. We'll just start a meeting and we solve hard problems. We share what we're working on. We use it as a sounding board or a rubber duck session, but it gets the team solving the problem together, and then allows them to learn from each other. I should also mention, these aren't planned. They're entirely informal. The team is empowered to say, does anyone want to jump on a Houston meet? They'll just start a meet.

Measuring Success

The next section, and actually the last section, is measuring success. There are a lot of metrics that I'm going to talk about, not necessarily measuring lines of code, but a lot of ways that you can measure the impact of the capabilities of a team. As a leader, I found that taking this more outcome and capability approach is a better indicator of the health of the teams that I've been in, and of how they're impacting the organization as a whole, than those typical productivity markers. There's also something to be said about ensuring capacity trends stay on target and all of that. I'm not actually going to talk about those, because we all know that that's something that you do as part of planning.

Each type of metric that I'm going to talk about is slightly different depending on the focus area. I couldn't talk about metrics and about DevOps without talking about DORA metrics. We've all heard them. We probably all have OKRs about them. This has been the long-held standard and way to determine DevOps maturity. They're going to help you understand how often your code is shipping, the time it takes a developer to get it from their laptop to a live production environment, the quality of those changes and the impact of errors on your customers, and then how quickly you can recover after failure.

For platform teams, these could be good markers for how well their enablement is working, because you should start to see these metrics get better if you're enabling teams in the right way. As they're all, in a way, measures of autonomy, these are also really good indicators of whether there's adequate enablement to allow for those faster delivery times, or they will point to any bottlenecks you have.
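As a rough illustration of the four DORA measures described above, here is a minimal Python sketch. The `Deployment` record, its field names, and the `dora_summary` function are assumptions made for this example and do not come from the talk or from any particular tool; real pipelines would pull this data from a CI/CD system and an incident tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape, assumed for illustration only.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it went live in production
    failed: bool = False                    # did the change cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored


def dora_summary(deployments: list, period_days: int) -> dict:
    """Summarise the four DORA-style measures for one reporting period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_hours = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600
        for d in deployments
    ]
    failures = [d for d in deployments if d.failed]
    restore_hours = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.restored_at is not None
    ]
    return {
        # How often code ships.
        "deploys_per_day": len(deployments) / period_days,
        # Time from commit to live in production.
        "median_lead_time_hours": median(lead_hours),
        # Share of changes that caused a failure.
        "change_failure_rate": len(failures) / len(deployments),
        # How quickly service was restored after a failed change.
        "mean_time_to_restore_hours": (
            sum(restore_hours) / len(restore_hours) if restore_hours else 0.0
        ),
    }


start = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start, start + timedelta(hours=4), failed=True,
               restored_at=start + timedelta(hours=5)),
    Deployment(start, start + timedelta(hours=6)),
    Deployment(start, start + timedelta(hours=8)),
]
summary = dora_summary(sample, period_days=2)
```

Tracking a summary like this per period, rather than as a single snapshot, is what lets a platform team see whether the numbers improve as enablement work lands.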

The next set is an extension of DORA, and that is the SPACE metrics, or the SPACE framework. Basically, this framework goes a step further and measures more of those soft areas of maturity, where the DORA metrics are more focused on the delivery piece. With satisfaction, this measure can really show anything from developer platform engineering job satisfaction to something like satisfaction with the platform, if we're talking about what's useful for a platform engineering team, for example. How useful is that thing that you built? Performance and activity, they all tie back with traditional measures like velocity and delivery.

While they may not, alone, be the best indicator that a team is delivering value, combined into something like the SPACE framework, they let you start to see a clear picture of the impact that that team is having, and whether or not those focus areas are working for the larger group. Community is also a good way to measure the impact of that internal collaboration that I talked about, and to make sure that you're focused on the right things, building environments that are inclusive for everyone. Then, measuring if people are talking and working together to solve hard problems. Are those solutions being spread across the organization? Are they just landing and sitting in a single team? Those are all important things to know. Then, finally, evolution, again with autonomy. This is a great measure of autonomy because you want to ensure that the change is happening over that period of time and moving in the right direction.

On platform metrics, after all of what I just talked about, I have one, and it is adoption. If you build a platform that nobody adopts, did you really build a platform? It's like if a tree falls in a forest and nobody hears it, did it really make a sound? You have to be focused on building something that's useful for the teams that you've built it for. You have to do user research. You have to understand what their challenges are. Adoption is the single most important metric for a platform engineering team to understand if their platform is useful.
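One simple way to put a number on adoption is the share of eligible teams that actively use the platform. The sketch below assumes that definition; the function name and the team names are hypothetical, and what counts as "eligible" or "actively using" is something each organization would have to decide for itself.

```python
def adoption_rate(eligible_teams: set, adopting_teams: set) -> float:
    """Share of eligible teams that use the platform, from 0.0 to 1.0."""
    if not eligible_teams:
        raise ValueError("no eligible teams to measure against")
    # Only count adopters that are actually in the target audience.
    return len(eligible_teams & adopting_teams) / len(eligible_teams)


# Hypothetical team names, for illustration only.
eligible = {"checkout", "search", "payments", "catalog"}
adopting = {"checkout", "search", "sandbox"}
rate = adoption_rate(eligible, adopting)
```

Here "sandbox" uses the platform but is not part of the target audience, so it does not count toward the rate.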

Then with SRE teams, we all probably, if you've worked in the SRE space, have heard of the Golden Signals. Those are things an SRE team should be promoting to product squads. It's also a way to measure the impact of that enablement that they're providing, whether they're embedded SREs or whether they are more consultative. I talked earlier about there being many ways to do SRE. Of those two models, I see these measures as helpful targets for enablement on that consultative model that I talked about. Even in the embedded approach, having this data, understanding latency, traffic, errors, and saturation, and seeing how those trends change over time when you've sent these SREs out to help enhance the operational maturity of your teams, is really important.
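To make the four signals concrete, here is a minimal Python sketch that derives them from a window of request records. The `Request` shape, the choice of a nearest-rank 95th percentile for latency, treating 5xx statuses as errors, and using peak CPU utilisation as the saturation figure are all assumptions for this example; in practice these values usually come from a monitoring system rather than hand-rolled code.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record shape, assumed for illustration only.
    latency_ms: float
    status: int


def golden_signals(requests: list, window_seconds: int,
                   cpu_utilisation: list) -> dict:
    """Compute latency, traffic, errors, and saturation for one window."""
    if not requests or window_seconds <= 0 or not cpu_utilisation:
        raise ValueError("need requests, a positive window, and CPU samples")
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank 95th percentile, using integer maths to avoid float drift.
    rank = (95 * len(latencies) + 99) // 100
    server_errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": latencies[rank - 1],
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": server_errors / len(requests),
        "saturation": max(cpu_utilisation),
    }


# Twenty requests over a ten-second window; the slowest one fails.
window = [Request(latency_ms=10.0 * i, status=200) for i in range(1, 20)]
window.append(Request(latency_ms=200.0, status=500))
signals = golden_signals(window, window_seconds=10,
                         cpu_utilisation=[0.4, 0.7, 0.55])
```

Comparing these values across successive windows gives the trend over time that the talk describes.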

It's also important to understand your incident response times. This ties really back to those DORA metrics and that MTTR that was at the end there. In my opinion, this is one of the best measures of operational maturity, because it tells you how quickly you can solve something, and, by extension, how many handoffs it takes to actually solve the problem. Because if that number is 5, for example, it would be better at 1. It would be better if the first team that touched it could solve it. Uptime is also really important. You want to ensure that as your teams are helping product squads focus on increasing their reliability, or enhancing their reliability, that they're focused on improving the uptime of their platform.
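The relationship between resolution time and the number of teams an incident passes through can be sketched as follows. The `Incident` record and its fields are hypothetical, chosen only to illustrate the idea; an actual incident tracker would expose its own data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record shape, assumed for illustration only.
    opened_at: datetime
    resolved_at: datetime
    teams_touched: list  # teams that handled the incident, in order


def incident_summary(incidents: list) -> dict:
    """Mean time to resolve, plus how many teams each incident passed through."""
    if not incidents:
        raise ValueError("need at least one incident")
    minutes = [
        (i.resolved_at - i.opened_at).total_seconds() / 60 for i in incidents
    ]
    team_counts = [len(i.teams_touched) for i in incidents]
    solved_by_first_team = sum(1 for n in team_counts if n == 1)
    return {
        "mttr_minutes": sum(minutes) / len(minutes),
        "mean_teams_touched": sum(team_counts) / len(team_counts),
        # Share of incidents solved by the first team that touched them.
        "first_team_resolution_rate": solved_by_first_team / len(incidents),
    }


t0 = datetime(2024, 3, 1, 12, 0)
history = [
    Incident(t0, t0 + timedelta(minutes=30), ["payments"]),
    Incident(t0, t0 + timedelta(minutes=90), ["support", "platform", "payments"]),
]
report = incident_summary(history)
```

In this sample the incident that moved across three teams took three times as long to resolve, which is the pattern the handoff count is meant to surface.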

Then the final one is practice maturity. Doing maturity models of your product squads is really important for an SRE team, whenever you get started, to understand where they're starting. What practices do they have that are operationally mature already, and what practices need your help? Then you can see that heat map and focus in on the areas that are going to provide value to that product. Measuring that over time is a really important practice, just to ensure that you're taking them on the right journey.

Then there's team health. Within The LEGO Group, I was introduced to team health checks. We do these quarterly. You can do them more often if you want. Generally speaking, quarterly gives you enough time to be able to make improvements between health checks. Essentially what these are, we use our agile coaches, and they put together a two-hour session where we get together and talk about problems within the team, problems with the ways of working that we're following, any other challenge that we're facing really. We talk about them. We come up with some actions to solve them. Then we try to change them in between. It gives teams an area that's safe to express, this is how I think this team could be better. It drives forward making the team better, by somebody other than just yourself.

Then the last one is a 360 review process, or anonymous feedback for teams. The last talk talked about the importance of doing real-time feedback and not relying on the end of year process to get that feedback. I also think that goes both ways. While I'll give continuous feedback to the team, so they understand where we're at, so I've level set with them, I also want them to give feedback to me. What things am I doing that you don't find helpful? What things could I be doing that you would find helpful? Doing that consistently throughout the year is really important to having operationally mature teams as well. Just using that as a system of measure for me has been really effective.

Recorded at:

Dec 31, 2024

Brittany Woods