Josh Evans on DevOps at Netflix

This is the Engineering Culture Podcast, from the people behind InfoQ.com and the QCon conferences.

In this podcast, Shane Hastie, Lead Editor for Culture & Methods, spoke to Josh Evans, former engineering manager at Netflix, about how Netflix does DevOps and the freedom and responsibility culture that underpins their way of working.

Key Takeaways

  • There are many interpretations of the term DevOps; it is a useful shorthand for a wide variety of technologies and approaches
  • “You build it, you run it” is the concrete application of the freedom and responsibility culture
  • When building a platform tool make it so easy to use that the product teams are not tempted to try and build something for themselves
  • Product teams are free to experiment and learn, which can feel chaotic and is a valuable part of the freedom and responsibility culture
  • The value of blameless and safe incident reviews – the goal is to learn and find patterns and use that information to prevent whole classes of failure from happening in the future
  • Don't view the value stream in a fragmented way – see the whole end-to-end system with all its interactions and dependencies and optimize the system as a cohesive whole rather than different tools and domains

Show Notes

  • 0:40 Introductions
  • 1:50 There are many interpretations of the term DevOps; it is a useful shorthand for a wide variety of technologies and approaches
  • 2:13 DevOps at Netflix starts with the company culture
  • 2:35 “Freedom and responsibility” as an abstract class for how most areas of Netflix function
  • 3:03 Each area manifests the culture in a way that is correct in their context
  • 3:10 “You build it, you run it” is the concrete application of the freedom and responsibility culture
  • 3:34 The importance of having teams that wholly own the things they build – development, deployment, instrumentation, monitoring and continually improving their components
  • 4:23 The distinction between product teams and centralised platform teams
  • 4:50 The centralised platform teams build the infrastructure and tools the product teams use
  • 5:15 The value of having separate platform teams to accelerate product development
  • 5:35 When building a platform tool make it so easy to use that the product teams are not tempted to try and build something for themselves
  • 6:15 Platform teams being thought leaders and identifying what the product teams may want before they realise they want it
  • 7:00 Know who your customers are and listen to them, value the squeaky wheels, talk to people and listen to feedback
  • 8:15 Sometimes the product teams use a new language or technology before the centralised teams have identified the need, and that’s OK
  • 8:57 When a centralised team picks up a product, they harden it and share it across other teams
  • 9:21 This can feel chaotic, but it allows for experimentation and learning and is part of the freedom and responsibility culture
  • 10:30 The importance of context not control – visibility into the cost of adopting new technologies and the value of involving the centralised teams when exploring new tools or techniques
  • 11:14 The secret sauce to Netflix’s freedom and responsibility culture is hiring very senior people and giving them a lot of autonomy
  • 11:40 Autonomous engineers will generally make better decisions than if managers try to micromanage them
  • 12:00 Hiring people who have a sense of responsibility, letting people go who are unable to take feedback
  • 12:35 Netflix has the highest revenue per employee ratio in the industry; this comes from hiring senior people and giving them opportunities for autonomy, mastery and purpose
  • 13:20 You can’t just copy the Netflix way of working in a different business with a different culture
  • 13:38 Engineering leaders need to hold the space and create opportunities for freedom and responsibility within their scope of control, even if the whole organisation doesn’t change
  • 13:52 Empower the people who feel the pain to solve their own problems
  • 13:58 Adopt these changes in small steps – getting to the end state takes time, find the pain points and address them one by one
  • 14:15 For Netflix, the initial driver was reliability
  • 14:35 Pick one simple metric to monitor – for Netflix this was Start-plays per second
  • 15:38 Picking just one metric enabled all the teams to focus on the same outcome and improve the metric over time
  • 16:05 Explore the tools for continuous delivery and implement them - Spinnaker is one example
  • 16:30 If necessary start with manual steps in the CD process and replace them with automation over time
  • 17:20 The value of blameless and safe incident reviews – the goal is to learn and find patterns and use that information to prevent whole classes of failure from happening in the future
  • 17:38 Never fail the same way twice
  • 18:40 Making it very clear that the goal of the incident review is learning and improvement not blame, and ways to achieve this
  • 20:02 The importance of emotional maturity, not as chronological age but as a personality trait, supplemented by experience
  • 21:10 What to look for when interviewing – introspection and being open to feedback
  • 21:55 People who will thrive in a freedom and responsibility culture need to be open to hearing what others say and to learn
  • 23:35 Thinking holistically about the whole system – seeing the developer experience as well as the customer experience in an integrated way
  • 24:44 Don’t view the value stream in a fragmented way – see the whole end-to-end system with all its interactions and dependencies and optimize the system as a cohesive whole rather than different tools and domains
  • 25:31 An example of how Netflix achieved this using a “Canary” which compares the performance of new code against the old code, side by side, and the holistic set of tools, models and metrics which expose the results
  • 26:27 The benefits that accrue to the whole organisation from having the monitoring and management tools fully integrated
  • 27:36 Velocity with confidence – the ability to move quickly while having safety nets in place to catch and recover from mistakes quickly 
  • 27:58 What’s next for Josh?
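
The canary comparison mentioned at 25:31 can be illustrated with a minimal sketch. It is not Netflix's actual tooling: the function name `compare_canary`, the choice of error rate as the metric, and the `max_ratio` threshold are all assumptions made for illustration. The idea is only to show new code being judged side by side against old code on the same metric.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class CanaryVerdict:
    baseline_mean: float
    canary_mean: float
    ratio: float
    passed: bool


def compare_canary(
    baseline: Sequence[float],
    canary: Sequence[float],
    max_ratio: float = 1.2,  # hypothetical tolerance: canary may be up to 20% worse
) -> CanaryVerdict:
    """Compare a lower-is-better metric (e.g. error rate) for old vs. new code.

    `baseline` holds samples from instances running the old code and
    `canary` holds samples from instances running the new code, collected
    over the same time window.
    """
    if not baseline or not canary:
        raise ValueError("both sample sets must be non-empty")

    baseline_mean = mean(baseline)
    canary_mean = mean(canary)

    if baseline_mean > 0:
        ratio = canary_mean / baseline_mean
    else:
        # A zero baseline makes the ratio undefined: treat "both zero" as
        # equal and any canary errors as an unbounded regression.
        ratio = 1.0 if canary_mean == 0 else float("inf")

    return CanaryVerdict(baseline_mean, canary_mean, ratio, ratio <= max_ratio)


if __name__ == "__main__":
    verdict = compare_canary(
        baseline=[0.010, 0.012, 0.011],
        canary=[0.011, 0.012, 0.010],
    )
    print("promote" if verdict.passed else "roll back", verdict)
```

A real canary system would compare many metrics and use a statistical test rather than a single ratio of means; the single threshold here only keeps the sketch short.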

More about our podcasts

You can keep up-to-date with the podcasts via our RSS Feed, and they are available via SoundCloud, Apple Podcasts, Spotify, Overcast and YouTube. From this page you also have access to our recorded show notes. They all have clickable links that will take you directly to that part of the audio.
