Chaos Engineering Content on InfoQ
-
Ryan Kitchens on Learning from Incidents at Netflix, the Role of SRE, and Sociotechnical Systems
In today’s podcast, we sit down with Ryan Kitchens, a senior site reliability engineer and member of the CORE team at Netflix. This team is responsible for the entire lifecycle of incident management at Netflix, from incident response to memorialising an issue.
-
Kolton Andrus on Gremlin’s Newly Announced SaaS Chaos Engineering Product and Running Game Days
Gremlin is a Software as a Service product, built by engineers with experience from Netflix, AWS, Dropbox and others, that lets you plan, control and undo chaos engineering experiments. In this podcast, Wes talks to Kolton Andrus about the Gremlin product and architecture, and about related topics such as running Game Days.
-
Nora Jones on Establishing, Growing, and Maturing a Chaos Engineering Practice
Nora Jones, a senior software engineer on Netflix's Chaos Team, talks with Wesley Reisz about what Chaos Engineering means today. She covers what it takes to build a practice and how to establish a strategy, defines cost of impact, and discusses key technical considerations when leveraging chaos engineering.
-
Kolton Andrus on Lessons Learnt from Failure Testing at Amazon and Netflix and New Venture Gremlin
Wesley Reisz talks to Kolton Andrus, the founder of Gremlin Inc. He was a Chaos Engineer at Netflix, focused on the resilience of the Edge services, where he designed and built FIT, Netflix's failure injection service. Before that, he improved the performance and reliability of the Amazon Retail website.