Incident Response Content on InfoQ
Podcasts
-
Courtney Nash Discusses Incident Management, Automation, and the VOID Report
In this episode, Courtney Nash, a researcher focused on system safety and failures in complex sociotechnical systems, discussed the latest edition of the VOID report. Topics covered included incident management and the role of automation, working effectively within sociotechnical systems, and the value of collecting and analyzing system metrics in both good times and bad.
-
Anurag Gupta on Day 2 Operations, DevOps, and Automated Remediation
In this podcast, Anurag Gupta, founder and CEO of Shoreline.io, sat down with InfoQ podcast host Daniel Bryant and discussed the role of DevOps and site reliability engineering (SRE), day 2 operations, and the importance of building observability into applications and platforms.
-
Ryan Kitchens on Learning from Incidents at Netflix, the Role of SRE, and Sociotechnical Systems
In today’s podcast, we sit down with Ryan Kitchens, a senior site reliability engineer and member of the CORE team at Netflix. This team is responsible for the entire lifecycle of incident management at Netflix, from incident response to memorialising an issue.
-
Resilience and Incident Management with Vanessa Huerta Granda
In this podcast, Shane Hastie, Lead Editor for Culture & Methods, spoke to Vanessa Huerta Granda, Manager of Resiliency Engineering at Enova, about resilience and incident management.