Failure Content on InfoQ

Presentations

Culture & Methods

Risk and Failure on the Path to Staff Engineer

Caleb Hyde discusses their career progression and regressions, as well as the context they used to figure out what to work on and whom to work with, distilling a framework to use in one’s own work.

Caleb Hyde
on Jul 10, 2024

46:41

Architecture & Design

Deconstructing an Abstraction to Reconstruct an Outage

Chris Sinjakli explores the aftermath of a complex outage in a Postgres cluster, retracing the steps taken to reliably reproduce the failure in a local environment.

Chris Sinjakli
on Dec 22, 2023

41:19

DevOps

How Did It Make Sense at the Time? Understanding Incidents as They Occurred, Not as They are Remembered

Jacob Scott explores the basics of failure in complex systems, the theory and practice of how it made sense at the time, and actions to take.

Jacob Scott
on Sep 14, 2023

38:15

Architecture & Design

Managing the Risk of Cascading Failure

Laura Nolan discusses some of the mechanisms that cause cascading failures, what can be done to reduce the risk, and what to do if there is a cascading failure situation.

Laura Nolan
on Jul 11, 2021

40:19

DevOps

Culturing Resiliency with Data: a Taxonomy of Outages

Ranjib Dey gives an overview of how outages at Uber over the past few years were categorized by root cause type.

Ranjib Dey
on Dec 25, 2020

29:14

DevOps

Failing over without Falling over

Adrian Cockcroft shows how to use System Theoretic Process Analysis (STPA), as advocated by Professor Nancy Leveson’s team at MIT, to analyze failover hazards.

Adrian Cockcroft
on Nov 20, 2020

21:34

Development

#FAIL

Kevlin Henney keynotes on some of the failures that people had in various projects and the lessons to be learned from them.

Kevlin Henney
on Nov 10, 2019

42:37

Culture & Methods

Rules in Agile Transformation: 80/20 and “Not Everybody Likes to Dance”

Zbigniew Piecuch discusses why some teams do not manage to master Agile.

Zbigniew Piecuch
on Nov 01, 2019

31:39

DevOps

What Breaks Our Systems: A Taxonomy of Black Swans

Laura Nolan talks about Black Swan events - unforeseen, unanticipated, and catastrophic incidents - that may happen in production and can take the system down.

Laura Nolan
on Oct 10, 2019

50:46

DevOps

How Did Things Go Right? Learning More from Incidents

Ryan Kitchens describes more rewarding ways to approach incident investigation without overly focusing on failure prevention.

Ryan Kitchens
on Oct 09, 2019

42:14

Culture & Methods

How Condé Nast Succeeds by Building a Culture that Embraces Failure

Crystal Hirschorn talks about what her teams at Condé Nast have learned and adapted for their platforms by building a culture that embraces failure through Chaos Engineering practices.

Crystal Hirschorn
on Aug 04, 2019

48:47

Architecture & Design

Building Resilient Serverless Systems

John Chapin explains how to use serverless technologies and an infrastructure-as-code approach to architect, build, and operate large-scale systems that are resilient to vendor failures.

John Chapin
on Aug 03, 2019

44:39
