Resilience Content on InfoQ
-
How Do We Talk to Each Other? How Surfacing Communication Patterns in Organizations Can Help You Understand and Improve Your Resilience
Nora Jones discusses how communication patterns in organizations can reveal how systems actually work in practice, vs how we think they work in theory.
-
Two Years of Incidents at Six Different Companies: How a Culture of Resilience Can Help You Accomplish Your Goals
Vanessa Huerta Granda looks at real-life examples of companies she has worked with that chose to invest in improving their incident programs and have seen the investment pay dividends.
-
Resilience Hides in Plain Sight
John Allspaw describes what resilience is and why it is incredibly hard to recognize.
-
Orchestrating Resilience: Building Modern Asynchronous Systems
Sai Pragna Etikyala discusses her journey at Twilio, sharing practical examples from their projects, the challenges they faced, and how they overcame them.
-
Comparing Apples and Volkswagens: the Problem with Aggregate Incident Metrics
Courtney Nash presents data from the Verica Open Incident Database (VOID) to demonstrate how aggregate incident metrics such as MTTR aren't representative of systems' resilience.
-
Tales of Kafka @Cloudflare: Lessons Learnt on the Way to 1 Trillion Messages
Andrea Medda and Matt Boyle discuss Kafka on the way to one trillion messages, and the internal tools used to ease adoption as well as improve resiliency.
-
Offline and Thriving: Building Resilient Applications with Local-First Techniques
Carl Sverre explores how popular apps like WhatsApp, Figma, and Linear elevate their user experience and facilitate seamless collaboration by harnessing the power of offline-first techniques.
-
Did the Chaos Test Pass?
Christina Yakomin discusses how to run Chaos experiments with Vanguard technologies.
-
24/7 State Replication
Todd Montgomery discusses lessons learned in designing systems, especially those based on replicated state machines, that need to continue operating when things go wrong.
-
The Scientific Method for Testing System Resilience
Christina Yakomin discusses the Scientific Method, and how Vanguard draws inspiration from it in their resilience testing efforts, covering the "Failure Modes and Effects Analysis" technique.
-
Resiliency Superpowers with eBPF
Liz Rice considers several facets where eBPF can help, from dynamic vulnerability patching through super-fast load balancing to multi-cluster networking.
-
Building Trust & Confidence with Security Chaos Engineering
Aaron Rinehart shares his experience with security-focused Chaos Engineering, used to build trust and confidence by proactively identifying and navigating security unknowns.