Resilience Content on InfoQ
-
Building Reliability One Step at a Time
Ana Margarita Medina shares how she has been using Chaos Engineering and how it can be used to decouple our system’s weak points, learn from incidents and improve monitoring and observability.
-
A Sticky Situation: How Netflix Gains Confidence in Changes
Haley Tucker discusses sticky canaries, what they are and how they can help, and how to build confidence in changes.
-
Scaling Culture of Resiliency in the Enterprise
Nate Vogel shares how he grew the data engineering team with an emphasis on building a culture of reliability, discussing processes and tools used.
-
IBM’s Principles of Chaos Engineering
Haytham Elkhoja discusses the process of getting engineers from across the company to agree on a list of Chaos Engineering principles, adapting existing principles to customer requirements and internal services.
-
Self-Service Chaos Engineering: Fitting Gremlin into a DevOps Culture
Doug Campbell shares how they rolled out Gremlin at Grubhub and how they educated and enabled all engineering teams to use it.
-
Continuous Resilience
Adrian Cockcroft talks about how to build robust systems by being more systematic about hazard analysis, and including the operator experience in the hazard model.
-
Certainty among the Chaos
Marco Coulter discusses the capabilities of chaos engineering beyond resiliency to support capacity optimization.
-
The More You Know: a Guide to Understanding Your Systems
Tyler Wells shares how Twilio developed a template that enables them to understand their systems better, identify critical metrics to watch, and use Chaos Engineering to verify it all.
-
Convergence of Chaos Engineering and Revolutionized Technology Techniques
Yury Niño Roa explores how emerging paradigms can use Chaos Engineering to manage the pain points on the path toward a solution, and shows how Chaos Engineering can benefit from AI.
-
Let Devs Be Devs: Abstracting away Compliance and Reliability to Accelerate Modern Cloud Deployments
Rahul Arya shares how they built a platform to abstract away compliance, make reliability with Chaos Engineering completely self-serve, and enable developers to ship code faster.
-
Identifying Hidden Dependencies
Liz Fong-Jones discusses some of the manual experiments they ran at Honeycomb, the bugs discovered in some automatic replacement tools, and what steps they took for continuously running experiments.
-
Automating Chaos Attacks
Daniel Albuquerque and Nikos Katirtzis show how to run attacks in both manual and automated ways.