Resilience Content on InfoQ
-
Did the Chaos Test Pass?
Christina Yakomin discusses how to run Chaos experiments with Vanguard technologies.
-
24/7 State Replication
Todd Montgomery discusses lessons learned in designing systems, especially those based on replicated state machines, that need to continue operating when things go wrong.
-
The Scientific Method for Testing System Resilience
Christina Yakomin discusses the Scientific Method, and how Vanguard draws inspiration from it in their resilience testing efforts, covering the "Failure Modes and Effects Analysis" technique.
-
Resiliency Superpowers with eBPF
Liz Rice considers several facets where eBPF can help, from dynamic vulnerability patching through super-fast load balancing to multi-cluster networking.
-
Building Trust & Confidence with Security Chaos Engineering
Aaron Rinehart shares his experience with security-focused Chaos Engineering, used to build trust and confidence by proactively identifying and navigating security unknowns.
-
Making Applications Resilient with a Smart Application Aware Network
Varun Talwar goes through three patterns for making an application highly available by transparently injecting application-aware network components that can improve resiliency.
-
Architecting for Resilience Panel
Nora Jones, Dan Lorenc, and Varun Talwar discuss what architecting for resiliency means, sharing ready-to-use examples and ideas that can be employed in other contexts.
-
Resilience in Supply Chain Security
Dan Lorenc goes over real-world threats facing open source supply chains today, and what can be done to architect resilient build and delivery pipelines.
-
Panel: Observability and Understandability
Jason Yee, John Egan, and Ben Sigelman discuss their approaches and preferred methods to get impactful results in incident management, distributed tracing, and chaos engineering.
-
Incident Analysis: Your Organization's Secret Weapon
Nora Jones discusses how to move faster and focus on the things that matter by using incident analysis.
-
More More More! Why the Most Resilient Companies Want More Incidents
John Egan discusses how companies of any scale can improve their understandability by lowering their barriers to incident reporting and simplifying their processes for documenting postmortems.
-
Complex Systems: Microservices and Humans
Katharina Probst discusses some of the best practices for building, evolving, and operating microservices, lessons learned from containers, service meshes, DevOps, and chaos and load testing, and planning for growth.