Resilience Content on InfoQ
-
Building and Trusting a Cloud Bank
Greg Hawkins discusses how Starling Bank, part of the new movement in FinTech challenger banks, is innovating while addressing the need for resilience in a world where failure is everywhere.
-
Automating Chaos Experiments in Production
Ali Basiri discusses the motivation behind ChAP (Chaos Automation Platform), how they implemented it, and how Netflix service teams are using it to identify systemic weaknesses.
-
Applying Failure Testing Research @Netflix
Kolton Andrus and Peter Alvaro present how a “big idea” -- lineage-driven fault injection -- evolved from a theoretical model into an automated failure testing service at Netflix.
-
Architecting for Failure in a Containerized World
Tom Faulhaber discusses the new container-based toolbox for building systems that are robust in the face of failures, how to recover from failure and how the tools can be used to best effect.
-
Stranger Things: The Forces that Disrupt Netflix
Haley Tucker discusses how other systems may affect Netflix's services, and strategies to protect those services so they keep working even when things go wrong.
-
WebSockets, Reactive APIs and Microservices
Todd Montgomery investigates whether WebSockets, HTTP/2, Reactive Streams and microservices can deliver the high scalability, resiliency, and ease of development promised.
-
0 to 100 days - Running DRTs at Dropbox
Thomissa Comellas shares her experiences developing and rolling out new Disaster Recovery Testing techniques at Dropbox. Tammy Butow shares how her team runs DRTs and has implemented the techniques.
-
Chaos Kong - Endowing Netflix with Antifragility
Luke Kosewski describes Flow, how it adds value to a microservice architecture, what preconditions must be met for such a recovery mechanism to succeed, and tells the story of a 2015 Q4 outage.
-
Containers Change Everything
Anne Currie talks about the architectural impact of containers, and what modern container schedulers mean for resilience, redundancy and server density.
-
Distributed Consensus: Making the Impossible Possible
Heidi Howard explores how to construct resilient distributed systems on top of unreliable components. Howard discusses which algorithms are best suited to different situations.
-
Resilient Predictive Data Pipelines
Sid Anand discusses how Agari is applying big data best practices to the problem of securing its customers from email-borne threats, presenting a system that leverages big data in the cloud.
-
Resilience Planning & How the Empire Strikes Back
Bhakti Mehta covers best practices for building resilient, stable and predictable services: preventing cascading failures, the timeout pattern, the retry pattern, circuit breakers and other techniques.