Resilience Content on InfoQ
-
Failure: The Good Parts
Viktor Klang's keynote covers the imminence of failure, the need to prepare for it, and several ways of managing it when it happens.
-
How Netflix Architects for Survival
Jeremy Edberg discusses how Netflix designs its systems to survive outages, network latency, and random instance failures.
-
Partitions for Everyone!
Kyle Kingsbury discusses some of the limitations of distributed systems and how some of these systems behave under partitioning.
-
Resiliency through Failure - Netflix's Approach to Extreme Availability in the Cloud
Ariel Tseitlin discusses Netflix's failure-based suite of tools, collectively called the Simian Army, used to improve resiliency and maintain the cloud environment.
-
Systems that Run Forever Self-heal and Scale
Joe Armstrong outlines the architectural principles needed for building scalable, fault-tolerant systems from small, isolated, parallel components that communicate through well-defined protocols.