Fault Tolerance Content on InfoQ
-
Fault Tolerance 101
Joe Armstrong discusses fault-tolerant systems, summarizing the key features of Erlang and showing how they can be used for programming fault-tolerant and scalable systems on multi-core clusters.
-
Fault Tolerance Made Easy
Uwe Friedrichsen discusses implementing resilient software design patterns (code included), improving those patterns to achieve robustness, and becoming a resilient software developer.
-
Fault Tolerance 101
Joe Armstrong discusses how fault tolerance relates to scalability and concurrency, and how Erlang helps build fault-tolerant systems on multi-core clusters.
-
Programming, Only Better
Bodil Stokke keynotes on FP languages for writing bug-free, fault-tolerant code that help build simple, concurrent, and reusable software.
-
Architecting for High Availability
Attila Narin discusses AWS concepts: Availability Zones, RDS Multi-AZ deployments, SQS and Auto Scaling, Elastic IP, load balancing, DNS, DynamoDB, Amazon S3, etc., and EC2 best practices.
-
Designing Fault Tolerant Distributed Applications
Scott Andreas discusses creating fault-tolerant distributed applications and demos Ordasity, a framework for building self-organizing systems with services.
-
Runaway Complexity in Big Data, and a Plan to Stop It
Nathan Marz outlines several sources of complexity introduced in data systems (lack of human fault tolerance, conflation of data and queries, schemas done wrong) and what can be done to avoid them.
-
Erlang's Open Telecom Platform (OTP) Framework
Steve Vinoski introduces Erlang’s OTP Framework, outlining some of its main features, including several behaviors: implementations of common patterns useful for concurrent, fault-tolerant applications.
-
Storm: Distributed and Fault-tolerant Real-time Computation
Nathan Marz discusses Storm concepts (streams, spouts, bolts, topologies), explaining how to use Storm’s Clojure DSL for real-time stream processing, distributed RPC, and continuous computation.
-
Anomaly Detection, Fault Tolerance and Anticipation Patterns
John Allspaw discusses fault tolerance, anomaly detection, and anticipation patterns that help create highly available and resilient systems.
-
Techniques for Scaling the Netflix API
Daniel Jacobson covers the history of Netflix’s APIs, adaptation for the cloud, development and testing, resiliency, and the future of their APIs.
-
Architecting for Failure at the Guardian.co.uk
Michael Brunton-Spall talks about various types of system failure that can happen, sharing the lessons learned at the Guardian and measures taken to prevent and mitigate failure.