Resilience Content on InfoQ
-
SeaMonkeys - Chaos in the War Room
Glen Ford describes his experience applying a very early form of chaos testing to naval combat systems in the Australian military in the late 1990s and draws the parallels to modern SRE.
-
The Abyss of Ignorable: a Route into Chaos Testing from Starling Bank
Greg Hawkins describes how Starling Bank introduced a chaos engineering practice, starting in 2016 with their own simple chaos daemon.
-
Applying Chaos Engineering in Healthcare: Getting Started with Sensitive Workloads
Carl Chesser shares what the teams at Cerner Corporation, a healthcare information technology company, found to be effective in introducing chaos engineering with their systems.
-
Failover Conf Q&A on Building Reliable Systems: People, Process, and Practice
One of the biggest engineering challenges associated with maintaining or increasing the reliability of a system is knowing where to invest time and energy. InfoQ recently sat down with several engineers and technical leaders who are involved with the upcoming Failover Conf virtual event, and asked their opinion on the best practices for building and running reliable systems.
-
The Fundamental Truth behind Successful Development Practices: Software is Synthetic
Software systems are creative compounds, emergent and generative; the product of complex interactions between people and technology. They are different from the orderly, analytic worlds that our school-age selves expect to find. Because software is so full of complexity and uncertainty, we need a different way to arrive at a solution.
-
InfoQ Editors' Recommended Talks from 2019
As part of the 2019 end-of-year summary content, this article collects recommended presentation recordings from the InfoQ editorial team.
-
SLOs Are the API for Your Engineering Team
SLOs provide a simple common language for evaluating risk in terms of error budgets. SLOs save everyone involved both time and energy, which you can redirect toward more important things, like keeping your customers happy.
-
How to Use Chaos Engineering to Break Things Productively
Chaos can be a preventative for calamity. Chaos engineering is predicated on the idea of failure as the rule rather than the exception, an idea that led to the development of the first dedicated chaos engineering tools. This article explores chaos engineering and how to apply it.
-
Designing Chaos Experiments, Running Game Days, and Building a Learning Organization: Chaos Conf Q&A
The second Chaos Conf event is taking place in San Francisco over 25-26 September. In preparation for the conference, InfoQ sat down with a number of the presenters and discussed topics such as the evolution and adoption of chaos engineering, key people and process lessons learned from running chaos experiments, and the biggest blockers to mainstream adoption.
-
An Engineer’s Guide to a Good Night’s Sleep
Increased microservices adoption, fueled by the move to the cloud where architectures and infrastructure can flex and be ephemeral, adds complexity every day to the systems we create and maintain. This takes place alongside operating models with autonomous and totally empowered teams, so each distributed system has its own tapestry of technical approaches, languages, and services.
-
DevOps and Cloud InfoQ Trends Report - February 2019
An overview of how the “cloud computing” and DevOps space is evolving in 2019, including updates on Kubernetes, chaos engineering, service meshes, and more.
-
Towards Successful Resilient Software Design
In this article, Uwe Friedrichsen explains the “why” and “what” of resilient software design, discusses the challenges he has met most often in recent years, and shares his thoughts on how to implement resilient software design in your organisation.