InfoQ Homepage Resilience Content on InfoQ
-
How We Improved Application’s Resiliency by Uncovering Our Hidden Issues Using Chaos Testing
This article lists the chaos testing principles which are outlined by Netflix. The readers should be able to understand the advantages and disadvantages that chaos testing offers. This will help them to decide whether they want to perform it or not. The article also explains why we should convince the management to perform chaos tests, considering all benefits over the risks.
-
How Do We Utilize Chaos Engineering to Become Better Cloud-Native Engineers?
Engineers these days are closer to the product and the customer needs—there is still a long way to go and companies are still struggling with how to get engineers closer to their customers to understand in-depth what their business impact is: what do they solve, what’s their influence on the customer, and what is their impact on the product?
-
Chaos Engineering and Observability with Visual Metaphors
This article introduces a new actor for visualising chaos engineering and observability: metaphors. It provides the conceptual foundations of chaos engineering and observability, presents a state of art of visualisation techniques available in the market and shows how treemaps, gauge charts, geocentric and city metaphors can enrich the spectrum of the visual strategies to observe the chaos.
-
DevOps and Cloud InfoQ Trends Report - July 2021
This article summarizes how we see the "cloud computing and DevOps" space in 2021, which focuses on fundamental infrastructure and operational patterns, the realization of patterns in technology frameworks, and the design processes and skills that a software architect or engineer must cultivate.
-
Building Reliable Software Systems with Chaos Engineering
Advances in large-scale, distributed software systems are changing the game for software engineering. As an industry, we are quick to adopt practices that improve flexibility and improve feature velocity. If we can move quickly, can we do so without breaking things? Chaos Engineering practices can be used to navigate complexity and build more reliable systems.
-
Continuous Learning as a Tool for Adaptation
The fifth and capstone article in a series on how software companies adapted and continue to adapt to enhance their resilience explores key themes with a special view on the practicality of organizational resilience. It also provides practical guidance to engineering leadership and recommendations on how to create this investment.
-
Software Architecture and Design InfoQ Trends Report—April 2021
An overview of how the InfoQ editorial team sees the Software Architecture and Design topic evolving in 2021, with a focus on what architects are designing for today.
-
Designing & Managing for Resilience
The fourth article in a series on how software companies adapted and continue to adapt to enhance their resilience explores the strategies used by engineering leaders to help create the conditions for sustained resilience. It provides stories, examples, and strategies towards designing an organizational structure to support resilient performance and managing for resilience.
-
Adaptive Frontline Incident Response: Human-Centered Incident Management
The third article in a series on how software companies adapted and continue to adapt to enhance their resilience zeros in on the sources that comprise most of your company’s adaptive resources: your frontline responders. In this article, we draw on our experiences as incident commanders with Twilio to share our reflections on what it means to cultivate resilient people.
-
Learning from Incidents
Jessica DeVita (Netflix) and Nick Stenning (Microsoft) have been working on improving how software teams learn from incidents in production. In this article, they share some of what they’ve learned from the research community in this area, and offer some advice on the practical application of this work.
-
Meeting the Challenges of Disrupted Operations: Sustained Adaptability for Organizational Resilience
The first article in a series on how software companies adapted and continue to adapt to enhance their resilience starts by laying a foundation for thinking about organizational resilience. It looks at what organizations can do structurally during surprising and disruptive events to establish conditions that help engineering teams adapt in practice and in real time as disruptive events occur.
-
InfoQ 2020 Recap, Editor Recommendations, and Best Content of the Year
As 2020 is coming to an end, we created this article listing some of the best posts published this year. This collection was hand-picked by nine InfoQ Editors recommending the greatest posts in their domain. It's a great piece to make sure you don't miss out on some of the InfoQ's best content.