Chaos Engineering Content on InfoQ
-
How Google Does Chaos Testing to Improve Spanner's Reliability
To keep their Spanner database working reliably, Google engineers use chaos testing to inject faults into production-like instances and stress the system's ability to behave correctly in the face of unexpected failures.
-
Chaos Engineering Service Azure Chaos Studio Now Generally Available
Two years after entering public preview, reliability experimentation service Azure Chaos Studio is now generally available. Among its most recent features are experiment templates, dynamic targets, load testing faults, and more.
-
Filibuster: Automated Fault Injection Tool to Improve DoorDash's Reliability
DoorDash recently revealed how they are using Filibuster, an automated fault injection tool, to identify resilience issues in microservice applications early on and improve platform reliability.
-
Microsoft Announces Azure Chaos Studio in Public Preview
At the recent Ignite, Microsoft announced the public preview of Azure Chaos Studio, a fully-managed experimentation service to help customers track, measure, and mitigate faults with controlled chaos engineering to improve the resilience of their cloud applications.
-
Litmus 2.0 Release Includes Multi-Tenancy, Chaos Workflows, GitOps, and Observability
Last month, Litmus 2.0 was released for general availability, with the goal of simplifying chaos engineering by adding new features like chaos center, chaos workflows, GitOps for chaos, multi-tenancy, observability, and private chaos hubs. InfoQ interviewed Umasankar Mukkara, CEO of ChaosNative and co-creator and maintainer of the Litmus engineering platform.
-
Gremlin Adds Automated Service Discovery for Targeting Chaos Experiments
Gremlin, a chaos engineering platform, recently announced automated service discovery. This new feature automatically discovers services running within dynamic environments and makes them available as targets for chaos experiments. Gremlin has also added role-based access control for its API keys.
-
Cheryl Hung on Trends in Cloud Native and DevOps for 2021
In a recent keynote for The DEVOPS Conference, Cheryl Hung, VP ecosystem for the Cloud Native Computing Foundation (CNCF), shared her top 10 predictions for cloud native in the upcoming year. These include improvements in cross-cloud support, growth in GitOps and chaos engineering practices, and an increase in the adoption of FinOps.
-
InfoQ Live March 16: Explore Ways of Reducing Uncertainty in Software Delivery
InfoQ Live, the one-day virtual event for software engineers and architects, returns on March 16th with a new edition, this time focusing on ways to reduce the uncertainty of your software development cycle.
-
Gremlin Aims to Reduce Kubernetes Noisy Neighbours through Chaos Engineering
Gremlin has released enhancements to its Chaos Engineering platform aimed at DevOps engineers interested in future-proofing Kubernetes clusters by isolating "noisy neighbours". On Kubernetes, the noisy neighbour issue occurs when multiple applications sharing a Kubernetes cluster compete for resources, leading to degraded performance.
-
Gremlin Releases State of Chaos Engineering 2021 Report
Gremlin released their State of Chaos Engineering 2021 report based on a community survey and their own product data. The key findings include a positive correlation between running chaos engineering experiments and increased availability.
-
AWS Announces Chaos Engineering as a Service Offering
AWS has announced the upcoming release of their chaos engineering as a service offering. The Fault Injection Service (FIS) will provide fully-managed chaos experiments across a number of AWS services. The service includes pre-built templates that generate disruptions mimicking common real-world events. It can be integrated into CI pipelines via API.
-
Chaos Engineering on Kubernetes: Chaos Mesh Generally Available with v1.0
The Chaos Mesh team announced the general availability (GA) of Chaos Mesh 1.0 after it was accepted as a CNCF sandbox project in July 2020. Chaos Mesh is a tool to perform chaos engineering experiments on Kubernetes applications.
-
Chaos Conf Q&A: Adrian Cockcroft & Yury Niño Roa
In preparation for ChaosConf 2020, InfoQ sat down with Adrian Cockcroft and Yury Niño Roa to explore topics of interest in the chaos engineering community. Key takeaways included: there are clear benefits to running “game days” to develop psychological safety, and the future of chaos engineering points toward incorporating security and scaling up experiments to test larger failure modes.
-
An Open Source Chaos Engineering Library from AWS
AWS engineers recently wrote about AWSSSMChaosRunner, an open source chaos engineering tool that they used for fault injection testing in Prime Video. The tool is built on AWS Systems Manager, which can execute arbitrary commands on EC2 instances, and the team used it to mitigate latency-related issues.
-
Gremlin Announces General Availability of Status Checks
Gremlin recently announced the general availability of Status Checks. This new feature automatically validates that systems are healthy and ready for running chaos experiments in production.