Availability Content on InfoQ
-
Meta Switches to MySQL Raft to Improve Reliability and Operational Simplicity
Meta is rolling out MySQL Raft in its data centers to replace its current semisynchronous MySQL databases. The new consensus engine simplifies operations and lets MySQL servers take responsibility for promotions and membership.
-
Testing Advanced Driver Assistance Systems
Advanced driver assistance systems can have a huge number of test cases. Breaking the work into smaller pieces helps ensure that every part is tested. A good test environment is essential for covering all required tests efficiently, quickly, and flexibly. Testers should be involved in the project from the beginning to avoid task forces and quality or delivery problems.
-
Atlassian Exceeds 99.9999% of Availability Using Sidecars and Highly Fault-Tolerant Design
Atlassian recently published how it exceeded 99.9999% availability with its Tenant Context Service. Atlassian achieved this by implementing highly autonomous client sidecars that can proactively shield themselves from complete AWS region failures. To accomplish this, the sidecars query multiple services concurrently and ensure that requests are entirely isolated internally.
-
Slack Implements Circuit Breakers to Improve CI/CD Pipeline Availability
Slack recently published how it implemented the Circuit Breaker pattern to improve its CI/CD pipeline availability. Before this project, engineers at Slack faced challenges when peak request volumes in internal tooling caused cascading failures in dependent systems. Since its completion, engineers have seen increased service availability and fewer bad developer experiences, such as flakiness from failing services.
-
AWS Increases the Availability and Reliability of Amazon EventBridge with Global Endpoints
Recently, AWS introduced a new capability called global endpoints for its serverless event bus service Amazon EventBridge to improve availability and reliability.
-
AWS Delivers a New Unified Service Health Dashboard
Recently, AWS updated its Service Health Dashboard with an improved interface, better responsiveness, and integration with the Personal Health Dashboard, all combined in a new Health Dashboard.
-
AWS Details Its Local Zones’ Expansion Disclosing 32 Cities Worldwide
In December last year, AWS announced the launch of over 30 new AWS Local Zones in significant cities worldwide, without disclosing which cities. The company has now announced the completion of its first 16 AWS Local Zones in the U.S. and plans to launch new AWS Local Zones in 32 new metropolitan areas in 26 countries worldwide.
-
AWS Announces Further Worldwide Expansion of Local Zones
AWS Local Zones are an infrastructure deployment that places compute, storage, database, and other select AWS services close to large population and industrial centers. Recently, AWS announced the launch of over 30 new AWS Local Zones in significant cities worldwide.
-
How GitHub Partitioned Its Relational Database to Improve Reliability at Scale
GitHub has spent the last couple of years partitioning its relational database and moving the data to multiple independent clusters. This effort led to a 50% load reduction and a significant reduction in database-related incidents, explains GitHub engineer Thomas Maurer.
-
AWS Releases Amazon Route 53 Application Recovery Controller into General Availability
Recently, AWS announced the general availability (GA) of Amazon Route 53 Application Recovery Controller, a new set of capabilities in Amazon Route 53. These capabilities make it easier for customers to continuously monitor their applications' ability to recover from failures and to control recovery across AWS Regions, Availability Zones, and on-premises infrastructure.
-
Microsoft Announces Support for Azure Container Registry across Availability Zones in Public Preview
Microsoft recently announced the public preview of support for Azure Container Registry across Availability Zones. Zone redundancy provides resiliency and high availability to a registry or replication resource (replica) in a specific region.
-
Large-Scale Infrastructure Hardware Availability at Facebook
Facebook's engineering team wrote about the methodologies it uses to maintain a high degree of hardware availability in its data centers.
-
Microsoft Introduces the Azure Well-Architected Framework
In a recent blog post, Microsoft introduced the Azure Well-Architected Framework, which provides customers with a set of Azure architecture best practices to help them build and deliver well-designed solutions.
-
GitHub Was down Multiple Times Last February: Here's Why
GitHub completed its internal investigation into the causes of multiple service interruptions that affected its service for over eight hours last February. The root cause was a combination of unexpected database load variation and database configuration issues.
-
Interfacing Elixir with Rust to Improve Performance: Discord's Story
When the Discord team hit a hard limit in BEAM's performance when dealing with large data structures, they resorted to interfacing Elixir with Rust so that their system could scale up to 11 million concurrent users.