Availability Content on InfoQ

News

AI, ML & Data Engineering

Meta Switches to MySQL Raft to Improve Reliability and Operational Simplicity

Meta is rolling out MySQL Raft in its data centers to replace its current semisynchronous MySQL databases. The new consensus engine simplifies operations and allows the MySQL servers themselves to take responsibility for promotions and membership.

Renato Losio
on May 20, 2023
Culture & Methods

Testing Advanced Driver Assistance Systems

Advanced driver assistance systems can require a huge number of test cases. Cutting the elephant into smaller pieces helps ensure that every part is tested. A good test environment is essential for covering all required tests efficiently, quickly, and flexibly. Testers should be involved in the project from the very beginning to avoid task forces and quality or delivery problems.

Ben Linders
on Dec 15, 2022
Architecture & Design

Atlassian Exceeds 99.9999% of Availability Using Sidecars and Highly Fault-Tolerant Design

Atlassian recently described how its Tenant Context Service exceeded 99.9999% availability. It achieved this by implementing highly autonomous client sidecars that can proactively shield themselves from complete AWS region failures. To do so, the sidecars query multiple services concurrently and ensure that requests are fully isolated internally.

Eran Stiller
on Sep 28, 2022
Architecture & Design

Slack Implements Circuit Breakers to Improve CI/CD Pipeline Availability

Slack recently described how it implemented the Circuit Breaker pattern to improve the availability of its CI/CD pipeline. Before this project, peak request volumes in internal tooling caused cascading failures in dependent systems. Since the project's completion, engineers have seen higher service availability and fewer poor developer experiences, such as flakiness caused by failing services.

Eran Stiller
on Aug 25, 2022
Cloud

AWS Increases the Availability and Reliability of Amazon EventBridge with Global Endpoints

Recently, AWS introduced a new capability called global endpoints for its serverless event bus service, Amazon EventBridge, to improve availability and reliability.

Steef-Jan Wiggers
on Apr 13, 2022
Cloud

AWS Delivers a New Unified Service Health Dashboard

Recently, AWS updated its Service Health Dashboard with an improved interface, better responsiveness, and integration with the Personal Health Dashboard, all combined in a new Health Dashboard.

Steef-Jan Wiggers
on Mar 09, 2022
Cloud

AWS Details Its Local Zones’ Expansion Disclosing 32 Cities Worldwide

In December last year, AWS announced the launch of over 30 new AWS Local Zones in major cities worldwide, but did not disclose which cities. The company has now announced the completion of its first 16 AWS Local Zones in the U.S. and plans to launch new Local Zones in 32 additional metropolitan areas across 26 countries.

Steef-Jan Wiggers
on Feb 23, 2022
Cloud

AWS Announces Further Worldwide Expansion of Local Zones

AWS Local Zones are an infrastructure deployment that places compute, storage, database, and other select AWS services close to large population and industry centers. Recently, AWS announced the launch of over 30 new AWS Local Zones in major cities worldwide.

Steef-Jan Wiggers
on Dec 31, 2021
Architecture & Design

How GitHub Partitioned Its Relational Database to Improve Reliability at Scale

GitHub has spent the last couple of years partitioning its relational database and moving the data to multiple independent clusters. This effort led to a 50% load reduction and a significant reduction in database-related incidents, explains GitHub engineer Thomas Maurer.

Sergio De Simone
on Sep 30, 2021
Cloud

AWS Releases Amazon Route 53 Application Recovery Controller into General Availability

Recently, AWS announced the general availability (GA) of Amazon Route 53 Application Recovery Controller, a new set of capabilities in Amazon Route 53. These capabilities make it easier for customers to continuously monitor their applications' ability to recover from failures and to control recovery across AWS Regions, Availability Zones, and on-premises infrastructure.

Steef-Jan Wiggers
on Aug 10, 2021
Cloud

Microsoft Announces Support for Azure Container Registry across Availability Zones in Public Preview

Microsoft recently announced the public preview of Azure Container Registry support for Availability Zones. Zone redundancy provides resiliency and high availability for a registry or replication resource (replica) in a specific region.

Steef-Jan Wiggers
on Jan 11, 2021
DevOps

Large-Scale Infrastructure Hardware Availability at Facebook

Facebook's engineering team described the methodologies Facebook uses to maintain a high degree of hardware availability in its data centers.

Hrishikesh Barua
on Dec 13, 2020
Cloud

Microsoft Introduces the Azure Well-Architected Framework

In a recent blog post, Microsoft introduced the Azure Well-Architected Framework, which provides customers with a set of Azure architecture best practices to help them build and deliver well-designed solutions.

Steef-Jan Wiggers
on Aug 04, 2020
DevOps

GitHub Was down Multiple Times Last February: Here's Why

GitHub has completed its internal investigation into the multiple service interruptions that affected its service for over eight hours last February. The root cause was a combination of unexpected variations in database load and database configuration issues.

Sergio De Simone
on Mar 31, 2020
Development

Interfacing Elixir with Rust to Improve Performance: Discord's Story

When the Discord team hit a hard limit on BEAM's performance when dealing with large data structures, they turned to interfacing Elixir with Rust so that their system could scale to 11 million concurrent users.

Sergio De Simone
on Jul 02, 2019