Reliability Content on InfoQ

News

DevOps

Adopting Continuous Deployment: Tom Wanielista at QCon San Francisco 2022

At QCon San Francisco 2022, Tom Wanielista, a staff engineer on infrastructure at Lyft, presented on adopting continuous deployment at his company. The talk was part of the editorial track "Architecting Change at Scale."

Steef-Jan Wiggers
on Oct 25, 2022
Architecture & Design

Filibuster: Automated Fault Injection Tool to Improve DoorDash's Reliability

DoorDash recently revealed how they are using Filibuster, an automated fault injection tool, to identify resilience issues in microservice applications early on and improve platform reliability.

Tanmay Deshpande
on Sep 26, 2022
Cloud

Google Introduces Cloud Backup and Disaster Recovery

Google recently introduced Cloud Backup and Disaster Recovery (DR), allowing customers to enable centralized backup management directly from the Google Cloud console. The new backup and recovery service is designed to work with cloud storage repositories, databases, and applications.

Steef-Jan Wiggers
on Sep 18, 2022
Culture & Methods

Developing and Evolving SaaS Infrastructures for Enterprises

SaaS companies that are focused on the enterprise market need to evolve their infrastructure to meet the security, reliability, and other IT requirements of their customers. IT admins and large customers are two important sources of requirements to drive development.

Ben Linders
on Aug 04, 2022
Architecture & Design

Building Resiliency into the Twitter Ad Pacing Service

Twitter’s ad pacing algorithms were initially part of an ad-serving monolith. Twitter’s engineering team later extracted them into a separate service to facilitate their development. Because the service is important, it needs to be very reliable. A recently published article describes how the team built a reliable service by making economical design choices for handling different failure scenarios.

Vasco Veloso
on Apr 20, 2022
Cloud

AWS Increases the Availability and Reliability of Amazon EventBridge with Global Endpoints

Recently, AWS introduced a new capability called global endpoints for its serverless event bus service Amazon EventBridge to improve availability and reliability.

Steef-Jan Wiggers
on Apr 13, 2022
Culture & Methods

Measuring the Environmental Impact of Software and Cloud Services

Software influences how long hardware stays in service and how much energy it consumes. It’s possible to measure the environmental impact of cloud services. The design of the software architecture determines how much hardware and electrical power is required. Software can be economical or wasteful with hardware resources.

Ben Linders
on Mar 17, 2022
Architecture & Design

Real-Time Exactly-Once Event Processing at Uber with Apache Flink, Kafka, and Pinot

Uber faced some challenges after introducing ads on UberEats. The events they generated had to be processed quickly, reliably, and accurately. These requirements were met by a system based on Apache Flink, Kafka, and Pinot that can process streams of ad events in real time with exactly-once semantics. An article describing its architecture was published recently on the Uber Engineering blog.

Vasco Veloso
on Nov 12, 2021
Architecture & Design

How GitHub Partitioned Its Relational Database to Improve Reliability at Scale

GitHub has been working for the last couple of years on partitioning their relational database and moving the data to multiple independent clusters. This effort led to a 50% load reduction and a significant reduction of database-related incidents, explains GitHub engineer Thomas Maurer.

Sergio De Simone
on Sep 30, 2021
Architecture & Design

Reviewing the Eight Fallacies of Distributed Computing

In a recent article on the Ably Blog, Alex Diaconu reviewed the eight fallacies of distributed computing and provided a number of hints on how to handle them. InfoQ has taken the chance to talk with Diaconu to learn more about how Ably engineers deal with the fallacies.

Sergio De Simone
on Sep 03, 2021
Culture & Methods

Artificial Intelligence for IT Operations: an Overview

Artificial intelligence for IT operations (AIOps) combines sophisticated methods from deep learning, data stream processing, and domain knowledge to analyse infrastructure data from internal and external sources, with the aim of automating operations and detecting anomalies (unusual system behavior) before they impact the quality of service.

Ben Linders
on Jul 22, 2021
DevOps

Auth0's Move to a Single-Cloud Architecture on AWS

Auth0, a provider of authentication, authorization, and single sign-on services, moved their infrastructure from multiple cloud providers (AWS, Azure, and Google Cloud) to just AWS. An increasing dependency on AWS services necessitated this, and today their systems are spread across four AWS regions, with services replicated across zones.

Hrishikesh Barua
on Aug 25, 2018
DevOps

How DevOps Principles Are Being Applied to Networking

Practices from the DevOps world are being adopted for managing networking services. Vendor hardware, configuration tools, and deployment modes have eased programmable configuration and automation of network devices and functions.

Hrishikesh Barua
on Jan 22, 2018
Culture & Methods

Using Models in Developing Software for Self-Driving Cars

Models play an important role in developing software for autonomous systems like self-driving cars; they are used to simulate and verify behavior, document the system, and generate code. Jonathan Sprinkle explains how to model software used in autonomous systems, the benefits of modeling, using test data to validate the software that drives a car and techniques for writing reliable code.

Ben Linders
on Jul 28, 2016
Development

GitHub’s DGit Improves Reliability, Performance, and Availability

GitHub has been quietly rolling out DGit, short for “distributed Git”, a new distributed storage system built on top of Git with the aim of improving the reliability, availability, and performance of GitHub.

Sergio De Simone
on Apr 07, 2016
