InfoQ Software Architects' Newsletter

A monthly overview of things you need to know as an architect or aspiring architect.


Fault Tolerance Content on InfoQ

News

Architecture & Design

How to Achieve a Resilient Architecture

To manage systems at scale, you must push them almost to the breaking point while still being able to recover, and you must embrace failure. So writes Adrian Hornsby in two blog posts that draw on more than a decade of experience with large-scale systems and describe the patterns he has found useful.

Jan Stenberg
on Sep 13, 2018
DevOps

Chaos Engineering at LinkedIn: The “LinkedOut” Failure Injection Testing Framework

The LinkedIn Engineering team has recently discussed their “LinkedOut” failure injection testing framework. Hypotheses about service resilience can be formulated and failure triggers injected via the LinkedIn LiX A/B testing framework or via data in a cookie that is passed through the call stack using the Invocation Context (IC) framework. Failure scenarios include errors, delays and timeouts.

Daniel Bryant
on Jun 24, 2018
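The LinkedOut summary describes failure triggers (errors, delays and timeouts) that travel with a request through the call stack. The Python sketch below illustrates that idea only; it is not LinkedIn's API. The context variable `failure_plan` stands in for something like the Invocation Context, and `call_downstream` and `handle_request` are hypothetical names chosen for the example.

```python
import contextvars
import time

# Hypothetical per-request context: maps a downstream service name to a failure
# mode ("error", "delay" or "timeout"). None means no injection for this request.
failure_plan = contextvars.ContextVar("failure_plan", default=None)


class InjectedFailure(Exception):
    """Raised when the active failure plan asks for an error."""


def call_downstream(service, handler, timeout_s=1.0, delay_s=0.05):
    """Call `handler`, unless the request context asks to inject a failure for `service`."""
    mode = (failure_plan.get() or {}).get(service)
    if mode == "error":
        raise InjectedFailure(f"injected error for {service}")
    if mode == "delay":
        time.sleep(delay_s)  # add latency, then proceed normally
    if mode == "timeout":
        # Simulate a dependency that does not answer within the caller's budget.
        raise TimeoutError(f"injected timeout after {timeout_s}s for {service}")
    return handler()


def handle_request(plan):
    # Every downstream call made while this request runs sees the same plan.
    token = failure_plan.set(plan)
    try:
        try:
            return call_downstream("profile", lambda: "profile-data")
        except (InjectedFailure, TimeoutError):
            # The resilience hypothesis under test: degrade gracefully, don't fail.
            return "default-profile"
    finally:
        failure_plan.reset(token)
```

Calling `handle_request({"profile": "error"})` exercises the degraded path without touching the real dependency, while `handle_request({})` behaves normally. Scoping the plan to one request is what lets an experiment target a small slice of traffic.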
AI, ML & Data Engineering

Microservices Resiliency and Fault Tolerance Using Istio and Kubernetes

Animesh Singh and Tommy Li from IBM spoke at the recent KubeCon + CloudNativeCon North America 2017 conference about microservices resiliency and fault tolerance using the Istio framework. They also showed how to configure and use circuit breakers and other Istio resiliency features.

Srini Penchikala
on Jan 15, 2018
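In Istio, circuit breaking is configured declaratively on a DestinationRule (connection-pool limits and outlier detection) rather than written in application code. To illustrate the outlier-detection half, here is a simplified Python model, not Envoy's or Istio's actual algorithm: a host that returns a set number of consecutive errors is ejected from the load-balancing pool for a fixed ejection time, and a cap limits how much of the pool can be ejected at once. The parameter names loosely echo Istio's settings but are assumptions for this sketch.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Host:
    name: str
    consecutive_errors: int = 0
    ejected_until: float = 0.0  # clock timestamp; a past value means "in the pool"


@dataclass
class OutlierDetector:
    hosts: list
    consecutive_errors: int = 5           # errors in a row before ejection
    base_ejection_time_s: float = 30.0    # how long an ejected host stays out (fixed here)
    max_ejection_percent: int = 50        # never eject more than this share of hosts
    clock: Callable[[], float] = time.monotonic

    def available(self) -> list:
        """Hosts currently eligible for load balancing."""
        now = self.clock()
        return [h for h in self.hosts if h.ejected_until <= now]

    def record(self, host: Host, ok: bool) -> None:
        """Record the outcome of one request to `host`."""
        if ok:
            host.consecutive_errors = 0
            return
        host.consecutive_errors += 1
        if host.consecutive_errors >= self.consecutive_errors:
            now = self.clock()
            ejected = sum(1 for h in self.hosts if h.ejected_until > now)
            # Respect the cap so a fleet-wide problem cannot empty the pool.
            if (ejected + 1) * 100 <= self.max_ejection_percent * len(self.hosts):
                host.ejected_until = now + self.base_ejection_time_s
                host.consecutive_errors = 0
```

The ejection cap matters: without it, a shared downstream outage could make every host look unhealthy and leave nothing to route to.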
DevOps

Chaos Engineering at Twilio

The Twilio team describes their foray into Chaos Engineering where they use Gremlin to inject failures into their homegrown queuing system shards to test for automated recovery.

Hrishikesh Barua
on Dec 25, 2017
Java

What's New in MicroProfile 1.2

The Eclipse Foundation recently released MicroProfile version 1.2. New APIs in this release cover improved communication among microservices, responses to system faults, and JSON Web Token (JWT) propagation. Emily Jiang, CDI and MicroProfile development lead at IBM, and Michael Croft, Java middleware consultant at Payara, spoke to InfoQ about this latest release.

Michael Redlich
on Nov 30, 2017
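The "response to system faults" part of MicroProfile 1.2 is the MicroProfile Fault Tolerance API, which in Java is driven by annotations such as `@Retry`, `@Timeout`, `@CircuitBreaker` and `@Fallback`. As a rough analogue only (not the MicroProfile API), the Python decorator below combines bounded retries with a fallback; `retry_with_fallback` and its parameters are hypothetical names for this sketch.

```python
import functools
import time


def retry_with_fallback(max_retries=3, delay_s=0.0, fallback=None, retry_on=(Exception,)):
    """Retry the wrapped call up to `max_retries` extra times, then call `fallback`.

    Loosely analogous to combining @Retry and @Fallback; hypothetical API.
    If no fallback is given, the last error is re-raised.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(max_retries + 1):  # first try plus retries
                try:
                    return func(*args, **kwargs)
                except retry_on as err:
                    last_error = err
                    if attempt < max_retries and delay_s:
                        time.sleep(delay_s)  # fixed pause between attempts
            if fallback is not None:
                return fallback(*args, **kwargs)
            raise last_error
        return wrapper
    return decorator
```

A declarative style like this keeps the fault-handling policy next to the call it protects, which is the same motivation behind the annotation-based Java API.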
DevOps

Expedia's Journey toward Site Resiliency: Embracing Chaos Testing in Dev and Production at QCon SF

At QCon SF, Sahar Samiei and Willie Wheeler presented “Expedia’s Journey Toward Site Resiliency”, and discussed the building of a community of practice around resilience testing within Expedia. The results have generally been positive: Netflix’s Chaos Monkey has been running daily in production since May 15th, and resilience tests have been added to four Tier 1 service pipelines.

Daniel Bryant
on Nov 19, 2017
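Chaos Monkey's core idea is to terminate randomly chosen production instances on a regular schedule, so that services must be built to survive instance loss. The Python function below is a toy sketch of only the selection step, assuming a hypothetical inventory that maps group names to instance IDs plus an opt-out set; the real tool's scheduling and configuration differ. Skipping single-instance groups is a choice made in this sketch so the experiment tests redundancy rather than guaranteeing an outage.

```python
import random


def pick_victims(groups: dict, opted_out: set, rng: random.Random) -> dict:
    """Choose at most one instance per group to terminate.

    `groups` maps a group name to its instance IDs (hypothetical inventory).
    Groups that opted out, or that have fewer than two instances, are skipped.
    """
    victims = {}
    for name, instances in sorted(groups.items()):
        if name in opted_out or len(instances) < 2:
            continue
        # Sorting makes the choice reproducible for a given seeded RNG.
        victims[name] = rng.choice(sorted(instances))
    return victims
```

Passing in a seeded `random.Random` makes a given day's experiment reproducible, which helps when correlating a termination with the alerts it triggered.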
Architecture & Design

Relearning Functional Service Design for Microservices: Uwe Friedrichsen at microXchg

The opening talk of the microXchg microservices conference was delivered by Uwe Friedrichsen, and discussed “Resilient Functional Service Design”. Key takeaways included: microservice developers should learn about fault-tolerant design patterns and caching; understanding Domain-Driven Design (DDD) and modularity is vital; and components should be designed for replaceability rather than reuse.

Daniel Bryant
on Feb 19, 2017
Java

Google Kick-Starts Git Ketch: A Fault-Tolerant Git Management System

Although development has only just started, Google has announced the first commits of Git Ketch, a multi-master Git management system that replicates information across multiple Git servers for resilience and scalability. The changes are based on JGit, a pure Java implementation of Git, although other Git servers may also be part of the multi-master cluster.

Abraham Marín Pérez
on Feb 02, 2016
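The Git Ketch summary mentions replicating repository state across several Git servers. A common rule in replicated systems for deciding that an update is durable is a majority quorum: the update counts as committed once more than half of the replicas acknowledge it. The Python sketch below models only that rule; this summary does not say how Ketch itself commits updates, and `replicate` plus its replica callables are hypothetical.

```python
from typing import Callable, Dict


def replicate(update: str, replicas: Dict[str, Callable[[str], bool]]) -> bool:
    """Send `update` to every replica; succeed only with a strict majority of acks.

    Each replica is modelled as a callable returning True on acknowledgement.
    A replica that raises (for example, because it is unreachable) does not count.
    """
    acks = 0
    for apply in replicas.values():
        try:
            if apply(update):
                acks += 1
        except Exception:
            pass  # an unreachable replica simply does not count toward the quorum
    return acks > len(replicas) // 2
```

With three servers, one can be down and updates still commit; with two down, the write is refused rather than risking divergent histories.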
FoundationDB SQL Layer: Storing SQL Data in a NoSQL Database

FoundationDB has announced the general availability of SQL Layer, an ANSI SQL engine that runs on top of its key-value store. The result is a relational database backed by a scalable, fault-tolerant, shared-nothing, distributed NoSQL store with support for multi-key ACID transactions.

Abel Avram
on Sep 10, 2014
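The SQL Layer summary describes a relational engine running on an ordered key-value store. A common way to map rows onto such a store is to key each row by (table, primary key), so that a table scan or primary-key range query becomes a contiguous key-range read. The Python sketch below shows that mapping under simplifying assumptions: an in-memory sorted list stands in for the store, plain tuples replace a real byte-level key encoding, and there are no transactions. `OrderedKV`, `insert_row` and `select_where` are hypothetical names, not FoundationDB's API.

```python
import bisect


class OrderedKV:
    """In-memory stand-in for an ordered key-value store (hypothetical)."""

    def __init__(self):
        self._keys = []    # kept sorted so range reads are cheap
        self._values = {}

    def set(self, key: tuple, value) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range(self, start: tuple, end: tuple):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


def insert_row(kv: OrderedKV, table: str, pk: int, row: dict) -> None:
    # Row key = (table, primary key), so all of a table's rows are contiguous.
    kv.set((table, pk), row)


def select_where(kv: OrderedKV, table: str, pk_from: int, pk_to: int) -> list:
    """Rough equivalent of SELECT * FROM table WHERE pk >= pk_from AND pk < pk_to."""
    return [row for _, row in kv.range((table, pk_from), (table, pk_to))]
```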
Refreshed AWS Trusted Advisor Offers Several Free Checks

Amazon Web Services (AWS) has recently integrated the AWS Trusted Advisor into the AWS Management Console and made four security and service limit checks available at no charge. Additional checks from the security, performance, fault tolerance and cost optimization categories remain part of their Business and Enterprise support tiers.

Steffen Opel
on Aug 31, 2014
The Netflix API Optimization Story

The Netflix API optimization story is an interesting journey from a generic one-size-fits-all static REST API architecture to a more dynamic architecture that lends power to the client team to define and deploy their custom service endpoints. InfoQ spoke to Ben Christensen regarding this client adapter layer as well as the services layer redesign.

Jeevak Kasarkod
on Feb 08, 2013
10gen: MongoDB’s Fault Tolerance Is Not Broken

A Cornell University professor claims MongoDB’s fault tolerance system is “broken by design”. 10gen responds through its Technical Director, rejecting the claims.

Abel Avram
on Feb 07, 2013
Architecture & Design

Netflix Hystrix - Latency and Fault Tolerance for Complex Distributed Systems

Netflix has released Hystrix, a library designed to control points of access to remote systems, services and third-party libraries, providing greater tolerance of latency and failure. Hystrix features thread and semaphore isolation with fallbacks and circuit breakers, request caching and request collapsing, and monitoring and configuration.

Bienvenido David
on Dec 21, 2012
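The Hystrix summary lists circuit breakers with fallbacks among its features. The Python class below is a minimal, single-threaded sketch of that pattern, not Hystrix's Java API: after `failure_threshold` consecutive failures the breaker opens and calls go straight to the fallback; once `reset_timeout_s` has elapsed, one trial call is let through (half-open), and a success closes the circuit again. Hystrix itself trips on an error percentage over a rolling window rather than on a consecutive-failure count, so this is a deliberate simplification.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal circuit breaker with fallback (single-threaded sketch, not Hystrix's API)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half-open"  # allow a trial call through
        return "open"

    def call(self, func: Callable[[], object], fallback: Callable[[], object]):
        if self.state == "open":
            return fallback()  # fail fast without touching the remote system
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open and restart the timer
            return fallback()
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```

Failing fast while the circuit is open is what protects the caller's threads and the struggling dependency; the fallback decides what degraded answer the user sees in the meantime.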
Introducing Windows New File System: ReFS

For the first time since 1993, Microsoft is poised to offer a new file system architecture. ReFS, the Resilient File System, is designed to improve reliability and is also an opportunity to drop obsolete features offered by NTFS.

Jonathan Allen
on Jan 17, 2012
Akka 1.1 Released, Brings Many Improvements to Futures and Performance, Reduces Dependencies

Akka 1.1 was released with many improvements in performance, Futures and more. Core Akka also has no dependencies other than Scala 2.9. InfoQ caught up with Jonas Bonér to talk about the current state and the future of Akka.

Werner Schuster
on May 12, 2011