Reliability Content on InfoQ

Presentations

Culture & Methods

Shifting Left for Better Engineering Efficiency

Ying Dai discusses how reliability and productivity drove two critical migrations at Roblox, improving telemetry and automating change rollouts to boost engineering efficiency.

Ying Dai
on Jun 24, 2025

Duration: 47:19
Architecture & Design

How We Created a High-Scale Notification System at Duolingo

Vitor Pellegrino and Zhen Zhou discuss how they built and tested Duolingo's high-scale on-demand notification system, including what it takes to manage resources and site reliability concurrently.

Vitor Pellegrino, Zhen Zhou
on Sep 24, 2024

Duration: 49:01
Architecture & Design

How Netflix Ensures Highly-Reliable Online Stateful Systems

Joseph Lynch discusses the architecture of Netflix's stateful caches and databases, including how Netflix capacity plans, bulkheads, and deploys software to its global, full-active data topology.

Joseph Lynch
on Feb 12, 2024

Duration: 49:31
Architecture & Design

Reliable Architectures through Observability

Kent Quirk gives an overview of observability tools and techniques, along with specific recommendations for fitting observability into system designs and the day-to-day development process.

Kent Quirk
on Jan 11, 2024

Duration: 48:59
Architecture & Design

How to Build a Reliable Kafka Data Processing Pipeline, Focusing on Contention, Uptime and Latency

Lily Mara shares how OneSignal improved the performance and maintainability of its highest-throughput HTTP endpoints (backed by a Kafka consumer in Rust) by turning them into an asynchronous system.

Lily Mara
on Jan 02, 2024

Duration: 49:33
Architecture & Design

Architecting a Production Development Environment for Reliability

At Meta, developers use development servers (devservers) to perform their daily work. This talk discusses the devservers' software architecture and the mechanisms employed to keep them reliable and available.

Henrique Andrade
on Dec 19, 2023

Duration: 56:52
DevOps

Building Reliability One Step at a Time

Ana Margarita Medina shares how she has been using Chaos Engineering and how it can be used to uncover a system’s weak points, learn from incidents, and improve monitoring and observability.

Ana Margarita Medina
on Aug 29, 2021

Duration: 39:01
Culture & Methods

Less Mess, Less Stress: the Reliability Benefits of Custom Tools

Daniel Hochman discusses how an overreliance on vendor tooling leads to worse reliability outcomes, how Lyft lowered MTTR for its most common alerts using custom tooling, and how Clutch can help.

Daniel Hochman
on Jul 27, 2021

Duration: 27:07
DevOps

InfoQ Live Roundtable: Production Readiness: Building Resilient Systems

The panelists discuss observability, security, the software supply chain, CI/CD, chaos engineering, deployment techniques, canaries, and blue-green deployments, all in pursuit of production resiliency.

Wesley Reisz, Adam Zimman, Holly Cummins, Anastasiia Voitova, Haley Tucker, Charity Majors
on Dec 03, 2020

Duration: 46:00
Architecture & Design

Chaos Engineering: the Path to Reliability

Kolton Andrus shares examples of what works, what doesn’t, and what the future holds in using Chaos Engineering to build reliability into a system.

Kolton Andrus
on Nov 26, 2020

Duration: 26:42
DevOps

Reliability Matters More Than Ever

Tammy Butow discusses why reliability and resilience matter now more than ever, and how one can achieve them.

Tammy Butow
on May 22, 2020

Duration: 25:46
Architecture & Design

High Performance Cooperative Distributed Systems in Adtech

Stan Rosenberg explores a set of core building blocks exhibited by Adtech platforms and applies them towards building a fraud detection platform.

Stan Rosenberg
on Oct 23, 2019

Duration: 52:09
