DevOps Content on InfoQ

DevOps

From Alert Fatigue to Agent-Assisted Intelligent Observability

As systems grow, observability becomes harder to maintain and incidents harder to diagnose. Agentic observability layers AI on existing tools, starting in read-only mode to detect anomalies and summarize issues. Over time, agents add context, correlate signals, and automate low-risk tasks. This approach frees engineers to focus on analysis and judgment.

Rohit Dhawan
on Feb 04, 2026
Development

One Cache to Rule Them All: Handling Responses and In-Flight Requests with Durable Objects

Traditional caching fails to stop "thundering herds" where multiple clients trigger the same work during a miss. This article proposes using Cloudflare Durable Objects to treat in-flight work and finished results as two states of one cache entry. By routing to a single owner, systems eliminate redundant tasks. This pattern replaces complex locks with simple promises, simplifying the system design.

Gabor Koos
on Jan 28, 2026
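
The "in-flight work and finished result as two states of one cache entry" idea can be sketched in a single process. This is only an illustration in Python with asyncio, not the article's Durable Objects code: the class name `SingleFlightCache`, the `demo` helper, and the key `"k"` are hypothetical, and a real Durable Object would add the cross-request routing to a single owner that this sketch does not attempt.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class SingleFlightCache:
    """Each key maps to one task: pending while in flight, done once finished."""

    def __init__(self) -> None:
        self._entries: Dict[str, "asyncio.Future[str]"] = {}

    async def get(self, key: str, compute: Callable[[], Awaitable[str]]) -> str:
        entry = self._entries.get(key)
        if entry is None:
            # First caller on a miss starts the work; later callers reuse it.
            entry = asyncio.ensure_future(compute())
            self._entries[key] = entry
        try:
            # Awaiting a pending task waits for it; awaiting a finished task
            # returns the stored result immediately.
            return await entry
        except Exception:
            # Assumption: failed work is not cached, so the next caller retries.
            if self._entries.get(key) is entry:
                del self._entries[key]
            raise


async def demo() -> int:
    calls = 0

    async def slow_fetch() -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # stands in for expensive origin work
        return "payload"

    cache = SingleFlightCache()
    # Five concurrent misses for the same key.
    results = await asyncio.gather(*(cache.get("k", slow_fetch) for _ in range(5)))
    assert results == ["payload"] * 5
    return calls  # number of times the expensive work actually ran
```

Running `asyncio.run(demo())` shows how many times the expensive function ran for the five concurrent callers. No explicit lock is needed because the shared task itself is the synchronization point.
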
DevOps

Preventing Data Exfiltration: a Practical Implementation of VPC Service Controls at Enterprise Scale in Google Cloud Platform

Implementing VPC Service Controls is more about people and process than technology. Organizations must conduct extensive upfront discovery, use phased rollouts to avoid breaking production systems, and design VPC Service Controls that enable rather than block work. Success requires automation, clear exception processes, tracking both security and business metrics, and continuous improvement.

Shijin Nair
on Jan 19, 2026
DevOps

Platform-as-a-Product: Declarative Infrastructure for Developer Velocity

Declarative infrastructure config hides complexity, enabling developers to focus on application code. Unified YAML per service allows early cost validation, while independent CI with centralized CD balances team autonomy and deployment consistency. This standardized approach scales across organizations, making infrastructure invisible and operations automatic.

Avinash Sabat
on Jan 14, 2026
DevOps

Stop Guessing, Start Improving: Using DORA Metrics and Process Behavior Charts

Delivery performance rarely changes in a straight line. Small degradations caused by tooling, environment instability, or team changes can accumulate quietly, while real improvements take time to emerge. This article shows how combining DORA metrics with Process Behavior Charts helps teams zoom out, detect meaningful shifts early, and validate improvement hypotheses.

Egor Savochkin
on Dec 29, 2025
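
For readers unfamiliar with Process Behavior Charts, the limits of the common XmR (individuals and moving range) variant can be computed as below. This is a minimal sketch, not the article's code: the function names and the sample lead-time values are hypothetical, and 2.66 is the standard XmR scaling constant applied to the average moving range.

```python
from statistics import mean
from typing import List, Tuple


def xmr_limits(values: List[float]) -> Tuple[float, float, float]:
    """Return (lower limit, center line, upper limit) for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two points to compute a moving range")
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread


def signals(values: List[float], baseline: List[float]) -> List[int]:
    """Indexes of points in `values` outside the limits derived from `baseline`."""
    lower, _, upper = xmr_limits(baseline)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical weekly lead times in hours, used only for illustration.
baseline_lead_times = [10, 12, 11, 13, 12, 11, 10, 12]
recent_lead_times = [11, 12, 19, 10]
outliers = signals(recent_lead_times, baseline_lead_times)
```

Points inside the limits are treated as routine variation. Only points outside them (or other run rules not shown here) count as a signal worth investigating.
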
DevOps

Overload Protection: the Missing Pillar of Platform Engineering

Overload protection is often overlooked in platform engineering, leaving teams to create inconsistent, fragile fixes. Centralized rate limits, quotas, adaptive controls, and clear visibility give services predictable ways to handle traffic spikes, reduce reliability debt, and prevent cascading failures across systems.

Gaurav Nanda, Tapan Manaktala
on Dec 09, 2025
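
One of the simplest building blocks behind a rate limit is a token bucket. The sketch below is an illustrative assumption, not something taken from the article: the `TokenBucket` class, its parameters, and the injectable `clock` are hypothetical names chosen to keep the example testable.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request fits, else False."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A caller that receives `False` would typically reject or queue the request. Centralizing that decision in a platform component, rather than in each service, is the point the summary above makes.
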
Architecture & Design

Holistic Engineering: Organic Problem Solving for Complex Evolving Systems

Late projects. Architectures that drift from their original design. Code that mysteriously evolves into something nobody planned. These persistent problems in software development often stem not from technical failures, but from forces we pretend don't exist—reward systems that incentivize the wrong behaviors, organizational structures that ignore domain boundaries, and human dynamics.

Vanessa Formicola
on Nov 17, 2025
DevOps

When Reverse Proxies Surprise You: Hard Lessons from Operating at Scale

Operating massive reverse proxy fleets reveals hard lessons: optimizations that work on smaller systems fail at scale; mundane oversights like missing commas cause major outages; and abstractions meant to simplify become hidden fragility points. Success requires profiling on target hardware, relentlessly monitoring boring details, keeping hot paths lean, and trusting instrumentation over theory.

Mitendra Mahto
on Nov 12, 2025
Architecture & Design

Building Resilient Platforms: Insights from over Twenty Years in Mission-Critical Infrastructure

Building resilient platforms requires understanding the art and science of creating infrastructure that others depend on for critical applications. This perspective applies to anyone who builds software consumed by others at scale, whether that is an infrastructure platform, a software development platform, or a messaging system.

Matthew Liste
on Nov 10, 2025
Cloud

InfoQ Cloud and DevOps Trends Report - 2025

This InfoQ Trends Report offers readers a comprehensive overview of emerging trends and technologies in Cloud and DevOps. It summarizes the views of the InfoQ editorial team and external guests on current trends in Cloud and DevOps technologies and what to look out for in the next 12 months.

Steef-Jan Wiggers, Shweta Vohra, Matt Saunders
on Oct 22, 2025
DevOps

Beyond the Padlock: Why Certificate Transparency is Reshaping Internet Trust

Certificate Transparency (CT) creates public, append-only logs of every TLS certificate issued, enabling detection of rogue or mistaken certificates. This article explores how CT has transformed internet PKI by moving from reliance on certificate authority trustworthiness to providing verifiable transparency that major browsers now require.

Karthiek Maralla
on Sep 08, 2025
DevOps

How Causal Reasoning Addresses the Limitations of LLMs in Observability

Large language models excel at converting observability telemetry into clear summaries but struggle with accurate root cause analysis in distributed systems. LLMs often hallucinate explanations and confuse symptoms with causes. This article suggests that causal reasoning models with Bayesian inference can offer more reliable incident diagnosis.

Dhairya Dalal
on Sep 02, 2025
