Observability Content on InfoQ
-
Google Cloud Observability Adopts OpenTelemetry Protocol for Native Trace Ingestion
Google Cloud has announced native support for the OpenTelemetry Protocol (OTLP) in its Cloud Trace service, marking a significant step toward vendor-neutral observability infrastructure. The new capability allows developers to send trace data directly using OTLP through the telemetry.googleapis.com endpoint, eliminating the need for vendor-specific exporters and custom data transformations.
-
Datadog Launches Monocle, a Unified Rust-Powered Real-Time Metrics Engine
Datadog has launched Monocle, a new real-time time series storage engine written in Rust. The system unifies the company’s metrics storage infrastructure, delivering higher ingestion throughput and lower query latency while reducing operational complexity. Monocle replaces several generations of storage backends, addressing concurrency challenges and scaling limits that accumulated over time.
-
PagerDuty's Kafka Outage Silences Alerts for Thousands of Companies
PagerDuty, the incident management platform used by thousands of organisations to alert them to problems on their systems, suffered a major outage itself on 28th August, 2025. In a comprehensive outage report, the company detailed the scope of the problem, the customer impact, and how it is working to prevent a recurrence.
-
Azure Service Groups Enter Public Preview Offering New Abstraction Layer for Resource Management
Microsoft has launched Azure Service Groups in public preview, a new feature designed to simplify resource management and administration. Acting as a flexible, tenant-level container, Service Groups allow users to organize Azure resources from anywhere within their tenant without affecting RBAC or policy inheritance.
-
Honeycomb Hosted MCP Brings Observability Data into the IDE
Honeycomb has launched its hosted Model Context Protocol (MCP), giving developers real-time access to observability data inside IDEs and AI tools like GitHub Copilot. Available as a managed service on AWS Marketplace, it removes the need for self-hosting and streamlines debugging by surfacing traces, metrics, and logs without context-switching.
-
Grafana 12.1 Brings Built-in Diagnostics and Enhanced Alerting
Grafana 12.1 focuses on system reliability and alert management, adding Grafana Advisor for health checks, a revamped alerting interface, and trendline transformations for data visualization. Enhanced dashboard interactivity and improved variable handling help teams scale. The release is available on Grafana Cloud and for self-hosted deployments.
-
New Open Source Tool Subtrace Brings Network Analysis to Container Environments
Y Combinator startup Subtrace has released an open-source tool to help analyse network traffic from containerised applications. The creators have positioned it as "Wireshark for Containers" and aim to simplify network debugging in Docker and Kubernetes environments.
-
Vercel Adds External API Caching Analytics to Observability
Vercel has enhanced its observability platform by integrating external API caching insights, enabling developers to track how many requests to third-party APIs are served from the Vercel Data Cache versus being routed to the origin server.
-
Inside Netflix’s Title Launch Observability System: Validating Title Availability at Global Scale
Netflix has developed a platform called Title Launch Observability, which shifts observability from system health to product intent. Instead of relying solely on logs and metrics, the system validates launches against what users should see, catching content quality issues early. The platform helps detect issues such as missing artwork, incorrect recommendations, or localization gaps.
-
Logz.io and Dynatrace Innovations Shift Observability into the AI Age
Major observability platform providers are integrating artificial intelligence into their monitoring systems, as enterprises look to their suppliers to reduce the manual work involved in keeping an eye on digital infrastructure. Companies have implemented AI-driven features designed to automate routine operational tasks and accelerate incident resolution processes.
-
AWS Lambda Gains Native Avro and Protobuf Support for Kafka Events with Schema Registry Integration
AWS Lambda now natively supports Apache Avro and Protobuf events for Kafka event processing. The enhancement eliminates the need for custom deserialization, automates schema validation and filtering, and reduces costs through more efficient event handling. Integration with the AWS Glue and Confluent schema registries simplifies development, allowing cleaner data consumption and improved scalability.
-
Microsoft Azure Enhances Observability with OpenTelemetry Support for Logic Apps and Functions
Microsoft has expanded OpenTelemetry support in Azure Logic Apps and Functions, improving observability and interoperability across platforms. The open-source framework enables telemetry generation and correlation, extending diagnostics beyond standard telemetry. With streamlined configuration and integration, Azure's offerings aim for standardized observability across cloud services.
-
AWS Shield Network Security Director: Network Topology Visibility and Remediation Guidance
AWS Shield Network Security Director is a new feature for DDoS protection and network security visibility. It automates resource discovery, evaluates configurations against best practices, and prioritizes security findings. With actionable remediation steps and natural language queries via Amazon Q Developer, organizations can strengthen their security posture.
-
Amazon API Gateway Adds Dynamic Routing Based on Headers and Paths
AWS's new dynamic routing rules for Amazon API Gateway let developers manage API traffic by routing requests based on HTTP headers, without relying on complex URL structures. The feature simplifies API versioning, enables fine-grained control, supports A/B testing, and improves request visibility, making API configurations more efficient.
-
Applying Observability to Leadership to Understand and Explain your Way of Working
Leadership observability means observing yourself as you lead, treating yourself as the system that is under observation. Alex Schladebeck shared how narrating thoughts, using mind maps, asking questions, and identifying patterns helped her as a leader to explain decisions, check bias, support others, and understand her actions and challenges.