Monitoring Content on InfoQ
-
Instana Pipeline Feedback for Release Performance
Application performance management service provider Instana launched Pipeline Feedback for release performance tracking and analysis. Pipeline Feedback provides automatic tracking of application releases, feedback on release performance, and integration with Jenkins.
-
Microsoft Releases a Preview of the Integration of Prometheus with Azure Monitor for Containers
Microsoft recently announced the integration of Prometheus, a popular open-source metrics monitoring solution and part of the Cloud Native Computing Foundation, with Azure Monitor for containers. The integration is currently available in preview for testing.
-
Oliver Gould on Linkerd Service Mesh and Traffic Management
Oliver Gould, Linkerd product lead and CTO of Buoyant, spoke at the QCon New York 2019 Conference last week about the Linkerd service mesh, with a focus on its traffic management capabilities.
-
Athena: Automated Build Health Monitoring at Dropbox Engineering
Dropbox’s engineering team runs ~35,000 builds and millions of automated tests, many of which can fail either due to bad commits or due to environmental conditions. The team created a build monitoring system to minimize the manual intervention necessary to detect and quarantine flaky tests, and notify code authors.
-
Expo: Real Time A/B Testing and Monitoring with Spark Streaming and Kafka at Walmart Labs
The WalmartLabs engineering team developed a real-time A/B testing tool called Expo that collects and analyzes user engagement metrics. It uses Spark Structured Streaming to process the incoming data and stores the metrics in KairosDB.
-
HashiCorp Releases Consul 1.5.0 with Layer 7 Observability and Centralized Configuration
HashiCorp released version 1.5.0 of Consul, their service mesh application and key-value store. The release delivers the first features on their new roadmap for Consul, including support for L7 observability and load balancing via Envoy, centralized configuration, and ACL authentication support for trusted third-party applications.
-
Merging OpenTracing and OpenCensus into a Single Distributed Tracing Framework
The OpenTracing and OpenCensus projects have announced that they will merge into a single, unified project. The goals of the merge include creating a single instrumentation standard, maintaining essential functionality without including every feature from both projects, keeping the architecture loosely coupled to enable pluggability, and covering traces, metrics and logs within its scope.
-
Scaling Graphite at Booking.com
Booking.com's engineering team scaled their Graphite deployment from a small cluster to one that handles millions of metrics per second. Along the way, they modified and optimized Graphite's core components - the carbon-relay and carbon-cache, and the rendering API.
-
Vector Performance Monitoring Tool Adds eBPF, Unified Host-Container Metrics Support
Vector, the open source performance monitoring tool from Netflix, added support for eBPF-based tools using a PCP daemon, a unified view of container and host metrics, and UI improvements.
-
Observability in Testing with ElasTest
In a distributed application it is difficult to use debugging techniques common in developing non-distributed applications. Bringing production observability to your testing environment helps to find bugs, argued Francisco Gortázar at the European Testing Conference 2019. He presented ElasTest, a tool for developers to test and validate complex distributed systems using observability.
-
Recommendations When Starting with Microservices: Ben Sigelman at QCon London
During the years Ben Sigelman worked at Google, they were creating what we today call a microservices architecture. Some mistakes were made during this adoption, which he believes are being repeated today by the rest of the industry. In his presentation at QCon London 2019, Sigelman described his recommendations to avoid making these mistakes when starting with microservices.
-
Chaos Engineering Observability: Q&A with Russ Miles
In a new O’Reilly report, “Chaos Engineering Observability: Bringing Chaos Experiments into System Observability”, the author, Russ Miles, explores why he believes the topics of observability and chaos engineering “go hand in hand”. He argues that as engineers begin to run chaos experiments, they will need to be able to ask many questions about the underlying system being experimented on.
-
Scaling, Incident Management and Collaboration at New York Times Engineering
The New York Times Engineering Team wrote about their approach to scaling and incident management against the backdrop of increased traffic during the November 2018 US midterm elections.
-
Three Pillars with Zero Answers: Rethinking Observability with Ben Sigelman
At KubeCon NA, held in Seattle, USA, in December 2018, Ben Sigelman presented “Three Pillars, Zero Answers: We Need to Rethink Observability” and argued that many organisations may need to rethink their approach to metrics, logging and distributed tracing.
-
Evolution of Metrics Collection and Log Aggregation at Coinbase
Luke Demi, software engineer at Coinbase, writes about the changes in monitoring and logging that have taken place at Coinbase since mid-2018. Coinbase moved from a self-managed Elasticsearch cluster that served the dual purpose of log analysis and metrics visualization, to Datadog for metrics collection and managed Elasticsearch on AWS for log aggregation.