Observability Content on InfoQ
-
How Lyft Detects Android Memory Leaks in Production
While modern tooling for Android and iOS enables memory leak detection in local builds, this is not enough to guarantee that an app behaves correctly with respect to memory in production, where it runs on a wide range of devices under diverse conditions. For this reason, Lyft engineers combine A/B testing and memory observability to detect which features leak memory.
-
Grafana Adds Outlier Detection to Its Machine Learning Toolkit
Grafana has released outlier detection as part of its Grafana Machine Learning toolkit. Outlier detection can be used to monitor a group of similar things and alert when some of them start to behave differently from the rest of the group.
-
Prometheus Adds Long Term Support Model and Improved Remote Write Mode
Prometheus, the open-source monitoring tool, has added a number of new features, including a reduced-functionality remote write mode. Other improvements include a new HTTP service discovery mechanism, native histogram support, additional integrations for Alertmanager, and a new long-term support model.
-
Elastic 8.6 Released with Improvements to Observability, Security, and Search
Elastic has released Elastic 8.6 with improvements across the entire Elastic Search Platform including Elastic Enterprise Search, Elastic Observability, Elastic Security, and Kibana. The release includes additional connector clients, better observability of dependencies, improvements to alerts generated from prebuilt security rules, and temporary data views.
-
Grafana Releases New Frontend Observability SDK and Backend Profiling Database
Grafana has announced two new additions to its suite of observability and monitoring tooling. Grafana Faro is an open-source web SDK for real user monitoring (RUM) of browser frontend applications. Grafana Phlare is an open-source backend database for storing and querying profiling data. A new flame graph panel makes it easier to visualize and interpret the collected profiling data.
-
AWS Lambda Telemetry API Provides Enhanced Observability Data
AWS has released the AWS Lambda Telemetry API, a new way for extensions to receive enhanced function telemetry from the Lambda service. The new API simplifies collecting traces, logs, and custom and enhanced metrics from Lambda functions. In addition to several example extensions, third-party extensions are available from vendors including Datadog, Dynatrace, Serverless, and Sumo Logic.
-
New Metrics Capabilities for OpenTelemetry on Azure Monitor
Microsoft has released a preview of a series of updates to its Azure Monitor OpenTelemetry Exporter packages for .NET, Node.js, and Python applications.
-
Comprehensive Kubernetes Telemetry with AWS Observability Accelerator
AWS recently created a new template within the AWS Observability Accelerator project that provides an integrated telemetry solution for Amazon Elastic Kubernetes Service (EKS) workloads.
-
Google Previews Log Analytics Feature in Its Cloud Logging Service
Google recently announced the preview of a new feature called Log Analytics in its Cloud Logging service, allowing companies to analyze data collected from their cloud environments.
-
Kubernetes Control Plane Metrics Now Available in Google Kubernetes Engine
Google has announced the general availability of Kubernetes control plane metrics in Google Kubernetes Engine (GKE). These metrics are directly integrated with Google Cloud Monitoring, providing a single solution for troubleshooting issues with GKE. Integration with third-party observability tooling is also possible via the Cloud Monitoring API.
-
Azure Managed Grafana Now Generally Available
Microsoft recently announced the general availability (GA) of Azure Managed Grafana, a managed service that enables customers to run Grafana natively within the Azure cloud platform. With the managed service, customers can connect Grafana to existing Azure services to enhance observability and cloud management.
-
Standardising Observability and Incident Management at Miro
The Miro Data Engineering team recently discussed how they systematised alerts and incident management. Along with standardising observability metrics and alert definitions, the team started using OpsGenie for incident management. This helped the team address scaling challenges such as standardising the format of metric labels and alert definitions and organising on-call duties.
-
Programming Observability: Measuring the Maturity of Observability as Code
Observability can be programmed and automated with observability as code. A maturity model can be used to measure and improve the adoption of observability as code. Yury Niño Roa, cloud infrastructure engineer at Google, spoke about programming observability at InfoQ Live August 2022.
-
Dealing with Cognitive Load Using Observability
We can make good decisions at speed when we limit the cognitive load on any one person or team. Observability can help increase delivery speed by giving developers the information they need to make decisions quickly.
-
Grafana 9 Brings Big Improvements to Alerting and User Experience
Grafana, the open-source graphing tool, has released version 9. The key goals behind version 9 are improving the user experience, making observability and data visualization easy and accessible, and improving alerting.