Lightstep has released a number of improvements to its observability platform. These include native support for OpenTelemetry metrics, a new underlying time series database, and Change Intelligence, a new feature that aims to connect unusual patterns in system behavior with the changes that may have caused them by bringing together system metrics and trace data.
OpenTelemetry is a Cloud Native Computing Foundation sandbox project that provides a collection of open source, vendor-neutral tools, APIs, and SDKs for generating, capturing, and collecting telemetry data. With this release, Lightstep now has native support for OpenTelemetry metrics. The metrics API supports processing raw measurements, typically with the intent of producing continuous summaries of those measurements. This integration is also available in Lightstep's free community tier.
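To make "continuous summaries of raw measurements" concrete, here is a minimal, stdlib-only sketch of the idea: each recorded value is folded into a running summary (count, sum, min, max) kept per attribute set, instead of being stored individually. This is not the OpenTelemetry SDK (there, instruments such as counters and histograms plus the SDK's aggregation fill this role); the class names `Summary` and `SummaryAggregator` and the `route` attribute are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Mapping, Tuple

# Hypothetical illustration, not the OpenTelemetry SDK: each distinct
# attribute set (e.g. {"route": "/checkout"}) gets its own running summary.
AttrKey = FrozenSet[Tuple[str, str]]


@dataclass
class Summary:
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def record(self, value: float) -> None:
        # Fold one raw measurement into the summary; the value itself is not kept.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


class SummaryAggregator:
    """Keeps one Summary per distinct attribute set."""

    def __init__(self) -> None:
        self._summaries: Dict[AttrKey, Summary] = {}

    def record(self, value: float, attributes: Mapping[str, str]) -> None:
        key: AttrKey = frozenset(attributes.items())
        self._summaries.setdefault(key, Summary()).record(value)

    def collect(self) -> Dict[AttrKey, Summary]:
        # Return the current summaries; a real exporter would ship these periodically.
        return dict(self._summaries)


if __name__ == "__main__":
    agg = SummaryAggregator()
    for latency_ms in (12.0, 30.0, 18.0):
        agg.record(latency_ms, {"route": "/checkout"})
    agg.record(5.0, {"route": "/health"})
    checkout = agg.collect()[frozenset({("route", "/checkout")})]
    print(checkout.count, checkout.total, checkout.minimum, checkout.maximum)  # prints: 3 60.0 12.0 30.0
```

Because only the summary is retained, storage grows with the number of distinct attribute sets rather than with the number of measurements, which is the usual trade-off that makes metrics cheaper to retain than raw events.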
This release also introduces a new underlying time series database (TSDB). The database was built by some of the same people involved in creating Google's Monarch system. Unlike Monarch, Lightstep's TLDB (The Lightstep Database) doesn't rely entirely on keeping monitoring data in memory, a design choice that minimizes the number of times each data point is touched. The team notes that telemetry workloads tend to consist of very small writes, and that the data structures best suited to in-memory storage differ from those best suited to on-disk storage. Removing the dependence on in-memory storage means data can be kept in a single format instead of being translated from one to the other, and this reduction in touches lowers the overall cost of the system.
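The cost argument comes down to counting how often each data point is handled. The toy model below is an illustration under simplifying assumptions, not a description of TLDB's actual design: it compares a hypothetical two-tier store (points land in an in-memory structure and are later rewritten into a different on-disk layout) with a hypothetical single-format store that appends points directly in their storage layout. The class names and the `touches` counter are invented for this sketch.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Point = Tuple[str, int, float]  # (series name, timestamp, value)


class TwoTierStore:
    """Hypothetical: buffer points in memory, then rewrite them into a
    separate on-disk layout on flush, so every point is handled twice."""

    def __init__(self) -> None:
        self.memory: DefaultDict[str, List[Tuple[int, float]]] = defaultdict(list)
        self.disk_blocks: List[Tuple[str, List[Tuple[int, float]]]] = []
        self.touches = 0

    def write(self, point: Point) -> None:
        series, ts, value = point
        self.memory[series].append((ts, value))  # touch 1: in-memory structure
        self.touches += 1

    def flush(self) -> None:
        for series, samples in self.memory.items():
            # touch 2 for each sample: sort and translate into the storage layout
            self.disk_blocks.append((series, sorted(samples)))
            self.touches += len(samples)
        self.memory.clear()


class SingleFormatStore:
    """Hypothetical: points are appended directly in the storage layout,
    so each point is handled once."""

    def __init__(self) -> None:
        self.log: List[Point] = []
        self.touches = 0

    def write(self, point: Point) -> None:
        self.log.append(point)  # the only touch
        self.touches += 1


if __name__ == "__main__":
    points = [("cpu", t, float(t % 7)) for t in range(1000)]
    two_tier, single = TwoTierStore(), SingleFormatStore()
    for p in points:
        two_tier.write(p)
        single.write(p)
    two_tier.flush()
    print(two_tier.touches, single.touches)  # prints: 2000 1000
```

In this simplified model the translation pass doubles the per-point work; with the many small writes typical of telemetry, per-point overhead of this kind is what the reduction in touches targets.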
With this rebuild, an integrated query engine now powers all of the alerting and dashboard features. Nearly all data ingested by Lightstep is stored in TLDB, which already serves Lightstep's own internal alerts, dashboards, and traces.
Lightstep reports noticeable performance improvements from the new database. For example, moving one user-facing workload into TLDB cut CPU usage in half, memory usage by two-thirds, and disk storage by over 80%.
Figure: Latency improvements from moving a user-facing workload into the TLDB (Credit: Lightstep)
Another implementation decision was to avoid splitting trace and metric data into separate backends. As Alex Kehlenbeck, principal software engineer at Lightstep, notes: "We’ve found that there’s a kind of corollary to Conway’s Law at work: when data is siloed in the backend, the product experience inevitably ends up exposing that fact." The new Change Intelligence feature takes advantage of this decision by combining trace and metric data in a single view to provide insight into the system.
With Change Intelligence, Lightstep aims to connect changes made within the system to deviations in system performance. Users can now click into a deviation and, using trace data, see the changes that may have led to the anomaly.
This opens a new view with a system-wide analysis focused on explaining the change in behavior. The view lists the changes that may have contributed, each with a comparison of baseline performance against the selected deviation.
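The baseline-versus-deviation comparison can be approximated with a simple heuristic: take spans from a baseline window and from the deviation window, group them by an attribute value such as a deployment version, and rank the values by how much more common they became during the deviation, alongside their mean latency in each window. This is a hypothetical sketch, not Lightstep's algorithm; the `Span` fields, the `version` attribute, the sample data, and the scoring rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Span:
    # Hypothetical minimal span: a latency plus string attributes.
    latency_ms: float
    attributes: Dict[str, str] = field(default_factory=dict)


def share_by_value(spans: List[Span], key: str) -> Dict[str, float]:
    """Fraction of spans carrying each value of attribute `key`."""
    counts = Counter(s.attributes.get(key, "<missing>") for s in spans)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


def mean_latency_by_value(spans: List[Span], key: str) -> Dict[str, float]:
    """Mean latency of the spans carrying each value of attribute `key`."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Counter = Counter()
    for s in spans:
        value = s.attributes.get(key, "<missing>")
        sums[value] += s.latency_ms
        counts[value] += 1
    return {value: sums[value] / counts[value] for value in counts}


def rank_suspects(baseline: List[Span], deviation: List[Span], key: str) -> List[Tuple[str, float]]:
    """Rank attribute values by how much their share of spans grew in the deviation window."""
    before = share_by_value(baseline, key)
    after = share_by_value(deviation, key)
    deltas = {v: after.get(v, 0.0) - before.get(v, 0.0) for v in set(before) | set(after)}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up data: version v2 becomes more common and much slower during the deviation.
    baseline = [Span(20.0, {"version": "v1"})] * 90 + [Span(22.0, {"version": "v2"})] * 10
    deviation = [Span(21.0, {"version": "v1"})] * 40 + [Span(95.0, {"version": "v2"})] * 60
    before_ms = mean_latency_by_value(baseline, "version")
    after_ms = mean_latency_by_value(deviation, "version")
    for value, delta in rank_suspects(baseline, deviation, "version"):
        print(f"{value}: share {delta:+.2f}, mean latency "
              f"{before_ms.get(value, float('nan')):.1f} ms -> {after_ms.get(value, float('nan')):.1f} ms")
```

A share shift alone only flags correlation; pairing it with per-value latency in each window, as the demo does, is what lets a reader judge whether a suspect actually explains the deviation.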
Lightstep's community edition is available free of charge and includes the new features in this release. For use cases that require more data ingestion and retention, paid options are available.