Elastic recently announced its plan to donate its continuous profiling agent to the OpenTelemetry (OTel) project. The agent is an always-on profiling solution that eliminates the need for runtime or bytecode instrumentation, recompilation, on-host debug symbols, or service restarts.
The Elastic profiling agent, which is built on eBPF, monitors application performance across different languages and environments. It provides system-wide, continuous profiling without requiring any changes to the applications or restarting them. It constructs stack traces that start in the kernel, pass through user-space native code, and extend into code executed in higher-level runtimes. This helps identify performance regressions, cut down on unnecessary computation, and troubleshoot complex problems more quickly.
The agent has been deployed in large customer environments since August 2021.
Elastic is an active member of the OTel community, especially in the Profiling Special Interest Group (SIG). This group played a key role in creating the OTel Profiling Data Model, a vital move towards standardizing profiling data.
Because code now affects both costs and the environment, computational efficiency matters more than ever. Efficient software lowers costs and reduces the carbon footprint. With this donation, Elastic aims to support the OpenTelemetry community in improving computational efficiency.
Sometimes, libraries or background processes use more resources than the applications themselves. Whole-system profiling, combined with tools that break the data down by service and by total usage, makes these resource-heavy elements easier to identify. Unlike conventional profilers that analyze only a single application runtime, Elastic Universal Profiling offers insight into the complete system.
Source: OpenTelemetry and Elastic: Working together to establish continuous profiling for the community
Universal Profiling covers everything from your own code to third-party libraries and kernel activity, including code you do not own. This broad view speeds up optimization by surfacing inefficient, commonly used libraries and revealing hidden issues that consume CPU resources.
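To make the idea of analyzing data by service and by total usage concrete, here is a minimal sketch in Python. It is not the agent's code or data format: the `Sample` record, the `cpu_share_by` helper, and all service and library names are hypothetical, chosen only to show how system-wide samples can reveal a shared library that is costly in aggregate.

```python
from collections import Counter
from typing import NamedTuple


class Sample(NamedTuple):
    """One profiling sample (hypothetical shape, for illustration only)."""
    service: str  # process or container that was running when sampled
    module: str   # binary or library that owns the leaf frame


def cpu_share_by(samples, key):
    """Return (name, fraction of all samples) pairs, largest share first."""
    counts = Counter(getattr(s, key) for s in samples)
    total = sum(counts.values())
    return [(name, n / total) for name, n in counts.most_common()]


# Made-up samples collected across a whole host.
samples = [
    Sample("checkout", "libssl.so"),
    Sample("checkout", "checkout-app"),
    Sample("checkout", "kernel"),
    Sample("search", "libssl.so"),
    Sample("search", "libssl.so"),
    Sample("search", "libz.so"),
    Sample("log-shipper", "libz.so"),
    Sample("log-shipper", "libz.so"),
]

by_service = cpu_share_by(samples, "service")
by_module = cpu_share_by(samples, "module")

for name, share in by_module:
    print(f"{name:14s} {share:.1%}")
```

In this made-up data no single service dominates, yet grouping by module shows that two shared libraries account for three quarters of all samples, which a per-application profiler would not show as one total.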
Features of the Elastic profiling agent include a low impact on system resources, with CPU usage capped at 1% and memory usage at 250MB in tests. It also offers robust support for profiling native C/C++ applications without requiring DWARF debug information, using .eh_frame data for stack unwinding instead.
The agent can profile system libraries lacking frame pointers and debug symbols and supports mixed stack traces across different runtimes, covering everything from kernel space through unmodified system libraries to high-level languages. Additionally, it handles native code from a variety of languages including C/C++, Rust, Zig, and Go, even without debug symbols present on the host.
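A mixed stack trace of the kind described above can be pictured as a single list of frames, each tagged with the layer it came from. The Python sketch below models only the resulting data. It does not reproduce the agent's eBPF unwinding, and the `Frame` type, the layer labels, and the symbol names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    layer: str   # "kernel", "native", or "runtime" (illustrative labels)
    symbol: str  # function name; may be unresolved on the host in practice


def merge_stack(kernel, native, runtime):
    """Join per-layer frames into one trace, innermost (kernel) frame first."""
    return [*kernel, *native, *runtime]


def layers_crossed(trace):
    """List the layers a trace passes through, in order, without repeats."""
    seen = []
    for frame in trace:
        if not seen or seen[-1] != frame.layer:
            seen.append(frame.layer)
    return seen


# Hypothetical frames for one sample of a web handler writing to a socket.
trace = merge_stack(
    kernel=[Frame("kernel", "tcp_sendmsg")],
    native=[Frame("native", "write"), Frame("native", "interpreter_eval")],
    runtime=[Frame("runtime", "app.handle_request")],
)

print(" <- ".join(f"{f.layer}:{f.symbol}" for f in trace))
print(layers_crossed(trace))
```

Keeping the layer on every frame is one possible design choice: it lets a viewer color or filter a flame graph by kernel, native, and runtime code without splitting the trace into separate stacks.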
On the subject of OpenTelemetry, we came across an interesting conversation on Hacker News. The discussion revolved around the current state of OpenTelemetry, with the tech community sharing their insights. One HN user wrote that while OpenTelemetry is a promising idea, it hasn't fully met expectations yet, particularly because of insufficient documentation. The initial guides are helpful for basic setups but fall short in addressing more complex, real-world scenarios.
Elastic has been involved in various OpenTelemetry (OTel) projects, including the development of language SDKs like OTel Swift, Go, Ruby, and more. The company is also active in special interest groups (SIGs), aiming to strengthen OTel's role as a standard in observability and security.
Elastic is eager to deepen its partnership with OTel by contributing the profiling agent, a move it expects to benefit both the Elastic and wider OTel communities. Interested participants are welcome to contribute to the proposal or engage in the discussion.