Leveraging eBPF for Improved Infrastructure Observability

To investigate multi-tenant system performance efficiently and effectively, Netflix has been experimenting with eBPF to instrument the Linux kernel, gathering continuous, deeper insights into how processes are scheduled and detecting "noisy neighbors".

Using eBPF, the Compute and Performance Engineering teams at Netflix aimed to circumvent a few issues that usually make noisy neighbor detection hard: the overhead introduced by analysis tools like perf, which also means they are usually deployed only after the problem has already occurred, and the level of engineering expertise required to use them. What eBPF makes possible, according to Netflix engineers, is observing compute infrastructure with low performance impact, enabling continuous instrumentation of the Linux scheduler.

The key metric identified by Netflix engineers as an indicator of possible performance issues caused by noisy neighbors is process latency:

To ensure the reliability of our workloads that depend on low latency responses, we instrumented the run queue latency for each container, which measures the time processes spend in the scheduling queue before being dispatched to the CPU.

To this end, they used three eBPF hooks: sched_wakeup, sched_wakeup_new, and sched_switch. The first two are invoked when a process transitions from 'sleeping' to 'runnable', i.e., when it is ready to run and waiting for CPU time. The sched_switch hook is triggered when the CPU is assigned to a different process. Process latency is thus calculated as the difference between the timestamp at which the CPU is assigned to the process and the timestamp at which it first became ready to run.
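The original article includes its own sample code; purely as an illustration of the mechanism described above, the following libbpf-style C sketch (not Netflix's actual implementation; map and program names are assumptions) stores a timestamp when a task becomes runnable and computes the run queue latency when the task is switched onto a CPU:

```c
// Illustrative sketch only: record when a task becomes runnable on
// sched_wakeup/sched_wakeup_new, then compute its run queue latency on
// sched_switch. Map and program names here are assumptions.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, u32);   /* pid */
    __type(value, u64); /* timestamp (ns) when the task became runnable */
} wakeup_ts SEC(".maps");

static __always_inline int record_wakeup(struct task_struct *p)
{
    u32 pid = p->pid;
    u64 ts = bpf_ktime_get_ns();
    bpf_map_update_elem(&wakeup_ts, &pid, &ts, BPF_ANY);
    return 0;
}

SEC("tp_btf/sched_wakeup")
int BPF_PROG(handle_wakeup, struct task_struct *p)
{
    return record_wakeup(p);
}

SEC("tp_btf/sched_wakeup_new")
int BPF_PROG(handle_wakeup_new, struct task_struct *p)
{
    return record_wakeup(p);
}

SEC("tp_btf/sched_switch")
int BPF_PROG(handle_switch, bool preempt, struct task_struct *prev,
             struct task_struct *next)
{
    u32 pid = next->pid;
    u64 *tsp = bpf_map_lookup_elem(&wakeup_ts, &pid);
    if (!tsp)
        return 0;

    /* Run queue latency: time between becoming runnable and getting the CPU. */
    u64 runq_latency_ns = bpf_ktime_get_ns() - *tsp;
    bpf_map_delete_elem(&wakeup_ts, &pid);

    /* For the sketch, just trace the value; the ring buffer sketch below shows
     * how an event like this would be handed to userspace instead. */
    bpf_printk("runq latency for pid %d: %llu ns", pid, runq_latency_ns);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```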

Finally, the events collected by instrumenting the kernel are processed by a Go program that emits metrics to Atlas, Netflix's metrics backend. To pass the collected data to the userspace Go program, Netflix engineers chose eBPF ring buffers, which provide an efficient, high-performance, and user-friendly mechanism that requires no extra memory copies or syscalls.
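Since the article's userspace consumer is written in Go, the sketch below only illustrates the kernel-side half of that pattern, extending the C sketch above: an event (whose layout is an assumption, not Netflix's actual schema) is reserved directly inside a BPF ring buffer and submitted to userspace without additional copies or syscalls. On the Go side, a library such as cilium/ebpf can then poll the same map and hand the raw samples to the metrics pipeline.

```c
// Kernel-side half of the ring buffer pattern, continuing the sketch above.
// The event layout and map name are assumptions for illustration.
struct runq_event {
    u32 pid;
    u64 cgroup_id;
    u64 runq_latency_ns;
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024); /* buffer size in bytes, power of two */
} events SEC(".maps");

static __always_inline void emit_event(u32 pid, u64 cgroup_id, u64 latency_ns)
{
    /* Reserve space directly in the shared buffer: no intermediate copy. */
    struct runq_event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return; /* buffer full: drop the sample rather than block */

    e->pid = pid;
    e->cgroup_id = cgroup_id;
    e->runq_latency_ns = latency_ns;
    bpf_ringbuf_submit(e, 0); /* makes the event visible to the userspace reader */
}
```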

Along with the timing information, eBPF also makes it possible to collect additional information about the process, including its cgroup ID, which associates it with a container and is key to correctly interpreting preemption. Indeed, detecting noisy neighbors is not just a matter of measuring latency: it also requires tracking how often a process is preempted and which process caused the preemption, whether or not it runs in the same container.
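One plausible way to express that attribution in the same C sketch is shown below: on an involuntary switch, the handler compares the cgroup IDs of the task being displaced and the task replacing it. The BPF_CORE_READ chain assumes cgroup v2 on a recent kernel, and the snippet is an illustration rather than the approach used in the original article.

```c
// Hedged sketch: attribute a preemption to a container by comparing the cgroup
// IDs of the displaced task and the task replacing it (cgroup v2 assumed).
#include <bpf/bpf_core_read.h>

static __always_inline u64 task_cgroup_id(struct task_struct *task)
{
    /* cgroup v2: the default cgroup's kernfs node ID serves as the cgroup ID. */
    return BPF_CORE_READ(task, cgroups, dfl_cgrp, kn, id);
}

SEC("tp_btf/sched_switch")
int BPF_PROG(handle_preemption, bool preempt, struct task_struct *prev,
             struct task_struct *next)
{
    /* Only count involuntary switches: prev was still runnable when displaced. */
    if (!preempt)
        return 0;

    u64 prev_cg = task_cgroup_id(prev);
    u64 next_cg = task_cgroup_id(next);

    if (prev_cg != next_cg) {
        /* Cross-container preemption: a candidate noisy-neighbor signal. */
        bpf_printk("cgroup %llu preempted by cgroup %llu", prev_cg, next_cg);
    }
    return 0;
}
```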

For example, if a container is at or over its cgroup CPU limit, the scheduler will throttle it, resulting in an apparent spike in run queue latency due to delays in the queue. If we were only to consider this metric, we might incorrectly attribute the performance degradation to noisy neighbors when it's actually because the container is hitting its CPU quota.

To make sure their approach did not hamper the performance of the monitored systems, Netflix engineers also created bpftop, a tool to measure the overhead of eBPF code. Using this tool, they identified several optimizations that further reduced their initial overhead, keeping it below 600 nanoseconds per sched_* hook. This makes it reasonable to run the hooks continuously without fear of impacting system performance.

If you are interested in this approach to system performance monitoring, or want to understand better how eBPF works internally, the original article provides much more detail than can be covered here, including useful sample code.
