Popular music streaming service Deezer has written about how it uses custom metrics to enable auto-scaling in its Kubernetes infrastructure.
Deezer has been successfully using Kubernetes in production since 2018, primarily in a bare-metal environment with some clusters on public cloud platforms. However, server utilisation and performance issues made scaling applications to an appropriate size and number of replicas challenging.
The engineering team turned to Kubernetes' widely used Horizontal Pod Autoscaler (HPA), whose use is well documented, for example in a post on the DigitalOcean community site. However, the default HPA derives its scaling rules from CPU and memory usage, and Deezer found that these metrics did not suit the needs of its applications. To scale more accurately, the team built a solution that uses Prometheus for metric collection and the Prometheus Adapter to expose those metrics to Kubernetes.
A key innovation in Deezer's approach is using Event Loop Utilization (ELU) as a custom metric for its Node.js applications. ELU measures the proportion of time the Node.js event loop is busy processing events rather than sitting idle.
The Deezer team discovered that ELU is more representative of server load than CPU percentage. They explain that this metric allows for more accurate scaling decisions, especially since CPU usage can spike during the initial loading of a new pod.
Deezer's implementation involved several steps:
- Deploying Prometheus to collect application-specific metrics
- Configuring Prometheus Adapter to expose ELU metrics to Kubernetes
- Creating HorizontalPodAutoscaler resources that reference the ELU metric
The team provided detailed configuration examples in their blog post, offering insights into the setup process.
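Deezer's exact manifests are in its post; as a generic illustration, an autoscaling/v2 HPA scaling on a per-pod ELU gauge exposed through the custom metrics API might look like the following sketch, in which the metric name, workload name, and target value are all assumptions:

```yaml
# Illustrative HPA scaling on a custom per-pod ELU metric; the metric name,
# Deployment name, and target value are assumptions, not Deezer's config.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-node-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-node-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: nodejs_eventloop_utilization
        target:
          type: AverageValue
          averageValue: "700m"   # scale out when average ELU exceeds 0.7
```

Before the HPA can query this metric, the Prometheus Adapter must be configured with a rule mapping the underlying Prometheus series onto the metric name.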
The team used Vegeta, an HTTP load-testing tool, to validate the new autoscaling setup. Vegeta allowed Deezer to generate controlled loads on its applications while monitoring pod counts and HPA state, ensuring the system scaled as expected under various conditions.
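Although the article does not reproduce Deezer's test scripts, a typical Vegeta run against a service, with the HPA watched from a second terminal, looks like this (the URL, rate, and resource name are placeholders):

```shell
# Generate a steady load for two minutes and print a latency/success report.
echo "GET http://my-node-app.example.com/" | \
  vegeta attack -rate=200 -duration=120s | \
  vegeta report

# Meanwhile, watch replica counts and the HPA's view of the metric.
kubectl get hpa my-node-app --watch
```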
Other organisations have successfully driven auto-scaling from custom metrics: Pixie describes writing a custom metrics server in Go to fulfil this requirement, and Overcast has a tutorial showing how queue length can be monitored and used as a custom metric for autoscaling.
Further emphasising the value of custom metrics, Loft provides a full breakdown of how to implement HPA and critiques whether it is good enough. Levent Ogut from Loft explains that while HPA is useful, it has limitations: its reliance on CPU and memory metrics can cause scaling delays during sudden traffic spikes, making KEDA or custom metrics preferable for a faster response.
Stateful workloads and I/O-dependent applications may also not scale efficiently with HPA's default metrics, and lengthy application startup times or improper shutdown handling can impede effective scaling. To avoid these issues, Loft recommends carefully setting up probes, using minimum replica counts, and configuring stabilisation windows. HPA also struggles with bursty traffic patterns, where only preemptive scaling can keep up with demand. Lastly, HPA cannot consider dependent services, such as databases, which can become overloaded as a result.
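The mitigations Loft suggests map directly onto HPA spec fields; a fragment such as the following (values are illustrative) combines a minimum replica count with stabilisation windows:

```yaml
# Fragment of an autoscaling/v2 HPA spec: minimum replicas provide headroom
# for bursts, while the scale-down window damps flapping. Values illustrative.
spec:
  minReplicas: 3
  maxReplicas: 20
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # react to spikes immediately
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before removing pods
```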
While the custom metric-based autoscaling has proven beneficial, Deezer acknowledges the increased complexity it introduces. The team emphasises the importance of proper monitoring and tuning of the Prometheus Adapter, as it becomes a critical autoscaling component.
To mitigate potential issues, Deezer recommends:
- Scaling Prometheus Adapter replicas to ensure high availability
- Implementing a PodDisruptionBudget to maintain HPA functionality during cluster maintenance
- Setting up comprehensive monitoring for the autoscaling infrastructure
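A PodDisruptionBudget for the adapter, for instance, might look like the following sketch; the namespace and labels are assumptions that depend on how the adapter was installed:

```yaml
# Keep at least one Prometheus Adapter pod running during voluntary
# disruptions such as node drains; labels here are assumptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: prometheus-adapter
  namespace: monitoring
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus-adapter
```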
Rounding out the blog post, Deezer explained that it is exploring further improvements to its autoscaling setup, analysing more advanced scenarios that could use tools like KEDA (Kubernetes Event-Driven Autoscaling) for even greater flexibility.