
Deezer Optimizes Kubernetes Autoscaling with Custom Metrics

Popular music streaming service Deezer has written about how it uses custom metrics to enable auto-scaling in its Kubernetes infrastructure.

Deezer has been successfully using Kubernetes in production since 2018, primarily in a bare-metal environment with some clusters on public cloud platforms. However, server utilisation and performance issues made it challenging to scale applications to an appropriate size and number of replicas.

To address this, the Deezer engineering team turned to Kubernetes' widely used Horizontal Pod Autoscaling (HPA) feature, whose use is well documented, for example in a post on the DigitalOcean community site. The default HPA, however, derives its scaling rules from CPU and memory usage, and Deezer found that these metrics did not suit the needs of its applications. To scale more accurately, the team built a solution that uses Prometheus for metric collection and the Prometheus Adapter to expose those metrics to Kubernetes.

Deezer's custom metrics setup for autoscaling

A key innovation in Deezer's approach is using Event Loop Utilization (ELU) as a custom metric for its Node.js applications. ELU measures the proportion of time the Node.js event loop spends busy processing events compared with sitting idle.
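The article does not reproduce Deezer's instrumentation code, but Node.js exposes ELU directly through its built-in perf_hooks module. The following TypeScript sketch, with an arbitrary five-second sampling window, shows one way a service could compute the utilisation per window:

```typescript
// Minimal sketch: sampling Event Loop Utilization (ELU) with Node's built-in
// perf_hooks API. The five-second interval and the logging are illustrative only.
import { performance } from 'node:perf_hooks';

// Baseline reading; later calls compute a delta against it.
let baseline = performance.eventLoopUtilization();

setInterval(() => {
  // Passing the previous reading returns the delta since that reading,
  // i.e. the utilisation over the last window as a value between 0 and 1.
  const delta = performance.eventLoopUtilization(baseline);
  baseline = performance.eventLoopUtilization();

  console.log(`event loop utilisation: ${(delta.utilization * 100).toFixed(1)}%`);
}, 5_000);
```

A value close to 1 means the event loop has almost no idle time left.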

The Deezer team discovered that ELU is more representative of server load than CPU percentage. They explain that this metric allows for more accurate scaling decisions, especially as CPU usage can spike during the initial loading of a new pod.
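Deezer's exporter code is not shown in this summary; one common way for a Node.js service to get such a value in front of Prometheus is to publish it as a gauge on a /metrics endpoint with the prom-client library. In the sketch below, the metric name nodejs_eventloop_utilization and port 9090 are illustrative assumptions:

```typescript
// Sketch: exposing ELU as a Prometheus gauge. The metric name and port
// are assumptions for illustration, not Deezer's actual configuration.
import { createServer } from 'node:http';
import { performance } from 'node:perf_hooks';
import { Gauge, register } from 'prom-client';

let baseline = performance.eventLoopUtilization();

const eluGauge = new Gauge({
  name: 'nodejs_eventloop_utilization',
  help: 'Event loop utilisation over the last scrape interval (0-1)',
  // collect() runs on every scrape, so each sample covers one scrape window.
  collect() {
    const delta = performance.eventLoopUtilization(baseline);
    baseline = performance.eventLoopUtilization();
    eluGauge.set(delta.utilization);
  },
});

// Serve the default registry for Prometheus to scrape.
createServer(async (req, res) => {
  if (req.url === '/metrics') {
    res.setHeader('Content-Type', register.contentType);
    res.end(await register.metrics());
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(9090);
```

Tying the measurement window to the scrape interval via collect() keeps the exported value simple to reason about, although any sampling scheme that yields a recent utilisation figure would work.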

Deezer's implementation involved several steps:

  • Deploying Prometheus to collect application-specific metrics
  • Configuring Prometheus Adapter to expose ELU metrics to Kubernetes
  • Creating HorizontalPodAutoscaler resources that reference the ELU metric (both of these pieces are sketched below)

The team provided detailed configuration examples in its blog post, walking through each step of the setup.
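Those examples are specific to Deezer's environment; the TypeScript sketch below only illustrates the general shape of the two pieces involved, a Prometheus Adapter rule that maps the hypothetical nodejs_eventloop_utilization series to a per-pod custom metric, and an autoscaling/v2 HorizontalPodAutoscaler that targets it. All names, namespaces and thresholds here are assumptions.

```typescript
// Sketch only: plain objects mirroring the YAML that would be fed to the
// Prometheus Adapter and to Kubernetes. Names, the namespace and the 0.7
// target are illustrative assumptions, not Deezer's actual configuration.

// 1) A Prometheus Adapter rule exposing the gauge as a per-pod custom metric.
const adapterConfig = {
  rules: [
    {
      seriesQuery: 'nodejs_eventloop_utilization{namespace!="",pod!=""}',
      resources: {
        overrides: {
          namespace: { resource: 'namespace' },
          pod: { resource: 'pod' },
        },
      },
      name: { matches: 'nodejs_eventloop_utilization', as: 'nodejs_eventloop_utilization' },
      metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)',
    },
  ],
};

// 2) An autoscaling/v2 HPA scaling a Deployment on the average per-pod ELU.
const hpa = {
  apiVersion: 'autoscaling/v2',
  kind: 'HorizontalPodAutoscaler',
  metadata: { name: 'web-api', namespace: 'default' },
  spec: {
    scaleTargetRef: { apiVersion: 'apps/v1', kind: 'Deployment', name: 'web-api' },
    minReplicas: 3,
    maxReplicas: 20,
    metrics: [
      {
        type: 'Pods',
        pods: {
          metric: { name: 'nodejs_eventloop_utilization' },
          // Scale out when the average ELU across pods exceeds roughly 0.7.
          target: { type: 'AverageValue', averageValue: '700m' },
        },
      },
    ],
  },
};

console.log(JSON.stringify({ adapterConfig, hpa }, null, 2));
```

With a Pods-type metric, the HPA averages the reported value across the target's pods and compares it with averageValue when computing the desired replica count.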

The team used Vegeta, an HTTP load-testing tool, to validate the new autoscaling setup. Vegeta allowed Deezer to generate controlled loads against its applications while monitoring pod counts and HPA state, ensuring the system scaled as expected under various conditions.

Other organisations have successfully driven autoscaling from custom metrics. Pixie explains how it wrote a custom metrics server in Go to fulfil this requirement, and Overcast has a tutorial describing how queue length can be monitored and used as a custom metric for autoscaling.

Further emphasising the value of custom metrics, Loft provides a full breakdown of how to implement HPA and critiques whether it is good enough. Levent Ogut from Loft explains that the Horizontal Pod Autoscaler (HPA) in Kubernetes, while useful, has limitations. Its reliance on CPU and memory metrics can cause scaling delays during sudden traffic spikes, making KEDA or custom metrics preferable for a faster response. Stateful workloads and I/O-dependent applications may also not scale efficiently with HPA's default metrics, and lengthy application startup times or improper shutdown handling can impede effective scaling. To avoid these issues, Loft recommends carefully setting up probes, using minimum replica counts, and configuring stabilisation windows. HPA also struggles with bursty traffic patterns, where only preemptive scaling can keep up with demand. Lastly, HPA cannot take dependent services, such as databases, into account, which can leave them overloaded as the application scales.
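The stabilisation windows and minimum replica counts Loft mentions are configured on the HPA itself via the autoscaling/v2 behavior field; a rough, hypothetical fragment in the same style as the earlier sketch:

```typescript
// Hypothetical fragment: minReplicas plus a scale-down stabilisation window,
// two of the mitigations Loft recommends. The values are arbitrary.
const hpaSpecFragment = {
  minReplicas: 3, // keep a floor of warm replicas for sudden traffic
  behavior: {
    scaleDown: {
      // Use the highest recommendation seen over the last five minutes,
      // smoothing out brief dips in load before removing pods.
      stabilizationWindowSeconds: 300,
      // Remove at most half of the current replicas per minute.
      policies: [{ type: 'Percent', value: 50, periodSeconds: 60 }],
    },
  },
};

console.log(JSON.stringify(hpaSpecFragment, null, 2));
```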

While custom metric-based autoscaling has proven beneficial, Deezer acknowledges the increased complexity it introduces. The team emphasises the importance of proper monitoring and tuning of the Prometheus Adapter, as it becomes a critical component of the autoscaling path.

To mitigate potential issues, Deezer recommends:

  • Scaling Prometheus Adapter replicas to ensure high availability
  • Implementing a PodDisruptionBudget to maintain HPA functionality during cluster maintenance (see the sketch after this list)
  • Setting up comprehensive monitoring for the autoscaling infrastructure
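Deezer's actual manifests are in its post; the sketch below shows what a policy/v1 PodDisruptionBudget protecting the Prometheus Adapter might look like, again as a plain object mirroring the manifest, with hypothetical labels, namespace and availability target:

```typescript
// Sketch: a policy/v1 PodDisruptionBudget keeping at least one Prometheus
// Adapter replica available while nodes are drained. The labels and
// namespace are illustrative assumptions.
const adapterPdb = {
  apiVersion: 'policy/v1',
  kind: 'PodDisruptionBudget',
  metadata: { name: 'prometheus-adapter', namespace: 'monitoring' },
  spec: {
    minAvailable: 1,
    selector: {
      matchLabels: { 'app.kubernetes.io/name': 'prometheus-adapter' },
    },
  },
};

console.log(JSON.stringify(adapterPdb, null, 2));
```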

Rounding out the blog post, Deezer explained that it is exploring further improvements to its autoscaling setup, analysing more advanced scenarios that could use tools such as KEDA (Kubernetes Event-Driven Autoscaling) for even greater flexibility.
