How Amazon Aurora Serverless Manages Resources and Scaling for Fleets of 10K+ Instances

AWS engineers published a paper describing the evolution and latest design of resource management and scaling for the Amazon Aurora Serverless platform. Aurora Serverless uses a combination of components at different levels to create a holistic approach for dynamically scaling and adjusting resources to satisfy the needs of customer workloads.

Amazon Aurora Serverless automatically scales Amazon Aurora databases to respond to changing customer workloads and delivers cost optimizations, improved performance, and simplified operations. Aurora customers configure scaling bounds using Aurora Capacity Units (ACUs), and the service dynamically adjusts resources according to demand. From the customer's perspective, these scaling operations don't require any intervention and don't disrupt client connections or session state, but they may impact latency.
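As a toy illustration of the bounded-scaling behavior (not AWS code; names are hypothetical), demanded capacity is effectively rounded to the service's scaling granularity and clamped to the customer-configured ACU range. Aurora Serverless v2 scales in 0.5-ACU increments, which the sketch below assumes:

```python
# Toy sketch of ACU-bounded scaling (illustrative only, not AWS code).
# Assumes ASv2-style 0.5-ACU granularity; function names are hypothetical.
import math

ACU_STEP = 0.5  # smallest scaling increment in Aurora Serverless v2

def next_capacity(demanded_acus: float, min_acu: float, max_acu: float) -> float:
    """Round demand up to the nearest ACU step, then clamp it to the
    customer-configured [min_acu, max_acu] scaling bounds."""
    stepped = math.ceil(demanded_acus / ACU_STEP) * ACU_STEP
    return min(max(stepped, min_acu), max_acu)

print(next_capacity(3.2, 0.5, 16))   # demand rounds up to 3.5 ACUs
print(next_capacity(0.1, 0.5, 16))   # floored at the configured minimum
print(next_capacity(40.0, 0.5, 16))  # capped at the configured maximum
```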

The current Aurora Serverless product is a second-generation design based on experiences from operating and supporting ASv1, which launched in 2018. The new design focused on in-place scaling, using CPU and memory hot (un)plug, supported by live migrations across hosts. Compared with ASv1, ASv2 offers faster and more seamless scaling, with smaller scaling increments, making it more cost-effective.

The team working on the second-generation solution had to address many challenges, chief among them effective memory management for database workloads that supports both scale-up and scale-down events. Linux and database engines tend to commit all available memory and hold on to it. Engineers modified the database engines, the Linux kernel, and the AWS Nitro hypervisor to allow more flexible memory management for varying workloads.

Instance Manager Service (Source: Resource Management in Aurora Serverless)

Aurora Serverless uses an instance manager service on each host to control the resource scaling of the DB engines based on demand trends across all instances on that physical host. Optimizing the placement of DB engines across hosts and the available resource headroom allows Aurora Serverless to ensure sufficient resources are available on the host to accommodate dynamic workloads without needing to migrate them between hosts.
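A minimal sketch of the reactive, metric-driven idea (the setpoint, names, and proportional rule below are assumptions for illustration, not taken from the paper): an instance manager could derive a target capacity from observed utilization relative to a setpoint:

```python
# Illustrative sketch of reactive, utilization-driven capacity targeting.
# The threshold and names are hypothetical; the paper describes the
# per-host instance manager only at a high level.

def target_acus(current_acus: float, cpu_util: float,
                setpoint: float = 0.7) -> float:
    """Scale capacity proportionally so that utilization moves toward
    the setpoint (a classic reactive autoscaling heuristic)."""
    if cpu_util <= 0:
        return current_acus
    return current_acus * (cpu_util / setpoint)

print(round(target_acus(8.0, 0.91), 1))  # hot instance: scale up to 10.4
print(round(target_acus(8.0, 0.35), 1))  # idle instance: scale down to 4.0
```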

At the broadest level, the Aurora Serverless service manages large fleets of tens of thousands of compute instances. The fleet manager service focuses on mid- to long-term fleet sizing and capacity adjustments based on desired utilization levels and predicted demand. Live migrations between hosts are used to free up resources when hosts risk becoming "hot". Additionally, the fleet manager can impose temporary limits on maximum ACUs for instances during "heat remediation".
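The heat-remediation flow can be pictured with a toy greedy policy (entirely hypothetical; the paper does not publish its placement algorithm): when a host's committed capacity exceeds a "hot" threshold, live-migrate its largest instances onto the currently least-loaded host until it cools down:

```python
# Toy greedy heat remediation (hypothetical policy for illustration).
# `hosts` maps a host id to the per-instance ACU commitments on it.

def remediate(hosts: dict[str, list[float]], capacity: float,
              hot_frac: float = 0.9) -> list[tuple[str, str, float]]:
    """Move the largest instances off hosts committed above
    hot_frac * capacity onto the coolest host; return the migrations."""
    moves = []
    for host, instances in hosts.items():
        instances.sort(reverse=True)  # largest instances first
        while instances and sum(instances) > hot_frac * capacity:
            coolest = min(hosts, key=lambda h: sum(hosts[h]))
            if coolest == host:
                break  # nowhere cooler to migrate to
            acus = instances.pop(0)
            hosts[coolest].append(acus)
            moves.append((host, coolest, acus))
    return moves

fleet = {"h1": [48.0, 32.0, 24.0], "h2": [8.0]}
print(remediate(fleet, capacity=100.0))  # moves 48.0 ACUs from h1 to h2
```

A real fleet manager would also weigh migration cost and predicted demand, which this sketch ignores.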

Fleet Manager Service (Source: Resource Management in Aurora Serverless)

Engineers shared some data from Aurora fleets in US AWS regions, pointing out that the vast majority (99.98%) of scale-up events didn't need inter-host migrations and could be satisfied by the in-place scaling mechanism.

The paper concludes with some key takeaways, emphasizing focusing on design simplicity and a reactive, metric-driven approach to resource management. The team didn't rule out introducing more predictive elements into the solution in the future and highlighted further opportunities for the co-evolution of hypervisors and OS kernels to better support DB workloads.
