How Amazon Aurora Serverless Manages Resources and Scaling for Fleets of 10K+ Instances

AWS engineers published a paper describing the evolution and latest design of resource management and scaling for the Amazon Aurora Serverless platform. Aurora Serverless uses a combination of components at different levels to create a holistic approach for dynamically scaling and adjusting resources to satisfy the needs of customer workloads.

Amazon Aurora Serverless automatically scales Amazon Aurora databases to respond to changing customer workloads and delivers cost optimizations, improved performance, and simplified operations. Aurora customers configure scaling bounds using Aurora Capacity Units (ACUs), and the service dynamically adjusts resources according to demand. From the customer's perspective, these scaling operations don't require any intervention and don't disrupt client connections or session state, but they may impact latency.
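As a toy illustration of the bounded-scaling behavior (not AWS code; names are hypothetical), demanded capacity is effectively rounded to the service's scaling granularity and clamped to the customer-configured ACU range. Aurora Serverless v2 scales in 0.5-ACU increments, which the sketch below assumes:

```python
# Toy sketch of ACU-bounded scaling (illustrative only, not AWS code).
# Assumes ASv2-style 0.5-ACU granularity; function names are hypothetical.
import math

ACU_STEP = 0.5  # smallest scaling increment in Aurora Serverless v2

def next_capacity(demanded_acus: float, min_acu: float, max_acu: float) -> float:
    """Round demand up to the nearest ACU step, then clamp it to the
    customer-configured [min_acu, max_acu] scaling bounds."""
    stepped = math.ceil(demanded_acus / ACU_STEP) * ACU_STEP
    return min(max(stepped, min_acu), max_acu)

print(next_capacity(3.2, 0.5, 16))   # demand rounds up to 3.5 ACUs
print(next_capacity(0.1, 0.5, 16))   # floored at the configured minimum
print(next_capacity(40.0, 0.5, 16))  # capped at the configured maximum
```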

The current Aurora Serverless product is a second-generation design based on experiences from operating and supporting ASv1, which launched in 2018. The new design focused on in-place scaling, using CPU and memory hot (un)plug, supported by live migrations across hosts. Compared with ASv1, ASv2 offers faster and more seamless scaling, with smaller scaling increments, making it more cost-effective.

The team working on the second-generation solution had to address many challenges, chief among them effective memory management for database workloads that supports both scale-up and scale-down events. Linux and database engines tend to commit all available memory and hold on to it. Engineers modified the database engines, the Linux kernel, and the AWS Nitro hypervisor to allow more flexible memory management for varying workloads.

Instance Manager Service (Source: Resource Management in Aurora Serverless)

Aurora Serverless uses an instance manager service on each host to control the resource scaling of the DB engines based on demand trends across all instances on that physical host. Optimizing the placement of DB engines across hosts and the available resource headroom allows Aurora Serverless to ensure sufficient resources are available on the host to accommodate dynamic workloads without needing to migrate them between hosts.
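A minimal sketch of the reactive, metric-driven idea (the setpoint, names, and proportional rule below are assumptions for illustration, not taken from the paper): an instance manager could derive a target capacity from observed utilization relative to a setpoint:

```python
# Illustrative sketch of reactive, utilization-driven capacity targeting.
# The threshold and names are hypothetical; the paper describes the
# per-host instance manager only at a high level.

def target_acus(current_acus: float, cpu_util: float,
                setpoint: float = 0.7) -> float:
    """Scale capacity proportionally so that utilization moves toward
    the setpoint (a classic reactive autoscaling heuristic)."""
    if cpu_util <= 0:
        return current_acus
    return current_acus * (cpu_util / setpoint)

print(round(target_acus(8.0, 0.91), 1))  # hot instance: scale up to 10.4
print(round(target_acus(8.0, 0.35), 1))  # idle instance: scale down to 4.0
```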

At the broadest level, the Aurora Serverless service manages large fleets of tens of thousands of compute instances. The fleet manager service focuses on mid- to long-term fleet sizing and capacity adjustments based on desired utilization levels and predicted demand. Live migrations between hosts are used to free up resources when hosts risk becoming "hot". Additionally, the fleet manager can impose temporary limits on maximum ACUs for instances during "heat remediation".
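The heat-remediation flow can be pictured with a toy greedy policy (entirely hypothetical; the paper does not publish its placement algorithm): when a host's committed capacity exceeds a "hot" threshold, live-migrate its largest instances onto the currently least-loaded host until it cools down:

```python
# Toy greedy heat remediation (hypothetical policy for illustration).
# `hosts` maps a host id to the per-instance ACU commitments on it.

def remediate(hosts: dict[str, list[float]], capacity: float,
              hot_frac: float = 0.9) -> list[tuple[str, str, float]]:
    """Move the largest instances off hosts committed above
    hot_frac * capacity onto the coolest host; return the migrations."""
    moves = []
    for host, instances in hosts.items():
        instances.sort(reverse=True)  # largest instances first
        while instances and sum(instances) > hot_frac * capacity:
            coolest = min(hosts, key=lambda h: sum(hosts[h]))
            if coolest == host:
                break  # nowhere cooler to migrate to
            acus = instances.pop(0)
            hosts[coolest].append(acus)
            moves.append((host, coolest, acus))
    return moves

fleet = {"h1": [48.0, 32.0, 24.0], "h2": [8.0]}
print(remediate(fleet, capacity=100.0))  # moves 48.0 ACUs from h1 to h2
```

A real fleet manager would also weigh migration cost and predicted demand, which this sketch ignores.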

Fleet Manager Service (Source: Resource Management in Aurora Serverless)

Engineers shared some data from Aurora fleets in US AWS regions, pointing out that the vast majority (99.98%) of scale-up events didn't need inter-host migrations and could be satisfied by the in-place scaling mechanism.

The paper concludes with some key takeaways, emphasizing focusing on design simplicity and a reactive, metric-driven approach to resource management. The team didn't rule out introducing more predictive elements into the solution in the future and highlighted further opportunities for the co-evolution of hypervisors and OS kernels to better support DB workloads.
