Uber Improves Resiliency of Microservices with Adaptive Load Shedding

Uber created a new load-shedding library for its microservice platform, which serves over 130 million customers and handles aggregated peaks of millions of requests per second (RPS). The company replaced its QALM-based solution with the Cinnamon library, which, in addition to graceful degradation, can dynamically and continuously adjust the service's capacity and the amount of load shedding.

To gracefully degrade the quality of service when performance issues arise within its platform, Uber had previously created the QALM (QoS Aware Load Management) framework. QALM prioritized server resources for more critical requests and shed less critical ones during traffic overload, resource exhaustion, or dependency failure, mitigating or avoiding outages. It served the company well but required careful tuning of configuration parameters for specific use cases.

Jakob Holdgaard Thomsen, principal engineer at Uber, explains why configuration-free load shedding is essential for the company:

Given Uber's service footprint, any solution for graceful degradation also has to be automatic and require no configuration because even if we require just a single configuration value per service, it will quickly be expensive in engineering hours, given the vast number of services we operate and how often such values tend to be outdated.

The new library uses a modified TCP Vegas algorithm to adjust the service's capacity and a PID (Proportional-Integral-Derivative) controller that tracks incoming requests to determine the load-shedding rate. Cinnamon is built as RPC middleware and can be easily imported by services that Uber develops. The library relies on a priority tag carried in the request context, which is propagated from the edge via Jaeger. Uber used a scheme inspired by WeChat's approach, in which the priority value consists of a tier component (0 being the highest and 5 the lowest) and a cohort component that segments the user population into 128 buckets.
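The article does not say how the tier and cohort are combined into a single value. The sketch below assumes a hypothetical flattening of `tier * 128 + cohort` (smaller means more important) and a hypothetical hash-based cohort assignment; `Priority`, `cohort_for_user`, and `should_shed` are illustrative names, not Cinnamon's API. It shows how six tiers and 128 cohorts yield 768 ordered levels that a single threshold can cut through.

```python
import hashlib
from dataclasses import dataclass

NUM_TIERS = 6      # tiers 0 (highest) to 5 (lowest), as described in the article
NUM_COHORTS = 128  # user population segmented into 128 cohorts


@dataclass(frozen=True)
class Priority:
    tier: int
    cohort: int

    def __post_init__(self) -> None:
        if not 0 <= self.tier < NUM_TIERS:
            raise ValueError(f"tier out of range: {self.tier}")
        if not 0 <= self.cohort < NUM_COHORTS:
            raise ValueError(f"cohort out of range: {self.cohort}")

    def level(self) -> int:
        # Hypothetical flattening: the tier dominates, the cohort orders users
        # within a tier. A smaller level means a more important request.
        return self.tier * NUM_COHORTS + self.cohort


def cohort_for_user(user_id: str) -> int:
    # Hypothetical: a stable hash so the same user always lands in the same cohort.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_COHORTS


def should_shed(priority: Priority, threshold: int) -> bool:
    # Shed every request at or beyond the threshold; lowering the threshold
    # sheds more traffic, starting with the least important levels.
    return priority.level() >= threshold
```

With `threshold=128`, every tier-0 request passes and every tier-1 request is shed. Because the cohort occupies the low-order part of the level, a controller can also shed only some cohorts of a tier instead of the whole tier at once.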

The Design of Cinnamon Library (Source: Uber Engineering Blog)

Each request handled by Cinnamon passes through a sequence of internal components before being dispatched to the service's business logic. The Rejector decides whether to accept the request, the Priority Queue orders accepted requests by priority, and the Scheduler processes them in parallel while enforcing rate limits.
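The request path can be sketched as follows. This is a minimal, single-threaded illustration rather than Cinnamon's code: the class names mirror the components in the article, while the fixed `threshold`, the `max_inflight` cap, and the integer priority levels (smaller means more important) are assumptions. In Cinnamon, the threshold and the parallelism level are adjusted by background processes.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    level: int                        # smaller = more important (assumed encoding)
    seq: int                          # arrival order, keeps FIFO within a level
    name: str = field(compare=False)


class Rejector:
    """Admits a request only if its level is below the current threshold."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold    # fixed here; tuned continuously in Cinnamon

    def accept(self, req: Request) -> bool:
        return req.level < self.threshold


class Scheduler:
    """Starts queued requests in priority order without exceeding max_inflight."""

    def __init__(self, max_inflight: int) -> None:
        self.max_inflight = max_inflight  # the "parallelism level"
        self.inflight = 0

    def dispatch(self, queue: list) -> list:
        started = []
        while queue and self.inflight < self.max_inflight:
            req = heapq.heappop(queue)    # most important request first
            self.inflight += 1
            started.append(req.name)
        return started

    def complete(self) -> None:
        self.inflight -= 1                # frees a slot for the next request


class Pipeline:
    def __init__(self, threshold: int, max_inflight: int) -> None:
        self.rejector = Rejector(threshold)
        self.queue: list = []             # binary heap used as the priority queue
        self.scheduler = Scheduler(max_inflight)
        self._seq = itertools.count()

    def submit(self, level: int, name: str) -> bool:
        req = Request(level, next(self._seq), name)
        if not self.rejector.accept(req):
            return False                  # shed before it consumes any resources
        heapq.heappush(self.queue, req)
        return True
```

For example, with `Pipeline(threshold=200, max_inflight=2)`, a request at level 300 is rejected outright; of three accepted requests at levels 150, 10, and 150, the Scheduler starts the level-10 request first, then the earlier level-150 one, and holds the third until a running request completes.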

Additionally, two background processes continuously adjust the rejection threshold and the parallelism level. When the Rejector detects an overload, the PID Controller process starts monitoring the inflow and outflow of requests in the Priority Queue. The PID controller quickly finds a stable ratio of requests to shed and is agnostic to the service's throughput. The Auto-tuner background process uses a customized version of the TCP Vegas congestion control algorithm that monitors service latencies and adjusts the parallelism level used by the Scheduler to find the best trade-off between throughput and latency.
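Both background loops can be sketched in a few lines. This is an illustration under stated assumptions, not Uber's modified algorithms: the PID controller treats the difference between Priority Queue inflow and outflow, normalized by offered load, as its error and outputs a shed ratio, while the auto-tuner applies the classic TCP Vegas rule (estimate queued work as `limit * (1 - min_latency / latency)`, then grow or shrink the limit by one). The gains, the `alpha`/`beta` bounds, and the toy service models in `simulate_shedding` and `simulate_tuning` are all hypothetical.

```python
class ShedRatioPID:
    """Turns queue inflow/outflow imbalance into a fraction of requests to shed."""

    def __init__(self, kp: float = 0.1, ki: float = 0.3, kd: float = 0.05) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, offered: float, inflow: float, outflow: float) -> float:
        # Normalizing by offered load keeps the error independent of throughput.
        error = (inflow - outflow) / offered
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        ratio = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(max(ratio, 0.0), 1.0)  # a shed ratio must stay within [0, 1]


class VegasAutoTuner:
    """Classic TCP Vegas rule applied to a concurrency limit."""

    def __init__(self, limit: int = 10, alpha: float = 2.5, beta: float = 5.5) -> None:
        self.limit = limit
        self.alpha, self.beta = alpha, beta
        self.min_latency = float("inf")   # best latency seen, i.e. "no queueing"

    def observe(self, latency_ms: float) -> int:
        self.min_latency = min(self.min_latency, latency_ms)
        # Estimated number of requests waiting rather than being served.
        queued = self.limit * (1 - self.min_latency / latency_ms)
        if queued < self.alpha:
            self.limit += 1               # little queueing: allow more parallelism
        elif queued > self.beta:
            self.limit = max(self.limit - 1, 1)  # too much queueing: back off
        return self.limit


def simulate_shedding(offered: float, capacity: float, ticks: int = 200) -> float:
    # Toy overload: a large backlog keeps the queue draining at full capacity.
    pid, ratio, backlog = ShedRatioPID(), 0.0, 10_000.0
    for _ in range(ticks):
        inflow = offered * (1 - ratio)
        outflow = min(capacity, backlog + inflow)
        backlog += inflow - outflow
        ratio = pid.update(offered, inflow, outflow)
    return ratio


def simulate_tuning(slots: int = 20, base_ms: float = 10.0, ticks: int = 100) -> int:
    # Toy service: latency grows linearly once the limit exceeds the parallel slots.
    tuner = VegasAutoTuner()
    for _ in range(ticks):
        latency = base_ms * max(1.0, tuner.limit / slots)
        tuner.observe(latency)
    return tuner.limit
```

In this toy model, `simulate_shedding(100, 80)` settles at a shed ratio of about 0.2, and `simulate_shedding(1000, 800)` settles at the same ratio, which illustrates the throughput-agnostic behavior. `simulate_tuning()` raises the limit from 10 to 23 and holds it there, just past the 20 slots where extra concurrency starts to cost latency.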

Throughput and Latency Comparison Between Cinnamon and QALM (Source: Uber Engineering Blog)

The team performed extensive performance testing of the new solution and compared it to QALM. The results showed that Cinnamon is very efficient, adding on average only one millisecond to request latency. The new library can also handle larger overloads while maintaining acceptable latencies, and service teams at Uber are quickly adopting it.

InfoQ previously reported on Monzo’s solution for targeted traffic shedding.
