LinkedIn recently published how it handles overload detection and remediation in its Java-based microservices. Its solution, Hodor, is adaptive and works out of the box with no configuration. It is a platform-agnostic mechanism that runs overload detectors and load shedders inside the monitored process, seamlessly sampling load and shedding traffic from within the application's processing chain.
"Hodor" stands for Holistic Overload Detection and Overload Remediation. It is a Java-based component that runs in-process via platform-specific adapters. It detects service overload caused by multiple root causes, automatically remediating the problem by dropping just enough traffic to allow the service to recover and then maintaining an optimal traffic level to prevent reentering overload.
Bryan Barkley, a senior staff engineer at LinkedIn, describes the motivation for creating Hodor:
LinkedIn launched in its initial form over 18 years ago, which is an eternity in the technology world. The early site was a single monolithic Java web application, and as it gained in popularity and the user base grew, the underlying technology had to adapt in order to support our ever-growing scale. We now operate well over 1,000 separate microservices running on the JVM, and each has its own set of operational challenges when it comes to operating at scale. One of the common issues that services face is becoming overloaded to the point where they are unable to serve traffic with reasonable latency.
The following diagram depicts Hodor's internal architecture.
Overload detectors determine when a service is suffering from overload. Multiple detectors can be registered (including application-specific ones), and each is queried for every inbound request. When a detector determines that the system is overloaded, the load shedder decides which traffic should be dropped. Finally, a platform-specific adapter converts any request-specific data into a platform-agnostic format that the detectors and shedder understand. Rejected requests are safe for the client to retry since no application logic has handled them yet.
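The detector/shedder/adapter split can be pictured with a small Java sketch. The interface and class names below (OverloadDetector, LoadShedder, PlatformAgnosticRequest, OverloadFilter) are illustrative assumptions, not LinkedIn's actual API; they only show how per-request checks might be wired together inside the process.

```java
import java.util.List;

/** Queried for every inbound request; reports whether the service is overloaded. */
interface OverloadDetector {
    boolean isOverloaded();
}

/** Decides which traffic to drop once a detector reports overload. */
interface LoadShedder {
    boolean shouldShed(PlatformAgnosticRequest request);
}

/** Minimal platform-agnostic view of a request, produced by a platform-specific adapter. */
record PlatformAgnosticRequest(String endpoint, long arrivalNanos) {}

/** Glue code a platform-specific adapter might run for every inbound request. */
class OverloadFilter {
    private final List<OverloadDetector> detectors;
    private final LoadShedder shedder;

    OverloadFilter(List<OverloadDetector> detectors, LoadShedder shedder) {
        this.detectors = detectors;
        this.shedder = shedder;
    }

    /** Returns true if the request should be rejected so the client can retry it elsewhere. */
    boolean reject(PlatformAgnosticRequest request) {
        boolean overloaded = detectors.stream().anyMatch(OverloadDetector::isOverloaded);
        return overloaded && shedder.shouldShed(request);
    }
}
```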
Mr Barkley states that "when developing Hodor, we wanted to target the most common set of overload scenarios to start with, and then iteratively increase the scope to different potential causes". Consequently, they decided to first build an overload detector for CPU exhaustion.
Their approach to measuring CPU availability, which they determined following a lengthy set of experiments, is to run a background daemon thread in a loop and schedule it to sleep for an interval. When the thread wakes up, the actual amount of time slept is recorded and compared to the expected sleep time. They collect these samples over a time window and then check whether a certain percentile of the data exceeds a threshold. If too many windows violate the threshold, they consider the service CPU overloaded.
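A minimal sketch of this sampling loop is shown below, assuming the technique as described: a daemon thread sleeps, measures its scheduling delay, and evaluates a percentile per window. The sleep interval, window size, percentile, delay threshold, and the "consecutive bad windows" trip rule are placeholder values, not LinkedIn's published parameters; such a detector could plug into an interface like the OverloadDetector sketch above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch of a sleep-delay based CPU overload detector; all thresholds are illustrative. */
class CpuOverloadDetector implements Runnable {
    private static final long SLEEP_MILLIS = 10;        // expected sleep per iteration
    private static final int WINDOW_SIZE = 100;         // samples per evaluation window
    private static final double PERCENTILE = 0.99;      // percentile of observed delay to check
    private static final long DELAY_THRESHOLD_MS = 50;  // delay considered a violation
    private static final int VIOLATIONS_TO_TRIP = 3;    // bad windows in a row before tripping

    private final AtomicBoolean overloaded = new AtomicBoolean(false);
    private int consecutiveViolations = 0;

    boolean isOverloaded() {
        return overloaded.get();
    }

    @Override
    public void run() {
        List<Long> window = new ArrayList<>(WINDOW_SIZE);
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            try {
                Thread.sleep(SLEEP_MILLIS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long actualMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            window.add(actualMillis - SLEEP_MILLIS);     // scheduling delay beyond expected sleep

            if (window.size() == WINDOW_SIZE) {
                Collections.sort(window);
                long delayAtPercentile = window.get((int) (PERCENTILE * (WINDOW_SIZE - 1)));
                consecutiveViolations =
                        delayAtPercentile > DELAY_THRESHOLD_MS ? consecutiveViolations + 1 : 0;
                overloaded.set(consecutiveViolations >= VIOLATIONS_TO_TRIP);
                window.clear();
            }
        }
    }

    /** Runs the detector on a background daemon thread, as described in the article. */
    static Thread startDaemon(CpuOverloadDetector detector) {
        Thread t = new Thread(detector, "cpu-overload-detector");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```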
At this point, the load shedder kicks in. Instead of simply dropping all traffic, the default load-shedding strategy limits the number of concurrent requests. Once the service exceeds that concurrency limit, the load shedder drops additional traffic; once the number of in-flight requests falls back below the limit, new requests are admitted again.
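A concurrency-limit shedder of this kind can be sketched with a simple in-flight counter, as below. This is a hypothetical illustration of the general technique, not Hodor's implementation; the limit here is static and would be tuned by the adaptive algorithm described next.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of a concurrency-limit based load shedder; the limit value itself is set externally. */
class ConcurrencyLimitShedder {
    private final AtomicInteger inFlight = new AtomicInteger();
    private volatile int limit;

    ConcurrencyLimitShedder(int initialLimit) {
        this.limit = initialLimit;
    }

    /** Called when a request arrives; returns false if the request should be rejected. */
    boolean tryAcquire() {
        while (true) {
            int current = inFlight.get();
            if (current >= limit) {
                return false;                    // over the limit: shed this request
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;                     // admitted: concurrency slot reserved
            }
        }
    }

    /** Called when an admitted request finishes, freeing a concurrency slot. */
    void release() {
        inFlight.decrementAndGet();
    }

    void setLimit(int newLimit) {
        this.limit = Math.max(1, newLimit);
    }

    int getLimit() {
        return limit;
    }
}
```

In practice, request handling would be wrapped in a tryAcquire/release pair (typically in a try/finally block) so that slots are always returned, even when handling fails.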
At the heart of this mechanism is an adaptive algorithm that determines the concurrency limit based on feedback from the overload detectors, increasing or decreasing the threshold as needed. The limit is continuously fine-tuned during an overload event so that as little traffic as possible is dropped - just enough for the service to continue operating.
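The article does not spell out the adjustment policy, so the sketch below uses an additive-increase/multiplicative-decrease (AIMD) style controller as a stand-in: it backs the limit off while detectors report overload and probes upward otherwise. The class name, step sizes, and bounds are assumptions for illustration only.

```java
/**
 * Sketch of an adaptive concurrency-limit controller; the AIMD-style policy and
 * its constants are placeholders, not LinkedIn's published algorithm.
 */
class AdaptiveLimitController {
    private static final int MIN_LIMIT = 1;
    private static final int MAX_LIMIT = 1_000;

    private int limit;

    AdaptiveLimitController(int initialLimit) {
        this.limit = initialLimit;
    }

    /**
     * Called periodically with the latest overload signal; returns the new
     * concurrency limit to feed into the shedder.
     */
    int adjust(boolean overloaded) {
        if (overloaded) {
            limit = Math.max(MIN_LIMIT, (int) (limit * 0.9));  // back off while overloaded
        } else {
            limit = Math.min(MAX_LIMIT, limit + 1);            // probe for more capacity
        }
        return limit;
    }
}
```

Tying the sketches together, a periodic task could call something like `shedder.setLimit(controller.adjust(detector.isOverloaded()))` so the concurrency limit tracks detector feedback over time.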
When Hodor sheds traffic, it instructs clients to retry a rejected request on another instance. Mr Barkley notes that "retrying requests blindly can be problematic and lead to retry storms if an entire cluster of services is overloaded". Hence, Hodor maintains retry budgets on both the client and server sides, using an approach heavily influenced by Google's SRE Book. Once the budget is exceeded, the request fails, and the server stops instructing clients to retry until the broader situation appears to be resolved.
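The SRE Book's client-side budget caps retries at a fraction of recent requests. The sketch below illustrates that idea only; the 10% ratio, counter handling, and reset scheme are illustrative assumptions, not details of Hodor's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of a client-side retry budget in the spirit of the SRE Book: retries
 * are permitted only while they stay under a fixed fraction of requests sent.
 */
class RetryBudget {
    private static final double MAX_RETRY_RATIO = 0.10;  // retries allowed per request sent

    private final AtomicLong requests = new AtomicLong();
    private final AtomicLong retries = new AtomicLong();

    /** Record every outgoing request attempt. */
    void recordRequest() {
        requests.incrementAndGet();
    }

    /** Returns true if a retry may be attempted without exceeding the budget. */
    boolean tryRetry() {
        long sent = Math.max(1, requests.get());
        if ((retries.get() + 1) > sent * MAX_RETRY_RATIO) {
            return false;                                 // budget exhausted: fail the request
        }
        retries.incrementAndGet();
        return true;
    }

    /** Periodically reset (or decay) the counters so the budget reflects recent traffic. */
    void reset() {
        requests.set(0);
        retries.set(0);
    }
}
```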