A group of engineers from Google, UCLA, and SpaceX is presenting the paper Maglev: A Fast and Reliable Software Network Load Balancer (PDF) at the 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI '16), taking place this week. Maglev is Google’s network load balancer.
Unlike dedicated load-balancing hardware, Maglev is a software solution running on commodity servers. Instead of acquiring specialized hardware ahead of time to provide enough capacity for traffic peaks, Google runs Maglev on regular servers, adding more of them to the pool as demand grows. Maglev was developed in-house for Google’s own data centers and has been used in production since 2008.
Google services run in clusters in multiple data centers spread around the world. Each such cluster has a load balancer consisting of multiple devices placed between the routers and the servers providing services. Dedicated hardware load balancers are usually deployed in active-passive pairs to provide 1+1 redundancy, which leaves one device idle and its capacity unused. They are also limited by their fixed capacity and are hard or impossible to reprogram. Google chose instead a configuration providing N+1 redundancy, built with its own software on commodity servers, for better scalability and flexibility, as shown in the following graphic.
Regarding performance, the paper states that a single Maglev machine can “saturate a 10Gbps link with small packets. Maglev is also equipped with consistent hashing and connection tracking features, to minimize the negative impact of unexpected faults and failures on connection-oriented protocols.” Maglev is used by Google Cloud load balancing, serving 1M requests per second within 5 seconds of setup and without pre-warming. In a performance benchmark conducted by Google, a Maglev instance running on an 8-core CPU peaked at about 12M packets per second (pps). Maglev bypasses the Linux kernel network stack, which would otherwise limit throughput to less than 4M pps.
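To illustrate the consistent hashing idea, the sketch below builds a Maglev-style lookup table following the table-population scheme described in the paper: each backend walks its own permutation of the table slots, derived from two hashes of its name, and claims the next free slot in turn, which keeps backend shares nearly equal and limits remapping when backends come and go. The hash function, table size and names here are illustrative choices, not Google’s implementation.

import hashlib

def _hash(data, seed):
    # Stand-in hash; the paper does not mandate a particular hash function.
    return int(hashlib.md5((seed + data).encode()).hexdigest(), 16)

def build_lookup_table(backends, table_size=65537):
    # Build a Maglev-style lookup table. table_size should be a prime
    # considerably larger than the number of backends so shares stay even.
    n = len(backends)
    offsets = [_hash(b, "offset") % table_size for b in backends]
    skips = [_hash(b, "skip") % (table_size - 1) + 1 for b in backends]
    entry = [-1] * table_size   # table slot -> backend index
    next_step = [0] * n         # position of each backend in its permutation
    filled = 0
    while True:
        for i in range(n):
            # Find the next slot in backend i's permutation that is still free.
            c = (offsets[i] + next_step[i] * skips[i]) % table_size
            while entry[c] >= 0:
                next_step[i] += 1
                c = (offsets[i] + next_step[i] * skips[i]) % table_size
            entry[c] = i
            next_step[i] += 1
            filled += 1
            if filled == table_size:
                return entry

def pick_backend(entry, backends, five_tuple):
    # Hash the connection's 5-tuple into the table to choose a backend.
    return backends[entry[_hash(five_tuple, "flow") % len(entry)]]

backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
table = build_lookup_table(backends)
print(pick_backend(table, backends, "198.51.100.7:34512->203.0.113.5:443/tcp"))

Rebuilding the table after removing one backend changes only a small fraction of the slots, so most connections keep mapping to the same backend even without any per-connection state.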
The paper presents in detail how a packet is processed by Maglev, how virtual IP addresses (VIPs) are announced and handled, how routers spread traffic across Maglev machines using Equal Cost Multipath (ECMP), how consistent hashing directs each connection to a service endpoint, and more.
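As a rough illustration of the forwarding path, the sketch below (with illustrative names, not Maglev’s actual data structures) shows the connection-tracking step: packets of an established connection go to the backend recorded in a local table, and only new connections fall through to a backend-selection function such as the consistent-hashing lookup above.

from typing import Callable, Dict

class ConnectionTracker:
    # Illustrative per-connection state; Maglev's real implementation differs.
    def __init__(self, select_backend: Callable[[str], str]):
        self.select_backend = select_backend   # e.g. a consistent-hash lookup
        self.connections: Dict[str, str] = {}  # 5-tuple -> chosen backend

    def forward(self, five_tuple: str) -> str:
        # Established connections stick to their backend even if the
        # backend pool, and hence the hash table, changes underneath.
        backend = self.connections.get(five_tuple)
        if backend is None:
            backend = self.select_backend(five_tuple)  # new connection
            self.connections[five_tuple] = backend
        return backend

A new connection is hashed once and then pinned, which is what keeps connection-oriented traffic on the same endpoint across backend changes; consistent hashing covers the cases where connection state is missing, for example when a packet lands on a different Maglev machine.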