Yesterday GitHub engineer Robert Mosolgo posted a detailed account of how GitHub scaled the GitHub API with a sharded, replicated rate limiter in Redis. GitHub migrated from an older Memcached-based rate limiter to a Redis-based one. According to Mosolgo, the new implementation has improved reliability, fixed issues for clients, and reduced GitHub's support load.
GitHub's main issue with the older architecture was that the rate limiter shared its Memcached instance with other application caches. The problem manifested in two ways. First, GitHub planned to switch from a single, shared Memcached instance to one instance per data center, which would have caused the rate limiter to behave inconsistently whenever a client's requests were routed to different data centers. Second, when Memcached filled up (even with entries from other application caches), it would sometimes evict rate limiter data that was still active, allowing end-users to circumvent their rate limits.
GitHub's solution was to switch to a dedicated sharded Redis cluster "since it has a more appropriate persistence system and simple sharding and replication setups." Sharding is performed client-side: for each request, the client picks which Redis cluster to read from and write to. Each cluster consists of a single primary (for writes) and several replicas (for reads). The storage logic is implemented in Lua scripts running on Redis to guarantee the atomicity of operations.
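The post does not include GitHub's actual Ruby implementation, but the pattern it describes can be sketched in a few lines of Python with redis-py: the client hashes the key to pick a shard, sends writes to that shard's primary, and keeps the increment-and-expire logic in a Lua script so it executes atomically on the Redis server. The host names, key format, and window length below are illustrative assumptions, not values from the post.

```python
import zlib

import redis  # redis-py client

# Hypothetical cluster layout: each shard has one primary (writes) and one replica (reads).
SHARDS = [
    {"primary": redis.Redis(host="redis-a-primary"), "replica": redis.Redis(host="redis-a-replica")},
    {"primary": redis.Redis(host="redis-b-primary"), "replica": redis.Redis(host="redis-b-replica")},
]

# Running the count-and-expire logic as a Lua script keeps it atomic on the Redis side.
RATE_LIMIT_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
"""

def shard_for(key: str) -> dict:
    """Client-side sharding: hash the key to pick a shard for this request."""
    return SHARDS[zlib.crc32(key.encode()) % len(SHARDS)]

def consume(client_id: str, window_seconds: int = 3600) -> int:
    """Record one request for client_id on the owning shard's primary and return the new count."""
    primary = shard_for(client_id)["primary"]
    script = primary.register_script(RATE_LIMIT_LUA)
    return script(keys=[f"rate:{client_id}"], args=[window_seconds])
```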
Mosolgo mentions an alternative solution the team ruled out:
One option we considered but decided against was using our MySQL-backed KV store (GitHub::KV) for storage. We didn't want to add traffic to already-busy MySQL primaries: usually, we use replicas for GET requests, but rate limit updates require writing access to a primary. By choosing a different storage backend, we could avoid the additional (and substantial) write traffic to MySQL.
As part of the migration, the team discovered two distribution-related bugs in the implementation. One bug caused the rate limit "reset timestamp" returned to clients to "wobble" between requests. The other caused some clients' requests to be rejected for exceeding the limit even though the response headers indicated that request quota was still available.
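The post does not spell out the exact code path, but one plausible way such a wobble arises (an illustration, not necessarily GitHub's bug) is re-deriving the reset timestamp on every request from the key's remaining TTL: two derivations a moment apart can land on different seconds, and a lagging replica can report a stale TTL.

```python
import time

import redis

replica = redis.Redis(host="redis-a-replica")  # hypothetical read replica

def reset_timestamp(client_id: str) -> int:
    # Re-deriving the reset time from the key's TTL on every request: two calls a
    # moment apart (or against a lagging replica) can disagree by a second or more,
    # which shows up to clients as a "wobbling" reset header.
    ttl = replica.ttl(f"rate:{client_id}")
    return int(time.time()) + max(ttl, 0)
```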
The team eventually fixed both bugs by combining two patterns. First, they stored some additional values in Redis instead of recalculating them at runtime on every request; this increased the rate limiter's storage footprint but guaranteed greater consistency. Second, they restructured the application to handle stale values read from the cluster's replicas in a consistent way.
Whereas the problematic implementation mixed stale and fresh values within the same logical transaction, the fix uses only one set of values per request, which keeps responses consistent from the end user's perspective.
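A hedged sketch of how the two patterns could combine, again in Python with redis-py rather than GitHub's actual Ruby code: the reset timestamp is written to Redis once, when the window opens, and the Lua script returns the counter and the reset time together, so every response header is built from a single consistent snapshot rather than from separate, possibly stale reads. The key names, header names, and limits are assumptions for illustration.

```python
import time

import redis  # redis-py client

# In practice the primary would be chosen per request by the client-side sharding
# shown earlier; a single hypothetical primary keeps this sketch short.
primary = redis.Redis(host="redis-a-primary")

# The reset timestamp is computed once, when the window opens, and stored next to
# the counter; later requests read it back instead of re-deriving it.
RATE_LIMIT_LUA_V2 = """
local count = redis.call('INCR', KEYS[1])
local reset_at
if count == 1 then
    reset_at = tonumber(ARGV[2]) + tonumber(ARGV[1])
    redis.call('SET', KEYS[2], reset_at)
    redis.call('EXPIRE', KEYS[1], ARGV[1])
    redis.call('EXPIRE', KEYS[2], ARGV[1])
else
    reset_at = tonumber(redis.call('GET', KEYS[2])) or (tonumber(ARGV[2]) + tonumber(ARGV[1]))
end
return {count, reset_at}
"""

def consume_and_headers(client_id: str, limit: int = 5000, window_seconds: int = 3600) -> dict:
    """Record one request and build rate limit headers from a single atomic snapshot."""
    script = primary.register_script(RATE_LIMIT_LUA_V2)
    count, reset_at = script(
        keys=[f"rate:{client_id}", f"rate:{client_id}:reset"],
        args=[window_seconds, int(time.time())],
    )
    # Every header comes from the one snapshot the script returned, never from a
    # second (possibly stale) read against a replica.
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(limit - count, 0)),
        "X-RateLimit-Reset": str(reset_at),
    }
```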