Airbnb recently described how it built Himeji, a scalable centralized authorization system. Himeji stores permissions data and performs permission checks as a central source of truth. It uses a sharded and replicated in-memory cache to improve performance and lower latency, and it has served checks in production for about a year. Its throughput scaled from 0 in March 2020 to 850k entities/sec in March 2021, while maintaining 99.999% availability and 12-millisecond latency at the 99th percentile.
The image below depicts Himeji's three-layer architecture.
First, the orchestration layer receives requests from clients and is responsible for fetching the data from the cache. The caching layer, which is sharded and replicated, is responsible for filtering in-memory and loading from the database on cache misses. Airbnb targets a ~98% hit rate on the cache. Finally, the data layer uses Amazon Aurora for durable database storage. Airbnb's SpinalTap detects data mutations and sends notifications over Apache Kafka to invalidate the cache.
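The read path described above can be sketched as a read-through cache with invalidation on mutation. This is a minimal illustration, not Airbnb's implementation: the class names (Database, CacheShard, Orchestrator), the CRC32-based shard choice, and the key format are all assumptions made for the example. A plain dictionary stands in for Aurora, a direct method call stands in for the SpinalTap and Kafka notification path, and replication is left out.

```python
from zlib import crc32


class Database:
    """Hypothetical stand-in for the durable data layer (a dict, not Aurora)."""

    def __init__(self):
        self.rows = {}
        self.reads = 0  # counts how often the cache had to fall back to the DB

    def load(self, key):
        self.reads += 1
        return self.rows.get(key)


class CacheShard:
    """One in-memory shard: serves hits locally, loads from the DB on a miss."""

    def __init__(self, db):
        self.db = db
        self.entries = {}

    def get(self, key):
        if key not in self.entries:  # cache miss
            self.entries[key] = self.db.load(key)
        return self.entries[key]

    def invalidate(self, key):
        self.entries.pop(key, None)


class Orchestrator:
    """Receives client requests and routes each key to the shard that owns it."""

    def __init__(self, db, num_shards=4):
        self.shards = [CacheShard(db) for _ in range(num_shards)]

    def _shard_for(self, key):
        # Assumed sharding scheme: a stable hash of the key modulo shard count.
        return self.shards[crc32(key.encode()) % len(self.shards)]

    def read(self, key):
        return self._shard_for(key).get(key)

    def on_mutation(self, key):
        # Stand-in for a change notification arriving from the data layer.
        self._shard_for(key).invalidate(key)


db = Database()
db.rows["LISTING:10#OWNER"] = {"User(123)"}
api = Orchestrator(db)
api.read("LISTING:10#OWNER")  # miss: loaded from the database
api.read("LISTING:10#OWNER")  # hit: served from memory, no database read
```

Without the invalidation step, a shard would keep serving the old value after the database row changes, which is why mutation notifications matter for a cache with a very high hit rate.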
Airbnb engineer Alan Yao describes the reasons that led to this architecture:
Over the last couple of years, Airbnb engineering moved from a monolithic Ruby on Rails architecture to a service-oriented architecture. In our Rails architecture, we had an API per resource to access the underlying data. These APIs had authorization checks to protect sensitive data. As there was a single way to access a resource's data, managing these checks was easy. In the transition to SOA, we moved to a layered architecture where data services wrap databases and presentation services hydrating from multiple data services.
According to Yao, Airbnb initially moved the permission checks to presentation services, as the image below shows.
This choice led to several problems. First, authorization checks were now duplicated and challenging to manage. Second, each authorization check needed to fan out to multiple services to execute the required logic, which severely degraded performance and reliability. The solution was to move the authorization checks from presentation services to data services and to create Himeji, which stores permissions data in a centralized, scalable manner. The figure below depicts Himeji and how it is used in the Airbnb system.
Himeji's check API allows data services to perform permission checks. A data service can ask Himeji whether a particular principal (e.g., a user) has a relation (e.g., a privilege or action) on a specific entity. For example, a data service can ask, "Can user 123 write to listing 10's description?" This combination of entity, relation, and principal is called a tuple. The model is inspired by Zanzibar, Google's global authorization system.
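A check against stored tuples can be sketched as a set lookup. The names below (PermissionTuple, PermissionStore, grant, check) and the string formats for entities and principals are hypothetical and chosen only for illustration; they are not Himeji's actual API.

```python
from typing import NamedTuple


class PermissionTuple(NamedTuple):
    entity: str     # e.g. "LISTING:10$DESCRIPTION" (assumed format)
    relation: str   # e.g. "WRITE"
    principal: str  # e.g. "User(123)" (assumed format)


class PermissionStore:
    """Hypothetical in-memory store of granted permission tuples."""

    def __init__(self):
        self._tuples = set()

    def grant(self, entity, relation, principal):
        self._tuples.add(PermissionTuple(entity, relation, principal))

    def check(self, entity, relation, principal):
        # "Does this principal have this relation on this entity?"
        return PermissionTuple(entity, relation, principal) in self._tuples


store = PermissionStore()
store.grant("LISTING:10$DESCRIPTION", "WRITE", "User(123)")
can_write = store.check("LISTING:10$DESCRIPTION", "WRITE", "User(123)")
```

Here can_write answers the question "Can user 123 write to listing 10's description?" for the single tuple granted above.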
Permission rules in Himeji can be either stored in the DB or derived from configuration. For example, the following configured rule allows a principal to read a listing's location if the principal is the owner of the listing (OWNER is a listing permission stored in the DB). It also allows guests of a reservation related to the listing to read the location.
LISTING:
  LOCATION:
    '#READ':
      union:
        - #OWNER
        - LISTING : $id # RESERVATION @
          Reference(RESERVATION : $reservationId # GUEST)
As a result, if a guest tries to read the listing's location, the data service will check whether that user's principal has permission to do so. Based on the above rule, Himeji automatically queries whether the principal is a guest on a reservation for the listing and returns the appropriate result.
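How such a union rule might be evaluated can be sketched as follows. This is a simplified model under stated assumptions, not Himeji's evaluator: the Ref type, the STORED and RULES tables, and the holds and check functions are hypothetical, the sample data is invented for the example, and the rule is reduced to a list of relations whose stored principals may be references to other permission sets.

```python
from typing import NamedTuple


class Ref(NamedTuple):
    """A principal that points at another entity's relation set."""
    entity: str
    relation: str


# Stored tuples: (entity, relation) -> principals (user ids or references).
STORED = {
    ("LISTING:10", "OWNER"): {"User(123)"},
    ("LISTING:10", "RESERVATION"): {Ref("RESERVATION:500", "GUEST")},
    ("RESERVATION:500", "GUEST"): {"User(456)"},
}

# Configured rules: (entity type, field, relation) -> relations in the union.
RULES = {
    ("LISTING", "LOCATION", "READ"): ["OWNER", "RESERVATION"],
}


def holds(entity, relation, user, seen=None):
    """True if user is in the stored set, directly or through references."""
    seen = seen if seen is not None else set()
    if (entity, relation) in seen:  # guard against reference cycles
        return False
    seen.add((entity, relation))
    for principal in STORED.get((entity, relation), ()):
        if principal == user:
            return True
        if isinstance(principal, Ref) and holds(
            principal.entity, principal.relation, user, seen
        ):
            return True
    return False


def check(entity_type, entity_id, field, relation, user):
    """Evaluate the configured union: any satisfied branch grants access."""
    entity = f"{entity_type}:{entity_id}"
    branches = RULES.get((entity_type, field, relation), [])
    return any(holds(entity, branch, user) for branch in branches)


owner_ok = check("LISTING", 10, "LOCATION", "READ", "User(123)")   # owner branch
guest_ok = check("LISTING", 10, "LOCATION", "READ", "User(456)")   # guest branch
other_ok = check("LISTING", 10, "LOCATION", "READ", "User(789)")   # no branch
```

In this sketch the owner is granted access by the first branch, the guest by following the reservation reference in the second branch, and an unrelated user by neither.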
To cut down on integration times and drive developer adoption, Airbnb built supporting tools. These include tools for porting pre-existing permission data with Apache Airflow and Apache Spark, and scripts that autogenerate Java and Scala code.