At the recent Large Installation System Administration (LISA) 2018 Conference, Sherry Xiao, production engineer at Instagram, explained how their team split Instagram’s services across datacenters in the US and Europe. They achieved data locality in their stateful services - Cassandra and TAO - by using new and modified tools from Facebook’s engineering team.
Facebook acquired Instagram in 2012, and the latter subsequently migrated to Facebook's infrastructure. Instagram's infrastructure was only in the US, whereas Facebook ran datacenters in both the US and Europe. Instagram's stack consists primarily of Django, Cassandra, the TAO distributed data store, Memcached, and Celery async jobs. The team had to split their services between US and EU datacenters to relieve data storage capacity constraints. To make the split, they had to overcome several challenges: high latency from cross-ocean Cassandra quorum calls, partitioning the dataset for data locality, failover within the EU region, and master-replica synchronization for TAO.
Image Courtesy - https://www.youtube.com/watch?v=2GInt9E3vrU
Instagram uses Cassandra as a general key-value storage service, and moved it, along with other components, from AWS to Facebook's own datacenters. Cassandra uses a quorum of replicas across datacenters for both read and write consistency. Maintaining copies of the data in the European datacenters would have wasted storage and forced quorum requests to travel across the ocean, which was inadvisable due to the high latency. The Instagram team instead partitioned the data, using a tool called Akkio, so that US users had their data in the five US datacenters and EU users in the three EU datacenters. Akkio is a data placement tool built by Facebook that optimizes data retrieval by grouping data into logical sets which are then stored in the datacenters closest to the end users who access them frequently. Akkio, says Xiao, "tracks the access patterns of end users and triggers the data migration".
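The exact mechanics of Akkio are not covered in the talk, but the idea can be sketched roughly as follows, assuming a hypothetical placement service that counts per-region accesses for each logical grouping of data and migrates a group once one region clearly dominates (the class, threshold and method names below are illustrative, not Akkio's real interface):

```python
from collections import Counter

class PlacementTracker:
    """Illustrative sketch of access-pattern-driven placement in the
    spirit of Akkio; not its actual API."""

    def __init__(self, migration_threshold=0.8):
        self.migration_threshold = migration_threshold
        self.placement = {}   # data group -> region currently holding it
        self.accesses = {}    # data group -> Counter of accesses per region

    def record_access(self, group_id, region):
        # Track which regions the end users of this data group read from.
        self.accesses.setdefault(group_id, Counter())[region] += 1

    def maybe_migrate(self, group_id):
        # If one region accounts for most of the accesses, place the data
        # group in that region (the actual copy would be triggered here).
        counts = self.accesses.get(group_id)
        if not counts:
            return self.placement.get(group_id)
        region, hits = counts.most_common(1)[0]
        if hits / sum(counts.values()) >= self.migration_threshold:
            self.placement[group_id] = region
        return self.placement.get(group_id)

# A user whose accesses come almost entirely from the EU ends up placed there.
tracker = PlacementTracker()
for _ in range(9):
    tracker.record_access("user:1002", "eu")
tracker.record_access("user:1002", "us")
print(tracker.maybe_migrate("user:1002"))   # -> "eu"
```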
This architecture removes the need to store copies of all data in each datacenter. The US and EU datacenters could operate independently, and quorum requests could stay on the same continent. Instagram also used the Social Hash partitioner to route requests to the correct buckets, especially for accounts with a high number of followers.
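The talk does not go into the routing details, but the lookup side might be sketched as follows, where the bucket assignments themselves would be produced offline by the Social Hash partitioner from the social graph (the mapping, fallback hash, and naming scheme here are hypothetical):

```python
# Hypothetical lookup layer: Social Hash computes bucket assignments
# offline so that related accounts land in the same bucket; this sketch
# only shows how a request could be routed once such a mapping exists.
PRECOMPUTED_BUCKETS = {
    1001: ("us", 42),   # example assignments produced offline
    1002: ("eu", 7),
}

def route_request(user_id: int) -> str:
    # Fall back to a plain hash for users without a precomputed bucket.
    region, bucket = PRECOMPUTED_BUCKETS.get(user_id, ("us", user_id % 1024))
    return f"cassandra-{region}-{bucket:04d}"

print(route_request(1002))   # -> cassandra-eu-0007
```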
TAO, Facebook's distributed data store for the social graph, is also used by Instagram. In sharded mode, TAO has a single master per shard. Writes are forwarded only to the master - which runs in the US datacenters - and replicas are read-only. The team modified TAO so that it could write to a region-local master in the EU, avoiding the cross-Atlantic call. Why is Akkio not used here? "TAO has a different data model compared to Cassandra", explains Xiao, "where most of the use cases are keyed by user id, and the data belongs to the user itself". In contrast, media objects handled by TAO can be accessed by users all over the globe, so Akkio cannot place that data optimally based on locality.
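A minimal sketch of what the modified write path might look like, assuming a per-region master mapping for shards whose data is owned by EU users (all hosts and names below are invented for illustration):

```python
# Hypothetical sketch of the write path after the TAO modification:
# EU-owned shards get a master in the EU, so writes from EU users no
# longer have to cross the Atlantic.
SHARD_MASTERS = {
    ("us", "user_profile"): "tao-master-us-east",
    ("eu", "user_profile"): "tao-master-eu-west",
}

def write(user_region: str, object_type: str, payload: dict) -> None:
    # Fall back to the US master for data with no region-local master.
    master = SHARD_MASTERS.get((user_region, object_type),
                               SHARD_MASTERS[("us", object_type)])
    send_to(master, payload)

def send_to(host: str, payload: dict) -> None:
    print(f"writing {payload} to {host}")   # placeholder for the actual RPC

write("eu", "user_profile", {"user_id": 1002, "bio": "hello"})
```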
The final architecture consists of a stateless Django tier in front, backed by partitioned Cassandra clusters and TAO instances that write to region-local masters. The migration entailed a change in disaster recovery (DR) planning, as cross-ocean DR was not possible due to the latency and the now-distinct data sets. According to Xiao, each region is capable of handling the load from a failed datacenter by keeping 20% headroom in each datacenter.
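As a back-of-the-envelope check of that figure, assuming the load of a failed datacenter redistributes evenly across the remaining datacenters in the same region:

```python
def required_headroom(datacenters_in_region: int) -> float:
    """Spare capacity each datacenter must keep so that the remaining
    ones can absorb the load of a single failed peer, assuming the
    failed datacenter's load redistributes evenly within the region."""
    return 1 / datacenters_in_region

# With five datacenters in a region, each can run at 80% of capacity
# and still absorb a peer's failure, in line with the 20% headroom
# figure Xiao mentions.
print(required_headroom(5))   # 0.2
```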