Pinterest created the next-generation asynchronous computing platform, Pacer, to replace the older solution, Pinlater, which the company outgrew, resulting in scalability and reliability challenges. The new architecture leverages Kubernetes for scheduling job-execution workers and Apache Helix for cluster management.
The company previously created Pinlater, an asynchronous job execution platform, and open-sourced it a few years ago. Pinlater has been used in production at Pinterest for many years and has supported many critical functional areas. The company operated several Pinlater clusters on AWS EC2, processing millions of jobs each minute.
Qi Li and Zhihuang Chen, software engineers at Pinterest, summarize the challenges with Pinlater that prompted the team to build the new platform:
With the growth of Pinterest over the past few years and increased traffic to Pinlater, we discovered numerous limitations of Pinlater, including scalability bottleneck, hardware efficiency, lack of isolation, and usability. We have also encountered new challenges with the platform, including ones that have impacted the throughput and reliability of our data storage.
The team concluded that it would be impossible to address all identified issues within the existing architecture and decided instead to invest in creating the next-generation platform, considering their experience using and operating Pinlater.
Pacer, the new architecture, comprises a stateless Thrift API service (compatible with Pinlater), a data store (MySQL), a stateful dequeue broker service, and job-execution worker pools running on Kubernetes. Apache Helix with Zookeeper is used to manage the assignment of job-queue partitions to dequeue brokers.
Pacer High-level Architecture (Source: Pinterest Engineering Blog)
The dequeue broker is a stateful service responsible for prefetching job-queue data from the data store and caching it in memory to reduce latency and isolate enqueue and dequeue workloads. Each dequeue broker is assigned a set of job-queue partitions so that jobs are fetched and executed exclusively by a single broker, hence avoiding any contention. Each job queue has a dedicated pool of pods provisioned in Kubernetes to eliminate the impact of skewed resource consumption by different job types.
The new dequeuing and execution model alleviates problems experienced with Pinlater, including avoiding scanning of all partitions or mitigating lock contention when fetching data from hot partitions. Additionally, it supports job execution in enqueuing order (FIFO) with a single partition configured for the job queue.
The new design requires an exclusive assignment of queue partitions to dequeue broker instances, similar to Kafka consumer topic-partition assignments. The team opted to use Apache Helix to help with that functionality. Apache Helix offers a generic cluster management framework and is used to track partition assignments across a set of dequeue brokers forming a cluster. Helix uses Apache Zookeeper to communicate resource configurations between Helix Controller and Helix Agents embedded in dequeue broker instances.
Dequeue Broker Coordination with Apache Helix and Zookeeper (Source: Pinterest Engineering Blog)
Helix Controller monitors dequeue broker instances joining and leaving the cluster and any changes to configured job queues, and on any change, it recomputes the ideal distribution of queue partitions to brokers. Once the latest partition allocations are saved in Zookeeper, individual broker instances update their internal state and fetch data for the queue partitions they are made responsible for.