Uber faced new data processing challenges after introducing ads on Uber Eats: the events generated by ad interactions had to be processed quickly, reliably, and accurately. These requirements were fulfilled by a system based on Apache Flink, Kafka, and Pinot that processes streams of ad events in real time with exactly-once semantics. An article describing its architecture was recently published on the Uber Engineering blog.
The authors explain that ad event processing had to publish results with the smallest possible delay, without losing data and without overcounting it. The system is composed of Flink jobs that communicate via Kafka topics and store end-user data in Hive and Pinot.
According to the authors, the system’s reliability is ensured by relying on cross-region replication, Flink checkpoints, and Kafka’s record retention policy. Accuracy is achieved by leveraging exactly-once semantics in Kafka and Flink, upsert operations in Pinot, and unique record identifiers for idempotency and deduplication purposes.
The combination of Kafka transactions with Flink checkpoints and its two-phase commit protocol ensures that Kafka consumers see only fully processed events and that the Kafka offsets stored in checkpoints are in line with committed records. With Kafka transactions, any uncommitted events caused by failures are ignored. Flink’s checkpoints happen periodically and give stateful jobs the same semantics as failure-free execution by allowing them to recover state and stream positions from a known-good point in time in case of failure. Combining Flink’s checkpoints with its two-phase commit protocol enables exactly-once semantics.
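The article itself does not include code, but the mechanism can be illustrated with a minimal Flink job. The sketch below (Java, using Flink's KafkaSource/KafkaSink connectors; the bootstrap addresses, topic names, and checkpoint interval are illustrative assumptions, not taken from the article) enables checkpointing, makes the source ignore uncommitted records via read_committed isolation, and configures the sink with an EXACTLY_ONCE delivery guarantee so it writes inside Kafka transactions that are finalized only when a checkpoint completes.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;

public class ExactlyOnceKafkaJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Periodic checkpoints are a prerequisite for the two-phase commit:
        // Kafka transactions are committed only when a checkpoint completes.
        env.enableCheckpointing(60_000);

        // Read only committed records, so events from aborted transactions
        // (e.g. left over after a job failure) are ignored downstream.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("raw-ad-events")                // topic name is illustrative
                .setGroupId("ad-events-processor")
                .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
                .setProperty("isolation.level", "read_committed")
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // EXACTLY_ONCE makes the sink write inside Kafka transactions that are
        // pre-committed on checkpoint and finalized once the checkpoint succeeds.
        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("validated-ad-events")   // topic name is illustrative
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                .setTransactionalIdPrefix("ad-events-")
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "ad-events")
                // ... validation / enrichment operators would go here ...
                .sinkTo(sink);

        env.execute("exactly-once-ad-events");
    }
}
```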
Source: https://eng.uber.com/real-time-exactly-once-ad-event-processing/
Incoming raw ad events are validated, deduplicated with the help of Flink’s keyed state, stored temporarily in Uber’s Docstore database, and aggregated into one-minute tumbling windows. The authors explain that they chose the one-minute window because it is small enough to provide good granularity for analysis, yet large enough to avoid overloading their databases with write operations. Each aggregation result is assigned a unique identifier.
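As a rough illustration of this stage, the sketch below (Java; the event and aggregate types, field names, and record-id scheme are assumptions, not the article's) deduplicates events on their id with Flink's keyed state, aggregates them into one-minute tumbling windows, and assigns each aggregate an identifier that downstream stores can use for idempotent writes.

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// Illustrative event and aggregate types (not from the article).
record AdEvent(String eventId, String adId, long timestamp) {}
record AdAggregate(String recordId, String adId, long count) {}

/** Emits an event only the first time its id is seen, using Flink's keyed state. */
class DeduplicateByEventId extends RichFlatMapFunction<AdEvent, AdEvent> {
    private transient ValueState<Boolean> seen;

    @Override
    public void open(Configuration parameters) {
        seen = getRuntimeContext().getState(
                new ValueStateDescriptor<>("seen", Boolean.class));
    }

    @Override
    public void flatMap(AdEvent event, Collector<AdEvent> out) throws Exception {
        if (seen.value() == null) {      // first occurrence of this event id
            seen.update(true);
            out.collect(event);
        }
    }
}

/** Counts events per ad in each window and assigns the result a unique record id. */
class AggregatePerMinute extends ProcessWindowFunction<AdEvent, AdAggregate, String, TimeWindow> {
    @Override
    public void process(String adId, Context ctx, Iterable<AdEvent> events,
                        Collector<AdAggregate> out) {
        long count = 0;
        for (AdEvent ignored : events) {
            count++;
        }
        // A deterministic id (key plus window start) stays stable across replays,
        // which is what makes downstream upserts and deduplication idempotent.
        out.collect(new AdAggregate(adId + "_" + ctx.window().getStart(), adId, count));
    }
}

class AdAggregationPipeline {
    static DataStream<AdAggregate> build(DataStream<AdEvent> validatedEvents) {
        return validatedEvents
                .keyBy(event -> event.eventId())                       // dedup per event id
                .flatMap(new DeduplicateByEventId())
                .keyBy(event -> event.adId())                          // aggregate per ad
                .window(TumblingEventTimeWindows.of(Time.minutes(1)))  // one-minute tumbling window
                .process(new AggregatePerMinute());
    }
}
```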
Another Flink job correlates order and ad events, and its results also receive a unique identifier. The authors add that the time-to-live setting on ad events stored in the Docstore database guarantees that only relevant events are held: they exist only for the duration of their correlation window.
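The article keeps candidate ad events in Docstore and relies on the database's TTL to drop them once the correlation window has passed. The same keep-only-while-relevant idea can be sketched with Flink's state TTL instead; the snippet below (reusing the illustrative AdEvent type from the previous sketch; the six-hour window and the attribution logic are assumptions) expires buffered ad events after the correlation window, so later orders simply find no match.

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

// Illustrative order and output types (not from the article).
record OrderEvent(String orderId, String adId, long timestamp) {}
record AttributedOrder(String recordId, String orderId, String adId) {}

/** Correlates orders with the most recent ad event seen for the same key. */
class AttributeOrders extends RichCoFlatMapFunction<AdEvent, OrderEvent, AttributedOrder> {
    private transient ValueState<AdEvent> lastAdEvent;

    @Override
    public void open(Configuration parameters) {
        // Expire buffered ad events after the correlation window (the length is an
        // assumption), so only events still eligible for attribution are retained.
        StateTtlConfig ttl = StateTtlConfig.newBuilder(Time.hours(6))
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                .build();
        ValueStateDescriptor<AdEvent> descriptor =
                new ValueStateDescriptor<>("last-ad-event", AdEvent.class);
        descriptor.enableTimeToLive(ttl);
        lastAdEvent = getRuntimeContext().getState(descriptor);
    }

    @Override
    public void flatMap1(AdEvent ad, Collector<AttributedOrder> out) throws Exception {
        lastAdEvent.update(ad);                  // remember the latest ad event for this key
    }

    @Override
    public void flatMap2(OrderEvent order, Collector<AttributedOrder> out) throws Exception {
        AdEvent ad = lastAdEvent.value();        // null if no ad event arrived or the TTL expired
        if (ad != null) {
            // The correlation result also gets its own unique record identifier.
            out.collect(new AttributedOrder(order.orderId() + "_" + ad.eventId(),
                    order.orderId(), ad.adId()));
        }
    }
}
```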
According to the authors, the generated unique record identifiers are used with Pinot’s upsert feature to ensure that records with the same identifier are never duplicated, maintaining exactly-once semantics. They serve a similar purpose in Hive, where they are used for record deduplication.
The authors point out that their Pinot deployment is active-active across their two regions and does not replicate data. Therefore, a Flink job unions events from both regions, ensuring that the same data is stored in both.
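A minimal sketch of such a union job, assuming one Kafka source per region (the bootstrap addresses and topic name are made up), could look like the following; the merged stream would then be written to the region-local Pinot table so that both regions end up with the complete data set.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CrossRegionUnionJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> regionA = regionSource("kafka-region-a:9092");
        KafkaSource<String> regionB = regionSource("kafka-region-b:9092");

        // Union the streams from both regions so each region's Pinot table
        // receives the same, complete set of records.
        DataStream<String> all = env
                .fromSource(regionA, WatermarkStrategy.noWatermarks(), "region-a")
                .union(env.fromSource(regionB, WatermarkStrategy.noWatermarks(), "region-b"));

        all.print();    // stand-in for the region-local Pinot sink

        env.execute("cross-region-union");
    }

    private static KafkaSource<String> regionSource(String bootstrapServers) {
        return KafkaSource.<String>builder()
                .setBootstrapServers(bootstrapServers)
                .setTopics("ad-order-aggregates")     // topic name is illustrative
                .setGroupId("pinot-union-job")
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
    }
}
```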
Apache Kafka is a distributed event streaming platform widely used in the industry. Apache Flink performs stateful computations on streaming data and was chosen for its low latency, reliability, and exactly-once characteristics. Apache Pinot is a distributed OLAP datastore designed for user-facing, latency-sensitive analytical applications.