Netflix recently published how it built Marken, a scalable annotation service using Cassandra, ElasticSearch and Iceberg. Marken allows storing and querying annotations, or tags, on arbitrary entities. Users define versioned schemas for their annotations, which include out-of-the-box support for temporal and spatial objects.
An annotation is a piece of metadata that can be attached to an object from any domain. A simple example is the tag "Movie Entity with id 1234 has violence". A more complex one comes from an ML algorithm that identifies characters in a frame and stores that information in a queryable format. Marken lets users define an annotation schema, store annotations, and query them in real time through CRUD and search operations, while also making the data available for offline analytics.
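To make the two examples concrete, the sketch below models them as Python dataclasses. The field and schema names (`content_flags_v1`, `character_v1`, the `TimeRange` and `BoundingBox` shapes) are illustrative assumptions, not Marken's actual data model:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shapes for illustration; names are assumptions, not Marken's schema.

@dataclass
class TimeRange:
    start_ms: int   # offset into the title, in milliseconds
    end_ms: int

@dataclass
class BoundingBox:  # normalized coordinates within a frame
    x: float
    y: float
    width: float
    height: float

@dataclass
class Annotation:
    entity_id: str                           # e.g. a movie entity id such as "1234"
    schema_name: str                         # versioned annotation schema
    labels: dict
    time_range: Optional[TimeRange] = None   # temporal scope of the tag
    region: Optional[BoundingBox] = None     # spatial scope within the frame

# A simple tag: "Movie Entity with id 1234 has violence"
simple = Annotation(entity_id="1234", schema_name="content_flags_v1",
                    labels={"flag": "violence"})

# A richer ML-produced tag: a character detected over a span of frames
detected = Annotation(
    entity_id="1234",
    schema_name="character_v1",
    labels={"character": "some_character"},
    time_range=TimeRange(start_ms=61_000, end_ms=64_500),
    region=BoundingBox(x=0.2, y=0.1, width=0.3, height=0.5),
)
```

Versioning the schema name (the `_v1` suffix here) is one way to let annotation definitions evolve without breaking existing queries.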
The team at Netflix chose Apache Cassandra as the source of truth for storing annotations. Cassandra is an open-source, wide-column, NoSQL distributed database that scales horizontally. They chose ElasticSearch to support the search requirements and Apache Iceberg to make data available for offline analytics. The figure below depicts the overall architecture.
Source: https://netflixtechblog.com/scalable-annotation-service-marken-f5ba9266d428
Clients use Marken via one of several APIs. ML algorithms ingest data via schematised data ingestion, while end-user applications (UI applications and other services) use CRUD APIs and a custom query DSL to operate on the data. Internally, Marken uses several auxiliary services, such as a Netflix internal schema service for managing the schemas and Apache Zookeeper.
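The blog post does not show the DSL itself, but a query layer like this typically translates client filters into an ElasticSearch bool query. The sketch below illustrates that translation; the index field names (`entity_id`, `schema_name`, `labels.*`) are assumptions for the example:

```python
# Hypothetical translation of client-side filters into an ElasticSearch
# bool query; the field names are assumptions, not Marken's actual mapping.

def build_es_query(entity_id: str, schema_name: str, label_filters: dict) -> dict:
    """Build an ElasticSearch query body for annotations on one entity."""
    must = [
        {"term": {"entity_id": entity_id}},
        {"term": {"schema_name": schema_name}},
    ]
    # Each label filter becomes an exact-match term clause on a nested field.
    for key, value in label_filters.items():
        must.append({"term": {f"labels.{key}": value}})
    return {"query": {"bool": {"must": must}}}

query = build_es_query("1234", "content_flags_v1", {"flag": "violence"})
```

The resulting body could be passed to a standard ElasticSearch search endpoint; the point is that a DSL gives clients a stable query surface while the service retains freedom to change the underlying index mapping.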
Netflix started with a 12-node Cassandra cluster and scaled up to 24 nodes to support the current data size. Some Netflix titles have more than 3M annotations (most of which are subtitles). Currently, the service has around 1.9 billion annotations with a data size of 2.6TB.
Data ingestion from ML data pipelines presents a unique challenge for system responsiveness. Varun Sekhri, senior software engineer at Netflix, and Meenakshi Jindal, staff software engineer at Netflix, describe this issue and their solution:
Data ingestions from the ML data pipelines are generally in bulk, specifically when a new algorithm is designed and annotations are generated for the full catalogue. We have set up a different stack (fleet of instances) to control the data ingestion flow and hence provide consistent search latency to our consumers. In this stack, we are controlling the write throughput to our backend databases using Java thread pool configurations.
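The throttling idea described in the quote, a fixed-size worker pool bounding concurrent writes to the backend, can be sketched as follows. Netflix uses Java thread pools; this Python version with `ThreadPoolExecutor` shows the same mechanism, and the pool size and writer function are illustrative assumptions:

```python
# Sketch: cap concurrent backend writes with a fixed-size thread pool so bulk
# ML ingestion cannot saturate the database. Pool size is illustrative; in a
# real deployment it would be tuned to backend capacity.
from concurrent.futures import ThreadPoolExecutor
import threading
import time

MAX_CONCURRENT_WRITES = 4

in_flight = 0   # writes currently executing
peak = 0        # highest observed concurrency
lock = threading.Lock()

def write_annotation(annotation: dict) -> int:
    """Stand-in for a Cassandra write; tracks concurrency for demonstration."""
    global in_flight, peak
    with lock:
        in_flight += 1
        peak = max(peak, in_flight)
    time.sleep(0.01)            # simulate I/O latency
    with lock:
        in_flight -= 1
    return annotation["id"]

# The bulk pipeline may submit thousands of writes at once; the pool size,
# not the submission rate, bounds the load on the database.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_WRITES) as pool:
    results = list(pool.map(write_annotation, ({"id": i} for i in range(100))))
```

Running the bulk path on a separate fleet, as Netflix describes, adds a second layer of isolation: even a misconfigured pool cannot steal CPU or connections from the latency-sensitive search stack.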
Search latency for text queries with ElasticSearch is generally in milliseconds, while more complex semantic searches typically complete in several hundred milliseconds. The images below illustrate average system latency in response to search queries.
Average search latency
Source: https://netflixtechblog.com/scalable-annotation-service-marken-f5ba9266d428
Average semantic search latency
Source: https://netflixtechblog.com/scalable-annotation-service-marken-f5ba9266d428
Marken persists all data to Apache Iceberg to accommodate bulk data queries in offline analytics scenarios. Iceberg is a high-performance format for huge analytic tables. It allows querying big data as if it were stored in SQL tables while letting engines like Spark, Trino, Flink, Presto, Hive and Impala work with the same tables. Because such queries run against Iceberg rather than the online stores, they don't impact the latency of online clients (such as UI applications), maintaining overall system responsiveness under load.
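An offline analytics job against the Iceberg table would be an ordinary SQL query submitted to one of the engines above. The sketch below only constructs such a statement (in Spark it would be passed to `spark.sql(...)`); the table and column names are assumptions for illustration:

```python
# Hypothetical offline analytics query over the Iceberg annotation table.
# Table and column names are assumptions, not Marken's actual catalog layout.

def annotations_per_title_sql(table: str = "prod.marken.annotations") -> str:
    """Return a SQL statement counting annotations per entity, largest first."""
    return (
        f"SELECT entity_id, COUNT(*) AS annotation_count "
        f"FROM {table} "
        f"GROUP BY entity_id "
        f"ORDER BY annotation_count DESC"
    )

sql = annotations_per_title_sql()
```

A full-catalogue aggregation like this is exactly the kind of scan that would be prohibitively expensive against Cassandra or ElasticSearch, but is routine against an Iceberg table.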