Twitter uses replicated logs for high-performance data collection and analysis of its systems, and DistributedLog is the system it developed for this purpose. Twitter has also developed Manhattan, a distributed key-value database. Manhattan can trade consistency for lower read latency, following the eventually consistent data model. This means that although the final state is always consistent, there are brief periods during which queries may return different results depending on the replica being queried. This behavior had to be taken into account when designing DistributedLog.
Other requirements for the log service included reliability, high throughput, low latency, scalability, a light operational burden, and simplicity for developers. The workloads of a distributed logging system fall into three categories: appending at the tail of the log, reading from the tail of the log, and reading older entries while catching up towards the tail, where writes happen. For Twitter’s engineers it was important to design a system that could handle all three workloads simultaneously, as this is where things most commonly break in production.
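The three workloads can be illustrated with a minimal in-memory sketch. This is not DistributedLog's API: the class `InMemoryLog` and its methods `append`, `read_from`, and `tail_position` are hypothetical names chosen for illustration, and a single Python list stands in for a replicated log.

```python
class InMemoryLog:
    """Toy append-only log (hypothetical; not DistributedLog's API)."""

    def __init__(self):
        self._entries = []

    def append(self, payload):
        """Workload 1: write at the tail; returns the entry's position."""
        self._entries.append(payload)
        return len(self._entries) - 1

    def tail_position(self):
        """Position at which the next appended entry will land."""
        return len(self._entries)

    def read_from(self, position, max_entries=None):
        """Return entries starting at `position`, optionally capped in count."""
        end = len(self._entries)
        if max_entries is not None:
            end = min(end, position + max_entries)
        return self._entries[position:end]


log = InMemoryLog()
for i in range(5):
    log.append(f"event-{i}")

# Workload 2: a tailing reader sits at the tail and sees only new writes.
tail_reader_pos = log.tail_position()
log.append("event-5")
tail_read = log.read_from(tail_reader_pos)

# Workload 3: a catch-up reader starts from an old position and reads
# in batches until it reaches the tail.
catch_up_pos = 0
catch_up_batches = []
while catch_up_pos < log.tail_position():
    batch = log.read_from(catch_up_pos, max_entries=2)
    catch_up_batches.append(batch)
    catch_up_pos += len(batch)
```

In a real system these three access patterns compete for the same disks and network, which is why handling them simultaneously is the hard part.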
Engineers considered options like Kafka but dismissed it for its lack of strong durability guarantees and its I/O model. Building directly on a consensus algorithm such as Raft or Paxos looked appealing but would have required a significant investment of time and development effort. The team finally settled on Apache BookKeeper.
In contrast to systems such as Apache Kafka, which originated at LinkedIn, BookKeeper focuses on low-level concerns: flexible replication, high performance, and ease of operations. Replication in BookKeeper is handled by the client, is highly configurable, and allows control over where replicas are placed. For performance, BookKeeper separates the three workloads that occur in distributed logs. Writes go to a journal on a dedicated journal disk and to a memtable before an acknowledgement is sent back to the client. This isolates writes from read workloads and keeps write I/O sequential, which helps performance on HDDs. Trailing reads are served from the memtable, while catch-up reads are served from ledger storage, which uses indexed log files with the index held in memory. This isolation results in predictable low latency under heavy load.
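The separation of write, trailing-read, and catch-up-read paths described above can be sketched as a toy storage node. This is a simplified illustration under stated assumptions, not BookKeeper's implementation: the class `ToyStorageNode`, its methods, and the `memtable_limit` flush threshold are all hypothetical, and plain Python lists and dicts stand in for the journal disk and the indexed log files.

```python
class ToyStorageNode:
    """Toy model of the write and read paths (hypothetical names throughout)."""

    def __init__(self, memtable_limit=3):
        self.journal = []    # sequential record; stands in for the journal disk
        self.memtable = {}   # recent entries: entry_id -> payload
        self.entry_log = []  # flushed entries; stands in for indexed log files
        self.index = {}      # in-memory index: entry_id -> offset in entry_log
        self.memtable_limit = memtable_limit  # assumed flush threshold

    def add_entry(self, entry_id, payload):
        # Write path: append to the journal, then to the memtable, then ack.
        self.journal.append((entry_id, payload))
        self.memtable[entry_id] = payload
        if len(self.memtable) > self.memtable_limit:
            self._flush()
        return "ack"

    def _flush(self):
        # Move memtable contents into ledger storage and index them.
        for entry_id in sorted(self.memtable):
            self.index[entry_id] = len(self.entry_log)
            self.entry_log.append(self.memtable[entry_id])
        self.memtable.clear()

    def read_entry(self, entry_id):
        # Trailing reads hit the memtable; catch-up reads use the index.
        if entry_id in self.memtable:
            return self.memtable[entry_id], "memtable"
        if entry_id in self.index:
            return self.entry_log[self.index[entry_id]], "ledger storage"
        raise KeyError(entry_id)


node = ToyStorageNode(memtable_limit=3)
for i in range(5):
    node.add_entry(i, f"e{i}")

recent = node.read_entry(4)  # still in the memtable
older = node.read_entry(1)   # already flushed to ledger storage
```

Because every write only appends to the journal and updates memory, readers of older entries never contend with the write path in this model, which is the property the isolation is meant to provide.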
DistributedLog, which sits on top of BookKeeper, is the serving layer that implements abstractions over logs. It hides the details of BookKeeper’s ledger management behind a high-level API. It also provides naming and metadata for logs, fine-tuned log ownership, and tunable read and write pipelines.
The end result is that adding DistributedLog to a Manhattan write path adds just 10 ms to write latency on average, with 99.9% of the reads being at most 20 ms slower. More information is available in Twitter’s blog post by Senior Software Engineer Leigh Stewart.