The Pinterest engineering team has used OpenTSDB to store and query metrics since 2014. As the volume of metrics data grew, performance issues led the team to develop Goku, its own time series database written in C++, which exposes OpenTSDB-compliant APIs.
Pinterest development teams use Statsboard, a dashboard that integrates metrics from Graphite, Ganglia, and OpenTSDB and is configured declaratively in YAML. In early 2012, Pinterest's monitoring relied on Ganglia, which collected only system metrics and no application metrics. Later that year, Graphite was deployed with statsd for application metrics, followed by a clustered Graphite deployment. OpenTSDB was deployed in 2014, along with a custom metrics agent that pushed data to a Kafka cluster, from which a processing pipeline fed it into OpenTSDB and Graphite. A couple of years ago, their OpenTSDB throughput was 1.5 million points per second. The team ran into Java garbage collection issues as well as frequent crashes of HBase, which OpenTSDB uses as its backend store. It is worth noting that Pinterest runs a large HBase deployment for many of its services.
Goku, their in-house time series database engine, attempts to improve on specific areas of OpenTSDB: it uses an inverted index instead of an HBase scan, compresses data points better, aggregates queries across the cluster, and uses a faster serialization format. Goku uses the Facebook Gorilla in-memory storage engine to store recent data, with persistent storage on NFS. Pinterest is hosted on EC2, but it is not clear from the article whether they use AWS EFS or a self-hosted solution. The authors mention that Goku reads data back from disk into memory when it restarts.
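To make the inverted index idea concrete, here is a minimal Python sketch of how a metric name and tag pairs might map to sets of time series ids, so that a query intersects small id sets instead of scanning rows. The class and method names (`InvertedIndex`, `add_series`, `lookup`) are hypothetical and chosen for illustration; this is not Goku's actual data structure or API.

```python
from collections import defaultdict


class InvertedIndex:
    """Illustrative index: metric name and each tag pair point to series ids."""

    def __init__(self):
        self._by_metric = defaultdict(set)  # metric name -> series ids
        self._by_tag = defaultdict(set)     # (tag key, tag value) -> series ids
        self._ids = {}                      # (metric, sorted tags) -> series id

    def add_series(self, metric, tags):
        """Register a series and return its id (existing id if already known)."""
        key = (metric, tuple(sorted(tags.items())))
        if key in self._ids:
            return self._ids[key]
        series_id = len(self._ids)
        self._ids[key] = series_id
        self._by_metric[metric].add(series_id)
        for pair in tags.items():
            self._by_tag[pair].add(series_id)
        return series_id

    def lookup(self, metric, tag_filters):
        """Return ids of series for `metric` that match every tag filter."""
        result = set(self._by_metric.get(metric, set()))
        for pair in tag_filters.items():
            # Intersect posting sets; a missing tag pair matches nothing.
            result &= self._by_tag.get(pair, set())
        return result


index = InvertedIndex()
web1 = index.add_series("cpu.user", {"host": "web1", "dc": "east"})
web2 = index.add_series("cpu.user", {"host": "web2", "dc": "east"})
index.add_series("mem.used", {"host": "web1", "dc": "east"})
matching = index.lookup("cpu.user", {"dc": "east"})  # ids of both cpu.user series
```

The lookup cost depends on the size of the posting sets involved rather than on the total number of stored rows, which is the general motivation for preferring an index over a scan.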
Goku's querying model is identical to that of OpenTSDB. The team wrote their own query aggregation layer to fan out queries across shards and aggregate the results. Goku uses a two-level sharding strategy, based first on the metric name and then on the tag key-value pairs. Queries are handled by a Goku proxy, which sends each query to the individual Goku shards. The shards use the inverted index to get the ids of the requested time series, fetch the data, run the individual aggregators (downsampling, summing, etc.), and send the results back to the proxy. The proxy performs a second round of aggregation and returns the result to the client. Another improvement is that Goku uses Apache Thrift's binary format instead of OpenTSDB's JSON format.
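The sharding and two-round aggregation described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Goku's implementation: it assumes the metric name hashes to a shard group and the tag set hashes to a shard within that group, it uses a plain filter instead of an inverted index, and it supports only a sum aggregator. The names `shard_for`, `Shard`, `Proxy`, `NUM_GROUPS`, and `SHARDS_PER_GROUP` are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_GROUPS = 4        # assumed number of shard groups (first level)
SHARDS_PER_GROUP = 2  # assumed shards per group (second level)


def group_for(metric):
    """First level: the metric name selects a shard group."""
    return zlib.crc32(metric.encode()) % NUM_GROUPS


def shard_for(metric, tags):
    """Second level: the tag key-value pairs select a shard within the group."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return group_for(metric), zlib.crc32(tag_str.encode()) % SHARDS_PER_GROUP


class Shard:
    def __init__(self):
        self.series = {}  # (metric, sorted tag tuple) -> list of (ts, value)

    def write(self, metric, tags, ts, value):
        key = (metric, tuple(sorted(tags.items())))
        self.series.setdefault(key, []).append((ts, value))

    def query_sum(self, metric, tag_filters, bucket_seconds):
        """First round: downsample into time buckets and sum locally."""
        partial = defaultdict(float)
        for (name, tags), points in self.series.items():
            tag_map = dict(tags)
            if name != metric:
                continue
            if any(tag_map.get(k) != v for k, v in tag_filters.items()):
                continue
            for ts, value in points:
                partial[ts - ts % bucket_seconds] += value
        return dict(partial)


class Proxy:
    def __init__(self):
        self.shards = {(g, s): Shard()
                       for g in range(NUM_GROUPS)
                       for s in range(SHARDS_PER_GROUP)}

    def write(self, metric, tags, ts, value):
        self.shards[shard_for(metric, tags)].write(metric, tags, ts, value)

    def query_sum(self, metric, tag_filters, bucket_seconds):
        """Fan out to the metric's shard group, then merge the partial sums."""
        group = group_for(metric)
        merged = defaultdict(float)
        for s in range(SHARDS_PER_GROUP):
            partial = self.shards[(group, s)].query_sum(
                metric, tag_filters, bucket_seconds)
            for bucket, value in partial.items():
                merged[bucket] += value  # second round of aggregation
        return dict(sorted(merged.items()))


proxy = Proxy()
proxy.write("cpu.user", {"host": "web1"}, 0, 1.0)
proxy.write("cpu.user", {"host": "web1"}, 30, 2.0)
proxy.write("cpu.user", {"host": "web2"}, 10, 3.0)
proxy.write("cpu.user", {"host": "web2"}, 70, 4.0)
totals = proxy.query_sum("cpu.user", {}, 60)
```

Summing is used here because partial sums from each shard can simply be added at the proxy. An aggregator such as an average would need each shard to return both a sum and a count so the proxy can combine them correctly.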
Using Goku has reduced latency, resource requirements, and dataset size at Pinterest. Goku is written in C++ and is fully compliant with the OpenTSDB APIs. It has many similarities with another Pinterest project called Yuvi, which is written in Java. Other engineering teams that have written or customized their own time series metrics collection and query systems include Vivint, Uber, Improbable, and Criteo.