At last month’s FOSDEM, a member of the Criteo SRE team delivered a talk on scaling their Graphite installation using Cassandra as a storage backend. BigGraphite, a custom Graphite plugin written by the Criteo engineering team, replaces the default Whisper database with Cassandra in order to achieve fault tolerance and elastic scaling.
Criteo’s requirements -- fault tolerance and elastic scaling -- are problems that distributed databases have already solved, so the team leveraged Graphite’s plugin architecture rather than build new storage from scratch. The resulting plugin, open-sourced as BigGraphite, is designed to support different backends but currently supports only Cassandra.
Whisper, the default database that ships with Graphite, stores data in fixed-size files: since Graphite stores data with pre-configured retention periods, with older data usually kept at a lower resolution, the number of points per metric is known in advance. Criteo’s metrics collection spans six datacenters with 20k+ servers, which translates to writing around 800k points/second. The team has 2000+ dashboards with 1000+ alerts that are evaluated every five minutes. Graphite’s default configuration, including its storage backend, was unable to meet the demands of such a setup. Apart from the single-file-per-metric model, which wastes disk space, Graphite’s clustering is not robust and it is not "truly elastic", according to the talk. In addition, Whisper’s command line tools for data model manipulation are slow and brittle.
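The fixed-size property follows directly from the retention policy. A minimal sketch of the arithmetic, using constants that match Whisper's documented on-disk format (a 16-byte file header, a 12-byte header per archive, and 12 bytes per point -- a 4-byte timestamp plus an 8-byte double):

```python
# Sketch: why a Whisper file's size is fully determined by its
# retention policy. Constants follow Whisper's on-disk structs
# (metadata "!2LfL" = 16 bytes, archive header "!3L" = 12 bytes,
# point "!Ld" = 12 bytes).

POINT_SIZE = 12          # 4-byte timestamp + 8-byte double per point
METADATA_SIZE = 16       # file-level header
ARCHIVE_HEADER_SIZE = 12 # header per retention archive

def whisper_file_size(retentions):
    """retentions: list of (seconds_per_point, seconds_retained) tuples."""
    size = METADATA_SIZE + ARCHIVE_HEADER_SIZE * len(retentions)
    for seconds_per_point, seconds_retained in retentions:
        # Each archive pre-allocates one slot per point it can hold.
        size += (seconds_retained // seconds_per_point) * POINT_SIZE
    return size

# e.g. one-minute resolution for 7 days, then one-day resolution for ~6 months
retentions = [(60, 7 * 86400), (86400, 180 * 86400)]
print(whisper_file_size(retentions))  # bytes per metric, fixed at creation
```

Multiplied by millions of metrics, this per-metric pre-allocation is where the "wasteful" criticism comes from: the space is reserved even for sparsely reported metrics.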
Image courtesy: Talk slides
In BigGraphite's architecture, a Carbon relay in each datacenter forwards events to Carbon cache processes that write to Cassandra. The relay handles replication and sharding by pushing data to multiple Carbon cache processes, which write the metrics data to storage. The move to BigGraphite also required changes to Graphite’s web UI.
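The relay's replication-and-sharding role can be sketched as hashing each metric name to a set of cache destinations. This is an illustrative simplification, not Carbon's actual consistent-hash ring, and the instance names are hypothetical:

```python
import hashlib

# Hypothetical cache instances; Carbon's real relay uses a configurable
# destination list and a consistent-hash ring rather than this simple
# modulo scheme.
CACHES = ["cache-a:2004", "cache-b:2004", "cache-c:2004"]
REPLICATION = 2  # each metric is pushed to this many caches

def destinations(metric, caches=CACHES, replicas=REPLICATION):
    # Hash the metric name to pick a starting cache, then take the
    # next `replicas` caches in ring order. Every relay computes the
    # same answer, so a metric always lands on the same caches.
    start = int(hashlib.md5(metric.encode()).hexdigest(), 16) % len(caches)
    return [caches[(start + i) % len(caches)] for i in range(replicas)]

print(destinations("servers.dc1.web42.cpu.load"))
```

Because the mapping is deterministic, any relay can route any metric without coordination, and reads know which caches may hold not-yet-flushed points.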
The Cassandra time series schema is described in the talk, although there is not much detail about how tags for a given metric are stored or queried. Each row in the Cassandra tables has the metric name and the starting timestamp as the primary key, with the column keys being offsets from the starting timestamp. Graphite stores metrics data in retention stages, e.g. seven days of data at one-minute resolution, six months at one-day resolution, and so on, with an aggregation function computing the lower-resolution points for older data. This is reflected in the Criteo team’s design for Cassandra tables: there are multiple tables for a given metric, one per retention stage -- i.e. per resolution at which the data points for a specific period are stored.
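The row layout described above can be sketched as a small key-computation function. The partition span and names here are hypothetical choices for illustration, not BigGraphite's actual schema:

```python
# Illustrative sketch of the described row layout: the row key is
# (metric name, start-of-period timestamp) and the column key is the
# point's offset within that period. PARTITION_SECONDS is a hypothetical
# row span, not a value taken from BigGraphite.
PARTITION_SECONDS = 3600

def row_key(metric, timestamp, resolution):
    """Return ((metric, period_start), column_offset) for one data point.

    resolution: seconds per point for this retention stage's table.
    """
    base = timestamp - (timestamp % PARTITION_SECONDS)
    offset = (timestamp - base) // resolution
    return (metric, base), offset

partition, offset = row_key("servers.web42.cpu.load", 1_700_000_123, 60)
print(partition, offset)
```

Grouping many points under one (metric, start-timestamp) row keeps a time range's points physically contiguous, so a dashboard query for a window becomes a small number of row reads rather than one lookup per point.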
The team evaluated multiple time series data stores -- OpenTSDB, Cyanite, KairosDB and InfluxDB -- in addition to Cassandra. OpenTSDB requires an HBase backend, and the Criteo team ruled it out because running a dedicated HBase cluster, in addition to the one they already operated for other purposes, was impractical. The other options lacked the required features at the time the evaluation was done, and were dropped as well.
Criteo’s Cassandra cluster runs with 20 nodes. The team is currently working on introducing Prometheus and is building a bridge between the two systems.