Twitter has replaced Storm with Heron which provides up to 14 times more throughput and up to 10 times less latency on a word count topology, and helped them reduce the needed hardware to a third.
Twitter used Storm to analyze large amounts of data in real time for years, and open sourced it back in 2011. The project was later incubated at Apache, becoming a top level project last fall. Having a quarterly release cycle, Storm has reached version 0.9.5 and is approaching the stable and desired version 1.0. But all this time, Twitter has been working on a replacement called Heron because Storm is no longer up to the task for their real-time processing needs.
Twitter’s new real-time requirements are: “billions of events per minute; have sub-second latency and predictable behavior at scale; in failure scenarios, have high data accuracy, resiliency under temporary traffic spikes and pipeline congestions; be easy to debug; and simple to deploy in a shared infrastructure.” And according to Karthik Ramasamy, Storm/Heron Team Leader at Twitter, they have considered several options to meet these requirements: enhance Storm, use a different open source solution or create a new one. Enhancing Storm would take too long and no other system met their scaling, throughput and latency needs. Plus, other systems are not compatible with Storm’s API, requiring rewriting all topologies. The decision was to create Heron, but keep its external API compatible with Storm’s.
Topologies are deployed to an Aurora scheduler which in turn runs them as a job consisting of several containers (cgroups): a Topology Master, a Stream Manager, a Metrics Manager (for performance monitoring) and multiple Heron Instances (spouts and bolts). A topology’s metadata is kept in ZooKeeper. The processing flow is adjusted using a backpressure mechanism to control the amount of data flowing through the topology. Besides Aurora, Heron can use other service schedulers such as YARN or Mesos. Instances run user code written in Java, and there is one JVM per instance. Heron processes communicate with each other via protocol buffers and there can be multiple containers on a single machine. (More details on Heron’s internal architecture can be found in the paper, Twitter Heron: Stream Processing at Scale.)
Twitter has completely replaced Storm with Heron which is currently processing “several tens of terabytes of data, generating billions of output tuples” per day, “delivering 6-14X improvements in throughput, and 5-10X reductions in tuple latencies” on a standard word count test, and resulting in a 3X reduction in hardware.
When we asked if Twitter intends to open source Heron, Ramasamy said “in the short term no, long term maybe.”