Netflix’s Pushy: Evolution of Scalable WebSocket Platform That Handles 100Ms Concurrent Connections

Netflix shared details on the evolution of Pushy, a WebSocket messaging platform that supports push notifications and inter-device communication across the many device types running the company’s products. Netflix’s engineers implemented numerous improvements across the Pushy ecosystem to ensure the platform's scalability and reliability and to support new capabilities.

Pushy is a vital part of an ecosystem of services that delivers notification messages to the Netflix app running on various devices, including TVs, game consoles, mobile phones, and web browsers. This part of Netflix’s platform has been around for some years, and Susheel Aroskar talked about Zuul Push at QCon New York 2018, discussing motivations, goals, and the initial design.

In a recent blog post, Karthik Yagna, Baskar Odayarkoil, and Alex Ellis, software engineers at Netflix, discussed the historical growth in the usage of Pushy and the need to modernize the platform:

Over the last five years, Pushy has gone from tens of millions of concurrent connections to hundreds of millions of concurrent connections, and it regularly reaches 300,000 messages sent per second. To support this growth, we’ve revisited Pushy’s past assumptions and design decisions with an eye towards both Pushy’s future role and future stability.

The engineers implemented improvements across many services involved in notification delivery, rewriting some entirely with scalability, performance, and reliability in mind. The team tackled Pushy's scalability by switching to a more performant and cost-effective instance type and increasing the average number of concurrent connections per node from 60,000 to 200,000, with the headroom to go up to 400,000. The service also relies on an exponential scaling policy based on the number of connections to balance them equally across the server pool.
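To put the per-node figures in perspective, the short calculation below estimates how many nodes a fleet would need at each density. It is a back-of-the-envelope sketch only: the total of 100 million connections is an assumed round number taken from the headline, and `nodes_needed` is a hypothetical helper, not anything from Netflix's tooling.

```python
def nodes_needed(total_connections: int, connections_per_node: int) -> int:
    """Smallest number of nodes able to hold total_connections (integer ceiling division)."""
    return -(-total_connections // connections_per_node)


# Assumed round figure for illustration; the article only says "hundreds of millions".
TOTAL_CONNECTIONS = 100_000_000

# Per-node densities quoted in the article: old average, new average, stated headroom.
for per_node in (60_000, 200_000, 400_000):
    print(per_node, nodes_needed(TOTAL_CONNECTIONS, per_node))
```

Under that assumption, moving from 60,000 to 200,000 connections per node cuts the required node count by more than two thirds, before any difference in instance pricing is considered.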

The Evolution of Pushy Ecosystem (Source: Netflix Technical Blog)

Furthermore, the company replaced Dynomite, Netflix’s open-source Redis wrapper, with KeyValue, a generic key-value database service developed internally, as the cache implementation in the notification delivery flow (Push Registry). The Message Processor, previously built as a Mantis job, was reworked into a standalone Spring Boot service that consumes messages from an Apache Kafka topic, which lets it leverage automatic horizontal scaling, canary and red/black deployments, and better observability.

Over the years, Netflix's use cases for notification delivery have changed, with more emphasis on moving from asynchronous delivery to direct (synchronous) push and device-to-device messaging. New features required new components in the platform, such as the Device List Service, which provides the ability to look up customer devices registered with the same account. The service, combined with the Push Registry, enables inter-device messaging involving source and target Pushy instances.
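The lookup flow described above can be sketched roughly as follows. This is an illustrative in-memory model, not Netflix's implementation: the class names mirror the services named in the article, but every method, field, and identifier (`devices_for`, `lookup`, `route_message`, the instance and device IDs) is a hypothetical stand-in.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeviceListService:
    """Hypothetical stand-in: maps an account to the devices registered under it."""
    devices_by_account: dict = field(default_factory=dict)

    def devices_for(self, account_id: str) -> list:
        return list(self.devices_by_account.get(account_id, []))


@dataclass
class PushRegistry:
    """Hypothetical stand-in: maps a connected device to the Pushy instance holding its socket."""
    instance_by_device: dict = field(default_factory=dict)

    def lookup(self, device_id: str) -> Optional[str]:
        return self.instance_by_device.get(device_id)


def route_message(source_device: str, account_id: str, payload: str,
                  devices: DeviceListService, registry: PushRegistry) -> list:
    """Return (target_instance, target_device, payload) for each other connected device."""
    deliveries = []
    for device_id in devices.devices_for(account_id):
        if device_id == source_device:
            continue  # do not echo the message back to the sender
        instance = registry.lookup(device_id)
        if instance is None:
            continue  # device is registered to the account but not currently connected
        deliveries.append((instance, device_id, payload))
    return deliveries
```

In this model the source Pushy instance would call `route_message` and forward each result to the target instance, which then writes the payload to the device's WebSocket connection.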

Inter-device Messaging Support (Source: Netflix Technical Blog)

The team created an internal protocol stack to enable many bespoke messaging use cases, with three distinct layers: WebSocket and Pushy, device-to-device protocol, and client app protocol. The device-to-device protocol uses JSON to transport app-level messages between devices.
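A minimal sketch of how such layering might look is shown below, assuming a JSON envelope for the device-to-device layer that carries an opaque app-level message. The field names (`source`, `target`, `payload`) and the functions `wrap_d2d` and `unwrap_d2d` are invented for illustration; the article does not describe the actual message schema.

```python
import json


def wrap_d2d(source_device: str, target_device: str, app_message: dict) -> str:
    """Wrap an app-level message in a hypothetical device-to-device JSON envelope."""
    envelope = {
        "source": source_device,   # assumed field name
        "target": target_device,   # assumed field name
        "payload": app_message,    # client app protocol content, opaque to this layer
    }
    return json.dumps(envelope)


def unwrap_d2d(frame: str) -> dict:
    """Parse a device-to-device frame and return the app-level message it carries."""
    envelope = json.loads(frame)
    return envelope["payload"]


# The resulting string is what would travel inside a WebSocket text frame.
frame = wrap_d2d("phone-1", "tv-1", {"action": "play", "titleId": 123})
```

Keeping the app-level message opaque to the middle layer is what would let client teams evolve their own protocol without changes to the transport beneath it.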

Further optimizations in the messaging platform included caching target device data to improve lookup times, WebSocket connection management and tuning, and a switch to an OkHttp client.
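One common way to cache lookup results such as target device data is a small time-to-live cache in front of the slower source. The sketch below illustrates that general technique only; the article does not say how Netflix implemented its cache, and `TtlCache`, its parameters, and the loader function are all hypothetical.

```python
import time
from typing import Callable


class TtlCache:
    """Caches loader results per key and reloads them once they are older than ttl_seconds."""

    def __init__(self, loader: Callable, ttl_seconds: float,
                 clock: Callable = time.monotonic):
        self._loader = loader      # slow lookup, e.g. a call to a registry service
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so tests can control time
        self._entries = {}         # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]        # fresh hit: skip the slow lookup
        value = self._loader(key)  # miss or expired: reload and store
        self._entries[key] = (value, now + self._ttl)
        return value
```

The TTL bounds how long a stale device-to-instance mapping can be served after a device reconnects elsewhere, at the cost of one slow lookup per key per interval.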
