Discord optimized its platform to serve over one million online users in a single server while maintaining a responsive user experience. The company evolved the guild component, which is responsible for fanning out billions of message notifications, through a series of performance and scalability improvements supported by system observability and performance tuning.
Key elements of the Discord platform are implemented in Elixir, a functional language that runs on the Erlang VM. Elixir-based components are responsible for routing and delivering message notifications to users. The guild server is a central hub that manages various business flows for the Discord community it serves. The guild process interacts with many session processes that, in turn, use WebSocket connections to deliver messages to client apps on user devices. Another key element of the architecture is the API service, written in Python, which is responsible for persisting messages in ScyllaDB.
Message Flow Through Discord Platform (Source: Discord Engineering Blog)
Given the previous design choices and platform constraints, the Discord team had to ensure that the guild process could continue handling an ever-increasing number of online users, which in the Midjourney community, for instance, exceeded one million. Yuliy Pisetsky, staff software engineer at Discord, discusses user experience considerations in relation to server performance:
In addition to overall throughput concerns, some operations end up getting slower the larger a server is. Ensuring that we can keep almost all operations quick is important to the server feeling like it's responsive: when a message is sent, others should see it right away; when someone joins a voice channel, they should be able to start participating right away. Taking multiple seconds to process an expensive operation hurts that experience.
Engineers spent considerable time trying to understand system performance. They instrumented the guild process's event processing loop to capture key metrics around message processing times. Using process stack traces, the team analyzed the factors contributing to message processing latency. They also created a helper library that efficiently estimates the memory usage of large objects, which helped them optimize memory consumption. Armed with observability data, they implemented several optimizations and considerably reduced the processing times of some message types.
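The idea of instrumenting an event loop can be illustrated with a minimal sketch. It is written in Python rather than Elixir, and every name in it (TimedEventLoop, dispatch, slowest) is hypothetical; it is not Discord's code, only one plausible way to record per-message-type processing times so that the most expensive types stand out.

```python
import time
from collections import defaultdict


class TimedEventLoop:
    """Hypothetical stand-in for a process that handles typed events."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock                      # injectable so tests can fake time
        self.handlers = {}                      # event type -> handler function
        self.total_seconds = defaultdict(float)
        self.counts = defaultdict(int)

    def register(self, event_type, handler):
        self.handlers[event_type] = handler

    def dispatch(self, event_type, payload):
        # Time every handler call, even when the handler raises.
        start = self.clock()
        try:
            return self.handlers[event_type](payload)
        finally:
            self.total_seconds[event_type] += self.clock() - start
            self.counts[event_type] += 1

    def slowest(self, n=3):
        # Average processing time per event type, most expensive first.
        averages = {
            event_type: self.total_seconds[event_type] / self.counts[event_type]
            for event_type in self.counts
        }
        return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]
```

Calling slowest() after a period of traffic would list the event types with the highest average handling time, which is the kind of signal that can guide where to optimize first.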
The team achieved some big wins by reducing the amount of work for the guild process. They did this by disabling notifications for passive sessions, where users don’t interact with some of the communities they are members of. This change alone made the fanout work 90% less expensive and provided much-needed headroom for the growing number of users.
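A minimal Python sketch of this idea follows. The Session class, its passive flag, and the fan_out function are hypothetical names chosen for illustration, and the real system is written in Elixir; the sketch only shows how skipping passive sessions shrinks the per-message work.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical session record; 'passive' marks a user not viewing the server."""
    user_id: int
    passive: bool = False
    inbox: list = field(default_factory=list)


def fan_out(sessions, message):
    """Deliver the message to active sessions only and return the delivery count."""
    delivered = 0
    for session in sessions:
        if session.passive:
            continue  # passive sessions receive nothing, so they cost nothing here
        session.inbox.append(message)
        delivered += 1
    return delivered
```

In this toy model, a server where nine out of ten sessions are passive would perform one delivery per ten connected sessions. The ratio is chosen only to mirror the reduction reported in the article.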
Developers introduced a new layer of relay processes to help with more efficient delivery of messages between the guild process and session processes. The relay process took over handling some of the business flows, freeing up the guild process to handle even more users.
Relay Process Layer (Source: Discord Engineering Blog)
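The relay layer can be sketched as a two-level fanout. The Python below is an illustration only: the Guild and Relay classes, the modulo-based assignment of users to relays, and the use of plain lists as session inboxes are all assumptions, not details from Discord's implementation.

```python
class Relay:
    """Hypothetical relay that owns a subset of the sessions."""

    def __init__(self):
        self.sessions = []  # each session is modelled as a plain list (its inbox)

    def deliver(self, message):
        for inbox in self.sessions:
            inbox.append(message)
        return len(self.sessions)


class Guild:
    """Hypothetical guild that talks to relays instead of individual sessions."""

    def __init__(self, relay_count):
        self.relays = [Relay() for _ in range(relay_count)]

    def connect(self, user_id, inbox):
        # Assumed placement rule: spread users across relays by user id.
        self.relays[user_id % len(self.relays)].sessions.append(inbox)

    def publish(self, message):
        # The guild performs one send per relay; each relay does the per-session work.
        return sum(relay.deliver(message) for relay in self.relays)
```

With this shape, the number of sends the guild itself performs grows with the number of relays rather than with the number of sessions.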
Other optimizations included using ETS (Erlang Term Storage), an in-memory database, to store and safely share lists of members between processes, and creating a separate sender process to do the fanout to recipient nodes.
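The separate sender process can be approximated in Python with a worker thread and a queue. This is a rough analogy under stated assumptions: Python threads are not Erlang processes, the Sender class and its methods are hypothetical, and the deliver callback stands in for whatever actually sends to a recipient node.

```python
import queue
import threading


class Sender:
    """Hypothetical worker that performs fanout so the caller does not block on it."""

    def __init__(self, deliver):
        self._deliver = deliver          # callback: deliver(node, message)
        self._outbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, message, recipients):
        # The caller only enqueues a job and returns immediately.
        self._outbox.put((message, tuple(recipients)))

    def stop(self):
        # A None job tells the worker to exit after draining earlier jobs.
        self._outbox.put(None)
        self._thread.join()

    def _run(self):
        while True:
            job = self._outbox.get()
            if job is None:
                return
            message, recipients = job
            for node in recipients:
                self._deliver(node, message)
```

The point of the design is that the submitting side pays only the cost of enqueuing a job, while the loop over recipients runs elsewhere.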
In a Hacker News thread, the blog post's author answered community questions about some of the enhancements Discord implemented.