Peter Morgan explains how to architect and build a highly scalable and dynamic system without caching any data. Peter is the head of engineering for the English sports betting company William Hill. He has described their architecture in a talk at QCon San Francisco 2015 and on several other occasions.
Since the odds on sporting events change constantly as the game progresses, so do the values of the bets which can be bought back at any time before settlement. A customer can bet on almost any action in an event such as who will get the next point in tennis, or kick the next soccer goal as well as the final score and the winner. Therefore no data can be cached, and the values of everything in the system must be up-to-date. Independent Erlang processes model the domain objects that are fed data streams from Kafka. The processes can then recalculate instantaneously the system values. Since Erlang processes are naturally distributed, and they share no data, this makes scalability easier. Bets are placed throughout the day, online, by phone, or in storefronts. Failure is not an option, and since the company operates worldwide, there is no real window for system downtime.
The high level architecture is illustrated in this figure:
Game feeds are purchased from vendors which are fed into a trading model.
Each sport has its own trading model. For example, if a kick is made, a million simulations will be run to determine, based on that event what the probabilities of an event happening (such as will Team A or Team B wins). This resultant stream of dynamic pricing data is sent to the betting engine. The trading models are William Hill’s intellectual property.
The betting engine has four parts. Capture is getting the bet from the customer. Cash-In is paying out money to a customer if they want WH to buy back their bet. Settlement is paying out money if a customer wins. Liability is the current measure of what is owed to customers as the games evolve. Each of these results is a stream of the corresponding data. Each stream is represented as a Kafka topic.
Analytics is machine intelligence applied to the data to analyze if the things that the model thinks are happening are actually happening.
Each domain object is an Erlang process. Since the only way you can access an Erlang process is to send it a message, this is a shared-nothing system. Erlang processes give amazing concurrency which allows for natural distribution of the work over a number of cores in a virtual machine. This allows for quick reloading of data for disaster recovery. Everything is in memory, and you send messages the same way whether you are on the same or different networks.
The graph of Erlang processes matches the data graph. So if in a given horse race, the odds on a particular horse change, a message is sent to the process that models that horse. That process then sends a message to all processes that model bets on that horse to reprice themselves. If a horse wins, a message is sent to the process modelling that horse, which sends the appropriate messages to the associated process bets.
Bottlenecks are easy to find in this architecture, they are the processes with the biggest queue.
One of the possible improvements in this architecture is to use Kafka as means to distribute the load to different data centers. Since the system is stateless, you can run multiple settlement engines in different locations and each will do the same work. The wallet can understand that it saw a transaction before and will not post duplicates.
An architecture based on streams and distributed processes allows William Hill to build a scalable, dynamic system that must always be up to date.