Using an event-centric approach, the Continuous Delivery team at eBay has built an orchestrator for continuous delivery that is resilient to failures and able to scale to handle the increasing amount of work in their build pipelines. John Long and Nataraj Sundar have written two blog posts describing their view on the overall benefits of event-sourcing and the advantages they have seen in their application development.
Long and Sundar, both working at eBay, note that the ideas behind event sourcing have been used for a very long time in many fields. In accounting, for instance, every entry is recorded in an immutable way and the current balance is calculated by adding up all relevant entries. When mistakes are made, a new compensating entry is recorded instead of erasing the erroneous one. Comparing this with the progress of code through a development pipeline, they see event sourcing as a natural match.
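The accounting analogy can be sketched in a few lines of Scala. This is an illustrative example, not code from the eBay system: entries are immutable events, the balance is a fold over the full history, and an error is corrected by appending a compensating entry.

```scala
// Ledger entries are immutable events; the balance is derived by replay.
sealed trait LedgerEntry { def amount: BigDecimal }
case class Credit(amount: BigDecimal) extends LedgerEntry
case class Debit(amount: BigDecimal) extends LedgerEntry

object Ledger {
  // Current state is never stored directly; it is a fold over all entries.
  def balance(entries: Seq[LedgerEntry]): BigDecimal =
    entries.foldLeft(BigDecimal(0)) {
      case (acc, Credit(a)) => acc + a
      case (acc, Debit(a))  => acc - a
    }
}

object LedgerDemo extends App {
  // An erroneous debit of 50 is undone by a compensating credit of 50,
  // leaving the recorded history intact.
  val entries = Seq(Credit(100), Debit(50), Credit(50))
  println(Ledger.balance(entries)) // 100
}
```

Because history is append-only, the same replay that produces today's balance can reproduce the balance at any earlier point by folding over a prefix of the entries.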
The system where event sourcing is implemented is the "Enterprise Continuous Delivery (ECD)" system. This is an orchestrator for defining, coordinating and observing the pipelines that move code through pull requests, builds, tests and deployment, and it is used by many internal systems. The component within the ECD where event sourcing is used is the "Pipeline Execution Service (PES)", a service that runs and tracks the pipelines and reports their state to GitHub. It’s written in Scala using the actor-model framework Akka.
Besides the more general benefits of using event sourcing, Long and Sundar note three primary reasons for their use of event sourcing in the PES:
- Concurrency. They have experienced race conditions when events from different parts of a pipeline arrive almost concurrently. Processing each event sequentially makes it easier to resolve such concurrency issues.
- Debugging and traceability. Since they are a small team supporting many pipelines, it’s essential for them to be able to quickly find the reasons for failures and fix them.
- Clarity and correctness. The orchestration is critical for the company and can become very complicated, which requires a high-quality, simple codebase that is easy to understand. Separating the code into parts that record incoming information, calculate the final model, and react to model changes helps achieve that.
Their strongest reason for choosing event sourcing is the last point – clarity and correctness. Long and Sundar argue that for any reasonably complex system involving time and state, event sourcing is the way to go. With a well-designed model, the different parts of a process are handled separately, making the process easier to understand. Looking more into the details of their implementation, they describe four components, and note that each of them is easy to grasp and discuss, as well as easy to change and test.
- Validating incoming events, creating and storing relevant internal events as needed.
- Processing the recorded events in the order they were inserted and making the appropriate update to the view model for each event.
- Persisting the view after each event, making it possible to run queries without having to load and replay all events for each query.
- Acting on state changes and deciding what to do in response to read-model updates. Actions are then performed by launching a new actor for each relevant change.
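The four components above can be sketched as a single event-sourced entity. This is a hypothetical, simplified Scala sketch, not the actual PES code: it uses plain classes rather than Akka actors, and the event and view names are invented for illustration.

```scala
// Illustrative events for one pipeline run (names are hypothetical).
sealed trait RunEvent
case class StepStarted(step: String)  extends RunEvent
case class StepFinished(step: String) extends RunEvent

// The view model derived from the event log and kept current for queries.
case class RunView(started: Set[String] = Set.empty,
                   finished: Set[String] = Set.empty)

class PipelineRun {
  private var log: Vector[RunEvent] = Vector.empty // append-only event log
  private var view: RunView = RunView()            // persisted after each event

  // 1. Validate the incoming event and record it only if it is relevant.
  def record(event: RunEvent): Unit = event match {
    case StepFinished(s) if !view.started(s) =>
      () // reject: a step that never started cannot finish
    case e =>
      log = log :+ e         // 2. events are processed in insertion order
      view = applyTo(view, e) // 3. update the view incrementally per event
      react(e)                // 4. act on the resulting state change
  }

  private def applyTo(v: RunView, e: RunEvent): RunView = e match {
    case StepStarted(s)  => v.copy(started = v.started + s)
    case StepFinished(s) => v.copy(finished = v.finished + s)
  }

  // In the real system this is where a new actor would be launched
  // for each relevant change; here it just logs the reaction.
  private def react(e: RunEvent): Unit = e match {
    case StepFinished(s) => println(s"step $s finished, triggering next step")
    case _               => ()
  }

  // Queries read the maintained view instead of replaying the whole log.
  def currentView: RunView = view
}
```

Keeping the view up to date after every event, rather than recomputing it per query, is what makes point three pay off: reads stay cheap even as the event log grows.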
The system has so far handled over 2.2 million run events, resulting in around 200,000 run views; Long and Sundar claim that an event-sourced architecture has been crucial in implementing a solution that is both performant and intuitive.