Key Takeaways
- Event Sourcing works a bit like a ledger. Every event related to a "thing" (aka an "Aggregate") is recorded in a persistent journal (aka an Event Store). Whenever you need to know the current state of the thing, it can be recreated by "re-applying" all the events for the thing to the thing.
- It's important to respect the boundaries of our bounded contexts and to manage the borders between our domains very carefully in our code. This means avoiding cross-domain dependencies on specific events.
- Should developers custom build core parts of a CQRS and Event Sourcing framework? No.
- Teams have to make two distinct, independent choices: whether to choose an event-driven architecture, and whether or not to apply event sourcing.
The idea of Event Sourcing has been around for a while. But recently, we've seen more real-world examples of this data storage and retrieval pattern. When should you use this approach? What are the architectural implications? Where do you depend on a platform versus an application framework? To answer these questions and more, InfoQ reached out to a pair of experts. Ben Wilcock is a Senior Solutions Architect at Pivotal Labs, and Allard Buijze is Founder and CTO of AxonIQ, the company behind the popular Axon Framework.
InfoQ: How do you explain Event Sourcing, and please give us a real-world example of where this pattern was an improvement over a more traditional architecture.
Buijze: Event Sourcing is a style of storing entities, not by directly storing their state, but rather as a sequence of events (i.e. an event stream) describing all the changes that happened to that entity in the past. The current state of the entity is recalculated by replaying all the past events on an "empty" instance. Generally, an event stream describes the changes that happened to an aggregate: a group of entities that are treated as one unit with respect to state changes. For example, instead of storing the state of an order, and being able to see that a specific item was ordered and that the order was cancelled, with Event Sourcing you would store the fact that the order was created, several items were added, some items were removed, the order was confirmed, then shipped, and then cancelled. The Event Stream contains a lot of information that may be valuable, but would have been lost had only the state been stored.
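The replay idea Allard describes can be sketched in a few lines. This is an illustrative toy (all names hypothetical, not any framework's actual API): an order whose current state is always recalculated from its event stream.

```python
# Illustrative sketch of Event Sourcing: state is never stored directly;
# it is recalculated by replaying past events on an "empty" instance.
from dataclasses import dataclass

@dataclass
class OrderCreated:
    order_id: str

@dataclass
class ItemAdded:
    sku: str

@dataclass
class ItemRemoved:
    sku: str

@dataclass
class OrderCancelled:
    pass

class Order:
    def __init__(self):
        self.items = []
        self.cancelled = False

    def apply(self, event):
        # Each event mutates state; replaying all of them reproduces "now".
        if isinstance(event, ItemAdded):
            self.items.append(event.sku)
        elif isinstance(event, ItemRemoved):
            self.items.remove(event.sku)
        elif isinstance(event, OrderCancelled):
            self.cancelled = True

def load(event_stream):
    """Rebuild current state from an 'empty' instance plus past events."""
    order = Order()
    for event in event_stream:
        order.apply(event)
    return order

stream = [OrderCreated("o-1"), ItemAdded("book"), ItemAdded("pen"),
          ItemRemoved("pen"), OrderCancelled()]
order = load(stream)
```

After the replay, `order.items` is `["book"]` and `order.cancelled` is `True`, yet the stream still records that a pen was briefly in the order, exactly the kind of information a state-only store would have lost.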
Event Sourcing has a lot of advantages (and some disadvantages as well!). It provides, for example, a reliable audit trail of what happened to a component in the past. That is because it doesn't store this "trail" as a side-effect; rather, the trail is the only thing that the current state is based on. Any decision that has been made can be explained by the past events.
In one of the projects I was involved in, we used Event Sourcing to allow this audit trail to be "native" to the application. We knew the information was also going to be useful for analytics, but simply didn't know how, yet. The application was an online card game (Bridge) that allowed users to play tournaments and win real money prizes, somewhat similar to how poker tournaments work. As large money prizes were involved, auditing was important.
Not too far into the project, we noticed we didn't like the domain model we used in the "game engine", the component that implemented the rules of the game. Because we used event sourcing, all of our tests were in the style of "given past events, when executing this command, expect these new events to be published". Both commands and events were driven by functional requirements, rather than technical ones. This meant we could leave the tests as they were, and freely re-implement the "game engine".
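The test style Allard describes can be sketched as follows. This is a hypothetical shape, not a real framework's fixture API (frameworks such as Axon provide similar given/when/then fixtures): the command handler is a pure function from past events and a command to new events, so the tests describe behaviour functionally and survive a re-implementation of the engine behind them.

```python
# Sketch of "given past events, when executing this command,
# expect these new events" style testing for an event-sourced component.
from dataclasses import dataclass

@dataclass
class CardPlayed:          # a past event
    player: str
    card: str

@dataclass
class PlayCard:            # a command
    player: str
    card: str

@dataclass
class CardRejected:        # a new event
    player: str
    reason: str

def handle(past_events, command):
    """Replay history, validate the command, return the new events."""
    already_played = {e.card for e in past_events}
    if command.card in already_played:
        return [CardRejected(command.player, "card already played")]
    return [CardPlayed(command.player, command.card)]

def expect(given, when, then):
    """Given past events, when executing this command, expect these events."""
    assert handle(given, when) == then

# The test only speaks in commands and events -- the implementation of the
# rules inside `handle` can be rewritten freely without touching it.
expect(given=[CardPlayed("bob", "KH")],
       when=PlayCard("jill", "KH"),
       then=[CardRejected("jill", "card already played")])
```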
Some time later, with the system actively used in production, the audit trail proved its worth: a number of complaints came in about users playing "suspicious moves". The lead developer decided to build some analytics tooling to investigate the situation. By replaying past events and creating different view models based on them, he managed to uncover a network of colluding players who had increased their chances of winning. A significant amount of money was seized from their accounts before it was paid out.
Had only the current state been stored, there is a big chance the information needed to uncover this fraudulent activity would have been lost as well.
Wilcock: When I'm with a client, I try to explain event sourcing in a way that resonates with them and reflects the domains that they recognise. So if I'm with a retailer, for example, I might use "Product Catalogue Management" as an example. Traditionally, if a developer were asked to develop a solution for such a domain, they'd probably begin with a design that relies on CRUD-based persistence rather than Event Sourcing. However, as the true nature of the domain is discovered, it soon becomes clear that being limited to a single row record for each product isn't enough. In real life, products get added to catalogues via a complex lifecycle that is full of key decision points (events) that need to be carefully tracked for all kinds of reasons: regulatory and competition compliance, safety, profitability, supply-chain integrity, operations, and many other things all affect the journey of a product into the catalogue, and the product's lifecycle while it's there.
When we try to deal with these complex requirements using a purely CRUD-based model, very quickly we need to start adding "Audit Tables", "Notifications", "Relationships", "Rollbacks" and "Reports", and all this adds a great deal of complexity to our CRUD models. What initially seemed the "simplest thing that could work" soon becomes quite complex and not a particularly neat or elegant solution. By starting with CRUD, we may soon "paint ourselves into a corner".
For this reason, I try to introduce clients to Event Sourcing very early in the design process. Event Sourcing is fundamentally different. It's driven and shaped by the true nature of complex domains like the one I've described above. Event Sourcing treats domain events as first-class citizens of the solution design (rather than as an afterthought, as is often the case with CRUD). For those new to the concept, Event Sourcing works a bit like a ledger (one of the oldest and most successful record-keeping systems we've ever known). Every event related to a "thing" (aka an "Aggregate") is recorded in a persistent journal (aka an Event Store). Whenever you need to know the current state of the thing, it can be recreated by "re-applying" all the events for the thing to the thing. So for example, if we were tracking the RRP of a Product, we'd use PriceChanged events to do this, and those events may show that the product was initially $199 (set by Bob), then dropped to $149 (set by Jill), and is currently $109 (set by Jane). The "Event Store" would hold all these events for this product, and the Product's current price could be ascertained at any time by re-applying its PriceChanged events.
The beauty of this method is that there is no need for a separate audit or price history table. We know Bob, Jill and Jane all changed the product's price at some point in the past because we stored all these events and "sourced" the current Product from them. We can also go "back in time" and show historic prices very easily. And finally, we can retrospectively add meaningful full-history reporting, such as "who is the top price changer since time began", "which quarter has the most price volatility", and other business insights. Because we have every event ever recorded, we can do a lot of really neat things: we can build a new version of the system and see how it behaves with our real-life event records before we go into production!
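Ben's pricing example can be made concrete with a short sketch (illustrative names only): the same PriceChanged stream yields the current price, the full history, and retrospective reports, all without a separate audit table.

```python
# The price history from the interview: Bob -> Jill -> Jane.
# Current price and reports are all derived from the one event stream.
from collections import Counter
from dataclasses import dataclass

@dataclass
class PriceChanged:
    price: int
    changed_by: str

events = [PriceChanged(199, "Bob"),
          PriceChanged(149, "Jill"),
          PriceChanged(109, "Jane")]

def current_price(stream):
    """Re-apply every PriceChanged event; the last one wins."""
    price = None
    for e in stream:
        price = e.price
    return price

def top_price_changer(stream):
    """A retrospective report, added without any schema change."""
    return Counter(e.changed_by for e in stream).most_common(1)[0][0]
```

Here `current_price(events)` is 109, and the full Bob/Jill/Jane history stays queryable "for free" for any future report we dream up.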
It's true that Event Sourcing can have a learning curve associated with it initially, but once the true complexities of an enterprise scale domain are taken into account, it's usually the more elegant solution architecture.
InfoQ: How should we think about cross-domain interactions in an Event Sourcing world? In Ben's example, do we need to maintain a consistency expectation across Aggregate boundaries of PriceEvents and another "thing"? How does this play out in long-running Sagas?
Wilcock: I think I interpret this more generally as a Domain Driven Design question rather than purely as an Event Sourcing question. As such, I would recommend that readers seek out Domain-Driven Design Distilled and Implementing Domain-Driven Design by Vaughn Vernon (@VaughnVernon) for some terrific insights on this topic.
On the specific topic of how to think about cross-domain interactions, I think it's important to respect the boundaries of our bounded contexts and to manage the borders between our domains very carefully in our code. What we probably want to avoid is introducing cross-domain dependencies on specific events (like the PriceChanged event). If we do that (i.e. share the class definitions or formats for our events across bounded contexts), we diminish our ability to write code independently and introduce unintentional physical bindings into the code.
This, of course, affects how we think about consistency, but I think that anyone who buys into the idea of DDD and microservices probably already accepts that cross-domain consistency is difficult, and probably accepts eventual consistency and loose coupling instead. It is for this reason that patterns like the Anti-Corruption Layer are a common feature of DDD architectures.
Buijze: In essence, the concept of Event Sourcing itself does not ever cross boundaries. While exact definitions 'out there' differ, my take is that Event Sourcing is a choice for persisting the state of an object as a series of events, rather than as state. How an object is persisted is a choice that should never leak beyond a component, let alone across contexts.
However, in practice, the events that describe changes to an object are often also very useful for other components, and they get used to synchronize different models - either in the same context (for example, to update view models when applying CQRS) or in different contexts.
As Ben pointed out, in the latter case, extra precaution is required to make sure you don't get unwanted coupling between the contexts. One thing that you want to avoid is simply publishing all events for any component to use. Let's illustrate with an example: a Shipping module is interested in knowing when an order has been placed. But the Order module doesn't emit exactly that event (orders are much more tricky than that). Instead, it emits OrderCreated, ItemAdded, ItemRemoved, PaymentInformationRegistered and OrderConfirmed. The Shipping module would need to listen to all of these events to get the information it requires to do its work. Worse than listening to all the events, it would also be replicating much of the "logic" associated with processing these events, such as how to match an ItemRemoved against an ItemAdded.
This is why events should be treated with care when used cross-context. We generally recommend that our customers only share events between components within the same context. To synchronize components between contexts, you want more coarse-grained events describing what happened. One solution would be to send out a different event, such as OrderPlaced containing the relevant order details, when the order is confirmed. The other is to consider the OrderConfirmed event a so-called "milestone" event, and make it contain more relevant data, so that it still carries value outside of the Order context.
Both solutions have advantages and disadvantages, and are considered part of Strategic Design in DDD.
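The first option above can be sketched as a small translator at the context boundary (hypothetical event and class names beyond those given in the interview): it listens to the Order context's fine-grained events and publishes a single coarse-grained OrderPlaced event, so that Shipping never replicates the ItemAdded/ItemRemoved matching logic.

```python
# A boundary translator: fine-grained Order events in,
# one coarse-grained milestone event (OrderPlaced) out.
from dataclasses import dataclass

@dataclass
class OrderCreated:
    order_id: str

@dataclass
class ItemAdded:
    order_id: str
    sku: str

@dataclass
class ItemRemoved:
    order_id: str
    sku: str

@dataclass
class OrderConfirmed:
    order_id: str

@dataclass
class OrderPlaced:
    """The only event shared outside the Order context."""
    order_id: str
    items: list

class OrderPlacedTranslator:
    """Keeps the ItemAdded/ItemRemoved matching logic in one place."""
    def __init__(self):
        self._items = {}

    def on(self, event):
        if isinstance(event, OrderCreated):
            self._items[event.order_id] = []
        elif isinstance(event, ItemAdded):
            self._items[event.order_id].append(event.sku)
        elif isinstance(event, ItemRemoved):
            self._items[event.order_id].remove(event.sku)
        elif isinstance(event, OrderConfirmed):
            # The milestone event: emitted only at confirmation time.
            return OrderPlaced(event.order_id, list(self._items[event.order_id]))
        return None

translator = OrderPlacedTranslator()
fine_grained = [OrderCreated("o-1"), ItemAdded("o-1", "book"),
                ItemAdded("o-1", "pen"), ItemRemoved("o-1", "pen"),
                OrderConfirmed("o-1")]
milestone = [e for e in map(translator.on, fine_grained) if e is not None]
```

Shipping now subscribes to OrderPlaced alone; the five fine-grained events stay private to the Order context.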
Sagas are "just a component" in this sense. They can live within a context, dealing with low-level, order-related details, for example, or live in between contexts, coordinating some higher-level process, such as making sure an invoice is created when an order is placed and that a shipment is planned. They help reduce the direct coupling of contexts, so that shipping doesn't need to know exactly "when" to create a shipment. A Saga could coordinate that instead.
InfoQ: In Event Sourcing, what's the role of an application framework versus platform component? That is, in an ES/CQRS architecture, what should we expect from our event stream processor or database, versus a Java framework like Axon? Additionally, should developers custom build any core parts of this architecture?
Wilcock: I'll let Allard comment on the comparison between frameworks, stream processing and Event Stores. But on the question of whether developers should custom build the core parts of the framework, my answer would be a firm no; not if they have any choice.
This exact question has come up several times for me recently and my answer is always the same - don't do it. Designing, testing and delivering a safe, production-grade CQRS and Event Sourcing architecture is quite difficult (I'm sure Allard can back me up on this) and it's easy to make mistakes. That's not to say it can't be done, it can, but that doesn't change the fact that it's "undifferentiated heavy lifting".
Building your own CQRS/ES architecture in 2018 is definitely "below the value line" - it doesn't add any value to a regular Bank, Retailer, Manufacturer, Service Provider etc. Asking developers to work on a DIY solution may be technically interesting, but it's a complex exercise that will take time, brains and cash away from the things that really matter like attracting new customers or increasing profitability.
For me, a much more sensible approach is to seek out a reputable open-source alternative and start with that. There are plenty of credible options out there to choose from: For Java developers, the Axon Framework is a particularly good option. It's mature, stable, extensible, works well with Spring Boot and has some great reference customers in a variety of industries. Developers on .NET have many viable options (such as Brighter) as do those using Node (Wolkenkit) and users of several other programming languages. I'd investigate such alternatives first and start there. Because it's open-source you can always fork it later if you really must.
Buijze: Obviously, being the author of Axon Framework, I'm completely biased on this one, but I completely agree with Ben. I'd like to add that the reason Axon Framework exists is that I started implementing an application using CQRS and Event Sourcing, and noticed a large amount of "plumbing" was required. I started sharing the common parts of the code and people started re-using them. This was early 2010. Building your own would mean ignoring eight years of experience.
The way I position Axon Framework nowadays is as a framework that helps separate the business logic from the infrastructure logic, allowing developers to implement the business logic using DDD principles, and providing the (common) infrastructure elements out of the box. Components need to interact, obviously. We have identified three reasons for this interaction, each with its own message type: commands - the system must do something (i.e. change state); events - a notification that something happened; and queries - a request for information. Each of these message types has very distinct routing patterns.
When using proper abstractions, a component doesn't need to know about the exact infrastructure components being used to transport these messages. In fact, it shouldn't even matter whether the components are deployed as part of a single deployable unit, or in different ones. Infrastructure elements must be selected and configured to match the chosen deployment style and non-functional requirements. This is where the choice of event stream processors comes in. It is an application framework's role to make sure these processors can be accessed through proper abstractions, so that the business logic is not tied to implementation-specific choices.
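The three message types and their distinct routing patterns can be sketched with a toy message bus (illustrative only; this is not Axon's actual API): a command is routed to exactly one handler, an event is delivered to every subscriber, and a query is routed to one handler that returns an answer.

```python
# Toy bus illustrating the three routing patterns:
# command -> one handler, event -> all subscribers, query -> one answer.
class MessageBus:
    def __init__(self):
        self._command_handlers = {}   # command name -> single handler
        self._event_listeners = []    # every listener sees every event
        self._query_handlers = {}     # query name -> single handler

    def register_command_handler(self, name, handler):
        self._command_handlers[name] = handler

    def subscribe(self, listener):
        self._event_listeners.append(listener)

    def register_query_handler(self, name, handler):
        self._query_handlers[name] = handler

    def send(self, name, payload):
        """Command: the system must do something; one component decides."""
        for event in self._command_handlers[name](payload):
            self.publish(event)

    def publish(self, event):
        """Event: something happened; notify all interested components."""
        for listener in self._event_listeners:
            listener(event)

    def query(self, name, payload):
        """Query: a request for information; one component answers."""
        return self._query_handlers[name](payload)

bus = MessageBus()
orders = {}

def handle_place_order(order_id):
    orders[order_id] = "placed"          # the state change...
    return [("OrderPlaced", order_id)]   # ...and the resulting event

received = []
bus.register_command_handler("PlaceOrder", handle_place_order)
bus.subscribe(received.append)
bus.register_query_handler("OrderStatus", lambda order_id: orders.get(order_id))

bus.send("PlaceOrder", "o-1")
```

After the `send`, the subscriber has received the `("OrderPlaced", "o-1")` event and `bus.query("OrderStatus", "o-1")` answers `"placed"`; components only ever see the bus, never each other.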
At AxonIQ, we have noticed that most databases (especially the relational ones) are a reasonable fit to serve as an Event Store, but performance suffers as the amount of data in the database grows. While we have a natural tendency to think that NoSQL is the solution, NoSQL databases actually turn out to be less optimal than the good old RDBMS here. It is important to realize what trade-offs a database (especially the NoSQL implementations) has made. These choices often conflict with the expectations of an Event Store implementation. That's why we have recently released AxonDB, a database that is optimized for storing large amounts of events for the purpose of Event Sourcing.
To the question of whether developers should build their own core parts, I'd say no. Unfortunately, every now and then, we have to come to the conclusion that no suitable part exists yet that matches our non-functional requirements.
InfoQ: What advice do you have for teams trying to do Event Sourcing in a large organization with many teams? Where are the pitfalls? What should, and shouldn't, be shared?
Buijze: My recommendation would be to make two clearly distinct choices. One is whether to choose an event-driven architecture at all; the other is whether or not to apply event sourcing. While both techniques empower each other, there is no strict dependency of one on the other.
Whether to use Events as a primary element in the communication between components is an architectural decision. It should be embraced by all (or at least most) components for it to hold true value. I believe the answers to the questions before have already shown that using events is a very powerful approach when building complex systems.
Whether to use Event Sourcing is a decision that needs to be made by each component. There can be directives or guidelines at the architectural level about when to and when not to, but it's a local decision. It is still possible to emit events while not using those events internally as the sole source of state. You will still benefit from the ability to see what has happened (by storing the published events) and to asynchronously inform other components. It's just less reliable as an audit trail, as there is no guarantee that the state and history match. It's like putting a little ledger next to Kenny's graffiti wall, which all artists who overwrite previous work have to sign: the idea is nice, but there's a big chance that, at some point, some entries will be missing. Having no audit trail is probably better than having one that proves you wrong...
I also urge anyone working on large scale (distributed) systems, and especially when using Event Driven Architecture, to read up on Domain Driven Design. In particular the part about Bounded Contexts contains some very good guidelines for making choices on where to put certain logic and how to design the interaction between components. Carefully choose the directions of dependencies. Events are a good way to invert a dependency, but let's not forget that we have Commands and Queries as well.
As a last remark: I am glad to see that Events are given a more dominant position in modern architectures. We must be careful not to over-react and see events everywhere (beware hammers & nails). Let's not forget about the Commands and Queries and start giving them the same amount of attention.
Wilcock: I'd probably start by saying that Event Sourcing in isolation is definitely useful, but its power and potential are amplified when it's used to complement a CQRS architecture and Domain Driven Design. One of the pitfalls of using Event Sourcing in isolation is that it can be viewed simply as an alternative persistence mechanism, but this undermines its wider potential - its ability (with CQRS and DDD) to put distributed event-driven architecture at the centre of an elegant and scalable solution architecture. Similarly, having events simply 'appear' without some traceability or 'cause and effect' back to the commands that brought them into existence also seems like a lost opportunity. Having commands spawn events is a really elegant solution to many systems problems (far more elegant than traditional CRUD approaches in my book).
In a large organisation, another potential pitfall is that of commonality or standardisation between teams. Having many teams, each with their own 'do it yourself' approach to Event Sourcing, could result in a significant waste of time, money and brains, and cause non-trivial maintenance headaches for years to come. It's much better, in my opinion, to start with a common framework as a baseline and only deviate from this framework with strong justification. CQRS and Event Sourcing are a solved problem. Don't reinvent the wheel, and don't have 10 teams re-invent the wheel at the same time.
Finally, take great care with what event data you share between teams. Some events can be shared, but many cannot, not without introducing inter-team dependencies and couplings that will be hard to unpick in the future. Patterns like the 'anti-corruption layer' can prove useful in preventing these unintentional couplings, but can also add some complexity and overhead to your work. Whatever you decide, it's helpful to design events carefully, applying thought to what data should be in an event and what can be left out, and perhaps using event classifications that identify and separate 'internal' events from 'external' events. Getting an event-driven solution design right can be a challenge, but the rewards can be great!
About the Panelists
Ben Wilcock is a Senior Solutions Architect for Pivotal Labs and helps Pivotal’s Fortune 500 clients to go cloud-native using the Pivotal Application Service (PAS) and Pivotal Container Service (PKS). Ben has a passion for CQRS, event sourcing, microservices, cloud, and mobile applications. He's also an established technology blogger whose articles have featured in DZone, Java Code Geeks, InfoQ, The Spring Blog and more. You can follow him on Twitter (@benbravo73) and read his blog.
Allard Buijze is Founder and CTO of AxonIQ. Starting at the age of 6, he has developed a great passion for programming and has guided both large and small organizations in building performant and scalable applications. Now, he is on a mission to make implementations of large scale systems easier, using the concepts of Domain Driven Design, Command-Query Responsibility Segregation and Event Driven Architectures. He created Axon Framework as an experiment initially, but when both large and small organizations started using Axon as a solution to their complexity problems, AxonIQ was born. Through his conviction that good craftsmanship can only be achieved through continuous and intensive exchange of experience with others, Allard is a frequent speaker at conferences and meetups and enjoys giving training to fellow developers and architects. Allard is also regularly found in board rooms, explaining the concepts and values of DDD, CQRS and EDA to C-level executives.