When moving from a monolithic system to a distributed or microservices system, you commonly also move from a single source of truth in one database to many databases, and thus many sources of truth. Using an event-driven architecture and persisting all events as a stream can restore that single source of truth, Ben Stopford claims in one of a series of blog posts about using events and Kafka.
Stopford, an engineer at Confluent, points out that in traditional messaging systems events are ephemeral: there is no history of already consumed events. Persisting all events will, besides creating a single source of truth, also make it possible to rewind and replay events, like a version control system but for data. This also makes it possible to recover a corrupted system by replaying events after a bug is fixed.
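As a minimal sketch of such a rewind using the Kafka Java client (the topic name `orders` and string-serialized events are assumptions for illustration), the consumer below seeks back to the first offset when partitions are assigned and reprocesses the entire history:

```java
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "replay-example");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {}

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Rewind: start from offset 0 instead of the committed position,
                    // reprocessing every event ever written to the topic.
                    consumer.seekToBeginning(partitions);
                }
            });
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```

A newly deployed service can bootstrap itself the same way by starting with a fresh group.id and auto.offset.reset set to earliest.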
A typical event-based system listens to events, updates and persists its state in a database, and emits new events. For Stopford, this poses two challenges: first, maintaining consistency when writing both to a database and to an event log; second, the data in the database and the events may start to diverge, for example because of different code paths, leading to inconsistencies in the system. The best way to solve this is to make events first-class entities and use only the events, as in an event sourcing system.
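To illustrate the event sourcing approach, here is a small Java sketch in which the event log is the only source of truth and current state is just a fold over it; the account events and amounts are invented for the example:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical account events. In an event-sourced system these immutable
// facts are all that is persisted; no separate mutable state is written.
sealed interface AccountEvent permits Deposited, Withdrawn {}
record Deposited(String accountId, long amountCents) implements AccountEvent {}
record Withdrawn(String accountId, long amountCents) implements AccountEvent {}

public class BalanceProjection {
    private final Map<String, Long> balances = new HashMap<>();

    // State is derived purely by applying events in order, so replaying
    // the log from the start reconstructs the view at any point in time.
    public void apply(AccountEvent event) {
        switch (event) {
            case Deposited d -> balances.merge(d.accountId(), d.amountCents(), Long::sum);
            case Withdrawn w -> balances.merge(w.accountId(), -w.amountCents(), Long::sum);
        }
    }

    public long balanceOf(String accountId) {
        return balances.getOrDefault(accountId, 0L);
    }

    public static void main(String[] args) {
        BalanceProjection view = new BalanceProjection();
        List<AccountEvent> log = List.of(
                new Deposited("acc-1", 10_000),
                new Withdrawn("acc-1", 2_500));
        log.forEach(view::apply);
        System.out.println(view.balanceOf("acc-1")); // prints 7500
    }
}
```

Because nothing outside the log is authoritative, the dual-write consistency problem disappears: every view, including this one, can be rebuilt by replay.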
One way of starting with event streams is Change Data Capture (CDC), a technique available in a growing number of databases. With CDC, you write to the database and the writes are converted into an event stream in the background. One advantage Stopford mentions is that this provides a consistency point: you read and write against the database, yet still get an event stream, without needing distributed transactions to keep the database and the stream in sync.
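As a sketch of how such a stream is typically wired up, the snippet below registers a Debezium PostgreSQL connector with a Kafka Connect worker through its REST API. The host names, credentials, and some property names (which vary between Debezium versions) are assumptions; once registered, every committed write to the database appears as an event on a Kafka topic per table, with no changes to the application's write path:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterCdcConnector {
    public static void main(String[] args) throws Exception {
        // Connector config: Debezium tails PostgreSQL's write-ahead log and
        // turns each committed change into a Kafka event in the background.
        String config = """
                {
                  "name": "inventory-connector",
                  "config": {
                    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
                    "database.hostname": "localhost",
                    "database.port": "5432",
                    "database.user": "postgres",
                    "database.password": "postgres",
                    "database.dbname": "inventory",
                    "topic.prefix": "dbserver1"
                  }
                }
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors")) // Kafka Connect REST API
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```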
For Stopford, one of the most important use cases for CDC is migrating away from an old architecture. By connecting to a legacy system’s database using CDC, we can extract an event stream and gradually move consumers off the legacy system and onto the event stream.
A useful pattern when using event sourcing and an event stream is to hold events twice: once in a retention-based topic that keeps every change in chronological order and backs the event-sourced view, and once in a compacted topic that retains only the latest event per entity and is therefore both smaller and faster to load.
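A sketch of this dual-topic setup using Kafka's AdminClient; the topic names, partition counts, and replication factor are illustrative choices:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class DualTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Retention-based topic: the full chronological history of changes,
            // backing the event-sourced view. Retention is unlimited here so
            // the whole history stays replayable; a finite value would cap it.
            NewTopic history = new NewTopic("orders-history", 3, (short) 1)
                    .configs(Map.of(
                            TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_DELETE,
                            TopicConfig.RETENTION_MS_CONFIG, "-1"));

            // Compacted topic: Kafka retains only the latest event per key,
            // giving a smaller, faster-to-load snapshot of current state.
            NewTopic latest = new NewTopic("orders-latest", 3, (short) 1)
                    .configs(Map.of(
                            TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT));

            admin.createTopics(List.of(history, latest)).all().get();
        }
    }
}
```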
Stopford concludes by noting that the top feature of a stream-based event architecture is its ability to evolve: as new requirements emerge, new services can be built, deployed, and brought up to date by replaying all events from the beginning. He believes that Kafka, with the functionality it provides, is well suited to this style of architecture.