Experiences Going from Event-Driven to Event Sourcing: Fangel and Ingerslev at MicroCPH

Like many other start-ups, Lunar Way started with a monolithic Rails backend application, but shortly after going live three years ago a decision was made, for organisational and technical reasons, to start migrating towards a microservices architecture. This led to an event-driven backend consisting of about 75 microservices, with asynchronous message passing as their main communication pattern.

At the recent MicroCPH 2019 conference in Copenhagen, Thomas Bøgh Fangel and Emil Krog Ingerslev, both from Lunar Way, a fintech company providing banking services through a smartphone, described how along this microservices journey they found that their platform had some inherent problems. Last year, therefore, they decided to move to event sourcing in order to solve these problems and get consistency guarantees within the platform. In their presentation they discussed four problems they encountered and how they solved them using event sourcing and event streams.

Their first problem was a consistency problem. When a service makes a change in its database, it should also publish an event on the message queue. But if the service crashed directly after the database update, no event was published, which they describe as zero-or-more message delivery. This led to inconsistencies that neither the event producer nor the consumer was aware of. It also led to strange support cases, which they often fixed by resending events or by manually updating the data in the consuming service.

A solution to this is to make the state change and the message publishing one atomic operation. Their take on this was to use event sourcing, where each event also represents the state change, which makes it inherently atomic. Ingerslev points out, though, that this is an event internal to the service, and thus still needs to be published. By reading each event, publishing it and then storing a cursor pointing to the event, they can guarantee that all events are published externally. If the service crashes and restarts, it will continue publishing events, starting with the event the cursor points at, which gives at-least-once delivery.
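The cursor-based publishing loop can be sketched roughly as follows. This is a minimal in-memory illustration, not Lunar Way's framework: `EventStore`, `publish_pending` and the event fields are hypothetical names, and the cursor is assumed to hold the sequence number of the last event known to be published, so a restart resumes with the event right after it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    sequence: int  # 1-based position in the service's internal event stream
    type: str
    payload: dict


@dataclass
class EventStore:
    """In-memory stand-in for the service's database (hypothetical)."""
    events: List[Event] = field(default_factory=list)
    cursor: int = 0  # sequence number of the last event known to be published

    def append(self, type: str, payload: dict) -> Event:
        # Appending the event is the state change itself, so there is no
        # separate "update, then publish" step that a crash could split.
        event = Event(len(self.events) + 1, type, payload)
        self.events.append(event)
        return event


def publish_pending(store: EventStore, publish: Callable[[Event], None]) -> None:
    """Publish every event after the cursor, in stream order."""
    for event in store.events[store.cursor:]:
        publish(event)                 # may fail or be interrupted by a crash
        store.cursor = event.sequence  # stored only after a successful publish
```

If the process dies after `publish` returns but before the cursor is stored, the same event is published again on restart. Consumers therefore see every event at least once and need to tolerate duplicates.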

Their next problem was adding a new service. Before starting, the service will need data from upstream services, but reading events suffers from the previous consistency problem. The solution is again event sourcing, with an ordered event stream that supports redelivery. By adding an API on top of an event stream, they enabled a consumer to read all events of a specific type from the very beginning. The internal events are an implementation detail; they describe how a service stores its state, which shouldn't be exposed. Instead, they create integration events that are a projection of internal events, and these are the events a consumer reads.
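As a rough sketch of the idea, the projection and the read API might look like this. All event types, field names and the `read_events` signature are invented for illustration; the talk does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class InternalEvent:
    type: str
    payload: Dict[str, object]


@dataclass(frozen=True)
class IntegrationEvent:
    sequence: int  # 1-based position in the integration stream
    type: str
    payload: Dict[str, object]


def to_public(event: InternalEvent) -> Optional[Tuple[str, Dict[str, object]]]:
    """Return the public (type, payload) for an event, or None to keep it private."""
    if event.type == "AccountOpened":
        return "account.opened", {"account_id": event.payload["account_id"]}
    if event.type == "MoneyDeposited":
        return "money.deposited", {"account_id": event.payload["account_id"],
                                   "amount": event.payload["amount"]}
    return None  # internal bookkeeping events are not exposed


def project(internal: List[InternalEvent]) -> List[IntegrationEvent]:
    """Build the ordered integration stream from the internal stream."""
    stream: List[IntegrationEvent] = []
    for event in internal:
        public = to_public(event)
        if public is not None:
            stream.append(IntegrationEvent(len(stream) + 1, public[0], public[1]))
    return stream


def read_events(stream: List[IntegrationEvent], event_type: str,
                after: int = 0) -> List[IntegrationEvent]:
    """Read events of one type; after=0 reads from the very beginning."""
    return [e for e in stream if e.type == event_type and e.sequence > after]
```

A newly added service would call `read_events(stream, "account.opened")` once to build its initial state, then continue from the sequence number of the last event it has seen.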

The third challenge was broken services. A service receives events, but if an event is lost for some reason, the consumer service is unaware of this and ends up in an inconsistent state. To solve this, the consumer needs to know when an event is missing and be able to get it redelivered. The solution with redelivery of events works here as well, with the added functionality of redelivery from any point in the event stream, not just the beginning. The same technique with a cursor is also used here: when the consumer knows the last event received and the order of events, it can detect if the next event is out of order. A problem is that a missing event is not detected until a new event arrives, but this can be mitigated by regularly asking upstream services for events after the last one indicated by the cursor.
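A consumer following this approach could be sketched as below. The `fetch_after` callable stands in for a hypothetical call to the upstream redelivery API, and the sketch assumes events carry gapless, increasing sequence numbers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    sequence: int  # position in the upstream integration stream
    payload: str


class Consumer:
    def __init__(self, fetch_after: Callable[[int], List[Event]]) -> None:
        self.fetch_after = fetch_after  # hypothetical upstream redelivery call
        self.cursor = 0                 # sequence of the last event applied
        self.applied: List[str] = []

    def receive(self, event: Event) -> None:
        if event.sequence <= self.cursor:
            return                      # duplicate, already applied
        if event.sequence == self.cursor + 1:
            self._apply(event)          # the expected next event
        else:
            self.catch_up()             # gap detected: ask for redelivery

    def catch_up(self) -> None:
        """Fetch and apply everything after the cursor.

        Calling this on a timer covers the case where the lost event
        is the most recent one and no later event reveals the gap.
        """
        for event in self.fetch_after(self.cursor):
            if event.sequence == self.cursor + 1:
                self._apply(event)

    def _apply(self, event: Event) -> None:
        self.applied.append(event.payload)
        self.cursor = event.sequence
```

Because duplicates are dropped by comparing against the cursor, the same consumer also copes with at-least-once delivery from the publisher.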

The last topic was about change. A service may change on the inside, but should still avoid breaking communication with other services or requiring big bang migrations of several services. In their first implementation they had a consensus among developers that only additive changes should be made to the events. When they had to change the structure of an event, they also had to make coordinated migrations of all services involved. In the new solution based on event sourcing, they are using integration events. There is only one event stream inside a service, but it’s possible to have multiple projections, each with its own integration event stream. By adding a new projection based on an evolution of the data model, consumers can later start to read the new integration stream and the new projection. This way a migration can be done without coordination between service deployments.
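To illustrate, two projections over the same internal stream might look like the sketch below. The stream names, the `CustomerRegistered` fields and the change from a single `name` to separate name fields are all invented for the example.

```python
from typing import Callable, Dict, List

InternalEvent = Dict[str, str]
PublicEvent = Dict[str, str]


def project_v1(event: InternalEvent) -> PublicEvent:
    # Original public shape: one combined "name" field.
    return {"type": "customer.registered",
            "name": f'{event["first_name"]} {event["last_name"]}'}


def project_v2(event: InternalEvent) -> PublicEvent:
    # Evolved public shape: separate name fields, a non-additive change.
    return {"type": "customer.registered",
            "first_name": event["first_name"],
            "last_name": event["last_name"]}


# Both integration streams are served side by side from one internal stream.
PROJECTIONS: Dict[str, Callable[[InternalEvent], PublicEvent]] = {
    "customers.v1": project_v1,
    "customers.v2": project_v2,
}


def integration_stream(name: str, internal: List[InternalEvent]) -> List[PublicEvent]:
    """Build the named integration stream from the internal events."""
    return [PROJECTIONS[name](event) for event in internal]
```

Existing consumers keep reading `customers.v1` while new or updated consumers switch to `customers.v2` on their own schedule. The old projection can be retired once nothing reads it.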

Ingerslev summarizes by pointing out that event sourcing libraries are not commonly available. This led them to build their own framework, which was quite expensive. He notes that introducing a new service design paradigm in all teams is hard and takes time, but emphasizes that the event sourcing patterns they’ve used are very composable and easy to improve. Fangel concludes by pointing out one pattern that emerged during their work. They went from a painful situation with special or manual handling of specific cases, to a normal mode of operation of a service, and this made it possible for the developers to focus on what’s most important — the business domain.

Most presentations at the conference were recorded and will be available over the coming weeks.
