At QCon San Francisco 2017, Chris Richardson, software architect, introduced techniques for data consistency in microservices. The main focus was on the saga pattern, a means of splitting up a distributed transaction into a series of smaller transactions that either all commit or rollback.
Key takeaways from the talk included:
- ACID guarantees are possible with individual microservices, as each one can have its own private database
- In order to achieve distributed transactions in microservices architectures, the saga pattern can be adopted. This splits a transaction into a series of smaller transactions, connected by messaging, which must either complete or rollback
- The saga pattern cannot give isolation guarantees, meaning that countermeasures must be made for anomalies
- In order to guarantee that a saga commits or rolls back, a combination of transaction log tailing and messaging can be used
Richardson explained that in a microservices architecture, each microservices should have its own private database which is not accessible directly by any other microservice. Whilst this promotes loose coupling, Richardson pointed out that it introduces problems with data consistency: "How do we implement transactions that now span multiple microservices?"
To solve this problem, Richardson introduced the saga pattern. Its core principle is that instead of having long transactions that hold onto locks (Like a two-phase commit), you break them up into a series of short transactions that commit in sequence. This leads to the following ACD characteristics:
- Atomicity: Either all transactions are executed, or all are compensated
- Consistent: Referential integrity is given by both local databases and the application code
- Durability: This is guaranteed by message brokers and databases
Whilst somewhat close to having ACID guarantees, the saga pattern is still missing isolation. This means that it is possible to read and write data from an incomplete transaction, thus introducing various isolation anomalies. To get around this, Richardson outlined various countermeasures. These included making transactions within a saga commutative, or even using a version file to allow them to occur in any order.
Richardson also demonstrated how much more challenging rollbacks are, as they no longer come for free (Like with an ACID database), and instead must be implemented within the application code. On top of this, when asynchronous sagas are triggered by synchronous API requests, a decision must be made around when a response should be given -- should it block until the saga is complete, or return immediately and be notified by the user? Richardson recommended the latter due to improved availability, and also for the fact that most UI’s can hide asynchronicity from the user.
Sagas can be coordinated in one of two ways: choreography -- where saga events are fired off asynchronously between microservices; and orchestration -- where a centralized service triggers and keeps track of all the steps in the saga. Richardson believes that the orchestration approach has the most benefits -- it reduces cyclic dependencies, and it is easier to reason about a saga if it is centralized. To help achieve this, he introduced Tram, an open source saga framework written for Java.
For cross-saga communication, Richardson explained that just like a normal transaction, a saga must be guaranteed to either complete or rollback. Because of this, he thinks messaging is the only logical option, mainly because of the durability guarantees that it has in comparison with HTTP. He also proposed that messages be written to the local database before they are sent, meaning that the transaction log can be tailed in order to publish new ones as they come in, further enforcing the ACD guarantee.
The full talk is available to watch online, and the source code for Tram is available on GitHub.