Martin Campbell, microservices scalability expert at DigitalOcean, talked about running a microservice based architecture with a distributed scheduler at MicroXchg Berlin 2017. He focused primarily on the problems encountered along the way, and the tradeoffs between offerings like Kubernetes, Nomad, and Mesos.
Some key takeaways included:
- Distributed schedulers allow you to reason about a cluster as if it is a single physical machine.
- They make DevOps easier, reducing a lot of operational complexity that is particularly prevalent in microservice based architectures.
- None of the offerings out there run stateful services particularly well, so it is best to not run these types of services using one.
- In the event of a network partition, a fault tolerant distributed scheduler should leave all processes running even if their host nodes cannot talk to the master.
Campbell first explains that an operating system kernel is a centralized scheduler. This is because it is something that manages multiple processes on a single machine. He then explains that a distributed scheduler is conceptually the same, except that it works across a cluster of machines as opposed to a single one: "You are taking your entire datacenter and treating it like it is a single physical machine"
Distributed schedulers are particularly beneficial in a microservice architecture. Campbell believes that this is because of the extra operational overhead caused by multiple services which are constantly being scaled and deployed.
Comparing various distributed schedulers, Campbell first spoke about his experience with Mesos. With it, you don’t need to care about what physical machine a process ends up on, as it handles deployment based on resource quotas such as CPU and RAM. It also provides a dashboard that allows to easily visualize the data center as if it’s a single physical machine.
Campbells main issue with Mesos was how it handles network partitions. If a process can no longer communicate with the Mesos master, it’s killed. Campbell believes this is a poor design choice, and in fact, as network partitions are commonplace in distributed systems the applications should still keep running in the event of one. He gave Kafka as an example of where this behavior leads to data loss. Although it’s a distributed message bus which is designed to be resilient, a partition could lead to losing nearly every single node, thus data.
Ultimately, Campbell abandoned Mesos, first seeking out Nomad as an alternative. The advantage of Nomad was that it had its own gossip protocol, allowing servers to communicate within the same data center, and across data centers. In the event of partition, services within the same partition could still function and communicate, and in the event of partition resolution, they would become eventually consistent. However, because Campbell did not know of any applications running Nomad in production, he did not want to risk the migration.
The final and chosen tool was Kubernetes. Whilst similar to Mesos, Campbell found a few advantages. Mainly, it handled network partitions differently and did not kill instances in the event of one. It also had a dashboard that made it easy to reason about the cluster, and fewer levels of abstraction when dealing with the applications.
The full talk is available to watch online, with more detail about the application architecture that Campbell dealt with, and more information about the various schedulers.