
Kubernetes the Very Hard Way with Large Clusters at Datadog

Laurent Bernaille from Datadog spoke at the Velocity conference in Berlin about the challenges of operating large, self-managed Kubernetes clusters. Bernaille focused on how to configure resilient and scalable control planes, why and how to rotate certificates frequently, and the need for networking plugins that enable efficient pod communication.

A traditional architecture is to run all Kubernetes master components on one server and keep at least three such servers for high availability. However, these components have different responsibilities and don't scale in the same way. For instance, the scheduler and the controller manager are stateless, which makes them easy to scale, whereas etcd is stateful and needs redundant copies of its data. Moreover, components like the scheduler rely on a leader-election mechanism where only one instance is active at a time, so, as Bernaille noted, it doesn't make sense to scale the scheduler out.
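
This active/standby behavior comes from the scheduler's leader-election flag. Below is a minimal sketch of a kube-scheduler static pod manifest; the image tag and file paths are illustrative, not Datadog's actual configuration:

```yaml
# Sketch: with --leader-elect=true (the default), only the scheduler
# instance holding the lock is active; the others are hot standbys,
# so extra replicas add availability rather than scheduling throughput.
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: registry.k8s.io/kube-scheduler:v1.28.4   # illustrative tag
    command:
    - kube-scheduler
    - --leader-elect=true
    - --kubeconfig=/etc/kubernetes/scheduler.conf
```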

Hence, Datadog decided to split the Kubernetes components across different servers, each with its own resources and scaling policy. For the API servers, they placed a load balancer in front to distribute requests. They also split etcd, dedicating a separate etcd cluster to Kubernetes events, which are high-churn and would otherwise load the main cluster.
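
The kube-apiserver supports this split natively through its --etcd-servers-overrides flag, which routes a given resource to a different etcd cluster. A minimal sketch, with hypothetical etcd endpoints:

```yaml
# Excerpt from a kube-apiserver static pod spec: events go to a
# dedicated etcd cluster, everything else to the main one.
# Endpoint names are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    image: registry.k8s.io/kube-apiserver:v1.28.4   # illustrative tag
    command:
    - kube-apiserver
    - --etcd-servers=https://etcd-main-1:2379,https://etcd-main-2:2379,https://etcd-main-3:2379
    # Route the high-churn events resource to its own etcd cluster:
    - --etcd-servers-overrides=/events#https://etcd-events-1:2379;https://etcd-events-2:2379;https://etcd-events-3:2379
```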

Bernaille remarked that Kubernetes uses TLS with x509 certificates to secure communication between all of its components. To avoid problems such as unnoticed certificate expiration, Datadog decided to rotate certificates daily. Yet rotating certificates is challenging, as Kubernetes needs several certificates installed across different components and servers, and Datadog noticed they had to restart components like the API server after every rotation. Therefore, Datadog automated a daily certificate rotation, issuing the certificates with HashiCorp Vault.
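
One way to picture such automation is a daily job issuing short-lived certificates from Vault's PKI secrets engine. The sketch below is hypothetical; the mount path, role name, common name, and how the issued certificates reach the control-plane hosts are all assumptions, not Datadog's actual pipeline:

```yaml
# Hypothetical daily rotation job: re-issue a short-lived control-plane
# certificate from Vault's PKI secrets engine. Vault address, auth,
# and distribution of the issued files are omitted for brevity.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: rotate-apiserver-cert
  namespace: kube-system
spec:
  schedule: "0 3 * * *"                  # once a day
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: rotate
            image: hashicorp/vault:1.15  # illustrative image/tag
            command: ["/bin/sh", "-c"]
            args:
            - |
              # TTL slightly over 24h so consecutive certificates overlap.
              vault write -format=json pki/issue/kube-apiserver \
                  common_name=kube-apiserver.cluster.internal \
                  ttl=26h > /certs/apiserver.json
            volumeMounts:
            - name: certs
              mountPath: /certs
          volumes:
          - name: certs
            emptyDir: {}   # placeholder; a real setup must deliver the certs to hosts
```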

However, because the kubelet generates its certificates on demand, Datadog excluded it from the daily rotation. In spite of the challenges and complexity, Bernaille recommends rotating certificates frequently. It's not an easy task, but it avoids future outages when a certificate expires, which may leave no evident signs in the logs.
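
The kubelet's built-in rotation goes through the Kubernetes CSR API, which is what makes an external rotation unnecessary for it. A minimal KubeletConfiguration sketch:

```yaml
# Sketch: the kubelet requests and rotates its own certificates via
# the Kubernetes CSR API instead of relying on an external rotation.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
rotateCertificates: true     # rotate the client certificate automatically
serverTLSBootstrap: true     # request the serving certificate via a CSR
```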

Bernaille also mentioned that Datadog faced networking challenges because of the large number of servers needed to run their platform. He explained that each Kubernetes node gets a range of IPs from which it assigns addresses to its pods. For small clusters, configuring static routes between nodes is enough for pod-to-pod communication. For medium-sized clusters, a common approach is a networking overlay in which nodes communicate through tunnels. Datadog instead gives pods IPs that are routable throughout the whole network, so communication with pods is direct, without intermediaries like kube-proxy. GCP supports this model with IP aliases, AWS with elastic network interfaces (ENIs), and for on-premises clusters, users can turn to tools like Calico.
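
With Calico, for instance, routable pod IPs essentially mean disabling encapsulation so pod routes are distributed natively, typically over BGP. A sketch of such an IPPool, with a hypothetical pod CIDR:

```yaml
# Sketch: a Calico IPPool with no IP-in-IP or VXLAN encapsulation,
# so pod IPs are natively routable across the network (assuming BGP
# peering with the fabric). The CIDR is hypothetical.
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: routable-pods
spec:
  cidr: 10.32.0.0/12
  ipipMode: Never     # no IP-in-IP tunnel
  vxlanMode: Never    # no VXLAN overlay
  natOutgoing: true   # still NAT traffic leaving the cluster
```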

Lastly, Bernaille talked about traffic coming into the cluster from outside. By default, Kubernetes routes an external request through kube-proxy, and if the request arrives at a node where the destination pod is not running, kube-proxy redirects it to the proper node, adding an extra hop. Alternatives are to set an external traffic policy or use an ingress controller, but these don't scale well in large clusters. Therefore, Datadog uses native routing through an ALB ingress controller in AWS, for HTTP communication only.
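
With routable pod IPs, the ALB ingress controller can register pods directly as targets by setting the target type to ip. A sketch of such an Ingress; the names and ports are illustrative:

```yaml
# Sketch: an ALB that registers pod IPs as targets, so external HTTP
# requests reach pods directly instead of hopping through kube-proxy.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip   # target pod IPs directly
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80
```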

Bernaille finished by saying that they faced other challenges with components like DNS, stateful applications, and application deployments, but he didn't have enough time to dive deep into these topics. However, he recommended Jerome Petazzoni's talk for a deep dive into Kubernetes internals, as well as a previous Datadog talk about Kubernetes the very hard way.
