The Google Cloud Platform team has started an article series sharing its views on containers, drawing on Google's ten years of experience with the technology. The first two articles provide an overview of the topic: they explain the rationale behind container clusters and their defining traits, showing along the way how it all applies to Kubernetes.
Containers provide several benefits, including a simpler deployment model, fast availability and an ideal infrastructure layer for microservices architectures. Container products such as Docker, however, operate on a single computer, which creates the need to orchestrate multiple containers. Or, as Joe Beda, Senior Staff Engineer and co-founder of Kubernetes, prefers to say, the need to manage an "improv jazz performance", to better reflect that container management reacts to "conditions and inputs in real time".
Container clusters, such as Kubernetes, provide services around cluster management, networking and naming. They create "a layer of abstraction that allows the developer and administrator to work collectively on improving the behaviour and performance of the desired service, rather than any of its individual component containers or infrastructure resources".
As the number of containers shoots up, the need for container coordination quickly becomes apparent. Google is an extreme case, launching some two billion containers per week, over 3,000 per second. Hence the need for container clusters.
Among other requirements, the container clusters Google builds have to remain available while updating, be scalable, and be easy to instrument and monitor. Meeting these requirements brings Google many benefits. It can build microservices with clear contracts and boundaries; clear boundaries, in turn, allow small engineering teams to keep software manageable and scalable. Container clusters provide self-healing and frictionless horizontal scaling with highly efficient resource utilization. A less obvious benefit is the ability to specialize "roles for cluster and application operations teams". Joe Beda mentions that, for instance, "the GMail operations and development teams rarely have to talk directly to the cluster operations team". Developers can focus on building the service instead of thinking about the underlying infrastructure.
"Ingredients of a great container cluster"
Joe also shared the "ingredients of a great container cluster manager". They include dynamic container placement, thinking in sets and connecting within a cluster.
Dynamic container placement is all about finding a place to run a given workload, while respecting the declared intentions on how and where a container should run. This is an instance of the knapsack problem: how do you fit containers, each with its own constraints, into the finite compute resources available? When scheduling a given workload, the cluster has to take into account available capacity (e.g. CPU, RAM), special hardware needs, and conditions that change in real time (e.g. failures, auto-scaling). Kubernetes addresses this problem with pods:
A pod (as in a pod of whales or pea pod) corresponds to a colocated group of Docker containers with shared volumes. A pod models an application-specific "logical host" in a containerized environment. It may contain one or more containers which are relatively tightly coupled -- in a pre-container world, they would have executed on the same physical or virtual host.
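The placement problem described above can be sketched as a simple first-fit loop over available nodes. This is a toy illustration, not Kubernetes' actual scheduler; the node names, resource figures and the `schedule` function are all invented for the example.

```python
# Toy dynamic-placement sketch: put each workload on the first node that
# still has enough free CPU and RAM, mirroring the knapsack flavor of
# the problem. Real schedulers use much richer scoring and constraints.

def schedule(workloads, nodes):
    """Assign each workload to a node, or None if nothing fits."""
    placement = {}
    for name, (cpu, ram) in workloads.items():
        placement[name] = None
        for node, free in nodes.items():
            if free["cpu"] >= cpu and free["ram"] >= ram:
                free["cpu"] -= cpu      # reserve the capacity
                free["ram"] -= ram
                placement[name] = node
                break
    return placement

nodes = {"node-1": {"cpu": 2.0, "ram": 4096},
         "node-2": {"cpu": 1.0, "ram": 2048}}
workloads = {"web": (1.5, 2048), "cache": (1.0, 1024)}
print(schedule(workloads, nodes))
# → {'web': 'node-1', 'cache': 'node-2'}
```

Note how "cache" spills over to node-2 once node-1's remaining CPU is exhausted; a real cluster additionally reacts to failures and auto-scaling by re-running this placement continuously.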
The second ingredient is the ability to think in terms of sets, which Kubernetes enables through labels and replication controllers. Each pod can carry a set of labels, each a key/value pair. Labels group pods, e.g. by application tier or geographic location, and can then be used in several ways, e.g. by services and replication controllers. Replication controllers, as the name implies, ensure that a given number of pod replicas are alive at all times, thereby supporting scale-out.
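A minimal sketch of how labels select sets of pods, and how a replication-controller-style reconcile step keeps a desired replica count, might look as follows. The data structures and function names are invented for illustration and are not Kubernetes APIs.

```python
# Label selection: a pod matches when its labels contain every key/value
# pair in the selector. Reconciliation: add or remove pods until the
# number of matching pods equals the desired replica count.

def select(pods, selector):
    """Return pods whose labels contain every key/value pair in selector."""
    return [p for p in pods
            if all(p["labels"].get(k) == v for k, v in selector.items())]

def reconcile(pods, selector, desired, make_pod):
    """Drive the set of matching pods toward `desired` replicas."""
    matching = select(pods, selector)
    for _ in range(desired - len(matching)):
        pods.append(make_pod())          # scale up
    for extra in matching[desired:]:
        pods.remove(extra)               # scale down
    return pods

pods = [{"name": "fe-1", "labels": {"tier": "frontend"}}]
make = lambda: {"name": "fe-new", "labels": {"tier": "frontend"}}
pods = reconcile(pods, {"tier": "frontend"}, 3, make)
print(len(select(pods, {"tier": "frontend"})))  # → 3
```

The key design point is that the controller never tracks individual pods by name: it only counts whatever currently matches the label selector, which is what makes "thinking in sets" possible.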
Kubernetes labels (figure from the "What makes a container cluster?" article).
The third ingredient of a successful container cluster is communication, a critical component when you have many containers and, thus, many microservices. To let microservices talk to each other without knowing where their respective containers are actually running, a good container cluster should provide a name resolution system, and it is important that name resolution reflects changes as soon as containers are started or moved. Kubernetes provides a lightweight approach to service discovery with the help of labels and a Watch API pattern, which delivers asynchronous events from a service and is inspired by the Google Chubby paper. The Kubernetes team is aware that not all clients will be rewritten to use that API immediately, so Kubernetes provides the notion of a service proxy to handle this scenario.
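The watch idea can be sketched as a registry that pushes endpoint changes to registered callbacks, so clients learn about moved containers without polling. This is only an illustration of the pattern; the class and names below are invented, not the actual Kubernetes Watch API.

```python
# Toy watch pattern: clients register a callback on a service name and
# are notified whenever the set of endpoints behind that name changes.

class Registry:
    def __init__(self):
        self.endpoints = {}   # service name -> list of "ip:port" strings
        self.watchers = {}    # service name -> list of callbacks

    def watch(self, name, callback):
        self.watchers.setdefault(name, []).append(callback)

    def update(self, name, endpoints):
        self.endpoints[name] = endpoints
        for cb in self.watchers.get(name, []):
            cb(name, endpoints)   # push the change to every watcher

events = []
reg = Registry()
reg.watch("frontend", lambda n, eps: events.append((n, eps)))
reg.update("frontend", ["10.0.0.5:80"])
reg.update("frontend", ["10.0.0.5:80", "10.0.0.9:80"])
print(events[-1])  # → ('frontend', ['10.0.0.5:80', '10.0.0.9:80'])
```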
[A service proxy] is a simple network load balancer/proxy that does the name query for you and exposes it as a single stable IP/port (with DNS) on the network.
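Such a proxy can be sketched as a round-robin picker over whatever backends a name currently resolves to, so legacy clients see one stable address. Again a toy with invented names; the real service proxy's mechanics differ.

```python
# Toy service proxy: resolves a service name to its current backends and
# round-robins requests across them behind a single stable entry point.

class ServiceProxy:
    def __init__(self, resolve):
        self.resolve = resolve   # callable: name -> list of backends
        self.counters = {}       # per-name round-robin position

    def pick(self, name):
        backends = self.resolve(name)
        i = self.counters.get(name, 0)
        self.counters[name] = i + 1
        return backends[i % len(backends)]

table = {"frontend": ["10.0.0.5:80", "10.0.0.9:80"]}
proxy = ServiceProxy(lambda name: table[name])
print([proxy.pick("frontend") for _ in range(3)])
# → ['10.0.0.5:80', '10.0.0.9:80', '10.0.0.5:80']
```

Because `resolve` is consulted on every request, backends that move or die simply drop out of the rotation on the next lookup — the client never needs to know.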
In short, the Google Cloud Platform team defines a container cluster as:
(...) a dynamic system that places and manages containers, grouped together in pods, running on nodes, along with all the interconnections and communication channels.
Google's move to containerization started with the addition of cgroups (short for Control Groups) to the Linux kernel, a feature that isolates the resource usage of a collection of processes. Docker later popularized container technology by making it simple enough for the wider community to adopt.
InfoQ has been covering Kubernetes' evolution, including an in-depth article by Carlos Sanchez.