Kubernetes is an open source project for managing a cluster of Linux containers as a single system: it runs and manages Docker containers across multiple hosts, offering co-location of containers, service discovery and replication control. It was started by Google and is now supported by Microsoft, Red Hat, IBM and Docker, amongst others.
Google has been using container technology for over ten years, starting over 2 billion containers per week. With Kubernetes it shares that container expertise, creating an open platform to run containers at scale.
The project serves two purposes. Once you are using Docker containers, the next question is how to scale and start containers across multiple Docker hosts, balancing them across the cluster. It also adds a higher level API to define how containers are logically grouped, making it possible to define pools of containers, load balancing and affinity.
Kubernetes Concepts
The Kubernetes architecture is defined by a master server and multiple nodes. The command line tools connect to the API endpoint in the master, which manages and orchestrates all the nodes: Docker hosts that receive instructions from the master and run the containers.
- Namespace: Cluster partition with its own quotas and network.
- Node: Each of the Docker hosts, running the Kubelet service, that receives orders from the master and manages the containers running on that host (formerly called minions).
- Pod: Defines a collection of containers tied together that are deployed in the same node, for example a database and a web server container.
- Replication controller: Defines how many replicas of a pod need to be running; the pods are scheduled across multiple nodes.
- Service: A definition that allows discovery of services/ports published by containers, and external proxy communications. A service maps the ports of the containers running on pods across multiple nodes to externally accessible ports.
- kubectl: The command line client that connects to the master to administer Kubernetes.
Kubernetes is defined by states, not processes. When you define a pod, Kubernetes tries to ensure that it is always running. If a container is killed, it will try to start a new one. If a replication controller is defined with 3 replicas, Kubernetes will try to always run that number, starting and stopping containers as necessary.
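As a minimal sketch of this declarative model (the name and image here are illustrative and not part of the Jenkins example), a replication controller asking for three nginx replicas would look like the following; Kubernetes will keep three pods running no matter which containers or nodes fail:
---
apiVersion: "v1"
kind: "ReplicationController"
metadata:
  name: "example"    # illustrative example, not part of the Jenkins setup
spec:
  replicas: 3
  template:
    metadata:
      labels:
        name: "example"
    spec:
      containers:
        -
          name: "example"
          image: "nginx"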
The example app used in this article is the Jenkins CI server, in a typical master-slaves setup to distribute the jobs. Jenkins is configured with the Jenkins swarm plugin to run a Jenkins master and multiple Jenkins slaves, all of them running as Docker containers across multiple hosts. The swarm slaves connect to the Jenkins master on startup and become available to run Jenkins jobs. The configuration files used in the example are available on GitHub, and the Docker images are available as csanchez/jenkins-swarm, for the Jenkins master, extending the official Jenkins image with the swarm plugin, and csanchez/jenkins-swarm-slave, for each of the slaves, just running the slave service in a JVM container.
What’s New
The release of version 1.0 brought to the platform a stable API and functionality to better manage large clusters, including DNS, load balancing, scaling, application-level health checking, service accounts and namespaces to segregate applications. The API and CLI now allow executing commands remotely, collecting logs across multiple hosts, and monitoring resources. The list of network based volumes now includes Google Compute Engine persistent disk, AWS Elastic Block Store, NFS, Flocker and GlusterFS.
Version 1.1 builds on top of 1.0, adding graceful termination of pods, support for Docker 1.8, horizontal auto scaling, and a new Job object that allows scheduling short running jobs (vs. long running applications) using pods in the Kubernetes cluster.
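As a rough sketch of the new Job object, assuming the extensions/v1beta1 API group it was introduced under in 1.1 (the name and image here are hypothetical, and the exact fields should be checked against the 1.1 documentation):
---
apiVersion: "extensions/v1beta1"   # illustrative only; check the 1.1 docs for the exact Job schema
kind: "Job"
metadata:
  name: "jenkins-backup"    # hypothetical job name
spec:
  template:
    metadata:
      name: "jenkins-backup"
      labels:
        app: "jenkins-backup"
    spec:
      restartPolicy: "Never"
      containers:
        -
          name: "backup"
          image: "busybox"
          command: ["sh", "-c", "echo backing up && sleep 10"]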
Creating a Kubernetes Cluster
Kubernetes provides scripts to create a cluster with several operating systems and cloud/virtual providers: Vagrant (useful for local testing), Google Compute Engine, Azure, Rackspace, etc. Google Container Engine (GKE) provides supported Kubernetes clusters as a service. The next steps show how to create a cluster both on local Vagrant virtual machines and on Google Container Engine; choosing one over the other is just a matter of personal preference for the examples included, although GKE provides several extra features for networking and persistent storage. The examples would also work on any other provider's Kubernetes cluster with minimal changes, as explained in the respective sections.
Vagrant
The scripts included in Kubernetes can create a local cluster running on Vagrant, using Fedora as OS, as detailed in the getting started instructions, and have been tested on Kubernetes 1.0.6. Instead of the default three nodes (Docker hosts) we are going to run just two, which is enough to show the Kubernetes capabilities without requiring a more powerful machine.
Once you have downloaded Kubernetes and extracted it, the examples can be run from that directory. In order to create the cluster from scratch the only command needed is ./cluster/kube-up.sh.
$ export KUBERNETES_PROVIDER=vagrant
$ export KUBERNETES_NUM_MINIONS=2
$ ./cluster/kube-up.sh
The cluster creation will take a while depending on machine power and internet bandwidth, but should eventually finish without errors and it only needs to be run once.
The command line tool to interact with Kubernetes is called kubectl. The Kubernetes release package includes a convenience script in cluster/kubectl.sh, but the CLI can be installed independently and used normally, as kube-up.sh has created the required configuration files under ~/.kube/config.
Ensure kubectl is included in the PATH, as it is going to be used in the next sections, and check that the cluster is up and running using kubectl get nodes:
$ kubectl get nodes
NAME LABELS STATUS
10.245.1.3 kubernetes.io/hostname=10.245.1.3 Ready
10.245.1.4 kubernetes.io/hostname=10.245.1.4 Ready
Google Container Engine
Sign up for Google Container Engine (GKE) following the steps on the Google Cloud website, enable the Container Engine API, install the gcloud tool and configure Google Cloud with gcloud init.
Once you have logged in, update the kubectl tool, set some default values and create the Kubernetes cluster. Make sure to use your Google Cloud project id.
$ gcloud components update kubectl
$ gcloud config set project prefab-backbone-109611
$ gcloud config set compute/zone us-central1-f
$ gcloud container clusters create kubernetes-jenkins
Creating cluster kubernetes-jenkins...done.
Created [https://container.googleapis.com/v1/projects/prefab-backbone-109611/zones/us-central1-f/clusters/kubernetes-jenkins].
kubeconfig entry generated for kubernetes-jenkins.
NAME ZONE MASTER_VERSION MASTER_IP MACHINE_TYPE STATUS
kubernetes-jenkins us-central1-f 1.0.6 107.178.220.228 n1-standard-1 RUNNING
Set the configuration defaults to the cluster just created and configure kubectl with its credentials.
$ gcloud config set container/cluster kubernetes-jenkins
$ gcloud container clusters get-credentials kubernetes-jenkins
Running kubectl get nodes should correctly display the 3 nodes that compose our cluster.
$ kubectl get nodes
NAME LABELS STATUS
gke-kubernetes-jenkins-e46fdaa5-node-5gvr kubernetes.io/hostname=gke-kubernetes-jenkins-e46fdaa5-node-5gvr Ready
gke-kubernetes-jenkins-e46fdaa5-node-chiv kubernetes.io/hostname=gke-kubernetes-jenkins-e46fdaa5-node-chiv Ready
gke-kubernetes-jenkins-e46fdaa5-node-mb7s kubernetes.io/hostname=gke-kubernetes-jenkins-e46fdaa5-node-mb7s Ready
Cluster Info
After our cluster is up we can access the API endpoint or the kube-ui web interface.
$ kubectl cluster-info
Kubernetes master is running at https://107.178.220.228
KubeDNS is running at https://107.178.220.228/api/v1/proxy/namespaces/kube-system/services/kube-dns
KubeUI is running at https://107.178.220.228/api/v1/proxy/namespaces/kube-system/services/kube-ui
Heapster is running at https://107.178.220.228/api/v1/proxy/namespaces/kube-system/services/monitoring-heapster
Example Code
Get the example configuration files from GitHub:
$ git clone --branch infoq-kubernetes-1.0 https://github.com/carlossg/kubernetes-jenkins.git
Namespaces
Namespaces partition cluster resources across multiple users or applications.
$ kubectl get namespaces
NAME LABELS STATUS
default <none> Active
kube-system <none> Active
Namespaces can have associated resource quotas to limit the cpu and memory used, and the number of resources of each type (pods, services, replication controllers, etc.). In the following examples the default namespace is used, but new namespaces can be created to isolate resources, as sketched below.
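For example, a dedicated namespace with a quota could be sketched like this (the jenkins-ci name and the limits are arbitrary choices for illustration):
---
apiVersion: "v1"
kind: "Namespace"
metadata:
  name: "jenkins-ci"    # arbitrary example namespace
---
apiVersion: "v1"
kind: "ResourceQuota"
metadata:
  name: "jenkins-ci-quota"
  namespace: "jenkins-ci"
spec:
  hard:
    pods: "10"
    replicationcontrollers: "5"
    services: "5"
It would be created with kubectl create -f and used by passing --namespace=jenkins-ci to the kubectl commands.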
Some of the services running by default in the kube-system namespace provide endpoints to Kubernetes system resources.
$ kubectl get endpoints --all-namespaces
NAMESPACE NAME ENDPOINTS
default jenkins <none>
default kubernetes 107.178.220.228:443
kube-system kube-dns 10.172.2.3:53,10.172.2.3:53
kube-system kube-ui 10.172.0.5:8080
kube-system monitoring-heapster 10.172.0.6:8082
Pods
A pod in Kubernetes terminology is a group of containers that would be deployed in the same Docker host, with the advantage that containers in a pod can share resources, such as storage volumes, and use the same network namespace and IP.
In this Jenkins example we could have a pod running the Jenkins master server, but there is a caveat: pods will not survive scheduling failures, node failures, lack of resources, or node maintenance. If the node running our Jenkins master pod fails, the container would never be restarted. For that reason we will run the Jenkins master as a replication controller with a single replica, as described below.
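For reference, a standalone pod definition for the Jenkins master, which we are not going to use for the reasons above, would look roughly like this:
---
apiVersion: "v1"
kind: "Pod"
metadata:
  name: "jenkins"    # a bare pod; the article uses a replication controller instead
  labels:
    name: "jenkins"
spec:
  containers:
    -
      name: "jenkins"
      image: "csanchez/jenkins-swarm:1.625.1-for-volumes"
      ports:
        -
          containerPort: 8080
        -
          containerPort: 50000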
Replication Controllers
Replication controllers allow running multiple pods across multiple nodes. The Jenkins master server is run as a replication controller of size one to survive node crashes as previously mentioned, while Jenkins slaves use a replication controller to ensure there is always a pool of slaves ready to run Jenkins jobs.
Persistent Volumes
Kubernetes supports multiple backends for storage volumes:
- emptyDir
- hostPath
- gcePersistentDisk
- awsElasticBlockStore
- nfs
- iscsi
- glusterfs
- rbd
- gitRepo
- secret
- persistentVolumeClaim
Volumes are by default empty directories, of type emptyDir, that live for the lifespan of the pod, not the specific container, so if the container fails the persistent storage will live on. Another volume type is hostPath, which mounts a directory from the host server in the container.
secret volumes are used to pass sensitive information, such as passwords, to pods. Secrets can be stored in the Kubernetes API and mounted as files for use by pods, and are backed by tmpfs in RAM.
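As a sketch of how this could be used in our example (the jenkins-credentials name is hypothetical; values must be base64 encoded), a secret holding the Jenkins slave password and its mount in a pod spec would look like this:
---
apiVersion: "v1"
kind: "Secret"
metadata:
  name: "jenkins-credentials"    # hypothetical secret name
data:
  password: "amVua2lucw=="       # base64 of "jenkins"
and in the container and pod specs:
      volumeMounts:
        -
          name: "credentials"
          mountPath: "/etc/jenkins-credentials"
          readOnly: true
      volumes:
        -
          name: "credentials"
          secret:
            secretName: "jenkins-credentials"
The password would then be available to the container as the file /etc/jenkins-credentials/password.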
PersistentVolumes are a way for users to "claim" durable storage (such as a gcePersistentDisk or an iscsi volume) without knowing the details of the particular cloud environment, and can be mounted in a pod using a persistentVolumeClaim.
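As a sketch using the Google Compute Engine disk created below (the resource names are illustrative), a persistent volume and a claim against it could be defined as:
---
apiVersion: "v1"
kind: "PersistentVolume"
metadata:
  name: "jenkins-data-pv"    # illustrative names
spec:
  capacity:
    storage: "20Gi"
  accessModes:
    - "ReadWriteOnce"
  gcePersistentDisk:
    pdName: "jenkins-data-disk"
    fsType: "ext4"
---
apiVersion: "v1"
kind: "PersistentVolumeClaim"
metadata:
  name: "jenkins-data-claim"
spec:
  accessModes:
    - "ReadWriteOnce"
  resources:
    requests:
      storage: "20Gi"
The pod would then reference the claim in its volumes section through persistentVolumeClaim instead of pointing directly at the disk.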
In a typical Kubernetes installation the volume backend used would be a networked one, so the data can be mounted on any of the nodes where the pod may be scheduled.
In the Vagrant example we are going to use a hostPath volume for simplicity, with the caveat that if our pods get moved from one node to another the data will not move with them.
When using Google Container Engine, we need to create a Google Compute disk that will be used by the Jenkins master pod for its data:
$ gcloud compute disks create --size 20GB jenkins-data-disk
Jenkins Master Replication Controller
In order to create a Jenkins master pod, we run kubectl with the Jenkins container replication controller definition, using the Docker image csanchez/jenkins-swarm, ports 8080 and 50000 mapped to the container in order to have access to the Jenkins web UI and the slave API, and a volume mounted in /var/jenkins_home. The livenessProbe option is covered below under the self healing section. You can find the example code on GitHub as well.
The definition for Vagrant (jenkins-master-vagrant.yml) is as follows:
---
apiVersion: "v1"
kind: "ReplicationController"
metadata:
  name: "jenkins"
  labels:
    name: "jenkins"
spec:
  replicas: 1
  template:
    metadata:
      name: "jenkins"
      labels:
        name: "jenkins"
    spec:
      containers:
        -
          name: "jenkins"
          image: "csanchez/jenkins-swarm:1.625.1-for-volumes"
          ports:
            -
              containerPort: 8080
            -
              containerPort: 50000
          volumeMounts:
            -
              name: "jenkins-data"
              mountPath: "/var/jenkins_home"
          livenessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 60
            timeoutSeconds: 5
      volumes:
        -
          name: "jenkins-data"
          hostPath:
            path: "/home/docker/jenkins"
The Google Container Engine definition just replaces the jenkins-data volume definition to make use of the Google Compute Engine disk we previously created:
      volumes:
        -
          name: "jenkins-data"
          gcePersistentDisk:
            pdName: jenkins-data-disk
            fsType: ext4
And the replication controller is created in the same manner:
$ kubectl create -f jenkins-master-gke.yml
Afterwards, the Jenkins replication controller and pod are displayed in the corresponding lists.
$ kubectl get replicationcontrollers
CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS
jenkins jenkins csanchez/jenkins-swarm:1.625.1-for-volumes name=jenkins 1
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
jenkins-gs7h9 0/1 Pending 0 5s
More details can be fetched using kubectl describe, for instance which node it is scheduled on or whether there were any errors during provisioning.
$ kubectl describe pods/jenkins-gs7h9
Name: jenkins-gs7h9
Namespace: default
Image(s): csanchez/jenkins-swarm:1.625.1-for-volumes
Node: gke-kubernetes-jenkins-e46fdaa5-node-chiv/10.240.0.3
Labels: name=jenkins
Status: Running
Reason:
Message:
IP: 10.172.1.32
Replication Controllers: jenkins (1/1 replicas created)
Containers:
jenkins:
Image: csanchez/jenkins-swarm:1.625.1-for-volumes
Limits:
cpu: 100m
State: Running
Started: Sun, 18 Oct 2015 18:41:52 +0200
Ready: True
Restart Count: 0
Conditions:
Type Status
Ready True
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Tue, 13 Oct 2015 20:36:50 +0200 Tue, 13 Oct 2015 20:36:50 +0200 1 {scheduler } scheduled Successfully assigned jenkins to gke-kubernetes-jenkins-e46fdaa5-node-5gvr
Tue, 13 Oct 2015 20:36:59 +0200 Tue, 13 Oct 2015 20:36:59 +0200 1 {kubelet gke-kubernetes-jenkins-e46fdaa5-node-5gvr} implicitly required container POD pulled Pod container image "gcr.io/google_containers/pause:0.8.0" already present on machine
[...]
After some time, as it has to download the Docker image to the node, we can check that the status is Running:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
jenkins-gs7h9 1/1 Running 0 15s
The Jenkins container logs can be accessed from the Kubernetes API:
$ kubectl logs jenkins-gs7h9
Running from: /usr/share/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
...
INFO: Jenkins is fully up and running
Services
Kubernetes allows defining services, a way for containers to discover each other, proxy requests to the appropriate node and do load balancing.
Every Service has a type field which defines how the Service can be accessed.
- ClusterIP: Use a cluster-internal IP only, the default;
- NodePort: Use a cluster IP, but also expose the service on a port on each node of the cluster (the same port on each node);
- LoadBalancer: On cloud providers which support external load balancers (currently GKE and AWS), this uses a ClusterIP and a NodePort, but also asks the cloud provider for a load balancer that forwards to the Service.
By default services are of type ClusterIP, which means they are only accessible from inside the cluster. In order to allow external access to a service, one of the other types needs to be used: NodePort or LoadBalancer.
Each service is assigned a unique IP address tied to the lifespan of the Service. If we had multiple pods matching the service definition the service would load balance the traffic across all of them.
We are going to create a service with id jenkins pointing to the pod with the label name=jenkins, as declared in the definition of the jenkins-master replication controller, forwarding connections to port 8080, the Jenkins web UI, and port 50000, needed by the Jenkins swarm plugin.
Vagrant
Running in Vagrant we can create a NodePort type service with kubectl.
---
apiVersion: "v1"
kind: "Service"
metadata:
  name: "jenkins"
spec:
  type: "NodePort"
  selector:
    name: "jenkins"
  ports:
    -
      name: "http"
      port: 8080
      protocol: "TCP"
    -
      name: "slave"
      port: 50000
      protocol: "TCP"
$ kubectl create -f service-vagrant.yml
$ kubectl get services/jenkins
NAME LABELS SELECTOR IP(S) PORT(S)
jenkins <none> name=jenkins 10.247.0.117 8080/TCP
50000/TCP
Describing the service will show which ports on the nodes map to port 8080 in the container.
$ kubectl describe services/jenkins
Name: jenkins
Namespace: default
Labels: <none>
Selector: name=jenkins
Type: NodePort
IP: 10.247.0.117
Port: http 8080/TCP
NodePort: http 32440/TCP
Endpoints: 10.246.2.2:8080
Port: slave 50000/TCP
NodePort: slave 31241/TCP
Endpoints: 10.246.2.2:50000
Session Affinity: None
No events.
We can access the Jenkins web UI using any of our Vagrant machine nodes and the NodePort from the service definition, in this case port 32440, so both 10.245.1.3:32440 and 10.245.1.4:32440 will forward the connection to the correct container.
Google Container Engine
Running in GKE we can create a more production-like load balancer with a public ip:
---
apiVersion: "v1"
kind: "Service"
metadata:
  name: "jenkins"
spec:
  type: "LoadBalancer"
  selector:
    name: "jenkins"
  ports:
    -
      name: "http"
      port: 80
      targetPort: 8080
      protocol: "TCP"
    -
      name: "slave"
      port: 50000
      protocol: "TCP"
$ kubectl create -f service-gke.yml
$ kubectl get services/jenkins
NAME LABELS SELECTOR IP(S) PORT(S)
jenkins <none> name=jenkins 10.175.245.100 80/TCP
50000/TCP
Describing the service will provide us with the load balancer ip (it may take a while to be created), in this case 104.197.19.100:
$ kubectl describe services/jenkins
Name: jenkins
Namespace: default
Labels: <none>
Selector: name=jenkins
Type: LoadBalancer
IP: 10.175.245.100
LoadBalancer Ingress: 104.197.19.100
Port: http 80/TCP
NodePort: http 30080/TCP
Endpoints: 10.172.1.5:8080
Port: slave 50000/TCP
NodePort: slave 32081/TCP
Endpoints: 10.172.1.5:50000
Session Affinity: None
No events.
We can also use the Google Cloud API to get info about the load balancer (a forwarding-rule in GCE terminology):
$ gcloud compute forwarding-rules list
NAME REGION IP_ADDRESS IP_PROTOCOL TARGET
afe40deb373e111e5a00442010af0013 us-central1 104.197.19.100 TCP us-central1/targetPools/afe40deb373e111e5a00442010af0013
If we open a browser to the ip address and port listed, 104.197.19.100:80, we should see the Jenkins web interface running.
DNS and Service Discovery
Another feature of services is that a number of environment variables are available to any containers subsequently run by Kubernetes, providing the ability to connect to the service, in a similar way to linked Docker containers. This will be useful for finding the Jenkins master server from any of the slaves.
JENKINS_SERVICE_HOST=10.175.245.100
JENKINS_PORT=tcp://10.175.245.100:80
JENKINS_PORT_80_TCP_ADDR=10.175.245.100
JENKINS_SERVICE_PORT=80
JENKINS_SERVICE_PORT_SLAVE=50000
JENKINS_PORT_50000_TCP_PORT=50000
JENKINS_PORT_50000_TCP_ADDR=10.175.245.100
JENKINS_SERVICE_PORT_HTTP=80
Kubernetes also makes use of SkyDNS as an optional addon, a distributed service for announcement and discovery of services built on top of etcd, creating A and SRV records for services. A records have the form my-svc.my-namespace.svc.cluster.local and SRV records the form _my-port-name._my-port-protocol.my-svc.my-namespace.svc.cluster.local. Containers are configured to use SkyDNS and will resolve service names to the service ip address; /etc/resolv.conf is configured to search several suffixes to make naming easier:
nameserver 10.175.240.10
nameserver 169.254.169.254
nameserver 10.240.0.1
search default.svc.cluster.local svc.cluster.local cluster.local c.prefab-backbone-109611.internal. 771200841705.google.internal. google.internal.
options ndots:5
As an example, for the jenkins service we have just created, the following names will resolve:
- jenkins and jenkins.default.svc.cluster.local, an A record pointing to 10.175.245.100
- _http._tcp.jenkins and _http._tcp.jenkins.default.svc.cluster.local, a SRV record with value 10 100 80 jenkins.default.svc.cluster.local
- _slave._tcp.jenkins and _slave._tcp.jenkins.default.svc.cluster.local, a SRV record with value 10 100 50000 jenkins.default.svc.cluster.local
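We can check the resolution from inside the cluster by running a lookup in one of the existing pods, assuming the image provides nslookup (with some kubectl versions the pod may need to be passed with the -p flag):
$ kubectl exec jenkins-gs7h9 -- nslookup jenkins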
Jenkins Slaves Replication Controller
Jenkins slaves can be run as a replication controller to ensure there is always a pool of slaves ready to run Jenkins jobs. Note that while this setup helps understand how Kubernetes works, in this specific Jenkins example a better solution is to use the Jenkins Kubernetes plugin that automatically creates slaves dynamically as needed based on the number of jobs in the queue, and isolates job execution.
In a jenkins-slaves.yml definition:
---
apiVersion: "v1"
kind: "ReplicationController"
metadata:
  name: "jenkins-slave"
  labels:
    name: "jenkins-slave"
spec:
  replicas: 1
  template:
    metadata:
      name: "jenkins-slave"
      labels:
        name: "jenkins-slave"
    spec:
      containers:
        -
          name: "jenkins-slave"
          image: "csanchez/jenkins-swarm-slave:2.0-net-tools"
          command:
            - "/usr/local/bin/jenkins-slave.sh"
            - "-master"
            - "http://jenkins:$(JENKINS_SERVICE_PORT_HTTP)"
            - "-tunnel"
            - "jenkins:$(JENKINS_SERVICE_PORT_SLAVE)"
            - "-username"
            - "jenkins"
            - "-password"
            - "jenkins"
            - "-executors"
            - "1"
          livenessProbe:
            exec:
              command:
                - sh
                - -c
                - "netstat -tan | grep ESTABLISHED"
            initialDelaySeconds: 60
            timeoutSeconds: 1
In this case we want the Jenkins slave to connect automatically to our Jenkins master, instead of relying on Jenkins multicast discovery. For that we need to point the swarm plugin to the master host running in Kubernetes, for both the http port (using -master) and the slave port (using -tunnel).
We use a mix of values from the Jenkins service definition: the jenkins hostname, provided by SkyDNS, and the Kubernetes injected environment variables JENKINS_SERVICE_PORT_HTTP and JENKINS_SERVICE_PORT_SLAVE, named after the service name and each of the ports defined in the service (http and slave). We could have also used the environment variable JENKINS_SERVICE_HOST instead of the jenkins hostname. The image command is overridden to configure the container this way, which is useful to reuse existing images while taking advantage of the service discovery; it can be done in pod definitions too. The livenessProbe option is covered below under the self healing section.
Create the replicas with kubectl:
$ kubectl create -f jenkins-slaves.yml
Listing the pods would show new ones being created, up to the number of replicas defined in the replication controller, one in this case.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
jenkins-gs7h9 1/1 Running 0 2m
jenkins-slave-svlwi 0/1 Running 0 4s
The first time the jenkins-swarm-slave image is run, the node has to download it from the Docker repository, but after a while the slaves should automatically connect to the Jenkins server. kubectl logs is useful to debug any problems on container startup.
The replication controller can automatically be scaled to any number of desired replicas:
$ kubectl scale replicationcontrollers --replicas=2 jenkins-slave
And again the pods are updated to add the new replicas.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
jenkins-gs7h9 1/1 Running 0 4m
jenkins-slave-svlwi 1/1 Running 0 2m
jenkins-slave-z3qp8 0/1 Running 0 4s
Rolling Updates
Kubernetes also supports the concept of rolling updates, where one pod at a time is updated. With a new jenkins-slaves-v2.yml config file defining the new replication controller configuration, running the following command causes Kubernetes to update one pod every 10 seconds.
$ kubectl rolling-update jenkins-slave --update-period=10s -f jenkins-slaves-v2.yml
Creating jenkins-slave-v2
At beginning of loop: jenkins-slave replicas: 1, jenkins-slave-v2 replicas: 1
Updating jenkins-slave replicas: 1, jenkins-slave-v2 replicas: 1
At end of loop: jenkins-slave replicas: 1, jenkins-slave-v2 replicas: 1
At beginning of loop: jenkins-slave replicas: 0, jenkins-slave-v2 replicas: 2
Updating jenkins-slave replicas: 0, jenkins-slave-v2 replicas: 2
At end of loop: jenkins-slave replicas: 0, jenkins-slave-v2 replicas: 2
Update succeeded. Deleting jenkins-slave
jenkins-slave-v2
$ kubectl get replicationcontrollers
CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS
jenkins jenkins csanchez/jenkins-swarm:1.625.1-for-volumes name=jenkins 1
jenkins-slave-v2 jenkins-slave csanchez/jenkins-swarm-slave:2.0 name=jenkins-slave-v2 2
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
jenkins-gs7h9 1/1 Running 0 36m
jenkins-slave-v2-7jiyn 1/1 Running 0 52s
jenkins-slave-v2-9gip7 1/1 Running 0 28s
In this case nothing really changes other than the name of the replication controller, but any other parameter could have been changed too (Docker image, environment, etc.).
---
apiVersion: "v1"
kind: "ReplicationController"
metadata:
  name: "jenkins-slave-v2"
  labels:
    name: "jenkins-slave"
spec:
  replicas: 2
  template:
    metadata:
      name: "jenkins-slave"
      labels:
        name: "jenkins-slave-v2"
    spec:
      containers:
        -
          name: "jenkins-slave"
          image: "csanchez/jenkins-swarm-slave:2.0-net-tools"
          command:
            - "/usr/local/bin/jenkins-slave.sh"
            - "-master"
            - "http://$(JENKINS_SERVICE_HOST):$(JENKINS_SERVICE_PORT_HTTP)"
            - "-tunnel"
            - "$(JENKINS_SERVICE_HOST):$(JENKINS_SERVICE_PORT_SLAVE)"
            - "-username"
            - "jenkins"
            - "-password"
            - "jenkins"
            - "-executors"
            - "1"
          livenessProbe:
            exec:
              command:
                - sh
                - -c
                - "netstat -tan | grep ESTABLISHED"
            initialDelaySeconds: 60
            timeoutSeconds: 1
Self Healing
One of the benefits of using Kubernetes is the automated management and recovery of containers. If the container running the Jenkins server dies for any reason, for instance because the running process crashes, Kubernetes will notice and create a new container after a few seconds. As mentioned earlier, using replication controllers ensures that even with node failures the defined number of replicas will always be running. Kubernetes will take the necessary measures to ensure that new pods are started in different nodes in the cluster.
Application specific health checks can also be created using livenessProbe, allowing HTTP checks and the execution of commands in each of the containers of a pod. If the probe fails, the container is restarted.
For instance, to monitor that Jenkins is actually serving HTTP requests the following code is added to the Jenkins master container spec:
livenessProbe:
  httpGet:
    path: /
    port: 8080
  initialDelaySeconds: 60
  timeoutSeconds: 5
We can also monitor that our slaves are actually connected to the master. This is needed because if the Jenkins master is restarted it may forget about connected slaves, and the slaves continuously try to reconnect instead of failing fast, so we check that there is an established connection to the master.
livenessProbe:
  exec:
    command:
      - sh
      - -c
      - "netstat -tan | grep ESTABLISHED"
  initialDelaySeconds: 60
  timeoutSeconds: 1
Vagrant
We are going to kill the Jenkins master pod, and Kubernetes will create a new one.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
jenkins-7yuij 1/1 Running 0 8m
jenkins-slave-7dtw1 0/1 Running 0 6s
jenkins-slave-o3kt2 0/1 Running 0 6s
$ kubectl delete pods/jenkins-7yuij
pods/jenkins-7yuij
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
jenkins-jqz7z 1/1 Running 0 32s
jenkins-slave-7dtw1 1/1 Running 0 2m
jenkins-slave-o3kt2 1/1 Running 1 2m
The Jenkins web UI is still accessible in the same location as before, as the Kubernetes service takes care of forwarding connections to the new pod. By running the Jenkins data dir in a volume we guarantee that the data is kept even after the container dies, so we do not lose any Jenkins jobs or data created. And because Kubernetes is proxying the services in each node, the slaves will reconnect to the new Jenkins server automagically no matter where they run! Exactly the same will happen if any of the slave containers dies: the system will automatically create a new container and, thanks to service discovery, it will automatically join the Jenkins slave pool.
GKE
If a node goes down, Kubernetes will automatically reschedule the pods to a different location. We are going to manually shut down the node running the Jenkins master server:
$ kubectl describe pods/jenkins-gs7h9 | grep Node
Node: gke-kubernetes-jenkins-e46fdaa5-node-chiv/10.240.0.3
$ gcloud compute instances list
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
gke-kubernetes-jenkins-e46fdaa5-node-5gvr us-central1-f n1-standard-1 10.240.0.4 104.154.35.119 RUNNING
gke-kubernetes-jenkins-e46fdaa5-node-chiv us-central1-f n1-standard-1 10.240.0.3 104.154.82.184 RUNNING
gke-kubernetes-jenkins-e46fdaa5-node-mb7s us-central1-f n1-standard-1 10.240.0.2 146.148.81.44 RUNNING
$ gcloud compute instances delete gke-kubernetes-jenkins-e46fdaa5-node-chiv
The following instances will be deleted. Attached disks configured to
be auto-deleted will be deleted unless they are attached to any other
instances. Deleting a disk is irreversible and any data on the disk
will be lost.
- [gke-kubernetes-jenkins-e46fdaa5-node-chiv] in [us-central1-f]
Do you want to continue (Y/n)? Y
Deleted [https://www.googleapis.com/compute/v1/projects/prefab-backbone-109611/zones/us-central1-f/instances/gke-kubernetes-jenkins-e46fdaa5-node-chiv].
$ gcloud compute instances list
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
gke-kubernetes-jenkins-e46fdaa5-node-5gvr us-central1-f n1-standard-1 10.240.0.4 104.154.35.119 RUNNING
gke-kubernetes-jenkins-e46fdaa5-node-mb7s us-central1-f n1-standard-1 10.240.0.2 146.148.81.44 RUNNING
After some time Kubernetes noticed that the pod running on that node had died, and so it created a new pod, jenkins-44r3y, on another node:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
jenkins-44r3y 1/1 Running 0 21s
jenkins-slave-v2-2z71c 1/1 Running 0 21s
jenkins-slave-v2-7jiyn 1/1 Running 0 5m
$ kubectl describe pods/jenkins-44r3y | grep Node
Node: gke-kubernetes-jenkins-e46fdaa5-node-5gvr/10.240.0.4
If you go back to the browser, using the same ip as before, you will see the same Jenkins master UI, as it forwards to the node running the Jenkins master automatically. Any jobs that you may have created remain, as the data was in a Google persistent disk which Kubernetes automatically attached to the new container instance.
Note also that after some time GKE realized one of the nodes was down and started a new instance, keeping our cluster size:
$ gcloud compute instances list
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
gke-kubernetes-jenkins-e46fdaa5-node-5gvr us-central1-f n1-standard-1 10.240.0.4 104.154.35.119 RUNNING
gke-kubernetes-jenkins-e46fdaa5-node-chiv us-central1-f n1-standard-1 10.240.0.3 104.154.82.184 RUNNING
gke-kubernetes-jenkins-e46fdaa5-node-mb7s us-central1-f n1-standard-1 10.240.0.2 146.148.81.44 RUNNING
Tearing down
kubectl offers several commands to delete the replication controller, pod and service definitions.
To stop the replication controllers, setting the number of replicas to 0 and causing the termination of all the associated pods:
$ kubectl stop rc/jenkins-slave-v2
$ kubectl stop rc/jenkins
To delete the services:
$ kubectl delete services/jenkins
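Finally, the cluster itself can be torn down when no longer needed: on Vagrant with the kube-down.sh script included in the Kubernetes release, and on GKE by deleting the cluster with gcloud.
$ ./cluster/kube-down.sh
$ gcloud container clusters delete kubernetes-jenkins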
Conclusion
Kubernetes allows managing Docker deployments across multiple servers and simplifies the execution of long running and distributed Docker containers. By abstracting infrastructure concepts and working with states instead of processes, it provides easy definition of clusters, including enterprise level features out of the box such as self healing, centralized logging, service discovery, dynamic DNS, network isolation and resource quotas. In short, Kubernetes makes the management of Docker fleets easier.
About the Author
Carlos Sanchez has been working on automation and quality of software development, QA and operations processes for over 10 years, from build tools and continuous integration to DevOps best practices, continuous delivery and big scale deployments. He has delivered solutions to Fortune 500 companies from several US based startups, and now he works at CloudBees scaling the Jenkins platform. Carlos has been a speaker at several conferences around the world, including JavaOne, EclipseCON, ApacheCON, JavaZone, Fosdem or PuppetConf. Very involved in open source, he is a member of the Apache Software Foundation amongst other open source groups, contributing to several projects, such as Jenkins, Apache Maven, Fog or Puppet.