Blue Matador migrated their Kubernetes infrastructure from a kops-managed cluster running on AWS EC2 instances to AWS's managed Kubernetes service, EKS, after comparing the two options across a number of features. They chose EKS for its better security model, managed control plane, and lower cost for their specific use case. While kops won on ease of setting up a new Kubernetes cluster, EKS scored higher on cluster management and security. InfoQ reached out to Keilan Jackson, software engineer at Blue Matador, to find out more about their experience.
EKS's shared responsibility model and its managed control plane were the primary reasons for the migration. Prior to EKS, the Blue Matador team ran their own Kubernetes master nodes on three c4.large EC2 instances. Kubernetes upgrades -- for features, bug fixes, and security patches -- were the responsibility of the team. AWS still provided a layer of security since the infrastructure ran within an AWS environment, but the Blue Matador team had to manage Kubernetes-specific security issues themselves. "Kubernetes clusters created with kops are by default set up very much like EKS", writes Jackson, in terms of resources like private networking, encrypted root volumes, and security group controls. Setting up a new cluster with EKS needed some preparatory work, but EKS made the cluster easier to manage once the initial setup was done.
Blue Matador primarily uses HashiCorp Terraform to manage their AWS resources as Infrastructure as Code. Terraform has implementations for many resource types across cloud providers, but real-world usage reveals its challenges. Jackson spoke about the EKS-specific challenges they faced:
I tried to leverage the community-built EKS module as much as possible. The main issues I had were using out of date versions of the AWS provider and Terraform, and then connecting the managed resources from this module to my externally managed resources like our main ALB, RDS instances, and so on. I recommend outputting some terraform variables from the module you configure EKS in so you can reference them in your other modules, like this:
output "worker_role_arn" {
value = "${module.eks_cluster.worker_iam_role_arn}"
}
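An output like this can then be consumed anywhere else in the configuration. As a minimal sketch -- assuming the EKS configuration lives in a module referenced as module.kubernetes, and that an externally managed S3 bucket resource named aws_s3_bucket.app_config already exists (both names are illustrative, not from Blue Matador's setup) -- the exported worker role ARN could be used to grant the worker nodes read access to that bucket:

# Hypothetical consumer of the output above; module and resource names are illustrative.
data "aws_iam_policy_document" "worker_bucket_read" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.app_config.arn}/*"]

    principals {
      type        = "AWS"
      identifiers = ["${module.kubernetes.worker_role_arn}"]
    }
  }
}

# Attach the generated policy document to the externally managed bucket.
resource "aws_s3_bucket_policy" "app_config" {
  bucket = "${aws_s3_bucket.app_config.id}"
  policy = "${data.aws_iam_policy_document.worker_bucket_read.json}"
}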
Although Terraform can create and manage EKS clusters well, an EKS cluster depends on a number of peripheral resources that need to be tied together. Jackson elaborates:
EKS needs a lot of resources besides the EKS cluster itself to run. You have to configure worker nodes, security groups, VPC networking, and have a plan to make updates when new versions of Kubernetes are supported by EKS. Definitely use the community module if you can, since it helps connect a lot of these essential resources correctly, but remember to double-check the settings against your security needs. For instance, make sure the security groups are only open to things that need them, that your worker nodes don't get public IP addresses, and that you are using an encrypted AMI for the root device.
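As an illustration of those checks, here is a minimal sketch expressed with plain AWS provider resources rather than the community module's own variables. The AMI data source, security groups, and other names are assumptions, and the IAM instance profile and user data needed for nodes to actually join the cluster are omitted:

# Encrypted copy of the EKS-optimized AMI, so worker root devices are encrypted.
resource "aws_ami_copy" "eks_worker_encrypted" {
  name              = "eks-worker-encrypted"
  source_ami_id     = "${data.aws_ami.eks_worker.id}"   # hypothetical data source for the EKS-optimized AMI
  source_ami_region = "us-east-1"
  encrypted         = true
}

# Worker launch configuration: encrypted AMI, no public IP addresses.
resource "aws_launch_configuration" "workers" {
  name_prefix                 = "eks-workers-"
  image_id                    = "${aws_ami_copy.eks_worker_encrypted.id}"
  instance_type               = "m5.large"
  security_groups             = ["${aws_security_group.workers.id}"]
  associate_public_ip_address = false

  lifecycle {
    create_before_destroy = true
  }
}

# Only the control plane's security group may reach the workers' kubelet ports.
resource "aws_security_group_rule" "workers_from_control_plane" {
  type                     = "ingress"
  from_port                = 1025
  to_port                  = 65535
  protocol                 = "tcp"
  security_group_id        = "${aws_security_group.workers.id}"
  source_security_group_id = "${aws_security_group.control_plane.id}"
}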
Referring to the scale of their cluster, Jackson says that "the total size of the cluster has not reached a point where we had to use more than three masters in our kops cluster, but it was important that we are able to quickly and easily scale up nodes and update to newer versions of Kubernetes as they are released."
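With EKS managed through Terraform, both of those operations reduce to changing an argument. A minimal sketch, assuming illustrative resource names and a pre-existing IAM role, private subnets, and security group:

# Bumping "version" upgrades the managed control plane once EKS supports a new release.
resource "aws_eks_cluster" "main" {
  name     = "production"
  role_arn = "${aws_iam_role.eks_cluster.arn}"
  version  = "1.14"

  vpc_config {
    subnet_ids         = ["${aws_subnet.private.*.id}"]
    security_group_ids = ["${aws_security_group.control_plane.id}"]
  }
}

# Worker nodes scale by adjusting the autoscaling group's capacity settings.
resource "aws_autoscaling_group" "workers" {
  name                 = "eks-workers"
  launch_configuration = "${aws_launch_configuration.workers.id}"
  vpc_zone_identifier  = ["${aws_subnet.private.*.id}"]
  min_size             = 3
  desired_capacity     = 3
  max_size             = 10
}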
Managed Kubernetes offerings are usually integrated with their platform's monitoring solutions. Jackson explains how they monitor their cluster:
We primarily rely on our own product, Blue Matador, for alerting on our Kubernetes clusters. It finds things like unhealthy Deployments, critical node events, which pods run out of memory, and helps us keep tabs on cluster utilization. We also use Datadog, but only to graph a couple of custom metrics. We have our eye on CloudWatch Container Insights for Amazon EKS, but CloudWatch in general is not dynamic enough for Kubernetes so I would not rely on it for production alerting.
The migration also reduced both infrastructure and monitoring costs for the team.