Partitioning tools with VMs
Automated builds and delivery pipelines are a wonderful thing once they’re operational, but provisioning build agents can be quite painful.
It’s rather common to see agents provisioned with multiple versions of tools: for example, an agent with several versions of the JDK, several versions of the .NET Framework, and several versions of Node.js. Eventually, because of conflicts, it’s common to create pools of agents to partition toolsets. Maintaining this is no walk in the park. Quite often these agents are bespoke, hand-crafted machines, which works fine until you need to scale or you have a failure.
The need to scale often leads people to build golden VM images that they can use to spin up agents. Sometimes the images are built by hand; sometimes configuration management is used. Either way, a hodgepodge of tools is necessary to create, update, and distribute images, all so that ultimately we can push a button and spin up a new agent based on one of them.
VMs give us the ability to partition our toolsets but they’re not easy to work with. Maybe that will change someday. But for now, creating an image is at least a two-step process:
- Find an existing image, often with just the OS.
- Install our software and capture a new image.
Swapping a VM for a container
If we take a step back and consider alternatives for partitioning our toolsets, what we’re really after is isolation. And with the recent rise in popularity of containerization there are many great tools to consider, like Docker: tools that not only partition our environments but also simplify things.
With containers we can have images that contain everything necessary to run a single piece of software. And existing container images largely alleviate the need to craft our own by hand: registries provide just about any tool we can imagine. For example, Microsoft provides an image with the latest .NET Core tools.
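As a rough illustration, pulling such an image and running the tool inside it takes a couple of commands. The image name below is the one Microsoft published on Docker Hub around the time of writing; treat it as an example and check the registry for the current name:
```
# Pull Microsoft's .NET Core tooling image (the name is illustrative).
docker pull microsoft/dotnet

# Run the tool from a throwaway container to confirm it's available.
docker run --rm microsoft/dotnet dotnet --version
```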
Of course we have to be careful about what we trust. And therein lies another helpful aspect of containerization: it’s incredibly easy to create and distribute our own images.
But there’s a big problem that most people encounter when it comes to using containers for build automation and delivery pipelines: they try to get everything inside of a single container. Often this manifests as running an agent in a container alongside the tools it needs to perform builds and other tasks. Basically, swapping a VM for a container.
While containers improve the tooling story, we’re still stuck handcrafting what is more or less a VM image with every tool necessary for a single project’s pipeline (build, test, package, deploy, etc).
A Series of Containers
If we challenge this mentality of containerizing the entire agent, we can instead use containers to run individual pieces of software. This isn’t about some best practice to only run one process in a container. It’s about simplicity.
With this approach we can get back to a homogeneous pool of agents! All of our agents will have the exact same set of tools. And likely the only tool these agents will need is Docker or something similar to manage containers.
The one exception to a homogeneous pool of agents is the need to run different OSes and/or CPU architectures. For example, if we use both Windows and Linux, we would have one homogeneous pool of Windows agents and one homogeneous pool of Linux agents.
Within a single OS we can pull each tool, by version, as it’s needed.
For example, if we’re executing a multi-step build process we might use a series of containers, as sketched below:
- A first container fetches dependencies with a tool like npm or NuGet.
- A second container performs the compilation with a tool like the TypeScript compiler, javac, or the C# compiler.
- A third container packages up the result with a tool like tar or NuGet.
Another build process may only use a single tool, like Maven.
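Here’s a minimal sketch of that multi-step flow, assuming a Node.js/TypeScript project that lists typescript as a dev dependency and emits compiled output to dist/. The image tags and paths are illustrative only:
```
# The workspace: a directory on the agent's disk (e.g. the checkout).
WORKSPACE=/work/1

# 1. Fetch dependencies with npm.
docker run --rm -v "$WORKSPACE":/work -w /work node:6 npm install

# 2. Compile with the TypeScript compiler installed in step 1.
docker run --rm -v "$WORKSPACE":/work -w /work node:6 ./node_modules/.bin/tsc -p .

# 3. Package the result with tar.
docker run --rm -v "$WORKSPACE":/work -w /work alpine tar -czf app.tar.gz dist
```
Each container sees the same directory, so whatever one tool writes is there for the next.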
Either way, we can visualize pipelines in terms of the individual tools that we need to run. What we’re really looking to do is take a workspace (a directory on disk) and pass it through a series of tools to produce the final result.
Imagine a pipeline as a conveyor belt, much like the TSA screening process. You put your belongings into a bin. Then they flow through an X-ray machine. If all is well, your stuff comes out the other end for you to pick up.
If something seems amiss, your bin might be flagged for review by a human. In which case it’s carried to a private screening area where additional checks are performed. If you have something that’s not allowed, like a bottle of water, it’s removed.
So imagine your delivery pipeline starts out with a given commit in source control, which then triggers your source code to be placed in a bin. You then place that bin or “workspace” onto a conveyor belt. It flows to the first tool. Perhaps that tool fetches dependencies and places them into the bin. Then the bin goes back onto the conveyor belt.
It travels down the line to the next tool. Perhaps a compiler generates a binary with object code or some intermediate language. That build output is then placed in the bin. The bin is placed back onto the conveyor belt and it travels down the line to the next tool.
The next tool might pick up the bin, grab all the files inside that are a part of the build output and package them up as a zip file. The zip file goes into the bin. And then the bin travels on down the line to the next tool.
In a delivery pipeline this bin is akin to a directory on disk, often referred to as a workspace or checkout directory.
Elevating the Workspace into Containers
What I find helpful is to think about the agent as the conveyor belt. It orchestrates the series of tools that a directory will flow through. It pulls images. It creates containers one at a time with each piece of software (tool). It elevates the bin into the tool’s container. Docker in particular does this with a simple bind mount, called a volume.
The agent captures the output (STDOUT, STDERR) as the tool runs. The tool is free to put whatever it wants into the directory and when it’s done the output is there to flow into the next tool in the pipeline. A pipeline is merely a series of transformations upon a directory. A series of projections.
I like to imagine a series of containers hovering over an assembly line. As the workspace moves down the line, it is elevated into each container. The container manipulates the workspace. And then the workspace floats back down to the conveyor belt.
With this mindset, we can get back to dumb agents that simply run an agent service and the Docker engine. We can forget about conflicts between different tools and even different versions of a single tool. The container image provides the entire filesystem for a given tool.
We can also permute any number of these containers to create any desired pipeline.
A dumb agent only needs Docker installed. When someone pushes a commit to a git repository, the agent creates an empty workspace, i.e. /work/1 for build #1. Then the agent coordinates the execution of a series of containers, each with a different tool, as part of the build or delivery pipeline. Any permutation of tools is possible without complicating agent provisioning. The agent “elevates” the workspace into each container as a volume, the tool in the container does its work and writes files to the workspace, the container is destroyed, and then the workspace “moves on” to the next container.
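Stripped to its essentials, the agent’s side of this might look like the sketch below. Everything in it (the repo URL, the image names, and the Maven-based pipeline) is an assumption for illustration:
```
#!/bin/sh
# Hypothetical agent logic for a single build.
BUILD_NUMBER=1
WORKSPACE=/work/$BUILD_NUMBER

# Create an empty workspace and place the commit's source code in it.
mkdir -p "$WORKSPACE"
git clone --depth 1 https://example.com/app.git "$WORKSPACE"

# Elevate the workspace into one short-lived container per pipeline step.
# Each container is created, runs its tool, and is destroyed (--rm).
docker run --rm -v "$WORKSPACE":/work -w /work maven:3-jdk-8 mvn package
docker run --rm -v "$WORKSPACE":/work -w /work alpine tar -czf app.tar.gz target

# The workspace now contains the pipeline's output, ready for the next stage.
```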
Challenges
There are a few things you should keep in mind when considering this approach.
First, you’ll face a different set of challenges when you run software in a container. Application caches are one example, like a global package cache: you certainly don’t want to recreate caches every time a container is created. Volumes are one way to persist them, as shown below.
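A named volume mounted at the tool’s cache location is one simple sketch of this; the cache path below is npm’s default for the root user, but verify it for your tool and version:
```
# Create a named volume once, then mount it wherever the tool caches.
docker volume create npm-cache
docker run --rm \
  -v "$WORKSPACE":/work -w /work \
  -v npm-cache:/root/.npm \
  node:6 npm install
```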
Second, you need to make sure you trust the images that you’re using, or build your own. Fortunately, unlike building a VM image, building a container image is easy and fast; it’s even something you could integrate into your pipeline, eliminating the need for external orchestration. And thanks to Docker’s build cache, image creation is effectively idempotent: the work only runs the first time you build the image, and not again unless you change what the image contains.
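Building and publishing your own tool image amounts to a Dockerfile and a couple of commands. The Dockerfile contents, tag, and registry name here are purely illustrative:
```
# A tiny image layering one pinned tool on top of a base image you trust.
cat > Dockerfile <<'EOF'
FROM node:6
RUN npm install -g typescript@2.1
EOF

# Subsequent builds hit Docker's layer cache until the Dockerfile changes.
docker build -t mycompany/tsc:2.1 .
docker push mycompany/tsc:2.1
```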
Third, cleaning up containers and images can be problematic. It’s easy to consume quite a bit of disk space, especially if you have custom images. There are many ways to handle this; just make sure you consider it. For example, take a look at the prune commands coming in Docker 1.13.
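The prune commands give you a one-liner for periodic cleanup on an agent (exactly what each command removes varies by Docker version, so check the docs):
```
# Reclaim space from stopped containers, dangling images, and other unused data.
docker system prune

# Or target one resource type at a time.
docker container prune
docker image prune
docker volume prune
```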
Fourth, there’s a learning curve. Fortunately, Docker affords so many benefits to running software that people tend to want to learn about it once they get a taste of what it can do. I believe people should study the inner workings of containers and images too. Tools like Docker are magical in that users don’t have to know how things work. But without this perspective I don’t think people will be able to really take advantage of the benefits. For example, understanding that a container is just an isolated process afforded me the mental model necessary to create the ideas behind this article.
Why Docker?
There is a plethora of container management tools, but I’d recommend Docker because it affords the following benefits:
- Containerization for isolation’s sake; the added security is icing on the cake. We don’t want our tools to conflict with each other, and we don’t want simultaneous jobs to conflict with each other.
- A diverse registry of existing images that you can trust.
- When I say trust, I mean for the most part images should be maintained by the authors of the tool you are using. Ninety-five percent of the time you shouldn’t have to build images. You should simply be able to mount your workspace and run a tool.
- Also when I say trust, I mean safe to run. Registries should scan images for common vulnerabilities and maybe provide things like signed images.
- A simple one-liner to pull an existing image and run a container from it.
- A simple one-liner to build an image that leverages a build cache.
- A simple means of “elevating” a directory into the container, i.e. bind mounting a host directory.
The Future
Containers have been around for a long time; one can argue that chroot, which dates back to Unix in 1979, was the first version of a container. Yet containers only recently garnered attention, largely because of the simplicity of image distribution.
A large part of the reason VMs are a hassle is managing VM images. The benefit of containerization isn’t just a reduction in overhead; it’s largely that the tools for creating container images are far easier to work with than their VM counterparts, and faster too. And, maybe the most important part, the ecosystem of existing images that we can use without customization is another reason people prefer containers.
There’s nothing stopping us from bringing these same improvements to VM images. And the overhead (speed and memory) of VMs can be reduced substantially too. Microsoft did exactly this with Hyper-V Windows Containers. And Intel’s Clear Containers Project proved that Linux VMs can be spun up in milliseconds too, with very little overhead. And if you look at tools like Vagrant and Packer, you’ll see that there’s actually a large community of people doing similar things with VMs.
How we run software will change in the coming years. If you keep this “workspace elevation” pattern in mind you can apply it to new contexts, perhaps one with VMs for isolation, where each tool runs in its own VM that lives only as long as that tool’s step in your delivery pipeline.
About the Author
As a consultant, Wes Higbee helps people eradicate emotional blind spots with technology and produce rapid results. His career has been a journey. He started out as a software developer. In working closely with customers, he realized there are many needs beyond the software itself that nobody was taking care of. Those are the needs he addresses today, whether or not technology is involved. Along the journey Wes has had a passion for sharing knowledge. With 15 courses and counting, he has helped thousands of people improve. He works with both Pluralsight and O’Reilly. He’s been a speaker at countless local meetups, community organizations, webinars and conferences. And he speaks professionally to help organizations improve.