The open source release of Docker in March 2013 triggered a major shift in the way in which the software development industry is aspiring to package and deploy modern applications. The creation of many competing, complementary and supporting container technologies has followed in the wake of Docker, and this has led to much hype, and some disillusion, around this space. This article series aims to cut through some of this confusion, and explains how containers are actually being used within the enterprise.
This article series begins with a look into the core technology behind containers and how it is currently being used by developers, and then examines core challenges with deploying containers in the enterprise, such as integrating containerisation into continuous integration and continuous delivery pipelines, and enhancing monitoring to support a changing workload and potential transience. The series concludes with a look to the future of containerisation, and discusses the role unikernels are currently playing within leading-edge organisations.
This InfoQ article is part of the series "Containers in the Real World - Stepping Off the Hype Curve". You can subscribe to receive notifications via RSS.
Introduction
While the public cloud draws headlines, deployments on public Infrastructure as a Service (IaaS) or Platform as a Service (PaaS) offerings may not satisfy the regulatory, security, or performance demands of every workload. The economics of cloud metering do not fit every scenario, and a loosely-accounted and somewhat mysteriously large AWS bill has become increasingly common on CTO desks. An emerging hybrid pattern sees deployments divided between sensitive or otherwise prioritized on-premises jobs, and commodity tasks running on public cloud infrastructure, in an attempt to best exploit the strengths of each environment.
In the largest or most tightly regulated enterprise deployments, the desire to provision and account for platforms and infrastructure as an internal service, or even to monetize data center investments by offering utility access downstream, prompts concerns about portability, flexibility, and modularity in the components from which enterprise services are constructed.
CoreOS has created and integrated a number of open source projects and products in pursuit of a modular platform that satisfies the needs of modern container cluster infrastructure, and has studied these emerging patterns to make design decisions that reflect both the state of the art and the real world of application deployment. Paramount among these are the Application Container (appc) specification and its successor, the OCI Image Format Spec, along with the rkt runtime that executes these containers. Tying those projects together into highly available and highly automated clusters is the job of the Kubernetes orchestration system.
Motivations and Challenges
Reasons for public cloud
Enterprises use public cloud services for many reasons -- some even unique to the individual business -- but industry surveys shake out common purposes. The appeal of reducing costs is obvious: public infrastructure promises efficient outsourcing and aggregation of both hardware and utility investments and of the expertise required to maintain them. Organizations also frequently cite decreased time to deployment and to market as a factor in cloud decisions, seeking to improve developer productivity by reducing the scope of developer concern to the application itself. The perceived and actual high availability of centralized computing utilities also leads many businesses to favor cloud deployments over in-house infrastructure that may vary in robustness.
Reasons for private infrastructure
Security and regulatory
While those common reasons are convincing, they don’t cover every case. Security concerns for the most sensitive data or processing keep some workloads on-premises, for example in the financial services sector. These security requirements go hand in hand with regulatory demands, like HIPAA, that are often interpreted to require in-house data storage and retention.
Performance and latency
The need for near real-time or otherwise guaranteed performance also drives the choice to maintain in-house infrastructure. Beyond the performance required from compute resources themselves, network connections, routes, and firewalling policies may be internally maintained because of latency and security concerns, especially in industries where microseconds of secrecy can equal millions of dollars.
Actual cost
When graded on the curve those requirements describe, the cost of cloud services doesn’t always match expectations. Studies by KPMG and others have shown 30% or more of surveyed organizations report the costs of cloud migration and ongoing services to be higher than anticipated. While those marginal costs don’t annihilate the cloud value proposition, they do merit an examination of the proper division of workloads along economic lines.
Current enterprise strategies: Mixed
In a recent VMTurbo survey whose results were published in “Multi-Cloud 2016: Part 1”, 31% of organizations reported pursuit of a “mixed” public cloud plus private data center strategy, while another 28% aren’t deploying applications in the public cloud at all. Clearly, the bell does not yet toll for the on-premises or co-located data center.
Future direction: Hybrid
If the decision about where applications run is to be an economic one, how do we hybridize the public cloud’s convenience and efficiency with the power and security of on-premises computing? The more similar the processes and interfaces of the two (or more) environments, the greater the opportunity for organizations to control the expense of training and establishing procedures and tools for developers and users. The end goal is deployment transparency.
So what does a modern, common infrastructure for hybrid deployments look like? What are the component boundaries and what must the pieces support in order to be modular and flexibly composable? How can we control the complexity generated in pursuit of the truly hybrid cloud?
Solutions for a hybrid infrastructure
Containers: standardizing distribution and density
The application container, a method of software packaging that bundles application code with the dependencies it needs to run properly, is the first point where designing for the hybrid scenario helps control complexity. The container image, a file or file system package, serves as the objective application artifact for distribution. When executed, the container is isolated in a manner that reflects its self-sufficiency, both protected and barred from access to or from the operating system layer and other containers executing on the same node. Decoupling the application layer from the operating system and libraries below it allows both sides of the container isolation wall to be updated more frequently, without worrying about breaking interconnections between the two.
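To make the idea of a self-describing application artifact concrete, the following sketch models a few of the fields such an image manifest typically carries alongside the packaged filesystem: a name, version and platform labels, the command to execute, and declared dependencies. The struct and field names here are illustrative, loosely inspired by the appc image manifest rather than copied from any specification.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ImageManifest is an illustrative, simplified model of the metadata a
// container image carries alongside its root filesystem. The field names
// are hypothetical, loosely inspired by the appc image manifest; they are
// not a verbatim copy of any specification.
type ImageManifest struct {
	Name         string            `json:"name"`
	Labels       map[string]string `json:"labels"`       // e.g. version, os, arch
	Exec         []string          `json:"exec"`         // entry point inside the image
	Dependencies []string          `json:"dependencies"` // images this one layers on
}

func main() {
	m := ImageManifest{
		Name:   "example.com/myapp",
		Labels: map[string]string{"version": "1.0.0", "os": "linux", "arch": "amd64"},
		Exec:   []string{"/usr/bin/myapp", "--port=8080"},
		// Dependencies let base layers (libraries, runtimes) ship separately
		// from the application code that sits on top of them.
		Dependencies: []string{"example.com/base-runtime"},
	}
	out, _ := json.MarshalIndent(m, "", "  ")
	fmt.Println(string(out))
}
```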
Containers today are a convenient way to distribute and isolate applications, but the existence of multiple container image formats limits their potential in hybrid deployments. If a container must be completely rebuilt, or at minimum reconfigured, in order to run in different execution environments and on different clouds, it is a less objective artifact than it could be.
Toward an industry standard container: appc, Docker, and OCI
When CoreOS and other leaders in the container space started the App Container specification two years ago, containers were gaining early adoption in industry, but no well-defined, implementable specification existed to describe the container image that would be stored on disk and shipped over the wire to execution environments. The appc project and the App Container Image (ACI) specification always aimed to form a single industry and community standard for container images. The key was to involve players through competitive innovation and thinking in this still-new space, so that the standards that emerged would be truly industry-wide.
Not long after the formation of appc, another standards body formed, the Linux Foundation’s Open Container Initiative (OCI). CoreOS is a proponent and member of both groups, and over time, the work of appc and the running-code Docker image implementation have converged greatly, allowing the creation of the OCI Container Image Specification. This work has recently been initiated at the OCI, with the Docker v2.2 image format as its basis, and the goal of incorporating the best features of the appc and Docker formats into a standard that any industry third party can adopt and implement.
Given a standard container image format, the basic building block is in place to make the cloud a logical extension of varying levels of on-premises compute resources. A truly standard and portable container will speed the shift from a mixed environment, with its strict wall between two worlds, to a truly hybrid deployment where the same container can run anywhere you can buy CPU cycles, on-premises or public. This work will take time to develop, but as it progresses in the industry, the selection of where an application runs will become an increasingly economic rather than technical decision.
Operating system lessons: CoreOS Linux
CoreOS Linux is an interesting example of system architecture decisions informed by the reality of container clusters. If the container makes applications self-contained, portable, standard units, the operating system should adapt to empower this dynamic use case. In CoreOS, the operating system and basic userland utilities are stripped to their bare minimum and shipped as an integral unit, automatically updated across a cluster of machines. CoreOS may be thought of as a “hypervisor” for containers that is itself packaged in a discrete and standard way.
Utilizing this single image distribution, CoreOS foregoes individual package management in favor of frequent and coordinated updates of the entire system, protected by a dual-partition scheme providing instant and simple rollbacks in the event of trouble. The isolation of all application code and dependencies in containers means these frequent operating system updates can deliver the latest features and security fixes without risk to the apps running above.
The decoupling of the application from the system and library dependencies layer is the force driving containers in the enterprise. CoreOS applies these lessons to the container support layer, the operating system, minimizing it and formalizing the semantics of updates.
Container lifecycle and execution
rkt: A modern, secure, modular container runtime
The rkt container engine is an open source project begun at CoreOS in 2014 to incorporate modern security best practices with the lessons learned in deploying container clusters at scale. rkt is an implementation of the appc container specification, discussed above, and can convert and execute Docker container images as well. rkt will add support for the emerging OCI Container Image Spec in step with that specification’s definition.
Security
rkt strives to solve two essential problems in container execution: security and manageability. Security is improved most visibly by rkt’s expectation that any container it executes should have a valid offline PGP signature from a known key. Signatures are part of the appc container image specification, and verifying them is rkt’s default behavior. As part of the CoreOS system image, rkt itself receives the same rapid pace of updates, decoupled from the application layer, as the rest of the OS, ensuring the latest security fixes reach the distributed rkt immediately.
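rkt’s trust and verification machinery is richer than this, but the core of offline signature checking can be sketched with Go’s openpgp package: a detached signature over the image file is validated against a keyring of already-trusted public keys. This is a conceptual sketch using assumed, placeholder file names, not rkt’s implementation.

```go
package main

import (
	"fmt"
	"log"
	"os"

	"golang.org/x/crypto/openpgp"
)

// verifyImage checks a detached, ASCII-armored signature for an image file
// against a trusted keyring. A conceptual sketch of offline signature
// verification; the paths passed in are placeholders, not rkt conventions.
func verifyImage(keyringPath, imagePath, sigPath string) error {
	keyFile, err := os.Open(keyringPath)
	if err != nil {
		return err
	}
	defer keyFile.Close()

	keyring, err := openpgp.ReadArmoredKeyRing(keyFile)
	if err != nil {
		return err
	}

	image, err := os.Open(imagePath)
	if err != nil {
		return err
	}
	defer image.Close()

	sig, err := os.Open(sigPath)
	if err != nil {
		return err
	}
	defer sig.Close()

	// Fails unless the signature was made by a key already in the keyring.
	signer, err := openpgp.CheckArmoredDetachedSignature(keyring, image, sig)
	if err != nil {
		return err
	}
	fmt.Printf("image signed by key %s\n", signer.PrimaryKey.KeyIdString())
	return nil
}

func main() {
	if err := verifyImage("trusted-keys.asc", "myapp.aci", "myapp.aci.asc"); err != nil {
		log.Fatalf("signature verification failed: %v", err)
	}
}
```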
At a deeper level, rkt leverages CoreOS Tectonic’s Distributed Trusted Computing (DTC) technologies to keep a tamper-evident log of all container execution events. This facility is implemented with the Trusted Platform Module (TPM) found on the motherboards of modern servers, making this audit log safe even from privilege elevation attacks at the OS level. Since sophisticated attackers move quickly to cover their footprints in system logs, rkt’s tamper-evident TPM audit trail can provide powerful forensic evidence that might otherwise be destroyed in the event of an attack.
Manageability and portability
Manageability encompasses rkt’s desire to make the container runtime and the applications in the containers it executes monitorable, flexible, and modularly composable with other tools and workflows. This is addressed in rkt along three main axes: interposable isolation regimes; integration with standard service management and init systems, like systemd; and networking through a consistent abstraction, the Container Network Interface (CNI).
Stages of execution and isolation in rkt
rkt is implemented in modular parts, referred to as *stages* of container execution, similar to chaining in operating system bootloaders. The rkt binary is referred to as *stage0*, and it is primarily an options processor that executes *stage1*, a binary responsible for executing the applications in a set of container images called a *pod*. By clearly separating these responsibilities, rkt can control container execution in a variety of ways, and developers can implement their own custom stage1 images for advanced uses.
The default stage1 in rkt is the familiar software isolation of Linux kernel cgroups and namespaces, analogous to the container isolation provided by predecessors like Docker and LXC. A container image is verified by rkt, and this default stage1 creates the cgroup and namespaces in which it will run before executing the container’s entry point or main application.
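The kernel facilities this default stage1 builds on can be illustrated in a few lines of Go. The sketch below starts a shell in fresh UTS, PID, and mount namespaces; it is a minimal, Linux-only illustration of the isolation primitives (and typically requires root), not rkt’s stage1 code, and it omits cgroup limits, filesystem preparation, and everything else a real stage1 does.

```go
//go:build linux

package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Run a shell in new UTS, PID, and mount namespaces. Inside, the process
	// sees itself as PID 1 and can change its hostname without affecting the
	// host -- the same kernel facilities software container isolation builds on.
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	if err := cmd.Run(); err != nil {
		log.Fatalf("failed to start namespaced process: %v", err)
	}
}
```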
As part of their Clear Containers project, Intel created an alternative rkt stage1 image, referred to as the *LKVM* stage1. Rather than using Linux kernel cgroups and namespaces, this stage1 runs an identical container image atop its own kernel and operating system, by making the KVM Linux hypervisor the primitive of isolation. This enables high-security or high-performance isolation of containers without imposing a different container image build process or deployment tool, and illustrates the concept of the container image as a modern apt-get, a way of packaging applications that fits many styles of deployment with a standardized distribution artifact.
rkt process model and service management
Some new challenges in service management have arisen as unintended side effects in other container engines, evidenced by miniature re-implementations of basic process lifecycle management historically provided by init(8) on Unix systems, or more commonly by systemd today. A well-known example is Yelp’s dumb-init, created to handle process management and cleanup inside their Docker containers. The need for utilities like dumb-init arises in part from the impedance mismatch between the usual host system management tools and the Docker engine daemon that imposes itself as a container process manager.
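The underlying problem is easy to see in code: whatever runs as PID 1 inside a container inherits orphaned children and must reap them when they exit, or they linger as zombies. The toy reaper loop below shows the minimum such an init process has to do; it stands in for what dumb-init or an in-container systemd provides and is not taken from either.

```go
//go:build linux

package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// A process running as PID 1 inherits every orphaned descendant.
	// If it never calls wait(), exited children linger as zombies.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGCHLD)

	for range sigs {
		// Reap every child that has exited since the last SIGCHLD;
		// signals can coalesce, so loop until nothing is waiting.
		for {
			var status syscall.WaitStatus
			pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
			if pid <= 0 || err != nil {
				break
			}
			log.Printf("reaped child %d (status %d)", pid, status.ExitStatus())
		}
	}
}
```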
rkt does not invoke a persistent daemon to manage containers. This makes rkt a much more traditional unix executable than a client-server daemon for container management, and it means rkt integrates more readily with service management systems on the host. On CoreOS Linux, this means systemd. rkt leverages systemd externally and internally: systemd socket activation is one method of container invocation, so that rkt is itself executed and managed like any other systemd service.
Perhaps more interestingly, rkt’s default stage1 simply runs a copy of systemd inside of each executing container. This means the code required to exec and reap and restart and otherwise manage process lifecycles is the popular, well-known and well-tested systemd, rather than an incomplete and less general implementation in the container engine itself, or hand-rolled by each application developer. It also means that in-container microservices can be more easily managed with systemd units, whether by administrators on a single host, or by serving as a simple container execution API, even for naive cluster orchestration software not specifically integrated with rkt.
Container networking in rkt: CNI
The rkt architecture reflects similar choices about modularity in how container networking is implemented. Rather than make a particular view of networking “native” to rkt, network routing for containers is outsourced to CNI. A detailed discussion of CNI is outside the scope of this article, but the principle is similar to the Unix Virtual File System or other abstraction layers. To connect a container to one or more networks, rkt relies on CNI. CNI, in turn, defines a JSON configuration format and relies on plugin modules to actually perform network setup without requiring the container engine -- or cluster orchestrator -- to know the details of that network. The CNI API is intentionally narrow, allowing third-party developers to implement an imaginative array of network configurators, including innovative examples from Project Calico and Weave.
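The flavor of that configuration format is easy to show. The short Go program below emits a minimal CNI-style network configuration; the “bridge” and “host-local” plugin names are common examples rather than rkt defaults, and real configurations carry additional, plugin-specific fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// netConf models a few common fields of a CNI network configuration.
// Plugins define their own additional keys, so this struct is illustrative
// rather than exhaustive.
type netConf struct {
	CNIVersion string   `json:"cniVersion"`
	Name       string   `json:"name"`
	Type       string   `json:"type"` // which plugin binary performs setup
	IPAM       ipamConf `json:"ipam"`
}

type ipamConf struct {
	Type   string `json:"type"`   // which IPAM plugin allocates addresses
	Subnet string `json:"subnet"` // address range handed out to containers
}

func main() {
	conf := netConf{
		CNIVersion: "0.3.1",
		Name:       "example-net",
		Type:       "bridge",
		IPAM:       ipamConf{Type: "host-local", Subnet: "10.22.0.0/16"},
	}
	out, _ := json.MarshalIndent(conf, "", "  ")
	// The container engine hands a document like this to the named plugin,
	// which performs the actual interface and route setup.
	fmt.Println(string(out))
}
```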
Orchestration
So far, we’ve described secure, portable containers running on a single host. How do the standard container image and rkt fit into the next layer up, the level of cluster orchestration? Modern applications typically run both at scales and with availability requirements beyond the ability of any single machine. Orchestration is about coordinating multiple servers to keep applications running quickly, reliably, and always.
Kubernetes
Kubernetes originated at Google as the successor to their internal Borg and Omega container cluster orchestration systems. Now a project of the Cloud Native Computing Foundation (CNCF), Kubernetes reflects the long experience of pioneers in container cluster architecture, most obviously Google itself, but also that of CNCF members like CoreOS, Red Hat, and many others.
Notably, CNI is the network plugin model for Kubernetes, and work to make rkt a first-class container execution environment for Kubernetes pods is near an alpha-release state today. The aim of this work is to make the underlying container engine a modular piece that can be swapped out to meet application and developer needs, much like CNI abstracts network details away from the core of Kubernetes. Paired with a standard container image format, Kubernetes makes portability three-dimensional by reducing the need for developers to care or even know “where” an application executes, instead presenting a mass of compute resources and constantly ensuring the deployed state of applications matches the developer’s declaration of desired state.
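That declarative model is worth pausing on: a Kubernetes controller continually compares the declared desired state with what is actually observed in the cluster and acts to close the gap. The control loop below is a drastically simplified, hypothetical sketch of that reconcile-forever pattern, not code from Kubernetes itself.

```go
package main

import (
	"fmt"
	"time"
)

// desiredState is the developer's declaration: how many replicas should run.
// In a real system the observed count would come from watching the cluster.
type desiredState struct {
	App      string
	Replicas int
}

// reconcile compares desired and observed state and returns the corrective
// action; a real controller would create or delete pods via the API server.
func reconcile(desired desiredState, observed int) string {
	switch {
	case observed < desired.Replicas:
		return fmt.Sprintf("start %d more replica(s) of %s", desired.Replicas-observed, desired.App)
	case observed > desired.Replicas:
		return fmt.Sprintf("stop %d replica(s) of %s", observed-desired.Replicas, desired.App)
	default:
		return "nothing to do"
	}
}

func main() {
	desired := desiredState{App: "myapp", Replicas: 3}
	observations := []int{0, 1, 3, 4} // simulated observations over time

	for _, running := range observations {
		fmt.Printf("observed %d running: %s\n", running, reconcile(desired, running))
		time.Sleep(10 * time.Millisecond) // a real loop re-checks continuously
	}
}
```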
Conclusion
At CoreOS, we like to refer to the modern approach to infrastructure as #GIFEE: Google Infrastructure for Everyone Else. From CoreOS Linux to intensive work on container standards, rkt, and Kubernetes, the goal is making the massive scale with minimal management that powers the datacenters of Google and other compute giants much more widely available, and the projects discussed here, all open source, comprise a set of components from which that infrastructure can be built.
At the technical level, rkt strives to take a single, standard container image, sign it to secure it, name it in a flexible way to discover it, and run it across many different providers and in different regimes of isolation without changing that container image. CNI tries to simplify the container network problem in a similar way, and Kubernetes harnesses those components to distribute work on large clusters of machines to keep applications available and fast. Like the open IETF standards that make the internet interoperable, a container specification is the keystone to making the container image a portable, interoperable application distribution artifact. Ideally, developers tomorrow will deploy application code in containers with no more concern for “where” that code runs than they have today for which CPU core in their laptop executes a program thread.
About the Author
Josh Wood’s passion for systems and infrastructure brought him to CoreOS, where he is responsible for documentation. When procrastinating, he enjoys photographing polydactyl cats and writing short autobiographies.