After an initial beta program, HashiCorp have publicly released Atlas, a commercial platform that unites their open source tools for development and operations to create a version control system for infrastructure management. Atlas integrates HashiCorp’s Vagrant, Packer, Terraform, and Consul tooling, with the primary goal of promoting ‘automation, audit and collaboration on infrastructure changes’ across the modern datacenter.
The HashiCorp blog states that the goal of operations and infrastructure management is to deploy and maintain applications with an ‘automated, error-free, and auditable process’. However, organisations often rely on complex manual processes that are time-consuming, error-prone, and difficult to scale. Atlas attempts to solve this problem by providing integrated tooling and a collaborative workflow for both developers and operators, spanning application code push, deployment artifact creation, infrastructure provisioning, and ultimately application lifecycle management at runtime.
Atlas was released in beta during December 2014, and organisations using the product include Mozilla, Cisco and Capgemini. The HashiCorp blog proposes that Atlas provides the same benefits of application code version control systems for infrastructure code and configuration:
Just as version control for application code grants transparency, auditability, and collaboration to application development, version control for infrastructure accomplishes the same for infrastructure management on both public and private clouds. As a result, Atlas provides a common workflow to simplify deployments across one or many clouds.
Atlas integrates the following HashiCorp open source tooling, and provides them as a paid-for (per node) hosted service:
- Vagrant provides virtual machine box builds for the creation of ‘lightweight, reproducible and portable development environments’. The Atlas integration allows the creation, hosting and distribution of Vagrant boxes throughout a team (a minimal Vagrantfile illustrating this is shown after this list).
- Packer automates deployment artifact builds and storage. Packer can be run on Atlas to create and store versioned machine/instance ‘golden images’ for AWS EC2, Docker, VMware, OpenStack, and several other cloud/VM vendors.
- Terraform automates infrastructure provisioning and management across cloud/datacenter vendors. The Atlas integration with Terraform allows state to be stored in a single remote location (rather than on local operator machines), makes provisioning auditable, and enables collaborative review (of infrastructure ‘diffs’) and application of changes.
- Consul provides service discovery, configuration via a highly available key/value store, and a set of orchestration primitives (events, execs, and watches). Atlas integration enables cluster state visualisation, node monitoring and alerting on applications and associated infrastructure.
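As a simple illustration of the Vagrant integration, the sketch below shows a minimal Vagrantfile that pulls a box hosted on Atlas by its ‘username/boxname’ slug; hashicorp/precise64 is a publicly available example, and a team would substitute the name of its own Atlas-hosted box:

```ruby
# Minimal Vagrantfile (illustrative): fetch a box hosted on Atlas by its
# "username/boxname" slug. "hashicorp/precise64" is a public example box;
# a team would substitute its own Atlas-hosted box name here.
Vagrant.configure("2") do |config|
  config.vm.box = "hashicorp/precise64"
end
```

Running `vagrant up` against this file downloads the box from Atlas on first use and caches it locally thereafter.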
InfoQ sat down with Kevin Fishner, director of customer success at HashiCorp, and asked several questions about the public release of Atlas:
InfoQ: A key question many of our readers may have is ‘how is this different from using Chef, Ansible, Puppet or SaltStack ("CAPS") tooling in combination with a DVCS such as Git(hub)’?
Fishner: There are several differences between Atlas and runtime configuration management tools like Chef, Ansible, Puppet, and SaltStack. At a philosophical level, Atlas differs from other application delivery options in that it uses an open source foundation to solve the specific problems within the application delivery space, and then unites those open source components to form a cohesive platform. Those specific problems are 1) configuring servers, 2) provisioning servers, and 3) maintaining servers and the applications running on them. These problems are solved by Packer, Terraform, and Consul respectively. Atlas runs Packer, Terraform, and Consul remotely so all changes are versioned, auditable, and collaborative. We describe this modular approach in the Tao of HashiCorp, which draws inspiration from the Unix design philosophy. The CAPS tools you mentioned also have open-source foundations, but the offerings are more monolithic. The HashiCorp approach means teams can use pieces of the Atlas ecosystem (Packer, Terraform, Consul, Vagrant, Vault) without being forced to buy in to additional tools that they may not want.
At a technical level, the major difference is that Atlas embraces immutable infrastructure and build-time configuration, rather than runtime configuration. For those unfamiliar with immutable infrastructure, the workflow is to first build a deployable artifact such as an Amazon Machine Image (AMI), Google Compute Engine image, OpenStack image, Docker container, etc., and then provision a host using this fully configured artifact. With runtime configuration, the host is provisioned first, and then configured at runtime. Immutable infrastructure leads to faster deploys, identical host configurations, and an overall more scalable design.
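To make the build-time versus runtime distinction concrete, the following is a minimal sketch of a Packer template that bakes configuration into an AMI before any host is provisioned; the region, source AMI, and installed package are illustrative placeholders:

```json
{
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-1",
    "source_ami": "ami-0123456",
    "instance_type": "t2.micro",
    "ssh_username": "ubuntu",
    "ami_name": "web-app {{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "inline": [
      "sudo apt-get update",
      "sudo apt-get install -y nginx"
    ]
  }]
}
```

Every build produces a new, versioned image (the {{timestamp}} suffix keeps names unique), so hosts launched from it are configured identically and never drift at runtime.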
Overall, Atlas can be described as a version control system for infrastructure. Where a DVCS holds the Packer and Terraform configurations, Atlas is responsible for turning those configurations into properly provisioned infrastructure. It's interesting that we spend so much time and effort version controlling application code, yet the infrastructure that runs the application is not versioned. Developers have beautiful, intuitive tools that enable collaboration, but operators are largely left in the dark. With Atlas, operators now have the ability to responsibly collaborate on and make changes to infrastructure.
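As a sketch of what such version-controlled infrastructure code might look like, the illustrative Terraform configuration below provisions a host directly from a pre-built artifact such as the AMI produced above (the region, AMI ID, and resource name are placeholders):

```hcl
# Illustrative Terraform configuration: provision a host from a
# fully configured artifact (e.g. a Packer-built AMI).
provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "web" {
  ami           = "ami-0123456" # placeholder for the Packer-built AMI ID
  instance_type = "t2.micro"
}
```

Because the image is already configured, applying this plan only has to launch instances; no configuration management runs on the host at boot.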
InfoQ: What size of organization or size of infrastructure estate will benefit the most from using Atlas?
Fishner: Teams with more than one full-time operator benefit the most from Atlas, as it provides transparency into all steps in the application delivery process and allows operators to collaborate on changes. This is similar to version control systems for application code — a team working on a new application project would never start without a version control system like GitHub to centrally manage the project and collaborate on changes. The same is true with operators and a version control system for infrastructure (VCI), which is what Atlas accomplishes.
InfoQ: What would be the core drivers/motivations for moving away from a CAPS/GitHub stack to Atlas? And what would the migration path look like?
Fishner: First, it's important to note that the stacks aren’t mutually exclusive. You could use Terraform in Atlas to provision infrastructure, which is then configured at runtime by a configuration management tool. If a team wants to use immutable infrastructure, the configuration management playbooks can still be used to build deployable artifacts with Packer. So for example, rather than running Chef at runtime, Chef would be run by Packer to build fully configured AMIs, which are then provisioned by Terraform. The migration path there is extremely simple: just reference those Chef cookbooks in your Packer build! Overall, the motivations really depend on whether a team wants to invest in immutable infrastructure or runtime configuration.
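As an illustrative sketch of that migration path, a Packer template can invoke existing Chef cookbooks at build time via Packer's chef-solo provisioner; the cookbook path and run list below are placeholders:

```json
{
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-1",
    "source_ami": "ami-0123456",
    "instance_type": "t2.micro",
    "ssh_username": "ubuntu",
    "ami_name": "web-chef {{timestamp}}"
  }],
  "provisioners": [{
    "type": "chef-solo",
    "cookbook_paths": ["cookbooks"],
    "run_list": ["recipe[web]"]
  }]
}
```

The cookbooks themselves are unchanged; only the point at which they run moves from host boot time to image build time.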
InfoQ: Although Consul provides basic monitoring, it doesn't compete with the emerging container/cloud monitoring-as-a-service offerings from Datadog, Boundary or New Relic. Would this be a future area of interest for HashiCorp?
Fishner: We see the monitoring tools as complementary to Consul. Eventually, Consul could act as a “data pipe” to send metrics to your monitoring tool of choice.
InfoQ: Comprehensive testing is also currently an unsolved problem with microservice-based applications (and the corresponding infrastructure). Do you have any guidance on this for our readers? For example, can you TDD with Terraform (maybe using something like ServerSpec?), and how should developers test their applications end-to-end when deploying with Atlas?
Fishner: First off, this is a great question and I agree that it’s a huge unsolved problem. The reason is that it is extremely difficult to spin up environments that precisely mirror production. Terraform actually enables teams to do this, and we’ve seen companies spin up staging environments and run ServerSpec there. Additionally, because Terraform separates the “plan” and “execution” phases, teams of operators can validate that configuration updates are creating the proper changes before applying them to a production environment. It's also important to note that build-time configuration with Packer, rather than runtime configuration, will catch a lot of errors before they reach production. For example, if a Puppet run fails while building an AMI, the broken artifact simply never ships.
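For readers unfamiliar with that separation, a typical session looks roughly like the following (the plan file name is arbitrary):

```
$ terraform plan -out=staging.tfplan   # compute the proposed changes and save them
# ...the team reviews the plan output...
$ terraform apply staging.tfplan       # apply exactly the reviewed plan
```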
InfoQ: Many thanks for answering these questions. Is there anything else you would like to share with the InfoQ readers?
Fishner: It’s important to note that Atlas is workflow focused, not technology focused. What I mean by this is that Atlas solves the inherent problem of operations: moving an application from code in development to running in production through a safe, repeatable, and auditable process. This can be done on the infrastructure provider of your choice (right now we support AWS, GCE, Azure, OpenStack, DigitalOcean, and more coming soon) and with the technologies of your choice (VMs, containers, configuration management, etc). Because Atlas is built on HashiCorp’s open source tools, teams are able to select exactly which features of Atlas they want to use, and are not locked in to the platform. As a company, we acknowledge that datacenter technologies are changing rapidly, and we will ensure that the operations workflow stays the same, even if the technologies do not.
More information on the public release of Atlas can be found within the HashiCorp ‘Atlas General Availability’ blog post, and additional details of Atlas and the associated tooling can be found on the HashiCorp website.