The CloudSkiff team released an open source tool called driftctl that detects drift in Terraform-managed infrastructure.
Infrastructure-as-Code (IAC) tools make it easy to provision servers and other infrastructure components on public and private clouds. While this can increase software delivery velocity and make it easier to manage complex environments, differences in versions and configurations across the infrastructure can creep in. Even when tools like Terraform are used, incomplete coverage, less-than-perfect processes and emergency changes made directly to the infrastructure can lead to such differences, called infrastructure or configuration drift. Drift detection matters to ops teams both to ensure that components match the expected configuration and to ensure compliance.
driftctl reads Terraform state files and checks them against the actual running infrastructure. It currently supports Terraform as the IAC tool and works against a subset of AWS resources only, namely EC2, S3, IAM, RDS and Lambda, but support for GCP and Azure is part of the roadmap. The primary command in driftctl is "scan", which parses Terraform state files. It can also filter out resources by tags, to handle cases where one wishes to ignore unavoidable manual changes. A scan outputs the resources that are out of sync with the expected state, both in human-readable form and in JSON format for programmatic parsing.
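To make the state-parsing step concrete, here is a minimal Python sketch of listing the managed resources recorded in a state file and skipping those that carry a given tag. It is not driftctl's implementation. The trimmed state document, the resource names and the `managed=manual` tag are hypothetical, and the layout (a top-level `resources` list whose entries hold `instances` with `attributes`) is assumed to follow the version 4 state format.

```python
import json

# Hypothetical, heavily trimmed state document; real state files carry many more fields.
STATE_JSON = """
{
  "version": 4,
  "resources": [
    {"mode": "managed", "type": "aws_s3_bucket", "name": "logs",
     "instances": [{"attributes": {"id": "logs-bucket", "tags": {"team": "ops"}}}]},
    {"mode": "managed", "type": "aws_instance", "name": "web",
     "instances": [{"attributes": {"id": "i-0abc", "tags": {"managed": "manual"}}}]},
    {"mode": "data", "type": "aws_ami", "name": "base",
     "instances": [{"attributes": {"id": "ami-123"}}]}
  ]
}
"""


def managed_resources(state, ignore_tag=None):
    """Return (type, id) pairs for managed resources.

    ignore_tag is an optional (key, value) pair; instances carrying that
    exact tag are skipped, mimicking a tag-based filter.
    """
    found = []
    for resource in state.get("resources", []):
        if resource.get("mode") != "managed":
            continue  # data sources are lookups, not provisioned infrastructure
        for instance in resource.get("instances", []):
            attrs = instance.get("attributes") or {}
            tags = attrs.get("tags") or {}
            if ignore_tag is not None and tags.get(ignore_tag[0]) == ignore_tag[1]:
                continue
            found.append((resource["type"], attrs.get("id")))
    return found


state = json.loads(STATE_JSON)
all_resources = managed_resources(state)
filtered = managed_resources(state, ignore_tag=("managed", "manual"))
print(all_resources)
print(filtered)
```

The resulting list is the "expected" side of a drift check; the other side has to come from the cloud provider's APIs.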
InfoQ reached out to Stephane Jourdan, CTO and founder at CloudSkiff, to learn more about driftctl.
The authors of the tool spoke to around 200 DevOps teams to learn about infrastructure drift challenges. The key learnings from their survey (PDF) were that application and deployment induced drift is widespread, security issues are a major concern, and GitOps is not sufficient to prevent drift. 96% of the teams surveyed mentioned bypassing the IAC tool as the leading cause of drift, while 50% mentioned application and deployment induced drift. Some teams run "terraform plan" in a cron job - which outputs the changes that Terraform detects and thus indicates the differences between committed code and the running infrastructure. Jourdan elaborates on how driftctl aims to tackle this:
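The cron-based approach mentioned above can be sketched in Python as follows. This assumes the `terraform` binary is on the PATH and relies on the `-detailed-exitcode` flag of `terraform plan`, which exits with 0 when there is no diff, 1 on error and 2 when changes are present. The function names and status labels are illustrative only.

```python
import subprocess

# Meanings of the exit statuses of `terraform plan -detailed-exitcode`.
EXIT_MEANINGS = {0: "in sync", 1: "error", 2: "changes pending"}


def classify_plan_exit(code):
    """Map a plan exit status to a short, illustrative label."""
    return EXIT_MEANINGS.get(code, "unknown")


def check_drift(workdir):
    """Run `terraform plan` in workdir and classify the outcome.

    Returns (label, plan_output). Intended to be called from a scheduled job.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode), result.stdout
```

A scheduled job could call `check_drift("/path/to/stack")` and raise an alert on "changes pending". Such a check only covers resources that Terraform already tracks in its state, so resources created outside Terraform go unnoticed.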
driftctl compares the Terraform state files against the cloud provider APIs for unexpected modifications, but also and maybe more importantly the other way around (deltas from API to TF state) so we catch all manual changes on the console/API.
Image courtesy: driftctl. Used with permission.
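The two-way comparison Jourdan describes can be illustrated with plain set operations. The Python sketch below is a conceptual model, not driftctl's code: both inputs are hypothetical dictionaries mapping a resource ID to its attributes, one built from the Terraform state and one from the cloud provider's API.

```python
def compare(state, cloud):
    """Compare resources recorded in state with resources seen in the cloud.

    Both arguments map resource id -> attribute dict.
    """
    state_ids, cloud_ids = set(state), set(cloud)
    return {
        # In state but gone from the cloud (deleted outside Terraform).
        "missing": sorted(state_ids - cloud_ids),
        # In the cloud but absent from state (created outside Terraform).
        "unmanaged": sorted(cloud_ids - state_ids),
        # Present on both sides with differing attributes.
        "changed": sorted(
            rid for rid in state_ids & cloud_ids if state[rid] != cloud[rid]
        ),
    }


# Hypothetical sample data.
state = {
    "bucket-a": {"versioning": True},
    "bucket-b": {"versioning": False},
    "role-x": {"path": "/"},
}
cloud = {
    "bucket-a": {"versioning": False},  # modified in the console
    "bucket-b": {"versioning": False},
    "user-manual": {"path": "/"},       # created by hand
}

report = compare(state, cloud)
print(report)
```

The "unmanaged" bucket corresponds to the deltas from API to state: resources that exist in the account but that Terraform knows nothing about.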
Application induced drift happens when applications modify infrastructure components directly. Jourdan talks about how this can be handled:
It all comes down to keeping track of the changes and making sure they stay within acceptable limits. When you create a resource with Terraform, part of its parameters may need to be changed by some applications (tags, contents, up or downscaling values, etc…) which is something that you can be aware of and accept. But still, you need a reporting of those changes, and more importantly, you need alerts if those changes go beyond the boundaries that you set. Keeping track of those changes to allow for analysis whenever needed is essential. For example, you might want to store all values of a VERSION tag, and receive specific alerts on some of those values (like VERSION=error for example).
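The VERSION tag example could look roughly like the Python sketch below: every observed tag value is stored for later analysis, and an alert is signalled when a value falls in a configured set. The `TagTracker` class, its methods and the resource ID are hypothetical and only illustrate the idea of accepting some changes while alerting on others.

```python
from collections import defaultdict


class TagTracker:
    """Keep a history of tag values and flag values that should raise an alert."""

    def __init__(self, alert_values):
        # alert_values maps a tag name to the set of values that trigger an alert.
        self.alert_values = alert_values
        self.history = defaultdict(list)

    def record(self, resource_id, tag, value):
        """Store the observed value; return True if it should raise an alert."""
        self.history[(resource_id, tag)].append(value)
        return value in self.alert_values.get(tag, set())


tracker = TagTracker({"VERSION": {"error"}})
alerts = [
    tracker.record("i-0abc", "VERSION", value)
    for value in ["1.0.0", "1.1.0", "error"]
]
print(alerts)
print(tracker.history[("i-0abc", "VERSION")])
```

Only the last observation triggers an alert, while all three values remain available in the history.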
There are tools like terraformer or terraforming that can generate Terraform code based on existing infrastructure. "terraform import" can then be used to bring selected existing resources into the Terraform state, and the matching code can be committed to version control. Jourdan explains driftctl's roadmap in this context:
So far, driftctl detects and warns of infrastructure drift, but does not correct it. In the short run, our roadmap will essentially focus on adding more cloud providers on top of AWS (like Azure, GCP, etc…) and covering more services for each of them. But providing corrections of the drift events is definitely something that we’re planning as a second step of the project. Part of this remediation will be proposed as pull requests with some additional code matching the change detection.
Some public cloud providers have integrated tools to manage drift that work with their platform-specific provisioning tools. For example, AWS Config provides a managed rule, 'cloudformation-stack-drift-detection-check', that can detect drift in infrastructure managed by CloudFormation. Similarly, Azure has AzOps. There is also commercial software that performs drift detection, often as part of a compliance toolset. Examples include Fugue, OpsCompass, Pulumi, and Atomist.
driftctl is written in Go and available on GitHub under the Apache 2.0 license.