

Securing the Modern Software Delivery Lifecycle

 

 

Information security practice has become quite good at granting and managing people’s access to confidential information. But automation is taking over. Applications, servers, and even networks are no longer configured and deployed by hand. This is great: our systems and delivery pipelines are becoming faster and more robust. Automation, however, requires a shift in how we think about securing our infrastructure and the applications that run on it.

When delegating authority to non-human actors, we want to make sure they can only do what we ask. Modern infrastructure is made of cattle, not pets: the scale of modern distributed systems makes babysitting individual components too time-intensive, and a VM or container may run for less time than it takes to record its existence by hand. In this article, I will cover a few common steps in the modern software delivery lifecycle and share best practices for securing them.

Development

Checking secrets into source control is a problem. Not checking secrets into source control creates a new set of problems. If your application will not run without credentials, they become dependencies. Untracked dependencies become a maintenance nightmare.

There are several approaches to this problem. The simplest solution that works for your organization should win; if you put a complicated process in place, people will work around it. They’re just trying to get their work done.

A proper solution must satisfy these criteria:

  1. Simple flow for working with credentials locally.
  2. Least privilege is maintained; no shared keys.
  3. Credential rotation does not hinder development workflows.
  4. Rotation of and access to credentials are recorded in an audit log.

Let’s look at a few different ways to keep your secrets out of source control and see how they stack up.

| Workflow | Simple | Least privilege | Easy rotation | Audited |
| --- | --- | --- | --- | --- |
| Put secret values into environment variables or files on development machines. | yes | no | no | no |
| Encrypt your secrets and commit the encrypted values to source control. | yes | no | no | no |
| Offload credential handling to config management or deployment tooling. | no | maybe | no | maybe |
| Store references to secrets in source control that are resolved by a secrets source. | maybe | yes | yes | yes |

In the table above, “maybe” means it’s possible but depends on the implementation and scale. Placing secret values on development machines makes it hard to manage who has access to which credentials, and rotating those credentials requires manual coordination. Checking encrypted secrets into source control solves one problem and creates another: distributing the decryption keys, which are secrets themselves and must be managed. Embedding credential handling into config management or deployment tooling complicates those tools’ workflows. It also hinders collaboration: once these systems hold secrets, they must be locked down to a trusted set of users, which stifles innovation and slows velocity.

Storing secrets by reference allows the most flexibility, even though it may add some up-front complexity. Credentials are fetched at runtime and not left on the system. A tool like Summon lets you swap out secrets providers while keeping the same development workflow. However, there are not yet many tools in this space.
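A minimal Python sketch of the reference idea follows. It is not Summon’s implementation: the `NAME: !var path` line format is only loosely modeled on the kind of `secrets.yml` file Summon reads, and `parse_references`, `resolve`, `run_with_secrets` and the in-memory development provider are hypothetical. Only references live in source control; values are resolved at runtime and handed to the child process’s environment, never written to disk.

```python
import os
import subprocess
import sys
from typing import Callable, Dict, Tuple

# Hypothetical file committed to source control. It holds references, not values;
# the "NAME: !var path" syntax is only loosely modeled on Summon's secrets.yml.
SECRETS_FILE = """\
DB_HOST: db.internal.example
DB_PASSWORD: !var db/password
API_TOKEN: !var api/token
"""

def parse_references(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each variable name to ("ref", path) or ("literal", value)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("!var "):
            entries[name.strip()] = ("ref", value[len("!var "):].strip())
        else:
            entries[name.strip()] = ("literal", value)
    return entries

def resolve(entries: Dict[str, Tuple[str, str]],
            provider: Callable[[str], str]) -> Dict[str, str]:
    """Ask the pluggable provider for each referenced value at runtime."""
    return {name: provider(value) if kind == "ref" else value
            for name, (kind, value) in entries.items()}

def run_with_secrets(cmd, provider: Callable[[str], str]) -> int:
    """Run cmd with secrets only in the child's environment; nothing touches disk."""
    env = dict(os.environ)
    env.update(resolve(parse_references(SECRETS_FILE), provider))
    return subprocess.run(cmd, env=env).returncode

# Hypothetical development provider; in production this would query a vault instead.
DEV_SECRETS = {"db/password": "dev-password", "api/token": "dev-token"}

def dev_provider(path: str) -> str:
    return DEV_SECRETS[path]

# The child process sees the resolved value; the repository never did.
run_with_secrets([sys.executable, "-c", "import os; print('DB_PASSWORD' in os.environ)"],
                 dev_provider)
```

Because the provider is just a callable, swapping the development provider for a production one leaves the committed file and the developer workflow unchanged, which is the property the table above rewards.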

Continuous integration

Once you have more than a dozen projects, maintaining a reliable and secure continuous integration workflow can become a burden unless you take steps to head off the sprawl of complexity that tends to follow. Many of these steps are laid out in the Ten Factor CI Job.

Whether you’re using a SaaS CI system or hosting one yourself, committing your job configurations to source control is a great first step. Doing so lets you view changes to job configs over time and enables peer review of those changes. Should your CI system go down, you can restore it from source control, and your developers can run copies of the CI system locally to test changes before pushing them. Consider setting your CI UI to read-only and allowing changes only through source control. Self-hosted CI systems like Jenkins, GoCD, Bamboo and TeamCity all support this through authorization strategies and/or plugins. Hosted CI systems like Travis CI and CircleCI are driven by YAML files committed to the project directory, so they already enforce this pattern. If your CI tooling does not allow configuration via source code, it may be time to look at other CI options.
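If a self-hosted CI server still has a writable UI, one hedged way to catch edits that bypassed review is to diff each live job definition against its committed copy. In this sketch, fetching the live config is left out because it depends on your CI system, `config_drift` is a hypothetical helper built on Python’s standard `difflib`, and the job contents are made up for illustration.

```python
import difflib
from typing import List

def config_drift(committed: str, live: str, job: str) -> List[str]:
    """Unified diff from the committed job config to the live one; [] means no drift."""
    return list(difflib.unified_diff(
        committed.splitlines(),
        live.splitlines(),
        fromfile=f"repo/{job}",
        tofile=f"ci-server/{job}",
        lineterm="",
    ))

# Hypothetical example: someone edited the job in the UI to ignore test failures.
committed = "name: vizapp-build\nscript: make test\n"
live = "name: vizapp-build\nscript: make test || true\n"
for line in config_drift(committed, live, "vizapp-build"):
    print(line)
```

A non-empty diff is a signal to restore the job from source control and route the change through review.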

With CI job definitions now in source control, the same considerations discussed in the Development section above apply: secrets should not be checked into job definitions. Additionally, since CI jobs are short-lived, consider granting them access to secrets only while they’re running. A robust role-based access control system can grant a job temporary authorization to fetch the secrets it needs to build artifacts, then revoke that authorization when the job finishes. This greatly reduces the attack surface of your CI server.
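The grant-then-revoke pattern can be sketched with a Python context manager. `SecretsStore`, `job_authorization` and the secret and job identifiers are hypothetical stand-ins for a real role-based access control system; the TTL is an assumed backstop in case revocation never runs.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterable, Iterator, Tuple

class SecretsStore:
    """Hypothetical in-memory secrets store with per-identity, time-limited grants."""

    def __init__(self, secrets: Dict[str, str]) -> None:
        self._secrets = secrets
        self._grants: Dict[Tuple[str, str], float] = {}  # (identity, secret) -> expiry

    def grant(self, identity: str, secret_id: str, ttl_s: float) -> None:
        self._grants[(identity, secret_id)] = time.monotonic() + ttl_s

    def revoke(self, identity: str, secret_id: str) -> None:
        self._grants.pop((identity, secret_id), None)

    def fetch(self, identity: str, secret_id: str) -> str:
        expiry = self._grants.get((identity, secret_id))
        if expiry is None or time.monotonic() > expiry:
            raise PermissionError(f"{identity} may not read {secret_id}")
        return self._secrets[secret_id]

@contextmanager
def job_authorization(store: SecretsStore, job_id: str,
                      secret_ids: Iterable[str], ttl_s: float = 900) -> Iterator[None]:
    """Authorize a CI job for its run only; revoke on exit, with the TTL as a backstop."""
    secret_ids = list(secret_ids)
    for sid in secret_ids:
        store.grant(job_id, sid, ttl_s)
    try:
        yield
    finally:
        # Runs whether the build succeeds or fails.
        for sid in secret_ids:
            store.revoke(job_id, sid)

store = SecretsStore({"artifact-repo/api-key": "example-key"})
with job_authorization(store, "job:vizapp-build#42", ["artifact-repo/api-key"]):
    key = store.fetch("job:vizapp-build#42", "artifact-repo/api-key")  # allowed
# From here on, the same fetch raises PermissionError.
```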

Finally, be wary of granting your CI system wide-open privileges on your source control. This is a particular problem with many hosted CI solutions; by default, for example, they may request read/write access to every repository in your GitHub organization. For most workflows, read-only access with per-project tokens or access keys is good practice.
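As a rough illustration, a review step could flag token requests that exceed read-only, per-project access. The `RepoPermission` model and the `acme/...` repository names are hypothetical and do not correspond to GitHub’s actual scope names; map them onto whatever your source control host exposes.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class RepoPermission:
    access: str      # "read" or "write"
    repository: str  # "org/repo", or "org/*" for every repository in the org

def review_ci_token(project: str, requested: List[RepoPermission]) -> List[str]:
    """Return policy violations for a token that a CI job for `project` asks for."""
    problems = []
    for p in requested:
        if p.repository.endswith("/*"):
            problems.append(f"organization-wide access: {p.repository}")
        elif p.repository != project:
            problems.append(f"access to an unrelated repository: {p.repository}")
        if p.access != "read":
            problems.append(f"{p.access} access on {p.repository}")
    return problems

# A typical over-broad request: write access to the whole organization.
print(review_ci_token("acme/vizapp", [RepoPermission("write", "acme/*")]))
```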

Deployment

Continuous integration systems often double as deployment systems. On the surface this seems like a good idea: CI produces the artifacts, so why shouldn’t it deploy them as well? The problem is that the credentials and user access required to build artifacts are almost certainly not the same as those required to deploy them.

This is the metaphorical “putting all your eggs in one basket” or, as it’s known in the security community, “violating least privilege”. Mistakes happen; machines and people get compromised. Limiting the effect of a compromise is good practice. In a modern distributed system, each part should have access only to the resources it needs to do its job.

Expanding the role of continuous integration to include deployment also has a human cost. Now that the CI system holds all the keys to the kingdom, extra effort is needed to lock it down and prevent misuse. This often has the unintended effect of slowing down or locking out users in your organization, who will then work around it to get their jobs done.

Finally, automated governance is vital for deployment (and CI) systems. Limited visibility into what’s happening in your environments will slow your software delivery to a crawl. The answers to the following questions should be recorded and readily accessible to anyone.

Investigation questions for deployments:

  1. Is this a manual change? Someone accessing servers and deploying by hand.
    1. If so, how is that person given access to the servers?
  2. Is this an automated change? Software doing the deploy.
    1. If so, what system is responsible for taking this action?
    2. Is there a human gate at any step in the process? What is it?
  3. How does a person or software access the credentials needed to deploy code?
    1. Is that access recorded and cross-referenced with the deploy metadata?
  4. How is feedback given, positive or negative, on the deployment result?
    1. Is this feedback recorded? If so, where?
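One way to make these answers routinely available is to attach a structured record to every deploy. The sketch below is an assumption, not a standard schema: `DeployRecord`, its field names and the audit event ID are hypothetical, and the comments map each field back to the numbered questions above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class DeployRecord:
    """Hypothetical record that answers the investigation questions for one deploy."""
    artifact: str                          # e.g. "vizapp:a3ee40f"
    target: str                            # e.g. "host:prod/web01"
    manual: bool                           # Q1/Q2: by hand or by software?
    actor: str                             # Q2.1: person or system that acted
    access_path: Optional[str] = None      # Q1.1: how a person reached the servers
    human_gates: List[str] = field(default_factory=list)         # Q2.2
    credential_fetches: List[str] = field(default_factory=list)  # Q3/Q3.1: audit IDs
    feedback: Optional[str] = None         # Q4: deploy result, positive or negative
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def gaps(self) -> List[str]:
        """Questions this record cannot answer yet."""
        missing = []
        if self.manual and not self.access_path:
            missing.append("manual deploy without a recorded access path")
        if not self.credential_fetches:
            missing.append("credential access not cross-referenced")
        if self.feedback is None:
            missing.append("no feedback recorded")
        return missing

record = DeployRecord(artifact="vizapp:a3ee40f", target="host:prod/web01",
                      manual=False, actor="host:ansible01",
                      credential_fetches=["audit-0001"])  # hypothetical audit event ID
print(record.gaps())  # ['no feedback recorded']
```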

Consider creating separate roles and systems to handle build and deployment. For example, Jenkins can build a WAR file and push it to an artifact repository, then make an API call to a deployment tool like Rundeck to distribute the artifact. This clear separation of duties makes managing security policy and access for both systems easier than if they were combined.
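The separation of duties can be expressed as two roles whose permissions never overlap on the dangerous combination. This is a sketch under assumptions: the role and permission names are hypothetical labels, not Jenkins or Rundeck settings, and `check_separation` simply rejects any role able to both publish artifacts and reach servers.

```python
from typing import Dict, FrozenSet

# Hypothetical permission names; map them onto your real tools' access controls.
ROLES: Dict[str, FrozenSet[str]] = {
    # CI server (e.g. Jenkins): build, publish the artifact, request a deploy.
    "ci-build": frozenset({"source:read", "artifact:publish", "deploy:trigger"}),
    # Deployment tool (e.g. Rundeck): fetch the artifact and distribute it to hosts.
    "deployer": frozenset({"artifact:read", "host:ssh"}),
}

def authorize(role: str, permission: str) -> None:
    """Raise unless the role explicitly holds the permission."""
    if permission not in ROLES.get(role, frozenset()):
        raise PermissionError(f"role {role!r} lacks {permission!r}")

def check_separation(roles: Dict[str, FrozenSet[str]]) -> None:
    """Reject any role that could both publish artifacts and reach servers."""
    for role, perms in roles.items():
        if {"artifact:publish", "host:ssh"} <= perms:
            raise ValueError(f"role {role!r} combines build and deploy duties")

check_separation(ROLES)
authorize("ci-build", "deploy:trigger")  # CI may ask for a deploy...
authorize("deployer", "host:ssh")        # ...but only the deploy tool touches hosts
```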

Production

The number one barrier to compliance at scale is limited visibility into distributed systems. Audits can be harrowing ordeals if you don’t have the right kind of visibility into your environments. The amount of data generated can be immense. Much of it is operational: server load, requests per second, and so on. That is not the kind of data I mean here; the real goal is visibility into state changes.

Here is an example of an audit log that is capturing state change events.

```
[2016-02-23 18:26:07 UTC] vizapp:a3ee40f deployed to host:prod/web01 by host:ansible01
[2016-02-23 18:26:11 UTC] vizapp:a3ee40f deployed to host:prod/web02 by host:ansible01
[2016-02-23 18:43:12 UTC] user:dave ran command:"sudo nano /etc/service/postgresql.conf" on host:dev/db01
[2016-02-23 18:45:34 UTC] user:dave ran command:"sudo service postgres restart" on host:dev/db01
[2016-02-23 18:45:34 UTC] user:jill granted user:stan permission:ssh on host:dev/db01
[2016-02-23 19:01:01 UTC] user:jill revoked user:stan permission:ssh on host:dev/db01
```

Note that the format is standardized and that every event in the log records a state change. An immutable audit log of state changes is invaluable for both operational and compliance visibility. In the example log above, we can see that Dave edited the PostgreSQL config on a database server and restarted the service. That didn’t seem to fix the issue he was working on, so his supervisor Jill granted Stan temporary SSH access to the machine to help debug it. Jill then revoked Stan’s access when it was no longer required. This is the level of visibility you want when trying to make sense of what is happening in your systems.
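A standardized format pays off because it is machine-readable. The following sketch assumes the `[timestamp UTC] kind:value ...` shape shown above, with quoted values allowed to contain spaces; `parse_event` and `timeline` are hypothetical helpers that rebuild the history of a single entity, such as one host.

```python
import re
from datetime import datetime, timezone
from typing import List, NamedTuple, Tuple

LINE = re.compile(r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC\] (?P<body>.+)$")
# kind:value references; a value may be double-quoted and contain spaces.
TOKEN = re.compile(r'(\w+):(?:"([^"]*)"|(\S+))')

class Event(NamedTuple):
    when: datetime
    entities: List[Tuple[str, str]]  # e.g. [("user", "dave"), ("host", "dev/db01")]
    text: str

def parse_event(line: str) -> Event:
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not an audit event: {line!r}")
    when = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    entities = [(kind, quoted or bare) for kind, quoted, bare in TOKEN.findall(m["body"])]
    return Event(when, entities, m["body"])

def timeline(log: str, entity: Tuple[str, str]) -> List[Event]:
    """Every state change that touches one entity, oldest first."""
    events = [parse_event(line) for line in log.splitlines() if line.strip()]
    return sorted((e for e in events if entity in e.entities), key=lambda e: e.when)

SAMPLE = """\
[2016-02-23 18:26:07 UTC] vizapp:a3ee40f deployed to host:prod/web01 by host:ansible01
[2016-02-23 18:43:12 UTC] user:dave ran command:"sudo nano /etc/service/postgresql.conf" on host:dev/db01
[2016-02-23 18:45:34 UTC] user:jill granted user:stan permission:ssh on host:dev/db01
"""
for event in timeline(SAMPLE, ("host", "dev/db01")):
    print(event.when.isoformat(), event.text)
```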

Manual maintenance should be avoided at all costs. It is hard to track, error-prone, and quickly becomes impractical. SSHing into several machines in an auto-scaling group to debug a misbehaving application is not a sustainable process; changes should be rolled out en masse. Consider limiting, or even removing, SSH access to your most critical environments. A break-glass procedure can allow temporary escalation when needed, and that escalation should be recorded in an immutable audit log, as in the example above.
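The article does not prescribe how to make the record immutable. One common technique, swapped in here as an assumption, is hash chaining: each entry stores a SHA-256 digest over the previous entry’s hash plus its own content, which makes tampering evident rather than impossible. `ChainedLog` and `break_glass` are hypothetical; detecting truncation of the newest entries would additionally require storing the latest hash somewhere else.

```python
import hashlib
import json
import time
from typing import Any, Dict, List

GENESIS = "0" * 64

def _digest(prev: str, event: Dict[str, Any]) -> str:
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()

class ChainedLog:
    """Append-only log; each entry commits to the hash of the entry before it."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, event: Dict[str, Any]) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        self.entries.append({"event": event, "prev": prev, "hash": _digest(prev, event)})

    def verify(self) -> bool:
        """Editing any entry, or removing one before the tail, breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True

def break_glass(log: ChainedLog, approver: str, user: str, host: str,
                reason: str, ttl_s: int = 3600) -> None:
    """Hypothetical escalation step: record the grant and its expiry before it takes effect."""
    log.append({"action": "grant", "by": approver, "to": user, "permission": "ssh",
                "host": host, "reason": reason, "expires_at": int(time.time()) + ttl_s})

log = ChainedLog()
break_glass(log, "user:jill", "user:stan", "host:dev/db01", "debug postgres config")
log.append({"action": "revoke", "by": "user:jill", "to": "user:stan",
            "permission": "ssh", "host": "host:dev/db01"})
print(log.verify())  # True
```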

Automation is a much more reliable method for maintaining software. Most modern tools and processes output a record of what they’re doing, and processing this data into a common format and collecting it in one place is very worthwhile. For example, if you use a SaaS logging service for your application logs, it is also a logical place for your build, deployment and SSH access logs. Correlating events is much easier when they’re all in the same place. Be careful, though, not to capture sensitive data in your logs. The goal is to open up the data to anyone who wants it; having database passwords in your logs means you have to restrict access to only “trusted” users.
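A defensive measure is to scrub credential-looking values before records leave the process. The sketch uses Python’s standard `logging.Filter`; the `SENSITIVE` pattern is an illustrative assumption and will miss formats it does not name, so treat it as a safety net rather than a substitute for not logging secrets in the first place.

```python
import io
import logging
import re

# Illustrative pattern only; extend it for the credential formats you actually use.
SENSITIVE = re.compile(
    r"(?i)\b(password|passwd|secret|token|api[_-]?key)(\s*[=:]\s*)(\S+)")

class RedactSecrets(logging.Filter):
    """Mask credential-looking values before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = SENSITIVE.sub(r"\1\2[REDACTED]", record.getMessage())
        record.args = ()  # the message is already fully formatted
        return True

# Attach the filter to handlers: logger-level filters do not see records
# propagated up from child loggers.
stream = io.StringIO()  # stands in for a shipper to a central logging service
handler = logging.StreamHandler(stream)
handler.addFilter(RedactSecrets())
log = logging.getLogger("deploy")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("connecting to db01 with password=%s", "hunter2")
print(stream.getvalue(), end="")  # connecting to db01 with password=[REDACTED]
```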

Finally, delegating authority to automation tooling should be specific. For example, a config management script should only be able to pull artifacts, not publish them. Infrastructure orchestration scripts meant for deployment should not be able to tear down every server in your production account. Granting only the privileges required, and no more, will make your systems easier to understand.
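Specific delegation can be modeled as deny-by-default grants of an action on a resource pattern. In this sketch the identities, action names and resource paths are hypothetical, and `fnmatchcase` from Python’s standard library does the pattern matching: the config management identity can pull but not publish, and the orchestrator can deploy to production but only terminate servers in staging.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

# Hypothetical policy: (action, resource pattern) pairs per automation identity.
POLICY: Dict[str, List[Tuple[str, str]]] = {
    "config-mgmt": [("artifact:pull", "artifacts/*")],  # pull, never publish
    "orchestrator": [
        ("server:deploy", "prod/*"),
        ("server:terminate", "staging/*"),  # may tear down staging, not production
    ],
}

def allowed(identity: str, action: str, resource: str) -> bool:
    """Deny by default; allow only an explicitly granted action on a matching resource."""
    return any(action == granted and fnmatchcase(resource, pattern)
               for granted, pattern in POLICY.get(identity, []))

print(allowed("config-mgmt", "artifact:pull", "artifacts/vizapp/a3ee40f.war"))  # True
print(allowed("orchestrator", "server:terminate", "prod/web01"))                 # False
```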

Conclusion

Automation is not at odds with good security practices; rather, the two are mutually beneficial. The control and visibility afforded by automation can make large, complex systems much easier to understand. Security provides a safe environment in which to learn more about the systems we operate and tune them to better accommodate our needs. The ways we think about and implement automation and security will inform each other. The end result: more innovative and secure software.

About the Author

Dustin Collins is a polyglot engineer and Conjur's developer advocate. He organizes the Boston DevOps meetup and is most interested in how we as a community can iterate on our cultures, processes and tools to enable continuous delivery of quality software.
