
Virtual Panel: Configuration Management Tools in the Real World

Configuration management is the foundation that makes modern infrastructure possible.  Tools that enable configuration management are required in the toolbox of any operations team, and many development teams as well. Although all the tools aim to solve the same basic set of problems, they adhere to different visions and exhibit different characteristics. The issue is how to choose the tool that best fits each organization's scenarios.

This InfoQ article is part of a series that aims to introduce some of the configuration tools on the market, the principles behind each one and what makes them stand out from each other. You can subscribe to notifications about new articles in the series here.


Configuration management tools are a hot topic in the DevOps community and in IT organizations in general. InfoQ reached out to users of each of the major tools (Ansible, CFEngine, Chef, Puppet and SaltStack) to ask them about their experiences.

The panelists:

  • Forrest Alvarez - Flowroute, SaltStack user
  • Kevin Fenzi - Fedora, Ansible user
  • Miguel João - OutSystems, Chef user
  • Mike Svoboda - LinkedIn, CFEngine user
  • Richard Guest - GeoNet, Puppet user

The questions:

  1. Can you describe the context in which your configuration management tool is used?
  2. What criteria did you use when choosing your configuration management tool?
  3. How was the tool introduced in your organization, and what process changes did it bring? Given your experience, would you recommend doing anything differently?
  4. Where does the tool excel?
  5. Where is the tool lacking? Are those shortcomings important in any respect?
  6. Are there any scenarios or contexts where you would suggest a different tool?
  7. Do you use your configuration management tool in scenarios that might not be obvious to someone who does not use that tool, or configuration management tools in general?
  8. Did the adoption of the tool have any impact on the way the different groups in your organisation interact with each other, especially devs and ops?
  9. Where will the future of configuration management tools lead?
  10. Do you have any special advice for an organisation that is not yet using infrastructure configuration management?

InfoQ: Can you describe the context in which your configuration management tool is used?

Forrest (Salt): We're currently using Salt to manage an infrastructure of approximately 100 machines running a wide variety of applications, both internally developed and open source. The development team (when interested) and the DevOps team use Salt, with most of the work being completed by members of the DevOps group. We build, deploy, and manage systems using Salt multiple times every day.

Kevin (Ansible): Around 400 instances/machines/vms spread out over a number of datacenters.

A number of Fedora Infrastructure folks use ansible. Our core 'main' group uses it all the time, but we delegate the running of some playbooks to other groups so they can run things on their own schedules.

We have development, staging, and production instances. Changes are pushed to staging, tested, and then pushed to production. We also freeze our production infrastructure (where sign-off for changes is required) around Fedora releases.

Miguel (Chef): We use the Chef orchestration tool in our automatic cloud management framework to provision, update and manage our cloud offer infrastructure in a fully automated fashion. Both our public and private cloud core services are managed by this framework, which currently extends to over 200 web and database cloud servers across the globe; both the Development and Operations teams are responsible for developing and managing it. Chef is just one of the many technologies we use in this framework, mainly as a centralized scripting engine to remotely manage our infrastructure. By following a set of development guidelines when building the Chef recipes, we minimize the risk of change, as this scripting engine is a critical core component of our cloud offer.

Mike (CFEngine): We use CFEngine to automate all aspects of LinkedIn's Operations. CFEngine is used to push code to production, detect hardware failure, perform user / access administration, enable system monitoring, etc. Every aspect of taking bare metal hardware to serving production traffic is automated. Our server footprint in the datacenter is very large. We run several data centers globally. SVN is used as our revision control system. Each datacenter contains a read-only replica of the SVN dataset that is synchronized from the SVN master in seconds. The CFEngine Master Policy servers utilize this read-only replica to pull updates. From a high level, we have a four tier automation architecture:

  1. Master SVN instance (2 machines, one primary, one failover). Operations interacts with this machine to execute automation changes.
  2. Read-only replicated SVN instance at each datacenter (10 machines). These machines pull updates from the master every few seconds.
  3. CFEngine Master Policy servers (80 machines). The policy servers pull updates from the replicated SVN instance in each datacenter.
  4. Our servers running policy code (clients, 40k machines). Client machines use software load balancing to determine which CFEngine Master Policy server to pull updates from.

LinkedIn's machines are not considered "built" until they have run through all of our CFEngine automation policies. We will not put a machine on the network in the datacenter that is not under CFEngine's administration.
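
To make the fourth tier concrete, here is a hypothetical Python rendering of that "software load balancing" step: how a client might deterministically pick a policy server from its datacenter's pool. The hashing scheme and hostnames are invented for illustration; LinkedIn's actual mechanism may differ.

    import hashlib
    import socket

    # Hypothetical pool of CFEngine Master Policy servers for one datacenter.
    POLICY_SERVERS = ["cfpolicy%02d.dc1.example.com" % i for i in range(1, 9)]

    def pick_policy_server(hostname=None):
        """Hash the client's hostname onto the pool so load spreads evenly
        and each client keeps pulling from the same server between runs."""
        hostname = hostname or socket.gethostname()
        digest = hashlib.md5(hostname.encode()).hexdigest()
        return POLICY_SERVERS[int(digest, 16) % len(POLICY_SERVERS)]

    print(pick_policy_server())  # the server this client pulls policy from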

Only our system operations group is authorized to make automation changes within CFEngine. The learning curve of any automation framework is very large. Automation administrators need to understand how to perform rapid change at large scale without causing negative effects on production. There are several challenges associated with operations at large scale. Automation architecture is a much different line of work than what developers (engineers) usually operate in. Developers are focused on writing code to deliver individual products. Automation engineers enable a company to serve thousands of individual products in production. We found that it is much more scalable if developers submit requests to our operations team, who perform releases on their behalf. Similarly, our automation operations team would not be responsible for delivering Java code to develop a customer-facing product. The complexities of both roles within the company are best isolated by organizational responsibilities. The problem domains of system automation and software development are each large enough to dedicate personnel to individually.

To this point, it is necessary that an automation engineer has a programming background. Automation can be greatly enhanced with Python / Perl / Ruby / Bash integrated components. The ideal candidate for an automation engineer has a depth of programming experience and a breadth of operating systems fundamentals. How programming languages are applied differs between the developer and the automation engineer: the automation engineer uses systems programming to create "extensions" that further amplify and extend CFEngine's automation platform. An example of this is Sysops-API. We leveraged Python with Redis, driven by CFEngine, to create a framework that grants LinkedIn instant visibility into thousands of machines in seconds. Sysops-API allows engineering / operations to have questions answered about infrastructure without having to log into systems. CFEngine's policy language with programming extensions can be used to solve any business problem, even if it wasn't initially designed to do so. CFEngine delivers a platform where the automation engineer can use policy and programming to create company-wide, game-changing cornerstones of production infrastructure.
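
To make the Sysops-API idea concrete, here is a minimal sketch of the pattern Mike describes: an agent-driven snapshot of local state into Redis, queried centrally. This is a guess at the shape of such a system, assuming a reachable Redis instance; the key names and collected fields are invented.

    import json
    import socket
    import subprocess

    import redis  # third-party client: pip install redis

    def publish_host_state(r):
        """Run on each host by the automation agent: snapshot local state."""
        host = socket.gethostname()
        state = {
            "kernel": subprocess.check_output(["uname", "-r"], text=True).strip(),
            "uptime": open("/proc/uptime").read().split()[0],  # Linux-only
        }
        r.set("sysops:%s" % host, json.dumps(state))

    def query_fleet(r, pattern="sysops:*"):
        """Run from an admin host: answer questions about thousands of
        machines without logging into any of them."""
        return {k: json.loads(r.get(k)) for k in r.scan_iter(pattern)}

    if __name__ == "__main__":
        client = redis.Redis(host="redis.example.com")
        publish_host_state(client)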

We perform 15+ human-driven production changes each day, but hundreds of thousands (or millions) of modifications occur in production daily, driven by CFEngine. The true power of automation frameworks is the ability to build higher-level platforms. In a sense, we build automation "on top of" CFEngine's framework.

Some examples of what I define as a platform are described below:

  • Utilizing CFEngine to execute hardware failure detection routines in an independent Python utility, and reporting discovered failures back into our CMDB.
  • Distributing account information across thousands of hosts, and updating access rules on each machine locally (see the sketch after this list). This is similar in concept to delivering a complete copy of the LDAP database to each host. We can leverage the flexibility of centralized user administration without the overhead of performing network authentication on login requests.
  • Complex system monitoring. Allowing the monitoring agents established by CFEngine to react to changing conditions on the host, and to alert operations if human intervention is required.
  • Creating infrastructure that grants complete visibility of tens of thousands of machines, without having to directly log into them to understand their current state.
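
As a rough sketch of the second item above, the hypothetical snippet below renders local access rules from a centrally distributed account dump. The file paths, JSON format and group rule are all invented for illustration, not LinkedIn's implementation.

    import json

    def render_local_accounts(dump_path="/etc/account-dump.json",
                              out_path="/etc/local-accounts"):
        """Turn a centrally distributed account dump into local entries,
        so logins never need to hit the network."""
        with open(dump_path) as fh:
            users = json.load(fh)  # e.g. [{"name": "jdoe", "uid": 5001, ...}]
        lines = [
            "%s:x:%d:%d:%s:%s:%s" % (u["name"], u["uid"], u["gid"],
                                     u.get("gecos", ""), u["home"], u["shell"])
            for u in users
            if "admins" in u.get("groups", [])  # local access rule
        ]
        with open(out_path, "w") as out:
            out.write("\n".join(lines) + "\n")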

By using automation to build higher-level platforms, we can allow production changes to be computer-driven without human intervention. Modifying production does not have to be a human-initiated policy change. These higher-level platforms (driven by CFEngine's execution) can adapt to a changing environment in an extremely rapid and reliable manner.

In a metaphorical sense, CFEngine's automation executing on a machine is similar to a car driving down the road. Within the car, there are separate functioning systems that are designed for a specific goal:

  • Climate control systems
  • Audio / Entertainment systems
  • Automatic transmissions
  • Traction / Steering control

These systems rely on a "higher level functioning product" (the car) in order to operate. By themselves, they simply would not exist. Our CFEngine automation platforms are similar. A human can turn a car left or right, or alter the car's velocity (metaphorically, creating policy changes and commits to SVN). In contrast, the air conditioning system can determine that the interior temperature needs to be lowered 3 degrees (automation within a larger system of the car). Maybe traction control determines that the braking system needs to be engaged during a turn? These independent systems within the car can make thousands of decisions a second without human intervention.

By building higher level automation systems within CFEngine, thousands of changes are made to production without human intervention. This brings a revolutionary level of automation which most organizations never achieve. In production, there are metaphorically tens of thousands of cars on the freeway. Each car operates independently. Thousands of decisions are occurring in each car every minute. Together, CFEngine allows the traffic to continue to flow down the metaphorical freeway unhindered.

CFEngine's automation allows the swarm of machines to behave as intended by the automation engineer, regardless of each machine's "starting point." In a metaphorical sense, it's as if we have tens of thousands of self-driving cars traveling down the freeway without human intervention. Each car performs thousands of changes and decisions a second to achieve stability. Using declarative policy language, CFEngine converges each machine in production to a known good state. Traffic flows downstream in a very controlled, coordinated manner.

Richard (Puppet): We use puppet to manage around 110 to 120 (mostly heterogeneous) nodes. The term 'groups' would be inappropriate to describe our three-man team. That said, our "DevOps" team of three does cut across both Dev and Ops (although we don't have an Ops team in the traditional sense).

The three of us write puppet code, pretty much on a daily basis. Our code base is kept in a private Github repository. All merges are peer reviewed via pull-requests, and we operate two long running branches as puppet environments (staging and production). We generally only merge from staging to production once a week and the changes to nodes using the production branch are applied with some form of human intervention (i.e. our production nodes are running in a no-op mode).

InfoQ: What criteria did you use when choosing your configuration management tool?

Forrest (Salt): It had to be fast to set up, quick to learn, and easy to teach. It was also critical that the tool be written in Python and provide us with remote execution, as well as the ability to build and manage both physical and virtual systems. The tool also had to be extensible so we could have both patches we pushed back to upstream, as well as those that fulfilled our specific needs. Salt met all of these requirements thanks to a simple installation and setup process combined with the fact it's completely open source.

Kevin (Ansible): Simplicity and ease of use were very important to us. We add new folks from our community all the time to help out, and it's very good to have configuration that is simple and easy to explain to new people.

We also wanted something that was agentless (no desire to run an agent on every instance taking up valuable resources) and didn't have a difficult or heavyweight transport.

Miguel (Chef): Due to time constraints during the evaluation of tools for a new project, we selected a tool that delivered the expected results for remote infrastructure configuration and management via scripting, and that some of us already knew and had used at other organizations. Chef met those requirements, and it was the obvious tool of choice.

Mike (CFEngine): Our primary requirement was an automation framework that allowed for a declarative policy language, not an imperative one. CFEngine's architecture is based upon Mark Burgess' work in promise theory. Two excellent references are Mark's publications "In Search of Certainty", which describes the current production state at large organizations, and "Promise Theory, Principles and Applications", which details the science behind CFEngine's automation. Automation differs from typical engineering development in that any group of servers could be in thousands of different "starting states." Automation's goal is to converge infrastructure to a known good end state. With declarative programming, you describe the end state and allow the automation framework to converge infrastructure to this known good point. Imperative programming, commonly thought of as Java / Python / shell scripting, etc., attempts to describe in extreme detail how individual actions should occur. The problem with using imperative programming in automation frameworks is that it assumes a beginning state. Imperative programming assumes "these are my inputs, here is what my output should be..." In reality, production environments are in a constant state of wanting to unravel into thousands of independent, unrelated states - just like one of the cars swerving off the freeway. The automation platform must be able to handle the car in any starting position on the freeway (on the shoulder, in the slow lane, stopped in the fast lane, etc.) and still be able to deliver the car to its destination. Imperative programming would have to account for every one of these conditions. Declarative programming describes the end goal, and allows the automation system to make the necessary corrections to arrive at the end state.
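
The contrast Mike draws can be sketched in a few lines of Python (purely illustrative, not CFEngine syntax): the imperative version assumes a known starting state, while the declarative version only names the end state and converges whatever it finds.

    import os
    import subprocess

    # Imperative: each step assumes the state the author imagined, and
    # fails or misbehaves if a machine started somewhere else.
    def imperative_setup():
        subprocess.run(["useradd", "deploy"], check=True)  # errors if user exists
        os.mkdir("/srv/app")                               # errors if dir exists

    # Declarative: describe the end state; each promise is idempotent, so
    # the same policy is safe to apply from any starting point, repeatedly.
    def ensure_user(name):
        if subprocess.run(["id", name], capture_output=True).returncode != 0:
            subprocess.run(["useradd", name], check=True)

    def ensure_dir(path, mode=0o755):
        os.makedirs(path, mode=mode, exist_ok=True)

    def converge():
        ensure_user("deploy")
        ensure_dir("/srv/app")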

Richard (Puppet): It had to be open source, widely adopted and proven. We wanted something with excellent documentation and community involvement, availability of training in our region, and support hours that overlapped reasonably well with our time zone. Price was an obvious factor too.

InfoQ: How was the tool introduced in your organization, and what process changes did it bring? Given your experience, would you recommend doing anything differently?

Forrest (Salt): At my current organization, Salt was already in place when I started. It was introduced as a replacement for another configuration management tool, as Salt provided more features and functionality. As our teams are pretty small, there weren't many process changes, other than that we could deploy more often with fewer roadblocks since we followed the same deployment process.

My recommendation would depend on the size of the organization and their willingness to explore new ideas. I would probably start by bringing it to the teams that would benefit the most (operations), and then start showing people how much time had been saved, and what processes were automated to get them excited about the possibilities they could explore.

Kevin (Ansible): We are still migrating over to ansible from our previous tools. The process is going slower than I would have liked, but I think it's gone pretty smoothly.

Miguel (Chef): Chef was introduced for a new project related to the launch of our cloud offer, and its use was limited to that project only. This contained usage of the tool didn't have any impact on existing processes, but plans to extend its usage to other projects and teams are being considered. This is the ideal approach to introducing a new technology into an organization: it allows you to evaluate the fitness of the tool and decide whether or not to adopt it, in a controlled fashion with less impact on other ongoing projects.

Mike (CFEngine): Before we implemented CFEngine's automation, we were heavily dependent on our kickstart build infrastructure. Most companies / organizations are in this same situation. We spent thousands of hours customizing our kickstart post-install scripts so that after a machine was built, it had completed steps to come closer to a production-ready state. After the kickstart was complete, there were several more hours of "hand-tuning" the machine by a system administrator to complete the necessary steps to get a single machine into production.

Our first goal was to replace our complex post steps in kickstart with CFEngine's automation framework. We wanted kickstart to perform only the "minimum base O/S" installation, and then move any complexities of getting the machine into production into CFEngine policy. Once we could build machines with CFEngine just as well as we could by kickstarting them, it became much easier to maintain production. Complex build infrastructures suffer from a major flaw: they assume machines can be rebuilt at any point in time. This is not true. Machines stay in production for years. If modifications are made to the kickstart post section, they need to be applied to existing infrastructure by hand. In practice, this doesn't happen. Attempting to use the "server build process" for configuration management fails miserably. Alternatively, by putting all logic into CFEngine's automation policy files, we keep the machine constantly up to date, at every five-minute execution interval. CFEngine maintains the server throughout its entire lifecycle, not just with a few commands during the initial build. When all of production is maintained at a constant level, applying small incremental changes to satisfy business requests is trivial.
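
The lifecycle point can be sketched as below: rather than a one-shot build script, the agent re-converges every host on a fixed interval, so a policy change reaches machines built years ago just as it reaches new ones. This is only the scheduling idea in toy Python form, not how cf-agent is implemented; the policy step is a placeholder.

    import subprocess
    import time

    def converge_once():
        """Apply the full desired state. Every step must be idempotent so
        that running it thousands of times is as safe as running it once."""
        subprocess.run(["systemctl", "start", "sshd"], check=False)  # placeholder

    def agent_loop(interval_seconds=300):
        """Re-converge every five minutes for the machine's whole lifetime."""
        while True:
            converge_once()
            time.sleep(interval_seconds)

    if __name__ == "__main__":
        agent_loop()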

Our initial investment of thousands of hours in customizing our kickstart post scripts was a futile effort. We should have started using configuration management years earlier and saved ourselves the trouble. Once we shifted our operations to CFEngine's automation, we were able to deliver a much more advanced and flexible platform for engineering to develop linkedin.com.

Richard (Puppet): It was a fairly slow process at first. We made several design decisions that proved to be misinformed and had to be revised. The biggest speed-bump in adoption was the massive difference in our existing systems. We had a mix of RHEL, CentOS and SciLinux, at major versions 4, 5 and 6, running on 32 and 64 bit architectures. I think if we were doing this again, we would do less with puppet to accommodate the older OSs, and instead work harder on migrating. Modelling the actual current state was much harder than modelling the desired future state.

InfoQ: Where does the tool excel?

Forrest (Salt): Salt is great for a wide variety of tasks, from remote execution, to reactions based on what happens on the server. It also has a very small learning curve to get users started and lets them feel successful after just a few minutes. Salt also has one of the best configuration management communities on the internet, with a ridiculously active IRC, amazing documentation, and a drive to improve these aspects and get more people involved. The continual growth within the community ensures that new features are being implemented constantly which always provides a feeling of excitement when you review the patch notes.

Kevin (Ansible): Simplicity and ease of use. Ansible playbooks run in an easy-to-understand order and have a simple, clear syntax, so everyone can see what each play is doing without training or needing to know obscure language features.

Miguel (Chef): Our experience shows that the extensibility of the tool through Chef recipes, the runtime stability of the software, and the ease of integrating with it are the top advantages we're getting from Chef for our use case.

Mike (CFEngine): Automation platforms (regardless of implementation) are not natively easy to use. Metaphorically speaking, driving a car with a manual transmission for the first time has a learning curve. Scaling production to tens of thousands of machines introduces new challenges every day. CFEngine's automation grants operations personnel the necessary tools to accomplish their responsibilities at extreme scale. CFEngine gives us the tools to execute any sort of complex change, as if we were logged into every machine directly with a shell prompt. No request is so complex that it could not be accomplished with CFEngine's policy language.

We have made tens of thousands of human-driven changes to production with CFEngine. Every problem the business has presented to us, we have been able to solve with CFEngine's automation in a scalable way. In this sense, CFEngine is 100% extensible to the demands of the business in the hands of a skilled administrator.

With CFEngine's use of declarative policy language instead of imperative programming, the policy sets are simple to understand. Metaphorically, the automation engineer just needs to "describe the destination", similar to a car's GPS navigation. Humans describe, at the 10,000-foot view, what the end state of production should look like, while CFEngine is responsible for the low-level details of transporting the vehicle from A to B. When reading through CFEngine's policy files, this grants simplicity to complex infrastructure problems.

Richard (Puppet): Puppetlabs make it very easy to understand the basics of puppet and get started. After we got started, the extensibility of puppet was critical to our continued success. The introduction of Puppetlabs-supported modules in the forge gives confidence in the quality of the code being imported. We have also had great success with other community-driven extensions and support tools, such as adrienthebo/r10k (puppet code deployment tool), TomPoulton/hiera-eyaml (encrypting passwords that are managed by puppet) and razorsedge/puppet-vmwaretools (VMware tools installation), to name just a few.

InfoQ: Where is the tool lacking? Are those shortcomings important in any respect?

Forrest (Salt): Right now Salt is primarily lacking in pre-built configuration files that let you quickly set up an environment for a specific application. Like any young community, Salt is continually growing and moving fast, which sometimes leads to backwards-compatibility issues and bugs. While these aspects are definitely important, I've seen few other open source projects that are as active about fixing issues and addressing user feedback while providing a great user experience.

Kevin (Ansible): Fedora has a pure OSS policy, so there are some automation things that Ansible Tower can do that we still have to replicate in Fedora, like running ansible-playbook from cron or when changes are committed.

Miguel (Chef): During the development of our project, we identified a couple of features that could benefit from some improvement to better fit our use cases: one is the lack of atomicity in the Chef recipe set & execute operations, which forced us to implement a concurrency manager outside the tool to prevent multiple operations on the same server from overwriting that server's configuration; the other is the lack of a simple input/output parameter framework for Chef recipes, which limits the ability to send information into, and collect it from, the Chef scripts. For the time being and in the near future, these shortcomings do not seem that important to our projects.
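
A concurrency manager of the kind Miguel mentions might look like the hedged sketch below: a per-server exclusive lock wrapped around each remote Chef run, so two orchestration jobs cannot write one server's configuration at the same time. The file-lock approach and the knife invocation are assumptions, not OutSystems' implementation.

    import fcntl
    import os
    import subprocess
    from contextlib import contextmanager

    @contextmanager
    def server_lock(server, lock_dir="/var/lock/orchestrator"):
        """Block until no other job is operating on this server."""
        os.makedirs(lock_dir, exist_ok=True)
        with open(os.path.join(lock_dir, server + ".lock"), "w") as fh:
            fcntl.flock(fh, fcntl.LOCK_EX)
            try:
                yield
            finally:
                fcntl.flock(fh, fcntl.LOCK_UN)

    def run_recipe(server, recipe):
        with server_lock(server):
            # Hypothetical remote run; 'knife ssh' matches nodes by search query.
            subprocess.run(
                ["knife", "ssh", "name:" + server,
                 "sudo chef-client -o recipe[%s]" % recipe],
                check=True,
            )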

Mike (CFEngine): Automation frameworks assume that the administrator understands the behavior of their environment. For example, the business may request modifying the kernel behavior of 40,000 systems. Maybe 5 systems out of the 40,000 have been modified in an obscure way? If we simply apply our desired configuration to all 40,000 without taking these 5 systems into consideration, it's possible to introduce a negative change. In this sense, there is a "needle in the haystack" that needs to be discovered before policy changes can be made.

Before modifying production, administrators must have visibility into all systems that the change is about to affect. To address this shortfall, LinkedIn developed a system called Sysops-API (learn about it on slideshare or YouTube).

Sysops-API allows LinkedIn to audit production before changes are made. In a sense, we understand the impact of our change before we execute it. Some organizations are burned by automation because changes are executed without their impact being understood. They do not find these "needle in the haystack" systems, and inadvertently cause outages. Because CFEngine is so extensible, we were able to develop the Sysops-API infrastructure to grant us the visibility we required. This visibility allows us to safely execute thousands of human-directed changes without causing production outages.

Richard (Puppet): One frustration we have is the inability to view our environments separately via the PE console. It's not hugely lacking, but more of a feature we would like it to have. The only other thing we really miss is an out-of-the-box integration test suite.

InfoQ: Are there any scenarios or contexts where you would suggest a different tool?

Forrest (Salt): If the organization was heavily invested in Ruby (within development groups as well as operations) I would suggest they look at another tool. Salt is written entirely in Python, and I feel that choosing a tool which fits into your ecosystem is the most important factor.

Kevin (Ansible): Not off hand.

Miguel (Chef): When the infrastructure orchestration process requires different kinds of integration, Chef should be seen as a complement to a larger orchestration framework, one able to integrate with different kinds of systems and protocols (databases, web services, etc.) and also with Chef itself for scripting capabilities. Fortunately, our product already provides such capabilities, so there was no need to look any further for a global orchestrator, which reinforced the fitness of Chef as a centralized scripting engine.

Mike (CFEngine): No. CFEngine provides a wide enough framework that we are able to accomplish all demands requested by the business. Our agility in responding to business demands using CFEngine is unparalleled. Not only can we satisfy the business' demands, we are able to do so in a clear and concise manner that is very easily understood. Any other person (who was not involved with the original change) can look at CFEngine's policy language and grasp the original intention of the automation behavior.

Richard (Puppet): As I haven't used any other tools that I could reasonably compare, I couldn't recommend anything else.

InfoQ: Do you use your configuration management tool in scenarios that might not be obvious to someone who does not use that tool, or configuration management tools in general?

Forrest (Salt): Yes, I use Salt for my own projects, which includes 'deployments' to my personal blog. It's as easy as pushing to github and running Salt on the associated server. I've also used Salt to manage a Linux desktop, and it worked really well for that.

Kevin (Ansible): We use ansible to manage not only configuration on instances, but also to create and install them, which is pretty handy. Just plug in the variables and run the playbook, and instances are installed and configured from one playbook run.
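
"Plug in the variables and run the playbook" can be wrapped in a few lines; the sketch below shells out to ansible-playbook with its standard -i and -e flags, though the playbook name, inventory path and variables are made up.

    import json
    import subprocess

    def provision(playbook, inventory, extra_vars):
        """One playbook run: create the instance, then configure it."""
        subprocess.run(
            ["ansible-playbook", playbook,
             "-i", inventory,
             "-e", json.dumps(extra_vars)],  # -e accepts JSON extra vars
            check=True,
        )

    provision("new-instance.yml", "inventory/staging",
              {"instance_name": "proxy01", "memory_mb": 4096})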

Miguel (Chef): Our usage of Chef is limited to remote script execution on our server infrastructure, which is the basic usage one can expect from this tool. Beyond that, it's what we do with this capability that allows us to innovate in the areas of automatic provisioning, monitoring and self-healing of large enterprise application systems.

Mike (CFEngine): Yes. CFEngine doesn't just provide configuration management for LinkedIn. CFEngine has literally become the backbone of LinkedIn's operations. Without CFEngine's automation platform, we would be unable to operate production. CFEngine's usage goes far beyond typical configuration management. We are leveraging CFEngine to provide complete lifecycle management of our infrastructure. CFEngine manages all aspects of our system state. This is a far more complex notion than filesystem configuration drift, which is what would typically be thought of as "configuration management."

In a sense, configuration management is an outdated phrase, or a misnomer, for describing the benefits we receive from this software. CFEngine provides LinkedIn a platform with which we are able to respond to any business need at any scale. Our footprint in the datacenter has become irrelevant. Whether we have 1,000 machines, 10,000, or 1,000,000, CFEngine addresses business demands at a level that a single person can manage.

Richard (Puppet): Because of the heterogeneous nature of our infrastructure and the size of our "team", we don't like to run puppet in applying mode on our production systems. This could prove a bit of a headache, essentially having to manually turn the handle to deploy wide-ranging changes. However, we get a lot of assistance in these scenarios from one of the less documented but powerful features under the hood of PE: mcollective. After our weekly merge of changes from staging to production, we leave puppet running on our production systems in no-op mode, reporting potential changes (and occasionally failures) to the master, and collate them via the PE console's Event View. If we are happy with the changes a class or resource would make, we then trigger a tagged run for that specific class or resource across the infrastructure, as sketched below. This allows us to visualise the changes puppet wants to make before we approve them, without having to log in to every single node.
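
A stripped-down sketch of that review-then-apply cycle on a single node (the mcollective fan-out and PE console are omitted; the class name is hypothetical, but --noop, --no-noop and --tags are standard puppet agent options):

    import subprocess

    def report_pending_changes():
        """No-op run: report what puppet would change, changing nothing."""
        subprocess.run(["puppet", "agent", "--test", "--noop"])

    def apply_approved(tag):
        """Once the reported events look right, enforce just that class."""
        subprocess.run(["puppet", "agent", "--test", "--tags", tag, "--no-noop"])

    report_pending_changes()
    apply_approved("profile::ntp")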

InfoQ: Did the adoption of the tool have any impact on the way the different groups in your organisation interact with each other, especially devs and ops?

Forrest (Salt): Not really, both of our groups are quite good at communicating and working with each other, so Salt was simply another piece in the puzzle that allowed us to deliver faster, with better predictability and fewer issues.

Kevin (Ansible): Not really. We have always had a close relationship between our developers and operations folks. Migrating to ansible has just brought things even closer together.

Miguel (Chef): Adoption of the tool still hasn't grown outside the teams responsible for the cloud offer projects. But even within these teams, the ability to leverage Chef to perform bulk operations on a large cloud server infrastructure allowed us to coordinate the deployment and execution of Chef recipes very easily between the development and operations teams.

Mike (CFEngine): Engineering found that their requests to operations were satisfied within a single business day. Requests of any scale or complexity became possible to accomplish very quickly. They are astonished at how agile operations has become. Our agility has created a very healthy, trusting environment. Engineering is eager to request assistance using CFEngine's automation solutions. This, in turn, leads to more requests.

In a sense, our success with CFEngine's automation has become self-fueling. The more successful we are with the product, the more engineering requests operations to use it to solve complex problems.

Richard (Puppet): We already worked within the same team, so it hasn't had a huge impact.

InfoQ: Where will the future of configuration management tools lead?

Forrest (Salt): Probably in a direction where configuration management is able to easily handle your entire environment, from the routers and switches (something Salt is currently working towards), to bare metal, and public/private clouds. I'd also expect more pieces of software to support configuration management tools out of the box, providing the configuration files, and required steps to make their software work immediately so it becomes much more 'plug and play'.

Kevin (Ansible): I think there will be more movement into orchestration (which ansible excels at), where configuration happens in the order it needs to across a large group of servers or instances, instead of linearly on just one or two servers.

Miguel (Chef): Configuration management covers a lot of different aspects of IT service management. But as with many other software constructs, configuration management tools will most likely evolve in three main areas: simplicity, automation and cloud. IT service departments need to get more efficient in the delivery of their services, and the key improvements to configuration management tools that will assist in this evolution are simplicity and automation: simplicity in setting up and managing the tool, and automation of operations as far as possible. In addition, the migration of key IT services to cloud-based services is a growing trend, and configuration management tools will have to keep up, configuring and managing cloud services and infrastructures.

Mike (CFEngine): Ask me tomorrow! I'll be pushing 10-15 automation changes across tens of thousands of machines in minutes. LinkedIn is operating in the future. A very small number of companies in the world are able to respond to business demands in such a scalable and agile method. We have been able to develop a solution for every business problem that has been brought to our attention.

Richard (Puppet): The slow extinction of "traditional" (read crusty) sysadmins. And better sleep for the smart DevOps who employ these tools!

InfoQ: Do you have any special advice for an organisation that is not yet using infrastructure configuration management?

Forrest (Salt): Just go for it! Analyze your needs, pick the tool that suits them, and start tinkering. When you're looking at the benefits from a distance they can seem somewhat small, but once you start using configuration management you'll wonder how you ever lived without it. Fewer mistakes, faster deployments, and more time to work on important projects benefits everyone within the organization.

Kevin (Ansible): Your life will be made 100x easier by having CM in place. You don't have to do it all at once; consider starting with new machines to try it out.

Miguel (Chef): If the organization needs to provision and manage multiple systems, then an infrastructure configuration management tool is the way to go. Investing in automation for the operational tasks required to configure and manage a large infrastructure reduces the risk of manual operations, increases the scalability of the operations teams, and allows faster response times for mission-critical systems. When designing your system architecture, consider scale, automation and simplicity right from the design phase. This will increase adoption by users, and allow the tool to rapidly grow into a core asset within the organization.

Mike (CFEngine): Does your company have talented system administrators? Would you like to retain them? The quality of life for your administrators will increase once they learn how to wield CFEngine's automation framework to deliver scalable solutions. Once a business problem is solved in CFEngine policy, it never has to be addressed again. This allows your administrators to plan for the problems of tomorrow, instead of repeatedly correcting yesterday's issues.

Richard (Puppet): Just do it!

About the Panelists

Forrest Alvarez is a DevOps engineer focusing on automation and redundancy at Flowroute. He is active within the Salt community and one of the top contributors to the project on Github. As an engineer Forrest strives to push collaborative mentalities to their limit, subscribing to the school of thought that communication, understanding, and the use of open source software can empower organizations.

Kevin Fenzi has been running Linux servers for the last 10+ years and currently manages the diverse and 100% Open Source Fedora infrastructure for Red Hat.


Miguel João is a Lead Cloud Software Engineer engaged in the development of the core framework that supports the OutSystems public and private cloud offer. He is an experienced IT professional with over 8 years of experience in enterprise customer support, core product development and maintenance, infrastructure deployment and operation, and technical training.

Mike Svoboda is currently employed at LinkedIn as a Senior Staff Systems and Automation Engineer. Over the past four years, Mike has used CFEngine to scale LinkedIn to 100 times its original server footprint, all while reducing administration overhead.


Richard Guest is a Senior Software Engineer for GeoNet, a GNS Science project funded by the New Zealand Earthquake Commission to build and operate a modern geological hazard monitoring system in New Zealand. He leads a small team implementing DevOps processes across GeoNet data management and development systems and infrastructure, including real-time earthquake location and information delivery systems.

