BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage Articles Infrastructure Code Reviews

Infrastructure Code Reviews

DevOps is a movement. DevOps is a mindset. DevOps is devs and ops working together. DevOps is a way of organizing. DevOps is continuous learning.

The intangibility of DevOps makes it hard for leaders to come up with a clearcut roadmap for adopting DevOps in their organization. The meaning of DevOps is highly contextual, there are no popular methodologies with prescribed practices to abide by.

However, healthy organizations exhibit similar patterns of behavior, organization and improvement efforts. In this series we explore some of those patterns through testimonies from their practitioners and through analysis by consultants in the field who have been exposed to multiple DevOps adoption initiatives.

This InfoQ article is part of the series "Patterns of DevOps Culture". You can subscribe to receive notifications via RSS.

 

Works In Production

We were facing an issue with an open-source distributed database. Like many such systems it uses a gossip protocol to coordinate which nodes are responsible for which data (and making sure all nodes agree). Gossip is also used to detect failed nodes and ensure that client requests go to nodes that are up. In the case where no nodes responsible for a particular range of data are available, the database can not satisfy the request and instead must return an error to the client. This system was working well for us until nodes started going haywire and returning a stream of availability errors on restart (the cause was eventually tracked down to a side effect of a new feature). Getting spooky errors on restart is particularly nerve-wracking because routine operations like upgrades or changing settings that might help debug the problem become high-stress affairs.

Fortunately the database was open source, and we found a report of a similar problem a while ago where someone had sketched out a solution. I cleaned up and rebased the patch and after some internal review and testing pushed it to our production clusters. The patch worked(!) fixing a vexing production problem before a long holiday. However, after posting the patch with the Works-In-Production seal of approval, someone pointed out that it would prevent the cluster from even starting in certain configurations. Simply reading the code and reasoning about the system had uncovered a fundamental flaw not discovered by running on dozens of production nodes.

Why Code Review

Code review is usually pitched as a software "development" practice. However, as infrastructure -- including configuration management, deployment scripts, provisioning manifests, packages, and runbooks -- becomes code, those artifacts become testable as code. Testability is an obvious virtue whose benefits are now well accepted by practitioners. Just as infrastructure code serves as a common language for both development and operations focused teams, review is the common forum for code discussions and brings with it another set of virtues.

There are several ways that engineering organizations review the source code that they write. Historically, all of the source code might be printed out and then reviewed by experts before the code is run. Engineers might meet as a group sporadically to review proposed changes at a high level (perhaps doubling as an architectural, product, or deployment procedure summary). Engineers might also audit an entire code base for security or compliance reasons. These are all fine practices. But this article is about the type of code review where each change is reviewed continuously as it is developed, such as sending a patch to a mailing list or a GitHub pull request.

There are several straightforward reasons for doing code review. Most directly, the process can discover bugs before the software is running in production when they are dramatically cheaper to correct. Code review also improves the "bus factor" for components by ensuring that at least two people understand how each change works -- and more importantly -- why it was made. Code review helps break down small enclaves of a particular style ("this is obviously team X code") and spread consistent best practices throughout an organization. Also, as a matter of professional development, it is a great way to learn from your peers.

Infrastructure code can be difficult to test, as it is rarely dominated by pure functions and almost by definition interacts extensively with the local OS and remote systems. While testing and code review never need be exclusive practices, review of infrastructure code has the significant advantage of always being applicable. Code may be difficult to test or require extensive scaffolding before the first unit test can even be written, but all changes can immediately be reviewed. Code review might even be necessary to get that test scaffolding working for infrastructure code. At AddThis we use a combination of chefspec (quick running verification like "Was the correct instruction passed to the configuration management system?") and serverspec (create a VM and check: "Was the right directory tree really created?") for testing our configuration management recipes. When to use which one while satisfying other desirable properties like minimizing duplication takes much more nuance than a simple unit test. Debate over the proper test cases are often responsible for more commentary than the actual code.

Reviewing in Practice

When trying to first introduce systematic code review organizations often get tripped up on when to insert the review. Should it be pre-push review (before the change has landed in the authoritative repository) or post-push (some time after the change has landed)? Since pre-push review happens before the change is deployed authors are incentivized to craft small changes that can be readily understood (since the change will not be deployed until someone else understands it) and reviewers have a chance to make meaningful suggestions before the code runs in production (this is far more rewarding for the reviewer). However, adding a new -- blocking -- step to the development process is a risky and potentially disruptive change. Post-push review still realizes many of the benefits of review in general and requires no initial changes to anyone's process.

The most important part is that all changes are reviewed and the ratio of pre- to post- review can be adjusted over time. An organization can start with mostly post-review (to minimize disruption) and gradually switch to mostly pre-review as attitudes towards review shifts and tooling improves. Infrastructure repositories may also tend towards smaller time based state changes such as the membership of a cluster or the desired version of a package. These changes may have a different trade-off for the cost of blocking vs value of review than large web services.

It may be tempting initially to formalize code review as a one way street, where senior engineers pontificate to a captive audience of more junior engineers. This is a mistake. All engineers have some expertise in particular non-overlapping areas, and the empathy required to do code review well is exactly the sort of non-technical skill that you want to encourage in your engineers. Matt Welsh is a well known systems researcher and computer science professor whose seminal SEDA paper even has its own wikipedia page. He describes his first experience with code reviews (by people old enough to be his students) as a life altering experience that dramatically improved the quality of his work.

Emphasizing that code review is an exercise in empathy also puts people in a positive mindset. It is perilously easy to fall into the trap of giving negative feedback along the lines of "This isn't how I would do it". That type of feedback isn't any fun to receive and might not make the code any better if acted upon, just more like the reviewer's code style. Instead, judge the change on its own terms: Does it do what it says it does? Is the problem diagnosed adequately or is this a shot in the dark? Does the change actually solve the problem? Does the commit message have enough context for why this change was made that will make sense later?

That doesn't mean there is no room for idiosyncrasies or styles. Code review can be a vehicle to promote whatever values your engineering team wants to emphasize. Perhaps your team really likes (well commented!) tricks in the tradition of Hacker's Delight or, after being burned repeatedly, values production robustness. Whatever the values are, code review is a way to reinforce them day to day. Since code review is also, by necessity, written down it can also be a venue to engage the necessary storytelling to ingrain your culture. It is one thing to point out that maybe a function ought to be a bit more efficient. Is is quite another to relate a story where a few lines of optimized bit twiddling resulted in huge real world performance gains. How a package management conflict caused a painful weekend outage that was fiendishly difficult to debug not only is a good way to explain why a test case would be an addition but also reinforces the importance of testing in general. (Provided of course that these stories are really relevant to the code at hand and not just the most fun stories to tell.) If you find your reviewers inadvertently writing long tales that's a great problem to have, but you should probably also find a better home for them -- such as internal wiki or blog -- than buried inside review comments.

Linting: More than just curly brace wars

Most of the engineers I work with primarily write in Java or other JVM languages. Java does not have a reputation as a particularly terse language and it is is not uncommon for lines to extend far beyond the traditional 80 character limit. Java developers also tend to prefer IDEs which aggressively take care of formatting. For a little while it was a running joke that despite all of this, every IDE by default placed a bright red vertical line at 80 columns and no engineer would ever disable the red barrier.

At the time it was also common to configure a style checker as part of the build, but to only occasionally check its output. This was my context when I was preparing a patch for a project that had integrated linting into its review system. One line with a series of string concatenations felt a bit awkward to break up, so I left it longer than the lint tool wanted. But the tool still went ahead and stuck a bright yellow "Lint Warnings TXT3 Line Too Long" on the diff. The feedback was quick and simple "Please fix this lint warning." Initially feedback over white-space didn't feel great (even when self aware of our own follies, it's hard not to think code you just wrote is pretty slick) and after all I had considered the warning and thought my taste and style judgment were decent. But as I became more familiar with this particular project I came to appreciate the strict linting. There were over 300k lines of code, and they all looked the same. More importantly patch review rarely got bogged down over style nits, (except when first time submitter deliberately ignored the rules!).

Inevitably as engineers start critiquing each other's patches (perhaps for the first time) someone will disagree over white-space, curly braces, or some other pedantic style issue. If your company has not already gone through the process of agreeing on a style guide this can trigger some very lively discussions.

This discussion is a good thing for all of the expected reasons; for example, that consistency makes code easier to read. It also lets you capture all of those formatting rules (or at least all of them that are free of false positives) as code so that computers can catch them quickly. This makes code review more efficient as reviewers can focus on more interesting things because linting has already checked the formatting. It's also just plain healthy for an organization to know how to agree about prickly cross-team issues lacking in glamour. At AddThis the java style guide for opening brace placement was settled by a Mortal Kombat tournament (and full disclosure: I lost).

Moving towards automatic mechanical checks also gets your team in the habit of systematically making changes across your entire code base. This might start with trailing whitespace, but snowballs in value as it progresses to fixing entire classes of bugs with agreed upon static analysis settings. That project that I submitted a patch to with consistent column length also checks the permissions of every file, undefined property in dynamic languages, and arguments to printf-style functions. Google is known for not only persistent refactoring of this sort but taking it one step further and pushing the checks into the compiler. Review with tests provides the confidence necessary for systematic refactoring and fixes while each linting review in turn improves the quality of reviewed code.s

Testing Towards Continuous Delivery

The final infrastructure piece after linting is integrating testing into the review flow. This carries a lot of the same basic benefits of linting (reviewers don't need to question if they were run correctly because they can see the results). But by inserting testing into an already asynchronous process you gain the ability to do more automated testing. Integration tests, serverspec tests (including spinning up multiple VMs), and data based regression tests can all take far longer than is reasonable for a single developer to always run on their workstation for every commit. Fortunately, running multiple expensive tests in parallel and reporting back the results is exactly what continuous integration systems are good at.

Review, linting, and testing combined will result in a self-reinforcing cycle where trunk is always expected to be stable. Review (like linting and testing) is a check in the release pipeline, and when a diff fails the followup fix (bug fix in code, linting rules with fewer false positives, or more stable tests) can also be reviewed. And if trunk is always stable, why not release after every commit? The practice of small frequent deploys has been well known since 10 deploys a day as have the virtues: accelerated time to market; easier analysis of problems; and quicker feedback. By emphasizing the importance of an always stable trunk, continuous delivery also reinforces the importance of thoughtful code review and adds another software quality flywheel.

About the Author

Chris Burroughs works at AddThis where he focuses on the intersection of infrastructure, systems performance, and developer productivity.  He writes or reviews code every day.

 

 

DevOps is a movement. DevOps is a mindset. DevOps is devs and ops working together. DevOps is a way of organizing. DevOps is continuous learning.

The intangibility of DevOps makes it hard for leaders to come up with a clearcut roadmap for adopting DevOps in their organization. The meaning of DevOps is highly contextual, there are no popular methodologies with prescribed practices to abide by.

However, healthy organizations exhibit similar patterns of behavior, organization and improvement efforts. In this series we explore some of those patterns through testimonies from their practitioners and through analysis by consultants in the field who have been exposed to multiple DevOps adoption initiatives.

This InfoQ article is part of the series "Patterns of DevOps Culture". You can subscribe to receive notifications via RSS.

Rate this Article

Adoption
Style

BT