In a recent blog post, Alberto Gimeno, a GitHub Actions engineer, shared how GitHub uses feature flags to enable frequent, safe deployments. GitHub puts every potentially risky change behind a feature flag, which lets it quickly disable the change if needed.
Feature flags, or feature toggles, are a technique in which new code ships to production switched off and is switched on only where it should be visible. As Charity Majors, CTO of Honeycomb, explains:
I am also a really big fan of feature flags. You need to decouple deploys and release because just shipping your code to production shouldn’t mean everybody automatically sees it.
Feature flags do add complexity to the code, so GitHub has several ways of validating that it works as expected. In development, feature flags can be toggled from the command line. In automated tests, they can be toggled from code. In production, they can be toggled through the request's query string.
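The post does not describe GitHub's flag API. As a rough sketch under stated assumptions, the Python below layers two of these mechanisms: a stored default that a CLI command or a test could set, and a per-request override parsed from the query string. The `FeatureFlags` class, the `_features` query parameter, and the `!name` syntax for disabling a flag are all hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import parse_qs


@dataclass
class FeatureFlags:
    """Hypothetical flag store: stored defaults plus per-request overrides."""

    defaults: dict = field(default_factory=dict)   # flag name -> bool
    overrides: dict = field(default_factory=dict)  # request-scoped flag name -> bool

    def set(self, name: str, enabled: bool) -> None:
        # What a development CLI command or an automated test might call.
        self.defaults[name] = enabled

    def with_query_string(self, query: str) -> "FeatureFlags":
        # Assumed syntax: "?_features=a,!b" enables a and disables b
        # for this request only; the stored defaults are not modified.
        overrides = dict(self.overrides)
        for value in parse_qs(query.lstrip("?")).get("_features", []):
            for name in value.split(","):
                name = name.strip()
                if name.startswith("!") and len(name) > 1:
                    overrides[name[1:]] = False
                elif name and not name.startswith("!"):
                    overrides[name] = True
        return FeatureFlags(self.defaults, overrides)

    def enabled(self, name: str) -> bool:
        # Request overrides win over stored defaults; unknown flags are off.
        if name in self.overrides:
            return self.overrides[name]
        return self.defaults.get(name, False)
```

Because `with_query_string` returns a new object with its own copy of the overrides, a query-string toggle affects only that request. The sketch does not check who may override flags; a production system would presumably need to restrict that.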
Gimeno describes how their CI pipeline covers both states:
We have two different builds: one that runs with all feature flags disabled by default, and another one that runs with all feature flags enabled by default. This drastically reduces the chances of not covering most code paths in automated tests properly.
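A minimal sketch of the two-build idea, assuming a hypothetical `FEATURE_FLAGS_DEFAULT` environment variable that each CI build sets differently. Any flag a test does not set explicitly falls back to that build-wide default, so the same test body exercises the disabled branch in one build and the enabled branch in the other. The `Flags` class and `order_total` function are illustrative only.

```python
import os
from typing import Dict, Optional


class Flags:
    """Hypothetical flag store. Flags without an explicit value fall back to a
    build-wide default, so one CI build runs with every flag off and another
    with every flag on."""

    def __init__(self, default: Optional[bool] = None) -> None:
        if default is None:
            # Assumed environment variable; each CI build exports a different value.
            default = os.environ.get("FEATURE_FLAGS_DEFAULT", "off") == "on"
        self.default = default
        self.explicit: Dict[str, bool] = {}

    def enabled(self, name: str) -> bool:
        return self.explicit.get(name, self.default)


def order_total(subtotal: float, flags: Flags) -> float:
    # Example production code with a flag-guarded branch.
    if flags.enabled("free_shipping"):
        return subtotal
    return subtotal + 5.0


def test_total_covers_subtotal() -> None:
    # One test body; which branch it runs depends on the build's default.
    assert order_total(20.0, Flags()) >= 20.0
```

CI would then run the same suite twice, for example `FEATURE_FLAGS_DEFAULT=off pytest` and `FEATURE_FLAGS_DEFAULT=on pytest`, so both sides of every flag-guarded branch get executed.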
Gimeno notes that feature flags let GitHub build features incrementally:
We don’t use long-lived feature branches. Instead, feature flags allow us to work on small batches, which brings us many benefits.
As Google notes:
One key benefit of the trunk-based approach is that it reduces the complexity of merging events and keeps code current by having fewer development lines and by doing small and frequent merges.
Small pull requests also make it easier for reviewers to validate a change. According to research by Chris Kemerer and Mark Paulk, reviewing more than 200 lines of code per hour reduces the effectiveness of the review. A study by SmartBear of a Cisco Systems programming team found that developers should review no more than 200 to 400 lines of code at a time; beyond that, their ability to detect issues dropped significantly. Gimeno echoes this finding, noting that "small batches are easier to review by other engineers in pull requests."
Gimeno acknowledges that producing small pull requests requires some upfront planning. Teams at GitHub use two approaches to splitting up their work. In the first, the main work is done in a larger pull request; once it is ready for review, smaller pull requests are extracted from it. In the second, engineers create branches off the branch behind the main pull request.
GitHub can target its feature flags in several ways. The simplest is to target individual users, which lets teams roll out features gradually and gather feedback. Where that isn't feasible, such as with a change that affects how data is stored, a flag can instead be enabled for an entire repository.
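To illustrate why the targeting level matters, here is a small sketch under stated assumptions: a hypothetical `Flag` type that is scoped either to users or to repositories. A user-scoped flag can differ between two people working on the same repository, while a repository-scoped flag gives everyone on that repository the same behavior, which is what a change to stored data would require. The names and the two-scope model are illustrative, not GitHub's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Flag:
    """Hypothetical flag that targets either individual users or whole repositories."""

    name: str
    scope: str  # "user" or "repository"
    actors: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        if self.scope not in ("user", "repository"):
            raise ValueError(f"unknown scope: {self.scope}")

    def enabled_for(self, user: str, repository: str) -> bool:
        # User-scoped flags roll out person by person. Repository-scoped flags
        # give every user of a repository the same behavior, as a change to
        # how that repository's data is stored would need.
        actor = user if self.scope == "user" else repository
        return actor in self.actors
```

For example, `Flag("new_diff_view", "user", {"octocat"})` is on only for `octocat`, whereas `Flag("new_storage_format", "repository", {"acme/api"})` is on for every user acting on `acme/api`.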
They also use what Gimeno calls dark shipping. Dark shipping, or dark launching, is a process in which a backend change is enabled for a subset of users or requests without the users' knowledge. As Martin Fowler notes:
Dark launching works best when it's a process that enhances existing user interactions and isn't something users choose to do.
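A common way to dark-launch a backend change is to keep serving the existing code path while also running the new one, for a sampled fraction of requests, and recording any disagreement. The sketch below assumes that pattern; the `dark_launch` function and its parameters are hypothetical, and the post does not say this is how GitHub implements it.

```python
import random
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def dark_launch(
    old: Callable[[], T],
    new: Callable[[], T],
    sample_rate: float,
    mismatches: List[Tuple[object, object]],
    rng: Optional[random.Random] = None,
) -> T:
    """Always return the existing path's result. For a sampled fraction of
    calls, also run the new path silently and record disagreements."""
    if rng is None:
        rng = random.Random()
    result = old()
    if rng.random() < sample_rate:
        try:
            candidate = new()
        except Exception as exc:  # the new path must never break the request
            mismatches.append((result, exc))
        else:
            if candidate != result:
                mismatches.append((result, candidate))
    return result
```

Because callers always receive the old result, users see no change, while the recorded mismatches indicate whether the new path is ready. This only suits read-only operations, since running a write through both paths would perform it twice.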
Gimeno says GitHub has improved its automation for removing the stale code left behind once a flag is fully enabled. They run a script that "is capable of deleting code blocks, such as if/else statements, and modifying boolean expressions, in addition to reindenting the code afterwards." In the future, they want the script to run automatically once a feature flag is considered stale.