Complex software projects often span multiple repositories because of external dependencies. This can be a challenge in itself, explains Google WebRTC engineer Patrik Höglund, who also described Google's approach to developing software such as Chrome, which uses dozens of third-party libraries.
The complexity of managing multi-repository projects stems from the need to keep track of changes in external dependencies, especially when their codebases change rapidly. In such cases, skipping dependency updates is simply not an option, because this would mean missing important fixes and new features. On the other hand, if external libraries are not updated often enough, the accumulated changes risk breaking the application and make catching up costly.
In Google's jargon, bringing in a new dependency version is called "rolling." Rolling may require changing Chrome's code to adapt to a new API, which could in turn break Chrome's test suite. This will not be known until the full test suite is run, by which point it is too late. To avoid breaking Chrome tests, developers use dedicated bots (continuous building/testing machines) to detect any issues that might arise from rolling a dependency.
Such bots, called "FYI bots," do not use the pinned version of a given dependency, say WebRTC, but build against WebRTC HEAD instead. Running the bot thus produces one of the following outcomes:
- The bot succeeded (is green): this means that a new roll will likely go smoothly.
- The bot failed to compile: this means that Chrome code must adapt to some new API.
- The bot failed testing (is red): this means that there was some semantic change or newly uncovered bug that needs to be addressed.
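The three outcomes above can be sketched as a small decision function. This is only an illustration: `BotRun`, `BotResult`, `Classify`, and `NextStep` are hypothetical names invented here and do not correspond to any real Chrome or WebRTC infrastructure API.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a single FYI bot run (assumed for illustration).
struct BotRun {
  bool compiled;     // Did Chrome build against the dependency's HEAD?
  int failed_tests;  // Number of failing tests, meaningful only if compiled.
};

// The three outcomes described in the list above.
enum class BotResult { kGreen, kCompileFailure, kTestFailure };

// Maps a run to an outcome: a compile failure takes precedence,
// because tests cannot run on a build that did not compile.
BotResult Classify(const BotRun& run) {
  if (!run.compiled) return BotResult::kCompileFailure;
  if (run.failed_tests > 0) return BotResult::kTestFailure;
  return BotResult::kGreen;
}

// Suggests the follow-up action associated with each outcome.
std::string NextStep(BotResult result) {
  switch (result) {
    case BotResult::kGreen:
      return "roll is likely to go smoothly";
    case BotResult::kCompileFailure:
      return "adapt Chrome code to the new API";
    case BotResult::kTestFailure:
      return "investigate a semantic change or newly uncovered bug";
  }
  return "unknown";
}
```

For instance, `NextStep(Classify(BotRun{false, 0}))` yields the "adapt Chrome code" action, since the build never reached the testing stage.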
To make this process run even more smoothly, Google developers added a new policy to reduce the chance that FYI bot builds could break. A broken build had the undesirable effect of keeping those bots from returning meaningful information until Chrome code was changed to adapt to the new APIs, which could take days. To solve this, developers started following the API prime directive, which consists in not breaking existing clients when evolving a component. One way of accomplishing this is to never remove an existing API and simply add new APIs as required. Suppose, for example, that we have the following method signature:
class WebRtcAmplifier {
...
int SetOutputVolume(float volume);
};
Now suppose that it needs to be changed to the following:
class WebRtcAmplifier {
...
int SetOutputVolume(float volume, bool allow_eleven);
};
In order to not break any clients, we can go with the following API:
class WebRtcAmplifier {
...
int SetOutputVolume(float volume);
int SetOutputVolume(float volume, bool allow_eleven);
};
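A minimal sketch of how both overloads might coexist is shown here, with the old signature forwarding to the new one. The method bodies, the volume limits (10 and 11), the return codes, and the `RunExample` helper are assumptions made for illustration; the source gives only the declarations.

```cpp
#include <cassert>

// Illustrative stand-in for the article's class. Everything beyond the two
// SetOutputVolume declarations is assumed, not taken from WebRTC.
class WebRtcAmplifier {
 public:
  // Original API, kept so that existing clients still compile.
  // It forwards to the new overload while preserving the old behavior.
  int SetOutputVolume(float volume) {
    return SetOutputVolume(volume, /*allow_eleven=*/false);
  }

  // New API, added alongside the old one instead of replacing it.
  int SetOutputVolume(float volume, bool allow_eleven) {
    const float max_volume = allow_eleven ? 11.0f : 10.0f;  // Assumed limits.
    if (volume < 0.0f || volume > max_volume) {
      return -1;  // Assumed error code: volume out of range.
    }
    volume_ = volume;
    return 0;  // Assumed success code.
  }

  float volume() const { return volume_; }

 private:
  float volume_ = 0.0f;
};

// Exercises an old-style and a new-style call site against the same class.
inline void RunExample() {
  WebRtcAmplifier amp;
  assert(amp.SetOutputVolume(11.0f) == -1);       // Old call site: capped at 10.
  assert(amp.SetOutputVolume(11.0f, true) == 0);  // New call site: 11 allowed.
}
```

Because the single-argument overload remains, a client written against the old API keeps compiling and behaving as before, and can migrate to the two-argument form later.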
Thanks to this policy, Google developers can keep their bots green, and if a bot ever goes red, they know the offending change should be rolled back.