Over the last few years, Dropbox engineers have rewritten their client-side sync engine from scratch. According to Dropbox engineer Isaac Goldberg, this would not have been possible without a clear testing strategy that allowed them to build the new engine and ship it on a quick release cycle.
What made the new sync engine, Nucleus, testable was the principle of "designing away invalid states" by leveraging Rust's type system. This was a clear step forward from the legacy sync engine, which had evolved in such a way that it could pass through invalid states on its way to a final, supposedly legal state. The key design decision that made this shift possible was representing Nucleus' state as three trees, each describing a consistent filesystem state: the remote filesystem, the local filesystem, and the last known fully synced state, known as the Synced Tree.
The Synced Tree is the key innovation that lets us unambiguously derive the correct sync result. If you’re familiar with version control, you can think of each node in the Synced Tree as a merge base. A merge base allows us to derive the direction of a change, answering the question: “did the user edit the file locally or was it edited on dropbox.com?”
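The merge-base idea above can be illustrated with a toy model. This sketch is not Nucleus code (which is written in Rust and works over whole trees); it assumes each file is summarized by a content hash in each of the three trees, with `None` meaning the file is absent, and all names (`NodeVersions`, `derive_direction`, `Direction`) are hypothetical. Comparing each side against the synced version is what reveals which side changed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Direction(Enum):
    NONE = "in sync"
    UPLOAD = "local edit: apply to remote"
    DOWNLOAD = "remote edit: apply to local"
    CONFLICT = "edited on both sides"


@dataclass(frozen=True)
class NodeVersions:
    """Content hash of one file in each of the three trees (None = absent)."""
    remote: Optional[str]
    local: Optional[str]
    synced: Optional[str]  # merge base: the last state both sides agreed on


def derive_direction(node: NodeVersions) -> Direction:
    # A side "changed" if it differs from the merge base.
    local_changed = node.local != node.synced
    remote_changed = node.remote != node.synced
    if local_changed and remote_changed:
        # Both diverged from the base: fine if they converged to the same
        # content, otherwise a genuine conflict.
        return Direction.NONE if node.local == node.remote else Direction.CONFLICT
    if local_changed:
        return Direction.UPLOAD
    if remote_changed:
        return Direction.DOWNLOAD
    return Direction.NONE


# The user edited the file locally: remote still matches the merge base.
print(derive_direction(NodeVersions(remote="h1", local="h2", synced="h1")))
```

Without the synced version, remote `h1` versus local `h2` would only say that the two copies differ, not which one should win.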
Another key difference between Dropbox's legacy sync engine and Nucleus lies in their concurrency models. While the legacy engine used threads freely, Nucleus runs all control tasks on a single thread and offloads only secondary operations, such as I/O and hashing, to background threads. For testing purposes, all background operations can be serialized onto the main thread, which makes tests deterministic and reproducible.
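One way to picture this split is a minimal Python sketch (Nucleus itself is written in Rust, and the names `InlineExecutor`, `SyncEngine`, and `content_hash` are hypothetical). The engine only hands hashing work to an executor and mutates its own state exclusively in `poll()` on the main thread. In production the executor could be a `concurrent.futures.ThreadPoolExecutor`; a test instead swaps in an executor that queues the work and runs it inline, in a fixed order, only when the test asks.

```python
import hashlib
from concurrent.futures import Future
from typing import Any, Callable, Dict, List, Tuple


def content_hash(data: bytes) -> str:
    """CPU-bound work the engine offloads (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()


class InlineExecutor:
    """Test-mode stand-in for a thread pool: submitted work is queued and only
    runs on the calling (main) thread when the test drains the queue."""

    def __init__(self) -> None:
        self._queue: List[Tuple[Future, Callable[..., Any], tuple]] = []

    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        fut: Future = Future()
        self._queue.append((fut, fn, args))
        return fut

    def run_pending(self) -> int:
        """Run queued work in FIFO order on this thread; return how many ran."""
        ran = 0
        while self._queue:
            fut, fn, args = self._queue.pop(0)
            try:
                fut.set_result(fn(*args))
            except Exception as exc:
                fut.set_exception(exc)
            ran += 1
        return ran


class SyncEngine:
    """Toy engine: all state mutation happens in poll(), which is only ever
    called from the main thread."""

    def __init__(self, executor: Any) -> None:
        self.executor = executor  # ThreadPoolExecutor in production, InlineExecutor in tests
        self.pending: List[Tuple[str, Future]] = []
        self.hashes: Dict[str, str] = {}

    def request_hash(self, path: str, data: bytes) -> None:
        self.pending.append((path, self.executor.submit(content_hash, data)))

    def poll(self) -> None:
        # Collect finished background results; leave unfinished ones queued.
        still_pending = []
        for path, fut in self.pending:
            if fut.done():
                self.hashes[path] = fut.result()
            else:
                still_pending.append((path, fut))
        self.pending = still_pending
```

Because the engine depends only on `submit`, the swap needs no change to engine code, and a test decides exactly when each background result becomes visible.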
The cornerstone of Dropbox's testing strategy is randomized testing, given the sheer number of edge cases that can arise when software runs on hundreds of millions of users' machines. To be effective, says Goldberg, randomized testing must be fully deterministic and reproducible. This is achieved by using a pseudo-random number generator and logging the seed used to initialize it, so that a failing test can be re-run with the same seed and investigated further.
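A minimal sketch of seeded, reproducible randomized testing using Python's `random.Random`; the helper names are hypothetical and this is not Dropbox's harness. The key discipline is that every random choice a test makes comes from the seeded generator it receives, so re-running with a logged seed replays the exact same run. The second helper mimics a nightly sweep that collects failing seeds, each of which Dropbox's CI would turn into a tracking task.

```python
import random
import secrets
from typing import Callable, List, Optional

RandomTest = Callable[[random.Random], None]


def run_randomized_test(test: RandomTest, seed: Optional[int] = None) -> int:
    """Run one randomized test; every random choice must come from `rng`."""
    if seed is None:
        seed = secrets.randbits(64)  # fresh seed when exploring
    rng = random.Random(seed)
    try:
        test(rng)
    except AssertionError:
        # Logging the seed is what makes the failure reproducible later.
        print(f"randomized test failed, replay with seed={seed}")
        raise
    return seed


def failing_seeds(test: RandomTest, runs: int, first_seed: int = 0) -> List[int]:
    """Nightly-style sweep: run `test` under `runs` seeds, collect failures."""
    failures = []
    for seed in range(first_seed, first_seed + runs):
        try:
            test(random.Random(seed))
        except AssertionError:
            failures.append(seed)
    return failures
```

Any seed returned by `failing_seeds` can be passed back to `run_randomized_test(test, seed=...)` to replay that exact failure under a debugger.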
Every night we run tens of millions of randomized test runs. In general, they are 100% green on the latest master. When a regression sneaks in, CI automatically creates a tracking task for each failing seed, including also the hash of the latest commit at the time.
Goldberg also describes in detail two testing frameworks used for Nucleus. The first, CanopyCheck, is designed to catch bugs in the planner, the core algorithm of the sync engine, which produces a series of operations that incrementally converge the three trees representing Nucleus' state. CanopyCheck generates three random trees, representing the remote filesystem, the local filesystem, and the last known synced state, then repeatedly asks the planner to process them until they have fully converged. Along the way, it checks a number of invariants, derived from an analysis of the generated trees, that ensure the final synced result is correct.
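The CanopyCheck loop (generate three trees, plan, apply, repeat until convergence, then check invariants) can be sketched with a toy model. Everything here is an assumption for illustration: trees are flat maps from path to content hash, the planner and its conflict policy (keeping the local edit as a hypothetical `.conflict` copy) are invented, and the two invariants are examples rather than CanopyCheck's actual ones.

```python
import random
from typing import Dict, List, Optional, Tuple

Tree = Dict[str, str]                # path -> content hash (absent key = no file)
Op = Tuple[str, str, Optional[str]]  # (target tree, path, new content or None to delete)


def plan(remote: Tree, local: Tree, synced: Tree) -> List[Op]:
    """Toy planner: emits operations that move the trees toward convergence."""
    ops: List[Op] = []
    for path in sorted(set(remote) | set(local) | set(synced)):
        r, l, s = remote.get(path), local.get(path), synced.get(path)
        if r == l:
            if s != r:
                ops.append(("synced", path, r))  # both sides agree: record it
        elif l == s:
            ops.append(("local", path, r))       # remote edit: download
        elif r == s:
            ops.append(("remote", path, l))      # local edit: upload
        else:
            # Conflict (hypothetical policy): keep the local edit under a new
            # name, then take the remote version at the original path.
            if l is not None:
                ops.append(("local", path + ".conflict", l))
            ops.append(("local", path, r))
    return ops


def apply_ops(trees: Dict[str, Tree], ops: List[Op]) -> None:
    for target, path, content in ops:
        if content is None:
            trees[target].pop(path, None)
        else:
            trees[target][path] = content


def random_trees(rng: random.Random) -> Dict[str, Tree]:
    trees: Dict[str, Tree] = {"remote": {}, "local": {}, "synced": {}}
    for path in ["a", "b", "c", "d"]:
        for tree in trees.values():
            content = rng.choice(["h1", "h2", "h3", None])
            if content is not None:
                tree[path] = content
    return trees


def check(rng: random.Random, max_rounds: int = 10) -> None:
    trees = random_trees(rng)
    initial = {name: dict(tree) for name, tree in trees.items()}
    for _ in range(max_rounds):
        ops = plan(trees["remote"], trees["local"], trees["synced"])
        if not ops:
            break
        apply_ops(trees, ops)
    else:
        raise AssertionError("planner did not converge")
    # Invariant 1 (example): all three trees are identical at the end.
    assert trees["remote"] == trees["local"] == trees["synced"], trees
    # Invariant 2 (example): every edit made on either side survives somewhere.
    final = set(trees["local"].values())
    for side in ("remote", "local"):
        for path, content in initial[side].items():
            if initial["synced"].get(path) != content:
                assert content in final, (side, path, content)
```

Combined with a seeded generator, `check(random.Random(seed))` can be run for many seeds, and any failing seed replays the same trees.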
CanopyCheck borrows an approach from Haskell's QuickCheck: for each failing test case, it tries to find the least complex input that still reproduces the failure. This process, called "minimization", is carried out in CanopyCheck by iteratively removing nodes from the input trees and checking that the failure persists.
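Greedy minimization in this spirit can be sketched as follows, again on flat path-to-hash trees rather than Nucleus' real trees. `minimize` repeatedly tries dropping one node from one tree and keeps the smaller input whenever the failure still reproduces, stopping when no single removal preserves it. The failure predicate `hypothetical_bug` and the input data are invented for the example.

```python
from typing import Callable, Dict, Iterator

Trees = Dict[str, Dict[str, str]]  # tree name -> {path: content hash}


def removals(trees: Trees) -> Iterator[Trees]:
    """Yield every copy of `trees` with exactly one node removed."""
    for name, tree in trees.items():
        for path in tree:
            candidate = {n: dict(t) for n, t in trees.items()}
            del candidate[name][path]
            yield candidate


def minimize(trees: Trees, fails: Callable[[Trees], bool]) -> Trees:
    assert fails(trees), "minimization needs a failing input"
    current = trees
    while True:
        for candidate in removals(current):
            if fails(candidate):
                current = candidate  # smaller input, same failure: keep it
                break
        else:
            return current  # no single removal keeps the failure


def hypothetical_bug(trees: Trees) -> bool:
    """Pretend the planner mishandles a path created on both sides with
    different contents and never synced."""
    r, l, s = trees["remote"], trees["local"], trees["synced"]
    return any(p in l and r[p] != l[p] and p not in s for p in r)


failing_input: Trees = {
    "remote": {"a": "h1", "b": "h2", "c": "h3", "d": "h1"},
    "local": {"a": "h1", "b": "h9", "c": "h3", "e": "h4"},
    "synced": {"a": "h1", "c": "h2", "d": "h1"},
}
minimal = minimize(failing_input, hypothetical_bug)
print(minimal)
```

Here the eleven-node input shrinks to the single path `b`, present on both sides with different contents, which is far easier to debug.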
The second framework, Trinity, targets the engine's concurrency, and race conditions in particular. Trinity relies heavily on mocking to interact with Nucleus in random ways, injecting all kinds of asynchronous behavior such as modifying the local or remote state, simulating I/O failures, and controlling timing. A subtle aspect of Trinity is that it runs entirely on the main thread alongside Nucleus, using Rust futures: Trinity intercepts all the futures used by Nucleus and decides which ones fail and which succeed. Trinity can also run in a non-mocked mode, using the native filesystem and network, to reproduce platform-specific edge cases, albeit at the cost of slower execution.
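Trinity's core trick, a single-threaded scheduler that intercepts every pending operation and decides when and how it completes, can be approximated in Python. Since Python has no direct equivalent of Rust's poll-based futures, this sketch substitutes generators: each engine task yields an I/O request and suspends, and a scheduler driven by a seeded `random.Random` picks which suspended request to resolve next and whether it succeeds or gets an injected failure. The names (`Scheduler`, `upload_task`, `InjectedIOError`) and the retry policy are hypothetical.

```python
import random
from typing import Any, Dict, Generator, List, Optional, Tuple

Request = Tuple[str, str, str]  # ("write", path, data): hypothetical op format
Task = Generator[Request, Any, None]


class InjectedIOError(Exception):
    """Failure the scheduler throws into a suspended task."""


def upload_task(path: str, data: str, state: Dict[str, str], retries: int = 5) -> Task:
    """Toy engine task: write a file, retrying on I/O failure."""
    for _ in range(retries):
        try:
            yield ("write", path, data)  # suspend until the scheduler resolves it
            state[path] = data
            return
        except InjectedIOError:
            continue
    state[path] = "<gave up>"


class Scheduler:
    """Runs all tasks on one thread; the seeded RNG decides which suspended
    request completes next and whether it succeeds or fails."""

    def __init__(self, seed: int, failure_rate: float = 0.3) -> None:
        self.rng = random.Random(seed)
        self.failure_rate = failure_rate
        self.log: List[str] = []

    @staticmethod
    def _advance(task: Task, blocked: List[Tuple[Task, Request]],
                 send: Any = None, throw: Optional[Exception] = None) -> None:
        try:
            req = task.throw(throw) if throw is not None else task.send(send)
        except StopIteration:
            return  # task finished
        blocked.append((task, req))

    def run(self, tasks: List[Task]) -> None:
        blocked: List[Tuple[Task, Request]] = []
        for task in tasks:
            self._advance(task, blocked)  # start each task up to its first request
        while blocked:
            task, req = blocked.pop(self.rng.randrange(len(blocked)))
            if self.rng.random() < self.failure_rate:
                self.log.append(f"fail {req[1]}")
                self._advance(task, blocked, throw=InjectedIOError(req))
            else:
                self.log.append(f"ok {req[1]}")
                self._advance(task, blocked, send=True)
```

Running the same tasks twice with the same seed yields an identical log and final state, so an interleaving that exposes a race can be replayed exactly.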
All in all, this approach allowed Dropbox engineers to rewrite their sync engine without risking regressions in the many bug fixes accumulated over the lifetime of the legacy engine. Goldberg's original article goes into much more detail than can be covered here and is well worth reading for those interested.