Dropbox runs close to 35,000 builds and millions of automated tests every day. Some of these tests fail because of code changes, while others fail in a non-repeatable manner. At Dropbox's scale, it is impractical to manually disable flaky tests, revert bad commits, and notify test owners of failures, so Athena was built to automate these actions. Athena notifies test owners of deterministic test failures, and detects and quarantines flaky tests. It does not yet revert breaking commits.
The code commit cycle starts with basic "pre-submit" tests, followed by merging with the master branch and then "post-submit" tests. The post-submit category includes expensive tests such as UI and end-to-end tests. Tests in this latter category can fail due to environmental factors like time, date, and the underlying infrastructure. They can also fail due to inherent characteristics of the code, such as concurrency and random number generation. Dropbox previously had a rotating team that reverted such changes. InfoQ got in touch with Utsav Shah, software engineer at Dropbox, for more details about Athena.
Athena uses a multi-step algorithm to determine if a test is flaky. Post-submit tests that fail multiple times in a period are flagged, but the code is still allowed to go through. The failing tests continue to run to determine whether the failure is due to flakiness or bad code. Athena can also pinpoint the commit where a test started failing. The system has reduced operational overhead at Dropbox by auto-quarantining such tests. Shah notes that Athena "monitors most of the tests for all services (back-end, front-end) at Dropbox. It does not monitor tests for the desktop or mobile applications yet."
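The distinction between a deterministic breakage and a flaky test can be sketched as a simple triage over a test's recent results. This is an illustrative sketch, not Dropbox's actual implementation; the class names, the classification rule, and the window size are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class TestHistory:
    """Recent pass/fail results for one post-submit test (hypothetical)."""
    name: str
    results: list = field(default_factory=list)  # True = pass, False = fail

    def record(self, passed: bool) -> None:
        self.results.append(passed)

def classify(history: TestHistory, window: int = 10) -> str:
    """Triage a test based on its recent results.

    A test that fails every time is likely broken by a commit and its
    owner should be notified; a test that both passes and fails on the
    same code is flaky and a candidate for auto-quarantine.
    """
    recent = history.results[-window:]
    if not recent or all(recent):
        return "healthy"
    if not any(recent):
        return "broken"   # deterministic failure: notify owner, bisect
    return "flaky"        # non-repeatable failure: quarantine

t = TestHistory("test_sync")
for passed in [True, False, True, True, False]:
    t.record(passed)
print(classify(t))  # flaky: mixed passes and failures on the same code
```

In practice the decision would also weigh whether the code under test changed between runs, but the mixed pass/fail signal is the core of flakiness detection.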
Continuous integration (CI) is used across Dropbox, although deployment strategies vary across services. Shah elaborates:
There’s a lot of backend services owned by specific teams, and there's a main web application that powers dropbox.com. We require all the relevant tests to be green to allow deploys for each service, with exceptions for non-live site/experimental services. Some service owners prefer a continuous deploy for their services, and others deploy manually.
Athena has a UI that can be used to monitor its status and provide visibility to developers, as Shah explains:
We show progress indicators per test, and the result of running the test on the different commits. Each test has a corresponding message that indicates which commit the bisect is on, and the lower and upper bound commit where the test could have been broken, so if a user is in a hurry, they can catch and resolve the issue themselves.
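The lower- and upper-bound commits Shah mentions are the state a bisect maintains as it narrows in on the breaking change. A minimal sketch of that search, assuming an ordered commit list and a hypothetical `test_fails_at` callback (Athena's real bisect runs the test in CI rather than calling a function):

```python
def bisect_breakage(commits, test_fails_at):
    """Binary-search for the first commit where a test fails.

    `commits` is ordered oldest to newest; the test is assumed to pass
    at commits[0] and fail at commits[-1]. Returns (lower, upper): the
    last known-good commit and the first known-bad commit.
    """
    lo, hi = 0, len(commits) - 1   # invariant: lo passes, hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if test_fails_at(commits[mid]):
            hi = mid               # breakage is at or before mid
        else:
            lo = mid               # still passing: breakage is later
    return commits[lo], commits[hi]

commits = ["a1", "b2", "c3", "d4", "e5"]
broken_from = 3                    # suppose the test starts failing at "d4"
good, bad = bisect_breakage(commits, lambda c: commits.index(c) >= broken_from)
print(good, bad)  # c3 d4
```

At every step the `(lo, hi)` pair is exactly the bound the UI can show, so a developer in a hurry can inspect the suspect range before the bisect finishes.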
What about monitoring the Athena system itself? Shah comments that more needs to be done in this regard:
We have a suite of unit tests that use a fake of our internal CI system that help catch regressions in the business logic, like a bug in the bisect. We have integration tests against the CI system to verify that the API guarantees are met. This catches most problems.
Our service orchestration system at Dropbox has nifty auto-generated alerts which catch many basic issues. But we haven’t invested in custom real time monitoring of Athena yet. The system generally works unless there's an issue in the CI system, in which case we mostly know about it already. The lack of monitoring has caused problems a few times when certain upstream APIs are slow and time out, so we're thinking about adding some basic alerts to catch those.
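A test against a fake CI system, of the kind Shah describes, might look like the following. This is a hedged illustration only: `FakeCISystem`, `should_quarantine`, and the method names are invented for the sketch, not Dropbox's internal API.

```python
import unittest

class FakeCISystem:
    """In-memory stand-in for the CI API: serves canned test results."""
    def __init__(self, results):
        self._results = results   # test name -> list of pass/fail booleans

    def get_results(self, test_name):
        return self._results.get(test_name, [])

def should_quarantine(ci, test_name):
    """Quarantine a test that both passed and failed on recent runs."""
    results = ci.get_results(test_name)
    return any(results) and not all(results)

class QuarantineTest(unittest.TestCase):
    def test_flaky_test_is_quarantined(self):
        ci = FakeCISystem({"test_upload": [True, False, True]})
        self.assertTrue(should_quarantine(ci, "test_upload"))

    def test_consistent_failure_is_not_quarantined(self):
        ci = FakeCISystem({"test_upload": [False, False, False]})
        self.assertFalse(should_quarantine(ci, "test_upload"))

if __name__ == "__main__":
    unittest.main()
```

Because the fake implements the same interface as the real CI client, the business logic can be exercised quickly and deterministically, while the separate integration tests verify that the real API honors the same guarantees.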
Athena's roadmap includes auto-reverting breaking commits and bringing desktop tests into its purview. There are no plans to open-source it yet, due to internal dependencies.