Mesa CI is a continuous integration system at Intel that runs builds and compliance test suites for the Mesa graphics library. It spans more than 200 systems and executes tens of millions of tests per day.
The Mesa project is an open source implementation of graphics standards such as OpenGL and Vulkan. Intel and AMD use it as the basis for their graphics drivers. It acts as a translation layer between a graphics API and the hardware drivers. Mesa developers use a framework called Mesa CI for continuous integration, especially for running their test suites. Mesa needs to support a wide variety of vendor graphics drivers as well as different versions of the API standards. This calls for a comprehensive set of tests that runs with every commit to verify functionality and performance. Piglit, dEQP, VK-GL-CTS, and Crucible are some of the test suites that run on Mesa CI. At the recent X.Org Developer’s Conference, Mark Janes and Clayton Craft shared some details about Mesa CI.
Mesa CI is a set of configuration files, a job scheduler and a job implementation that can run on Jenkins. Written mostly in Python, it is driven by the principle that "the most important design consideration for the Mesa CI is to minimize configuration in Jenkins". The Mesa CI can theoretically run on top of any CI infrastructure, not just Jenkins, according to the documentation. It’s currently used for developer testing, release verification, pre-silicon (hardware) testing in simulators for Intel drivers, performance testing and validation of conformance test suites. The typical developer testing turnaround time is 30 minutes even though a commit to the master branch kicks off millions of tests. A custom database provides immediate access to test history, and the system also generates performance trend lines for common benchmarks.
Mesa CI was started in 2014, but the benefits of automated tests at Mesa were understood before that. Since then, the release process has been formalized and has evolved (PDF). In a previous article (PDF), Janes shared the philosophy behind setting up continuous integration for Mesa. Its principles included making tests a first-class artifact and prioritizing test reliability and run time.
Image Courtesy - https://xdc2018.x.org/slides/Mesa_Continuous_Integration_at_Intel.pdf
Each platform has a separate CI config file, and some test suites need a separate config for 32-bit builds. A test failure caused by a commit triggers a series of steps, some of which are manual. The failed tests are added to a "skip list" in the CI config. However, this is not done by the developer, and it is unclear whether that is because the test suites lack a way to annotate test cases so that they are ignored, a feature available in common test frameworks like JUnit and NUnit. Tests in the skip list still run, but their failures are not reported. This preserves test coverage until the bug is fixed.
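The skip list behaviour can be illustrated with a minimal Python sketch. It is not taken from the Mesa CI code base: the `Result` type, the function names, and the test names are all hypothetical, and the real config format is not shown in the talk. The sketch only captures the idea that every test runs, while failures of tests on the skip list are left out of the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Outcome of one test run (hypothetical type, for illustration only)."""
    name: str
    passed: bool


def reported_failures(results, skip_list):
    """Return the names of failing tests that are NOT on the skip list.

    Every test in `results` has already been run; the skip list only
    controls which failures reach the report.
    """
    return [r.name for r in results if not r.passed and r.name not in skip_list]


def skipped_but_passing(results, skip_list):
    """Return skip-listed tests that now pass, i.e. candidates for removal."""
    return [r.name for r in results if r.passed and r.name in skip_list]


# Hypothetical test names and skip list.
results = [
    Result("piglit.example_a", True),
    Result("piglit.example_b", False),
    Result("deqp.example_c", False),
    Result("crucible.example_d", True),
]
skip_list = {"deqp.example_c", "crucible.example_d"}

print(reported_failures(results, skip_list))
print(skipped_but_passing(results, skip_list))
```

Because skip-listed tests keep running, a sketch like this can also flag tests that have started passing again, which is one way the coverage mentioned above could be recovered once the bug is fixed.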
When a feature is developed on a branch that lacks the fix for a certain bug, the affected test fails the build, because the CI configuration tracks the master branch. To handle this, Mesa CI records the blamed commit for every test status change. In this case, since the bug fix was pushed to the master branch, Mesa CI recorded the commit identifier at which the test started passing. It then checks whether the feature branch contains the fix. If not, it understands that the configured test status is wrong for that branch, i.e., the test is expected to fail there. Old stable branches also run on Mesa CI, as they carry a CI config whose test status is consistent with the source code on that branch. However, tests can still fail against older branches when there are hardware updates on the test machines, as such updates affect all branches.
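The branch check described above can be sketched as follows. This is an illustrative model under stated assumptions, not Mesa CI's actual implementation: `StatusChange`, `expected_status`, and the commit identifiers are hypothetical, and a branch is modelled as a plain set of commit ids. A real system would more likely ask git whether the blamed commit is an ancestor of the branch head, for example with `git merge-base --is-ancestor`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusChange:
    """A recorded change in a test's status (hypothetical record type)."""
    test: str
    commit: str      # commit blamed for the status change
    new_status: str  # "pass" or "fail"


def expected_status(test, changes, branch_commits, default="pass"):
    """Return the status a test is expected to have on a given branch.

    `changes` is assumed to be ordered oldest to newest. A status change
    only applies if the branch contains the blamed commit.
    """
    status = default
    for change in changes:
        if change.test == test and change.commit in branch_commits:
            status = change.new_status
    return status


# Hypothetical history: a regression lands in c2 and is fixed on master in c5.
changes = [
    StatusChange("piglit.example", "c2", "fail"),
    StatusChange("piglit.example", "c5", "pass"),
]
master = {"c1", "c2", "c3", "c4", "c5"}
feature = {"c1", "c2", "c3", "f1"}  # branched before the fix in c5

print(expected_status("piglit.example", changes, master))
print(expected_status("piglit.example", changes, feature))
```

In this model the feature branch contains the regression but not the fix, so a failing result there matches the expectation and need not fail the build.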
Future plans for Mesa CI include showing the logs and status of components during build execution, and letting developers do A/B comparisons of builds. A public dashboard is also available.