At ReactiveConf 2019 in Prague, Gil Tayar, senior architect and developer relations at Applitools, presented the specific issues behind CSS testing and how they can be addressed through methodology and tooling.
Tayar started by emphasizing a trend that has strengthened over the last five years in front-end development: front-end developers write automated tests for their own code themselves. A key reason behind that trend is the confidence that tests provide when adding, removing, or refactoring code. In recent years, the front-end community has progressively built testing methodologies based on unit testing, component and multi-component testing (using, for instance, JSDOM), and browser-based automation testing (with tools like Cypress or WebdriverIO).
However, front-end developers still often do not know how to write automated tests for CSS, resorting to manual testing or skipping CSS testing entirely. At the same time, such testing is key to automating the testing of responsive user interfaces. A central testing technique, functional testing, i.e., checking the output produced by feeding inputs to a function under test, is not an option for CSS. Tayar asserted that testing CSS is a hard problem because it is, by nature, visual rather than functional. The CSS testing problem can thus be reformulated as a visual testing problem, and Tayar proceeded with a list of methodologies, techniques, and tools to address it.
Tayar explained that the dream methodology consists of navigating to a page, taking a screenshot, and verifying that the screenshot looks good or conforms to some design system. This is akin to a pattern-recognition problem that could be tackled with machine learning techniques but, as Tayar lamented, machine learning algorithms are not that good (yet). The mitigating strategy suggested by Tayar is to replace visual testing with visual regression testing.
The following code illustrates the methodology (using Cypress's cy commands):

it('home page visual test', () => {
  // pick a viewport dimension
  cy.viewport(1024, 768);
  // navigate to the page
  cy.visit('http://localhost:3000');
  // bring the interface to some state by simulating some user actions
  cy.get('#searchBar').type('test');
  // take a screenshot and compare it to the previously validated baseline screenshot
  cy.matchImageSnapshot('home-page');
});
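Note that matchImageSnapshot is not built into Cypress; it typically comes from a snapshot plugin such as cypress-image-snapshot, which provides a command of that name. A minimal setup sketch, assuming that plugin and Cypress's default project layout:

// in cypress/plugins/index.js
const { addMatchImageSnapshotPlugin } = require('cypress-image-snapshot/plugin');

module.exports = (on, config) => {
  // register the Node-side image comparison hooks
  addMatchImageSnapshotPlugin(on, config);
};

// in cypress/support/commands.js
const { addMatchImageSnapshotCommand } = require('cypress-image-snapshot/command');

// make cy.matchImageSnapshot() available in tests
addMatchImageSnapshotCommand();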
The idea is to start with a validated (generally manually validated) baseline screenshot and test that what was, still is. Differences are automatically detected, and the tester manually accepts or rejects them. Manual intervention is necessary because the testing program cannot know whether the new screenshot is the result of new features, that is, a valid and expected screenshot. The tester must thus manually invalidate the baseline screenshot when that case occurs. Provided that false negatives are infrequent (i.e., program modifications do not systematically result in invalidating previous screenshots), this methodology offers progress over entirely manual visual testing.
Tayar, however, mentioned four issues with that methodology. First of all, taking a screenshot involves working around the heterogeneity of execution environments. Cypress under Chrome lets developers take screenshots of a window, a full page, a selector, or a region; Selenium and WebDriver natively support only screenshotting the browser's window. That first issue may be remedied by changing the tooling when needed and possible, or by resorting to existing commercial tools that provide the necessary screenshotting options. Such tools, for instance, produce a full-page screenshot by capturing the currently visible window, repeatedly scrolling down the page, and finally stitching all the screenshots into one.
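As an illustration of that screenshotting granularity, Cypress's built-in cy.screenshot command exposes a capture option selecting between viewport and full-page shots, while element screenshots are taken by chaining off a selector (the names and selector below are illustrative):

// full-page screenshot: Cypress scrolls and stitches the page
cy.screenshot('home-full', { capture: 'fullPage' });
// viewport-only screenshot: just the currently visible window
cy.screenshot('home-viewport', { capture: 'viewport' });
// element screenshot: only the region occupied by the matched element
cy.get('#searchBar').screenshot('search-bar');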
The second issue, comparing screenshots, is hard. The naive approach, pixel-by-pixel comparison, often does not provide reliable-enough results. What appears to be the same picture to a human will in fact be different image objects for the computer, as the exact pixels rendered may vary with the graphics card used, or with anti-aliasing. Tayar gave the example of the same JPEG picture on Chrome 67 and Chrome 68, with significant differences when compared pixel by pixel. He then gave another example of the same interface on the same machine, displayed twice in the same browser at a five-minute interval, which also presented significant pixel-by-pixel differences.
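The naive comparison Tayar criticized can be expressed in a few lines. A sketch using the open-source pixelmatch library (one popular diffing implementation, not necessarily the one used in the talk), which counts differing pixels between two same-sized RGBA buffers:

const fs = require('fs');
const { PNG } = require('pngjs');
const pixelmatch = require('pixelmatch');

const baseline = PNG.sync.read(fs.readFileSync('home-page-baseline.png'));
const current = PNG.sync.read(fs.readFileSync('home-page-current.png'));
const diff = new PNG({ width: baseline.width, height: baseline.height });

// threshold 0 means strict pixel equality -- the brittle, naive approach
const differingPixels = pixelmatch(
  baseline.data, current.data, diff.data,
  baseline.width, baseline.height,
  { threshold: 0 }
);

console.log(`${differingPixels} pixels differ`);
fs.writeFileSync('home-page-diff.png', PNG.sync.write(diff));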
A mitigating strategy against too-stringent pixel-by-pixel image comparison is to manually configure an acceptable difference threshold. The difference threshold may account for slight differences in color (typically not perceptible to the human eye) or for aliasing. The threshold must be tweaked regularly to significantly reduce the number of false negatives. As before, tools exist that address this issue in a more sophisticated way by applying advanced comparison algorithms that try to look at images the way a human would. Tayar emphasized that these tools are the most significant advance the visual testing field has seen in the last few years. Most of them have free and open-source plans and, as such, can be used by developers in a wide range of project contexts.
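With cypress-image-snapshot, for instance, such a threshold can be configured per test; a sketch, assuming the plugin setup shown earlier (the exact numbers are illustrative and need tuning per project):

// pass if less than 0.1% of pixels differ, and let the underlying
// per-pixel comparison tolerate slight color and anti-aliasing shifts
cy.matchImageSnapshot('home-page', {
  failureThreshold: 0.001,
  failureThresholdType: 'percent',
  customDiffConfig: { threshold: 0.1 },
});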
The third issue is related to managing the screenshot comparisons. As mentioned before, visual regression testing includes a manual part in which developers invalidate a past screenshot. This manual handling can become cumbersome when there are hundreds of comparisons to review. Tayar provided three mitigating strategies to alleviate the issue. One strategy consists of invalidating a large series of screenshots through the command line (with Cypress, this would be npm run cypress:run -- --env updateSnapshots=true). A second strategy consists of going through the directories where the snapshots are stored and replacing the current snapshots with the new snapshots where needed, thus removing the false negatives (a sketch of such a script follows below). The third strategy involves using commercial tools, which often include a dashboard to speed up the manual invalidation, with configurable levels of granularity.
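The second strategy amounts to promoting the freshly produced screenshots over the stored baselines. A hypothetical Node sketch (the directory names are assumptions, as snapshot plugins differ in where they write baselines and new shots):

const fs = require('fs');
const path = require('path');

// hypothetical locations -- adjust to your snapshot plugin's layout
const newShotsDir = 'cypress/snapshots/new';
const baselineDir = 'cypress/snapshots/baseline';

// overwrite each baseline with its reviewed replacement
for (const file of fs.readdirSync(newShotsDir)) {
  fs.copyFileSync(path.join(newShotsDir, file), path.join(baselineDir, file));
  console.log(`accepted new baseline: ${file}`);
}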
The fourth issue originates from the need to test against all relevant responsive widths (like 1024x768, iPhone, iPad), pixel densities (such as Retina displays), and browsers. Here again, three ways to tackle the issue co-exist. The first, obvious solution is to run the same visual test multiple times, once per width/density/browser configuration (illustrated below). The second solution improves on the first one by parallelizing the tests; while this may require extra infrastructure, many companies use that technique. The last solution again consists of outsourcing the testing to commercial cloud testing providers.
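The first solution can be written directly in the test runner by looping over configurations; a sketch reusing the earlier Cypress test (Cypress ships named viewport presets such as iphone-6 and ipad-2, while browsers are typically varied from the CI configuration rather than from the test itself):

const viewports = [[1024, 768], 'iphone-6', 'ipad-2'];

viewports.forEach((viewport) => {
  const name = Array.isArray(viewport) ? viewport.join('x') : viewport;
  it(`home page visual test (${name})`, () => {
    // accept either an explicit width/height pair or a Cypress preset name
    if (Array.isArray(viewport)) {
      cy.viewport(viewport[0], viewport[1]);
    } else {
      cy.viewport(viewport);
    }
    cy.visit('http://localhost:3000');
    cy.get('#searchBar').type('test');
    cy.matchImageSnapshot(`home-page-${name}`);
  });
});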
Tayar concluded the talk by running the audience through a live demo of visual regression testing, illustrating some of the solutions to the four previously described issues.
ReactiveConf is a yearly conference targeted at developers, with talks addressing the latest technologies and trends in software development. ReactiveConf 2019 took place from Oct. 30 to Nov. 1, 2019, and was the fifth installment of the event.