Advances in computer vision algorithms and the application of modern artificial intelligence (AI) techniques have made writing visual tests practical. With AI in testing, autonomous testing becomes possible. The boring and rote tasks will be delegated to the AI so that the tester can do the thinking.
Gil Tayar, evangelist and senior architect at Applitools, spoke about what AI techniques can bring to automated testing at Craft Conference 2018. InfoQ is covering this event with interviews.
InfoQ interviewed Tayar about the main challenges that testing is facing nowadays, the six levels of autonomous testing and how the software industry is doing compared to these levels, what can be done to automate visual testing, how we can use machine learning in testing, and the impact that AI will have on the tester’s job.
InfoQ: What in your opinion are the main challenges that testing is facing nowadays?
Gil Tayar: Testing, unfortunately, is not yet mainstream. I remember the days, far back in the 80s and 90s, when the whole idea of QA, and manually testing software, was laughed at. Not only was there no automation, there was just no testing whatsoever! Thankfully, the industry has evolved beyond that, and testers regularly test the software before it’s shipped, much of it using test automation.
But in today’s "internet time", that is not enough. Agility is forcing us to deploy faster and faster, and that is a good thing! It is a good thing because it is forcing the developers themselves to start testing their software, and not (lazily, I might add) relying on gatekeeper testers to test their software.
But this idea, that developers must test the software they write, is not yet mainstream. Most developers do not write tests that check their code. They test manually, or rely on testers to ensure that their software is working correctly.
And this is the main challenge that testing is facing—to make developers write their own tests. And AI won’t help here, unfortunately! AI is a tool, to be used when testing. But if you don’t test, AI won’t help you. Yes, AI helps (and will help even more in the future) with testing, but if you’re not testing, AI will not be your salvation.
Developers must adopt the testing mindset. That is the main challenge we’re facing in testing.
InfoQ: In your talk, you presented the six levels of autonomous testing. What are they?
Tayar: They are a mirror image of the six levels of autonomous driving (levels 0 through 5). They describe how much AI will help us in our testing.
- Level 0, No Autonomy: you’re on your own in writing your tests!
- Level 1, Driving Assistance: the AI can see the page, and help you write your assertions. You still write the code that "drives" the application, but the AI can check the page and ensure that the expected values in it are the correct ones.
- Level 2, Partial Automation: just looking at the differences between the actual page and the expected (baseline) one is nice, but a Level 2 AI needs a higher level of understanding. For example, if all the pages include the same change, the AI will understand that it is the same change and show it to the human only once. Moreover, the AI will look at the layout and the content of the page, and categorize each change as a content change or a layout change. This helps when testing responsive websites: even if the layout changes slightly, the content should be the same. This is the level tools such as Applitools Eyes are at.
- Level 3, Conditional Automation: at Level 2, any failure or change detected in the software still needs to be vetted by a human. A Level 2 AI can help analyze the change, but cannot understand whether a page is correct just by looking at it; it needs a baseline to compare against. A Level 3 AI can do that and much more, because it can apply machine learning techniques to the page. For example, a Level 3 AI can look at the visual aspects of a page and figure out whether the design is off, based on standard design rules: alignment, whitespace use, color and font usage, and layout. It can also look at the content of the page and, based on previous views of the same page, determine without human intervention whether the content makes sense. We’re not there yet, but we are getting there.
- Level 4, High Automation: up to now, all the AI did was run the checks automatically. The human is still driving the test and clicking on the links (albeit using automation software). Level 4 is where the AI drives the test itself. By looking at real users driving the app, the AI will be able to understand how to drive the tests itself. Now the AI can write the tests, and can write the checks that test the pages. But this is not the end game; it will still need to observe humans at work, and will occasionally need to defer to human testers.
- Level 5, Full Automation: I have to admit that this level is a bit sci-fi to me. At this level, the AI "converses" with the product manager, understands the specs of the product, and writes the tests itself, without the need for any human.
InfoQ: How is the software industry doing compared to these levels?
Tayar: Some companies are definitely at level 2, progressing towards level 3. I believe level 4 will take time, but we’ll reach it. As to level 5? I’m skeptical, but as they say in Hebrew: "since the destruction of the second Temple, prophecy has been given only to fools and babies".
InfoQ: What can be done to automate visual testing?
Tayar: Funnily enough, a few years ago visual testing could not even be automated by developers, let alone an AI! The idea of taking a screenshot of the app and checking it against a baseline was laughable, as the number of false positives generated by a naive pixel-based algorithm was too high to be practical.
But advances in computer vision algorithms, and the application of modern AI techniques, have made writing visual tests practical. When you think about testing, visual testing is the missing piece of the puzzle. Everything should be tested, and now, thanks to visual testing, everything can be tested.
So how do we use AI techniques to remove all those false positives, and to do better than the naive algorithm that compares pixels? The answer is not one technique: it is a composite of different algorithms, each of which contributes one piece of the accuracy puzzle, and a decision tree that combines the results of the different algorithms (where sometimes one algorithm feeds its results to another) to determine the final result.

But what kind of algorithms are used? An example would be our segmentation algorithm, which tries to determine which parts of the image are text and which are images. This is not a trivial problem: we need to figure out that things like emojis are text, and conversely, that text inside an image is part of the image and not some new text. This is where deep learning techniques really shine, and since we started using them in our code, we’ve upped our accuracy from 88% to 96%.
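To make the composition idea concrete, here is a toy, runnable sketch of such a decision tree; all names, fields, and thresholds below are invented for illustration and are not Applitools’ actual code:

```javascript
// Toy sketch: a decision tree combining the results of several comparison
// algorithms. Names, fields, and thresholds are invented for illustration.

// Stub per-region results, as the real algorithms would produce them.
const regionResults = [
  { region: 'header', kind: 'text',  pixelDiff: 0.02, textDiff: false },
  { region: 'hero',   kind: 'image', pixelDiff: 0.10, perceptualDiff: 0.01 },
];

function isRealChange(r) {
  // Cheapest check first: no pixel difference means no change at all.
  if (r.pixelDiff === 0) return false;
  // Segmentation decided the region's kind; route to the right algorithm.
  if (r.kind === 'text') {
    // Text comparison ignores anti-aliasing and font-rendering noise.
    return r.textDiff;
  }
  // Perceptual comparison ignores differences humans cannot see.
  return r.perceptualDiff > 0.05;
}

console.log(regionResults.filter(isRealChange)); // [] — only rendering noise
```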
So to automate visual testing, you can use modern visual testing tools such as Applitools Eyes. These tools wrap advanced AI techniques in a simple interface, removing the need for you to understand and implement them yourself. Look at this example:
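The following is a minimal sketch of such a test, assuming the Applitools Eyes Selenium SDK for JavaScript; package and method names vary across SDK versions, and the blog URLs are hypothetical:

```javascript
// Sketch of a visual test for a blog, modeled on the classic Applitools
// Eyes Selenium JavaScript SDK; API details vary by SDK version.
const webdriver = require('selenium-webdriver');
const { Eyes } = require('eyes.selenium');

const driver = new webdriver.Builder().forBrowser('chrome').build();
const eyes = new Eyes();
eyes.setApiKey(process.env.APPLITOOLS_API_KEY);

// Start a visual test session: app name, test name, viewport size.
eyes.open(driver, 'Blog', 'validate blog pages', { width: 1024, height: 768 })
  .then((eyesDriver) => {
    // One visual checkpoint per page replaces tens of per-element checks.
    eyesDriver.get('https://example-blog.com');        // hypothetical URL
    eyes.checkWindow('home page');

    eyesDriver.get('https://example-blog.com/post/1'); // hypothetical URL
    eyes.checkWindow('blog post');

    // The screenshots are uploaded to the Applitools server, which compares
    // them against the baseline using the AI techniques described above.
    return eyes.close();
  })
  .then(() => driver.quit());
```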
In this example, we validate a blog application. In a regular, non-visual, non-AI test, validating the home page and validating a specific blog post would have necessitated tens of checks on each element in the page (and we would have probably missed some…). But in the example, we just call "checkWindow": the screenshot is uploaded to a server, and that server uses AI techniques to validate it.
I have always said that tests remove the fear from the development process—removing the fear of adding features to our code, and, even more importantly, removing the fear of refactoring our code. But why should mitigating the fear of code change only be applied to our business logic? Why can’t our visual code (in CSS, HTML, and JS files) also be tested, enabling us to remove the fear of changing it? Advances in AI finally enable us to do this.
Appraise is another tool that can be used for visual testing. InfoQ previously interviewed Gojko Adzic about automating visual testing with it:
Adzic: Appraise takes the approach of specification by example, but applies it to visuals. We take concrete examples, push them through an automation layer to create executable specifications, and then use headless Chrome to take a screenshot clip and compare it with the expected outcome. Unlike the usual specification-by-example tools, where any difference between expected and actual results is wrong, Appraise takes the approval testing approach. It shows all differences to humans to review, and lets people decide if the difference is correct, and if so, approve it.
InfoQ: How can we use machine learning in testing?
Tayar: I recently saw a talk where testers used machine learning to test Candy Crush (yes, that game). Because Candy Crush is a highly random game, where anything can happen, standard testing techniques are hard to implement. The goal at King, the company behind Candy Crush, was to understand whether a level is solvable, and how difficult it is. They first started with a brute-force technique that tried all paths and checked how many of them led to success. That worked (guided by algorithms that weeded out bad combinations), but took hours, as the number of combinations is astronomical. To solve this problem, they used genetic algorithms. In this technique, they take some base algorithms (always use the pink candy, always try to unblock…) and try them out on a level. Then they take the best algorithms (based on whether and how they finished the level) and "breed" them to create an algorithm that combines the two. They continue this process and finally reach an algorithm that is good at what it does.
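As a rough sketch of the breeding idea (invented heuristics and a stub fitness function, not King’s actual code), a genetic loop over play policies might look like this:

```javascript
// Toy genetic-algorithm sketch for evolving a game-playing policy.
// Everything here is invented for illustration; it is not King's code.

// A "policy" is a weighting over simple base heuristics.
const HEURISTICS = ['preferPink', 'unblock', 'bottomFirst'];

function randomPolicy() {
  const weights = {};
  for (const h of HEURISTICS) weights[h] = Math.random();
  return weights;
}

// Stub fitness: in reality this would play the level with the policy and
// score the outcome (finished or not, and in how many moves).
function fitness(policy) {
  return policy.preferPink * 0.5 + policy.unblock * 0.3 + policy.bottomFirst * 0.2;
}

// "Breed" two policies by mixing their weights, with a little mutation.
function breed(a, b) {
  const child = {};
  for (const h of HEURISTICS) {
    child[h] = (Math.random() < 0.5 ? a[h] : b[h]) + (Math.random() - 0.5) * 0.1;
  }
  return child;
}

let population = Array.from({ length: 20 }, randomPolicy);
for (let gen = 0; gen < 50; gen++) {
  // Keep the five fittest policies and breed them to refill the population.
  population.sort((x, y) => fitness(y) - fitness(x));
  const parents = population.slice(0, 5);
  const pick = () => parents[Math.floor(Math.random() * parents.length)];
  population = parents.concat(Array.from({ length: 15 }, () => breed(pick(), pick())));
}
population.sort((x, y) => fitness(y) - fitness(x));
console.log('best policy found:', population[0]);
```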
Another example is convolutional neural networks (CNNs), which enable us to analyze an image and extract semantic information from it. This is useful in applications that generate images. Visual testing tools use them to understand the different parts of the page and how they relate to each other, and to extract the different text and image blocks, as I outlined above.
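To give a feel for the kind of model involved, here is a small CNN in TensorFlow.js that could classify a screenshot patch as text or image; the architecture is invented for illustration and is not what any particular visual testing tool ships:

```javascript
// Sketch: a tiny CNN that classifies a 64x64 screenshot patch as
// "text" or "image". Architecture invented for illustration only.
const tf = require('@tensorflow/tfjs');

const model = tf.sequential();
// Convolutional layers learn local visual features (edges, strokes, textures).
model.add(tf.layers.conv2d({
  inputShape: [64, 64, 3], filters: 16, kernelSize: 3, activation: 'relu',
}));
model.add(tf.layers.maxPooling2d({ poolSize: 2 }));
model.add(tf.layers.conv2d({ filters: 32, kernelSize: 3, activation: 'relu' }));
model.add(tf.layers.maxPooling2d({ poolSize: 2 }));
// Flatten the feature maps and classify the patch: text vs. image.
model.add(tf.layers.flatten());
model.add(tf.layers.dense({ units: 2, activation: 'softmax' }));
model.compile({ optimizer: 'adam', loss: 'categoricalCrossentropy', metrics: ['accuracy'] });
```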
InfoQ: What impact do you expect that AI will have on the tester’s job?
Tayar: What impact did the tractor have on the farmer? A huge impact! It allowed the farmer to substantially reduce rote, boring, and physically hard tasks, and delegate them to machines. I believe the same thing will happen with AI in testing. The boring and rote tasks will be delegated to the AI, and the tester will have to do what they always should have been doing: thinking.
Most testers won’t need to adapt, as they do a lot of thinking anyway—thinking about the product, how to test it, edge cases, etc.—and are looking for tools that alleviate the boring tasks. Some testers will need to adapt, and start doing higher-level, thinking tasks. And, yes, some testers won’t be able to adapt. But I believe these are in the minority.