Automated acceptance tests are an essential component of a continuous delivery style testing strategy, as they give an important and different insight into the behaviour of our systems. Developers must own the responsibility to keep acceptance tests running and passing, argued Dave Farley; you don’t want to have a separate QA team lagging behind a development team.
Dave Farley, independent software developer and consultant, spoke about acceptance testing for continuous delivery at Craft 2017. InfoQ is covering the conference with Q&As, summaries and articles.
A good acceptance test is an executable specification of the behaviour of the system, said Farley. He suggested using the language of the domain for writing tests, as it improves readability and makes it easier to explain the test to product owners, customers, or anyone else. It also makes test cases easier to maintain.
There are two options to deal with time dependency in acceptance testing. You can ignore time aspects in your test cases, but then there will be situations that you are not able to test, so you may miss errors. The other option is to treat time as an external dependency. Farley gave the example of how you can take control of time by creating a stub that provides clock functionality and adding functions to your test framework to manipulate time.
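A minimal Python sketch of the second option, assuming the system-under-test reads the time from an injected clock object rather than from the operating system. The names StubClock and TrialAccount, and the 30-day trial rule, are hypothetical and serve only as illustration; they are not taken from the talk.

```python
from datetime import datetime, timedelta, timezone


class StubClock:
    """Test double standing in for the system clock; tests move it by hand."""

    def __init__(self, start: datetime) -> None:
        self._now = start

    def now(self) -> datetime:
        return self._now

    def advance(self, **kwargs) -> None:
        # Accepts the same keyword arguments as timedelta (days=, hours=, ...).
        self._now += timedelta(**kwargs)


class TrialAccount:
    """Hypothetical system-under-test: a trial that expires after 30 days."""

    TRIAL_LENGTH = timedelta(days=30)

    def __init__(self, clock) -> None:
        self._clock = clock
        self._opened_at = clock.now()

    def is_expired(self) -> bool:
        return self._clock.now() - self._opened_at >= self.TRIAL_LENGTH


# The test controls time explicitly instead of waiting for real days to pass.
clock = StubClock(datetime(2017, 1, 1, tzinfo=timezone.utc))
account = TrialAccount(clock)
expired_on_day_0 = account.is_expired()

clock.advance(days=29)
expired_on_day_29 = account.is_expired()

clock.advance(days=1)
expired_on_day_30 = account.is_expired()
```

Because the clock is passed in as a dependency, the production system can be given a real clock while the acceptance tests supply the stub.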
InfoQ interviewed Dave Farley about what makes acceptance testing so hard and how to do acceptance testing.
InfoQ: Where do acceptance tests fit in? What’s their purpose?
Dave Farley: I see automated acceptance tests as an essential component of a Continuous Delivery style testing strategy. Combined with low-level unit tests, best created as the fruits of Test Driven Development (TDD), acceptance tests give us an important and different insight into the behaviour of our systems.
Acceptance tests evaluate the system from the perspective of an external user, in a production-like test environment. They are best created in the form of "Executable Specifications" for the behaviour of our system.
Finally, in addition to providing this executable specification view, asserting the functional contract of our software with our users, they also provide us with an automated definition of done, and the ideal opportunity to rehearse the automated deployment and configuration of our systems.
InfoQ: What makes acceptance testing so hard?
Farley: Coupling! Most organisations don’t work hard enough on ensuring that these sorts of high-level functional tests are loosely coupled to the system-under-test. Then every time the system-under-test changes, the tests fail and need modification. My talk is largely focused on this aspect of the problem. Several techniques help significantly in maintaining an effective separation of concerns between test case (aka specification) and system-under-test: using Domain Specific Languages to define "Executable Specifications" rather than "Tests", isolating tests from one another, and concentrating on *what* the system needs to do rather than *how* it does it.
Another key learning in fixing the problems of living with these kinds of tests is putting the responsibilities into the right hands. It is an industry-wide anti-pattern to have a separate QA team lagging behind a development team, even if they are trying to automate the tests.
If we are successful in creating genuine "executable specifications" for the behaviour of our systems, then when a developer introduces a change that causes the test to fail, it MUST be that developer’s responsibility to fix the problem. After all, their failing change means that the system no longer meets its functional specification (as described by its suite of acceptance tests). Anyone can write one of these tests, but developers must own the responsibility to keep them running and passing as soon as the test is executed for the first time.
InfoQ: How do you isolate test cases from one another?
Farley: There are three levels of isolation that I think are important: system-level, functional and temporal.
System-level isolation is about being very specific about the boundaries of your system-under-test. We need to be able to very precisely control the state of the system-under-test so, to achieve that, we need our tests to be right at the boundary of our system. We don’t want to test our system by interacting with some up-stream system. We will lack sufficient control to examine difficult edge cases if we do that. We don’t want to collect results from some down-stream external system either. We want to invoke the behaviours of our system directly through its normal interfaces, whatever form they take, and collect the results, often using stubs for external dependencies so that we can collect data for assertions or inject data to invoke behaviours.
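The two uses of a stub mentioned here, collecting data for assertions and injecting data to invoke behaviours, could look roughly like the Python sketch below. The payment gateway, the Checkout class and all of their methods are hypothetical examples, not something described in the interview.

```python
class StubPaymentGateway:
    """Stands in for a hypothetical external payment provider."""

    def __init__(self) -> None:
        self.charges = []              # collected so the test can assert on them
        self.next_result = "approved"  # injected by the test to steer behaviour

    def charge(self, account: str, amount_pence: int) -> str:
        self.charges.append((account, amount_pence))
        return self.next_result


class Checkout:
    """Hypothetical piece of the system-under-test that calls the gateway."""

    def __init__(self, gateway) -> None:
        self._gateway = gateway

    def pay(self, account: str, amount_pence: int) -> str:
        result = self._gateway.charge(account, amount_pence)
        return "order confirmed" if result == "approved" else "payment failed"


gateway = StubPaymentGateway()
checkout = Checkout(gateway)

# Normal path: the stub approves the payment.
happy_outcome = checkout.pay("dave", 2500)

# Edge case that is hard to provoke with a real provider: a declined payment.
gateway.next_result = "declined"
declined_outcome = checkout.pay("dave", 2500)
```

The test stays at the boundary of the system: it drives Checkout through its own interface and observes what would have been sent to the external provider.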
Functional isolation allows us to use natural boundaries within a multi-user system to isolate test cases from one another. This allows us to share the start-up costs of a large complex system and then run many tests together, in parallel, without them disturbing one another. The idea is very simple. As part of the setup of each test case, we create new accounts, products, market-places (whatever concepts represent the natural functional boundaries of our system) and use these functional elements only in the context of a single test-case. If I were creating tests for the Amazon bookstore, each test would begin by registering a new account and creating a new book for sale.
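A rough Python sketch of that idea, assuming a single shared instance of a hypothetical in-memory BookStore stands in for the deployed system. Each test case registers its own account and its own book, so the two cases can run in parallel without touching each other's data. All class, function and entity names are illustrative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BookStore:
    """Hypothetical shared system-under-test, started once for all tests."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._stock = {}    # title -> copies available
        self._orders = {}   # account -> titles bought

    def register_account(self, account: str) -> None:
        with self._lock:
            self._orders[account] = []

    def add_book(self, title: str, copies: int) -> None:
        with self._lock:
            self._stock[title] = copies

    def buy(self, account: str, title: str) -> bool:
        with self._lock:
            if self._stock.get(title, 0) == 0:
                return False
            self._stock[title] -= 1
            self._orders[account].append(title)
            return True

    def orders(self, account: str) -> list:
        with self._lock:
            return list(self._orders[account])


def last_copy_can_only_be_bought_once(store):
    # Setup creates the account and book used by this test case only.
    store.register_account("alice")
    store.add_book("Alice's Rare Book", copies=1)
    first = store.buy("alice", "Alice's Rare Book")
    second = store.buy("alice", "Alice's Rare Book")
    return first, second


def buyer_sees_their_own_orders(store):
    store.register_account("bob")
    store.add_book("Bob's Paperback", copies=5)
    store.buy("bob", "Bob's Paperback")
    return store.orders("bob")


store = BookStore()  # start-up cost is paid once
with ThreadPoolExecutor(max_workers=2) as pool:
    future_a = pool.submit(last_copy_can_only_be_bought_once, store)
    future_b = pool.submit(buyer_sees_their_own_orders, store)
    result_a, result_b = future_a.result(), future_b.result()
```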
Temporal isolation allows us to run the same test repeatedly and get the same results. Again, I don’t want to have to tear-down the system at the end of each test - too expensive. Instead I would like to run the same test case over and over again in the same instance of a running, deployed system. For this we combine the functional isolation technique, described above, with the use of a proxy-naming technique. When our test asks us to create a new account or book, our test infrastructure intervenes. Instead of naming the item using the name we give it in the test case, it creates an alias for that name and uses that instead. Within the scope of the test case the test can always use the name it chose, but the test infrastructure maps that name, within the scope of any given test run, to the unique alias it created. This allows us to run the same case over and over and still get the benefits of functional isolation.
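The proxy-naming technique might be sketched in Python as follows. This is an assumption about one possible shape of such test infrastructure, not Farley's implementation: NameAliases, AccountRegistry and the use of a random UUID suffix are all hypothetical choices.

```python
import uuid


class NameAliases:
    """Maps the names a test case uses to unique names for one test run."""

    def __init__(self) -> None:
        self._aliases = {}

    def resolve(self, name: str) -> str:
        # The first use of a name in this run creates its alias;
        # later uses in the same run get the same alias back.
        if name not in self._aliases:
            self._aliases[name] = f"{name}-{uuid.uuid4().hex}"
        return self._aliases[name]


class AccountRegistry:
    """Hypothetical long-running system that rejects duplicate account names."""

    def __init__(self) -> None:
        self._names = set()

    def register(self, name: str) -> None:
        if name in self._names:
            raise ValueError(f"account {name!r} already exists")
        self._names.add(name)

    def exists(self, name: str) -> bool:
        return name in self._names


def registration_case(registry: AccountRegistry) -> bool:
    aliases = NameAliases()  # a fresh mapping for every run of the case
    # The test case always talks about "dave"; the infrastructure substitutes the alias.
    registry.register(aliases.resolve("dave"))
    return registry.exists(aliases.resolve("dave"))


registry = AccountRegistry()  # never torn down between runs
outcomes = [registration_case(registry) for _ in range(3)]
```

Without the alias layer, the second run would fail because the account "dave" would already exist in the running system.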
InfoQ: You suggested that we shouldn’t use UI Record-and-playback Systems. Can you explain why? What’s the alternative?
Farley: I want to make my executable specifications focus on user behaviours, not on the details of how I have implemented my user interface. If I can encode my test cases in terms from the problem domain, like "placeOrder" or "payForPurchases", I can use the same test case and evaluate behaviour, whatever the nature of the interface to the system. If my system allows me to place orders through a graphical user interface and through a REST API, then I can write one test, and it is a viable specification for both channels of communication. The separation of concerns is complete. If you can do that, then relatively minor changes, like re-writing your Web UI, are simple to cope with and keep all of your test-cases (executable specifications) viable.
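One way to picture this separation is a test case written against domain-level operations, with interchangeable drivers translating those operations for each channel. The Python sketch below is illustrative only: FakeShop, its two entry points, and both driver classes are hypothetical stand-ins. In a real suite the drivers would use an HTTP client and a browser automation tool instead of calling an in-memory object.

```python
from typing import Protocol


class FakeShop:
    """In-memory stand-in for the system-under-test, with two entry points."""

    def __init__(self) -> None:
        self._orders = {}  # customer -> list of items

    def handle_request(self, method: str, path: str, body=None) -> dict:
        # Entry point a REST client would use.
        if method == "POST" and path == "/orders":
            self._orders.setdefault(body["customer"], []).append(body["item"])
            return {"status": 201}
        if method == "GET" and path.startswith("/orders/"):
            customer = path.rsplit("/", 1)[1]
            return {"status": 200, "items": list(self._orders.get(customer, []))}
        return {"status": 404}

    def submit_form(self, form: str, fields: dict) -> None:
        # Entry point a browser form would use.
        if form != "order-form":
            raise ValueError(f"unknown form {form!r}")
        self._orders.setdefault(fields["customer-name"], []).append(fields["item-name"])

    def render_order_page(self, customer: str) -> list:
        return [f"Item: {item}" for item in self._orders.get(customer, [])]


class ShopDriver(Protocol):
    """Domain-level operations the executable specification is written against."""

    def place_order(self, customer: str, item: str) -> None: ...
    def orders_for(self, customer: str) -> list: ...


class RestApiDriver:
    def __init__(self, shop: FakeShop) -> None:
        self._shop = shop

    def place_order(self, customer: str, item: str) -> None:
        self._shop.handle_request("POST", "/orders", {"customer": customer, "item": item})

    def orders_for(self, customer: str) -> list:
        return self._shop.handle_request("GET", f"/orders/{customer}")["items"]


class WebUiDriver:
    def __init__(self, shop: FakeShop) -> None:
        self._shop = shop

    def place_order(self, customer: str, item: str) -> None:
        self._shop.submit_form("order-form", {"customer-name": customer, "item-name": item})

    def orders_for(self, customer: str) -> list:
        prefix = "Item: "
        return [line[len(prefix):] for line in self._shop.render_order_page(customer)]


def customer_can_place_an_order(driver: ShopDriver) -> list:
    # The specification mentions only domain concepts, never URLs or form fields.
    driver.place_order("dave", "Continuous Delivery")
    return driver.orders_for("dave")


results = {
    "rest": customer_can_place_an_order(RestApiDriver(FakeShop())),
    "web": customer_can_place_an_order(WebUiDriver(FakeShop())),
}
```

If the web UI were rewritten, only WebUiDriver would need to change; customer_can_place_an_order would stay as it is.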
By their nature, UI Record-and-playback systems start from assuming that your UI is the focus of the test, rather than the behaviour of your system that the user wants. This is actually a technically-focused kind of test, rather than a behaviourally-focused kind of test. As a result, these tests are always more fragile, more prone to break in the face of relatively small changes to the system-under-test. So I avoid them - too much work in the long run.
InfoQ: How can we make test cases and acceptance testing as a whole more efficient?
Farley: Efficiency in tests is pervasive. We need to think about testing the right sorts of things in the right places.
I don’t want to do detailed every-line-of-code type tests in acceptance tests - too expensive. I do that kind of testing using TDD and get that feedback sooner as a result.
I don’t want to use production data for my tests - it is too large and so slows down the provisioning of my test environments, the start-up of my system-under-test and the execution of the test cases themselves. I want test data to be minimal and to allow me to precisely target the behaviour that I am trying to test.
I will work to optimise my application in unusual places, e.g. account registration, and to improve test and application start-up times so that my acceptance testing will get to an answer more quickly.
I want to avoid the use of sleeps or waits in my testing. These are usually used as sticking-plasters to try to hide from race-conditions. They add inefficiency and usually end up moving the race-conditions somewhere else instead of solving the real problem.
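The interview does not name a replacement for sleeps. One common approach, shown here purely as an assumption, is to have the test synchronise on an explicit completion signal from the system (or from a stub at its boundary), with a timeout acting only as a failure limit rather than as a fixed delay. OrderProcessor and its methods are hypothetical.

```python
import threading


class OrderProcessor:
    """Hypothetical asynchronous component that signals when work is finished."""

    def __init__(self) -> None:
        self.processed = []
        self._done = threading.Event()

    def submit(self, order: str) -> None:
        def work() -> None:
            self.processed.append(order)
            self._done.set()  # explicit "finished" signal for observers

        threading.Thread(target=work).start()

    def await_processed(self, timeout: float) -> bool:
        # Returns as soon as the signal arrives; the timeout only bounds failure.
        return self._done.wait(timeout)


processor = OrderProcessor()
processor.submit("order-1")

# No fixed time.sleep(): the test proceeds the moment processing completes.
completed = processor.await_processed(timeout=5.0)
processed = list(processor.processed)
```

The test takes only as long as the work takes, and a missing signal shows up as a clear failure instead of an intermittent one.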
Acceptance testing is an important part of a high-quality testing strategy, but it is only a part. It should not replace TDD and the low-level unit tests that TDD produces; it should complement them.
My minimal deployment pipeline - the kind of pipeline that I would create for even the simplest of projects - includes "Commit Tests" (TDD), "Acceptance Tests" (Executable Specifications) and "Automated Deployment to Production".