At JSConf Hawaii 2020, Dave Aronson gave an introduction to mutation testing. He presented the rationale behind the technique, its benefits, drawbacks, and pitfalls, and how it works under the hood.
Aronson started by explaining that mutation testing is a way to test both the code and its unit test suite by changing (mutating) the code under test. A typical example-based test guarantees the correct behavior of the function under test only for that specific example. Testers therefore pick a carefully chosen sample from a generally infinite test space, in the hope that the guarantees obtained for this sample extend to a much larger space. For each test, they must also devise a method to validate or invalidate the observed result.
Testers thus face two challenges, which Prof. Tsong Yueh Chen called the reliable test set problem and the oracle problem:
The oracle problem refers to situations where it is extremely difficult, or impossible, to verify the test result of a given test case (…).
The reliable test set problem means that since it is normally not possible to exhaustively execute all possible test cases, it is challenging to effectively select a subset of test cases (the reliable test set) with the ability to determine the correctness of the program.
Mutation testing assesses the reliability of the chosen test set, that is, the quality of the testing code. Starting from a state in which all tests pass, the code under test is modified and the test suite is run again. If the test suite is reliable, it should catch such erroneous implementations of the function under test, and the suite should fail.
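The loop described above can be sketched in a few lines of TypeScript. Everything here is hypothetical and simplified: the age-check function, the hand-written mutants (a real tool would generate them automatically), and the helper names suitePasses and survivingMutants. A mutant "survives" when the suite still passes against it, which is exactly the kind of gap the technique is meant to expose.

```typescript
// Hypothetical function under test: is someone old enough to vote?
type AgeCheck = (age: number) => boolean;

const original: AgeCheck = (age) => age >= 18;

// Hand-written mutants; a mutation testing tool would generate these automatically.
const mutants: Record<string, AgeCheck> = {
  "boundary: >= to >": (age) => age > 18,
  "negation: >= to <": (age) => age < 18,
  "constant: always true": () => true,
};

// A test case pairs an input with its expected output.
interface TestCase {
  input: number;
  expected: boolean;
}

// A deliberately weak suite: it never exercises the boundary value 18.
const weakSuite: TestCase[] = [
  { input: 30, expected: true },
  { input: 10, expected: false },
];

// A stricter suite that adds the boundary case.
const strictSuite: TestCase[] = [...weakSuite, { input: 18, expected: true }];

// The suite passes when every test case produces its expected output.
function suitePasses(fn: AgeCheck, suite: TestCase[]): boolean {
  return suite.every((t) => fn(t.input) === t.expected);
}

// Names of the mutants that survive, i.e. for which the suite still passes.
function survivingMutants(suite: TestCase[]): string[] {
  // Mutation testing starts from a green suite on the unmodified code.
  if (!suitePasses(original, suite)) {
    throw new Error("The suite must pass on the original code first");
  }
  return Object.entries(mutants)
    .filter(([, mutant]) => suitePasses(mutant, suite))
    .map(([mutantName]) => mutantName);
}

console.log(survivingMutants(weakSuite));   // [ 'boundary: >= to >' ]
console.log(survivingMutants(strictSuite)); // []
```

With the weak suite, the boundary mutant (>= changed to >) survives because no test checks the age 18 itself; adding that single boundary case is enough to kill it.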
Aronson explained:
[Mutation testing] ensures that our unit tests are strict, by finding the gap in our unit test suites. […] Lack of strictness usually comes from lack of tests, poorly written tests, or poorly maintained tests[…]. It also helps ensure that our code is meaningful, so that any change to the code will produce a noticeable change in behavior.
Aronson then turned to the drawbacks. First, mutation testing is CPU-intensive, since every mutant requires another run of the test suite, and it is generally not meant to be run on the whole codebase on every save. Tools usually offer an incremental mode to mitigate this.
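Because each mutant costs roughly one run of the test suite, the total cost grows with the number of mutants. The following simplified sketch, with hypothetical names and made-up file paths, illustrates both the cost estimate and one possible incremental strategy: only mutants located in files changed since the last run are executed again. Real tools may detect changes at a finer granularity; filtering by file is an assumption made for illustration.

```typescript
// Hypothetical record describing one generated mutant.
interface Mutant {
  file: string;
  description: string;
}

// Without mitigation, every mutant costs one full run of the test suite.
function estimatedSeconds(mutantCount: number, suiteSeconds: number): number {
  return mutantCount * suiteSeconds;
}

// Simplified incremental mode: keep only mutants in files changed since the last run.
// Assumption: results for untouched files are reused from a previous run.
function selectIncremental(mutants: Mutant[], changedFiles: Set<string>): Mutant[] {
  return mutants.filter((m) => changedFiles.has(m.file));
}

const allMutants: Mutant[] = [
  { file: "src/age.ts", description: ">= to >" },
  { file: "src/age.ts", description: ">= to <" },
  { file: "src/cart.ts", description: "++ to --" },
  { file: "src/cart.ts", description: "remove call to save()" },
];

const changed = new Set(["src/cart.ts"]);
const toRun = selectIncremental(allMutants, changed);

console.log(toRun.length);                            // 2
console.log(estimatedSeconds(allMutants.length, 30)); // 120
console.log(estimatedSeconds(toRun.length, 30));      // 60
```

Even in this toy case, restricting the run to the changed file halves the estimated time; on a real codebase with thousands of mutants, the savings are what make frequent runs practical.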
Aronson also contended that mutation testing is not a beginner-friendly technique. Tests that still pass after the code under test has been mutated require careful interpretation. Such a passing result is generally misleading: it usually means that the current test set is not reliable enough (for instance, tests are missing). It may also be that some of the tests themselves are erroneous.
Mutation testing is a fault-based technique: it deliberately produces erroneous implementations of a function by inserting faults into the source code. Aronson drew a parallel with Netflix's Chaos Monkey, a resiliency tool that generates random instance failures to test system stability and uncover flaws in error recovery.
Mutation testing tools generally operate by finding functions and their corresponding tests. The function's code is then parsed into an abstract syntax tree, the mutations are applied at the syntax-tree level, and the function's code is regenerated from the mutated tree. Conditionals may be mutated by shifting their boundaries (from < to <=), negating comparisons (from == to !=), or replacing the condition altogether (from if (cond) to if (true)). Increments (++) may be turned into decrements (--), and some method or constructor calls may be removed entirely.
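The syntax-tree mutations described above can be illustrated with a deliberately tiny, hypothetical expression tree. To stay self-contained, the sketch builds the tree by hand instead of parsing source code (a real tool would rely on a full JavaScript parser), covers only a few of the operators mentioned above, and produces one mutant per applicable change. Each mutant differs from the original in a single place and is then printed back to source text by the hypothetical toSource function.

```typescript
// A tiny, hypothetical AST covering just enough syntax for the example.
type Expr =
  | { kind: "id"; name: string }
  | { kind: "num"; value: number }
  | { kind: "bool"; value: boolean }
  | { kind: "bin"; op: string; left: Expr; right: Expr }
  | { kind: "update"; op: "++" | "--"; arg: Expr };

// Operator replacements: boundary shift and negated comparison.
const OP_SWAPS = new Map<string, string>([
  ["<", "<="],
  ["==", "!="],
]);

// Regenerate source text from the (possibly mutated) tree.
function toSource(e: Expr): string {
  switch (e.kind) {
    case "id":
      return e.name;
    case "num":
    case "bool":
      return String(e.value);
    case "bin":
      return `${toSource(e.left)} ${e.op} ${toSource(e.right)}`;
    case "update":
      return `${toSource(e.arg)}${e.op}`;
  }
}

// Produce every tree that differs from `e` by exactly one mutation.
function mutate(e: Expr, isCondition = false): Expr[] {
  const out: Expr[] = [];
  // if (cond) -> if (true): replace a whole condition with a constant.
  if (isCondition) {
    out.push({ kind: "bool", value: true });
  }
  if (e.kind === "bin") {
    const swapped = OP_SWAPS.get(e.op);
    if (swapped !== undefined) {
      out.push({ ...e, op: swapped });
    }
    // Recurse so that nested operators can be mutated too.
    for (const left of mutate(e.left)) out.push({ ...e, left });
    for (const right of mutate(e.right)) out.push({ ...e, right });
  } else if (e.kind === "update") {
    // Increment <-> decrement.
    const flipped: "++" | "--" = e.op === "++" ? "--" : "++";
    out.push({ ...e, op: flipped });
  }
  return out;
}

const condition: Expr = {
  kind: "bin",
  op: "<",
  left: { kind: "id", name: "i" },
  right: { kind: "id", name: "n" },
};
const step: Expr = { kind: "update", op: "++", arg: { kind: "id", name: "i" } };

console.log(mutate(condition, true).map(toSource)); // [ 'true', 'i <= n' ]
console.log(mutate(step).map(toSource));            // [ 'i--' ]
```

Generating one single-change mutant at a time keeps each failure attributable: if a mutant survives, the tool can point to the exact operator whose change went unnoticed by the tests.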
The second edition of JSConf Hawaii took place in February 2020. JSConf Hawaii is a three-day, single-track conference dedicated to web development and JavaScript. The full talk is available online and contains plenty of additional technical details, illustrations, and code examples.