BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage News A Guide to Writing Properties of Pure Functions - John Hughes at Lambda Days 2020

A Guide to Writing Properties of Pure Functions - John Hughes at Lambda Days 2020

This item in japanese

John Hughes, co-designer of Haskell and Quickcheck, recently discussed property-based testing at Lambda Days 2020. Hughes presented five different strategies in his talk for determining properties and compared their effectiveness. Metamorphic and model-based properties reportedly showed high effectiveness.

Hughes first discussed a common problem faced by developers using property-based testing (PBT) to test pure functions. With example-based testing, the expected output of the function under test can be computed ahead of time and compared to the actual output of the function. Property-based testing, however, typically generates (semi-randomly) a large number of test sequences. Computing the expected outputs of the function under test for a large set of values that are not known ahead of time is not easier than implementing the function in the first place. This challenge is called the test oracle problem and is described by Prof. Tsong Yueh Chen as follows:

The oracle problem refers to situations where it is extremely difficult, or impossible, to verify the test result of a given test case

Developers may end up replicating the original code in the tests, and as such introduce the same bugs in the testing code as in the tested code.

To solve this conundrum, Hughes recommended identifying and checking properties of the output of the function under test. One such property for the reverse function, which takes a list and returns the list in reverse, is:

reverse (reverse xs) === xs

The challenges now turn to finding a good-enough set of properties which allows developers to either find bugs, or ensure the correctness of the function implementation. Hughes demonstrated that using only one property was often not enough by giving the following erroneous `reverse` implementation:

reverse xs = xs

Hughes thus proceeded to give five systematic ways of generating properties and analyzed their effectiveness. The first method consists in identifying invariants of the function under test. To illustrate that method, Hughes considered a binary search tree (BST) data structure, endowed with the operations find, insert, delete, union. Such data structure has the property that the insertion of a key/value pair in a valid tree results in a valid tree. The same applies to the other deletion and union operations. A tree itself is considered valid if it adheres to the contracts of the binary search tree (in particular those referring to the ordering of keys).

The second method consists of finding a postcondition verified by the output of the function under test. For instance, Hughes gave the following post-condition for the BST’s find operation:

After calling insert, we should be able to find the key inserted, and any other keys present beforehand.

The third method relies on metamorphic properties. Metamorphic properties relate the outputs produced by the system under test when fed related inputs. Hughes gave the following metamorphic property for the BST’s insert operation:

insert k v (insert k' v' t)
===
insert k' v' (insert k v t)

The previous property simply means that the resulting tree from inserting two key/value pairs does not depend on the order of the insertions. Metamorphic testing is an active research area. A thorough examination of the technique can be found in Prof. Tsong Yueh Chen’s Metamorphic testing: A review of challenges and opportunities paper.

The fourth method is based on what Hughes termed inductive properties. Scott Wlaschin, senior software architect, explained in one of his talls on PBT:

These kinds of properties are based on structural induction – that is, if a large thing can be broken into smaller parts, and some property is true for these smaller parts, then you can often prove that the property is true for a large thing as well.
(…)
Induction properties are often naturally applicable to recursive structures such as lists and trees.

Hughes gave the following inductive property for the BST:

union nil t ==== t
union (insert k v t) t'
=~= 
insert k v (union t t')

In other words, a union operation on a tree can be related to a union operation on a smaller tree. By induction on the size of the tree, it can be shown that the only function satisfying those properties is the correct union operation of the BST. Inductive properties are often themselves metamorphic properties with the peculiarity that they form a complete specification, being linked to inductive proofs.

The fifth method relies on model-based properties. This implies the existence of a model of the system under test. In the BST’s case, a simple model is the list of key values stored in the tree, as obtained by traversing the tree data structure. Operations on the system under test translate into operations on the model. Checking properties of the data structures translate into checking properties on the model, with the idea that the latter checking is significantly simpler or easier to perform reliably.

To assess the effectiveness of the identified properties, Hughes introduced bugs in every operation of a correct BST implementation and counted the bugs found by each property generation method:

 

Type of property Number of properties Effectiveness
Invariant 4 38%
Post-condition 5 79%
Metamorphic 16 90%
Model-based 5 100%

 

The model-based properties reach 100% effectiveness and are the closest thing to having an oracle for the system under test. The model acts as a simplified version of the system under test for which an oracle is available. Metamorphic properties are also very effective in finding bugs. This is in line with results originating from different industries. As presented in Metamorphic Testing: A Review of Challenges and Opportunities, a major feat of metamorphic testing is finding 100+ new bugs in two popular C compilers (GCC and LLVM).

Hughes appropriately concluded in the paper associated with his talk:

These results suggest that, if time is limited, then writing model-based properties may offer the best return on investment, in combination with validity properties to ensure we don’t encounter confusing failures caused by invalid data. In situations where the model is complex (and thus expensive) to define, or where the model resembles the implementation so closely that the same bugs are likely in each, then metamorphic properties offer an effective alternative, at the cost of writing many more properties.

The full video recorded at Lambda Days 2020 is available online. Lambda Days is a developer conference focusing on functional programming methods and techniques. Lambda Days 2020 was held on the 13th and 14th of February 2020 in Kraków, Poland.

Rate this Article

Adoption
Style

BT