Efficiently Arranging Test Data: Streamlining Setup with Instancio

Nov 21, 2023

Arman Sharif

reviewed by

Olimpiu Pop


Key Takeaways

Automated test data generation supports rigorous quality assurance. Using randomized test scenarios may uncover potential defects not evident when only using static data.
Using randomized test generation tools can complement standard testing methodologies, such as Arrange-Act-Assert (AAA) and Given-When-Then (GWT), enhancing a developer’s ability to identify complex bugs.
The decoupling of test code from data generation simplifies maintenance, as changes to the data model do not require corresponding updates to test cases.
Automated generation of randomized inputs broadens test coverage, providing a wide array of data permutations that help catch edge cases, thus reducing the dependency on handcrafted, parameterized tests.
Automation in test data creation eliminates the tedium of manual setup, providing ready-to-use, complex data structures that save time and minimize human error.

The school of thought about developing software products more effectively evolved from Waterfall to Agile, but one aspect of development was never in doubt: the need to ensure quality.

Current approaches, like Continuous Deployment and Continuous Delivery, show that a reliable test suite is directly connected to the speed of development and to a quicker customer feedback loop.

Approaches like Test-Driven Development (TDD) even help shape the structure of software products, and a healthy test suite is often an indicator of a well-maintained code base.

Regardless of the overall philosophy, one thing is certain: writing tests is beneficial.

Having the necessary tools to write tests will make the task less tedious and more enjoyable for developers, resulting in more comprehensive test coverage.

Patterns like Arrange-Act-Assert (AAA) or Given-When-Then (GWT) give tests a more predictable structure.

This article follows the AAA pattern and focuses on what is arguably the most challenging aspect of writing unit tests: test data generation, which is the Arrange step.

This article will compare manual test fixtures with automated data setup using Instancio, an open-source Java library for generating randomised data for unit and integration tests. Using randomised data in tests is not a novel concept. Property-based testing (PBT) frameworks, such as QuickCheck for Haskell, have used this approach for years. QuickCheck has inspired similar frameworks in other languages, including jqwik for Java. Although Instancio is not a PBT framework, it does have some similarities due to the randomised nature of the data.

In the following sections, we will:

Illustrate the usage of Instancio with a simple test refactoring exercise.
Highlight the library’s features using a few common testing use cases.

Testing Problem

Arguably, the quality of a test depends heavily on the quality of the data it uses. The Arrange step of a unit test focuses on exactly that. Some of its key challenges are:

Data setup may incur significant development effort and maintenance overhead, depending on the size and complexity of the data model.
Setup code is often brittle in the face of refactoring.
The setup code must be updated to reflect the changes when properties are added or removed.
Different test cases may require similar or overlapping data. This often leads to duplicated setup code with minor variations or helper factory methods with complex signatures.

Existing approaches to data setup include helper methods within a Test class, helper classes providing static factory methods (the Object Mother pattern), and test object builders.

Helper methods are the simplest and most common approach. They work well when we don’t need a lot of variation in the data. Since the setup code is within the test class, the tests are typically easier to read and understand.

The Object Mother pattern can be useful when the setup code needs to be reused across multiple tests, or the setup logic is more complicated (for example, we may need objects created in different states).

One downside of the Object Mother pattern is its lack of flexibility when objects need to be created in many different states. This is where test object builders come in, although they do make the setup code more complicated.
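
To make the difference between the two patterns concrete, here is a minimal sketch in plain Java. The Person class, the PersonMother factory, and the PersonBuilder are hypothetical names chosen for this illustration; they are not part of any library.

```java
import java.time.LocalDate;

// Hypothetical domain class used only for this illustration.
final class Person {
    final String firstName;
    final String lastName;
    final LocalDate dateOfBirth;

    Person(String firstName, String lastName, LocalDate dateOfBirth) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.dateOfBirth = dateOfBirth;
    }
}

// Object Mother: static factory methods, one per canned state.
final class PersonMother {
    private PersonMother() {}

    static Person withDateOfBirth() {
        return new Person("Jane", "Doe", LocalDate.of(1980, 12, 31));
    }

    static Person withoutDateOfBirth() {
        return new Person("Jane", "Doe", null);
    }
}

// Test object builder: defaults for every field, overridden per test as needed.
final class PersonBuilder {
    private String firstName = "Jane";
    private String lastName = "Doe";
    private LocalDate dateOfBirth = LocalDate.of(1980, 12, 31);

    PersonBuilder firstName(String value) { this.firstName = value; return this; }
    PersonBuilder lastName(String value) { this.lastName = value; return this; }
    PersonBuilder dateOfBirth(LocalDate value) { this.dateOfBirth = value; return this; }

    Person build() {
        return new Person(firstName, lastName, dateOfBirth);
    }
}
```

With the Object Mother, every new state needs a new factory method. With the builder, a test overrides only what it cares about, for example new PersonBuilder().lastName("Smith").build(), at the cost of maintaining the builder class itself.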

Instancio attempts to solve some of the problems mentioned above. First, we will briefly cover what the library does. Next, we’ll review a refactoring exercise to show how manual data setup can be eliminated. We will also discuss the Instancio extension for JUnit 5 and, finally, cover some common data setup use cases and how they can be implemented using the API.

Instancio Basics

Instancio’s API accepts a class and returns an instance of the class populated with sensible defaults:

Non-null values
Positive numbers
Non-empty collections with a few elements

This allows typical data setup code like the below:

Person person = new Person();
person.setFirstName("first-name");
person.setLastName("last-name");
person.setDateOfBirth(LocalDate.of(1980, 12, 31));
// etc

to be replaced with a more concise version:

Person person = Instancio.create(Person.class);

By automating the Arrange stage, we can Act and Assert within a shorter time frame and with fewer lines of code.

How it Works

Instancio populates objects with reproducible, random data, which can be customized as needed. First, the class structure is converted to a node hierarchy, where each node represents a class or class property.

The node hierarchy can be printed by running Instancio in verbose mode:

Person person = Instancio.of(Person.class)
    .verbose()
    .create();

In the verbose output, the digit before the class name refers to the depth of the node, where the root node (the Person class) is at depth zero.

Once the node hierarchy has been constructed, Instancio traverses the nodes and populates values using reflection. By default, this is done by assigning values directly via fields; however, the behavior can be overridden to assign values via setters.
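
The idea of a node hierarchy with depths can be sketched with plain Java reflection. This is only an illustration of the concept, not Instancio's implementation: the NodeHierarchySketch class and its nested Person and Address classes are hypothetical, and the sketch does not handle cyclic class structures or collections.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class NodeHierarchySketch {
    // Hypothetical model classes for the illustration.
    static class Address { String city; }
    static class Person { String name; Address address; }

    // Returns one line per node in the form "<depth> <name>", with the root class at depth 0.
    static List<String> describe(Class<?> type) {
        List<String> lines = new ArrayList<>();
        lines.add("0 " + type.getSimpleName());
        walk(type, 1, lines);
        return lines;
    }

    private static void walk(Class<?> type, int depth, List<String> lines) {
        for (Field field : type.getDeclaredFields()) {
            // Skip static and compiler-generated fields; they are not part of the data model.
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            Class<?> fieldType = field.getType();
            lines.add(depth + " " + field.getName() + ": " + fieldType.getSimpleName());
            // Recurse only into classes nested in this sketch, not JDK types such as String.
            if (fieldType.getDeclaringClass() == NodeHierarchySketch.class) {
                walk(fieldType, depth + 1, lines);
            }
        }
    }

    public static void main(String[] args) {
        describe(Person.class).forEach(System.out::println);
    }
}
```

A populator would then visit the same nodes and assign a generated value to each field, which is the step described above.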

As mentioned earlier, the values are generated randomly. At first glance, the idea of tests based on random data may seem bad, but upon closer inspection, it is not without merit and actually offers some advantages. The fear of randomness stems from its inherent unpredictability. The concern is that it will make tests themselves unpredictable. However, it is important to note that many tests require a value to be present but do not actually care what the value is. For such tests, whether a person’s name is "John," "foo," or a random value like "MVEFZ," makes no difference. The test is happy as long as it has some non-null value. For cases where tests require specific values, the library provides an API for customizing the generated data.

We will look at customizing generated values (and reproducing tests) later in the article. Let’s go through a refactoring exercise to illustrate the library’s usage with a concrete example.

A Simple Refactoring Exercise

Instead of making up a test case, we will modify an existing test from an open-source project. The test we'll use is from the MapStruct samples repository. MapStruct is a great library for automating mapping between POJOs, and its samples are a perfect candidate for our purposes.

The following is the original version of the test that involves mapping a DTO to an entity (note: AssertJ is used for assertions). Although this is just a simple example, we may find similar tests in many real-world projects, often with dozens of fields. Some projects may have utility methods or classes for creating test objects, but the data setup is still done manually.

class CustomerMapperTest {
 
   @Test 
   void testMapDtoToEntity() { 
      CustomerDto customerDto = new CustomerDto(); 
      customerDto.id = 10L; 
      customerDto.customerName = "test-name"; 
      
      OrderItemDto order1 = new OrderItemDto(); 
      order1.name = "Table"; 
      order1.quantity = 2L; 
      customerDto.orders = new ArrayList<>(Collections.singleton(order1)); 

      Customer customer = CustomerMapper.MAPPER.toCustomer(customerDto); 

      assertThat(customer.getId()).isEqualTo(10); 
      assertThat(customer.getName()).isEqualTo("test-name"); 
      assertThat(customer.getOrderItems()) 
            .extracting("name", "quantity") 
            .containsExactly(tuple("Table", 2L)); 
  }
}

The refactored version of the test looks as follows:

@ExtendWith(InstancioExtension.class) 
class CustomerMapperTest { 

   @WithSettings 
   private final Settings settings = Settings.create() 
         .set(Keys.COLLECTION_MIN_SIZE, 0) 
         .set(Keys.COLLECTION_NULLABLE, true); 

   @RepeatedTest(10) 
   void testMapDtoToEntity() { 
      // Given 
      CustomerDto customerDto = Instancio.create(CustomerDto.class); 

      // When 
      Customer customer = CustomerMapper.MAPPER.toCustomer(customerDto); 

      // Then 
      assertThat(customer.getId()).isEqualTo(customerDto.id); 
      assertThat(customer.getName()).isEqualTo(customerDto.customerName); 
      assertThat(customer.getOrderItems()) 
            .usingRecursiveComparison() 
            .isEqualTo(customerDto.orders); 
   } 
}

Manual setup, as shown in the first example, usually has the following drawbacks:

If fields are added or removed, the setup code needs to be updated to reflect the changes
Use of hard-coded values adds noise to the test
Verifying optional data requires additional setup, where the values are not present
Testing for null, empty, and non-empty collections also adds complexity to the setup code, so typically, only a collection with one or two elements is verified

The refactored test offers some improvements. First, the data setup code has been reduced and no longer contains hard-coded values, making the test’s intention clearer: the mapper should map source values to the target object as-is, without any transformations. In addition, automating object creation means that no changes to the setup code are necessary if, for example, new properties are added or removed from CustomerDto or any class it references. We only need to ensure that the mapper handles the new property and that an assertion covers it.

Another advantage is that the same test method can verify all collection states: null, empty, and non-empty (size >= 1). This is enabled by the custom Settings, since Instancio generates non-empty collections by default. For this reason, @Test was replaced with @RepeatedTest, so that repeated runs exercise the different states. Unlike @ParameterizedTest, which is commonly used in this situation, @RepeatedTest does not add complexity to the test setup.
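
The reason repetition matters can be shown with a small plain-Java sketch. It assumes a hypothetical randomOrders helper that returns null, an empty list, or a non-empty list; a single call yields only one of these states, while repeated calls cover all of them. It is an illustration only and does not reflect how Instancio generates collections.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class CollectionStatesSketch {
    enum State { NULL, EMPTY, NON_EMPTY }

    // Returns null, an empty list, or a list with 1 to 3 elements, chosen at random.
    static List<String> randomOrders(Random random) {
        int choice = random.nextInt(3);
        if (choice == 0) {
            return null;
        }
        List<String> orders = new ArrayList<>();
        if (choice == 2) {
            int size = 1 + random.nextInt(3);
            for (int i = 0; i < size; i++) {
                orders.add("order-" + random.nextInt(1000));
            }
        }
        return orders;
    }

    static State stateOf(List<String> orders) {
        if (orders == null) {
            return State.NULL;
        }
        return orders.isEmpty() ? State.EMPTY : State.NON_EMPTY;
    }

    // Plays the role of a repeated test: each repetition sees one collection state.
    static Set<State> statesSeen(long seed, int repetitions) {
        Random random = new Random(seed);
        Set<State> seen = EnumSet.noneOf(State.class);
        for (int i = 0; i < repetitions; i++) {
            seen.add(stateOf(randomOrders(random)));
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(statesSeen(42L, 200));
    }
}
```

The number of repetitions is a trade-off: more repetitions make it more likely that every state is exercised, at the cost of a longer test run.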

As a side note, AssertJ’s usingRecursiveComparison is very well suited for this type of test. Using Instancio to populate the object and usingRecursiveComparison to verify the results, we may not even need to update the test when fields are added to or removed from classes that can be verified with usingRecursiveComparison:

Instancio automatically populates an object
MapStruct automatically maps the properties (assuming the names match)
AssertJ auto-verifies the result via recursive comparison (again, assuming the property names match)

In summary, when everything aligns - which does happen occasionally - creation, mapping, and assertion are all handled by the respective libraries.
Bottom line: as the size and complexity of a data model increases, manual data setup becomes less practical and more costly in terms of development and maintenance.

Instancio Extension for JUnit 5

Instancio can be used as a standalone library with any testing framework, such as JUnit or TestNG. When used standalone (for example, with JUnit 4 or TestNG), the seed used for generating the data can be obtained via the API:

Result<Person> result = Instancio.of(Person.class).asResult();
Person person = result.get();
long seed = result.getSeed();

This makes reproducing the data a little more cumbersome.

With JUnit 5, Instancio provides the InstancioExtension. The extension manages the seed value, ensuring that all objects created within a single test method can be reproduced in case of test failure. When a test fails, the extension reports the seed value as follows:

Test method 'testMapDtoToEntity' failed with seed: 12345

Using the reported seed value, we can reproduce the failure by annotating the test method with the @Seed annotation:

@Seed(12345) // to reproduce data
@Test 
void testMapDtoToEntity() { 
    // .. remaining code unchanged 
}

The seed value specified by the annotation will be used instead of a random seed. Once the test failure is resolved, the annotation can be removed.
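
The principle behind seed-based reproduction can be sketched with java.util.Random from the JDK: a generator created with the same seed produces the same sequence of values. The randomName helper below is hypothetical and is not how Instancio generates strings; it only demonstrates why a reported seed is enough to recreate the data.

```java
import java.util.Random;

final class SeedReproductionSketch {
    // Builds a five-letter uppercase name from a generator seeded with the given value.
    static String randomName(long seed) {
        Random random = new Random(seed);
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            name.append((char) ('A' + random.nextInt(26)));
        }
        return name.toString();
    }

    public static void main(String[] args) {
        long reportedSeed = 12345L; // the seed taken from a failure report
        String firstRun = randomName(reportedSeed);
        String secondRun = randomName(reportedSeed);
        // The same seed yields the same "random" data on every run.
        System.out.println(firstRun.equals(secondRun));
    }
}
```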

Parameter Injection

In addition, the extension adds support for injecting arguments into a @ParameterizedTest. For example, we could also re-write the refactored test shown earlier as follows:

@InstancioSource 
@ParameterizedTest 
void testMapDtoToEntity(CustomerDto customerDto) { 
   // When 
   Customer customer = CustomerMapper.MAPPER.toCustomer(customerDto); 

   // Then 
   // same assertions as before... 
}

This allows any number of arguments to be provided to a test method.

A Collection of Common Use Cases

When testing specific behaviors, we sometimes need to create an object in a particular state. For this purpose, the library offers a fluent API for customizing objects. The following examples outline several typical data setup scenarios that arise during testing. For each, we describe the use case and provide sample code showing how to implement it.

Customizing an Object’s Values

In this example, we create an object populated with random data but with specific values for some of the fields. We will assume our test case requires a customer from Great Britain with a past registration date. We can achieve this as follows:

Customer customer = Instancio.of(Customer.class) 
    .set(field(Address::getCountry), "GB")
    .set(field(Phone::getCountryCode), "+44")
    .generate(field(Customer::getRegistrationDate), gen -> gen.temporal().localDate().past())
    .create();

The gen parameter (of type Generators) provides access to built-in generators for customizing values. Built-in generators are available for most common JDK classes, such as strings, numbers, dates, arrays, collections, etc.

The set() method works as a setter, but unlike a regular setter, it will be applied to all generated instances. For example, if the customer has more than one phone number, all of them will have country code +44.

Creating a Collection

Let’s assume we need a list of 10 orders that:

have a null id
have any status except CANCELLED or COMPLETED

Such a list can be generated as follows:

List<Order> orders = Instancio.ofList(Order.class) 
    .size(10) 
    .ignore(field(Order::getId)) 
    .generate(field(Order::getStatus), gen -> gen.enumOf(OrderStatus.class)
             .excluding(OrderStatus.CANCELLED, OrderStatus.COMPLETED))
    .create();

While the Order class may have dozens of other fields, only those of interest are set explicitly. This has the benefit of highlighting which properties the test actually cares about. Other Order fields, for example, the shipping address, may also be required by the method under test to pass. However, they may not be pertinent to this particular test case and can therefore be filled with random data.

Customizing Collections within a Class

In addition to creating a collection of objects, the API also supports customizing collections declared somewhere within a class. For instance, let’s assume that we need to create a Customer that:

Has a specific id
Has 7 orders with expected statuses

We can generate such a Customer as follows:

Long expectedId = Gen.longs().get(); 

Customer customer = Instancio.of(Customer.class) 
   .set(field(Customer::getId), expectedId) 
   .generate(field(Customer::getOrders), gen -> gen.collection().size(7)) 
   .generate(field(Order::getStatus), gen -> gen.emit() 
         .items(OrderStatus.RECEIVED, OrderStatus.SHIPPED) 
         .item(OrderStatus.COMPLETED, 3) 
         .item(OrderStatus.CANCELLED, 2)) 
   .create();

The Gen class above also provides access to built-in generators via static methods. It’s a shorthand API for generating simple values. The emit() method allows generating a certain number of deterministic values. It can be useful for generating a collection containing objects with certain properties.

Generating Optional Data

In this example, we need to verify that the method under test does not fail if optional values are absent. Let’s say we need to create a Person that has:

An optional date of birth
An optional Spouse
The Spouse itself contains only optional fields, all of type String.

This can be done as shown below:

Person person = Instancio.of(Person.class) 
   .withNullable(all( 
      field(Person::getDateOfBirth), 
      field(Person::getSpouse))) 
   .generate(allStrings().within(scope(Spouse.class)), gen -> gen.string().nullable().allowEmpty()) 
   .create();

withNullable() will generate an occasional null value. The method all() groups selectors so that withNullable() does not have to be called twice. Finally, all strings declared within Spouse will be null, empty, or non-empty.

This setup will result in different permutations of null and non-null values on each test run. Therefore, a single test method can verify different states without the use of parameterized tests or custom data setup logic.

Generating Conditional Data

Often, a class has interdependent fields, where the value of one field is determined by the value of another. Instancio provides an assignment API to handle such cases. The following example illustrates how the country code field in the Phone class can be set based on the country field of the Address class:

Assignment assignment = Assign.given(field(Address::getCountry), field(Phone::getCountryCode))
    .set(When.isIn("Canada", "USA"), "+1")
    .set(When.is("Italy"), "+39")
    .set(When.is("Poland"), "+48")
    .set(When.is("Germany"), "+49");

Person person = Instancio.of(Person.class)
    .generate(field(Address::getCountry), gen -> gen.oneOf("Canada", "USA", "Italy", "Poland", "Germany"))
    .assign(assignment)
    .create();

Bean Validation Constraints

Another common use case is creating a valid object based on Bean Validation constraints. Perhaps the method under test performs validation, or we must persist some entities in an integration test.

Assuming we have this data model:

class Person { 
   @Length(min = 2, max = 64) 
   String name; 

   @Range(min = 18, max = 65) 
   int age; 

   @Email 
   String email; 
}

we can generate a valid object as follows:

Person person = Instancio.of(Person.class) 
       .withSettings(Settings.create()
            .set(Keys.BEAN_VALIDATION_ENABLED, true))
       .create(); 

// Sample output: Person(name=XGFK, age=23, email=ty7gvh6@1un9n.net)

It should be noted that this is an experimental feature. It can be enabled via Settings, as shown above, or a configuration file. Most constraints from the following packages are supported, depending on what’s available on the classpath:

• jakarta.validation.constraints 
• javax.validation.constraints 
• org.hibernate.validator.constraints

Object Reuse via Models

Sometimes, we may have several test methods requiring the same object but in slightly different states. Let’s assume we are testing a method that accepts a loan Applicant. The loan application is approved if both of these conditions are met:

The applicant has an income of at least $25,000
The applicant has not declared bankruptcy within the past five years

To cover all the test cases, we will need three applicant states: one valid for the happy path and two invalid, one for each of the conditions. We start by defining a Model of a valid applicant with the required income and a bankruptcy date that is either null or over five years ago.

LocalDate maxDate = LocalDate.now().minusYears(5).minusDays(1);

Model<Applicant> validApplicantModel = Instancio.of(Applicant.class)
    .generate(field(Applicant::getIncome), gen -> gen.ints().min(25000))
    .generate(field(Applicant::getBankruptcyDate), gen -> gen.temporal()
            .localDate()
            .nullable()
            .range(LocalDate.MIN, maxDate))
    .toModel();

For the happy path test, we can then generate a valid applicant from the model:

Applicant applicant = Instancio.create(validApplicantModel);

The created Applicant will inherit all the properties specified by the model. Therefore, no other customizations are needed.

To create an invalid applicant, we can simply customize the object created from the model. For example, to generate an applicant with a bankruptcy date within the last five years, we can customize the object as follows:

Applicant applicant = Instancio.of(validApplicantModel) 
   .generate(field(Applicant::getBankruptcyDate), gen -> gen.temporal()
         .localDate()
         .range(LocalDate.now().minusYears(5), LocalDate.now())) 
   .create();

Applicant’s income can be modified in a similar manner.
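
For context, the rule under test could look like the following sketch. The LoanPolicy class, its isApproved method, and the shape of the Applicant class are assumptions made for this illustration; the article only specifies the two approval conditions. The current date is passed in as a parameter so that the rule is deterministic in tests.

```java
import java.time.LocalDate;

// Hypothetical applicant with only the two fields relevant to the rule.
final class Applicant {
    private final int income;
    private final LocalDate bankruptcyDate; // null when no bankruptcy was declared

    Applicant(int income, LocalDate bankruptcyDate) {
        this.income = income;
        this.bankruptcyDate = bankruptcyDate;
    }

    int getIncome() { return income; }
    LocalDate getBankruptcyDate() { return bankruptcyDate; }
}

// Hypothetical class under test.
final class LoanPolicy {
    static final int MIN_INCOME = 25_000;

    private LoanPolicy() {}

    // Approves when income is sufficient and there is no bankruptcy within the past five years.
    static boolean isApproved(Applicant applicant, LocalDate today) {
        boolean incomeOk = applicant.getIncome() >= MIN_INCOME;
        LocalDate cutoff = today.minusYears(5);
        LocalDate bankruptcy = applicant.getBankruptcyDate();
        // A bankruptcy exactly five years ago still counts as "within the past five years".
        boolean bankruptcyOk = bankruptcy == null || bankruptcy.isBefore(cutoff);
        return incomeOk && bankruptcyOk;
    }
}
```

The three applicant states described above map directly onto this rule: the valid model satisfies both conditions, and each invalid variant breaks exactly one of them.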

Populating Objects via Setters

By default, Instancio populates POJOs by assigning values directly to fields. However, there may be cases where assigning values via setters is preferred. One such example is when setters contain logic that is relevant to the functionality under test, as shown below:

class Product { 
   // ... snip 
   String productCode; 

   void setProductCode(String productCode) { 
      Objects.requireNonNull(productCode); 
      this.productCode = productCode.trim().toUpperCase(); 
   } 
}

To populate this class using setters, we can modify the assignment type via Settings:

Product product = Instancio.of(Product.class) 
   .withSettings(Settings.create()
     .set(Keys.ASSIGNMENT_TYPE, AssignmentType.METHOD))
   .create();

Instancio will then attempt to resolve setter method names from field names. By default, it assumes that mutators follow the JavaBeans convention and use the set prefix. If setters follow a different naming convention, for example, using with as the prefix, the behavior can be customized by modifying the SetterStyle option:

Settings.create() 
 .set(Keys.ASSIGNMENT_TYPE, AssignmentType.METHOD) 
 .set(Keys.SETTER_STYLE, SetterStyle.WITH);
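
The practical difference between the two assignment types can be sketched with plain reflection. This is an illustration of the concept under the assumption that the Product class looks like the one shown earlier; it is not Instancio's implementation, and the AssignmentSketch class and its helper methods are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Objects;

final class AssignmentSketch {
    // Same shape as the Product class above: the setter normalizes its input.
    static class Product {
        String productCode;

        void setProductCode(String productCode) {
            Objects.requireNonNull(productCode);
            this.productCode = productCode.trim().toUpperCase();
        }
    }

    // Field assignment: the value is written as-is and the setter logic is bypassed.
    static Product viaField(String value) {
        try {
            Product product = new Product();
            Field field = Product.class.getDeclaredField("productCode");
            field.setAccessible(true);
            field.set(product, value);
            return product;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Method assignment: the value passes through the setter and is normalized.
    static Product viaSetter(String value) {
        try {
            Product product = new Product();
            Method setter = Product.class.getDeclaredMethod("setProductCode", String.class);
            setter.setAccessible(true);
            setter.invoke(product, value);
            return product;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + viaField(" abc ").productCode + "]");
        System.out.println("[" + viaSetter(" abc ").productCode + "]");
    }
}
```

With field assignment the product code keeps its surrounding spaces and lowercase letters, while assignment via the setter stores the trimmed, uppercased value. This is why method assignment matters when the setter logic is relevant to the functionality under test.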

Generating Custom Types

Some applications have data classes that are used extensively throughout the data model. For instance, a GIS (Geographic Information System) application may define a Location class that is referenced by PointOfInterest and several other classes:

public class Location { 
   private final double lat; 
   private final double lon; 
   // snip ... 
} 

public class PointOfInterest { 
   private final Location location; 
   // snip... 
}

We can generate valid locations as shown below:

PointOfInterest poi = Instancio.of(PointOfInterest.class) 
   .generate(field(Location::lat), gen -> gen.doubles().range(-90d, 90d))
   .generate(field(Location::lon), gen -> gen.doubles().range(-180d, 180d)) 
   .create();

It can get tedious if it needs to be done repeatedly across many tests. Defining a custom Generator can solve this problem:

import org.instancio.Random; 

class LocationGenerator implements Generator<Location> { 
   @Override 
   public Location generate(Random random) { 
      double lat = random.doubleRange(-90, 90); 
      double lon = random.doubleRange(-180, 180); 
      return new Location(lat, lon); 
   } 
}

Then the previous example can be modified as:

PointOfInterest poi = Instancio.of(PointOfInterest.class) 
    .supply(all(Location.class), new LocationGenerator()) 
    .create();

Although this is an improvement, we must manually bind the custom generator to the Location field. To take it further, we can also register the new generator via Instancio’s Service Provider Interface. Once registered, the following statement will automatically produce valid locations using the custom generator:

PointOfInterest poi = Instancio.create(PointOfInterest.class);

We will omit an example for brevity. For details, please refer to the Instancio Service Provider documentation.

Final Thoughts: Embracing Randomness

Some developers are apprehensive about using random data in unit tests. Their main concern is that random data will cause tests to become flaky and that failures will be impossible or difficult to reproduce. However, as we mentioned earlier, randomised data in tests has been used for many years, most notably in property-based testing frameworks.

Repeatability is a key feature of most testing libraries that generate random data. Therefore, the fear of randomness is misplaced. In addition, as we showed earlier, switching from hard-coded inputs to generated data can increase test coverage and reduce the need for more complicated data setup logic and parameterized tests.

If a test fails, it may have uncovered a potential bug, or perhaps it was an incorrectly set expectation. Regardless of the root cause, each time the test runs, it probes the subject under test from a different angle.
