Production Like Performance Tests of Web-Services

Introduction

Try to understand the trade-off between putting more effort into testing and detecting issues faster in production (Sam Newman, 2015, p. 153)

A failing test does not directly benefit the user ... If a product works, it works (Mike Wacker, 2015)

Tests should always keep the end user's view in mind to ensure that the software meets with acceptance on the part of the users. But how do you test web services that are not directly customer-facing, and in particular, how do you performance-test them in a meaningful way? This article outlines performance test approaches that we have developed and proven to be effective at HERE, a leading location cloud company. Since the performance of the web services hosted on the HERE cloud platforms is bound by Service Level Agreements, we performance-test these services thoroughly by leveraging Continuous Integration & Delivery.

Web services are software components that communicate with other components using standards-based web technologies, including HTTP and XML-based messaging. They offer an interface that describes a collection of operations. RESTful web services communicate via stateless operations. Web services allow applications to communicate with each other without time-consuming custom coding, and because the communication relies on industry-standard protocols, they are not tied to any one operating system or programming language. However, the distributed nature and loose coupling also make web services challenging to test.
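To make the stateless request/response style concrete, the following minimal sketch calls a RESTful endpoint over HTTP. The URL, parameters and response schema are invented for illustration and do not refer to an actual HERE API.

    # Minimal sketch of a stateless RESTful call (hypothetical endpoint and parameters).
    import requests

    # Every request carries all the information the service needs; no session state is kept.
    response = requests.get(
        "https://api.example.com/v1/articles",   # illustrative URL, not a real service
        params={"q": "running shoes", "limit": 10},
        timeout=5,
    )
    print(response.status_code)   # e.g. 200
    print(response.json())        # the service answers with a self-contained representation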

Test Approaches

Tests should be created with knowledge of the end users so that they are effective and risk-based. Because of this, as well as release techniques such as canary releases and feature toggles, the line between tests run prior to the release and tests of the released software in production becomes blurred. For example, A/B testing became popular because it can determine the impact of new features or changes to the user experience in production. This article raises the question of whether this idea can also be applied to performance testing, and how it is possible to performance-test (RESTful) web services based on a meaningful and realistic usage scenario prior to the release.

Acceptance Test

Acceptance testing is based on the idea of discussing requirements using realistic examples, and of executing tests against those acceptance criteria. That way, tests can be used to create executable requirement specifications, and the software can be created based on those realistic examples.

An acceptance test can therefore be treated as a description of the behavior of a software product, generally expressed as an example or a usage scenario. However, the actual usage of the software may change over time, so there is a risk that the test data gets outdated. If that is the case, the tests have to be updated.

Acceptance tests can be written for web services as well as for frontends.
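As a sketch of how such an executable specification could look for a web service, the example below expresses one acceptance criterion as a test function that could be run with a test runner such as pytest. The endpoint, parameters and expected response schema are assumptions made purely for illustration.

    # Acceptance criterion expressed as an executable example (hypothetical endpoint):
    # "Searching for an existing article returns HTTP 200 and at least one result."
    import requests

    BASE_URL = "https://api.example.com/v1"   # illustrative base URL

    def test_search_returns_results_for_known_article():
        response = requests.get(f"{BASE_URL}/articles", params={"q": "running shoes"}, timeout=5)
        assert response.status_code == 200
        assert len(response.json()["results"]) >= 1   # assumed response schema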

Functional Split Test

Split testing (also known as A/B testing) compares two versions of a product (A and B, or Green and Blue), as shown in Figure 1. Typically, this test is done at the frontend level: you split your website's production traffic between the two versions and measure metrics such as the conversion rate.
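One common way to split production traffic is to hash a stable user identifier into one of the two variants, so that a given visitor always sees the same version. The sketch below illustrates this idea with an assumed 50/50 split; it is not taken from a specific A/B testing tool.

    # Deterministic A/B assignment sketch: hash a stable user id into variant "A" or "B".
    import hashlib

    def assign_variant(user_id: str, traffic_share_a: float = 0.5) -> str:
        # Hash the user id to a number in [0, 1] so the same user always gets the same variant.
        digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
        bucket = int(digest[:8], 16) / 0xFFFFFFFF
        return "A" if bucket < traffic_share_a else "B"

    print(assign_variant("user-42"))   # stable result for the same user id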

As described by Janet Gregory and Lisa Crispin (2015, p. 203), A/B testing is done in production by real customers, thus providing a way of critiquing the product.

But why test simultaneously with real end users in production?

Mainly, because this is a way to get end-user feedback and thus information about how the product is actually used:

  • You want to compare apples to apples (with tests there is some uncertainty whether the test scenario is representative)
  • It is a controlled experiment
  • Monday is not the same as Saturday (does the usage differ over time?)
  • The usage in USA East is not the same as in Australia (does the usage differ from region to region?)

Figure 1: A/B Testing of Frontend

Traditional Performance Test

Performance tests of web services can determine whether the response times are acceptable for an expected usage scenario. Besides the number of API calls, and thus the load, the usage scenario is typically reflected in the request ratio of the API calls: if the API of an e-commerce web service gets 95% article search requests and 5% buy requests, this request ratio should be reflected in the performance test. As a side note, an intentional deviation from a realistic ratio may also be a good test scenario: by executing 20% buy requests, the system may come under more stress.
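A traditional performance test can encode such a request ratio directly in the load script. The sketch below uses plain Python to fire requests against a hypothetical e-commerce API with a 95/5 search/buy mix; in practice a dedicated load tool (for example JMeter, Gatling or Locust) would be used, and all URLs and payloads here are assumptions.

    # Load sketch: replay a 95% search / 5% buy request mix against a hypothetical API.
    import random
    import requests

    BASE_URL = "https://shop.example.com/api"           # illustrative URL
    REQUEST_MIX = [("search", 95), ("buy", 5)]          # request ratio taken from the text

    def fire_one_request():
        operation = random.choices(
            [name for name, _ in REQUEST_MIX],
            weights=[weight for _, weight in REQUEST_MIX],
        )[0]
        if operation == "search":
            return requests.get(f"{BASE_URL}/articles", params={"q": "shoes"}, timeout=5)
        return requests.post(f"{BASE_URL}/orders", json={"article_id": 123, "qty": 1}, timeout=5)

    # A real load test would run many of these calls concurrently and record response times.
    for _ in range(100):
        response = fire_one_request()
        print(response.status_code, response.elapsed.total_seconds())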

Load tests are used to determine the performance behavior for a higher load than the typical usage scenario, and stress tests for critical load scenarios. One of the main challenges of performance test design is the creation of a realistic load scenario. Not only the volume (throughput) but also the request ratio is important. If the web service is already in production, production logs can be used to fetch API calls. If the test data design is done thoroughly, the performance tests can then be run with static test data pools containing API calls that represent a “production like” and thus realistic request ratio. Regular reviews and updates of the data pools are needed to ensure that the request ratio is kept fairly up to date. If the web service is new, or new functionality is added, the API calls have to be designed artificially, keeping the expected usage in mind.
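If the service is already in production, the “production like” request ratio for such a data pool can be derived from access logs. The sketch below assumes a common combined-log-style line format and simply counts requests per method and path; the log format and file name are assumptions, not a prescribed setup.

    # Sketch: derive the request ratio for a static test data pool from an access log.
    # Assumes one request per line with the HTTP method and path in quotes, e.g.
    #   ... "GET /articles?q=shoes HTTP/1.1" 200 ...
    import re
    from collections import Counter

    REQUEST_PATTERN = re.compile(r'"(GET|POST|PUT|DELETE) ([^ ?"]+)')

    def request_ratio(log_lines):
        counts = Counter()
        for line in log_lines:
            match = REQUEST_PATTERN.search(line)
            if match:
                counts[(match.group(1), match.group(2))] += 1
        total = sum(counts.values())
        return {endpoint: count / total for endpoint, count in counts.items()} if total else {}

    with open("access.log") as log_file:          # illustrative file name
        for endpoint, share in sorted(request_ratio(log_file).items()):
            print(f"{endpoint}: {share:.1%}")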

However, it has to be considered that the test (data) design is not a one-time effort. For example, due to the loose coupling of web services or the arrival of a new customer, the request ratio and thus the API usage can change suddenly.

No matter how many tests and fixtures you have, they just can’t cover all cases. Production traffic always differs from your expectations. (Leonid Bugaev, 2014).

An unforeseen (and thus untested) production request ratio can have a huge performance impact and could even lead to problems of availability and stability.

While traditional performance tests relying on static test data pools must deal with the risk of getting stale, and thus of failing to be representative, they also have important advantages:

  • The test results are repeatable and the tests are deterministic. Therefore, it is possible to use them for performance trending. Also, they are well suited for use as scaling tests, which is important for determining capacity needs (how many instances have to be hosted).
  • They are the method of choice for performance-testing new web services prior to production.
  • Designing the test data also means that it is possible to performance-test web services with uncommon API usages, such as designing the test data pool in such a way that the API usage is not optimized, resulting in API requests with a big payload.

Performance Split Test

Performance split testing is a performance test approach that duplicates real-time production traffic to determine the performance of a web service for the actual production load mix prior to the release of a new version into production. In this performance test, real-time production traffic gets duplicated and multicasted to two variations of a web service, hosted on the test environment. If one of the variations is the release candidate of the web service, it is possible to determine its performance for the actual load mix. And if the other variation is the version of the web service which is currently in production, the performance of those two variations (and thus versions) can be compared 1:1. These tests are also called "dark traffic testing" because the responses are not sent back to the customers. As shown in Figure 2, a “traffic distributor” service decides from which production instance(s) the API calls are used as test input. Since it is not desired to consume the actual production API calls, this traffic must be duplicated. This allows the production traffic to drive load against the test instance(s) and therefore to determine their behavior under load conditions.
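A minimal way to picture the traffic distributor is a component that serves the customer from production as usual, while asynchronously replaying a copy of each request against the two test variations and discarding their responses. The sketch below illustrates this duplication idea in plain Python; real setups typically use dedicated tooling such as a reverse proxy or a traffic-replay tool, and all host names here are assumptions.

    # Dark-traffic sketch: duplicate a production request to two test variations (A and B)
    # and throw the test responses away, so only the production response reaches the customer.
    from concurrent.futures import ThreadPoolExecutor
    import requests

    PRODUCTION = "https://service.example.com"        # illustrative hosts
    TEST_VARIATIONS = ["https://test-a.example.com",  # e.g. current production version
                       "https://test-b.example.com"]  # e.g. release candidate

    executor = ThreadPoolExecutor(max_workers=8)

    def shadow(base_url, path, params):
        try:
            requests.get(f"{base_url}{path}", params=params, timeout=5)   # response is discarded
        except requests.RequestException:
            pass   # failures of the shadow call must never affect production traffic

    def handle(path, params):
        # Serve the customer from production as usual ...
        production_response = requests.get(f"{PRODUCTION}{path}", params=params, timeout=5)
        # ... and asynchronously multicast a copy of the request to both test variations.
        for variation in TEST_VARIATIONS:
            executor.submit(shadow, variation, path, params)
        return production_response

    print(handle("/articles", {"q": "shoes"}).status_code)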

Figure 2: Split Performance Test Setup

Performance split testing follows the A/B testing idea of splitting traffic to compare the results. However, there are significant differences:

  • It primarily focuses on the performance of RESTful web services, while A/B testing is typically done to compare different functionalities that are visible on the frontend.
  • It does not include customer feedback - no conversion rate is calculated. Instead, the validation typically focuses on response time, response code and operating system usage metrics (a minimal comparison sketch follows this list).
  • It is a test with production traffic, but done prior to the production release.
  • It allows a web service to be load-tested by redirecting a configurable amount of duplicated production traffic (served by multiple production instances) against a single test instance.
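As a rough illustration of how such a validation could compare the two variations, the sketch below contrasts response-time percentiles and error rates of variation A and B. The measurements, metric names and summary choices are assumptions for illustration, not a fixed evaluation scheme.

    # Sketch: compare response times and error rates of the two variations after a split test run.
    from statistics import median, quantiles

    def summarize(samples_ms, errors, total_requests):
        return {
            "p50_ms": median(samples_ms),
            "p95_ms": quantiles(samples_ms, n=20)[18],   # 95th percentile cut point
            "error_rate": errors / total_requests,
        }

    # Hypothetical measurements collected during the split test (milliseconds).
    variation_a = summarize([120, 130, 140, 150, 300, 125, 135], errors=2, total_requests=10_000)
    variation_b = summarize([110, 118, 122, 126, 260, 115, 119], errors=1, total_requests=10_000)

    for metric in variation_a:
        print(f"{metric}: A={variation_a[metric]:.3f}  B={variation_b[metric]:.3f}")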

Production

Performance testing prior to the production release is important, because performance problems are often difficult to fix and can have a severe production impact. However, it is not possible to performance-optimize web services for all possible usage scenarios. The same applies to functional testing; there can always be an unforeseen API usage resulting in unexpected functionality. That is why it is important to keep in mind the trade-off between putting more effort into (performance) testing all possible scenarios and detecting issues faster in production (see also: Sam Newman, 2015, p. 153).

Tests focusing on stability and fault tolerance are not covered by this article. As a note, such tests are executed by some companies (such as Netflix) in production. This is done to ensure that production is fault-tolerant and, eventually, to say it in the words of Dave Zwieback and Nassim Taleb, "antifragile".

Canary Release

For the rollout of a new web-service version, the canary release strategy can be used to reduce the risk of introducing a new software version into production. This is done by slowly rolling out the change to a small subset of instances before rolling it out to the entire infrastructure (see also: http://martinfowler.com/bliki/CanaryRelease.html).
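A canary rollout can be pictured as a router that sends only a small, configurable share of traffic to the instances running the new version. The sketch below illustrates this weighting in plain Python; the instance names and the 5% share are assumptions, and real deployments would configure this in a load balancer or deployment platform.

    # Canary routing sketch: send a small share of requests to instances running the new version.
    import random
    from collections import Counter

    STABLE_INSTANCES = ["stable-1", "stable-2", "stable-3"]   # illustrative instance names
    CANARY_INSTANCES = ["canary-1"]
    CANARY_SHARE = 0.05                                       # 5% of traffic goes to the canary

    def pick_instance():
        pool = CANARY_INSTANCES if random.random() < CANARY_SHARE else STABLE_INSTANCES
        return random.choice(pool)

    print(Counter(pick_instance() for _ in range(10_000)))    # roughly 5% should hit the canary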

The canary release is a release strategy, not a test approach such as performance split testing. It can be used to monitor the performance behavior of a new web-service version on the basis of a few instances, prior to updating the whole production environment.

Note that besides the canary release, there are also other release strategies, such as feature toggles, that aim to control the introduction of software updates into production.

Production Monitoring & Log Analysis

Release does not mean ‘all done’; it is important to monitor the production behavior to detect issues fast. It is also possible to analyze the production logs to determine the production request ratio. Monitoring and analyzing the production logs can provide feedback to ensure that the performance test load mix is still “production-like”, so that the load scenarios of the split test and the performance test can be adapted.
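One way to use this feedback loop is to compare the request ratio observed in production with the ratio encoded in the performance test data pool and flag drift. The endpoints, ratios and threshold in the sketch below are assumptions chosen only to illustrate the comparison.

    # Sketch: detect drift between the production request ratio and the test data pool ratio.
    production_ratio = {"/articles": 0.90, "/orders": 0.10}    # e.g. derived from production logs
    test_pool_ratio = {"/articles": 0.95, "/orders": 0.05}     # ratio encoded in the test data pool
    DRIFT_THRESHOLD = 0.03                                      # assumed tolerance per endpoint

    drift_per_endpoint = {
        endpoint: abs(production_ratio.get(endpoint, 0.0) - test_pool_ratio.get(endpoint, 0.0))
        for endpoint in set(production_ratio) | set(test_pool_ratio)
    }

    for endpoint, drift in drift_per_endpoint.items():
        if drift > DRIFT_THRESHOLD:
            print(f"Review test data pool: {endpoint} differs by {drift:.0%} from production")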

Conclusion - Role of Performance Split Testing

In summary, the functional split (A/B) testing approach cannot be applied directly to performance testing web services. However, with performance split testing it is possible to use real-time traffic as test input.

Performance split testing cannot be done for all web services:

  • Is the web service RESTful? Performance split testing is done with a subset of real-time production traffic and therefore requires that the web service is stateless.
  • Is the web service already in production? Because of missing production traffic, performance split testing of new web services is not possible. Also, if new features require API usage changes (i.e. the usage of a new parameter), these changes cannot be tested with real-time production traffic as test input.

If performance split testing is possible, the question is whether it can be done instead of traditional performance testing, or whether the two are mutually supportive.

To answer this question, it helps to bear in mind that both performance test approaches, the traditional performance test as well as the performance split test, have advantages and disadvantages:

Performance Test Aspect                               | Traditional Performance Test           | Performance Split Test
Deterministic / repeatable                            | Yes                                    | No
Performance test with boundary values for parameters  | Possible                               | No control over which API calls are used as test input
Production-like request ratio                         | Requires frequent updates of test data | Yes

As you can see, the performance split test has limitations. Still, being able to performance-test with real-time production traffic is an important benefit. Also, by comparing the test results with a traditional performance test run, it is possible to determine whether the request ratio of the traditional performance test is still up to date or whether it should be reviewed. This means that both performance tests fit well together. The traditional performance test is typically run first, and the performance split test afterwards, as one of the last tests prior to the release.

Even though these performance tests provide valuable information, a canary release may still be helpful to also control the actual rollout of the new web-service version.

Can the performance split test find its own niche, given that the traditional performance test and the canary release approach already provide a multi-faceted tool kit for gaining insights into the performance characteristics of web services? If performance contracts are in place, performance split testing is useful to establish confidence about the performance behavior of web services before they are updated in production. If the request ratio is difficult to predict, traditional performance tests may lead to wrong assumptions because their results may not be representative of production. Depending upon the specifics of the circumstances, a performance split test may offer additional benefit: it covers aspects which cannot be addressed with a traditional performance test and a canary release.

Sources

  • Bugaev, Leonid (retrieved August 4, 2015)
  • Fowler, Martin (retrieved August 4, 2015)
  • Gregory, Janet and Crispin, Lisa (2015): More Agile Testing. Learning journeys for the whole team. Addison-Wesley.
  • Newman, Sam (2015): Building Microservices. O'Reilly Media.
  • Wacker, Mike (retrieved April 22, 2015): Google testing blog 
  • Zwieback, Dave (2014): Antifragile Systems and Teams. O'Reilly Media.

About the Author

Stefan Friese is a Test and R&D Manager at HERE, working in Berlin. In his current role, he leads a team that builds Continuous Integration & Delivery test capabilities for real-time location web services hosted on the HERE cloud platforms. Stefan has 14 years of experience as a test manager, performance tester, software engineer, and quality engineer. You can find Stefan on Twitter as @y3key.
