Lilit Yenokyan, director of engineering at Pivotus, presented a talk on performance testing of reactive services at Reactive Summit. Yenokyan described the types of performance testing and covered the tooling needed to run the tests and analyze the results.
Yenokyan started with a description of the system under test: a portal for bank agents to manage conversations with their customers, complemented by a React Native app. The application could be adopted by any bank, which means potentially millions of users.
As the team realised the traffic generated by the system would be significant, the question arose of how much load the system could sustain, and how to put a number on that predicted load.
Yenokyan defines the goal of performance testing as determining capacity. Performance metrics allow the system's performance to be assessed against what it costs to run. This gives a sense of performance per dollar spent, and may lead to optimization efforts if the potential gain is worth the time.
Yenokyan defines three types of performance testing. Load testing runs a scenario with a constant number of users. Stress testing pushes the system to its limit: it reveals how much pressure the system can sustain before collapsing, and it also lets the team verify how the system recovers from such failures. Endurance testing runs the system under moderate load for an extended period of time, which uncovers issues such as memory exhaustion.
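The three types differ mainly in how many concurrent users are simulated and for how long. The sketch below is not from the talk; it is a minimal, illustrative load driver using only the Java standard library, with a hypothetical target endpoint and made-up user counts and durations.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Minimal load driver: load, stress and endurance tests differ only in how many
// concurrent users are simulated and for how long.
public class LoadDriver {

  record Profile(String name, int users, Duration duration) {}

  // Illustrative numbers only; real values come from the success criteria.
  static final Profile LOAD      = new Profile("load", 200, Duration.ofMinutes(10));
  static final Profile STRESS    = new Profile("stress", 2_000, Duration.ofMinutes(15));
  static final Profile ENDURANCE = new Profile("endurance", 200, Duration.ofHours(8));

  public static void main(String[] args) throws InterruptedException {
    run(LOAD, URI.create("https://staging.example.com/api/health")); // hypothetical endpoint
  }

  static void run(Profile profile, URI target) throws InterruptedException {
    HttpClient client = HttpClient.newHttpClient();
    LongAdder requests = new LongAdder();
    LongAdder errors = new LongAdder();
    Instant deadline = Instant.now().plus(profile.duration());

    // One thread per simulated user, each sending requests until the deadline.
    ExecutorService pool = Executors.newFixedThreadPool(profile.users());
    for (int i = 0; i < profile.users(); i++) {
      pool.submit(() -> {
        while (Instant.now().isBefore(deadline)) {
          try {
            HttpRequest request = HttpRequest.newBuilder(target).GET().build();
            HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());
            requests.increment();
            if (response.statusCode() >= 500) {
              errors.increment();
            }
          } catch (Exception e) {
            errors.increment();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(profile.duration().toMinutes() + 1, TimeUnit.MINUTES);
    System.out.printf("%s: %d requests, %d errors%n",
        profile.name(), requests.sum(), errors.sum());
  }
}
```

In practice a dedicated tool such as JMeter (covered below) handles scheduling, ramp-up and result aggregation, but the knobs are the same: number of users and test duration.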
The first step when implementing performance testing is to define success criteria. To achieve that, the scale must be defined in concrete terms, such as the number of concurrent users. Then, scenarios can be defined and implemented. Concrete goals for the target system, for example a maximum response time at the expected number of concurrent users, turn those scenarios into measurable performance targets. Ideally, tests would run on the production cluster; however, this is often not practical, and a clone of production is the next best thing.
Yenokyan enumerated several tools that are used within her team’s testing stack. The tests themselves are written and executed with Apache JMeter, a load-testing tool that sends requests in parallel and aggregates the results.
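JMeter test plans are typically built in its GUI and saved as .jmx files. As an illustration only, the sketch below expresses an equivalent plan with the community-maintained JMeter Java DSL (jmeter-java-dsl), which is not necessarily what the team used; the service URL, thread count, iteration count and latency target are assumptions.

```java
import static us.abstracta.jmeter.javadsl.JmeterDsl.*;

import java.io.IOException;
import java.time.Duration;
import us.abstracta.jmeter.javadsl.core.TestPlanStats;

public class ConversationsLoadTest {

  public static void main(String[] args) throws IOException {
    // 50 concurrent threads, 100 iterations each, against a hypothetical endpoint.
    TestPlanStats stats = testPlan(
        threadGroup(50, 100,
            httpSampler("https://staging.example.com/api/conversations"))
    ).run();

    // JMeter aggregates the results; here we simply check a latency percentile.
    Duration p99 = stats.overall().sampleTimePercentile99();
    System.out.println("p99 response time: " + p99);
    if (p99.compareTo(Duration.ofSeconds(2)) > 0) {
      throw new IllegalStateException("p99 exceeded the 2s target");
    }
  }
}
```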
Jenkins orchestrates the test runs. It offers a variety of parameters and build triggers useful for load testing, and parameterized builds give finer-grained control over the tests without resorting to manual execution.
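Jenkins exposes build parameters to the job as environment variables, so a parameterized build can tune the load profile without editing the test plan. A minimal sketch, with hypothetical parameter names (USERS, DURATION_MINUTES) and defaults:

```java
import java.time.Duration;

// Read hypothetical Jenkins build parameters (exposed as environment variables)
// and derive the load profile from them.
public class BuildParameters {
  public static void main(String[] args) {
    int users = Integer.parseInt(System.getenv().getOrDefault("USERS", "50"));
    Duration duration = Duration.ofMinutes(
        Long.parseLong(System.getenv().getOrDefault("DURATION_MINUTES", "10")));
    System.out.printf("Load profile: %d users for %s%n", users, duration);
    // These values would then parameterize the JMeter test plan.
  }
}
```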
The test results are sent to InfluxDB, a time-series database, and the data is then visualised with Grafana. Datadog, an infrastructure monitoring service, provides additional metrics on the resource usage of the different machines in the system.
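JMeter can ship results to InfluxDB through its Backend Listener; the sketch below is a simpler illustration of the same idea, writing a single custom measurement over InfluxDB 1.x's HTTP line-protocol endpoint so Grafana can chart it. The host, database name, tags and value are assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Push one latency measurement into InfluxDB 1.x over its HTTP write API,
// so it can be graphed in Grafana alongside the JMeter results.
public class InfluxWriter {
  public static void main(String[] args) throws Exception {
    // Line protocol: measurement,tags fields timestamp (nanoseconds).
    String lineProtocol = "response_time,service=conversations,env=staging value=182 "
        + System.currentTimeMillis() * 1_000_000L;

    HttpRequest request = HttpRequest.newBuilder(
            URI.create("http://influxdb.example.com:8086/write?db=loadtests"))
        .POST(HttpRequest.BodyPublishers.ofString(lineProtocol))
        .build();

    HttpResponse<Void> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.discarding());
    System.out.println("InfluxDB write status: " + response.statusCode()); // 204 on success
  }
}
```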
The difference between testing a monolith and a microservice system lies in how the results are interpreted. In a monolith, performance is assessed as a whole. In a microservice system, each microservice is tested independently, and can then be optimized and scaled independently as well.
Yenokyan continued with the lessons learned by the team. The first is to test scaling: while the team assumed the application would scale linearly by adding instances, tests showed that throughput remained the same. By analysing the data, they found several bottlenecks limiting horizontal scaling, such as the database and a WebSocket connection limit.
Yenokyan concluded with takeaways:
- Define expected performance before you commit
- Start measuring early on
- Question the obvious
- Measure performance for every release