Architects designing micro-service architectures typically focus on patterns, topology, and granularity, but one of the most fundamental decisions to make is the choice of threading model. With the proliferation of viable open source tools, programming languages, and technology stacks, software architects have more choices to make now than ever before.
It is easy to get lost in the details of nuanced language and library differences and lose sight of what is important.
Choosing the right threading model for your micro-services, and understanding how it relates to database connectivity, can mean the difference between a solution that is merely good enough and a product that is amazing.
Paying attention to the threading model is an effective way to focus the architect on the trade-offs between efficiency and code complexity. As a service is decomposed into parallel operations with shared resources, the application becomes more efficient and its responses exhibit less latency, within limits (see Amdahl's Law). Parallelizing operations and safely sharing resources introduce more complexity into the code.
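Amdahl's Law quantifies those limits: if p is the fraction of a request's work that can be parallelized and n is the number of parallel workers, the maximum speedup is 1 / ((1 - p) + p / n). For example, if 80% of the work is parallelizable, then no number of workers can deliver more than a 5x speedup, because the remaining 20% always executes serially.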
However, the more complex the code, the harder it is for engineers to fully comprehend, which means developers are more likely to introduce new bugs with every change.
One of the most important responsibilities of the architect is to find a good balance between efficiency and code complexity.
Single Threaded, Single Process Threading Model
The most basic threading model is the single threaded, single process model. This is the simplest way to write code.
A single threaded, single process service cannot execute on more than one core at a time. A modern, bare metal server typically has up to 24 cores, so a service built around this model cannot utilize more than one of them. The throughput of these services will not increase with additional load, and their CPU utilization will never rise above single digit percentages. With so much underutilization, a compensating tactic is to run larger server pools in order to handle the load.
This approach works, but is wasteful and ultimately expensive. The most popular cloud computing vendors offer single virtual core instances fairly cheaply in order to facilitate this approach’s more granular scaling needs.
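To make the model concrete, here is a minimal sketch in Java; the port and canned response are placeholder assumptions. A second request simply waits in the accept queue until the first one finishes, no matter how many cores are available:

```java
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// A minimal single threaded, single process service: one thread
// accepts and fully handles each request in turn, so only one
// core is ever busy no matter how many the server has.
public class SingleThreadedService {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(8080)) { // hypothetical port
            while (true) {
                try (Socket client = server.accept()) {      // blocks until a request arrives
                    OutputStream out = client.getOutputStream();
                    // stand-in for the real request handling logic
                    out.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok".getBytes());
                }
            }
        }
    }
}
```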
Single Threaded, New Multi-Process Threading Model
The next step up in both complexity and efficiency is the single threaded, multi-process threading model, where a new process is created for each request. Code for this type of micro-service is relatively simple, but it does contain more complexity than the previous model.
The overhead of process creation, and of constantly creating and destroying database connections, can steal processor time and thereby increase latency across all collocated services. This threading model creates more database connections because connections are per process and cannot be shared across process boundaries. Because the process lives only as long as the request, each request has to reconnect to each database.
Micro-services that run in this threading model should delay connecting to databases until they are needed. There is no reason to incur the cost of a database connection if the code path does not require that connection. While database connections cannot be cached across processes, some environments support a cross process opcode cache where you can store your service's configuration data, such as the host IP and credentials for connecting to a database; two popular examples of opcode caches are Zend OpCache and APC.
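Here is a minimal sketch of lazy connection loading in Java, assuming JDBC and hypothetical connection settings; the connection cost is only paid on code paths that actually touch the database:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Sketch of lazily loading a database connection in a
// process that lives only as long as one request.
public class LazyDatabase {
    private Connection connection; // stays null until actually needed

    // These would come from configuration data, possibly stored
    // in an opcode cache; the values here are hypothetical.
    private final String jdbcUrl = "jdbc:postgresql://db.example.com/app";
    private final String user = "service";
    private final String password = "secret";

    Connection get() throws SQLException {
        if (connection == null) {
            // Only pay the connection cost when a code path requires it.
            connection = DriverManager.getConnection(jdbcUrl, user, password);
        }
        return connection;
    }
}
```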
Single Threaded, Reused Multi-Process Threading Model
The next increase in code complexity and efficiency is a threading model that is single threaded and multi-process, where new requests reuse existing worker processes. This differs from the previous threading model, which always created a new process for each request. In this threading model, once a worker process has been provisioned, no new process is created for each request.
The service's code is still relatively simple, but extra orchestration code must be involved to manage the worker process life-cycle. Code must also correctly re-initialize itself with each request. For example, programmers might maintain static variables instead of passing around a lot of extra data as parameters. That makes for simpler code and is fine as long as those static variables are reset with each new request. If the code does not reset these variables, then behavior will be based on previous requests instead of the current one. The last bit of additional code complexity is logic for recovering from stale database connections. A database connection can go stale when the database disconnects it, most likely due to inactivity.
Because each process can service multiple requests, there is no need to reconnect to each database with each request; database connections get reused, which reduces latency by avoiding connection costs. But each process still has to create and manage its own database connections. Because processes cannot share database connections, shared databases maintain more open connections, and excessive open connections can degrade database performance. That is because database connections are stateful, so the database application has to allocate resources in its own process for each connection.
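The following Java sketch illustrates both pieces of extra complexity, with hypothetical connection settings: static state is reset at the top of every request, and a stale connection is detected and re-opened instead of failing the request:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Sketch of a worker process that is reused across many requests.
public class Worker {
    private static String currentUserId;  // static state shared across requests
    private static Connection connection; // connection reused across requests

    static void handleRequest(String userId) throws SQLException {
        // Reset static state for THIS request; forgetting this step
        // would leak the previous request's data into the current one.
        currentUserId = userId;
        Connection db = validConnection();
        // ... service logic using db and currentUserId ...
    }

    // Recover from stale connections instead of failing the request.
    private static Connection validConnection() throws SQLException {
        if (connection == null || !connection.isValid(2 /* second timeout */)) {
            connection = DriverManager.getConnection(
                "jdbc:postgresql://db.example.com/app", "service", "secret"); // hypothetical
        }
        return connection;
    }
}
```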
Multi-Threaded, Single Process Threading Models
There is a way to better protect the databases with a configurable number of connections: connection pooling in the multi-threaded, single long lived process model. Although a database connection cannot be shared across multiple processes, it can be shared across multiple threads in the same process.
Here is an example: if you have 100 single threaded processes on each of 10 servers, then the database will see 100 × 10 = 1,000 connections. If you have 1 process with 100 threads on each of 10 servers, and each process has 10 connections in its connection pool, then the database will see only 10 × 10 = 100 connections, and the service can still achieve high throughput. Cross thread connection pooling is very efficient for both the service and the database.
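Here is a minimal sketch of a cross thread connection pool in Java, assuming JDBC; a production service would normally use an established pool library, but the idea is that a fixed number of connections are shared by a much larger number of request threads:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of a pool of N connections shared by many request threads.
public class ConnectionPool {
    private final BlockingQueue<Connection> pool;

    public ConnectionPool(int size, String url, String user, String pass) throws SQLException {
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(DriverManager.getConnection(url, user, pass));
        }
    }

    // A request thread blocks here if all connections are checked out,
    // which is what caps the database at "size" connections per process.
    public Connection borrow() throws InterruptedException {
        return pool.take();
    }

    public void giveBack(Connection c) {
        pool.add(c);
    }
}
```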
This connection pooling technique achieves high throughput while protecting the databases, but it comes at the cost of extra code complexity. Because threads must share stateful database connections, developers must be able to identify and fix concurrency bugs such as deadlock, livelock, thread starvation, and race conditions. One way to address these bugs is to serialize access, but serializing access too much reduces parallelism. These types of bugs can be difficult for junior developers to identify and correct.
Multi-threaded, single long lived process models come in two flavors: dedicating a thread to each request, or sharing a single thread across all requests. In the former threading model, an extra thread is tied up with each request, which limits the number of requests being processed in parallel. Too many threads can lead to inefficiencies due to excessive task switching in the operating system's CPU scheduler.
In the latter threading model, there is no need for an extra thread per request, but I/O bound tasks must run in a separate thread pool in order to prevent the entire service from hanging on the first slow operation it encounters. If the results must be returned to the caller, then the request handler must wait for the results from the thread pool to finish.
With the no dedicated thread per request approach, expect high throughput and low latency for asynchronous operations, but no real performance gains over the dedicated thread per request approach for synchronous operations.
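As a minimal Java sketch of the latter approach (slowDatabaseCall stands in for a hypothetical blocking query), the request handler never blocks; I/O bound work is pushed onto a worker pool and the response completes asynchronously when the result arrives:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: one event-loop thread handles all requests, offloading
// I/O bound work to a separate pool so one slow query cannot hang the service.
public class EventLoopService {
    private final ExecutorService ioPool = Executors.newFixedThreadPool(10);

    CompletableFuture<String> handleRequest(String query) {
        // Runs the blocking call on the I/O pool; the event-loop thread
        // returns immediately and the caller is completed later.
        return CompletableFuture.supplyAsync(() -> slowDatabaseCall(query), ioPool);
    }

    private String slowDatabaseCall(String query) {
        // placeholder for a blocking database call
        return "result for " + query;
    }
}
```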
Summary
| threading model | efficiency concerns | code complexity issues |
| --- | --- | --- |
| single threaded, single process | The service cannot fully utilize the server's cores. Expect throughput not to increase with additional load and CPU utilization to stay below 10%. | The simplest and easiest to understand approach. |
| single threaded, multi-process, new process for each request | The overhead of process creation and of constantly creating and destroying database connections can raise latency. | Database connections should be lazily loaded. Consider using an opcode cache. |
| single threaded, multi-process, requests reuse worker processes | The databases see more open connections because connections cannot be shared across process boundaries. Excessive open connections can degrade database performance. | Extra code must be present to manage the worker process life-cycle. The code must be able to recover from stale connections. Static variables should be reset with each request. |
| multi-threaded, single long lived process, dedicated thread per request | Cross thread connection pooling is very efficient for both the service and the database, but an extra thread is tied up with each request, which limits the number of requests processed in parallel. | Because threads must share stateful database connections, developers must be able to identify and fix concurrency bugs such as deadlock, livelock, thread starvation, and race conditions. |
| multi-threaded, single long lived process, no dedicated thread per request | Cross thread connection pooling is very efficient for both the service and the database. Expect high throughput for asynchronous operations. | I/O bound tasks must run in a separate thread pool. If results must be returned to the caller, the request handler must wait for the thread pool to finish. |
Conclusion
Before thinking about libraries and languages, software architects should reflect on the choice of threading model most appropriate to their engineering culture and competency. Striking the right balance between code complexity and efficiency will help sort out the confusion and give direction in choosing between the various technology stacks available. Because each micro-service has less scope than a monolithic application, consider leaning a little more towards code complexity in order to achieve higher efficiency.
About the Author
Glenn Engstrand is the Technical Lead for the Architecture Team at Zoosk. His focus is server side application architectures that need to run at B2C web scale with manageable operational and deployment costs. Glenn was a breakout speaker at the 2012 Lucene Revolution conference in Boston. He specializes in breaking monolithic applications up into micro-services and in deep integration with Real-Time Communications infrastructure.