
Two Mistakes You Need to Avoid When Integrating Services

Key takeaways

  • Key factors that affect the reliability of a system
  • The store-and-forward design pattern for preserving the availability of a system
  • The latency impact of synchronous messaging and the reasons to avoid it
  • Asynchronous communication using the Linked Services pattern
  • Different acknowledgment patterns in brokering protocols/APIs (e.g., JMS) and the reasons for using them

 

Introduction

With the emergence of Service Oriented Architecture (SOA), businesses rapidly began moving away from monolithic application designs toward heterogeneous designs, decomposing their business functionality into multiple services [1]. Enterprise architects should be careful when integrating these services, because poorly integrated services result in a spaghetti mess. Enterprises often assume that adopting patterns such as the Enterprise Service Bus (ESB) and Microservices will, on their own, save them from this mess [2] and provide a viable solution. These patterns partially do, but there are hidden challenges that they alone do not address. The danger is that these challenges typically go unnoticed during the initial stages of development and deployment and surface only once the system is live in production. By the time the consequences are realized, it might be too late. This article examines some of these challenges and the measures that can be taken to avoid them.

Service Integration Challenges

When adopting SOA, it’s common to use an ESB as the backbone infrastructure for integrating services [3]. There has recently been a surge of interest in Microservices [4] as an alternative to the ESB, although Anne Thomas (vice president and distinguished analyst at Gartner) disagrees [5]. In my opinion, making services overly fine-grained increases network overhead and makes it harder to debug the system and locate faults. It would therefore be wise to adopt Microservices with caution, unless making services more granular brings substantial benefits to the business. Either way, the intention here is to clarify the scope of the challenges these design patterns address and to reveal some of the hidden challenges they do not.

As illustrated in [1], the use of RESTful APIs keeps growing exponentially. The examples in this article therefore focus on RESTful API integrations built with an ESB and uncover some of the hidden challenges that the ESB alone does not address.

The following is a simple service chaining example built with an ESB. The user calls a proxy service in the ESB. The ESB chains two services, the Order Processing Service (OPS) and the Order Delivery Service (ODS), and returns a response to the user.

Figure 1: Pizza Delivery System

Figure 2: Pizza Delivery Message Flow
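The chaining role of the proxy can be sketched in a few lines. This is a minimal illustration, not how any particular ESB product is configured. Each service is modeled as a plain Python function. `order_processing_service`, `order_delivery_service` and `ProxyService` are hypothetical names standing in for the HTTP calls to OPS and ODS.

```python
from typing import Callable, Dict, List

# A "service" is modeled as a function from a request dict to a response dict.
Service = Callable[[Dict], Dict]


def order_processing_service(request: Dict) -> Dict:
    # Hypothetical OPS stand-in: accepts the order and assigns an id.
    return {**request, "order_id": "ord-1", "status": "PROCESSED"}


def order_delivery_service(request: Dict) -> Dict:
    # Hypothetical ODS stand-in: schedules delivery of a processed order.
    return {**request, "status": "OUT_FOR_DELIVERY"}


class ProxyService:
    """Chains a configured list of services behind a single entry point."""

    def __init__(self, chain: List[Service]):
        self.chain = chain

    def call(self, request: Dict) -> Dict:
        message = request
        for service in self.chain:
            # The response of each service becomes the request to the next.
            message = service(message)
        return message


proxy = ProxyService([order_processing_service, order_delivery_service])
response = proxy.call({"item": "margherita", "address": "1 Main St"})
print(response["status"])  # OUT_FOR_DELIVERY
```

The client only ever calls `proxy.call`. Suppose a token step such as the User Authorization Service (UAS) becomes a precondition for OPS. In this sketch, that means adding one more function to the front of the proxy's chain; the clients do not change.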

Listed below is a comparison of the benefits a user gets from using an ESB to integrate the services, instead of having the services talk to each other directly (point to point).

 

Virtualization/Usability

  • Point to point: Implementation complexity; each client has to implement the service orchestration/chaining logic itself.
  • ESB: The ESB proxy virtualizes the service chaining between OPS and ODS, so the user only needs to call one service in the ESB to get the expected response.

Extensibility

  • Point to point: Suppose OPS is secured and, as a precondition, requires a security token to be attached to each incoming request. To obtain a token, the User Authorization Service (UAS) must be called before OPS. If a million clients call OPS, every one of them has to change its implementation to call UAS before calling OPS.
  • ESB: Clients are not tightly coupled to the service orchestration/chaining logic, so changing the configuration in the ESB is enough to reflect the change.

Fault Tolerance

  • Point to point: When services are tightly coupled, the failure of one service could put the whole system in jeopardy. Handling fault tolerance and load balancing in auto-scaling environments can be complex when services communicate directly.
  • ESB: Service orchestration/chaining is managed centrally in the ESB, so fault tolerance and dynamic discovery of auto-scaled nodes can also be managed centrally. This reduces development complexity for users/clients.

These are some of the major benefits of using an ESB over point-to-point integration. However, the ESB alone does not solve every challenge. The following sections evaluate the challenges that the ESB alone does not address.

Consider the example in Figure 1 and Figure 2, with millions of users consuming the ESB service at a given time. Could OPS or ODS handle these incoming requests at the same rate at which the ESB accepts them from the users?

As discussed earlier, an ESB is commonly seen as the backbone of an enterprise, which means it should be able to handle a high request load. Request load is commonly measured in Transactions Per Second (TPS). The services implemented in the organization (OPS or ODS) might not be designed to handle requests at the same TPS rate as the ESB. So what could go wrong? As explained in [6], a service or system can fail to deliver because faults are present or because it is overloaded. Suppose the ESB routes requests to OPS or ODS at the same TPS rate at which it receives them, and the services cannot withstand that rate. The services will then be overloaded and fail to deliver responses. A failure in a service results in the loss of the request message that the ESB accepted from the user.
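A quick simulation makes the mismatch concrete. This is a deliberately simplified sketch with hypothetical rates (1,000 TPS at the ESB, 200 TPS at OPS). It assumes an overloaded service simply rejects the excess requests; a real service may degrade further and fail for everyone. With nothing buffering messages between the ESB and the service, every rejected request is a lost message.

```python
def forward_directly(arrivals_per_sec: int, capacity_per_sec: int, seconds: int):
    """Simulate an ESB that forwards every request straight to a service.

    Requests beyond the service's per-second capacity are rejected, and
    because nothing buffers them, the accepted request messages are lost.
    """
    delivered = lost = 0
    for _ in range(seconds):
        handled = min(arrivals_per_sec, capacity_per_sec)
        delivered += handled
        lost += arrivals_per_sec - handled
    return delivered, lost


# Hypothetical figures: the ESB accepts 1,000 TPS, OPS can only handle 200 TPS.
delivered, lost = forward_directly(arrivals_per_sec=1000, capacity_per_sec=200, seconds=60)
print(delivered, lost)  # 12000 48000
```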

It is therefore essential to take proactive measures to prevent the services from being overloaded. To ensure that the system does not crash, the communication links should also be made reliable (zero message loss). Once the ESB accepts a request from the user, it should guarantee delivery (reliability) of the message to all the services.

The next section focuses on how the communication links through the ESB can be made reliable. It also covers measures that can be taken to prevent the services (OPS and ODS) from being overloaded.

Reliable Communication

One traditional way of achieving reliable communication was WS-ReliableMessaging. However, since WS-ReliableMessaging is specific to Web Services, this changed with the growth of RESTful APIs [1]. In [7], Marc also argues that reliability should be taken out of the transport layer and incorporated into the business semantics. In my opinion, however, this approach is applicable only if all the services in the system are implemented in-house, which is not the case for most enterprises. One of the main purposes of moving to SOA is to make the system extensible, so that it can reuse and interoperate with services implemented in-house and/or exposed by third-party service/API vendors. An organization cannot make external service vendors adhere to specific business semantics, so reliability should not be tightly coupled to the business application level. It is therefore essential to use a more generic mechanism, independent of business semantics, to achieve reliability.

Message Broker is an intermediary pattern that decouples message senders and receivers. Most ESB vendors support integrating with Message Brokers (MB) via APIs such as JMS. The next sections show how the ESB and MB patterns together can provide a more reliable communication link between the services chained through the ESB (zero message loss). They also describe how the MB can control the TPS rate (throttle) at which messages are routed through the ESB to the services (OPS and ODS), preventing the services from being overloaded.

Request Rate Control

Message Brokers revolve around several messaging concepts, mainly queues (point to point) and topics (pub/sub). These concepts are designed to decouple senders and receivers in time and space [8]. Once the sender inserts a message into a queue, the broker ensures that the message is delivered to its receiver. If the receiver is unavailable when the message is sent, the broker persists the message until the receiver becomes available. Let’s look at how an MB could be introduced alongside the ESB in the pizza delivery example discussed earlier. The intention is to illustrate how the MB can be added without an adverse effect on the services (OPS and ODS), as shown in Figure 3 and Figure 4.
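Decoupling in time can be illustrated with a toy queue. `InMemoryQueue` is a hypothetical stand-in, not a JMS or broker API. Unlike a real broker, it keeps messages only in memory rather than persisting them to disk. What it does show is that the sender succeeds while the receiver is offline, and the receiver later gets the messages in order.

```python
from collections import deque
from typing import Optional


class InMemoryQueue:
    """Hypothetical stand-in for a broker queue (not a real JMS API).

    Messages are held until a receiver asks for them, so the sender does
    not need the receiver to be online at the time it sends.
    """

    def __init__(self, name: str):
        self.name = name
        self._messages: deque = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)  # stored even if nobody is listening

    def receive(self) -> Optional[str]:
        # Oldest message first, or None if the queue is empty.
        return self._messages.popleft() if self._messages else None

    def depth(self) -> int:
        return len(self._messages)


opsq = InMemoryQueue("OPSQ")
# The receiver (OPS) is offline, yet sending still succeeds.
opsq.send("order-1")
opsq.send("order-2")
print(opsq.depth())  # 2

# Later the receiver comes online and drains the queue in order.
received = [opsq.receive(), opsq.receive(), opsq.receive()]
print(received)  # ['order-1', 'order-2', None]
```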

Figure 3: Store and Forward with ESB

Figure 4: Store and Forward Message Flows

As depicted in the message flow diagram in Figure 4:

  1. The client sends the order message to the ESB.
  2. The ESB accepts the incoming HTTP message and republishes it to a queue (OPSQ) in the message broker via a brokering API (JMS).
  3. The ESB consumes the stored message from OPSQ at a controlled rate (e.g., once every 30 seconds). The broker deletes the message from the queue once it is consumed.
  4. The ESB sends the consumed message to OPS, performing a protocol conversion from JMS to HTTP.
  5. Once OPS responds, the ESB publishes the response message to the ‘ODSQ’ queue.
  6. The same procedure described in steps 3 and 4 applies to deliver the message to ODS at a controlled rate.
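The steps above can be sketched as a tick-based simulation. It is a minimal sketch under several assumptions: one "tick" stands for one polling interval (such as the 30 seconds in step 3), `deque` objects stand in for OPSQ and ODSQ, the JMS-to-HTTP conversion is elided, and `ops`/`ods` are hypothetical functions rather than real HTTP calls. Orders that the throttled consumers have not reached yet stay in a queue instead of being lost.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Service = Callable[[Dict], Dict]


def store_and_forward(orders: List[Dict], ops: Service, ods: Service,
                      per_tick: int, ticks: int) -> Tuple[List[Dict], int, int]:
    """Simulate the Figure 4 flow with two queues and throttled consumers.

    One tick stands for one polling interval; at most per_tick messages
    are taken from each queue per tick.
    """
    opsq: deque = deque()
    odsq: deque = deque()
    delivered: List[Dict] = []
    # Steps 1-2: the ESB accepts every order and publishes it to OPSQ.
    opsq.extend(orders)
    for _ in range(ticks):
        # Steps 3-5: consume from OPSQ at the controlled rate, call OPS
        # and publish its response to ODSQ.
        for _ in range(min(per_tick, len(opsq))):
            odsq.append(ops(opsq.popleft()))
        # Step 6: the same throttled procedure delivers messages to ODS.
        for _ in range(min(per_tick, len(odsq))):
            delivered.append(ods(odsq.popleft()))
    # Anything not yet delivered is still waiting in a queue, not lost.
    return delivered, len(opsq), len(odsq)


def ops(order: Dict) -> Dict:
    return {**order, "status": "PROCESSED"}


def ods(order: Dict) -> Dict:
    return {**order, "status": "DELIVERED"}


orders = [{"id": i} for i in range(5)]
delivered, waiting_in_opsq, waiting_in_odsq = store_and_forward(
    orders, ops, ods, per_tick=2, ticks=2)
print(len(delivered), waiting_in_opsq, waiting_in_odsq)  # 4 1 0
```

Running one more tick delivers the fifth order. The throttle only delays messages, which is the latency trade-off discussed next.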

The benefits and drawbacks of using a message broker are as follows:

Benefits

  • The approach can be used with any service implementation, e.g., Web Services or RESTful services.
  • Messages are persisted in the queue or topic if the services cannot withstand the incoming TPS rate. This guarantees delivery of messages and lets the services consume messages at a controlled TPS rate.
  • A message broker can be added without an adverse impact on the service implementations (OPS and ODS). This makes the approach much more agile for integrating with 3rd party APIs while achieving reliability.

Drawbacks

  • Adding a broker means an extra layer the message must pass through. This is apparent when comparing Figures 2 and 4: with the broker added, the message now goes through 2 extra layers (OPSQ and ODSQ).

The more layers (network hops) a message goes through, the more latency is added before the client receives the response [9]. Moreover, if the chained services (OPS and ODS) consume at a lower, throttled TPS rate, the impact on the client’s response latency is even higher.
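A back-of-the-envelope estimate shows why throttling dominates the added latency. This is a rough model, not a measurement. It assumes the consumer takes `batch` messages per polling interval, that the first poll happens one full interval after a message arrives (worst case), and that every network hop costs the same fixed latency. The example numbers are hypothetical.

```python
import math


def worst_case_latency_s(position: int, batch: int, interval_s: float,
                         hops: int, hop_latency_s: float) -> float:
    """Rough upper bound on the time before the client gets a response.

    position: 1-based place of the message in a throttled queue.
    batch: messages the consumer takes per polling interval.
    """
    queue_wait_s = math.ceil(position / batch) * interval_s
    return queue_wait_s + hops * hop_latency_s


# Hypothetical numbers: one message every 30 s, 10th in line, 6 hops of 5 ms.
latency = worst_case_latency_s(position=10, batch=1, interval_s=30,
                               hops=6, hop_latency_s=0.005)
print(f"{latency:.2f} s")  # 300.03 s
```

In this model the network hops add milliseconds while the queue wait adds minutes, so the consumption rate is what matters most for the client.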

The next section shows how an increase in response latency can adversely affect the client.

Adverse Effects of Response Latency

Depending on the Operating System (OS), the client keep-alive timeout can vary between ~20-75 seconds. In the example above, if the sender (client) blocks while waiting for a response from the proxy service (synchronously), the proxy service is expected to deliver a response in less than 20 seconds. Otherwise the sender times out and assumes the transaction has failed, although it may not have. The message may still be processed by all the services (OPS, ODS) even though the sender was not notified in time. Assuming the transaction failed, the sender then resends the same message, which causes inconsistencies such as the same order being processed twice.
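The duplicate-order scenario can be reproduced with two threads and scaled-down timings. This is a minimal sketch: `proxy_service` and `place_order_blocking` are hypothetical names. The client waits 0.1 s while processing takes 0.3 s, standing in for the real keep-alive timeout and a slow, throttled chain. The back end keeps working after the client gives up, so the retry produces a second copy of the same order.

```python
import threading
import time

processed_orders = []      # what the back end actually did
lock = threading.Lock()


def proxy_service(order_id: str, processing_s: float) -> None:
    """Hypothetical slow proxy: it finishes the order even if nobody waits."""
    time.sleep(processing_s)
    with lock:
        processed_orders.append(order_id)


def place_order_blocking(order_id: str, timeout_s: float,
                         processing_s: float) -> threading.Thread:
    """Client that block-waits for the proxy and gives up after timeout_s."""
    worker = threading.Thread(target=proxy_service, args=(order_id, processing_s))
    worker.start()
    worker.join(timeout_s)   # wait at most timeout_s for a "response"
    return worker


# Scaled-down timings: the client waits 0.1 s but processing takes 0.3 s.
attempts = [place_order_blocking("order-42", timeout_s=0.1, processing_s=0.3)]
if attempts[0].is_alive():
    # Timed out: the client assumes failure and resends the same order.
    attempts.append(place_order_blocking("order-42", timeout_s=0.1, processing_s=0.3))
for worker in attempts:
    worker.join()            # let the "failed" requests run to completion
print(processed_orders)  # ['order-42', 'order-42']
```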

Measures should be taken to prevent these inconsistencies. In reality, a system should not assume it can always deliver the response to the caller/sender before the sender times out, because response time can vary. This is especially true when services consume messages at a controlled rate and messages pass through multiple network hops to achieve reliability goals. It is therefore in the sender’s best interest to move from a synchronous (blocking) style of communication to an asynchronous (non-blocking) one.

Asynchronous Communication

Asynchronous communication has pros and cons compared with RPC-based synchronous communication [9]. As elaborated in [9], unlike synchronous communication, asynchronous communication keeps the sender blocked for only a short time.

Take the example in Figure 4. Instead of having the ESB respond to the client after calling each service (OPS and ODS), we could have the ESB respond to the client immediately after publishing the message to the message broker. This approach reduces response latency for the sender. However, a few gray areas need to be addressed:

  1. How will the sender be notified if an error occurs while the services (OPS, ODS) are processing the message, or of updates on the status of the order delivery?
  2. As described in [9], unlike synchronous communication, asynchronous communication does not by default guarantee to the sender that the message was successfully delivered to its destination. How, then, can the sender be assured that the order it placed was processed successfully?

The next sections describe how these areas can be addressed.

Linked Services Pattern

Linked Service [10] is a service design pattern that defines a mechanism for senders/clients to discover related services. These services can be used to identify the status of the request being processed. The pattern specifies including hyperlinks in the response message delivered to the sender, so that the sender can later follow these links to track the status of the request. Along with adopting asynchronous communication, service owners can provide hyperlinks to the service callers (senders) in the response message. The sender can then use these links to identify the status of the order delivery or any error condition that occurred while the message was being processed.
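A response that follows this pattern might look like the body built below. This is only a sketch. The link relations (`status`, `errors`), the URL layout and the base URL `https://esb.example.com` are illustrative assumptions, not part of a specific standard or of the Linked Service pattern description in [10].

```python
import json
import uuid
from typing import Dict


def accepted_response(order: Dict, base_url: str = "https://esb.example.com") -> Dict:
    """Build a hypothetical response body that links to related services.

    The link relations and the URL layout are illustrative assumptions.
    """
    order_id = str(uuid.uuid4())
    return {
        "order_id": order_id,
        "state": "ACCEPTED",
        "order": order,
        "links": [
            {"rel": "status", "href": f"{base_url}/orders/{order_id}/status"},
            {"rel": "errors", "href": f"{base_url}/orders/{order_id}/errors"},
        ],
    }


body = accepted_response({"item": "margherita"})
print(json.dumps(body, indent=2))
```

The sender stores nothing but this body. Following the `status` link later is how it learns what happened to the order.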

Illustrated below is how the sender can communicate with the services asynchronously using the Linked Services pattern.

Figure 5: Asynchronous Messaging

As depicted in Figure 5:

  1. The client sends the order message to the ESB.
  2. The ESB accepts the incoming HTTP message and republishes it to a queue (OPSQ) in the message broker via a brokering API, e.g., JMS.
  3. Once the message is published to the queue, the ESB responds to the client. The response includes a hyperlink that the client can use to track the status of the order.
  4. From then on, the message flow is similar to steps 3-6 described for Figure 4.
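The flow in Figure 5 can be sketched end to end with in-memory stand-ins. Several assumptions apply. A `deque` plays OPSQ and a dict plays whatever store backs the status resource. `ops_consumer_step` represents the ESB consuming from OPSQ some time after the client has already received its reply. The rule that an order without an item fails is a hypothetical OPS behaviour, used only so that both outcomes appear.

```python
import uuid
from collections import deque
from typing import Dict

status_store: Dict[str, str] = {}  # order_id -> latest known state
opsq: deque = deque()              # stands in for the OPSQ broker queue


def esb_accept(order: Dict) -> Dict:
    """Steps 1-3: publish to OPSQ, then reply at once with a status link."""
    order_id = str(uuid.uuid4())
    status_store[order_id] = "ACCEPTED"
    opsq.append((order_id, order))
    return {"order_id": order_id,
            "links": [{"rel": "status", "href": f"/orders/{order_id}/status"}]}


def get_status(order_id: str) -> str:
    """What the hypothetical /orders/{id}/status resource would return."""
    return status_store.get(order_id, "UNKNOWN")


def ops_consumer_step() -> None:
    """Step 4, simplified: later, the ESB consumes from OPSQ and calls OPS."""
    while opsq:
        order_id, order = opsq.popleft()
        # Hypothetical OPS rule: an order without an item is rejected.
        status_store[order_id] = "PROCESSED" if order.get("item") else "FAILED"


good = esb_accept({"item": "margherita"})
bad = esb_accept({})
print(get_status(good["order_id"]))  # ACCEPTED
ops_consumer_step()
print(get_status(good["order_id"]), get_status(bad["order_id"]))  # PROCESSED FAILED
```

The client gets its reply before any service runs. Both the successful order and the failed one are then visible through the same status link.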

Delivery Guarantees and Transactions 

When messages are sent without expecting acknowledgments (the fire-and-forget pattern), there’s a risk of losing messages, because the network or system the message is sent to could be erroneous or unreliable. What happens if the JMS message published by the ESB to the broker (OPSQ) never reaches the queue?

The sender is told that its message was accepted once the message is placed on the queue. It is therefore essential to verify with the MB that the message was actually placed in the queue before sending the acceptance response to the sender. It is also essential to verify that the recipients consuming the message from the queue have successfully processed it before it is deleted from the queue.

The MB sends a publisher acknowledgment to the caller when it successfully accepts a message into the queue. It deletes a message from the queue when the consumer that received it sends an acknowledgment. By default in JMS [11] the acknowledgment mode is AUTO_ACKNOWLEDGE, in which the message is acknowledged automatically as soon as the consumer (ESB) receives it.

The potential risk for the pizza delivery system is that OPS may return an error status, or no response at all, after the ESB has consumed the message from the queue. Because the ESB uses AUTO_ACKNOWLEDGE with OPSQ by default, the message is discarded from the queue immediately after the ESB consumes it. The whole idea is to make sure that OPS and ODS successfully receive the message. Acceptance of the message by the ESB alone does not guarantee delivery to the relevant services (OPS, ODS).
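The failure mode can be shown with a toy queue that mimics AUTO_ACKNOWLEDGE-like behaviour. `AutoAckQueue` and `ops_call` are hypothetical stand-ins, not the JMS API. `receive()` removes the message immediately, as if it were acknowledged the moment the consumer got it. OPS then fails, and the ESB has nothing left in the queue to retry from.

```python
from collections import deque
from typing import Optional


class AutoAckQueue:
    """Hypothetical queue with AUTO_ACKNOWLEDGE-like semantics."""

    def __init__(self):
        self._messages: deque = deque()

    def publish(self, message: str) -> bool:
        self._messages.append(message)
        return True                      # publisher acknowledgment

    def receive(self) -> Optional[str]:
        # Removing on receipt models the automatic acknowledgment.
        return self._messages.popleft() if self._messages else None

    def depth(self) -> int:
        return len(self._messages)


def ops_call(message: str) -> None:
    # Hypothetical OPS outage: every call fails.
    raise ConnectionError(f"OPS unavailable, {message} not processed")


opsq = AutoAckQueue()
opsq.publish("order-7")
msg = opsq.receive()                     # acknowledged and removed already
try:
    ops_call(msg)
except ConnectionError:
    pass                                 # nothing left in OPSQ to replay
print(opsq.depth())  # 0
```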

The JMS CLIENT_ACKNOWLEDGE mode is one option. In this mode the consumer explicitly acknowledges the message instead of having it acknowledged automatically on receipt. If OPS or ODS returns an error, the consumer (ESB) can withhold the acknowledgment and recover the session (or roll back, if it uses a transacted session). The message is then redelivered until it is successfully processed. The MB persists the message until the ESB acknowledges it. More information on acknowledgment modes can be found in [12].
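The same scenario with explicit acknowledgment keeps the message safe. In this minimal sketch, `ClientAckQueue` is a hypothetical in-memory model loosely following CLIENT_ACKNOWLEDGE semantics; it is not the JMS API. A received message stays "in flight" until `ack()`. `recover()` puts unacknowledged messages back at the head of the queue, roughly as JMS `Session.recover()` restarts delivery from the oldest unacknowledged message. `flaky_ops` is a hypothetical OPS that fails twice before succeeding.

```python
from collections import deque
from typing import Dict, Optional, Tuple


class ClientAckQueue:
    """Hypothetical queue with CLIENT_ACKNOWLEDGE-like semantics."""

    def __init__(self):
        self._ready: deque = deque()
        self._in_flight: Dict[int, str] = {}  # delivery tag -> message
        self._next_tag = 0

    def publish(self, message: str) -> bool:
        self._ready.append(message)
        return True

    def receive(self) -> Optional[Tuple[int, str]]:
        if not self._ready:
            return None
        self._next_tag += 1
        self._in_flight[self._next_tag] = self._ready.popleft()
        return self._next_tag, self._in_flight[self._next_tag]

    def ack(self, tag: int) -> None:
        del self._in_flight[tag]          # now the broker may delete it

    def recover(self) -> None:
        # Redeliver unacknowledged messages, oldest first.
        for tag in sorted(self._in_flight, reverse=True):
            self._ready.appendleft(self._in_flight.pop(tag))

    def pending(self) -> int:
        return len(self._ready) + len(self._in_flight)


attempts = {"count": 0}


def flaky_ops(message: str) -> str:
    # Hypothetical OPS that fails twice, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("OPS unavailable")
    return f"processed {message}"


opsq = ClientAckQueue()
opsq.publish("order-7")
result = None
while result is None:
    tag, msg = opsq.receive()
    try:
        result = flaky_ops(msg)
        opsq.ack(tag)           # success: safe to delete from the queue
    except ConnectionError:
        opsq.recover()          # failure: message goes back for redelivery
print(result, attempts["count"], opsq.pending())  # processed order-7 3 0
```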

As depicted in Figure 6:

  1. The client sends the order message to the ESB.
  2. The ESB accepts the incoming HTTP message and republishes it to a queue (OPSQ) in the message broker via a brokering API, e.g., JMS.
  3. The ESB waits for the broker to acknowledge that it has accepted the message into OPSQ (publisher acknowledgment).
  4. Once the message is acknowledged, the ESB sends the confirmation response to the client.
  5. The ESB consumes the message from the queue in CLIENT_ACKNOWLEDGE mode. This ensures that the message is not discarded from OPSQ until the ESB acknowledges or rejects it.
  6. The ESB dispatches the request to OPS. If OPS returns a successful response, the ESB acknowledges the message, telling the broker to delete it from OPSQ. If the status is an error, the ESB does not acknowledge the message and instead asks the broker to redeliver it (recover or roll back). The message is therefore not discarded and remains available to be consumed again (replayed).
  7. The same steps apply when sending a message to ODS by consuming from ODSQ.
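The seven steps above can be combined into one sketch. It is a minimal in-memory model built on several assumptions. `Broker` is a hypothetical stand-in, not a JMS provider; `accept_publishes=False` simulates a publish that the broker never confirms. Services return an HTTP-like `(status, body)` pair. The 202/503 codes the ESB returns to the client are illustrative choices. `ops` fails only on its first call, so the replay path is exercised.

```python
from collections import deque
from typing import Callable, Dict, Optional, Tuple

# A service returns (HTTP-like status code, response body).
Service = Callable[[Dict], Tuple[int, Optional[Dict]]]


class Broker:
    """Hypothetical queue with publisher acks and client-acknowledge consumers."""

    def __init__(self, accept_publishes: bool = True):
        self.accept_publishes = accept_publishes  # False simulates a lost publish
        self.ready: deque = deque()
        self.in_flight: Dict[int, Dict] = {}
        self.next_tag = 0

    def publish(self, message: Dict) -> bool:
        if not self.accept_publishes:
            return False                  # no publisher acknowledgment
        self.ready.append(message)
        return True                       # publisher acknowledgment

    def receive(self) -> Tuple[int, Dict]:
        self.next_tag += 1
        self.in_flight[self.next_tag] = self.ready.popleft()
        return self.next_tag, self.in_flight[self.next_tag]

    def ack(self, tag: int) -> None:
        del self.in_flight[tag]           # broker may now delete the message

    def recover(self) -> None:
        for tag in sorted(self.in_flight, reverse=True):
            self.ready.appendleft(self.in_flight.pop(tag))


def esb_accept(order: Dict, opsq: Broker) -> int:
    """Steps 1-4: confirm to the client only after the publisher ack."""
    return 202 if opsq.publish(order) else 503


def esb_forward(source: Broker, service: Service,
                target: Optional[Broker] = None) -> bool:
    """Steps 5-7: deliver one message; ack on success, recover on error."""
    if not source.ready:
        return False
    tag, message = source.receive()
    status, body = service(message)
    if status >= 400 or (target is not None and not target.publish(body)):
        source.recover()                  # keep the message for replay
        return False
    source.ack(tag)
    return True


calls = []


def ops(order: Dict) -> Tuple[int, Optional[Dict]]:
    # Hypothetical OPS: returns an error on the first call only.
    calls.append(order)
    if len(calls) == 1:
        return 500, None
    return 200, {**order, "status": "PROCESSED"}


def ods(order: Dict) -> Tuple[int, Optional[Dict]]:
    return 200, {**order, "status": "DELIVERED"}


opsq, odsq = Broker(), Broker()
print(esb_accept({"item": "margherita"}, opsq))                          # 202
print(esb_accept({"item": "hawaiian"}, Broker(accept_publishes=False)))  # 503
print(esb_forward(opsq, ops, odsq))  # False: OPS failed, message replayed
print(esb_forward(opsq, ops, odsq))  # True: OPS succeeded, result in ODSQ
print(esb_forward(odsq, ods))        # True: ODS succeeded
print(len(opsq.ready), len(opsq.in_flight), len(odsq.ready))  # 0 0 0
```

The design choice worth noting is where each acknowledgment sits. The client hears "accepted" only after the broker has confirmed the publish. A message leaves OPSQ only after OPS succeeded and ODSQ confirmed the hand-off.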

Conclusion

In conclusion, patterns such as the ESB bring great benefits when integrating heterogeneous services. However, the ESB alone does not address all the integration challenges a system will face. The services orchestrated/chained through the ESB might not be able to handle incoming requests at the same TPS rate at which the ESB accepts them from users. As a result, the services could be overloaded, causing systems to crash and fail to deliver. Message Brokers can control the rate at which services consume messages, ensuring reliability. Using Message Brokers also has no adverse effect on the implementation of the services, which makes them suitable for ensuring reliability for both in-house and 3rd party services/APIs. However, achieving reliability and rate control with Message Brokers has a proportional impact on response latency. Increased response latency can in turn create inconsistencies due to client timeouts. It is therefore beneficial to consider asynchronous communication over synchronous communication to avoid these inconsistencies. Asynchronous communication has lower response latency than synchronous communication. Unlike synchronous communication, however, it does not implicitly give the caller a way to identify the state of the request being processed, nor does it implicitly guarantee delivery of the message. These gaps can be filled by explicitly using service design patterns such as Linked Services and appropriate acknowledgment modes.

About the Author

Pamod Sylvester works at WSO2 Inc., an open source cloud and middleware company, where he is Product Lead of WSO2 Message Broker. He is a former member of the WSO2 ESB team. He specializes in enterprise integration technologies and message brokering protocols, and has been an active contributor to the open source community for several years.
