
How to Minimize Latency and Cost in Distributed Systems

Key Takeaways

  • Distributed systems spanning multiple availability zones can incur significant data transfer costs and performance bottlenecks.
  • Organizations can reduce costs and latencies by applying zone aware routing techniques without sacrificing reliability and high availability.
  • Zone Aware Routing is a strategy designed to optimize network costs and latency by directing traffic to services within the same availability zone whenever possible.
  • Implementing zone aware routing end-to-end requires various tools, such as Istio, as well as distributed databases that support this capability.
  • Be aware of the issues that arise when your services aren’t evenly distributed: handle cluster hotspots and be able to scale a service within the scope of a specific zone.

The microservices architectural approach has become a core factor in building successful products, made possible by the adoption of advanced cloud technologies such as service mesh, containers, and serverless computing. The need to grow rapidly while preserving maintainability, resilience, and high availability has made it standard for teams to build "deep" distributed systems: systems with many layers of microservices, spanning multiple Availability Zones (AZs) and even regions. These systems are often characterized by dozens of chatty microservices that must frequently communicate with each other over the network.

Distributing our resources across different physical locations (regions and AZs) is crucial for achieving resilience and high availability. However, significant data transfer costs and performance bottlenecks can be incurred in some deployments. This article aims to create awareness of those potential issues and provide guidelines on maintaining resilience while overcoming these challenges.

The Problem

Using multiple AZs is a best practice and an essential factor in maintaining service availability. Whether those are our microservices, load balancers, DBs, or message queues, we must provision them across multiple AZs to ensure our applications are geographically distributed and can withstand all sorts of outages.

In some critical systems, it is also common to distribute the resources and data across multiple regions and even multiple cloud providers for the sake of fault tolerance.

Distributing our services across multiple AZs requires us to transfer data between those AZs. Whenever two microservices communicate with each other or whenever a service reads data from a distributed DB, the communication will likely cross the AZ boundary. This is because of the nature of load balancing.

Usually, load balancers distribute the traffic evenly between instances of the upstream service (the service being called) without awareness of the AZ in which the origin service is located. So, in reality, whenever systems are provisioned over multiple AZs, cross-AZ traffic is likely to happen a lot.

But why is it a cloud cost burden and even a performance bottleneck? Let’s break it down.

Figure 1: Illustration of a request flow that spans across multiple availability zones

The Cost Burden

Every time data is transferred between two AZs, the major cloud providers typically apply data transfer charges in both directions: once for the outgoing (egress) and once for the incoming (ingress) cross-AZ transfer. For example, at $0.01/GB in each direction, a single terabyte transferred between two same-region availability zones will cost you $10 for the egress and $10 for the ingress, a total of $20. These costs can quickly get out of control in distributed systems. Imagine that as part of a request, your Load Balancer instance on AZ-1 may reach out to Service A in AZ-2, which in turn calls Service B on AZ-3, which in turn reads a key/value record from a DB on AZ-2, and often consumes updates from a Kafka broker on AZ-1 (see figure 1).
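To make the arithmetic concrete, here is a minimal sketch of the double-sided billing described above (the $0.01/GB rate is the illustrative figure from the example, not a quote of any provider’s current pricing):

```python
def cross_az_cost(gb_transferred, rate_per_gb_each_direction=0.01):
    """Cross-AZ transfer is billed on both sides: egress from the
    source AZ plus ingress into the destination AZ."""
    egress = gb_transferred * rate_per_gb_each_direction
    ingress = gb_transferred * rate_per_gb_each_direction
    return egress + ingress

# A single terabyte (1000 GB) crossing an AZ boundary:
print(cross_az_cost(1000))  # 20.0
```

Multiply that by every cross-AZ hop in the request flow above, times the request volume, and the monthly bill adds up quickly.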

So, while this architectural choice prioritizes availability, it is prone to extreme situations where all the communication between services on a single request crosses AZ boundaries. In systems with various DBs, message queues, and many microservices that talk to each other, data must move around frequently, and that movement can become extremely expensive, a burden that tends to grow every time you add services to your stack.

The Performance Bottleneck

AZs in the same region are physically separated from one another by a meaningful distance, although they all sit within a radius of several miles. This generally produces a minimum of single-digit-millisecond roundtrip latency between AZs, and sometimes more. Back to our example, where a request to your Load Balancer instance on AZ-1 may reach out to Service A in AZ-2, which in turn calls Service B on AZ-3, which in turn reads a key/value record from a DB on AZ-2, and often consumes updates from a Kafka broker on AZ-1: that way, we could easily add over a dozen milliseconds to every request, precious time that can drop to sub-millisecond when services communicate within the same AZ. And as with the cost burden, the more services you add to your stack, the more this burden grows.

So, how can we gain cross-AZ reliability without sacrificing performance?
And is there anything to do about those extra data transfer costs?
Let’s dive into those questions.

Zone Aware Routing to the Rescue

Zone Aware Routing (aka Topology Aware Routing) is the way to address these issues. At Aura from Unity, we have been gradually rolling out this capability for the past year. We saw it cut cross-AZ traffic by 60% in some systems, learned how to leverage it to optimize performance and significantly reduce bandwidth costs, and also uncovered where it didn’t fit and should be avoided. So, let’s describe Zone Aware Routing and what needs to be considered to utilize it well.

What Is Zone Aware Routing?

Zone Aware Routing is a strategy designed to optimize network costs and latency by directing traffic to services within the same availability zone whenever possible. It minimizes the need for cross-zone data transfer, reducing associated costs and latencies.

Zone Aware Routing aims to send all or most of the traffic from an originating service to the upstream service over the local zone. Ideally, routing over the local zone or performing cross-zone routing should depend on the percentage of healthy instances of the upstream service in the local zone of the origin service. In addition, in case of a failure or malfunction with the upstream service in the current AZ, a Zone Aware Routing component should be able to automatically reroute requests across AZ to maintain high availability and fault tolerance.

At its core, a Zone Aware Router should act as an intelligent load balancer. While trying to maintain awareness of zone locality, it is also responsible for balancing the same number of requests per second across all upstream instances of a service. Its purpose is to avoid cases where specific instances are bombarded with traffic. Such a traffic skew can cause localized overload or underutilization, which can cause inefficiency and potentially incur additional costs.
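The routing decision described above can be sketched as a simplified, hypothetical routine (real implementations, such as Envoy’s locality-weighted load balancing used by Istio, are considerably more nuanced; the 0.7 health threshold below is an illustrative knob, not a standard default):

```python
import random

def pick_upstream(origin_zone, instances, min_healthy_ratio=0.7):
    """Prefer instances in the caller's zone while enough of that
    zone is healthy; otherwise fail over across zones.

    `instances` is a list of (zone, healthy) tuples."""
    local = [(z, h) for z, h in instances if z == origin_zone]
    local_healthy = [i for i in local if i[1]]
    all_healthy = [(z, h) for z, h in instances if h]
    if not all_healthy:
        raise RuntimeError("no healthy upstream instances")
    if local and len(local_healthy) / len(local) >= min_healthy_ratio:
        return random.choice(local_healthy)  # stay in-zone
    return random.choice(all_healthy)        # cross-zone failover
```

Note that this naive version happily keeps all traffic local; a production-grade router would also weigh per-zone capacity to avoid the overload scenarios discussed next.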

Due to that nature, Zone Aware Routing can be very beneficial in environments with evenly distributed zonal traffic and resources (for example, when you distribute your traffic 50/50 over two AZs with the same amount of resources). However, for uneven zonal distributions, it could become less effective due to the need to balance the traffic across multiple zones and overcome hotspots (bombarded and overloaded instances—see figure 2).

Figure 2: Illustration of uneven zonal distribution. Two services across two zones. Will Service B hold up when it serves all requests coming from AZ-2?

How Can I Adopt Zone Aware Routing?

The concept of Zone Aware Routing has various implementations. So, adopting it depends on finding the implementations that fit your setup and tech stack. Let’s review some of the options:

Istio Locality Load Balancing

Istio is an open-source service mesh platform for Kubernetes. Istio probably has the most promising Zone Aware Routing implementation out there: Locality Load Balancing. This feature routes requests efficiently to the closest available instance of a service, based on the locality of both the originating service and the upstream service instances. When Istio is installed and configured in a Kubernetes cluster, locality can be defined based on geographical region, availability zone, and sub-zone.

Istio’s Locality Load Balancing enables requests to be preferentially routed to an upstream service instance in the same locality (i.e., zone) as the originating service. If no healthy instances of a service are available in the same locality as the origin, Istio can failover and reroute the requests to instances in the nearest available locality (e.g., another zone and even another region). This ensures high availability and fault tolerance.

In addition, Istio allows you to define weights for different localities, enabling you to control the distribution of traffic across AZs based on factors such as capacity or priority. This makes it easier to overcome uneven zonal distributions by letting you adjust the amount of traffic you keep localized in each zone vs. the amount of traffic you send across zones.
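As a sketch of what this looks like in practice, the hypothetical DestinationRule below keeps 80% of traffic originating in zone us-east-1a local and sends 20% to us-east-1b. Note that Istio’s locality load balancing only takes effect when outlier detection is also configured; the service name, zones, weights, and thresholds here are all illustrative:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: service-b
spec:
  host: service-b.prod.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        distribute:
          - from: "us-east-1/us-east-1a/*"   # region/zone/sub-zone
            to:
              "us-east-1/us-east-1a/*": 80
              "us-east-1/us-east-1b/*": 20
    outlierDetection:          # required for locality LB to activate
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```

Omitting the `distribute` block and relying on the default failover behavior gives you the simpler "prefer local, fail over when unhealthy" mode described earlier.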

Topology Aware Routing (AKA Topology Aware Hints)

If you work with Kubernetes without Istio, Topology Aware Routing is a native Kubernetes feature (introduced as Topology Aware Hints in version 1.21 and renamed in 1.27). While somewhat simpler than what Istio offers, it allows Kubernetes to make intelligent routing decisions based on the cluster's topology: the control plane attaches zone hints to a service’s EndpointSlices, and kube-proxy uses those hints to keep traffic within the zone it originated from.

Kubernetes also provides some essential safeguards around this feature. The Kubernetes control plane will apply those safeguard rules before using the feature. Kubernetes won't localize the request if these rules don’t check out. Instead, it will select an instance regardless of the zone or region to maintain high availability and avoid overloading a specific zone or instance. The rules can evaluate whether every zone has available instances of a service or if it is indeed possible to achieve balanced allocation across zones, preventing the creation of uneven traffic distribution on the cluster.
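Enabling the feature is a per-Service annotation. The sketch below assumes Kubernetes 1.27+ (`service.kubernetes.io/topology-mode`); older versions use the `service.kubernetes.io/topology-aware-hints: auto` annotation instead. The service name and ports are hypothetical:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: service-b
  annotations:
    # Ask the control plane to add zone hints to this service's
    # EndpointSlices when the safeguard rules allow it.
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: service-b
  ports:
    - port: 80
      targetPort: 8080
```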

While Topology Aware Routing enables optimizing cross-zone traffic and handling the hidden burden of performance and costs of working with multiple AZs, it is a little less capable than Istio’s Locality Load Balancing. The main drawback is that it doesn’t handle failovers like Istio. Instead, its safeguards approach shuts down the zone locality entirely for the service, which is a more rigid design choice requiring services to spread evenly across zones to benefit from those rules.

End-to-End Zone Aware Routing

Using Istio’s Locality Load Balancing or Kubernetes Topology Aware Routing is a significant step forward in fighting our distributed systems' data transfer costs and performance bottlenecks. However, to complete the adoption of Zone Aware Routing end-to-end and bring cross-AZ data transfer to a minimum, we also have to figure out whether our services can read their data from DBs and Message Queues (MQs) over the local zone (see figure 3).

Figure 3: Illustration of a request flow localized on a single AZ end-to-end

DBs and MQs are typically distributed across AZs to achieve high availability, maintaining copies of each piece of data in every AZ. This is why DBs and MQs are prone to being accessed cross zones, and as such, exposing the system to the performance and cost burden. So, can we optimize our DB reads latencies and reduce data transfer costs by accessing their data over the local zone without compromising resilience?

Here are some examples of DBs and MQs that can support Zone Aware Routing:

Kafka’s Follower Fetching feature

Apache Kafka is an open-source distributed event streaming platform for high-performance data pipelines, streaming analytics, and data integration. Like other MQs, it is typically used to connect between microservices and move data across distributed systems. 
Follower Fetching is a feature of the Kafka client library that allows consumers to prefer reading data over the local availability zone, whether from the leader or replica node. This capability is based on Kafka’s Rack Awareness feature, which is designed to enhance data reliability and availability by leveraging the physical or logical topology of the data center.

Kafka’s Rack Awareness requires each broker within the cluster to be assigned its rack information (e.g., its availability zone ID), which makes it possible to ensure that replicas of a topic’s partition are distributed across different AZs, minimizing the risk of data loss or service disruption in case of an AZ or node failure. While rack awareness doesn’t directly control the locality of client reads, it does let each Kafka broker advertise its location via the rack tag, so that client libraries can attempt to read from replicas located in the same rack or zone, or fall back to another zone if the local broker isn’t available. Thus, it follows the concept of Zone Aware Routing, optimizing data transfer costs and latency. To utilize follower fetching, we must assign the consumer a rack tag indicating which AZ it is currently running in, so the client library can use it to locate a broker in that same AZ.
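Concretely, follower fetching (introduced in KIP-392) involves one broker-side and one consumer-side setting; the rack values below are illustrative AZ IDs:

```properties
# Broker configuration (set per broker, one value per AZ)
broker.rack=use1-az1
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

# Consumer configuration (set to the AZ the consumer runs in)
client.rack=use1-az1
```

With a matching `client.rack` and `broker.rack`, the broker directs the consumer to fetch from the replica in its own zone rather than always from the partition leader.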

Notice that when using Kafka’s follower fetching feature, if a local broker’s replication lags behind the leader, the consumer’s waiting time increases within that local zone instead of impacting random consumers across zones, thus limiting the blast radius of some issues to the local zone. Also, like other Zone Awareness implementations, it is very sensitive to uneven zonal distributions of consumers, which may cause certain brokers to become overloaded and require the consumers to handle failover.

Redis and other Zone Aware DBs

Redis is a popular key-value DB used in distributed systems. It is one of those DBs used in high-throughput, large-scale apps for caching and other sub-millisecond queries, and it would typically be distributed across multiple AZs for redundancy. As such, any geographically distributed application that reads from Redis would gain a great performance and cost benefit from reading over the local zone.

Redis does not have built-in support for automatically routing read requests to the local replica based on the client’s availability zone. However, certain Redis client libraries provide this capability. For example, when using Lettuce (a Java lib), configuring the client’s "ReadFrom" setting to LOWEST_LATENCY makes it read from the lowest-latency copy of the data, whether it resides on a replica or the master node. This copy will usually be in the local zone, therefore reducing cross-zone data transfer.

If you use a client library that does not support selecting the nearest replica, you can implement custom logic that achieves the same result by retrieving the data from the local Redis endpoint, preferably falling back to a cross-AZ endpoint when needed.
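That local-first custom logic can be sketched roughly as follows. The endpoint callables are stand-ins for real client connections; a production version would also need health checking, timeouts, and retry budgets:

```python
def read_with_zone_preference(key, local_endpoint, fallback_endpoints):
    """Try the replica in our own AZ first; on failure, fall back to
    cross-AZ endpoints rather than failing the read outright."""
    try:
        return local_endpoint(key)
    except Exception:
        pass  # local replica down or unreachable; pay the cross-AZ cost
    last_error = None
    for endpoint in fallback_endpoints:
        try:
            return endpoint(key)
        except Exception as e:
            last_error = e
    raise RuntimeError("all replicas failed") from last_error
```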

There are more popular database technologies that support Zone Aware Routing; here are two that are worth mentioning:

Aerospike—A distributed key-value database designed for high-performance data operations. It has a rack awareness feature which, like Kafka’s, provides a mechanism for database clients to prefer reading from the closest rack or zone.

Vitess—A distributed, scalable version of MySQL originally developed by YouTube. Its Local Topology Service allows it to route queries over the local zone, enabling Zone Aware Routing.

While the concept of Zone Aware Routing is named a little differently in each piece of technology, all of these implementations share the same goal: to improve read latencies and data transfer costs without sacrificing service reliability.

Another cost burden of distributing our DBs and MQs across multiple availability zones is that every piece of data we write needs to be replicated within the DB cluster itself to every zone. That replication, by itself, can generate a significant amount of data transfer cost. While there is little to do about it technically, it is essential to note that managed DB services such as AWS MSK (Amazon’s Managed Streaming for Apache Kafka), AWS RDS (Amazon Relational Database Service), ElastiCache (Amazon’s managed Redis), and others usually don’t charge their users for cross-AZ data transfer within the cluster, only for data getting in and out of the cluster. So, choosing a managed service can become a significant cost consideration if you plan to replicate large amounts of data.

Handling Hotspots and Uneven Service Distribution

While it is highly suggested that we maintain even zonal distributions of our services, that is not always trivial or even possible. Handling hotspots is a complex topic, but one way to address it is to consider a separate deployment and a separate auto-scaling policy for each zone. Allowing a service to scale independently based on the traffic within a specific zone ensures that if the load is concentrated in a specific zone, that zone will be adequately scaled to handle the load with dedicated resources, eliminating the need to fall back to other zones.
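One way to sketch a per-zone deployment in Kubernetes is a Deployment pinned to a zone with a node selector, paired with its own HorizontalPodAutoscaler, repeated per zone. All names, zones, images, and thresholds below are illustrative:

```yaml
# One Deployment + HPA per zone; shown here for us-east-1a only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-b-us-east-1a
spec:
  replicas: 2
  selector:
    matchLabels: {app: service-b, zone: us-east-1a}
  template:
    metadata:
      labels: {app: service-b, zone: us-east-1a}
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: us-east-1a   # pin pods to the zone
      containers:
        - name: service-b
          image: example.com/service-b:latest
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: service-b-us-east-1a
spec:
  scaleTargetRef: {apiVersion: apps/v1, kind: Deployment, name: service-b-us-east-1a}
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: {type: Utilization, averageUtilization: 70}
```

Because each zone scales on its own CPU signal, a traffic skew toward one zone grows only that zone’s capacity instead of forcing cross-zone spillover.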

Conclusions

Fighting data transfer costs and performance in highly available distributed systems is a real challenge. To handle it, we first need to figure out which microservices are prone to frequent cross-AZ data transfer. We need to be data-driven about it, measure cross-AZ costs, and trace latencies of microservices communication to ensure we put our efforts in the right direction.

Then, we must adopt the proper tooling to eliminate or minimize those cross-AZ calls. Those tools and optimizations depend on our technology stack. To adopt Zone Aware Routing end-to-end, we must adopt different tools for different use cases. Some tools, like Istio and Topology Aware Routing rules, are meant to unlock zone awareness for Kubernetes pods communicating with each other. However, they won’t apply to cases where your microservices consume data across AZs from databases or message queues. So, we must choose DBs and MQs with zone-aware routing baked into their client libs.

Finally, we must ensure we do not break our system’s high availability and resilience by deploying those tools and optimizations in an uneven zonal distribution setup that causes hotspots (instances bombarded with traffic). Such deployment may defeat the purpose, causing us to fix our cost and performance problem by creating a new one.

Overall, optimizing cross-AZ traffic is crucial; it can affect our applications' performance, latency, and cost, but it has to be handled carefully to ensure that we do not sacrifice our app’s reliability and high availability.
