Developers attracted to the Azure Service Bus Relay for its ability to expose web services on internal networks to Internet-facing consumers have had, until recently, only one way to build such services: Windows Communication Foundation (WCF). Using a just-released public preview called Azure Relay Hybrid Connections, developers may now use any WebSocket-friendly platform to connect local services to this cloud-based broker.
The Azure Relay works by relaying requests sent to an Azure-provided endpoint to a remote service that’s connected to Azure through a bi-directional socket. Because the relay connection is host-initiated and the traffic flows over traditional web ports, there is typically no need for firewall changes or any updates whatsoever to internal network infrastructure. Until now, this capability was only available to developers using WCF, .NET, and Windows. In a recent blog post, lead architect for Azure messaging Clemens Vasters announced a new open protocol capability called Hybrid Connections that brings Azure Relay to any platform.
The Hybrid Connections evolution of the Relay is completely based on HTTPS and WebSockets, allowing you to securely connect resources and services residing behind a firewall in your on-premises setup with services in the cloud or other assets anywhere.
Being based on the WebSockets protocol and thus providing a secure, bi-directional, and binary communications channel unencumbered by particularities of specific frameworks, allows easy integration with many existing and modern RPC frameworks such as Apache Thrift, Apache Avro, Microsoft Bond, and many others.
Developers using Azure Relay get more than just a pass-through broker, according to updated documentation from Microsoft. Additional capabilities include “TCP-like throttling, endpoint discovery, connectivity status, and overlaid endpoint security.” Microsoft points out that the Azure Relay differs from traditional VPN-based solutions because the Relay can be scoped to a single application endpoint and doesn’t require intrusive network changes.
Users of the WCF-oriented functionality won’t experience any change in support or functionality. Rather, the Azure Relay now supports both the traditional capability, now called WCF Relays, and the new Hybrid Connections. In a comparison table found in the product documentation, Microsoft points out that Hybrid Connections differ from WCF Relays in that they support .NET Core, JavaScript/Node.js, Java, standards-based open protocols, and multiple RPC programming models. The protocol itself is made up of four “listener” interactions (listen, accept, renew, and ping) and a single “sender” interaction (connect). All communication happens using WebSockets over port 443.
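The split between listener and sender interactions can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Microsoft's client code: the `Role` enum, the `INTERACTIONS` table, and the `relay_url` helper are hypothetical names, and the `/$hc/<path>?sb-hc-action=<action>` URL layout is an assumption modeled on Microsoft's published protocol guide that may not match the preview exactly. Authorization tokens are left out entirely.

```python
from enum import Enum
from urllib.parse import quote, urlencode


class Role(Enum):
    LISTENER = "listener"
    SENDER = "sender"


# The five interactions named in the article, keyed by the party that performs them.
INTERACTIONS = {
    "listen": Role.LISTENER,
    "accept": Role.LISTENER,
    "renew": Role.LISTENER,
    "ping": Role.LISTENER,
    "connect": Role.SENDER,
}


def relay_url(namespace: str, path: str, action: str) -> str:
    """Build a WebSocket URL for an interaction that opens a socket.

    ASSUMPTION: the "/$hc/<path>?sb-hc-action=<action>" layout is illustrative
    and may differ from what the preview service expects. A real request would
    also need to carry an authorization token, which this sketch omits.
    """
    if action not in ("listen", "connect"):
        raise ValueError(f"{action!r} does not open a new relay socket in this sketch")
    query = urlencode({"sb-hc-action": action})
    # All traffic goes over WebSockets on port 443, as the article states.
    return f"wss://{namespace}:443/$hc/{quote(path)}?{query}"
```

Calling `relay_url("contoso.servicebus.windows.net", "orders", "listen")` (a made-up namespace and path) produces the address a listener would dial out to, while the same call with `"connect"` produces the address a sender would use. Both are outbound connections on port 443, which is why no inbound firewall rules are typically needed.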
Pricing differs depending on whether you are using WCF Relays or Hybrid Connections. For Hybrid Connections, users pay a “connection charge” per listener plus a charge per GB transferred (beyond the 5 GB included per month). WCF Relay users pay for “relay hours” (the amount of time each relay is open) plus a charge per 10,000 messages. The Azure Relay isn’t designed for a massive number of simultaneous listeners on a given relay and wouldn’t necessarily be a good choice for something like mobile device broadcasting. However, the service has healthy quotas, including 25 concurrent listeners on a relay (with traffic load balancing among listeners), 5 billion messages per month, and 2 million relay hours per month.
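The listener quota and load balancing can be pictured with a toy in-memory model. This is only a sketch: `ToyRelay` and its methods are hypothetical, handlers stand in for real socket connections, and round-robin is an assumed policy, since the article does not say which balancing algorithm the service uses.

```python
from typing import Callable, List

MAX_LISTENERS = 25  # concurrent-listener quota per relay quoted in the article

Handler = Callable[[bytes], bytes]


class ToyRelay:
    """In-memory stand-in for a single relay endpoint (hypothetical)."""

    def __init__(self) -> None:
        self._listeners: List[Handler] = []
        self._next = 0  # index of the listener that serves the next sender

    def listen(self, handler: Handler) -> None:
        """Attach a listener; models a service dialing out to the relay."""
        if len(self._listeners) >= MAX_LISTENERS:
            raise RuntimeError("listener quota exceeded")
        self._listeners.append(handler)

    def connect(self, payload: bytes) -> bytes:
        """Relay a sender's payload to one attached listener.

        ASSUMPTION: listeners are chosen round-robin for illustration only.
        """
        if not self._listeners:
            raise ConnectionError("no listener attached to this relay")
        handler = self._listeners[self._next % len(self._listeners)]
        self._next += 1
        return handler(payload)
```

With two listeners attached, successive `connect` calls alternate between them, and a 26th `listen` call raises an error. That cap is the reason the article steers broadcast-style workloads toward other services.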
As part of this public preview, Microsoft shared a pair of Hybrid Connections samples. The .NET samples use a preview package (Microsoft.Azure.Relay) published to NuGet. The Node.js sample takes advantage of a new npm package (hyco-ws) that simplifies connectivity for JavaScript developers.
To learn more about this release, InfoQ reached out to Clemens Vasters for a brief interview.
InfoQ: The first version of the Service Bus Relay shipped back in 2010. Why did it take so long to remove the dependency on WCF and Windows? Until recently, was there much demand for a cross-platform service?
Vasters: The first version shipped in January 2010 and the Relay had been in incubation for 3 years prior. The original reason for the WCF dependency was that the Relay started as a side-gig of two people in the WCF team as WCF was about to ship. Even though the API and protocol surface area has remained fairly stable over the years, we did a lot of work on it under the covers since the initial release. This service is a lot harder to make work with different customers’ networking circumstances than any outsider might think. The protocol surface area has been so stable that we just recently found a customer who was still using the original 1.0 client assembly that we shipped in 2010. The demand for a cross-platform version of the Relay has been constantly on the rise for years, but it only now got to the point where it secured a spot on our busy backlog. Meanwhile, this same team has created and shipped the Queues/Topics Messaging broker, Event Hubs, Notification Hubs, and led the initial work for IoT Hub. The cloud assets we run now measure load in trillions of message transfer transactions and Petabytes of volume per month. We’ve been busy.
InfoQ: What do you consider the ideal use case(s) for the Azure Relay?
Vasters: The Relay has been and still is an enormous surprise in terms of use-cases that customers realize with it. The Relay provides a solution for a combination of network communication problems that are each difficult to solve by themselves. First, it provides inbound application-layer connectivity to any app or machine that has outbound network connectivity to the Internet (or just to Azure). That means you can host a server that accepts connections nearly anywhere and also in networking environments that you don’t manage. Second, the Relay obscures the location of that server so that it stays private on the Internet; when you shut down the host, there is no IP address or port anyone can usefully remember for later. Third, the Relay provides a stable network-resolvable name for that server without requiring per-endpoint DNS management. Fourth, the Relay allows for programmatic discovery of existing endpoints and their state through ARM. Fifth, the Relay provides automatic load balancing across multiple connected servers. And, finally, the Relay provides an extra authorization boundary for clients and ensures that all communication is TLS protected without the server having to juggle certificates.
What is the ideal use case? Any scenario where you need point-to-point bridging of traffic that runs over a socket. Databases, remote desktops, shells, RPC. Both parties may be behind firewalls if you want to go site-to-site, or just one party might be if you want to go cloud-to-site. There are business printers that print via the Relay. There are point of sale machines that are connected through the Relay. There are on-premises databases and CRM systems that attach via the Relay into the cloud. There’s a lot that people do today and we expect that the new version will dramatically expand that reach.
A newly emerging scenario is that the Relay is a near perfect networking tool for interconnecting container workloads because it provides all those capabilities I just enumerated.
InfoQ: How has the architecture changed over time? While the service itself has maintained a fairly static customer-facing interface, has the plumbing evolved over the years?
Vasters: We don’t talk a lot about the internals of the services because we find that people start drawing conclusions from partial data. But yes, the service has seen very regular updates and at least two substantial rewrites in the first 3 years. What’s easily forgotten is that building hyper-scale cloud platform services is a new art, and building them with an optimal balance of performance, maximum reliability and cost efficiency is very hard. The rewrites we did were invisible because we put extra effort, sometimes a month of a developer’s time, into making sure that the system can smoothly upgrade from version to version under load and while retaining SLA, even if upgrade means that we’re switching to a wholly different set of machines with a different architecture.
The new Relay Hybrid Connection capability is a substantial addition, but builds on a lot of existing and proven parts and is co-deployed with the WCF Relay, meaning that Hybrid Connections is a public preview that is globally deployed everywhere from day one. The internal backplane of the Relay has long been based on our core AMQP stack and Service Fabric with no remaining WCF dependencies. The WCF Relay and Hybrid Connections now coexist on the same foundation.
InfoQ: How should users of distributed messaging solutions like Azure Relay approach things like availability and monitoring of their channels? When do you encourage customers to switch to a queue/topic solution instead?
Vasters: Queues and Topics are a very different set of messaging capabilities compared to the Relay. The Relay is about making connectedness possible and about locating your communication partner. Queues and Topics are about decoupling and not requiring all communicating parties to be simultaneously available or for a publisher to even know who subscribes. Those two services address different ends of the communication spectrum, and we believe that the Relay’s domain is quite exciting with much room for exploration.
For monitoring we have an ever-improving set of tools in the form of the Azure Portal, for which we just added significant improvements for Relay, Queues/Topics, and Event Hubs. All that data is also available for programmatic access. There are also some third-party solutions that can help with monitoring Service Bus, which use those available APIs.
InfoQ: Somewhat surprisingly after all these years, Azure Relay remains a fairly unique offering in the industry. Why do you think that is?
Vasters: There are a few similar solutions out there, but Azure is indeed the only major cloud with such a service. What we presented with Hybrid Connections as a pure WebSocket capability that can easily work for any platform and any device where you have a WebSocket stack is now instantly the largest service of its kind overall, because we have it in all Azure datacenters globally and in all existing Relay clusters which already serve many millions of connections.
The answer about why none of the other big clouds has such a service is difficult for me to give. One real technical reason might be that some of our competitors have made such strong architectural bets on HTTP that running stateful connection services is hard on their infrastructure. CloudFoundry only gained plain TCP routing capabilities this year. We also see the same shining through in IoT offerings and in messaging. We run the largest enterprise-class multiprotocol AMQP/HTTP cloud broker with nothing else coming even close. AWS SQS has only a fraction of the features and only supports HTTP, with much less efficiency on the wire. Messaging middleware at scale with efficient protocols is hard.
InfoQ: Do you have any hypotheses about how this service will be used now that it's opening up to a wider audience? Any integrations with existing products that you'd like to see?
Vasters: We believe that opening up the protocol and making it as simple as we did, with basically just five protocol gestures over WebSockets, will open up a lot of opportunities for customers and the community. The uptake will also directly impact how the pricing model will continue to evolve. That is obviously a factor for adoption and integration. For the WCF Relay, we have customers who maintain just a couple of connections that are enabling capability that would cost them huge sums to build and they’re stunned how little they pay. At the same time, we have customers who want extreme numbers of connections or listeners and they often don’t come talking to us about options. We are anticipating that there will be a dedicated Relay offering similar to what we offer with Premium Messaging and dedicated Event Hubs.
In terms of integration with existing products we’d love to see integration with anything that relies on sockets or streams. We’ll be doing some work ourselves to provide integration with language runtimes and protocol stacks. There’s a long, long backlog of software out there for which integration would be great. Databases, OpenSSH, Remote Desktop, RPC Frameworks, Sensor data Streaming, there are a lot of existing frameworks and products I can think of that would benefit greatly from seamless integration of this technology that makes network boundaries disappear.
As said earlier, we think the Relay can be a huge pain reliever for containerized workloads that often sit behind multiple layers of NATs and where secure network management can rapidly become very complex. That’s an area where we see a lot of potential.