The inaugural EnvoyCon ran in Seattle, USA, alongside the KubeCon and CloudNativeCon events, and explored the past, present and future of the Envoy Proxy. Key takeaways from the first part of the day included: the success of Envoy has been driven by the rapid establishment of a (non-commercial) community and the focus on technical qualities such as performance, extensibility and well-defined management APIs; many engineers and organisations are contributing to the open source Envoy code base, including Lyft, Google and Apple; and Square has deployed more than 2000 Envoy instances in production across 120 services, both on-premises and within the public cloud.
Matt Klein, "plumber and loaf balancer" (software engineer) at Lyft and creator of the Envoy Proxy, opened the day by talking about the creation of Envoy. The Lyft team created the proxy as an internal project in 2015 because none of the proxies available at the time met all of their requirements for backend data store load balancing and service discovery, both of which were needed as Lyft moved from a monolithic application to a service-based system. The project was originally going to be called "Lyft Proxy", until the engineering team's desire for a "cooler" name sent Klein to look up "proxy" in a thesaurus.
Envoy was released as open source in 2016 and, after rapid adoption by the wider community, was accepted into the Cloud Native Computing Foundation (CNCF) as an "incubating project". Along with Kubernetes and Prometheus, Envoy is one of only three projects to have "graduated" from the CNCF, a status that reflects its maturity, breadth of contributors, and wide-scale adoption. Envoy has arguably become the "universal data plane API" for modern service meshes and edge gateways, with projects like Istio, Ambassador and Gloo providing control planes for this data plane proxy. In addition, although the original Envoy project is written in C++, the Ant Financial team has created SOFAMesh, a Go-based implementation of the Envoy APIs.
Klein shared his view that the success of Envoy is largely "down to the community", through its uptake, contributions and support. He also proposed several other reasons for the rapid industry uptake of Envoy: there is no "premium" (commercial) version; decisions within the project are made on a "technology first" basis; and an ecosystem has been built that allows "differentiated success on top" (for example, in the control plane implementations). From a technical perspective, he believes that Envoy provides performance, reliability, a modern codebase, extensibility, best-in-class observability, and a well-defined configuration API.
Harvey Tuch, staff software engineer at Google, was next on stage and provided an overview of contributions to Envoy. The largest contributors in terms of code and pull requests include Lyft, Google, Turbine Labs, Tetrate, Alibaba, IBM, Solo, Datawire, and Apple. Over the past two years, the Envoy codebase has grown from 20k lines of code to more than 100k, and in 2018 it was divided into "core" and "extensions" to make it more manageable.
Isaac Diamond, software engineer at Stripe, provided an overview of Envoy's "xDS" management APIs, which can be implemented by a management server to configure Envoy dynamically. These APIs include, for example, the cluster discovery service (CDS), route discovery service (RDS), endpoint discovery service (EDS), and listener discovery service (LDS). Each of these APIs is eventually consistent, and they are independent of one another. However, many higher-level operations, such as performing an A/B deployment of a service, require updates to be applied in a specific order to prevent traffic from being dropped, and so the Aggregated Discovery Service (ADS) API is often used. ADS allows the resources of all the other APIs to be delivered over a single bidirectional gRPC stream from a single management server, which enables deterministic sequencing of updates.
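To illustrate why ordering matters, the following Python sketch models the "make before break" sequence that the xDS protocol documentation recommends when a single management server pushes updates over ADS: clusters (CDS) first, then their endpoints (EDS), then listeners (LDS), then routes (RDS), with stale clusters removed only at the end. The `Update` type, the `sequence` function and the `CDS_REMOVE` label are hypothetical simplifications for illustration, not part of any Envoy or control plane library; a real server would send typed protobuf resources over the gRPC stream.

```python
from dataclasses import dataclass

# Rank of each (simplified) update kind, following the "make before break"
# ordering suggested for ADS: add clusters, then endpoints, then listeners,
# then routes; only remove clusters once nothing routes to them any more.
# "CDS_REMOVE" is a hypothetical label for dropping a stale cluster.
ORDER = {"CDS": 0, "EDS": 1, "LDS": 2, "RDS": 3, "CDS_REMOVE": 4}


@dataclass(frozen=True)
class Update:
    kind: str  # one of the keys in ORDER
    name: str  # resource name, e.g. a cluster or route configuration


def sequence(updates):
    """Return pending updates in a safe delivery order for a single ADS stream.

    sorted() is stable, so updates of the same kind keep their original order.
    """
    return sorted(updates, key=lambda u: ORDER[u.kind])


if __name__ == "__main__":
    # Hypothetical rollout: shift traffic from cluster "svc-v1" to "svc-v2".
    pending = [
        Update("RDS", "ingress-routes"),  # route now points at svc-v2
        Update("CDS_REMOVE", "svc-v1"),
        Update("EDS", "svc-v2"),
        Update("CDS", "svc-v2"),
    ]
    for update in sequence(pending):
        print(update.kind, update.name)
    # CDS svc-v2
    # EDS svc-v2
    # RDS ingress-routes
    # CDS_REMOVE svc-v1
```

If the route update reached Envoy before the new cluster existed, requests matching that route could fail; with independent xDS streams from separate servers there is no such ordering guarantee, which is the gap ADS is intended to close.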
Next, Michael Puncel and Snow Pettersen, software engineers at Square, provided a tour of Envoy usage at Square. Envoy has been a core part of the bespoke service mesh that Square is deploying as its engineering team migrates from a monolithic application to a service-based system. The motivations for the migration include simplifying the client/server libraries, distributing load-balancing responsibilities, and abstracting the infrastructure away from application code. Square run their applications primarily on bare metal, with some workloads running on public cloud. They have a "Kubernetes-like" deployment system that orchestrates applications running on multi-tenanted hosts with no network namespacing, but with mutual TLS (mTLS) enforced between all applications.
The Envoy control plane implemented by the Square team was built upon the open source java-control-plane (rather than the more commonly used go-control-plane), and makes extensive use of custom caching for identity and access management (IAM) and service discovery.
Puncel and Pettersen provided a detailed overview of how their team migrated from a virtual IP (VIP)-powered service discovery and routing solution to an Envoy-powered service mesh. They have now reached feature parity with the legacy system, and 2000 Envoy instances are deployed across 120 services, both on-premises and within the public cloud. They cautioned that although they had put a lot of thought and resources into engineering the control plane that interacts with Envoy, the "hardest part of rolling out a service mesh is the migration", which involves communicating the benefits to engineers, providing training, and coordinating the deployment of the mesh.
Future plans include finishing the migration to Envoy, migrating to Kubernetes for service orchestration, adding SPIFFE integration for security management, and replacing their current edge/forward proxy.
Andy Shi, developer advocate at Alibaba Group, was next on stage and discussed how Alibaba is using Envoy to autoscale Java-based RPC microservices. He introduced Apache Dubbo, a Java RPC framework open-sourced by Alibaba that is very popular in China, and RSocket, an application protocol providing reactive stream semantics, and talked about work integrating these technologies with Envoy and the Kubernetes Horizontal Pod Autoscaler. Several Envoy filters were created that allow custom metrics from both Dubbo and RSocket to be captured and sent to Prometheus; these metrics are in turn used to drive autoscaling decisions based on user traffic patterns and service interactions. Work is ongoing in this space, and Shi encouraged developers to prioritise observability in order to support operational understanding and the automation of tasks such as autoscaling.
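As a rough illustration of how a custom metric, such as a per-pod Dubbo request rate, could drive scaling, the Kubernetes documentation describes the Horizontal Pod Autoscaler's core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipping any change when the ratio is within a tolerance (0.1 by default) of 1.0. The Python function below is a minimal sketch of that formula only; the function name, the default min/max bounds, the `dubbo_requests_per_second` metric name and the example values are assumptions for illustration, and the real controller applies further rules (such as stabilisation windows) that are not modelled here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     tolerance=0.1, min_replicas=1, max_replicas=100):
    """Sketch of the HPA scaling rule for a single per-pod averaged metric.

    current_metric: observed average per pod (e.g. a hypothetical
    dubbo_requests_per_second metric scraped by Prometheus).
    target_metric: the per-pod target configured on the autoscaler.
    """
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (the defaults here are illustrative only).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(4, 150, 100))  # 6: load is 50% above target
    print(desired_replicas(4, 105, 100))  # 4: within the 10% tolerance
    print(desired_replicas(4, 50, 100))   # 2: load is half the target
```

The tolerance band keeps small fluctuations in a traffic-derived metric from causing constant scale-up and scale-down churn.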
Slide decks of the talks can be found on the EnvoyCon Sched page, and recordings of many of the presentations can be found on the CNCF YouTube channel.