In a recent blog post, Uber engineer Emily Reinhold described how Uber broke its monolithic API into a modular, flexible microservice architecture. She highlighted a number of design and architectural choices that were central to the migration effort.
The migration to microservices, Reinhold says, aimed to achieve better scalability on three counts: handling growing traffic, adding new features easily, and adopting an architecture that readily adapts to the organization's growth.
Uber engineers made a few general design decisions aimed at reducing coupling among microservices:
- Adopting MVCS, an extension of the well-known Model-View-Controller approach that explicitly includes a service layer hosting the application logic. This allowed Uber to decouple the business logic from the persistence layer, thus making it easier to modify the latter (a minimal sketch follows this list).
- Replacing PostgreSQL with UDR, Uber's globally replicated datastore, to enable serving trips from several datacenters simultaneously.
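To illustrate the MVCS idea, here is a minimal Python sketch; the Trip/TripStore/TripService/TripController names are hypothetical and this is not Uber's actual code. The service layer owns the business logic and talks to persistence only through a narrow interface, so the datastore behind it can be swapped (say, from PostgreSQL to a replicated store) without touching the controller.

```python
# Minimal MVCS sketch (hypothetical names, not Uber's code): the service layer
# holds the application logic and depends only on a narrow persistence
# interface, so the datastore implementation can be replaced independently.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Trip:                      # Model: plain data, no behavior
    trip_id: str
    rider_id: str
    fare: float


class TripStore(Protocol):       # Persistence interface the service depends on
    def save(self, trip: Trip) -> None: ...
    def load(self, trip_id: str) -> Trip: ...


class InMemoryTripStore:         # Could be PostgreSQL- or UDR-backed instead
    def __init__(self) -> None:
        self._trips = {}

    def save(self, trip: Trip) -> None:
        self._trips[trip.trip_id] = trip

    def load(self, trip_id: str) -> Trip:
        return self._trips[trip_id]


class TripService:               # Service layer: business logic lives here
    def __init__(self, store: TripStore) -> None:
        self._store = store

    def complete_trip(self, trip_id: str, surge: float) -> float:
        trip = self._store.load(trip_id)
        trip.fare *= surge       # business rule, kept out of the controller
        self._store.save(trip)
        return trip.fare


class TripController:            # Controller: translates requests into service calls
    def __init__(self, service: TripService) -> None:
        self._service = service

    def post_complete(self, trip_id: str) -> dict:
        fare = self._service.complete_trip(trip_id, surge=1.2)
        return {"trip_id": trip_id, "fare": fare}
```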
Similarly, Uber engineers made important architectural decisions aimed at dealing with the consequences of having a large number of services:
- Asynchronous networking: to handle the growing number of service requests, Uber engineers relied on Tornado, an event-loop-based asynchronous networking library for Python that is claimed to scale to tens of thousands of open connections at once (a minimal sketch follows this list). One of Tornado's advantages was its ability to integrate well with Uber's existing Python networking code, which was structured around a synchronous paradigm.
- Service discovery and resiliency: with a growing number of services, it becomes essential to discover services and identify points of failure. This includes collecting metrics such as failure rates and SLA violations, detecting unhealthy hosts, and applying circuit breaking to prevent cascading failures (a generic circuit-breaker sketch follows this list). Uber addressed these concerns by using TChannel over Hyperbahn, a network multiplexing and framing protocol for RPC that they developed and open-sourced.
- Strict interface definitions: Uber chose Thrift to define service interfaces using an interface definition language (IDL) and to generate client-side source files for a variety of languages. Thrift makes it possible to detect any consumer trying to make a call that does not comply with the interface definition (a hypothetical Thrift example follows this list).
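The following is a minimal Tornado sketch of the asynchronous, non-blocking style referred to above; the endpoint and downstream URL are made up for illustration. The handler awaits a downstream call without blocking the event loop, which is what lets a single process keep many connections open.

```python
# Minimal Tornado sketch of non-blocking request handling (hypothetical
# endpoints, not Uber's code): while this request waits on downstream I/O,
# the event loop keeps serving other open connections.
import tornado.ioloop
import tornado.web
from tornado.httpclient import AsyncHTTPClient


class TripStatusHandler(tornado.web.RequestHandler):
    async def get(self, trip_id):
        # Call a downstream service asynchronously instead of blocking.
        client = AsyncHTTPClient()
        response = await client.fetch(f"http://trips.internal/status/{trip_id}")
        self.write(response.body)


def make_app():
    return tornado.web.Application([
        (r"/trips/([^/]+)/status", TripStatusHandler),
    ])


if __name__ == "__main__":
    make_app().listen(8888)
    tornado.ioloop.IOLoop.current().start()
```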
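Circuit breaking itself is a generic pattern; the sketch below is an illustrative Python version and not TChannel's or Hyperbahn's actual API. After a number of consecutive failures the breaker "opens" and fails fast for a cool-down period, so an unhealthy downstream service is not hammered further and failures do not cascade.

```python
# Generic circuit-breaker sketch (illustrative only, not TChannel/Hyperbahn's
# API): fail fast while the breaker is open, then allow a trial call through
# after a cool-down period.
import time


class CircuitBreaker:
    def __init__(self, max_failures=5, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            # Half-open: let one trial call through after the timeout.
            self.opened_at = None
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result


# Usage: breaker = CircuitBreaker(); breaker.call(some_rpc_function, arg)
```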
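For the strict interface definitions, here is a hypothetical Thrift example; the IDL, service name, and generated module path are invented for illustration. From a definition compiled with `thrift --gen py`, Thrift produces typed client stubs, so a call that does not match the declared signature is rejected at the interface boundary rather than deep inside the callee.

```python
# Hypothetical Thrift client sketch (not Uber's actual IDL or service). Given a
# definition such as:
#
#   service TripService {
#     TripSummary getTrip(1: string tripId)
#   }
#
# `thrift --gen py trip.thrift` generates the typed stubs imported below.
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol

from trip.TripService import Client  # generated module (hypothetical path)

transport = TTransport.TBufferedTransport(TSocket.TSocket("localhost", 9090))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Client(protocol)

transport.open()
summary = client.getTrip("trip-123")  # signature enforced by the generated stub
transport.close()
```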
Finally, Reinhold explains, Uber keeps a healthy production environment by applying three principles:
- Running load tests to identify bottlenecks and breaking points (a minimal sketch follows this list).
- Using containers to make more efficient use of hardware.
- Simulating service disruptions to identify vulnerabilities and ensure the system is able to recover.
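As an illustration of the load-testing principle, the snippet below is a generic sketch, not Uber's internal tooling: it fires batches of concurrent requests at an endpoint and reports the error rate and median latency, the kind of signal used to spot bottlenecks and breaking points before real traffic does.

```python
# Generic load-test sketch (illustrative only, not Uber's tooling): send
# requests in concurrent batches and report error rate and median latency.
import asyncio
import time

from tornado.httpclient import AsyncHTTPClient
from tornado.ioloop import IOLoop


async def load_test(url, total=1000, concurrency=50):
    # force_instance avoids the shared per-loop client so max_clients applies.
    client = AsyncHTTPClient(force_instance=True, max_clients=concurrency)
    latencies = []
    errors = 0

    async def one_request():
        nonlocal errors
        start = time.monotonic()
        try:
            await client.fetch(url)
            latencies.append(time.monotonic() - start)
        except Exception:
            errors += 1

    # Issue requests in batches of `concurrency`.
    for _ in range(total // concurrency):
        await asyncio.gather(*(one_request() for _ in range(concurrency)))

    print(f"errors: {errors}/{total}")
    if latencies:
        print(f"median latency: {sorted(latencies)[len(latencies) // 2]:.3f}s")


if __name__ == "__main__":
    IOLoop.current().run_sync(lambda: load_test("http://localhost:8888/health"))
```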
Emily Reinhold also discussed these topics at the last QCon in New York.