Lyft engineering finished their decomposition of a monolith into a collection of microservices back in 2018. Modular development environments using Docker containers eventually moved to the cloud. Recent articles describe how their development tooling struggled to keep up as time passed and the number of microservices exploded. Development environments had to return to the engineer's machine.
The original plan was to build a Docker-based container orchestration environment that engineers could use for testing. It would take advantage of multi-tenant environments in production, making it cheaper and faster to scale than the previous solution.
Lyft's local development environment was called Devbox – short for "development environment in a box" – and consisted of some tooling managing a local virtual machine and its configuration, including database seeding, package and image download, and installation. Developers needed only to issue one command to build an environment ready to receive requests.
Eventually, the need for sharing these environments with others arose. Devbox took to the cloud, becoming Onebox. Onebox was essentially a Devbox environment running on an EC2 instance. Since it had a bigger capacity and could download images much faster, engineers naturally preferred it to Devbox.
(source: https://eng.lyft.com/scaling-productivity-on-microservices-at-lyft-part-1-a2f5d9a77813)
As time went by and the number of microservices grew, it became increasingly difficult and time-consuming to configure and launch Onebox instances.
Since each service could have interaction trees many levels deep, the environment instance could require too many resources to be practical. Observability tools could not keep up with all running environments, making debugging frustrating. Additionally, the engineer's cognitive load increased significantly because they needed to keep the entire system in mind instead of focusing on their specific component.
According to Lyft engineering, an engineer's code change process can be divided into an "inner dev loop" and an "outer dev loop". The former should give feedback within a few seconds because it involves only changing the code and running some tests. The latter can take much longer since it involves continuous integration and code review.
Because Onebox environments were slow to set up and start, and noticeably unstable, engineers would often rely on the outer dev loop's CI tests to validate each code change iteration.
To get rid of these growing pains and frustrations, the focus shifted to bringing the development environment back to the engineer's laptop while simultaneously rebuilding the inner dev loop.
Having learned that running code inside containers is not a free abstraction, Lyft decided to run service code natively on macOS inside an isolated environment, without containers or VMs.
At Lyft, the majority of backend services are written in Python or Go, and frontend services are written in Node:
- For Python services, isolation is obtained by using immutable virtual environments. A new virtual environment is built every time the requirements.txt file changes
- Go services take advantage of the Go modules toolchain to automatically download and link all dependencies when the commands go run or go test are executed
- A wrapper was built around nodeenv to create the right environment for each Node service based on its metadata
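As an illustration of the first bullet, here is a minimal sketch of how an "immutable" virtual environment could be keyed to the contents of requirements.txt, so that any change to the file yields a fresh environment instead of mutating an existing one. This is not Lyft's tooling: the function names, the cache directory layout and the 16-character hash prefix are all assumptions made for the example.

```python
import hashlib
import venv
from pathlib import Path


def requirements_digest(requirements: Path) -> str:
    """Return a short, stable hash of the requirements file contents."""
    return hashlib.sha256(requirements.read_bytes()).hexdigest()[:16]


def venv_path_for(requirements: Path, cache_root: Path) -> Path:
    """Map a requirements file to an environment directory keyed by its hash."""
    return cache_root / f"venv-{requirements_digest(requirements)}"


def ensure_venv(requirements: Path, cache_root: Path) -> Path:
    """Create the environment only if none exists yet for this exact hash."""
    target = venv_path_for(requirements, cache_root)
    if not target.exists():
        # A changed requirements.txt hashes to a new directory, so an
        # existing environment is never modified in place.
        venv.EnvBuilder(with_pip=True).create(target)
        # Installing the pinned packages into the new environment is
        # deliberately left out of this sketch.
    return target
```

Because the directory name depends only on the file contents, two checkouts with identical requirements would share one environment, and switching between versions of the file never requires uninstalling anything.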
Specialised services like datastores also run locally, often using containers. Datastores are loaded with fresh data at startup using scripts maintained by the team owning the service.
Therefore, locally starting a service is a multi-step process. Manually executing it is tedious and error-prone.
Tilt is used at Lyft to orchestrate the lifecycle of the service and its environment, eliminating the need to run all steps manually. Each service has a Tiltfile describing the steps necessary to have it running locally. A running service also reloads itself when the engineer changes its code in an IDE, further shortening the inner dev loop.
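The idea of declaring startup steps once and letting a tool run them in the right order can be sketched in plain Python. This is not Tilt code and does not reflect the contents of any Lyft Tiltfile; the step names and their dependencies are hypothetical, chosen only to mirror the steps mentioned above (building the environment, starting and seeding a datastore, running the service).

```python
from graphlib import TopologicalSorter

# Hypothetical startup steps, each mapped to the steps it depends on.
STEPS = {
    "build_venv": set(),
    "start_datastore": set(),
    "seed_datastore": {"start_datastore"},
    "run_service": {"build_venv", "seed_datastore"},
}


def startup_order(steps: dict[str, set[str]]) -> list[str]:
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(steps).static_order())


def run_all(steps: dict[str, set[str]], actions: dict) -> list[str]:
    """Run each step's action in dependency order and return that order."""
    order = startup_order(steps)
    for name in order:
        actions[name]()
    return order
```

Declaring dependencies rather than a fixed script means a new step, such as a second datastore, can be added without rewriting the sequence by hand.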
In addition to running the service, it is also necessary to interact with it. Making requests to the service is not trivial due to the different transport formats in use at Lyft, including gRPC, JSON/HTTP and protobuf/HTTP.
Engineers use a tool developed at Lyft to compose and send requests to the local service, taking advantage of autocomplete features made possible by the tool's integration with the service's IDL.
The net result was that "when running a service locally, users send requests directly to their service's API instead of testing using the mobile app talking to public APIs. This increases the developer's familiarity with their service's API and reduces the scope of debugging when there's an error."
Other professionals agree that local development environments are a good thing. For example, James Turnbull, VP of engineering at Sotheby's, described how they make the engineer's work more efficient and are cheaper than cloud-based environments.