In a blog post, three Netflix employees (Ed Bukoski, Brian Moyles, and Mike McGarr) explain how Netflix continuously delivers the code that serves TV shows and movies to more than 75 million viewers.
The Immutable Server pattern is the basis for Netflix deployment: rather than updating running servers in place, each deployment creates a brand new Amazon Machine Image (AMI).
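A minimal sketch of the Immutable Server idea, assuming a toy model in which a counter stands in for the AMI IDs AWS would assign. The names `MachineImage`, `ServerGroup`, `bake`, and `deploy` are hypothetical. Frozen dataclasses make the point concrete: a new code version always yields a new image, and an existing image cannot be patched.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical stand-in for the AMI IDs that AWS would assign.
_next_id = count(1)

@dataclass(frozen=True)
class MachineImage:
    ami_id: str
    app_version: str

@dataclass(frozen=True)
class ServerGroup:
    image: MachineImage
    size: int  # number of instances launched from the image

def bake(app_version: str) -> MachineImage:
    """Each call produces a brand-new image; existing images are never modified."""
    return MachineImage(ami_id=f"ami-{next(_next_id):04d}", app_version=app_version)

def deploy(app_version: str, size: int) -> ServerGroup:
    # A code change means a new image and new instances, not an in-place patch.
    return ServerGroup(image=bake(app_version), size=size)
```

Deploying `"1.0.1"` after `"1.0.0"` produces two distinct images; assigning to `image.app_version` raises `dataclasses.FrozenInstanceError` instead of mutating the old one.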
Netflix’s microservice architecture keeps teams loosely coupled, so each team pushes changes at whatever pace it is comfortable with.
Netflix does not require teams to use a particular set of tools, but each team is responsible for maintaining the tools it chooses to adopt. Centralized teams at Netflix offer tools as part of a “paved road” that reduces the cognitive load for the majority of Netflix engineers.
The “paved road” code delivery process consists of several steps. Code is built and tested locally using Nebula, and changes are committed to a central Git repository. A Jenkins job then builds, tests, and packages the application for deployment. Finally, Spinnaker, Netflix’s global continuous delivery platform, bakes these packages into AMIs and deploys them.
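The steps above form a fail-fast sequence: each stage runs only if the previous one succeeded. A minimal sketch of that gating, assuming each stage can be modeled as a function that returns `True` on success. The stage actions here are hypothetical placeholders, not calls into Nebula, Git, Jenkins, or Spinnaker.

```python
from typing import Callable

# A stage is a (name, action) pair; the action returns True on success.
Stage = tuple[str, Callable[[], bool]]

def run_pipeline(stages: list[Stage]) -> list[str]:
    """Run stages in order, stopping at the first failure (fail-fast)."""
    completed: list[str] = []
    for name, action in stages:
        if not action():
            raise RuntimeError(f"stage {name!r} failed; completed so far: {completed}")
        completed.append(name)
    return completed

# Hypothetical stand-ins for the real tools; each simply reports success here.
PAVED_ROAD: list[Stage] = [
    ("build and test locally (Nebula)", lambda: True),
    ("commit to central Git repository", lambda: True),
    ("build, test, package (Jenkins)", lambda: True),
    ("bake AMI and deploy (Spinnaker)", lambda: True),
]
```

Calling `run_pipeline(PAVED_ROAD)` returns all four stage names; if any action returns `False`, later stages never run, which mirrors how a failed Jenkins job never triggers the Spinnaker pipeline.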
Build
Most of Netflix’s code is written in Java, and it is built, tested, and packaged with Nebula, a set of plugins for the Gradle build system. These plugins extend Gradle’s automation with dependency management, release management, and packaging. Each project’s build file declares the dependencies and plugins it uses.
Integrate
Once changes have been built and tested locally, they are pushed to a central Git repository. Each team chooses its own Git workflow.
Each commit triggers a Jenkins job that builds, tests, and packages the code for deployment. The package type depends on whether a library or an application is being built.
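The text does not say which package types are used, so the following is a hedged sketch under an explicit assumption: libraries are published as JARs, and applications are packaged as Debian packages that can later be installed into an AMI. `ProjectKind` and `package_artifact` are hypothetical names; the `.deb` filename follows the standard Debian `<name>_<version>_<arch>.deb` convention.

```python
from enum import Enum

class ProjectKind(Enum):
    LIBRARY = "library"
    APPLICATION = "application"

def package_artifact(name: str, version: str, kind: ProjectKind) -> str:
    """Return the artifact filename a CI job would produce for this project.

    Assumption for illustration only: libraries ship as JARs, applications
    as architecture-independent Debian packages ready to be baked into an AMI.
    """
    if kind is ProjectKind.LIBRARY:
        return f"{name}-{version}.jar"
    # Debian naming convention: <name>_<version>_<architecture>.deb
    return f"{name}_{version}_all.deb"
```

For example, a hypothetical `example-service` application at version `1.2.0` would produce `example-service_1.2.0_all.deb`, while a library of the same name and version would produce `example-service-1.2.0.jar`.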
Deploy
The Netflix “Bakery” exposes an API for creating AMIs; the images themselves are built with Aminator, Netflix’s open-source AMI creation tool. The user specifies the foundation image and the packages to install in the AMI. The foundation image is a Linux environment with the common conventions, tools, and services required for integration with the Netflix ecosystem.
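The Bakery API itself is internal to Netflix, so the request shape below is entirely hypothetical; it only illustrates the two inputs the text names, a foundation image and a list of packages. `BakeRequest` and `describe_layers` are illustrative names, and the validation rule (at least one package) is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BakeRequest:
    """Hypothetical shape of a bake request; field names are illustrative."""
    foundation_image: str        # base Linux image with Netflix conventions and tools
    packages: tuple[str, ...]    # application packages to install on top of it

    def __post_init__(self) -> None:
        # Assumption: a bake with nothing to install is treated as an error.
        if not self.packages:
            raise ValueError("a bake needs at least one package to install")

def describe_layers(req: BakeRequest) -> list[str]:
    # The resulting AMI is the foundation image plus the requested packages, in order.
    return [req.foundation_image, *req.packages]
```

A request such as `BakeRequest("foundation-base", ("example-service_1.0.0_all.deb",))` describes an image whose layers are the foundation image followed by that one package.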
When the Jenkins integration job succeeds, it triggers a Spinnaker pipeline. Spinnaker reads the package produced by Nebula and uses the Bakery API to create the AMI.
Spinnaker then makes the baked AMI available for deployment to tens, hundreds, or thousands of instances.
The AMI is first deployed to a test environment, where automated integration tests exercise it. Once those tests pass, teams can customize their production deployment in Spinnaker, for example with multi-region deployments, canary releases, or red/black deployments.
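A red/black deployment launches a new server group alongside the old one, moves traffic to the new group only if it is healthy, and keeps the old group around (disabled) so that a rollback is just a traffic switch. The toy model below sketches that idea; it is not Spinnaker’s implementation. The `healthy` flag stands in for health checks or integration tests, and the `app-vNNN` naming, `ServerGroup`, `red_black`, and `rollback` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ServerGroup:
    name: str
    ami_id: str
    enabled: bool = False  # True if the load balancer sends this group traffic

def red_black(cluster: list[ServerGroup], new_ami: str, healthy: bool) -> ServerGroup:
    """Launch a new group next to the existing ones and switch traffic if healthy."""
    new = ServerGroup(name=f"app-v{len(cluster):03d}", ami_id=new_ami)
    cluster.append(new)
    if healthy:
        # Only the new group serves traffic; old groups are disabled, not destroyed.
        for group in cluster:
            group.enabled = group is new
    # If unhealthy, nothing changes: the old group keeps serving traffic.
    return new

def rollback(cluster: list[ServerGroup], to: ServerGroup) -> None:
    """Roll back by re-enabling an earlier group and disabling all others."""
    for group in cluster:
        group.enabled = group is to
```

Because the previous group is still running, rolling back does not require baking or launching anything; it only flips which group receives traffic.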
To illustrate how efficient and automated this delivery process is, moving Janitor Monkey, a cloud resiliency and maintenance service, from code check-in to a multi-region deployment takes 16 minutes.
Future Directions
There is increasing demand at Netflix for a language-agnostic build process, so non-JVM languages need to be supported.
Baking also accounts for a large fraction of deployment time, and Netflix is working to reduce it.
Netflix is also investigating whether containers could help with these two challenges.
Containers could also streamline the current build, bake, and deploy process and shorten development and test cycles. A container that runs unmodified both locally and in production would make it much easier to tell whether a bug is caused by environmental differences, letting engineers spend less time on such investigations and more on new features.