Microservice architectures and container-based virtualization have taken the software development community by storm in recent months. Though containers predate Docker, the rapidly growing open-source project has given them new life due to the benefits they confer on applications: agility, resilience, speed, and portability. In a recently published InfoQ article, I argued that these benefits are incomplete if stateful services like databases are not also containerized.
That article was about theory; this article is about practice. In the tutorial below, I will show you how to achieve the benefits of containers for your stateful services, like databases, using Flocker, an open-source project from ClusterHQ, where I am CTO. You can think of Flocker as an operations toolkit for Dockerized stateful services. Managing data comes with a certain amount of operational overhead: on any particular day you might need to move it, copy it, or back it up. Flocker aims to make these tasks trivially simple for containerized stateful applications, so that every part of an application can be run inside a container.
Before jumping into our tutorial, let’s look at just why running stateful services in containers is so challenging.
The challenges of running stateful services in containers
The most basic challenge with running your stateful services in containers comes from the way containers themselves read and write to disk. Docker best practices dictate that a container, say an ElasticSearch container, should write its data to a data-only container. This model means the container itself is stateless and immutable: it can be shut down and started again while the data resides in the linked, data-only container. However, the data that it has written to disk is very much stateful and has to be managed differently. For instance, if you shut down your ElasticSearch database and start it up on a new, more powerful machine, it won’t have access to its state, since that is still sitting on the original machine.
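To make this concrete, here is a minimal sketch of the data-only container pattern using plain Docker commands (the container names are illustrative):

# Create a data-only container that owns the volume; it runs "true" and exits immediately
docker run --name es-data -v /var/lib/elasticsearch busybox true

# Run the stateless ElasticSearch container against the volume owned by es-data
docker run -d --name es --volumes-from es-data clusterhq/elasticsearch

The es container can now be stopped, removed, and recreated freely, because its state lives in the volume owned by es-data. But that volume exists on only one machine, which is exactly the problem just described.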
Flocker: an introduction
Flocker is an open-source data volume manager designed exclusively for managing stateful services running in containers. As of this writing, Flocker is at release 0.4, meaning the project is under active development. In the coming weeks and months, we will be adding an API so that you can do everything described in this tutorial programmatically.
Additionally, we are building Flocker from the ground up to work with other popular container management tools, such as Docker Swarm, Google Kubernetes, and Apache Mesos. We’ve recently published a tutorial on how to use Flocker with Docker Swarm and Kubernetes, so if you use either of those tools, I’d encourage you to try out our integration demos.
Using Flocker to deploy and migrate an ElasticSearch-Logstash-Kibana stack
In today’s tutorial, I will show you how to use Flocker to deploy and then migrate ElasticSearch. ElasticSearch has exploded in popularity recently, and it seems like almost everyone is using it for something, big or small. Running ElasticSearch in production, however, does not have a reputation for being easy. First, most people don’t just run ElasticSearch; they run the full ELK stack: ElasticSearch, Logstash, and Kibana.
Each of these services plays a role, and for real workloads that probably means deploying each service to its own server, which introduces complexities around multi-node deployment and networking. Furthermore, once your ELK stack is out there doing its thing, you are probably going to need to upgrade your ElasticSearch box to a larger node at some point, because ElasticSearch is notoriously memory hungry. Docker itself is a great way to package up the ELK images. However, deployment, networking, and migrations are ops tasks that native Docker doesn’t support.
Below I’ll show you how to use Flocker to deploy an ELK stack to multiple nodes, and then perform a near seamless database and data volume migration of ElasticSearch from one node to another.
Setting up ELK
First, a quick overview of the various ELK components and the roles that they play:
- Logstash receives logged messages and relays them to ElasticSearch.
- ElasticSearch stores the logged messages in a database.
- Kibana connects to ElasticSearch to retrieve the logged messages and presents them in a web interface.
The first thing that we need to do is package up our three applications, along with their dependencies, into three separate Docker images. We’ve done this for you and put the results on Docker Hub: clusterhq/elasticsearch, clusterhq/logstash, and clusterhq/kibana.
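If you want to pull the prebuilt images yourself:

docker pull clusterhq/elasticsearch
docker pull clusterhq/logstash
docker pull clusterhq/kibana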
There are great tutorials out there for creating Docker images, so I won’t go into that here.
Deploying ELK to a multi-node cluster
Now that we have our Docker images, we are ready to deploy our stack to multiple nodes.
Flocker uses two configuration files: an application configuration and a deployment configuration. Let’s look at the application configuration first.
The application configuration is simply a YAML file that describes how the Docker containers that make up your application talk to each other. For this reason, we often refer to it as application.yml. If you are familiar with Docker Compose, formerly called Fig, you will instantly recognize many similarities with Flocker’s application.yml. Below is the application.yml needed to start all three containers, set up the port mappings that let them talk to each other, and create a Flocker-managed Docker data volume in the ElasticSearch container.
"version": 1
"applications":
"elasticsearch":
"image": "clusterhq/elasticsearch"
"ports":
- "internal": 9200
"external": 9200
"volume":
"mountpoint": "/var/lib/elasticsearch/"
"logstash":
"image": "clusterhq/logstash"
"ports":
- "internal": 5000
"external": 5000
"links":
- "local_port": 9200
"remote_port": 9200
"alias": "es"
"kibana":
"image": "clusterhq/kibana"
"ports":
- "internal": 8080
"external": 80
Let’s take particular note of a few things:
- The ElasticSearch application has a volume and mount point specified, in this case /var/lib/elasticsearch. One of the major benefits of Flocker is its ability to migrate data volumes between hosts, as we will see later.
- Links allow containers to communicate even when they are located on different hosts.
- Ports proxy a port (“external”) on the Docker host, accessible to the outside world, to a port (“internal”) in the container.
Deploying ElasticSearch
Now that we have our ELK stack images ready to go and our application.yml defined, we are ready to deploy these containers to multiple hosts. We specify where we want our containers deployed in the second configuration file mentioned above: the deployment configuration.
In this example, we will deploy each of the services to its own virtual machine (VM). If you want to follow along, you can use virtually any host; the steps work equally well on VMs, bare-metal servers, or any combination thereof. For instance, you might want to run ElasticSearch on bare metal for performance reasons, but run Logstash and Kibana on VMs to keep costs down. It’s up to you; Flocker is agnostic to the underlying host.
The deployment config is also just a YAML file. The deployment.yml tells Flocker where to deploy each container by listing one or more IP addresses and the application names defined in the application.yml. In this case, we are going to deploy each of our containers to a different VM.
"version": 1
"nodes":
"172.16.255.250": ["elasticsearch"]
"172.16.255.251": ["logstash"]
"172.16.255.252": ["kibana"]
When we run flocker-deploy, the CLI tool provided by Flocker, the containers are automatically deployed, networked, and started up on the servers defined in the deployment configuration.
alice@mercury:~/flocker-tutorial$ flocker-deploy deployment.yml application.yml
alice@mercury:~/flocker-tutorial$
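To check that each container landed where you expected, you can ask Docker on each node directly (assuming you have SSH access to the hosts):

alice@mercury:~/flocker-tutorial$ ssh root@172.16.255.250 docker ps

The node at 172.16.255.250 should show the running elasticsearch container, and because of the port mapping in application.yml, curl http://172.16.255.250:9200 should return ElasticSearch’s status banner. The same checks apply to the logstash and kibana nodes.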
Migrating ElasticSearch and its data from one server to another
Now the containers have been deployed to multiple nodes in a cluster. What if ElasticSearch starts consuming too much RAM and you need to move to a larger instance size?
With Flocker, this is easy. Just update your deployment.yml with the IP address of your new, beefy box and re-run flocker-deploy. Your ElasticSearch container and its data volume will be automatically moved to the new node, and connections that would have formerly gone to the original node will be automatically routed to the new one.
OLD:
"version": 1
"nodes":
  "172.16.255.250": ["elasticsearch"]
  "172.16.255.251": ["logstash"]
  "172.16.255.252": ["kibana"]
NEW:
"version": 1
"nodes":
  "172.16.255.250": []
  "172.16.255.251": ["logstash"]
  "172.16.255.252": ["kibana"]
  "172.16.255.253": ["elasticsearch"]
Here’s what happens when you re-run flocker-deploy to migrate ElasticSearch from node1 to node2:
- Flocker checks to see whether you’ve changed your deployment config.
- Since it sees that you’ve moved ElasticSearch from 172.16.255.250 to 172.16.255.253, it initiates a migration.
- The migration starts by taking a snapshot and pushing the entire contents of that snapshot from node1 to node2. During this time, node1 is still accepting connections, so your users and any other processes that depend on the data don’t experience any connectivity issues.
- Only once all the data in the snapshot has been copied over is the application running on node1 shut down.
- Any changes made to the data volume after the first snapshot was taken are then migrated. Depending on how busy your database is, this may be just a few hundred kilobytes of changes.
- Once these last few changes have been copied over, Flocker hands off ownership of the volume to node2.
- The ElasticSearch container is started up on node2.
We call this operation a “two-phase push” because the data is migrated in two phases. During phase one, the longest phase, the data volume is copied over while the database continues to serve connections. It is only during phase two that the application experiences downtime. Importantly, removing this downtime completely is one of the goals of the Flocker project.
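If you are curious what the two phases look like at the storage layer, here is a conceptual sketch in ZFS-style commands (Flocker is currently built on ZFS, but these commands illustrate the pattern rather than Flocker’s actual internals, and the dataset names are made up):

# Phase one: snapshot the volume and push the bulk of the data
# while the database keeps serving connections on node1
zfs snapshot flocker/elasticsearch@phase1
zfs send flocker/elasticsearch@phase1 | ssh node2 zfs recv flocker/elasticsearch

# Phase two: stop the container, then send only what changed since phase one
zfs snapshot flocker/elasticsearch@phase2
zfs send -i @phase1 flocker/elasticsearch@phase2 | ssh node2 zfs recv flocker/elasticsearch
# Hand off ownership of the volume and start the container on node2

The incremental send in phase two is what keeps the downtime window small: it only has to copy the changes written since the first snapshot.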
What’s next for Flocker?
As we move towards our first production-quality release, we will be adding some important features. For instance, we are currently working on an API so you can do everything in this tutorial programmatically. Additionally, we are working on adding options for block storage, so that instead of using local storage for your data volumes, you can use a block storage device from your cloud provider or data center.
We also have a lot of ideas for where to take Flocker after that. For instance, many of us use the replication features built into ElasticSearch itself (not to mention similar features in MongoDB, Cassandra, and other NoSQL databases). In some cases, we don’t want to migrate a single database container and its data volume between nodes; rather, we want to run a cluster of containers and, if one of the nodes fails, have the database rebuild it. Anyone who has waited for ElasticSearch to reindex after losing a node, however, knows that getting a cluster back to full performance can take a while. Wouldn’t it be cool if you could regularly snapshot your ElasticSearch volumes, and then, if a node were lost, bring it back from a snapshot so that only the last hour, or five minutes, or minute of data needed to be rebuilt?
Another area that we are looking at is better, more complete testing in Continuous Integration (CI) pipelines. Who among us hasn’t been burned by functional tests run against mock data that didn’t realistically simulate our actual databases? Or by front-end tests that didn’t account for the fact that our users don’t all have a neatly organized first name, middle initial, and last name? Wouldn’t it be amazing to use Docker containers to immediately spin up a testing environment and run those tests against a copy of our production data, sanitized or not depending on the use case? Then, when the tests have run, tear the environment down completely and throw away the data?
Containers: Not just a faster horse
These thoughts on where Flocker will go next point to one of the most fundamental aspects of the container and microservice revolution we are currently experiencing: containers will enable completely new workflows.
Containers are often described as lightweight virtualization. That characterization leads us to assume that the benefits of containers are merely incremental improvements over operating-system virtualization: rather than being a car, they are a better horse and buggy. I don’t agree. I think containers will completely change the way we build applications. What will happen when containers can spin up in 10 milliseconds? What will that enable? What types of storage requirements will emerge? At ClusterHQ, we don’t know the answers to those questions, but we’re excited to find out.
About the Author
Luke Marsden is co-founder and CTO of ClusterHQ, The Container Data People. He has 12 years of experience building web apps and working with containers and distributed storage. Inspired by the practical operational problems of running web apps at scale, he started ClusterHQ. Luke holds a Computer Science degree from Oxford University.