1. We are here at QCon San Francisco 2014, and I’m sitting here with Jeff Lindsay. So Jeff, who are you?
I’m Jeff Lindsay, also known as @progrium on the internet, on Twitter and GitHub. I’m currently the principal and co-founder of Glider Labs, which is a consulting company that helps people with Docker deployments, and in general with modern software architectures, and I guess I’m known for building a lot of open source software in the Docker community and a bunch of other random stuff.
Docker is a container runtime, basically a container manager, though that’s a bit confusing because containers have existed for a long time, things like LXC. What Docker does is provide a higher-level abstraction to model a unit of software for deployment and management, so it basically gives us a model of a high-level container, or maybe a better way is to just think of it as a unit of software.
3. When you say container, could you explain what a container is because it’s a rather ambiguous word?
So traditionally a container, in the Linux world at least, is a kernel-level technology that allows you to isolate software in what looks like its own little system: you have your own little file system that looks like any Linux distribution, and you can isolate resources using cgroups and some other things. It was originally for isolation and security types of applications, and Docker inherits that, but it’s also building on a lot of the experience of companies that have been using that technology to build platform services, platform as a service, companies like Heroku and dotCloud.
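[Editor's note: a minimal illustration of the process-level isolation and cgroup resource limits described here; the flags and image tag are just one possible example:]

    # start an isolated shell with its own file system view and cgroup limits
    docker run --rm -it -m 256m --cpu-shares 512 ubuntu:14.04 /bin/bash
    # inside, `ps aux` shows only this container's own processes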
I mean at a high level it’s conceptually very similar, but the differences are in the implementation and the benefits. This generation of container technology gives us a lot of the promises that people thought they were going to get with virtualization; I remember we used to say containerization is the new virtualization. I think the point of Docker goes a little bit beyond that, but in general the idea is that it’s like lightweight virtualization, because it’s not actually emulating the whole hardware system and it’s sharing the Linux kernel. Virtualization was great when EC2 came out and people started working with it, but it’s a very complicated beast: it takes a while to spin up a VM, sometimes it spins up and doesn’t work, because you are modeling such a complicated system. Containers in general are just very simple, and because of that they are very fast to spin up and throw away, and that really changes what you can do with them, the kinds of applications you can build with them.
Werner: So essentially, as I understand it, containers are just kind of special processes that run on the same kernel as other containers, I guess.
Yes, I mean the best way to think of them is as processes. A container can run multiple processes, in the same way that any process can have child processes, but thinking at a process level rather than at a host or machine level is a very helpful way to look at it.
Werner's full question: So if I look at virtualization versus containers, I guess the big benefit is that if you run a bunch of containers, they all share the same kernel, which means if you upgrade that kernel, all the containers get the same upgrade. That’s a big advantage versus having 20 virtualization images, each with its own kernel, which you have to recreate and update and so forth. Is that correct?
Yes, I mean that’s one benefit. The other benefit is that when you run software on virtual machines, or on bare metal, or any kind of host, you have all kinds of dependencies: all software has dependencies, you have to have certain libraries, certain versions of various runtimes, Python, Ruby, whatever. Whenever you upgrade your system, whether it’s the operating system or the kernel, you run into potential problems where something changes or some dependency is broken: you upgrade to a new version of Ubuntu and this package doesn’t exist any more, or something like that. So one way you can think of containers is as extreme vendoring: when you build your software you make sure it has all the dependencies it needs to run, built into that unit, as opposed to having to worry about whether the system it runs on has all the right packages. What’s really painful is running multiple pieces of software on one host, because they might have conflicting versions of dependencies, your host system becomes very complicated and hard to manage, and that usually leads people to not update it as often. So in a way it makes people more comfortable upgrading things, because everything is self contained.
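[Editor's note: a minimal sketch of the "extreme vendoring" idea; the Dockerfile below is hypothetical and bakes a Python runtime and the application's libraries into the image so the host needs nothing but Docker:]

    FROM ubuntu:14.04
    # the runtime and every library the app needs live inside the image, not on the host
    RUN apt-get update && apt-get install -y python2.7 python-pip
    COPY requirements.txt /app/requirements.txt
    RUN pip install -r /app/requirements.txt
    COPY . /app
    CMD ["python", "/app/app.py"]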
That’s actually kind of one of the ideas in CoreOS. When those guys were telling me about the idea, they pitched it almost comparing it directly with Google Chrome, and I’m not sure if they’ve made that comparison much since, because it might be confusing, but Google Chrome did a couple of interesting things. One was to run every tab in a separate isolated process, which is kind of interesting, because if one tab froze up it wouldn’t affect the other tabs. They also started doing automatic updates, and that was the more significant thing, because it meant that Google Chrome was always using the latest technology, the latest versions of things, and that really helped change the way people write web software, because they can just write against the latest version as opposed to worrying about various versions, worrying about IE6 or whatever. You still have to worry about different browsers, but if every browser automatically updated, you could at least target the latest version of each. I think CoreOS was taking that as a conceptual model, because they were able to say: “Well, if we can automatically update the kernel, we can make sure everybody is running the latest version and take advantage of the security benefits of that”, and running everything in containers helps isolate that update process.
6. You mentioned CoreOS. Is CoreOS a stripped-down version of Linux, or is it more?
Yes, it’s sort of the hypervisor idea applied to Docker. If you run Docker and you build a system that runs everything in containers using Docker, then the host system you are running Docker on doesn’t matter as much. It still matters to a degree, because it’s ultimately what configures your file system and interfaces with the actual hardware, but the amount of software you have to maintain at that level is very minimal. So the idea is to make an operating system specifically for running things in containers, and it just so happens, I’m not sure whether it came out at the same time as Docker or was inspired by it, but CoreOS is at least currently focused on Docker containers, so it works out really well and becomes one of the better choices for a host system for Docker. Based on some of our clients, it’s actually almost a psychological thing: when you have a stripped-down operating system running Docker to run containers, it helps people realize that they are not going to be SSHing into the host as much, they are not going to be installing stuff on it, because that’s not how you want the system to behave; you try to make sure everything runs in containers. If you have Ubuntu sitting there as your host system, you might be tempted to just install a bunch of stuff there, and at some point you’ll think “Maybe we need to run Chef to manage all this” or something.
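[Editor's note: on CoreOS the host typically carries little more than systemd units that start containers; a rough sketch, with hypothetical unit, container and image names:]

    [Unit]
    Description=Hypothetical web service run as a Docker container
    After=docker.service
    Requires=docker.service

    [Service]
    ExecStartPre=-/usr/bin/docker rm -f web
    ExecStart=/usr/bin/docker run --name web -p 8080:8080 example/web
    ExecStop=/usr/bin/docker stop web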
7. It’s a benefit to have a smaller host system, because it’s a smaller attack surface basically?
That’s the other big thing: any services that you normally run on the host, you can run in a container, which adds an extra little bit of isolation and security. Minimizing that attack surface is another big point.
If you really want to, it’s possible to have a shared piece of software on the host, but it’s not best practice; it’s really not idiomatic and it kind of defeats the purpose, because then the container you build depends on the host having that piece of software, and part of the point of these containers is that they are self contained: they have everything they need to run, and they should run the same whether it’s on your server, on my server, on your local laptop or wherever. So ideally you’d avoid doing anything clever like that, because it’s maybe too clever; you should just install the Ruby binaries in the container, and if you have two applications that you are running in containers, you install Ruby in both of them. Docker has the concept of layers: when you are building images you can have a base layer that has Ruby, and then you can build containers based on that base layer, so that helps eliminate some of the redundancy there.
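[Editor's note: a sketch of the shared base layer idea, with hypothetical image names; both applications build on the same Ruby base, so that layer only has to be stored and pulled once:]

    # base image, built once and tagged e.g. mycompany/ruby-base
    FROM ubuntu:14.04
    RUN apt-get update && apt-get install -y ruby

    # app-a/Dockerfile
    FROM mycompany/ruby-base
    COPY . /app-a
    CMD ["ruby", "/app-a/server.rb"]

    # app-b/Dockerfile
    FROM mycompany/ruby-base
    COPY . /app-b
    CMD ["ruby", "/app-b/worker.rb"]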
9. When you say layers, is that in the file system?
Yes, these are file system layers, so Docker kind of depends on these layered file systems to do its magic.
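[Editor's note: the layers behind an image can be inspected from the Docker CLI; the image name here is just an example:]

    # list the file system layers that make up an image
    docker history ubuntu:14.04

    # show which layered storage driver (aufs, devicemapper, etc.) the daemon is using
    docker info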
10. Essentially you bundle all your software in one container image?
Yes, so everything that a piece of software needs to run should be in there, in the container.
11. So essentially the only shared component is the kernel?
And then any other sort of network resources and stuff like that which you would interact with through the kernel.
I mean, Docker comes out of the box with some functionality to help containers interact on the same host, because each container is an isolated network environment. Once you get to containers interacting across hosts, that’s out of the scope of Docker, and there are a number of different approaches to solving it, most of them orthogonal to Docker: things like SDN you can use regardless of whether you are using Docker or not. Another alternative is a service discovery system, where you use something like Consul or ZooKeeper and have services register what port and IP they are listening on, and then look them up through the service directory.
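[Editor's note: a rough sketch of the Consul-based service discovery approach mentioned here; the service name, address and port are hypothetical:]

    # the container (or a helper next to it) registers the port it is listening on
    curl -X PUT -d '{"Name": "web", "Address": "10.0.1.5", "Port": 49153}' \
         http://127.0.0.1:8500/v1/agent/service/register

    # other services look it up through Consul's DNS interface
    dig @127.0.0.1 -p 8600 web.service.consul SRV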
I guess there are two ways of thinking about it, because on one hand micro services were in a way the inspiration for how Docker was designed and what we wanted it to push people towards. So even though it can be used for a lot of things, it really is pretty beneficial when you are building micro service architectures. But in terms of actual size, it’s really about service oriented architecture, so you could use Docker for any service oriented architecture: even if you have a giant monolithic JVM service, you can run it in a container. The way people think about micro services is generally a lot more focused, and there are a lot of definitions of what a micro service is, but in general it means you are going to have a lot more of them, because you are splitting your application up into many little services.
And so because of that, it’s very beneficial in the Docker world to have very minimal containers for each service. A lot of people, when they start using containers and Docker, are coming from a world of creating VMs and VM images, and they expect to install everything they would normally install in a VM, and to run, for example, SSHD, because you want to SSH into your container, don’t you? In the Docker world the way we wanted people to use containers was to focus on the one thing the container is running. It turns out SSHD isn’t necessary, because there are other Docker-specific ways to get in there, so you don’t have to worry about having SSHD and all its dependencies; you want to focus everything in the container and get it down as small as possible, so you only have the dependencies of the thing you are actually running. It turns out there are ways you can attach, and you can even create an SSH container that is attached to the network namespace or the file system of another container, so that’s another way to get in the way you normally would with SSH. There are other things people think they need to put in there, like some kind of logging tool or logging service, syslog, or, what a lot of people do, their build tool chain, because they imagine that if you are going to build a container for, say, a Go application, you have to have all the Go tooling in there to compile the binary. But a lot of the time your build tool chain includes all these other dependencies that you are not going to use when you run the container for its actual purpose. Traditionally that’s the way people think about building VMs, so they do the same thing here, but in the Docker world it’s usually best practice to have a separate container with all your build tooling, do the build there, and produce a binary, ideally a static binary, that you can just drop into another container.
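[Editor's note: a rough sketch of the separate build container pattern described above; paths, image tags and names are hypothetical:]

    # compile inside a throwaway container that carries the Go tool chain
    docker run --rm -v "$PWD":/src -w /src -e CGO_ENABLED=0 golang:1.3 go build -o app

    # then bake only the resulting binary into a small runtime image
    # (for example the Busybox-based Dockerfile sketched after the next answer)
    docker build -t example/app .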
Go programs are a great example because they are statically built binaries that don’t really depend on anything else but libc, so you can drop them into a Busybox container. Busybox comes from the embedded systems world, things that run on really small devices, so it’s pared down to the smallest system possible; all the usual Linux utilities are in one binary. You just drop a Go binary in there and you have maybe a 20MB container, whereas if you were to start with Ubuntu as the image for your container, it would be 200MB just to start with, before you’ve even installed your application and all its dependencies. The benefit comes from the way you distribute containers over the network: with a registry involved there is a lot of pushing and pulling over the network, and in order to get the benefits of fast iteration and fast deploys, you want your containers as small as possible so they can be pushed and pulled as quickly as possible, started up as fast as possible, built as fast as possible. So there is a lot of benefit to building minimal containers, and I think it’s a best practice everybody should try to follow: build the smallest possible Docker containers.
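[Editor's note: a sketch of the minimal image described here; "app" stands for a hypothetical statically built Go binary produced by a separate build container:]

    # a minimal runtime image: just Busybox plus the statically built binary
    FROM busybox
    COPY app /app
    CMD ["/app"]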
Yes, and not necessarily build a new one but start a new one. In fact Docker has the functionality to stop a container and then start it again with the same file system, so if you start it once and it writes some stuff to its isolated file system, then you stop it and start it again, all the file system changes are still there; that’s very much like a virtual machine, that’s how virtual machines work. But as a best practice, I’ve actually never stopped a container; I always kill a container and create a new one. It’s that sort of immutability: whatever changes it made, I’m going to throw away. That also affects the way you design your software to run in containers, but that’s not entirely new; coming from the platform as a service world you already had these kinds of constraints if you were building for Heroku or dotCloud or AppEngine. It turns out to just be good practice in general, because it helps you avoid getting into a funky state and then dealing with the bugs that fall out of that; you want to fail fast and start clean.
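[Editor's note: the two styles described here, sketched with a hypothetical container named "web" and image "example/app":]

    # stateful restart: the container keeps its file system changes
    docker stop web && docker start web

    # immutable style: throw the container away and run a clean one
    docker rm -f web
    docker run -d --name web example/app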
I’m really working on a lot of projects; if you look at my GitHub [Editor's note: https://github.com/progrium ] it’s almost like there is a new one every week. One of the bigger ones is Dokku, which is sort of one of the first killer applications for Docker: it gives you a mini Heroku on a single host using Docker. It was recently sponsored by Deis; Dokku has been around for a year and it had fallen a little bit behind, so with Deis as a sponsor I can breathe new life into it, and that’s really exciting. There are also a number of tools I have around service discovery; if you look at my blog, www.progrium.com, there are a couple of recent blog posts talking about service discovery specifically in the Docker ecosystem, and there are a number of projects that are really useful there. But yes, there is always something new, so if you follow me on Twitter or GitHub you’ll probably find some cool stuff.
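[Editor's note: Dokku's workflow is Heroku-style git push deployment onto a single Docker host; the host name and app name below are hypothetical:]

    # point a git remote at the Dokku host, then push to deploy
    git remote add dokku dokku@my-server.example.com:myapp
    git push dokku master   # builds the app with a buildpack and runs it in a container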
Werner: Ok, so audience, we are all going to follow Jeff and thank you very much!
Thanks!