Chris' full question: This is Chris Swan, I am one of the cloud editors at InfoQ, I am here with Anil. Anil, earlier this afternoon at QCon, did a presentation on Mirage OS so I am going to try and ask him questions about different things. But if we could start out, Anil, with you just introducing yourself, and talking a bit about the background and what got you to Mirage OS in the first place.
I am Anil Madhavapeddy, I am faculty at the University of Cambridge and I was on the founding team of the Xen hypervisor. Xen was a research project that we started back in 2002, and it came out of a long line of research into operating systems, trying to figure out exactly how to structure them. Xen was just in the right place at the right time: it happened to be a really nice way to run virtual machines in a very safe and controlled fashion, but crucially without breaking any backwards compatibility. So a lot of people who adopted Xen did it because they could run all of their Linux virtual machines and their Windows virtual machines really easily using open source software.
So I ran a lot of the engineering operations at XenSource, which commercialized Xen, and the thing that I wanted to do after that whole crazy ride was to come back to academia and figure out how to build operating systems that run specifically under Xen. So I started exploring how to build very small, compact, specialized operating systems, and that turned into Mirage OS. The intuition was that we could take the gigabytes and gigabytes of unnecessary code that we spend a lot of time in Xen making sure works in exactly the same way, and eliminate it entirely from the operating system stack. So this was really my attempt to get away from Xen and come up with new operating system architectures for a very modern cloud, with attacks coming from everywhere and people charging for all the resources you use, and distill it down into this very concrete project that we now call Mirage OS.
Chris' full question: Within the software world in general it feels to me like Moore’s law has allowed us to bloat and it also feels to me that it may not be the next couple of years, but sometime that party is going to come to an end. So, are you preparing us for the next generation where we are starting to have to think about all that optimization we’ve left on the table for decades now?
It’s a great question. One of the big things that we’ve learned about Moore’s law is that you never fight Moore’s law, even when it’s dead, so competing on performance has never really been successful. But the thing that really surprised me with Mirage OS was that we not only went for performance gains, we also took the decision to rebuild the entire operating system stack in a type safe language, in OCaml, and where we’ve really seen gains is in security. Once a system bloats beyond control you can always make it performant, by just looking at the hot data paths required to make it go fast, but one thing you can never do is make it secure, because you can’t take 20-30 million lines of code, none of which have really strong abstraction properties, and try to establish a security property that matters, such as confidentiality or the absence of buffer overflows.
So where Mirage is really making inroads is in building these very small secure services. We need an analogue of Moore’s law for security: once your lines of code and the number of components that go into your system hit a certain tipping point, it’s basically not possible to build really hardened, secure systems. People have recognized this in specific domains such as mission critical systems at NASA or in military-spec systems, and what we’re doing is taking the ease of use that we have in the cloud and applying to it the discipline that goes into building those sorts of systems in those very specific domains. So I don’t think we’re ever going to defeat Moore’s law, but I do think we need a new law for security that says if you pile in too much junk, eventually the whole tower is going to fall over and cause you a lot of pain.
Chris' full question: Is this a repeat maybe of some of the arguments we saw around operating systems in the ‘90s? I fondly remember the "Linux is obsolete" thread from Andrew Tanenbaum, because I shared a lab with some of the people that got involved with that thread, though I wasn’t personally involved in it. The argument there was that microkernel operating systems were coming, Taligent, etc., etc., who wants a giant monolith, and yet the giant monoliths ended up succeeding. But perhaps if we look at Xen, was Xen really the success of the microkernel in the end?
There have been several papers written about this: one paper says that Xen is microkernels done right and the other says that Xen is microkernels done wrong, and they are both written by the same author, Steve Hand. They both make the point that architectures don’t really matter; what matters are the tradeoffs you are making between security, programmability and compatibility. This is the fundamental tradeoff that operating systems have been trying to find forever: when do you maintain the ability to run old code versus a new shiny system that runs on the latest hardware and so on. So the thing with Mirage is that it doesn’t have anything to do with microkernels as such; it just happens that we picked Xen as the first execution platform.
The way that Mirage is built is designed to be future proof, because we construct the operating system with a very different interface to the application stack: it uses functional programming. The idea is that today we can run Mirage under Xen, but ten years from now, when for example seL4, a formally verified hypervisor, becomes widely deployed, we can trivially retarget to that. But we can also retarget to very exotic runtimes such as JavaScript; the whole of the operating system stack we’ve built compiles just fine into other environments like that. The reason is that we redefined what it means to be portable, what it means to have operating system interfaces. We’ve given up compatibility with source code, but we retained compatibility with all these previous operating systems: you can compile into Unix binaries, we’ll soon have Windows support with Windows executables, and we can also look at Docker integration; all of these are just normal backends for Mirage applications. As for the Xen wars, they will continue forever.
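A rough sketch of this retargeting idea in plain OCaml (this is not the real Mirage API; every module and function name here is invented for illustration): the application is written as a functor over an abstract device signature, so moving it from one backend to another is just a matter of applying the functor to a different module.

```ocaml
(* Hypothetical sketch, not the Mirage API: an application parameterized
   over an abstract network interface. *)

module type NETWORK = sig
  val name : string
  val send : string -> string   (* returns a description of what was sent *)
end

(* Two pretend backends satisfying the same signature. *)
module Xen_net : NETWORK = struct
  let name = "xen"
  let send pkt = Printf.sprintf "[xen netfront] %s" pkt
end

module Unix_net : NETWORK = struct
  let name = "unix"
  let send pkt = Printf.sprintf "[unix socket] %s" pkt
end

(* The application never names a concrete backend. *)
module App (N : NETWORK) = struct
  let run () = N.send "hello"
end

(* Retargeting is one functor application per backend. *)
module On_xen = App (Xen_net)
module On_unix = App (Unix_net)

let () =
  print_endline (On_xen.run ());   (* [xen netfront] hello *)
  print_endline (On_unix.run ())   (* [unix socket] hello *)
```

The same shape would cover a JavaScript or seL4 backend: only the module implementing `NETWORK` changes, the application code does not.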
One of the really nice things about Xen is that it’s made it possible to build a whole new generation of operating systems. The reason Mirage would have failed ten years ago is that we didn’t have a virtualization layer which provided us with simple device drivers, with all of the right abstractions, that we could just build on once. If I was building Mirage ten years ago, I would be building USB device driver stacks and network device driver stacks that would then have to be repeated across all of the physical hardware in the world, and that’s clearly never going to scale. Instead, Mirage was in the right place at the right time: I just have to build a Xen layer and a VMware layer and then it will run on a billion hosts in the cloud. So we just got a bit lucky there, I think; that’s really the value of Xen.
Docker is a really exciting technology because it’s very complementary to the work we’ve been doing with unikernels. The reason that I stopped off at the Docker open space is that I use Docker every day, because all of the Mirage build systems and test suites use Docker containers to capture the build environments in which we build unikernels. So we use them for regular builds; I build about 4,000 containers a day for the entire OCaml package repository. But the thing with Docker is that fundamentally it’s a really nice frontend on top of a Linux containerization stack, with LXC and the associated technologies. So what I went to the open space for was to figure out exactly how I would replace the container backend in Docker with a unikernel backend, and it so happens that there were people there, like Luke Marsden and Andrew Kennedy, who have been working on exactly this kind of extension mechanism for Docker.
So I actually came back out of the open space with knowledge of exactly where to apply a little patch in Docker to add a unikernel backend: when you do docker run, it would use all of the nice Docker frontend to fetch the dependent packages and so on, but instead of spinning up a container it would actually spin up a virtual machine, and that looks like a very simple patch to add. Of course, it’s very simple to prototype; getting that upstream into Docker will take a lot of architectural discussion and so on, but I am just looking to prototype the end to end system, and the open space was great, I know how to do it now.
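As a toy sketch of the dispatch being described (nothing here is a real Docker internal; the function names and the image-naming convention are entirely made up), the frontend behaviour stays identical and only the backend that launches the image differs:

```shell
#!/bin/sh
# Hypothetical illustration of a pluggable run backend.
# image_backend and run_backend are invented names, not Docker code.

# Pretend the image metadata tells us which backend it wants.
image_backend() {
  case "$1" in
    *-unikernel) echo "unikernel" ;;
    *)           echo "container" ;;
  esac
}

run_backend() {
  image="$1"
  case "$(image_backend "$image")" in
    unikernel) echo "booting unikernel VM for $image" ;;      # e.g. hand off to the hypervisor
    container) echo "starting Linux container for $image" ;;  # the existing default path
  esac
}

run_backend "nginx"                # -> starting Linux container for nginx
run_backend "tls-proxy-unikernel"  # -> booting unikernel VM for tls-proxy-unikernel
```

The point of the sketch is that the pull, dependency resolution and image naming stay exactly as users know them; only the final launch step is swapped.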
Chris' full question: Fantastic. So it feels to me that Docker kind of started out keeping developers in their comfort zone, letting them use a familiar OS user land, familiar package managers and that kind of thing, but the tide seems to be turning now and people are a bit more focused on resource utilization and efficiency, exactly the same things that you are doing with Mirage, and we are now getting these super lean containers that are just a thin wrapper around binaries and all their dependencies. Do you feel like this is part of an industry trend where Mirage OS and unikernels have one piece, perhaps rump kernels are taking another piece, and then the user land segment is going to be lightweight containerization?
Again, I always go back to not trying to compete too much on performance and resources, because especially on x86 and the cloud it doesn’t matter so much if you are using up a few extra megabytes of RAM. In particular, existing cloud providers are geared towards either per-minute or hourly charges for virtual machines, so you never really see the benefits; remember, unikernels operate on millisecond timescales, so we can start and stop them extremely fast. Where it does matter, though, is when you are doing DevOps and dependency management, where you are trying to manipulate these images as quickly as possible so you can deploy security updates. In that world a unikernel is just a megabyte in size versus a one gigabyte Ubuntu image; even differential images tend to be quite large when you do a Docker pull.
So I’m very clear that all of our use cases are driven by something other than resource usage. On the cloud in particular, HaLVM and Mirage are two unikernels that are focusing on security and small footprint size, and we are seeing that in those specific domains that need very secure services it’s absolutely fantastic. Unikernels are some way away from being general purpose stacks, and they will possibly never be general purpose stacks, so if you need your fully compatible LAMP stack that has a POSIX file system then you should be deploying an existing container running CoreOS and Docker. But in a lot of cases we can slice away a particular piece of functionality: for example, if you are deploying a LAMP stack, I might just want to run the SSL proxy as a unikernel. There is no need for that SSL proxy to pull the full weight of Linux baggage with it.
And so breaking down these containers into small protocol handlers and then evolving those into unikernels seems to be the right direction to go in. This is why Docker is so strategic: Docker provides us with the API to describe our application and its deployment, and then we can figure out exactly which portion of that application stack to turn into a unikernel, so that we don’t bite off more than we can chew. Remember, unikernels are extremely ambitious, because we are rewriting the entire operating system stack in order to realize these benefits, so it will be some time before they have the full set of capabilities that Linux or Windows has, but right now they work very well in certain domains.
Chris' full question: So it feels like one of the remaining challenges, particularly around security, is dependency management; we’ve had a whole rush of brand-named security vulnerabilities, with people rushing around updating underlying libraries. Obviously when we make things smaller we’ve got fewer things to worry about, but what do we do about cataloguing those things? And often when we make things smaller we end up with more of them, so what do we do about sprawl?
This is one of the areas that has been the most exciting in the last year, because what we did with Mirage OS was unify the way that we manage our binary deployments with the way that we do source code management. As part of Mirage OS, Thomas Gazagnaire, one of the team, built OPAM, a package manager for OCaml, and what OPAM does is provide a full Git based workflow. Whenever you do opam install for a package, it fetches all of the source code required for building the package, not binaries, and it provides you with a compilation environment. Because we built OPAM in the age of GitHub, it uses Git as an inherent part of its workflow, so whenever I install a Mirage package it uses OPAM to track every library that goes into the kernel, and we have a full Git tree that tells us everything that went into it.
So for the last four years, the live Mirage website has been using Git to track its own deployment, so I can go back in time and, despite all the development in Mirage OS, use my existing Git tools to take a unikernel we’ve deployed onto Amazon and figure out exactly every line of source code that went into it. And we can do this in one Git repository running on GitHub, mirage/mirage-www-deployment. This is incredible because we can not only track our inputs, we can also track our outputs in the same version control system, and this unlocks all the benefits of version control: we have git log, git bisect, git blame to finger various developers for their mistakes, and we don’t have to depart from the way we debug source code either. We love this so much that we built a new database called Irmin, which codifies this workflow into a library that lets us program Git based databases without having to shell out to Git.
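A toy shell session can illustrate the workflow being described (this is not the actual OPAM or Mirage tooling; the manifest format and the version numbers are invented): each deploy commits a manifest of the exact library versions that went into the build, so ordinary git commands can later answer "what code is running in production?".

```shell
#!/bin/sh
# Toy illustration: track build inputs per deploy in a git repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name  "demo"
git config user.email "demo@example.com"

# First deploy: record the pinned inputs (versions are made up).
printf 'mirage 2.5.0\ntcpip 2.6.0\n' > manifest
git add manifest
git commit -q -m "deploy #1"

# Second deploy, after a security update to one library.
printf 'mirage 2.5.0\ntcpip 2.6.1\n' > manifest
git commit -q -am "deploy #2: bump tcpip"

# Ordinary git tools now audit the deployment history.
git log --oneline
git diff HEAD~1 HEAD -- manifest
```

With inputs and outputs in the same repository, git log, git bisect and git diff become deployment-audit tools rather than just source-code tools.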
So we have an entire reinterpretation of Git that lets us build data structures, queues, maps and sets and serialize them to Git, using Git to inspect the database, while everything else is programmed directly in the functional language. You are going to see a lot more of this integration around unikernels, and we are going to see where that goes in the next few months, I think.
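A miniature of the Irmin idea can be sketched in a few lines of plain OCaml (this is not Irmin's real API; the types and function names are invented): every update produces a new commit pointing at its parent, so the full history of the data structure remains queryable, like a git log for your values.

```ocaml
(* Hypothetical miniature of a Git-like store, not the Irmin API. *)

module StringMap = Map.Make (String)

type commit = {
  message : string;
  tree    : string StringMap.t;  (* snapshot of the key/value state *)
  parent  : commit option;       (* previous commit, if any *)
}

let init = { message = "init"; tree = StringMap.empty; parent = None }

(* set returns a NEW commit; nothing is mutated, old states survive. *)
let set store ~message key value =
  { message; tree = StringMap.add key value store.tree; parent = Some store }

let get store key = StringMap.find_opt key store.tree

(* Walk the parent chain: the analogue of `git log`. *)
let rec log store =
  store.message :: (match store.parent with None -> [] | Some p -> log p)

let s1 = set init ~message:"add host" "host" "mirage.io"
let s2 = set s1 ~message:"add port" "port" "443"

let () =
  assert (get s2 "host" = Some "mirage.io");
  assert (get s1 "port" = None);               (* old commit is unchanged *)
  assert (log s2 = ["add port"; "add host"; "init"])
```

The real library generalizes this to arbitrary serializable data structures and to an on-disk format that actual git tools can inspect, but the commit-chain shape is the essence.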
Well, I am very much not a cool kid, so one thing that we’ve always ignored is whatever the latest language of the day is. OCaml is a very old language, over 20 years old, and it’s one of the best systems languages I have ever used, because it has just the minimal set of features it needs to build stable, natively compiled embedded systems. The fact that it’s functional is almost irrelevant; it’s just a very good, type safe, small, garbage collected language. One of the most important things when building systems (remember, some of this code was written eight to ten years ago) is that you just want to maintain compatibility; you don’t want to rewrite your source code every time a new language comes out. Building a TCP stack or a TLS stack is a major undertaking that takes years of work to get right, and this is why the Linux kernel has stayed around as long as it has: rewriting it takes a lot of effort. When we started Mirage OS neither Go nor Rust existed, and it will take a significant amount of time for those languages to settle down and become as robust as something like OCaml is.
On the other hand, Rust is an extremely exciting language; it offers us the ability to remove the garbage collector from our OCaml code and move to this very nice linearly typed world, so I am watching it with great excitement. We actually have some technology called Cmeleon which lets us write OCaml code that bridges into Rust and bridges into Go without ever having to write a line of C. Our high level goal is to never write a line of unsafe code again: we are only going to write OCaml or Go or Rust, anything that is type safe, and the bridges between those languages will be established in the high level language, so we never have to drop down into something that could introduce a buffer overflow. I think it’s shockingly irresponsible for any developer to be writing in an unsafe language these days when you don’t have to, because anything you deploy in the cloud will just be owned and buffer overflowed before you know it. So our goal is safety above all; performance will follow afterwards, and whichever language we use will be dictated by those goals.
But with OCaml, by the way, we are seeing a great movement towards big companies using it: in the last couple of years Facebook has released five major open source projects that all use OCaml, including their new language Hack, and we’ve seen Jane Street go from strength to strength with open source releases. So I have a lot of faith in OCaml, but we are not evangelizing it; we are just using it, and other people who want to use it in a similar way will find huge benefits in doing so.
Chris: Sounds good. Well, thank you very much for stopping by today.
Thank you very much, it was fun.