At DockerCon EU 2015, InfoQ sat down with Gareth Rushgrove, a senior software engineer at Puppet Labs, and explored the concepts behind his conference presentation “Shipping Manifests, Bill of Lading and Docker”.
The topics discussed included the benefits of the metadata (manifests) carried by system packages, how Docker labels can be used to provide this kind of metadata for containers, and the importance of defining shared standards so that the software supply chain can be managed and audited.
InfoQ: Hi Gareth, and many thanks for talking to InfoQ today. I'm sure many of our readers know you as the maintainer of the 'DevOps Weekly' newsletter and creator of the Puppet Docker module, but could you introduce yourself and your current interests please?
Rushgrove: Absolutely. Hi, I’m Gareth Rushgrove. I’m a developer and occasional systems administrator based in Cambridge, in the UK. I work as a senior software engineer at Puppet Labs, building tools to help people manage infrastructure and applications. I have a thing for new infrastructure ideas, so I’ve been using Docker pretty much since the first release. I’m particularly interested in management-over-time problems: how you maintain software for the long term, not just bring up a great demo. I occasionally write about infrastructure software and security on my blog, More than Seven.
InfoQ: At DockerCon EU you discussed the benefits of manifests (and metadata), particularly in the context of installing software with package managers such as apt or yum. Could you explain a little about this please?
Rushgrove: In my DockerCon talk I spoke about how the file format of system packages was never the interesting part. RPM or DPKG packages carry lots of metadata, like who wrote the software, what the package contains, what license the software carries, etc. It’s this metadata which leads to all the powerful tools built on top of these formats — for example, for asking where a particular file comes from or which packages need updating. Having this information locally becomes important if you want to manage systems, not just install software.
I went on to talk about how this is a missing piece in the Docker ecosystem today. Recent versions of Docker itself have excellent support for labels, which gives us somewhere to store that metadata. But as a user community, we haven’t really adopted labels in a consistent way. So your organisation, or individual projects like OpenShift, might have a standard for labels, but we don’t have a way of sharing schemas or building consensus, and eventually tools, around shared metadata.
InfoQ: We are increasingly hearing about security, compliance and auditability issues within the DevOps space, and concepts such as building 'rugged' software are being discussed at many conferences. How important do you believe the software supply chain is within this space?
Rushgrove: Applying supply chain ideas to software is, in my view, critical to building better software. The 2015 State of the Software Supply Chain report has lots of data and observations about the scale of the problem we face. Security just hasn’t been most people’s focus, and the rate of change in both the tools and the software we create with them has posed a problem for more traditional security approaches. As a result, many of the frameworks and practices we use are not secure by default. And security is hard, if not impossible, to add on later.
The opportunity we have with containers is to build tools that are secure by default around a single, shareable artifact, using standardisation to simplify things. Security shouldn’t be as expensive and artisanal as it is today.
InfoQ: You championed the use of labels within the Docker runtime and associated suite of tools. Can you explain more about this in the context of manifests and bills of lading?
Rushgrove: Labels were added to Docker back in 1.6. They provide the ability to store simple key-value pairs against images, containers or the Docker engine. The built-in tools then allow for using those labels to filter, say, a list of containers or images, or to schedule containers against valid runtimes. But the labels are also exposed in the API for people to build tools on top of. In my talk, I built on the shipping container analogy that Docker uses. The shipping containers themselves are neat, but the whole shipping industry wouldn’t function without all the paperwork that tells you what is in a container, which containers are on a ship, and who owns the contents. Labels provide the ability to store the equivalent paperwork for our Docker containers, but what do we put in the labels?
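To make the key-value model concrete, here is a minimal sketch of label-based filtering of the kind the built-in tools offer (for example, `docker ps --filter "label=com.example.team=payments"`). It is an in-memory illustration, not Docker's implementation: the `Container` class, the `com.example.*` label keys and the rule that every expression must match are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A minimal stand-in for a container record: a name plus its labels."""
    name: str
    labels: dict[str, str] = field(default_factory=dict)


def matches_label_filter(labels: dict[str, str], expr: str) -> bool:
    """'key' matches when the label is present; 'key=value' also requires that exact value."""
    key, sep, value = expr.partition("=")
    if not sep:
        return key in labels
    return labels.get(key) == value


def filter_by_labels(containers: list[Container], exprs: list[str]) -> list[Container]:
    """Keep only containers for which every filter expression matches (assumed AND semantics)."""
    return [c for c in containers
            if all(matches_label_filter(c.labels, e) for e in exprs)]


# Hypothetical containers labelled with hypothetical com.example.* keys.
containers = [
    Container("web", {"com.example.team": "payments", "com.example.tier": "frontend"}),
    Container("db", {"com.example.team": "payments"}),
    Container("cache", {"com.example.team": "search"}),
]

names = [c.name for c in filter_by_labels(containers, ["com.example.team=payments"])]
print(names)  # ['web', 'db']
```

Because labels are plain strings, any tool that can read them through the API can apply the same kind of query; the value of the data depends entirely on everyone using the same keys.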
InfoQ: Something I took away from your talk was the importance of standards within this space (and the general lack of compliance with existing guidelines that you demonstrated). Can you offer any more thoughts on this?
Rushgrove: Labels provide somewhere to store data, but it’s up to the user what to put in them. Docker publishes some basic guidelines about format; for example, label names should be reverse-DNS namespaced and all lower case. But from a very unscientific look at Dockerfiles on GitHub, it doesn’t appear many people follow the guidelines. I made a small tool, docker-label-inspector, to act as a linter for those guidelines. But beyond formatting, I think the names and content of labels are where standards become really useful.
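As an illustration of what such a linter checks, the sketch below tests a label key against Docker's published key guidelines: lower-case alphanumerics plus `.` and `-`, starting and ending with a letter, no consecutive separators, and a reverse-DNS prefix outside the namespaces Docker reserves for itself. This is not docker-label-inspector's code; `lint_label_key` is a hypothetical helper, and the reverse-DNS test is only a heuristic (it looks for a dot).

```python
import re

# Namespaces the Docker documentation reserves for Docker's own use.
RESERVED_PREFIXES = ("com.docker.", "io.docker.", "org.dockerproject.")

# Only lower-case letters, digits, "." and "-", starting and ending with a letter.
KEY_PATTERN = re.compile(r"[a-z](?:[a-z0-9.-]*[a-z])?")


def lint_label_key(key: str) -> list[str]:
    """Return the guideline problems found in one label key (an empty list means it passes)."""
    problems = []
    if key != key.lower():
        problems.append("use lower case only")
    if not KEY_PATTERN.fullmatch(key):
        problems.append("use only a-z, 0-9, '.' and '-', starting and ending with a letter")
    if ".." in key or "--" in key:
        problems.append("avoid consecutive '.' or '-' characters")
    # Heuristic only: a dot suggests a reverse-DNS prefix but cannot prove one.
    if "." not in key:
        problems.append("namespace the key with a reverse-DNS prefix such as com.example.")
    if key.startswith(RESERVED_PREFIXES):
        problems.append("the namespace is reserved for Docker")
    return problems


for key in ["com.example.release-date", "Maintainer", "com.docker.build"]:
    print(key, lint_label_key(key))
```

A check like this can only enforce format; it cannot tell you whether `com.example.release-date` means the same thing in two different images, which is where shared schemas come in.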
I think the ideal is that we reach some sort of consensus as a community about standard labels, and we develop schemas for those standards. For instance, what should the name be for the label that tells me the license of the container? What format should the value take? If we all decide independently, the labels are probably useful only to people, not to machines. But we could agree to use something like SPDX, and that opens up the possibility of interoperable tooling. Then we could build a whole range of independent tools to use that data. In my talk, I speculated about capabilities like documentation discovery, license verification, links to source code and release notes, and generated user interfaces, and showed a simple demo of a package search engine for containers.
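To show why an agreed name and value format matters, here is a minimal sketch of the license-verification idea. Everything in it is an assumption for illustration: the label key `com.example.license` is hypothetical (the interview's point is that no shared name exists), and `KNOWN_SPDX_IDS` is a tiny subset of real SPDX identifiers rather than the full list. It also handles only single identifiers, not SPDX expressions such as `MIT OR Apache-2.0`.

```python
# Hypothetical label key: no community-agreed name existed when this was discussed.
LICENSE_LABEL = "com.example.license"

# A tiny illustrative subset of SPDX license identifiers, not the full SPDX list.
KNOWN_SPDX_IDS = {"MIT", "Apache-2.0", "BSD-3-Clause", "MPL-2.0"}


def check_license(labels: dict[str, str], allowed: set[str]) -> str:
    """Classify an image's license label as missing, unrecognised, allowed or disallowed."""
    value = labels.get(LICENSE_LABEL)
    if value is None:
        return "missing"
    if value not in KNOWN_SPDX_IDS:
        # Free-text values like "Apache License" help a human reader, but not this tool.
        return "unrecognised"
    return "allowed" if value in allowed else "disallowed"


policy = {"MIT", "Apache-2.0"}
images = {
    "web": {LICENSE_LABEL: "Apache-2.0"},
    "worker": {LICENSE_LABEL: "Apache License"},
    "legacy": {},
}
for name, labels in images.items():
    print(name, check_license(labels, policy))
```

The "worker" image carries a perfectly reasonable human-readable value, yet the tool cannot classify it; that is the gap between labels useful to people and labels useful to machines.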
InfoQ: Thanks for talking to InfoQ today, Gareth. Is there anything else you would like to share with our readers?
Rushgrove: Just that if you’re interested in building tools on top of Docker labels, and coming to some agreement on community label schemas, get in touch via Twitter (@garethr) or drop me an email (gareth@puppetlabs.com).
The video recording of Rushgrove’s DockerCon EU 2015 talk “Shipping Manifests, Bill of Lading and Docker” can be found on the Docker YouTube channel, and the corresponding slides are available via Speaker Deck.