Key Takeaways
- Cloud-native infrastructure offers application-centric abstractions.
- Running cloud-native infrastructure is about more than technology; it’s also about creating a learning culture.
- Think of infrastructure as code as a one-off script you can run, and of infrastructure as software as a long-running service.
- Remember that your system can be refined in endless ways, but if it doesn't eliminate bottlenecks then it was wasted work.
Think that running your virtual machines in the cloud gives you cloud-native infrastructure? Not so fast, say Justin Garrison and Kris Nova in their upcoming book. Cloud Native Infrastructure: Patterns for Scalable Infrastructure and Applications in a Dynamic Environment from O’Reilly Media is a collection of guidance about building and managing modern infrastructure. InfoQ reached out to the authors to learn more about what they’re proposing, and how you can act upon it.
InfoQ: You make it clear that using Infrastructure-as-a-Service doesn't automatically make your infrastructure "cloud-native." How do you describe cloud-native infrastructure?
Garrison: Cloud native infrastructure is about creating reliability out of chaos and useful abstractions for applications. Cloud native applications enable you to run them when you don't control the infrastructure. It forces engineers to write applications that are controlled by software and expose the necessary operability hooks in the application instead of through typical infrastructure or human processes.
It's hard to use IaaS directly because there's too much chaos when you don't control the underlying technology and typical IaaS components are not directly consumable by applications. You can do it. Netflix's original cloud native infrastructure was done with VM images and deploying IaaS directly. There are trade-offs with that approach and newer technology allows more reliability and better abstractions.
Nova: Justin is right, IaaS is just infrastructure with an API on top of it. Cloud native is a complete and fundamental shift in the way we engineer these abstractions.
InfoQ: What was the motivation to write this book? It's obviously a lot of work, so what do you hope to accomplish (besides fame and fortune)?
Garrison: We wanted to stop people and companies from repeating mistakes we saw when trying to adopt or migrate to the cloud. There was a lot of confusion about why the cloud is different for infrastructure, and we wanted to show how other companies have been successful, along with the patterns that help others learn from their successes and failures.
We also wanted to give back and grow the community we are both a part of. The larger the community, the more diverse and the stronger we will all get.
Nova: I personally love challenging myself. That is why I am always climbing more and more dangerous mountains. The book is the same thing for me, it was hard, and that made it attractive. Furthermore, this is something I was very passionate about. I spent my entire career building up to these lessons, so getting to encapsulate my knowledge in the form of text was about the closest thing to therapy I could imagine. Writing felt good.
InfoQ: You say that cloud native infrastructure is something you can't buy. What do you mean by that?
Garrison: As much as some companies want you to believe you can buy a product that gives you cloud native infrastructure, it's not something you can just throw money at and be done. Many parts of cloud native are an evolution of DevOps practices that require new ways of working and a learning culture to benefit from it. If you have a slow-moving, highly-regulated, or traditional on-prem environment you likely won't benefit from cloud native infrastructure.
You should find the pros and cons and figure out for yourself if it solves your problem. We try to be clear about that in the book. It's not a magic bullet and needs to be adopted when applicable. When building reliability on top of chaos you'll need to spend time to learn how your systems behave and adapt accordingly. You can't buy cloud native infrastructure, but you will pay for it.
InfoQ: You are careful in the book to not frame solutions around too many specific products. But how do you approach "build vs. buy" when establishing cloud-native infrastructure? How much should a company--who isn't in the business of building infrastructure platforms--build themselves? What components make the most sense to buy (or adopt existing OSS products)?
Garrison: The most cloud native infrastructure that exists is the one you don't have to run. If your applications will run on a serverless platform like AWS Lambda, Google App Engine, or Azure Container Service, by all means use it; that will be the most cloud native infrastructure you can run.
The reality is, many applications won't run on cloud provided services without some infrastructure management on your part. Use existing services when you can, use existing products and OSS projects when they solve your needs, and build infrastructure applications as a last resort. You can't be afraid of lock-in because there's always lock-in to some degree. The absolute worst type of lock-in is the one you build yourself.
InfoQ: In the book, you make a key distinction between "infrastructure as code" and "infrastructure as software." Explain what you mean there, and why that distinction matters.
Garrison: When Kris first told me about her ideas for how these were different I loved it! "Infrastructure as code" has evolved from the configuration management days, when it described how a single server would be configured, and now also covers Terraform modules that describe how infrastructure should be provisioned. But the "as code" typically means it's limited in scope (a server) or only applied sometimes (not always enforced).
"Infrastructure as software" tries to take the code one step further and make sure that it's always applied (software is literally running code), and it works using the reconciler pattern we explain in detail in the book. The infrastructure applications need to continually drive your infrastructure toward a desired state. Examples of this pattern can be seen in how Terraform works and in Kubernetes' controllers. You don't always have to write infrastructure applications, but when you do, they should use these patterns.
Nova: Infrastructure as code operates in one direction, and at one time. Infrastructure as software operates in both directions, and runs over time. I think of infrastructure as code as a one-off script you can run, and infrastructure as software as a service. Something you would `systemctl start myInfrastructureAsCode` for instance. In regard to the directions, infrastructure as code typically is read-only (like Terraform) whereas infrastructure as software is a negotiation between the software that is running and the declared state a user intended. Infrastructure as software often will mutate data and infrastructure.
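The reconciler pattern mentioned above can be illustrated with a minimal sketch. This is not code from the book or from any real tool: `FakeCloud`, `reconcile`, and the server names are hypothetical, and the "cloud" is just an in-memory set. The idea it tries to show is that a single reconcile pass compares declared state with actual state and mutates the infrastructure to close the gap, and that running the same pass repeatedly is what turns a one-off script into a service.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud provider API."""
    servers: set = field(default_factory=set)

    def list_servers(self):
        # Return a copy so callers cannot mutate provider state directly.
        return set(self.servers)

    def create_server(self, name):
        self.servers.add(name)

    def delete_server(self, name):
        self.servers.discard(name)


def reconcile(desired, cloud):
    """Run one reconcile pass and return the list of actions taken.

    Observes actual state, diffs it against the declared state,
    and mutates the cloud until the two match.
    """
    actual = cloud.list_servers()
    actions = []
    for name in sorted(desired - actual):   # declared but missing
        cloud.create_server(name)
        actions.append(("create", name))
    for name in sorted(actual - desired):   # running but not declared
        cloud.delete_server(name)
        actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    cloud = FakeCloud(servers={"web-1", "old-db"})
    desired = {"web-1", "web-2"}

    # First pass: behaves like a one-off "as code" run.
    print(reconcile(desired, cloud))

    # Simulate drift: someone removes a server out of band.
    cloud.delete_server("web-1")

    # A later pass notices the drift and repairs it. A long-running
    # service would call reconcile() on a timer or in response to events.
    print(reconcile(desired, cloud))
```

In this sketch the second pass is the part a run-once script would never perform: it is only because the reconciler keeps running that out-of-band changes get corrected.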
InfoQ: You both seem to reinforce that "cloud native infrastructure" relates to public cloud. Is this something achievable and worthwhile in a private environment?
Garrison: Achievable? Of course. Many companies do it and from what I've heard even a couple of them are happy. Worthwhile? Depends on scale and requirements. Running your infrastructure in a public cloud is about more than just technical benefits. As a matter of fact, public cloud will likely be technically inferior in a lot of ways compared to on-prem servers. The VMs are likely slower, the network may be less configurable, and the cost will seem higher compared to buying servers.
A major benefit of public cloud comes from process rather than performance. The people hours you can save from becoming an infrastructure consumer rather than an infrastructure builder will be very difficult to calculate but will likely enable a new method of working that far outweighs the technical limitations of a public cloud. Not to mention some of the best infrastructure builders and maintainers in the world work at public cloud providers and the companies behind them spend billions every year building out the infrastructure, R&D, and new features.
The biggest consideration when building your own cloud is not what it will cost you to build the private cloud, but what it will cost you to maintain it (a much larger price than building it) and what happens when you fall behind public cloud offerings. Even if what you build today is technically superior, where do you think you'll be in 3 years? How about 5 years?
You also need to remember that cloud native infrastructure is built on top of IaaS. So once you have built your own private cloud, you still need to create a usable platform for applications to consume. You know, the thing your business cares about.
InfoQ: For those readers who have very manual infrastructure with only light automation in place, where do you recommend that they start spending their time? Where will they see the most value?
Garrison: That's a tough question to answer because every application and infrastructure is different. In general, one of the biggest impacts we've seen to changing how companies run applications is by adopting a container orchestration platform such as Kubernetes. Kubernetes isn't an infrastructure provider, but it is an abstraction layer on top of infrastructure and it forces you to release some manual control of how applications are run.
Even if you manually provision Kubernetes nodes on-prem or use a hosted service such as Google Container Engine you can get a lot of benefits from letting software (Kubernetes controllers) manage software (your applications).
Nova: Start tinkering. I wrote terraformctl as an example of what people might start wanting to do, but my biggest piece of advice is that if something is hard or intimidating, you should keep doing it. I want to encourage people to stretch out of their comfort zones.
InfoQ: Conversely, for those who have advanced cloud-native infrastructure, where should they be looking next? Where can they keep refining the experience to drive value?
Garrison: When doing research for the book we talked to a lot of people who have been defining this space for a while. Each of them seemed to have different plans for the next goals for their infrastructure, but in many cases it came back to a couple of things: using DevOps and lean practices to deploy, learn, iterate, and fail as fast as possible, and always attacking the bottleneck of the system. The system can be refined in endless ways, but if it doesn't eliminate bottlenecks then it was wasted work. Making sure the infrastructure enables engineers to deploy the right thing as quickly as possible is a good goal. This includes deploying applications as well as infrastructure.
In all of these systems, people are the most important and time is the most valuable. You can endlessly optimize for those two things and gains here will likely have a bigger value than any technical solution you can implement.
About the Interviewees
Kris Nova is a Senior Developer Advocate for Microsoft with an emphasis on containers and the Linux operating system. She lives and breathes open source. She believes in advocating for the best interest of the software, and keeping the design process open and honest. She is a backend infrastructure engineer with roots in Linux and C. She has a deep technical background in the Go programming language, and has authored many successful tools in Go. She is a Kubernetes maintainer, and the creator of kubicorn, a successful Kubernetes infrastructure management tool. She organizes a special interest group in Kubernetes, and is a leader in the community. Kris understands the grievances with running cloud native infrastructure via a distributed cloud native application, and is authoring an O'Reilly book on the topic called Cloud Native Infrastructure.
Justin Garrison loves open source almost as much as he loves community. He is not a fan of buzz words but searches for the patterns and benefits behind technology trends. He frequently shares his findings and tries to disseminate knowledge through practical lessons and unique examples. He is an active member in many communities and constantly questions the status quo. He is relentless in trying to learn new things and giving back to the communities who have taught him so much.