
Framework Defined Infrastructure (FdI) – an Evolution of Infrastructure as Code (IaC)


Summary

Malte Ubl discusses the future of infrastructure management with Framework-defined Infrastructure (FdI), an evolution of the industry-standard Infrastructure as Code (IaC).

Bio

Malte Ubl is the CTO of Vercel. Prior to joining Vercel, Malte was the Principal Engineer for Google Search Rendering and Engineering Director for Google’s Search on Laptops, Tablets, and Desktop. Malte has also created the frontend infrastructure for a number of Google Web Apps and the web at large. He is also the founder and curator of JSConf EU.

About the conference

Software is changing the world. QCon San Francisco empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Ubl: My name is Malte. I'm the CTO of Vercel. I want to talk about framework-defined infrastructure in quite some depth. Vercel is a frontend cloud. It gives your teams the tools and the capabilities to develop, preview, and ship at the moment of inspiration. That doesn't say too much. This is the picture that I like to give you folks to understand what we're doing. You're building a web app, you're throwing it over the fence, and we're serving it for you, which is nice. You don't have to carry a pager necessarily. We're also generally doing a really good job, because it's our job.

To make this a little bit more clear: if you're in a modern development environment, for example using GitHub, and you're making pull requests, then we will come and take your pull request, deploy it on our platform, and give you a dedicated preview URL so you have the ability to look at your changes. We do that every single time you push a commit. Finally, when you're pushing to production, we're redeploying your site as well. As we're doing that, and this is actually really important for this talk, we are looking at your application. We're analyzing it.

For example, here we have a Next.js application. We're looking at what's inside of the application, and then we're deploying it: for example, its middleware, its assets. I put this screen here because this is really important for this presentation. Another thing we do is we make a framework called Next.js. It's the most popular React framework by far, it gets a whole bunch of downloads, and all the greatest brands on the web are using it.

Framework-Defined Infrastructure

Let's talk about framework-defined infrastructure. Werner Vogels, CTO of Amazon, gave this talk at, I think, re:Invent, and he had this slide up there: primitives, not frameworks. Again, we're talking about framework-defined infrastructure. How is it related to that slide? You've got me basically telling you it's time to actually think about frameworks, not primitives. I'll tell you why, and I'll also tell you when that might not be super appropriate. Let's take a slight step back here and look at how we used to deploy infrastructure. In the '90s, we SSH'd into machines, or worse. I'm definitely the FTP generation. It was rough out there.

Obviously, that didn't quite work, so we started doing semi-automated deployments. That can be anything from basically a bash script to something slightly more sophisticated. I think that then evolved relatively naturally into infrastructure as code, where you formalize how you express infrastructure and then automatically deploy it. I'll talk about framework-defined infrastructure as a bit of an evolution. It's actually related to something slightly different, which is called infrastructure from code. I'll talk in this talk about how these are related, but also different.

Why do we care about framework-defined infrastructure? It provides us portability between different target infrastructure providers, so no lock-in. I'll talk about that. It eliminates the need to manually configure infrastructure to run an application in production, which is nice. Which also means I now have more time to write product code instead of doing system management. Really important: I can use my favorite framework's local development tools unchanged. I don't have to figure out how to run my sophisticated server on my local machine.

Because there's this indirection in how infrastructure is defined, we can actually only use the stuff that the security team OK'd. That's great. I used that word infrastructure a lot. Infrastructure is network equipment and setups, web application servers, databases, message queues. This obviously is not a complete list. That's not what it's meant to be. It's the stuff that you have in a data center, and we have to deploy it. I think what's slightly more complicated is the question of, what is a framework? The way I think about it, at the core of the answer to that question lies the Hollywood Principle.

The Hollywood Principle is: don't call us, we'll call you. That really summarizes how a framework is different from a library. With a library, you call the library. It has a function. It's a nice function, you call it. In a framework, you write code to a pattern that the framework then invokes while it is managing a lifecycle, which is very powerful. That can be imperative, where we pass a callback, or, which is really popular today, and certainly in Next.js, which we make, it works by having files that are in certain places. These files export functions, and they get called. There are many ways to do it. The common pattern is that the framework calls you.
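A minimal sketch of that file-based pattern, with illustrative contents that aren't from the talk:

```tsx
// pages/hello.tsx
//
// With a library, my code calls the library's function directly.
// With a framework, it's the other way around: I write to a pattern
// (a file under pages/ that default-exports a React component), and
// Next.js calls this function for me whenever the /hello route is
// requested. I never invoke it myself.
export default function HelloPage() {
  return <p>The framework called me.</p>;
}
```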

The framework controls the lifecycle of the system. That led me to this interesting observation that frameworks are really solving the halting problem. Not in general, obviously not. What they do is they allow us to understand what a program is going to do. I can't, in general, look at a program and understand what it's going to do. If I know it complies with the framework, and I understand the framework's flow and lifecycle, then I can actually understand whether this program will halt. That forms the basis for what then enables us to do framework-defined infrastructure, FdI. The way it works is that you write code in a framework dialect. What I mean by a dialect is that it might be Go, or it might be C++, or it might be TypeScript. You still use TypeScript.

It's not necessarily a different, specialized DSL-type language. That's not what I mean. What I mean is that you comply with the words that the framework gives you. That's really why it's a dialect. That structure that comes from that dialect, we take that into our framework-defined infrastructure compiler and compile your program into the infrastructure as code definition. Coming back to Vercel's platform, we support a bunch of frameworks, I think 35 overall. We have a compiler per framework, where we look at the code written to the framework, and then we map it to the infrastructure primitives that we can deploy to.
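To make that concrete, here is a deliberately simplified, hypothetical sketch of the idea of such a compiler. The type names, the `compile` function, and the mapping rules are invented for illustration; this is not Vercel's actual implementation.

```ts
// An FdI compiler, reduced to its essence: inspect files written in the
// framework's dialect and emit an infrastructure-as-code style definition.

type Primitive =
  | { kind: "serverless-function"; entry: string }
  | { kind: "static-file"; path: string }
  | { kind: "edge-worker"; entry: string };

interface SourceFile {
  path: string;
  exports: string[]; // e.g. ["default", "getServerSideProps"]
}

function compile(files: SourceFile[]): Primitive[] {
  return files.map((file) => {
    if (file.path === "middleware.ts") {
      // Middleware runs in the routing phase and needs to run globally.
      return { kind: "edge-worker", entry: file.path };
    }
    if (file.exports.includes("getServerSideProps")) {
      // Compute is needed on every request.
      return { kind: "serverless-function", entry: file.path };
    }
    // getStaticProps (or no data fetching at all): render at build time
    // and serve the result as a static file.
    return { kind: "static-file", path: file.path };
  });
}

console.log(
  compile([
    { path: "pages/index.tsx", exports: ["default", "getServerSideProps"] },
    { path: "pages/blog.tsx", exports: ["default", "getStaticProps"] },
    { path: "middleware.ts", exports: ["middleware"] },
  ])
);
```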

Anatomy of the Modern Frontend Framework

Because not everyone's a frontend expert, I want to really quickly talk about the anatomy of a modern frontend framework. There are many of them; we're supporting 35. They have certain things in common. That is really important. Because I want to use some examples in this talk, I wanted to bring everyone up to speed on the kinds of things you'll find in a modern framework, so you can abstract a little bit and follow along even if you're not a frontend expert. They almost all provide something called isomorphic rendering, which is the ability to write some code and run it on the server and on the client, without really changing too much of what you do based on the environment.

There's always some kind of routing, which is in general something that frameworks do, where you have some request and you map it to some function or functionality you want to run; that's a core framework function. Because here you have the request, and then the framework calls you, and it figures out what's going on in the middle. That's routing.

There are things like pages and API routes. Pages in a frontend framework are the things that users see. API routes are routes that get invoked indirectly by other machines. We often have something like middleware, where we can write code that lives in that routing space and changes how routing works. There's clearly data fetching and caching, and stuff like asset optimization, something you would find in a frontend framework. Very specific to frontend, because you serve images and stuff like that.
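To make the pages versus API routes distinction concrete, a minimal API route in the Next.js Pages Router dialect might look like this (illustrative contents, not from the talk):

```ts
// pages/api/hello.ts
//
// A default-exported handler that the framework calls for requests to
// /api/hello. This is an API route: it is invoked by other machines,
// not rendered for a user to see.
import type { NextApiRequest, NextApiResponse } from "next";

export default function handler(req: NextApiRequest, res: NextApiResponse) {
  res.status(200).json({ message: "Hello from an API route" });
}
```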

Example

With that basis, let's look at an example. On the left you have some files that you'll find in a typical Next.js application. Let's go through that. The first one up top under your code is pages/index.tsx. It has a function called getServerSideProps. Again, this is the dialect of a framework. What it means is that every time that page gets invoked, I have to run some compute. That is why our compiler provisions a serverless function to actually execute that compute. Let's go one down, pages/blog.tsx. This one has a getStaticProps method. Again, in the dialect of the Next.js Pages Router, what that means is that this page can be rendered at build time.

That means we will then do that at build time. All we have to deploy in production to serve that page is a static file. That's why it goes on to the high-performance static file service. The next one is just an image file; it also goes out to the high-performance static file service. The next file is middleware.ts. That's a bit of a different beast; it runs in the routing phase and needs to run globally. We deploy a global edge worker to perform that task.

Then, finally, we have a JPEG image. We could serve that with the high-performance static file service, but for larger images, it makes sense to do image optimization. We deploy an image optimization service to help you serve that image in perfect quality. All of this knowledge about how each file maps to which service has to be somehow routed. That's why we take apart that application, deploy it, and write that to our application routing table, which then allows our gateway service, the one that takes all the ingress traffic to our platform, to route requests to the correct infrastructure for that customer.
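A hypothetical, heavily simplified sketch of the kind of routing table the gateway might consult for this deployment; the shape and field names are invented for illustration:

```ts
// One entry per analyzed source file, mapping an incoming path to the
// infrastructure primitive that was provisioned for it.
const routingTable = [
  { match: "/", target: { kind: "serverless-function", id: "fn_index" } },
  { match: "/blog", target: { kind: "static-file", id: "blog.html" } },
  { match: "/hero.jpg", target: { kind: "image-optimization", id: "hero.jpg" } },
  // Middleware runs in the routing phase, before the targets above.
  { match: "/:path*", phase: "routing", target: { kind: "edge-worker", id: "middleware" } },
];
```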

Let's look at this in a little bit more detail, and also look at the lifecycle here. Coming back to this blog post example, you can almost ignore that first export default function. That's just the code that actually renders the blog post. We don't really care about it in this example. We have this file, and we have this export, getServerSideProps, which I already mentioned. Again, we don't really care about the contents here. That method being present means that we need to deploy that serverless function.

We push it, Vercel deploys a serverless function. Next up, I'm going to go to my text editor and make a change: getServerSideProps, let's change it to getStaticProps. Again, this is a change to the source code. We saw in the previous diagram that if we have getStaticProps, we can actually render the page at build time and deploy it to production as a static file. Just the code changed, you redeploy, and the infrastructure that is needed for this page changes. Vercel will just do that for you. A final change. It's a little bit more subtle. I'm adding this one line here, revalidate: 60; again, we're going back and we're going forward, revalidate: 60.

What this means is that every 60 seconds with traffic, this page will be rerendered. Outside of those renderings, we serve a static file. We need to deploy both the static file plus the ability to rerender the page, so we need another serverless function, plus the ability to, in the background, perform the invocation against that serverless function, and then write it to our static file serving system. Again, one line of code and a whole bunch of orchestration all has to happen. That all comes from the FdI compiler understanding what you do. Just to complete the example, finally, we got the middleware.
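As a minimal sketch of what that final version of the page might look like in the Pages Router dialect (the data source URL and component are illustrative):

```tsx
// pages/blog.tsx
//
// getStaticProps means the page can be rendered at build time and served
// as a static file. The single added line, revalidate: 60, means that
// with traffic the page is re-rendered in the background at most every
// 60 seconds, which is what requires the extra serverless function and
// background orchestration described above.
export async function getStaticProps() {
  const posts: { title: string }[] = await fetch(
    "https://example.com/api/posts" // illustrative data source
  ).then((r) => r.json());

  return {
    props: { posts },
    revalidate: 60, // the one extra line that changes the infrastructure
  };
}

export default function Blog({ posts }: { posts: { title: string }[] }) {
  return (
    <ul>
      {posts.map((post) => (
        <li key={post.title}>{post.title}</li>
      ))}
    </ul>
  );
}
```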

Again, there's just this one file in my project, but it leads to the global provisioning of the edge worker that can actually perform this work. Eventually, you deploy to Vercel, and we give you this deployment summary. Again, you get this every time you push any change to your Git provider. We will then determine the infrastructure that you need and tell you what we deployed. Because you didn't tell us; you only gave us your program, and then we compiled your program to this infrastructure.
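For reference, a minimal middleware file of the kind that triggers that provisioning might look like this; the rewrite logic is illustrative, not from the talk:

```ts
// middleware.ts
//
// The mere presence of this file in the project means an edge worker is
// provisioned globally to run it during the routing phase.
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

export function middleware(request: NextRequest) {
  // Illustrative example: rewrite /old-blog/* to /blog/*.
  if (request.nextUrl.pathname.startsWith("/old-blog")) {
    const url = request.nextUrl.clone();
    url.pathname = url.pathname.replace("/old-blog", "/blog");
    return NextResponse.rewrite(url);
  }
  return NextResponse.next();
}

export const config = {
  matcher: ["/old-blog/:path*"],
};
```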

Where Does Vendor Lock-In Actually Come From?

I want to pivot a bit. This is the basics of FdI. This is how it works. I now want to look at various aspects of it. The first one, which I think is really key and where FdI really differentiates itself from other ways of doing things, is that it's different in terms of vendor lock-in. I want to be very open about this. Vercel is, very broadly speaking, a platform as a service company, and those have a reputation of having vendor lock-in. Let's look at where it actually comes from. I'm going to walk through a few files. This is Terraform. I'm not a fan of Terraform. Clearly my editor doesn't even have syntax highlighting. I didn't use it all that much. That's ok. We can all read Terraform.

Anyway, what I really wanted to point out is that this is Terraform to deploy a Lambda function. In Vercel, all we did was look at your file, and we deployed a serverless function, but you didn't say you want a Lambda or a Google Cloud Run. Here in this Terraform file, you very clearly say what you want. You're now locked into AWS, because you can't just deploy this to Google; Google doesn't have Lambda. You're directly connecting yourself to your deployment target. This is another vendor. They have a CLI called wrangler. If you want to deploy their stuff, you need to use their CLI, so you're locking yourself in. I even changed the name of the vendor here. They have this toml file. They invented some syntax to define headers. Sure, it seems fine.

Again, you're locking yourself to the vendor because you're writing code to their conventions. That's not how Vercel works. That's not how FdI works. There's this customer, a very large customer of ours, and they shared their source code with me, which, obviously, you don't have to do, but they did. I was looking at their entire large application to see if they did anything Vercel specific, and the only mention of Vercel in their entire 100,000-plus line code base was that they Git-ignored this build artifact that we make when you're deploying. This entire application, which is very sophisticated and makes literally a billion dollars a year or more, gets deployed to our service, and it has no idea that that's where it runs. It has that gitignore entry. That's really nothing. It's all fully automatic. All you do is write code to the conventions of whatever framework you pick.

Local Dev Experience with FdI

This is actually also really important for my next point, which is the local development experience with FdI. If you've ever tried to go and deploy anything with a sophisticated cloud setup, you probably have something like a Docker Compose situation; running Lambda locally is incredibly difficult. On the other hand, you're probably using a framework. If you look at your framework, it usually comes with some instructions for how to run it locally. The framework is really incentivized to help you make that a good experience.

In our experience, they are really good at it. We make a framework, and we invest so much time in making the local development experience good. Vercel does not have a local development experience. You just use whatever your framework tells you to do. You run the command on the Getting Started page. That's all you have to do. You get the perfect, best-in-class development experience locally.

Immutable Deployments

The next thing I wanted to mention is the notion of immutable deployments. I already mentioned that on Vercel, whenever you push, we deploy for that event. You committed a git SHA, and we give you an infrastructure for it. Let's look at an example. You make a git commit, whatever, abc123. You deploy that git commit, and we give you a virtual infrastructure for that git commit. Remember back how I changed getServerSideProps to getStaticProps, and that change to my program changed the infrastructure that I needed. Let's imagine I commit that change and deploy it.

You don't want to change the old infrastructure because everyone's committing code. That's not how it works. Instead, what we do is we give you a virtual infrastructure for every single git commit. Obviously, there is some deduping with content addressing. From a developer point of view, every single time you get a completely new infra. There are many reasons why this is amazing. One of our most popular features is instant rollback. Because when you deploy to production, the last production deployment is still there. If you realize, I have a bug, you can just go back, and not just 1, 2, 3, 4; you can go back as far as you decide, really.

We don't actively delete deployments, unless you set up a policy for that. It's called instant rollback because it doesn't do, like, a k8s rollback that then does a bunch of stuff. Really, all that's happening is that you have these immutable deployments. When you say www.mydomain.com now points to the other deployment, all that happens is that a message goes out globally to our serving stack saying, serve this other deployment. It's effectively a bit flip compared to the traditional way of doing this.
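A hypothetical sketch of why that rollback is effectively a bit flip: deployments are immutable and keep existing, and a domain is just a pointer to one of them, so rolling back only moves the pointer. The data shapes here are invented for illustration:

```ts
// Immutable deployments are never rebuilt or mutated; they accumulate.
const deployments = ["dpl_001_abc123", "dpl_002_def456", "dpl_003_ghi789"];

// A domain is simply an alias pointing at one immutable deployment.
const alias: Record<string, string> = {
  "www.mydomain.com": deployments[2],
};

function rollback(domain: string, target: string) {
  // No infrastructure changes; the serving stack is just told to point
  // the domain at a different, still-existing deployment.
  alias[domain] = target;
}

rollback("www.mydomain.com", deployments[1]);
```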

Maybe as a bit of bonus content, that enables something very powerful that we call skew protection, where we actually allow clients to force the server to run at the same version, so that you don't have to worry about the server having changed under you and now communicating in a different fashion from what you expected. Again, that's possible because we can actually serve different deployments, effectively infinite deployments, at the same time.
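A hypothetical illustration of the skew-protection idea; the header name and routing behavior are invented for illustration and are not Vercel's actual protocol:

```ts
// The client is built against one immutable deployment and remembers its
// id (for example, baked in at build time).
const CLIENT_DEPLOYMENT_ID = "dpl_002_def456";

async function callApi(path: string): Promise<Response> {
  // The client asks to keep talking to the exact version it was built
  // against, so the API contract cannot change underneath it.
  return fetch(path, {
    headers: { "x-deployment-id": CLIENT_DEPLOYMENT_ID },
  });
}

// On the serving side, a gateway honoring this header would route the
// request to that immutable deployment rather than the newest one.
```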

Does FdI Depend on Serverless?

This is an interesting topic. I think what it does bring up is whether FdI depends on serverless, because I called it virtual infrastructure, and that really only works if you have some notion of serverless infrastructure. Because if you actually have a computer, and you need it for every deployment, and every git commit makes a deployment, then you really quickly run out of computers. You need some notion of something that's serverless, and in particular, something that scales to zero, so that if you have that old deployment in theory, but it gets no traffic, it doesn't take more resources than whatever the storage cost of actually holding the data, which shouldn't be all that much. Yes, we need serverless, but if you're really determined, you could probably make it work without.

Does It Work for Stateful Workloads?

Next question, does it work for stateful workloads? What I mean by that would be something like a database. It's definitely much harder, but it's not impossible. What it really means is that you need some way of provisioning infra lazily when usage is discovered. Say you want to connect to a database: ok, then we make one. You need some way of isolating changes to their branch. In Rails, there were these database migration scripts, which are best practice nowadays.

If they operate on your core database, then you don't want to run them on a branch or for every commit. Obviously, most databases don't have this. You have an instance here, an instance there, but I think what we are seeing is that there's a trend of even stateful workloads coming towards FdI. For example, we are seeing a provider like Neon, which is a Postgres provider, support copy-on-write branching of your database. They can actually give you a copy at very little cost every time you deploy, and really bring you closer to that FdI vision, even for a stateful workload.
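A hypothetical sketch of what that could look like in an FdI world: when a preview deployment is created for a git branch, a copy-on-write database branch is provisioned and its connection string is handed to that deployment only. `createDatabaseBranch` is an invented helper, not a real Neon or Vercel API:

```ts
declare function createDatabaseBranch(opts: {
  from: string;
  name: string;
}): Promise<{ connectionString: string }>;

async function provisionPreview(gitBranch: string) {
  // Copy-on-write branch of the production database: cheap to create,
  // isolated to this preview deployment.
  const dbBranch = await createDatabaseBranch({
    from: "main",
    name: `preview-${gitBranch}`,
  });

  return {
    env: { DATABASE_URL: dbBranch.connectionString },
  };
}
```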

FdI Tradeoffs

Having said that, there are definitely tradeoffs that I want to acknowledge. Especially if you're a DevOps person, it can feel a little bit off that you don't really know what's going on in production, because it comes from the code, so you have to deploy to find out. Again, we tell you exactly what we did when we deploy, but there's not this Terraform file you can just look at and see, this is what's going to happen. There's definitely this indirection. As I was mentioning, especially with stateful workloads, this can be a bit of a leaky abstraction.

If you have only one database instance, then it can't map to this range of deployments. In your observability strategy, you have to account for this ephemerality of resources: you can't do a top-down observability tree because things are bubbling up bottom-up, and so you have to account for that. It's manageable, but I want to acknowledge that it's a thing. Then there's this continuum of approaches. The original infrastructure as code is really, typically, a declarative thing, where you wrote down, this is the infrastructure that I want.

Maybe you actually wrote imperative code, but you said what you wanted. Then there is infrastructure from code, certainly an evolution of infrastructure as code, where you write code that actually does business logic, but you're using a specific SDK that is bound to the infrastructure. It's very tightly bound to the infrastructure. That is very related to FdI, because you're writing business logic code, but it doesn't have this extra level of indirection where you really are not thinking about infrastructure, you're really thinking about your framework, and the compiler takes the logic out of the framework to make things work. This is definitely a continuum, and these terms are related.

Examples Outside of Vercel

There's a bunch of examples outside of Vercel. I think Google Service Weaver is a really interesting technology, in a very different scope. They're thinking about how to deploy serverless services into a microservice architecture. I think Deno KV is actually a really interesting example. They have this KV store that works locally, works in production, and is really seamless in the deployment flow. Then there's stuff like Encore.dev and the Nitric framework, and they're definitely on that continuum. It's more like infrastructure from code, somewhere in between. These are all worth checking out.

Summary

With that, hopefully I've convinced you that it's worth paying attention to frameworks, and not only thinking about directly programming the primitives. It's about portability between different target infrastructure providers. We want to eliminate the need to manually configure infrastructure to run an application in production. We want to increase the time spent writing product code over system management. We want to use whatever our framework gives us for local development.

Finally, as a security team, we want to know what's going on in production. We love that there's a system that only allows folks to do what the system actually supports, doesn't have an easy way of bypassing it, and doesn't have folks deploying whatever they want to production.



Recorded at:

Aug 30, 2024
