
Programming the Cloud: Empowering Developers to Do Infrastructure


Summary

Luke Hoban looks at some of the leading solutions across various different domains - from serverless to static websites to Kubernetes to infrastructure as code - to highlight areas where developers are starting to take ownership of cloud infrastructure more directly.

Bio

Luke Hoban is the CTO at Pulumi where he is re-imagining how developers program the cloud. Prior to Pulumi, he held product and engineering roles at AWS and Microsoft. He is passionate about building tools and platforms to enable and empower developers, and is a deep believer in the transformative potential of the cloud.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Hoban: My name is Luke Hoban, and today I'm going to be talking about programming the cloud. When I talk about programming the cloud, what I really mean is taking advantage of all these amazing building block services that we have in the cloud providers today. Things from AWS, from Azure, from GCP, from Kubernetes, from every other cloud provider that we work with on a daily basis have amazing building block capabilities that we're now building on top of with all of the applications we're delivering. What we really want to do is turn all those capabilities into something that we have at our fingertips that we can really take advantage of in a really natural way, not hand over to some operations team to do something for us, but how can we, as developers, actually take advantage of all these capabilities directly?

My interest in this topic comes from my background and the things I've worked on over the last 15 years or so. I spent most of the early part of my career actually building tools for application developers, so building programming languages, and IDEs, and frameworks at Microsoft. I helped start the TypeScript project at Microsoft. I did some of the early incubation work on what has now become Visual Studio Code. I worked on the ECMAScript standards body, a variety of these sorts of things. Worked on some of the tools that have helped application developers to scale up the complexity of the applications that they build.

I then went over and worked at AWS on EC2 and worked on some of the raw cloud infrastructure that we have now available to us. I got a front-row seat to the rate of growth that's going on inside the cloud, and also a front-row seat into how different the tools and practices are around how we take advantage of the cloud today. We do have these amazing building blocks, but the tools and practices are quite a bit more immature, in a sense, than what we have in the application development world.

Then, most recently, as Joe [Duffy] mentioned, I joined him and a number of others to start a company called Pulumi, which is really focused on bringing these two together, and I'll show some demos throughout here that used Pulumi. We're really going to talk more about the general problem and opportunity that I see here.

An Analogy

To give some background and an analogy for why I think there's a problem, and to relate it to something which folks may have some understanding of, I want to start with where application development was, say, 40 years ago. I don't actually know exactly how long ago it was that this was the state of the art, but some time ago, the state of the art was, "I'm going to write assembly. This is how I'm going to build applications. This is the best thing I have." One thing that's really important to understand is that there was really a period in time where this felt like the best thing. By writing this code, I got access to computers. I got to use computers to do computations for me that would have been hard or potentially impossible for me to do myself. I got access to the modern microprocessor, all of these capabilities. If I had the skills to go and learn how to use this thing, I had amazing capabilities at my fingertips.

The problem was, looking through today's lens, this was an incredibly low-level way of working, and it would not, by itself, have scaled up. We can look at some of the things that are clearly missing from a modern perspective. There are no variables and no self-documenting code, where my variables have names that I can understand; I have to look at these very cryptic register names and offsets. There are no loops; I have to manually encode loops as branching constructs. I don't have any of these basic control flow capabilities.

I don't have functions, and this is where it gets more interesting. A function is the ability to take a piece of code, write it once, give it a name, and reuse it in multiple places. That concept wasn't in this low-level description of things. More importantly, I don't even have the ability to create abstractions: combinations of functions which share a common data type they operate on, which I can turn into a reusable library. None of that capability was really available here. An abstraction becomes really useful when you build standard libraries out of it. You build libraries of these abstractions that solve common use cases, so that when people build applications for a variety of different purposes, they reuse standard libraries that are available and shared amongst multiple people. I don't have that here. Here, I have to copy-paste the assembly code from one application into another to reuse it. Finally, once I have standard libraries, I want types to describe what the APIs of these things are and to give me checking and validation over the top of that ahead of time.

Assembly didn't have any of these things, and you probably all know we solved this problem by introducing C, introducing a higher-level language that's a bit more attuned to what developers were actually doing. They were actually trying to create reusable libraries, create abstractions, have a standard library, have types that they could use. Over the intervening 20, 30 years, we've gone up the stack even further, of course. C, these days, is considered incredibly low level, and so, we had Basic, and then Java, and then .NET, JavaScript, Python – much higher-level languages that take care of more and more of the heavy lifting that we might have to do.

One of my basic views is that where we are with the cloud today is still very much in this assembly world. It's a very low-level world of having to work with building blocks that are incredibly powerful, that do give us access to a lot of capability, but there's so much opportunity still to move up the stack. I'll talk a little bit about that throughout the talk today.

Software Engineering Desiderata

When I think about what I do as a software engineer, what are the things I value when I'm looking at tools from a software engineering perspective, what are some of the things I like? I won't talk through all of these, but if you're doing application development today, you probably see some of these things and say, "Yes, those are important things to me." If I hit some application development environment where I didn't have access to high-level languages and I had to write in assembly, that would be bad. If I didn't have access to package management, that would be bad. These are things which are generally considered useful.

The interesting thing is that many of these really are not available in the typical workflows around infrastructure today. In fact, for those who went to Yevgeniy's [Brikman] talk before this in this track, there was an entire talk just about how to do unit testing and integration testing for infrastructure. These are not solved problems. These are problems that still need a lot of work, and they're not the norm for folks to be doing. I'll talk a little bit more about a few of these as we go.

Infrastructure as Text

One of the things that folks may think, if they're especially coming from the DevOps side of things, is that the solution to a lot of these problems that we talked about on the last slide, all of these things we want from software engineering, the solution is infrastructure as code. It's taking my infrastructure and turning it into code, and using software engineering practices around that code. This is actually really true. I think infrastructure as code is one of the foundational tools that we have available to us to make infrastructure more programmable, more amenable to software engineering.

There are a lot of tools for this, and many folks here are probably using one or more of them. There's CloudFormation and Azure Resource Manager for the major cloud providers. Terraform is very commonly used for multi-cloud and cross-cloud things. Then, of course, I'm throwing Pulumi in here just because that's where I work, but there are lots of tools around infrastructure as code that we can use.

The problem is that most of these tools actually aren't infrastructure as code. They're infrastructure as text. This is an important distinction. Infrastructure as text is still useful. It means I can source control my files, and it means I can still run CI/CD processes over my files, but it doesn't mean that I have all the same richness of software engineering capabilities I have with code. What we really want to look at is what it means to bring some of these capabilities of code, of application development, into my infrastructure. To do that, let me start with a quick demo and take a look at infrastructure as code with a focus on the developer experience.

Demo

I'm going to start with an empty beginning application here. I'm going to just write some infrastructure, but I'm going to do this in code, I'm going to do this, in this case, in TypeScript, but Pulumi supports a variety of different languages. What I'm going to do is I'm going to start by just writing that I want to have a bucket, so I'm going to say, new aws.s3. There's a couple of things we notice right away. First off, I'm in Visual Studio Code here, so I'm in an IDE. I'm getting that IDE environment that I expect from a rich development language. I'm getting things like, when I hit .s3, when I hit . here, I get access to all the different APIs that are available in my cloud platform in the same way I get all the access to all the different APIs available in .NET or in Java or what have you. I'm getting all of that directly within my environment. Then I get access to all the different things in the s3 namespace. I can create a bucket, and I get all the help text, and all these things, full IDE experience here. I just want to say bucket. Now I want to just export bucketName.
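A minimal sketch of what that first bit of demo code looks like; the resource name "bucket" is just illustrative:

```typescript
import * as aws from "@pulumi/aws";

// Desired state: a single S3 bucket.
const bucket = new aws.s3.Bucket("bucket");

// Stack output: the auto-generated bucket name.
export const bucketName = bucket.id;
```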

What Pulumi does is it lets me take this code written in my language and just say "pulumi up" to go and deploy it. When I do pulumi up, first it's going to preview my application. It's going to say what this will do to my cloud environment; we're about to modify an actual running cloud environment, so it wants to give you a sense of what's going to change. In this case, it's telling me it's going to create that bucket, and I can see the details of what it's going to create. I can go ahead and say yes. This is standard infrastructure as code. The only difference is that I'm using an existing programming language, like TypeScript, to define that infrastructure as code.

Because I've got this, I can then go and create more resources. I can say, const obj = new aws.s3.BucketObject. In this case, I want to set a few different properties on here. I can say I want this bucket object to live inside that bucket. I want its key to be "obj" and I want its content to be "Hello, world!" Now when I run pulumi up, I'll see a preview again of what changes are going to be made. This time, instead of creating both of these resources, like I would if I was running a typical imperative program, we'll see we're only going to create the new object, only the change that I made, only the things different from what's already deployed. You can tell this program is really describing the desired state of my infrastructure, and every time I deploy it, I'm updating my infrastructure to match this new desired state. In this case, I'm going to create that bucket object. You see it has the "Hello, world!" content and the key "obj." I'll go ahead and deploy that.
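The bucket object from the demo, roughly:

```typescript
// An object stored in the bucket above, with inline content.
const obj = new aws.s3.BucketObject("obj", {
    bucket: bucket,           // the bucket resource created earlier
    key: "obj",
    content: "Hello, world!",
});
```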

So far, this is just standard. If you use Terraform or CloudFormation, something like that, this looks fairly familiar. This is just infrastructure as code. The only difference is I get a little bit of an IDE experience and a little bit of type checking, so if I mess up the names, I get feedback on that right away, a lot of these simple things, but really, it's fundamentally the same.

Where things start to change is when I can use all the capabilities of the underlying platform. I can use a for-loop and I want to create one of these objects for each filename in this directory. I'll use the filename as the name of this thing. I'll use the filename as the key. Instead of hardcoding the content, I'll read that off of disk. I'll just say readFileSync and just do a little bit of Node.js stuff to combine the root folder with the filename. There we go. We've written a little loop that creates my infrastructure. This is using some files on disk, going to find out what they are, going to put each one of them up into this s3 bucket. This is still infrastructure as code, but here, I'm actually describing that desired state by doing some computation using some libraries from Node. It's starting to feel a little bit more like a code, like something that's programmable.

One thing you'll notice is I get a little squiggle here. I'm getting feedback telling me I can't assign a buffer to a string, and in fact, that's because I have a subtle mistake: I need to specify what encoding this file is in before I upload it. That's feedback I can get from my IDE here. I'll just go ahead and update. You should see we'll actually delete the old object we had and create two new objects; we're going to change the infrastructure here. Scrolling up, we'll see we're creating these two new ones, index.html and README, and deleting this object. I can see the details of what's inside those: this one has some markdown in it, this one has some HTML in it. I'll go ahead and deploy that. So we can use things like for-loops.
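Put together, the loop from the demo looks something like this; the folder name "files" is an assumption:

```typescript
import * as fs from "fs";
import * as path from "path";

const root = "files";
for (const filename of fs.readdirSync(root)) {
    new aws.s3.BucketObject(filename, {
        bucket: bucket,
        key: filename,
        // The encoding must be specified: without it, readFileSync
        // returns a Buffer, which is the type error the IDE flagged.
        content: fs.readFileSync(path.join(root, filename), "utf-8"),
    });
}
```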

I've been doing this iteration of going back, editing some code, and then running pulumi up. One of the things that we've just added recently as an experimental capability, something that I really think is interesting, is the ability to do this interactively. I can say PULUMI_EXPERIMENTAL=true and then pulumi watch. Now, when I make changes to my code, it's just going to automatically deploy them, and so it feels a lot quicker. If I want to make this a public website, for example, I can say indexDocument: "index.html". I hit Save, and you'll notice it starts updating immediately. I can continue working on my application as it's updating in the background, and all of this is deploying into AWS as I make these changes. I can make a few other changes along the way. To make this accessible publicly, I'm going to give it the "public-read" acl, and I'm also going to make the content type "text/html". Finally, I'm going to export the URL, that bucket endpoint.
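Pulling those changes together, the program at this point looks roughly like this (property names follow the AWS provider of that era; newer provider versions handle public access differently):

```typescript
import * as aws from "@pulumi/aws";
import * as fs from "fs";
import * as path from "path";

// The bucket is now configured as a static website.
const bucket = new aws.s3.Bucket("bucket", {
    website: { indexDocument: "index.html" },
});

const root = "files";
for (const filename of fs.readdirSync(root)) {
    new aws.s3.BucketObject(filename, {
        bucket: bucket,
        key: filename,
        content: fs.readFileSync(path.join(root, filename), "utf-8"),
        acl: "public-read",        // publicly readable
        contentType: "text/html",  // served as HTML
    });
}

// Stack outputs: the bucket name and the public website endpoint.
export const bucketName = bucket.id;
export const bucketURL = bucket.websiteEndpoint;
```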

We're doing updates as we go, so we're seeing this application come together. All these cloud resources are just magically appearing as I'm editing my code and hitting Save. We're creating resources, we're updating resources, the cloud environment is changing right as I type this code. I can come over into a new window and say pulumi stack output bucketURL to get the URL I just exported for the running application, open it in a browser, and there we go: I've got a little static website being hosted out of a few files, and I wrote it using code, using that interactive development style, all in just a few lines. It feels quite different, I think, for those of you who've worked with CloudFormation or something like that. The experience here is very different, even though what we're ultimately doing is working with a lot of the same primitives.

One last thing I'll talk about, just from the perspective of software engineering: I can take this code here which syncs these files into the bucket and give it a name; I can say syncDir. It takes two things: the bucket I want to sync into, an s3.Bucket, and dir, which is a string. I've made this into a function that I can call, so I'll call syncDir. Now, instead of hardcoding these files here, I can just pass in the dir. I've given that function a name and abstracted away the capability to sync a directory of files into a bucket. You'll notice that update didn't change anything when I made that change: I refactored my code, but it was observationally exactly the same. It's creating the same resources.
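The refactored helper, roughly as described; it is observationally identical to the inline loop:

```typescript
// Sync every file in a local directory into an S3 bucket as a
// publicly readable HTML object.
function syncDir(bucket: aws.s3.Bucket, dir: string) {
    for (const filename of fs.readdirSync(dir)) {
        new aws.s3.BucketObject(filename, {
            bucket: bucket,
            key: filename,
            content: fs.readFileSync(path.join(dir, filename), "utf-8"),
            acl: "public-read",
            contentType: "text/html",
        });
    }
}

syncDir(bucket, "files");
```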

I could go further here. I could document this, put it into an NPM package, publish that package to NPM, and reuse it across multiple projects. A lot of these core software engineering capabilities I have because I've turned this thing into a reusable function, a reusable piece of code. That's a quick overview of using Pulumi to do infrastructure as code for developers.

One of the things that I think is most interesting about that last example was the watch mode I showed, where I was able to interactively develop code and develop infrastructure as I edited. One of the things that reminds me of is a demo many of you may have seen, Bret Victor's talk from about 10 years ago, where he showed a whole bunch of environments in which you could interactively modify the code and immediately see the results inside the application, and how closing the gap between writing the code and seeing its impact enabled people to be so much more creative, because they could see the results and feed them back into the thought process immediately. This idea of a really tight feedback loop is one of those things I think is really critical for application development. The tighter you can make that loop, every order of magnitude faster you make it, the more people can take advantage of these building blocks and use them in really interesting ways.

This is one of the problems with cloud development today: in a lot of places, it feels very slow. In many organizations, you have to throw work over the wall to some other person who has to go and do it for you, either an Ops team to provision some hardware or a DevOps team to tell you what you're allowed to do inside the dev environment. Even when you do have access yourself, a lot of the cloud provider primitives are themselves fairly slow to use. But there's an increasing set of cloud primitives that are actually very fast, things like what I showed with S3 buckets, serverless functions, containers in Kubernetes. With those, we can actually do this iteration very effectively.

There are lots of tools and practices doing this today in other parts of application development, things like Chrome DevTools. The ability to immediately understand what's happening inside your website and modify it live inside the DevTools created a really different way of thinking about web development, one that has made web development substantially easier than it was in prior eras.

Even in the cloud space, there's other folks working on this problem. One that I am a fan of is something called Tilt. This is a project in the Kubernetes space that's actually trying to take the same approach to really tightening this feedback loop for the way that we develop and deploy Kubernetes applications. We don't have to go and run those things and test those things on our local machine independently. We can actually do that inner-loop development cycle directly inside our Kubernetes clusters. That's a really interesting project.

Then Dark, which has a talk in this track later this afternoon, is taking all this even further and saying, "By rethinking the whole stack for deployments, rethinking the programming language and the platform that we're going to run on, can we take this from a couple of seconds to do this deployment to tens of milliseconds to do this deployment, make it so that every time I change my code, I'm immediately seeing that deployed into runtime?" I'm really excited about this general direction of enabling that.

Process Models

One of the things that happens around a number of these ideas, when we're thinking about this notion that every time I change my code I'm deploying a delta into my running environment, is that it makes you wonder a little bit about what process model, what execution model, these programs are using. I want to talk a little bit about process models as background for how I think about why this environment is different and what it means for a developer targeting it.

Imagine one of the environments that we have; I'll talk about this from a JavaScript perspective and how JavaScript itself has evolved. JavaScript started in the web, and the execution environment was the page. Every time I navigate from page to page, in some sense my entire world gets torn down and started up again with some different set of JavaScript. Everything is very transient in this execution environment. Pages are very short-lived things; I load my code in via script tags, and everything is very stateless. Everything I want to be stateful, I have to send off to some API server or something like that to manage. My application environment itself is this very short-lived thing that has no long-term state associated with it.

Further along in JavaScript's lifetime, Node.js happened, but certainly the thing that folks here are probably most familiar with as a process model is the server process: the idea that I have some operating system process. Folks often think about these things as living for a long time: I run my process on my server, and it lasts for days, months, potentially years. One of the key things about processes, though, is that they fundamentally have a finite lifetime. Either they crash, because I did something bad in my process, and so it died; or the machine they're running on crashes, and so they're no longer there; or I want to update them. I want to run a new piece of code, and the only way for me to do that is to deploy the new piece of code into a new process, tear down the old process, and make sure that anyone who was using it is pointed at the new thing. Processes fundamentally have this notion of a finite lifetime that they're going to run for. You can deliver them as ELF binaries or Windows EXEs or what have you. Again, processes are still largely stateless: because they can die, their state is managed somewhere else, either on disk or in some operating system or other concept that maintains the statefulness.

In the cloud, I think there's this other process model, and this thing doesn't really have a name; I've never seen a standardized name in the industry for it. In Pulumi, we call it a stack. It's the idea that I'm going to deploy some infrastructure into the cloud, a serverless function, for instance, and it's going to be there forever. When I deploy a serverless function, it doesn't matter if any given process dies, if any given EC2 instance underneath the hood dies. That serverless function logically exists forever, with an endpoint that's always accessible. It's an execution model where the thing I've deployed is a process that's going to run forever, and every time I want to make a change, I don't make it by deploying a new EXE or a new JavaScript bundle. What I do is describe a new desired state and tell some orchestration tool to drive my existing environment, which is still running, into that new state.

This is familiar for folks who have used things like edit and continue. This was a big thing, I remember, when I started my career: VB6 had edit and continue, and .NET didn't, and it was a big deal to bring edit and continue to .NET during the first five years I was working at Microsoft. This idea of being able to take a running application, make changes to it while it's running, but keep it running and keep its state consistent, really changes the way you think about developing applications. I think this is one of the reasons cloud infrastructure feels so different: it's not just a process. It's not just, I go onto a VM, restart the process, and I'm all good. The application I'm developing is one that's going to run effectively forever, and I need to edit it as it's running. I think that's a key new idea that changes a lot of these things.

Application vs Infrastructure

One other concept I wanted to touch on really quickly is the application and infrastructure piece. One of the interesting things I've noticed is that in a lot of environments, applications and infrastructure are not a unified thing; they're actually treated very differently. In many organizations, they're owned by entirely different teams. Even in organizations where they're owned by the same team, they're often managed in separate repositories, and so managed as totally separate code bases. Even when they're in the same code base, like in this architecture example here, they're driven off of different deployment pipelines. Along the top route, we deliver our application code, in this case a Docker image, and push it into some container registry. Then, in the bottom route, we deploy some infrastructure as code, in this case using CloudFormation. That deploys our cluster and the services into the cluster that take advantage of the application image built above.

This idea of separating these things and having them run very independently can be good for a lot of reasons, but it often decouples things which there's reason to think should actually be more coupled. By coupling them together and deploying and versioning them together, things would be a lot simpler, and pipelines like this wouldn't actually be necessary anymore.

Application + Infrastructure

One example of this, which I'll show in a live demo in a second, is this piece of code here. This is some Pulumi code again. At first blush, it looks just like a normal piece of infrastructure. I'm saying I want to have this cloud.Task thing, which represents a containerized task that I can run in the cloud, and I want its desired state to be a Fargate task with a 512 megabyte memory reservation.

Then there's this line that's interesting: it says build, and then a local folder. What does that line do? That line says, "In my desired state, I want to be running the Docker image that was built from that folder." Think about what that means. It means I need to have a registry in the cloud where I can put that Docker image. When I write this line of code, Pulumi underneath the hood is actually going to allocate a Docker registry for me in AWS.

It means I need to build that Docker image locally on my machine, because whenever I'm doing this deployment, whatever code is in that folder is the code I want to deploy. I'm going to run a Docker build, and then we need to push that built image up into the registry. All of those steps get taken care of automatically here, and it means we actually version these things together. Every time I deploy my infrastructure, I get whatever the latest application code is. These things version identically together. This is a very different way of thinking about these things than we often have when we separate out the notions of how our infrastructure and our applications evolve.
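The slide's code, roughly; cloud.Task comes from Pulumi's higher-level @pulumi/cloud-aws framework, and the folder name here is illustrative:

```typescript
import * as cloud from "@pulumi/cloud-aws";

// A containerized Fargate task. "build" points at a local folder with a
// Dockerfile; on deployment, Pulumi builds the image, provisions a
// registry in AWS, and pushes the image into it.
const ffmpegThumbnailTask = new cloud.Task("ffmpegThumbTask", {
    build: "./docker-ffmpeg-thumb",
    memoryReservation: 512,
});
```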

Pulumi is by no means the only place where this happens. One of the places you see this a lot is actually in something like Docker Compose. One of the reasons developers really like using Docker Compose is that it lets them couple these things a little bit. It lets them create a little infrastructure environment locally on their machine that describes how the service or piece of code they're writing runs inside an environment that includes the database, the network, and whatever other pieces of infrastructure they need, but that picks up whatever the latest code they have is and runs it inside that environment. Docker Compose, I think, is one example of this.

Another one is serverless. Both serverless as a general technique and the Serverless Framework really encourage this idea that we couple the deployment of our code and the infrastructure. With serverless, the code often doesn't make much sense outside of the infrastructure: the entire way we can invoke a function depends on how we've hooked up the infrastructure around it. The logging of that thing depends on how I've hooked it up to my logs. The access control is all based on IAM and things like that. When we do serverless, we're naturally driven to a world where we version these things together, and I think there are reasons why this stuff can get integrated more and more.

Demo

Let me show one example of this that goes a little bit beyond even those examples, back in Pulumi. In this case, I've got the same application I just deployed, which had the bucket and some objects inside the bucket. One of the things we can do inside Pulumi is look at this bucket object, and we have a bunch of properties on it that give me access to all these different things. You'll notice there are also these two functions, onObjectCreated and onObjectRemoved. These are functions which allow me to hook up callbacks onto my infrastructure. I can hook up a callback right here, call it "newobject," and say that when it's called, I want to just print out the "ev" event object. As I hit Save, this is actually going to go ahead and start deploying this infrastructure.
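In code, that looks something like this (the subscription name is illustrative):

```typescript
// Hook a callback onto the bucket: Pulumi packages the closure up as a
// Lambda function and wires it to the bucket's "object created" event.
bucket.onObjectCreated("newobject", async (ev) => {
    console.log(JSON.stringify(ev, null, 2));
});
```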

The interesting thing about this is that it's actually two things. It's some infrastructure: you're going to see a bunch of things down here getting deployed, a BucketEventSubscription, a Lambda function, an IAM role, all the things I need to deploy this capability. But it's also some code. The code inside here is code that's going to actually run at runtime: when my bucket gets a new object added to it, we're going to run this piece of code. We've taken this idea of coupling the infrastructure and the application code to quite an extreme here by actually incorporating them into the same source file. These things are really versioning very closely together.

Now this update has completed. If I make some change, like coming in here and saying "Hi QCon SF," I can hit Save, and this will do an update which deploys that bucket object. Because I changed an object inside my bucket, it should invoke this function. In a couple of seconds, we should see that this logging actually runs and I get the logs back in my IDE here. The roundtrip time for CloudWatch is sometimes a couple of seconds, so I'll just wait for it to finish. There we go.

We see that now we're actually getting the logs from that runtime capability as well. Whenever an object gets added, we're getting that log sent, and that's going to be in CloudWatch somewhere, so I can access it from AWS, but it's also here within my development environment. I'm now actually combining into my development environment both these infrastructure changes that are happening and some of these runtime changes that are happening, so really getting this inner-loop development quite tight here.

Transition to Cloud Native

One of the things I think happens related to this shows up when you look at the world of what I call pre-cloud here, though for a lot of people this is also the cloud; it's the transitional period of cloud, doing lift and shift. Everything is still very VM-based: I'm running my own processes inside my own VMs, and my provisioning says what I want to run inside those VMs. When I move to cloud native, I move to something like what's on the right. When I say cloud native here, by the way, I don't mean Kubernetes; I think for a lot of people, cloud native means Kubernetes. I mean native to the cloud, using the native capabilities of the cloud. I may have an infrastructure or architecture like what's on the right: some API Gateway and Lambda, EKS and S3, a variety of these high-level managed services that I'm using.

You may look at this and say, "That right-hand side looks way more complicated. Why would I want to move to this world? That looks terrible." I think in some ways it is. There's more stuff here; there are more moving pieces in this thing. There's more that you have to understand. The thing that's really key, though, about the right-hand side is that the operationally complex parts of it are these gray boxes. The pieces that I own the operational behavior of, that I don't get to outsource to a cloud provider, are the application code I'm going to run inside these little boxes. Whereas on the left, I own the operational context of the whole VM; I own everything about what's running inside that VM. In this cloud native world, I offload a lot more of that complexity onto the cloud provider, and I now own just these boxes here.

The thing I trade off for that operational-complexity benefit is that I now own all these arrows, all these edges between the components. I own describing how all these things are hooked up. My application's logic is now embedded in how API Gateway is hooked up to Lambda. This is no longer something which is just infrastructure; how all these things are wired up together is essential to the correct behavior of my application. In these architectures, it's no longer the case that your application code and your infrastructure are separate things. They're all very tied up together, and they're all going to have to version together in much more interesting ways.

Programming Architecture Diagrams

To give a concrete example of that, let me talk about this idea of programming architecture diagrams. I'll show one simple architecture diagram here. This is the kind of thing you see if you go to re:Invent and go to Werner's talk or something: he'll show all these diagrams of AWS services and how they're hooked up together. You look at them and you say, "I get what that means. I get what they're doing. I'll go and take that idea and use it in my job when I get back to it." You see this thing and you totally get it. In this example here, I'm going to take a video and drop it in a bucket. Whenever a new file gets dropped in the bucket, I'm going to fire this Lambda. That Lambda is going to spin off some long-running ECS task that runs a container with FFmpeg in it. That FFmpeg task is going to do a keyframe extraction, drop a JPEG into a bucket, and fire off another Lambda. I can look at this and, in a couple of sentences, describe and understand what it's doing.

If you were to take this and say, "I'm going to implement it," it suddenly gets scary. This is not easy to implement. In fact, you look at an implementation of this, and it's 500 lines of CloudFormation or so just to build the infrastructure around it, plus a whole bunch of application delivery pipelines I've got to set up to make sure those two Lambdas and that ECS task all get deployed and synced with my infrastructure. It's surprisingly complicated to actually build the kinds of things that are in these architecture diagrams.

Demo

Let me show you what this can look like in something like Pulumi. I'll just show this part; I've collapsed a few pieces of it so I can talk through it briefly. This example encodes that entire architecture diagram in a real running piece of code, in just about 40 lines or so. You can see the various pieces. We've got the bucket that was in the architecture diagram, where I'm going to store my video files and my images. We've got this long-running Fargate task, this long-running compute piece, which is going to run FFmpeg. If I look at it, it's exactly the code I had on the slide earlier: I've got my memoryReservation, and then I want to build whatever is in this folder. If I go look at that, I can see it's just a folder with a Dockerfile in it. This could be my Java app, my .NET app, my Go app, whatever I have. In this case, it's effectively just some bash that runs FFmpeg inside a container, but it can run whatever I want. Pulumi will actually build that image from whatever source code I have here and deploy it directly.

I can then say I have two event handlers: one that says, "When a new MP4 file gets placed inside this bucket, run some code," and one that says, "When a new JPEG gets placed inside this bucket, run some code." Probably the most complicated part of this, as is normally the case, is just the parsing and passing of data back and forth between things. Here, I've got to parse some of the inputs that come with the video that gets uploaded, and pass those into the Docker container as environment variables. Effectively, all I do here is take that task I defined up above and run an instance of it whenever a new video gets uploaded. As soon as I'm running it, I write out a log entry, and then when it's finished, when a JPEG gets uploaded by the task, I'm just going to print out that it finished.
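A sketch of those two handlers, assuming the bucket and task from above; the handler names, suffix filters, and environment variable names are illustrative:

```typescript
// When a new MP4 lands in the bucket, run an instance of the FFmpeg task,
// passing the object key in via environment variables.
bucket.onObjectCreated("onNewVideo", async (ev) => {
    for (const record of ev.Records || []) {
        const key = record.s3.object.key;
        console.log(`New video: ${key}. Launching thumbnail task.`);
        await ffmpegThumbnailTask.run({
            environment: {
                S3_BUCKET: bucketName.get(), // Output values are read with .get() at runtime
                INPUT_VIDEO: key,
            },
        });
    }
}, { filterSuffix: ".mp4" });

// When the task writes the extracted JPEG back, log that it finished.
bucket.onObjectCreated("onNewThumbnail", async (ev) => {
    for (const record of ev.Records || []) {
        console.log(`Thumbnail created: ${record.s3.object.key}`);
    }
}, { filterSuffix: ".jpg" });
```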

You could imagine extending this. If I wanted a data store that kept track of what's running and what's finished, I could add a table in here and write that information into it; it's very easy to extend this to more complicated infrastructures. The key thing is that in just 40 lines of code or so, at the level of that architecture diagram, we've actually developed something which accomplishes the task at hand.

Architecture as Code

Again, Pulumi is definitely not the only tool looking at this. In fact, the AWS CDK, which just GA'd earlier this year, and which I think there's a talk on next, is doing a lot of really interesting stuff here specific to AWS: taking some of these patterns and making it really easy to work with them in higher-level ways, at the level of the architecture, not just at the level of the individual building block services that AWS provides. In fact, there's this notion of architecture as code, and I really love it. Clare Liguori at AWS wrote a really great blog post recently about this topic and used this phrase, architecture as code. I'm stealing it from her, because it's such a good phrase for turning these architecture diagrams into code.

There is one last demo I wanted to quickly show, focused on a tweet I saw a couple of weeks ago. Someone wrote, "I want a website that does one thing. It displays the number of times it's been viewed. What is the shortest number of steps you can think of to get it running in the cloud?" It feels like a trick question: "This shouldn't be that hard. I know how to build a website that has a counter on it. Why is this a hard thing?" A lot of folks jumped onto this thread and responded with, for various different tools, the steps you'd go through to accomplish this. It is actually a surprisingly hard thing to do. I wanted to bring together a few of the ideas I've touched on here to show one of the ways, with Pulumi, that you can do this today. I'm going to claim I can do it in 90 seconds. We'll see if that really is true.

Demo

I just have an empty file here, and I have that same watch thing running. I'll just write out what I would do to build this website that has a counter. The first thing I'll do is create a table: new cloud.Table, and I'll just call it "counter." I hit Save, and it starts deploying. While it's deploying that, I'll create a new API to serve my website. I want to say api.get, so I want a route on this thing: whenever I get the slash route, I want to run some code. For those who've used Express.js or something like that, this is a similar style. I'll just res.end and say "Hello world!" Finally, I want to export the URL this thing is going to be at: export const url = api.publish().url.

When I hit Save now, it's going to go and deploy that API as well. While that's going, I'll implement the logic of the application. I can say const counter = await table.get, and I'll get the entry called "counter." This entry may not exist, because the first time I hit this page there's nothing there, so let me just seed it with n: 0. Now I'll say await table.update on the same entry and write out n as counter.n + 1. As a final thing, I'll print `Seen ${counter.n + 1}`. Now we deploy and see it. I can say pulumi stack output, get the URL, and go visit it. Assuming the Lambda spins up quickly... it's been seen one time, two times, three times. There we go. We have a counter.
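The whole counter application from this demo, roughly, using the @pulumi/cloud-aws framework:

```typescript
import * as cloud from "@pulumi/cloud-aws";

const table = new cloud.Table("counter");
const api = new cloud.API("site");

api.get("/", async (req, res) => {
    // Read the current count, seeding it with 0 on the first visit.
    const counter = (await table.get({ id: "counter" })) || { n: 0 };
    // Increment and store the new count.
    await table.update({ id: "counter" }, { n: counter.n + 1 });
    res.end(`Seen ${counter.n + 1} times.`);
});

// Publish the API gateway and export its URL.
export const url = api.publish().url;
```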

With just a few lines of code in Pulumi, using that watch mode to iteratively deploy, we very simply built that application. This is that developer-centric experience: I can express my ideas very quickly in a programming environment, but deploy them into a real cloud application, taking advantage of all the capabilities of AWS here.

The Future?

That example I just showed: I think it's really exciting that you can do that. For a simple scenario like that, we offer that capability with something like Pulumi today, but I think there's still a ton of opportunity to take a lot of these ideas and move them further forward, and a lot of folks are working on pieces of this. There are a few things that I have no doubt are going to be part of the future of how we work with the cloud platforms.

A few of the things that I think about a lot: one is frameworks on top of the raw building blocks of the cloud. You saw me using those throughout some of these demos, these higher-level things, like the cloud framework I was using in the last demo, that are just easier to use than the raw building blocks. That doesn't mean I won't sometimes want to reach for the raw building blocks, just like I sometimes reach for the raw file system APIs on Windows or wherever, but a lot of the time, I want to use the high-level Java file system API. I don't want to be stuck with the low-level building blocks all the time.

Another one is unifying application and infrastructure code. There's going to be a continual trend of moving these things closer together, especially as we move to more cloud-native architectures. We're seeing this with serverless, we're seeing this with Kubernetes, and I think we're going to continue to see more integration of the two. That's going to drive the teams to integrate more as well, so more of this responsibility moves into development teams, and development teams can take ownership of more of it.

Then there's the idea I talked about with process models, this long-running cloud process. I think it's an important way to think about this stuff, and it really starts to raise questions like, "What if I need to do database migrations? I've got this long-running thing; how do I script those database migrations into these kinds of deployment processes?" I think there's still a lot to be discovered around that and a lot of innovation that's going to happen there.

Then, finally, I've spent a bit of time talking about how important rapid iteration is, and I really do believe that's one of the key things: the idea that the cloud platforms feel like they're at your fingertips, not something I have to throw over the wall to somebody else, so that I can immediately see the results of what I'm doing. I think this is going to really change the way folks think about working with the cloud productively.

In summary, I believe that over the next few years, every developer is going to end up being a cloud developer to some extent. Everyone is going to be touching the cloud as part of the way they develop and deploy their applications. I see a huge opportunity across all of this for the industry and for all of us as developers.

Questions and Answers

Participant 1: Is Pulumi production ready?

Hoban: Yes. Pulumi hit 1.0 a couple of months ago, I think. We've been working on it for about two and a half years, and it's been out publicly for a little over a year and a half now. We have a lot of folks using Pulumi in production today. A lot of the things I demoed here are higher-level, and some were experimental features; some of what I demoed today is definitely more on the edges of what we're doing. But the core platform, the core infrastructure as code offering, is definitely being used in production today.

Participant 2: If I want to learn more about Pulumi, where would I go?

Hoban: Pulumi.com has a ton of resources, getting started information, and a whole bunch of guides, and things like that.

Participant 3: Is Pulumi mostly integrated with AWS? All your examples were in AWS.

Hoban: It's always a tough thing, because I want to show some continuity in what I'm doing, and so I did focus on AWS here. Pulumi is equally capable across AWS, Azure, GCP, Kubernetes, and a variety of other cloud providers. To some extent, everything I showed here is something you could do on those other platforms as well. AWS is probably where the single largest user base of Pulumi is, but there are large user bases across the rest of those as well.

Participant 4: How do you account, in large teams, for state management and infrastructure drift?

Hoban: The infrastructure drift thing is often that somebody on the team is going behind the back of the infrastructure as code and changing something in the console, or deploying something from their laptop that wasn't actually checked into the main codebase. Pulumi has a bunch of features to deal with that. It has pulumi refresh, which goes and makes sure the state matches what's actually in the cloud provider. You've got tools to import existing resources from the cloud, if you've got things out there that you want to bring into your environment but don't want to recreate. There's a whole bunch of operational tools around dealing with this.

You can also catch this in your preview: you can see whether the changes are what you actually expect to happen. I focused a lot on that inner-loop experience, which is a bit different. When folks think about infrastructure as code, I think they think more about the other side: the operations of running this stuff in a production environment. Pulumi definitely has the full toolset for that use case as well. Today, I really wanted to focus on the side which I don't think gets as much attention, which is how we make this stuff more accessible to developers, but for production environments, there are a bunch of tools to make sure you can robustly manage them.

Participant 5: Which programming languages are supported? Second part of the question is, where is the state being stored?

Hoban: Today, we support JavaScript and TypeScript, and we support Python. Actually, just today, we launched a preview of .NET support, so C#, F#, and VB. Go and Java are on our near-term roadmap. Those are the core languages we have today, and we're going to continue to expand our language coverage. On the state question, there's a free tier of the Pulumi service backend which you can use, so you don't have to worry about state at all out of the box; it just gets managed by the Pulumi service itself. If you want to manage it locally on your machine, or in S3 or Azure Blob Storage or whatever, those options are available as well: you can just log into whatever backend you want to use to store the state. We try to make that something most users of Pulumi don't even have to think about, but if you don't want to trust those files to Pulumi, you can manage them yourself.

Participant 6: What does it look like when you're deploying to multiple environments with something like Pulumi? I'm developing locally, I have the fast feedback loop, but then, when I'm ready to commit it and push it to my test environment or production, what does that look like? What's the model for that?

Hoban: I focused a lot on that very inner-loop phase, during the development cycle. In that phase, you don't care so much about the infrastructure; if you break everything, it's OK. You throw it all away and start a new one, just like you would if you were developing your application locally and something went wrong: you just kill the process and deploy again. When you're moving into a robust test environment, or ultimately into a production environment, you're typically going to put that into some CI/CD system. Pulumi can be added into a variety of different CI/CD systems. Then, every time you push, or every time you go through some gated deployment process, you'll actually do that Pulumi deployment. Potentially, when you open a PR, you'll do a preview of what change would happen if you merged that PR. Then, when you merge it, we'll actually push that into the environment it's mapped to.

There are a lot of different techniques you can use there, but one that we've seen is mapping branches in your source control to environments: a branch as your testing environment, a branch as your production environment. As you merge changes through those different branches, they get merged into the corresponding environment in your infrastructure. That's something you can largely script yourself using your existing CI/CD tools, in the same way you might script CloudFormation or Terraform, and Pulumi itself offers some features in the service that help with that.

Participant 7: With respect to AWS, how rich is the library support? Does it support all the resources, or are you still working through them? Or does it support the most commonly used ones?

Hoban: We project the libraries for AWS, Azure, GCP, Kubernetes, everything; we project the full API surface area of those into the Pulumi programming model. Everything is available there in the raw form. Some of the higher-level libraries I used here, like these cloud things, are much more limited today; we have higher-level libraries in some limited domains, the serverless domain, the ECS domain, and a few others. We continue to expand those higher-level libraries. We're looking at doing things around RDS; there are a lot of interesting things we can do to simplify the way you work with databases in AWS. As we work with folks in the community and with customers, we'll continue to expand those higher-level libraries to cover more cases.

 


 

Recorded at: Jan 28, 2020
