
Turbocharged Development: the Speed and Efficiency of WebAssembly


Summary

Danielle Lancashire discusses why Wasm is the most cross-platform unit of compute for serverless applications, and how that translates to efficiency at scale.

Bio

Danielle Lancashire is a principal engineer at Fermyon where she works on bringing WebAssembly to the Cloud. She is also a co-chair of the CNCF wasm-wg, member of the Kubernetes Code Of Conduct Committee, and a Kubelet maintainer.

About the conference

Software is changing the world. QCon London empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Lancashire: Let's talk about efficiency, and specifically carbon efficiency. That's a term that, until about 9 months ago, if you'd said it to me, I would have had no idea what you were talking about. I imagine that's potentially true for many of you when it's applied to the software space. The Green Software Foundation came up with a method of measuring the carbon intensity of your software, called Software Carbon Intensity. It measures the resources your software uses per functional unit of work, such as a request. This is a very simple version of that equation. It actually looks something more like this.
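The equation on the slide, as published in the Green Software Foundation's SCI specification, is:

SCI = ((E × I) + M) per R

where E is the energy consumed by the software, I is the location-based marginal carbon intensity of that energy, M is the embodied emissions of the hardware, and R is the functional unit.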

When a coworker sent me this, my response was, what? It turns out the letters actually mean things, obviously. It's the energy consumed by the software system, multiplied by a marginal carbon-intensity rate for where you're getting your electricity from. If, like me, you live in Germany, and everything is coal, that goes really big, and it's a sad time. Plus the embodied emissions of the hardware used to run your software. If you're constantly buying new servers, that number goes up. If you do things like some interesting work that came out of, I think, Telefonica in Spain, where they pulled together a bunch of old Android phones to build mini edge data centers, that's one way you can start being more efficient. All of that is divided by the functional unit: processing an order in your e-commerce store, or whatever else.

What Can You Do?

I think we all agree that we want to use less carbon and save some of our planet's resources while we still can. There are a few actions you can take. We're going to mostly focus on two of them: energy efficiency, using less energy to do the same work, and hardware efficiency, using less hardware to do the same work. They are, surprisingly, not the same thing. Then, if you have done all of those things, or have workloads that just require a lot of energy and you can't really do anything about it, you can shift when or where those workloads run, to use cleaner energy.

Should I just write everything to be really fast? Is that going to solve all of my carbon problems? No. This comes from a paper on the energy usage of different programming languages. Java, for example, is in fifth place for both energy consumption and time, but uses six times more memory than C to do that work, which means a lot more hardware to do that kind of work. You need to look at these things more holistically to understand what works for your software. Even if you wrote everything in C, in really efficient, beautiful C with no memory bugs (pinky promise, no memory bugs), it still doesn't tell the full story, because very few individual applications use a whole CPU all of the time. We have workloads that shift throughout the day.

Most servers use 30% to 60% of their maximum power when they're doing nothing. That's a lot of power. Improving compute density often matters more than the efficiency of a single application. Some of you are thinking, I put my apps in containers, I run them in Kubernetes. Is that enough? Am I done? Can I go home now? Planet saved. Seventy percent of CPU is unused in most cloud Kubernetes deployments. That is a lot of C, over not a lot of R.

If we take a look at the evolution of compute: we started out deploying to bare metal. Scaling up was making a purchase order from Dell and waiting, while your users were sad. Then they gave you some more servers and you racked them in your data center. Then we decided that running single-purpose servers was really dumb, and computers got better, so we could do virtualization. We could run multiple applications on one machine, each with a whole copy of an operating system and a kernel, and all of the overhead that comes with that. It was still better than one application per server. Then containers came along, and we shifted that down: we started sharing the kernel. But then we started shipping a whole userland with every application anyway, and now we have containers that can be several gigabytes in size. It sucks whenever OpenSSL needs to be patched and you need to go patch all of your containers, like, now. It's a bad time.

WebAssembly (Wasm)

With those very leading question marks, let's talk about WebAssembly. What is WebAssembly? WebAssembly is basically just a portable compilation target, originally designed for web browsers, so you could run things that weren't JavaScript in the browser alongside your JavaScript. It gives you a memory-safe, sandboxed execution environment for any language, which gets really interesting, especially when you combine it with WASI. WASI is a set of portable, modular, WebAssembly-native APIs that Wasm code can use to interact with the outside world. That gives you the boring, typical access to files and file systems. It also gives you generic interfaces over things like key-value stores, queues, and HTTP, which means you can swap out different components of your application without having to change your code. That's really cool the first time you see it live, and it works.
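To make that concrete, here is a minimal sketch of coding against one of those generic interfaces. The Kv API names assume a v1-era @fermyon/spin-sdk and may differ across SDK versions; the point is that the code names a logical store, not Redis or SQLite.

```typescript
// Sketch only: assumes a v1-era @fermyon/spin-sdk Kv API.
import { Kv } from "@fermyon/spin-sdk";

export function recordVisit(user: string): void {
  // "default" is resolved by runtime configuration, not by this code.
  // Locally it might be backed by SQLite; in production, by Redis.
  const store = Kv.openDefault();
  store.set(`visited:${user}`, new Date().toISOString());
}
```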

What Makes Wasm Great?

What makes Wasm great? Wasm binaries are really small, because you're just shipping your application's code, not 3 million copies of OpenSSL and a kernel for funsies. They're a little bit bigger than just building an x86 binary, but not many people are just shipping binaries to production. They're really fast to start. Unlike pulling a container, where you then have to go and build a whole layered file system and do a bunch of startup work, you can start and shut down a WebAssembly module in under a millisecond. That's not after you've pre-warmed some execution environment. You pull the WebAssembly, you exec it, you close it down, and it's a millisecond. That's really cool.

They're also portable. I have a Kubernetes cluster that's an Intel NUC, an Arm NUC, and a RISC-V board. It's one cluster, and you deploy the same bit of code, and it runs on all of them. They're also well isolated. With sandboxed execution, linear memory, and capability-based security, it's really hard to mess up that badly. WebAssembly slots in here: your application runs in some WebAssembly execution context on the host, like Spin, which I'll be talking about a little later. If you need to patch things like OpenSSL, like I said, you do it once on the host, not in all of your applications.

Serverless

Let's talk about serverless, or as we used to call it, FastCGI. What is serverless? A lot of talks about serverless don't really talk about what it is, and it means different things to different people. Most of those things fit into two buckets: a type of application, and a development model. The type of application I think of as something event driven, ephemeral, and mostly stateless. If you are building serverless WebSockets, for example, your gateway handles the state management of keeping sockets and sessions alive, and defers to your code only when it needs to do event handling, and you can deploy those independently. The development model is one where you focus on your business logic, leave all the cruft to the platform, and don't really need to think about where your code is going to run.

That doesn't really answer why. There are still servers running your code, so why can't I just manage those myself? Focusing on business logic means you spend less time dealing with things that don't matter to you. Especially if you work in a larger organization, you can bury a lot of the complexity and let the average developer be very happy: write that business logic, move on with their day. It's a good time. It also gives you more flexible scaling, because you don't have to handle all of that state management in your process, and because those binaries are really small and really fast to start, you never have the complexity of needing to wait 10 minutes for a container to boot and build its caches.

Or, if you're deploying some types of software, that might be more like 6 hours. We've all been there. Because WebAssembly modules don't have the concept of a cold start, and because you don't need to do all of that stuff, in an ideal serverless world you can get a lot more density, because you're doing a lot less in your application.

Early serverless was backed mostly by micro VMs and containers. That's a problem, because those container images tend to be really large: you're not just shipping your code, you are shipping a whole Linux distribution in a box. Not only do they take a lot of disk and take a long time to download to every instance where they need to run, they can also take seconds to minutes, plus however long your cache warming takes, to be ready to serve traffic. That means you often scale up far more than you need, which reduces the overall density of your clusters. That really sucks.

A lot of people's things aren't actually in that high demand. Eighty-one percent of Azure functions are invoked once per minute or less, but they stay running all of the time, with whatever is serving the traffic alongside your code. That can be prohibitively expensive. Even more amusing to me, from a waste perspective, is that 45% of them are invoked once per hour or less, and they stay running all of the time. That's really sad, especially in the land of Kubernetes, where your average cluster can do 100 pods a node, but most run more like 20 to 30. That is a lot of servers for not a lot of compute. A lot of serverless solutions are also really vendor specific, with poor local development stories, and poor operational stories when you want to do things like continuous delivery or continuous integration.

What would our ideal serverless look like? It would be language agnostic. Not everyone wants to write Rust. Not everyone wants to write JavaScript. There's a lot of everything in the world. It should have a great developer experience: the way you run things on your laptop shouldn't be so different from how they run in production that they don't look the same, but at the same time, you shouldn't have to spin up a Kubernetes cluster to do your job. It should be well isolated for multi-tenancy, so that you can increase density across all of your clusters. It should be cross-platform; you shouldn't have to run on x86 Linux.

Most of us now have Arm laptops, but probably still deploy to x86 servers. Although with most modern toolchains that doesn't usually break, I've spent enough time fighting Cargo cross-compilation in the last 2 weeks to know that it's still not solved for everyone. Sometimes you want to deliver to Windows, for reasons. Or maybe you're doing some weird thing where you want Apple Silicon in the cloud; I've seen weirder. You should be able to do that too, portable across all of those platforms. It should also be efficient and scalable. You shouldn't be running one function per node; ideally, you're running 10,000, if that's what you need.

That looks a lot like WebAssembly. Language agnostic: anything can target WASI, and language support is constantly improving. At Fermyon, we maintain a list of the state of WebAssembly support in a bunch of languages. Right now, there's pretty good support in things like JavaScript, Python, Rust, Go, and TypeScript. We actually just released a new JavaScript runtime based on SpiderMonkey. It's getting surprisingly good. Running JavaScript in production without having to worry about memory leaks is something that made me finally learn JavaScript. We built a tool called Spin. It's open source, and it's a framework and developer tool for building and running WebAssembly applications. Here's a demo. You do spin new, you pick a template, and you name the app hello-qcon. We're going to write a TypeScript application.
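A rough sketch of that scaffold follows. The template name http-ts and the handler signature are assumptions based on a v1-era Spin JS SDK, and may differ in other Spin versions.

```typescript
// spin new -t http-ts hello-qcon   (scaffolds the app; sketch only)
//
// src/index.ts, roughly as the template generates it:
import { HandleRequest, HttpRequest, HttpResponse } from "@fermyon/spin-sdk";

export const handleRequest: HandleRequest = async function (
  request: HttpRequest
): Promise<HttpResponse> {
  return {
    status: 200,
    headers: { "content-type": "text/plain" },
    body: "hello world", // the demo edits this to "Hello QCon"
  };
};
```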

I'm going to do the npm install dance, and take a look at the code. We're then going to use spin watch, which watches our code and restarts the application every time the code changes. If we call localhost, we get hello world. If we change this to Hello QCon and save, then we get Hello QCon. Getting that kind of feedback while running the same way you would in production is really neat.

It should be well isolated for multi-tenancy. We do that using Wasmtime, a WebAssembly runtime from the Bytecode Alliance, a foundation where a bunch of people who work on WebAssembly come together to build common tooling for everyone. Wasmtime is written in Rust, but has bindings for C and some other languages, so if you want to embed it in your own applications to build things like plugin systems, you can.

Cross-platform: we have built Spin for Arm, x86-64, and even RISC-V. You can run WebAssembly in any kind of thing you want. A few examples: you can just shove it in systemd, and systemd is great. Also in things like Kubernetes, or even Nomad. We recently released SpinKube, which we're in the process of contributing to the CNCF, to simplify running your Spin applications on any Kubernetes cluster in any of the clouds, which is awesome. It should be efficient and scalable.

Demo

Open http://finicky.terrible.systems. We're going to go feed a cat together with WebAssembly. When you get there, you should see something like this. You can play as either Ninja the dog or Slats the cat, our adorable, lovely mascots. I'm going to play as Slats. Then you have to feed the cat whatever it wants. Any time you do anything, that's executing a WebAssembly module in our Kubernetes cluster, which is a mix of Arm and x86 nodes. If I look over here at kubectl top, we get a live set of metrics off those pods. As people play the game, we potentially see load, and this should autoscale; it relies on people being really active in the game. That application is actually really cool, because your average application is maybe written in one programming language, but within your organization, there are many.

This application is written in a mix of Rust, Ruby, and JavaScript. It all runs together as a single WebAssembly application that, just as I ran it in the cloud before, I can also run on my laptop. As you can see, the different modules are shown, along with where they're mounted in the application. If I go here, it's the same application we were just playing, but the data is written to a SQLite database on my laptop, as opposed to Redis and a real database in production, with no changes to the application. Apparently, some things were broken somewhere locally. That's ok. I probably didn't run the database migrations, because we've got to have some problem somewhere. I believe some people call it job security.

Questions and Answers

Participant 1: In that interesting table you had of the 20 programming languages, there was a letter in brackets before each language name. It wasn't clear to me what that letter indicated. Was it I or C? It could have been interpreted or compiled, maybe?

Lancashire: That one gives you the ranking in energy based on everything, so you can see each language's relative position, and which languages are interpreted versus compiled.

Participant 2: If I were to run a Java application in WebAssembly or WASI, for example one using Spring Boot or Quarkus, how would I do that?

Lancashire: Java specifically is a very complicated question in WebAssembly. Right now, targeting WebAssembly from Java involves using a project called TeaVM, which was originally written to enable Java in the browser and eventually got WebAssembly support. The problem is, it doesn't use the OpenJDK class library, it has its own, so only a very limited subset of Java works right now. It's a thing that, for a lot of reasons, we want to improve, but working with Oracle is a process. We don't really know what our path forward is as a community right now. It might be that we invest a bunch of time in TeaVM.

Participant 3: What kind of resources can the workers access, like local storage or network ports? How do you sandbox this whole thing? How do you open it up?

Lancashire: What kind of resources can a WebAssembly module access, and how does that work with things like sockets?

This is a really interesting part of WebAssembly. By default, a WebAssembly component can do nothing. You have to give it the capability of doing those things. You can give it a file system, so it can access local files. Transparently to that application, you can also give it a virtual file system, or block file system access. It can only do what you give it. For things like sockets, we give you the ability to configure the specific things that a module can connect to. For example, you can say that your frontend application for your consumers can talk to a specific backend, but it can't talk to your back-office secret stuff. We'll guarantee that at a runtime level.

One of the more interesting things happening there is that we're expanding that security model to be able to give sealed data to components, so you could do things like mark data as PII in your OpenAPI definition, and then only give the PII-sensitive data to components that are allowed to have it. If you pull in a library for doing analytics stuff, or whatever, you can give it the other data but not the PII data. You can guarantee that at compile time of your application. We're not quite there yet. When we are, it's going to be awesome.
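As a hedged illustration of those runtime-level guarantees: in Spin, capability grants live in the application manifest. The field names below follow the Spin 2 manifest format as best I recall, and the application, component, and host names are hypothetical.

```toml
# spin.toml sketch: capabilities are explicit and default-deny.
spin_manifest_version = 2

[application]
name = "shop"
version = "0.1.0"

[[trigger.http]]
route = "/..."
component = "frontend"

[component.frontend]
source = "frontend.wasm"
files = [{ source = "assets", destination = "/assets" }]    # file system grant
allowed_outbound_hosts = ["https://api.shop.internal:443"]  # no back-office hosts
key_value_stores = ["default"]                              # key-value grant
```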

Participant 3: Have there been any standard performance benchmarks run on Wasm, or maybe efficiency benchmarks like the ones you mentioned just before?

Lancashire: Have there been any benchmarks done on WebAssembly runtimes, not just the sort of density efficiency stuff?

The answer is yes. There was a research paper last year that looked at, I think, five different WebAssembly runtimes versus building the same application natively. I think it was something like a 1.3x to 1.5x overhead at runtime. A lot of advancements have been made in runtime efficiency since then. It's a relatively negligible overhead for what I think is quite a nice benefit, and it's getting better all the time.

Participant 4: I think Docker and containerd are also working on some kind of integration with WebAssembly.

Lancashire: Yes. Docker were the first: they ship Spin as part of Docker Desktop. If you have Docker on your machine, you can docker run a Spin application, and assuming you have the right bits of Docker enabled, it should just work, which is really cool. The containerd project has a thing called runwasi, which is a library for building containerd shims, so you don't have to run the full container machinery to run WebAssembly applications.
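For reference, invoking a Spin app through Docker Desktop's Wasm preview looked roughly like this at the time; the runtime name follows Docker's Wasm workloads documentation of that era, and the image reference is hypothetical.

```
docker run --runtime=io.containerd.spin.v2 --platform=wasi/wasm \
  -p 3000:80 registry.example.com/hello-qcon:latest
```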

As part of SpinKube, we have a containerd shim for running Spin applications, where we'll also do things like cached recompilation of your application. The first time we schedule onto a node, we'll compile it and store it in containerd's content store, and then as you scale up on that node, you won't have to do the WebAssembly-to-native-code compilation again; it'll reuse the same artifact. That means you can go from a single pod to 200, just like that.
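To give a feel for SpinKube, deploying an app is a single custom resource. This sketch follows the SpinKube quickstart as best I recall it, so the field names may have evolved, and the image reference is hypothetical.

```yaml
apiVersion: core.spinoperator.dev/v1alpha1
kind: SpinApp
metadata:
  name: hello-qcon
spec:
  image: "registry.example.com/hello-qcon:latest"
  replicas: 2
  executor: containerd-shim-spin   # the runwasi-based shim described above
```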

Participant 5: I seem to remember that in the browser, at least, WebAssembly was limited to one thread. Is that still something that's limited here, or is that not a thing?

Lancashire: Technically, still yes. There's more support now for doing Async/Await type things, and wasi-threads is getting very close to being ready for most people. Assuming you're offloading a lot of your tangential stuff, like observability, to the host, there aren't a lot of reasons why most event-driven things would need threads anyway. Wasi-threads should be shipping this year.

Participant 6: If you use WebAssembly, can you reach kernel space to use, for example, BPF probes or something?

Lancashire: Should just work.

Participant 6: Can your WebAssembly application use a kernel probe?

Lancashire: It can't access the kernel unless you give it access to the kernel. You could set up bindings for things, but by default, it can't do anything.

Participant 7: You talked about being able to compile Python to WebAssembly. Usually, if you're doing intensive things in Python, you're calling into C libraries; NumPy would be the most obvious example here. Would you be able to take a script using NumPy and compile that to WebAssembly?

Lancashire: You say you can compile Python to WebAssembly, how does that work when you have C extensions like NumPy?

Some things work and some things don't work yet. With NumPy specifically, I'm not sure. My colleague Joel started a project porting a lot of C-based wheels to build for WebAssembly as part of his work on componentize-py, which is the thing that takes Python and builds WebAssembly components. There's no reason why you shouldn't be able to; it's just a case of configuring build systems. It's all fine. I've spent too long looking at LLVM.

Participant 8: I have a question about WebAssembly. Mostly [inaudible 00:35:56], for example, on containers. Vendors of traditional containers are using native instructions so they can be optimized, for example, on Intel to use AVX-512, or its equivalent on RISC-V or Arm. Standard binaries built by distributions don't leverage this, because they are meant to be compatible with old CPUs of the same architecture. When you go down to optimizing binaries to use specific instructions, maybe the difference in overhead becomes bigger.

Lancashire: How do you handle machine specific instructions with WebAssembly when you want to do specific optimizations for what you're running?

WebAssembly already has support for things like SIMD and some other machine-level features. It means your application says, I need this stuff to run, and if the runtime can't provide it, it can't run there. The other thing is, because your WebAssembly application goes through a Cranelift optimization pass when you run it, you can actually specialize a lot of existing software to use better instruction sets where available anyway. If you really care, you might still not want WebAssembly, but a lot of people probably get some extra performance for free. A lot of the SIMD support is coming from browsers wanting SIMD, too.

 


 

Recorded at:

Oct 08, 2024
