Transcript
Hellerstein: My name is Joe Hellerstein. I'm a professor at UC Berkeley in computer science, and a fellow at Sutter Hill Ventures. I'm going to be talking about serverless computing, CALM lessons, and a new stack for programming the cloud. There's a story we like to tell in computing that with every new generation of platform that gets invented, a programming model emerges that allows third-party developers to unlock the unique properties of that platform in new and unexpected ways. This story goes back at least as far as the minicomputers of the early '70s, with Unix and C, for which Ritchie and Thompson were given the Turing Award. We see similar patterns as we look over time at the disruptive platforms that have been invented. What's interesting is what's missing from this slide, the biggest platform for computing that humankind has ever assembled, the cloud. We don't have a programming environment that is suited to its unique physics.
The Big Question
The big question that I have, and that we're going to talk about, is how folks will program the cloud in a way that fosters the unexpected innovation that takes advantage of its properties. I'll say before we get into it that distributed programming is hard. It's harder than the other challenges we faced on the previous platforms. It entails issues of parallel computing, of the consistency of data that's replicated across multiple geographies, and of the possibility that pieces of the system have failed while it's still running. Modern autoscaling, where things are supposed to grow as usage grows and shrink as usage shrinks to save money, only makes it harder. In my mind, programming the cloud is one of the grand challenges for computing over the next decade. It's something I've been working on and I'm very passionate about.
Outline
In this talk, we're going to have four chapters. In the first we'll talk about serverless computing, which is an early leading indicator of cloud programming. We'll talk about state and coordination, which are the hardest things to do in a distributed programming language. Then we'll talk about some foundations, the CALM theorem, that help us figure out when it's easy to deal with state and when it's hard. Then finally, I'll talk about work we're doing in my group right now on a project called Hydro, which is building a language and compiler stack for programming the cloud.
Serverless: Signs of Interest from Cloud Vendors
Let's begin with serverless. I view serverless computing as a leading indicator of interest from the cloud vendors in making the platform truly programmable to third parties. Work on serverless computing goes back at least as far as 2014. It's not brand-new. Lambda was introduced on AWS in 2014. It's really come to the fore in the last four to five years. The idea of serverless computing, particularly as instantiated by something called Functions as a Service, or FaaS, is really simple. You write a little function, you can write it in your favorite language, Python, JavaScript, Java, and you launch it into the cloud. This little function can take an input and produce an output. Once you launch it to one of these FaaS platforms, clients from the internet can send inputs to this thing, and it'll produce outputs for them at whatever scale of clients you manage to recruit. If you have a very popular function, the FaaS platform will scale up to meet all your demand. If you have a function that's not being used, the FaaS platform will scale down so you're not billed for any usage. It's a really attractive way to think about the cloud and how to program it. The promise of FaaS then, is that it's like a boundless computer. It's programmable. It's as big as you need it to be or as small as you need it to be. It knows no bounds on compute and storage. The reality of FaaS, however, is much less attractive. It's what I like to call an elastic army of incommunicado amnesiacs. Let me tell you what I mean by that.
Serverless Function Limitations
Serverless functions have the positive aspect that they give you this boundless compute. What's negative is that the functions themselves have to be simple enough to run on a single computer. They're basically a laptop's worth of computation. You can have as many laptops' worth of computation in the cloud as you need, but each of those computations is completely isolated. This is caused by the fact that serverless platforms don't let functions talk to each other: there are no network messages. That means no distributed computation out of a platform that is fundamentally the biggest distributed computer you can imagine. We've really handicapped ourselves in terms of the physics of the cloud; we're not taking advantage of the ability to have functions that span multiple computers.
A second problem is there's no low-latency data storage or access from these functions. They can access remote cloud services like S3 object storage or remote databases, but they don't have any local storage for very fast persistence, which means that typically, they're stateless. They have no information that's kept across invocations. Then the third big missing piece with these functions and Functions as a Service platforms is that they're made to reboot every few minutes. Because they have no storage, they can't squirrel away in memory what they knew before they were rebooted. They're basically reborn with no memory; they're amnesiacs. This is why I call it an army of incommunicado amnesiacs. They have no ability to talk to each other, no ability to remember anything. We wrote about this in a paper that I'd encourage you to read. The spirit of this is not so much that serverless computing is bad, but rather that it skipped some fundamental challenges, and if we solve those challenges, we would really have a programmable cloud in our hands. Let's get to work.
Serverless as a Leading Economic Indicator
Before we get into that work, though, I do want to point out that serverless computing is really interesting as a leading economic indicator. The first 15 to 20 years of the cloud is what I call the boring revolution. It was surely a revolution in the sense that folks like Jeff Bezos were able to overturn the industry and shift billions of dollars of revenue from legacy enterprise vendors like IBM and Oracle, to cloud vendors that didn't previously exist, like AWS and Google Cloud. Of course, Microsoft was able to weather this transition internally. That's a big deal. The truth is, it's still legacy enterprise software. What can you get in the cloud today? Mostly, you get databases, and queuing systems, and application services, and load balancers, and all the stuff we already had before. From a fundamental computing perspective, it's not that exciting. It seems somehow we haven't taken advantage of what the cloud could let us do in terms of innovation.
For 15 years or so, the likes of Jeff Bezos were off pulling off this revolution; they weren't worried about enabling third-party developers to take advantage of the new platform. They were busy taking advantage of it themselves. Now they've won that battle. In 2022, they start to have incentives to grow by exposing the platform and fostering more innovation. I think the time is ripe to really start answering this question: how do we program the cloud? I think there are economic incentives for it now. All we need is for the infrastructure folks, the people who build the programming models and the runtime systems behind those programming models, to roll up our sleeves and say yes to the hard challenges that earlier generations like Functions as a Service took a pass on.
The good news is that in those 15 years, when the cloud vendors were stealing business from the enterprise vendors, a bunch of us were off doing research. We've been quietly able to do a bunch of research. Now it's time to start to harvest that research and put it together into artifacts that people can really use to leverage the power of the cloud. We're going to roll up our sleeves. I think we can do this. The goal really is general-purpose cloud computing, without compromises, without really narrow corner cases. How do you let most people write programs that harness the full power of the cloud? There are three main goals. Simplicity: it should be easy for developers to get on-ramped into this new programming platform, easy to learn, easy to debug, easy to operate. Correctness: we want the programs to run as intended. That is tricky in the cloud, as we're going to talk about. Then, the dynamics of cost and performance: we want efficient code, by which we mean, yes, it should run fast, but it should also only consume the resources that it needs. It should be efficient in terms of my cloud bill, not just in terms of my time.
Toward Generality: Embracing State
To get to that general-purpose computing, the thing we're going to have to do, the hard thing, is to deal with what's called state. In the animation, I'm embracing the state where I live, and you might want to embrace yours. What I mean by state here is the data that functions or computations generate and manage within and across invocations. If you call a function, it generates some information that it keeps in RAM, and then if you call it a second time, it might need to remember some things from the past. I call all of that state. It's really data that's either in memory or could be on storage devices as well. There are two challenges with state, one is hard, one is less hard. The really hard one is a correctness challenge called distributed consistency. We're going to talk about it quite a bit. It's a correctness problem. It's a barrier to simplicity. It really makes programming distributed systems hard to get right, and it's unavoidable. The second challenge is data placement and data movement. Which data should be where, when, for high performance? This performance problem is the stuff that we as engineers are quite good at. You do a prototype, it's a little slow. You profile it, you realize the data that was over here should be over there. You do some adaptations, and over time your system gets fast. Challenge number one is the one I want to talk about.
The Challenge: Consistency
What is the challenge of consistency? The challenge is to ensure that agents that are separated by space agree, or will agree, on common knowledge. We have this lovely couple in the upper right, and they are an example of what I call simple data replication. All we've replicated is x, a variable. We haven't replicated a program and all its meanings, we've just replicated some state, x equals heart. They both agree that this variable x, which could change, it's mutable, right now equals heart, and everything's happy. What happens if they're separated? They can't talk to each other. One of them changes the value of that variable, so now unfortunately, the young woman on the left believes that x is poop, while the young man on the right believes that x is heart. What is going to happen? The problem is that with their beliefs, they will move forward and make more fateful decisions, and step down a forking path of these decisions to the point where if we have to put them back together later, they've made so many decisions based on different assumptions, that we can't put them back together in any sensible way. This is what's sometimes called split-brain divergence. The computer brain across these two people has two different parts that can't be put back together in a sensible way.
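As a tiny illustration of that divergence (my own sketch, not something from the talk), imagine each partner as a replica holding a plain, overwritable copy of x:

```rust
// A tiny sketch of split-brain divergence with plain overwritable replicated
// state: once the replicas disagree, nothing in the state itself says how to
// put them back together.
#[derive(Clone, Debug, PartialEq)]
struct Replica {
    x: &'static str, // the replicated variable "x"
}

fn main() {
    // Both sides start out agreeing: x = "heart".
    let mut left = Replica { x: "heart" };
    let right = Replica { x: "heart" };

    // Network partition: the left side keeps accepting local writes.
    left.x = "poop"; // the right side never hears about this update

    // When the partition heals, the replicas disagree, and each has made
    // further decisions based on its own value -- that's split brain.
    assert_ne!(left, right);
    println!("left = {:?}, right = {:?} -> divergence", left, right);
}
```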
This is not a new problem. This is a classic problem in distributed computing, and it's been solved at some level. On the right you see Leslie Lamport. He won the Turing Award for inventing a consensus protocol called Paxos. What's a consensus protocol? It's a protocol that allows multiple computers to agree on a value, like x equals heart. There's similar protocols from database systems, like the two-phase commit protocol that gets a network of computers to agree on whether or not to commit a transaction. These protocols are well known. I will say, though, that they are tricky. Every time I have to teach Paxos, I really need to bone up on it before I go into the classroom, because otherwise students will get very lost. There's a ton of details, and it's not particularly intuitive. Sometimes in engineering, in math, in computer science, stuff is tricky, but you got to do it. It's worth it because it's cool. It solves a real problem. This stuff, unfortunately, is both tricky and also bad. We shouldn't use this stuff. Don't take my word for it. What do I mean by this?
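To give a feel for what "agreeing whether or not to commit" looks like, here is a heavily simplified, single-process sketch of the two-phase commit idea (my own illustration, not a real protocol implementation with logging or failure handling):

```rust
// Toy two-phase commit: the coordinator must wait for every participant's
// vote before anyone is allowed to commit. That waiting is the essence of
// the coordination the talk is describing.
#[derive(Debug, PartialEq)]
enum Vote {
    Yes,
    No,
}

fn two_phase_commit(votes: &[Vote]) -> &'static str {
    // Phase 1 (prepare): gather votes from all participants. One slow or
    // failed participant stalls everybody.
    let all_yes = votes.iter().all(|v| *v == Vote::Yes);

    // Phase 2 (decide): commit only if every participant voted yes.
    if all_yes {
        "COMMIT"
    } else {
        "ABORT"
    }
}

fn main() {
    println!("{}", two_phase_commit(&[Vote::Yes, Vote::Yes, Vote::Yes])); // COMMIT
    println!("{}", two_phase_commit(&[Vote::Yes, Vote::No])); // ABORT
}
```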
Coordination
In the upper right of this slide is a picture of James Hamilton. James has been around forever. He was involved in the early IBM databases. He was involved in the Windows file system. Over the last 20, 25 years, he's been involved in architecting two of the major clouds, both at Microsoft, and at Amazon. He is one of the key architects of the modern cloud. Some 13 years ago, he gave a talk, and the quote is so good I like to read it like poetry. He talks about it like this. "The first principle of successful scalability is to batter the consistency mechanisms down to a minimum, move them off the critical path, hide them in a rarely visited corner of the system, and then make it as hard as possible for application developers to get permission to use them." It's as if he's saying, "Thank you Dr. Lamport for all the Paxos but we're not going to use that."
Why is he saying this? Why is he tying our hands and not letting us use the solution to the problem at hand? Coordination of the kind he's calling consistency mechanisms, things like Paxos and two-phase commit, involves computers waiting for each other to come to agreement. There are a lot of messages that go back and forth that you have to wait for. That waiting causes other machines to wait, and it builds up queues. I may be waiting for you. Other services may be waiting for me. Services way up the chain may not even know what they're waiting for anymore. What happens is if some junior developer at a big cloud vendor decides to call Paxos in the middle of their program, because they think they might have a bug otherwise, it can cause a cascading effect of queues throughout the cloud that brings things to their knees. This has been documented in some of the major clouds, especially in the early years. James's advice is very pragmatic: we don't want to use coordination. If we have to use it, only the experts get to use it, and only in a corner of the system.
The problem with coordination mechanisms is that they're reasoning in a very conservative way, way down at the bottom of the system, at the level of memory accesses, things like memory access, read and write, or disk access, read and write. Databases are classic for this in terms of their transaction semantics. My former student, Peter Bailis, who is now at Stanford and has a startup called Sisu, had this great cartoon in his PhD thesis. It's an adaptation of The Far Side. It goes like this: what we say to databases is, "Ok, database, now be good. Please move one iPhone out of inventory and into Bob's cart." What the database storage manager actually hears is, "Blah, blah, blah, read, blah write, blah read, blah write." It knows nothing about all that application-level semantics about products and inventories and what it means for there to be less than zero things in inventory or too many things in a cart. It doesn't know any of that. It just knows reads and writes. It's very cautious about what reads and writes it will allow.
What we're seeing is a generational shift in the way we're going to solve the problem. The Lamport-era 20th century approach was to reason at the storage level, and make worst-case assumptions. What we're going to talk about, and what we're going to see in more modern research in the 21st century is going to look at application specific assumptions. What do we know about our application, and how it uses its data and its state, and how the computations in the application affect that state? At some level, the stuff about memory access is really not where the action is anymore. If we want to do this well, we're going to be reasoning about programs, not about I/Os.
When and Why Do We Need Coordination?
The big question, when you start walking down this path of application semantics, is: when do we truly need stuff like Paxos, like Lamport's work? Why do we need it, and when can we avoid it? This question, when do I need coordination and why? If you ask a traditional computer scientist, ask most computer scientists, why do we need a lock in our program? That's a little example of a local form of coordination. Why do I need Paxos? They'll tell you, "We have conflicts around some resource, and if we don't coordinate the order in which people access that resource in space and time, bad things will happen." A typical example in the physical world is this intersection: it hasn't been coordinated, two cars can't be in the same place at the same time, and so you get this problem. There's an obvious solution to this thing, which is to fix the stoplights. Stoplights are great, because then the north-south guys can go for a while. Then the east-west crowd can go for a while. Then the north-south again. At any given time, at least we're utilizing that resource of the intersection. The problem with that, of course, is if the arrival rate of cars is faster than the drain rate that happens while the light is green, we get infinite pileups at these intersections, even with stoplights. What are we going to do? We have to have stoplights. Physics says you can't be in the same place at the same time. Maybe there's a clever engineering solution that doesn't break physics. In fact, there is one and you see it all the time. It's to use another dimension in space: we'll build overpasses, and then we don't have to have stoplights. These overpasses can go at full bandwidth all the time. Of course, you've been stuck in traffic. There are other reasons why traffic backs up besides coordination. The idea that we can avoid coordination in cases where we might have thought we have contention or a race condition makes the question really hard: when do we really need coordination? When is there a clever solution to work around it and not have coordination?
It's a theory question really, at the end of the day. Let's move to our whiteboard. Suppose you understand your program semantics: which programs have a coordination-free implementation? If we could just find it, a clever programmer could build it without using coordination. Those are the good programs; we put them in the green circle. The rest of the programs are the programs that require coordination. There is no programmer in the world who could come up with a solution to this problem without using coordination. It's absolutely intrinsic to the statement of the problem. I want to know what the green stuff is, because the green stuff I can make autoscale and run faster in the cloud and all these good things. The red stuff, I'm going to have to bite the bullet and use coordination. That's the stuff that James Hamilton doesn't want the junior programmers to write. What is that green line? Can you give me a test that will separate the red programs from the green programs? This is a computational complexity or computability problem. What are the programs computable without coordination versus the programs that require coordination? What we're going to do next is look at that theory. It's called the CALM theorem. It's going to inform how we can actually build systems. What we're going to see is that monotonicity is going to be that green circle, the bright-line test for whether coordination is needed. We're going to talk about that, some lessons, and some performance payoffs.
CALM Theorem
Here's the statement of the CALM theorem as it was conjectured in a talk I gave back in 2010, and then proved the very next year by an extremely brilliant graduate student at the University of Hasselt, in Belgium, Tom Ameloot. The theorem says this, a distributed program has a consistent and coordination-free distributed implementation, if and only if it is monotonic. The green line test is monotonicity of the problem. If it's monotonic, we can get a consistent coordination-free implementation. If it is not monotonic, it's outside the green circle, there is no consistent coordination-free distributed implementation. If you want to dig into this research, I recommend you start with a short six-page overview paper that Peter Alvaro and I wrote in CACM, just a couple years ago. It's pretty accessible, and it points to the relevant papers where you can dig deeper. The papers are beautiful.
Let me give you some intuition. We're going to need some definitions, particularly of our key words: consistency, monotonicity, and coordination. We won't get formal definitions, that's what the proofs are for, but we will get some intuition. For consistency, we're not going to use an I/O consistency like Lamport did, we're going to use an application-level consistency. I want the outcomes of the code of the application to be the same everywhere in the network, regardless of whatever shenanigans are going on with the network. Messages getting delayed, or reordered, or having to be resent multiple times, so we receive them lots of times. Regardless of all those shenanigans, I want everybody to compute the same thing. The test we're going to apply is monotonicity. Monotonicity is a property of a function that says if you give it a bigger input, you get a bigger output. Formally, on the left, it says that f is monotone means: if x is a smaller thing than y, then the function of x is a smaller thing than the function of y. Put in a bigger thing, namely y, get a bigger outcome, namely f of y. The cool thing about this is you can think of this as a streaming environment: start by computing at x, all of its outputs are correct. Then if you grow to include the stuff that's in y that wasn't in x, you can compute that, and all the stuff that was from x will still be correct.
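Written out (my notation, with ⊑ standing for that "smaller than" ordering, for example set containment), the definition being described is:

```latex
f \text{ is monotone} \;\iff\; \forall x, y:\; x \sqsubseteq y \;\Rightarrow\; f(x) \sqsubseteq f(y)
```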
With monotonic programs, the early results are guaranteed to be in the final results, so you can stream output to the user without regret. This should give you a hint at why this is coordination-free. Every computer in the system, the minute it knows something is true, is guaranteed that thing will be true in the final outcome, and so it can emit the things it knows are true right away without asking anybody: no coordination. Which leads to the hardest question, actually, it turns out: how do we define what coordination is? That's where Ameloot's proof is really brilliant. He shows that the way to think about coordination is that it's the messages we have to wait for, even if it turns out we have all the data. Suppose I know everything about the answer, and the only thing I don't know is that no one else knows anything about the answer. Then I'm going to run around to everybody saying, do you know anything else I should know? Only after everybody else says they have no idea what I'm talking about can I output things. That work I was doing with everybody, that counts as coordination. The actual work to compute the answer is not coordination; the work to check that nobody knows anything that I need to know, that's coordination.
Easy and Hard Questions
Again, the CALM theorem says that the monotonic programs are exactly the set of programs that can be computed coordination-free with a consistent outcome. Which leads to some easy and hard questions that we can ask of you. The first question is, are any of you over 18? If any one of you sent me a text message or an email or whatever that said, I'm over 18, then I'd have my answer and I'd be done. What about if I asked the question, who is the youngest person to watch this video? For that one, first of all, suppose you are the youngest person who watches the video. The only way you'll know that you're the person who should send me a text is by asking everybody else, are you older than me? Worse than that, suppose somebody younger than you comes along tomorrow and watches the video, then you're not the right answer after all. You shouldn't have said you were the right answer. It's non-monotonic. As more people come along, the answer might change. We can't stream outputs. We might have to retract outputs that we released early.
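To make that concrete, here is a minimal sketch (my own illustration, not code from the talk) that processes a hypothetical stream of viewer ages and contrasts the two queries:

```rust
fn main() {
    let arrivals = [15, 22, 17, 12]; // viewer ages, arriving one at a time

    // Monotone: "is anyone over 18?" -- the first qualifying arrival settles it
    // for good, so we can emit the answer immediately, without consulting anyone.
    for age in arrivals {
        if age > 18 {
            println!("yes, someone over 18 is watching (age {age})");
            break;
        }
    }

    // Non-monotone: "who is the youngest viewer?" -- each new arrival can
    // invalidate the current answer, so any early output may need retraction.
    let mut youngest: Option<i32> = None;
    for age in arrivals {
        if youngest.map_or(true, |y| age < y) {
            youngest = Some(age); // provisional; a later, younger arrival replaces it
        }
    }
    println!("youngest so far: {youngest:?} (final only once no more arrivals can come)");
}
```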
Let's look at a rephrasing of the right-hand question. All we're going to do is rephrase it to say, who is the person that nobody is younger than? I've done the thing where we do a double negation, where we apply De Morgan's laws. Nobody is younger than, so let's read it out. There exists an x such that there does not exist any y, where y is less than x. There is no person younger than this person. The reason I rewrote this is that it's really easy to look at this stuff in logic now and see what's monotone and what's not. The thing on the left is "there exists." The minute we see something that exists, we're done. Anything more that comes along only makes that thing more true. It was true, and it remains true no matter what happens. The thing on the right has got that not sign in front of the exists. When you see not, it's non-monotone, because we might start with there not existing a y. As we introduce more y's, we might find one that does exist, and so we'll start out by saying, Mary is the youngest person watching this video. There's nobody younger than Mary until somebody younger than Mary comes along. That not-exists is something we can't test locally. We need to know about everything to test not-exists. That negation is the key to non-monotonicity. It's very easy to distinguish monotone from non-monotone in these logic expressions.
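In logic notation (my rendering of the two queries just described, writing age for the attribute being compared), the contrast looks like this; the negated existential on the right is the non-monotone part:

```latex
\underbrace{\exists x \,\big(x.\mathit{age} > 18\big)}_{\text{monotone}}
\qquad\text{vs.}\qquad
\underbrace{\exists x \,\neg\,\exists y \,\big(y.\mathit{age} < x.\mathit{age}\big)}_{\text{non-monotone}}
```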
The CALM Theorem
That gives you a flavor for the CALM theorem. Monotonicity is the bright-line test for whether coordination is needed. You might be asking a pretty natural question at this point, which is, why do you need to know this? Very nice, it was good to go to a talk and learn a little computer science, but what's it for? I'm going to try to argue in four different ways that you should understand this, and you should ingest it and somehow use it in your work. First we'll look back at an idea in distributed computing that's familiar to some of you, the CAP theorem. Then we'll talk about right now: what can you walk away with from this talk as a design pattern for thinking about how to build systems that go real fast and are monotone. Just over the horizon, stateful serverless computing: how can we have serverless functions, like we talked about at the beginning, that have state, that have data, and keep them correct? Then finally, we'll transition to the last part of the talk and talk about languages and compilers that can help all of us program in ways that take advantage of monotonicity when we can, and then try to move coordination, as James Hamilton suggested, into the corner, into the background.
A CALM Look Back at CAP
The CAP theorem was a conjecture by my colleague, Eric Brewer at Berkeley. It says that you only get two out of the following three things: consistent outcomes, availability of your system, that is, that it's on and live and accepting updates, and partitioning in the network. Partitioning means that at least one computer can't talk to at least one other computer. A way to think about this is if you have a partition, if there are two computers that can't talk to each other, you have two choices: either you can turn off the service, and then it's not available, or you can let the service run with these two different computers that can't talk to each other, and you'll end up with a lack of consistency. You'll end up with split brain. You won't be able to put it back together when the network heals. That's the CAP theorem. For 15 years, people talked about the CAP theorem: it just doesn't seem quite right, I bet I can beat it. There was a proof of the CAP theorem that came out of MIT, Gilbert and Lynch. An unspoken assumption of that proof, and of a lot of the discussion by CAP theorem advocates, is that the kind of consistency we mean is that low-level I/O consistency that requires coordination. It's a notion of consistency that requires coordination by definition.
As I said, that's tired in today's world. What the CALM theorem does is explain when and why we can beat the CAP theorem and get all three of consistency, availability, and partitioning. Remember that we can have consistency that's coordination-free in the green circle. In the green circle, we can have split brain for a while, while things are partitioned. When they come back together, because they're monotone, they'll just exchange messages and start outputting more good stuff. We can put Humpty Dumpty back together in a coordination-free environment. Coordination-free consistency is possible, even if we're available under partitioning. What are the coordination-free consistency programs? The monotone programs; that's the CALM theorem. CALM is explaining where the happy case is, the coordination-free programs where we can get all three of the CAP things, and where the sad case is. The CAP theorem is really a theorem about that outer red circle, about a worst-case assumption. This definition of coordination is at the heart of the technical differences in the formal proofs of these theorems. What I can tell you is that while the theoreticians chose their formalisms, Eric and I are on the same page about how this relates to what you and I might want to build in the field. Eric and I are pretty connected to practical use cases, both of us involved in the industry.
Design Patterns from CALM
What I want to talk about now are things you can build in languages like C++ and Java that would benefit from the design patterns of the CALM theorem. What we did in my group is apply it in the context of traditional software. We picked a little petri dish of a problem, which is a key-value store. We built a key-value store called Anna. I should mention, on the upper right, this is Anna's hummingbird. It's a kind of hummingbird that was clocked as being the fastest animal for its size on the planet. That research was done at UC Berkeley by colleagues, so we like the name. The idea with Anna is that it's very lightweight, very fast, just as fast as it needs to be for its size, because it's very serverless and autoscaling. The key things with Anna are twofold. First of all, it does give you many kinds of consistency guarantees that are coordination-free, so things in the green circle. It'll give you causal consistency if you know what that is. It'll give you read committed transactions if you're familiar with transaction isolation levels. It does this with no coordination, which means it is always running full-tilt parallel, and it's able to autoscale. Here are a couple of different papers on the Anna system that you can go chase down. Great work by Chenggang Wu, the lead author. He won the ACM SIGMOD Dissertation Award for this work, so a landmark in database systems.
CALM Autoscaling: The Anna KVS
The key thing with Anna is not only is it fast, but it's autoscaling. We started out with the goal of overcoming conventional wisdom on scaling from the first decade of the cloud, or at least of network services. Jeff Dean, around 2009, after about a decade at Google, was giving talks saying that when you build a service, you should design for 10x growth, but plan to rewrite it before you get to 100x growth. That's what we've had to do at Google every time. Those were good lessons for Google back then, but it's a terrible idea today. Serverless computing is all about the idea that my application might not be popular now, but maybe next month it'll take off like a hockey stick, and it'll be 10,000 times more popular than it is today. I as a programmer don't want to rewrite that application three times between today and tomorrow. I want to write it once. I want it to take up as much compute as it needs, up and down.
What does CALM tell us? What we're going to do in Anna is take the idea of coordination-freeness and use it to enforce certain consistency levels. We're going to enforce levels like causal consistency. The way we're going to do it is with data structures that only grow, data structures that are monotone, like sets that we union things into, or counters that only get bigger and bigger. These are what are called semilattices, or, if you've heard of CRDTs, which I'll talk about later, these are like simple CRDTs, but we're composing them into hierarchies so you can have a set of counters of sets, and so on. When you put these things together, each one of them is super easy to reason about, and they're all data structures that just get bigger. We never remove from or lower them. We only make them bigger over time. In doing that, we have monotonicity. The data structures themselves are just C libraries. They're just sets and counters and things. Because we're using them only with methods that grow, like union and increment, we're cool.
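To make the "hierarchies of grow-only structures" idea concrete, here is a minimal sketch in Rust (my own illustration; the names and structure are assumptions, not Anna's actual API) of semilattices that merge and compose:

```rust
use std::collections::{HashMap, HashSet};

// A (join) semilattice just needs a merge that is associative, commutative,
// and idempotent.
trait Lattice {
    fn merge(&mut self, other: Self);
}

// A counter that only grows: merge takes the max.
#[derive(Default, Clone, Debug)]
struct MaxCounter(u64);
impl Lattice for MaxCounter {
    fn merge(&mut self, other: Self) {
        self.0 = self.0.max(other.0);
    }
}

// A set that only grows: merge is union.
#[derive(Default, Clone, Debug)]
struct GrowSet(HashSet<String>);
impl Lattice for GrowSet {
    fn merge(&mut self, other: Self) {
        self.0.extend(other.0);
    }
}

// Lattices compose: a map whose values are themselves lattices, merged keywise.
#[derive(Default, Debug)]
struct MapLattice<V: Lattice + Default>(HashMap<String, V>);
impl<V: Lattice + Default> Lattice for MapLattice<V> {
    fn merge(&mut self, other: Self) {
        for (k, v) in other.0 {
            self.0.entry(k).or_default().merge(v);
        }
    }
}

fn main() {
    // Two replicas update independently, then merge and reach the same
    // state regardless of order -- no coordination needed.
    let mut a = MapLattice::<MaxCounter>::default();
    a.0.insert("views".into(), MaxCounter(3));

    let mut b = MapLattice::<MaxCounter>::default();
    b.0.insert("views".into(), MaxCounter(7));
    b.0.insert("likes".into(), MaxCounter(1));

    a.merge(b);
    println!("{:?}", a.0); // views -> 7, likes -> 1
}
```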
What this is going to allow us to do is scale across threads without coordination. Here we have three objects in the Anna key-value store and one operating thread. We can scale it up and replicate the objects in certain ways. In this case, there are two copies of every object, but each node has a different pair. This is good for fault tolerance. Any node can go down, and we still have all three objects. We can also scale up and down a memory hierarchy. Here, what we're doing is putting cold objects, like the blue one, on disk drives. There's no point allocating RAM in the cloud for those objects if no one's accessing them right now. Hot objects like the green and red ones will promote up into the memory tier where access is fast. We can choose to have things that need to be low latency and fast in memory. Things that are not used very often and are cold can be far away. This is the control we want over state. We want state to be close to us if we need it to be close to us, and far away if it doesn't need to be close to us. We also have the ability to scale back down when the system changes usage, and we have fewer users. Anna can do all this really easily.
CALM Performance
The performance we get out of Anna is phenomenal. What you're seeing up here is that the Anna architecture essentially shares no memory. Even individual threads just talk to each other with messages, and computers talk to each other with messages. What we're seeing is that as we add threads to this thing, which are the individual ticks on the x-axis, it scales linearly in this very smooth way. When we saturate a machine with all its threads, we have a 32-core machine here, we just go to the next machine, and it just keeps scaling, with almost no interruption in the smoothness of that. If you know anything about building parallel and distributed systems, you know that getting smooth scaling like this is extremely difficult. When you're doing coordination-free computing, it's often very easy. Anna has this beautiful scaling phenomenon. More impressive, though, is that it is crazy fast even under contention. That's the case where there are a few objects that are very hot, everybody wants to read them and write them, and lots of objects that are pretty cold. In most systems, those hot objects determine your throughput. In Anna, you can have lots of copies of those hot objects, lots of people can be updating them, and it'll just keep going at maximum performance. What that gives you is ridiculous performance improvements over the competition.
Masstree was the fastest in-memory multi-threaded key-value store we could find at the time from the research community; it came out of Harvard. Anna was 700 times faster in these experiments with contention. It was 10 times faster than Cassandra in a geo-distributed deployment with this contention. It was 350 times the cost-performance of DynamoDB, that is, 350 times faster for the same price. The reason for this is very clear: it's CALM and coordination-free. That means that as we're inserting things into this key-value store and reading them, we never have to do any waiting. We have no atomic instructions, no locks, no Paxos, no waiting ever. What that means, when you do your performance breakdowns, is that the competition on the lower lines of the chart on the right, TBB is Intel's Threading Building Blocks library, Intel's fastest library for hash tables, spends 95% of its time in this workload retrying atomic instructions. These aren't locks, so it's lock-free TBB, but it is atomic, so it needs to successfully update the thing before somebody else peeks at it. It fails, and so 95% of its time, over and over, it's trying to do atomics. Atomics are coordination. They're just not locks, they're a different kind of coordination. Anna, on the other hand, spends 90-plus percent of its time just doing PUTs and GETs. This is what's called good PUT. Anna is doing only good PUT. Ninety percent of its time is spent doing good stuff. And Anna is giving you consistency; this is not just a who-knows key-value store, it's guaranteeing you things like causal consistency, or read committed.
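As a rough illustration of that breakdown (my own sketch, not TBB's or Anna's code), compare a contended atomic compare-and-swap loop, which retries whenever another thread wins the race, against per-thread state that is merged once at the end and never waits:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

fn main() {
    static SHARED: AtomicU64 = AtomicU64::new(0);
    const THREADS: u64 = 8;
    const OPS: u64 = 100_000;

    // Shared hot counter: every update is an atomic read-modify-write that
    // can fail and retry under contention (a form of coordination).
    thread::scope(|s| {
        for _ in 0..THREADS {
            s.spawn(|| {
                for _ in 0..OPS {
                    let mut cur = SHARED.load(Ordering::Relaxed);
                    while let Err(actual) = SHARED.compare_exchange(
                        cur, cur + 1, Ordering::Relaxed, Ordering::Relaxed,
                    ) {
                        cur = actual; // lost the race; retry
                    }
                }
            });
        }
    });

    // Coordination-free alternative: each thread updates private state, and
    // the results are merged (here, summed) once at the end.
    let partials: Vec<u64> = thread::scope(|s| {
        let handles: Vec<_> = (0..THREADS)
            .map(|_| s.spawn(|| (0..OPS).map(|_| 1u64).sum::<u64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    let merged: u64 = partials.iter().sum();

    assert_eq!(SHARED.load(Ordering::Relaxed), merged);
    println!("both approaches count {merged} increments");
}
```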
Lessons from Staying Monotonic
Some lessons from staying monotonic. If you can figure out how to have your data structures, and what you do with them, only get bigger over time, that's monotonicity. Then you can update those things at any time, in any place, and have copies at will. The system will have maximum good PUT even under contention, and you have the ability to be really profligate about your replication. You can do things like replicate horizontally to your peers for fault tolerance, or for load balancing, or for latency. You can have copies of the data close to your users geographically. You can also replicate vertically to faster caches, if you want data in your serverless functions, for example, and you want them to be stateful. Or you can move the data to slow storage if it's cold. You pay more dollars only for the things that are hot. That's how you beat the heck out of something like DynamoDB.
Stateful Serverless Computing
We talked about what we can do, now let's talk a little bit about what's just over the horizon. A couple of years ago at Berkeley, we already prototyped the idea of Functions as a Service, but with state. The trick was to use the ideas of Anna in our Functions as a Service environment, to do updates locally and remember them, as long as the data types are monotone. One of the things that was really interesting about Ameloot's CALM theorem papers is that he showed a third equivalence. I said consistency is logical monotonicity, CALM. He pointed out that the monotonic programs and the coordination-free programs are the same, and they're also the same programs that can be computed by oblivious actors. What's an oblivious actor? It's an actor that doesn't know who all the other actors in the system are. It doesn't even know its own identity. It's just one of an army of clones, not knowing who's in the army, that just computes when it gets messages. It sends messages, gets messages, but it doesn't really know who's around. It's just doing its thing. Obliviousness is just the property we want for autoscaling. It means that if I don't need to know the population, I can add nodes at will, I can remove nodes at will, and the individual nodes just keep doing their thing. The programs you can execute in that environment are the monotonic, coordination-free programs. It's all the same. It is possible, and we've shown it in Cloudburst, to build a Functions as a Service environment that works this way. It's oblivious, so the functions don't know who all is running. It's stateful and communicating, so it's removing those restrictions that we worried about early in the talk with serverless systems. It's got the freedom to scale up: we can add another node and start copying state to it at any time. And to scale down: we can stop taking updates at a node, copy its state elsewhere, and decommission it.
Chenggang and Vikram, the leads on this work, have gone off and started a company called Aqueduct using this architecture. It's doing serverless model serving: take machine learning models that need to do inference or prediction at scale, and you want those things to scale up and scale down with use. That's what Aqueduct is all about. It's an open source library. I encourage you to go check it out if you do machine learning or MLOps. What I want to close this section with is that the shortcuts of Functions as a Service, the statelessness and the lack of communication, aren't necessary if you commit to monotonic code. Monotonicity is the key property that was missing.
What More Could We Want?
So much for the CALM theorem. Now I want to talk about Hydro and what we're doing to make it easier to program the cloud. First, let's ask the question, what more could we want? I told you all these great things about monotone programming, but what I didn't focus on is the idea that non-monotone stuff happens. James Hamilton acknowledges that sometimes, in the corner of the system, you need to use coordination. We really do want to have a programming environment for general-purpose computing that has a mix of monotone and non-monotone stuff. We might want strong consistency. We might want transactions, sometimes. I/O guarantees, maybe we want that; we should be able to have it. Simple thing: I might want to know when my monotone program terminates. Let's go back to my question of, is anyone over the age of 18 watching this talk? I'm sitting here recording the talk and I don't know the answer yet. No one's texted me. I don't know if the answer is no, nobody over the age of 18 is watching this talk, this talk is strictly for kids, or whether I just have to keep waiting. That idea of termination detection, even for monotone programs that haven't finished, requires coordination. I could keep waiting, and somebody might come along and say I'm over 18, and that's good, then I'm done. Until that happens, I just have to wait. I might like to know, is there really nobody over 18? That's a non-monotone question. For all the people in the world who are going to watch this, are they under 18? That's a non-monotone question. Termination detection requires coordination. Then there are some algorithms where computational complexity just tells us they require coordination. The monotone programs don't cover all of polynomial time, which is most of what we do in algorithms. They cover a large fragment of it, but there are programs you might want to compute whose algorithms require coordination, or that use coordination to get efficiency gains, so I want that option.
The second thing I want is a language, a compiler, a debugger that's going to help me with these problems. It's going to address the real concerns of distributed computing. What are the concerns of distributed computing? Here's one: is my program consistent? Even though maybe I'm not sure it's monotone, is it consistent? Can I take the state in my program and partition it across lots of machines by sharding it? Some of the state's on one machine, some of the state's on a second machine, and so on. If I do that, will my program still be correct? What about if I replicate the state, particularly if it's non-monotone, is it going to work out or not? Do I need to use coordination? Do I not need to use coordination? Where in the program should I put the coordination? What about failures? What failures can my program tolerate? What if a machine goes down? What if a network message gets dropped? How many failures can I tolerate before the system goes offline, or starts giving wrong answers? Then, what data is moving around in the system, and where? Who can see it? Is that ok? What if I wanted to control it, how would I control where the data goes? Take any one of these questions and go to your favorite traditional compiler, like the LLVM stack that we all use for compiled programs these days, and it'll look at those questions and pretty much shrug its shoulders. Modern compilers were built to handle the problems you get on your PC, the problems you get on your phone. They were not built to answer the programming problems that arise in the cloud. I want a language, a compiler, a debugger that can answer these kinds of questions. Then, of course, locally on the individual machines we'll call LLVM to compile the local code. That's fine. It does a lot of good things. Hydro is going to try to address these kinds of challenges.
Inspiration is a Query Away: SQL
The inspiration for this is something we're all familiar with. It's just a query away. It's SQL. SQL, the database query language, is the single biggest success story in auto-parallelized programming. Take an SQL query, run it on your laptop, take it to the cloud, run it on a million machines. It's that simple. Not only that, we've been doing this since the 1980s. There's a project at Wisconsin called Gamma, led by Dave DeWitt, who should win a Turing Award for this work, that showed that you could take SQL queries and parallelize them like crazy. He proved it by building it on a very early parallel computer that was incredibly hard to work with. Around the same time, a company called Teradata started to do the same thing. Teradata is still around, they're still cooking. They can run stuff really fast across lots of machines. This has been known since the '80s. A lot of it was reinvented in the big data craze in the 2000s. Honestly, most of the innovation around parallel SQL has been there since the beginning.
Relational queries scale like crazy. Serializable transactions do not scale like crazy. As we take lessons away from databases, we should focus on the query languages, the query processors, the query optimizers, and not worry about the low-level I/O things, like transactions. That's a separate concern that we talk about separately.
I want to highlight a thing that goes back to the dawn of relational databases, and Ted Codd, who also won the Turing Award for this. The idea was that data moves around on disks over time, and queries stay the same. We need to build database systems, namely relational systems, that hide how data is laid out and hide how queries are executed. All you do is state your query, and the system is in charge of worrying about how the data is laid out and, given that, how to run the query. The cloud was invented to hide how computing resources are laid out, and how general-purpose computations are executed. The cloud in a lot of ways is in the state that we were in with database systems in about 1969. It's waiting for somebody to come by and invent a programming model that empowers us to hide this stuff. There's lots to learn from database systems.
Our First Approach: Relational, Data-Centric Programming
The first approach we took in my group to solving this problem was a very relational, data-centric programming language that we called Bloom. Bloom is still available. You can play with it. It's an embedded language inside of Ruby, actually, about 10 years old now. We took a theoretical language for relational queries called Datalog, and we extended it to deal with time and space, so distributed systems problems. We call that Dedalus. Then, Bloom was the practical programming language we were showing to developers that turned Dedalus into something programmable. We demonstrated many of the benefits that I'm alluding to in prototypes of the Bloom language, but they weren't particularly fast implementations and they weren't particularly easy to learn. They're out there, and we wrote papers about them. Folks who are really interested in this area, I encourage you to have a look. What we had built in the Bloom era was a walled garden. We really didn't focus on making this easy for people to learn, or incrementally adoptable from existing languages. We didn't worry about how it integrated with other languages, and we didn't really worry about whether it was that fast, just that it was coordination-free and scalable.
Then the other thing I'll point out about Bloom, which we started fixing and are continuing to fix in new projects, is that a lot of common constructs we want in computing are pretty clumsy to do with SQL or another relational language. The classic one is counters, which are monotonic and just keep going up. They're super commonly used in distributed systems, and we use them in a lot of our coordination-free tricks. They're very hard to do in SQL or a relational language like Datalog. Another thing that's hard to do with those languages is data that really wants to be ordered. I have a list of things, it's ordered, and I want to preserve its order. That's very clumsy to do in relational languages. It can be done, but it's ugly. The programming model is not always friendly to the programmer in this environment.
Competing Approach: CRDTs
There was a competing approach around the same time called CRDTs. These are an object-oriented idea. The idea is that you have objects that have one method, and the method is called merge. It's an object class with a merge function that takes two CRDTs and merges them to make one big CRDT. Things merge together like amoebas, so these objects can get bigger. The only requirement is that your merge function, which can be anything you like, obeys the following rules. It is associative, which means that you can batch these merges up into batches of any size, and it still gives you the same answer. It's commutative, so the order in which you merge things doesn't matter to the outcome. It's idempotent, so if you merge the same object in twice, it doesn't change the outcome. In mathematics, there's a name for a mathematical object that is a type with such a merge function that's associative, commutative, and idempotent: it's called a semilattice. This is an object-oriented interface to semilattices. Because we're computer geeks, we give it a new acronym, and we call them CRDTs.
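Written out as equations (these are the standard semilattice laws; writing ⊔ for merge is my notation), the three requirements are:

```latex
\begin{aligned}
(a \sqcup b) \sqcup c &= a \sqcup (b \sqcup c) && \text{(associative)} \\
a \sqcup b &= b \sqcup a && \text{(commutative)} \\
a \sqcup a &= a && \text{(idempotent)}
\end{aligned}
```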
Unfortunately, CRDTs are broken. I like to call them the Hotel California of distributed state, because they only have a merge method. You can merge state anytime you like, but you can never read safely. If you observe the internal state of a CRDT, you freeze it in time, you say what's in there, you open the package and you try to read from it, that breaks the guarantee of eventual consistency that they're giving you. Depending on when you read, the stuff you see may be non-monotone and change over time. CRDTs are broken. They sound like they're promising you correctness. They promise you correctness if you never look at your data. They don't promise you correctness if you do look at your data. Still, in the spirit of Anna, they're inspiring. They allow us to start thinking about data structures that only get bigger. CRDTs are data structures that only get bigger in a formal sense. They're getting a little bit of adoption. People are starting to pay attention to CRDTs. They're even beginning to have a little impact in some commercial products. Of course, they're being used by developers in ways that, hopefully, avoid the bad cases that they don't guarantee to handle. I view them the way I view serverless functions or NoSQL: they're flawed, but they're a leading indicator that people want something and that we could build something even better.
Another Influence: Lessons from Compiler Infrastructure
I talked about relational languages. I talked about CRDTs. Another big influence on the work we're doing is just compiler infrastructure the way it's done today, which is basically to have stacks of languages. The canonical example is LLVM, which is a very influential piece of software. The key idea in LLVM, there are many ideas, but the one that matters most architecturally, is that it has its own language in the middle, an intermediate representation, an IR as it's called in the programming languages community. You can have many languages at the front of LLVM, like C and FORTRAN and Haskell in this picture, and since then Rust and Swift and a bunch more. First, you translate those to LLVM's internal IR, which is a language. You could actually type it. Then the LLVM optimizer works on that IR to generate an appropriate machine language for the backend, which could be one of many machine languages. It's really three languages. There's the input on the left, there's the IR in the middle, and there's the machine code on the right, all of which, in principle, you can type in text. If you're a performance-sensitive person, you can jump in at these different layers and hack. There are people who change their machine code, not recommended. There are people who will change the LLVM IR that's generated from the frontend, which might sometimes be worth it.
Hydro's Inspiration
Hydro is a new programming language for the cloud. It's inspired by logic languages like SQL, because they give us the analytic power to look for correctness and optimization. If you remember back to our example with the questions, the exists and not-exists questions, the for-alls, those are logic languages. You saw those expressions where we could just look for the not sign and be able to tell that something was non-monotone. Logic languages are like that; SQL is like that. It's inspired by lattices, because really what lattices are is generalizations of what's good about relations to other data types. They allow us to have grow-only things like sets of tuples, which is what relations are. They don't have to be sets of tuples, they could be things like counters, they could be things like maps of other kinds of lattices. Lattices are composable, and they're quite general. This generalizes what we already knew about from database systems, if we use them in sophisticated ways. We're inspired by functional dataflow, things like MapReduce, Spark, and Pandas. Actually, if you crack open any SQL database, there's a functional dataflow runtime inside of it, so it's no surprise that MapReduce, Spark, and Pandas look a lot like SQL systems, or even support SQL as Spark does, because really, they're just the inside of a database like Gamma. Then finally, compiler infrastructure stacks, as we talked about, things like LLVM. We have these multiple languages targeting both generalists and experts. We have a vision paper that we wrote last year, "New Directions in Cloud Programming," that lays out our agenda for what we want to do and why we think we can do it, given what's known in the field as of a year ago.
Hydro: A Programming Stack for the Cloud
This is Hydro in a block diagram. It's a programming stack for the cloud. At the top, many languages that you can program in, much like the left-hand side of the LLVM picture. In the middle, an IR that's declarative and hides the resources and the execution that's in the cloud, very much like an SQL-style language that just says what you want, not how to do it. Our challenge is to compile into that declarative IR. Then we have a compiler that translates it into a dataflow program, and, in particular, into a dataflow program of lattices. In essence, we're taking the idea from CRDTs, but we're turning it into a composable, rich, low-level programming model that can build extensions to the ideas of things like Spark. Then that Hydroflow program actually generates Rust code. The Rust code goes into LLVM, which turns it into executables. Then the challenge is to deploy them and scale them up and down. That's the Hydro stack.
Initial Wins from Hydro
What I'll tell you is we're starting to get some early wins in the stack after a year of work. The first one that's come out as a paper from the team is surprising to me, because it's the one I thought was the biggest science fiction. It's the idea of taking a traditional sequential program written in a language like C or C++, and translating it automatically into a language like HydroLogic. In particular, we simply translate it into a CRDT. All we're doing is taking sequential data structures and converting them into semilattices. This uses techniques called verified lifting that my colleague at Berkeley, Alvin Cheung, invented. The lead student on this, Shadaj Laddad, is just a rockstar, amazing guy. He knocked this out so quickly, it was really impressive. I'm quite optimistic now that getting things into an IR like HydroLogic is going to be within our grasp in the next year or two.
Another piece of the puzzle is compiler optimizations for distributed programs. My student David Chu won an award at the ACM SOSP competition last year for Best Student Research Pitch, when he pitched this problem. The idea is to take techniques like CALM analysis and apply them to protocols in distributed systems like Paxos. What you find is that even though Paxos is a coordination protocol, inside of it there are monotone components that can be scaled up. Even inside the classic protocol for coordination, there are things we can do underneath to scale it. The theme here is that, sure, not all programs are monotone, but most programs are mostly monotone, even programs like Paxos. The challenge here is not only to replicate with CALM, but also to shard the program intelligently. He's using ideas essentially from SQL to do sharding. Then the goal here is to get compiler guarantees of correctness.
Then, finally, at the bottom of the system is this high-performance kernel for distributed execution, local data flow on local machines with network ports at the sides that communicate with low latency. In essence, it's distributed flows of lattices. Then Rust is being used not only to generate really fast code, because it's an LLVM language, but also to do the type checking for the distributed properties that we talked about in the previous slide. We're beginning to make progress on the type system part. The Hydroflow runtime, however, is quite mature. I encourage you to have a look at it. It's got a low-level language that can be used to program it in terms of data flows. We've got examples of programs that are written in Hydroflow today.
Hydroflow Example
Here's a little example of a program that I wrote. It's a program to build a chat server. This is the entire code in Hydroflow for that server, and the first six lines are really just about setting up the network ports. All the logic of the program is in the last four lines. This model of simple data flows or simple lattice flows at the bottom of the system is actually very expressive and compact. It's really cool that you can get programs like this to be so small.
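The slide code isn't reproduced in this transcript, and the sketch below is not Hydroflow syntax; it's just a plain-Rust rendering of the shape of that chat-server logic, where the core is effectively a cross-join of the growing set of connected clients with the stream of incoming messages:

```rust
use std::collections::HashSet;

fn main() {
    // The growing (monotone) set of connected clients.
    let mut clients: HashSet<&str> = HashSet::new();

    // A toy event stream standing in for the network ports.
    let events = [
        ("connect", "alice", ""),
        ("connect", "bob", ""),
        ("msg", "alice", "hello"),
        ("msg", "bob", "hi alice"),
    ];

    for (kind, who, body) in events {
        match kind {
            // A new connection grows the client set.
            "connect" => {
                clients.insert(who);
            }
            // A chat message is cross-joined with every known client.
            "msg" => {
                for dest in &clients {
                    println!("deliver to {dest}: {who} says {body:?}");
                }
            }
            _ => {}
        }
    }
}
```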
Key Takeaways
Embrace state in serverless computing, don't settle for statelessness. Avoid coordination. You can do both, even though currently the vendors won't let you. We can embrace state and avoid coordination in the monotone case. Move to the 21st century: when we talk about consistency, let's stop talking about reads and writes or stores and accesses, let's talk about the consistency of our application outputs. Then, the centerpiece of this whole thing foundationally is the CALM theorem, which says: if your program is monotone, then there is an implementation that's coordination-free and consistent; if your program is not monotone, no such implementation can exist. Finally, I showed you the example of how you can build a system where the data structures only grow, so the monotone data structures, and that system will be coordination-free, and it'll just go like crazy. This Anna programming model, the way that it was architected, I encourage you to look at that paper. The architecture is super simple. The thing just scales like crazy. It's super fast. Finally, the blue lollipop is the one that's not quite ready yet, but I can see it over the horizon: a new language stack for the cloud. The Hydro project at Berkeley is exploring this, and based on years of prior research, I expect we'll have really usable results coming out in the next few years. Programming the cloud, from my perspective, is a grand challenge for computing. I hope more folks get involved, kicking the tires, generating ideas, building competitors. Let's go solve this problem.
Resources
For more information, you can look at the Hydro website, hydro.run. All our code is on GitHub at hydro-project, particularly the Hydroflow kernel. It's quite mature at this point, and it's probably going to be released as v0.1 towards the end of 2022. You can learn more about the Sky Lab at Berkeley at sky.cs.berkeley.edu. The Aqueduct serverless scalable machine learning platform is at aqueducthq.com. If you have to pick three papers, you want to read the serverless paper, "One Step Forward, Two Steps Back," the explainer paper in CACM on the CALM theorem, six pages long, and the vision paper for the Hydro project. Those are three you might want to pay particular attention to.