From Imperative to Functional and Back: Monads are for Functional Languages

The past few years have seen an influx of ideas from functional programming (FP) languages into mainstream imperative languages. Not only have lambdas and higher-order functions found their way into Java, C++ and other languages, but so have even more advanced concepts like monads, imported from the purest of FP languages – Haskell.

A pure function is a concept that equates the mathematical notion of a function with the software notion of a subroutine (often, confusingly, also called a function). A pure function is a subroutine that behaves like a mathematical function: its only input is its arguments and its only output is its return value, and it may have absolutely no other effect on the world. It is therefore said that pure functions are free of side-effects: they may not engage in I/O, and may not mutate any variables that may be observed by other pure functions.

A pure functional programming (PFP) language is one where all subroutines must be pure functions. Contrast this with non-PFP, or imperative, languages, which allow impure functions, i.e. subroutines that, unlike mathematical functions, might very well affect external state. Therefore – to confuse matters further – most functional languages (including Clojure, OCaml and Erlang) are imperative. The source of the confusion is the term functional programming itself, which is used by languages that call themselves functional but aren't pure, and which doesn't even have a commonly agreed definition. What is certain, though, is that functional languages are imperative except for pure ones – and there are hardly any pure functional languages out there. The only PFP language that has seen any use in the industry is Haskell, and even Haskell has hardly been used in industry in its twenty-year existence. If PFP is so rare, why is the monad, one of its flagship concepts, finding its way into more and more imperative languages – from C++, through Java, all the way to JavaScript? That has to do with a clever solution PFP has for a problem that imperative languages don't even have. What is the problem? What is the solution? And why do imperative languages now think they have a problem where they didn't before? Let’s try to answer these questions and more.

The Mighty Monad

The PFP approach presents a challenge that is immediately obvious. If a pure function is not allowed to do any I/O and in a pure functional language all subroutines are pure functions, how do we make PFP programs do anything useful? PFP solves this conundrum with a clever and elegant idea called the “monad”.

Monads have long eluded simple explanation. It is said that once one understands monads one loses the ability to explain them. I believe I have found a way out of this devilish catch-22: I don't fully understand monads, so I might be able to explain them, at least partially. And if after the following explanation you don't completely understand them either, then you will be in the enviable position of being able to pass on your partial understanding to others, which is, hopefully, better than nothing. For some assistance in understanding, please refer to the great illustrated guide Functors, Applicatives, And Monads In Pictures.

To reiterate, the problem monads attempt to solve is making inherently impotent pure functions – which are by definition limited to computing a single result from a meager bunch of parameters – do some useful work. A monad is two things: a context that wraps a value, and a rule describing how to combine such wrapped values into a sequence of computation steps. I will provide some examples right away, but the idea is that the context contains some information on how to put the value to work, and the combination rule describes how to combine the individual return values and contexts from many functions into one useful unit of work.

Our first example will be a monad I'll call []. The [] monad implements a counter of sorts. It wraps a value – say 3 – in the [] context, like so: [3, 0]. The first element is the wrapped value, and the second has a meaning in the context of the composition rule, which says that if we have an existing monadic value [a, n] and we're asked to compose it with a monadic function x -> [y, k] (don't worry, it will become clear in a second), we do so by passing the function the value a and adding the returned k to the original n, to yield [y, n + k] (i.e. the second element is our counter). To make things concrete:

f(x) = [x + 1, 2] (this is a monadic function: it returns a monadic, i.e. wrapped value)

g(x) = [x * 2, 3]

z() = [3, 1]

Now we compose our monadic functions with [] (applying operations from left to right, as explained in the following paragraph):

z .[] f .[] g => [8, 6]

I've chosen “.[]” to mean "compose with the [] monad composition rule". The result is [8, 6] because z returns [3, 1]; then f is passed the 3 and returns [4, 2], which composes to [4, 3] by our composition rule, which adds the numbers in the second element; finally, g is passed the 4 and returns [8, 3], which composes into the final result [8, 6].

That, dear reader, is a monad. The beauty of these creatures is that our wrapped, monadic values are also just values, so a pure function is allowed to return them; yet note how the monadic pure functions were able to affect the "state" of the second, accumulated component – without ever having it passed to them as a parameter – by virtue of the composition rule.
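To make this concrete in code, here is a minimal sketch of the [] monad in Java 8 (all the class and method names here – Counted, compose and so on – are mine, invented just for illustration):

import java.util.function.Function;

// A Counted<T> pairs a value with an accumulated count: [value, count].
final class Counted<T> {
    final T value;
    final int count;
    Counted(T value, int count) { this.value = value; this.count = count; }

    // The composition rule: feed the wrapped value to a monadic function
    // and add its count to ours.
    <R> Counted<R> compose(Function<T, Counted<R>> f) {
        Counted<R> next = f.apply(value);
        return new Counted<>(next.value, this.count + next.count);
    }

    @Override public String toString() { return "[" + value + ", " + count + "]"; }
}

class CounterMonadExample {
    static Counted<Integer> z()      { return new Counted<>(3, 1); }      // z()  = [3, 1]
    static Counted<Integer> f(int x) { return new Counted<>(x + 1, 2); }  // f(x) = [x + 1, 2]
    static Counted<Integer> g(int x) { return new Counted<>(x * 2, 3); }  // g(x) = [x * 2, 3]

    public static void main(String[] args) {
        // z .[] f .[] g
        Counted<Integer> result = z().compose(CounterMonadExample::f)
                                     .compose(CounterMonadExample::g);
        System.out.println(result); // prints [8, 6]
    }
}

Running main prints [8, 6], matching the hand calculation above.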

Ok, so how does this solve the I/O problem? Why, with the I/O monad, of course, which I'll call {}. This monad wraps the value x in the context, i.e. monadic value, {io-op, x}, where io-op is the name of an I/O operation, which, in our example, will be either print or read (or, since monads also require a neutral operation, noop). Our monadic composition rule, which we'll demonstrate momentarily, takes these monadic values and actually performs the I/O operations. This raises an obvious question: how can the rule be allowed to perform any I/O operation in a pure functional language? This is the second part of the trick: in PFP, the I/O monad's composition rule is not written in the PFP language but provided as a built-in construct by the language's runtime.

Let’s see this in practice using a simple example of outputting a message to the screen, reading the input, and then displaying that to the screen. A pair of the form {print, x} is composed by printing the value x to the console and passing nothing (i.e. an empty value which, depending on the language, could be called null or void) to the next monadic function, while a pair of the form {read, x} ignores the value x (which can therefore be empty, denoted as a “_” below), reads a value from the console and passes that to the next monadic function (for completeness, noop performs no I/O operation, but simply passes the value on to the next function in the chain):

p() = {print, "What is your name?"}
r() = {read, _}
n(x) = {print, "Hello, " + x + "!"}

The program p .{} r .{} n would therefore print "What is your name?" to the console, read the user's response – say, "Ron" – and finally print "Hello, Ron!"

In this way, the pure functions produce their {io-op, x} values, and the runtime composes them by doing the actual I/O. Problem solved!
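Here is a rough sketch, again in Java with invented names, of how such a scheme could look: the pure functions only describe I/O by returning an IoStep value, and the small run method at the bottom plays the role of the impure runtime that interprets the chain:

import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.function.Function;

class IoMonadSketch {
    enum Op { PRINT, READ, NOOP }

    // The monadic value {io-op, x}: a description of an I/O operation, not its execution.
    static final class IoStep {
        final Op op;
        final String value;
        IoStep(Op op, String value) { this.op = op; this.value = value; }
    }

    // p() = {print, "What is your name?"}
    static IoStep p(String ignored) { return new IoStep(Op.PRINT, "What is your name?"); }
    // r() = {read, _}
    static IoStep r(String ignored) { return new IoStep(Op.READ, null); }
    // n(x) = {print, "Hello, " + x + "!"}
    static IoStep n(String x)       { return new IoStep(Op.PRINT, "Hello, " + x + "!"); }

    // The "runtime": composes p .{} r .{} n by actually performing the I/O
    // and passing each step's result on to the next monadic function.
    static void run(List<Function<String, IoStep>> program) {
        Scanner in = new Scanner(System.in);
        String passed = null;
        for (Function<String, IoStep> stage : program) {
            IoStep step = stage.apply(passed);
            switch (step.op) {
                case PRINT: System.out.println(step.value); passed = null; break;
                case READ:  passed = in.nextLine();                        break;
                case NOOP:  passed = step.value;                           break;
            }
        }
    }

    public static void main(String[] args) {
        run(Arrays.asList(IoMonadSketch::p, IoMonadSketch::r, IoMonadSketch::n));
    }
}

In a real PFP language the chain would be built with proper monadic composition rather than a list, and the interpreter would live in the runtime, but the division of labor is the same: the functions stay pure, and the runtime does the dirty work.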

Astute readers will ask, won't a function containing the sequence p .{} r .{} n, then, have I/O as a side-effect and therefore not be a pure function? Well, the PFP language will not actually evaluate the sequence in place, but will make the calling function return the sequence expression all the way back to the runtime, which will then evaluate it (lazily) and perform the actual I/O operations.

Now that we've addressed the toughest side-effect, I/O, with pure functions and monads (well, a built-in monad), we can solve other problems too. For example, in many imperative languages, a subroutine may return a value, but it also may throw an exception. An exception thrown in the middle of a computation will skip the remaining computation stages, until it is caught and handled in some way. We can achieve the same with monads: our monadic value will be a context that contains either the function's normal return value or an error, and the composition rule will pass a normal value to the next monadic function, but will abort the chain – i.e. not invoke further functions in the sequence – if a function returns an error.
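A sketch of such an error monad in Java might look as follows (again, Result, ok, fail and compose are invented names; this is an illustration, not a library API):

import java.util.function.Function;

// A Result<T> wraps either a normal value or an error; compose() passes
// normal values on to the next function and short-circuits the chain on error.
final class Result<T> {
    final T value;
    final Exception error;
    private Result(T value, Exception error) { this.value = value; this.error = error; }

    static <T> Result<T> ok(T value)       { return new Result<>(value, null); }
    static <T> Result<T> fail(Exception e) { return new Result<>(null, e); }

    <R> Result<R> compose(Function<T, Result<R>> f) {
        if (error != null)
            return Result.fail(error); // abort: later functions in the chain are never invoked
        return f.apply(value);
    }
}

class ResultExample {
    public static void main(String[] args) {
        Result<Integer> r = Result.ok(10)
                .compose(x -> Result.ok(x * 2))
                .compose(x -> x > 15 ? Result.<Integer>fail(new Exception("too big")) : Result.ok(x))
                .compose(x -> Result.ok(x + 1)); // skipped, because the previous stage failed
        System.out.println(r.error != null ? "error: " + r.error.getMessage() : "value: " + r.value);
    }
}

This is essentially the behavior of exceptions – skip the rest of the computation and surface the error at the end – recreated with nothing but pure functions and a composition rule.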

The pertinent question is this: why would monads, a PFP construct designed to solve a PFP problem, ever find their way into the very core of some imperative languages, which by definition allow subroutines to freely perform I/O, throw exceptions, change state, and do all of the above at once?

The Problem with Threads

While PFP languages chain computations using exotic monads, imperative languages chain them by simply running one after another on an ordinary thread. Threads are, of course, used by the runtimes of PFP languages too, but they are less present as a language construct, with their associated stack semantics. On the other hand, threads and their stacks are very visible in imperative languages, even in those, like JavaScript, that only support a single thread.

The p .{} r .{} n program above would typically be written in an imperative language as:

p();
name = r();
n(name);

The most powerful contribution of the thread abstraction to this program is what happens inside the r function, which reads the user's name from the console: the thread suspends itself – it blocks, preserving the values of any local variables on the stack, and yields its CPU core to other threads or processes. Once the I/O operation completes, the thread unblocks and continues where it left off, with all local variables intact. It is precisely this ability – blocking – that allows imperative code to chain the calls in the program so easily. But therein lies the problem.

Modern servers are required to handle a large number of concurrent connected clients. The number of concurrent requests they need to serve is determined by Little's law:

L = λW

where λ is the average rate of incoming requests, W is the average time it takes to handle a single request, and L is the required server capacity, i.e. the number of concurrent requests it must serve. If a server's capacity drops below L it becomes unstable and unavailable (see here and here for a longer discussion and experimental results).
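To make the formula concrete with some made-up numbers: if requests arrive at λ = 1,000 requests per second and each takes W = 0.5 seconds to handle on average (most of it spent waiting on I/O), the server must be able to hold L = 1,000 × 0.5 = 500 requests in flight at any given moment; double either λ or W, and the required capacity doubles with it.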

There are many factors influencing the server's effective capacity. The number of TCP sockets the OS can support places an upper limit on the number of concurrent requests, as does available processing power. The effective capacity, then, is the minimum of all such upper limits placed by various system components, both software and hardware. As it turns out, one of the most constraining upper limits is imposed by the number of threads the kernel can effectively schedule, which is somewhere between 2,000 and 20,000. The reason for that is the relatively heavy cost – both in RAM and in scheduling latency – imposed by the kernel's implementation of threads. In contrast, modern operating systems can support (when properly configured) up to about two million open TCP connections. Thread scheduling, then, reduces the server's capacity by about two orders of magnitude. (The effect is probably not quite as drastic, since other factors – such as RAM bandwidth – are more constraining than the number of active connections.)

So even though our elegant blocking thread – just as in our simple console example – would be the perfect way to manage each connection between client and server, it can't bear the load of modern server requirements. The thread, the imperative unit of concurrency, is not sufficient to handle the concurrency of our problem domain by assigning one thread per connection. The very abstraction that powers imperative programming's composition of computation has lost its effectiveness because its implementation – in the OS kernel – is too heavyweight!

What are we to do, then? Why, we must ditch the blocking thread, the most powerful abstraction of imperative programming! Instead of managing each connection with a thread, which results in simple imperative code, we will use only a small number of threads – only about as many as available processing cores – and never block. Thus was non-blocking asynchronous programming born.

Popularized by the server-side JavaScript platform Node.js, the asynchronous I/O style is now standard in most modern I/O APIs, sometimes offered in addition to the traditional blocking API (for example, Java's standard JAX-RS API for the construction and consumption of RESTful services offers both blocking and asynchronous flavors).

As a simplistic (and rather contrived) running example, let's consider a service that performs financial operations by doing calculations and consulting financial data. Using Java 8 syntax, suppose the service exposes an (also somewhat contrived) API consisting of a single method: double compute(String op, double x). We can then use it, for example, inside a computePensionPlan or a computeApartmentValue operation:

try {
   double result;
   result = compute("USD -> euro", 100.0);
   result = compute("subtractTax", result * 1.3);
   result = compute("addInterest", result);
   publish(result);
} catch(Exception ex) {
   log(ex);
}

However, the computation may need to consult a remote financial data service in order to retrieve current exchange, interest and tax rates, and may therefore block the thread for a while. Since our code is part of SaaS software potentially serving many users, we wouldn't want to block the thread servicing the request, because, as we said, our thread pool size is limited.

Instead, let's go async and use callbacks. The API method now looks like this:

void compute(String op, double x, BiConsumer<Double, Exception> callback)

and our code becomes:

compute("USD -> euro", 100.0, (result, ex) -> {
   if (ex != null)
       log(ex);
   else
       compute("subtractTax", result * 1.3, (result, ex) -> {
           if (ex != null)
               log(ex);
           else
               compute("addInterest", result, (result, ex) -> {
                   if (ex != null)
                       log(ex);
                   else
                       publish(result);
               });
       });
});

Now that ain't pretty. Instead of blocking the thread waiting for the response, we supply the service with a callback, which will be called when the operation completes. Once it does, we may issue another operation, and supply it with another callback and so on. This quickly becomes unwieldy, and so this callback-based asynchronous programming style has rightly earned the name “callback hell.”

I will not present an example of issuing several operations in parallel and then joining their results – doing so is possible in both the synchronous, blocking style and the asynchronous style, but requires further discussion that would take us too far off course. Let’s just say that the gap between the simplicity of the blocking approach and the incomprehensible mess of the asynchronous approach is much wider when managing several I/O requests in parallel, with the async approach requiring some rather sophisticated grappling with synchronization constructs.

Monads to the Rescue!

And it was in this state of dazed, aimless wandering, lost without our principal abstraction, that PFP found us, offering help in the form of a new god, a PFP god – the mighty monad. With monads we can chain callbacks as stages in a computation, without the need for ugly, unreadable nesting.

Libraries employing monadic composition to chain I/O processing without blocking kernel threads have sprung up in imperative languages – the beautiful ReactiveX (C#, Java, JS), Node.js Promises (JS), Facebook's C++ Futures – and monadic composition has even found its way into the JDK itself, with the introduction of CompletionStage (and CompletableFuture, which implements it) in Java 8.

Let's see how Java 8's CompletionStage may help us with our financial service. The API would now look like so: CompletionStage<Double> compute(String op, double x). Instead of returning a simple double-precision value, the compute method now returns a monadic value wrapping the result in a context. This particular context says that the value may not be immediately available, but will only be accessible when the lengthy operation completes. Our code now looks like so:

compute("USD -> euro", 100.0)
   .thenCompose(result -> compute("subtractTax", result * 1.3))
   .thenCompose(result -> compute("addInterest", result))
   .whenComplete((result, ex) -> {
       if (ex != null)
           log(ex);
       else
           publish(result);
   });

Do you see the monadic composition (hint: thenCompose)? This is much better than the naive callback-based approach. Note how exceptions are propagated and only need to be handled in the end; monads can do that.

So monads are now all the rage in imperative languages (or rather, monad-style composition, for there is some dispute over whether these are actual monads; the wrapper, CompletionStage, may not be a real value – but let's leave that academic discussion to the PFP people, after thanking them for delivering us from callback purgatory). In fact, even though monads were introduced in Haskell as a core abstraction in 1996, almost twenty years ago, and in imperative languages only in the past couple of years, there are now more people using monadic composition in Java (or JavaScript) alone than the total number of people using Haskell in production.

The Problem with Monads (in imperative languages)

While this monadic (or monad-like) approach is much better than nesting callbacks, I find it unsuitable for imperative programming. Even if we rise above our earthly pragmatic outlook and overlook the fact that this style requires us to abandon almost every I/O-related API we're using in our code, replacing them wholesale with new (mostly nonexistent and certainly non-standard) APIs – even if we go back to basics and consider the fundamental tools of our trade – we find serious problems.

Suppose we wanted to do this:

result = compute("USD -> euro", 100.0);
while (!satisfactory(result)) {
   result = compute("subtractTax", result * 1.3);
   result = compute("addInterest", result);
}
publish(result);

This loop probably doesn't make much sense in a real financial use case, but please forgive my financial ignorance and merely consider it an illustration of the challenge of implementing an asynchronous loop. How do we do that with monads – not in Haskell, but in Java/JS/C++? It can be done, but it probably requires introducing another named function or some such hack. Bottom line: we'd best not use loops.

Next up, exceptions. While we do use the traditional exception types, we treat them as values, completely ignoring how exceptions are thrown and caught. This makes PFP people rejoice, but not us, as we shall see. If compute wants to signal an error, it must not throw an exception; that would not only violate the requirement that monadic functions have no side effects – it would not even work at all! Our thread doesn't call compute directly (except for the first invocation); the exception would be thrown in some worker thread, depending on the particular implementation of the CompletionStage. No, in order to signal an error, compute must create a CompletableFuture, call completeExceptionally(new Exception("bummer")) on it, and return it. The problem isn't the syntax; you can wrap those statements in a function called throwish if you like and do throwish(new Exception("bummer")). The problem is that our client code can't use try/catch; in short – no exceptions either.
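For illustration, here is roughly what signaling an error looks like in this style (the body of compute is invented here – the article only specifies its signature – and isSupported and doCompute are hypothetical placeholders):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

class MonadicErrorSketch {
    static CompletionStage<Double> compute(String op, double x) {
        if (!isSupported(op)) {
            // No throw: build an already-failed future and return it as a plain value.
            CompletableFuture<Double> failed = new CompletableFuture<>();
            failed.completeExceptionally(new Exception("bummer: unknown op " + op));
            return failed;
        }
        return CompletableFuture.supplyAsync(() -> doCompute(op, x));
    }

    static boolean isSupported(String op) { return !op.isEmpty(); } // placeholder
    static double doCompute(String op, double x) { return x; }      // placeholder

    public static void main(String[] args) {
        // Note: no try/catch around the stages is possible; the error travels
        // through the chain as data and surfaces only in the final callback.
        compute("", 100.0)
            .whenComplete((result, ex) -> {
                if (ex != null) System.out.println("failed: " + ex.getMessage());
                else            System.out.println("result: " + result);
            });
    }
}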

After abandoning threads and blocking, we have now also abandoned imperative control flow (well, some of it) and exceptions, and are required to replicate simple, built-in operation chaining, control flow and error handling with library calls! "So what?" will say the PFP people. "We don't have loops and exceptions and we do fine with monads and recursion alone! You should thank us for abstracting away such basic language concepts into noble higher-order operations."

And that is why I find this grafting of foreign concepts unsuitable: not only because most core language constructs are now replicated in an incompatible way that forces us to wrap and translate things when we move from one "world" to the other – the one inside the monad and the one outside it – but because those constructs are all entwined with imperative programming's core abstraction, the thread, which gives them their power.

Consider the exception. The exception in an imperative language isn't just a value signalling an error; it also carries context telling us exactly where the error occurred. Where does it get the context? From the thread's stack. But suppose our monadic compute encounters an error in one of its stages, and percolates it up the monadic chain for us to log. Examining the logs will not reveal whether the failure happened when our code was called as part of our computePensionPlan operation or computeApartmentValue operation; the stack context is lost.

This loss of the stack context has implications not only for post-mortem debugging but also for profiling. Suppose we run our program through a profiler and find that compute is responsible for 90% of our application's time. Is most of that time spent when we call compute from computePensionPlan or from computeApartmentValue? We'll never know, because that information is lost. We will just see plenty of calls to compute from some anonymous thread pool, not knowing why or where they were called.

Other than losing valuable debugging and profiling information, we lose other, more subtle, benefits that the old blocking style brought us. Even though we are no longer nesting callbacks, when using monadic composition each of the computation stages is still a callback: we receive the result of the previous stage as a function parameter rather than as the function's return value. Instead of result = compute(...); doStuff(result); we have result -> doStuff(result). The difference between the two is that between a pull API and a push API, respectively. To show why this matters, let's look at the following scenario: we have a library that delivers a stream of messages.

The library might expose a pull API:

m = receive()

or a push API:

onReceive(m) { ... }

Both styles are duals, i.e. there is a constant-time transformation from each to the other. I encourage you to watch the informative talk Contravariance is the Dual of Covariance by Erik Meijer, inventor of the ReactiveX libraries, which explores this duality in some detail. But while push and pull are indisputably duals, I personally believe – contrary, I presume, to Erik Meijer – that pull is always superior to push in imperative languages.

First, it is more general. If the library supplied us with a pull API and for some reason we'd like to use a push API, we can convert the former to the latter trivially:

while(true)
    onReceive(receive());

On the other hand, we cannot convert a push API to a pull API without introducing an additional data structure (a queue).
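For instance, a minimal sketch of such a conversion in Java (the class is hypothetical; any queue-like data structure would do):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Turns a push API into a pull API by buffering pushed messages in a queue.
class PushToPull<M> {
    private final BlockingQueue<M> queue = new LinkedBlockingQueue<>();

    // Registered with the library's push API: onReceive(m) { ... }
    void onReceive(M m) {
        queue.add(m);
    }

    // The pull API we expose on top of it: m = receive()
    M receive() throws InterruptedException {
        return queue.take(); // blocks until a message has been pushed
    }
}

Note that an unbounded queue like this one also gives up backpressure (more on that below); restoring it would mean bounding the queue and blocking the pushing side, too.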

Second, the pull API conveys much more information to the programmer, who knows exactly which thread will receive the message – the thread that called receive. In the case of the push API, things are not so simple. Will the messaging library always call onReceive on the same thread? If not, can onReceive be called concurrently? The answers to these questions are not immediately apparent, but must be gleaned from the (hopefully documented) semantics of the library.

Finally, the pull approach conveys more information to the library itself. The call m = receive() contains two signals: one from our code to the library (the call to receive), and the other from the library back to our code (the return of the result). The first signal is important as it automatically creates backpressure: the messaging library will not flood us with messages faster than we can handle them; we will receive a message only when we're good and ready to ask for it. This is not the case with the push approach. Indeed, a new standard for the JVM and JS called Reactive Streams introduces explicit backpressure for push-based APIs, while in the pull approach backpressure is handled automatically and is immediately apparent to anyone reading the code.

A push API has no advantages and quite a few disadvantages, and we can still use the same neat functional transformations on the message streams (like those used in functional reactive programming, e.g. RxJava) even with a pull API. In fact, those transformations can be even more powerful (that discussion is beyond the scope of this post, but see here for an example).

To summarize, when using monadic composition in imperative code we:

  1. Lose the ability to use our existing APIs, and must retrofit all our code to the new, foreign style.
  2. Recreate control flow and exceptions in the monadic library on top of those constructs that are already built into the language.
  3. Lose stack context when debugging and profiling.
  4. Lose threading information and automatic back-pressure.

Monads are amazing – for PFP languages. Those languages have elegant syntax tailored for monads, and they implement all of their imperative-style computations on top of them. They do exceptions only with monads; I/O only with monads. In PFP, monads are an uber-abstraction used for, well, lots of purposes.

In imperative languages they're just a pain used to plug another pain; we're getting the worst of both worlds, and for what? One thing only: they make asynchronous programming – basically, non-blocking threads – less ugly. Nobody would have made monads so pervasive in I/O code in imperative languages if threads had only behaved.

We Can Do Better, Much Better

The thing is, threads are a simple and beautiful abstraction; their only problem is that the kernel's particular implementation of that abstraction is too heavyweight to support the needs of modern servers.

Why drop a good abstraction – the core abstraction of imperative programming and its basic unit of software concurrency – in favor of a foreign one, built by an alien culture and for a different reason, just because of an insufficient implementation? Can't we just fix the implementation? It turns out we can, and it's not too hard either.

The reason OS-provided threads have both a high RAM overhead and significant scheduling costs is that they must be completely general; they must support all languages and widely different behaviors, from a thread playing video, or one running a physics simulation for a game, to one serving an HTTP request. If we target a particular runtime/language that we control, and optimize scheduling for a narrower use case, we can lower both RAM consumption and scheduling costs to virtually zero (relative to the asynchronous style), while still preserving this useful, core abstraction.

If we target a particular runtime, we can use small, growable stacks rather than the large, fixed-size stack provided by the OS. If we design our user-mode threads to optimally handle transaction-serving workloads – characterized by short bursts of work followed by triggering other threads and blocking – we can use a work-stealing scheduler, which is especially suited to this kind of workload and yields just the right kind of CPU-cache behavior.

This is precisely the design of user-mode threads, aka lightweight threads, aka fibers, in Erlang, Go and Quasar – Parallel Universe’s open-source fibers library for the JVM. Fibers – a user-mode thread implementation that makes threads practically free to create and block – embrace imperative programming's core abstraction without impeding in any way the use of functional techniques where they are appropriate; they simply optimize its implementation. Fibers' low overhead makes these lightweight threads suitable again as a software unit of concurrency that directly models the domain's unit of concurrency. It is no problem to assign a fiber to handle a single HTTP request, or even more than one fiber if some parallelism is needed. With a virtually unlimited number of threads, concurrency becomes much more straightforward, as no effort is required to avoid imperative programming's core abstraction: you can block all you want without worrying about the abstraction's cost. We can use loops and exceptions as provided by the language, and we get our stack traces back. Best of all, we can keep using the same thread-blocking APIs we already have. Our code once again becomes:

try {
   double result;
   result = compute("USD -> euro", 100.0);
   result = compute("subtractTax", result * 1.3);
   result = compute("addInterest", result);
   publish(result);
} catch(Exception ex) {
   log(ex);
}

and its performance is just as good as the monadic version.

Fiber implementations are also available for Node.js, Python and other languages. C# and ClojureScript have "semi"-fibers (i.e. ones implemented with stackless continuations, which suspend a computation but capture only a single stack frame) via their async/await and core.async capabilities, respectively (Clojure on the JVM can use Quasar's true fibers).

Fibers are not completely devoid of problems. They are tailored for a specific runtime, so calls into low-level C code may interfere with their operation if that code blocks (interoperation with C is one of the reasons the Rust language by Mozilla decided to opt for a kernel-thread implementation rather than userspace threads out of the box). Tools like debuggers and profilers need to be aware of the runtime's use of fibers, although that is a relatively straightforward issue to address.

When introducing fibers with Quasar to a platform (the JVM) that doesn't support them natively, the biggest problem we faced was that libraries need to be made aware of them. This is not hard to do – any library exposing an asynchronous, callback-based API can easily be wrapped to expose its blocking API in a fiber-friendly manner. In fact, any monadic function can be automatically transformed into a fiber-blocking one. For example, when using Quasar, the call get(compute(...)) turns the monadic version of compute into a fiber-blocking version (a rough sketch follows below). Similarly, we have already wrapped lots of I/O libraries, so that many of them can be used with fibers without any extra work. All the work put into asynchronous I/O libraries has been put to good use, only now with much better APIs and composition.
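To give a flavor of what that looks like, here is a rough sketch of running the blocking-style financial code inside a Quasar fiber. The Fiber, SuspendableRunnable, SuspendExecution and AsyncCompletionStage.get names are recalled from Quasar's documentation and should be checked against it; the compute stub stands in for the CompletionStage-returning service from the earlier examples, and Quasar's bytecode instrumentation agent is assumed to be set up:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutionException;

import co.paralleluniverse.fibers.Fiber;
import co.paralleluniverse.fibers.SuspendExecution;
import co.paralleluniverse.strands.SuspendableRunnable;
import static co.paralleluniverse.fibers.futures.AsyncCompletionStage.get;

class FiberStyleSketch {
    // Stub of the article's asynchronous API, so this sketch is self-contained.
    static CompletionStage<Double> compute(String op, double x) {
        return CompletableFuture.completedFuture(x);
    }

    // Declaring SuspendExecution marks this method as suspendable; each get(...)
    // suspends only the fiber while the asynchronous compute is in flight,
    // not a kernel thread.
    static double businessLogic()
            throws SuspendExecution, InterruptedException, ExecutionException {
        double result = get(compute("USD -> euro", 100.0));
        result = get(compute("subtractTax", result * 1.3));
        return get(compute("addInterest", result));
    }

    public static void main(String[] args) throws Exception {
        Fiber<Void> fiber = new Fiber<Void>((SuspendableRunnable) () -> {
            try {
                System.out.println(businessLogic()); // stands in for publish(result)
            } catch (ExecutionException ex) {
                System.err.println(ex.getCause());   // stands in for log(ex)
            }
        });
        fiber.start();
        fiber.join();
    }
}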

Conclusion

It was not my intention to compare and contrast the advantages and disadvantages of PFP and imperative programming; they both have plenty of each. But their approaches to composing a sequence of computational steps are fundamentally different, and strongly tied to the philosophy underlying each. Borrowing PFP's approach of monadic composition into imperative languages yields the worst of both worlds: it doesn't quite enjoy the elegance monads have in a pure environment, and it prevents us from enjoying imperative programming's strengths and interoperability with other imperative code. Worse, the only reason for importing the PFP abstraction is not a lack of a more appropriate abstraction – the thread – but a flaw in its common implementation, a flaw that is rather easily fixed in most imperative runtimes by the introduction of fibers.

For further discussion, see From Imperative to Pure Functional and Back: Monads vs. Scoped Continuations

About the Author

Ron Pressler is the founder of Parallel Universe, a Y Combinator company building a JVM-based server-side stack for easily writing high-performance, low latency applications that work in harmony with modern hardware architecture rather than fight it. Prior to founding Parallel Universe, Ron worked on air-traffic control and missile-defense systems, as well as large, clustered physics simulations.
