Beyond Foundations of F# - Asynchronous Workflows

My previous article for InfoQ, "Beyond Foundations of F# - Workflows", introduced a new language feature: workflows. This article looks at an interesting use of that feature called asynchronous workflows, which aim to simplify .NET's asynchronous programming model.
What is F#?
F# is a statically typed functional programming language that targets the .NET framework. It shares a common core language with OCaml, another popular functional programming language, and draws ideas from many other programming languages, including Haskell, Erlang, and C#. In a nutshell, this means that F# has a nice succinct syntax that feels a bit like scripting, since we can execute code interactively, yet it has all the type safety and performance of a compiled language. This article is not intended to be an introduction to F#, but there are many resources on the web intended to make learning F# easy. See the sidebar "F# Resources" in my first article on F#.

The Asynchronous Programming Model

When working with the .NET BCL, all I/O operations use one of two models: the synchronous model or the asynchronous model. The asynchronous model is supported through a common programming pattern in which a pair of BeginXXX and EndXXX methods is provided. The programmer starts the operation by calling the BeginXXX method, which begins the operation and then returns. The programmer must call the EndXXX method once they are notified that the asynchronous operation has ended.

In my experience most programmers tend to use the synchronous model because of its simplicity and because there are many classes in the BCL that only support the synchronous model, but in many cases the asynchronous programming model can produce more responsive, more scalable applications. To illustrate the difficulties of the asynchronous model, let's take a look at a simple example of opening a file and reading bytes from it. Here's what the code looks like in the synchronous version:

#light
open System.IO

let openFile() =
    use fs = new FileStream(@"C:\Program Files\Internet Explorer\iexplore.exe",
                            FileMode.Open, FileAccess.Read, FileShare.Read)
    let data = Array.create (int fs.Length) 0uy
    let bytesRead = fs.Read(data, 0, data.Length)
    printfn "Read Bytes: %i, First bytes were: %i %i %i ..."
        bytesRead data.(1) data.(2) data.(3)

openFile()

The BCL provides simpler ways of doing this, such as "File.ReadAllBytes", but this is the simplest approach that has an asynchronous equivalent. Reading from a file is very straightforward: we open a file stream, create an array to hold the data, and finally read all the data into the array. Notice how we use a "use" binding when creating the file stream. This is roughly equivalent to the using statement in C#, and means the file stream will be disposed when it drops out of scope.

Now let's take a look at the equivalent using the asynchronous programming model:

#light
open System.IO

let openFile() =
    let fs = new FileStream(@"C:\Program Files\Internet Explorer\iexplore.exe",
                            FileMode.Open, FileAccess.Read, FileShare.Read)
    let data = Array.create (int fs.Length) 0uy
    let callback ar =
        let bytesRead = fs.EndRead(ar)
        fs.Dispose()
        printfn "Read Bytes: %i, First bytes were: %i %i %i ..."
            bytesRead data.(1) data.(2) data.(3)
    fs.BeginRead(data, 0, data.Length, (fun ar -> callback ar), null) |> ignore

openFile()

While this simple example of opening a file is still pretty understandable, things have definitely become more complicated. The first couple of steps are much the same: open a file stream and create an array to hold the data. From then on things get worse. We need to define a callback to handle calling the "EndRead" method, and this callback needs to be passed to the "BeginRead" method along with a state object (here we pass null as we don't need it). It is also important to notice that we can no longer use the "use" binding: the file stream would drop out of scope when the "BeginRead" method exits, so it would be disposed too soon and would not be available when the "EndRead" method is called. This means we need to add an explicit call to its "Dispose" method, and we lose the safety of having this called in a finally block. While these extra complications seem reasonable for a simple example like opening a file, you soon run into problems as you add more functionality and further asynchronous reads to the application.

Asynchronous Workflows

Asynchronous workflows were introduced to tackle this specific problem. So now let's take a look at the asynchronous workflow version:

#light

open System.IO
open Microsoft.FSharp.Control.CommonExtensions

let openFile =
    async { use fs = new FileStream(@"C:\Program Files\Internet Explorer\iexplore.exe",
                                    FileMode.Open, FileAccess.Read, FileShare.Read)
            let data = Array.create (int fs.Length) 0uy
            let! bytesRead = fs.ReadAsync(data, 0, data.Length)
            do printfn "Read Bytes: %i, First bytes were: %i %i %i ..."
                bytesRead data.(1) data.(2) data.(3) }

Async.Run openFile

The most important thing to notice about the workflow version is that it only differs from the synchronous version by a few characters. An "async { ... }" workflow declaration has been added and, more importantly, we've changed the line that reads from the file:

let! bytesRead = fs.ReadAsync(data, 0, data.Length)

We've added a bang (!) to the let keyword, and we're now calling "ReadAsync" instead of "Read". In an asynchronous workflow "let!" tells us that the bind will be made asynchronously, and the ReadAsync function provides the specification of how the BeginRead and EndRead methods should be called. Note that if we didn't use an asynchronous function here we'd get a compile-time type error. So where did "ReadAsync" come from? It is not a function normally available in the "FileStream" class. Observant readers will have noticed the line "open Microsoft.FSharp.Control.CommonExtensions", which opens a namespace containing many F# type augmentations. These are very similar to C#'s extension methods and allow you to add extra functions to existing classes. The namespace "Microsoft.FSharp.Control.CommonExtensions" provides many augmentations for use with asynchronous workflows.
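
To give a feel for what such an augmentation might look like, here is a minimal sketch written against current F# rather than the release used in this article. The member name ReadViaBeginEnd is hypothetical, chosen only so that it does not clash with existing members, and the sketch assumes the present-day library functions Async.FromBeginEnd and Async.RunSynchronously. It is not the library's actual implementation of "ReadAsync":

```fsharp
open System.IO

// Hypothetical augmentation: adds a new member to the existing Stream class.
type System.IO.Stream with
    member stream.ReadViaBeginEnd(buffer: byte[], offset: int, count: int) : Async<int> =
        // Describe how the BeginRead/EndRead pair should be invoked.
        Async.FromBeginEnd(buffer, offset, count, stream.BeginRead, stream.EndRead)

// Use the augmentation from a workflow, reading from an in-memory stream.
let readSample =
    async {
        use ms = new MemoryStream([| 10uy; 20uy; 30uy |])
        let data = Array.zeroCreate<byte> 3
        let! bytesRead = ms.ReadViaBeginEnd(data, 0, data.Length)
        return bytesRead, data
    }

let bytesRead, data = Async.RunSynchronously readSample
printfn "Read %i bytes, first byte was %i" bytesRead data.[0]
```

The augmentation itself contains no threading logic; it only tells the workflow machinery which Begin and End methods belong together.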

It's also important to notice that we can still use the "use" binding to dispose the file. Even if this means the file is disposed on another thread, we don't care: it just works.
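
The behaviour of "use" inside a workflow can be observed with a small experiment. The sketch below assumes current F# (Async.Sleep and Async.RunSynchronously), and the TrackedResource type is hypothetical, invented here purely so that disposal can be detected:

```fsharp
open System

// Hypothetical resource that records whether it has been disposed.
type TrackedResource() =
    let mutable disposed = false
    member this.IsDisposed = disposed
    interface IDisposable with
        member this.Dispose() = disposed <- true

let resource = new TrackedResource()

let workflow =
    async {
        use r = resource
        // An asynchronous step; the workflow may resume on another thread.
        do! Async.Sleep 10
        // Still inside the scope of "use", so the resource is not yet disposed.
        return r.IsDisposed
    }

let disposedInside = Async.RunSynchronously workflow
let disposedAfter = resource.IsDisposed
printfn "Disposed inside workflow: %b, after workflow: %b" disposedInside disposedAfter
```

The resource is still alive after the asynchronous step and is disposed only once the workflow has finished, which is the safety the hand-written callback version gave up.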

The other notable change is how we execute the work. The "openFile" identifier does not open the file immediately; it is a workflow, that is, an action waiting to happen. To execute this action we need to use the "Async.Run" function, which executes a single workflow and waits for its result.

I found it helped my understanding of how asynchronous workflows work to add a call to a little debugging function either side of the call to "ReadAsync". This lets us see which thread the program is executing on, along with that thread's stack trace, but I'll leave that as an exercise for the reader:

open System
open System.Diagnostics
open System.Threading

let printThreadDetails() =
    Console.WriteLine("Thread ID {0}", Thread.CurrentThread.ManagedThreadId)
    Console.WriteLine((new StackTrace()).ToString())

It is also a good point to refer back to my original article on workflows (http://www.infoq.com/articles/pickering-fsharp-workflow) to see how the "let!" is de-sugared to a continuation function. This will help you to understand how the asynchronous workflow can just start up again on another thread after the "let!".
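
As a quick reminder of that de-sugaring, here is a rough sketch. It assumes current F#, where the "async" builder exposes Delay, Bind and Return and where Async.RunSynchronously plays the role of Async.Run; the code the compiler really generates differs in detail:

```fsharp
// A workflow written with the "let!" sugar.
let sugared =
    async {
        let! x = async { return 20 }
        return x + 1
    }

// Roughly the same workflow written by hand: everything after the
// "let!" becomes a continuation function passed to Bind.
let desugared =
    async.Delay(fun () ->
        async.Bind(async.Return(20), fun x ->
            async.Return(x + 1)))

printfn "%i %i" (Async.RunSynchronously sugared) (Async.RunSynchronously desugared)
```

Because the rest of the workflow is just a function, whichever thread completes the asynchronous operation can invoke it.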

Quantifying Performance Gains

So what kind of performance gain can we expect when using asynchronous workflows? As with nearly all performance-related questions, this is a difficult question to answer without resorting to experimentation. Typically programs are either computation bound or I/O bound, and by using asynchronous workflows you will generally see improvements in both cases. However, it should be noted that the hardware you are using will have a big effect on this. If your task is I/O bound then you will not see much improvement unless your disk offers good concurrent access, and while there are some disks available that offer very good concurrent access, these tend to be fitted to high-spec servers rather than laptops or desktops. If your task is processor bound then you will typically see better gains, as most modern laptops and desktops are fitted with dual core processors, and many readers may be thinking about ordering themselves one of the quad core models that are coming on to the market. This means that by using asynchronous workflows correctly you will be able to harness some of this extra processing power.

Let's work through an example of a task that does both I/O and computation work to see what kind of performance gains we get. Suppose we have some ASCII text that we wish to analyse. A first step might be to count the total number of words and then count the number of unique words. Opening and reading the file will provide the I/O work, while counting the words and computing the unique words will create the computational overhead. For my test I've chosen to download all the works of Henry Fielding from Project Gutenberg (mainly because Fielding has sufficiently few works that you can download them without going mad, unlike Shakespeare or Dickens).

First we need a script to analyse the works synchronously:

#light
open System
open System.Diagnostics
open System.IO
open System.Text.RegularExpressions

let path = @"C:\Users\robert\Documents\Fielding"

let readFile filePath =
    // open and read file
    let fileStream = File.OpenText(filePath)
    let text = fileStream.ReadToEnd()

    // find all the "words" using a regex
    let word = new Regex("\w+")
    let matches = word.Matches(text)
    let words = { for m in matches -> m.Value }

    // count unique words using a set
    let uniqueWords = Set.of_seq words

    // print the results
    let name = Path.GetFileNameWithoutExtension(filePath)
    Console.WriteLine("{0} - Words: {1} Unique words: {2} ",
                      name, matches.Count, uniqueWords.Count)

let main() =
    let filePaths = Directory.GetFiles(path)
    for filePath in filePaths do readFile filePath

main()

As you can see, our script is very straightforward. First we open and read the file, then we count all the words using a regular expression (here we define a word to be one or more consecutive word characters), then we count the unique words simply by creating a set. The "Set" type is part of F#'s native libraries and models a set in mathematics. It is an immutable data structure and will do an efficient job of computing the unique words in the document, but this will still be fairly computationally intensive.

Now let's examine the asynchronous version:

#light
open System
open System.IO
open System.Text.RegularExpressions
open Microsoft.FSharp.Control.CommonExtensions

let path = @"C:\Users\robert\Documents\Fielding"

let readFileAsync filePath =
    async { // open and read file
            let fileStream = File.OpenText(filePath)
            let! text = fileStream.ReadToEndAsync()

            // find all the "words" using a regex
            let word = new Regex("\w+")
            let matches = word.Matches(text)
            let words = { for m in matches -> m.Value }

            // count unique words using a set
            let uniqueWords = Set.of_seq words

            // print the results
            let name = Path.GetFileNameWithoutExtension(filePath)
            do Console.WriteLine("{0} - Words: {1} Unique words: {2} ",
                                 name, matches.Count, uniqueWords.Count) }

let main() =
    let filePaths = Directory.GetFiles(path)
    let tasks = [ for filePath in filePaths -> readFileAsync filePath ]
    Async.Run (Async.Parallel tasks)

main()

As we can see, the file reading function changes little, other than having an "async { ... }" workflow wrapped round it and calling the "ReadToEndAsync" function instead of "ReadToEnd". The changes to the "main" function are more interesting. First we map our list of files to a list of asynchronous workflows and bind it to the identifier "tasks". Remember that at this point the workflows have not been executed. To execute them we use "Async.Parallel" to transform the list of tasks into one workflow that will execute them in parallel, and then we use "Async.Run" to run that combined workflow.
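
The same shape can be tried without any files. The sketch below assumes current F#, where Async.RunSynchronously plays the role of Async.Run, and uses a hypothetical squareAsync function as a stand-in for "readFileAsync":

```fsharp
// Hypothetical stand-in for readFileAsync: a workflow that does a little work.
let squareAsync n = async { return n * n }

// Building the list creates the workflows but does not run any of them.
let tasks = [ for n in 1 .. 4 -> squareAsync n ]

// Combine them into a single workflow, then run it and wait for every result.
let results = Async.RunSynchronously (Async.Parallel tasks)

// The results array follows the order of the input list.
printfn "%A" results
```

Nothing happens until the combined workflow is run; the list comprehension only describes the work to be done.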

I ran the tests on my laptop (a dual core) using F# interactive, which has some excellent timing facilities. My methodology was simple: I ran both of the scripts once and threw away the results (to get rid of the effect of disk caching), then I ran each script three times:

             Sync     Async
First Run    16.807   12.928
Second Run   16.781   13.182
Third Run    16.909   13.233
Average      16.832   13.114
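
The averages in the table, and the relative difference between them, can be reproduced with a few lines of F#. This is only a sketch using the figures quoted above; the identifier names are illustrative and it relies on current F# library functions such as List.average:

```fsharp
// Timings copied from the table above.
let syncRuns  = [ 16.807; 16.781; 16.909 ]
let asyncRuns = [ 12.928; 13.182; 13.233 ]

let syncAvg  = List.average syncRuns
let asyncAvg = List.average asyncRuns

// Share of the synchronous running time that the asynchronous version saves.
let timeSavedPercent = (syncAvg - asyncAvg) / syncAvg * 100.0

// How much more work per unit of time the asynchronous version gets through.
let speedUpPercent = (syncAvg / asyncAvg - 1.0) * 100.0

printfn "Sync %.3f, Async %.3f" syncAvg asyncAvg
printfn "Time saved %.1f%%, speed-up %.1f%%" timeSavedPercent speedUpPercent
```

The two percentages answer slightly different questions, so it is worth being clear which one is meant when quoting a gain.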

So the asynchronous version takes about 22% less time than the synchronous version on a dual core machine (a speed-up of roughly 28%), which is not bad for modifications that only changed a couple of lines. But why don't we see a 100% speed-up, that is, the running time halved? The answer is quite simple: this task isn't fully computation bound. If we added more computational work to our algorithm, perhaps counting the number of occurrences of each word or creating groups of similar words, then we would see the percentage speed-up increase.

Reading and processing files is not the only place that asynchronous workflows are useful; they can also be used for network programming. In fact, as network data access tends to be slower than disk access, using asynchronous workflows to avoid blocking threads while network access completes can have even more benefits than using them for file access.

Conclusion

Asynchronous workflows tackle a very specific problem: how to use the .NET asynchronous programming model correctly. They provide the most elegant solution available on the .NET framework. Using the asynchronous programming model can help make your applications more scalable, and asynchronous workflows can help you do this more easily.

Further Reading

Jeffrey Richter covered the asynchronous programming model, and some of the problems and solutions when implementing it using C#, here.

Asynchronous workflows are covered in Chapter 13 of "Expert F#" (APress, December 2007), with further examples in Chapter 14.
