My introductory book to F#, "Foundations of F#" was published in May 2007. All the examples in book use the core syntax, which we expect to remain unchanged going forward. However, as F# comes out of a research lab it typically sees a release adding new features on at around three to sixth months intervals so F# now includes some features that we're not covered in the book.
This article takes a closer look at active patterns a feature that adds to F#'s already strong pattern matching capabilities.
What is F#?F# is a statically typed functional programming language that targets the .NET framework. It shares a common core language with OCaml, another popular functional programming language, and draws ideas from many other programming languages, including Haskell, Erlang, and C#. In a nutshell this means that F# is a programming language that has a nice succinct syntax that feels a bit like scripting as we are able to execute the code interactively but has all the type safety and performance of a compiled language. This article is not indented to be an introduction to F#, but there are many resources on the web intended to make learning F# easy. See "F# Resources" later on in the article for a list of these.
Pattern Matching
To understand the motivation for active patterns it's we need to first understand pattern matching in F#. Most languages support pattern matching to some degree, C style languages (C, C++, java and C#) support this though a switch statement and I like to say that pattern matching in F# is like a switch statement on steroids. So let's take a look at a simple example of pattern matching:
let intToString x =
match x with
| 1 -> "One"
| 2 -> "Two"
| 3 -> "Three"
| x -> Printf.sprintf "Big number: %i" x
Here we define a function that takes an integer parameter x and converts it to string, with 1 being converted to "One" etc. Okay so matching over an integer value isn't very exciting, but hopefully you are can see that this is quite an aesthetically pleasing syntax for doing this. Aesthetics, aside the real power of F#'s pattern matching is the diverse range of values and types you can pattern match over. So as well as being able to pattern match over values, such as integers, floating point numbers and strings, we are also able to pattern match over an objects type:
open System
let typeToString x =
match box x with
| :? Int32 -> "Int32"
| :? Double -> "Double"
| :? String -> "String"
| _ -> "Other"
Another useful feature of pattern matching is that it allows you to pattern match over an F# union types. A union type in F# is a value that can be one of a fixed number of different structures; they are often used model tree like data structures, so we show here a union type that represents a binary tree:
type BinaryTree<'a> =
| Leaf of 'a
| Node of BinaryTree<'a> * BinaryTree<'a>
This is a data structure that can be either a leaf or a node; a node is made up of two other binary tree data structures, a leaf has a value which in this case has a generic type so all the values in the leaves of the tree will have the same type. The way we work with this tree type is patter matching, below we present a very simple function to print out all the values in the tree:
let rec printBinaryTreeValues t =
match t with
| Leaf x -> printfn "%i" x
| Node (l, r) ->
printBinaryTreeValues l
printBinaryTreeValues r
The important thing to notice about this example is the way pattern matching allows us to cover the two cases, either the value is a leaf in which case we print the value or it's a node in which case we recursively call the function to search the sub nodes of the tree. A slightly enhanced version of the function is given below that prints the tree with indentation to give a better idea of its structure is shown below:
let printBinaryTree t =
let rec printBinaryTree t indent =
match t with
| Leaf x -> printfn ->"%sLeaf %i" indent x
| Node (l, r) ->
printfn "%sLeft Branch" indent
printBinaryTree l (indent + " ")
printfn "%sRight Branch" indent
printBinaryTree r (indent + " ")
printBinaryTree t ""
printBinaryTree (Node ((Node (Leaf 1, Leaf 2)), (Node (Leaf 3, Leaf 4))))
When execute the example will print:
Left Branch
Left Branch
Leaf 1
Right Branch
Leaf 2
Right Branch
Left Branch
Leaf 3
Right Branch
Leaf 4
Active Patterns
The idea of active patterns is to enable you to use the pattern matching syntax with other data structures. Active patterns allow us to build a union type like data structures from the .NET classes, we are then able to pattern match over these data structures. Suppose we have an xml document, it would be nice to be able to pattern match over its nodes. The first step is to create our union like data structure from the .NET types:
let (|Node|Leaf|) (node : #System.Xml.XmlNode) =
if node.HasChildNodes then
Node (node.Name, { for x in node.ChildNodes -> x })
else
Leaf (node.InnerText)
We can see here that we define either a pattern that is either a leaf or a node, if the XmlNode object has child nodes then to us it is a node, otherwise it is a leaf. We can now use the leaf and node patterns that we have defined during pattern matching, for example if we wanted to print out an xml document:
let printXml node =
let rec printXml indent node =
match node with
| Leaf (text) -> printfn "%s%s" indent text
| Node (name, nodes) ->
printfn "%s%s:" indent name
nodes |> Seq.iter (printXml (indent +" "))
printXml "" node
In this example if we find a leaf we print out the text it contains, if we find a node we print its name and then go on to print out the child nodes. Using this function becomes just a matter of initializing an xml document and then calling our print function:
let doc =
let temp = new System.Xml.XmlDocument()
let text = "
<fruit>
<apples>
<gannySmiths>1</gannySmiths>
<coxsOrangePippin>3</coxsOrangePippin>
</apples>
<organges>2</organges>
<bananas>4</bananas>
</fruit>"
temp.LoadXml(text)
temp
printXml (doc.DocumentElement :> System.Xml.XmlNode)
I think even this simple example offers a nice way to deal with xml documents that would be quite usable in more realistic scenarios if we didn't need too much information about the node type. We can imagine an extended xml active pattern library with more node types for where we need more details about the node. It would also be feasible to implementing active patterns libraries for other tree like structures we work with regular, for example the file system:
let (|File| Directory|) (fileSysInfo : System.IO.FileSystemInfo) =<br/>
match fileSysInfo with
| :? System.IO.FileInfo as file -> File (file.Name)
| :? System.IO.DirectoryInfo as dir ->
Directory (dir.Name, { for x in dir.GetFileSystemInfos() -> x })
| _ -> assert false
// a System.IO.FileSystemInfo must be either a file or directory
But active patterns are not exclusively for working with tree like structures. Another area where they are useful is where we want run different sets of test on data. Typically when users enter data in string form it is the job of the programmer to transform this string data into something more meaningful and easier to work with, one of the most problematic items to deal with is dates because of the large number of different formats that are used to express them. Often we want to run a number of different test on our input date to try and find the correct format, but running these tests as a series of if then else statements can look scruffy and be difficult to maintain. We can use active patterns to produce a library of parsing active patterns and then use pattern matching to apply the appropriated set of test:
open System
let invar = Globalization.CultureInfo.InvariantCulture
let style = Globalization.DateTimeStyles.None
let (|ParseIsoDate|_|) str =
let res,date = DateTime.TryParseExact(str, "yyyy-MM-dd", invar, style)
if res then Some date else None
let (|ParseAmericanDate|_|) str =
let res,date = DateTime.TryParseExact(str, "MM-dd-yyyy", invar, style)
if res then Some date else None
let (|Parse3LetterMonthDate|_|) str =
let res,date = DateTime.TryParseExact(str, "MMM-dd-yyyy", invar, style)
if res then Some date else None
Here we define 3 different active patterns to attempt to parse a date, ParseIsoDate, ParseAmericanDate and Parse3LetterMonthDate. Here we use an underscore at the end of the pattern to show that the pattern is incomplete, the pattern either finds a date or it doesn't. This is unlike the previous scenarios we have seen we could assert that the patterns we complete, an xml node is either a node or a leaf, we did not allow for the possibility of anything else. In practice working with incomplete patterns is not much different to working with complete ones, except to avoid a compiler warning we must give a default case for the pattern and we can use several incomplete patterns in turn as long as they all take the same kind of input. We illustrate this below in an example which uses our three date active patterns to attempt to parse a string to a date:
let parseDate str =
match str with
| ParseIsoDate d -> d
| ParseAmericanDate d -> d
| Parse3LetterMonthDate d -> d
| _ -> failwith "unrecognized date format"
parseDate "05-23-1978"
parseDate "May-23-1978"
parseDate "1978-05-23"
parseDate "05-23-78"
Our example will successful parse the first 3 dates but not the final one because the final date string uses a two digit year that we have not provided a pattern for. It we could provide providing a library of date patterns that do account for this, and many other formats, and give the programmer a quick and clean way to express which date formats are allowed.
Finally partial active patterns can be parameterized to enable better pattern reuse. Below we show an example of a regular expression active pattern. It is parameterized so that we can have an active pattern of any regular expression we desire:
let (|ParseRegex|_|) re s =
let re = new System.Text.RegularExpressions.Regex(re)
let matches = re.Matches(s)
if matches.Count > 0 then
Some { for x in matches -> x.Value }
else
None
let parse s =
match s with
| ParseRegex "\d+" results -> printfn "Digits: %A" results
| ParseRegex "\w+" results -> printfn "Ids: %A" results
| ParseRegex "\s+" results -> printfn "Whitespace: %A" results
| _ -> failwith "known type"
parse "hello world"
parse "42 7 8"
parse "\t\t\t"
When compiled and run this example will show:
Ids: seq ["hello"; "world"]
Digits: seq ["42"; "7"; "8"]
Whitespace: seq ["\t\t\t"]
Readers with some experience working with parsers will probably notice how similar this is to tokenizer generated by a "lex" style programming tool. There are a few key differences between this examples behaviour and a lex style tokenizer's behaviour; here the entire string is searched for all matches a lex style tokenizer delivers the longest match at the start of the string. However, I believe that if one needed to build a tokenizer and wanted to avoid the complications of using another programming tool one could build an active pattern that fulfilled this requirement.
Summing up
This article has been a quick tour of pattern matching in F# and introduced its new active patterns feature. We've taken a look at why pattern matching is important, it helps us build cleaner more maintainable code, and seen how these ideas are extended by active patterns. If you are interested in learning more about active patterns please see this blog post from Don Syme, which includes a link to a paper which gives more details about the active pattern design:
F# Resources
There are a growing number of F# resources available on the web, here's a summary of some of the best ones:
- The Official F# Site, find the latest version of the compiler and the F# manual
- Don Syme, F#'s lead developer's, Blog, a great place to keep an eye on for F# announcements and short articles about making the most of F#.
- The Hub-FS, an F# community site with blogs and forums.
- Robert Pickering's F# tutorials and resources.
- Flying Frog Consultancy's F# tutorials and resources.
Where Next?
After spending the last year adding features to the F# language and expanding the libraries, the language is entering a new phase. Although the implementation is already of a high quality, it seems there has been growing interest from the Microsoft teams in giving a more official and supported status to F#. F# seems to have a bright future as a value-add tool in the .NET language ecosystem, and the F# team are intending to spend the next year or so refining the compiler to make it even more polished and improving the quality of the libraries, tools and documentation.
About the Author
Robert Pickering is a Software Engineer and Technical Writer. He currently works for LexiFi, which is an innovative ISV that specializes in software for analyzing and processing complex financial derivatives - products such as swaps and options. LexiFi has pioneered the use of functional programming in finance in order to develop a rigorous formalism for representing financial instruments. His blog is: http://strangelights.com/blog.