Refactoring to FParsec

I have been playing around with FParsec a little bit lately, inspired by the chapter 8 “Functional Parsers” of the book Programming in Haskell by Graham Hutton. FParsec is an F# parser combinator library for building quite powerful parsers by combining primitive parsers and functions provided by the FParsec library.

If you haven’t been exposed to the concept of functional monadic parsers then this can be a very different experience. I am still totally fascinated by the power and the simplicity of this concept. Here is a brief introduction in C#.

As an exercise and to learn the usage of FParsec I have been looking for code that could be refactored to FParsec.

Example

Here is a first example. The function generateNext takes a string, tries to generate the next ID by incrementing the numeric part of the string by one and returns the result if it succeeds. Otherwise it will return a failure. It is not really a great implementation but it works.

open Chessie.ErrorHandling
open System

let generateNext (current:string) =
    try
        let (|Int|_|) str =
            match System.Int32.TryParse(str) with
            | (true,int) -> Some(int)
            | _          -> None

        let prefix =
            match current.[0] with
            | a when a = 'A' -> Some a
            | _              -> None

        match current.[1..6], prefix with
        | Int n, Some pf when n >= 0 && n < 999999 -> sprintf "%c%06i" pf (n + 1) |> ok
        | _                                        -> IdGenerationFailure "Ivalid input." |> fail
    with
    | ex -> IdGenerationFailure ex.Message |> fail

Here are some tests for the success cases:

[<TestCase("A000000","A000001")>]
[<TestCase("A009999","A010000")>]
[<TestCase("A999998","A999999")>]
let ``generateNext should succeed if input is valid`` (input, expected) =
   generateNext input |> resultShouldEqual (expected |> ok)

Refactoring

Let’s look at the refactored version. It uses a computation expression to build the parser. By using the let! construct parsers can be applied sequentially and if one of them fails the rest of the operation will be bypassed.

open FParsec

let generateNext current =
    let p = parse {
        let! prefix = pstring "A"
        let! n = int32Within 0 999998
        let! _ = eof
        return sprintf "%s%06u" prefix (n + 1) }

    match run p current with
    | Success (v,_,_)       -> Chessie.ErrorHandling.Trial.ok (v)
    | Failure(errorMsg,_,_) -> Chessie.ErrorHandling.Trial.fail (IdGenerationFailure errorMsg)

The result is really nice, compact and well readable. pstring "A" will succeed if the input starts with “A”. int32Within will succeed if the prefix is followed by a number within the given interval and eof will verify that there is no more unconsumed input. In the return statement the result is constructed given that all previous parsers succeeded.

For this to work we need a little helper function int32Within which is again built from another helper function sat. Those helper functions are of general purpose and are therefore completely reusable.

open FParsec

let sat parser predicate msg = 
    parser >>= (fun x -> if predicate x then preturn x else fail msg)

let int32Within from ``to`` = 
    sat pint32 (fun n -> n >= from && n <= ``to``) (sprintf "unexpected number.\nexpected a number within [%d, %d]" from ``to``)

Note that the refactored version will not verify the length of the input string. E.g. A123 is a valid input as well as A0000000123. Also the original version will also successfully parse inputs that have trailing characters. In the context of generateNext this is not so important. But in others scenarios one might have to pay close attention to these details.

Conclusion

So far I found that refactoring hand rolled parsing code with the use of FParsec made the code more concise, more readable and eventually better, even in a very simple scenario like in the example above. One reason for that is that the aspect of error handling is completely abstracted away. Off course in the example the result is mapped to the Result type of the domain. But the parsing code itself is not cluttered with try catch statements or any other code that is concerned with handling the failure cases.

  • Vasily Kirichenko

    You can pattern match on chars directly:
    let prefix =
    match current.[0] with
    | ‘A’ as a -> Some a
    | _ -> None

    • How cool, thanks for the hint! (I guess then I could even leave out ‘as a’ like this: | ‘A’ -> Some ‘A’ …)

  • Pingback: F# Weekly #51, 2015 | Sergey Tihon's Blog()