Functional Monadic Parsers ported to C#

From a music data type to performance

My colleague Jonas recently pointed out the presentation Making Algorithmic Music by Donya Quick to me. Donya Quick shows how she uses the Haskell library Euterpea to produce algorithmic music.

It got me really excited about the idea of porting this to Elm and to be able to use this in web applications.

In the following we will see the core data types and algorithms from Euterpea ported to Elm. To focus on the core concepts the implementation is stripped down to the minimum that is required to transcribe and perform an existing polyphonic piece of music (for a single instrument).

If your API overflows with Boolean parameters,this is usually a bad smell.

Consider the following function call for example:

toContactInfoList(csv,true,true)

When looking at this snippet of code it is not very clear what kind of effect the two Boolean parameters will have exactly. In fact,we would probably be without a clue.

We have to inspect the documentation or at least the parameter names of the function declaration to get a better idea. But still,this doesn't solve all of our problems.

The more Boolean parameters there are,the easier it will be for the caller to mix them up. We have to be very careful.

Moreover,functions with Boolean parameters must have conditional logic like if or case statements inside. With a growing number of conditional statements,the number of possible execution paths will grow exponentially. It will become more difficult to reason about the implementation code.

Can we do better?

Sure we can. Lambdas and combinators come to the rescue and I'm going to show this with a simple example,a refactoring of the function from above.

This post is based on a great article by John A De Goes,Destroy All Ifs — A Perspective from Functional Programming.

I'm going to take John's ideas that he backed up with PureScript examples and present how the same thing can be elegantly achieved in Scala.

Continue reading →

While taking the MOOC Introduction to Functional Programming by Erik Meijer on edX the current lecture greatly increased my interest in functional parsers. Functional parsers are about the essence of (functional) programming because they are about the concept of composition which is one of the core concepts of programming whatsoever.

The content of the lecture is closely related to the chapter 8 Functional Parsers of the book Programming in Haskell by Graham Hutton. It starts out with the definition of a type for a parser and a few very basic parsers.

I'm totally amazed by the simplicity, the elegance and the compositional aspect of these examples. I find it impressive how primitive but yet powerful these simple parsers are because they can easily be combined to form more complex and very capable parsers.

As an exercise, out of curiosity, and because of old habits I implemented the examples from the book in C#.

At the end of this post there will be an ultimate uber-cool parser example of a parser for arithmetic expressions.

The complete code from this post can be found here on GitHub.

Here is what I got:

The parser type

The type is just a function that takes a string as input and returns a list of tuples of a value of any type and a string:

public delegate List<Tuple<T, String>> Parser<T>(String inp);

The first item of the tuple represents a parsed value and the second item is the remaining unconsumed part of the input string. The fact that the return type is a list indicates that the result can also be empty if the parser fails. Otherwise, if the parser succeeds, the result will always be a singleton list.

Basic parsers

Here are the three most basic parsers.

Return always succeeds and returns the value that it was instantiated with as well as the unconsumed input string.

Failure always fails and therefore returns an empty list.

Item returns the first character of the input string as well as the remaining unconsumed input.

public static Parser<T> Return<T>(T v)
{
    return inp => new List<Tuple<T, string>> { Tuple.Create(v, inp) };
}

public static Parser<T> Failure<T>()
{
    return inp => new List<Tuple<T, string>>();
}

public static readonly Parser<char> Item = 
    inp => String.IsNullOrEmpty(inp)
        ? Failure<char>()(inp)
        : Return(inp[0])(inp.Substring(1));

Instead of calling the parsers directly we will use an application function Parse to abstract from the representation of parsers:

public static List<Tuple<T, string>> Parse<T>(this string input, Parser<T> p)
{
    return p(input);
}

Note: In C# we don't really need this. I think, in Haskell this is needed because the parsers will be instances of the Monad type class.

Sequencing

Next we need a function to combine multiple parsers. This can be done with Bind which takes a parser and a function that returns another parser based on the result of the first parser. If the first parser fails, the whole computation will fail. Otherwise the function will be applied to the result of the first parser and return a parser that then will be applied to the unconsumed remaining input string.

public static Parser<T2> Bind<T1, T2>(this Parser<T1> p, Func<T1, Parser<T2>> f)
{
    return inp =>
    {
        var result = inp.Parse(p);
        return !result.Any() 
            ? Failure<T2>()(inp)
            : result.First().Item2.Parse(f(result.First().Item1));
    };
}

Hers is an example:

var p = Parsers.Item.Bind(
        x => Parsers.Item.Bind(
            _ => Parsers.Item.Bind(
                z => Parsers.Return(new string(new[] { x, z })))));

var result = "abc".Parse(p); 
Check.That(result).ContainsExactly(Tuple.Create("ac", String.Empty));

LINQ query syntax

Now I added two functions Select and SelectMany to get some LINQ syntax sugar.

By adding Select the Parser<T> type becomes a Functor because now we can map functions over it. The function will be applied to the inner value when the parser is applied.

public static Parser<TResult> Select<TSource, TResult>(this Parser<TSource> source, Func<TSource, TResult> selector)
{
    return inp =>
    {
        var result = inp.Parse(source);
        return !result.An
            ? Failure<TResult>()(inp)
            : Return(selector(result.First().Item1))(result.First().Item2);
    };
}

SelectMany is needed so that we can use the LINQ query syntax instead of using Bind which similar to the do-Notation in Haskell.

public static Parser<TResult> SelectMany<TSource, TValue, TResult>(this Parser<TSource> source, Func<TSource, Parser<TValue>> valueSelector, Func<TSource, TValue, TResult> resultSelector)
{
    return source.Bind(s => valueSelector(s).Select(v => resultSelector(s, v)));
}

Here is an example of the use of the LINQ query syntax:

var p = 
    from x in Parsers.Item
    from _ in Parsers.Item
    from z in Parsers.Item
    select new string(new []{x,z});

var result = "abc".Parse(p); 
Check.That(result).ContainsExactly(Tuple.Create("ac", String.Empty));

Choice

Another way of combining parsers is to apply one parsers and if it fails apply another one with the Choice function:

public static Parser<T> Choice<T>(this Parser<T> p, Parser<T> q)
{
    return inp =>
    {
        var result = inp.Parse(p);
        return !result.Any()
            ? inp.Parse(q)
            : result;
    };
}

Derived primitives

We can now create other simple parsers by combining what we have so far.

First we define a parser that takes a predicate and parses the first character if the predicate holds:

 public static Parser<char> Sat(Predicate<char> p)
{
    return Item.Bind(c => p(c) ? Return(c) : Failure<char>());
}

Now we can combine Sat to create single character parsers for numeric, upper and lower case, alphabetic, alphanumeric or specific characters:

public static readonly Parser<char> Digit = Sat(Char.IsDigit);

public static readonly Parser<char> Lower = Sat(Char.IsLower);

public static readonly Parser<char> Upper = Sat(Char.IsUpper);

public static readonly Parser<char> Letter = Sat(Char.IsLetter);

public static readonly Parser<char> AlphaNum = Sat(Char.IsLetterOrDigit);

public static Parser<char> CharP(char c)
{
    return Sat(x => x == c);
}

The parser CharP can now be composed to a parser that matches a string:

public static Parser<string> StringRec(string str)
{
    return String.IsNullOrEmpty(str)
        ? Return(String.Empty)
        : from a in CharP(str[0])
          from b in StringP(str.Substring(1))
          from result in Return(str)
          select result;
}

Also two very useful parsers are Many and Many1 that apply a parser as many times as possible, where Many allows zero or more and Many1 requires at least one successful application.

public static Parser<List<T>> Many<T>(this Parser<T> p)
{
    return p.Many1().Choice(Return(new List<T>()));
}

public static Parser<List<T>> Many1<T>(this Parser<T> p)
{
    return p.Bind(v => p.Many().Bind(vs => Return(new List<T> { v }.Concat(vs).ToList())));
}

The problem with recursion

In C# we have a little problem here. StringRec is defined as a recursive function and Many and Many1 are defined as mutually recursive functions. That means that they are defined in terms of each other. This is a beautiful and elegant thing. But since the C# compiler is not very good at supporting recursive structures, stack overflow exceptions are likely to happen already with relatively small input strings (> a few thousand characters).

Therefore we have to define those functions as non-recursive.

Here is another non-recursive version of Many:

public static Parser<List<T>> Many<T>(this Parser<T> p)
{
    return inp =>
    {
        var state = Tuple.Create(new List<T>(), inp);

        while (true)
        {
            var newState = p(state.Item2);
            if (!newState.Any())
                break;
            state = Tuple.Create(state.Item1.Concat(new[] { newState.First().Item1 }).ToList(), newState.First().Item2);
        }

        return Return(state.Item1)(state.Item2);
    };
}

This looks almost like an unfold operation over the input string. But I couldn't really figure out how to refactor out a yet underlying pattern here. I'm not sure if that is possible at all or makes sense. (Maybe the list concatenation can be abstracted out?)

The non-recursive string parser looks like this:

public static Parser<string> StringP(string str)
{
    return inp => !inp.StartsWith(str)
        ? Failure<string>()(inp)
        : Return(str)(inp.Substring(str.Length));
}

Identifiers, numbers and spaces

Now we can define parsers for identifiers, numbers and spaces:

public static readonly Parser<int> Nat = Digit.Many1().Bind(n => Return(Int32.Parse(String.Concat(n))));

public static readonly Parser<string> Ident =
    from c in Lower
    from cs in AlphaNum.Many()
    from result in Return(String.Concat(new[] {c}.Concat(cs)))
    select result;

public static readonly Parser<Unit> Space = Sat(Char.IsWhiteSpace).Many().Select(x => Unit.Value);

Handling spacing

To be able to ignore arbitrary number of spaces before or after tokens to be parsed we define a parser Token:

public static Parser<T> Token<T>(this Parser<T> p)
{
    return
        from before in Space
        from t in p
        from after in Space
        from result in Return(t)
        select result;
}

Now we can combine Token with other parsers:

public static readonly Parser<string> Identifier = Ident.Token();

public static readonly Parser<int> Natural = Nat.Token();

public static Parser<string> Symbol(string xs)
{
    return inp => inp.Parse(StringP(xs).Token());
}

Example: Parsing a list of integers

Now the first example of a useful parser is a parsers for non-empty lists of integers (in Haskell syntax) that ignores spacing around tokens. A correct example of this syntax could be this:

[ 1,  2    , 3,  4 ,   5   ]

We can combine the parser like this:

var p =
    from ign1 in Parsers.Symbol("[")
    from n in Parsers.Natural    
    from ns in (from ign2 in Parsers.Symbol(",") from n2 in Parsers.Natural select n2)
                        .Many()
    from ign3 in Parsers.Symbol("]")
    select (new[] { n }.Concat(ns));

var result = "[ 1,  2    , 3,  4 ,   5   ]".Parse(p);

Check.That(result.First().Item1).ContainsExactly(1, 2, 3, 4, 5);

// this shoudld fail
var result2 = "[1,2,]".Parse(p);
Check.That(result2).IsEmpty();

Example 2: Parsing arithmetic expressions

The next example will be a parser for arithmetic expressions with addition and multiplication like this or 2 + 3 * 4 or this (2 + 3) * 4.

In my opinion, it is the ultimate uber-cool parser example because it really shows the power of the composability of parsers and FP in general.

Grammar

The grammar for these kinds of expressions looks like this :

$expr ::= term (+ expr | \epsilon) \$

$term ::= factor (* term | \epsilon) \$

$factor ::= (expr) | nat \$

$nat ::= 0 | 1 | 2 | \dots \$

Implementation

Expr, Term and Factor are defined in terms of each other. Therefore the implementation is recursive.

What is really remarkable is that this parser tuple of Expr, Term and Factor is only composed of other general purpose primitives!

public static Parser<int> Expr()
{       
    return     
        from t in Term()
        from r in
           (from _ in Parsers.Symbol("+")
            from e in Expr()
            from r in Parsers.Return(t+e)
            select r)
                .Choice(Parsers.Return(t))
        select r;
}

public static Parser<int> Term()
{
    return
        from f in Factor()
        from r in
           (from _ in Parsers.Symbol("*")
            from t in Term()
            from r in Parsers.Return(f*t)
            select r)
                .Choice(Parsers.Return(f))
        select r;
}

public static Parser<int> Factor()
{
    return
       (from ign1 in Parsers.Symbol("(")
        from e in Expr()
        from ign2 in Parsers.Symbol(")")
        from r in Parsers.Return(e)
        select r)
            .Choice(Parsers.Natural);
}

Also, please note that there is an obvious, visible, direct mapping between the grammar and the implementation.

Here are a few tests:

[TestCase("42", 42)]
[TestCase("(((((42)))))", 42)]
[TestCase("1+1", 2)]
[TestCase("(1+1)", 2)]
[TestCase("1*1", 1)]
[TestCase("1*2", 2)]
[TestCase("(1*2)", 2)]
[TestCase("2*3+4", 10)]
[TestCase("2*(3+4)", 14)]
[TestCase("2 * 3 +  4", 10)]
[TestCase("2*(     3+ 4)  ", 14)]
[TestCase("2*3-4", 6)] // leaves "-4" unconsumed
[TestCase("((1))*(2+(((3)))*(4+(((5))+6))*(((7*8)))+9)", 2531)]
public void ArithmeticExpressions_should_succeed(string inp, int expected)
{
    var result = inp.Parse(ArithmeticExpressions.Expr());
    Check.That(result.First().Item1).IsEqualTo(expected);
}

[TestCase("-1")]
[TestCase("()")]
[TestCase("(5")]
[TestCase("(1+2")]
[TestCase("(1+2()")]
public void ArithmeticExpressions_should_fail(string inp)
{
    var result = inp.Parse(ArithmeticExpressions.Expr());
    Check.That(result).IsEmpty();
}

Leif Battermann

Functional Programming Fu

Identify Side Effects And Refactor Fearlessly

Why avoid side effects?

PureScript Case Study And Guide For Newcomers

Elm And The Algorithm Of Music

From a music data type to performance

Interactive Command Line Applications In Scala –Well Structured And Purely Functional

How To Use Applicatives For Validation In Scala And Save Much Work

Parsers in Scala built upon existing abstractions

Using existing abstractions

Strongly Typed Configuration Access With Code Generation

Error and state handling with monad transformers in Scala

Use lambdas and combinators to improve your API

Modelling API Responses With sbt-json –Print Current Bitcoin Price