This is a follow-up from my last post Functional Monadic Parsers ported to C# where I showed the implementation of basic parsers from the book Programming in Haskell by Graham Hutton in C#.

When these primitives are used to compose a parser for Roman Numerals the result yet again demonstrates the amazing capabilities and elegance of functional programming. The problem of parsing Roman Numerals is not a very difficult one. But still, I find the simplicity of constructing a solution by combining primitive parsers fascinating.

Here is the implementation. It is super easy, I was able to write this in less than 15 minutes without *tests first*, worked the first time.

```
public readonly static Parser<int> RomanNumeral =
Parsers.StringP("IV").Select(x => 4)
.Choice(Parsers.StringP("IX").Select(x => 9))
.Choice(Parsers.StringP("XL").Select(x => 40))
.Choice(Parsers.StringP("XC").Select(x => 90))
.Choice(Parsers.StringP("CD").Select(x => 400))
.Choice(Parsers.StringP("CM").Select(x => 900))
.Choice(Parsers.StringP("I").Select(x => 1))
.Choice(Parsers.StringP("V").Select(x => 5))
.Choice(Parsers.StringP("X").Select(x => 10))
.Choice(Parsers.StringP("L").Select(x => 50))
.Choice(Parsers.StringP("C").Select(x => 100))
.Choice(Parsers.StringP("D").Select(x => 500))
.Choice(Parsers.StringP("M").Select(x => 1000))
.Many1()
.Select(rns => rns.Sum());
```

The code can be found here on GitHub.

*Note that all syntactically correct Roman Numerals will be parsed correctly. However, there are syntactically incorrect combinations like e.g. "IIIII" that will be parsed as well.*

Here are a few spot check tests:

```
[TestCase("I", 1)]
[TestCase("III", 3)]
[TestCase("IX", 9)]
[TestCase("MLXVI", 1066)]
[TestCase("MCMLXXXIX", 1989)]
public void RomanNumerals_tests(string rn, int an)
{
var numnber = rn.Parse(RomanNumerals.RomanNumeral);
Check.That(numnber.First().Item1).IsEqualTo(an);
}
```

## Checking for correct syntax

Checking for correct syntax is also not that hard. It can be done by combining primitive parsers as well.

The requirements are defined like this:

- Every symbol may only appear once
- Except
`I`

, `X`

and `C`

which may be repeated up to 3 times - And
`M`

may be repeated many times (*Off course this is limited by stack, memory, ranges etc. but I won't take this into account right now.*) - The values of the sequence must be strictly decreasing (e.g.
`XV`

is valid - `VX`

is not valid)

To be able to check the constraints we can use the `Sat`

function which takes a predicate and will succeed if the predicate holds. Otherwise it will fail:

```
public static Parser<T> Sat<T>(this Parser<T> parser, Predicate<T> predicate)
{
return parser.Bind(c => predicate(c) ? Return(c) : Failure<T>());
}
```

Now we can easily check our constraints while parsing the input:

```
public static readonly Parser<int> RomanNumeral =
Parsers.StringP("IV").Select(x => 4)
.Choice(Parsers.StringP("IX").Select(x => 9))
.Choice(Parsers.StringP("XL").Select(x => 40))
.Choice(Parsers.StringP("XC").Select(x => 90))
.Choice(Parsers.StringP("CD").Select(x => 400))
.Choice(Parsers.StringP("CM").Select(x => 900))
.Choice(Parsers.CharP('I').Select(x => 1)
.Many1()
.Select(cs => cs.Sum())
.Sat(sum => sum <= 3))
.Choice(Parsers.CharP('X').Select(x => 10)
.Many1()
.Select(cs => cs.Sum())
.Sat(sum => sum <= 30))
.Choice(Parsers.CharP('C').Select(x => 100)
.Many1()
.Select(cs => cs.Sum())
.Sat(sum => sum <= 300))
.Choice(Parsers.CharP('M').Select(x => 1000)
.Many1()
.Select(x => x.Sum()))
.Choice(Parsers.CharP('V').Select(x => 5))
.Choice(Parsers.CharP('L').Select(x => 50))
.Choice(Parsers.CharP('D').Select(x => 500))
.Many1()
.Sat(ns => ns.Zip(ns.Skip(1), (a, b) => a > b).All(b => b))
.Select(ns => ns.Sum());
```

The code can be found here on GitHub.

## Comparison to the imperative version

When googling for "*Parsing Roman Numerals*" one of the first results is this one from Black Belt Coder. This is a rather imperative implementation of a Roman Numerals parser. (Note that it does not check for correct syntax.)

### Advantages of the functional approach

Under the covers both versions are pretty much doing the same thing. Internally the `Many`

function is implemented imperatively using a while loop, similar to the other example, because it is more efficient in C# than recursion. In other languages like Haskell or F# tail recursive implementations are fine. But this is not so important because the internals of `Many`

are encapsulated and don't matter to the caller. This is one of the main differences and a big advantages of declarative or functional programming. This leads to some more advantages:

- The details of implementation of common tasks are abstracted away from the caller
- The internals of such primitive functions are totally encapsulated
- The primitives are well-tested or even proven to be correct
- Primitives are ready for massive reuse
- More concise code
- Less code repetition and therefore less chances of making mistakes
- Functions that are composed of smaller functions that are correct are more likely to be correct as well
- Trivial things don't have to be frequently re-implemented and tested

## Parsing Roman Numerals in Haskell

Now going back to Haskell and the parsing library Parsec, here is an implementation of the Roman Numeral Kata in Haskell (I'm not a Haskell expert so there might be room for improvement):

```
sat :: String -> (a -> Bool) -> ParsecT s u m a -> ParsecT s u m a
sat msg predicate parser = parser >>= (\x -> if predicate x then parserReturn x else parserFail msg)
strictDecr :: Ord a => ParsecT s u m [a] -> ParsecT s u m [a]
strictDecr =
sat msg (\xs -> and (zipWith (>) xs (drop 1 xs)))
where msg = "unexpected order of values\nexpected strictly decreasing values"
romPrimCombiVal :: ParsecT [Char] u Identity Integer
romPrimCombiVal =
choice [
(\_ -> 4) <$> (try $ string "IV"),
(\_ -> 9) <$> (try $ string "IX"),
(\_ -> 40) <$> (try $ string "XL"),
(\_ -> 90) <$> (try $ string "XC"),
(\_ -> 400) <$> (try $ string "CD"),
(\_ -> 900) <$> (try $ string "CM"),
sat "unexpected repetitions of symbol `I`\nexpected symbol to appear 3 times at most" (<= 3) $ sum <$> many1 ((\_ -> 1) <$> (char 'I')),
sat "unexpected repetitions of symbol `X`\nexpected symbol to appear 3 times at most" (<= 30) $ sum <$> many1 ((\_ -> 10) <$> (char 'X')),
sat "unexpected repetitions of symbol `C`\nexpected symbol to appear 3 times at most" (<= 300) $ sum <$> many1 ((\_ -> 100) <$> (char 'C')),
sum <$> many1 ((\_ -> 1000) <$> (char 'M')),
(\_ -> 5) <$> (char 'V'),
(\_ -> 50) <$> (char 'L'),
(\_ -> 500) <$> (char 'D')]
romNum :: ParsecT [Char] u Identity Integer
romNum = do
ns <- strictDecr $ many1 romPrimCombiVal
return $ sum ns
```

Here is the complete code on GitHub.