Parsers in Scala built upon existing abstractions

After some initial struggles, the chapter Functional Parsers from Graham Hutton's great book Programming in Haskell, in which a basic parser library is built from scratch, finally helped me understand the core ideas of parser combinators and how to apply them to programming languages other than Haskell.

When I recently revisited the material and started to port the examples to Scala, I wasn't able to define a proper monad instance for the type Parser[A].

The type Parser[A] alias was defined like this:

type Parser[A] = String => Option[(A, String)]
// defined type alias Parser

To test the monad laws with discipline I had to provide an instance of Eq[Parser[A]]. Because Parser[A] is a function, equality can only be approximated by checking some degree of function equivalence, which is not a trivial task.
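One way this could be approximated (a sketch of my own, not the instance used for the actual law tests) is to consider two parsers equal if they produce equal results for a fixed set of sample inputs:

import cats.Eq
import cats.implicits._

// Approximate function equality: compare the parsers' outputs
// for a handful of sample input strings only.
def parserEq[A: Eq]: Eq[Parser[A]] =
  Eq.instance { (p1, p2) =>
    val samples = List("", "a", "abc", "1 + 2", "hello world")
    samples.forall(s => p1(s) === p2(s))
  }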

The implementation of tailRecM was also challenging. (I couldn't figure it out.)

Using existing abstractions

Functional programming offers a lot of abstractions, provided either by the core languages or by external libraries, that are suited to solving everyday low-level (domain-agnostic) problems.

A good example is the Option type, which models the potential absence of a value. Without Option we would be forced to redundantly reimplement low-level imperative handling of nulls and null reference exceptions everywhere.

It is good practice to refine our types so that they can be applied to categories of problems that have already been solved. Then we can reuse battle-tested existing abstractions such as Option.

It turns out that the Parser type looks very similar to the state monad, which is a monad instance on the function S => (S, A), where S represents the state. In the case of a parser the input state is the input string and the output state is the remaining unparsed string, while A represents the result type of the parser.

When we additionally take into account that parsers might fail, we can stack State on top of Option with the help of monad transformers.
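Informally, the shapes line up like this (the StateT line is a rough reading of its run function, not its actual definition):

// State[S, A]           ~  S => (S, A)
// StateT[Option, S, A]  ~  S => Option[(S, A)]
// With S = String this matches our original Parser[A] = String => Option[(A, String)],
// up to the order of the components in the result tuple.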

Let’s see if we’ve found an appropriate existing abstraction for parsers (disregarding stack-safety for now) by trying to implement a parser library on top of it.

We can now define the Parser[A] type in terms of StateT and Option from Cats like this:

import cats.data.StateT._
import cats.data.StateT
import cats.implicits._
type Parser[A] = StateT[Option, String, A]
// defined type alias Parser

Implementing the parser library

The great thing about this is that we get a lot of functionality for free.

E.g. we don't have to implement combinators for sequencing parsing operations. This is already taken care of by the flatMap operation of the combined monad instances of State and Option.

Because we get a Functor instance for free, we can map over the inner values of parsers.

Also, the parser that always succeeds is just pure, provided by the Applicative instance of the monad, and the parser that always fails is raiseError, provided by ApplicativeError:

"42".pure[Parser]
// res0: Parser[String] = cats.data.IndexedStateT@6f2c0142

().raiseError[Parser, Nothing]
// res1: Parser[Nothing] = cats.data.IndexedStateT@2cb7bb20
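Running them makes the semantics visible; the results in the comments are what I expect from these definitions rather than captured REPL output:

"42".pure[Parser].run("rest")
// expected: Some((rest,42)) -- the input is left untouched

().raiseError[Parser, Nothing].run("rest")
// expected: None

// And thanks to the Functor instance we can map over the result:
"42".pure[Parser].map(_.length).run("rest")
// expected: Some((rest,2))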

The first real parser we will implement is item, which consumes the next input character, whatever it is. Even though it's possible to do this a little more concisely, we will use the idiomatic get and modify functions from StateT:

val item: Parser[Char] =
  for {
    input <- get[Option, String]
    _ <- if (input.nonEmpty) 
      modify[Option, String](_.tail)
    else 
      ().raiseError[Parser, Nothing]
  } yield input.head
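For example (expected results, based on the definition above):

item.run("abc")
// expected: Some((bc,a))

item.run("")
// expected: None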

Now let’s define a function to create a parser for single characters that satisfy a given predicate:

def sat(p: Char => Boolean): Parser[Char] =
  for {
    c <- item
    _ <- if (p(c)) c.pure[Parser]
    else ().raiseError[Parser, Nothing]
  } yield c

This allows us to create some more primitive parsers:

val digit: Parser[Char] = sat(_.isDigit)

val lower: Parser[Char] = sat(_.isLower)

val upper: Parser[Char] = sat(_.isUpper)

val letter: Parser[Char] = sat(_.isLetter)

val alphaNum: Parser[Char] = sat(_.isLetterOrDigit)

def char(c: Char): Parser[Char] = sat(_ == c)
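A quick sanity check of these primitives (expected results, not captured output):

digit.run("123")
// expected: Some((23,1))

digit.run("abc")
// expected: None

char('a').run("abc")
// expected: Some((bc,a))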

A parser for string literals can either be defined recursively or like this:

def string(str: String): Parser[String] = 
  str.map(char).toList.sequence.map(_.mkString)
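string consumes exactly the given literal or fails as a whole (expected results):

string("hi").run("hi there")
// expected: Some(( there,hi))

string("hi").run("hello")
// expected: None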

Next we need a combinator that applies a first parser and, if it fails, applies a second parser instead. There already is such a combinator for StateT, coming from SemigroupK, called combineK or <+>:

val p = string("hi") <+> string("hello")
// p: Parser[String] = cats.data.IndexedStateT@1c7a25f

p.run("hello world")
// res8: Option[(String, String)] = Some(( world,hello))

With the help of <+> we can define two mutually recursive parsers many and many1 that repeatedly apply a parser until it fails and return a list of 0 to n or 1 to n results, respectively:

object Many { // only needed for tut
  def many[A](p: Parser[A]): Parser[List[A]] =
    many1(p) <+> List.empty[A].pure[Parser]

  def many1[A](p: Parser[A]): Parser[List[A]] =
    for {
      head <- p
      tail <- many(p)
    } yield head :: tail
}

import Many._
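For example (expected results):

many(digit).run("123abc")
// expected: Some((abc,List(1, 2, 3)))

many(digit).run("abc")
// expected: Some((abc,List()))

many1(digit).run("abc")
// expected: None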

The parsers and combinators that we have implemented so far, together with a parser for zero or more whitespace characters, can now be used to create parsers for identifiers, natural numbers and symbols:

val ident: Parser[String] =
  (lower, many(alphaNum)).mapN(_ :: _).map(_.mkString)

val nat: Parser[Int] =
  many1(digit).map(_.mkString.toInt)

val space: Parser[Unit] =
  many(sat(_.isWhitespace)).map(_ => ())

def token[A](p: Parser[A]): Parser[A] =
  space *> p <* space

val identifier: Parser[String] = token(ident)
val natural: Parser[Int] = token(nat)
def symbol(s: String): Parser[String] = token(string(s))

token is implemented with *> and <*. These operators come from the Apply typeclass. They compose two values and discard one of them. In this case the spaces around the token are discarded while the token is kept.
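For example (expected results):

identifier.run("  foo123  bar")
// expected: Some((bar,foo123))

natural.run(" 42 rest")
// expected: Some((rest,42))

symbol("+").run(" + 1")
// expected: Some((1,+))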

Building a parser for arithmetic expressions

We now have everything we need to compose more complex parsers, e.g. to parse and evaluate arithmetic expressions like 2 * 3 + 4 * (1 + 5) according to the following grammar, where the symbol ε denotes the empty string:

expr   ::= term (+ expr | ε)
term   ::= factor (* term | ε)
factor ::= (expr) | nat
nat    ::= 0 | 1 | 2 | ...

Here is a direct translation of the grammar to our parser primitives:

object Arithmetics {
  lazy val expr: Parser[Int] =
    for {
      t <- term
      res <- (for {
        _ <- symbol("+")
        e <- expr
      } yield t + e) <+> t.pure[Parser]
    } yield res

  lazy val term: Parser[Int] =
    for {
      f <- factor
      res <- (for {
        _ <- symbol("*")
        t <- term
      } yield f * t) <+> f.pure[Parser]
    } yield res

  lazy val factor: Parser[Int] =
    (for {
      _ <- symbol("(")
      e <- expr
      _ <- symbol(")")
    } yield e) <+> natural
}

import Arithmetics._
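Before wrapping this in an evaluation function we can run expr directly; the results below are what I expect from the definitions, not captured output. Note that unconsumed input is simply left in the state:

expr.run("2*3+4")
// expected: Some((,10)) -- empty remaining input

expr.run("2*3+4 rest")
// expected: Some((rest,10))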

Finally we define a function eval to evaluate expressions to either an Int value or an error if the expression is invalid or if there is any unconsumed input left:

def eval(input: String): Either[String, Int] =
  expr.run(input) match {
    case Some(("", n))  => Right(n)
    case Some((out, _)) => Left(s"unconsumed input: $out")
    case None           => Left("invalid input")
  }

Let’s check some examples:

eval("2*(3+4)")
// res12: Either[String,Int] = Right(14)

eval("2 * 3 +  4")
// res13: Either[String,Int] = Right(10)

eval("2*(     3+ 4)  ")
// res14: Either[String,Int] = Right(14)

eval("2*3")
// res15: Either[String,Int] = Right(6)

eval(" (( 1 ))*( 2+( ( (   3) ) )* (4+(  ((5))+ 6))*( ((7*8   ))) +9)") 
// res16: Either[String,Int] = Right(2531)

Conclusion

I think this is pretty cool.

Out of nothing but a few primitives we have created a library of parsers and combinators, in significantly fewer than 100 lines of code, that allowed us to build parsers for relatively complex arithmetic expressions.

Applying battle-tested existing abstractions provided a lot of built-in functionality that we didn't have to implement or test ourselves. It also reduced the amount of boilerplate code by a great deal.

Note that there are a lot more features and compositional capabilities provided by the abstraction we chose that we didn't even use in these examples.

I'll give one more example, though, which I discovered with some help from the Cats Gitter channel (thanks to Luka Jacobowitz and Fabio Labella). Let's say we want to build a parser from a list of parsers, trying each one until one of them succeeds, semantically providing a list of alternatives. We get this for free:

val ps =
  List[Parser[Char]](
    char('a'),
    char('b'),
    lower,
    digit)
ps.foldK.run("foo")
// res19: Option[(String, Char)] = Some((oo,f))

Here is the primary takeaway from this post: when you are dealing with specific recurring implementation problems, it is very likely that these problems have been solved before. Try to modify and refine your model and your types in such a way that you can reuse and benefit from existing abstractions.

All examples are built with tut.