Error and state handling with monad transformers in Scala

In this post I will look at a practical example where combining the state monad and the Either monad (through monad transformers) can be very useful.

I won't go into much theory. Instead, I will demonstrate the problem and then gradually build up a solution.

You don't need to be completely familiar with all of the concepts, as the examples are easy to follow. Here is a very brief overview:

State monad: Data type for modeling mutable state in a purely functional way
Either monad: Data type that holds one of two values, representing either a success or a failure
Monad transformers: Type constructors to combine multiple monads into one

The complete source code can be found on GitHub.

The problem

Consider a function that extracts the schema of a JSON value. There are two things to note:

Every JSON object schema must be given a unique Id (needed for further processing)
There are cases when processing can fail (e.g. with null values or empty arrays)

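import java.util.UUID
import play.api.libs.json._ // JSON AST (JsValue, JsObject, ...), assumed to be play-json

object SchemaExtractor {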

  def fromJsRoot(value: JsValue): Schema = {
    value match {
      case JsObject(fields) =>
        fromJsObject("root", fields.toList)
      case _ =>
        ???
    }
  }

  private def fromJsObject(name: ClassName, fields: List[(String, JsValue)]): Schema = {
    val schemaFields =
      fields.map { case (fieldName, value) =>
        fromJsValue(fieldName, value)
      }

    SchemaObject(
      UUID.randomUUID,
      name,
      schemaFields)
  }

  private def fromJsValue(name: String, value: JsValue): (String, Schema) = {
    val schema = value match {
      case JsNull =>
        ???
      case JsString(_) =>
        SchemaString
      case JsNumber(_) =>
        SchemaDouble
      case JsBoolean(_) =>
        SchemaBoolean
      case JsObject(fields) =>
        fromJsObject(name, fields.toList)
      case JsArray(values) =>
        val schemas =
          values.map(v => fromJsValue(name, v)._2)
        val first = schemas.head
        if (schemas forall Schema.haveSameStructure(first)) {
          SchemaArray(first)
        } else {
          ???
        }
    }

    (name, schema)
  }
}

The function is impure. Leaving aside the error cases for now, the Id generation via UUID.randomUUID introduces a side effect that makes the function non-deterministic. Referential transparency demands that, for a given input, a function always yields the same output. This doesn't hold here.

One of the consequences is that testing becomes a tedious task.

test("test impure schema extractor") {
  val json =
    """{
      |  "value1": 42,
      |  "value2": { "value2.1": "", "value2.2": 5, "value2.3": { "value2.3.1": 1.0, "value2.3.2": 1.0, "value2.3.3": 1.0 } },
      |  "value3": { "value3.1": true, "value3.2": 2 },
      |  "value4": [
      |     [ { "value4.1": "", "value4.2": 123 }, { "value4.1": "", "value4.2": 345 } ],
      |     [ { "value4.1": "", "value4.2": 678 }, { "value4.1": "", "value4.2": 312 } ]
      |  ]
      |}
    """.stripMargin

  val schema = SchemaExtractor.fromJsRoot(Json.parse(json))

  schema match {
    case SchemaObject(_, name, fields) =>
      assert(name == "root")
      assert(fields.map(_._1) == List("value1", "value2", "value3", "value4"))
      // ...
    case _ => fail()
  }
}

We cannot simply compare the expected and actual results for structural equality with ==, because every run produces different random Ids. Instead we have to deconstruct the result manually.

If we could make the Id generation pure, we would gain several benefits, one of which is better testability.

The state monad

One possible solution is to use the state monad.

The state monad is represented by a data type that wraps a function of type S => (S, A). The function takes a state of type S and yields a new state of type S together with a value of type A. In Cats, for example, this type is called State, and it also provides many useful operations for composing and transforming state.
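To make the S => (S, A) idea concrete, here is a minimal hand-rolled sketch. It is not the Cats implementation (Cats' State is built on StateT and Eval); the method names merely mirror Cats' get, modify, pure, runA and runS for readability, and StateDemo is a hypothetical example.

```scala
// A minimal State monad: a wrapper around a function S => (S, A).
// Illustration only; Cats' State is more general and stack-safe via Eval.
final case class State[S, A](run: S => (S, A)) {
  // Transform the produced value, leaving the state untouched.
  def map[B](f: A => B): State[S, B] =
    State[S, B] { s =>
      val (s1, a) = run(s)
      (s1, f(a))
    }

  // Run this step, then feed its value and the new state into the next step.
  def flatMap[B](f: A => State[S, B]): State[S, B] =
    State[S, B] { s =>
      val (s1, a) = run(s)
      f(a).run(s1)
    }

  // Convenience runners: keep only the value, or only the final state.
  def runA(initial: S): A = run(initial)._2
  def runS(initial: S): S = run(initial)._1
}

object State {
  def pure[S, A](a: A): State[S, A] = State[S, A](s => (s, a))
  def get[S]: State[S, S] = State[S, S](s => (s, s))
  def modify[S](f: S => S): State[S, Unit] = State[S, Unit](s => (f(s), ()))
}

object StateDemo {
  // Hand out the current counter as a label and advance the counter.
  val fresh: State[Int, Int] =
    for {
      n <- State.get[Int]
      _ <- State.modify[Int](_ + 1)
    } yield n

  // Three labels in a row; the state is threaded implicitly.
  val threeLabels: State[Int, (Int, Int, Int)] =
    for {
      a <- fresh
      b <- fresh
      c <- fresh
    } yield (a, b, c)

  def main(args: Array[String]): Unit =
    println(threeLabels.run(0)) // prints (3,(0,1,2))
}
```

None of the steps in threeLabels mentions the counter explicitly; flatMap passes it along, which is exactly what we want for the schema Ids.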

To get a better understanding of how to apply this to the original code, let's look at a simpler example: labeling the leaves of a tree structure (similar to this example). Here is the definition of the tree data type:

sealed trait Tree[+A]
case class Branch[+A](left: Tree[A], right: Tree[A]) extends Tree[A]
case class Leaf[A](a: A) extends Tree[A]

Suppose we wanted to transform a tree of type Tree[A] into a labeled tree of type LabeledTree[A], where type LabeledTree[A] = Tree[(Int, A)].

Using State from Cats, this can be done as follows:

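import cats.data.State

// A tree whose leaves carry an Int label alongside the original value
type LabeledTree[A] = Tree[(Int, A)]

def fromTree[A](tree: Tree[A]): State[Int, LabeledTree[A]] = {
  tree match {
    case Leaf(a) =>
      for {
        state <- State.get[Int]               // current label
        _ <- State.modify[Int](s => s + 1)    // advance the counter for the next leaf
      } yield Leaf((state, a)): LabeledTree[A]
    case Branch(left, right) =>
      for {
        l <- fromTree(left)                   // label the left subtree first,
        r <- fromTree(right)                  // then continue counting on the right
      } yield Branch(l, r): LabeledTree[A]
  }
}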

The for-comprehension is syntactic sugar for flatMap and map, which allow us to sequence the implicit state changes.
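To see what flatMap sequences for us, here is a sketch of the same labeling with the counter threaded through by hand and no State type at all. The object ManualLabeling and the function labelLeaves are hypothetical; the point is that every recursive call must receive the current counter and return the updated one, which is the plumbing that State hides.

```scala
sealed trait Tree[+A]
final case class Branch[+A](left: Tree[A], right: Tree[A]) extends Tree[A]
final case class Leaf[A](a: A) extends Tree[A]

object ManualLabeling {
  // Label leaves left to right, starting at `counter`.
  // Returns the next free counter together with the labeled tree.
  def labelLeaves[A](tree: Tree[A], counter: Int): (Int, Tree[(Int, A)]) =
    tree match {
      case Leaf(a) =>
        (counter + 1, Leaf((counter, a)))
      case Branch(left, right) =>
        val (c1, l) = labelLeaves(left, counter) // left subtree first
        val (c2, r) = labelLeaves(right, c1)     // right subtree continues the count
        (c2, Branch(l, r))
    }

  // The same input tree as in the example run of fromTree.
  val example: Tree[Char] =
    Branch(Leaf('a'), Branch(Branch(Leaf('b'), Leaf('c')), Leaf('d')))
}
```

labelLeaves(example, 0) yields the same labels as fromTree(tree).runA(0).value, plus the final counter 4. Forgetting to pass c1 instead of counter to the right subtree would silently produce duplicate labels; with State that mistake cannot be made.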

Let's run this code:

val tree: Tree[Char] = Branch(Leaf('a'), Branch(Branch(Leaf('b'), Leaf('c')), Leaf('d')))
pprintln(tree)
pprintln(LabeledTreeExampleWithState.LabeledTree.fromTree(tree).runA(0).value)

Output:

Branch(Leaf('a'), Branch(Branch(Leaf('b'), Leaf('c')), Leaf('d')))
Branch(Leaf((0, 'a')), Branch(Branch(Leaf((1, 'b')), Leaf((2, 'c'))), Leaf((3, 'd'))))

Schema extraction with the state monad

First we need a pure function of type S => S that defines how a new Id is produced from the previous one. In our case the schema Id is of type UUID, and the next one can be computed like this:

type SchemaId = UUID
def nextId: SchemaId => SchemaId = id => UUID.nameUUIDFromBytes(id.toString.getBytes)

In the original implementation we needed to generate UUIDs to ensure uniqueness. But the new mechanism for generating unique values works for almost any other type. Do we need all the overhead of the UUID type? No. We shouldn't needlessly tie our code to this cumbersome representation of an Id when we can simply use an Int:

type SchemaId = Int
def nextId: SchemaId => SchemaId = _ + 1
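Because nextId is just a function S => S, the same seed-and-step idea works for either representation. The following sketch uses a hypothetical helper idsFrom (not part of the post's code) to produce the first few Ids from a seed; UUID.nameUUIDFromBytes is deterministic, so even the UUID sequence is reproducible.

```scala
import java.util.UUID

object IdSequences {
  // Step functions corresponding to the two nextId variants above.
  val nextUuid: UUID => UUID = id => UUID.nameUUIDFromBytes(id.toString.getBytes)
  val nextInt: Int => Int = _ + 1

  // Hypothetical helper: the first `n` Ids reachable from `seed` via `next`.
  def idsFrom[S](seed: S, next: S => S, n: Int): List[S] =
    Iterator.iterate(seed)(next).take(n).toList

  // An arbitrary fixed seed for the UUID variant.
  val seedUuid: UUID = UUID.fromString("00000000-0000-0000-0000-000000000000")
}
```

idsFrom(0, nextInt, 4) gives List(0, 1, 2, 3), and calling idsFrom(seedUuid, nextUuid, 3) twice yields the same list both times, which is what makes the resulting schemas comparable with ==.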

Now we can change the original implementation, following the labeled tree example:

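import cats.data.State
import cats.data.State._ // get, modify, pure
import cats.implicits._  // sequence on List[State[...]]

object SchemaExtractor {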

  def fromJsRoot(value: JsValue): State[SchemaId, Schema] = {
    value match {
      case JsObject(fields) =>
        fromJsField("root", fields.toList)
      case _ =>
        ???
    }
  }

  private def fromJsField(name: ClassName, fields: List[(String, JsValue)]): State[SchemaId, Schema] = {
    val schemaFieldsState =
      fields.map { case (fieldName, value) =>
        fromJsValue(fieldName, value)
      }

    for {
      schemaFields <- schemaFieldsState.sequence
      state <- get[SchemaId]
      _ <- modify(Schema.nextId)
    } yield
      SchemaObject(
        state,
        name,
        schemaFields)
  }

  private def fromJsValue(name: String, value: JsValue): State[SchemaId, (String, Schema)] = {
    val schema = value match {
      case JsNull =>
        ???
      case JsString(_) =>
        pure[SchemaId, Schema](SchemaString)
      case JsNumber(_) =>
        pure[SchemaId, Schema](SchemaDouble)
      case JsBoolean(_) =>
        pure[SchemaId, Schema](SchemaBoolean)
      case JsObject(fields) =>
        fromJsField(name, fields.toList)
      case JsArray(values) =>
        fromJsArray(name, values.toList)
    }

    schema.map((name, _))
  }

  private def fromJsArray(name: String, values: List[JsValue]): State[SchemaId, Schema] = {
    for {
      schemas <- values.map(v => fromJsValue(name, v).map(_._2)).sequence
      first = schemas.head
    } yield
      if (schemas forall Schema.haveSameStructure(first)) {
        SchemaArray(first)
      } else {
        ???
      }
  }

}

There are a few things to note here.

We use map to transform the value inside a State[S, A], e.g. schema.map((name, _)).
We use pure to create an instance of State[S, A] from a plain value, e.g. pure[SchemaId, Schema](SchemaString).
We use sequence to turn a List[State[S, A]] into a State[S, List[A]], e.g. schemaFields <- schemaFieldsState.sequence.
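What sequence does for State may be easier to see with the wrapper stripped away. In this sketch a state action is just a function S => (S, A); the alias Action and the helper sequenceActions are hypothetical and written for illustration, not taken from Cats. The actions run left to right, each receiving the state the previous one left behind, and their values are collected into a list.

```scala
object SequenceSketch {
  // A state action without the State wrapper: S => (new state, value).
  type Action[S, A] = S => (S, A)

  // Turn a list of actions into one action that produces a list of values.
  // Actions run in order; the state is threaded from one to the next.
  def sequenceActions[S, A](actions: List[Action[S, A]]): Action[S, List[A]] =
    initial => {
      val (finalState, reversed) =
        actions.foldLeft((initial, List.empty[A])) { case ((s, acc), action) =>
          val (next, a) = action(s)
          (next, a :: acc) // prepend for O(1), reverse once at the end
        }
      (finalState, reversed.reverse)
    }

  // Example action: return the current counter and increment it.
  def fresh: Action[Int, Int] = n => (n + 1, n)
}
```

Sequencing three fresh actions from a counter of 10 yields the values List(10, 11, 12) and a final state of 13. This is why the field schemas in fromJsField receive consecutive Ids in field order.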

Testing is much easier now

With runA we supply an initial state to the result of the schema extractor and get back only the resulting value, discarding the final state. The value is wrapped in an Eval (to maintain stack safety), so we call value to obtain the final result.

val actual = SchemaExtractor.fromJsRoot(Json.parse(json)).runA(0).value

In the test we can now simply check for equality instead of deconstructing the complete result:

// ...
val value4 = SchemaArray(SchemaArray(SchemaObject(3, "value4", List(("value4.1", SchemaString), ("value4.2", SchemaDouble)))))
val value3 = SchemaObject(2, "value3", List(("value3.1", SchemaBoolean), ("value3.2", SchemaDouble)))
val value2 = SchemaObject(1, "value2", List(
  ("value2.1", SchemaString),
  ("value2.2", SchemaDouble),
  ("value2.3", SchemaObject(0, "value2.3", List(
    ("value2.3.1", SchemaDouble),
    ("value2.3.2", SchemaDouble),
    ("value2.3.3", SchemaDouble)))
  )
))

val expected = SchemaObject(7, "root", List(
  ("value1", SchemaDouble),
  ("value2", value2),
  ("value3", value3),
  ("value4", value4)))

assert(actual == expected)

Error handling and monad transformers

Now it's time to apply error handling with the help of Either.

However, if we simply used a type like State[SchemaId, Either[Error, Schema]], we would have to write a lot of messy, nested for-comprehensions.

Here monad transformers come to the rescue because they allow us to combine the behavior of multiple monads into one.

Let's again look at the tree example to get a better understanding of how to use the StateT type to stack another monad inside.

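StateT takes an additional type constructor F[_] as its first type parameter, as in StateT[F, S, A] (in Cats, State[S, A] is itself an alias for StateT[Eval, S, A]). We cannot simply pass in Either, because F must be a type constructor that takes a single type parameter, while Either takes two. Therefore, we have to define a type alias with a fixed error type: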

type Error = String
type ErrorOr[A] = Either[Error, A]
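The shape requirement can be shown without Cats. In this sketch, the trait Wraps is a hypothetical stand-in for any abstraction that, like StateT's F, expects a type constructor with exactly one parameter; Either does not fit, but the ErrorOr alias does.

```scala
object KindSketch {
  // Hypothetical abstraction over a one-parameter type constructor F[_].
  trait Wraps[F[_]] {
    def wrap[A](a: A): F[A]
  }

  type Error = String
  type ErrorOr[A] = Either[Error, A]

  // Wraps[Either] would not compile: Either takes two type parameters.
  // Fixing the left side through the alias leaves exactly one open parameter.
  val errorOrWraps: Wraps[ErrorOr] = new Wraps[ErrorOr] {
    def wrap[A](a: A): ErrorOr[A] = Right(a)
  }
}
```

KindSketch.errorOrWraps.wrap(42) evaluates to Right(42).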

Now we can define an alias for the combined state and Either monad:

type EitherState[A] = StateT[ErrorOr, Int, A]
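Conceptually, StateT[ErrorOr, Int, A] behaves like a function Int => Either[Error, (Int, A)]. The following hand-rolled sketch specializes that idea; it is an illustration rather than Cats' StateT (which is more general), and the names EitherState, fail and fresh are assumptions. It shows the key property: flatMap threads the state just like State does, but stops at the first Left, so later steps never run.

```scala
object EitherStateSketch {
  type Error = String
  type ErrorOr[A] = Either[Error, A]

  // Hand-rolled stand-in for StateT[ErrorOr, Int, A].
  final case class EitherState[A](run: Int => ErrorOr[(Int, A)]) {
    def map[B](f: A => B): EitherState[B] =
      EitherState(s => run(s).map { case (s1, a) => (s1, f(a)) })

    // Sequence two steps; a Left from this step short-circuits the next one.
    def flatMap[B](f: A => EitherState[B]): EitherState[B] =
      EitherState(s => run(s).flatMap { case (s1, a) => f(a).run(s1) })

    // Keep only the value, like runA.
    def runA(initial: Int): ErrorOr[A] = run(initial).map(_._2)
  }

  def pure[A](a: A): EitherState[A] = EitherState(s => Right((s, a)))
  def fail[A](error: Error): EitherState[A] = EitherState(_ => Left(error))

  // Return the current counter and increment it.
  val fresh: EitherState[Int] = EitherState(s => Right((s + 1, s)))

  val ok: EitherState[(Int, Int)] =
    for { a <- fresh; b <- fresh } yield (a, b)

  // The second `fresh` is never executed because `fail` returns a Left.
  val broken: EitherState[(Int, Int)] =
    for { a <- fresh; _ <- fail[Unit]("boom"); b <- fresh } yield (a, b)
}
```

ok.run(0) gives Right((2, (0, 1))), while broken.run(0) gives Left("boom"). The for-comprehensions stay flat, which is the benefit over nesting Either inside State by hand.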

Next we make a few changes to the fromTree function:

def fromTree[A](tree: Tree[A]): EitherState[LabeledTree[A]] = {
  tree match {
    case Leaf(a) =>
      for {
        state <- StateT.get[ErrorOr, Int]
        _ <- StateT.modify[ErrorOr, Int](s => s + 1)
      } yield Leaf((state, a)): LabeledTree[A]
    case Branch(left, right) =>
      for {
        l <- fromTree(left)
        r <- fromTree(right)
      } yield Branch(l, r)
  }
}

Everything stays pretty much the same. Only get and modify change slightly: they are now called on StateT instead of State and take ErrorOr as an additional type parameter.

Calling runA with an initial state of 0 will output:

Right(Branch(Leaf((0, 'a')), Branch(Branch(Leaf((1, 'b')), Leaf((2, 'c'))), Leaf((3, 'd')))))

Handle invalid JSON inputs

Let's apply this to the schema extractor, starting with the type aliases:

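import cats.data.StateT
import cats.data.StateT._ // get, modify, pure, lift
import cats.implicits._   // sequence, and the Monad instance for Either

type Error = String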
type ErrorOr[A] = Either[Error, A]
type EitherState[A] = StateT[ErrorOr, SchemaId, A]

In fromJsRoot we lift a Left value (of type ErrorOr[Schema]) into StateT with lift:

def fromJsRoot(value: JsValue): EitherState[Schema] = {
  value match {
    case JsObject(fields) =>
      fromJsField("root", fields.toList)
    case _ =>
      lift[ErrorOr, SchemaId, Schema](Left("JSON root value must be an object"))
  }
}

In the Id generation part, only the calls to get and modify change:

private def fromJsField(name: ClassName, fields: List[(String, JsValue)]): EitherState[Schema] = {
  val schemaFieldsState =
    fields.map { case (fieldName, value) =>
      fromJsValue(fieldName, value)
    }

  for {
    schemaFields <- schemaFieldsState.sequence
    state <- get[ErrorOr, SchemaId]
    _ <- modify[ErrorOr, SchemaId](Schema.nextId)
  } yield
    SchemaObject(
      state,
      name,
      schemaFields)
}

In fromJsValue we create an instance of StateT representing an error with lift and instances representing a successful result with pure:

private def fromJsValue(name: String, value: JsValue): EitherState[(String, Schema)] = {
  val schema = value match {
    case JsNull =>
      lift[ErrorOr, SchemaId, Schema](Left("cannot analyze type of a JSON null value"))
    case JsString(_) =>
      pure[ErrorOr, SchemaId, Schema](SchemaString)
    // ...

We have now covered everything we need to change fromJsArray accordingly:

private def fromJsArray(name: String, values: List[JsValue]): EitherState[Schema] = {
  for {
    schemas <- values.map(v => fromJsValue(name, v).map(_._2)).sequence
    schema <- schemas.headOption match {
      case Some(first) if schemas forall Schema.haveSameStructure(first) =>
        pure[ErrorOr, SchemaId, Schema](SchemaArray(first))
      case None =>
        lift[ErrorOr, SchemaId, Schema](Left("cannot analyze empty JSON array"))
      case _ =>
        lift[ErrorOr, SchemaId, Schema](Left("array type is not consistent"))
    }
  } yield schema
}
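The final wildcard case in fromJsArray relies on the two cases before it: it is only reached for a non-empty array whose elements do not all share the first element's structure. The same three-way decision can be sketched with plain Either; the Shape type and arrayShape are hypothetical simplifications, and structural comparison is reduced to == here, whereas the post uses Schema.haveSameStructure.

```scala
object ArrayCheckSketch {
  sealed trait Shape
  case object NumberShape extends Shape
  case object StringShape extends Shape
  final case class ArrayShape(element: Shape) extends Shape

  // Mirrors the three cases of fromJsArray, minus the state handling.
  def arrayShape(elements: List[Shape]): Either[String, Shape] =
    elements.headOption match {
      case Some(first) if elements.forall(_ == first) =>
        Right(ArrayShape(first))
      case None =>
        Left("cannot analyze empty JSON array")
      case _ =>
        Left("array type is not consistent") // non-empty, but elements differ
    }
}
```

Using headOption instead of head is what turns the former runtime exception on empty arrays into an ordinary Left.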

We can easily test the error cases like this:

test("test null value") {
  val json = """{ "x": null }"""

  val actual = SchemaExtractor.fromJsRoot(Json.parse(json)).runA(0)

  assert(actual == Left("cannot analyze type of a JSON null value"))
}

Now the schema extraction function is completely pure, total and deterministic.

The complete source code can be found on GitHub.

Leif Battermann

Functional Programming Fu
