How to parse a Git log with FParsec

In this post we will see how to parse a Git log using F# and FParsec.

FParsec is a parser combinator library for F#. The library provides many simple parser functions that can be combined to create quite complex and powerful parsers.

For an introduction to how this works, please refer to Functional Monadic Parsers ported to C#, which explains some basic concepts and shows how a parser combinator library is built from scratch. Another good starting point is the FParsec tutorial or this post by Mathias Brandewinder.

In this post, however, we will focus on the usage rather than on how it works.

Complete Gist for this post.

Walkthrough

Here is an example of a Git log containing only the last two commits taken from the Fable repository:

commit 23fafe22264837dcc98890b536b2a21810a3e158
Author: Steffen Forkmann <sforkmann@gmail.com>
Date:   Wed Aug 10 12:40:21 2016 +0200

    Make Fetch.Response.Headers add-hoc props return option (#337)

commit 9452475768c3746249ed8da65760a964b47fea0c
Author: Steffen Forkmann <sforkmann@gmail.com>
Date:   Wed Aug 10 12:00:20 2016 +0200

    Adding Response headers to Fetch bindings (#336)

In the next sections we will see how this log can be parsed into a strongly typed list of commits.

Prerequisites

F# has to be installed
Visual Studio Code with Ionide has to be installed (Atom, Visual Studio, or another editor also works)
Git has to be installed and added to PATH

Cloning a Git repository

Clone an existing Git repository, e.g. with:

git clone https://github.com/fsprojects/Fable.git

Then create a new folder called gitlogparser inside the root folder of the cloned repository.

Using Paket to install FParsec

The easiest way to set up FParsec is to use Paket.

Open the new folder in VS Code. Then initialize Paket by typing paket init into the Command Palette. The Command Palette can be opened with Ctrl+Shift+P. Now Paket will initialize and create some files.

Next open the file paket.dependencies and add nuget FParsec:

source https://www.nuget.org/api/v2

nuget FParsec

Then run the Paket: Install command from the Command Palette.

Defining the types

Create a new F# script file called gitlogparser.fsx.

Let's start with defining the types that represent a commit:

[<AutoOpen>]
module Types =
    open System

    type Id = Id of string

    type Message = Message of string

    type Author = {
        Name:string
        Email:string }

    type Commit = {
        Id:Id
        Author:Author
        Date:DateTimeOffset
        Message:Message }

Creating the parser functions

First we have to reference the FParsec assemblies and open the FParsec namespace like this:

#r @"packages/fparsec/lib/net40-client/fparseccs.dll"
#r @"packages/fparsec/lib/net40-client/fparsec.dll"
open FParsec

Then we can create a few helper functions.

str_ws parses a given string s and skips any trailing whitespace.

let str_ws s = pstring s .>> spaces

char_ws parses a given character c and skips any trailing whitespace.

let char_ws c = pchar c .>> spaces

ws_char parses a given character c and skips any leading whitespace.

let ws_char c = spaces >>. pchar c

anyCharsTill parses any characters and combines them to a string until the parser pEnd succeeds.

let anyCharsTill pEnd = manyCharsTill anyChar pEnd

anyCharsTill can be combined with newline to create a parser for a line of text:

let line = anyCharsTill newline

If we want to skip a given string str at the beginning of a line, we can combine line with str_ws like this:

let restOfLineAfter str = str_ws str >>. line
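To get a feeling for how these helpers behave, here is a small standalone sketch that runs restOfLineAfter against the first line of the sample log. It makes a few assumptions that are not part of the original setup: it is run with dotnet fsi (.NET 5 or later), so FParsec is referenced inline through NuGet instead of Paket, the P alias pins the user state to unit to avoid value restriction errors, and tryParse is a hypothetical helper added just for this illustration.

```fsharp
// Sketch only. Assumes `dotnet fsi` (.NET 5 or later) for the inline NuGet reference.
#r "nuget: FParsec"
open FParsec

// Pinning the user state to unit avoids value restriction errors in a small script.
type P<'a> = Parser<'a, unit>

// The helper functions from the text, with explicit type annotations.
let str_ws s : P<string> = pstring s .>> spaces
let anyCharsTill (pEnd: P<'a>) : P<string> = manyCharsTill anyChar pEnd
let line : P<string> = anyCharsTill newline
let restOfLineAfter str : P<string> = str_ws str >>. line

// Hypothetical helper for this sketch: turn a parse result into an option.
let tryParse (p: P<'a>) input =
    match run p input with
    | Success (v, _, _) -> Some v
    | Failure _ -> None

let sample = "commit 23fafe22264837dcc98890b536b2a21810a3e158\nAuthor: ..."

// The keyword matches, so the rest of the line is returned.
printfn "%A" (tryParse (restOfLineAfter "commit") sample)
// The keyword does not match at the start of the input, so the parser fails.
printfn "%A" (tryParse (restOfLineAfter "Date:") sample)
```

Note that the newline at the end of the line is consumed but is not part of the result.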

With these helper functions, parsing the first few lines of a commit is now easy:

let id = restOfLineAfter "commit"
let date = restOfLineAfter "Date:"
let merge = restOfLineAfter "Merge:"

An email can be parsed by consuming a string between < and > while ignoring leading and trailing whitespaces.

let email = ws_char '<' >>. anyCharsTill (char_ws '>')

An author's name is just the string between the Author: keyword and the email. Note that the use of lookAhead makes sure that the parsed email is not consumed yet. It is just needed to know where the parser for the name should stop.

let name = anyCharsTill (lookAhead email)
let author = str_ws "Author:" >>. name .>>. email
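The following sketch applies the author parser to the Author line from the sample log, to show how lookAhead lets name stop right before the email. As an assumption that differs from the Paket setup above, it references FParsec inline through NuGet (dotnet fsi, .NET 5 or later). The P alias is only there to fix the user state to unit.

```fsharp
// Sketch only. Assumes `dotnet fsi` (.NET 5 or later) for the inline NuGet reference.
#r "nuget: FParsec"
open FParsec

// Pinning the user state to unit avoids value restriction errors in a small script.
type P<'a> = Parser<'a, unit>

let str_ws s : P<string> = pstring s .>> spaces
let char_ws c : P<char> = pchar c .>> spaces
let ws_char c : P<char> = spaces >>. pchar c
let anyCharsTill (pEnd: P<'a>) : P<string> = manyCharsTill anyChar pEnd

// email skips leading whitespace, so the space before '<' does not end up in the name.
let email : P<string> = ws_char '<' >>. anyCharsTill (char_ws '>')
// lookAhead only checks that an email follows; the input is left for email to consume.
let name : P<string> = anyCharsTill (lookAhead email)
let author : P<string * string> = str_ws "Author:" >>. name .>>. email

match run author "Author: Steffen Forkmann <sforkmann@gmail.com>\n" with
| Success ((n, e), _, _) -> printfn "name=%s email=%s" n e
| Failure (err, _, _) -> printfn "%s" err
```

Without lookAhead, name would consume the email as its end marker and the subsequent email parser would have nothing left to parse.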

A commit message is parsed line by line, and leading spaces are ignored. The message parser stops when it encounters either a newline followed by an id, where id is the parser for the beginning of the next commit (see above), or the end of the stream (eof). Note that lookAhead is used here again because id should not be consumed.

let msgLine = spaces >>. line
let msg = manyTill msgLine (lookAhead (newline >>. id) |>> ignore <|> eof)

Combining the parsers

Now we can put these functions together to create a parser for one commit. One way to do this is to use the parse computation expression. The let! keyword applies parsers sequentially and extracts the parsed values. Behind the scenes a bind operation (aka >>=, flatMap or SelectMany) is performed, but the computation expression offers a friendlier syntax.

With the return keyword we can create the commit record.

let commit = parse {
    let! _ = spaces
    let! id = id
    let! _ = optional merge
    let! author = author
    let! date = date
    let! msg = msg
    return { 
        Id = Id id
        Author = { Name = fst author; Email = snd author }
        Date = DateTimeOffset.Parse(date)
        Message = Types.Message (String.concat Environment.NewLine msg) } }

As Jared Hester pointed out, it is not recommended to use the parse computation expression because it is inefficient (also see FParsec documentation).

So here is an alternative version which makes use of the pipe4 function. pipe4 takes 4 parsers and a function, applies the parsers in sequence and then passes their results to the function.

let commitId = (spaces >>. id .>> optional merge)

let createCommit id (name, email) date msg = {
    Id      = Id id
    Author  = { Name = name; Email = email }
    Date    = DateTimeOffset.Parse(date)
    Message = Types.Message (String.concat Environment.NewLine msg) }

let commit = pipe4 commitId author date msg createCommit

A complete log consists of zero or more commits followed by the end of the input (eof) and can be parsed like this:

let parser = many commit .>> eof

Finally we need a function to apply the parser and extract the value:

let parseLog log =
    match log |> run parser with
    | Success(v,_,_)   -> v
    | Failure(msg,_,_) -> failwith msg
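To see all the pieces working together, here is a condensed, standalone sketch that parses the two sample commits from the beginning of this post. It is a simplification under a few assumptions rather than the exact code of this post: FParsec is referenced inline through NuGet (dotnet fsi, .NET 5 or later), RawCommit is a hypothetical record that keeps every field as a plain string (so the date is not converted to a DateTimeOffset), and the P alias fixes the user state to unit.

```fsharp
// Sketch only. Assumes `dotnet fsi` (.NET 5 or later) for the inline NuGet reference.
#r "nuget: FParsec"
open FParsec

// Pinning the user state to unit avoids value restriction errors in a standalone script.
type P<'a> = Parser<'a, unit>

// Hypothetical simplified record: plain strings instead of the wrapper types, date left unparsed.
type RawCommit = { Id: string; Name: string; Email: string; Date: string; Message: string }

let str_ws s : P<string> = pstring s .>> spaces
let char_ws c : P<char> = pchar c .>> spaces
let ws_char c : P<char> = spaces >>. pchar c
let anyCharsTill (pEnd: P<'a>) : P<string> = manyCharsTill anyChar pEnd
let line : P<string> = anyCharsTill newline
let restOfLineAfter str : P<string> = str_ws str >>. line

let id : P<string> = restOfLineAfter "commit"
let date : P<string> = restOfLineAfter "Date:"
let merge : P<string> = restOfLineAfter "Merge:"
let email : P<string> = ws_char '<' >>. anyCharsTill (char_ws '>')
let name : P<string> = anyCharsTill (lookAhead email)
let author : P<string * string> = str_ws "Author:" >>. name .>>. email
let msgLine : P<string> = spaces >>. line
let msg : P<string list> = manyTill msgLine (lookAhead (newline >>. id) |>> ignore <|> eof)

let commitId : P<string> = spaces >>. id .>> optional merge

let createCommit id (name, email) date msg =
    { Id = id; Name = name; Email = email; Date = date; Message = String.concat "\n" msg }

let commit : P<RawCommit> = pipe4 commitId author date msg createCommit
let parser : P<RawCommit list> = many commit .>> eof

// The two sample commits; the trailing empty entry yields the final newline.
let sampleLog =
    String.concat "\n" [
        "commit 23fafe22264837dcc98890b536b2a21810a3e158"
        "Author: Steffen Forkmann <sforkmann@gmail.com>"
        "Date:   Wed Aug 10 12:40:21 2016 +0200"
        ""
        "    Make Fetch.Response.Headers add-hoc props return option (#337)"
        ""
        "commit 9452475768c3746249ed8da65760a964b47fea0c"
        "Author: Steffen Forkmann <sforkmann@gmail.com>"
        "Date:   Wed Aug 10 12:00:20 2016 +0200"
        ""
        "    Adding Response headers to Fetch bindings (#336)"
        "" ]

match run parser sampleLog with
| Success (commits, _, _) ->
    for c in commits do
        printfn "%s | %s <%s> | %s" (c.Id.Substring(0, 7)) c.Name c.Email c.Message
| Failure (err, _, _) -> printfn "%s" err
```

Keeping the date as a string sidesteps the question of which date format Git was asked to emit. The parser in this post converts it with DateTimeOffset.Parse instead.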

Reading the Git log

Here is the code to read the Git log:

module Git =
    open System
    open System.Diagnostics

    let private runCommand cmd args =
        let startInfo = new ProcessStartInfo()
        startInfo.FileName <- cmd
        startInfo.Arguments <- args
        startInfo.UseShellExecute <- false
        startInfo.RedirectStandardOutput <- true

        let proc = Process.Start(startInfo)
        use stream = proc.StandardOutput
        stream.ReadToEnd()

    let log branch args =
        let args = sprintf "log %s %s" branch (String.concat " " args)
        runCommand "git" args

Parsing the Git log

This reads the log and transforms it into a strongly typed list of type Commit:

Git.log "master" ["--date iso"]
|> GitLogParser.parseLog

Example of further processing

Now we can process the list of commits as we like.

As an example let's list the top 5 contributors and for each display:

the total number of commits
the distribution of commits over the day
the average commit message size

Here's a sample output (see the code below):

Alfonso Garcia-Caro
    Total commits: 432
    Commits by part of day:
        Overnight: 38 %
        Daytime: 34 %
        Evening: 22 %
        Morning: 6 %
    Average commit message size: 33
Steffen Forkmann
    Total commits: 91
    Commits by part of day:
        Daytime: 74 %
        Evening: 21 %
        Overnight: 4 %
        Morning: 1 %
    Average commit message size: 34
David Podhola
    Total commits: 32
    Commits by part of day:
        Evening: 47 %
        Daytime: 47 %
        Overnight: 6 %
    Average commit message size: 63
F.D.Castel
    Total commits: 29
    Commits by part of day:
        Overnight: 45 %
        Morning: 28 %
        Evening: 17 %
        Daytime: 10 %
    Average commit message size: 167
Krzysztof Cieślak
    Total commits: 14
    Commits by part of day:
        Evening: 79 %
        Daytime: 14 %
        Overnight: 7 %
    Average commit message size: 43

The code can be executed in VS Code with the command FSI: Send File.

let run branch  = 

    let averageMsgLength = 
        List.map (fun c -> c.Message) 
        >> List.averageBy (fun (Message m) -> float m.Length)

    let partitionCommitsByPartOfDay = List.countBy (fun c -> 
        let within start stop (ts:TimeSpan) = 
            ts.Hours >= start && ts.Hours < stop
        let morning = within 6 10
        let daytime = within 10 17
        let evening = within 17 22

        if morning c.Date.TimeOfDay then "Morning"
        else if daytime c.Date.TimeOfDay then "Daytime"
        else if evening c.Date.TimeOfDay then "Evening"
        else "Overnight" )

    let print (name, count, length, stats) =
        do printfn "%s" name
        do printfn "\tTotal commits: %d" count
        do printfn "\tCommits by part of day:%s%s" Environment.NewLine 
            (stats 
             |> List.sortBy snd 
             |> List.rev 
             |> List.map (fun (key, n) -> 
                 sprintf "\t\t%s: %.0f %%" key (float n / float count * 100.0))
             |> String.concat Environment.NewLine)
        do printfn "\tAverage commit message size: %.0f" length

    let commits = 
        Git.log branch ["--date iso"]
        |> GitLogParser.parseLog

    commits
    |> List.groupBy (fun c -> c.Author.Name)
    |> List.map (fun (key, xs) -> 
        key, xs.Length, averageMsgLength xs, partitionCommitsByPartOfDay xs)
    |> List.sortBy (fun (_,commits,_,_) -> commits)
    |> List.rev
    |> List.take 5
    |> List.iter print

run "master"
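The part-of-day grouping can also be tried in isolation, without Git or FParsec. The following sketch uses the same hour boundaries as the code above on a few hand-written dates. partOfDay and the sample dates are hypothetical and only serve this illustration.

```fsharp
open System

// Hypothetical helper using the same hour boundaries as the listing above.
let partOfDay (date: DateTimeOffset) =
    let hour = date.TimeOfDay.Hours
    if hour >= 6 && hour < 10 then "Morning"
    elif hour >= 10 && hour < 17 then "Daytime"
    elif hour >= 17 && hour < 22 then "Evening"
    else "Overnight"

// Made-up sample dates with a +02:00 offset.
let offset = TimeSpan.FromHours 2.0
let dates =
    [ DateTimeOffset(2016, 8, 10, 12, 40, 21, offset)
      DateTimeOffset(2016, 8, 10, 12, 0, 20, offset)
      DateTimeOffset(2016, 8, 9, 23, 15, 0, offset) ]

// List.countBy groups the dates by their part of day and counts each group.
let stats = dates |> List.countBy partOfDay

stats |> List.iter (fun (part, n) -> printfn "%s: %d" part n)
```

The classification uses the time of day as recorded in the commit, including its offset, so it reflects the committer's local time and not UTC.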

Leif Battermann

Functional Programming Fu
