How to parse a Git log with FParsec

In this post we will see how to parse a Git log using F# and FParsec.

FParsec is a parser combinator library for F#. The library provides many simple parser functions that can be combined to create quite complex and powerful parsers.

For an introduction to how this works, please refer to Functional Monadic Parsers ported to C#, which explains some basic concepts and shows how a parser combinator library is built from scratch. Another good starting point is the FParsec tutorial or this post by Mathias Brandewinder.

In this post, however, we will focus on the usage rather than on how it works.

Complete Gist for this post.

Walkthrough

Here is an example of a Git log containing only the last two commits taken from the Fable repository:

commit 23fafe22264837dcc98890b536b2a21810a3e158
Author: Steffen Forkmann <sforkmann@gmail.com>
Date:   Wed Aug 10 12:40:21 2016 +0200

    Make Fetch.Response.Headers add-hoc props return option (#337)

commit 9452475768c3746249ed8da65760a964b47fea0c
Author: Steffen Forkmann <sforkmann@gmail.com>
Date:   Wed Aug 10 12:00:20 2016 +0200

    Adding Response headers to Fetch bindings (#336)

In the next sections we will see how this log can be parsed into a strongly typed list of commits.

Prerequisites

  • F# has to be installed
  • Visual Studio Code with Ionide has to be installed (Atom, Visual Studio or another editor also works)
  • Git has to be installed and added to PATH

Cloning a Git repository

Clone an existing Git repository e.g. with:

git clone https://github.com/fsprojects/Fable.git

Then create a new folder called gitlogparser inside the root folder of the cloned repository.

Using Paket to install FParsec

The easiest way to set up FParsec is to use Paket.

Open the new folder in VS Code. Then initialize Paket by typing paket init into the Command Palette. The Command Palette can be opened with Ctrl+Shift+P. Now Paket will initialize and create some files.

Next open the file paket.dependencies and add nuget FParsec:

source https://www.nuget.org/api/v2

nuget FParsec

Then run the Paket: Install command from the Command Palette.

Defining the types

Create a new F# script file called gitlogparser.fsx.

Let’s start with defining the types that represent a commit:

[<AutoOpen>]
module Types =
    open System

    type Id = Id of string

    type Message = Message of string

    type Author = {
        Name:string
        Email:string }

    type Commit = {
        Id:Id
        Author:Author
        Date:DateTimeOffset
        Message:Message }
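
To make the target shape concrete, here is a small sketch of the value the parser should eventually produce for the first commit of the sample log. The types are repeated (outside a module) so the snippet runs on its own. The binding names sample and hash are illustrative only and do not appear in the original script.

```fsharp
open System

type Id = Id of string
type Message = Message of string
type Author = { Name: string; Email: string }
type Commit = { Id: Id; Author: Author; Date: DateTimeOffset; Message: Message }

// Hand-built value for the first commit of the sample log (illustrative only).
let sample =
    { Id = Id "23fafe22264837dcc98890b536b2a21810a3e158"
      Author = { Name = "Steffen Forkmann"; Email = "sforkmann@gmail.com" }
      Date = DateTimeOffset(2016, 8, 10, 12, 40, 21, TimeSpan.FromHours 2.0)
      Message = Message "Make Fetch.Response.Headers add-hoc props return option (#337)" }

// Single-case unions such as Id are unwrapped by pattern matching.
let (Id hash) = sample.Id
printfn "%s by %s" (hash.Substring(0, 7)) sample.Author.Name
```

Wrapping the hash and the message in single-case unions keeps them from being mixed up with other plain strings such as the author name.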

Creating the parser functions

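First we have to reference the FParsec assemblies like this:

#r @"packages/fparsec/lib/net40-client/fparseccs.dll"
#r @"packages/fparsec/lib/net40-client/fparsec.dll"

The snippets that follow assume that the System and FParsec namespaces are opened. They belong in the GitLogParser module that is referenced later as GitLogParser.parseLog.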

Then we can create a few helper functions.

str_ws parses a given string s and ignores trailing whitespace.

let str_ws s = pstring s .>> spaces

char_ws parses a given character c and ignores trailing whitespace.

let char_ws c = pchar c .>> spaces

ws_char parses a given character c and ignores leading whitespace.

let ws_char c = spaces >>. pchar c

anyCharsTill parses any characters and combines them to a string until the parser pEnd succeeds.

let anyCharsTill pEnd = manyCharsTill anyChar pEnd

anyCharsTill can be combined with newline to create a parser for a line of text:

let line = anyCharsTill newline

If we want to skip a given string str at the beginning of a line we can combine line with str_ws like this:

let restOfLineAfter str = str_ws str >>. line
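
As a quick check of how these helpers behave, the following sketch runs restOfLineAfter on two hand-written inputs. It assumes a recent dotnet fsi, where FParsec can be referenced with #r "nuget: FParsec" instead of the Paket paths used in this post. tryParse is a hypothetical convenience function, not part of FParsec, and the shortened hash is made up for the example.

```fsharp
#r "nuget: FParsec"
open FParsec

let str_ws s = pstring s .>> spaces
let anyCharsTill pEnd = manyCharsTill anyChar pEnd
let restOfLineAfter str = str_ws str >>. anyCharsTill newline

// Hypothetical helper: turns a ParserResult into an option.
let tryParse p input =
    match run p input with
    | Success (value, _, _) -> Some value
    | Failure (_, _, _) -> None

// The keyword and the whitespace after it are skipped; the rest of the line is returned.
let parsed = tryParse (restOfLineAfter "commit") "commit 23fafe2\n"

// A line that starts with a different keyword is rejected.
let rejected = tryParse (restOfLineAfter "commit") "Author: someone\n"

printfn "%A %b" parsed (Option.isNone rejected)
```

Note that the terminating newline is consumed by the parser but is not part of the returned string.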

With these helper functions parsing the first few lines of a commit is easy now:

let id = restOfLineAfter "commit"
let date = restOfLineAfter "Date:"
let merge = restOfLineAfter "Merge:"

An email can be parsed by consuming a string between < and > while ignoring leading and trailing whitespaces.

let email = ws_char '<' >>. anyCharsTill (char_ws '>')

An author’s name is just the string between the Author: keyword and the email. Note that the use of lookAhead makes sure that the parsed email is not consumed yet. It is just needed to know where the parser for the name should stop.

let name = anyCharsTill (lookAhead email)
let author = str_ws "Author:" >>. name .>>. email
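
The interplay of name, email and lookAhead is easiest to see on a single line of input. The sketch below repeats the helpers so it runs on its own and applies author to one line from the sample log. It assumes a recent dotnet fsi with #r "nuget: FParsec"; the binding result is illustrative.

```fsharp
#r "nuget: FParsec"
open FParsec

let str_ws s = pstring s .>> spaces
let char_ws c = pchar c .>> spaces
let ws_char c = spaces >>. pchar c
let anyCharsTill pEnd = manyCharsTill anyChar pEnd

let email = ws_char '<' >>. anyCharsTill (char_ws '>')
let name = anyCharsTill (lookAhead email)
let author = str_ws "Author:" >>. name .>>. email

// Run the author parser on one line taken from the sample log.
let result =
    match run author "Author: Steffen Forkmann <sforkmann@gmail.com>\n" with
    | Success ((n, e), _, _) -> Some (n, e)
    | Failure (_, _, _) -> None

printfn "%A" result
```

Because email skips leading whitespace, the lookAhead already succeeds at the space in front of <, so the parsed name carries no trailing space.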

A commit message is parsed line by line, ignoring leading spaces. The message parser stops when it encounters either a newline followed by an id (the parser for the beginning of the next commit, see above) or the end of the stream (eof). Note that lookAhead is used here again because id should not be consumed.

let msgLine = spaces >>. line
let msg = manyTill msgLine (lookAhead (newline >>. id) |>> ignore <|> eof)

Combining the parsers

Now we can put these functions together to create a parser for one commit. One way to do this is to use the parse computation expression. The let! keyword applies parsers sequentially and extracts the parsed values. Behind the scenes a bind operation (aka >>=, flatMap or SelectMany) is performed, but the computation expression offers a friendlier syntax.

With the return keyword we can create the Commit record.

let commit = parse {
    let! _ = spaces
    let! id = id
    let! _ = optional merge
    let! author = author
    let! date = date
    let! msg = msg
    return { 
        Id = Id id
        Author = { Name = fst author; Email = snd author }
        Date = DateTimeOffset.Parse(date)
        Message = Types.Message (String.concat Environment.NewLine msg) } }

As Jared Hester pointed out, it is not recommended to use the parse computation expression because it is inefficient (also see FParsec documentation).

So here is an alternative version which makes use of the pipe4 function. pipe4 takes four parsers and a function, applies the parsers in sequence and then applies the function to their results.

let commitId = (spaces >>. id .>> optional merge)

let createCommit id (name, email) date msg = {
    Id      = Id id
    Author  = { Name = name; Email = email }
    Date    = DateTimeOffset.Parse(date)
    Message = Types.Message (String.concat Environment.NewLine msg) }

let commit = pipe4 commitId author date msg createCommit
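
The pipe functions are easier to grasp on a smaller case. The following sketch uses pipe2, the two-parser sibling of pipe4, on a made-up key=value input. The grammar and the names key, value and pair are hypothetical and only serve the illustration. It assumes a recent dotnet fsi with #r "nuget: FParsec".

```fsharp
#r "nuget: FParsec"
open FParsec

// Hypothetical mini-grammar: letters, '=', then an integer, e.g. "answer=42".
let key : Parser<string, unit> = many1Satisfy isLetter .>> pchar '='
let value : Parser<int32, unit> = pint32

// pipe2 runs key, then value, and passes both results to the function.
let pair = pipe2 key value (fun k v -> k, v)

let result =
    match run pair "answer=42" with
    | Success (kv, _, _) -> Some kv
    | Failure (_, _, _) -> None

printfn "%A" result
```

pipe4 works the same way with four parsers, which is why createCommit above takes exactly four arguments.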

A complete log consists of zero or more commits followed by the end of the input eof and can be parsed like this:

let parser = many commit .>> eof

Finally we need a function to apply the parser and extract the value:

let parseLog log =
    match log |> run parser with
    | Success(v,_,_)   -> v
    | Failure(msg,_,_) -> failwith msg
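
To see all pieces working together without calling Git, here is a self-contained sketch that parses the two sample commits from a string. Several things are assumptions made for the example: it uses #r "nuget: FParsec" (recent dotnet fsi) rather than the Paket paths, the dates are written in the format that --date iso produces (the option this post passes to git log), and createCommit and the hypothetical helper messageText are defined before FParsec is opened so that Message cannot clash with any name FParsec brings into scope (the post sidesteps this with Types.Message). sampleLog is likewise an illustrative name.

```fsharp
#r "nuget: FParsec"
open System

type Id = Id of string
type Message = Message of string
type Author = { Name: string; Email: string }
type Commit = { Id: Id; Author: Author; Date: DateTimeOffset; Message: Message }

// Defined before `open FParsec` so that `Message` refers to the union case above.
let createCommit id (name, email) (date: string) (msg: string list) =
    { Id = Id id
      Author = { Name = name; Email = email }
      Date = DateTimeOffset.Parse(date)
      Message = Message (String.concat Environment.NewLine msg) }

let messageText (Message m) = m   // hypothetical helper for unwrapping

open FParsec

let str_ws s = pstring s .>> spaces
let char_ws c = pchar c .>> spaces
let ws_char c = spaces >>. pchar c
let anyCharsTill pEnd = manyCharsTill anyChar pEnd
let line = anyCharsTill newline
let restOfLineAfter str = str_ws str >>. line

let id = restOfLineAfter "commit"
let date = restOfLineAfter "Date:"
let merge = restOfLineAfter "Merge:"
let email = ws_char '<' >>. anyCharsTill (char_ws '>')
let name = anyCharsTill (lookAhead email)
let author = str_ws "Author:" >>. name .>>. email
let msgLine = spaces >>. line
let msg = manyTill msgLine (lookAhead (newline >>. id) |>> ignore <|> eof)

let commit = pipe4 (spaces >>. id .>> optional merge) author date msg createCommit
let parser = many commit .>> eof

let parseLog log =
    match run parser log with
    | Success (v, _, _) -> v
    | Failure (err, _, _) -> failwith err

// The two sample commits, with ISO-style dates; the final "" yields a trailing newline.
let sampleLog =
    String.concat "\n" [
        "commit 23fafe22264837dcc98890b536b2a21810a3e158"
        "Author: Steffen Forkmann <sforkmann@gmail.com>"
        "Date:   2016-08-10 12:40:21 +0200"
        ""
        "    Make Fetch.Response.Headers add-hoc props return option (#337)"
        ""
        "commit 9452475768c3746249ed8da65760a964b47fea0c"
        "Author: Steffen Forkmann <sforkmann@gmail.com>"
        "Date:   2016-08-10 12:00:20 +0200"
        ""
        "    Adding Response headers to Fetch bindings (#336)"
        "" ]

let commits = parseLog sampleLog
for c in commits do
    printfn "%s: %s" c.Author.Name (messageText c.Message)
```

Keeping the input in a string like this also makes it easy to try edge cases, such as merge commits or multi-line messages, before running the parser against a real repository.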

Reading the Git log

Here is the code to read the Git log:

module Git =
    open System
    open System.Diagnostics

    let private runCommand cmd args =
        let startInfo = new ProcessStartInfo()
        startInfo.FileName <- cmd
        startInfo.Arguments <- args
        startInfo.UseShellExecute <- false
        startInfo.RedirectStandardOutput <- true

        let proc = Process.Start(startInfo)
        use stream = proc.StandardOutput
        stream.ReadToEnd()

    let log branch args =
        let args = sprintf "log %s %s" branch (String.concat " " args)
        runCommand "git" args

Parsing the Git log

This reads the log and transforms it into a strongly typed list of type Commit:

Git.log "master" ["--date iso"]
|> GitLogParser.parseLog

Example of further processing

Now we can process the list of commits as we like.

As an example let’s list the top 5 contributors and for each display:

  • the total number of commits
  • the distribution of commits over the day
  • the average commit message size

Here’s a sample output (see the code below):

Alfonso Garcia-Caro
    Total commits: 432
    Commits by part of day:
        Overnight: 38 %
        Daytime: 34 %
        Evening: 22 %
        Morning: 6 %
    Average commit message size: 33
Steffen Forkmann
    Total commits: 91
    Commits by part of day:
        Daytime: 74 %
        Evening: 21 %
        Overnight: 4 %
        Morning: 1 %
    Average commit message size: 34
David Podhola
    Total commits: 32
    Commits by part of day:
        Evening: 47 %
        Daytime: 47 %
        Overnight: 6 %
    Average commit message size: 63
F.D.Castel
    Total commits: 29
    Commits by part of day:
        Overnight: 45 %
        Morning: 28 %
        Evening: 17 %
        Daytime: 10 %
    Average commit message size: 167
Krzysztof Cieślak
    Total commits: 14
    Commits by part of day:
        Evening: 79 %
        Daytime: 14 %
        Overnight: 7 %
    Average commit message size: 43

The code can be executed in VS Code with the command FSI: Send File.

let run branch =

    let averageMsgLength = 
        List.map (fun c -> c.Message) 
        >> List.averageBy (fun (Message m) -> float m.Length)

    let partitionCommitsByPartOfDay = List.countBy (fun c -> 
        let within start stop (ts:TimeSpan) = 
            ts.Hours >= start && ts.Hours < stop
        let morning = within 6 10
        let daytime = within 10 17
        let evening = within 17 22

        if morning c.Date.TimeOfDay then "Morning"
        else if daytime c.Date.TimeOfDay then "Daytime"
        else if evening c.Date.TimeOfDay then "Evening"
        else "Overnight" )

    let print (name, count, length, stats) =
        do printfn "%s" name
        do printfn "\tTotal commits: %d" count
        do printfn "\tCommits by part of day:%s%s" Environment.NewLine 
            (stats 
             |> List.sortBy snd 
             |> List.rev 
             |> List.map (fun (key, n) -> 
                 sprintf "\t\t%s: %.0f %%" key (float n / float count * 100.0))
             |> String.concat Environment.NewLine)
        do printfn "\tAverage commit message size: %.0f" length

    let commits = 
        Git.log branch ["--date iso"]
        |> GitLogParser.parseLog

    commits
    |> List.groupBy (fun c -> c.Author.Name)
    |> List.map (fun (key, xs) -> 
        key, xs.Length, averageMsgLength xs, partitionCommitsByPartOfDay xs)
    |> List.sortBy (fun (_,commits,_,_) -> commits)
    |> List.rev
    |> List.take 5
    |> List.iter print

run "master"
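
The part-of-day classification used inside run can also be tried in isolation. The sketch below extracts it into a standalone function with the same hour boundaries and counts a few hand-written dates. partOfDay, dates and counts are hypothetical names, and the third date is made up to produce a second bucket.

```fsharp
open System

// Same hour boundaries as in the script above; partOfDay is a hypothetical name.
let partOfDay (date: DateTimeOffset) =
    let hour = date.TimeOfDay.Hours
    if hour >= 6 && hour < 10 then "Morning"
    elif hour >= 10 && hour < 17 then "Daytime"
    elif hour >= 17 && hour < 22 then "Evening"
    else "Overnight"

let dates =
    [ DateTimeOffset(2016, 8, 10, 12, 40, 21, TimeSpan.FromHours 2.0)
      DateTimeOffset(2016, 8, 10, 12, 0, 20, TimeSpan.FromHours 2.0)
      DateTimeOffset(2016, 8, 9, 23, 15, 0, TimeSpan.FromHours 2.0) ]

// List.countBy groups the dates by label and counts each group.
let counts = dates |> List.countBy partOfDay
printfn "%A" counts
```

Since TimeOfDay is read from the DateTimeOffset as written, the classification reflects the clock time in the committer's own UTC offset rather than a time converted to UTC.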

Complete Gist for this post.

  • jaredhester

    Not sure if you caught this page in the user-guide, but the parse cexpr isn’t recommended
    http://www.quanttec.com/fparsec/users-guide/where-is-the-monad.html#why-the-monadic-syntax-is-slow

    Instead you could use good old fashioned bind style


    let commit =
        (spaces >>. id .>> optional merge) >>= fun id ->
        author >>= fun author ->
        date >>= fun date ->
        msg >>= fun msg ->
        preturn
            { Id = Id id
              Author = { Name = fst author; Email = snd author }
              Date = DateTimeOffset.Parse(date)
              Message = Types.Message (String.concat Environment.NewLine msg) }

    But really pipe4 is the way to go


    let commit =
        pipe4 (spaces >>. id .>> optional merge) author date msg
            (fun id author date msg ->
                { Id = Id id
                  Author = { Name = fst author; Email = snd author }
                  Date = DateTimeOffset.Parse(date)
                  Message = Types.Message (String.concat Environment.NewLine msg) })

    • No, I didn’t catch that. Thanks a lot for pointing that out. I figured that there would be improvements to my code :). I’ll update it later.

  • brandewinder

    Nice post, and thanks for the recommendation 🙂
    This is timely, too, because I was thinking of doing a bit of crawling through the F# projects on GitHub, to understand better / map out who depends on whom, your code should come in handy for that task…

    • Thanks. I hope it helps, because you need to clone a repository for this code to work. I am not sure if this is convenient for your task? Let me know 🙂