dave^2 = -1

Using phantom types to associate static values and generic types

2019-01-17T07:50:00+11:00

Phantom types seem to get used reasonably regularly in a variety of languages for ensuring the safe use of various values (as identifiers, for state transitions and others).

A colleague and I recently found a case where they provided a slightly different benefit. They still helped provide some required type safety, but in this case they also helped to statically associate a value with a generic type parameter, avoiding a runtime lookup and keeping the code succinct using type inference.

This post uses Kotlin, but should be applicable to Java or any other language with generics / parameterised types.

What is a phantom type?

A phantom type is a type that has a type parameter in its declaration that is not actually used by the constructors or fields of that type.

For (a fairly contrived) example:

data class Measure<T>(val value: Double)

Here we have a Measure type with a single Double field. It is parameterised by some type T, but we could quite easily remove this parameter and the class would still work (i.e. data class Measure(val value: Double)).

Even though T isn’t used in the class definition, it does let us tag references to this type with additional information. The type parameter does not change how the type itself works, but it does let us restrict how it can be used.

Let’s make a plus operator for Measure, but ensure it only works on values of the same measurement units:

data class Measure<T>(val value: Double) {
    operator fun plus(other: Measure<T>): Measure<T> =
            Measure(value + other.value)
}

object Metres
object Kilograms

@Test
fun testMeasures() {
    val first = Measure<Metres>(42.0)
    val second = Measure<Metres>(4200.0)
    val third = Measure<Kilograms>(1.0)

    // Can add Metres:
    assertEquals(Measure<Metres>(4242.0), first + second)

    // Can not add Kilograms to Metres:
    // assertEquals(Measure<Metres>(43.0), first + third)
    // Type mismatch: inferred type is Measure<Kilograms> but Measure<Metres> was expected
}

Within Measure we just have a Double; the actual type of T makes no difference at all. But when we use Measure values, we can ensure only compatible T values are combined, or define operations only for specific T (such as a convert function from a Measure<A> to a Measure<B> for particular units A and B).
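Here is a small, self-contained sketch of that last idea. The `Kilometres` tag and the `toMetres` and `totalMetres` functions are hypothetical additions for illustration, not part of the original example: the extension function only exists for Measure<Kilometres>, so a conversion is checked at compile time without storing the unit at runtime.

```kotlin
// Phantom type: T is never stored, it only tags the value with a unit.
data class Measure<T>(val value: Double) {
    // Only measures carrying the same unit tag can be added.
    operator fun plus(other: Measure<T>): Measure<T> = Measure(value + other.value)
}

object Metres
object Kilometres // hypothetical unit tag, not in the original example

// Only defined for Measure<Kilometres>: calling it on a Measure<Metres>
// is a compile error rather than a runtime check.
fun Measure<Kilometres>.toMetres(): Measure<Metres> = Measure(value * 1000.0)

// Hypothetical helper: convert first, then add like with like.
fun totalMetres(a: Measure<Metres>, b: Measure<Kilometres>): Measure<Metres> =
    a + b.toMetres()
```

Writing `a + b` directly inside totalMetres would be rejected with a type mismatch, in the same way as the Kilograms example above.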

A problem associating a static value with a type

I’ll strip back the actual problem we were facing to something that is convenient to write up, yet still hopefully within the realms of plausibility. Say we have several entity types we want to load from some data store. Values of each type are queried by some schema information.

/** Marker interface for persistable entities. */
interface Entity

/** Schema information used to query entities in their persisted state. */
data class Schema(val name: String, val version: Int)

data class Widget(val style: String, val barcode: Barcode) : Entity {
    companion object {
        /** Static property of Widget, representing that type's schema */
        val SCHEMA = Schema("acme.Widget", 12)
    }
}

data class Sprocket(val size: Int) : Entity {
    companion object {
        /** Schema for Sprockets. */
        val SCHEMA = Schema("acme.Sprocket", 12)
    }
}

class MagicalBucketOfData {
    // ... magical persistence code here ...

    // WILL NOT COMPILE!!!
    fun <T : Entity> fetch(): List<T> =
            // Need to get schema for T to query:
            db.where("schema", T.SCHEMA.name) // ??? But can't access T.SCHEMA!
              .map { it as T }
}

The problem here is that we can’t get the static SCHEMA property from T. There is no way for us to say “all types T where T has a static property SCHEMA: Schema” in Kotlin.

There are a few options here. We could change our fetch method to fun <T : Entity> fetch(schema: Schema): List<T>, but then we could call fetch<Sprocket>(Widget.SCHEMA) by accident, which will cause all sorts of trouble when we try to cast/convert our data to the wrong type. We could instead use Kotlin’s reified type parameters or pass in a Class<T>, and switch on the type to look up the correct schema for whatever T is passed in, but that gets a little messy and adds a runtime cost when we already know statically what we want.
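For comparison, here is a rough sketch of the runtime-lookup alternative, assuming a hand-maintained registry keyed by class (`schemas` and `schemaFor` are hypothetical names, and the entity fields are trimmed). It works, but every entity type has to be registered separately and a missing entry only shows up as a runtime error.

```kotlin
import kotlin.reflect.KClass

interface Entity
data class Schema(val name: String, val version: Int)

// Fields trimmed for the sketch.
data class Widget(val style: String) : Entity
data class Sprocket(val size: Int) : Entity

// Hypothetical registry: each entity type must be added here by hand,
// and a forgotten entry is only discovered when the lookup runs.
val schemas: Map<KClass<out Entity>, Schema> = mapOf<KClass<out Entity>, Schema>(
    Widget::class to Schema("acme.Widget", 12),
    Sprocket::class to Schema("acme.Sprocket", 12)
)

// `reified` lets the inline function read T::class at the call site;
// the schema itself is still found with a map lookup at runtime.
inline fun <reified T : Entity> schemaFor(): Schema =
    schemas[T::class] ?: error("No schema registered for ${T::class}")
```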

Phantom types to the rescue

Instead, let’s make Schema a phantom type and tag it with the information about the specific entity type it represents.

data class Schema<T: Entity>(val name: String, val version: Int)

This gives a warning that Type parameter "T" is never used, a convincing indication we have a phantom type. Spooky.

data class Widget(val style: String, val barcode: Barcode) : Entity {
    companion object {
        // This isn't just any schema, it is a schema for a widget!
        val SCHEMA = Schema<Widget>("acme.Widget", 12)
    }
}

data class Sprocket(val size: Int) : Entity {
    companion object {
        val SCHEMA = Schema<Sprocket>("acme.Sprocket", 12)
    }
}

Now we can pass a schema value to fetch, and the compiler will infer which T we’re after:

fun <T : Entity> fetch(schema: Schema<T>): List<T> =
        db.where("schema", schema.name)
          .map { it as T }

// Example uses:

// Here widgets will be of type `List<Widget>`; the type will get
// inferred from the schema we pass.
val widgets = fetch(Widget.SCHEMA)

// Given a method that works on sprockets:
fun handleSprockets(sprockets: List<Sprocket>) { ... }
// This will fail as we're trying to use widgets for sprockets:
handleSprockets(fetch(Widget.SCHEMA)) // Type mismatch!

Now we have schema values that are also associated with specific types, giving us the ability to write generic code that needs these values.
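To see the inference end to end, here is a self-contained sketch that replaces the data store with a hypothetical `InMemoryBucket` holding rows keyed by schema name. The store, its constructor and `countSprockets` are assumptions for illustration, and the Widget fields are trimmed to avoid the undefined Barcode type.

```kotlin
interface Entity

// Phantom type: T is unused by the fields, it only ties a schema to an entity type.
data class Schema<T : Entity>(val name: String, val version: Int)

data class Widget(val style: String) : Entity {
    companion object {
        val SCHEMA = Schema<Widget>("acme.Widget", 12)
    }
}

data class Sprocket(val size: Int) : Entity {
    companion object {
        val SCHEMA = Schema<Sprocket>("acme.Sprocket", 12)
    }
}

// Hypothetical stand-in for the real data store: rows keyed by schema name.
class InMemoryBucket(private val rows: Map<String, List<Entity>>) {
    // T is inferred from the schema argument, so callers never spell it out.
    @Suppress("UNCHECKED_CAST")
    fun <T : Entity> fetch(schema: Schema<T>): List<T> =
        rows[schema.name].orEmpty().map { it as T }
}

// Hypothetical consumer that only accepts sprockets.
fun countSprockets(sprockets: List<Sprocket>): Int = sprockets.size
```

Passing the wrong schema, as in countSprockets(bucket.fetch(Widget.SCHEMA)), is rejected at compile time. The `as T` cast is still unchecked because of type erasure, so the store itself has to return the right rows for each schema name.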

Conclusion

Adding an essentially unused type parameter to Schema here gives us an easy way to associate schema values with a particular type. It avoids runtime lookups on types, and does not compromise code safety by allowing us to pass through a value that is not appropriate for the expected type T.

Quick, hacky truth tables in Haskell

2018-12-05T16:20:00+11:00

Today I wanted to test a few boolean expressions, and ended up with some quick truth table generation hackery in Haskell which I thought I’d note down for next time. I’m sure there are many better ways of doing this, but this way was mine.

Here’s a list of all booleans in GHCi:

λ> let bools = [False, True]

We can use the default applicative instance for lists to run all combinations of bools through a function. If we use a tuple constructor, we’ll get truth table inputs (shown below for 2 and 3 argument expressions).

λ> (,) <$> bools <*> bools
[(False,False),(False,True),(True,False),(True,True)]
λ> (,,) <$> bools <*> bools <*> bools
[(False,False,False),(False,False,True),(False,True,False),(False,True,True),(True,False,False),(True,False,True),(True,True,False),(True,True,True)]

Let’s check a basic expression p, and its equivalent using De Morgan’s laws. Writing x <$> bools <*> bools gets repetitive, so we’ll start off by defining tt2 to get truth-tableish output values for a 2 input function Bool -> Bool -> Bool.

λ> let tt2 f = f <$> bools <*> bools
tt2 :: (Bool -> Bool -> b) -> [b]
λ> let p  a b = not (a && not b)
λ> let p' a b = not a || b
λ> tt2 p
[True,True,False,True]
λ> tt2 p'
[True,True,False,True]
λ> tt2 p == tt2 p'
True

We can then zipWith to get truth table inputs with the corresponding truth table output (using a bit of uncurry trickery):

λ> :t uncurry (,,)
uncurry (,,) :: (a, b) -> c -> (a, b, c)
λ> zipWith (uncurry (,,)) (tt2 (,)) (tt2 p)
[(False,False,True),(False,True,True),(True,False,False),(True,True,True)]

Or for more arguments:

λ> let tt3 f = f <$> bools <*> bools <*> bools
tt3 :: (Bool -> Bool -> Bool -> b) -> [b]
λ> zipWith (\(a,b,c) d -> (a,b,c,d)) (tt3 (,,)) (tt3 (\a b c -> a && b || c))
[(False,False,False,False),(False,False,True,True),(False,True,False,False),(False,True,True,True),(True,False,False,False),(True,False,True,True),(True,True,False,True),(True,True,True,True)]

We can write something to format these nicely, but this was enough for me to get the information I wanted about a few expressions.
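The same approach carries over to other languages. As a rough Kotlin sketch (not part of the original post; `inputs`, `truthTable` and `formatTable` are made-up names), `flatMap` produces the input rows in the same order as the list applicative, each row gets the function's output appended, and a tiny formatter prints T/F columns:

```kotlin
val bools = listOf(false, true)

// All input rows for an n-argument function, in the same order as the
// Haskell list applicative: the first argument changes slowest.
fun inputs(n: Int): List<List<Boolean>> =
    if (n == 0) listOf(emptyList<Boolean>())
    else bools.flatMap { b -> inputs(n - 1).map { rest -> listOf(b) + rest } }

// Each row holds the inputs followed by the function's output.
fun truthTable(n: Int, f: (List<Boolean>) -> Boolean): List<List<Boolean>> =
    inputs(n).map { row -> row + f(row) }

// One line per row, T/F columns separated by spaces.
fun formatTable(table: List<List<Boolean>>): String =
    table.joinToString("\n") { row -> row.joinToString(" ") { if (it) "T" else "F" } }

fun main() {
    // p and its De Morgan equivalent from the GHCi session.
    val p = { r: List<Boolean> -> !(r[0] && !r[1]) }
    val q = { r: List<Boolean> -> !r[0] || r[1] }
    println(formatTable(truthTable(2, p)))
    println(truthTable(2, p) == truthTable(2, q))
}
```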

For sale: Several tonnes of yak hair, going cheap

2018-10-20T10:20:00+11:00

A sordid tale of builds, Gradle, Grunt, node, yarn, Linux, TeamCity, Docker, ssh, and shaven yaks.

Happy yak shaving! (src)

Once upon a time there was a project that produced a JAR based on some generated code. I needed to make a repeatable, versioned build for the JAR and plug it into TeamCity so I could reference that build artifact from my current project. Simple!

What could possibli go rong? (src)

Here is a rough timeline of how this went.

WARNING! In case it is not painfully obvious, this is all me learning stuff so any specifics mentioned are probably sub-optimal at best to destructive at worst. This is more about the journey than the destination.

Local build

Do some initial analysis and find I need to invert the current relationship between the JAR project and the generation code (so building the JAR can call out to a specific version of the generator and get freshly generated files).
Learn some Node and Grunt so I can update the generator to accept a parameter to change the output directory, then update the JAR project to have the generator as a git submodule so we can control the version used.
Learn enough Gradle to trigger the generation code on build, then build the JAR and run tests.
Learn a bit more Gradle to call git describe with --first-parent and use this information to shove some version info into the manifest.
Test build with clean checkout and various tags, document and push changes.

So far so good! Now to wire this up to TeamCity.

Add build to TeamCity

Create TeamCity build
Build immediately fails: --first-parent is not supported by git describe
Log in to build agent, find it has git 1.8.3, but --first-parent was added to git describe in 1.8.4.
Try to update git on build agents, which needs to be done from source as an updated package is not readily available.
Building from source requires some new packages installed, which fails due to some certificate misconfiguration for one repository.
Type stuff into Google until I find a magic flag to skip that repo and get the packages installed.
Install updated git.
Build gets a step further, but fails at the generation code step. The generator requires specific versions of node and yarn. The agents have different versions.
Start installing updated node and yarn from packages, working around repository problems.
Find out that another project has started sharing these agents and requires a different version of node and yarn. They can’t change their version either.
Roll back node and yarn changes.

Docker

At this point someone suggests trying Docker to isolate the build dependencies. The generator team has an image for the generator I can use as a base. I can then create an image that adds the JDK and the version of Git I need and I should be good to go.

Start learning Docker.
Try to work out whether to install Docker locally via brew or via the installer, reading conflicting information on both (I ended up with the installer).
Work through the Docker tutorial.
Learn a bit about images, containers, tags and how to manage the disk space used for these (docker system df -v, different prune options).
Get access to the registry used for storing the base image.
Work out how to create a new Docker image with the required JDK version on top of the base image.
Learn how to run a bash shell in a container based on this image: docker run -it image_name bash
The base image already has a modern version of git installed. Hooray!
Learn how to clean up transient containers used for bash shells: docker run -it --rm image_name bash
Work out how to map volumes to get host’s SSH working within container. Something like: docker run -it --rm -v ${HOME}/.ssh:/root/.ssh -v $SSH_AUTH_SOCK:/ssh-agent -e SSH_AUTH_SOCK=/ssh-agent image_name bash
Work out how to use host’s Gradle properties within container: docker run -it --rm -v ${HOME}/.gradle:/root/.gradle ...
Get build working in container! Hooray!
Come up with a versioning and tagging scheme for the Dockerfile and corresponding image so that we always have enough information to build each commit of the project with a compatible Docker image.
Publish new image to registry and test build.
Document all this in the project readme.

Add Docker build to TeamCity

Great, home stretch now. TeamCity has some Docker integration so this should be easy.

Attempt to install Docker on build agents. Unfortunately due to more package problems this does not work. Have to wget some other packages and manually install them before installing the Docker package.
Read up on TeamCity’s Docker integration.
Work out how to generate a key for TeamCity to talk to the image registry.
Update the TeamCity build to use the required image.
Run the build… JVM crash on the agent!
Log in to agent, find that Docker stores images and containers at a location mounted with no free space.
Work with server team to get more space at that mount point.
Run again, but generation fails. It needs access to checkout its own dependencies via SSH to a separately hosted repo.
Read up on SSH with TeamCity and Docker, and the git hosting being used for the other repo.
Talk to generation team about accessing these dependencies. Get access to generate token with SSH key that can be used for this.
Work out how to assign the token to the required dependency on the git host.
Add key to TeamCity and assign it to the build. Same error.
Find out about TeamCity SSH Agent build feature that needs to be added with the selected key. Same error.
Update TeamCity build’s Docker parameters to pass through SSH info from host. Same error.
Various combinations of the above. Same error.
Work out it is a problem with ~/.ssh/known_hosts. Update this on the agents (ssh-keyscan -t rsa the_repo >> ~/.ssh/known_hosts), then make sure .ssh from host gets mapped to container via Docker parameters.
Different error! Well, same error, but for a different dependency.
Assign token to new dependency.
Assign token to the other 4 dependencies as each new error occurs.
Finally get through the main build!

Then it failed on deployment due to a different credential issue with another internal system. But after temporarily disabling deployment, the build works! Once I sort out the deployment problem I can start on my actual task of adding this library to my project. But that should be simple to fix, right?

Now if you’ll excuse me I’m going to go and have a bit of a lie down on several mountains of yak hair-stuffed cushions.

StandaloneDeriving to fix forgetfulness in GHCi

2018-07-03T15:09:00+10:00

Quick reminder to Future Dave, as I’m going to assume he’ll keep making the same mistake Past and Present Daves make.

When switching between my editor and GHCi REPL to test stuff out I often forget to add a deriving (Show, Eq) or similar line to my data types. This normally occurs after I’ve just set up a bit of test data in the REPL, so if I just fix the data declaration and :reload GHCi then my setup will be lost. We can use the StandaloneDeriving GHC extension to help here.

The following example is me playing around with some parsing stuff and forgetting to make ParseError an instance of Show (so it won’t print in the REPL), then using :set -XStandaloneDeriving to fix this:

λ> :set -XOverloadedStrings
λ> let kv = M.fromList [ ("hi" :: T.Text, "2011-01-02" :: T.Text), ("world" :: T.Text, "2014-05-07" :: T.Text) ]
λ> kv
fromList [("hi","2011-01-02"),("world","2014-05-07")]
λ> let lookup k = (`tag` (KeyNotFound k)) . M.lookup k
λ> lookup "hi" kv

<interactive>:94:1: error:
    • No instance for (Show ParseError) arising from a use of ‘print’
    • In a stmt of an interactive GHCi command: print it

λ> :set -XStandaloneDeriving
λ> deriving instance Show ParseError

<interactive>:1:1: warning: [-Worphans]
    Orphan instance: instance Show ParseError ...

λ> lookup "hi" kv
Right "2011-01-02"

I’ve trimmed the orphan instance warning. In this case it should not be a problem as I’m just working around my forgetfulness. :)

We can also use this in real .hs files if necessary via a language pragma ({-# LANGUAGE StandaloneDeriving #-}) as described here.

Aggregation

2017-11-04T17:30:00+11:00

Today I wanted to look at an approach for producing aggregate data from multiple measurements over a source. I’m learning Kotlin at the moment so I’ll use that for the examples in this post, but we can apply the same idea to pretty much any language (I’ve used similar approaches in F#, and it would work with C# albeit with a bit more code noise). Any feedback on the approach in general and on my Kotlin-ing attempts is appreciated.

Motivating example

For this post we’ll consider the example of a list of Sample values we want aggregate information for. A Sample includes the month and year it was collected, and an integer representing the value sampled. For each set of samples we are required to show the following information:

The total value sampled for each month and year
The earliest sample date in this data set
The largest individual sample collected
A count of how many samples were within a specific range.

data class MonthYear(val year: Int, val month: Int) : Comparable<MonthYear> { /* ... */ }
data class Sample(val date: MonthYear, val value: Int)
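The body of MonthYear is elided above. One plausible implementation, assumed here rather than taken from the post, orders by year and then by month using `compareValuesBy` from the standard library, with an added (hypothetical) check that months are 1-based:

```kotlin
// Assumed implementation of the elided MonthYear body.
data class MonthYear(val year: Int, val month: Int) : Comparable<MonthYear> {
    init {
        // Hypothetical validation: months are numbered 1..12.
        require(month in 1..12) { "month must be in 1..12, was $month" }
    }

    // Compare by year first, then by month within the same year.
    override fun compareTo(other: MonthYear): Int =
        compareValuesBy(this, other, { it.year }, { it.month })
}

data class Sample(val date: MonthYear, val value: Int)
```

Implementing Comparable is what lets the later code use minOf on dates and wrap them in Min.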

Initial attempts

We could neatly get each individual bit of information by using multiple queries¹, but requiring multiple iterations seems quite wasteful, especially for larger data sets. Instead we could use multiple variables, or an aggregate type containing those variables, and update each as we loop or fold over the data set:

data class CandidateAggregate(val data: Map<MonthYear, Int>,
                              val earliestSampleDate: MonthYear?,
                              val largestSample: Int,
                              val inRangeCount: Int)

val result = samples.fold(CandidateAggregate(emptyMap(), null, 0, 0)) { acc, s ->
    // Start from the empty case, then update every field for each sample.
    CandidateAggregate(
        acc.data.insertOrUpdate(s.date) { if (it == null) s.value else it + s.value },
        minOf(acc.earliestSampleDate ?: s.date, s.date),
        maxOf(acc.largestSample, s.value),
        acc.inRangeCount + if (s.value in 100..200) 1 else 0
    )
}

/** Helper for updating the value for a key in a map, or inserting it if it does not exist. */
private fun <K, V> Map<K, V>.insertOrUpdate(key: K, transform: (V?) -> V): Map<K, V> =
        plus(key to transform(get(key)))

This seems a reasonable approach to me, and we’ll take this and adapt it in an attempt to get a few additional benefits:

Include more information about the type of calculation used for each field in the aggregate
Enable reuse of specific calculations in other aggregates
Enable independent testing of each calculation type
Make it fairly simple to change existing aggregates, and to create new ones.

Representing aggregate calculations

First we’ll create a type to represent values that can be combined. We’ll use Kotlin’s plus operator for this purpose.

/** A type [T] with an associative binary operation. Must satisfy the associative property:  `a + (b + c) == (a + b) + c` */
interface Semigroup<T> {
    operator fun plus(other: T): T
}

We’ll steal the term “semigroup” from mathematics as its definition includes the constraints our plus operation needs², although we could also call it Combinable or Addable or something else if we prefer.

If you haven’t used Kotlin before, defining a plus operator function lets us also use the + symbol, so a + b will get translated to a.plus(b). Whenever you see two semigroups being added using + for the remainder of this post, keep in mind it will be calling the plus function defined by that semigroup instance. (If you don’t like co-opting + in this way, feel free to change the interface to declare fun combine(other: T): T or similar.)

Next, we’ll define instances that represent sum, max, and min aggregation:

data class Sum(val value: Int) : Semigroup<Sum> {
    override fun plus(other: Sum): Sum = Sum(value + other.value)
}

data class Max<T : Comparable<T>>(val value: T) : Semigroup<Max<T>> {
    override operator fun plus(other: Max<T>) = Max(maxOf(value, other.value))
}

data class Min<T : Comparable<T>>(val value: T) : Semigroup<Min<T>> {
    override operator fun plus(other: Min<T>) = Min(minOf(value, other.value))
}
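Because each instance is so small, it can be tested in isolation. As one possible test (the `isAssociativeOn` helper is hypothetical, not part of the post), we can check the associative law from the Semigroup doc comment over every triple drawn from a few values. This can catch a broken instance, but it samples values rather than proving the law.

```kotlin
interface Semigroup<T> {
    operator fun plus(other: T): T
}

data class Sum(val value: Int) : Semigroup<Sum> {
    override fun plus(other: Sum): Sum = Sum(value + other.value)
}

data class Max<T : Comparable<T>>(val value: T) : Semigroup<Max<T>> {
    override operator fun plus(other: Max<T>) = Max(maxOf(value, other.value))
}

data class Min<T : Comparable<T>>(val value: T) : Semigroup<Min<T>> {
    override operator fun plus(other: Min<T>) = Min(minOf(value, other.value))
}

// Hypothetical helper: checks a + (b + c) == (a + b) + c for every
// triple (with repetition) taken from the given values.
fun <T : Semigroup<T>> isAssociativeOn(values: List<T>): Boolean =
    values.all { a ->
        values.all { b ->
            values.all { c -> a + (b + c) == (a + b) + c }
        }
    }
```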

Looking at our CandidateAggregate from earlier, we also need to handle nullable values (earliestSampleDate: MonthYear?), as well as combining Map values. Rather than building these specifically for this case, we can express these concepts more generally in terms of other semigroups, so they can be reused for different cases:

/**
 * Combine nullable values. Use the semigroup instance to combine if both have values, or if only
 * one value is present use that.
 */
data class Nullable<T : Semigroup<T>>(val value: T?) : Semigroup<Nullable<T>> {
    override fun plus(other: Nullable<T>): Nullable<T> =
            if (value != null && other.value != null) {
                Nullable(value + other.value) // Reminder: `+` here will call T.plus defined for the Semigroup.
            } else {
                Nullable(this.value ?: other.value)
            }
}

/**
 * Merge [Map]s where the values have a semigroup instance. If both maps have an entry for the same key, these
 * will be combined using the semigroup operation.
 */
data class Mapped<K, V : Semigroup<V>>(val value: Map<K, V>) : Semigroup<Mapped<K, V>> {
    override fun plus(other: Mapped<K, V>): Mapped<K, V> =
            value.entries.fold(other.value) { acc, entry ->
                acc.insertOrUpdate(entry.key) { if (it != null) it + entry.value else entry.value }
            }.let { Mapped(it) }
}

Each of these operations is implemented quite similarly to the code we used for each field in CandidateAggregate, but now we can reuse them for different aggregates, as well as test each in isolation. The cost is we have now spread this code across more types.

We can also write some general functions, concat and concatMap, to combine any list of Semigroup values into a single Semigroup value, effectively combining aggregates³. Here is an example of how to define and use these functions (as well as an example of testing Sum and Max in isolation):

/** Reduce a list of `T` to a single `T` using a semigroup operation, combining items from left to right. */
fun <T : Semigroup<T>> concat(empty: T, items: Iterable<T>): T = items.fold(empty) { acc, t -> acc + t }

/** Reduce a list of [A] by converting each item to a [T] with a semigroup instance, then combining to a single value as [concat] does. */
fun <T : Semigroup<T>, A> concatMap(empty: T, items: Iterable<A>, f: (A) -> T): T =
        items.fold(empty) { acc, t -> acc + f(t) }
        /* Note: this is logically equivalent to the simpler:
         *      concat(empty, items.map(f))
         * But this would do two passes through the list.
         */

@Test
fun examples() {
    val list = listOf(42, 123, 19, 73)
    assertEquals(Sum(257), concatMap(Sum(0), list) { Sum(it) })
    assertEquals(Max(123), concatMap(Max(0), list) { Max(it) })
}

Using our aggregation types

Now we can rewrite CandidateAggregate using our aggregation types:

data class Aggregate(val data: Mapped<MonthYear, Sum>,
                     val earliestSampleDate: Nullable<Min<MonthYear>>,
                     val largestSample: Max<Int>,
                     val inRange: Sum) : Semigroup<Aggregate> {
    companion object {
        val empty = Aggregate(Mapped(emptyMap()), Nullable(null), Max(0), Sum(0))
    }

    override fun plus(other: Aggregate): Aggregate =
            Aggregate(data + other.data,
                    earliestSampleDate + other.earliestSampleDate,
                    largestSample + other.largestSample,
                    inRange + other.inRange)
}

The type of aggregation used appears explicitly for each field in Aggregate. For example, largestSample: Max<Int> conveys both the type of the result (Int) and the process used to calculate it (Max). In CandidateAggregate only the former was expressed. We also build some field types by composing semigroups, such as Mapped<MonthYear, Sum>, which specifies that the values for each month will be combined using Sum rather than some other approach. This also makes it very simple to change the method of aggregation (as illustrated below).

We have made Aggregate itself a semigroup to define how we combine these composite aggregates. We’ve also added an empty property to make it easier to call concat and concatMap.

The last piece we need is to translate a single Sample into an Aggregate, then we can do the entire aggregation using concatMap as shown in the aggregateSamples() test. Each Sample gets transformed into an Aggregate representing that individual sample (an aggregate of 1), then each Aggregate in turn gets combined to calculate the required information across all the samples.

fun aggregateSample(sample: Sample): Aggregate =
        Aggregate(Mapped(mapOf(sample.date to Sum(sample.value))),
                Nullable(Min(sample.date)),
                Max(sample.value),
                sample.value.countWithin(100..200))

fun <T : Comparable<T>> T.countWithin(range: ClosedRange<T>) =
        Sum(if (range.contains(this)) 1 else 0)

@Test
fun aggregateSamples() {
    // Aggregation
    val result = concatMap(Aggregate.empty, samples) { aggregateSample(it) }

    // Actual results are equivalent to the individual queries on the left:
    assertEquals(samples.minByOrNull { it.date }?.date, result.earliestSampleDate.value?.value)
    assertEquals(samples.maxByOrNull { it.value }?.value, result.largestSample.value)
    assertEquals(samples.count { it.value in 100..200 }, result.inRange.value)
    val june2017 = MonthYear(2017, 6)
    assertEquals(samples.filter { it.date == june2017 }.sumOf { it.value }, result.data.value[june2017]?.value)
}
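The test above relies on a `samples` list that isn't shown. Here is a self-contained sketch that puts all the pieces together so it can be run with made-up sample data. A few details are assumptions rather than the post's exact code: concatMap is a top-level function folding left to right, the Aggregate fields are vals, MonthYear compares by year then month, and Mapped inlines the insertOrUpdate helper.

```kotlin
interface Semigroup<T> {
    operator fun plus(other: T): T
}

data class Sum(val value: Int) : Semigroup<Sum> {
    override fun plus(other: Sum): Sum = Sum(value + other.value)
}

data class Max<T : Comparable<T>>(val value: T) : Semigroup<Max<T>> {
    override fun plus(other: Max<T>): Max<T> = Max(maxOf(value, other.value))
}

data class Min<T : Comparable<T>>(val value: T) : Semigroup<Min<T>> {
    override fun plus(other: Min<T>): Min<T> = Min(minOf(value, other.value))
}

// Combine if both values are present, otherwise keep whichever one exists.
data class Nullable<T : Semigroup<T>>(val value: T?) : Semigroup<Nullable<T>> {
    override fun plus(other: Nullable<T>): Nullable<T> =
        if (value != null && other.value != null) Nullable(value + other.value)
        else Nullable(value ?: other.value)
}

// Merge maps, combining the values of keys present in both.
data class Mapped<K, V : Semigroup<V>>(val value: Map<K, V>) : Semigroup<Mapped<K, V>> {
    override fun plus(other: Mapped<K, V>): Mapped<K, V> =
        Mapped(value.entries.fold(other.value) { acc, (k, v) ->
            acc + (k to (acc[k]?.let { it + v } ?: v))
        })
}

data class MonthYear(val year: Int, val month: Int) : Comparable<MonthYear> {
    override fun compareTo(other: MonthYear): Int =
        compareValuesBy(this, other, { it.year }, { it.month })
}

data class Sample(val date: MonthYear, val value: Int)

// Convert each item with f and fold the results left to right.
fun <T : Semigroup<T>, A> concatMap(empty: T, items: Iterable<A>, f: (A) -> T): T =
    items.fold(empty) { acc, item -> acc + f(item) }

data class Aggregate(
    val data: Mapped<MonthYear, Sum>,
    val earliestSampleDate: Nullable<Min<MonthYear>>,
    val largestSample: Max<Int>,
    val inRange: Sum
) : Semigroup<Aggregate> {
    companion object {
        val empty = Aggregate(Mapped(emptyMap()), Nullable(null), Max(0), Sum(0))
    }

    // Field-by-field combination using each field's own semigroup.
    override fun plus(other: Aggregate): Aggregate = Aggregate(
        data + other.data,
        earliestSampleDate + other.earliestSampleDate,
        largestSample + other.largestSample,
        inRange + other.inRange
    )
}

fun <T : Comparable<T>> T.countWithin(range: ClosedRange<T>): Sum =
    Sum(if (this in range) 1 else 0)

// An aggregate of a single sample.
fun aggregateSample(sample: Sample): Aggregate = Aggregate(
    Mapped(mapOf(sample.date to Sum(sample.value))),
    Nullable(Min(sample.date)),
    Max(sample.value),
    sample.value.countWithin(100..200)
)
```

Because Aggregate is itself a semigroup, aggregates computed over separate batches of samples can be combined with + in the same way.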

What have we gained for the price?

This definitely has more pieces than the CandidateAggregate version (although the code for each piece has not changed much, it is now spread over multiple types). More pieces suggest a performance cost, but I have not measured this.

We do get a few benefits for this price. Firstly, we now have some small, simple, genuinely reusable aggregation types (Sum, Max, Min, Mapped etc.). These can be combined into other aggregates, and they can be tested in isolation. Secondly, we explicitly define aggregate types in terms of the aggregates of which they are composed. We don’t have an aggregate that contains an Int, we have a Sum or a Max which conveys more information as to the aggregation process, as well as preventing errors (summing two Int values that should have been combined using maxOf for example).

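We also make it simpler to change our aggregation. For example, if we wanted to change from reporting the total value to the maximum value for each month, we could change Mapped<MonthYear, Sum> to Mapped<MonthYear, Max<Int>> (and build each sample's entry with Max(sample.value) in aggregateSample, which the compiler will point out), and the combining logic will adjust accordingly.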

Conclusion

We introduced a Semigroup interface which represents values that can be combined with an associative, binary operation. We also introduced concat and concatMap operations that work for any instance of this interface. We created Sum, Max, Min, Nullable and Mapped instances of this interface to represent common methods of aggregation, then built a custom Aggregate semigroup composed of some of these instances.

This is a bit more complex than manually aggregating a set of values over a loop or fold, but in return it gives us reusable and testable aggregate types, more communicative types for our aggregate model, and fewer opportunities for bugs in the aggregation process, as well as making it simpler to create new aggregates and modify existing ones.

Specifying FAKE targets

2017-02-05T08:45:00+11:00

FAKE is an F#-based build tool along similar lines to Make, Rake, etc. The FAKE documentation describes one way of setting up dependencies between targets using the ==> operator. For example:

"Clean" ==> "Version" ==> "Build" ==> "Test" ==> "Package" ==> "Full"

This declaration means that to run the Test target, Build must be run beforehand, which in turn requires Version, which in turn requires Clean to be run.

This approach limits us to a linear build order. I’d prefer to specify these dependencies less prescriptively, and have FAKE calculate the ordering based on whatever target or targets I need.

Continuing the above example, I’d like to quickly build and run the tests during development, but for that case I don’t really need to version the assemblies. I’d also like to avoid running Clean in this case to take advantage of incremental compilation. But if I’m running the Package task to package everything for NuGet then it is essential to run Version before Build to make sure the packaged assemblies have the right version numbers. And I want to make sure I Clean before a full build to avoid any old artefacts making it into a package.

I fairly recently found out that FAKE does support this flexibility, using soft dependencies and the ability to specify multiple dependencies using the <== operator.

// ... target definitions elided ... 
Target "Full" DoNothing

// Dependencies
"Clean"   ?=> "Build"
"Version" ?=> "Build"
"Test"    <== [ "Build" ]
"Package" <== [ "Build"; "Version" ]
"Full"    <== [ "Clean"; "Version"; "Build"; "Test"; "Package" ]

RunTargetOrDefault "Full"

The "Clean" ?=> "Build" line tells FAKE “if Clean needs to run, it must run before Build”. We also tell FAKE that if we are Versioning, that has to be done before build as well. Unlike the linear definition we are not saying we have to run Clean or Version, just that if they need to run, they must go before Build.

The <== operator lets us make a target depend on multiple other targets. So "Package" <== [ "Build"; "Version" ] tells FAKE that to run Package, we have to run Build and Version. When we run fake Package FAKE knows it has to run both tasks, and it also knows that if it runs Version it must do so before Build. So the final build order for that case is: Version, then Build, then Package.¹

This gives me exactly the behaviour I was after. I can run the Test target which will force a build, but won’t run a clean or version the assemblies. I can generate a NuGet package with versioned assemblies (I should probably make that depend on Test as well). Or I can run a Full build which will clean, version, build, test and create the package.
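To make the ordering rule concrete, here is a toy Kotlin model of it. It is not FAKE's implementation or API; `BuildGraph`, `requires`, `softOrder` and `plan` are hypothetical names. Hard dependencies decide which targets run at all, soft orderings only add constraints between targets that are already scheduled, and a depth-first walk produces a valid order.

```kotlin
// Toy model only: hard dependencies (like FAKE's <==) decide which targets
// run; soft orderings (like ?=>) only constrain targets that already run.
class BuildGraph {
    private val hard = mutableMapOf<String, List<String>>()
    private val softBefore = mutableMapOf<String, MutableList<String>>()

    // "target <== deps": running target requires running every dep first.
    fun requires(target: String, deps: List<String>) {
        hard[target] = deps
    }

    // "first ?=> later": if both are scheduled, first must run before later.
    fun softOrder(first: String, later: String) {
        softBefore.getOrPut(later) { mutableListOf() }.add(first)
    }

    fun plan(requested: String): List<String> {
        // 1. Every target reachable through hard dependencies will run.
        val needed = mutableSetOf<String>()
        fun collect(t: String) {
            if (needed.add(t)) hard[t].orEmpty().forEach { collect(it) }
        }
        collect(requested)

        // 2. Depth-first walk: predecessors are the hard dependencies plus
        //    any soft predecessors that are also scheduled.
        val order = mutableListOf<String>()
        val inProgress = mutableSetOf<String>()
        fun visit(t: String) {
            if (t in order) return
            check(inProgress.add(t)) { "Dependency cycle involving $t" }
            val before = hard[t].orEmpty() + softBefore[t].orEmpty().filter { it in needed }
            before.forEach { visit(it) }
            inProgress.remove(t)
            order.add(t)
        }
        visit(requested)
        return order
    }
}
```

With the same declarations as the FAKE script above, planning Package yields Version, Build, Package even though Build is listed first, and planning Test yields only Build, Test.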

¹ I’d normally specify the targets in the expected final order where possible, so "Package" <== [ "Version"; "Build" ], but I wanted to illustrate that FAKE is working out the required order; it isn’t a side-effect of the order in which the dependencies are specified.

Pondering a prescription for pattern matching prevalence

2016-02-16T20:45:00+11:00

In which I ramble on about how my thoughts on pattern matching have changed over the years.

Glorified conditional?

At its most basic, pattern matching can be used to represent standard conditionals and switch statements. For example (in F#):

// Pattern matching syntax:
let menu s =
    match s with
    | "X" -> exitCommand
    | "U" -> moveUpCommand
    | "D" -> moveDownCommand
    | _   -> ...

// Conditionals:
let menu2 s =
    if s = "X" then exitCommand
    else if s = "U" then moveUpCommand
    else if s = "D" then moveDownCommand
    else ...

This did not initially seem very exciting to me. There has to be more to it than this, right? (Spoiler: yes :) )

Pattern match all teh things!

Things get more interesting when we are dealing with types whose values can have different shapes. For example, Option (similar to Nullable in C#). In F# Option has an IsSome property (like HasValue for Nullable). If this is true then it is safe to access that value’s Value property. If IsSome is false, then accessing Value will throw a NullReferenceException. So we could (but please don’t) use option types like this:

// Don't do this!
let getKeyAndValue (key : string) (dict : Map<string,string>) =
    let result = dict.TryFind(key)
    if result.IsSome then
        Some (key, result.Value)  // please don't do this
    else
        None

I don’t like this. I’m not fond of null reference exceptions, and I don’t like checking IsSome before accessing values, because I do silly things like messing up the conditional, or forgetting to check and crashing with a NullReferenceException. (Even when I don’t forget, there are always those cases that “will never be null” which turn out to be null after all, due to a misunderstanding or a change somewhere along a call stack.) And what about more complicated types, where we may have to check several different preconditions before accessing a number of different values?

Instead, we can use pattern matching to match all the possible shapes of our type:

// Better (but can still be improved)
let getKeyAndValue (key : string) (dict : Map<string,string>) =
    match dict.TryFind(key) with
    | Some value -> Some (key, value)
    | None       -> None

This is great because we don’t need to access the null reference-throwing .Value property. Instead the value is assigned as part of the pattern: Some value. For the None case there is no value we can access within the pattern. If we tried to add one, the compiler would stop and tell us we have the wrong pattern. What is extra great is that if we don’t cover all the possible values of the type we are matching against, the compiler will warn us.

So we’ve ruled out a whole bunch of errors, and have very explicit, compiler-checked documentation about valid ways to use values of each type.

This is awesome! Pattern match all teh things!

The “meh” of matching

Say we have a collection of key-value pairs, where both keys and values are strings. Maybe we got this from a POST request, or a flattened JSON object or something. We want to get the value for a particular key, and convert it to an integer (or 0 if we cannot do the conversion).

So we have two operations that can return None: looking up a value for a key that may not be in the JSON, and trying to convert the value to a valid integer.

Let’s start out with the conditional version:

let getRows (dict : Map<string, string>) : int =
    let rows = dict.TryFind("numberOfRows")
    if rows.IsSome then
        let result = tryParseInt(rows.Value)
        if result.IsSome then result.Value
        else 0
    else 0

Yuck, look at all those potentially catastrophic .Value calls! Let’s rewrite it with our new-found hammer:

let getRows2 (dict : Map<string, string>) : int =
    match dict.TryFind("numberOfRows") with
    | None -> 0
    | Some rows ->
        match tryParseInt rows with
        | None -> 0
        | Some result -> result

What isn’t so great is that we are still writing very similar code, just with safer pattern-matching instead of free-form conditionals. But we’re still going through the same code branches.

What I also found alarming when first starting out with this is a side-effect of the compiler warning us about unmatched values – we’re now forced to be explicit everywhere about how to handle all the values. Isn’t this going to get horribly verbose? We already have a good idea about when things are going to be null, so why trade concise code for a little safety?

Well, the good thing is we can have our safety and eat… er… code… concisely too!

Combinator all teh things!

Rather than digging into the details of a type by pattern matching all the time, we can define operations for using and combining values of these types. I often see these referred to as “combinators” (although that term seems overloaded). For example, we can rewrite our getRows function using Option.bind and Option.getOrElse¹ without ever digging in to grab a value from an Option type.

let getRows3 (dict : Map<string, string>) : int =
    dict.TryFind("numberOfRows")
    |> Option.bind tryParseInt
    |> Option.getOrElse 0

Under the hood this code is still doing exactly the same thing, but we are now expressing the operation in terms of other distinct operations, instead of via the details of deconstructing values². This allows us to start thinking at a higher level of abstraction. Rather than thinking about things like “if this is Some value return that, or if it is None then return the second option”, we start thinking in terms of the higher-level operations like or and map. These operations allow us to more easily and precisely express more complex ideas.
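The getRows examples rely on a tryParseInt helper that isn't shown, and getRows3 also needs the getOrElse function mentioned in the footnote. Below is a minimal sketch of both so the final version can be run. The implementations are my own assumptions (tryParseInt is built on Int32.TryParse), not necessarily what the post had in mind.

```fsharp
open System

// Hypothetical helper assumed by the getRows examples:
// Some n if the string parses as an int, otherwise None.
let tryParseInt (s : string) : int option =
    match Int32.TryParse(s) with
    | true, n -> Some n
    | false, _ -> None

// getOrElse isn't in FSharp.Core's Option module, so we add it by
// declaring our own module called Option. Option.bind still resolves
// to FSharp.Core's version.
module Option =
    let getOrElse (fallback : 'a) (opt : 'a option) : 'a =
        match opt with
        | Some value -> value
        | None -> fallback

let getRows3 (dict : Map<string, string>) : int =
    dict.TryFind("numberOfRows")
    |> Option.bind tryParseInt
    |> Option.getOrElse 0
```

Newer versions of FSharp.Core (4.1 onwards) include Option.defaultValue, which does the same job as this getOrElse.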

This was a huge turning point for me. Previously I was worried about things like Option values propagating all over the code, and having to pattern match at each call site. Now we still get propagation (which is completely valid! If we are dealing with a call that can return an empty value, chances are the caller will also need to return an empty value), but there is no cost for this. Combinators make using these values almost as convenient as using the wrapped type³, with the benefit that we are now safely handling empty values instead of relying on us to remember which calls sometimes return null instead of a T.

An aside for pattern matching-less languages

If we mainly use combinators for combining values of these types, pattern matching becomes a less essential part of a language. It is still a very nice feature to have, as it is pretty natural to implement combinators using pattern matching, and pattern matching seems to go hand-in-hand with sum types, which I regard as an essential language feature. But for those who still do a lot of work in C# and similar languages, this means that we can implement these combinators in other ways (sometimes messy ways, without as much compiler/type system help) and get a lot out of useful, oft-pattern-matched types like Option and Either.
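To illustrate how combinators can be defined with pattern matching, here is a rough sketch of map, bind and getOrElse for options. OptionOps, halveIfEven and describe are hypothetical names (FSharp.Core already provides Option.map and Option.bind). The point is that the matching on Some and None lives in one place, so describe never has to deconstruct an option itself.

```fsharp
module OptionOps =
    // Apply f to the wrapped value, if there is one.
    let map (f : 'a -> 'b) (opt : 'a option) : 'b option =
        match opt with
        | Some x -> Some (f x)
        | None -> None

    // Like map, but f itself may produce None.
    let bind (f : 'a -> 'b option) (opt : 'a option) : 'b option =
        match opt with
        | Some x -> f x
        | None -> None

    // Use the wrapped value, or fall back to a default.
    let getOrElse (fallback : 'a) (opt : 'a option) : 'a =
        match opt with
        | Some x -> x
        | None -> fallback

let halveIfEven (n : int) : int option =
    if n % 2 = 0 then Some (n / 2) else None

// Callers combine operations without matching on Some/None themselves.
let describe (input : int option) : string =
    input
    |> OptionOps.bind halveIfEven
    |> OptionOps.map string
    |> OptionOps.getOrElse "no value"
```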

Conclusion

My experience with pattern matching has gone from not understanding why it was useful, then to wanting to use it everywhere, now to favouring combinators and avoiding having to dig in to the details of a type as much as possible. Using these operations defined over types gives me a nice, high-level way of thinking about building up these values.

Pattern-matching is still really useful, particularly for defining operations over a type, but in general I try to use those defined operations instead, only falling back to pattern matching in the cases where it is much simpler (for example: cases like let (a,b) = foo instead of let a = fst foo; let b = snd foo).

If you currently use pattern matching all the time, maybe try to pull out the repeated operations the pattern matches represent and see if you prefer that style. Operations like map, flatMap, apply, reduce/fold, and other combining functions along the lines of +, and, and or are good places to start.

getOrElse is not part of the Option module in F#3, but thankfully we can add members to modules.↩
To me using combinators like this is similar to how we tend to use classes. The internal details of the class are stored in private fields, and we define methods to interact with instances of that class without having to know the details of those fields. Combinators give us the same level of abstraction – we can access operations over a type without knowing the patterns / specific constructors of that type.↩
…and every bit as easy as using an object with methods hanging off it, which is one valid way of implementing these combinator functions↩

Currying vs. partial application

2016-02-10T21:30:00+11:00

When I first came across the terms “currying” and “partial application” I was a bit confused about the difference. Here is my attempt at an explanation¹. I’m not 100% confident of my understanding, so please point out any inconsistencies – I’m happy to be corrected :).

Consider a call that takes 2 arguments and returns some value²:

f : (String, Int) -> Widget
// Example call:
f("a", 1)

Currying is the process of converting this to a function that takes a single argument, and returns another function that takes a single argument.

f' : String -> (Int -> Widget)
// or just:
f' : String -> Int -> Widget

// Example call:
f'("a")(1)

For functions with more than 2 arguments, we can use currying to convert it to a series of functions that each take a single argument:

g : (a,b,c,d) -> e
g' : a -> (b -> (c -> (d -> e)))
// or just:
g' : a -> b -> c -> d -> e
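In F#, this conversion can be written as a pair of small helpers. This is only a sketch: curry, uncurry and label are names I've made up for illustration rather than FSharp.Core functions, and label returns a string in place of a Widget.

```fsharp
// Turn a function that takes a tuple into a chain of single-argument functions.
let curry (f : 'a * 'b -> 'c) : 'a -> 'b -> 'c =
    fun a -> fun b -> f (a, b)

// And back again.
let uncurry (f : 'a -> 'b -> 'c) : 'a * 'b -> 'c =
    fun (a, b) -> f a b

// An uncurried function, shaped like f : (String, Int) -> Widget above.
let label (name : string, count : int) : string =
    sprintf "%s x%d" name count

// label' : string -> int -> string
let label' = curry label
```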

Partial application is when we can have a function that takes multiple arguments, give it a subset of those arguments, and get back a function that will take the remaining arguments. With curried functions we get this ability for free, but you could imagine a language feature that implements this for uncurried functions:

// With curried function:
g' : a -> b -> c -> d -> e
let partialApplyG' = g'(1)(2)
// partialApplyG' : c -> d -> e
partialApplyG'(3)(4) // <- providing the rest of the arguments

// With uncurried function (via our imagined language feature):
g : (a,b,c,d) -> e
let partialApplyG = g (1, 2)
// partialApplyG : (c, d) -> e
partialApplyG (3, 4) // <- providing the rest of the arguments

I think it is correct to say that all curried functions support partial application, but not all partial application implementations require currying.
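F# functions defined with space-separated parameters are curried, so partial application comes for free. Tupled parameters behave like the uncurried g above. There is no built-in partial application for them, so a lambda has to supply the remaining arguments. Here is a small sketch using hypothetical int-only versions of g and g':

```fsharp
// Curried: g' : int -> int -> int -> int -> int
let g' a b c d = a + 10 * b + 100 * c + 1000 * d

// Uncurried (tupled): g : int * int * int * int -> int
let g (a, b, c, d) = a + 10 * b + 100 * c + 1000 * d

// Partial application of the curried version: supply a and b only.
let partialApplyG' = g' 1 2   // int -> int -> int

// For the tupled version we write a lambda taking the remaining arguments.
let partialApplyG = fun (c, d) -> g (1, 2, c, d)   // int * int -> int
```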

Also left as a comment to this post, modified slightly here↩
See Reading type annotations if this style of writing out types is unfamiliar.↩

Reading type annotations

2016-02-09T22:30:00+11:00

C and C-style languages like C++, Java, and C# tend to have method types written like this:

returnType methodName(argType0 arg0, argType1 arg1);

Other typed languages and programming papers use a notation more like this:

methodName : (argType0, argType1) -> returnType

I found it took a bit of getting used to, but I now much prefer to read and write this style. I think it is worth becoming familiar with, as it is used in quite a few languages¹ and in all the programming papers I’ve seen. So here’s a quick guide on how to read this style of type annotation.

Structure

From the methodName example above, we can see the structure has changed from “return type - name - arguments” to “name - arguments - return type”. So the main change is moving the return type from the beginning to the end.

A : separates the name from the type signature. : can be read as “has type”. Haskell unfortunately uses :: for this, instead of the : character which seems to be used pretty much everywhere else.

A -> arrow separates function input from function output. So a -> b reads as “I take values of type a and produce values of type b”.

Arguments are shown as a tuple of one or more types. In some languages (like ML, OCaml, and F#) tuple types are denoted by types separated by * characters, so the signature would look like methodName : argType0 * argType1 -> returnType.

Generics

There are a few different ways of representing generic parameters. Let’s take a function that, given a single element of some type, returns a singleton list of that type.

// C#
List<T> Singleton<T>(T value);

// Haskell
singleton :: t -> List t

// F#
singleton : 't -> List<'t>
// or F# can use postfix syntax where the type variable
// is followed by the type constructor
singleton : 't -> 't list

In Haskell, any type starting with a lowercase character is a type variable rather than a concrete type. In F# type parameters begin with a quote character '. Not requiring an additional step to list generic parameters is handy.

Higher order functions

Where this notation starts to show some advantages is with higher order functions. For example, say we want a generic map function:

// C#-style
List<A> Select<T, A>(Func<T, A> f, List<T> list);

// Haskell-style
map :: (t -> a) -> List t -> List a

// or a more exact, less idiomatic translation:
map :: ((t -> a), List t) -> List a

These functions take a function that translates Ts to As, and a list of Ts, to produce a list of As. The parentheses around the (t -> a) in the Haskell-style signature show that this is a single argument (that happens to itself be another function). This is a bit cleaner than the equivalent Func<T, A> in the C# version, particularly when the explicit type parameter declarations are taken into account. The difference becomes more noticeable as we increase the number of functions and type parameters:

// Example: function composition, (f ∘ g)(x) = f(g(x))

// Haskell style:
compose :: (b -> c) -> (a -> b) -> (a -> c)

// C#-style:
Func<A, C> Compose<A, B, C>(Func<B, C> f, Func<A, B> g);
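For comparison, here is a sketch of compose in F#, with the signature annotated in the name : type style. The annotations are optional, since F# can infer them. addOne and timesTwo are just example functions. FSharp.Core also has a << operator that composes in the same right-to-left order as ∘.

```fsharp
// compose : ('b -> 'c) -> ('a -> 'b) -> ('a -> 'c)
let compose (f : 'b -> 'c) (g : 'a -> 'b) : 'a -> 'c =
    fun x -> f (g x)

let addOne (x : int) = x + 1
let timesTwo (x : int) = x * 2

// (addOne ∘ timesTwo)(5) = addOne(timesTwo(5)) = 11
let viaCompose = compose addOne timesTwo 5

// The built-in << operator gives the same result.
let viaOperator = (addOne << timesTwo) 5
```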

Curried functions

In the map example above a “more exact, less idiomatic translation” was shown:

map1 :: (t -> a) -> List t -> List a
map2 :: ((t -> a), List t) -> List a

map1 takes a function (t -> a) and returns a function List t -> List a. It would also be correct to write it as map1 :: (t -> a) -> (List t -> List a). In contrast, map2 takes a single argument that happens to be a tuple of ((t -> a), List t). If we are supplying both arguments at once there is not much difference, but the map1 version also lets us supply just the (t -> a) argument to create a new function.

> map1 (+1) [1..3]
[2,3,4]
> map2 ((+1), [1..3])
[2,3,4]

> let addOne = map1 (+1)
addOne :: [Int] -> [Int]
> addOne [1..3]
[2,3,4]

Being able to supply less than the full contingent of arguments to a function, and get back a function that takes the remainder of the arguments, is called partial application.

The map1 form of signature, where a function is built out of functions that each take a single argument, is called a curried function (map2 is “uncurried”). We get partial application, the ability to provide one argument at a time, for free with curried functions.
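The same distinction can be sketched in F#. Here map1 just forwards to List.map, which is curried. map2 is a hypothetical tupled wrapper standing in for the uncurried form.

```fsharp
// Curried, like map1: ('t -> 'a) -> 't list -> 'a list
let map1 (f : 't -> 'a) (xs : 't list) : 'a list = List.map f xs

// Uncurried, like map2: ('t -> 'a) * 't list -> 'a list
let map2 ((f : 't -> 'a), (xs : 't list)) : 'a list = List.map f xs

let viaMap1 = map1 (fun x -> x + 1) [1..3]     // [2; 3; 4]
let viaMap2 = map2 ((fun x -> x + 1), [1..3])  // [2; 3; 4]

// Only the curried form can be partially applied directly.
let addOne = map1 (fun x -> x + 1)             // int list -> int list
```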

Curried function signatures in C# get unpleasant fairly quickly:

// Haskell-style
curriedEg :: a -> b -> c -> d -> e
uncurriedEg :: (a, b, c, d) -> e

// C#
Func<B, Func<C, Func<D, E>>> CurriedEg(A a);
E UncurriedEg(A a, B b, C c, D d);

// cluck cluck, bgark!

Unit values

Some methods take no input and return a value (either a constant, or due to some side-effect). The “no input” value is normally represented by empty parentheses (), and is called “unit” (because there is only a single legal value of this type, ()).

DateTime GetDate();

getDate : () -> DateTime

void Save(Widget w);
save : Widget -> ()

This starts to look a bit funny when methods take other calls with no input and no direct output:

void Subscribe(Action callback);
subscribe : (() -> ()) -> ()

It does give some immediate clues as to where side-effects are in a type signature, though.
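In F#, the unit type is written unit and its only value is (). Here is a sketch of the three signatures above. Widget, save, subscribe and notifyAll are hypothetical, and the side effects just record things in ResizeArray values so they can be observed.

```fsharp
open System

type Widget = { Name : string }

// getDate : unit -> DateTime
let getDate () : DateTime = DateTime.Now

// save : Widget -> unit (side effect: remember the widget)
let saved = ResizeArray<Widget>()
let save (w : Widget) : unit = saved.Add w

// subscribe : (unit -> unit) -> unit
let callbacks = ResizeArray<(unit -> unit)>()
let subscribe (callback : unit -> unit) : unit = callbacks.Add callback

// Run every subscribed callback; also unit -> unit.
let notifyAll () : unit =
    for callback in callbacks do
        callback ()
```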

Types inside implementations

We’ve looked at different forms of type signatures, but this style also tends to work its way into method definitions, again using the form name : type.

// C#
List<T> Singleton<T>(T t) {
    return new List<T> { t };
}

// Haskell
singleton :: t -> [t]
singleton t = [t]

// F#
let singleton (t : 'T) : List<'T> = [t]

// Swift
func singleton<T>(t : T) -> [T] {
    return [t]
}

Haskell tends to split the type signature from the definition. F# specifies the arguments as argName : argType, and then gives the type of the resulting value (in this case List<'T>). Generic type parameters are indicated with a ' prefix. Swift uses a similar style, but an arrow is used for the return type. Swift needs explicit declaration of generic type parameters.

In both the Haskell and F# cases the type information can actually be omitted – the type system will infer the correct type signature.
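For example, the F# singleton can drop its annotations entirely. The compiler then infers the generic type 'a -> 'a list, using its own name for the type variable:

```fsharp
// No type annotations: F# infers singleton : 'a -> 'a list
let singleton t = [t]

// The inferred function is generic, so it works for any element type.
let ints = singleton 1          // int list
let strings = singleton "abc"   // string list
```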

Conclusion

This has been a quick tour through the things that first tripped me up when reading type signatures from non-C-style languages.

The main habit I needed to break was expecting to see a type then a name. Instead, names are first, then their type shown. So method types change like this:

returnType blah()
// becomes something like:
blah : () -> returnType

Similarly arguments go from ArgType name to name : ArgType.

void Load(int id)
// becomes something like:
load(id : int) : ()

Hope this helps!

Such as Haskell, F#, Swift, Scala, OCaml, ML, Idris, Elm, PureScript, and TypeScript.↩
Note that void in C-style languages is different to the terms “unit” and “Void” in non-C-style languages. In C-style languages void means “has no return value”, where a return type of () means “returns the value ()”. In contrast, the type Void is one with no legal instance. We can never return a value of type Void, so my understanding is a function a -> Void can never return.↩

The Apply pattern

2015-07-27T21:45:00+10:00

I really enjoy trying to understand how and why these things work, but for this post I’m going to try to skip all that wonderful stuff and instead give a practical outline of how to use a very useful pattern arising from applicative functors.

I’ve found this pattern incredibly useful in F#, Swift and Haskell. The examples here are in F#, but as far as I can tell we can use it anywhere that has generic types and higher-order functions.

Aim

Say we have some generic type, let’s call it Widget (we’ll use the term “widget” as a placeholder for a generic type we are working with - feel free to substitute in Option, Either, Future, List etc.). There are lots of useful functions that work with non-widget types, and we would like them to work with Widget values without having to re-write them.

// Some useful, non-widget functions:
(+) : Int -> Int -> Int
(::) : 'a -> ['a] -> ['a]
createThingoe : Pop -> Blah -> Zap -> Thingoe

// Widget compatible versions:
widgetPlus : Widget<Int> -> Widget<Int> -> Widget<Int>
widgetCons : Widget<'a>  -> Widget<['a]> -> Widget<['a]>
widgetThingoe : Widget<Pop>  -> Widget<Blah>  -> Widget<Zap> -> Widget<Thingoe>

Prerequisites

We can achieve this aim if the generic type has a map (or Select in C# terminology) and an apply function. Continuing our Widget example:

module Widget =
    let map   :    ('a->'b)    -> Widget<'a> -> Widget<'b> = ...
    let apply : Widget<'a->'b> -> Widget<'a> -> Widget<'b> = ...

If the type does not have these functions provided we may still be able to write them. We’ll look at this later.

Apply pattern

We can use any non-widget function with widget values using map for the first argument, and apply for subsequent arguments.

let (<^>) = Widget.map
let (<*>) = Widget.apply

// Use non-widget function with non-widgets:
let normalResult =
  nonWidgetFn firstArg secondArg thirdArg ... finalArg

// Use non-widget function with widgets. It's just like non-widget function
// application, only with more punctuation. :)
let widgetResult =
  nonWidgetFn <^> firstWidget <*> secondWidget <*> thirdWidget <*> ... <*> finalWidget

// Convert any 2 argument function, (a -> b -> c) -> (Widget a -> Widget b -> Widget c)
let lift2 f a b     = f <^> a <*> b

// Convert any 3 argument function:
let lift3 f a b c   = f <^> a <*> b <*> c

// Convert any 4 argument function:
let lift4 f a b c d = f <^> a <*> b <*> c <*> d

// Widget-compatible plus and cons:
let widgetPlus a b = (+) <^> a <*> b
let widgetCons a b =
    let cons a b = a :: b
    cons <^> a <*> b
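To make the pattern concrete, here is a sketch that uses the built-in F# list in place of Widget. FSharp.Core provides List.map but not a list apply, so listApply is an assumption. It applies every function to every value, which is one common choice of apply for lists.

```fsharp
// map for lists already exists in FSharp.Core.
let (<^>) f xs = List.map f xs

// Hypothetical apply: each function in fs applied to each value in xs.
let listApply (fs : ('a -> 'b) list) (xs : 'a list) : 'b list =
    [ for f in fs do
          for x in xs do
              yield f x ]

let (<*>) fs xs = listApply fs xs

// lift2 from above, now working on lists.
let lift2 f a b = f <^> a <*> b

let sums = lift2 (+) [1; 2] [10; 20]   // [11; 21; 12; 22]
```

With this apply, lift2 (+) combines every element of the first list with every element of the second.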

Example

Say we are using a library with a Result<'Error, 'T> type that represents operations that can fail with a value of type 'Error, or succeed with a value of type 'T. The library also supplies map and apply functions for this type. We want to use this type to try to parse a Person value from a UI form with name, email and age text fields:

let nonEmpty   (s : string) : Result<AppError, string> = ...
let validEmail (s : string) : Result<AppError, string> = ...
let parseInt   (s : string) : Result<AppError, int> = ...

type Person = { name : string; email : string; age : int }
    with
    static member create a b c = { name=a; email=b; age=c }

// We want to use Person.create, which takes strings and ints, but we need to try to
// parse values from text fields, which will give us Result<AppError, string>
// and Result<AppError, int> values.
let (<^>) = Result.map
let (<*>) = Result.apply

Person.create <^> nonEmpty (name.text) <*> validEmail (email.text) <*> parseInt (age.text)
    |> printfn "%A"
(*
When all fields are valid:
> Success {name = "Abc"; email = "abc@example.com"; age = 42;}

When name.text is empty:
> Failed UnexpectedEmptyString

When age.text is invalid:
> Failed (CouldNotParseInt "12jf")
*)
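Here is a self-contained sketch of the Person example that can be run in F# Interactive. The post doesn't name the library, so FormResult (with Success and Failed cases), its map and apply, AppError, and the three validators are all hypothetical stand-ins. The validators take plain strings rather than text fields. FSharp.Core's own Result type uses Ok and Error cases, so a separate type is used here to match the output shown above. This apply stops at the first failure it sees.

```fsharp
open System

type AppError =
    | UnexpectedEmptyString
    | InvalidEmail of string
    | CouldNotParseInt of string

// Hypothetical stand-in for the library's Result<'Error, 'T>.
type FormResult<'Error, 'T> =
    | Success of 'T
    | Failed of 'Error

module FormResult =
    let map (f : 'a -> 'b) (r : FormResult<'e, 'a>) : FormResult<'e, 'b> =
        match r with
        | Success a -> Success (f a)
        | Failed e -> Failed e

    // Fail-fast apply: returns the first failure it finds.
    let apply (rf : FormResult<'e, 'a -> 'b>) (ra : FormResult<'e, 'a>) : FormResult<'e, 'b> =
        match rf, ra with
        | Success f, Success a -> Success (f a)
        | Failed e, _ -> Failed e
        | _, Failed e -> Failed e

// Hypothetical validators, taking plain strings instead of text fields.
let nonEmpty (s : string) : FormResult<AppError, string> =
    if String.IsNullOrWhiteSpace s then Failed UnexpectedEmptyString else Success s

let validEmail (s : string) : FormResult<AppError, string> =
    if s.Contains("@") then Success s else Failed (InvalidEmail s)

let parseInt (s : string) : FormResult<AppError, int> =
    match Int32.TryParse(s) with
    | true, n -> Success n
    | false, _ -> Failed (CouldNotParseInt s)

type Person =
    { name : string; email : string; age : int }
    static member create a b c = { name = a; email = b; age = c }

let (<^>) f r = FormResult.map f r
let (<*>) rf ra = FormResult.apply rf ra

let tryCreatePerson (name : string) (email : string) (age : string) =
    Person.create <^> nonEmpty name <*> validEmail email <*> parseInt age

// Mixing in an already-valid value by wrapping it with Success.
let withKnownEmail (name : string) (age : string) =
    Person.create <^> nonEmpty name <*> Success "abc@example.com" <*> parseInt age
```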

When a generic type does not meet the prerequisites

Sometimes a type will not have an apply function provided, but will have map, and also a flatMap/bind function provided with the following type:

// Also called "bind"
let flatMap : ('a-> Widget<'b>) -> Widget<'a> -> Widget<'b> = ...

This is the case with the F# Option module, which provides map and bind with the required signatures. In these cases we can implement apply in terms of these other functions:

module Option =
    let apply ff a = Option.bind (fun f -> Option.map f a) ff

// General case:
module SomeOtherType =
    let apply ff a = SomeOtherType.bind (fun f -> SomeOtherType.map f a) ff

We can now use the pattern with optionals (and any type with map and flatMap/bind):

let (<^>) = Option.map
let (<*>) = Option.apply

let result : Option<int> = (+) <^> tryParseInt (first.text) <*> tryParseInt (second.text)
//> val result : Option<int> = Some 42
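Here is a self-contained version of the Option example. tryParseInt isn't defined in the post, so this Int32.TryParse-based version is an assumption. addTexts is a hypothetical wrapper that takes plain strings in place of the text fields.

```fsharp
open System

module Option =
    // apply built from bind and map, as above.
    let apply (ff : ('a -> 'b) option) (a : 'a option) : 'b option =
        Option.bind (fun f -> Option.map f a) ff

// Hypothetical helper: Some n if s parses as an int, else None.
let tryParseInt (s : string) : int option =
    match Int32.TryParse(s) with
    | true, n -> Some n
    | false, _ -> None

let (<^>) f a = Option.map f a
let (<*>) ff a = Option.apply ff a

let addTexts (first : string) (second : string) : int option =
    (+) <^> tryParseInt first <*> tryParseInt second
```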

In cases where we have a mix of arguments, some using our generic type and others not, we can still apply¹ the pattern by converting the values to our generic type. For our Person.create example, we could already have the person’s email as a valid string value from earlier in the sign-up process:

let email : string = "abc@example.com"
Person.create <^> nonEmpty (name.text) <*> Success email <*> parseInt (age.text)
    |> ...

Here we convert email from a string to a Result value first using the Success constructor. Then we have our three Result values to use with the apply pattern.

Summary

This pattern is useful for being able to reuse all our existing functions in the context of another type, like Future, Option, Result and lots, lots more. To do this for some generic type Widget we need:

let map : ('a -> 'b) -> Widget<'a> -> Widget<'b>
let apply : Widget<'a -> 'b> -> Widget<'a> -> Widget<'b>

// Alternatively, can also use bind/flatMap to get an apply function
let flatMap : ('a -> Widget<'b>) -> Widget<'a> -> Widget<'b>

We then apply the non-widget function to the first argument using map, and use apply for subsequent applications.

let (<^>) = Widget.map
let (<*>) = Widget.apply
let result : Widget<A> =
    nonWidgetFn <^> firstWidgetArg <*> secondWidgetArg <*> ... <*> lastWidgetArg

Calls look similar to regular function application, with the additional operators taking care of conversion into our Widget context.

We can mix widget and non-widget arguments by converting non-widgets:

let result : Widget<A> =
    nonWidgetFn <^> firstWidgetArg <*> toWidget secondArg <*> ... <*> lastWidgetArg

I wrote a bit more about how this works a while back, or search around for “applicative functor” if you are interested in the theory behind the practice. We can effectively use this pattern without delving into the details though - so we can apply now and ask questions later. :)

Sorry.↩

Git tidbit: Comparing different paths across branches or commits

2015-04-22T11:25:00+10:00

Today I updated a library version in a project, which changed the path from packages/FSharp.Formatting.CommandTool.2.8.0 to packages/FSharp.Formatting.CommandTool.2.9.1. We’d also taken our own copies of some templates in the package, and I wanted to check if there were any differences between the 2.8.0 and 2.9.1 templates folders that I should port across.

Rather than my usual fumbling about (check out both, copy, diff) I thought I’d try to learn the necessary Git incantation to compare the paths. And then blog it, so that when I forget I’ll have a quick reference handy for next time. :)

I ended up using git diff with the COMMIT:PATH format, using HEAD and HEAD~1 as the commit references (shown split over multiple lines):

git diff --ignore-space-change \
    HEAD:source/packages/FSharp.Formatting.CommandTool.2.9.1/templates \
    HEAD~1:source/packages/FSharp.Formatting.CommandTool.2.8.0/templates/

To get a summary of the file changes instead (in this case, to confirm nothing changed), use the --stat option:

% git diff --stat --ignore-space-change HEAD:source/packages/FSharp.Formatting.CommandTool.2.9.1/templates HEAD~1:source/packages/FSharp.Formatting.CommandTool.2.8.0/templates
 docpage.cshtml                | 0
 reference/module.cshtml       | 0
 reference/namespaces.cshtml   | 0
 reference/part-members.cshtml | 0
 reference/part-nested.cshtml  | 0
 reference/type.cshtml         | 0
 template.cshtml               | 0
 7 files changed, 0 insertions(+), 0 deletions(-)

I was pretty impressed that Git’s Bash prompt on Windows gave me autocompletion on the HEAD~1:/...2.8.0/ path despite the path no longer being in the working directory.

F# type signature gotchas

2015-01-22T23:30:00+11:00

Today I was speaking with a colleague about some F#, and he pointed out a few gotchas with F# type signatures, especially if you’ve spent some time with Haskell (and not OCaml or other ML-ish languages).

Aside: This post just runs through some gotchas, but if you would like more general information on reading -> style function signatures please let me know.

The example we were looking at is Seq.unfold, whose signature looks like this:

Seq.unfold : ('State -> ('T * 'State) option) -> 'State -> seq<'T>

Apostrophes for type parameters

Any type prefixed with a ' character represents a type parameter (or generic type in C# parlance). For unfold this means 'State and 'T can be any type. We can also write this in potentially more familiar .NET syntax:

Seq.unfold<'State, 'T> : ('State -> ('T * 'State) option) -> 'State -> seq<'T>

A lot of the F# code I see follows a more Haskellish (?) convention of using lowercase type variable names, more like:

Seq.unfold<'s, 't> : ('s -> ('t * 's) option) -> 's -> seq<'t>

Asterisk for tuples

Types separated by a * are tupled (or product types, which explains the * symbol). For example, (1, "abc", Foo()) is of type int * string * Foo.

So in unfold, 'T * 'State represents a tuple of 'T and 'State.

Postfix generic syntax

F# supports both .NET-style prefix generic syntax and ML-style postfix syntax. So instead of writing int option, we can also write Option<int> (both forms are equivalent). This means we can rewrite unfold as:

Seq.unfold<'s, 't> : ('s -> Option<'t * 's>) -> 's -> seq<'t>
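A quick way to check these equivalences is to annotate values both ways. This small sketch compiles because int option and Option<int> name the same type, and a * tuple type matches a comma-separated tuple value. The value names are just for illustration.

```fsharp
// Postfix (ML-style) and prefix (.NET-style) spellings of the same type,
// so no conversion is needed between them.
let postfix : int option = Some 1
let prefix : Option<int> = postfix

// Tuple values use commas; their types use *.
let triple : int * string * bool = (1, "abc", true)

// unfold's generator type written both ways, for 's = int and 't = string.
let step1 : int -> (string * int) option = fun n -> Some (string n, n + 1)
let step2 : int -> Option<string * int> = step1
```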

Using `unfold`

With those things in mind, let’s use the unfold signature to work out what it does.

unfold :
  ('s -> Option<'t * 's>) -- A function that takes an 's and gives an optional tuple of 't and 's.
  -> 's                   -- A value of type 's
  -> seq<'t>              -- A sequence of 't values

Given a function that takes an 's and returns either a tuple of the next element and the next 's, or nothing, plus a starting 's, unfold will generate a sequence of 't values until the generator function returns None (so the sequence is potentially infinite).

We could use this to generate a sequence of all the days since a starting date (infinite, at least until DateTime hits DateTime.MaxValue):

open System

let daysAfterThisPost =
    DateTime(2015, 1, 22)
    |> Seq.unfold (fun d -> let d' = d.AddDays(1.0) in Some (d', d'))
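The days example never stops by itself. For a sequence that does end, the generator returns None. Because seq is lazy, an infinite sequence can still be consumed with something like Seq.take. Here is a small sketch (countdown and naturals are illustrative names):

```fsharp
// Emit start, start-1, ..., 1 and stop once the state reaches 0.
let countdown (start : int) : seq<int> =
    start |> Seq.unfold (fun n -> if n > 0 then Some (n, n - 1) else None)

// Infinite: 0, 1, 2, ... (the generator never returns None).
let naturals : seq<int> = Seq.unfold (fun n -> Some (n, n + 1)) 0

// Only the first five elements are ever generated.
let firstFive = naturals |> Seq.take 5 |> List.ofSeq   // [0; 1; 2; 3; 4]
```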

Translating to other languages

Finally, if you’re more familiar with C# or Haskell, here are my attempted translations:

// F#
Seq.unfold : ('State -> ('T * 'State) option) -> 'State -> seq<'T>

-- Haskell
unfold :: (s -> Maybe (t,s)) -> s -> [t]

// C# (uncurried. seq = IEnumerable)
IEnumerable<T> Unfold<S,T>(Func<S, Option<Tuple<T,S>>> generator, S initial);

Haskell uses lowercase type names for generics (instead of ' characters), while concrete types have uppercase names. It also uses the same syntax for tuple types as values, so (1,2) :: (Int, Int). For some odd reason, Haskell uses :: for “type of” instead of a single :.

The C# version is a bit messier due to having to use Func instead of a shorthand for function types, and similarly for declaring tuple types. (I’ve also uncurried the C# version otherwise we end up with nested Func types everywhere, and it is the more typical form for C# functions.)

F#: Pattern matching on field literals

2015-01-14T08:30:00+11:00

F# gave me the following error when working with some C# code:

error FS0729: This field is not a literal and cannot be used in a pattern

I’m not entirely sure it’s a good idea, but I managed to work around this using a partial active pattern.

Here’s the gist of the C# code I was working with:

namespace Workshop.SomeCSharpLib {
    public class Thingamabobs {
        private Thingamabobs(string s) { }
        // ... 
        public static readonly Thingamabobs Foo = new Thingamabobs("foo");
        public static readonly Thingamabobs Bar = new Thingamabobs("bar");
        public static readonly Thingamabobs Clunk = new Thingamabobs("clunk");
        public static readonly Thingamabobs Zap = new Thingamabobs("zap");
    }
}

Thingamabobs represent a sort of enum with an associated value - the kind of thing we’d typically use a discriminated union for in F#.

Trying to convert this to my own type using pattern matching resulted in the FS0729 error:

type Things = Foo | Bar | Clunk | Zap

let convertThingamabob =
    function
    | Thingamabobs.Foo   -> Some Foo // *
    | Thingamabobs.Bar   -> Some Bar
    | Thingamabobs.Clunk -> Some Clunk
    | Thingamabobs.Zap   -> Some Zap
    | _                  -> None

    // * error FS0729: This field is not a literal and cannot be used in a pattern

I couldn’t find much information on this error, but I gather I need to explicitly compare the argument to the field value using if ... else if ... else, something like:

   fun x -> if x = Thingamabobs.Foo then Some Foo
            else if x = ...

I think it looks neater as a pattern match, so I worked around this using a partial active pattern to do the comparison:

// Partial active pattern. Match if field equals value.
let (|Field|_|) field x = if field = x then Some () else None

type Things = Foo | Bar | Clunk | Zap

let convertThingamabob =
    function
    | Field Thingamabobs.Foo   -> Some Foo
    | Field Thingamabobs.Bar   -> Some Bar
    | Field Thingamabobs.Clunk -> Some Clunk
    | Field Thingamabobs.Zap   -> Some Zap
    | _                        -> None

I’m not sure if there are any drawbacks to this approach, so if you can think of any please let me know.
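Here is a self-contained version to experiment with. Since the C# library isn't available, Thingamabobs is re-created in F# as a hypothetical stand-in, with a private constructor and static instances. F# classes use reference equality by default, so Field only matches the exact static instance, which is what we want here. For contrast, the sketch also includes a value marked with the [<Literal>] attribute. Because that value is a compile-time constant, it can be used directly in a pattern.

```fsharp
// Hypothetical F# stand-in for the C# Thingamabobs class:
// private constructor and a fixed set of static instances.
type Thingamabobs private (name : string) =
    member this.Name = name
    static member val Foo = Thingamabobs("foo")
    static member val Bar = Thingamabobs("bar")
    static member val Clunk = Thingamabobs("clunk")
    static member val Zap = Thingamabobs("zap")

type Things = Foo | Bar | Clunk | Zap

// Partial active pattern. Match if field equals value.
let (|Field|_|) field x = if field = x then Some () else None

let convertThingamabob =
    function
    | Field Thingamabobs.Foo   -> Some Foo
    | Field Thingamabobs.Bar   -> Some Bar
    | Field Thingamabobs.Clunk -> Some Clunk
    | Field Thingamabobs.Zap   -> Some Zap
    | _                        -> None

// For contrast: a [<Literal>] value can appear directly in a pattern.
[<Literal>]
let FooName = "foo"

let isFoo (s : string) =
    match s with
    | FooName -> true
    | _ -> false
```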

D3 newbie updates a bar chart

2014-09-01T00:01:00+10:00

I’ve been trying to learn D3.js via Mike Bostock’s excellent “Let’s make a bar chart” tutorial series. This post is my attempt to extend that example to handle data updates.

Starting point

Part 3 of the tutorial ends with a bar chart that shows the relative frequency of letters used in the English language.

The creation of each bar per datum is handled by this code:

chart.selectAll(".bar")
      .data(data)
  .enter().append("rect")
    .attr("class", "bar")
    .attr("x", function(d) { return x(d.name); })
    .attr("y", function(d) { return y(d.value); })
    .attr("height", function(d) { return height - y(d.value); })
    .attr("width", x.rangeBand());

This says we’re dealing with chart elements of the CSS class .bar for each datum. The .enter() call tells D3 we want to perform the operations that follow for any new data (data that has entered the source). We can also use .exit() for data that is no longer in the source. If we want to handle updated data we can set attributes directly on the selection (outside of enter() / exit()).

Adjusting the bars for new data

To specify updates I had to change the data join so D3 knows how to differentiate added, removed and updated data. In this case we will use the name property, which is a letter from A to Z.

var bar = chart.selectAll(".bar")
        .data(data, function(d) { return d.name; });

Next we’ll modify the code to specify how to handle updated and removed data, instead of just what to do on enter() for new data.

// new data:
bar.enter().append("rect")
   .attr("class", "bar")
   .attr("x", function(d) { return x(d.name); })
   .attr("y", function(d) { return y(d.value); })
   .attr("height", function(d) { return height - y(d.value); })
   .attr("width", x.rangeBand());
// removed data:
bar.exit().remove();
// updated data:
bar
   .attr("y", function(d) { return y(d.value); })
   .attr("height", function(d) { return height - y(d.value); });
   // "x" and "width" will already be set from when the data was
   // first added from "enter()".

Updating the axes

This was enough to update the chart, but each update drew a new y-axis over the top of the previous one, so both sets of values would show. This answer on StackOverflow suggested removing the axis and redrawing it each time, which worked well.

// Remove previous y-axis:
chart.select(".y.axis").remove(); // << this line added
// Existing code to draw y-axis:
chart.append("g")
      .attr("class", "y axis")
      .call(yAxis)
  .append("text")
    .attr("transform", "rotate(-90)")
    .attr("y", 6)
    .attr("dy", ".71em")
    .style("text-anchor", "end")
    .text("Frequency");

Basic transition

The next thing I wanted to try was animating changes to existing data. This turned out to be trivial thanks to D3’s transition() method, which I just inserted before the attribute updates in the code for updated bars.

bar
  .transition().duration(750)  // <<< added this
    .attr("y", function(d) { return y(d.value); })
    .attr("height", function(d) { return height - y(d.value); });
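Roughly speaking, a transition sets each attribute to intermediate values between its old and new values until the duration has elapsed. Here's a minimal sketch of that idea for a single bar's height, assuming linear timing (D3 applies an easing function by default, so the real curve isn't a straight line). attrAt is a hypothetical helper, not a D3 function:

```javascript
// Hypothetical helper: the value an attribute would have `elapsed` ms into a
// linear transition from `from` to `to` that lasts `duration` ms.
function attrAt(from, to, elapsed, duration) {
  const t = Math.min(Math.max(elapsed / duration, 0), 1); // progress clamped to [0, 1]
  return from + (to - from) * t;
}

// A bar's height moving from 100 to 200 over 750 ms
for (const ms of [0, 375, 750]) {
  console.log(ms, attrAt(100, 200, ms, 750));
}
// 0 100
// 375 150
// 750 200
```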

And that’s it!

End result

Here’s an example of the update in action. Use the radio buttons to alternate between the chart showing frequencies of letters in English and the frequencies of letters used in the source for this post.

A simple circuit, an Arduino, and Haskell

2014-08-12T23:45:00+10:00

I recently had loads of fun attending a Nodebots AU event in Sydney. (Thanks a lot to Damon and Andrew for organising, and NICTA for the venue!) I got to muck around with some simple circuits and drive them with JavaScript. Towards the end of the day I was running out of time and creativity to do anything fancy, so I decided to see if I could get one of the circuits working with Haskell.

Nodebot prerequisites

I got a Node ARDX kit at the event, and followed the Nodebots AU setup guide to get all the software bits and pieces.

For Haskelling, I used my existing Haskell installation, then created a new cabal sandbox and installed the hArduino package (v0.9) into it.

A simple circuit

Here’s a simple circuit that includes a potentiometer and a bunch of LEDs. The idea is that as someone turns up the potentiometer, the number of LEDs switched on increases accordingly. (Yes, this may seem somewhat unimpressive, but as a complete newbie who managed to do this without blowing anything up, I’m calling it a win! :))

DISCLAIMER: While I somehow managed to avoid blowing anything up while attempting this, I don’t know what I’m doing and so can’t guarantee that this won’t destroy anything you value. Use at your own risk! :)

Arduino Uno with potentiometer connected to A5, and six LEDs (with 330Ω resistors) connected to pins 2-7. Image created with the open-source Fritzing app.

Haskellbot

So now I’m in my cabal sandbox and it’s time to write some Haskell. Here’s the main outline of the program (with some explanatory comments added).

import Control.Applicative
import Control.Monad (when)
import Data.Foldable (for_)
import System.Hardware.Arduino

leds = digital <$> [2..7]             -- leds on digital pins 2 to 7
pot = analog 5                        -- potentiometer on A5 
setPin = flip setPinMode

main :: IO()
main =
  withArduino False "/dev/cu.usbmodemfa131" $ do
      for_ leds (setPin OUTPUT)       -- set each led pin as an output
      setPin ANALOG pot               -- set potentiometer's pin as analog
      run 0                           -- run with initial pot. value of 0
  where
    run cur = do
      new <- analogRead pot           -- read potentiometer's value
      when (new /= cur) $ updateLeds new  -- if it has changed from current,
                                          -- update the LEDs based on the new value
      delay 250                       -- wait for a bit
      run new                         -- continue main run loop

After the initialisation stuff, the main bit of the program is the run loop, which polls the potentiometer and updates the LEDs whenever the value changes.

The updateLeds and related code looks like this:

updateLeds :: Int -> Arduino ()
updateLeds potVal =
    for_ (zip leds [1..]) $
        \(led, ledNum) -> digitalWrite led (ledNum <= maxLedNum)
    where
        maxLedNum = numLedsOn potVal

numLedsOn :: Int -> Int
numLedsOn potVal = numLeds * potVal `div` maxPotVal
    where
        maxPotVal = 1023
        numLeds   = length leds

The updateLeds function takes the potentiometer value and uses numLedsOn to work out how many LEDs it needs to turn on. It then loops through each numbered LED, turning it on if its ledNum <= maxLedNum and off otherwise.¹

numLedsOn doesn’t need to be a separate function like this, but I found it helped to be able to test my arithmetic independently of hardware. :) (We could also get away without specifying any types, but I find doing so makes it easier for me to read.)
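In the same spirit of testing the arithmetic away from the hardware, here's the calculation transcribed to JavaScript. It's a sketch assuming the six LEDs and the maxPotVal of 1023 from the Haskell code; ledStates is a hypothetical name for the on/off decision inside updateLeds, and Math.floor plays the role of Haskell's div (both round down):

```javascript
// Transcription of the numLedsOn / updateLeds logic above
const NUM_LEDS = 6;       // LEDs on digital pins 2 to 7
const MAX_POT_VAL = 1023; // maxPotVal in the Haskell code

// numLedsOn potVal = numLeds * potVal `div` maxPotVal
function numLedsOn(potVal) {
  return Math.floor((NUM_LEDS * potVal) / MAX_POT_VAL);
}

// The decision inside updateLeds: LED number n (counting from 1) is on when n <= maxLedNum
function ledStates(potVal) {
  const maxLedNum = numLedsOn(potVal);
  return Array.from({ length: NUM_LEDS }, (_, i) => i + 1 <= maxLedNum);
}

console.log(numLedsOn(0), numLedsOn(511), numLedsOn(512), numLedsOn(1023)); // 0 2 3 6
console.log(ledStates(512)); // [ true, true, true, false, false, false ]
```

One thing this makes easy to spot: because of the integer division, numLedsOn(1022) is still 5, so all six LEDs only come on at a reading of exactly 1023.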

Running this… er… ‘masterpiece’

Rather than set up a build, I just ran cabal repl from my sandbox to get a GHCi session with the hArduino package accessible, then loaded and ran the code:

ghci> :load lights.hs
ghci> main

Now I could finally fulfill my life-long dream of adjusting LEDs using a twirly dial! Hooray! :)

¹ The updateLeds loop is a bit neater in applicative form, but assumes familiarity with the operators: for_ (zip leds [1..]) $ digitalWrite <$> fst <*> ((<= maxLedNum) . snd)

Reasoning with more than evaluation

2014-07-24T22:30:00+10:00

Exercise 1.41 of SICP asks us to work out what the following expression will evaluate to, given the definition of double:

(define (double f)
  (lambda (x) (f (f x))))

(((double (double double)) inc) 5)

Direct substitution

If this expression had side-effects, we’d need to understand the evaluation order and keep track of state changes with each evaluation step. Because this is a pure function, we can substitute the definitions of each sub-expression, and in any order we like. I found these substitutions quite tricky though, because with each evaluation of double I had to take into account its argument. I ended up with something like this:

Aim: evaluate (((double (double double)) inc) 5)

(double f)
= lambda (x) (f (f x))

(double double)
= lambda (x) (double (double x))

(double (double double))
= lambda (x) ( (double double) ((double double) x))
= lambda (x) ( (lambda (x') (double (double x')))
                 ((lambda (x') (double (double x'))) x) )
= lambda (x) ( (lambda (x') (double (double x')))
                 (double (double x)) )
= lambda (x) ( double (double (double (double x))) )

((double (double double)) inc)
= (double (double (double (double inc))))
= (double (double (double (lambda (x) (inc (inc x))))))
= (double (double (lambda (x) (inc (inc (inc (inc x)))))))
= (double (lambda (x) (inc (inc (inc (inc (inc (inc (inc (inc x))))))))))
= (lambda (x) ( ... 16 incs ... x))

(((double (double double)) inc) 5)
= (... 16 incs ... 5)
= 16+5
= 21
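As a sanity check on the final answer (not a replacement for the reasoning), here's a direct JavaScript transcription of double and inc. Applying each nested double to inc and then to 0 shows how many incs we end up with:

```javascript
// SICP's double: return a function that applies f twice
const double = f => x => f(f(x));
const inc = x => x + 1;

console.log(double(inc)(0));                 // 2
console.log(double(double)(inc)(0));         // 4
console.log(double(double(double))(inc)(0)); // 16
console.log(double(double(double))(inc)(5)); // 21
```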

I found this quite hard to follow. I had to use additional intermediary steps (not shown above) to evaluate expressions like (double (lambda (x) (inc (inc x)))).

Using other equalities

Instead of direct substitution we can transform the expression into equivalent terms we find easier to reason about, such as those with mathematical properties we can apply to simplify the expression. We still use substitution, but with other equalities instead of replacing a term’s name with its definition.

For this example, I found it easier to think of double in terms of function composition.

f . g    = \x -> f (g x)

double f = \x -> f (f x)
         = f . f

Function composition is associative (which we can convince ourselves of using equalities¹). I found this made it easier to reduce the nested double calls.

double double
  = double . double
double (double double)
  = (double . double) . (double . double)
  = double . double . double . double           -- by associativity of composition
(double (double double)) inc
  = (double . double . double . double) inc
  = (double . double . double) (double inc)     -- by defn of f . g
  = (double . double . double) (inc . inc)      -- by: double f = f . f
  = (double . double) (double (inc . inc))
  = (double . double) (inc . inc . inc . inc)
  = double (inc . inc . inc . inc . inc . inc . inc . inc)
  = ( ... 16 incs ...)
  = (16+)                                       -- \x -> inc (inc x) = (2+)

So (((double (double double)) inc) 5) = (16+) 5 = 21
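The composition view translates just as directly. A minimal sketch, assuming a two-argument compose helper (JavaScript has no built-in composition operator): defining double as compose(f, f) gives the same answer, and we can spot-check that the resulting function behaves like (16+):

```javascript
// f . g = \x -> f (g x)
const compose = (f, g) => x => f(g(x));
// double f = f . f
const double = f => compose(f, f);
const inc = x => x + 1;

const sixteenIncs = double(double(double))(inc);
console.log(sixteenIncs(5)); // 21

// Spot-check (not a proof) that it behaves like (16+) on a few inputs
console.log([0, 1, -3, 100].every(x => sixteenIncs(x) === x + 16)); // true
```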

Composition lets us deal with functions as values without having to substitute in for their arguments at each step. Because composition is associative we can ignore unimportant details, such as how the compositions are grouped in expressions like (double . double) . (double . double). I found each step made more sense to me, and I had more confidence in my answer.

I imagine different people will find different forms easier than others, but the point is we can choose whichever transformations we like to get the expression into an equivalent form we can work with more easily.

Conclusion

For a while now I’ve appreciated that pure functions mean we can more easily use substitution to understand code, but it wasn’t until this exercise that I’ve finally started to get a vague idea of what equational reasoning means. It is more than just substitution – it is being able to use all sorts of transformations and properties to understand our code. I don’t think this example really showcases this idea, but I did feel like it was my first glimpse into a different, powerful way of understanding code.

¹ My attempt at showing function composition is associative:

-- Associativity of composition
f . g = \x -> f (g x)

a . (b . c) = \x -> a ( (b.c) x)
            = \x -> a ( (\x' -> b (c x')) x)
            = \x -> a ( b (c x) )
            = \x -> (a . b) (c x)
            = \x -> ((a . b) . c) x
            = (a . b) . c


F# assertion libraries

2014-07-22T21:30:00+10:00

There are a few different libraries that provide test assertions for F#. I went through a couple today and tried a trivial example in each.

I’m using xUnit for these examples, but all of this should apply to NUnit (and other test runners) too. I’ve put all the code in a Sample.fs gist if you want to see it all in one place.

xUnit assertions

I’ve written about getting started with NUnit in F# before. We can also use xUnit and its built-in assertions.

open Xunit

[<Fact>]
let ``map (+1) over list using Xunit`` () =
    let result = List.map incr [1;2;3]
    Assert.Equal<int list>([2;3;4], result)

I needed to specify the int list type explicitly to get F# to resolve the correct overload.

Here’s an example of an assertion failure (when I change the expected value to [2;3;5]):

Position: First difference is at position 2
Expected: FSharpList { 2, 3, 5 }
Actual:   FSharpList { 2, 3, 4 }

FsUnit

FsUnit provides helpers for NUnit, xUnit, MbUnit, and MSTest assertions to make them play nicely with F# syntax and type inference. I installed the FsUnit.Xunit package.

open FsUnit.Xunit

[<Fact>]
let ``map (+1) over list using FsUnit.Xunit`` () =
    let result = List.map incr [1;2;3]
    result |> should equal [2;3;4]

Sample failure:

Position: First difference is at position 0
Expected: Equals [2; 3; 5]
Actual:   was [2; 3; 4]

(I’m not sure why the first difference is reported at position 0 here.)

Unquote

Unquote lets us use quoted expressions for assertions.

open Swensen.Unquote

[<Fact>]
let ``map (+1) over list using Unquote`` () =
    test <@ List.map incr [1;2;3] = [2;3;4] @>

If the test fails, Unquote shows each step in reducing the expression so you can see where they start to differ:

List.map Sample.incr [1; 2; 3] = [2; 3; 5]
[2; 3; 4] = [2; 3; 5]
false

This case only shows 3 steps, but more complex expressions will show more.

FsCheck

FsCheck is influenced by Haskell’s QuickCheck and Scala’s ScalaCheck. Rather than asserting a specific input and output, we define a property that should hold for all values of a type (optionally requiring they meet certain criteria, such as being a positive integer).

The FsCheck.Xunit package has specific support for xUnit through a PropertyAttribute that lets us run properties directly as an xUnit test (otherwise a little more boilerplate is required, see “Using FsCheck with other testing frameworks” in the Quick Start guide).

open FsCheck
open FsCheck.Xunit

[<Property>]
let ``map f . map g = map (f . g)`` (xs:int list) =
    let f x = x*10
    let g x = x+1
    (List.map f << List.map g) xs = List.map (f << g) xs

If we modify the property to ensure it fails (for example, (List.map f << List.map g) xs = List.map (f << g) (List.filter even xs)), we get this output:

FsCheck.Xunit.PropertyFailedException
Falsifiable, after 3 tests (2 shrinks) (StdGen (267259328,295888818)):
[1]

This shows that given an input of [1] the property does not hold.
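To get a feel for what a property-based tool does under the hood, here's a deliberately tiny property checker in JavaScript. It is not FsCheck's API or algorithm: forAll, randomIntList and the seeded mulberry32 generator are assumptions for illustration, and unlike FsCheck it does no shrinking (the step behind “(2 shrinks)” and the minimal [1] above):

```javascript
// mulberry32: a small seeded PRNG so runs are repeatable; returns floats in [0, 1)
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = seed;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// A random list of 0 to 9 integers, each in the range -50..50
function randomIntList(rand) {
  const length = Math.floor(rand() * 10);
  return Array.from({ length }, () => Math.floor(rand() * 101) - 50);
}

// Run `prop` on `runs` random lists; return the first failing input, or null
function forAll(prop, runs = 100, seed = 42) {
  const rand = mulberry32(seed);
  for (let i = 0; i < runs; i++) {
    const xs = randomIntList(rand);
    if (!prop(xs)) return xs;
  }
  return null;
}

const f = x => x * 10;
const g = x => x + 1;
const even = x => x % 2 === 0;
const sameList = (a, b) => a.length === b.length && a.every((x, i) => x === b[i]);

// map f . map g = map (f . g)
const fusion = xs => sameList(xs.map(g).map(f), xs.map(x => f(g(x))));
// Deliberately broken, like the filtered version above
const broken = xs => sameList(xs.map(g).map(f), xs.filter(even).map(x => f(g(x))));

console.log(forAll(fusion)); // null: no counterexample found
console.log(forAll(broken)); // the first random list that failed
```

The broken property fails exactly when the list contains an odd number, so whatever list forAll reports will contain one, but without shrinking it is usually longer than FsCheck's [1].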

Fuchu

Fuchu is more focussed on test organisation than assertions, and can be used with any of the assertion-providing libraries above. If you’d like to try something different to the usual (N|x|Mb)Unit approaches for defining test cases then give it a look.

Digitising hand drawn sketches

2014-05-21T22:30:00+10:00

Every so often I want to quickly sketch out what should be a simple diagram. Irrespective of what drawing program I use, I always seem to take much more time than I intend for a result that does not even remotely resemble what I want.

So I decided to give up and find a way to use hand-drawn sketches instead. Here’s the method I ended up with, based almost entirely on Marc Liberatore’s “Whiteboard Diagrams as PDFs” post and the wonderful ImageMagick and Potrace utilities.

My drawings still look fairly terrible, but at least they convey what I want them to and are quick to produce! :)

Ingredients

Paper / whiteboard
Marker
Phone with camera
ImageMagick: brew install imagemagick
Potrace. I used the binary distribution, although it’s also on Homebrew.

Method

Commit sin against art

Here’s a sample result of me unleashing my inner da Vinci on a poor, defenceless bit of paper.

I don’t know much about art, but I repeat myself

I’ve found a thick texta/marker works well, but standard ball-point pens can come out alright too.

Photo cameras: not just for tweeting lunch

Next, take a photo with ye olde phone camera. I avoid using the flash; even light (no shadows) is best. (Try holding the paper up vertically so your phone does not cast a shadow over the paper.)

My phone auto-uploads photos to an online thingoe from which I can quickly crop the image and download to my Mac.

Post-processing

Next up I want to convert the photo to a grayscale bitmap and turn up the brightness and contrast to wash out the background and bring out the marker lines. I’m using ImageMagick’s convert to quickly do this from the console.

We’ll then run the bitmap through potrace as described in Marc’s post to create a nice SVG. We can stop there, or use ImageMagick again to get a PNG out.

Here’s the original photo, which I’ve cropped and saved as sketch.jpg:

Taken with no flash. Cropped and downloaded with no processing.

Then the adjustments:

# Convert to grayscale BMP. Dial up brightness (20) and contrast (10)
% convert sketch.jpg -colorspace Gray -brightness-contrast 20x10 sketch.bmp

# Convert to SVG (-s), set a reasonable height, smooth speckles (-t 10)
% potrace -s -H 400pt -t 10 sketch.bmp

# Convert SVG to PNG (using 256 colours)
% convert sketch.svg PNG8:sketch.png

And here’s the output:

The end result

You may need to tweak settings like brightness, contrast, dimensions, PNG quality/size¹ and so on.

¹ It might also be worth running Pngcrush or a similar optimiser over the resulting PNG: brew install pngcrush; pngcrush sketch.png sketch2.png. A GUI option for Mac is ImageOptim.

Haskell without the Haskell Platform

2014-05-10T20:00:00+10:00

Apparently it can be a bit tricky to get some Haskell libraries working on Windows, in which case the Haskell Platform is a great way to get going with Haskell. For Mac and Linux the platform works too, but we can also just grab the latest GHC and Cabal (ooh, shiny!) and go from there.

UPDATE 2019-02: I tend to use ghcup on Mac and Linux these days. Am leaving these steps here as building GHC and Cabal is still a valid way of getting Haskell up and running.

This is how I got it working on my Mac, with loads of help from ddere and bitemyapp on the #haskell-beginners channel on Freenode IRC. It is reasonable to assume all mistakes in this write up are mine, while they deserve the credit for any useful bits.

I’ve got Xcode 5.1.1 installed, which I believe is a prerequisite (or at least the dev tools?). Other than that, grab a terminal and a browser, and we’re set to go.

tl;dr

UPDATE 2019-02: Check out ghcup for an easier way to get a platformless Haskell running on Mac or Linux. I’ve switched to using that to manage Haskell installations.

Here’s the summary if you want the steps without explanation:

Grab the binary GHC distribution, extract, configure --prefix=, make install, and add to PATH
Grab the Cabal binary and add it to the GHC bin directory
cabal update; cabal install cabal cabal-install alex happy
Add ~/.cabal/bin/ to PATH
Build projects in a sandbox (cabal sandbox init)
Build binaries in a sandbox and symlink or copy to ~/.cabal/bin; or install directly into ~/.cabal and rm -rf ~/.ghc if we ever get build conflicts. I’m doing the former.

The rest of the post will go through the specific commands used, and explain some of the decisions you might need to make.

Installing GHC

Grab the latest binary distribution of GHC and extract it somewhere (I used ~/dev/ghc-7.8.2)
Open a terminal and run ./configure --prefix= from the extracted directory. I used ./configure --prefix=/Users/dave/dev/ghc.
make install
Next I added the GHC binaries to my PATH. That’s ~/dev/ghc/bin for me.

We should now be able to run ghc, ghci and co. Success!

Bootstrap cabal-install binary

Grab the latest cabal-install binary.
Extract it and copy the cabal binary somewhere. I put mine alongside my GHC binaries in ~/dev/ghc/bin so it is on my PATH and I can quickly fall back to it if I nuke everything else but GHC.
Run cabal update to initialise the package database.

This will just be used to kick off our cabal-ing. Afterwards we’ll be managing cabal with cabal (for that nice recursive touch).

Final bits and pieces

We’re now going to build and install some final bits and pieces into Cabal’s user-db (stored in ~/.cabal/).

% cabal install cabal cabal-install alex happy

Next up I adjusted my PATH to make sure binaries are loaded from ~/.cabal/bin first¹. My PATH now looks like this:

export PATH=~/.cabal/bin:~/dev/ghc/bin:(non-haskell stuff)

New projects

We should now have everything we need to build Haskell projects. For projects we’ll run all our cabal install commands within a sandbox.

% mkdir myNewProj
% cd myNewProj
% cabal sandbox init
% cabal init
-- insert joyous haskelling here --

Moar binaries!

Sometimes we’d like to use cabal to install some binaries like hlint, hoogle or pointfree. I’ve heard a few schools of thought on this.

Sandboxed builds

Here is what I’ve found works reasonably well for me. I’ve created a directory ~/dev/hs/ to build these utilities in. From there:

~/dev/hs/ % mkdir hlint
~/dev/hs/ % cd hlint
~/dev/hs/hlint % cabal sandbox init
~/dev/hs/hlint % cabal install hlint
~/dev/hs/hlint % ln -s "$(pwd)/.cabal-sandbox/bin/hlint" ~/.cabal/bin/

This gives us a fresh hlint binary and creates a symbolic link to it in the ~/.cabal/bin directory (i.e. somewhere on my PATH). Sometimes I’ll copy instead of symlink.

The good thing about this is if I need to use specific versions of a particular dependent library for a build I can cabal install it without worrying about it affecting other builds outside the sandbox.

The catch is that some packages also rely on static assets that get put in $(pwd)/.cabal-sandbox/share, which means if we move or delete this sandbox that binary will stop working.

In user-db

The other approach is to cabal install the utility outside of a sandbox. This means all docs and static assets go into a safe location (~/.cabal), but on the downside we’ll sometimes get build failures due to library version conflicts.

In these cases we need to delete everything in ~/.ghc and try again. I have it on good authority from several sources that this is no problem. All our binaries in ~/.cabal should still work; it just means the next cabal install won’t rely on cached library builds.

Still, I feel more comfortable with the sandboxed build approach (almost definitely because I don’t fully understand what’s going on behind the scenes).

Pandoc example

At the time of writing I had some trouble building the wonderful Pandoc library due to a change in a dependent library. Pandoc also relies on static assets by default, which is the possible problem mentioned in the sandboxed builds section. Thankfully it provides a build option to embed these assets.

% cd ~/dev/hs
% mkdir pandoc
% cd pandoc
% cabal sandbox init
% cabal install exceptions-0.4
% cabal install hsb2hs
% cabal install pandoc -fembed_data_files
% cp "$(pwd)/.cabal-sandbox/bin/pandoc" ~/.cabal/bin/

Installing a specific version of exceptions-0.4 fixed the build problem, while passing the -fembed_data_files option to the Pandoc build embeds the static assets so we can move the binary and delete the sandbox without breaking Pandoc.

Thanks to Carter for telling me which version of exceptions I needed, and about -fembed_data_files for Pandoc.

Request for corrections

This seems to be working ok for me, but if you can see any problems with this approach or can suggest any improvements please let me know and I’ll update the post.

¹ We’ve now installed a version of the cabal binary into ~/.cabal/bin. By putting that first in our PATH we’ll always use the latest version for our builds. If we lose our ~/.cabal for some reason then we can fall back to the one we put into the ghc folder earlier.

Some regex help from the F# compiler

2014-04-19T21:45:00+10:00

tl;dr: Make invalid regular expression strings and attempts to access non-existent capture groups a compile-time error, thanks to the Regex type provider.

Standard .NET regex

Say we want to parse out some information from basic Liquid tags, like this:

[Test]
public void GetInformationFromAllSampleTags() {
    const string input =
        @"This is a test. {% sample %}ABC{% endsample %}. Some {% other %} 123 {% endother %} tag.
          {% sample %} DEF {% endsample %}";
    GetSamples(input).ShouldBe(new [] { "ABC", "DEF" });
}

We can give ourselves two problems and implement this using System.Text.RegularExpressions (it looks almost identical in C#, see this gist or footnote¹):

// F#
open System.Text.RegularExpressions

let getSamples s : string seq =
    let re = @"\{%\s*(?<tag>\w+)\s*\%\}(?<contents>(?s:.*?))\{%\s*end\1\s*%\}"
    Regex.Matches(s, re)
        |> Seq.cast<Match>
        |> Seq.filter (fun m -> m.Groups.["tag"].Value = "sample")
        |> Seq.map (fun m -> m.Groups.["contents"].Value.Trim())
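The named-group idea isn't specific to .NET. As a rough cross-check of what the pattern matches, here's a JavaScript sketch of the same getSamples. Not all JavaScript engines support the inline (?s:...) flag group the .NET pattern uses, so [\s\S]*? stands in for “any character including newlines, matched lazily”:

```javascript
// `tag` captures the tag name, `contents` everything up to the matching end tag.
// \1 refers back to the first group (the tag name), so {% sample %} pairs with {% endsample %}.
const liquidTag = /\{%\s*(?<tag>\w+)\s*%\}(?<contents>[\s\S]*?)\{%\s*end\1\s*%\}/g;

function getSamples(s) {
  return [...s.matchAll(liquidTag)]
    .filter(m => m.groups.tag === "sample")
    .map(m => m.groups.contents.trim());
}

const input = `This is a test. {% sample %}ABC{% endsample %}. Some {% other %} 123 {% endother %} tag.
  {% sample %} DEF {% endsample %}`;
console.log(getSamples(input)); // [ 'ABC', 'DEF' ]
```

Note that m.groups.tag is still a runtime string lookup here; misspelling a group name just gives undefined rather than an error.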

F# type provider version

First up we need to add the RegexProvider to our project via nuget: PM> Install-Package RegexProvider.

Now we can rewrite our previous implementation like this:

open FSharp.RegexProvider
type LiquidTagRegex = Regex< @"\{%\s*(?<tag>\w+)\s*\%\}(?<contents>(?s:.*?))\{%\s*end\1\s*%\}" >

let getSamples s : string seq =
    LiquidTagRegex().Matches(s)
        |> Seq.filter (fun m -> m.tag.Value = "sample")
        |> Seq.map (fun m -> m.contents.Value.Trim())

This will compile equivalently to our previous implementation², but we’ve gained some nice static checks.

We can access the tag and contents capture groups of our match as properties. This isn’t a method_missing-style dynamic lookup: if we rename the tag group in the regex, then we get a compile-time error:

error FS0039: The field, constructor or member 'tag' is not defined

Also neat, if we completely muck up our regex, the compiler will let us know:

error FS3033: The type provider ... reported an error: parsing "[asd" -
Unterminated [] set.

Tests would catch both these errors, but feedback doesn’t get much faster than “as we’re typing the code”, plus we get precise line numbers for the errors as well. It also reduces code noise, dealing directly with the capture group names rather than having to specify particular collection lookups.

¹ An equivalent implementation in C#:

// C#
public IEnumerable<string> GetSamples(string s) {
    var re = @"\{%\s*(?<tag>\w+)\s*\%\}(?<contents>(?s:.*?))\{%\s*end\1\s*%\}";
    return Regex.Matches(s, re)
                .Cast<Match>()
                .Where(m => m.Groups["tag"].Value == "sample")
                .Select(m => m.Groups["contents"].Value.Trim());
}


² The type provider creates a type with the tag and contents properties, but this type gets erased in the final compiled output, replaced with the Groups accessor code from our original implementation.
