# Getting Values Out of Monads

Monads as used in functional programming are ‘one way’. The monad type class provides a way to put values into the monad (with return) but no way to get values out.

If you want to get values out of a monad you have to write a function yourself. Such a function would have a type h :: m a -> a where m is a particular monad. It’d be nice if there were some guarantees about these functions should behave. We’re stepping outside the jurisdiction of the monad laws, so you don’t get any help there. So what are the relevant laws?

If you put a value into the monad and immediately extract a value, you want to get the same value you put in. In other words you want h (return a) = a, which is the same as

If you have two layers of monad m (m a) then there are two ways you could get a value out. You could lift h into the monad with liftM (or fmap) to get values out of the inner monad, and then hit the whole thing with h. Alternatively you could use the monadic join to collapse the two layers of the monad into one, and then apply h once. It would be nice if you get the same result either way, i.e. if h (join x) = h (fmap h x), or

Taken together, these two laws are exactly the laws of an algebra over the monad. Categorically, an algebra over a monad M is an object A together with a map h : MAA such that the following diagrams commute:


Note that the categorical definitions says that you have to specify an object as well as a morphism. In the category Hask objects are types. If we want to specify a type class for algebras, it must be a multi-parameter type class where you specify the underlying type as well as the monad - i.e. the algebra doesn’t have to be polymorphic over all types, it is allowed to vary depending on the type of values in the monad.

In terms of do notation, the algebra laws mean that the following program transformation is valid, where xs has type m (m a):

## Examples

Enough theory - what are some examples of algebras? For the list monad, both sum and product are list algebras for any numeric type. You can check that they satisfy the algebra laws, i.e.

and similarly for product.

Note that the function length isn’t a list algebra, since length (join [[1],[2,3]]) is 3, but length (fmap length [[1],[2,3]]) is 2.

The function head is nearly a list algebra, only failing because it isn’t defined for the empty list. In fact it’s an algebra over the type of non-empty lists.

Similarly maximum and minimum are algebras over the type of non-empty lists.

More generally, if a is an instance of the Monoid type class, then foldr mappend mempty is a list algebra. This is reflected in the definition of the fold function in Data.Foldable, which has type

showing that fold is an f-algebra for any f that is an instance of Foldable. This gives algebras over Maybe, Either a, Tree and many more.

Given any value a of type r then h m = runReader m a is a Reader r-algebra.

Similarly h m = evalState m a is a State s-algebra, and fst . runWriter is a Writer w-algebra.

For the monad of probability spaces, the expectation is an algebra. If P(PA) is a distribution of probability distributions, then it doesn’t matter if we take inner expectations before taking the outer expectation, or if we collapse the entire distribution first and takes its expectation. This is related to the law of iterated expectation. Note that higher moments aren’t algebras, and nor are central moments.

I had hoped that for the parser monad I could define an algebra by

However, this fails the algebra laws. In particular, if we have a parser of type Parser (Parser a), the outer parser might need to read some of the input string before the inner parser will match correctly, so we can’t just map over the inner parser. So I wonder if there are any algebras for the parser monad?

For the IO monad, the obvious candidate for an algebra is

My intuition is that this shouldn’t be an algebra, i.e. it should fail to respect one of the algebra laws. However, I haven’t been able to construct an example where it fails yet. Anyone have any ideas?

In this post I want to explain the relationship between trees and the reader monad. In particular, how trees generalize readers.

Consider the type of binary trees with values at the leaves

This is both a functor and a monad –

We want a way to get a value out of the tree. One way to do this is to pick a direction, either left or right, and then keep taking that branch until you reach a leaf.

There are many different trees t that all return the same value when you call either runTree t True or runTree t False. If two trees s and t return the same value under runTree for all values of the boolean argument, say that s is equivalent to t and write s ~ t.

This satisfies the axioms for an equivalence relation – reflexivity (every tree is equivalent to itself), symmetry (if s ~ t then t ~ s) and transitivity (if r ~ s and s ~ t then r ~ t). So it makes sense to consider the set of equivalence classes of binary trees under this relation.

What may be surprising is that the type of “equivalence classes of binary trees” is also a monad. To define the monad we have to define return and bind for it. It’s nontrivial to do this in Haskell, but relatively easy mathematically.

Let C (for class) be the operation that maps a tree to its equivalence class, and R (for representative) be the operation that selects a particular tree from an equivalence class. Then it must be true that C(R(x)) = x. It isn’t true in general R(C(t)) = t, but we will always get a tree that’s equivalent to the one we put in.

We define ‘return a’ for the monad to be the equivalence class C(Leaf a).

To define x ≫= f first choose a representative of x and bind it to the function Rf using the normal bind operation for trees. Then take the equivalence class of the resulting tree. Symbolically,

$$\newcommand{\bind}{\gg \!\!=} x\bind f = C( R(x) \bind R\circ f )$$

So we have a monad - but what monad is it? Interestingly, it is exactly equivalent to Reader Bool.

To see why, note that two trees whose far left branch and far right branch are the same are equivalent (because runTree either traverses all the way down the left of the tree, or all the way down the right). So specifying an equivalence class of trees just requires specifying a pair of values of type a, i.e. the type a2. This is equivalent to the type 2 → a, which is the same as functions Bool -> a, which is the same as Reader Bool a.

To generalize to arbitrary readers we should consider r-ary trees Tree r a rather than binary trees – that is, trees that have r branches at each node. To deal with fact that r might be infinite we work with functions r -> Tree r a, which are equivalent to (Tree r a)^r, an r-tuple of trees. In Haskell,

Again this is both a functor and a monad

and again we can write a function runTree that, given an r, takes the same branch of the tree repeatedly until it reaches a value:

Consider two trees equivalent if they give the same result when fed to runTree r, for all values of r. Then the equivalence class of trees is a monad in the same way as before, and this monad is exactly the Reader r monad. To make this really obvious we could rename the data type and constructors, and define the runReader and ask functions:

You can then then use this like any other reader:

The local function can be defined similarly, in terms of Ask.

## Thoughts

One reason for thinking of readers as trees rather than using the more familiar definition is that the monadic join operator on trees T maps TT structures of size n to T structures of size n (where by ‘size’ I mean the number of leaves in the tree). This isn’t true for monads in general – for example, two layers of Reader Bool a is isomorphic to a4, which gets mapped to a2 by monadic join. So join preserves the combinatorial interpretation of the tree.

This is related to some stuff I’m thinking about to do with combinatorial species and how they relate to functional programming. A comment by Brent Yorgey sparked the idea for this post.

It seems likely that given any monad T and a family of equivalence relations ~a on Ta, the set of equivalence classes under the relation ~ is always a monad, but I haven’t thought much about it.

I think this is all extremely similar to a post written a few years back by Dan Piponi where he worked throught he details for the state monad instead of reader.

Most languages have some form of printf function that lets you print formatted strings. In Python it looks like this

In C it looks like this

What they have in common is that they take a format string as their first argument, followed by a variable number of arguments, of varying type. The arguments are formatted, interpolated into the string and then printed to the screen.

The immediate problem you face if you want to write printf in Haskell is the type system. All functions have a type, and the type determines what arguments it can accept. You can write polymorphic functions that accept a range of input types, but writing a function that accepts a variable number of arguments? That seems tough.

I recently read through the source of the Text.Printf module, and was impressed by how concise and self-contained it is, and especially by the neat trick that allows printf to take a variable number of arguments. In this post I’ll walk through creating a simplified, cut-down version of Text.Printf to give a flavor of the module.

I’ll need the following uncontroversial language extensions:

First define a data type to hold things that are possible arguments to printf (in our cut-down module, we will only care about interpolating strings and integers into the format string).

With this in hand, we could write the universal printf function, uprintf. This takes a format string and a list of arguments to be interpolated into the string, and returns the formatted string. Its type signature is therefore:

All printf functionality will eventually be implemented in terms of this function. We’ll hold off on giving its implementation until later in the module, and focus on how we can write the polyvariadic, polymorphic printf function.

We get polymorphism using type classes. Define a type class for types that can be converted into a UPrintf, along with a couple of useful instances:

There are multiple valid return types for the printf function. For example, we might want to return a String, or an IO action that outputs to a file or to stdout, or more generally any instance of MonadIO. So introduce a type class for valid return types:

The spr function takes a format string and a list of suitable arguments to be interpolated into the string, and returns something. The key is in what the valid return types are. Certainly, it makes sense to return a string by calling uprintf (we reverse the argument list, for reasons that will become clear in a bit)

It also makes sense to return an I/O action that does some printing

Here’s the cool part. If a is an instance of PrintfArg and t is an instance of PrintfType, then a function a -> t is also a valid instance of PrintfType – you should picture it as consuming the a, and outputting another valid return type (which might be another function).

This is one way to implement polyvariadic functions in Haskell. Functions are curried by default so it’s easy to incrementally consume arguments, and return a function that consumes the rest of the arguments.

This is why we need to reverse the argument list when we make the final call to uprintf – the arg list acts like a stack, so that the last arguments pushed onto it are the first ones out. We need to reverse the stack at the end so that we format the arguments in the right order.

Now to write printf, just call spr with an empty argument list – the supplied arguments will be consed onto the list by spr until there are none remaining, and uprintf is called:

The rest is just details. The uprintf function incrementally consumes the format string, building the output up as it goes. If it exhausts either the format string or the argument string while the other still has elements remaining, then it throws an error. Otherwise it watches for the % signs that signal a value to be interpolated, and calls the correct formatting function:

The fmt function looks at what type of value is to be interpolated, and makes a call to the appropriate function for converting the argument into a string:

Our simple module only allows for strings and integers to be interpolated into the format string, and doesn’t allow for complicated format strings like "%8d" or "%6.2f". These features can be implemented by adding more cases to fmt – the fundamental architecture that allows for polymorphic, polyvariadic functions is in place.

# Swap Two Variables Without Using a Temp Variable (With Math!)

I wrote this as a comment on Hacker News before polishing it up to put it here – see original.

The question “How do you swap two variables without using a temporary variable?” is a favorite of unimaginative interviewers everywhere. If you haven’t seen it, the typical solution is three bitwise exclusive or (xor) operations in succession:

As you can imagine, this is extremely useful in situations where you just can’t spare even one byte of memory to hold a temporary variable.

This is known as the ‘xor trick’. But it’s not the only solution. Here’s another way:

That looks a bit different - instead of performing the same operation three times, we performed an addition followed by two subtractions.

Can we describe all possible solutions? Sure - you just need the concept of a mathematical group. A group is a set of values G together with a binary operation (traditionally written as a filled dot · ) that satisfy four axioms:

1. Closure – for all values a and b in G their multiplication a · b is also in G.

2. Associativity – for all a, b and c in G you have (a · b) · c = a · (b · c).

3. Identity – there is a value e in G such that for every a in G, the equation e · a = a · e = a holds.

4. Inverses – for every a in G there is an inverse a-1 such that the equations a-1 · a = e = a · a-1 hold.

If the types of a and b form a group, then we can pull off the ‘xor trick’ like this (using the left arrow for assignment)

\begin{align} a & \leftarrow a \cdot b \\ b & \leftarrow a \cdot b^{-1} \\ a & \leftarrow b^{-1} \cdot a \end{align}

The values of a and b as we perform each assignment can be shown in a table. Initially we’ve set a = c and b = d.

$$\begin{array}{l|c|c} & a & b \\ \hline \textrm{} & c & d \\ a \leftarrow a\cdot b & c\cdot d & d \\ b \leftarrow a\cdot b^{-1} & c\cdot d & (c\cdot d)\cdot d^{-1} = c \cdot (d\cdot d^{-1}) = c \\ a \leftarrow b^{-1} \cdot a & c^{-1}\cdot(c\cdot d) = (c^{-1}\cdot c)\cdot d = d & c \\ \end{array}$$

Note that it’s important that we write b-1 · a in the last assignment and not a · b-1 since you can’t in general assume that the group you’re working with is commutative.

Groups come up a lot in programming, so there are many ways to pull off the variable swapping trick:

It works when a and b are integers, since integers form a group where the operation is addition, the identity is 0 and the inverse of n is -n. Remarkably we don’t have to worry about integer overflow when adding two machine ints, since integers also form a group under addition modulo N.

It works when a and b have fixed-length binary representations, since bitstrings form a group where the operation is bitwise xor (the ^ operator), the identity is a string of 0s and every string is its own inverse (this last property is why when you do this variable swap trick with xor, you end up repeating the same operation three times).

It almost works when a and b are nonzero floats – it’s nearly (but not quite) true that nonzero floats form a group where the operation is multiplication, the identity is 1 and the inverse of x is 1/x. It’s not quite a group, because of issues like over- and underflow, NaN values and non-associativity of floating point multiplication.

It works when a and b are matrices of fixed dimensions, which form a group where the operation is addition, the identity is the zero matrix and the inverse is given by elementwise negation.

It almost works when a and b are square, nonsingular matrices of the same dimension, which form a group where the operation is matrix multiplication, the identity is the identity matrix and inverses are given by the matrix inverse (again, this only works modulo floating point weirdness).

It works when a and b are permutations, which are a group under composition, with identity given by the null permutation and inverses are ‘undo all the swaps’.

So there you go - a multitude of ways to swap two variables without using a temporary variable. With math!

Edit: It was pointed out to me by a commenter on Reddit that you don’t need the full properties of a group. A quasigroup is a set Q with three binary operations: multiplication *, left division \ and right division / which satisfy the following axioms:

\begin{align} x \backslash (x * y) & = y \\ x * (x \backslash y) & = y \\ (x * y) / y & = x \\ (x / y) * y & = x \end{align}

Every group is automatically a quasigroup if we define a * b = a · b, a / b = a · b-1 and a \ b = a-1 · b. So quasigroups are a generalization of groups. If we have a quasigroup then the xor trick can be expressed as

\begin{align} a & \leftarrow a * b \\ b & \leftarrow a / b \\ a & \leftarrow b \backslash a \end{align}

You can follow through the assignments using the quasigroup axioms where appropriate to prove that this really doe swap the two variables.

It’s exciting that every Latin square defines a quasigroup, and complete solutions to Sudoku are examples of Latin squares. Therefore every problem of the type “swap two variables without using a temp variable” can be solved using a sufficiently large Sudoku.

# The Algebra of Algebraic Data Types, Part 3

I gave a talk on this subject at London Haskell in November 2012. A video of the talk is on YouTube and slides are on GitHub. This is Part 3 in a series; you can also read Part 1 and Part 2.

Last time I described recursive types, showed how they can be written as algebraic equations, and you you can solve the equations to deduce interesting facts about types. In this post I’ll explain what it means to do calculus with types.

## Zippers

The regular Haskell list is a linked list. While they are simple to define and convenient to work with, they have disadvantages. If you want to append an element to the list of length n, it takes O(n) time. Accessing or modifying elements partway through the list is also slow. Linked lists don’t support efficient random access.

You could solve this with an array, but immutable arrays have their own problems. Data sharing is harder, for example, if you want the elements of the array to be contiguous in memory.

Fortunately, not all applications require random access. Sometimes you just want to keep track of your location within a list, with the ability to move around in the list, possibly modifying elements as you go.

This is exactly the functionality offered by a zipper, which is a data structure with a pointer to a distinguished location, called the focus.

### List Zippers

You can write a zipper for any data structure, but I’ll focus on lists.

To store the pointer we could wrap the list up with an Int that stores the location currently in focus. That’s fraught with all kind of difficulties, though – and it doesn’t meet our requirements for efficient access to the element in focus.

Instead, we define a new type composed of the element in focus, coupled with a pair of lists - one of which contains the elements before the focus, and the other the elements after the focus. Schematically it looks like this:

The element in focus is in the center, and the two lists on either side. In Haskell we would write:

data Zipper a = Z [a] a [a]


For a zipper to be useful we need to define functions that move the focus left and right. For efficiency reasons we store the first list in reverse order – we need to efficiently add and remove elements from the end of the list, rather than the head:

left  (Z (l:ls) c rs) = Z ls l (c:rs)
right (Z ls c (r:rs)) = Z (c:ls) r rs


If you want to read more about zippers, the final chapter of Learn You a Haskell has a great introduction.

## One-Hole Contexts

The zipper is a product type – it’s the product of two lists of as and a single a. Remembering that lists are L(a) = 1 / (1 - a), the list zipper LZ(a) is written as

\begin{align} L\!Z(a) & = a \cdot L(a) \cdot L(a) \\ & = \frac{a}{(1-a) ^ 2} \end{align}

Because a zipper is a product type, it’s possible to tease it apart, decomposing it into a single piece of data of paired with two lists. The two lists tell us where the focus is, and we complete the zipper by filling in the piece of data that’s in focus. Algebraically we have

$$Z(a) = a \cdot L(a) ^ 2$$

which is clearly equivalent to the previous expression. The first part of the product is just a piece of data, and we can think of the second as a data structure with a hole. When we ‘fill in’ the hole by pairing it with some data, we get a zipper. This ‘data structure with a hole’ goes by the name of a one-hole context. For the list zipper, this looks like:

The empty circle is a hole, representing ‘no data’ – it’s a placeholder for data. The type that contains no data is the unit type (), so we can say that the type of a hole is ().

There’s nothing special about list zippers – any zipper can be decomposed into a one-hole context, and a piece of data that fills the hole.

### One-Hole Contexts for Tuples

I said that you can write a zipper for any data type, so let’s write some zippers for tuples – that is, types of the form $$a ^ n$$ for some whole number n.

#### One-Tuples

The simplest case is a one-tuple, which is just a single piece of data. A zipper for a one-tuple is trivial - there’s only one location for data, so there’s only one focus – the zipper for a one-tuple a is just a.

But remember that a zipper is a piece of data paired with a one-hole context, so what is the one-hole context? There’s only one place to put the hole, and once we’ve put a hole in a single piece of data, we’re just left with the hole! Schematically, the pairing of the data with the one-hole context looks like this:

$$( a, \circ )$$

The type of the one-hole context has to be (), or 1, so that when we paired it with the data a, we get a back again.

#### Two-Tuples

In a two-tuple there are two possible locations for the hole. Remembering that a sum represents a choice, either the left constructor or the right constructor, the pairing of data and one-hole context looks like:

$$\Big( a , \; (\circ, a) + (a,\circ) \Big)$$

Remembering that the type of the hole is 1, the type of the one-hole context is a + a, or 2a.

#### Three-Tuples

In a three-tuple there are three possible locations for the hole. The pairing of data and one-hole context looks like

$$\Big( a, \;(\circ,a,a) + (a,\circ,a) + (a,a,\circ) \Big)$$

The type of the one-hole context is a² + a² + a², or 3a².

### Finding the pattern

Perhaps you’ve already seen the pattern. In the left column are the types of n-tuples for n = 1, 2, 3 and in the right column are their corresponding zippers:

\begin{align} a^{\phantom{2}} &&& 1 \\ a^2 &&& 2a \\ a^3 &&& 3a^2 \end{align}

If you’ve studied any calculus, you will immediately spot the pattern – for an n-tuple of type $$a ^ n$$, the type of the zipper is $$n a ^ {n-1}$$, which is exactly the derivative of $$a ^ n$$.

In fact, this pattern always applies! This was noted by Conor McBride in a paper, whose title gives the key insight:

“The Derivative of a Regular Type is its Type of One-Hole Contexts”

If you take a type expression and differentiate it, the result is the type of one-hole contexts for that type. This makes it easy to figure out how to write zippers for arbitrary data types.

## Calculus of Types

I’ll use the notation $$\partial_a$$ to mean ‘take the derivative with respect to a’. You may already be familiar with the notation $$d / da$$ for differentiation – this is no different, but it has the advantage that it can be typeset on one line!

How far can we push the analogy between taking the derivative of a type and finding its type of one-hole contexts? Is it valid to differentiate a type in all contexts, or do we have to be careful? We should certainly check that the operation of differentiation makes sense when applied to the types we already know about, for example.

### Constants

Here’s a simple rule from calculus – the derivative of any constant is zero:

$$\partial_a({\rm const}) = 0$$

To interpret this for types, remember that the derivative operator d/da makes holes in a data structure that contains data of type a. This equation is saying that if a type doesn’t have any data of type a, then its one-hole context is Void, i.e. you can’t create any instances of it.

### Sums

The sum rule for derivatives is that for any f(a) and g(a), differentiating the sum is the same as differentiating individually and then summing:

$$\partial_a (f(a) + g(a)) = \partial_a f(a) + \partial_a g(a)$$

This tells us that making a hole in the sum type is equivalent to making a hole in each of the summands, and taking their sum. In more explicit language, if F a and G a are types with one-hole contexts DF a and DG a, then the one-hole context of

type Sum a = Add (F a) (G a)


is

type DSum a = Add (DF a) (DG a)


### Products

In calculus, the derivative of a product is given by Leibniz’s product rule:

$$\partial_a ( f(a) \cdot g(a) ) = \partial_a f(a)\cdot g(a) + f(a) \cdot \partial_a g(a)$$

Interpreted as a statement about types, this tells us that making a hole in the product of two types is equivalent to either making a hole in the first type (and leaving the second one as it is) or making a hole in the second type (and leaving the first one as it is). That is, the one-hole context of

type Prod a = (F a, G a)


is

type DProd a = Add (DF a, G a) (F a, DG a)


### Composition

We haven’t yet met composition. This occurs when you have a data structure F that holds elements of type G a. In Haskell, we could write this as

data Compose f g a = Compose (f (g a))


where f and g have kind * -> * and a has kind *, although its simpler to keep the syntax light and just write f (g a).

In calculus the chain rule tells us how to differentiate compositions:

$$\partial_a f (g (a)) = \partial_a g(a) \cdot \partial_g f (g(a))$$

As a type equation, this says that to make a hole in a layered data structure, we need a product – one half of the product tells us where the hole is in the outer structure, and the other half keeps track of the hole in the inner structure. That is, the one-hole context of

type Comp a = F (G a)


is

type DComp a = (DG a, DF (G a))


## Deriving Zippers

I said that knowing that the derivative of an ADT corresponds to its type of one-hole contexts makes it easy to derive zippers for arbitrary data types. Now I’ll justify that assertion.

### List Zippers

The type of lists is

$$L(a) = \frac{1}{1-a}$$

We can differentiate this using the quotient rule and the chain rule, finding

$$\partial_a L(a) = \frac{1}{(1-a) ^ 2} = L(a) ^ 2$$

which says that the one-hole context of a list is a pair of lists, exactly as we saw earlier. To get the list zipper we take the product of an a with a pair of lists L(*a).

### Tree Zippers

Remember that the type of binary trees T(a) is defined by the recursive equation

$$T(a) = 1 + a \cdot T(a) ^ 2$$

We could solve for T(a) as we did in the previous post, and differentiate the result. However, it’s much easier to use implicit differentiation to differentiate the above expression, to get

$$\partial T(a) = T(a) ^ 2 + 2 \cdot a \cdot T(a) \cdot \partial T(a)$$

We can then solve this for dT/da, finding

$$\partial T (a) = \frac{T(a) ^ 2}{1 - 2 \cdot a\cdot T(a)}$$

To make sense of this, remember that the type 1 / (1 - x) is synonymous with the type L(x), of lists of x. So the tree zipper is

$$\partial T (a) = T(a) ^ 2 \cdot L\big(2 \cdot a \cdot T(a)\big)$$

That is, the one-hole context of a tree is a product of two trees, and a list holding elements of type (Bool, a, Tree a). It looks like this:

The hole is our current location, the focus. Beneath the hole are two trees, which hold the part of the tree that we haven’t yet traversed. In this picture they are both single-element trees.

The sequence of red nodes leading back to the top of the tree is the past - the elements of the tree that we traversed to end up at the hole. Each node is associated with a Bool, which tells us whether we took the left or the right branch after passing through that node, and a Tree a which contains all the elements that we missed by not going down the other path.

Exercise: Code up the zipper for binary trees, remembering that a zipper is a one-hole context paired with data. You’ll need to write functions left and right to take the left or right path at a particular node, and up to go back up the tree.

Exercise: Rose trees are trees with arbitrarily many branches from each node. In Haskell you would define them by

data Rose a = Rose a [Rose a]


What does a rose tree zipper look like?

## Next Time

In this post we learned what it means to take the derivative of an ADT, and how to manipulate the type algebra to automatically derive zippers for arbitrary data types.

Next time I’ll talk about assigning a meaning to subtraction and division of types – and I might get on to mentioning combinatorial species.

# The Algebra of Algebraic Data Types, Part 2

I gave a talk on this subject at London Haskell in November 2012. A video of the talk is on YouTube and slides are on GitHub. This is Part 2 in a series; you can read Part 1 here.

Last time I introduced the unit type Unit or () and the zero type Void. I also introduced the type operators Add and Mul, and the function type a -> b.

In this post I’ll be using both Haskell notation and mathematical notation. The table below will help you convert back and forth between the two.

\begin{align} {\rm Void} && \leftrightarrow && 0 \\ () \,,\; {\rm Unit} && \leftrightarrow && 1 \\ {\rm Add} \; a \; b && \leftrightarrow && a + b \\ (a,b) \,,\; {\rm Mul} \; a \; b && \leftrightarrow && a \cdot b \\ a \rightarrow b && \leftrightarrow && b ^ a \end{align}

In this post I’ll talk about recursive types, and show how you can abuse the algebra of types to deduce interesting things about them.

## Maybe

Let’s ease ourselves in by exploring Maybe a. This is the type that indicates that it might contain a value of type a, but it also might be empty. It’s defined like this

data Maybe a = Nothing | Just a


The vertical bar indicates that this is a sum type, so we could write it using the type Add that we defined last time, if we first defined the types Nothing and Just a. It would look like this:

data Nothing = Nothing
data Just a  = Just a

type Maybe a = Add Nothing (Just a)


See how we’ve replace the data declaration with a type declaration? That means that Maybe a is no longer a new type – it’s just a synonym for a type we already knew about.

But we can take it further. Notice that Nothing only has one value, so it’s equivalent to (). Similarly, Just is a container with a single value of type a, so it’s equivalent to a. We therefore have

type Maybe a = Add () a


But that’s the same as saying that Maybe a is the same as 1 + a. All that Maybe does is add one more possible value to a type.

## Recursive Types

### Lists

The basic list in Haskell is a linked list. A list of as is either empty, denoted [], or it is a cons of a single a onto another list of as, denoted a : as. If we wanted to define lists ourself, we could write something like

data List a = Nil | Cons a (List a)


Let’s take a second to look at the structure of this declaration. Just like Maybe, the List type is the sum of two simpler types.

The first summand is Nil, a nullary constructor which is equivalent to ().

The second summand is Cons a (List a), which is a product consisting of an a and a list of as.

The algebraic form, if we write lists as L(a), is

$$L(a) = 1 + a \cdot L(a)$$

It seems as though we should be able to write the list type in Haskell as

type List a = Add () (a, (List a))


but in fact that won’t compile. The reason is that type synonyms are expanded out at compile time, after type checking, but before compilation. This definition would never finish expanding – it would just keep growing as

Add () (a, Add () (a, Add () (a, ...)))


and so on. The reason that this doesn’t work is to do with the way that Haskell handles recursive type definitions (like most languages with sophisticated type systems, it uses isorecursive rather than equirecursive types).

The way around it is to use a newtype declaration instead of a type declaration, and wrap the type inside a new constructor L:

newtype List a = L (Add () (a, List a))


This is just to satisfy the type checker – when the code is compiled the additional constructor will be optimized away, and what we’re left with will be identical to the type declaration above.

If I write the type List a as L(a) then this declaration for lists says that

$$L(a) = 1 + a \cdot L(a)$$

To see what lists really are, we can start expanding the definition by repeated substitution. In the expansion, whenever I see a type of the form a · a, I’ll replace it with a² to save space.

Exercise: the expression a² can denote a pair (a, a) or a function 2 → a. Show these are the same.

\begin{align} L(a) & = 1 + a \cdot L(a) \\ & = 1 + a \cdot (1 + a \cdot L(a)) \\ & = 1 + a + a ^ 2 \cdot (1 + a \cdot L(a)) \\ & = 1 + a + a ^ 2 + a ^ 3 \cdot (1 + a \cdot L(a)) \\ & = \ldots \\ & = 1 + a + a ^ 2 + a ^ 3 + a ^ 4 + a ^ 5 + \cdots \end{align}

This is telling us that a list of a is either the empty list, or it’s a list containing a single a, or a list of two a, or three a, etc. Maybe you already knew that - but it’s neat that the algebra is telling us!

But now, here is something really cool. Let’s start with our equation for lists, and instead of repeatedly substituting, we’ll forget for a moment that the objects are types and pretend that they’re arbitrary mathematical expressions. That means we’re justified in using whatever techniques we like to “solve for L(a)”.

First subtract a · L(a) from both sides:

$$L(a) - a \cdot L(a) = 1$$

Now factor out the L(a) on the left-hand side:

$$(1 - a) \cdot L(a) = 1$$

And finally, divide both sides by 1 - a:

$$L(a) = \frac{1}{1-a}$$

This looks nonsensical, because we don’t know what it means to subtract one type from another, and we certainly don’t know what it means to divide one type by another. But is it telling us something interesting anyway?

If you studied calculus, you may remember that many functions have an expansion as a Taylor Series. We can ask Wolfram Alpha what the Taylor series of 1 / (1 - a) is, and it tells us:

$$L(a) = 1 + a + a ^ 2 + a ^ 3 + a ^ 4 + \cdots$$

That is, the series expansion of L(a) is exactly what we deduced from our repeated substitutions. Although we abused the algebra horribly, by performing operations that are completely unjustified on types, we still ended up with a sensible answer.

### Trees

Consider the type of binary trees with values at the nodes. In Haskell you can write this as

data Tree a = Empty | Node a (Tree a) (Tree a)


In a way that is probably becoming familiar, we can look at this as a sum of two types - a nullary type equivalent to (), and a product type. This time it is a product of three terms, but that’s not a problem - we just use a nested product, of the form (a, (b, c)).

The definition in terms of operators that we already know about (again, using a newtype to allow us to have a recursive definition) is

newtype Tree a = T (Add () (a, (Tree a, Tree a)))


In algebraic language, with T(a) for the type of trees, we would write

$$T(a) = 1 + a\cdot T(a)^2$$

To get a better handle on what trees are, we could apply repeated substitution in the same way we did for lists, but it would be much messier. Instead, can we apply the trick of rearranging the equation to solve for T(a)?

First, rearrange to get everything on one side of the equation:

$$a \cdot T(a)^2 - T(a) + 1 = 0$$

We can now recognize this as a quadratic equation for T(a), which we can solve using the quadratic formula, to get

$$T(a) = \frac{1 - \sqrt{1 - 4 a}}{2a}$$

This makes even less sense than the equation for lists. What on earth does it mean to take the square root of a type? But, keeping calm, we again ask Wolfram Alpha for the series expansion, and it tells us

$$T(a) = 1 + a + 2a ^ 2 + 5a ^ 3 + 14a ^ 4 + \cdots$$

How do we make sense of this? The first two terms tell us that a tree can either be empty (if it is Empty) or it can have one value of type a (if it is Node a Empty Empty).

Then the next term tells us that a tree can contain two values of type a, in two different ways, while the next term tells us that there can be three values of type a in five different ways.

But that’s exactly what we notice if we enumerate all of the binary trees, grouping them by the number of values they contain. There is one tree with no values and one tree with one value. Then there are two trees with two values and five trees with three values – as in this picture (borrowed from the excellent book Analytic Combinatorics by Flajolet and Sedgewick).

The equation counts the number of different binary trees that can exist! This counting property is related to the simple value-counting examples we saw in the first post, and also related to combinatorial species, as written about extensively by Brent Yorgey. Species and types have a lot in common, although they are not the same thing.

## Seven Trees In One

If we restrict to trees that contain only the unit type, i.e. Tree (), then the a appearing in T(a) is equal to 1, and we can write the defining equation of trees as

$$T = 1 + T^2$$

By playing with the algebra, and repeatedly using the fact that $$T ^ 2 = T - 1$$, we can deduce that

\begin{align} T^6 & = (T-1) ^ 3 \\ & = T ^ 3 - 3T ^ 2 + 3T - 1 \\ & = T (T - 1) - 3T ^ 2 + 3T - 1 \\ & = - 2T ^ 2 + 2T - 1 \\ & = -2(T-1) + 2T - 1 \\ & = 1 \end{align}

which, when interpreted in the language of types, says that a six-tuple of trees is equivalent to the unit type. In other words, there is only one six-tuple of trees.

This is clearly nonsense, so what’s gone wrong? Even more mysterious is that if we just multiply both sides of the equation by T, we get

$$T ^ 7 = T$$

which isn’t nonsense - it says that a seven-tuple of trees is equivalent to a single tree.

At first glance this isn’t a profound result. Any two types that have a countably infinite number of possible values are equivalent - that’s what being countable means.

However, it’s more subtle than this. In the paper Seven Trees In One, Andreas Blass not only shows how to find an explicit map from seven trees to one and back again (essentially he tells you how to write the from and to functions from the previous post), but he shows that the functions never have to look more than four levels deep into any of the trees.

Finally, he explains why deducing that $$T ^ 7 = T$$ is valid, but $$T ^ 6 = 1$$ is not. It turns out that you can turn a derivation like the one I gave above, which uses subtraction, into an ‘honest’ proof that doesn’t use subtraction, and hence is valid for types – but only when the power of T that you start with is one more than a multiple of 6. Hence you can reduce $$T ^ 7$$ to $$T$$ (because 7 = 6 + 1) but you can’t reduce $$T ^ 6$$ to 1.

## An Explanation?

It would be fair if you come away thinking that this post has raised more questions than it has supplied answers. For example,

• What does it mean to subtract, divide or take square roots of types?
• What else you can do to a type equation that still has a meaningful computational interpretation?
• Why do unjustified manipulations of type equations produce sensible answers anyway?

As a teaser for the last question, a mathematical structure that models types is the semiring. That’s just a way of saying that you can add and multiply objects, and you have objects corresponding to 0 and 1. There is a paper by Marcelo Fiore and Tom Leinster that shows that if you can start with a complex number t defined by the equation

$$t = p(t)$$

for some polynomial p, and deduce

$$q_1(t) = q_2(t)$$

where q1 and q2 aren’t constant (they can’t just be 1) then the same result is true for semirings. In particular, it is true for types, and you can find a proof that doesn’t use subtraction, or any other operators that are illegal for types.

Next time, I’ll explain what zippers are, and describe how to do calculus with types.

# The Algebra of Algebraic Data Types, Part 1

I gave a talk on this subject at London Haskell in November 2012. A video of the talk is on YouTube and slides are on GitHub.

In this series of posts I’ll explain why Haskell’s data types are called algebraic - without mentioning category theory or advanced math.

The algebra you learned in high school starts with numbers (e.g. 1, 2, 3 …) and operators (e.g. addition and multiplication). The operators give you a way to combine numbers and make new numbers from them. For example, combining 1 and 2 with the operation of addition gives you another number, 3 - a fact that we normally express as

$$1 + 2 = 3$$

When you get a little older you are introduced to variables (e.g. x, y, z …) which can stand for numbers. Further still, and you learn about the laws that algebra obeys. Laws like

$$0 + x = x$$ $$1 \cdot x = x$$

which hold for all values of x. There are other laws as well, which define properties of numbers or of operations.

When mathematicans talk about algebra, they mean something more general than this. A mathematical algebra has three parts:

• Objects are the ‘things’ of the algebra. The collection of objects defines what we’re talking about.
• Operations give us ways to combine old things to make new things.
• Laws are relationships between the objects and the operations.

In high school algebra the objects are numbers and the operations are addition, multiplication and friends.

## The algebra of Haskell types

In the algebra of Haskell types, the objects are types, for example Bool and Int. The operators take types that already exist, and generate new types from them. An example is the type constructor Maybe. It’s not a type itself, but you use it to create types - for example Maybe Bool and Maybe Int, which are types. Another example is Either, which creates a new type from two old types - for example Either Int Bool.

### Counting

A connection to the more familiar algebra of numbers can be seen by counting the possible values that a type has. Take Bool, defined by

data Bool = False | True


There are two values that an object of type Bool can have – it is either False or True (technically it could also be undefined – a fact that I’m going to ignore for the rest of the post). Loosely, the type Bool corresponds to the number ‘2’ in the algebra of numbers.

If Bool is 2, then what is 1? It should be a type with only one value. In the computer science literature such a type is often called Unit and defined as

data Unit = Unit


In Haskell there is already a type with only one value - it’s called () (pronounced “unit”). You can’t define it yourself, but if you could it would look like

data () = ()


Using this counting analogy, Int corresponds to the number $$2 ^ {32}$$, as this is the number of values of type Int (at least, it is on my machine).

In principle we could types corresponding to 3, 4, 5 and so on. Sometimes we might have a genuine need to do this - for example, the type corresponding to 7 is useful for encoding days of the week. But it would be nicer if we could build up new types from old. This is where the operators of the algebra come in.

A type corresponding to addition is

data Add a b = AddL a | AddR b


That is, the type a + b is a tagged union, holding either an a or a b. To see why this corresponds to addition, we can revisit the counting argument. Let’s say that a is Bool and b is (), so that there are 2 values a and 1 value for b. How many values of type Add Bool () are there? We can list them out:

addValues = [AddL False, AddL True, AddR ()]


There are three values, and 3 = 2 + 1. This is often called a sum type. In Haskell the sum type is often called Either, defined as

data Either a b = Left a | Right b


but I’ll stick with Add.

### Multiplication

A type corresponding to multiplication is

data Mul a b = Mul a b


That is, the type a · b is a container holding both an a and a b. The counting argument justifies the correspondence with multiplication - if we fix a and b to both be Bool, the possible values of the type Mul Bool Bool are

mulValues = [Mul False False, Mul False True, Mul True False, Mul True True]


There are four values, and 4 = 2 x 2. This is often called a product type. In Haskell the product is the pair type:

data (,) a b = (a, b)


but I’ll stick with Mul.

### Zero

Using addition and multiplication we can generate types corresponding to all the numbers from 1 upwards - but what about 0? That would be a type with no values. It sounds odd, but you can define such a type:

data Void


Notice that there are no constructors in the data definition, so you can’t ever construct a value of type Void – it has zero values, just as we wanted!

## Laws in the algebra of Haskell types

What are the laws for the types we’ve just defined? Just like in the algebra of numbers, a law will assert the equality of two objects - in our case, the objects will be types.

However, when I talk about equality, I don’t mean Haskell equality, in the sense of the (==) function. Instead, I mean that the two types are in one-to-one correspondence – that is, when I say that two types a and b are equal, I mean that you could write two functions

from :: a -> b
to   :: b -> a


that pair up values of a with values of b, so that the following equations always hold (here the == is genuine, Haskell-flavored equality):

to (from a) == a
from (to b) == b


For example, I contend that the types Bool and Add () () are equivalent. I can demonstrate the equivalence with the following functions:

to :: Bool -> Add () ()

from :: Add () () -> Bool


I’ll use the triple equality symbol, ===, to denote this kind of equivalence between types.

### Laws for sum types

Here are two laws for addition:

Add Void a === a


which says that there are as many values of type Add Void a as there are of type a, and

Add a b === Add b a


which says that it doesn’t matter which order you add things in. These laws are probably more familiar to you in the algebra of numbers as

$$0 + x = x$$ $$x + y = y + x$$

If you fancy an exercise, you can demonstrate the correctness of the laws in the Haskell algebra – either with a counting argument, or by writing the functions from and to.

### Laws for product types

There are three useful laws for multiplication:

Mul Void a === Void


which says that if you multiply anything by Void, you get Void back,

Mul () a === a


which says that if you multiply by () you don’t change anything, and

Mul a b === Mul b a


which says that it doesn’t matter which order you multiply in. The more familiar forms of these laws are

$$0 \cdot x = 0$$ $$1 \cdot x = x$$ $$x \cdot y = y \cdot x$$

Two more exercises: (i) prove the validity of these laws in the Haskell algebra, and (ii) explain why we don’t need laws of the form

Mul a Void === Void
Mul a ()   === a


There’s also a law that relates the addition and multiplication operators:

Mul a (Add b c) === Add (Mul a b) (Mul a c)


This one is a bit trickier to reason about, but writing the corresponding from and to functions isn’t too hard. The arithmetic version of this law is the friendlier-looking

$$a \cdot (b + c) = a \cdot b + a \cdot c$$

called the distributive law.

## Function types

As well as concrete types like Int and Bool, in Haskell you also have function types, like Int -> Bool or Double -> String. How do these fit into the algebra?

To figure this out we can go back to the counting argument. How many functions of type a → b are there?

Let’s be concrete, and fix a and b to both be Bool. The value False can map to either False or True, and similarly for the value True - thus there are $$2 \cdot 2 = 2 ^ 2 = 4$$ possible functions Bool -> Bool. To be really explicit, we could enumerate them:

f1 :: Bool -> Bool -- equivalent to 'id'
f1 True  = True
f1 False = False

f2 :: Bool -> Bool -- equivalent to 'const False'
f2 _     = False

f3 :: Bool -> Bool -- equivalent to 'const True'
f3 _     = True

f4 :: Bool -> Bool -- equivalent to 'not'
f4 True  = False
f4 False = True


What happens if b is still Bool (with two values) and a is a type with three values, say

data Trio = First | Second | Third


Then each of First, Second and Third can map to two possible values, and in total there are $$2 \cdot 2 \cdot 2 = 2 ^ 3 = 8$$ functions of type Trio -> Bool.

The same argument holds in general. If there are $$A$$ values of type a, and $$B$$ values of type b, then the number of values of type a → b is

$$B ^ A$$

This justifies the common terminology for function types as exponential types.

### Laws for functions

There are two laws for function types that involve the unit type. They are

() -> a === a


which says that there are as many functions () -> a as there are values of type a, and

a -> () === ()


which says that there is only one function a -> () – in particular, it is const (). The arithmetic versions of these laws are

$$a^1 = a$$ $$1^a = 1$$

There is also a law that allows factoring out of common arguments:

(a -> b, a -> c) === a -> (b,c)


whose arithmetic form is

$$b^a \cdot c^a = (bc)^a$$

and a law about functions that return other functions:

a -> (b -> c) === (b,a) -> c


whose arithmetic form is

$$(c ^ b) ^ a = c ^ { b \cdot a }$$

This last law may be more familiar when the order of the variables in the pair on the right-hand side is switched, and the parens on the left hand side are removed:

a -> b -> c === (a,b) -> c


which just says that we can curry and uncurry functions. Again, it’s an interesting exercise to prove all of these laws by writing the corresponding to and from functions.

## Next time

In the next post I’ll look at recursive types, like lists and binary trees, and show how you can abuse the algebra of types in all kinds of ways to deduce interesting facts about types.

# I/O Is Pure

The code for this post is available in a gist.

A common question amongst people learning Haskell is whether I/O is pure or not. Haskell advertises itself as a purely functional programming language, but I/O looks like it’s inherently impure - for example, the function getLine, which gets a line from stdin, returns a different result depending on what the user types:

Prelude> x <- getLine
Hello
Prelude> x
"Hello"


How can this possibly be pure?

In this post I want to explain exactly why I/O in Haskell is pure. I’ll do it by building up data structures that represent blocks of code. These data structures can later be executed, and they cause effects to occur - but until that point we’ll always work with pure functions, never with effects.

Let’s look at a simplified form of I/O, where we only care about reading from stdin, writing to stdout and returning a value. We can model this with the following data type:

That is, an IOAction is one of the following three things:

• A container for a value of type a,
• A container holding a String to be printed to stdout, followed by another IOAction a, or
• A container holding a function from String -> IOAction a, which can be applied to whatever String is read from stdin.

Notice that the only terminal constructor is Return – that means that any IOAction must be a combination of Get and Put constructors, finally ending in a Return.

Some simple actions include the one that prints to stdout before returning ():

and the action that reads from stdin and returns the string unchanged:

To build up a language for doing I/O we need to be able to combine and sequence actions. We want the ability to perform an IOAction a followed by an IOAction b, and return some result.

In fact, we could have the second IOAction depend on the return value of the first one - that is, we need a sequencing combinator of the following type:

We want to take the IOAction a supplied in the first argument, get its return value (which is of type a) and feed that to the function in the second argument, getting an IOAction b out, which can be sequenced with the first IOAction a.

That’s a bit of a mouthful, but writing this combinator isn’t too hard. When we reach the final Return, we apply the function f to get a new action. For the other constructors, we keep the form of the action the same, and just thread seqio through the constructor:

Using seqio we can define the action that gets input from stdin and immediately prints it to the screen:

or even more complicated actions:

Although this looks like imperative code (admittedly with pretty unpleasant syntax), it’s really a value of type IOAction (). In Haskell, code can be data and data can be code.

In the gist I’ve defined a function to convert an IOAction to a String, which allows them to be printed, so you can load the file into GHCi and verify that hello is in fact just data:

*Main> print hello
Put "What is your name?" (
Get ($0 -> Put "What is your age?" ( Get ($1 ->
Put "Hello $0!" ( Put "You are$1 years old" (
Return ()
)
)
)
)
)
)


It will surprise no one to learn that IOAction is a monad. In fact we’ve already defined the necessary bind operation in seqio, so getting the Monad instance is trivial:

The main benefit of doing this is that we can now sequence actions using Haskell’s do notation, which desugars into calls to (>>=), and hence to seqio. Our earlier hello example can now be written as:

Remember though, that this is still just defining a value of type IOAction () - no code is executed, and no effects occur! So far, this post is 100% pure.

To see the effects, we need to define a function that takes an IOAction a and converts it into a value of type IO a, which can then be executed by the interpreter or the runtime system. It’s easy to write such a function just by turning it into the approprate calls to putStrLn and getLine.

You can now load up GHCi and apply run to any action – a value of type IO a will be returned, and then immediately executed by the interpreter:

*Main> run hello

Yes - an IOAction is a mini-language for doing I/O. In this mini language you are restricted to only reading from stdin and writing to stdout - there is no accessing files, spawning threads or network I/O.
In effect we have a “safe” domain-specific language. If a user of your program or library supplies a value of type IOAction a, you know that you are free to convert it to an IO a using run and execute it, and it will never do anything except reading from stdin and writing to stdout (not that those things aren’t potentially dangerous in themselves, but…)