
If you want to get values out of a monad you have to write a function yourself. Such a function would have a type h :: m a -> a, where m is a particular monad. It’d be nice if there were some guarantees about how these functions should behave. We’re stepping outside the jurisdiction of the monad laws, so you don’t get any help there. So what are the relevant laws?
If you put a value into the monad and immediately extract a value, you want to get the same value you put in. In other words you want h (return a) = a, which is the same as

h . return = id


If you have two layers of monad m (m a), then there are two ways you could get a value out. You could lift h into the monad with liftM (or fmap) to get values out of the inner monad, and then hit the whole thing with h. Alternatively, you could use the monadic join to collapse the two layers of the monad into one, and then apply h once. It would be nice if you got the same result either way, i.e. if h (join x) = h (fmap h x), or

h . join = h . fmap h


Taken together, these two laws are exactly the laws of an algebra over the monad. Categorically, an algebra over a monad M is an object A together with a map h : MA → A such that the following diagrams commute:
Note that the categorical definition says that you have to specify an object as well as a morphism. In the category Hask, objects are types. If we want to specify a type class for algebras, it must be a multiparameter type class where you specify the underlying type as well as the monad – i.e. the algebra doesn’t have to be polymorphic over all types; it is allowed to vary depending on the type of values in the monad.
In terms of do notation, the algebra laws mean that the following program transformation is valid, where xs has type m (m a) – these two programs give the same result:

h $ do { x <- xs; x }
h $ do { x <- xs; return (h x) }

Enough theory – what are some examples of algebras? For the list monad, both sum and product are list algebras for any numeric type. You can check that they satisfy the algebra laws, i.e.

sum (return x) = x
sum (join xs) = sum (fmap sum xs)

and similarly for product.
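These checks can be run concretely: for lists, return puts a value in a singleton list, join is concat and fmap is map. A small sketch:

```haskell
-- Check both algebra laws for sum over the list monad.
-- For lists: return x = [x], join = concat, fmap = map.
unitLaw :: Int -> Bool
unitLaw x = sum [x] == x

joinLaw :: [[Int]] -> Bool
joinLaw xss = sum (concat xss) == sum (map sum xss)
```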
Note that the function length isn’t a list algebra, since length (join [[1],[2,3]]) is 3, but length (fmap length [[1],[2,3]]) is 2.
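You can see the failure directly by evaluating both sides of the second law:

```haskell
-- length fails the second algebra law: the two sides of
-- h (join x) == h (fmap h x) disagree on [[1],[2,3]].
lhs :: Int
lhs = length (concat [[1], [2, 3 :: Int]])      -- counts all the elements

rhs :: Int
rhs = length (map length [[1], [2, 3 :: Int]])  -- counts the inner lists
```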
The function head is nearly a list algebra, only failing because it isn’t defined for the empty list. In fact it’s an algebra over the type of nonempty lists. Similarly, maximum and minimum are algebras over the type of nonempty lists.
More generally, if a is an instance of the Monoid type class, then foldr mappend mempty is a list algebra. This is reflected in the definition of the fold function in Data.Foldable, which has type

fold :: (Foldable t, Monoid m) => t m -> m

showing that fold is an f-algebra for any f that is an instance of Foldable. This gives algebras over Maybe, Either a, Tree and many more.
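For instance, with the Sum monoid the fold of a list behaves like sum, and the second algebra law can be checked on nested lists (a small sketch):

```haskell
import Data.Foldable (fold)
import Data.Monoid (Sum (..))

-- fold collapses any Foldable full of monoid values.
total :: Int
total = getSum (fold (map Sum [1, 2, 3]))

-- The second algebra law for fold, on the list monad:
-- fold (join xss) == fold (fmap fold xss).
lawHolds :: Bool
lawHolds = fold (concat xss) == fold (map fold xss)
  where
    xss = [["a", "b"], ["c"]] :: [[String]]
```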
Given any value a of type r, h m = runReader m a is a Reader r algebra, and fst . runWriter is a Writer w algebra. You might expect h m = evalState m a to be a State s algebra too, but it fails the second law: the inner computation is run from the fixed initial state a rather than from the state left behind by the outer computation, so h (join m) and h (fmap h m) can disagree.
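The State case is worth checking carefully. Below is a hand-rolled sketch of State (avoiding a dependency on the mtl package; the names joinS and fmapS are my own), which suggests that evalState with a fixed initial state fails the second law, because the inner computation sees the fixed initial state rather than the state left behind by the outer one:

```haskell
-- Minimal State, just enough to check the algebra laws.
newtype State s a = State { runState :: s -> (a, s) }

evalState :: State s a -> s -> a
evalState m s = fst (runState m s)

-- join: run the outer computation, then run the inner one
-- starting from the state the outer one produced.
joinS :: State s (State s a) -> State s a
joinS m = State $ \s -> let (inner, s') = runState m s in runState inner s'

fmapS :: (a -> b) -> State s a -> State s b
fmapS f m = State $ \s -> let (a, s') = runState m s in (f a, s')

-- Set the state to 1, returning a computation that reads the state.
example :: State Int (State Int Int)
example = State $ \_ -> (State $ \s -> (s, s), 1)

h :: State Int a -> a
h m = evalState m 0

-- h (joinS example) is 1, but h (fmapS h example) is 0,
-- so the two sides of the second law disagree.
```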
For the monad of probability spaces, the expectation is an algebra. If P(PA) is a distribution over probability distributions, then it doesn’t matter whether we take inner expectations before taking the outer expectation, or collapse the entire distribution first and take its expectation. This is related to the law of iterated expectation. Note that higher moments aren’t algebras, and nor are central moments.
I had hoped that for the parser monad I could define an algebra by running the parser and extracting the resulting value. However, this fails the algebra laws. In particular, if we have a parser of type Parser (Parser a), the outer parser might need to read some of the input string before the inner parser will match correctly, so we can’t just map over the inner parser. So I wonder: are there any algebras for the parser monad?
For the IO monad, the obvious candidate for an algebra is

unsafePerformIO :: IO a -> a

My intuition is that this shouldn’t be an algebra, i.e. it should fail to respect one of the algebra laws. However, I haven’t been able to construct an example where it fails yet. Anyone have any ideas?
Consider the type of binary trees with values at the leaves:

data Tree a = Leaf a
            | Branch (Tree a) (Tree a)

This is both a functor and a monad:

instance Functor Tree where
  fmap f (Leaf a)     = Leaf (f a)
  fmap f (Branch l r) = Branch (fmap f l) (fmap f r)

instance Monad Tree where
  return = Leaf
  Leaf a     >>= f = f a
  Branch l r >>= f = Branch (l >>= f) (r >>= f)

We want a way to get a value out of the tree. One way to do this is to pick a direction, either left or right, and then keep taking that branch until you reach a leaf.

runTree :: Tree a -> Bool -> a
runTree (Leaf a)     _ = a
runTree (Branch l r) b = runTree (if b then l else r) b

There are many different trees t that all return the same value when you call either runTree t True or runTree t False. If two trees s and t return the same value under runTree for all values of the boolean argument, say that s is equivalent to t, and write s ~ t.
This satisfies the axioms for an equivalence relation – reflexivity (every tree is equivalent to itself), symmetry (if s ~ t then t ~ s) and transitivity (if r ~ s and s ~ t then r ~ t). So it makes sense to consider the set of equivalence classes of binary trees under this relation.
What may be surprising is that the type of “equivalence classes of binary trees” is also a monad. To define the monad we have to define return and bind for it. It’s nontrivial to do this in Haskell, but relatively easy mathematically.
Let C (for class) be the operation that maps a tree to its equivalence class, and R (for representative) be the operation that selects a particular tree from an equivalence class. Then it must be true that C(R(x)) = x. It isn’t true in general that R(C(t)) = t, but we will always get a tree that’s equivalent to the one we put in.
We define ‘return a’ for the monad to be the equivalence class C(Leaf a).
To define x ≫= f, first choose a representative of x and bind it to the function R ○ f using the normal bind operation for trees. Then take the equivalence class of the resulting tree. Symbolically, x ≫= f = C(R(x) ≫= R ○ f).
So we have a monad – but what monad is it? Interestingly, it is exactly equivalent to Reader Bool.
To see why, note that two trees whose far-left branch and far-right branch are the same are equivalent (because runTree either traverses all the way down the left of the tree, or all the way down the right). So specifying an equivalence class of trees just requires specifying a pair of values of type a, i.e. the type a². This is equivalent to the type 2 → a, which is the same as the type of functions Bool -> a, which is the same as Reader Bool a.
To generalize to arbitrary readers we should consider r-ary trees Tree r a rather than binary trees – that is, trees that have r branches at each node. To deal with the fact that r might be infinite, we work with functions r -> Tree r a, which are equivalent to (Tree r a)^r, an r-tuple of trees. In Haskell,

data Tree r a = Leaf a
              | Branch (r -> Tree r a)

Again this is both a functor and a monad:

instance Functor (Tree r) where
  fmap f (Leaf a)   = Leaf (f a)
  fmap f (Branch g) = Branch (fmap f . g)

instance Monad (Tree r) where
  return = Leaf
  Leaf a   >>= f = f a
  Branch g >>= f = Branch (\r -> g r >>= f)

and again we can write a function runTree that, given an r, takes the same branch of the tree repeatedly until it reaches a value:

runTree :: Tree r a -> r -> a
runTree (Leaf a)   _ = a
runTree (Branch g) r = runTree (g r) r

Consider two trees equivalent if they give the same result when fed to runTree r, for all values of r. Then the equivalence classes of trees form a monad in the same way as before, and this monad is exactly the Reader r monad. To make this really obvious we could rename the data type and constructors, and define the runReader and ask functions:

data Reader r a = Return a | Ask (r -> Reader r a)
runReader (Return a) _ = a
runReader (Ask g)    r = runReader (g r) r
ask :: Reader r r
ask = Ask Return

You can then use this like any other reader, for example:

example :: Reader Bool String
example = Ask (\b -> Return (if b then "yes" else "no"))

answer :: String
answer = runReader example True

The local function can be defined similarly, in terms of Ask.
One reason for thinking of readers as trees, rather than using the more familiar definition, is that the monadic join operator on trees T maps T ○ T structures of size n to T structures of size n (where by ‘size’ I mean the number of leaves in the tree). This isn’t true for monads in general – for example, two layers of Reader Bool a is isomorphic to a⁴, which gets mapped to a² by monadic join. So join preserves the combinatorial interpretation of the tree.
This is related to some stuff I’m thinking about to do with combinatorial species and how they relate to functional programming. A comment by Brent Yorgey sparked the idea for this post.
It seems likely that, given any monad T and a suitable family of equivalence relations ~ on the types T a, the set of equivalence classes is always a monad, but I haven’t thought much about it.
I think this is all extremely similar to a post written a few years back by Dan Piponi, where he worked through the details for the state monad instead of the reader.
In Haskell, a call to printf looks like this:

printf "%s is %d years old\n" name age

In C it looks like this:

printf("%s is %d years old\n", name, age);
What they have in common is that they take a format string as their first argument, followed by a variable number of arguments, of varying type. The arguments are formatted, interpolated into the string and then printed to the screen.
The immediate problem you face if you want to write printf in Haskell is the type system. All functions have a type, and the type determines what arguments it can accept. You can write polymorphic functions that accept a range of input types, but writing a function that accepts a variable number of arguments? That seems tough.
I recently read through the source of the Text.Printf module, and was impressed by how concise and self-contained it is, and especially by the neat trick that allows printf to take a variable number of arguments. In this post I’ll walk through creating a simplified, cut-down version of Text.Printf to give a flavor of the module.
I’ll need the following uncontroversial language extensions:

{-# LANGUAGE FlexibleInstances #-}


First define a data type to hold things that are possible arguments to printf (in our cut-down module, we will only care about interpolating strings and integers into the format string):

data UPrintf = UString String
             | UInt Int

With this in hand, we can write the universal printf function, uprintf. This takes a format string and a list of arguments to be interpolated into the string, and returns the formatted string. Its type signature is therefore:

uprintf :: String -> [UPrintf] -> String


All printf functionality will eventually be implemented in terms of this function. We’ll hold off on giving its implementation until later in the module, and focus on how we can write the polyvariadic, polymorphic printf function.

We get polymorphism using type classes. Define a type class for types that can be converted into a UPrintf, along with a couple of useful instances:

class PrintfArg a where
  toUPrintf :: a -> UPrintf

instance PrintfArg String where
  toUPrintf = UString

instance PrintfArg Int where
  toUPrintf = UInt

There are multiple valid return types for the printf function. For example, we might want to return a String, or an IO action that outputs to a file or to stdout, or more generally any instance of MonadIO. So introduce a type class for valid return types:

class PrintfType t where
  spr :: String -> [UPrintf] -> t

The spr function takes a format string and a list of suitable arguments to be interpolated into the string, and returns something. The key is in what the valid return types are. Certainly, it makes sense to return a string by calling uprintf (we reverse the argument list, for reasons that will become clear in a bit):

instance PrintfType String where
  spr fmtString args = uprintf fmtString (reverse args)

It also makes sense to return an I/O action that does some printing:

instance PrintfType (IO ()) where
  spr fmtString args = putStr (uprintf fmtString (reverse args))

Here’s the cool part. If a is an instance of PrintfArg and t is an instance of PrintfType, then a function a -> t is also a valid instance of PrintfType – you should picture it as consuming the a, and outputting another valid return type (which might be another function).

This is one way to implement polyvariadic functions in Haskell. Functions are curried by default, so it’s easy to incrementally consume arguments, and return a function that consumes the rest of the arguments.

instance (PrintfArg a, PrintfType t) => PrintfType (a -> t) where
  spr fmtString args a = spr fmtString (toUPrintf a : args)

This is why we need to reverse the argument list when we make the final call to uprintf – the arg list acts like a stack, so the last arguments pushed onto it are the first ones out. We need to reverse the stack at the end so that we format the arguments in the right order.
Now to write printf, just call spr with an empty argument list – the supplied arguments will be consed onto the list by spr until there are none remaining, and uprintf is called:

printf :: PrintfType t => String -> t
printf fmtString = spr fmtString []

The rest is just details. The uprintf function incrementally consumes the format string, building the output up as it goes. If it exhausts either the format string or the argument list while the other still has elements remaining, then it throws an error. Otherwise it watches for the % signs that signal a value to be interpolated, and calls the correct formatting function:

uprintf :: String -> [UPrintf] -> String
uprintf ""         []     = ""
uprintf ""         _      = fmterror
uprintf ('%':_:_)  []     = argerror
uprintf ('%':c:cs) (u:us) = fmt c u ++ uprintf cs us
uprintf (c:cs)     us     = c : uprintf cs us

fmterror = error "Reached end of format string with args remaining"
argerror = error "Insufficient args for format string"

The fmt function looks at what type of value is to be interpolated, and makes a call to the appropriate function for converting the argument into a string:

fmt :: Char -> UPrintf -> String
fmt 'd' u = asint u
fmt 's' u = asstr u
fmt c   _ = error ("Unrecognized format character " ++ [c])

asint (UInt i)    = show i
asint (UString _) = typeerror "Int"

asstr (UString s) = s
asstr (UInt _)    = typeerror "String"

typeerror t = error ("Type error: expecting " ++ t)

Our simple module only allows for strings and integers to be interpolated into the format string, and doesn’t allow for complicated format strings like "%8d" or "%6.2f". These features can be implemented by adding more cases to fmt – the fundamental architecture that allows for polymorphic, polyvariadic functions is in place.
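The polyvariadic trick stands on its own, independent of formatting. Here is a minimal self-contained sketch of the same idea – a sumOf function (the name is my own) that accepts any number of Int arguments:

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- The return type drives the recursion: an Int result ends it,
-- while a function type consumes one more argument.
class SumRes r where
  sumOf :: Int -> r

instance SumRes Int where
  sumOf acc = acc

instance SumRes r => SumRes (Int -> r) where
  sumOf acc n = sumOf (acc + n)
```

Then sumOf 0 1 2 3 :: Int evaluates the sum of its arguments, with each argument folded into the accumulator just as printf conses each argument onto its stack.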
The question “How do you swap two variables without using a temporary variable?” is a favorite of unimaginative interviewers everywhere. If you haven’t seen it, the typical solution is three bitwise exclusive or (xor) operations in succession:
void swap(int *a, int *b)
{
    *a = *a ^ *b;
    *b = *a ^ *b;
    *a = *a ^ *b;
}

As you can imagine, this is extremely useful in situations where you just can’t spare even one byte of memory to hold a temporary variable.
This is known as the ‘xor trick’. But it’s not the only solution. Here’s another way:
void swap(int *a, int *b)
{
    *a = *a + *b;
    *b = *a - *b;
    *a = *a - *b;
}

That looks a bit different – instead of performing the same operation three times, we perform an addition followed by two subtractions.
Can we describe all possible solutions? Sure – you just need the concept of a mathematical group. A group is a set of values G together with a binary operation (traditionally written as a filled dot ·) that satisfies four axioms:
Closure – for all values a and b in G, their product a · b is also in G.
Associativity – for all a, b and c in G you have (a · b) · c = a · (b · c).
Identity – there is a value e in G such that for every a in G, the equation e · a = a · e = a holds.
Inverses – for every a in G there is an inverse a⁻¹ such that the equations a⁻¹ · a = e = a · a⁻¹ hold.
If the types of a and b form a group, then we can pull off the ‘xor trick’ like this (using the left arrow for assignment):

a ← a · b
b ← a · b⁻¹
a ← b⁻¹ · a

The values of a and b as we perform each assignment can be shown in a table. Initially we’ve set a = c and b = d.

assignment     a       b
(start)        c       d
a ← a · b      c · d   d
b ← a · b⁻¹    c · d   c
a ← b⁻¹ · a    d       c

Note that it’s important that we write b⁻¹ · a in the last assignment and not a · b⁻¹, since you can’t in general assume that the group you’re working with is commutative.
Groups come up a lot in programming, so there are many ways to pull off the variable swapping trick:
It works when a and b are integers, since integers form a group where the operation is addition, the identity is 0 and the inverse of n is −n. Remarkably, we don’t have to worry about integer overflow when adding two machine ints, since n-bit machine ints also form a group under addition modulo 2ⁿ.
It works when a and b have fixed-length binary representations, since bit strings form a group where the operation is bitwise xor (the ^ operator), the identity is the string of all 0s, and every string is its own inverse (this last property is why, when you do the variable swap trick with xor, you end up repeating the same operation three times).
It almost works when a and b are nonzero floats – it’s nearly (but not quite) true that nonzero floats form a group where the operation is multiplication, the identity is 1 and the inverse of x is 1/x. It’s not quite a group, because of issues like overflow and underflow, NaN values, and the non-associativity of floating point multiplication.
It works when a and b are matrices of fixed dimensions, which form a group where the operation is addition, the identity is the zero matrix and the inverse is given by elementwise negation.
It almost works when a and b are square, nonsingular matrices of the same dimension, which form a group where the operation is matrix multiplication, the identity is the identity matrix and inverses are given by the matrix inverse (again, this only works modulo floating point weirdness).
It works when a and b are permutations, which form a group under composition, with the identity given by the do-nothing permutation and inverses given by undoing all the swaps.
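The first two instances can be written as pure let-bindings in Haskell (a sketch; swapAdd and swapXor are names of my own):

```haskell
import Data.Bits (xor)

-- Swap via the additive group of Int:
-- a <- a·b, b <- a·b⁻¹, a <- b⁻¹·a, with · = (+).
swapAdd :: Int -> Int -> (Int, Int)
swapAdd a b =
  let a1 = a + b       -- a <- a · b
      b1 = a1 - b      -- b now holds the original a
      a2 = a1 - b1     -- a now holds the original b
  in (a2, b1)

-- The same trick in the xor group, where every element is its own inverse.
swapXor :: Int -> Int -> (Int, Int)
swapXor a b =
  let a1 = a `xor` b
      b1 = a1 `xor` b  -- original a
      a2 = a1 `xor` b1 -- original b
  in (a2, b1)
```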
So there you go – a multitude of ways to swap two variables without using a temporary variable. With math!
Edit: It was pointed out to me by a commenter on Reddit that you don’t need the full properties of a group. A quasigroup is a set Q with three binary operations – multiplication *, left division \ and right division / – which satisfy the following axioms for all a and b in Q:

a \ (a * b) = b
a * (a \ b) = b
(a * b) / b = a
(a / b) * b = a
Every group is automatically a quasigroup if we define a * b = a · b, a / b = a · b⁻¹ and a \ b = a⁻¹ · b. So quasigroups are a generalization of groups. If we have a quasigroup, then the xor trick can be expressed as

a ← a * b
b ← a / b
a ← b \ a
You can follow through the assignments, using the quasigroup axioms where appropriate, to prove that this really does swap the two variables.
It’s exciting that every Latin square defines a quasigroup, and complete solutions to Sudoku are examples of Latin squares. Therefore every problem of the type “swap two variables without using a temp variable” can be solved using a sufficiently large Sudoku.
Last time I described recursive types, showed how they can be written as algebraic equations, and showed how you can solve the equations to deduce interesting facts about types. In this post I’ll explain what it means to do calculus with types.
The regular Haskell list is a linked list. While linked lists are simple to define and convenient to work with, they have disadvantages. If you want to append an element to a list of length n, it takes O(n) time. Accessing or modifying elements partway through the list is also slow. Linked lists don’t support efficient random access.
You could solve this with an array, but immutable arrays have their own problems – data sharing is harder if you want the elements of the array to be contiguous in memory, for example.
Fortunately, not all applications require random access. Sometimes you just want to keep track of your location within a list, with the ability to move around in the list, possibly modifying elements as you go.
This is exactly the functionality offered by a zipper, which is a data structure with a pointer to a distinguished location, called the focus.
You can write a zipper for any data structure, but I’ll focus on lists.
To store the pointer we could wrap the list up with an Int that stores the location currently in focus. That’s fraught with all kinds of difficulties, though – and it doesn’t meet our requirement of efficient access to the element in focus.
Instead, we define a new type composed of the element in focus, coupled with a pair of lists – one of which contains the elements before the focus, and the other the elements after it. Schematically it looks like this:
The element in focus is in the center, and the two lists on either side. In Haskell we would write:
data Zipper a = Z [a] a [a]
For a zipper to be useful we need to define functions that move the focus left and right. For efficiency reasons we store the first list in reverse order – we need to efficiently add and remove elements from the end of the list, rather than the head:
left (Z (l:ls) c rs) = Z ls l (c:rs)
right (Z ls c (r:rs)) = Z (c:ls) r rs
If you want to read more about zippers, the final chapter of Learn You a Haskell has a great introduction.
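Putting the pieces together gives a small runnable sketch (fromList and focus are helper names of my own, and fromList is partial on the empty list):

```haskell
data Zipper a = Z [a] a [a] deriving (Eq, Show)

-- Focus on the head of a non-empty list.
fromList :: [a] -> Zipper a
fromList (x : xs) = Z [] x xs
fromList []       = error "fromList: empty list"

-- The first list is stored reversed, so both moves are O(1).
left, right :: Zipper a -> Zipper a
left  (Z (l : ls) c rs) = Z ls l (c : rs)
right (Z ls c (r : rs)) = Z (c : ls) r rs

focus :: Zipper a -> a
focus (Z _ c _) = c
```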
The zipper is a product type – it’s the product of two lists of as and a single a. Remembering that lists are L(a) = 1 / (1 - a), the list zipper LZ(a) is written as

LZ(a) = L(a) · a · L(a) = a / (1 - a)²
Because a zipper is a product type, it’s possible to tease it apart, decomposing it into a single piece of data paired with two lists. The two lists tell us where the focus is, and we complete the zipper by filling in the piece of data that’s in focus. Algebraically we have

LZ(a) = a · L(a)²

which is clearly equivalent to the previous expression. The first part of the product is just a piece of data, and we can think of the second as a data structure with a hole. When we ‘fill in’ the hole by pairing it with some data, we get a zipper. This ‘data structure with a hole’ goes by the name of a one-hole context. For the list zipper, it looks like this:
The empty circle is a hole, representing ‘no data’ – it’s a placeholder for data. The type that contains no data is the unit type (), so we can say that the type of a hole is ().
There’s nothing special about list zippers – any zipper can be decomposed into a one-hole context and a piece of data that fills the hole.

I said that you can write a zipper for any data type, so let’s write some zippers for tuples – that is, types of the form \( a^n \) for some whole number n.

The simplest case is a one-tuple, which is just a single piece of data. A zipper for a one-tuple is trivial – there’s only one location for data, so there’s only one focus; the zipper for a one-tuple a is just a.

But remember that a zipper is a piece of data paired with a one-hole context, so what is the one-hole context? There’s only one place to put the hole, and once we’ve put a hole in a single piece of data, we’re just left with the hole! Schematically, the pairing of the data with the one-hole context looks like this:
The type of the one-hole context has to be (), or 1, so that when we pair it with the data a, we get a back again.
In a two-tuple there are two possible locations for the hole. Remembering that a sum represents a choice – either the left constructor or the right constructor – the pairing of data and one-hole context looks like:

Remembering that the type of the hole is 1, the type of the one-hole context is a + a, or 2a.

In a three-tuple there are three possible locations for the hole. The pairing of data and one-hole context looks like:

The type of the one-hole context is a² + a² + a², or 3a².
Perhaps you’ve already seen the pattern. In the left column are the types of n-tuples for n = 1, 2, 3, and in the right column are their corresponding zippers:

a¹   1
a²   2a
a³   3a²

If you’ve studied any calculus, you will immediately spot the pattern – for an n-tuple of type \( a^n \), the type of the zipper is \( n a^{n-1} \), which is exactly the derivative of \( a^n \).
In fact, this pattern always applies! This was noted by Conor McBride in a paper whose title gives the key insight: The Derivative of a Regular Type is its Type of One-Hole Contexts.
If you take a type expression and differentiate it, the result is the type of one-hole contexts for that type. This makes it easy to figure out how to write zippers for arbitrary data types.
I’ll use the notation \( \partial_a \) to mean ‘take the derivative with respect to a’. You may already be familiar with the notation \( d / da\) for differentiation – this is no different, but it has the advantage that it can be typeset on one line!
How far can we push the analogy between taking the derivative of a type and finding its type of onehole contexts? Is it valid to differentiate a type in all contexts, or do we have to be careful? We should certainly check that the operation of differentiation makes sense when applied to the types we already know about, for example.
Here’s a simple rule from calculus – the derivative of any constant c is zero:

\( \partial_a c = 0 \)

To interpret this for types, remember that the derivative operator \( \partial_a \) makes holes in a data structure that contains data of type a. This equation is saying that if a type doesn’t contain any data of type a, then its one-hole context is Void, i.e. you can’t create any instances of it.
The sum rule for derivatives is that for any f(a) and g(a), differentiating the sum is the same as differentiating individually and then summing:

\( \partial_a (f(a) + g(a)) = \partial_a f(a) + \partial_a g(a) \)
This tells us that making a hole in a sum type is equivalent to making a hole in each of the summands, and taking their sum. In more explicit language, if F a and G a are types with one-hole contexts DF a and DG a, then the one-hole context of
type Sum a = Add (F a) (G a)
is
type DSum a = Add (DF a) (DG a)
In calculus, the derivative of a product is given by Leibniz’s product rule:

\( \partial_a (f(a) \cdot g(a)) = \partial_a f(a) \cdot g(a) + f(a) \cdot \partial_a g(a) \)
Interpreted as a statement about types, this tells us that making a hole in the product of two types is equivalent to either making a hole in the first type (leaving the second as it is) or making a hole in the second type (leaving the first as it is). That is, the one-hole context of
type Prod a = (F a, G a)
is
type DProd a = Add (DF a, G a) (F a, DG a)
We haven’t yet met composition. This occurs when you have a data structure F that holds elements of type G a. In Haskell, we could write this as

data Compose f g a = Compose (f (g a))

where f and g have kind * -> * and a has kind *, although it’s simpler to keep the syntax light and just write f (g a).
In calculus the chain rule tells us how to differentiate compositions:

\( \partial_a f(g(a)) = (\partial f)(g(a)) \cdot \partial_a g(a) \)
As a type equation, this says that to make a hole in a layered data structure, we need a product – one half of the product tells us where the hole is in the outer structure, and the other half keeps track of the hole in the inner structure. That is, the one-hole context of
type Comp a = F (G a)
is
type DComp a = (DG a, DF (G a))
I said that knowing that the derivative of an ADT corresponds to its type of one-hole contexts makes it easy to derive zippers for arbitrary data types. Now I’ll justify that assertion.
The type of lists is

\( L(a) = 1 / (1 - a) \)

We can differentiate this using the quotient rule and the chain rule, finding

\( \partial_a L(a) = 1 / (1 - a)^2 = L(a)^2 \)

which says that the one-hole context of a list is a pair of lists, exactly as we saw earlier. To get the list zipper we take the product of an a with the pair of lists L(a)².
Remember that the type of binary trees T(a) is defined by the recursive equation

\( T(a) = 1 + a \cdot T(a)^2 \)

We could solve for T(a) as we did in the previous post, and differentiate the result. However, it’s much easier to use implicit differentiation on the expression above, to get

\( \partial_a T(a) = T(a)^2 + 2 a \cdot T(a) \cdot \partial_a T(a) \)

We can then solve this for \( \partial_a T(a) \), finding

\( \partial_a T(a) = T(a)^2 / (1 - 2 a \cdot T(a)) \)

To make sense of this, remember that the type 1 / (1 - x) is synonymous with the type L(x), of lists of x. So the tree zipper is

\( a \cdot T(a)^2 \cdot L(2 a \cdot T(a)) \)

That is, the one-hole context of a tree is a product of two trees, and a list holding elements of type (Bool, a, Tree a). It looks like this:
The hole is our current location, the focus. Beneath the hole are two trees, which hold the parts of the tree that we haven’t yet traversed. In this picture they are both single-element trees.

The sequence of red nodes leading back to the top of the tree is the past – the elements of the tree that we traversed to end up at the hole. Each node is associated with a Bool, which tells us whether we took the left or the right branch after passing through that node, and a Tree a, which contains all the elements that we missed by not going down the other path.
Exercise: Code up the zipper for binary trees, remembering that a zipper is a one-hole context paired with data. You’ll need to write functions left and right to take the left or right path at a particular node, and up to go back up the tree.
Exercise: Rose trees are trees with arbitrarily many branches from each node. In Haskell you would define them by
data Rose a = Rose a [Rose a]
What does a rose tree zipper look like?
In this post we learned what it means to take the derivative of an ADT, and how to manipulate the type algebra to automatically derive zippers for arbitrary data types.
Next time I’ll talk about assigning a meaning to subtraction and division of types – and I might get on to mentioning combinatorial species.
Last time I introduced the unit type Unit or () and the zero type Void. I also introduced the type operators Add and Mul, and the function type a -> b.
In this post I’ll be using both Haskell notation and mathematical notation. The table below will help you convert back and forth between the two.

Haskell     Math
Void        0
()          1
Add a b     a + b
Mul a b     a · b
a -> b      b^a
In this post I’ll talk about recursive types, and show how you can abuse the algebra of types to deduce interesting things about them.
Let’s ease ourselves in by exploring Maybe a. This is the type that might contain a value of type a, but might also be empty. It’s defined like this:

data Maybe a = Nothing | Just a
The vertical bar indicates that this is a sum type, so we could write it using the type Add that we defined last time, if we first defined the types Nothing and Just a. It would look like this:
data Nothing = Nothing
data Just a = Just a
type Maybe a = Add Nothing (Just a)
See how we’ve replaced the data declaration with a type declaration? That means that Maybe a is no longer a new type – it’s just a synonym for a type we already knew about.
But we can take it further. Notice that Nothing has only one value, so it’s equivalent to (). Similarly, Just a is a container with a single value of type a, so it’s equivalent to a. We therefore have

type Maybe a = Add () a

But that’s the same as saying that Maybe a is the same as 1 + a. All that Maybe does is add one more possible value to a type.
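The claim that Maybe a is 1 + a can be made precise as an isomorphism with Either () a (a sketch; toEither and fromEither are names of my own):

```haskell
-- Maybe a is isomorphic to Either () a, i.e. to 1 + a.
toEither :: Maybe a -> Either () a
toEither Nothing  = Left ()
toEither (Just a) = Right a

fromEither :: Either () a -> Maybe a
fromEither (Left ()) = Nothing
fromEither (Right a) = Just a
```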
The basic list in Haskell is a linked list. A list of as is either empty, denoted [], or it is a cons of a single a onto another list of as, denoted a : as. If we wanted to define lists ourselves, we could write something like

data List a = Nil | Cons a (List a)
Let’s take a second to look at the structure of this declaration. Just like Maybe, the List type is the sum of two simpler types.

The first summand is Nil, a nullary constructor which is equivalent to ().

The second summand is Cons a (List a), which is a product consisting of an a and a list of as.

The algebraic form, if we write lists as L(a), is

L(a) = 1 + a · L(a)
It seems as though we should be able to write the list type in Haskell as
type List a = Add () (a, (List a))
but in fact that won’t compile. The reason is that type synonyms must be fully expanded out by the compiler, and this definition would never finish expanding – it would just keep growing as

Add () (a, Add () (a, Add () (a, ...)))

and so on. The deeper reason that this doesn’t work is to do with the way that Haskell handles recursive type definitions (like most languages with sophisticated type systems, it uses isorecursive rather than equirecursive types).
The way around it is to use a newtype declaration instead of a type declaration, and wrap the type inside a new constructor L:
newtype List a = L (Add () (a, List a))
This is just to satisfy the type checker – when the code is compiled, the additional constructor will be optimized away, and what we’re left with will be identical to the type declaration above.
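To see that the newtype really is the familiar list, we can convert back and forth (a sketch using Either () in place of the post’s Add type, so the block is self-contained):

```haskell
-- The recursive list type, wrapped in a constructor to satisfy the
-- type checker; Either () plays the role of Add ().
newtype List a = L (Either () (a, List a))

toList :: List a -> [a]
toList (L (Left ()))       = []
toList (L (Right (x, xs))) = x : toList xs

fromList :: [a] -> List a
fromList []       = L (Left ())
fromList (x : xs) = L (Right (x, fromList xs))
```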
If I write the type List a as L(a), then this declaration for lists says that

L(a) = 1 + a · L(a)

To see what lists really are, we can start expanding the definition by repeated substitution. In the expansion, whenever I see a type of the form a · a, I’ll replace it with a² to save space.

L(a) = 1 + a · L(a)
     = 1 + a + a² · L(a)
     = 1 + a + a² + a³ · L(a)
     = 1 + a + a² + a³ + ⋯
Exercise: the expression a² can denote a pair (a, a) or a function 2 → a. Show these are the same.
This is telling us that a list of a is either the empty list, or it’s a list containing a single a, or a list of two as, or three as, etc. Maybe you already knew that – but it’s neat that the algebra is telling us!
But now, here is something really cool. Let’s start with our equation for lists, and instead of repeatedly substituting, we’ll forget for a moment that the objects are types and pretend that they’re arbitrary mathematical expressions. That means we’re justified in using whatever techniques we like to “solve for L(a)”.
First subtract a · L(a) from both sides:

L(a) - a · L(a) = 1

Now factor out the L(a) on the left-hand side:

L(a) · (1 - a) = 1

And finally, divide both sides by 1 - a:

L(a) = 1 / (1 - a)
This looks nonsensical, because we don’t know what it means to subtract one type from another, and we certainly don’t know what it means to divide one type by another. But is it telling us something interesting anyway?
If you studied calculus, you may remember that many functions have an expansion as a Taylor series. We can ask Wolfram Alpha what the Taylor series of 1 / (1 - a) is, and it tells us:

1 / (1 - a) = 1 + a + a² + a³ + ⋯

That is, the series expansion of L(a) is exactly what we deduced from our repeated substitutions. Although we abused the algebra horribly, by performing operations that are completely unjustified on types, we still ended up with a sensible answer.
Consider the type of binary trees with values at the nodes. In Haskell you can write this as

data Tree a = Empty | Node a (Tree a) (Tree a)

In a way that is probably becoming familiar, we can look at this as a sum of two types – a nullary type equivalent to (), and a product type. This time it is a product of three terms, but that’s not a problem – we just use a nested product, of the form (a, (b, c)).
The definition in terms of operators that we already know about (again, using a newtype to allow us to have a recursive definition) is
newtype Tree a = T (Add () (a, (Tree a, Tree a)))
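To check that the newtype really does describe the same type, we can write the from and to functions between the two representations - a sketch of my own, re-declaring the Add type from the first post (TreeAlg is a name I’ve invented for the encoded form):

```haskell
data Add a b = AddL a | AddR b

data Tree a = Empty | Node a (Tree a) (Tree a) deriving (Eq, Show)

-- The encoded form: a tree is either () (empty) or a value
-- paired with two subtrees.
newtype TreeAlg a = T (Add () (a, (TreeAlg a, TreeAlg a)))

-- Empty maps to the nullary summand; Node maps to the nested product.
to :: Tree a -> TreeAlg a
to Empty        = T (AddL ())
to (Node x l r) = T (AddR (x, (to l, to r)))

from :: TreeAlg a -> Tree a
from (T (AddL ()))          = Empty
from (T (AddR (x, (l, r)))) = Node x (from l) (from r)
```

Round-tripping any tree through `to` and then `from` gives back the tree you started with, which is what the equivalence claims.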
In algebraic language, with T(a) for the type of trees, we would write \( T(a) = 1 + a \cdot T(a)^2 \)
To get a better handle on what trees are, we could apply repeated substitution in the same way we did for lists, but it would be much messier. Instead, can we apply the trick of rearranging the equation to solve for T(a)?
First, rearrange to get everything on one side of the equation: \( a \cdot T(a)^2 - T(a) + 1 = 0 \)
We can now recognize this as a quadratic equation for T(a), which we can solve using the quadratic formula, to get \( T(a) = \frac{1 - \sqrt{1 - 4a}}{2a} \)
This makes even less sense than the equation for lists. What on earth does it mean to take the square root of a type? But, keeping calm, we again ask Wolfram Alpha for the series expansion, and it tells us \( T(a) = 1 + a + 2a^2 + 5a^3 + 14a^4 + \cdots \)
How do we make sense of this? The first two terms tell us that a tree can either be empty (if it is Empty) or it can have one value of type a (if it is Node a Empty Empty).
Then the next term tells us that a tree can contain two values of type a, in two different ways, while the next term tells us that there can be three values of type a in five different ways.
But that’s exactly what we notice if we enumerate all of the binary trees, grouping them by the number of values they contain. There is one tree with no values and one tree with one value. Then there are two trees with two values and five trees with three values – as in this picture (borrowed from the excellent book Analytic Combinatorics by Flajolet and Sedgewick).
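We can confirm these counts by brute force - a quick sketch (not from the original post) that generates every tree shape with a given number of values. For counting purposes only the shape matters, so the values are omitted:

```haskell
-- A shape-only binary tree is enough for counting: Leaf plays the
-- role of Empty, Branch the role of Node (minus the stored value).
data Shape = Leaf | Branch Shape Shape

-- All tree shapes with exactly n nodes: choose how many of the
-- remaining n-1 nodes go in the left subtree, and recurse.
shapes :: Int -> [Shape]
shapes 0 = [Leaf]
shapes n = [ Branch l r
           | k <- [0 .. n - 1]
           , l <- shapes k
           , r <- shapes (n - 1 - k) ]

shapeCounts :: [Int]
shapeCounts = map (length . shapes) [0 .. 4]
-- [1, 1, 2, 5, 14]: the coefficients in the series expansion of T(a)
```

The counts 1, 1, 2, 5, 14, … are the Catalan numbers, matching the series term by term.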
The equation counts the number of different binary trees that can exist! This counting property is related to the simple value-counting examples we saw in the first post, and also related to combinatorial species, as written about extensively by Brent Yorgey. Species and types have a lot in common, although they are not the same thing.
If we restrict to trees that contain only the unit type, i.e. Tree (), then the a appearing in T(a) is equal to 1, and we can write the defining equation of trees as \( T = 1 + T^2 \)
By playing with the algebra, and repeatedly using the fact that \( T ^ 2 = T - 1 \), we can deduce that \( T ^ 6 = 1 \), which, when interpreted in the language of types, says that a six-tuple of trees is equivalent to the unit type. In other words, there is only one six-tuple of trees.
This is clearly nonsense, so what’s gone wrong? Even more mysterious is that if we just multiply both sides of the equation by T, we get \( T ^ 7 = T \), which isn’t nonsense - it says that a seven-tuple of trees is equivalent to a single tree.
At first glance this isn’t a profound result. Any two types that have a countably infinite number of possible values are equivalent - that’s what being countable means.
However, it’s more subtle than this. In the paper Seven Trees In One, Andreas Blass not only shows how to find an explicit map from seven trees to one and back again (essentially he tells you how to write the from and to functions from the previous post), but he shows that the functions never have to look more than four levels deep into any of the trees.
Finally, he explains why deducing that \( T ^ 7 = T \) is valid, but \( T ^ 6 = 1 \) is not. It turns out that you can turn a derivation like the one I gave above, which uses subtraction, into an ‘honest’ proof that doesn’t use subtraction, and hence is valid for types – but only when the power of T that you start with is one more than a multiple of 6. Hence you can reduce \( T ^ 7 \) to \( T \) (because 7 = 6 + 1) but you can’t reduce \(T ^ 6\) to 1.
It would be fair if you come away thinking that this post has raised more questions than it has supplied answers. For example,
As a teaser for the last question, a mathematical structure that models types is the semiring. That’s just a way of saying that you can add and multiply objects, and you have objects corresponding to 0 and 1. There is a paper by Marcelo Fiore and Tom Leinster that shows that if you can start with a complex number t defined by the equation \( t = p(t) \) for some polynomial p, and deduce \( q_1(t) = q_2(t) \), where \( q_1 \) and \( q_2 \) aren’t constant (they can’t just be 1), then the same result is true for semirings. In particular, it is true for types, and you can find a proof that doesn’t use subtraction, or any other operators that are illegal for types.
Next time, I’ll explain what zippers are, and describe how to do calculus with types.
In this series of posts I’ll explain why Haskell’s data types are called algebraic - without mentioning category theory or advanced math.
The algebra you learned in high school starts with numbers (e.g. 1, 2, 3 …) and operators (e.g. addition and multiplication). The operators give you a way to combine numbers and make new numbers from them. For example, combining 1 and 2 with the operation of addition gives you another number, 3 - a fact that we normally express as \( 1 + 2 = 3 \)
When you get a little older you are introduced to variables (e.g. x, y, z …) which can stand for numbers. Go further still, and you learn about the laws that algebra obeys - laws like \( x \cdot 1 = x \), which hold for all values of x. There are other laws as well, which define properties of numbers or of operations.
When mathematicians talk about algebra, they mean something more general than this. A mathematical algebra has three parts:

- a collection of objects,
- operations that combine existing objects to make new objects, and
- laws that the objects and operations obey.
In high school algebra the objects are numbers and the operations are addition, multiplication and friends.
In the algebra of Haskell types, the objects are types, for example Bool and Int. The operators take types that already exist, and generate new types from them. An example is the type constructor Maybe. It’s not a type itself, but you use it to create types - for example Maybe Bool and Maybe Int, which are types. Another example is Either, which creates a new type from two old types - for example Either Int Bool.
A connection to the more familiar algebra of numbers can be seen by counting the possible values that a type has. Take Bool, defined by

data Bool = False | True
There are two values that an object of type Bool can have – it is either False or True (technically it could also be undefined – a fact that I’m going to ignore for the rest of the post). Loosely, the type Bool corresponds to the number ‘2’ in the algebra of numbers.
If Bool is 2, then what is 1? It should be a type with only one value. In the computer science literature such a type is often called Unit and defined as
data Unit = Unit
In Haskell there is already a type with only one value - it’s called () (pronounced “unit”). You can’t define it yourself, but if you could it would look like
data () = ()
Using this counting analogy, Int corresponds to the number \(2 ^ {32}\), as this is the number of values of type Int (at least, it is on my machine).
In principle we could define types corresponding to 3, 4, 5 and so on. Sometimes we might have a genuine need to do this - for example, the type corresponding to 7 is useful for encoding days of the week. But it would be nicer if we could build up new types from old. This is where the operators of the algebra come in.
A type corresponding to addition is
data Add a b = AddL a | AddR b
That is, the type a + b is a tagged union, holding either an a or a b. To see why this corresponds to addition, we can revisit the counting argument. Let’s say that a is Bool and b is (), so that there are 2 values for a and 1 value for b. How many values of type Add Bool () are there? We can list them out:
addValues = [AddL False, AddL True, AddR ()]
There are three values, and 3 = 2 + 1. This is often called a sum type. In Haskell the sum type is often called Either, defined as
data Either a b = Left a | Right b
but I’ll stick with Add.
A type corresponding to multiplication is
data Mul a b = Mul a b
That is, the type a · b is a container holding both an a and a b. The counting argument justifies the correspondence with multiplication - if we fix a and b to both be Bool, the possible values of the type Mul Bool Bool are
mulValues = [Mul False False, Mul False True, Mul True False, Mul True True]
There are four values, and 4 = 2 × 2. This is often called a product type. In Haskell the product is the pair type:
data (,) a b = (a, b)
but I’ll stick with Mul.
Using addition and multiplication we can generate types corresponding to all the numbers from 1 upwards - but what about 0? That would be a type with no values. It sounds odd, but you can define such a type:
data Void
Notice that there are no constructors in the data definition, so you can’t ever construct a value of type Void – it has zero values, just as we wanted!
What are the laws for the types we’ve just defined? Just like in the algebra of numbers, a law will assert the equality of two objects - in our case, the objects will be types.
However, when I talk about equality, I don’t mean Haskell equality, in the sense of the (==) function. Instead, I mean that the two types are in one-to-one correspondence – that is, when I say that two types a and b are equal, I mean that you could write two functions
from :: a -> b
to :: b -> a
that pair up values of a with values of b, so that the following equations always hold (here the == is genuine, Haskell-flavored equality):
to (from a) == a
from (to b) == b
For example, I contend that the types Bool and Add () () are equivalent. I can demonstrate the equivalence with the following functions:
to :: Bool -> Add () ()
to False = AddL ()
to True = AddR ()
from :: Add () () -> Bool
from (AddL _) = False
from (AddR _) = True
I’ll use the triple equality symbol, ===, to denote this kind of equivalence between types.
Here are two laws for addition:
Add Void a === a
which says that there are as many values of type Add Void a as there are of type a, and
Add a b === Add b a
which says that it doesn’t matter which order you add things in. These laws are probably more familiar to you in the algebra of numbers as \( 0 + x = x \) and \( x + y = y + x \)
If you fancy an exercise, you can demonstrate the correctness of the laws in the Haskell algebra – either with a counting argument, or by writing the functions from and to.
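As a sketch of the second approach, here are from and to for the first law, Add Void a === a. This is my own code, not from the post; it uses GHC’s EmptyCase extension to discharge the branch that can never be reached:

```haskell
{-# LANGUAGE EmptyCase #-}

data Void
data Add a b = AddL a | AddR b

-- Add Void a === a: the AddL branch can never occur, because
-- there are no values of type Void to put in it.
from :: Add Void a -> a
from (AddR x) = x
from (AddL v) = case v of {}  -- empty case: no constructors to match

to :: a -> Add Void a
to = AddR
```

Since every value of Add Void a must be an AddR, the two functions pair values up perfectly, which is exactly what the law asserts.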
There are three useful laws for multiplication:
Mul Void a === Void
which says that if you multiply anything by Void, you get Void back,
Mul () a === a
which says that if you multiply by () you don’t change anything, and
Mul a b === Mul b a
which says that it doesn’t matter which order you multiply in. The more familiar forms of these laws are \( 0 \cdot x = 0 \), \( 1 \cdot x = x \) and \( x \cdot y = y \cdot x \)
Two more exercises: (i) prove the validity of these laws in the Haskell algebra, and (ii) explain why we don’t need laws of the form
Mul a Void === Void
Mul a () === a
There’s also a law that relates the addition and multiplication operators:
Mul a (Add b c) === Add (Mul a b) (Mul a c)
This one is a bit trickier to reason about, but writing the corresponding from and to functions isn’t too hard. The arithmetic version of this law is the friendlier-looking \( x \cdot (y + z) = x \cdot y + x \cdot z \), called the distributive law.
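For instance, here is one way those from and to functions can look - a sketch of my own, using the Add and Mul types defined above:

```haskell
data Add a b = AddL a | AddR b deriving (Eq, Show)
data Mul a b = Mul a b deriving (Eq, Show)

-- Mul a (Add b c) === Add (Mul a b) (Mul a c):
-- pull the tag of the inner sum to the outside.
from :: Mul a (Add b c) -> Add (Mul a b) (Mul a c)
from (Mul x (AddL y)) = AddL (Mul x y)
from (Mul x (AddR z)) = AddR (Mul x z)

to :: Add (Mul a b) (Mul a c) -> Mul a (Add b c)
to (AddL (Mul x y)) = Mul x (AddL y)
to (AddR (Mul x z)) = Mul x (AddR z)
```

Each constructor pattern on the left maps to exactly one on the right, so composing the two functions in either order gives back the original value.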
As well as concrete types like Int and Bool, in Haskell you also have function types, like Int -> Bool or Double -> String. How do these fit into the algebra?
To figure this out we can go back to the counting argument. How many functions of type a → b are there?
Let’s be concrete, and fix a and b to both be Bool. The value False can map to either False or True, and similarly for the value True - thus there are \(2 \cdot 2 = 2 ^ 2 = 4\) possible functions Bool -> Bool. To be really explicit, we could enumerate them:
f1 :: Bool -> Bool   -- equivalent to 'id'
f1 True = True
f1 False = False
f2 :: Bool -> Bool   -- equivalent to 'const False'
f2 _ = False
f3 :: Bool -> Bool   -- equivalent to 'const True'
f3 _ = True
f4 :: Bool -> Bool   -- equivalent to 'not'
f4 True = False
f4 False = True
What happens if b is still Bool (with two values) and a is a type with three values, say

data Trio = First | Second | Third
Then each of First, Second and Third can map to two possible values, and in total there are \(2 \cdot 2 \cdot 2 = 2 ^ 3 = 8\) functions of type Trio -> Bool.
The same argument holds in general. If there are \(A\) values of type a, and \(B\) values of type b, then the number of values of type a → b is \( B ^ A \). This justifies the common terminology for function types as exponential types.
There are two laws for function types that involve the unit type. They are
() -> a === a
which says that there are as many functions () -> a as there are values of type a, and
a -> () === ()
which says that there is only one function a -> () – in particular, it is const (). The arithmetic versions of these laws are \( x ^ 1 = x \) and \( 1 ^ x = 1 \)
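As a sketch, the from and to pair witnessing the first of these laws is short - a function out of () is completely determined by its single output:

```haskell
-- () -> a === a: applying the function to the only possible
-- argument recovers the value; const rebuilds the function.
from :: (() -> a) -> a
from f = f ()

to :: a -> (() -> a)
to = const
```

Round-tripping a value through `to` and then `from` gives it back unchanged, which is the one-to-one correspondence the law claims.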
There is also a law that allows factoring out of common arguments:
(a -> b, a -> c) === a -> (b, c)
whose arithmetic form is \( b ^ a \cdot c ^ a = (b \cdot c) ^ a \)
and a law about functions that return other functions:
a -> (b -> c) === (b, a) -> c
whose arithmetic form is \( (c ^ b) ^ a = c ^ {b \cdot a} \)
This last law may be more familiar when the order of the variables in the pair on the right-hand side is switched, and the parens on the left-hand side are removed:

a -> b -> c === (a, b) -> c
which just says that we can curry and uncurry functions. Again, it’s an interesting exercise to prove all of these laws by writing the corresponding to and from functions.
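For the currying law the witnesses are already in the Prelude - a sketch showing that they are exactly curry and uncurry:

```haskell
-- a -> b -> c === (a, b) -> c, witnessed by the Prelude's
-- curry and uncurry.
to :: ((a, b) -> c) -> (a -> b -> c)
to = curry

from :: (a -> b -> c) -> ((a, b) -> c)
from = uncurry
```

For example, `to fst 1 2` evaluates to `fst (1, 2)`, and `from (+) (2, 3)` evaluates to `2 + 3`.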
In the next post I’ll look at recursive types, like lists and binary trees, and show how you can abuse the algebra of types in all kinds of ways to deduce interesting facts about types.
A common question amongst people learning Haskell is whether I/O is pure or not. Haskell advertises itself as a purely functional programming language, but I/O looks like it’s inherently impure - for example, the function getLine, which gets a line from stdin, returns a different result depending on what the user types:
Prelude> x <- getLine
Hello
Prelude> x
"Hello"
How can this possibly be pure?
In this post I want to explain exactly why I/O in Haskell is pure. I’ll do it by building up data structures that represent blocks of code. These data structures can later be executed, and they cause effects to occur  but until that point we’ll always work with pure functions, never with effects.
Let’s look at a simplified form of I/O, where we only care about reading from stdin, writing to stdout and returning a value. We can model this with the following data type:
data IOAction a = Return a
                | Put String (IOAction a)
                | Get (String -> IOAction a)
That is, an IOAction is one of the following three things:

- a return value of type a,
- a String to be printed to stdout, followed by another IOAction a, or
- a function String -> IOAction a, which can be applied to whatever String is read from stdin.

Notice that the only terminal constructor is Return – that means that any IOAction must be a combination of Get and Put constructors, finally ending in a Return.
Some simple actions include the one that prints to stdout before returning ():

put :: String -> IOAction ()
put s = Put s (Return ())
and the action that reads from stdin and returns the string unchanged:

get :: IOAction String
get = Get Return
To build up a language for doing I/O we need to be able to combine and sequence actions. We want the ability to perform an IOAction a followed by an IOAction b, and return some result.
In fact, we could have the second IOAction depend on the return value of the first one - that is, we need a sequencing combinator of the following type:

seqio :: IOAction a -> (a -> IOAction b) -> IOAction b
We want to take the IOAction a supplied in the first argument, get its return value (which is of type a) and feed that to the function in the second argument, getting an IOAction b out, which can be sequenced with the first IOAction a.
That’s a bit of a mouthful, but writing this combinator isn’t too hard. When we reach the final Return, we apply the function f to get a new action. For the other constructors, we keep the form of the action the same, and just thread seqio through the constructor:

seqio (Return a) f = f a
seqio (Put s io) f = Put s (seqio io f)
seqio (Get g)    f = Get (\s -> seqio (g s) f)
Using seqio we can define the action that gets input from stdin and immediately prints it to the screen:

echo :: IOAction ()
echo = get `seqio` put
or even more complicated actions:

hello :: IOAction ()
hello = put "What is your name?"      `seqio` \_    ->
        get                           `seqio` \name ->
        put "What is your age?"       `seqio` \_    ->
        get                           `seqio` \age  ->
        put ("Hello " ++ name ++ "!") `seqio` \_    ->
        put ("You are " ++ age ++ " years old")
Although this looks like imperative code (admittedly with pretty unpleasant syntax), it’s really a value of type IOAction (). In Haskell, code can be data and data can be code.
In the gist I’ve defined a function to convert an IOAction to a String, which allows them to be printed, so you can load the file into GHCi and verify that hello is in fact just data:
*Main> print hello
Put "What is your name?" (
Get ($0 ->
Put "What is your age?" (
Get ($1 ->
Put "Hello $0!" (
Put "You are $1 years old" (
Return ()
)
)
)
)
)
)
It will surprise no one to learn that IOAction is a monad. In fact we’ve already defined the necessary bind operation in seqio, so getting the Monad instance is trivial:

instance Monad IOAction where
    return = Return
    (>>=)  = seqio
The main benefit of doing this is that we can now sequence actions using Haskell’s do notation, which desugars into calls to (>>=), and hence to seqio. Our earlier hello example can now be written as:

hello :: IOAction ()
hello = do put "What is your name?"
           name <- get
           put "What is your age?"
           age <- get
           put ("Hello " ++ name ++ "!")
           put ("You are " ++ age ++ " years old")
Remember though, that this is still just defining a value of type IOAction () - no code is executed, and no effects occur! So far, this post is 100% pure.
To see the effects, we need to define a function that takes an IOAction a and converts it into a value of type IO a, which can then be executed by the interpreter or the runtime system. It’s easy to write such a function just by turning it into the appropriate calls to putStrLn and getLine:

run :: IOAction a -> IO a
run (Return a) = return a
run (Put s io) = putStrLn s >> run io
run (Get g)    = getLine >>= run . g
You can now load up GHCi and apply run to any action – a value of type IO a will be returned, and then immediately executed by the interpreter:
*Main> run hello
What is your name?
Chris
What is your age?
29
Hello Chris!
You are 29 years old
Is there any practical use to this? Yes - an IOAction is a mini-language for doing I/O. In this mini-language you are restricted to only reading from stdin and writing to stdout - there is no accessing files, spawning threads or network I/O.
In effect we have a “safe” domain-specific language. If a user of your program or library supplies a value of type IOAction a, you know that you are free to convert it to an IO a using run and execute it, and it will never do anything except reading from stdin and writing to stdout (not that those things aren’t potentially dangerous in themselves, but…)