Chris Taylor

Mainly math and haskell

The Algebra of Algebraic Data Types, Part 1

I gave a talk on this subject at London Haskell in November 2012. A video of the talk is on YouTube and slides are on GitHub.

In this series of posts I’ll explain why Haskell’s data types are called algebraic - without mentioning category theory or advanced math.

The algebra you learned in high school starts with numbers (e.g. 1, 2, 3 …) and operators (e.g. addition and multiplication). The operators give you a way to combine numbers and make new numbers from them. For example, combining 1 and 2 with the operation of addition gives you another number, 3 - a fact that we normally express as

$$ 1 + 2 = 3 $$

When you get a little older you are introduced to variables (e.g. x, y, z …) which can stand for numbers. Further still, and you learn about the laws that algebra obeys. Laws like

$$ 0 + x = x $$ $$ 1 \cdot x = x$$

which hold for all values of x. There are other laws as well, which define properties of numbers or of operations.

When mathematicans talk about algebra, they mean something more general than this. A mathematical algebra has three parts:

  • Objects are the ‘things’ of the algebra. The collection of objects defines what we’re talking about.
  • Operations give us ways to combine old things to make new things.
  • Laws are relationships between the objects and the operations.

In high school algebra the objects are numbers and the operations are addition, multiplication and friends.

The algebra of Haskell types

In the algebra of Haskell types, the objects are types, for example Bool and Int. The operators take types that already exist, and generate new types from them. An example is the type constructor Maybe. It’s not a type itself, but you use it to create types - for example Maybe Bool and Maybe Int, which are types. Another example is Either, which creates a new type from two old types - for example Either Int Bool.

Counting

A connection to the more familiar algebra of numbers can be seen by counting the possible values that a type has. Take Bool, defined by

data Bool = False | True

There are two values that an object of type Bool can have – it is either False or True (technically it could also be undefined – a fact that I’m going to ignore for the rest of the post). Loosely, the type Bool corresponds to the number ‘2’ in the algebra of numbers.

If Bool is 2, then what is 1? It should be a type with only one value. In the computer science literature such a type is often called Unit and defined as

data Unit = Unit

In Haskell there is already a type with only one value - it’s called () (pronounced “unit”). You can’t define it yourself, but if you could it would look like

data () = ()

Using this counting analogy, Int corresponds to the number \(2 ^ {32}\), as this is the number of values of type Int (at least, it is on my machine).

Addition

In principle we could types corresponding to 3, 4, 5 and so on. Sometimes we might have a genuine need to do this - for example, the type corresponding to 7 is useful for encoding days of the week. But it would be nicer if we could build up new types from old. This is where the operators of the algebra come in.

A type corresponding to addition is

data Add a b = AddL a | AddR b

That is, the type a + b is a tagged union, holding either an a or a b. To see why this corresponds to addition, we can revisit the counting argument. Let’s say that a is Bool and b is (), so that there are 2 values a and 1 value for b. How many values of type Add Bool () are there? We can list them out:

addValues = [AddL False, AddL True, AddR ()]

There are three values, and 3 = 2 + 1. This is often called a sum type. In Haskell the sum type is often called Either, defined as

data Either a b = Left a | Right b

but I’ll stick with Add.

Multiplication

A type corresponding to multiplication is

data Mul a b = Mul a b

That is, the type a · b is a container holding both an a and a b. The counting argument justifies the correspondence with multiplication - if we fix a and b to both be Bool, the possible values of the type Mul Bool Bool are

mulValues = [Mul False False, Mul False True, Mul True False, Mul True True]

There are four values, and 4 = 2 x 2. This is often called a product type. In Haskell the product is the pair type:

data (,) a b = (a, b)

but I’ll stick with Mul.

Zero

Using addition and multiplication we can generate types corresponding to all the numbers from 1 upwards - but what about 0? That would be a type with no values. It sounds odd, but you can define such a type:

data Void

Notice that there are no constructors in the data definition, so you can’t ever construct a value of type Void – it has zero values, just as we wanted!

Laws in the algebra of Haskell types

What are the laws for the types we’ve just defined? Just like in the algebra of numbers, a law will assert the equality of two objects - in our case, the objects will be types.

However, when I talk about equality, I don’t mean Haskell equality, in the sense of the (==) function. Instead, I mean that the two types are in one-to-one correspondence – that is, when I say that two types a and b are equal, I mean that you could write two functions

from :: a -> b
to   :: b -> a

that pair up values of a with values of b, so that the following equations always hold (here the == is genuine, Haskell-flavored equality):

to (from a) == a
from (to b) == b

For example, I contend that the types Bool and Add () () are equivalent. I can demonstrate the equivalence with the following functions:

to :: Bool -> Add () ()
to False = AddL ()
to True  = AddR ()

from :: Add () () -> Bool
from (AddL _) = False
from (AddR _) = True

I’ll use the triple equality symbol, ===, to denote this kind of equivalence between types.

Laws for sum types

Here are two laws for addition:

Add Void a === a

which says that there are as many values of type Add Void a as there are of type a, and

Add a b === Add b a

which says that it doesn’t matter which order you add things in. These laws are probably more familiar to you in the algebra of numbers as

$$ 0 + x = x $$ $$ x + y = y + x $$

If you fancy an exercise, you can demonstrate the correctness of the laws in the Haskell algebra – either with a counting argument, or by writing the functions from and to.

Laws for product types

There are three useful laws for multiplication:

Mul Void a === Void

which says that if you multiply anything by Void, you get Void back,

Mul () a === a

which says that if you multiply by () you don’t change anything, and

Mul a b === Mul b a

which says that it doesn’t matter which order you multiply in. The more familiar forms of these laws are

$$ 0 \cdot x = 0 $$ $$ 1 \cdot x = x $$ $$ x \cdot y = y \cdot x $$

Two more exercises: (i) prove the validity of these laws in the Haskell algebra, and (ii) explain why we don’t need laws of the form

Mul a Void === Void
Mul a ()   === a

There’s also a law that relates the addition and multiplication operators:

Mul a (Add b c) === Add (Mul a b) (Mul a c)

This one is a bit trickier to reason about, but writing the corresponding from and to functions isn’t too hard. The arithmetic version of this law is the friendlier-looking

$$ a \cdot (b + c) = a \cdot b + a \cdot c $$

called the distributive law.

Function types

As well as concrete types like Int and Bool, in Haskell you also have function types, like Int -> Bool or Double -> String. How do these fit into the algebra?

To figure this out we can go back to the counting argument. How many functions of type a → b are there?

Let’s be concrete, and fix a and b to both be Bool. The value False can map to either False or True, and similarly for the value True - thus there are \(2 \cdot 2 = 2 ^ 2 = 4\) possible functions Bool -> Bool. To be really explicit, we could enumerate them:

f1 :: Bool -> Bool -- equivalent to 'id'
f1 True  = True
f1 False = False

f2 :: Bool -> Bool -- equivalent to 'const False'
f2 _     = False

f3 :: Bool -> Bool -- equivalent to 'const True'
f3 _     = True

f4 :: Bool -> Bool -- equivalent to 'not'
f4 True  = False
f4 False = True

What happens if b is still Bool (with two values) and a is a type with three values, say

data Trio = First | Second | Third

Then each of First, Second and Third can map to two possible values, and in total there are \(2 \cdot 2 \cdot 2 = 2 ^ 3 = 8\) functions of type Trio -> Bool.

The same argument holds in general. If there are \(A\) values of type a, and \(B\) values of type b, then the number of values of type a → b is

$$ B ^ A $$

This justifies the common terminology for function types as exponential types.

Laws for functions

There are two laws for function types that involve the unit type. They are

() -> a === a

which says that there are as many functions () -> a as there are values of type a, and

a -> () === ()

which says that there is only one function a -> () – in particular, it is const (). The arithmetic versions of these laws are

$$ a^1 = a $$ $$ 1^a = 1 $$

There is also a law that allows factoring out of common arguments:

(a -> b, a -> c) === a -> (b,c)

whose arithmetic form is

$$ b^a \cdot c^a = (bc)^a $$

and a law about functions that return other functions:

a -> (b -> c) === (b,a) -> c

whose arithmetic form is

$$ (c ^ b) ^ a = c ^ { b \cdot a } $$

This last law may be more familiar when the order of the variables in the pair on the right-hand side is switched, and the parens on the left hand side are removed:

a -> b -> c === (a,b) -> c

which just says that we can curry and uncurry functions. Again, it’s an interesting exercise to prove all of these laws by writing the corresponding to and from functions.

Next time

In the next post I’ll look at recursive types, like lists and binary trees, and show how you can abuse the algebra of types in all kinds of ways to deduce interesting facts about types.