Chris Taylor

Mainly math and haskell

Polyvariadic Functions and Printf

Most languages have some form of printf function that lets you print formatted strings. In Python it looks like this

1
print "The %s river is %d miles long" % ("Amazon", 4000)

In C it looks like this

1
printf("The %s river is %d miles long\n", "Amazon", 4000)

What they have in common is that they take a format string as their first argument, followed by a variable number of arguments, of varying type. The arguments are formatted, interpolated into the string and then printed to the screen.

The immediate problem you face if you want to write printf in Haskell is the type system. All functions have a type, and the type determines what arguments it can accept. You can write polymorphic functions that accept a range of input types, but writing a function that accepts a variable number of arguments? That seems tough.

I recently read through the source of the Text.Printf module, and was impressed by how concise and self-contained it is, and especially by the neat trick that allows printf to take a variable number of arguments. In this post I’ll walk through creating a simplified, cut-down version of Text.Printf to give a flavor of the module.

I’ll need the following uncontroversial language extensions:

1
{-# LANGUAGE TypeSynonymInstances, FlexibleInstances #-}

First define a data type to hold things that are possible arguments to printf (in our cut-down module, we will only care about interpolating strings and integers into the format string).

1
2
data UPrintf = String String
             | Integer Integer

With this in hand, we could write the universal printf function, uprintf. This takes a format string and a list of arguments to be interpolated into the string, and returns the formatted string. Its type signature is therefore:

1
uprintf :: String -> [UPrintf] -> String

All printf functionality will eventually be implemented in terms of this function. We’ll hold off on giving its implementation until later in the module, and focus on how we can write the polyvariadic, polymorphic printf function.

We get polymorphism using type classes. Define a type class for types that can be converted into a UPrintf, along with a couple of useful instances:

1
2
3
4
5
6
7
8
class PrintfArg a where
    toUPrintf :: a -> UPrintf

instance PrintfArg String where
    toUPrintf = String

instance PrintfArg Integer where
    toUPrintf = Integer

There are multiple valid return types for the printf function. For example, we might want to return a String, or an IO action that outputs to a file or to stdout, or more generally any instance of MonadIO. So introduce a type class for valid return types:

1
2
class PrintfType t where
    spr :: String -> [UPrintf] -> t

The spr function takes a format string and a list of suitable arguments to be interpolated into the string, and returns something. The key is in what the valid return types are. Certainly, it makes sense to return a string by calling uprintf (we reverse the argument list, for reasons that will become clear in a bit)

1
2
instance PrintfType String where
    spr formatString args = uprintf formatString (reverse args)

It also makes sense to return an I/O action that does some printing

1
2
instance PrintfType (IO ()) where
    spr formatString args = putStrLn (uprintf formatString (reverse args))

Here’s the cool part. If a is an instance of PrintfArg and t is an instance of PrintfType, then a function a -> t is also a valid instance of PrintfType – you should picture it as consuming the a, and outputting another valid return type (which might be another function).

This is one way to implement polyvariadic functions in Haskell. Functions are curried by default so it’s easy to incrementally consume arguments, and return a function that consumes the rest of the arguments.

1
2
instance (PrintfArg a, PrintfType t) => PrintfType (a -> t) where
    spr fmt args a = spr fmt (toUPrintf a : args)

This is why we need to reverse the argument list when we make the final call to uprintf – the arg list acts like a stack, so that the last arguments pushed onto it are the first ones out. We need to reverse the stack at the end so that we format the arguments in the right order.

Now to write printf, just call spr with an empty argument list – the supplied arguments will be consed onto the list by spr until there are none remaining, and uprintf is called:

1
2
printf :: PrintfType t => String -> t
printf formatString = spr formatString []

The rest is just details. The uprintf function incrementally consumes the format string, building the output up as it goes. If it exhausts either the format string or the argument string while the other still has elements remaining, then it throws an error. Otherwise it watches for the % signs that signal a value to be interpolated, and calls the correct formatting function:

1
2
3
4
5
6
7
8
9
uprintf :: String -> [UPrintf] -> String
uprintf ""         []     = ""
uprintf ""         _      = fmterror
uprintf ('%':_:_ ) []     = argerror
uprintf ('%':c:cs) (u:us) = fmt c u ++ uprintf cs us
uprintf ( c :cs   ) us    = c : uprintf cs us

fmterror = error "Reached end of format string with args remaining."
argerror = error "Insufficient args for format string"

The fmt function looks at what type of value is to be interpolated, and makes a call to the appropriate function for converting the argument into a string:

1
2
3
4
5
6
7
8
9
10
11
fmt :: Char -> UPrintf -> String
fmt 'd' u = asint u
fmt 's' u = asstr u

asint (Integer i) = show i
asint (String  _) = typeerror "Integer" "String"

asstr (String  s) = s
asstr (Integer _) = typeerror "String" "Integer"

typeerror t1 t2 = error $ "Type error: expected " ++ t1 ++ ", got " ++ t2 ++ "."

Our simple module only allows for strings and integers to be interpolated into the format string, and doesn’t allow for complicated format strings like "%8d" or "%6.2f". These features can be implemented by adding more cases to fmt – the fundamental architecture that allows for polymorphic, polyvariadic functions is in place.