Handling errors with Either
There are quite a few ways to indicate and handle errors in Haskell. We are going to look at one solution: using the type Either. Either is defined like this:
data Either a b
= Left a
| Right b
Simply put, a value of type Either a b
can contain either a value of type a
,
or a value of type b
.
We can tell them apart from the constructor used.
Left True :: Either Bool b
Right 'a' :: Either a Char
With this type, we can use the
Left
constructor to indicate failure with some error value attached,
and the Right
constructor with one type to represent success with the
expected result.
Since Either
is polymorphic, we can use any two types to represent
failure and success. It is often useful to describe the failure modes
using an ADT.
For example, let's say that we want to parse a Char
as a decimal digit
to an Int
. This operation could fail if the Character is not a digit.
We can represent this error as a data type:
data ParseDigitError
= NotADigit Char
deriving Show
And our parsing function can have the type:
parseDigit :: Char -> Either ParseDigitError Int
Now when we implement our parsing function, we can return Left
on an error
describing the problem, and Right
with the parsed value on successful parsing:
parseDigit :: Char -> Either ParseDigitError Int
parseDigit c =
case c of
'0' -> Right 0
'1' -> Right 1
'2' -> Right 2
'3' -> Right 3
'4' -> Right 4
'5' -> Right 5
'6' -> Right 6
'7' -> Right 7
'8' -> Right 8
'9' -> Right 9
_ -> Left (NotADigit c)
Either a
is also an instance of Functor
and Applicative
,
so we have some combinators to work with if we want to combine these
kinds of computations.
For example, if we had three characters and we wanted to try and parse each of them and then find the maximum between them; we could use the applicative interface:
max3chars :: Char -> Char -> Char -> Either ParseDigitError Int
max3chars x y z =
(\a b c -> max a (max b c))
<$> parseDigit x
<*> parseDigit y
<*> parseDigit z
The Functor
and Applicative
interfaces of Either a
allow us to
apply functions to the payload values and delay the error handling to a
later phase. Semantically, the first Either in order that returns a Left
will be the return value. We can see how this works in the implementation
of the applicative instance:
instance Applicative (Either e) where
pure = Right
Left e <*> _ = Left e
Right f <*> r = fmap f r
At some point, someone will actually want to inspect the result
and see if we get an error (with the Left
constructor) or the expected value
(with the Right
constructor) and they can do that by pattern-matching the result.
Applicative + Traversable
The Applicative
interface of Either
is very powerful and can be combined
with another abstraction called
Traversable
-
for data structures that can be traversed from left to right, like a linked list or a binary tree.
With these, we can combine an unspecified amount of values such as Either ParseDigitError Int
,
as long as they are all in a data structure that implements Traversable
.
Let's see an example:
ghci> :t "1234567"
"1234567" :: String
-- remember, a String is an alias for a list of Char
ghci> :info String
type String :: *
type String = [Char]
-- Defined in ‘GHC.Base’
ghci> :t map parseDigit "1234567"
map parseDigit "1234567" :: [Either ParseDigitError Int]
ghci> map parseDigit "1234567"
[Right 1,Right 2,Right 3,Right 4,Right 5,Right 6,Right 7]
ghci> :t sequenceA
sequenceA :: (Traversable t, Applicative f) => t (f a) -> f (t a)
-- Substitute `t` with `[]`, and `f` with `Either Error` for a specialized version
ghci> sequenceA (map parseDigit "1234567")
Right [1,2,3,4,5,6,7]
ghci> map parseDigit "1a2"
[Right 1,Left (NotADigit 'a'),Right 2]
ghci> sequenceA (map parseDigit "1a2")
Left (NotADigit 'a')
The pattern of doing map
and then sequenceA
is another function called traverse
:
ghci> :t traverse
traverse
:: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
ghci> traverse parseDigit "1234567"
Right [1,2,3,4,5,6,7]
ghci> traverse parseDigit "1a2"
Left (NotADigit 'a')
We can use traverse
on any two types where one implements the Applicative
interface, like Either a
or IO
, and the other implements the Traversable
interface,
like []
(linked lists) and
Map k
(also known as a dictionary in other languages - a mapping from keys to values).
For example, using IO
and Map
. Note that we can construct a Map
data structure
from a list of tuples using the
fromList
function - the first value in the tuple is the key, and the second is the type.
ghci> import qualified Data.Map as M -- from the containers package
ghci> file1 = ("output/file1.html", "input/file1.txt")
ghci> file2 = ("output/file2.html", "input/file2.txt")
ghci> file3 = ("output/file3.html", "input/file3.txt")
ghci> files = M.fromList [file1, file2, file3]
ghci> :t files :: M.Map FilePath FilePath -- FilePath is an alias of String
files :: M.Map FilePath FilePath :: M.Map FilePath FilePath
ghci> readFiles = traverse readFile
ghci> :t readFiles
readFiles :: Traversable t => t FilePath -> IO (t String)
ghci> readFiles files
fromList [("output/file1.html","I'm the content of file1.txt\n"),("output/file2.html","I'm the content of file2.txt\n"),("output/file3.html","I'm the content of file3.txt\n")]
ghci> :t readFiles files
readFiles files :: IO (Map String String)
Above, we created a function readFiles
that will take a mapping from output file path
to input file path and returns an IO operation that, when run will read the input files
and replace their contents right there in the map! Surely this will be useful later.
Multiple errors
Note, since Either
has the kind * -> * -> *
(it takes two type
parameters) Either
cannot be an instance of Functor
or Applicative
:
instances of these type classes must have the
kind * -> *
.
Remember that when we look at a type class function signature like:
fmap :: Functor f => (a -> b) -> f a -> f b
And if we want to implement it for a specific type (in place of the f
),
we need to be able to substitute the f
with the target type. If we'd try
to do it with Either
we would get:
fmap :: (a -> b) -> Either a -> Either b
And neither Either a
or Either b
are saturated, so this won't type check.
For the same reason, if we'll try to substitute f
with, say, Int
, we'll get:
fmap :: (a -> b) -> Int a -> Int b
Which also doesn't make sense.
While we can't use Either
, we can use Either e
, which has the kind
* -> *
. Now let's try substituting f
with Either e
in this signature:
liftA2 :: Applicative => (a -> b -> c) -> f a -> f b -> f c
And we'll get:
liftA2 :: (a -> b -> c) -> Either e a -> Either e b -> Either e c
What this teaches us is that we can only use the applicative interface to
combine two Either
s with the same type for the Left
constructor.
So what can we do if we have two functions that can return different errors? There are a few approaches; the most prominent ones are:
- Make them return the same error type. Write an ADT that holds all possible
error descriptions. This can work in some cases but isn't always ideal.
For example, a user calling
parseDigit
shouldn't be forced to handle a possible case that the input might be an empty string - Use a specialized error type for each type, and when they are composed together,
map the error type of each function to a more general error type. This can
be done with the function
first
from theBifunctor
type class
Monadic interface
The applicative interface allows us to lift a function to work on multiple
Either
values (or other applicative functor instances such as IO
and Parser
).
But more often than not, we'd like to use a value from one computation
that might return an error in another computation that might return an error.
For example, a compiler such as GHC operates in stages, such as lexical analysis, parsing, type-checking, and so on. Each stage depends on the output of the stage before it, and each stage might fail. We can write the types for these functions:
tokenize :: String -> Either Error [Token]
parse :: [Token] -> Either Error AST
typecheck :: AST -> Either Error TypedAST
We want to compose these functions so that they work in a chain. The output of tokenize
goes to parse
, and the output of parse
goes to typecheck
.
We know that we can lift a function over an Either
(and other functors),
we can also lift a function that returns an Either
:
-- reminder the type of fmap
fmap :: Functor f => (a -> b) -> f a -> f b
-- specialized for `Either Error`
fmap :: (a -> b) -> Either Error a -> Either Error b
-- here, `a` is [Token] and `b` is `Either Error AST`:
> fmap parse (tokenize string) :: Either Error (Either Error AST)
While this code compiles, it isn't great, because we are building
layers of Either Error
, and we can't use this trick again with
typecheck
! typecheck
expects an AST
, but if we try to fmap it
on fmap parse (tokenize string)
, the a
will be Either Error AST
instead.
What we would really like is to flatten this structure instead of nesting it.
If we look at the kind of values Either Error (Either Error AST)
could have,
it looks something like this:
Left <error>
Right (Left error)
Right (Right <ast>)
Exercise: What if we just used pattern matching for this instead? What would this look like?
Solution
case tokenize string of
Left err ->
Left err
Right tokens ->
case parse tokens of
Left err ->
Left err
Right ast ->
typecheck ast
If we run into an error in a stage, we return that error and stop. If we succeed, we use the value in the next stage.
Flattening this structure for Either
is very similar to that last part - the body
of the Right tokens
case:
flatten :: Either e (Either e a) -> Either e a
flatten e =
case e of
Left l -> Left l
Right x -> x
Because we have this function, we can now use it on the output of
fmap parse (tokenize string) :: Either Error (Either Error AST)
from before:
> flatten (fmap parse (tokenize string)) :: Either Error AST
And now, we can use this function again to compose with typecheck
:
> flatten (fmap typecheck (flatten (fmap parse (tokenize string)))) :: Either Error TypedAST
This flatten
+ fmap
combination looks like a recurring pattern which
we can combine into a function:
flatMap :: (a -> Either e b) -> Either e a -> Either e b
flatMap func val = flatten (fmap func val)
And now, we can write the code this way:
> flatMap typecheck (flatMap parse (tokenize string)) :: Either Error TypedAST
-- Or using backticks syntax to convert the function to infix form:
> typecheck `flatMap` parse `flatMap` tokenize string
-- Or create a custom infix operator: (=<<) = flatMap
> typeCheck =<< parse =<< tokenize string
This function, flatten
(and flatMap
as well), have different names in Haskell.
They are called
join
and =<<
(pronounced "reverse bind"),
and they are the essence of another incredibly useful abstraction in Haskell.
If we have a type that can implement:
- The
Functor
interface, specifically thefmap
function - The
Applicative
interface, most importantly thepure
function - This
join
function
They can implement an instance of the Monad
type class.
With functors, we were able to "lift" a function to work over the type implementing the functor type class:
fmap :: (a -> b) -> f a -> f b
With applicative functors we were able to "lift" a function of multiple arguments over multiple values of a type implementing the applicative functor type class, and also lift a value into that type:
pure :: a -> f a
liftA2 :: (a -> b -> c) -> f a -> f b -> f c
With monads we can now flatten (or "join" in Haskell terminology) types that implement
the Monad
interface:
join :: m (m a) -> m a
-- this is =<< with the arguments reversed, pronounced "bind"
(>>=) :: m a -> (a -> m b) -> m b
With >>=
, we can write our compilation pipeline from before in a left-to-right
manner, which seems to be more popular for monads:
> tokenize string >>= parse >>= typecheck
We had already met this function before when we talked about IO
. Yes,
IO
also implements the Monad
interface. The monadic interface for IO
helped us with creating a proper ordering of effects.
The essence of the Monad
interface is the join
/>>=
functions, and as we've seen
we can implement >>=
in terms of join
, we can also implement join
in terms
of >>=
(try it!).
The monadic interface can mean very different things for different types. For IO
this
is ordering of effects, for Either
it is early cutoff,
for Logic
this means backtracking computation, etc.
Again, don't worry about analogies and metaphors; focus on the API and the laws.
Hey, did you check the monad laws? left identity, right identity, and associativity? We've already discussed a type class with exactly these laws - the
Monoid
type class. Maybe this is related to the famous quote about monads being just monoids in something something...
Do notation?
Remember the do notation? It turns out it works for any type that is
an instance of Monad
. How cool is that? Instead of writing:
pipeline :: String -> Either Error TypedAST
pipeline string =
tokenize string >>= \tokens ->
parse tokens >>= \ast ->
typecheck ast
We can write:
pipeline :: String -> Either Error TypedAST
pipeline string = do
tokens <- tokenize string
ast <- parse tokens
typecheck ast
And it will work! Still, in this particular case, tokenize string >>= parse >>= typecheck
is so concise it can only be beaten by using
>=>
or
<=<:
(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c
(<=<) :: Monad m => (b -> m c) -> (a -> m b) -> a -> m c
-- compare with function composition:
(.) :: (b -> c) -> (a -> b) -> a -> c
pipeline = tokenize >=> parse >=> typecheck
or
pipeline = typecheck <=< parse <=< tokenize
Haskell's ability to create very concise code using abstractions is great once one is familiar with the abstractions. Knowing the monad abstraction, we are now already familiar with the core composition API of many libraries - for example:
- Concurrent and asynchronous programming
- Web programming
- Testing
- Emulating stateful computation
- sharing environment between computations
- and many more.
Summary
Using Either
for error handling is useful for two reasons:
- We encode possible errors using types, and we force users to acknowledge and handle them, thus making our code more resilient to crashes and bad behaviours
- The
Functor
,Applicative
, andMonad
interfaces provide us with mechanisms for composing functions that might fail (almost) effortlessly - reducing boilerplate while maintaining strong guarantees about our code and delaying the need to handle errors until it is appropriate