Safer HTML construction with types
In this section, we'll learn how to create our own distinguished types for HTML, and how they can help us avoid the invalid construction of HTML strings.
There are a few ways of defining new types in Haskell; in this section, we are going to meet two ways: newtype and type.
newtype
A newtype declaration is a way to define a new, distinct type for an existing set of values. This is useful when we want to reuse existing values but give them a different meaning and ensure we can't mix the two. For example, we can represent seconds, minutes, grams, and yen using integer values, but we don't want to mix grams and seconds accidentally.
In our case, we want to represent structured HTML using textual values, but distinguish them from everyday strings that are not valid HTML.
A newtype declaration looks like this:
newtype <type-name> = <constructor> <existing-type>
For example, in our case, we can define a distinct type for Html like this:
newtype Html = Html String
The first Html, to the left of the equals sign, lives in the types namespace, meaning that you will only see that name to the right of a double-colon sign (::).
The second Html lives in the expressions (or terms/values) namespace, meaning that you will see it where you expect expressions (we'll touch on where exactly that can be in a moment).
The two names, <type-name> and <constructor>, do not have to be the same, but they often are. And note that both have to start with a capital letter.
The right-hand side of the newtype declaration describes the shape of a value of that type. In our case, we expect a value of type Html to have the constructor Html and then an expression of type String, for example: Html "hello" or Html ("hello " <> "world").
You can think of the constructor as a function that takes the argument and returns something of our new type:
Html :: String -> Html
Note: We cannot use an expression of type Html the same way we'd use a String. So "hello " <> Html "world" would fail at type checking.
This is useful when we want encapsulation. We can define and use existing representations and functions for our underlying type, but not mix them with other types that are unrelated to our domain. Similarly, meters and feet can both be numbers, but we don't want to accidentally add feet to meters without any conversion.
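To make the units analogy concrete, here is a minimal sketch. The types Seconds and Grams and the functions addSeconds and secondsToInt are hypothetical names invented for this illustration; they are not part of our HTML library. The pattern on the left of the equals sign, such as (Seconds a), is explained a little later under "Using newtypes".

```haskell
-- Hypothetical unit types for illustration only.
newtype Seconds = Seconds Int
newtype Grams = Grams Int

-- Adds two durations; both arguments must be Seconds.
addSeconds :: Seconds -> Seconds -> Seconds
addSeconds (Seconds a) (Seconds b) = Seconds (a + b)

-- Unwraps the underlying Int so that we can print it.
secondsToInt :: Seconds -> Int
secondsToInt (Seconds n) = n

main :: IO ()
main = do
  print (secondsToInt (addSeconds (Seconds 30) (Seconds 12))) -- prints 42
  -- Uncommenting the next line makes type checking fail,
  -- because a Grams value is not a Seconds value:
  -- print (secondsToInt (addSeconds (Seconds 30) (Grams 12)))
```

Both types wrap an Int, yet the type checker keeps them apart, which is the same separation we want between Html and String.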
For now, let's create a couple of types for our use case. We want two separate types to represent:
- A complete HTML document
- A type for HTML structures such as headings and paragraphs that can go inside the <body> tag
We want them to be distinct because we don't want to mix them.
Solution
newtype Html = Html String
newtype Structure = Structure String
Using newtypes
To use the underlying type that the newtype wraps, we first need to extract it out of the type. We do this using pattern matching.
Pattern matching can be used in two ways, in case-expressions and in function definitions.
- case expressions are kind of beefed up switch expressions and look like this:

  case <expression> of
    <pattern> -> <expression>
    ...
    <pattern> -> <expression>

  The <expression> is the thing we want to unpack, and the pattern is its concrete shape. For example, if we wanted to extract the String out of the type Structure we defined in the exercise above, we do:

  getStructureString :: Structure -> String
  getStructureString struct =
    case struct of
      Structure str -> str

  This way, we can extract the String out of Structure and return it.

  In later chapters we'll introduce data declarations (which are kind of a struct + enum chimera), where we can define multiple constructors to a type. Then the multiple patterns of a case expression will make more sense.

- Alternatively, when declaring a function, we can also use pattern matching on the arguments:

  func <pattern> = <expression>

  For example:

  getStructureString :: Structure -> String
  getStructureString (Structure str) = str

Using the types we created, we can change the HTML functions we've defined before, namely html_, body_, p_, etc., to operate on these types instead of Strings.

But first, let's meet another operator that will make our code more concise.
One very cool thing about newtype is that wrapping and extracting expressions doesn't actually have a performance cost! The compiler knows how to remove any wrapping and extraction of the newtype constructor and use the underlying type.
The new type and the constructor we defined are only there to help us distinguish between the type we created and the underlying type when we write our code; they are not needed when the code is running.
newtypes provide us with type safety with no performance penalty!
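As a small sketch of unwrapping and rewrapping in one step, the function below pattern matches on its argument to reach the inner String, builds a new String, and wraps it again. The name wrapInDiv is hypothetical and only serves this illustration.

```haskell
newtype Structure = Structure String

-- Hypothetical helper: unwrap the String, surround it with <div> tags,
-- and wrap the result back in the Structure constructor.
wrapInDiv :: Structure -> Structure
wrapInDiv (Structure str) = Structure ("<div>" <> str <> "</div>")

-- Extracts the underlying String using a case expression.
getStructureString :: Structure -> String
getStructureString struct =
  case struct of
    Structure str -> str

main :: IO ()
main =
  putStrLn (getStructureString (wrapInDiv (Structure "<p>hi</p>")))
  -- prints <div><p>hi</p></div>
```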
Chaining functions
Another interesting and extremely common operator (which is a regular library function in Haskell) is . (pronounced compose). This operator was made to look like the composition operator you may know from math (∘).
Let's look at its type and implementation:
(.) :: (b -> c) -> (a -> b) -> a -> c
(.) f g x = f (g x)
Compose takes 3 arguments: two functions (named f and g here) and a third argument named x. It then passes the argument x to the second function g and calls the first function f with the result of g x.
Note that g takes as input something of the type a and returns something of the type b, and f takes something of the type b and returns something of the type c.
Another important thing to note is that types that start with a lowercase letter are type variables. Think of them as similar to regular variables. Just like content could be any string, like "hello" or "world", a type variable can be any type: Bool, String, String -> String, etc.
This ability is called parametric polymorphism (other languages often call this generics).
The catch is that type variables must match in a signature, so if, for example, we write a function with the type signature a -> a, the input type and the return type must match, but it could be any type - we cannot know what it is. So the only way to implement a function with that signature is:
id :: a -> a
id x = x
id, short for the identity function, returns the exact value it received. If we tried any other way, for example, returning some made-up value like "hello", or trying to use x as a value of a type we know, like writing x + x, the type checker will complain.
Also, remember that -> is right-associative? The signature of (.) is equivalent to:
(.) :: (b -> c) -> (a -> b) -> (a -> c)
Doesn't it look like a function that takes two functions and returns a third function that is the composition of the two?
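Here is a small sketch of composition on plain numbers before we apply it to HTML. The functions addOne, double, and doubleAfterAddOne are hypothetical helpers made up for this example.

```haskell
-- Hypothetical helpers for illustration only.
addOne :: Int -> Int
addOne n = n + 1

double :: Int -> Int
double n = n * 2

-- Composition runs right to left: addOne first, then double.
doubleAfterAddOne :: Int -> Int
doubleAfterAddOne = double . addOne

main :: IO ()
main = do
  print (doubleAfterAddOne 3)  -- double (addOne 3), prints 8
  print ((addOne . double) 3)  -- addOne (double 3), prints 7
  print (id 3 :: Int)          -- id returns its argument, prints 3
```

Note that doubleAfterAddOne is defined without naming its argument: applying (.) to two functions already gives us the third function.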
We can now use this operator to change our HTML functions. Let's start with one example: p_.
Before, we had:
p_ :: String -> String
p_ = el "p"
And now, we can write:
p_ :: String -> Structure
p_ = Structure . el "p"
The function p_ will take an arbitrary String, which is the content of the paragraph we wish to create, wrap it in <p> and </p> tags, and then wrap it in the Structure constructor to produce the output type Structure (remember: newtype constructors can be used as functions!).
Let's take a deeper look at the types:
Structure :: String -> Structure
el "p" :: String -> String
(.) :: (b -> c) -> (a -> b) -> (a -> c)
Structure . el "p" :: String -> Structure
Let's see why the expression Structure . el "p" type checks, and why its type is String -> Structure.
Type checking with pen and paper
If we want to figure out if and how exactly an expression type-checks, we can do that rather systematically. Let's look at an example where we try and type-check this expression:
p_ = Structure . el "p"
First, we write down the type of the outer-most function. In our case, this is the operator . which has the type:
(.) :: (b -> c) -> (a -> b) -> (a -> c)
After that, we can try to match the type of the arguments we apply to this function with the type of the arguments from the type signature.
In this case, we try to apply two arguments to .:
Structure :: String -> Structure
el "p" :: String -> String
And luckily, . expects two arguments with the types:
b -> c
a -> b
Note: Applying a function with more arguments than it expects is a type error.
Since the . operator takes at least the number of arguments we supply, we continue to the next phase of type-checking: matching the types of the inputs with the types of the expected inputs (from the type signature of the operator).
When we match two types, we check for equivalence between them. There are a few possible scenarios here:
- When the two types are concrete (as opposed to type variables) and simple, like Int and Bool, we check if they are the same. If they are, they type check, and we continue. If they aren't, they don't type check, and we throw an error.
- When the two types we match are more complex (for example, both are functions), we try to match their inputs and outputs (in the case of functions). If the inputs and outputs match, then the two types match.
- There is a special case when one of the types is a type variable - in this case, we treat the matching process like an equation and write it down somewhere. The next time we see this type variable, we replace it with its match in the equation. Think about this like assigning a type variable with a value.
In our case, we want to match (or check the equivalence of) these types:
1. String -> Structure with b -> c
2. String -> String with a -> b
Let's do this one by one, starting with (1) - matching String -> Structure and b -> c:
- Because the two types are complex, we check that they are both functions, and match their inputs and outputs: String with b, and Structure with c.
- Because b is a type variable, we mark down somewhere that b should be equivalent to String. We write b ~ String (we use ~ to denote equivalence).
- We match Structure and c; same as before, we write down that c ~ Structure.
No problem so far; let's try matching String -> String with a -> b:
- The two types are complex; we see that both are functions, so we match their inputs and outputs.
- Matching String with a - we write down that a ~ String.
- Matching String with b - we remember that we have already written about b - looking back, we see that we already noted that b ~ String. We need to replace b with the type that we wrote down before and check it against this type, so we match String with String which, fortunately, type-checks because they are the same.
So far, so good. We've type-checked the expression and discovered the following equivalences about the type variables in it:
a ~ String
b ~ String
c ~ Structure
Now, when asking what is the type of the expression:
p_ = Structure . el "p"
We say that it is the type of . after replacing the type variables using the equations we found and removing the inputs we applied to it. So we started with:
(.) :: (b -> c) -> (a -> b) -> (a -> c)
Then we replaced the type variables:
(.) :: (String -> Structure) -> (String -> String) -> (String -> Structure)
And removed the two arguments when we applied the function:
Structure . el "p" :: String -> Structure
And we got the type of the expression!
Fortunately, Haskell can do this process for us. But when Haskell complains that our types fail to type-check, and we don't understand exactly why, going through this process can help us understand where the types do not match, and then we can figure out how to solve it.
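One way to let the compiler confirm the pen-and-paper result is to write the substituted signature down and see whether it is accepted. The sketch below does this with a hypothetical name, composeP, which is just (.) restricted to the types we derived (a ~ String, b ~ String, c ~ Structure).

```haskell
newtype Structure = Structure String

el :: String -> String -> String
el tag content =
  "<" <> tag <> ">" <> content <> "</" <> tag <> ">"

-- Hypothetical name: (.) with the type variables replaced by our equations.
-- If our derivation were wrong, this definition would not type check.
composeP :: (String -> Structure) -> (String -> String) -> (String -> Structure)
composeP = (.)

-- Applying the two arguments leaves exactly String -> Structure.
p_ :: String -> Structure
p_ = composeP Structure (el "p")

getStructureString :: Structure -> String
getStructureString (Structure str) = str

main :: IO ()
main = putStrLn (getStructureString (p_ "hello")) -- prints <p>hello</p>
```

With these definitions loaded in GHCi, the :type command can also be used to ask for the type of an expression such as Structure . el "p".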
Note: If we use a parametrically polymorphic function more than once, or use different functions that have similar type variable names, the type variables don't have to match in all instances simply because they share a name. Each instance has its own unique set of type variables. For example, consider the following snippet:
incrementChar :: Char -> Char
incrementChar c = chr (ord (id c) + id 1)
where the types for the functions we use are:
id :: a -> a
ord :: Char -> Int
chr :: Int -> Char
In the snippet above, we use id twice (for no good reason other than for demonstration purposes). The first id takes a Char as argument, and its a is equivalent to Char. The second id takes an Int as argument, and its distinct a is equivalent to Int.
This, unfortunately, only applies to functions defined at the top-level. If we defined a local function to be passed as an argument to incrementChar with the same type signature as id, the types must match in all uses. So this code:
incrementChar :: (a -> a) -> Char -> Char
incrementChar func c = chr (ord (func c) + func 1)
will not type check. Try it!
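For readers who want to try this, here is a runnable version of the first snippet. The only addition is the import of chr and ord from Data.Char; the failing variant is kept in a comment so the file still compiles.

```haskell
import Data.Char (chr, ord)

-- Each use of id gets its own type variable:
-- the first is used at Char -> Char, the second at Int -> Int.
incrementChar :: Char -> Char
incrementChar c = chr (ord (id c) + id 1)

-- The variant below is rejected by the type checker if uncommented.
-- The caller picks a single type for a, so func cannot be applied
-- to both a Char and an Int inside the body.
--
-- incrementChar' :: (a -> a) -> Char -> Char
-- incrementChar' func c = chr (ord (func c) + func 1)

main :: IO ()
main = print (incrementChar 'a') -- prints 'b'
```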
Appending Structure
Before, when we wanted to create richer HTML content and appended nodes to one another, we used the append (<>) operator. Since we are now not using String anymore, we need another way to do it.
While it is possible to overload <> using a feature in Haskell called type classes, we will instead create a new function and call it append_, and cover type classes later.
append_ should take two Structures, and return a third Structure, appending the inner String in the first Structure to the second and wrapping the result back in Structure.
Try implementing append_.
Solution
append_ :: Structure -> Structure -> Structure
append_ (Structure a) (Structure b) =
  Structure (a <> b)
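A short usage sketch of append_ follows. It builds the two Structure values by hand with the constructor and uses getStructureString from earlier in this section to look at the result.

```haskell
newtype Structure = Structure String

append_ :: Structure -> Structure -> Structure
append_ (Structure a) (Structure b) =
  Structure (a <> b)

getStructureString :: Structure -> String
getStructureString (Structure str) = str

main :: IO ()
main =
  putStrLn
    ( getStructureString
        (append_ (Structure "<h1>Hi</h1>") (Structure "<p>there</p>"))
    )
  -- prints <h1>Hi</h1><p>there</p>
```

Because append_ takes exactly two arguments, appending three or more structures means nesting the calls, for example append_ a (append_ b c).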
Converting Html back to String
After constructing a valid Html value, we want to be able to print it to the output so we can display it in our browser. For that, we need a function that takes an Html and converts it to a String, which we can then pass to putStrLn.
Implement the render function.
Solution
render :: Html -> String
render html =
  case html of
    Html str -> str
type
Let's look at one more way to give new names to types.
A type definition looks really similar to a newtype definition - the only difference is that we reference the type name directly without a constructor:
type <type-name> = <existing-type>
For example, in our case, we can write:
type Title = String
type, in contrast with newtype, is just a type name alias. When we declare Title as a type alias of String, we mean that Title and String are interchangeable, and we can use one or the other whenever we want:
"hello" :: Title
"hello" :: String
Both are valid in this case.
We can sometimes use types to give a bit more clarity to our code, but they are much less useful than newtypes which allow us to distinguish two types with the same type representation.
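The interchangeability can be seen in a minimal sketch. The names makeTitle and plain are hypothetical and used only here: a function that asks for a Title happily accepts a value declared as a String.

```haskell
type Title = String

-- Hypothetical function that asks for a Title.
makeTitle :: Title -> String
makeTitle title = "<title>" <> title <> "</title>"

-- Declared as a plain String, not as a Title.
plain :: String
plain = "My page"

main :: IO ()
main = putStrLn (makeTitle plain)
-- Accepted, because Title and String are the same type.
-- prints <title>My page</title>
```

Had Title been a newtype, the call makeTitle plain would have been rejected until we wrapped the String in the constructor.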
The rest of the owl
Try changing the code we wrote in previous chapters to use the new types we created.
Tips
We can combine makeHtml and html_, and remove body_, head_, and title_ by calling el directly in html_, which can now have the type Title -> Structure -> Html. This will make our HTML EDSL less flexible but more compact.
Alternatively, we could create newtypes for HtmlHead and HtmlBody and pass those to html_, and we might do that in later chapters, but I've chosen to keep the API a bit simple for now; we can always refactor later!
Solution
-- hello.hs

main :: IO ()
main = putStrLn (render myhtml)

myhtml :: Html
myhtml =
  html_
    "My title"
    ( append_
        (h1_ "Heading")
        ( append_
            (p_ "Paragraph #1")
            (p_ "Paragraph #2")
        )
    )

newtype Html
  = Html String

newtype Structure
  = Structure String

type Title
  = String

html_ :: Title -> Structure -> Html
html_ title content =
  Html
    ( el "html"
        ( el "head" (el "title" title)
            <> el "body" (getStructureString content)
        )
    )

p_ :: String -> Structure
p_ = Structure . el "p"

h1_ :: String -> Structure
h1_ = Structure . el "h1"

el :: String -> String -> String
el tag content =
  "<" <> tag <> ">" <> content <> "</" <> tag <> ">"

append_ :: Structure -> Structure -> Structure
append_ c1 c2 =
  Structure (getStructureString c1 <> getStructureString c2)

getStructureString :: Structure -> String
getStructureString content =
  case content of
    Structure str -> str

render :: Html -> String
render html =
  case html of
    Html str -> str
Are we safe yet?
We have made some progress - now we can't write "Hello" where we'd expect either a paragraph or a heading, but we can still write Structure "hello" and get something that isn't a paragraph or a heading. So while we made it harder for the user to make mistakes by accident, we haven't really been able to enforce the invariants we wanted to enforce in our library.
Next, we'll see how we can make expressions such as Structure "hello" illegal as well using modules and smart constructors.
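The remaining loophole can be seen in a few lines. This sketch reuses the definitions from the solution above; nothing in it is new apart from the two calls in main.

```haskell
newtype Structure = Structure String

el :: String -> String -> String
el tag content =
  "<" <> tag <> ">" <> content <> "</" <> tag <> ">"

p_ :: String -> Structure
p_ = Structure . el "p"

getStructureString :: Structure -> String
getStructureString (Structure str) = str

main :: IO ()
main = do
  -- Built through our function: always wrapped in <p> tags.
  putStrLn (getStructureString (p_ "hello"))        -- prints <p>hello</p>
  -- Built with the constructor directly: type checks, but has no tags.
  putStrLn (getStructureString (Structure "hello")) -- prints hello
```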