Representing the markup language as a Haskell data type
One of the clear differentiators between Haskell (and other languages in the ML family) and most mainstream languages is the ability to represent data precisely and succinctly.
So how do we represent our markup language using Haskell?
Previously, in our HTML builder library, we used newtypes to differentiate between HTML documents, structures, and titles, but we didn't really need to differentiate between different kinds of structures, such as paragraphs and headings (at least not without parsing the data).
In this case, we have a list of structures, and each structure could be one of a few specific options (a paragraph, a heading, a list, etc.), and we want to be able to know which structure is which so we can easily convert it into the equivalent HTML representation.
For that, we have data definitions. Using data, we can create custom types by grouping multiple types together and having alternative structures. Think of them as a combination of both structs and enums.
data declarations look like this:
data <Type-name> <type-args>
  = <Data-constructor1> <types>
  | <Data-constructor2> <types>
  | ...
It looks really similar to newtype, but there are two important differences:
- In the <types> part, we can write many types (like Int, String, or Bool). For newtypes, we can only write one.
- We can have alternative structures using |; newtypes have no alternatives.
This is because newtype is used to provide a type-safe alias, and data is used to build a new composite type that can potentially have alternatives.
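To see both differences side by side, here is a minimal sketch that uses two hypothetical types, which are not part of our markup program. Meters is a newtype wrapping a single Double. Shape is a data type with two alternatives, one of which carries two fields. The deriving (Show, Eq) clauses are there only so the values can be printed and compared in GHCi.

```haskell
-- A newtype: exactly one constructor wrapping exactly one type.
newtype Meters = Meters Double
  deriving (Show, Eq)

-- A data type: several alternatives separated by |,
-- and each constructor may carry several fields.
data Shape
  = Circle Double           -- radius
  | Rectangle Double Double -- width and height
  deriving (Show, Eq)

distance :: Meters
distance = Meters 12.5

shapes :: [Shape]
shapes = [Circle 1.0, Rectangle 2.0 3.0]
```

If you try to give Meters a second field or a second constructor, GHC rejects the newtype declaration. Shape has no such restriction because it is declared with data.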
Let's see a few examples of data types:
- Bool
data Bool = True | False
We created a new data type named Bool with the possible values True or False. In this case, we only have constructor alternatives, and none of the constructors carry additional values. This is similar to enums in other languages.
- Person
data Person
  = Person String Int -- where the first is the name and the second is
                      -- the age
We created a new data type named Person. Values of the type Person look like this:
Person <some-string> <some-int>
For example:
Person "Gil" 32
In this case, we create a composite of multiple types without alternatives. This is similar to structs in other languages, but structs give each field a name, and here we distinguish them by position.
Alternatively, Haskell has syntactic sugar for naming fields, called records. The above definition can also be written like this:
data Person = Person { name :: String , age :: Int }
Values of this type can be written exactly as before:
Person "Gil" 32
Or with this syntax:
Person { name = "Gil", age = 32 }
Haskell will also generate functions that can be used to extract the fields from the composite type:
name :: Person -> String
age :: Person -> Int
These functions can be used like this:
ghci> age (Person { name = "Gil", age = 32 })
32
We even have special syntax for updating specific fields in a record. Of course, we do not update records in place; we generate a new value instead.
ghci> gil = Person { name = "Gil", age = 32 }
ghci> age (gil { age = 33 })
33
ghci> age gil
32
Unfortunately, having specialized functions for each field also means that if we defined a different data type with the field age, the accessor functions that GHC needs to generate would clash. The easiest way to solve this is to give fields unique names, for example by adding a prefix:
data Person = Person { pName :: String , pAge :: Int }
Another way is by using extensions to the Haskell language, which we will cover in later chapters.
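The following minimal sketch puts the record features together. Two record types would otherwise both want fields named name and age. Pet, rex, and celebrateBirthday are hypothetical names made up for this illustration. The prefixes keep the generated accessor functions distinct, and record update syntax produces a new value instead of modifying the old one.

```haskell
-- Prefixed field names: the generated accessors pName, pAge, petName,
-- and petAge do not clash, so both types can live in one module.
data Person = Person
  { pName :: String
  , pAge :: Int
  }
  deriving Show

-- Hypothetical second record type, used only for this illustration.
data Pet = Pet
  { petName :: String
  , petAge :: Int
  }
  deriving Show

gil :: Person
gil = Person { pName = "Gil", pAge = 32 }

rex :: Pet
rex = Pet "Rex" 3 -- positional construction still works for records

-- Record update: returns a new Person with pAge incremented;
-- the Person passed in is not modified.
celebrateBirthday :: Person -> Person
celebrateBirthday person = person { pAge = pAge person + 1 }
```

In GHCi, pAge (celebrateBirthday gil) evaluates to 33, while pAge gil is still 32.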
- Tuple
data Tuple a b = Tuple a b
This is pretty similar to Person, but we can plug in any types we want for this definition. For example:
Tuple "Clicked" True :: Tuple String Bool
Tuple 'a' 'z' :: Tuple Char Char
Haskell's built-in tuple type has the same shape and comes with special syntax:
("Clicked", True) :: (String, Bool)
('a', 'z') :: (Char, Char)
This Tuple definition is polymorphic; we define the structure but are able to plug different types into it to get concrete types. You can think of Tuple as a template for a data type waiting to be filled, or as a function waiting for types as input in order to return a data type. We can even take a look at the "type" signature of Tuple in ghci using the :kind command.
ghci> data Tuple a b = Tuple a b
ghci> :kind Tuple
Tuple :: * -> * -> *
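Because Tuple is polymorphic, functions over it can be polymorphic too. The sketch below is illustrative only. firstOf and swapTuple are hypothetical helpers that are not defined in the text. They work for every choice of a and b, and they extract the fields by matching on the Tuple constructor.

```haskell
-- a and b are type parameters; each use site picks concrete types.
data Tuple a b = Tuple a b
  deriving (Show, Eq)

clicked :: Tuple String Bool
clicked = Tuple "Clicked" True

letters :: Tuple Char Char
letters = Tuple 'a' 'z'

-- Works for any a and b: returns the first field.
firstOf :: Tuple a b -> a
firstOf (Tuple x _) = x

-- Works for any a and b: swaps the fields, which also swaps the type parameters.
swapTuple :: Tuple a b -> Tuple b a
swapTuple (Tuple x y) = Tuple y x
```

For example, firstOf clicked is "Clicked", and swapTuple clicked has the type Tuple Bool String.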
Quick detour: Kinds
The :kind command is called as such because the "type" of a type is called a kind. Kinds can be one of two things: either *, which means a saturated (or concrete) type, such as Int or Person, or an -> of two kinds, which is, as you might have guessed, a type function, taking a kind and returning a kind.
Note that only types that have the kind * can have values. So, for example, while Tuple Int is a valid Haskell concept that has the kind * -> *, and we can write code that will work "generically" for all types that have a certain kind (e.g. * -> *), we cannot construct a value whose type has the kind * -> *. All values have types, and all types that have values have the kind *.
We will talk more about kinds later; let's focus on types for now!
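A short sketch makes the difference between Tuple, Tuple Int, and Tuple Int Bool concrete. IntTuple and example are hypothetical names. The type synonym names the partially applied Tuple Int, which has kind * -> *. That type has no values of its own until we supply one more type.

```haskell
data Tuple a b = Tuple a b
  deriving (Show, Eq)

-- Tuple          :: * -> * -> *   (waiting for two types)
-- Tuple Int      :: * -> *        (waiting for one more type)
-- Tuple Int Bool :: *             (a concrete type with values)

-- A synonym for the partially applied type; IntTuple has kind * -> *.
type IntTuple = Tuple Int

-- Only the fully applied IntTuple Bool (that is, Tuple Int Bool) has values.
example :: IntTuple Bool
example = Tuple 42 True
```

Load this in GHCi and run :kind IntTuple, and it should report IntTuple :: * -> *. A signature such as example :: IntTuple would be rejected with a kind error, because IntTuple still expects one more type argument.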
- Either
data Either a b = Left a | Right b
Similar to Tuple, but instead of having only one constructor, we have two. This means that we can choose which side we want. Here are a couple of values of type Either String Int:
Left "Hello"
Right 17
This type is useful for modeling errors. Either we succeeded and got what we wanted (the Right constructor with the value), or we didn't and got an error instead (the Left constructor with a string or a custom error type).
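Here is a minimal sketch of Either used for errors. safeDiv is a hypothetical helper that is not defined in the text. Instead of crashing on division by zero, it returns the failure as a Left carrying a message and a successful result wrapped in Right.

```haskell
-- Integer division that reports division by zero instead of throwing.
-- Left carries an error message; Right carries the quotient.
safeDiv :: Int -> Int -> Either String Int
safeDiv x y =
  if y == 0
    then Left "division by zero"
    else Right (x `div` y)
```

safeDiv 10 2 evaluates to Right 5 and safeDiv 1 0 evaluates to Left "division by zero". The caller can tell the two outcomes apart by constructor.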
In our program, we use data types to model the different kinds of content types in our markup language. We tag each structure using the data constructor and provide the rest of the information (the paragraph text, the list items, etc.) in the <types> section of the data declaration for each constructor:
type Document
  = [Structure]

data Structure
  = Heading Natural String
  | Paragraph String
  | UnorderedList [String]
  | OrderedList [String]
  | CodeBlock [String]
Note: Natural is defined in the base package but is not exported from Prelude. Find out which module exports Natural by using Hoogle.
Exercises
Represent the following markup documents as values of Document:
-
    Hello, world!
-
    * Welcome

    To this tutorial about Haskell.
-
    Remember that multiple lines with no separation
    are grouped together into a single paragraph
    but list items remain separate.

    # Item 1 of a list
    # Item 2 of the same list
-
    * Compiling programs with ghc

    Running ghc invokes the Glasgow Haskell Compiler (GHC), and can be used to compile Haskell modules and programs into native executables and libraries.

    Create a new Haskell source file named hello.hs, and write the following code in it:

    > main = putStrLn "Hello, Haskell!"

    Now, we can compile the program by invoking ghc with the file name:

    > ➜ ghc hello.hs
    > [1 of 1] Compiling Main ( hello.hs, hello.o )
    > Linking hello ...

    GHC created the following files:

    - hello.hi - Haskell interface file
    - hello.o - Object file, the output of the compiler before linking
    - hello (or hello.exe on Microsoft Windows) - A native runnable executable.

    GHC will produce an executable when the source file satisfies both conditions:

    # Defines the main function in the source file
    # Defines the module name to be Main or does not have a module declaration

    Otherwise, it will only produce the .o and .hi files.
Solutions:
Solution 1
example1 :: Document
example1 =
  [ Paragraph "Hello, world!"
  ]
Solution 2
example2 :: Document
example2 =
  [ Heading 1 "Welcome"
  , Paragraph "To this tutorial about Haskell."
  ]
Solution 3
example3 :: Document
example3 =
  [ Paragraph "Remember that multiple lines with no separation are grouped together into a single paragraph but list items remain separate."
  , OrderedList
    [ "Item 1 of a list"
    , "Item 2 of the same list"
    ]
  ]
Solution 4
example4 :: Document
example4 =
  [ Heading 1 "Compiling programs with ghc"
  , Paragraph "Running ghc invokes the Glasgow Haskell Compiler (GHC), and can be used to compile Haskell modules and programs into native executables and libraries."
  , Paragraph "Create a new Haskell source file named hello.hs, and write the following code in it:"
  , CodeBlock
    [ "main = putStrLn \"Hello, Haskell!\""
    ]
  , Paragraph "Now, we can compile the program by invoking ghc with the file name:"
  , CodeBlock
    [ "➜ ghc hello.hs"
    , "[1 of 1] Compiling Main ( hello.hs, hello.o )"
    , "Linking hello ..."
    ]
  , Paragraph "GHC created the following files:"
  , UnorderedList
    [ "hello.hi - Haskell interface file"
    , "hello.o - Object file, the output of the compiler before linking"
    , "hello (or hello.exe on Microsoft Windows) - A native runnable executable."
    ]
  , Paragraph "GHC will produce an executable when the source file satisfies both conditions:"
  , OrderedList
    [ "Defines the main function in the source file"
    , "Defines the module name to be Main or does not have a module declaration"
    ]
  , Paragraph "Otherwise, it will only produce the .o and .hi files."
  ]
Add a new module named Markup and add the data type definition to it. Note that in this case, we do want to export the constructors of Structure.
Solution
-- Markup.hs

module Markup
  ( Document
  , Structure(..)
  )
  where

import Numeric.Natural

type Document
  = [Structure]

data Structure
  = Heading Natural String
  | Paragraph String
  | UnorderedList [String]
  | OrderedList [String]
  | CodeBlock [String]
Translating directly?
You might ask, "Why do we even need to represent the markup as a type? Why don't we convert it into HTML as soon as we parse it instead?" That's a good question and a valid strategy. The reason we first represent it as a Haskell type is flexibility and modularity.
If the parsing code is coupled with HTML generation, we lose the ability to pre-process the markup document. For example, we might want to present only a small part of the document (for a summary), or create a table of contents from headings. Or maybe we'd like to add other targets besides HTML, such as Markdown or a GUI reader.
Parsing to an "abstract data type" (ADT) representation (one that does not contain the details of the language, for example, '#' for ordered lists) gives us the freedom to do so much more than just conversion to HTML that it's usually worth it, in my opinion, unless you really need to optimize the process.
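As a rough illustration of that flexibility, here is a sketch of the table-of-contents idea that works directly on the Document type, with no HTML involved. tableOfContents and sampleDoc are hypothetical names and are not part of the program we are building. The list comprehension's pattern Heading level title keeps only heading structures and skips all others.

```haskell
import Numeric.Natural (Natural)

type Document = [Structure]

data Structure
  = Heading Natural String
  | Paragraph String
  | UnorderedList [String]
  | OrderedList [String]
  | CodeBlock [String]

-- Collect (level, title) pairs for every heading in the document.
-- Elements that don't match the Heading pattern are skipped.
tableOfContents :: Document -> [(Natural, String)]
tableOfContents doc = [ (level, title) | Heading level title <- doc ]

-- Hypothetical sample document for trying the function out.
sampleDoc :: Document
sampleDoc =
  [ Heading 1 "Compiling programs with ghc"
  , Paragraph "Running ghc invokes the Glasgow Haskell Compiler (GHC)."
  , Heading 2 "Output files"
  , UnorderedList ["hello.hi", "hello.o"]
  ]
```

tableOfContents sampleDoc evaluates to [(1, "Compiling programs with ghc"), (2, "Output files")]. An HTML renderer, a Markdown renderer, or a summary function could consume the same Document value in the same way.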