Fancy options parsing
We'd like to define a nicer interface for our program. While we could manage something
ourselves with getArgs
and pattern matching, it is easier to get good results using a library.
We are going to use a package called
optparse-applicative.
optparse-applicative
provides us with an EDSL (yes, another one) to build
command arguments parsers. Things like commands, switches, and flags can be built
and composed together to make a parser for command-line arguments without actually
writing operations on strings as we did when we wrote our Markup parser, and will
provide other benefits such as automatic generation of usage lines, help screens,
error reporting, and more.
While optparse-applicative
's dependency footprint isn't very large,
it is likely that a user of our library wouldn't need command-line parsing
in this particular case, so it makes sense to add this dependency to the executable
section
(rather than the library
section) in the .cabal
file:
executable hs-blog-gen
import: common-settings
hs-source-dirs: app
main-is: Main.hs
build-depends:
base
+ , optparse-applicative
, hs-blog
ghc-options:
-O
Building a command-line parser
The optparse-applicative package has pretty decent documentation, but we will cover a few important things to pay attention to in this chapter.
In general, there are four important things we need to do:
-
Define our model - we want to define an ADT that describes the various options and commands for our program
-
Define a parser that will produce a value of our model type when run
-
Run the parser on our program arguments input
-
Pattern match on the model and call the right operations according to the options
Define a model
Let's envision our command-line interface for a second; what should it look like?
We want to be able to convert a single file or input stream to either a file or an output stream, or we want to process a whole directory and create a new directory. We can model it in an ADT like this:
data Options
= ConvertSingle SingleInput SingleOutput
| ConvertDir FilePath FilePath
deriving Show
data SingleInput
= Stdin
| InputFile FilePath
deriving Show
data SingleOutput
= Stdout
| OutputFile FilePath
deriving Show
Note that we could technically also use
Maybe FilePath
to encode bothSingleInput
andSingleOutput
, but then we would have to remember whatNothing
means in each context. By creating a new type with properly named constructors for each option, we make it easier for readers of the code to understand the meaning of our code.
In terms of interface, we could decide that when a user would like to convert
a single input source, they would use the convert
command, and supply the optional flags
--input FILEPATH
and --output FILEPATH
to read or write from a file.
When the user does not supply one or both flags, we will read or write from
the standard input/output accordingly.
If the user would like to convert a directory, they can use the convert-dir
command and supply the two mandatory flags --input FILEPATH
and
--output FILEPATH
.
Build a parser
This is the most interesting part of the process. How do we build a parser that fits our model?
The optparse-applicative
library introduces a new type called Parser
.
Parser
, similar to Maybe
and IO
, has the kind * -> *
- when it
is supplied with a saturated (or concrete) type such as Int
, Bool
or
Options
, it can become a saturated type (one that has values).
A Parser a
represents a specification of a command-line options parser
that produces a value of type a
when the command-line arguments are
successfully parsed.
This is similar to how IO a
represents a description of a program
that can produce a value of type a
. The main difference between these
two types is that while we can't convert an IO a
to an a
(we just chain IO operations and have the Haskell runtime execute them),
we can convert a Parser a
to a function that takes a list of strings
representing the program arguments and produces an a
if it manages
to parse the arguments.
As we've seen with the previous EDSLs, this library uses the combinator pattern as well. We need to consider the basic primitives for building a parser and the methods of composing small parsers into bigger parsers.
Let's see an example of a small parser:
inp :: Parser FilePath
inp =
strOption
( long "input"
<> short 'i'
<> metavar "FILE"
<> help "Input file"
)
out :: Parser FilePath
out =
strOption
( long "output"
<> short 'o'
<> metavar "FILE"
<> help "Output file"
)
strOption
is a parser builder. It is a function that takes a combined
option modifiers as an argument, and returns a parser that will parse a string.
We can specify the type to be FilePath
because FilePath
is an
alias to String
. The parser builder describes how to parse the value,
and the modifiers describe its properties, such as the flag name,
the shorthand of the flag name, and how it would be described in the usage
and help messages.
Actually,
strOption
can return any string type that implements the interfaceIsString
. There are a few such types, for example,Text
, a much more efficient Unicode text type from thetext
package. It is more efficient thanString
because whileString
is implemented as a linked list ofChar
,Text
is implemented as an array of bytes.Text
is usually what we should use for text values instead ofString
. We haven't been using it up until now because it is slightly less ergonomic to use thanString
. But it is often the preferred type to use for text!
As you can see, modifiers can be composed using the <>
function,
which means modifiers implement an instance of the Semigroup
type class!
With such an interface, we don't have to supply all the modifier options, only the relevant ones. So if we don't want to have a shortened flag name, we don't have to add it.
Functor
For the data type we've defined, having Parser FilePath
takes us
a good step in the right direction, but it is not exactly what we need
for a ConvertSingle
. We need a Parser SingleInput
and a
Parser SingleOutput
. If we had a FilePath
, we could convert
it into SingleInput
by using the InputFile
constructor.
Remember, InputFile
is also a function:
InputFile :: FilePath -> SingleInput
OutputFile :: FilePath -> SingleOutput
However, to convert a parser, we need functions with these types:
f :: Parser FilePath -> Parser SingleInput
g :: Parser FilePath -> Parser SingleOutput
Fortunately, the Parser
interface provides us with a function to "lift"
a function like FilePath -> SingleInput
to work on parsers, making
it a function with the type Parser FilePath -> Parser SingleInput
.
Of course, this function will work for any input and output,
so if we have a function with the type a -> b
, we can pass it to
that function and get a new function of the type Parser a -> Parser b
.
This function is called fmap
:
fmap :: (a -> b) -> Parser a -> Parser b
-- Or with its infix version
(<$>) :: (a -> b) -> Parser a -> Parser b
We've seen fmap
before in the interface of other types:
fmap :: (a -> b) -> [a] -> [b]
fmap :: (a -> b) -> IO a -> IO b
fmap
is a type class function like <>
and show
. It belongs
to the type class Functor
:
class Functor f where
fmap :: (a -> b) -> f a -> f b
And it has the following laws:
-- 1. Identity law:
-- if we don't change the values, nothing should change
fmap id = id
-- 2. Composition law:
-- Composing the lifted functions is the same a composing
-- them after fmap
fmap (f . g) == fmap f . fmap g
Any type f
that can implement fmap
and follow these laws can be a valid
instance of Functor
.
Notice how
f
has a kind* -> *
, we can infer the kind off
by looking at the other types in the type signature offmap
:
a
andb
have the kind*
because they are used as arguments/return types of functionsf a
has the kind*
because it is used as an argument to a function, thereforef
has the kind* -> *
Let's choose a data type and see if we can implement a Functor
instance.
We need to choose a data type that has the kind * -> *
. Maybe
fits the bill.
We need to implement a function fmap :: (a -> b) -> Maybe a -> Maybe b
.
Here's one very simple (and wrong) implementation:
mapMaybe :: (a -> b) -> Maybe a -> Maybe b
mapMaybe func maybeX = Nothing
Check it out yourself! It compiles successfully! But unfortunately, it does not
satisfy the first law. fmap id = id
means that
mapMaybe id (Just x) == Just x
, however, from the definition, we can
clearly see that mapMaybe id (Just x) == Nothing
.
This is a good example of how Haskell doesn't help us ensure the laws
are satisfied, and why they are important. Unlawful Functor
instances
will behave differently from how we'd expect a Functor
to behave.
Let's try again!
mapMaybe :: (a -> b) -> Maybe a -> Maybe b
mapMaybe func maybeX =
case maybeX of
Nothing -> Nothing
Just x -> Just (func x)
This mapMaybe
will satisfy the functor laws. This can be proved
by doing algebra - if we can do substitution and reach the other side of the
equation in each law, then the law holds.
Functor is a very important type class, and many types implement this interface.
As we know, IO
, Maybe
, []
and Parser
all have the kind * -> *
,
and all allow us to map over their "payload" type.
Often, people try to look for analogies and metaphors to what a type class means, but type classes with funny names like
Functor
don't usually have an analogy or a metaphor that fits them in all cases. It is easier to give up on the metaphor and think about it as it is - an interface with laws.
We can use fmap
on Parser
to make a parser that returns FilePath
to
return a SingleInput
or SingleOutput
instead:
pInputFile :: Parser SingleInput
pInputFile = fmap InputFile parser
where
parser =
strOption
( long "input"
<> short 'i'
<> metavar "FILE"
<> help "Input file"
)
pOutputFile :: Parser SingleOutput
pOutputFile = OutputFile <$> parser -- fmap and <$> are the same
where
parser =
strOption
( long "output"
<> short 'o'
<> metavar "FILE"
<> help "Output file"
)
Applicative
Now that we have two parsers,
pInputFile :: Parser SingleInput
and pOutputFile :: Parser SingleOutput
,
we want to combine them as Options
. Again, if we only had
SingleInput
and SingleOutput
, we could use the constructor ConvertSingle
:
ConvertSingle :: SingleInput -> SingleOutput -> Options
Can we do a similar trick to the one we saw before with fmap
?
Does a function exist that can lift a binary function to work
on Parser
s instead? One with this type signature:
???
:: (SingleInput -> SingleOutput -> Options)
-> (Parser SingleInput -> Parser SingleOutput -> Parser Options)
Yes. This function is called liftA2
, and it is from the Applicative
type class. Applicative
(also known as applicative functor) has three
primary functions:
class Functor f => Applicative f where
pure :: a -> f a
liftA2 :: (a -> b -> c) -> f a -> f b -> f c
(<*>) :: f (a -> b) -> f a -> f b
Applicative
is another very popular type class with many instances.
Just like any Monoid
is a Semigroup
, any Applicative
is a Functor
. This means that any type that wants to implement
the Applicative
interface should also implement the Functor
interface.
Beyond what a regular functor can do, which is to lift a function over
a certain f
, applicative functors allow us to apply a function to
multiple instances of a certain f
, as well as "lift" any value of type a
into an f a
.
You should already be familiar with pure
, we've seen it when we
talked about IO
. For IO
, pure
lets us create an IO
action
with a specific return value without doing IO.
With pure
for Parser
, we can create a Parser
that, when run,
will return a specific value as output without doing any parsing.
liftA2
and <*>
are two functions that can be implemented in
terms of one another. <*>
is actually the more useful one between
the two. Because when combined with fmap
(or rather the infix version <$>
),
it can be used to apply a function with many arguments instead of just two.
To combine our two parsers into one, we can use either liftA2
or
a combination of <$>
and <*>
:
-- with liftA2
pConvertSingle :: Parser Options
pConvertSingle =
liftA2 ConvertSingle pInputFile pOutputFile
-- with <$> and <*>
pConvertSingle :: Parser Options
pConvertSingle =
ConvertSingle <$> pInputFile <*> pOutputFile
Note that both <$>
and <*>
associate to the left,
so we have invisible parenthesis that look like this:
pConvertSingle :: Parser Options
pConvertSingle =
(ConvertSingle <$> pInputFile) <*> pOutputFile
Let's take a deeper look at the types of the sub-expressions we have here to prove that this type-checks:
pConvertSingle :: Parser Options
pInputFile :: Parser SingleInput
pOutputFile :: Parser SingleOutput
ConvertSingle :: SingleInput -> SingleOutput -> Options
(<$>) :: (a -> b) -> Parser a -> Parser b
-- Specifically, here `a` is `SingleInput`
-- and `b` is `SingleOutput -> Options`,
ConvertSingle <$> pInputFile :: Parser (SingleOutput -> Options)
(<*>) :: Parser (a -> b) -> Parser a -> Parser b
-- Specifically, here `a -> b` is `SingleOutput -> Options`
-- so `a` is `SingleOutput` and `b` is `Options`
-- So we get:
(ConvertSingle <$> pInputFile) <*> pOutputFile :: Parser Options
With <$>
and <*>
we can chain as many parsers (or any applicative, really)
as we want. This is because of two things: currying and parametric polymorphism.
Because functions in Haskell take exactly one argument and return exactly one,
any multiple-argument function can be represented as a -> b
.
You can find the laws for the applicative functors in this article called Typeclassopedia, which talks about various useful type classes and their laws.
Applicative functor is a very important concept and will appear in various
parser interfaces (not just for command-line arguments, but also JSON
parsers and general parsers), I/O, concurrency, non-determinism, and more.
The reason this library is called optparse-applicative is because
it uses the Applicative
interface as the main API for
constructing parsers.
Exercise: create a similar interface for the ConvertDir
constructor of Options
.
Solution
pInputDir :: Parser FilePath
pInputDir =
strOption
( long "input"
<> short 'i'
<> metavar "DIRECTORY"
<> help "Input directory"
)
pOutputDir :: Parser FilePath
pOutputDir =
strOption
( long "output"
<> short 'o'
<> metavar "DIRECTORY"
<> help "Output directory"
)
pConvertDir :: Parser Options
pConvertDir =
ConvertDir <$> pInputDir <*> pOutputDir
Alternative
One thing we forgot about is that each input and output for
ConvertSingle
could also potentially use the standard input and output instead.
Up until now, we only offered one option: reading from or writing to a file
by specifying the flags --input
and --output
.
However, we'd like to make these flags optional, and when they are
not specified, use the alternative standard i/o. We can do that by using
the function optional
from Control.Applicative
:
optional :: Alternative f => f a -> f (Maybe a)
optional
works on types which implement instances of the
Alternative
type class:
class Applicative f => Alternative f where
(<|>) :: f a -> f a -> f a
empty :: f a
Alternative
looks very similar to the Monoid
type class,
but it works on applicative functors. This type class isn't
very common and is mostly used for parsing libraries as far as I know.
It provides us with an interface to combine two Parser
s -
if the first one fails to parse, try the other.
It also provides other useful functions such as optional
,
which will help us with our case:
pSingleInput :: Parser SingleInput
pSingleInput =
fromMaybe Stdin <$> optional pInputFile
pSingleOutput :: Parser SingleOutput
pSingleOutput =
fromMaybe Stdout <$> optional pOutputFile
Note that with fromMaybe :: a -> Maybe a -> a
we can extract
the a
out of the Maybe
by supplying a value for the Nothing
case.
Now we can use these more appropriate functions in pConvertSingle
instead:
pConvertSingle :: Parser Options
pConvertSingle =
ConvertSingle <$> pSingleInput <*> pSingleOutput
Commands and subparsers
We currently have two possible operations in our interface,
convert a single source, or convert a directory. A nice interface for
selecting the right operation would be via commands.
If the user would like to convert a single source, they can use
convert
, for a directory, convert-dir
.
We can create a parser with commands with the subparser
and command
functions:
subparser :: Mod CommandFields a -> Parser a
command :: String -> ParserInfo a -> Mod CommandFields a
subparser
takes command modifiers (which can be constructed
with the command
function) as input and produces a Parser
.
command
takes the command name (in our case, "convert" or "convert-dir")
and a ParserInfo a
, and produces a command modifier. As we've seen
before these modifiers have a Monoid
instance, and they can be
composed, meaning that we can append multiple commands to serve as alternatives.
A ParserInfo a
can be constructed with the info
function:
info :: Parser a -> InfoMod a -> ParserInfo a
This function wraps a Parser
with some additional information
such as a helper message, description, and more, so that the program
itself, and each sub-command can print some additional information.
Let's see how to construct a ParserInfo
:
pConvertSingleInfo :: ParserInfo Options
pConvertSingleInfo =
info
(helper <*> pConvertSingle)
(progDesc "Convert a single markup source to html")
Note that helper
adds a helper output screen in case the parser fails.
Let's also build a command:
pConvertSingleCommand :: Mod CommandFields Options
pConvertSingleCommand =
command "convert" pConvertSingleInfo
Try creating a Parser Options
combining the two options with subparser
.
Solution
pOptions :: Parser Options
pOptions =
subparser
( command
"convert"
( info
(helper <*> pConvertSingle)
(progDesc "Convert a single markup source to html")
)
<> command
"convert-dir"
( info
(helper <*> pConvertDir)
(progDesc "Convert a directory of markup files to html")
)
)
ParserInfo
Since we finished building a parser, we should wrap it up in a ParserInfo
and add some information to it to make it ready to run:
opts :: ParserInfo Options
opts =
info (helper <*> pOptions)
( fullDesc
<> header "hs-blog-gen - a static blog generator"
<> progDesc "Convert markup files or directories to html"
)
Running a parser
optparse-applicative
provides a non-IO
interface to parse arguments,
but the most convenient way to use it is to let it take care of fetching
program arguments, try to parse them, and throw errors and help messages in case
it fails. This can be done with the function execParser :: ParserInfo a -> IO a
.
We can place all these options parsing stuff in a new module
and then import it from app/Main.hs
. Let's do that.
Here's what we have up until now:
app/OptParse.hs
-- | Command-line options parsing
module OptParse
( Options(..)
, SingleInput(..)
, SingleOutput(..)
, parse
)
where
import Data.Maybe (fromMaybe)
import Options.Applicative
------------------------------------------------
-- * Our command-line options model
-- | Model
data Options
= ConvertSingle SingleInput SingleOutput
| ConvertDir FilePath FilePath
deriving Show
-- | A single input source
data SingleInput
= Stdin
| InputFile FilePath
deriving Show
-- | A single output sink
data SingleOutput
= Stdout
| OutputFile FilePath
deriving Show
------------------------------------------------
-- * Parser
-- | Parse command-line options
parse :: IO Options
parse = execParser opts
opts :: ParserInfo Options
opts =
info (pOptions <**> helper)
( fullDesc
<> header "hs-blog-gen - a static blog generator"
<> progDesc "Convert markup files or directories to html"
)
-- | Parser for all options
pOptions :: Parser Options
pOptions =
subparser
( command
"convert"
( info
(helper <*> pConvertSingle)
(progDesc "Convert a single markup source to html")
)
<> command
"convert-dir"
( info
(helper <*> pConvertDir)
(progDesc "Convert a directory of markup files to html")
)
)
------------------------------------------------
-- * Single source to sink conversion parser
-- | Parser for single source to sink option
pConvertSingle :: Parser Options
pConvertSingle =
ConvertSingle <$> pSingleInput <*> pSingleOutput
-- | Parser for single input source
pSingleInput :: Parser SingleInput
pSingleInput =
fromMaybe Stdin <$> optional pInputFile
-- | Parser for single output sink
pSingleOutput :: Parser SingleOutput
pSingleOutput =
fromMaybe Stdout <$> optional pOutputFile
-- | Input file parser
pInputFile :: Parser SingleInput
pInputFile = fmap InputFile parser
where
parser =
strOption
( long "input"
<> short 'i'
<> metavar "FILE"
<> help "Input file"
)
-- | Output file parser
pOutputFile :: Parser SingleOutput
pOutputFile = OutputFile <$> parser
where
parser =
strOption
( long "output"
<> short 'o'
<> metavar "FILE"
<> help "Output file"
)
------------------------------------------------
-- * Directory conversion parser
pConvertDir :: Parser Options
pConvertDir =
ConvertDir <$> pInputDir <*> pOutputDir
-- | Parser for input directory
pInputDir :: Parser FilePath
pInputDir =
strOption
( long "input"
<> short 'i'
<> metavar "DIRECTORY"
<> help "Input directory"
)
-- | Parser for output directory
pOutputDir :: Parser FilePath
pOutputDir =
strOption
( long "output"
<> short 'o'
<> metavar "DIRECTORY"
<> help "Output directory"
)
Pattern matching on Options
After running the command-line arguments parser, we can pattern match
on our model and call the right functions. Currently, our program
does not expose this kind of API. So let's go to our src/HsBlog.hs
module and change the API. We can delete main
from that file and
add two new functions instead:
convertSingle :: Html.Title -> Handle -> Handle -> IO ()
convertDirectory :: FilePath -> FilePath -> IO ()
Handle
is an I/O abstraction over file system objects, including stdin
and stdout
.
Before, we used writeFile
and getContents
- these functions either
get a FilePath
to open and work on, or they assume the Handle
is the standard I/O.
We can use the explicit versions that take a Handle
from System.IO
instead:
convertSingle :: Html.Title -> Handle -> Handle -> IO ()
convertSingle title input output = do
content <- hGetContents input
hPutStrLn output (process title content)
We will leave convertDirectory
unimplemented for now and implement it in the next chapter.
In app/Main.hs
, we will need to pattern match on the Options
and
prepare to call the right functions from HsBlog
.
Let's look at our full app/Main.hs
and src/HsBlog.hs
:
app/Main.hs
-- | Entry point for the hs-blog-gen program
module Main where
import OptParse
import qualified HsBlog
import System.Exit (exitFailure)
import System.Directory (doesFileExist)
import System.IO
main :: IO ()
main = do
options <- parse
case options of
ConvertDir input output ->
HsBlog.convertDirectory input output
ConvertSingle input output -> do
(title, inputHandle) <-
case input of
Stdin ->
pure ("", stdin)
InputFile file ->
(,) file <$> openFile file ReadMode
outputHandle <-
case output of
Stdout -> pure stdout
OutputFile file -> do
exists <- doesFileExist file
shouldOpenFile <-
if exists
then confirm
else pure True
if shouldOpenFile
then
openFile file WriteMode
else
exitFailure
HsBlog.convertSingle title inputHandle outputHandle
hClose inputHandle
hClose outputHandle
------------------------------------------------
-- * Utilities
-- | Confirm user action
confirm :: IO Bool
confirm =
putStrLn "Are you sure? (y/n)" *>
getLine >>= \answer ->
case answer of
"y" -> pure True
"n" -> pure False
_ -> putStrLn "Invalid response. use y or n" *>
confirm
src/HsBlog.hs
-- HsBlog.hs
module HsBlog
( convertSingle
, convertDirectory
, process
)
where
import qualified HsBlog.Markup as Markup
import qualified HsBlog.Html as Html
import HsBlog.Convert (convert)
import System.IO
convertSingle :: Html.Title -> Handle -> Handle -> IO ()
convertSingle title input output = do
content <- hGetContents input
hPutStrLn output (process title content)
convertDirectory :: FilePath -> FilePath -> IO ()
convertDirectory = error "Not implemented"
process :: Html.Title -> String -> String
process title = Html.render . convert title . Markup.parse
We need to make a few small changes to the .cabal
file.
First, we need to add the dependency directory
to the executable
,
because we use the library System.Directory
in Main
.
Second, we need to list OptParse
in the list of modules in
the executable
.
executable hs-blog-gen
import: common-settings
hs-source-dirs: app
main-is: Main.hs
+ other-modules:
+ OptParse
build-depends:
base
+ , directory
, optparse-applicative
, hs-blog
ghc-options:
-O
Summary
We've learned about a new fancy library called optparse-applicative
and used it to create a fancier command-line interface in a declarative way.
See the result of running hs-blog-gen --help
(or the equivalent
cabal
/stack
commands we discussed in the last chapter):
hs-blog-gen - a static blog generator
Usage: hs-blog-gen COMMAND
Convert markup files or directories to html
Available options:
-h,--help Show this help text
Available commands:
convert Convert a single markup source to html
convert-dir Convert a directory of markup files to html
Along the way, we've learned two powerful new abstractions, Functor
and Applicative
, as well as revisited an abstraction
called Monoid
. With this library, we've seen another example
of the usefulness of these abstractions for constructing APIs and EDSLs.
We will continue to meet these abstractions in the rest of the book.
Bonus exercise: Add another flag named --replace
to indicate that
if the output file or directory already exists, replacing them is okay.
You can view the git commit of the changes we've made and the code up until now.