James Bowen James Bowen

Attoparsec: The Clarity of Do-Syntax

attoparsec.jpg

In last week’s article we completed our look at the Applicative Parsing library. We took all our smaller combinators and put them together to parse our Gherkin syntax. This week, we’ll look at a new library: Attoparsec. Instead of trying to do everything using a purely applicative structure, this library uses a monadic approach. This approach is much more common. It results in syntax that is simpler to read and understand. It will also make it easier for us to add certain features.

To follow along with the code for this article, take a look at the attoparsec branch on Github! For some more excellent ideas about useful libraries, download our Production Checklist! It includes material on libraries for everything from data structures to machine learning!

If you’re new to Haskell, make sure you download our Beginner’s Checklist! It’ll tell you about all the steps you need to take to get started on your Haskell journey!

The Parser Type

In applicative parsing, all our parsers had the type RE Char. This type belonged to the Applicative typeclass but was not a Monad. For Attoparsec, we’ll instead be using the Parser type, a full monad. So in general we’ll be writing parsers with the following types:

featureParser :: Parser Feature
scenarioParser :: Parser Scenario
statementParser :: Parser Statement
exampleTableParser :: Parser ExampleTable
valueParser :: Parser Value

Parsing Values

The first thing we should realize though is that our parser is still an Applicative! So not everything needs to change! We can still make use of operators like *> and <|>. In fact, we can leave our value parsing code almost exactly the same! For instance, the valueParser, nullParser, and boolParser expressions can remain the same:

valueParser :: Parser Value
valueParser =
  nullParser <|>
  boolParser <|>
  numberParser <|>
  stringParser

nullParser :: Parser Value
nullParser =
  (string "null" <|>
  string "NULL" <|>
  string "Null") *> pure ValueNull

boolParser :: Parser Value
boolParser = (trueParser *> pure (ValueBool True)) <|> (falseParser *> pure (ValueBool False))
  where
    trueParser = string "True" <|> string "true" <|> string "TRUE"
    falseParser = string "False" <|> string "false" <|> string "FALSE"

If we wanted, we could make these more "monadic" without changing their structure. For instance, we can use return instead of pure (since they are identical). We can also use >> instead of *> to perform monadic actions while discarding a result. Our value parser for numbers changes a bit, but it gets simpler! The authors of Attoparsec provide a convenient parser for reading scientific numbers:

numberParser :: Parser Value
numberParser = ValueNumber <$> scientific

Then for string values, we’ll use the takeTill combinator to read all the characters until a vertical bar or newline. Then we’ll apply a few text functions to remove the whitespace and get it back to a String. (The Parser monad we’re using parses things as Text rather than String).

stringParser :: Parser Value
stringParser = (ValueString . unpack . strip) <$> 
  takeTill (\c -> c == '|' || c == '\n')

Parsing Examples

As we parse the example table, we’ll switch to a more monadic approach by using do-syntax. First, we establish a cellParser that will read a value within a cell.

cellParser = do
  skipWhile nonNewlineSpace
  val <- valueParser
  skipWhile (not . barOrNewline)
  char '|'
  return val

Each line in our statement refers to a step of the parsing process. So first we skip all the leading whitespace. Then we parse our value. Then we skip the remaining space, and parse the final vertical bar to end the cell. Then we’ll return the value we parsed.

It’s a lot easier to keep track of what’s going on here compared to applicative syntax. It’s not hard to see which parts of the input we discard and which we use. If we don’t assign the value with <- within do-syntax, we discard the value. If we retrieve it, we’ll use it. To complete the exampleLineParser, we parse the initial bar, get many values, close out the line, and then return them:

exampleLineParser :: Parser [Value]
exampleLineParser = do
  char '|'
  cells <- many cellParser
  char '\n'
  return cells
  where
    cellParser = ...

Reading the keys for the table is almost identical. All that changes is that our cellParser uses many letter instead of valueParser. So now we can put these pieces together for our exampleTableParser:

exampleTableParser :: Parser ExampleTable
exampleTableParser = do
  string "Examples:"
  consumeLine
  keys <- exampleColumnTitleLineParser
  valueLists <- many exampleLineParser
  return $ ExampleTable keys (map (zip keys) valueLists)

We read the signal string "Examples:", followed by consuming the line. Then we get our keys and values, and build the table with them. Again, this is much simpler than mapping a function like buildExampleTable like in applicative syntax.

Statements

The Statement parser is another area where we can improve the clarity of our code. Once again, we’ll define two helper parsers. These will fetch the portions outside brackets and then inside brackets, respectively:

nonBrackets :: Parser String
nonBrackets = many (satisfy (\c -> c /= '\n' && c /= '<'))

insideBrackets :: Parser String
insideBrackets = do
  char '<'
  key <- many letter
  char '>'
  return key

Now when we put these together, we can more clearly see the steps of the process outlined in do-syntax. First we parse the “signal” word, then a space. Then we get the “pairs” of non-bracketed and bracketed portions. Finally, we’ll get one last non-bracketed part:

parseStatementLine :: Text -> Parser Statement
parseStatementLine signal = do
  string signal
  char ' '
  pairs <- many ((,) <$> nonBrackets <*> insideBrackets)
  finalString <- nonBrackets
  ...

Now we can define our helper function buildStatement and call it on its own line in do-syntax. Then we’ll return the resulting Statement. This is much easier to read than tracking which functions we map over which sections of the parser:

parseStatementLine :: Text -> Parser Statement
parseStatementLine signal = do
  string signal
  char ' '
  pairs <- many ((,) <$> nonBrackets <*> insideBrackets)
  finalString <- nonBrackets
  let (fullString, keys) = buildStatement pairs finalString
  return $ Statement fullString keys
  where
    buildStatement 
      :: [(String, String)] -> String -> (String, [String])
    buildStatement [] last = (last, [])
    buildStatement ((str, key) : rest) rem =
      let (str', keys) = buildStatement rest rem
      in (str <> "<" <> key <> ">" <> str', key : keys)

Scenarios and Features

As with applicative parsing, it’s now straightforward for us to finish everything off. To parse a scenario, we read the keyword, consume the line to read the title, and read the statements and examples:

scenarioParser :: Parser Scenario
scenarioParser = do
  string "Scenario: "
  title <- consumeLine
  statements <- many (parseStatement <* char '\n')
  examples <- (exampleTableParser <|> return (ExampleTable [] []))
  return $ Scenario title statements examples

Again, we provide an empty ExampleTable as an alternative if there are no examples. The parser for Background looks very similar. The only difference is we ignore the result of the line and instead use Background as the title string.

backgroundParser :: Parser Scenario
backgroundParser = do
  string "Background:"
  consumeLine
  statements <- many (parseStatement <* char '\n')
  examples <- (exampleTableParser <|> return (ExampleTable [] []))
  return $ Scenario "Background" statements examples

Finally, we’ll put all this together as a feature. We read the title, get the background if it exists, and read our scenarios:

featureParser :: Parser Feature
featureParser = do
  string "Feature: "
  title <- consumeLine
  maybeBackground <- optional backgroundParser
  scenarios <- many scenarioParser
  return $ Feature title maybeBackground scenarios

Feature Description

One extra feature we’ll add now is that we can more easily parse the “description” of a feature. We omitted them in applicative parsing, as it’s a real pain to implement. It becomes much simpler when using a monadic approach. The first step we have to take though is to make one parser for all the main elements of our feature. This approach looks like this:

featureParser :: Parser Feature
featureParser = do
  string "Feature: "
  title <- consumeLine
  (description, maybeBackground, scenarios) <- parseRestOfFeature
  return $ Feature title description maybeBackground scenarios

parseRestOfFeature :: Parser ([String], Maybe Scenario, [Scenario])
parseRestOfFeature = ...

Now we’ll use a recursive function that reads one line of the description at a time and adds to a growing list. The trick is that we’ll use the choice combinator offered by Attoparsec.

We’ll create two parsers. The first assumes there are no further lines of description. It attempts to parse the background and scenario list. The second reads a line of description, adds it to our growing list, and recurses:

parseRestOfFeature :: Parser ([String], Maybe Scenario, [Scenario])
parseRestOfFeature = parseRestOfFeatureTail []
  where
    parseRestOfFeatureTail prevDesc = do
      (fullDesc, maybeBG, scenarios) <- choice [noDescriptionLine prevDesc, descriptionLine prevDesc]
      return (fullDesc, maybeBG, scenarios)

So we’ll first try to run this noDescriptionLineParser. It will try to read the background and then the scenarios as we’ve always done. If it succeeds, we know we’re done. The argument we passed is the full description:

where
  noDescriptionLine prevDesc = do
    maybeBackground <- optional backgroundParser
    scenarios <- some scenarioParser
    return (prevDesc, maybeBackground, scenarios)

Now if this parser fails, we know that it means the next line is actually part of the description. So we’ll write a parser to consume a full line, and then recurse:

descriptionLine prevDesc = do
  nextLine <- consumeLine
  parseRestOfFeatureTail (prevDesc ++ [nextLine])

And now we’re done! We can parse descriptions!

Conclusion

That wraps up our exploration of Attoparsec. Come back next week where we’ll finish this series off by learning about Megaparsec. We’ll find that it’s syntactically very similar to Attoparsec with a few small exceptions. We’ll see how we can use some of the added power of monadic parsing to enrich our syntax.

To learn more about cool Haskell libraries, be sure to check out our Production Checklist! It’ll tell you a little bit about libraries in all kinds of areas like databases and web APIs.

If you’ve never written Haskell at all, download our Beginner’s Checklist! It’ll give you all the resources you need to get started on your Haskell journey!

Read More
James Bowen James Bowen

Applicative Parsing II: Putting the Pieces Together

applicative_parsing_2.png

In last week’s article, we introduced the Applicative parsing library. We learned about the RE type and the basic combinators like sym and string. We saw how we could combine those together with applicative functions like many and <*> to parse strings into data structures. This week, we’ll put these pieces together in an actual parser for our Gherkin syntax. To follow along with the code examples, check out Parser.hs on the Github repository.

Starting next week, we’ll explore some other parsing libraries, starting with Attoparsec. For a little more information about those and many other libraries, download our Production Checklist! It summarizes many libraries on topics from databases to Web APIs.

If you’ve never written Haskell at all, get started! Download our free Beginners Checklist!.

Value Parser

In keeping with our approach from the last article, we’re going to start with smaller elements of our syntax. Then we can use these to build larger ones with ease. To that end, let’s build a parser for our Value type, the most basic data structure in our syntax. Let’s recall what that looks like:

data Value =
ValueNull |
ValueBool Bool |
ValueString String |
ValueNumber Scientific

Since we have different constructors, we’ll make a parser for each one. Then we can combine them with alternative syntax:

valueParser :: RE Char Value
valueParser =
  nullParser <|>
  boolParser <|>
  numberParser <|>
  stringParser

Now our parsers for the null values and boolean values are easy. For each of them, we’ll give a few different options about what strings we can use to represent those elements. Then, as with the larger parser, we’ll combine them with <|>.

nullParser :: RE Char Value
nullParser =
  (string “null” <|>
  string “NULL” <|>
  string “Null”) *> pure ValueNull

boolParser :: RE Char Value
boolParser =
  trueParser *> pure (ValueBool True) <|> 
  falseParser *> pure (ValueBool False)
  where
    trueParser = string “True” <|> string “true” <|> string “TRUE”
    falseParser = string “False” <|> string “false” <|> string “FALSE”

Notice in both these cases we discard the actual string with *> and then return our constructor. We have to wrap the desired result with pure.

Number and String Values

Numbers and strings are a little more complicated since we can’t rely on hard-coded formats. In the case of numbers, we’ll account for integers, decimals, and negative numbers. We'll ignore scientific notation for now. An integer is simple to parse, since we’ll have many characters that are all numbers. We use some instead of many to enforce that there is at least one:

numberParser :: RE Char Value
numberPaser = …
  where
    integerParser = some (psym isNumber)

A decimal parser will read some numbers, then a decimal point, and then more numbers. We'll insist there is at least one number after the decimal point.

numberParser :: RE Char Value
numberPaser = …
  where
    integerParser = some (psym isNumber)
    decimalParser = 
      many (psym isNumber) <*> sym ‘.’ <*> some (psym isNumber)

Finally, for negative numbers, we’ll read a negative symbol and then one of the other parsers:

numberParser :: RE Char Value
numberPaser = …
  where
    integerParser = some (psym isNumber)
    decimalParser = 
      many (psym isNumber) <*> sym ‘.’ <*> some (psym isNumber)
    negativeParser = sym ‘-’ <*> (decimalParser <|> integerParser)

However, we can’t combine these parsers as is! Right now, they all return different results! The integer parser returns a single string. The decimal parser returns two strings and the decimal character, and so on. In general, we’ll want to combine each parser's results into a single string and then pass them to the read function. This requires mapping a couple functions over our last two parsers:

numberParser :: RE Char Value
numberPaser = …
  where
    integerParser = some (psym isNumber)
    decimalParser = combineDecimal <$> 
      many (psym isNumber) <*> sym ‘.’ <*> some (psym isNumber)
    negativeParser = (:) <$> 
      sym ‘-’ <*> (decimalParser <|> integerParser)

    combineDecimal :: String -> Char -> String -> String
    combineDecimal base point decimal = base ++ (point : decimal)

Now all our number parsers return strings, so we can safely combine them. We'll map the ValueNumber constructor over the value we read from the string.

numberParser :: RE Char Value
numberPaser = (ValueNumber . read) <$>
  (negativeParser <|> decimalParser <|> integerParser)
  where
    ...

Note that order matters! If we put the integer parser first, we’ll be in trouble! If we encounter a decimal, the integer parser will greedily succeed and parse everything before the decimal point. We'll either lose all the information after the decimal, or worse, have a parse failure.

The last thing we need to do is read a string. We need to read everything in the example cell until we hit a vertical bar, but then ignore any whitespace. Luckily, we have the right combinator for this, and we’ve even written a trim function already!

stringParser :: RE Char Value
stringParser = (ValueString . trim) <$> readUntilBar

And now our valueParser will work as expected!

Building an Example Table

Now that we can parse individual values, let’s figure out how to parse the full example table. We can use our individual value parser to parse a whole line of values! The first step is to read the vertical bar at the start of the line.

exampleLineParser :: RE Char [Value]
exampleLineParser = sym ‘|’ *> ...

Next, we’ll build a parser for each cell. It will read the whitespace, then the value, and then read up through the next bar.

exampleLineParser :: RE Char [Value]
exampleLineParser = sym ‘|’ *> ...
  where
    cellParser = 
      many isNonNewlineSpace *> valueParser <* readThroughBar

isNonNewlineSpace :: RE Char Char
isNonNewlineSpace = psym (\c -> isSpace c && c /= ‘\n’)

Now we read many of these and finish by reading the newline:

exampleLineParser :: RE Char [Value]
exampleLineParser = 
  sym ‘|’ *> many cellParser <* readThroughEndOfLine
  where
    cellParser = 
      many isNonNewlineSpace *> valueParser <* readThroughBar

Now, we need a similar parser that reads the title column of our examples. This will have the same structure as the value cells, only it will read normal alphabetic strings instead of values.

exampleColumnTitleLineParser :: RE Char [String]
exampleColumnTitleLineParser = sym ‘|’ *> many cellParser <* readThroughEndOfLine
  where
    cellParser = 
      many isNonNewlineSpace *> many (psym isAlpha) <* readThroughBar

Now we can start building the full example parser. We’ll want to read the string, the column titles, and then the value lines.

exampleTableParser :: RE Char ExampleTable
exampleTableParser =
  (string “Examples:” *> readThroughEndOfLine) *>
  exampleColumnTitleLineParser <*>
  many exampleLineParser

We’re not quite done yet. We’ll need to apply a function over these results that will produce the final ExampleTable. And the trick is that we want to map up the example keys with their values. We can accomplish this with a simple function. It will return zip the keys over each value list using map:

exampleTableParser :: RE Char ExampleTable
exampleTableParser = buildExampleTable <$>
  (string “Examples:” *> readThroughEndOfLine) *>
  exampleColumnTitleLineParser <*>
  many exampleLineParser
  where
    buildExampleTable :: [String] -> [[Value]] -> ExampleTable
    buildExampleTable keys valueLists = ExampleTable keys (map (zip keys) valueLists)

Statements

Now we that we can parse the examples for a given scenario, we need to parse the Gherkin statements. To start with, let’s make a generic parser that takes the keyword as an argument. Then our full parser will try each of the different statement keywords:

parseStatementLine :: String -> RE Char Statement
parseStatementLine signal = …

parseStatement :: RE Char Statement
parseStatement =
  parseStatementLine “Given” <|>
  parseStatementLine “When” <|>
  parseStatementLine “Then” <|>
  parseStatementLine “And”

Now we’ll get the signal word out of the way and parse the statement line itself.

parseStatementLine :: String -> RE Char Statement
parseStatementLine signal = string signal *> sym ' ' *> ...

Parsing the statement is tricky. We want to parse the keys inside brackets and separate them as keys. But we also want them as part of the statement’s string. To that end, we’ll make two helper parsers. First, nonBrackets will parse everything in a string up through a bracket (or a newline).

nonBrackets :: RE Char String
nonBrackets = many (psym (\c -> c /= ‘\n’ && c /= ‘<’))

We’ll also want a parser that parses the brackets and returns the keyword inside:

insideBrackets :: RE Char String
insideBrackets = sym ‘<’ *> many (psym (/= ‘>’)) <* sym ‘>’

Now to read a statement, we start with non-brackets, and alternate with keys in brackets. Let's observe that we start and end with non-brackets, since they can be empty. Thus we can represent a line a list of non-bracket/bracket pairs, followed by a last non-bracket part. To make a pair, we combine the parser results in a tuple using the (,) constructor enabled by TupleSections:

parseStatementLine :: String -> RE Char Statement
parseStatementLine signal = string signal *> sym ‘ ‘ *>
  many ((,) <$> nonBrackets <*> insideBrackets) <*> nonBrackets

From here, we need a recursive function that will build up our final statement string and the list of keys. We do this with buildStatement.

parseStatementLine :: String -> RE Char Statement
parseStatementLine signal = string signal *> sym ‘ ‘ *>
  (buildStatement <$> 
    many ((,) <$> nonBrackets <*> insideBrackets) <*> nonBrackets)
  where
    buildStatement :: 
      [(String, String)] -> String -> (String, [String])
    buildStatement [] last = (last, [])
    buildStatement ((str, key) : rest) rem =
      let (str', keys) = buildStatement rest rem
      in (str <> "<" <> key <> ">" <> str', key : keys)

The last thing we need is a final helper that will take the result of buildStatement and turn it into a Statement. We’ll call this finalizeStatement, and then we’re done!

parseStatementLine :: String -> RE Char Statement
parseStatementLine signal = string signal *> sym ‘ ‘ *>
  (finalizeStatement . buildStatement <$> 
    many ((,) <$> nonBrackets <*> insideBrackets) <*> nonBrackets)
  where
    buildStatement :: 
      [(String, String)] -> String -> (String, [String])
    buildStatement [] last = (last, [])
    buildStatement ((str, key) : rest) rem =
      let (str', keys) = buildStatement rest rem
      in (str <> "<" <> key <> ">" <> str', key : keys)

    finalizeStatement :: (String, [String]) -> Statement
    finalizeStatement (regex, variables) = Statement regex variables

Scenarios

Now that we have all our pieces in place, it’s quite easy to write the parser for scenario! First we get the title by reading the keyword and then the rest of the line:

scenarioParser :: RE Char Scenario
scenarioParser = string “Scenario: “ *> readThroughEndOfLine ...

After that, we read many statements, and then the example table. Since the example table might not exist, we’ll provide an alternative that is a pure, empty table. We can wrap everything together by mapping the Scenario constructor over it.

scenarioParser :: RE Char Scenario
scenarioParser = Scenario <$>
  (string “Scenario: “ *> readThroughEndOfLine) <*>
  many (statementParser <* sym ‘\n’) <*>
  (exampleTableParser <|> pure (ExampleTable [] []))

We can also make a “Background” parser that is very similar. All that changes is that we read the string “Background” instead of a title. Since we’ll hard-code the title as “Background”, we can include it with the constructor and map it over the parser.

backgroundParser :: RE Char Scenario
backgroundParser = Scenario “Background” <$>
  (string “Background:” *> readThroughEndOfLine) *>
 many (statementParser <* sym ‘\n’) <*>
  (exampleTableParser <|> pure (ExampleTable [] []))

Finally the Feature

We’re almost done! All we have left is to write the featureParser itself! As with scenarios, we’ll start with the keyword and a title line:

featureParser :: RE Char Feature
featureParser = Feature <$>
  (string “Feature: “ *> readThroughEndOfLine) <*>
  ...

Now we’ll use the optional combinator to parse the Background if it exists, but return Nothing if it doesn’t. Then we’ll wrap up with parsing many scenarios!

featureParser :: RE Char Feature
featureParser = Feature <$>
  (string “Feature: “ *> readThroughEndOfLine) <*>
  (optional backgroundParser) <*>
  (many scenarioParser)

Note that here we’re ignoring the “description” of a feature we proposed as part of our original syntax. Since there are no keywords for that, it turns out to be painful to deal with it using applicative parsing. When we look at monadic approaches starting next week, we’ll see it isn’t as hard there.

Conclusion

This wraps up our exploration of applicative parsing. We can see how well suited Haskell is for parsing. The functional nature of the language means it's easy to start with small building blocks like our first parsers. Then we can gradually combine them to make something larger. It can be a little tricky to wrap our heads around all the different operators and combinators. But once you understand the ways in which these let us combine our parsers, they make a lot of sense and are easy to use.

To further your knowledge of useful Haskell libraries, download our free Production Checklist! It will tell you about libraries for many tasks, from databases to machine learning!

If you’ve never written a line of Haskell before, never fear! Download our Beginners Checklist to learn more!

Read More
James Bowen James Bowen

Applicative Parsing I: Building the Foundation

applicative_parsing_1.png

Last week we prepared ourselves for parsing by going over the basics of the Gherkin Syntax. In this article and the next, we’ll be using the applicative parsing library to parse that syntax. This week, we’ll focus on the fundamentals of this library, and build up a vocabulary of combinators to use. We'll make heavy use of the Applicative typeclass. If you need a refresher on that, check out this article. As we start coding, you can also follow along with the examples on Github here! Most of the code here is in Parser.hs.

In the coming weeks, we’ll be seeing a couple other parsing libraries as well. If you want to get some ideas about these and more, download our Production Checklist. It summarizes many other useful libraries for writing higher level Haskell.

If you’ve never started writing Haskell, now’s your chance! Get our free Beginner’s Checklist and learn the basics of getting started!

Getting Started

So to start parsing, let’s make some notes about our input format. First, we’ll treat our input feature document as a single string. We’ll remove all empty lines, and then trim leading and trailing whitespace from each line.

parseFeatureFromFile :: FilePath -> IO Feature
parseFeatureFromFile inputFile = do
  fileContents <- lines <$> readFile inputFile
  let nonEmptyLines = filter (not . isEmpty) fileContents
  let trimmedLines = map trim nonEmptyLines
  let finalString = unlines trimmedLines
  case parseFeature finalString of
    ...

…
isEmpty :: String -> Bool
isEmpty = all isSpace

trim :: String -> String
trim input = reverse flippedTrimmed
  where
    trimStart = dropWhile isSpace input
    flipped = reverse trimStart
    flippedTrimmed = dropWhile isSpace flipped

This means a few things for our syntax. First, we don’t care about indentation. Second, we ignore extra lines. This means our parsers might allow certain formats we don’t want. But that’s OK because we’re trying to keep things simple.

The RE Type

With applicative based parsing, the main data type we’ll be working with is called RE, for regular expression. This represents a parser, and it’s parameterized by two types:

data RE s a = ...

The s type refers to the fundamental unit we’ll be parsing. Since we're parsing our input as a single String, this will be Char. Then the a type is the result of the parsing element. This varies from parser to parser. The most basic combinator we can use is sym. This parses a single symbol of your choosing:

sym :: s - > RE s s

parseLowercaseA :: RE Char Char
parseLowercaseA = sym ‘a’

To use an RE parser, we call the match function or its infix equivalent =~. These will return a Just value if we can match the entire input string, and Nothing otherwise:

>> match parseLowercaseA “a”
Just ‘a’
>> “b” =~ parseLowercaseA
Nothing
>> “ab” =~ parseLowercaseA
Nothing -- (Needs to parse entire input)

Predicates and Strings

Naturally, we’ll want some more complicated functionality. Instead of parsing a single input character, we can parse any character that fits a particular predicate by using psym. So if we want to read any character that was not a newline, we could do:

parseNonNewline :: RE Char Char
parseNonNewline = psym (/= ‘\n’)

The string combinator allows us to match a particular full string and then return it:

readFeatureWord :: RE Char String
readFeatureWord = string “Feature”

We’ll use this for parsing keywords, though we’ll often end up discarding the “result”.

Applicative Combinators

Now the RE type is applicative. This means we can apply all kinds of applicative combinators over it. One of these is many, which allows us to apply a single parser several times. Here is one combinator that we’ll use a lot. It allows us to read everything up until a newline and return the resulting string:

readUntilEndOfLine :: RE Char String
readUntilEndOfLine = many (psym (/= '\n'))

Beyond this, we’ll want to make use of the applicative <*> operator to combine different parsers. We can also apply a pure function (or constructor) on top of those by using <$>. Suppose we have a data type that stores two characters. Here’s how we can build a parser for it:

data TwoChars = TwoChars Char Char

parseTwoChars :: RE Char TwoChars
parseTwoChars = TwoChars <$> parseNonNewline <*> parseNonNewline

...

>> match parseTwoChars “ab”
Just (TwoChars ‘a’ ‘b’)

We can also use <* and *>, which are cousins of the main applicative operator. The first one will parse but then ignore the right hand parse result. The second discards the left side result.

parseFirst :: RE Char Char
parseFirst = parseNonNewline <* parseNonNewline

parseSecond :: RE Char Char
parseSecond = parseNonNewline *> parseNonnewline

…

>> match parseFirst “ab”
Just ‘a’
>> match parseSecond “ab”
Just ‘b’
>> match parseFirst “a”
Nothing

Notice the last one fails because the parser needs to have both inputs! We’ll come back to this idea of failure in a second. But now that we know this technique, we can write a couple other useful parsers:

readThroughEndOfLine :: RE Char String
readThroughEndOfLine = readUntilEndOfLine <* sym '\n'

readThroughBar :: RE Char String
readThroughBar = readUntilBar <* sym '|'

readUntilBar :: RE Char String
readUntilBar = many (psym (\c -> c /= '|' && c /= '\n'))

The first will parse the rest of the line and then consume the newline character itself. The other parsers accomplish this same task, except with the vertical bar character. We’ll need these when we parse the Examples section next week.

Alternatives: Dealing with Parse Failure

We introduced the notion of a parser “failing” up above. Of course, we need to be able to offer alternatives when a parser fails! Otherwise our language will be very limited in its structure. Luckily, the RE type also implements Alternative. This means we can use the <|> operator to determine an alternative parser when one fails. Let’s see this in action:

parseFeatureTitle :: RE Char String
parseFeatureTitle = string “Feature: “ *> readThroughEndOfLine

parseScenarioTitle :: RE Char String
parseScenarioTitle = string “Scenario: “ *> readThroughEndOfLine

parseEither :: RE Char String
parseEither = parseFeatureTitle <|> parseScenarioTitle

…

>> match parseFeatureTitle “Feature: Login\n”
Just “Login”
>> match parseFeatureTitle “Scenario: Login\n”
Nothing
>> match parseEither “Scenario: Login\n”
Just “Login”

Of course, if ALL the options fail, then we’ll still have a failing parser!

>> match parseEither “Random: Login\n”
Nothing

We’ll need this to introduce some level of choice into our parsing system. For instance, it’s up to the user if they want to include a Background as part of their feature. So we need to be able to read the background if it’s there or else move onto parsing a scenario.

Conclusion

That wraps up our introduction to the basic combinators of applicative parsing. Next week, we’ll take all the pieces we’ve developed here and put them to work on Gherkin syntax itself. Everything seems pretty small so far. But we’ll see that we can actually build up our results very rapidly once we have the basic pieces in place!

If you want to see some more libraries that are useful for important Haskell tasks, take a look at our Production Checklist. It will introduce you to some libraries for parsing, databases, APIs, and much more!

If you’re new to Haskell, there’s no better time to start! Download our free Beginners Checklist! It will help you download the right tools and start learning the language.

Read More
James Bowen James Bowen

Parsing Primer: Gherkin Syntax!

One topic I have yet to discuss on this blog is how to parse a domain specific language. This is odd, because Haskell has some awesome approaches for parsing. Haskell expressions tend to compose in awesome and simple ways. This provides an ideal environment in which to break down parsing into simpler tasks. Thus there are many excellent libraries out there.

In these next few weeks, we’ll be taking a tour of these different parsing libraries. But before we look at specific code, it will be useful to establish a common example for what we’re going to be parsing. In this article, I’ll introduce Gherkin Syntax, the language behind the Cucumber framework. We’ll go through the language specifics, then show the basics of how we set ourselves up for success in Haskell.

Gherkin Background

Cucumber is a framework for Behavior Driven Development. Under BDD, we first describe all the general behaviors we want our code to perform in plain language. This paradigm is an alternative to Test Driven Development. There, we use test cases to determine our next programming objectives. But BDD can do both of these if we can take behavior descriptions and automatically create tests from them! This would allow less technical members of a project team to effectively write tests!

The main challenge of this is formalizing a language for describing these behaviors. If we have a formal language, then we can parse it. If we can parse it into a reasonable structure, then we can turn that structure into runnable test code. This series will focus on the second part of this problem: turning Gherkin Syntax into a data structure (a Haskell data structure, in our case).

Gherkin Syntax

Gherkin syntax has many complexities, but for these articles we’ll be focusing on the core elements of it. The behaviors you want to test are broken down into a series of features. We describe each feature in its own .feature file. So our overarching task is to read input from a single file and turn it into a Feature object.

We begin our description of a feature with the Feature keyword (obviously). We'll give it a title, and then give it an indented description (our example will be a simple banking app):

Feature: Registering a User
  As a potential user
  I want to be able to create an account with a username,
    email and password
  So that I can start depositing money into my account

Each feature then has a series of scenarios. These describe specific cases of what can happen as part of this feature. Each scenario begins with the Scenario keyword and a title.

Scenario: Successful registration
  ...

Scenario: Email is already taken
  ...

Scenario: Username is already taken
  ...

Each scenario then has a series of Gherkin statements. These statements begin with one of the keywords Given, When, Then, or And. You should use Given statements to describe pre-conditions of the scenario. Then you’ll use When to describe the particular action a user is taking to initiate the scenario. And finally, you’ll use Then to describe the after effects.

Scenario: Email is already taken
  Given there is already an account with the email “test@test.com”
  When I register an account with username “test”,
    email “test@test.com” and password “1234abcd!?”
  Then it should fail with an error:
    “An account with that email already exists”

You can supplement any of these cases with a statement beginning with And.

Scenario: Email is already taken
  Given there is already an account with the email “test@test.com”
  And there is already an account with the username “test”
  When I register an account with username “test”,
    email “test@test.com” and password “1234abcd!?”
  Then it should fail with an error: 
    “An account with that email already exists”
  And there should still only be one account with 
    the email “test@test.com”

Gherkin syntax does not enforce that you use the different keywords in a semantically sound way. We could start every statement with Given and it would still work. But obviously you should do whatever you can to make your tests sound correct.

We can also fill in statements with variables in angle brackets. We'll then follow the scenario with a table of examples for those variables:

Scenario: Successful Registration
  Given There is no account with username <username>
    or email <email>
  When I register the account with username <username>,
    email <email> and password <password>
  Then it should successfully create the account
    with <username>, <email>, and <password>
  Examples:
    | username | email              | password      |
    | john doe | john@doe.com       | ABCD1234!?    |
    | jane doe | jane.doe@gmail.com | abcdefgh1.aba |
    | jackson  | jackson@yahoo.com  | cadsw4ll0p/   |

We can also create a Background for the whole feature. This is a scenario-like description of preconditions that exist for every scenario in that feature. This can also have an example table:

Feature: User Log In
  ...

Background:
  Given: There is an existing user with username <username>,
    email <email> and password <password>
  Examples:
    | username | email              | password      |
    | john doe | john@doe.com       | ABCD1234!?    |
    | jane doe | jane.doe@gmail.com | abcdefgh1.aba |

And that’s the whole language we’re going to be working with!

Haskell Data Structures

Let’s appreciate now how easy it is to create data structures in Haskell to represent this syntax. We’ll start with a description of a Feature. It has a title, description (which we’ll treat as a list of multiple lines), the background, and then a list of scenarios. We’ll also treat the background like a “nameless” scenario that may or may not exist:

data Feature = Feature
  { featureTitle :: String
  , featureDescription :: [String]
  , featureBackground :: Maybe Scenario
  , featureScenarios :: [Scenario]
  }

Now let’s describe what a Scenario is. It's main components are its title and a list of statements. We’ll also observe that we should have some kind of structure for the list of examples we'll provide:

data Scenario = Scenario
  { scenarioTitle :: String
  , scenarioStatements :: [Statement]
  , scenarioExamples :: ExampleTable
  }

This ExampleTable will store a list of possible keys as well as list of tuple maps. Each tuple will contain keys and values. At the scale we’re likely to be working at, it’s not worth it to use a full Map:

data ExampleTable = ExampleTable
  { exampleTableKeys :: [String]
  , exampleTableExamples :: [[(String, Value)]]
  }

Now we'll have to define what we mean by a Value. We’ll keep it simple and only use literal bools, strings, numbers, and a null value:

data Value =
  ValueNumber Scientific |
  ValueString String |
  ValueBool Bool |
  ValueNull

And finally we’ll describe a statement. This will have the string itself, as well as a list of variable keywords to interpolate:

data Statement = Statement
  { statementText :: String
  , statementExampleVariables :: [String]
  }

And that’s all there is too it! We can put all these types in a single file and feel pretty good about that. In Java or C++, we would want to make a separate file (or two!) for each type and there would be a lot more boilerplate involved.

General Parsing Approach

Another reason we’ll see that Haskell is good for parsing is the ease of breaking problems down into smaller pieces. We’ll have one function for parsing an example table, a different function for parsing a statement, and so on. Then gluing these together will actually be slick and simple!

Conclusion

Next week, come back and we’ll actually look at how we start parsing this. The first library we’ll use is the regex-applicative parsing library. We’ll see how we can get a lot of what we want without even using a monadic context!

For some more ideas on parsing libraries you can use, check out our free Production Checklist. It will tell you about different libraries for parsing as well as a great many other tasks, from data structures to web APIs!

If you’ve never written in Haskell before but are intrigued by the possibilities, download our Beginner’s Checklist and read our Liftoff Series!

Read More
James Bowen James Bowen

Monday Morning Haskell: Upgraded!

Welcome to the new Monday Morning Haskell! We just went live with the latest changes to the website this week. So it’s time to announce what’s coming next. Our main project right now is converting older blog content into permanent, organized, series. We currently have two sections for these series. One is focused on beginners, the other on more advanced Haskellers.

Beginners Section

The Beginners Section will obviously focus on content for people who are new to Haskell. Right now, there are two series of articles. The first is our Liftoff series. If you have never programmed in Haskell before, this is the series for you! You'll learn how to install Haskell on your system, as well as the core language mechanics.

The second set of articles is our Haskell Brain series. This series focuses on the mental side of learning Haskell. It goes through the psychological hurdles many people face when starting Haskell. It also goes over some interesting general techniques for learning.

Advanced Section

The Advanced Section features content for those trying to make the step up from hobbyists to professional Haskellers. It incorporates two of our more recent series from the blog. First up is the Web Skills series. This series goes through some interesting libraries for tasks you might need when building a Web backend. For instance, you'll learn about the Persistent database library, the Servant API library, and some general testing techniques.

The advanced section also features a series on Haskell and machine learning. It starts off by making the case for why Haskell is a good fit for machine learning in general. Then it goes through some specific examples. One highlight of this series is a tutorial on the Haskell Tensor Flow bindings. It also gives some examples of using dependent types within Tensor Flow.

Resources Update

We've also made a big update to our subscriber-only resources. All the resources are available on the resources page. If you're a subscriber to our email list, you should have gotten an email with the password to this page! If you're not subscribed yet, you can still sign up for free! You'll get access to all of these:

  1. Beginner's Checklist – The newly revised version of our Getting Started Checklist will help you review some of the core concepts of Haskell. It will also point you towards some additional resources to help you learn even more!
  2. Production Checklist – This NEW resource lists a large number of libraries you can use for production tasks. It goes well beyond the set covered in the web skills series and gives a short summary of each.
  3. Recursion Workbook – This workbook contains a couple chapters of content that will teach you all about recursion. Then it offers 10 practice problems so you can put your skills to the test!
  4. Stack Mini-Course – This mini-course will walk you through the basics of the Haskell Stack tool so you can actually make your own projects!
  5. Servant Tutorial – At BayHac 2016 I gave a talk on the Servant library. If you're a subscriber, you can get the slides and the sample code for that talk.
  6. Tensor Flow Guide – This guide accompanies our Haskell AI series. It goes through all the details you need to know about getting the Haskell Tensor Flow library up and running.

If you subscribe, you'll also get our monthly newsletter! This will detail what's new on the blog, and what content you can expect in the future!

The Blog

Going forward, I'll be continuing to take some older blog content and form it into coherent series. Most of my weekly blog posts for the time being will focus on announcing when these are available. I do have quite a bit more fresh content planned for the future though, so stay tuned! In the meantime, if there's an old blog article you're trying to find, you can use our search functionality! I've added tags to each blog post to help you out!

So don't forget, if you want access to our awesome resources, sign up for free!

Read More
James Bowen James Bowen

Functors Done Quick!

Suppose we're writing some code to deal with bank accounts. Most of our code will refer to these using a proper data type. But less refined parts of our code might use a tuple with the same information instead. We would want a conversion function to go between them. Here's a simple example:

data BankAccount = BankAccount
  { bankName :: String
  , ownerName :: String
  , accountBalance :: Double
  }

convertAccount :: (String, String, Double) -> BankAccount
convertAccout (bank, owner, balance) = BankAccount bank owner balance

Naturally, we'll want a convenience function for performing this operation on a list of items. We'll can use map for lists.

convertAccounts :: [(String, String, Double)] -> [BankAccount]
convertAccounts = map convertAccount

But Haskell has a plethora of different data structures. We can store our data in a Set, or a Vector, for a couple examples. What if different parts of our code store the data differently? They would need their own conversion functions, since the list version of map doesn't work on a Set or Vector. Can we make this code more generic?

Functors

If you read the blog post a couple weeks ago, you'll remember the idea of typeclasses. This is how we can make our code generic! We want to generalize the behavior of running a transformation over a data structure. We can make a typeclass to encapsulate this behavior. Luckily, Haskell already has such a typeclass, called Functor. It has a single function, fmap. Here is how it is defined:

class Functor f where
  fmap :: (a -> b)  ->  f a  ->  f b

If that type signature looks familiar, that's because it's almost identical to the map function over lists. And in fact, the List type uses map as it's implementation for fmap:

map :: (a  ->  b)  ->  [a]  ->  [b]

instance Functor [] where
  fmap = map

Other Functor Instances

Now, Set and Vector do have map functions. But to make our code generic, we have to define functor instances as a go-between:

instance Functor Vector where
  fmap = Data.Vector.map

instance Functor Set where
  fmap = Data.Set.map

With all this in mind, we can now rewrite convertAccounts generically.

convertAccounts :: (Functor f) => f (String, String, Double)  ->  f BankAccount
convertAccounts = fmap convertAccount

Now anything can use convertAccounts no matter how it structures the data, as long as it uses a functor! Let's looks at some of the other functors out there!

While it might not seem to fit in the same category as lists, vectors and sets, Maybe is also a functor! Here's its implementation:

instance Functor Maybe where
  fmap _ Nothing = Nothing
  fmap f (Just a) = Just (f a)

Another example of a functor is Either. This one is a little confusing since Either has two type parameters. But really, we have to fix the first parameter. Then the conversion function is only applied to the second. This means that, like with the Nothing case above, when we have Left, we return the original value:

instance Functor (Either a) where
  fmap _ (Left a) = Left a
  fmap f (Right x) = Right (f x)

Conceptualizing Functors

So concretely, Functor is a typeclass in Haskell. But how can we think of it conceptually? This is actually pretty simple. A functor is nothing more than a generic container or box. We don't know how many elements it contains. We don't know what the structure of those elements is. But if we have a way to transform those elements, we can apply that function over all of them. The result will be a new container with the same structure, but new elements. As far as abstractions go, this is probably the cleanest one we'll get, so enjoy it!

Conclusion

Functor is an example of typeclass that we can use to get general behavior. In this case, the behavior is transforming a group of objects in a container while maintaining the container's structure. We saw how this typeclass allowed us to re-use a function over many different types. Functors are the simplest in a series of important typeclasses. Applicative functors would come next, and then monads. Monads are vital to Haskell. So understanding functors is an important first step towards learning more complex Haskell.

But you can't learn about data structures until you know the basics! If you've never written any Haskell before, download out Getting Started Checklist! If you're comfortable with the basics and want more of a challenge, take a look at our Recursion Workbook!

Read More
James Bowen James Bowen

Need to be Faster? Be Lazy!

In more procedural and object oriented languages, we write code as a series of commands. These commands get executed in the order we write them in no matter what. Consider this example:

int myFunction(int a, int b, int c) {
  int result1 = longCalculation1(a,b);
  int result2 = longCalculation2(b,c);
  int result3 = longCalculation3(a,c);
  if (result1 < 10) {
    return result1;
  } else if (result2 < 100) {
    return result2;
  } else {
    return result3;
  }
}

There’s a clear inefficiency here. No matter what, we’ll perform all three long running operations. But we might not actually need all the results! We could rewrite the code to get around this.

int myFunction(int a, int b, int c) {
  int result1 = longCalculation1(a,b);
  if (result1 < 10) {
    return result1;
  } else {
    int result2 = longCalculation2(b,c);
    if (result2 < 100) {
     return result2;
    } else {
      int result3 = longCalculation3(a,c);
      return result3;
    }
  }
}

But now it’s a little less clear what’s going on. The code isn’t as readable. And there are some situations where this kind of refactoring is impossible. This is an inevitable consequence of the paradigm of eager evaluation in almost all mainstream languages. In Haskell we write expressions, rather than commands. Thus evaluation order is a little less clear. In fact, Haskell expressions are evaluated lazily. We don’t perform any calculations until we’re sure they’re needed! Let’s see how this works.

How Laziness Works

Here’s how we can write the function above in Haskell:

myFunction :: Int -> Int -> Int -> Int
myFunction a b c =
  let result1 = longCalculation1 a b
      result2 = longCalculation2 b c
      result3 = longCalculation3 a c
  in if result1 < 10
       then result1
       else if result2 < 100
         then result2
         else result3

While this seems semantically identical to the first C++ version, it actually runs as efficiently as the second version! In Haskell, result1, result2, and result3 get stored as “thunks”. GHC sets aside a piece of memory for the result, and knows what calculation it has to perform to get the result. But it doesn’t perform the calculation until we need the result.

Here’s another example. Suppose we want all Pythagorean triples whose sum is less than 1000. Sounds like a tall order. But enter the following into GHCI, and you’ll see that it happens very quickly!

>> let triples = [(a,b,c) | a <- [1..1000], b <- [1..1000], c <- [1..1000], a + b + c < 1000, (a ** 2) + (b**2) == c ** 2]

Did it perform all that calculation so quickly? Of course not! If you now print triples, it will take a while to print it all out. But suppose we only wanted 5 examples! It doesn’t take too long!

>> take 5 triples
[(3.0,4.0,5.0),(4.0,3.0,5.0),(5.0,12.0,13.0),(6.0,8.0,10.0),(7.0,24.0,25.0)]

As we see, an element is typically only brought into scope when it is needed by an IO action, such as print. If you’re using GHCI and print the result of a calculation, you’ll need to evaluate the whole calculation. Otherwise, you’ll need the calculation once it (or an expression that depends on it) gets printed by main.

Infinite Lists as a Consequence of Laziness

Besides potentially saving time, laziness has some other interesting consequences. One of these is that we can have data structures that can’t exist in other languages. For instance, we can define an “infinite” list:

>> let infList = [1..]

This list starts at 1, and each element counts up by 1, going up to infinity! But how is this possible? We don’t have an infinite amount of memory! The key is that we don’t actually bring any of the elements into scope until we need them. For example, we can take the first 10 elements of an infinite list.

>> take 10 [1..]
[1,2,3,4,5,6,7,8,9,10]

Of course, if we try to print the entire list, we’ll run into problems!

>> [1..]
(Endless printing of numbers)

But there are some cool things we can do with infinite lists. For instance, it’s easy to match up a list of elements with the numeric index of the element in the list. We can do this by using zip in conjunction with an infinite list:

addIndex :: [a] -> [(Int, a)]
addIndex = zip [1..]

Or we could match every element with its index modulo 4:

addIndexMod4 :: [a] -> [(Int, a)]
addIndexMod4 = zip (cycle [0,1,2,3])

Disadvantages of Laziness

Haskell’s laziness isn’t infallible. It can get us into trouble sometimes. While it often saves us time, it can cost us in terms of space. This is apparent even in a simple example using foldl.

>> foldl (+) 0 [1..100000000]
Stack overflow!

When we add the numbers up through 100 million, we should be able to do it with constant memory. All we would need would be a running tally of the current sum. On the one hand, laziness means that the entire list of a hundred million numbers is never all in scope at the same time. But on the other hand, all the calculations involved in that running tally happen lazily! So at some point, our memory footprint actually looks like this:

(((((1 + 2) + 3) + 4) + …) + 100000000)

That is, all the individual numbers are in memory at the same time because the + operations aren’t evaluated until they need to be! In situations like this example, we want to introduce strictness into our code. We can do this with the seq function. This function is a little special. It takes two arguments, and returns the second of them. However, it is strict in its first argument. That is, the first item we pass to it will be fully evaluated.

We can see this in use in the definition of foldl’, the strict counterpart to foldl:

foldl’ f accum [] = accum
foldl’ f accum (x : xs) = let newAccum = f accum x
                           in seq newAccum $ foldl’ f newAccum xs

The use of seq here causes Haskell to evaluate newAccum strictly, so we don't keep storing calculations in memory. Using this technique, we can now actually add up that list of integers!

>> foldl’ (+) 0 [1..100000000]
5000000050000000

Conclusion

Laziness is another feature that makes Haskell pretty unique among programming languages. Like any language feature, it has its drawbacks. It gives us yet another way we have to reason differently about Haskell compared to other languages. But it also has some distinct advantages. We can have some significantly faster code in many cases. It also allows us to use structures like infinite lists that don’t exist in other languages.

Hopefully this has convinced you to give Haskell a try. Take a look at our Getting Started Checklist and get going!

Read More
James Bowen James Bowen

Immutability: The Less Things Change, The More You Know

Most programmers take it for granted that they can change the values of their expressions. Consider this simple Python example:

> a = [1,2,3]
> a.reverse()
> a
[3,2,1]

We can see that the reverse function actually changed the underlying list. Here’s another example, this time in C++. We pass a pointer to an integer as a parameter, and we can update the integer within the function.

int myFunction(int* a) {
  int result = 0;
  if (*a % 2 == 0) {
    result = 10;
  } else {
    result = 20;
  }
  ++(*a);
  return result;
}

When we call this function, the original expression changes values.

int main() {
  int x = 4;
  int* xp = &x;
  int result = myFunction(xp);
  cout << result << endl;
  cout << *xp << endl;
}

Even though xp initially points to the value 4, when we print it at the end, the value is now 5! But as we’ll learn, Haskell does not, in general, allow us to do this! Let’s see how it works.

Immutability

In Haskell, all expressions are immutable! This means you cannot change the underlying value of something like you can in Python or C++. There are still some functions that appear to mutate things. But in general, they don’t change the original value. They create entirely new values! Let’s look at an example with reverse:

>> let a = [1,2,3]
>> reverse a
[3,2,1]
>> a
[1,2,3] -- unchanged!

The reverse function takes one argument, a list, and returns a list. But the final value is a totally new list! Observe how the original expression a remains the same! Compare this to the earlier Python example. The reverse function actually had a “void” return value. Instead, it changed the original list.

Record syntax is another example where we appear to mutate a value in Haskell. Consider this type and an accompanying mutator function:

data Person = Person
  { personName :: String
  , personAge :: Int
  } deriving (Show)

makeAdult :: Person -> Person
makeAdult person = person { personAge = 18}

But when we actually use the function, we’ll find again that it creates a totally new value! The old one stays the same!

>> let p = Person “John” 17
>> makeAdult p
Person {personName = “John”, personAge = 18}
>> p
Person {personName = “John”, personAge = 17}

Advantages of Immutability

Immutability might seem constraining at first. But it’s actually very liberating! Until you try programming with immutability by default, you don’t realize quite how many bugs mutable data causes. It is tremendously useful to know that your values cannot change. Suppose we take a list as a parameter to a function. In Haskell, we know that no matter how many functions we call with that list as a parameter, it will still be the same each time.

example :: [Int] -> Int
example myList = …
  where
    -- Each call uses the EXACT same list!
    result1 = function1 myList
    result2 = function2 result1 myList
    result3 = function3 result2 myList

Immutability also means you don’t have to worry about different ways to “copy” a data structure. Every copy is a shallow copy, since you can’t change the values within the structure anyway!

Ways Around Immutability

Naturally, there are situations where you want to have mutable data. But we can always simulate this effect in Haskell by using more advanced types! For instance, we can easily represent the C++ function above using the State monad.

myFunction :: State Int Int
myFunction = do
  a <- get
  let result = if a `mod` 2 == 0
        then 10
        else 20
  modify (+1) -- Change the underlying state value
  return result

{-
>> let x = 4
>> runState myFunction x
(10, 5)
>> x
4
-}

Again, this doesn’t actually “mutate” any data. When we pass x into our State function, x itself doesn’t change! Instead, the function returns a completely new value. Now, different calls to get can return us different values depending on the state. But this fact is encoded in the type system. We explicitly declare that there is an Int value that can change.

Of course, there are times where we actually do want to change the specific values in memory. One example of this is if we want to perform an in-place sort. We’ll have the move the elements of the array to different spots in memory. Otherwise we will have to allocate at least O(n) more space for the final list. In cases like this, we can use IO references. To sort an array, we’d want Data.Array.IO. For many other cases, we’ll just want the IORef type. Whenever you need to truly mutate data, you need to be in the IO monad.

Looking at all these examples, what we see is that Haskell doesn’t actually limit us at all! We can get all the same mutability effects we have in other languages. The difference is that in Haskell the default behavior is immutability. We have to use the type system to specify when we want mutable data.

Contrast this with C++. We can get immutable data by using the const keyword if we want. But the default is mutable data and we have to use the type system to make it immutable.

Conclusion

Immutability sounds crazy. But it does a huge amount to limit the kinds of bugs we can get. It seems like a big limitation on your code, but there are plenty of workarounds when you need them. The key fact is that mutable data is encoded within the type system. This forces you to be very conscious about when your data is mutable, and that will help you avoid bugs.

Want to see for yourself what the hype is about? Give Haskell a shot! Download our Getting Started Checklist and start learning Haskell!

Read More
James Bowen James Bowen

General Functions with Typeclasses

Last week, we looked at the basics of Haskell’s data types. We saw that haskell is not an object oriented language, and we don’t have inheritance between data types. This would get very confusing with all the different constructors that a data type can have. Haskell gives a lot of the same functionality as inheritance by using Typeclasses. This week we’ll take a quick look at this concept.

What is a Typeclass?

A typeclass encapsulates functionality that is common to different types. In practice, a typeclass describes a series of functions that you expect to exist for a given type. When these functions exist, you can create what is called an “instance” of a typeclass.

Typeclasses are a lot like interfaces in Java. You specify a group of functions, but only with the type signatures. Then for each relevant type, you'll need to specify an implementation for each function. As an example, suppose we had two different types referring to different kinds of people.

data Student = Student String Int

data Teacher = Teacher
  { teacherName:: String
  , teacherAge:: Int
  , teacherDepartment :: String
  , teacherSalary :: Int
  }

We could then make a typeclass called IsPerson. We'll give it a couple functions that refer to the name and age of the person. Then we parameterize the class by the single type a. We'll use that type parameter in the type signatures of the functions:

class IsPerson a where
  personName :: a -> String
  personAge :: a -> Int

Creating Instances of Typeclasses

Now let's create an instance of the typeclass. All we have to do is implement each function under the instance keyword:

instance IsPerson Student where
  personName (Student name _) = name
  personAge (Student _ age) = age

instance IsPerson Teacher where
  personName = teacherName
  personAge = teacherAge

There are a lot of simple typeclasses in the base libraries that you’ll need to know for some basic tasks. For instance, to compare two items for equality, you’ll need the Eq typeclass:

class Eq a where
  (==) :: a -> a -> Bool
  (/=) :: a -> a -> Bool

We can define instances for these for all our types. But for simple, base library classes like these, GHC can define them for us! All we need is the deriving keyword:

data Student = Student String Int
  deriving (Eq)

data Teacher = Teacher
  { teacherName:: String
  , teacherAge:: Int
  , teacherDepartment :: String
  , teacherSalary :: Int
  }
  deriving (Eq)

Using Typeclass Constraints

But why are typeclasses important? Well, often times we want to write code that is as general as possible. We want to write functions that assume as little about their inputs as they can. For instance, suppose we have this function that will print a teacher’s name:

printName :: Teacher -> IO ()
printName teacher = putStrLn $ personName teacher

We can use this function for more types than Teacher though! Any type that implements IsPerson will do. So we can make the function polymorphic, and add the IsPerson constraint on our a type:

printName :: (IsPerson a) => a-> IO ()
printName person = putStrLn $ personName person

You can also use typeclasses to constrain a type parameter of a new data type.

data (IsPerson a) => EmployeeRecord a = EmployeeRecord
  { employee :: a
  , employeeTenure :: Int
  }

Typeclasses can even provide a form of inheritance. You can constrain a typeclass by another typeclass! A couple base classes show an example of this. The “Orderable” typeclass Ord depends on the type having an instance of Eq:

class (Eq a) => Ord a where
  compare :: a -> a -> Ordering
  (<=) :: a -> a -> Bool
  ...

Conclusion

Haskell programmers like code that is as general as possible. Object oriented languages try to accomplish this with inheritance. But Haskell gets most of the same functionality with typeclasses instead. They describe common features between types, and provide a lot of flexibility.

To continue learning more about the Haskell basics, take a look at our Getting Started Checklist and get going!

Do you already understand the basics and want more of a challenge? Check out our Recursion Workbook!

Read More
James Bowen James Bowen

Haskell Data Types in 5 Steps

People often speak of a dichotomy between “object oriented” programming and “functional” programming. Haskell falls into the latter category, meaning we do more of our work with functions. We don't use hierarchies of objects to abstract work away. But Haskell is also heavily driven by its type system. So of course we still define our own data types in Haskell! Even better, Haskell has unique mechanisms you won't find in OO languages!

The Data Keyword and Constructors

In general, we define a new data type by using the data keyword, followed by the name of the type we’re defining. The type has to begin with a capital letter to distinguish it from normal expression names.

data Employee = ...

To start defining our type, we must provide a constructor. This is another capitalized word that allows you to create expressions of your new type. The constructor name is then followed by a list of 0 or more other types. These are like the “fields” that a data type carries in a language like Java or C++.

data Employee = Executive String Int Int

employee1 :: Employee
employee1 = Executive "Jane Doe" 38 300000

Sum Types

In a language like Java, you can have multiple constructors for a type. But the type will still encapsulate the same data no matter what constructor you use. In Haskell, you can have many constructors for your data type, separated by a vertical bar |. Each of your constructors then has its own list of data types! So different constructors of the same type can have different underlying data! We refer to a type with multiple constructors as a “sum” type.

data Employee =
 Executive String Int Int |
 VicePresident String String Int |
 Manager String String |
 Engineer String Int

If your type has only one constructor, it is not uncommon to re-use the name of the type as the constructor name:

data Employee = Employee String Int

Record Syntax

You can also define a type using “record syntax”. This allows you to provide field names to each type in the constructor. With these, you access the individual fields with simple functions. Otherwise, you'll need to resort to pattern matching. This is more commonly seen with types that use a single constructor. It is a good practice to prefix your field names with the type name to avoid name conflicts.

data Employee = Employee
 { employeeName :: String
 , employeeAge :: Int
 }

printName :: Employee -> IO ()
printName employee = putStrLn $ employeeName employee

Type Synonyms

As in C++, you can create type synonyms, providing a second name for a type. Sometimes, expressions can mean different things, even though they have the same representation. Type synonyms can help keep these straight. To make a synonym, use the type keyword, the new name you would like to use to refer to your type, and then the original type.

type InterestRate = Float
type BankBalance = Float

applyInterest :: BankBalance -> InterestRate -> BankBalance
applyInterest balance interestRate = balance + (balance * interestRate)

Note though that type synonyms have no impact on how your code compiles! This means it is still quite possible to misuse them! The following type signatures will still compile for this function:

applyInterest :: Float -> Float -> Float
applyInterest :: InterestRate -> BankBalance -> Float

Newtypes

To avoid the confusion that can occur above, you can use the newtype keyword. A newtype is like a cross between data and type. Like type, you’re essentially renaming a type. But you do this by writing a declaration that has exactly one constructor with exactly one type. As with a data declaration, you can use record syntax within newtypes.

newtype BankBalance = BankBalance Float
newtype InterestRate = InterestRate { unInterestRate :: Float }

Once you’ve done this, you will have to use the constructors (or record functions) to wrap and unwrap your code:

applyInterest :: BankBalance -> InterestRate -> BankBalance
applyInterest (BankBalance bal) rate = BankBalance $
 bal + (unInterestRate rate * bal)

Newtype declarations do affect how your code compiles. So the following invalid type signature will NOT compile!

applyInterest :: InterestRate -> BankBalance -> Float
applyInterest (BankBalance bal) (InterestRate rate) = ...

Conclusion

As we learned a couple weeks ago, types are important in Haskell. So it’s not surprising that Haskell has some nifty constructs for building our own types. Constructors and sum types give us the flexibility to choose what kind of data we want to store. We can even change the data stored for different elements of the same type! Type synonyms and newtypes give us two different ways to rename our types. The first is easy and helps avoid confusion. The second requires more code re-writing, but provides more type safety.

If you’ve never written a line of Haskell before, never fear! Take a look at our Getting Started Checklist to get going!

Read More
James Bowen James Bowen

Syntactic Implications of Expressions

Last week we explored expressions and types, the fundamental building blocks of Haskell. These concepts revealed some major structural differences between Haskell and procedural languages. This week we’ll consider the implications of these ideas. We'll see how they affect syntactic constructs in Haskell.

If Statements

Consider this Java function:

public int func(int input) {
  int z = 5;
  if (input % 2 == 0) {
    z = 4;
  }
  return z * input;
}

Here we see a very basic if-statement. Under a certain condition, we change the value of z. If the condition isn’t true we do nothing. But this is not how if statements work in Haskell! What lies inside the if statement is a command, and we compose our code with expressions and not commands!

In Haskell, an if-statement is an expression like anything else. This means it has to have a type! This constrains us in a couple ways. If the condition is true, we can supply an expression that will be the result. Then the type of the whole statement has the type of this expression. But what if the condition is not true? Can an expression be null or void? Most of the time no!* The following is rather problematic:

myValue :: Int -> ??? -- What type would this have if the condition is false?
myValue x = if x `mod` 2 == 0 then 5

This means that if-statements in Haskell must have an else branch. Furthermore, the else branch must have an expression that has the same type as in the true branch!

myValue :: Int -> Int -- Both the false and true branches are Int
myValue x = if x `mod` 2 == 0 then 5 else 3

Notice the real difference here. We’re used to saying “if x, do y”. But in Haskell, we assign an expression to be some value that may differ depending on a condition. So from a conceptual standpoint, the “if” is further inside the statement. This is one big hurdle to cross when first learning Haskell.

*Note: In monadic circumstances, a “null” value (represented as the unit type) can make sense. See when and unless.

Where Statements

We've established that everything in Haskell is an expression. But we often use commands to assign values to intermediate variables. How can we do this in Haskell? After all, if a computation is complicated, we don’t want to describe it all on one line.

The where statement fills this purpose. We can use where, followed by any number of statements under to assign names to intermediate values!

mathFunc :: Int -> Int -> Int -> Int
mathFunc a b c = sum + product + difference
  where
    sum = a + b + c
    product = a * b * c
    difference = c - b - a

Statements in a where clause can be in any order. They can depend on each other, as long as there are no dependency loops!

mathFunc :: Int -> Int -> Int -> Int
mathFunc a b c = sum + product + difference
  where
    product = a * b * c * sum
    sum = a + b + c
    difference = sum - b - product

Let Statements

There’s a second way of describing intermediate variables! We can also use a let statement, combined with in. Unlike with where, let bindings must be in the right order. They can't depend on later values. Here’s the above function, written using let:

mathFunc :: Int -> Int -> Int -> Int
mathFunc a b c =
  let sum = a + b + c
       product = a * b * c
       difference = c - b - a
  in sum + product + difference

Why do we have two ways of expressing the same concept? Well think about writing an essay. You'll often have terms you want to define for the reader. Sometimes, it makes more sense to define these before you use them. This is like using let. But sometimes, you can use the expression first, and show the details of how you calculated it later. This is what where is for! This especially works when the expression name is descriptive.

As a side note, there are also situations with monads where you can’t use where and have to use let. But that’s a topic for another time!

Conclusion

If you’re new Haskell, hopefully these short articles are giving you some quick insights into the language. For more details, take a look at our Getting Started Checklist! If you think you’ve mastered the basics and want to learn how to organize a real project, you should take our free Stack mini-course. It will teach you how to use the Stack tool to keep your Haskell code straight.

Read More
James Bowen James Bowen

Back to Basics: Expressions and Types, Distilled

This week we begin our “Haskell Distilled” series. These articles will look at important Haskell concepts and break them down into the most fundamental parts. This week we start with the essence of what defines Haskell as a language.

Everything is an Expression

In Haskell, almost everything we write is an expression. This goes from the simplest primitive elements to the most complicated functions. Each of the following is an expression:

True
False
4
9.8
[6, 2]
\a -> a + 5
‘a’ 
“Hello”

This is, in part, what defines Haskell as a functional language. We can compare this against a more procedural language like java:

public int add5ToInput(int x) {
  int result = x  + 5;
  return 5;
}

The method we’ve defined here isn’t a expression in the same way a function is in Haskell. We can’t substitute the definition of this method in for an invocation of it. In Haskell, we can do exactly this with functions! Each of the lines in this method are not expressions themselves either. Rather, each of them is a command to execute.

Even Haskell code that appears to be procedural in nature is really an expression!

main = do
  putStrLn “Hello”
  putStrLn “World”

In this case, main is an expression of type IO (), not a list of procedures. Of course, the IO monad functions in a procedural manner sometimes. But still, the substitution property holds up.

Every Expression has a Type

Once we understand expressions, the next building block is types. All expressions have types. For instance, we can write out potential types for each of the expressions we had above:

True :: Bool
False :: Bool
4 :: Int
9.8 :: Float
[6, 2] :: [Int]
\a -> a + 5 :: Int -> Int
‘a’ :: Char
“Hello” :: String

Types tell us what operations we can perform with the data we have. They ensure we don’t try to perform operations with non-sensical data. For instance we can only use the add operation (+) on expressions that are of the same type and that are numeric:

(+) :: (Num a) -> a -> a -> a

By contrast, in javascript, where we can "add" arbitrary variables (think [] + {}) which can have non-sensical results.

Functions are Still Just Expressions

As we saw in the Java example above, methods are a little cumbersome. There is a distinction between a method definition and the commands we use for normal code. But in Haskell, we define our more complicated expressions as functions. And these are still expressions just like our most primitive definitions! Functions have types as well, based on the parameters they take. So we could have functions that take one or more parameters:

add5 :: Int -> Int
add5 x = x + 5

sumAndProduct :: Int -> Int -> Int
sumAndProduct x y = (x + y) * (x + y)

In our Java example, the statement return result is fundamentally different from the method definition of add5ToInput. But above, sumAndProduct isn’t a different concept from the simpler expression 5.

We use functions by “applying” them to arguments like so:

sumAndProduct 7 8 :: Int

We can also partially apply functions. This means we fix one of the arguments, but leave the rest of it open-ended.

sumAndProduct 7 :: Int -> Int

The result of this expression is a new function that takes only a single argument. It’s result will be the sumAndProduct function with the first argument fixed as 7. We've peeled off one of the arguments from the function type.

Conclusion

In the coming week’s we’ll continue breaking down simple concepts. We’ll examine how some of Haskell’s other syntactic constructs work within this system of expressions and types. If you’re new to Haskell, you should check out our Getting Started Checklist so you can start using these concepts!

Read More
James Bowen James Bowen

1 Year of Haskell

This week marks the one year anniversary of Monday Morning Haskell! I’ve written an awful lot in the past year. It’s now obvious that the “blog” format doesn’t do justice to the amount of content on the site. There’s no organization, so a lot of the most useful content is stuck way down in the archives. So I’m taking this opportunity to announce some plans to reorganize the website!

Website Organization

I have two main focuses when it comes to Haskell. First, making the language easier for beginners to pick up. Second, showing the many industrial-strength tasks we can perform with Haskell. There will be two different sections of the website highlighting these components.

There will be a beginner’s corner that will focus on giving advice for starting the language from the ground up. It will feature content about the fundamental concepts and techniques of the language. It will also showcase some advice for conquering the psychological hurdles of Haskell. There will also be a section devoted to articles on production libraries, such as the recent series on building a Web API and using Tensor Flow.

Future Posts

This means I'll be focusing on organizing permanent content for a while. As a result, I won’t have as much time to spend on new blog posts. But I will still be publishing something new every Monday morning! These new posts will focus on distilling concepts into the most important parts. They'll look back at earlier lessons and pick out the highlights. Of course, I still have plans for some more in-depth technical tutorials. I’ll be looking at areas such as front-end web development, creating EDSLs, the C foreign function interface, and more.

Academy

I also am working hard on some new course material to go with our current free mini-course on learning Stack! I'm currently working on a full-length courses beginners course to help people learn Haskell from scratch. I will follow that up with another course aimed at using Haskell in production. This second course will go through both generally important production skills and showcase Haskell libraries applying those skills.

Conclusion

Altogether, the fun is just beginning at Monday Morning Haskell! I’m committed to continuing to build this website to be a resource for beginners and experts alike!

If you’re a beginner yourself, check out any of our newbie resources, like our Getting Started Checklist, our Recursion workbook, or the Stack mini-course mentioned above!

Read More
James Bowen James Bowen

The Right Types of Assumptions

A little while back I was discussing my Haskell and AI article with some other engineers I know. These people are very smart and come from diverse backgrounds of software engineering. But they tend to prefer dynamically typed languages like Javascript and Ruby. So they don’t buy into Haskell, which makes them no different from most of the engineering community.

Still, some things about the discussion surprised me. I heard, of course, the oft-repeated mantra that types do not replace tests. But even more surprising was one claim that types do not, and are not even meant to provide any kind of safety. One engineer even claimed he thinks of types only as directives to help a compiler optimize a program.

Now, it would be bold to claim that Haskell is better for all things and all circumstances. But I don’t think I’ve ever done that on this blog. It would also be unrealistic to claim that Haskell’s type system can replace all testing. And in fact I’ve dismissed that claim a couple times on this blog.

But surely Haskell’s type system gets us something. Surely, compiled types add some layer of extra safety and correctness to our programs. I’ve never thought this to be a particularly controversial point. Any yet many smart people dismiss even this claim. So these conversations got me thinking, why is it that I think Haskell's type system is useful? Why do I believe what I do?

Convincing Myself

For me, the evidence is a matter of personal experience. And a lot of other Haskellers would agree. I programmed in dynamic languages, such as Python and Javascript. I’ve also dealt a lot with Objective C and Java, which have static types but do not restrict effects as Haskell does.

It’s very clear to me where I get fewer bugs and write cleaner code that is more often correct, and that’s with Haskell. But of course I can’t base everything on my own experience, which could be subject to many biases. For instance, it could be that I haven’t spent enough time writing in dynamic languages at an industry level. It could also be that I happen to have become a more competent programmer in the last year. Perhaps I would have seen similar improvements no matter what language I focused on. Still, Haskell changed my programming in ways that other languages wouldn't have. I feel I could now go back to other languages and write better programs than I could before I learned Haskell.

Regardless, I would not feel as confident. Without Haskell's guarantees, there’s a great deal more mental overhead. You have to always ask yourself questions like, is every error condition checked? Are all my values non-null? Have I written enough tests to cover basic data format cases? Have I prevented effects like slow-running DB operations in performance critical places? Determining the answers to these might require a line-by-line audit of the code. Thorough code review is good. But when it becomes a pain, corners will get cut.

Perhaps if I had a couple more years experience writing Javascript, I wouldn’t view these tasks as such a burden. But one of the best answers I ever read on Quora gave this piece of advice. As you become a better programmer, you don’t get better at keeping things in your head and knowing the whole system. You get better at sectioning off the system in a way that you don’t have to keep it in your head. Haskell lends itself very well to this approach.

So it’s true that other languages can be written in a safe way. But Haskell forces you to write and think safely. In this regard, Haskell would actually be a fantastic first language.

Empirical Evidence?

Now there’s been some efforts to study the effects of programming language choice. This dissertation from UC Davis examined many open-source code bases to find what factors led to fewer bugs. The conclusion, was that strongly typed, functional languages fare better. A much older paper from Yale found that Haskell was a clear winner when it came to code clarity. (Python and Javascript weren’t covered in the study though).

Case closed right?

Well of course not. With the UC Davis paper, many possible issues prevent firm conclusions. After all, it only studied open source code and not industrial code. Confounding factors might well exist. Functional projects could just have fewer users. This could explain fewer bug reports. Also, code size dwarfed any effects of language choice. More code = more bugs.

But it’s hard to see how we could settle this question with an empirical study. No one has access to programmers with arbitrary levels of experience in a language. The Yale paper might have had the right idea. But it would need a larger scale to draw more solid conclusions.

What Other Evidence is out there?

So failing this kind of assessment, what is the best way to provide evidence in Haskell’s favor? It’s rather hard to look past testimony here. It would be great to see more publicity around stories like at Channable. Engineers there were able to use convert a small piece of infrastructure to Haskell. The result was a system that had fewer bugs that they could refactor with confidence.

Of course, testimony like this is still imperfect. There could be major survivorship bias. For every person who posts about Haskell solving their problems, there may be 5 who found it didn't help.

There’s also the fact of the world that many people don’t like change. You could provide all the evidence in the world in favor of a particular language. Some people will still choose to fall back on what they know. For people who have “tried and true” business models, this makes sense. Anything else seems risky.

So how do we Convince People?

I’ll repeat what I’ve noted a couple times in previous articles. Network effects are currently a huge drag on Haskell. The more Haskell developers who are out there, the easier it will be to convince companies to give it a try. Even if it is only on small projects, there will be more chances to see it succeed.

What is the best way to go about solving these network issues? For one, we need more marketing/Haskell evangelism. That said, there is still a sense that we need more educational material at all levels. Helping beginners get into the language is a great start. But there's definitely another hurdle to go from hobbyist to industry level.

Conclusion

Every once in awhile it’s important to take a step back and consider our assumptions. I definitely found it worthwhile to reexamine some of my beliefs about Haskell’s type system. It's helped me to remember why I think the way I do. There’s a good amount of evidence out there for Haskell’s utility and safety as a language. But the burden is on us as a community to collect those stories and put them out there more.

If you haven’t tried Haskell before, hopefully I’ve convinced you to be one of those people! Check out our Getting Started Checklist for some good resources in starting your Haskell education.

If you’ve dabbled a bit and want to make the next step up, we also have a few other resources for you! Check out of Stack mini-course, our Recursion Workbook or our Haskell Tensor Flow guide!

Read More
James Bowen James Bowen

Eff to the Rescue!

In the last couple weeks, we’ve seen quite a flurry of typeclasses. We used MonadDatabase and MonadCache to abstract the different effects within our API. This brought with it some benefits, but also some drawbacks. With these concepts abstracted, we could distill the API code into its simpler tasks. We didn't need to worry about connection configurations or lifting through different monads.

As we’ve seen though, there was a lot of boilerplate code involved. And there would be more if we wanted the freedom to have different parts of our app use different monad stacks. Free Monads are one solution to this problem. They allow you to compose your program so that the order in which you specify your effects does not matter. We’ll still have to write “interpretations” as we did before. But now they’ll be a lot more composable.

You can follow along the code for this by checking out the effects-3 branch on Github. Also, I do have to give a shoutout to Sandy Maguire for his talk on Eff and Free monads from BayHac. Most of what I know about free monads comes from that talk. You should also check out his blog as well.

Typeclass Boilerplate

Let’s review the main drawback of our type class approach. Recall our original definition of the AppMonad, and some of the instances we had to write for it:

newtype AppMonad a = AppMonad (ReaderT RedisInfo (SqlPersistT (LoggingT IO)) a)
  deriving (Functor, Applicative, Monad)

instance MonadDatabase AppMonad where
  fetchUserDB = liftSqlPersistT . fetchUserDB
  createUserDB = liftSqlPersistT . createUserDB
  deleteUserDB = liftSqlPersistT . deleteUserDB
  fetchArticleDB = liftSqlPersistT . fetchArticleDB
  createArticleDB = liftSqlPersistT . createArticleDB
  deleteArticleDB = liftSqlPersistT . deleteArticleDB
  fetchArticlesByAuthor = liftSqlPersistT . fetchArticlesByAuthor
  fetchRecentArticles = liftSqlPersistT fetchRecentArticles

liftSqlPersistT :: SqlPersistT (LoggingT IO) a -> AppMonad a
liftSqlPersistT action = AppMonad $ ReaderT (const action)

instance (MonadIO m, MonadLogger m) => MonadDatabase (SqlPersistT m) where
  ...

But suppose another part of our application wants to use a different monad stack. Perhaps it uses different configuration information and tracks different state. But it still needs to be able to connect to the database. As a result, we’ll need to write more instances. Each of these will need a new definition for all the different effect functions. Most all these will be repetitive and involve some combination of lifts. This isn’t great. Further, suppose we want arbitrary reordering of the monad stack. The number of instances you’ll have to write scales quadratically. Once you get to six or seven layers, this is a real pain.

Main Ideas of Eff

We can get much better composability by using free monads. I’m not going to get into the conceptual details of free monads. Instead I’ll show how to implement them using the Eff monad from the Freer Effects library. Let's first think back to how we define constraints on monads in our handler functions.

fetchUsersHandler :: (MonadDatabase m, MonadCache m) => Int64 -> m User

We take some generic monad, and constrain it to implement our type classes. With Eff, we’ll specify constraints in a different way. We have only one monad, the Eff monad. This monad is parameterized by a type-level list of other monads that it carries on its stack.

type JustIO a = Eff ‘[IO] a

type ReaderWithIO a = Eff ‘[Reader RedisInfo, IO] a

With this in mind, we can specify constraints on what monads are part of our stack using Member. Here’s how we’ll re-write the type signature from above:

fetchUsersHandler :: (Member Database r, Member Cache r) => Int64 -> Eff r User

We’ll specify exactly what Database and Cache are in the next section. But in essence, we’re stating that we have these two kinds of effects that live somewhere on our monad stack r. It doesn’t matter what order they’re in! This gives us a lot of flexibility. But before we see why, let’s examine how we actually write these effects.

Coding Up Effects

The first thing we’ll do is represent our effects as data types, rather than type classes. Starting with our database functionality, we’ll make a type Database a. This type will have one constructor for each function from our MonadDatabase typeclass. We’ll capitalize the names since they’re constructors instead of functions names. Then we’ll use GADT syntax, so that the result will be of type Database instead of a function in a particular monad. To start, here’s what our FetchUserDB constructor looks like:

{-# LANGUAGE GADTs #-}

data Database a where
  FetchUserDB :: Int64 -> Database (Maybe User)
  ...

Our previous definition looked like Int64 -> m (Maybe User). But we’re now constructing a Database action. Here’s the rest of the definition:

data Database a where
  FetchUserDB :: Int64 -> Database (Maybe User)
  CreateUserDB :: User -> Database Int64
  DeleteUserDB :: Int64 -> Database ()
  FetchArticleDB :: Int64 -> Database (Maybe Article)
  CreateArticleDB :: Article -> Database Int64
  DeleteArticleDB :: Int64 -> Database ()
  FetchArticlesByAuthor :: Int64 -> Database [KeyVal Article]
  FetchRecentArticles :: Database [(KeyVal User, KeyVal Article)]

Now we can also do the same thing with a Cache type instead of our MonadCache class:

data Cache a where
  CacheUser :: Int64 -> User -> Cache ()
  FetchCachedUser :: Int64 -> Cache (Maybe User)
  DeleteCachedUser :: Int64 -> Cache ()

Now, unfortunately, we do need some boilerplate with Eff. For each of constructor we create, we’ll need a function to run that item within the Eff monad. For these, we’ll use the send function from the Eff library. Each function states that our effect type is a member of our monad set. Then it will otherwise match the type of that constructor, only within the Eff monad. Here are the three examples for our Cache type.

cacheUser :: (Member Cache r) => Int64 -> User -> Eff r ()
cacheUser uid user = send $ CacheUser uid user

fetchCachedUser :: (Member Cache r) => Int64 -> Eff r (Maybe User)
fetchCachedUser = send . FetchCachedUser

deleteCachedUser :: (Member Cache r) => Int64 -> Eff r ()
deleteCachedUser = send . DeleteCachedUser

But wait! You might be asking, aren’t we trying to avoid boilerplate? Well, it’s hard to avoid all boilerplate. But the real gain we’ll get is that our boilerplate will scale in a linear fashion. We only need this code once per effect type we create. Remember, the alternative is quadratic growth.

Interpreting our Effects

To write "interpretations" of our effects in the type class system, we wrote instances. Here, we can do it with functions that we'll prefix with run. These will assume we have an action where our effect is on "top" of the monad stack. The result will be a new action with that layer peeled off.

runDatabase :: Eff (Database ': r) a -> Eff r a
runDatabase = ...

Now, we have to consider, what would be necessary to run our database effects? For our production application, we need to know that SqlPersistT lives in the monad stack. So we’ll add (SqlPersistT (LoggingT IO)) as a constraint on the rest of the r for our monad.

runDatabase :: (Member (SqlPersistT (LoggingT IO)) r) => Eff (Database ': r) a -> Eff r a

So we are in effect constraining the ordering of our monad, but doing it in a logical way. It wouldn’t make sense for us to ever run our database effects without knowing about the database itself.

To write this function, we specify a transformation between this Member of the rest of our stack and our Database type. We can run this transformation with runNat:

runDatabase :: (Member (SqlPersistT (LoggingT IO)) r) => Eff (Database ': r) a -> Eff r a
runDatabase = runNat databaseToSql
  where
    databaseToSql :: Database a -> SqlPersistT (LoggingT IO) a
    ...

Now we need a conversion between a Database object and a SqlPersistT action. For this, we plug in all the different function definitions we’ve been using all along. For instance, here’s what our fetchUserDB and createDB definitions look like:

databaseToSql (FetchUserDB uid) = get (toSqlKey uid)
databaseToSql (CreateUserDB user) = fromSqlKey <$> insert user

Our other constructors will follow this pattern as well.

Now, we’ll also want a way to interpret SqlPersistT effects within Eff. We’ll depend on only having IO as a deeper member within the stack here, though we also need the PGInfo parameter. Then we use runNat and convert between our SqlPersistT action and a normal IO action. We’ve done this before with runPGAction:

runSqlPersist :: (Member IO r) => PGInfo -> Eff ((SqlPersistT (LoggingT IO)) ': r) a -> Eff r a
runSqlPersist pgInfo = runNat $ runPGAction pgInfo

We go through this same process with Redis and our cache. To run a Redis action from our monad stack, we have to take the RedisInfo as a parameter and then also have IO on our stack:

runRedisAction :: (Member IO r) => RedisInfo -> Eff (Redis ': r) a -> Eff r a
runRedisAction redisInfo = runNat redisToIO
  where
    redisToIO :: Redis a -> IO a
    redisToIO action = do
      connection <- connect redisInfo
      runRedis connection action

Once we have this transformation, we can use the dependency on Redis to run Cache actions.

runCache :: (Member Redis r) => Eff (Cache ': r) a -> Eff r a
runCache = runNat cacheToRedis
  where
    cacheToRedis :: Cache a -> Redis a
    cacheToRedis (CacheUser uid user) = void $ setex (pack . show $ uid) 3600 (pack . show $ user)
    cacheToRedis (FetchCachedUser uid) = do
      result <- get (pack . show $ uid)
      case result of
        Right (Just userString) -> return $ Just (read . unpack $ userString)
        _ -> return Nothing
    cacheToRedis (DeleteCachedUser uid) = void $ del [pack . show $ uid]

And now we're done with our interpretations!

A Final Natural Transformation

Since we’re using Servant, we’ll still have to pick a final ordering. We need a natural transformation from Eff to Handler. Thus we'll specify a specific order so we have a specific Eff. We’ll put our cache effects on the top of our stack, then database operations, and finally, plain old IO.

transformEffToHandler ::
  PGInfo ->
  RedisInfo ->
  (Eff '[Cache, Redis, Database, SqlPersistT (LoggingT IO), IO]) :~> Handler

So how do we define this transformation? As always, we’ll want to create an IO action that exposes an Either value so we can catch errors. First, we can use our different run functions to peel off all the layers on our stack until all we have is IO:

transformEffToHandler ::
  PGInfo ->
  RedisInfo ->
  (Eff '[Cache, Redis, Database, SqlPersistT (LoggingT IO), IO]) :~> Handler
transformEffToHandler pgInfo redisInfo = NT $ \action -> do
  -- ioAct :: Err ‘[IO] a
  let ioAct = (runSqlPersist pgInfo . runDatabase . runRedisAction redisInfo . runCache) action
  ...

When we only have a single monad on our stack, we can use runM to get an action in that monad. So we need to apply that to our action, handle errors, and return the result as a Handler!

transformEffToHandler ::
  PGInfo ->
  RedisInfo -> 
  (Eff '[Cache, Redis, Database, SqlPersistT (LoggingT IO), IO]) :~> Handler
transformEffToHandler pgInfo redisInfo = NT $ \action -> do
  let ioAct = (runSqlPersist pgInfo . runDatabase . runRedisAction redisInfo . runCache) action
  result <- liftIO (runWithServantHandler (runM ioAct))
  Handler $ either throwError return result

And with that we’re done! Here’s the big win with Eff. It’s quite easy for us to write a different transformation on a different ordering of the Stack. We just change the order in which we apply our run functions!

-- Put Database on top instead of Cache
transformEffToHandler :: 
  PGInfo -> 
  RedisInfo -> 
  (Eff '[Database, SqlPersistT (LoggingT IO), Cache, Redis, IO]) :~> Handler
transformEffToHandler pgInfo redisInfo = NT $ \action -> do
  let ioAct = (runRedisAction redisInfo . runCache . runSqlPersist pgInfo . runDatabase) action
  result <- liftIO (runWithServantHandler (runM ioAct))
  Handler $ either throwError return result

Can we avoid outside services with this approach? Sure! We can specify test interpretations of our effects that don’t use SqlPersistT or Redis. We’ll still have IO for reasons mentioned last week, but it’s still an easy change. We'll define separate runTestDatabase and runTestCache functions that use the same effects we saw last week. They’ll depend on using the State over our in-memory maps.

runTestDatabase :: 
  (Member (StateT (UserMap, ArticleMap, UserMap) IO) r) => 
  Eff (Database ': r) a -> 
  Eff r a
runTestDatabase = runNat databaseToState
  where
    databaseToState :: Database a -> StateT (UserMap, ArticleMap, UserMap) IO a
    …

runTestCache ::
  (Member (StateT (UserMap, ArticleMap, UserMap) IO) r) =>
  Eff (Cache ': r) a ->
  Eff r a
runTestCache = runNat cacheToState
  where
    cacheToState :: Cache a -> StateT (UserMap, ArticleMap, UserMap) IO a
    ...

Then we fill in the definitions with the same functions we used when writing our TestMonad. After that, we define another natural transformation, in the same pattern:

transformTestEffToHandler ::
  MVar (UserMap, ArticleMap, UserMap) ->
  Eff '[Cache, Database, StateT (UserMap, ArticleMap, UserMap) IO] :~> Handler
transformTestEffToHandler sharedMap = NT $ \action -> do
  let stateAct = (runTestDatabase . runTestCache) action
  result <- liftIO (runWithServantHandler (runEff stateAct))
  Handler $ either throwError return result
  where
    runEff :: Eff '[StateT (UserMap, ArticleMap, UserMap) IO] a -> IO a
    runEff action = do
      let stateAction = runM action
      runStateTWithPointer stateAction sharedMap

Incorporating our Interpretations

The final step we’ll take is to change a couple different type signatures within our API code. We’ll pass a new natural transformation to our Server function:

fullAPIServer :: 
  ((Eff '[Cache, Redis, Database, SqlPersistT (LoggingT IO), IO]) :~> Handler) ->
  Server FullAPI
fullAPIServer nt = ...

And then we’ll change all our handlers to use Eff with the proper members, instead of AppMonad:

fetchUsersHandler :: (Member Database r, Member Cache r) => Int64 -> Eff r User
createUserHandler :: (Member Database r) => User -> Eff r Int64
fetchArticleHandler :: (Member Database r) => Int64 -> Eff r Article
createArticleHandler :: (Member Database r) => Article -> Eff r Int64
fetchArticlesByAuthorHandler :: (Member Database r) => Int64 -> Eff r [KeyVal Article]
fetchRecentArticlesHandler :: (Member Database r) => Eff r [(KeyVal User, KeyVal Article)]

Conclusion

We’ve come a long way with our small application. It doesn’t do much. But it has served as a great launchpad for learning many interesting libraries and techniques. In particular, we’ve seen in these last few weeks how to organize effects within our application. With the Eff library, we can represent our effects with data types that we can re-order with ease.

If you’ve never tried Haskell before, give it a shot! Download our Getting Started Checklist and get going!

If you’ve done a little Haskell but aren’t set on your skills yet, maybe this article went over your head. That’s OK! You can work on your skills more with our Recursion Workbook!

Read More
James Bowen James Bowen

A Different Point of View: Interpreting our Monads Without Outside Services

Last week we updated our API to use some interesting monadic constructs. These allowed us to narrow down the places where effects could happen in our application. This week we’ll examine another advantage of this system. We’ll examine how we can simplify our tests and remove the dependency on outside services.

You can follow along this code by looking at the effects-2 branches on the Github repository. In effects-2-start, we’ve updated our tests to use the AppMonad instead of normal IO functions. We can still do better though (see the effects-2-end branch for the final product). We can create a second monad that implements our MonadDatabase and MonadCache classes. This creates what we call a different interpretation of our effects. We can do this in such a way that they don’t rely on running instances of Postgres and Redis.

Re-Imagining our Monad

Let’s imagine the simplest possible way to have a “database”. Instead of using a remote service, we could use in-memory maps. So let’s start with a couple type synonyms:

type UserMap = Map.Map Int64 User
type ArticleMap = Map.Map Int64 Article

There are three different maps in our application. The first map will be our normal Users table from the database. The second map will be the database’s Article table. The third map will refer to our Users cache. Now we’ll create a monad that links all these different elements together, and wraps them in StateT. We’ll then be able to update these maps between requests. We still need IO on our monad stack for reasons we’ll see later.

newtype TestMonad a = TestMonad (StateT (UserMap, ArticleMap, UserMap) IO a)
  deriving (Functor, Applicative, Monad)

instance MonadIO TestMonad where
  liftIO action = TestMonad $ liftIO action

Now we want to create instances of our database type classes for this monad. Let’s start an implementation of MonadDatabase by considering how we’ll fetch a user:

instance MonadDatabase TestMonad where
  fetchUserDB uid = ...

All we need to do is grab the first map out of our state tuple, and then use the normal Map lookup function! We can do the same with an article:

fetchUserDB uid = TestMonad $ do
  userDB <- (view _1) <$> get
  return $ Map.lookup uid userDB

fetchArticleDB aid = TestMonad $ do
  articleDB <- (view _2) <$> get
  return $ Map.lookup aid articleDB

Creating elements is a little more complicated, since we have to generate the keys. This isn’t that hard though! We’ll check if the map is empty and use 1 for the key if there are no entries. Otherwise find the max key and add 1 to it (note that the API for Map.findMax has changed since I wrote this) :

createUserDB user = TestMonad $ do
  (userDB, articleDB, userCache) <- get
  let newUid = if Map.null userDB
        then 1
        else 1 + (fst . Map.findMax) userDB
  ...

Now we’ll create a modified map by inserting our new element. Then we’ll put the modified map back in along with the other maps:

createUserDB user = TestMonad $ do
  (userDB, articleDB, userCache) <- get
  let newUid = if Map.null userDB
        then 1
        else 1 + (fst . Map.findMax) userDB
  let userDB' = Map.insert newUid user userDB
  put (userDB', articleDB, userCache)
  return newUid

createArticleDB article = TestMonad $ do
  (userDB, articleDB, userCache) <- get
  let newAid = if Map.null articleDB
        then 1
        else 1 + (fst . Map.findMax) articleDB
  let articleDB' = Map.insert newAid article articleDB
  put (userDB, articleDB', userCache)
  return newAid

Deletion follows the same general pattern. The only difference is we delete from the map instead of inserting!

deleteUserDB uid = TestMonad $ do
  (userDB, articleDB, userCache) <- get
  let userDB' = Map.delete uid userDB
  put (userDB', articleDB, userCache)

deleteArticleDB aid = TestMonad $ do
  (userDB, articleDB, userCache) <- get
  let articleDB' = Map.delete aid articleDB
  put (userDB, articleDB', userCache)

Now our final two functions will involve actually performing some application logic. To fetch articles by author, we get the list of articles in our database and filter it using the author ID:

fetchArticlesByAuthor uid = TestMonad $ do
  articleDB <- (view _2) <$> get
  return $ map KeyVal (filter articleByAuthor (Map.toList articleDB))
  where
    articleByAuthor (_, article) = articleAuthorId article == toSqlKey uid

For fetching the recent articles, we first sort all the articles in our map by timestamp. Then we take the ten most recent:

fetchRecentArticles = TestMonad $ do
  (userDB, articleDB, _) <- get
  let recentArticles = take 10 (sortBy orderByTimestamp (Map.toList articleDB)) 
  ...
  where
    orderByTimestamp (_, article1) (_, article2) =
      articlePublishedTime article2 `compare` articlePublishedTime article1

But now we have to match each of them with right user. This involves performing a lookup based on the user ID. But then we’re done!

fetchRecentArticles = TestMonad $ do
  (userDB, articleDB, _) <- get
  let recentArticles = take 10 (sortBy orderByTimestamp (Map.toList articleDB)) 
  return $ map (matchWithAuthor userDB) recentArticles
  where
    orderByTimestamp (_, article1) (_, article2) =
      articlePublishedTime article2 `compare` articlePublishedTime article1
    matchWithAuthor userDB (aid, article) =
      case Map.lookup (fromSqlKey (articleAuthorId article)) userDB of
        Nothing -> error "Found article with no user" 
        Just u -> (KeyVal (fromSqlKey (articleAuthorId article), u), KeyVal (aid, article))

Our instance for MonadCache is very similar. We'll manipulate the third map instead of the first 2:

instance MonadCache TestMonad where
  cacheUser uid user = TestMonad $ do
    (userDB, articleDB, userCache) <- get
    let userCache' = Map.insert uid user userCache
    put (userDB, articleDB, userCache')
  fetchCachedUser uid = TestMonad $ do
    userCache <- (view _3) <$> get
    return $ Map.lookup uid userCache
  deleteCachedUser uid = TestMonad $ do
    (userDB, articleDB, userCache) <- get
    let userCache' = Map.delete uid userCache
    put (userDB, articleDB, userCache')

Another Natural Transformation

Now we’re not quite done. We need the ability to run a version of our server that uses this interpretation of our effects. To do this, we need a natural transformation like we had before with AppMonad. Unfortunately, the StateT of our maps won’t get threaded through properly unless we use a pointer to it. This is why we need IO on our stack. Here’s a function that will use a pointer (MVar) to our state, run it, and then swap in the new map.

runStateTWithPointer :: (Exception e, MonadIO m) => StateT s m a -> MVar s -> m (Either e a)
runStateTWithPointer action ref = do
  env <- liftIO $ readMVar ref
  (val, newEnv) <- runStateT action env
  void $ liftIO $ swapMVar ref newEnv
  return $ Right val

Now for our transformation, we’ll take this pointer and run the state. Then we need to catch exceptions like we did in our transformation for AppMonad:

transformTestToHandler :: MVar (UserMap, ArticleMap, UserMap) -> TestMonad :~> Handler
transformTestToHandler sharedMap = NT $ \(TestMonad action) -> do
  result <- liftIO $ handleAny handler $
    runStateTWithPointer action sharedMap 
  Handler $ either throwError return result
  where
    handler :: SomeException -> IO (Either ServantErr a)
    handler e = return $ Left $ err500 { errBody = pack (show e) }

Now when we setup our tests, we’ll run our server using this transformation instead. Notice that we don’t have to do anything with Postgres or Redis here!

setupTests :: IO (ClientEnv, MVar (UserMap, ArticleMap, UserMap), ThreadId)
setupTests = do
  mgr <- newManager tlsManagerSettings
  baseUrl <- parseBaseUrl "http://127.0.0.1:8000"
  let clientEnv = ClientEnv mgr baseUrl
  let initialMap = (Map.empty, Map.empty, Map.empty)
  mapRef <- newMVar initialMap
  tid <- forkIO $
    run 8000 (serve usersAPI (testAPIServer (transformTestToHandler mapRef)))
  threadDelay 1000000
  return (clientEnv, mapRef, tid)

Now when our tests run, they’ll hit a server storing the information in memory instead of a Postgres server. This is super cool!

Integrating with our Tests

Unfortunately, it’s still a little awkward to write our tests. A lot of what they’re actually testing is the internal state of the “database” in question. So we need this function that takes the pointer to the map (the same pointer used by the server) and runs actions on it:

runTestMonad :: MVar (UserMap, ArticleMap, UserMap) -> TestMonad a -> IO a
runTestMonad mapVar (TestMonad action) = do
  currentState <- readMVar mapVar
  (result, newMap) <- runStateT action currentState
  swapMVar mapVar newMap
  return result

Now in our tests, we’ll wrap any calls to the database with this action. Here’s an example of our first before hook:

beforeHook1 :: ClientEnv -> MVar (UserMap, ArticleMap, UserMap) -> IO (Bool, Bool, Bool)
beforeHook1 clientEnv mapVar = do
  callResult <- runClientM (fetchUserClient 1) clientEnv
  let throwsError = isLeft callResult
  (inPG, inRedis) <- runTestMonad mapVar $ do
    inPG <- isJust <$> fetchUserDB 1
    inRedis <- isJust <$> fetchCachedUser 1
    return (inPG, inRedis)
  return (throwsError, inPG, inRedis)

One excellent consequence of using an in-memory map is that we don’t care if there’s data in our “database” at the end. Thus we can completely get rid of our after hooks, which were a bit of a pain!

main :: IO ()
main = do
  (clientEnv, dbMap, tid) <- setupTests
  hspec $ before (beforeHook1 clientEnv dbMap) spec1
  hspec $ before (beforeHook2 clientEnv dbMap) spec2
  hspec $ before (beforeHook3 clientEnv dbMap) spec3
  hspec $ before (beforeHook4 clientEnv dbMap) spec4
  hspec $ before (beforeHook5 clientEnv dbMap) spec5
  hspec $ before (beforeHook6 clientEnv dbMap) spec6
  killThread tid 
  return ()

And now our tests also run perfectly well without needing the docker container to be active! Hooray!

Conclusion

There’s a certain argument that we haven’t really accomplished much. Our app is very shallow, and most of the logic happens within the database calls themselves. Recall that many of our handler functions reduced to the database calls. Hence, the only thing we’re testing right now is our test interpretation!

But it’s easy to imagine that if our application were more complicated, this logic wouldn’t be at the core of our code. In most cases, database queries are the prelude to manipulating the data. And this TestMonad would remove the inconvenience of sourcing that data from outside.

Stay tuned for next week, where we’ll wrap up this consideration of effects by looking at free monads! We’ll consider the “freer-effects” library. It will let us cut down a bit on some of the boilerplate we get with this MTL style approach.

Never tried Haskell before? Do you have visions of conquering all foes with these sorts of abstractions? Check out our Getting Started Checklist and start your journey!

Have you dabbled a little but want to test your skills some more? Take a look at our Recursion Workbook!

Read More
James Bowen James Bowen

Organizing our Effects Effectively

In the last 5 weeks or so, we’ve built a web application exposing a small API. The application is quite narrow, encompassing only a small amount of functionality. But it is still deep, covering several different libraries and techniques.

In these next couple weeks, we’ll look at some architectural considerations. We’ll observe some of the weaknesses of this system, and how we can improve on them. This week will focus on an approach with type classes and monad transformers. In a couple weeks, we’ll consider free monads, and how we can use them.

You can follow along with this code on the effects-1 branch of the Github repo.

Weaknesses

In our current system, there are a lot of different functions like these:

fetchUserPG :: PGInfo -> Int64 -> IO (Maybe User)
createUserPG :: PGInfo -> User -> IO Int64
cacheUser :: RedisInfo -> Int64 -> User -> IO ()

Now, the parameters do inform us what each function should be accessing. But the functions are still regular IO functions. This means a novice programmer could come in and get the idea that it’s fine to use arbitrary effects. For instance, why not fetch our Postgres information from the Redis function? After all, fetchPGInfo is an IO function as well:

fetchPostgresConnection :: IO PGInfo
...

cacheUser :: RedisInfo -> Int64 -> User -> IO ()
cacheUser = do
  pgInfo <- fetchPostgresConnection
  -- Connect to Postgres instead of Redis :(

Our API also has some uncomfortable lifting in our handler functions. We have to call liftIO because all our database functions are IO functions.

fetchUsersHandler :: PGInfo -> RedisInfo -> Int64 -> Handler User
fetchUsersHandler pgInfo redisInfo uid = do
  -- liftIO #1
  maybeCachedUser <- liftIO $ fetchUserRedis redisInfo uid
  case maybeCachedUser of
    Just user -> return user
    Nothing -> do
      -- liftIO #2
      maybeUser <- liftIO $ fetchUserPG pgInfo uid
      case maybeUser of
        -- liftIO #3
        Just user -> liftIO (cacheUser redisInfo uid user) >> return user
        Nothing -> Handler $ (throwE $ err401 { errBody = "Could not find user with that ID" })

At the very least, our connection parameters are explicit here. If we hid them in a Reader, this would introduce even more lifts.

This article will focus on using type classes to restrict how we use effects. With any luck, we'll also clean up our code a bit and make it easier to test things. But we’ll focus more on testing more next week.

Now, depending on the project size and scope, these weaknesses might not be issues. But it’s definitely a useful exercise to see alternative ways to organize our code.

Defining our Type Classes

Our first step for limiting our effects will be to create two type classes. We'll have one for our main database, and one for our cache. We'll try to make these functions agnostic to the underlying database representation. Hence, we’ll change our API to remove the notion of Entity. We’ll replace it with the idea of KeyVal, a wrapper around a tuple.

newtype KeyVal a = KeyVal (Int64, a)

With that, here are the 8 functions we have for accessing our database:

class (Monad m) => MonadDatabase m where
  fetchUserDB :: Int64 -> m (Maybe User) 
  createUserDB :: User -> m Int64 
  deleteUserDB :: Int64 -> m ()
  fetchArticleDB :: Int64 -> m (Maybe Article)
  createArticleDB :: Article -> m Int64
  deleteArticleDB :: Int64 -> m ()
  fetchArticlesByAuthor :: Int64 -> m [KeyVal Article]
  fetchRecentArticles :: m [(KeyVal User, KeyVal Article)]

And then we have three functions for how we interact with our cache:

class (Monad m) => MonadCache m where
  cacheUser :: Int64 -> User -> m ()
  fetchCachedUser :: Int64 -> m (Maybe User)
  deleteCachedUser :: Int64 -> m ()

We can now create instances of these type classes for any different monad we want to use. Let’s start by describing implementations for our existing libraries.

Writing Instances

We’ll start with SqlPersistT. We want to make an instance of MonadDatabase for it. We'll gather all the different functionality from the last few articles.

instance (MonadIO m, MonadLogger m) => MonadDatabase (SqlPersistT m) where
  fetchUserDB uid = get (toSqlKey uid)

  createUserDB user = fromSqlKey <$> insert user

  deleteUserDB uid = delete (toSqlKey uid :: Key User)

  fetchArticleDB aid = ((fmap entityVal) . listToMaybe) <$> (select . from $ \articles -> do
    where_ (articles ^. ArticleId ==. val (toSqlKey aid))
    return articles)

  createArticleDB article = fromSqlKey <$> insert article

  deleteArticleDB aid = delete (toSqlKey aid :: Key Article)

  fetchArticlesByAuthor uid = do
    entities <- select . from $ \articles -> do
      where_ (articles ^. ArticleAuthorId ==. val (toSqlKey uid))
      return articles
    return $ unEntity <$> entities

  fetchRecentArticles = do
    tuples <- select . from $ \(users `InnerJoin` articles) -> do
      on (users ^. UserId ==. articles ^. ArticleAuthorId)
      orderBy [desc (articles ^. ArticlePublishedTime)]
      limit 10
      return (users, articles)
    return $ (\(userEntity, articleEntity) -> (unEntity userEntity, unEntity articleEntity)) <$> tuples

Since we’re removing Entity from our API, we use this unEntity function. It will give us back the key and value as a KeyVal:

unEntity :: (ToBackendKey SqlBackend a) => Entity a -> KeyVal a
unEntity (Entity id_ val_) = KeyVal (fromSqlKey id_, val_)

Now we’ll do the same with our cache functions. We’ll make an instance of MonadCache for the Redis monad:

instance MonadCache Redis where
  cacheUser uid user = void $ setex (pack . show $ uid) 3600 (pack . show $ user)
  fetchCachedUser uid = do
    result <- get (pack . show $ uid)
    case result of
      Right (Just userString) -> return $ Just (read . unpack $ userString)
      _ -> return Nothing
  deleteCachedUser uid = void $ del [pack . show $ uid]

And that’s all there is here! Let’s see how we can combine these for easy use within our API.

Making our App Monad

We’d like to describe an “App Monad” that will allow us to access both these functionalities with ease. We’ll make a wrapper around a monad transformer incorporating a Reader for the Redis information and the SqlPersistT monad. We derive Monad for this type using GeneralizedNewtypeDeriving:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

newtype AppMonad a = AppMonad (ReaderT RedisInfo (SqlPersistT (LoggingT IO)) a)
  deriving (Functor, Applicative, Monad)

Now we’ll want to make instances of MonadDatabase and MonadCache. The instances are easy though; we'll use the instances for the underlying monads. First, let's define a transformation from an SqlPersistT action to our AppMonad. We need to build out the ReaderT RedisInfo for this. We'll use the ReaderT constructor and ignore the info with const.

liftSqlPersistT :: SqlPersistT (LoggingT IO) a -> AppMonad a
liftSqlPersistT action = AppMonad $ ReaderT (const action)

We can also define a transformation on Redis actions:

liftRedis :: Redis a -> AppMonad a
liftRedis action = do
  info <- AppMonad ask
  connection <- liftIO $ connect info
  liftIO $ runRedis connection action

We'll apply our underlying instances like so:

instance MonadDatabase AppMonad where
  fetchUserDB = liftSqlPersistT . fetchUserDB
  createUserDB = liftSqlPersistT . createUserDB
  deleteUserDB = liftSqlPersistT . deleteUserDB
  fetchArticleDB = liftSqlPersistT . fetchArticleDB
  createArticleDB = liftSqlPersistT . createArticleDB
  deleteArticleDB = liftSqlPersistT . deleteArticleDB
  fetchArticlesByAuthor = liftSqlPersistT . fetchArticlesByAuthor
  fetchRecentArticles = liftSqlPersistT fetchRecentArticles

instance MonadCache AppMonad where
  cacheUser uid user = liftRedis (cacheUser uid user)
  fetchCachedUser = liftRedis . fetchCachedUser 
  deleteCachedUser = liftRedis . deleteCachedUser

And that's it! We have our instances. Now we want to move on and figure out how we’ll actually incorporate this new monad into our API.

Writing a Natural Transformation

We would like to make it so that our handler functions can use AppMonad instead of the Handler monad. But Servant is sort’ve hard-coded to use Handler, so what do we do? The answer is we define a “Natural Transformation”.

I found this term to be a bit like "category". It seems innocuous but actually refers to something deeply mathematical. But we don't need to know too much to use it. The type operator (:~>) defines a natural transformation. All we need to make it is a function that takes an action in our monad and converts it into an action in the Handler monad. We'll need to pass our connection information to make this work.

transformAppToHandler :: PGInfo -> RedisInfo -> AppMonad :~> Handler

We’ll start by defining a “handler” that will catch any errors we throw and recast them as Servant errors. In general, you want to list the specific types of exceptions you’ll catch. It's not a great idea to catch every exception like this. But for this example, we’ll keep it simple:

handler :: SomeException -> IO (Either ServantErr a)
handler e = return $ Left $ err500 { errBody = pack (show e)}

Notice this returns an Either which is always a Left. Let's now define how we convert an action from our “AppMonad” into an Either as well. We’ll get the result and pass it on as a Right value.

runAppAction :: Exception e => AppMonad a -> IO (Either e a)
runAppAction (AppMonad action) = do
  result <- runPGAction pgInfo $ runReaderT action redisInfo
  return $ Right result

And putting it together, here’s our transformation. We catch errors, and then wrap the result up in Handler.

transformAppToHandler :: PGInfo -> RedisInfo -> AppMonad :~> Handler
transformAppToHandler pgInfo redisInfo = NT $ \action -> do
  result <- liftIO (handleAny handler (runAppAction action))
  Handler $ either throwError return result
  ...

Incorporating the App Monad

All we have to do now is incorporate our new monad into our handlers. First off, let’s change our API to remove Entities:

type FullAPI =
       "users" :> Capture "userid" Int64 :> Get '[JSON] User
  :<|> "users" :> ReqBody '[JSON] User :> Post '[JSON] Int64
  :<|> "articles" :> Capture "articleid" Int64 :> Get '[JSON] Article
  :<|> "articles" :> ReqBody '[JSON] Article :> Post '[JSON] Int64
  :<|> "articles" :> "author" :> Capture "authorid" Int64 :> Get '[JSON] [KeyVal Article]
  :<|> "articles" :> "recent" :> Get '[JSON] [(KeyVal User, KeyVal Article)]

We want to update the type of each function. The AppMonad incorporates all the configuration information. So we don’t need to pass connection information explicitly. Instead, we can use constraints on our monad type classes to expose those effects. Here’s what our type signatures look like:

fetchUsersHandler :: (MonadDatabase m, MonadCache m) => Int64 -> m User
createUserHandler :: (MonadDatabase m) => User -> m Int64
fetchArticleHandler :: (MonadDatabase m) => Int64 -> m Article
createArticleHandler :: (MonadDatabase m)=> Article -> m Int64
fetchArticlesByAuthorHandler :: (MonadDatabase m) => Int64 -> m [KeyVal Article]
fetchRecentArticlesHandler :: (MonadDatabase m) => m [(KeyVal User, KeyVal Article)]

And now a lot of our functions are simple monadic calls. We don’t even need to use “lift”!

createUserHandler :: (MonadDatabase m) => User -> m Int64
createUserHandler = createUserDB

createArticleHandler :: (MonadDatabase m)=> Article -> m Int64
createArticleHandler = createArticleDB

fetchArticlesByAuthorHandler :: (MonadDatabase m) => Int64 -> m [KeyVal Article]
fetchArticlesByAuthorHandler = fetchArticlesByAuthor

fetchRecentArticlesHandler :: (MonadDatabase m) => m [(KeyVal User, KeyVal Article)]
fetchRecentArticlesHandler = fetchRecentArticles

The “fetch” functions are a bit more complicated since we’ll want to do stuff like check the cache first. But again, all our functions are simple monadic calls without using any lifting. Here’s how our fetch handlers look:

fetchUsersHandler :: (MonadDatabase m, MonadCache m) => Int64 -> m User
fetchUsersHandler uid = do
  maybeCachedUser <- fetchCachedUser uid
  case maybeCachedUser of
    Just user -> return user
    Nothing -> do
      maybeUser <- fetchUserDB uid
      case maybeUser of
        Just user -> cacheUser uid user >> return user
        Nothing -> error "Could not find user with that ID"

fetchArticleHandler :: (MonadDatabase m) => Int64 -> m Article
fetchArticleHandler aid = do
  maybeArticle <- fetchArticleDB aid
  case maybeArticle of
    Just article -> return article
    Nothing -> error "Could not find article with that ID"

And now we’ll change our Server function. We’ll update it so that it takes our natural transformation as an argument. Then we’ll use the enter function combined with that transformation. This is how Servant knows what monad we want for our handlers:

fullAPIServer :: (AppMoand :~> Handler) -> Server FullAPI
fullAPIServer naturalTransformation =
  enter naturalTransformation $
    fetchUsersHandler :<|>
    createUserHandler :<|>
    fetchArticleHandler :<|>
    createArticleHandler :<|>
    fetchArticlesByAuthorHandler :<|>
    fetchRecentArticlesHandler

runServer :: IO ()
runServer = do
  pgInfo <- fetchPostgresConnection
  redisInfo <- fetchRedisConnection
  -- Pass the natural transformation as an argument!
  run 8000 (serve usersAPI (fullAPIServer (transformAppToHandler pgInfo redisInfo)))

And now we’re done!

Weaknesses with this Approach

Of course, this system is not without it’s weaknesses. In particular, there’s quite a bit of boilerplate. This is especially true if we don’t want to fix the ordering of our monad stack. For instance what if another part of our application puts SqlPersistT on top of Redis? What if we want to mix other monad transformers in? We’ll need new instances of MonadDatabase and MonadCache for that. We'll end up writing a lot more simple definitions. We’ll examine solutions to this weakness in a couple weeks when we look at free monads.

We’ll also need to add new functions to our type classes every time we want to update their functionality. And then we’ll have to update EVERY instance of that typeclass, which can be quite a pain. The more instances we have, the more painful it will be to add new functionality.

Conclusion

So with a few useful tricks, we can come up with code that is a lot cleaner. We employed type classes to great effect to limit how effects appear in our application. By writing instances of these classes for different monads, we can change the behavior of our application. Next week, we’ll see how we can use this behavior to write simpler tests!

When managing an application with this many dependencies you need the right tools. I used Stack for all my Haskell project organization. Check out our free Stack mini-course to learn more!

But if you’ve never tried Haskell at all, give it a try! Take a look at our Getting Started Checklist.

Read More
James Bowen James Bowen

Join the Club: Type-safe Joins with Esqueleto!

In the last four articles or so, we’ve done a real whirlwind tour of Haskell libraries. We created a database schema using Persistent and used it to write basic SQL queries in a type-safe way. We saw how to expose this database via an API with Servant. We also went ahead and added some caching to that server with Redis. Finally, we wrote some basic tests around the behavior of this API. By using Docker, we made those tests reproducible.

In this article, we’re going to review this whole process by adding another type to our schema. We’ll write some new endpoints for an Article type, and link this type to our existing User type with a foreign key. Then we’ll learn one more library: Esqueleto. Esqueleto improves on Persistent by allowing us to write type-safe SQL joins.

As with the previous articles, there’s a specific branch on the Github repository for this series. Go there and take a look at the esqueleto branch to see the complete code for this article.

Adding Article to our Schema

So our first step is to extend our schema with our Article type. We’re going to give each article a title, some body text, and a timestamp for its publishing time. One new feature we’ll see is that we’ll add a foreign key referencing the user who wrote the article. Here’s what it looks like within our schema:

PTH.share [PTH.mkPersist PTH.sqlSettings, PTH.mkMigrate "migrateAll"] [PTH.persistLowerCase|
 User sql=users
   ...

 Article sql=articles
   title Text
   body Text
   publishedTime UTCTime
   authorId UserId
   UniqueTitle title
   deriving Show Read Eq
|]

We can use UserId as a type in our schema. This will create a foreign key column when we create the table in our database. In practice, our Article type will look like this when we use it in Haskell:

data Article = Article
 { articleTitle :: Text
 , articleBody :: Text
 , articlePublishedTime :: UTCTime
 , articleAuthorId :: Key User
 }

This means it doesn’t reference the entire user. Instead, it contains the SQL key of that user. Since we’ll be adding the article to our API, we need to add ToJSON and FromJSON instances as well. These are pretty basic as well, so you can check them out here if you’re curious. If you’re curious about JSON instances in general, take a look at this article.

Adding Endpoints

Now we’re going to extend our API to expose certain information about these articles. First, we’ll write a couple basic endpoints for creating an article and then fetching it by its ID:

type FullAPI = 
      "users" :> Capture "userid" Int64 :> Get '[JSON] User
 :<|> "users" :> ReqBody '[JSON] User :> Post '[JSON] Int64
 :<|> "articles" :> Capture "articleid" Int64 :> Get '[JSON] Article
 :<|> "articles" :> ReqBody '[JSON] Article :> Post '[JSON] Int64

Now, we’ll write a couple special endpoints. The first will take a User ID as a key and then it will provide all the different articles the user has written. We’ll do this endpoint as /articles/author/:authorid.

...
 :<|> "articles" :> "author" :> Capture "authorid" Int64 :> Get '[JSON] [Entity Article]

Our last endpoint will fetch the most recent articles, up to a limit of 10. This will take no parameters and live at the /articles/recent route. It will return tuples of users and their articles, both as entities.

…
 :<|> "articles" :> "recent" :> Get '[JSON] [(Entity User, Entity Article)]

Adding Queries (with Esqueleto!)

Before we can actually implement these endpoints, we’ll need to write the basic queries for them. For creating an article, we use the standard Persistent insert function:

createArticlePG :: PGInfo -> Article -> IO Int64
createArticlePG connString article = fromSqlKey <$> runAction connString (insert article)

We could do the same for the basic fetch endpoint. But we’ll write this basic query using Esqueleto in the interest of beginning to learn the syntax. With Persistent, we used list parameters to specify different filters and SQL operations. Esqueleto instead uses a special monad to compose the different type of query. The general format of an esqueleto select call will look like this:

fetchArticlePG :: PGInfo -> Int64 -> IO (Maybe Article)
fetchArticlePG connString aid = runAction connString selectAction
 where
   selectAction :: SqlPersistT (LoggingT IO) (Maybe Article)
   selectAction = select . from $ \articles -> do
     ...

We use select . from and then provide a function that takes a table variable. Our first queries will only refer to a single table, but we'll see a join later. To complete the function, we’ll provide the monadic action that will incorporate the different parts of our query.

The most basic filtering function we can call from within this monad is where_. This allows us to provide a condition on the query, much as we could with the filter list from Persistent.

selectAction :: SqlPersistT (LoggingT IO) (Maybe Article)
   selectAction = select . from $ \articles -> do
     where_ (articles ^. ArticleId ==. val (toSqlKey aid))

First, we use the ArticleId lens to specify which value of our table we’re filtering. Then we specify the value to compare against. We not only need to lift our Int64 into an SqlKey, but we also need to lift that value using the val function.

But now that we’ve added this condition, all we need to do is return the table variable. Now, select returns our results in a list. But since we’re searching by ID, we only expect one result. We’ll use listToMaybe so we only return the head element if it exists. We’ll also use entityVal once again to unwrap the article from its entity.

selectAction :: SqlPersistT (LoggingT IO) (Maybe Article)
   selectAction = ((fmap entityVal) . listToMaybe) <$> (select . from $ \articles -> do
     where_ (articles ^. ArticleId ==. val (toSqlKey aid))
     return articles)

Now we should know enough that we can write out the next query. It will fetch all the articles that have written by a particular user. We’ll still be querying on the articles table. But now instead checking the article ID, we’ll make sure the ArticleAuthorId is equal to a certain value. Once again, we’ll lift our Int64 user key into an SqlKey and then again with val to compare it in “SQL-land”.

fetchArticleByAuthorPG :: PGInfo -> Int64 -> IO [Entity Article]
fetchArticleByAuthorPG connString uid = runAction connString fetchAction
 where
   fetchAction :: SqlPersistT (LoggingT IO) [Entity Article]
   fetchAction = select . from $ \articles -> do
     where_ (articles ^. ArticleAuthorId ==. val (toSqlKey uid))
     return articles

And that’s the full query! We want a list of entities this time, so we’ve taken out listToMaybe and entityVal.

Now let’s write the final query, where we’ll find the 10 most recent articles regardless of who wrote them. We’ll include the author along with each article. So we’re returning a list of of these different tuples of entities. This query will involve our first join. Instead of using a single table for this query, we’ll use the InnerJoin constructor to combine our users table with the articles table.

fetchRecentArticlesPG :: PGInfo -> IO [(Entity User, Entity Article)]
fetchRecentArticlesPG connString = runAction connString fetchAction
 where
   fetchAction :: SqlPersistT (LoggingT IO) [(Entity User, Entity Article)]
   fetchAction = select . from $ \(users `InnerJoin` articles) -> do

Since we’re joining two tables together, we need to specify what columns we’re joining on. We’ll use the on function for that:

fetchAction :: SqlPersistT (LoggingT IO) [(Entity User, Entity Article)]
   fetchAction = select . from $ \(users `InnerJoin` articles) -> do
     on (users ^. UserId ==. articles ^. ArticleAuthorId)

Now we’ll order our articles based on the timestamp of the article using orderBy. The newest articles should come first, so we'll use a descending order. Then we limit the number of results with the limit function. Finally, we’ll return both the users and the articles, and we’re done!

fetchAction :: SqlPersistT (LoggingT IO) [(Entity User, Entity Article)]
   fetchAction = select . from $ \(users `InnerJoin` articles) -> do
     on (users ^. UserId ==. articles ^. ArticleAuthorId)
     orderBy [desc (articles ^. ArticlePublishedTime)]
     limit 10
     return (users, articles)

Caching Different Types of Items

We won’t go into the details of caching our articles in Redis, but there is one potential issue we want to observe. Currently we’re using a user’s SQL key as their key in our Redis store. So for instance, the string “15” could be such a key. If we try to naively use the same idea for our articles, we’ll have a conflict! Trying to store an article with ID “15” will overwrite the entry containing the User!

But the way around this is rather simple. What we would do is that for the user’s key, we would make the string something like users:15. Then for our article, we’ll have its key be articles:15. As long as we deserialize it the proper way, this will be fine.

Filling in the Server handlers

Now that we’ve written our database query functions, it is very simple to fill in our Server handlers. Most of them boil down to following the patterns we’ve already set with our other two endpoints:

fetchArticleHandler :: PGInfo -> Int64 -> Handler Article
fetchArticleHandler pgInfo aid = do
 maybeArticle <- liftIO $ fetchArticlePG pgInfo aid
 case maybeArticle of
   Just article -> return article
   Nothing -> Handler $ (throwE $ err401 { errBody = "Could not find article with that ID" })

createArticleHandler :: PGInfo -> Article -> Handler Int64
createArticleHandler pgInfo article = liftIO $ createArticlePG pgInfo article

fetchArticlesByAuthorHandler :: PGInfo -> Int64 -> Handler [Entity Article]
fetchArticlesByAuthorHandler pgInfo uid = liftIO $ fetchArticlesByAuthorPG pgInfo uid

fetchRecentArticlesHandler :: PGInfo -> Handler [(Entity User, Entity Article)]
fetchRecentArticlesHandler pgInfo = liftIO $ fetchRecentArticlesPG pgInfo

Then we’ll complete our Server FullAPI like so:

fullAPIServer :: PGInfo -> RedisInfo -> Server FullAPI
fullAPIServer pgInfo redisInfo =
 (fetchUsersHandler pgInfo redisInfo) :<|>
 (createUserHandler pgInfo) :<|>
 (fetchArticleHandler pgInfo) :<|>
 (createArticleHandler pgInfo) :<|>
 (fetchArticlesByAuthorHandler pgInfo) :<|>
 (fetchRecentArticlesHandler pgInfo)

One interesting thing we can do is that we can compose our API types into different sections. For instance, we could separate our FullAPI into two parts. First, we could have the UsersAPI type from before, and then we could make a new type for ArticlesAPI. We can glue these together with the e-plus operator just as we could individual endpoints!

type FullAPI = UsersAPI :<|> ArticlesAPI

type UsersAPI =
      "users" :> Capture "userid" Int64 :> Get '[JSON] User
 :<|> "users" :> ReqBody '[JSON] User :> Post '[JSON] Int64

type ArticlesAPI =
 "articles" :> Capture "articleid" Int64 :> Get '[JSON] Article
 :<|> "articles" :> ReqBody '[JSON] Article :> Post '[JSON] Int64
 :<|> "articles" :> "author" :> Capture "authorid" Int64 :> Get '[JSON] [Entity Article]
 :<|> "articles" :> "recent" :> Get '[JSON] [(Entity User, Entity Article)]

If we do this, we’ll have to make similar adjustments in other areas combining the endpoints. For example, we would need to update the server handler joining and the client functions.

Writing Tests

Since we already have some user tests, it would also be good to have a few tests on the Articles section of the API. We’ll add one simple test around creating an article and then fetching it. Then we’ll add one test each for the "articles-by-author" and "recent articles" endpoints.

So one of the tricky parts of filling in this section will be that we need to make test Article object. But we'll need them to be functions on the User ID. This is because we can’t know a priori what SQL IDs we'll get when we insert the users into the database. But we can fill in all the other fields, including the published time. Here’s one example, but we’ll have a total of 18 different “test” articles.

testArticle1 :: Int64 -> Article
testArticle1 uid = Article
 { articleTitle = "First post"
 , articleBody = "A great description of our first blog post body."
 , articlePublishedTime = posixSecondsToUTCTime 1498914000
 , articleAuthorId = toSqlKey uid
 }

-- 17 other articles and some test users as well
…

Our before hooks will create all these different entities in the database. In general, we’ll go straight to the database without calling the API itself. Like with our users tests, we'll want to delete any database items we create. Let's write a generic after-hook that will take user IDs and article IDs and delete them from our database:

deleteArtifacts :: PGInfo -> RedisInfo -> [Int64] -> [Int64] -> IO ()
deleteArtifacts pgInfo redisInfo users articles = do
 void $ forM articles $ \a -> deleteArticlePG pgInfo a
 void $ forM users $ \u -> do
   deleteUserCache redisInfo u
   deleteUserPG pgInfo u

It’s important we delete the articles first! If we delete the users first, we'll encounter foreign key exceptions!

Our basic create-and-fetch test looks a lot like the previous user tests. We test the success of the response and that the new article lives in Postgres as we expect.

beforeHook4 :: ClientEnv -> PGInfo -> IO (Bool, Bool, Int64, Int64)
beforeHook4 clientEnv pgInfo = do
 userKey <- createUserPG pgInfo testUser2
 articleKeyEither <- runClientM (createArticleClient (testArticle1 userKey)) clientEnv
 case articleKeyEither of
   Left _ -> error "DB call failed on spec 4!"
   Right articleKey -> do
     fetchResult <- runClientM (fetchArticleClient articleKey) clientEnv
     let callSucceeds = isRight fetchResult
     articleInPG <- isJust <$> fetchArticlePG pgInfo articleKey
     return (callSucceeds, articleInPG, userKey, articleKey)

spec4 :: SpecWith (Bool, Bool, Int64, Int64)
spec4 = describe "After creating and fetching an article" $ do
 it "The fetch call should return a result" $ \(succeeds, _, _, _) -> succeeds `shouldBe` True
 it "The article should be in Postgres" $ \(_, inPG, _, _) -> inPG `shouldBe` True

afterHook4 :: PGInfo -> RedisInfo -> (Bool, Bool, Int64, Int64) -> IO ()
afterHook4 pgInfo redisInfo (_, _, uid, aid) = deleteArtifacts pgInfo redisInfo [uid] [aid]

Our next test will create two different users and several different articles. We'll first insert the users and get their keys. Then we can use these keys to create the articles. We create five articles in this test. We assign three to the first user, and two to the second user:

beforeHook5 :: ClientEnv -> PGInfo -> IO ([Article], [Article], Int64, Int64, [Int64])
beforeHook5 clientEnv pgInfo = do
 uid1 <- createUserPG pgInfo testUser3
 uid2 <- createUserPG pgInfo testUser4
 articleIds <- mapM (createArticlePG pgInfo)
   [ testArticle2 uid1, testArticle3 uid1, testArticle4 uid1
   , testArticle5 uid2, testArticle6 uid2 ]
 ...

Now we want to test that we when call the articles-by-user endpoint, we only get the right articles. We’ll return each group of articles, the user IDs, and the list of article IDs:

beforeHook5 :: ClientEnv -> PGInfo -> IO ([Article], [Article], Int64, Int64, [Int64])
beforeHook5 clientEnv pgInfo = do
 uid1 <- createUserPG pgInfo testUser3
 uid2 <- createUserPG pgInfo testUser4
 articleIds <- mapM (createArticlePG pgInfo)
   [ testArticle2 uid1, testArticle3 uid1, testArticle4 uid1
   , testArticle5 uid2, testArticle6 uid2 ]
 firstArticles <- runClientM (fetchArticlesByAuthorClient uid1) clientEnv
 secondArticles <- runClientM (fetchArticlesByAuthorClient uid2) clientEnv
 case (firstArticles, secondArticles) of
   (Right as1, Right as2) -> return (entityVal <$> as1, entityVal <$> as2, uid1, uid2, articleIds)
   _ -> error "Spec 5 failed!"

Now we can write the assertion itself, testing that the articles returned are what we expect.

spec5 :: SpecWith ([Article], [Article], Int64, Int64, [Int64])
spec5 = describe "When fetching articles by author ID" $ do
 it "Fetching by the first author should return 3 articles" $ \(firstArticles, _, uid1, _, _) ->
   firstArticles `shouldBe` [testArticle2 uid1, testArticle3 uid1, testArticle4 uid1]
 it "Fetching by the second author should return 2 articles" $ \(_, secondArticles, _, uid2, _) ->
   secondArticles `shouldBe` [testArticle5 uid2, testArticle6 uid2]

We would then follow that up with a similar after hook.

The final test will follow a similar pattern. Only this time, we’ll be checking the combinations of users and articles. We’ll also make sure to include 12 different articles to test that the API limits results to 10.

beforeHook6 :: ClientEnv -> PGInfo -> IO ([(User, Article)], Int64, Int64, [Int64])
beforeHook6 clientEnv pgInfo = do
 uid1 <- createUserPG pgInfo testUser5
 uid2 <- createUserPG pgInfo testUser6
 articleIds <- mapM (createArticlePG pgInfo)
   [ testArticle7 uid1, testArticle8 uid1, testArticle9 uid1, testArticle10 uid2
   , testArticle11 uid2, testArticle12 uid1, testArticle13 uid2, testArticle14 uid2
   , testArticle15 uid2, testArticle16 uid1, testArticle17 uid1, testArticle18 uid2
   ]
 recentArticles <- runClientM fetchRecentArticlesClient clientEnv
 case recentArticles of
   Right as -> return (entityValTuple <$> as, uid1, uid2, articleIds)
   _ -> error "Spec 6 failed!"
 where
   entityValTuple (Entity _ u, Entity _ a) = (u, a)

Our spec will check that the list of 10 articles we get back matches our expectations. Then, as always, we remove the entities from our database.

Now we call these tests with our other tests, with small wrappers to call the hooks:

main :: IO ()
main = do
 ...
 hspec $ before (beforeHook4 clientEnv pgInfo) $ after (afterHook4 pgInfo redisInfo) $ spec4
 hspec $ before (beforeHook5 clientEnv pgInfo) $ after (afterHook5 pgInfo redisInfo) $ spec5
 hspec $ before (beforeHook6 clientEnv pgInfo) $ after (afterHook6 pgInfo redisInfo) $ spec6

And now we’re done! The tests pass!

…
After creating and fetching an article
 The fetch call should return a result
 The article should be in Postgres

Finished in 0.1698 seconds
2 examples, 0 failures

When fetching articles by author ID
 Fetching by the first author should return 3 articles
 Fetching by the second author should return 2 articles

Finished in 0.4944 seconds
2 examples, 0 failures

When fetching recent articles
 Should fetch exactly the 10 most recent articles

Conclusion

This completes our overview of useful production libraries. Over these articles, we’ve constructed a small web API from scratch. We’ve seen some awesome abstractions that let us deal with only the most important pieces of the project. Both Persistent and Servant generated a lot of extra boilerplate for us. This article showed the power of the Esqueleto library in allowing us to do type-safe joins. We also saw an end-to-end process of adding a new type and endpoints to our API.

In the coming weeks, we’ll be dealing with some more issues that can arise when building these kinds of systems. In particular, we’ll see how we can use alternative monads on top of Servant. Doing this can present certain issues that we'll explore. We’ll culminate by exploring the different approaches to encapsulating effects.

Be sure to check out our Haskell Stack mini-course!! It'll show you how to use Stack, so you can incorproate all the libraries from this series!

If you’re new to Haskell and not ready for that yet, take a look at our Getting Started Checklist and get going!

Read More
James Bowen James Bowen

Tangled Webs: Testing an Integrated System

In the last few articles, we’ve combined several useful Haskell libraries to make a small web app. We used Persistent to create a schema with automatic migrations for our database. Then we used Servant to expose this database as an API through a couple simple queries. Finally, we used Redis to act as a cache so that repeated requests for the same user happen faster.

For our next step, we’ll tackle a thorny issue: testing. How do we test a system that has so many moving parts? There are a couple different general approaches we could take. On one end of the spectrum we could mock out most of our API calls and services. This helps gives our testing deterministic behavior. This is desirable since we would want to tie deployments to test results. But we also want to be faithfully testing our API. So on the other end, there’s the approach we’ll try in this article. We'll set up functions to run our API and other services, and then use before and after hooks to use them.

Creating Client Functions for Our API

Calling our API from our tests means we’ll want a way to make API calls programmatically. We can do this with amazing ease with a Servant API by using the servant-client library. This library has one main function: client. This function takes a proxy for our API and generates programmatic client functions. Let's remember our basic endpoint types (after resolving connection information parameters):

fetchUsersHandler :: Int64 -> Handler User
createUserHandler :: User -> Handler Int64

We’d like to be able to call these API’s with functions that use the same parameters. Those types might look something like this:

fetchUserClient :: Int64 -> m User
createUserClient :: User -> m Int64

Where m is some monad. And in this case, the ServantClient library provides such a monad, ClientM. So let’s re-write these type signatures, but leave them seemingly unimplemented:

fetchUserClient :: Int64 -> ClientM User
createUserClient :: User -> ClientM Int64

Now we’ll construct a pattern match that combines these function names with the :<|> operator. As always, we need to make sure we do this in the same order as the original API type. Then we’ll set this pattern to be the result of calling client on a proxy for our API:

fetchUserClient :: Int64 -> ClientM User
createUserClient :: User -> ClientM Int64
(fetchUserClient :<|> createUserClient) = client (Proxy :: Proxy UsersAPI)

And that’s it! The Servant library fills in the details for us and implements these functions! We’ll see how we can actually call these functions later in the article.

Setting Up the Tests

We’d like to get to the business of deciding on our test cases and writing them. But first we need to make sure that our tests have a proper environment. This means 3 things. First we need to fetch the connection information for our data stores and API. This means the PGInfo, the RedisInfo, and the ClientEnv we’ll use to call the client functions we wrote. Second, we need to actually migrate our database so it has the proper tables. Third, we need to make sure our server is actually running. Let’s start with the connection information, as this is easy:

import Database (fetchPostgresConnection, fetchRedisConnection)

...

setupTests = do
  pgInfo <- fetchPostgresConnection
  redisInfo <- fetchRedisConnection
  ...

Now to create our client environment, we’ll need two main things. We’ll need a manager for the network connections and the base URL for the API. Since we’re running the API locally, we’ll use a localhost URL. The default manager from the Network library will work fine for us:

import Network.HTTP.Client (newManager)
import Network.HTTP.Client.TLS (tlsManagerSettings)
import Servant.Client (ClientEnv(..))

main = do
  pgInfo <- fetchPostgresConnection
  redisInfo <- fetchRedisConnection
  mgr <- newManager tlsManagerSettings
  baseUrl <- parseBaseUrl "http://127.0.0.1:8000"
  let clientEnv = ClientEnv mgr baseUrl

Now we can run our migration, which will ensure that our users table exists:

import Schema (migrateAll)

main = do
  pgInfo <- fetchPostgresConnection
  ...
  runStdoutLoggingT $ withPostgresqlConn pgInfo $ \dbConn ->
    runReaderT (runMigrationSilent migrateAll) dbConn

Last of all, we’ll start our server with runServer from our API module. We’ll fork this off to a separate thread, as otherwise it will block the test thread! We’ll wait for a second afterward to make sure it actually loads before the tests run (there are less hacky ways to do this of course). But then we’ll return all the important information we need, and we're done with test setup:

main :: IO (PGInfo, RedisInfo, ClientEnv, ThreadID)
main = do
  pgInfo <- fetchPostgresConnection
  redisInfo <- fetchRedisConnection
  mgr <- newManager tlsManagerSettings
  baseUrl <- parseBaseUrl "http://127.0.0.1:8000"
  let clientEnv = ClientEnv mgr baseUrl
  runStdoutLoggingT $ withPostgresqlConn pgInfo $ \dbConn ->
    runReaderT (runMigrationSilent migrateAll) dbConn
  threadId <- forkIO runServer
  threadDelay 1000000
  return (pgInfo, redisInfo, clientEnv, serverThreadId)

Organizing our 3 Test Cases

Now that we’re all set up, we can decide on our test cases. We’ll look at 3. First, if we have an empty database and we fetch a user by some arbitrary ID, we’ll expect an error. Further, we should expect that the user does not exist in the database or in the cache, even after calling fetch.

In our second test case, we’ll look at the effects of calling the create endpoint. We’ll save the key we get from this endpoint. Then we’ll verify that this user exists in the database, but NOT in the cache. Finally, our third case will insert the user with the create endpoint and then fetch the user. We’ll expect at the end of this that in fact the user exists in both the database AND the cache.

We organize each of our tests into up to three parts: the “before hook”, the test assertions, and the “after hook”. A “before hook” is some IO code we’ll run that will return particular results to our test assertion. We want to make sure it’s done running BEFORE any test assertions. This way, there’s no interleaving of effects between our test output and the API calls. Each before hook will first make the API calls we want. Then they'll investigate our different databases and determine if certain users exist.

We also want our tests to be database-neutral. That is, the database and cache should be in the same state after the test as they were before. So we’ll also have “after hooks” that run after our tests have finished (if we’ve actually created anything). The after hooks will delete any new entries. This means our before hooks also have to pass the keys for any database entities they create. This way the after hooks know what to delete.

Last of course, we actually need the testing code that assertions about the results. These will be pretty straightforward as we’ll see below.

Test #1

For our first test, we’ll start by making a client call to our API. We use runClientM combined with our clientEnv and the fetchUserClient function. Next, we’ll determine that the call in fact returns an error as it should. Then we’ll add two more lines checking if there’s an entry with the arbitrary ID in our database and our cache. Finally, we return all three boolean values:

beforeHook1 :: ClientEnv -> PGInfo -> RedisInfo -> IO (Bool, Bool, Bool)
beforeHook1 clientEnv pgInfo redisInfo = do
  callResult <- runClientM (fetchUserClient 1) clientEnv
  let throwsError = isLeft (callResult)
  inPG <- isJust <$> fetchUserPG pgInfo 1
  inRedis <- isJust <$> fetchUserRedis redisInfo 1
  return (throwsError, inPG, inRedis)

Now we’ll write our assertion. Since we’re using a before hook returning three booleans, the type of our Spec will be SpecWith (Bool, Bool, Bool). Each it assertion will take this boolean tuple as a parameter, though we’ll only use one for each line.

spec1 :: SpecWith (Bool, Bool, Bool)
spec1 = describe "After fetching on an empty database" $ do
  it "The fetch call should throw an error" $ \(throwsError, _, _) -> throwsError `shouldBe` True
  it "There should be no user in Postgres" $ \(_, inPG, _) -> inPG `shouldBe` False
  it "There should be no user in Redis" $ \(_, _, inRedis) -> inRedis `shouldBe` False

And that’s all we need for the first test! We don’t need an after hook since it doesn’t add anything to our database.

Tests 2 and 3

Now that we’re a little more familiar with how this code works, let’s take a quick look at the next before hook. This time we’ll first try creating our user. If this fails for whatever reason, we’ll throw an error and stop the tests. Then we can use the key to check out if the user exists in our database and Redis. We return the boolean values and the key.

beforeHook2 :: ClientEnv -> PGInfo -> RedisInfo -> IO (Bool, Bool, Int64)
beforeHook2 clientEnv pgInfo redisInfo = do
  userKeyEither <- runClientM (createUserClient testUser) clientEnv
  case userKeyEither of
    Left _ -> error "DB call failed on spec 2!"
    Right userKey -> do 
      inPG <- isJust <$> fetchUserPG pgInfo userKey
      inRedis <- isJust <$> fetchUserRedis redisInfo userKey
      return (inPG, inRedis, userKey)

Now our spec will look similar. This time we expect to find a user in Postgres, but not in Redis.

spec2 :: SpecWith (Bool, Bool, Int64)
spec2 = describe "After creating the user but not fetching" $ do
  it "There should be a user in Postgres" $ \(inPG, _, _) -> inPG `shouldBe` True
  it "There should be no user in Redis" $ \(_, inRedis, _) -> inRedis `shouldBe` False

Now we need to add the after hook, which will delete the user from the database and cache. Of course, we expect the user won’t exist in the cache, but we include this since we’ll need it in the final example:

afterHook :: PGInfo -> RedisInfo -> (Bool, Bool, Int64) -> IO ()
afterHook pgInfo redisInfo (_, _, key) = do
  deleteUserCache redisInfo key
  deleteUserPG pgInfo key

Last, we’ll write one more test case. This will mimic the previous case, except we’ll throw in a call to fetch in between. As a result, we expect the user to be in both Postgres and Redis:

beforeHook3 :: ClientEnv -> PGInfo -> RedisInfo -> IO (Bool, Bool, Int64)
beforeHook3 clientEnv pgInfo redisInfo = do
  userKeyEither <- runClientM (createUserClient testUser) clientEnv
  case userKeyEither of
    Left _ -> error "DB call failed on spec 3!"
    Right userKey -> do 
      _ <- runClientM (fetchUserClient userKey) clientEnv 
      inPG <- isJust <$> fetchUserPG pgInfo userKey
      inRedis <- isJust <$> fetchUserRedis redisInfo userKey
      return (inPG, inRedis, userKey)

spec3 :: SpecWith (Bool, Bool, Int64)
spec3 = describe "After creating the user and fetching" $ do
  it "There should be a user in Postgres" $ \(inPG, _, _) -> inPG `shouldBe` True
  it "There should be a user in Redis" $ \(_, inRedis, _) -> inRedis `shouldBe` True

And it will use the same after hook as case 2, so we’re done!

Hooking in and running the tests

The last step is to glue all our pieces together with hspec, before, and after. Here’s our main function, which also kills the thread running the server once it’s done:

main :: IO ()
main = do
  (pgInfo, redisInfo, clientEnv, tid) <- setupTests
  hspec $ before (beforeHook1 clientEnv pgInfo redisInfo) spec1
  hspec $ before (beforeHook2 clientEnv pgInfo redisInfo) $ after (afterHook pgInfo redisInfo) $ spec2
  hspec $ before (beforeHook3 clientEnv pgInfo redisInfo) $ after (afterHook pgInfo redisInfo) $ spec3
  killThread tid 
  return ()

And now our tests should pass!

After fetching on an empty database
  The fetch call should throw an error
  There should be no user in Postgres
  There should be no user in Redis

Finished in 0.0410 seconds
3 examples, 0 failures

After creating the user but not fetching
  There should be a user in Postgres
  There should be no user in Redis

Finished in 0.0585 seconds
2 examples, 0 failures

After creating the user and fetching
  There should be a user in Postgres
  There should be a user in Redis

Finished in 0.0813 seconds
2 examples, 0 failures

Using Docker

So when I say, “the tests pass”, they now work on my system. But if you were to clone the code as is and try to run them, you would get failures. The tests depend on Postgres and Redis, so if you don't have them running, they fail! It is quite annoying to have your tests depend on these outside services. This is the weakness of devising our tests as we have. It increases the on-boarding time for anyone coming into your codebase. The new person has to figure out which things they need to run, install them, and so on.

So how do we fix this? One answer is by using Docker. Docker allows you to create containers that have particular services running within them. This spares you from worrying about the details of setting up the services on your local machine. Even more important, you can deploy a docker image to your remote environments. So develop and prod will match your local system. To setup this process, we’ll create a description of the services we want running on our Docker container. We do this with a docker-compose file. Here’s what ours looks like:

version: '2'

services:
  postgres:
    image: postgres:9.6
    container_name: prod-haskell-series-postgres
    ports:
      - "5432:5432"

  redis:
    image: redis:4.0
    container_name: prod-haskell-series-redis
    ports:
      - "6379:6379"

Then, you can start these services for your Docker machines with docker-compose up. Granted, you do have to install and run Docker. But if you have several different services, this is a much easier on-boarding process. Better yet, the "compose" file ensures everyone uses the same versions of these services.

Even with this container running, the tests will still fail! That’s because you also need the tests themselves to be running on your Docker cluster. But with Stack, this is easy! We’ll add the following flag to our stack.yaml file:

docker:
  enable: true

Now, whenever you build and test your program, you will do so on Docker. The first time you do this, Docker will need to set everything up on the container. This means it will have to download Stack and ALL the different packages you use. So the first run will take a while. But subsequent runs will be normal. So after all that finishes, NOW the tests should work!

Conclusion

Testing integrated systems is hard. We can try mocking out the behavior of external services. But this can lead to a test representation of our program that isn’t faithful to the production system. But using the before and after hooks from Hspec is a great way make sure all your external events happen first. Then you can pass those results to simpler test assertions.

When it comes time to run your system, it helps if you can bring up all your external services with one command! Docker allows you to do this by listing the different services in the docker-compose file. Then, Stack makes it easy to run your program and tests on a docker container, so you can use the services!

Stack is the key to all this integration. If you’ve never used Stack before, you should check out our free mini-course. It will teach you all the basics of organizing a Haskell project using Stack.

If this is your first exposure to Haskell, I’ve hopefully convinced of some of its awesome possibilities! Take a look at our Getting Started Checklist and get learning!

Read More
James Bowen James Bowen

A Cache is Fast: Enhancing our API with Redis

In the last couple weeks we’ve used Persistent to store a User type in a Postgresql database. Then we were able to use Servant to create a very simple API that exposed this database to the outside world. This week, we’re going to look at how we can improve the performance of our API using a Redis cache.

One cannot overstate the importance of caching in both software and hardware. There's a hierarchy of memory types from registers, to RAM, to the File system, to a remote database. Accessing each of these gets progressively slower (by orders of magnitude). But the faster means of storage are more expensive, so we can’t always have as much as we'd like.

But memory usage operates on a very important principle. When we use a piece of memory once, we’re very likely to use it again in the near-future. So when we pull something out of long-term memory, we can temporarily store it in short-term memory as well. This way when we need it again, we can get it faster. After a certain point, that item will be overwritten by other more urgent items. This is the essence of caching.

Redis 101

Redis is an application that allows us to create a key-value store of items. It functions like a database, except it only uses these keys. It lacks the sophistication of joins, foreign table references and indices. So we can’t run the kinds of sophisticated queries that are possible on an SQL database. But we can run simple key lookups, and we can do them faster. In this article, we'll use Redis as a short-term cache for our user objects.

For this article, we've got one main goal for cache integration. Whenever we “fetch” a user using the GET endpoint in our API, we want to store that user in our Redis cache. Then the next time someone requests that user from our API, we'll grab them out of the cache. This will save us the trouble of making a longer call to our Postgres database.

Connecting to Redis

Haskell's Redis library has a lot of similarities to Persistent and Postgres. First, we’ll need some sort of data that tells us where to look for our database. For Postgres, we used a simple ConnectionString with a particular format. Redis uses a full data type called ConnectInfo.

data ConnectInfo = ConnectInfo
  { connectHost :: HostName -- String
  , connectPort :: PortId   -- (Can just be a number)
  , connectAuth :: Maybe ByteString
  , connectDatabase :: Integer
  , connectMaxConnection :: Int
  , connectMaxIdleTime :: NominalDiffTime
  }

This has many of the same fields we stored in our PG string, like the host IP address, and the port number. The rest of this article assumes you are running a local Redis instance at port 6379. This means we can use defaultConnectInfo. As always, in a real system you’d want to grab this information out of a configuration, so you’d need IO.

fetchRedisConnection :: IO ConnectInfo
fetchRedisConnection = return defaultConnectInfo

With Postgres, we used withPostgresqlConn to actually connect to the database. With Redis, we do this with the connect function. We'll get a Connection object that we can use to run Redis actions.

connect :: ConnectInfo -> IO Connection

With this connection, we simply use runRedis, and then combine it with an action. Here’s the wrapper runRedisAction we’ll write for that:

runRedisAction :: ConnectInfo -> Redis a -> IO a
runRedisAction redisInfo action = do
  connection <- connect redisInfo
  runRedis connection action

The Redis Monad

Just as we used the SqlPersistT monad with Persist, we’ll use the Redis monad to interact with our Redis cache. Our API is simple, so we’ll stick to three basic functions. The real types of these functions are a bit more complicated. But this is because of polymorphism related to transactions, and we won't be using those.

get :: ByteString -> Redis (Either x (Maybe ByteString))
set :: ByteString -> ByteString -> Redis (Either x ())
setex :: ByteString -> ByteString -> Int -> Redis (Either x ())

Redis is a key-value store, so everything we set here will use ByteString items. But once we’ve done that, these functions are all we need to use. The get function takes a ByteString of the key and delivers the value as another ByteString. The set function takes both the serialized key and value and stores them in the cache. The setex function does the same thing as set except that it also sets an expiration time for the item we’re storing.

Expiration is a very useful feature to be aware of, since most relational databases don’t have this. The nature of a cache is that it’s only supposed to store a subset of our information at any given time. If we never expire or delete anything, it might eventually store our whole database. That would defeat the purpose of using a cache! It's memory footprint should remain low compared to our database. So we'll use setex in our API.

Saving a User in Redis

So now let’s move on to the actions we’ll actually use in our API. First, we’ll write a function that will actually store a key-value pair of an Int64 key and the User in the database. Here’s how we start:

cacheUser :: ConnectInfo -> Int64 -> User -> IO ()
cacheUser redisInfo uid user = runRedisAction redisInfo $ setex ??? ??? ???

All we need to do now is convert our key and our value to ByteString values. We'll keep it simple and use Data.ByteString.Char8 combined with our Show and Read instances. Then we’ll create a Redis action using setex and expire the key after 3600 seconds (one hour).

import Data.ByteString.Char8 (pack, unpack)

...

cacheUser :: ConnectInfo -> Int64 -> User -> IO ()
cacheUser redisInfo uid user = runRedisAction redisInfo $ void $ 
  setex (pack . show $ uid) 3600 (pack . show $ user)

(We use void to ignore the result of the Redis call).

Fetching from Redis

Fetching a user is a similar process. We’ll take the connection information and the key we’re looking for. The action we’ll create uses the bytestring representation and calls get. But we can’t ignore the result of this call like we could before! Retrieving anything gives us Either e (Maybe ByteString). A Left response indicates an error, while Right Nothing indicates the key doesn’t exist. We’ll ignore the errors and treat the result as Maybe User though. If any error comes up, we’ll return Nothing. This means we run a simple pattern match:

fetchUserRedis :: ConnectInfo -> Int64 -> IO (Maybe User)
fetchUserRedis redisInfo uid = runRedisAction redisInfo $ do
  result <- Redis.get (pack . show $ uid)
  case result of
    Right (Just userString) -> return $ Just (read . unpack $ userString)
    _ -> return Nothing

If we do find something for that key, we’ll read it out of its ByteString format and then we’ll have our final User object.

Applying this to our API

Now that we’re all set up with our Redis functions, we have the update the fetchUsersHandler to use this cache. First, we now need to pass the Redis connection information as another parameter. For ease of reading, we’ll refer to these using type synonyms (PGInfo and RedisInfo) from now on:

type PGInfo = ConnectionString
type RedisInfo = ConnectInfo

…

fetchUsersHandler :: PGInfo -> RedisInfo -> Int64 -> Handler User
fetchUsersHandler pgInfo redisInfo uid = do
  ...

The first thing we’ll try is to look up the user by their ID in the Redis cache. If the user exists, we’ll immediately return that user.

fetchUsersHandler :: PGInfo -> RedisInfo -> Int64 -> Handler User
fetchUsersHandler pgInfo redisInfo uid = do
  maybeCachedUser <- liftIO $ fetchUserRedis redisInfo uid
  case maybeCachedUser of
    Just user -> return user
    Nothing -> do
      ...

If the user doesn’t exist, we’ll then drop into the logic of fetching the user in the database. We’ll replicate our logic of throwing an error if we find that user doesn’t actually exist. But if we find the user, we need one more step. Before we return it, we should call cacheUser and store it for the future.

fetchUsersHandler :: PGInfo -> RedisInfo -> Int64 -> Handler User
fetchUsersHandler pgInfo redisInfo uid = do
  maybeCachedUser <- liftIO $ fetchUserRedis redisInfo uid
  case maybeCachedUser of
    Just user -> return user
    Nothing -> do
      maybeUser <- liftIO $ fetchUserPG pgInfo uid
      case maybeUser of
        Just user -> liftIO (cacheUser redisInfo uid user) >> return user
        Nothing -> Handler $ (throwE $ err401 { errBody = "Could not find user with that ID" })

Since we changed our type signature, we’ll have to make a few other updates as well, but these are quite simple:

usersServer :: PGInfo -> RedisInfo -> Server UsersAPI
usersServer pgInfo redisInfo =
  (fetchUsersHandler pgInfo redisInfo) :<|> 
  (createUserHandler pgInfo)


runServer :: IO ()
runServer = do
  pgInfo <- fetchPostgresConnection
  redisInfo <- fetchRedisConnection
  run 8000 (serve usersAPI (usersServer pgInfo redisInfo))

And that’s it! We have a functioning cache with expiring entries. This means that repeated queries to our fetch endpoint should be much faster!

Conclusion

Caching is a vitally important way that we can write software that is often much faster for our users. Redis is a key-value store that we can use as a cache for our most frequently used data. We can use it as an alternative to forcing every single API call to hit our database. In Haskell, the Redis API requires everything to be a ByteString. So we have to deal with some logic surrounding encoding and decoding. But otherwise it operates in a very similar way to Persistent and Postgres.

Be sure to take a look at this code on Github! There’s a redis branch for this article. It includes all the code samples, including things I skipped over like imports!

We’re starting to get to the point where we’re using a lot of different libraries in our Haskell application! It pays to know how to organize everything, so package management is vital! I tend to use Stack for all my package management. It makes it quite easy to bring all these different libraries together. If you want to learn how to use Stack, check out our free Stack mini-course!

If you’ve never learned Haskell before, you should try it out! Download our Getting Started Checklist!

Read More