James Bowen James Bowen

Applicative Parsing I: Building the Foundation

applicative_parsing_1.png

Last week we prepared ourselves for parsing by going over the basics of the Gherkin Syntax. In this article and the next, we’ll be using the applicative parsing library to parse that syntax. This week, we’ll focus on the fundamentals of this library, and build up a vocabulary of combinators to use. We'll make heavy use of the Applicative typeclass. If you need a refresher on that, check out this article. As we start coding, you can also follow along with the examples on Github here! Most of the code here is in Parser.hs.

In the coming weeks, we’ll be seeing a couple other parsing libraries as well. If you want to get some ideas about these and more, download our Production Checklist. It summarizes many other useful libraries for writing higher level Haskell.

If you’ve never started writing Haskell, now’s your chance! Get our free Beginner’s Checklist and learn the basics of getting started!

Getting Started

So to start parsing, let’s make some notes about our input format. First, we’ll treat our input feature document as a single string. We’ll remove all empty lines, and then trim leading and trailing whitespace from each line.

parseFeatureFromFile :: FilePath -> IO Feature
parseFeatureFromFile inputFile = do
  fileContents <- lines <$> readFile inputFile
  let nonEmptyLines = filter (not . isEmpty) fileContents
  let trimmedLines = map trim nonEmptyLines
  let finalString = unlines trimmedLines
  case parseFeature finalString of
    ...

…
isEmpty :: String -> Bool
isEmpty = all isSpace

trim :: String -> String
trim input = reverse flippedTrimmed
  where
    trimStart = dropWhile isSpace input
    flipped = reverse trimStart
    flippedTrimmed = dropWhile isSpace flipped

This means a few things for our syntax. First, we don’t care about indentation. Second, we ignore extra lines. This means our parsers might allow certain formats we don’t want. But that’s OK because we’re trying to keep things simple.

The RE Type

With applicative based parsing, the main data type we’ll be working with is called RE, for regular expression. This represents a parser, and it’s parameterized by two types:

data RE s a = ...

The s type refers to the fundamental unit we’ll be parsing. Since we're parsing our input as a single String, this will be Char. Then the a type is the result of the parsing element. This varies from parser to parser. The most basic combinator we can use is sym. This parses a single symbol of your choosing:

sym :: s - > RE s s

parseLowercaseA :: RE Char Char
parseLowercaseA = sym ‘a’

To use an RE parser, we call the match function or its infix equivalent =~. These will return a Just value if we can match the entire input string, and Nothing otherwise:

>> match parseLowercaseA “a”
Just ‘a’
>> “b” =~ parseLowercaseA
Nothing
>> “ab” =~ parseLowercaseA
Nothing -- (Needs to parse entire input)

Predicates and Strings

Naturally, we’ll want some more complicated functionality. Instead of parsing a single input character, we can parse any character that fits a particular predicate by using psym. So if we want to read any character that was not a newline, we could do:

parseNonNewline :: RE Char Char
parseNonNewline = psym (/= ‘\n’)

The string combinator allows us to match a particular full string and then return it:

readFeatureWord :: RE Char String
readFeatureWord = string “Feature”

We’ll use this for parsing keywords, though we’ll often end up discarding the “result”.

Applicative Combinators

Now the RE type is applicative. This means we can apply all kinds of applicative combinators over it. One of these is many, which allows us to apply a single parser several times. Here is one combinator that we’ll use a lot. It allows us to read everything up until a newline and return the resulting string:

readUntilEndOfLine :: RE Char String
readUntilEndOfLine = many (psym (/= '\n'))

Beyond this, we’ll want to make use of the applicative <*> operator to combine different parsers. We can also apply a pure function (or constructor) on top of those by using <$>. Suppose we have a data type that stores two characters. Here’s how we can build a parser for it:

data TwoChars = TwoChars Char Char

parseTwoChars :: RE Char TwoChars
parseTwoChars = TwoChars <$> parseNonNewline <*> parseNonNewline

...

>> match parseTwoChars “ab”
Just (TwoChars ‘a’ ‘b’)

We can also use <* and *>, which are cousins of the main applicative operator. The first one will parse but then ignore the right hand parse result. The second discards the left side result.

parseFirst :: RE Char Char
parseFirst = parseNonNewline <* parseNonNewline

parseSecond :: RE Char Char
parseSecond = parseNonNewline *> parseNonnewline

…

>> match parseFirst “ab”
Just ‘a’
>> match parseSecond “ab”
Just ‘b’
>> match parseFirst “a”
Nothing

Notice the last one fails because the parser needs to have both inputs! We’ll come back to this idea of failure in a second. But now that we know this technique, we can write a couple other useful parsers:

readThroughEndOfLine :: RE Char String
readThroughEndOfLine = readUntilEndOfLine <* sym '\n'

readThroughBar :: RE Char String
readThroughBar = readUntilBar <* sym '|'

readUntilBar :: RE Char String
readUntilBar = many (psym (\c -> c /= '|' && c /= '\n'))

The first will parse the rest of the line and then consume the newline character itself. The other parsers accomplish this same task, except with the vertical bar character. We’ll need these when we parse the Examples section next week.

Alternatives: Dealing with Parse Failure

We introduced the notion of a parser “failing” up above. Of course, we need to be able to offer alternatives when a parser fails! Otherwise our language will be very limited in its structure. Luckily, the RE type also implements Alternative. This means we can use the <|> operator to determine an alternative parser when one fails. Let’s see this in action:

parseFeatureTitle :: RE Char String
parseFeatureTitle = string “Feature: “ *> readThroughEndOfLine

parseScenarioTitle :: RE Char String
parseScenarioTitle = string “Scenario: “ *> readThroughEndOfLine

parseEither :: RE Char String
parseEither = parseFeatureTitle <|> parseScenarioTitle

…

>> match parseFeatureTitle “Feature: Login\n”
Just “Login”
>> match parseFeatureTitle “Scenario: Login\n”
Nothing
>> match parseEither “Scenario: Login\n”
Just “Login”

Of course, if ALL the options fail, then we’ll still have a failing parser!

>> match parseEither “Random: Login\n”
Nothing

We’ll need this to introduce some level of choice into our parsing system. For instance, it’s up to the user if they want to include a Background as part of their feature. So we need to be able to read the background if it’s there or else move onto parsing a scenario.

Conclusion

That wraps up our introduction to the basic combinators of applicative parsing. Next week, we’ll take all the pieces we’ve developed here and put them to work on Gherkin syntax itself. Everything seems pretty small so far. But we’ll see that we can actually build up our results very rapidly once we have the basic pieces in place!

If you want to see some more libraries that are useful for important Haskell tasks, take a look at our Production Checklist. It will introduce you to some libraries for parsing, databases, APIs, and much more!

If you’re new to Haskell, there’s no better time to start! Download our free Beginners Checklist! It will help you download the right tools and start learning the language.

Read More
James Bowen James Bowen

Parsing Primer: Gherkin Syntax!

One topic I have yet to discuss on this blog is how to parse a domain specific language. This is odd, because Haskell has some awesome approaches for parsing. Haskell expressions tend to compose in awesome and simple ways. This provides an ideal environment in which to break down parsing into simpler tasks. Thus there are many excellent libraries out there.

In these next few weeks, we’ll be taking a tour of these different parsing libraries. But before we look at specific code, it will be useful to establish a common example for what we’re going to be parsing. In this article, I’ll introduce Gherkin Syntax, the language behind the Cucumber framework. We’ll go through the language specifics, then show the basics of how we set ourselves up for success in Haskell.

Gherkin Background

Cucumber is a framework for Behavior Driven Development. Under BDD, we first describe all the general behaviors we want our code to perform in plain language. This paradigm is an alternative to Test Driven Development. There, we use test cases to determine our next programming objectives. But BDD can do both of these if we can take behavior descriptions and automatically create tests from them! This would allow less technical members of a project team to effectively write tests!

The main challenge of this is formalizing a language for describing these behaviors. If we have a formal language, then we can parse it. If we can parse it into a reasonable structure, then we can turn that structure into runnable test code. This series will focus on the second part of this problem: turning Gherkin Syntax into a data structure (a Haskell data structure, in our case).

Gherkin Syntax

Gherkin syntax has many complexities, but for these articles we’ll be focusing on the core elements of it. The behaviors you want to test are broken down into a series of features. We describe each feature in its own .feature file. So our overarching task is to read input from a single file and turn it into a Feature object.

We begin our description of a feature with the Feature keyword (obviously). We'll give it a title, and then give it an indented description (our example will be a simple banking app):

Feature: Registering a User
  As a potential user
  I want to be able to create an account with a username,
    email and password
  So that I can start depositing money into my account

Each feature then has a series of scenarios. These describe specific cases of what can happen as part of this feature. Each scenario begins with the Scenario keyword and a title.

Scenario: Successful registration
  ...

Scenario: Email is already taken
  ...

Scenario: Username is already taken
  ...

Each scenario then has a series of Gherkin statements. These statements begin with one of the keywords Given, When, Then, or And. You should use Given statements to describe pre-conditions of the scenario. Then you’ll use When to describe the particular action a user is taking to initiate the scenario. And finally, you’ll use Then to describe the after effects.

Scenario: Email is already taken
  Given there is already an account with the email “test@test.com”
  When I register an account with username “test”,
    email “test@test.com” and password “1234abcd!?”
  Then it should fail with an error:
    “An account with that email already exists”

You can supplement any of these cases with a statement beginning with And.

Scenario: Email is already taken
  Given there is already an account with the email “test@test.com”
  And there is already an account with the username “test”
  When I register an account with username “test”,
    email “test@test.com” and password “1234abcd!?”
  Then it should fail with an error: 
    “An account with that email already exists”
  And there should still only be one account with 
    the email “test@test.com”

Gherkin syntax does not enforce that you use the different keywords in a semantically sound way. We could start every statement with Given and it would still work. But obviously you should do whatever you can to make your tests sound correct.

We can also fill in statements with variables in angle brackets. We'll then follow the scenario with a table of examples for those variables:

Scenario: Successful Registration
  Given There is no account with username <username>
    or email <email>
  When I register the account with username <username>,
    email <email> and password <password>
  Then it should successfully create the account
    with <username>, <email>, and <password>
  Examples:
    | username | email              | password      |
    | john doe | john@doe.com       | ABCD1234!?    |
    | jane doe | jane.doe@gmail.com | abcdefgh1.aba |
    | jackson  | jackson@yahoo.com  | cadsw4ll0p/   |

We can also create a Background for the whole feature. This is a scenario-like description of preconditions that exist for every scenario in that feature. This can also have an example table:

Feature: User Log In
  ...

Background:
  Given: There is an existing user with username <username>,
    email <email> and password <password>
  Examples:
    | username | email              | password      |
    | john doe | john@doe.com       | ABCD1234!?    |
    | jane doe | jane.doe@gmail.com | abcdefgh1.aba |

And that’s the whole language we’re going to be working with!

Haskell Data Structures

Let’s appreciate now how easy it is to create data structures in Haskell to represent this syntax. We’ll start with a description of a Feature. It has a title, description (which we’ll treat as a list of multiple lines), the background, and then a list of scenarios. We’ll also treat the background like a “nameless” scenario that may or may not exist:

data Feature = Feature
  { featureTitle :: String
  , featureDescription :: [String]
  , featureBackground :: Maybe Scenario
  , featureScenarios :: [Scenario]
  }

Now let’s describe what a Scenario is. It's main components are its title and a list of statements. We’ll also observe that we should have some kind of structure for the list of examples we'll provide:

data Scenario = Scenario
  { scenarioTitle :: String
  , scenarioStatements :: [Statement]
  , scenarioExamples :: ExampleTable
  }

This ExampleTable will store a list of possible keys as well as list of tuple maps. Each tuple will contain keys and values. At the scale we’re likely to be working at, it’s not worth it to use a full Map:

data ExampleTable = ExampleTable
  { exampleTableKeys :: [String]
  , exampleTableExamples :: [[(String, Value)]]
  }

Now we'll have to define what we mean by a Value. We’ll keep it simple and only use literal bools, strings, numbers, and a null value:

data Value =
  ValueNumber Scientific |
  ValueString String |
  ValueBool Bool |
  ValueNull

And finally we’ll describe a statement. This will have the string itself, as well as a list of variable keywords to interpolate:

data Statement = Statement
  { statementText :: String
  , statementExampleVariables :: [String]
  }

And that’s all there is too it! We can put all these types in a single file and feel pretty good about that. In Java or C++, we would want to make a separate file (or two!) for each type and there would be a lot more boilerplate involved.

General Parsing Approach

Another reason we’ll see that Haskell is good for parsing is the ease of breaking problems down into smaller pieces. We’ll have one function for parsing an example table, a different function for parsing a statement, and so on. Then gluing these together will actually be slick and simple!

Conclusion

Next week, come back and we’ll actually look at how we start parsing this. The first library we’ll use is the regex-applicative parsing library. We’ll see how we can get a lot of what we want without even using a monadic context!

For some more ideas on parsing libraries you can use, check out our free Production Checklist. It will tell you about different libraries for parsing as well as a great many other tasks, from data structures to web APIs!

If you’ve never written in Haskell before but are intrigued by the possibilities, download our Beginner’s Checklist and read our Liftoff Series!

Read More
James Bowen James Bowen

Monday Morning Haskell: Upgraded!

Welcome to the new Monday Morning Haskell! We just went live with the latest changes to the website this week. So it’s time to announce what’s coming next. Our main project right now is converting older blog content into permanent, organized, series. We currently have two sections for these series. One is focused on beginners, the other on more advanced Haskellers.

Beginners Section

The Beginners Section will obviously focus on content for people who are new to Haskell. Right now, there are two series of articles. The first is our Liftoff series. If you have never programmed in Haskell before, this is the series for you! You'll learn how to install Haskell on your system, as well as the core language mechanics.

The second set of articles is our Haskell Brain series. This series focuses on the mental side of learning Haskell. It goes through the psychological hurdles many people face when starting Haskell. It also goes over some interesting general techniques for learning.

Advanced Section

The Advanced Section features content for those trying to make the step up from hobbyists to professional Haskellers. It incorporates two of our more recent series from the blog. First up is the Web Skills series. This series goes through some interesting libraries for tasks you might need when building a Web backend. For instance, you'll learn about the Persistent database library, the Servant API library, and some general testing techniques.

The advanced section also features a series on Haskell and machine learning. It starts off by making the case for why Haskell is a good fit for machine learning in general. Then it goes through some specific examples. One highlight of this series is a tutorial on the Haskell Tensor Flow bindings. It also gives some examples of using dependent types within Tensor Flow.

Resources Update

We've also made a big update to our subscriber-only resources. All the resources are available on the resources page. If you're a subscriber to our email list, you should have gotten an email with the password to this page! If you're not subscribed yet, you can still sign up for free! You'll get access to all of these:

  1. Beginner's Checklist – The newly revised version of our Getting Started Checklist will help you review some of the core concepts of Haskell. It will also point you towards some additional resources to help you learn even more!
  2. Production Checklist – This NEW resource lists a large number of libraries you can use for production tasks. It goes well beyond the set covered in the web skills series and gives a short summary of each.
  3. Recursion Workbook – This workbook contains a couple chapters of content that will teach you all about recursion. Then it offers 10 practice problems so you can put your skills to the test!
  4. Stack Mini-Course – This mini-course will walk you through the basics of the Haskell Stack tool so you can actually make your own projects!
  5. Servant Tutorial – At BayHac 2016 I gave a talk on the Servant library. If you're a subscriber, you can get the slides and the sample code for that talk.
  6. Tensor Flow Guide – This guide accompanies our Haskell AI series. It goes through all the details you need to know about getting the Haskell Tensor Flow library up and running.

If you subscribe, you'll also get our monthly newsletter! This will detail what's new on the blog, and what content you can expect in the future!

The Blog

Going forward, I'll be continuing to take some older blog content and form it into coherent series. Most of my weekly blog posts for the time being will focus on announcing when these are available. I do have quite a bit more fresh content planned for the future though, so stay tuned! In the meantime, if there's an old blog article you're trying to find, you can use our search functionality! I've added tags to each blog post to help you out!

So don't forget, if you want access to our awesome resources, sign up for free!

Read More
James Bowen James Bowen

Functors Done Quick!

Suppose we're writing some code to deal with bank accounts. Most of our code will refer to these using a proper data type. But less refined parts of our code might use a tuple with the same information instead. We would want a conversion function to go between them. Here's a simple example:

data BankAccount = BankAccount
  { bankName :: String
  , ownerName :: String
  , accountBalance :: Double
  }

convertAccount :: (String, String, Double) -> BankAccount
convertAccout (bank, owner, balance) = BankAccount bank owner balance

Naturally, we'll want a convenience function for performing this operation on a list of items. We'll can use map for lists.

convertAccounts :: [(String, String, Double)] -> [BankAccount]
convertAccounts = map convertAccount

But Haskell has a plethora of different data structures. We can store our data in a Set, or a Vector, for a couple examples. What if different parts of our code store the data differently? They would need their own conversion functions, since the list version of map doesn't work on a Set or Vector. Can we make this code more generic?

Functors

If you read the blog post a couple weeks ago, you'll remember the idea of typeclasses. This is how we can make our code generic! We want to generalize the behavior of running a transformation over a data structure. We can make a typeclass to encapsulate this behavior. Luckily, Haskell already has such a typeclass, called Functor. It has a single function, fmap. Here is how it is defined:

class Functor f where
  fmap :: (a -> b)  ->  f a  ->  f b

If that type signature looks familiar, that's because it's almost identical to the map function over lists. And in fact, the List type uses map as it's implementation for fmap:

map :: (a  ->  b)  ->  [a]  ->  [b]

instance Functor [] where
  fmap = map

Other Functor Instances

Now, Set and Vector do have map functions. But to make our code generic, we have to define functor instances as a go-between:

instance Functor Vector where
  fmap = Data.Vector.map

instance Functor Set where
  fmap = Data.Set.map

With all this in mind, we can now rewrite convertAccounts generically.

convertAccounts :: (Functor f) => f (String, String, Double)  ->  f BankAccount
convertAccounts = fmap convertAccount

Now anything can use convertAccounts no matter how it structures the data, as long as it uses a functor! Let's looks at some of the other functors out there!

While it might not seem to fit in the same category as lists, vectors and sets, Maybe is also a functor! Here's its implementation:

instance Functor Maybe where
  fmap _ Nothing = Nothing
  fmap f (Just a) = Just (f a)

Another example of a functor is Either. This one is a little confusing since Either has two type parameters. But really, we have to fix the first parameter. Then the conversion function is only applied to the second. This means that, like with the Nothing case above, when we have Left, we return the original value:

instance Functor (Either a) where
  fmap _ (Left a) = Left a
  fmap f (Right x) = Right (f x)

Conceptualizing Functors

So concretely, Functor is a typeclass in Haskell. But how can we think of it conceptually? This is actually pretty simple. A functor is nothing more than a generic container or box. We don't know how many elements it contains. We don't know what the structure of those elements is. But if we have a way to transform those elements, we can apply that function over all of them. The result will be a new container with the same structure, but new elements. As far as abstractions go, this is probably the cleanest one we'll get, so enjoy it!

Conclusion

Functor is an example of typeclass that we can use to get general behavior. In this case, the behavior is transforming a group of objects in a container while maintaining the container's structure. We saw how this typeclass allowed us to re-use a function over many different types. Functors are the simplest in a series of important typeclasses. Applicative functors would come next, and then monads. Monads are vital to Haskell. So understanding functors is an important first step towards learning more complex Haskell.

But you can't learn about data structures until you know the basics! If you've never written any Haskell before, download out Getting Started Checklist! If you're comfortable with the basics and want more of a challenge, take a look at our Recursion Workbook!

Read More
James Bowen James Bowen

Need to be Faster? Be Lazy!

In more procedural and object oriented languages, we write code as a series of commands. These commands get executed in the order we write them in no matter what. Consider this example:

int myFunction(int a, int b, int c) {
  int result1 = longCalculation1(a,b);
  int result2 = longCalculation2(b,c);
  int result3 = longCalculation3(a,c);
  if (result1 < 10) {
    return result1;
  } else if (result2 < 100) {
    return result2;
  } else {
    return result3;
  }
}

There’s a clear inefficiency here. No matter what, we’ll perform all three long running operations. But we might not actually need all the results! We could rewrite the code to get around this.

int myFunction(int a, int b, int c) {
  int result1 = longCalculation1(a,b);
  if (result1 < 10) {
    return result1;
  } else {
    int result2 = longCalculation2(b,c);
    if (result2 < 100) {
     return result2;
    } else {
      int result3 = longCalculation3(a,c);
      return result3;
    }
  }
}

But now it’s a little less clear what’s going on. The code isn’t as readable. And there are some situations where this kind of refactoring is impossible. This is an inevitable consequence of the paradigm of eager evaluation in almost all mainstream languages. In Haskell we write expressions, rather than commands. Thus evaluation order is a little less clear. In fact, Haskell expressions are evaluated lazily. We don’t perform any calculations until we’re sure they’re needed! Let’s see how this works.

How Laziness Works

Here’s how we can write the function above in Haskell:

myFunction :: Int -> Int -> Int -> Int
myFunction a b c =
  let result1 = longCalculation1 a b
      result2 = longCalculation2 b c
      result3 = longCalculation3 a c
  in if result1 < 10
       then result1
       else if result2 < 100
         then result2
         else result3

While this seems semantically identical to the first C++ version, it actually runs as efficiently as the second version! In Haskell, result1, result2, and result3 get stored as “thunks”. GHC sets aside a piece of memory for the result, and knows what calculation it has to perform to get the result. But it doesn’t perform the calculation until we need the result.

Here’s another example. Suppose we want all Pythagorean triples whose sum is less than 1000. Sounds like a tall order. But enter the following into GHCI, and you’ll see that it happens very quickly!

>> let triples = [(a,b,c) | a <- [1..1000], b <- [1..1000], c <- [1..1000], a + b + c < 1000, (a ** 2) + (b**2) == c ** 2]

Did it perform all that calculation so quickly? Of course not! If you now print triples, it will take a while to print it all out. But suppose we only wanted 5 examples! It doesn’t take too long!

>> take 5 triples
[(3.0,4.0,5.0),(4.0,3.0,5.0),(5.0,12.0,13.0),(6.0,8.0,10.0),(7.0,24.0,25.0)]

As we see, an element is typically only brought into scope when it is needed by an IO action, such as print. If you’re using GHCI and print the result of a calculation, you’ll need to evaluate the whole calculation. Otherwise, you’ll need the calculation once it (or an expression that depends on it) gets printed by main.

Infinite Lists as a Consequence of Laziness

Besides potentially saving time, laziness has some other interesting consequences. One of these is that we can have data structures that can’t exist in other languages. For instance, we can define an “infinite” list:

>> let infList = [1..]

This list starts at 1, and each element counts up by 1, going up to infinity! But how is this possible? We don’t have an infinite amount of memory! The key is that we don’t actually bring any of the elements into scope until we need them. For example, we can take the first 10 elements of an infinite list.

>> take 10 [1..]
[1,2,3,4,5,6,7,8,9,10]

Of course, if we try to print the entire list, we’ll run into problems!

>> [1..]
(Endless printing of numbers)

But there are some cool things we can do with infinite lists. For instance, it’s easy to match up a list of elements with the numeric index of the element in the list. We can do this by using zip in conjunction with an infinite list:

addIndex :: [a] -> [(Int, a)]
addIndex = zip [1..]

Or we could match every element with its index modulo 4:

addIndexMod4 :: [a] -> [(Int, a)]
addIndexMod4 = zip (cycle [0,1,2,3])

Disadvantages of Laziness

Haskell’s laziness isn’t infallible. It can get us into trouble sometimes. While it often saves us time, it can cost us in terms of space. This is apparent even in a simple example using foldl.

>> foldl (+) 0 [1..100000000]
Stack overflow!

When we add the numbers up through 100 million, we should be able to do it with constant memory. All we would need would be a running tally of the current sum. On the one hand, laziness means that the entire list of a hundred million numbers is never all in scope at the same time. But on the other hand, all the calculations involved in that running tally happen lazily! So at some point, our memory footprint actually looks like this:

(((((1 + 2) + 3) + 4) + …) + 100000000)

That is, all the individual numbers are in memory at the same time because the + operations aren’t evaluated until they need to be! In situations like this example, we want to introduce strictness into our code. We can do this with the seq function. This function is a little special. It takes two arguments, and returns the second of them. However, it is strict in its first argument. That is, the first item we pass to it will be fully evaluated.

We can see this in use in the definition of foldl’, the strict counterpart to foldl:

foldl’ f accum [] = accum
foldl’ f accum (x : xs) = let newAccum = f accum x
                           in seq newAccum $ foldl’ f newAccum xs

The use of seq here causes Haskell to evaluate newAccum strictly, so we don't keep storing calculations in memory. Using this technique, we can now actually add up that list of integers!

>> foldl’ (+) 0 [1..100000000]
5000000050000000

Conclusion

Laziness is another feature that makes Haskell pretty unique among programming languages. Like any language feature, it has its drawbacks. It gives us yet another way we have to reason differently about Haskell compared to other languages. But it also has some distinct advantages. We can have some significantly faster code in many cases. It also allows us to use structures like infinite lists that don’t exist in other languages.

Hopefully this has convinced you to give Haskell a try. Take a look at our Getting Started Checklist and get going!

Read More
James Bowen James Bowen

Immutability: The Less Things Change, The More You Know

Most programmers take it for granted that they can change the values of their expressions. Consider this simple Python example:

> a = [1,2,3]
> a.reverse()
> a
[3,2,1]

We can see that the reverse function actually changed the underlying list. Here’s another example, this time in C++. We pass a pointer to an integer as a parameter, and we can update the integer within the function.

int myFunction(int* a) {
  int result = 0;
  if (*a % 2 == 0) {
    result = 10;
  } else {
    result = 20;
  }
  ++(*a);
  return result;
}

When we call this function, the original expression changes values.

int main() {
  int x = 4;
  int* xp = &x;
  int result = myFunction(xp);
  cout << result << endl;
  cout << *xp << endl;
}

Even though xp initially points to the value 4, when we print it at the end, the value is now 5! But as we’ll learn, Haskell does not, in general, allow us to do this! Let’s see how it works.

Immutability

In Haskell, all expressions are immutable! This means you cannot change the underlying value of something like you can in Python or C++. There are still some functions that appear to mutate things. But in general, they don’t change the original value. They create entirely new values! Let’s look at an example with reverse:

>> let a = [1,2,3]
>> reverse a
[3,2,1]
>> a
[1,2,3] -- unchanged!

The reverse function takes one argument, a list, and returns a list. But the final value is a totally new list! Observe how the original expression a remains the same! Compare this to the earlier Python example. The reverse function actually had a “void” return value. Instead, it changed the original list.

Record syntax is another example where we appear to mutate a value in Haskell. Consider this type and an accompanying mutator function:

data Person = Person
  { personName :: String
  , personAge :: Int
  } deriving (Show)

makeAdult :: Person -> Person
makeAdult person = person { personAge = 18}

But when we actually use the function, we’ll find again that it creates a totally new value! The old one stays the same!

>> let p = Person “John” 17
>> makeAdult p
Person {personName = “John”, personAge = 18}
>> p
Person {personName = “John”, personAge = 17}

Advantages of Immutability

Immutability might seem constraining at first. But it’s actually very liberating! Until you try programming with immutability by default, you don’t realize quite how many bugs mutable data causes. It is tremendously useful to know that your values cannot change. Suppose we take a list as a parameter to a function. In Haskell, we know that no matter how many functions we call with that list as a parameter, it will still be the same each time.

example :: [Int] -> Int
example myList = …
  where
    -- Each call uses the EXACT same list!
    result1 = function1 myList
    result2 = function2 result1 myList
    result3 = function3 result2 myList

Immutability also means you don’t have to worry about different ways to “copy” a data structure. Every copy is a shallow copy, since you can’t change the values within the structure anyway!

Ways Around Immutability

Naturally, there are situations where you want to have mutable data. But we can always simulate this effect in Haskell by using more advanced types! For instance, we can easily represent the C++ function above using the State monad.

myFunction :: State Int Int
myFunction = do
  a <- get
  let result = if a `mod` 2 == 0
        then 10
        else 20
  modify (+1) -- Change the underlying state value
  return result

{-
>> let x = 4
>> runState myFunction x
(10, 5)
>> x
4
-}

Again, this doesn’t actually “mutate” any data. When we pass x into our State function, x itself doesn’t change! Instead, the function returns a completely new value. Now, different calls to get can return us different values depending on the state. But this fact is encoded in the type system. We explicitly declare that there is an Int value that can change.

Of course, there are times where we actually do want to change the specific values in memory. One example of this is if we want to perform an in-place sort. We’ll have the move the elements of the array to different spots in memory. Otherwise we will have to allocate at least O(n) more space for the final list. In cases like this, we can use IO references. To sort an array, we’d want Data.Array.IO. For many other cases, we’ll just want the IORef type. Whenever you need to truly mutate data, you need to be in the IO monad.

Looking at all these examples, what we see is that Haskell doesn’t actually limit us at all! We can get all the same mutability effects we have in other languages. The difference is that in Haskell the default behavior is immutability. We have to use the type system to specify when we want mutable data.

Contrast this with C++. We can get immutable data by using the const keyword if we want. But the default is mutable data and we have to use the type system to make it immutable.

Conclusion

Immutability sounds crazy. But it does a huge amount to limit the kinds of bugs we can get. It seems like a big limitation on your code, but there are plenty of workarounds when you need them. The key fact is that mutable data is encoded within the type system. This forces you to be very conscious about when your data is mutable, and that will help you avoid bugs.

Want to see for yourself what the hype is about? Give Haskell a shot! Download our Getting Started Checklist and start learning Haskell!

Read More
James Bowen James Bowen

General Functions with Typeclasses

Last week, we looked at the basics of Haskell’s data types. We saw that haskell is not an object oriented language, and we don’t have inheritance between data types. This would get very confusing with all the different constructors that a data type can have. Haskell gives a lot of the same functionality as inheritance by using Typeclasses. This week we’ll take a quick look at this concept.

What is a Typeclass?

A typeclass encapsulates functionality that is common to different types. In practice, a typeclass describes a series of functions that you expect to exist for a given type. When these functions exist, you can create what is called an “instance” of a typeclass.

Typeclasses are a lot like interfaces in Java. You specify a group of functions, but only with the type signatures. Then for each relevant type, you'll need to specify an implementation for each function. As an example, suppose we had two different types referring to different kinds of people.

data Student = Student String Int

data Teacher = Teacher
  { teacherName:: String
  , teacherAge:: Int
  , teacherDepartment :: String
  , teacherSalary :: Int
  }

We could then make a typeclass called IsPerson. We'll give it a couple functions that refer to the name and age of the person. Then we parameterize the class by the single type a. We'll use that type parameter in the type signatures of the functions:

class IsPerson a where
  personName :: a -> String
  personAge :: a -> Int

Creating Instances of Typeclasses

Now let's create an instance of the typeclass. All we have to do is implement each function under the instance keyword:

instance IsPerson Student where
  personName (Student name _) = name
  personAge (Student _ age) = age

instance IsPerson Teacher where
  personName = teacherName
  personAge = teacherAge

There are a lot of simple typeclasses in the base libraries that you’ll need to know for some basic tasks. For instance, to compare two items for equality, you’ll need the Eq typeclass:

class Eq a where
  (==) :: a -> a -> Bool
  (/=) :: a -> a -> Bool

We can define instances for these for all our types. But for simple, base library classes like these, GHC can define them for us! All we need is the deriving keyword:

data Student = Student String Int
  deriving (Eq)

data Teacher = Teacher
  { teacherName:: String
  , teacherAge:: Int
  , teacherDepartment :: String
  , teacherSalary :: Int
  }
  deriving (Eq)

Using Typeclass Constraints

But why are typeclasses important? Well, often times we want to write code that is as general as possible. We want to write functions that assume as little about their inputs as they can. For instance, suppose we have this function that will print a teacher’s name:

printName :: Teacher -> IO ()
printName teacher = putStrLn $ personName teacher

We can use this function for more types than Teacher though! Any type that implements IsPerson will do. So we can make the function polymorphic, and add the IsPerson constraint on our a type:

printName :: (IsPerson a) => a-> IO ()
printName person = putStrLn $ personName person

You can also use typeclasses to constrain a type parameter of a new data type.

data (IsPerson a) => EmployeeRecord a = EmployeeRecord
  { employee :: a
  , employeeTenure :: Int
  }

Typeclasses can even provide a form of inheritance. You can constrain a typeclass by another typeclass! A couple base classes show an example of this. The “Orderable” typeclass Ord depends on the type having an instance of Eq:

class (Eq a) => Ord a where
  compare :: a -> a -> Ordering
  (<=) :: a -> a -> Bool
  ...

Conclusion

Haskell programmers like code that is as general as possible. Object oriented languages try to accomplish this with inheritance. But Haskell gets most of the same functionality with typeclasses instead. They describe common features between types, and provide a lot of flexibility.

To continue learning more about the Haskell basics, take a look at our Getting Started Checklist and get going!

Do you already understand the basics and want more of a challenge? Check out our Recursion Workbook!

Read More
James Bowen James Bowen

Haskell Data Types in 5 Steps

People often speak of a dichotomy between “object oriented” programming and “functional” programming. Haskell falls into the latter category, meaning we do more of our work with functions. We don't use hierarchies of objects to abstract work away. But Haskell is also heavily driven by its type system. So of course we still define our own data types in Haskell! Even better, Haskell has unique mechanisms you won't find in OO languages!

The Data Keyword and Constructors

In general, we define a new data type by using the data keyword, followed by the name of the type we’re defining. The type has to begin with a capital letter to distinguish it from normal expression names.

data Employee = ...

To start defining our type, we must provide a constructor. This is another capitalized word that allows you to create expressions of your new type. The constructor name is then followed by a list of 0 or more other types. These are like the “fields” that a data type carries in a language like Java or C++.

data Employee = Executive String Int Int

employee1 :: Employee
employee1 = Executive "Jane Doe" 38 300000

Sum Types

In a language like Java, you can have multiple constructors for a type. But the type will still encapsulate the same data no matter what constructor you use. In Haskell, you can have many constructors for your data type, separated by a vertical bar |. Each of your constructors then has its own list of data types! So different constructors of the same type can have different underlying data! We refer to a type with multiple constructors as a “sum” type.

data Employee =
 Executive String Int Int |
 VicePresident String String Int |
 Manager String String |
 Engineer String Int

If your type has only one constructor, it is not uncommon to re-use the name of the type as the constructor name:

data Employee = Employee String Int

Record Syntax

You can also define a type using “record syntax”. This allows you to provide field names to each type in the constructor. With these, you access the individual fields with simple functions. Otherwise, you'll need to resort to pattern matching. This is more commonly seen with types that use a single constructor. It is a good practice to prefix your field names with the type name to avoid name conflicts.

data Employee = Employee
 { employeeName :: String
 , employeeAge :: Int
 }

printName :: Employee -> IO ()
printName employee = putStrLn $ employeeName employee

Type Synonyms

As in C++, you can create type synonyms, providing a second name for a type. Sometimes, expressions can mean different things, even though they have the same representation. Type synonyms can help keep these straight. To make a synonym, use the type keyword, the new name you would like to use to refer to your type, and then the original type.

type InterestRate = Float
type BankBalance = Float

applyInterest :: BankBalance -> InterestRate -> BankBalance
applyInterest balance interestRate = balance + (balance * interestRate)

Note though that type synonyms have no impact on how your code compiles! This means it is still quite possible to misuse them! The following type signatures will still compile for this function:

applyInterest :: Float -> Float -> Float
applyInterest :: InterestRate -> BankBalance -> Float

Newtypes

To avoid the confusion that can occur above, you can use the newtype keyword. A newtype is like a cross between data and type. Like type, you’re essentially renaming a type. But you do this by writing a declaration that has exactly one constructor with exactly one type. As with a data declaration, you can use record syntax within newtypes.

newtype BankBalance = BankBalance Float
newtype InterestRate = InterestRate { unInterestRate :: Float }

Once you’ve done this, you will have to use the constructors (or record functions) to wrap and unwrap your code:

applyInterest :: BankBalance -> InterestRate -> BankBalance
applyInterest (BankBalance bal) rate = BankBalance $
 bal + (unInterestRate rate * bal)

Newtype declarations do affect how your code compiles. So the following invalid type signature will NOT compile!

applyInterest :: InterestRate -> BankBalance -> Float
applyInterest (BankBalance bal) (InterestRate rate) = ...

Conclusion

As we learned a couple weeks ago, types are important in Haskell. So it’s not surprising that Haskell has some nifty constructs for building our own types. Constructors and sum types give us the flexibility to choose what kind of data we want to store. We can even change the data stored for different elements of the same type! Type synonyms and newtypes give us two different ways to rename our types. The first is easy and helps avoid confusion. The second requires more code re-writing, but provides more type safety.

If you’ve never written a line of Haskell before, never fear! Take a look at our Getting Started Checklist to get going!

Read More
James Bowen James Bowen

Syntactic Implications of Expressions

Last week we explored expressions and types, the fundamental building blocks of Haskell. These concepts revealed some major structural differences between Haskell and procedural languages. This week we’ll consider the implications of these ideas. We'll see how they affect syntactic constructs in Haskell.

If Statements

Consider this Java function:

public int func(int input) {
  int z = 5;
  if (input % 2 == 0) {
    z = 4;
  }
  return z * input;
}

Here we see a very basic if-statement. Under a certain condition, we change the value of z. If the condition isn’t true we do nothing. But this is not how if statements work in Haskell! What lies inside the if statement is a command, and we compose our code with expressions and not commands!

In Haskell, an if-statement is an expression like anything else. This means it has to have a type! This constrains us in a couple ways. If the condition is true, we can supply an expression that will be the result. Then the type of the whole statement has the type of this expression. But what if the condition is not true? Can an expression be null or void? Most of the time no!* The following is rather problematic:

myValue :: Int -> ??? -- What type would this have if the condition is false?
myValue x = if x `mod` 2 == 0 then 5

This means that if-statements in Haskell must have an else branch. Furthermore, the else branch must have an expression that has the same type as in the true branch!

myValue :: Int -> Int -- Both the false and true branches are Int
myValue x = if x `mod` 2 == 0 then 5 else 3

Notice the real difference here. We’re used to saying “if x, do y”. But in Haskell, we assign an expression to be some value that may differ depending on a condition. So from a conceptual standpoint, the “if” is further inside the statement. This is one big hurdle to cross when first learning Haskell.

*Note: In monadic circumstances, a “null” value (represented as the unit type) can make sense. See when and unless.

Where Statements

We've established that everything in Haskell is an expression. But we often use commands to assign values to intermediate variables. How can we do this in Haskell? After all, if a computation is complicated, we don’t want to describe it all on one line.

The where statement fills this purpose. We can use where, followed by any number of statements under to assign names to intermediate values!

mathFunc :: Int -> Int -> Int -> Int
mathFunc a b c = sum + product + difference
  where
    sum = a + b + c
    product = a * b * c
    difference = c - b - a

Statements in a where clause can be in any order. They can depend on each other, as long as there are no dependency loops!

mathFunc :: Int -> Int -> Int -> Int
mathFunc a b c = sum + product + difference
  where
    product = a * b * c * sum
    sum = a + b + c
    difference = sum - b - product

Let Statements

There’s a second way of describing intermediate variables! We can also use a let statement, combined with in. Unlike with where, let bindings must be in the right order. They can't depend on later values. Here’s the above function, written using let:

mathFunc :: Int -> Int -> Int -> Int
mathFunc a b c =
  let sum = a + b + c
       product = a * b * c
       difference = c - b - a
  in sum + product + difference

Why do we have two ways of expressing the same concept? Well think about writing an essay. You'll often have terms you want to define for the reader. Sometimes, it makes more sense to define these before you use them. This is like using let. But sometimes, you can use the expression first, and show the details of how you calculated it later. This is what where is for! This especially works when the expression name is descriptive.

As a side note, there are also situations with monads where you can’t use where and have to use let. But that’s a topic for another time!

Conclusion

If you’re new Haskell, hopefully these short articles are giving you some quick insights into the language. For more details, take a look at our Getting Started Checklist! If you think you’ve mastered the basics and want to learn how to organize a real project, you should take our free Stack mini-course. It will teach you how to use the Stack tool to keep your Haskell code straight.

Read More
James Bowen James Bowen

Back to Basics: Expressions and Types, Distilled

This week we begin our “Haskell Distilled” series. These articles will look at important Haskell concepts and break them down into the most fundamental parts. This week we start with the essence of what defines Haskell as a language.

Everything is an Expression

In Haskell, almost everything we write is an expression. This goes from the simplest primitive elements to the most complicated functions. Each of the following is an expression:

True
False
4
9.8
[6, 2]
\a -> a + 5
‘a’ 
“Hello”

This is, in part, what defines Haskell as a functional language. We can compare this against a more procedural language like java:

public int add5ToInput(int x) {
  int result = x  + 5;
  return 5;
}

The method we’ve defined here isn’t a expression in the same way a function is in Haskell. We can’t substitute the definition of this method in for an invocation of it. In Haskell, we can do exactly this with functions! Each of the lines in this method are not expressions themselves either. Rather, each of them is a command to execute.

Even Haskell code that appears to be procedural in nature is really an expression!

main = do
  putStrLn “Hello”
  putStrLn “World”

In this case, main is an expression of type IO (), not a list of procedures. Of course, the IO monad functions in a procedural manner sometimes. But still, the substitution property holds up.

Every Expression has a Type

Once we understand expressions, the next building block is types. All expressions have types. For instance, we can write out potential types for each of the expressions we had above:

True :: Bool
False :: Bool
4 :: Int
9.8 :: Float
[6, 2] :: [Int]
\a -> a + 5 :: Int -> Int
‘a’ :: Char
“Hello” :: String

Types tell us what operations we can perform with the data we have. They ensure we don’t try to perform operations with non-sensical data. For instance we can only use the add operation (+) on expressions that are of the same type and that are numeric:

(+) :: (Num a) -> a -> a -> a

By contrast, in javascript, where we can "add" arbitrary variables (think [] + {}) which can have non-sensical results.

Functions are Still Just Expressions

As we saw in the Java example above, methods are a little cumbersome. There is a distinction between a method definition and the commands we use for normal code. But in Haskell, we define our more complicated expressions as functions. And these are still expressions just like our most primitive definitions! Functions have types as well, based on the parameters they take. So we could have functions that take one or more parameters:

add5 :: Int -> Int
add5 x = x + 5

sumAndProduct :: Int -> Int -> Int
sumAndProduct x y = (x + y) * (x + y)

In our Java example, the statement return result is fundamentally different from the method definition of add5ToInput. But above, sumAndProduct isn’t a different concept from the simpler expression 5.

We use functions by “applying” them to arguments like so:

sumAndProduct 7 8 :: Int

We can also partially apply functions. This means we fix one of the arguments, but leave the rest of it open-ended.

sumAndProduct 7 :: Int -> Int

The result of this expression is a new function that takes only a single argument. It’s result will be the sumAndProduct function with the first argument fixed as 7. We've peeled off one of the arguments from the function type.

Conclusion

In the coming week’s we’ll continue breaking down simple concepts. We’ll examine how some of Haskell’s other syntactic constructs work within this system of expressions and types. If you’re new to Haskell, you should check out our Getting Started Checklist so you can start using these concepts!

Read More
James Bowen James Bowen

1 Year of Haskell

This week marks the one year anniversary of Monday Morning Haskell! I’ve written an awful lot in the past year. It’s now obvious that the “blog” format doesn’t do justice to the amount of content on the site. There’s no organization, so a lot of the most useful content is stuck way down in the archives. So I’m taking this opportunity to announce some plans to reorganize the website!

Website Organization

I have two main focuses when it comes to Haskell. First, making the language easier for beginners to pick up. Second, showing the many industrial-strength tasks we can perform with Haskell. There will be two different sections of the website highlighting these components.

There will be a beginner’s corner that will focus on giving advice for starting the language from the ground up. It will feature content about the fundamental concepts and techniques of the language. It will also showcase some advice for conquering the psychological hurdles of Haskell. There will also be a section devoted to articles on production libraries, such as the recent series on building a Web API and using Tensor Flow.

Future Posts

This means I'll be focusing on organizing permanent content for a while. As a result, I won’t have as much time to spend on new blog posts. But I will still be publishing something new every Monday morning! These new posts will focus on distilling concepts into the most important parts. They'll look back at earlier lessons and pick out the highlights. Of course, I still have plans for some more in-depth technical tutorials. I’ll be looking at areas such as front-end web development, creating EDSLs, the C foreign function interface, and more.

Academy

I also am working hard on some new course material to go with our current free mini-course on learning Stack! I'm currently working on a full-length courses beginners course to help people learn Haskell from scratch. I will follow that up with another course aimed at using Haskell in production. This second course will go through both generally important production skills and showcase Haskell libraries applying those skills.

Conclusion

Altogether, the fun is just beginning at Monday Morning Haskell! I’m committed to continuing to build this website to be a resource for beginners and experts alike!

If you’re a beginner yourself, check out any of our newbie resources, like our Getting Started Checklist, our Recursion workbook, or the Stack mini-course mentioned above!

Read More
James Bowen James Bowen

The Right Types of Assumptions

A little while back I was discussing my Haskell and AI article with some other engineers I know. These people are very smart and come from diverse backgrounds of software engineering. But they tend to prefer dynamically typed languages like Javascript and Ruby. So they don’t buy into Haskell, which makes them no different from most of the engineering community.

Still, some things about the discussion surprised me. I heard, of course, the oft-repeated mantra that types do not replace tests. But even more surprising was one claim that types do not, and are not even meant to provide any kind of safety. One engineer even claimed he thinks of types only as directives to help a compiler optimize a program.

Now, it would be bold to claim that Haskell is better for all things and all circumstances. But I don’t think I’ve ever done that on this blog. It would also be unrealistic to claim that Haskell’s type system can replace all testing. And in fact I’ve dismissed that claim a couple times on this blog.

But surely Haskell’s type system gets us something. Surely, compiled types add some layer of extra safety and correctness to our programs. I’ve never thought this to be a particularly controversial point. Any yet many smart people dismiss even this claim. So these conversations got me thinking, why is it that I think Haskell's type system is useful? Why do I believe what I do?

Convincing Myself

For me, the evidence is a matter of personal experience. And a lot of other Haskellers would agree. I programmed in dynamic languages, such as Python and Javascript. I’ve also dealt a lot with Objective C and Java, which have static types but do not restrict effects as Haskell does.

It’s very clear to me where I get fewer bugs and write cleaner code that is more often correct, and that’s with Haskell. But of course I can’t base everything on my own experience, which could be subject to many biases. For instance, it could be that I haven’t spent enough time writing in dynamic languages at an industry level. It could also be that I happen to have become a more competent programmer in the last year. Perhaps I would have seen similar improvements no matter what language I focused on. Still, Haskell changed my programming in ways that other languages wouldn't have. I feel I could now go back to other languages and write better programs than I could before I learned Haskell.

Regardless, I would not feel as confident. Without Haskell's guarantees, there’s a great deal more mental overhead. You have to always ask yourself questions like, is every error condition checked? Are all my values non-null? Have I written enough tests to cover basic data format cases? Have I prevented effects like slow-running DB operations in performance critical places? Determining the answers to these might require a line-by-line audit of the code. Thorough code review is good. But when it becomes a pain, corners will get cut.

Perhaps if I had a couple more years experience writing Javascript, I wouldn’t view these tasks as such a burden. But one of the best answers I ever read on Quora gave this piece of advice. As you become a better programmer, you don’t get better at keeping things in your head and knowing the whole system. You get better at sectioning off the system in a way that you don’t have to keep it in your head. Haskell lends itself very well to this approach.

So it’s true that other languages can be written in a safe way. But Haskell forces you to write and think safely. In this regard, Haskell would actually be a fantastic first language.

Empirical Evidence?

Now there’s been some efforts to study the effects of programming language choice. This dissertation from UC Davis examined many open-source code bases to find what factors led to fewer bugs. The conclusion, was that strongly typed, functional languages fare better. A much older paper from Yale found that Haskell was a clear winner when it came to code clarity. (Python and Javascript weren’t covered in the study though).

Case closed right?

Well of course not. With the UC Davis paper, many possible issues prevent firm conclusions. After all, it only studied open source code and not industrial code. Confounding factors might well exist. Functional projects could just have fewer users. This could explain fewer bug reports. Also, code size dwarfed any effects of language choice. More code = more bugs.

But it’s hard to see how we could settle this question with an empirical study. No one has access to programmers with arbitrary levels of experience in a language. The Yale paper might have had the right idea. But it would need a larger scale to draw more solid conclusions.

What Other Evidence is out there?

So failing this kind of assessment, what is the best way to provide evidence in Haskell’s favor? It’s rather hard to look past testimony here. It would be great to see more publicity around stories like at Channable. Engineers there were able to use convert a small piece of infrastructure to Haskell. The result was a system that had fewer bugs that they could refactor with confidence.

Of course, testimony like this is still imperfect. There could be major survivorship bias. For every person who posts about Haskell solving their problems, there may be 5 who found it didn't help.

There’s also the fact of the world that many people don’t like change. You could provide all the evidence in the world in favor of a particular language. Some people will still choose to fall back on what they know. For people who have “tried and true” business models, this makes sense. Anything else seems risky.

So how do we Convince People?

I’ll repeat what I’ve noted a couple times in previous articles. Network effects are currently a huge drag on Haskell. The more Haskell developers who are out there, the easier it will be to convince companies to give it a try. Even if it is only on small projects, there will be more chances to see it succeed.

What is the best way to go about solving these network issues? For one, we need more marketing/Haskell evangelism. That said, there is still a sense that we need more educational material at all levels. Helping beginners get into the language is a great start. But there's definitely another hurdle to go from hobbyist to industry level.

Conclusion

Every once in awhile it’s important to take a step back and consider our assumptions. I definitely found it worthwhile to reexamine some of my beliefs about Haskell’s type system. It's helped me to remember why I think the way I do. There’s a good amount of evidence out there for Haskell’s utility and safety as a language. But the burden is on us as a community to collect those stories and put them out there more.

If you haven’t tried Haskell before, hopefully I’ve convinced you to be one of those people! Check out our Getting Started Checklist for some good resources in starting your Haskell education.

If you’ve dabbled a bit and want to make the next step up, we also have a few other resources for you! Check out of Stack mini-course, our Recursion Workbook or our Haskell Tensor Flow guide!

Read More
James Bowen James Bowen

Eff to the Rescue!

In the last couple weeks, we’ve seen quite a flurry of typeclasses. We used MonadDatabase and MonadCache to abstract the different effects within our API. This brought with it some benefits, but also some drawbacks. With these concepts abstracted, we could distill the API code into its simpler tasks. We didn't need to worry about connection configurations or lifting through different monads.

As we’ve seen though, there was a lot of boilerplate code involved. And there would be more if we wanted the freedom to have different parts of our app use different monad stacks. Free Monads are one solution to this problem. They allow you to compose your program so that the order in which you specify your effects does not matter. We’ll still have to write “interpretations” as we did before. But now they’ll be a lot more composable.

You can follow along the code for this by checking out the effects-3 branch on Github. Also, I do have to give a shoutout to Sandy Maguire for his talk on Eff and Free monads from BayHac. Most of what I know about free monads comes from that talk. You should also check out his blog as well.

Typeclass Boilerplate

Let’s review the main drawback of our type class approach. Recall our original definition of the AppMonad, and some of the instances we had to write for it:

newtype AppMonad a = AppMonad (ReaderT RedisInfo (SqlPersistT (LoggingT IO)) a)
  deriving (Functor, Applicative, Monad)

instance MonadDatabase AppMonad where
  fetchUserDB = liftSqlPersistT . fetchUserDB
  createUserDB = liftSqlPersistT . createUserDB
  deleteUserDB = liftSqlPersistT . deleteUserDB
  fetchArticleDB = liftSqlPersistT . fetchArticleDB
  createArticleDB = liftSqlPersistT . createArticleDB
  deleteArticleDB = liftSqlPersistT . deleteArticleDB
  fetchArticlesByAuthor = liftSqlPersistT . fetchArticlesByAuthor
  fetchRecentArticles = liftSqlPersistT fetchRecentArticles

liftSqlPersistT :: SqlPersistT (LoggingT IO) a -> AppMonad a
liftSqlPersistT action = AppMonad $ ReaderT (const action)

instance (MonadIO m, MonadLogger m) => MonadDatabase (SqlPersistT m) where
  ...

But suppose another part of our application wants to use a different monad stack. Perhaps it uses different configuration information and tracks different state. But it still needs to be able to connect to the database. As a result, we’ll need to write more instances. Each of these will need a new definition for all the different effect functions. Most all these will be repetitive and involve some combination of lifts. This isn’t great. Further, suppose we want arbitrary reordering of the monad stack. The number of instances you’ll have to write scales quadratically. Once you get to six or seven layers, this is a real pain.

Main Ideas of Eff

We can get much better composability by using free monads. I’m not going to get into the conceptual details of free monads. Instead I’ll show how to implement them using the Eff monad from the Freer Effects library. Let's first think back to how we define constraints on monads in our handler functions.

fetchUsersHandler :: (MonadDatabase m, MonadCache m) => Int64 -> m User

We take some generic monad, and constrain it to implement our type classes. With Eff, we’ll specify constraints in a different way. We have only one monad, the Eff monad. This monad is parameterized by a type-level list of other monads that it carries on its stack.

type JustIO a = Eff ‘[IO] a

type ReaderWithIO a = Eff ‘[Reader RedisInfo, IO] a

With this in mind, we can specify constraints on what monads are part of our stack using Member. Here’s how we’ll re-write the type signature from above:

fetchUsersHandler :: (Member Database r, Member Cache r) => Int64 -> Eff r User

We’ll specify exactly what Database and Cache are in the next section. But in essence, we’re stating that we have these two kinds of effects that live somewhere on our monad stack r. It doesn’t matter what order they’re in! This gives us a lot of flexibility. But before we see why, let’s examine how we actually write these effects.

Coding Up Effects

The first thing we’ll do is represent our effects as data types, rather than type classes. Starting with our database functionality, we’ll make a type Database a. This type will have one constructor for each function from our MonadDatabase typeclass. We’ll capitalize the names since they’re constructors instead of functions names. Then we’ll use GADT syntax, so that the result will be of type Database instead of a function in a particular monad. To start, here’s what our FetchUserDB constructor looks like:

{-# LANGUAGE GADTs #-}

data Database a where
  FetchUserDB :: Int64 -> Database (Maybe User)
  ...

Our previous definition looked like Int64 -> m (Maybe User). But we’re now constructing a Database action. Here’s the rest of the definition:

data Database a where
  FetchUserDB :: Int64 -> Database (Maybe User)
  CreateUserDB :: User -> Database Int64
  DeleteUserDB :: Int64 -> Database ()
  FetchArticleDB :: Int64 -> Database (Maybe Article)
  CreateArticleDB :: Article -> Database Int64
  DeleteArticleDB :: Int64 -> Database ()
  FetchArticlesByAuthor :: Int64 -> Database [KeyVal Article]
  FetchRecentArticles :: Database [(KeyVal User, KeyVal Article)]

Now we can also do the same thing with a Cache type instead of our MonadCache class:

data Cache a where
  CacheUser :: Int64 -> User -> Cache ()
  FetchCachedUser :: Int64 -> Cache (Maybe User)
  DeleteCachedUser :: Int64 -> Cache ()

Now, unfortunately, we do need some boilerplate with Eff. For each of constructor we create, we’ll need a function to run that item within the Eff monad. For these, we’ll use the send function from the Eff library. Each function states that our effect type is a member of our monad set. Then it will otherwise match the type of that constructor, only within the Eff monad. Here are the three examples for our Cache type.

cacheUser :: (Member Cache r) => Int64 -> User -> Eff r ()
cacheUser uid user = send $ CacheUser uid user

fetchCachedUser :: (Member Cache r) => Int64 -> Eff r (Maybe User)
fetchCachedUser = send . FetchCachedUser

deleteCachedUser :: (Member Cache r) => Int64 -> Eff r ()
deleteCachedUser = send . DeleteCachedUser

But wait! You might be asking, aren’t we trying to avoid boilerplate? Well, it’s hard to avoid all boilerplate. But the real gain we’ll get is that our boilerplate will scale in a linear fashion. We only need this code once per effect type we create. Remember, the alternative is quadratic growth.

Interpreting our Effects

To write "interpretations" of our effects in the type class system, we wrote instances. Here, we can do it with functions that we'll prefix with run. These will assume we have an action where our effect is on "top" of the monad stack. The result will be a new action with that layer peeled off.

runDatabase :: Eff (Database ': r) a -> Eff r a
runDatabase = ...

Now, we have to consider, what would be necessary to run our database effects? For our production application, we need to know that SqlPersistT lives in the monad stack. So we’ll add (SqlPersistT (LoggingT IO)) as a constraint on the rest of the r for our monad.

runDatabase :: (Member (SqlPersistT (LoggingT IO)) r) => Eff (Database ': r) a -> Eff r a

So we are in effect constraining the ordering of our monad, but doing it in a logical way. It wouldn’t make sense for us to ever run our database effects without knowing about the database itself.

To write this function, we specify a transformation between this Member of the rest of our stack and our Database type. We can run this transformation with runNat:

runDatabase :: (Member (SqlPersistT (LoggingT IO)) r) => Eff (Database ': r) a -> Eff r a
runDatabase = runNat databaseToSql
  where
    databaseToSql :: Database a -> SqlPersistT (LoggingT IO) a
    ...

Now we need a conversion between a Database object and a SqlPersistT action. For this, we plug in all the different function definitions we’ve been using all along. For instance, here’s what our fetchUserDB and createDB definitions look like:

databaseToSql (FetchUserDB uid) = get (toSqlKey uid)
databaseToSql (CreateUserDB user) = fromSqlKey <$> insert user

Our other constructors will follow this pattern as well.

Now, we’ll also want a way to interpret SqlPersistT effects within Eff. We’ll depend on only having IO as a deeper member within the stack here, though we also need the PGInfo parameter. Then we use runNat and convert between our SqlPersistT action and a normal IO action. We’ve done this before with runPGAction:

runSqlPersist :: (Member IO r) => PGInfo -> Eff ((SqlPersistT (LoggingT IO)) ': r) a -> Eff r a
runSqlPersist pgInfo = runNat $ runPGAction pgInfo

We go through this same process with Redis and our cache. To run a Redis action from our monad stack, we have to take the RedisInfo as a parameter and then also have IO on our stack:

runRedisAction :: (Member IO r) => RedisInfo -> Eff (Redis ': r) a -> Eff r a
runRedisAction redisInfo = runNat redisToIO
  where
    redisToIO :: Redis a -> IO a
    redisToIO action = do
      connection <- connect redisInfo
      runRedis connection action

Once we have this transformation, we can use the dependency on Redis to run Cache actions.

runCache :: (Member Redis r) => Eff (Cache ': r) a -> Eff r a
runCache = runNat cacheToRedis
  where
    cacheToRedis :: Cache a -> Redis a
    cacheToRedis (CacheUser uid user) = void $ setex (pack . show $ uid) 3600 (pack . show $ user)
    cacheToRedis (FetchCachedUser uid) = do
      result <- get (pack . show $ uid)
      case result of
        Right (Just userString) -> return $ Just (read . unpack $ userString)
        _ -> return Nothing
    cacheToRedis (DeleteCachedUser uid) = void $ del [pack . show $ uid]

And now we're done with our interpretations!

A Final Natural Transformation

Since we’re using Servant, we’ll still have to pick a final ordering. We need a natural transformation from Eff to Handler. Thus we'll specify a specific order so we have a specific Eff. We’ll put our cache effects on the top of our stack, then database operations, and finally, plain old IO.

transformEffToHandler ::
  PGInfo ->
  RedisInfo ->
  (Eff '[Cache, Redis, Database, SqlPersistT (LoggingT IO), IO]) :~> Handler

So how do we define this transformation? As always, we’ll want to create an IO action that exposes an Either value so we can catch errors. First, we can use our different run functions to peel off all the layers on our stack until all we have is IO:

transformEffToHandler ::
  PGInfo ->
  RedisInfo ->
  (Eff '[Cache, Redis, Database, SqlPersistT (LoggingT IO), IO]) :~> Handler
transformEffToHandler pgInfo redisInfo = NT $ \action -> do
  -- ioAct :: Err ‘[IO] a
  let ioAct = (runSqlPersist pgInfo . runDatabase . runRedisAction redisInfo . runCache) action
  ...

When we only have a single monad on our stack, we can use runM to get an action in that monad. So we need to apply that to our action, handle errors, and return the result as a Handler!

transformEffToHandler ::
  PGInfo ->
  RedisInfo -> 
  (Eff '[Cache, Redis, Database, SqlPersistT (LoggingT IO), IO]) :~> Handler
transformEffToHandler pgInfo redisInfo = NT $ \action -> do
  let ioAct = (runSqlPersist pgInfo . runDatabase . runRedisAction redisInfo . runCache) action
  result <- liftIO (runWithServantHandler (runM ioAct))
  Handler $ either throwError return result

And with that we’re done! Here’s the big win with Eff. It’s quite easy for us to write a different transformation on a different ordering of the Stack. We just change the order in which we apply our run functions!

-- Put Database on top instead of Cache
transformEffToHandler :: 
  PGInfo -> 
  RedisInfo -> 
  (Eff '[Database, SqlPersistT (LoggingT IO), Cache, Redis, IO]) :~> Handler
transformEffToHandler pgInfo redisInfo = NT $ \action -> do
  let ioAct = (runRedisAction redisInfo . runCache . runSqlPersist pgInfo . runDatabase) action
  result <- liftIO (runWithServantHandler (runM ioAct))
  Handler $ either throwError return result

Can we avoid outside services with this approach? Sure! We can specify test interpretations of our effects that don’t use SqlPersistT or Redis. We’ll still have IO for reasons mentioned last week, but it’s still an easy change. We'll define separate runTestDatabase and runTestCache functions that use the same effects we saw last week. They’ll depend on using the State over our in-memory maps.

runTestDatabase :: 
  (Member (StateT (UserMap, ArticleMap, UserMap) IO) r) => 
  Eff (Database ': r) a -> 
  Eff r a
runTestDatabase = runNat databaseToState
  where
    databaseToState :: Database a -> StateT (UserMap, ArticleMap, UserMap) IO a
    …

runTestCache ::
  (Member (StateT (UserMap, ArticleMap, UserMap) IO) r) =>
  Eff (Cache ': r) a ->
  Eff r a
runTestCache = runNat cacheToState
  where
    cacheToState :: Cache a -> StateT (UserMap, ArticleMap, UserMap) IO a
    ...

Then we fill in the definitions with the same functions we used when writing our TestMonad. After that, we define another natural transformation, in the same pattern:

transformTestEffToHandler ::
  MVar (UserMap, ArticleMap, UserMap) ->
  Eff '[Cache, Database, StateT (UserMap, ArticleMap, UserMap) IO] :~> Handler
transformTestEffToHandler sharedMap = NT $ \action -> do
  let stateAct = (runTestDatabase . runTestCache) action
  result <- liftIO (runWithServantHandler (runEff stateAct))
  Handler $ either throwError return result
  where
    runEff :: Eff '[StateT (UserMap, ArticleMap, UserMap) IO] a -> IO a
    runEff action = do
      let stateAction = runM action
      runStateTWithPointer stateAction sharedMap

Incorporating our Interpretations

The final step we’ll take is to change a couple different type signatures within our API code. We’ll pass a new natural transformation to our Server function:

fullAPIServer :: 
  ((Eff '[Cache, Redis, Database, SqlPersistT (LoggingT IO), IO]) :~> Handler) ->
  Server FullAPI
fullAPIServer nt = ...

And then we’ll change all our handlers to use Eff with the proper members, instead of AppMonad:

fetchUsersHandler :: (Member Database r, Member Cache r) => Int64 -> Eff r User
createUserHandler :: (Member Database r) => User -> Eff r Int64
fetchArticleHandler :: (Member Database r) => Int64 -> Eff r Article
createArticleHandler :: (Member Database r) => Article -> Eff r Int64
fetchArticlesByAuthorHandler :: (Member Database r) => Int64 -> Eff r [KeyVal Article]
fetchRecentArticlesHandler :: (Member Database r) => Eff r [(KeyVal User, KeyVal Article)]

Conclusion

We’ve come a long way with our small application. It doesn’t do much. But it has served as a great launchpad for learning many interesting libraries and techniques. In particular, we’ve seen in these last few weeks how to organize effects within our application. With the Eff library, we can represent our effects with data types that we can re-order with ease.

If you’ve never tried Haskell before, give it a shot! Download our Getting Started Checklist and get going!

If you’ve done a little Haskell but aren’t set on your skills yet, maybe this article went over your head. That’s OK! You can work on your skills more with our Recursion Workbook!

Read More
James Bowen James Bowen

A Different Point of View: Interpreting our Monads Without Outside Services

Last week we updated our API to use some interesting monadic constructs. These allowed us to narrow down the places where effects could happen in our application. This week we’ll examine another advantage of this system. We’ll examine how we can simplify our tests and remove the dependency on outside services.

You can follow along this code by looking at the effects-2 branches on the Github repository. In effects-2-start, we’ve updated our tests to use the AppMonad instead of normal IO functions. We can still do better though (see the effects-2-end branch for the final product). We can create a second monad that implements our MonadDatabase and MonadCache classes. This creates what we call a different interpretation of our effects. We can do this in such a way that they don’t rely on running instances of Postgres and Redis.

Re-Imagining our Monad

Let’s imagine the simplest possible way to have a “database”. Instead of using a remote service, we could use in-memory maps. So let’s start with a couple type synonyms:

type UserMap = Map.Map Int64 User
type ArticleMap = Map.Map Int64 Article

There are three different maps in our application. The first map will be our normal Users table from the database. The second map will be the database’s Article table. The third map will refer to our Users cache. Now we’ll create a monad that links all these different elements together, and wraps them in StateT. We’ll then be able to update these maps between requests. We still need IO on our monad stack for reasons we’ll see later.

newtype TestMonad a = TestMonad (StateT (UserMap, ArticleMap, UserMap) IO a)
  deriving (Functor, Applicative, Monad)

instance MonadIO TestMonad where
  liftIO action = TestMonad $ liftIO action

Now we want to create instances of our database type classes for this monad. Let’s start an implementation of MonadDatabase by considering how we’ll fetch a user:

instance MonadDatabase TestMonad where
  fetchUserDB uid = ...

All we need to do is grab the first map out of our state tuple, and then use the normal Map lookup function! We can do the same with an article:

fetchUserDB uid = TestMonad $ do
  userDB <- (view _1) <$> get
  return $ Map.lookup uid userDB

fetchArticleDB aid = TestMonad $ do
  articleDB <- (view _2) <$> get
  return $ Map.lookup aid articleDB

Creating elements is a little more complicated, since we have to generate the keys. This isn’t that hard though! We’ll check if the map is empty and use 1 for the key if there are no entries. Otherwise find the max key and add 1 to it (note that the API for Map.findMax has changed since I wrote this) :

createUserDB user = TestMonad $ do
  (userDB, articleDB, userCache) <- get
  let newUid = if Map.null userDB
        then 1
        else 1 + (fst . Map.findMax) userDB
  ...

Now we’ll create a modified map by inserting our new element. Then we’ll put the modified map back in along with the other maps:

createUserDB user = TestMonad $ do
  (userDB, articleDB, userCache) <- get
  let newUid = if Map.null userDB
        then 1
        else 1 + (fst . Map.findMax) userDB
  let userDB' = Map.insert newUid user userDB
  put (userDB', articleDB, userCache)
  return newUid

createArticleDB article = TestMonad $ do
  (userDB, articleDB, userCache) <- get
  let newAid = if Map.null articleDB
        then 1
        else 1 + (fst . Map.findMax) articleDB
  let articleDB' = Map.insert newAid article articleDB
  put (userDB, articleDB', userCache)
  return newAid

Deletion follows the same general pattern. The only difference is we delete from the map instead of inserting!

deleteUserDB uid = TestMonad $ do
  (userDB, articleDB, userCache) <- get
  let userDB' = Map.delete uid userDB
  put (userDB', articleDB, userCache)

deleteArticleDB aid = TestMonad $ do
  (userDB, articleDB, userCache) <- get
  let articleDB' = Map.delete aid articleDB
  put (userDB, articleDB', userCache)

Now our final two functions will involve actually performing some application logic. To fetch articles by author, we get the list of articles in our database and filter it using the author ID:

fetchArticlesByAuthor uid = TestMonad $ do
  articleDB <- (view _2) <$> get
  return $ map KeyVal (filter articleByAuthor (Map.toList articleDB))
  where
    articleByAuthor (_, article) = articleAuthorId article == toSqlKey uid

For fetching the recent articles, we first sort all the articles in our map by timestamp. Then we take the ten most recent:

fetchRecentArticles = TestMonad $ do
  (userDB, articleDB, _) <- get
  let recentArticles = take 10 (sortBy orderByTimestamp (Map.toList articleDB)) 
  ...
  where
    orderByTimestamp (_, article1) (_, article2) =
      articlePublishedTime article2 `compare` articlePublishedTime article1

But now we have to match each of them with right user. This involves performing a lookup based on the user ID. But then we’re done!

fetchRecentArticles = TestMonad $ do
  (userDB, articleDB, _) <- get
  let recentArticles = take 10 (sortBy orderByTimestamp (Map.toList articleDB)) 
  return $ map (matchWithAuthor userDB) recentArticles
  where
    orderByTimestamp (_, article1) (_, article2) =
      articlePublishedTime article2 `compare` articlePublishedTime article1
    matchWithAuthor userDB (aid, article) =
      case Map.lookup (fromSqlKey (articleAuthorId article)) userDB of
        Nothing -> error "Found article with no user" 
        Just u -> (KeyVal (fromSqlKey (articleAuthorId article), u), KeyVal (aid, article))

Our instance for MonadCache is very similar. We'll manipulate the third map instead of the first 2:

instance MonadCache TestMonad where
  cacheUser uid user = TestMonad $ do
    (userDB, articleDB, userCache) <- get
    let userCache' = Map.insert uid user userCache
    put (userDB, articleDB, userCache')
  fetchCachedUser uid = TestMonad $ do
    userCache <- (view _3) <$> get
    return $ Map.lookup uid userCache
  deleteCachedUser uid = TestMonad $ do
    (userDB, articleDB, userCache) <- get
    let userCache' = Map.delete uid userCache
    put (userDB, articleDB, userCache')

Another Natural Transformation

Now we’re not quite done. We need the ability to run a version of our server that uses this interpretation of our effects. To do this, we need a natural transformation like we had before with AppMonad. Unfortunately, the StateT of our maps won’t get threaded through properly unless we use a pointer to it. This is why we need IO on our stack. Here’s a function that will use a pointer (MVar) to our state, run it, and then swap in the new map.

runStateTWithPointer :: (Exception e, MonadIO m) => StateT s m a -> MVar s -> m (Either e a)
runStateTWithPointer action ref = do
  env <- liftIO $ readMVar ref
  (val, newEnv) <- runStateT action env
  void $ liftIO $ swapMVar ref newEnv
  return $ Right val

Now for our transformation, we’ll take this pointer and run the state. Then we need to catch exceptions like we did in our transformation for AppMonad:

transformTestToHandler :: MVar (UserMap, ArticleMap, UserMap) -> TestMonad :~> Handler
transformTestToHandler sharedMap = NT $ \(TestMonad action) -> do
  result <- liftIO $ handleAny handler $
    runStateTWithPointer action sharedMap 
  Handler $ either throwError return result
  where
    handler :: SomeException -> IO (Either ServantErr a)
    handler e = return $ Left $ err500 { errBody = pack (show e) }

Now when we setup our tests, we’ll run our server using this transformation instead. Notice that we don’t have to do anything with Postgres or Redis here!

setupTests :: IO (ClientEnv, MVar (UserMap, ArticleMap, UserMap), ThreadId)
setupTests = do
  mgr <- newManager tlsManagerSettings
  baseUrl <- parseBaseUrl "http://127.0.0.1:8000"
  let clientEnv = ClientEnv mgr baseUrl
  let initialMap = (Map.empty, Map.empty, Map.empty)
  mapRef <- newMVar initialMap
  tid <- forkIO $
    run 8000 (serve usersAPI (testAPIServer (transformTestToHandler mapRef)))
  threadDelay 1000000
  return (clientEnv, mapRef, tid)

Now when our tests run, they’ll hit a server storing the information in memory instead of a Postgres server. This is super cool!

Integrating with our Tests

Unfortunately, it’s still a little awkward to write our tests. A lot of what they’re actually testing is the internal state of the “database” in question. So we need this function that takes the pointer to the map (the same pointer used by the server) and runs actions on it:

runTestMonad :: MVar (UserMap, ArticleMap, UserMap) -> TestMonad a -> IO a
runTestMonad mapVar (TestMonad action) = do
  currentState <- readMVar mapVar
  (result, newMap) <- runStateT action currentState
  swapMVar mapVar newMap
  return result

Now in our tests, we’ll wrap any calls to the database with this action. Here’s an example of our first before hook:

beforeHook1 :: ClientEnv -> MVar (UserMap, ArticleMap, UserMap) -> IO (Bool, Bool, Bool)
beforeHook1 clientEnv mapVar = do
  callResult <- runClientM (fetchUserClient 1) clientEnv
  let throwsError = isLeft callResult
  (inPG, inRedis) <- runTestMonad mapVar $ do
    inPG <- isJust <$> fetchUserDB 1
    inRedis <- isJust <$> fetchCachedUser 1
    return (inPG, inRedis)
  return (throwsError, inPG, inRedis)

One excellent consequence of using an in-memory map is that we don’t care if there’s data in our “database” at the end. Thus we can completely get rid of our after hooks, which were a bit of a pain!

main :: IO ()
main = do
  (clientEnv, dbMap, tid) <- setupTests
  hspec $ before (beforeHook1 clientEnv dbMap) spec1
  hspec $ before (beforeHook2 clientEnv dbMap) spec2
  hspec $ before (beforeHook3 clientEnv dbMap) spec3
  hspec $ before (beforeHook4 clientEnv dbMap) spec4
  hspec $ before (beforeHook5 clientEnv dbMap) spec5
  hspec $ before (beforeHook6 clientEnv dbMap) spec6
  killThread tid 
  return ()

And now our tests also run perfectly well without needing the docker container to be active! Hooray!

Conclusion

There’s a certain argument that we haven’t really accomplished much. Our app is very shallow, and most of the logic happens within the database calls themselves. Recall that many of our handler functions reduced to the database calls. Hence, the only thing we’re testing right now is our test interpretation!

But it’s easy to imagine that if our application were more complicated, this logic wouldn’t be at the core of our code. In most cases, database queries are the prelude to manipulating the data. And this TestMonad would remove the inconvenience of sourcing that data from outside.

Stay tuned for next week, where we’ll wrap up this consideration of effects by looking at free monads! We’ll consider the “freer-effects” library. It will let us cut down a bit on some of the boilerplate we get with this MTL style approach.

Never tried Haskell before? Do you have visions of conquering all foes with these sorts of abstractions? Check out our Getting Started Checklist and start your journey!

Have you dabbled a little but want to test your skills some more? Take a look at our Recursion Workbook!

Read More
James Bowen James Bowen

Organizing our Effects Effectively

In the last 5 weeks or so, we’ve built a web application exposing a small API. The application is quite narrow, encompassing only a small amount of functionality. But it is still deep, covering several different libraries and techniques.

In these next couple weeks, we’ll look at some architectural considerations. We’ll observe some of the weaknesses of this system, and how we can improve on them. This week will focus on an approach with type classes and monad transformers. In a couple weeks, we’ll consider free monads, and how we can use them.

You can follow along with this code on the effects-1 branch of the Github repo.

Weaknesses

In our current system, there are a lot of different functions like these:

fetchUserPG :: PGInfo -> Int64 -> IO (Maybe User)
createUserPG :: PGInfo -> User -> IO Int64
cacheUser :: RedisInfo -> Int64 -> User -> IO ()

Now, the parameters do inform us what each function should be accessing. But the functions are still regular IO functions. This means a novice programmer could come in and get the idea that it’s fine to use arbitrary effects. For instance, why not fetch our Postgres information from the Redis function? After all, fetchPGInfo is an IO function as well:

fetchPostgresConnection :: IO PGInfo
...

cacheUser :: RedisInfo -> Int64 -> User -> IO ()
cacheUser = do
  pgInfo <- fetchPostgresConnection
  -- Connect to Postgres instead of Redis :(

Our API also has some uncomfortable lifting in our handler functions. We have to call liftIO because all our database functions are IO functions.

fetchUsersHandler :: PGInfo -> RedisInfo -> Int64 -> Handler User
fetchUsersHandler pgInfo redisInfo uid = do
  -- liftIO #1
  maybeCachedUser <- liftIO $ fetchUserRedis redisInfo uid
  case maybeCachedUser of
    Just user -> return user
    Nothing -> do
      -- liftIO #2
      maybeUser <- liftIO $ fetchUserPG pgInfo uid
      case maybeUser of
        -- liftIO #3
        Just user -> liftIO (cacheUser redisInfo uid user) >> return user
        Nothing -> Handler $ (throwE $ err401 { errBody = "Could not find user with that ID" })

At the very least, our connection parameters are explicit here. If we hid them in a Reader, this would introduce even more lifts.

This article will focus on using type classes to restrict how we use effects. With any luck, we'll also clean up our code a bit and make it easier to test things. But we’ll focus more on testing more next week.

Now, depending on the project size and scope, these weaknesses might not be issues. But it’s definitely a useful exercise to see alternative ways to organize our code.

Defining our Type Classes

Our first step for limiting our effects will be to create two type classes. We'll have one for our main database, and one for our cache. We'll try to make these functions agnostic to the underlying database representation. Hence, we’ll change our API to remove the notion of Entity. We’ll replace it with the idea of KeyVal, a wrapper around a tuple.

newtype KeyVal a = KeyVal (Int64, a)

With that, here are the 8 functions we have for accessing our database:

class (Monad m) => MonadDatabase m where
  fetchUserDB :: Int64 -> m (Maybe User) 
  createUserDB :: User -> m Int64 
  deleteUserDB :: Int64 -> m ()
  fetchArticleDB :: Int64 -> m (Maybe Article)
  createArticleDB :: Article -> m Int64
  deleteArticleDB :: Int64 -> m ()
  fetchArticlesByAuthor :: Int64 -> m [KeyVal Article]
  fetchRecentArticles :: m [(KeyVal User, KeyVal Article)]

And then we have three functions for how we interact with our cache:

class (Monad m) => MonadCache m where
  cacheUser :: Int64 -> User -> m ()
  fetchCachedUser :: Int64 -> m (Maybe User)
  deleteCachedUser :: Int64 -> m ()

We can now create instances of these type classes for any different monad we want to use. Let’s start by describing implementations for our existing libraries.

Writing Instances

We’ll start with SqlPersistT. We want to make an instance of MonadDatabase for it. We'll gather all the different functionality from the last few articles.

instance (MonadIO m, MonadLogger m) => MonadDatabase (SqlPersistT m) where
  fetchUserDB uid = get (toSqlKey uid)

  createUserDB user = fromSqlKey <$> insert user

  deleteUserDB uid = delete (toSqlKey uid :: Key User)

  fetchArticleDB aid = ((fmap entityVal) . listToMaybe) <$> (select . from $ \articles -> do
    where_ (articles ^. ArticleId ==. val (toSqlKey aid))
    return articles)

  createArticleDB article = fromSqlKey <$> insert article

  deleteArticleDB aid = delete (toSqlKey aid :: Key Article)

  fetchArticlesByAuthor uid = do
    entities <- select . from $ \articles -> do
      where_ (articles ^. ArticleAuthorId ==. val (toSqlKey uid))
      return articles
    return $ unEntity <$> entities

  fetchRecentArticles = do
    tuples <- select . from $ \(users `InnerJoin` articles) -> do
      on (users ^. UserId ==. articles ^. ArticleAuthorId)
      orderBy [desc (articles ^. ArticlePublishedTime)]
      limit 10
      return (users, articles)
    return $ (\(userEntity, articleEntity) -> (unEntity userEntity, unEntity articleEntity)) <$> tuples

Since we’re removing Entity from our API, we use this unEntity function. It will give us back the key and value as a KeyVal:

unEntity :: (ToBackendKey SqlBackend a) => Entity a -> KeyVal a
unEntity (Entity id_ val_) = KeyVal (fromSqlKey id_, val_)

Now we’ll do the same with our cache functions. We’ll make an instance of MonadCache for the Redis monad:

instance MonadCache Redis where
  cacheUser uid user = void $ setex (pack . show $ uid) 3600 (pack . show $ user)
  fetchCachedUser uid = do
    result <- get (pack . show $ uid)
    case result of
      Right (Just userString) -> return $ Just (read . unpack $ userString)
      _ -> return Nothing
  deleteCachedUser uid = void $ del [pack . show $ uid]

And that’s all there is here! Let’s see how we can combine these for easy use within our API.

Making our App Monad

We’d like to describe an “App Monad” that will allow us to access both these functionalities with ease. We’ll make a wrapper around a monad transformer incorporating a Reader for the Redis information and the SqlPersistT monad. We derive Monad for this type using GeneralizedNewtypeDeriving:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

newtype AppMonad a = AppMonad (ReaderT RedisInfo (SqlPersistT (LoggingT IO)) a)
  deriving (Functor, Applicative, Monad)

Now we’ll want to make instances of MonadDatabase and MonadCache. The instances are easy though; we'll use the instances for the underlying monads. First, let's define a transformation from an SqlPersistT action to our AppMonad. We need to build out the ReaderT RedisInfo for this. We'll use the ReaderT constructor and ignore the info with const.

liftSqlPersistT :: SqlPersistT (LoggingT IO) a -> AppMonad a
liftSqlPersistT action = AppMonad $ ReaderT (const action)

We can also define a transformation on Redis actions:

liftRedis :: Redis a -> AppMonad a
liftRedis action = do
  info <- AppMonad ask
  connection <- liftIO $ connect info
  liftIO $ runRedis connection action

We'll apply our underlying instances like so:

instance MonadDatabase AppMonad where
  fetchUserDB = liftSqlPersistT . fetchUserDB
  createUserDB = liftSqlPersistT . createUserDB
  deleteUserDB = liftSqlPersistT . deleteUserDB
  fetchArticleDB = liftSqlPersistT . fetchArticleDB
  createArticleDB = liftSqlPersistT . createArticleDB
  deleteArticleDB = liftSqlPersistT . deleteArticleDB
  fetchArticlesByAuthor = liftSqlPersistT . fetchArticlesByAuthor
  fetchRecentArticles = liftSqlPersistT fetchRecentArticles

instance MonadCache AppMonad where
  cacheUser uid user = liftRedis (cacheUser uid user)
  fetchCachedUser = liftRedis . fetchCachedUser 
  deleteCachedUser = liftRedis . deleteCachedUser

And that's it! We have our instances. Now we want to move on and figure out how we’ll actually incorporate this new monad into our API.

Writing a Natural Transformation

We would like to make it so that our handler functions can use AppMonad instead of the Handler monad. But Servant is sort’ve hard-coded to use Handler, so what do we do? The answer is we define a “Natural Transformation”.

I found this term to be a bit like "category". It seems innocuous but actually refers to something deeply mathematical. But we don't need to know too much to use it. The type operator (:~>) defines a natural transformation. All we need to make it is a function that takes an action in our monad and converts it into an action in the Handler monad. We'll need to pass our connection information to make this work.

transformAppToHandler :: PGInfo -> RedisInfo -> AppMonad :~> Handler

We’ll start by defining a “handler” that will catch any errors we throw and recast them as Servant errors. In general, you want to list the specific types of exceptions you’ll catch. It's not a great idea to catch every exception like this. But for this example, we’ll keep it simple:

handler :: SomeException -> IO (Either ServantErr a)
handler e = return $ Left $ err500 { errBody = pack (show e)}

Notice this returns an Either which is always a Left. Let's now define how we convert an action from our “AppMonad” into an Either as well. We’ll get the result and pass it on as a Right value.

runAppAction :: Exception e => AppMonad a -> IO (Either e a)
runAppAction (AppMonad action) = do
  result <- runPGAction pgInfo $ runReaderT action redisInfo
  return $ Right result

And putting it together, here’s our transformation. We catch errors, and then wrap the result up in Handler.

transformAppToHandler :: PGInfo -> RedisInfo -> AppMonad :~> Handler
transformAppToHandler pgInfo redisInfo = NT $ \action -> do
  result <- liftIO (handleAny handler (runAppAction action))
  Handler $ either throwError return result
  ...

Incorporating the App Monad

All we have to do now is incorporate our new monad into our handlers. First off, let’s change our API to remove Entities:

type FullAPI =
       "users" :> Capture "userid" Int64 :> Get '[JSON] User
  :<|> "users" :> ReqBody '[JSON] User :> Post '[JSON] Int64
  :<|> "articles" :> Capture "articleid" Int64 :> Get '[JSON] Article
  :<|> "articles" :> ReqBody '[JSON] Article :> Post '[JSON] Int64
  :<|> "articles" :> "author" :> Capture "authorid" Int64 :> Get '[JSON] [KeyVal Article]
  :<|> "articles" :> "recent" :> Get '[JSON] [(KeyVal User, KeyVal Article)]

We want to update the type of each function. The AppMonad incorporates all the configuration information. So we don’t need to pass connection information explicitly. Instead, we can use constraints on our monad type classes to expose those effects. Here’s what our type signatures look like:

fetchUsersHandler :: (MonadDatabase m, MonadCache m) => Int64 -> m User
createUserHandler :: (MonadDatabase m) => User -> m Int64
fetchArticleHandler :: (MonadDatabase m) => Int64 -> m Article
createArticleHandler :: (MonadDatabase m)=> Article -> m Int64
fetchArticlesByAuthorHandler :: (MonadDatabase m) => Int64 -> m [KeyVal Article]
fetchRecentArticlesHandler :: (MonadDatabase m) => m [(KeyVal User, KeyVal Article)]

And now a lot of our functions are simple monadic calls. We don’t even need to use “lift”!

createUserHandler :: (MonadDatabase m) => User -> m Int64
createUserHandler = createUserDB

createArticleHandler :: (MonadDatabase m)=> Article -> m Int64
createArticleHandler = createArticleDB

fetchArticlesByAuthorHandler :: (MonadDatabase m) => Int64 -> m [KeyVal Article]
fetchArticlesByAuthorHandler = fetchArticlesByAuthor

fetchRecentArticlesHandler :: (MonadDatabase m) => m [(KeyVal User, KeyVal Article)]
fetchRecentArticlesHandler = fetchRecentArticles

The “fetch” functions are a bit more complicated since we’ll want to do stuff like check the cache first. But again, all our functions are simple monadic calls without using any lifting. Here’s how our fetch handlers look:

fetchUsersHandler :: (MonadDatabase m, MonadCache m) => Int64 -> m User
fetchUsersHandler uid = do
  maybeCachedUser <- fetchCachedUser uid
  case maybeCachedUser of
    Just user -> return user
    Nothing -> do
      maybeUser <- fetchUserDB uid
      case maybeUser of
        Just user -> cacheUser uid user >> return user
        Nothing -> error "Could not find user with that ID"

fetchArticleHandler :: (MonadDatabase m) => Int64 -> m Article
fetchArticleHandler aid = do
  maybeArticle <- fetchArticleDB aid
  case maybeArticle of
    Just article -> return article
    Nothing -> error "Could not find article with that ID"

And now we’ll change our Server function. We’ll update it so that it takes our natural transformation as an argument. Then we’ll use the enter function combined with that transformation. This is how Servant knows what monad we want for our handlers:

fullAPIServer :: (AppMoand :~> Handler) -> Server FullAPI
fullAPIServer naturalTransformation =
  enter naturalTransformation $
    fetchUsersHandler :<|>
    createUserHandler :<|>
    fetchArticleHandler :<|>
    createArticleHandler :<|>
    fetchArticlesByAuthorHandler :<|>
    fetchRecentArticlesHandler

runServer :: IO ()
runServer = do
  pgInfo <- fetchPostgresConnection
  redisInfo <- fetchRedisConnection
  -- Pass the natural transformation as an argument!
  run 8000 (serve usersAPI (fullAPIServer (transformAppToHandler pgInfo redisInfo)))

And now we’re done!

Weaknesses with this Approach

Of course, this system is not without it’s weaknesses. In particular, there’s quite a bit of boilerplate. This is especially true if we don’t want to fix the ordering of our monad stack. For instance what if another part of our application puts SqlPersistT on top of Redis? What if we want to mix other monad transformers in? We’ll need new instances of MonadDatabase and MonadCache for that. We'll end up writing a lot more simple definitions. We’ll examine solutions to this weakness in a couple weeks when we look at free monads.

We’ll also need to add new functions to our type classes every time we want to update their functionality. And then we’ll have to update EVERY instance of that typeclass, which can be quite a pain. The more instances we have, the more painful it will be to add new functionality.

Conclusion

So with a few useful tricks, we can come up with code that is a lot cleaner. We employed type classes to great effect to limit how effects appear in our application. By writing instances of these classes for different monads, we can change the behavior of our application. Next week, we’ll see how we can use this behavior to write simpler tests!

When managing an application with this many dependencies you need the right tools. I used Stack for all my Haskell project organization. Check out our free Stack mini-course to learn more!

But if you’ve never tried Haskell at all, give it a try! Take a look at our Getting Started Checklist.

Read More
James Bowen James Bowen

Join the Club: Type-safe Joins with Esqueleto!

In the last four articles or so, we’ve done a real whirlwind tour of Haskell libraries. We created a database schema using Persistent and used it to write basic SQL queries in a type-safe way. We saw how to expose this database via an API with Servant. We also went ahead and added some caching to that server with Redis. Finally, we wrote some basic tests around the behavior of this API. By using Docker, we made those tests reproducible.

In this article, we’re going to review this whole process by adding another type to our schema. We’ll write some new endpoints for an Article type, and link this type to our existing User type with a foreign key. Then we’ll learn one more library: Esqueleto. Esqueleto improves on Persistent by allowing us to write type-safe SQL joins.

As with the previous articles, there’s a specific branch on the Github repository for this series. Go there and take a look at the esqueleto branch to see the complete code for this article.

Adding Article to our Schema

So our first step is to extend our schema with our Article type. We’re going to give each article a title, some body text, and a timestamp for its publishing time. One new feature we’ll see is that we’ll add a foreign key referencing the user who wrote the article. Here’s what it looks like within our schema:

PTH.share [PTH.mkPersist PTH.sqlSettings, PTH.mkMigrate "migrateAll"] [PTH.persistLowerCase|
 User sql=users
   ...

 Article sql=articles
   title Text
   body Text
   publishedTime UTCTime
   authorId UserId
   UniqueTitle title
   deriving Show Read Eq
|]

We can use UserId as a type in our schema. This will create a foreign key column when we create the table in our database. In practice, our Article type will look like this when we use it in Haskell:

data Article = Article
 { articleTitle :: Text
 , articleBody :: Text
 , articlePublishedTime :: UTCTime
 , articleAuthorId :: Key User
 }

This means it doesn’t reference the entire user. Instead, it contains the SQL key of that user. Since we’ll be adding the article to our API, we need to add ToJSON and FromJSON instances as well. These are pretty basic as well, so you can check them out here if you’re curious. If you’re curious about JSON instances in general, take a look at this article.

Adding Endpoints

Now we’re going to extend our API to expose certain information about these articles. First, we’ll write a couple basic endpoints for creating an article and then fetching it by its ID:

type FullAPI = 
      "users" :> Capture "userid" Int64 :> Get '[JSON] User
 :<|> "users" :> ReqBody '[JSON] User :> Post '[JSON] Int64
 :<|> "articles" :> Capture "articleid" Int64 :> Get '[JSON] Article
 :<|> "articles" :> ReqBody '[JSON] Article :> Post '[JSON] Int64

Now, we’ll write a couple special endpoints. The first will take a User ID as a key and then it will provide all the different articles the user has written. We’ll do this endpoint as /articles/author/:authorid.

...
 :<|> "articles" :> "author" :> Capture "authorid" Int64 :> Get '[JSON] [Entity Article]

Our last endpoint will fetch the most recent articles, up to a limit of 10. This will take no parameters and live at the /articles/recent route. It will return tuples of users and their articles, both as entities.

…
 :<|> "articles" :> "recent" :> Get '[JSON] [(Entity User, Entity Article)]

Adding Queries (with Esqueleto!)

Before we can actually implement these endpoints, we’ll need to write the basic queries for them. For creating an article, we use the standard Persistent insert function:

createArticlePG :: PGInfo -> Article -> IO Int64
createArticlePG connString article = fromSqlKey <$> runAction connString (insert article)

We could do the same for the basic fetch endpoint. But we’ll write this basic query using Esqueleto in the interest of beginning to learn the syntax. With Persistent, we used list parameters to specify different filters and SQL operations. Esqueleto instead uses a special monad to compose the different type of query. The general format of an esqueleto select call will look like this:

fetchArticlePG :: PGInfo -> Int64 -> IO (Maybe Article)
fetchArticlePG connString aid = runAction connString selectAction
 where
   selectAction :: SqlPersistT (LoggingT IO) (Maybe Article)
   selectAction = select . from $ \articles -> do
     ...

We use select . from and then provide a function that takes a table variable. Our first queries will only refer to a single table, but we'll see a join later. To complete the function, we’ll provide the monadic action that will incorporate the different parts of our query.

The most basic filtering function we can call from within this monad is where_. This allows us to provide a condition on the query, much as we could with the filter list from Persistent.

selectAction :: SqlPersistT (LoggingT IO) (Maybe Article)
   selectAction = select . from $ \articles -> do
     where_ (articles ^. ArticleId ==. val (toSqlKey aid))

First, we use the ArticleId lens to specify which value of our table we’re filtering. Then we specify the value to compare against. We not only need to lift our Int64 into an SqlKey, but we also need to lift that value using the val function.

But now that we’ve added this condition, all we need to do is return the table variable. Now, select returns our results in a list. But since we’re searching by ID, we only expect one result. We’ll use listToMaybe so we only return the head element if it exists. We’ll also use entityVal once again to unwrap the article from its entity.

selectAction :: SqlPersistT (LoggingT IO) (Maybe Article)
   selectAction = ((fmap entityVal) . listToMaybe) <$> (select . from $ \articles -> do
     where_ (articles ^. ArticleId ==. val (toSqlKey aid))
     return articles)

Now we should know enough that we can write out the next query. It will fetch all the articles that have written by a particular user. We’ll still be querying on the articles table. But now instead checking the article ID, we’ll make sure the ArticleAuthorId is equal to a certain value. Once again, we’ll lift our Int64 user key into an SqlKey and then again with val to compare it in “SQL-land”.

fetchArticleByAuthorPG :: PGInfo -> Int64 -> IO [Entity Article]
fetchArticleByAuthorPG connString uid = runAction connString fetchAction
 where
   fetchAction :: SqlPersistT (LoggingT IO) [Entity Article]
   fetchAction = select . from $ \articles -> do
     where_ (articles ^. ArticleAuthorId ==. val (toSqlKey uid))
     return articles

And that’s the full query! We want a list of entities this time, so we’ve taken out listToMaybe and entityVal.

Now let’s write the final query, where we’ll find the 10 most recent articles regardless of who wrote them. We’ll include the author along with each article. So we’re returning a list of of these different tuples of entities. This query will involve our first join. Instead of using a single table for this query, we’ll use the InnerJoin constructor to combine our users table with the articles table.

fetchRecentArticlesPG :: PGInfo -> IO [(Entity User, Entity Article)]
fetchRecentArticlesPG connString = runAction connString fetchAction
 where
   fetchAction :: SqlPersistT (LoggingT IO) [(Entity User, Entity Article)]
   fetchAction = select . from $ \(users `InnerJoin` articles) -> do

Since we’re joining two tables together, we need to specify what columns we’re joining on. We’ll use the on function for that:

fetchAction :: SqlPersistT (LoggingT IO) [(Entity User, Entity Article)]
   fetchAction = select . from $ \(users `InnerJoin` articles) -> do
     on (users ^. UserId ==. articles ^. ArticleAuthorId)

Now we’ll order our articles based on the timestamp of the article using orderBy. The newest articles should come first, so we'll use a descending order. Then we limit the number of results with the limit function. Finally, we’ll return both the users and the articles, and we’re done!

fetchAction :: SqlPersistT (LoggingT IO) [(Entity User, Entity Article)]
   fetchAction = select . from $ \(users `InnerJoin` articles) -> do
     on (users ^. UserId ==. articles ^. ArticleAuthorId)
     orderBy [desc (articles ^. ArticlePublishedTime)]
     limit 10
     return (users, articles)

Caching Different Types of Items

We won’t go into the details of caching our articles in Redis, but there is one potential issue we want to observe. Currently we’re using a user’s SQL key as their key in our Redis store. So for instance, the string “15” could be such a key. If we try to naively use the same idea for our articles, we’ll have a conflict! Trying to store an article with ID “15” will overwrite the entry containing the User!

But the way around this is rather simple. What we would do is that for the user’s key, we would make the string something like users:15. Then for our article, we’ll have its key be articles:15. As long as we deserialize it the proper way, this will be fine.

Filling in the Server handlers

Now that we’ve written our database query functions, it is very simple to fill in our Server handlers. Most of them boil down to following the patterns we’ve already set with our other two endpoints:

fetchArticleHandler :: PGInfo -> Int64 -> Handler Article
fetchArticleHandler pgInfo aid = do
 maybeArticle <- liftIO $ fetchArticlePG pgInfo aid
 case maybeArticle of
   Just article -> return article
   Nothing -> Handler $ (throwE $ err401 { errBody = "Could not find article with that ID" })

createArticleHandler :: PGInfo -> Article -> Handler Int64
createArticleHandler pgInfo article = liftIO $ createArticlePG pgInfo article

fetchArticlesByAuthorHandler :: PGInfo -> Int64 -> Handler [Entity Article]
fetchArticlesByAuthorHandler pgInfo uid = liftIO $ fetchArticlesByAuthorPG pgInfo uid

fetchRecentArticlesHandler :: PGInfo -> Handler [(Entity User, Entity Article)]
fetchRecentArticlesHandler pgInfo = liftIO $ fetchRecentArticlesPG pgInfo

Then we’ll complete our Server FullAPI like so:

fullAPIServer :: PGInfo -> RedisInfo -> Server FullAPI
fullAPIServer pgInfo redisInfo =
 (fetchUsersHandler pgInfo redisInfo) :<|>
 (createUserHandler pgInfo) :<|>
 (fetchArticleHandler pgInfo) :<|>
 (createArticleHandler pgInfo) :<|>
 (fetchArticlesByAuthorHandler pgInfo) :<|>
 (fetchRecentArticlesHandler pgInfo)

One interesting thing we can do is that we can compose our API types into different sections. For instance, we could separate our FullAPI into two parts. First, we could have the UsersAPI type from before, and then we could make a new type for ArticlesAPI. We can glue these together with the e-plus operator just as we could individual endpoints!

type FullAPI = UsersAPI :<|> ArticlesAPI

type UsersAPI =
      "users" :> Capture "userid" Int64 :> Get '[JSON] User
 :<|> "users" :> ReqBody '[JSON] User :> Post '[JSON] Int64

type ArticlesAPI =
 "articles" :> Capture "articleid" Int64 :> Get '[JSON] Article
 :<|> "articles" :> ReqBody '[JSON] Article :> Post '[JSON] Int64
 :<|> "articles" :> "author" :> Capture "authorid" Int64 :> Get '[JSON] [Entity Article]
 :<|> "articles" :> "recent" :> Get '[JSON] [(Entity User, Entity Article)]

If we do this, we’ll have to make similar adjustments in other areas combining the endpoints. For example, we would need to update the server handler joining and the client functions.

Writing Tests

Since we already have some user tests, it would also be good to have a few tests on the Articles section of the API. We’ll add one simple test around creating an article and then fetching it. Then we’ll add one test each for the "articles-by-author" and "recent articles" endpoints.

So one of the tricky parts of filling in this section will be that we need to make test Article object. But we'll need them to be functions on the User ID. This is because we can’t know a priori what SQL IDs we'll get when we insert the users into the database. But we can fill in all the other fields, including the published time. Here’s one example, but we’ll have a total of 18 different “test” articles.

testArticle1 :: Int64 -> Article
testArticle1 uid = Article
 { articleTitle = "First post"
 , articleBody = "A great description of our first blog post body."
 , articlePublishedTime = posixSecondsToUTCTime 1498914000
 , articleAuthorId = toSqlKey uid
 }

-- 17 other articles and some test users as well
…

Our before hooks will create all these different entities in the database. In general, we’ll go straight to the database without calling the API itself. Like with our users tests, we'll want to delete any database items we create. Let's write a generic after-hook that will take user IDs and article IDs and delete them from our database:

deleteArtifacts :: PGInfo -> RedisInfo -> [Int64] -> [Int64] -> IO ()
deleteArtifacts pgInfo redisInfo users articles = do
 void $ forM articles $ \a -> deleteArticlePG pgInfo a
 void $ forM users $ \u -> do
   deleteUserCache redisInfo u
   deleteUserPG pgInfo u

It’s important we delete the articles first! If we delete the users first, we'll encounter foreign key exceptions!

Our basic create-and-fetch test looks a lot like the previous user tests. We test the success of the response and that the new article lives in Postgres as we expect.

beforeHook4 :: ClientEnv -> PGInfo -> IO (Bool, Bool, Int64, Int64)
beforeHook4 clientEnv pgInfo = do
 userKey <- createUserPG pgInfo testUser2
 articleKeyEither <- runClientM (createArticleClient (testArticle1 userKey)) clientEnv
 case articleKeyEither of
   Left _ -> error "DB call failed on spec 4!"
   Right articleKey -> do
     fetchResult <- runClientM (fetchArticleClient articleKey) clientEnv
     let callSucceeds = isRight fetchResult
     articleInPG <- isJust <$> fetchArticlePG pgInfo articleKey
     return (callSucceeds, articleInPG, userKey, articleKey)

spec4 :: SpecWith (Bool, Bool, Int64, Int64)
spec4 = describe "After creating and fetching an article" $ do
 it "The fetch call should return a result" $ \(succeeds, _, _, _) -> succeeds `shouldBe` True
 it "The article should be in Postgres" $ \(_, inPG, _, _) -> inPG `shouldBe` True

afterHook4 :: PGInfo -> RedisInfo -> (Bool, Bool, Int64, Int64) -> IO ()
afterHook4 pgInfo redisInfo (_, _, uid, aid) = deleteArtifacts pgInfo redisInfo [uid] [aid]

Our next test will create two different users and several different articles. We'll first insert the users and get their keys. Then we can use these keys to create the articles. We create five articles in this test. We assign three to the first user, and two to the second user:

beforeHook5 :: ClientEnv -> PGInfo -> IO ([Article], [Article], Int64, Int64, [Int64])
beforeHook5 clientEnv pgInfo = do
 uid1 <- createUserPG pgInfo testUser3
 uid2 <- createUserPG pgInfo testUser4
 articleIds <- mapM (createArticlePG pgInfo)
   [ testArticle2 uid1, testArticle3 uid1, testArticle4 uid1
   , testArticle5 uid2, testArticle6 uid2 ]
 ...

Now we want to test that we when call the articles-by-user endpoint, we only get the right articles. We’ll return each group of articles, the user IDs, and the list of article IDs:

beforeHook5 :: ClientEnv -> PGInfo -> IO ([Article], [Article], Int64, Int64, [Int64])
beforeHook5 clientEnv pgInfo = do
 uid1 <- createUserPG pgInfo testUser3
 uid2 <- createUserPG pgInfo testUser4
 articleIds <- mapM (createArticlePG pgInfo)
   [ testArticle2 uid1, testArticle3 uid1, testArticle4 uid1
   , testArticle5 uid2, testArticle6 uid2 ]
 firstArticles <- runClientM (fetchArticlesByAuthorClient uid1) clientEnv
 secondArticles <- runClientM (fetchArticlesByAuthorClient uid2) clientEnv
 case (firstArticles, secondArticles) of
   (Right as1, Right as2) -> return (entityVal <$> as1, entityVal <$> as2, uid1, uid2, articleIds)
   _ -> error "Spec 5 failed!"

Now we can write the assertion itself, testing that the articles returned are what we expect.

spec5 :: SpecWith ([Article], [Article], Int64, Int64, [Int64])
spec5 = describe "When fetching articles by author ID" $ do
 it "Fetching by the first author should return 3 articles" $ \(firstArticles, _, uid1, _, _) ->
   firstArticles `shouldBe` [testArticle2 uid1, testArticle3 uid1, testArticle4 uid1]
 it "Fetching by the second author should return 2 articles" $ \(_, secondArticles, _, uid2, _) ->
   secondArticles `shouldBe` [testArticle5 uid2, testArticle6 uid2]

We would then follow that up with a similar after hook.

The final test will follow a similar pattern. Only this time, we’ll be checking the combinations of users and articles. We’ll also make sure to include 12 different articles to test that the API limits results to 10.

beforeHook6 :: ClientEnv -> PGInfo -> IO ([(User, Article)], Int64, Int64, [Int64])
beforeHook6 clientEnv pgInfo = do
 uid1 <- createUserPG pgInfo testUser5
 uid2 <- createUserPG pgInfo testUser6
 articleIds <- mapM (createArticlePG pgInfo)
   [ testArticle7 uid1, testArticle8 uid1, testArticle9 uid1, testArticle10 uid2
   , testArticle11 uid2, testArticle12 uid1, testArticle13 uid2, testArticle14 uid2
   , testArticle15 uid2, testArticle16 uid1, testArticle17 uid1, testArticle18 uid2
   ]
 recentArticles <- runClientM fetchRecentArticlesClient clientEnv
 case recentArticles of
   Right as -> return (entityValTuple <$> as, uid1, uid2, articleIds)
   _ -> error "Spec 6 failed!"
 where
   entityValTuple (Entity _ u, Entity _ a) = (u, a)

Our spec will check that the list of 10 articles we get back matches our expectations. Then, as always, we remove the entities from our database.

Now we call these tests with our other tests, with small wrappers to call the hooks:

main :: IO ()
main = do
 ...
 hspec $ before (beforeHook4 clientEnv pgInfo) $ after (afterHook4 pgInfo redisInfo) $ spec4
 hspec $ before (beforeHook5 clientEnv pgInfo) $ after (afterHook5 pgInfo redisInfo) $ spec5
 hspec $ before (beforeHook6 clientEnv pgInfo) $ after (afterHook6 pgInfo redisInfo) $ spec6

And now we’re done! The tests pass!

…
After creating and fetching an article
 The fetch call should return a result
 The article should be in Postgres

Finished in 0.1698 seconds
2 examples, 0 failures

When fetching articles by author ID
 Fetching by the first author should return 3 articles
 Fetching by the second author should return 2 articles

Finished in 0.4944 seconds
2 examples, 0 failures

When fetching recent articles
 Should fetch exactly the 10 most recent articles

Conclusion

This completes our overview of useful production libraries. Over these articles, we’ve constructed a small web API from scratch. We’ve seen some awesome abstractions that let us deal with only the most important pieces of the project. Both Persistent and Servant generated a lot of extra boilerplate for us. This article showed the power of the Esqueleto library in allowing us to do type-safe joins. We also saw an end-to-end process of adding a new type and endpoints to our API.

In the coming weeks, we’ll be dealing with some more issues that can arise when building these kinds of systems. In particular, we’ll see how we can use alternative monads on top of Servant. Doing this can present certain issues that we'll explore. We’ll culminate by exploring the different approaches to encapsulating effects.

Be sure to check out our Haskell Stack mini-course!! It'll show you how to use Stack, so you can incorproate all the libraries from this series!

If you’re new to Haskell and not ready for that yet, take a look at our Getting Started Checklist and get going!

Read More
James Bowen James Bowen

Tangled Webs: Testing an Integrated System

In the last few articles, we’ve combined several useful Haskell libraries to make a small web app. We used Persistent to create a schema with automatic migrations for our database. Then we used Servant to expose this database as an API through a couple simple queries. Finally, we used Redis to act as a cache so that repeated requests for the same user happen faster.

For our next step, we’ll tackle a thorny issue: testing. How do we test a system that has so many moving parts? There are a couple different general approaches we could take. On one end of the spectrum we could mock out most of our API calls and services. This helps gives our testing deterministic behavior. This is desirable since we would want to tie deployments to test results. But we also want to be faithfully testing our API. So on the other end, there’s the approach we’ll try in this article. We'll set up functions to run our API and other services, and then use before and after hooks to use them.

Creating Client Functions for Our API

Calling our API from our tests means we’ll want a way to make API calls programmatically. We can do this with amazing ease with a Servant API by using the servant-client library. This library has one main function: client. This function takes a proxy for our API and generates programmatic client functions. Let's remember our basic endpoint types (after resolving connection information parameters):

fetchUsersHandler :: Int64 -> Handler User
createUserHandler :: User -> Handler Int64

We’d like to be able to call these API’s with functions that use the same parameters. Those types might look something like this:

fetchUserClient :: Int64 -> m User
createUserClient :: User -> m Int64

Where m is some monad. And in this case, the ServantClient library provides such a monad, ClientM. So let’s re-write these type signatures, but leave them seemingly unimplemented:

fetchUserClient :: Int64 -> ClientM User
createUserClient :: User -> ClientM Int64

Now we’ll construct a pattern match that combines these function names with the :<|> operator. As always, we need to make sure we do this in the same order as the original API type. Then we’ll set this pattern to be the result of calling client on a proxy for our API:

fetchUserClient :: Int64 -> ClientM User
createUserClient :: User -> ClientM Int64
(fetchUserClient :<|> createUserClient) = client (Proxy :: Proxy UsersAPI)

And that’s it! The Servant library fills in the details for us and implements these functions! We’ll see how we can actually call these functions later in the article.

Setting Up the Tests

We’d like to get to the business of deciding on our test cases and writing them. But first we need to make sure that our tests have a proper environment. This means 3 things. First we need to fetch the connection information for our data stores and API. This means the PGInfo, the RedisInfo, and the ClientEnv we’ll use to call the client functions we wrote. Second, we need to actually migrate our database so it has the proper tables. Third, we need to make sure our server is actually running. Let’s start with the connection information, as this is easy:

import Database (fetchPostgresConnection, fetchRedisConnection)

...

setupTests = do
  pgInfo <- fetchPostgresConnection
  redisInfo <- fetchRedisConnection
  ...

Now to create our client environment, we’ll need two main things. We’ll need a manager for the network connections and the base URL for the API. Since we’re running the API locally, we’ll use a localhost URL. The default manager from the Network library will work fine for us:

import Network.HTTP.Client (newManager)
import Network.HTTP.Client.TLS (tlsManagerSettings)
import Servant.Client (ClientEnv(..))

main = do
  pgInfo <- fetchPostgresConnection
  redisInfo <- fetchRedisConnection
  mgr <- newManager tlsManagerSettings
  baseUrl <- parseBaseUrl "http://127.0.0.1:8000"
  let clientEnv = ClientEnv mgr baseUrl

Now we can run our migration, which will ensure that our users table exists:

import Schema (migrateAll)

main = do
  pgInfo <- fetchPostgresConnection
  ...
  runStdoutLoggingT $ withPostgresqlConn pgInfo $ \dbConn ->
    runReaderT (runMigrationSilent migrateAll) dbConn

Last of all, we’ll start our server with runServer from our API module. We’ll fork this off to a separate thread, as otherwise it will block the test thread! We’ll wait for a second afterward to make sure it actually loads before the tests run (there are less hacky ways to do this of course). But then we’ll return all the important information we need, and we're done with test setup:

main :: IO (PGInfo, RedisInfo, ClientEnv, ThreadID)
main = do
  pgInfo <- fetchPostgresConnection
  redisInfo <- fetchRedisConnection
  mgr <- newManager tlsManagerSettings
  baseUrl <- parseBaseUrl "http://127.0.0.1:8000"
  let clientEnv = ClientEnv mgr baseUrl
  runStdoutLoggingT $ withPostgresqlConn pgInfo $ \dbConn ->
    runReaderT (runMigrationSilent migrateAll) dbConn
  threadId <- forkIO runServer
  threadDelay 1000000
  return (pgInfo, redisInfo, clientEnv, serverThreadId)

Organizing our 3 Test Cases

Now that we’re all set up, we can decide on our test cases. We’ll look at 3. First, if we have an empty database and we fetch a user by some arbitrary ID, we’ll expect an error. Further, we should expect that the user does not exist in the database or in the cache, even after calling fetch.

In our second test case, we’ll look at the effects of calling the create endpoint. We’ll save the key we get from this endpoint. Then we’ll verify that this user exists in the database, but NOT in the cache. Finally, our third case will insert the user with the create endpoint and then fetch the user. We’ll expect at the end of this that in fact the user exists in both the database AND the cache.

We organize each of our tests into up to three parts: the “before hook”, the test assertions, and the “after hook”. A “before hook” is some IO code we’ll run that will return particular results to our test assertion. We want to make sure it’s done running BEFORE any test assertions. This way, there’s no interleaving of effects between our test output and the API calls. Each before hook will first make the API calls we want. Then they'll investigate our different databases and determine if certain users exist.

We also want our tests to be database-neutral. That is, the database and cache should be in the same state after the test as they were before. So we’ll also have “after hooks” that run after our tests have finished (if we’ve actually created anything). The after hooks will delete any new entries. This means our before hooks also have to pass the keys for any database entities they create. This way the after hooks know what to delete.

Last of course, we actually need the testing code that assertions about the results. These will be pretty straightforward as we’ll see below.

Test #1

For our first test, we’ll start by making a client call to our API. We use runClientM combined with our clientEnv and the fetchUserClient function. Next, we’ll determine that the call in fact returns an error as it should. Then we’ll add two more lines checking if there’s an entry with the arbitrary ID in our database and our cache. Finally, we return all three boolean values:

beforeHook1 :: ClientEnv -> PGInfo -> RedisInfo -> IO (Bool, Bool, Bool)
beforeHook1 clientEnv pgInfo redisInfo = do
  callResult <- runClientM (fetchUserClient 1) clientEnv
  let throwsError = isLeft (callResult)
  inPG <- isJust <$> fetchUserPG pgInfo 1
  inRedis <- isJust <$> fetchUserRedis redisInfo 1
  return (throwsError, inPG, inRedis)

Now we’ll write our assertion. Since we’re using a before hook returning three booleans, the type of our Spec will be SpecWith (Bool, Bool, Bool). Each it assertion will take this boolean tuple as a parameter, though we’ll only use one for each line.

spec1 :: SpecWith (Bool, Bool, Bool)
spec1 = describe "After fetching on an empty database" $ do
  it "The fetch call should throw an error" $ \(throwsError, _, _) -> throwsError `shouldBe` True
  it "There should be no user in Postgres" $ \(_, inPG, _) -> inPG `shouldBe` False
  it "There should be no user in Redis" $ \(_, _, inRedis) -> inRedis `shouldBe` False

And that’s all we need for the first test! We don’t need an after hook since it doesn’t add anything to our database.

Tests 2 and 3

Now that we’re a little more familiar with how this code works, let’s take a quick look at the next before hook. This time we’ll first try creating our user. If this fails for whatever reason, we’ll throw an error and stop the tests. Then we can use the key to check out if the user exists in our database and Redis. We return the boolean values and the key.

beforeHook2 :: ClientEnv -> PGInfo -> RedisInfo -> IO (Bool, Bool, Int64)
beforeHook2 clientEnv pgInfo redisInfo = do
  userKeyEither <- runClientM (createUserClient testUser) clientEnv
  case userKeyEither of
    Left _ -> error "DB call failed on spec 2!"
    Right userKey -> do 
      inPG <- isJust <$> fetchUserPG pgInfo userKey
      inRedis <- isJust <$> fetchUserRedis redisInfo userKey
      return (inPG, inRedis, userKey)

Now our spec will look similar. This time we expect to find a user in Postgres, but not in Redis.

spec2 :: SpecWith (Bool, Bool, Int64)
spec2 = describe "After creating the user but not fetching" $ do
  it "There should be a user in Postgres" $ \(inPG, _, _) -> inPG `shouldBe` True
  it "There should be no user in Redis" $ \(_, inRedis, _) -> inRedis `shouldBe` False

Now we need to add the after hook, which will delete the user from the database and cache. Of course, we expect the user won’t exist in the cache, but we include this since we’ll need it in the final example:

afterHook :: PGInfo -> RedisInfo -> (Bool, Bool, Int64) -> IO ()
afterHook pgInfo redisInfo (_, _, key) = do
  deleteUserCache redisInfo key
  deleteUserPG pgInfo key

Last, we’ll write one more test case. This will mimic the previous case, except we’ll throw in a call to fetch in between. As a result, we expect the user to be in both Postgres and Redis:

beforeHook3 :: ClientEnv -> PGInfo -> RedisInfo -> IO (Bool, Bool, Int64)
beforeHook3 clientEnv pgInfo redisInfo = do
  userKeyEither <- runClientM (createUserClient testUser) clientEnv
  case userKeyEither of
    Left _ -> error "DB call failed on spec 3!"
    Right userKey -> do 
      _ <- runClientM (fetchUserClient userKey) clientEnv 
      inPG <- isJust <$> fetchUserPG pgInfo userKey
      inRedis <- isJust <$> fetchUserRedis redisInfo userKey
      return (inPG, inRedis, userKey)

spec3 :: SpecWith (Bool, Bool, Int64)
spec3 = describe "After creating the user and fetching" $ do
  it "There should be a user in Postgres" $ \(inPG, _, _) -> inPG `shouldBe` True
  it "There should be a user in Redis" $ \(_, inRedis, _) -> inRedis `shouldBe` True

And it will use the same after hook as case 2, so we’re done!

Hooking in and running the tests

The last step is to glue all our pieces together with hspec, before, and after. Here’s our main function, which also kills the thread running the server once it’s done:

main :: IO ()
main = do
  (pgInfo, redisInfo, clientEnv, tid) <- setupTests
  hspec $ before (beforeHook1 clientEnv pgInfo redisInfo) spec1
  hspec $ before (beforeHook2 clientEnv pgInfo redisInfo) $ after (afterHook pgInfo redisInfo) $ spec2
  hspec $ before (beforeHook3 clientEnv pgInfo redisInfo) $ after (afterHook pgInfo redisInfo) $ spec3
  killThread tid 
  return ()

And now our tests should pass!

After fetching on an empty database
  The fetch call should throw an error
  There should be no user in Postgres
  There should be no user in Redis

Finished in 0.0410 seconds
3 examples, 0 failures

After creating the user but not fetching
  There should be a user in Postgres
  There should be no user in Redis

Finished in 0.0585 seconds
2 examples, 0 failures

After creating the user and fetching
  There should be a user in Postgres
  There should be a user in Redis

Finished in 0.0813 seconds
2 examples, 0 failures

Using Docker

So when I say, “the tests pass”, they now work on my system. But if you were to clone the code as is and try to run them, you would get failures. The tests depend on Postgres and Redis, so if you don't have them running, they fail! It is quite annoying to have your tests depend on these outside services. This is the weakness of devising our tests as we have. It increases the on-boarding time for anyone coming into your codebase. The new person has to figure out which things they need to run, install them, and so on.

So how do we fix this? One answer is by using Docker. Docker allows you to create containers that have particular services running within them. This spares you from worrying about the details of setting up the services on your local machine. Even more important, you can deploy a docker image to your remote environments. So develop and prod will match your local system. To setup this process, we’ll create a description of the services we want running on our Docker container. We do this with a docker-compose file. Here’s what ours looks like:

version: '2'

services:
  postgres:
    image: postgres:9.6
    container_name: prod-haskell-series-postgres
    ports:
      - "5432:5432"

  redis:
    image: redis:4.0
    container_name: prod-haskell-series-redis
    ports:
      - "6379:6379"

Then, you can start these services for your Docker machines with docker-compose up. Granted, you do have to install and run Docker. But if you have several different services, this is a much easier on-boarding process. Better yet, the "compose" file ensures everyone uses the same versions of these services.

Even with this container running, the tests will still fail! That’s because you also need the tests themselves to be running on your Docker cluster. But with Stack, this is easy! We’ll add the following flag to our stack.yaml file:

docker:
  enable: true

Now, whenever you build and test your program, you will do so on Docker. The first time you do this, Docker will need to set everything up on the container. This means it will have to download Stack and ALL the different packages you use. So the first run will take a while. But subsequent runs will be normal. So after all that finishes, NOW the tests should work!

Conclusion

Testing integrated systems is hard. We can try mocking out the behavior of external services. But this can lead to a test representation of our program that isn’t faithful to the production system. But using the before and after hooks from Hspec is a great way make sure all your external events happen first. Then you can pass those results to simpler test assertions.

When it comes time to run your system, it helps if you can bring up all your external services with one command! Docker allows you to do this by listing the different services in the docker-compose file. Then, Stack makes it easy to run your program and tests on a docker container, so you can use the services!

Stack is the key to all this integration. If you’ve never used Stack before, you should check out our free mini-course. It will teach you all the basics of organizing a Haskell project using Stack.

If this is your first exposure to Haskell, I’ve hopefully convinced of some of its awesome possibilities! Take a look at our Getting Started Checklist and get learning!

Read More
James Bowen James Bowen

A Cache is Fast: Enhancing our API with Redis

In the last couple weeks we’ve used Persistent to store a User type in a Postgresql database. Then we were able to use Servant to create a very simple API that exposed this database to the outside world. This week, we’re going to look at how we can improve the performance of our API using a Redis cache.

One cannot overstate the importance of caching in both software and hardware. There's a hierarchy of memory types from registers, to RAM, to the File system, to a remote database. Accessing each of these gets progressively slower (by orders of magnitude). But the faster means of storage are more expensive, so we can’t always have as much as we'd like.

But memory usage operates on a very important principle. When we use a piece of memory once, we’re very likely to use it again in the near-future. So when we pull something out of long-term memory, we can temporarily store it in short-term memory as well. This way when we need it again, we can get it faster. After a certain point, that item will be overwritten by other more urgent items. This is the essence of caching.

Redis 101

Redis is an application that allows us to create a key-value store of items. It functions like a database, except it only uses these keys. It lacks the sophistication of joins, foreign table references and indices. So we can’t run the kinds of sophisticated queries that are possible on an SQL database. But we can run simple key lookups, and we can do them faster. In this article, we'll use Redis as a short-term cache for our user objects.

For this article, we've got one main goal for cache integration. Whenever we “fetch” a user using the GET endpoint in our API, we want to store that user in our Redis cache. Then the next time someone requests that user from our API, we'll grab them out of the cache. This will save us the trouble of making a longer call to our Postgres database.

Connecting to Redis

Haskell's Redis library has a lot of similarities to Persistent and Postgres. First, we’ll need some sort of data that tells us where to look for our database. For Postgres, we used a simple ConnectionString with a particular format. Redis uses a full data type called ConnectInfo.

data ConnectInfo = ConnectInfo
  { connectHost :: HostName -- String
  , connectPort :: PortId   -- (Can just be a number)
  , connectAuth :: Maybe ByteString
  , connectDatabase :: Integer
  , connectMaxConnection :: Int
  , connectMaxIdleTime :: NominalDiffTime
  }

This has many of the same fields we stored in our PG string, like the host IP address, and the port number. The rest of this article assumes you are running a local Redis instance at port 6379. This means we can use defaultConnectInfo. As always, in a real system you’d want to grab this information out of a configuration, so you’d need IO.

fetchRedisConnection :: IO ConnectInfo
fetchRedisConnection = return defaultConnectInfo

With Postgres, we used withPostgresqlConn to actually connect to the database. With Redis, we do this with the connect function. We'll get a Connection object that we can use to run Redis actions.

connect :: ConnectInfo -> IO Connection

With this connection, we simply use runRedis, and then combine it with an action. Here’s the wrapper runRedisAction we’ll write for that:

runRedisAction :: ConnectInfo -> Redis a -> IO a
runRedisAction redisInfo action = do
  connection <- connect redisInfo
  runRedis connection action

The Redis Monad

Just as we used the SqlPersistT monad with Persist, we’ll use the Redis monad to interact with our Redis cache. Our API is simple, so we’ll stick to three basic functions. The real types of these functions are a bit more complicated. But this is because of polymorphism related to transactions, and we won't be using those.

get :: ByteString -> Redis (Either x (Maybe ByteString))
set :: ByteString -> ByteString -> Redis (Either x ())
setex :: ByteString -> ByteString -> Int -> Redis (Either x ())

Redis is a key-value store, so everything we set here will use ByteString items. But once we’ve done that, these functions are all we need to use. The get function takes a ByteString of the key and delivers the value as another ByteString. The set function takes both the serialized key and value and stores them in the cache. The setex function does the same thing as set except that it also sets an expiration time for the item we’re storing.

Expiration is a very useful feature to be aware of, since most relational databases don’t have this. The nature of a cache is that it’s only supposed to store a subset of our information at any given time. If we never expire or delete anything, it might eventually store our whole database. That would defeat the purpose of using a cache! It's memory footprint should remain low compared to our database. So we'll use setex in our API.

Saving a User in Redis

So now let’s move on to the actions we’ll actually use in our API. First, we’ll write a function that will actually store a key-value pair of an Int64 key and the User in the database. Here’s how we start:

cacheUser :: ConnectInfo -> Int64 -> User -> IO ()
cacheUser redisInfo uid user = runRedisAction redisInfo $ setex ??? ??? ???

All we need to do now is convert our key and our value to ByteString values. We'll keep it simple and use Data.ByteString.Char8 combined with our Show and Read instances. Then we’ll create a Redis action using setex and expire the key after 3600 seconds (one hour).

import Data.ByteString.Char8 (pack, unpack)

...

cacheUser :: ConnectInfo -> Int64 -> User -> IO ()
cacheUser redisInfo uid user = runRedisAction redisInfo $ void $ 
  setex (pack . show $ uid) 3600 (pack . show $ user)

(We use void to ignore the result of the Redis call).

Fetching from Redis

Fetching a user is a similar process. We’ll take the connection information and the key we’re looking for. The action we’ll create uses the bytestring representation and calls get. But we can’t ignore the result of this call like we could before! Retrieving anything gives us Either e (Maybe ByteString). A Left response indicates an error, while Right Nothing indicates the key doesn’t exist. We’ll ignore the errors and treat the result as Maybe User though. If any error comes up, we’ll return Nothing. This means we run a simple pattern match:

fetchUserRedis :: ConnectInfo -> Int64 -> IO (Maybe User)
fetchUserRedis redisInfo uid = runRedisAction redisInfo $ do
  result <- Redis.get (pack . show $ uid)
  case result of
    Right (Just userString) -> return $ Just (read . unpack $ userString)
    _ -> return Nothing

If we do find something for that key, we’ll read it out of its ByteString format and then we’ll have our final User object.

Applying this to our API

Now that we’re all set up with our Redis functions, we have the update the fetchUsersHandler to use this cache. First, we now need to pass the Redis connection information as another parameter. For ease of reading, we’ll refer to these using type synonyms (PGInfo and RedisInfo) from now on:

type PGInfo = ConnectionString
type RedisInfo = ConnectInfo

…

fetchUsersHandler :: PGInfo -> RedisInfo -> Int64 -> Handler User
fetchUsersHandler pgInfo redisInfo uid = do
  ...

The first thing we’ll try is to look up the user by their ID in the Redis cache. If the user exists, we’ll immediately return that user.

fetchUsersHandler :: PGInfo -> RedisInfo -> Int64 -> Handler User
fetchUsersHandler pgInfo redisInfo uid = do
  maybeCachedUser <- liftIO $ fetchUserRedis redisInfo uid
  case maybeCachedUser of
    Just user -> return user
    Nothing -> do
      ...

If the user doesn’t exist, we’ll then drop into the logic of fetching the user in the database. We’ll replicate our logic of throwing an error if we find that user doesn’t actually exist. But if we find the user, we need one more step. Before we return it, we should call cacheUser and store it for the future.

fetchUsersHandler :: PGInfo -> RedisInfo -> Int64 -> Handler User
fetchUsersHandler pgInfo redisInfo uid = do
  maybeCachedUser <- liftIO $ fetchUserRedis redisInfo uid
  case maybeCachedUser of
    Just user -> return user
    Nothing -> do
      maybeUser <- liftIO $ fetchUserPG pgInfo uid
      case maybeUser of
        Just user -> liftIO (cacheUser redisInfo uid user) >> return user
        Nothing -> Handler $ (throwE $ err401 { errBody = "Could not find user with that ID" })

Since we changed our type signature, we’ll have to make a few other updates as well, but these are quite simple:

usersServer :: PGInfo -> RedisInfo -> Server UsersAPI
usersServer pgInfo redisInfo =
  (fetchUsersHandler pgInfo redisInfo) :<|> 
  (createUserHandler pgInfo)


runServer :: IO ()
runServer = do
  pgInfo <- fetchPostgresConnection
  redisInfo <- fetchRedisConnection
  run 8000 (serve usersAPI (usersServer pgInfo redisInfo))

And that’s it! We have a functioning cache with expiring entries. This means that repeated queries to our fetch endpoint should be much faster!

Conclusion

Caching is a vitally important way that we can write software that is often much faster for our users. Redis is a key-value store that we can use as a cache for our most frequently used data. We can use it as an alternative to forcing every single API call to hit our database. In Haskell, the Redis API requires everything to be a ByteString. So we have to deal with some logic surrounding encoding and decoding. But otherwise it operates in a very similar way to Persistent and Postgres.

Be sure to take a look at this code on Github! There’s a redis branch for this article. It includes all the code samples, including things I skipped over like imports!

We’re starting to get to the point where we’re using a lot of different libraries in our Haskell application! It pays to know how to organize everything, so package management is vital! I tend to use Stack for all my package management. It makes it quite easy to bring all these different libraries together. If you want to learn how to use Stack, check out our free Stack mini-course!

If you’ve never learned Haskell before, you should try it out! Download our Getting Started Checklist!

Read More
James Bowen James Bowen

Serve it up with Servant!

Last week we began our series on production Haskell techniques by learning about Persistent. We created a schema that contained a single User type that we could store in a Postgresql database. We examined a couple functions allowing us to make SQL queries about these users.

This week, we’ll see how we can expose this database to the outside world using an API. We’ll construct our API using the Servant library. Servant involves some advanced type level constructs, so there’s a lot to wrap your head around. There are definitely simpler approaches to HTTP servers than what Servant uses. But I’ve found that the power Servant gives us is well worth the effort.

This article will give a brief overview on Servant. But if you want a more in-depth introduction, you should check out my talk from Bayhac last spring! That talk was more exhaustive about the different combinators you can use in your APIs. It also showed authentication techniques, client functions and documentation. You can also check out the slides and code for that presentation!

Also, take a look at the servant branch on the Github repo for this project to see all the code for this article!

Defining our API

The first step in writing an API for our user database is to decide what the different endpoints are. We can decide this independent of what language or library we’ll use. For this article, our API will have two different endpoints. The first will be a POST request to /users. This request will contain a “user” definition in its body, and the result will be that we’ll create a user in our database. Here’s a sample of what this might look like:

POST /users
{
  userName : “John Doe”,
  userEmail : “john@doe.com”,
  userAge : 29,
  userOccupation: “Teacher”
}

It will then return a response containing the database key of the user we created. This will allow any clients to fetch the user again. The second endpoint will use the ID to fetch a user by their database identifier. It will be a GET request to /users/:userid. So for instance, the last request might have returned us something like 16. We could then do the following:

GET /users/16

And our response would look like the request body from above.

An API as a Type

So we’ve got our very simple API. How do we actually define this in Haskell, and more specifically with Servant? Well, Servant does something pretty unique (as far I’ve researched). In Servant we define our API by using a type. Our type will include sub-types for each of the endpoints of our API. We combine the different endpoints by using the (:<|>) operator. I'll sometimes refer to this as “E-plus”, for “endpoint-plus”. This is a type operator, like some of the operators we saw with dependent types and tensor flow. Here’s the blueprint of our API:

type UsersAPI = 
  fetchEndpoint
  :<|> createEndpoint

Now let's define what we mean by fetchEndpoint and createEndpoint. Endpoints combine different combinators that describe different information about the endpoint. We link combinators together with the (:>) operator, which I call “C-plus” (combinator plus). Here’s what our final API looks like. We’ll go through what each combinator means in the next section:

type UsersAPI =
       “users” :> Capture “userid” Int64 :> Get ‘[JSON] User
  :<|> “users” :> ReqBody ‘[JSON] User :> Post ‘[JSON] Int64

Different Combinators

Both of these endpoints have three different combinators. Let’s start by examining the fetch endpoint. It starts off with a string combinator. This is a path component, allowing us to specify what url extension the caller should use to hit to endpoint. We can use this combinator multiple times to have a more complicated path for the endpoint. If we instead wanted this endpoint to be at /api/users/:userid then we’d change it to:

“api” :> “users” :> Capture “userid” Int64 :> Get ‘[JSON] User

The second combinator (Capture) allows us to get a value out of the URL itself. We give this value a name and then we supply a type parameter. We won't have to do any path parsing or manipulation ourselves. Servant will handle the tricky business of parsing the URL and passing us an Int64. If you want to use your own custom class as a piece of HTTP data, that's not too difficult. You’ll just have to write an instance of the FromHttpApiData class. All the basic types like Int64 already have instances.

The final combinator itself contains three important pieces of information for this endpoint. First, it tells us that this is in fact a GET request. Second, it gives us the list of content-types that are allowable in the response. This is a type level list of content formats. Each type in this list must have different classes for serialization and deserialization of our data. We could have used a more complicated list like ’[JSON, PlainText, OctetStream]. But for the rest of this article, we’ll just use JSON. This means we'll use the ToJSON and FromJSON typeclasses for serialization.

The last piece of this combinator is the type our endpoint returns. So a successful request will give the caller back a response that contains a User in JSON format. Notice this isn’t a Maybe User. If the ID is not in our database, we’ll return a 401 error to indicate failure, rather than returning Nothing.

Our second endpoint has many similarities. It uses the same string path component. Then its final combinator is the same except that it indicates it is a POST request instead of a GET request. The second combinator then tells us what we can expect the request body to look like. In this case, the request body should contain a JSON representation of a User. It requires a list of acceptable content types, and then the type we want, like the Get and Post combinators.

That completes the “definition” of our API. We’ll need to add ToJSON and FromJSON instances of our User type in order for this to function. You can take a look at those on Github, and check out this article for more details on creating those instances!

Writing Handlers

Now that we’ve defined the type of our API, we need to write handler functions for each endpoint. This is where Servant’s awesomeness kicks in. We can map each endpoint up to a function that has a particular type based on the combinators in the endpoint. So, first let’s remember our endpoint for fetching a user:

“users” :> Capture “userid” Int64 :> Get ‘[JSON] User

The string path component doesn’t add any arguments to our function. The Capture component will result in a parameter of type Int64 that we’ll need in our function. Then the return type of our function should be User. This almost completely defines the type signature of our handler. We'll note though that it needs to be in the Handler monad. So here’s what it’ll look like:

fetchUsersHandler :: Int64 -> Handler User
...

Servant can also look at the type for our create endpoint:

“users” :> ReqBody ‘[JSON] User :> Post ‘[JSON] Int64

The parameter for a ReqBody parameter is just the type argument. So it will resolve the endpoint into the handler type:

createUserHandler :: User -> Handler Int64
...

Now, we’ll need to be able to access our Postgres database through both these handlers. So they’ll each get an extra parameter referring to the ConnectionString. We’ll pass that from our code so that by the time Servant is resolving the types, the parameter is accounted for:

fetchUsersHandler :: ConnectionString -> Int64 -> Handler User
createUserHandler :: ConnectionString -> User -> Handler Int64

Before we go any further, we should discuss the Handler monad. This is a wrapper around the monad ExceptT ServantErr IO. In other words, each of these requests might fail. To make it fail, we can throw errors of type ServantErr. Then of course we can also call IO functions, because these are network operations.

Before we implement these functions, let’s first write a couple simple helpers. These will use the runAction function from last week’s article to run database actions:

fetchUserPG :: ConnectionString -> Int64 -> IO (Maybe User)
fetchUserPG connString uid = runAction connString (get (toSqlKey uid))

createUserPG :: ConnectionString -> User -> IO Int64
createUserPG connString user = fromSqlKey <$> runAction connString (insert user)

For completeness (and use later in testing), we’ll also add a simple delete function. We need the where clause for type inference:

deleteUserPG :: ConnectionString -> Int64 -> IO ()
deleteUserPG connString uid = runAction connString (delete userKey)
  where
    userKey :: Key User
    userKey = toSqlKey uid

Now from our Servant handlers, we’ll call these two functions. This will completely cover the case of the create endpoint. But we’ll need a little bit more logic for the fetch endpoint. Since our functions are in the IO monad, we have to lift them up to Handler.

fetchUsersHandler :: ConnectionString -> Int64 -> Handler User
fetchUserHandler connString uid = do
  maybeUser <- liftIO $ fetchUserPG connString uid
  ...

createUserHandler :: ConnectionString -> User -> Handler Int64
createuserHandler connString user = liftIO $ createUserPG connString user

To complete our fetch handler, we need to account for a non-existent user. Instead of making the type of the whole endpoint a Maybe, we’ll throw a ServantErr in this case. We can use one of the built-in Servant error functions, which correspond to normal error codes. Then we can update the body. In this case, we’ll throw a 401 error. Here’s how we do that:

fetchUsersHandler :: ConnectionString -> Int64 -> Handler User
fetchUserHandler connString uid = do
  maybeUser <- lift $ fetchUserPG connString uid
  case maybeUser of
    Just user -> return user
    Nothing -> Handler $ (throwE $ err401 { errBody = “Could not find user with ID: “ ++ (show uid)})

createUserHandler :: ConnectionString -> User -> Handler Int64
createuserHandler connString user = lift $ createUserPG connString user

And that’s it! We're done with our handler functions!

Combining it All into a Server

Our next step is to create an object of type Server over our API. This is actually remarkably simple. When we defined the original type, we combined the endpoints with the (:<|>) operator. To make our Server, we do the same thing but with the handler functions:

usersServer :: ConnectionString -> Server UsersAPI
usersServer pgInfo = 
  (fetchUsersHandler pgInfo) :<|> 
  (createUserHandler pgInfo)

And Servant does all the work of ensuring that the type of each endpoint matches up with the type of the handler! It’s pretty awesome. Suppose we changed the type of our fetchUsersHandler so that it took a Key User instead of an Int64. We’d get a compile error:

fetchUsersHandler :: ConnectionString -> Int64 -> Handler User
…

-- Compile Error!
• Couldn't match type ‘Key User’ with ‘Int’
      Expected type: Server UsersAPI
        Actual type: (Key User -> Handler User)
                     :<|> (User -> Handler Int64)

There's now a mismatch between our API definition and our handler definition. So Servant knows to throw an error! The one issue is that the error messages can be rather difficult to interpret sometimes. This is especially the case when your API becomes very large! The “Actual type” section of the above error will become massive! So always be careful when changing your endpoints! Frequent compilation is your friend!

Building the Application

The final piece of the puzzle is to actually build an Application object out of our server. The first step of this process is to create a Proxy for our API. Remember that our API is a type, and not a term. But a Proxy allows us to represent this type at the term level. The concept is a little complicated, but the code is not!

import Data.Proxy

…

usersAPI :: Proxy UsersAPI
usersAPI = Proxy :: Proxy UsersAPI

Now we can make our runnable Application like so (assuming we have a Postgres connection):

serve usersAPI (usersServer connString)

We’ll run this server from port 8000 by using the run function, again from Network.Wai. (See Github for a full list of imports). We’ll fetch our connection string, and then we’re good to go!

runServer :: IO ()
runServer = do
  pgInfo <- fetchPostgresConnection
  run 8000 (serve usersAPI (usersServer pgInfo))

Conclusion

The Servant library offers some truly awesome possibilities. We’re able to define a web API at the type level. We can then define handler functions using the parameters the endpoints expect. Servant handles all the work of marshalling back and forth between the HTTP request and the native Haskell types. It also ensures a match between the endpoints and the handler function types!

If you want to see even more of the possibilities that Servant offers, you should watch my talk from Bayhac. It goes through some more advanced concepts like authentication and client side functions. You can get the slides and all the code examples for that talk here.

If you’ve never tried Haskell before, there’s no time like the present to start! Download our Getting Started Checklist for some tools to help start your Haskell journey!

Read More
James Bowen James Bowen

Trouble with Databases? Persevere with Persistent!

Our recent series at Monday Morning Haskell focused on machine learning. In particular, we did a deep dive on the Haskell Tensor Flow library. While AI is huge area indeed, it doesn't account for the bulk of day-to-day work. To build even a basic production system, there are a multitude of simpler tasks. In our newest series, we’ll be learning a host of different libraries to perform these tasks!

In this first article we’ll discuss Persistent. Many libraries allow you to make a quick SQL call. But Persistent does much more than that. With Persistent, you can link your Haskell types to your database definition. You can also make type-safe queries to save yourself the hassle of decoding data. All in all, it's a very cool system.

All the code for this series will be on Github! To follow along with this article, take a look at the persistent branch.

Our Basic Type

We’ll start by considering a simple user type that looks like this:

data User = User
  { userName :: Text
  , userEmail :: Text
  , userAge :: Int
  , userOccupation :: Text
  }

Imagine we want to store objects of this type in an SQL database. We’ll first need to define the table to store our users. We could do this with a manual SQL command or through an editor, but regardless, the process will be error prone. The command would look something like this:

create table users (
  name varchar(100),
  email varchar(100),
  age bigint,
  occupation varchar(100)
)

When we do this, there's nothing linking our Haskell data type to the table structure. If we update the Haskell code, we have to remember to update the database. And this means writing another error-prone command.

From our Haskell program, we’ll also want to make SQL queries based on the structure of the user. We could write out these raw commands and execute them, but the same issues apply. This method is error prone and not at all type-safe. Persistent helps us solve these problems.

Persistent and Template Haskell

We can get these bonuses from Persistent without all that much extra code! To do this, we’re going to use Template Haskell (TH). We’ve seen TH once in the past when we were deriving lenses and prisms for different data types. On that occasion we noted a few pros and cons of TH. It does allow us to avoid writing some boilerplate code. But it will make our compile times longer as well. It will also make our code less accessible to inexperienced Haskellers. With lenses though, it only saved us a few dozen lines of total code. With Persistent, TH generates a lot more code, so the pros definitely outweigh the cons.

When we created lenses with TH, we used a simple declaration makeLenses. Here, we’ll do something a bit more complicated. We’ll use a language construct called a “quasi-quoter”. This is a block of code that follows some syntax designed by the programmer or in a library, rather than normal Haskell syntax. It is often used in libraries that do some sort of foreign function interface. We delimit a quasi-quoter by a combination of brackets and pipes. Here’s what the Template Haskell call looks like. The quasi-quoter is the final argument:

import qualified Database.Persist.TH as PTH

PTH.share [PTH.mkPersist PTH.sqlSettings, PTH.mkMigrate "migrateAll"] [PTH.persistLowerCase|

|]

The share function takes a list of settings and then the quasi-quoter itself. It then generates the necessary Template Haskell for our data schema. Within this section, we’ll define all the different types our database will use. We notate certain settings about how those types. In particular we specify sqlSettings, so everything we do here will focus on an SQL database. More importantly, we also create a migration function, migrateAll. After this Template Haskell gets compiled, this function will allow us to migrate our DB. This means it will create all our tables for us!

But before we see this in action, we need to re-define our user type. Instead of defining User in the normal Haskell way, we’re going to define it within the quasi-quoter. Note that this level of Template Haskell requires many compiler extensions. Here’s our definition:

{-# LANGUAGE TemplateHaskell            #-}
{-# LANGUAGE QuasiQuotes                #-}
{-# LANGUAGE TypeFamilies               #-}
{-# LANGUAGE MultiParamTypeClasses      #-}
{-# LANGUAGE GADTs                      #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE RecordWildCards            #-}
{-# LANGUAGE FlexibleInstances          #-}
{-# LANGUAGE OverloadedStrings          #-}



PTH.share [PTH.mkPersist PTH.sqlSettings, PTH.mkMigrate "migrateAll"] [PTH.persistLowerCase|
  User sql=users
    name Text
    email Text
    age Int
    occupation Text
    UniqueEmail email
    deriving Show Read
|]

There are a lot of similarities to a normal data definition in Haskell. We’ve changed the formatting and reversed the order of the types and names. But you can still tell what’s going on. The field names are all there. We’ve still derived basic instances like we would in Haskell.

But we’ve also added some new directives. For instance, we’ve stated what the table name should be (by default it would be user, not users). We’ve also created a UniqueEmail constraint. This tells our database that each user has to have a unique email. The migration will handle creating all the necessary indices for this to work!

This Template Haskell will generate the normal Haskell data type for us. All fields will have the prefix user and will be camel-cased, as we specified. The compiler will also generate certain special instances for our type. These will enable us to use Persistent's type-safe query functions. Finally, this code generates lenses that we'll use as filters in our queries, as we'll see later.

Entities and Keys

Persistent also has a construct allowing us to handle database IDs. For each type we put in the schema, we’ll have a corresponding Entity type. An Entity refers to a row in our database, and it associates a database ID with the object itself. The database ID has the type SqlKey and is a wrapper around Int64. So the following would look like a valid entity:

import Database.Persist (Entity(..))

sampleUser :: Entity User
sampleUser = Entity (toSqlKey 1) $ User
  { userName = “admin”
  , userEmail = “admin@test.com”
  , userAge = 23
  , userOccupation = “System Administrator”
  }

This nice little abstraction that allows us to avoid muddling our user type with the database ID. This allows our other code to use a more pure User type.

The SqlPersistT Monad

So now that we have the basics of our schema, how do we actually interact with our database from Haskell code? As a specific example, we’ll be accessing a PostgresQL database. This requires the SqlPersistT monad. All the query functions return actions in this monad. The monad transformer has to live on top of a monad that is MonadIO, since we obviously need IO to run database queries.

If we’re trying to make a database query from a normal IO function, the first thing we need is a ConnectionString. This string encodes information about the location of the database. The connection string generally has 4-5 components. It has the host/IP address, the port, the database username, and the database name. So for instance if you’re running Postgres on your local machine, you might have something like:

{-# LANGUAGE OverloadedStrings #-}

import Database.Persist.Postgresql (ConnectionString)

connString :: ConnectionString
connString = “host=127.0.0.1 port=5432 user=postgres dbname=postgres password=password”

Now that we have the connection string, we’re set to call withPostgresqlConn. This function takes the string and then a function requiring a backend:

-- Also various constraints on the monad m
withPostgresqlConn :: (IsSqlBackend backend) => ConnectionString -> (backend -> m a) -> m a

The IsSqlBackend constraint forces us to use a type that conforms to Persistent’s guidelines. The SqlPersistT monad is only a synonym for ReaderT backend. So in general, the only thing we’ll do with this backend is use it as an argument to runReaderT. Once we’ve done this, we can pass any action within SqlPersistT as an argument to run.

import Control.Monad.Logger (runStdoutLoggingT)
import Database.Persist.Postgresql (ConnectionString, withPostgresqlConn, SqlPersistT)

…

runAction :: ConnectionString -> SqlPersistT a ->  IO a
runAction connectionString action = runStdoutLoggingT $ withPostgresqlConn connectionString $ \backend ->
  runReaderT action backend

Note we add in a call to runStdoutLoggingT so that our action can log its results, as Persistent expects. This is necessary whenever we use withPostgresqlConn. Here's how we would run our migration function:

migrateDB :: IO ()
migrateDB = runAction connString (runMigration migrateAll)

This will create the users table, perfectly to spec with our data definition!

Queries

Now let’s wrap up by taking a quick examination of the kinds of queries we can run. The first thing we could do is insert a new user into our database. For this, Persistent has the insert function. When we insert the user, we’ll get a key for that user as a result. Here’s the type signature for insert specified to our particular User type:

insert :: (MonadIO m) => User -> SqlPersistT m (Key User)

Then of course we can also do things in reverse. Suppose we have a key for our user and we want to get it out of the database. We’ll want the get function. Of course this might fail if there is no corresponding user in the database, so we need a Maybe.:

get :: (MonadIO m) => Key User -> SqlPersistT m (Maybe User)

We can use these functions for any type satisfying the PersistRecordBackend class. This is included for free when we use the template Haskell approach. So you can use these queries on any type that lives in your schema.

But SQL allows us to do much more than query with the key. Suppose we want to get all the users that meet certain criteria. We’ll want to use the selectList function, which replicates the behavior of the SQL SELECT command. It takes a couple different arguments for the different ways to run a selection. The two list types look a little complicated, but we’ll examine them in more detail:

selectList 
  :: PersistRecordBackend backend val 
  => [Filter val] 
  -> [SelectOpt val]
  -> SqlPersistT m [val]

As before, the PersistRecordBackend constraint is satisfied by any type in our TH schema. So we know our User type fits. So let’s examine the first argument. It provides a list of different filters that will determine which elements we fetch. For instance, suppose we want all users who are younger than 25 and whose occupation is “Teacher”. Remember the lenses I mentioned that get generated? We’ll create two different filters on this by using these lenses.

selectYoungTeachers :: (MonadIO m, MonadLogger m) => SqlPersistT m [User]
selectYoungTeachers = select [UserAge <. 25, UserOccupation ==. “Teacher”] []

We use the UserAge lens and the UserOccupation lens to choose the fields to filter on. We use a "less-than" operator to state that the age must be smaller than 25. Similarly, we use the ==. operator to match on the occupation. Then we provide an empty list of SelectOpts.

The second list of selection operations provides some other features we might expect in a select statement. First, we can provide an ordering on our returned data. We’ll also use the generated lenses here. For instance, Asc UserEmail will order our list by email. Here's an ordered query where we also limit ourselves to 100 entries. Here’s what that query would look like:

selectYoungTeachers’ :: (MonadIO m) => SqlPersistT m [User]
selectYoungTeachers’ = selectList [UserAge <=. 25, UserOccupation ==. “Teacher”] [Asc UserEmail]

The other types of SelectOpts include limits and offsets. For instance, we can further modify this query to exclude the first 5 users (as ordered by email) and then limit our selection to 100:

selectYoungTeachers' :: (MonadIO m) => SqlPersistT m [Entity User]
selectYoungTeachers' = selectList
  [UserAge <. 25, UserOccupation ==. "Teacher"] [Asc UserEmail, OffsetBy 5, LimitTo 100]

And that’s all there is to making queries that are type-safe and sensible. We know we’re actually filtering on values that make sense for our types. We don’t have to worry about typos ruining our code at runtime.

Conclusion

Persistent gives us some excellent tools for interacting with databases from Haskell. The Template Haskell mechanisms generate a lot of boilerplate code that helps us. For instance, we can migrate our database to the create the correct tables for our Haskell types. We also can perform queries that filter results in a type-safe way. All in all, it’s a fantastic experience.

Never tried Haskell? Do other languages frustrate you with runtime errors whenever you try to run SQL queries? You should give Haskell a try! Download our Getting Started Checklist for some tools that will help you!

If you’re a little familiar with Haskell but aren’t sure how to incorporate libraries like Persistent, you should check out our Stack mini-course. It’ll walk you through the basics of making a simple Haskell program using Stack.

Read More