
Spock II: Databases and Sessions!


Last week we learned the basics of the Spock library. We saw how to set up some simple routes. Like Servant, there's a bit of dependent-type machinery behind the routing, but we didn't need to learn any complex operators. We just needed to match the number of arguments to our routes. We also saw how to use an application state to persist some data between requests.

This week, we'll add a couple more complex features to our Spock application. First we'll connect to a database. Second, we'll use sessions to keep track of users.

For some more examples of useful Haskell libraries, check out our Production Checklist!

Adding a Database

Last week, we added some global application state. Even with this improvement, our visitor count doesn't persist. When we reset the server, everything goes away, and our users will see a different number. We can change this by adding a database connection to our server. We'll follow the Spock tutorial example and connect to an SQLite database using Persistent.

If you haven't used Persistent before, take a look at this tutorial in our Haskell Web series! You can also look at our sample code on Github for any of the boilerplate you might be missing. Here's the super simple schema we'll use. Remember that Persistent will give us an auto-incrementing primary key.

share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persistLowerCase|
  NameEntry json
    name Text
    deriving Show
|]
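
If you haven't seen Persistent's generated code before, the quasi-quoter produces roughly the following record type (plus key types and instances). This sketch shows why we can use the NameEntry constructor and the nameEntryName accessor later on:

data NameEntry = NameEntry
  { nameEntryName :: Text
  } deriving Show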

Spock expects us to use a pool of connections to our database. So let's create a pool for an SQLite file using createSqlitePool. We need to run this from a logging monad. While we're at it, we can migrate our database from the main startup function. This ensures we're using an up-to-date schema:

import Database.Persist.Sqlite (createSqlitePool)

...

main :: IO ()
main = do
  ref <- newIORef M.empty
  pool <- runStdoutLoggingT $ createSqlitePool "spock_example.db" 5
  runStdoutLoggingT $ runSqlPool (runMigration migrateAll) pool
  ...

Now that we've created this pool, we can pass it to our configuration. We'll use the PCPool constructor. We're now using a SqlBackend for our server, so we'll also have to change the type of our router to reflect this:

main :: IO ()
main = do
  …
  spockConfig <-
    defaultSpockCfg EmptySession (PCPool pool) (AppState ref)
  runSpock 8080 (spock spockConfig app)

app :: SpockM SqlBackend MySession AppState ()
app = ...

Now we want to update our route action to access the database instead of this map. But first, we'll write a helper function that will allow us to call any SQL action from within our SpockM monad. It looks like this:

runSQL :: (HasSpock m, SpockConn m ~ SqlBackend)
  => SqlPersistT (LoggingT IO) a -> m a
runSQL action = runQuery $ \conn -> 
  runStdoutLoggingT $ runSqlConn action conn

At the core of this is the runQuery function from the Spock library. It works because our router now uses SpockM SqlBackend instead of SpockM ().
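
As a reference point, runQuery hands our callback a raw connection from the pool. Its signature looks roughly like this (check your Spock version for the exact form):

runQuery :: HasSpock m => (SpockConn m -> IO a) -> m a

Now let's write a couple of SQL actions we can run through this helper. One performs a lookup by name and returns the Key of the first matching entry, if one exists. The other inserts a new name and returns its key.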

fetchByName
  :: T.Text
  -> SqlPersistT (LoggingT IO) (Maybe Int64)
fetchByName name = (fmap (fromSqlKey . entityKey)) <$> 
  (listToMaybe <$> selectList [NameEntryName ==. name] [])

insertAndReturnKey
  :: T.Text
  -> SqlPersistT (LoggingT IO) Int64
insertAndReturnKey name = fromSqlKey <$> insert (NameEntry name)

Now we can use these functions instead of our map!

app :: SpockM SqlBackend MySession AppState ()
app = do
  get root $ text "Hello World!"
  get ("hello" <//> var) $ \name -> do
    existingKeyMaybe <- runSQL $ fetchByName name
    visitorNumber <- case existingKeyMaybe of
      Nothing -> runSQL $ insertAndReturnKey name
      Just i -> return i
    text ("Hello " <> name <> ", you are visitor number " <> 
      T.pack (show visitorNumber))

And voila! We can shut down our server between runs, and we'll preserve the visitors we've seen!

Tracking Users

Now, using a route to identify our users isn't really what we want, since anyone can visit any route! So for the last modification to the server, we're going to add a small "login" feature. We'll use the app's session to track which user is currently visiting. Our new flow will look like this:

  1. We'll change our entry route to /hello.
  2. If the user visits this, we'll show a field allowing them to enter their name and log in.
  3. Pressing the login button will send a post request to our server. This will update the session to match the session ID with the username.
  4. It will then send the user to the /home page, which will greet them and present a logout button.
  5. If they log out, we'll clear the session.

Note that using the session is different from using the app state map that we had in the first part. We share the app state across everyone who uses our server. But the session will contain user-specific references.

Adding a Session

The first step is to change our session type. Once again, we'll use an IORef wrapper around a map. This time though, we'll use a simple type synonym to simplify things. Here's our type definition and the updated main function.

type MySession = IORef (M.Map T.Text T.Text)

main :: IO ()
main = do
  ref <- newIORef M.empty
  -- Initialize a reference for the session
  sessionRef <- newIORef M.empty
  pool <- runStdoutLoggingT $ createSqlitePool "spock_example.db" 5
  runStdoutLoggingT $ runSqlPool (runMigration migrateAll) pool
  -- Pass that reference!
  spockConfig <-
    defaultSpockCfg sessionRef (PCPool pool) (AppState ref)
  runSpock 8080 (spock spockConfig app)

Updating the Hello Page

Now let's update our "Hello" page. Check out the appendix below for what our helloHTML looks like. It's a "login" form with a username field and a submit button.

-- Notice we use MySession!
app :: SpockM SqlBackend MySession AppState ()
app = do
  get root $ text "Hello World!"
  get "hello" $ html helloHTML
  ...

Now we need to add a handler for the post request to /hello. We'll use the post function instead of get. This time, instead of our action taking an argument, we'll extract the post body using the body function. If our application were more complicated, we would want a proper library for form URL encoding and decoding. But for this small example, we'll use a simple helper, decodeUsername. You can view this helper (and the parseUsername function it wraps) in the appendix.

app :: SpockM SqlBackend MySession AppState ()
app = do
  …
  post "hello" $ do
    nameEntry <- decodeUsername <$> body
    ...

Now we want to save this user using our session and then redirect them to the home page. First we'll need to get the session ID and the session itself. We use the functions getSessionId and readSession for this. Then we'll want to update our session by associating the name with the session ID. Finally, we'll redirect to home.

post "hello" $ do
  nameEntry <- decodeUsername <$> body
  sessId <- getSessionId 
  currentSessionRef <- readSession
  liftIO $ modifyIORef' currentSessionRef $
    M.insert sessId (nameEntryName nameEntry)
  redirect "home"

The Home Page

Now on the home page, we'll want to check if we've got a user associated with the session ID. If we do, we'll display some text greeting that user (and also display a logout button). Again, we need to invoke getSessionId and readSession. If we have no user associated with the session, we'll bounce them back to the hello page.

get "home" $ do
  sessId <- getSessionId 
  currentSessionRef <- readSession
  currentSession <- liftIO $ readIORef currentSessionRef
  case M.lookup sessId currentSession of
    Nothing -> redirect "hello"
    Just name -> html $ homeHTML name

The last piece of functionality we need is "logging out". We'll follow the familiar pattern of getting the session ID and the session. This time, we'll update the session by deleting the entry for the session ID. Then we'll redirect the user back to the hello page.

post "logout" $ do
  sessId <- getSessionId 
  currentSessionRef <- readSession
  liftIO $ modifyIORef' currentSessionRef $ M.delete sessId
  redirect "hello"

And now our site tracks our users' sessions! We can access the same page as a different user on different sessions!

Conclusion

This wraps up our exploration of the Spock library! We've taken a shallow but wide look at the different features Spock has to offer. We saw several ways to persist information across requests on our server. Connecting to a database is the most important, but sessions are a fairly advanced feature that Spock makes quite easy!

For some more cool examples of Haskell web libraries, take a look at our Web Skills Series! You can also download our Production Checklist for even more ideas!

Appendix - HTML Fragments and Helpers

helloHTML :: T.Text
helloHTML =
  "<html>\
    \<body>\
      \<p>Hello! Please enter your username!\
      \<form action=\"/hello\" method=\"post\">\
        \Username: <input type=\"text\" name=\"username\"><br>\
        \<input type=\"submit\"><br>\
      \</form>\
    \</body>\
  \</html>"

homeHTML :: T.Text -> T.Text
homeHTML name =
  "<html><body><p>Hello " <> name <> 
    "</p>\
    \<form action=\"logout\" method=\"post\">\
      \<input type=\"submit\" name=\"logout_button\"><br>\
    \</form>\
  \</body>\
  \</html>" 

-- Note: 61 -> '=' in ASCII
-- We expect input like "username=christopher"
parseUsername :: B.ByteString -> T.Text
parseUsername input = decodeUtf8 $ B.drop 1 tail_
  where
    tail_ = B.dropWhile (/= 61) input
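
The post handler above expects a NameEntry rather than raw Text, so we wrap the parsed name with our Persistent constructor. Here's a minimal version of the decodeUsername helper used in the handler:

decodeUsername :: B.ByteString -> NameEntry
decodeUsername = NameEntry . parseUsername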

Simple Web Routing with Spock!


In our Haskell Web Series, we go over the basics of how we can build a web application with Haskell. That includes using Persistent for our database layer, and Servant for our HTTP layer. But these aren't the only libraries for those tasks in the Haskell ecosystem.

We've already looked at how to use Beam as another potential database library. In these next two articles, we'll examine Spock, another HTTP library. We'll compare it to Servant and see what the different design decisions are. We'll start this week by looking at the basics of routing. We'll also see how to use a global application state to coordinate information on our server. Next week, we'll see how to hook up a database and use sessions.

For some useful libraries, make sure to download our Production Checklist. It will give you some more ideas for libraries you can use even beyond these! Also, you can follow along with the code by looking at our Github repository!

Getting Started

Spock gives us a helpful starting point for making a basic server. We'll begin by taking a look at the starter code on their homepage. Here's our initial adaptation of it:

data MySession = EmptySession
data MyAppState = DummyAppState (IORef Int)

main :: IO ()
main = do
  ref <- newIORef 0
  spockConfig <- defaultSpockCfg EmptySession PCNoDatabase (DummyAppState ref)
  runSpock 8080 (spock spockConfig app)

app :: SpockM () MySession MyAppState ()
app = do
  get root $ text "Hello World!"
  get ("hello" <//> var) $ \name -> do
    (DummyAppState ref) <- getState
    visitorNumber <- liftIO $ atomicModifyIORef' ref $ \i -> (i+1, i+1)
    text ("Hello " <> name <> ", you are visitor number " <> T.pack (show visitorNumber))

In our main function, we initialize an IO ref that we'll use as the only "state" of our application. Then we'll create a configuration object for our server. Last, we'll run our server using our app specification of the actual routes.

The configuration has a few important fields attached to it. For now, we're using dummy values for all these. Our config wants a Session, which we've defined as EmptySession. It also wants some kind of a database, which we'll add later. Finally, it includes an application state, and for now we'll only supply our pointer to an integer. We'll see later how we can add a bit more flavor to each of these parameters. But for the moment, let's dig a bit deeper into the app expression that defines the routing for our Server.
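
The shape of the configuration function mirrors these three slots. Its signature looks roughly like this (check your Spock version for the exact form):

defaultSpockCfg :: sess -> PoolOrConn conn -> st -> IO (SpockCfg conn sess st)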

The SpockM Monad

Our router lives in the SpockM monad. We can see this has three different type parameters (plus a return value). Remember that defaultSpockCfg had three comparable arguments! We have the empty session as MySession and the IORef app state as MyAppState. Finally, there's an extra () parameter corresponding to our empty database. (The return value of our router is also ().)

Now each action in this monad adds a route. These routes use HTTP verbs, as you might expect. At the moment, our router only has a couple of get routes. The first lies at the root of our path and outputs Hello World!. The second lies at hello/{name}. It prints a message containing the input name while keeping track of how many visitors we've had.

Composing Routes

Now let's talk a little about the structure of our router code. The SpockM monad works like a Writer monad: each action we take adds a new route to the application. In this case, we take two actions, each responding to get requests (we'll see an example of a post request next week).

For any of our HTTP verbs, the first argument will be a representation of the path. On our first route, we use the hard-coded root expression to refer to the / path. For our second expression, we have a couple different components that we combine with <//>.

First, we have a string path component hello. We could combine other strings as well. Let's suppose we wanted the route /api/hello/world. We'd use the expression:

"api" <//> "hello" <//> "world"

In our original code though, the second part of the path is a var. This allows us to substitute information into the path. When we visit /hello/james, we'll be able to get the path component james as a variable. Spock passes this argument to the function we have as the second argument of the get combinator.

This argument has a rather complicated type RouteSpec. We don't need to go into the details here. But the simplest thing we can return is some raw text by using the text combinator. (We could also use html if we have our own template). We conclude both our route definitions by doing this.

Notice that the expression for our first route has no parameters, while the second has one parameter. As you might guess, the parameter in the second route refers to the variable we can pull out of the path thanks to var. We have the same number of var elements in the path as we do arguments to the function. Spock uses dependent types to ensure these match.
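
For instance, a hypothetical route with two var components needs a two-argument handler, and the program won't compile otherwise:

get ("hello" <//> var <//> var) $ \firstName lastName ->
  text ("Hello " <> firstName <> " " <> lastName)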

Using the App State

Now that we know the basics, let's start using some of Spock's more advanced features. This week, we'll see how to use the App State.

Currently, we bump the visitor count each time we visit the route with a name, even if that name is the same. So visiting /hello/michael the first time results in:

Hello michael, you are visitor number 1

Then we'll visit again and see:

Hello michael, you are visitor number 2

Instead, let's make it so we assign each name to a particular number. This way, when a user visits the same route again, they'll see what number they originally were.

Making this change is rather easy. Instead of using an IORef on an Int for our state, we'll use a mapping from Text to Int:

data AppState = AppState (IORef (M.Map Text Int))

Now we'll initialize our ref with an empty map and pass it to our config:

main :: IO ()
main = do
  ref <- newIORef M.empty
  spockConfig <- defaultSpockCfg EmptySession PCNoDatabase (AppState ref)
  runSpock 8080 (spock spockConfig app)

And for our hello/{name} route, we'll update it to follow this process:

  1. Get the map reference
  2. See if we have an entry for this user yet.
  3. If not, insert them with the length of the map, and write this back to our IORef
  4. Return the message

This process is pretty straightforward. Let's see what it looks like:

app :: SpockM () MySession AppState ()
app = do
  get root $ text "Hello World!"
  get ("hello" <//> var) $ \name -> do
    (AppState mapRef) <- getState
    visitorNumber <- liftIO $ atomicModifyIORef' mapRef $ updateMapWithName name
    text ("Hello " <> name <> ", you are visitor number " <> T.pack (show visitorNumber))

updateMapWithName :: T.Text -> M.Map T.Text Int -> (M.Map T.Text Int, Int)
updateMapWithName name nameMap = case M.lookup name nameMap of
  Nothing -> (M.insert name (mapSize + 1) nameMap, mapSize + 1)
  Just i -> (nameMap, i)
  where
    mapSize = M.size nameMap

We create a function to update the map every time our app encounters a new name. Then we update our IORef with atomicModifyIORef'. And now if we visit /hello/michael twice in a row, we'll get the same output both times!

Conclusion

That's as far as we'll go this week! We covered how to make a basic application in Spock and the fundamentals of composing routes. Then we saw how to use the app state to keep track of information across requests. Next week, we'll improve this process by adding a database to our application. We'll also use sessions to keep track of users.

For more cool libraries, read up on our Haskell Web Series. Also, you can download our Production Checklist for more ideas!


Keeping it Clean: Haskell Code Formatters


A long time ago, we had an article that featured some tips on how to organize your import statements. As far as I remember, that’s the only piece we’ve ever done on code formatting. But as you work on projects with more people, formatting is something you need to consider to keep everyone sane. You’d like a consistent style across the code base. That way, there’s less controversy in code reviews, and people don’t need to think as hard when updating code; they shouldn’t have to wonder whether a fragment of code follows the style guide.

This week, we’re going to go over three different Haskell code formatting tools. We’ll examine Stylish Haskell, Hindent, and Brittany. These all have their pluses and minuses, as we’ll see.

For some ideas of what Haskell projects you can do, download our Production Checklist. You can also take our free Stack mini-course and learn how to use Stack to organize your code!

Stylish Haskell

The first tool we’ll look at is Stylish Haskell. This is a straightforward tool to use, as it does some cool things with no configuration required. Let’s take a look at a poorly formatted version of code from our Beam article.

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE ImpredicativeTypes #-}

module Schema where

import Database.Beam
import Database.Beam.Backend
import Database.Beam.Migrate
import Database.Beam.Sqlite
import Database.SQLite.Simple (open, Connection)

import Data.Int (Int64)
import Data.Text (Text)
import Data.Time (UTCTime)
import qualified Data.UUID as U

data UserT f = User
  { _userId :: Columnar f Int64
  , _userName :: Columnar f Text
  , _userEmail :: Columnar f Text
  , _userAge :: Columnar f Int
  , _userOccupation :: Columnar f Text
  } deriving (Generic)

There are many undesirable things here. Our language pragmas don’t line up their end braces. They also aren’t in any discernible order. Our imports are also not lined up, and neither are the fields in our data types.

Stylish Haskell can fix all this. First, we’ll install it globally with:

stack install stylish-haskell

(You can also use cabal instead of stack). Then we can call the stylish-haskell command on a file. By default, it will output the results to the terminal. But if we pass the -i flag, it will update the file in place. This will make all the changes we want to line up the various statements in our file!

>> stylish-haskell -i Schema.hs

--- Result:

{-# LANGUAGE DeriveGeneric         #-}
{-# LANGUAGE FlexibleContexts      #-}
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE GADTs                 #-}
{-# LANGUAGE ImpredicativeTypes    #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedStrings     #-}
{-# LANGUAGE StandaloneDeriving    #-}
{-# LANGUAGE TypeApplications      #-}
{-# LANGUAGE TypeFamilies          #-}
{-# LANGUAGE TypeSynonymInstances  #-}

module Schema where

import           Database.Beam
import           Database.Beam.Backend
import           Database.Beam.Migrate
import           Database.Beam.Sqlite
import           Database.SQLite.Simple (Connection, open)

import           Data.Int               (Int64)
import           Data.Text              (Text)
import           Data.Time              (UTCTime)
import qualified Data.UUID              as U

data UserT f = User
  { _userId         :: Columnar f Int64
  , _userName       :: Columnar f Text
  , _userEmail      :: Columnar f Text
  , _userAge        :: Columnar f Int
  , _userOccupation :: Columnar f Text
  } deriving (Generic)

Stylish Haskell integrates well with most common editors. For instance, if you use Vim, you can run it from within the editor with this command:

:%!stylish-haskell

We get all these features without any configuration. If we want to change things though, we can create a configuration file. We’ll make a default file with the following command:

stylish-haskell --defaults > .stylish-haskell.yaml

Then if we want, we can modify it a bit. For one example, we've aligned our imports above globally. This means they all leave space for qualified. But we can decide we don’t want a group of imports to have that space if there are no qualified imports. There’s a setting for this in the config. By default, it looks like this:

imports:
  align: global

We can change it to group to ensure our imports are only aligned within their grouping.

imports:
  align: group

And now when we run the command, we’ll get a different result:

module Schema where

import Database.Beam
import Database.Beam.Backend
import Database.Beam.Migrate
import Database.Beam.Sqlite
import Database.SQLite.Simple (Connection, open)

import           Data.Int  (Int64)
import           Data.Text (Text)
import           Data.Time (UTCTime)
import qualified Data.UUID as U

So in short, Stylish Haskell is a great tool for a limited scope. It has uncontroversial suggestions for several areas like imports and pragmas. It also removes trailing whitespace, and adjusts case statements sensibly. That said, it doesn’t affect your main Haskell code. Let’s look at a couple tools that can do that.

Hindent

Another program we can use is hindent. As its name implies, it deals with updating whitespace and indentation levels. Let’s look at a very simple example. Consider this code, adapted from our Beam article:

user1' = User default_  (val_ "James")  (val_ "james@example.com")  (val_ 25)  (val_ "programmer")

findUsers :: Connection -> IO ()
findUsers conn = runBeamSqlite conn $ do
    users <- runSelectReturningList $ select $ do
        user <- (all_ (_blogUsers blogDb))
        article <- (all_ (_blogArticles blogDb))
        guard_ (user ^. userName ==. (val_ "James"))
        guard_ (article ^. articleUserId ==. user ^. userId) 
        return (user, article)
    mapM_ (liftIO . putStrLn . show) users

There are a few things we could change. First, we might want to update the indentation level so that it is 2 instead of 4. Second, we want to restrict lines to 80 characters. When we run hindent on this file, it’ll make the changes.

user1' =
  User
    default_
    (val_ "James")
    (val_ "james@example.com")
    (val_ 25)
    (val_ "programmer")

findUsers :: Connection -> IO ()
findUsers conn =
  runBeamSqlite conn $ do
    users <-
      runSelectReturningList $
      select $ do
        user <- (all_ (_blogUsers blogDb))
        article <- (all_ (_blogArticles blogDb))
        guard_ (user ^. userName ==. (val_ "James"))
        guard_ (article ^. articleUserId ==. user ^. userId)
        return (user, article)
    mapM_ (liftIO . putStrLn . show) users

Hindent is also configurable. We can create a file .hindent.yaml. By default, we would have the following configuration:

indent-size: 2
line-length: 80
force-trailing-newline: true

But then we can change it if we want so that the indentation level is 3:

indent-size: 3

And now when we run it, we’ll actually see that it’s changed to reflect that:

findUsers :: Connection -> IO ()
findUsers conn =
   runBeamSqlite conn $ do
      users <-
         runSelectReturningList $
         select $ do
            user <- (all_ (_blogUsers blogDb))
            article <- (all_ (_blogArticles blogDb))
            guard_ (user ^. userName ==. (val_ "James"))
            guard_ (article ^. articleUserId ==. user ^. userId)
            return (user, article)
      mapM_ (liftIO . putStrLn . show) users

Hindent also has some other effects that, as far as I can tell, are not configurable. You can see that the separation of lines was not preserved above. In another example, it spaced out instance definitions that I had grouped in another file:

-- BEFORE
deriving instance Show User
deriving instance Eq User
deriving instance Show UserId
deriving instance Eq UserId

-- AFTER
deriving instance Show User

deriving instance Eq User

deriving instance Show UserId

deriving instance Eq UserId

So make sure you’re aware of everything it does before committing to using it. Like stylish-haskell, hindent integrates well with text editors.
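
For instance, hindent acts as a filter, reading from stdin and writing to stdout. So in Vim you can reformat the current buffer much like the stylish-haskell example above (assuming hindent is on your PATH):

:%!hindent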

Brittany

Brittany is an alternative to Hindent for modifying your expression definitions. It mainly focuses on the use of horizontal space throughout your code. As far as I see, it doesn’t line up language pragmas or change import statements in the way stylish-haskell does. It also doesn’t touch data type declarations. Instead, it seeks to reformat your code to make maximal use of space while avoiding lines that are too long. As an example, we could look at this line from our Beam example:

insertArticles :: Connection -> IO ()
insertArticles conn = runBeamSqlite conn $ runInsert $ 
  insert (_blogArticles blogDb) $ insertValues articles

Our decision on where to break the line is a little arbitrary, but at least we don’t try to cram it all onto one line. Whether we start from the version above or from a one-line version, Brittany will change it to this:

brittany --write-mode=inplace MyModule.hs

--

insertArticles :: Connection -> IO ()
insertArticles conn =
  runBeamSqlite conn $ runInsert $ insert (_blogArticles blogDb) $ insertValues
    articles

This makes “better” use of horizontal space in the sense that we fit as much as possible on the first line. That said, one could argue that our first version actually looks nicer. Brittany can also change type signatures that overflow the line limit. Suppose we have this arbitrary type signature that’s too long for a single line:

myReallyLongFunction :: State ComplexType Double -> Maybe Double -> Either Double ComplexType -> IO a -> StateT ComplexType IO a

Brittany will fix it up so that each argument type is on a single line:

myReallyLongFunction
  :: State ComplexType Double
  -> Maybe Double
  -> Either Double ComplexType
  -> IO a
  -> StateT ComplexType IO a

This can be useful in projects with very complicated types. The structure makes it easier for you to add Haddock comments to the various arguments.
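
For instance, with hypothetical Haddock comments on each argument:

myReallyLongFunction
  :: State ComplexType Double -- ^ Stateful computation to run
  -> Maybe Double -- ^ Optional override value
  -> Either Double ComplexType -- ^ Fallback or replacement input
  -> IO a -- ^ Action producing the result
  -> StateT ComplexType IO a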

Dangers

There is of course, a (small) danger to using tools like these. If you’re going to use them, you want to ensure everyone on the project is using them. Suppose person A isn’t using the program, and commits code that isn’t formatted by the program. Person B might then look through that code, and their editor will correct the file. This will leave them with local changes to the file that aren’t relevant to whatever work they’re doing. This can cause a lot of confusion when they submit code for review. Whoever reviews their code has to sift through the format changes, which slows the review.

People can also have (unreasonably) strong opinions about code formatting. So it’s generally something you want to nail down early on a project and avoid changing afterward. With the examples in this article, I would say Stylish Haskell would be an easy sell on a project. However, the specific choices made by Hindent and Brittany can be more controversial. So instituting those project-wide might cause more problems than it solves.

Conclusion

It’s possible to lose a surprising amount of productivity to code formatting. So it can be important to nail down standards early and often. Code formatting programs can make it easy to enforce particular standards. They’re also very simple to incorporate into your projects with stack and your editor of choice!

Now that you know how to format your code, need some suggestions for what to work on next? Take a look at our Production Checklist! It’ll give you some cool ideas of libraries you can use for building Haskell web apps and much more!


Beam: Database Power without Template Haskell!


As part of our Haskell Web Series, we examined the Persistent and Esqueleto libraries. The first of these allows you to create a database schema in a special syntax. You can then use Template Haskell to generate all the necessary Haskell data types and instances for your types. Even better, you can write Haskell queries against these types that resemble SQL. These queries are type-safe, which is awesome. However, specifying our schema with Template Haskell has some drawbacks. For instance, the code takes longer to compile and is less approachable for beginners.

This week on the blog, we'll be exploring another database library called Beam. This library allows us to specify our database schema without using Template Haskell. There's some boilerplate involved, but it's not bad at all! Like Persistent, Beam has support for many backends, such as SQLite and PostgreSQL. Unlike Persistent, Beam also supports join queries as a built-in part of its system.

For some more ideas on advanced libraries, be sure to check out our Production Checklist! It includes a couple more different database options to look at.

Specifying our Types

As a first note, while Beam doesn't require Template Haskell, it does need a lot of other compiler extensions. You can look at those in the appendix below, or else take a look at the example code on Github. Now let's think back to how we specified our schema when using Persistent:

import qualified Database.Persist.TH as PTH

PTH.share [PTH.mkPersist PTH.sqlSettings, PTH.mkMigrate "migrateAll"] [PTH.persistLowerCase|
  User sql=users
    name Text
    email Text
    age Int
    occupation Text
    UniqueEmail email
    deriving Show Read Eq

  Article sql=articles
    title Text
    body Text
    publishedTime UTCTime
    authorId UserId
    UniqueTitle title
    deriving Show Read Eq
|]

With Beam, we won't use Template Haskell, so we'll actually be creating normal Haskell data types. There will still be some oddities though. First, by convention, we'll specify our types with the extra character T at the end. This is unnecessary, but the convention helps us remember what types relate to tables. We'll also have to provide an extra type parameter f, that we'll get into a bit more later:

data UserT f =
  …

data ArticleT f =
  ...

Our next convention will be to use an underscore in front of our field names. We will also, unlike Persistent, specify the type name in the field names. With these conventions, I'm following the advice of the library's creator, Travis.

data UserT f = User
  { _userId :: ...
  , _userName :: …
  , _userEmail :: …
  , _userAge :: …
  , _userOccupation :: …
  }

data ArticleT f = Article
  { _articleId :: …
  , _articleTitle :: …
  , _articleBody :: …
  , _articlePublishedTime :: …
  }

So when we specify the actual types of each field, we'll just put the relevant data type, like Int, Text or whatever, right? Well, not quite. To complete our types, we're going to fill in each field with the type we want, except specified via Columnar f. Also, we'll derive Generic on both of these types, which will allow Beam to work its magic:

data UserT f = User
  { _userId :: Columnar f Int64
  , _userName :: Columnar f Text
  , _userEmail :: Columnar f Text
  , _userAge :: Columnar f Int
  , _userOccupation :: Columnar f Text
  } deriving (Generic)

data ArticleT f = Article
  { _articleId :: Columnar f Int64
  , _articleTitle :: Columnar f Text
  , _articleBody :: Columnar f Text
  , _articlePublishedTime :: Columnar f Int64 -- Unix Epoch
  } deriving (Generic)

Now there are a couple small differences between this and our previous schema. First, we have the primary key as an explicit field of our type. With Persistent, we separated it using the Entity abstraction. We'll see below how we can deal with situations where that key isn't known. The second difference is that (for now), we've left out the userId field on the article. We'll add this when we deal with primary keys.

Columnar

So what exactly is this Columnar business about? Well under most circumstances, we'd like to specify a User with the raw field types. But there are some situations where we'll have to use a more complicated type for an SQL expression. Let's start with the simple case first.

Luckily, Columnar works in such a way that if we use Identity for f, we can use raw types to fill in the field values. We'll make a type synonym specifically for this identity case. We can then make some examples:

type User = UserT Identity
type Article = ArticleT Identity

user1 :: User
user1 = User 1 "James" "james@example.com" 25 "programmer"

user2 :: User
user2 = User 2 "Katie" "katie@example.com" 25 "engineer"

users :: [User]
users = [ user1, user2 ]

As a note, if you find it cumbersome to repeat the Columnar keyword, you can shorten it to C:

data UserT f = User
  { _userId :: C f Int64
  , _userName :: C f Text
  , _userEmail :: C f Text
  , _userAge :: C f Int
  , _userOccupation :: C f Text
  } deriving (Generic)

Now, our initial examples will assign all our fields raw values. So we won't initially need anything for the f parameter besides Identity. Further down though, we'll deal with the case of auto-incrementing primary keys. There, we'll use the default_ function, whose type is actually Beam's form of an SQL expression. That means using a different type for f, but the flexibility will allow us to keep using our User constructor!
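
To see why the same User constructor works in both cases, note that Columnar is a type family that collapses to the raw type when f is Identity. A simplified sketch (the real definition in Beam has a few more cases):

type family Columnar (f :: * -> *) x where
  Columnar Identity x = x
  Columnar f x = f x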

Instances for Our Types

Now that we've specified our types, we can use the Beamable and Table type classes to tell Beam more about our types. Before we can make any of these types a Table, we'll want to assign its primary key type. So let's make a couple more type synonyms to represent these:

type UserId = PrimaryKey UserT Identity
type ArticleId = PrimaryKey ArticleT Identity

While we're at it, let's add that foreign key to our Article type:

data ArticleT f = Article
  { _articleId :: Columnar f Int64
  , _articleTitle :: Columnar f Text
  , _articleBody :: Columnar f Text
  , _articlePublishedTime :: Columnar f Int64
  , _articleUserId :: PrimaryKey UserT f
  } deriving (Generic)

We can now generate instances for Beamable both on our main types and on the primary key types. We'll also derive instances for Show and Eq:

data UserT f =
  …

deriving instance Show User
deriving instance Eq User

instance Beamable UserT
instance Beamable (PrimaryKey UserT)

data ArticleT f =
  …

deriving instance Show Article
deriving instance Eq Article

instance Beamable ArticleT
instance Beamable (PrimaryKey ArticleT)

Now we'll create an instance for the Table class. This will involve some type family syntax. We'll specify UserId and ArticleId as our primary key data types. Then we can fill in the primaryKey function to match up the right field.

instance Table UserT where
  data PrimaryKey UserT f = UserId (Columnar f Int64) deriving Generic
  primaryKey = UserId . _userId

instance Table ArticleT where
  data PrimaryKey ArticleT f = ArticleId (Columnar f Int64) deriving Generic
  primaryKey = ArticleId . _articleId

Accessor Lenses

We'll do one more thing to mimic Persistent. There, the Template Haskell automatically generated lenses for us, which we could use when making database queries. Below, we'll use something similar. But we'll use a special function, tableLenses, to make these rather than Template Haskell. If you remember back to how we used the Servant Client library, we could create client functions by using client and matching it against a pattern. We'll do something similar with tableLenses. We'll use LensFor on each field of our tables, and create a pattern constructing an item.

User
  (LensFor userId)
  (LensFor userName)
  (LensFor userEmail)
  (LensFor userAge)
  (LensFor userOccupation) = tableLenses

Article
  (LensFor articleId)
  (LensFor articleTitle)
  (LensFor articleBody)
  (LensFor articlePublishedTime)
  (UserId (LensFor articleUserId)) = tableLenses

Note we have to wrap the foreign key lens in UserId.

Creating our Database

Now unlike Persistent, we'll create an extra type that will represent our database. Each of our two tables will have a field within this database:

data BlogDB f = BlogDB
  { _blogUsers :: f (TableEntity UserT)
  , _blogArticles :: f (TableEntity ArticleT)
  } deriving (Generic)

We'll need to make our database type an instance of the Database class. We'll also specify a set of default settings we can use on our database. Both of these items involve a parameter be, which stands for a backend (e.g. SQLite, Postgres). We leave this parameter generic for now.

instance Database be BlogDB

blogDb :: DatabaseSettings be BlogDB
blogDb = defaultDbSettings

Inserting into Our Database

Now, migrating our database with Beam is a little more complicated than it is with Persistent. We might cover that in a later article. For now, we'll keep things simple, and use an SQLite database and migrate it ourselves. So let's first create our tables. We have to follow Beam's conventions here, particularly on the user_id__id field for our foreign key:

CREATE TABLE users
  ( id INTEGER PRIMARY KEY AUTOINCREMENT
  , name VARCHAR NOT NULL
  , email VARCHAR NOT NULL
  , age INTEGER NOT NULL
  , occupation VARCHAR NOT NULL
  );
CREATE TABLE articles
  ( id INTEGER PRIMARY KEY AUTOINCREMENT
  , title VARCHAR NOT NULL
  , body VARCHAR NOT NULL
  , published_time INTEGER NOT NULL
  , user_id__id INTEGER NOT NULL
  );

Now we want to write a couple queries that can interact with the database. Let's start by inserting our raw users. We begin by opening up an SQLite connection, and we'll write a function that uses this connection:

import Database.SQLite.Simple (open, Connection)

main :: IO ()
main = do
  conn <- open "blogdb1.db"
  insertUsers conn

insertUsers :: Connection -> IO ()
insertUsers = ...

We start our expression by using runBeamSqlite and passing the connection. Then we use runInsert to specify to Beam that we wish to make an insert statement.

import Database.Beam
import Database.Beam.Sqlite

insertUsers :: Connection -> IO ()
insertUsers conn = runBeamSqlite conn $ runInsert $
  ...

Now we'll use the insert function and signal which one of our tables we want out of our database:

insertUsers :: Connection -> IO ()
insertUsers conn = runBeamSqlite conn $ runInsert $
  insert (_blogUsers blogDb) $ ...

Last, since we are inserting raw values (UserT Identity), we use the insertValues function to complete this call:

insertUsers :: Connection -> IO ()
insertUsers conn = runBeamSqlite conn $ runInsert $
  insert (_blogUsers blogDb) $ insertValues users

And now we can check and verify that our users exist!

SELECT * FROM users;
1|James|james@example.com|25|programmer
2|Katie|katie@example.com|25|engineer

Let's do the same for articles. We'll use the pk function to access the primary key of a particular User:

article1 :: Article
article1 = Article 1 "First article" 
  "A great article" 1531193221 (pk user1)

article2 :: Article
article2 = Article 2 "Second article" 
  "A better article" 1531199221 (pk user2)

article3 :: Article
article3 = Article 3 "Third article" 
  "The best article" 1531200221 (pk user1)

articles :: [Article]
articles = [ article1, article2, article3]

insertArticles :: Connection -> IO ()
insertArticles conn = runBeamSqlite conn $ runInsert $
  insert (_blogArticles blogDb) $ insertValues articles

Select Queries

Now that we've inserted a couple elements, let's run some basic select statements. In general for select, we'll want the runSelectReturningList function. We could also query for a single element with a different function if we wanted:

findUsers :: Connection -> IO ()
findUsers conn = runBeamSqlite conn $ do
  users <- runSelectReturningList $ ...

Now we'll use select instead of insert from the last query. We'll also use the function all_ on our users field in the database to signify that we want all of them. And that's all we need:

findUsers :: Connection -> IO ()
findUsers conn = runBeamSqlite conn $ do
  users <- runSelectReturningList $ select (all_ (_blogUsers blogDb))
  mapM_ (liftIO . putStrLn . show) users

To do a filtered query, we'll start with the same framework. But now we need to enhance our select statement into a monadic expression. We'll start by selecting user from all our users:

findUsers :: Connection -> IO ()
findUsers conn = runBeamSqlite conn $ do
  users <- runSelectReturningList $ select $ do
    user <- (all_ (_blogUsers blogDb))
    ...
  mapM_ (liftIO . putStrLn . show) users

And now we'll filter on that by using guard_ and applying one of our lenses. We use the ==. operator for equality, like in Persistent. We also have to wrap our raw comparison value with val_:

findUsers :: Connection -> IO ()
findUsers conn = runBeamSqlite conn $ do
  users <- runSelectReturningList $ select $ do
    user <- (all_ (_blogUsers blogDb))
    guard_ (user ^. userName ==. (val_ "James"))
    return user
  mapM_ (liftIO . putStrLn . show) users

And that's all we need! Beam will generate the SQL for us! Now let's try to do a join. This is actually much simpler in Beam than with Persistent/Esqueleto. All we need is to add a couple more statements to our "select" on the articles. We'll just filter them by the user ID!

findUsersAndArticles :: Connection -> IO ()
findUsersAndArticles conn = runBeamSqlite conn $ do
  users <- runSelectReturningList $ select $ do
    user <- (all_ (_blogUsers blogDb))
    guard_ (user ^. userName ==. (val_ "James"))
    article <- (all_ (_blogArticles blogDb))
    guard_ (article ^. articleUserId ==. user ^. userId)
    return user
  mapM_ (liftIO . putStrLn . show) users

That's all there is to it!

Auto Incrementing Primary Keys

In the examples above, we hard-coded all our IDs. But this isn't typically what you want. We should let the database assign the ID via some rule, in our case auto-incrementing. So instead of creating a User "value", we'll make an "expression". This is possible through the polymorphic f parameter in our type. We'll leave off the type signature since it's a bit confusing. But here's the expression we'll create:

user1' = User
  default_ 
  (val_ "James")
  (val_ "james@example.com")
  (val_ 25)
  (val_ "programmer")

We use default_ to represent an expression that will tell SQL to use a default value. Then we lift all our other values with val_. Finally, we'll use insertExpressions instead of insertValues in our Haskell expression.

insertUsers :: Connection -> IO ()
insertUsers conn = runBeamSqlite conn $ runInsert $
  insert (_blogUsers blogDb) $ insertExpressions [ user1' ]

Then we'll have our auto-incrementing key!

Conclusion

That concludes our introduction to the Beam library. As we saw, Beam is a great library that lets you specify a database schema without using any Template Haskell. For more details, make sure to check out the documentation!

For a more in depth look at using Haskell libraries to make a web app, be sure to read our Haskell Web Series. It goes over some database mechanics as well as creating APIs and testing. As an added challenge, trying re-writing the code in that series to use Beam instead of Persistent. See how much of the Servant code needs to change to accommodate that.

And for more examples of cool libraries, download our Production Checklist! There are some more database and API libraries you can check out!

Appendix: Compiler Extensions

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

Deploying Confidently: Haskell and Circle CI


In last week’s article, we deployed our Haskell code to the cloud using Heroku. Our solution worked, but the process was also very basic and very manual. Let’s review the steps we would take to deploy code on a real project with this approach.

  1. Make a pull request against master branch
  2. Merge code into master
  3. Pull master locally, run tests
  4. Manually run git push heroku master
  5. Hope everything works fine on Heroku

This isn’t a great approach. Wherever there are manual steps in our development process, we’re likely to forget something. This will almost always come around to bite us at some point. In this article, we’ll see how we can automate our development workflow using Circle CI.

Getting Started with Circle

To follow along with this article, you should already have your project stored on Github. As soon as you have this, you can integrate with Circle easily. Go to the Circle Website and login with Github. Then go to “Add Project”. You should see all your personal repositories. Clicking your Haskell project should allow you to integrate the two services.

Now that Circle knows about our repository, it will try to build whenever we push code up to Github. But we have to tell Circle CI what to do once we’ve pushed our code! For this step, we’ll need to create a config file and store it as part of our repository. Note we’ll be using Version 2 of the Circle CI configuration. To define this configuration, we first create a folder called .circleci at the root of our repository. Then we make a YAML file called config.yml.

In Circle V2, we specify “workflows” for the Circle container to run through. To keep things simple, we’ll limit our actions to take place within a single workflow. We specify the workflows section at the bottom of our config:

workflows:
  version: 2
  build_and_test:
    jobs:
      - build_project

Now at the top, we’ll again specify version 2, and then lay out a bare-bones definition of our build_project job.

version: 2
jobs:
  build_project:
    machine: true
    steps:
      - checkout
      - run: echo "Hello"

The machine section indicates a default Circle machine image we’re using for our project. There’s no built-in Haskell machine configuration we can use, so we’re using a basic image. Then for our steps, we’ll first checkout our code, and then run a simple “echo” command. Let’s now consider how we can get this machine to get the Stack utility so we can actually go and build our code.

Installing Stack

So right now our Circle container has no Haskell tools. This means we'll need to do everything from scratch. This is a useful learning exercise. We’ll learn the minimal steps we need to take to build a Haskell project on a Linux box. Next week, we’ll see a shortcut we can use.

Luckily, the Stack tool handles most of our problems for us, but we first have to download it. So after checking out our code, we’ll run several different commands to install Stack. Here’s what they look like:

steps:
  - checkout
  - run: wget https://github.com/commercialhaskell/stack/releases/download/v1.6.1/stack-1.6.1-linux-x86_64.tar.gz -O /tmp/stack.tar.gz
  - run: sudo mkdir /tmp/stack-download
  - run: sudo tar -xzf /tmp/stack.tar.gz -C /tmp/stack-download
  - run: sudo chmod +x /tmp/stack-download/stack-1.6.1-linux-x86_64/stack
  - run: sudo mv /tmp/stack-download/stack-1.6.1-linux-x86_64/stack /usr/bin/stack

The wget command downloads Stack from Github. If you’re using a different version of Stack than we are (1.6.1), you’ll need to change the version numbers, of course. We’ll then create a temporary directory to unzip the actual executable into. Then we use tar to perform the unzip step. This leaves us with the stack executable in the appropriate folder. We’ll give this executable execute (x) permissions, and then move it onto the machine’s path. Then we can use stack!

Building Our Project

Now we’ve done most of the hard work! From here, we’ll just use the Stack commands to make sure our code works. We’ll start by running stack setup. This will download whatever version of GHC our project needs. Then we’ll run the stack test command to make sure our code compiles and passes all our test suites.

steps:
  - checkout
  - run: wget …
  ... 
  - run: stack setup
  - run: stack test

Note that Circle expects our commands to finish with exit code 0. This means if any of them has a non-zero exit code, the build will be a “failure”. This includes our stack test step. Thus, if we push code that fails any of our tests, we’ll see it as a build failure! This spares us the extra steps of running our tests manually and “hoping” they’ll work on the environment we deploy to.

Caching

There is a pretty big weakness in this process right now. Every Circle container we make starts from scratch. Thus we’ll have to download GHC and all the different libraries our code depends on for every build. This means you might need to wait 30-60 minutes to see if your code passes depending on the size of your project! We don’t want this. So to make things faster, we’ll tell Circle to cache this information, since it won’t change on most builds. We’ll take the following two steps:

  1. Only download GHC when stack.yaml changes (since the LTS might have changed). This involves caching the ~/.stack directory
  2. Only re-download libraries when either stack.yaml or our .cabal file changes. For this, we’ll cache the .stack-work directory.

For each of these, we’ll make an appropriate cache key. At the start of our build process, we’ll attempt to restore these directories from the cache based on particular keys. As part of each key, we’ll use a checksum of the relevant file.

steps:
  - checkout
  - restore_cache:
      keys:
        - stack-{{ checksum "stack.yaml" }}
  - restore_cache:
      keys:
        - stack-{{ checksum "stack.yaml" }}-{{ checksum "project.cabal" }}

If these files change, the checksum will be different, so Circle won’t be able to restore the directories. Then our other steps will run in full, downloading all the relevant information. At the end of the process, we want to then make sure we’ve saved these directories under the same key. We do this with the save_cache command:

steps:
  …
  - run: stack test
  - save_cache:
      key: stack-{{ checksum "stack.yaml" }}
      paths:
        - "~/.stack"
  - save_cache:
      key: stack-{{ checksum "stack.yaml" }}-{{ checksum "project.cabal" }}
      paths:
        - ".stack-work"

Now the next builds won’t take as long! There are other ways we can make our cache keys. For instance, we could use the Stack LTS as part of the key, and bump this every time we change which LTS we’re using. The downside is that there’s a little more manual work required. But this work won’t happen too often. The positive side is that we won’t need to re-download GHC when we add extra dependencies to stack.yaml.
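
For instance, a hypothetical manually-versioned key based on the LTS might look like this, where we bump the suffix by hand whenever we upgrade:

  - restore_cache:
      keys:
        - stack-lts-11.9-v1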

Deploying to Heroku

Last but not least, we’ll want to deploy our code to Heroku every time we push to the master branch. Heroku makes it very easy for us to do this! First, go to the app dashboard for Heroku. Then find the Deploy tab. You should see an option to connect with Github. Use it to connect your repository. Then make sure you check the box that indicates Heroku should wait for CI. Now, whenever your build completes successfully, your code will get pushed to Heroku!

Conclusion

You might have noticed that there’s some redundancy with our approaches now! Our Circle CI container will build the code. Then our Heroku container will also build the code! This is very inefficient, and it can lead to deployment problems down the line. Next week, we’ll see how we can use Docker in this process. Docker fully integrates with Circle V2. It will simplify our Circle config definition. It will also spare us from needing to rebuild all our code on Heroku again!

With all these tools at your disposal, it’s time to finally build that Haskell app you always wanted to! Download our Production Checklist to learn some cool libraries you can use!

If you’ve never programmed in Haskell before, hopefully you can see that it’s not too difficult to use! Download our Haskell Beginner’s Checklist and get started!


For All the World to See: Deploying Haskell with Heroku


In several different articles now, we’ve explored how to build web apps using Haskell. See for instance, our Haskell Web Series and our API integrations series. But all this is meaningless in the end if we don’t have a way to deploy our code so that other people on the internet can find it! In this next series, we’ll explore how we can use common services to deploy Haskell code. It’ll involve a few more steps than code in more well-supported languages!

If you’ve never programmed in Haskell at all, you’ve got a few things to learn before you start deploying code! Download our Beginners Checklist for tips on how to start learning! But maybe you’ve done some Haskell already, and need some more ideas for libraries to use. In that case, take a look at our Production Checklist for guidance!

Deploying Code on Heroku

In this article, we’re going to focus on using the Heroku service to deploy our code. Heroku allows us to do this with ease. We can get a quick prototype out for free, making it ideal for Hackathons. Like most platforms though, Heroku is easiest to use with more common languages. Heroku can automatically detect Javascript or Python apps and take the proper steps. Since Haskell isn’t used as much, we’ll need one extra specification to get Heroku support. Luckily, most of the hard work is already done for us.

Buildpacks

Heroku uses the concept of a “buildpack” to determine how to turn your project into runnable code. You’ll deploy your app by pushing your code to a remote repository. Then the buildpack will tell Heroku how to construct the executables you need. If you specify a Node.js project, Heroku will find your package.json file and download everything from NPM. If it’s Python, Heroku will install pip and do the same thing.

Heroku does not have any default buildpacks for Haskell projects. However, there is a buildpack on Github we can use (star this repository!). It will tell our Heroku container to download Stack, and then use Stack to build all our executables. So let’s see how we can build a rudimentary Haskell project using this process.

Creating Our Application

We’ll need to start by making a free account on Heroku. Then we’ll download the Heroku CLI so we can connect from the terminal. Use the heroku login command and enter your credentials.

Now we want to create our application. In your terminal, cd into the directory that has your Haskell Stack project. Make sure it’s also a Git repository already; it’s fine if the repository only exists locally for now. Run this command to create your application (replace haskell-test-app with your desired app name):

heroku create haskell-test-app \
  -b https://github.com/mfine/heroku-buildpack-stack

The -b argument specifies our buildpack. We'll pull it from the specified Github repository. If this works, you should be able to go to your Heroku dashboard and see an entry for your new application. You’ll have a Heroku domain for your project that you can see on project settings.

Now we need to make a Procfile. This tells Heroku the specific binary we need to run to start our web server. Make sure you have an executable in your .cabal file that starts up the server. Then in the Procfile, you’ll specify that executable under the web name:

web: run-server
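
For this to work, run-server must match an executable stanza in our .cabal file. A hypothetical stanza (assuming a Servant/Warp server like the snippet below) might look like this:

executable run-server
  main-is: Main.hs
  hs-source-dirs: app
  build-depends:
      base
    , servant-server
    , warp
  default-language: Haskell2010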

Note though that you can’t use a hard-coded port! Heroku will choose a port for you. You can get it by retrieving the PORT environment variable. Here’s what your code might look like:

runServer :: IO ()
runServer = do
  port <- read <$> getEnv "PORT"
  run port (serve myAPI myServer)

Now you’ll need to “scale” the application to make sure it has at least a single machine to run on. From your repository, run the command:

heroku ps:scale web=1

Finally, we need to push our application to the Heroku container. To do this, make sure the heroku remote was added to your local repository. You can check with the following command:

git remote -v

It should show you two remotes named heroku, one for fetch, and one for push. If those don’t exist, you can add them like so:

heroku git:remote -a haskell-test-app

Then you can finish up by running this command:

git push heroku master

You should see terminal output indicating that Heroku recognizes your application. If you wait long enough, you'll start to see the Stack build process. If you have any environment variables for your project, set them from the app dashboard. You can also set variables with the following command:

heroku config:set VAR_NAME=var_value

Once your app finishes building, you can visit the URL Heroku gives you. It should look like https://your-app.herokuapp.com. You’ve now deployed your Haskell code to the cloud!

Weaknesses

There are a few weaknesses to this system. The main one is that our entire build process takes place on the cloud. This might seem like an advantage, and it has its perks. Haskell applications can take a LONG time to compile though. This is especially true if the project is large and involves Template Haskell. Services like Heroku often have timeouts on their build process. So if compilation takes too long, the build will fail. Luckily, the containers will cache previous results. This means Stack won't have to keep re-downloading all the libraries. So even if our first build times out, the second might succeed.

Conclusion

This concludes part 1 of our Haskell Deployment series. We’ll see the same themes quite a bit throughout this series. It’s definitely possible to deploy our Haskell code using common services. But we often have to do a little bit more work to do so. Next week we’ll see how we can automate our deployment process with Circle CI.

Want some more tips on developing web applications with Haskell? Download our Production Checklist to learn about some other libraries you can use! For a more detailed explanation of one approach, read our Haskell Web Skills series.

James Bowen

Connecting to Mailchimp...from Scratch!

mailing_list.png

Welcome to the third and final article in our series on Haskell API integrations! We started this series off by learning how to send and receive text messages using Twilio. Then we learned how to send emails using the Mailgun service. Both of these involved applying existing Haskell libraries suited to the tasks. This week, we’ll learn how to connect with Mailchimp, a service for managing email subscribers. Only this time, we’re going to do it a bit differently.

There are a couple different Haskell libraries out there for Mailchimp. But we’re not going to use them! Instead, we’ll learn how we can use Servant to connect directly to the API. This should give us some understanding for how to write one of these libraries. It should also make us more confident of integrating with any API of our choosing!

To follow along with the code for this article, check out the mailchimp branch on Github! It’ll show you all the imports and compiler extensions you need!

The topics in this article are quite advanced. If any of it seems crazy confusing, there are plenty of easier resources for you to start off with!

  1. If you’ve never written Haskell at all, see our Beginners Checklist to learn how to get started!
  2. If you want to learn more about the Servant library we’ll be using, check out my talk from BayHac 2017 and download the slides and companion code.
  3. Our Production Checklist has some further resources and libraries you can look at for common tasks like writing web APIs!

Mailchimp 101

Now let’s get going! To integrate with Mailchimp, you first need to make an account and create a mailing list! This is pretty straightforward, and you'll want to save 3 pieces of information. First is the base URL for the Mailchimp API. It will look like this:

https://{server}.api.mailchimp.com/3.0

Where {server} should be replaced by the region that appears in the URL when you log into your account. For instance, mine is: https://us14.api.mailchimp.com/3.0. You’ll also need your API Key, which appears in the “Extras” section under your account profile. Then you’ll also want to save the name of the mailing list you made.

Our 3 Tasks

We’ll be trying to perform three tasks using the API. First, we want to derive the internal “List ID” of our particular Mailchimp list. We can do this by analyzing the results of calling the endpoint at:

GET {base-url}/lists

It will give us all the information we need about our different mailing lists.
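
Based on the fields we’ll deserialize later on, the relevant portion of that response looks roughly like this (heavily abbreviated, with hypothetical values; the real payload contains many more fields):

{
  "lists": [
    { "id": "abc123def", "name": "My Mailing List" }
  ]
}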

Once we have the list ID, we can use that to perform actions on that list. We can for instance retrieve all the information about the list’s subscribers by using:

GET {base-url}/lists/{list-id}/members

We’ll add an extra count param to this, as otherwise we'll only see the results for 10 users:

GET {base-url}/lists/{list-id}/members?count=2000

Finally, we’ll use this same basic resource to subscribe a user to our list. This involves a POST request and a request body containing the user’s email address. Note that all requests and responses will be in the JSON format:

POST {base-url}/lists/{list-id}/members

{
  "email_address": "person@email.com",
  "status": "subscribed"
}

On top of these endpoints, we’ll also need to add basic authentication to every API call. This is where our API key comes in. Basic auth requires us to provide a “username” and “password” with every API request. Mailchimp doesn’t care what we provide as the username. As long as we provide the API key as the password, we’ll be good. Servant will make it easy for us to do this.
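
Concretely, we’ll end up wrapping the key in Servant’s BasicAuthData type. Here’s a small sketch of the idea, assuming the key lives in an environment variable (we’ll see this same pattern again in our runMailchimp function below):

import qualified Data.ByteString.Char8 as BC
import Servant.API.BasicAuth (BasicAuthData(..))
import System.Environment (getEnv)

mailchimpAuth :: IO BasicAuthData
mailchimpAuth = do
  apiKey <- getEnv "MAILCHIMP_API_KEY"
  -- Mailchimp ignores the username, so any non-empty string works
  return $ BasicAuthData (BC.pack "username") (BC.pack apiKey)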

Types and Instances

Once we have the structure of the API down, our next goal is to define wrapper types. These will allow us to serialize our data into the format demanded by the Mailchimp API. We’ll have four different newtypes. The first will represent a single email list in a response object. All we care about is the list name and its ID, which we represent with Text:

newtype MailchimpSingleList = MailchimpSingleList (Text, Text)
  deriving (Show)

Now we want to be able to deserialize a response containing many different lists:

newtype MailchimpListResponse =
  MailchimpListResponse [MailchimpSingleList]
  deriving (Show)

In a similar way, we want to represent a single subscriber and a response containing several subscribers:

newtype MailchimpSubscriber = MailchimpSubscriber
  { unMailchimpSubscriber :: Text }
  deriving (Show)

newtype MailchimpMembersResponse =
  MailchimpMembersResponse [MailchimpSubscriber]
  deriving (Show)

The purpose of using these newtypes is so we can define JSON instances for them. In general, we only need FromJSON instances so we can deserialize the response we get back from the API. Here’s what our different instances look like:

instance FromJSON MailchimpSingleList where
  parseJSON = withObject "MailchimpSingleList" $ \o -> do
    name <- o .: "name"
    id_ <- o .: "id"
    return $ MailchimpSingleList (name, id_)

instance FromJSON MailchimpListResponse where
  parseJSON = withObject "MailchimpListResponse" $ \o -> do
    lists <- o .: "lists"
    MailchimpListResponse <$> forM lists parseJSON

instance FromJSON MailchimpSubscriber where
  parseJSON = withObject "MailchimpSubscriber" $ \o -> do
    email <- o .: "email_address" 
    return $ MailchimpSubscriber email

instance FromJSON MailchimpMembersResponse where
  parseJSON = withObject "MailchimpMembersResponse" $ \o -> do
    members <- o .: "members"
    MailchimpMembersResponse <$> forM members parseJSON

And last, we need a ToJSON instance for our individual subscriber type. This is because we’ll be sending that as a POST request body:

instance ToJSON MailchimpSubscriber where
  toJSON (MailchimpSubscriber email) = object
    [ "email_address" .= email
    , "status" .= ("subscribed" :: Text)
    ]

Defining a Server Type

Now that we've defined our types, we can go ahead and define our actual API using Servant. This might seem a little confusing. After all, we’re not building a Mailchimp Server! But by writing this API, we can use the client function from the servant-client library. This will derive all the client functions we need to call into the Mailchimp API. Let’s start by defining a combinator that will describe our authentication format using BasicAuth. Since we aren’t writing any server code, we don’t need a “return” type for our authentication.

type MCAuth = BasicAuth "mailchimp" ()

Now let’s write the lists endpoint. It has the authentication, our string path, and then returns us our list response.

type MailchimpAPI =
  MCAuth :> "lists" :> Get '[JSON] MailchimpListResponse :<|>
  ...

For our next endpoint, we need to capture the list ID as a parameter. Then we’ll add the extra query parameter related to “count”. It will return us the members in our list.

type MailchimpAPI =
  …
  MCAuth :> "lists" :> Capture "list-id" Text :> "members" :>
    QueryParam "count" Int :> Get '[JSON] MailchimpMembersResponse

Finally, we need the “subscribe” endpoint. This will look like our last endpoint, except without the count parameter and as a post request. Then we’ll include a single subscriber in the request body.

type MailchimpAPI =
  …
  MCAuth :> "lists" :> Capture "list-id" Text :> "members" :>
    ReqBody '[JSON] MailchimpSubscriber :> Post '[JSON] ()

mailchimpApi :: Proxy MailchimpAPI
mailchimpApi = Proxy :: Proxy MailchimpAPI

Now with servant-client, it’s very easy to derive the client functions for these endpoints. We define the type signatures and use client. Note how the type signatures line up with the parameters that we expect based on the endpoint definitions. Each endpoint takes the BasicAuthData type. This contains a username and password for authenticating the request.

fetchListsClient :: BasicAuthData -> ClientM MailchimpListResponse
fetchSubscribersClient :: BasicAuthData -> Text -> Maybe Int
  -> ClientM MailchimpMembersResponse
subscribeNewUserClient :: BasicAuthData -> Text -> MailchimpSubscriber
  -> ClientM ()
( fetchListsClient :<|>
  fetchSubscribersClient :<|>
  subscribeNewUserClient) = client mailchimpApi

Running Our Client Functions

Now let’s write some helper functions so we can call these functions from the IO monad. Here’s a generic function that will take one of our endpoints and call it using Servant’s runClientM mechanism.

runMailchimp :: (BasicAuthData -> ClientM a) -> IO (Either ServantError a)
runMailchimp action = do
  baseUrl <- getEnv "MAILCHIMP_BASE_URL"
  apiKey <- getEnv "MAILCHIMP_API_KEY"
  trueUrl <- parseBaseUrl baseUrl
  let userData = BasicAuthData "username" (pack apiKey)
  manager <- newTlsManager
  let clientEnv = ClientEnv manager trueUrl
  runClientM (action userData) clientEnv

First we fetch our environment variables and create a TLS connection manager. Then we run the client action against the ClientEnv. Not too difficult.

Now we’ll write a function that will take a list name, query the API for all our lists, and give us the list ID for that name. It will return an Either value since the client call might actually fail. It calls our list client and filters through the results until it finds a list whose name matches. We’ll return an error value if the list isn’t found.

fetchMCListId :: Text -> IO (Either String Text)
fetchMCListId listName = do
  listsResponse <- runMailchimp fetchListsClient
  case listsResponse of
    Left err -> return $ Left (show err)
    Right (MailchimpListResponse lists) ->
      case find nameMatches lists of
        Nothing -> return $ Left "Couldn't find list with that name!"
        Just (MailchimpSingleList (_, id_)) -> return $ Right id_ 
  where
    nameMatches :: MailchimpSingleList -> Bool
    nameMatches (MailchimpSingleList (name, _)) = name == listName

Our function for retrieving the subscribers for a particular list is more straightforward. We make the client call and either return the error or else unwrap the subscriber emails and return them.

fetchMCListMembers :: Text -> IO (Either String [Text])
fetchMCListMembers listId = do
  membersResponse <- runMailchimp 
    (\auth -> fetchSubscribersClient auth listId (Just 2000))
  case membersResponse of
    Left err -> return $ Left (show err)
    Right (MailchimpMembersResponse subs) -> return $
      Right (map unMailchimpSubscriber subs)

And our subscribe function looks very similar. We wrap the email up in the MailchimpSubscriber type and then we make the client call using runMailchimp.

subscribeMCMember :: Text -> Text -> IO (Either String ())
subscribeMCMember listId email = do
  subscribeResponse <- runMailchimp (\auth ->
    subscribeNewUserClient auth listId (MailchimpSubscriber email))
  case subscribeResponse of
    Left err -> return $ Left (show err)
    Right _ -> return $ Right ()

The SubscriberList Effect

Since the rest of our server uses Eff, let’s add an effect type for our subscription list. This will help abstract away the Mailchimp details. We’ll call this effect SubscriberList, and it will have a constructor for each of our three actions:

data SubscriberList a where
  FetchListId :: SubscriberList (Either String Text)
  FetchListMembers ::
    Text -> SubscriberList (Either String [Subscriber])
  SubscribeUser ::
    Text -> Subscriber -> SubscriberList (Either String ())

fetchListId :: (Member SubscriberList r) => Eff r (Either String Text)
fetchListId = send FetchListId

fetchListMembers :: (Member SubscriberList r) =>
  Text -> Eff r (Either String [Subscriber])
fetchListMembers listId = send (FetchListMembers listId)

subscribeUser :: (Member SubscriberList r) =>
  Text -> Subscriber -> Eff r (Either String ())
subscribeUser listId subscriber =
  send (SubscribeUser listId subscriber)

Note we use our wrapper type Subscriber from the schema.
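
If you don’t have the full repository handy, a minimal stand-in for that type could look like this (the real project defines it as part of the schema):

newtype Subscriber = Subscriber
  { subscriberEmail :: Text }
  deriving (Show)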

To complete the puzzle, we need a function to convert this action into IO. Like all our different transformations, we use runNat on a natural transformation:

runSubscriberList :: (Member IO r) =>
  Eff (SubscriberList ': r) a -> Eff r a
runSubscriberList = runNat subscriberListToIO
  where
    subscriberListToIO :: SubscriberList a -> IO a
    ...

Now for each constructor, we’ll call into the helper functions we wrote above. We’ll add a little bit of extra logic that’s going to handle unwrapping the Mailchimp specific types we used and some error handling.

runSubscriberList :: (Member IO r) =>
  Eff (SubscriberList ': r) a -> Eff r a
runSubscriberList = runNat subscriberListToIO
  where
    subscriberListToIO :: SubscriberList a -> IO a
    subscriberListToIO FetchListId = do
      listName <- pack <$> getEnv "MAILCHIMP_LIST_NAME"
      fetchMCListId listName
    subscriberListToIO (FetchListMembers listId) = do
      membersEither <- fetchMCListMembers listId
      case membersEither of
        Left e -> return $ Left e
        Right emails -> return $ Right (Subscriber <$> emails)
    subscriberListToIO (SubscribeUser listId (Subscriber email)) =
      subscribeMCMember listId email

Modifying the Server

The last step of this process is to incorporate the new effects into our server. Our aim is to replace the simplistic Database effect we were using before. This is a snap. We’ll start by substituting our SubscriberList into the natural transformation used by Servant:

transformToHandler ::
  (Eff '[SubscriberList, Email, SMS, IO]) :~> Handler
transformToHandler = NT $ \action -> do
  let ioAct = runM $ runTwilio (runEmail (runSubscriberList action))
  liftIO ioAct

We now need to change our other server functions to use the new effects. In both cases, we’ll need to first fetch the list ID, handle the failure, and we can then proceed with the other operation. Here’s how we subscribe a new user:

subscribeHandler :: (Member SubscriberList r) => Text -> Eff r ()
subscribeHandler email = do
  listId <- fetchListId 
  case listId of
    Left _ -> error "Failed to find list ID!"
    Right listId' -> do
      _ <- subscribeUser listId' (Subscriber email)
      return ()

Finally, we send an email like so, combining last week’s Email effect with the SubscriberList effect we just created:

emailList :: (Member SubscriberList r, Member Email r) =>
  (Text, ByteString, Maybe ByteString) -> Eff r ()
emailList content = do
  listId <- fetchListId 
  case listId of
    Left _ -> error "Failed to find list ID!"
    Right listId' -> do
      subscribers <- fetchListMembers listId'
      case subscribers of
        Left _ -> error "Failed to find subscribers!"
        Right subscribers' -> do
          _ <- sendEmailToList
            content (subscriberEmail <$> subscribers')
          return ()

Conclusion

That wraps up our exploration of Mailchimp and our series on integrating APIs with Haskell! In part 1 of this series, we saw how to send and receive texts using the Twilio API. Then in part 2, we sent emails to our users with Mailgun. Finally, we used the Mailchimp API to more reliably store our list of subscribers. We even did this from scratch, without the use of a library like we had for the other two effects. We used Servant to great effect here, specifying what our API would look like even though we weren’t writing a server for it! This enabled us to derive client functions that could call the API for us.

This series combined tons of complex ideas from many other topics. If you were a little lost trying to keep track of everything, I highly recommend you check out our Haskell Web Skills series. It’ll teach you a lot of cool techniques, such as how to connect Haskell to a database and set up a server with Servant. You should also download our Production Checklist for some more ideas about cool libraries!

And of course, if you’re a total beginner at Haskell, hopefully you understand now that Haskell CAN be used for some very advanced functionality. Furthermore, we can do so with incredibly elegant solutions that separate our effects very nicely. If you’re interested in learning more about the language, download our free Beginners Checklist!

James Bowen

Mailing it out with Mailgun!

emails.jpg

Last week, we started our exploration of the world of APIs by integrating Haskell with Twilio. We were able to send a basic SMS message, and then create a server that could respond to a user’s message. This week, we’re going to venture into another type of effect: sending emails. We’ll be using Mailgun for this task, along with the Hailgun Haskell API for it.

You can take a look at the full code for this article by looking at the mailgun branch on our Github repository. If this article sparks your curiosity for more Haskell libraries, you should download our Production Checklist!

Making an Account

To start with, we’ll obviously need a Mailgun account. Signing up is free and straightforward. It will ask you for an email domain, but you don’t need one to get started. As long as you’re in testing mode, you can use a sandbox domain they provide to host your mail server.

With Twilio, we had to specify a “verified” phone number that we could message in testing mode. Similarly, you will also need to designate a verified email address. Your sandboxed domain will only be able to send to this address. You’ll also need to save a couple pieces of information about your Mailgun account. In particular, you need your API Key, the sandboxed email domain, and the reply address for your emails to use. Save these as environment variables on your local system and remote machine.

Basic Email

Now let’s get a feel for the Hailgun code by sending a basic email. All this occurs in the simple IO monad. We ultimately want to use the function sendEmail, which requires both a HailgunContext and a HailgunMessage:

sendEmail
  :: HailgunContext
  -> HailgunMessage
  -> IO (Either HailgunErrorResponse HailgunSendResponse)

We’ll start by retrieving our environment variables. With our domain and API key, we can build the HailgunContext we’ll need to pass as an argument.

import Data.ByteString.Char8 (pack)

sendMail :: IO ()
sendMail = do
  domain <- getEnv "MAILGUN_DOMAIN"
  apiKey <- getEnv "MAILGUN_API_KEY"
  replyAddress <- pack <$> getEnv "MAILGUN_REPLY_ADDRESS"
  -- Last argument is an optional proxy
  let context = HailgunContext domain apiKey Nothing
  ...

Now to build the message itself, we’ll use a builder function hailgunMessage. It takes several different parameters:

hailgunMessage
  :: MessageSubject
  -> MessageContent
  -> UnverifiedEmailAddress -- Reply Address, just a ByteString
  -> MessageRecipients
  -> [Attachment]
  -> Either HailgunErrorMessage HailgunMessage

These are all very easy to fill in. The MessageSubject is Text and then we’ll pass our reply address from above. For the content, we’ll start by using the TextOnly constructor for a plain text email. We’ll see an example later of how we can use HTML in the content:

sendMail :: IO ()
sendMail = do
  …
  replyAddress <- pack <$> getEnv "MAILGUN_REPLY_ADDRESS"
  let msg = mkMessage replyAddress
  …
  where
    mkMessage replyAddress = hailgunMessage
      "Hello Mailgun!"
      (TextOnly "This is a test message.")
      replyAddress
      ...

The MessageRecipients type has three fields. First are the direct recipients, then the CC’d emails, and then the BCC’d users. We're only sending to a single user at the moment. So we can take the emptyMessageRecipients item and modify it. We’ll wrap up our construction by providing an empty list of attachments for now:

where
  mkMessage replyAddress = hailgunMessage
    "Hello Mailgun!"
    (TextOnly "This is a test message.")
    replyAddress
    (emptyMessageRecipients { recipientsTo = ["verified@mail.com"] } )
    []

If there are issues, the hailgunMessage function can throw an error, as can the sendEmail function itself. But as long as we check these errors, we’re in good shape to send out the email!

createAndSendEmail :: IO ()
createAndSendEmail = do
  domain <- getEnv "MAILGUN_DOMAIN"
  apiKey <- getEnv "MAILGUN_API_KEY"
  replyAddress <- pack <$> getEnv "MAILGUN_REPLY_ADDRESS"
  let context = HailgunContext domain apiKey Nothing
  let msg = mkMessage replyAddress
  case msg of
    Left err -> putStrLn ("Making failed: " ++ show err)
    Right msg' -> do
      result <- sendEmail context msg'
      case result of
        Left err -> putStrLn ("Sending failed: " ++ show err)
        Right resp -> putStrLn ("Sending succeeded: " ++ show resp)

Notice how it’s very easy to build all our functions up when we start with the type definitions. We can work through each type and figure out what it needs. I reflect on this idea some more in this article on Compile Driven Learning, which is part of our Haskell Brain Series for newcomers to Haskell!

Effify Email

Now we’d like to incorporate sending an email into our server. As you’ll note from looking at the source code, I revamped the Servant server to use free monads. There are many different effects in our system, and this helps us keep them straight. Check out this article for more details on free monads and the Eff library. To start, we want to describe our email sending as an effect. We’ll start with a simple data type that has a single constructor:

data Email a where
  SendSubscribeEmail :: Text -> Email (Either String ())

sendSubscribeEmail :: (Member Email r)
  => Text -> Eff r (Either String ())
sendSubscribeEmail email = send (SendSubscribeEmail email)

Now we need a way to peel the Email effect off our stack, which we can do as long as we have IO. We’ll mimic the sendMail function we already wrote as the transformation. We now take the user’s email we’re sending to as an input!

runEmail :: (Member IO r) => Eff (Email ': r) a -> Eff r a
runEmail = runNat emailToIO
  where
    emailToIO :: Email a -> IO a
    emailToIO (SendSubscribeEmail subEmail) = do
      domain <- getEnv "MAILGUN_DOMAIN"
      apiKey <- getEnv "MAILGUN_API_KEY"
      replyEmail <- pack <$> getEnv "MAILGUN_REPLY_ADDRESS"
      let context = HailgunContext domain apiKey Nothing
      case mkSubscribeMessage replyEmail (encodeUtf8 subEmail) of
        Left err -> return $ Left err
        Right msg -> do
          result <- sendEmail context msg
          case result of
            Left err -> return $ Left (show err)
            Right resp -> return $ Right ()

Extending our SMS Handler

Now that we’ve properly described sending an email as an effect, let’s incorporate it into our server! We’ll start by writing another data type that will represent the potential commands a user might text to us. For now, it will only have the “subscribe” command.

data SMSCommand = SubscribeCommand Text

Now let’s write a function that will take their message and interpret it as a command. If they text subscribe {email}, we’ll send them an email!

messageToCommand :: Text -> Maybe SMSCommand
messageToCommand messageBody = case splitOn " " messageBody of
  ["subscribe", email] -> Just $ SubscribeCommand email
  _ -> Nothing

Now we’ll extend our server handler to reply. If we interpret their command correctly, we’ll send the email! Otherwise, we’ll send them back a text saying we couldn’t understand them. Notice how our SMS effect and Email effect are part of this handler:

smsHandler :: (Member SMS r, Member Email r)
  => IncomingMessage -> Eff r ()
smsHandler msg = 
  case messageToCommand (body msg) of
    Nothing -> sendText (fromNumber msg) 
      "Sorry, we didn't understand that request!"
    Just (SubscribeCommand email) -> do
      _ <- sendSubscribeEmail email
      return ()

And now our server will be able to send the email when the user "subscribes"!

Attaching a File

Let’s make our email a little more complicated. Right now we’re only sending a very basic email. Let’s modify it so it has an attachment. We can build an attachment by providing a path to a file as well as a string describing it. To get this file, our message making function will need the current running directory. We’ll also change the body a little bit.

mkSubscribeMessage
  :: ByteString
  -> ByteString
  -> FilePath
  -> Either HailgunErrorMessage HailgunMessage
mkSubscribeMessage replyAddress subscriberAddress currentDir = 
  hailgunMessage
    "Thanks for signing up!"
    content
    replyAddress 
    (emptyMessageRecipients { recipientsTo = [subscriberAddress] })
    -- Notice the attachment!
    [ Attachment 
        (rewardFilepath currentDir)
        (AttachmentBS "Your Reward")
    ]
  where
    content = TextOnly "Here's your reward!"

rewardFilepath :: FilePath -> FilePath
rewardFilepath currentDir = currentDir ++ "/attachments/reward.txt"

Now when our user signs up, they’ll get whatever attachment file we’ve specified!
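
One detail worth noting: since mkSubscribeMessage now takes the directory as an argument, our emailToIO interpreter has to supply it. Here’s a sketch of that change, using getCurrentDirectory from System.Directory (only the new line is shown; the elided parts are unchanged):

import System.Directory (getCurrentDirectory)

emailToIO :: Email a -> IO a
emailToIO (SendSubscribeEmail subEmail) = do
  currentDir <- getCurrentDirectory
  ...
  case mkSubscribeMessage replyEmail (encodeUtf8 subEmail) currentDir of
    ...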

HTML Content

To show off one more feature, let’s change the content of our email so that it contains some HTML instead of only text! In particular, we’ll give them the chance to confirm their subscription by clicking a link to our server. All that changes here is that we’ll use the TextAndHTML constructor instead of TextOnly. We do want to provide a plain text interpretation of our email in case HTML can’t be rendered for whatever reason. Notice the use of the <a> tags for the link:

content = TextAndHTML
  textOnly
  ("Here's your reward! To confirm your subscription, click " <>
    link <> "!")
  where
    textOnly = "Here's your reward! To confirm your subscription, go to "
      <> "https://haskell-apis.herokuapp.com/api/subscribe/"
      <> subscriberAddress
      <> " and we'll sign you up!"
    link = "<a href=\"https://haskell-apis.herokuapp.com/api/subscribe/"
      <> subscriberAddress <> "\">this link</a>"

Now we’ll add another endpoint that will capture the email as a parameter and save it to a database. The Database effect very much resembles the one from the Eff article. It’ll save the email in a database table.

type ServerAPI = "api" :> "ping" :> Get '[JSON] String :<|>
  "api" :> "sms" :> ReqBody '[FormUrlEncoded] IncomingMessage
    :> Post '[JSON] () :<|>
  "api" :> "subscribe" :> Capture "email" Text :> Get '[JSON] ()

subscribeHandler :: (Member Database r) => Text -> Eff r ()
subscribeHandler email = registerUser email

Now if we wanted to write a function that would email everyone in our system, it’s not hard at all! We extend our effect types for both Email and Database. The Database function will retrieve all the subscribers in our system. Meanwhile the Email effect will send the specified email to the whole list.

data Database a where
  RegisterUser :: Text -> Database ()
  RetrieveSubscribers :: Database [Text]

data Email a where
  SendSubscribeEmail :: Text -> Email (Either String ())
  -- First parameter is (Subject line, Text content, HTML content)
  SendEmailToList
    :: (Text, ByteString, Maybe ByteString)
    -> [Text]
    -> Email (Either String ())
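
Before we combine them, here’s one minimal way the Database effect could be interpreted. This is only a sketch using an in-memory IORef; the real version would persist subscribers to an actual database table:

import Data.IORef (IORef, modifyIORef, readIORef)

runDatabase :: (Member IO r)
  => IORef [Text] -> Eff (Database ': r) a -> Eff r a
runDatabase ref = runNat databaseToIO
  where
    databaseToIO :: Database a -> IO a
    -- Registering prepends the email to our in-memory list
    databaseToIO (RegisterUser email) = modifyIORef ref (email :)
    -- Retrieving just reads the list back out
    databaseToIO RetrieveSubscribers = readIORef ref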

And combining these just requires using both effects:

emailList :: (Member Email r, Member Database r)
  => (Text, ByteString, Maybe ByteString) -> Eff r ()
emailList content = do
  list <- retrieveSubscribers
  void $ sendEmailToList content list

Notice the absence of any lift calls! This is one of the cool strengths of Eff.

Conclusion

As we’ve seen in this article, sending emails with Haskell isn’t too scary. The Hailgun API is quite intuitive when you break things down piece by piece and look at the types involved. This article brought together ideas from both compile driven development and the Eff framework. In particular, we can see in this series how convenient it is to separate our effects with Eff so that we aren’t doing a lot of messy lifts.

There’s a lot of advanced material in this article, so if you think you need to backtrack, don’t worry, we’ve got you covered! Our Haskell Web Skills Series will teach you how to use libraries like Persistent for database management and Servant for making an API. For some more libraries you can use to write enhanced Haskell, download our Production Checklist!

If you’ve never programmed in Haskell at all, you should try it out! Download our Haskell Beginner’s Checklist or read our Liftoff Series!

James Bowen

Sending Texts with Twilio and Haskell!

text_convos.jpg

Writing our own Haskell code using only simple libraries is fun. But we can’t do everything from scratch. There are all kinds of cool services out there to use so we don’t have to. We can interface with a lot of these by using APIs. Often, the most well supported APIs use languages like Python and Javascript. But adventurous Haskell developers have also developed bindings for these systems! So in the next few weeks, we’ll be exploring a couple of these. We’ll also see what we can do when there isn’t an out-of-the-box library for us to use.

This week, we’ll focus on the Twilio API. We’ll see how we can send SMS messages from our Haskell code using the twilio library. We’ll also write a simple server to use Twilio’s callback system to receive text messages and process them programmatically. You can follow along with the code here on the Github repository for this series.

Of course, none of this is useful if you’ve never written any Haskell before! If you want to get started with the language basics, download our Beginners Checklist. To learn more about advanced techniques and libraries, grab our Production Checklist!

Setting Up Our Account

Naturally, you’ll need a Twilio account to use the Twilio API. Once you have this set up, you need to add your first Twilio number. This will be the number you’ll send text messages to. You'll also see it as the sender for other messages in your system. You should also go through the process of verifying your own phone number. This will allow you to send and receive messages on that phone without “publishing” your app.

You also need a couple other pieces of information from your account: the account SID and the authentication token. You can find these on the dashboard for your project on the Twilio page. You’ll need these values in your code. But since you don’t want to put them into version control, you should save them as environment variables on your machine. Then when you need to, you can fetch them like so:

fetchSid :: IO String
fetchSid = getEnv "TWILIO_ACCOUNT_SID"

fetchToken :: IO String
fetchToken = getEnv "TWILIO_AUTH_TOKEN"

Sending a Message

The first thing we’ll want to do is use the API to actually send a text message. We perform Twilio actions within the Twilio monad. It’s rather straightforward to access this monad from IO. All we need is the runTwilio' function:

runTwilio' :: IO String -> IO String -> Twilio a -> IO a

The first two parameters to this function are IO actions to fetch the account SID and auth token. We've already written those. Then the final parameter of course is our Twilio action.

sendMessage :: IO ()
sendMessage = runTwilio' fetchSid fetchToken $ do
  ...

To compose a message, we’ll use the PostMessage constructor. This takes three parameters. First, the “to” number of our message. Fill this in with the number to your physical phone. Then the second parameter is the “from” number, which has to be our Twilio account’s phone number. Then the third parameter is the message itself. To send the message, all we have to do is use the post function! That’s all there is to it!

sendMessage :: IO ()
sendMessage = runTwilio' fetchSid fetchToken $ do
  let msg = PostMessage "+15551231234" "+15559879876" "Hello Twilio!"
  _ <- post msg
  return ()

And just like that, you’ve sent your first Twilio message! Note that it does cost a small amount of money to send messages over Twilio. But a trial account should give you enough free credit to experiment a little bit.

Receiving Messages

Now, it’s a little more complicated to deal with incoming messages. The first thing we need to do is create a webhook on our Twilio account. To do this, go to “Manage Numbers” from your project dashboard page. Then select your Twilio number. You’ll now want to scroll to the section called “Messaging” and then within that, find “A Message Comes In”. You want to select “Webhook” in the dropdown. Then you’ll need to specify a URL where your server is located, and select “HTTP Post”. For setting up a quick server, I use Heroku combined with this nifty build pack that works with Stack. I’ll go into that in more depth in a later article. But the main thing to see is that our endpoint is /api/sms.

twilio_dashboard.png

With this webhook set up, Twilio will send a post request to the endpoint every time a user texts our number. The request will contain the message and the number of the sender. So let’s set up a server using Servant to pick up that request.

We’ll start by specifying a simple type to encode the message we’ll receive from Twilio:

data IncomingMessage = IncomingMessage
  { fromNumber :: Text
  , body :: Text
  }

Twilio encodes its post request body as FormURLEncoded. In order for Servant to deserialize this, we’ll need to define an instance of the FromForm class for our type. This function takes in a hash map from keys to lists of values. It will return either an error string or our desired value.

instance FromForm IncomingMessage where
  fromForm :: Form -> Either Text IncomingMessage
  fromForm (Form form) = ...

So form is a hash map, and we want to look up the “From” number of the message as well as its body. Then as long as we find at least one result for each of these, we’ll return the message. Otherwise, we return an error.

instance FromForm IncomingMessage where
  fromForm :: Form -> Either Text IncomingMessage
  fromForm (Form form) = case lookupResults of
    Just ((fromNumber : _), (body : _)) ->
      Right $ IncomingMessage fromNumber body
    Just _ -> Left "Found the keys but no values"
    Nothing -> Left "Didn't find keys"
    where
      lookupResults = do
        fromNumber <- HashMap.lookup "From" form
        body <- HashMap.lookup "Body" form
        return (fromNumber, body)

Now that we have this instance, we can finally define our API endpoint! All it needs are the simple path components and the request body. For now, we won’t actually post any response.

type TwilioServerAPI = "api" :> "sms" :> 
  ReqBody '[FormUrlEncoded] IncomingMessage :> Post '[JSON] ()

Writing Our Handler

Now we want to write a handler for our endpoint. First though, we’ll write a natural transformation so we can write our handler in the Twilio monad.

transformToHandler :: Twilio :~> Handler
transformToHandler = NT $ \action -> 
  liftIO $ runTwilio' fetchSid fetchToken action

Now we’ll write a simple handler that will echo the user’s message back to them.

twilioNum :: Text
twilioNum = "+15559879876"

smsHandler :: IncomingMessage -> Twilio ()
smsHandler msg = do
  let newMessage = PostMessage (fromNumber msg) twilioNum (body msg)
  _ <- post newMessage
  return ()

And now we wrap up with some of the Servant mechanics to run our server.

twilioAPI :: Proxy TwilioServerAPI
twilioAPI = Proxy :: Proxy TwilioServerAPI

twilioServer :: Server TwilioServerAPI
twilioServer = enter transformToHandler smsHandler

runServer :: IO ()
runServer = do
  port <- read <$> getEnv "PORT"
  run port (serve twilioAPI twilioServer)

And now if we send a text message to our Twilio number, we’ll see that same message back as a reply!

Conclusion

In this article, we saw how we could use just a few simple lines of Haskell to send and receive text messages. There was a fair amount of effort required in using the Twilio tools themselves, but most of that is easy once you know where to look! Come back next week and we’ll explore how we can send emails with the Mailgun API. We’ll see how we can combine text and email for some pretty cool functionality.

An important thing making these apps easy is knowing the right tools to use! One of the tools we used in this part was the Servant web API library. To learn more about this, be sure to check out our Haskell Web Skills Series. For more ideas of web libraries to use, download our Production Checklist.

And if you’ve never written Haskell before, hopefully I’ve convinced you that it IS possible to do some cool things with the language! Download our Beginners Checklist to get started!

James Bowen

Megaparsec: Same Syntax, More Features!

megaparsec.png

Last week, we took a step into the monadic world of parsing by learning about the Attoparsec library. It provided us with a clearer syntax to work with compared to applicative parsing. This week, we’ll explore one final library: Megaparsec.

This library has a lot in common with Attoparsec. In fact, the two have a lot of compatibility by design. Ultimately, we’ll find that we don’t need to change our syntax a whole lot. But Megaparsec does have a few extra features that can make our lives simpler.

To follow the code examples here, head to the megaparsec branch on Github! To learn about more awesome libraries you can use in production, make sure to download our Production Checklist! But never fear if you’re new to Haskell! Just take a look at our Beginners checklist and you’ll know where to get started!

A Different Parser Type

To start out, the basic parsing type for Megaparsec is a little more complicated. It has two type parameters, e and s, and also comes with a built-in monad transformer ParsecT.

data ParsecT e s m a

type Parsec e s = ParsecT e s Identity

The e type allows us to provide some custom error data to our parser. The s type refers to the input type of our parser, typically some variant of String. This parameter also exists under the hood in Attoparsec. But we sidestepped that issue by using the Text module. For now, we’ll set up our own type alias that will sweep these parameters under the rug:

type MParser = Parsec Void Text

Trying our Hardest

Let’s start filling in our parsers. There’s one structural difference between Attoparsec and Megaparsec. When a parser fails in Attoparsec, its default behavior is to backtrack. This means it acts as though it consumed no input. This is not the case in Megaparsec! A naive attempt to repeat our nullParser code can fail in surprising ways:

nullParser :: MParser Value
nullParser = nullWordParser >> return ValueNull
  where
    nullWordParser = string "Null" <|> string "NULL" <|> string "null"

Suppose we get the input "NULL" for this parser. Our program will attempt to select the first parser, which will parse the N token. Then it will fail on U. It will move on to the second parser, but it will have already consumed the N! Thus the second and third parsers will both fail as well!

We get around this issue by using the try combinator. Using try gives us the Attoparsec behavior of backtracking if our parser fails. The following will work without issue:

nullParser :: MParser Value
nullParser = nullWordParser >> return ValueNull
  where
    nullWordParser = 
      try (string "Null") <|> 
      try (string "NULL") <|> 
      try (string "null")

Even better, Megaparsec also has a convenience function string' for case insensitive parsing. So our null and boolean parsers become even simpler:

nullParser :: MParser Value
nullParser = M.string' "null" >> return ValueNull

boolParser :: MParser Value
boolParser = 
  (trueParser >> return (ValueBool True)) <|> 
  (falseParser >> return (ValueBool False))
    where
      trueParser = M.string' "true"
      falseParser = M.string' "false"

Unlike Attoparsec, we don’t have a convenient parser for scientific numbers. We’ll have to go back to our logic from applicative parsing, only this time with monadic syntax.

numberParser :: MParser Value
numberParser = (ValueNumber . read) <$>
  (negativeParser <|> decimalParser <|> integerParser)
  where
    integerParser :: MParser String
    integerParser = M.try (some M.digitChar)

    decimalParser :: MParser String
    decimalParser = M.try $ do
      front <- many M.digitChar
      M.char '.'
      back <- some M.digitChar
      return $ front ++ ('.' : back)

    negativeParser :: MParser String
    negativeParser = M.try $ do
      M.char '-'
      num <- decimalParser <|> integerParser
      return $ '-' : num

Notice that each of our first two parsers use try to allow proper backtracking. For parsing strings, we’ll use the satisfy combinator to read everything up until a bar or newline:

stringParser :: MParser Value
stringParser = (ValueString . trim) <$>
  many (M.satisfy (not . barOrNewline))
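
As a reminder, barOrNewline is the same small predicate we’ve used throughout this series:

barOrNewline :: Char -> Bool
barOrNewline c = c == '|' || c == '\n'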

And then filling in our value parser is as easy as it was before:

valueParser :: MParser Value
valueParser =
  nullParser <|>
  boolParser <|>
  numberParser <|>
  stringParser

Filling in the Details

Aside from some trivial alterations, nothing changes about how we parse example tables. The Statement parser requires adding in another try call when we’re grabbing our pairs:

parseStatementLine :: Text -> MParser Statement
parseStatementLine signal = do
  M.string signal
  M.char ' '
  pairs <- many $ M.try ((,) <$> nonBrackets <*> insideBrackets)
  finalString <- nonBrackets
  let (fullString, keys) = buildStatement pairs finalString
  return $ Statement fullString keys
  where
    buildStatement  = ...

Without the try, we would fail on any statement that doesn’t use any keywords! But it's otherwise the same. Of course, we also need to change how we call our parser in the first place. We'll use the runParser function instead of Attoparsec’s parseOnly. This takes an extra argument naming the source file of our parser to provide better messages.

parseFeatureFromFile :: FilePath -> IO Feature
parseFeatureFromFile inputFile = do
  …
  case runParser featureParser inputFile finalString of
    Left s -> error (show s)
    Right feature -> return feature

But nothing else changes in the structure of our parsers. It's very easy to take Attoparsec code and Megaparsec code and re-use it with the other library!

Adding some State

One bonus we do get from Megaparsec is that its monad transformer makes it easier for us to use other monadic functionality. Our parser for statement lines has always been a little bit clunky. Let’s clean it up a little bit by allowing ourselves to store a list of strings as a state object. Here’s how we’ll change our parser type:

type MParser = ParsecT Void Text (State [String])

Now whenever we parse a key using our brackets parser, we can append that key to our existing list using modify. We’ll also return the brackets along with the string instead of merely the keyword:

insideBrackets :: MParser String
insideBrackets = do
  M.char '<'
  key <- many M.letterChar
  M.char '>'
  modify (++ [key]) -- Store the key in the state!
  return $ ('<' : key) ++ ['>']

Now instead of forming tuples, we can concatenate the strings we parse!

parseStatementLine :: Text -> MParser Statement
parseStatementLine signal = do
  M.string signal
  M.char ' '
  pairs <- many $ M.try ((++) <$> nonBrackets <*> insideBrackets)
  finalString <- nonBrackets
  let fullString = concat pairs ++ finalString
  …

And now how do we get our final list of keys? Simple! We get our state value, reset it, and return everything. No need for our messy buildStatement function!

parseStatementLine :: Text -> MParser Statement
parseStatementLine signal = do
  M.string signal
  M.char ' '
  pairs <- many $ M.try ((++) <$> nonBrackets <*> insideBrackets)
  finalString <- nonBrackets
  let fullString = concat pairs ++ finalString
  keys <- get
  put []
  return $ Statement fullString keys

When we run this parser at the start, we now have to use runParserT instead of runParser. This returns us an action in the State monad, meaning we have to use evalState to get our final result:

parseFeatureFromFile :: FilePath -> IO Feature
parseFeatureFromFile inputFile = do
  …
  case evalState (stateAction finalString) [] of
    Left s -> error (show s)
    Right feature -> return feature
  where
    stateAction s = runParserT featureParser inputFile s

Bonuses of Megaparsec

As a last bonus, let's look at error messages in Megaparsec. When we have errors in Attoparsec, the parseOnly function gives us an error string. But it’s not that helpful. All it tells us is what individual parser on the inside of our system failed:

>> parseOnly nullParser "true"
Left "string"
>> parseOnly numberParser "hello"
Left "Failed reading: takeWhile1"

These messages don’t tell us where within the input it failed, or what we expected instead. Let’s compare this to Megaparsec and runParser:

>> runParser nullParser "true" ""
Left (TrivialError 
  (SourcePos {sourceName = "true", sourceLine = Pos 1, sourceColumn = Pos 1} :| []) 
  (Just EndOfInput) 
  (fromList [Tokens ('n' :| "ull")]))
>> runParser numberParser "hello" ""
Left (TrivialError 
  (SourcePos {sourceName = "hello", sourceLine = Pos 1, sourceColumn = Pos 1} :| []) 
    (Just EndOfInput) 
    (fromList [Tokens ('-' :| ""),Tokens ('.' :| ""),Label ('d' :| "igit")]))

This gives us a lot more information! We can see the string we’re trying to parse. We can also see the exact position it fails at. It’ll even give us a picture of what parsers it was trying to use. In a larger system, this makes a big difference. We can track down where we’ve gone wrong either in developing our syntax, or conforming our input to meet the syntax. If we customize the e parameter type, we can even add our own details into the error message to help even more!
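
Here’s a sketch of what that could look like. This isn’t from the series’ code; it just illustrates the mechanism with a hypothetical CustomError type:

data CustomError = KeywordError String
  deriving (Eq, Ord, Show)

instance ShowErrorComponent CustomError where
  -- Tells Megaparsec how to render our error in its messages
  showErrorComponent (KeywordError kw) =
    "Encountered an unexpected keyword: " ++ kw

-- We'd then use CustomError in place of Void:
type MParser = Parsec CustomError Text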

Conclusion

This wraps up our exploration of parsing libraries in Haskell! In the past few weeks, we’ve learned about Applicative parsing, Attoparsec, and Megaparsec. The first provides useful and intuitive combinators for when our language is regular. It allows us to avoid using a monad for parsing and the baggage that might bring. With Attoparsec, we saw an introduction to monadic style parsing. This provided us with a syntax that was easier to understand and where we could see what was happening. Finally, this week, we explored Megaparsec. This library has a lot in common syntactically with Attoparsec. But it provides a few more bells and whistles that can make many tasks easier.

Ready to explore some more areas of Haskell development? Want to get some ideas for new libraries to learn? Download our Production Checklist! It’ll give you a quick summary of some tools in areas ranging from data structures to web APIs!

Never programmed in Haskell before? Want to get started? Check out our Beginners Checklist! It has all the tools you need to start your Haskell journey!

James Bowen

Attoparsec: The Clarity of Do-Syntax

attoparsec.jpg

In last week’s article we completed our look at the Applicative Parsing library. We took all our smaller combinators and put them together to parse our Gherkin syntax. This week, we’ll look at a new library: Attoparsec. Instead of trying to do everything using a purely applicative structure, this library uses a monadic approach. This approach is much more common. It results in syntax that is simpler to read and understand. It will also make it easier for us to add certain features.

To follow along with the code for this article, take a look at the attoparsec branch on Github! For some more excellent ideas about useful libraries, download our Production Checklist! It includes material on libraries for everything from data structures to machine learning!

If you’re new to Haskell, make sure you download our Beginner’s Checklist! It’ll tell you about all the steps you need to take to get started on your Haskell journey!

The Parser Type

In applicative parsing, all our parsers had the type RE Char. This type belonged to the Applicative typeclass but was not a Monad. For Attoparsec, we’ll instead be using the Parser type, a full monad. So in general we’ll be writing parsers with the following types:

featureParser :: Parser Feature
scenarioParser :: Parser Scenario
statementParser :: Parser Statement
exampleTableParser :: Parser ExampleTable
valueParser :: Parser Value

Parsing Values

The first thing we should realize though is that our parser is still an Applicative! So not everything needs to change! We can still make use of operators like *> and <|>. In fact, we can leave our value parsing code almost exactly the same! For instance, the valueParser, nullParser, and boolParser expressions can remain the same:

valueParser :: Parser Value
valueParser =
  nullParser <|>
  boolParser <|>
  numberParser <|>
  stringParser

nullParser :: Parser Value
nullParser =
  (string "null" <|>
  string "NULL" <|>
  string "Null") *> pure ValueNull

boolParser :: Parser Value
boolParser = (trueParser *> pure (ValueBool True)) <|> (falseParser *> pure (ValueBool False))
  where
    trueParser = string "True" <|> string "true" <|> string "TRUE"
    falseParser = string "False" <|> string "false" <|> string "FALSE"

If we wanted, we could make these more "monadic" without changing their structure. For instance, we can use return instead of pure (since they are identical). We can also use >> instead of *> to perform monadic actions while discarding a result. Here’s nullParser rewritten in that style (a purely cosmetic change; the behavior is identical):
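
nullParser :: Parser Value
nullParser =
  -- Same behavior as before: consume one of the null keywords, discard it
  (string "null" <|> string "NULL" <|> string "Null") >> return ValueNull

Our value parser for numbers changes a bit, but it gets simpler! The authors of Attoparsec provide a convenient parser for reading scientific numbers: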

numberParser :: Parser Value
numberParser = ValueNumber <$> scientific

Then for string values, we’ll use the takeTill combinator to read all the characters until a vertical bar or newline. Then we’ll apply a few text functions to remove the whitespace and get it back to a String. (The Parser monad we’re using parses things as Text rather than String).

stringParser :: Parser Value
stringParser = (ValueString . unpack . strip) <$> 
  takeTill (\c -> c == '|' || c == '\n')

Parsing Examples

As we parse the example table, we’ll switch to a more monadic approach by using do-syntax. First, we establish a cellParser that will read a value within a cell.

cellParser = do
  skipWhile nonNewlineSpace
  val <- valueParser
  skipWhile (not . barOrNewline)
  char '|'
  return val

Each line in our statement refers to a step of the parsing process. So first we skip all the leading whitespace. Then we parse our value. Then we skip the remaining space, and parse the final vertical bar to end the cell. Then we’ll return the value we parsed.

It’s a lot easier to keep track of what’s going on here compared to applicative syntax. It’s not hard to see which parts of the input we discard and which we use. If we don’t assign the value with <- within do-syntax, we discard the value. If we retrieve it, we’ll use it. To complete the exampleLineParser, we parse the initial bar, get many values, close out the line, and then return them:

exampleLineParser :: Parser [Value]
exampleLineParser = do
  char '|'
  cells <- many cellParser
  char '\n'
  return cells
  where
    cellParser = ...

Reading the keys for the table is almost identical. All that changes is that our cellParser uses many letter instead of valueParser. Sketched out from that description, it would look something like this (reusing the nonNewlineSpace and barOrNewline helpers from above):
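
exampleColumnTitleLineParser :: Parser [String]
exampleColumnTitleLineParser = do
  char '|'
  keys <- many cellParser
  char '\n'
  return keys
  where
    -- Just like the value cell parser, but reading raw letters for the key
    cellParser = do
      skipWhile nonNewlineSpace
      key <- many letter
      skipWhile (not . barOrNewline)
      char '|'
      return key

So now we can put these pieces together for our exampleTableParser: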

exampleTableParser :: Parser ExampleTable
exampleTableParser = do
  string "Examples:"
  consumeLine
  keys <- exampleColumnTitleLineParser
  valueLists <- many exampleLineParser
  return $ ExampleTable keys (map (zip keys) valueLists)

We read the signal string "Examples:", followed by consuming the line. Then we get our keys and values, and build the table with them. Again, this is much simpler than mapping a function like buildExampleTable like in applicative syntax.

Statements

The Statement parser is another area where we can improve the clarity of our code. Once again, we’ll define two helper parsers. These will fetch the portions outside brackets and then inside brackets, respectively:

nonBrackets :: Parser String
nonBrackets = many (satisfy (\c -> c /= '\n' && c /= '<'))

insideBrackets :: Parser String
insideBrackets = do
  char '<'
  key <- many letter
  char '>'
  return key

Now when we put these together, we can more clearly see the steps of the process outlined in do-syntax. First we parse the “signal” word, then a space. Then we get the “pairs” of non-bracketed and bracketed portions. Finally, we’ll get one last non-bracketed part:

parseStatementLine :: Text -> Parser Statement
parseStatementLine signal = do
  string signal
  char ' '
  pairs <- many ((,) <$> nonBrackets <*> insideBrackets)
  finalString <- nonBrackets
  ...

Now we can define our helper function buildStatement and call it on its own line in do-syntax. Then we’ll return the resulting Statement. This is much easier to read than tracking which functions we map over which sections of the parser:

parseStatementLine :: Text -> Parser Statement
parseStatementLine signal = do
  string signal
  char ' '
  pairs <- many ((,) <$> nonBrackets <*> insideBrackets)
  finalString <- nonBrackets
  let (fullString, keys) = buildStatement pairs finalString
  return $ Statement fullString keys
  where
    buildStatement 
      :: [(String, String)] -> String -> (String, [String])
    buildStatement [] last = (last, [])
    buildStatement ((str, key) : rest) rem =
      let (str', keys) = buildStatement rest rem
      in (str <> "<" <> key <> ">" <> str', key : keys)

Scenarios and Features

As with applicative parsing, it’s now straightforward for us to finish everything off. To parse a scenario, we read the keyword, consume the line to read the title, and read the statements and examples:

scenarioParser :: Parser Scenario
scenarioParser = do
  string "Scenario: "
  title <- consumeLine
  statements <- many (parseStatement <* char '\n')
  examples <- (exampleTableParser <|> return (ExampleTable [] []))
  return $ Scenario title statements examples

Again, we provide an empty ExampleTable as an alternative if there are no examples. The parser for Background looks very similar. The only difference is we ignore the result of the line and instead use Background as the title string.

backgroundParser :: Parser Scenario
backgroundParser = do
  string "Background:"
  consumeLine
  statements <- many (parseStatement <* char '\n')
  examples <- (exampleTableParser <|> return (ExampleTable [] []))
  return $ Scenario "Background" statements examples

Finally, we’ll put all this together as a feature. We read the title, get the background if it exists, and read our scenarios:

featureParser :: Parser Feature
featureParser = do
  string "Feature: "
  title <- consumeLine
  maybeBackground <- optional backgroundParser
  scenarios <- many scenarioParser
  return $ Feature title maybeBackground scenarios

Feature Description

One extra feature we’ll add now is that we can more easily parse the “description” of a feature. We omitted descriptions in applicative parsing, as they’re a real pain to implement there. They become much simpler with a monadic approach. The first step we have to take though is to make one parser for all the main elements of our feature. This approach looks like this:

featureParser :: Parser Feature
featureParser = do
  string "Feature: "
  title <- consumeLine
  (description, maybeBackground, scenarios) <- parseRestOfFeature
  return $ Feature title description maybeBackground scenarios

parseRestOfFeature :: Parser ([String], Maybe Scenario, [Scenario])
parseRestOfFeature = ...

Now we’ll use a recursive function that reads one line of the description at a time and adds to a growing list. The trick is that we’ll use the choice combinator offered by Attoparsec.

We’ll create two parsers. The first assumes there are no further lines of description. It attempts to parse the background and scenario list. The second reads a line of description, adds it to our growing list, and recurses:

parseRestOfFeature :: Parser ([String], Maybe Scenario, [Scenario])
parseRestOfFeature = parseRestOfFeatureTail []
  where
    parseRestOfFeatureTail prevDesc = do
      (fullDesc, maybeBG, scenarios) <- choice [noDescriptionLine prevDesc, descriptionLine prevDesc]
      return (fullDesc, maybeBG, scenarios)

So we’ll first try to run this noDescriptionLine parser. It will try to read the background and then the scenarios as we’ve always done. If it succeeds, we know we’re done. The argument we passed in is the full description:

where
  noDescriptionLine prevDesc = do
    maybeBackground <- optional backgroundParser
    scenarios <- some scenarioParser
    return (prevDesc, maybeBackground, scenarios)

Now if this parser fails, we know that it means the next line is actually part of the description. So we’ll write a parser to consume a full line, and then recurse:

descriptionLine prevDesc = do
  nextLine <- consumeLine
  parseRestOfFeatureTail (prevDesc ++ [nextLine])

And now we’re done! We can parse descriptions!
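To try out the finished parser, here's a minimal sketch of a runner function. It assumes our featureParser from above and uses Attoparsec's parseOnly on the file contents; the function name is our own:

import Data.Attoparsec.Text (parseOnly)
import qualified Data.Text.IO as TIO

-- Run the full feature parser on a file's contents. We get back
-- a Left with an error message if parsing fails.
parseFeatureFile :: FilePath -> IO (Either String Feature)
parseFeatureFile path = parseOnly featureParser <$> TIO.readFile path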

Conclusion

That wraps up our exploration of Attoparsec. Come back next week where we’ll finish this series off by learning about Megaparsec. We’ll find that it’s syntactically very similar to Attoparsec with a few small exceptions. We’ll see how we can use some of the added power of monadic parsing to enrich our syntax.

To learn more about cool Haskell libraries, be sure to check out our Production Checklist! It’ll tell you a little bit about libraries in all kinds of areas like databases and web APIs.

If you’ve never written Haskell at all, download our Beginner’s Checklist! It’ll give you all the resources you need to get started on your Haskell journey!

James Bowen

Applicative Parsing II: Putting the Pieces Together


In last week’s article, we introduced the Applicative parsing library. We learned about the RE type and the basic combinators like sym and string. We saw how we could combine those together with applicative functions like many and <*> to parse strings into data structures. This week, we’ll put these pieces together in an actual parser for our Gherkin syntax. To follow along with the code examples, check out Parser.hs on the Github repository.

Starting next week, we’ll explore some other parsing libraries, starting with Attoparsec. For a little more information about those and many other libraries, download our Production Checklist! It summarizes many libraries on topics from databases to Web APIs.

If you’ve never written Haskell at all, get started! Download our free Beginners Checklist!

Value Parser

In keeping with our approach from the last article, we’re going to start with smaller elements of our syntax. Then we can use these to build larger ones with ease. To that end, let’s build a parser for our Value type, the most basic data structure in our syntax. Let’s recall what that looks like:

data Value =
  ValueNull |
  ValueBool Bool |
  ValueString String |
  ValueNumber Scientific

Since we have different constructors, we’ll make a parser for each one. Then we can combine them with alternative syntax:

valueParser :: RE Char Value
valueParser =
  nullParser <|>
  boolParser <|>
  numberParser <|>
  stringParser

Now our parsers for the null values and boolean values are easy. For each of them, we’ll give a few different options about what strings we can use to represent those elements. Then, as with the larger parser, we’ll combine them with <|>.

nullParser :: RE Char Value
nullParser =
  (string "null" <|>
  string "NULL" <|>
  string "Null") *> pure ValueNull

boolParser :: RE Char Value
boolParser =
  trueParser *> pure (ValueBool True) <|> 
  falseParser *> pure (ValueBool False)
  where
    trueParser = string "True" <|> string "true" <|> string "TRUE"
    falseParser = string "False" <|> string "false" <|> string "FALSE"

Notice in both these cases we discard the actual string with *> and then return our constructor. We have to wrap the desired result with pure.

Number and String Values

Numbers and strings are a little more complicated since we can’t rely on hard-coded formats. In the case of numbers, we’ll account for integers, decimals, and negative numbers. We'll ignore scientific notation for now. An integer is simple to parse, since we’ll have many characters that are all numbers. We use some instead of many to enforce that there is at least one:

numberParser :: RE Char Value
numberParser = ...
  where
    integerParser = some (psym isNumber)

A decimal parser will read some numbers, then a decimal point, and then more numbers. We'll insist there is at least one number after the decimal point.

numberParser :: RE Char Value
numberParser = ...
  where
    integerParser = some (psym isNumber)
    decimalParser =
      many (psym isNumber) <*> sym '.' <*> some (psym isNumber)

Finally, for negative numbers, we’ll read a negative symbol and then one of the other parsers:

numberParser :: RE Char Value
numberParser = ...
  where
    integerParser = some (psym isNumber)
    decimalParser =
      many (psym isNumber) <*> sym '.' <*> some (psym isNumber)
    negativeParser = sym '-' <*> (decimalParser <|> integerParser)

However, we can’t combine these parsers as is! Right now, they all return different results! The integer parser returns a single string. The decimal parser returns two strings and the decimal character, and so on. In general, we’ll want to combine each parser's results into a single string and then pass them to the read function. This requires mapping a couple functions over our last two parsers:

numberParser :: RE Char Value
numberParser = ...
  where
    integerParser = some (psym isNumber)
    decimalParser = combineDecimal <$>
      many (psym isNumber) <*> sym '.' <*> some (psym isNumber)
    negativeParser = (:) <$>
      sym '-' <*> (decimalParser <|> integerParser)

    combineDecimal :: String -> Char -> String -> String
    combineDecimal base point decimal = base ++ (point : decimal)

Now all our number parsers return strings, so we can safely combine them. We'll map the ValueNumber constructor over the value we read from the string.

numberParser :: RE Char Value
numberParser = (ValueNumber . read) <$>
  (negativeParser <|> decimalParser <|> integerParser)
  where
    ...

Note that order matters! If we put the integer parser first, we’ll be in trouble! If we encounter a decimal, the integer parser will greedily succeed and parse everything before the decimal point. We'll either lose all the information after the decimal, or worse, have a parse failure.

The last thing we need to do is read a string. We need to read everything in the example cell until we hit a vertical bar, but then ignore any whitespace. Luckily, we have the right combinator for this, and we’ve even written a trim function already!

stringParser :: RE Char Value
stringParser = (ValueString . trim) <$> readUntilBar

And now our valueParser will work as expected!
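As a quick sanity check, here's roughly what we'd expect in GHCi (assuming the parsers above; the exact Show output for Scientific values may differ slightly):

>> match valueParser "null"
Just ValueNull
>> match valueParser "TRUE"
Just (ValueBool True)
>> match valueParser "-3.14"
Just (ValueNumber (-3.14))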

Building an Example Table

Now that we can parse individual values, let’s figure out how to parse the full example table. We can use our individual value parser to parse a whole line of values! The first step is to read the vertical bar at the start of the line.

exampleLineParser :: RE Char [Value]
exampleLineParser = sym '|' *> ...

Next, we’ll build a parser for each cell. It will read the whitespace, then the value, and then read up through the next bar.

exampleLineParser :: RE Char [Value]
exampleLineParser = sym '|' *> ...
  where
    cellParser = 
      many isNonNewlineSpace *> valueParser <* readThroughBar

isNonNewlineSpace :: RE Char Char
isNonNewlineSpace = psym (\c -> isSpace c && c /= '\n')

Now we read many of these and finish by reading the newline:

exampleLineParser :: RE Char [Value]
exampleLineParser = 
  sym '|' *> many cellParser <* readThroughEndOfLine
  where
    cellParser = 
      many isNonNewlineSpace *> valueParser <* readThroughBar

Now, we need a similar parser that reads the title column of our examples. This will have the same structure as the value cells, only it will read normal alphabetic strings instead of values.

exampleColumnTitleLineParser :: RE Char [String]
exampleColumnTitleLineParser = sym '|' *> many cellParser <* readThroughEndOfLine
  where
    cellParser = 
      many isNonNewlineSpace *> many (psym isAlpha) <* readThroughBar

Now we can start building the full example parser. We’ll want to read the string, the column titles, and then the value lines.

exampleTableParser :: RE Char ExampleTable
exampleTableParser =
  (string "Examples:" *> readThroughEndOfLine) *>
  exampleColumnTitleLineParser <*>
  many exampleLineParser

We’re not quite done yet. We’ll need to apply a function over these results to produce the final ExampleTable. The trick is that we want to match up the example keys with their values. We can accomplish this with a simple function that zips the keys with each value list using map:

exampleTableParser :: RE Char ExampleTable
exampleTableParser = buildExampleTable <$>
  (string "Examples:" *> readThroughEndOfLine *>
    exampleColumnTitleLineParser) <*>
  many exampleLineParser
  where
    buildExampleTable :: [String] -> [[Value]] -> ExampleTable
    buildExampleTable keys valueLists = ExampleTable keys (map (zip keys) valueLists)

Statements

Now that we can parse the examples for a given scenario, we need to parse the Gherkin statements. To start with, let’s make a generic parser that takes the keyword as an argument. Then our full parser will try each of the different statement keywords:

parseStatementLine :: String -> RE Char Statement
parseStatementLine signal = …

parseStatement :: RE Char Statement
parseStatement =
  parseStatementLine "Given" <|>
  parseStatementLine "When" <|>
  parseStatementLine "Then" <|>
  parseStatementLine "And"

Now we’ll get the signal word out of the way and parse the statement line itself.

parseStatementLine :: String -> RE Char Statement
parseStatementLine signal = string signal *> sym ' ' *> ...

Parsing the statement is tricky. We want to parse the keys inside brackets and separate them as keys. But we also want them as part of the statement’s string. To that end, we’ll make two helper parsers. First, nonBrackets will parse everything in a string up to a bracket or newline, without consuming it.

nonBrackets :: RE Char String
nonBrackets = many (psym (\c -> c /= '\n' && c /= '<'))

We’ll also want a parser that parses the brackets and returns the keyword inside:

insideBrackets :: RE Char String
insideBrackets = sym '<' *> many (psym (/= '>')) <* sym '>'

Now to read a statement, we start with non-brackets and alternate with keys in brackets. Observe that we can treat the line as starting and ending with non-brackets, since they can be empty. Thus we can represent a line as a list of non-bracket/bracket pairs, followed by a last non-bracket part. To make a pair, we combine the parser results in a tuple using the (,) constructor:

parseStatementLine :: String -> RE Char Statement
parseStatementLine signal = string signal *> sym ' ' *>
  many ((,) <$> nonBrackets <*> insideBrackets) <*> nonBrackets

From here, we need a recursive function that will build up our final statement string and the list of keys. We do this with buildStatement.

parseStatementLine :: String -> RE Char Statement
parseStatementLine signal = string signal *> sym ' ' *>
  (buildStatement <$> 
    many ((,) <$> nonBrackets <*> insideBrackets) <*> nonBrackets)
  where
    buildStatement :: 
      [(String, String)] -> String -> (String, [String])
    buildStatement [] last = (last, [])
    buildStatement ((str, key) : rest) rem =
      let (str', keys) = buildStatement rest rem
      in (str <> "<" <> key <> ">" <> str', key : keys)

The last thing we need is a final helper that will take the result of buildStatement and turn it into a Statement. We’ll call this finalizeStatement, and then we’re done!

parseStatementLine :: String -> RE Char Statement
parseStatementLine signal = string signal *> sym ' ' *>
  (finalizeStatement <$>
    (buildStatement <$>
      many ((,) <$> nonBrackets <*> insideBrackets) <*> nonBrackets))
  where
    buildStatement :: 
      [(String, String)] -> String -> (String, [String])
    buildStatement [] last = (last, [])
    buildStatement ((str, key) : rest) rem =
      let (str', keys) = buildStatement rest rem
      in (str <> "<" <> key <> ">" <> str', key : keys)

    finalizeStatement :: (String, [String]) -> Statement
    finalizeStatement (regex, variables) = Statement regex variables

Scenarios

Now that we have all our pieces in place, it’s quite easy to write the parser for scenario! First we get the title by reading the keyword and then the rest of the line:

scenarioParser :: RE Char Scenario
scenarioParser = string "Scenario: " *> readThroughEndOfLine ...

After that, we read many statements, and then the example table. Since the example table might not exist, we’ll provide an alternative that is a pure, empty table. We can wrap everything together by mapping the Scenario constructor over it.

scenarioParser :: RE Char Scenario
scenarioParser = Scenario <$>
  (string "Scenario: " *> readThroughEndOfLine) <*>
  many (parseStatement <* sym '\n') <*>
  (exampleTableParser <|> pure (ExampleTable [] []))

We can also make a “Background” parser that is very similar. All that changes is that we read the string “Background” instead of a title. Since we’ll hard-code the title as “Background”, we can include it with the constructor and map it over the parser.

backgroundParser :: RE Char Scenario
backgroundParser = Scenario "Background" <$>
  (string "Background:" *> readThroughEndOfLine *>
    many (parseStatement <* sym '\n')) <*>
  (exampleTableParser <|> pure (ExampleTable [] []))

Finally the Feature

We’re almost done! All we have left is to write the featureParser itself! As with scenarios, we’ll start with the keyword and a title line:

featureParser :: RE Char Feature
featureParser = Feature <$>
  (string "Feature: " *> readThroughEndOfLine) <*>
  ...

Now we’ll use the optional combinator to parse the Background if it exists, but return Nothing if it doesn’t. Then we’ll wrap up with parsing many scenarios!

featureParser :: RE Char Feature
featureParser = Feature <$>
  (string "Feature: " *> readThroughEndOfLine) <*>
  (optional backgroundParser) <*>
  (many scenarioParser)

Note that here we’re ignoring the “description” of a feature we proposed as part of our original syntax. Since there are no keywords for that, it turns out to be painful to deal with it using applicative parsing. When we look at monadic approaches starting next week, we’ll see it isn’t as hard there.

Conclusion

This wraps up our exploration of applicative parsing. We can see how well suited Haskell is for parsing. The functional nature of the language means it's easy to start with small building blocks like our first parsers. Then we can gradually combine them to make something larger. It can be a little tricky to wrap our heads around all the different operators and combinators. But once you understand the ways in which these let us combine our parsers, they make a lot of sense and are easy to use.

To further your knowledge of useful Haskell libraries, download our free Production Checklist! It will tell you about libraries for many tasks, from databases to machine learning!

If you’ve never written a line of Haskell before, never fear! Download our Beginners Checklist to learn more!

James Bowen

Applicative Parsing I: Building the Foundation


Last week we prepared ourselves for parsing by going over the basics of the Gherkin Syntax. In this article and the next, we’ll be using the applicative parsing library to parse that syntax. This week, we’ll focus on the fundamentals of this library, and build up a vocabulary of combinators to use. We'll make heavy use of the Applicative typeclass. If you need a refresher on that, check out this article. As we start coding, you can also follow along with the examples on Github here! Most of the code here is in Parser.hs.

In the coming weeks, we’ll be seeing a couple other parsing libraries as well. If you want to get some ideas about these and more, download our Production Checklist. It summarizes many other useful libraries for writing higher level Haskell.

If you’ve never started writing Haskell, now’s your chance! Get our free Beginner’s Checklist and learn the basics of getting started!

Getting Started

So to start parsing, let’s make some notes about our input format. First, we’ll treat our input feature document as a single string. We’ll remove all empty lines, and then trim leading and trailing whitespace from each line.

parseFeatureFromFile :: FilePath -> IO Feature
parseFeatureFromFile inputFile = do
  fileContents <- lines <$> readFile inputFile
  let nonEmptyLines = filter (not . isEmpty) fileContents
  let trimmedLines = map trim nonEmptyLines
  let finalString = unlines trimmedLines
  case parseFeature finalString of
    ...

…
isEmpty :: String -> Bool
isEmpty = all isSpace

trim :: String -> String
trim input = reverse flippedTrimmed
  where
    trimStart = dropWhile isSpace input
    flipped = reverse trimStart
    flippedTrimmed = dropWhile isSpace flipped

This means a few things for our syntax. First, we don’t care about indentation. Second, we ignore extra lines. This means our parsers might allow certain formats we don’t want. But that’s OK because we’re trying to keep things simple.

The RE Type

With applicative based parsing, the main data type we’ll be working with is called RE, for regular expression. This represents a parser, and it’s parameterized by two types:

data RE s a = ...

The s type refers to the fundamental unit we’ll be parsing. Since we're parsing our input as a single String, this will be Char. Then the a type is the result of the parsing element. This varies from parser to parser. The most basic combinator we can use is sym. This parses a single symbol of your choosing:

sym :: s -> RE s s

parseLowercaseA :: RE Char Char
parseLowercaseA = sym 'a'

To use an RE parser, we call the match function or its infix equivalent =~. These will return a Just value if we can match the entire input string, and Nothing otherwise:

>> match parseLowercaseA "a"
Just 'a'
>> "b" =~ parseLowercaseA
Nothing
>> "ab" =~ parseLowercaseA
Nothing -- (Needs to parse entire input)

Predicates and Strings

Naturally, we’ll want some more complicated functionality. Instead of parsing a single input character, we can parse any character that fits a particular predicate by using psym. So if we want to read any character that was not a newline, we could do:

parseNonNewline :: RE Char Char
parseNonNewline = psym (/= '\n')

The string combinator allows us to match a particular full string and then return it:

readFeatureWord :: RE Char String
readFeatureWord = string "Feature"

We’ll use this for parsing keywords, though we’ll often end up discarding the “result”.
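For instance, if all we care about is that the keyword is present, one way this might look is:

>> match (string "Feature" *> pure ()) "Feature"
Just ()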

Applicative Combinators

Now the RE type is applicative. This means we can apply all kinds of applicative combinators over it. One of these is many, which allows us to apply a single parser several times. Here is one parser, built with many, that we’ll use a lot. It allows us to read everything up until a newline and return the resulting string:

readUntilEndOfLine :: RE Char String
readUntilEndOfLine = many (psym (/= '\n'))

Beyond this, we’ll want to make use of the applicative <*> operator to combine different parsers. We can also apply a pure function (or constructor) on top of those by using <$>. Suppose we have a data type that stores two characters. Here’s how we can build a parser for it:

data TwoChars = TwoChars Char Char

parseTwoChars :: RE Char TwoChars
parseTwoChars = TwoChars <$> parseNonNewline <*> parseNonNewline

...

>> match parseTwoChars "ab"
Just (TwoChars 'a' 'b')

We can also use <* and *>, which are cousins of the main applicative operator. The first one will parse but then ignore the right hand parse result. The second discards the left side result.

parseFirst :: RE Char Char
parseFirst = parseNonNewline <* parseNonNewline

parseSecond :: RE Char Char
parseSecond = parseNonNewline *> parseNonNewline

…

>> match parseFirst "ab"
Just 'a'
>> match parseSecond "ab"
Just 'b'
>> match parseFirst "a"
Nothing

Notice the last one fails because the parser needs to have both inputs! We’ll come back to this idea of failure in a second. But now that we know this technique, we can write a couple other useful parsers:

readThroughEndOfLine :: RE Char String
readThroughEndOfLine = readUntilEndOfLine <* sym '\n'

readThroughBar :: RE Char String
readThroughBar = readUntilBar <* sym '|'

readUntilBar :: RE Char String
readUntilBar = many (psym (\c -> c /= '|' && c /= '\n'))

The first will parse the rest of the line and then consume the newline character itself. The other parsers accomplish this same task, except with the vertical bar character. We’ll need these when we parse the Examples section next week.
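To see these in action:

>> match readThroughEndOfLine "Some line\n"
Just "Some line"
>> match readThroughBar "Some cell|"
Just "Some cell"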

Alternatives: Dealing with Parse Failure

We introduced the notion of a parser “failing” up above. Of course, we need to be able to offer alternatives when a parser fails! Otherwise our language will be very limited in its structure. Luckily, the RE type also implements Alternative. This means we can use the <|> operator to determine an alternative parser when one fails. Let’s see this in action:

parseFeatureTitle :: RE Char String
parseFeatureTitle = string "Feature: " *> readThroughEndOfLine

parseScenarioTitle :: RE Char String
parseScenarioTitle = string "Scenario: " *> readThroughEndOfLine

parseEither :: RE Char String
parseEither = parseFeatureTitle <|> parseScenarioTitle

…

>> match parseFeatureTitle "Feature: Login\n"
Just "Login"
>> match parseFeatureTitle "Scenario: Login\n"
Nothing
>> match parseEither "Scenario: Login\n"
Just "Login"

Of course, if ALL the options fail, then we’ll still have a failing parser!

>> match parseEither "Random: Login\n"
Nothing

We’ll need this to introduce some level of choice into our parsing system. For instance, it’s up to the user if they want to include a Background as part of their feature. So we need to be able to read the background if it’s there, or else move on to parsing a scenario.

Conclusion

That wraps up our introduction to the basic combinators of applicative parsing. Next week, we’ll take all the pieces we’ve developed here and put them to work on Gherkin syntax itself. Everything seems pretty small so far. But we’ll see that we can actually build up our results very rapidly once we have the basic pieces in place!

If you want to see some more libraries that are useful for important Haskell tasks, take a look at our Production Checklist. It will introduce you to some libraries for parsing, databases, APIs, and much more!

If you’re new to Haskell, there’s no better time to start! Download our free Beginners Checklist! It will help you download the right tools and start learning the language.

James Bowen

Join the Club: Type-safe Joins with Esqueleto!

In the last four articles or so, we’ve done a real whirlwind tour of Haskell libraries. We created a database schema using Persistent and used it to write basic SQL queries in a type-safe way. We saw how to expose this database via an API with Servant. We also went ahead and added some caching to that server with Redis. Finally, we wrote some basic tests around the behavior of this API. By using Docker, we made those tests reproducible.

In this article, we’re going to review this whole process by adding another type to our schema. We’ll write some new endpoints for an Article type, and link this type to our existing User type with a foreign key. Then we’ll learn one more library: Esqueleto. Esqueleto improves on Persistent by allowing us to write type-safe SQL joins.

As with the previous articles, there’s a specific branch on the Github repository for this series. Go there and take a look at the esqueleto branch to see the complete code for this article.

Adding Article to our Schema

So our first step is to extend our schema with our Article type. We’re going to give each article a title, some body text, and a timestamp for its publishing time. One new feature we’ll see is that we’ll add a foreign key referencing the user who wrote the article. Here’s what it looks like within our schema:

PTH.share [PTH.mkPersist PTH.sqlSettings, PTH.mkMigrate "migrateAll"] [PTH.persistLowerCase|
 User sql=users
   ...

 Article sql=articles
   title Text
   body Text
   publishedTime UTCTime
   authorId UserId
   UniqueTitle title
   deriving Show Read Eq
|]

We can use UserId as a type in our schema. This will create a foreign key column when we create the table in our database. In practice, our Article type will look like this when we use it in Haskell:

data Article = Article
 { articleTitle :: Text
 , articleBody :: Text
 , articlePublishedTime :: UTCTime
 , articleAuthorId :: Key User
 }

This means it doesn’t reference the entire user. Instead, it contains the SQL key of that user. Since we’ll be adding the article to our API, we need to add ToJSON and FromJSON instances as well. These are pretty basic, so you can check them out here if you like. If you’re curious about JSON instances in general, take a look at this article.
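For a rough idea, here's a sketch of what those instances could look like. The JSON field names here are our own illustrative choices, so defer to the repository code for the real ones:

import Data.Aeson (ToJSON(..), FromJSON(..), object, withObject, (.=), (.:))
import Database.Persist.Sql (fromSqlKey, toSqlKey)

instance ToJSON Article where
  toJSON article = object
    [ "title" .= articleTitle article
    , "body" .= articleBody article
    , "publishedTime" .= articlePublishedTime article
    , "authorId" .= fromSqlKey (articleAuthorId article)  -- serialize the key as an Int64
    ]

instance FromJSON Article where
  parseJSON = withObject "Article" $ \o ->
    Article
      <$> o .: "title"
      <*> o .: "body"
      <*> o .: "publishedTime"
      <*> (toSqlKey <$> o .: "authorId")  -- read the Int64 back into a Key User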

Adding Endpoints

Now we’re going to extend our API to expose certain information about these articles. First, we’ll write a couple basic endpoints for creating an article and then fetching it by its ID:

type FullAPI = 
      "users" :> Capture "userid" Int64 :> Get '[JSON] User
 :<|> "users" :> ReqBody '[JSON] User :> Post '[JSON] Int64
 :<|> "articles" :> Capture "articleid" Int64 :> Get '[JSON] Article
 :<|> "articles" :> ReqBody '[JSON] Article :> Post '[JSON] Int64

Now, we’ll write a couple special endpoints. The first will take a User ID as a key and then it will provide all the different articles the user has written. We’ll do this endpoint as /articles/author/:authorid.

...
 :<|> "articles" :> "author" :> Capture "authorid" Int64 :> Get '[JSON] [Entity Article]

Our last endpoint will fetch the most recent articles, up to a limit of 10. This will take no parameters and live at the /articles/recent route. It will return tuples of users and their articles, both as entities.

…
 :<|> "articles" :> "recent" :> Get '[JSON] [(Entity User, Entity Article)]

Adding Queries (with Esqueleto!)

Before we can actually implement these endpoints, we’ll need to write the basic queries for them. For creating an article, we use the standard Persistent insert function:

createArticlePG :: PGInfo -> Article -> IO Int64
createArticlePG connString article = fromSqlKey <$> runAction connString (insert article)

We could do the same for the basic fetch endpoint. But we’ll write this basic query using Esqueleto in the interest of beginning to learn the syntax. With Persistent, we used list parameters to specify different filters and SQL operations. Esqueleto instead uses a special monad to compose the different parts of the query. The general format of an Esqueleto select call will look like this:

fetchArticlePG :: PGInfo -> Int64 -> IO (Maybe Article)
fetchArticlePG connString aid = runAction connString selectAction
 where
   selectAction :: SqlPersistT (LoggingT IO) (Maybe Article)
   selectAction = select . from $ \articles -> do
     ...

We use select . from and then provide a function that takes a table variable. Our first queries will only refer to a single table, but we'll see a join later. To complete the function, we’ll provide the monadic action that will incorporate the different parts of our query.

The most basic filtering function we can call from within this monad is where_. This allows us to provide a condition on the query, much as we could with the filter list from Persistent.

selectAction :: SqlPersistT (LoggingT IO) (Maybe Article)
selectAction = select . from $ \articles -> do
  where_ (articles ^. ArticleId ==. val (toSqlKey aid))

First, we use the ArticleId lens to specify which column of our table we’re filtering on. Then we specify the value to compare against. We not only need to lift our Int64 into an SqlKey, but we also need to lift that value using the val function.

But now that we’ve added this condition, all we need to do is return the table variable. Now, select returns our results in a list. But since we’re searching by ID, we only expect one result. We’ll use listToMaybe so we only return the head element if it exists. We’ll also use entityVal once again to unwrap the article from its entity.

selectAction :: SqlPersistT (LoggingT IO) (Maybe Article)
selectAction = (fmap entityVal . listToMaybe) <$> (select . from $ \articles -> do
  where_ (articles ^. ArticleId ==. val (toSqlKey aid))
  return articles)

Now we should know enough to write out the next query. It will fetch all the articles written by a particular user. We’ll still be querying the articles table. But now, instead of checking the article ID, we’ll make sure the ArticleAuthorId is equal to a certain value. Once again, we lift our Int64 user key into an SqlKey, and then lift it again with val so we can compare it in “SQL-land”.

fetchArticlesByAuthorPG :: PGInfo -> Int64 -> IO [Entity Article]
fetchArticlesByAuthorPG connString uid = runAction connString fetchAction
 where
   fetchAction :: SqlPersistT (LoggingT IO) [Entity Article]
   fetchAction = select . from $ \articles -> do
     where_ (articles ^. ArticleAuthorId ==. val (toSqlKey uid))
     return articles

And that’s the full query! We want a list of entities this time, so we’ve taken out listToMaybe and entityVal.

Now let’s write the final query, where we’ll find the 10 most recent articles regardless of who wrote them. We’ll include the author along with each article, so we’re returning a list of tuples of entities. This query will involve our first join. Instead of using a single table for this query, we’ll use the InnerJoin constructor to combine our users table with the articles table.

fetchRecentArticlesPG :: PGInfo -> IO [(Entity User, Entity Article)]
fetchRecentArticlesPG connString = runAction connString fetchAction
 where
   fetchAction :: SqlPersistT (LoggingT IO) [(Entity User, Entity Article)]
   fetchAction = select . from $ \(users `InnerJoin` articles) -> do

Since we’re joining two tables together, we need to specify what columns we’re joining on. We’ll use the on function for that:

fetchAction :: SqlPersistT (LoggingT IO) [(Entity User, Entity Article)]
fetchAction = select . from $ \(users `InnerJoin` articles) -> do
  on (users ^. UserId ==. articles ^. ArticleAuthorId)

Now we’ll order our articles based on the timestamp of the article using orderBy. The newest articles should come first, so we'll use a descending order. Then we limit the number of results with the limit function. Finally, we’ll return both the users and the articles, and we’re done!

fetchAction :: SqlPersistT (LoggingT IO) [(Entity User, Entity Article)]
fetchAction = select . from $ \(users `InnerJoin` articles) -> do
  on (users ^. UserId ==. articles ^. ArticleAuthorId)
  orderBy [desc (articles ^. ArticlePublishedTime)]
  limit 10
  return (users, articles)

Caching Different Types of Items

We won’t go into the details of caching our articles in Redis, but there is one potential issue we want to observe. Currently we’re using a user’s SQL key as their key in our Redis store. So for instance, the string “15” could be such a key. If we try to naively use the same idea for our articles, we’ll have a conflict! Trying to store an article with ID “15” will overwrite the entry containing the User!

But the way around this is rather simple. For the user’s key, we make the string something like users:15. Then for the article, its key will be articles:15. As long as we deserialize each of them the proper way, this will be fine.
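Here's a minimal sketch of that idea; the helper names are hypothetical:

import Data.ByteString.Char8 (ByteString, pack)
import Data.Int (Int64)

-- Hypothetical helpers for building prefixed cache keys
userCacheKey :: Int64 -> ByteString
userCacheKey uid = pack ("users:" ++ show uid)

articleCacheKey :: Int64 -> ByteString
articleCacheKey aid = pack ("articles:" ++ show aid)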

Filling in the Server handlers

Now that we’ve written our database query functions, it is very simple to fill in our Server handlers. Most of them boil down to following the patterns we’ve already set with our other two endpoints:

fetchArticleHandler :: PGInfo -> Int64 -> Handler Article
fetchArticleHandler pgInfo aid = do
 maybeArticle <- liftIO $ fetchArticlePG pgInfo aid
 case maybeArticle of
   Just article -> return article
   Nothing -> Handler $ (throwE $ err401 { errBody = "Could not find article with that ID" })

createArticleHandler :: PGInfo -> Article -> Handler Int64
createArticleHandler pgInfo article = liftIO $ createArticlePG pgInfo article

fetchArticlesByAuthorHandler :: PGInfo -> Int64 -> Handler [Entity Article]
fetchArticlesByAuthorHandler pgInfo uid = liftIO $ fetchArticlesByAuthorPG pgInfo uid

fetchRecentArticlesHandler :: PGInfo -> Handler [(Entity User, Entity Article)]
fetchRecentArticlesHandler pgInfo = liftIO $ fetchRecentArticlesPG pgInfo

Then we’ll complete our Server FullAPI like so:

fullAPIServer :: PGInfo -> RedisInfo -> Server FullAPI
fullAPIServer pgInfo redisInfo =
 (fetchUsersHandler pgInfo redisInfo) :<|>
 (createUserHandler pgInfo) :<|>
 (fetchArticleHandler pgInfo) :<|>
 (createArticleHandler pgInfo) :<|>
 (fetchArticlesByAuthorHandler pgInfo) :<|>
 (fetchRecentArticlesHandler pgInfo)

One interesting thing we can do is compose our API types into different sections. For instance, we could separate our FullAPI into two parts. First, we could have the UsersAPI type from before, and then we could make a new type for ArticlesAPI. We can glue these together with the :<|> operator just as we do with individual endpoints!

type FullAPI = UsersAPI :<|> ArticlesAPI

type UsersAPI =
      "users" :> Capture "userid" Int64 :> Get '[JSON] User
 :<|> "users" :> ReqBody '[JSON] User :> Post '[JSON] Int64

type ArticlesAPI =
 "articles" :> Capture "articleid" Int64 :> Get '[JSON] Article
 :<|> "articles" :> ReqBody '[JSON] Article :> Post '[JSON] Int64
 :<|> "articles" :> "author" :> Capture "authorid" Int64 :> Get '[JSON] [Entity Article]
 :<|> "articles" :> "recent" :> Get '[JSON] [(Entity User, Entity Article)]

If we do this, we’ll have to make similar adjustments in other areas where we combine the endpoints. For example, we would need to update how we combine the server handlers, as well as the client functions.
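As an illustration, here's a sketch of how the server side might look after the split, reusing the handlers from above:

fullAPIServer :: PGInfo -> RedisInfo -> Server FullAPI
fullAPIServer pgInfo redisInfo = usersServer :<|> articlesServer
  where
    usersServer :: Server UsersAPI
    usersServer =
      fetchUsersHandler pgInfo redisInfo :<|>
      createUserHandler pgInfo

    articlesServer :: Server ArticlesAPI
    articlesServer =
      fetchArticleHandler pgInfo :<|>
      createArticleHandler pgInfo :<|>
      fetchArticlesByAuthorHandler pgInfo :<|>
      fetchRecentArticlesHandler pgInfo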

Writing Tests

Since we already have some user tests, it would also be good to have a few tests on the Articles section of the API. We’ll add one simple test around creating an article and then fetching it. Then we’ll add one test each for the "articles-by-author" and "recent articles" endpoints.

One of the tricky parts of filling in this section is that we need to make test Article objects. But we’ll need them to be functions of the User ID, since we can’t know a priori what SQL IDs we’ll get when we insert the users into the database. But we can fill in all the other fields, including the published time. Here’s one example, but we’ll have a total of 18 different “test” articles.

testArticle1 :: Int64 -> Article
testArticle1 uid = Article
 { articleTitle = "First post"
 , articleBody = "A great description of our first blog post body."
 , articlePublishedTime = posixSecondsToUTCTime 1498914000
 , articleAuthorId = toSqlKey uid
 }

-- 17 other articles and some test users as well
…

Our before hooks will create all these different entities in the database. In general, we’ll go straight to the database without calling the API itself. Like with our users tests, we'll want to delete any database items we create. Let's write a generic after-hook that will take user IDs and article IDs and delete them from our database:

deleteArtifacts :: PGInfo -> RedisInfo -> [Int64] -> [Int64] -> IO ()
deleteArtifacts pgInfo redisInfo users articles = do
 void $ forM articles $ \a -> deleteArticlePG pgInfo a
 void $ forM users $ \u -> do
   deleteUserCache redisInfo u
   deleteUserPG pgInfo u

It’s important we delete the articles first! If we delete the users first, we'll encounter foreign key exceptions!

Our basic create-and-fetch test looks a lot like the previous user tests. We test the success of the response and that the new article lives in Postgres as we expect.

beforeHook4 :: ClientEnv -> PGInfo -> IO (Bool, Bool, Int64, Int64)
beforeHook4 clientEnv pgInfo = do
 userKey <- createUserPG pgInfo testUser2
 articleKeyEither <- runClientM (createArticleClient (testArticle1 userKey)) clientEnv
 case articleKeyEither of
   Left _ -> error "DB call failed on spec 4!"
   Right articleKey -> do
     fetchResult <- runClientM (fetchArticleClient articleKey) clientEnv
     let callSucceeds = isRight fetchResult
     articleInPG <- isJust <$> fetchArticlePG pgInfo articleKey
     return (callSucceeds, articleInPG, userKey, articleKey)

spec4 :: SpecWith (Bool, Bool, Int64, Int64)
spec4 = describe "After creating and fetching an article" $ do
 it "The fetch call should return a result" $ \(succeeds, _, _, _) -> succeeds `shouldBe` True
 it "The article should be in Postgres" $ \(_, inPG, _, _) -> inPG `shouldBe` True

afterHook4 :: PGInfo -> RedisInfo -> (Bool, Bool, Int64, Int64) -> IO ()
afterHook4 pgInfo redisInfo (_, _, uid, aid) = deleteArtifacts pgInfo redisInfo [uid] [aid]

Our next test will create two different users and several different articles. We'll first insert the users and get their keys. Then we can use these keys to create the articles. We create five articles in this test. We assign three to the first user, and two to the second user:

beforeHook5 :: ClientEnv -> PGInfo -> IO ([Article], [Article], Int64, Int64, [Int64])
beforeHook5 clientEnv pgInfo = do
 uid1 <- createUserPG pgInfo testUser3
 uid2 <- createUserPG pgInfo testUser4
 articleIds <- mapM (createArticlePG pgInfo)
   [ testArticle2 uid1, testArticle3 uid1, testArticle4 uid1
   , testArticle5 uid2, testArticle6 uid2 ]
 ...

Now we want to test that when we call the articles-by-author endpoint, we only get the right articles. We’ll return each group of articles, the user IDs, and the list of article IDs:

beforeHook5 :: ClientEnv -> PGInfo -> IO ([Article], [Article], Int64, Int64, [Int64])
beforeHook5 clientEnv pgInfo = do
 uid1 <- createUserPG pgInfo testUser3
 uid2 <- createUserPG pgInfo testUser4
 articleIds <- mapM (createArticlePG pgInfo)
   [ testArticle2 uid1, testArticle3 uid1, testArticle4 uid1
   , testArticle5 uid2, testArticle6 uid2 ]
 firstArticles <- runClientM (fetchArticlesByAuthorClient uid1) clientEnv
 secondArticles <- runClientM (fetchArticlesByAuthorClient uid2) clientEnv
 case (firstArticles, secondArticles) of
   (Right as1, Right as2) -> return (entityVal <$> as1, entityVal <$> as2, uid1, uid2, articleIds)
   _ -> error "Spec 5 failed!"

Now we can write the assertion itself, testing that the articles returned are what we expect.

spec5 :: SpecWith ([Article], [Article], Int64, Int64, [Int64])
spec5 = describe "When fetching articles by author ID" $ do
 it "Fetching by the first author should return 3 articles" $ \(firstArticles, _, uid1, _, _) ->
   firstArticles `shouldBe` [testArticle2 uid1, testArticle3 uid1, testArticle4 uid1]
 it "Fetching by the second author should return 2 articles" $ \(_, secondArticles, _, uid2, _) ->
   secondArticles `shouldBe` [testArticle5 uid2, testArticle6 uid2]

We would then follow that up with a similar after hook.
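Based on the tuple spec5 receives, that hook might look something like this (a sketch reusing deleteArtifacts from above):

afterHook5 :: PGInfo -> RedisInfo -> ([Article], [Article], Int64, Int64, [Int64]) -> IO ()
afterHook5 pgInfo redisInfo (_, _, uid1, uid2, articleIds) =
  deleteArtifacts pgInfo redisInfo [uid1, uid2] articleIds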

The final test will follow a similar pattern. Only this time, we’ll be checking the combinations of users and articles. We’ll also make sure to include 12 different articles to test that the API limits results to 10.

beforeHook6 :: ClientEnv -> PGInfo -> IO ([(User, Article)], Int64, Int64, [Int64])
beforeHook6 clientEnv pgInfo = do
 uid1 <- createUserPG pgInfo testUser5
 uid2 <- createUserPG pgInfo testUser6
 articleIds <- mapM (createArticlePG pgInfo)
   [ testArticle7 uid1, testArticle8 uid1, testArticle9 uid1, testArticle10 uid2
   , testArticle11 uid2, testArticle12 uid1, testArticle13 uid2, testArticle14 uid2
   , testArticle15 uid2, testArticle16 uid1, testArticle17 uid1, testArticle18 uid2
   ]
 recentArticles <- runClientM fetchRecentArticlesClient clientEnv
 case recentArticles of
   Right as -> return (entityValTuple <$> as, uid1, uid2, articleIds)
   _ -> error "Spec 6 failed!"
 where
   entityValTuple (Entity _ u, Entity _ a) = (u, a)

Our spec will check that the list of 10 articles we get back matches our expectations. Then, as always, we remove the entities from our database.
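A simplified sketch of that spec, checking only the count (the full version would compare the actual user/article pairs):

spec6 :: SpecWith ([(User, Article)], Int64, Int64, [Int64])
spec6 = describe "When fetching recent articles" $
  it "Should fetch exactly the 10 most recent articles" $ \(pairs, _, _, _) ->
    length pairs `shouldBe` 10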

Now we call these tests with our other tests, with small wrappers to call the hooks:

main :: IO ()
main = do
 ...
 hspec $ before (beforeHook4 clientEnv pgInfo) $ after (afterHook4 pgInfo redisInfo) $ spec4
 hspec $ before (beforeHook5 clientEnv pgInfo) $ after (afterHook5 pgInfo redisInfo) $ spec5
 hspec $ before (beforeHook6 clientEnv pgInfo) $ after (afterHook6 pgInfo redisInfo) $ spec6

And now we’re done! The tests pass!

…
After creating and fetching an article
 The fetch call should return a result
 The article should be in Postgres

Finished in 0.1698 seconds
2 examples, 0 failures

When fetching articles by author ID
 Fetching by the first author should return 3 articles
 Fetching by the second author should return 2 articles

Finished in 0.4944 seconds
2 examples, 0 failures

When fetching recent articles
 Should fetch exactly the 10 most recent articles

Conclusion

This completes our overview of useful production libraries. Over these articles, we’ve constructed a small web API from scratch. We’ve seen some awesome abstractions that let us deal with only the most important pieces of the project. Both Persistent and Servant generated a lot of extra boilerplate for us. This article showed the power of the Esqueleto library in allowing us to do type-safe joins. We also saw an end-to-end process of adding a new type and endpoints to our API.

In the coming weeks, we’ll be dealing with some more issues that can arise when building these kinds of systems. In particular, we’ll see how we can use alternative monads on top of Servant. Doing this can present certain issues that we'll explore. We’ll culminate by exploring the different approaches to encapsulating effects.

Be sure to check out our Haskell Stack mini-course! It'll show you how to use Stack, so you can incorporate all the libraries from this series!

If you’re new to Haskell and not ready for that yet, take a look at our Getting Started Checklist and get going!

James Bowen

Tangled Webs: Testing an Integrated System

In the last few articles, we’ve combined several useful Haskell libraries to make a small web app. We used Persistent to create a schema with automatic migrations for our database. Then we used Servant to expose this database as an API through a couple simple queries. Finally, we used Redis to act as a cache so that repeated requests for the same user happen faster.

For our next step, we’ll tackle a thorny issue: testing. How do we test a system that has so many moving parts? There are a couple different general approaches we could take. On one end of the spectrum, we could mock out most of our API calls and services. This helps give our testing deterministic behavior. That's desirable, since we would want to tie deployments to test results. But we also want to be faithfully testing our API. So on the other end, there’s the approach we’ll try in this article. We'll set up functions to run our API and other services, and then use before and after hooks to use them.

Creating Client Functions for Our API

Calling our API from our tests means we’ll want a way to make API calls programmatically. We can do this with amazing ease with a Servant API by using the servant-client library. This library has one main function: client. This function takes a proxy for our API and generates programmatic client functions. Let's remember our basic endpoint types (after resolving connection information parameters):

fetchUsersHandler :: Int64 -> Handler User
createUserHandler :: User -> Handler Int64

We’d like to be able to call these APIs with functions that take the same parameters. Those types might look something like this:

fetchUserClient :: Int64 -> m User
createUserClient :: User -> m Int64

Where m is some monad. And in this case, the servant-client library provides such a monad, ClientM. So let’s re-write these type signatures, but leave them seemingly unimplemented:

fetchUserClient :: Int64 -> ClientM User
createUserClient :: User -> ClientM Int64

Now we’ll construct a pattern match that combines these function names with the :<|> operator. As always, we need to make sure we do this in the same order as the original API type. Then we’ll set this pattern to be the result of calling client on a proxy for our API:

fetchUserClient :: Int64 -> ClientM User
createUserClient :: User -> ClientM Int64
(fetchUserClient :<|> createUserClient) = client (Proxy :: Proxy UsersAPI)

And that’s it! The Servant library fills in the details for us and implements these functions! We’ll see how we can actually call these functions later in the article.

Setting Up the Tests

We’d like to get to the business of deciding on our test cases and writing them. But first we need to make sure that our tests have a proper environment. This means 3 things. First we need to fetch the connection information for our data stores and API. This means the PGInfo, the RedisInfo, and the ClientEnv we’ll use to call the client functions we wrote. Second, we need to actually migrate our database so it has the proper tables. Third, we need to make sure our server is actually running. Let’s start with the connection information, as this is easy:

import Database (fetchPostgresConnection, fetchRedisConnection)

...

setupTests = do
  pgInfo <- fetchPostgresConnection
  redisInfo <- fetchRedisConnection
  ...

Now to create our client environment, we’ll need two main things. We’ll need a manager for the network connections and the base URL for the API. Since we’re running the API locally, we’ll use a localhost URL. The default manager from the Network library will work fine for us:

import Network.HTTP.Client (newManager)
import Network.HTTP.Client.TLS (tlsManagerSettings)
import Servant.Client (ClientEnv(..))

setupTests = do
  pgInfo <- fetchPostgresConnection
  redisInfo <- fetchRedisConnection
  mgr <- newManager tlsManagerSettings
  baseUrl <- parseBaseUrl "http://127.0.0.1:8000"
  let clientEnv = ClientEnv mgr baseUrl

Now we can run our migration, which will ensure that our users table exists:

import Schema (migrateAll)

setupTests = do
  pgInfo <- fetchPostgresConnection
  ...
  runStdoutLoggingT $ withPostgresqlConn pgInfo $ \dbConn ->
    runReaderT (runMigrationSilent migrateAll) dbConn

Last of all, we’ll start our server with runServer from our API module. We’ll fork this off to a separate thread, as otherwise it will block the test thread! We’ll wait for a second afterward to make sure it actually loads before the tests run (there are less hacky ways to do this of course). But then we’ll return all the important information we need, and we're done with test setup:

setupTests :: IO (PGInfo, RedisInfo, ClientEnv, ThreadId)
setupTests = do
  pgInfo <- fetchPostgresConnection
  redisInfo <- fetchRedisConnection
  mgr <- newManager tlsManagerSettings
  baseUrl <- parseBaseUrl "http://127.0.0.1:8000"
  let clientEnv = ClientEnv mgr baseUrl
  runStdoutLoggingT $ withPostgresqlConn pgInfo $ \dbConn ->
    runReaderT (runMigrationSilent migrateAll) dbConn
  threadId <- forkIO runServer
  threadDelay 1000000
  return (pgInfo, redisInfo, clientEnv, threadId)

Organizing our 3 Test Cases

Now that we’re all set up, we can decide on our test cases. We’ll look at three. First, if we have an empty database and we fetch a user by some arbitrary ID, we’ll expect an error. Further, we should expect that the user does not exist in the database or in the cache, even after calling fetch.

In our second test case, we’ll look at the effects of calling the create endpoint. We’ll save the key we get from this endpoint. Then we’ll verify that this user exists in the database, but NOT in the cache. Finally, our third case will insert the user with the create endpoint and then fetch the user. We’ll expect at the end of this that in fact the user exists in both the database AND the cache.

We organize each of our tests into up to three parts: the “before hook”, the test assertions, and the “after hook”. A “before hook” is some IO code we’ll run that will return particular results to our test assertion. We want to make sure it’s done running BEFORE any test assertions. This way, there’s no interleaving of effects between our test output and the API calls. Each before hook will first make the API calls we want. Then they'll investigate our different databases and determine if certain users exist.

We also want our tests to be database-neutral. That is, the database and cache should be in the same state after the test as they were before. So we’ll also have “after hooks” that run after our tests have finished (if we’ve actually created anything). The after hooks will delete any new entries. This means our before hooks also have to pass the keys for any database entities they create. This way the after hooks know what to delete.

Last of course, we need the testing code that actually makes assertions about the results. These will be pretty straightforward, as we’ll see below.

Test #1

For our first test, we’ll start by making a client call to our API. We use runClientM combined with our clientEnv and the fetchUserClient function. Next, we’ll determine that the call in fact returns an error as it should. Then we’ll add two more lines checking if there’s an entry with the arbitrary ID in our database and our cache. Finally, we return all three boolean values:

beforeHook1 :: ClientEnv -> PGInfo -> RedisInfo -> IO (Bool, Bool, Bool)
beforeHook1 clientEnv pgInfo redisInfo = do
  callResult <- runClientM (fetchUserClient 1) clientEnv
  let throwsError = isLeft callResult
  inPG <- isJust <$> fetchUserPG pgInfo 1
  inRedis <- isJust <$> fetchUserRedis redisInfo 1
  return (throwsError, inPG, inRedis)

Now we’ll write our assertion. Since we’re using a before hook returning three booleans, the type of our Spec will be SpecWith (Bool, Bool, Bool). Each it assertion will take this boolean tuple as a parameter, though we’ll only use one for each line.

spec1 :: SpecWith (Bool, Bool, Bool)
spec1 = describe "After fetching on an empty database" $ do
  it "The fetch call should throw an error" $ \(throwsError, _, _) -> throwsError `shouldBe` True
  it "There should be no user in Postgres" $ \(_, inPG, _) -> inPG `shouldBe` False
  it "There should be no user in Redis" $ \(_, _, inRedis) -> inRedis `shouldBe` False

And that’s all we need for the first test! We don’t need an after hook since it doesn’t add anything to our database.

Tests 2 and 3

Now that we’re a little more familiar with how this code works, let’s take a quick look at the next before hook. This time we’ll first try creating our user. If this fails for whatever reason, we’ll throw an error and stop the tests. Then we can use the key to check whether the user exists in our database and Redis. We return the boolean values and the key.

beforeHook2 :: ClientEnv -> PGInfo -> RedisInfo -> IO (Bool, Bool, Int64)
beforeHook2 clientEnv pgInfo redisInfo = do
  userKeyEither <- runClientM (createUserClient testUser) clientEnv
  case userKeyEither of
    Left _ -> error "DB call failed on spec 2!"
    Right userKey -> do 
      inPG <- isJust <$> fetchUserPG pgInfo userKey
      inRedis <- isJust <$> fetchUserRedis redisInfo userKey
      return (inPG, inRedis, userKey)

Now our spec will look similar. This time we expect to find a user in Postgres, but not in Redis.

spec2 :: SpecWith (Bool, Bool, Int64)
spec2 = describe "After creating the user but not fetching" $ do
  it "There should be a user in Postgres" $ \(inPG, _, _) -> inPG `shouldBe` True
  it "There should be no user in Redis" $ \(_, inRedis, _) -> inRedis `shouldBe` False

Now we need to add the after hook, which will delete the user from the database and cache. Of course, we expect the user won’t exist in the cache, but we include this since we’ll need it in the final example:

afterHook :: PGInfo -> RedisInfo -> (Bool, Bool, Int64) -> IO ()
afterHook pgInfo redisInfo (_, _, key) = do
  deleteUserCache redisInfo key
  deleteUserPG pgInfo key

Last, we’ll write one more test case. This will mimic the previous case, except we’ll throw in a call to fetch in between. As a result, we expect the user to be in both Postgres and Redis:

beforeHook3 :: ClientEnv -> PGInfo -> RedisInfo -> IO (Bool, Bool, Int64)
beforeHook3 clientEnv pgInfo redisInfo = do
  userKeyEither <- runClientM (createUserClient testUser) clientEnv
  case userKeyEither of
    Left _ -> error "DB call failed on spec 3!"
    Right userKey -> do 
      _ <- runClientM (fetchUserClient userKey) clientEnv 
      inPG <- isJust <$> fetchUserPG pgInfo userKey
      inRedis <- isJust <$> fetchUserRedis redisInfo userKey
      return (inPG, inRedis, userKey)

spec3 :: SpecWith (Bool, Bool, Int64)
spec3 = describe "After creating the user and fetching" $ do
  it "There should be a user in Postgres" $ \(inPG, _, _) -> inPG `shouldBe` True
  it "There should be a user in Redis" $ \(_, inRedis, _) -> inRedis `shouldBe` True

And it will use the same after hook as case 2, so we’re done!

Hooking in and running the tests

The last step is to glue all our pieces together with hspec, before, and after. Here’s our main function, which also kills the thread running the server once it’s done:

main :: IO ()
main = do
  (pgInfo, redisInfo, clientEnv, tid) <- setupTests
  hspec $ before (beforeHook1 clientEnv pgInfo redisInfo) spec1
  hspec $ before (beforeHook2 clientEnv pgInfo redisInfo) $ after (afterHook pgInfo redisInfo) $ spec2
  hspec $ before (beforeHook3 clientEnv pgInfo redisInfo) $ after (afterHook pgInfo redisInfo) $ spec3
  killThread tid 
  return ()

And now our tests should pass!

After fetching on an empty database
  The fetch call should throw an error
  There should be no user in Postgres
  There should be no user in Redis

Finished in 0.0410 seconds
3 examples, 0 failures

After creating the user but not fetching
  There should be a user in Postgres
  There should be no user in Redis

Finished in 0.0585 seconds
2 examples, 0 failures

After creating the user and fetching
  There should be a user in Postgres
  There should be a user in Redis

Finished in 0.0813 seconds
2 examples, 0 failures

Using Docker

So when I say, “the tests pass”, they now work on my system. But if you were to clone the code as is and try to run them, you would get failures. The tests depend on Postgres and Redis, so if you don't have them running, they fail! It is quite annoying to have your tests depend on these outside services. This is the weakness of devising our tests as we have. It increases the on-boarding time for anyone coming into your codebase. The new person has to figure out which things they need to run, install them, and so on.

So how do we fix this? One answer is by using Docker. Docker allows you to create containers that have particular services running within them. This spares you from worrying about the details of setting up the services on your local machine. Even more important, you can deploy a Docker image to your remote environments, so development and production will match your local system. To set up this process, we’ll create a description of the services we want running on our Docker container. We do this with a docker-compose file. Here’s what ours looks like:

version: '2'

services:
  postgres:
    image: postgres:9.6
    container_name: prod-haskell-series-postgres
    ports:
      - "5432:5432"

  redis:
    image: redis:4.0
    container_name: prod-haskell-series-redis
    ports:
      - "6379:6379"

Then, you can start these services for your Docker machines with docker-compose up. Granted, you do have to install and run Docker. But if you have several different services, this is a much easier on-boarding process. Better yet, the "compose" file ensures everyone uses the same versions of these services.

Even with this container running, the tests will still fail! That’s because you also need the tests themselves to be running on your Docker cluster. But with Stack, this is easy! We’ll add the following flag to our stack.yaml file:

docker:
  enable: true

Now, whenever you build and test your program, you will do so on Docker. The first time you do this, Docker will need to set everything up on the container. This means it will have to download Stack and ALL the different packages you use. So the first run will take a while. But subsequent runs will be normal. So after all that finishes, NOW the tests should work!

Conclusion

Testing integrated systems is hard. We can try mocking out the behavior of external services. But this can lead to a test representation of our program that isn't faithful to the production system. Using the before and after hooks from Hspec is a great way to make sure all your external events happen first. Then you can pass those results to simpler test assertions.

When it comes time to run your system, it helps if you can bring up all your external services with one command! Docker allows you to do this by listing the different services in the docker-compose file. Then, Stack makes it easy to run your program and tests on a docker container, so you can use the services!

Stack is the key to all this integration. If you’ve never used Stack before, you should check out our free mini-course. It will teach you all the basics of organizing a Haskell project using Stack.

If this is your first exposure to Haskell, I've hopefully convinced you of some of its awesome possibilities! Take a look at our Getting Started Checklist and get learning!


A Cache is Fast: Enhancing our API with Redis

In the last couple weeks we’ve used Persistent to store a User type in a Postgresql database. Then we were able to use Servant to create a very simple API that exposed this database to the outside world. This week, we’re going to look at how we can improve the performance of our API using a Redis cache.

One cannot overstate the importance of caching in both software and hardware. There's a hierarchy of memory types, from registers, to RAM, to the file system, to a remote database. Accessing each of these gets progressively slower (by orders of magnitude). But the faster means of storage are more expensive, so we can't always have as much as we'd like.

But memory usage operates on a very important principle: when we use a piece of memory once, we're very likely to use it again in the near future. So when we pull something out of long-term memory, we can temporarily store it in short-term memory as well. This way when we need it again, we can get it faster. After a certain point, that item will be overwritten by other, more urgent items. This is the essence of caching.

Redis 101

Redis is an application that allows us to create a key-value store of items. It functions like a database, except it only uses these keys. It lacks the sophistication of joins, foreign table references and indices. So we can’t run the kinds of sophisticated queries that are possible on an SQL database. But we can run simple key lookups, and we can do them faster. In this article, we'll use Redis as a short-term cache for our user objects.

For this article, we've got one main goal for cache integration. Whenever we “fetch” a user using the GET endpoint in our API, we want to store that user in our Redis cache. Then the next time someone requests that user from our API, we'll grab them out of the cache. This will save us the trouble of making a longer call to our Postgres database.

Connecting to Redis

Haskell's Redis library has a lot of similarities to Persistent and Postgres. First, we’ll need some sort of data that tells us where to look for our database. For Postgres, we used a simple ConnectionString with a particular format. Redis uses a full data type called ConnectInfo.

data ConnectInfo = ConnectInfo
  { connectHost :: HostName -- String
  , connectPort :: PortID   -- (can just be a number)
  , connectAuth :: Maybe ByteString
  , connectDatabase :: Integer
  , connectMaxConnections :: Int
  , connectMaxIdleTime :: NominalDiffTime
  }

This has many of the same fields we stored in our PG string, like the host IP address, and the port number. The rest of this article assumes you are running a local Redis instance at port 6379. This means we can use defaultConnectInfo. As always, in a real system you’d want to grab this information out of a configuration, so you’d need IO.

fetchRedisConnection :: IO ConnectInfo
fetchRedisConnection = return defaultConnectInfo
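In a more realistic setup, we might read this information from the environment. Here's a minimal sketch of what that could look like, assuming a hypothetical REDIS_HOST environment variable (the variable name and fallback are our own invention, not part of the project):

import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)

-- Hypothetical: read the Redis host from REDIS_HOST, falling back
-- to the default localhost configuration when it isn't set.
fetchRedisConnection :: IO ConnectInfo
fetchRedisConnection = do
  maybeHost <- lookupEnv "REDIS_HOST"
  return $ defaultConnectInfo { connectHost = fromMaybe "localhost" maybeHost }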

With Postgres, we used withPostgresqlConn to actually connect to the database. With Redis, we do this with the connect function. We'll get a Connection object that we can use to run Redis actions.

connect :: ConnectInfo -> IO Connection

With this connection, we simply use runRedis, and then combine it with an action. Here’s the wrapper runRedisAction we’ll write for that:

runRedisAction :: ConnectInfo -> Redis a -> IO a
runRedisAction redisInfo action = do
  connection <- connect redisInfo
  runRedis connection action

The Redis Monad

Just as we used the SqlPersistT monad with Persistent, we'll use the Redis monad to interact with our Redis cache. Our API is simple, so we'll stick to three basic functions. The real types of these functions are a bit more complicated. But this is because of polymorphism related to transactions, and we won't be using those.

get :: ByteString -> Redis (Either x (Maybe ByteString))
set :: ByteString -> ByteString -> Redis (Either x ())
setex :: ByteString -> Integer -> ByteString -> Redis (Either x ())

Redis is a key-value store, so everything we set here will use ByteString items. But once we've done that, these functions are all we need to use. The get function takes a ByteString of the key and delivers the value as another ByteString. The set function takes both the serialized key and value and stores them in the cache. The setex function does the same thing as set, except that it also takes an expiration time (in seconds, as its second argument) for the item we're storing.

Expiration is a very useful feature to be aware of, since most relational databases don't have this. The nature of a cache is that it's only supposed to store a subset of our information at any given time. If we never expire or delete anything, it might eventually store our whole database. That would defeat the purpose of using a cache! Its memory footprint should remain low compared to our database. So we'll use setex in our API.
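To see these functions in action before we wire them into the API, here's a minimal sketch (assuming the runRedisAction helper above, the OverloadedStrings extension for the ByteString literals, and a local Redis instance; the demoRoundTrip name is just for illustration):

import Control.Monad.IO.Class (liftIO)

demoRoundTrip :: ConnectInfo -> IO ()
demoRoundTrip redisInfo = runRedisAction redisInfo $ do
  -- Store "hello" under the key "greeting", expiring after 10 seconds
  _ <- setex "greeting" 10 "hello"
  result <- get "greeting"
  -- Within those 10 seconds, this should print: Right (Just "hello")
  liftIO $ print result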

Saving a User in Redis

So now let’s move on to the actions we’ll actually use in our API. First, we’ll write a function that will actually store a key-value pair of an Int64 key and the User in the database. Here’s how we start:

cacheUser :: ConnectInfo -> Int64 -> User -> IO ()
cacheUser redisInfo uid user = runRedisAction redisInfo $ setex ??? ??? ???

All we need to do now is convert our key and our value to ByteString values. We'll keep it simple and use Data.ByteString.Char8 combined with our Show and Read instances. Then we’ll create a Redis action using setex and expire the key after 3600 seconds (one hour).

import Control.Monad (void)
import Data.ByteString.Char8 (pack, unpack)

...

cacheUser :: ConnectInfo -> Int64 -> User -> IO ()
cacheUser redisInfo uid user = runRedisAction redisInfo $ void $ 
  setex (pack . show $ uid) 3600 (pack . show $ user)

(We use void to ignore the result of the Redis call).

Fetching from Redis

Fetching a user is a similar process. We’ll take the connection information and the key we’re looking for. The action we’ll create uses the bytestring representation and calls get. But we can’t ignore the result of this call like we could before! Retrieving anything gives us Either e (Maybe ByteString). A Left response indicates an error, while Right Nothing indicates the key doesn’t exist. We’ll ignore the errors and treat the result as Maybe User though. If any error comes up, we’ll return Nothing. This means we run a simple pattern match:

fetchUserRedis :: ConnectInfo -> Int64 -> IO (Maybe User)
fetchUserRedis redisInfo uid = runRedisAction redisInfo $ do
  result <- Redis.get (pack . show $ uid)
  case result of
    Right (Just userString) -> return $ Just (read . unpack $ userString)
    _ -> return Nothing

If we do find something for that key, we’ll read it out of its ByteString format and then we’ll have our final User object.
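At some point we'll also want to evict users from the cache (the integration tests earlier clear both stores between cases with a function like this). Here's a sketch of what that might look like using hedis's del command, which takes a list of keys; the deleteUserRedis name is an assumption:

import Control.Monad (void)

deleteUserRedis :: ConnectInfo -> Int64 -> IO ()
deleteUserRedis redisInfo uid = runRedisAction redisInfo $ void $
  del [pack . show $ uid]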

Applying this to our API

Now that we're all set up with our Redis functions, we have to update the fetchUsersHandler to use this cache. First, we now need to pass the Redis connection information as another parameter. For ease of reading, we'll refer to these using type synonyms (PGInfo and RedisInfo) from now on:

type PGInfo = ConnectionString
type RedisInfo = ConnectInfo

…

fetchUsersHandler :: PGInfo -> RedisInfo -> Int64 -> Handler User
fetchUsersHandler pgInfo redisInfo uid = do
  ...

The first thing we’ll try is to look up the user by their ID in the Redis cache. If the user exists, we’ll immediately return that user.

fetchUsersHandler :: PGInfo -> RedisInfo -> Int64 -> Handler User
fetchUsersHandler pgInfo redisInfo uid = do
  maybeCachedUser <- liftIO $ fetchUserRedis redisInfo uid
  case maybeCachedUser of
    Just user -> return user
    Nothing -> do
      ...

If the user doesn’t exist, we’ll then drop into the logic of fetching the user in the database. We’ll replicate our logic of throwing an error if we find that user doesn’t actually exist. But if we find the user, we need one more step. Before we return it, we should call cacheUser and store it for the future.

fetchUsersHandler :: PGInfo -> RedisInfo -> Int64 -> Handler User
fetchUsersHandler pgInfo redisInfo uid = do
  maybeCachedUser <- liftIO $ fetchUserRedis redisInfo uid
  case maybeCachedUser of
    Just user -> return user
    Nothing -> do
      maybeUser <- liftIO $ fetchUserPG pgInfo uid
      case maybeUser of
        Just user -> liftIO (cacheUser redisInfo uid user) >> return user
        Nothing -> Handler $ (throwE $ err401 { errBody = "Could not find user with that ID" })

Since we changed our type signature, we’ll have to make a few other updates as well, but these are quite simple:

usersServer :: PGInfo -> RedisInfo -> Server UsersAPI
usersServer pgInfo redisInfo =
  (fetchUsersHandler pgInfo redisInfo) :<|> 
  (createUserHandler pgInfo)


runServer :: IO ()
runServer = do
  pgInfo <- fetchPostgresConnection
  redisInfo <- fetchRedisConnection
  run 8000 (serve usersAPI (usersServer pgInfo redisInfo))

And that’s it! We have a functioning cache with expiring entries. This means that repeated queries to our fetch endpoint should be much faster!

Conclusion

Caching is a vitally important way that we can write software that is often much faster for our users. Redis is a key-value store that we can use as a cache for our most frequently used data. We can use it as an alternative to forcing every single API call to hit our database. In Haskell, the Redis API requires everything to be a ByteString. So we have to deal with some logic surrounding encoding and decoding. But otherwise it operates in a very similar way to Persistent and Postgres.

Be sure to take a look at this code on Github! There’s a redis branch for this article. It includes all the code samples, including things I skipped over like imports!

We’re starting to get to the point where we’re using a lot of different libraries in our Haskell application! It pays to know how to organize everything, so package management is vital! I tend to use Stack for all my package management. It makes it quite easy to bring all these different libraries together. If you want to learn how to use Stack, check out our free Stack mini-course!

If you’ve never learned Haskell before, you should try it out! Download our Getting Started Checklist!


Serve it up with Servant!

Last week we began our series on production Haskell techniques by learning about Persistent. We created a schema that contained a single User type that we could store in a Postgresql database. We examined a couple functions allowing us to make SQL queries about these users.

This week, we’ll see how we can expose this database to the outside world using an API. We’ll construct our API using the Servant library. Servant involves some advanced type level constructs, so there’s a lot to wrap your head around. There are definitely simpler approaches to HTTP servers than what Servant uses. But I’ve found that the power Servant gives us is well worth the effort.

This article will give a brief overview on Servant. But if you want a more in-depth introduction, you should check out my talk from Bayhac last spring! That talk was more exhaustive about the different combinators you can use in your APIs. It also showed authentication techniques, client functions and documentation. You can also check out the slides and code for that presentation!

Also, take a look at the servant branch on the Github repo for this project to see all the code for this article!

Defining our API

The first step in writing an API for our user database is to decide what the different endpoints are. We can decide this independent of what language or library we’ll use. For this article, our API will have two different endpoints. The first will be a POST request to /users. This request will contain a “user” definition in its body, and the result will be that we’ll create a user in our database. Here’s a sample of what this might look like:

POST /users
{
  "userName": "John Doe",
  "userEmail": "john@doe.com",
  "userAge": 29,
  "userOccupation": "Teacher"
}

It will then return a response containing the database key of the user we created. This will allow any clients to fetch the user again. The second endpoint will use the ID to fetch a user by their database identifier. It will be a GET request to /users/:userid. So for instance, the last request might have returned us something like 16. We could then do the following:

GET /users/16

And our response would look like the request body from above.

An API as a Type

So we've got our very simple API. How do we actually define this in Haskell, and more specifically with Servant? Well, Servant does something pretty unique (as far as I've researched). In Servant we define our API by using a type. Our type will include sub-types for each of the endpoints of our API. We combine the different endpoints by using the (:<|>) operator. I'll sometimes refer to this as "E-plus", for "endpoint-plus". This is a type operator, like some of the operators we saw with dependent types and tensor flow. Here's the blueprint of our API:

type UsersAPI = 
  fetchEndpoint
  :<|> createEndpoint

Now let's define what we mean by fetchEndpoint and createEndpoint. Endpoints combine different combinators that describe different information about the endpoint. We link combinators together with the (:>) operator, which I call “C-plus” (combinator plus). Here’s what our final API looks like. We’ll go through what each combinator means in the next section:

type UsersAPI =
       "users" :> Capture "userid" Int64 :> Get '[JSON] User
  :<|> "users" :> ReqBody '[JSON] User :> Post '[JSON] Int64

Different Combinators

Both of these endpoints have three different combinators. Let's start by examining the fetch endpoint. It starts off with a string combinator. This is a path component, allowing us to specify what URL extension the caller should use to hit the endpoint. We can use this combinator multiple times to have a more complicated path for the endpoint. If we instead wanted this endpoint to be at /api/users/:userid then we'd change it to:

"api" :> "users" :> Capture "userid" Int64 :> Get '[JSON] User

The second combinator (Capture) allows us to get a value out of the URL itself. We give this value a name and then we supply a type parameter. We won't have to do any path parsing or manipulation ourselves. Servant will handle the tricky business of parsing the URL and passing us an Int64. If you want to use your own custom class as a piece of HTTP data, that's not too difficult. You’ll just have to write an instance of the FromHttpApiData class. All the basic types like Int64 already have instances.
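For example, here's what a minimal instance might look like for a hypothetical newtype (both the type and the instance are illustrative, not from the project):

import Data.Text (Text)
import Web.HttpApiData (FromHttpApiData(..))

-- A hypothetical wrapper type we could capture from the URL
newtype UserHandle = UserHandle Text

instance FromHttpApiData UserHandle where
  -- parseUrlPiece receives the raw URL segment as Text
  parseUrlPiece piece = Right (UserHandle piece)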

The final combinator itself contains three important pieces of information for this endpoint. First, it tells us that this is in fact a GET request. Second, it gives us the list of content-types that are allowable in the response. This is a type-level list of content formats. Each type in this list must have instances of the appropriate serialization and deserialization classes for our data. We could have used a more complicated list like '[JSON, PlainText, OctetStream]. But for the rest of this article, we'll just use JSON. This means we'll use the ToJSON and FromJSON typeclasses for serialization.

The last piece of this combinator is the type our endpoint returns. So a successful request will give the caller back a response that contains a User in JSON format. Notice this isn’t a Maybe User. If the ID is not in our database, we’ll return a 401 error to indicate failure, rather than returning Nothing.

Our second endpoint has many similarities. It uses the same string path component. Then its final combinator is the same except that it indicates it is a POST request instead of a GET request. The second combinator then tells us what we can expect the request body to look like. In this case, the request body should contain a JSON representation of a User. It requires a list of acceptable content types, and then the type we want, like the Get and Post combinators.

That completes the “definition” of our API. We’ll need to add ToJSON and FromJSON instances of our User type in order for this to function. You can take a look at those on Github, and check out this article for more details on creating those instances!
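As a sketch, hand-written instances matching the JSON keys from our sample request might look like this (the actual instances in the repo may be written or derived differently):

import Data.Aeson (FromJSON(..), ToJSON(..), object, withObject, (.=), (.:))

instance ToJSON User where
  toJSON user = object
    [ "userName" .= userName user
    , "userEmail" .= userEmail user
    , "userAge" .= userAge user
    , "userOccupation" .= userOccupation user
    ]

instance FromJSON User where
  parseJSON = withObject "User" $ \o -> User
    <$> o .: "userName"
    <*> o .: "userEmail"
    <*> o .: "userAge"
    <*> o .: "userOccupation"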

Writing Handlers

Now that we’ve defined the type of our API, we need to write handler functions for each endpoint. This is where Servant’s awesomeness kicks in. We can map each endpoint up to a function that has a particular type based on the combinators in the endpoint. So, first let’s remember our endpoint for fetching a user:

"users" :> Capture "userid" Int64 :> Get '[JSON] User

The string path component doesn’t add any arguments to our function. The Capture component will result in a parameter of type Int64 that we’ll need in our function. Then the return type of our function should be User. This almost completely defines the type signature of our handler. We'll note though that it needs to be in the Handler monad. So here’s what it’ll look like:

fetchUsersHandler :: Int64 -> Handler User
...

Servant can also look at the type for our create endpoint:

"users" :> ReqBody '[JSON] User :> Post '[JSON] Int64

The ReqBody combinator adds a parameter of its type argument (here, User). So Servant resolves this endpoint into the following handler type:

createUserHandler :: User -> Handler Int64
...

Now, we’ll need to be able to access our Postgres database through both these handlers. So they’ll each get an extra parameter referring to the ConnectionString. We’ll pass that from our code so that by the time Servant is resolving the types, the parameter is accounted for:

fetchUsersHandler :: ConnectionString -> Int64 -> Handler User
createUserHandler :: ConnectionString -> User -> Handler Int64

Before we go any further, we should discuss the Handler monad. This is a wrapper around the monad ExceptT ServantErr IO. In other words, each of these requests might fail. To make it fail, we can throw errors of type ServantErr. Then of course we can also call IO functions, because these are network operations.

Before we implement these functions, let’s first write a couple simple helpers. These will use the runAction function from last week’s article to run database actions:

fetchUserPG :: ConnectionString -> Int64 -> IO (Maybe User)
fetchUserPG connString uid = runAction connString (get (toSqlKey uid))

createUserPG :: ConnectionString -> User -> IO Int64
createUserPG connString user = fromSqlKey <$> runAction connString (insert user)

For completeness (and use later in testing), we’ll also add a simple delete function. We need the where clause for type inference:

deleteUserPG :: ConnectionString -> Int64 -> IO ()
deleteUserPG connString uid = runAction connString (delete userKey)
  where
    userKey :: Key User
    userKey = toSqlKey uid

Now from our Servant handlers, we’ll call these two functions. This will completely cover the case of the create endpoint. But we’ll need a little bit more logic for the fetch endpoint. Since our functions are in the IO monad, we have to lift them up to Handler.

fetchUsersHandler :: ConnectionString -> Int64 -> Handler User
fetchUsersHandler connString uid = do
  maybeUser <- liftIO $ fetchUserPG connString uid
  ...

createUserHandler :: ConnectionString -> User -> Handler Int64
createUserHandler connString user = liftIO $ createUserPG connString user

To complete our fetch handler, we need to account for a non-existent user. Instead of making the type of the whole endpoint a Maybe, we’ll throw a ServantErr in this case. We can use one of the built-in Servant error functions, which correspond to normal error codes. Then we can update the body. In this case, we’ll throw a 401 error. Here’s how we do that:

fetchUsersHandler :: ConnectionString -> Int64 -> Handler User
fetchUsersHandler connString uid = do
  maybeUser <- liftIO $ fetchUserPG connString uid
  case maybeUser of
    Just user -> return user
    -- errBody is a lazy ByteString, so we use pack from Data.ByteString.Lazy.Char8
    Nothing -> Handler $ throwE $ err401 { errBody = pack ("Could not find user with ID: " ++ show uid) }

createUserHandler :: ConnectionString -> User -> Handler Int64
createUserHandler connString user = liftIO $ createUserPG connString user

And that’s it! We're done with our handler functions!

Combining it All into a Server

Our next step is to create an object of type Server over our API. This is actually remarkably simple. When we defined the original type, we combined the endpoints with the (:<|>) operator. To make our Server, we do the same thing but with the handler functions:

usersServer :: ConnectionString -> Server UsersAPI
usersServer pgInfo = 
  (fetchUsersHandler pgInfo) :<|> 
  (createUserHandler pgInfo)

And Servant does all the work of ensuring that the type of each endpoint matches up with the type of the handler! It’s pretty awesome. Suppose we changed the type of our fetchUsersHandler so that it took a Key User instead of an Int64. We’d get a compile error:

fetchUsersHandler :: ConnectionString -> Int64 -> Handler User
…

-- Compile Error!
• Couldn't match type ‘Key User’ with ‘Int64’
      Expected type: Server UsersAPI
        Actual type: (Key User -> Handler User)
                     :<|> (User -> Handler Int64)

There's now a mismatch between our API definition and our handler definition. So Servant knows to throw an error! The one issue is that the error messages can be rather difficult to interpret sometimes. This is especially the case when your API becomes very large! The “Actual type” section of the above error will become massive! So always be careful when changing your endpoints! Frequent compilation is your friend!

Building the Application

The final piece of the puzzle is to actually build an Application object out of our server. The first step of this process is to create a Proxy for our API. Remember that our API is a type, and not a term. But a Proxy allows us to represent this type at the term level. The concept is a little complicated, but the code is not!

import Data.Proxy

…

usersAPI :: Proxy UsersAPI
usersAPI = Proxy :: Proxy UsersAPI

Now we can make our runnable Application like so (assuming we have a Postgres connection):

serve usersAPI (usersServer connString)

We'll run this server on port 8000 by using the run function from Network.Wai.Handler.Warp. (See Github for a full list of imports). We'll fetch our connection string, and then we're good to go!

runServer :: IO ()
runServer = do
  pgInfo <- fetchPostgresConnection
  run 8000 (serve usersAPI (usersServer pgInfo))

Conclusion

The Servant library offers some truly awesome possibilities. We’re able to define a web API at the type level. We can then define handler functions using the parameters the endpoints expect. Servant handles all the work of marshalling back and forth between the HTTP request and the native Haskell types. It also ensures a match between the endpoints and the handler function types!

If you want to see even more of the possibilities that Servant offers, you should watch my talk from Bayhac. It goes through some more advanced concepts like authentication and client side functions. You can get the slides and all the code examples for that talk here.

If you’ve never tried Haskell before, there’s no time like the present to start! Download our Getting Started Checklist for some tools to help start your Haskell journey!


Trouble with Databases? Persevere with Persistent!

Our recent series at Monday Morning Haskell focused on machine learning. In particular, we did a deep dive on the Haskell Tensor Flow library. While AI is indeed a huge area, it doesn't account for the bulk of day-to-day work. To build even a basic production system, there are a multitude of simpler tasks. In our newest series, we'll be learning a host of different libraries to perform these tasks!

In this first article we’ll discuss Persistent. Many libraries allow you to make a quick SQL call. But Persistent does much more than that. With Persistent, you can link your Haskell types to your database definition. You can also make type-safe queries to save yourself the hassle of decoding data. All in all, it's a very cool system.

All the code for this series will be on Github! To follow along with this article, take a look at the persistent branch.

Our Basic Type

We’ll start by considering a simple user type that looks like this:

data User = User
  { userName :: Text
  , userEmail :: Text
  , userAge :: Int
  , userOccupation :: Text
  }

Imagine we want to store objects of this type in an SQL database. We’ll first need to define the table to store our users. We could do this with a manual SQL command or through an editor, but regardless, the process will be error prone. The command would look something like this:

create table users (
  name varchar(100),
  email varchar(100),
  age bigint,
  occupation varchar(100)
)

When we do this, there's nothing linking our Haskell data type to the table structure. If we update the Haskell code, we have to remember to update the database. And this means writing another error-prone command.

From our Haskell program, we’ll also want to make SQL queries based on the structure of the user. We could write out these raw commands and execute them, but the same issues apply. This method is error prone and not at all type-safe. Persistent helps us solve these problems.

Persistent and Template Haskell

We can get these bonuses from Persistent without all that much extra code! To do this, we’re going to use Template Haskell (TH). We’ve seen TH once in the past when we were deriving lenses and prisms for different data types. On that occasion we noted a few pros and cons of TH. It does allow us to avoid writing some boilerplate code. But it will make our compile times longer as well. It will also make our code less accessible to inexperienced Haskellers. With lenses though, it only saved us a few dozen lines of total code. With Persistent, TH generates a lot more code, so the pros definitely outweigh the cons.

When we created lenses with TH, we used a simple declaration makeLenses. Here, we’ll do something a bit more complicated. We’ll use a language construct called a “quasi-quoter”. This is a block of code that follows some syntax designed by the programmer or in a library, rather than normal Haskell syntax. It is often used in libraries that do some sort of foreign function interface. We delimit a quasi-quoter by a combination of brackets and pipes. Here’s what the Template Haskell call looks like. The quasi-quoter is the final argument:

import qualified Database.Persist.TH as PTH

PTH.share [PTH.mkPersist PTH.sqlSettings, PTH.mkMigrate "migrateAll"] [PTH.persistLowerCase|

|]

The share function takes a list of settings and then the quasi-quoter itself. It then generates the necessary Template Haskell for our data schema. Within this section, we'll define all the different types our database will use. We also annotate them with settings describing how those types should behave. In particular, we specify sqlSettings, so everything we do here will focus on an SQL database. More importantly, we also create a migration function, migrateAll. After this Template Haskell gets compiled, this function will allow us to migrate our DB. This means it will create all our tables for us!

But before we see this in action, we need to re-define our user type. Instead of defining User in the normal Haskell way, we’re going to define it within the quasi-quoter. Note that this level of Template Haskell requires many compiler extensions. Here’s our definition:

{-# LANGUAGE TemplateHaskell            #-}
{-# LANGUAGE QuasiQuotes                #-}
{-# LANGUAGE TypeFamilies               #-}
{-# LANGUAGE MultiParamTypeClasses      #-}
{-# LANGUAGE GADTs                      #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE RecordWildCards            #-}
{-# LANGUAGE FlexibleInstances          #-}
{-# LANGUAGE OverloadedStrings          #-}



PTH.share [PTH.mkPersist PTH.sqlSettings, PTH.mkMigrate "migrateAll"] [PTH.persistLowerCase|
  User sql=users
    name Text
    email Text
    age Int
    occupation Text
    UniqueEmail email
    deriving Show Read
|]

There are a lot of similarities to a normal data definition in Haskell. We’ve changed the formatting and reversed the order of the types and names. But you can still tell what’s going on. The field names are all there. We’ve still derived basic instances like we would in Haskell.

But we’ve also added some new directives. For instance, we’ve stated what the table name should be (by default it would be user, not users). We’ve also created a UniqueEmail constraint. This tells our database that each user has to have a unique email. The migration will handle creating all the necessary indices for this to work!

This Template Haskell will generate the normal Haskell data type for us. All fields will have the prefix user and will be camel-cased, as we specified. The compiler will also generate certain special instances for our type. These will enable us to use Persistent's type-safe query functions. Finally, this code generates lenses that we'll use as filters in our queries, as we'll see later.
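As a quick peek ahead, the UniqueEmail constraint also gives us a UniqueEmail constructor we can use with Persistent's getBy function to look a user up by email. Here's a minimal sketch, using the Entity type from the next section and the runAction helper we'll define further below (the function name is our own, and the usual Text import is assumed):

fetchUserByEmail :: ConnectionString -> Text -> IO (Maybe (Entity User))
fetchUserByEmail connString email = runAction connString (getBy (UniqueEmail email))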

Entities and Keys

Persistent also has a construct allowing us to handle database IDs. For each type we put in the schema, we'll have a corresponding Entity type. An Entity refers to a row in our database, and it associates a database ID with the object itself. The database ID has the type Key User, which is a wrapper around Int64 (we convert back and forth with toSqlKey and fromSqlKey). So the following would look like a valid entity:

import Database.Persist (Entity(..))

sampleUser :: Entity User
sampleUser = Entity (toSqlKey 1) $ User
  { userName = "admin"
  , userEmail = "admin@test.com"
  , userAge = 23
  , userOccupation = "System Administrator"
  }

This nice little abstraction allows us to avoid muddling our user type with the database ID. Our other code can keep working with a more pure User type.

The SqlPersistT Monad

So now that we have the basics of our schema, how do we actually interact with our database from Haskell code? As a specific example, we'll be accessing a PostgreSQL database. This requires the SqlPersistT monad. All the query functions return actions in this monad. The monad transformer has to live on top of a monad that is MonadIO, since we obviously need IO to run database queries.

If we're trying to make a database query from a normal IO function, the first thing we need is a ConnectionString. This string encodes information about the location of the database. The connection string generally has five components: the host/IP address, the port, the database username, the database name, and the password. So for instance if you're running Postgres on your local machine, you might have something like:

{-# LANGUAGE OverloadedStrings #-}

import Database.Persist.Postgresql (ConnectionString)

connString :: ConnectionString
connString = "host=127.0.0.1 port=5432 user=postgres dbname=postgres password=password"

Now that we have the connection string, we’re set to call withPostgresqlConn. This function takes the string and then a function requiring a backend:

-- Also various constraints on the monad m
withPostgresqlConn :: (IsSqlBackend backend) => ConnectionString -> (backend -> m a) -> m a

The IsSqlBackend constraint forces us to use a type that conforms to Persistent's guidelines. The SqlPersistT monad is only a synonym for ReaderT backend. So in general, the only thing we'll do with this backend is use it as an argument to runReaderT. Once we've done this, we can run any action that lives in SqlPersistT.

import Control.Monad.Logger (LoggingT, runStdoutLoggingT)
import Database.Persist.Postgresql (ConnectionString, withPostgresqlConn, SqlPersistT)

…

runAction :: ConnectionString -> SqlPersistT (LoggingT IO) a -> IO a
runAction connectionString action = runStdoutLoggingT $ withPostgresqlConn connectionString $ \backend ->
  runReaderT action backend

Note we add in a call to runStdoutLoggingT so that our action can log its results, as Persistent expects. This is necessary whenever we use withPostgresqlConn. Here's how we would run our migration function:

migrateDB :: IO ()
migrateDB = runAction connString (runMigration migrateAll)

This will create the users table, perfectly to spec with our data definition!

Queries

Now let’s wrap up by taking a quick examination of the kinds of queries we can run. The first thing we could do is insert a new user into our database. For this, Persistent has the insert function. When we insert the user, we’ll get a key for that user as a result. Here’s the type signature for insert specified to our particular User type:

insert :: (MonadIO m) => User -> SqlPersistT m (Key User)

Then of course we can also do things in reverse. Suppose we have a key for our user and we want to get it out of the database. We'll want the get function. Of course this might fail if there is no corresponding user in the database, so we need a Maybe:

get :: (MonadIO m) => Key User -> SqlPersistT m (Maybe User)

We can use these functions on any type satisfying the PersistRecordBackend class. This is included for free when we use the Template Haskell approach. So you can use these queries on any type that lives in your schema.
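Here's a small sketch combining the two: we insert a user and then immediately read it back using the key we got (the createAndFetch name is our own):

createAndFetch :: ConnectionString -> User -> IO (Maybe User)
createAndFetch connString user = runAction connString $ do
  newKey <- insert user
  get newKey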

But SQL allows us to do much more than query with the key. Suppose we want to get all the users that meet certain criteria. We’ll want to use the selectList function, which replicates the behavior of the SQL SELECT command. It takes a couple different arguments for the different ways to run a selection. The two list types look a little complicated, but we’ll examine them in more detail:

selectList
  :: PersistRecordBackend val backend
  => [Filter val]
  -> [SelectOpt val]
  -> SqlPersistT m [Entity val]

As before, the PersistRecordBackend constraint is satisfied by any type in our TH schema. So we know our User type fits. So let’s examine the first argument. It provides a list of different filters that will determine which elements we fetch. For instance, suppose we want all users who are younger than 25 and whose occupation is “Teacher”. Remember the lenses I mentioned that get generated? We’ll create two different filters on this by using these lenses.

selectYoungTeachers :: (MonadIO m, MonadLogger m) => SqlPersistT m [Entity User]
selectYoungTeachers = selectList [UserAge <. 25, UserOccupation ==. "Teacher"] []

We use the UserAge lens and the UserOccupation lens to choose the fields to filter on. We use a "less-than" operator to state that the age must be smaller than 25. Similarly, we use the ==. operator to match on the occupation. Then we provide an empty list of SelectOpts.

The second list of selection operations provides some other features we might expect in a select statement. First, we can provide an ordering on our returned data. We'll also use the generated lenses here. For instance, Asc UserEmail will order our list by email. Here's what an ordered version of our query would look like:

selectYoungTeachers' :: (MonadIO m) => SqlPersistT m [Entity User]
selectYoungTeachers' = selectList [UserAge <. 25, UserOccupation ==. "Teacher"] [Asc UserEmail]

The other types of SelectOpts include limits and offsets. For instance, we can further modify this query to exclude the first 5 users (as ordered by email) and then limit our selection to 100:

selectYoungTeachers' :: (MonadIO m) => SqlPersistT m [Entity User]
selectYoungTeachers' = selectList
  [UserAge <. 25, UserOccupation ==. "Teacher"] [Asc UserEmail, OffsetBy 5, LimitTo 100]

And that’s all there is to making queries that are type-safe and sensible. We know we’re actually filtering on values that make sense for our types. We don’t have to worry about typos ruining our code at runtime.

Conclusion

Persistent gives us some excellent tools for interacting with databases from Haskell. The Template Haskell mechanisms generate a lot of boilerplate code that helps us. For instance, we can migrate our database to create the correct tables for our Haskell types. We can also perform queries that filter results in a type-safe way. All in all, it's a fantastic experience.

Never tried Haskell? Do other languages frustrate you with runtime errors whenever you try to run SQL queries? You should give Haskell a try! Download our Getting Started Checklist for some tools that will help you!

If you’re a little familiar with Haskell but aren’t sure how to incorporate libraries like Persistent, you should check out our Stack mini-course. It’ll walk you through the basics of making a simple Haskell program using Stack.


Getting the User's Opinion: Options in Haskell

GUIs are hard to create. Luckily for us, we can often get away with making our code available through a command line interface. As you start writing more Haskell programs, you'll probably have to do this at some point.

This article will go over some of the ins and outs of CLIs. In particular, we'll look at the basics of handling options. Then we'll see some nifty techniques for actually testing the behavior of our CLI.

A Simple Sample

To motivate the examples in this article, let's design a simple program. We'll have the user input a message. Then we'll print the message to a file a certain number of times and list the user's name as the author at the top. We'll also allow them to uppercase the message if they want. So we'll get five pieces of input from the user:

  1. The filename they want
  2. Their name to place at the top
  3. Whether they want to uppercase or not
  4. The message
  5. The repetition number

We’ll use arguments and options for the first three pieces of information. Then we'll have a command line prompt for the other two. For instance, we’ll insist the user pass the expected file name as an argument. Then we’ll take an option for the name the user wants to put at the top. Finally, we’ll take a flag for whether the user wants the message upper-cased. So here are a few different invocations of the program.

>> run-cli "myfile.txt" -n "John Doe"
What message do you want in the file?
Sample Message
How many times should it be repeated?
5

This will print the following output to myfile.txt:

From: John Doe
Sample Message
Sample Message
Sample Message
Sample Message
Sample Message

Here’s another run, this time with an error in the input:

>> run-cli "myfile2.txt" -n "Jane Doe" -u
What message do you want in the file?
A new message
How many times should it be repeated?
asdf
Sorry, that isn't a valid number. Please enter a number.
3

This file will look like:

From: Jane Doe
A NEW MESSAGE
A NEW MESSAGE
A NEW MESSAGE

Finally, if we don’t get the right arguments, we should get a usage error:

>> run-cli
Missing: FILENAME -n USERNAME

Usage: CLIPractice-exe FILENAME -n USERNAME [-u]
  Command Line Sample Program

Getting the Input

So the most important aspect of the program is getting the message and repetitions. We’ll ignore the options for now. We’ll print a couple messages, and then use the getLine function to get their input. There’s no way for them to give us a bad message, so this section is easy.

getMessage :: IO String
getMessage = do
  putStrLn "What message do you want in the file?"
  getLine

But they might try to give us a number we can’t actually parse. So for this task, we’ll have to set up a loop where we keep asking the user for a number until they give us a good value. This will be recursive in the failure case. If the user won’t enter a valid number, they’ll have no choice but to terminate the program by other means.

getRepetitions :: IO Int
getRepetitions = do
  putStrLn "How many times should it be repeated?"
  getNumber

getNumber :: IO Int
getNumber = do
  rep <- getLine
  case readMaybe rep of
    Nothing -> do
      putStrLn "Sorry, that isn't a valid number. Please enter a number."
      getNumber
    Just i -> return i

Once we're done reading the input, we'll print the output to a file. For now, we hard-code all the options. Here's the full program.

import Data.Char (toUpper)
import System.IO (writeFile)
import Text.Read (readMaybe)

runCLI :: IO ()
runCLI = do
  let fileName = "myfile.txt"
  let userName = "John Doe"
  let isUppercase = False
  message <- getMessage
  reps <- getRepetitions
  writeFile fileName (fileContents userName message reps isUppercase)

fileContents :: String -> String -> Int -> Bool -> String
fileContents userName message repetitions isUppercase = unlines $
  ("From: " ++ userName) :
  (replicate repetitions finalMessage)
  where
    finalMessage = if isUppercase
      then map toUpper message
      else message

Parsing Options

Now we have to deal with the question of how we actually parse the different options. We can do this by hand with the getArgs function, but this is somewhat error prone. A better option in general is to use the Options.Applicative library. We’ll explore the different possibilities this library allows. We’ll use three different helper functions for the three pieces of input we need.

The first thing we'll do is build a data structure to hold the different options we want. We want to know the file name to write to, the name to put at the top, and the uppercase status.

data CommandOptions = CommandOptions
  { fileName :: FilePath
  , userName :: String
  , isUppercase :: Bool }

Now we need to parse each of these. We'll start with the uppercase value. The simplest parser we have is the flag function. If a particular flag (we'll call it -u) is present, we'll uppercase the message; otherwise we won't. Here's how it gets coded with the Options library:

uppercaseParser :: Parser Bool
uppercaseParser = flag False True (short 'u')

Notice we use short in the final argument to denote the flag character. We could also use the switch function, since this flag is only a boolean, but this version is more general.
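For reference, the switch version is a drop-in replacement, since switch is equivalent to flag False True:

uppercaseParser :: Parser Bool
uppercaseParser = switch (short 'u')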

Now we’ll move on to the argument for the filename. This uses the argument helper function. We’ll use a string parser (str) to ensure we get the actual string. We won’t worry about the filename having a particular format here. Notice we add some metadata to this argument. This tells the user what they are missing if they don’t use the proper format.

fileNameParser :: Parser String
fileNameParser = argument str (metavar "FILENAME")

Finally, we’ll deal with the option of what name will go at the top. We could also do this as an argument, but let’s see what the option is like. An argument is a required positional parameter. An option on the other hand comes after a particular flag. We also add metadata here for a better error message as well. The short piece of our metadata ensures it will use the option character we want.

userNameParser :: Parser String
userNameParser = option str (short 'n' <> metavar "USERNAME")
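If we also wanted a long-form version of the flag (say --username), we could combine more modifiers with the Monoid instance. A sketch (long would need to be added to our import list):

userNameParser :: Parser String
userNameParser = option str (short 'n' <> long "username" <> metavar "USERNAME")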

Now we have to combine these different parsers and add a little more info about our program.

import Options.Applicative (execParser, info, helper, Parser, fullDesc, 
  progDesc, short, metavar, flag, argument, str, option)

parseOptions :: IO CommandOptions
parseOptions = execParser $ info (helper <*> commandOptsParser) commandOptsInfo
  where
    commandOptsParser = CommandOptions <$> fileNameParser <*> userNameParser <*> uppercaseParser
    commandOptsInfo = fullDesc <> progDesc "Command Line Sample Program"

-- Revamped to take options
runCLI :: CommandOptions -> IO ()
runCLI commandOptions = do
  let file = fileName commandOptions
  let user = userName commandOptions
  let uppercase = isUppercase commandOptions
  message <- getMessage
  reps <- getRepetitions
  writeFile file (fileContents user message reps uppercase)

And now we're done! We build our command object using these three different parsers. We chain the operations together using applicatives! Then we pass the result to our main program. If you aren't too familiar with functors and applicatives, we went over these a while ago on the blog. Refresh your memory!

IO Testing

Now that we have our program working, we need to ask ourselves how to test its behavior. We can do manual command line tests ourselves, but it would be nice to have an automated solution. The key to this is the Handle abstraction.

Let’s first look at some basic file handling types.

openFile :: FilePath -> IO Handle
hGetLine :: Handle -> IO String
hPutStrLn :: Handle -> String -> IO ()
hClose :: Handle -> IO ()

Normally when we write something to a file, we open a handle for it. We use the handle (instead of the string literal name) for all the different operations. When we’re done, we close the handle.

The good news is that the stdin and stdout streams are actually the exact same Handle type under the hood!

stdin :: Handle
stdout :: Handle

How does this help us test? The first step is to abstract away the handles we’re working with. Instead of using print and getLine, we’ll want to use hGetLine and hPutStrLn. Then we take these parameters as arguments to our program and functions. Let’s look at our reading functions:

getMessage :: Handle -> Handle -> IO String
getMessage inHandle outHandle = do
  hPutStrLn outHandle "What message do you want in the file?"
  hGetLine inHandle

getRepetitions :: Handle -> Handle -> IO Int
getRepetitions inHandle outHandle = do
  hPutStrLn outHandle "How many times should it be repeated?"
  getNumber inHandle outHandle

getNumber :: Handle -> Handle -> IO Int
getNumber inHandle outHandle = do
  rep <- hGetLine inHandle
  case readMaybe rep of
    Nothing -> do
      hPutStrLn outHandle "Sorry, that isn't a valid number. Please enter a number."
      getNumber inHandle outHandle
    Just i -> return i

Once we’ve done this, we can make the input and output handles parameters to our program as follows. Our wrapper executable will pass stdin and stdout:

-- Library File:
runCLI :: Handle -> Handle -> CommandOptions -> IO ()
runCLI inHandle outHandle commandOptions = do
  let file = fileName commandOptions
  let user = userName commandOptions
  let uppercase = isUppercase commandOptions
  message <- getMessage inHandle outHandle
  reps <- getRepetitions inHandle outHandle
  writeFile file (fileContents user message reps uppercase)

-- Executable File
main :: IO ()
main = do
  options <- parseOptions
  runCLI stdin stdout options

Now our library API takes the handles as parameters. This means in our testing code, we can pass whatever handle we want to test the code. And, as you may have guessed, we’ll do this with files, instead of stdin and stdout. We’ll make one file with our expected terminal output:

What message do you want in the file?
How many times should it be repeated?

We’ll make another file with our input:

Sample Message
5

And then the file we expect to be created:

From: John Doe
Sample Message
Sample Message
Sample Message
Sample Message
Sample Message

Now we can write a test that calls our library function. It will pass the expected options object as well as the proper file handles. Then we can compare the actual terminal output and generated file against our expected versions.

import Lib

import System.IO
import Test.HUnit

main :: IO ()
main = do
  inputHandle <- openFile "input.txt" ReadMode
  outputHandle <- openFile "terminal_output.txt" WriteMode
  runCLI inputHandle outputHandle options
  hClose inputHandle
  hClose outputHandle
  expectedTerminal <- readFile "expected_terminal.txt"
  actualTerminal <- readFile "terminal_output.txt"
  expectedFile <- readFile "expected_output.txt"
  actualFile <- readFile "testOutput.txt"
  assertEqual "Terminal Output Should Match" expectedTerminal actualTerminal
  assertEqual "Output File Should Match" expectedFile actualFile

options :: CommandOptions
options = CommandOptions "testOutput.txt" "John Doe" False

And that’s it! We can also use this process to add tests around the error cases, like when the user enters invalid numbers.
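For instance, an error-case test might look like the following sketch, where the file names are hypothetical and "bad_input.txt" would contain an invalid number (like asdf) before a valid one:

errorCaseTest :: IO ()
errorCaseTest = do
  inputHandle <- openFile "bad_input.txt" ReadMode
  outputHandle <- openFile "error_terminal_output.txt" WriteMode
  runCLI inputHandle outputHandle options
  hClose inputHandle
  hClose outputHandle
  -- The expected terminal output would include the extra error prompt
  expectedTerminal <- readFile "expected_error_terminal.txt"
  actualTerminal <- readFile "error_terminal_output.txt"
  assertEqual "Terminal Output Should Include Error Prompt" expectedTerminal actualTerminal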

Summary

Writing a command line interface isn't always the easiest task. Getting a user’s input sometimes requires creating loops if they won’t give you the information you want. Then dealing with arguments can be a major pain. The Options.Applicative library contains many option parsing tools. It helps you deal with flags, options, and arguments. When you're ready to test your program, you'll want to abstract the file handles away. You can use stdin and stdout from your main executable. But then when you test, you can use files as your input and output handles.

Want to try writing a CLI but don't know Haskell yet? No sweat! Download our Getting Started Checklist and get going learning the language!

When you're making a full project with executables and test suites, you need to keep organized! Take our FREE Stack mini-course to learn how to organize your Haskell with Stack.

Read More
James Bowen James Bowen

Playing Match Maker

In last week’s article we saw an introduction to the Functional Graph Library. This is a neat little library that allows us to build graphs in Haskell. It then makes it very easy to solve basic graph problems. For instance, we could plug our graph into a function that would give us the shortest path. In another example, we found the minimum spanning tree of our graph with a single function call.

Those examples were rather contrived though. Our "input" was already in a graph format more or less, so we didn't have to think much to convert it. Then we solved arbitrary problems without providing any real context. In programming, graph algorithms often come up when you're least expecting them! We'll prove this with a sample problem.

Motivating Example

Suppose we're building a house. We have several people who are working on the house, and they all have various tasks to do. They need certain tools to do these tasks. As long as a person gets a tool for one of the jobs they're working on, they can make progress. Of course, we have a limited supply of tools. So suppose we have this set of tools:

Hammer
Hammer
Power Saw
Ladder
Ladder
Ladder
Caulking Gun

And now we have the following people working on this house who all have the following needs:

Jason, Hammer, Ladder, Caulking Gun
Amanda, Hammer
Kristina, Caulking Gun
Chad, Ladder
Josephine, Power Saw
Chris, Power Saw, Ladder
Dennis, Caulking Gun, Hammer

We want to find an assignment of tools to people such that as many people as possible get at least one tool they can use. In this situation, we can actually find an assignment that gets all seven people a tool:

Jason - Ladder
Amanda - Hammer
Kristina - Caulking Gun
Chad - Ladder
Josephine - Power Saw
Chris - Ladder
Dennis - Hammer

We’ll read our problem in from a handle like we did last time, and assume we first read the number of tools, then people. Our output will be the list of tools and then a map from each person’s name to the list of tools they can use.

module Tools where

import           Control.Monad (replicateM)
import           Data.List.Split (splitOn)
import           System.IO (hGetLine, Handle)

readInput :: Handle -> IO ([String], [(String, [String])])
readInput handle = do
  numTools <- read <$> hGetLine handle
  numPeople <- read <$> hGetLine handle
  tools <- replicateM numTools (hGetLine handle)
  people <- replicateM numPeople (readPersonLine handle)
  return (tools, people)

readPersonLine :: Handle -> IO (String, [String]) 
readPersonLine handle = do
  line <- hGetLine handle
  let components = splitOn ", " line
  return (head components, tail components)

Some Naive Solutions

Now our first guess might be to try a greedy algorithm. We’ll iterate through the list of tools, find the first person in the list who can use that tool, and recurse on the rest. This might look a little like this:

solveToolsGreedy :: Handle -> IO Int
solveToolsGreedy handle = do
  (tools, people) <- readInput handle
  return $ findMaxMatchingGreedy tools people

findMaxMatchingGreedy :: [String] -> [(String, [String])] -> Int 
findMaxMatchingGreedy [] _ = 0 -- No more tools to match
findMaxMatchingGreedy (tool : rest) personMap = case break (containsTool tool) personMap of
  (allPeople, []) -> findMaxMatchingGreedy rest personMap -- Can't match this tool
  (somePeople, (_ : otherPeople)) -> 1 + findMaxMatchingGreedy rest (somePeople ++ otherPeople)

containsTool :: String -> (String, [String]) -> Bool
containsTool tool pair = tool `elem` (snd pair)

Unfortunately, this could lead to some sub-optimal outcomes. In this case, our greed might cause us to assign the caulking gun to Jason, and then Kristina won’t be able to use any tools.

So now let's try to fix this by making 2 recursive calls! We'll find the first person we can assign the tool to (or otherwise drop the tool). Once we've done this, we'll imagine two scenarios. In case 1, this person will use the tool, so we can remove the tool and the person from our lists. Then we'll recurse on the remainder, and add 1. In case 2, this person will NOT use the tool, so we'll recurse except REMOVE the tool from that person's list.

findMaxMatchingSlow :: [String] -> [(String, [String])] -> Int
findMaxMatchingSlow [] _ = 0
findMaxMatchingSlow allTools@(tool : rest) personMap = 
  case break (containsTool tool) personMap of
    (allPeople, []) -> findMaxMatchingSlow rest personMap -- Can't match this tool
    (somePeople, (chosen : otherPeople)) -> max useIt loseIt
      where
        useIt = 1 + findMaxMatchingSlow rest (somePeople ++ otherPeople)
        loseIt = findMaxMatchingSlow allTools newList
        newList = somePeople ++ (modifiedChosen : otherPeople)
        modifiedChosen = dropTool tool chosen

dropTool :: String -> (String, [String]) -> (String, [String])
dropTool tool (name, validTools) = (name, delete tool validTools)

The good news is that this will get us the optimal solution! It solves our simple case quite well! The bad news is that it will take too long on more difficult cases. A naive use-it-or-lose-it algorithm like this will take exponential time (O(2^n)). This means even for modest input sizes (~100) we’ll be waiting for a loooong time. Anything much larger takes prohibitively long. Plus, there’s no way for us to memoize the solution here.

Graphs to the Rescue!

So at this point, are we condemned to choose between a fast inaccurate algorithm and a correct but slow one? In this case the answer is no! This problem is actually best solved by using a graph algorithm! This is an example of what's called a "bipartite matching" problem. We'll create a graph with two sets of nodes. On the left, we'll have a node for each tool. On the right, we'll have a node for each person. The only edges in our graph will go from nodes on the left towards nodes on the right. A "tool" node has an edge to a "person" node if that person can use the tool. Here's a partial representation of our graph (plus or minus my design skills). We've only drawn in the edges related to Amanda, Kristina and Josephine so far.

Now we want to find the “maximum matching” in this graph. That is, we want the largest set of edges such that no two edges share a node. The way to solve this problem using graph algorithms is to turn it into yet ANOTHER graph problem! We’ll add a node on the far left, called the “source” node. We’ll connect it to every “tool” node. Now we’ll add a node on the far right, called the “sink” node. It will receive an edge from every “person” node. All the edges in this graph have a distance of 1.

Again, most of the middle edges are missing here.

The size of the maximum matching in this case is equal to the “max flow” from the source node to the sink node. This is a somewhat advanced concept. But imagine there is water gushing out of the source node and that every edge is a “pipe” whose label (1) is its capacity. We want the largest amount of water that can flow through to the sink at once.

Last week we saw built-in functions for shortest path and min spanning tree. FGL also has an out-of-the-box solution for max flow. So our main goal now is to take our objects and construct the above graph.
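
If you want to convince yourself that maxFlow behaves as advertised, here’s a tiny hand-built sanity check. The graph and the names tinyFlow and tinyGraph are just for illustration: source 1 connects to nodes 2 and 3, which both connect to sink 4, and every edge has capacity 1.

import Data.Graph.Inductive.Graph (mkGraph)
import Data.Graph.Inductive.PatriciaTree (Gr)
import Data.Graph.Inductive.Query.MaxFlow (maxFlow)

tinyFlow :: Int
tinyFlow = maxFlow tinyGraph 1 4
  where
    tinyGraph :: Gr () Int
    tinyGraph = mkGraph
      [(1, ()), (2, ()), (3, ()), (4, ())]
      [(1, 2, 1), (1, 3, 1), (2, 4, 1), (3, 4, 1)]

-- Two disjoint unit-capacity paths exist, so tinyFlow == 2.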

Preparing Our Solution

A couple weeks ago, we created a segment tree that was very specific to the problem. This time, we’ll show what it’s like to write a more generic algorithm. Throughout the rest of the article, you can imagine that a is a tool, and b is a person. We’ll write a general maxMatching function that will take a list of a’s, a list of b’s, AND a predicate function. This function will determine whether an a object and a b object should have an edge between them. We’ll use the containsTool function from above as our predicate. Then we'll call our general function.

findMaxMatchingBest :: [String] -> [(String, [String])] -> Int
findMaxMatchingBest tools personMap = findMaxMatching containsTool tools personMap

…(different module)

findMaxMatching :: (a -> b -> Bool) -> [a] -> [b] -> Int
findMaxMatching predicate as bs = ...

Building Our Graph

To build our graph, we’ll have to decide on our labels. Once again, we’ll only label our edges with integers. In fact, they’ll all have a “capacity” label of 1. But our nodes will be a little more complicated. We’ll want to associate the node with the object, and we have a heterogeneous (and polymorphic) set of items. We’ll make this NodeLabel type that could refer to any of the four types of nodes:

data NodeLabel a b = 
  LeftNode a |
  RightNode b |
  SourceNode |
  SinkNode
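
For our tool-matching problem, a would be the tool String and b would be the person pair. So hypothetical label values would look like this:

type ToolPersonLabel = NodeLabel String (String, [String])

-- A "tool" node label on the left side of the graph
caulkingGunLabel :: ToolPersonLabel
caulkingGunLabel = LeftNode "Caulking Gun"

-- A "person" node label on the right side of the graph
kristinaLabel :: ToolPersonLabel
kristinaLabel = RightNode ("Kristina", ["Caulking Gun"])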

Next we’ll start building our graph by constructing the inner part. We’ll make the two sets of nodes as well as the edges connecting them. We’ll assign the left nodes to the indices from 1 up through the size of that list. And then the right nodes will take on the indices from one above the first list's size through the sum of the list sizes.

createInnerGraph 
  :: (a -> b -> Bool) 
  -> [a]
  -> [b]
  -> ([LNode (NodeLabel a b)], [LNode (NodeLabel a b)], [LEdge Int])
createInnerGraph predicate as bs = ...
  where
    sz_a = length as
    sz_b = length bs
    aNodes = zip [1..sz_a] (LeftNode <$> as)
    bNodes = zip [(sz_a + 1)..(sz_a + sz_b)] (RightNode <$> bs)

Next we’ll also make tuples matching each index to the item itself, without its node label wrapper. This will allow us to call the predicate on these items. We’ll then get all the edges out by using a list comprehension: we'll pull out each pairing and determine whether the predicate holds. If it does, we’ll add the edge.

where
  ...
  indexedAs = zip [1..sz_a] as
  indexedBs = zip [(sz_a + 1)..(sz_a + sz_b)] bs
  nodesAreConnected (_, aItem) (_, bItem) = predicate aItem bItem
  edges = [ (fst aN, fst bN, 1)
          | aN <- indexedAs, bN <- indexedBs, nodesAreConnected aN bN ]

Now we’ve got all our pieces, so we combine them to complete the definition:

createInnerGraph predicate as bs = (aNodes, bNodes, edges)
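
As a quick sanity check, here’s what the inner graph looks like on our hypothetical two-tool example from earlier:

-- Tools get indices 1 and 2; Jason and Kristina get 3 and 4.
innerExample :: ( [LNode (NodeLabel String (String, [String]))]
                , [LNode (NodeLabel String (String, [String]))]
                , [LEdge Int] )
innerExample = createInnerGraph containsTool
  ["Caulking Gun", "Hammer"]
  [ ("Jason", ["Caulking Gun", "Hammer"])
  , ("Kristina", ["Caulking Gun"])
  ]

-- The edge list comes out as [(1,3,1), (1,4,1), (2,3,1)]: both people
-- can use the caulking gun (node 1), but only Jason (node 3) can use
-- the hammer (node 2).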

Now we’ll construct the “total graph”. This will include the source and sink nodes. It will include the indices of these nodes in the return value so that we can use them in our algorithm:

totalGraph :: (a -> b -> Bool) -> [a] -> [b] 
  -> (Gr (NodeLabel a b) Int, Int, Int)

Now we’ll start our definition by getting all the pieces out of the inner graph as well as the size of each list. Then we’ll assign the index for the source and sink to be the numbers after these combined sizes. We’ll also make the nodes themselves and give them the proper labels.

totalGraph predicate as bs = ...
  where
    sz_a = length as
    sz_b = length bs
    (leftNodes, rightNodes, middleEdges) = createInnerGraph predicate as bs
    sourceIndex = sz_a + sz_b + 1
    sinkIndex = sz_a + sz_b + 2
    sourceNode = (sourceIndex, SourceNode)
    sinkNode = (sinkIndex, SinkNode)

Now to finish this definition, we’ll first create edges from the source out to each of the left nodes. Then we'll make edges from each of the right nodes to the sink. We’ll also use list comprehensions there. Then we’ll combine all our nodes and edges into two lists.

where
  ...
  sourceEdges = [(sourceIndex, lIndex, 1) | lIndex <- fst <$> leftNodes]
  sinkEdges = [(rIndex, sinkIndex, 1) | rIndex <- fst <$> rightNodes]
  allNodes = sourceNode : sinkNode : (leftNodes ++ rightNodes)
  allEdges = sourceEdges ++ middleEdges ++ sinkEdges

Finally, we’ll complete the definition by making our graph. As we noted, we'll also return the source and sink indices:

totalGraph predicate as bs = (mkGraph allNodes allEdges, sourceIndex, sinkIndex)
  where
    ...

The Grand Finale

OK one last step! We can now fill in our findMaxMatching function. We’ll first get the necessary components from building the graph. Then we’ll call the maxFlow function. This works out of the box, just like sp and msTree from the last article!

import Data.Graph.Inductive.Graph (LNode, LEdge, mkGraph)
import Data.Graph.Inductive.PatriciaTree (Gr)
import Data.Graph.Inductive.Query.MaxFlow (maxFlow)

findMaxMatching :: (a -> b -> Bool) -> [a] -> [b] -> Int
findMaxMatching predicate as bs = maxFlow graph source sink
  where
    (graph, source, sink) = totalGraph predicate as bs
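
To close the loop, here’s the full pipeline run on the hypothetical case where greedy failed:

-- The graph algorithm finds the assignment greedy missed.
bestCase :: Int
bestCase = findMaxMatchingBest
  ["Caulking Gun", "Hammer"]
  [ ("Jason", ["Caulking Gun", "Hammer"])
  , ("Kristina", ["Caulking Gun"])
  ]
-- bestCase == 2: the flow routes the hammer to Jason and the
-- caulking gun to Kristina.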

And we’re done! This will always give us the correct answer, and unlike the use-it-or-lose-it version, it runs in polynomial rather than exponential time. Take a look at the code on Github if you want to experiment with it!

Conclusion

Whew, algorithms are exhausting, aren’t they? That was a ton of code we just wrote, so let’s do a quick review. This time around we looked at a problem that wasn’t an obvious graph problem. We tried a couple of different algorithmic approaches, but they both had issues. Ultimately, we found that a graph algorithm was the solution, and we were able to implement it with FGL.

If you want to use FGL (or most any awesome Haskell library), it would help a ton if you learned how to use Stack! This great tool wraps project organization and package management into one tool. Check out our FREE Stack mini-course and learn more!

If you’ve never programmed in Haskell at all, then what are you waiting for? It’s super fun! You should download our Getting Started Checklist for some tips and resources on starting out!

Stay tuned next week for more on the Monday Morning Haskell Blog!
