Join the Club: Type-safe Joins with Esqueleto!

Oct 30

In the last four articles or so, we’ve done a real whirlwind tour of Haskell libraries. We created a database schema using Persistent and used it to write basic SQL queries in a type-safe way. We saw how to expose this database via an API with Servant. We also went ahead and added some caching to that server with Redis. Finally, we wrote some basic tests around the behavior of this API. By using Docker, we made those tests reproducible.

In this article, we’re going to review this whole process by adding another type to our schema. We’ll write some new endpoints for an Article type, and link this type to our existing User type with a foreign key. Then we’ll learn one more library: Esqueleto. Esqueleto improves on Persistent by allowing us to write type-safe SQL joins.

As with the previous articles, there’s a specific branch on the Github repository for this series. Go there and take a look at the esqueleto branch to see the complete code for this article.

Adding Article to our Schema

So our first step is to extend our schema with our Article type. We’re going to give each article a title, some body text, and a timestamp for its publishing time. One new feature we’ll see is that we’ll add a foreign key referencing the user who wrote the article. Here’s what it looks like within our schema:

PTH.share [PTH.mkPersist PTH.sqlSettings, PTH.mkMigrate "migrateAll"] [PTH.persistLowerCase|
 User sql=users
   ...

 Article sql=articles
   title Text
   body Text
   publishedTime UTCTime
   authorId UserId
   UniqueTitle title
   deriving Show Read Eq
|]

We can use UserId as a type in our schema. This will create a foreign key column when we create the table in our database. In practice, our Article type will look like this when we use it in Haskell:

data Article = Article
 { articleTitle :: Text
 , articleBody :: Text
 , articlePublishedTime :: UTCTime
 , articleAuthorId :: Key User
 }

This means it doesn’t reference the entire user. Instead, it contains the SQL key of that user. Since we’ll be adding the article to our API, we need to add ToJSON and FromJSON instances as well. These are pretty basic as well, so you can check them out here if you’re curious. If you’re curious about JSON instances in general, take a look at this article.

Adding Endpoints

Now we’re going to extend our API to expose certain information about these articles. First, we’ll write a couple basic endpoints for creating an article and then fetching it by its ID:

type FullAPI = 
      "users" :> Capture "userid" Int64 :> Get '[JSON] User
 :<|> "users" :> ReqBody '[JSON] User :> Post '[JSON] Int64
 :<|> "articles" :> Capture "articleid" Int64 :> Get '[JSON] Article
 :<|> "articles" :> ReqBody '[JSON] Article :> Post '[JSON] Int64

Now, we’ll write a couple special endpoints. The first will take a User ID as a key and then it will provide all the different articles the user has written. We’ll do this endpoint as /articles/author/:authorid.

...
 :<|> "articles" :> "author" :> Capture "authorid" Int64 :> Get '[JSON] [Entity Article]

Our last endpoint will fetch the most recent articles, up to a limit of 10. This will take no parameters and live at the /articles/recent route. It will return tuples of users and their articles, both as entities.

…
 :<|> "articles" :> "recent" :> Get '[JSON] [(Entity User, Entity Article)]

Adding Queries (with Esqueleto!)

Before we can actually implement these endpoints, we’ll need to write the basic queries for them. For creating an article, we use the standard Persistent insert function:

createArticlePG :: PGInfo -> Article -> IO Int64
createArticlePG connString article = fromSqlKey <$> runAction connString (insert article)

We could do the same for the basic fetch endpoint. But we’ll write this basic query using Esqueleto in the interest of beginning to learn the syntax. With Persistent, we used list parameters to specify different filters and SQL operations. Esqueleto instead uses a special monad to compose the different type of query. The general format of an esqueleto select call will look like this:

fetchArticlePG :: PGInfo -> Int64 -> IO (Maybe Article)
fetchArticlePG connString aid = runAction connString selectAction
 where
   selectAction :: SqlPersistT (LoggingT IO) (Maybe Article)
   selectAction = select . from $ \articles -> do
     ...

We use select . from and then provide a function that takes a table variable. Our first queries will only refer to a single table, but we'll see a join later. To complete the function, we’ll provide the monadic action that will incorporate the different parts of our query.

The most basic filtering function we can call from within this monad is where_. This allows us to provide a condition on the query, much as we could with the filter list from Persistent.

selectAction :: SqlPersistT (LoggingT IO) (Maybe Article)
   selectAction = select . from $ \articles -> do
     where_ (articles ^. ArticleId ==. val (toSqlKey aid))

First, we use the ArticleId lens to specify which value of our table we’re filtering. Then we specify the value to compare against. We not only need to lift our Int64 into an SqlKey, but we also need to lift that value using the val function.

But now that we’ve added this condition, all we need to do is return the table variable. Now, select returns our results in a list. But since we’re searching by ID, we only expect one result. We’ll use listToMaybe so we only return the head element if it exists. We’ll also use entityVal once again to unwrap the article from its entity.

selectAction :: SqlPersistT (LoggingT IO) (Maybe Article)
   selectAction = ((fmap entityVal) . listToMaybe) <$> (select . from $ \articles -> do
     where_ (articles ^. ArticleId ==. val (toSqlKey aid))
     return articles)

Now we should know enough that we can write out the next query. It will fetch all the articles that have written by a particular user. We’ll still be querying on the articles table. But now instead checking the article ID, we’ll make sure the ArticleAuthorId is equal to a certain value. Once again, we’ll lift our Int64 user key into an SqlKey and then again with val to compare it in “SQL-land”.

fetchArticleByAuthorPG :: PGInfo -> Int64 -> IO [Entity Article]
fetchArticleByAuthorPG connString uid = runAction connString fetchAction
 where
   fetchAction :: SqlPersistT (LoggingT IO) [Entity Article]
   fetchAction = select . from $ \articles -> do
     where_ (articles ^. ArticleAuthorId ==. val (toSqlKey uid))
     return articles

And that’s the full query! We want a list of entities this time, so we’ve taken out listToMaybe and entityVal.

Now let’s write the final query, where we’ll find the 10 most recent articles regardless of who wrote them. We’ll include the author along with each article. So we’re returning a list of of these different tuples of entities. This query will involve our first join. Instead of using a single table for this query, we’ll use the InnerJoin constructor to combine our users table with the articles table.

fetchRecentArticlesPG :: PGInfo -> IO [(Entity User, Entity Article)]
fetchRecentArticlesPG connString = runAction connString fetchAction
 where
   fetchAction :: SqlPersistT (LoggingT IO) [(Entity User, Entity Article)]
   fetchAction = select . from $ \(users `InnerJoin` articles) -> do

Since we’re joining two tables together, we need to specify what columns we’re joining on. We’ll use the on function for that:

fetchAction :: SqlPersistT (LoggingT IO) [(Entity User, Entity Article)]
   fetchAction = select . from $ \(users `InnerJoin` articles) -> do
     on (users ^. UserId ==. articles ^. ArticleAuthorId)

Now we’ll order our articles based on the timestamp of the article using orderBy. The newest articles should come first, so we'll use a descending order. Then we limit the number of results with the limit function. Finally, we’ll return both the users and the articles, and we’re done!

fetchAction :: SqlPersistT (LoggingT IO) [(Entity User, Entity Article)]
   fetchAction = select . from $ \(users `InnerJoin` articles) -> do
     on (users ^. UserId ==. articles ^. ArticleAuthorId)
     orderBy [desc (articles ^. ArticlePublishedTime)]
     limit 10
     return (users, articles)

Caching Different Types of Items

We won’t go into the details of caching our articles in Redis, but there is one potential issue we want to observe. Currently we’re using a user’s SQL key as their key in our Redis store. So for instance, the string “15” could be such a key. If we try to naively use the same idea for our articles, we’ll have a conflict! Trying to store an article with ID “15” will overwrite the entry containing the User!

But the way around this is rather simple. What we would do is that for the user’s key, we would make the string something like users:15. Then for our article, we’ll have its key be articles:15. As long as we deserialize it the proper way, this will be fine.

Filling in the Server handlers

Now that we’ve written our database query functions, it is very simple to fill in our Server handlers. Most of them boil down to following the patterns we’ve already set with our other two endpoints:

fetchArticleHandler :: PGInfo -> Int64 -> Handler Article
fetchArticleHandler pgInfo aid = do
 maybeArticle <- liftIO $ fetchArticlePG pgInfo aid
 case maybeArticle of
   Just article -> return article
   Nothing -> Handler $ (throwE $ err401 { errBody = "Could not find article with that ID" })

createArticleHandler :: PGInfo -> Article -> Handler Int64
createArticleHandler pgInfo article = liftIO $ createArticlePG pgInfo article

fetchArticlesByAuthorHandler :: PGInfo -> Int64 -> Handler [Entity Article]
fetchArticlesByAuthorHandler pgInfo uid = liftIO $ fetchArticlesByAuthorPG pgInfo uid

fetchRecentArticlesHandler :: PGInfo -> Handler [(Entity User, Entity Article)]
fetchRecentArticlesHandler pgInfo = liftIO $ fetchRecentArticlesPG pgInfo

Then we’ll complete our Server FullAPI like so:

fullAPIServer :: PGInfo -> RedisInfo -> Server FullAPI
fullAPIServer pgInfo redisInfo =
 (fetchUsersHandler pgInfo redisInfo) :<|>
 (createUserHandler pgInfo) :<|>
 (fetchArticleHandler pgInfo) :<|>
 (createArticleHandler pgInfo) :<|>
 (fetchArticlesByAuthorHandler pgInfo) :<|>
 (fetchRecentArticlesHandler pgInfo)

One interesting thing we can do is that we can compose our API types into different sections. For instance, we could separate our FullAPI into two parts. First, we could have the UsersAPI type from before, and then we could make a new type for ArticlesAPI. We can glue these together with the e-plus operator just as we could individual endpoints!

type FullAPI = UsersAPI :<|> ArticlesAPI

type UsersAPI =
      "users" :> Capture "userid" Int64 :> Get '[JSON] User
 :<|> "users" :> ReqBody '[JSON] User :> Post '[JSON] Int64

type ArticlesAPI =
 "articles" :> Capture "articleid" Int64 :> Get '[JSON] Article
 :<|> "articles" :> ReqBody '[JSON] Article :> Post '[JSON] Int64
 :<|> "articles" :> "author" :> Capture "authorid" Int64 :> Get '[JSON] [Entity Article]
 :<|> "articles" :> "recent" :> Get '[JSON] [(Entity User, Entity Article)]

If we do this, we’ll have to make similar adjustments in other areas combining the endpoints. For example, we would need to update the server handler joining and the client functions.

Writing Tests

Since we already have some user tests, it would also be good to have a few tests on the Articles section of the API. We’ll add one simple test around creating an article and then fetching it. Then we’ll add one test each for the "articles-by-author" and "recent articles" endpoints.

So one of the tricky parts of filling in this section will be that we need to make test Article object. But we'll need them to be functions on the User ID. This is because we can’t know a priori what SQL IDs we'll get when we insert the users into the database. But we can fill in all the other fields, including the published time. Here’s one example, but we’ll have a total of 18 different “test” articles.

testArticle1 :: Int64 -> Article
testArticle1 uid = Article
 { articleTitle = "First post"
 , articleBody = "A great description of our first blog post body."
 , articlePublishedTime = posixSecondsToUTCTime 1498914000
 , articleAuthorId = toSqlKey uid
 }

-- 17 other articles and some test users as well
…

Our before hooks will create all these different entities in the database. In general, we’ll go straight to the database without calling the API itself. Like with our users tests, we'll want to delete any database items we create. Let's write a generic after-hook that will take user IDs and article IDs and delete them from our database:

deleteArtifacts :: PGInfo -> RedisInfo -> [Int64] -> [Int64] -> IO ()
deleteArtifacts pgInfo redisInfo users articles = do
 void $ forM articles $ \a -> deleteArticlePG pgInfo a
 void $ forM users $ \u -> do
   deleteUserCache redisInfo u
   deleteUserPG pgInfo u

It’s important we delete the articles first! If we delete the users first, we'll encounter foreign key exceptions!

Our basic create-and-fetch test looks a lot like the previous user tests. We test the success of the response and that the new article lives in Postgres as we expect.

beforeHook4 :: ClientEnv -> PGInfo -> IO (Bool, Bool, Int64, Int64)
beforeHook4 clientEnv pgInfo = do
 userKey <- createUserPG pgInfo testUser2
 articleKeyEither <- runClientM (createArticleClient (testArticle1 userKey)) clientEnv
 case articleKeyEither of
   Left _ -> error "DB call failed on spec 4!"
   Right articleKey -> do
     fetchResult <- runClientM (fetchArticleClient articleKey) clientEnv
     let callSucceeds = isRight fetchResult
     articleInPG <- isJust <$> fetchArticlePG pgInfo articleKey
     return (callSucceeds, articleInPG, userKey, articleKey)

spec4 :: SpecWith (Bool, Bool, Int64, Int64)
spec4 = describe "After creating and fetching an article" $ do
 it "The fetch call should return a result" $ \(succeeds, _, _, _) -> succeeds `shouldBe` True
 it "The article should be in Postgres" $ \(_, inPG, _, _) -> inPG `shouldBe` True

afterHook4 :: PGInfo -> RedisInfo -> (Bool, Bool, Int64, Int64) -> IO ()
afterHook4 pgInfo redisInfo (_, _, uid, aid) = deleteArtifacts pgInfo redisInfo [uid] [aid]

Our next test will create two different users and several different articles. We'll first insert the users and get their keys. Then we can use these keys to create the articles. We create five articles in this test. We assign three to the first user, and two to the second user:

beforeHook5 :: ClientEnv -> PGInfo -> IO ([Article], [Article], Int64, Int64, [Int64])
beforeHook5 clientEnv pgInfo = do
 uid1 <- createUserPG pgInfo testUser3
 uid2 <- createUserPG pgInfo testUser4
 articleIds <- mapM (createArticlePG pgInfo)
   [ testArticle2 uid1, testArticle3 uid1, testArticle4 uid1
   , testArticle5 uid2, testArticle6 uid2 ]
 ...

Now we want to test that we when call the articles-by-user endpoint, we only get the right articles. We’ll return each group of articles, the user IDs, and the list of article IDs:

beforeHook5 :: ClientEnv -> PGInfo -> IO ([Article], [Article], Int64, Int64, [Int64])
beforeHook5 clientEnv pgInfo = do
 uid1 <- createUserPG pgInfo testUser3
 uid2 <- createUserPG pgInfo testUser4
 articleIds <- mapM (createArticlePG pgInfo)
   [ testArticle2 uid1, testArticle3 uid1, testArticle4 uid1
   , testArticle5 uid2, testArticle6 uid2 ]
 firstArticles <- runClientM (fetchArticlesByAuthorClient uid1) clientEnv
 secondArticles <- runClientM (fetchArticlesByAuthorClient uid2) clientEnv
 case (firstArticles, secondArticles) of
   (Right as1, Right as2) -> return (entityVal <$> as1, entityVal <$> as2, uid1, uid2, articleIds)
   _ -> error "Spec 5 failed!"

Now we can write the assertion itself, testing that the articles returned are what we expect.

spec5 :: SpecWith ([Article], [Article], Int64, Int64, [Int64])
spec5 = describe "When fetching articles by author ID" $ do
 it "Fetching by the first author should return 3 articles" $ \(firstArticles, _, uid1, _, _) ->
   firstArticles `shouldBe` [testArticle2 uid1, testArticle3 uid1, testArticle4 uid1]
 it "Fetching by the second author should return 2 articles" $ \(_, secondArticles, _, uid2, _) ->
   secondArticles `shouldBe` [testArticle5 uid2, testArticle6 uid2]

We would then follow that up with a similar after hook.

The final test will follow a similar pattern. Only this time, we’ll be checking the combinations of users and articles. We’ll also make sure to include 12 different articles to test that the API limits results to 10.

beforeHook6 :: ClientEnv -> PGInfo -> IO ([(User, Article)], Int64, Int64, [Int64])
beforeHook6 clientEnv pgInfo = do
 uid1 <- createUserPG pgInfo testUser5
 uid2 <- createUserPG pgInfo testUser6
 articleIds <- mapM (createArticlePG pgInfo)
   [ testArticle7 uid1, testArticle8 uid1, testArticle9 uid1, testArticle10 uid2
   , testArticle11 uid2, testArticle12 uid1, testArticle13 uid2, testArticle14 uid2
   , testArticle15 uid2, testArticle16 uid1, testArticle17 uid1, testArticle18 uid2
   ]
 recentArticles <- runClientM fetchRecentArticlesClient clientEnv
 case recentArticles of
   Right as -> return (entityValTuple <$> as, uid1, uid2, articleIds)
   _ -> error "Spec 6 failed!"
 where
   entityValTuple (Entity _ u, Entity _ a) = (u, a)

Our spec will check that the list of 10 articles we get back matches our expectations. Then, as always, we remove the entities from our database.

Now we call these tests with our other tests, with small wrappers to call the hooks:

main :: IO ()
main = do
 ...
 hspec $ before (beforeHook4 clientEnv pgInfo) $ after (afterHook4 pgInfo redisInfo) $ spec4
 hspec $ before (beforeHook5 clientEnv pgInfo) $ after (afterHook5 pgInfo redisInfo) $ spec5
 hspec $ before (beforeHook6 clientEnv pgInfo) $ after (afterHook6 pgInfo redisInfo) $ spec6

And now we’re done! The tests pass!

…
After creating and fetching an article
 The fetch call should return a result
 The article should be in Postgres

Finished in 0.1698 seconds
2 examples, 0 failures

When fetching articles by author ID
 Fetching by the first author should return 3 articles
 Fetching by the second author should return 2 articles

Finished in 0.4944 seconds
2 examples, 0 failures

When fetching recent articles
 Should fetch exactly the 10 most recent articles

Conclusion

This completes our overview of useful production libraries. Over these articles, we’ve constructed a small web API from scratch. We’ve seen some awesome abstractions that let us deal with only the most important pieces of the project. Both Persistent and Servant generated a lot of extra boilerplate for us. This article showed the power of the Esqueleto library in allowing us to do type-safe joins. We also saw an end-to-end process of adding a new type and endpoints to our API.

In the coming weeks, we’ll be dealing with some more issues that can arise when building these kinds of systems. In particular, we’ll see how we can use alternative monads on top of Servant. Doing this can present certain issues that we'll explore. We’ll culminate by exploring the different approaches to encapsulating effects.

Be sure to check out our Haskell Stack mini-course!! It'll show you how to use Stack, so you can incorproate all the libraries from this series!

If you’re new to Haskell and not ready for that yet, take a look at our Getting Started Checklist and get going!

LibraryProduction

James Bowen