James Bowen 6/16/22

Just Catching a Few Things

We've now seen a few of the different nuances in handling exceptions within our code. Earlier this month we learned about the "catch" and "handle" functions, which are the backbone of capturing exceptions in our code. And last time around we saw why it matters that these functions only catch particular types of exceptions.

Today we'll go over a new pair of handling functions. These allow us to narrow down the range of exceptions we'll handle, rather than catching every exception of a particular type. These functions are catchJust, and its flipped counterpart, handleJust. Here are the type signatures:

catchJust :: Exception e =>
  (e -> Maybe b) ->
  IO a ->
  (b -> IO a) ->
  IO a

handleJust :: Exception e =>
  (e -> Maybe b) ->
  (b -> IO a) ->
  IO a ->
  IO a

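The defining feature of these handler functions is the filter predicate at the start: the first argument of type e -> Maybe b. This takes the input exception and returns a Maybe value. If the predicate returns Nothing, the exception is not handled and keeps propagating. If it returns Just, the wrapped value is passed to the handler. That value can be some transformation of the exception input.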

Let's make a simple example using our ListException type. Let's recall what this type looks like:

data ListException =
  ListIsEmpty String |
  NotEnoughElements String Int Int
  deriving (Show)

instance Exception ListException

As a simple example, let's write a predicate that will only capture our ListIsEmpty exception. It will return the name of the function causing the error.

isEmptyList :: ListException -> Maybe String
isEmptyList (ListIsEmpty functionName) = Just functionName
isEmptyList _ = Nothing

Now we'll write a function that will process a list and print its first element. But if it is empty, it will print the name of the function. This will use catchJust.

printFirst :: [Int] -> IO Int
printFirst input = catchJust isEmptyList action handler
  where
    action :: IO Int
    action = do
      let result = myHead input
      print result
      return result

    handler :: String -> IO Int
    handler functionName = do
      putStrLn $ "Caught Empty List exception from function: " ++ functionName ++ ". Returning 0!"
      print 0
      return 0

Now when we run this, we'll see the error message we expect:

main :: IO ()
main = do
  result1 <- printFirst []
  result2 <- printFirst [2, 3, 4]
  print $ result1 + result2


...

Caught Empty List exception from function: myHead. Returning 0!
0
2
2

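But suppose we write an analogous function, printSums, that calls "sum2Pairs" instead (which throws NotEnoughElements, rather than ListIsEmpty) while keeping the same isEmptyList predicate. The predicate returns Nothing for that exception, so we'll still see the exception!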

sum2Pairs :: (Num a) => [a] -> (a, a)
sum2Pairs (a : b : c : d : _) = (a + b, c + d)
sum2Pairs input = throw (NotEnoughElements "sum2Pairs" 4 (length input))

newMain :: IO ()
newMain = do
  result1 <- printSums []
  result2 <- printSums [2, 3, 4, 5]
  print $ (result1, result2)

...

>> stack exec my-program
my-program: NotEnoughElements "sum2Pairs" 4 0
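
The article doesn't show printSums itself, so here is a minimal sketch of what it might look like, assuming it mirrors printFirst and keeps the isEmptyList predicate. The body of printSums and the use of try in main (to observe the escaping exception instead of crashing) are assumptions for illustration.

```haskell
import Control.Exception

data ListException =
  ListIsEmpty String |
  NotEnoughElements String Int Int
  deriving (Show)

instance Exception ListException

isEmptyList :: ListException -> Maybe String
isEmptyList (ListIsEmpty functionName) = Just functionName
isEmptyList _ = Nothing

sum2Pairs :: (Num a) => [a] -> (a, a)
sum2Pairs (a : b : c : d : _) = (a + b, c + d)
sum2Pairs input = throw (NotEnoughElements "sum2Pairs" 4 (length input))

-- Hypothetical: assumed to mirror printFirst, still filtering with isEmptyList.
printSums :: [Int] -> IO (Int, Int)
printSums input = catchJust isEmptyList action handler
  where
    action :: IO (Int, Int)
    action = do
      let result = sum2Pairs input
      print result
      return result

    handler :: String -> IO (Int, Int)
    handler functionName = do
      putStrLn $ "Caught Empty List exception from function: " ++ functionName
      return (0, 0)

main :: IO ()
main = do
  -- The predicate returns Nothing for NotEnoughElements, so catchJust
  -- rethrows it. Wrapping the call in try lets us observe that.
  outcome <- try (printSums []) :: IO (Either ListException (Int, Int))
  case outcome of
    Left e -> putStrLn $ "Escaped the catchJust handler: " ++ show e
    Right r -> print r
```

Without the outer try, the uncaught exception would end the program as in the output above.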

We can modify the predicate so that it always catches exceptions from a particular function and gives different error messages depending on the exception thrown:

isSum2Pairs :: ListException -> Maybe ListException
isSum2Pairs e@(ListIsEmpty function) = if function == "sum2Pairs'"
  then Just e
  else Nothing
isSum2Pairs e@(NotEnoughElements function _ _) = if function == "sum2Pairs'"
  then Just e
  else Nothing

Now let's modify sum2Pairs so that it can throw either error type, depending on its input:

sum2Pairs' :: (Num a) => [a] -> (a, a)
sum2Pairs' (a : b : c : d : _) = (a + b, c + d)
sum2Pairs' [] = throw (ListIsEmpty "sum2Pairs'")
sum2Pairs' input = throw (NotEnoughElements "sum2Pairs'" 4 (length input))

When we use this updated version in our main function, we'll see we get a variety of outputs!

printSums' :: [Int] -> IO (Int, Int)
printSums' input = catchJust isSum2Pairs action handler
  where
    action :: IO (Int, Int)
    action = do
      let result = sum2Pairs' input
      print result
      return result

    handler :: ListException -> IO (Int, Int)
    handler e = do
      putStrLn $ "Caught exception: " ++ show e ++ ". Returning (0, 0)!"
      print (0, 0)
      return (0, 0)

newMain :: IO ()
newMain = do
  result1 <- printSums' []
  result2 <- printSums' [2, 3, 4]
  print $ (result1, result2)
...

>> stack exec my-program
Caught exception: ListIsEmpty "sum2Pairs'". Returning (0, 0)!
(0,0)
Caught exception: NotEnoughElements "sum2Pairs'" 4 3. Returning (0, 0)!
(0,0)
((0,0),(0,0))
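
The examples so far only use catchJust. For comparison, here is a sketch of the first example rewritten with handleJust, which takes the handler before the action. The name printFirst' is made up for this sketch; everything else follows the definitions above.

```haskell
import Control.Exception

data ListException =
  ListIsEmpty String |
  NotEnoughElements String Int Int
  deriving (Show)

instance Exception ListException

isEmptyList :: ListException -> Maybe String
isEmptyList (ListIsEmpty functionName) = Just functionName
isEmptyList _ = Nothing

myHead :: [a] -> a
myHead [] = throw (ListIsEmpty "myHead")
myHead (a : _) = a

-- Hypothetical name. Same behavior as printFirst, but the handler comes
-- first, so the action can be written as a trailing do-block.
printFirst' :: [Int] -> IO Int
printFirst' input = handleJust isEmptyList handler $ do
  let result = myHead input
  print result
  return result
  where
    handler :: String -> IO Int
    handler functionName = do
      putStrLn $ "Caught Empty List exception from function: " ++ functionName ++ ". Returning 0!"
      return 0

main :: IO ()
main = do
  result1 <- printFirst' []
  result2 <- printFirst' [2, 3, 4]
  print $ result1 + result2
```

Which of the two to use is mostly a matter of which argument is longer: handleJust tends to read better when the action is a multi-line block.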

Next time, we'll look at a more practical usage of this approach with IO Errors! Until then, make sure you subscribe to our monthly newsletter so you can stay up to date with the latest news!

James Bowen 6/13/22

Exception Type Details

A couple articles ago, we defined a basic exception type. Today, we'll go over some more details behind the way these exception types work. We'll consider how one might catch all exceptions, but also why this might not be a good idea.

Here's how we defined our exception type:

data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show)

instance Exception ListException

This indicates two different kinds of failures we might have when trying to process a list in a function. As long as we define or derive a Show instance, we can simply say instance Exception, and we'll be able to treat this type as an exception, because the class has no minimum definition.

So far, our example is a simple enumeration. But of course it's also possible to add data to these exception constructors. Let's suppose we want to know what function triggered the failure in the first type, and how many elements we expected and observed in the second type. Let's also define a custom Show instance.

data ListException =
  ListIsEmpty String |
  NotEnoughElements String Int Int

instance Show ListException where
  show (ListIsEmpty function) = "The function '" ++ function ++ "' requires a non-empty list!"
  show (NotEnoughElements function expected observed) =
    "The function '" ++ function ++ "' expected " ++ show expected ++ " elements but only got " ++
    show observed ++ " elements."

Now we can rewrite our functions to add this information.

myHead :: [a] -> a
myHead [] = throw (ListIsEmpty "myHead")
myHead (a : _) = a

sum2Pairs :: (Num a) => [a] -> (a, a)
sum2Pairs (a : b : c : d : _) = (a + b, c + d)
sum2Pairs input = throw (NotEnoughElements "sum2Pairs" 4 (length input))

And we can see this in action:

main :: IO ()
main = do
  result0 <- try (evaluate (myHead [])) :: IO (Either ListException Int)
  print result0
  result1 <- try (evaluate (sum2Pairs [2, 3, 4])) :: IO (Either ListException (Int, Int))
  print result1
  result2 <- try (evaluate (sum2Pairs [2, 3, 4, 5])) :: IO (Either ListException (Int, Int))
  print result2

...

>> stack exec my-program
Left The function 'myHead' requires a non-empty list!
Left The function 'sum2Pairs' expected 4 elements but only got 3 elements.
Right (5,9)

Now we didn't have to implement any custom functions to make our type an exception. But if we wanted to, we could! There are three functions we can override, but they all have appropriate default behaviors. The first of these functions is displayException. You can use it to provide a second way to display the exception beyond the Show instance, if you desire that for whatever reason. However, the Show instance still has priority when the error is thrown by the system.

Let's try keeping the derived instance of Show, but use our new function as the display message.

data ListException =
  ListIsEmpty String |
  NotEnoughElements String Int Int
  deriving (Show)

instance Exception ListException where
  displayException (ListIsEmpty function) = "The function '" ++ function ++ "' requires a non-empty list!"
  displayException (NotEnoughElements function expected observed) =
    "The function '" ++ function ++ "' expected " ++ show expected ++ " elements but only got " ++
    show observed ++ " elements."

We'll find that our program uses the Show instance.

main :: IO ()
main = do
  return (myHead ([] :: [Int])) >>= print

...

>> stack exec my-program
my-program: ListIsEmpty "myHead"
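
To actually see the custom message, we can catch the exception and call displayException on it ourselves. The following is a small sketch along those lines, reusing the definitions from this article; the labels printed in main are just for illustration.

```haskell
import Control.Exception

data ListException =
  ListIsEmpty String |
  NotEnoughElements String Int Int
  deriving (Show)

instance Exception ListException where
  displayException (ListIsEmpty function) =
    "The function '" ++ function ++ "' requires a non-empty list!"
  displayException (NotEnoughElements function expected observed) =
    "The function '" ++ function ++ "' expected " ++ show expected ++
    " elements but only got " ++ show observed ++ " elements."

myHead :: [a] -> a
myHead [] = throw (ListIsEmpty "myHead")
myHead (a : _) = a

main :: IO ()
main = do
  result <- try (evaluate (myHead ([] :: [Int]))) :: IO (Either ListException Int)
  case result of
    Left e -> do
      -- The derived Show instance and displayException give different text.
      putStrLn ("show:             " ++ show e)
      putStrLn ("displayException: " ++ displayException e)
    Right x -> print x
```

Calling displayException explicitly like this does not depend on how the runtime chooses to print uncaught exceptions.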

The other two functions in the definition require us to learn an additional concept: SomeException.

class Exception e where
  toException :: e -> SomeException
  fromException :: SomeException -> Maybe e

The type SomeException is essentially a wrapper type for all exceptions in Haskell. When the system receives and throws your exception, it is always wrapped as SomeException under the hood. So in a way, this acts like the base Exception class in a language like Java or Python. However, it acts like a wrapper instead of a "parent" class due to the lack of type-based inheritance in Haskell.

data SomeException = forall e. Exception e => SomeException e

The two functions above would allow you to override how you transform your exception type back and forth with the SomeException type. However, there's rarely any reason to override this behavior.
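
To make the wrapping concrete, here is a short sketch of the default round trip using our exception type. It only uses toException and fromException with their default implementations; ArithException is a standard exception type from Control.Exception, picked here simply as "some other type".

```haskell
import Control.Exception

data ListException =
  ListIsEmpty String |
  NotEnoughElements String Int Int
  deriving (Show)

instance Exception ListException

main :: IO ()
main = do
  let wrapped = toException (ListIsEmpty "myHead") :: SomeException
  -- Unwrapping at the matching type succeeds.
  print (fromException wrapped :: Maybe ListException)
  -- Unwrapping at a different exception type yields Nothing.
  print (fromException wrapped :: Maybe ArithException)
```

This is the same check that catch and handle rely on when they decide whether a handler's type matches the exception that was thrown.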

Now, since every exception is SomeException, this means we could catch every possible exception with a handler function. Let's recall our previous example where we could catch a ListException but not a file-based IO exception for opening a non-existent file:

main :: IO ()
main = do
  handle handler $ readFile "does_not_exist.txt" >>= print
  handle handler $ do
    result <- return (sum2Pairs [2, 3, 4])
    print result
  where
    handler :: ListException -> IO ()
    handler e = print e

...

>> stack exec my-program
my-program: does_not_exist.txt: openFile: does not exist (No such file or directory)

If we modify our handler to take SomeException instead of ListException, it will catch both types!

main :: IO ()
main = do
  ...
  where
    handler :: SomeException -> IO ()
    handler e = print e

...

>> stack exec my-program
does_not_exist.txt: openFile: does not exist (No such file or directory)
NotEnoughElements "sum2Pairs" 4 3

Typically, this is not a great idea. Haskell's type system allows us to be very specific with the errors we can catch, and we should take advantage of that. If you aren't anticipating a particular error, you shouldn't catch it. And if it pops up, you should adjust your program accordingly. However, catching "any" exception, logging it, and exiting gracefully as we just did IS a reasonable use case mentioned in the documentation on this subject.
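
A rough sketch of that "log it and exit gracefully" use case might look like this. The names logAndExit and realMain are hypothetical, and the choice to log to stderr is an assumption.

```haskell
import Control.Exception
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- Last-resort handler: log whatever escaped, then exit with a failure code.
logAndExit :: SomeException -> IO a
logAndExit e = do
  hPutStrLn stderr ("Fatal error: " ++ displayException e)
  exitFailure

-- Hypothetical application body that fails with an IO error.
realMain :: IO ()
realMain = readFile "does_not_exist.txt" >>= putStrLn

main :: IO ()
main = handle logAndExit realMain
```

One thing to keep in mind with this pattern: exitWith and friends work by throwing an exception too, so a SomeException handler wrapped around realMain would also intercept any exit calls made inside it.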

The "proper" way to handle multiple exception types is to daisy chain handle calls with different types like so:

main :: IO ()
main = handle ioHandler $ handle listHandler $ do
  readFile "does_not_exist.txt" >>= print 
  result <- return (sum2Pairs [2, 3, 4])
  print result
  where
    listHandler :: ListException -> IO ()
    listHandler e = putStrLn $ "List exception: " ++ show e

    ioHandler :: IOError -> IO ()
    ioHandler e = putStrLn $ "IO Error: " ++ show e
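
Control.Exception also offers catches, which takes a list of Handler values instead of nesting handle calls. Here is a sketch of that alternative; the helper name classify is made up, and it returns a String rather than printing so the result is easy to inspect.

```haskell
import Control.Exception

data ListException =
  ListIsEmpty String |
  NotEnoughElements String Int Int
  deriving (Show)

instance Exception ListException

sum2Pairs :: (Num a) => [a] -> (a, a)
sum2Pairs (a : b : c : d : _) = (a + b, c + d)
sum2Pairs input = throw (NotEnoughElements "sum2Pairs" 4 (length input))

-- Hypothetical helper: run an action and report which handler fired.
-- The annotations on e tell GHC which exception type each handler accepts.
classify :: IO () -> IO String
classify action = (action >> return "ok") `catches`
  [ Handler (\e -> return ("List exception: " ++ show (e :: ListException)))
  , Handler (\e -> return ("IO Error: " ++ show (e :: IOException)))
  ]

main :: IO ()
main = do
  classify (evaluate (sum2Pairs [2, 3, 4 :: Int]) >>= print) >>= putStrLn
  classify (readFile "does_not_exist.txt" >>= putStrLn) >>= putStrLn
```

One difference from the nested version: with catches, an exception thrown from inside one handler is not caught by the other handlers in the list, whereas an outer handle would catch exceptions escaping from an inner handler.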

In the next couple articles, we'll explore more ways to catch errors. Stay tuned! If you want access to our subscriber resources, you can sign up for our monthly newsletter!

James Bowen 6/9/22

"Try"-ing It Out First

Earlier this week we explored how to "catch" exceptions using the functions catch and handle. Today we'll learn a couple new tools for this task. The first function we'll look at is try, but in order to really use it, we'll also have to use evaluate.

Like catch, we can use try to turn our exception into a computation that our program can process and react to gracefully. However, instead of taking an exception handler, this function will simply return the exception using an Either value.

try :: Exception e => IO a -> IO (Either e a)

The computation produces the result type a, but could throw an exception e. So we return the type Either e a. All this must be done in the IO monad, like we saw with catch.

Let's recall our previous approach to catching exceptions. Since we had a pure function (sum2Pairs) that could throw the exception, we would use return in order to move it into the IO monad to use catch. We also needed an explicit type signature on our handler function so that our program knows what exceptions it is trying to catch:

main :: IO ()
main = do
  catch (return (sum2Pairs [2, 3, 4]) >>= print) handler
  catch (return (sum2Pairs [2, 3, 4, 5]) >>= print) handler
  where
    handler :: ListException -> IO ()
    handler e = print e

Let's try to substitute try in for these expressions. Once again, we'll explicitly annotate the resulting value with the exception type.

main :: IO ()
main = do
  result1 <- try (return (sum2Pairs [2, 3, 4])) :: IO (Either ListException (Int, Int))
  print result1
  result2 <- try (return (sum2Pairs [2, 3, 4, 5])) :: IO (Either ListException (Int, Int))
  print result2

However, this doesn't work the way we want! Our program crashes on the exceptional case!

my-program: NotEnoughElements

The reason for this lies in Haskell's laziness. The exceptional computation doesn't actually occur until we "need" the value, which is when the print statement happens. But by delaying the computation, our program loses the try context. We can try to wrap the print statement into our "try" block, but it makes our program unnecessarily complicated.

Instead, we have a different tool to help us. This is the evaluate function.

evaluate :: a -> IO a

At first glance, this seems to be the same type as return! It takes a pure value and wraps it in the IO monad. However, it will take care of "evaluating" our expression in a strict (non-lazy) manner. So the computation will occur when we need it to, and we can use "try". If we change our above implementation by swapping evaluate for return, then it works!

main :: IO ()
main = do
  result1 <- try (evaluate (sum2Pairs [2, 3, 4])) :: IO (Either ListException (Int, Int))
  print result1
  result2 <- try (evaluate (sum2Pairs [2, 3, 4, 5])) :: IO (Either ListException (Int, Int))
  print result2

...

Left NotEnoughElements
Right (5,9)
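
One caveat worth sketching: evaluate only forces the outermost constructor of a value (weak head normal form). It works for sum2Pairs because the throw sits in the pattern match that produces the tuple. If the failing expression is buried inside a constructor, it can still slip past try. The example below is an illustration of this, not code from the original series.

```haskell
import Control.Exception

data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show)

instance Exception ListException

myHead :: [a] -> a
myHead [] = throw ListIsEmpty
myHead (a : _) = a

main :: IO ()
main = do
  -- Forcing the value itself triggers the throw inside try.
  direct <- try (evaluate (myHead ([] :: [Int]))) :: IO (Either ListException Int)
  putStrLn (either (const "direct: caught") (const "direct: no exception") direct)

  -- Just is already a constructor, so evaluate stops there and the
  -- failing thunk inside it is never forced within try.
  wrapped <- try (evaluate (Just (myHead ([] :: [Int])))) :: IO (Either ListException (Maybe Int))
  putStrLn (either (const "wrapped: caught") (const "wrapped: no exception yet") wrapped)
```

In the second case the exception would only surface later, when something finally inspects the value inside the Just.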

So now we have a more reliable way of turning our "pure" computations into expressions where we can catch their exceptions. In the next couple articles, we'll focus some more on what we can do with exceptional data types. Until then, make sure to subscribe to our monthly newsletter so you can stay up to date with the latest news and get access to our subscriber resources!

James Bowen 6/6/22

Catching What We’ve Thrown

Last week we learned how to throw exceptions in Haskell. In the next couple articles, we're going to learn how to "catch" them, so that in exceptional circumstances we can still proceed with our program in a sane way.

Now, throwing exceptions disrupted our patterns of type safety quite a bit. We could throw an exception from any piece of seemingly pure code. Even our simple function from a list to an element of that list could invoke throw:

data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show)

instance Exception ListException

myHead :: [a] -> a
myHead [] = throw ListIsEmpty
myHead (a : _) = a

sum2Pairs :: (Num a) => [a] -> (a, a)
sum2Pairs (a : b : c : d : _) = (a + b, c + d)
sum2Pairs _ = throw NotEnoughElements

Unlike throwing exceptions though, we can only "catch" exceptions in the IO monad. As we discussed last month, the IO monad involves a lot of operations to communicate with the outside world, and so it is the most "impure" of monads. Part of this impurity is that we can "intercept" exception signals that are sent to the operating system.

The first function we'll go over this time for catching exceptions is, well, catch. Here's its type signature:

catch :: (Exception e)
  => IO a
  -> (e -> IO a)
  -> IO a

It takes an IO action we would like to perform and then a "handler" for a particular kind of exception that can occur. The handler takes the exception as an input and then produces a new IO action with the same return value. Here's how we can use it in our example:

main :: IO ()
main = do
  catch (return (sum2Pairs [2, 3, 4]) >>= print) handler
  catch (return (sum2Pairs [2, 3, 4, 5]) >>= print) handler
  where
    handler :: ListException -> IO ()
    handler e = print e

...

>> stack exec my-program
NotEnoughElements
(5,9)

Notice we need to wrap our pure computation sum2Pairs in the IO monad using return to catch its exception. Then we need to make it so our handler function returns the same type. In this case, we make that type () and just print the results.

Two final notes. First, the function handle is the same as catch except its arguments are reversed.

handle :: (Exception e)
  => (e -> IO a)
  -> IO a
  -> IO a

This can make for cleaner code in our example. We can put our handler function first and use do-syntax for the computation itself. This is good with lengthier examples.

main :: IO ()
main = do
  handle handler $ do
    result <- return (sum2Pairs [2, 3, 4])
    print result
  handle handler $ do
    result <- return (sum2Pairs [2, 3, 4, 5])
    print result
  where
    handler :: ListException -> IO ()
    handler e = print e

Second, our handler will only catch exceptions that match the type of the handler! We specified the handler as a separate expression with its own type signature because you need to specify what the type is! It wouldn't work to just inline this definition without a type annotation, because GHC would complain about an ambiguous type. So for example, if we opened a non-existent file, our handler would not catch this, and the program would crash:

main :: IO ()
main = do
  handle handler $ readFile "does_not_exist.txt" >>= print
  handle handler $ do
    result <- return (sum2Pairs [2, 3, 4, 5])
    print result
  where
    handler :: ListException -> IO ()
    handler e = print e

...

>> stack exec my-program
my-program: does_not_exist.txt: openFile: does not exist (No such file or directory)
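
On the point about inlining the handler: an inline lambda does work as long as the exception type is annotated somewhere. Here is a sketch of two ways to do that. The first uses the ScopedTypeVariables extension to annotate the lambda's argument; the second annotates the use of e and needs no extension. It uses evaluate rather than return in the first block, which is a substitution on my part to make the forcing explicit.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception

data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show)

instance Exception ListException

sum2Pairs :: (Num a) => [a] -> (a, a)
sum2Pairs (a : b : c : d : _) = (a + b, c + d)
sum2Pairs _ = throw NotEnoughElements

main :: IO ()
main = do
  -- The pattern annotation tells GHC which exception type to catch.
  handle (\(e :: ListException) -> print e) $ do
    result <- evaluate (sum2Pairs [2, 3, 4 :: Int])
    print result
  -- Without the extension, annotating the use of e works too.
  handle (\e -> print (e :: ListException)) $
    print (sum2Pairs [2, 3, 4, 5 :: Int])
```

Either way, the handler still only matches ListException, so the file-not-found error above would escape it just the same.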

It is possible to catch all exceptions, but this is not advisable, as the documentation says. We'll go into more details about that possibility later.

For now, you should check out one of our useful resources for whatever stage of your Haskell journey you are at! If you're just starting out, our Beginners Checklist will help you out. If you're looking to incorporate exceptions into a larger project, try out our production checklist for some more suggestions of libraries to use!

James Bowen 6/2/22

Throwing Exceptions: The Basics

Haskell is a pure, functional, strongly typed language. Unfortunately, this doesn't mean that nothing ever goes wrong or that there are no runtime errors. However, we can still use the type system in a few different ways to denote the specific problems that can occur. In the ideal case of error handling, I see an analogy to the state monad. Haskell "doesn't have mutable state". Except really it does…you just have to specify that mutable state is possible by placing your function in the State monad. Similarly, if we use particular functions, we often find that their types indicate the possibility that errors could arise in the computation.

The blog topic for June is "exceptional cases", so we're going to explore a wide variety of different ways that we can indicate runtime problems in Haskell and, more importantly, how we can write our code to catch these problems so our program doesn't suddenly crash in an unexpected way.

To start this journey, let's learn about "Exceptions" and how to throw them. A language like Java will have a class to represent the idea of exceptions:

class Exception {
  ...
}

This would serve as the base for other exception types. So you might define your own, like a "File" exception:

class FileException extends Exception {
}

Of course Haskell doesn't have classes or use inheritance in the same way. When it comes to inheritance, we rely on typeclasses. So Exception is a typeclass, not a data type.

class (Typeable e, Show e) => Exception e where
  ...

Notice that an exception type must be "showable". This makes sense, since the purpose of exceptions is to print them to the screen for output! They must also be Typeable, but virtually any type you'll make fulfills this constraint without you needing to even specify it.

There isn't a minimum definition for the Exception class. This means it is easy to define your own exception type. So as a first example, let's define an exception to work with lists. Certain list operations expect the list is non-empty, or that it has at least a certain number of elements. So we'll make an enumerated type with two constructors.

data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show)

We can derive the Show class, but we can't actually derive Exception under normal circumstances. However, since we don't need any functions, we just make a trivial instance.

data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show)

instance Exception ListException

So what can we do with exceptions? Well the most important thing is that we can "throw" them to indicate the error has occurred. The throw function has a strange type if you look up the documentation:

throw :: forall (r :: RuntimeRep). forall (a :: TYPE r). forall e. Exception e => e -> a

This is a bit confusing, but to build a basic understanding, we can just look at the last part:

throw :: forall e. Exception e => e -> a

If we have an exception, we can use "throw" to trigger that exception and return any type. The a can be anything we want! All the magic stuff in the type signature essentially allows us to return this exception as "any type".

So for example, we can define a couple functions to operate on lists. These will have the "happy path" where we have enough elements, but they'll also have a failure mode. In the failure mode we'll throw the exception.

myHead :: [a] -> a
myHead [] = throw ListIsEmpty
myHead (a : _) = a

sum2Pairs :: (Num a) => [a] -> (a, a)
sum2Pairs (a : b : c : d : _) = (a + b, c + d)
sum2Pairs _ = throw NotEnoughElements

And when we use these functions, we can see how the exceptions occur:

>> myHead [4, 5]
4
>> myHead []
*** Exception: ListIsEmpty
>> sum2Pairs [5, 6, 7, 8, 9, 10]
(11,15)
>> sum2Pairs [4, 5, 6]
*** Exception: NotEnoughElements

So even though our functions return different types, we can still use throw with our exception type on both of them.

You might also notice that our functions have pure type signatures! So using throw by itself in this way violates our notion of what pure functions ought to do. It's necessary to have this escape hatch in certain circumstances. However, we really want to avoid writing our code in this way if we possibly can.
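
For contrast, here is a sketch of what these functions could look like if the failure were reflected in the type instead of thrown. The names safeHead and safeSum2Pairs are hypothetical, and using Either with our ListException as the error value is just one possible choice.

```haskell
data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show)

-- Hypothetical variants: the failure mode is visible in the return type.
safeHead :: [a] -> Either ListException a
safeHead [] = Left ListIsEmpty
safeHead (a : _) = Right a

safeSum2Pairs :: (Num a) => [a] -> Either ListException (a, a)
safeSum2Pairs (a : b : c : d : _) = Right (a + b, c + d)
safeSum2Pairs _ = Left NotEnoughElements

main :: IO ()
main = do
  print (safeHead [4, 5 :: Int])
  print (safeHead ([] :: [Int]))
  print (safeSum2Pairs [4, 5, 6 :: Int])
```

Here the caller has to deal with the Left case explicitly, and nothing needs to run in IO to detect the failure.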

In the coming weeks, we'll examine how to "catch" these kinds of exceptions so that our code still has some semblance of purity. To stay up to date with the latest Haskell news, make sure to subscribe to our monthly newsletter! This will keep you informed and, even better, give you access to our subscriber resources!

James Bowen 5/30/22

Unit Testing User Interactions

To round out our month of IO, I'd like to bring together several of the topics I've mentioned over the course of the month. A few weeks ago when talking about the interact function, I brought up the example of a command line program that would allow the user to enter simple addition expressions and print out the answer. Then going back to the first article this month, I mentioned how we can use the Handle abstraction to write a program that could work with either terminal input or file input so that we can test it. And finally, we can go all the way back to Monads month for some information on lifting functions and creating our own monad.

Today we're going to combine all these ideas! We'll have a simple command line program that will use a custom monad to abstract away input details, and then write some tests for it!

Let's write this program in a test-driven way. What are the use cases we want? Well each time a user enters a line on the terminal, we'll treat that as an expression to evaluate, and then print the solution.

-- Input
4 + 5
6 + -2

-- Output
9
4

If they enter something that doesn't follow our simple equation format, it should print an appropriate message:

-- Input
4 + 5 + 6
3 +
Hello + Goodbye
4 * 5

-- Output
There are too many parts! Please enter something in the format "x + y"
There are too few parts! Please enter something in the format "x + y"
It doesn't look like those are numbers!
Please only use addition!

And last of all, the program should be able to "recover". So if the user has one incorrect line, they can still enter in another equation and it should work.

-- Input
6 +
9 + 14

-- Output
There are too few parts! Please enter something in the format "x + y"
23

So how will we write this program in a way that we can test it? The key idea is that we'll create a monad that stores the "Handles" we're working with, and then we'll be able to customize it. So let's create a monad type that has a Reader over our input and output handles.

data AppConfig = AppConfig
  { inHandle :: Handle
  , outHandle :: Handle
  }

newtype AppMonad a = AppMonad (ReaderT AppConfig IO a)
  deriving (Functor, Applicative, Monad)

We can start with some simple instances for MonadIO and MonadReader, as well as a "run" function.

instance MonadIO AppMonad where
  liftIO = AppMonad . lift

instance MonadReader AppConfig AppMonad where
  ask = AppMonad ask
  local f (AppMonad a) = AppMonad (local f a)

runApp :: AppMonad a -> (Handle, Handle) -> IO a
runApp (AppMonad action) (inH, outH) = runReaderT action (AppConfig inH outH)

Now we can write some functions that will read and write using our handles.

appGetLine :: AppMonad String
appGetLine = do
  inH <- asks inHandle
  liftIO $ hGetLine inH

appPutStrLn :: String -> AppMonad ()
appPutStrLn output = do
  outH <- asks outHandle
  liftIO $ hPutStrLn outH output

appIsEOF :: AppMonad Bool
appIsEOF = do
  inH <- asks inHandle
  liftIO $ hIsEOF inH

Now let's write the core logic function for our program, taking the line of input and producing a line of output:

evalLine :: String -> String
evalLine input = case splitOn " " input of
  [first, op, second] -> if op /= "+"
    then "Please only use addition!"
    else case (readMaybe first, readMaybe second) of
      (Just x, Just y) -> show (x + y)
      _ -> "It doesn't look like those are numbers!"
  (first : op : second : other : _) -> "There are too many parts! Please enter something in the format \"x + y\""
  _ -> "There are too few parts! Please enter something in the format \"x + y\""
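
Since evalLine is a pure function, we can also check it directly against the use cases listed earlier, before involving any handles. Here is a sketch of that. To keep it limited to base, it swaps splitOn " " for words (an assumption, not the original implementation; the two differ when the input has repeated spaces) and fixes the number type to Int.

```haskell
import Text.Read (readMaybe)

-- Same logic as evalLine above, except that `words` stands in for
-- splitOn " " so that this sketch needs only base.
evalLine :: String -> String
evalLine input = case words input of
  [first, op, second] -> if op /= "+"
    then "Please only use addition!"
    else case (readMaybe first, readMaybe second) of
      (Just x, Just y) -> show (x + y :: Int)
      _ -> "It doesn't look like those are numbers!"
  (_ : _ : _ : _ : _) -> "There are too many parts! Please enter something in the format \"x + y\""
  _ -> "There are too few parts! Please enter something in the format \"x + y\""

main :: IO ()
main = mapM_ (putStrLn . evalLine)
  [ "4 + 5"
  , "6 + -2"
  , "4 + 5 + 6"
  , "3 +"
  , "Hello + Goodbye"
  , "4 * 5"
  ]
```

Checks like these cover the logic, while the file-based tests later in the article cover the input/output loop.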

And now it's straightforward to write the input/output loop:

runCLI :: AppMonad ()
runCLI = go
  where
    go = do
      ended <- appIsEOF
      if ended
        then return ()
        else do
          input <- appGetLine
          let output = evalLine input
          appPutStrLn output
          go

Finally, in the "main" function, we just need to call runApp with the standard handles:

main :: IO ()
main = runApp runCLI (stdin, stdout)

In our testing code, we can then write a function that takes two file paths: a file containing our input, and a file containing our expected output. It will create an input handle from the first file, and create a temporary file (remember that concept?) for our program's output handle.

testCLIProgram :: FilePath -> FilePath -> Assertion
testCLIProgram inputFile expectedOutputFile = do
  currentDir <- getCurrentDirectory
  inH <- openFile inputFile ReadMode
  (actualOutputFile, outH) <- openTempFile currentDir "output.txt"
  ...

Then we'll run our program, which will write all its output to the temporary file. Then we'll reset the output handle to the beginning (remember it's still readable), and compare its contents to those in the expected output. If they match, our program works!

testCLIProgram :: FilePath -> FilePath -> Assertion
testCLIProgram inputFile expectedOutputFile = do
  currentDir <- getCurrentDirectory
  (actualOutputFile, outH) <- openTempFile currentDir "output.txt"
  inH <- openFile inputFile ReadMode
  runApp runCLI (inH, outH)
  hSeek outH AbsoluteSeek 0
  actualOutput <- hGetContents outH
  expectedOutput <- readFile expectedOutputFile
  actualOutput @?= expectedOutput
  hClose inH
  hClose outH
  removeFile actualOutputFile

This lists all our operations in logical order, but it still doesn't necessarily cover all the exceptional cases correctly! We might still want to use the bracket pattern to ensure file cleanup happens correctly. The "resources" we acquire are the temporary file and its handle, the input handle, and the expected output string. We want to close the handles and delete the file once everything is finished running:

testCLIProgram :: FilePath -> FilePath -> Assertion
testCLIProgram inputFile expectedOutputFile = bracket acquire release runTest
  where
    acquire :: IO (FilePath, Handle, Handle, String)
    acquire = do
      currentDir <- getCurrentDirectory
      (actualOutputFile, outH) <- openTempFile currentDir "actual_output.txt"
      inH <- openFile inputFile ReadMode
      expectedOutput <- readFile expectedOutputFile
      return (actualOutputFile, outH, inH, expectedOutput)

    release :: (FilePath, Handle, Handle, String) -> IO ()
    release (fp, outH, inH, _) = do
      hClose outH
      hClose inH
      removeFile fp

    runTest :: (FilePath, Handle, Handle, String) -> IO ()
    runTest (fp, outH, inH, expectedOutput) = do
      runApp runCLI (inH, outH)
      hSeek outH AbsoluteSeek 0
      actualOutput <- hGetContents outH
      actualOutput @?= expectedOutput

And so our "test main" can now just run the different tests as it needs to!

main :: IO ()
main = defaultMain $ testGroup "CLI Tests"  -- testGroup needs a name; this one is arbitrary
  [ testCase "App 1" (testCLIProgram "input1.txt" "output1.txt")
  , testCase "App 2" (testCLIProgram "input2.txt" "output2.txt")
  , testCase "App 3" (testCLIProgram "input3.txt" "output3.txt")
  ]

So in the course of this article, I think we managed to use at least half a dozen of our monad and IO concepts! So hopefully you are beginning to see how all these ideas build on each other and allow you to do some pretty cool things!

Next month, we'll kind of be sticking with the IO theme. But we'll start looking specifically at exceptional cases and the different ways we have to handle those more smoothly. If you want to stay up to date with all the latest topics we're covering at Monday Morning Haskell, make sure you subscribe to our monthly newsletter! If you miss a few articles over the course of the month, you'll always get a summary so you can catch up!

James Bowen 5/26/22

Sizing Up our Files

Earlier this week we went over some basic mechanics with regard to binary files. This week we'll look at a couple functions for dealing with file size. These are perhaps a bit more useful with binary files, but they also work with normal files, as we'll see.

The two functions are very simple. We can get the file size, and we can set the file size:

hFileSize :: Handle -> IO Integer

hSetFileSize :: Handle -> Integer -> IO ()

Getting the file size does exactly what you would expect. It gives us an integer for the number of bytes in the file. We can use this on our bitmap from last time, but also on a normal text file with the lines "First Line" through "Fourth Line".

main :: IO ()
main = do
  h1 <- openFile "pic_1.bmp" ReadMode
  h2 <- openFile "testfile.txt" ReadMode
  hFileSize h1 >>= print
  hFileSize h2 >>= print

...

822
46

Note however, that we cannot get the file size of terminal handles, since these aren't, of course, files. A potential hope would be that this would return the number of bytes we've written to standard out so far, or the (strictly read) number of bytes we get in stdin before end-of-file. But it throws an error instead:

main :: IO ()
main = do
  hFileSize stdin >>= print
  hFileSize stdout >>= print

...

<stdin>: hFileSize: inappropriate type (not a regular file)

Now setting the file size is also possible, but it's a tricky and limited operation. First of all, it will not work on a handle in ReadMode:

main :: IO ()
main = do
  h <- openFile "testfile.txt" ReadMode
  hSetFileSize h 34

...

testfile.txt: hSetFileSize: invalid argument (Invalid argument)

In ReadWriteMode however, this operation will succeed. By truncating from 46 to 34, we remove the final line "Fourth Line" from the file (don't forget the newline character!).

main :: IO ()
main = do
  h <- openFile "testfile.txt" ReadWriteMode
  hSetFileSize h 34

... (File content)

First Line
Second Line
Third Line

Setting the file size also works with WriteMode. Remember that opening a file in write mode will erase its existing contents. But we can start writing new contents to the file and then truncate later.

main :: IO ()
main = do
  h <- openFile "testfile.txt" WriteMode
  hPutStrLn h "First Line"
  hPutStrLn h "Second Line"
  hPutStrLn h "Third Line"
  hPutStrLn h "Fourth Line"
  hSetFileSize h 34

... (File content)

First Line
Second Line
Third Line

And, as you can probably tell by now, hSetFileSize only truncates from the end of files. It can't remove content from the beginning. So with our binary file example, we could drop 48 bytes to remove one of the "lines" of the picture, but we can't use this function to remove the 54 byte header:

main :: IO ()
main = do
  h <- openFile "pic_1.bmp" ReadWriteMode
  hSetFileSize h 774

Finally, hSetFileSize can also be used to add space to a file. Of course, the space it adds will all be null characters (byte = 0). But this can still be useful in certain circumstances.

main :: IO ()
main = do
  h <- openFile "pic_1.bmp" ReadWriteMode
  hSetFileSize h 870
  inputBytes <- B.unpack <$> B.hGetContents h
  let lines = chunksOf 48 (drop 54 inputBytes)
  print (last lines)

...

[0,0,0,...]
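To see both directions of hSetFileSize in one place, here is a minimal sketch that works on a small scratch file instead of our bitmap. The helper name resizedBytes and the four sample bytes are made up for illustration. It writes the bytes, resizes the file, and reads everything back strictly.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)
import System.IO

-- | Hypothetical helper: write four known bytes to a scratch file,
-- resize it with hSetFileSize, and return the resulting contents.
resizedBytes :: FilePath -> Integer -> IO [Word8]
resizedBytes path newSize = do
  B.writeFile path (B.pack [1, 2, 3, 4])
  -- ReadWriteMode keeps the existing contents and allows resizing
  withBinaryFile path ReadWriteMode $ \h -> hSetFileSize h newSize
  -- B.readFile is strict, so the file is fully read and closed here
  B.unpack <$> B.readFile path
```

Shrinking to 2 should leave only the first two bytes, while growing to 6 should pad the end with zero bytes, matching what we saw with the picture.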

These aren't the most common operations, but perhaps you'll find a use for them! We're almost done with our look at more obscure IO actions. If you've missed some of these articles and want a summary of this month's new material, make sure to subscribe to our monthly newsletter! You'll also get a sneak peek at what's coming next!

James Bowen 5/23/22

Using Binary Mode in Haskell

So far in our IO adventures, we've only been dealing with plain text files. But a lot of data isn't meant to be read as string data. Some of the most interesting and important problems in computing today are about reading image data and processing it so our programs can understand what's going on. Executable program files are also in a binary format, rather than human readable. So today, we're going to explore how IO works with binary files.

First, it's important to understand that handles have encodings, which we can retrieve using hGetEncoding. For the most part, your files will default to UTF-8.

hGetEncoding :: Handle -> IO (Maybe TextEncoding)

main :: IO ()
main = do
  hGetEncoding stdin >>= print
  hGetEncoding stdout >>= print
  h <- openFile "testfile.txt" ReadMode
  hGetEncoding h >>= print

...

Just UTF-8
Just UTF-8
Just UTF-8

There are other encodings of course, like char8, latin1, and utf16. These are different ways of turning text into bytes, and each TextEncoding expression refers to one of these. If you know you have a file written in UTF16, you can change the encoding using hSetEncoding:

hSetEncoding :: Handle -> TextEncoding -> IO ()

main :: IO ()
main = do
  h <- openFile "myutf16file.txt" ReadMode
  hSetEncoding h utf16
  myString <- hGetLine h
  ...

Notice, though, that hGetEncoding returns a Maybe value. For binary files, there is no encoding! We are only allowed to read raw data. You can set a file to read as binary by calling hSetBinaryMode on its handle with True, or by just using openBinaryFile.

hSetBinaryMode :: Handle -> Bool -> IO ()

openBinaryFile :: FilePath -> IOMode -> IO Handle

main :: IO ()
main = do
  h <- openBinaryFile "pic_1.bmp" ReadMode
  ...

When it comes to processing binary data, it is best to parse your input into a ByteString rather than a string. Using the unpack function will then allow you to operate on the raw list of bytes:

import qualified Data.ByteString as B
import System.IO

main :: IO ()
main = do
  h <- openBinaryFile "pic_1.bmp" ReadMode
  inputBytes <- B.unpack <$> B.hGetContents h
  print $ length inputBytes

In this example, I've opened up an image file and converted its data into a list of bytes (using the Word8 type).

Further processing of the image will require some knowledge of the image format. As a basic example, I made a 24-bit bitmap with horizontal stripes throughout. The size was 16 pixels by 16 pixels. With 3 bytes (24 bits) per pixel, the total size of the "image" would be 768 bytes. So then upon seeing that my program above printed "822", I could figure out that the first 54 bytes were just header data.

I could then separate my data into "lines" (48-byte chunks) and I successfully observed that each of these chunks followed a specific pattern. Many lines were all white (the only value was 255), and other lines had three repeating values.

import Control.Monad (forM_)
import qualified Data.ByteString as B
import Data.List.Split (chunksOf)
import System.IO

main :: IO ()
main = do
  h <- openBinaryFile "pic_1.bmp" ReadMode
  inputBytes <- B.unpack <$> B.hGetContents h
  let lines = chunksOf 48 (drop 54 inputBytes)
  forM_ lines print

...

[255, 255, 255, ...]
[36, 28, 237, 36, 28, 237, ...]
[255, 255, 255, ...]
[76, 177, 34, 76, 177, 34, ...]
[255, 255, 255, ...]
[36, 28, 237, 36, 28, 237, ...]
[255, 255, 255, ...]
[76, 177, 34, 76, 177, 34, ...]
[255, 255, 255, ...]
[0, 242, 255, 0, 242, 255, ...]
[255, 255, 255, ...]
[232, 162, 0, 232, 162, 0, ...]
[255, 255, 255, ...]
[0, 242, 255, 0, 242, 255, ...]
[255, 255, 255, ...]
[232, 162, 0, 232, 162, 0, ...]

Now that the data is broken into simple numbers, we could run many kinds of mathematical algorithms on it, if only there were some interesting data to process!
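As one small step in that direction, here is a sketch that groups a row of raw bytes into pixels. It assumes the usual layout for 24-bit bitmaps, where each pixel is stored in blue-green-red order. The Pixel type and rowToPixels name are invented for this example and don't come from any library.

```haskell
import Data.Word (Word8)

-- | An RGB pixel. This type is only an illustration.
data Pixel = Pixel { red :: Word8, green :: Word8, blue :: Word8 }
  deriving (Show, Eq)

-- | Group one row of raw bytes into pixels, assuming 3 bytes per pixel
-- stored in blue-green-red order. Trailing bytes that don't form a
-- complete pixel are dropped.
rowToPixels :: [Word8] -> [Pixel]
rowToPixels (b : g : r : rest) = Pixel r g b : rowToPixels rest
rowToPixels _ = []
```

Under that assumption, the repeating 36, 28, 237 pattern from the output above would read as a pixel with red 237, green 28 and blue 36.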

In our last couple of IO articles, we'll keep looking at some issues with binary data. If you want monthly summaries of what we're writing here at Monday Morning Haskell, make sure to subscribe to our monthly newsletter! This will also give you access to our subscriber resources!

James Bowen 5/19/22

Interactive IO

Today we'll continue our study of IO by looking at an interactive IO program. In this kind of program, the user will enter commands continuously on the command line to interact with our program. The fun part is that we'll find a use for a lesser-known library function called, well, interact!

Imagine you're writing a command line program where you want the user to keep entering input lines, and you do some kind of processing for each line. The most simple example would be an echo program, where we simply repeat the user's input back out to them:

>> Hello
Hello
>> Goodbye
Goodbye

A naive approach to writing this in Haskell would use recursion like so:

main :: IO ()
main = go
  where
    go = do
      input <- getLine
      putStrLn input
      go

However, there's no terminal condition on this loop. It keeps expecting to read a new line. Our only way to end the program is with "ctrl+C". Typically, the cleaner way to end a program is to use the input "ctrl+D" instead, which is the "end of file" character. However, this example will not end elegantly if we do that:

>> Hello
Hello
>> Goodbye
Goodbye
>> (ctrl+D)
<stdin>: hGetLine: end of file

What's happening here is that getLine will throw this error when it reads the "end of file" character. In order to fix this, we can use these helper functions.

hIsEOF :: Handle -> IO Bool

-- Specialized to stdin
isEOF :: IO Bool

These give us a boolean that indicates whether we have reached the "end of file" on our input. The first works for any file handle and the second tells us about the stdin handle. If it returns false, then we are safe to proceed with getLine. So here's how we would rewrite our program:

main :: IO ()
main = go
  where
    go = do
      ended <- isEOF
      if ended
        then return ()
        else do
          input <- getLine
          putStrLn input
          go

Now we won't get that error message when we enter "ctrl+D".

But for these specific problems, there's another tool we can turn to, and this is the "interact" function:

interact :: (String -> String) -> IO ()

The function we supply simply takes an input string and determines what string should be output as a result. It handles all the messiness of looping for us. So we could write our echo program very simply like so:

main :: IO ()
main = interact id

...

>> Hello
Hello
>> Goodbye
Goodbye
>> Ctrl+D

Or if we're a tiny bit more ambitious, we can capitalize each of the user's entries:

main :: IO ()
main = interact (map toUpper)

...

>> Hello
HELLO
>> Goodbye
GOODBYE
>> Ctrl+D

The function is a little tricky though, because the String -> String function is actually about taking the whole input string and returning the whole output string. The fact that it works line-by-line with simple functions is an interesting consequence of Haskell's laziness.

However, because the function is taking the whole input string, you can also write your function so that it breaks the input into lines and does a processing function on each line. Here's what that would look like:

processSingleLine :: String -> String
processSingleLine = map toUpper

processString :: String -> String
processString input = result
  where
    ls = lines input
    result = unlines (map processSingleLine ls)

main :: IO ()
main = interact processString

For our uppercase and id examples, this works the same way. But this would be the only proper way to write our program if we wanted to, for example, parse a simple equation on each line and print the result:

processSimpleAddition :: String -> String
processSimpleAddition input = case splitOn " " input of
  [num1, _, num2] -> show (read num1 + read num2)
  _ -> "Invalid input!"

processString :: String -> String
processString input = result
  where
    ls = lines input
    result = unlines (map processSimpleAddition ls)

main :: IO ()
main = interact processString

...

>> 4 + 5
9
>> 3 + 2
5
>> Hello
Invalid input!
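One caveat with the version above is that read will crash on a line like "a + b", since that line still splits into three pieces. Here is a hedged sketch of a more defensive variant. It swaps in words and readMaybe from base in place of splitOn and read, and the names parseInteger, safeAddition and processLines are made up for this example.

```haskell
import Text.Read (readMaybe)

-- | Parse a whole number, or Nothing if the text isn't one.
parseInteger :: String -> Maybe Integer
parseInteger = readMaybe

-- | Like the addition example, but malformed numbers produce the
-- error message instead of crashing inside 'read'.
safeAddition :: String -> String
safeAddition input = case words input of
  [num1, "+", num2] ->
    case (+) <$> parseInteger num1 <*> parseInteger num2 of
      Just total -> show total
      Nothing -> "Invalid input!"
  _ -> "Invalid input!"

-- | Apply safeAddition to every line of the input.
processLines :: String -> String
processLines = unlines . map safeAddition . lines
```

We could then pass processLines to interact just as we did with processString.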

So hIsEOF and interact are just a couple more tools you can add to your arsenal to simplify some of these common types of programs. If you're enjoying these blog posts, make sure to subscribe to our monthly newsletter! This will keep you up to date with our newest posts AND give you access to our subscriber resources!

James Bowen 5/16/22

Buffering...Please Wait...

Today we continue our exploration of more obscure IO concepts with the idea of buffering. Buffering determines the more precise mechanics of how our program reads and writes with files. In the right circumstance, using the proper buffering method can make your program work a lot more efficiently.

To start, let's consider the different options Haskell offers us. The BufferMode type has three options:

data BufferMode =
  NoBuffering |
  LineBuffering |
  BlockBuffering (Maybe Int)

Every handle has an assigned buffering mode. We can get and set this value using the appropriate functions:

hGetBuffering :: Handle -> IO BufferMode

hSetBuffering :: Handle -> BufferMode -> IO ()

By default, terminal handles will use NoBuffering and file handles will use BlockBuffering:

main :: IO ()
main = do
  hGetBuffering stdin >>= print
  hGetBuffering stdout >>= print
  (openFile "myfile.txt" ReadMode) >>= hGetBuffering >>= print
  (openFile "myfile2.txt" WriteMode) >>= hGetBuffering >>= print

...

NoBuffering
NoBuffering
BlockBuffering Nothing
BlockBuffering Nothing

So far this seems like some nice trivia to know, but what do these terms actually mean?

Well, when your program reads and writes to files, it doesn't do the "writing" at the exact time you expect. When your program executes hPutStr or hPutStrLn, the given string will be added to the handle's buffer, but depending on the mode, it won't immediately be written out to the file.

If you use NoBuffering though, it will be written immediately. Once the buffer has even a single character, it will write this character to the file. If you use LineBuffering, it will wait until it encounters a newline character.

Finally, there is BlockBuffering. This constructor holds an optional number. The buffer won't write until it contains the given number of bytes. If the value is Nothing, then the underlying number just depends on the operating system.

This idea might sound dangerous to you. Does this mean that it's likely that your program will just leave data unwritten if it doesn't get the right amount? Well no. You can also flush buffers, which will cause them to write their information out no matter what. This happens automatically on important operations like hClose (remember to close your handles!). You can also do this manually with the hFlush function:

hFlush :: Handle -> IO ()
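A common place where manual flushing matters is a prompt that doesn't end in a newline. Here's a minimal sketch of that pattern. The promptOn and prompt names are hypothetical, and taking the handles as arguments is just a choice that makes the function easy to try out with files.

```haskell
import System.IO

-- | Write a prompt without a trailing newline, flush so that it is
-- actually written out, then read the reply from the input handle.
promptOn :: Handle -> Handle -> String -> IO String
promptOn inH outH message = do
  hPutStr outH message
  -- With line or block buffering, the text could otherwise sit in the
  -- buffer while we wait for input.
  hFlush outH
  hGetLine inH

-- | The same idea specialized to the terminal handles.
prompt :: String -> IO String
prompt = promptOn stdin stdout
```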

For the most part, you won't notice the difference in buffer modes on normal programs. But under certain circumstances, it can make a big difference in performance. The act of writing information to a file is actually a very long and expensive operation as far as programs are concerned. So doing fewer writes with larger amounts of data tends to be more efficient than doing more writes with smaller amounts of data.

Hopefully you can see now why BlockBuffering is an option. Typically, this is the most efficient way if you're writing a large amount of data, while NoBuffering is the least efficient.

To test this out, I wrote a simple program to write out one hundred thousand numbers to a file, and timed it with different buffer modes:

someFunc :: IO ()
someFunc = do
  let numbers = [1..100000]
  h <- openFile "number.txt" WriteMode
  hSetBuffering h NoBuffering
  timestamp1 <- getCurrentTime
  forM_ numbers (hPrint h)
  hClose h
  timestamp2 <- getCurrentTime
  print $ diffUTCTime timestamp2 timestamp1

When running with NoBuffering, this operation took almost a full second: 0.93938s. However, when I changed to LineBuffering, it dropped to 0.2367s. Finally, with BlockBuffering Nothing, I got a blazing fast 0.05473s. That's around 17x faster! So if you're writing a large amount of data to a file, this can make a definite difference!

If you're writing a program where write-performance is important, I hope this knowledge helps you! Even if not, it's good to know what kinds of things are happening under the hood. If you want to keep up to date with more Haskell knowledge, both obscure and obvious, make sure to subscribe to our monthly newsletter! If you're just starting out, this will give you access to resources like our Beginners Checklist and Recursion Workbook!

James Bowen 5/12/22

Using Temporary Files

In the last article we learned about seeking. Today we'll see another context where we can use these tools while learning about another new idea: temporary files.

Our "new" function for today is openTempFile. Its type signature looks like this:

openTempFile :: FilePath -> String -> IO (FilePath, Handle)

The first argument is the directory in which to create the file. The second is a "template" for the file name. The template can look like a normal file name, like name.extension. The name of the file that will actually be created will have some random digits appended to the name. For example, we might get name1207-5.extension.

The result of the function is that Haskell will create the file and pass a handle to us in ReadWriteMode. So our two outputs are the full path to the file and its handle.

Despite the name openTempFile, this function won't do anything to delete the file when it's done. You'll still have to do that yourself. However, it does have some useful built-in mechanics. It is guaranteed to not overwrite an existing file on the system, and it also gives limited file permissions so it can't be used by an attacker.

How might we use such a file? Well, let's suppose we have some calculation that we break into multiple stages, so that it uses an intermediate file in between. As a contrived example, let's suppose we have two functions: one that writes Fibonacci numbers to a file, and another that takes the sum of the numbers in a file. We'll have both of these operate on a pre-existing Handle object:

writeFib :: Integer -> Handle -> IO ()
writeFib n handle = writeNum (0, 1) 0
  where
    writeNum :: (Integer, Integer) -> Integer -> IO ()
    writeNum (a, b) x = if x > n then return ()
      else hPutStrLn handle (show a) >> writeNum (b, a + b) (x + 1)

sumNumbers :: Handle -> IO Integer
sumNumbers handle = do
  hSeek handle AbsoluteSeek 0
  nums <- (fmap read . lines) <$> hGetContents handle
  return $ sum nums

Notice how we "seek" to the beginning of the file in our reading function. This means we can use the same handle for both operations, assuming the handle has ReadWriteMode. So let's see how we put this together with openTempFile:

main :: IO ()
main = do
  n <- read <$> getLine
  (file, handle) <- openTempFile "/tmp/fib" "calculations.txt"
  writeFib n handle
  sum <- sumNumbers handle
  print sum
  hClose handle
  removeFile file

A couple notes here. First, if the directory passed to openTempFile doesn't exist, this will cause an error. We also need to print the sum before closing the handle, or else Haskell will not actually try to read anything until after closure due to laziness!

But aside from these caveats, our function works! If we don't remove the file, then we'll be able to see the file at a location like /tmp/fib/calculations6132-6.txt.
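Since openTempFile leaves the cleanup to us, one option is to wrap it in bracket so the file is removed even when something goes wrong. This is only a sketch under a few assumptions: withScratchFile and sumViaFile are made-up names, and it uses getTemporaryDirectory from System.Directory instead of a hard-coded /tmp/fib directory.

```haskell
import Control.Exception (bracket, evaluate)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- | Run an action on a fresh temporary file, then close the handle and
-- delete the file, even if the action throws an exception.
withScratchFile :: String -> (Handle -> IO a) -> IO a
withScratchFile template action = do
  dir <- getTemporaryDirectory
  bracket
    (openTempFile dir template)
    (\(path, h) -> hClose h >> removeFile path)
    (\(_, h) -> action h)

-- | Write the numbers to a temporary file, rewind, and sum them back up.
sumViaFile :: [Integer] -> IO Integer
sumViaFile nums = withScratchFile "calculations.txt" $ \h -> do
  mapM_ (hPrint h) nums
  hSeek h AbsoluteSeek 0
  contents <- hGetContents h
  -- Force the lazy read before bracket closes the handle.
  evaluate (sum (map read (lines contents)))
```

The evaluate call plays the same role as printing the sum before hClose: it makes sure the reading really happens while the handle is still open.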

This example doesn't necessarily demonstrate why we would use openTempFile instead of just giving the file the name calculations.txt. The answer to that is our process is now safer with respect to concurrency. We could run this same operation on different threads in parallel, and there would be no file conflicts. We'll see exactly how to do that later this year!

For now, make sure you're subscribed to our monthly newsletter so that you can stay up to date with all the latest information and offers! If you're already subscribed, take a look at our subscriber resources that can help you improve your Haskell!

James Bowen 5/9/22

Finding What You Seek

In our last couple of articles, we've gone through the basics of how to use Handles. This useful abstraction not only lets us access files in a stateful way, but also to treat terminal streams (standard in, standard out) in the same way we treat files. This week we'll learn a few tricks that are a little more specific to handles on files. File handles are seekable, meaning we can move around where we are "pointing" to in the file, similar to moving the position of a video recording.

To understand how this works, we should first make a note, if it isn't clear already, that a Handle is a stateful object. The handle points to the file, but it also tracks information about where it is in the file. For example, let's define a file:

First Line
Second Line
Third Line
Fourth Line
...

We can have two different functions that will print from a handle. One will print a single line, the other will print two lines.

printOneLine :: Handle -> IO ()
printOneLine h = hGetLine h >>= putStrLn

printTwoLines :: Handle -> IO ()
printTwoLines h = do
  hGetLine h >>= putStrLn
  hGetLine h >>= putStrLn

If we call these back to back on our file, it will print the first three lines of the file, rather than re-printing the first line.

main :: IO ()
main = do
  h <- openFile "testfile.txt" ReadMode
  printOneLine h
  printTwoLines h
  hClose h

...

>> stack exec io-program
First Line
Second Line
Third Line

This is because the state of h carries over after the first call. So the handle remembers that it is now pointing at the second line.

Now you might wonder, if this computation is stateful, why doesn't it use the State monad? It turns out the IO monad already is its own "state" monad. However, the "state" in this case is the state of the whole operating system! Or we can even think of IO as tracking the state of "the whole outside world". This is why IO is so impure, because the "state of the whole outside world" changes for every single call!

We can illustrate most plainly how the state has changed by printing the position of the handle. This is accessible through the function hGetPosn, which gives us an item of type HandlePosn. We can also use hTell to give us this value as an integer.

hGetPosn :: Handle -> IO HandlePosn

hTell :: Handle -> IO Integer

Let's see the position at different points in our program.

main :: IO ()
main = do
  h <- openFile "testfile.txt" ReadMode
  hGetPosn h >>= print
  hTell h >>= print
  printOneLine h
  hGetPosn h >>= print
  hTell h >>= print
  printTwoLines h
  hGetPosn h >>= print
  hTell h >>= print
  hClose h

...
>> stack exec io-program
{handle: testfile.txt} at position 0
0
First Line
{handle: testfile.txt} at position 11
11
Second Line
Third Line
{handle: testfile.txt} at position 34
34

We can manipulate this position in a number of different ways, but they all depend on the file being seekable. By and large, read and write file handles are seekable, while the terminal handles are not. As we'll see, "append" handles are also not seekable.

hIsSeekable :: Handle -> IO Bool

main :: IO ()
main = do
  hIsSeekable stdin >>= print
  hIsSeekable stdout >>= print
  h <- openFile "testfile.txt" ReadMode
  hIsSeekable h >>= print
  hClose h

...

>> stack exec io-program
False
False
True

Note: you'll get an error if you even query a closed handle for whether or not it is seekable.

So how do we change the position? The first way is through hSetPosn.

hSetPosn :: HandlePosn -> IO ()

This lets us go back to a previous position we saw. So in this example, we'll read one line and save that position. Then we'll read two more lines, go back to the previous position, and read one line again. Because the HandlePosn object relates both to the numeric position AND the specific handle, we don't need to specify the Handle again in the function call.

main :: IO ()
main = do
  h <- openFile "testfile.txt" ReadMode
  printOneLine h
  p <- hGetPosn h
  printTwoLines h
  hSetPosn p
  printOneLine h
  hClose h

...

>> stack exec io-program
First Line
Second Line
Third Line
Second Line

We can do various tricks with hSeek, which takes the handle and an integer position. It also takes a SeekMode. This tells us if the integer refers to an "absolute" position in the file, a position "relative" to the current position, or even a position relative to the end.

data SeekMode =
  AbsoluteSeek |
  RelativeSeek |
  SeekFromEnd

hSeek :: Handle -> SeekMode -> Integer -> IO ()

In this example we'll read the first line, advance the seek position a few characters (which will cut off what we see of the second line), and then go back to the start.

main :: IO ()
main = do
  h <- openFile "testfile.txt" ReadMode
  printOneLine h
  hSeek h RelativeSeek 4
  printTwoLines h
  hSeek h AbsoluteSeek 0
  printOneLine h
  hClose h

>> stack exec io-program
First Line
nd Line
Third Line
First Line

We can also seek when writing to a file. As always with WriteMode, there's a gotcha. In this example, we'll write our first line, go back to the start, write another line, and then write a final line at the end.

main :: IO ()
main = do
  h <- openFile "testfile.txt" WriteMode
  hPutStrLn h "The First Line"
  hSeek h AbsoluteSeek 0
  hPutStrLn h "Second Line"
  hSeek h SeekFromEnd 0
  hPutStrLn h "Third Line"
  hClose h

The result file is a little confusing though!

Second Line
ne
Third Line

We overwrote most of the first line we wrote, instead of inserting the new line at the beginning! All that's left of "The First Line" is the "ne" and newline character!

We might hope to fix this by using AppendMode, but it doesn't work! This mode makes the assumption that you are only writing new information to the end of a file. Therefore, append handles are not seekable.
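If we really do want a new line at the start of a file, one simple (if not especially efficient) approach is to read the existing contents and rewrite the whole file. Here's a minimal sketch, where prependLine is a hypothetical helper of our own and not a library function.

```haskell
import Control.Exception (evaluate)

-- | Put a new line at the start of a file by rewriting the whole file.
prependLine :: FilePath -> String -> IO ()
prependLine path newLine = do
  contents <- readFile path
  -- readFile is lazy, so force the whole string first. This finishes the
  -- read, which lets us safely open the same file again for writing.
  _ <- evaluate (length contents)
  writeFile path (newLine ++ "\n" ++ contents)
```

This holds the entire file in memory, so it's only a reasonable choice for small files.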

If you're just writing application level code, you probably don't need to worry too often about these subtleties. But if you have any desire to write a low-level library, you'll need to know about all these specific mechanics! Stay tuned for more IO-related content in the coming weeks. If you want to stay up to date, make sure to subscribe to our monthly newsletter! You'll get access to our subscriber resources, which includes a lot of great beginner materials!

James Bowen 5/5/22

Handling Files more Easily

Earlier this week we learned about the Handle abstraction which helps us to deal with files in Haskell. An important part of this abstraction is that handles are either "open" or "closed". Today we'll go over a couple ideas to help us deal with opening and closing handles more gracefully.

When we first get a handle, it is "open". Once we're done with it, we can (and should) "close" it so that other parts of our program can use it safely. We close a handle with the hClose function:

hClose :: Handle -> IO ()

Most of the time I use files, I find myself opening and closing the handle in the same function. So for example, if we're reading a file and getting the first 4 lines:

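read4Lines :: FilePath -> IO [String]
read4Lines fp = do
  handle <- openFile fp ReadMode
  myLines <- lines <$> hGetContents handle
  let result = take 4 myLines
  -- hGetContents is lazy, so force the lines before closing the handle
  length (concat result) `seq` hClose handle
  return result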

A recommendation I would give when you are writing a function like this is to write the code for the hClose call immediately after you write the code of openFile. So you might start like this:

read4Lines :: FilePath -> IO [String]
read4Lines fp = do
  handle <- openFile fp ReadMode
  -- Handle logic
  hClose handle
  -- Return result

And only after writing these two lines should you write the core logic of the function and the final return statement. This is an effective way to make sure you don't forget to close your handles.

In the strictest sense though, this isn't even foolproof. If you cause some kind of exception while performing your file operations, an exception will be thrown before your program closes the handle. The technically correct way out of this is to use the bracket pattern. The bracket function allows you to specify an action that will take place after the main action is done, no matter if an exception is thrown. This is like using try/catch/finally in Java or try/except/finally in Python. The finally action is the second input to bracket, while our main action is the final argument.

bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c

If we specialize this signature to our specific use case, it might look like this:

bracket :: IO Handle -> (Handle -> IO ()) -> (Handle -> IO [String]) -> IO [String]

And we can then write our function above like this:

read4LinesHandle :: Handle -> IO [String]
read4LinesHandle handle = do
  myLines <- lines <$> hGetContents handle
  let result = take 4 myLines
  -- Force the lazy lines now, since the handle gets closed afterward
  length (concat result) `seq` return result

read4Lines :: FilePath -> IO [String]
read4Lines fp = bracket (openFile fp ReadMode) hClose read4LinesHandle

Now our handle gets closed even if we encounter an error while reading.

Now, this pattern (open a file, perform a handle operation, close the handle) is so common with file handles specifically that there's a special function for it: withFile:

withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r

This makes our lives a little simpler in the example above:

read4LinesHandle :: Handle -> IO [String]
read4LinesHandle handle = ...

read4Lines :: FilePath -> IO [String]
read4Lines fp = withFile fp ReadMode read4LinesHandle
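One thing to watch out for with withFile is lazy reading. The handle is closed as soon as the inner function returns, so a lazily read result may never get filled in. As a sketch of one way around this, we can read line by line with hGetLine, which is strict, and check hIsEOF so that short files don't cause an error. The names readUpTo and read4LinesStrict are made up for this example.

```haskell
import System.IO

-- | Read at most n lines from a handle, stopping early at end of file.
readUpTo :: Int -> Handle -> IO [String]
readUpTo n h
  | n <= 0 = return []
  | otherwise = do
      ended <- hIsEOF h
      if ended
        then return []
        else do
          line <- hGetLine h
          rest <- readUpTo (n - 1) h
          return (line : rest)

-- | Get up to the first four lines of a file. Each line is fully read
-- before withFile closes the handle.
read4LinesStrict :: FilePath -> IO [String]
read4LinesStrict fp = withFile fp ReadMode (readUpTo 4)
```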

If you're ever in doubt about whether a Handle is open or not, you can also check this very easily. There are a couple boolean functions to help you out:

hIsOpen :: Handle -> IO Bool

hIsClosed :: Handle -> IO Bool

Hopefully this gives you more confidence in the proper way to deal with file handles! We'll be back next week with some more tricks you can do with these objects. In the meantime, you can subscribe to our monthly newsletter! This will keep you up to date with our latest articles and give you access to our subscriber resources!

James Bowen 5/2/22

Getting a Handle on IO

Welcome to May! This month is "All About IO". We'll be discussing many of the different useful types and functions related to our program's input and output. Many of these will live in the System.IO library module, so bookmark that if you want to demystify how IO works in Haskell!

The first concept you should get a grasp on if you want to do anything non-trivial with IO in Haskell is the idea of a Handle. You can think of a handle as a pointer to a file. We can use this pointer to read from the file or write to the file. The first interaction you'll have with a handle is when you generate it with openFile.

data Handle

openFile :: FilePath -> IOMode -> IO Handle

The first argument here is the FilePath, which is just a type alias for a plain old string. The second argument tells us how we are interacting with the file. There are four different modes of interacting with a file:

data IOMode =
  ReadMode |
  WriteMode |
  AppendMode |
  ReadWriteMode

Each one allows a different set of IO operations, and these are mostly intuitive. With ReadMode, we can read lines from the handle we receive, but we can't edit the file. With AppendMode, we can write new lines to the end of the file, but we can't read from it. In order to do both kinds of operations, we need ReadWriteMode.

As an important note, WriteMode is the most dangerous! This mode only allows writing. It is impossible to read from the file handle. This is because opening a file in WriteMode will erase its existing contents. At first glance it's easy to think that WriteMode will allow you to just write to the end of the file, adding to its contents. But this is the job of AppendMode! Note however that both these modes will create the file if it does not already exist.

Here's an example of some simple interactions with files:

readFirstLine :: FilePath -> IO String
readFirstLine fp = do
  handle <- openFile fp ReadMode
  firstLine <- hGetLine handle
  hClose handle
  return firstLine

writeSingleLine :: FilePath -> String -> IO ()
writeSingleLine fp newLine = do
  -- Create file if it doesn't exist, overwrite its contents if it does!
  handle <- openFile fp WriteMode
  hPutStrLn handle newLine
  hClose handle

addNewLine :: FilePath -> String -> IO ()
addNewLine fp newLine = do
  handle <- openFile fp AppendMode
  hPutStrLn handle newLine
  hClose handle

A few notes. All these functions use hClose when they're done with the handle. This is an important way of letting the file system know we are done with this file.

hClose :: Handle -> IO ()

If we don't close our handles, we might end up with conflicts. Multiple different handles can exist at the same time for reading a file. But only a single handle can exist for writing to a file at any given time. And if we have a write-capable handle (anything other than ReadMode), we can't have other ReadMode handles to that file. So if we write a file but don't close its handle, we won't be able to read from that file later!

A couple of the functions we used above might look similar to the most basic IO functions. The first functions you learned in Haskell were probably print, putStrLn, and getLine:

print :: (Show a) => a -> IO ()

putStrLn :: String -> IO ()

getLine :: IO String

The first two will output text to the console, the third will pause and let the user enter a line on the console. Above, we used these two functions:

hPutStrLn :: Handle -> String -> IO ()

hGetLine :: Handle -> IO String

These functions work exactly the same, except they are dealing with a file, so they have the extra Handle argument.

The neat thing is that interacting with the console uses the same Handle abstraction! When your Haskell program starts, you already have access to the following open file handles:

stdin :: Handle

stdout :: Handle

stderr :: Handle

So the basic functions are simply defined in terms of the Handle functions like so:

putStrLn = hPutStrLn stdout

getLine = hGetLine stdin

This fact allows you to write a program that can work either with predefined files as the input and output channels, or the standard handles. This is amazingly useful for writing unit tests.

echoProgram :: (Handle, Handle) -> IO ()
echoProgram (inHandle, outHandle) = do
  inputLine <- hGetLine inHandle
  hPutStrLn outHandle inputLine

main :: IO ()
main = echoProgram (stdin, stdout)

testMain :: IO ()
testMain = do
  input <- openFile "test_input.txt" ReadMode
  output <- openFile "test_output.txt" WriteMode
  echoProgram (input, output)
  hClose input
  hClose output
  -- Assert that "test_output.txt" contains the expected line.
  ...
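Here is one way the file-based test could be fleshed out. This is only a sketch: runEchoTest is a hypothetical helper, the file paths are supplied by the caller, and echoProgram is repeated so the snippet stands on its own.

```haskell
import System.IO

-- Same shape as echoProgram in the article: read one line, write it back out.
echoProgram :: (Handle, Handle) -> IO ()
echoProgram (inHandle, outHandle) = do
  inputLine <- hGetLine inHandle
  hPutStrLn outHandle inputLine

-- Hypothetical test helper: prepares an input file, runs echoProgram
-- against the two files, and returns the first line of the output file.
runEchoTest :: FilePath -> FilePath -> String -> IO String
runEchoTest inPath outPath line = do
  writeFile inPath (line ++ "\n")
  input <- openFile inPath ReadMode
  output <- openFile outPath WriteMode
  echoProgram (input, output)
  hClose input
  hClose output
  -- Both handles are closed, so reopening the output for reading is safe.
  result <- openFile outPath ReadMode
  firstLine <- hGetLine result
  hClose result
  return firstLine
```

A test can then compare the returned string with the line it passed in. Because echoProgram only sees Handles, it never needs to know whether it is talking to files or to the console.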

That's all for our first step into the world of IO. For the rest of this month, we'll be looking at other useful functions. For now, make sure you subscribe to our monthly newsletter so you get a summary of anything you might have missed. You'll also get access to our subscriber resources, which can really help you kickstart your Haskell journey!

James Bowen 4/28/22

Traverse: Fully Generalized Loops

Last time around, we discussed mapM and sequence, which allow us to run loop-like activities in Haskell while also incorporating monadic effects. Today for our last article of for-loops month, we'll look at the final generalization of this idea: the traverse function.

To understand traverse, it helps to recall the ideas behind fmap. When we use fmap, we can take any Functor structure and transform all the underlying elements of that functor, returning a new object with the exact same structure, but different elements. The new elements might be of the same type or they can be entirely different.

fmap :: (Functor f) => (a -> b) -> f a -> f b

We can apply this idea over many different data structures in Haskell. However, it is a pure function. If the operation we're attempting requires a monadic effect, we won't be able to use fmap. For an example, consider having a list of strings which represent people's names. These names correspond to objects of type Person in a database, and we would like to take our list and look up all the people. Here's a naive C++ outline of this:

class Person;

Person lookupPersonByName(const std::string& name) {
  // Database call
  ...
}

std::vector<Person> lookupAllNames(const std::vector<std::string>& names) {
  std::vector<Person> results;
  for (const auto& name : names) {
    results.push_back(lookupPersonByName(name));
  }
  return results;
}

In C++, this function can just be a normal for loop (though we would want to parallelize in a production setting). In Haskell though, the lookupAllNames function would need to be an IO-like function.

data Person = ...

lookupPersonByName :: String -> IO Person
...

This means we can't use fmap. Now, mapM from the last article is a viable option here. But it's important to also consider its generalization, found in the Traversable class:

class (Functor t, Foldable t) => Traversable t where
  traverse :: Applicative f => (a -> f b) -> t a -> f (t b)

Let's break this down. The traverse function has two inputs:

A function that transforms a single object within an effect (any Applicative, which includes every Monad)
A container of those objects. (The container type must be an instance of the Traversable class)

The result of this is a new container which has applied the transformation to the input elements, occurring within the effect class. So for our database example, we might have a list of names, and we transform them all:

data Person = ...

lookupPersonByName :: String -> IO Person

lookupAllNames :: [String] -> IO [Person]
lookupAllNames = traverse lookupPersonByName

Since any Traversable container will do though, we can also apply a traversal over a Maybe String, an Either a String, or a Map of strings, for example. All these calls will occur in the IO monad.

lookupAllNames :: (Traversable t) => t String -> IO (t Person)
lookupAllNames = traverse lookupPersonByName

...

>> :t (lookupAllNames (Just "Joseph"))
IO (Maybe Person)
>> :t (lookupAllNames (Right "Joseph"))
IO (Either a Person)
>> :t (lookupAllNames (Map.fromList [(1, "Joseph"), (2, "Christina")]))
IO (Map.Map Int Person)
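To try this without a database, here is a small sketch that swaps IO for Maybe as the effect. The Person fields, the people list, and the lookup-by-association-list are all assumptions made for illustration.

```haskell
-- Hypothetical Person type: a name and an age.
data Person = Person String Int
  deriving (Show, Eq)

-- Hypothetical stand-in for the database.
people :: [(String, Person)]
people =
  [ ("Joseph", Person "Joseph" 34)
  , ("Christina", Person "Christina" 29)
  ]

-- Maybe plays the role of the effect here instead of IO:
-- a missing name produces Nothing.
lookupPersonByName :: String -> Maybe Person
lookupPersonByName name = lookup name people

-- Works for any Traversable container of names.
lookupAllNames :: Traversable t => t String -> Maybe (t Person)
lookupAllNames = traverse lookupPersonByName
```

The same lookupAllNames accepts a list, a Maybe String, or an Either a String. In each case the result keeps the shape of the input container, and a single failed lookup turns the whole result into Nothing.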

A big advantage of this approach over C++ is that we can use Haskell's monadic behavior to easily determine the correct side effect when an operation fails. In our example, by wrapping the calls in IO we ensure that the user is aware that an IO error could occur that they might need to catch. But we could also improve the monad to make the type of error more clear:

data DatabaseError

lookupPersonByName :: String -> ExceptT DatabaseError IO Person

lookupAllNames :: (Traversable t) => t String -> ExceptT DatabaseError IO (t Person)

In this particular case, we would short circuit the operation when an error is encountered. In C++, you would probably want to have lookupPersonByName return a type like StatusOr<Person>. Combining these Status objects appropriately might be a bit tricky. So it's nice that monads do this for us automatically in Haskell.
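The short-circuiting is easy to observe with a simpler effect. The sketch below uses plain Either DatabaseError rather than ExceptT DatabaseError IO, and the PersonNotFound constructor and knownAges table are invented for the example.

```haskell
-- Hypothetical error type with a single constructor.
data DatabaseError = PersonNotFound String
  deriving (Show, Eq)

-- Hypothetical in-memory "database" of names and ages.
knownAges :: [(String, Int)]
knownAges = [("Joseph", 34), ("Christina", 29)]

lookupAgeByName :: String -> Either DatabaseError Int
lookupAgeByName name = case lookup name knownAges of
  Nothing -> Left (PersonNotFound name)
  Just age -> Right age

-- traverse stops at the first Left and returns it as the overall result.
lookupAllAges :: Traversable t => t String -> Either DatabaseError (t Int)
lookupAllAges = traverse lookupAgeByName
```

If several names are missing, only the first missing one (in container order) is reported, which is the behavior you would otherwise have to code by hand when combining status objects.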

The last thing I'll note is that in Data.Traversable we finally have a function defined as the word for! Similar to mapM and forM, this function is just a flipped version of traverse:

for :: (Traversable t, Applicative f) => t a -> (a -> f b) -> f (t b)

lookupPersonByName :: String -> IO Person

lookupAllNames :: [String] -> IO [Person]
lookupAllNames inputs = for inputs lookupPersonByName
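Putting the container first pays off when the loop body is a lambda, since the body then reads like the body of a loop. Here is a small sketch with Maybe as the effect. The parsePorts function and its port-range rule are hypothetical.

```haskell
import Data.Traversable (for)
import Text.Read (readMaybe)

-- Hypothetical validation loop: every input must parse as a port number
-- between 1 and 65535, otherwise the whole result is Nothing.
parsePorts :: [String] -> Maybe [Int]
parsePorts inputs = for inputs $ \raw -> do
  port <- readMaybe raw
  if port >= 1 && port <= 65535
    then Just port
    else Nothing
```

Written with traverse, the lambda would have to come before the list, which tends to be harder to read once the body grows past a line or two.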

So now when someone says "Haskell doesn't have for loops", you know the proper reply is "yes it does, here is the 'for' function!".

For these two months now, we've explored a bit about monads, and we've explored different kinds of loops. In both these cases, IO presents some interesting challenges. So next month is going to be all about IO! So make sure you keep coming back Mondays and Thursdays, and subscribe to our monthly newsletter so you'll get a summary in case you miss something!

James Bowen 4/25/22

Effectful Loops: Sequence and MapM

We've covered a lot of different ways to run loop behavior in Haskell, but all of them operate in a "pure" way. That is, none of them use monadic behavior. While folding and scanning provide us with a basic mechanism for tracking stateful computations, you sometimes have more complicated problems where the stateful object has more layers. And none of the functions we've seen so far allow IO activity, like printing to the console or accessing the file system.

So to motivate this example, let's imagine we're parsing several different files, each containing a set of names. We would like to read each of these files and create a combined list. Here's what this might look like in C++ with a for-loop:

std::vector<std::string> readSingleFile(std::string filename) { ... }

std::vector<std::string> readNamesFromFiles(std::vector<std::string> filenames) {
  std::vector<std::string> results;
  for (const auto& file : filenames) {
    auto namesFromThisFile = readSingleFile(file);
    // std::copy also works well
    for (const auto& name : namesFromThisFile) {
      results.push_back(name);
    }
  }
  return results;
}

This looks a lot like a map problem in Haskell (or really, concatMap). However, we have a problem. The function we would like to map must be an IO function!

readSingleFile :: FilePath -> IO [String]
readSingleFile fp = lines <$> readFile fp

This means that if we try to use map with this function and a list of FilePaths, we won't get a list of lists that we can immediately concat. And in fact, we won't even get an IO object containing the list of lists! The type will actually be [IO [String]], a list of IO actions which each return a list of strings!

>> let filePaths = [...]
>> let results = map readSingleFile filePaths
>> :t results
[IO [String]]

By itself, this doesn't seem to help us achieve our goal. But there are a couple of helpers that can get us the rest of the way. One function is sequence. This takes a list of monadic actions and runs them back to back, collecting the results! This function generalizes beyond lists, but we'll just think about the type signature using lists for right now.

sequence :: (Monad m) => [m a] -> m [a]

If we imagine our list of monadic actions, this function essentially acts as though it is running them all together in do syntax and returning a list of the results.

sequence :: (Monad m) => [m a] -> m [a]
sequence [action1, action2, action3, ...] = do
  result1 <- action1
  result2 <- action2
  result3 <- action3
  ...
  return [result1, result2, result3, ...]

So we could apply this function against our previous output, and then concat the results within the IO object using fmap.

readSingleFile :: FilePath -> IO [String]

readNamesFromFiles :: [FilePath] -> IO [String]
readNamesFromFiles files =
  (fmap concat) $ sequence (map readSingleFile files)

But there's an even simpler way to do this! There's also the function mapM, which essentially combines map and sequence:

mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]

Its type signature is like the original map, but instead it takes a function that returns a monadic action and produces a single monadic action. This allows us to simplify our solution:

readSingleFile :: FilePath -> IO [String]

readNamesFromFiles :: [FilePath] -> IO [String]
readNamesFromFiles files =
  (fmap concat) $ mapM readSingleFile files
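The relationship between the two approaches can be checked without touching the file system. This sketch uses Maybe as the monad, and safeRoot is a made-up function that fails on negative inputs.

```haskell
-- Hypothetical effectful function: square roots only exist
-- for non-negative inputs.
safeRoot :: Double -> Maybe Double
safeRoot x
  | x < 0 = Nothing
  | otherwise = Just (sqrt x)

-- Map first, then run the resulting list of actions with sequence.
rootsWithSequence :: [Double] -> Maybe [Double]
rootsWithSequence xs = sequence (map safeRoot xs)

-- The same thing in one step.
rootsWithMapM :: [Double] -> Maybe [Double]
rootsWithMapM xs = mapM safeRoot xs
```

Both functions return the same result for every input list. With Maybe, a single Nothing anywhere in the list makes the whole result Nothing.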

Since mapping works so much like a for-loop, there is even the function forM. This is the same as mapM, except that its arguments are reversed, so the list comes first.

forM :: (Monad m) => [a] -> (a -> m b) -> m [b]

Each of these functions also has an equivalent underscored function, which discards the result. These can be useful when you're only interested in the monadic side effect of the function, rather than the return value.

sequence_ :: (Monad m) => [m a] -> m ()

mapM_ :: (Monad m) => (a -> m b) -> [a] -> m ()

forM_ :: (Monad m) => [a] -> (a -> m b) -> m ()

Here's an example where we'll want that. Suppose our files might contain duplicated names, and we want to discard duplicates. This means we'll want a Set instead of a list. Our C++ code doesn't change much:

std::vector<std::string> readSingleFile(std::string filename) { ... }

std::set<std::string> readNamesFromFiles(std::vector<std::string> filenames) {
  std::set<std::string> results;
  for (const auto& file : filenames) {
    auto namesFromThisFile = readSingleFile(file);
    for (const auto& name : namesFromThisFile) {
      results.insert(name);
    }
  }
  return results;
}

We can change things up in our Haskell code though! Instead of post-processing the list of lists afterward, we'll keep track of the growing set using the State monad!

readSingleFile :: FilePath -> StateT (Set.Set String) IO ()
readSingleFile fp = do
  names <- lift (lines <$> readFile fp)
  prevSet <- get
  put $ foldr Set.insert prevSet names

readNamesFromFiles :: [FilePath] -> IO [String]
readNamesFromFiles filenames = Set.toList <$> execStateT
  (mapM_ readSingleFile filenames) Set.empty

Notice how this uses the execStateT shortcut we talked about last month! Generally speaking, combining the State monad and mapM provides a fully generalizable way to write for-loop code. It allows you to incorporate IO as needed, and it allows you to track any kind of state you would like. Sometimes though, it'll be much more convenient to use traverse, which we'll talk about in our final article this month!

If you're enjoying learning about for loops in Haskell, you should sign up for our monthly newsletter! This will keep you up to date with any new content that comes out and you'll get access to our subscriber resources!

James Bowen 4/21/22

What about While Loops? Unfolding in Haskell

So far this month, we've been focusing on "for" loops. All of our functions have taken a list as an input and produced either a single value or a list of accumulated values matching the length of the list. But sometimes we actually want to do the opposite! We want to take a single seed value and produce a list of values! This is more in keeping with the behavior of a "while" loop, though it's also possible to do this as a for-loop.

Recall that "fold" is our basic tool for producing a single value from a list. Now when we do the opposite, the concept is called "unfolding"! The key function here is unfoldr:

unfoldr :: (b -> Maybe (a, b)) -> b -> [a]

It takes a single input value and produces a list of items in a resulting type. The function we pass will take our input type and produce a new result value as well as a new input value! This new input value gets passed again to the function on the next iteration. It continues until our function produces Nothing.

unfoldr :: (input -> Maybe (result, input)) -> input -> [result]
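As a warm-up, here is about the smallest unfold there is: a countdown. The function name countdown is just for illustration.

```haskell
import Data.List (unfoldr)

-- Each step emits the current number as the result and passes
-- (n - 1) along as the next input. Returning Nothing ends the list.
countdown :: Int -> [Int]
countdown = unfoldr step
  where
    step n
      | n <= 0 = Nothing
      | otherwise = Just (n, n - 1)
```

Calling countdown 5 walks the seed through 5, 4, 3, 2, 1 and stops once the seed reaches 0, much like a while loop whose condition is n > 0.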

Here's a C++ example. Suppose we're trying to produce a binary representation of a number and we're OK with this representation being variable length. Here's how we might do this with a while loop.

enum BitValue { ONE, ZERO };

// This representation is a little odd in that we use an empty list to
// represent '0'
std::vector<BitValue> produceBits(uint64_t input) {
  std::vector<BitValue> result;
  while (input > 0) {
    if (input % 2 == 0) {
      result.push_back(ZERO);
    } else {
      result.push_back(ONE);
    }
    input /= 2;
  }
  std::reverse(result.begin(), result.end());
  return result;
}

So each time through the loop we produce a new bit depending on the "input" value status. Once we hit 0, we're done.

How can we do this in Haskell with unfoldr? Well first let's write a function to do the "unfolding". This follows the same logic that's inside the loop:

data Bit = Zero | One
  deriving (Show)

produceSingleBit :: Word -> Maybe (Bit, Word)
produceSingleBit 0 = Nothing
produceSingleBit x = if x `mod` 2 == 0
  then Just (Zero, x `quot` 2)
  else Just (One, x `quot` 2)

And now to complete the function, it's a simple application of unfoldr!

produceBits :: Word -> [Bit]
produceBits x = reverse (unfoldr produceSingleBit x)

...

>> produceBits 4
[One, Zero, Zero]
>> produceBits 3
[One, One]
>> produceBits 9
[One, Zero, Zero, One]

We can also implement a Fibonacci function using unfoldr. Our "unfolding" function just needs one of its inputs to act as a counter. We provide this counter and the initial values 0 and 1 as the seed values, and it will keep counting down. This will provide us with the complete list of Fibonacci numbers up to the given input index.

fib :: Word -> [Word]
fib x = unfoldr unfoldFib (x, 0, 1)
  where
    unfoldFib (count, a, b) = if count == 0
      then Nothing
      else Just (b, (count - 1, b, a + b))

...

>> fib 1
[1]
>> fib 2
[1, 1]
>> fib 5
[1, 1, 2, 3, 5]

It might seem a little unnatural, but there are lots of opportunities to incorporate unfold into your Haskell code! Just keep a lookout for these "while loop" kinds of problems. To stay updated with all the latest on Monday Morning Haskell, make sure to subscribe to our newsletter! You can also follow my streaming schedule on Twitch, which I'll also post on Twitter!

James Bowen 4/18/22

Combining Ideas: mapAccum

In the last couple weeks, we've learned about folding and scanning, which are valuable tools for replacing conventional for loops in our Haskell code. Today we'll go over a lesser known idea that sort of combines folding and scanning (or just folding and mapping). The particular function we'll look at is mapAccumL. For our motivating example, let's consider a parsing program.

Our input is a set of strings giving a series of mathematical operations.

Add 5
Multiply 3
Add 9
Divide 8
Subtract 2

We want to know the final "value" of this file (starting from 0, in this case it's 1), but we also want to keep track of the operations we used to get there. Here's a C++ outline:

enum OpType { ADD, SUB, MUL, DIV };
struct Operation {
  OpType opType;
  double value;
};

Operation parseOperation(const std::string& inputLine) { ... }

double processOperation(double currentValue, Operation op) {
  switch (op.opType) {
    case ADD: return currentValue + op.value;
    case SUB: return currentValue - op.value;
    case MUL: return currentValue * op.value;
    case DIV: return currentValue / op.value;
  }
}

Given the list of input lines, there are two approaches we could use to get the desired outputs. We can process the lines first and then do the calculation. This results in two loops like so:

std::pair<double, std::vector<Operation>> processLines(const std::vector<std::string>& lines) {
  std::vector<Operation> operations;
  for (const auto& line : lines) {
    operations.push_back(parseOperation(line));
  }
  double result = 0.0;
  for (auto& operation : operations) {
    result = processOperation(result, operation);
  }
  return {result, operations};
}

But we can also write it with a single for loop like so:

std::pair<double, std::vector<Operation>> processLines(const std::vector<std::string>& lines) {
  std::vector<Operation> operations;
  double result = 0.0;
  for (const auto& line : lines) {
    Operation newOp = parseOperation(line);
    operations.push_back(newOp);
    result = processOperation(result, newOp);
  }
  return {result, operations};
}

This for loop essentially performs multiple different actions for us at the same time. It appends to our list, AND it updates our result value. So it's acting like a map and a fold simultaneously.

In Haskell, most of the functional loop replacements really only perform a single action, so it's not necessarily clear how to do "double duty" like this. Here's a simple Haskell approach that splits the work in two pieces:

data OpType = Add | Sub | Mul | Div
data Operation = Operation OpType Double

parseOperation :: String -> Operation
processOperation :: Double -> Operation -> Double

processLines :: [String] -> (Double, [Operation])
processLines lines = (result, ops)
  where
    ops = map parseOperation lines
    result = foldl processOperation 0.0 ops

It turns out though, we have a function that can combine these steps! This function is mapAccumL.

mapAccumL :: (a -> b -> (a, c)) -> a -> [b] -> (a, [c])

Once again, I'll change up this type signature to assign a semantic value to each of these types. As always, the type b that lives in our input list is our item type. The type a is still our primary result, but now I'll assign c as the accum type.

mapAccumL :: (result -> item -> (result, accum)) -> result -> [item] -> (result, [accum])

It should be obvious how we can rewrite our Haskell expression now using this function:

processLines :: [String] -> (Double, [Operation])
processLines lines = mapAccumL processSingleLine 0.0 lines
  where
    processSingleLine result line =
      let newOp = parseOperation line
      in  (processOperation result newOp, newOp)
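For anyone who wants to load this into GHCi, here is a self-contained sketch. The bodies of parseOperation and processOperation are my own guesses at reasonable implementations, since the article only gives their signatures. In particular, the parser assumes lines of the form "Add 5" and simply errors out on anything else.

```haskell
import Data.List (mapAccumL)

data OpType = Add | Sub | Mul | Div
  deriving (Show, Eq)

data Operation = Operation OpType Double
  deriving (Show, Eq)

-- Assumed input format: "<Name> <number>", e.g. "Add 5".
-- Uses read and error, so malformed input crashes; a real parser
-- should return Maybe or Either instead.
parseOperation :: String -> Operation
parseOperation line = case words line of
  ["Add", n] -> Operation Add (read n)
  ["Subtract", n] -> Operation Sub (read n)
  ["Multiply", n] -> Operation Mul (read n)
  ["Divide", n] -> Operation Div (read n)
  _ -> error ("Unrecognized operation: " ++ line)

processOperation :: Double -> Operation -> Double
processOperation current (Operation opType value) = case opType of
  Add -> current + value
  Sub -> current - value
  Mul -> current * value
  Div -> current / value

-- One pass: the accumulator is the running value, and each step
-- also emits the parsed operation into the output list.
processLines :: [String] -> (Double, [Operation])
processLines = mapAccumL processSingleLine 0.0
  where
    processSingleLine result line =
      let newOp = parseOperation line
      in  (processOperation result newOp, newOp)
```

Calling processLines on the five sample lines returns the final value paired with the list of parsed operations, in the same order as the input.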

Now, our original implementation is still perfectly fine and clean! And we could also do this using foldl as well. But it's good to know that Haskell has more functions out there that can do more complicated types of loops than just simple maps and folds.

If you want to see me writing some Haskell code, including many many for loops, tune into my Twitch Stream every Monday evening! Start times will be announced via Twitter. You can also subscribe to our newsletter to learn more and stay up to date!

James Bowen 4/14/22

Using Scans to Accumulate

In last week's article we talked about how fold is one of the fundamental for-loop functions you'll want to use in Haskell. It relies on a folding function, which incorporates each new list item into a result value. There's one potential drawback though: you won't get any intermediate results, at least without some effort. Sometimes, you want to know what the "result" value was each step of the way.

This is where "scan" comes in. Let's start with a C++ example for accumulated sums:

std::vector<int> addWithSums(const std::vector<int>& inputs) {
  std::vector<int> results = {0};
  int total = 0;
  for (int i = 0; i < inputs.size(); ++i) {
    total += inputs[i];
    results.push_back(total);  
  }
  return results;
}

Let's consider a simple folding sum solution in Haskell:

sum :: [Int] -> Int
sum = foldl (+) 0

We could adapt this solution to give intermediate results. But it would be a little bit tricky. Instead of using (+) by itself as our folding function, we have to make a custom function that will store the list of accumulated values. In order to make it efficient, we'll also have to accumulate it in reverse and add an extra step at the end.

accumulatedSum :: [Int] -> [Int]
accumulatedSum inputs = reverse (foldl foldingFunc [0] inputs)
  where
    foldingFunc :: [Int] -> Int -> [Int]
    foldingFunc prev x = x + head prev : prev

However, we can instead perform this job with the idea of a "scan". There are scan functions corresponding to the fold functions, so we have scanl, scanr, and scanl'.

scanl :: (b -> a -> b) -> b -> [a] -> [b]

scanl' :: (b -> a -> b) -> b -> [a] -> [b]

scanr :: (a -> b -> b) -> b -> [a] -> [b]

Let's focus on scanl. Once again, I'll re-write the type signatures to be more clear with "items" and "results".

scanl :: (result -> item -> result) -> result -> [item] -> [result]

This is almost identical to foldl, except that the final result is a list of results, rather than a single result. And it does exactly what you would expect! Each time it calculates a result value in the folding function (scanning function?) it will include this in a list at the end. This makes it much easier for us to write our accumulated sum function!

accumulatedSum :: [Int] -> [Int]
accumulatedSum inputs = scanl (+) 0 inputs

...

>> scanl (+) 0 [5, 3, 8, 11]
[0, 5, 8, 16, 27]
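The scanning function does not have to be (+), and the scan does not have to run from the left. Here are two more small sketches. The names suffixSums and runningMax are made up for this example.

```haskell
-- scanr accumulates from the right, so each position holds the sum
-- of that element and everything after it. The seed 0 ends up last.
suffixSums :: [Int] -> [Int]
suffixSums = scanr (+) 0

-- A running maximum: the first element seeds the scan, and max is
-- the scanning function. The empty list is handled separately.
runningMax :: [Int] -> [Int]
runningMax [] = []
runningMax (x : xs) = scanl max x xs
```

On the list [5, 3, 8, 11], suffixSums gives [27, 22, 19, 11, 0] and runningMax gives [5, 5, 8, 11].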

As a curiosity, you can use this pattern to provide an infinite list of all the triangle numbers:

triangleNumbers :: [Int]
triangleNumbers = scanl' (+) 0 [1..]

...

>> triangleNumbers !! 4
10
>> triangleNumbers !! 7
28

Next week we'll be back with more loop alternatives! In the meantime, you should subscribe to our newsletter so you can stay up to date with the latest news!

James Bowen 4/11/22

Two for One: Using concatMap

Today's for-loop replacement is a simpler one that combines two functions we should already know! We'll see how we can use concatMap to cover some of the basic loop cases we might encounter in other languages. This function covers the case of "every item in my list produces multiple results, and I want these results in a single new list." Let's write some C++ code that demonstrates this idea. We'll begin with a basic function that takes a single (unsigned) integer and produces a list of unsigned integers.

std::vector<uint64_t> makeBoundaries(uint64_t input) {
  if (input == 0) {
    return {0, 1, 2};
  } else if (input == 1) {
    return {0, 1, 2, 3};
  } else {
    return {input - 2, input - 1, input, input + 1, input + 2};
  }
}

This function gives the two numbers above and below our input, with a floor of 0 since the value is unsigned. Now let's suppose we want to take the boundaries of each integer in a vector of inputs, and place them all in a single list. We might end up with something like this:

std::vector<uint64_t> makeAllBoundaries(std::vector<uint64_t> inputs) {
  std::vector<uint64_t> results;
  for (uint64_t i : inputs) {
    std::vector<uint64_t> outputs = makeBoundaries(i);
    for (uint64_t o : outputs) {
      results.push_back(o);
    }
  }
  return results;
}

Here we end up with nested for loops in the same function! We can't avoid this behavior occurring. But we can avoid needing to write this into our source code in Haskell with the concatMap function:

concatMap :: (a -> [b]) -> [a] -> [b]

As the name implies, this is a combination of the map function we've already seen with an extra concatenation step. Instead of mapping a function that transforms a single item to a single item, the function now produces a list of items. But this function's "concat" step will append all the result lists for us. Here's how we could write this code in Haskell:

makeBoundaries :: Word -> [Word]
makeBoundaries 0 = [0, 1, 2]
makeBoundaries 1 = [0, 1, 2, 3]
makeBoundaries i = [i - 2, i - 1, i, i + 1, i + 2]

makeAllBoundaries :: [Word] -> [Word]
makeAllBoundaries inputs = concatMap makeBoundaries inputs

Nice and simple! Nothing in our code really looks "iterative". We're just mapping a single function over our input, with a little bit of extra magic to bring the results together. Under the hood, this function combines the existing "concat" and "map" functions, which use recursion. Ultimately, most Haskell for-loop replacements rely on recursion to create their "iterative" behavior. But it's nice that we don't always have to bring that pattern into our own code.
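To make the "concat plus map" relationship concrete, here is a short sketch with three spellings of the same loop. The names viaConcatMap, viaConcatAndMap, and viaBind are only for illustration, and makeBoundaries is repeated so the snippet stands alone.

```haskell
makeBoundaries :: Word -> [Word]
makeBoundaries 0 = [0, 1, 2]
makeBoundaries 1 = [0, 1, 2, 3]
makeBoundaries i = [i - 2, i - 1, i, i + 1, i + 2]

-- The version from the article.
viaConcatMap :: [Word] -> [Word]
viaConcatMap = concatMap makeBoundaries

-- The two steps written out separately: map first, then flatten.
viaConcatAndMap :: [Word] -> [Word]
viaConcatAndMap inputs = concat (map makeBoundaries inputs)

-- For lists, the monadic bind operator does the same job.
viaBind :: [Word] -> [Word]
viaBind inputs = inputs >>= makeBoundaries
```

All three produce [0, 1, 2, 3, 4, 5, 6, 7] for the input [0, 5]. Which one to use is mostly a matter of taste, though concatMap states the intent most directly.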

If you want to stay up to date with Haskell tips and tricks, make sure to subscribe to our monthly newsletter! We'll have some special offers coming up soon, so you won't want to miss them!