James Bowen

Data Structures: Hash Maps!

Throughout this month we've been exploring the basics of many different data structures in Haskell. I started out with a general process called 10 Steps for understanding Data Structures in Haskell. And I've now applied that process to four common structures in Haskell:

  1. Lists
  2. Sets
  3. Maps
  4. Hash Sets

Today we're taking the next logical step in the progression and looking at Hash Maps. Later this week, we'll start looking at lesser-known Haskell structures that don't fit some of the common patterns we've been seeing so far! So keep an eye on this blog page as well as the Data Structures Series page!

James Bowen

10 Steps to Understanding Data Structures in Haskell

(Skip to the steps)

Last year I completed Advent of Code, which presented a lot of interesting algorithmic challenges. One thing these problems really forced me to do was focus more clearly on using appropriate data structures, often going beyond the basics.

And looking back on that, it occurred to me that I hadn't really seen many tutorials on Haskell structures beyond lists and maps. Nor in fact, had I thought to write any myself! I've touched on sequences and arrays, but usually in the context of another topic, rather than focusing on the structure itself.

So after thinking about it, I decided it would be worthwhile to write some material providing an overview of all the different structures you can use in Haskell. So data structures will be our "blog topic" starting this month of July, and probably going into August. I'll be adding these overviews in a permanent series, so each blog post over the next few Mondays and Thursdays will link to the newest installment in that series.

But even beyond providing a basic overview of each type, I thought it would be helpful to come up with a process for learning new data structures - a process I could apply to learning any structure in any language. Going to the relevant API page can always be a little overwhelming. Where do you start? Which functions do you actually need?

So I made a list of the 10 steps you should take when learning the API for a data structure in a new language. The first of these have to do with reminding yourself what the structure is used for. Then you get down to the most important actions to get yourself started using that structure in code.

The 10 Steps

  1. What operations does it support most efficiently? (What is it good at?)
  2. What is it not as good at?
  3. How many parameters does the type use and what are their constraints?
  4. How do I initialize the structure?
  5. How do I get the size of the structure?
  6. How do I add items to the structure?
  7. How do I access (or get) elements from the structure?
  8. If possible, how do I delete elements from the structure?
  9. How do I combine two of these structures?
  10. How should I import the functions of this structure?

Each part of the series will run through these steps for a new structure, focusing on the basic API functions you'll need to know. To see my first try at using this approach where I go over the basic list type, head over to the first part of the series! As I update the series with more advanced structures I'll add more links to this post.
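As a rough illustration of how the steps translate into code, here is a minimal sketch that walks steps 4 through 10 for the basic list type, using only base. The name listTour is hypothetical and exists only for this example; it is not taken from the series itself.

```haskell
-- Step 10: the basics need no import; Data.List adds extra helpers.
import Data.List (delete)

-- A hypothetical tour of steps 4-9 for the list type.
-- Returns (size, first element, contents after a delete, combined list).
listTour :: (Int, Int, [Int], [Int])
listTour = (size, firstElem, afterDelete, combined)
  where
    initial = [2, 3, 4]              -- Step 4: initialize
    added = 1 : initial              -- Step 6: add to the front, O(1)
    size = length added              -- Step 5: size, O(n) for lists
    firstElem = added !! 0           -- Step 7: access by index, O(n) in general
    afterDelete = delete 3 added     -- Step 8: delete the first occurrence of 3
    combined = afterDelete ++ [5, 6] -- Step 9: combine two lists
```

Steps 1 through 3 are answered in prose rather than code: lists are good at adding and removing at the front, poor at indexed access, and take a single unconstrained type parameter.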

If you want to quickly see all the APIs for the structures we'll be covering, head to our eBooks page and download the FREE Data Structures at a Glance eBook!

James Bowen

A Brief Look at Asynchronous Exceptions

We've covered a lot of different exception mechanisms this month. Today we'll cover just one more concept that will serve as a teaser for some future exploration of concurrency. All the exception mechanisms we've looked at so far are serial in nature. We call a certain IO function from our main thread, and we block the program until that operation finishes. However, exceptions can also happen in an asynchronous way. This can happen in a couple different ways.

First, as we'll learn later this year, we can fork different threads for our program to run, and it is possible to raise an exception in a different thread! If you're interested in exploring how this works, the relevant functions you should learn about are forkIO and throwTo:

forkIO :: IO () -> IO ThreadId

throwTo :: (Exception e) => ThreadId -> e -> IO ()

The first will take an IO action and run it on a new thread, returning the ID of that thread. The second function then allows you to raise an exception in that thread using the ID. We'll look into the details of this function at a later date.
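To make the interplay of these two functions a little more concrete, here is a minimal sketch. The StopWorker exception and the interruptWorker function are hypothetical names invented for this illustration. The worker signals that it is ready (so we know its handler is installed), then sleeps, and the main thread interrupts it with throwTo.

```haskell
import Control.Concurrent (forkIO, threadDelay, throwTo)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (Exception, catch)

-- Hypothetical exception type used only for this illustration.
data StopWorker = StopWorker
  deriving (Show)

instance Exception StopWorker

-- Fork a worker, interrupt it with throwTo, and report what the worker saw.
interruptWorker :: IO String
interruptWorker = do
  ready <- newEmptyMVar
  done <- newEmptyMVar
  let worker = do
        putMVar ready ()          -- signal that the handler is installed
        threadDelay 10000000      -- pretend to do ten seconds of work
        putMVar done "finished normally"
      handler StopWorker = putMVar done "interrupted by StopWorker"
  tid <- forkIO (worker `catch` handler)
  takeMVar ready                  -- wait until the worker is inside catch
  throwTo tid StopWorker          -- raise the exception in the worker thread
  takeMVar done
```

Under these assumptions the worker never reaches its "finished normally" line, because the exception arrives while it is sleeping.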

However, there are also certain asynchronous exceptions that can occur no matter what we do. At any point, our program could theoretically run out of memory (StackOverflow or HeapOverflow), and the runtime will have to abort execution in an asynchronous manner. So even if we aren't specifically forking new threads, we still might encounter these kinds of exceptions.

The mask function is a special utility that prevents an action from being interrupted by an asynchronous exception.

mask :: ((forall a. IO a -> IO a) -> IO b) -> IO b

However, just from looking at the type signature, this function is rather confusing. It takes one argument, a function that takes another function as its input! As we can read in the documentation though, the most common use case for this function is to protect a resource to make sure we release it even if an exception is thrown while performing a computation with it.

Lucky for us, the bracket function already handles this case, as we've discussed. So the chances that you'll have to manually use mask are not very high.
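For the curious, here is a sketch of how the resource-protection pattern can be written by hand with mask. The names myBracket and releaseStillRuns are hypothetical; this is an illustration of the idea under those assumptions, not a replacement for the library's bracket.

```haskell
import Control.Exception (ErrorCall (..), mask, onException, throwIO, try)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- A hypothetical hand-rolled version of the bracket pattern using mask.
myBracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c
myBracket acquire release use =
  mask $ \restore -> do
    resource <- acquire
    -- Asynchronous exceptions are re-enabled only while "use" runs.
    result <- restore (use resource) `onException` release resource
    _ <- release resource
    return result

-- Usage sketch: the "use" step throws, yet "release" is still logged.
releaseStillRuns :: IO [String]
releaseStillRuns = do
  ref <- newIORef []
  let logEvent msg = modifyIORef ref (++ [msg])
  _ <- try (myBracket
              (logEvent "acquire")
              (\_ -> logEvent "release")
              (\_ -> logEvent "use" >> throwIO (ErrorCall "boom")))
         :: IO (Either ErrorCall ())
  readIORef ref
```

The function passed to mask receives restore, which is the "function that takes another function as its input" part of the type signature. Everything outside restore runs with asynchronous exceptions masked.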

This wraps up our discussion of exceptions for now on Monday Morning Haskell! We'll be sure to touch on this subject again when we get to concurrency in a few months. If you want a summary of this month's articles in your inbox, there's still time to subscribe to our monthly newsletter! You won't want to miss what's coming up next week!

James Bowen

Catching Before Production: Assert Statements in Haskell

We've spent a lot of time this month going over exceptions, which are ways to signal within our program that something unexpected has happened. These will often result in an early termination for our program even if we catch them. But by catching them, we can typically provide more helpful error messages and logs. Exceptions are intended for use in production code. You don't want them to ever go off, but they are there if they do.

However, there are other bugs that you really want to catch before they ever make it into production. You don't want to formally recognize them in the type system because other parts of the program shouldn't have to deal with those possibilities. In these cases, it is common practice for programmers to use "assert" statements instead.

We'll start with a simple example in Python. We'll write a function to adjust a price, first by subtracting and second by taking the square root. Of course, you cannot take the square root of a negative number (and prices shouldn't be negative anyways). So we'll assert that the price is non-negative before we take the root.

from math import sqrt

def adjustPrice(input):
  adjustedDown = input - 400.0
  # Fail fast if the adjustment pushed the price below zero
  assert (adjustedDown >= 0)
  return sqrt(adjustedDown)

In Haskell we also have an assert function. It looks a bit like throw in the sense that its type signature looks pure, but can actually cause an error.

assert :: Bool -> a -> a

If the boolean input is "true", nothing happens. The function will return the second input as its output. But if the boolean is false, then it will throw an exception. This is useful because it will provide us with more information about where the error occurred. So let's rewrite the above function in Haskell.

adjustPrice :: Double -> Double
adjustPrice input = assert (adjustedDown >= 0.0) (sqrt adjustedDown)
  where
    adjustedDown = input - 400.0

If we give it a bad input, we'll get a helpful error message with the file and line number where the assertion occurred:

main :: IO ()
main = do
  let result = adjustPrice 325.0
  print result

...

>> stack exec my-program

my-program: Assertion failed
CallStack (from HasCallStack):
  assert, called at src/Lib.hs in MyProgram-0.1.0.0:Lib

Without using the assert, our function would simply return NaN and continue on! It would be much harder for us to track down where the bug came from. Ideally, we would catch a case like this in unit testing. And it might indicate that our "adjustment" is too high (perhaps it should be 40.0 instead of 400.0).

For the sake of efficiency, GHC ignores assert statements when it compiles with optimizations, as is typical for executables. This is why it is imperative that you write a unit test to uncover the assertion problem. In order to run your program with assertions, you'll need to use the -fno-ignore-asserts GHC option. This is usually off for executables, but on for test suites.
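Here is one way such a unit test check might look, as a minimal sketch. The helper assertionFires is a hypothetical name; it relies on the AssertionFailed exception type from Control.Exception, and the pragma at the top keeps assertions enabled for this file even if optimizations are on.

```haskell
{-# OPTIONS_GHC -fno-ignore-asserts #-}
import Control.Exception (AssertionFailed, assert, evaluate, try)

adjustPrice :: Double -> Double
adjustPrice input = assert (adjustedDown >= 0.0) (sqrt adjustedDown)
  where
    adjustedDown = input - 400.0

-- Hypothetical test helper: True when evaluating the price trips the assertion.
assertionFires :: Double -> IO Bool
assertionFires input = do
  result <- try (evaluate (adjustPrice input)) :: IO (Either AssertionFailed Double)
  return (either (const True) (const False) result)
```

A test suite could then check that assertionFires 325.0 is True while assertionFires 500.0 is False.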

We have one more concept to talk about with exception handling, so get ready for that! If you want a summary of all the topics we talked about this month, make sure to subscribe to our monthly newsletter!

James Bowen

Resources and "Bracket"

During our focus on the IO Monad, we learned a few things about opening and closing file handles. One useful tidbit we learned from this process was the "bracket" pattern. This pattern allows us to manage the acquisition and release of resources in our system. The IO monad is very often concerned with external resources, whether files on our filesystem or operating system resources like thread locks and process IDs.

The general rule behind these kinds of resources is that we do not want our program to be in a state where they are unreachable by code. Another way of saying this is that any code that acquires a resource must make sure to release it. For the example of file handles, we can acquire the resource with openFile and release it with hClose.

processFile :: FilePath -> IO ()
processFile fp = do
  -- Acquire resource
  fileHandle <- openFile fp ReadMode
  -- ... Do something with the file
  -- Release the resource
  hClose fileHandle

Now we might want to call this function with an exception handler so that our program doesn't crash if we encounter a serious problem:

main :: IO ()
main = do
  handle ioHandler (processFile "my_file.txt")
  ...
  where
    ioHandler :: IOError -> IO ()
    ioHandler e = putStr "Handled Exception: " >> print e

However, this error handler doesn't have access to the file handle. So it can't actually ensure the handle gets closed. So if our error occurs during the "do something" part of the function, this file handle will still be open.

But now suppose we need to do a second operation on this file that appends to it. If we still have a "Read Mode" handle open, we're not going to be able to open it for appending. So if our handler doesn't close the file, we'll encounter a potentially unnecessary error.

main :: IO ()
main = do
  handle ioHandler (processFile "my_file.txt")
  -- Might fail!
  appendOperation "my_file.txt"
  where
    ioHandler :: IOError -> IO ()
    ioHandler e = putStr "Handled Exception: " >> print e

The solution to this problem is to use the "bracket" pattern of resource usage. Under this pattern, our IO operation has 3 stages:

  1. Acquire the resource
  2. Use the resource
  3. Release the resource

The bracket function has three input arguments for these stages, though the order is 1 -> 3 -> 2:

bracket
  :: IO a -- 1. Acquire the resource
  -> (a -> IO b) -- 3. Release the resource
  -> (a -> IO c) -- 2. Use the resource
  -> IO c -- Final result

Let's add some semantic clarity to this type:

bracket
  :: IO resource -- 1. Acquire
  -> (resource -> IO extra) -- 3. Release
  -> (resource -> IO result) -- 2. Use
  -> IO result -- Result

The "resource" is often a Handle object, thread ID, or an object representing a concurrent lock. The "extra" type is usually the unit (). Most operations that release resources have no essential return value. So no other part of our computation takes this "extra" type as an input.

Now, even if an exception is raised by our operation, the "release" part of the code will be run. So if we rewrite our code in the following way, the file handle will get closed and we'll be able to perform the append operation, even if an exception occurs:

processFile :: FilePath -> IO ()
processFile fp = bracket
  (openFile fp ReadMode) -- 1. Acquire resource
  hClose -- 3. Release resource
  (\fileHandle -> do ...) -- 2. Use resource

main :: IO ()
main = do
  handle ioHandler (processFile "my_file.txt")
  appendOperation "my_file.txt"
  where
    ioHandler :: IOError -> IO ()
    ioHandler e = putStr "Handled Exception: " >> print e

As we discussed in May, the withFile helper does this for you automatically.

Now there are a few different variations you can use with bracket. If you don't need the resource as part of your computation, you can use bracket_ with an underscore. This follows the pattern of other monad functions like mapM_ and forM_ where the underscore indicates we don't use part of the result of a computation.

bracket_
  :: IO resource -- 1. Acquire
  -> (IO extra) -- 3. Release
  -> (IO result) -- 2. Use
  -> IO result -- Result

If you're passing around some kind of global manager object of data and state, this function may simplify your code.

There is also bracketOnError. This will only run the "release" action if an error is encountered. If the "do something" step succeeds, the release step is skipped. So it might apply more if you're trying to use this function as an alternative to handle.

bracketOnError
  :: IO resource -- 1. Acquire
  -> (resource -> IO extra) -- 3. Release if error
  -> (resource -> IO result) -- 2. Use the resource
  -> IO result -- Result

The last example is finally.

finally
  :: IO result
  -> IO extra
  -> IO result

This function is less directly related to resource management. It simply specifies a second action (returning "extra") that will be run once the primary computation ("result") is done, no matter if it succeeds or fails with an exception.

This might remind us of the pattern in other languages of "try/catch/finally". In Haskell, bracket will give you the behavior of the "finally" concept. But if you don't need the resource acquisition step, then you use the finally function.
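To see how these variations differ in practice, here is a small sketch that records the order of events in an IORef instead of touching a real resource. The helpers logEvent, useStep, eventsOnError, and eventsFinally are hypothetical names made up for this illustration.

```haskell
import Control.Exception (ErrorCall (..), bracketOnError, finally, throwIO, try)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Record an event name in a log (a stand-in for real resource handling).
logEvent :: IORef [String] -> String -> IO ()
logEvent ref msg = modifyIORef ref (++ [msg])

-- A stand-in "use" step that optionally fails.
useStep :: IORef [String] -> Bool -> IO ()
useStep ref shouldFail = do
  logEvent ref "use"
  if shouldFail then throwIO (ErrorCall "boom") else return ()

-- Events recorded when the use step runs under bracketOnError.
eventsOnError :: Bool -> IO [String]
eventsOnError shouldFail = do
  ref <- newIORef []
  _ <- try (bracketOnError
              (logEvent ref "acquire")
              (\_ -> logEvent ref "release")
              (\_ -> useStep ref shouldFail))
         :: IO (Either ErrorCall ())
  readIORef ref

-- Events recorded when the use step runs under finally.
eventsFinally :: Bool -> IO [String]
eventsFinally shouldFail = do
  ref <- newIORef []
  _ <- try (useStep ref shouldFail `finally` logEvent ref "cleanup")
         :: IO (Either ErrorCall ())
  readIORef ref
```

With bracketOnError the "release" event only shows up when the use step fails, while with finally the "cleanup" event shows up in both cases.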

Hopefully these tricks are helping you to write cleaner Haskell code. We just have a couple more exception patterns to go over this month. If you think you've missed something, you can scroll back on the blog. But you can also subscribe to our mailing list to get a summary at the end of every month!

James Bowen

Further Filtering

In our previous article we explored the functions catchJust and handleJust. These allow us to do some more specific filtering on the exceptions we're trying to catch. With catch and handle, we'll catch all exceptions of a particular type. And in many cases, especially where we have pre-defined our own error type, this is a useful behavior.

However, we can also consider cases with built-in error types, like IOError. There are a lot of different IO errors our program could throw. And sometimes, we only want to catch a few of them.

Let's consider just two of these examples. In both actions of this function, we'll open a file and print its first line. But in the first example, the file itself does not exist. In the second example, the file exists, but we'll also open the file in "Write Mode" while we're reading it.

main :: IO ()
main = do
  action1
  action2
  where
    action1 = do
      h <- openFile "does_not_exist.txt" ReadMode
      firstLine <- hGetLine h
      putStrLn firstLine
      hClose h
    action2 = do
      h <- openFile "does_exist.txt" ReadMode
      firstLine <- hGetLine h
      putStrLn firstLine
      h2 <- openFile "does_exist.txt" WriteMode
      hPutStrLn h2 "Hello World"
      hClose h

These will cause two different kinds of IOError. But we can catch them both with a handler function:

main :: IO ()
main = do
  handle handler action1
  handle handler action2
  where
    action1 = do
      h <- openFile "does_not_exist.txt" ReadMode
      firstLine <- hGetLine h
      putStrLn firstLine
      hClose h
    action2 = do
      h <- openFile "does_exist.txt" ReadMode
      firstLine <- hGetLine h
      putStrLn firstLine
      h2 <- openFile "does_exist.txt" WriteMode
      hPutStrLn h2 "Hello World"
      hClose h

    handler :: IOError -> IO ()
    handler e = print e

And now we can run this and see both errors are printed.

>> stack exec my-program
does_not_exist.txt: openFile: does not exist (No such file or directory)
First line
does_exist.txt: openFile: resource busy (file is locked)

But suppose we only anticipated our program encountering the "does not exist" error. We don't expect a "resource busy" error, so we want our program to crash if it happens so we are forced to fix it. We need to filter the error types and use handleJust instead.

Luckily, there are many predicates on IOErrors like isDoesNotExistError. We can use this to write our own predicate:

-- Library function
isDoesNotExistError :: IOError -> Bool

-- Our predicate
filterIO :: IOError -> Maybe IOError
filterIO e = if isDoesNotExistError e
  then Just e
  else Nothing

Now let's quickly recall the type signatures of catchJust and handleJust:

catchJust :: Exception e =>
  (e -> Maybe b) ->
  IO a ->
  (b -> IO a) ->
  IO a

handleJust :: Exception e =>
  (e -> Maybe b) ->
  (b -> IO a) ->
  IO a ->
  IO a

We can rewrite our function now so it only captures the "does not exist" error. We'll add the predicate, and use it with handleJust, along with our existing handler.

main :: IO ()
main = do
  handleJust filterIO handler action1
  handleJust filterIO handler action2
  where
    action1 = do
      h <- openFile "does_not_exist.txt" ReadMode
      firstLine <- hGetLine h
      putStrLn firstLine
      hClose h
    action2 = do
      h <- openFile "does_exist.txt" ReadMode
      firstLine <- hGetLine h
      putStrLn firstLine
      h2 <- openFile "does_exist.txt" WriteMode
      hPutStrLn h2 "Hello World"
      hClose h

    handler :: IOError -> IO ()
    handler e = putStr "Caught error: " >> print e

    filterIO :: IOError -> Maybe IOError
    filterIO e = if isDoesNotExistError e
      then Just e
      else Nothing

When we run the program, we'll see that the first error is caught. We see our custom message "Caught error" instead of the program name. But in the second instance, our program crashes!

>> stack exec my-program
Caught error: does_not_exist.txt: openFile: does not exist (No such file or directory)
First line
my-program: does_exist.txt: openFile: resource busy (file is locked)
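If you want to experiment with this filtering without creating or locking real files, one option is to build the errors yourself with mkIOError from System.IO.Error. The sketch below does that; missingFile, busyFile, recover, and escapes are hypothetical names for this illustration, and the simulated errors only approximate what openFile would throw.

```haskell
import Control.Exception (handleJust, throwIO, try)
import System.IO.Error
  ( alreadyInUseErrorType
  , doesNotExistErrorType
  , isDoesNotExistError
  , mkIOError
  )

-- Same predicate as in the article: keep only "does not exist" errors.
filterIO :: IOError -> Maybe IOError
filterIO e = if isDoesNotExistError e
  then Just e
  else Nothing

-- Simulated failures (assumptions standing in for real openFile calls).
missingFile, busyFile :: IO String
missingFile = throwIO (mkIOError doesNotExistErrorType "openFile" Nothing (Just "does_not_exist.txt"))
busyFile = throwIO (mkIOError alreadyInUseErrorType "openFile" Nothing (Just "does_exist.txt"))

-- Recover only from the errors that pass the filter.
recover :: IO String -> IO String
recover = handleJust filterIO (\_ -> return "recovered")

-- Hypothetical helper: True when an IOError gets past the filter.
escapes :: IO String -> IO Bool
escapes action = do
  result <- try (recover action) :: IO (Either IOError String)
  return (either (const True) (const False) result)
```

Here recover missingFile yields "recovered", while the simulated "resource busy" error passes straight through handleJust, just as it does in the program output.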

Hopefully this provides you with a clear and practical example of how you can use these filtering functions for handling your errors! Next time, we'll take a deeper look at the "bracket" pattern. We touched on this during IO month, but it's an important concept, and there are more helper functions we can incorporate with it! So make sure to stop back here later in the week. And also make sure to subscribe to our monthly newsletter if you haven't already!

James Bowen

Just Catching a Few Things

We've now seen a few of the different nuances in handling exceptions within our code. Earlier this month we learned about the "catch" and "handle" functions, which are the backbone of capturing exceptions in our code. And then last time around we saw the importance of how these catch particular types of exceptions.

Today we'll go over a new pair of handling functions. These allow us to narrow down the range of exceptions we'll handle, rather than catching every exception of a particular type. These functions are catchJust, and its flipped counterpart, handleJust. Here are the type signatures:

catchJust :: Exception e =>
  (e -> Maybe b) ->
  IO a ->
  (b -> IO a) ->
  IO a

handleJust :: Exception e =>
  (e -> Maybe b) ->
  (b -> IO a) ->
  IO a ->
  IO a

The defining feature of these handler functions is the filter predicate at the start: the first argument of type e -> Maybe b. This takes the input exception type and returns a Maybe value. That Maybe value can be some transformation on the exception input.

Let's make a simple example using our ListException type. Let's recall what this type looks like:

data ListException =
  ListIsEmpty String |
  NotEnoughElements String Int Int
  deriving (Show)

instance Exception ListException

As a simple example, let's write a predicate that will only capture our ListIsEmpty exception. It will return the name of the function causing the error.

isEmptyList :: ListException -> Maybe String
isEmptyList (ListIsEmpty functionName) = Just functionName
isEmptyList _ = Nothing

Now we'll write a function that will process a list and print its first element. But if it is empty, it will print the name of the function. This will use catchJust.

printFirst :: [Int] -> IO Int
printFirst input = catchJust isEmptyList action handler
  where
    action :: IO Int
    action = do
      let result = myHead input
      print result
      return result

    handler :: String -> IO Int
    handler functionName = do
      putStrLn $ "Caught Empty List exception from function: " ++ functionName ++ ". Returning 0!"
      print 0
      return 0

Now when we run this, we'll see the error message we expect:

main :: IO ()
main = do
  result1 <- printFirst []
  result2 <- printFirst [2, 3, 4]
  print $ result1 + result2


...

Caught Empty List exception from function: myHead. Returning 0!
0
2
2

But if we change the function to use "sum2Pairs" instead (which throws NotEnoughElements, rather than ListIsEmpty), then we'll still see the exception!

sum2Pairs :: (Num a) => [a] -> (a, a)
sum2Pairs (a : b : c : d : _) = (a + b, c + d)
sum2Pairs input = throw (NotEnoughElements "sum2Pairs" 4 (length input))

newMain :: IO ()
newMain = do
  result1 <- printSums []
  result2 <- printSums [2, 3, 4, 5]
  print $ (result1, result2)

...

>> stack exec my-program
my-program: NotEnoughElements "sum2Pairs" 4 0

We can modify the predicate so that it always catches exceptions from a particular function and gives different error messages depending on the exception thrown:

isSum2Pairs :: ListException -> Maybe ListException
isSum2Pairs e@(ListIsEmpty function) = if function == "sum2Pairs'"
  then Just e
  else Nothing
isSum2Pairs e@(NotEnoughElements function _ _) = if function == "sum2Pairs'"
  then Just e
  else Nothing

Now let's modify sum2Pairs so that it can throw either error type, depending on its input:

sum2Pairs' :: (Num a) => [a] -> (a, a)
sum2Pairs' (a : b : c : d : _) = (a + b, c + d)
sum2Pairs' [] = throw (ListIsEmpty "sum2Pairs'")
sum2Pairs' input = throw (NotEnoughElements "sum2Pairs'" 4 (length input))

When we use this updated version in our main function, we'll see we get a variety of outputs!

printSums' :: [Int] -> IO (Int, Int)
printSums' input = catchJust isSum2Pairs action handler
  where
    action :: IO (Int, Int)
    action = do
      let result = sum2Pairs' input
      print result
      return result

    handler :: ListException -> IO (Int, Int)
    handler e = do
      putStrLn $ "Caught exception: " ++ show e ++ ". Returning (0, 0)!"
      print (0, 0)
      return (0, 0)

newMain :: IO ()
newMain = do
  result1 <- printSums' []
  result2 <- printSums' [2, 3, 4]
  print $ (result1, result2)
...

>> stack exec my-program
Caught exception: ListIsEmpty "sum2Pairs'". Returning (0, 0)!
(0,0)
Caught exception: NotEnoughElements "sum2Pairs'" 4 3. Returning (0, 0)!
(0,0)
((0,0),(0,0))
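The same filtering idea also works with exception types that ship with base. As a minimal sketch, here is catchJust applied to ArithException so that only DivideByZero is handled. The names onlyDivideByZero and safeDiv are hypothetical and used only for this illustration.

```haskell
import Control.Exception (ArithException (DivideByZero), catchJust, evaluate)

-- Predicate: select only DivideByZero, ignoring other arithmetic exceptions.
onlyDivideByZero :: ArithException -> Maybe ()
onlyDivideByZero DivideByZero = Just ()
onlyDivideByZero _ = Nothing

-- Hypothetical helper: integer division that falls back to 0 on a zero divisor.
safeDiv :: Int -> Int -> IO Int
safeDiv x y = catchJust onlyDivideByZero (evaluate (x `div` y)) (\_ -> return 0)
```

Any other ArithException, such as an overflow, would not match the predicate and would propagate as usual.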

Next time, we'll look at a more practical usage of this approach with IO Errors! Until then, make sure you subscribe to our monthly newsletter so you can stay up to date with the latest news!

James Bowen

Exception Type Details

A couple articles ago, we defined a basic exception type. Today, we'll go over some more details behind the way these exception types work. We'll consider how one might catch all exceptions, but also why this might not be a good idea.

Here's how we defined our exception type:

data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show)

instance Exception ListException

This indicates two different kinds of failures we might have when trying to process a list in a function. As long as we define or derive a Show instance, we can simply say instance Exception, and we'll be able to treat this type as an exception, because the class has no required methods.

So far, our example is a simple enumeration. But of course it's also possible to add data to these exception constructors. Let's suppose we want to know what function triggered the failure in the first type, and how many elements we expected and observed in the second type. Let's also define a custom Show instance.

data ListException =
  ListIsEmpty String |
  NotEnoughElements String Int Int

instance Show ListException where
  show (ListIsEmpty function) = "The function '" ++ function ++ "' requires a non-empty list!"
  show (NotEnoughElements function expected observed) =
    "The function '" ++ function ++ "' expected " ++ show expected ++ " elements but only got " ++
    show observed ++ " elements."

Now we can rewrite our functions to add this information.

myHead :: [a] -> a
myHead [] = throw (ListIsEmpty "myHead")
myHead (a : _) = a

sum2Pairs :: (Num a) => [a] -> (a, a)
sum2Pairs (a : b : c : d : _) = (a + b, c + d)
sum2Pairs input = throw (NotEnoughElements "sum2Pairs" 4 (length input))

And we can see this in action:

main :: IO ()
main = do
  result0 <- try (evaluate (myHead [])) :: IO (Either ListException Int)
  print result0
  result1 <- try (evaluate (sum2Pairs [2, 3, 4])) :: IO (Either ListException (Int, Int))
  print result1
  result2 <- try (evaluate (sum2Pairs [2, 3, 4, 5])) :: IO (Either ListException (Int, Int))
  print result2

...

>> stack exec my-program
Left The function 'myHead' requires a non-empty list!
Left The function 'sum2Pairs' expected 4 elements but only got 3 elements.
Right (5,9)

Now we didn't have to implement any custom functions to make our type an exception. But if we wanted to, we could! There are three functions we can override, but they all have appropriate default behaviors. The first of these functions is displayException. You can use it to provide a second way to display the exception beyond the Show instance, if you desire that for whatever reason. However, the Show instance still has priority when the error is thrown by the system.

Let's try keeping the derived instance of Show, but use our new function as the display message.

data ListException =
  ListIsEmpty String |
  NotEnoughElements String Int Int
  deriving (Show)

instance Exception ListException where
  displayException (ListIsEmpty function) = "The function '" ++ function ++ "' requires a non-empty list!"
  displayException (NotEnoughElements function expected observed) =
    "The function '" ++ function ++ "' expected " ++ show expected ++ " elements but only got " ++
    show observed ++ " elements."

We'll find that our program uses the Show instance.

main :: IO ()
main = do
  return (myHead ([] :: [Int])) >>= print

...

>> stack exec my-program
my-program: ListIsEmpty "myHead"
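To compare the two renderings side by side, here is a small self-contained sketch. The helper bothRenderings is a hypothetical name; it simply pairs the derived show output with the custom displayException output for the same value.

```haskell
import Control.Exception (Exception (..))

data ListException
  = ListIsEmpty String
  | NotEnoughElements String Int Int
  deriving (Show)

instance Exception ListException where
  displayException (ListIsEmpty function) =
    "The function '" ++ function ++ "' requires a non-empty list!"
  displayException (NotEnoughElements function expected observed) =
    "The function '" ++ function ++ "' expected " ++ show expected
      ++ " elements but only got " ++ show observed ++ " elements."

-- Both renderings of the same value, for comparison.
bothRenderings :: ListException -> (String, String)
bothRenderings e = (show e, displayException e)
```

Calling displayException yourself, for example inside a handler that logs the error, is how you get the friendlier message.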

The other two functions in the definition require us to learn an additional concept: SomeException.

class Exception e where
  toException :: e -> SomeException
  fromException :: SomeException -> Maybe e

The type SomeException is essentially a wrapper type for all exceptions in Haskell. When the system receives and throws your exception, it is always wrapped as SomeException under the hood. So in a way, this acts like the base Exception class in a language like Java or Python. However, it acts like a wrapper instead of a "parent" class due to the lack of type-based inheritance in Haskell.

data SomeException = forall e. Exception e => SomeException e

The two functions above would allow you to override how you transform your exception type back and forth with the SomeException type. However, there's rarely any reason to override this behavior.
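As a quick sketch of the default behavior, we can wrap one of our exceptions with toException and then try to unwrap it as two different types. This example uses the simple enumeration version of ListException, and the names wrapped, asListException, and asArithException are hypothetical.

```haskell
import Control.Exception (ArithException, Exception (..), SomeException)

data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show, Eq)

instance Exception ListException

-- Wrap our exception in the general SomeException type.
wrapped :: SomeException
wrapped = toException NotEnoughElements

-- Unwrapping at the original type succeeds.
asListException :: Maybe ListException
asListException = fromException wrapped

-- Unwrapping at an unrelated exception type gives Nothing.
asArithException :: Maybe ArithException
asArithException = fromException wrapped
```

This type-directed unwrapping is what lets a handler for ListException ignore an IOError and vice versa.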

Now, since every exception is SomeException, this means we could catch every possible exception with a handler function. Let's recall our previous example where we could catch a ListException but not a file-based IO exception for opening a non-existent file:

main :: IO ()
main = do
  handle handler $ readFile "does_not_exist.txt" >>= print
  handle handler $ do
    result <- return (sum2Pairs [2, 3, 4])
    print result
  where
    handler :: ListException -> IO ()
    handler e = print e

...

>> stack exec my-program
my-program: does_not_exist.txt: openFile: does not exist (No such file or directory)

If we modify our handler to take SomeException instead of ListException, it will catch both types!

main :: IO ()
main = do
  ...
  where
    handler :: SomeException -> IO ()
    handler e = print e

...

>> stack exec my-program
does_not_exist.txt: openFile: does not exist (No such file or directory)
NotEnoughElements "sum2Pairs" 4 3

Typically, this is not a great idea. Haskell's type system allows us to be very specific with the errors we can catch, and we should take advantage of that. If you aren't anticipating a particular error, you shouldn't catch it. And if it pops up, you should adjust your program accordingly. However, catching "any" exception, logging it, and exiting gracefully as we just did IS a reasonable use case mentioned in the documentation on this subject.

The "proper" way to handle multiple exception types is to daisy chain handle calls with different types like so:

main :: IO ()
main = handle ioHandler $ handle listHandler $ do
  readFile "does_not_exist.txt" >>= print 
  result <- return (sum2Pairs [2, 3, 4])
  print result
  where
    listHandler :: ListException -> IO ()
    listHandler e = putStrLn $ "List exception: " ++ show e

    ioHandler :: IOError -> IO ()
    ioHandler e = putStrLn $ "IO Error: " ++ show e
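As an alternative to chaining handle calls, Control.Exception also provides catches, which takes a list of Handler values. Here is a minimal sketch of that approach, swapped in for the daisy chain. It uses the simple enumeration version of ListException, and describeFailure, listFailure, and ioFailure are hypothetical names for this illustration.

```haskell
import Control.Exception (Exception, Handler (..), catches, throwIO)

data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show)

instance Exception ListException

-- Run an action and describe which handler, if any, fired.
describeFailure :: IO () -> IO String
describeFailure action =
  (action >> return "no exception")
    `catches` [ Handler (\e -> return ("List exception: " ++ show (e :: ListException)))
              , Handler (\e -> return ("IO Error: " ++ show (e :: IOError)))
              ]

-- Sample failing actions (stand-ins for real work).
listFailure, ioFailure :: IO ()
listFailure = throwIO NotEnoughElements
ioFailure = ioError (userError "disk full")
```

One difference worth knowing: with nested handle calls, an exception thrown inside the inner handler can still be caught by the outer one, whereas with catches the handlers do not cover each other.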

In the next couple articles, we'll explore more ways to catch errors. Stay tuned! If you want access to our subscriber resources, you can sign up for our monthly newsletter!

James Bowen

"Try"-ing It Out First

Earlier this week we explored how to "catch" exceptions using the functions catch and handle. Today we'll learn a couple new tools for this task. The first function we'll look at is try, but in order to really use it, we'll also have to use evaluate.

Like catch, we can use try to turn our exception into a computation that our program can process and react to gracefully. However, instead of taking an exception handler, this function will simply return the exception using an Either value.

try :: Exception e => IO a -> IO (Either e a)

The computation produces the result type a, but could throw an exception e. So we return the type Either e a. All this must be done in the IO monad, like we saw with catch.

Let's recall our previous approach to catching exceptions. Since we had a pure function (sum2Pairs) that could throw the exception, we would use return in order to move it into the IO monad to use catch. We also needed an explicit type signature on our handler function so that our program knows what exceptions it is trying to catch:

main :: IO ()
main = do
  catch (return (sum2Pairs [2, 3, 4]) >>= print) handler
  catch (return (sum2Pairs [2, 3, 4, 5]) >>= print) handler
  where
    handler :: ListException -> IO ()
    handler e = print e

Let's try to substitute try in for these expressions. Once again, we'll explicitly annotate the resulting value with the exception type.

main :: IO ()
main = do
  result1 <- try (return (sum2Pairs [2, 3, 4])) :: IO (Either ListException (Int, Int))
  print result1
  result2 <- try (return (sum2Pairs [2, 3, 4, 5])) :: IO (Either ListException (Int, Int))
  print result2

However, this doesn't work the way we want! Our program crashes on the exceptional case!

my-program: NotEnoughElements

The reason for this lies in Haskell's laziness. The exceptional computation doesn't actually occur until we "need" the value, which is when the print statement happens. But by delaying the computation, our program loses the try context. We can try to wrap the print statement into our "try" block, but it makes our program unnecessarily complicated.

Instead, we have a different tool to help us. This is the evaluate function.

evaluate :: a -> IO a

At first glance, this seems to be the same type as return! It takes a pure value and wraps it in the IO monad. However, it also takes care of "evaluating" our expression strictly (to weak head normal form) when the IO action runs. So the computation occurs inside the "try" block, where its exception can be caught. If we change our above implementation by swapping evaluate for return, then it works!

main :: IO ()
main = do
  result1 <- try (evaluate (sum2Pairs [2, 3, 4])) :: IO (Either ListException (Int, Int))
  print result1
  result2 <- try (evaluate (sum2Pairs [2, 3, 4, 5])) :: IO (Either ListException (Int, Int))
  print result2

...

Left NotEnoughElements
Right (5,9)
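
To see the whole pattern in one place, here is a minimal, self-contained sketch. The helper name tryPure is hypothetical (it is not part of Control.Exception), and sum2Pairs is specialized to Int to keep the annotations short. The last case is a reminder that evaluate only forces the outermost constructor, so an exception buried inside a tuple field is not caught at this point.

```haskell
import Control.Exception (Exception, evaluate, throw, try)

data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show, Eq)

instance Exception ListException

sum2Pairs :: [Int] -> (Int, Int)
sum2Pairs (a : b : c : d : _) = (a + b, c + d)
sum2Pairs _ = throw NotEnoughElements

-- Hypothetical helper: force a pure value inside 'try' so that a
-- ListException thrown while evaluating it becomes a Left.
tryPure :: a -> IO (Either ListException a)
tryPure = try . evaluate

main :: IO ()
main = do
  tryPure (sum2Pairs [2, 3, 4]) >>= print     -- Left NotEnoughElements
  tryPure (sum2Pairs [2, 3, 4, 5]) >>= print  -- Right (5,9)
  -- 'evaluate' stops at the tuple constructor, so the throw hidden in the
  -- second field is never forced here and we still get a Right.
  lazyResult <- tryPure (1 :: Int, throw ListIsEmpty :: Int)
  print (fmap fst lazyResult)                 -- Right 1
```

If you need the fields themselves checked, they have to be forced inside the try block as well, for example by printing there.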

So now we have a more reliable way of turning our "pure" computations into expressions where we can catch their exceptions. In the next couple articles, we'll focus some more on what we can do with exceptional data types. Until then, make sure to subscribe to our monthly newsletter so you can stay up to date with the latest news and get access to our subscriber resources!

James Bowen

Catching What We’ve Thrown

Last week we learned how to throw exceptions in Haskell. In the next couple articles, we're going to learn how to "catch" them, so that in exceptional circumstances we can still proceed with our program in a sane way.

Now, throwing exceptions disrupted our patterns of type safety quite a bit. We could throw an exception from any piece of seemingly pure code. Even our simple function from a list to an element of that list could invoke throw:

data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show)

instance Exception ListException

myHead :: [a] -> a
myHead [] = throw ListIsEmpty
myHead (a : _) = a

sum2Pairs :: (Num a) => [a] -> (a, a)
sum2Pairs (a : b : c : d : _) = (a + b, c + d)
sum2Pairs _ = throw NotEnoughElements

Unlike throwing exceptions though, we can only "catch" exceptions in the IO monad. As we discussed last month, the IO monad involves a lot of operations to communicate with the outside world, and so it is the most "impure" of monads. Part of this impurity is that we can "intercept" exception signals that are sent to the operating system.

The first function we'll go over this time for catching exceptions is, well, catch. Here's its type signature:

catch :: (Exception e)
  => IO a
  -> (e -> IO a)
  -> IO a

It takes an IO action we would like to perform and then a "handler" for a particular kind of exception that can occur. The handler takes the exception as an input and then produces a new IO action with the same return value. Here's how we can use it in our example:

main :: IO ()
main = do
  catch (return (sum2Pairs [2, 3, 4]) >>= print) handler
  catch (return (sum2Pairs [2, 3, 4, 5]) >>= print) handler
  where
    handler :: ListException -> IO ()
    handler e = print e

...

>> stack exec my-program
NotEnoughElements
(5,9)

Notice we need to wrap our pure computation sum2Pairs in the IO monad using return to catch its exception. Then we need to make it so our handler function returns the same type. In this case, we make that type () and just print the results.

Two final notes. First, the function handle is the same as catch except its arguments are reversed.

handle :: (Exception e)
  => (e -> IO a)
  -> IO a
  -> IO a

This can make for cleaner code in our example. We can put our handler function first and use do-syntax for the computation itself. This is good with lengthier examples.

main :: IO ()
main = do
  handle handler $ do
    result <- return (sum2Pairs [2, 3, 4])
    print result
  handle handler $ do
    result <- return (sum2Pairs [2, 3, 4, 5])
    print result
  where
    handler :: ListException -> IO ()
    handler e = print e

Second, our handler will only catch exceptions that match the type of the handler! We specified the handler as a separate expression with its own type signature because you need to specify what the type is! It wouldn't work to just inline this definition, because GHC would complain about an ambiguous type. So for example, if we opened a nonexistent file, our handler would not catch this, and the program would crash:

main :: IO ()
main = do
  handle handler $ readFile "does_not_exist.txt" >>= print
  handle handler $ do
    result <- return (sum2Pairs [2, 3, 4, 5])
    print result
  where
    handler :: ListException -> IO ()
    handler e = print e

...

>> stack exec my-program
my-program: does_not_exist.txt: openFile: does not exist (No such file or directory)
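
One way to cover both failure modes is to stack two handlers, each with its own exception type. The sketch below is only an illustration: runGuarded is a hypothetical helper, sum2Pairs is specialized to Int, and IOException is the type of the error readFile raises when the file is missing.

```haskell
import Control.Exception (Exception, IOException, handle, throw)

data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show)

instance Exception ListException

sum2Pairs :: [Int] -> (Int, Int)
sum2Pairs (a : b : c : d : _) = (a + b, c + d)
sum2Pairs _ = throw NotEnoughElements

-- Hypothetical helper: run an action and report which handler fired, if any.
runGuarded :: IO () -> IO String
runGuarded action =
  handle ioHandler $ handle listHandler $ action >> return "no exception"
  where
    listHandler :: ListException -> IO String
    listHandler e = return ("List exception: " ++ show e)

    ioHandler :: IOException -> IO String
    ioHandler e = return ("IO error: " ++ show e)

main :: IO ()
main = do
  runGuarded (readFile "does_not_exist.txt" >>= putStr) >>= putStrLn
  runGuarded (print (sum2Pairs [2, 3, 4])) >>= putStrLn
  runGuarded (print (sum2Pairs [2, 3, 4, 5])) >>= putStrLn
```

Each handler still only sees its own type, so an exception of a third type would pass through both of them.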

It is possible to catch all exceptions, but this is not advisable, as the documentation says. We'll go into more details about that possibility later.

For now, you should check out one of our useful resources for whatever stage of your Haskell journey you are at! If you're just starting out, our Beginners Checklist will help you out. If you're looking to incorporate exceptions into a larger project, try out our production checklist for some more suggestions of libraries to use!

James Bowen

Throwing Exceptions: The Basics

Haskell is a pure, functional, strongly typed language. Unfortunately, this doesn't mean that nothing ever goes wrong or that there are no runtime errors. However, we can still use the type system in a few different ways to denote the specific problems that can occur. In the ideal case of error handling, I see an analogy to the state monad. Haskell "doesn't have mutable state". Except really it does…you just have to specify that mutable state is possible by placing your function in the State monad. Similarly, if we use particular functions, we often find that their types indicate the possibility that errors could arise in the computation.

The blog topic for June is "exceptional cases", so we're going to explore a wide variety of different ways that we can indicate runtime problems in Haskell and, more importantly, how we can write our code to catch these problems so our program doesn't suddenly crash in an unexpected way.

To start this journey, let's learn about "Exceptions" and how to throw them. A language like Java will have a class to represent the idea of exceptions:

class Exception {
  ...
}

This would serve as the base for other exception types. So you might define your own, like a "File" exception:

class FileException extends Exception {
}

Of course Haskell doesn't have classes or use inheritance in the same way. When it comes to inheritance, we rely on typeclasses. So Exception is a typeclass, not a data type.

class (Typeable e, Show e) => Exception e where
  ...

Notice that an exception type must be "showable". This makes sense, since an uncaught exception gets printed to the screen! It must also be Typeable, but virtually any type you'll make fulfills this constraint without you needing to even specify it.

There isn't a minimum definition for the Exception class. This means it is easy to define your own exception type. So as a first example, let's define an exception to work with lists. Certain list operations expect the list is non-empty, or that it has at least a certain number of elements. So we'll make an enumerated type with two constructors.

data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show)

We can derive the Show class, but we can't actually derive Exception under normal circumstances. However, since we don't need any functions, we just make a trivial instance.

data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show)

instance Exception ListException

So what can we do with exceptions? Well the most important thing is that we can "throw" them to indicate the error has occurred. The throw function has a strange type if you look up the documentation:

throw :: forall (r :: RuntimeRep). forall (a :: TYPE r). forall e. Exception e => e -> a

This is a bit confusing, but to build a basic understanding, we can just look at the last part:

throw :: forall e. Exception e => e -> a

If we have an exception, we can use "throw" to trigger that exception and return any type. The a can be anything we want! All the magic stuff in the type signature essentially allows us to return this exception as "any type".

So for example, we can define a couple functions to operate on lists. These will have the "happy path" where we have enough elements, but they'll also have a failure mode. In the failure mode we'll throw the exception.

myHead :: [a] -> a
myHead [] = throw ListIsEmpty
myHead (a : _) = a

sum2Pairs :: (Num a) => [a] -> (a, a)
sum2Pairs (a : b : c : d : _) = (a + b, c + d)
sum2Pairs _ = throw NotEnoughElements

And when we use these functions, we can see how the exceptions occur:

>> myHead [4, 5]
4
>> myHead []
*** Exception: ListIsEmpty
>> sum2Pairs [5, 6, 7, 8, 9, 10]
(11,15)
>> sum2Pairs [4, 5, 6]
*** Exception: NotEnoughElements

So even though our functions return different types, we can still use throw with our exception type on both of them.

You might also notice that our functions have pure type signatures! So using throw by itself in this way violates our notion of what pure functions ought to do. It's necessary to have this escape hatch in certain circumstances. However, we really want to avoid writing our code in this way if we possibly can.
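
As a point of comparison, here is a sketch of the same two functions with the failure recorded in the return type instead of thrown. This is not the approach used in the rest of this article. The names safeHead and safeSum2Pairs are made up for illustration, and ListException is reused as a plain error value.

```haskell
data ListException = ListIsEmpty | NotEnoughElements
  deriving (Show, Eq)

-- The possible failure is now visible in the type signature.
safeHead :: [a] -> Either ListException a
safeHead [] = Left ListIsEmpty
safeHead (a : _) = Right a

safeSum2Pairs :: (Num a) => [a] -> Either ListException (a, a)
safeSum2Pairs (a : b : c : d : _) = Right (a + b, c + d)
safeSum2Pairs _ = Left NotEnoughElements

main :: IO ()
main = do
  print (safeHead [4, 5 :: Int])
  print (safeHead ([] :: [Int]))
  print (safeSum2Pairs [5, 6, 7, 8, 9, 10 :: Int])
  print (safeSum2Pairs [4, 5, 6 :: Int])
```

The trade-off is that every caller now has to unwrap the Either, which is exactly the information the throwing versions hide.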

In the coming weeks, we'll examine how to "catch" these kinds of exceptions so that our code still has some semblance of purity. To stay up to date with the latest Haskell news, make sure to subscribe to our monthly newsletter! This will keep you informed and, even better, give you access to our subscriber resources!

James Bowen

Unit Testing User Interactions

To round out our month of IO, I'd like to bring together several of the topics I've mentioned over the course of the month. A few weeks ago when talking about the interact function, I brought up the example of a command line program that would allow the user to enter simple addition expressions and print out the answer. Then going back to the first article this month, I mentioned how we can use the Handle abstraction to write a program that could work with either terminal input or file input so that we can test it. And finally, we can go all the way back to Monads month for some information on lifting functions and creating our own monad.

Today we're going to combine all these ideas! We'll have a simple command line program that will use a custom monad to abstract away input details, and then write some tests for it!

Let's write this program in a test-driven way. What are the use cases we want? Well each time a user enters a line on the terminal, we'll treat that as an expression to evaluate, and then print the solution.

-- Input
4 + 5
6 + -2

-- Output
9
4

If they enter something that doesn't follow our simple equation format, it should print an appropriate message:

-- Input
4 + 5 + 6
3 +
Hello + Goodbye
4 * 5

-- Output
There are too many parts! Please enter something in the format "x + y"
There are too few parts! Please enter something in the format "x + y"
It doesn't look like those are numbers!
Please only use addition!

And last of all, the program should be able to "recover". So if the user has one incorrect line, they can still enter in another equation and it should work.

-- Input
6 +
9 + 14

-- Output
There are too few parts! Please enter something in the format "x + y"
23

So how will we write this program in a way that we can test it? The key idea is that we'll create a monad that stores the "Handles" we're working with, and then we'll be able to customize it. So let's create a monad type that has a Reader over our input and output handles.

data AppConfig = AppConfig
  { inHandle :: Handle
  , outHandle :: Handle
  }

newtype AppMonad a = AppMonad (ReaderT AppConfig IO a)
  deriving (Functor, Applicative, Monad)

We can start with some simple instances for MonadIO and MonadReader, as well as a "run" function.

instance MonadIO AppMonad where
  liftIO = AppMonad . lift

instance MonadReader AppConfig AppMonad where
  ask = AppMonad ask
  local f (AppMonad a) = AppMonad (local f a)

runApp :: AppMonad a -> (Handle, Handle) -> IO a
runApp (AppMonad action) (inH, outH) = runReaderT action (AppConfig inH outH)

Now we can write some functions that will read and write using our handles.

appGetLine :: AppMonad String
appGetLine = do
  inH <- asks inHandle
  liftIO $ hGetLine inH

appPutStrLn :: String -> AppMonad ()
appPutStrLn output = do
  outH <- asks outHandle
  liftIO $ hPutStrLn outH output

appIsEOF :: AppMonad Bool
appIsEOF = do
  inH <- asks inHandle
  liftIO $ hIsEOF inH

Now let's write the core logic function for our program, taking the line of input and producing a line of output:

evalLine :: String -> String
evalLine input = case splitOn " " input of
  [first, op, second] -> if op /= "+"
    then "Please only use addition!"
    else case (readMaybe first, readMaybe second) of
      (Just x, Just y) -> show (x + y)
      _ -> "It doesn't look like those are numbers!"
  (first : op : second : other : _) -> "There are too many parts! Please enter something in the format \"x + y\""
  _ -> "There are too few parts! Please enter something in the format \"x + y\""
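
Because evalLine is pure, we can already check it against the use cases above without touching any handles. Here's a minimal sketch of that check. To depend only on base, it swaps splitOn " " for words (an assumption that changes the behavior for repeated spaces) and pins the numbers to Int.

```haskell
import Text.Read (readMaybe)

-- Same logic as evalLine above, but using 'words' from the Prelude in
-- place of 'splitOn " "' so that the sketch needs no extra packages.
evalLine :: String -> String
evalLine input = case words input of
  [first, op, second] -> if op /= "+"
    then "Please only use addition!"
    else case (readMaybe first, readMaybe second) :: (Maybe Int, Maybe Int) of
      (Just x, Just y) -> show (x + y)
      _ -> "It doesn't look like those are numbers!"
  (_ : _ : _ : _ : _) -> "There are too many parts! Please enter something in the format \"x + y\""
  _ -> "There are too few parts! Please enter something in the format \"x + y\""

-- Print one result per sample line, in the same order as the use cases.
main :: IO ()
main = mapM_ (putStrLn . evalLine)
  [ "4 + 5"
  , "6 + -2"
  , "4 + 5 + 6"
  , "3 +"
  , "Hello + Goodbye"
  , "4 * 5"
  ]
```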

And now it's straightforward to write the input/output loop:

runCLI :: AppMonad ()
runCLI = go
  where
    go = do
      ended <- appIsEOF
      if ended
        then return ()
        else do
          input <- appGetLine
          let output = evalLine input
          appPutStrLn output
          go

Finally, in the "main" function, we just need to call runApp with the standard handles:

main :: IO ()
main = runApp runCLI (stdin, stdout)

In our testing code, we can then write a function that takes two file paths: a file containing our input, and a file containing our expected output. It will create an input handle from the first file, and create a temporary file (remember that concept?) for our program's output handle.

testCLIProgram :: FilePath -> FilePath -> Assertion
testCLIProgram inputFile expectedOutputFile = do
  currentDir <- getCurrentDirectory
  inH <- openFile inputFile ReadMode
  (actualOutputFile, outH) <- openTempFile currentDir "output.txt"
  ...

Then we'll run our program, which will write all its output to the temporary file. Then we'll reset the output handle to the beginning (remember it's still readable), and compare its contents to those in the expected output. If they match, our program works!

testCLIProgram :: FilePath -> FilePath -> Assertion
testCLIProgram inputFile expectedOutputFile = do
  currentDir <- getCurrentDirectory
  (actualOutputFile, outH) <- openTempFile currentDir "output.txt"
  inH <- openFile inputFile ReadMode
  runApp runCLI (inH, outH)
  hSeek outH AbsoluteSeek 0
  actualOutput <- hGetContents outH
  expectedOutput <- readFile expectedOutputFile
  actualOutput @?= expectedOutput
  hClose inH
  hClose outH
  removeFile actualOutputFile

This lists all our operations in logical order, but it still doesn't necessarily cover all the exceptional cases correctly! We might still want to use the bracket pattern to ensure file cleanup happens correctly. The "resources" we acquire are the temporary file and its handle, the input handle, and the expected output string. We want to close the handles and delete the file once everything is finished running:

testCLIProgram :: FilePath -> FilePath -> Assertion
testCLIProgram inputFile expectedOutputFile = bracket acquire release runTest
  where
    acquire :: IO (FilePath, Handle, Handle, String)
    acquire = do
      currentDir <- getCurrentDirectory
      (actualOutputFile, outH) <- openTempFile currentDir "actual_output.txt"
      inH <- openFile inputFile ReadMode
      expectedOutput <- readFile expectedOutputFile
      return (actualOutputFile, outH, inH, expectedOutput)

    release :: (FilePath, Handle, Handle, String) -> IO ()
    release (fp, outH, inH, _) = do
      hClose outH
      hClose inH
      removeFile fp

    runTest :: (FilePath, Handle, Handle, String) -> IO ()
    runTest (fp, outH, inH, expectedOutput) = do
      runApp runCLI (inH, outH)
      hSeek outH AbsoluteSeek 0
      actualOutput <- hGetContents outH
      actualOutput @?= expectedOutput

And so our "test main" can now just run the different tests as it needs to!

main :: IO ()
main = defaultMain $ testGroup "CLI Tests"
  [ testCase "App 1" (testCLIProgram "input1.txt" "output1.txt")
  , testCase "App 2" (testCLIProgram "input2.txt" "output2.txt")
  , testCase "App 3" (testCLIProgram "input3.txt" "output3.txt")
  ]

So in the course of this article, I think we managed to use at least half a dozen of our monad and IO concepts! So hopefully you are beginning to see how all these ideas build on each other and allow you to do some pretty cool things!

Next month, we'll kind of be sticking with the IO theme. But we'll start looking specifically at exceptional cases and the different ways we have to handle those more smoothly. If you want to stay up to date with all the latest topics we're covering at Monday Morning Haskell, make sure you subscribe to our monthly newsletter! If you miss a few articles over the course of the month, you'll always get a summary so you can catch up!

James Bowen

Sizing Up our Files

Earlier this week we went over some basic mechanics with regard to binary files. Today we'll look at a couple functions for dealing with file size. These are perhaps a bit more useful with binary files, but they also work with normal files, as we'll see.

The two functions are very simple. We can get the file size, and we can set the file size:

hFileSize :: Handle -> IO Integer

hSetFileSize :: Handle -> Integer -> IO ()

Getting the file size does exactly what you would expect. It gives us an integer for the number of bytes in the file. We can use this on our bitmap from last time, but also on a normal text file with the lines "First Line" through "Fourth Line".

main :: IO ()
main = do
  h1 <- openFile "pic_1.bmp" ReadMode
  h2 <- openFile "testfile.txt" ReadMode
  hFileSize h1 >>= print
  hFileSize h2 >>= print

...

822
46

Note however, that we cannot get the file size of terminal handles, since these aren't, of course, files. A potential hope would be that this would return the number of bytes we've written to standard out so far, or the (strictly read) number of bytes we get in stdin before end-of-file. But it throws an error instead:

main :: IO ()
main = do
  hFileSize stdin >>= print
  hFileSize stdout >>= print

...

<stdin>: hFileSize: inappropriate type (not a regular file)

Now setting the file size is also possible, but it's a tricky and limited operation. First of all, it will not work on a handle in ReadMode:

main :: IO ()
main = do
  h <- openFile "testfile.txt" ReadMode
  hSetFileSize h 34

...

testfile.txt: hSetFileSize: invalid argument (Invalid argument)

In ReadWriteMode however, this operation will succeed. By truncating from 46 to 34, we remove the final line "Fourth Line" from the file (don't forget the newline character!).

main :: IO ()
main = do
  h <- openFile "testfile.txt" ReadWriteMode
  hSetFileSize h 34

... (File content)

First Line
Second Line
Third Line

Setting the file size also works with WriteMode. Remember that opening a file in write mode will erase its existing contents. But we can start writing new contents to the file and then truncate later.

main :: IO ()
main = do
  h <- openFile "testfile.txt" WriteMode
  hPutStrLn h "First Line"
  hPutStrLn h "Second Line"
  hPutStrLn h "Third Line"
  hPutStrLn h "Fourth Line"
  hSetFileSize h 34

... (File content)

First Line
Second Line
Third Line
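
Here is a self-contained sketch of the same truncation that doesn't disturb testfile.txt. It writes the four lines to a temporary file (openTempFile gives us a ReadWriteMode handle), records the size before and after hSetFileSize, and then reads the file back. The function name truncateDemo and the file name template are made up for this example, and the 46-byte starting size assumes single-byte "\n" line endings.

```haskell
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Hypothetical demo: returns (size before, size after, remaining contents).
truncateDemo :: IO (Integer, Integer, String)
truncateDemo = do
  tmpDir <- getTemporaryDirectory
  (path, h) <- openTempFile tmpDir "sizes.txt"
  mapM_ (hPutStrLn h) ["First Line", "Second Line", "Third Line", "Fourth Line"]
  before <- hFileSize h
  hSetFileSize h 34          -- drop the last 12 bytes: "Fourth Line\n"
  after <- hFileSize h
  hClose h
  contents <- readFile path
  -- Force the lazy read to finish before deleting the file.
  length contents `seq` removeFile path
  return (before, after, contents)

main :: IO ()
main = do
  (before, after, contents) <- truncateDemo
  print before
  print after
  putStr contents
```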

And, as you can probably tell by now, hSetFileSize only truncates from the end of files. It can't remove content from the beginning. So with our binary file example, we could drop 48 bytes to remove one of the "lines" of the picture, but we can't use this function to remove the 54 byte header:

main :: IO ()
main = do
  h <- openFile "pic_1.bmp" ReadWriteMode
  hSetFileSize h 774

Finally, hSetFileSize can also be used to add space to a file. Of course, the space it adds will all be null characters (byte = 0). But this can still be useful in certain circumstances.

main :: IO ()
main = do
  h <- openFile "pic_1.bmp" ReadWriteMode
  hSetFileSize h 870
  inputBytes <- B.unpack <$> B.hGetContents h
  let lines = chunksOf 48 (drop 54 inputBytes)
  print (last lines)

...

[0,0,0,...]

These aren't the most common operations, but perhaps you'll find a use for them! We're almost done with our look at more obscure IO actions. If you've missed some of these articles and want a summary of this month's new material, make sure to subscribe to our monthly newsletter! You'll also get a sneak peek at what's coming next!

James Bowen

Using Binary Mode in Haskell

So far in our IO adventures, we've only been dealing with plain text files. But a lot of data isn't meant to be read as string data. Some of the most interesting and important problems in computing today are about reading image data and processing it so our programs can understand what's going on. Executable program files are also in a binary format, rather than human readable. So today, we're going to explore how IO works with binary files.

First, it's important to understand that handles have encodings, which we can retrieve using hGetEncoding. For the most part, your files will default to UTF-8.

hGetEncoding :: Handle -> IO (Maybe TextEncoding)

main :: IO ()
main = do
  hGetEncoding stdin >>= print
  hGetEncoding stdout >>= print
  h <- openFile "testfile.txt" ReadMode
  hGetEncoding h >>= print

...

Just UTF-8
Just UTF-8
Just UTF-8

There are other encodings of course, like char8, latin1, and utf16. These are different ways of turning text into bytes, and each TextEncoding expression refers to one of these. If you know you have a file written in UTF16, you can change the encoding using hSetEncoding:

hSetEncoding :: Handle -> TextEncoding -> IO ()

main :: IO ()
main = do
  h <- openFile "myutf16file.txt" ReadMode
  hSetEncoding h utf16
  myString <- hGetLine h
  ...

But now notice that hGetEncoding returns a Maybe value. For binary files, there is no encoding! We are only allowed to read raw data. You can set a file to read as binary by using hSetBinaryMode True, or by just using openBinaryFile.

hSetBinaryMode :: Handle -> Bool -> IO ()

openBinaryFile :: FilePath -> IOMode -> IO Handle

main :: IO ()
main = do
  h <- openBinaryFile "pic_1.bmp" ReadMode
  ...

When it comes to processing binary data, it is best to parse your input into a ByteString rather than a string. Using the unpack function will then allow you to operate on the raw list of bytes:

import qualified Data.ByteString as B
import System.IO

main :: IO ()
main = do
  h <- openBinaryFile "pic_1.bmp" ReadMode
  inputBytes <- B.unpack <$> B.hGetContents h
  print $ length inputBytes

In this example, I've opened up an image file, and converted its data into a list of bytes (using the Word8 type).

Further processing of the image will require some knowledge of the image format. As a basic example, I made a 24-bit bitmap with horizontal stripes throughout. The size was 16 pixels by 16 pixels. With 3 bytes (24 bits) per pixel, the total size of the "image" would be 768 bytes. So then upon seeing that my program above printed "822", I could figure out that the first 54 bytes were just header data.

I could then separate my data into "lines" (48-byte chunks) and I successfully observed that each of these chunks followed a specific pattern. Many lines were all white (the only value was 255), and other lines had three repeating values.

import Control.Monad (forM_)
import qualified Data.ByteString as B
import Data.List.Split (chunksOf)
import System.IO

main :: IO ()
main = do
  h <- openBinaryFile "pic_1.bmp" ReadMode
  inputBytes <- B.unpack <$> B.hGetContents h
  let lines = chunksOf 48 (drop 54 inputBytes)
  forM_ lines print

...

[255, 255, 255, ...]
[36, 28, 237, 36, 28, 237, ...]
[255, 255, 255, ...]
[76, 177, 34, 76, 177, 34, ...]
[255, 255, 255, ...]
[36, 28, 237, 36, 28, 237, ...]
[255, 255, 255, ...]
[76, 177, 34, 76, 177, 34, ...]
[255, 255, 255, ...]
[0, 242, 255, 0, 242, 255, ...]
[255, 255, 255, ...]
[232, 162, 0, 232, 162, 0, ...]
[255, 255, 255, ...]
[0, 242, 255, 0, 242, 255, ...]
[255, 255, 255, ...]
[232, 162, 0, 232, 162, 0, ...]
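
To double-check the arithmetic, here's a sketch that builds a stand-in for the file in memory rather than reading pic_1.bmp. Everything here is hypothetical: fakeBitmap is just 54 zero bytes followed by 16 rows of 48 bytes, and chunksOf' is a local replacement for chunksOf from Data.List.Split. The fixed 54-byte header and unpadded rows are assumptions that hold for this particular image, not for bitmaps in general.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

headerSize :: Int
headerSize = 54

rowSize :: Int
rowSize = 16 * 3  -- 16 pixels per row, 3 bytes per pixel

-- Local stand-in for Data.List.Split.chunksOf.
chunksOf' :: Int -> [a] -> [[a]]
chunksOf' _ [] = []
chunksOf' n xs = chunk : chunksOf' n rest
  where
    (chunk, rest) = splitAt n xs

-- Made-up data: a zeroed header, then alternating all-255 and all-0 rows.
fakeBitmap :: B.ByteString
fakeBitmap = B.pack (replicate headerSize 0 ++ concatMap row [0 .. 15 :: Int])
  where
    row :: Int -> [Word8]
    row i = replicate rowSize (if even i then 255 else 0)

-- Skip the header and split the remaining bytes into rows.
pixelRows :: B.ByteString -> [[Word8]]
pixelRows = chunksOf' rowSize . drop headerSize . B.unpack

main :: IO ()
main = do
  print (B.length fakeBitmap)                                -- 822
  print (length (pixelRows fakeBitmap))                      -- 16
  print (all ((== rowSize) . length) (pixelRows fakeBitmap)) -- True
```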

Now that the data is broken into simple numbers, it would be possible to run many kinds of mathematical algorithms on it if there were some interesting data to process!

In our last couple of IO articles, we'll keep looking at some issues with binary data. If you want monthly summaries of what we're writing here at Monday Morning Haskell, make sure to subscribe to our monthly newsletter! This will also give you access to our subscriber resources!

James Bowen

Interactive IO

Today we'll continue our study of IO by looking at an interactive IO program. In this kind of program, the user will enter commands continuously on the command line to interact with our program. The fun part is that we'll find a use for a lesser-known library function called, well, interact!

Imagine you're writing a command line program where you want the user to keep entering input lines, and you do some kind of processing for each line. The most simple example would be an echo program, where we simply repeat the user's input back out to them:

>> Hello
Hello
>> Goodbye
Goodbye

A naive approach to writing this in Haskell would use recursion like so:

main :: IO ()
main = go
  where
    go = do
      input <- getLine
      putStrLn input
      go

However, there's no terminal condition on this loop. It keeps expecting to read a new line. Our only way to end the program is with "ctrl+C". Typically, the cleaner way to end a program is to use the input "ctrl+D" instead, which is the "end of file" character. However, this example will not end elegantly if we do that:

>> Hello
Hello
>> Goodbye
Goodbye
>> (ctrl+D)
<stdin>: hGetLine: end of file

What's happening here is that getLine will throw this error when it reads the "end of file" character. In order to fix this, we can use these helper functions.

hIsEOF :: Handle -> IO Bool

-- Specialized to stdin
isEOF :: IO Bool

These give us a boolean that indicates whether we have reached the "end of file" as our input. The first works for any file handle and the second tells us about the stdin handle. If it returns false, then we are safe to proceed with getLine. So here's how we would rewrite our program:

main :: IO ()
main = go
  where
    go = do
      ended <- isEOF
      if ended
        then return ()
        else do
          input <- getLine
          putStrLn input
          go

Now we won't get that error message when we enter "ctrl+D".

But for these specific problems, there's another tool we can turn to, and this is the "interact" function:

interact :: (String -> String) -> IO ()

The function we supply simply takes an input string and determines what string should be output as a result. It handles all the messiness of looping for us. So we could write our echo program very simply like so:

main :: IO ()
main = interact id

...

>> Hello
Hello
>> Goodbye
Goodbye
>> Ctrl+D

Or if we're a tiny bit more ambitious, we can capitalize each of the user's entries:

main :: IO ()
main = interact (map toUpper)

...

>> Hello
HELLO
>> Goodbye
GOODBYE
>> Ctrl+D

The function is a little tricky though, because the String -> String function is actually about taking the whole input string and returning the whole output string. The fact that it works line-by-line with simple functions is an interesting consequence of Haskell's laziness.

However, because the function is taking the whole input string, you can also write your function so that it breaks the input into lines and does a processing function on each line. Here's what that would look like:

processSingleLine :: String -> String
processSingleLine = map toUpper

processString :: String -> String
processString input = result
  where
    ls = lines input
    result = unlines (map processSingleLine ls)

main :: IO ()
main = interact processString
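
Since processString is an ordinary String -> String function, we can also try it on a multi-line string directly, without a terminal. A small sketch (the sample input is made up):

```haskell
import Data.Char (toUpper)

processSingleLine :: String -> String
processSingleLine = map toUpper

-- Split the whole input into lines, process each one, and join them again.
processString :: String -> String
processString = unlines . map processSingleLine . lines

main :: IO ()
main = putStr (processString "Hello\nGoodbye\n")
```

This is the same function we would hand to interact, which is what makes this style convenient to test.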

For our uppercase and id examples, this works the same way. But this would be the only proper way to write our program if we wanted to, for example, parse a simple equation on each line and print the result:

processSimpleAddition :: String -> String
processSimpleAddition input = case splitOn " " input of
  [num1, _, num2] -> show (read num1 + read num2)
  _ -> "Invalid input!"

processString :: String -> String
processString input = result
  where
    ls = lines input
    result = unlines (map processSimpleAddition ls)

main :: IO ()
main = interact processString

...

>> 4 + 5
9
>> 3 + 2
5
>> Hello
Invalid input!

So hIsEOF and interact are just a couple more tools you can add to your arsenal to simplify some of these common types of programs. If you're enjoying these blog posts, make sure to subscribe to our monthly newsletter! This will keep you up to date with our newest posts AND give you access to our subscriber resources!

James Bowen

Buffering...Please Wait...

Today we continue our exploration of more obscure IO concepts with the idea of buffering. Buffering determines the more precise mechanics of how our program reads and writes with files. In the right circumstance, using the proper buffering method can make your program work a lot more efficiently.

To start, let's consider the different options Haskell offers us. The BufferMode type has three options:

data BufferMode =
  NoBuffering |
  LineBuffering |
  BlockBuffering (Maybe Int)

Every handle has an assigned buffering mode. We can get and set this value using the appropriate functions:

hGetBuffering :: Handle -> IO BufferMode

hSetBuffering :: Handle -> BufferMode -> IO ()

By default, file handles will use BlockBuffering. Terminal handles reported NoBuffering when I ran this, though the default is implementation-dependent and terminals are often line-buffered:

main :: IO ()
main = do
  hGetBuffering stdin >>= print
  hGetBuffering stdout >>= print
  (openFile "myfile.txt" ReadMode) >>= hGetBuffering >>= print
  (openFile "myfile2.txt" WriteMode) >>= hGetBuffering >>= print

...

NoBuffering
NoBuffering
BlockBuffering Nothing
BlockBuffering Nothing

So far this seems like some nice trivia to know, but what do these terms actually mean?

Well, when your program reads and writes to files, it doesn't do the "writing" at the exact time you expect. When your program executes hPutStr or hPutStrLn, the given string will be added to the handle's buffer, but depending on the mode, it won't immediately be written out to the file.

If you use NoBuffering though, it will be written immediately. Once the buffer has even a single character, it will write this character to the file. If you use LineBuffering, it will wait until it encounters a newline character.

Finally, there is BlockBuffering. This constructor holds an optional number. The buffer won't write until it contains the given number of bytes. If the value is Nothing, then the underlying number just depends on the operating system.

This idea might sound dangerous to you. Does this mean that it's likely that your program will just leave data unwritten if it doesn't get the right amount? Well no. You can also flush buffers, which will cause them to write their information out no matter what. This happens automatically on important operations like hClose (remember to close your handles!). You can also do this manually with the hFlush function:

hFlush :: Handle -> IO ()
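To make the effect of each mode and of hFlush visible, here is a small sketch that writes some text and checks how many bytes have reached the file before and after flushing. The function name sizesAroundFlush and the file name flush-demo.txt are hypothetical, and I'm assuming the directory package (for getFileSize) is available.

```haskell
import Control.Monad (forM_)
import System.Directory (getFileSize, getTemporaryDirectory, removeFile)
import System.IO

-- Write one full line plus a partial line under the given mode, and report
-- the size of the file on disk before and after an explicit hFlush.
sizesAroundFlush :: BufferMode -> IO (Integer, Integer)
sizesAroundFlush mode = do
  dir <- getTemporaryDirectory
  (path, h) <- openTempFile dir "flush-demo.txt"
  hSetBuffering h mode
  hPutStr h "first line\nsecond"
  before <- getFileSize path -- bytes that have reached the file so far
  hFlush h                   -- force out whatever is still in the buffer
  after <- getFileSize path
  hClose h
  removeFile path
  return (before, after)

main :: IO ()
main =
  forM_ [NoBuffering, LineBuffering, BlockBuffering Nothing] $ \mode -> do
    sizes <- sizesAroundFlush mode
    putStrLn (show mode ++ ": " ++ show sizes)
```

I would expect NoBuffering to show the same size before and after, LineBuffering to show only the first line before the flush, and BlockBuffering Nothing to show little or nothing before the flush. The exact byte counts can differ by platform, since newlines are encoded differently on Windows.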

For the most part, you won't notice the difference in buffer modes on normal programs. But under certain circumstances, it can make a big difference in performance. The act of writing information to a file is actually a very long and expensive operation as far as programs are concerned. So doing fewer writes with larger amounts of data tends to be more efficient than doing more writes with smaller amounts of data.

Hopefully you can see now why BlockBuffering is an option. Typically, this is the most efficient way if you're writing a large amount of data, while NoBuffering is the least efficient.

To test this out, I wrote a simple program to write out one hundred thousand numbers to a file, and timed it with different buffer modes:

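import Control.Monad (forM_)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.IO

someFunc :: IO ()
someFunc = do
  let numbers = [1..100000]
  h <- openFile "number.txt" WriteMode
  -- Change this line to try the other buffer modes
  hSetBuffering h NoBuffering
  timestamp1 <- getCurrentTime
  forM_ numbers (hPrint h)
  -- Closing the handle flushes anything still in the buffer
  hClose h
  timestamp2 <- getCurrentTime
  print $ diffUTCTime timestamp2 timestamp1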

When running with NoBuffering, this operation took almost a full second: 0.93938s. However, when I changed to LineBuffering, it dropped to 0.2367s. Finally, with BlockBuffering Nothing, I got a blazing fast 0.05473s. That's around 17x faster than NoBuffering! So if you're writing a large amount of data to a file, this can make a definite difference!
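To compare all three modes in one run, the measurement could be wrapped in a function that takes the mode as a parameter. This is only a sketch: timeWrites is a hypothetical name, it writes to a temporary file rather than number.txt, and the timings will vary from machine to machine.

```haskell
import Control.Monad (forM_)
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Time how long it takes to write the numbers 1..n under the given mode.
-- The clock stops after hClose, so the final flush is included.
timeWrites :: BufferMode -> Int -> IO NominalDiffTime
timeWrites mode n = do
  dir <- getTemporaryDirectory
  (path, h) <- openTempFile dir "number.txt"
  hSetBuffering h mode
  start <- getCurrentTime
  forM_ [1 .. n] (hPrint h)
  hClose h
  end <- getCurrentTime
  removeFile path
  return (diffUTCTime end start)

main :: IO ()
main =
  forM_ [NoBuffering, LineBuffering, BlockBuffering Nothing] $ \mode -> do
    elapsed <- timeWrites mode 100000
    putStrLn (show mode ++ ": " ++ show elapsed)
```

A single run per mode is a rough measurement. For anything more serious, repeating each run several times would give a fairer comparison.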

If you're writing a program where write-performance is important, I hope this knowledge helps you! Even if not, it's good to know what kinds of things are happening under the hood. If you want to keep up to date with more Haskell knowledge, both obscure and obvious, make sure to subscribe to our monthly newsletter! If you're just starting out, this will give you access to resources like our Beginners Checklist and Recursion Workbook!

James Bowen

Using Temporary Files

In the last article we learned about seeking. Today we'll see another context where we can use these tools while learning about another new idea: temporary files.

Our "new" function for today is openTempFile. Its type signature looks like this:

openTempFile :: FilePath -> String -> IO (FilePath, Handle)

The first argument is the directory in which to create the file. The second is a "template" for the file name. The template can look like a normal file name, like name.extension. The file that actually gets created will have some random digits inserted between the name and the extension. For example, we might get name1207-5.extension.

The function creates the file and opens it in ReadWriteMode. Its two outputs are the full path to the new file and its handle.

Despite the name openTempFile, this function won't do anything to delete the file when it's done. You'll still have to do that yourself. However, it does have some useful built-in mechanics. It is guaranteed to not overwrite an existing file on the system, and it also gives limited file permissions so it can't be used by an attacker.
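Since the cleanup is up to us, one way to avoid forgetting it is to wrap openTempFile with bracket. The sketch below is an assumption about how you might structure this, not something System.IO provides; the helper name withScratchFile and the file name scratch.txt are hypothetical.

```haskell
import Control.Exception (bracket)
import System.Directory (doesFileExist, getTemporaryDirectory, removeFile)
import System.IO

-- Hypothetical helper (not part of System.IO): open a temporary file, run
-- an action on it, then close the handle and delete the file, even if the
-- action throws an exception.
withScratchFile :: FilePath -> String -> (FilePath -> Handle -> IO a) -> IO a
withScratchFile dir template action =
  bracket
    (openTempFile dir template)
    (\(path, h) -> hClose h >> removeFile path)
    (uncurry action)

main :: IO ()
main = do
  dir <- getTemporaryDirectory
  path <- withScratchFile dir "scratch.txt" $ \path h -> do
    hPutStrLn h "intermediate data"
    return path
  -- The file was removed when the action finished
  stillThere <- doesFileExist path
  print stillThere
```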

How might we use such a file? Well, let's suppose we have some calculation that we break into multiple stages, so that it uses an intermediate file in between. As a contrived example, let's suppose we have two functions: one that writes Fibonacci numbers to a file, and another that takes the sum of the numbers in a file. We'll have both of these operate on a pre-existing Handle object:

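-- System.Directory provides removeFile, which main uses further down
import System.Directory (removeFile)
import System.IO

-- Write the Fibonacci numbers F(0) through F(n), one per line
writeFib :: Integer -> Handle -> IO ()
writeFib n handle = writeNum (0, 1) 0
  where
    writeNum :: (Integer, Integer) -> Integer -> IO ()
    writeNum (a, b) x = if x > n then return ()
      else hPutStrLn handle (show a) >> writeNum (b, a + b) (x + 1)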

sumNumbers :: Handle -> IO Integer
sumNumbers handle = do
  hSeek handle AbsoluteSeek 0
  nums <- (fmap read . lines) <$> hGetContents handle
  return $ sum nums

Notice how we "seek" to the beginning of the file in our reading function. This means we can use the same handle for both operations, assuming the handle was opened in ReadWriteMode. So let's see how we put this together with openTempFile:

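main :: IO ()
main = do
  n <- read <$> getLine
  -- The directory /tmp/fib must already exist
  (file, handle) <- openTempFile "/tmp/fib" "calculations.txt"
  writeFib n handle
  total <- sumNumbers handle
  -- Print before closing, so the lazy read happens while the handle is open
  print total
  hClose handle
  removeFile file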

A couple of notes here. First, if the directory passed to openTempFile doesn't exist, this will cause an error. We also need to print the sum before closing the handle. Because hGetContents is lazy, nothing is read until the sum is actually demanded, and by then the handle would already be closed!
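If you'd rather not depend on the order of printing and closing, one option is to force the sum inside the reading function. The sketch below is a variation of my own, not the version used above: sumNumbersStrict is a hypothetical name, and it uses evaluate from Control.Exception to compute the sum before returning.

```haskell
import Control.Exception (evaluate)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Like sumNumbers, but forces the sum before returning, so the whole file
-- has been read by the time the caller gets the result.
sumNumbersStrict :: Handle -> IO Integer
sumNumbersStrict handle = do
  hSeek handle AbsoluteSeek 0
  contents <- hGetContents handle
  evaluate (sum (map read (lines contents)))

main :: IO ()
main = do
  dir <- getTemporaryDirectory
  (path, h) <- openTempFile dir "numbers.txt"
  hPutStr h "1\n2\n3\n"
  total <- sumNumbersStrict h
  hClose h    -- safe: the sum has already been computed
  removeFile path
  print total -- prints 6
```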

But aside from these caveats, our function works! If we don't remove the file, then we'll be able to see the file at a location like /tmp/fib/calculations6132-6.txt.

This example doesn't necessarily demonstrate why we would use openTempFile instead of just giving the file the name calculations.txt. The answer to that is our process is now safer with respect to concurrency. We could run this same operation on different threads in parallel, and there would be no file conflicts. We'll see exactly how to do that later this year!
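As a small check of the no-overwrite guarantee, the sketch below calls openTempFile twice with the same template and compares the resulting paths. The helper name twoTempPaths is hypothetical, and it uses the system temporary directory rather than /tmp/fib so that the directory is sure to exist.

```haskell
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Open two temporary files from the same template, clean both up, and
-- return the paths that were chosen for them.
twoTempPaths :: IO (FilePath, FilePath)
twoTempPaths = do
  dir <- getTemporaryDirectory
  (path1, h1) <- openTempFile dir "calculations.txt"
  (path2, h2) <- openTempFile dir "calculations.txt"
  mapM_ hClose [h1, h2]
  mapM_ removeFile [path1, path2]
  return (path1, path2)

main :: IO ()
main = do
  (path1, path2) <- twoTempPaths
  putStrLn path1
  putStrLn path2
  -- The two calls never hand back the same file
  print (path1 /= path2)
```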

For now, make sure you're subscribed to our monthly newsletter so that you can stay up to date with all the latest information and offers! If you're already subscribed, take a look at our subscriber resources that can help you improve your Haskell!
