Effectful Loops: Sequence and MapM

We've covered a lot of different ways to run loop behavior in Haskell, but all of them operate in a "pure" way. That is, none of them use monadic behavior. While folding and scanning provide us with a basic mechanism for tracking stateful computations, you sometimes have more complicated problems where the stateful object has more layers. And none of the functions we've seen so far allow IO activity, like printing to the console or accessing the file system.

So to motivate this example, let's imagine we're parsing several different files, each containing a set of names. We would like to read each of these files and create a combined list. Here's what this might look like in C++ with a for-loop:

std::vector<std::string> readSingleFile(std::string filename) { ... }

std::vector<std::string> readNamesFromFiles(std::vector<std::string> filenames) {
  std::vector<std::string> results;
  for (const auto& file : filenames) {
    auto namesFromThisFile = readSingleFile(file);
    // std::copy also works well
    for (const auto& name : namesFromThisFile) {
      results.push_back(name);
    }
  }
  return results;
}

This looks a lot like a map problem in Haskell (or really, concatMap). However, we have a problem. The function we would like to map must be an IO function!

readSingleFile :: FilePath -> IO [String]
readSingleFile fp = lines <$> readFile fp

This means that if we try to use map with this function and a list of FilePaths, we won't get a list of lists that we can immediately concat. And in fact, we won't even get an IO object containing the list of lists! The type will actually be [IO [String]], a list of IO actions which each return a list of strings!

>> let filepaths = [...]
>> let results = map readSingleFile filePaths
>> :t results
[IO [String]]

By itself, this doesn't seem to help us achieve our goal. But there are a couple helpers that can get the rest of the way. One function is sequence. This takes a list of monadic actions and runs them back to back, collecting the results! This function generalizes beyond lists, but we'll just think about the type signature using lists for right now.

sequence :: (Monad m) => [m a] -> m [a]

If we imagine our list of monadic actions, this function essentially acts as though it is running them all together in do syntax and returning a list of the results.

sequence :: (Monad m) => [m a] -> m [a]
sequence [action1, action2, action3, ...] = do
  result1 <- action1
  result2 <- action2
  result3 <- action3
  ...
  return [result1, result2, result3, ...]

So we could apply this function against our previous output, and then concat the results within the IO object using fmap.

readSingleFile :: FilePath -> IO [String]

readNamesFromFiles :: [FilePath] -> IO [String]
readNamesFromFiles files =
  (fmap concat) $ sequence (map readSingleFile files)

But there's an even simpler way to do this! There's also the function mapM, which essentially combines map and sequence:

mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]

It's type signature is like the original map, but instead it takes a function in a monadic action and produces a single monadic action. This allows us to simplify our solution:

readSingleFile :: FilePath -> IO [String]

readNamesFromFiles :: [FilePath] -> IO [String]
readNamesFromFiles files =
  (fmap concat) $ mapM readSingleFile files

Since mapping works so much like a for-loop, there is even the function forM. This is the same as mapM, except that its arguments are reversed, so the list comes first.

forM :: (Monad m) => [a] -> (a -> m b) -> m [b]

Each of these functions also has an equivalent underscored function, which discards the result. These can be useful when you're only interested in the monadic side effect of the function, rather than the return value.

sequence_ :: (Monad m) => [m a] -> m ()

mapM_ :: (Monad m) => (a -> m b) -> [a] -> m ()

forM_ :: (Monad m) => [a] -> (a -> m b) -> m ()

Here's an example where we'll want that. Suppose our files might contain duplicated names, and we want to discard duplicates. This means we'll want a Set instead of a list. Our C++ code doesn't change much:

std::vector<std::string> readSingleFile(std::string filename) { ... }

std::set<std::string> readNamesFromFiles(std::vector<std::string> filenames) {
  std::set<std::string> results;
  for (const auto& file : filenames) {
    auto namesFromThisFile = readSingleFile(file);
    for (const auto& name : namesFromThisFile) {
      results.insert(name);
    }
  }
  return results;
}

We can change things up in our Haskell code though! Instead of post-processing the list of lists afterward, we'll keep track of the growing set using the State monad!

readSingleFile :: FilePath -> StateT (Set.Set String) IO ()
readSingleFile fp = do
  names <- lift (lines <$> readFile fp)
  prevSet <- get
  put $ foldr Set.insert prevSet names

readNamesFromFiles :: [FilePath] -> IO [String]
readNamesFromFiles filenames = Set.toList <$> execStateT
  (mapM_ readSingleFile filenames) Set.empty

Notice how this uses the execStateT shortcut we talked about last month! Generally speaking, combining the State monad and mapM provides a fully generalizable way to write for-loop code. It allows you to incorporate IO as needed, and it allows you to track any kind of state you would like. Sometimes though, it'll be much more convenient to use traverse, which we'll talk about in our final article this month!

If you're enjoying learning about for loops in Haskell, you should sign up for our monthly newsletter! This will keep you up to date with any new content that comes out and you'll get access to our subscriber resources!

Previous
Previous

Traverse: Fully Generalized Loops

Next
Next

What about While Loops? Unfolding in Haskell