Making Arrays Mutable!

sorting_array.jpg

Last week we walked through the process of refactoring our code to use Data.Array instead of Data.Map. But in the process, we introduced a big inefficiency! When we use the Array.// function to "update" our array, it has to create a completely new copy of the array! For various reasons, Map doesn't have to do this.

So how can we fix this problem? The answer is to use the MArray interface, for mutable arrays. With mutable arrays, we can modify them in-place, without a copy. This results in code that is much more efficient. This week, we'll explore the modifications we can make to our code to allow this. You can see a quick summary of all the changes in this Git Commit.

Refactoring code can seem like an hard process, but it's actually quite easy with Haskell! In this article, we'll use the idea of "Compile Driven Development". With this process, we update our types and then let compiler errors show us all the changes we need. To learn more about this, and other Haskell paradigms, read our Haskell Brain series!

Mutable Arrays

To start with, let's address the seeming contradiction of having mutable data in an immutable language. We'll be working with the IOArray type in this article. An item of type IOArray acts like a pointer, similar to an IORef. And this pointer is, in fact, immutable! We can't make it point to a different spot in memory. But we can change the underlying data at this memory. But to do so, we'll need a monad that allows such side effects.

In our case, with IOArray, we'll use the IO monad. This is also possible with the ST monad. But the specific interface functions we'll use (which are possible with either option) live in the MArray library. There are four in particular we're concerned with:

freeze :: (Ix i, MArray a e m, IArray b e) => a i e -> m (b i e)

thaw :: (Ix i, IArray a e, MArray b e m) => a i e -> m (b i e)

readArray :: (MArray a e m, Ix i) => a i e -> i -> m e

writeArray :: (MArray a e m, Ix i) => a i e -> i -> e -> m ()

The first two are conversion functions between normal, immutable arrays and mutable arrays. Freezing turns the array immutable, thawing makes it mutable. The second two are our replacements for Array.! and Array.// when reading and updating the array. There are a lot of typeclass constraints in these. So let's simplify them by substituting in the types we'll use:

freeze
  :: IOArray Location CellBoundaries 
  -> IO (Array Location CellBoundaries)

thaw 
  :: Array Location CellBoundaries 
  -> IO (IOArray Location CellBoundaries)

readArray
  :: IOArray Location CellBoundaries 
  -> Location 
  -> IO CellBoundaries

writeArray
  :: IOArray Location CellBoundaries
  -> Location
  -> CellBoundaries
  -> IO ()

Obviously, we'll need to add the IO monad into our code at some point. Let's see how this works.

Basic Changes

We won't need to change how the main World type uses the array. We'll only be changing how the SearchState stores it. So let's go ahead and change that type:

type MMaze = IA.IOArray Location CellBoundaries

data SearchState = SearchState
  { randomGen :: StdGen
  , locationStack :: [Location]
  , currentBoundaries :: MMaze
  , visitedCells :: Set.Set Location
  }

The first issue is that we should now pass a mutable array to our initial search state. We'll use the same initialBounds item, except we'll thaw it first to get a mutable version. Then we'll construct the state and pass it along to our search function. At the end, we'll freeze the resulting state. All this involves making our generation function live in the IO monad:

-- This did not have IO before!
generateRandomMaze :: StdGen -> (Int, Int) -> IO Maze
generateRandomMaze gen (numRows, numColumns) = do
  initialMutableBounds <- IA.thaw initialBounds
  let initialState = SearchState 
                       g2
                       [(startX, startY)]
                       initialMutableBounds
                       Set.empty
  let finalBounds = currentBoundaries
                      (execState dfsSearch initialState)
  IA.freeze finalBounds
  where
    (startX, g1) = …
    (startY, g2) = …

    initialBounds :: Maze
    initialBounds = …

This seems to "solve" our issues in this function and push all our errors into dfsSearch. But it should be obvious that we need a fundamental change there. We'll need the IO monad to make array updates. So the type signatures of all our search functions need to change. In particular, we want to combine monads with StateT SearchState IO. Then we'll make any "pure" functions use IO instead.

dfsSearch :: StateT SearchState IO ()

findCandidates :: Location -> Maze -> Set.Set Location
  -> IO [(Location, CellBoundaries, Location, CellBoundaries)]

chooseCandidate
  :: [(Location, CellBoundaries, Location, CellBoundaries)]
  -> StateT SearchState IO ()

This will lead us to update our generation function.

generateRandomMaze :: StdGen -> (Int, Int) -> IO Maze
generateRandomMaze gen (numRows, numColumns) = do
  initialMutableBounds <- IA.thaw initialBounds
  let initialState = SearchState
                       g2
                       [(startX, startY)]
                       initialMutableBounds
                       Set.empty
  finalBounds <- currentBoundaries <$>
                  (execStateT dfsSearch initialState)
  IA.freeze finalBounds
  where
  …

The original dfsSearch definition is almost fine. But findCandidates is now a monadic function. So we'll have to extract its result instead of using let:

-- Previously
let candidateLocs = findCandidates currentLoc bounds visited

-- Now
candidateLocs <- lift $ findCandidates currentLoc bounds visited

The findCandidates function though will need a bit more re-tooling. The main this is that we need readArray instead of Array.!. The first swap is easy:

findCandidates currentLocation@(x, y) bounds visited = do
  currentLocBounds <- IA.readArray bounds currentLocation
  ...

It's tempting to go ahead and read all the other values for upLoc, rightLoc, etc. right now:

findCandidates currentLocation@(x, y) bounds visited = do
  currentLocBounds <- IA.readArray bounds currentLocation
  let upLoc = (x, y + 1)
  upBounds <- IA.readArray bounds upLoc
  ...

We can't do that though, because this will access them in a strict way. We don't want to access upLoc until we know the location is valid. So we need to do this within the case statement:

findCandidates currentLocation@(x, y) bounds visited = do
  currentLocBounds <- IA.readArray bounds currentLocation
  let upLoc = (x, y + 1)
  maybeUpCell <- case (upBoundary currentLocBounds,
                       Set.member upLoc visited) of
    (Wall, False) -> do
      upBounds <- IA.readArray bounds upLoc
      return $ Just
        ( upLoc
        , upBounds {downBoundary = AdjacentCell currentLocation}
        , currentLocation
        , currentLocBounds {upBoundary = AdjacentCell upLoc}
        )
    _ -> return Nothing

And then we'll do the same for the other directions and that's all for this function!

Choosing Candidates

We don't have to change too much about our chooseCandidates function! The primary change is to eliminate the line where we use Array.// to update the array. We'll replace this with two monadic lines using writeArray instead. Here's all that happens!

chooseCandidate candidates = do
  (SearchState gen currentLocs boundsMap visited) <- get
  ...
  lift $ IA.writeArray boundsMap chosenLocation newChosenBounds
  lift $ IA.writeArray boundsMap prevLocation newPrevBounds
  put (SearchState newGen (chosenLocation : currentLocs) boundsMap newVisited)

Aside from that, there's one small change in our runner to use the IO monad for generateRandomMaze. But after that, we're done!

Conclusion

As mentioned above, you can see all these changes in this commit on our github repository. The last two articles have illustrated how it's not hard to refactor our Haskell code much of the time. As long as we are methodical, we can pick the one thing that needs to change. Then we let the compiler errors direct us to everything we need to update as a result. I find refactoring other languages (particularly Python/Javascript) to be much more stressful. I'm often left wondering...have I actually covered everything? But in Haskell, there's a much better chance of getting everything right the first time!

To learn more about Compile Driven Development, read our Haskell Brain Series. If you're new to Haskell you can also read our Liftoff Series and download our Beginners Checklist!

Previous
Previous

Quicksort with Haskell!

Next
Next

Compile Driven Development In Action: Refactoring to Arrays!