Making Arrays Mutable!
Last week we walked through the process of refactoring our code to use Data.Array
instead of Data.Map
. But in the process, we introduced a big inefficiency! When we use the Array.//
function to "update" our array, it has to create a completely new copy of the array! For various reasons, Map
doesn't have to do this.
So how can we fix this problem? The answer is to use the MArray
interface, for mutable arrays. With mutable arrays, we can modify them in-place, without a copy. This results in code that is much more efficient. This week, we'll explore the modifications we can make to our code to allow this. You can see a quick summary of all the changes in this Git Commit.
Refactoring code can seem like an hard process, but it's actually quite easy with Haskell! In this article, we'll use the idea of "Compile Driven Development". With this process, we update our types and then let compiler errors show us all the changes we need. To learn more about this, and other Haskell paradigms, read our Haskell Brain series!
Mutable Arrays
To start with, let's address the seeming contradiction of having mutable data in an immutable language. We'll be working with the IOArray
type in this article. An item of type IOArray
acts like a pointer, similar to an IORef
. And this pointer is, in fact, immutable! We can't make it point to a different spot in memory. But we can change the underlying data at this memory. But to do so, we'll need a monad that allows such side effects.
In our case, with IOArray
, we'll use the IO
monad. This is also possible with the ST
monad. But the specific interface functions we'll use (which are possible with either option) live in the MArray
library. There are four in particular we're concerned with:
freeze :: (Ix i, MArray a e m, IArray b e) => a i e -> m (b i e)
thaw :: (Ix i, IArray a e, MArray b e m) => a i e -> m (b i e)
readArray :: (MArray a e m, Ix i) => a i e -> i -> m e
writeArray :: (MArray a e m, Ix i) => a i e -> i -> e -> m ()
The first two are conversion functions between normal, immutable arrays and mutable arrays. Freezing turns the array immutable, thawing makes it mutable. The second two are our replacements for Array.!
and Array.//
when reading and updating the array. There are a lot of typeclass constraints in these. So let's simplify them by substituting in the types we'll use:
freeze
:: IOArray Location CellBoundaries
-> IO (Array Location CellBoundaries)
thaw
:: Array Location CellBoundaries
-> IO (IOArray Location CellBoundaries)
readArray
:: IOArray Location CellBoundaries
-> Location
-> IO CellBoundaries
writeArray
:: IOArray Location CellBoundaries
-> Location
-> CellBoundaries
-> IO ()
Obviously, we'll need to add the IO
monad into our code at some point. Let's see how this works.
Basic Changes
We won't need to change how the main World
type uses the array. We'll only be changing how the SearchState
stores it. So let's go ahead and change that type:
type MMaze = IA.IOArray Location CellBoundaries
data SearchState = SearchState
{ randomGen :: StdGen
, locationStack :: [Location]
, currentBoundaries :: MMaze
, visitedCells :: Set.Set Location
}
The first issue is that we should now pass a mutable array to our initial search state. We'll use the same initialBounds
item, except we'll thaw
it first to get a mutable version. Then we'll construct the state and pass it along to our search function. At the end, we'll freeze
the resulting state. All this involves making our generation function live in the IO
monad:
-- This did not have IO before!
generateRandomMaze :: StdGen -> (Int, Int) -> IO Maze
generateRandomMaze gen (numRows, numColumns) = do
initialMutableBounds <- IA.thaw initialBounds
let initialState = SearchState
g2
[(startX, startY)]
initialMutableBounds
Set.empty
let finalBounds = currentBoundaries
(execState dfsSearch initialState)
IA.freeze finalBounds
where
(startX, g1) = …
(startY, g2) = …
initialBounds :: Maze
initialBounds = …
This seems to "solve" our issues in this function and push all our errors into dfsSearch
. But it should be obvious that we need a fundamental change there. We'll need the IO
monad to make array updates. So the type signatures of all our search functions need to change. In particular, we want to combine monads with StateT SearchState IO
. Then we'll make any "pure" functions use IO
instead.
dfsSearch :: StateT SearchState IO ()
findCandidates :: Location -> Maze -> Set.Set Location
-> IO [(Location, CellBoundaries, Location, CellBoundaries)]
chooseCandidate
:: [(Location, CellBoundaries, Location, CellBoundaries)]
-> StateT SearchState IO ()
This will lead us to update our generation function.
generateRandomMaze :: StdGen -> (Int, Int) -> IO Maze
generateRandomMaze gen (numRows, numColumns) = do
initialMutableBounds <- IA.thaw initialBounds
let initialState = SearchState
g2
[(startX, startY)]
initialMutableBounds
Set.empty
finalBounds <- currentBoundaries <$>
(execStateT dfsSearch initialState)
IA.freeze finalBounds
where
…
The original dfsSearch
definition is almost fine. But findCandidates
is now a monadic function. So we'll have to extract its result instead of using let
:
-- Previously
let candidateLocs = findCandidates currentLoc bounds visited
-- Now
candidateLocs <- lift $ findCandidates currentLoc bounds visited
The findCandidates
function though will need a bit more re-tooling. The main this is that we need readArray
instead of Array.!
. The first swap is easy:
findCandidates currentLocation@(x, y) bounds visited = do
currentLocBounds <- IA.readArray bounds currentLocation
...
It's tempting to go ahead and read all the other values for upLoc
, rightLoc
, etc. right now:
findCandidates currentLocation@(x, y) bounds visited = do
currentLocBounds <- IA.readArray bounds currentLocation
let upLoc = (x, y + 1)
upBounds <- IA.readArray bounds upLoc
...
We can't do that though, because this will access them in a strict way. We don't want to access upLoc
until we know the location is valid. So we need to do this within the case statement:
findCandidates currentLocation@(x, y) bounds visited = do
currentLocBounds <- IA.readArray bounds currentLocation
let upLoc = (x, y + 1)
maybeUpCell <- case (upBoundary currentLocBounds,
Set.member upLoc visited) of
(Wall, False) -> do
upBounds <- IA.readArray bounds upLoc
return $ Just
( upLoc
, upBounds {downBoundary = AdjacentCell currentLocation}
, currentLocation
, currentLocBounds {upBoundary = AdjacentCell upLoc}
)
_ -> return Nothing
And then we'll do the same for the other directions and that's all for this function!
Choosing Candidates
We don't have to change too much about our chooseCandidates
function! The primary change is to eliminate the line where we use Array.//
to update the array. We'll replace this with two monadic lines using writeArray
instead. Here's all that happens!
chooseCandidate candidates = do
(SearchState gen currentLocs boundsMap visited) <- get
...
lift $ IA.writeArray boundsMap chosenLocation newChosenBounds
lift $ IA.writeArray boundsMap prevLocation newPrevBounds
put (SearchState newGen (chosenLocation : currentLocs) boundsMap newVisited)
Aside from that, there's one small change in our runner to use the IO
monad for generateRandomMaze
. But after that, we're done!
Conclusion
As mentioned above, you can see all these changes in this commit on our github repository. The last two articles have illustrated how it's not hard to refactor our Haskell code much of the time. As long as we are methodical, we can pick the one thing that needs to change. Then we let the compiler errors direct us to everything we need to update as a result. I find refactoring other languages (particularly Python/Javascript) to be much more stressful. I'm often left wondering...have I actually covered everything? But in Haskell, there's a much better chance of getting everything right the first time!
To learn more about Compile Driven Development, read our Haskell Brain Series. If you're new to Haskell you can also read our Liftoff Series and download our Beginners Checklist!