Implementation: Creating a Maze Environment

In our last episode we explored the similarities and differences between the "World" idea from Gloss and the "Environment" idea from OpenAI Gym. We designed a Haskell typeclass to capture this notion of an environment, using associated data types tied to the specific game we're playing.

Now this week we're going to show this class in action. We're going to take our toy example of using BFS to search through a maze and we're going to make an Environment for it! We'll see how all these details fit together and we'll see the game play out in our console!

The code in this article is available on GitHub! For this article, you should focus on the MazeEnvironment module. If you'd like to watch this article as a video instead, take a look at this YouTube video!

There's also a special announcement at the end of this post if you're interested in AI and machine learning, so make sure you read to the end! Let's get started!

Defining Our Types

I may sound like a broken record at this point, but we're going to start by defining some types! First we'll make the State type for our environment, which will be functionally equivalent to our previous World type in Gloss.

data MazeGameState = MazeGameState
  { playerLoc :: Location
  , startLoc :: Location
  , endLoc :: Location
  , gameGrid :: Grid
  }

This state type isn't the instance of our class though! For that, we have to build a monadic type. So what monad should we use? Naturally, we want the State monad (in transformer form) to track our state type. We'll also include IO underneath so that we can render our game later.

-- Deriving these through the newtype requires GeneralizedNewtypeDeriving
newtype MazeGameM a = MazeGameM (StateT MazeGameState IO a)
  deriving (Functor, Applicative, Monad)

Let's also define some basic monad instances like MonadState and MonadIO. This will make our life easier when we write our implementation code!

instance MonadState MazeGameState MazeGameM where
  get = MazeGameM get
  put env = MazeGameM $ put env

instance MonadIO MazeGameM where
  liftIO action = MazeGameM (lift action)

Instance Types

Now that we've got our monadic type we're ready to start making our instance. First, we want to assign the associated types. Remember these are the Observation type, the Action type, and the "Environment State" type.

When it comes to the observation, the "mutable" data we have in our game state is just the player's location. So we can assign Location as the corresponding type in our class.

For the "action", we have 5 options. We can move in any of the four cardinal directions, or we can make no move. So let's define an enum for that.

data Direction =
  MoveUp |
  MoveLeft |
  MoveDown |
  MoveRight |
  MoveNone
  deriving (Show)
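As an aside (this isn't in the original code), if we also derived Enum and Bounded, we could enumerate every action generically instead of listing the constructors by hand in possibleActions. A minimal sketch of that variant:

```haskell
-- Hypothetical variant: deriving Enum and Bounded in addition to Show
data Direction =
  MoveUp |
  MoveLeft |
  MoveDown |
  MoveRight |
  MoveNone
  deriving (Show, Eq, Enum, Bounded)

-- Every constructor, in declaration order, with no manual list to keep in sync
allDirections :: [Direction]
allDirections = [minBound .. maxBound]

main :: IO ()
main = print allDirections
```

This keeps the action list automatically in sync if we ever add a new constructor, at the cost of losing the explicit ordering in the instance body.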

And of course we'll use our state from above for the environment type. So here's what our instance looks like so far:

instance EnvironmentMonad MazeGameM where
  type (Observation MazeGameM) = Location
  type (Action MazeGameM) = Direction
  type (EnvironmentState MazeGameM) = MazeGameState
  ...

Simple Functions

Now we can start filling in the functions and expressions for the instance. To "reset" our environment, we just want to change the player's location to the start:

resetGame :: MazeGameM Location
resetGame = do
  current <- get
  put $ current { playerLoc = startLoc current }
  return (startLoc current)

Running the environment is also rather simple. Given an action in our monad, we unwrap its underlying StateT computation, run that with the given environment, and produce the resulting IO action!

instance EnvironmentMonad MazeGameM where
  ...
  resetEnv = resetGame
  runEnv env (MazeGameM action) = evalStateT action env
  ...

After that, getting the "current" observation is as easy as querying for the player's location. And for "possible actions" we'll just return a full list of the enum values. If we wanted to get fancy, we could remove the option of moving into walls or off the board, but we'll need to handle that logic later anyways, so we'll keep this part simple.

instance EnvironmentMonad MazeGameM where
  type (Observation MazeGameM) = Location
  type (Action MazeGameM) = Direction
  type (EnvironmentState MazeGameM) = MazeGameState
  resetEnv = resetGame
  runEnv env (MazeGameM action) = evalStateT action env
  currentObservation = MazeGameM (playerLoc <$> get)
  possibleActions _ = return [MoveUp, MoveLeft, MoveDown, MoveRight, MoveNone]
  ...

Stepping Forward

We just have one field left, but it's the most complicated! We need to determine how to "step" the game based on an action. This will always be the toughest part because this is where all your game logic really happens! Our game is pretty simple though. Let's actually start with a few helpers.

First, let's determine if a space is a valid move in our grid. We just check that it's in bounds and that it is not a wall:

isValidLoc :: Location -> Grid -> Bool
isValidLoc (r, c) grid =
  r >= 0 &&
  c >= 0 &&
  r <= (fst . snd) (A.bounds grid) &&
  c <= (snd . snd) (A.bounds grid) &&
  grid A.! (r, c) == Empty
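To see this check in action, here's a self-contained sketch. The Location, Cell, and Grid definitions below are my assumed stand-ins for the types defined earlier in this series (the real module defines them), paired with a tiny 2x2 grid:

```haskell
import qualified Data.Array as A

-- Assumed shapes for the series' types (the real Maze module defines these)
type Location = (Int, Int)
data Cell = Empty | Wall deriving (Show, Eq)
type Grid = A.Array Location Cell

isValidLoc :: Location -> Grid -> Bool
isValidLoc (r, c) grid =
  r >= 0 &&
  c >= 0 &&
  r <= (fst . snd) (A.bounds grid) &&
  c <= (snd . snd) (A.bounds grid) &&
  grid A.! (r, c) == Empty

-- A 2x2 grid with a single wall at (0, 1); listArray fills row by row
sampleGrid :: Grid
sampleGrid = A.listArray ((0, 0), (1, 1)) [Empty, Wall, Empty, Empty]

main :: IO ()
main = do
  print (isValidLoc (0, 0) sampleGrid)  -- in bounds and empty: True
  print (isValidLoc (0, 1) sampleGrid)  -- a wall: False
  print (isValidLoc (2, 0) sampleGrid)  -- out of bounds: False
```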

Now we want two helper functions that are roughly inverses of each other. We'll use findNextLoc to take a current location and a direction, and give us the next location. Then moveDirection will do the opposite, taking a start and end point and giving us the direction between them.

findNextLoc :: Direction -> Location -> Location
findNextLoc MoveUp (r, c) = (r - 1, c)
findNextLoc MoveLeft (r, c) = (r, c - 1)
findNextLoc MoveDown (r, c) = (r + 1, c)
findNextLoc MoveRight (r, c) = (r, c + 1)
findNextLoc MoveNone (r, c) = (r, c)

moveDirection :: Location -> Location -> Direction
moveDirection (r, c) nextLoc
  | nextLoc == (r - 1, c) = MoveUp
  | nextLoc == (r, c - 1) = MoveLeft
  | nextLoc == (r + 1, c) = MoveDown
  | nextLoc == (r, c + 1) = MoveRight
  | otherwise = MoveNone
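We can sanity-check the inverse claim with a quick standalone script. This duplicates the two helpers so it runs on its own, and adds an Eq instance to Direction (the original derives only Show) so we can compare results:

```haskell
type Location = (Int, Int)

-- Eq added here so the round-trip check below can compare directions
data Direction = MoveUp | MoveLeft | MoveDown | MoveRight | MoveNone
  deriving (Show, Eq)

findNextLoc :: Direction -> Location -> Location
findNextLoc MoveUp (r, c) = (r - 1, c)
findNextLoc MoveLeft (r, c) = (r, c - 1)
findNextLoc MoveDown (r, c) = (r + 1, c)
findNextLoc MoveRight (r, c) = (r, c + 1)
findNextLoc MoveNone (r, c) = (r, c)

moveDirection :: Location -> Location -> Direction
moveDirection (r, c) nextLoc
  | nextLoc == (r - 1, c) = MoveUp
  | nextLoc == (r, c - 1) = MoveLeft
  | nextLoc == (r + 1, c) = MoveDown
  | nextLoc == (r + 1, c) = MoveDown
  | nextLoc == (r, c + 1) = MoveRight
  | otherwise = MoveNone

-- moveDirection should undo findNextLoc for every direction
main :: IO ()
main = print
  [ moveDirection (3, 3) (findNextLoc d (3, 3)) == d
  | d <- [MoveUp, MoveLeft, MoveDown, MoveRight, MoveNone] ]
```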

Now we're ready to write our step function. Recall that this function will take our game's action, which is a direction that we desire to move. We start by retrieving the game state and the current location.

stepGame :: Direction -> MazeGameM (Location, Reward, Bool)
stepGame direction = do
  current <- get
  let currentLoc = playerLoc current
  ...

Now we can find the next location based on this direction. If it's valid, we'll assign it as our "final" location (if not, we use the previous location). Then we save this in our state with "put".

stepGame :: Direction -> MazeGameM (Location, Reward, Bool)
stepGame direction = do
  current <- get
  let currentLoc = playerLoc current
  let nextLoc = findNextLoc direction currentLoc
  let finalLoc = if isValidLoc nextLoc (gameGrid current)
                   then nextLoc
                   else currentLoc
  put $ current { playerLoc = finalLoc }
  ...

Finally, we must determine our return values! Of course, the final location provides our new observation. If our new location is the end of the maze (and we actually moved there on this step), we'll provide a "reward" of 1.0 and declare the game to be "done". Otherwise, we give no reward and the game goes on.

stepGame :: Direction -> MazeGameM (Location, Reward, Bool)
stepGame direction = do
  current <- get
  let currentLoc = playerLoc current
  let nextLoc = findNextLoc direction currentLoc
  let finalLoc = if isValidLoc nextLoc (gameGrid current)
                   then nextLoc
                   else currentLoc
  put $ current { playerLoc = finalLoc }
  let done = finalLoc == endLoc current
  let reward = if currentLoc /= finalLoc && done
                  then Reward 1.0
                  else Reward 0.0
  return (finalLoc, reward, done)

Filling this function in for stepEnv then completes our instance definition!

instance EnvironmentMonad MazeGameM where
  ...
  stepEnv = stepGame

Playing Our Game

Now to play the game, we need to supply a "brain". That is, we need to be able to choose an action based on the game state. First, we retrieve the environment:

chooseMoveMaze :: MazeGameM Direction
chooseMoveMaze = do
  env <- get
  ...

With access to the game's grid and the relevant locations, we can then turn to our trusty Breadth-First-Search function!

chooseMoveMaze :: MazeGameM Direction
chooseMoveMaze = do
  env <- get
  let path = bfsSearch (gameGrid env) (playerLoc env) (endLoc env)
  ...

If there is no path, our direction is "None". Otherwise, we can take the first item from the path, and use moveDirection to calculate the direction we need. And then we're done!

chooseMoveMaze :: MazeGameM Direction
chooseMoveMaze = do
  env <- get
  let path = bfsSearch (gameGrid env) (playerLoc env) (endLoc env)
  case path of
    [] -> return MoveNone
    (move : _) -> return $ moveDirection (playerLoc env) move

Now we can play our game! We'll create a basic environment, and use gameLoop from last week, in conjunction with our brain function!

>> let baseEnv = MazeGameState (0,0) (0, 0) (2, 2) baseGrid
>> runEnv baseEnv (gameLoop chooseMoveMaze)

Rendering Our Game

This is great! We can now play our game...in theory. But in practice, we'd like to be able to see what's going on! So let's make a render function and make our environment renderable! Let's start by defining an ASCII character to correspond with each kind of location in the game.

  1. Use 'o' for the player's location
  2. Use 'x' for walls
  3. Use 'F' for the "finish"
  4. Use underscore '_' for blank spaces

This is a simple function to write:

charForLoc :: MazeGameState -> Location -> Char
charForLoc env loc
  | loc == playerLoc env = 'o'
  | loc == endLoc env = 'F'
  | gameGrid env A.! loc == Wall = 'x'
  | otherwise = '_'

Now to render, we just have to divide our Array into its constituent rows. The groupBy function is the easiest way to do this. We use the first element of the tuple index to do the matching.
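Here's the row-splitting trick in miniature, on a toy 2x2 array of characters standing in for the maze grid (a hypothetical stand-in, not the real Grid type):

```haskell
import qualified Data.Array as A
import Data.List (groupBy)

-- A 2x2 array of characters to stand in for the maze grid
grid :: A.Array (Int, Int) Char
grid = A.listArray ((0, 0), (1, 1)) "abcd"

-- Group the (index, value) pairs by the row component of each index
rows :: [[((Int, Int), Char)]]
rows = groupBy (\((a, _), _) ((b, _), _) -> a == b) (A.assocs grid)

main :: IO ()
main = mapM_ print rows  -- one printed list per grid row
```

Note that groupBy only merges adjacent elements, which is exactly what we want here, since assocs returns the pairs in row-major order.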

instance RenderableEnvironment MazeGameM where
  renderEnv = do
    env <- get
    let rows = groupBy
                 (\((a, _), _) ((b, _), _) -> a == b)
                 (A.assocs (gameGrid env))
    ...

Now we just do a nested loop, and print the character for each cell!

instance RenderableEnvironment MazeGameM where
  renderEnv = do
    env <- get
    let rows = groupBy
                 (\((a, _), _) ((b, _), _) -> a == b)
                 (A.assocs (gameGrid env))
    forM_ rows $ \row -> liftIO $ do
      forM_ row $ \(l, _) -> putChar (charForLoc env l)
      putStr "\n"
    liftIO $ putStr "\n"

Now instead of using gameLoop like above, we can use gameRenderLoop!

>> let baseEnv = MazeGameState (0,0) (0, 0) (2, 2) baseGrid
>> runEnv baseEnv (gameRenderLoop chooseMoveMaze)
...

It will display the game at each stage so we can see our player moving along!

o___
_xx_
_xF_
____

_o__
_xx_
_xF_
____

__o_
_xx_
_xF_
____

___o
_xx_
_xF_
____

____
_xxo
_xF_
____

____
_xx_
_xFo
____

____
_xx_
_xo_
____

Special Announcement!

Now the only thing cooler than making a game with a working AI is having that AI learn for itself what to do! Next time, we'll modify our environment so it can use machine learning to improve an agent's behavior over time! We'll use a simple reinforcement learning approach to help our player navigate this simple maze.

If you've made it this far, you deserve to hear our special announcement, which is very much related to this idea of machine learning. One of the most important technologies in machine learning these days is TensorFlow. But as with OpenAI Gym, most of the support for TensorFlow is in Python, not Haskell.

But a Haskell library does exist for TensorFlow! And today I am releasing a new course called Haskell Brain that will help you learn how to use this library. It goes over the basics of getting set up with Haskell and TensorFlow on your machine, as well as all the conceptual and syntactic ideas you need to start writing your code. So if combining machine learning and Haskell sounds like a thrilling idea to you, then head to our course page now!
