Implementation: Creating a Maze Environment
In our last episode we explored the similarities and differences between the "World" idea from Gloss and the "Environment" of OpenAI Gym. We designed a Haskell typeclass to capture this idea of an environment, using associated data types tied to the specific game we're playing.
Now this week we're going to show this class in action. We're going to take our toy example of using BFS to search through a maze and we're going to make an Environment for it! We'll see how all these details fit together and we'll see the game play out in our console!
The code in this article is available on GitHub! For this article, you should focus on the MazeEnvironment module. If you'd like to watch this article as a video instead, take a look at this YouTube video!
There's also a special announcement at the end of this post if you're interested in AI and machine learning, so make sure you read to the end! Let's get started!
Defining Our Types
I may sound like a broken record at this point, but we're going to start by defining some types! First we'll make the state type for our environment, which will be functionally equivalent to our previous `World` type in Gloss.
```haskell
data MazeGameState = MazeGameState
  { playerLoc :: Location
  , startLoc :: Location
  , endLoc :: Location
  , gameGrid :: Grid
  }
```
This state isn't the instance of our class though! For that, we have to build a monadic type. So what monad should we use? Naturally, we want the `State` monad to track our state type. We'll also include `IO` so that we can render our game later.
```haskell
newtype MazeGameM a = MazeGameM (StateT MazeGameState IO a)
  deriving (Functor, Applicative, Monad)
```
Let's also define some basic monad instances like `MonadState` and `MonadIO`. This will make our life easier when we write our implementation code!
```haskell
instance MonadState MazeGameState MazeGameM where
  get = MazeGameM get
  put env = MazeGameM $ put env

instance MonadIO MazeGameM where
  liftIO action = MazeGameM (lift action)
```
Instance Types
Now that we've got our monadic type we're ready to start making our instance. First, we want to assign the associated types. Remember these are the Observation type, the Action type, and the "Environment State" type.
When it comes to the observation, the "mutable" data we have in our game state is just the player's location. So we can assign `Location` as the corresponding type in our class.
For the "action", we have 5 options. We can move in any of the four cardinal directions, or we can make no move. So let's define an enum for that.
```haskell
data Direction =
  MoveUp |
  MoveLeft |
  MoveDown |
  MoveRight |
  MoveNone
  deriving (Show)
```
And of course we'll use our state from above for the environment type. So here's what our instance looks like so far:
```haskell
instance EnvironmentMonad MazeGameM where
  type (Observation MazeGameM) = Location
  type (Action MazeGameM) = Direction
  type (EnvironmentState MazeGameM) = MazeGameState
  ...
```
Simple Functions
Now we can start filling in the functions and expressions for the instance. To "reset" our environment, we just want to change the player's location to the start:
```haskell
resetGame :: MazeGameM Location
resetGame = do
  current <- get
  put $ current { playerLoc = startLoc current }
  return (startLoc current)
```
Running the environment is also rather simple. Given an action in our monad, we'll unwrap its state action, run that with the given environment, and produce the IO action!
```haskell
instance EnvironmentMonad MazeGameM where
  ...
  resetEnv = resetGame
  runEnv env (MazeGameM action) = evalStateT action env
  ...
```
After that, getting the "current" observation is as easy as querying for the player's location. And for "possible actions" we'll just return the full list of the enum values. If we wanted to get fancy, we could remove the options of moving into walls or off the board, but we'll need to handle that logic later anyway, so we'll keep this part simple.
```haskell
instance EnvironmentMonad MazeGameM where
  type (Observation MazeGameM) = Location
  type (Action MazeGameM) = Direction
  type (EnvironmentState MazeGameM) = MazeGameState
  resetEnv = resetGame
  runEnv env (MazeGameM action) = evalStateT action env
  currentObservation = MazeGameM (playerLoc <$> get)
  possibleActions _ = return [MoveUp, MoveLeft, MoveDown, MoveRight, MoveNone]
  ...
```
Stepping Forward
We just have one field left, but it's the most complicated! We need to determine how to "step" the game based on an action. This will always be the toughest part because this is where all your game logic really happens! Our game is pretty simple though. Let's actually start with a few helpers.
First, let's determine if a space is a valid move in our grid. We just check that it's in bounds and that it is not a wall:
```haskell
isValidLoc :: Location -> Grid -> Bool
isValidLoc (r, c) grid =
  r >= 0 &&
  c >= 0 &&
  r <= (fst . snd) (A.bounds grid) &&
  c <= (snd . snd) (A.bounds grid) &&
  grid A.! (r, c) == Empty
```
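To see the bounds check in isolation, here's a self-contained sketch. The `Cell` type name and the 2x2 sample grid are assumptions just for this demo; only `isValidLoc` itself matches the code above, and the real module's definitions may differ.

```haskell
import qualified Data.Array as A

-- Assumed cell type with the Empty and Wall constructors used above
data Cell = Empty | Wall deriving (Eq, Show)

type Location = (Int, Int)
type Grid = A.Array Location Cell

isValidLoc :: Location -> Grid -> Bool
isValidLoc (r, c) grid =
  r >= 0 &&
  c >= 0 &&
  r <= (fst . snd) (A.bounds grid) &&
  c <= (snd . snd) (A.bounds grid) &&
  grid A.! (r, c) == Empty

-- A 2x2 grid with a single wall at (1, 0);
-- listArray fills indices in row-major order.
sampleGrid :: Grid
sampleGrid = A.listArray ((0, 0), (1, 1)) [Empty, Empty, Wall, Empty]

main :: IO ()
main = do
  print (isValidLoc (0, 0) sampleGrid)  -- True: in bounds and empty
  print (isValidLoc (1, 0) sampleGrid)  -- False: wall
  print (isValidLoc (2, 0) sampleGrid)  -- False: out of bounds
```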
Now we want two functions that are near-inverses of each other. We'll use `findNextLoc` to take a current location and a direction, and give us the next location. Then `moveDirection` will do the opposite, taking a start and end point and giving us the direction between them.
```haskell
findNextLoc :: Direction -> Location -> Location
findNextLoc MoveUp (r, c) = (r - 1, c)
findNextLoc MoveLeft (r, c) = (r, c - 1)
findNextLoc MoveDown (r, c) = (r + 1, c)
findNextLoc MoveRight (r, c) = (r, c + 1)
findNextLoc MoveNone (r, c) = (r, c)

moveDirection :: Location -> Location -> Direction
moveDirection (r, c) nextLoc
  | nextLoc == (r - 1, c) = MoveUp
  | nextLoc == (r, c - 1) = MoveLeft
  | nextLoc == (r + 1, c) = MoveDown
  | nextLoc == (r, c + 1) = MoveRight
  | otherwise = MoveNone
```
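Since these two are meant to be near-inverses, here's a quick self-contained check of the round trip. Note the added `Eq` derivation on `Direction`, which is only needed for this comparison, not in the real module.

```haskell
type Location = (Int, Int)

data Direction = MoveUp | MoveLeft | MoveDown | MoveRight | MoveNone
  deriving (Show, Eq)

findNextLoc :: Direction -> Location -> Location
findNextLoc MoveUp (r, c) = (r - 1, c)
findNextLoc MoveLeft (r, c) = (r, c - 1)
findNextLoc MoveDown (r, c) = (r + 1, c)
findNextLoc MoveRight (r, c) = (r, c + 1)
findNextLoc MoveNone (r, c) = (r, c)

moveDirection :: Location -> Location -> Direction
moveDirection (r, c) nextLoc
  | nextLoc == (r - 1, c) = MoveUp
  | nextLoc == (r, c - 1) = MoveLeft
  | nextLoc == (r + 1, c) = MoveDown
  | nextLoc == (r, c + 1) = MoveRight
  | otherwise = MoveNone

-- Moving in any direction and then asking for the direction between the
-- two squares should recover the original direction.
main :: IO ()
main = print
  [ moveDirection loc (findNextLoc d loc) == d
  | d <- [MoveUp, MoveLeft, MoveDown, MoveRight, MoveNone]
  ]
  where loc = (1, 1)
  -- [True,True,True,True,True]
```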
Now we're ready to write our step function. Recall that this function takes our game's action, which is the direction we want to move. We start by retrieving the game state and the current location.
```haskell
stepGame :: Direction -> MazeGameM (Location, Reward, Bool)
stepGame direction = do
  current <- get
  let currentLoc = playerLoc current
  ...
```
Now we can find the next location based on this direction. If it's valid, we'll assign it as our "final" location (if not, we use the previous location). Then we save this in our state with "put".
```haskell
stepGame :: Direction -> MazeGameM (Location, Reward, Bool)
stepGame direction = do
  current <- get
  let currentLoc = playerLoc current
  let nextLoc = findNextLoc direction currentLoc
  let finalLoc = if isValidLoc nextLoc (gameGrid current)
        then nextLoc
        else currentLoc
  put $ current { playerLoc = finalLoc }
  ...
```
Finally, we must determine our return values! Of course, the final location provides our new observation. If our new location is the end of the maze, we'll provide the user with a "reward" of 1.0 and declare the game to be "done". Otherwise, we give no reward and the game goes on.
```haskell
stepGame :: Direction -> MazeGameM (Location, Reward, Bool)
stepGame direction = do
  current <- get
  let currentLoc = playerLoc current
  let nextLoc = findNextLoc direction currentLoc
  let finalLoc = if isValidLoc nextLoc (gameGrid current)
        then nextLoc
        else currentLoc
  put $ current { playerLoc = finalLoc }
  let done = finalLoc == endLoc current
  let reward = if currentLoc /= finalLoc && done
        then Reward 1.0
        else Reward 0.0
  return (finalLoc, reward, done)
```
Filling this function in for `stepEnv` then completes our instance definition!

```haskell
instance EnvironmentMonad MazeGameM where
  ...
  stepEnv = stepGame
```
Playing Our Game
Now to play the game, we need to supply a "brain". That is, we need to be able to choose an action based on the game state. First, we retrieve the environment:
```haskell
chooseMoveMaze :: MazeGameM Direction
chooseMoveMaze = do
  env <- get
  ...
```
With access to the game's grid and the relevant locations, we can then turn to our trusty Breadth-First-Search function!
```haskell
chooseMoveMaze :: MazeGameM Direction
chooseMoveMaze = do
  env <- get
  let path = bfsSearch (gameGrid env) (playerLoc env) (endLoc env)
  ...
```
If there is no path, our direction is "None". Otherwise, we can take the first item from the path, and use `moveDirection` to calculate the direction we need. And then we're done!
```haskell
chooseMoveMaze :: MazeGameM Direction
chooseMoveMaze = do
  env <- get
  let path = bfsSearch (gameGrid env) (playerLoc env) (endLoc env)
  case path of
    [] -> return MoveNone
    (move : _) -> return $ moveDirection (playerLoc env) move
```
Now we can play our game! We'll create a basic environment, and use `gameLoop` from last week, in conjunction with our brain function!
```haskell
>> let baseEnv = MazeGameState (0, 0) (0, 0) (2, 2) baseGrid
>> runEnv baseEnv (gameLoop chooseMoveMaze)
```
Rendering Our Game
This is great! We can now play our game...in theory. But in practice, we'd like to be able to see what's going on! So let's make a render function and make our environment renderable! Let's start by defining an ASCII character to correspond with each kind of location in the game.
- Use 'o' for the player's location
- Use 'x' for walls
- Use 'F' for the "finish"
- Use underscore '_' for blank spaces
This is a simple function to write:
```haskell
charForLoc :: MazeGameState -> Location -> Char
charForLoc env loc
  | loc == playerLoc env = 'o'
  | loc == endLoc env = 'F'
  | gameGrid env A.! loc == Wall = 'x'
  | otherwise = '_'
```
Now to render, we just have to divide our `Array` into its constituent rows. The `groupBy` function is the easiest way to do this. We use the first element of the tuple index to do the matching.
```haskell
instance RenderableEnvironment MazeGameM where
  renderEnv = do
    env <- get
    let rows = groupBy
          (\((a, _), _) ((b, _), _) -> a == b)
          (A.assocs (gameGrid env))
    ...
```
Now we just do a nested loop, and print the character for each cell!
```haskell
instance RenderableEnvironment MazeGameM where
  renderEnv = do
    env <- get
    let rows = groupBy
          (\((a, _), _) ((b, _), _) -> a == b)
          (A.assocs (gameGrid env))
    forM_ rows $ \row -> liftIO $ do
      forM_ row $ \(l, _) -> putChar (charForLoc env l)
      putStr "\n"
    liftIO $ putStr "\n"
```
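The `groupBy` trick is easier to see in isolation. Here's a small self-contained sketch using a 2x2 `Char` array (an assumption for illustration only): `assocs` lists cells in row-major order, so grouping on the row index splits the list into rows.

```haskell
import Data.List (groupBy)
import qualified Data.Array as A

main :: IO ()
main = do
  let grid = A.listArray ((0, 0), (1, 1)) "abcd" :: A.Array (Int, Int) Char
      -- Group adjacent assocs whose indices share the same row
      rows = groupBy (\((a, _), _) ((b, _), _) -> a == b) (A.assocs grid)
  mapM_ print rows
  -- [((0,0),'a'),((0,1),'b')]
  -- [((1,0),'c'),((1,1),'d')]
```

Note that `groupBy` only merges adjacent elements, which is exactly what we want here since `assocs` already lists the cells row by row.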
Now instead of using `gameLoop` like above, we can use `gameRenderLoop`!
```haskell
>> let baseEnv = MazeGameState (0, 0) (0, 0) (2, 2) baseGrid
>> runEnv baseEnv (gameRenderLoop chooseMoveMaze)
...
```
It will display the game at each stage so we can see our player moving along!
```
o___
_xx_
_xF_
____

_o__
_xx_
_xF_
____

__o_
_xx_
_xF_
____

___o
_xx_
_xF_
____

____
_xxo
_xF_
____

____
_xx_
_xFo
____

____
_xx_
_xo_
____
```
Special Announcement!
Now the only thing cooler than making a game with a working AI is having that AI learn for itself what to do! Next time, we'll modify our environment so it can use machine learning to improve an agent's behavior over time! We'll use a simple reinforcement learning approach to help our player navigate this simple maze.
If you've made it this far, you deserve to hear our special announcement, which is very much related to this idea of machine learning. One of the most important technologies when it comes to machine learning these days is TensorFlow. But as with OpenAI Gym, most of the support for TensorFlow is in Python, not Haskell.
But a Haskell library does exist for TensorFlow! And today I am releasing a new course called Haskell Brain that will help you learn how to use this library. It goes over the basics of getting set up with Haskell and TensorFlow on your machine, as well as all the conceptual and syntactic ideas you need to get started writing your code. So if combining machine learning and Haskell sounds like a thrilling idea to you, then head to our course page now!