James Bowen

Looking Ahead with More Steps!


In last week's article, we set ourselves up to make our agent use temporal difference learning. But TD is actually a whole family of potential learning methods we can use. They intertwine with other concepts in a bigger category of reinforcement learning algorithms.

In this article, we'll consider a couple possible TD approaches. We'll also examine a bit of theory surrounding other reinforcement learning topics.

For a higher-level overview of using Haskell and AI, take a look at our Haskell AI Series! This series will also help you better grasp some of the basics of TensorFlow.

One Step Temporal Difference

Temporal difference learning has one general principle. The evaluation of the current game position should be similar to the evaluation of positions in the future. So at any given step we have our "current" evaluation. Then we have a "target" evaluation based on the future. We want to train our network so that the current board gets evaluated more like the target value.

We can see this in the way we defined our model. The tdTrainStep takes two different values, the target evaluation and the current evaluation.

data TDModel = TDModel
  { …
  , tdTrainStep :: TensorData Float -> TensorData Float -> Session ()
  }

And in fact, doing this calculation isn't so different from what we've done before. We'll take the difference between these evaluations, square it, and use reduceSum. This gives our loss function. Then we'll have TensorFlow minimize the loss function.

createTDModel :: Session TDModel
createTDModel = do
  ...
  -- Train Model
  targetEval <- placeholder (Shape [1])
  currentEval <- placeholder (Shape [1])
  let diff = targetEval `sub` currentEval
  let loss = reduceSum (diff `mul` diff)
  trainer <- minimizeWith
    adam loss [hiddenWeights, hiddenBias, outputWeights, outputBias]
  let trainStep = \targetEvalFeed currentEvalFeed ->
        runWithFeeds [feed targetEval targetEvalFeed, feed currentEval currentEvalFeed] trainer
  return $ TDModel
    { ...
    , tdTrainStep = trainStep
    }

Let's now recall how we got our target value last week. We looked at all our possible moves, and used them to advance the world one step. We then took the best outcome out of those, and that was our target value. Because we're advancing one step into the world, we call this "one-step" TD learning.
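
Spelled out as a rough formula (with $V$ standing for our network's evaluation, and remembering that a won or lost position gets its true value of 1 or 0 instead), the one-step target is:

$$V_{target}(s) = \max_{m \in moves(s)} V(step(s, m))$$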

Adding More Steps

But there's no reason we can't look further into the future! We can consider what the game will look like in 2 moves, not just one move! To do this, let's generalize our function for stepping forward. It will be stateful over the same parameters as our main iteration function. But we'll call it in a way so that it doesn't affect our main values.

We'll make one change to our approach from last time. If the game is over in a resulting world, we'll immediately plug in the "correct" evaluation value. In our old approach, we would apply this later. Our new function will return the score from advancing the game, the game result, and the World at this step.

advanceWorldAndGetScore :: Float -> TDModel
  -> StateT (World, StdGen) Session (Float, GameResult, World)
advanceWorldAndGetScore randomChance model = do
  (currentWorld, gen) <- get
  let allMoves = possibleMoves currentWorld
  let newWorlds = fst <$> map ((flip stepWorld) currentWorld) allMoves
  allScoresAndResults <- Data.Vector.fromList <$>
    (forM newWorlds $ \w -> case worldResult w of
      GameLost -> return (0.0, GameLost)
      GameWon -> return (1.0, GameWon)
      GameInProgress -> do
        let worldData = encodeTensorData
              (Shape [1, inputDimen]) (vectorizeWorld8 w)
        scoreVector <- lift $ (tdEvaluateWorldStep model) worldData
        return $ (Data.Vector.head scoreVector, GameInProgress))

  let (chosenIndex, newGen) = bestIndexOrRandom
                                allScoresAndResults gen 
  put (newWorlds !! chosenIndex, newGen)
  let (finalScore, finalResult) = allScoresAndResults ! chosenIndex
  return $ (finalScore, finalResult, newWorlds !! chosenIndex)
  where
    -- Same as before, except with resultOrdering
    bestIndexOrRandom :: Vector (Float, GameResult) -> StdGen
      -> (Int, StdGen)
    ...

    -- First order by result (Win > InProgress > Loss), then score
    resultOrdering :: (Float, GameResult) -> (Float, GameResult)
      -> Ordering
    ...

Now we'll call this from our primary iteration function. It seems a little strange. We unwrap the World from our state only to re-wrap it in another state call. But it will make more sense in a second!

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  (currentWorld, gen) <- get

  ((chosenNextScore, finalResult, nextWorld), (_, newGen)) <-
    lift $ runStateT
      (advanceWorldAndGetScore randomChance model)
      (currentWorld, gen)

So at the moment, our code is still doing one-step temporal difference. But here's the key. We can now sequence our state action to look further into the future. We'll then get many values to compare for the score. Here's what it looks like for us to look two moves ahead and take the average of all the scores we get:

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  (currentWorld, gen) <- get

  let numSteps = 2
  let repeatedUpdates = sequence $ replicate numSteps
        (advanceWorldAndGetScore randomChance model)
  (allWorldResults, (_, newGen)) <- lift $
    runStateT repeatedUpdates (currentWorld, gen)

  let allScores = map (\(s, _, _) -> s) allWorldResults
  let averageScore = sum allScores / fromIntegral (length allScores)
  let nextScoreData = encodeTensorData
        (Shape [1]) (Data.Vector.singleton averageScore)
  ...

When it comes to continuing the function though, we only consider the first world and result:

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  let (_, result1, nextWorld1) = Prelude.head allWorldResults
  put (nextWorld1, newGen)
  case result1 of
    GameLost -> return False
    GameWon -> return True
    GameInProgress -> runWorldIteration randomChance model

We could take more steps if we wanted! We could also change how we get our target score. We could give more weight to near-future scores. Or we could give more weight to scores in the far future. These are all just parameters we can tune now. We can now refer to our temporal difference algorithm as "n-step", rather than 1-step.
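
As a quick sketch of the weighting idea (this helper isn't part of the project code; it's just an illustration), we could combine the n-step scores with geometrically decaying weights:

weightedTarget :: Float -> [Float] -> Float
weightedTarget _ [] = 0
weightedTarget decay scores = sum weightedScores / sum weights
  where
    -- Weights 1, decay, decay^2, ... so a decay below 1.0 favors the
    -- near future, while a decay above 1.0 favors the far future.
    weights        = take (length scores) (iterate (* decay) 1.0)
    weightedScores = zipWith (*) weights scores

With a decay of 1.0, this reduces to the simple average we used above.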

Monte Carlo vs. Dynamic Programming

With different parameters, our TD approach can look like other common learning approaches. Dynamic Programming is an approach where we adjust our weights after each move in the game. We expect rewards for a particular state to be like those of near-future states. We use the term "bootstrapping" for "online" learning approaches like this. TD learning also applies bootstrapping.

However, dynamic programming requires that we have a strong model of the world. That is, we would have to know the probability of getting into certain states from our current state. This allows us to more accurately predict the future. We could apply this approach to our maze game on a small enough grid. But the model size would increase exponentially with the grid size and enemies! So our approach doesn't actually do this! We can advance the world with a particular move, but we don't have a comprehensive model of how the world works.
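
For reference, the dynamic programming update leans on exactly that model knowledge. Roughly, with $P(s' \mid s, m)$ for the transition probabilities, $R$ for the reward, and $\gamma$ for a discount factor:

$$V(s) = \max_{m} \sum_{s'} P(s' \mid s, m) \left[ R(s, m, s') + \gamma V(s') \right]$$

It's the $P(s' \mid s, m)$ term that we can't realistically write down once enemies are moving around the maze.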

In this regard, TD learning is more like Monte Carlo learning. This algorithm is "model free". But it is not an online algorithm! We must play through an entire episode of the game before we can update the weights. We could take our "n-step" approach above, and play it out over the course of the entire game. If we then chose to provide the full weighting to the final evaluation, our model would be like Monte Carlo!

In general, the more steps we add to our TD approach, the more it approximates Monte Carlo learning. The fewer steps we have, the more it looks like dynamic programming.
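
In the standard formulation (with a reward $R$ at every step and a discount factor $\gamma$), the n-step target looks like this:

$$G_t^{(n)} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^n V(S_{t+n})$$

In our game the only rewards arrive at the very end, so the bootstrapped term $V(S_{t+n})$ does most of the work. Letting $n$ run to the end of the episode drops the bootstrap entirely, which is the Monte Carlo extreme; $n = 1$ is the dynamic-programming-like extreme.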

TD Lambda

TD Gammon, the algorithm we mentioned last time, uses a variation of TD learning called "TD Lambda". It involves looking both forward in time as well as backwards. It observes that the best solutions lie between the extremes of one-step TD and Monte Carlo.
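
Concretely, TD Lambda builds its target by averaging all the n-step targets with geometrically decaying weights:

$$G_t^{\lambda} = (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}$$

With $\lambda = 0$ this collapses to one-step TD, and as $\lambda \to 1$ it approaches Monte Carlo. The "backwards" part of the picture is usually implemented with eligibility traces, which spread each update over recently visited states.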

Academic literature can help give a more complete picture of machine learning. One great text is Reinforcement Learning, by Sutton and Barto. It's one of the authoritative texts on the topics we've discussed in this article!

What's Next

This concludes our exploration of AI within the context of our Maze Game. We'll come back to AI and Machine Learning again soon. Next week, we'll start tackling a new subject in the realm of functional programming, something we've never looked at before on this blog! Stay tuned!

James Bowen

Setting Up Our Model with Look-Ahead


Last week we went over some of the basics of Temporal Difference (TD) learning. We explored a bit of the history, and compared it to its cousin, Q-Learning. Now let's start getting some code out there. Since there's a lot in common with Q-Learning, we'll want a similar structure.

This is at least the third different model we've defined over the course of this series. So we can now start observing the patterns we see in developing these algorithms. Here's a quick outline, before we get started:

  1. Define the inputs and outputs of the system.
  2. Define the data model. This should contain the weight variables we are trying to learn. By including them in the model, we can output our results later. It should also contain important Session actions, such as training.
  3. Create the model.
  4. Run iterations using our model.

We'll follow this outline throughout the article!

If you're new to Haskell and machine learning, a lot of the code we write here won't make sense. You should start off a little easier with our Haskell AI Series. You should also download our Haskell Tensor Flow Guide.

Inputs and Outputs

For our world features, we'll stick with our hand-crafted feature set, but simplified. Recall that we selected 8 different features for every location our bot could move to. We'll stick with these 8 features. But we only need to worry about them for the current location of the bot. We'll factor in look-ahead by advancing the world for our different moves. So the "features" of adjacent squares are irrelevant. This vectorization is easy enough to get using produceLocationFeatures:

vectorizeWorld8 :: World -> V.Vector Float
vectorizeWorld8 w = V.fromList (fromIntegral <$>
  [ lfOnActiveEnemy standStill
  , lfShortestPathLength standStill
  , lfManhattanDistance standStill
  , lfEnemiesOnPath standStill
  , lfNearestEnemyDistance standStill
  , lfNumNearbyEnemies standStill
  , lfStunAvailable standStill
  , lfDrillsRemaining standStill
  ])
  where
    standStill = produceLocationFeatures
      (playerLocation . worldPlayer $ w) w False

We also don't need to be as concerned about exploring the maze with this agent. We'll be defining what its possible moves are at every turn. This is a simple matter of using this function we have from our game:

possibleMoves :: World -> [PlayerMove]

We should also take this opportunity to specify the dimensions of our network. We'll use 20 hidden units:

inputDimen :: Int64
inputDimen = 8

hiddenDimen :: Int64
hiddenDimen = 20

outputDimen :: Int64
outputDimen = 1

Define the Model

Now let's define our data model. As in the past, we'll use a dense (fully-connected) neural network with one hidden layer. This means we'll expose two sets of weights and biases:

data TDModel = TDModel
  { tdHiddenWeights :: Variable Float
  , tdHiddenBias :: Variable Float
  , tdOutputWeights :: Variable Float
  , tdOutputBias :: Variable Float
  ...
  }

We'll also have two different actions to take with our tensor graph, as we had with Q-Learning. The first will be for evaluating a single world state. The second will take an expected score for the world state as well as the actual score for a world state. It will compare them and train our model:

data TDModel = TDModel
  { ...
  , tdEvaluateWorldStep :: TensorData Float -> Session (Vector Float)
  , tdTrainStep :: TensorData Float -> TensorData Float -> Session ()
  }

Building the Model

Now we need to construct this model. We'll start off as always by initializing random variables for our weights and biases. We'll also make a placeholder for our world input:

createTDModel :: Session TDModel
createTDModel = do
  (worldInputVector :: Tensor Value Float) <-
    placeholder (Shape [1, inputDimen])
  hiddenWeights <- truncatedNormal (vector [inputDimen, hiddenDimen])
    >>= initializedVariable
  hiddenBias <- truncatedNormal (vector [hiddenDimen])
    >>= initializedVariable
  outputWeights <- truncatedNormal (vector [hiddenDimen, outputDimen])
    >>= initializedVariable
  outputBias <- truncatedNormal (vector [outputDimen])
    >>= initializedVariable
  ...

Each layer of our dense network consists of a matrix multiplication by the weights, and adding the bias vector. Between the layers, we'll apply relu activation. We conclude by running the output vector with an input feed:

createTDModel :: Session TDModel
createTDModel = do
  ...
  let hiddenLayerResult = relu $
        (worldInputVector `matMul` (readValue hiddenWeights))
        `add` (readValue hiddenBias)
  let outputLayerResult =
        (hiddenLayerResult `matMul` (readValue outputWeights))
        `add` (readValue outputBias)
  let evaluateStep = \inputFeed -> runWithFeeds
        [feed worldInputVector inputFeed] outputLayerResult
  ...

We'll leave the training step undefined for now. We'll work on that next time.

createTDModel :: Session TDModel
createTDModel = do
  …
  return $ TDModel
    { tdHiddenWeights = hiddenWeights
    , tdHiddenBias = hiddenBias
    , tdOutputWeights = outputWeights
    , tdOutputBias = outputBias
    , tdEvaluateWorldStep = evaluateStep
    , tdTrainStep = undefined
    }

Running World Iterations

Much of the skeleton and support code remains the same from Q-Learning. But let's go over the details of running a single iteration on one of our worlds. This function will take our model as a parameter, as well as a random move chance. (Recall that adding randomness to our moves will help us avoid a stagnant model). It will be stateful over the World and a random generator.

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  ...

We'll start off by getting all the possible moves from our current position. We'll step the world forward for each one of these moves. Then we'll feed the resulting worlds into our model. This will give us the scores for every move:

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  (currentWorld, gen) <- get
  let allMoves = possibleMoves currentWorld
  let newWorlds = fst <$> map ((flip stepWorld) currentWorld) allMoves
  (allScores :: Vector Float) <-
    Data.Vector.fromList <$> (forM newWorlds $ \w -> do 
      let worldData = encodeTensorData
            (Shape [1, inputDimen]) (vectorizeWorld8 w)
      scoreVector <- lift $ (tdEvaluateWorldStep model) worldData
      return $ Data.Vector.head scoreVector)
  ...

Now we need to take a similar action to what we had with Q-Learning. We'll roll the dice, and either select the move with the best score, or we'll select a random index.

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  ...
  let (chosenIndex, newGen) = bestIndexOrRandom allScores gen
  ...
  where
    bestIndexOrRandom :: Vector Float -> StdGen -> (Int, StdGen)
    bestIndexOrRandom scores gen =
      let (randomMoveRoll, gen') = randomR (0.0, 1.0) gen
          (randomIndex, gen'') = randomR (0, Data.Vector.length scores - 1) gen'
      in  if randomMoveRoll < randomChance
            then (randomIndex, gen'')
            else (maxIndex scores, gen')

Now that we have our "chosen" move and its score, we'll encode that score as data to pass to the training step. The exception to this is if the game ends. In that case, we'll have a "true" score of 1 or 0 to give. While we're at it, we can also calculate the continuationAction. This is either returning a boolean for ending the game, or looping again.

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  ...
  let nextWorld = newWorlds !! chosenIndex
  put (nextWorld, newGen)
  let (chosenNextScore, continuationAction) =
        case worldResult nextWorld of
          GameLost -> (0.0, return False)
          GameWon -> (1.0, return True)
          GameInProgress -> ( allScores ! chosenIndex
                            , runWorldIteration randomChance model)
  let nextScoreData = encodeTensorData
        (Shape [1]) (Data.Vector.singleton chosenNextScore)
  ...

We'll also encode the evaluation of our current world. Then we'll pass these values to our training step, and run the continuation!

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  ...
  let currentWorldData = encodeTensorData
        (Shape [1, inputDimen]) (vectorizeWorld8 currentWorld)
  currentScoreVector <- lift $
    (tdEvaluateWorldStep model) currentWorldData
  let currentScoreData = encodeTensorData
        (Shape [1]) currentScoreVector

  lift $ (tdTrainStep model) nextScoreData currentScoreData

  continuationAction

What's Next?

We've now got the basic framework set up for our TD agent. Next time, we'll start digging into the actual formula we use to learn the weights. It's a little more complicated than some of the previous loss functions we've dealt with in the past.

If you want to get started with Haskell and Tensor Flow, download our Haskell Tensor Flow Guide. It will help you learn the basics of this complicated library!

James Bowen

Temporal Difference Primer


Last week we finished our exploration of supervised learning with our maze game. We explored a more complex model that used convolution and pooling. This week, we're going back to "unsupervised" learning. We'll consider another approach that does not require the specification of "correct" outputs.

This approach is Temporal Difference Learning (TD Learning). It relies on having a function to evaluate a game position. Its main principle is that the current position should have a similar evaluation to positions in the near future.

Our evaluation function will use weights whose values our training program will learn. We'll want to learn these weights to minimize the difference between game evaluations. In this article, we'll take a high level look at this approach, before we get into the details next time.
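
Roughly speaking (we'll make this precise over the next couple of articles), the quantity we'll minimize is the squared gap between successive evaluations, with terminal positions pinned to their true values of 1 or 0:

$$L = (V_{target} - V_{current})^2$$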

History of TD Learning

The concept of TD learning was first developed in the 1980's. One of the more famous applications of TD learning in the 1990's was to learn an AI for Backgammon, called TD Gammon. This agent could play the game at an intermediate human level. It did this initially with no hand-crafting of any of the game rules or any algorithm.

Getting to this level with a "knowledge free" algorithm was almost unheard of at the time. When providing hand-crafted features, the agent could then play at a near-expert level. It explored many possibilities that human players had written off. In doing so, it contributed new ideas to high level backgammon play. It was an important breakthrough in unsupervised techniques.

Q-Learning vs. TD Learning

A few weeks back, we explored Q-Learning. And at first glance, Q-Learning and TD learning might sound similar. But with temporal difference, we'll be learning a different function. In Q-Learning, we learned the Q function. This is a function that takes in our current game board and provides a score for each possible move. With TD, we'll be learning what we call the V function. This function is a direct evaluation of the current board.

With our game mechanics, our agent chooses between 10 different moves. So the "output" vector of our Q-Learning network had size 10. Now in temporal difference learning, we'll only output a single number. This will be an "evaluation", or score, of the current position.

If a game has more than 2 outcomes, you would want the evaluation function to give a score for each of them. But our game has a binary outcome, so one number is enough.
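
Just to make the distinction concrete at the type level, here's an illustrative sketch. The World type here is only a stand-in, not the game's real type:

import Data.Vector (Vector)

-- Stand-in for the game's board type, purely for illustration.
data World = World

-- Q-Learning: one score for each of our 10 possible moves.
type QFunction = World -> Vector Float

-- TD learning: a single evaluation of the current board.
type VFunction = World -> Float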

Basics

Despite this difference, our TensorFlow code will have a similar structure to Q-Learning. Here's a high level overview:

  1. Our model will take an arbitrary game state and produce a score.
  2. At each iteration, we will get the model's output score on all possible moves from that position. We'll account for enemy moves when doing this. We will then choose the move for the best resulting board.
  3. We will advance the world based on this move, and then pass the resulting world through our model again.
  4. Then, adjust the weights so that the evaluations of the new world and the original world are more similar.
  5. If the resulting world is either a "win" or a "loss", we'll use the correct value (1 or 0) as the evaluation. Otherwise, we'll use our evaluation function.

What's Next

Next time, we'll dig into more specifics. It will be a bit tricky to use an evaluation function for our game in conjunction with TensorFlow. But once we have that, we can get into the meatier parts of this algorithm. We'll see exactly what operations we need to train our agent.

To learn more about using Haskell with AI, read our Haskell AI Series! This series shows some of the unique ideas that Haskell can bring to the world of machine learning.

James Bowen

Enemies, Convolution and Pooling


Let's remember now that last week we were still keeping with the idea of supervised learning. We were trying to get our agent to reach the goal on an empty maze, using our own moves as a guide. Since the problem was so simple, we were able to make an agent that navigates the maze.

But what happens if we introduce enemies into the maze? This will complicate the process. Our manual AI isn't actually that successful in this scenario. But recording our own moves provides a reasonable data set.

First, we need to update the features a little bit. Recall that we're representing our board so that every grid space has a single feature. In our old representation, we used a value of 100 to represent the target space, and 10 to represent our own space. We need to expand this representation. We want to know where the enemies are, if they're stunned, and if we still have our stun.

Let's use 25.0 to represent our position if we have the stun. If we don't have our stun available, we'll use 10.0 instead. For positions containing active enemies, we'll use -25.0. If the enemy is in the stunned state, we'll use -10.0.

vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = finalFeatures
  where
    initialGrid = V.fromList $ take 100 (repeat 0.0)
    (px, py) = playerLocation (worldPlayer w)
    (gx, gy) = endLocation w
    playerLocIndex = (9 - py) * 10 + px
    playerHasStun = playerCurrentStunDelay (worldPlayer w) == 0
    goalLocIndex = (9 - gy) * 10 + gx
    finalFeatures = initialGrid V.//
      ([ (playerLocIndex
       , if playerHasStun then 25.0 else 10.0)
       , (goalLocIndex, 100.0)] ++ enemyLocationUpdates)

    enemyLocationUpdates = enemyPair <$> (worldEnemies w)

    enemyPair e =
      let (ex, ey) = enemyLocation e
      in  ( (9 - ey) * 10 + ex,
             if enemyCurrentStunTimer e == 0 then -25.0 else -10.0)

Using our moves as a training set, we see some interesting results. It's difficult to get great results on our error rates. The data set is very prone to overfitting. We often end up with training error in the 20-30% range with a test error near 50%. And yet, our agent can consistently win the game! Our problem space is still a bit simplistic, but it is an encouraging result!

Convolution and Pooling

As suggested last week, the ideas of convolution and pooling could be useful on this feature set. Let's try it out, for experimentation purposes. We'll start with a function to create a layer in our network that does convolution and pooling.

buildConvPoolLayer :: Int64 -> Int64 -> Tensor v Float
  -> Build (Variable Float, Variable Float, Tensor Build Float)
buildConvPoolLayer inputChannels outputChannels input = do
  weights <- truncatedNormal (vector weightsShape)
    >>= initializedVariable
  bias <- truncatedNormal (vector [outputChannels])
    >>= initializedVariable
  let conv = conv2D' convAttrs input (readValue weights)
               `add` readValue bias
  let results = maxPool' poolAttrs (relu conv)
  return (weights, bias, results)
  where
    weightsShape :: [Int64]
    weightsShape = [5, 5, inputChannels, outputChannels]

    -- Create "attributes" for our operations
    -- E.g. How far does convolution slide?
    convStridesAttr = opAttr "strides" .~ ([1,1,1,1] :: [Int64])
    poolStridesAttr = opAttr "strides" .~ ([1,2,2,1] :: [Int64])
    poolKSizeAttr = opAttr "ksize" .~ ([1,2,2,1] :: [Int64])
    paddingAttr = opAttr "padding" .~ ("SAME" :: ByteString)
    dataFormatAttr = opAttr "data_format" .~ ("NHWC" :: ByteString)
    convAttrs = convStridesAttr . paddingAttr . dataFormatAttr
    poolAttrs = poolKSizeAttr . poolStridesAttr . paddingAttr . dataFormatAttr

The input to this layer will be our 10x10 "image" of a grid. The output will be 5x5x32. The "5" dimensions come from halving the board dimension. The "32" will come from the number of channels. We'll replace our first neural network layer with this layer:

createModel :: Build Model
createModel = do
  let batchSize = -1
  let (gridDimen :: Int64) = 10
  let inputChannels = 1
  let (convChannels :: Int64) = 32
  let (nnInputSize :: Int64) = 5 * 5 * 32

  (inputs :: Tensor Value Float) <-
    placeholder [batchSize, moveFeatures]
  (outputs :: Tensor Value Int64) <-
    placeholder [batchSize]

  let convDimens = [batchSize, gridDimen, gridDimen, inputChannels]
  let inputAsGrid = reshape inputs (vector convDimens)
  (convWeights, convBias, convResults) <-
    buildConvPoolLayer inputChannels convChannels inputAsGrid

  let nnInput = reshape
          convResults (vector [batchSize, nnInputSize])
  (nnWeights, nnBias, nnResults) <-
    buildNNLayer nnInputSize moveLabels nnInput

  (actualOutput :: Tensor Value Int64) <- render $
    argMax nnResults (scalar (1 :: Int64))

We have a bit less success here. Our training is slower. We get a similar error rate. But now our agent doesn't seem to win, unlike the agent from the pure, dense network. So it's not clear that we can actually treat this grid as an image and make any degree of progress.

What's Next?

Over the course of creating these different algorithms, we've discretized the game. We've broken it down into two phases. One where we move, one where the enemies can move. This will make it easier for us to move back to our evaluation function approach. We can even try using multi-move look-ahead. This could lead us to a position where we can try temporal difference learning. Researchers first used this approach to train an agent how to play Backgammon at a very high level.

Temporal difference learning is a very interesting concept. It's the last approach we'll try with our agent. We'll start out next time with an overview of TD learning before we jump into coding.

For another look at using Haskell with AI problems, read our Haskell and AI series! You can also download our Haskell Tensor Flow Guide for some more help with this library!

James Bowen

Different Feature Schemes


In last week's edition of our Maze AI series, we explored the mechanics of supervised learning. We took the training data we'd been building up and trained an agent on it. We had one set of data to make the AI follow our own human moves, and another to follow our hand-crafted AI. This wasn't particularly successful. The resulting agent had a lot of difficulty navigating the maze and using its stun at the right times.

This week, we'll explore a couple different ways we can expand the feature set. First, we'll try encoding the legality of moves in our feature set. Second, we'll try expanding the feature set to include more data about the grid. This will motivate some other approaches to the problem. We'll conclude by taking the specifics of grid navigation out. We'll let our agent go to work on an empty grid to validate that this is at least a reasonable approach.

For some more reading on using Haskell and AI, take a look at our Haskell AI Series. We explore some reasons why Haskell could be a good fit for AI and machine learning problems. It will also help you through some of the basics of using Haskell and Tensor Flow.

Our supervised agent uses our current feature set. Let's remind ourselves what these features are. We have five different directions we can go (up, down, left, right, stand still). And in each of these directions, we calculate 8 different features.

  1. The maze distance to the goal
  2. The manhattan distance to the goal
  3. Whether the location contains an active enemy
  4. The number of enemies on the shortest path to the goal from that location
  5. The distance to the nearest enemy from that location
  6. The number of nearby enemies in manhattan distance terms
  7. Whether our stun is available
  8. The number of drills we have after the move

Some of these features are higher level. We do non-trivial calculations to figure them out. This gives our agent some idea of strategy. But there's not a ton of lower level information available! We zero out the features for a particular spot if it's past the world boundary. But we can't immediately tell from these features if a particular move is legal.

This is a big oversight. It's possible for our AI to learn about the legality of moves from the higher level training data. But it would take a lot more data and a lot more time.

So let's add a feature for how "easy" a move is. A value of 0 will indicate an illegal move, either past the world boundary or through a wall when we don't have a drill. A value of 1 will indicate a move that requires a drill. A value of 2 will indicate a normal move.

We'll add the extra feature into the LocationFeatures type. We'll also add an extra parameter to our produceLocationFeatures function. This boolean indicates whether a drill would be necessary. Note, we don't need to account for WorldBoundary. The value will get zeroed out in that case. We'll call this feature lfMoveEase since a higher value indicates less effort.

data LocationFeatures = LocationFeatures
  { …
  , lfMoveEase :: Int
  }

produceLocationFeatures :: Location -> World -> Bool -> LocationFeatures
produceLocationFeatures location@(lx, ly) w needsDrill = LocationFeatures
  …
  moveEase
  where
    moveEase = if not needsDrill then 2
      else if drillsRemaining > 0 then 1 else 0

It's easy to add the extra parameter to the function call in produceWorldFeatures. We already use case statements on the boundary types. Now we need to account for it when vectorizing our world.

vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = V.fromList (fromIntegral <$>
  [ ...
  , lfMoveEase standStill
  ...
  , zeroIfNull (lfMoveEase <$> up)
  ...
  , zeroIfNull (lfMoveEase <$> right)
  ...
  , zeroIfNull (lfMoveEase <$> down)
  ...
  , zeroIfNull (lfMoveEase <$> left)
  ])

When we train with this feature set, we actually get a good training error, down to around 10%. Thus it can learn our data a bit better. Yet it still can't navigate the maze correctly.

Expanding the Feature Set

Another option we can try is to serialize the world in a more raw state. We currently use more strategic features. But what about using the information on the board?

Here's a different way to look at it. Let's fix it so that the grid must be 10x10, there must be 2 enemies, and we must start with 2 drill powerups on the map. Let's get these features about the world:

  1. 100 features for the grid cells. Each feature will be the integer value corresponding to the wall-shape of that cell. These are hexadecimal, like we have when serializing the maze.
  2. 4 features for the player. Get the X and Y coordinates for the position, the current stun delay, and the number of drills remaining.
  3. 3 features for each enemy. Again, X and Y coordinates, as well as a stun timer.
  4. 2 coordinate features for each drill location. Once a drill gets taken, we'll use -1 and -1.

This will give us a total of 114 features. Here's how it breaks down.

vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = gridFeatures V.++ playerFeatures V.++
                     enemyFeatures V.++ drillFeatures
  where

    -- 1. Features for the Grid
    mazeStr = Data.Text.unpack $ dumpMaze (worldBoundaries w)
    gridFeatures = V.fromList $
      fromIntegral <$> digitToInt <$> mazeStr

    player = worldPlayer w
    enemies = worldEnemies w

    -- 2. Features for the player
    playerFeatures = V.fromList $ fromIntegral <$>
      [ fst . playerLocation $ player
      , snd . playerLocation $ player
      , fromIntegral $ playerCurrentStunDelay player
      , fromIntegral $ playerDrillsRemaining player
      ]

    -- 3. Features for the two enemies
    enemy1 = worldEnemies w !! 0
    enemy2 = worldEnemies w !! 1
    enemyFeatures = V.fromList $ fromIntegral <$>
      [ fst . enemyLocation $ enemy1
      , snd . enemyLocation $ enemy1
      , fromIntegral $ enemyCurrentStunTimer enemy1
      , fst . enemyLocation $ enemy2
      , snd . enemyLocation $ enemy2
      , fromIntegral $ enemyCurrentStunTimer enemy2
      ]

    -- 4. Features for the drill locations
    drills = worldDrillPowerUpLocations w
    drillFeatures = V.fromList $ fromIntegral <$>
      if length drills == 0 then [-1, -1, -1, -1]
        else if length drills == 1
          then [fst (head drills), snd (head drills), -1, -1]
          else [ fst (head drills), snd (head drills)
               , fst (drills !! 1), snd (drills !! 1)
               ]

As an optimization, we can make the grid features part of the world since they will not change.

Still though, our model struggles to complete the grid when training off this data. Compared to the high-level features, the model doesn't even learn very well. We get training errors around 25-30%, but a test error close to 50%. With more data and time, our model might be able to draw the connection between various features.

We could attempt to make our model more sophisticated. We're working with grid data, which is a little like an image. Image processing algorithms use concepts such as convolution and pooling. This allows them to derive patterns arising from how the grid actually looks. We're only looking at the data as a flat vector.

It's unlikely though that convolution and pooling would help us with this feature set. Our secondary features don't fit into the grid. So we would actually want to add them in at a later stage in the process. Besides, we won't get that much data from taking the average value or the max value in a 2x2 segment of the maze. (This is what pooling does).
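
To make the pooling operation concrete, here's a tiny pure-Haskell version of 2x2 max pooling (just an illustration, not the TensorFlow op we'd actually use; it assumes a non-empty grid with even dimensions):

-- Each output cell is the maximum of a 2x2 block of the input grid.
maxPool2x2 :: [[Float]] -> [[Float]]
maxPool2x2 grid =
  [ [ maximum (concatMap (take 2 . drop c) (take 2 (drop r grid)))
    | c <- [0, 2 .. cols - 2] ]
  | r <- [0, 2 .. rows - 2] ]
  where
    rows = length grid
    cols = length (head grid)

On a 10x10 maze this produces a 5x5 grid of maximum values, which is exactly the kind of summary we're saying doesn't preserve much useful information here.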

If we simplify the problem though, we might find a situation where they'll help.

A Simpler Problem

We're having a lot of difficulty with getting our agent to navigate the maze. So let's throw away the problem of navigation for a second. Can we train an agent that will navigate the empty maze? This should be doable.

Let's start with a bare bones feature set with the goal and current location highlighted in a grid. We'll give a value of 10 for our player's location, and a value of 100 for the target location. We start with a vector of all zeros, and use Vector.// to modify the proper values:

vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = finalFeatures
  where
    initialGrid = V.fromList $ take 100 (repeat 0.0)
    (px, py) = playerLocation (worldPlayer w)
    (gx, gy) = endLocation w
    playerLocIndex = (9 - py) * 10 + px
    goalLocIndex = (9 - gy) * 10 + gx
    finalFeatures = initialGrid V.//
      [(playerLocIndex, 10), (goalLocIndex, 100)]

Our AI bot will always follow the same path in this grid, so it will be quite easy for our agent to learn this path! Even if we use our own moves and vary the path a little bit, the agent can still learn it. It'll achieve 100% accuracy on the AI data. It can't get that high on our data, since we might choose different moves for different squares. But we can still train it so it wins every time.

Conclusion

So our results are still not looking great. But next week we'll take this last idea and run a little further with it. We'll keep it so that our features only come from the grid itself. But we'll add a few more complications with enemies. We might find that convolution and pooling are useful in that case.

If you're interested in using Haskell for AI but don't know where to start, read our Haskell AI Series! We discuss some important ideas like why Haskell is a good AI language. We also get into the basics of Tensor Flow with Haskell.

James Bowen

Using Our Data with Supervised Learning


Our aim these last couple weeks has been to try a supervised learning approach to our game. In last week's article we gathered training data for playing the game. We had two different sources. First, we played the game ourselves and recorded our moves. Second, we let our AI play the game and recorded it. This gave us a few CSV files. Each line in these is a record containing the 40 "features" of the game board at the time and the move we chose.

This week, we're going to explore how to build a machine-learning agent based on this data. This will use supervised learning techniques. Look at the supervised-learning branch on our Github repository for more details.

To get started with Haskell and Tensor Flow, download our Haskell Tensor Flow Guide. This library is a little tricky to work with, so you want to make sure you know what you're doing!

Defining Our Model

For our supervised model, we're going to use a fully connected neural network with a single hidden layer. We'll have 40 input features, 100 hidden units, and then our 10 output values for the different move scores. We'll be following a very similar pattern to one we explored in this older article, using the basic Iris data set. We'll copy a lot of code from that article. We won't go over a lot of the helper code in this article, so feel free to check out that one for some help with that!
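
The code below refers to a few size constants. Their definitions aren't shown in this article, but based on the numbers above they would look something like this (Int64 comes from Data.Int):

moveFeatures :: Int64
moveFeatures = 40

hiddenUnits :: Int64
hiddenUnits = 100

moveLabels :: Int64
moveLabels = 10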

We define each layer with a "weights" matrix and a "bias" vector. We multiply the input by the weights and then add the bias vector. Let's explore how we can build a single layer of the network. This will take the input and output size, as well as the input tensor. It will have three results. One variable for the weights, one for the biases, and then a final "output" tensor:

buildNNLayer :: Int64 -> Int64 -> Tensor v Float
  -> Build (Variable Float, Variable Float, Tensor Build Float)

The definition is pretty simple. We'll initialize random variables for the weights and bias. We'll produce the result tensor by multiplying by the weights and adding the bias.

buildNNLayer :: Int64 -> Int64 -> Tensor v Float
  -> Build (Variable Float, Variable Float, Tensor Build Float)
buildNNLayer inputSize outputSize input = do
  weights <- truncatedNormal (vector [inputSize, outputSize])
    >>= initializedVariable
  bias <- truncatedNormal (vector [outputSize])
    >>= initializedVariable
  let results = (input `matMul` readValue weights)
        `add` readValue bias
  return (weights, bias, results)

Now that we understand the layers a little better, it's easier to define our model. First, we'll want to include both sets of weights and biases in the model, so we can output them later:

data Model = Model
  { w1 :: Variable Float
  , b1 :: Variable Float
  , w2 :: Variable Float
  , b2 :: Variable Float
  ...
  }

Now we want two different "steps" we can run. The training step will take a batch of data and determine what our network produces for the inputs. It will compare our network's output with the expected output. Then it will train to minimize the loss function. The error rate step will simply produce the error rate on the given data. That is, it will tell us what percentage of the moves we are getting correct. Both of these will be Session actions that take two inputs. First, the TensorData for the features, and then the TensorData for the correct moves:

data Model = Model
  { 
  … -- Weights and biases
  , train :: TensorData Float
          -> TensorData Int64
          -> Session ()
  , errorRate :: TensorData Float
              -> TensorData Int64
              -> Session (V.Vector Float) -- Produces a single number
  }

Let's see how we put this all together.

Building Our Model

To start, let's make placeholders for our input features and expected output results. A dimension of -1 means we can provide any size we like:

createModel :: Build Model
createModel = do
  let batchSize = -1
  (inputs :: Tensor Value Float) <-
    placeholder [batchSize, moveFeatures]
  (outputs :: Tensor Value Int64) <-
    placeholder [batchSize]
  ...

Now we build the layers of our neural network using our helper. We'll apply relu, an activation function, on the results of our hidden layer. This helps our model deal with interaction effects and non-linearities:

createModel :: Build Model
createModel = do
  ...
  (hiddenWeights, hiddenBiases, hiddenResults) <-
    buildNNLayer moveFeatures hiddenUnits inputs
  let rectifiedHiddenResults = relu hiddenResults
  (finalWeights, finalBiases, finalResults) <-
    buildNNLayer hiddenUnits moveLabels rectifiedHiddenResults
  ...

Now to get our error rate, we need a couple steps. We'll get the best move from each predicted result using argMax. We can then compare these to the training data using equal. By using reduceMean we'll get the percentage of our moves that match. Subtracting this from 1 gives our error rate:

createModel :: Build Model
createModel = do
  ...
  (actualOutput :: Tensor Value Int64) <- render $
    argMax finalResults (scalar (1 :: Int64))
  let (correctPredictions :: Tensor Build Float) = cast $
        equal actualOutput outputs
  (errorRate_ :: Tensor Value Float) <- render $
    1 - (reduceMean correctPredictions)

Now we need our training step. We'll compare outputs. This involves the softmaxCrossEntropyWithLogits function. We train our model by selecting our variables for training, and using minimizeWith. This will update the variables to reduce the value of the loss function:

createModel :: Build Model
createModel = do
  ...
  let outputVectors = oneHot outputs (fromIntegral moveLabels) 1 0
  let loss = reduceMean $ fst $
        softmaxCrossEntropyWithLogits finalResults outputVectors
  let params =
        [hiddenWeights, hiddenBiases, finalWeights, finalBiases]
  train_ <- minimizeWith adam loss params
  ...

We conclude by creating our functions. These take the tensor data as parameters. Then they use runWithFeeds to put the data into our placeholders:

createModel :: Build Model
createModel = do
  ...
  return $ Model
    { train = \inputFeed outputFeed ->
        runWithFeeds
          [ feed inputs inputFeed
          , feed outputs outputFeed
          ]
          train_
    , errorRate = \inputFeed outputFeed ->
        runWithFeeds
          [ feed inputs inputFeed
          , feed outputs outputFeed
          ]
          errorRate_
    , w1 = hiddenWeights
    , b1 = hiddenBiases
    , w2 = finalWeights
    , b2 = finalBiases
    }

Running Our Tests

Now let's run our tests. We'll read the move record data from the file, shuffle them, and set aside a certain proportion as our test set. Then we'll build our model:

runTraining totalFile = runSession $ do
  initialRecords <- liftIO $ readRecordFromFile totalFile
  shuffledRecords <- liftIO $ shuffleM (V.toList initialRecords)
  let testRecords = V.fromList $ take 2000 shuffledRecords
  let trainingRecords = V.fromList $ drop 2000 shuffledRecords
  model <- build createModel
  ...

Then we run our iterations (we'll do 50000, as an example). We select some random records (100 per batch), and then convert them to data. Then we run our train step. Finally, every 100 iterations or so, we'll get a gauge of the training error on this set. This involves the errorRate step. Note our error rate returns a vector with a single wrapped value. So we need to unwrap it with !.

runTraining totalFile = runSession $ do
  ...
  forM_ ([0..50000] :: [Int]) $ \i -> do
    trainingSample <- liftIO $ chooseRandomRecords trainingRecords
    let (trainingInputs, trainingOutputs) =
          convertRecordsToTensorData trainingSample
    (train model) trainingInputs trainingOutputs
    when (i `mod` 100 == 0) $ do
      err <- (errorRate model) trainingInputs trainingOutputs
      liftIO $ putStrLn $
        (show i) ++ " : current error " ++ show ((err V.! 0) * 100)

Now to run the final test, we use the errorRate step again, this time on our test data:

runTraining totalFile = runSession $ do
  ...

  -- Testing
  let (testingInputs, testingOutputs) =
        convertRecordsToTensorData testRecords
  testingError <- (errorRate model) testingInputs testingOutputs
  liftIO $ putStrLn $
    "test error " ++ show ((testingError V.! 0) * 100)

Results

When it comes to testing our system, we should use proper validation techniques. We want a model that will represent our training data well. But it should also generalize well to other reasonable examples. If our model represents the training data too well, we're in danger of "overfitting" our data. To check this, we'll hold back roughly 20% of the data. This will be our "test" data. We'll train our model on the other 80% of the data. Every 100 steps or so, we print out the training error on that batch of data. We hope this figure drops. But then at the very end, we'll run the model on the other 20% of the data, and we'll see what the error rate is. This will be the true test of our system.

We know we have overfitting if we see figures on training error that are lower than the testing error. When training on human moves for 50000 iterations, the training error drops to the high teens and low 20's. But the test error is often still close to 50%. This suggests we shouldn't be training quite as long.

The AI moves provide a little more consistency though. The training error seems to stabilize around the mid 20's and low 30's, and we end up with a test error of about 34%.

Conclusion

Our error rate isn't terrible. But it's not great either. And worse, testing shows it doesn't appear to capture the behaviors well enough to win the game. A case like this suggests our model isn't sophisticated enough to capture the problem. It could also suggest our data is too noisy, and the patterns we hoped to find aren't there. The feature set we have might not capture all the important information about the graph.

For our final look at this problem, we're going to try a new serialization technique. Instead of deriving our own features, we're going to serialize the entire game board! The "feature space" will be much, much larger now. It will include the structure of the graph and information about enemies and drills. This will call for a more sophisticated model. A pure fully connected network will take a long time to learn things like how walls block or allow moves. A big drawback of this technique is that it will not generalize to arbitrary mazes. It will only work for a certain size and number of enemies. But with enough training time we may find that interesting patterns emerge. So come back next week to see how this works!

James Bowen

Gathering Smart Data


Last week we made a few more fixes to our Q-Learning algorithm. Ultimately though, it still seems to fall short for even basic versions of our problem.

Q-learning is an example of an "unsupervised" learning approach. We don't tell the machine learning algorithm what the "correct" moves are. We give it rewards when it wins the game (and negative rewards when it loses). But it needs to figure out how to play to get those rewards. With supervised learning, we'll have specific examples of what it should do! We'll have data points saying, "for this feature set, we should make this move." We'll determine a way to record the moves we make in our game, both as a human player and with our manual AI algorithm! This will become our "training" data for the supervised learning approach.

This week's code is all on the Gloss side of things. You can find it on our Github repository under the branch record-player-ai. Next week, we'll jump back into Tensor Flow. If you're not familiar yet with how to use Haskell and Tensor Flow, download our Haskell Tensor Flow Guide!

Recording Moves

To gather training data, we first need a way to record moves in the middle of the game. Gloss doesn't give us access to the IO monad in our update functions. So we'll unfortunately have to resort to unsafePerformIO for this, since we need the data in a file. (We did the same thing when saving game states). Here's the skeleton of our function:

unsafeSaveMove :: Int -> World -> World -> World
unsafeSaveMove moveChoice prevWorld nextWorld = unsafePerformIO $ do
  ...

The first parameter will be a representation of our move, an integer from 0-9. This follows the format we had with serialization.

0 -> Move Up
1 -> Move Right
2 -> Move Down
3 -> Move Left
4 -> Stand Still
X + 5 -> Move direction X and use the stun

The first World parameter will be the world under which we made the move. The second world will be the resulting world. This parameter only exists as a pass-through, because of how unsafePerformIO works.

Given these parameters, our function is pretty straightforward. We want to record a single line that has the serialized world state values and our final move choice. These will go in a comma separated list. We'll save everything to a file called moves.csv. So let's open that file and get the list of numbers. We'll immediately convert the numbers to strings with show.

unsafeSaveMove :: Int -> World -> World -> World
unsafeSaveMove moveChoice prevWorld nextWorld = unsafePerformIO $ do
  handle <- openFile "moves.csv" AppendMode
  let numbers = show <$>
        (Vector.toList (vectorizeWorld prevWorld) ++
          [fromIntegral moveChoice])
  ...

Now that our values are all strings, we can get them in a comma separated format with intercalate. We'll write this string to the file and close the handle!

unsafeSaveMove :: Int -> World -> World -> World
unsafeSaveMove moveChoice prevWorld nextWorld = unsafePerformIO $ do
  handle <- openFile "moves.csv" AppendMode
  let numbers = show <$>
        (Vector.toList (vectorizeWorld prevWorld) ++
          [fromIntegral moveChoice])
  let csvString = intercalate "," numbers
  hPutStrLn handle csvString
  hClose handle
  return nextWorld

Now let's figure out how to call this function!

Saving Human Moves

Saving the moves we make as a human is pretty easy. All we need to do is hook into the inputHandler. Recall this section, that receives moves from arrow keys and makes our move:

inputHandler :: Event -> World -> World
inputHandler event w
  ...
  | otherwise = case event of
      (EventKey (SpecialKey KeyUp) Down (Modifiers _ _ Down) _) ->
        drillLocation upBoundary breakUpWall breakDownWall w
      (EventKey (SpecialKey KeyUp) Down _ _) ->
        updatePlayerMove upBoundary
      (EventKey (SpecialKey KeyDown) Down (Modifiers _ _ Down) _) ->
        drillLocation downBoundary breakDownWall breakUpWall w
      (EventKey (SpecialKey KeyDown) Down _ _) ->
        updatePlayerMove downBoundary
      (EventKey (SpecialKey KeyRight) Down (Modifiers _ _ Down) _) ->
        drillLocation rightBoundary breakRightWall breakLeftWall w
      (EventKey (SpecialKey KeyRight) Down _ _) ->
        updatePlayerMove rightBoundary
      (EventKey (SpecialKey KeyLeft) Down (Modifiers _ _ Down) _) ->
        drillLocation leftBoundary breakLeftWall breakRightWall w
      (EventKey (SpecialKey KeyLeft) Down _ _) ->
        updatePlayerMove leftBoundary
      (EventKey (SpecialKey KeySpace) Down _ _) ->
        if playerCurrentStunDelay currentPlayer /= 0
          then w
          else w
            { worldPlayer =
                activatePlayerStun currentPlayer playerParams
            , worldEnemies = stunEnemyIfClose <$> worldEnemies w
            , stunCells = stunAffectedCells
            }
  …

All these lines return World objects! So we just need to wrap them as the final argument to unsafeSaveMove. Then we add the appropriate move choice number. The strange part is that we cannot move AND stun at the same time when playing as a human. So using the stun will always be 9, which means stunning while standing still. Here are the updates:

inputHandler :: Event -> World -> World
inputHandler event w
  ...
  | otherwise = case event of
      (EventKey (SpecialKey KeyUp) Down (Modifiers _ _ Down) _) -> 
        unsafeSaveMove 0 w $
          drillLocation upBoundary breakUpWall breakDownWall w
      (EventKey (SpecialKey KeyUp) Down _ _) ->
        unsafeSaveMove 0 w $ updatePlayerMove upBoundary
      (EventKey (SpecialKey KeyDown) Down (Modifiers _ _ Down) _) -> 
        unsafeSaveMove 2 w $
          drillLocation downBoundary breakDownWall breakUpWall w
      (EventKey (SpecialKey KeyDown) Down _ _) ->
        unsafeSaveMove 2 w $ updatePlayerMove downBoundary
      (EventKey (SpecialKey KeyRight) Down (Modifiers _ _ Down) _) -> 
        unsafeSaveMove 1 w $
          drillLocation rightBoundary breakRightWall breakLeftWall w
      (EventKey (SpecialKey KeyRight) Down _ _) ->
        unsafeSaveMove 1 w $ updatePlayerMove rightBoundary
      (EventKey (SpecialKey KeyLeft) Down (Modifiers _ _ Down) _) ->
        unsafeSaveMove 3 w $
          drillLocation leftBoundary breakLeftWall breakRightWall w
      (EventKey (SpecialKey KeyLeft) Down _ _) ->
        unsafeSaveMove 3 w $ updatePlayerMove leftBoundary
      (EventKey (SpecialKey KeySpace) Down _ _) ->
        if playerCurrentStunDelay currentPlayer /= 0
          then w
          else unsafeSaveMove 9 w $ w
            { worldPlayer =
                activatePlayerStun currentPlayer playerParams
            , worldEnemies = stunEnemyIfClose <$> worldEnemies w
            , stunCells = stunAffectedCells
            }
  …

And now whenever we play the game, it will save our moves! Keep in mind though, it takes a lot of training data to get good results when using supervised learning. I played for an hour and got around 10000 data points. We'll see if this is enough!

Saving AI Moves

While the game is at least a little fun, it's also exhausting to keep playing it to generate data! So now let's consider how we can get the AI to play the game itself and generate data. The first step is to reset the game automatically on winning or losing:

updateFunc :: Float -> World -> World
updateFunc _ w
  | (worldResult w == GameWon || worldResult w == GameLost) &&
       (usePlayerAI params) =
    ...

The rest will follow the other logic we have for resetting the game. Now we must examine where to insert our call to unsafeSaveMove. The answer is our updateWorldForPlayerMove function. We can see that we get the move (and our player's cached memory) as part of makePlayerMove:

updateWorldForPlayerMove :: World -> World
updateWorldForPlayerMove w = …
  where
    (move, memory) = makePlayerMove w
    ...

We'll want a quick function to convert our move into the number choice:

moveNumber :: PlayerMove -> Int
moveNumber (PlayerMove md useStun dd) =
  let directionFactor = case (md, dd) of
        (DirectionUp, _) -> 0
        (_, DirectionUp) -> 0
        (DirectionRight, _) -> 1
        (_, DirectionRight) -> 1
        (DirectionDown, _) -> 2
        (_, DirectionDown) -> 2
        (DirectionLeft, _) -> 3
        (_, DirectionLeft) -> 3
        _ -> 4
  in  if useStun then directionFactor + 5 else directionFactor

Our saving function requires a pass-through world parameter. So we'll do the saving on our first new World calculation. This comes from modifyWorldForPlayerDrill:

updateWorldForPlayerMove :: World -> World
updateWorldForPlayerMove w = …
  where
    (move, memory) = makePlayerMove w

    worldAfterDrill = unsafeSaveMove (moveNumber move) w
     (modifyWorldForPlayerDrill …)
    ...

And that's all! Now our AI will play the game by itself, gathering data for hours on end if we like! We'll get some different data for different cases, such as 4 enemies 4 drills, 8 enemies 5 drills, and so on. This is much faster and easier than playing the game ourselves! It will automatically get 12-15 thousand data points an hour if we let it!

Conclusion

With a little bit of persistence, we can now get a lot of data for the decisions a smarter agent will make. Next week, we'll take the data we've acquired and use it to write a supervised learning algorithm! Instead of using Q-learning, we'll make the weights reflect the decisions that we (or the AI) would make.

Supervised learning is not without its pitfalls! It won't necessarily perform optimally; it will only perform as well as its training data. So even if we're successful, our algorithm will replicate our own mistakes! It'll be interesting to see how this plays out, so stay tuned!

For more information on using Haskell in AI, take a look at our Haskell AI Series. Plus, download our Haskell Tensor Flow Guide to learn more about using this library!

Read More
James Bowen

Tweaks, Fixes, and Some Results

tweaks.jpg

In last week's episode of this AI series, we added random exploration to our algorithm. This helped us escape certain "traps" and local minimums in the model that could keep us rooted in bad spots. But it still didn't improve results too much.

This week we'll explore a couple more ways we can fix and improve our algorithm. For the first time, we'll see some positive outcomes. Still, we'll find our approach isn't great.

To get started with Tensor Flow and Haskell, download our guide! It's a complex process so you'll want some help! You should also check out our Haskell AI Series to learn more about why Haskell is a good choice as an AI language!

Improvements

To start out, there are a few improvements we can make to how we do q-learning. Let's recall the basic outline of running a world iteration. There are three steps. We get our "new" move from the "input" world. Then we apply that move, and get our "next" move against the "next" world. Then we use the possible reward to create our target actions, and use that to train our model.

runWorldIteration model = do
  (prevWorld, _, _) <- get

  -- Get the next move on the current world (with random chance)
  let inputWorldVector = … -- vectorize prevWorld
  currentMoveWeights <- lift $ lift $
    (iterateWorldStep model) inputWorldVector
  let bestMove = moveFromOutput currentMoveWeights
  let newMove = chooseRandomMoveWithChance …

  -- Get the next world using this move, and produce our next move
  let nextWorld = stepWorld newMove prevWorld
  let nextWorldVector = vectorizeWorld nextWorld
  nextMoveVector <- lift $ lift $
    (iterateWorldStep model) nextWorldVector

  -- Use these to get "target action values" and use them to train!
  let (bestNextMoveIndex, maxScore) =
          (V.maxIndex nextMoveVector, V.maximum nextMoveVector)
  let targetActionData = encodeTensorData (Shape [10, 1]) $
          nextMoveVector V.//
            [(bestNextMoveIndex, newReward + maxScore)]
  lift $ lift $ (trainStep model) nextWorldVector targetActionData

There are a couple issues here. First, we want to substitute based on the first new move, not the later move. We want to learn from the move we are taking now, since we assess its result now. Thus we want to substitute for that index. We'll re-write our randomizer to account for this and return the index it chooses.
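
Here's a rough sketch of what that updated randomizer could look like. This is just an illustration; the exact signature, the StdGen plumbing, and the moveFromIndex helper are assumptions rather than the project's actual code.

import qualified Data.Vector as V
import System.Random (StdGen, randomR)

-- Sketch: with probability randomChance, pick a random move index. Otherwise
-- pick the index with the best score. Either way, return the chosen index
-- alongside the move so the caller can substitute the target value there.
chooseRandomMoveWithChance
  :: StdGen -> Float -> V.Vector Float -> ((PlayerMove, Int), StdGen)
chooseRandomMoveWithChance gen randomChance moveScores =
  ((moveFromIndex chosenIndex, chosenIndex), gen'')
  where
    (roll, gen') = randomR (0.0, 1.0) gen
    (randomIndex, gen'') = randomR (0, V.length moveScores - 1) gen'
    chosenIndex = if roll < randomChance
      then randomIndex
      else V.maxIndex moveScores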

Next, when training our model, we should use the original world, instead of the next world. That is, we want inputWorldVector instead of nextWorldVector. Our logic is this: we get our "future" action, which accounts for the game's reward, and we want our current action on this world to become more like that future action. Here's what the changes look like:

runWorldIteration model = do
  (prevWorld, _, _) <- get

  -- Get the next move on the current world (with random chance)
  let inputWorldVector = … -- vectorize prevWorld
  currentMoveWeights <- lift $ lift $
    (iterateWorldStep model) inputWorldVector
  let bestMove = moveFromOutput currentMoveWeights
  let (newMove, newMoveIndex) = chooseRandomMoveWithChance …

  -- Get the next world using this move, and produce our next move
  let nextWorld = stepWorld newMove prevWorld
  let nextWorldVector = vectorizeWorld nextWorld
  nextMoveVector <- lift $ lift $
    (iterateWorldStep model) nextWorldVector

  -- Use these to get "target action values" and use them to train!
  let maxScore = V.maximum nextMoveVector
  let targetActionData = encodeTensorData (Shape [10, 1]) $
          nextMoveVector V.//
            [(newMoveIndex, newReward + maxScore)]
  lift $ lift $ (trainStep model) inputWorldVector targetActionData

Another change we can make is to provide some rewards based on whether the selected move was legal or not. To do this, we'll need to update the stepWorld game API to return this boolean value:

stepWorld :: PlayerMove -> World -> (World, Bool)

Then we can add a small amount (0.01) to our reward value if we get a legal move, and subtract this otherwise.
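
As a tiny sketch of that adjustment (the helper name here is ours, not the project's):

-- Nudge the base reward depending on whether the chosen move was legal.
adjustRewardForLegality :: Bool -> Float -> Float
adjustRewardForLegality wasLegalMove baseReward = if wasLegalMove
  then baseReward + 0.01
  else baseReward - 0.01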

As a last flourish, we should also add a timeout condition. Our next step will be to test on simple mazes that have no enemies. This means we'll never get eaten, so we need some loss condition if we get stuck. This timeout condition should have the same negative reward as losing.
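
Within the iteration code, that could look roughly like this (worldTime and maxSteps are assumed names, used only for illustration):

-- Sketch: a timed-out game counts as a loss, with the same -1.0 reward.
let timedOut = worldTime nextWorld > maxSteps
let (newReward, continuationAction) = case worldResult nextWorld of
      GameWon -> (1.0, return True)
      GameLost -> (-1.0, return False)
      GameInProgress -> if timedOut
        then (-1.0, return False)
        else (0.0, runWorldIteration model)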

Results

Now that we've made some improvements, we'll train on a very basic maze that's only 5x5 and has no walls and no enemies. Whereas we used to struggle to even finish this maze, we now achieve the goal a fair amount of the time. One of our training iterations achieved the goal around 2/3 of the time.

However, our bot is still useless against enemies! It loses every time if we try to train from scratch on a map with a single enemy. One attempt to circumvent this is to first train our weights to solve the empty maze. Then we can start with these weights as we attempt to avoid the enemy. That way, we have some pre-existing knowledge, and we don't have to learn everything at once. Still though, it doesn't result in much improvement. Typical runs only succeeded 40-50 times out of 2000 iterations.

Limiting Features

One conclusion we can draw is that we actually have too many features! Our intuition is that a larger feature set would take more iterations to learn. If the features aren't chosen carefully, they'll introduce noise.

So instead of tracking 8 features for each possible direction of movement, let's stick with 3. We'll see if the enemy is on the location, check the distance to the end, and count the number of nearby enemies. When we do this, we get comparable results on the empty maze. But when it comes to avoiding enemies, we do a little better, surviving 150-250 iterations out of 2000. These statistics are all very rough, of course. If we wanted a more thorough analysis, we'd use multiple maze configurations and a lot more runs using the finalized weights.

Conclusions

We can't draw too many conclusions from this yet. Our model is still failing to solve simple versions of our problem. It's quite possible that our model is too simplistic. After all, all we're doing is a simple matrix multiplication on our features. In theory, this should be able to solve the problem, but it may take a lot more iterations. The results stream we see also suggests local minimums are a big problem. Logging information reveals that we often die in the same spot in the maze many times in a row. The negative rewards aren't enough to draw us out, and we are often relying on random moves to find better outcomes.

So next week we're going to start changing our approach. We'll explore a way to introduce supervised learning into our process. This depends on "correct" data. We'll try a couple different ways to get that data. We'll use our own "human" input, as well as the good AI we've written in the past to solve this problem. All we need is a way to record the moves we make! So stay tuned!

Read More
James Bowen

Running Training Iterations

iterations.png

In our last article we built a simple Tensor Flow model to perform Q-Learning on our brain. This week, we'll build out the rest of the code we need to run iterations on this model. This will train it to perform better and make more intelligent decisions.

The machine learning code for this project is in a separate repository from the game code. Check out MazeLearner to follow along. Everything for this article is on the basic-trainer branch.

To learn more about Haskell and AI, make sure to read our Haskell and AI Series!

Iterating on the Model

First let's recall what our Tensor Flow model looks like:

data Model = Model
  { weightsT :: Variable Float
  , iterateWorldStep :: TensorData Float -> Session (Vector Float)
  , trainStep :: TensorData Float -> TensorData Float -> Session ()
  }

We need to think about how we're going to use the last two functions of it. We want to iterate on and make updates to the weights. Across the different iterations, there's certain information we need to track. The first value we'll track is the list of "rewards" from each iteration (this will be more clear in the next section). Then we'll also track the number of wins we get in the iteration.

To track these, we'll use the State monad, run on top of the Session.

runAllIterations :: Model -> World -> StateT ([Float], Int) Session ()

We'll also want a function to run a single iteration. This, in turn, will have its own state information. It will track the World state of the game it's playing. It will also track the sum of the accumulated reward values from the moves in that game. Since we'll run it from our function above, it will have a nested StateT type. It will ultimately return a boolean value indicating if we have won the game. We'll define the details in the next section:

runWorldIteration :: Model ->
  StateT (World, Float) (StateT ([Float], Int) Session) Bool

We can now start by filling out our function for running all the iterations. Supposing we'll perform 1000 iterations, we'll make a loop for each iteration. We can start each loop by running the world iteration function on the current model.

runAllIterations :: Model -> World -> StateT ([Float], Int) Session ()
runAllIterations model initialWorld = do
  let numIterations = 1000
  void $ forM [1..numIterations] $ \i -> do
    (wonGame, (_, finalReward)) <-
      runStateT (runWorldIteration model) (initialWorld, 0.0)
    ...

And now the rest is a simple matter of using the results to update the existing state:

runAllIterations :: Model -> World -> StateT ([Float], Int) Session ()
runAllIterations model initialWorld = do
  let numIterations = 1000
  void $ forM [1..numIterations] $ \i -> do
    (wonGame, (_, finalReward)) <-
      runStateT (runWorldIteration model) (initialWorld, 0.0)
    (prevRewards, prevWinCount) <- get
    let newRewards = finalReward : prevRewards
    let newWinCount = if wonGame
          then prevWinCount + 1
          else prevWinCount
    put (newRewards, newWinCount)

Running a Single Iteration

Now let's delve into the process of a single iteration. Broadly speaking, we have four goals.

  1. Take the current world and serialize it. Pass it through the iterateStep to get the move our model would make in this world.
  2. Apply this move, getting the "next" world state.
  3. Determine the scores for our moves in this next world. Apply the given reward as the score for the best of these moves.
  4. Use this result to compare against our original moves. Feed it into the training step and update our weights.

Let's start with steps 1 and 2. We'll get the vector representation of the current world. Then we need to encode it as TensorData so we can pass it to an input feed. Next we run our model's iterate step and get our output move. Then we can use that to advance the world state using stepWorld and updateEnvironment.

runWorldIteration
  :: Model
  -> StateT (World, Float) (StateT ([Float], Int) Session) Bool
runWorldIteration model = do
  -- Serialize the world
  (prevWorld :: World, prevReward) <- get
  let (inputWorldVector :: TensorData Float) =
        encodeTensorData (Shape [1, 8]) (vectorizeWorld prevWorld)

  -- Run our model to get the output vector and de-serialize it
  -- Lift twice to get into the Session monad
  (currentMove :: Vector Float) <- lift $ lift $
    (iterateWorldStep model) inputWorldVector
  let newMove = moveFromOutput currentMove

  -- Get the next world state
  let nextWorld = updateEnvironment (stepWorld newMove prevWorld)

Now we need to perform the Q-Learning step. We'll start by repeating the process in our new world state and getting the next vector of move scores:

runWorldIteration model = do
  ...
  let nextWorld = updateEnvironment (stepWorld newMove prevWorld)

  let nextWorldVector =
        encodeTensorData (Shape [1, 8]) (vectorizeWorld nextWorld)

  (nextMoveVector :: Vector Float) <- lift $ lift $
    (iterateWorldStep model) nextWorldVector
  ...

Now it gets a little tricky. We want to examine if the game is over after our last move. If we won, we'll get a reward of 1.0. If we lost, we'll get a reward of -1.0. Otherwise, there's no reward. While we figure out this reward value, we can also determine our final monadic action. We could return a boolean value if the game is over, or recursively iterate again:

runWorldIteration model = do
  ...
  let nextWorld = ...
  (nextMoveVector :: Vector Float) <- ...
  let (newReward, continuationAction) = case worldResult nextWorld of
        GameInProgress -> (0.0, runWorldIteration model)
        GameWon -> (1.0, return True)
        GameLost -> (-1.0, return False)
  ...

Now we'll look at the vector for our next move and replace one of its values. We'll find the maximum score, and replace it with a value that factors in the actual reward we get from the game. This is how we insert "truth" into our training process and how we'll actually learn good reward values.

import qualified Data.Vector as V

runWorldIteration model = do
  ...
  let nextWorld = ...
  (nextMoveVector :: Vector Float) <- ...
  let (newReward, continuationAction) = ...
  let (bestNextMoveIndex, maxScore) =
        (V.maxIndex nextMoveVector, V.maximum nextMoveVector)
  let (targetActionValues :: Vector Float) = nextMoveVector V.//
        [(bestNextMoveIndex, newReward + (0.99 * maxScore))]
  let targetActionData =
        encodeTensorData (Shape [10, 1]) targetActionValues
  ...

Then we'll encode this new vector as the second input to our training step. We'll still use the nextWorldVector as the first input. We conclude by updating our state variables to have their new values. Then we run the continuation action we got earlier.

runWorldIteration model = do
  ...
  let nextWorld = ...
  (nextMoveVector :: Vector Float) <- ...
  let targetActionData = ...

  -- Run training to alter the weights
  lift $ lift $ (trainStep model) nextWorldVector targetActionData
  put (nextWorld, prevReward + newReward)
  continuationAction

Tying It Together

Now to make this code run, we need a little bit of code to tie it together. We'll make a Session action to train our game. It will output the final weights of our model.

trainGame :: World -> Session (Vector Float)
trainGame w = do
  model <- buildModel
  (finalRewards, finalWinCount) <-
    execStateT (runAllIterations model w) ([], 0)
  run (readValue $ weightsT model)

Then we can run this from IO using runSession.

playGameTraining :: World -> IO (Vector Float)
playGameTraining w = runSession (trainGame w)

Last of all, we can run this on any World we like by first loading it from a file. For our first examples, we'll use a smaller 10x10 grid with 2 enemies and 1 drill powerup.

main :: IO ()
main = do
  world <- loadWorldFromFile "training_games/maze_grid_10_10_2_1.game"
  finalWeights <- playGameTraining world
  print finalWeights

Conclusion

We've now got the basics down for making our Tensor Flow program work. Come back next week where we'll take a more careful look at how it's performing. We'll see if the AI from this process is actually any good or if there are tweaks we need to make to the learning process.

And make sure to download our Haskell Tensor Flow Guide! This library is difficult to use. There are a lot of secondary dependencies for it. So don't go in trying to use it blind!

Read More
James Bowen

Making a Learning Model

tf_logo.jpg

Last week we took a few more steps towards using machine learning to improve the player AI for our maze game. We saw how to vectorize the input and output data for our world state and moves. This week, we'll finally start seeing how to use these in the larger context of a Tensor Flow program. We'll make a model for a super basic neural network that will apply the technique of Q-Learning.

Our machine learning code will live in a separate repository from the primary game code. Be sure to check that out here! The first couple weeks of this part of the series will use the basic-trainer branch.

This week, we'll finally start diving into using Haskell with Tensor Flow. Be sure to read our Haskell AI Series to learn more about this! You can also download our Haskell Tensor Flow guide to learn the basics of the library.

Model Basics

This week's order of business will be to build a Tensor Flow graph that can make decisions in our maze game. The graph should take a serialized world state as an input, and then produce a distribution of scores. These scores correspond to the different moves we can make.

Recalling from last week, the input to our model will be a 1x8 vector, and the output will be a 10x1 vector. For now then, we'll represent our model with a single variable tensor: a matrix of size 8x10. We'll get the output by multiplying the inputs by the weights, which gives us one score for each of the 10 possible moves.

Ultimately, there are three things we need to access from this model.

  1. The final weights
  2. A step to iterate the world
  3. A step to train our model and adjust the weights.

Here's what the model looks like, using Tensor Flow types:

data Model = Model
  { weightsT :: Variable Float
  , iterateWorldStep :: TensorData Float -> Session (Vector Float)
  , trainStep :: TensorData Float -> TensorData Float -> Session ()
  }

The first element is the variable tensor for our weights. We need to expose this so we can output them at the end. The second element is a function that will take in a serialized world state and produce the output move. Then the third element will take both a serialized world state AND some expected values. It will update the variable tensor as part of the Q-Learning process. Next week, we'll write iteration functions in the Session monad. They'll use these last two elements.

Building the Iterate Step

To make these Tensor Flow items, we'll also need to use the Session monad. Let's start a basic function to build up our model:

buildModel :: Session Model
buildModel = do
  ...

To start, let's make a variable for our weights. At the start, we'll randomize them with truncatedNormal and then make that into a Variable:

buildModel :: Session Model
buildModel = do
  (initialWeights :: Tensor Value Float) <-
    truncatedNormal (vector [8, 10])
  (weights :: Variable Float) <- initializedVariable initialWeights

Now let's build the items for running our iterate step. This first involves taking the inputs as a placeholder. Remember, the inputs come from the vectorization of the world state.

Then to produce our output, we'll multiply the inputs by our weights. The result is a Build tensor, so we need to render it to use it in the next part. As an extra note, we need readValue to turn our Variable into a Tensor we can use in operations.

buildModel :: Session Model
buildModel = do
  (initialWeights :: Tensor Value Float) <-
    truncatedNormal (vector [8, 10])
  (weights :: Variable Float) <- initializedVariable initialWeights
  (inputs :: Tensor Value Float) <- placeholder (Shape [1,8])
  let (allOutputs :: Tensor Build Float) =
        inputs `matMul` (readValue weights)
  returnedOutputs <- render allOutputs
  ...

The next part is to create a step to "run" the outputs. Since the outputs depend on a placeholder, we need to create a feed for the input. Then we can create a runnable Session action with runWithFeeds. This gives us the second element of our Model, the iterateStep.

buildModel :: Session Model
buildModel = do
  ...
  let iterateStep = \inputFeed ->
        runWithFeeds [feed inputs inputFeed] returnedOutputs
  ...

Using Q-Learning in the Model

This gives us what we need to run our basic AI and make moves in the game. But we still need to apply some learning mechanism to update the weights!

We want to use Q-Learning. This means we'll compare the output of our model with the next output from continuing to step through the world. So first let's introduce another placeholder for these new outputs:

buildModel :: Session Model
buildModel = do
  initialWeights <- ...
  weights <- ...
  inputs <- ...
  returnedOutputs <- ...

  let iterateStep = ...

  -- Next set of outputs
  (nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
  ...

Now we'll define our "loss" function. That is, we'll find the squared difference between our real output and the "next" output. Next week we'll see that the "next" output uses extra information about the game. This will allow us to bring an element of "truth" that we can learn from.

buildModel :: Session Model
buildModel = do
  ...
  returnedOutputs <- ...

  -- Q-Learning Section
  (nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
  let (diff :: Tensor Build Float) = nextOutputs `sub` allOutputs
  let (loss :: Tensor Build Float) = reduceSum (diff `mul` diff)
  ...

Now, we'll make a final ControlNode using minimizeWith. This will minimize the loss function using the adam optimizer. We'll pass weights in the list of variables to update, since that's the variable we're trying to learn.

buildModel :: Session Model
buildModel = do
  ...
  returnedOutputs <- ...

  -- Q-Learning Section
  (nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
  let (diff :: Tensor Build Float) = nextOutputs `sub` allOutputs
  let (loss :: Tensor Build Float) = reduceSum (diff `mul` diff)
  (trainer_ :: ControlNode) <- minimizeWith adam loss [weights]

Finally, we'll make our training step, which will run the training node on two input feeds: one for the world input, and one for the expected output. Then we can return our completed model.

buildModel :: Session Model
buildModel = do
  ...
  inputs <- ...
  weights <- ...
  returnedOutputs <- ...
  let iterateStep = ...

  -- Q-Learning Section
  (nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
  let diff = ...
  let loss = ...
  (trainer_ :: ControlNode) <- minimizeWith adam loss [weights]
  let trainingStep = \inputFeed nextOutputFeed -> runWithFeeds
        [ feed inputs inputFeed
        , feed nextOutputs nextOutputFeed
        ]
        trainer_
  return $ Model
    weights
    iterateStep
    trainingStep

Conclusion

Now we've got our machine learning model. We have different functions that can iterate on our world state as well as train the outputs of our graph. Next week, we'll see how to combine these steps within the Session monad. Then we can start running training iterations and produce results.

If you want to follow along with these code examples, make sure to download our Haskell Tensor Flow Guide! This library is quite tricky to use. There are a lot of secondary dependencies for it. So you won't want to go in trying to use it blind!

Read More
James Bowen

Q-Learning Primer

q-learning.png

This week, we're going to take the machine learning process in a different direction than I expected. In the last couple weeks, we've built a simple evaluation function for our world state. We could learn this function using an approach called Temporal Difference learning. We might come back to this approach at some point. But for now, we're actually going to try something a little different.

Instead, we're going to focus on a technique called Q-Learning. Instead of an evaluation function for the world, we're going to learn makePlayerMove. We'll keep most of the same function structure. We're still going to take our world and turn it into a feature space that we can represent as a numeric vector. But instead of producing a single output, we'll give a score for every move from that position. This week, we'll take the basic steps to ready our game for this approach.

As always, check out the Github repository for this project. This week's code is on the q-learning branch!

Next week, we'll finally get into some Tensor Flow code. Make sure you're ready for it by reading up on our Tensor Flow Guide!

Vectorizing Inputs

To learn a function, we need to be able to represent both the inputs and the outputs of our system as numeric vectors. We've already done most of the work here. Let's recall our evaluateWorld function. We'll keep the same feature values. But now we'll wrap them, instead of applying scores immediately:

data WorldFeatures = WorldFeatures
  { onActiveEnemy :: Int
  , shortestPathLength :: Int
  , manhattanDistance :: Int
  , enemiesOnPath :: Int
  , nearestEnemyDistance :: Int
  , numNearbyEnemies :: Int
  , stunAvailable :: Int
  , drillsRemaining :: Int
  }

produceWorldFeatures :: World -> WorldFeatures
produceWorldFeatures w = WorldFeatures
  (if onActiveEnemy then 1 else 0)
  shortestPathLength
  manhattanDistance
  enemiesOnPath
  nearestEnemyDistance
  numNearbyEnemies
  (if stunAvailable then 1 else 0)
  (fromIntegral drillsRemaining)
  where
    -- Calculated as before
    onActiveEnemy = ...
    enemiesOnPath = ...
    shortestPathLength = ...
    nearestEnemyDistance = ...
    manhattanDistance = ...
    stunAvailable = ...
    numNearbyEnemies = ...
    drillsRemaining = ...

Now, in our ML code, we'll want to convert this into a vector. Using a vector will enable us to encode this information as a tensor.

vectorizeWorld :: World -> Vector Float
vectorizeWorld w = fromList (fromIntegral <$>
  [ wfOnActiveEnemy features
  , wfShortestPathLength features
  , wfManhattanDistance features
  , wfEnemiesOnPath features
  , wfNearestEnemyDistance features
  , wfNumNearbyEnemies features
  , wfStunAvailable features
  , wfDrillsRemaining features
  ])
  where
    features = produceWorldFeatures w

Vectorizing Outputs

Now we have the inputs to our tensor system. We'll ultimately get a vector of outputs as a result. We want this vector to provide a score for every move. We have 10 potential moves in general. There are five "movement" directions, moving up, right, down, left, and standing still. Then for each direction, we can either use our stun or not. We'll use a drill when the movement direction sends us against a wall. Certain moves won't be available in certain situations. But our size should account for them all.

Our function will often propose invalid moves. For example, it might suggest using the stun while on cooldown, or drilling when we don't have one. In these cases, our game logic should dictate that our player doesn't move. Hopefully this trains our network to make correct moves. If we wanted to, we could even apply a slight negative reward for these.

What we need is the ability to convert a vector of outputs into a move. Once we fix the vector size, this is not difficult. As a slight hack, this function will always give the same direction for moving and drilling. We'll let the game logic determine if the drill needs to apply.

moveFromOutput :: Vector Int -> PlayerMove
moveFromOutput vals = PlayerMove moveDirection useStun moveDirection
  where
    bestMoveIndex = maxIndex vals
    moveDirection = case bestMoveIndex `mod` 5 of
      0 -> DirectionUp
      1 -> DirectionRight
      2 -> DirectionDown
      3 -> DirectionLeft
      4 -> DirectionNone
    useStun = bestMoveIndex > 4
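
For instance, if index 7 has the highest score, then 7 mod 5 is 2 (move down) and 7 > 4 (use the stun):

-- moveFromOutput (fromList [0,0,0,0,0,0,0,9,0,0])
--   == PlayerMove DirectionDown True DirectionDown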

Discrete Updates

Now that we can get numeric vectors for everything, we need to be able to step through the world, one player move at a time. We currently have a generic update function. Depending on the time, it might step the player forward, or it might not. We want to change this so there are two steps. First we receive a player's move, and then we step the world forward until it is time for the next player move:

stepWorld :: PlayerMove -> World -> World

This isn't too difficult; it just requires a little shuffling of our existing code. First, we'll add a new applyPlayerMove' function. This will take the existing applyPlayerMove and add a little bit of validation to it:

applyPlayerMove' :: PlayerMove -> World -> World
applyPlayerMove' move w = if isValidMove
  then worldAfterMove
  else w
  where
    player = worldPlayer w
    currentLoc = playerLocation player

    worldAfterDrill = modifyWorldForPlayerDrill w
      (drillDirection move)

    worldAfterStun = if activateStun move
      then modifyWorldForStun worldAfterDrill
      else worldAfterDrill

    newLocation = nextLocationForMove
      (worldBoundaries worldAfterDrill Array.! currentLoc) 
      currentLoc
      (playerMoveDirection move)

    isValidStunUse = if activateStun move
      then playerCurrentStunDelay player == 0
      else True
    isValidMovement = playerMoveDirection move == DirectionNone ||
      newLocation /= currentLoc
    isValidMove = isValidStunUse && isValidMovement

    worldAfterMove =
      modifyWorldForPlayerMove worldAfterStun newLocation

Now we'll add an updateEnvironment function. This will perform all the work of our updateFunc except for moving the player.

updateEnvironment :: World -> World
updateEnvironment w
  | playerLocation player == endLocation w =
    w { worldResult = GameWon }
  | playerLocation player `elem` activeEnemyLocations =
    w { worldResult = GameLost }
  | otherwise =
    updateWorldForEnemyTicks .
    updateWorldForPlayerTick .
    updateWorldForEnemyMoves .
    clearStunCells .
    incrementWorldTime $ w
  where
    player = worldPlayer w
    activeEnemyLocations = enemyLocation <$>
      filter (\e -> enemyCurrentStunTimer e == 0) (worldEnemies w)

Now we combine these. First we'll make the player's move. Then we'll update the environment once for each tick of the player's "lag" time.

stepWorld :: PlayerMove -> World -> World
stepWorld move w = execState (sequence_ updateActions) worldAfterMove
  where
    worldAfterMove = applyPlayerMove' move w
    updateActions = replicate
      ( fromIntegral .
        lagTime .
        playerGameParameters .
        worldParameters $ w)
      (modify updateEnvironment)

And these are all the modifications we'll need to get going!

Q Learning Teaser

Now we can start thinking about the actual machine learning process. We'll get into a lot more detail next week. But for now, let's think about a particular training iteration. We'll want to use our existing network to step forward into the game. This will produce a certain "reward", and leave the game state in a new position. Then we'll get more values for our next moves out of that position. We'll use the updated move scores and the reward to learn better values for our function weights.
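
Concretely, the target value for the move we just made will look something like this. This is only a sketch, but the 0.99 discount factor is the value we'll end up using when we build the model.

import qualified Data.Vector as V

-- Q-Learning target: the immediate reward plus the discounted best score
-- available from the position we end up in.
qTarget :: Float -> V.Vector Float -> Float
qTarget reward nextMoveScores = reward + gamma * V.maximum nextMoveScores
  where
    gamma = 0.99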

Of course, the immediate "reward" values for most moves will be 0. The only moves that will carry a reward will be those where we either win the game or lose the game. So it could take a while for our program to learn good behaviors. It will take time for the "end" behaviors of the game to affect normal moves. For this reason, we'll start our training on much smaller mazes than the primary game. This should help speed up the training process.

Conclusion

Next week, we'll take our general framework for Q-Learning and apply it within Tensor Flow. We'll get the basics of Q-Learning down with a couple different types of models. For a wider perspective on Haskell and AI problems, make sure to check out our Haskell AI Series!

Read More
James Bowen

Adding Features for Better Behavior

brain_feature.jpg

Last week we started exploring the idea of an AI built on an evaluation function. This has the potential to allow us to avoid a lot of the hand-crafting that comes with AI design. The old way hard-coded all the rules for the AI to follow. In the new approach, we create a mathematical function to evaluate a game position. Then we can look at all our possible moves and select the one with the best result. We could, if we wanted to, turn the input to our evaluation function into a vector of numbers. And its output is also a number. This property will help us realize our eventual goal of machine learning this function.

We made a rudimentary version of this function last week. Even before turning to machine learning, there are a couple ways to improve our function. We can try tweaking the weights we applied to each feature. But we can also try coming up with new features, or try different combinations of features. This week, we'll try the latter approach.

In the coming weeks as we start exploring machine learning, we'll use Tensor Flow with Haskell! To get prepared, download our Haskell Tensor Flow guide!

Existing Features

Last week, we came up with a few different features that could help us navigate this maze. These features included:

  1. Maze distance to goal
  2. Manhattan distance to goal
  3. Whether or not an enemy is on our location
  4. Whether or not our stun is available
  5. The number of drills we have available
  6. The number of enemies that are nearby (using manhattan distance)

But there were some clear sub-optimal behaviors with our bot. We tend to get "zoned out" by enemies, even when they aren't near us by maze distance. Obviously, it would suit us to use maze distance instead of manhattan distance. But we also want to be willing to approach enemies aggressively when we have our stun, and retreat intelligently without it. To that end, let's add a couple more features:

  1. The number of enemies on the shortest path to the goal.
  2. The shortest distance to an enemy from a particular square (only up to 5)

We'll impose a penalty for close enemies if we don't have our stun; otherwise we'll ignore this second new feature. Then we'll also impose a penalty for having more enemies on our shortest path. This will make us more willing to use the stun, rather than waiting.

Enemies In The Way

Our first order of business will be to determine how many enemies lie on our shortest path. We'll filter the path itself based on membership in the active enemies set:

evaluateWorld :: World -> Float
evaluateWorld w = ...
  where
    activeEnemyLocations = …

    shortestPath =
      getShortestPath (worldBoundaries w) playerLoc goalLoc

    enemiesOnPath = length $ filter
      (\l -> Set.member l (Set.fromList activeEnemyLocations))
      shortestPath

Then we'll assign each enemy on this path a penalty greater than the value of using the stun. We'll add this score to our other scores.

evaluateWorld :: World -> Float
evaluateWorld w =
  enemiesOnPathScore +
  ...
  where
    enemiesOnPath = ...
    enemiesOnPathScore = -85.0 * (fromIntegral enemiesOnPath)

Maze Distance

Next lets get the shortest maze distance to a nearby enemy. We'll actually want to generalize the behavior of our existing BFS function for this. We want to find the shortest path to any one of the enemy locations. So instead of supplying a single target location, we'll supply a set of target locations. Then we'll cap the distance to search so we aren't doing a full BFS of the maze every time. This gives an optional range parameter. Let's use these ideas to make an expanded API that our original function will use.

getShortestPathToTargetsWithLimit
  :: Maze
  -> Location
  -> Set.Set Location
  -> Maybe Int
  -> [Location]
getShortestPathToTargetsWithLimit
  maze initialLocation targetLocations maxRange = ...

-- Original function call!
getShortestPath maze initialLocation targetLocation =
  getShortestPathToTargetsWithLimit maze initialLocation
    (Set.singleton targetLocation) Nothing

bfs
  :: Maze
  -> Location
  -> Set.Set Location -- Now a set of targets
  -> Maybe Int -- Added range parameter
  -> [Location]
bfs = ...

We'll have to make a few tweaks to our algorithm now. Each search state element will have a "distance" associated with it.

data BFSState = BFSState
  { bfsSearchQueue :: Seq.Seq (Location, Int)
  ...



-- Our initial state has a distance of 0
getShortestPathToTargetsWithLimit
  maze initialLocation targetLocations maxRange =
    evalState
      (bfs maze initialLocation targetLocations maxRange)
      (BFSState
        (Seq.singleton (initialLocation, 0))
        (Set.singleton initialLocation)
        Map.empty)

Now we need a couple modifications to the core bfs function. When extracting the next element in the queue, we have to consider its distance. All new items we create will increment that distance. And if we're at the max distance, we won't add anything to the queue. Finally, when evaluating if we're done, we'll check against the set of targets, rather than a single target. Here's our bfs code, with differences noted.

bfs
  :: Maze
  -> Location
  -> Set.Set Location
  -> Maybe Int
  -> State BFSState [Location]
bfs maze initialLocation targetLocations maxRange = do
  BFSState searchQueue visitedSet parentsMap <- get
  if Seq.null searchQueue
    then return []
    else do

      -- ! Unwrap distance as well
      let (nextLoc, distance) = Seq.index searchQueue 0

      -- ! Check set membership, not equality
      if Set.member nextLoc targetLocations
        then return (unwindPath parentsMap [nextLoc])
        else do

          -- ! Add the new distance to each adjacent cell
          let adjacentCells = (, distance + 1) <$> 
                getAdjacentLocations maze nextLoc


          -- ! Account for the distance with a new helper function
          let unvisitedNextCells = filter
                (shouldAddNextCell visitedSet)
                adjacentCells

          let newSearchQueue = foldr
                (flip (Seq.|>))
                (Seq.drop 1 searchQueue) 
                unvisitedNextCells
              newVisitedSet = Set.insert nextLoc visitedSet
              newParentsMap = foldr
                (\(l, _) -> Map.insert l nextLoc)
                parentsMap unvisitedNextCells
          put (BFSState newSearchQueue newVisitedSet newParentsMap)
          bfs maze initialLocation targetLocations maxRange
  where
    -- ! Helper function to account for distance when adding to queue
    shouldAddNextCell visitedSet (loc, distance) = case maxRange of
      Nothing -> not (Set.member loc visitedSet)
      Just x -> distance <= x && not (Set.member loc visitedSet)

    unwindPath parentsMap currentPath = ...

Now to use this feature, we'll use our new different shortest path call. If the distance is "0", this means we have no enemies near us, and there's no penalty. We also won't apply a penalty if our stun is available. Otherwise, we'll provide a stiffer penalty the shorter the path. Then we mix it in with the other scores.

evaluateWorld :: World -> Float
evaluateWorld w =
  ...
  nearestEnemyDistanceScore +
  ...
  where
    ...
    nearestEnemyDistance = length $ getShortestPathToTargetsWithLimit
      (worldBoundaries w)
      playerLoc
      (Set.fromList activeEnemyLocations)
      (Just 4)
    nearestEnemyDistanceScore =
      if nearestEnemyDistance == 0 || stunAvailable then 0.0
        else -100.0 * (fromIntegral (5 - nearestEnemyDistance))

We'll also drop the enemy manhattan distance weight to -5.0.

Results

From this change, our player suddenly appears much more intelligent! It will back away from enemies when it is missing its stun. It will use the stun and go past the enemy when appropriate.

There are still ways we could improve the AI. It doesn't account for future space to retreat when running away. It sometimes uses the stun too early, when it might be better to wait for more enemies to come into range. But it's not clear how we could improve it by tweaking the weights. This means it's time to consider machine learning as an option to get better weights!

Conclusion

Next week, we'll re-acquaint ourselves with the basics of machine learning and Tensor Flow. This will set us up to write a program that will determine our AI weights.

We're going to start working with Tensor Flow next week! To make sure you can keep up, download our Haskell Tensor Flow Guide. It'll help you with the basics of making this complex Haskell library work.

Read More
James Bowen

Building a Better Brain

brain_cache.jpg

In the last few weeks, we've focused a lot on the player AI for our game. We've used a few more advanced tricks to help our player navigate the maze using drills. But that's come at a performance cost. The game can now get a little choppy when there are a lot of enemies, or when our player is far away from the goal. It also takes longer to run our analysis iterations than we would like.

This week, we'll improve the performance of our AI by caching the determined path. Lots of our calculations for shortest path measurements get repeated. We can keep track of these, and avoid the entire BFS algorithm altogether in a lot of circumstances!

This week, you should take a look at the search-caching branch on our Github repository for the complete code we're implementing here. We'll focus on changes in the MazeUtils.hs file.

We're also going to do a little bit of profiling for this article. Profiling your code is an important skill to learn about if you ever want to use Haskell in production. For some other useful skills, check out our Production Checklist!

Profiling Our Code

As alluded to above, we have a pretty good idea of where the performance bottleneck is for our code. But it always pays to be sure. So to double check, we're going to run our code under profiling. We'll go through some of the basics here, but you should also check out this article we did on profiling a while back.

We'll get a readout for our code that will tell us which functions are taking the most time. This will tell us where we can make the most effective improvements. It will also give us a concrete way to prove our improvement later.

To start, we'll need to rebuild our code with stack build --profile. Be warned this can take a while, since all the libraries also need to be re-built. Then we can re-run the analysis program we used last week:

stack exec -- analyze-game maze_save_2 --enemies +RTS -p

Here's the abbreviated readout in the file analyze-game.EXE.prof:

total time = 32.62 secs

COST CENTRE                                  %time
drillBFS.newParentsMap.\                     21.9
drillBFS.unvisitedNextItems.\                21.7
drillBFS.newVisitedSet                       19.4
getDrillAdjacentItems                        6.2
drillBFS                                     4.5
drillBFS.newSearchQueue                      4.0
getDrillAdjacentItems.mkItemFromResult       3.0
bfs.newParentsMap.\                          2.1
bfs.newVisitedSet                            2.0
getDrillAdjacentItems.mkItemFromResult.(...) 1.7
drillBFS.unvisitedNextItems                  1.4
bfs.unvisitedNextCells.\                     1.1
drillBFS.newParentsMap                       1.0
getDrillAdjacentItems.bounds                 1.0
bfs                                          0.6
getAdjacentLocations                         0.5

Unsurprisingly, we see that drillBFS and its helpers are the biggest culprits. They account for the top seven entries on the list and a whopping 82% of the time we spend. The enemy AI calculations come in a distant second at around 6.3% of the time. So let's focus on fixing the player algorithm.

A Basic Cache for the Player

As we try to improve our player AI, there's one big observation we can make. Perhaps some of you already noted this when reading about that AI in the first place. For the most part, our player follows a single path the whole time. We calculate the complete path from start to finish on each player move cycle, but then throw most of it away. The only time we get "blown off" this path is when we have to run away from enemies.

There are only a few circumstances where we change this path! So let's make a PlayerMemory type that will keep track of it. This should save us a ton of time!

newtype PlayerMemory = PlayerMemory (Maybe [Location])

data Player = Player
  { …
  , playerMemory :: PlayerMemory
  }

We'll add this memory to our player type. When we initialize it from JSON instances, it should start out empty. There's no need to keep track of this in a save-game file.
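
Just to illustrate the idea (the Player fields and JSON keys below are stand-ins, not the game's actual format), a FromJSON instance could default the memory without ever reading it from the file:

import Data.Aeson

instance FromJSON Player where
  parseJSON = withObject "Player" $ \o -> Player
    <$> o .: "location"
    <*> o .: "currentStunDelay"
    <*> o .: "drillsRemaining"
    <*> pure (PlayerMemory Nothing)  -- memory always starts out empty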

This change will complicate our move API a little bit. It will now produce the PlayerMemory as an output:

makePlayerMove :: World -> (PlayerMove, PlayerMemory)

Using Our Memory

When it comes to making our move, we first need to put the path into memory. To start, we'll make PlayerMemory out of the path we get from BFS.

makePlayerMove :: World -> (PlayerMove, PlayerMemory)
makePlayerMove w =
  ( PlayerMove finalMoveDirection useStun drillDirection
  , ...
  )
  where
    shortestPath = getShortestPathWithDrills …
    memoryFromMove = PlayerMemory (Just shortestPath)
    ...

In general, we'll want to return this "memory". But there's one circumstance where we'll want to invalidate it. When we have to retreat from our enemies, we'll diverge from this ideal path. In this case, we'll return Nothing. Here's what that logic looks like:

makePlayerMove :: World -> (PlayerMove, PlayerMemory)
makePlayerMove w =
  ( PlayerMove finalMoveDirection useStun drillDirection
  , if emptyCache then (PlayerMemory Nothing) else memoryFromMove
  )
  where
    (finalMoveDirection, useStun, emptyCache) = if not enemyClose
      then (shortestPathMoveDirection, False, False)
      else if canStun
        then (shortestPathMoveDirection, True, False)
        else case find (/= shortestPathMoveLocation) possibleMoves of
          Nothing -> (DirectionNone, False, True)
          Just l -> (getMoveDirection playerLoc l, False, True)

Now let's consider when we use the cached information, as this will let us skip the BFS call altogether! We'll add one more validity check when doing this. We'll ensure that the list is non-empty and that our current location is at the head of the list. Then we can use the tail of the memory list in place of the shortest path call!

makePlayerMove :: World -> (PlayerMove, PlayerMemory)
makePlayerMove w = ...
  where
    (useCache, cachePath) = case playerMemory currentPlayer of
      (PlayerMemory (Just (first : rest))) ->
        (first == playerLoc, rest)
      _ -> (False, [])
    shortestPath = if useCache then cachePath
      else getShortestPathWithDrills ...

The last thing we need is to ensure that the cache goes back into memory. This is a simple modification of our function for making the player move:

modifyWorldForPlayerMove :: World -> Location -> PlayerMemory -> World
modifyWorldForPlayerMove w newLoc memory = ...
  where
    currentPlayer = worldPlayer w
    playerWithMemory = currentPlayer {playerMemory = memory}
    playerAfterMove = movePlayer newLoc playerWithMemory
    ...

Now we can run our analysis again. We'll see that our Player's AI functions are still the biggest contributor. But the percentage has gone down a lot. They now only take up around 55% of our total time, instead of 82%! Meanwhile, the percentage of time from the normal BFS functions is now up to around 35%. Most importantly, the total time for the analysis declined almost five-fold. On the first run, it was 32.62 seconds, and it now only takes 6.79 seconds, a huge improvement!

total time = 6.79 secs

COST CENTRE                                  %time
drillBFS.unvisitedNextItems.\                14.3
drillBFS.newParentsMap.\                     14.2
drillBFS.newVisitedSet                       12.6
bfs.newParentsMap.\                          9.9
bfs.newVisitedSet                            9.2
bfs.unvisitedNextCells.\                     5.7
getDrillAdjacentItems                        4.3
drillBFS.newSearchQueue                      2.8
getAdjacentLocations                         2.8
drillBFS                                     2.6
bfs                                          2.6
getDrillAdjacentItems.mkItemFromResult       2.0
bfs.newSearchQueue                           1.8
getDrillAdjacentItems.mkItemFromResult.(...) 1.1
bfs.unwindPath                               1.1
bfs.unvisitedNextCells                       1.0
drillBFS.unvisitedNextItems                  0.9
bfs.newParentsMap                            0.7

Conclusion

Profiling is an important tool we can use for improving our code, no matter what language we're working in. When our program isn't performing how we like, we have to be sure to address the right parts of it. It may have been tempting to make a different assumption from the start. Since there are many enemy characters, it would be natural to tackle that algorithm first. But our profiling output made it clear that the player AI was the problem.

Next week, we'll start exploring different AI concepts. We'll start moving towards a kind of AI that can be machine-learned. Our code will be simpler, but our product won't be as good, at least at the start! But we'll start getting used to the way an AI can evaluate positions.

For more useful resources in improving your Haskell skills, download our Production Checklist! It has a lot of different tools and libraries to check out!

Read More
James Bowen

Moving Towards ML: Evaluation Functions

decision_tree.png

Before we get started, here's a reminder that today (August 5th) is the last day of enrollment for our Haskell From Scratch course! Sign-ups close at midnight Pacific time! Don't miss out!

This week, we're going to start taking our AI in a somewhat new direction. Right now, we're hard-coding specific decisions for our player to make. But this week, we'll make a more general function for evaluating different positions. Our initial results will be inferior to the AI we've hand-coded. But we'll set ourselves up to have a much better AI in the future by applying machine learning.

For more details on the code for this article, take a look at the evaluation-game-function branch on our Github Repository! This article also starts our move towards machine learning related concepts. So now would be a good time to review our Haskell AI Series. You can download our Tensor Flow Guide to learn more about using Haskell and Tensor Flow!

Evaluation as a Strategy

Currently, our AI follows a strict set of rules. It performs pretty well for the current problem space. But suppose circumstances changed. Suppose we use different maze structures. Or we could add a completely new feature to the game. In these cases, we might need a completely different set of ideas to build a competent AI.

Our new strategy will be much more general. We'll supply our AI with a function that can evaluate a particular board position. That is, it will look at the world, and create a numeric output scoring it. Then our brain will look at all possible moves, score each position, and choose the move with the best result.

If game rules change, we'll need to rethink the evaluation function. But, by making the problem one of numbers to numbers, it'll be easier to use machine learning (instead of our own logic) to devise this function. This way, we can radically change the nature of the game, and we won't need to do too much manual work to change the AI. We might need to add new features (as we'll discuss later). But otherwise we would just need to re-train the evaluation function.

Top Down Development

To implement this approach, we'll put the "function" in functional programming. We'll start by outlining our decision making process with a series of type signatures. Let's remember that first, our overarching goal is a function that takes a World and gives us a PlayerMove:

makePlayerMove :: World -> PlayerMove

We should first determine the set of possible moves:

possibleMoves :: World -> [PlayerMove]

Then we'll need to calculate the new World from each of those moves. (We won't go over this function in this article. It mainly consists of refactoring code we already have for manipulating the game).

applyPlayerMove :: World -> PlayerMove -> World

Then we'll score each of those resulting worlds. This is where the real "brain" is going to live now:

evaluateWorld :: World -> Float

Now that we know the functions we're writing, we can already implement makePlayerMove. We'll assume our helpers already exist and then we apply the process outlined above:

makePlayerMove :: World -> PlayerMove
makePlayerMove w = bestMove
  where
    -- 1. Get our Moves
    allMoves = possibleMoves w

    -- 2. See what the results of each move are
    possibleWorlds = applyPlayerMove w <$> allMoves

    -- 3. Score each resulting world
    scores = evaluateWorld <$> possibleWorlds

    -- 4. Combine the world with its move and choose the best one
    movesWithScores = zip allMoves scores
    bestMove = fst $ maximumBy (\(_, score1) (_, score2) ->
      compare score1 score2) movesWithScores

This will compile, and we can now move on to the individual components.

Getting Possible Moves

Let's start with getting all the possible moves. When it comes to movement, we generally have five options: stand still, or move in one of four directions. If we're out of drills, or near the boundary of the world, this can restrict our options. But we always have the sure option of standing still, so let's start with that:

possibleMoves :: World -> [PlayerMove]
possibleMoves w = …
  where
    standStillMove = PlayerMove DirectionNone False DirectionNone
    ...

Now in every direction, we'll have a Maybe move possibility. If it's a WorldBoundary, we'll get Nothing. Otherwise if it's a wall, then we'll have a possible move as long as a drill is available. Otherwise the move is possible, and we won't need a drill. We'll wrap these behaviors in a helper function, and then it's easy to use that in each direction:

possibleMoves :: World -> [PlayerMove]
possibleMoves w = baseMoves
  where
    standStillMove = PlayerMove DirectionNone False DirectionNone
    player = worldPlayer w
    bounds = (worldBoundaries w) Array.! (playerLocation player)

    possibleMove :: (CellBoundaries -> BoundaryType) ->
      MoveDirection -> Maybe PlayerMove
    possibleMove boundaryFunc direction =
      case boundaryFunc bounds of
        WorldBoundary -> Nothing
        Wall _ -> if playerDrillsRemaining player > 0
          then Just $ PlayerMove direction False direction
          else Nothing
        AdjacentCell _ -> Just $
          PlayerMove direction False DirectionNone

    upMove = possibleMove upBoundary DirectionUp
    rightMove = possibleMove rightBoundary DirectionRight
    downMove = possibleMove downBoundary DirectionDown
    leftMove = possibleMove leftBoundary DirectionLeft

    baseMoves = standStillMove : (catMaybes [upMove, rightMove, downMove, leftMove])

Now we have to factor in that each move can also apply the stun if it's available.

possibleMoves :: World -> [PlayerMove]
possibleMoves w = baseMoves ++ stunMoves
  where
    ...
    baseMoves = standStillMove : (catMaybes [upMove, rightMove, downMove, leftMove])

    stunMoves = if playerCurrentStunDelay player /= 0 then []
      else [ m { activateStun = True } | m <- baseMoves ]

And now we've got our moves!

Evaluating the Game Position

Now let's start tackling the problem of evaluating a particular game situation. Any manual solution we come up with here is likely to have problems. This is where machine learning will come in. But here's the general approach we want.

First, we'll select particular "features" of the world. For instance, how far away are we from the end of the maze? How many enemies are within our stun radius? We'll consider all these elements, and then come up with a "weight" for each feature. A weight is a measurement of whether that feature makes the position "good" or "bad". Then, we'll add together the weighted feature values to get a score. So here's a list of the features we're going to use:

  1. How close are we (in maze search terms) from the target location? This will use pure BFS and it will not account for using drills.
  2. How close are we in manhattan distance terms from the target location?
  3. Is there an active enemy on the same square as the player (this will receive a heavy negative weight!)
  4. How many enemies are within our stun radius?
  5. Is our stun available?
  6. How many drills do we have left?

Let's start by getting all these features:

evaluateWorld :: World -> Float
evaluateWorld w = ...
  where
    player = worldPlayer w
    playerLoc@(px, py) = playerLocation player
    radius = stunRadius . playerGameParameters . worldParameters $ w
    goalLoc@(gx, gy) = endLocation w
    activeEnemyLocations = enemyLocation <$>
      (filter (\e -> enemyCurrentStunTimer e == 0) (worldEnemies w))

    onActiveEnemy = playerLocation player `elem` activeEnemyLocations

    shortestPathLength = length $
      getShortestPath (worldBoundaries w) playerLoc goalLoc

    manhattanDistance = abs (gx - px) + abs (gy - py)

    stunAvailable = playerCurrentStunDelay player == 0

    numNearbyEnemies = length
      [ el | el@(elx, ely) <- activeEnemyLocations,
        abs (elx - px) <= radius && abs (ely - py) <= radius ]

    drillsRemaining = playerDrillsRemaining player

Now let's move on to assigning scores. If our player is on the same square as an active enemy, we lose. So let's give this a weight of -1000. Conversely, the closer we get to the target, the closer we are to winning. So let's devise a function where if that distance is 0, the score is 1000, and the farther away we get, the more points we lose. Let's say 20 points per square. For manhattan distance, we'll use a pure penalty rather than a reward baseline:

evaluateWorld :: World -> Float
evaluateWorld w = ...
  where
    ...
    onActiveEnemyScore = if onActiveEnemy then -1000.0 else 0.0
    shortestPathScore = 1000.0 - (20.0 * (fromIntegral shortestPathLength))
    manhattanDistanceScore = (-5.0) * (fromIntegral manhattanDistance)

Now we want to generally reward having our power ups available to us. This will stop the bot from needlessly using them and also reward it for picking up new drills. We'll also penalize having enemies too close to us.

evaluateWorld :: World -> Float
evaluateWorld w = ...
  where
    ...
    stunAvailableScore = if stunAvailable then 80.0 else 0.0
    numNearbyEnemiesScore = -100.0 * (fromIntegral numNearbyEnemies)
    drillsRemainingScore = 30.0 * (fromIntegral drillsRemaining)

And to complete the function, we'll just add these together:

evaluateWorld :: World -> Float
evaluateWorld w =
  onActiveEnemyScore +
  shortestPathScore +
  manhattanDistanceScore +
  stunAvailableScore +
  numNearbyEnemiesScore +
  drillsRemainingScore
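
As a quick sanity check with a made-up position (not taken from an actual game): suppose we're 10 squares from the goal by maze search, 8 away by manhattan distance, our stun is ready, no active enemy shares our square or sits within the stun radius, and we hold 2 drills. The total comes out to 0 + (1000 - 20 * 10) + (-5 * 8) + 80 + 0 + (30 * 2) = 800 - 40 + 80 + 60 = 900.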

How Well Does it Work?

When we run the game now with the AI active, we see some interesting behaviors. Our bot generally navigates the maze well. Its path isn't optimal, as it was with drillBFS. But it makes decent choices about drilling. Its behavior around enemies is a bit strange. It tends to stay away from them, even when they're not actually close in maze distance. This makes it take longer than it needs to.

We still don't have good retreating behavior in certain cases. It will often stand still and let an enemy grab it instead of running away.

At this point, we have a couple options for improving the AI. First, we could try tweaking the weights. This will be tedious for us to do manually. This is why we want to apply machine learning techniques to come up with optimal weights.

But the other option is to update the feature space. If we can come up with more intelligent features, we won't need as precise weights.

Conclusion

Next week, we'll try to fix our behavior around enemies. We'll use true maze distance in more places as opposed to manhattan distance. This should give us some big improvements. Then we'll start looking into how we can learn better weights.

We'll be coming up pretty soon on using Tensor Flow for this program! Download our Haskell Tensor Flow Guide to learn more!

And if you're still a Haskell beginner, there's never been a better time to learn! Register for our Haskell From Scratch course to jump-start your Haskell journey! Enrollment ends at midnight TODAY! (August 5th).

James Bowen

Analyzing Our Parameters

analysis.jpg

Our last couple articles have focused on developing an AI for the player character in our game. It isn't perfect, but it's a decent approximation of how a human would try to play the game. This means we can now play iterations of the game without any human involvement. And by changing the parameters of our world, we can play a lot of different versions of the game.

Our goal for this week will be to write some simple analysis functions. These will play through the game without needing to display anything on the screen. Then we'll be able to play different versions in quick succession and compare results.

As always, the code for this project is on a Github Repository. For this article, take a look at the analyze-game branch.

If you're completely new to Haskell, a simple game like this is a great way to get started! But you should start with our Beginners Checklist! It'll help you get everything set up with the language on your local machine! Then you can move onto our Liftoff Series to learn more about Haskell's mechanics.

Generating a Result

The first thing we need is a function that takes a world state and generates a result for it. Our game does have a degree of randomness. But once we fix the starting random generator, everything is deterministic. This means we need a function like:

runGameToResult :: World -> GameResult

We'll want to use our updateFunc from the main game runner. This is our "evolution" function. Its job is to go from one World state to another. It evolves the game over the course of one timestep by allowing each of the agents to make a decision (or wait). (Note we don't use the Float parameter in our game. It's just needed by Gloss).

updateFunc :: Float -> World -> World

Since we want to track an ever evolving stateful variable, we'll use the State monad. For each iteration, we'll change the world using this update step. Then we'll check its result and see if it's finished. If not, we'll continue to run the game.

runGameToResult :: World -> GameResult
runGameToResult = evalState runGameState
  where
    runGameState :: State World GameResult
    runGameState = do
      modify (updateFunc 1.0)
      currentResult <- gets worldResult
      if currentResult /= GameInProgress
        then return currentResult
        else runGameState

Analysis: Generating World Iterations

Now that we can run a given world to its conclusion, let's add another step to the process. We'll run several different iterations with any given set of parameters on a world. Each of these will have a different set of starting enemy locations and drill power-ups. Let's make a function that will take a random generator and a "base world". It will derive a new world with random initial enemy positions and drill locations.

generateWorldIteration :: World -> StdGen -> World

We'll use a helper function from our game that generates random locations in our maze. It's stateful over the random generator.

generateRandomLocation :: (Int, Int) -> State StdGen Location
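
We won't need its internals for the analysis code, but here's a rough sketch of what such a helper can look like, assuming Location is an (x, y) pair of Ints and using get and put from Control.Monad.State together with randomR from System.Random. The version in the repository may differ in its details:

generateRandomLocation :: (Int, Int) -> State StdGen Location
generateRandomLocation (numRows, numColumns) = do
  gen <- get
  -- Pick a random column (x) and row (y) inside the maze bounds.
  let (randomCol, gen') = randomR (0, numColumns - 1) gen
      (randomRow, gen'') = randomR (0, numRows - 1) gen'
  put gen''
  return (randomCol, randomRow)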

So first let's get all our locations:

generateWorldIteration :: World -> StdGen -> World
generateWorldIteration w gen1 = ...
  where
    params = worldParameters w
    rowCount = numRows params
    columnCount = numColumns params
    enemyCount = numEnemies params
    drillCount = numDrillPowerups params

    (enemyLocations, gen2) = runState
      (sequence
        (map
          (const (generateRandomLocation (rowCount, columnCount)))
          [1..enemyCount])
        )
      gen1
    (drillLocations, gen3) = runState
      (sequence
        (map
          (const (generateRandomLocation (rowCount, columnCount)))
          [1..drillCount])
        )
      gen2
    ...
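
As an aside, since we're just repeating the same stateful action, these two bindings can be written more compactly with replicateM from Control.Monad (replicateM n action is just sequence (replicate n action)), a pattern that also appears in the game's main function when it places enemies. This is an equivalent formulation, not a change in behavior:

    (enemyLocations, gen2) = runState
      (replicateM enemyCount (generateRandomLocation (rowCount, columnCount)))
      gen1
    (drillLocations, gen3) = runState
      (replicateM drillCount (generateRandomLocation (rowCount, columnCount)))
      gen2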

Then we have to use the locations to generate our different enemies. Last, we'll plug all these new elements into our base world and return it!

generateWorldIteration :: World -> StdGen -> World
generateWorldIteration w gen1 = w
  { worldEnemies = enemies
  , worldDrillPowerUpLocations = drillLocations
  , worldRandomGenerator = gen3
  , worldTime = 0
  }
  where
    ...
    (enemyLocations, gen2) = ...
    (drillLocations, gen3) = …
    enemies = mkNewEnemy (enemyGameParameters params) <$> enemyLocations

Analysis: Making Parameter Sets

For our next order of business, we want to make what we'll call a parameter set. We want to run the game with different parameters each time. For instance, we can take a base set of parameters, and then change the number of enemies present in each one:

varyNumEnemies :: GameParameters -> [GameParameters]
varyNumEnemies baseParams = newParams <$> allEnemyNumbers
  where
    baseNumEnemies = numEnemies baseParams
    allEnemyNumbers = [baseNumEnemies..(baseNumEnemies + 9)]
    newParams i = baseParams { numEnemies = i }

We can do the same for the number of drill pickups:

varyNumDrillPickups :: GameParameters -> [GameParameters]
varyNumDrillPickups baseParams = newParams <$> allDrillNumbers
  where
    baseNumDrills = numDrillPowerups baseParams
    allDrillNumbers = [baseNumDrills..(baseNumDrills + 9)]
    newParams i = baseParams { numDrillPowerups = i }

Finally, we can have a different cooldown time for our player's stun ability.

varyPlayerStunCooldown :: GameParameters -> [GameParameters]
varyPlayerStunCooldown baseParams = newParams <$> allCooldowns
  where
    basePlayerParams = playerGameParameters baseParams
    baseCooldown = initialStunTimer basePlayerParams
    allCooldowns = [(baseCooldown - 4)..(baseCooldown + 5)]
    newParams i = baseParams
      { playerGameParameters = basePlayerParams { initialStunTimer = i }}
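
One small caveat: initialStunTimer is a Word, so this range assumes the base cooldown is at least 4. With a smaller base value, baseCooldown - 4 would wrap around and the enumeration would come out empty.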

If you fork our code, you can try altering some other parameters. You can even try combining certain parameters to see what the results are!

Tying It Together

We've done most of the hard work now. We'll have a function that takes a number of iterations per parameter set, the base world, and a generator for those sets. It'll match up each parameter set to the number of wins the player gets over the course of the iterations.

runAllIterations
  :: Int
  -> World
  -> (GameParameters -> [GameParameters])
  -> [(GameParameters, Int)]
runAllIterations numIterations w paramGenerator =
  map countWins results
  where
    aiParams = (worldParameters w) { usePlayerAI = True }
    paramSets = paramGenerator aiParams

    runParamSet :: GameParameters -> [GameResult]
    runParamSet ps = map
      (runGame w {worldParameters = ps })
      [1..numIterations]

    runGame :: World -> Int -> GameResult
    runGame baseWorld seed = runGameToResult
      (generateWorldIteration baseWorld (mkStdGen seed))

    results :: [(GameParameters, [GameResult])]
    results = zip paramSets (map runParamSet paramSets)

    countWins :: (GameParameters, [GameResult]) -> (GameParameters, Int)
    countWins (gp, gameResults) =
      (gp, length (filter (== GameWon) gameResults))

We need one more function. It will read an input file and apply our steps over a particular parameter group. Here's an example with varying the number of enemies:

analyzeNumEnemies :: FilePath -> IO ()
analyzeNumEnemies fp = do
  world <- loadWorldFromFile fp
  let numIterations = 10
  putStrLn "Analyzing Different Numbers of Enemies"
  let results = runAllIterations numIterations world varyNumEnemies
  forM_ results $ \(gp, numWins) -> putStrLn $
    "With " ++ (show (numEnemies gp)) ++ " Enemies: " ++ (show numWins)
      ++ " wins out of " ++ (show numIterations) ++ " iterations."

Now we're done! In the appendix, you can find some basic results of our investigation!

Conclusion

Soon, we'll take our analysis steps and apply them in a more systematic way. We'll try to gauge the difficulty of a particular game level. Then we can make levels that get more and more challenging!

But first, we'll start exploring a few ways we can improve the player and enemy AI abilities. We'll start by implementing some basic caching mechanisms in our breadth first search. Then we'll consider some other AI patterns besides simple BFS.

For a review of the code in this article, take a look at our Github Repository. You'll want to explore the analyze-game branch!

We'll soon be exploring machine learning a bit more as we try to improve the game. Make sure to read our series on Haskell and AI to learn more! Download our Haskell Tensorflow Guide to see how we can use tensor flow with Haskell!

Appendix

With 4 drills and 10 cooldown time:

Analyzing Different Numbers of Enemies
With 4 Enemies: 10 wins out of 10 iterations.
With 5 Enemies: 9 wins out of 10 iterations.
With 6 Enemies: 9 wins out of 10 iterations.
With 7 Enemies: 10 wins out of 10 iterations.
With 8 Enemies: 9 wins out of 10 iterations.
With 9 Enemies: 9 wins out of 10 iterations.
With 10 Enemies: 9 wins out of 10 iterations.
With 11 Enemies: 9 wins out of 10 iterations.
With 12 Enemies: 8 wins out of 10 iterations.
With 13 Enemies: 7 wins out of 10 iterations.

With 13 enemies and 10 cooldown time:

With 2 Drills: 5 wins out of 10 iterations.
With 3 Drills: 7 wins out of 10 iterations.
With 4 Drills: 8 wins out of 10 iterations.
With 5 Drills: 8 wins out of 10 iterations.
With 6 Drills: 8 wins out of 10 iterations.
With 7 Drills: 7 wins out of 10 iterations.
With 8 Drills: 8 wins out of 10 iterations.
With 9 Drills: 8 wins out of 10 iterations.
With 10 Drills: 8 wins out of 10 iterations.
With 11 Drills: 8 wins out of 10 iterations.
James Bowen

Advanced Search with Drilling!

drill_2.png

In last week's article we explored how we can make an AI for our main player character. This meant we could play the game without input from a user. The game can now "play itself", and churn out a lot of iterations and results. This, in turn, will let us test combinations of parameters so we can make levels that are challenging.

Our AI is still a little too simplistic. The version we made last week doesn't incorporate the drill feature at all. So this week, let's see if we can devise a way to use that. We'll start with some of the search algorithm ideas we're already using for BFS, and expand from there.

For this article, you'll want to look at the player-ai-drill branch on our Github Repository. It has the full implementation, and you can check the newest commits to see what has changed.

This article will depend a lot on our knowledge of monads, particularly the state monad. If you're newer to Haskell development, you should check out our series on Functional Data Structures. It'll help you understand this tricky concept better.

Updating Our Types

Once again, there are a few quick updates we'll want to make before we start writing our AI. Last week we wrote our main function using this type:

data MoveChoice =
  MoveUp |
  MoveRight |
  MoveDown |
  MoveLeft |
  StandStill

data PlayerMove = PlayerMove
  { playerMoveChoice :: MoveChoice
  , activateStun :: Bool
  }

makePlayerMove :: World -> PlayerMove
...

First we'll change the original type to be MoveDirection. Then to account for the drill powerup, we'll add another field using this type:

data MoveDirection =
  DirectionUp |
  DirectionRight |
  DirectionDown |
  DirectionLeft |
  DirectionNone

data PlayerMove = PlayerMove
  { playerMoveDirection :: MoveDirection
  , activateStun :: Bool
  , drillDirection :: MoveDirection
  }

So we can start out by using DirectionNone for the drill every time, and ensure our game works.

Modifying the World

With the new potential to use the drill to change our world, we'll want another modifier function. This will reuse a lot of old code from our player input section.

modifyWorldForDrill :: World -> MoveDirection -> World

To start, we'll refactor our drillLocation function to be a top level function. Nothing much has to change with its implementation. Let's recall its type signature:

drillLocation
  :: (CellBoundaries -> BoundaryType)
  -> (CellBoundaries -> CellBoundaries)
  -> (CellBoundaries -> CellBoundaries)
  -> World
  -> World

Now our player modifier is pretty straightforward. We'll do a case analysis of the desired direction, as we do with the input keys. This gives us all the necessary inputs to drillLocation:

modifyWorldForPlayerDrill :: World -> MoveDirection -> World
modifyWorldForPlayerDrill w drillDirection = case drillDirection of
  DirectionUp ->
      drillLocation upBoundary breakUpWall breakDownWall w
  DirectionRight ->
      drillLocation rightBoundary breakRightWall breakLeftWall w
  DirectionDown ->
      drillLocation downBoundary breakDownWall breakUpWall w
  DirectionLeft ->
      drillLocation leftBoundary breakLeftWall breakRightWall w
  DirectionNone -> w

And now working this into our update function is easy. It's just another function we have to compose with the other options. We'll apply the drill as the first step, even before the stun:

updateWorldForPlayerMove :: World -> World
updateWorldForPlayerMove w = if shouldMovePlayer
  then worldAfterMove
  else w
  where
    ...
    move = makePlayerMove w

    worldAfterDrill =
        modifyWorldForPlayerDrill w (drillDirection move)

    worldAfterStun = if activateStun move
      then modifyWorldForStun worldAfterDrill
      else worldAfterDrill

    … -- Apply move

Changing the AI

Now let's get cracking on a better algorithm! The good news is that we can still stick to breadth first search. What needs to change is the items in our search queue. There's a certain amount of state involved in each path we take. Before, we knew that any path backtracking onto a previous location would be slower. But now a path can revisit a previous square and still be faster, IF it arrives there with more drills. And each time we pick up a drill, we need to remove it from the map (from the point of view of this path). Otherwise we could keep picking up the same drill! Thus a "search item", as we'll call it, must also contain the number of drills we have as well as the set of drill power-ups still on the map. This item carries all the world state that matters for our player moving around the map.

type DrillSearchItem = (Location, Word, Set.Set Location)

Then the other obvious change is that we can go to more adjacent squares. As long as the drill count is at least one, we can go to any adjacent tile. We'll see how this factors into our algorithm below.

Re-Doing BFS

We're going to mimic our original BFS algorithm for the most part when it comes to the drilling version. First, let's reconsider our notion of "adjacent" cells. Remember this function from BFS?

getAdjacentLocations :: Maze -> Location -> [Location]

We now want to re-write it using our DrillSearchItem alias.

getDrillAdjacentItems :: Maze -> DrillSearchItem -> [DrillSearchItem]

In the previous version, we had a section dedicated to checking the bounds around the given location. Then we could see which locations were adjacent. We'll want a similar section for drilling, except that we can also pass through walls, as long as we have at least one drill. So let's pattern match on our current item and gather the locations we can use. We'll also use a boolean to denote whether we needed a drill to get to each location.

getDrillAdjacentItems :: Maze -> DrillSearchItem -> [DrillSearchItem]
getDrillAdjacentItems maze (location, drillsRemaining, drillLocs) =
  …
  where
    bounds = maze Array.! location
    canDrill = drillsRemaining > 0

    maybeUpLoc = case upBoundary bounds of
      (AdjacentCell loc) -> Just (loc, False)
      (Wall loc) -> if canDrill then Just (loc, True) else Nothing
      _ -> Nothing
    maybeRightLoc = case rightBoundary bounds of
      (AdjacentCell loc) -> Just (loc, False)
      (Wall loc) -> if canDrill then Just (loc, True) else Nothing
      _ -> Nothing
    maybeDownLoc = case downBoundary bounds of
      (AdjacentCell loc) -> Just (loc, False)
      (Wall loc) -> if canDrill then Just (loc, True) else Nothing
      _ -> Nothing
    maybeLeftLoc = case leftBoundary bounds of
      (AdjacentCell loc) -> Just (loc, False)
      (Wall loc) -> if canDrill then Just (loc, True) else Nothing
      _ -> Nothing

Now we want a helper function to convert each of these results into a new DrillSearchItem. If we applied the drill, we'll want to subtract one from the remaining drills count. But then if we go over a drill powerup, we'll increment the count and remove this location from our set:

getDrillAdjacentItems :: Maze -> DrillSearchItem -> [DrillSearchItem]
getDrillAdjacentItems maze (location, drillsRemaining, drillLocs) =
  ...
  where
    ...
    mkItemFromResult :: (Location, Bool) -> DrillSearchItem
    mkItemFromResult (loc, usedDrill) =
      let drillsAfterMove = if usedDrill
            then drillsRemaining - 1
            else drillsRemaining
          (drillsAfterPickup, newDrillLocs) = if Set.member loc drillLocs
            then (drillsAfterMove + 1, Set.delete loc drillLocs)
            else (drillsAfterMove, drillLocs)
      in  (loc, drillsAfterPickup, newDrillLocs)

And then finally we apply this function over every Just result in our adjacent items!

getDrillAdjacentItems :: Maze -> DrillSearchItem -> [DrillSearchItem]
getDrillAdjacentItems maze (location, drillsRemaining, drillLocs) =
  mkItemFromResult <$>
   (catMaybes [maybeUpLoc, maybeRightLoc, maybeDownLoc, maybeLeftLoc])
  where
  ...
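
To make the bookkeeping concrete: suppose an item has one drill remaining and we drill through a wall onto a square holding a drill power-up. The new item spends a drill for the wall (down to zero), immediately picks one back up (back to one), and deletes that power-up's location from the set, so this particular path can never collect it again.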

When we rewrite the other BFS functions, the changes are quite trivial. We'll write a new function for the BFS search, since it will apply the different helper, along with other surface level tweaks. The state stays the same except for using DrillSearchItems:

data DrillBFSState = DrillBFSState
  (Seq.Seq DrillSearchItem)
  (Set.Set DrillSearchItem)
  (Map.Map DrillSearchItem DrillSearchItem)

drillBFS :: Maze -> Location -> State DrillBFSState [Location]

Note that we're still only returning a list of locations. We'll let other functions handle the logic of determining if we used the drill or not. So our final call to this function is pretty clean. It just takes a couple extra arguments for our initial state:

getShortestPathWithDrills
  :: Maze
  -> Word
  -> Set.Set Location
  -> Location
  -> Location
  -> [Location]
getShortestPathWithDrills
  maze numDrills drillLocs initialLocation targetLocation = 
      evalState
          (drillBFS maze targetLocation)
          (DrillBFSState
          (Seq.singleton initialItem)
          (Set.singleton initialItem)
          Map.empty)
  where
    initialItem =
      (initialLocation, numDrills, drillLocs)

Last Touches

Once we've changed our algorithm, we have to make one more change to the makePlayerMove function. We'll examine the shortestPathMoveDirection, which tells us which way we're supposed to head next. If the square in that direction is behind a wall, we know we also have to apply the drill in that direction. If it's an AdjacentCell, the drill isn't necessary, so we use DirectionNone.

makePlayerMove :: World -> PlayerMove
makePlayerMove w =
  PlayerMove finalMoveDirection useStun drillDirection
  where
    ...
    shortestPath = getShortestPathWithDrills
      maze
      (playerDrillsRemaining currentPlayer)
      (Set.fromList $ worldDrillPowerUpLocations w)
      playerLoc
      (endLocation w)
    shortestPathMoveLocation = if null shortestPath
      then playerLoc
      else (head shortestPath)
    shortestPathMoveDirection =
      getMoveDirection playerLoc shortestPathMoveLocation

    locationBounds = maze Array.! playerLoc


    -- Apply the drill if there's a wall!
    drillDirection = case shortestPathMoveDirection of
      DirectionUp -> case upBoundary locationBounds of
        Wall _ -> DirectionUp
        _ -> DirectionNone
      DirectionRight -> case rightBoundary locationBounds of
        Wall _ -> DirectionRight
        _ -> DirectionNone
      DirectionDown -> case downBoundary locationBounds of
        Wall _ -> DirectionDown
        _ -> DirectionNone
      DirectionLeft -> case leftBoundary locationBounds of
        Wall _ -> DirectionLeft
        _ -> DirectionNone
      DirectionNone -> DirectionNone

      ...

Note that we don't need to perform any checks on whether drilling is viable here. If, for some reason, our AI tells us to use the drill and it's invalid, drillLocation will stop this. Then our function for getting the next location will keep the player stationary.

Now we can go ahead and play the game with the AI enabled. We can see that it uses the drill effectively and makes intelligent choices about where and when to do so. It will sometimes go out of its way to pick up an extra drill if this allows it to make a strategic hole. It's not 100% optimal. For instance, we won't use a drill to break a wall to pick up three hidden drills. But it covers most of the important cases for us.

There is a drawback with our algorithm. The game can actually appear a little choppy, especially at the start of the maze. Because our search state now distinguishes items by drill count and remaining power-ups as well as by location, the search space is many times bigger. So it is slower to arrive at solutions, especially when we're far from the end.

There are ways we can make this basic algorithm more efficient. Caching is a good place to start. We recompute a lot of information about distances within the maze every time. But there are also other search approaches we can make that will generally be faster. In the coming weeks, we'll explore some of these options.

Conclusion

But first, we'll take a look next week at how we can now run the game separately from the UI. This will allow us to perform many iterations. We'll measure the AI's success rate under different parameters. We'll see how the size of the maze, the number of enemies, and the number of drills affect that rate. This will give us information we can use to make a better user experience for players.

If you want to learn more about Haskell, you should subscribe to our mailing list! You'll get our monthly newsletter, as well as access to our subscriber resources! This includes, for instance, our Beginners Checklist, to help you get started!

And last, don't forget to take a look at our Github Repository to see all the code in its final state! For this article, look for the player-ai-drill branch!

James Bowen

Preparing for Simulation: Player AI

two_brains.jpg

Our goal for the next couple of weeks will be to run our game as a simulation, without any kind of player input. This means we'll be able to run through rapid iterations of the game and see the results. We can tune parameters and see what makes the game more competitive. But if we don't have player input, we have to move the main character somehow!

This means writing an AI for our player like we have for the enemies. The "player" has more possible actions and inputs, so it will be a little more complicated. We'll restructure our code a bit to make this easier. As with previous parts, take a look at our Github repository for more details. Refer to the player-ai branch for this article.

If you've never programmed in Haskell before, take a look at our Beginners Checklist and read our Liftoff Series! They'll help you get going so you can understand some of the concepts in these articles!

Basic Rules

For the first iteration of our AI, we won't use the drill power up at all. Each "move" will consist of two things. First, we'll pick a direction to move (or stand still). Second, we'll return a boolean for whether to activate the stun power-up.

data MoveChoice =
  MoveUp |
  MoveRight |
  MoveDown |
  MoveLeft |
  StandStill

data PlayerMove = PlayerMove
  { playerMoveChoice :: MoveChoice
  , activateStun :: Bool
  }

The choice depends on the state of the world. So our final goal is a function like:

makePlayerMove :: World -> PlayerMove
...

Once we have the player's move, we'll leave it up to the main update function to determine the result. We could also return a Location instead of a MoveChoice. This might seem more natural at times. But there are a couple reasons we want to stick with this approach.

One long-term goal is to machine-learn this function. So it will help a lot to limit the scope of the output space as much as possible. This means we're much more likely to come up with legal moves. We also want to leave it to the game engine to ensure that our move can't corrupt the game state. So we'll make our "choice" and devise other functions to manipulate the world.

Now that we know our basic types, let's come up with some simple rules to dictate our behavior.

  1. Determine the shortest path to the destination
  2. If there is an enemy on that path within the stun radius, activate the stun, if we can.
  3. If we don't have our stun, choose any move away from the enemy.

So we can see some specific elements we'll need to code up. Step 1 is the simplest. We've already written a function for finding the shortest path to our destination. Determining if there's an enemy on that path will be simple as well. The last part will be a little trickier, but manageable. Before we write this function though, we want to update our game infrastructure a bit to support the AI. We'll try to keep this section somewhat light on details.

Game Infrastructure

First, we'll need a couple new parameters for our game. Our player now needs a lagTime parameter like what the Enemy type has. We also need a boolean on our parameter object indicating whether we're using the AI or human input. As always, we'll update JSON instances to reflect these new fields:

data GameParameters = GameParameters
  { …
  , usePlayerAI :: Bool
  }

data PlayerGameParameters = PlayerGameParameters
  { …
  , lagTime :: Word
  }

For a default lag time, we'll use 5. This gives the player 4 moves for every enemy move. We'll add an extra command line parser that will look for the argument --use-player-ai. We'll use this to fill in the parameter value.

usePlayerAIInfo :: Parser Bool
usePlayerAIInfo = switch
  (long "use-player-ai" <>
    help "Whether to use the AI version of the player")

There are also some structural changes we want to make in the game. Both the player and enemies can now move on the same update cycle, so we have to decide how they interact.

We'll now have a series of discrete update functions for each part of the game. Each of these functions changes the world in some particular way. So they all have the same type signature. We'll structure our update function by composing these different functions. Here's what it might look like:

updateFunc :: Float -> World -> World
updateFunc _ w
  | … -- Win/Lose Conditions
  | otherwise = newWorld -- Normal Game Tick
  where
    newWorld :: World
    newWorld =
      updateWorldForEnemyTicks .
      updateWorldForPlayerTick .
      updateWorldForEnemyMoves .
      updateWorldForPlayerMove .
      clearStunCells .
      incrementWorldTime $
      w

-- Reduce Stun Timers, etc.
updateWorldForEnemyTicks :: World -> World
updateWorldForPlayerTick :: World -> World

-- Update Locations
updateWorldForEnemyMoves :: World -> World

-- Update Location, Pickup Drills, Activate Stun if necessary
updateWorldForPlayerMove :: World -> World

incrementWorldTime :: World -> World

clearStunCells :: World -> World

Note though that we do want to account for the case where the player isn't AI controlled. So the final product actually looks like this:

updateFunc :: Float -> World -> World
updateFunc _ w
  ...
  | otherwise = newWorld
  where
    afterPlayerMoveWorld = if usePlayerAI . worldParameters $ w
      then
        updateWorldForPlayerMove .
        clearStunCells .
        incrementWorldTime $ w
      else clearStunCells . incrementWorldTime $ w

    newWorld :: World
    newWorld =
      updateWorldForEnemyTicks .
      updateWorldForPlayerTick .
      updateWorldForEnemyMoves $
      afterPlayerMoveWorld

We won't go over most of these changes in depth right here, as they're purely refactoring. In future articles we'll definitely look a bit more at how the enemy movement code changed. We'll want it to look as much as possible like the player AI code so we can machine-learn that function as well. In the next section, we'll look at the player move function in a little more detail before writing our AI. Before that though, it's worth noting that we should also disable our input handler when using the AI:

inputHandler :: Event -> World -> World
inputHandler event w
  ...
  | usePlayerAI . worldParameters $ w = w -- No updates when AI is on!

Wrapping the Player Move

Above, we can see a mutator function updateWorldForPlayerMove. This will mainly be a wrapper around our primary AI function, makePlayerMove. But we'll establish a general pattern with it. This wrapper will first handle the game logic of determining whether we should move or not.

updateWorldForPlayerMove :: World -> World
updateWorldForPlayerMove w = if shouldMovePlayer
  then worldAfterMove
  else w
  where
    shouldMovePlayer =
      (worldTime w)
      `mod`
      (lagTime . playerGameParameters . worldParameters $ w) == 0
    worldAfterMove = ...

Then it will call the AI function. It will use the results to figure out what world changes are necessary and apply them. The object is to separate the "brain" from the game logic as much as possible. Note that we update the world for the stun first, and then the player's move.

updateWorldForPlayerMove :: World -> World
updateWorldForPlayerMove w = if shouldMovePlayer
  then worldAfterMove
  else w
  where
    shouldMovePlayer =
      (worldTime w) `mod`
      (lagTime . playerGameParameters . worldParameters $ w) == 0
    move = makePlayerMove w
    player = worldPlayer w
    currentLoc = playerLocation player

    worldAfterStun = if activateStun move
      then modifyWorldForStun w
      else w

    newLocation = nextLocationForMove
      (worldBoundaries w Array.! currentLoc)
      currentLoc   
      (playerMoveChoice move)
    worldAfterMove = modifyWorldForPlayerMove
      worldAfterStun newLocation

-- Stun enemies and change parameters as needed
modifyWorldForStun :: World -> World
...

-- Changes the player's location, track drill pickups
modifyWorldForPlayerMove :: World -> Location -> World
…

-- Take a move direction and give the location
nextLocationForMove ::
  CellBoundaries -> Location -> MoveChoice -> Location

Note we can apply more game logic and rules in our World modifier functions. Suppose, for some reason, the AI function returns True to stun enemies when our stun is on cooldown. It's the job of modifyWorldForStun to ensure nothing actually happens.

Shortest Path

Now that that's done, we can start writing our AI function at last! Let's start simple, and find the shortest path to the end.

makePlayerMove w = …
  where
    currentPlayer = worldPlayer w
    playerLoc = playerLocation currentPlayer
    maze = worldBoundaries w

    shortestPath = getShortestPath maze playerLoc (endLocation w)

Now we'll derive a MoveChoice based on that path. As mentioned above, it's tempting to return the location on the path. But remember, we want to limit the output scope as much as we can. Let's make a function for getting the direction out of two locations.

getMoveChoice :: Location -> Location -> MoveChoice
getMoveChoice (x1, y1) (x2, y2)
  | y2 == y1 + 1 = MoveUp
  | x2 == x1 + 1 = MoveRight
  | y2 == y1 - 1 = MoveDown
  | x2 == x1 - 1 = MoveLeft
  | otherwise = StandStill
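
A few concrete cases, reading locations as (x, y) pairs as the guards above assume:

getMoveChoice (3, 4) (3, 5) -- MoveUp
getMoveChoice (3, 4) (4, 4) -- MoveRight
getMoveChoice (3, 4) (2, 4) -- MoveLeft
getMoveChoice (3, 4) (3, 4) -- StandStill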

So now we could, if we were naive, simply get out our move choice from the path:

makePlayerMove w = PlayerMove moveChoice False
  where
    …
    shortestPath = getShortestPath maze playerLoc (endLocation w)
    moveChoice = if null shortestPath then StandStill else getMoveChoice playerLoc (head shortestPath)

At this point, we should be able to run our game! The player will boldly walk towards the destination square through the maze. But if we have any enemies in our path, we're toast! Let's determine how we can use our stun!

Stunning Enemies

To determine when to use our stun, let's first get a set of the active enemy locations.

makePlayerMove w =
  where
    …
    shortestPath = …
    activeEnemyLocs = Set.fromList
      (enemyLocation <$>
        (filter (\e -> enemyCurrentStunTimer e == 0)
          (worldEnemies w)))

Now let's truncate the path to only include spots that might be in our stun radius. Then we can find if an enemy is there:

makePlayerMove w = PlayerMove moveChoice useStun
  where
    …
    shortestPath = …
    activeEnemyLocs = …
    radius = stunRadius . playerGameParameters . worldParameters $ w
    enemyClose = any
      (\l -> Set.member l activeEnemyLocs)
      (take radius shortestPath)
    canStun = playerCurrentStunDelay currentPlayer == 0
    useStun = enemyClose && canStun

We're getting closer now! We'll find that our player can stun enemies as long as it's ready. But if the stun isn't ready, we'll run straight into them! We don't want that! So let's figure out how to make a good retreat in case we don't have our stun ready.

Retreating

This leaves us with our last bit of logic. If an enemy is close by on our shortest path and we don't have our stun ready, we should run away. We'll make this as simple as possible by picking the first adjacent location that isn't the next step on our shortest path. This doesn't always work that well, but it'll do for now. We need a list of adjacent locations, and then we select one that isn't that next step. Here's how that logic pans out:

makePlayerMove :: World -> PlayerMove
makePlayerMove w = PlayerMove finalMoveChoice useStun
  where
    ...
    shortestPath = getShortestPath maze playerLoc (endLocation w)
    shortestPathMoveLocation = if null shortestPath
      then playerLoc
      else (head shortestPath)
    shortestPathMoveChoice = getMoveChoice
      playerLoc shortestPathMoveLocation

    activeEnemyLocs = ...
    radius = ...
    enemyClose = ...

    canStun = playerCurrentStunDelay currentPlayer == 0

    possibleMoves = getAdjacentLocations maze playerLoc

    (finalMoveChoice, useStun) = if not enemyClose
      then (shortestPathMoveChoice, False)
      else if canStun
        then (shortestPathMoveChoice, True)
        else case find (/= shortestPathMoveLocation) possibleMoves of
          Nothing -> (StandStill, False)
          Just l -> (getMoveChoice playerLoc l, False)

And now our AI actually works quite well! It can navigate the maze and stun enemies when it needs to. When it has to wait for its stun to re-charge, it'll back away from enemies as needed until the stun is ready.

Conclusion

Next week, we'll use some more advanced tactics to navigate the maze. Specifically, we'll look into how we can use our drill powerup. We'll need to re-think how we calculate the shortest path. We won't be using pure maze distance any more, since we can change the maze! So that will be an interesting problem. After we wrap that up, we'll look into separating the game from the GUI component so we can run simulations.

As always, check out our Github repository for full implementation details. This article uses the player-ai branch, so make sure you select that one!

And for more tips on getting better at Haskell, you should subscribe to our mailing list! You'll get access to all our subscriber resources. This includes our Beginners Checklist and our Production Checklist!

James Bowen

Gloss Review!

For the last few months, we've been constructing a simple game using the Gloss library. This library provides a neat and tidy interface for us to construct game components and put them together. The game is available to build and fork on our Github Repository. Here's a quick review of everything we've done:

Part 1: Overview of the Gloss library, constructing basic simulations and games

Part 2: Creating our maze type

Part 3: Generating random mazes using Depth-First-Search

Part 4: Victory Status Screen

Part 5: Serializing and parsing a maze

Part 6: Refactoring using Compile Driven Development

Part 7: Using Mutable Arrays for maze construction

Part 8: Adding enemies to the maze

Part 9: Making enemies more intelligent with Breadth-First-Search

Part 10: Adding a stun power to fight back against enemies

Part 11: Parameterizing the application and saving the world state

Part 12: Re-loading the world state, adding command line parameters

Part 13: Adding a drill power-up to take shortcuts through the maze

In this series, we explored some cool algorithmic concepts. We saw how to use the state monad for breadth first search and depth first search. We also explored mutable arrays, which aren't used that often in Haskell. If you're new to Haskell, it's good to get familiar with how some of these algorithms work.

We also emphasized using a methodical development approach. The Gloss architecture enables us to have a simple process for adding new features to the game. Combining this with compile driven development is a powerful combination for rapid iteration.

We're probably not going to add any more core features to this game. But that doesn't mean we're done working with it! We're going to continue using the game as a platform for learning about interesting concepts and algorithms. Expect to see some articles related to search algorithms, AI development, and game architecture coming soon!

Until then, don't forget that you can subscribe to Monday Morning Haskell! You'll hear about our upcoming articles through our monthly newsletter. You'll also get access to our subscriber resources! There's plenty to learn both for beginners and more advanced Haskellers!

James Bowen

Loading Games and Changing Colors!

floppy_disk.png

Last week we added functionality for serializing our world state. This allowed us to save our game with the press of a button. We also parameterized our application so that we could customize many aspects of it. We didn't explore all the possibilities though! This week, we'll see how we can load a previous game-state by using command line options. We'll also use options to specify how the game appears!

As always, take a look at our Github repository to see the full code. This article corresponds to the part-9 branch.

All this is part of our effort to make our game more "mature". But you should also consider some libraries that will be more useful in industry! Take a look at our Production Checklist! Whether it's web servers or databases, you'll learn how Haskell interacts with more advanced concepts!

Command Line Options

Everything we do this week will happen through command line options. For a quick refresher on these, take a look at this article! We'll have a couple main concerns. First, we want to take an optional filename argument to load the initial world state. If we get this, we'll throw the player right back into their game! If we don't get this argument, we'll generate a new, random state for them to work with.

Second, we'll make command line parameters for all our different render parameters. This will allow anyone invoking the game to customize the appearance! We'll also allow them to specify a whole file for these as well. This will involve quite a bit of work with the Options.Applicative library.

Our main goal is this function:

parseOptions :: IO (Maybe FilePath, RenderParameters)

This will return us a possible file path to load our world state from, as well as a set of render parameters. Let's start with some basic framework code.

Parser Setup

The first item we want is a generic parser that will give us Maybe values. As far as I could tell the options library doesn't have this built-in. But it's not too hard to write. We want to use the option function to start with. It will attempt the given parser. If it succeeds, we want to wrap it with Just. Then we'll append a default value of Nothing to the options in case it fails.

maybeParser ::
  ReadM a -> Mod OptionFields (Maybe a) -> Parser (Maybe a)
maybeParser reader opts =
  option (Just <$> reader) (opts <> value Nothing)

We can now use this to build a parser for the maze file:

mazeFileParser :: Parser (Maybe FilePath)
mazeFileParser = maybeParser str
 (long "load-file" <> short 'f'
   <> help "A file to use to load the world state")

And now we just apply execParser on this, supplying some simple Info for our parser:

parseOptions :: IO (Maybe FilePath)
parseOptions = execParser $ info mazeFileParser commandInfo

commandInfo :: InfoMod (Maybe FilePath)
commandInfo = fullDesc <> progDesc "Haskell Maze Game"

Runner Updates

The next step is a short function for loading our world from the file. Since we have our JSON instance on the World type, we'll rely on decodeFileStrict'. There's one caveat. If the game parameters don't have a random seed value, we'll use a new one. Otherwise we'll use mkStdGen on the seed:

loadWorldFromFile :: FilePath -> IO World
loadWorldFromFile fp = do
 parseResult <- Data.Aeson.decodeFileStrict' fp
 case parseResult of
   Just w -> do
     gen <- case randomGeneratorSeed (worldParameters w) of
       Nothing -> getStdGen
       Just i -> return $ mkStdGen i
     return $ w { worldRandomGenerator = gen }
   Nothing -> error $ "Couldn't parse world from file " ++ fp ++ "!"

Now we want to make some changes to our main running function. We'll run our argument parser first. If we get a file from the options, we'll load the initial World from that file. Otherwise, we'll use our previous flow with generating a random maze.

main :: IO ()
main = do
 maybeLoadFile <- parseOptions
 initialWorld <- case maybeLoadFile of
   Just loadFile -> loadWorldFromFile loadFile
   Nothing -> …
 play ...

Parsing a Render File

This is good enough to load the world, so we can re-start from a saved (or derived) position. But suppose we wanted to go a step further. Suppose we wanted to also load our render parameters. We could use another file if we liked. We could start with another parser for a file path:

renderFileParser :: Parser (Maybe FilePath)
renderFileParser = maybeParser str
 (long "render-param-file" <> short 'r'
   <> help "A file to use to load render parameters")

Then we'll combine our two parsers together like so:

parser :: Parser (Maybe FilePath, Maybe FilePath)
parser = (,) <$>
 mazeFileParser <*>
 renderFileParser

Now we'll add a bit more logic to the wrapper function. If we have a file, we should use it to load the RenderParameters object:

parseOptions :: IO (Maybe FilePath, RenderParameters)
parseOptions = do
 (mazeFile, renderFile) <- execParser $ info parser commandInfo
 case renderFile of
   Nothing -> return (mazeFile, defaultRenderParameters)
   Just fp -> do
     parseResult <- decodeFileStrict' fp
     case parseResult of
       Nothing -> return (mazeFile, defaultRenderParameters)
       Just fileRenderParams -> return (mazeFile, fileRenderParams)

Note that the type of our commandInfo will also need to change as a result of this. But then we just have the simple task of getting these items out in our main function:

main :: IO ()
main = do
 (maybeLoadFile, renderParams) <- parseOptions
 ...

Individual Render Parameters

We have one last trick though! Suppose we want to change one thing about the game's appearance and we don't want to use a JSON file. We can add individual options on render elements. We've got a lot of possible elements. We'll wrap them all in a basic type, matching the fields we have in the different sub-components. Each of these fields is optional. We'll "merge" them with a complete set of render parameters to get a final result.

data RenderParamInfo = RenderParamInfo
 -- Screen/Text Parameters
 (Maybe Int)
 (Maybe Int)
 (Maybe Int)
 (Maybe Float)
 (Maybe Float)
 (Maybe Float)
 (Maybe Float)
 -- Player Parameters
 (Maybe Float)
 (Maybe Color)
 (Maybe Float)
 (Maybe Color)
 -- Enemy Parameters
 (Maybe Float)
 (Maybe Color)
 (Maybe Color)
 -- Cell Parameters
 (Maybe Color)
 (Maybe Color)
 (Maybe Float)

Each field will have its own parser. These will all be variations on our maybeParser:

maybeIntParser :: Mod OptionFields (Maybe Int) -> Parser (Maybe Int)
maybeIntParser = maybeParser auto

maybeFloatParser ::
  Mod OptionFields (Maybe Float) -> Parser (Maybe Float)
maybeFloatParser = maybeParser auto

maybeColorParser ::
  Mod OptionFields (Maybe Color) -> Parser (Maybe Color)
maybeColorParser = maybeParser (maybeReader colorReader)
 where
   colorReader "blue" = Just blue
   … -- other colors

Then we can combine them using applicative syntax, and providing some help information:

parseRenderInfo :: Parser RenderParamInfo
parseRenderInfo = RenderParamInfo <$>
 maybeIntParser (long "screen-dimen"
   <> help "The screen width/height") <*>
 maybeIntParser (long "screen-offset-x"
   <> help "The screen width/height") <*>
 ...

Next we'll write a "merge" function. This will take a RenderParameters item with default values, and apply the Just values.

mergeOptions ::
  RenderParameters -> RenderParamInfo -> RenderParameters
mergeOptions rp (RenderParamInfo sd_ sox_ ...)
 = RenderParameters
   (fromMaybe (screenDimen rp) sd_)
   (fromMaybe (screenOffsetX rp) sox_)
   ...

Then we add this new parser to our set:

parser :: Parser (Maybe FilePath, Maybe FilePath, RenderParamInfo)
parser = (,,) <$>
 mazeFileParser <*>
 renderFileParser <*>
 parseRenderInfo

And our original function should now reflect the need to merge parameters.

parseOptions :: IO (Maybe FilePath, RenderParameters)
parseOptions = do
 (mazeFile, renderFile, renderInfo) <- execParser $
                                         info parser commandInfo
 case renderFile of
   Nothing -> return
     (mazeFile, mergeOptions defaultRenderParameters renderInfo)
   Just fp -> do
     parseResult <- decodeFileStrict' fp
     case parseResult of
       Nothing -> return
         (mazeFile, mergeOptions defaultRenderParameters renderInfo)
       Just fileRenderParams -> return
         (mazeFile, mergeOptions fileRenderParams renderInfo)

Wrapping Up

Now we've got a lot of interesting possibilities. We can save our game from a particular state:

maze_game_1.png

Then we can load it back up with different colors. For example, we can make some obnoxious green walls:

stack exec -- maze-game --load-file=maze_game_save_1557121650\
  --cell-wall-color=green --enemy-base-color=red
maze_game_2.png

Conclusion

Now that our game is customizable and re-loadable, we'll be able to do a lot more interesting things with it. We'll be able to run many simulations that test the difficulty. Some day, we'll be able to make the AI much better! For the time being though, there are still some more features we can add. So the next couple parts of this series will explore some other exciting twists in the game-play. This will lead us to a point where other types of refactors will be necessary.

Making a project like this requires a good knowledge of the Stack tool. To learn more, take our free mini-course on Using Stack!.

James Bowen

Spring Cleaning: Parameters and Saving!

broom.jpg

Our game is a lot more interesting after the changes we made last week. But there's still lots of room for improvement. There are many things that could make the game more or less interesting, depending how we tune things. For instance, how many enemies is too many? What's a good stun duration? How quickly does the game get harder if we tweak these parameters?

Right now, it would be hard for us to answer these questions in a systematic way. We've baked many of these parameters directly into the code. So we would have to recompile everything if we wanted to test out another version of the game. This is a bad sign. We should be able to run the same binary with different sets of parameters.

Another issue with our game is that the only real "flow" is to start off with a random arrangement. We don't know what the map will be or where the enemies start. But if we want to test how well certain concepts work, we'll want true re-playability. In other words, we'll want to be able to start the game from a certain state we've established.

This week, we'll start to clean these things up. The first job will be to move a lot of our magic numbers into the World type. Then we'll devise a way to serialize the complete world state. We've already done the hard work of ensuring we can serialize the map. The rest will be pretty straightforward using JSON instances. We'll wrap up this week by adding the option to save the game in the middle of the action. Then next week, we'll add some options to our game so we can load a particular starting state from a file.

As with each of the different phases of this project, you can take a look at our Github repository to see how we do everything. For this article, you should be following the part-8 branch. We'll also provide a couple commit links so you can follow along step-by-step.

As you get better at using Haskell, you'll be able to use it for more and more types of projects. Download our Production Checklist for some ideas!

Parameterizing the App

There are a lot of "magic numbers" floating around our app right now. Some of these have to do with game-play logic. How many enemies are there? What's their cool-down time? How long is our player's stun timer? Then there are other parameters that have to do with how we draw the game. For instance, what colors do we use? How big are the cells?

We make this distinction because we should be able to run our game without any render information. At some point, we'll run simulations of this game that don't get drawn at all. So it would be useless to have drawing information around. Thus we'll have GameParameters types that will live in the World. Then we'll have RenderParameters types for drawing everything.

With that said, let's starting devising what information these types contain. We'll start out with types describing the player and enemy parameters:

data PlayerGameParameters = PlayerGameParameters
  { initialStunTimer :: Word
  , stunTimerIncrease :: Word
  , stunTimerMax :: Word
  , stunRadius :: Int
  }

data EnemyGameParameters = EnemyGameParameters
  { initialStunTime :: Word
  , stunTimeDecrease :: Word
  , minStunTime :: Word
  , enemyRandomMoveChance :: Word
  , initialLagTime :: Word
  , minLagTime :: Word
  }

Now we can use these to populate a bigger type with more generic game parameters:

data GameParameters = GameParameters
  { numRows :: Int
  , numColumns :: Int
  , numEnemies :: Int
  , tickRate :: Int
  , playerGameParameters :: PlayerGameParameters
  , enemyGameParameters :: EnemyGameParameters
  , randomGeneratorSeed :: Maybe Int
  }

Notice the random seed is a Maybe value. In the normal circumstances of running the program, we don't want to fix the random generator. But there are cases where we'll want to load from a specific stored state. If we fix the generator seed value, gameplay will be deterministic. This could be a desirable property in some circumstances. In most cases though, this will be Nothing.
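
As a tiny standalone illustration (not code from the game) of why a fixed seed makes things deterministic: two generators built from the same seed produce exactly the same sequence of values.

import System.Random (mkStdGen, randomR)

main :: IO ()
main = do
  let g1 = mkStdGen 42
      g2 = mkStdGen 42
  -- Both draws come out identical, so anything driven by the generator replays the same way.
  print (fst (randomR (1 :: Int, 10) g1) == fst (randomR (1 :: Int, 10) g2)) -- True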

With all these types in place, we'll now add the game parameters to our World:

data World = World
  { …
  , worldParameters :: GameParameters
  }

We'll go through a similar process with RenderParameters. The main difference will be that we will not attach the type to the World. There will also be a CellRenderParameters type as well as types for the Player and Enemy. This gives us information about how individual cells get displayed on our screen. Here's a quick sample of this code. You can see the other types at the bottom as an appendix.

data RenderParameters = RenderParameters
  { screenWidth :: Float
  , screenHeight :: Float
  , screenOffsetX :: Float
  , screenOffsetY :: Float
  , textOffset :: (Float, Float)
  , textScale :: (Float, Float)
  , playerRenderParameters :: PlayerRenderParameters
  , enemyRenderParameters :: EnemyRenderParameters
  , cellRenderParameters :: CellRenderParameters
  }

data CellRenderParameters = CellRenderParameters
  { cellWallColor :: Color
  , cellStunColor :: Color
  , cellWallWidth :: Float
  }

No More Magic

Once we have these types in place, our next step is to replace the magic numbers (and colors) in our application. We'll need to add the parameters as arguments in a few places. Most of all, the drawingFunc will need the RenderParameters argument.

drawingFunc :: RenderParameters -> World -> Picture
...

This process isn't too much of a challenge, as all our important functions take the World as an input. Then for now, we'll pass our default parameter packs as arguments when running the program. Here's a quick look at changes to our main function:

main :: IO ()
main = do
  gen <- getStdGen
  let gameParams = defaultGameParameters
      renderParams = defaultRenderParameters
      (maze, gen') = generateRandomMaze
        gen (numRows gameParams, numColumns gameParams)
      (randomLocations, gen'') = runState
        (replicateM
          (numEnemies gameParams)
          (generateRandomLocation
            (numRows gameParams, numColumns gameParams)))
        gen'
      enemies = (mkNewEnemy
        (enemyGameParameters gameParams)) <$> randomLocations
      endCell = (numColumns gameParams - 1, numRows gameParams - 1)
      initialWorld = World
        (newPlayer (playerGameParameters gameParams))
        (0,0) endCell maze GameInProgress gen'' enemies [] 0 
        gameParams
  play
    (windowDisplay renderParams)
    white
    (tickRate gameParams)
    initialWorld
    (drawingFunc renderParams)
    inputHandler
    updateFunc

Take a look at this commit to see all of our parameter changes.

Serializing Our World

Now that we've updated our World type, we'll want to determine how we can serialize it. For simplicity's sake, we'll use JSON serialization. This is mostly a matter of creating (or, if you wish, deriving) a bunch of ToJSON and FromJSON instances. Check out this article for a refresher on the Data.Aeson library.
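
As a quick aside on the "deriving" option: for types whose field names map directly onto the JSON keys we want, Aeson can generate the instances from a Generic instance. Here's a sketch of that approach (we won't actually use it below; we write our instances by hand):

{-# LANGUAGE DeriveGeneric #-}

import Data.Aeson (FromJSON, ToJSON)
import GHC.Generics (Generic)

-- With a Generic instance in place, the default toJSON and parseJSON
-- implementations serialize the record using its field names as keys.
data PlayerGameParameters = PlayerGameParameters
  { initialStunTimer :: Word
  , stunTimerIncrease :: Word
  , stunTimerMax :: Word
  , stunRadius :: Int
  } deriving (Show, Generic)

instance ToJSON PlayerGameParameters
instance FromJSON PlayerGameParameters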

Most of this code is pretty simple. With game parameters, a lot of the instances are a simple matter of creating and parsing pairs. Here's an example with Player:

instance FromJSON Player where
  parseJSON = withObject "Player" $ \o -> do
    location <- o .: "location"
    currentStunDelay <- o .: "currentStunDelay"
    nextStunDelay <- o .: "nextStunDelay"
    return $ Player location currentStunDelay nextStunDelay

instance ToJSON Player where
  toJSON p = object
    [ "location" .= playerLocation p
    , "currentStunDelay" .= playerCurrentStunDelay p
    , "nextStunDelay" .= playerNextStunDelay p
    ]

But there are a few caveats. To start, we need to put these instances in a separate file. The World instance needs our maze parsing code, and that code already depends on the Types module, so defining the instances alongside the types would create a dependency cycle.
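
For concreteness, the module layout could look something like this. The module names other than Types are assumptions for illustration, not necessarily what the project uses:

-- Hypothetical module header: keeping the instances in their own module lets
-- us import both the types and the maze parser without a cycle.
module WorldSerialization where

import Data.Aeson
import Types        -- World, Player, GameParameters, and friends
import MazeParser   -- mazeParser (assumed module name)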

It's generally considered bad practice to separate instances from their type declarations. These are called "orphan" instances, and GHC can warn you about them (via the -Worphans flag). A way around this is to create wrapper types. That's a little tedious, so we won't do it for all our types. But we will show the concept for the Color type from Graphics.Gloss. Let's start with the wrapper type:

newtype ColorWrapper = ColorWrapper { unColor :: Color }

Now we can create instances on this wrapper, and they're considered valid. To cover all possible colors, we'd use RGB arrays. But we'll just cover the nine colors we care about and parse them as strings. Here's what the instances look like. Notice how we wrap and unwrap the actual library values for the colors:

instance ToJSON ColorWrapper where
  toJSON (ColorWrapper c) = Ae.String colorStr
    where
      colorStr
        | c == blue = "blue"
        | c == red = "red"
        | c == yellow = "yellow"
        | c == green = "green"
        | c == cyan = "cyan"
        | c == orange = "orange"
        | c == magenta = "magenta"
        | c == rose = "rose"
        | c == black = "black"

instance FromJSON ColorWrapper where
  parseJSON = withText "ColorWrapper" parseText
    where
      parseText "blue" = return (ColorWrapper blue)
      parseText "red" = return (ColorWrapper red)
      parseText "yellow" = return (ColorWrapper yellow)
      parseText "green" = return (ColorWrapper green)
      parseText "cyan" = return (ColorWrapper cyan)
      parseText "orange" = return (ColorWrapper orange)
      parseText "magenta" = return (ColorWrapper magenta)
      parseText "rose" = return (ColorWrapper rose)
      parseText "black" = return (ColorWrapper black)
      parseText _ = error "Couldn't parse color!"
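
As an aside, if we wanted to support arbitrary colors rather than this fixed list, one alternative (sketched below, and not what we do here) is to serialize the four RGBA components, using rgbaOfColor and makeColor from Gloss:

-- Hypothetical alternative instances: store a color as a JSON array [r, g, b, a].
instance ToJSON ColorWrapper where
  toJSON (ColorWrapper c) = toJSON [r, g, b, a]
    where (r, g, b, a) = rgbaOfColor c

instance FromJSON ColorWrapper where
  parseJSON v = do
    [r, g, b, a] <- parseJSON v
    return $ ColorWrapper (makeColor r g b a)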

We can then use our ColorWrapper instances within the other parsers, such as the ones for the render parameters. The other caveat is that we have to parse our World in two stages. First, we'll get all the basic fields and parameters:

instance FromJSON World where
  parseJSON = withObject "World" $ \o -> do
    player <- o .: "player"
    startLoc <- o .: "startLocation"
    endLoc <- o .: "endLocation"
    result <- o .: "result"
    enemies <- o .: "enemies"
    stunCells <- o .: "stunCells"
    time <- o .: "time"
    params <- o .: "gameParameters"
  ...

Next, we'll get the boundaries as a Text item and run our maze parser on it, using the number of rows and columns from the game parameters:

instance FromJSON World where
  parseJSON = withObject "World" $ \o -> do
    ...
    (boundaryString :: Text) <- o .: "boundaries"
    let (rs, cs) = (numRows params, numColumns params)
    let boundaries = case runParser (mazeParser (rs, cs)) "" boundaryString of
          Right result -> result
          _ -> error "Map parse failed!"

As a last trick, we'll check what the random seed is within our parameters. If it's Nothing, we'll fix the generator with a seed of 1 and rely on other code to change it:

instance FromJSON World where
  parseJSON = withObject "World" $ \o -> do
    ...
    let gen = case randomGeneratorSeed params of
          Just i -> mkStdGen i
          _ -> mkStdGen 1
    return $ World player startLoc endLoc boundaries
      result gen enemies stunCells time params

Take a look at this commit to see the full code for these instances. Now let's see how we use them!

Saving Our World

We'd like our user to be able to save their game state by hitting the 's' key at any point in the game. This idea starts out simple enough: we add a handler for the key in our inputHandler.

inputHandler :: Event -> World -> World
inputHandler event w
  | worldResult w == GameWon = ...
  | worldResult w == GameLost = ...
  | otherwise = case event of
      ...
      (EventKey (Char 's') Down _ _) -> ...

But now we're a little stuck! We want to write out to a file, but our handler is a pure function! There miiight be a way to do this without breaking functional purity. Perhaps we could keep a list of saved world states and add a handler to save them at the end of our program. But Gloss wasn't made for that. So we're going to break the rules a bit and resort to unsafePerformIO. This allows us to run an IO computation from a seemingly pure context. Here's the basic layout:

inputHandler :: Event -> World -> World
inputHandler event w
      ...
      (EventKey (Char 's') Down _ _) -> unsafeSaveWorldToFile w

unsafeSaveWorldToFile :: World -> World
unsafeSaveWorldToFile w = unsafePerformIO $ do
  …
  return w

Since we have a JSON instance for our World, we'll lean on the encodeFile function from Data.Aeson. The rest of the work comes from generating a unique filename based on the current time (using getPOSIXTime from Data.Time.Clock.POSIX):

unsafeSaveWorldToFile :: World -> World
unsafeSaveWorldToFile w = unsafePerformIO $ do
  timeAsString <- show . floor <$> getPOSIXTime
  currentDir <- getCurrentDirectory
  let filename = currentDir ++ "/maze_game_save_" ++ timeAsString
  encodeFile filename w
  return w

And that's all it takes for us to save some game files! Perhaps you've heard the phrase "don't try this at home." When it comes to unsafePerformIO, feel free to try it at home, but don't try it at work! Take a look at this commit for details on saving the state.

Conclusion

In spite of the unsafePerformIO, our game feels like a much more "grown-up" program now. The code quality is much better with our parameters organized, and we have many more options for improving it. Saving the world state is the first step towards solving some interesting problems. Next week, we'll explore how we can load the saved game states we've created.

As we move forward, we'll keep trying to turn this game into a more mature program. Eventually though, you should think about using Haskell for more common production use cases. To learn about different libraries you can use, download our Production Checklist!

Appendix: Render Parameter Types

data RenderParameters = RenderParameters
  { screenDimen :: Int
  , screenOffsetX :: Int
  , screenOffsetY :: Int
  , textOffset :: (Float, Float)
  , textScale :: (Float, Float)
  , playerRenderParameters :: PlayerRenderParameters
  , enemyRenderParameters :: EnemyRenderParameters
  , cellRenderParameters :: CellRenderParameters
  }

data PlayerRenderParameters = PlayerRenderParameters
  { playerIndicatorSize :: Float
  , playerIndicatorColor :: Color
  , playerStunIndicatorSize :: Float
  , playerStunIndicatorColor :: Color
  }

data EnemyRenderParameters = EnemyRenderParameters
  { enemySize :: Float
  , enemyBaseColor :: Color
  , enemyStunnedColor :: Color
  }

data CellRenderParameters = CellRenderParameters
  { cellWallColor :: Color
  , cellStunColor :: Color
  , cellWallWidth :: Float
  }

defaultRenderParameters :: RenderParameters
defaultRenderParameters = RenderParameters
  625 10 10 (-275, 0) (0.12, 0.25) playerParams enemyParams cellParams
  where
    playerParams = PlayerRenderParameters 10 black 5 red
    enemyParams = EnemyRenderParameters 10 orange yellow
    cellParams = CellRenderParameters blue cyan 2