Moving Towards ML: Evaluation Functions

decision_tree.png

Before we get started, here's a reminder that today (August 5th) is the last day of enrollment for our Haskell From Scratch course! Sign-ups close at midnight Pacfic time! Don't miss out!

This week, we're going to start taking our AI in a somewhat new direction. Right now, we're hard-coding specific decisions for our player to make. But this week, we'll make a more general function for evaluating different positions. Our initial results will be inferior to the AI we've hand-coded. But we'll set ourselves up to have a much better AI in the future by applying machine learning.

For more details on the code for this article, take a look at the evaluation-game-function branch on our Github Repository! This article also starts our move towards machine learning related concepts. So now would be a good time to review our Haskell AI Series. You can download our Tensor Flow Guide to learn more about using Haskell and Tensor Flow!

Evaluation as a Strategy

Currently, our AI follows a strict set of rules. It performs pretty well for the current problem space. But suppose circumstances changed. Suppose we use different maze structures. Or we could add a completely new feature to the game. In these cases, we might need a completely different set of ideas to build a competent AI.

Our new strategy will be much more general. We'll supply our AI with a function that can evaluate a particular board position. That is, it will look at the world, and create a numeric output scoring it. Then our brain will look at all possible moves, score each position, and choose the move with the best result.

If game rules change, we'll need to rethink the evaluation function. But, by making the problem one of numbers to numbers, it'll be easier to use machine learning (instead of our own logic) to devise this function. This way, we can radically change the nature of the game, and we won't need to do too much manual work to change the AI. We might need to add new features (as we'll discuss later). But otherwise we would just need to re-train the evaluation function.

Top Down Development

To implement this approach, we'll put the "function" in functional programming. We'll start by outlining our decision making process with a series of type signatures. Let's remember that first, our overarching goal is a function that takes a World and gives us a PlayerMove:

makePlayerMove :: World -> PlayerMove

We should first determine the set of possible moves:

possibleMoves :: World -> [PlayerMove]

Then we'll need to calculate the new World from each of those moves. (We won't go over this function in this article. It mainly consists of refactoring code we already have for manipulating the game).

applyPlayerMove :: World -> PlayerMove -> World

Then we'll score each of those resulting worlds. This is where the real "brain" is going to live now:

evaluateWorld :: World -> Float

Now that we know the functions we're writing, we can already implement makePlayerMove. We'll assume our helpers already exist and then we apply the process outlined above:

makePlayerMove :: World -> PlayerMove
makePlayerMove w = bestMove
  where
    -- 1. Get our Moves
    allMoves = possibleMoves w

    -- 2. See what the results of each move are
    possibleWorlds = applyPlayerMove w <$> allMoves

    -- 3. Score each resulting world
    scores = evaluateWorld <$> possibleWorlds

    -- 4. Combine the world with its move and choose the best one
    movesWithScores = zip allMoves movesWithScores
    bestMove = fst $ maximumBy (\(_, score1) (_, score2) ->
      compare score1 score2) movesWithScores

This will compile, and we can now move on to the individual components.

Getting Possible Moves

Let's start with getting all the possible moves. When it comes to movement, we generally have five options: stand still, or move in one of four directions. But if we're out of drills, or near the boundary of the world, this can restrict our options. But we always have the sure option of standing still, so let's start with that:

possibleMoves :: World -> [PlayerMove]
possibleMoves w = …
  where
    standStillMove = PlayerMove DirectionNone False DirectionNone
    ...

Now in every direction, we'll have a Maybe move possibility. If it's a WorldBoundary, we'll get Nothing. Otherwise if it's a wall, then we'll have a possible move as long as a drill is available. Otherwise the move is possible, and we won't need a drill. We'll wrap these behaviors in a helper function, and then it's easy to use that in each direction:

possibleMoves :: World -> [PlayerMove]
possibleMoves w = baseMoves
  where
    standStillMove = PlayerMove DirectionNone False DirectionNone
    player = worldPlayer w
    bounds = (worldBoundaries w) Array.! (playerLocation player)

    possibleMove :: (CellBoundaries -> BoundaryType) ->
      MoveDirection -> Maybe PlayerMove
    possibleMove boundaryFunc direction =
      case boundaryFunc bounds of
        WorldBoundary -> Nothing
        Wall _ -> if playerDrillsRemaining player > 0
          then Just $ PlayerMove direction False direction
          else Nothing
        AdjacentCell _ -> Just $
          PlayerMove direction False DirectionNone

    upMove = possibleMove upBoundary DirectionUp
    rightMove = possibleMove rightBoundary DirectionRight
    downMove = possibleMove downBoundary DirectionDown
    leftMove = possibleMove leftBoundary DirectionLeft

    baseMoves = standStillMove : (catMaybes [upMove, rightMove, downMove, leftMove])

Now we have to factor in that each move can also apply the stun if it's available.

possibleMoves :: World -> [PlayerMove]
possibleMoves w = baseMoves ++ stunMoves
  where
    ...
    baseMoves = standStillMove : (catMaybes [upMove, rightMove, downMove, leftMove])

    stunMoves = if playerCurrentStunDelay player /= 0 then []
      else [ m { activateStun = True } | m <- baseMoves ]

And now we've got our moves!

Evaluating the Game Position

Now let's start tackling the problem of evaluating a particular game situation. Any manual solution we come up with here is likely to have problems. This is where machine learning will come in. But here's the general approach we want.

First, we'll select particular "features" of the world. For instance, how far away are we from the end of the maze? How many enemies are within our stun radius? We'll consider all these elements, and then come up with a "weight" for each feature. A weight is a measurement of whether that feature makes the position "good" or "bad". Then, we'll add together the weighted feature values to get a score. So here's a list of the features we're going to use:

  1. How close are we (in maze search terms) from the target location? This will use pure BFS and it will not account for using drills.
  2. How close are we in manhattan distance terms from the target location?
  3. Is there an active enemy on the same square as the player (this will receive a heavy negative weight!)
  4. How many enemies are within our stun radius?
  5. Is our stun available?
  6. How many drills do we have left?

Let's start by getting all these features:

evaluateWorld :: World -> Float
evaluateWorld w = ...
  where
    player = worldPlayer w
    playerLoc@(px, py) = playerLocation player
    radius = stunRadius . playerGameParameters . worldParameters $ w
    goalLoc@(gx, gy) = endLocation w
    activeEnemyLocations = enemyLocation <$>
      (filter (\e -> enemyCurrentStunTimer e == 0) (worldEnemies w))

    onActiveEnemy = playerLocation player `elem` activeEnemyLocations

    shortestPathLength = length $
      getShortestPath (worldBoundaries w) playerLoc goalLoc

    manhattanDistance = abs (gx - px) + abs (gy - py)

    stunAvailable = playerCurrentStunDelay player == 0

    numNearbyEnemies = length
      [ el | el@(elx, ely) <- activeEnemyLocations,
        abs (elx - px) <= radius && abs (ely - py) <= radius ]

    drillsRemaining = playerDrillsRemaining player

Now let's move on to assigning scores. If our player is on the same square as an active enemy, we lose. So let's give this a weight of -1000. Conversely, the closer we get to the target, the closer we are to winning. So let's devise a function where if that distance is 0, the score is 1000. Then the farther away we get, the more points we lose. Let's say, 20 points per square. For manhattan distance, we'll use a strict penalty, rather than reward:

evaluateWorld :: World -> Float
evaluateWorld w = ...
  where
    ...
    onActiveEnemyScore = if onActiveEnemy then -1000.0 else 0.0
    shortestPathScore = 1000.0 - (20.0 * (fromIntegral shortestPathLength))
    manhattanDistanceScore = (-5.0) * (fromIntegral manhattanDistance)

Now we want to generally reward having our power ups available to us. This will stop the bot from needlessly using them and also reward it for picking up new drills. We'll also penalize having enemies too close to us.

evaluateWorld :: World -> Float
evaluateWorld w = ...
  where
    ...
    stunAvailableScore = if stunAvailable then 80.0 else 0.0
    numNearbyEnemiesScore = -100.0 * (fromIntegral numNearbyEnemies)
    drillsRemainingScore = 30.0 * (fromIntegral drillsRemaining)

And to complete the function, we'll just add these together:

evaluateWorld :: World -> Float
evaluateWorld w =
  onActiveEnemyScore +
  shortestPathScore +
  manhattanDistanceScore +
  stunAvailableScore +
  numNearbyEnemiesScore +
  drillsRemainingScore

How Well Does it Work?

When we run the game now with the AI active, we see some interesting behaviors. Our bot will generally navigate the maze well. It's path isn't optimal, as we have with drillBFS. But it makes decent choices about drilling. Its behavior around enemies is a bit strange. It tends to stay away from them, even if they're not actually close in maze difference. This makes it take longer than it needs.

We still don't have good retreating behavior in certain cases. It will often stand still and let an enemy grab it instead of running away.

At this point, we have a couple options for improving the AI. First, we could try tweaking the weights. This will be tedious for us to do manually. This is why we want to apply machine learning techniques to come up with optimal weights.

But the other option is to update the feature space. If we can come up with more intelligent features, we won't need as precise weights.

Conclusion

Next week, we'll try to fix our behavior around enemies. We'll use true maze distance in more places as opposed to manhattan distance. This should give us some big improvements. Then we'll start looking into how we can learn better weights.

We'll be coming up pretty soon on using Tensor Flow for this program! Download our Haskell Tensor Flow Guide to learn more!

And if you're still a Haskell beginner, there's never been a better time to learn! Register for our Haskell From Scratch course to jump-start your Haskell journey! Enrollment ends at midnight TODAY! (August 5th).