Different Feature Schemes

In last week's edition of our Maze AI series, we explored the mechanics of supervised learning. We took the training data we'd been building up and trained an agent on it. We had one set of data to make the AI follow our own human moves, and another to follow our hand-crafted AI. This wasn't particularly successful. The resulting agent had a lot of difficulty navigating the maze and using its stun at the right times.

This week, we'll explore a couple of different ways we can expand the feature set. First, we'll try encoding the legality of moves in our features. Second, we'll try expanding the feature set to include more raw data about the grid. This will motivate some other approaches to the problem. We'll conclude by taking the specifics of maze navigation out of the picture. We'll let our agent go to work on an empty grid to validate that this approach is at least reasonable.

For some more reading on using Haskell and AI, take a look at our Haskell AI Series. We explore some reasons why Haskell could be a good fit for AI and machine learning problems. It will also walk you through some of the basics of using Haskell and TensorFlow.

Our supervised agent uses our current feature set. Let's remind ourselves what these features are. We have five different moves we can make (up, down, left, right, and standing still). And for each of these moves, we calculate 8 different features.

  1. The maze distance to the goal
  2. The Manhattan distance to the goal
  3. Whether the location contains an active enemy
  4. The number of enemies on the shortest path to the goal from that location
  5. The distance to the nearest enemy from that location
  6. The number of nearby enemies in Manhattan distance terms
  7. Whether our stun is available
  8. The number of drills we have after the move

Some of these features are higher level. We do non-trivial calculations to figure them out. This gives our agent some idea of strategy. But there's not a ton of lower-level information available! We zero out the features for a particular spot if it's past the world boundary. But we can't immediately tell from these features whether a particular move is legal.
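For reference, the zeroing behavior can be captured by a tiny helper like the zeroIfNull we use in vectorizeWorld. This is a plausible definition, not necessarily the series' exact code:

```haskell
-- A plausible definition of the zeroIfNull helper: a move past the world
-- boundary produces no features (Nothing), so we substitute 0.
-- (Sketch only; the exact definition in the series may differ.)
zeroIfNull :: Maybe Int -> Int
zeroIfNull Nothing  = 0
zeroIfNull (Just x) = x
```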

This is a big oversight. It's possible for our AI to learn about the legality of moves from the higher level training data. But it would take a lot more data and a lot more time.

So let's add a feature for how "easy" a move is. A value of 0 will indicate an illegal move, either past the world boundary or through a wall when we don't have a drill. A value of 1 will indicate a move that requires a drill. A value of 2 will indicate a normal move.

We'll add the extra feature into the LocationFeatures type. We'll also add an extra parameter to our produceLocationFeatures function. This boolean indicates whether a drill would be necessary. Note that we don't need to account for WorldBoundary, since the value will get zeroed out in that case. We'll call this feature lfMoveEase, since a higher value indicates less effort.

data LocationFeatures = LocationFeatures
  { …
  , lfMoveEase :: Int
  }

produceLocationFeatures :: Location -> World -> Bool -> LocationFeatures
produceLocationFeatures location@(lx, ly) w needsDrill = LocationFeatures
  …
  moveEase
  where
    moveEase = if not needsDrill then 2
      else if drillsRemaining > 0 then 1 else 0

It's easy to add the extra parameter to the function call in produceWorldFeatures. We already use case statements on the boundary types. Now we need to account for it when vectorizing our world.
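The boundary case statements might pass the flag along these lines. This is a sketch; the constructor names are assumptions based on the series' maze representation:

```haskell
-- Hypothetical sketch of dispatching on a boundary to decide the drill flag.
-- A WorldBoundary yields no features; a Wall requires a drill; an open
-- AdjacentCell does not. (Constructor names here are assumptions.)
data BoundaryType
  = WorldBoundary
  | Wall (Int, Int)
  | AdjacentCell (Int, Int)

boundaryToDrillFlag :: BoundaryType -> Maybe ((Int, Int), Bool)
boundaryToDrillFlag WorldBoundary      = Nothing
boundaryToDrillFlag (Wall loc)         = Just (loc, True)
boundaryToDrillFlag (AdjacentCell loc) = Just (loc, False)
```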

vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = V.fromList (fromIntegral <$>
  [ ...
  , lfMoveEase standStill
  ...
  , zeroIfNull (lfMoveEase <$> up)
  ...
  , zeroIfNull (lfMoveEase <$> right)
  ...
  , zeroIfNull (lfMoveEase <$> down)
  ...
  , zeroIfNull (lfMoveEase <$> left)
  ])

When we train with this feature set, we actually get a good training error, down to around 10%. So the model can learn our data a bit better. Yet the resulting agent still can't navigate the maze properly.

Expanding the Feature Set

Another option we can try is to serialize the world in a more raw state. We currently use more strategic features. But what about using the information on the board?

Here's a different way to look at it. Let's fix it so that the grid must be 10x10, there must be 2 enemies, and we must start with 2 drill power-ups on the map. Then we'll gather these features about the world:

  1. 100 features for the grid cells. Each feature will be the integer value corresponding to the wall-shape of that cell. These are hexadecimal digits, like the ones we use when serializing the maze.
  2. 4 features for the player. Get the X and Y coordinates for the position, the current stun delay, and the number of drills remaining.
  3. 3 features for each enemy. Again, X and Y coordinates, as well as a stun timer.
  4. 2 coordinate features for each drill location. Once a drill gets taken, we'll use -1 and -1.
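Those counts can be sanity checked with a quick sum over the fixed configuration:

```haskell
-- Sanity check on the feature count for the fixed configuration:
-- a 10x10 grid, 2 enemies, and 2 drill power-ups.
totalFeatures :: Int
totalFeatures =
  gridCells + playerFeats + numEnemies * enemyFeats + numDrills * drillCoords
  where
    gridCells   = 10 * 10  -- one wall-shape value per cell
    playerFeats = 4        -- x, y, stun delay, drills remaining
    numEnemies  = 2
    enemyFeats  = 3        -- x, y, stun timer
    numDrills   = 2
    drillCoords = 2        -- x, y (or -1, -1 once taken)
```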

This will give us a total of 114 features. Here's how it breaks down.

vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = gridFeatures V.++ playerFeatures V.++
                     enemyFeatures V.++ drillFeatures
  where

    -- 1. Features for the Grid
    mazeStr = Data.Text.unpack $ dumpMaze (worldBoundaries w)
    gridFeatures = V.fromList $
      fromIntegral . digitToInt <$> mazeStr

    player = worldPlayer w
    enemies = worldEnemies w

    -- 2. Features for the player
    playerFeatures = V.fromList $ fromIntegral <$>
      [ fst . playerLocation $ player
      , snd . playerLocation $ player
      , fromIntegral $ playerCurrentStunDelay player
      , fromIntegral $ playerDrillsRemaining player
      ]

    -- 3. Features for the two enemies
    enemy1 = enemies !! 0
    enemy2 = enemies !! 1
    enemyFeatures = V.fromList $ fromIntegral <$>
      [ fst . enemyLocation $ enemy1
      , snd . enemyLocation $ enemy1
      , fromIntegral $ enemyCurrentStunTimer enemy1
      , fst . enemyLocation $ enemy2
      , snd . enemyLocation $ enemy2
      , fromIntegral $ enemyCurrentStunTimer enemy2
      ]

    -- 4. Features for the drill locations
    drills = worldDrillPowerUpLocations w
    drillFeatures = V.fromList $ fromIntegral <$>
      if length drills == 0 then [-1, -1, -1, -1]
        else if length drills == 1
          then [fst (head drills), snd (head drills), -1, -1]
          else [ fst (head drills), snd (head drills)
               , fst (drills !! 1), snd (drills !! 1)
               ]

As an optimization, we can make the grid features part of the world since they will not change.

Still, our agent struggles to complete the maze when we train on this data. Compared to the high-level features, the model doesn't even learn very well. We get training errors around 25-30%, but a test error close to 50%. With more data and time, our model might be able to draw the connections between the various features.

We could attempt to make our model more sophisticated. We're working with grid data, which is a little like an image. Image processing algorithms use concepts such as convolution and pooling. This allows them to derive patterns arising from how the grid actually looks. We're only looking at the data as a flat vector.

It's unlikely though that convolution and pooling would help us with this feature set. Our secondary features don't fit into the grid. So we would actually want to add them in at a later stage in the process. Besides, we won't get that much data from taking the average value or the max value in a 2x2 segment of the maze. (This is what pooling does).
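To make that last point concrete, here's a toy 2x2 max pooling pass over a flat grid, showing how 16 values collapse to 4. This is an illustration only, not TensorFlow's actual pooling op:

```haskell
import qualified Data.Vector as V

-- Toy 2x2 max pooling over a flat, row-major n x n grid (n even).
-- Each output cell keeps only the largest of its 4 input cells, so most
-- of the maze detail disappears. (Illustration only.)
maxPool2x2 :: Int -> V.Vector Float -> V.Vector Float
maxPool2x2 n grid = V.fromList
  [ maximum [ grid V.! ((r + dr) * n + (c + dc)) | dr <- [0, 1], dc <- [0, 1] ]
  | r <- [0, 2 .. n - 2]
  , c <- [0, 2 .. n - 2]
  ]
```

Running maxPool2x2 4 on the values 1 through 16 keeps only [6, 8, 14, 16]: three quarters of the cells vanish.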

If we simplify the problem though, we might find a situation where they'll help.

A Simpler Problem

We're having a lot of difficulty with getting our agent to navigate the maze. So let's throw away the problem of navigation for a second. Can we train an agent that will navigate the empty maze? This should be doable.

Let's start with a bare-bones feature set: just the goal and our current location, highlighted in a grid. We'll give a value of 10 for our player's location, and a value of 100 for the target location. We start with a vector of all zeros, and use Vector.// to modify the proper values:

vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = finalFeatures
  where
    initialGrid = V.fromList $ take 100 (repeat 0.0)
    (px, py) = playerLocation (worldPlayer w)
    (gx, gy) = endLocation w
    playerLocIndex = (9 - py) * 10 + px
    goalLocIndex = (9 - gy) * 10 + gx
    finalFeatures = initialGrid V.//
      [(playerLocIndex, 10), (goalLocIndex, 100)]
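The index arithmetic flips the y coordinate, so that the first row of the vector corresponds to the top row of the grid. Isolated as its own function:

```haskell
-- The row-flipping index math from vectorizeWorld, isolated: on a 10x10
-- grid, y = 9 maps into the first row of the flat vector and y = 0 into
-- the last.
locIndex :: (Int, Int) -> Int
locIndex (x, y) = (9 - y) * 10 + x
```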

Our AI bot will always follow the same path in this grid, so it will be quite easy for our agent to learn this path! Even if we use our own moves and vary the path a little bit, the agent can still learn it. It'll achieve 100% accuracy on the AI data. It can't get that high on our data, since we might choose different moves for different squares. But we can still train it so it wins every time.

Conclusion

So our results are still not looking great. But next week we'll take this last idea and run a little further with it. We'll keep it so that our features only come from the grid itself. But we'll add a few more complications with enemies. We might find that convolution and pooling are useful in that case.

If you're interested in using Haskell for AI but don't know where to start, read our Haskell AI Series! We discuss some important ideas, like why Haskell is a good AI language. We also get into the basics of TensorFlow with Haskell.
