Gathering Smart Data
Last week we made a few more fixes to our Q-Learning algorithm. Ultimately though, it still seems to fall short for even basic versions of our problem.
Q-learning is actually an example of a reinforcement learning approach, not a supervised one. We don't tell the machine learning algorithm what the "correct" moves are. We give it rewards when it wins the game (and negative rewards when it loses), but it has to figure out how to play to earn those rewards. With supervised learning, we'll have specific examples of what it should do! We'll have data points saying, "for this feature set, we should make this move." So this week, we'll determine a way to record the moves we make in our game, both as a human player and with our manual AI algorithm. This will become our "training" data for the supervised learning approach.
This week's code is all on the Gloss side of things. You can find it in our GitHub repository under the branch record-player-ai. Next week, we'll jump back into TensorFlow. If you're not familiar yet with how to use Haskell and TensorFlow, download our Haskell Tensor Flow Guide!
Recording Moves
To gather training data, we first need a way to record moves in the middle of the game. Gloss doesn't give us access to the IO monad in our update functions, so we'll unfortunately have to resort to unsafePerformIO, since we need to write the data to a file. (We did the same thing when saving game states.) Here's the skeleton of our function:
unsafeSaveMove :: Int -> World -> World -> World
unsafeSaveMove moveChoice prevWorld nextWorld = unsafePerformIO $ do
...
The first parameter is a representation of our move, an integer from 0-9. This follows the format we used for serialization:
0 -> Move Up
1 -> Move Right
2 -> Move Down
3 -> Move Left
4 -> Stand Still
X + 5 -> Move direction X and use the stun
The first World parameter is the world in which we made the move. The second is the resulting world. That second parameter exists only as a pass-through, because of how unsafePerformIO works.
Given these parameters, our function is pretty straightforward. We want to record a single line containing the serialized world state values and our final move choice, as a comma separated list. We'll save everything to a file called moves.csv. So let's open that file and build our list of numbers, immediately converting them to strings with show.
unsafeSaveMove :: Int -> World -> World -> World
unsafeSaveMove moveChoice prevWorld nextWorld = unsafePerformIO $ do
handle <- openFile "moves.csv" AppendMode
let numbers = show <$>
(Vector.toList (vectorizeWorld prevWorld) ++
[fromIntegral moveChoice])
...
Now that our values are all strings, we can join them into a comma separated format with intercalate. Then we'll write this string to the file and close the handle!
-- Requires: System.IO, System.IO.Unsafe (unsafePerformIO),
-- Data.List (intercalate), and Data.Vector (qualified as Vector)
unsafeSaveMove :: Int -> World -> World -> World
unsafeSaveMove moveChoice prevWorld nextWorld = unsafePerformIO $ do
  handle <- openFile "moves.csv" AppendMode
  let numbers = show <$>
        (Vector.toList (vectorizeWorld prevWorld) ++
          [fromIntegral moveChoice])
  let csvString = intercalate "," numbers
  hPutStrLn handle csvString
  hClose handle
  return nextWorld
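As a quick sanity check on the format, here's a small sketch of a hypothetical parseMoveLine helper (an assumption for illustration, not part of the game code) that reads one line of moves.csv back into its feature values and move choice:

```haskell
-- Hypothetical helper: parse one CSV line back into its feature values
-- and the final move choice. The last field is the move number.
parseMoveLine :: String -> ([Double], Int)
parseMoveLine line =
  let fields = words (map (\c -> if c == ',' then ' ' else c) line)
      values = map read fields :: [Double]
  in (init values, round (last values))
```

Something like this will come in handy when we load the file as training data.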
Now let's figure out how to call this function!
Saving Human Moves
Saving the moves we make as a human is pretty easy. All we need to do is hook into the inputHandler. Recall this section, which receives moves from the arrow keys and makes our move:
inputHandler :: Event -> World -> World
inputHandler event w
...
| otherwise = case event of
(EventKey (SpecialKey KeyUp) Down (Modifiers _ _ Down) _) ->
drillLocation upBoundary breakUpWall breakDownWall w
(EventKey (SpecialKey KeyUp) Down _ _) ->
updatePlayerMove upBoundary
(EventKey (SpecialKey KeyDown) Down (Modifiers _ _ Down) _) ->
drillLocation downBoundary breakDownWall breakUpWall w
(EventKey (SpecialKey KeyDown) Down _ _) ->
updatePlayerMove downBoundary
(EventKey (SpecialKey KeyRight) Down (Modifiers _ _ Down) _) ->
drillLocation rightBoundary breakRightWall breakLeftWall w
(EventKey (SpecialKey KeyRight) Down _ _) ->
updatePlayerMove rightBoundary
(EventKey (SpecialKey KeyLeft) Down (Modifiers _ _ Down) _) ->
drillLocation leftBoundary breakLeftWall breakRightWall w
(EventKey (SpecialKey KeyLeft) Down _ _) ->
updatePlayerMove leftBoundary
(EventKey (SpecialKey KeySpace) Down _ _) ->
if playerCurrentStunDelay currentPlayer /= 0
then w
else w
{ worldPlayer =
activatePlayerStun currentPlayer playerParams
, worldEnemies = stunEnemyIfClose <$> worldEnemies w
, stunCells = stunAffectedCells
}
…
All these lines return World objects! So we just need to wrap them as the final argument to unsafeSaveMove, adding the appropriate move choice number. The strange part is that we cannot move AND stun at the same time when playing as a human. So using the stun will always record a 9, which means stunning while standing still. Here are the updates:
inputHandler :: Event -> World -> World
inputHandler event w
...
| otherwise = case event of
(EventKey (SpecialKey KeyUp) Down (Modifiers _ _ Down) _) ->
unsafeSaveMove 0 w $
drillLocation upBoundary breakUpWall breakDownWall w
(EventKey (SpecialKey KeyUp) Down _ _) ->
unsafeSaveMove 0 w $ updatePlayerMove upBoundary
(EventKey (SpecialKey KeyDown) Down (Modifiers _ _ Down) _) ->
unsafeSaveMove 2 w $
drillLocation downBoundary breakDownWall breakUpWall w
(EventKey (SpecialKey KeyDown) Down _ _) ->
unsafeSaveMove 2 w $ updatePlayerMove downBoundary
(EventKey (SpecialKey KeyRight) Down (Modifiers _ _ Down) _) ->
unsafeSaveMove 1 w $
drillLocation rightBoundary breakRightWall breakLeftWall w
(EventKey (SpecialKey KeyRight) Down _ _) ->
unsafeSaveMove 1 w $ updatePlayerMove rightBoundary
(EventKey (SpecialKey KeyLeft) Down (Modifiers _ _ Down) _) ->
unsafeSaveMove 3 w $
drillLocation leftBoundary breakLeftWall breakRightWall w
(EventKey (SpecialKey KeyLeft) Down _ _) ->
unsafeSaveMove 3 w $ updatePlayerMove leftBoundary
(EventKey (SpecialKey KeySpace) Down _ _) ->
if playerCurrentStunDelay currentPlayer /= 0
then w
else unsafeSaveMove 9 w $ w
{ worldPlayer =
activatePlayerStun currentPlayer playerParams
, worldEnemies = stunEnemyIfClose <$> worldEnemies w
, stunCells = stunAffectedCells
}
…
And now whenever we play the game, it will save our moves! Keep in mind, though, that it takes a lot of training data to get good results with supervised learning. I played for an hour and got around 10,000 data points. We'll see if this is enough!
Saving AI Moves
While the game is at least a little fun, it's also exhausting to keep playing it just to generate data! So now let's consider how we can get the AI to play the game by itself and generate data. The first step is to reset the game automatically on winning or losing:
updateFunc :: Float -> World -> World
updateFunc _ w
  | (worldResult w == GameWon || worldResult w == GameLost) &&
    (usePlayerAI params) =
  ...
The rest will follow the other logic we have for resetting the game. Now we must examine where to insert our call to unsafeSaveMove. The answer is our updateWorldForPlayerMove function. We can see that we get the move (and our player's cached memory) as part of makePlayerMove:
updateWorldForPlayerMove :: World -> World
updateWorldForPlayerMove w = …
where
(move, memory) = makePlayerMove w
...
We'll want a quick function to convert our move into the number choice:
moveNumber :: PlayerMove -> Int
moveNumber (PlayerMove md useStun dd) =
let directionFactor = case (md, dd) of
(DirectionUp, _) -> 0
(_, DirectionUp) -> 0
(DirectionRight, _) -> 1
(_, DirectionRight) -> 1
(DirectionDown, _) -> 2
(_, DirectionDown) -> 2
(DirectionLeft, _) -> 3
(_, DirectionLeft) -> 3
_ -> 4
in if useStun then directionFactor + 5 else directionFactor
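To sanity-check this encoding, here's a self-contained sketch using simplified stand-ins for the game's types (assumptions for this example; the real MoveDirection and PlayerMove definitions may differ):

```haskell
-- Simplified stand-ins for the game's real types: a direction and a move
-- carrying a movement direction, a stun flag, and a drill direction.
data MoveDirection
  = DirectionUp
  | DirectionRight
  | DirectionDown
  | DirectionLeft
  | DirectionNone

data PlayerMove = PlayerMove MoveDirection Bool MoveDirection

-- Same encoding as in the article: 0-3 for the four directions,
-- 4 for standing still, +5 when the stun is used.
moveNumber :: PlayerMove -> Int
moveNumber (PlayerMove md useStun dd) =
  let directionFactor = case (md, dd) of
        (DirectionUp, _) -> 0
        (_, DirectionUp) -> 0
        (DirectionRight, _) -> 1
        (_, DirectionRight) -> 1
        (DirectionDown, _) -> 2
        (_, DirectionDown) -> 2
        (DirectionLeft, _) -> 3
        (_, DirectionLeft) -> 3
        _ -> 4
  in if useStun then directionFactor + 5 else directionFactor
```

For example, moving up without the stun encodes to 0, while stunning in place encodes to 9.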
Our saving function requires a pass-through world parameter. So we'll do the saving on our first new World calculation, which comes from modifyWorldForPlayerDrill:
updateWorldForPlayerMove :: World -> World
updateWorldForPlayerMove w = …
where
(move, memory) = makePlayerMove w
worldAfterDrill = unsafeSaveMove (moveNumber move) w
(modifyWorldForPlayerDrill …)
...
And that's all! Now our AI will play the game by itself, gathering data for hours on end if we like! We'll get different data for different cases, such as 4 enemies and 4 drills, 8 enemies and 5 drills, and so on. This is much faster and easier than playing the game ourselves! It will gather 12,000-15,000 data points an hour if we let it!
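Once the file grows, it's worth checking that the recorded moves aren't heavily skewed toward one choice. Here's a sketch of a hypothetical moveDistribution helper (an assumption, not part of the game code) that tallies the last column across the lines of moves.csv:

```haskell
import Data.List (group, sort)

-- Hypothetical helper: given the lines of moves.csv, count how often
-- each move choice (the last column of each line) was recorded.
moveDistribution :: [String] -> [(Int, Int)]
moveDistribution ls =
  let lastField l = last (words (map (\c -> if c == ',' then ' ' else c) l))
      moves = map (round . (read :: String -> Double) . lastField) ls
  in map (\g -> (head g, length g)) (group (sort moves))
```

If one move choice dominates, the supervised learner will likely over-predict it, so a roughly balanced spread is a good sign.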
Conclusion
With a little bit of persistence, we can now get a lot of data for the decisions a smarter agent will make. Next week, we'll take the data we've acquired and use it to write a supervised learning algorithm! Instead of using Q-learning, we'll make the weights reflect the decisions that we (or the AI) would make.
Supervised learning is not without its pitfalls! It won't necessarily perform optimally; it will only perform as well as its training data. So even if we're successful, our algorithm will replicate our own mistakes! It'll be interesting to see how this plays out, so stay tuned!
For more information on using Haskell in AI, take a look at our Haskell AI Series. Plus, download our Haskell Tensor Flow Guide to learn more about using this library!