Making a Learning Model

Sep 2

Last week we took a few more steps towards using machine learning to improve the player AI for our maze game. We saw how to vectorize the input and output data for our world state and moves. This week, we'll finally start seeing how to use these in the larger context of a Tensor Flow program. We'll make a model for a super basic neural network that will apply the technique of Q-Learning.

Our machine learning code will live in a separate repository from the primary game code. Be sure to check that out here! The first couple weeks of this part of the series will use the basic-trainer branch.

This week, we'll finally start diving into using Haskell with Tensor Flow. Be sure to read our Haskell AI Series to learn more about this! You can also download our Haskell Tensor Flow guide to learn the basics of the library.

Model Basics

This week's order of business will be to build a Tensor Flow graph that can make decisions in our maze game. The graph should take a serialized world state as an input, and then produce a distribution of scores. These scores correspond to the different moves we can make.

Recalling from last week, the input to our model will be a 1x8 vector, and the output will be a 10x1 vector. For now, then, we'll represent our model with a single variable tensor that will be a matrix of size 8x10. We'll get the output by multiplying the inputs by the weights.
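
To make the arithmetic concrete, here is a minimal sketch of the same multiplication using plain Haskell lists instead of Tensor Flow tensors. The name applyWeights and the example data are hypothetical and only for illustration; they are not part of the game code. The point is that eight inputs times an 8x10 matrix gives ten scores, one per move.

```haskell
-- Plain-list sketch of the model's forward pass (no Tensor Flow involved).
-- `applyWeights` is a hypothetical helper, not part of the game code.
import Data.List (transpose)

-- | Multiply a row vector (8 entries) by a weight matrix (8 rows, 10 columns).
-- The result has one score per column of the matrix.
applyWeights :: [Float] -> [[Float]] -> [Float]
applyWeights input weights =
  [ sum (zipWith (*) input column) | column <- transpose weights ]

-- Example input: a one-hot vector, which simply selects one row of the weights.
exampleInput :: [Float]
exampleInput = [0, 0, 1, 0, 0, 0, 0, 0]

-- Example weights: entry (row, col) holds the value row * 10 + col.
exampleWeights :: [[Float]]
exampleWeights =
  [ [ fromIntegral (row * 10 + col) | col <- [0 .. 9 :: Int] ]
  | row <- [0 .. 7 :: Int]
  ]
```

With this example data, applyWeights exampleInput exampleWeights picks out the third row of the matrix, so the ten scores run from 20 to 29.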

Ultimately, there are three things we need to access from this model.

The final weights
A step to iterate the world
A step to train our model and adjust the weights.

Here's what the model looks like, using Tensor Flow types:

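data Model = Model
  { weightsT :: Variable Float
    -- ^ The weight matrix we are training
  , iterateWorldStep :: TensorData Float -> Session (Vector Float)
    -- ^ Serialized world state in, move scores out
  , trainStep :: TensorData Float -> TensorData Float -> Session ()
    -- ^ Serialized world state and expected scores in, weights updated
  }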

The first element is the variable tensor for our weights. We need to expose this so we can output the final weights at the end. The second element is a function that will take in a serialized world state and produce the scores for the output move. Then the third element will take both a serialized world state AND some expected values. It will update the variable tensor as part of the Q-Learning process. Next week, we'll write iteration functions in the Session monad. They'll use these last two elements.

Building the Iterate Step

To make these Tensor Flow items, we'll also need to use the Session monad. Let's start a basic function to build up our model:

buildModel :: Session Model
buildModel = do
  ...

To start, let's make a variable for our weights. At the start, we'll randomize them with truncatedNormal and then make that into a Variable:

buildModel :: Session Model
buildModel = do
  (initialWeights :: Tensor Value Float) <-
    truncatedNormal (vector [8, 10])
  (weights :: Variable Float) <- initializedVariable initialWeights

Now let's build the items for running our iterate step. This first involves taking the inputs as a placeholder. Remember, the inputs come from the vectorization of the world state.

Then to produce our output, we'll multiply the inputs by our weights. The result is a Build tensor, so we need to render it to use it in the next part. As an extra note, we need readValue to turn our Variable into a Tensor we can use in operations.

buildModel :: Session Model
buildModel = do
  (initialWeights :: Tensor Value Float) <-
    truncatedNormal (vector [8, 10])
  (weights :: Variable Float) <- initializedVariable initialWeights
  (inputs :: Tensor Value Float) <- placeholder (Shape [1,8])
  let (allOutputs :: Tensor Build Float) =
        inputs `matMul` (readValue weights)
  returnedOutputs <- render allOutputs
  ...

The next part is to create a step to "run" the outputs. Since the outputs depend on a placeholder, we need to create a feed for the input. Then we can create a runnable Session action with runWithFeeds. This gives us the second element of our Model, the iterateWorldStep field. We'll call it iterateStep inside this function.

buildModel :: Session Model
buildModel = do
  ...
  let iterateStep = \inputFeed ->
        runWithFeeds [feed inputs inputFeed] returnedOutputs
  ...
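
Once the iterate step hands back a list of scores, something has to turn those scores into a move. This post doesn't spell out that rule, so the sketch below assumes the simplest one: pick the index of the highest score. The name bestMoveIndex is hypothetical, and it works on a plain list rather than the Vector the model returns.

```haskell
-- Sketch of greedy move selection over the model's scores.
-- Assumption: the move with the highest score is the one we take.
import Data.List (maximumBy)
import Data.Ord (comparing)

-- | Index of the highest score, or Nothing when there are no scores.
bestMoveIndex :: [Float] -> Maybe Int
bestMoveIndex [] = Nothing
bestMoveIndex scores =
  Just (fst (maximumBy (comparing snd) (zip [0 ..] scores)))
```

For example, bestMoveIndex [0.1, 0.9, 0.3] gives Just 1. Mapping that index back to an actual game move depends on the move vectorization from last week.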

Using Q-Learning in the Model

This gives us what we need to run our basic AI and make moves in the game. But we still need to apply some learning mechanism to update the weights!

We want to use Q-Learning. This means we'll compare the output of our model with the next output from continuing to step through the world. So first let's introduce another placeholder for these new outputs:

buildModel :: Session Model
buildModel = do
  initialWeights <- ...
  weights <- ...
  inputs <- ...
  returnedOutputs <- ...

  let iterateStep = ...

  -- Next set of outputs
  (nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
  ...
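
For readers new to Q-Learning, here is a rough sketch of how a "next" output is typically built in the textbook version of the algorithm. This is an assumption about the general technique, not a description of this series' code, which arrives next week. The names qTarget and targetScores, along with the discount and reward parameters, are all hypothetical.

```haskell
-- Textbook Q-Learning target, sketched with plain lists.
-- All names and parameters here are illustrative assumptions.

-- | Reward for the move we took, plus the discounted best score available
-- from the next state. With no next scores (a finished game), the target
-- is just the reward.
qTarget :: Float -> Float -> [Float] -> Float
qTarget discount reward nextScores
  | null nextScores = reward
  | otherwise = reward + discount * maximum nextScores

-- | Copy the current scores, replacing only the entry for the chosen move
-- with the target value. The other entries are left as the model gave them.
targetScores :: Int -> Float -> [Float] -> [Float]
targetScores moveIndex target scores =
  [ if i == moveIndex then target else s
  | (i, s) <- zip [0 ..] scores
  ]
```

The idea is that only the chosen move's score gets corrected, so the difference between the target list and the model's own output is zero everywhere else.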

Now we'll define our "loss" function. That is, we'll find the squared difference between our real output and the "next" output. Next week we'll see that the "next" output uses extra information about the game. This will allow us to bring an element of "truth" that we can learn from.

buildModel :: Session Model
buildModel = do
  ...
  returnedOutputs <- ...

  -- Q-Learning Section
  (nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
  let (diff :: Tensor Build Float) = nextOutputs `sub` allOutputs
  let (loss :: Tensor Build Float) = reduceSum (diff `mul` diff)
  ...
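
To see what this loss computes, here is the same sum of squared differences written over plain lists. The name squaredLoss is hypothetical, and the sketch assumes both lists hold one score per move in the same order.

```haskell
-- Plain-list version of the loss: sum of squared differences.
-- Assumption: both lists have the same length and the same move ordering.

-- | Subtract entry by entry, square each difference, and add them up.
squaredLoss :: [Float] -> [Float] -> Float
squaredLoss nextScores actualScores =
  sum [ (n - a) * (n - a) | (n, a) <- zip nextScores actualScores ]
```

For example, squaredLoss [1, 2, 3] [1, 0, 5] is 0 + 4 + 4 = 8. Note that the list version pairs entries by position. In the graph, sub follows Tensor Flow's broadcasting rules, so it is worth checking that both tensors have the same shape. A 1x10 tensor and a 10x1 tensor would broadcast to 10x10 rather than pair up entry by entry.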

Now, we'll make a final ControlNode using minimizeWith. This will minimize the loss function using the adam optimizer. We'll pass weights in the list of variables, since that is the variable we want the optimizer to update.

buildModel :: Session Model
buildModel = do
  ...
  returnedOutputs <- ...

  -- Q-Learning Section
  (nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
  let (diff :: Tensor Build Float) = nextOutputs `sub` allOutputs
  let (loss :: Tensor Build Float) = reduceSum (diff `mul` diff)
  (trainer_ :: ControlNode) <- minimizeWith adam loss [weights]

Finally, we'll make our training step, which will run the training node on two input feeds: one for the world input, and one for the expected output. Then we can return our completed model.

buildModel :: Session Model
buildModel = do
  ...
  inputs <- ...
  weights <- ...
  returnedOutputs <- ...
  let iterateStep = ...

  -- Q-Learning Section
  (nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
  let diff = ...
  let loss = ...
  (trainer_ :: ControlNode) <- minimizeWith adam loss [weights]
  let trainingStep = \inputFeed nextOutputFeed -> runWithFeeds
        [ feed inputs inputFeed
        , feed nextOutputs nextOutputFeed
        ]
        trainer_
  return $ Model
    weights
    iterateStep
    trainingStep

Conclusion

Now we've got our machine learning model. We have different functions that can iterate on our world state as well as train the outputs of our graph. Next week, we'll see how to combine these steps within the Session monad. Then we can start running training iterations and produce results.

If you want to follow along with these code examples, make sure to download our Haskell Tensor Flow Guide! This library is quite tricky to use. There are a lot of secondary dependencies for it. So you won't want to go in trying to use it blind!

GamesAI

James Bowen
