Ownership: Managing Memory in Rust
When we first discussed Rust, we mentioned that it has a different memory model than Haskell. The suggestion was that Rust allows more control over memory usage, like C++. In C++, we explicitly allocate memory on the heap with new and de-allocate it with delete. Rust also allocates and de-allocates memory at specific points in our program, so it doesn't have garbage collection the way Haskell does. But it doesn't work quite the same way as C++.
In this article, we'll discuss the notion of ownership. This is the main concept governing Rust's memory model. Heap memory always has one owner, and once that owner goes out of scope, the memory gets de-allocated. We'll see how this works; if anything, it's a bit easier than C++!
For a more detailed look at getting started with Rust, take a look at our Rust video tutorial!
Scope (with Primitives)
Before we get into ownership, there are a couple ideas we want to understand. First, let's go over the idea of scope. If you code in Java, C, or C++, this should be familiar. We declare variables within a certain scope, like a for-loop or a function definition. When that block of code ends, the variable is out of scope. We can no longer access it.
int main() {
for (int i = 0; i < 10; ++i) {
int j = 0;
// Do something with j...
}
// This doesn't work! j is out of scope!
std::cout << j << std::endl;
}
Rust works the same way. When we declare a variable within a block, we cannot access it after the block ends. (In a language like Python, this is actually not the case!)
fn main() {
let j: i32 = {
let i = 14;
i + 5
};
// i is out of scope. We can't use it anymore.
println!("{}", j);
}
Another important thing to understand about primitive types is that we can copy them. Since they have a fixed size, and live on the stack, copying should be inexpensive. Consider:
fn main() {
let mut i: i32 = 10;
let j = i;
i = 15;
// Prints 15, 10
println!("{}, {}", i, j);
}
The j variable is a full copy. Changing the value of i doesn't change the value of j. Now for the first time, let's talk about a non-primitive type, String.
The String Type
We've dealt with strings a little by using string literals. But string literals don't give us a complete string type. They have a fixed size. So even if we declare them as mutable, we can't do certain operations, like appending another string. This would change how much memory they use!
let mut my_string = "Hello";
my_string.append(" World!"); // << This doesn't exist for literals!
Instead, we can use the String type. This is a non-primitive object type that will allocate memory on the heap. Here's how we can use it and append to one:
let mut my_string = String::from("Hello");
my_string.push_str(" World!");
Now let's consider some of the implications of scope with object types.
Scope with Strings
At a basic level, some of the same rules apply. If we declare a string within a block, we cannot access it after that block ends.
fn main() {
let str_length = {
let s = String::from("Hello");
s.len()
}; // s goes out of scope here
// Fails!
println!("{}", s);
}
What's cool is that once our string does go out of scope, Rust handles cleaning up the heap memory for it! We don't need to call delete as we would in C++. We define memory cleanup for an object by implementing the drop function. We'll get into more details on this in a later article.
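As a quick preview, here's a minimal sketch of what that looks like. The Droppable type and its print statement are invented for illustration; the Drop trait itself comes from the standard library.

struct Droppable {
    name: String,
}

// Implementing the Drop trait tells Rust what to run right before
// the memory for a Droppable gets freed.
impl Drop for Droppable {
    fn drop(&mut self) {
        println!("Cleaning up {}!", self.name);
    }
}

fn main() {
    let d = Droppable { name: String::from("my object") };
    println!("Using {}", d.name);
} // << d goes out of scope here, so "Cleaning up my object!" prints automatically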
C++ doesn't automatically de-allocate for us! In this example, we must delete myObject at the end of the for loop body. We can't de-allocate it after the loop, since it's out of scope there, so forgetting the delete would leak memory!
int main() {
for (int i = 0; i < 10; ++i) {
// Allocate myObject
MyType* myObject = new MyType(i);
// Do something with myObject …
// We MUST delete myObject here or it will leak memory!
delete myObject;
}
// Can't delete myObject here!
}
So it's neat that Rust handles deletion for us. But there are some interesting implications of this.
Copying Strings
What happens when we try to copy a string?
let len = {
let s1 = String::from("Hello");
let s2 = s1;
s2.len()
};
This first version works fine. But we have to think about what will happen in this case:
let len = {
let mut s1 = String::from("123");
let mut s2 = s1;
s1.push_str("456");
s1.len() + s2.len()
};
For people coming from C++ or Java, there seem to be two possibilities. If copying into s2 is a shallow copy, we would expect the sum of the lengths to be 12. If it's a deep copy, the sum should be 9.
But this code won't compile at all in Rust! The reason is ownership.
Ownership
Deep copies are often much more expensive than the programmer intends. So a performance-oriented language like Rust avoids deep copying by default. But let's think about what would happen if the example above were a simple shallow copy. When s1 and s2 go out of scope, Rust will call drop on both of them. And they will free the same memory! This kind of "double delete" is a big problem that can crash your program and cause security problems.
In Rust, here's what happens with the above code. Using let s2 = s1 does a shallow copy, so s2 will point to the same heap memory. But at the same time, it invalidates the s1 variable. Thus when we try to push values onto s1, we're using an invalid variable. This causes the compiler error.
At first, s1 "owns" the heap memory. So when s1 goes out of scope, it will free the memory. But declaring s2 hands ownership of that memory over to s2. So s1 is now invalid. Memory can only have one owner. This is the main idea to get familiar with.
Here's an important implication of this. In general, passing a variable to a function gives up ownership. In this example, after we pass s1 over to add_to_length, we can no longer use it.
fn main() {
let s1 = String::from("Hello");
let length = add_to_length(s1);
// This is invalid! s1 was moved into add_to_length!
println!("{}", s1);
}
// After this function, drop is called on s
// This deallocates the memory!
fn add_to_length(s: String) -> i32 {
5 + s.len()
}
This seems like it would be problematic. Won't we want to call different functions with the same variable as an argument? We could work around this by passing ownership back through the return value. This requires the function to return a tuple.
fn main() {
let s1 = String::from("Hello");
let (length, s2) = add_to_length(s1);
// Works
println!("{}", s2);
}
fn add_to_length(s: String) -> (i32, String) {
(5 + s.len(), s)
}
But this is cumbersome. There's a better way.
Borrowing References
Like in C++, we can pass a variable by reference. We use the ampersand operator (&) for this. It allows another function to "borrow" ownership, rather than "taking" ownership. When it's done, the original variable will still be valid. In this example, the s1 variable re-takes ownership of the memory after the function call ends.
fn main() {
let s1 = String::from("Hello");
let length = add_to_length(&s1);
// Works
println!("{}", s1);
}
fn add_to_length(s: &String) -> i32 {
5 + s.len()
}
This works like a const reference in C++. If you want a mutable reference, you can do this as well. The original variable must be mutable, and then you specify mut in the type signature.
fn main() {
let mut s1 = String::from("Hello");
let length = add_to_length(&mut s1);
// Prints "Hello, World!"
println!("{}", s1);
}
fn add_to_length(s: &mut String) -> i32 {
s.push_str(", World!");
5 + s.len()
}
There's one big catch though! You can only have one mutable reference to a piece of data at a time, and no other references can exist alongside it! Otherwise your code won't compile! This helps prevent a large category of bugs!
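Here's a small sketch (the variable names are just for illustration) of the kind of code the compiler rejects:

fn main() {
    let mut s = String::from("Hello");
    let r1 = &mut s;
    let r2 = &mut s; // << Error! s is already mutably borrowed by r1.
    println!("{}, {}", r1, r2);
}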
As a final note, if you want to do a true deep copy of an object, you should use the clone function.
fn main() {
let s1 = String::from("Hello");
let s2 = s1.clone();
// Works!
println!("{}", s1);
println!("{}", s2);
}
Notes On Slices
We can wrap up with a couple thoughts on slices. Slices give us an immutable, fixed-size reference to a contiguous part of an array. Often, we use the string slice type str to refer to a portion of a String object. A slice is itself a small, fixed-size value on the stack that refers into data owned by something else. This means slices do not have ownership, and thus do not de-allocate any memory when they go out of scope.
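As a rough sketch of the idea, both string slices and array slices just borrow a portion of data that something else owns:

fn main() {
    let s = String::from("Hello World");
    let hello: &str = &s[0..5]; // Borrows part of the String; no new allocation
    let nums = [1, 2, 3, 4, 5];
    let middle = &nums[1..4]; // An array slice: [2, 3, 4]
    println!("{} {:?}", hello, middle);
} // s frees its heap memory here; the slices themselves free nothing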
What's Next?
Hopefully this gives you a better understanding of how memory works in Rust! Next time, we'll start digging into how we can define our own types. We'll start seeing some more ways that Rust acts like Haskell!
Digging Into Rust's Syntax
Last time we kicked off our study of Rust with a quick overview comparing it with Haskell. In this article, we'll start getting familiar with some of the basic syntax of Rust. The initial code looks a bit more C-like. But we'll also see how functional principles like those in Haskell are influential!
For a more comprehensive guide to starting out with Rust, take a look at our Rust video tutorial!
Hello World
As we should with any programming language, let's start with a quick "Hello World" program.
fn main() {
println!("Hello World!");
}
Immediately, we can see that this looks more like a C++ program than a Haskell program. We can call a print statement without any mention of the IO monad. We see braces used to delimit the function body, and a semicolon at the end of the statement. If we wanted, we could add more print statements.
fn main() {
println!("Hello World!");
println!("Goodbye!");
}
There's nothing in the type signature of this main function. But we'll explore more further down.
Primitive Types and Variables
Before we can start getting into type signatures though, we need to understand types more! In another nod to C++ (or Java), Rust distinguishes between primitive types and other more complicated types. We'll see that type names are a bit more abbreviated than in other languages. The basic primitives include:
- Various sizes of integers, signed and unsigned (i32, u8, etc.)
- Floating point types f32 and f64
- Booleans (bool)
- Characters (char). Note these can represent Unicode scalar values (i.e. beyond ASCII)
We mentioned last time how memory matters more in Rust. The main distinction between primitives and other types is that primitives have a fixed size. This means they are always stored on the stack. Other types with variable size must go into heap memory. We'll see next time what some of the implications of this are.
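For instance, we can ask the compiler for the exact size of each primitive with std::mem::size_of. The sizes below are fixed by the language, so this is just a quick way to see the point:

fn main() {
    println!("i32: {} bytes", std::mem::size_of::<i32>());   // 4
    println!("u8: {} bytes", std::mem::size_of::<u8>());     // 1
    println!("f64: {} bytes", std::mem::size_of::<f64>());   // 8
    println!("bool: {} bytes", std::mem::size_of::<bool>()); // 1
    println!("char: {} bytes", std::mem::size_of::<char>()); // 4 (a Unicode scalar value)
}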
Like "do-syntax" in Haskell, we can declare variables using the let
keyword. We can specify the type of a variable after the name. Note also that we can use string interpolation with println
.
fn main() {
let x: i32 = 5;
let y: f64 = 5.5;
println!("X is {}, Y is {}", x, y);
}
So far, very much like C++. But now let's consider a couple Haskell-like properties. While variables are statically typed, it is typically unnecessary to state the type of the variable. This is because Rust has type inference, like Haskell! This will become more clear as we start writing type signatures in the next section. Another big similarity is that variables are immutable by default. Consider this:
fn main() {
let x: i32 = 5;
x = 6;
}
This will throw an error! Once x gets assigned its value, we can't assign another! We can change this behavior though by specifying the mut (mutable) keyword. This works in a simple way with primitive types. But as we'll see next time, it's not so simple with others! The following code compiles fine:
fn main() {
let mut x: i32 = 5;
x = 6;
}
Functions and Type Signatures
When writing a function, we specify parameters much like we would in C++. We have type signatures and variable names within the parentheses. Specifying the types on your signatures is required. This allows type inference to do its magic on almost everything else. In this example, we no longer need any type signatures in main. It's clear from calling printNumbers what x and y are.
fn main() {
let x = 5;
let y = 7;
printNumbers(x, y);
}
fn printNumbers(x: i32, y: i32) {
println!("X is {}, Y is {}", x, y);
}
We can also specify a return type using the arrow operator ->. Our functions so far have no return value. This means the actual return type is (), like the unit type in Haskell. We can include it if we want, but it's optional:
fn printNumbers(x: i32, y: i32) -> () {
println!("X is {}, Y is {}", x, y);
}
We can also specify a real return type though. Note that there's no semicolon here! This is important!
fn add(x: i32, y: i32) -> i32 {
x + y
}
This is because a value should get returned through an expression, not a statement. Let's understand this distinction.
Statements vs. Expressions
In Haskell, most of our code is expressions. They inform our program what a function "is", rather than giving a set of steps to follow. But when we use monads, we often use something like statements in do syntax.
addExpression :: Int -> Int -> Int
addExpression x y = x + y
addWithStatements :: Int -> Int -> IO Int
addWithStatements x y = do
putStrLn "Adding: "
print x
print y
return $ x + y
Rust has both these concepts. But it's a little more common to mix in statements with your expressions in Rust. Statements do not return values. They end in semicolons. Assigning variables with let and printing are statements.
Expressions return values. Function calls are expressions. Blocks enclosed in braces are expressions. Here's our first example of an if expression. Notice how we can still use statements within the blocks, and how we can assign the result of the function call:
fn main() {
let x = 45;
let y = branch(x);
}
fn branch(x: i32) -> i32 {
if x > 40 {
println!("Greater");
x * 2
} else {
x * 3
}
}
Unlike Haskell, it is possible to have an if expression without an else branch. But this wouldn't work in the above example, since we need a return value! As in Haskell, all branches need to have the same type. If the branches only have statements, that type can be ().
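For example, this sketch compiles, because the if block and its implicit else both have type (). Trying to produce a value without an else branch, as in the commented line, would not compile:

fn main() {
    let x = 45;
    // Fine: both branches (including the implicit else) have type ()
    if x > 40 {
        println!("Greater");
    }
    // let y = if x > 40 { x * 2 }; // << Error: the if branch has type i32, the missing else has type ()
}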
Note that an expression can become a statement by adding a semicolon! The following no longer compiles! Rust thinks the block has no return value, because it only has a statement! By removing the semicolon, the code will compile!
fn add(x: i32, y: i32) -> i32 {
x + y; // << Need to remove the semicolon!
}
This behavior is very different from both C++ and Haskell, so it takes a little bit to get used to it!
Tuples, Arrays, and Slices
Like Haskell, Rust has simple compound types like tuples and arrays (vs. lists for Haskell). These arrays are more like static arrays in C++ though. This means they have a fixed size. One interesting effect of this is that arrays include their size in their type. Tuples meanwhile have similar type signatures to Haskell:
fn main() {
let my_tuple: (u32, f64, bool) = (4, 3.14, true);
let my_array: [i8; 3] = [1, 2, 3];
}
Arrays and tuples composed of primitive types are themselves primitive! This makes sense, because they have a fixed size.
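Here's a quick sketch mirroring the earlier i32 example. Since an [i32; 3] array is itself primitive, assigning it copies all of the elements:

fn main() {
    let mut a = [1, 2, 3];
    let b = a; // A full copy; both arrays remain usable
    a[0] = 10;
    // Prints [10, 2, 3] and [1, 2, 3]
    println!("{:?} {:?}", a, b);
}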
Another concept relating to collections is the idea of a slice. This allows us to look at a contiguous portion of an array. Slices use the & operator though. We'll understand why more after the next article!
fn main() {
let an_array = [1, 2, 3, 4, 5];
let a_slice = &an_array[1..4]; // Gives [2, 3, 4]
}
What's Next
We've now got a foothold with the basics of Rust syntax. Next time, we'll start digging deeper into more complicated types. We'll discuss types that get allocated on the heap. We'll also learn the important concept of ownership that goes along with that.
Get Ready for Rust!
I'm excited to announce that for the next few weeks, we'll be exploring the Rust language! Rust is a very interesting language to compare to Haskell. It has some similar syntax. But it is not as similar as, say, Elm or Purescript. Rust can also look a great deal like C++. And its similarities with C++ are where a lot of its strong points lie.
In these next few weeks we'll go through some of the basics of Rust. We'll look at things like syntax and building small projects. In this article, we'll do a brief high-level comparison between Haskell and Rust. Next time, we'll start digging deeper into some actual code.
To get a jump start on your Rust development, take a look at our Starting out with Rust video tutorial!
Why Rust?
Rust has a few key differences that make it better suited than Haskell for certain tasks. One of the biggest is that Rust gives more control over the allocation of memory in one's program.
Haskell is a garbage collected language. The programmer does not control when items get allocated or deallocated. Every so often, your Haskell program will stop completely. It will go through all the allocated objects, and deallocate ones which are no longer needed. This simplifies our task of programming, since we don't have to worry about memory. It helps enable language features like laziness. But it makes the performance of your program a lot less predictable.
I once proposed that Haskell's type safety makes it good for safety-critical programs. There's still some substance to this idea. But the specific example I suggested was a self-driving car, a complex real-time system. The performance unknowns of Haskell make it a poor choice for such real-time systems.
With more control over memory, a programmer can make stronger guarantees about performance. One could assert that a program never uses too much memory. And they'll also have the confidence that it won't pause mid-calculation. Beyond this, Rust is also built to be more performant in general. It strives to be like C/C++, perhaps the most performant of all mainstream languages.
Rust is also currently more popular with programmers. A larger community brings certain advantages, like a broader ecosystem of packages. Companies are more likely to use Rust than Haskell since it will be easier to recruit engineers. It's also a bit easier to bring engineers from non-functional backgrounds into Rust.
Similarities
That said, Rust still has a lot in common with Haskell! Both languages embrace strong type systems. They view the compiler as a key element in testing the correctness of our programs. Both embrace useful features like sum types, typeclasses (traits, in Rust), polymorphism, and type inference. Both languages also use immutability to make it easier to write correct programs.
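As a rough preview (the Shape and HasArea names here are invented for illustration), here is how a few of those shared ideas look in Rust: enums play the role of sum types, traits play the role of typeclasses, and generic functions give us polymorphism.

// A sum type, much like a Haskell data declaration
enum Shape {
    Circle(f64),
    Rectangle(f64, f64),
}

// A trait, playing a role similar to a Haskell typeclass
trait HasArea {
    fn area(&self) -> f64;
}

impl HasArea for Shape {
    fn area(&self) -> f64 {
        match self {
            Shape::Circle(r) => 3.14159 * r * r,
            Shape::Rectangle(w, h) => w * h,
        }
    }
}

// A generic (polymorphic) function constrained by the trait
fn print_area<T: HasArea>(shape: &T) {
    println!("{}", shape.area());
}

fn main() {
    let c = Shape::Circle(1.0); // The literal 1.0 is inferred as f64; no annotations needed
    print_area(&c);
}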
What's Next?
Next time, we'll start digging into the language itself. We'll go over some basic examples that show some of the important syntactic points about Rust. We'll explore some of the cool ways in which Rust is like Haskell, but also some of the big differences.
Looking Ahead with More Steps!
In last week's article, we set ourselves up to make our agent use temporal difference learning. But TD is actually a whole family of potential learning methods we can use. They intertwine with other concepts in a bigger category of reinforcement learning algorithms.
In this article, we'll consider a couple possible TD approaches. We'll also examine a bit of theory surrounding other reinforcement learning topics.
For a more high level overview of using Haskell and AI, take a look at our Haskell AI Series! This series will also help you better grasp some of the basics of TensorFlow.
One Step Temporal Difference
Temporal difference learning has one general principle. The evaluation of the current game position should be similar to the evaluation of positions in the future. So at any given step we have our "current" evaluation. Then we have a "target" evaluation based on the future. We want to train our network so that the current board gets evaluated more like the target value.
We can see this in the way we defined our model. The tdTrainStep takes two different values: the target evaluation and the current evaluation.
data TDModel = TDModel
{ …
, tdTrainStep :: TensorData Float -> TensorData Float -> Session ()
}
And in fact, doing this calculation isn't so different from what we've done before. We'll take the difference between these evaluations, square it, and use reduceSum. This gives our loss function. Then we'll have TensorFlow minimize the loss function.
createTDModel :: Session TDModel
createTDModel = do
...
-- Train Model
targetEval <- placeholder (Shape [1])
currentEval <- placeholder (Shape [1])
let diff = targetEval `sub` currentEval
let loss = reduceSum (diff `mul` diff)
trainer <- minimizeWith
adam loss [hiddenWeights, hiddenBias, outputWeights, outputBias]
let trainStep = \targetEvalFeed currentEvalFeed ->
runWithFeeds [feed targetEval targetEvalFeed, feed currentEval currentEvalFeed] trainer
return $ TDModel
{ ...
, tdTrainStep = trainStep
}
Let's now recall how we got our target value last week. We looked at all our possible moves, and used them to advance the world one step. We then took the best outcome out of those, and that was our target value. Because we're advancing one step into the world, we call this "one-step" TD learning.
Adding More Steps
But there's no reason we can't look further into the future! We can consider what the game will look like in 2 moves, not just one move! To do this, let's generalize our function for stepping forward. It will be stateful over the same parameters as our main iteration function. But we'll call it in a way so that it doesn't affect our main values.
We'll make one change to our approach from last time. If the game is over in a resulting world, we'll immediately use the "correct" evaluation value. In our old approach, we would apply this later. Our new function will return the score from advancing the game, the game result, and the World at this step.
advanceWorldAndGetScore :: Float -> TDModel
-> StateT (World, StdGen) Session (Float, GameResult, World)
advanceWorldAndGetScore randomChance model = do
(currentWorld, gen) <- get
let allMoves = possibleMoves currentWorld
let newWorlds = fst <$> map ((flip stepWorld) currentWorld) allMoves
allScoresAndResults <- Data.Vector.fromList <$>
(forM newWorlds $ \w -> case worldResult w of
GameLost -> return (0.0, GameLost)
GameWon -> return (1.0, GameWon)
GameInProgress -> do
let worldData = encodeTensorData
(Shape [1, inputDimen]) (vectorizeWorld8 w)
scoreVector <- lift $ (tdEvaluateWorldStep model) worldData
return $ (Data.Vector.head scoreVector, GameInProgress))
let (chosenIndex, newGen) = bestIndexOrRandom
allScoresAndResults gen
put (newWorlds !! chosenIndex, newGen)
let (finalScore, finalResult) = allScoresAndResults ! chosenIndex
return $ (finalScore, finalResult, newWorlds !! chosenIndex)
where
-- Same as before, except with resultOrdering
bestIndexOrRandom :: Vector (Float, GameResult) -> StdGen
-> (Int, StdGen)
...
-- First order by result (Win > InProgress > Loss), then score
resultOrdering :: (Float, GameResult) -> (Float, GameResult)
-> Ordering
...
Now we'll call this from our primary iteration function. It seems a little strange. We unwrap the World from our state only to re-wrap it in another state call. But it will make more sense in a second!
runWorldIteration :: Float -> TDModel
-> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
(currentWorld, gen) <- get
((chosenNextScore, finalResult, nextWorld), (_, newGen)) <-
lift $ runStateT
(advanceWorldAndGetScore randomChance model)
(currentWorld, gen)
So at the moment, our code is still doing one-step temporal difference. But here's the key. We can now sequence our state action to look further into the future. We'll then get many values to compare for the score. Here's what it looks like for us to look two moves ahead and take the average of all the scores we get:
runWorldIteration :: Float -> TDModel
-> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
(currentWorld, gen) <- get
let numSteps = 2
let repeatedUpdates = sequence $ replicate numSteps
(advanceWorldAndGetScore randomChance model)
(allWorldResults, (_, newGen)) <- lift $
runStateT repeatedUpdates (currentWorld, gen)
let allScores = map (\(s, _, _) -> s) allWorldResults
let averageScore = sum allScores / fromIntegral (length allScores)
let nextScoreData = encodeTensorData
(Shape [1]) (Data.Vector.singleton averageScore)
...
When it comes to continuing the function though, we only consider the first world and result:
runWorldIteration :: Float -> TDModel
-> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
let (_, result1, nextWorld1) = Prelude.head allWorldResults
put (nextWorld1, newGen)
case result1 of
GameLost -> return False
GameWon -> return True
GameInProgress -> runWorldIteration randomChance model
We could take more steps if we wanted! We could also change how we get our target score. We could give more weight to near-future scores. Or we could give more weight to scores in the far future. These are all just parameters we can tune now. We can now refer to our temporal difference algorithm as "n-step", rather than 1-step.
Monte Carlo vs. Dynamic Programming
With different parameters, our TD approach can look like other common learning approaches. Dynamic Programming is an approach where we adjust our weights after each move in the game. We expect rewards for a particular state to be like those of near-future states. We use the term "bootstrapping" for "online" learning approaches like this. TD learning also applies bootstrapping.
However, dynamic programming requires that we have a strong model of the world. That is, we would have to know the probability of getting into certain states from our current state. This allows us to more accurately predict the future. We could apply this approach to our maze game on a small enough grid. But the model size would increase exponentially with the grid size and enemies! So our approach doesn't actually do this! We can advance the world with a particular move, but we don't have a comprehensive model of how the world works.
In this regard, TD learning is more like Monte Carlo learning. This algorithm is "model free". But it is not an online algorithm! We must play through an entire episode of the game before we can update the weights. We could take our "n-step" approach above, and play it out over the course of the entire game. If we then chose to provide the full weighting to the final evaluation, our model would be like Monte Carlo!
In general, the more steps we add to our TD approach, the more it approximates Monte Carlo learning. The fewer steps we have, the more it looks like dynamic programming.
TD Lambda
TD Gammon, the algorithm we mentioned last time, uses a variation of TD learning called "TD Lambda". It involves looking both forward in time as well as backwards. It observes that the best solutions lie between the extremes of one-step TD and Monte Carlo.
Academic literature can help give a more complete picture of machine learning. One great text is Reinforcement Learning, by Sutton and Barto. It's one of the authoritative texts on the topics we've discussed in this article!
What's Next
This concludes our exploration of AI within the context of our Maze Game. We'll come back to AI and Machine Learning again soon. Next week, we'll start tackling a new subject in the realm of functional programming, something we've never looked at before on this blog! Stay tuned!
Setting Up Our Model with Look-Ahead
Last week we went over some of the basics of Temporal Difference (TD) learning. We explored a bit of the history, and compared it to its cousin, Q-Learning. Now let's start getting some code out there. Since there's a lot in common with Q-Learning, we'll want a similar structure.
This is at least the third different model we've defined over the course of this series. So we can now start observing the patterns we see in developing these algorithms. Here's a quick outline, before we get started:
- Define the inputs and outputs of the system.
- Define the data model. This should contain the weight variables we are trying to learn. By including them in the model, we can output our results later. It should also contain important Session actions, such as training.
- Create the model.
- Run iterations using our model.
We'll follow this outline throughout the article!
If you're new to Haskell and machine learning, a lot of the code we write here won't make sense. You should start off a little easier with our Haskell AI Series. You should also download our Haskell Tensor Flow Guide.
Inputs and Outputs
For our world features, we'll stick with our hand-crafted feature set, but simplified. Recall that we selected 8 different features for every location our bot could move to. We'll stick with these 8 features. But we only need to worry about them for the current location of the bot. We'll factor in look-ahead by advancing the world for our different moves. So the "features" of adjacent squares are irrelevant. This vectorization is easy enough to get using produceLocationFeatures:
vectorizeWorld8 :: World -> V.Vector Float
vectorizeWorld8 w = V.fromList (fromIntegral <$>
[ lfOnActiveEnemy standStill
, lfShortestPathLength standStill
, lfManhattanDistance standStill
, lfEnemiesOnPath standStill
, lfNearestEnemyDistance standStill
, lfNumNearbyEnemies standStill
, lfStunAvailable standStill
, lfDrillsRemaining standStill
])
where
standStill = produceLocationFeatures
(playerLocation . worldPlayer $ w) w False
We also don't need to be as concerned about exploring the maze with this agent. We'll be defining what its possible moves are at every turn. This is a simple matter of using this function we have from our game:
possibleMoves :: World -> [PlayerMove]
We should also take this opportunity to specify the dimensions of our network. We'll use 20 hidden units:
inputDimen :: Int64
inputDimen = 8
hiddenDimen :: Int64
hiddenDimen = 20
outputDimen :: Int64
outputDimen = 1
Define the Model
Now let's define our data model. As in the past, we'll use a dense (fully-connected) neural network with one hidden layer. This means we'll expose two sets of weights and biases:
data TDModel = TDModel
{ tdHiddenWeights :: Variable Float
, tdHiddenBias :: Variable Float
, tdOutputWeights :: Variable Float
, tdOutputBias :: Variable Float
...
}
We'll also have two different actions to take with our tensor graph, as we had with Q-Learning. The first will be for evaluating a single world state. The second will take an expected score for the world state as well as the actual score for a world state. It will compare them and train our model:
data TDModel = TDModel
{ ...
, tdEvaluateWorldStep :: TensorData Float -> Session (Vector Float)
, tdTrainStep :: TensorData Float -> TensorData Float -> Session ()
}
Building the Model
Now we need to construct this model. We'll start off as always by initializing random variables for our weights and biases. We'll also make a placeholder for our world input:
createTDModel :: Session TDModel
createTDModel = do
(worldInputVector :: Tensor Value Float) <-
placeholder (Shape [1, inputDimen])
hiddenWeights <- truncatedNormal (vector [inputDimen, hiddenDimen])
>>= initializedVariable
hiddenBias <- truncatedNormal (vector [hiddenDimen])
>>= initializedVariable
outputWeights <- truncatedNormal (vector [hiddenDimen, outputDimen])
>>= initializedVariable
outputBias <- truncatedNormal (vector [outputDimen])
>>= initializedVariable
...
Each layer of our dense network consists of a matrix multiplication by the weights, and adding the bias vector. Between the layers, we'll apply relu activation. We conclude by running the output vector with an input feed:
createTDModel :: Session TDModel
createTDModel = do
...
let hiddenLayerResult = relu $
(worldInputVector `matMul` (readValue hiddenWeights))
`add` (readValue hiddenBias)
let outputLayerResult =
(hiddenLayerResult `matMul` (readValue outputWeights))
`add` (readValue outputBias)
let evaluateStep = \inputFeed -> runWithFeeds
[feed worldInputVector inputFeed] outputLayerResult
...
We'll leave the training step undefined for now. We'll work on that next time.
createTDModel :: Session TDModel
createTDModel = do
…
return $ TDModel
{ tdHiddenWeights = hiddenWeights
, tdHiddenBias = hiddenBias
, tdOutputWeights = outputWeights
, tdOutputBias = outputBias
, tdEvaluateWorldStep = evaluateStep
, tdTrainStep = undefined
}
Running World Iterations
Much of the skeleton and support code remains the same from Q-Learning. But let's go over the details of running a single iteration on one of our worlds. This function will take our model as a parameter, as well as a random move chance. (Recall that adding randomness to our moves will help us avoid a stagnant model.) It will be stateful over the World and a random generator.
runWorldIteration :: Float -> TDModel
-> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
...
We'll start off by getting all the possible moves from our current position. We'll step the world forward for each one of these moves. Then we'll feed the resulting worlds into our model. This will give us the scores for every move:
runWorldIteration :: Float -> TDModel
-> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
(currentWorld, gen) <- get
let allMoves = possibleMoves currentWorld
let newWorlds = fst <$> map ((flip stepWorld) currentWorld) allMoves
(allScores :: Vector Float) <-
Data.Vector.fromList <$> (forM newWorlds $ \w -> do
let worldData = encodeTensorData
(Shape [1, inputDimen]) (vectorizeWorld8 w)
scoreVector <- lift $ (tdEvaluateWorldStep model) worldData
return $ Data.Vector.head scoreVector)
...
Now we need to take a similar action to what we had with Q-Learning. We'll roll the dice, and either select the move with the best score, or we'll select a random index.
runWorldIteration :: Float -> TDModel
-> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
...
let (chosenIndex, newGen) = bestIndexOrRandom allScores gen
...
where
bestIndexOrRandom :: Vector Float -> StdGen -> (Int, StdGen)
bestIndexOrRandom scores gen =
let (randomMoveRoll, gen') = randomR (0.0, 1.0) gen
(randomIndex, gen'') = randomR (0, 1) gen'
in if randomMoveRoll < randomChance
then (randomIndex, gen'')
else (maxIndex scores, gen')
Now that we have our "chosen" move and its score, we'll encode that score as data to pass to the training step. The exception to this is if the game ends. In that case, we'll have a "true" score of 1 or 0 to give. While we're at it, we can also calculate the continuationAction. This is either returning a boolean for ending the game, or looping again.
runWorldIteration :: Float -> TDModel
-> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
...
let nextWorld = newWorlds !! chosenIndex
put (nextWorld, newGen)
let (chosenNextScore, continuationAction) =
case worldResult nextWorld of
GameLost -> (0.0, return False)
GameWon -> (1.0, return True)
GameInProgress -> ( allScores ! chosenIndex
, runWorldIteration randomChance model)
let nextScoreData = encodeTensorData
(Shape [1]) (Data.Vector.singleton chosenNextScore)
...
We'll also encode the evaluation of our current world. Then we'll pass these values to our training step, and run the continuation!
runWorldIteration :: Float -> TDModel
-> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
...
let currentWorldData = encodeTensorData
(Shape [1, inputDimen]) (vectorizeWorld8 currentWorld)
currentScoreVector <- lift $
(tdEvaluateWorldStep model) currentWorldData
let currentScoreData = encodeTensorData
(Shape [1]) currentScoreVector
lift $ (tdTrainStep model) nextScoreData currentScoreData
continuationAction
What's Next?
We've now got the basic framework set up for our TD agent. Next time, we'll start digging into the actual formula we use to learn the weights. It's a little more complicated than some of the previous loss functions we've dealt with in the past.
If you want to get started with Haskell and Tensor Flow, download our Haskell Tensor Flow Guide. It will help you learn the basics of this complicated library!
Temporal Difference Primer
Last week we finished our exploration of supervised learning with our maze game. We explored a more complex model that used convolution and pooling. This week, we're going back to "unsupervised" learning. We'll consider another approach that does not require the specification of "correct" outputs.
This approach is Temporal Difference Learning (TD Learning). It relies on having a function to evaluate a game position. Its main principle is that the current position should have a similar evaluation to positions in the near future.
Our evaluation function will use weights whose values our training program will learn. We'll want to learn these weights to minimize the difference between game evaluations. In this article, we'll take a high level look at this approach, before we get into the details next time.
History of TD Learning
The concept of TD learning was first developed in the 1980s. One of its more famous applications, in the 1990s, was learning an AI for Backgammon, called TD Gammon. This agent could play the game at an intermediate human level. It did this initially without any hand-crafted features or game-specific algorithms.
Getting to this level with a "knowledge free" algorithm was almost unheard of at the time. When providing hand-crafted features, the agent could then play at a near-expert level. It explored many possibilities that human players had written off. In doing so, it contributed new ideas to high level backgammon play. It was an important breakthrough in unsupervised techniques.
Q-Learning vs. TD Learning
A few weeks back, we explored Q-Learning. And at first glance, Q-Learning and TD learning might sound similar. But with temporal difference, we'll be learning a different function. In Q-Learning, we learned the Q function. This is a function that takes in our current game board and provides a score for each possible move. With TD, we'll be learning what we call the V function. This function is a direct evaluation of the current board.
With our game mechanics, our agent chooses between 10 different moves. So the "output" vector of our Q-Learning network had size 10. Now in temporal difference learning, we'll only output a single number. This will be an "evaluation", or score, of the current position.
If a game has more than 2 outcomes, you would want the evaluation function to give a score for each of them. But our game has a binary outcome, so one number is enough.
Basics
Despite this difference, our TensorFlow code will have a similar structure to Q-Learning. Here's a high level overview:
- Our model will take an arbitrary game state and produce a score.
- At each iteration, we will get the model's output score on all possible moves from that position. We'll account for enemy moves when doing this. We will then choose the move for the best resulting board.
- We will advance the world based on this move, and then pass the resulting world through our model again.
- Then, adjust the weights so that the evaluations of the new world and the original world are more similar.
- If the resulting world is either a "win" or a "loss", we'll use the correct value (1 or 0) as the evaluation. Otherwise, we'll use our evaluation function.
What's Next
Next time, we'll dig into more specifics. It will be a bit tricky to use an evaluation function for our game in conjunction with TensorFlow. But once we have that, we can get into the meatier parts of this algorithm. We'll see exactly what operations we need to train our agent.
To learn more about using Haskell with AI, read our Haskell AI Series! This series shows some of the unique ideas that Haskell can bring to the world of machine learning.
Enemies, Convolution and Pooling
Let's remember now that last week we were still keeping with the idea of supervised learning. We were trying to get our agent to reach the goal on an empty maze, using our own moves as a guide. Since the problem was so simple, we could make an agent that could navigate the maze.
But what happens if we introduce enemies into the maze? This will complicate the process. Our manual AI isn't actually that successful in this scenario. But recording our own moves provides a reasonable data set.
First, we need to update the features a little bit. Recall that we're representing our board so that every grid space has a single feature. In our old representation, we used a value of 100 to represent the target space, and 10 to represent our own space. We need to expand this representation. We want to know where the enemies are, if they're stunned, and if we still have our stun.
Let's use 25.0 to represent our position if we have the stun. If we don't have our stun available, we'll use 10.0 instead. For positions containing active enemies, we'll use -25.0. If the enemy is in the stunned state, we'll use -10.0.
vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = finalFeatures
where
initialGrid = V.fromList $ take 100 (repeat 0.0)
(px, py) = playerLocation (worldPlayer w)
(gx, gy) = endLocation w
playerLocIndex = (9 - py) * 10 + px
playerHasStun = playerCurrentStunDelay (worldPlayer w) == 0
goalLocIndex = (9 - gy) * 10 + gx
finalFeatures = initialGrid V.//
([ (playerLocIndex
, if playerHasStun then 25.0 else 10.0)
, (goalLocIndex, 100.0)] ++ enemyLocationUpdates)
enemyLocationUpdates = enemyPair <$> (worldEnemies w)
enemyPair e =
let (ex, ey) = enemyLocation e
in ( (9 - ey) * 10 + ex,
if enemyCurrentStunTimer e == 0 then -25.0 else -10.0)
Using our moves as a training set, we see some interesting results. It's difficult to get great results on our error rates. The data set is very prone to overfitting. We often end up with training error in the 20-30% range with a test error near 50%. And yet, our agent can consistently win the game! Our problem space is still a bit simplistic, but it is an encouraging result!
Convolution and Pooling
As suggested last week, the ideas of convolution and pooling could be useful on this feature set. Let's try it out, for experimentation purposes. We'll start with a function to create a layer in our network that does convolution and pooling.
buildConvPoolLayer :: Int64 -> Int64 -> Tensor v Float -> Build (Variable Float, Variable Float, Tensor Build Float)
buildConvPoolLayer inputChannels outputChannels input = do
weights <- truncatedNormal (vector weightsShape)
>>= initializedVariable
bias <- truncatedNormal (vector [outputChannels])
>>= initializedVariable
let conv = conv2D' convAttrs input (readValue weights)
`add` readValue bias
let results = maxPool' poolAttrs (relu conv)
return (weights, bias, results)
where
weightsShape :: [Int64]
weightsShape = [5, 5, inputChannels, outputChannels]
-- Create "attributes" for our operations
-- E.g. How far does convolution slide?
convStridesAttr = opAttr "strides" .~ ([1,1,1,1] :: [Int64])
poolStridesAttr = opAttr "strides" .~ ([1,2,2,1] :: [Int64])
poolKSizeAttr = opAttr "ksize" .~ ([1,2,2,1] :: [Int64])
paddingAttr = opAttr "padding" .~ ("SAME" :: ByteString)
dataFormatAttr = opAttr "data_format" .~ ("NHWC" :: ByteString)
convAttrs = convStridesAttr . paddingAttr . dataFormatAttr
poolAttrs = poolKSizeAttr . poolStridesAttr . paddingAttr . dataFormatAttr
The input to this layer will be our 10x10 "image" of a grid. The output will be 5x5x32. The "5" dimensions come from halving the board dimension. The "32" will come from the number of channels. We'll replace our first neural network layer with this layer:
createModel :: Build Model
createModel = do
let batchSize = -1
let (gridDimen :: Int64) = 10
let inputChannels = 1
let (convChannels :: Int64) = 32
let (nnInputSize :: Int64) = 5 * 5 * 32
(inputs :: Tensor Value Float) <-
placeholder [batchSize, moveFeatures]
(outputs :: Tensor Value Int64) <-
placeholder [batchSize]
let convDimens = [batchSize, gridDimen, gridDimen, inputChannels]
let inputAsGrid = reshape inputs (vector convDimens)
(convWeights, convBias, convResults) <-
buildConvPoolLayer inputChannels convChannels inputAsGrid
let nnInput = reshape
convResults (vector [batchSize, nnInputSize])
(nnWeights, nnBias, nnResults) <-
buildNNLayer nnInputSize moveLabels nnInput
(actualOutput :: Tensor Value Int64) <- render $
argMax nnResults (scalar (1 :: Int64))
We have a bit less success here. Our training is slower, and we get a similar error rate. But now our agent doesn't seem to win, unlike the agent from the pure, dense network. So it's not clear that we can actually treat this grid as an image and make any degree of progress.
What's Next?
Over the course of creating these different algorithms, we've discretized the game. We've broken it down into two phases: one where we move, and one where the enemies can move. This will make it easier for us to move back to our evaluation function approach. We can even try using multi-move look-ahead. This could lead us to a position where we can try temporal difference learning. Researchers first used this approach to train an agent to play Backgammon at a very high level.
Temporal difference learning is a very interesting concept. It's the last approach we'll try with our agent. We'll start out next time with an overview of TD learning before we jump into coding.
For another look at using Haskell with AI problems, read our Haskell and AI series! You can also download our Haskell Tensor Flow Guide for some more help with this library!
Different Feature Schemes
In last week's edition of our Maze AI series, we explored the mechanics of supervised learning. We took the training data we'd been building up and trained an agent on it. We had one set of data to make the AI follow our own human moves, and another to follow our hand-crafted AI. This wasn't particularly successful. The resulting agent had a lot of difficulty navigating the maze and using its stun at the right times.
This week, we'll explore a couple different ways we can expand the feature set. First, we'll try encoding the legality of moves in our feature set. Second, we'll try expanding the feature set to include more data about the grid. This will motivate some other approaches to the problem. We'll conclude by taking the specifics of grid navigation out. We'll let our agent go to work on an empty grid to validate that this is at least a reasonable approach.
For some more reading on using Haskell and AI, take a look at our Haskell AI Series. We explore some reasons why Haskell could be a good fit for AI and machine learning problems. It will also help you through some of the basics of using Haskell and Tensor Flow.
Encoding Legal Moves
Our supervised agent uses our current feature set. Let's remind ourselves what these features are. We have five different directions we can go (up, down, left, right, stand still). And in each of these directions, we calculate 8 different features.
- The maze distance to the goal
- The manhattan distance to the goal
- Whether the location contains an active enemy
- The number of enemies on the shortest path to the goal from that location
- The distance to the nearest enemy from that location
- The number of nearby enemies in manhattan distance terms
- Whether our stun is available
- The number of drills we have after the move
Some of these features are higher level. We do non-trivial calculations to figure them out. This gives our agent some idea of strategy. But there's not a ton of lower level information available! We zero out the features for a particular spot if it's past the world boundary. But we can't immediately tell from these features if a particular move is legal.
This is a big oversight. It's possible for our AI to learn about the legality of moves from the higher level training data. But it would take a lot more data and a lot more time.
So let's add a feature for how "easy" a move is. A value of 0 will indicate an illegal move, either past the world boundary or through a wall when we don't have a drill. A value of 1 will indicate a move that requires a drill. A value of 2 will indicate a normal move.
We'll add the extra feature into the LocationFeatures type. We'll also add an extra parameter to our produceLocationFeatures function. This boolean indicates whether a drill would be necessary. Note, we don't need to account for WorldBoundary. The value will get zeroed out in that case. We'll call this feature lfMoveEase, since a higher value indicates less effort.
data LocationFeatures = LocationFeatures
{ …
, lfMoveEase :: Int
}
produceLocationFeatures :: Location -> World -> Bool -> LocationFeatures
produceLocationFeatures location@(lx, ly) w needsDrill = LocationFeatures
…
moveEase
where
moveEase = if not needsDrill then 2
else if drillsRemaing > 0 then 1 else 0
It's easy to add the extra parameter to the function call in produceWorldFeatures. We already use case statements on the boundary types. Now we need to account for it when vectorizing our world.
vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = V.fromList (fromIntegral <$>
[ ...
, lfMoveEase standStill
...
, zeroIfNull (lfMoveEase <$> up)
...
, zeroIfNull (lfMoveEase <$> right)
...
, zeroIfNull (lfMoveEase <$> down)
...
, zeroIfNull (lfMoveEase <$> left)
])
When we train with this feature set, we actually get a good training error, down to around 10%. Thus it can learn our data a bit better. Yet it still can't navigate the maze correctly.
Expanding the Feature Set
Another option we can try is to serialize the world in a more raw state. We currently use more strategic features. But what about using the information on the board?
Here's a different way to look at it. Let's fix it so that the grid must be 10x10, there must be 2 enemies, and we must start with 2 drill power-ups on the map. Let's get these features about the world:
- 100 features for the grid cells. Each feature will be the integer value corresponding to the wall-shape of that cell. These are hexadecimal, like we have when serializing the maze.
- 4 features for the player. Get the X and Y coordinates for the position, the current stun delay, and the number of drills remaining.
- 3 features for each enemy. Again, X and Y coordinates, as well as a stun timer.
- 2 coordinate features for each drill location. Once a drill gets taken, we'll use -1 and -1.
This will give us a total of 114 features. Here's how it breaks down.
vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = gridFeatures V.++ playerFeatures V.++
enemyFeatures V.++ drillFeatures
where
-- 1. Features for the Grid
mazeStr = Data.Text.unpack $ dumpMaze (worldBoundaries w)
gridFeatures = V.fromList $
fromIntegral <$> digitToInt <$> mazeStr
player = worldPlayer w
enemies = worldEnemies w
-- 2. Features for the player
playerFeatures = V.fromList $ fromIntegral <$>
[ fst . playerLocation $ player
, snd . playerLocation $ player
, fromIntegral $ playerCurrentStunDelay player
, fromIntegral $ playerDrillsRemaining player
]
-- 3. Features for the two enemies
enemy1 = worldEnemies w !! 0
enemy2 = worldEnemies w !! 1
enemyFeatures = V.fromList $ fromIntegral <$>
[ fst . enemyLocation $ enemy1
, snd . enemyLocation $ enemy1
, fromIntegral $ enemyCurrentStunTimer enemy1
, fst . enemyLocation $ enemy2
, snd . enemyLocation $ enemy2
, fromIntegral $ enemyCurrentStunTimer enemy2
]
-- 4. Features for the drill locations
drills = worldDrillPowerUpLocations w
drillFeatures = V.fromList $ fromIntegral <$>
if length drills == 0 then [-1, -1, -1, -1]
else if length drills == 1
then [fst (head drills), snd (head drills), -1, -1]
else [ fst (head drills), snd (head drills)
, fst (drills !! 1), snd (drills !! 1)
]
As an optimization, we can make the grid features part of the world since they will not change.
Still though, our model struggles to complete the grid when training off this data. Compared to the high-level features, the model doesn't even learn very well. We get training errors around 25-30%, but a test error close to 50%. With more data and time, our model might be able to draw the connection between various features.
We could attempt to make our model more sophisticated. We're working with grid data, which is a little like an image. Image processing algorithms use concepts such as convolution and pooling. This allows them to derive patterns arising from how the grid actually looks. We're only looking at the data as a flat vector.
It's unlikely though that convolution and pooling would help us with this feature set. Our secondary features don't fit into the grid. So we would actually want to add them in at a later stage in the process. Besides, we won't get that much data from taking the average value or the max value in a 2x2 segment of the maze. (This is what pooling does).
If we simplify the problem though, we might find a situation where they'll help.
A Simpler Problem
We're having a lot of difficulty getting our agent to navigate the full maze. So let's set those complications aside for a second. Can we train an agent that will navigate an empty maze? This should be doable.
Let's start with a bare-bones feature set with the goal and current location highlighted in a grid. We'll give a value of 10 for our player's location, and a value of 100 for the target location. We start with a vector of all zeros, and use Vector.// to modify the proper values:
vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = finalFeatures
where
initialGrid = V.fromList $ take 100 (repeat 0.0)
(px, py) = playerLocation (worldPlayer w)
(gx, gy) = endLocation w
playerLocIndex = (9 - py) * 10 + px
goalLocIndex = (9 - gy) * 10 + gx
finalFeatures = initialGrid V.//
[(playerLocIndex, 10), (goalLocIndex, 100)]
Our AI bot will always follow the same path in this grid, so it will be quite easy for our agent to learn this path! Even if we use our own moves and vary the path a little bit, the agent can still learn it. It'll achieve 100% accuracy on the AI data. It can't get that high on our data, since we might choose different moves for different squares. But we can still train it so it wins every time.
Conclusion
So our results are still not looking great. But next week we'll take this last idea and run a little further with it. We'll keep it so that our features only come from the grid itself. But we'll add a few more complications with enemies. We might find that convolution and pooling are useful in that case.
If you're interested in using Haskell for AI but don't know where to start, read our Haskell AI Series! We discuss some important ideas like why Haskell is a good AI language. We also get into the basics of Tensor Flow with Haskell.
Using Our Data with Supervised Learning
Our aim these last couple weeks has been to try a supervised learning approach to our game. In last week's article we gathered training data for playing the game. We had two different sources. First, we played the game ourselves and recorded our moves. Second, we let our AI play the game and recorded it. This gave us a few CSV files. Each line in these is a record containing the 40 "features" of the game board at the time and the move we chose.
This week, we're going to explore how to build a machine-learning agent based on this data. This will use supervised learning techniques. Look at the supervised-learning branch on our Github repository for more details.
To get started with Haskell and Tensor Flow, download our Haskell Tensor Flow Guide. This library is a little tricky to work with, so you want to make sure you know what you're doing!
Defining Our Model
For our supervised model, we're going to use a fully connected neural network with a single hidden layer. We'll have 40 input features, 100 hidden units, and then our 10 output values for the different move scores. We'll be following a very similar pattern to one we explored in this older article, using the basic Iris data set. We'll copy a lot of code from that article. We won't go over a lot of the helper code in this article, so feel free to check out that one for some help with that!
We define each layer with a "weights" matrix and a "bias" vector. We multiply the input by the weights and then add the bias vector. Let's explore how we can build a single layer of the network. This will take the input and output size, as well as the input tensor. It will have three results. One variable for the weights, one for the biases, and then a final "output" tensor:
buildNNLayer :: Int64 -> Int64 -> Tensor v Float
-> Build (Variable Float, Variable Float, Tensor Build Float)
The definition is pretty simple. We'll initialize random variables for the weights and bias. We'll produce the result tensor by multiplying by the weights and adding the bias.
buildNNLayer :: Int64 -> Int64 -> Tensor v Float
-> Build (Variable Float, Variable Float, Tensor Build Float)
buildNNLayer inputSize outputSize input = do
weights <- truncatedNormal (vector [inputSize, outputSize])
>>= initializedVariable
bias <- truncatedNormal (vector [outputSize])
>>= initializedVariable
let results = (input `matMul` readValue weights)
`add` readValue bias
return (weights, bias, results)
Now that we understand the layers a little better, it's easier to define our model. First, we'll want to include both sets of weights and biases in the model, so we can output them later:
data Model = Model
{ w1 :: Variable Float
, b1 :: Variable Float
, w2 :: Variable Float
, b2 :: Variable Float
...
}
Now we want two different "steps" we can run. The training step will take a batch of data and determine what our network produces for the inputs. It will compare our network's output with the expected output. Then it will train to minimize the loss function. The error rate step will simply produce the error rate on the given data. That is, it will tell us what percentage of the moves we are getting correct. Both of these will be Session
actions that take two inputs. First, the TensorData
for the features, and then the TensorData
for the correct moves:
data Model = Model
{
… -- Weights and biases
, train :: TensorData Float
-> TensorData Int64
-> Session ()
, errorRate :: TensorData Float
-> TensorData Int64
-> Session (V.Vector Float) -- Produces a single number
}
Let's see how we put this all together.
Building Our Model
To start, let's make placeholders for our input features and expected output results. A dimension of -1
means we can provide any size we like:
createModel :: Build Model
createModel = do
let batchSize = -1
(inputs :: Tensor Value Float) <-
placeholder [batchSize, moveFeatures]
(outputs :: Tensor Value Int64) <-
placeholder [batchSize]
...
Now we build the layers of our neural network using our helper. We'll apply relu
, an activation function, on the results of our hidden layer. This helps our model deal with interaction effects and non-linearities:
createModel :: Build Model
createModel = do
...
(hiddenWeights, hiddenBiases, hiddenResults) <-
buildNNLayer moveFeatures hiddenUnits inputs
let rectifiedHiddenResults = relu hiddenResults
(finalWeights, finalBiases, finalResults) <-
buildNNLayer hiddenUnits moveLabels rectifiedHiddenResults
...
Now to get our error rate, we need a couple steps. We'll get the best move from each predicted result using argMax
. We can then compare these to the training data using equal
. By using reduceMean
we'll get the percentage of our moves that match. Subtracting this from 1 gives our error rate:
createModel :: Build Model
createModel = do
...
(actualOutput :: Tensor Value Int64) <- render $
argMax finalResults (scalar (1 :: Int64))
let (correctPredictions :: Tensor Build Float) = cast $
equal actualOutput outputs
(errorRate_ :: Tensor Value Float) <- render $
1 - (reduceMean correctPredictions)
Now we need our training step. We'll compare our network's outputs against the expected outputs to compute a loss. This involves the softmaxCrossEntropyWithLogits
function. We train our model by selecting our variables for training, and using minimizeWith
. This will update the variables to reduce the value of the loss
function:
createModel :: Build Model
createModel = do
...
let outputVectors = oneHot outputs (fromIntegral moveLabels) 1 0
let loss = reduceMean $ fst $
softmaxCrossEntropyWithLogits finalResults outputVectors
let params =
[hiddenWeights, hiddenBiases, finalWeights, finalBiases]
train_ <- minimizeWith adam loss params
...
We conclude by creating our functions. These take the tensor data as parameters. Then they use runWithFeeds
to put the data into our placeholders:
createModel :: Build Model
createModel = do
...
return $ Model
{ train = \inputFeed outputFeed ->
runWithFeeds
[ feed inputs inputFeed
, feed outputs outputFeed
]
train_
, errorRate = \inputFeed outputFeed ->
runWithFeeds
[ feed inputs inputFeed
, feed outputs outputFeed
]
errorRate_
, w1 = hiddenWeights
, b1 = hiddenBiases
, w2 = finalWeights
, b2 = finalBiases
}
Running Our Tests
Now let's run our tests. We'll read the move record data from the file, shuffle them, and set aside a certain proportion as our test set. Then we'll build our model:
runTraining totalFile = runSession $ do
initialRecords <- liftIO $ readRecordFromFile totalFile
shuffledRecords <- liftIO $ shuffleM (V.toList initialRecords)
let testRecords = V.fromList $ take 2000 shuffledRecords
let trainingRecords = V.fromList $ drop 2000 shuffledRecords
model <- build createModel
...
Then we run our iterations (we'll do 50000, as an example). We select some random records (100 per batch), and then convert them to data. Then we run our train
step. Finally, every 100 iterations or so, we'll get a gauge of the training error on this set. This involves the errorRate
step. Note our error rate returns a vector with a single wrapped value. So we need to unwrap it with !
.
runTraining totalFile = runSession $ do
...
forM_ ([0..50000] :: [Int]) $ \i -> do
trainingSample <- liftIO $ chooseRandomRecords trainingRecords
let (trainingInputs, trainingOutputs) =
convertRecordsToTensorData trainingSample
(train model) trainingInputs trainingOutputs
when (i `mod` 100 == 0) $ do
err <- (errorRate model) trainingInputs trainingOutputs
liftIO $ putStrLn $
(show i) ++ " : current error " ++ show ((err V.! 0) * 100)
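The chooseRandomRecords helper isn't shown here. Assuming it just samples a fixed batch of 100 records (with replacement, for simplicity), a minimal sketch could be:
import Control.Monad (replicateM)
import qualified Data.Vector as V
import System.Random (randomRIO)

-- Pick a batch of 100 records at random (with replacement) from the full set.
chooseRandomRecords :: V.Vector a -> IO (V.Vector a)
chooseRandomRecords records = do
  indices <- replicateM 100 (randomRIO (0, V.length records - 1))
  return $ V.fromList ((records V.!) <$> indices)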
Now to run the final test, we use the errorRate
step again, this time on our test data:
runTraining totalFile = runSession $ do
...
-- Testing
let (testingInputs, testingOutputs) =
convertRecordsToTensorData testRecords
testingError <- (errorRate model) testingInputs testingOutputs
liftIO $ putStrLn $
"test error " ++ show ((testingError V.! 0) * 100)
Results
When it comes to testing our system, we should use proper validation techniques. We want a model that will represent our training data well. But it should also generalize well to other reasonable examples. If our model represents the training data too well, we're in danger of "overfitting" our data. To check this, we'll hold back roughly 20% of the data. This will be our "test" data. We'll train our model on the other 80% of the data. Every 100 steps or so, we print out the training error on that batch of data. We hope this figure drops. But then at the very end, we'll run the model on the other 20% of the data, and we'll see what the error rate is. This will be the true test of our system.
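For a sense of scale, the code above holds back a flat 2000 records. Against the roughly 10,000 human data points we gathered, that works out to about the 20% test split we're aiming for.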
We know we have overfitting if we see training error figures that are lower than the testing error. When training on human moves for 50000 iterations, the training error drops to the high teens and low 20's. But the test error is often still close to 50%. This suggests we're overfitting and shouldn't be training quite as long.
The AI moves provide a little more consistency though. The training error seems to stabilize around the mid 20's and low 30's, and we end up with a test error of about 34%.
Conclusion
Our error rate isn't terrible. But it's not great either. And worse, testing shows it doesn't appear to capture the behaviors well enough to win the game. A case like this suggests our model isn't sophisticated enough to capture the problem. It could also suggest our data is too noisy, and the patterns we hoped to find aren't there. The feature set we have might not capture all the important information about the graph.
For our final look at this problem, we're going to try a new serialization technique. Instead of deriving our own features, we're going to serialize the entire game board! The "feature space" will be much, much larger now. It will include the structure of the graph and information about enemies and drills. This will call for a more sophisticated model. A pure fully connected network will take a long time to learn things like which walls allow moves and which don't. A big drawback of this technique is that it will not generalize to arbitrary mazes. It will only work for a certain size and number of enemies. But with enough training time we may find that interesting patterns emerge. So come back next week to see how this works!
Gathering Smart Data
Last week we made a few more fixes to our Q-Learning algorithm. Ultimately though, it still seems to fall short for even basic versions of our problem.
Q-learning is an example of an "unsupervised" learning approach. We don't tell the machine learning algorithm what the "correct" moves are. We give it rewards when it wins the game (and negative rewards when it loses). But it needs to figure out how to play to get those rewards. With supervised learning, we'll have specific examples of what it should do! We'll have data points saying, "for this feature set, we should make this move." We'll determine a way to record the moves we make in our game, both as a human player and with our manual AI algorithm! This will become our "training" data for the supervised learning approach.
This week's code is all on the Gloss side of things. You can find it on our Github repository under the branch record-player-ai
. Next week, we'll jump back into Tensor Flow. If you're not familiar yet with how to use Haskell and Tensor Flow, download our Haskell Tensor Flow Guide!
Recording Moves
To gather training data, we first need a way to record moves in the middle of the game. Gloss doesn't give us access to the IO
monad in our update functions. So we'll unfortunately have to resort to unsafePerformIO
for this, since we need the data in a file. (We did the same thing when saving game states). Here's the skeleton of our function:
unsafeSaveMove :: Int -> World -> World -> World
unsafeSaveMove moveChoice prevWorld nextWorld = unsafePerformIO $ do
...
The first parameter will be a representation of our move, an integer from 0-9. This follows the format we had with serialization.
0 -> Move Up
1 -> Move Right
2 -> Move Down
3 -> Move Left
4 -> Stand Still
X + 5 -> Move direction X and use the stun
The first World
parameter will be the world under which we made the move. The second world will be the resulting world. This parameter only exists as a pass-through, because of how unsafePerformIO
works.
Given these parameters, our function is pretty straightforward. We want to record a single line that has the serialized world state values and our final move choice. These will go in a comma separated list. We'll save everything to a file called moves.csv
. So let's open that file and get the list of numbers. We'll immediately convert the numbers to strings with show
.
unsafeSaveMove :: Int -> World -> World -> World
unsafeSaveMove moveChoice prevWorld nextWorld = unsafePerformIO $ do
handle <- openFile "moves.csv" AppendMode
let numbers = show <$>
(Vector.toList (vectorizeWorld prevWorld) ++
[fromIntegral moveChoice])
...
Now that our values are all strings, we can get them in a comma separated format with intercalate
. We'll write this string to the file and close the handle!
unsafeSaveMove :: Int -> World -> World -> World
unsafeSaveMove moveChoice prevWorld nextWorld = unsafePerformIO $ do
handle <- openFile "moves.csv" AppendMode
let numbers = show <$>
(Vector.toList (vectorizeWorld prevWorld) ++
[fromIntegral moveChoice])
let csvString = intercalate "," numbers
hPutStrLn handle csvString
hClose handle
return nextWorld
Now let's figure out how to call this function!
Saving Human Moves
Saving the moves we make as a human is pretty easy. All we need to do is hook into the inputHandler
. Recall this section, that receives moves from arrow keys and makes our move:
inputHandler :: Event -> World -> World
inputHandler event w
...
| otherwise = case event of
(EventKey (SpecialKey KeyUp) Down (Modifiers _ _ Down) _) ->
drillLocation upBoundary breakUpWall breakDownWall w
(EventKey (SpecialKey KeyUp) Down _ _) ->
updatePlayerMove upBoundary
(EventKey (SpecialKey KeyDown) Down (Modifiers _ _ Down) _) ->
drillLocation downBoundary breakDownWall breakUpWall w
(EventKey (SpecialKey KeyDown) Down _ _) ->
updatePlayerMove downBoundary
(EventKey (SpecialKey KeyRight) Down (Modifiers _ _ Down) _) ->
drillLocation rightBoundary breakRightWall breakLeftWall w
(EventKey (SpecialKey KeyRight) Down _ _) ->
updatePlayerMove rightBoundary
(EventKey (SpecialKey KeyLeft) Down (Modifiers _ _ Down) _) ->
drillLocation leftBoundary breakLeftWall breakRightWall w
(EventKey (SpecialKey KeyLeft) Down _ _) ->
updatePlayerMove leftBoundary
(EventKey (SpecialKey KeySpace) Down _ _) ->
if playerCurrentStunDelay currentPlayer /= 0
then w
else w
{ worldPlayer =
activatePlayerStun currentPlayer playerParams
, worldEnemies = stunEnemyIfClose <$> worldEnemies w
, stunCells = stunAffectedCells
}
…
All these lines return World
objects! So we just need to wrap them as the final argument to unsafeSaveMove
. Then we add the appropriate move choice number. The strange part is that we cannot move AND stun at the same time when playing as a human. So using the stun will always be 9, which means stunning while standing still. Here are the updates:
inputHandler :: Event -> World -> World
inputHandler event w
...
| otherwise = case event of
(EventKey (SpecialKey KeyUp) Down (Modifiers _ _ Down) _) ->
unsafeSaveMove 0 w $
drillLocation upBoundary breakUpWall breakDownWall w
(EventKey (SpecialKey KeyUp) Down _ _) ->
unsafeSaveMove 0 w $ updatePlayerMove upBoundary
(EventKey (SpecialKey KeyDown) Down (Modifiers _ _ Down) _) ->
unsafeSaveMove 2 w $
drillLocation downBoundary breakDownWall breakUpWall w
(EventKey (SpecialKey KeyDown) Down _ _) ->
unsafeSaveMove 2 w $ updatePlayerMove downBoundary
(EventKey (SpecialKey KeyRight) Down (Modifiers _ _ Down) _) ->
unsafeSaveMove 1 w $
drillLocation rightBoundary breakRightWall breakLeftWall w
(EventKey (SpecialKey KeyRight) Down _ _) ->
unsafeSaveMove 1 w $ updatePlayerMove rightBoundary
(EventKey (SpecialKey KeyLeft) Down (Modifiers _ _ Down) _) ->
unsafeSaveMove 3 w $
drillLocation leftBoundary breakLeftWall breakRightWall w
(EventKey (SpecialKey KeyLeft) Down _ _) ->
unsafeSaveMove 3 w $ updatePlayerMove leftBoundary
(EventKey (SpecialKey KeySpace) Down _ _) ->
if playerCurrentStunDelay currentPlayer /= 0
then w
else unsafeSaveMove 9 w $ w
{ worldPlayer =
activatePlayerStun currentPlayer playerParams
, worldEnemies = stunEnemyIfClose <$> worldEnemies w
, stunCells = stunAffectedCells
}
…
And now whenever we play the game, it will save our moves! Keep in mind though, it takes a lot of training data to get good results when using supervised learning. I played for an hour and got around 10000 data points. We'll see if this is enough!
Saving AI Moves
While the game is at least a little fun, it's also exhausting to keep playing it to generate data! So now let's consider how we can get the AI to play the game itself and generate data. The first step is to reset the game automatically on winning or losing:
updateFunc :: Float -> World -> World
updateFunc _ w
| (worldResult w == GameWon || worldResult w == GameLost) &&
(usePlayerAI params) =
...
The rest will follow the other logic we have for resetting the game. Now we must examine where to insert our call to unsafeSaveMove
. The answer is our updateWorldForPlayerMove
function. We can see that we get the move (and our player's cached memory) as part of makePlayerMove
:
updateWorldForPlayerMove :: World -> World
updateWorldForPlayerMove w = …
where
(move, memory) = makePlayerMove w
...
We'll want a quick function to convert our move into the number choice:
moveNumber :: PlayerMove -> Int
moveNumber (PlayerMove md useStun dd) =
let directionFactor = case (md, dd) of
(DirectionUp, _) -> 0
(_, DirectionUp) -> 0
(DirectionRight, _) -> 1
(_, DirectionRight) -> 1
(DirectionDown, _) -> 2
(_, DirectionDown) -> 2
(DirectionLeft, _) -> 3
(_, DirectionLeft) -> 3
_ -> 4
in if useStun then directionFactor + 5 else directionFactor
Our saving function requires a pass-through world parameter. So we'll do the saving on our first new World
calculation. This comes from modifyWorldForPlayerDrill
:
updateWorldForPlayerMove :: World -> World
updateWorldForPlayerMove w = …
where
(move, memory) = makePlayerMove w
worldAfterDrill = unsafeSaveMove (moveNumber move) w
(modifyWorldForPlayerDrill …)
...
And that's all! Now our AI will play the game by itself, gathering data for hours on end if we like! We'll get some different data for different cases, such as 4 enemies with 4 drills, 8 enemies with 5 drills, and so on. This is much faster and easier than playing the game ourselves! It will automatically gather 12-15 thousand data points an hour if we let it!
Conclusion
With a little bit of persistence, we can now get a lot of data for the decisions a smarter agent will make. Next week, we'll take the data we've acquired and use it to write a supervised learning algorithm! Instead of using Q-learning, we'll make the weights reflect the decisions that we (or the AI) would make.
Supervised learning is not without its pitfalls! It won't necessarily perform optimally. It will perform like the training data. So even if we're successful, our algorithm will replicate our own mistakes! It'll be interesting to see how this plays out, so stay tuned!
For more information on using Haskell in AI, take a look at our Haskell AI Series. Plus, download our Haskell Tensor Flow Guide to learn more about using this library!
Tweaks, Fixes, and Some Results
In last week's episode of this AI series, we added random exploration to our algorithm. This helped us escape certain "traps" and local minimums in the model that could keep us rooted in bad spots. But it still didn't improve results too much.
This week we'll explore a couple more ways to fix and improve our algorithm. For the first time, we'll see some positive outcomes. Still, we'll find our approach isn't great yet.
To get started with Tensor Flow and Haskell, download our guide! It's a complex process so you'll want some help! You should also check out our Haskell AI Series to learn more about why Haskell is a good choice as an AI language!
Improvements
To start out, there are a few improvements we can make to how we do q-learning. Let's recall the basic outline of running a world iteration. There are three steps. We get our "new" move from the "input" world. Then we apply that move, and get our "next" move against the "next" world. Then we use the possible reward to create our target actions, and use that to train our model.
runWorldIteration model = do
(prevWorld, _, _) <- get
-- Get the next move on the current world (with random chance)
let inputWorldVector = … -- vectorize prevWorld
currentMoveWeights <- lift $ lift $
(iterateWorldStep model) inputWorldVector
let bestMove = moveFromOutput currentMoveWeights
let newMove = chooseRandomMoveWithChance …
-- Get the next world using this move, and produce our next move
let nextWorld = stepWorld newMove prevWorld
let nextWorldVector = vectorizeWorld nextWorld
nextMoveVector <- lift $ lift $
(iterateWorldStep model) nextWorldVector
-- Use these to get "target action values" and use them to train!
let (bestNextMoveIndex, maxScore) =
(V.maxIndex nextMoveVector, V.maximum nextMoveVector)
let targetActionData = encodeTensorData (Shape [10, 1]) $
nextMoveVector V.//
[(bestNextMoveIndex, newReward + maxScore)]
lift $ lift $ (trainStep model) nextWorldVector targetActionData
There are a couple issues here. First, we want to substitute based on the first new move, not the later move. We want to learn from the move we are taking now, since we assess its result now. Thus we want to substitute for that index. We'll re-write our randomizer to account for this and return the index it chooses.
Next, when training our model, we should use the original world, instead of the next world. That is, we want inputWorldVector
instead of nextWorldVector
. Our logic is this: we get our "future" action, which accounts for the game's reward, and we want our current action on this world to look more like that future action. Here's what the changes look like:
runWorldIteration model = do
(prevWorld, _, _) <- get
-- Get the next move on the current world (with random chance)
let inputWorldVector = … -- vectorize prevWorld
currentMoveWeights <- lift $ lift $
(iterateWorldStep model) inputWorldVector
let bestMove = moveFromOutput currentMoveWeights
let (newMove, newMoveIndex) = chooseRandomMoveWithChance …
-- Get the next world using this move, and produce our next move
let nextWorld = stepWorld newMove prevWorld
let nextWorldVector = vectorizeWorld nextWorld
nextMoveVector <- lift $ lift $
(iterateWorldStep model) nextWorldVector
-- Use these to get "target action values" and use them to train!
let maxScore = V.maximum nextMoveVector
let targetActionData = encodeTensorData (Shape [10, 1]) $
nextMoveVector V.//
[(newMoveIndex, newReward + maxScore)]
lift $ lift $ (trainStep model) inputWorldVector targetActionData
Another change we can make is to provide some rewards based on whether the selected move was legal or not. To do this, we'll need to update the stepWorld
game API to return this boolean value:
stepWorld :: PlayerMove -> World -> (World, Bool)
Then we can add a small amount (0.01) to our reward value if we get a legal move, and subtract this otherwise.
As a last flourish, we should also add a timeout condition. Our next step will be to test on simple mazes that have no enemies. This means we'll never get eaten, so we need some loss condition if we get stuck. This timeout condition should have the same negative reward as losing.
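Here's a rough sketch of how these reward tweaks could fit together as a pure helper. The name shapedReward and the stepCount/maxSteps bookkeeping are illustrative only; the article doesn't pin down exactly where the timeout counter lives, and we're assuming the game's result type is called GameResult.
-- Sketch only: combine the game outcome, the legal-move bonus, and a timeout.
shapedReward :: GameResult -> Bool -> Int -> Int -> Float
shapedReward result wasLegalMove stepCount maxSteps = base + bonus
  where
    bonus = if wasLegalMove then 0.01 else (-0.01)
    base = case result of
      GameWon -> 1.0
      GameLost -> (-1.0)
      GameInProgress ->
        if stepCount >= maxSteps then (-1.0) else 0.0 -- timeout counts as a loss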
Results
Now that we've made some improvements, we'll train on a very basic maze that's only 5x5 and has no walls and no enemies. Whereas we used to struggle to even finish this maze, we now achieve the goal a fair amount of the time. One of our training iterations achieved the goal around 2/3 of the time.
However, our bot is still useless against enemies! It loses every time if we try to train from scratch on a map with a single enemy. One attempt to circumvent this is to first train our weights to solve the empty maze. Then we can start with these weights as we attempt to avoid the enemy. That way, we have some pre-existing knowledge, and we don't have to learn everything at once. Still though, it doesn't result in much improvement. Typical runs only succeeded 40-50 times out of 2000 iterations.
Limiting Features
One conclusion we can draw is that we actually have too many features! Our intuition is that a larger feature set would take more iterations to learn. If the features aren't chosen carefully, they'll introduce noise.
So instead of tracking 8 features for each possible direction of movement, let's stick with 3. We'll see if the enemy is on the location, check the distance to the end, and count the number of nearby enemies. When we do this, we get comparable results on the empty maze. But when it comes to avoiding enemies, we do a little better, surviving 150-250 iterations out of 2000. These statistics are all very rough, of course. If we wanted a more thorough analysis, we'd use multiple maze configurations and a lot more runs using the finalized weights.
Conclusions
We can't draw too many conclusions from this yet. Our model is still failing to solve simple versions of our problem. It's quite possible that our model is too simplistic. After all, all we're doing is a simple matrix multiplication on our features. In theory, this should be able to solve the problem, but it may take a lot more iterations. The results stream we see also suggests local minimums are a big problem. Logging information reveals that we often die in the same spot in the maze many times in a row. The negative rewards aren't enough to draw us out, and we are often relying on random moves to find better outcomes.
So next week we're going to start changing our approach. We'll explore a way to introduce supervised learning into our process. This depends on "correct" data. We'll try a couple different ways to get that data. We'll use our own "human" input, as well as the good AI we've written in the past to solve this problem. All we need is a way to record the moves we make! So stay tuned!
Adding Random Exploration
Last week, we finally built a pipeline to use machine learning on our maze game. We made a Tensor Flow graph that could train a "brain" with weights so we could navigate the maze. This week, we'll see how our training works, or rather how it doesn't work. We'll consider how randomizing moves during training might help.
Our machine learning code lives in this repository. For this article, you'll want to look at the randomize-moves
branch. Take a look here for the original game code. You'll want the q-learning
branch in the main repo.
This part of the series uses Haskell and Tensor Flow. To learn more about using these together, download our Haskell Tensor Flow Guide!
Unsupervised Machine Learning
With a few tweaks, we can run our game using the new output weights. But what we'll find as we train the weights is that our bot never seems to win! It always seems to do the same thing! It might move up and then get stuck because it can't move up anymore. It might stand still the whole time and let the enemies come grab it. Why would this happen?
Remember that reinforcement learning depends on being able to reinforce good behaviors. Thus at some point, we have to hope our AI will win the game. Then it will get the good reward so that it can change its behavior to adapt and get good results more often. But if it never gets a good result in the whole training process, it will never learn good behaviors!
This is part of the challenge of unsupervised learning. In a supervised learning algorithm, we have specific good examples to learn from. One way to approach this would be to record our own moves of playing the game. Then the AI could learn directly from us! We'll probably try this approach in the future!
But q-learning is an unsupervised algorithm. We're forcing our AI to explore the world and learn on its own. But right now, it's only making moves that it thinks are "optimal." And with a random set of weights, the "optimal" moves aren't very optimal at all! Part of a good "exploration" plan means letting it choose moves from time to time that don't seem optimal.
Adding a Random Choice
As our first attempt to fix this, we'll add a "random move chance" to our training process. At each training step, our network chooses its "best" move, and we use that to update the world state. From now on, whenever we do this, we'll roll the dice. And if we get a number below our random chance, we'll pick a random move instead of our "best" move.
Over the course of training though, we want to decrease this random chance. In theory, our AI should be better as we train the network. So as we get closer to the end of training, we'll want to make fewer random decisions, and more "best" decisions. We'll aim to start this parameter at 1 in 5, and reduce it down to 1 in 50 as training continues. So how do we implement this?
First of all, we want to keep track of a value representing our chance of making a random move. Our runAllIterations
function should be stateful in this parameter.
-- Third "Float" parameter is the random chance
runAllIterations :: Model -> World
-> StateT ([Float], Int, Float) Session ()
...
trainGame :: World -> Session (Vector Float)
trainGame w = do
model <- buildModel
let initialRandomChance = 0.2
(finalReward, finalWinCount, _) <- execStateT
(runAllIterations model w)
([], 0, initialRandomChance)
run (readValue $ weightsT model)
Then within runAllIterations
, we'll make two changes. First, we'll make a new random generator for each training game. Then, we'll update the random chance, reducing it with the number of iterations:
runAllIterations :: Model -> World
-> StateT ([Float], Int, Float) Session ()
runAllIterations model initialWorld = do
let numIterations = 2000
forM [1..numIterations] $ \i -> do
gen <- liftIO getStdGen
(wonGame, (_, finalReward, _)) <- runStateT
(runWorldIteration model)
(initialWorld, 0.0, gen)
(prevRewards, prevWinCount, randomChance) <- get
let modifiedRandomChance = 1.0 / ((fromIntegral i / 40.0) + 5)
let newRewards = finalReward : prevRewards
let newWinCount = if wonGame then prevWinCount + 1 else prevWinCount
put (newRewards, newWinCount, modifiedRandomChance)
return ()
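We can sanity check this decay formula at the endpoints. It starts near our 1-in-5 target and finishes just under 1-in-50:
-- In GHCi:
-- λ> let chance i = 1.0 / ((fromIntegral i / 40.0) + 5) :: Float
-- λ> chance 1 -- ~0.199, about 1 in 5
-- λ> chance 1000 -- ~0.033, about 1 in 30
-- λ> chance 2000 -- ~0.018, just under 1 in 50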
Making Random Moves
We can see now that runWorldIteration
must be stateful in the random generator as well. We'll retrieve that, along with the random chance, at the start of the operation:
runWorldIteration :: Model -> StateT (World, Float, StdGen)
(StateT ([Float], Int, Float) Session) Bool
runWorldIteration model = do
(prevWorld, prevReward, gen) <- get
(_, _, randomChance) <- lift get
...
Now let's refactor our serialization code a bit. We want to be able to make a new move based on the index, without needing the weights:
moveFromIndex :: Int -> PlayerMove
moveFromIndex bestMoveIndex =
PlayerMove moveDirection useStun moveDirection
where
moveDirection = case bestMoveIndex `mod` 5 of
0 -> DirectionUp
1 -> DirectionRight
2 -> DirectionDown
3 -> DirectionLeft
4 -> DirectionNone
useStun = bestMoveIndex > 4
Now we can add a function that will run the random generator and give us a random move if the roll is low enough. Otherwise, it will keep the best move.
chooseMoveWithRandomChance ::
PlayerMove -> StdGen -> Float -> (PlayerMove, StdGen)
chooseMoveWithRandomChance bestMove gen randomChance =
let (randVal, gen') = randomR (0.0, 1.0) gen
(randomIndex, gen'') = randomR (0, 9) gen' -- any of the 10 possible moves
randomMove = moveFromIndex randomIndex
in if randVal < randomChance
then (randomMove, gen'')
else (bestMove, gen')
Now it's a simple matter of applying this function, and we're all set!
runWorldIteration :: Model -> StateT (World, Float, StdGen)
(StateT ([Float], Int, Float) Session) Bool
runWorldIteration model = do
(prevWorld, prevReward, gen) <- get
(_, _, randomChance) <- lift get
...
let bestMove = ...
let (newMove, newGen) = chooseMoveWithRandomChance
bestMove gen randomChance
…
put (nextWorld, prevReward + newReward, newGen)
continuationAction
Conclusion
When we test our bot, it has a bit more variety in its moves now, but it's still not succeeding. So what do we want to do about this? It's possible that something is wrong with our network or the algorithm. But it's hard to reveal this when the problem space is so challenging. After all, we're expecting this agent to navigate a complex maze AND avoid/stun enemies.
It might help to break this process down a bit. Next week, we'll start looking at simpler examples of mazes. We'll see if our current approach can be effective at navigating an empty grid. Then we'll see if we can take some of the weights we learned and use them as a starting point for harder problems. We'll try to navigate a true maze, and see if we get better weights. Then we'll look at an empty grid with enemies. And so on. This approach will make it more obvious if there are flaws with our machine learning method.
If you've never programmed in Haskell before, it might be a little hard to jump into machine learning. Check out our Beginners Checklist and our Liftoff Series to get started!
Running Training Iterations
In our last article we built a simple Tensor Flow model to perform Q-Learning on our brain. This week, we'll build out the rest of the code we need to run iterations on this model. This will train it to perform better and make more intelligent decisions.
The machine learning code for this project is in a separate repository from the game code. Check out MazeLearner to follow along. Everything for this article is on the basic-trainer
branch.
To learn more about Haskell and AI, make sure to read our Haskell and AI Series!
Iterating on the Model
First let's recall what our Tensor Flow model looks like:
data Model = Model
{ weightsT :: Variable Float
, iterateWorldStep :: TensorData Float -> Session (Vector Float)
, trainStep :: TensorData Float -> TensorData Float -> Session ()
}
We need to think about how we're going to use its last two functions. We want to iterate on and make updates to the weights. Across the different iterations, there's certain information we need to track. The first value we'll track is the list of "rewards" from each iteration (this will be more clear in the next section). Then we'll also track the number of games we win across those iterations.
To track these, we'll use the State
monad, run on top of the Session
.
runAllIterations :: Model -> World -> StateT ([Float], Int) Session ()
We'll also want a function to run a single iteration. This, in turn, will have its own state information. It will track the World
state of the game it's playing. It will also track the sum of the accumulated reward values from the moves in that game. Since we'll run it from our function above, it will have a nested StateT
type. It will ultimately return a boolean value indicating if we have won the game. We'll define the details in the next section:
runWorldIteration :: Model ->
StateT (World, Float) (StateT ([Float], Int) Session) Bool
We can now start by filling out our function for running all the iterations. Supposing we'll perform 1000 iterations, we'll make a loop for each iteration. We can start each loop by running the world iteration function on the current model.
runAllIterations :: Model -> World -> StateT ([Float], Int) Session ()
runAllIterations model initialWorld = do
let numIterations = 1000
void $ forM [1..numIterations] $ \i -> do
(wonGame, (_, finalReward)) <-
runStateT (runWorldIteration model) (initialWorld, 0.0)
...
And now the rest is a simple matter of using the results to update the existing state:
runAllIterations :: Model -> World -> StateT ([Float], Int) Session ()
runAllIterations model initialWorld = do
let numIterations = 1000
forM [1..numIterations] $ \i -> do
(wonGame, (_, finalReward)) <-
runStateT (runWorldIteration model) (initialWorld, 0.0)
(prevRewards, prevWinCount) <- get
let newRewards = finalReward : prevRewards
let newWinCount = if wonGame
then prevWinCount + 1
else prevWinCount
put (newRewards, newWinCount)
Running a Single Iteration
Now let's delve into the process of a single iteration. Broadly speaking, we have four goals.
- Take the current world and serialize it. Pass it through the iterateStep to get the move our model would make in this world.
- Apply this move, getting the "next" world state.
- Determine the scores for our moves in this next world. Apply the given reward as the score for the best of these moves.
- Use this result to compare against our original moves. Feed it into the training step and update our weights.
Let's start with steps 1 and 2. We'll get the vector representation of the current world. Then we need to encode it as TensorData
so we can pass it to an input feed. Next we run our model's iterate step and get our output move. Then we can use that to advance the world state using stepWorld
and updateEnvironment
.
runWorldIteration
:: Model
-> StateT (World, Float) (StateT ([Float], Int) Session) Bool
runWorldIteration model = do
-- Serialize the world
(prevWorld :: World, prevReward) <- get
let (inputWorldVector :: TensorData Float) =
encodeTensorData (Shape [1, 8]) (vectorizeWorld prevWorld)
-- Run our model to get the output vector and de-serialize it
-- Lift twice to get into the Session monad
(currentMove :: Vector Float) <- lift $ lift $
(iterateWorldStep model) inputWorldVector
let newMove = moveFromOutput currentMove
-- Get the next world state
let nextWorld = updateEnvironment (stepWorld newMove prevWorld)
Now we need to perform the Q-Learning step. We'll start by repeating the process in our new world state and getting the next vector of move scores:
runWorldIteration model = do
...
let nextWorld = updateEnvironment (stepWorld newMove prevWorld)
let nextWorldVector =
encodeTensorData (Shape [1, 8]) (vectorizeWorld nextWorld)
(nextMoveVector :: Vector Float) <- lift $ lift $
(iterateWorldStep model) nextWorldVector
...
Now it gets a little tricky. We want to examine if the game is over after our last move. If we won, we'll get a reward of 1.0. If we lost, we'll get a reward of -1.0. Otherwise, there's no reward. While we figure out this reward value, we can also determine our final monadic action. We could return a boolean value if the game is over, or recursively iterate again:
runWorldIteration model = do
...
let nextWorld = ...
(nextMoveVector :: Vector Float) <- ...
let (newReward, continuationAction) = case worldResult nextWorld of
GameInProgress -> (0.0, runWorldIteration model)
GameWon -> (1.0, return True)
GameLost -> (-1.0, return False)
...
Now we'll look at the vector for our next move and replace one of its values. We'll find the maximum score, and replace it with a value that factors in the actual reward we get from the game. This is how we insert "truth" into our training process and how we'll actually learn good reward values.
import qualified Data.Vector as V
runWorldIteration model = do
...
let nextWorld = ...
(nextMoveVector :: Vector Float) <- ...
let (newReward, continuationAction) = ...
let (bestNextMoveIndex, maxScore) =
(V.maxIndex nextMoveVector, V.maximum nextMoveVector)
let (targetActionValues :: Vector Float) = nextMoveVector V.//
[(bestNextMoveIndex, newReward + (0.99 * maxScore))]
let targetActionData =
encodeTensorData (Shape [10, 1]) targetActionValues
...
Then we'll encode this new vector as the second input to our training step. We'll still use the nextWorldVector
as the first input. We conclude by updating our state variables to have their new values. Then we run the continuation action we got earlier.
runWorldIteration model = do
...
let nextWorld = ...
(nextMoveVector :: Vector Float) <- ...
let targetActionData = ...
-- Run training to alter the weights
lift $ lift $ (trainStep model) nextWorldVector targetActionData
put (nextWorld, prevReward + newReward)
continuationAction
Tying It Together
Now to make this code run, we need a little bit of code to tie it together. We'll make a Session
action to train our game. It will output the final weights of our model.
trainGame :: World -> Session (Vector Float)
trainGame w = do
model <- buildModel
(finalReward, finalWinCount) <-
execStateT (runAllIterations model w) ([], 0)
run (readValue $ weightsT model)
Then we can run this from IO
using runSession
.
playGameTraining :: World -> IO (Vector Float)
playGameTraining w = runSession (trainGame w)
Last of all, we can run this on any World
we like by first loading it from a file. For our first examples, we'll use a smaller 10x10 grid with 2 enemies and 1 drill powerup.
main :: IO ()
main = do
world <- loadWorldFromFile "training_games/maze_grid_10_10_2_1.game"
finalWeights <- playGameTraining world
print finalWeights
Conclusion
We've now got the basics down for making our Tensor Flow program work. Come back next week where we'll take a more careful look at how it's performing. We'll see if the AI from this process is actually any good or if there are tweaks we need to make to the learning process.
And make sure to download our Haskell Tensor Flow Guide! This library is difficult to use. There are a lot of secondary dependencies for it. So don't go in trying to use it blind!
Making a Learning Model
Last week we took a few more steps towards using machine learning to improve the player AI for our maze game. We saw how to vectorize the input and output data for our world state and moves. This week, we'll finally start seeing how to use these in the larger context of a Tensor Flow program. We'll make a model for a super basic neural network that will apply the technique of Q-Learning.
Our machine learning code will live in a separate repository from the primary game code. Be sure to check that out here! The first couple weeks of this part of the series will use the basic-trainer
branch.
This week, we'll finally start diving into using Haskell with Tensor Flow. Be sure to read our Haskell AI Series to learn more about this! You can also download our Haskell Tensor Flow guide to learn the basics of the library.
Model Basics
This week's order of business will be to build a Tensor Flow graph that can make decisions in our maze game. The graph should take a serialized world state as an input, and then produce a distribution of scores. These scores correspond to the different moves we can make.
Recalling from last week, the input to our model will be a 1x8 vector, and the output will be a 10x1 vector. For now then, we'll represent our model with a single variable tensor: a matrix of size 8x10. We'll get the output by multiplying the inputs by the weights.
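As a quick dimension check: a 1x8 input multiplied by an 8x10 weight matrix gives a 1x10 result, which lines up with the 10 move scores we want.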
Ultimately, there are three things we need to access from this model.
- The final weights
- A step to iterate the world
- A step to train our model and adjust the weights.
Here's what the model looks like, using Tensor Flow types:
data Model = Model
{ weightsT :: Variable Float
, iterateWorldStep :: TensorData Float -> Session (Vector Float)
, trainStep :: TensorData Float -> TensorData Float -> Session ()
}
The first element is the variable tensor for our weights. We need to expose this so we can output them at the end. The second element is a function that will take in a serialized world state and produce the output move. Then the third element will take both a serialized world state AND some expected values. It will update the variable tensor as part of the Q-Learning process. Next week, we'll write iteration functions in the Session
monad. They'll use these last two elements.
Building the Iterate Step
To make these Tensor Flow items, we'll also need to use the Session
monad. Let's start a basic function to build up our model:
buildModel :: Session Model
buildModel = do
...
To start, let's make a variable for our weights. At the start, we'll randomize them with truncatedNormal
and then make that into a Variable
:
buildModel :: Session Model
buildModel = do
(initialWeights :: Tensor Value Float) <-
truncatedNormal (vector [8, 10])
(weights :: Variable Float) <- initializedVariable initialWeights
Now let's build the items for running our iterate step. This first involves taking the inputs as a placeholder. Remember, the inputs come from the vectorization of the world state.
Then to produce our output, we'll multiply the inputs by our weights. The result is a Build
tensor, so we need to render
it to use it in the next part. As an extra note, we need readValue
to turn our Variable
into a Tensor
we can use in operations.
buildModel :: Session Model
buildModel = do
(initialWeights :: Tensor Value Float) <-
truncatedNormal (vector [8, 10])
(weights :: Variable Float) <- initializedVariable initialWeights
(inputs :: Tensor Value Float) <- placeholder (Shape [1,8])
let (allOutputs :: Tensor Build Float) =
inputs `matMul` (readValue weights)
returnedOutputs <- render allOutputs
...
The next part is to create a step to "run" the outputs. Since the outputs depend on a placeholder, we need to create a feed for the input. Then we can create a runnable Session
action with runWithFeeds
. This gives us the second element of our Model
, the iterateStep
.
buildModel :: Session Model
buildModel = do
...
let iterateStep = \inputFeed ->
runWithFeeds [feed inputs inputFeed] returnedOutputs
...
Using Q-Learning in the Model
This gives us what we need to run our basic AI and make moves in the game. But we still need to apply some learning mechanism to update the weights!
We want to use Q-Learning. This means we'll compare the output of our model with the next output from continuing to step through the world. So first let's introduce another placeholder for these new outputs:
buildModel :: Session Model
buildModel = do
initialWeights <- ...
weights <- ...
inputs <- ...
returnedOutputs <- ...
let iterateStep = ...
-- Next set of outputs
(nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
...
Now we'll define our "loss" function. That is, we'll find the squared difference between our real output and the "next" output. Next week we'll see that the "next" output uses extra information about the game. This will allow us to bring an element of "truth" that we can learn from.
buildModel :: Session Model
buildModel = do
...
returnedOutputs <- ...
-- Q-Learning Section
(nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
let (diff :: Tensor Build Float) = nextOutputs `sub` allOutputs
let (loss :: Tensor Build Float) = reduceSum (diff `mul` diff)
...
Now, we'll make a final ControlNode
using minimizeWith
. This will minimize the loss function using the adam
optimizer. We'll pass weights
in the list of variables to update, since this is the variable we're trying to learn.
buildModel :: Session Model
buildModel = do
...
returnedOutputs <- ...
-- Q-Learning Section
(nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
let (diff :: Tensor Build Float) = nextOutputs `sub` allOutputs
let (loss :: Tensor Build Float) = reduceSum (diff `mul` diff)
(trainer_ :: ControlNode) <- minimizeWith adam loss [weights]
Finally, we'll make our training step, that will run the training node on two input feeds. One for the world input, and one for the expected output. Then we can return our completed model.
buildModel :: Session Model
buildModel = do
...
inputs <- ...
weights <- ...
returnedOutputs <- ...
let iterateStep = ...
-- Q-Learning Section
(nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
let diff = ...
let loss = ...
(trainer_ :: ControlNode) <- minimizeWith adam loss [weights]
let trainingStep = \inputFeed nextOutputFeed -> runWithFeeds
[ feed inputs inputFeed
, feed nextOutputs nextOutputFeed
]
trainer_
return $ Model
weights
iterateStep
trainingStep
Conclusion
Now we've got our machine learning model. We have different functions that can iterate on our world state as well as train the outputs of our graph. Next week, we'll see how to combine these steps within the Session
monad. Then we can start running training iterations and produce results.
If you want to follow along with these code examples, make sure to download our Haskell Tensor Flow Guide! This library is quite tricky to use. There are a lot of secondary dependencies for it. So you won't want to go in trying to use it blind!
Q-Learning Primer
This week, we're going to take the machine learning process in a different direction than I expected. In the last couple weeks, we've built a simple evaluation function for our world state. We could learn this function using an approach called Temporal Difference learning. We might come back to this approach at some point. But for now, we're actually going to try something a little different.
Instead, we're going to focus on a technique called Q-Learning. Instead of an evaluation function for the world, we're going to learn makePlayerMove
. We'll keep most of the same function structure. We're still going to take our world and turn it into a feature space that we can represent as a numeric vector. But instead of producing a single output, we'll give a score for every move from that position. This week, we'll take the basic steps to ready our game for this approach.
As always, check out the Github repository for this project. This week's code is on the q-learning
branch!
Next week, we'll finally get into some Tensor Flow code. Make sure you're ready for it by reading up on our Tensor Flow Guide!
Vectorizing Inputs
To learn a function, we need to be able to represent both the inputs and the outputs of our system as numeric vectors. We've already done most of the work here. Let's recall our evaluateWorld
function. We'll keep the same feature values. But now we'll wrap them, instead of applying scores immediately:
data WorldFeatures = WorldFeatures
{ onActiveEnemy :: Int
, shortestPathLength :: Int
, manhattanDistance :: Int
, enemiesOnPath :: Int
, nearestEnemyDistance :: Int
, numNearbyEnemies :: Int
, stunAvailable :: Int
, drillsRemaining :: Int
}
produceWorldFeatures :: World -> WorldFeatures
produceWorldFeatures w = WorldFeatures
(if onActiveEnemy then 1 else 0)
shortestPathLength
manhattanDistance
enemiesOnPath
nearestEnemyDistance
numNearbyEnemies
(if stunAvailable then 1 else 0)
(fromIntegral drillsRemaining)
where
-- Calculated as before
onActiveEnemy = ...
enemiesOnPath = ...
shortestPathLength = ...
nearestEnemyDistance = ...
manhattanDistance = ...
stunAvailable = ...
numNearbyEnemies = ...
drillsRemaining = ...
Now, in our ML code, we'll want to convert this into a vector. Using a vector will enable us to encode this information as a tensor.
vectorizeWorld :: World -> Vector Float
vectorizeWorld w = fromList (fromIntegral <$>
[ wfOnActiveEnemy features
, wfShortestPathLength features
, wfManhattanDistance features
, wfEnemiesOnPath features
, wfNearestEnemyDistance features
, wfNumNearbyEnemies features
, wfStunAvailable features
, wfDrillsRemaining features
])
where
features = produceWorldFeatures w
Vectorizing Outputs
Now we have the inputs to our tensor system. We'll ultimately get a vector of outputs as a result. We want this vector to provide a score for every move. We have 10 potential moves in general. There are five "movement" directions: up, right, down, left, and standing still. Then for each direction, we can either use our stun or not. We'll use a drill when the movement direction sends us against a wall. Certain moves won't be available in certain situations. But our output size should account for them all.
Our function will often propose invalid moves. For example, it might suggest using the stun while on cooldown, or drilling when we don't have one. In these cases, our game logic should dictate that our player doesn't move. Hopefully this trains our network to make correct moves. If we wanted to, we could even apply a slight negative reward for these.
What we need is the ability to convert a vector of outputs into a move. Once we fix the vector size, this is not difficult. As a slight hack, this function will always give the same direction for moving and drilling. We'll let the game logic determine if the drill needs to apply.
moveFromOutput :: Vector Int -> PlayerMove
moveFromOutput vals = PlayerMove moveDirection useStun moveDirection
where
bestMoveIndex = maxIndex vals
moveDirection = case bestMoveIndex `mod` 5 of
0 -> DirectionUp
1 -> DirectionRight
2 -> DirectionDown
3 -> DirectionLeft
4 -> DirectionNone
useStun = bestMoveIndex > 4
Discrete Updates
Now that we can get numeric vectors for everything, we need to be able to step through the world, one player move at a time. We currently have a generic update
function. Depending on the time, it might step the player forward, or it might not. We want to change this so there are two steps. First we receive a player's move, and then we step the world forward until it is time for the next player move:
stepWorld :: PlayerMove -> World -> World
This isn't too difficult; it just requires a little shuffling of our existing code. First, we'll add a new applyPlayerMove'
function. This will take the existing applyPlayerMove
and add a little bit of validation to it:
applyPlayerMove' :: PlayerMove -> World -> World
applyPlayerMove' move w = if isValidMove
then worldAfterMove
else w
where
player = worldPlayer w
currentLoc = playerLocation player
worldAfterDrill = modifyWorldForPlayerDrill w
(drillDirection move)
worldAfterStun = if activateStun move
then modifyWorldForStun worldAfterDrill
else worldAfterDrill
newLocation = nextLocationForMove
(worldBoundaries worldAfterDrill Array.! currentLoc)
currentLoc
(playerMoveDirection move)
isValidStunUse = if activateStun move
then playerCurrentStunDelay player == 0
else True
isValidMovement = playerMoveDirection move == DirectionNone ||
newLocation /= currentLoc
isValidMove = isValidStunUse && isValidMovement
worldAfterMove =
modifyWorldForPlayerMove worldAfterStun newLocation
Now we'll add an updateEnvironment
function. This will perform all the work of our updateFunc
except for moving the player.
updateEnvironment :: World -> World
updateEnvironment w
| playerLocation player == endLocation w =
w { worldResult = GameWon }
| playerLocation player `elem` activeEnemyLocations =
w { worldResult = GameLost }
| otherwise =
updateWorldForEnemyTicks .
updateWorldForPlayerTick .
updateWorldForEnemyMoves .
clearStunCells .
incrementWorldTime $ w
where
player = worldPlayer w
activeEnemyLocations = enemyLocation <$>
filter (\e -> enemyCurrentStunTimer e == 0) (worldEnemies w)
Now we combine these. First we'll make the player's move. Then we'll update the environment once for each tick of the player's "lag" time.
stepWorld :: PlayerMove -> World -> World
stepWorld move w = execState (sequence updateActions) worldAfterMove
where
worldAfterMove = applyPlayerMove' move w
updateActions = replicate
( fromIntegral .
lagTime .
playerGameParameters .
worldParameters $ w)
(modify updateEnvironment)
And these are all the modifications we'll need to get going!
Q Learning Teaser
Now we can start thinking about the actual machine learning process. We'll get into a lot more detail next week. But for now, let's think about a particular training iteration. We'll want to use our existing network to step forward into the game. This will produce a certain "reward", and leave the game state in a new position. Then we'll get more values for our next moves out of that position. We'll use the updated move scores and the reward to learn better values for our function weights.
Of course, the immediate "reward" values for most moves will be 0. The only moves that will carry a reward will be those where we either win the game or lose the game. So it could take a while for our program to learn good behaviors. It will take time for the "end" behaviors of the game to affect normal moves. For this reason, we'll start our training on much smaller mazes than the primary game. This should help speed up the training process.
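The core of that update is the standard Q-learning target: take the score vector our network produces for the next position, and overwrite its best entry with the observed reward plus a discounted estimate of the future value. Here's a standalone sketch; the name qLearningTarget is just for illustration, and the discount factor is a tunable constant.
import qualified Data.Vector as V

-- Build the target we train against: keep the network's scores for the next
-- position, but replace the best one with reward + discounted max score.
qLearningTarget :: Float -> Float -> V.Vector Float -> V.Vector Float
qLearningTarget discount reward nextScores =
  nextScores V.// [(bestIndex, reward + discount * bestScore)]
  where
    bestIndex = V.maxIndex nextScores
    bestScore = V.maximum nextScores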
Conclusion
Next week, we'll take our general framework for Q-Learning and apply it within Tensor Flow. We'll get the basics of Q-Learning down with a couple different types of models. For a wider perspective on Haskell and AI problems, make sure to check out our Haskell AI Series!
Adding Features for Better Behavior
Last week we started exploring the idea of an AI built on an evaluation function. This has the potential to let us avoid a lot of the hand-crafting that comes with AI design. The old way specified all the rules for the AI to follow. In the new approach, we create a mathematical function to evaluate a game position. Then we can look at all our possible moves and select the one with the best result. We could, if we wanted to, turn the input to our evaluation function into a vector of numbers. And its output is also a number. This property will help us realize our eventual goal of machine learning this function.
We made a rudimentary version of this function last week. Even before turning to machine learning, there are a couple ways to improve our function. We can try tweaking the weights we applied to each feature. But we can also try coming up with new features, or try different combinations of features. This week, we'll try the latter approach.
In the coming weeks as we start exploring machine learning, we'll use Tensor Flow with Haskell! To get prepared, download our Haskell Tensor Flow guide!
Existing Features
Last week, we came up with a few different features that could help us navigate this maze. These features included:
- Maze distance to goal
- Manhattan distance to goal
- Whether or not an enemy is on our location
- Whether or not our stun is available
- The number of drills we have available
- The number of enemies that are nearby (using manhattan distance)
But there were some clear sub-optimal behaviors with our bot. We tend to get "zoned out" by enemies, even when they aren't near us by maze distance. Obviously, it would suit us to use maze distance instead of manhattan distance. But we also want to be willing to approach enemies aggressively when we have our stun, and retreat intelligently without it. To that end, let's add a couple more features:
- The number of enemies on the shortest path to the goal.
- The shortest distance to an enemy from a particular square (only up to 5)
We'll impose a penalty for close enemies if we don't have our stun; otherwise we'll ignore this feature. Then we'll also impose a penalty for having more enemies on our shortest path. This will make us more willing to use the stun, rather than waiting.
Enemies In The Way
Our first order of business will be to determine how many enemies lie on our shortest path. We'll filter the path itself based on membership in the active enemies set:
evaluateWorld :: World -> Float
evaluateWorld w =
where
activeEnemyLocations = …
shortestPath =
getShortestPath (worldBoundaries w) playerLoc goalLoc
enemiesOnPath = length $ filter
(\l -> Set.member l (Set.fromList activeEnemyLocations))
shortestPath
Then we'll assign each enemy on this path a penalty greater than the value of using the stun. We'll add this score to our other scores.
evaluateWorld :: World -> Float
evaluateWorld w =
enemiesOnPathScore +
...
where
enemiesOnPath = ...
enemiesOnPathScore = -85.0 * (fromIntegral enemiesOnPath)
Maze Distance
Next let's get the shortest maze distance to a nearby enemy. We'll actually want to generalize the behavior of our existing BFS function for this. We want to find the shortest path to any one of the enemy locations. So instead of supplying a single target location, we'll supply a set of target locations. Then we'll cap the distance to search so we aren't doing a full BFS of the maze every time. This gives us an optional range parameter. Let's use these ideas to make an expanded API that our original function will use.
getShortestPathToTargetsWithLimit
:: Maze
-> Location
-> Set.Set Location
-> Maybe Int
-> [Location]
getShortestPathToTargetsWithLimit
maze initialLocation targetLocations maxRange = ...
-- Original function call!
getShortestPath maze initialLocation targetLocation =
getShortestPathToTargetsWithLimit maze initialLocation
(Set.singleton targetLocation) Nothing
bfs
:: Maze
-> Location
-> Set.Set Location -- Now a set of targets
-> Maybe Int -- Added range parameter
-> [Location]
bfs = ...
We'll have to make a few tweaks to our algorithm now. Each search state element will have a "distance" associated with it.
data BFSState = BFSState
{ bfsSearchQueue :: Seq.Seq (Location, Int)
...
-- Our initial state has a distance of 0
getShortestPathToTargetsWithLimit
maze initialLocation targetLocations maxRange =
evalState
(bfs maze initialLocation targetLocations maxRange)
(BFSState
(Seq.singleton (initialLocation, 0))
(Set.singleton initialLocation)
Map.empty)
Now we need a couple modifications to the core bfs
function. When extracting the next element in the queue, we have to consider its distance. All new items we create will increment that distance. And if we're at the max distance, we won't add anything to the queue. Finally, when evaluating if we're done, we'll check against the set of targets, rather than a single target. Here's our bfs
code, with differences noted.
bfs
:: Maze
-> Location
-> Set.Set Location
-> Maybe Int
-> State BFSState [Location]
bfs maze initialLocation targetLocations maxRange = do
BFSState searchQueue visitedSet parentsMap <- get
if Seq.null searchQueue
then return []
else do
-- ! Unwrap distance as well
let (nextLoc, distance) = Seq.index searchQueue 0
-- ! Check set membership, not equality
if Set.member nextLoc targetLocations
then return (unwindPath parentsMap [nextLoc])
else do
-- ! Add the new distance to each adjacent cell
let adjacentCells = (, distance + 1) <$>
getAdjacentLocations maze nextLoc
-- ! Account for the distance with a new helper function
let unvisitedNextCells = filter
(shouldAddNextCell visitedSet)
adjacentCells
let newSearchQueue = foldr
(flip (Seq.|>))
(Seq.drop 1 searchQueue)
unvisitedNextCells
newVisitedSet = Set.insert nextLoc visitedSet
newParentsMap = foldr
(\(l, _) -> Map.insert l nextLoc)
parentsMap unvisitedNextCells
put (BFSState newSearchQueue newVisitedSet newParentsMap)
bfs maze initialLocation targetLocations maxRange
where
-- ! Helper function to account for distance when adding to queue
shouldAddNextCell visitedSet (loc, distance) = case maxRange of
Nothing -> not (Set.member loc visitedSet)
Just x -> distance <= x && not (Set.member loc visitedSet)
unwindPath parentsMap currentPath = ...
Now to use this feature, we'll call our new shortest path function. If the resulting distance is 0, there are no enemies near us, and there's no penalty. We also won't apply a penalty if our stun is available. Otherwise, the shorter the path, the stiffer the penalty. Then we mix it in with the other scores.
evaluateWorld :: World -> Float
evaluateWorld w =
...
nearestEnemyDistanceScore +
...
where
...
nearestEnemyDistance = length $ getShortestPathToTargetsWithLimit
(worldBoundaries w)
playerLoc
(Set.fromList activeEnemyLocations)
(Just 4)
nearestEnemyDistanceScore =
if nearestEnemyDistance == 0 || stunAvailable then 0.0
else -100.0 * (fromIntegral (5 - nearestEnemyDistance))
We'll also drop the enemy manhattan distance weight to -5.0.
Results
From this change, our player suddenly appears much more intelligent! It will back away from enemies when it is missing its stun. It will use the stun and go past the enemy when appropriate.
There are still ways we could improve the AI. It doesn't account for future space to retreat when running away. It sometimes uses the stun too early, when it might be better to wait for more enemies to come into range. But it's not clear how we could improve it by tweaking the weights. This means it's time to consider machine learning as an option to get better weights!
Conclusion
Next week, we'll re-acquaint ourselves with the basics of machine learning and Tensor Flow. This will set us up to write a program that will determine our AI weights.
We're going to start working with Tensor Flow next week! To make sure you can keep up, download our Haskell Tensor Flow Guide. It'll help you with the basics of making this complex Haskell library work.
Building a Better Brain
In the last few weeks, we've focused a lot on the player AI for our game. We've used a few more advanced tricks to help our player navigate the maze using drills. But that's come at a performance cost. The game can now get a little choppy when there are a lot of enemies, or when our player is far away from the goal. It also takes longer to run our analysis iterations than we would like.
This week, we'll improve the performance of our AI by caching the determined path. Lots of our calculations for shortest path measurements get repeated. We can keep track of these, and avoid re-running the full BFS algorithm in a lot of circumstances!
This week, you should take a look at the search-caching
branch on our Github repository for the complete code we're implementing here. We'll focus on changes in the MazeUtils.hs
file.
We're also going to do a little bit of profiling for this article. Profiling your code is an important skill to learn about if you ever want to use Haskell in production. For some other useful skills, check out our Production Checklist!
Profiling Our Code
As alluded to above, we have a pretty good idea of where the performance bottleneck is for our code. But it always pays to be sure. So to double check, we're going to run our code under profiling. We'll go through some of the basics here, but you should also check out this article we did on profiling a while back.
We'll get a readout for our code that will tell us which functions are taking the most time. This will tell us where we can make the most effective improvements. It will also give us a concrete way to prove our improvement later.
To start, we'll need to rebuild our code with stack build --profile
. Be warned this can take a while, since all the libraries also need to be re-built. Then we can re-run the analysis program we used last week:
stack exec -- analyze-game maze_save_2 --enemies +RTS -p
Here's the abbreviated readout in the file analyze-game.EXE.prof:
total time = 32.62 secs
COST CENTRE %time
drillBFS.newParentsMap.\ 21.9
drillBFS.unvisitedNextItems.\ 21.7
drillBFS.newVisitedSet 19.4
getDrillAdjacentItems 6.2
drillBFS 4.5
drillBFS.newSearchQueue 4.0
getDrillAdjacentItems.mkItemFromResult 3.0
bfs.newParentsMap.\ 2.1
bfs.newVisitedSet 2.0
getDrillAdjacentItems.mkItemFromResult.(...) 1.7
drillBFS.unvisitedNextItems 1.4
bfs.unvisitedNextCells.\ 1.1
drillBFS.newParentsMap 1.0
getDrillAdjacentItems.bounds 1.0
bfs 0.6
getAdjacentLocations 0.5
Unsurprisingly, we see that drillBFS
and its helpers are the biggest culprits. They account for the top seven entries on the list and a whopping 82% of the time we spend. The enemy AI calculations come in a distant second at around 6.3% of the time. So let's focus on fixing the player algorithm.
A Basic Cache for the Player
As we try to improve our player AI, there's one big observation we can make. Perhaps some of you already noted this when reading about that AI in the first place. For the most part, our player follows a single path the whole time. We calculate the complete path from start to finish on each player move cycle, but then throw most of it away. The only time we get "blown off" this path is when we have to run away from enemies.
There are only a few circumstances where we change this path! So let's make a PlayerMemory
type that will keep track of it. This should save us a ton of time!
newtype PlayerMemory = PlayerMemory (Maybe [Location])
data Player = Player
{ …
, playerMemory :: PlayerMemory
}
We'll add this memory to our player type. When we initialize it from JSON instances, it should start out empty. There's no need to keep track of this in a save-game file.
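Here's a rough sketch of what that parsing logic might look like. This isn't the repository's actual instance; we're using a trimmed-down Player with only the fields we've referenced in this series, and the JSON key names are hypothetical.
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
type Location = (Int, Int)
newtype PlayerMemory = PlayerMemory (Maybe [Location])
-- Trimmed-down stand-in for illustration; the real Player type has more fields.
data Player = Player
  { playerLocation :: Location
  , playerCurrentStunDelay :: Int
  , playerDrillsRemaining :: Int
  , playerMemory :: PlayerMemory
  }
-- The memory is never read from the save file; it always starts out empty.
instance FromJSON Player where
  parseJSON = withObject "Player" $ \o -> Player
    <$> o .: "location"
    <*> o .: "currentStunDelay"
    <*> o .: "drillsRemaining"
    <*> pure (PlayerMemory Nothing)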
This change will complicate our move API a little bit. It will now produce the PlayerMemory
as an output:
makePlayerMove :: World -> (PlayerMove, PlayerMemory)
Using Our Memory
When it comes to making our move, we first need to put the path into memory. To start, we'll make the PlayerMemory
out of the path we get from BFS.
makePlayerMove :: World -> (PlayerMove, PlayerMemory)
makePlayerMove w =
( PlayerMove finalMoveDirection useStun drillDirection
, ...
)
where
shortestPath = getShortestPathWithDrills …
memoryFromMove = PlayerMemory (Just shortestPath)
...
In general, we'll want to return this "memory". But there's one circumstance where we'll want to invalidate it. When we have to retreat from our enemies, we'll diverge from this ideal path. In this case, we'll return Nothing
. Here's what that logic looks like:
makePlayerMove :: World -> (PlayerMove, PlayerMemory)
makePlayerMove w =
( PlayerMove finalMoveDirection useStun drillDirection
, if emptyCache then (PlayerMemory Nothing) else memoryFromMove
)
where
(finalMoveDirection, useStun, emptyCache) = if not enemyClose
then (shortestPathMoveDirection, False, False)
else if canStun
then (shortestPathMoveDirection, True, False)
else case find (/= shortestPathMoveLocation) possibleMoves of
Nothing -> (DirectionNone, False, True)
Just l -> (getMoveDirection playerLoc l, False, True)
Now let's consider when we use the cached information, as this will let us skip the BFS call altogether! We'll add one more validity check when doing this. We'll ensure that the list is non-empty and that our current location is at the head of the list. Then we can use the tail of the memory list in place of the shortest path call!
makePlayerMove :: World -> (PlayerMove, PlayerMemory)
makePlayerMove w = ...
where
(useCache, cachePath) = case playerMemory currentPlayer of
(PlayerMemory (Just (first : rest))) ->
(first == playerLoc, rest)
_ -> (False, [])
shortestPath = if useCache then cachePath
else getShortestPathWithDrills ...
The last thing we need is to ensure that the cache goes back into memory. This is a simple modification of our function for making the player move:
modifyWorldForPlayerMove :: World -> Location -> PlayerMemory -> World
modifyWorldForPlayerMove w newLoc memory = ...
where
currentPlayer = worldPlayer w
playerWithMemory = currentPlayer {playerMemory = memory}
playerAfterMove = movePlayer newLoc playerWithMemory
...
Now we can run our analysis again. We'll see that our player's AI functions are still the biggest contributor. But the percentage has gone down a lot. They now only take up around 55% of our total time, instead of 82%! Meanwhile, the percentage of time from the normal BFS functions is now up to around 35%. Most importantly, the total time for the analysis declined roughly five-fold. On the first run, it was 32.62 seconds, and it now only takes 6.79 seconds, a huge improvement!
total time = 6.79 secs
COST CENTRE %time
drillBFS.unvisitedNextItems.\ 14.3
drillBFS.newParentsMap.\ 14.2
drillBFS.newVisitedSet 12.6
bfs.newParentsMap.\ 9.9
bfs.newVisitedSet 9.2
bfs.unvisitedNextCells.\ 5.7
getDrillAdjacentItems 4.3
drillBFS.newSearchQueue 2.8
getAdjacentLocations 2.8
drillBFS 2.6
bfs 2.6
getDrillAdjacentItems.mkItemFromResult 2.0
bfs.newSearchQueue 1.8
getDrillAdjacentItems.mkItemFromResult.(...) 1.1
bfs.unwindPath 1.1
bfs.unvisitedNextCells 1.0
drillBFS.unvisitedNextItems 0.9
bfs.newParentsMap 0.7
Conclusion
Profiling is an important tool we can use for improving our code, no matter what language we're working in. When our program isn't performing how we like, we have to be sure to address the right parts of it. It may have been tempting to make a different assumption from the start. Since there are many enemy characters, it would be natural to tackle that algorithm first. But our profiling output made it clear that the player AI was the problem.
Next week, we'll start exploring different AI concepts. We'll start moving towards a kind of AI that can be machine-learned. Our code will be simpler, but our product won't be as good, at least at the start! But we'll start getting used to the way an AI can evaluate positions.
For more useful resources in improving your Haskell skills, download our Production Checklist! It has a lot of different tools and libraries to check out!
Moving Towards ML: Evaluation Functions
Before we get started, here's a reminder that today (August 5th) is the last day of enrollment for our Haskell From Scratch course! Sign-ups close at midnight Pacific time! Don't miss out!
This week, we're going to start taking our AI in a somewhat new direction. Right now, we're hard-coding specific decisions for our player to make. But this week, we'll make a more general function for evaluating different positions. Our initial results will be inferior to the AI we've hand-coded. But we'll set ourselves up to have a much better AI in the future by applying machine learning.
For more details on the code for this article, take a look at the evaluation-game-function
branch on our Github Repository! This article also starts our move towards machine learning related concepts. So now would be a good time to review our Haskell AI Series. You can download our Tensor Flow Guide to learn more about using Haskell and Tensor Flow!
Evaluation as a Strategy
Currently, our AI follows a strict set of rules. It performs pretty well for the current problem space. But suppose circumstances changed. Suppose we use different maze structures. Or we could add a completely new feature to the game. In these cases, we might need a completely different set of ideas to build a competent AI.
Our new strategy will be much more general. We'll supply our AI with a function that can evaluate a particular board position. That is, it will look at the world, and create a numeric output scoring it. Then our brain will look at all possible moves, score each position, and choose the move with the best result.
If game rules change, we'll need to rethink the evaluation function. But, by making the problem one of numbers to numbers, it'll be easier to use machine learning (instead of our own logic) to devise this function. This way, we can radically change the nature of the game, and we won't need to do too much manual work to change the AI. We might need to add new features (as we'll discuss later). But otherwise we would just need to re-train the evaluation function.
Top Down Development
To implement this approach, we'll put the "function" in functional programming. We'll start by outlining our decision making process with a series of type signatures. First, let's remember that our overarching goal is a function that takes a World
and gives us a PlayerMove
:
makePlayerMove :: World -> PlayerMove
We should first determine the set of possible moves:
possibleMoves :: World -> [PlayerMove]
Then we'll need to calculate the new World
from each of those moves. (We won't go over this function in this article. It mainly consists of refactoring code we already have for manipulating the game).
applyPlayerMove :: World -> PlayerMove -> World
Then we'll score each of those resulting worlds. This is where the real "brain" is going to live now:
evaluateWorld :: World -> Float
Now that we know the functions we're writing, we can already implement makePlayerMove
. We'll assume our helpers already exist and then we apply the process outlined above:
makePlayerMove :: World -> PlayerMove
makePlayerMove w = bestMove
where
-- 1. Get our Moves
allMoves = possibleMoves w
-- 2. See what the results of each move are
possibleWorlds = applyPlayerMove w <$> allMoves
-- 3. Score each resulting world
scores = evaluateWorld <$> possibleWorlds
-- 4. Pair each move with its score and choose the best one
movesWithScores = zip allMoves scores
bestMove = fst $ maximumBy (\(_, score1) (_, score2) ->
compare score1 score2) movesWithScores
This will compile, and we can now move on to the individual components.
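As a brief aside (our own note, not from the original article), the comparison lambda can also be written with comparing from Data.Ord, which reads a bit more cleanly:
import Data.List (maximumBy)
import Data.Ord (comparing)
-- Equivalent to the lambda above: pick the move with the highest score.
pickBestMove :: [(PlayerMove, Float)] -> PlayerMove
pickBestMove = fst . maximumBy (comparing snd)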
Getting Possible Moves
Let's start with getting all the possible moves. When it comes to movement, we generally have five options: stand still, or move in one of four directions. If we're out of drills, or near the boundary of the world, our options can be restricted. But we always have the sure option of standing still, so let's start with that:
possibleMoves :: World -> [PlayerMove]
possibleMoves w = …
where
standStillMove = PlayerMove DirectionNone False DirectionNone
...
Now in every direction, we'll have a Maybe
move possibility. If it's a WorldBoundary
, we'll get Nothing
. Otherwise if it's a wall, then we'll have a possible move as long as a drill is available. Otherwise the move is possible, and we won't need a drill. We'll wrap these behaviors in a helper function, and then it's easy to use that in each direction:
possibleMoves :: World -> [PlayerMove]
possibleMoves w = baseMoves
where
standStillMove = PlayerMove DirectionNone False DirectionNone
player = worldPlayer w
bounds = (worldBoundaries w) Array.! (playerLocation player)
possibleMove :: (CellBoundaries -> BoundaryType) ->
MoveDirection -> Maybe PlayerMove
possibleMove boundaryFunc direction =
case boundaryFunc bounds of
WorldBoundary -> Nothing
Wall _ -> if playerDrillsRemaining player > 0
then Just $ PlayerMove direction False direction
else Nothing
AdjacentCell _ -> Just $
PlayerMove direction False DirectionNone
upMove = possibleMove upBoundary DirectionUp
rightMove = possibleMove rightBoundary DirectionRight
downMove = possibleMove downBoundary DirectionDown
leftMove = possibleMove leftBoundary DirectionLeft
baseMoves = standStillMove : (catMaybes [upMove, rightMove, downMove, leftMove])
Now we have to factor in that each move can also apply the stun if it's available.
possibleMoves :: World -> [PlayerMove]
possibleMoves w = baseMoves ++ stunMoves
where
...
baseMoves = standStillMove : (catMaybes [upMove, rightMove, downMove, leftMove])
stunMoves = if playerCurrentStunDelay player /= 0 then []
else [ m { activateStun = True } | m <- baseMoves ]
And now we've got our moves!
Evaluating the Game Position
Now let's start tackling the problem of evaluating a particular game situation. Any manual solution we come up with here is likely to have problems. This is where machine learning will come in. But here's the general approach we want.
First, we'll select particular "features" of the world. For instance, how far away are we from the end of the maze? How many enemies are within our stun radius? We'll consider all these elements, and then come up with a "weight" for each feature. A weight is a measurement of whether that feature makes the position "good" or "bad". Then, we'll add together the weighted feature values to get a score. So here's a list of the features we're going to use:
- How close are we (in maze search terms) from the target location? This will use pure BFS and it will not account for using drills.
- How close are we in manhattan distance terms from the target location?
- Is there an active enemy on the same square as the player (this will receive a heavy negative weight!)
- How many enemies are within our stun radius?
- Is our stun available?
- How many drills do we have left?
Let's start by getting all these features:
evaluateWorld :: World -> Float
evaluateWorld w = ...
where
player = worldPlayer w
playerLoc@(px, py) = playerLocation player
radius = stunRadius . playerGameParameters . worldParameters $ w
goalLoc@(gx, gy) = endLocation w
activeEnemyLocations = enemyLocation <$>
(filter (\e -> enemyCurrentStunTimer e == 0) (worldEnemies w))
onActiveEnemy = playerLocation player `elem` activeEnemyLocations
shortestPathLength = length $
getShortestPath (worldBoundaries w) playerLoc goalLoc
manhattanDistance = abs (gx - px) + abs (gy - py)
stunAvailable = playerCurrentStunDelay player == 0
numNearbyEnemies = length
[ el | el@(elx, ely) <- activeEnemyLocations,
abs (elx - px) <= radius && abs (ely - py) <= radius ]
drillsRemaining = playerDrillsRemaining player
Now let's move on to assigning scores. If our player is on the same square as an active enemy, we lose. So let's give this a weight of -1000. Conversely, the closer we get to the target, the closer we are to winning. So let's devise a function where if that distance is 0, the score is 1000. Then the farther away we get, the more points we lose. Let's say, 20 points per square. For manhattan distance, we'll use a strict penalty, rather than reward:
evaluateWorld :: World -> Float
evaluateWorld w = ...
where
...
onActiveEnemyScore = if onActiveEnemy then -1000.0 else 0.0
shortestPathScore = 1000.0 - (20.0 * (fromIntegral shortestPathLength))
manhattanDistanceScore = (-5.0) * (fromIntegral manhattanDistance)
Now we want to generally reward having our power ups available to us. This will stop the bot from needlessly using them and also reward it for picking up new drills. We'll also penalize having enemies too close to us.
evaluateWorld :: World -> Float
evaluateWorld w = ...
where
...
stunAvailableScore = if stunAvailable then 80.0 else 0.0
numNearbyEnemiesScore = -100.0 * (fromIntegral numNearbyEnemies)
drillsRemainingScore = 30.0 * (fromIntegral drillsRemaining)
And to complete the function, we'll just add these together:
evaluateWorld :: World -> Float
evaluateWorld w =
onActiveEnemyScore +
shortestPathScore +
manhattanDistanceScore +
stunAvailableScore +
numNearbyEnemiesScore +
drillsRemainingScore
How Well Does it Work?
When we run the game now with the AI active, we see some interesting behaviors. Our bot will generally navigate the maze well. Its path isn't optimal, as we had with drillBFS
. But it makes decent choices about drilling. Its behavior around enemies is a bit strange. It tends to stay away from them, even if they're not actually close in maze distance. This makes it take longer than it needs to.
We still don't have good retreating behavior in certain cases. It will often stand still and let an enemy grab it instead of running away.
At this point, we have a couple options for improving the AI. First, we could try tweaking the weights. This will be tedious for us to do manually. This is why we want to apply machine learning techniques to come up with optimal weights.
But the other option is to update the feature space. If we can come up with more intelligent features, we won't need as precise weights.
Conclusion
Next week, we'll try to fix our behavior around enemies. We'll use true maze distance in more places as opposed to manhattan distance. This should give us some big improvements. Then we'll start looking into how we can learn better weights.
We'll be coming up pretty soon on using Tensor Flow for this program! Download our Haskell Tensor Flow Guide to learn more!
And if you're still a Haskell beginner, there's never been a better time to learn! Register for our Haskell From Scratch course to jump-start your Haskell journey! Enrollment ends at midnight TODAY! (August 5th).
Haskell From Scratch Re-Opened!
This week we're taking a break from our Gloss/AI series to make a special announcement! Haskell from Scratch, our beginners course, is now re-opened for enrollment! We've added some more content since the last time we offered it. The biggest addition is a mini-project to help you practice your new skills!
Enrollment will only be open for another week, so don't wait! Next Monday, August 5th, will be the last day to sign up! Enrollments will close at midnight. Once you sign up for the course, you'll have permanent access to the course material. This includes any new content we add in the future. So even if you don't have the time now, it's still a good idea to sign up!
I also want to take this opportunity to tell a little bit of the story of how I learned Haskell. I want to share the mistakes I made, since those motivated me to make this course.
My History with Haskell
I first learned Haskell in college as part of a course on programming language theory. I admired the elegance of a few things in particular. I liked how lists and tuples worked well with the type system. I also appreciated the elegance of Haskell's type definitions. No other language I had seen represented the idea of sum types so well. I also saw how useful pattern matching and recursion were. They made it very easy to break problems down into manageable parts.
After college, I had the idea for a code generation project. A college assignment had taught me some useful Haskell libraries for the task. So I got to work writing some Haskell. At first things were quite haphazard. Eventually though, I developed some semblance of test driven development and product organization.
About nine months into that project, I had the great fortune of landing a Haskell project at my day job. As I ramped up on this project, I saw how deficient my knowledge was in a lot of areas. I realized then a lot of the mistakes I had been making while learning the language. This motivated me to start the Monday Morning Haskell blog.
Main Advice
Of course, I've tried to incorporate my learnings throughout the material on this blog. But if I had to distill the key ideas, here's what they'd be.
First, learn tools and project organization early! Learn how to use Stack and/or Cabal! For help with this, you can check out our free Stack mini-course! After several months on my side project, I had to start from scratch to some extent. The only "testing" I was doing was running some manual executables and commands in GHCI. So once I learned more about these tools, I had to re-work a lot of code.
Second, it helps a lot to have some kind of structure when you're first learning the language. Working on a project is nice, but there are a lot of unknown-unknowns out there. You'll often find a "solution" for your problem, only to see that you need a lot more knowledge to implement it. You need to have a solid foundation on the core concepts before you can dive in on anything. So look for a source that provides some kind of structure to your Haskell learning, like a book (or an online course!).
Third, let's get to monads. They're an important key to Haskell and widely misunderstood. But there are a couple things that will help a lot. First, learn the syntactic patterns of do-syntax. Second, learn how to use run
functions (runState
, runReaderT
, etc.). These are how you bring monadic expressions into the rest of your code. You can check out our Monads Series for some help on these ideas. (And of course, you'll learn all about monads in Haskell From Scratch!)
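To make that concrete, here's a tiny, self-contained example (our own, not from the original post) showing how runState lets ordinary code kick off a stateful computation and get both the result and the final state back:
import Control.Monad.State
-- Counts down from the current state to zero, collecting each value we pass.
countdown :: State Int [Int]
countdown = do
  n <- get
  if n <= 0
    then return []
    else do
      put (n - 1)
      rest <- countdown
      return (n : rest)
main :: IO ()
main = print (runState countdown 3) -- prints ([3,2,1],0)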
Finally, ask for help earlier! I still don't plug into the Haskell network as much as I should. There are a lot of folks out there who are more than willing to help. Freenode is a great place, as is Reddit and even Twitter!
Conclusion
There's never been a better time to start learning Haskell! The language tools have developed a ton in the last few years and the community is growing stronger. And of course, we've once again opened up our Haskell From Scratch Beginners Course! You don't need any Haskell experience to take this course. So if you always wanted to learn more about Haskell but needed more organization, this is your chance!
If you want to stay up to date with the latest at Monday Morning Haskell, make sure to Subscribe to our mailing list! You'll hear the latest about upcoming articles, as well as any new course offerings. You'll also get access to our Subscriber Resources.
Analyzing Our Parameters
Our last couple articles have focused on developing an AI for the player character in our game. It isn't perfect, but it's a decent approximation of how a human would try to play the game. This means we can now play iterations of the game without any human involvement. And by changing the parameters of our world, we can play a lot of different versions of the game.
Our goal for this week will be to write some simple analysis functions. These will play through the game without needing to display anything on the screen. Then we'll be able to play different versions in quick succession and compare results.
As always, the code for this project is on a Github Repository. For this article, take a look at the analyze-game
branch.
If you're completely new to Haskell, a simple game like this is a great way to get started! But you should start with our Beginners Checklist! It'll help you get everything set up with the language on your local machine! Then you can move onto our Liftoff Series to learn more about Haskell's mechanics.
Generating a Result
The first thing we need is a function that takes a world state and generates a result for it. Our game does have a degree of randomness. But once we fix the starting random generator for a given world, everything is deterministic. This means we need a function like:
runGameToResult :: World -> GameResult
We'll want to use our updateFunc
from the main game runner. This is our "evolution" function. Its job is to go from one World
state to another. It evolves the game over the course of one timestep by allowing each of the agents to make a decision (or wait). (Note we don't use the Float
parameter in our game. It's just needed by Gloss).
updateFunc :: Float -> World -> World
Since we want to track an ever evolving stateful variable, we'll use the State
monad. For each iteration, we'll change the world using this update step. Then we'll check its result and see if it's finished. If not, we'll continue to run the game.
runGameToResult :: World -> GameResult
runGameToResult = evalState runGameState
where
runGameState :: State World GameResult
runGameState = do
modify (updateFunc 1.0)
currentResult <- gets worldResult
if currentResult /= GameInProgress
then return currentResult
else runGameState
Analysis: Generating World Iterations
Now that we can run a given world to its conclusion, let's add another step to the process. We'll run several different iterations with any given set of parameters on a world. Each of these will have a different set of starting enemy locations and drill power-ups. Let's make a function that will take a random generator and a "base world". It will derive a new world with random initial enemy positions and drill locations.
generateWorldIteration :: World -> StdGen -> World
We'll use a helper function from our game that generates random locations in our maze. It's stateful over the random generator.
generateRandomLocation :: (Int, Int) -> State StdGen Location
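We won't rely on the game's exact implementation here, but a minimal sketch of such a helper might look like this (assuming a Location is just a pair of Ints; the real version may also avoid special squares like walls or the goal):
import Control.Monad.State (State, state)
import System.Random (StdGen, randomR)
type Location = (Int, Int)
-- Picks a random location within the maze bounds, threading the StdGen through State.
generateRandomLocation :: (Int, Int) -> State StdGen Location
generateRandomLocation (rowCount, columnCount) = do
  x <- state (randomR (0, columnCount - 1))
  y <- state (randomR (0, rowCount - 1))
  return (x, y)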
So first let's get all our locations:
generateWorldIteration :: World -> StdGen -> World
generateWorldIteration w gen1 = ...
where
params = worldParameters w
rowCount = numRows params
columnCount = numColumns params
enemyCount = numEnemies params
drillCount = numDrillPowerups params
(enemyLocations, gen2) = runState
(sequence
(map
(const (generateRandomLocation (rowCount, columnCount)))
[1..enemyCount])
)
gen1
(drillLocations, gen3) = runState
(sequence
(map
(const (generateRandomLocation (rowCount, columnCount)))
[1..drillCount])
)
gen2
...
Then we have to use the locations to generate our different enemies. Last, we'll plug all these new elements into our base world and return it!
generateWorldIteration :: World -> StdGen -> World
generateWorldIteration w gen1 = w
{ worldEnemies = enemies
, worldDrillPowerUpLocations = drillLocations
, worldRandomGenerator = gen3
, worldTime = 0
}
where
...
(enemyLocations, gen2) = ...
(drillLocations, gen3) = …
enemies = mkNewEnemy (enemyGameParameters params) <$> enemyLocations
Analysis: Making Parameter Sets
For our next order of business, we want to make what we'll call a parameter set. We want to run the game with different parameters each time. For instance, we can take a base set of parameters, and then change the number of enemies present in each one:
varyNumEnemies :: GameParameters -> [GameParameters]
varyNumEnemies baseParams = newParams <$> allEnemyNumbers
where
baseNumEnemies = numEnemies baseParams
allEnemyNumbers = [baseNumEnemies..(baseNumEnemies + 9)]
newParams i = baseParams { numEnemies = i }
We can do the same for the number of drill pickups:
varyNumDrillPickups :: GameParameters -> [GameParameters]
varyNumDrillPickups baseParams = newParams <$> allDrillNumbers
where
baseNumDrills = numDrillPowerups baseParams
allDrillNumbers = [baseNumDrills..(baseNumDrills + 9)]
newParams i = baseParams { numDrillPowerups = i }
Finally, we can have a different cooldown time for our player's stun ability.
varyPlayerStunCooldown :: GameParameters -> [GameParameters]
varyPlayerStunCooldown baseParams = newParams <$> allCooldowns
where
basePlayerParams = playerGameParameters baseParams
baseCooldown = initialStunTimer basePlayerParams
allCooldowns = [(baseCooldown - 4)..(baseCooldown + 5)]
newParams i = baseParams
{ playerGameParameters = basePlayerParams { initialStunTimer = i }}
If you fork our code, you can try altering some other parameters. You can even try combining certain parameters to see what the results are!
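If you do experiment with combinations, one simple approach (not in the repository, just a sketch building on the functions above) is to produce a grid of parameter sets:
-- Vary enemy count and drill count together, using the helpers defined above.
varyEnemiesAndDrills :: GameParameters -> [GameParameters]
varyEnemiesAndDrills baseParams =
  [ ps { numDrillPowerups = d }
  | ps <- varyNumEnemies baseParams
  , d <- [baseDrills .. (baseDrills + 4)]
  ]
  where
    baseDrills = numDrillPowerups baseParams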
Tying It Together
We've done most of the hard work now. We'll have a function that takes a number of iterations per parameter set, the base world, and a generator for those sets. It'll match up each parameter set to the number of wins the player gets over the course of the iterations.
runAllIterations
:: Int
-> World
-> (GameParameters -> [GameParameters])
-> [(GameParameters, Int)]
runAllIterations numIterations w paramGenerator =
map countWins results
where
aiParams = (worldParameters w) { usePlayerAI = True }
paramSets = paramGenerator aiParams
runParamSet :: GameParameters -> [GameResult]
runParamSet ps = map
(runGame w {worldParameters = ps })
[1..numIterations]
runGame :: World -> Int -> GameResult
runGame baseWorld seed = runGameToResult
(generateWorldIteration baseWorld (mkStdGen seed))
results :: [(GameParameters, [GameResult])]
results = zip paramSets (map runParamSet paramSets)
countWins :: (GameParameters, [GameResult]) -> (GameParameters, Int)
countWins (gp, gameResults) =
(gp, length (filter (== GameWon) gameResults))
We need one more function. It will read an input file and apply our steps over a particular parameter group. Here's an example with varying the number of enemies:
analyzeNumEnemies :: FilePath -> IO ()
analyzeNumEnemies fp = do
world <- loadWorldFromFile fp
let numIterations = 10
putStrLn "Analyzing Different Numbers of Enemies"
let results = runAllIterations numIterations world varyNumEnemies
forM_ results $ \(gp, numWins) -> putStrLn $
"With " ++ (show (numEnemies gp)) ++ " Enemies: " ++ (show numWins)
++ " wins out of " ++ (show numIterations) ++ " iterations."
Now we're done! In the appendix, you can find some basic results of our investigation!
Conclusion
Soon, we'll take our analysis steps and apply them in a more systematic way. We'll try to gauge the difficulty of a particular game level. Then we can make levels that get more and more challenging!
But first, we'll start exploring a few ways we can improve the player and enemy AI abilities. We'll start by implementing some basic caching mechanisms in our breadth first search. Then we'll consider some other AI patterns besides simple BFS.
For a review of the code in this article, take a look at our Github Repository. You'll want to explore the analyze-game
branch!
We'll soon be exploring machine learning a bit more as we try to improve the game. Make sure to read our series on Haskell and AI to learn more! Download our Haskell Tensorflow Guide to see how we can use tensor flow with Haskell!
Appendix
With 4 drills and 10 cooldown time:
Analyzing Different Numbers of Enemies
With 4 Enemies: 10 wins out of 10 iterations.
With 5 Enemies: 9 wins out of 10 iterations.
With 6 Enemies: 9 wins out of 10 iterations.
With 7 Enemies: 10 wins out of 10 iterations.
With 8 Enemies: 9 wins out of 10 iterations.
With 9 Enemies: 9 wins out of 10 iterations.
With 10 Enemies: 9 wins out of 10 iterations.
With 11 Enemies: 9 wins out of 10 iterations.
With 12 Enemies: 8 wins out of 10 iterations.
With 13 Enemies: 7 wins out of 10 iterations.
With 13 enemies and 10 cooldown time:
With 2 Drills: 5 wins out of 10 iterations.
With 3 Drills: 7 wins out of 10 iterations.
With 4 Drills: 8 wins out of 10 iterations.
With 5 Drills: 8 wins out of 10 iterations.
With 6 Drills: 8 wins out of 10 iterations.
With 7 Drills: 7 wins out of 10 iterations.
With 8 Drills: 8 wins out of 10 iterations.
With 9 Drills: 8 wins out of 10 iterations.
With 10 Drills: 8 wins out of 10 iterations.
With 11 Drills: 8 wins out of 10 iterations.