
Organizing Our Package!


To start off the new year, we're going to look at the process of creating and managing a new Haskell project. After learning the very basics of Haskell, this is one of the first problems you'll face when starting out. Over the next few weeks, we'll look at some programs we can use to manage our packages, such as Cabal, Stack, and Nix.

The first two of these options both use the .cabal file format for the project specification file. So to start this series off, we're going to spend this week going through this format. We'll also discuss, at a high level, what a package consists of and how this format helps us specify it.

If you'd like to skip ahead a little and get to the business of writing a Haskell project, you should take our free Stack mini-course! It will teach you all the most basic commands you'll need to get your first Haskell project up and running. If you're completely new to Haskell, you should first get familiar with the basics of the language! Head to our Liftoff Series for help with that!

What is a Package Anyway?

The .cabal file describes a single Haskell package. So before we understand this file or its format, we should understand what a package is. Generally speaking, a package has one of two goals.

  1. Provide library code that others can use within their own Haskell projects
  2. Create an executable program that others can run on their computers

In the first case, we'll usually want to publish our package on a repository (such as Hackage or Github) so that others can download it and use it. In the second case, we can allow others to download our code and compile it from source code. But we can also create the binary ourselves for certain platforms, and then publish it.

Our package might also include testing code that is internal to the project. This code allows us (or anyone using the source code) to verify the behavior of the code.

So our package contains some combination of these three different elements. The main job of the .cabal file is to describe these elements: the source code files they use, their dependencies, and how they get compiled.

Any package has a single .cabal file, which should bear the name of the package (e.g. MyPackage.cabal). And a .cabal file should only correspond to a single package. That said, it is possible for a single project on your machine to contain many packages. Each sub-package would have its own .cabal file. Armed with this knowledge, let's start exploring the different areas of the .cabal file.

Metadata

The most basic part of the .cabal file is the metadata section at the top. This contains useful information about the package. For starters, it should have the project's name, as well as the version number, author name, and a maintainer email.

name: MyProject
version: 0.1.0.0
author: John Smith
maintainer: john@smith.com

It can also specify information like a license file and copyright owner. Then there are a couple other fields that tell the Cabal package manager how to build the package. These are the cabal-version and the build-type (usually Simple).

license-file: LICENSE
copyright: Monday Morning Haskell 2020
build-type: Simple
cabal-version: >=1.10

The Library

Now the rest of the .cabal file will describe the different code elements of our project. The formats for these sections are all similar, with a few tweaks. The library section describes the library code for our project. That is, the code people would have access to when using our package as a dependency. It has a few important fields.

  1. The "exposed" modules tell us the public API for our library. Anyone using our library as a dependency can import these modules.
  2. "Other" modules are internal source files that other users shouldn't need. We can omit this if there are none.
  3. We also provide a list of "source" directories for our library code. Any module that lives in a sub-directory of one of these gets namespaced.
  4. We also need to specify dependencies. These are other packages, generally ones that live in Hackage, that our project depends on. We can provide various kinds of version constraints on these.
  5. Finally, there is a "default language" section. This is either Haskell2010 or Haskell98. The latter has fewer language features and extensions, so for newer projects you should almost always use Haskell2010.

Here's a sample library section:

library
  exposed-modules:
      Runner
      Router
      Schema
  other-modules:
      Internal.OptionsParser
      Internal.JsonParser
  hs-source-dirs:
      src
  build-depends:
      base >=4.9 && <4.10
  default-language: Haskell2010

A couple other optional fields are ghc-options and default-extensions. The first specifies command line options for GHC that we want to include whenever we build our code. The second specifies common language extensions to use by default. This way, we don't always need {-# LANGUAGE #-} pragmas at the tops of all our files.

library
  ...
  ghc-options:
    -Wall
    -Werror
  default-extensions:
    OverloadedStrings
    FlexibleContexts

The library is the most important section of the file, and probably the one you'll update the most. You'll need to update the build-depends section each time you add a new dependency. And you'll update one of the modules sections every time you make a new file. There are several other fields you can use for various circumstances, but these should be enough to get you started.

Executables

Now we can also provide different kinds of executable code with our package. This allows us to produce a binary program that others can run. We specify such a binary in an "executable" section. Executables have similar fields, such as the source directory, language, and dependency list.

Instead of listing exposed modules and other modules, we specify one file as the main-is for running the program. This file should contain a main expression of type IO () that gets run when executing the binary.

The executable can (and generally should) import our library as a dependency. We use the name of our project in the build-depends section to indicate this.

executable run-my-project
  main-is: RunProject.hs
  hs-source-dirs: app
  build-depends:
      base >=4.9 && <4.10
    , MyProject
  default-language: Haskell2010

As we explore different package managers, we'll see how we can build and run these executables.

Test Suites

There are a couple special types of executables we can make as well. Test suites and benchmarks allow us to test out our code in different ways. Their format is almost identical to executables. You would use the word test-suite or benchmark instead of executable. Then you must also provide a type, which tells Cabal how the test program reports success or failure. In most cases, you'll want exitcode-stdio-1.0, meaning a non-zero exit code indicates failure.

test-suite test-my-project
  type: exitcode-stdio-1.0
  main-is: Test.hs
  hs-source-dirs: test
  build-depends:
      base >=4.9 && <4.10
    , HUnit
    , MyProject
  default-language: Haskell2010

As we mentioned, a test suite is essentially a special type of executable. Thus, it will have a "main" module with a main :: IO () expression. Any testing library will have some kind of special main function that allows us to create test cases. Then we'll have some kind of special test command as part of our package manager that will run our test suites.

Conclusion

That wraps up the main sections of our .cabal file. Knowing the different pieces of our project is the first step towards running our code. Next week, we'll start exploring the different tools that will make use of this project description. We'll start with Cabal, and then move onto Stack.


Happy New Years from MMH!


Tomorrow is New Year's Eve, so this week we're celebrating with a review of everything we've done this year! This was a busy year where we broached a few new topics. We looked a lot at a couple areas where we don't hear much about Haskell development. Let's do a quick recap.

2019 In Review

We started this year by going over some of the basic concepts in Haskell data types. We compared these ideas with data in other languages. This culminated in the release of our Haskell Data Series. This series is a good starting point for Haskell beginners. It can help you get more familiar with a lot of Haskell concepts, by comparison to other languages.

For much of the spring and summer, we then focused on game development with the Gloss library. The end product of this was our Maze Game. It can teach some useful Haskell skills and provides some cheap entertainment!

During the fall, we then used this game as a platform to learn more about AI and machine learning. We explored a couple algorithms for solving our maze by hand. We tried several different approaches to machine-learn an algorithm for the task. We went through quite a bit of ML theory, even if the results weren't always great in practice! Most of the code from this series is in our MazeLearner repository.

We closed out the year by exploring the Rust programming language. This language is like a mix of Haskell and C++. Its syntax is more like C++, but it incorporates many good functional ideas. The result is a language that has many of the nice things we've come to expect from Haskell land. But it's also performant and very accessible to a wider audience.

Upcoming Plans

This next year, we've got a lot planned! We're planning on some major new offerings in our online course selection! There are currently two out there. There's the free Stack mini-course (open now). Then there's the Haskell From Scratch beginner's course, which will reopen soon.

We're now close to launching a new course showcasing some of the more practical uses for Haskell! We're expecting to launch that within the next couple months! We've also got plans in the works for some smaller courses that should go live later in the year.

In the short-term, we've got a few different things planned for the blog as well. We're going to retool the website to give a better appearance, especially for code. We'll also look to have tighter integration of code samples with Github. Expect to see these updates on our permanent content soon!

Topic-wise, you can expect a wide variety of content. We'll spend some time at the start of the year going back to the basics, as we usually do. We'll move on to some more practical elements of Haskell. We'll also come back to AI and machine learning. We'll look at some very simple games, and generalize the idea of agents within them. We'll use these simple games as an easier platform to make complex algorithms work.

Conclusion

There's a lot planned as we move into our 4th year of Haskell! So stay tuned! And remember to subscribe to our mailing list! This will give you access to our Subscriber Resources! You'll also get our monthly newsletter, so you know what's happening!


Collections and Lifetimes in Rust!


Last week we discussed how to use Cargo to create and manage Rust projects. This week we're going to finish up our discussion of Rust. We'll look at a couple common container types: vectors and hash maps. Rust has some interesting ideas for how to handle common operations, as we'll see. We'll also touch on the topic of lifetimes. This is another concept related to memory management, for trickier scenarios.

For a good summary of the main principles of Rust, take a look at our Rust Video Tutorial. It will give you a concise overview of the topics we've covered in our series and more!

Vectors

Vectors, like lists or arrays in Haskell, contain many elements of the same type. It's important to note that Rust vectors store items in memory like arrays. That is, all their elements are in a contiguous space in memory. We refer to them with the parameterized type Vec in Rust. There are a couple ways to initialize them:

let v1: Vec<i32> = vec![1, 2, 3, 4];
let v2: Vec<u32> = Vec::new();

The first vector uses the vec! macro to initialize, and it starts with four elements. The second has no elements. As written, that second vector won't be much use to us: since it's immutable, it will never contain any elements! We can change this by making it mutable, and then use simple operations like push and pop to manipulate it.

let mut v2: Vec<u32> = Vec::new();
v2.push(5);
v2.push(6);
v2.push(7);
v2.push(8);
// pop returns an Option, so we unwrap the value (safe here, since the vector is non-empty)
let x = v2.pop().unwrap();

println!("{:?}", v2);
println!("{} was popped!", x);

Note that pop will remove from the back of the vector. So the printed vector will have 5, 6, and 7. The second line will print 8.

There are a couple different ways of accessing vectors by index. The first way is to use traditional bracket syntax, like we would find in C++. This will panic and crash the program if you are out of bounds!

let v1: Vec<i32> = vec![1, 2, 3, 4];

let first: i32 = v1[0];
let second: i32 = v1[1];

You can also use the get function. This returns the Option type we discussed a couple weeks back. This allows us to handle the error gracefully instead of crashing. In the example below, we print a failure message, rather than crashing as we would with v1[5].

let v1: Vec<i32> = vec![1, 2, 3, 4];

match v1.get(5) {
  Some(elem) => println!("Found an element: {}!", elem),
  None => println!("No element!"),
}

Another neat trick we can do with vectors is to loop through them. This loop will add 2 to each of the integers in our vector. It does this by using the * operator to de-reference the element, like in C++. We must loop over a mutable reference to the vector (which itself must be declared mut) in order to update it.

for i in &mut v1 {
  *i += 2;
}

Ownership with Vectors

Everything works fine with the examples above because we're only using primitive numbers. But if we use non-primitive types, we need to remember to apply the rules of ownership. In general, a vector should own its contents. So when we push an element into a vector, we give up ownership. The following code will not compile because s1 is invalid after pushing!

let s1 = String::from("Hello");
let mut v1: Vec<String> = Vec::new();
v1.push(s1);
println!("{}", s1); // << Doesn't work!

Likewise, we can't get a normal String out of the vector. We can only get a reference to it. This will also cause problems:

let s1 = String::from("Hello");
let mut v1: Vec<String> = Vec::new();
v1.push(s1);

// Bad!
let s2: String = v1[0];

We can fix these easily though, by adding ampersands to make it clear we want a reference:

let s1 = String::from("Hello");
let mut v1: Vec<String> = Vec::new();
v1.push(s1);

let s2: &String = &v1[0];

But if we take a reference to an item in the vector, that reference gets invalidated if we then update the vector again.

let mut v1: Vec<String> = Vec::new();
v1.push(String::from("Hello"));

let s2: &String = &v1[0];
v1.push(String::from("World"));

// Bad! s2 is invalidated by pushing to v1!
println!("{}", s2);

This can be very confusing. But again, the reason lies with the way memory works. The extra push might re-allocate the entire vector. This would invalidate the memory s2 points to.
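
One way around this (just a sketch of the options) is to finish all our mutations before taking the reference, or to clone the element so it owns its own copy of the data:

let mut v1: Vec<String> = Vec::new();
v1.push(String::from("Hello"));
v1.push(String::from("World"));

// Take the reference only after we're done mutating the vector.
let s2: &String = &v1[0];
println!("{}", s2);

// Or clone the element so it no longer points into the vector's memory.
let s3: String = v1[0].clone();
v1.push(String::from("Again"));
println!("{}", s3);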

Hash Maps

Now let's discuss hash maps. At a basic level, these work much like their Haskell counterparts. They have two type parameters, a key and a value. We can initialize them and insert elements:

use std::collections::HashMap;

let mut phone_numbers: HashMap<String, String> = HashMap::new();
phone_numbers.insert(
    String::from("Johnny"),
    String::from("555-123-4567"));
phone_numbers.insert(
    String::from("Katie"),
    String::from("555-987-6543"));

As with vectors, hash maps take ownership of their elements, both keys and values. We access elements in hash maps with the get function. As with vectors, this returns an Option:

match phone_numbers.get("Johnny") {
  Some(number) => println!("{}", number),
  None => println!("No number"),
}

We can also loop through the elements of a hash map. We get a tuple, rather than individual elements:

for (name, number) in &phone_numbers {
  println!("{}: {}", name, number);
}

Updating hash maps is interesting because of the entry function. This allows us to insert a new value for a key, but only if that key doesn't exist. We apply or_insert on the result of entry. In this example, we'll maintain the previous phone number for "Johnny" but add a new one for "Nicholas".

phone_numbers.entry(String::from("Johnny")).
    or_insert(String::from("555-111-1111"));
phone_numbers.entry(String::from("Nicholas")).
    or_insert(String::from("555-111-1111"));

If we want to overwrite the key though, we can use insert. After this example, both keys will use the new phone number.

phone_numbers.insert(
    String::from("Johnny"),
    String::from("555-111-1111"));
phone_numbers.insert(
    String::from("Nicholas"),
    String::from("555-111-1111"));

Lifetimes

There's one more concept we should cover before putting Rust away. This is the idea of lifetimes. Ownership rules get even trickier as your programs get more complicated. Consider this simple function, returning the longer of two strings:

fn longest_string(s1: &String, s2: &String) -> &String {
  if s1.len() > s2.len() {
    s1
  } else {
    s2
  }
}

This seems innocuous enough, but it won't compile! The reason is that Rust doesn't know at compile time which string will get returned. This prevents it from analyzing the ownership of the variables it gets. Consider this invocation of the function:

fn main() {
  let s1 = String::from("A long Hello");
  let result;

  {
    let s2 = String::from("Hello");
    result = longest_string(&s1, &s2);
  }

  println!("{}", result);
}

With this particular set of parameters, things would work out. Since s1 is longer, result would get that reference. And when we print it at the end, s1 is still in scope. But if we flip the strings, then result would refer to s2, which is no longer in scope!

But the longest_string function doesn't know about the scopes of its inputs. And it doesn't know which value gets returned. So it complains at compile time. We can fix this by specifying the lifetimes of the inputs. Here's how we do that:

fn longest_string<'a>(s1: &'a String, s2: &'a String) -> &'a String {
  if s1.len() > s2.len() {
    s1
  } else {
    s2
  }
}

The lifetime annotation 'a is now a generic parameter of the function. Each of the types in that line should read "a reference with lifetime 'a' to a string". Both inputs have the same lifetime. Rust assumes this is the smallest common lifetime between them. It states that the return value must have this same (shortest) lifetime.

When we add this specification, our longest_string function compiles. But the main function we have above will not, since it violates the lifetime rules we gave it! By moving the print statement into the block, we can fix it:

fn main() {
  let s1 = String::from("A long Hello");
  let result;

  {
    let s2 = String::from("Hello");
    result = longest_string(&s1, &s2);
    println!("{}", result);
  }
}

The shortest common lifetime is the time inside the block. And we don't use the result of the function outside the block. So everything works now!

It's a little difficult to keep all these rules straight. Luckily, Rust finds all these problems at compile time! So we won't shoot ourselves in the foot and have difficult problems to debug!

Conclusion

That's all for our Rust series! Rust is an interesting language. We'll definitely come back to it at some point on this blog. For more detailed coverage, watch our Rust Video Tutorial. You can also read the Rust Book, which has lots of great examples and covers all the important concepts!

Next week is New Year's, so we'll be doing a recap of all the exciting topics we've covered in 2019!


Making Rust Projects with Cargo!


In the last couple weeks, we've gotten a good starter look at Rust. We've considered different elements of basic syntax, including making data types. We've also looked at the important concepts of ownership and memory management.

This week, we're going to look at the more practical side of things. We'll explore how to actually build out a project using Cargo. Cargo is the Rust build system and package manager. This is Rust's counterpart to Stack and Cabal. We'll look at creating, building and testing projects. We'll also consider how to add dependencies and link our code together.

If you want to see Cargo in action, take a look at our Rust Video Tutorial. It gives a demonstration of most all the material in this article and then some more!

Cargo

As we mentioned above, Cargo is Rust's version of Stack. It exposes a small set of commands that allow us to build and test our code with ease. We can start out by creating a project with:

cargo new my_first_rust_project

This creates a bare-bones application with only a few files. The first of these is Cargo.toml. This is our project description file, combining the roles of a .cabal file and stack.yaml. Its initial layout is actually quite simple! We have four lines describing our package, and then an empty dependencies section:

[package]
name = "my_first_rust_project"
version = "0.1.0"
authors = ["Your Name <you@email.com>"]
edition = "2018"

[dependencies]

Cargo's initialization assumes you use Git. It will pull your name and email from the global Git config. It also creates a .git directory and a .gitignore file for you.

The only other file it creates is a src/main.rs file, with a simple Hello World application:

fn main() {
  println!("Hello World!");
}

Building and Running

Cargo can, of course, also build our code. We can run cargo build, and this will compile all the code it needs to produce an executable for main.rs. With Haskell, our build artifacts go into the .stack-work directory. Cargo puts them in the target directory. Our executable ends up in target/debug, but we can run it with cargo run.

There's also a simple command we can run if we only want to check that our code compiles. Using cargo check will verify everything without creating any executables. This runs much faster than doing a normal build. You can do something similar with Stack by loading your code in GHCi and reloading it with :r.

Like most good build systems, Cargo can detect whether any important files have changed. If we run cargo build and no files have changed, then it won't re-compile our code.

Adding Dependencies

Now let's see an example of using an external dependency. We'll use the rand crate to generate some random values. We can add it to our Cargo.toml file by specifying a particular version:

[dependencies]
rand = "0.7"

Rust uses semantic versioning to ensure you get dependencies that do not conflict. It also uses a .lock file to ensure that your builds are reproducible. But (to my knowledge at least), Rust does not yet have anything like Stackage. This means you have to specify the actual versions for all your dependencies. This seems to be one area where Stack has a distinct advantage.

Now, "rand" in this case is the name of the "crate". A crate is either an executable or a library. In this case, we'll use it as a library. A "package" is a collection of crates. This is somewhat like a Haskell package. We can specify different components in our .cabal file. We can only have one library, but many executables.

We can now include the random functionality in our Rust executable with the use keyword:

use rand::prelude::Rng;

fn main() {
  let mut rng = rand::thread_rng();
  let random_num: i32 = rng.gen_range(-100, 101);

  println!("Here's a random number: {}", random_num);
}

When we specify the import, rand is the name of the crate. Then prelude is the name of the module, and Rng is the name of the trait we'll be using.

Making a Library

Now let's enhance our project by adding a small library. We'll write this file in src/lib.rs. By Cargo's convention, this file will get compiled into our project's library. We can delineate different "modules" within this file by using the mod keyword and naming a block. We can expose the function within this block by declaring it with the pub keyword. Here's a module with a simple doubling function:

pub mod doubling {
  pub fn double_number(x: i32) -> i32 {
    x * 2
  }
}

We also have to make the module itself pub to export and use it! To use this function in our main binary, we need to import our library. We refer to the library crate with the name of our project. Then we namespace the import by module, and pick out the specific function (or * if we like).

use my_first_rust_project::doubling::double_number;
use rand::prelude::Rng;

fn main() {
  let mut rng = rand::thread_rng();
  let random_num: i32 = rng.gen_range(-100, 101);

  println!("Here's a random number: {}", random_num);
  println!("Here's twice the number: {}", double_number(random_num));
}

Adding Tests

Rust also allows testing, of course. Unlike most languages, Rust has the convention of putting unit tests in the same file as the source code. They go in a different module within that file. To make a test module, we put the cfg(test) annotation before it. Then we mark any test function with a test annotation.

// Still in lib.rs!

#[cfg(test)]
mod tests {
  use crate::doubling::double_number;

  #[test]
  fn test_double() {
    assert_eq!(double_number(4), 8);
    assert_eq!(double_number(5), 10);
  }
}

Notice that it must still import our other module, even though it's in the same file! Integration tests, on the other hand, go in separate files. By convention, Cargo looks for them in a tests directory at the project root.
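
Here's a rough sketch of what such an integration test file could look like (the file name tests/integration_test.rs is our own choice; it just has to live in the tests directory):

// tests/integration_test.rs
// Integration tests import the library crate by its package name.
use my_first_rust_project::doubling::double_number;

#[test]
fn doubles_across_the_crate_boundary() {
  assert_eq!(double_number(21), 42);
}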

Now we can run our tests with cargo test. Because of the annotations, Cargo won't waste time compiling our test code when we run cargo build. This helps save time.

What's Next

We've done a very simple example here. We can see that a lot of Cargo's functionality relies on certain conventions. We may need to move beyond those conventions if our project demands it. You can see more details by watching our Rust Video Tutorial! Next time, we'll wrap up our study of Rust by looking at different container types!


Data Types in Rust: Borrowing from Both Worlds


Last time, we looked at the concept of ownership in Rust. This idea underpins how we manage memory in our Rust programs. It explains why we don't need garbage collection, and it helps a lot with ensuring our program runs efficiently.

This week, we'll study the basics of defining data types. As we've seen so far, Rust combines ideas from both object oriented languages and functional languages. We'll continue to see this trend with how we define data. There will be some ideas we know and love from Haskell. But we'll also see some ideas that come from C++.

For the quickest way to get up to speed with Rust, check out our Rust Video Tutorial! It will walk you through all the basics of Rust, including installation and making a project.

Defining Structs

Haskell has one primary way to declare a new data type: the data keyword. We can also rename types in certain ways with type and newtype, but data is the core of it all. Rust is a little different in that it uses a few different terms to refer to new data types. These all correspond to particular Haskell structures. The first of these terms is struct.

The name struct is a throwback to C and C++. But to start out we can actually think of it as a distinguished product type in Haskell. That is, a type with one constructor and many named fields. Suppose we have a User type with name, email, and age. Here's how we could make this type a struct in Rust:

struct User {
  name: String,
  email: String,
  age: u32,
}

This is very much like the following Haskell definition:

data User = User
  { name :: String
  , email :: String
  , age :: Int
  }

When we initialize a user, we should use braces and name the fields. We access individual fields using the . operator.

let user1 = User {
  name: String::from("James"),
  email: String::from("james@test.com"),
  age: 25,
};

println!("{}", user1.name);

If we declare a struct instance to be mutable, we can also change the value of its fields if we want!

let mut user1 = User {
  name: String::from("James"),
  email: String::from("james@test.com"),
  age: 25,
};

user1.age = 26;

When you're starting out, you shouldn't use references in your structs. Make them own all their data. It's possible to put references in a struct, but it makes things more complicated.
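
To give a rough idea why, here's a sketch of a struct that borrows its data instead of owning it (the UserRef type is hypothetical). It needs an explicit lifetime parameter, a concept we'll cover later in this series:

// The lifetime parameter 'a says the borrowed name must outlive the struct.
struct UserRef<'a> {
  name: &'a str,
  age: u32,
}

fn main() {
  let name = String::from("James");
  let user = UserRef { name: &name, age: 25 };
  println!("{} is {}", user.name, user.age);
}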

Tuple Structs

Rust also has the notion of a "tuple struct". These are like structs except they do not name their fields. The Haskell version would be an "undistinguished product type". This is a type with a single constructor, many fields, but no names. Consider these:

// Rust
struct User(String, String, u32);

-- Haskell
data User = User String String Int

We can destructure and pattern match on tuple structs; there's a quick sketch of this after the example below. We can also use numbers as indices with the . operator, in place of field names.

struct User(String, String, u32);

let user1 = User("james", "james@test.com", 25);

// Prints "james@test.com"
println!("{}", user1.1);

Rust also has the idea of a "unit struct". This is a type that has no data attached to it. These seem a little weird, but they can be useful, as in Haskell:

// Rust
struct MyUnitType;

-- Haskell
data MyUnitType = MyUnitType

Enums

The last main way we can create a data type is with an "enum". In Haskell, we typically use this term to refer to a type that has many constructors with no arguments. But in Rust, an enum is the general term for a type with many constructors, no matter how much data each has. Thus it captures the full range of what we can do with data in Haskell. Consider this example:

// Rust
struct Point(f32, f32);

enum Shape {
  Triangle(Point, Point, Point),
  Rectangle(Point, Point, Point, Point),
  Circle(Point, f32),
}

-- Haskell
data Point = Point Float Float

data Shape =
  Triangle Point Point Point |
  Rectangle Point Point Point Point |
  Circle Point Float

Pattern matching isn't quite as easy as in Haskell. We don't make multiple function definitions with different patterns. Instead, Rust uses the match operator to allow us to sort through these. Each match must be exhaustive, though you can use _ as a wildcard, as in Haskell. Expressions in a match can use braces, or not.

fn get_area(shape: Shape) -> f32 {
  match shape {
    Shape::Triangle(pt1, pt2, pt3) => {
      // Calculate 1/2 base * height
    },
    Shape::Rectangle(pt1, pt2, pt3, pt4) => {
      // Calculate base * height
    },
    Shape::Circle(center, radius) => radius * radius * PI
  }
}
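
As a quick sketch, here's what the wildcard looks like in practice, if we only care about one of the constructors:

fn is_circle(shape: &Shape) -> bool {
  match shape {
    Shape::Circle(_, _) => true,
    _ => false,
  }
}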

Notice we have to namespace the names of the constructors! Namespacing is one element that feels more familiar from C++. Let's look at another.

Implementation Blocks

So far we've only looked at our new types as dumb data, like in Haskell. But unlike Haskell, Rust allows us to attach implementations to structs and enums. These definitions can contain instance methods and other functions. They act like class definitions from C++ or Python. We start off an implementation section with the impl keyword.

As in Python, any "instance" method has a parameter self. In Rust, this reference can be mutable or immutable. (In C++ it's called this, but it's an implicit parameter of instance methods). We call these methods using the same syntax as C++, with the . operator.

impl Shape {
  fn area(&self) -> f32 {
    match self {
      // Implement areas
    }
  }
}

fn main() {
  let shape1 = Shape::Circle(Point(0.0, 0.0), 5.0);
  println!("{}", shape1.area()); 
}

We can also create "associated functions" for our structs and enums. These are functions that don't take self as a parameter. They are like static functions in C++, or any function we would write for a type in Haskell.

impl Shape {
  fn shapes_intersect(s1: &Shape, s2: &Shape) -> bool {
    // The actual intersection logic would go here.
    unimplemented!()
  }
}

fn main() {
  let shape1 = Shape::Circle(Point(0.0, 0.0), 5.0);
  let shape2 = Shape::Circle(Point(10.0, 0.0), 6.0);
  if Shape::shapes_intersect(&shape1, &shape2) {
    println!("They intersect!");
  } else {
    println!("No intersection!");
  };
}

Notice we still need to namespace the function name when we use it!

Generic Types

As in Haskell, we can also use generic parameters for our types. Let's compare the Haskell definition of Maybe with the Rust type Option, which does the same thing.

// Rust
enum Option<T> {
  Some(T),
  None,
}

-- Haskell
data Maybe a =
  Just a |
  Nothing

Not too much is different here, except for the syntax.

We can also use generic types for functions:

fn compare<T>(t1: &T, t2: &T) -> bool

But, you won't be able to do much with generics unless you know some information about what the type does. This is where traits come in.
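
For example, a body like this sketch won't compile until we constrain T, because Rust has no way of knowing whether values of type T can be compared:

// Fails to compile: Rust doesn't know that `>` works for an arbitrary T.
fn compare<T>(t1: &T, t2: &T) -> bool {
  t1 > t2
}

// Works once we add a trait bound.
fn compare_ord<T: PartialOrd>(t1: &T, t2: &T) -> bool {
  t1 > t2
}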

Traits

For the final topic of this article, we'll discuss traits. These are like typeclasses in Haskell, or interfaces in other languages. They allow us to define a set of functions. Types can provide an implementation for those functions. Then we can use those types anywhere we need a generic type with that trait.

Let's reconsider our shape example and suppose we have a different type for each of our shapes.

struct Point(f32, f32);

struct Rectangle {
  top_left: Point,
  top_right: Point,
  bottom_right: Point,
  bottom_left: Point,
}

struct Triangle {
  pt1: Point,
  pt2: Point,
  pt3: Point,
}

struct Circle {
  center: Point,
  radius: f32,
}

Now we can make a trait for calculating the area, and let each shape implement that trait! Here's how the syntax looks for defining it and then using it in a generic function. We can constrain what generics a function can use, as in Haskell:

pub trait HasArea {
  fn calculate_area(&self) -> f32;
}

impl HasArea for Circle {
  fn calculate_area(&self) -> f32 {
    self.radius * self.radius * PI
  }
}

fn double_area<T: HasArea>(element: &T) -> f32 {
  2.0 * element.calculate_area()
}

Also as in Haskell, we can derive certain traits with one line! The Debug trait works like Show:

#[derive(Debug)]
struct Circle { center: Point, radius: f32 }
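
Then we can print our value with the {:?} format specifier (a small sketch, assuming Point derives Debug as well):

let c = Circle { center: Point(0.0, 0.0), radius: 5.0 };

// Prints something like: Circle { center: Point(0.0, 0.0), radius: 5.0 }
println!("{:?}", c);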

What's Next

This should give us a more complete understanding of how we can define data types in Rust. We see an interesting mix of concepts. Some ideas, like instance methods, come from the object oriented world of C++ or Python. Other ideas, like matchable enumerations, come from more functional languages like Haskell.

Next time, we'll start looking at making a project with Rust. We'll consider how we create a project, how to manage its dependencies, how to run it, and how to test it.


Ownership: Managing Memory in Rust


When we first discussed Rust we mentioned how it has a different memory model than Haskell. The suggestion was that Rust allows more control over memory usage, like C++. In C++, we explicitly allocate memory on the heap with new and de-allocate it with delete. In Rust, we do allocate memory and de-allocate memory at specific points in our program. Thus it doesn't have garbage collection, as Haskell does. But it doesn't work quite the same way as C++.

In this article, we'll discuss the notion of ownership. This is the main concept governing Rust's memory model. Heap memory always has one owner, and once that owner goes out of scope, the memory gets de-allocated. We'll see how this works; if anything, it's a bit easier than C++!

For a more detailed look at getting started with Rust, take a look at our Rust video tutorial!

Scope (with Primitives)

Before we get into ownership, there are a couple ideas we want to understand. First, let's go over the idea of scope. If you code in Java, C, or C++, this should be familiar. We declare variables within a certain scope, like a for-loop or a function definition. When that block of code ends, the variable is out of scope. We can no longer access it.

int main() {
  for (int i = 0; i < 10; ++i) {
    int j = 0;

    // Do something with j...
  }

  // This doesn't work! j is out of scope!
  std::cout << j << std::endl;
}

Rust works the same way. When we declare a variable within a block, we cannot access it after the block ends. (In a language like Python, this is actually not the case!)

fn main() {

  let j: i32 = {
    let i = 14;
    i + 5
  };

  // i is out of scope. We can't use it anymore.

  println!("{}", j);
}

Another important thing to understand about primitive types is that we can copy them. Since they have a fixed size, and live on the stack, copying should be inexpensive. Consider:

fn main() {

  let mut i: i32 = 10;
  let j = i;
  i = 15;

  // Prints 15, 10
  println!("{}, {}", i, j);
}

The j variable is a full copy. Changing the value of i doesn't change the value of j. Now for the first time, let's talk about a non-primitive type, String.

The String Type

We've dealt with strings a little by using string literals. But string literals don't give us a complete string type. They have a fixed size. So even if we declare them as mutable, we can't do certain operations like append another string. This would change how much memory they use!

let mut my_string = "Hello";
my_string.append(" World!"); // << This doesn't exist for literals!

Instead, we can use the String type. This is a non-primitive object type that will allocate memory on the heap. Here's how we can use it and append to one:

let mut my_string = String::from("Hello");
my_string.push_str(" World!");

Now let's consider some of the implications of scope with object types.

Scope with Strings

At a basic level, some of the same rules apply. If we declare a string within a block, we cannot access it after that block ends.

fn main() {

  let str_length = {
    let s = String::from("Hello");
    s.len()
  }; // s goes out of scope here

  // Fails!
  println!("{}", s);
}

What's cool is that once our string does go out of scope, Rust handles cleaning up the heap memory for it! We don't need to call delete as we would in C++. We define memory cleanup for an object by declaring the drop function. We'll get into more details with this in a later article.
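
As a small preview (the MyResource type here is purely illustrative), implementing the Drop trait lets us see exactly when that cleanup happens:

struct MyResource {
  name: String,
}

impl Drop for MyResource {
  // Rust calls this automatically when the value goes out of scope.
  fn drop(&mut self) {
    println!("Cleaning up {}!", self.name);
  }
}

fn main() {
  {
    let r = MyResource { name: String::from("temp") };
    println!("Using {}", r.name);
  } // "Cleaning up temp!" prints here.
}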

C++ doesn't automatically de-allocate for us! In this example, we must delete myObject at the end of the for-loop block. We can't de-allocate it after the loop, so forgetting that delete would leak memory!

int main() {
  for (int i = 0; i < 10; ++i) {
    // Allocate myObject
    MyType* myObject = new MyType(i);

    // Do something with myObject …

    // We MUST delete myObject here or it will leak memory!
    delete myObject; 
  }

  // Can't delete myObject here!
}

So it's neat that Rust handles deletion for us. But there are some interesting implications of this.

Copying Strings

What happens when we try to copy a string?

let len = {
  let s1 = String::from("Hello");
  let s2 = s1;
  s2.len()
};

This first version works fine. But we have to think about what will happen in this case:

let len = {
  let mut s1 = String::from("123");
  let mut s2 = s1;
  s1.push_str("456");
  s1.len() + s2.len()
};

For people coming from C++ or Java, there seem to be two possibilities. If copying into s2 is a shallow copy, we would expect the sum length to be 12. If it's a deep copy, the sum should be 9.

But this code won't compile at all in Rust! The reason is ownership.

Ownership

Deep copies are often much more expensive than the programmer intends. So a performance-oriented language like Rust avoids using deep copying by default. But let's think about what will happen if the example above is a simple shallow copy. When s1 and s2 go out of scope, Rust will call drop on both of them. And they will free the same memory! This kind of "double delete" is a big problem that can crash your program and cause security problems.

In Rust, here's what would happen with the above code. Using let s2 = s1 will do a shallow copy. So s2 will point to the same heap memory. But at the same time, it will invalidate the s1 variable. Thus when we try to push values to s1, we'll be using an invalid reference. This causes the compiler error.

At first, s1 "owns" the heap memory. So when s1 goes out of scope, it will free the memory. But declaring s2 gives over ownership of that memory to the s2 reference. So s1 is now invalid. Memory can only have one owner. This is the main idea to get familiar with.

Here's an important implication of this. In general, passing variables to a function gives up ownership. In this example, after we pass s1 over to add_to_len, we can no longer use it.

fn main() {
  let s1 = String::from("Hello");
  let length = add_to_length(s1);

  // This is invalid! s1 has been moved, so we can't use it anymore!
  println!("{}", s1);
}

// After this function, drop is called on s
// This deallocates the memory!
fn add_to_length(s: String) -> i32 {
  5 + s.len() as i32 // len() returns usize, so we cast to i32
}

This seems like it would be problematic. Won't we want to call different functions with the same variable as an argument? We could work around this by giving back the reference through the return value. This requires the function to return a tuple.

fn main() {
  let s1 = String::from("Hello");
  let (length, s2) = add_to_length(s1);

  // Works
  println!("{}", s2);
}

fn add_to_length(s: String) -> (i32, String) {
  (5 + s.len() as i32, s)
}

But this is cumbersome. There's a better way.

Borrowing References

Like in C++, we can pass a variable by reference. We use the ampersand operator (&) for this. It allows another function to "borrow" ownership, rather than "taking" ownership. When it's done, the original reference will still be valid. In this example, the s1 variable re-takes ownership of the memory after the function call ends.

fn main() {
  let s1 = String::from("Hello");
  let length = add_to_length(&s1);

  // Works
  println!("{}", s1);
}

fn add_to_length(s: &String) -> i32 {
  5 + s.len() as i32
}

This works like a const reference in C++. If you want a mutable reference, you can do this as well. The original variable must be mutable, and then you specify mut in the type signature.

fn main() {
  let mut s1 = String::from("Hello");
  let length = add_to_length(&mut s1);

  // Prints "Hello World!"
  println!("{}", s1);
}

fn add_to_length(s: &mut String) -> i32 {
  s.push_str(", World!");
  5 + s.len() as i32
}

There's one big catch though! You can only have a single mutable reference to a variable at a time! Otherwise your code won't compile! This helps prevent a large category of bugs!
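
For example, this sketch won't compile, because we try to use two mutable references to the same string at once:

let mut s1 = String::from("Hello");

let r1 = &mut s1;
let r2 = &mut s1; // Error: cannot borrow `s1` as mutable more than once!

r1.push_str(" World");
r2.push_str("!");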

As a final note, if you want to do a true deep copy of an object, you should use the clone function.

fn main() {
  let s1 = String::from("Hello");
  let s2 = s1.clone();

  // Works!
  println!("{}", s1);
  println!("{}", s2);
}

Notes On Slices

We can wrap up with a couple thoughts on slices. Slices give us an immutable, fixed-size reference to a contiguous part of a collection. Often, we use the string slice type str to refer to part (or all) of a String object. A slice is only a borrowed view of data stored elsewhere. This means slices do not have ownership, and thus they do not de-allocate any memory when they go out of scope.
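
Here's a short sketch of taking a string slice from a String. The slice only borrows part of the original data:

let s = String::from("Hello World");

// A slice borrowing the first five characters of s.
let hello: &str = &s[0..5];

// String literals are themselves slices of type &str.
let literal: &str = "Hello";

println!("{} / {}", hello, literal);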

What's Next?

Hopefully this gives you a better understanding of how memory works in Rust! Next time, we'll start digging into how we can define our own types. We'll start seeing some more ways that Rust acts like Haskell!


Digging Into Rust's Syntax


Last time we kicked off our study of Rust with a quick overview comparing it with Haskell. In this article, we'll start getting familiar with some of the basic syntax of Rust. The initial code looks a bit more C-like. But we'll also see how functional principles like those in Haskell are influential!

For a more comprehensive guide to starting out with Rust, take a look at our Rust video tutorial!

Hello World

As we should with any programming language, let's start with a quick "Hello World" program.

fn main() {
  println!("Hello World!");
}

Immediately, we can see that this looks more like a C++ program than a Haskell program. We can call a print statement without any mention of the IO monad. We see braces used to delimit the function body, and a semicolon at the end of the statement. If we wanted, we could add more print statements.

fn main() {
  println!("Hello World!");
  println!("Goodbye!");
}

There's nothing in the type signature of this main function. But we'll explore more further down.

Primitive Types and Variables

Before we can start getting into type signatures though, we need to understand types more! In another nod to C++ (or Java), Rust distinguishes between primitive types and other more complicated types. We'll see that type names are a bit more abbreviated than in other languages. The basic primitives include:

  1. Various sizes of integers, signed and unsigned (i32, u8, etc.)
  2. Floating point types f32 and f64.
  3. Booleans (bool)
  4. Characters (char). Note these can represent unicode scalar values (i.e. beyond ASCII)

We mentioned last time how memory matters more in Rust. The main distinction between primitives and other types is that primitives have a fixed size. This means they are always stored on the stack. Other types with variable size must go into heap memory. We'll see next time what some of the implications of this are.

Like "do-syntax" in Haskell, we can declare variables using the let keyword. We can specify the type of a variable after the name. Note also that we can use string interpolation with println.

fn main() {
  let x: i32 = 5;
  let y: f64 = 5.5;
  println!("X is {}, Y is {}", x, y);
}

So far, very much like C++. But now let's consider a couple Haskell-like properties. While variables are statically typed, it is typically unnecessary to state the type of the variable. This is because Rust has type inference, like Haskell! This will become more clear as we start writing type signatures in the next section. Another big similarity is that variables are immutable by default. Consider this:

fn main() {
  let x: i32 = 5;
  x = 6;
}

This will throw an error! Once the x value gets assigned its value, we can't assign another! We can change this behavior though by specifying the mut (mutable) keyword. This works in a simple way with primitive types. But as we'll see next time, it's not so simple with others! The following code compiles fine!

fn main() {
  let mut x: i32 = 5;
  x = 6;
}

Functions and Type Signatures

When writing a function, we specify parameters much like we would in C++. We have type signatures and variable names within the parentheses. Specifying the types on your signatures is required. This allows type inference to do its magic on almost everything else. In this example, we no longer need any type signatures in main. It's clear from calling printNumbers what x and y are.

fn main() {
  let x = 5;
  let y = 7;
  printNumbers(x, y);
}

fn printNumbers(x: i32, y: i32) {
  println!("X is {}, Y is {}", x, y);
}

We can also specify a return type using the arrow operator ->. Our functions so far have no return value. This means the actual return type is (), like the unit in Haskell. We can include it if we want, but it's optional:

fn printNumbers(x: i32, y: i32) -> () {
  println!("X is {}, Y is {}", x, y);
}

We can also specify a real return type though. Note that there's no semicolon here! This is important!

fn add(x: i32, y: i32) -> i32 {
  x + y
}

This is because a value should get returned through an expression, not a statement. Let's understand this distinction.

Statements vs. Expressions

In Haskell most of our code is expressions. They inform our program what a function "is", rather than giving a set of steps to follow. But when we use monads, we often use something like statements in do syntax.

addExpression :: Int -> Int -> Int
addExpression x y = x + y

addWithStatements ::Int -> Int -> IO Int
addWithStatements x y = do
  putStrLn "Adding: "
  print x
  print y
  return $ x + y

Rust has both these concepts. But it's a little more common to mix in statements with your expressions in Rust. Statements do not return values. They end in semicolons. Assigning variables with let and printing are statements.

Expressions return values. Function calls are expressions. Blocks enclosed in braces are expressions. Here's our first example of an if expression. Notice how we can still use statements within the blocks, and how we can assign the result of the function call:

fn main() {
  let x = 45;
  let y = branch(x);
}

fn branch(x: i32) -> i32 {
  if x > 40 {
    println!("Greater");
    x * 2
  } else {
    x * 3
  }
}

Unlike Haskell, it is possible to have an if expression without an else branch. But this wouldn't work in the above example, since we need a return value! As in Haskell, all branches need to have the same type. If the branches only have statements, that type can be ().
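
For example, this sketch compiles because the if block is used purely as a statement, and its branches only produce ():

fn main() {
  let x = 45;

  // No else branch needed here; the whole if expression has type ().
  if x > 40 {
    println!("Greater");
  }
}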

Note that an expression can become a statement by adding a semicolon! The following no longer compiles! Rust thinks the block has no return value, because it only has a statement! By removing the semicolon, the code will compile!

fn add(x: i32, y: i32) -> i32 {
  x + y; // << Need to remove the semicolon!
}

This behavior is very different from both C++ and Haskell, so it takes a little bit to get used to it!

Tuples, Arrays, and Slices

Like Haskell, Rust has simple compound types like tuples and arrays (vs. lists for Haskell). These arrays are more like static arrays in C++ though. This means they have a fixed size. One interesting effect of this is that arrays include their size in their type. Tuples meanwhile have similar type signatures to Haskell:

fn main() {
  let my_tuple: (u32, f64, bool) = (4, 3.14, true);
  let my_array: [i8; 3] = [1, 2, 3];
}

Arrays and tuples composed of primitive types are themselves primitive! This makes sense, because they have a fixed size.

Another concept relating to collections is the idea of a slice. This allows us to look at a contiguous portion of an array. Slices use the & operator though. We'll understand why more after the next article!

fn main() {
  let an_array = [1, 2, 3, 4, 5];
  let a_slice = &an_array[1..4]; // Gives [2, 3, 4]
}

What's Next

We've now got a foothold with the basics of Rust syntax. Next time, we'll start digging deeper into more complicated types. We'll discuss types that get allocated on the heap. We'll also learn the important concept of ownership that goes along with that.


Get Ready for Rust!


I'm excited to announce that for the next few weeks, we'll be exploring the Rust language! Rust is a very interesting language to compare to Haskell. It has some similar syntax. But it is not as similar as, say, Elm or Purescript. Rust can also look a great deal like C++. And its similarities with C++ are where a lot of its strong points lie.

In these next few weeks we'll go through some of the basics of Rust. We'll look at things like syntax and building small projects. In this article, we'll do a brief high level comparison between Haskell and Rust. Next time, we'll start digging deeper into some actual code.

To get jump started on your Rust development, take a look at our Starting out with Rust video tutorial!

Why Rust?

Rust has a few key differences that make it better than Haskell for certain tasks and criteria. One of the big changes is that Rust gives more control over the allocation of memory in one's program.

Haskell is a garbage collected language. The programmer does not control when items get allocated or deallocated. Every so often, your Haskell program will stop completely. It will go through all the allocated objects, and deallocate ones which are no longer needed. This simplifies our task of programming, since we don't have to worry about memory. It helps enable language features like laziness. But it makes the performance of your program a lot less predictable.

I once proposed that Haskell's type safety makes it good for safety-critical programs. There's still some substance to this idea. But the specific example I suggested was a self-driving car, a complex real-time system, and the performance unknowns of Haskell make it a poor choice for such real-time systems.

With more control over memory, a programmer can make more assertions over performance. One could assert that a program never uses too much memory. And they'll also have the confidence that it won't pause mid-calculation. Besides this principle, Rust is also made to be more performant in general. It strives to be like C/C++, perhaps the most performant of all mainstream languages.

Rust is also currently more popular with programmers. A larger community correlates to certain advantages, like a broader ecosystem of packages. Companies are more likely to use Rust than Haskell since it will be easier to recruit engineers. It's also a bit easier to bring engineers from non-functional backgrounds into Rust.

Similarities

That said, Rust still has a lot in common with Haskell! Both languages embrace strong type systems. They view the compiler as a key element in testing the correctness of our program. Both embrace useful syntactic features like sum types, typeclasses, polymorphism, and type inference. Both languages also use immutability to make it easier to write correct programs.

What's Next?

Next time, we'll start digging into the language itself. We'll go over some basic examples that show some of the important syntactic points about Rust. We'll explore some of the cool ways in which Rust is like Haskell, but also some of the big differences.


Looking Ahead with More Steps!


In last week's article, we set ourselves up to make our agent use temporal difference learning. But TD is actually a whole family of potential learning methods we can use. They intertwine with other concepts in a bigger category of reinforcement learning algorithms.

In this article, we'll consider a couple possible TD approaches. We'll also examine a bit of theory surrounding other reinforcement learning topics.

For a more high level overview of using Haskell and AI, take a look at our Haskell AI Series! This series will also help you better grasp some of the basics of TensorFlow.

One Step Temporal Difference

Temporal difference learning has one general principle. The evaluation of the current game position should be similar to the evaluation of positions in the future. So at any given step we have our "current" evaluation. Then we have a "target" evaluation based on the future. We want to train our network so that the current board gets evaluated more like the target value.

We can see this in the way we defined our model. The tdTrainStep takes two different values, the target evaluation and the current evaluation.

data TDModel = TDModel
  { …
  , tdTrainStep :: TensorData Float -> TensorData Float -> Session ()
  }

And in fact, doing this calculation isn't so different from what we've done before. We'll take the difference between these evaluations, square it, and use reduceSum. This gives our loss function. Then we'll have TensorFlow minimize the loss function.

createTDModel :: Session TDModel
createTDModel = do
  ...
  -- Train Model
  targetEval <- placeholder (Shape [1])
  currentEval <- placeholder (Shape [1])
  let diff = targetEval `sub` currentEval
  let loss = reduceSum (diff `mul` diff)
  trainer <- minimizeWith
    adam loss [hiddenWeights, hiddenBias, outputWeights, outputBias]
  let trainStep = \targetEvalFeed currentEvalFeed ->
        runWithFeeds [feed targetEval targetEvalFeed, feed currentEval currentEvalFeed] trainer
  return $ TDModel
    { ...
    , tdTrainStep = trainStep
    }

Let's now recall how we got our target value last week. We looked at all our possible moves, and used them to advance the world one step. We then took the best outcome out of those, and that was our target value. Because we're advancing one step into the world, we call this "one-step" TD learning.

Adding More Steps

But there's no reason we can't look further into the future! We can consider what the game will look like in 2 moves, not just one move! To do this, let's generalize our function for stepping forward. It will be stateful over the same parameters as our main iteration function. But we'll call it in a way so that it doesn't affect our main values.

We'll make one change to our approach from last time. If a resulting world is over, we'll immediately put the "correct" evaluation value. In our old approach, we would apply this later. Our new function will return the score from advancing the game, the game result, and the World at this step.

advanceWorldAndGetScore :: Float -> TDModel
  -> StateT (World, StdGen) Session (Float, GameResult, World)
advanceWorldAndGetScore randomChance model = do
  (currentWorld, gen) <- get
  let allMoves = possibleMoves currentWorld
  let newWorlds = fst <$> map ((flip stepWorld) currentWorld) allMoves
  allScoresAndResults <- Data.Vector.fromList <$>
    (forM newWorlds $ \w -> case worldResult w of
      GameLost -> return (0.0, GameLost)
      GameWon -> return (1.0, GameWon)
      GameInProgress -> do
        let worldData = encodeTensorData
              (Shape [1, inputDimen]) (vectorizeWorld8 w)
        scoreVector <- lift $ (tdEvaluateWorldStep model) worldData
        return $ (Data.Vector.head scoreVector, GameInProgress))

  let (chosenIndex, newGen) = bestIndexOrRandom
                                allScoresAndResults gen 
  put (newWorlds !! chosenIndex, newGen)
  let (finalScore, finalResult) = allScoresAndResults ! chosenIndex
  return $ (finalScore, finalResult, newWorlds !! chosenIndex)
  where
    -- Same as before, except with resultOrdering
    bestIndexOrRandom :: Vector (Float, GameResult) -> StdGen
      -> (Int, StdGen)
    ...

    -- First order by result (Win > InProgress > Loss), then score
    resultOrdering :: (Float, GameResult) -> (Float, GameResult)
      -> Ordering
    ...

Now we'll call this from our primary iteration function. It seems a little strange. We unwrap the World from our state only to re-wrap it in another state call. But it will make more sense in a second!

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  (currentWorld, gen) <- get

  ((chosenNextScore, finalResult, nextWorld), (_, newGen)) <-
    lift $ runStateT
      (advanceWorldAndGetScore randomChance model)
      (currentWorld, gen)

So at the moment, our code is still doing one-step temporal difference. But here's the key. We can now sequence our state action to look further into the future. We'll then get many values to compare for the score. Here's what it looks like for us to look two moves ahead and take the average of all the scores we get:

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  (currentWorld, gen) <- get

  let numSteps = 2
  let repeatedUpdates = sequence $ replicate numSteps
        (advanceWorldAndGetScore randomChance model)
  (allWorldResults, (_, newGen)) <- lift $
    runStateT repeatedUpdates (currentWorld, gen)

  let allScores = map (\(s, _, _) -> s) allWorldResults
  let averageScore = sum allScores / fromIntegral (length allScores)
  let nextScoreData = encodeTensorData
        (Shape [1]) (Data.Vector.singleton averageScore)
  ...

When it comes to continuing the function though, we only consider the first world and result:

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  let (_, result1, nextWorld1) = Prelude.head allWorldResults
  put (nextWorld1, newGen)
  case result1 of
    GameLost -> return False
    GameWon -> return True
    GameInProgress -> runWorldIteration randomChance model

We could take more steps if we wanted! We could also change how we get our target score. We could give more weight to near-future scores. Or we could give more weight to scores in the far future. These are all just parameters we can tune now. We can now refer to our temporal difference algorithm as "n-step", rather than 1-step.
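As a rough illustration of that tuning (this isn't code from our project), here's one way we could weight the n-step scores with a discount factor so that nearer steps count for more:

discountedTarget :: Float -> [Float] -> Float
discountedTarget gamma scores = weightedSum / totalWeight
  where
    -- Weights are 1, gamma, gamma^2, ... (assumes a non-empty list of scores)
    weights = take (length scores) (iterate (* gamma) 1.0)
    weightedSum = sum (zipWith (*) weights scores)
    totalWeight = sum weights

With gamma below 1, the near future dominates the target. With gamma above 1, the far future does.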

Monte Carlo vs. Dynamic Programming

With different parameters, our TD approach can look like other common learning approaches. Dynamic Programming is an approach where we adjust our weights after each move in the game. We expect rewards for a particular state to be like those of near-future states. We use the term "bootstrapping" for "online" learning approaches like this. TD learning also applies bootstrapping.

However, dynamic programming requires that we have a strong model of the world. That is, we would have to know the probability of getting into certain states from our current state. This allows us to more accurately predict the future. We could apply this approach to our maze game on a small enough grid. But the model size would increase exponentially with the grid size and enemies! So our approach doesn't actually do this! We can advance the world with a particular move, but we don't have a comprehensive model of how the world works.

In this regard, TD learning is more like Monte Carlo learning. This algorithm is "model free". But it is not an online algorithm! We must play through an entire episode of the game before we can update the weights. We could take our "n-step" approach above, and play it out over the course of the entire game. If we then chose to provide the full weighting to the final evaluation, our model would be like Monte Carlo!

In general, the more steps we add to our TD approach, the more it approximates Monte Carlo learning. The fewer steps we have, the more it looks like dynamic programming.

TD Lambda

TD Gammon, the algorithm we mentioned last time, uses a variation of TD learning called "TD Lambda". It involves looking both forward in time as well as backwards. It observes that the best solutions lie between the extremes of one-step TD and Monte Carlo.
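The "forward view" of this idea mixes all the n-step targets together, with weights that decay geometrically in lambda. Here's a rough sketch of that mixing (standard material from the reinforcement learning literature, not code from our project):

lambdaReturn :: Float -> [Float] -> Float
lambdaReturn lambda = go 1.0
  where
    -- Each target gets weight (1 - lambda) * lambda^(n-1). The final target
    -- absorbs whatever weight remains, so the weights always sum to 1.
    go _ [] = 0.0
    go remaining [finalTarget] = remaining * finalTarget
    go remaining (target : rest) =
      remaining * (1 - lambda) * target + go (remaining * lambda) rest

With lambda at 0 this collapses to one-step TD, and with lambda at 1 it becomes the Monte Carlo return.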

Academic literature can help give a more complete picture of machine learning. One great text is Reinforcement Learning, by Sutton and Barto. It's one of the authoritative texts on the topics we've discussed in this article!

What's Next

This concludes our exploration of AI within the context of our Maze Game. We'll come back to AI and Machine Learning again soon. Next week, we'll start tackling a new subject in the realm of functional programming, something we've never looked at before on this blog! Stay tuned!

Read More
James Bowen James Bowen

Setting Up Our Model with Look-Ahead

future_sight.png

Last week we went over some of the basics of Temporal Difference (TD) learning. We explored a bit of the history, and compared it to its cousin, Q-Learning. Now let's start getting some code out there. Since there's a lot in common with Q-Learning, we'll want a similar structure.

This is at least the third different model we've defined over the course of this series. So we can now start observing the patterns we see in developing these algorithms. Here's a quick outline, before we get started:

  1. Define the inputs and outputs of the system.
  2. Define the data model. This should contain the weight variables we are trying to learn. By including them in the model, we can output our results later. It should also contain important Session actions, such as training.
  3. Create the model
  4. Run iterations using our model

We'll follow this outline throughout the article!

If you're new to Haskell and machine learning, a lot of the code we write here won't make sense. You should start off a little easier with our Haskell AI Series. You should also download our Haskell Tensor Flow Guide.

Inputs and Outputs

For our world features, we'll stick with our hand-crafted feature set, but simplified. Recall that we selected 8 different features for every location our bot could move to. We'll stick with these 8 features. But we only need to worry about them for the current location of the bot. We'll factor in look-ahead by advancing the world for our different moves. So the "features" of adjacent squares are irrelevant. This vectorization is easy enough to get using produceLocationFeatures:

vectorizeWorld8 :: World -> V.Vector Float
vectorizeWorld8 w = V.fromList (fromIntegral <$>
  [ lfOnActiveEnemy standStill
  , lfShortestPathLength standStill
  , lfManhattanDistance standStill
  , lfEnemiesOnPath standStill
  , lfNearestEnemyDistance standStill
  , lfNumNearbyEnemies standStill
  , lfStunAvailable standStill
  , lfDrillsRemaining standStill
  ])
  where
    standStill = produceLocationFeatures
      (playerLocation . worldPlayer $ w) w False

We also don't need to be as concerned about exploring the maze with this agent. We'll be defining what its possible moves are at every turn. This is a simple matter of using this function we have from our game:

possibleMoves :: World -> [PlayerMove]

We should also take this opportunity to specify the dimensions of our network. We'll use 20 hidden units:

inputDimen :: Int64
inputDimen = 8

hiddenDimen :: Int64
hiddenDimen = 20

outputDimen :: Int64
outputDimen = 1

Define the Model

Now let's define our data model. As in the past, we'll use a dense (fully-connected) neural network with one hidden layer. This means we'll expose two sets of weights and biases:

data TDModel = TDModel
  { tdHiddenWeights :: Variable Float
  , tdHiddenBias :: Variable Float
  , tdOutputWeights :: Variable Float
  , tdOutputBias :: Variable Float
  ...
  }

We'll also have two different actions to take with our tensor graph, as we had with Q-Learning. The first will be for evaluating a single world state. The second will take an expected score for the world state as well as the actual score for a world state. It will compare them and train our model:

data TDModel = TDModel
  { ...
  , tdEvaluateWorldStep :: TensorData Float -> Session (Vector Float)
  , tdTrainStep :: TensorData Float -> TensorData Float -> Session ()
  }

Building the Model

Now we need to construct this model. We'll start off as always by initializing random variables for our weights and biases. We'll also make a placeholder for our world input:

createTDModel :: Session TDModel
createTDModel = do
  (worldInputVector :: Tensor Value Float) <-
    placeholder (Shape [1, inputDimen])
  hiddenWeights <- truncatedNormal (vector [inputDimen, hiddenDimen])
    >>= initializedVariable
  hiddenBias <- truncatedNormal (vector [hiddenDimen])
    >>= initializedVariable
  outputWeights <- truncatedNormal (vector [hiddenDimen, outputDimen])
    >>= initializedVariable
  outputBias <- truncatedNormal (vector [outputDimen])
    >>= initializedVariable
  ...

Each layer of our dense network consists of a matrix multiplication by the weights, and adding the bias vector. Between the layers, we'll apply relu activation. We conclude by running the output vector with an input feed:

createTDModel :: Session TDModel
createTDModel = do
  ...
  let hiddenLayerResult = relu $
        (worldInputVector `matMul` (readValue hiddenWeights))
        `add` (readValue hiddenBias)
  let outputLayerResult =
        (hiddenLayerResult `matMul` (readValue outputWeights))
        `add` (readValue outputBias)
  let evaluateStep = \inputFeed -> runWithFeeds
        [feed worldInputVector inputFeed] outputLayerResult
  ...

We'll leave the training step undefined for now. We'll work on that next time.

createTDModel :: Session TDModel
createTDModel = do
  …
  return $ TDModel
    { tdHiddenWeights = hiddenWeights
    , tdHiddenBias = hiddenBias
    , tdOutputWeights = outputWeights
    , tdOutputBias = outputBias
    , tdEvaluateWorldStep = evaluateStep
    , tdTrainStep = undefined
    }

Running World Iterations

Much of the skeleton and support code remains the same from Q-Learning. But let's go over the details of running a single iteration on one of our worlds. This function will take our model as a parameter, as well as a random move chance. (Recall that adding randomness to our moves will help us avoid a stagnant model). It will be stateful over the World and a random generator.

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  ...

We'll start off by getting all the possible moves from our current position. We'll step the world forward for each one of these moves. Then we'll feed the resulting worlds into our model. This will give us the scores for every move:

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  (currentWorld, gen) <- get
  let allMoves = possibleMoves currentWorld
  let newWorlds = fst <$> map ((flip stepWorld) currentWorld) allMoves
  (allScores :: Vector Float) <-
    Data.Vector.fromList <$> (forM newWorlds $ \w -> do 
      let worldData = encodeTensorData
            (Shape [1, inputDimen]) (vectorizeWorld8 w)
      scoreVector <- lift $ (tdEvaluateWorldStep model) worldData
      return $ Data.Vector.head scoreVector)
  ...

Now we need to take a similar action to what we had with Q-Learning. We'll roll the dice, and either select the move with the best score, or we'll select a random index.

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  ...
  let (chosenIndex, newGen) = bestIndexOrRandom allScores gen
  ...
  where
    bestIndexOrRandom :: Vector Float -> StdGen -> (Int, StdGen)
    bestIndexOrRandom scores gen =
      let (randomMoveRoll, gen') = randomR (0.0, 1.0) gen
          (randomIndex, gen'') = randomR (0, Data.Vector.length scores - 1) gen'
      in  if randomMoveRoll < randomChance
            then (randomIndex, gen'')
            else (maxIndex scores, gen')

Now that we have our "chosen" move and its score, we'll encode that score as data to pass to the training step. The exception to this is if the game ends. In that case, we'll have a "true" score of 1 or 0 to give. While we're at it, we can also calculate the continuationAction. This is either returning a boolean for ending the game, or looping again.

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  ...
  let nextWorld = newWorlds !! chosenIndex
  put (nextWorld, newGen)
  let (chosenNextScore, continuationAction) =
        case worldResult nextWorld of
          GameLost -> (0.0, return False)
          GameWon -> (1.0, return True)
          GameInProgress -> ( allScores ! chosenIndex
                            , runWorldIteration randomChance model)
  let nextScoreData = encodeTensorData
        (Shape [1]) (Data.Vector.singleton chosenNextScore)
  ...

We'll also encode the evaluation of our current world. Then we'll pass these values to our training step, and run the continuation!

runWorldIteration :: Float -> TDModel
  -> StateT (World, StdGen) Session Bool
runWorldIteration randomChance model = do
  ...
  let currentWorldData = encodeTensorData
        (Shape [1, inputDimen]) (vectorizeWorld8 currentWorld)
  currentScoreVector <- lift $
    (tdEvaluateWorldStep model) currentWorldData
  let currentScoreData = encodeTensorData
        (Shape [1]) currentScoreVector

  lift $ (tdTrainStep model) nextScoreData currentScoreData

  continuationAction

What's Next?

We've now got the basic framework set up for our TD agent. Next time, we'll start digging into the actual formula we use to learn the weights. It's a little more complicated than some of the previous loss functions we've dealt with in the past.

If you want to get started with Haskell and Tensor Flow, download our Haskell Tensor Flow Guide. It will help you learn the basics of this complicated library!

Read More
James Bowen James Bowen

Temporal Difference Primer

time_icon.png

Last week we finished our exploration of supervised learning with our maze game. We explored a more complex model that used convolution and pooling. This week, we're going back to "unsupervised" learning. We'll consider another approach that does not require the specification of "correct" outputs.

This approach is Temporal Difference Learning (TD Learning). It relies on having a function to evaluate a game position. Its main principle is that the current position should have a similar evaluation to positions in the near future.

Our evaluation function will use weights whose values our training program will learn. We'll want to learn these weights to minimize the difference between game evaluations. In this article, we'll take a high level look at this approach, before we get into the details next time.

History of TD Learning

The concept of TD learning was first developed in the 1980's. One of the more famous applications of TD learning in the 1990's was to learn an AI for Backgammon, called TD Gammon. This agent could play the game at an intermediate human level. It did this initially with no hand-crafting of any of the game rules or any algorithm.

Getting to this level with a "knowledge free" algorithm was almost unheard of at the time. When providing hand-crafted features, the agent could then play at a near-expert level. It explored many possibilities that human players had written off. In doing so, it contributed new ideas to high level backgammon play. It was an important breakthrough in unsupervised techniques.

Q-Learning vs. TD Learning

A few weeks back, we explored Q-Learning. And at first glance, Q-Learning and TD learning might sound similar. But with temporal difference, we'll be learning a different function. In Q-Learning, we learned the Q function. This is a function that takes in our current game board and provides a score for each possible move. With TD, we'll be learning what we call the V function. This function is a direct evaluation of the current board.

With our game mechanics, our agent chooses between 10 different moves. So the "output" vector of our Q-Learning network had size 10. Now in temporal difference learning, we'll only output a single number. This will be an "evaluation", or score, of the current position.

If a game has more than 2 outcomes, you would want the evaluation function to give a score for each of them. But our game has a binary outcome, so one number is enough.
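To put the contrast in (hypothetical) type terms, with Vector coming from Data.Vector and World being our game state:

type QFunction = World -> Vector Float -- one score for each of the 10 moves
type VFunction = World -> Float        -- a single evaluation of the position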

Basics

Despite this difference, our TensorFlow code will have a similar structure to Q-Learning. Here's a high level overview:

  1. Our model will take an arbitrary game state and produce a score.
  2. At each iteration, we will get the model's output score on all possible moves from that position. We'll account for enemy moves when doing this. We will then choose the move for the best resulting board.
  3. We will advance the world based on this move, and then pass the resulting world through our model again.
  4. Then, adjust the weights so that the evaluations of the new world and the original world are more similar.
  5. If the resulting world is either a "win" or a "loss", we'll use the correct value (1 or 0) as the evaluation. Otherwise, we'll use our evaluation function.
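Here's a tiny sketch of the target value from steps 4 and 5 (assumed names; the real version will live inside our TensorFlow code):

targetEvaluation :: GameResult -> Float -> Float
targetEvaluation GameWon _ = 1.0
targetEvaluation GameLost _ = 0.0
targetEvaluation GameInProgress modelScore = modelScore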

What's Next

Next time, we'll dig into more specifics. It will be a bit tricky to use an evaluation function for our game in conjunction with TensorFlow. But once we have that, we can get into the meatier parts of this algorithm. We'll see exactly what operations we need to train our agent.

To learn more about using Haskell with AI, read our Haskell AI Series! This series shows some of the unique ideas that Haskell can bring to the world of machine learning.

Read More
James Bowen James Bowen

Enemies, Convolution and Pooling

convolution_small.png

Recall that last week we were still working with the idea of supervised learning. We were trying to get our agent to reach the goal on an empty maze, using our own moves as a guide. Since the problem was so simple, we could make an agent that navigates the maze.

But what happens if we introduce enemies into the maze? This will complicate the process. Our manual AI isn't actually that successful in this scenario. But recording our own moves provides a reasonable data set.

First, we need to update the features a little bit. Recall that we're representing our board so that every grid space has a single feature. In our old representation, we used a value of 100 to represent the target space, and 10 to represent our own space. We need to expand this representation. We want to know where the enemies are, if they're stunned, and if we still have our stun.

Let's use 25.0 to represent our position if we have the stun. If we don't have our stun available, we'll use 10.0 instead. For positions containing active enemies, we'll use -25.0. If the enemy is in the stunned state, we'll use -10.0.

vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = finalFeatures
  where
    initialGrid = V.fromList $ take 100 (repeat 0.0)
    (px, py) = playerLocation (worldPlayer w)
    (gx, gy) = endLocation w
    playerLocIndex = (9 - py) * 10 + px
    playerHasStun = playerCurrentStunDelay (worldPlayer w) == 0
    goalLocIndex = (9 - gy) * 10 + gx
    finalFeatures = initialGrid V.//
      ([ (playerLocIndex
       , if playerHasStun then 25.0 else 10.0)
       , (goalLocIndex, 100.0)] ++ enemyLocationUpdates)

    enemyLocationUpdates = enemyPair <$> (worldEnemies w)

    enemyPair e =
      let (ex, ey) = enemyLocation e
      in  ( (9 - ey) * 10 + ex,
             if enemyCurrentStunTimer e == 0 then -25.0 else -10.0)

Using our moves as a training set, we see some interesting results. It's difficult to get great results on our error rates. The data set is very prone to overfitting. We often end up with training error in the 20-30% range with a test error near 50%. And yet, our agent can consistently win the game! Our problem space is still a bit simplistic, but it is an encouraging result!

Convolution and Pooling

As suggested last week, the ideas of convolution and pooling could be useful on this feature set. Let's try it out, for experimentation purposes. We'll start with a function to create a layer in our network that does convolution and pooling.

buildConvPoolLayer :: Int64 -> Int64 -> Tensor v Float -> Build (Variable Float, Variable Float, Tensor Build Float)
buildConvPoolLayer inputChannels outputChannels input = do
  weights <- truncatedNormal (vector weightsShape)
    >>= initializedVariable
  bias <- truncatedNormal (vector [outputChannels])
    >>= initializedVariable
  let conv = conv2D' convAttrs input (readValue weights)
               `add` readValue bias
  let results = maxPool' poolAttrs (relu conv)
  return (weights, bias, results)
  where
    weightsShape :: [Int64]
    weightsShape = [5, 5, inputChannels, outputChannels]

    -- Create "attributes" for our operations
    -- E.g. How far does convolution slide?
    convStridesAttr = opAttr "strides" .~ ([1,1,1,1] :: [Int64])
    poolStridesAttr = opAttr "strides" .~ ([1,2,2,1] :: [Int64])
    poolKSizeAttr = opAttr "ksize" .~ ([1,2,2,1] :: [Int64])
    paddingAttr = opAttr "padding" .~ ("SAME" :: ByteString)
    dataFormatAttr = opAttr "data_format" .~ ("NHWC" :: ByteString)
    convAttrs = convStridesAttr . paddingAttr . dataFormatAttr
    poolAttrs = poolKSizeAttr . poolStridesAttr . paddingAttr . dataFormatAttr

The input to this layer will be our 10x10 "image" of a grid. The output will be 5x5x32. The "5" dimensions come from halving the board dimension. The "32" will come from the number of channels. We'll replace our first neural network layer with this layer:

createModel :: Build Model
createModel = do
  let batchSize = -1
  let (gridDimen :: Int64) = 10
  let inputChannels = 1
  let (convChannels :: Int64) = 32
  let (nnInputSize :: Int64) = 5 * 5 * 32

  (inputs :: Tensor Value Float) <-
    placeholder [batchSize, moveFeatures]
  (outputs :: Tensor Value Int64) <-
    placeholder [batchSize]

  let convDimens = [batchSize, gridDimen, gridDimen, inputChannels]
  let inputAsGrid = reshape inputs (vector convDimens)
  (convWeights, convBias, convResults) <-
    buildConvPoolLayer inputChannels convChannels inputAsGrid

  let nnInput = reshape
          convResults (vector [batchSize, nnInputSize])
  (nnWeights, nnBias, nnResults) <-
    buildNNLayer nnInputSize moveLabels nnInput

  (actualOutput :: Tensor Value Int64) <- render $
    argMax nnResults (scalar (1 :: Int64))

We have a bit less success here. Our training is slower, and we get a similar error rate. But now our agent doesn't seem to win, unlike the agent from the pure, dense network. So it's not clear that we can actually treat this grid as an image and make any degree of progress.

What's Next?

Over the course of creating these different algorithms, we've discretized the game. We've broken it down into two phases. One where we move, one where the enemies can move. This will make it easier for us to move back to our evaluation function approach. We can even try using multi-move look-ahead. This could lead us to a position where we can try temporal difference learning. Researchers first used this approach to train an agent how to play Backgammon at a very high level.

Temporal difference learning is a very interesting concept. It's the last approach we'll try with our agent. We'll start out next time with an overview of TD learning before we jump into coding.

For another look at using Haskell with AI problems, read our Haskell and AI series! You can also download our Haskell Tensor Flow Guide for some more help with this library!

Read More
James Bowen James Bowen

Different Feature Schemes

new_features.jpg

In last week's edition of our Maze AI series, we explored the mechanics of supervised learning. We took the training data we'd been building up and trained an agent on it. We had one set of data to make the AI follow our own human moves, and another to follow our hand-crafted AI. This wasn't particularly successful. The resulting agent had a lot of difficulty navigating the maze and using its stun at the right times.

This week, we'll explore a couple different ways we can expand the feature set. First, we'll try encoding the legality of moves in our feature set. Second, we'll try expanding the feature set to include more data about the grid. This will motivate some other approaches to the problem. We'll conclude by taking the specifics of grid navigation out. We'll let our agent go to work on an empty grid to validate that this is at least a reasonable approach.

For some more reading on using Haskell and AI, take a look at our Haskell AI Series. We explore some reasons why Haskell could be a good fit for AI and machine learning problems. It will also help you through some of the basics of using Haskell and Tensor Flow.

Our supervised agent uses our current feature set. Let's remind ourselves what these features are. We have five different directions we can go (up, down, left, right, stand still). And in each of these directions, we calculate 8 different features.

  1. The maze distance to the goal
  2. The manhattan distance to the goal
  3. Whether the location contains an active enemy
  4. The number of enemies on the shortest path to the goal from that location
  5. The distance to the nearest enemy from that location
  6. The number of nearby enemies in manhattan distance terms
  7. Whether our stun is available
  8. The number of drills we have after the move

Some of these features are higher level. We do non-trivial calculations to figure them out. This gives our agent some idea of strategy. But there's not a ton of lower level information available! We zero out the features for a particular spot if it's past the world boundary. But we can't immediately tell from these features if a particular move is legal.

This is a big oversight. It's possible for our AI to learn about the legality of moves from the higher level training data. But it would take a lot more data and a lot more time.

So let's add a feature for how "easy" a move is. A value of 0 will indicate an illegal move, either past the world boundary or through a wall when we don't have a drill. A value of 1 will indicate a move that requires a drill. A value of 2 will indicate a normal move.

We'll add the extra feature into the LocationFeatures type. We'll also add an extra parameter to our produceLocationFeatures function. This boolean indicates whether a drill would be necessary. Note, we don't need to account for WorldBoundary. The value will get zeroed out in that case. We'll call this feature lfMoveEase since a higher value indicates less effort.

data LocationFeatures = LocationFeatures
  { …
  , lfMoveEase :: Int
  }

produceLocationFeatures :: Location -> World -> Bool -> LocationFeatures
produceLocationFeatures location@(lx, ly) w needsDrill = LocationFeatures
  …
  moveEase
  where
    moveEase = if not needsDrill then 2
      else if drillsRemaining > 0 then 1 else 0

It's easy to add the extra parameter to the function call in produceWorldFeatures. We already use case statements on the boundary types. Now we need to account for it when vectorizing our world.

vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = V.fromList (fromIntegral <$>
  [ ...
  , lfMoveEase standStill
  ...
  , zeroIfNull (lfMoveEase <$> up)
  ...
  , zeroIfNull (lfMoveEase <$> right)
  ...
  , zeroIfNull (lfMoveEase <$> down)
  ...
  , zeroIfNull (lfMoveEase <$> left)
  ])

When we train with this feature set, we actually get a good training error, down to around 10%. Thus it can learn our data a bit better. Yet the agent still can't navigate the maze correctly.

Expanding the Feature Set

Another option we can try is to serialize the world in a more raw state. We currently use more strategic features. But what about using the information on the board?

Here's a different way to look at it. Let's fix it so that the grid must be 10x10, there must be 2 enemies, and we must start with 2 drill powerups on the map. Let's get these features about the world:

  1. 100 features for the grid cells. Each feature will be the integer value corresponding to the wall-shape of that cell. These are hexadecimal digits, like the ones we use when serializing the maze.
  2. 4 features for the player. Get the X and Y coordinates for the position, the current stun delay, and the number of drills remaining.
  3. 3 features for each enemy. Again, X and Y coordinates, as well as a stun timer.
  4. 2 coordinate features for each drill location. Once a drill gets taken, we'll use -1 and -1.

This will give us a total of 114 features. Here's how it breaks down.

vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = gridFeatures V.++ playerFeatures V.++
                     enemyFeatures V.++ drillFeatures
  where

    -- 1. Features for the Grid
    mazeStr = Data.Text.unpack $ dumpMaze (worldBoundaries w)
    gridFeatures = V.fromList $
      fromIntegral <$> digitToInt <$> mazeStr

    player = worldPlayer w
    enemies = worldEnemies w

    -- 2. Features for the player
    playerFeatures = V.fromList $ fromIntegral <$>
      [ fst . playerLocation $ player
      , snd . playerLocation $ player
      , fromIntegral $ playerCurrentStunDelay player
      , fromIntegral $ playerDrillsRemaining player
      ]

    -- 3. Features for the two enemies
    enemy1 = worldEnemies w !! 0
    enemy2 = worldEnemies w !! 1
    enemyFeatures = V.fromList $ fromIntegral <$>
      [ fst . enemyLocation $ enemy1
      , snd . enemyLocation $ enemy1
      , fromIntegral $ enemyCurrentStunTimer enemy1
      , fst . enemyLocation $ enemy2
      , snd . enemyLocation $ enemy2
      , fromIntegral $ enemyCurrentStunTimer enemy2
      ]

    -- 4. Features for the drill locations
    drills = worldDrillPowerUpLocations w
    drillFeatures = V.fromList $ fromIntegral <$>
      if length drills == 0 then [-1, -1, -1, -1]
        else if length drills == 1
          then [fst (head drills), snd (head drills), -1, -1]
          else [ fst (head drills), snd (head drills)
               , fst (drills !! 1), snd (drills !! 1)
               ]

As an optimization, we can make the grid features part of the world since they will not change.

Still though, our model struggles to complete the grid when training off this data. Compared to the high-level features, the model doesn't even learn very well. We get training errors around 25-30%, but a test error close to 50%. With more data and time, our model might be able to draw the connection between various features.

We could attempt to make our model more sophisticated. We're working with grid data, which is a little like an image. Image processing algorithms use concepts such as convolution and pooling. This allows them to derive patterns arising from how the grid actually looks. We're only looking at the data as a flat vector.

It's unlikely though that convolution and pooling would help us with this feature set. Our secondary features don't fit into the grid. So we would actually want to add them in at a later stage in the process. Besides, we won't get that much data from taking the average value or the max value in a 2x2 segment of the maze. (This is what pooling does).
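As a toy illustration of that point (not project code), 2x2 max pooling just collapses four neighboring values into their single largest value:

maxPool2x2 :: [[Float]] -> Float
maxPool2x2 segment = maximum (concat segment)

-- e.g. maxPool2x2 [[0.0, 0.0], [0.0, 100.0]] == 100.0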

If we simplify the problem though, we might find a situation where they'll help.

A Simpler Problem

We're having a lot of difficulty with getting our agent to navigate the maze. So let's throw away the problem of navigation for a second. Can we train an agent that will navigate the empty maze? This should be doable.

Let's start with a bare bones feature set with the goal and current location highlighted in a grid. We'll give a value of 10 for our player's location, and a value of 100 for the target location. We start with a vector of all zeros, and use Vector.// to modify the proper values:

vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = finalFeatures
  where
    initialGrid = V.fromList $ take 100 (repeat 0.0)
    (px, py) = playerLocation (worldPlayer w)
    (gx, gy) = endLocation w
    playerLocIndex = (9 - py) * 10 + px
    goalLocIndex = (9 - gy) * 10 + gx
    finalFeatures = initialGrid V.//
      [(playerLocIndex, 10), (goalLocIndex, 100)]

Our AI bot will always follow the same path in this grid, so it will be quite easy for our agent to learn this path! Even if we use our own moves and vary the path a little bit, the agent can still learn it. It'll achieve 100% accuracy on the AI data. It can't get that high on our data, since we might choose different moves for different squares. But we can still train it so it wins every time.

Conclusion

So our results are still not looking great. But next week we'll take this last idea and run a little further with it. We'll keep it so that our features only come from the grid itself. But we'll add a few more complications with enemies. We might find that convolution and pooling are useful in that case.

If you're interested in using Haskell for AI but don't know where to start, read our Haskell AI Series! We discuss some important ideas like why Haskell is a good AI language. We also get into the basics of Tensor Flow with Haskell.

Read More
James Bowen James Bowen

Using Our Data with Supervised Learning

supervised_learning.png

Our aim these last couple weeks has been to try a supervised learning approach to our game. In last week's article we gathered training data for playing the game. We had two different sources. First, we played the game ourselves and recorded our moves. Second, we let our AI play the game and recorded it. This gave us a few CSV files. Each line in these is a record containing the 40 "features" of the game board at the time and the move we chose.

This week, we're going to explore how to build a machine-learning agent based on this data. This will use supervised learning techniques. Look at the supervised-learning branch on our Github repository for more details.

To get started with Haskell and Tensor Flow, download our Haskell Tensor Flow Guide. This library is a little tricky to work with, so you want to make sure you know what you're doing!

Defining Our Model

For our supervised model, we're going to use a fully connected neural network with a single hidden layer. We'll have 40 input features, 100 hidden units, and then our 10 output values for the different move scores. We'll be following a very similar pattern to one we explored in this older article, using the basic Iris data set. We'll copy a lot of code from that article. We won't go over a lot of the helper code in this article, so feel free to check out that one for some help with that!

We define each layer with a "weights" matrix and a "bias" vector. We multiply the input by the weights and then add the bias vector. Let's explore how we can build a single layer of the network. This will take the input and output size, as well as the input tensor. It will have three results. One variable for the weights, one for the biases, and then a final "output" tensor:

buildNNLayer :: Int64 -> Int64 -> Tensor v Float
  -> Build (Variable Float, Variable Float, Tensor Build Float)

The definition is pretty simple. We'll initialize random variables for the weights and bias. We'll produce the result tensor by multiplying by the weights and adding the bias.

buildNNLayer :: Int64 -> Int64 -> Tensor v Float
  -> Build (Variable Float, Variable Float, Tensor Build Float)
buildNNLayer inputSize outputSize input = do
  weights <- truncatedNormal (vector [inputSize, outputSize])
    >>= initializedVariable
  bias <- truncatedNormal (vector [outputSize])
    >>= initializedVariable
  let results = (input `matMul` readValue weights)
        `add` readValue bias
  return (weights, bias, results)

Now that we understand the layers a little better, it's easier to define our model. First, we'll want to include both sets of weights and biases in the model, so we can output them later:

data Model = Model
  { w1 :: Variable Float
  , b1 :: Variable Float
  , w2 :: Variable Float
  , b2 :: Variable Float
  ...
  }

Now we want two different "steps" we can run. The training step will take a batch of data and determine what our network produces for the inputs. It will compare our network's output with the expected output. Then it will train to minimize the loss function. The error rate step will simply produce the error rate on the given data. That is, it will tell us what percentage of the moves we are getting correct. Both of these will be Session actions that take two inputs. First, the TensorData for the features, and then the TensorData for the correct moves:

data Model = Model
  { 
  … -- Weights and biases
  , train :: TensorData Float
          -> TensorData Int64
          -> Session ()
  , errorRate :: TensorData Float
              -> TensorData Int64
              -> Session (V.Vector Float) -- Produces a single number
  }

Let's see how we put this all together.

Building Our Model

To start, let's make placeholders for our input features and expected output results. A dimension of -1 means we can provide any size we like:

createModel :: Build Model
createModel = do
  let batchSize = -1
  (inputs :: Tensor Value Float) <-
    placeholder [batchSize, moveFeatures]
  (outputs :: Tensor Value Int64) <-
    placeholder [batchSize]
  ...

Now we build the layers of our neural network using our helper. We'll apply relu, an activation function, on the results of our hidden layer. This helps our model deal with interaction effects and non-linearities:

createModel :: Build Model
createModel = do
  ...
  (hiddenWeights, hiddenBiases, hiddenResults) <-
    buildNNLayer moveFeatures hiddenUnits inputs
  let rectifiedHiddenResults = relu hiddenResults
  (finalWeights, finalBiases, finalResults) <-
    buildNNLayer hiddenUnits moveLabels rectifiedHiddenResults
  ...

Now to get our error rate, we need a couple steps. We'll get the best move from each predicted result using argMax. We can then compare these to the training data using equal. By using reduceMean we'll get the percentage of our moves that match. Subtracting this from 1 gives our error rate:

createModel :: Build Model
createModel = do
  ...
  (actualOutput :: Tensor Value Int64) <- render $
    argMax finalResults (scalar (1 :: Int64))
  let (correctPredictions :: Tensor Build Float) = cast $
        equal actualOutput outputs
  (errorRate_ :: Tensor Value Float) <- render $
    1 - (reduceMean correctPredictions)

Now we need our training step. We'll compare outputs. This involves the softmaxCrossEntropyWithLogits function. We train our model by selecting our variables for training, and using minimizeWith. This will update the variables to reduce the value of the loss function:

createModel :: Build Model
createModel = do
  ...
  let outputVectors = oneHot outputs (fromIntegral moveLabels) 1 0
  let loss = reduceMean $ fst $
        softmaxCrossEntropyWithLogits finalResults outputVectors
  let params =
        [hiddenWeights, hiddenBiases, finalWeights, finalBiases]
  train_ <- minimizeWith adam loss params
  ...

We conclude by creating our functions. These take the tensor data as parameters. Then they use runWithFeeds to put the data into our placeholders:

createModel :: Build Model
createModel = do
  ...
  return $ Model
    { train = \inputFeed outputFeed ->
        runWithFeeds
          [ feed inputs inputFeed
          , feed outputs outputFeed
          ]
          train_
    , errorRate = \inputFeed outputFeed ->
        runWithFeeds
          [ feed inputs inputFeed
          , feed outputs outputFeed
          ]
          errorRate_
    , w1 = hiddenWeights
    , b1 = hiddenBiases
    , w2 = finalWeights
    , b2 = finalBiases
    }

Running Our Tests

Now let's run our tests. We'll read the move record data from the file, shuffle them, and set aside a certain proportion as our test set. Then we'll build our model:

runTraining totalFile = runSession $ do
  initialRecords <- liftIO $ readRecordFromFile totalFile
  shuffledRecords <- liftIO $ shuffleM (V.toList initialRecords)
  let testRecords = V.fromList $ take 2000 shuffledRecords
  let trainingRecords = V.fromList $ drop 2000 shuffledRecords
  model <- build createModel
  ...

Then we run our iterations (we'll do 50000, as an example). We select some random records (100 per batch), and then convert them to data. Then we run our train step. Finally, every 100 iterations or so, we'll get a gauge of the training error on this set. This involves the errorRate step. Note our error rate returns a vector with a single wrapped value. So we need to unwrap it with !.

runTraining totalFile = runSession $ do
  ...
  forM_ ([0..50000] :: [Int]) $ \i -> do
    trainingSample <- liftIO $ chooseRandomRecords trainingRecords
    let (trainingInputs, trainingOutputs) =
          convertRecordsToTensorData trainingSample
    (train model) trainingInputs trainingOutputs
    when (i `mod` 100 == 0) $ do
      err <- (errorRate model) trainingInputs trainingOutputs
      liftIO $ putStrLn $
        (show i) ++ " : current error " ++ show ((err V.! 0) * 100)

Now to run the final test, we use the errorRate step again, this time on our test data:

runTraining totalFile = runSession $ do
  ...

  -- Testing
  let (testingInputs, testingOutputs) =
        convertRecordsToTensorData testRecords
  testingError <- (errorRate model) testingInputs testingOutputs
  liftIO $ putStrLn $
    "test error " ++ show ((testingError V.! 0) * 100)

Results

When it comes to testing our system, we should use proper validation techniques. We want a model that will represent our training data well. But it should also generalize well to other reasonable examples. If our model represents the training data too well, we're in danger of "overfitting" our data. To check this, we'll hold back roughly 20% of the data. This will be our "test" data. We'll train our model on the other 80% of the data. Every 100 steps or so, we print out the training error on that batch of data. We hope this figure drops. But then at the very end, we'll run the model on the other 20% of the data, and we'll see what the error rate is. This will be the true test of our system.

We know we have overfitting if we see figures on training error that are lower than the testing error. When training on human moves for 50000 iterations, the training error drops to the high teens and low 20's. But the test error is often still close to 50%. This suggests we shouldn't be training quite as long.

The AI moves provide a little more consistency though. The training error seems to stabilize around the mid 20's and low 30's, and we end up with a test error of about 34%.

Conclusion

Our error rate isn't terrible. But it's not great either. And worse, testing shows it doesn't appear to capture the behaviors well enough to win the game. A case like this suggests our model isn't sophisticated enough to capture the problem. It could also suggest our data is too noisy, and the patterns we hoped to find aren't there. The feature set we have might not capture all the important information about the graph.

For our final look at this problem, we're going to try a new serialization technique. Instead of deriving our own features, we're going to serialize the entire game board! The "feature space" will be much, much larger now. It will include the structure of the graph and information about enemies and drills. This will call for a more sophisticated model. A pure fully connected network will take a long time to learn things like which walls allow moves and which don't. A big drawback of this technique is that it will not generalize to arbitrary mazes. It will only work for a certain grid size and number of enemies. But with enough training time we may find that interesting patterns emerge. So come back next week to see how this works!

Read More
James Bowen James Bowen

Gathering Smart Data

gather_data.jpg

Last week we made a few more fixes to our Q-Learning algorithm. Ultimately though, it still seems to fall short for even basic versions of our problem.

Q-learning is an example of an "unsupervised" learning approach. We don't tell the machine learning algorithm what the "correct" moves are. We give it rewards when it wins the game (and negative rewards when it loses). But it needs to figure out how to play to get those rewards. With supervised learning, we'll have specific examples of what it should do! We'll have data points saying, "for this feature set, we should make this move." We'll determine a way to record the moves we make in our game, both as a human player and with our manual AI algorithm! This will become our "training" data for the supervised learning approach.

This week's code is all on the Gloss side of things. You can find it on our Github repository under the branch record-player-ai. Next week, we'll jump back into Tensor Flow. If you're not familiar yet with how to use Haskell and Tensor Flow, download our Haskell Tensor Flow Guide!

Recording Moves

To gather training data, we first need a way to record moves in the middle of the game. Gloss doesn't give us access to the IO monad in our update functions. So we'll unfortunately have to resort to unsafePerformIO for this, since we need the data in a file. (We did the same thing when saving game states). Here's the skeleton of our function:

unsafeSaveMove :: Int -> World -> World -> World
unsafeSaveMove moveChoice prevWorld nextWorld = unsafePerformIO $ do
  ...

The first parameter will be a representation of our move, an integer from 0-9. This follows the format we had with serialization.

0 -> Move Up
1 -> Move Right
2 -> Move Down
3 -> Move Left
4 -> Stand Still
X + 5 -> Move direction X and use the stun

The first World parameter will be the world under which we made the move. The second world will be the resulting world. This parameter only exists as a pass-through, because of how unsafePerformIO works.

Given these parameters, our function is pretty straightforward. We want to record a single line that has the serialized world state values and our final move choice. These will go in a comma separated list. We'll save everything to a file called moves.csv. So let's open that file and get the list of numbers. We'll immediately convert the numbers to strings with show.

unsafeSaveMove :: Int -> World -> World -> World
unsafeSaveMove moveChoice prevWorld nextWorld = unsafePerformIO $ do
  handle <- openFile "moves.csv" AppendMode
  let numbers = show <$>
        (Vector.toList (vectorizeWorld prevWorld) ++
          [fromIntegral moveChoice])
  ...

Now that our values are all strings, we can get them in a comma separated format with intercalate. We'll write this string to the file and close the handle!

unsafeSaveMove :: Int -> World -> World -> World
unsafeSaveMove moveChoice prevWorld nextWorld = unsafePerformIO $ do
  handle <- openFile "moves.csv" AppendMode
  let numbers = show <$>
        (Vector.toList (vectorizeWorld prevWorld) ++
          [fromIntegral moveChoice])
  let csvString = intercalate "," numbers
  hPutStrLn handle csvString
  hClose handle
  return nextWorld

Now let's figure out how to call this function!

Saving Human Moves

Saving the moves we make as a human is pretty easy. All we need to do is hook into the inputHandler. Recall this section, that receives moves from arrow keys and makes our move:

inputHandler :: Event -> World -> World
inputHandler event w
  ...
  | otherwise = case event of
      (EventKey (SpecialKey KeyUp) Down (Modifiers _ _ Down) _) ->
        drillLocation upBoundary breakUpWall breakDownWall w
      (EventKey (SpecialKey KeyUp) Down _ _) ->
        updatePlayerMove upBoundary
      (EventKey (SpecialKey KeyDown) Down (Modifiers _ _ Down) _) ->
        drillLocation downBoundary breakDownWall breakUpWall w
      (EventKey (SpecialKey KeyDown) Down _ _) ->
        updatePlayerMove downBoundary
      (EventKey (SpecialKey KeyRight) Down (Modifiers _ _ Down) _) ->
        drillLocation rightBoundary breakRightWall breakLeftWall w
      (EventKey (SpecialKey KeyRight) Down _ _) ->
        updatePlayerMove rightBoundary
      (EventKey (SpecialKey KeyLeft) Down (Modifiers _ _ Down) _) ->
        drillLocation leftBoundary breakLeftWall breakRightWall w
      (EventKey (SpecialKey KeyLeft) Down _ _) ->
        updatePlayerMove leftBoundary
      (EventKey (SpecialKey KeySpace) Down _ _) ->
        if playerCurrentStunDelay currentPlayer /= 0
          then w
          else w
            { worldPlayer =
                activatePlayerStun currentPlayer playerParams
            , worldEnemies = stunEnemyIfClose <$> worldEnemies w
            , stunCells = stunAffectedCells
            }
  …

All these lines return World objects! So we just need to wrap them as the final argument to unsafeSaveMove. Then we add the appropriate move choice number. One quirk is that we cannot move AND stun at the same time when playing as a human. So using the stun will always be 9, which means stunning while standing still. Here are the updates:

inputHandler :: Event -> World -> World
inputHandler event w
  ...
  | otherwise = case event of
      (EventKey (SpecialKey KeyUp) Down (Modifiers _ _ Down) _) -> 
        unsafeSaveMove 0 w $
          drillLocation upBoundary breakUpWall breakDownWall w
      (EventKey (SpecialKey KeyUp) Down _ _) ->
        unsafeSaveMove 0 w $ updatePlayerMove upBoundary
      (EventKey (SpecialKey KeyDown) Down (Modifiers _ _ Down) _) -> 
        unsafeSaveMove 2 w $
          drillLocation downBoundary breakDownWall breakUpWall w
      (EventKey (SpecialKey KeyDown) Down _ _) ->
        unsafeSaveMove 2 w $ updatePlayerMove downBoundary
      (EventKey (SpecialKey KeyRight) Down (Modifiers _ _ Down) _) -> 
        unsafeSaveMove 1 w $
          drillLocation rightBoundary breakRightWall breakLeftWall w
      (EventKey (SpecialKey KeyRight) Down _ _) ->
        unsafeSaveMove 1 w $ updatePlayerMove rightBoundary
      (EventKey (SpecialKey KeyLeft) Down (Modifiers _ _ Down) _) ->
        unsafeSaveMove 3 w $
          drillLocation leftBoundary breakLeftWall breakRightWall w
      (EventKey (SpecialKey KeyLeft) Down _ _) ->
        unsafeSaveMove 3 w $ updatePlayerMove leftBoundary
      (EventKey (SpecialKey KeySpace) Down _ _) ->
        if playerCurrentStunDelay currentPlayer /= 0
          then w
          else unsafeSaveMove 9 w $ w
            { worldPlayer =
                activatePlayerStun currentPlayer playerParams
            , worldEnemies = stunEnemyIfClose <$> worldEnemies w
            , stunCells = stunAffectedCells
            }
  …

And now whenever we play the game, it will save our moves! Keep in mind though, it takes a lot of training data to get good results when using supervised learning. I played for an hour and got around 10000 data points. We'll see if this is enough!

Saving AI Moves

While the game is at least a little fun, it's also exhausting to keep playing it to generate data! So now let's consider how we can get the AI to play the game itself and generate data. The first step is to reset the game automatically on winning or losing:

updateFunc :: Float -> World -> World
updateFunc _ w =
  | (worldResult w == GameWon || worldResult w == GameLost) &&
       (usePlayerAI params) =
    ...

The rest will follow the other logic we have for resetting the game. Now we must examine where to insert our call to unsafeSaveMove. The answer is our updateWorldForPlayerMove function. We can see that we get the move (and our player's cached memory) as part of makePlayerMove:

updateWorldForPlayerMove :: World -> World
updateWorldForPlayerMove w = …
  where
    (move, memory) = makePlayerMove w
    ...

We'll want a quick function to convert our move into the number choice:

moveNumber :: PlayerMove -> Int
moveNumber (PlayerMove md useStun dd) =
  let directionFactor = case (md, dd) of
        (DirectionUp, _) -> 0
        (_, DirectionUp) -> 0
        (DirectionRight, _) -> 1
        (_, DirectionRight) -> 1
        (DirectionDown, _) -> 2
        (_, DirectionDown) -> 2
        (DirectionLeft, _) -> 3
        (_, DirectionLeft) -> 3
        _ -> 4
  in  if useStun then directionFactor + 5 else directionFactor

Our saving function requires a pass-through world parameter. So we'll do the saving on our first new World calculation. This comes from modifyWorldForPlayerDrill:

updateWorldForPlayerMove :: World -> World
updateWorldForPlayerMove w = …
  where
    (move, memory) = makePlayerMove w

    worldAfterDrill = unsafeSaveMove (moveNumber move) w
     (modifyWorldForPlayerDrill …)
    ...

And that's all! Now our AI will play the game by itself, gathering data for hours on end if we like! We'll get some different data for different cases, such as 4 enemies 4 drills, 8 enemies 5 drills, and so on. This is much faster and easier than playing the game ourselves! It will automatically get 12-15 thousand data points an hour if we let it!

Conclusion

With a little bit of persistence, we can now get a lot of data for the decisions a smarter agent will make. Next week, we'll take the data we've acquired and use it to write a supervised learning algorithm! Instead of using Q-learning, we'll make the weights reflect the decisions that we (or the AI) would make.

Supervised learning is not without its pitfalls! It won't necessarily perform optimally. It will perform like the training data. So even if we're successful, our algorithm will replicate our own mistakes! It'll be interesting to see how this plays out, so stay tuned!

For more information on using Haskell in AI, take a look at our Haskell AI Series. Plus, download our Haskell Tensor Flow Guide to learn more about using this library!

Read More
James Bowen James Bowen

Tweaks, Fixes, and Some Results

tweaks.jpg

In last week's episode of this AI series, we added random exploration to our algorithm. This helped us escape certain "traps" and local minimums in the model that could keep us rooted in bad spots. But it still didn't improve results too much.

This week we'll explore a couple more ways we can fix and improve our algorithm. For the first time, we'll see some positive outcomes. Still, we'll find our approach isn't great.

To get started with Tensor Flow and Haskell, download our guide! It's a complex process so you'll want some help! You should also check out our Haskell AI Series to learn more about why Haskell is a good choice as an AI language!

Improvements

To start out, there are a few improvements we can make to how we do q-learning. Let's recall the basic outline of running a world iteration. There are three steps. We get our "new" move from the "input" world. Then we apply that move, and get our "next" move against the "next" world. Then we use the possible reward to create our target actions, and use that to train our model.

runWorldIteration model = do
  (prevWorld, _, _) <- get

  -- Get the next move on the current world (with random chance)
  let inputWorldVector = … -- vectorize prevWorld
  currentMoveWeights <- lift $ lift $
    (iterateWorldStep model) inputWorldVector
  let bestMove = moveFromOutput currentMoveWeights
  let newMove = chooseRandomMoveWithChance …

  -- Get the next world using this move, and produce our next move
  let nextWorld = stepWorld newMove prevWorld
  let nextWorldVector = vectorizeWorld nextWorld
  nextMoveVector <- lift $ lift $
    (iterateWorldStep model) nextWorldVector

  -- Use these to get "target action values" and use them to train!
  let (bestNextMoveIndex, maxScore) =
          (V.maxIndex nextMoveVector, V.maximum nextMoveVector)
  let targetActionData = encodeTensorData (Shape [10, 1]) $
          nextMoveVector V.//
            [(bestNextMoveIndex, newReward + maxScore)]
  lift $ lift $ (trainStep model) nextWorldVector targetActionData

There are a couple issues here. First, we want to substitute based on the first new move, not the later move. We want to learn from the move we are taking now, since we assess its result now. Thus we want to substitute for that index. We'll re-write our randomizer to account for this and return the index it chooses.

Next, when training our model, we should use the original world, instead of the next world. That is, we want inputWorldVector instead of nextWorldVector. Our logic is this: we get our "future" action, which accounts for the game's reward, and we want our current action on this world to look more like that future action. Here's what the changes look like:

runWorldIteration model = do
  (prevWorld, _, _) <- get

  -- Get the next move on the current world (with random chance)
  let inputWorldVector = … -- vectorize prevWorld
  currentMoveWeights <- lift $ lift $
    (iterateWorldStep model) inputWorldVector
  let bestMove = moveFromOutput currentMoveWeights
  let (newMove, newMoveIndex) = chooseRandomMoveWithChance …

  -- Get the next world using this move, and produce our next move
  let nextWorld = stepWorld newMove prevWorld
  let nextWorldVector = vectorizeWorld nextWorld
  nextMoveVector <- lift $ lift $
    (iterateWorldStep model) nextWorldVector

  -- Use these to get "target action values" and use them to train!
  let maxScore = V.maximum nextMoveVector
  let targetActionData = encodeTensorData (Shape [10, 1]) $
          nextMoveVector V.//
            [(newMoveIndex, newReward + maxScore)]
  lift $ lift $ (trainStep model) inputWorldVector targetActionData

Another change we can make is to provide some rewards based on whether the selected move was legal or not. To do this, we'll need to update the stepWorld game API to return this boolean value:

stepWorld :: PlayerMove -> World -> (World, Bool)

Then we can add a small amount (0.01) to our reward value if we get a legal move, and subtract this otherwise.
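Here's a minimal sketch of that adjustment, assuming we thread the Bool from stepWorld into wherever we compute the reward (the helper name here is ours, not from the repository):

-- Nudge the reward up slightly for a legal move, and down for an illegal one.
adjustRewardForLegality :: Float -> Bool -> Float
adjustRewardForLegality reward moveWasLegal =
  if moveWasLegal then reward + 0.01 else reward - 0.01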

As a last flourish, we should also add a timeout condition. Our next step will be to test on simple mazes that have no enemies. This means we'll never get eaten, so we need some loss condition if we get stuck. This timeout condition should have the same negative reward as losing.
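A rough sketch of that check might look like this, assuming we track how many steps the current game has taken and that the result type is the GameResult we match on elsewhere:

-- If the game drags on past some step cap, treat it like a loss so it
-- picks up the same negative reward.
applyTimeout :: Int -> Int -> GameResult -> GameResult
applyTimeout maxSteps stepsTaken result = case result of
  GameInProgress | stepsTaken > maxSteps -> GameLost
  _ -> result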

Results

Now that we've made some improvements, we'll train on a very basic maze that's only 5x5 and has no walls and no enemies. Whereas we used to struggle to even finish this maze, we now achieve the goal a fair amount of the time. One of our training iterations achieved the goal around 2/3 of the time.

However, our bot is still useless against enemies! It loses every time if we try to train from scratch on a map with a single enemy. One attempt to circumvent this is to first train our weights to solve the empty maze. Then we can start with these weights as we attempt to avoid the enemy. That way, we have some pre-existing knowledge, and we don't have to learn everything at once. Still though, it doesn't result in much improvement. Typical runs only succeeded 40-50 times out of 2000 iterations.

Limiting Features

One conclusion we can draw is that we actually have too many features! Our intuition is that a larger feature set would take more iterations to learn. If the features aren't chosen carefully, they'll introduce noise.

So instead of tracking 8 features for each possible direction of movement, let's stick with 3. We'll see if the enemy is on the location, check the distance to the end, and count the number of nearby enemies. When we do this, we get comparable results on the empty maze. But when it comes to avoiding enemies, we do a little better, surviving 150-250 iterations out of 2000. These statistics are all very rough, of course. If we wanted a more thorough analysis, we'd use multiple maze configurations and a lot more runs using the finalized weights.
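As a rough illustration, the reduced feature set described above might look something like this (the type and field names here are ours, purely for illustration):

-- Three features instead of eight: whether an enemy sits on the location,
-- the distance to the end of the maze, and the number of nearby enemies.
data LimitedFeatures = LimitedFeatures
  { lfEnemyOnLocation :: Int
  , lfDistanceToEnd :: Int
  , lfNumNearbyEnemies :: Int
  }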

Conclusions

We can't draw too many conclusions from this yet. Our model is still failing to solve simple versions of our problem. It's quite possible that our model is too simplistic. After all, all we're doing is a simple matrix multiplication on our features. In theory, this should be able to solve the problem, but it may take a lot more iterations. The stream of results also suggests local minimums are a big problem. Logging reveals that we often die in the same spot in the maze many times in a row. The negative rewards aren't enough to draw us out, and we often rely on random moves to find better outcomes.

So next week we're going to start changing our approach. We'll explore a way to introduce supervised learning into our process. This depends on "correct" data. We'll try a couple different ways to get that data. We'll use our own "human" input, as well as the good AI we've written in the past to solve this problem. All we need is a way to record the moves we make! So stay tuned!


Adding Random Exploration

brain_idea.jpg

Last week, we finally built a pipeline to use machine learning on our maze game. We made a Tensor Flow graph that could train a "brain" with weights so we could navigate the maze. This week, we'll see how our training works, or rather how it doesn't work. We'll consider how randomizing moves during training might help.

Our machine learning code lives in this repository. For this article, you'll want to look at the randomize-moves branch. Take a look here for the original game code. You'll want the q-learning branch in the main repo.

This part of the series uses Haskell and Tensor Flow. To learn more about using these together, download our Haskell Tensor Flow Guide!

Unsupervised Machine Learning

With a few tweaks, we can run our game using the new output weights. But what we'll find as we train the weights is that our bot never seems to win! It always seems to do the same thing! It might move up and then get stuck because it can't move up anymore. It might stand still the whole time and let the enemies come grab it. Why would this happen?

Remember that reinforcement learning depends on being able to reinforce good behaviors. Thus at some point, we have to hope our AI will win the game. Then it will get the good reward so that it can change its behavior to adapt and get good results more often. But if it never gets a good result in the whole training process, it will never learn good behaviors!

This is part of the challenge of unsupervised learning. In a supervised learning algorithm, we have specific good examples to learn from. One way to approach this would be to record our own moves of playing the game. Then the AI could learn directly from us! We'll probably try this approach in the future!

But q-learning is an unsupervised algorithm. We're forcing our AI to explore the world and learn on its own. Right now, though, it's only making moves that it thinks are "optimal." And with a random set of weights, the "optimal" moves aren't very optimal at all! Part of a good "exploration" plan means letting it choose moves from time to time that don't seem optimal.

Adding a Random Choice

As our first attempt to fix this, we'll add a "random move chance" to our training process. At each training step, our network chooses its "best" move, and we use that to update the world state. From now on, whenever we do this, we'll roll the dice. And if we get a number below our random chance, we'll pick a random move instead of our "best" move.

Over the course of training though, we want to decrease this random chance. In theory, our AI should be better as we train the network. So as we get closer to the end of training, we'll want to make fewer random decisions, and more "best" decisions. We'll aim to start this parameter at 1 in 5, and reduce it down to 1 in 50 as training continues. So how do we implement this?

First of all, we want to keep track of a value representing our chance of making a random move. Our runAllIterations function should be stateful in this parameter.

-- Third "Float" parameter is the random chance
runAllIterations :: Model -> World
  -> StateT ([Float, Int, Float) Session ()
...

trainGame :: World -> Session (Vector Float)
trainGame w = do
  model <- buildModel
  let initialRandomChance = 0.2
  (finalReward, finalWinCount, _) <- execStateT
    (runAllIterations model w)
    ([], 0, initialRandomChance)
  run (readValue $ weightsT model)

Then within runAllIterations, we'll make two changes. First, we'll make a new random generator for each training game. Then, we'll update the random chance, reducing it with the number of iterations:

runAllIterations :: Model -> World
  -> StateT ([Float], Int, Float) Session ()
runAllIterations model initialWorld = do
  let numIterations = 2000
  forM [1..numIterations] $ \i -> do
    gen <- liftIO getStdGen
    (wonGame, (_, finalReward, _)) <- runStateT
      (runWorldIteration model)
      (initialWorld, 0.0, gen)
    (prevRewards, prevWinCount, randomChance) <- get
    let newRewards = finalReward : prevRewards
    let newWinCount = if wonGame
          then prevWinCount + 1
          else prevWinCount
    let modifiedRandomChance = 1.0 / ((fromIntegral i / 40.0) + 5)
    put (newRewards, newWinCount, modifiedRandomChance)
  return ()

Making Random Moves

We can see that runWorldIteration must now be stateful in the random generator. We'll retrieve that, as well as the random chance, at the start of the operation:

runWorldIteration :: Model -> StateT (World, Float, StdGen)
  (StateT ([Float], Int, Float) Session) Bool
runWorldIteration model = do
  (prevWorld, prevReward, gen) <- get
  (_, _, randomChance) <- lift get
  ...

Now let's refactor our serialization code a bit. We want to be able to make a new move based on the index, without needing the weights:

moveFromIndex :: Int -> PlayerMove
moveFromIndex bestMoveIndex =
  PlayerMove moveDirection useStun moveDirection
  where
    useStun = bestMoveIndex > 4
    moveDirection = case bestMoveIndex `mod` 5 of
      0 -> DirectionUp
      1 -> DirectionRight
      2 -> DirectionDown
      3 -> DirectionLeft
      4 -> DirectionNone

Now we can add a function that will run the random generator and give us a random move if it's low enough. Otherwise, it will keep the best move.

chooseMoveWithRandomChance ::
  PlayerMove -> StdGen -> Float -> (PlayerMove, StdGen)
chooseMoveWithRandomChance bestMove gen randomChance =
  let (randVal, gen') = randomR (0.0, 1.0) gen
      (randomIndex, gen'') = randomR (0, 9) gen'
      randomMove = moveFromIndex randomIndex
  in  if randVal < randomChance
        then (randomMove, gen'')
        else (bestMove, gen')

Now it's a simple matter of applying this function, and we're all set!

runWorldIteration :: Model -> StateT (World, Float, StdGen)
  (StateT ([Float], Int, Float) Session) Bool
runWorldIteration model = do
  (prevWorld, prevReward, gen) <- get
  (_, _, randomChance) <- lift get
  ...
  let bestMove = ...
  let (newMove, newGen) = chooseMoveWithRandomChance
                            bestMove gen randomChance
  …
  put (nextWorld, prevReward + newReward, newGen)
  continuationAction

Conclusion

When we test our bot, it has a bit more variety in its moves now, but it's still not succeeding. So what do we want to do about this? It's possible that something is wrong with our network or the algorithm. But it's hard to tell when the problem space is this difficult. After all, we're expecting this agent to navigate a complex maze AND avoid/stun enemies.

It might help to break this process down a bit. Next week, we'll start looking at simpler examples of mazes. We'll see if our current approach can be effective at navigating an empty grid. Then we'll see if we can take some of the weights we learned and use them as a starting point for harder problems. We'll try to navigate a true maze, and see if we get better weights. Then we'll look at an empty grid with enemies. And so on. This approach will make it more obvious if there are flaws with our machine learning method.

If you've never programmed in Haskell before, it might be a little hard to jump into machine learning. Check out our Beginners Checklist and our Liftoff Series to get started!


Running Training Iterations

iterations.png

In our last article we built a simple Tensor Flow model to perform Q-Learning on our brain. This week, we'll build out the rest of the code we need to run iterations on this model. This will train it to perform better and make more intelligent decisions.

The machine learning code for this project is in a separate repository from the game code. Check out MazeLearner to follow along. Everything for this article is on the basic-trainer branch.

To learn more about Haskell and AI, make sure to read our Haskell and AI Series!

Iterating on the Model

First let's recall what our Tensor Flow model looks like:

data Model = Model
  { weightsT :: Variable Float
  , iterateWorldStep :: TensorData Float -> Session (Vector Float)
  , trainStep :: TensorData Float -> TensorData Float -> Session ()
  }

We need to think about how we're going to use its last two functions. We want to iterate on and make updates to the weights. Across the different iterations, there's certain information we need to track. The first value we'll track is the list of "rewards" from each iteration (this will become clearer in the next section). We'll also track the number of games we win across those iterations.

To track these, we'll use the State monad, run on top of the Session.

runAllIterations :: Model -> World -> StateT ([Float], Int) Session ()

We'll also want a function to run a single iteration. This, in turn, will have its own state information. It will track the World state of the game it's playing. It will also track the sum of the accumulated reward values from the moves in that game. Since we'll run it from our function above, it will have a nested StateT type. It will ultimately return a boolean value indicating if we have won the game. We'll define the details in the next section:

runWorldIteration :: Model ->
  StateT (World, Float) (StateT ([Float], Int) Session) Bool

We can now start by filling out our function for running all the iterations. Supposing we'll perform 1000 iterations, we'll make a loop for each iteration. We can start each loop by running the world iteration function on the current model.

runAllIterations :: Model -> World -> StateT ([Float], Int) Session ()
runAllIterations model initialWorld = do
  let numIterations = 1000
  void $ forM [1..numIterations] $ \i -> do
    (wonGame, (_, finalReward)) <-
      runStateT (runWorldIteration model) (initialWorld, 0.0)
    ...

And now the rest is a simple matter of using the results to update the existing state:

runAllIterations :: Model -> World -> StateT ([Float], Int) Session ()
runAllIterations model initialWorld = do
  let numIterations = 2000
  void $ forM [1..numIterations] $ \i -> do
    (wonGame, (_, finalReward)) <-
      runStateT (runWorldIteration model) (initialWorld, 0.0)
    (prevRewards, prevWinCount) <- get
    let newRewards = finalReward : prevRewards
    let newWinCount = if wonGame
          then prevWinCount + 1
          else prevWinCount
    put (newRewards, newWinCount)

Running a Single Iteration

Now let's delve into the process of a single iteration. Broadly speaking, we have four goals.

  1. Take the current world and serialize it. Pass it through the iterateStep to get the move our model would make in this world.
  2. Apply this move, getting the "next" world state.
  3. Determine the scores for our moves in this next world. Apply the given reward as the score for the best of these moves.
  4. Use this result to compare against our original moves. Feed it into the training step and update our weights.

Let's start with steps 1 and 2. We'll get the vector representation of the current world. Then we need to encode it as TensorData so we can pass it to an input feed. Next we run our model's iterate step and get our output move. Then we can use that to advance the world state using stepWorld and updateEnvironment.

runWorldIteration
  :: Model
  -> StateT (World, Float) (StateT ([Float], Int) Session) Bool
runWorldIteration model = do
  -- Serialize the world
  (prevWorld :: World, prevReward) <- get
  let (inputWorldVector :: TensorData Float) =
        encodeTensorData (Shape [1, 8]) (vectorizeWorld prevWorld)

  -- Run our model to get the output vector and de-serialize it
  -- Lift twice to get into the Session monad
  (currentMove :: Vector Float) <- lift $ lift $
    (iterateWorldStep model) inputWorldVector
  let newMove = moveFromOutput currentMove

  -- Get the next world state
  let nextWorld = updateEnvironment (stepWorld newMove prevWorld)

Now we need to perform the Q-Learning step. We'll start by repeating the process in our new world state and getting the next vector of move scores:

runWorldIteration model = do
  ...
  let nextWorld = updateEnvironment (stepWorld newMove prevWorld)

  let nextWorldVector =
        encodeTensorData (Shape [1, 8]) (vectorizeWorld nextWorld)

  (nextMoveVector :: Vector Float) <- lift $ lift $
    (iterateWorldStep model) nextWorldVector
  ...

Now it gets a little tricky. We want to examine if the game is over after our last move. If we won, we'll get a reward of 1.0. If we lost, we'll get a reward of -1.0. Otherwise, there's no reward. While we figure out this reward value, we can also determine our final monadic action. We could return a boolean value if the game is over, or recursively iterate again:

runWorldIteration model = do
  ...
  let nextWorld = ...
  (nextMoveVector :: Vector Float) <- ...
  let (newReward, continuationAction) = case worldResult nextWorld of
        GameInProgress -> (0.0, runWorldIteration model)
        GameWon -> (1.0, return True)
        GameLost -> (-1.0, return False)
  ...

Now we'll look at the vector for our next move and replace one of its values. We'll find the maximum score, and replace it with a value that factors in the actual reward we get from the game. This is how we insert "truth" into our training process and how we'll actually learn good reward values.

import qualified Data.Vector as V

runWorldIteration model = do
  ...
  let nextWorld = ...
  (nextMoveVector :: Vector Float) <- ...
  let (newReward, continuationAction) = ...
  let (bestNextMoveIndex, maxScore) =
        (V.maxIndex nextMoveVector, V.maximum nextMoveVector)
  let (targetActionValues :: Vector Float) = nextMoveVector V.//
        [(bestNextMoveIndex, newReward + (0.99 * maxScore))]
  let targetActionData =
        encodeTensorData (Shape [10, 1]) targetActionValues
  ...

Then we'll encode this new vector as the second input to our training step. We'll still use the nextWorldVector as the first input. We conclude by updating our state variables to have their new values. Then we run the continuation action we got earlier.

runWorldIteration model = do
  ...
  let nextWorld = ...
  (nextMoveVector :: Vector Float) <- ...
  let targetActionData = ...

  -- Run training to alter the weights
  lift $ lift $ (trainStep model) nextWorldVector targetActionData
  put (nextWorld, prevReward + newReward)
  continuationAction

Tying It Together

Now to make this code run, we need a little bit of code to tie it together. We'll make a Session action to train our game. It will output the final weights of our model.

trainGame :: World -> Session (Vector Float)
trainGame w = do
  model <- buildModel
  (finalReward, finalWinCount) <-
    execStateT (runAllIterations model w) ([], 0)
  run (readValue $ weightsT model)

Then we can run this from IO using runSession.

playGameTraining :: World -> IO (Vector Float)
playGameTraining w = runSession (trainGame w)

Last of all, we can run this on any World we like by first loading it from a file. For our first examples, we'll use a smaller 10x10 grid with 2 enemies and 1 drill powerup.

main :: IO ()
main = do
  world <- loadWorldFromFile "training_games/maze_grid_10_10_2_1.game"
  finalWeights <- playGameTraining world
  print finalWeights

Conclusion

We've now got the basics down for making our Tensor Flow program work. Come back next week where we'll take a more careful look at how it's performing. We'll see if the AI from this process is actually any good or if there are tweaks we need to make to the learning process.

And make sure to download our Haskell Tensor Flow Guide! This library is difficult to use. There are a lot of secondary dependencies for it. So don't go in trying to use it blind!


Making a Learning Model

tf_logo.jpg

Last week we took a few more steps towards using machine learning to improve the player AI for our maze game. We saw how to vectorize the input and output data for our world state and moves. This week, we'll finally start seeing how to use these in the larger context of a Tensor Flow program. We'll make a model for a super basic neural network that will apply the technique of Q-Learning.

Our machine learning code will live in a separate repository from the primary game code. Be sure to check that out here! The first couple weeks of this part of the series will use the basic-trainer branch.

This week, we'll finally start diving into using Haskell with Tensor Flow. Be sure to read our Haskell AI Series to learn more about this! You can also download our Haskell Tensor Flow guide to learn the basics of the library.

Model Basics

This week's order of business will be to build a Tensor Flow graph that can make decisions in our maze game. The graph should take a serialized world state as an input, and then produce a distribution of scores. These scores correspond to the different moves we can make.

Recalling from last week, the input to our model will be a 1x8 vector, and the output will be a 10x1 vector. For now then, we'll represent our model with a single variable tensor that will be a matrix of size 8x10. We'll get the output by multiplying the inputs by the weights.
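To make the shapes concrete, here's a plain-Haskell sketch of what that multiplication computes (this isn't Tensor Flow code; it just shows the dimensions involved). We take the 8 input features and produce 10 scores, one per possible move, assuming the weights are laid out as 10 columns of 8 values each:

-- Each output score is the dot product of the 8 input features with one
-- column of the 8x10 weight matrix.
scoreMoves :: [Float] -> [[Float]] -> [Float]
scoreMoves inputFeatures weightColumns =
  [ sum (zipWith (*) inputFeatures column) | column <- weightColumns ]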

Ultimately, there are three things we need to access from this model.

  1. The final weights
  2. A step to iterate the world
  3. A step to train our model and adjust the weights.

Here's what the model looks like, using Tensor Flow types:

data Model = Model
  { weightsT :: Variable Float
  , iterateWorldStep :: TensorData Float -> Session (Vector Float)
  , trainStep :: TensorData Float -> TensorData Float -> Session ()
  }

The first element is the variable tensor for our weights. We need to expose this so we can output them at the end. The second element is a function that will take in a serialized world state and produce the output move. Then the third element will take both a serialized world state AND some expected values. It will update the variable tensor as part of the Q-Learning process. Next week, we'll write iteration functions in the Session monad. They'll use these two elements.

Building the Iterate Step

To make these Tensor Flow items, we'll also need to use the Session monad. Let's start a basic function to build up our model:

buildModel :: Session Model
buildModel = do
  ...

To start, let's make a variable for our weights. At the start, we'll randomize them with truncatedNormal and then make that into a Variable:

buildModel :: Session Model
buildModel = do
  (initialWeights :: Tensor Value Float) <-
    truncatedNormal (vector [8, 10])
  (weights :: Variable Float) <- initializedVariable initialWeights

Now let's build the items for running our iterate step. This first involves taking the inputs as a placeholder. Remember, the inputs come from the vectorization of the world state.

Then to produce our output, we'll multiply the inputs by our weights. The result is a Build tensor, so we need to render it to use it in the next part. As an extra note, we need readValue to turn our Variable into a Tensor we can use in operations.

buildModel :: Session Model
buildModel = do
  (initialWeights :: Tensor Value Float) <-
    truncatedNormal (vector [8, 10])
  (weights :: Variable Float) <- initializedVariable initialWeights
  (inputs :: Tensor Value Float) <- placeholder (Shape [1,8])
  let (allOutputs :: Tensor Build Float) =
        inputs `matMul` (readValue weights)
  returnedOutputs <- render allOutputs
  ...

The next part is to create a step to "run" the outputs. Since the outputs depend on a placeholder, we need to create a feed for the input. Then we can create a runnable Session action with runWithFeeds. This gives us the second element of our Model, the iterateStep.

buildModel :: Session Model
buildModel = do
  ...
  let iterateStep = \inputFeed ->
        runWithFeeds [feed inputs inputFeed] returnedOutputs
  ...

Using Q-Learning in the Model

This gives us what we need to run our basic AI and make moves in the game. But we still need to apply some learning mechanism to update the weights!

We want to use Q-Learning. This means we'll compare the output of our model with the next output from continuing to step through the world. So first let's introduce another placeholder for these new outputs:

buildModel :: Session Model
buildModel = do
  initialWeights <- ...
  weights <- ...
  inputs <- ...
  returnedOutputs <- ...

  let iterateStep = ...

  -- Next set of outputs
  (nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
  ...

Now we'll define our "loss" function. That is, we'll find the squared difference between our real output and the "next" output. Next week we'll see that the "next" output uses extra information about the game. This will allow us to bring an element of "truth" that we can learn from.

buildModel :: Session Model
buildModel = do
  ...
  returnedOutputs <- ...

  -- Q-Learning Section
  (nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
  let (diff :: Tensor Build Float) = nextOutputs `sub` allOutputs
  let (loss :: Tensor Build Float) = reduceSum (diff `mul` diff)
  ...

Now, we'll make a final ControlNode using minimizeWith. This will minimize the loss function using the adam optimizer. We'll pass weights as an input, since this is the variable we're trying to update in this training step.

buildModel :: Session Model
buildModel = do
  ...
  returnedOutputs <- ...

  -- Q-Learning Section
  (nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
  let (diff :: Tensor Build Float) = nextOutputs `sub` allOutputs
  let (loss :: Tensor Build Float) = reduceSum (diff `mul` diff)
  (trainer_ :: ControlNode) <- minimizeWith adam loss [weights]

Finally, we'll make our training step, which will run the training node on two input feeds: one for the world input, and one for the expected output. Then we can return our completed model.

buildModel :: Session Model
buildModel = do
  ...
  inputs <- ...
  weights <- ...
  returnedOutputs <- ...
  let iterateStep = ...

  -- Q-Learning Section
  (nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
  let diff = ...
  let loss = ...
  (trainer_ :: ControlNode) <- minimizeWith adam loss [weights]
  let trainingStep = \inputFeed nextOutputFeed -> runWithFeeds
        [ feed inputs inputFeed
        , feed nextOutputs nextOutputFeed
        ]
        trainer_
  return $ Model
    weights
    iterateStep
    trainingStep

Conclusion

Now we've got our machine learning model. We have different functions that can iterate on our world state as well as train the outputs of our graph. Next week, we'll see how to combine these steps within the Session monad. Then we can start running training iterations and produce results.

If you want to follow along with these code examples, make sure to download our Haskell Tensor Flow Guide! This library is quite tricky to use. There are a lot of secondary dependencies for it. So you won't want to go in trying to use it blind!


Q-Learning Primer

q-learning.png

This week, we're going to take the machine learning process in a different direction than I expected. In the last couple weeks, we've built a simple evaluation function for our world state. We could learn this function using an approach called Temporal Difference learning. We might come back to this approach at some point. But for now, we're actually going to try something a little different.

Instead, we're going to focus on a technique called Q-Learning. Instead of an evaluation function for the world, we're going to learn makePlayerMove. We'll keep most of the same function structure. We're still going to take our world and turn it into a feature space that we can represent as a numeric vector. But instead of producing a single output, we'll give a score for every move from that position. This week, we'll take the basic steps to ready our game for this approach.

As always, check out the Github repository for this project. This week's code is on the q-learning branch!

Next week, we'll finally get into some Tensor Flow code. Make sure you're ready for it by reading up on our Tensor Flow Guide!

Vectorizing Inputs

To learn a function, we need to be able to represent both the inputs and the outputs of our system as numeric vectors. We've already done most of the work here. Let's recall our evaluateWorld function. We'll keep the same feature values. But now we'll wrap them, instead of applying scores immediately:

data WorldFeatures = WorldFeatures
  { wfOnActiveEnemy :: Int
  , wfShortestPathLength :: Int
  , wfManhattanDistance :: Int
  , wfEnemiesOnPath :: Int
  , wfNearestEnemyDistance :: Int
  , wfNumNearbyEnemies :: Int
  , wfStunAvailable :: Int
  , wfDrillsRemaining :: Int
  }

produceWorldFeatures :: World -> WorldFeatures
produceWorldFeatures w = WorldFeatures
  (if onActiveEnemy then 1 else 0)
  shortestPathLength
  manhattanDistance
  enemiesOnPath
  nearestEnemyDistance
  numNearbyEnemies
  (if stunAvailable then 1 else 0)
  (fromIntegral drillsRemaining)
  where
    -- Calculated as before
    onActiveEnemy = ...
    enemiesOnPath = ...
    shortestPathLength = ...
    nearestEnemyDistance = ...
    manhattanDistance = ...
    stunAvailable = ...
    numNearbyEnemies = ...
    drillsRemaining = ...

Now, in our ML code, we'll want to convert this into a vector. Using a vector will enable us to encode this information as a tensor.

vectorizeWorld :: World -> Vector Float
vectorizeWorld w = fromList (fromIntegral <$>
  [ wfOnActiveEnemy features
  , wfShortestPathLength features
  , wfManhattanDistance features
  , wfEnemiesOnPath features
  , wfNearestEnemyDistance features
  , wfNumNearbyEnemies features
  , wfStunAvailable features
  , wfDrillsRemaining features
  ])
  where
    features = produceWorldFeatures w

Vectorizing Outputs

Now we have the inputs to our tensor system. We'll ultimately get a vector of outputs as a result. We want this vector to provide a score for every move. We have 10 potential moves in general. There are five "movement" directions, moving up, right, down, left, and standing still. Then for each direction, we can either use our stun or not. We'll use a drill when the movement direction sends us against a wall. Certain moves won't be available in certain situations. But our output size should account for them all.
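For reference, here's the index convention implied by the decoder we'll write below: indices 0 through 4 are the five directions without the stun, and 5 through 9 are the same directions with it. The encoder sketched here is hypothetical (as is the direction type name), but it's the inverse of that decoder:

moveIndex :: MoveDirection -> Bool -> Int
moveIndex direction useStun = directionIndex + (if useStun then 5 else 0)
  where
    -- Same direction ordering as the decoder below
    directionIndex = case direction of
      DirectionUp    -> 0
      DirectionRight -> 1
      DirectionDown  -> 2
      DirectionLeft  -> 3
      DirectionNone  -> 4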

Our function will often propose invalid moves. For example, it might suggest using the stun while on cooldown, or drilling when we don't have one. In these cases, our game logic should dictate that our player doesn't move. Hopefully this trains our network to make correct moves. If we wanted to, we could even apply a slight negative reward for these.

What we need is the ability to convert a vector of outputs into a move. Once we fix the vector size, this is not difficult. As a slight hack, this function will always give the same direction for moving and drilling. We'll let the game logic determine if the drill needs to apply.

moveFromOutput :: Vector Float -> PlayerMove
moveFromOutput vals = PlayerMove moveDirection useStun moveDirection
  where
    bestMoveIndex = maxIndex vals
    moveDirection = case bestMoveIndex `mod` 5 of
      0 -> DirectionUp
      1 -> DirectionRight
      2 -> DirectionDown
      3 -> DirectionLeft
      4 -> DirectionNone
    useStun = bestMoveIndex > 4

Discrete Updates

Now that we can get numeric vectors for everything, we need to be able to step through the world, one player move at a time. We currently have a generic update function. Depending on the time, it might step the player forward, or it might not. We want to change this so there are two steps. First we receive a player's move, and then we step the world forward until it is time for the next player move:

stepWorld :: PlayerMove -> World -> World

This isn't too difficult; it just requires a little shuffling of our existing code. First, we'll add a new applyPlayerMove' function. This will take the existing applyPlayerMove and add a little bit of validation to it:

applyPlayerMove' :: PlayerMove -> World -> World
applyPlayerMove' move w = if isValidMove
  then worldAfterMove
  else w
  where
    player = worldPlayer w
    currentLoc = playerLocation player

    worldAfterDrill = modifyWorldForPlayerDrill w
      (drillDirection move)

    worldAfterStun = if activateStun move
      then modifyWorldForStun worldAfterDrill
      else worldAfterDrill

    newLocation = nextLocationForMove
      (worldBoundaries worldAfterDrill Array.! currentLoc) 
      currentLoc
      (playerMoveDirection move)

    isValidStunUse = if activateStun move
      then playerCurrentStunDelay player == 0
      else True
    isValidMovement = playerMoveDirection move == DirectionNone ||
      newLocation /= currentLoc
    isValidMove = isValidStunUse && isValidMovement

    worldAfterMove =
      modifyWorldForPlayerMove worldAfterStun newLocation

Now we'll add an updateEnvironment function. This will perform all the work of our updateFunc except for moving the player.

updateEnvironment :: World -> World
updateEnvironment w
  | playerLocation player == endLocation w =
    w { worldResult = GameWon }
  | playerLocation player `elem` activeEnemyLocations =
    w { worldResult = GameLost }
  | otherwise =
    updateWorldForEnemyTicks .
    updateWorldForPlayerTick .
    updateWorldForEnemyMoves .
    clearStunCells .
    incrementWorldTime $ w
  where
    player = worldPlayer w
    activeEnemyLocations = enemyLocation <$>
      filter (\e -> enemyCurrentStunTimer e == 0) (worldEnemies w)

Now we combine these. First we'll make the player's move. Then we'll update the environment once for each tick of the player's "lag" time.

stepWorld :: PlayerMove -> World -> World
stepWorld move w = execState (sequence updateActions) worldAfterMove
  where
    worldAfterMove = applyPlayerMove' move w
    updateActions = replicate
      ( fromIntegral .
        lagTime .
        playerGameParameters .
        worldParameters $ w)
      (modify updateEnvironment)

And these are all the modifications we'll need to get going!
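With those pieces in place, here's a rough sketch of how a single decision could fit together, given some scoring function for the feature vector (this glue code is illustrative and isn't taken from the repository):

-- Score the current world, decode the highest-scoring move, and step the
-- game forward to the next decision point.
advanceOneMove :: (Vector Float -> Vector Float) -> World -> World
advanceOneMove scoreMoves world = stepWorld chosenMove world
  where
    chosenMove = moveFromOutput (scoreMoves (vectorizeWorld world))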

Q-Learning Teaser

Now we can start thinking about the actual machine learning process. We'll get into a lot more detail next week. But for now, let's think about a particular training iteration. We'll want to use our existing network to step forward into the game. This will produce a certain "reward", and leave the game state in a new position. Then we'll get more values for our next moves out of that position. We'll use the updated move scores and the reward to learn better values for our function weights.

Of course, the immediate "reward" values for most moves will be 0. The only moves that will carry a reward will be those where we either win the game or lose the game. So it could take a while for our program to learn good behaviors. It will take time for the "end" behaviors of the game to affect normal moves. For this reason, we'll start our training on much smaller mazes than the primary game. This should help speed up the training process.
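To make this concrete, here's a sketch of the reward values and the update target we'll end up using: a win is worth 1.0, a loss -1.0, anything else 0, and the next position's best score gets discounted by 0.99. The helper names here are ours, and we're assuming the game's result type is called GameResult, matching the GameWon, GameLost, and GameInProgress constructors from the game code.

-- Immediate reward for a move, based on the game result it produced.
rewardForResult :: GameResult -> Float
rewardForResult GameWon = 1.0
rewardForResult GameLost = -1.0
rewardForResult GameInProgress = 0.0

-- The value we train the chosen move's score toward: the immediate reward
-- plus a discounted estimate of the best score from the next position.
targetScore :: Float -> Float -> Float
targetScore reward bestNextScore = reward + 0.99 * bestNextScore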

Conclusion

Next week, we'll take our general framework for Q-Learning and apply it within Tensor Flow. We'll get the basics of Q-Learning down with a couple different types of models. For a wider perspective on Haskell and AI problems, make sure to check out our Haskell AI Series!
