# Tweaks, Fixes, and Some Results

In last week's episode of this AI series, we added random exploration to our algorithm. This helped us escape certain "traps" and local minima in the model that could keep us rooted in bad spots. But it still didn't improve results much.

This week we'll explore a couple more ways we can fix and improve our algorithm. For the first time, we'll see some positive outcomes. Still, we'll find our approach isn't great yet.

To get started with Tensor Flow and Haskell, download our guide! It's a complex process so you'll want some help! You should also check out our Haskell AI Series to learn more about why Haskell is a good choice as an AI language!

## Improvements

To start out, there are a few improvements we can make to how we do q-learning. Let's recall the basic outline of running a world iteration. There are three steps. We get our "new" move from the "input" world. Then we apply that move, and get our "next" move against the "next" world. Then we use the possible reward to create our target actions, and use that to train our model.

``````runWorldIteration model = do
  (prevWorld, _, _) <- get

  -- Get the next move on the current world (with random chance)
  let inputWorldVector = … -- vectorize prevWorld
  currentMoveWeights <- lift $ lift $
    (iterateWorldStep model) inputWorldVector
  let bestMove = moveFromOutput currentMoveWeights
  let newMove = chooseRandomMoveWithChance …

  -- Get the next world using this move, and produce our next move
  let nextWorld = stepWorld newMove prevWorld
  let nextWorldVector = vectorizeWorld nextWorld
  nextMoveVector <- lift $ lift $
    (iterateWorldStep model) nextWorldVector

  -- Use these to get "target action values" and use them to train!
  let (bestNextMoveIndex, maxScore) =
        (V.maxIndex nextMoveVector, V.maximum nextMoveVector)
  let targetActionData = encodeTensorData (Shape [10, 1]) $
        nextMoveVector V.//
        [(bestNextMoveIndex, newReward + maxScore)]
  lift $ lift $ (trainStep model) nextWorldVector targetActionData``````

There are a couple issues here. First, we want to substitute based on the first new move, not the later move. We want to learn from the move we are taking now, since we assess its result now. Thus we want to substitute for that index. We'll re-write our randomizer to account for this and return the index it chooses.
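
We won't show the full randomizer here, but as a rough sketch, an epsilon-greedy chooser that returns the index might look like the following. The name, signature, and the way randomness is threaded are illustrative assumptions, not the exact code from the repository.

``````import qualified Data.Vector as V
import System.Random (StdGen, randomR)

-- Illustrative only: with probability epsilon pick a random move index,
-- otherwise keep the index of the highest-scoring move. Returning the
-- index lets the caller substitute the target value at the right spot.
chooseMoveIndexSketch :: StdGen -> Float -> V.Vector Float -> (Int, StdGen)
chooseMoveIndexSketch gen epsilon moveWeights =
  let (roll, gen') = randomR (0.0, 1.0) gen
  in  if roll < epsilon
        then randomR (0, V.length moveWeights - 1) gen'
        else (V.maxIndex moveWeights, gen')``````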

Next, when training our model, we should use the original world instead of the next world. That is, we want `inputWorldVector` instead of `nextWorldVector`. Our logic is this: we get our "future" action, which accounts for the game's reward. We want our current action on this world to become more like that future action. Here's what the changes look like:

``````runWorldIteration model = do
  (prevWorld, _, _) <- get

  -- Get the next move on the current world (with random chance)
  let inputWorldVector = … -- vectorize prevWorld
  currentMoveWeights <- lift $ lift $
    (iterateWorldStep model) inputWorldVector
  let bestMove = moveFromOutput currentMoveWeights
  let (newMove, newMoveIndex) = chooseRandomMoveWithChance …

  -- Get the next world using this move, and produce our next move
  let nextWorld = stepWorld newMove prevWorld
  let nextWorldVector = vectorizeWorld nextWorld
  nextMoveVector <- lift $ lift $
    (iterateWorldStep model) nextWorldVector

  -- Use these to get "target action values" and use them to train!
  let maxScore = V.maximum nextMoveVector
  let targetActionData = encodeTensorData (Shape [10, 1]) $
        nextMoveVector V.//
        [(newMoveIndex, newReward + maxScore)]
  lift $ lift $ (trainStep model) inputWorldVector targetActionData``````

Another change we can make is to provide some rewards based on whether the selected move was legal or not. To do this, we'll need to update the `stepWorld` game API to return this boolean value:

``stepWorld :: PlayerMove -> World -> (World, Bool)``

Then we can add a small amount (0.01) to our reward value if we get a legal move, and subtract this otherwise.

As a last flourish, we should also add a timeout condition. Our next step will be to test on simple mazes that have no enemies. This means we'll never get eaten, so we need some loss condition if we get stuck. This timeout condition should have the same negative reward as losing.
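
Here's a rough sketch of how the reward for a single step could combine these pieces. The game result constructors come from our game code; the 0.01 bonus and the timeout threshold are just the illustrative values mentioned above, not the exact implementation.

``````-- Illustrative sketch: fold the legality bonus and a timeout check into
-- the reward for one step. GameWon/GameLost/GameInProgress are the game's
-- result constructors; maxSteps is a hypothetical cutoff value.
stepReward result wasLegalMove stepsTaken = case result of
  GameWon -> 1.0 + legalityBonus
  GameLost -> (-1.0) + legalityBonus
  GameInProgress -> if stepsTaken > maxSteps
    then (-1.0) + legalityBonus -- timing out is as bad as losing
    else legalityBonus
  where
    legalityBonus = if wasLegalMove then 0.01 else (-0.01)
    maxSteps = 100 :: Int``````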

## Results

Now that we've made some improvements, we'll train on a very basic maze that's only 5x5 and has no walls and no enemies. Whereas we used to struggle to even finish this maze, we now achieve the goal a fair amount of the time. One of our training iterations achieved the goal around 2/3 of the time.

However, our bot is still useless against enemies! It loses every time if we try to train from scratch on a map with a single enemy. One attempt to circumvent this is to first train our weights to solve the empty maze. Then we can start with these weights as we attempt to avoid the enemy. That way, we have some pre-existing knowledge, and we don't have to learn everything at once. Still though, it doesn't result in much improvement. Typical runs only succeeded 40-50 times out of 2000 iterations.

## Limiting Features

One conclusion we can draw is that we actually have too many features! Our intuition is that a larger feature set would take more iterations to learn. If the features aren't chosen carefully, they'll introduce noise.

So instead of tracking 8 features for each possible direction of movement, let's stick with 3. We'll see if the enemy is on the location, check the distance to the end, and count the number of nearby enemies. When we do this, we get comparable results on the empty maze. But when it comes to avoiding enemies, we do a little better, surviving 150-250 iterations out of 2000. These statistics are all very rough, of course. If we wanted a more thorough analysis, we'd use multiple maze configurations and a lot more runs using the finalized weights.
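
As a concrete (if simplified) picture of the cut, here's what a reduced vectorizer could look like if we reuse three of the accessors from our original `vectorizeWorld`. The real cut-down feature set may differ in detail, so treat this as a sketch.

``````-- A minimal sketch: the same idea as vectorizeWorld, but keeping only
-- three of the feature accessors we already defined.
vectorizeWorldReduced :: World -> Vector Float
vectorizeWorldReduced w = fromList (fromIntegral <$>
  [ wfOnActiveEnemy features
  , wfShortestPathLength features
  , wfNumNearbyEnemies features
  ])
  where
    features = produceWorldFeatures w``````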

## Conclusions

We can't draw too many conclusions from this yet. Our model is still failing to solve simple versions of our problem. It's quite possible that our model is too simplistic. After all, all we're doing is a simple matrix multiplication on our features. In theory, this should be able to solve the problem, but it may take a lot more iterations. The stream of results we see also suggests local minima are a big problem. Logging reveals that we often die in the same spot in the maze many times in a row. The negative rewards aren't enough to draw us out, and we often rely on random moves to find better outcomes.

So next week we're going to start changing our approach. We'll explore a way to introduce supervised learning into our process. This depends on "correct" data. We'll try a couple different ways to get that data. We'll use our own "human" input, as well as the good AI we've written in the past to solve this problem. All we need is a way to record the moves we make! So stay tuned!

# Running Training Iterations

In our last article we built a simple Tensor Flow model to perform Q-Learning on our brain. This week, we'll build out the rest of the code we need to run iterations on this model. This will train it to perform better and make more intelligent decisions.

The machine learning code for this project is in a separate repository from the game code. Check out MazeLearner to follow along. Everything for this article is on the `basic-trainer` branch.

## Iterating on the Model

First let's recall what our Tensor Flow model looks like:

``````data Model = Model
  { weightsT :: Variable Float
  , iterateWorldStep :: TensorData Float -> Session (Vector Float)
  , trainStep :: TensorData Float -> TensorData Float -> Session ()
  }``````

We need to think about how we're going to use the last two functions of it. We want to iterate on and make updates to the weights. Across the different iterations, there's certain information we need to track. The first value we'll track is the list of "rewards" from each iteration (this will be more clear in the next section). Then we'll also track the number of wins we get in the iteration.

To track these, we'll use the `State` monad, run on top of the `Session`.

``runAllIterations :: Model -> World -> StateT ([Float], Int) Session ()``

We'll also want a function to run a single iteration. This, in turn, will have its own state information. It will track the `World` state of the game it's playing. It will also track the sum of the accumulated reward values from the moves in that game. Since we'll run it from our function above, it will have a nested `StateT` type. It will ultimately return a boolean value indicating if we have won the game. We'll define the details in the next section:

``````runWorldIteration :: Model ->
StateT (World, Float) (StateT ([Float], Int) Session) Bool``````

We can now start by filling out our function for running all the iterations. Supposing we'll perform 2000 iterations, we'll make a loop for each iteration. We can start each loop by running the world iteration function on the current model.

``````runAllIterations :: Model -> World -> StateT ([Float], Int) Session ()
runAllIterations model initialWorld = do
  let numIterations = 2000
  void $ forM [1..numIterations] $ \i -> do
    (wonGame, (_, finalReward)) <-
      runStateT (runWorldIteration model) (initialWorld, 0.0)
    ...``````

And now the rest is a simple matter of using the results to update the existing state:

``````runAllIterations :: Model -> World -> StateT ([Float], Int) Session ()
runAllIterations model initialWorld = do
  let numIterations = 2000
  void $ forM [1..numIterations] $ \i -> do
    (wonGame, (_, finalReward)) <-
      runStateT (runWorldIteration model) (initialWorld, 0.0)
    (prevRewards, prevWinCount) <- get
    let newRewards = finalReward : prevRewards
    let newWinCount = if wonGame
          then prevWinCount + 1
          else prevWinCount
    put (newRewards, newWinCount)``````

## Running a Single Iteration

Now let's delve into the process of a single iteration. Broadly speaking, we have four goals.

1. Take the current world and serialize it. Pass it through the `iterateStep` to get the move our model would make in this world.
2. Apply this move, getting the "next" world state.
3. Determine the scores for our moves in this next world. Apply the given reward as the score for the best of these moves.
4. Use this result to compare against our original moves. Feed it into the training step and update our weights.

Let's start with steps 1 and 2. We'll get the vector representation of the current world. Then we need to encode it as `TensorData` so we can pass it to an input feed. Next we run our model's iterate step and get our output move. Then we can use that to advance the world state using `stepWorld` and `updateEnvironment`.

``````runWorldIteration
  :: Model
  -> StateT (World, Float) (StateT ([Float], Int) Session) Bool
runWorldIteration model = do
  -- Serialize the world
  (prevWorld :: World, prevReward) <- get
  let (inputWorldVector :: TensorData Float) =
        encodeTensorData (Shape [1, 8]) (vectorizeWorld prevWorld)

  -- Run our model to get the output vector and de-serialize it
  -- Lift twice to get into the Session monad
  (currentMove :: Vector Float) <- lift $ lift $
    (iterateWorldStep model) inputWorldVector
  let newMove = moveFromOutput currentMove

  -- Get the next world state
  let nextWorld = updateEnvironment (stepWorld newMove prevWorld)``````

Now we need to perform the Q-Learning step. We'll start by repeating the process in our new world state and getting the next vector of move scores:

``````runWorldIteration model = do
  ...
  let nextWorld = updateEnvironment (stepWorld newMove prevWorld)

  let nextWorldVector =
        encodeTensorData (Shape [1, 8]) (vectorizeWorld nextWorld)

  (nextMoveVector :: Vector Float) <- lift $ lift $
    (iterateWorldStep model) nextWorldVector
  ...``````

Now it gets a little tricky. We want to examine if the game is over after our last move. If we won, we'll get a reward of 1.0. If we lost, we'll get a reward of -1.0. Otherwise, there's no reward. While we figure out this reward value, we can also determine our final monadic action. We'll either return a boolean value if the game is over, or recursively iterate again:

``````runWorldIteration model = do
  ...
  let nextWorld = ...
  (nextMoveVector :: Vector Float) <- ...
  let (newReward, continuationAction) = case worldResult nextWorld of
        GameInProgress -> (0.0, runWorldIteration model)
        GameWon -> (1.0, return True)
        GameLost -> (-1.0, return False)
  ...``````

Now we'll look at the vector for our next move and replace one of its values. We'll find the maximum score, and replace it with a value that factors in the actual reward we get from the game. This is how we insert "truth" into our training process and how we'll actually learn good reward values.

``````import qualified Data.Vector as V

runWorldIteration model = do
  ...
  let nextWorld = ...
  (nextMoveVector :: Vector Float) <- ...
  let (newReward, continuationAction) = ...
  let (bestNextMoveIndex, maxScore) =
        (V.maxIndex nextMoveVector, V.maximum nextMoveVector)
  let (targetActionValues :: Vector Float) = nextMoveVector V.//
        [(bestNextMoveIndex, newReward + (0.99 * maxScore))]
  let targetActionData =
        encodeTensorData (Shape [10, 1]) targetActionValues
  ...``````

Then we'll encode this new vector as the second input to our training step. We'll still use the `nextWorldVector` as the first input. We conclude by updating our state variables to have their new values. Then we run the continuation action we got earlier.

``````runWorldIteration model = do
  ...
  let nextWorld = ...
  (nextMoveVector :: Vector Float) <- ...
  let targetActionData = ...

  -- Run training to alter the weights
  lift $ lift $ (trainStep model) nextWorldVector targetActionData
  put (nextWorld, prevReward + newReward)
  continuationAction``````

## Tying It Together

Now to make this code run, we need a little bit of code to tie it together. We'll make a `Session` action to train our game. It will output the final weights of our model.

``````trainGame :: World -> Session (Vector Float)
trainGame w = do
  model <- buildModel
  (finalReward, finalWinCount) <-
    execStateT (runAllIterations model w) ([], 0)
  run (readValue $ weightsT model)``````

Then we can run this from `IO` using `runSession`.

``````playGameTraining :: World -> IO (Vector Float)
playGameTraining w = runSession (trainGame w)``````

Last of all, we can run this on any `World` we like by first loading it from a file. For our first examples, we'll use a smaller 10x10 grid with 2 enemies and 1 drill powerup.

``````main :: IO ()
main = do
  world <- loadWorldFromFile "training_games/maze_grid_10_10_2_1.game"
  finalWeights <- playGameTraining world
  print finalWeights``````

## Conclusion

We've now got the basics down for making our Tensor Flow program work. Come back next week where we'll take a more careful look at how it's performing. We'll see if the AI from this process is actually any good or if there are tweaks we need to make to the learning process.

And make sure to download our Haskell Tensor Flow Guide! This library is difficult to use. There are a lot of secondary dependencies for it. So don't go in trying to use it blind!

# Making a Learning Model

Last week we took a few more steps towards using machine learning to improve the player AI for our maze game. We saw how to vectorize the input and output data for our world state and moves. This week, we'll finally start seeing how to use these in the larger context of a Tensor Flow program. We'll make a model for a super basic neural network that will apply the technique of Q-Learning.

Our machine learning code will live in a separate repository from the primary game code. Be sure to check that out here! The first couple weeks of this part of the series will use the `basic-trainer` branch.

## Model Basics

This week's order of business will be to build a Tensor Flow graph that can make decisions in our maze game. The graph should take a serialized world state as an input, and then produce a distribution of scores. These scores correspond to the different moves we can make.

Recalling from last week, the input to our model will be a 1x8 vector, and the output will be a 10x1 vector. For now then, we'll represent our model with a single variable tensor that will be a matrix of size 8x10. We'll get the output by multiplying the inputs by the weights.

Ultimately, there are three things we need to access from this model.

1. The final weights
2. A step to iterate the world
3. A step to train our model and adjust the weights.

Here's what the model looks like, using Tensor Flow types:

``````data Model = Model
  { weightsT :: Variable Float
  , iterateWorldStep :: TensorData Float -> Session (Vector Float)
  , trainStep :: TensorData Float -> TensorData Float -> Session ()
  }``````

The first element is the variable tensor for our weights. We need to expose this so we can output them at the end. The second element is a function that will take in a serialized world state and produce the output move. Then the third element will take both a serialized world state AND some expected values. It will update the variable tensor as part of the Q-Learning process. Next week, we'll write iteration functions in the `Session` monad. They'll use these two elements.

## Building the Iterate Step

To make these Tensor Flow items, we'll also need to use the `Session` monad. Let's start a basic function to build up our model:

``````buildModel :: Session Model
buildModel = do
...``````

To start, let's make a variable for our weights. At the start, we'll randomize them with `truncatedNormal` and then make that into a `Variable`:

``````buildModel :: Session Model
buildModel = do
(initialWeights :: Tensor Value Float) <-
truncatedNormal (vector [8, 10])
(weights :: Variable Float) <- initializedVariable initialWeights``````

Now let's build the items for running our iterate step. This first involves taking the inputs as a placeholder. Remember, the inputs come from the vectorization of the world state.

Then to produce our output, we'll multiply the inputs by our weights. The result is a `Build` tensor, so we need to `render` it to use it in the next part. As an extra note, we need `readValue` to turn our `Variable` into a `Tensor` we can use in operations.

``````buildModel :: Session Model
buildModel = do
(initialWeights :: Tensor Value Float) <-
truncatedNormal (vector [8, 10])
(weights :: Variable Float) <- initializedVariable initialWeights
(inputs :: Tensor Value Float) <- placeholder (Shape [1,8])
let (allOutputs :: Tensor Build Float) =
inputs `matMul` (readValue weights)
returnedOutputs <- render allOutputs
...``````

The next part is to create a step to "run" the outputs. Since the outputs depend on a placeholder, we need to create a feed for the input. Then we can create a runnable `Session` action with `runWithFeeds`. This gives us the second element of our `Model`, the `iterateStep`.

``````buildModel :: Session Model
buildModel = do
...
let iterateStep = \inputFeed ->
runWithFeeds [feed inputs inputFeed] returnedOutputs
...``````

## Using Q-Learning in the Model

This gives us what we need to run our basic AI and make moves in the game. But we still need to apply some learning mechanism to update the weights!

We want to use Q-Learning. This means we'll compare the output of our model with the next output from continuing to step through the world. So first let's introduce another placeholder for these new outputs:

``````buildModel :: Session Model
buildModel = do
initialWeights <- ...
weights <- ...
inputs <- ...
returnedOutputs <- ...

let iterateStep = ...

-- Next set of outputs
(nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
...``````

Now we'll define our "loss" function. That is, we'll find the squared difference between our real output and the "next" output. Next week we'll see that the "next" output uses extra information about the game. This will allow us to bring an element of "truth" that we can learn from.

``````buildModel :: Session Model
buildModel = do
...
returnedOutputs <- ...

-- Q-Learning Section
(nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
let (diff :: Tensor Build Float) = nextOutputs `sub` allOutputs
let (loss :: Tensor Build Float) = reduceSum (diff `mul` diff)
...``````

Now, we'll make a final `ControlNode` using `minimizeWith`. This will minimize the loss function using the `adam` optimizer. We'll pass `weights` in the list of variables, since this is the variable we're trying to update.

``````buildModel :: Session Model
buildModel = do
...
returnedOutputs <- ...

-- Q-Learning Section
(nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
let (diff :: Tensor Build Float) = nextOutputs `sub` allOutputs
let (loss :: Tensor Build Float) = reduceSum (diff `mul` diff)
(trainer_ :: ControlNode) <- minimizeWith adam loss [weights]``````

Finally, we'll make our training step, which will run the training node on two input feeds: one for the world input, and one for the expected output. Then we can return our completed model.

``````buildModel :: Session Model
buildModel = do
  ...
  inputs <- ...
  weights <- ...
  returnedOutputs <- ...
  let iterateStep = ...

  -- Q-Learning Section
  (nextOutputs :: Tensor Value Float) <- placeholder (Shape [10, 1])
  let diff = ...
  let loss = ...
  (trainer_ :: ControlNode) <- minimizeWith adam loss [weights]
  let trainingStep = \inputFeed nextOutputFeed -> runWithFeeds
        [ feed inputs inputFeed
        , feed nextOutputs nextOutputFeed
        ]
        trainer_
  return $ Model
    weights
    iterateStep
    trainingStep``````

## Conclusion

Now we've got our machine learning model. We have different functions that can iterate on our world state as well as train the outputs of our graph. Next week, we'll see how to combine these steps within the `Session` monad. Then we can start running training iterations and produce results.

If you want to follow along with these code examples, make sure to download our Haskell Tensor Flow Guide! This library is quite tricky to use. There are a lot of secondary dependencies for it. So you won't want to go in trying to use it blind!

# Q-Learning Primer

This week, we're going to take the machine learning process in a different direction than I expected. In the last couple weeks, we've built a simple evaluation function for our world state. We could learn this function using an approach called Temporal Difference learning. We might come back to this approach at some point. But for now, we're actually going to try something a little different.

Instead, we're going to focus on a technique called Q-Learning. Instead of an evaluation function for the world, we're going to learn `makePlayerMove`. We'll keep most of the same function structure. We're still going to take our world and turn it into a feature space that we can represent as a numeric vector. But instead of producing a single output, we'll give a score for every move from that position. This week, we'll take the basic steps to ready our game for this approach.

As always, check out the Github repository for this project. This week's code is on the `q-learning` branch!

Next week, we'll finally get into some Tensor Flow code. Make sure you're ready for it by reading up on our Tensor Flow Guide!

## Vectorizing Inputs

To learn a function, we need to be able to represent both the inputs and the outputs of our system as numeric vectors. We've already done most of the work here. Let's recall our `evaluateWorld` function. We'll keep the same feature values. But now we'll wrap them, instead of applying scores immediately:

``````data WorldFeatures = WorldFeatures
{ onActiveEnemy :: Int
, shortestPathLength :: Int
, manhattanDistance :: Int
, enemiesOnPath :: Int
, nearestEnemyDistance :: Int
, numNearbyEnemies :: Int
, stunAvailable :: Int
, drillsRemaining :: Int
}

produceWorldFeatures :: World -> WorldFeatures
produceWorldFeatures w = WorldFeatures
(if onActiveEnemy then 1 else 0)
shortestPathLength
manhattanDistance
enemiesOnPath
nearestEnemyDistance
numNearbyEnemies
(if stunAvailable then 1 else 0)
(fromIntegral drillsRemaining)
where
-- Calculated as before
onActiveEnemy = ...
enemiesOnPath = ...
shortestPathLength = ...
nearestEnemyDistance = ...
manhattanDistance = ...
stunAvailable = ...
numNearbyEnemies = ...
drillsRemaining = ...``````

Now, in our ML code, we'll want to convert this into a vector. Using a vector will enable us to encode this information as a tensor.

``````vectorizeWorld :: World -> Vector Float
vectorizeWorld w = fromList (fromIntegral <$>
  [ wfOnActiveEnemy features
  , wfShortestPathLength features
  , wfManhattanDistance features
  , wfEnemiesOnPath features
  , wfNearestEnemyDistance features
  , wfNumNearbyEnemies features
  , wfStunAvailable features
  , wfDrillsRemaining features
  ])
  where
    features = produceWorldFeatures w``````

## Vectorizing Outputs

Now we have the inputs to our tensor system. We'll ultimately get a vector of outputs as a result. We want this vector to provide a score for every move. We have 10 potential moves in general. There are five "movement" directions: moving up, right, down, left, and standing still. Then for each direction, we can either use our stun or not. We'll use a drill when the movement direction sends us against a wall. Certain moves won't be available in certain situations. But our output size should account for them all.

Our function will often propose invalid moves. For example, it might suggest using the stun while on cooldown, or drilling when we don't have one. In these cases, our game logic should dictate that our player doesn't move. Hopefully this trains our network to make correct moves. If we wanted to, we could even apply a slight negative reward for these.

What we need is the ability to convert a vector of outputs into a move. Once we fix the vector size, this is not difficult. As a slight hack, this function will always give the same direction for moving and drilling. We'll let the game logic determine if the drill needs to apply.

``````moveFromOutput :: Vector Int -> PlayerMove
moveFromOutput vals = PlayerMove moveDirection useStun moveDirection
where
bestMoveIndex = maxIndex vals
moveDirection = case bestMoveIndex `mod` 5 of
0 -> DirectionUp
1 -> DirectionRight
2 -> DirectionDown
3 -> DirectionLeft
4 -> DirectionNone
useStun = bestMoveIndex > 4``````
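
To make the indexing concrete, here's a small usage example (the scores are made up):

``````-- Index 7 has the highest score: 7 `mod` 5 = 2 gives DirectionDown,
-- and 7 > 4 means we also activate the stun.
exampleMove :: PlayerMove
exampleMove = moveFromOutput (fromList [0, 0, 0, 0, 0, 0, 0, 9, 0, 0])
-- exampleMove == PlayerMove DirectionDown True DirectionDown``````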

Now that we can get numeric vectors for everything, we need to be able to step through the world, one player move at a time. We currently have a generic `update` function. Depending on the time, it might step the player forward, or it might not. We want to change this so there are two steps. First we receive a player's move, and then we step the world forward until it is time for the next player move:

``stepWorld :: PlayerMove -> World -> World``

This isn't too difficult; it just requires a little shuffling of our existing code. First, we'll add a new `applyPlayerMove'` function. This will take the existing `applyPlayerMove` and add a little bit of validation to it:

``````applyPlayerMove' :: PlayerMove -> World -> World
applyPlayerMove' move w = if isValidMove
then worldAfterMove
else w
where
player = worldPlayer w
currentLoc = playerLocation player

worldAfterDrill = modifyWorldForPlayerDrill w
(drillDirection move)

worldAfterStun = if activateStun move
then modifyWorldForStun worldAfterDrill
else worldAfterDrill

newLocation = nextLocationForMove
(worldBoundaries worldAfterDrill Array.! currentLoc)
currentLoc
(playerMoveDirection move)

isValidStunUse = if activateStun move
then playerCurrentStunDelay player == 0
else True
isValidMovement = playerMoveDirection move == DirectionNone ||
newLocation /= currentLoc
isValidMove = isValidStunUse && isValidMovement

worldAfterMove =
modifyWorldForPlayerMove worldAfterStun newLocation``````

Now we'll add an `updateEnvironment` function. This will perform all the work of our `updateFunc` except for moving the player.

``````updateEnvironment :: World -> World
updateEnvironment w
  | playerLocation player == endLocation w =
      w { worldResult = GameWon }
  | playerLocation player `elem` activeEnemyLocations =
      w { worldResult = GameLost }
  | otherwise =
      updateWorldForEnemyTicks .
      updateWorldForPlayerTick .
      updateWorldForEnemyMoves .
      clearStunCells .
      incrementWorldTime $ w
  where
    player = worldPlayer w
    activeEnemyLocations = enemyLocation <$>
      filter (\e -> enemyCurrentStunTimer e == 0) (worldEnemies w)``````

Now we combine these. First we'll make the player's move. Then we'll update the environment once for each tick of the player's "lag" time.

``````stepWorld :: PlayerMove -> World -> World
stepWorld move w = execState (sequence updateActions) worldAfterMove
  where
    worldAfterMove = applyPlayerMove' move w
    updateActions = replicate
      ( fromIntegral .
        lagTime .
        playerGameParameters .
        worldParameters $ w)
      (modify updateEnvironment)``````

And these are all the modifications we'll need to get going!

## Q Learning Teaser

Now we can start thinking about the actual machine learning process. We'll get into a lot more detail next week. But for now, let's think about a particular training iteration. We'll want to use our existing network to step forward into the game. This will produce a certain "reward", and leave the game state in a new position. Then we'll get more values for our next moves out of that position. We'll use the updated move scores and the reward to learn better values for our function weights.
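
In other words, the value we'll train toward for the move we just took is the standard Q-Learning target. Here it is as a small sketch (this isn't our Tensor Flow code yet, just the idea):

``````-- The score we train toward: the immediate reward plus a discounted
-- estimate of the best score available from the resulting position
-- (gamma is the discount factor).
qTarget :: Float -> Float -> [Float] -> Float
qTarget gamma reward nextMoveScores = reward + gamma * maximum nextMoveScores``````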

Of course, the immediate "reward" values for most moves will be 0. The only moves that will carry a reward will be those where we either win the game or lose the game. So it could take a while for our program to learn good behaviors. It will take time for the "end" behaviors of the game to affect normal moves. For this reason, we'll start our training on much smaller mazes than the primary game. This should help speed up the training process.

## Conclusion

Next week, we'll take our general framework for Q-Learning and apply it within Tensor Flow. We'll get the basics of Q-Learning down with a couple different types of models. For a wider perspective on Haskell and AI problems, make sure to check out our Haskell AI Series!

# Adding Features for Better Behavior

Last week we started exploring the idea of an AI built on an evaluation function. This has the potential to allow us to avoid a lot of the hand-crafting that comes with AI design. The old way specified all the rules for the AI to follow. In the new approach, we create a mathematical function to evaluate a game position. Then we can look at all our possible moves and select the one with the best result. We could, if we wanted to, turn the input to our evaluation function into a vector of numbers. And its output is also a number. This property will help us when we eventually try to machine learn this function.

We made a rudimentary version of this function last week. Even before turning to machine learning, there are a couple ways to improve our function. We can try tweaking the weights we applied to each feature. But we can also try coming up with new features, or try different combinations of features. This week, we'll try the latter approach.

In the coming weeks as we start exploring machine learning, we'll use Tensor Flow with Haskell! To get prepared, download our Haskell Tensor Flow guide!

## Existing Features

Last week, we came up with a few different features that could help us navigate this maze. These features included:

1. Maze distance to goal
2. Manhattan distance to goal
3. Whether or not an enemy is on our location
4. Whether or not our stun is available
5. The number of drills we have available
6. The number of enemies that are nearby (using manhattan distance)

But there were some clear sub-optimal behaviors with our bot. We tend to get "zoned out" by enemies, even when they aren't near us by maze distance. Obviously, it would suit us to use maze distance instead of manhattan distance. But we also want to be willing to approach enemies aggressively when we have our stun, and retreat intelligently without it. To that end, let's add a couple more features:

1. The number of enemies on the shortest path to the goal.
2. The shortest distance to an enemy from a particular square (only up to 5)

We'll impose a penalty for close enemies if we don't have our stun. Otherwise we'll ignore this new feature. Then we'll also impose a penalty for having more enemies on our shortest path. This will make us more willing to use the stun, rather than waiting.

## Enemies In The Way

Our first order of business will be to determine how many enemies lie on our shortest path. We'll filter the path itself based on membership in the active enemies set:

``````evaluateWorld :: World -> Float
evaluateWorld w = ...
  where
    activeEnemyLocations = …

    shortestPath =
      getShortestPath (worldBoundaries w) playerLoc goalLoc

    enemiesOnPath = length $ filter
      (\l -> Set.member l (Set.fromList activeEnemyLocations))
      shortestPath``````

Then we'll assign each enemy on this path a penalty greater than the value of using the stun. We'll add this score to our other scores.

``````evaluateWorld :: World -> Float
evaluateWorld w =
enemiesOnPathScore +
...
where
enemiesOnPath = ...
enemiesOnPathScore = -85.0 * (fromIntegral enemiesOnPath)``````

## Maze Distance

Next let's get the shortest maze distance to a nearby enemy. We'll actually want to generalize the behavior of our existing BFS function for this. We want to find the shortest path to any one of the enemy locations. So instead of supplying a single target location, we'll supply a set of target locations. Then we'll cap the distance to search so we aren't doing a full BFS of the maze every time. This means adding an optional range parameter. Let's use these ideas to make an expanded API that our original function will use.

``````getShortestPathToTargetsWithLimit
:: Maze
-> Location
-> Set.Set Location
-> Maybe Int
-> [Location]
getShortestPathToTargetsWithLimit
maze initialLocation targetLocations maxRange = ...

-- Original function call!
getShortestPath maze initialLocation targetLocation =
getShortestPathToTargetsWithLimit maze initialLocation
(Set.singleton targetLocation) Nothing

bfs
:: Maze
-> Location
-> Set.Set Location -- Now a set of targets
-> Maybe Int -- Added range parameter
-> [Location]
bfs = ...``````

We'll have to make a few tweaks to our algorithm now. Each search state element will have a "distance" associated with it.

``````data BFSState = BFSState
  { bfsSearchQueue :: Seq.Seq (Location, Int)
  ...

-- Our initial state has a distance of 0
getShortestPathToTargetsWithLimit
  maze initialLocation targetLocations maxRange =
    evalState
      (bfs maze initialLocation targetLocations maxRange)
      (BFSState
        (Seq.singleton (initialLocation, 0))
        (Set.singleton initialLocation)
        Map.empty)``````

Now we need a couple modifications to the core `bfs` function. When extracting the next element in the queue, we have to consider its distance. All new items we create will increment that distance. And if we're at the max distance, we won't add anything to the queue. Finally, when evaluating if we're done, we'll check against the set of targets, rather than a single target. Here's our `bfs` code, with differences noted.

``````bfs
  :: Maze
  -> Location
  -> Set.Set Location
  -> Maybe Int
  -> State BFSState [Location]
bfs maze initialLocation targetLocations maxRange = do
  BFSState searchQueue visitedSet parentsMap <- get
  if Seq.null searchQueue
    then return []
    else do

      -- ! Unwrap distance as well
      let (nextLoc, distance) = Seq.index searchQueue 0

      -- ! Check set membership, not equality
      if Set.member nextLoc targetLocations
        then return (unwindPath parentsMap [nextLoc])
        else do

          -- ! Add the new distance to each adjacent cell
          let adjacentCells = (, distance + 1) <$> ...

          -- ! Account for the distance with a new helper function
          let unvisitedNextCells = filter
                (shouldAddNextCell visitedSet) adjacentCells

          let newSearchQueue = foldr
                (flip (Seq.|>))
                (Seq.drop 1 searchQueue)
                unvisitedNextCells
              newVisitedSet = Set.insert nextLoc visitedSet
              newParentsMap = foldr
                (\(l, _) -> Map.insert l nextLoc)
                parentsMap unvisitedNextCells
          put (BFSState newSearchQueue newVisitedSet newParentsMap)
          bfs maze initialLocation targetLocations maxRange
  where
    -- ! Helper function to account for distance when adding to queue
    shouldAddNextCell visitedSet (loc, distance) = case maxRange of
      Nothing -> not (Set.member loc visitedSet)
      Just x -> distance <= x && not (Set.member loc visitedSet)

    unwindPath parentsMap currentPath = ...``````

Now to use this feature, we'll call our new shortest path function. If the distance is 0, this means we have no enemies near us, and there's no penalty. We also won't apply a penalty if our stun is available. Otherwise, we'll provide a stiffer penalty the shorter the path is. Then we mix it in with the other scores.

``````evaluateWorld :: World -> Float
evaluateWorld w =
  ...
  nearestEnemyDistanceScore +
  ...
  where
    ...
    nearestEnemyDistance = length $ getShortestPathToTargetsWithLimit
      (worldBoundaries w)
      playerLoc
      (Set.fromList activeEnemyLocations)
      (Just 4)
    nearestEnemyDistanceScore =
      if nearestEnemyDistance == 0 || stunAvailable then 0.0
        else -100.0 * (fromIntegral (5 - nearestEnemyDistance))``````

We'll also drop the enemy manhattan distance weight to -5.0.

## Results

From this change, our player suddenly appears much more intelligent! It will back away from enemies when it is missing its stun. It will use the stun and go past the enemy when appropriate.

There are still ways we could improve the AI. It doesn't account for future space to retreat when running away. It sometimes uses the stun too early, when it might be better to wait for more enemies to come into range. But it's not clear how we could improve it by tweaking the weights. This means it's time to consider machine learning as an option to get better weights!

## Conclusion

Next week, we'll re-acquaint ourselves with the basics of machine learning and Tensor Flow. This will set us up to write a program that will determine our AI weights.

We're going to start working with Tensor Flow next week! To make sure you can keep up, download our Haskell Tensor Flow Guide. It'll help you with the basics of making this complex Haskell library work.

# Moving Towards ML: Evaluation Functions

Before we get started, here's a reminder that today (August 5th) is the last day of enrollment for our Haskell From Scratch course! Sign-ups close at midnight Pacific time! Don't miss out!

This week, we're going to start taking our AI in a somewhat new direction. Right now, we're hard-coding specific decisions for our player to make. But this week, we'll make a more general function for evaluating different positions. Our initial results will be inferior to the AI we've hand-coded. But we'll set ourselves up to have a much better AI in the future by applying machine learning.

For more details on the code for this article, take a look at the `evaluation-game-function` branch on our Github Repository! This article also starts our move towards machine learning related concepts. So now would be a good time to review our Haskell AI Series. You can download our Tensor Flow Guide to learn more about using Haskell and Tensor Flow!

## Evaluation as a Strategy

Currently, our AI follows a strict set of rules. It performs pretty well for the current problem space. But suppose circumstances changed. Suppose we use different maze structures. Or we could add a completely new feature to the game. In these cases, we might need a completely different set of ideas to build a competent AI.

Our new strategy will be much more general. We'll supply our AI with a function that can evaluate a particular board position. That is, it will look at the world, and create a numeric output scoring it. Then our brain will look at all possible moves, score each position, and choose the move with the best result.

If game rules change, we'll need to rethink the evaluation function. But, by making the problem one of numbers to numbers, it'll be easier to use machine learning (instead of our own logic) to devise this function. This way, we can radically change the nature of the game, and we won't need to do too much manual work to change the AI. We might need to add new features (as we'll discuss later). But otherwise we would just need to re-train the evaluation function.

## Top Down Development

To implement this approach, we'll put the "function" in functional programming. We'll start by outlining our decision making process with a series of type signatures. Let's remember that first, our overarching goal is a function that takes a `World` and gives us a `PlayerMove`:

``makePlayerMove :: World -> PlayerMove``

We should first determine the set of possible moves:

``possibleMoves :: World -> [PlayerMove]``

Then we'll need to calculate the new `World` from each of those moves. (We won't go over this function in this article. It mainly consists of refactoring code we already have for manipulating the game).

``applyPlayerMove :: World -> PlayerMove -> World``

Then we'll score each of those resulting worlds. This is where the real "brain" is going to live now:

``evaluateWorld :: World -> Float``

Now that we know the functions we're writing, we can already implement `makePlayerMove`. We'll assume our helpers already exist and then we apply the process outlined above:

``````makePlayerMove :: World -> PlayerMove
makePlayerMove w = bestMove
  where
    -- 1. Get our Moves
    allMoves = possibleMoves w

    -- 2. See what the results of each move are
    possibleWorlds = applyPlayerMove w <$> allMoves

    -- 3. Score each resulting world
    scores = evaluateWorld <$> possibleWorlds

    -- 4. Combine the world with its move and choose the best one
    movesWithScores = zip allMoves scores
    bestMove = fst $ maximumBy (\(_, score1) (_, score2) ->
      compare score1 score2) movesWithScores``````

This will compile, and we can now move on to the individual components.

## Getting Possible Moves

Let's start with getting all the possible moves. When it comes to movement, we generally have five options: stand still, or move in one of four directions. But if we're out of drills, or near the boundary of the world, this can restrict our options. But we always have the sure option of standing still, so let's start with that:

``````possibleMoves :: World -> [PlayerMove]
possibleMoves w = …
where
standStillMove = PlayerMove DirectionNone False DirectionNone
...``````

Now in every direction, we'll have a `Maybe` move possibility. If it's a `WorldBoundary`, we'll get `Nothing`. Otherwise if it's a wall, then we'll have a possible move as long as a drill is available. Otherwise the move is possible, and we won't need a drill. We'll wrap these behaviors in a helper function, and then it's easy to use that in each direction:

``````possibleMoves :: World -> [PlayerMove]
possibleMoves w = baseMoves
  where
    standStillMove = PlayerMove DirectionNone False DirectionNone
    player = worldPlayer w
    bounds = (worldBoundaries w) Array.! (playerLocation player)

    possibleMove :: (CellBoundaries -> BoundaryType) ->
      MoveDirection -> Maybe PlayerMove
    possibleMove boundaryFunc direction =
      case boundaryFunc bounds of
        WorldBoundary -> Nothing
        Wall _ -> if playerDrillsRemaining player > 0
          then Just $ PlayerMove direction False direction
          else Nothing
        AdjacentCell _ -> Just $
          PlayerMove direction False DirectionNone

    upMove = possibleMove upBoundary DirectionUp
    rightMove = possibleMove rightBoundary DirectionRight
    downMove = possibleMove downBoundary DirectionDown
    leftMove = possibleMove leftBoundary DirectionLeft

    baseMoves = standStillMove : (catMaybes [upMove, rightMove, downMove, leftMove])``````

Now we have to factor in that each move can also apply the stun if it's available.

``````possibleMoves :: World -> [PlayerMove]
possibleMoves w = baseMoves ++ stunMoves
where
...
baseMoves = standStillMove : (catMaybes [upMove, rightMove, downMove, leftMove])

stunMoves = if playerCurrentStunDelay player /= 0 then []
else [ m { activateStun = True } | m <- baseMoves ]``````

And now we've got our moves!

## Evaluating the Game Position

Now let's start tackling the problem of evaluating a particular game situation. Any manual solution we come up with here is likely to have problems. This is where machine learning will come in. But here's the general approach we want.

First, we'll select particular "features" of the world. For instance, how far away are we from the end of the maze? How many enemies are within our stun radius? We'll consider all these elements, and then come up with a "weight" for each feature. A weight is a measurement of whether that feature makes the position "good" or "bad". Then, we'll add together the weighted feature values to get a score. So here's a list of the features we're going to use:

1. How close are we (in maze search terms) from the target location? This will use pure BFS and it will not account for using drills.
2. How close are we in manhattan distance terms from the target location?
3. Is there an active enemy on the same square as the player (this will receive a heavy negative weight!)
4. How many enemies are within our stun radius?
5. Is our stun available?
6. How many drills do we have left?

Let's start by getting all these features:

``````evaluateWorld :: World -> Float
evaluateWorld w = ...
  where
    player = worldPlayer w
    playerLoc@(px, py) = playerLocation player
    radius = stunRadius . playerGameParameters . worldParameters $ w
    goalLoc@(gx, gy) = endLocation w
    activeEnemyLocations = enemyLocation <$>
      (filter (\e -> enemyCurrentStunTimer e == 0) (worldEnemies w))

    onActiveEnemy = playerLocation player `elem` activeEnemyLocations

    shortestPathLength = length $
      getShortestPath (worldBoundaries w) playerLoc goalLoc

    manhattanDistance = abs (gx - px) + abs (gy - py)

    stunAvailable = playerCurrentStunDelay player == 0

    numNearbyEnemies = length
      [ el | el@(elx, ely) <- activeEnemyLocations,
        abs (elx - px) <= radius && abs (ely - py) <= radius ]

    drillsRemaining = playerDrillsRemaining player``````

Now let's move on to assigning scores. If our player is on the same square as an active enemy, we lose. So let's give this a weight of -1000. Conversely, the closer we get to the target, the closer we are to winning. So let's devise a function where if that distance is 0, the score is 1000. Then the farther away we get, the more points we lose. Let's say, 20 points per square. For manhattan distance, we'll use a strict penalty, rather than reward:

``````evaluateWorld :: World -> Float
evaluateWorld w = ...
where
...
onActiveEnemyScore = if onActiveEnemy then -1000.0 else 0.0
shortestPathScore = 1000.0 - (20.0 * (fromIntegral shortestPathLength))
manhattanDistanceScore = (-5.0) * (fromIntegral manhattanDistance)``````

Now we want to generally reward having our power ups available to us. This will stop the bot from needlessly using them and also reward it for picking up new drills. We'll also penalize having enemies too close to us.

``````evaluateWorld :: World -> Float
evaluateWorld w = ...
where
...
stunAvailableScore = if stunAvailable then 80.0 else 0.0
numNearbyEnemiesScore = -100.0 * (fromIntegral numNearbyEnemies)
drillsRemainingScore = 30.0 * (fromIntegral drillsRemaining)``````

And to complete the function, we'll just add these together:

``````evaluateWorld :: World -> Float
evaluateWorld w =
onActiveEnemyScore +
shortestPathScore +
manhattanDistanceScore +
stunAvailableScore +
numNearbyEnemiesScore +
drillsRemainingScore``````

## How Well Does it Work?

When we run the game now with the AI active, we see some interesting behaviors. Our bot will generally navigate the maze well. Its path isn't optimal, as it is with `drillBFS`. But it makes decent choices about drilling. Its behavior around enemies is a bit strange. It tends to stay away from them, even if they're not actually close in maze distance. This makes it take longer than it needs to.

We still don't have good retreating behavior in certain cases. It will often stand still and let an enemy grab it instead of running away.

At this point, we have a couple options for improving the AI. First, we could try tweaking the weights. This will be tedious for us to do manually. This is why we want to apply machine learning techniques to come up with optimal weights.

But the other option is to update the feature space. If we can come up with more intelligent features, we won't need as precise weights.

## Conclusion

Next week, we'll try to fix our behavior around enemies. We'll use true maze distance in more places as opposed to manhattan distance. This should give us some big improvements. Then we'll start looking into how we can learn better weights.

We'll be coming up pretty soon on using Tensor Flow for this program! Download our Haskell Tensor Flow Guide to learn more!

And if you're still a Haskell beginner, there's never been a better time to learn! Register for our Haskell From Scratch course to jump-start your Haskell journey! Enrollment ends at midnight TODAY! (August 5th).

# Grenade! Dependently Typed Neural Networks

In the last couple weeks we explored one of the most complex topics I’ve presented on this blog. We examined potential runtime failures that can occur when using Tensor Flow. These included mismatched dimensions and missing placeholders. In an ideal world, we would catch these issues at compile time instead. At its current stage, the Haskell Tensor Flow library doesn’t support that. But we demonstrated that it is possible to add a layer to do this by using dependent types.

Now, I’m still very much of a novice at dependent types, so the solutions I presented were rather clunky. This week I'll show a better example of this concept from a different library. The Grenade library uses dependent types everywhere. It allows us to build verifiably-valid neural networks with extreme concision. So let’s dive in and see what it’s all about!

## Shapes and Layers

The first thing to learn with this library is the two concepts of Shapes and Layers. Shapes are best compared to tensors from Tensor Flow, except that they exist at the type level. In Tensor Flow we could build tensors with arbitrary dimensions. Grenade currently only supports up to three dimensions. So the different shape types start with `D1`, `D2`, or `D3`, depending on the dimensionality of the shape. Then each of these type constructors takes a set of natural number parameters. So the following are all valid “Shape” types within Grenade:

``````D1 5
D2 4 12
D3 8 10 2``````

The first represents a vector with 5 elements. The second represents a matrix with 4 rows and 12 columns. And the third represents an 8x10x2 matrix (or tensor, if you like). The different numbers represent those values at the type level, not the term level. If this seems confusing, here’s a good tutorial that goes into more depth about the basics of dependent types. The most important idea is that something of type `D1 5` can only have 5 elements. A vector of 4 or 6 elements will not type-check.

So now that we know about shapes, let’s examine layers. Layers describe relationships between our shapes. They encapsulate the transformations that happen on our data. The following are all valid layer types:

``````Relu
FullyConnected 10 20
Convolution 1 10 5 5 1 1``````

The layer `Relu` describes a layer that takes in data of any kind of shape and outputs the same shape. In between, it applies the `relu` activation function to the input data. Since it doesn’t change the shape, it doesn’t need any parameters.

A `FullyConnected` layer represents the canonical layer of a neural network. It has two parameters, one for the number of input neurons and one for the number of output neurons. In this case, the layer will take 10 inputs and produce 20 outputs.

A `Convolution` layer represents a 2D convolution like we saw with our MNIST network. This particular example has 1 input feature, 10 output features, uses a 5x5 patch size, and a 1x1 patch offset.

## Describing a Network

Now that we have a basic grasp on shapes and layers, we can see how they fit together to create a full network. A network type has two type parameters. The second parameter is a list of the shapes that our data takes at any given point throughout the network. The first parameter is a list of the layers representing the transformations on the data. So let’s say we wanted to describe a very simple network. It will take 4 inputs and produce 10 outputs using a fully connected layer. Then it will perform an `Relu` activation. This network looks like this:

``````type SimpleNetwork = Network
  '[FullyConnected 4 10, Relu]
  '[ 'D1 4, 'D1 10, 'D1 10]``````

The apostrophes in front of the lists and `D1` terms indicate that these are promoted constructors. So they are types instead of terms. To “read” this type, we start with the first data format. We go to each successive data format by applying the transformation layer. So for instance we start with a 4-vector, and transform it into a 10-vector with a fully-connected layer. Then we transform that 10-vector into another 10-vector by applying `relu`. That’s all there is to it! We could apply another `FullyConnected` layer onto this that will have 3 outputs like so:

``````type SimpleNetwork = Network
  '[FullyConnected 4 10, Relu, FullyConnected 10 3]
  '[ 'D1 4, 'D1 10, 'D1 10, 'D1 3]``````

Let's look at MNIST to see a more complicated example. We'll start with a 28x28 image of data. Then we’ll perform the convolution layer I mentioned above. This gives us a 3-dimensional tensor of size 24x24x10. Then we can perform 2x2 max pooling on this, resulting in a 12x12x10 tensor. Finally, we can apply an `Relu` layer, which keeps it at the same size:

``````type MNISTStart = Network
  '[Convolution 1 10 5 5 1 1, Pooling 2 2 2 2, Relu]
  '[ 'D2 28 28, 'D3 24 24 10, 'D3 12 12 10, 'D3 12 12 10]``````

Here’s what a full MNIST example might look like (per the README on the library’s Github page):

``````type MNIST = Network
'[ Convolution 1 10 5 5 1 1, Pooling 2 2 2 2, Relu
, Convolution 10 16 5 5 1 1, Pooling 2 2 2 2, FlattenLayer, Relu
, FullyConnected 256 80, Logit, FullyConnected 80 10, Logit]
'[ 'D2 28 28, 'D3 24 24 10, 'D3 12 12 10, 'D3 12 12 10
, 'D3 8 8 16, 'D3 4 4 16, 'D1 256, 'D1 256
, 'D1 80, 'D1 80, 'D1 10, 'D1 10]``````

This is a much simpler and more concise description of our network than we can get in Tensor Flow! Let’s examine the ways in which the library uses dependent types to its advantage.

## The Magic of Dependent Types

Describing our network as a type seems like a strange idea if you’ve never used dependent types before. But it gives us a couple great perks!

The first major win we get is that it is very easy to generate the starting values of our network. Since it has a specific type, we can let type inference guide us! We don’t need any term level code that is specific to the shape of our network. All we need to do is attach the type signature and call `randomNetwork`!

``````randomSimple :: MonadRandom m => m SimpleNetwork
randomSimple = randomNetwork``````

This will give us all the initial values we need, so we can get going!

The second (and more important) win is that we can’t build an invalid network! Suppose we try to take our simple network and somehow format it incorrectly. For instance, we could say that instead of the input shape being of size 4, it’s of size 7:

``````type SimpleNetwork = Network
  '[FullyConnected 4 10, Relu, FullyConnected 10 3]
  '[ 'D1 7, 'D1 10, 'D1 10, 'D1 3]
  -- ^^ Notice this 7``````

This will result in a compile error, since there is a mismatch between the layers. The first layer expects an input of 4, but the first data format is of length 7!

``````Could not deduce (Layer (FullyConnected 4 10) ('D1 7) ('D1 10))
    arising from a use of ‘randomNetwork’
  from the context: MonadRandom m
    bound by the type signature for:
      randomSimple :: MonadRandom m => m SimpleNetwork``````

In other words, it notices that the chain from `D1 7` to `D1 10` using a `FullyConnected 4 10` layer is invalid. So it doesn’t let us make this network. The same thing would happen if we made the layers themselves invalid. For instance, we could make the output and input of the two fully-connected layers not match up:

``````-- We changed the second to take 20 as the number of input elements.
type SimpleNetwork = Network
  '[FullyConnected 4 10, Relu, FullyConnected 20 3]
  '[ 'D1 4, 'D1 10, 'D1 20, 'D1 3]

…

• Could not deduce (Layer (FullyConnected 20 3) ('D1 10) ('D1 3))
    arising from a use of ‘randomNetwork’
  from the context: MonadRandom m
    bound by the type signature for:
      randomSimple :: MonadRandom m => m SimpleNetwork``````

So Grenade makes our program much safer by providing compile time guarantees about our network's validity. Runtime errors due to dimensionality are impossible!

## Training the Network on Iris

Now let’s do a quick run-through of how we actually train this neural network. Readers with a keen eye may have noticed that the `SimpleNetwork` we’ve built is the same network we used to train the Iris data set. So we’ll do a training run there, using the following steps:

1. Write the network type and generate a random network from it
2. Read our input data into a format that Grenade uses
3. Write a function to run a training iteration.
4. Run it!

### 1. Write the Network type and Generate Network

So we've already done this first step for the most part. We’ll adjust the names a little bit though. Note that I’ll include the imports list as an appendix to the post. Also, the code is on the `grenade` branch of my Haskell Tensor Flow repository in `IrisGrenade.hs`!

``````type IrisNetwork = Network
  '[FullyConnected 4 10, Relu, FullyConnected 10 3]
  '[ 'D1 4, 'D1 10, 'D1 10, 'D1 3]

randomIris :: MonadRandom m => m IrisNetwork
randomIris = randomNetwork

runIris :: FilePath -> FilePath -> IO ()
runIris trainingFile testingFile = do
  initialNetwork <- randomIris
  ...``````

### 2. Take in our Input Data

We’ll make use of the `readIrisFromFile` function we used back when we first worked with the Iris data set. Then we'll define an `IrisRow` type using Grenade's `S` type. This `S` type is a container for a shape. We want our input data to use `D1 4` for the 4 input features. Then our output data should use `D1 3` for the three possible categories.

``````-- Dependent type on the dimensions of the row
type IrisRow = (S ('D1 4), S ('D1 3))``````

If the data is malformed, the types won't match up, so our parsing function returns a `Maybe` value to handle failure. Note that we normalize the data by dividing by 8. This puts all the data between 0 and 1 and makes for better training results. Here's how we parse the data:

``````parseRecord :: IrisRecord -> Maybe IrisRow
parseRecord record = case (input, output) of
  (Just i, Just o) -> Just (i, o)
  _ -> Nothing
  where
    input = fromStorable $ VS.fromList $ float2Double <$>
      [ field1 record / 8.0, field2 record / 8.0, field3 record / 8.0, field4 record / 8.0]
    output = oneHot (fromIntegral $ label record)``````

Then we incorporate these into our main function:

``````runIris :: FilePath -> FilePath -> IO ()
runIris trainingFile testingFile = do
  initialNetwork <- randomIris
  trainingRecords <- readIrisFromFile trainingFile
  testRecords <- readIrisFromFile testingFile

  let trainingData = mapMaybe parseRecord (V.toList trainingRecords)
  let testData = mapMaybe parseRecord (V.toList testRecords)

  -- Catch if any were parsed as Nothing
  if length trainingData /= length trainingRecords || length testData /= length testRecords
    then putStrLn "Hmmm there were some problems parsing the data"
    else …``````

### 3. Write a Function to Train the Input Data

This is a multi-step process. First we’ll establish our learning parameters. We'll also write a function that will allow us to call the `train` function on a particular row element:

``````learningParams :: LearningParameters
learningParams = LearningParameters 0.01 0.9 0.0005 -- rate, momentum, regularizer

-- Train the network!
trainRow :: LearningParameters -> IrisNetwork -> IrisRow -> IrisNetwork
trainRow lp network (input, output) = train lp network input output``````

Next we’ll write two more helper functions that will help us test our results. The first will take the network and a test row, and produce the network's predicted output alongside the true output. The second function will take these outputs and reverse the `oneHot` process to get the labels out (0, 1, or 2).

``````-- Takes a test row; returns the network's prediction and the true output.
testRow :: IrisNetwork -> IrisRow -> (S ('D1 3), S ('D1 3))
testRow net (rowInput, trueOutput) = (runNet net rowInput, trueOutput)

-- Goes from probability output vectors to labels
getLabels :: (S ('D1 3), S ('D1 3)) -> (Int, Int)
getLabels (S1D predictedOutput, S1D trueOutput) =
  (maxIndex (extract predictedOutput), maxIndex (extract trueOutput))``````

Finally we’ll write a function that will take our training data, test data, the network, and an iteration number. It will return the newly trained network and log some results about how we’re doing. We’ll first take a random sample of our training data and adjust our learning rate so that it decays as the iterations progress. Then we'll train the network by folding over the sampled data.

``````run :: [IrisRow] -> [IrisRow] -> IrisNetwork -> Int -> IO IrisNetwork
run trainData testData network iterationNum = do
  sampledRecords <- V.toList <$> chooseRandomRecords (V.fromList trainData)
  -- Slowly drop the learning rate
  let revisedParams = learningParams
        { learningRate = learningRate learningParams * 0.99 ^ iterationNum }
  let newNetwork = foldl' (trainRow revisedParams) network sampledRecords
  ...``````

Then we’ll wrap up the function by looking at our test data, and seeing how much we got right!

``````run :: [IrisRow] -> [IrisRow] -> IrisNetwork -> Int -> IO IrisNetwork
run trainData testData network iterationNum = do
  sampledRecords <- V.toList <$> chooseRandomRecords (V.fromList trainData)
  -- Slowly drop the learning rate
  let revisedParams = learningParams
        { learningRate = learningRate learningParams * 0.99 ^ iterationNum }
  let newNetwork = foldl' (trainRow revisedParams) network sampledRecords
  let labelVectors = fmap (testRow newNetwork) testData
  let labelValues = fmap getLabels labelVectors
  let total = length labelValues
  let correctEntries = length $ filter ((==) <$> fst <*> snd) labelValues
  putStrLn $ "Iteration: " ++ show iterationNum
  putStrLn $ show correctEntries ++ " correct out of: " ++ show total
  return newNetwork``````

### 4. Run it!

We’ll call this now from our main function, iterating 100 times, and we’re done!

``````runIris :: FilePath -> FilePath -> IO ()
runIris trainingFile testingFile = do
  ...
  if length trainingData /= length trainingRecords || length testData /= length testRecords
    then putStrLn "Hmmm there were some problems parsing the data"
    else foldM_ (run trainingData testData) initialNetwork [1..100]``````

## Comparing to Tensor Flow

So now that we’ve looked at a different library, we can consider how it stacks up against Tensor Flow. So first, the advantages. Grenade's main advantage is that it provides dependent type facilities. This means it is more difficult to write incorrect programs. The basic networks you build are guaranteed to have the correct dimensionality. Additionally, it does not use a “placeholders” system, so you can avoid those kinds of errors too. This means you're likely to have fewer runtime bugs using Grenade.

Concision is another major strong point. The training code got a bit involved when translating our data into Grenade's format. But it’s no more complicated than Tensor Flow. When it comes down to the exact definition of the network itself, we do this in only a few lines with Grenade. It’s complicated to understand what those lines mean if you are new to dependent types. But after seeing a few simple examples you should be able to follow the general pattern.

Of course, none of this means that Tensor Flow is without its advantages. As we saw a couple weeks ago, it is not too difficult to add very thorough logging to your Tensor Flow program. The Tensor Board application will then give you excellent visualizations of this data. It is somewhat more difficult to get intermediate log results with Grenade. There is not too much transparency (that I have found at least) into the inner values of the network. The network types are composable though. So it is possible to get intermediate steps of your operation. But if you break your network into different types and stitch them together, you will remove some of the concision of the network.

Tensor Flow also has a much richer ecosystem of machine learning tools to access. Grenade is still limited to a subset of the most common machine learning layers, like convolution and max pooling. Tensor Flow’s API supports approaches like support vector machines and linear models. So Tensor Flow offers you more options.

One question I may explore in a future article is how the two libraries compare in performance. My suspicion is that Tensor Flow is faster, since it pushes all its math down to the C level. But I’m not too familiar yet with HMatrix (which Grenade depends on for its math) and its efficiency. So I could definitely be wrong.

## Conclusion

Grenade provides some truly awesome facilities for building a concise neural network. A Grenade program can demonstrate at compile time that the network is well formed. It also allows an incredibly concise way to define what layers your neural network has. It doesn’t have the Google-level support that Tensor Flow does, so it lacks cool features like logging and visualizations. But it is quite a neat library for its scope. One thing I haven’t mentioned is its mechanics for Generative/Adversarial networks. I’d definitely like to try that out soon!

Grenade is a simpler library to incorporate into Stack compared to Tensor Flow. If you want to compare the two, you should check out our Haskell Tensor Flow guide so you can install TF and get started!

If you’ve never written a line of Haskell before, never fear! Download our Getting Started Checklist for some free resources to start your Haskell education!

## Appendix: Compiler Extensions and Imports

``````{-# LANGUAGE DataKinds #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE TupleSections #-}
{-# LANGUAGE GADTs #-}

import           Data.Foldable (foldl')
import           Data.Maybe (mapMaybe)
import qualified Data.Vector.Storable as VS
import qualified Data.Vector as V
import           GHC.Float (float2Double)
import           Numeric.LinearAlgebra (maxIndex)
import           Numeric.LinearAlgebra.Static (extract)

import           Processing (IrisRecord(..), readIrisFromFile, chooseRandomRecords)``````

# Checking it's all in Place: Placeholders and Dependent Types

Last week we dove into the world of dependent types. We linked tensors with their shapes at the type level. This gave our program some extra type safety and allowed us to avoid certain runtime errors.

This week, we’re going to solve another runtime conundrum: missing placeholders. We’ll add some more dependent type machinery to ensure we've plugged in all the necessary placeholders! But we’ll see this is not as straightforward as shapes.

Now to start, let’s remind ourselves what placeholders are in Tensor Flow and how we use them.

## Placeholder Review

Placeholders represent tensors that can have different values on different application runs. This is often the case when we’re training on different samples of data. Here’s our very simple example in Python. We’ll create a couple placeholder tensors by providing their shapes and no values. Then when we actually run the session, we’ll provide a value for each of those tensors.

``````node1 = tf.placeholder(tf.float32)
node2 = tf.placeholder(tf.float32)
adderNode = tf.add(node1, node2)
sess = tf.Session()
result1 = sess.run(adderNode, {node1: 3, node2: 4.5 })``````

The weakness here is that there’s nothing forcing us to provide values for those tensors! We could try running our program without them and we’ll get a runtime crash:

``````...
sess = tf.Session()
print(result1)
…

Terminal Output:

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder' with dtype float
[[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]``````

Unfortunately, the Haskell Tensor Flow library doesn’t actually do any better here. When we want to fill in placeholders, we provide a list of “feeds”. But our program will still compile even if we pass an empty list! We’ll encounter similar runtime errors:

``````(node1 :: Tensor Value Float) <- placeholder [1]
(node2 :: Tensor Value Float) <- placeholder [1]
let adderNode = node1 `add` node2
let runStep = \node1Feed node2Feed -> runWithFeeds [] adderNode
runStep (encodeTensorData [1] input1) (encodeTensorData [1] input2)
…

Terminal Output:

TensorFlowException TF_INVALID_ARGUMENT "You must feed a value for placeholder tensor 'Placeholder_1' with dtype float and shape \n\t [[Node: Placeholder_1 = Placeholder[dtype=DT_FLOAT, shape=, _device=\"/job:localhost/replica:0/task:0/cpu:0\"]()]]"``````

In the Iris and MNIST examples, we bury the call to `runWithFeeds` within our neural network API. We expose only a `Model` object. This model object forces us to provide the expected input and output tensors. So anyone using our model never makes a manual `runWithFeeds` call.

``````data Model = Model
  { train :: TensorData Float
          -> TensorData Int64
          -> Session ()
  , errorRate :: TensorData Float
              -> TensorData Int64
              -> SummaryTensor
              -> Session (Float, ByteString)
  }``````

This isn’t a bad solution! But it’s interesting to see how we can push the envelope with dependent types, so let’s try that!

## Adding More “Safe” Types

The first step we’ll take is to augment Tensor Flow’s `TensorData` type. We’ll want it to have shape information like `SafeTensor` and `SafeShape`. But we’ll also attach a name to each piece of data. This will allow us to identify which tensor to substitute the data in for. At the type level, we refer to this name as a `Symbol`.

``````data SafeTensorData a (n :: Symbol) (s :: [Nat]) where
  SafeTensorData :: (TensorType a) => TensorData a -> SafeTensorData a n s``````

Next, we’ll need to make some changes to our `SafeTensor` type. First, each `SafeTensor` will get a new type parameter. This parameter refers to a mapping of names (symbols) to shapes (which are still lists of naturals). We'll call this a placeholder list. So each tensor will have type-level information for the placeholders it depends on. Each different placeholder has a name and a shape.

``````data SafeTensor v a (s :: [Nat]) (p :: [(Symbol, [Nat])]) where
  SafeTensor :: (TensorType a) => Tensor v a -> SafeTensor v a s p``````

Now, recall that when we substituted for placeholders, we used a list of feeds. But this list had no information about the names or dimensions of its feeds. Let's create a new type containing the different elements we need for our feeds. It should also contain the correct type information about the placeholder list. The first step is to define the type so that it carries the list of placeholders it contains, like the `SafeTensor`.

``data FeedList (pl :: [(Symbol, [Nat])]) where``

This structure will look like a linked list, like our `SafeShape`. Thus we’ll start by defining an “empty” constructor:

``````data FeedList (pl :: [(Symbol, [Nat])]) where
  EmptyFeedList :: FeedList '[]``````

Now we’ll add a “Cons”-like constructor by creating yet another type operator `:--:`. Each “piece” of our linked list will contain two different items. First, the tensor we are substituting for. Next, it will have the data we’ll be using for the substitution. We can use type parameters to force their shapes and data types to match. Then we need the resulting placeholder type. We have to append the type-tuple containing the symbol and shape to the previous list. This completes our definition.

``````data FeedList (pl :: [(Symbol, [Nat])]) where
  EmptyFeedList :: FeedList '[]
  (:--:) :: (KnownSymbol n)
    => (SafeTensor Value a s p, SafeTensorData a n s)
    -> FeedList pl
    -> FeedList ( '(n, s) ': pl)

infixr 5 :--:``````

Note that we force the tensor to be a `Value` tensor. We can only substitute data for rendered tensors, hence this restriction. Let's add a quick `safeRender` so we can render our `SafeTensor` items.

``````safeRender :: (MonadBuild m) => SafeTensor Build a s pl -> m (SafeTensor Value a s pl)
safeRender (SafeTensor t1) = do
  t2 <- render t1
  return $ SafeTensor t2``````

## Making a Placeholder

Now we can write our `safePlaceholder` function. We’ll add a `KnownSymbol` as a type constraint. Then we’ll take a `SafeShape` to give ourselves the type information for the shape. The result is a new tensor that maps the symbol and the shape in the placeholder list.

``````safePlaceholder :: (MonadBuild m, TensorType a, KnownSymbol sym) =>
  SafeShape s -> m (SafeTensor Value a s '[ '(sym, s)])
safePlaceholder shp = do
  pl <- placeholder (toShape shp)
  return $ SafeTensor pl``````

This looks a little crazy, and it kind of is! But we’ve now created a tensor that stores its own placeholder information at the type level!
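For example, here’s a small sketch (the `"inputs"` label and the shape are arbitrary choices for illustration). The type annotation is what pins down the symbol:

``````-- The annotation chooses the "inputs" label; without it, GHC
-- has no way to infer which Symbol we mean.
inputPlaceholder :: Build (SafeTensor Value Float '[4] '[ '("inputs", '[4])])
inputPlaceholder = safePlaceholder (fromJust $ fromShape (Shape [4]))``````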

## Updating Old Code

Now that we’ve done this, we’re also going to have to update some of our older code. The first part of this is pretty straightforward. We’ll need to change `safeConstant` so that it has the type information. It will have an empty list for the placeholders.

``````safeConstant :: (TensorType a, ShapeProduct s ~ n) =>
  Vector n a -> SafeShape s -> SafeTensor Build a s '[]
safeConstant elems shp = SafeTensor (constant (toShape shp) (toList elems))``````

Our mathematical operations will be a bit more tricky though. Consider adding two arbitrary tensors. They may share placeholder dependencies, or they may not. What should the placeholder type of the resulting tensor be? The union of the two input placeholder maps, of course! Luckily for us, we can use `Union` from the `type-list` library to represent this concept.

``````safeAdd :: (TensorType a, a /= Bool, TensorKind v)
  => SafeTensor v a s p1
  -> SafeTensor v a s p2
  -> SafeTensor Build a s (Union p1 p2)
safeAdd (SafeTensor t1) (SafeTensor t2) = SafeTensor (t1 `add` t2)``````

We’ll make the same update with matrix multiplication:

``````safeMatMul :: (TensorType a, a /= Bool, a /= Int8, a /= Int16,
  a /= Int64, a /= Word8, a /= ByteString, TensorKind v)
  => SafeTensor v a '[i,n] p1 -> SafeTensor v a '[n,o] p2 -> SafeTensor Build a '[i,o] (Union p1 p2)
safeMatMul (SafeTensor t1) (SafeTensor t2) = SafeTensor (t1 `matMul` t2)``````

## Running with Placeholders

Now we have all the information we need to write our `safeRun` function. This will take a `SafeTensor`, and it will also take a `FeedList` with the same placeholder type. Remember, a `FeedList` contains a series of `SafeTensorData` items. They must match up symbol-for-symbol and shape-for-shape with the placeholders within the `SafeTensor`. Let’s look at the type signature:

``````safeRun :: (TensorType a, Fetchable (Tensor v a) r) =>
  FeedList pl -> SafeTensor v a s pl -> Session r``````

The `Fetchable` constraint enforces that we can actually get the “result” `r` out of our tensor. For instance, we can "fetch" a vector of floats out of a tensor that uses `Float` as its underlying value.
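As a small sketch of that idea (`fetchFloats` is just an illustrative name, using the imports from the appendix):

``````-- Running a Float tensor "fetches" a plain vector of Floats.
fetchFloats :: Tensor Build Float -> Session (VN.Vector Float)
fetchFloats t = run t``````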

We’ll next define a tail-recursive helper function to build the vanilla “list of feeds” out of our `FeedList`. Through pattern matching, we can pick out the tensor to substitute for and the data we’re using. We can combine these into a feed and append to the growing list:

``````safeRun = ...
  where
    buildFeedList :: FeedList ss -> [Feed] -> [Feed]
    buildFeedList EmptyFeedList accum = accum
    buildFeedList ((SafeTensor tensor_, SafeTensorData data_) :--: rest) accum =
      buildFeedList rest ((feed tensor_ data_) : accum)``````

Now all we have to do to finish up is call the normal `runWithFeeds` function with the list we’ve created!

``````safeRun :: (TensorType a, Fetchable (Tensor v a) r) =>
  FeedList pl -> SafeTensor v a s pl -> Session r
safeRun feeds (SafeTensor finalTensor) = runWithFeeds (buildFeedList feeds []) finalTensor
  where
    ...``````

And here’s what it looks like to use this in practice with our simple example. Notice the type signatures do get a little cumbersome. The signatures we place on the initial placeholder tensors are necessary. Otherwise the compiler wouldn't know what label we're giving them! The signature containing the union of the types is unnecessary. We can remove it if we want and let type inference do its work.

``````main3 :: IO (VN.Vector Float)
main3 = runSession $ do
  let (shape1 :: SafeShape '[2,2]) = fromJust $ fromShape (Shape [2,2])
  (a :: SafeTensor Value Float '[2,2] '[ '("a", '[2,2])]) <- safePlaceholder shape1
  (b :: SafeTensor Value Float '[2,2] '[ '("b", '[2,2])]) <- safePlaceholder shape1
  let result = a `safeAdd` b
  (result_ :: SafeTensor Value Float '[2,2] '[ '("b", '[2,2]), '("a", '[2,2])]) <- safeRender result
  let (feedA :: Vector 4 Float) = fromJust $ fromList [1,2,3,4]
  let (feedB :: Vector 4 Float) = fromJust $ fromList [5,6,7,8]
  let fullFeedList = (b, safeEncodeTensorData shape1 feedB) :--:
                     (a, safeEncodeTensorData shape1 feedA) :--:
                     EmptyFeedList
  safeRun fullFeedList result_

{- It runs!
[6.0,8.0,10.0,12.0]
-}``````

Now suppose we make some mistakes with our types. Here we’ll take out the “A” feed from our feed list:

``````-- Let’s take out Feed A!
main = …
  let fullFeedList = (b, safeEncodeTensorData shape1 feedB) :--:
                     EmptyFeedList
  safeRun fullFeedList result_

{- Compiler Error!
• Couldn't match type ‘'['("a", '[2, 2])]’ with ‘'[]’
  Expected type: SafeTensor Value Float '[2, 2] '['("b", '[2, 2])]
    Actual type: SafeTensor
                   Value Float '[2, 2] '['("b", '[2, 2]), '("a", '[2, 2])]
-}``````

Here’s what happens when we try to substitute a vector with the wrong size. It will identify that we have the wrong number of elements!

``````main = …
  -- Wrong Size!
  let (feedA :: Vector 8 Float) = fromJust $ fromList [1,2,3,4,5,6,7,8]
  let (feedB :: Vector 4 Float) = fromJust $ fromList [5,6,7,8]
  let fullFeedList = (b, safeEncodeTensorData shape1 feedB) :--:
                     (a, safeEncodeTensorData shape1 feedA) :--:
                     EmptyFeedList
  safeRun fullFeedList result_

{- Compiler Error!
Couldn't match type ‘4’ with ‘8’
  arising from a use of ‘safeEncodeTensorData’
-}``````

## Conclusion: Pros and Cons

So let’s take a step back and look at what we’ve constructed here. We’ve managed to provide ourselves with some pretty cool compile-time guarantees. We’ve also added de facto documentation to our code. Anyone familiar with the codebase can tell at a glance what placeholders we need for each tensor. It’s a lot harder now to write incorrect code. There are still error conditions of course. But if we’re smart we can handle these all upfront. That way we can fail gracefully instead of throwing a random runtime crash somewhere.

But there are drawbacks. Imagine being a Haskell novice and walking into this codebase. You’ll have no real clue what’s going on (I wouldn’t have 2 months ago). The types get very cumbersome after a while, so continuing to write them out gets tedious. As I mentioned, type inference can deal with a lot of that. But if you don’t track the types yourself, the type union can be finicky about the ordering of your placeholders. We could fix this with another type family though.

All these factors could present a real drag on development. But then again, tracking down run-time errors can also do this. Tensor Flow’s error messages can still be a little cryptic. This can make it hard to find root causes.

Since I’m still a novice with dependent types, this code was a little messy. Next week we’ll take a look at a more polished library that uses dependent types for neural networks. We’ll see how the Grenade library allows us to specify a learning system in just a few lines of code!

If you’re new to Haskell, I hope none of this dependent type madness scared you! The language is much easier than these last couple posts make it seem! Try it out, and download our Getting Started Checklist. It'll give you some instructions and tools to help you learn!

If you’re an experienced Haskeller and want to try out Tensor Flow, download our Tensor Flow Guide! It will walk you through incorporating the library into a Stack project!

## Appendix: Compiler Extensions and Imports

``````{-# LANGUAGE GADTs                #-}
{-# LANGUAGE DataKinds            #-}
{-# LANGUAGE KindSignatures       #-}
{-# LANGUAGE TypeOperators        #-}
{-# LANGUAGE ScopedTypeVariables  #-}
{-# LANGUAGE TypeFamilies         #-}
{-# LANGUAGE FlexibleContexts     #-}
{-# LANGUAGE UndecidableInstances #-}

import           Data.ByteString (ByteString)
import           Data.Int (Int64, Int8, Int16)
import           Data.Maybe (fromJust)
import           Data.Proxy (Proxy(..))
import           Data.Type.List (Union)
import qualified Data.Vector as VN
import           Data.Vector.Sized (Vector, toList, fromList)
import           Data.Word (Word8)
import           GHC.TypeLits

import           TensorFlow.Core
import           TensorFlow.Ops (constant, add, matMul, placeholder)
import           TensorFlow.Session (runSession, run)
import           TensorFlow.Tensor (TensorKind)``````

# Deep Learning and Deep Types: Tensor Flow and Dependent Types

In the introduction to this series, one primary point I made was that Haskell is a safe language. There are a lot of errors we will catch at compile time, rather than runtime. Runtime errors can often be catastrophic to a system, so being able to reduce these is paramount. This is especially true when programming an autonomous car or drone. These objects will be out in the real world where they can hurt people if they malfunction.

So let’s take a look back at some of the code we’ve written over the last 3 or 4 weeks. Is it actually any safer? We’ll find the answer is, well, not so much. It's hard to verify certain properties about code. But the facilities for making this code safer do exist in Haskell! In the next couple articles we'll do some serious hacking with dependent types. We'll be able to prove some of these difficult properties of AI programs at compile time!

The next three articles will focus on dependent type programming. This is a difficult topic, so don’t worry if you can’t follow all the code examples completely. The main idea of making our machine learning code safer is what’s important! So without further ado, let’s rewind to the beginning to see where runtime issues can appear.

If you want to play with this code yourself, check out the dependent shapes branch on my Github repository! All the code for this article is in DepShape.hs. If you want to get the code to run, you'll probably also need to get Haskell Tensor Flow working. Download our Haskell Tensor Flow Guide for instructions on that!

## Issues with Python

Python, as an interpreted language, is definitely subject to runtime bugs. As I was first learning Tensor Flow, I came across a lot of these. The two that stood out to me most were placeholder failures and dimension mismatches. For instance, let’s think back to one of the first examples. Our code will have a couple of placeholders, and we submit values for those when we run the session:

``````node1 = tf.placeholder(tf.float32)
node2 = tf.placeholder(tf.float32)
adderNode = tf.add(node1, node2)
sess = tf.Session()
result1 = sess.run(adderNode, {node1: 3, node2: 4.5 })``````

But there’s nothing stopping us from trying to run the session without submitting values. This will result in a runtime crash:

``````...
sess = tf.Session()
print(result1)
…

Terminal Output:

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder' with dtype float
[[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]``````

Another issue that came up from time to time was dimension mismatches. Certain operations need certain relationships between the dimensions of the tensors. For instance, you can’t add two vectors with different lengths:

``````node1 = tf.constant([3.0, 4.0, 5.0], dtype=tf.float32)
node2 = tf.constant([4.0, 16.0], dtype=tf.float32)
result = tf.add(node1, node2)

sess = tf.Session()
print(result)

…

Terminal Output:

ValueError: Dimensions must be equal, but are 3 and 2 for 'Add' (op: 'Add') with input shapes: [3], [2].``````

Again, we get a runtime crash. These seem like the kinds of problems we can solve at compile time.

## Does Haskell Solve these Issues?

But anyone who takes a close look at the Haskell code I’ve written so far can see that it doesn’t solve these issues! Here’s a review of our basic placeholder example:

``````runPlaceholder :: Vector Float -> Vector Float -> IO (Vector Float)
runPlaceholder input1 input2 = runSession $ do
  (node1 :: Tensor Value Float) <- placeholder [1]
  (node2 :: Tensor Value Float) <- placeholder [1]
  let adderNode = node1 `add` node2
  let runStep = \node1Feed node2Feed -> runWithFeeds
        [ feed node1 node1Feed
        , feed node2 node2Feed
        ]
        adderNode
  runStep (encodeTensorData [1] input1) (encodeTensorData [1] input2)``````

Notice how the `runWithFeeds` function takes a list of `Feed` objects. The code would still compile fine if we supplied the empty list. Then it would face a fate no better than our Python code:

``````…
let runStep = \node1Feed node2Feed -> runWithFeeds [] adderNode
…

Terminal Output:

TensorFlowException TF_INVALID_ARGUMENT "You must feed a value for placeholder tensor 'Placeholder_1' with dtype float and shape \n\t [[Node: Placeholder_1 = Placeholder[dtype=DT_FLOAT, shape=, _device=\"/job:localhost/replica:0/task:0/cpu:0\"]()]]"``````

For the second example of dimensionality, we can also make this mistake in Haskell. The following code compiles and will crash at runtime:

``````runSimple :: IO (Vector Float)
runSimple = runSession $ do
  let node1 = constant [3] [3 :: Float, 4, 5]
  let node2 = constant [2] [4 :: Float, 5]
  …

Terminal Output:
TensorFlowException TF_INVALID_ARGUMENT "Incompatible shapes: [3] vs. [2] \n\t [[Node: Add_2 = Add[T=DT_FLOAT, _device=\"/job:localhost/replica:0/task:0/cpu:0\"](Const_0, Const_1)]]"``````

At an even more basic level, we don’t even have to tell the truth about the shape of our vectors! We can give a bogus shape value and it will still compile!

``````let node1 = constant [3, 2, 3] [3 :: Float, 4, 5]
…

Terminal Output:
invalid tensor length: expected 18 got 3
CallStack (from HasCallStack):
error, called at src/TensorFlow/Ops.hs:299:23 in tensorflow-ops-0.1.0.0-EWsy8DQdciaL8o6yb2fUKR:TensorFlow.Ops``````

## Can we do better?

Now, we did do some things right. Let's think back to our `Model` type when we made neural networks.

``````data Model = Model
  { train :: TensorData Float
          -> TensorData Int64
          -> Session ()
  , errorRate :: TensorData Float
              -> TensorData Int64
              -> SummaryTensor
              -> Session (Float, ByteString)
  }``````

We exposed our training step as a function. This function forced the user to supply both of the tensors for the placeholders. This is good, but doesn't protect us from dimension issues.

When trying to solve these, we could write wrappers around every operation. Functions like `add` and `matMul` could return `Maybe` values. But this would be clunky. We could take this same step in Python. Granted, monads would allow the Haskell version to compose better. But it would be nicer if we could check our errors all at once, up front.

If we’re willing to dig quite a bit deeper, we can solve these problems! In the rest of this post, we’ll explore using dependent types to ensure dimensions are always correct. Getting placeholders right turns out to be a little more complicated though! So we’ll save that for next week’s post.

## Checking Dimensions

Currently, the tensor types we’ve been dealing with have no type safety on the dimensions. Tensor Flow doesn't provide this information when interacting with the C library, so it’s impossible to enforce at a low level. But this doesn’t stop us from writing wrappers that allow us to solve this.

To write these wrappers, we’re going to need to dive into dependent types. I’ll give a high-level overview of what’s going on. But for some details on the basics, you should check out this tutorial. I’ll also give a shout-out to Renzo Carbonara, author of the Exinst library and other great Haskell things. He helped me a lot in crossing a couple big knowledge gaps for implementing dependent types.

## Intro to Dependent Types: Sized Vectors

The simplest example for introducing dependent types is the idea of sized vectors. If you read the tutorial above, you'll see how they're implemented from scratch. A normal vector has a single type parameter, referring to what type of item the vector contains. A sized vector has an extra type parameter, and this type refers to the size of the vector. For instance, the following are valid sized vector types:

``````import Data.Vector.Sized (Vector, fromList)

vectorWith2 :: Vector 2 Int64
...
vectorWith6 :: Vector 6 Float
...``````

In the first type signature, `2` does not refer to the term 2. It refers to the type 2. That is, we’ve taken the term and promoted it to a type which has only a single value. The mechanics of how this works are confusing, but here’s the result. We can try to convert normal vectors to sized vectors. But the operation will fail if we don’t match up the size.

``````import Data.Vector.Sized (Vector, fromList)
import GHC.TypeLits (KnownNat)

-- fromList :: (KnownNat n) => [a] -> Maybe (Vector n a)

-- This results in a “Just” value!
success :: Maybe (Vector 2 Int64)
success = fromList [5,6]

-- The sizes don’t match, so we’ll get “Nothing”!
failure :: Maybe (Vector 2 Int64)
failure = fromList [3,1,5]``````

The `KnownNat` constraint allows us to specify that the type `n` refers to a single natural number. So now we can assign a type signature that encapsulates the size of the list.
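As a quick illustration of what the constraint buys us, here’s a sketch (`vectorLength` is a made-up helper) that recovers the type-level size as an ordinary value:

``````{-# LANGUAGE ScopedTypeVariables #-}

import Data.Proxy (Proxy(..))
import Data.Vector.Sized (Vector)
import GHC.TypeLits (KnownNat, natVal)

-- KnownNat lets us reflect the type-level length back to a term.
vectorLength :: forall n a. KnownNat n => Vector n a -> Integer
vectorLength _ = natVal (Proxy :: Proxy n)``````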

## A “Safe” Shape type

Now that we have a very basic understanding of dependent types, let's come up with a game plan for Tensor Flow. The first step will be to make a new type that puts the shape into the type signature. We'll make a `SafeShape` type that mimics the sized vector type. Instead of storing a single number as the type, it will store the full list of dimensions. We want to create an API something like this:

``````-- fromShape :: Shape -> Maybe (SafeShape s)

-- Results in a “Just” value
goodShape :: Maybe (SafeShape '[2, 2])
goodShape = fromShape (Shape [2,2])

-- Results in Nothing
badShape :: Maybe (SafeShape '[2,2])
badShape = fromShape (Shape [3,3,2])``````

So to do this, we first define the `SafeShape` type. This follows the example of sized vectors. See the appendix below for compiler extensions and imports used throughout this article. In particular, you want GADTs and DataKinds.

``````data SafeShape (s :: [Nat]) where
  NilShape :: SafeShape '[]
  (:--) :: KnownNat m => Proxy m -> SafeShape s -> SafeShape (m ': s)

infixr 5 :--``````

Now we can define the `toShape` function. This will take our `SafeShape` and turn it into a normal `Shape` using proxies.

``````toShape :: SafeShape s -> Shape
toShape NilShape = Shape []
toShape ((pm :: Proxy m) :-- s) = Shape (fromInteger (natVal pm) : s')
  where
    (Shape s') = toShape s``````

Now for the reverse direction, we first have to make a class `MkSafeShape`. This class encapsulates all the types that we can turn into the `SafeShape` type. We’ll define instances of this class for all lists of naturals.

``````class MkSafeShape (s :: [Nat]) where
  mkSafeShape :: SafeShape s

instance MkSafeShape '[] where
  mkSafeShape = NilShape

instance (MkSafeShape s, KnownNat m) => MkSafeShape (m ': s) where
  mkSafeShape = Proxy :-- mkSafeShape``````

Now we can define our `fromShape` function using the `MkSafeShape` class. To check if it works, we’ll compare the resulting shape to the input shape and make sure they’re equal. Note this requires us to define a simple instance of `Eq Shape`.

``````instance Eq Shape where
  (==) (Shape s) (Shape r) = s == r

fromShape :: forall s. MkSafeShape s => Shape -> Maybe (SafeShape s)
fromShape shape = if toShape myShape == shape
  then Just myShape
  else Nothing
  where
    myShape = mkSafeShape :: SafeShape s``````

Now that we’ve done this for Shape, we can create a similar type for `Tensor` that will store the shape as a type parameter.

``````data SafeTensor v a (s :: [Nat]) where
  SafeTensor :: (TensorType a) => Tensor v a -> SafeTensor v a s``````

## Using our Safe Types

So what has all this gotten us? Our next goal is to create a `safeConstant` function. This will let us create a `SafeTensor` wrapping a constant tensor and storing the shape. Remember, `constant` takes a shape and a vector without ensuring correlation between them. We want something like this:

``````safeConstant :: (TensorType a) => Vector n a -> SafeShape s -> SafeTensor Build a s
safeConstant elems shp = SafeTensor $ constant (toShape shp) (toList elems)``````

This will attach the given shape to the tensor. But there’s one piece missing. We also want to create a connection between the number of input elements and the shape. So something with shape `[3,3,2]` should force you to input a vector of length 18. And right now, there is no constraint between `n` and `s`.

We’ll add this with a type family called `ShapeProduct`. The instances will state that the correct natural type for a given list of naturals is the product of them. We define the second instance with recursion, so we'll need `UndecidableInstances`.

``````type family ShapeProduct (s :: [Nat]) :: Nat
type instance ShapeProduct '[] = 1
type instance ShapeProduct (m ': s) = m * ShapeProduct s``````
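As a sanity check, here’s a quick sketch (`checkProduct` is a throwaway name): `ShapeProduct '[3,3,2]` unfolds to `3 * (3 * (2 * 1))`, and this definition only compiles because that reduces to `18`:

``````-- A type error here would mean the family doesn't reduce to 18.
checkProduct :: Proxy (ShapeProduct '[3,3,2]) -> Proxy 18
checkProduct = id``````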

Now we’re almost done with this part! We can fix our `safeConstant` function by adding a constraint on the `ShapeProduct` between `s` and `n`.

``````safeConstant :: (TensorType a, ShapeProduct s ~ n) => Vector n a -> SafeShape s -> SafeTensor Build a s
safeConstant elems shp = SafeTensor $ constant (toShape shp) (toList elems)``````

Now we can write out a simple use of our `safeConstant` function as follows:

``````main :: IO (VN.Vector Int64)
main = runSession $ do
  let (shape1 :: SafeShape '[2,2]) = fromJust $ fromShape (Shape [2,2])
  let (elems1 :: Vector 4 Int64) = fromJust $ fromList [1,2,3,4]
  let (constant1 :: SafeTensor Build Int64 '[2,2]) = safeConstant elems1 shape1
  let (SafeTensor t) = constant1
  run t``````

We’re using `fromJust` as a shortcut here. But in a real program you would read your initial tensors in and check them as `Maybe` values. There's still the possibility for runtime failures. But this system has a couple advantages. First, it won't crash. We'll have the opportunity to handle it gracefully. Second, we do all the error checking up front. Once we've assigned types to everything, all the failure cases should be covered.
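For instance, the graceful path might look something like this sketch (the structure here is an assumption, not code from the repository):

``````-- Check both Maybe values up front; no fromJust, no crash.
safeMain :: IO ()
safeMain = do
  let maybeShape = fromShape (Shape [2,2]) :: Maybe (SafeShape '[2,2])
  let maybeElems = fromList [1,2,3,4] :: Maybe (Vector 4 Int64)
  case (maybeShape, maybeElems) of
    (Just shp, Just elems) -> do
      let (SafeTensor t) = safeConstant elems shp
      result <- runSession (run t)
      print (result :: VN.Vector Int64)
    _ -> putStrLn "Could not build the tensor: bad shape or element count!"``````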

Going back to the last example, let's change something. For instance, we could make our vector have length 3 instead of 4. We’ll now get a compile error!

``````main :: IO (VN.Vector Int64)
main = runSession $ do
  let (shape1 :: SafeShape '[2,2]) = fromJust $ fromShape (Shape [2,2])
  let (elems1 :: Vector 3 Int64) = fromJust $ fromList [1,2,3]
  let (constant1 :: SafeTensor Build Int64 '[2,2]) = safeConstant elems1 shape1
  let (SafeTensor t) = constant1
  run t

…

• Couldn't match type ‘4’ with ‘3’
    arising from a use of ‘safeConstant’
• In the expression: safeConstant elems1 shape1
  In a pattern binding:
    (constant1 :: SafeTensor Build Int64 '[2, 2])
      = safeConstant elems1 shape1``````

## Adding Type Safe Operations

Now that we’ve attached shape information to our tensors, we can define safer math operations. It's easy to write a safe addition function that ensures that the tensors have the same shape:

``````safeAdd :: (TensorType a, a /= Bool)
  => SafeTensor Build a s -> SafeTensor Build a s -> SafeTensor Build a s
safeAdd (SafeTensor t1) (SafeTensor t2) = SafeTensor (t1 `add` t2)``````

Here’s a similar matrix multiplication function. It ensures we have 2-dimensional shapes and that the dimensions work out. Notice the two tensors share the `n` dimension. It must be the column dimension of the first tensor and the row dimension of the second tensor:

``````safeMatMul :: (TensorType a, a /= Bool, a /= Int8, a /= Int16, a /= Int64, a /= Word8, a /= ByteString)
  => SafeTensor Build a '[i,n] -> SafeTensor Build a '[n,o] -> SafeTensor Build a '[i,o]
safeMatMul (SafeTensor t1) (SafeTensor t2) = SafeTensor (t1 `matMul` t2)``````

Here are these functions in action:

``````main2 :: IO (VN.Vector Float)
main2 = runSession $ do
  let (shape1 :: SafeShape '[4,3]) = fromJust $ fromShape (Shape [4,3])
  let (shape2 :: SafeShape '[3,2]) = fromJust $ fromShape (Shape [3,2])
  let (shape3 :: SafeShape '[4,2]) = fromJust $ fromShape (Shape [4,2])
  let (elems1 :: Vector 12 Float) = fromJust $ fromList [1,2,3,4,1,2,3,4,1,2,3,4]
  let (elems2 :: Vector 6 Float) = fromJust $ fromList [5,6,7,8,9,10]
  let (elems3 :: Vector 8 Float) = fromJust $ fromList [11,12,13,14,15,16,17,18]
  let (constant1 :: SafeTensor Build Float '[4,3]) = safeConstant elems1 shape1
  let (constant2 :: SafeTensor Build Float '[3,2]) = safeConstant elems2 shape2
  let (constant3 :: SafeTensor Build Float '[4,2]) = safeConstant elems3 shape3
  let (multTensor :: SafeTensor Build Float '[4,2]) = constant1 `safeMatMul` constant2
  let (addTensor :: SafeTensor Build Float '[4,2]) = multTensor `safeAdd` constant3
  let (SafeTensor finalTensor) = addTensor
  run finalTensor``````

And of course we’ll get compile errors if we use incorrect dimensions anywhere. Let’s say we change `multTensor` to use `[4,3]` as its type:

``````• Couldn't match type ‘2’ with ‘3’
Expected type: SafeTensor Build Float '[4, 3]
Actual type: SafeTensor Build Float '[4, 2]
• In the expression: constant1 `safeMatMul` constant2
…
• Couldn't match type ‘3’ with ‘2’
Expected type: SafeTensor Build Float '[4, 2]
Actual type: SafeTensor Build Float '[4, 3]
• In the expression: multTensor `safeAdd` constant3
…
• Couldn't match type ‘2’ with ‘3’
Expected type: SafeTensor Build Float '[4, 3]
Actual type: SafeTensor Build Float '[4, 2]
• In the second argument of ‘safeAdd’, namely ‘constant3’``````

## Conclusion

In this exercise we got deep into the weeds of one of the most difficult topics to learn about in Haskell. Dependent types will make your head spin at first. But we saw a concrete example of how they can allow us to detect problematic code at compile time. They are a form of documentation that also enables us to verify that our code is correct in certain ways.

Types do not replace tests (especially behavioral tests). But in this instance there are at least a few different test cases we don’t need to worry about too much. Next week, we’ll see how we can apply these principles to verifying placeholders.

If you want to learn more about the nuts and bolts of using Haskell Tensor Flow, you should check out our Tensor Flow Guide. It will guide you through the basics of adding Tensor Flow to a simple Stack project.

Maybe you’ve never used Haskell before but I’ve convinced you that dependent types are the future. If you want to try it out, download our Getting Started Checklist. You can also learn how to create and organize Haskell projects using Stack! Check out our Stack mini-course!

## Appendix: Extensions and Imports

``````{-# LANGUAGE GADTs                #-}
{-# LANGUAGE DataKinds            #-}
{-# LANGUAGE KindSignatures       #-}
{-# LANGUAGE TypeOperators        #-}
{-# LANGUAGE ScopedTypeVariables  #-}
{-# LANGUAGE TypeFamilies         #-}
{-# LANGUAGE UndecidableInstances #-}

import           Data.ByteString (ByteString)
import           Data.Constraint (Constraint)
import           Data.Int (Int64, Int8, Int16)
import           Data.Maybe (fromJust)
import           Data.Proxy (Proxy(..))
import qualified Data.Vector as VN
import           Data.Vector.Sized (Vector(..), toList, fromList)
import           Data.Word (Word8)
import           GHC.TypeLits

import           TensorFlow.Core
import           TensorFlow.Ops (constant, add, matMul)
import           TensorFlow.Session (runSession, run)``````

# Deeper Still: Convolutional Neural Networks

Two weeks ago, we began our machine learning study in earnest by constructing a full neural network. But this network was still quite simple by deep learning standards. In this article, we’re going to tackle a much more difficult problem: image recognition. Of course, we’ll still be using a well-known data set with well-known results, so this is only the tip of the iceberg. We'll be using the MNIST data set. This set classifies images of handwritten digits as the numbers 0-9. This problem is so well known that the folks at Tensor Flow refer to it as the “Hello World” of machine learning.

We’ll start this problem by using a very similar approach to what we used with the Iris data set. We’ll make a fully-connected neural network with two layers, and then use the “Adam” optimizer. This will give us some decent results by our beginner standards. But MNIST is a well known problem with a very large data set. So we’re going to hold ourselves to a higher standard of accuracy this time. This will force us to use some more advanced techniques. But to start with, let’s examine what we need to change to adapt our Iris model to work for the MNIST problem. As with the last couple weeks, the code for all this is on Github if you want to follow along.

## Re-use and Recycle!

Generally, we can re-use most of the code we had with Iris, which is good news! We still have to make a few adjustments here and there though. First, we’ll use some different constants. We’ll use `mnistFeatures` in place of `irisFeatures`, and `mnistLabels` instead of `irisLabels`. We’ll also bump up the size of our hidden layer and the number of samples we’ll draw on each iteration:

``````mnistFeatures :: Int64
mnistFeatures = 784

mnistLabels :: Int64
mnistLabels = 10

numHiddenUnits :: Int64
numHiddenUnits = 1024

sampleSize :: Int
sampleSize = 1000``````

We’ll also change our model to use `Word8` as the result type instead of `Int64`.

``````data Model = Model
  { train :: TensorData Float
          -> TensorData Word8 -- Used to be Int64
          -> Session ()
  , errorRate :: TensorData Float
              -> TensorData Word8 -- Used to be Int64
              -> SummaryTensor
              -> Session (Float, ByteString)
  }``````

Now we have to change how we get our input data. Our data isn’t in CSV format this time. We’ll use helper functions from the Tensor Flow library to extract the images and labels:

``````import TensorFlow.Examples.MNIST.Parse (readMNISTSamples, readMNISTLabels)
…
runDigits :: FilePath -> FilePath -> FilePath -> FilePath -> IO ()
runDigits trainImageFile trainLabelFile testImageFile testLabelFile =
  withEventWriter eventsDir $ \eventWriter -> runSession $ do

    -- trainingImages, testImages :: [Vector Word8]
    trainingImages <- liftIO $ readMNISTSamples trainImageFile
    testImages <- liftIO $ readMNISTSamples testImageFile

    -- trainingLabels, testLabels :: [Word8]
    trainingLabels <- liftIO $ readMNISTLabels trainLabelFile
    testLabels <- liftIO $ readMNISTLabels testLabelFile

    -- trainingRecords, testRecords :: Vector (Vector Word8, Word8)
    let trainingRecords = fromList $ zip trainingImages trainingLabels
    let testRecords = fromList $ zip testImages testLabels
    ...``````

Our “input” type consists of vectors of `Word8` elements. These represent the intensity of various pixels. Our “output” type is `Word8`, referring to the actual labels (0-9). We read the images and labels from separate files. Then we zip them together to pass to our processing functions. We’ll have to make a few changes to these processing functions for this data set. First, we have to generalize the type of our randomization function:

``````-- Used to be IrisRecord Specific
chooseRandomRecords :: Vector a -> IO (Vector a)``````
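For reference, here’s a rough sketch of what an implementation could look like (the real version lives in the `Processing` module, so this body is an assumption):

``````import Control.Monad (replicateM)
import qualified Data.Vector as V
import System.Random (randomRIO)

-- Draw sampleSize random indices (with replacement) and keep those records.
chooseRandomRecords :: V.Vector a -> IO (V.Vector a)
chooseRandomRecords records = do
  let numRecords = V.length records
  indices <- replicateM sampleSize (randomRIO (0, numRecords - 1))
  return $ V.fromList $ map (records V.!) indices``````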

Next we have to write a new encoding function that will put our data into the `TensorData` format. This looks like our old version, except dealing with the new tuple type instead of the `IrisRecord`.

``````convertDigitRecordsToTensorData
  :: Vector (Vector Word8, Word8)
  -> (TensorData Float, TensorData Word8)
convertDigitRecordsToTensorData records = (input, output)
  where
    numRecords = Data.Vector.length records
    input = encodeTensorData [fromIntegral numRecords, mnistFeatures]
      (fromList $ concatMap recordToInputs records)
    output = encodeTensorData [fromIntegral numRecords] (snd <$> records)
    recordToInputs :: (Vector Word8, Word8) -> [Float]
    recordToInputs rec = fromIntegral <$> (toList . fst) rec``````

And then we just have to substitute our new functions and parameters in, and we’ll be able to run our digit trainer!

``````Current training error 89.8
Current training error 19.300001
Current training error 13.300001
Current training error 11.199999
Current training error 8.700001
Current training error 6.5999985
Current training error 6.999999
Current training error 5.199999
Current training error 4.400003
Current training error 5.000001
Current training error 2.3000002

test error 6.830001``````

So our accuracy is 93.2%. This seems like an alright number. But imagine being a post office and having 6.8% of your mail sorted into the wrong ZIP code! (This was the original use case of this data set.) So let’s see if we can do better.

## Convolution and Max Pooling

Now we could train our model longer. This will tend to improve our error rate. But we can also help ourselves by making our model more complex. The fundamental flaw with what we’ve got so far is that it doesn’t account for the 2D nature of the images. This means we're losing a ton of useful information. So the first thing we'll do is treat our images as being 28x28 tensors instead of 1x784. This way, our model can pick out specific areas that are significant for the identification of the digit.

One thing we want to account for is that our image might not be in the center of the frame. To account for this, we're going to apply convolution. When using convolution, we break the image into many different overlapping tiles. In our case, we’ll make our strides size “1” in every direction, and we’ll use a patch size of 5x5. So this means we’ll center a 5x5 tile around each different pixel in our image, and then come up with a score for it. That score tells us if this part of the image contains any important information. We can represent this score as a vector with many features.

So with 2D convolution, we'll be dealing with 4-dimensional tensors. The first dimension is the sample size. The next two dimensions are the shape of the image. The final dimension is the number of features of the "score" for each part of the image. So each original image starts out with a single feature for the "score" of each pixel. This score is the actual intensity of that pixel! Then each layer of convolution will act as a mini neural network per pixel, making as many features as we want.

The different sliding windows correspond to scores we store in the next layer. This example uses 3x3 patches; we'll use 5x5.

Max pooling is a form of down-sampling. After our first convolution step, we’ll have scores on the 28x28 image. We’ll use 2x2 max-pooling, meaning we divide each image into 2x2 squares. Then we’ll make a new layer that is 14x14, using only the “best” score from each 2x2 box. This makes our model more efficient while keeping the most important information.

Simple demonstration of max-pooling
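To keep the size bookkeeping straight, here’s a small sketch of the arithmetic (assuming the "SAME" padding we’ll use below, so convolution preserves the spatial size):

``````-- With "SAME" padding, a stride-1 convolution keeps the spatial size.
convOutputDim :: Int -> Int -> Int
convOutputDim inputDim stride = (inputDim + stride - 1) `div` stride

-- 2x2 max pooling halves each spatial dimension.
poolOutputDim :: Int -> Int
poolOutputDim inputDim = inputDim `div` 2

-- convOutputDim 28 1 == 28, poolOutputDim 28 == 14,
-- and a second conv/pool round takes us down to 7x7.``````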

## Implementing Convolutional Layers

We’ll do two rounds of convolution and max pooling. So we’ll make a function that creates a layer that performs these two steps. This will look a lot like our other neural network layer. We’ll take parameters for the size of the input and output channels of the layer, as well as the tensor itself. So our first step will be to create the weights and bias tensors using these parameters:

``````patchSize :: Int64
patchSize = 5

buildConvPoolLayer :: Int64 -> Int64 -> Tensor v Float -> Text
  -> Build (Variable Float, Variable Float, Tensor Build Float)
buildConvPoolLayer inputChannels outputChannels input layerName = withNameScope layerName $ do
  weights <- truncatedNormal (vector weightsShape)
    >>= initializedVariable
  bias <- truncatedNormal (vector [outputChannels]) >>= initializedVariable
  ...
  where
    weightsShape :: [Int64]
    weightsShape = [patchSize, patchSize, inputChannels, outputChannels]``````

Now we’ll want to call our convolution and max pooling functions. These are still a little rough around the edges (the Haskell library is still quite young). The C versions of these functions have many optional, named attributes. For the moment there don’t seem to be any functions that use normal Haskell values for these arguments. Instead, we’ll be using `OpAttr` values, which assign bytestring names to values.

``````where
  ...
  convStridesAttr = opAttr "strides" .~ ([1,1,1,1] :: [Int64])
  poolStridesAttr = opAttr "strides" .~ ([1,2,2,1] :: [Int64])
  poolKSizeAttr = opAttr "ksize" .~ ([1,2,2,1] :: [Int64])
  paddingAttr = opAttr "padding" .~ ("SAME" :: ByteString)
  dataFormatAttr = opAttr "data_format" .~ ("NHWC" :: ByteString)
  convAttrs = convStridesAttr . paddingAttr . dataFormatAttr
  poolAttrs = poolKSizeAttr . poolStridesAttr . paddingAttr . dataFormatAttr``````

The `strides` argument for convolution refers to how much we shift the window each time. For pooling, the `ksize` argument sets the size of the windows we pool over (2x2 here), while its `strides` argument shifts the window by that same amount so the windows don't overlap. Now that we have our attributes, we can call the library functions `conv2D'` and `maxPool'`. This gives our resulting tensor. We also throw in a call to `relu` between these steps.

``````buildConvPoolLayer :: Int64 -> Int64 -> Tensor v Float -> Text
  -> Build (Variable Float, Variable Float, Tensor Build Float)
buildConvPoolLayer inputChannels outputChannels input layerName = withNameScope layerName $ do
  weights <- truncatedNormal (vector weightsShape)
    >>= initializedVariable
  bias <- truncatedNormal (vector [outputChannels]) >>= initializedVariable
  let conv = conv2D' convAttrs input (readValue weights)
  let results = maxPool' poolAttrs (relu conv)
  return (weights, bias, results)
  where
    ...``````

## Modifying our Model

Now we’ll make a few updates to our model and we’ll be in good shape. First, we need to reshape our input data to be 4-dimensional. Then, we’ll apply the two convolution/pooling layers:

``````imageDimen :: Int32
imageDimen = 28

createModel :: Build Model
createModel = do
  let batchSize = -1 -- Allows variable sized batches
  let conv1OutputChannels = 32
  let conv2OutputChannels = 64
  let denseInputSize = 7 * 7 * 64 :: Int32 -- 3136
  let numHiddenUnits = 1024

  inputs <- placeholder [batchSize, mnistFeatures]
  outputs <- placeholder [batchSize]

  let inputImage = reshape inputs (vector [batchSize, imageDimen, imageDimen, 1])

  (convWeights1, convBiases1, convResults1) <-
    buildConvPoolLayer 1 conv1OutputChannels inputImage "convLayer1"
  (convWeights2, convBiases2, convResults2) <-
    buildConvPoolLayer conv1OutputChannels conv2OutputChannels convResults1 "convLayer2"``````

Once we’re done with that, we’ll apply two fully-connected (dense) layers as we did before. Note we'll reshape our result from four dimensions back down to two:

``````let denseInput = reshape convResults2 (vector [batchSize, denseInputSize])
(denseWeights1, denseBiases1, denseResults1) <-
  buildNNLayer (fromIntegral denseInputSize) numHiddenUnits denseInput "denseLayer1"
let rectifiedDenseResults = relu denseResults1
(denseWeights2, denseBiases2, denseResults2) <-
  buildNNLayer numHiddenUnits mnistLabels rectifiedDenseResults "denseLayer2"``````

And after that we can treat the rest of the model the same. We'll update the parameter names and add the new weights and biases to the `params` that the model can change.
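As a rough sketch (assuming a `loss` tensor built from `denseResults2` and `outputs` as in the Iris model, and the `adam` minimizer from `TensorFlow.Minimize`), that step might look like:

``````-- Gather every trainable variable so the optimizer updates them all.
let params = [ convWeights1, convBiases1, convWeights2, convBiases2
             , denseWeights1, denseBiases1, denseWeights2, denseBiases2 ]
trainStep <- minimizeWith adam loss params``````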

As a review, let’s look at the dimensions of each of the intermediate tensors here. Then we can see the restrictions on the dimensions of the different operations. Each convolution step takes two four-dimensional tensors. The final dimension of argument 1 must match the third dimension of argument 2. Then the result will swap in the final dimension of argument 2. Meanwhile, pooling with a 2x2 stride size will take this resulting 4-dimensional tensor and halve each of the inner dimensions.

``````input: n x 784
inputImage: n x 28 x 28 x 1
convWeights1: 5 x 5 x 1 x 32
convBias1: 32
conv (first layer): n x 28 x 28 x 32
convResults1: n x 14 x 14 x 32
convWeights2:  5 x 5 x 32 x 64
conv (second layer): n x 14 x 14 x 64
convResults2: n x 7 x 7 x 64
denseInput: n x 3136
denseWeights1: 3136 x 1024
denseBias1: 1024
denseResults1: n x 1024
denseWeights2: 1024 x 10
denseBias2: 10
denseResults2: n x 10``````

So for each input, we’ll have a probability for each of the 10 possible digits. We pick the greatest of these as the chosen label.
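As a quick sketch of that final pick (`chooseLabel` is an illustrative name):

``````import qualified Data.Vector as V

-- The predicted digit is the index of the highest probability.
chooseLabel :: V.Vector Float -> Int
chooseLabel = V.maxIndex``````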

## Results

We’ll run our model again, only this time we’ll use a smaller sample size (100 per training iteration). This allows us to train for more iterations (20000). This takes quite a while to train, but we get these results (printed every 1000 iterations).

``````Current training error 91.0
Current training error 6.0
Current training error 2.9999971
Current training error 2.9999971
Current training error 0.0
Current training error 0.0
Current training error 0.99999905
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0

test error 1.1799991``````

Not too bad! Once it got going, we saw very few training errors, though we still ended up with a model that was a tad overfit. The Tensor Flow MNIST expert tutorial suggests using a dropout factor, which helps diminish the effects of overfitting. But this option isn’t available to us yet in Haskell. Still, a test error of about 1.18% means we got close to 99% accuracy, which is a success for us!

And here’s what our final graph looks like. Notice the extra layers we added for convolution:

## Conclusion

So that’s it for convolutional neural networks! Our goal was to adapt our previous neural network model to recognize digits. Not only did we pick a harder problem, but we also wanted higher accuracy. We achieved this by using more advanced machine learning techniques. Convolution allowed us to use the full 2D nature of the image. It also checked for the digit no matter where it was in the image. Max pooling enabled us to make our algorithm more efficient while keeping the most important information.

If you’re itching to see what else Haskell can do with Tensor Flow, check out our Tensor Flow Guide. It’ll walk you through some of the trickier parts of getting the library working on your local machine. It will also go through the most important information you need to know about the types in this library.

If you’re new to Haskell, here are a couple more resources you can dig into before you try your hand at Tensor Flow. First, there’s our Getting Started Checklist. This will point you toward some helpful resources for learning the language. Next, you can check out our Stack mini-course so you can learn how to organize a project! If you want to use Tensor Flow in Haskell, Stack is a must!

# Putting the Flow in Tensor Flow!

Last week we built our first neural network and used it on a real machine learning problem. We took the Iris data set and built a classifier that took in various flower measurements. It determined, with decent accuracy, the type of flower the measurements referred to.

But we’ve still only seen half the story of Tensor Flow! We’ve constructed many tensors and combined them in interesting ways. We can imagine what is going on with the “flow”, but we haven’t seen a visual representation of that yet.

We’re in luck though, thanks to the Tensor Board application. With it, we can visualize the computation graph we've created. We can also track certain values throughout our program run. In this article, we’ll take our Iris example and show how we can add Tensor Board features to it. Here's the Github repo with all the code so you can follow along!

## Add an Event Writer

The first thing to understand about Tensor Board is that it gets its data from a source directory. While we’re running our system, we have to direct it to write events to that directory. This will allow Tensor Board to know what happened in our training run.

``````eventsDir :: FilePath
eventsDir = "/tmp/tensorflow/iris/logs/"

runIris :: FilePath -> FilePath -> IO ()
runIris trainingFile testingFile = withEventWriter eventsDir $ \eventWriter -> runSession $ do
  ...``````

By itself, this doesn’t write anything to that directory though! To understand the consequences of this, let’s boot up Tensor Board.

## Running Tensor Board

Running our executable again doesn't bring up Tensor Board. It merely logs the information that Tensor Board uses. To actually see that information, we’ll run the `tensorboard` command.

``````>> tensorboard --logdir='/tmp/tensorflow/iris/logs'
Starting TensorBoard 47 at http://0.0.0.0:6006``````

Then we can point our web browser at the correct port. Since we haven't written anything to the log directory yet, there won’t be much for us to see other than some pretty graphics. So let’s start by logging our graph. This is actually quite easy! Remember our model? We can use the `logGraph` function combined with our event writer so we can see it.

``````model <- build createModel
logGraph eventWriter createModel``````

Now when we refresh Tensor Board, we’ll see our system’s graph.

What the heck is going on here?

But it’s very large and very confusing. The node names don't tell us much, and it’s not clear what data is going where. Plus, we have no idea what’s going on with our error rate or anything like that. Let’s make a couple adjustments to fix this.

So the first step is to actually specify some measurements that we’ll have Tensor Board plot for us. One node we can use is a “scalar summary”. This provides us with a summary of a particular value over the course of our training run. Let’s do this with our `errorRate` node. We can use the simple `scalarSummary` function.

``````errorRate_ <- render $ 1 - (reduceMean (cast correctPredictions))
scalarSummary "Error" errorRate_``````

The second type of summary is a `histogram` summary. We use this on a particular tensor to see the distribution of its values over the course of the run. Let’s do this with our second set of weights. We need to use `readValue` to go from a `Variable` to a `Tensor`.

``````(finalWeights, finalBiases, finalResults) <-
  buildNNLayer numHiddenUnits irisLabels rectifiedHiddenResults
histogramSummary "Weights" (readValue finalWeights)``````

So let’s run our program and open Tensor Board again. We would expect to see these new values show up under the `Scalars` and `Histograms` tabs. But they don’t. This is because we still need to write these results to our event writer. And this turns out to be a little complicated. First, before we start training, we have to create a tensor representing all our summaries.

``````logGraph eventWriter createModel
summaryTensor <- build mergeAllSummaries``````

Now if we had no placeholders, we could run this tensor whenever we wanted, and it would output the values. But our summary tensors depend on the input placeholders, which complicates the matter. So here’s what we’ll do. We’ll only write out the summaries when we check our error rate (every 100 steps). To do this, we have to change our model's error rate function to take the summary tensor as an extra argument. We’ll also add a `ByteString` to its return value, alongside the original `Float`.

``````data Model = Model
  { train :: TensorData Float
          -> TensorData Int64
          -> Session ()
  , errorRate :: TensorData Float
              -> TensorData Int64
              -> SummaryTensor
              -> Session (Float, ByteString)
  }``````

Within our model definition, we’ll use this extra parameter. It will run both the `errorRate_` tensor AND the summary tensor together with the feeds:

``````return $ Model
  { train = ...
  , errorRate = \inputFeed outputFeed summaryTensor -> do
      (errorTensorResult, summaryTensorResult) <- runWithFeeds
        [ feed inputs inputFeed
        , feed outputs outputFeed
        ]
        (errorRate_, summaryTensor)
      return (unScalar errorTensorResult, unScalar summaryTensorResult)
  }``````

Now we need to modify our calls to `errorRate` below. We’ll pass the summary tensor as an argument, and get the bytes as output. We’ll write it to our event writer (after decoding), and then we’ll be done!

``````-- Training
forM_ ([0..1000] :: [Int]) $ \i -> do
  trainingSample <- liftIO $ chooseRandomRecords trainingRecords
  let (trainingInputs, trainingOutputs) = convertRecordsToTensorData trainingSample
  (train model) trainingInputs trainingOutputs
  when (i `mod` 100 == 0) $ do
    (err, summaryBytes) <- (errorRate model) trainingInputs trainingOutputs summaryTensor
    let summary = decodeMessageOrDie summaryBytes
    liftIO $ putStrLn $ "Current training error " ++ show (err * 100)
    logSummary eventWriter (fromIntegral i) summary

liftIO $ putStrLn ""

-- Testing
let (testingInputs, testingOutputs) = convertRecordsToTensorData testRecords
(testingError, _) <- (errorRate model) testingInputs testingOutputs summaryTensor
liftIO $ putStrLn $ "test error " ++ show (testingError * 100)``````

Now we can see what our summaries look like by running Tensor Board again!

Scalar Summary of our Error Rate

Histogram summary of our final weights.

## Annotating our Graph

Now let’s look back to our graph. It’s still a bit confusing. We can clean it up a lot by creating “name scopes”. A name scope is part of the graph that we set aside under a single name. When Tensor Board generates our graph, it will create one big block for the scope. We can then zoom in and examine the individual nodes if we want.

We’ll make three different scopes. First, we’ll make a scope for each of the hidden layers of our neural network. This is quite easy, since we already have a function for creating these. All we have to do is make the function take an extra parameter for the name of the scope we want. Then we wrap the whole function within the `withNameScope` function.

``````buildNNLayer :: Int64 -> Int64 -> Tensor v Float -> Text
             -> Build (Variable Float, Variable Float, Tensor Build Float)
buildNNLayer inputSize outputSize input layerName = withNameScope layerName $ do
  weights <- truncatedNormal (vector [inputSize, outputSize]) >>= initializedVariable
  bias <- truncatedNormal (vector [outputSize]) >>= initializedVariable
  let results = (input `matMul` readValue weights) `add` readValue bias
  return (weights, bias, results)``````

We supply our name further down in the code:

``````(hiddenWeights, hiddenBiases, hiddenResults) <-
  buildNNLayer irisFeatures numHiddenUnits inputs "layer1"
let rectifiedHiddenResults = relu hiddenResults
(finalWeights, finalBiases, finalResults) <-
  buildNNLayer numHiddenUnits irisLabels rectifiedHiddenResults "layer2"``````

Now we’ll add a scope around all our error calculations. First, we combine these into an action wrapped in `withNameScope`. Then, observing that we need the `errorRate_` and `train_` steps, we return those from the block. That’s it!

``````(errorRate_, train_) <- withNameScope "error_calculation" $ do
  actualOutput <- render $ cast $ argMax finalResults (scalar (1 :: Int64))
  let correctPredictions = equal actualOutput outputs
  er <- render $ 1 - (reduceMean (cast correctPredictions))
  scalarSummary "Error" er

  let outputVectors = oneHot outputs (fromIntegral irisLabels) 1 0
  let loss = reduceMean $ fst $ softmaxCrossEntropyWithLogits finalResults outputVectors
  let params = [hiddenWeights, hiddenBiases, finalWeights, finalBiases]
  tr <- minimizeWith adam loss params
  return (er, tr)``````

Now when we look at our graph, we see that it’s divided into three parts: our two layers, and our error calculation. All the information flows among these three parts (as well as the "Adam" optimizer portion).

Much Better

## Conclusion

By default, Tensor Board graphs can look a little messy. But by adding a little more information to the nodes and using scopes, you can paint a much clearer picture. You can see how the data flows from one end of the application to the other. We can also use summaries to track important information about our graph. We’ll use this most often for the loss function or error rate. Hopefully, we'll see it decline over time.

Next week we’ll add some more complexity to our neural networks. We'll see new tensors for convolution and max pooling. This will allow us to solve the more difficult MNIST digit recognition problem. Stay tuned!

If you’re itching to try out some Tensor Board functionality for yourself, check out our in-depth Tensor Flow guide. It goes into more detail about the practical aspects of using this library. If you want to get the Haskell Tensor Flow library running on your local machine, check it out! Trust me, it's a little complicated, unless you're a Stack wizard already!

And if this is your first exposure to Haskell, try it out! Take a look at our guide to getting started with the language!

# Digging in Deep: Solving a Real Problem with Haskell Tensor Flow

Last week we got acquainted with the core concepts of Tensor Flow. We learned about the differences between constants, placeholders, and variable tensors. Both the Haskell and Python bindings have functions to represent these. The Python version was a bit simpler though. Once we had our tensors, we wrote a program that “learned” a simple linear equation.

This week, we’re going to solve an actual machine learning problem. We’re going to use the Iris data set, which contains measurements of different Iris flowers. Each flower belongs to one of three species. Our program will "learn" a function that chooses the species based on the measurements. This function will involve a fully-connected neural network.

## Formatting our Input

The first step in pretty much any machine learning problem is data processing. After all, our data doesn’t magically get resolved into Haskell data types. Luckily, Cassava is a great library to help us out. The Iris data set consists of data in .csv files that each have a header line and then a series of records. They look a bit like this:

``````120,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0
...``````

Each line contains one record. A record has four flower measurements, and a final label. In this case, we have three types of flowers we are trying to classify: Iris Setosa, Iris Versicolor, and Iris Virginica. So the last column contains the numbers 0, 1, and 2, corresponding to these respective classes.

Let's create a data type representing each record. Then we can parse the file line-by-line. Our `IrisRecord` type will contain the feature data and the resulting label. This type will act as a bridge between our raw data and the tensor format we’ll need to run our learning algorithm. We’ll derive the “Generic” typeclass for our record type, and use this to get `FromRecord`. Once our type has an instance for `FromRecord`, we can parse it with ease. As a note, throughout this article, I’ll be omitting the imports section from the code samples. I’ve included a full list of imports from these files as an appendix at the bottom. We'll also be using the `OverloadedLists` extension throughout.

``````{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedLists #-}

...

data IrisRecord = IrisRecord
  { field1 :: Float
  , field2 :: Float
  , field3 :: Float
  , field4 :: Float
  , label  :: Int64
  }
  deriving (Generic)

instance FromRecord IrisRecord``````

Now that we have our type, we’ll write a function, `readIrisFromFile`, that will read our data in from a CSV file.

``````readIrisFromFile :: FilePath -> IO (Vector IrisRecord)
readIrisFromFile fp = do
  contents <- readFile fp
  let contentsAsBs = pack contents
  let results = decode HasHeader contentsAsBs :: Either String (Vector IrisRecord)
  case results of
    Left err -> error err
    Right records -> return records``````

We won’t want to always feed our entire data set into our training system. So given a whole slew of these items, we should be able to pick out a random sample.

``````sampleSize :: Int
sampleSize = 10

chooseRandomRecords :: Vector IrisRecord -> IO (Vector IrisRecord)
chooseRandomRecords records = do
  let numRecords = Data.Vector.length records
  chosenIndices <- take sampleSize <$> shuffleM [0..(numRecords - 1)]
  return $ fromList $ map (records !) chosenIndices``````

Once we’ve selected our vector of records to use for each run, we’re still not done. We need to take these records and transform them into the `TensorData` that we’ll feed into our algorithm. We create items of `TensorData` by feeding in a shape and then a 1-dimensional vector of values. First, we need to know the shapes of our input and output. Both of these depend on the number of rows in the sample. The “input” will have a column for each of the four features in our set. The output meanwhile will have a single column for the label values.

``````irisFeatures :: Int64
irisFeatures = 4

irisLabels :: Int64
irisLabels = 3

convertRecordsToTensorData :: Vector IrisRecord -> (TensorData Float, TensorData Int64)
convertRecordsToTensorData records = (input, output)
  where
    numRecords = Data.Vector.length records
    input = encodeTensorData [fromIntegral numRecords, irisFeatures] (undefined)
    output = encodeTensorData [fromIntegral numRecords] (undefined)``````

Now all we need to do is take the various records and turn them into one dimensional vectors to encode. Here’s the final function:

``````convertRecordsToTensorData :: Vector IrisRecord -> (TensorData Float, TensorData Int64)
convertRecordsToTensorData records = (input, output)
  where
    numRecords = Data.Vector.length records
    input = encodeTensorData [fromIntegral numRecords, irisFeatures]
      (fromList $ concatMap recordToInputs records)
    output = encodeTensorData [fromIntegral numRecords] (label <$> records)
    recordToInputs :: IrisRecord -> [Float]
    recordToInputs rec = [field1 rec, field2 rec, field3 rec, field4 rec]``````

## Neural Network Basics

Now that we’ve got that out of the way, we can start writing our model. Remember, we want to perform two different actions with our model. First, we want to be able to take our training input and train the weights. Second, we want to be able to pass a test data set and determine the error rate. We can represent these two different functions as a single `Model` object. Remember the `Session` monad, where we run all our Tensor Flow activities. The training will run an action that alters the variables but returns nothing. The error rate calculation will return us a float value.

``````data Model = Model
  { train :: TensorData Float -- Training input
          -> TensorData Int64 -- Training output
          -> Session ()
  , errorRate :: TensorData Float -- Test input
              -> TensorData Int64 -- Test output
              -> Session Float
  }``````

Now we’re going to build a fully-connected neural network. We’ll have 4 input units (1 for each of the different features), and then we’ll have 3 output units (1 for each of the classes we’re trying to represent). In the middle, we’ll use a hidden layer consisting of 10 units. This means we’ll need two sets of weights and biases. We’ll write a function that, when given dimensions, will give us the variable tensors for each layer. We want the weight and bias tensors, plus the result tensor of the layer.

``````buildNNLayer :: Int64 -> Int64 -> Tensor v Float
             -> Build (Variable Float, Variable Float, Tensor Build Float)
buildNNLayer inputSize outputSize input = do
  weights <- truncatedNormal (vector [inputSize, outputSize]) >>= initializedVariable
  bias <- truncatedNormal (vector [outputSize]) >>= initializedVariable
  let results = (input `matMul` readValue weights) `add` readValue bias
  return (weights, bias, results)``````

We do this in the `Build` monad, which allows us to construct variables, among other things. We’ll use a `truncatedNormal` distribution for all our variables to keep things simple. We specify the size of each variable in a `vector` tensor, and then initialize them. Then we’ll create the resulting tensor by multiplying the input by our weights and adding the bias.

## Constructing our Model

Now we’ll start building our `Model` object, again within the `Build` monad. We begin by specifying our input and output placeholders, as well as the number of hidden units. We’ll also use a `batchSize` of -1 to account for the fact that we want a variable number of input samples.

``````irisFeatures :: Int64
irisFeatures = 4

irisLabels :: Int64
irisLabels = 3
-- ^^ From above

createModel :: Build Model
createModel = do
  let batchSize = -1 -- Allows variable sized batches
  let numHiddenUnits = 10
  inputs <- placeholder [batchSize, irisFeatures]
  outputs <- placeholder [batchSize]``````

Then we’ll get the nodes for the two layers of variables, as well as their results. Between the layers, we’ll add a “rectifier” activation function `relu`:

``````(hiddenWeights, hiddenBiases, hiddenResults) <-
  buildNNLayer irisFeatures numHiddenUnits inputs
let rectifiedHiddenResults = relu hiddenResults
(finalWeights, finalBiases, finalResults) <-
  buildNNLayer numHiddenUnits irisLabels rectifiedHiddenResults``````

Now we have to get the inferred classes of each output. This means calling `argMax` to take the class with the highest probability. We’ll also `cast` the vector and then `render` it. These are some Haskell-Tensor-Flow specific terms for getting tensors to the right type. Next, we compare that against our output placeholders to see how many we got correct. Then we’ll make a node for calculating the error rate for this run.

``````actualOutput <- render $ cast $ argMax finalResults (scalar (1 :: Int64))
let correctPredictions = equal actualOutput outputs
errorRate_ <- render $ 1 - (reduceMean (cast correctPredictions))``````

Now we have to actually do the work of training. First, we’ll make `oneHot` vectors for our expected outputs. This means converting the label `0` into the vector `[1,0,0]`, and so on. We’ll compare these values against our results (before we took the max), and this gives us our loss function. Then we will make a list of the parameters we want to train. The `adam` optimizer will minimize our loss function while modifying the params.

``````let outputVectors = oneHot outputs (fromIntegral irisLabels) 1 0
let loss = reduceMean $ fst $ softmaxCrossEntropyWithLogits finalResults outputVectors
let params = [hiddenWeights, hiddenBiases, finalWeights, finalBiases]
train_ <- minimizeWith adam loss params``````

Now we’ve got our `errorRate_` and `train_` nodes ready. There's one last step here. We have to plug in values for the placeholders and create functions that will take in the tensor data. Remember the `feed` pattern from last week? We use it again here. Finally, our model is complete!

``````return $ Model
  { train = \inputFeed outputFeed ->
      runWithFeeds
        [ feed inputs inputFeed
        , feed outputs outputFeed
        ]
        train_
  , errorRate = \inputFeed outputFeed -> unScalar <$>
      runWithFeeds
        [ feed inputs inputFeed
        , feed outputs outputFeed
        ]
        errorRate_
  }``````

## Tying it all together

Now we’ll write our main function that will run the session. It will have three stages. In the preparation stage, we’ll load our data, and use the `build` function to get our model. Then we’ll train our model for 1000 steps by choosing samples and converting our records to data. Every 100 steps, we'll print the output. Finally, we’ll determine the resulting error ratio by using the test data.

``````runIris :: FilePath -> FilePath -> IO ()
runIris trainingFile testingFile = runSession $ do
  -- Preparation
  trainingRecords <- liftIO $ readIrisFromFile trainingFile
  testRecords <- liftIO $ readIrisFromFile testingFile
  model <- build createModel

  -- Training
  forM_ ([0..1000] :: [Int]) $ \i -> do
    trainingSample <- liftIO $ chooseRandomRecords trainingRecords
    let (trainingInputs, trainingOutputs) = convertRecordsToTensorData trainingSample
    (train model) trainingInputs trainingOutputs
    when (i `mod` 100 == 0) $ do
      err <- (errorRate model) trainingInputs trainingOutputs
      liftIO $ putStrLn $ "Current training error " ++ show (err * 100)

  liftIO $ putStrLn ""

  -- Testing
  let (testingInputs, testingOutputs) = convertRecordsToTensorData testRecords
  testingError <- (errorRate model) testingInputs testingOutputs
  liftIO $ putStrLn $ "test error " ++ show (testingError * 100)

  return ()``````

## Results

So when we actually run all this, we’ll get the following results, ending with the error on our test set.

``````Current training error 60.000004
Current training error 30.000002
Current training error 39.999996
Current training error 19.999998
Current training error 10.000002
Current training error 10.000002
Current training error 19.999998
Current training error 19.999998
Current training error 10.000002
Current training error 10.000002
Current training error 0.0

test error 3.333336``````

Our test sample size was 30, so this means we got 29/30 this time around. Results change from run to run though (I obviously used the best results I found). Since our sample size is so small, there's a lot of variance here (sometimes the error rate is more like 40%). Generally we’ll want to train longer on a larger data set so that we get more consistent results, but this is a good start.

## Conclusion

In this article we went over the basics of making a neural network using the Haskell Tensor Flow library. We made a fully-connected neural network and fed in real data we parsed using the `Cassava` library. This network was able to learn a function to classify flowers from the Iris data set. Considering the small amount of data, we got some good results.

Come back next week, where we’ll see how we can add some more summary information to our tensor flow graph. We’ll use the tensor board application to view our graph in a visual format.

For more details on installing the Haskell Tensor Flow system, check out our In-Depth Tensor Flow Tutorial. It should walk you through the important steps in running the code on your own machine.

Perhaps you’ve never tried Haskell before at all, and want to see what it’s like. Maybe I’ve convinced you that Haskell is in fact the future of AI. In that case, you should check out our Getting Started Checklist for some tools on starting with the language.

## Appendix: All Imports

Documentation for Haskell Tensor Flow is still a major work in progress. So I want to make sure I explicitly list the modules you need to import for all the different functions we used here.

``````import Control.Monad (forM_, when)
import Data.ByteString.Lazy.Char8 (pack)
import Data.Csv (FromRecord, decode, HasHeader(..))
import Data.Int (Int64)
import Data.Vector (Vector, length, fromList, (!))
import GHC.Generics (Generic)
import System.Random.Shuffle (shuffleM)

import TensorFlow.Core (TensorData, Session, Build, render, runWithFeeds, feed, unScalar, build,
  Tensor, encodeTensorData)
import TensorFlow.Minimize (minimizeWith, adam)
import TensorFlow.Ops (placeholder, truncatedNormal, add, matMul, relu,
  argMax, scalar, cast, oneHot, reduceMean, softmaxCrossEntropyWithLogits,
  equal, vector)
import TensorFlow.Session (runSession)
import TensorFlow.Variable (readValue, initializedVariable, Variable)``````

# Starting out with Haskell Tensor Flow

Last week we discussed the burgeoning growth of AI systems. We saw several examples of how those systems are impacting our lives more and more. I made the case that we ought to focus more on reliability when making architecture choices. After all, people’s lives might be at stake in the code we write now. Naturally, I suggested Haskell as a prime candidate for developing reliable AI systems.

So now we’ll actually write some Haskell machine learning code. We'll focus on the Tensor Flow bindings library. I first got familiar with this library back at BayHac in April. I’ve spent the last couple months learning both Tensor Flow as a whole and the Haskell library. In this first article, we’ll go over the basic concepts of Tensor Flow. We'll see how they’re implemented in Python (the most common language for TF). We'll then translate these concepts to Haskell.

Note this series will not be a general introduction to the concept of machine learning. There is a fantastic series on Medium about that called Machine Learning is Fun! If you’re interested in learning the basic concepts, I highly recommend you check out part 1 of that series. Many of the ideas in my own article series will be a lot clearer with that background.

## Tensors

Tensor Flow is a great name because it breaks the library down into the two essential concepts. First up are tensors. These are the primary vehicle of data representation in Tensor Flow. Low-dimensional tensors are actually quite intuitive. But there comes a point when you can’t really visualize what’s going on, so you have to let the theoretical idea guide you.

In the world of big data, we represent everything numerically. And when you have a group of numbers, a programmer’s natural instinct is to put those in an array.

``[1.0, 2.0, 3.0, 6.7]``

Now what do you do if you have a lot of different arrays of the same size and you want to associate them together? You make a 2-dimensional array (an array of arrays), which we also refer to as a matrix.

``````[[1.0, 2.0, 3.0, 6.7],
[5.0, 10.0, 3.0, 12.9],
[6.0, 12.0, 15.0, 13.6],
[7.0, 22.0, 8.0, 5.3]]``````

Most programmers are pretty familiar with these concepts. Tensors take this idea and keep extending it. What happens when you have a lot of matrices of the same size? You can group them together as an array of matrices. We could call this a three-dimensional matrix. But “tensor” is the term we’ll use for this data representation in all dimensions.

Every tensor has a degree. We could start with a single number. This is a tensor of degree 0. Then a normal array is a tensor of degree 1. Then a matrix is a tensor of degree 2. Our last example would be a tensor of degree 3. And you can keep adding these on to each other, ad infinitum.

Every tensor has a shape. The shape is an array representing the dimensions of the tensor. The length of this array will be the degree of the tensor. So a number will have the empty list as its shape. An array will have a list of length 1 containing the length of the array. A matrix will have a list of length 2 containing its number of rows and columns. And so on. There are a few different ways we can represent tensors in code, but we'll get to that in a bit.
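As a small preview of the code, here's how a few shapes look using the `Shape` type from the Haskell library (the values here are just examples):

``````import TensorFlow.Core (Shape(..))

-- The length of the shape list is the degree of the tensor
scalarShape, arrayShape, matrixShape, cubeShape :: Shape
scalarShape = Shape []        -- degree 0: a single number
arrayShape  = Shape [4]       -- degree 1: an array of 4 values
matrixShape = Shape [4, 4]    -- degree 2: a 4x4 matrix
cubeShape   = Shape [3, 4, 4] -- degree 3: three 4x4 matrices``````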

## Go with the Flow

The second important concept to understand is how Tensor Flow performs computations. Machine learning generally involves simple math operations. A lot of simple math operations. Since the scale is so large, we need to perform these operations as fast as possible. We need to use software and hardware that is optimized for this specific task. This necessitates having a low-level code representation of what’s going on. This is easier to achieve in a language like C, instead of Haskell or Python.

We could have the bulk of our code in Haskell, but perform the math in C using a Foreign Function Interface. But these interfaces have a large overhead, so this is likely to negate most of the gains we get from using C.

Tensor Flow’s solution to this problem is that we first build up a graph describing all our computations. Then once we have described that, we “run” our graph using a “session”. Thus it performs the entire language conversion process at once, so the overhead is lower.

If this sounds familiar, it's because this is how actions tend to work in Haskell (in some sense). We can, for instance, describe an IO action. And this action isn’t a series of commands that we execute the moment they show up in the code. Rather, the action is a description of the operations that our program will perform at some point. It’s also similar to the concept of Effectful programming. We’ll explore that topic in the future on this blog.
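To illustrate that "description versus execution" idea in plain Haskell (nothing Tensor Flow specific here):

``````-- 'greet' is only a description of an action; defining it prints nothing
greet :: IO ()
greet = putStrLn "Hello!"

-- The action only happens when the program actually runs the description
main :: IO ()
main = greet >> greet -- prints "Hello!" twice``````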

So what does our computation graph look like? Well, each tensor is a node. Then we can make other nodes for “operations” that take tensors as input. For instance, we can “add” two tensors together, and this is another node. We’ll see in our example how we build up the computational graph, and then run it.

One of the awesome features of Tensor Flow is the Tensor Board application. It allows you to visualize your graph of computations. We’ll see how to do this later in the series.

## Coding Tensors

So at this point we should start examining how we actually create tensors in our code. We’ll start by looking at how we do this in Python, since the concepts are a little easier to understand that way. There are three types of tensors we’ll consider. The first are “constants”. These represent a set of values that do not change. We can use these values throughout our model training process, and they'll be the same each time. Since we define the values for the tensor up front, there’s no need to give any size arguments. But we will specify the datatype that we’ll use for them.

``````import tensorflow as tf

node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0, dtype=tf.float32)``````

Now what can we actually do with these tensors? Well for a quick sample, let’s try adding them. This creates a new node in our graph that represents the addition of these two tensors. Then we can “run” that addition node to see the result. To encapsulate all our information, we’ll create a “Session”:

``````import tensorflow as tf

node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0, dtype=tf.float32)
adderNode = tf.add(node1, node2)

sess = tf.Session()
result = sess.run(adderNode)
print(result)

"""
Output:
7.0
"""``````

Next up are placeholders. These are values that we change each run. Generally, we will use these for the inputs to our model. By using placeholders, we'll be able to change the input and train on different values each time. When we “run” a session, we need to assign values to each of these nodes.

We don’t know the values that will go into a placeholder, but we still assign the type of data at construction. We can also assign a size if we like. So here’s a quick snippet showing how we initialize placeholders. Then we can assign different values with each run of the application. Even though our placeholder tensors don’t have values, we can still add them just as we could with constant tensors.

``````node1 = tf.placeholder(tf.float32)
node2 = tf.placeholder(tf.float32)
adderNode = tf.add(node1, node2)

sess = tf.Session()
result1 = sess.run(adderNode, {node1: 3, node2: 4.5 })
result2 = sess.run(adderNode, {node1: 2.7, node2: 8.9 })
print(result1)
print(result2)

"""
Output:
7.5
11.6
"""``````

The last type of tensor we’ll use are variables. These are the values that will constitute our “model”. Our goal is to find values for these parameters that will make our model fit the data well. We’ll supply a data type, as always. In this situation, we’ll also provide an initial constant value. Normally, we’d want to use a random distribution of some kind. The tensor won’t actually take on its value until we run a global variable initializer function. We’ll have to create this initializer and then have our session object run it before we get going.

``````w = tf.Variable(3, dtype=tf.float32)
b = tf.Variable(1, dtype=tf.float32)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)``````

Now let’s use our variables to create a “model” of sorts. For this article we'll make a simple linear model. Let’s create additional nodes for our input tensor and the model itself. We’ll let `w` be the weights, and `b` be the “bias”. This means we’ll construct our final value by `w*x + b`, where `x` is the input.

``````w = tf.Variable(3, dtype=tf.float32)
b = tf.Variable(1, dtype=tf.float32)
x = tf.placeholder(dtype=tf.float32)
linear_model = w * x + b``````

Now, we want to know how good our model is. So let’s compare it to `y`, an input of our expected values. We’ll take the difference, square it, and then use the `reduce_sum` library function to get our “loss”. The loss measures the difference between what we want our model to represent and what it actually represents.

``````w = tf.Variable(3, dtype=tf.float32)
b = tf.Variable(1, dtype=tf.float32)
x = tf.placeholder(dtype=tf.float32)
linear_model = w * x + b
y = tf.placeholder(dtype=tf.float32)
squared_deltas = tf.square(linear_model - y)
loss = tf.reduce_sum(squared_deltas)``````

Each line here is a different tensor, or a new node in our graph. We’ll finish up our model by using the built in `GradientDescentOptimizer` with a learning rate of 0.01. We’ll set our training step as attempting to minimize the loss function.

``````optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)``````

Now we’ll run the session, initialize the variables, and run our training step 1000 times. We’ll pass a series of inputs with their expected outputs. Let's try to learn the line `y = 5x - 1`. Our expected output `y` values will assume this.

``````sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
for i in range(1000):
    sess.run(train, {x: [1, 2, 3, 4], y: [4, 9, 14, 19]})

print(sess.run([w, b]))``````

At the end we print the weights and bias, and we see our results!

``[array([ 4.99999475], dtype=float32), array([-0.99998516], dtype=float32)]``

So we can see that our learned values are very close to the correct values of 5 and -1!

## Representing Tensors in Haskell

So now at long last, I’m going to get into some of the details of how we apply these tensor concepts in Haskell. As with strings and numbers, we can’t get away with a single “Tensor” type in Haskell, since the concept covers some very different things. For a deeper look at the tensor types we’re dealing with, check out our in depth guide.
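To give a taste of this before we dive in, here's a hedged sketch showing two of the tensor types we'll encounter below (simplified to `Float`; the real library functions are polymorphic in the value type):

``````{-# LANGUAGE OverloadedLists #-}

import TensorFlow.Core (Build, Tensor, Value)
import TensorFlow.Ops (constant, placeholder)

-- A constant is a pure graph node we can build anywhere...
constantNode :: Tensor Build Float
constantNode = constant [1] [3 :: Float]

-- ...while a placeholder can only be created inside the Build monad
placeholderNode :: Build (Tensor Value Float)
placeholderNode = placeholder [1]``````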

In the meantime, let’s go through some simple code snippets replicating our efforts in Python. Here’s how we make a few constants and add them together. Do note the “overloaded lists” extension. It allows us to represent different types with the same syntax as lists. We use this with both `Shape` items and `Vectors`:

``````{-# LANGUAGE OverloadedLists #-}

import Data.Vector (Vector)
import TensorFlow.Ops (constant, add)
import TensorFlow.Session (runSession, run)

runSimple :: IO (Vector Float)
runSimple = runSession $ do
  let node1 = constant [1] [3 :: Float]
  let node2 = constant [1] [4 :: Float]
  let adderNode = node1 `add` node2
  run adderNode

main :: IO ()
main = do
  result <- runSimple
  print result

{-
Output:
[7.0]
-}``````

We use the `constant` function, which takes a `Shape` and then the value we want. We’ll create our addition node and then `run` it to get the output, which is a vector with a single float. We wrap everything in the `runSession` function. This encapsulates the initialization and running actions we saw in Python.

Now suppose we want placeholders. This is a little more complicated in Haskell. We’ll be using two placeholders, as we did in Python. We’ll initialize them with the `placeholder` function and a shape. We’ll take arguments to our function for the input values. To actually pass the parameters to fill in the placeholders, we have to use what we call a “feed”.

We know that our `adderNode` depends on two values. So we’ll write our run-step as a function that takes in two “feed” values, one for each placeholder. Then we’ll assign those feeds to the proper nodes using the `feed` function. We’ll put these in a list, and pass that list as an argument to `runWithFeeds`. Then, we wrap up by calling our run-step on our input data. We’ll have to `encode` the raw vectors as tensors though.

``````import TensorFlow.Core (Tensor, Value, feed, encodeTensorData)
import TensorFlow.Ops (constant, add, placeholder)
import TensorFlow.Session (runSession, run, runWithFeeds)

import Data.Vector (Vector)

runPlaceholder :: Vector Float -> Vector Float -> IO (Vector Float)
runPlaceholder input1 input2 = runSession $ do
  (node1 :: Tensor Value Float) <- placeholder [1]
  (node2 :: Tensor Value Float) <- placeholder [1]
  let adderNode = node1 `add` node2
  let runStep = \node1Feed node2Feed -> runWithFeeds
        [ feed node1 node1Feed
        , feed node2 node2Feed
        ]
        adderNode
  runStep (encodeTensorData [1] input1) (encodeTensorData [1] input2)

main :: IO ()
main = do
  result1 <- runPlaceholder [3.0] [4.5]
  result2 <- runPlaceholder [2.7] [8.9]
  print result1
  print result2

{-
Output:
[7.5]
[11.599999] -- Yay rounding issues!
-}``````

Now we’ll wrap up by going through the simple linear model scenario we already saw in Python. Once again, we’ll take two vectors as our inputs. These will be the values we try to match. Next, we’ll use the `initializedVariable` function to get our variables. We don’t need to call a global variable initializer, though this does affect the state of the session. Notice that we pull each variable out of the monad context, rather than binding it with `let`. (We did the same for placeholders.)

``````import TensorFlow.Core (Tensor, Value, feed, encodeTensorData, Scalar(..))
import TensorFlow.Ops (constant, add, placeholder, sub, reduceSum, mul)
import TensorFlow.GenOps.Core (square)
import TensorFlow.Variable (readValue, initializedVariable, Variable)
import TensorFlow.Session (runSession, run, runWithFeeds)
import TensorFlow.Minimize (gradientDescent, minimizeWith)

import qualified Data.Vector as Vector
import Data.Vector (Vector)

runVariable :: Vector Float -> Vector Float -> IO (Float, Float)
runVariable xInput yInput = runSession $ do
  let xSize = fromIntegral $ Vector.length xInput
  let ySize = fromIntegral $ Vector.length yInput
  (w :: Variable Float) <- initializedVariable 3
  (b :: Variable Float) <- initializedVariable 1
  …``````

Next, we’ll make our placeholders and linear model, and calculate our loss function much as we did before. Then we’ll use the same feed trick to plug in our placeholder values.

``````runVariable :: Vector Float -> Vector Float -> IO (Float, Float)
...
  (x :: Tensor Value Float) <- placeholder [xSize]
  let linear_model = ((readValue w) `mul` x) `add` (readValue b)
  (y :: Tensor Value Float) <- placeholder [ySize]
  let square_deltas = square (linear_model `sub` y)
  let loss = reduceSum square_deltas
  trainStep <- minimizeWith (gradientDescent 0.01) loss [w,b]
  let trainWithFeeds = \xF yF -> runWithFeeds
        [ feed x xF
        , feed y yF
        ]
        trainStep
  …``````

Finally, we’ll run our training step 1000 times on our input data. Then we’ll run our model one more time to pull out the values of our weights and bias. Then we’re done!

``````runVariable :: Vector Float -> Vector Float -> IO (Float, Float)
...
  replicateM_ 1000
    (trainWithFeeds (encodeTensorData [xSize] xInput) (encodeTensorData [ySize] yInput))
  (Scalar w_learned, Scalar b_learned) <- run (readValue w, readValue b)
  return (w_learned, b_learned)

main :: IO ()
main = do
  results <- runVariable [1.0, 2.0, 3.0, 4.0] [4.0, 9.0, 14.0, 19.0]
  print results

{-
Output:
(4.9999948,-0.99998516)
-}``````

## Conclusion

Hopefully this article gave you a taste of some of the possibilities of Tensor Flow in Haskell. We saw a quick introduction to the fundamentals of Tensor Flow. We saw three different kinds of tensors. We then saw code examples both in Python and in Haskell. Finally, we went over a very quick example of a simple linear model and saw how we could learn values to fit that model. Next week, we’ll do a more complicated learning problem. We’ll use the classic “Iris” flower data set and train a classifier using a full neural network.

If you want more details, you should check out our FREE Haskell Tensor Flow Guide. It will walk you through using the Tensor Flow library as a dependency and getting a basic model running!

Perhaps you’re completely new to Haskell but intrigued by the possibilities of using it for machine learning or anything else. You should download our Getting Started Checklist! It has some great resources on installing Haskell and learning the core concepts.

# The Future is Functional: Haskell and the AI-Native World

As regular readers of this blog know, I love talking about the future of Haskell as a language. I’m interested in ways we can shape the future of programming in a way that will help Haskell grow. I've mentioned network effects as a major hindrance a couple different times. Companies are reluctant to try Haskell since there aren't that many Haskell developers. As a result, fewer other developers will have the opportunity to get paid to learn Haskell. And the cycle continues.

Many perfectly smart people also have a bias against using Haskell in production code for a business. This stems from the idea that Haskell is an academic language. They see it as unsuited to “Real World” problems. The best rebuttal to this point is to show the many uses of Haskell in creating systems that people use every day. Now, I can sit here and point to the ease of creating web servers in Haskell. I could also point to the excellent mechanisms for designing front-end UIs. But there’s still one vital area in the future of programming that I have yet to address.

This is, of course, the world of AI and machine learning. AI is slowly (or not so slowly) becoming a primary concern for pretty much any software-based business. The last 5-10 years have seen the rise of “cloud native” architectures and systems. But we will soon be living in an age when all major software systems will use AI and machine learning at their core. In short, we are about to enter the AI Native Future, as my company’s founder put it.

This will be the first in a series of articles where I explore the uses of Haskell in writing AI applications. In the coming weeks I’ll be focusing on using the Tensor Flow bindings for Haskell. Tensor Flow allows programmers to build simple but powerful applications. There are many tutorials in Python, but the Haskell library is still in early stages. So I'll go through the core concepts of this library and show their usage in Haskell.

But for me it’s not enough to show that we can use Haskell for AI applications. That’s hardly going to move the needle or change the status quo. My ultimate goal is to prove that it’s the best tool for these kinds of systems. But first, let’s get an idea of where AI is being used, and why it’s so important.

## AI Will be Everywhere...

...and this doesn’t seem to be particularly controversial. Year-on-year there are more and more discoveries in AI research. We are now able to solve problems that were scarcely thinkable a few years ago. Advancements in NLP systems like IBM Watson have made it so that chatbots are popping up all over the place. Tensor Flow has put advanced deep learning techniques at the fingertips of every programmer. Systems are getting faster and faster.

On top of that, the implications for the general public are becoming more well known. Self-driving cars are roaming the streets in several major American cities. The idea of autonomous Amazon drones making deliveries seems a near certainty. Millions of people entrust part of their home systems to joint IoT/AI devices like Nest. AI is truly going to be everywhere soon.

## AI Needs to be Safe

But the ubiquity of AI presents a large number of concerns as well. Software engineers are going to have a lot more responsibility. Our code will have to make difficult and potentially life-altering decisions. For instance, the design of self-driving cars carries many philosophical implications.

The plot thickens even more when we mix AI with the Internet of Things, another exploding market. In the last year, an attack brought down large parts of the internet using a bot-net of IoT devices. In the world of IoT, security still does not have paramount importance. But soon, more and more people will have cameras, audio recording devices, fire alarms and security systems hooked up to the internet. When this happens, their safety and privacy will depend on IoT security.

The need for safety and security suggests we may need to re-think some software paradigms. "Move fast and break things" is the prevailing mindset in many quarters. But this idea doesn't look so good if "breaking things" means someone's house burns down.

## Pure, Functional Programming is the Path to Safety

So how does this relate to Haskell? Well, let’s consider the tradeoffs we face when we choose a language to develop in. Haskell, with its strong type system and compile-time guarantees, is more reliable. We can catch a lot more errors at compile time compared to languages like Javascript or Python. In those languages, non-syntactic issues tend to pop up only at runtime. Programmers must lean even more heavily on testing systems to catch possible errors. But testing is difficult, and there’s still plenty of disagreement about the best methodologies.

The flip side of this is that it’s somewhat easier to write code in a language like Javascript. It's easier to cut corners in the type system and have more “dynamic” objects. So while we all want “reliable” software, we’re often willing to compromise to get code off the ground faster. This is the epitome of the "Move fast and break things" mindset.

However, the explosion in the safety concerns of our software has elevated the stakes. If someone’s web browser crashes from Javascript, it's no big deal. The user will reload the page and hopefully not trigger that condition again. If your app stops responding, your user might get frustrated and you’ll lose a customer. But when programming starts penetrating other markets, any error could be catastrophic. If a self driving car encounters a sudden bug and the system crashes, many people could die. So it is our ethical responsibility to figure out ways to make it impossible for our software to encounter these kinds of errors.

Haskell is in many respects a very safe language. This is why it’s trusted by large financial institutions, large data science firms, and even by a company working in autonomous flight control. When your code cannot have arbitrary side effects, it is far easier to prevent it from crashing. It is also easier to secure a system (like an IoT device) when you can prevent leaks from arbitrary effects. Often these techniques are present in Haskell but not other languages.

The field of dependent types is yet another area where we’ll be able to add more security to our programming. They'll enable even more compile-time guarantees of behavior. This can add a great deal of safety when used well. Haskell doesn’t have full support for dependent types yet, but it is in the works. In the meantime there are languages like Idris with first class support.

Of course, when it comes to AI and deep learning, getting these guarantees will be difficult. It's one thing to build a type that ensures you have a vector of a particular length. It's quite another to build a type ensuring that when your car sees a dozen people standing ahead of it, it must brake. But these are the sorts of challenges programmers will need to face in the AI Native Future. And if we want to ensure Haskell’s place in that future, we’ll have to show these results are possible.

## Conclusion

It's obvious that AI and machine learning are the big fields of software engineering. They’ll continue to dominate the field for a long time. They can have an incredible impact on our lives. But by allowing this impact, we’re putting more of our safety in the hands of software engineers. This has major implications for how we develop software.

We often have to make tradeoffs between ease of development and reliability. A language like Haskell can offer us a lot of compile time guarantees about the behavior of our program. These guarantees are absent from many other languages. However, achieving these guarantees can introduce more pain into the development process.

But soon our code will be controlling things like self-driving cars, delivery drones, and home security devices. So we have an ethical responsibility to do everything in our power to make our code as reliable as possible. For this reason, Haskell is in a prime position when it comes to the AI Native Future. To take advantage of this, it will require a lot of work. Haskell programmers will have to develop language tools like dependent types to make Haskell even more reliable. We'll also have to contribute to libraries that will make it easy to write machine learning applications in Haskell.

With this in mind, the next few articles on this blog are all going to focus on using Haskell for machine learning. We’ll be starting by going through the basics of the Tensor Flow bindings for Haskell. You can get a sneak peek at some of that content by downloading our Haskell Tensor Flow tutorial!

If you’ve never programmed in Haskell before, you should try it out! We have two great resources for getting started. First, there’s the Getting Started Checklist. It will first walk you through downloading the language. Then it will point you in the directions of some other beginner materials. Second, there’s our Stack mini-course. This will walk you through the Stack tool, which makes it very easy to build projects, organize code, and get dependencies with Haskell.