Putting the Flow in Tensor Flow!
Last week we built our first neural network and used it on a real machine learning problem. We took the Iris data set and built a classifier that took in various flower measurements and determined, with decent accuracy, which species of flower they came from.
But we’ve still only seen half the story of Tensor Flow! We’ve constructed many tensors and combined them in interesting ways. We can imagine what is going on with the “flow”, but we haven’t seen a visual representation of that yet.
We’re in luck though, thanks to the Tensor Board application. With it, we can visualize the computation graph we've created. We can also track certain values throughout our program run. In this article, we’ll take our Iris example and show how to add Tensor Board features to it. Here's the GitHub repo with all the code so you can follow along!
Add an Event Writer
The first thing to understand about Tensor Board is that it gets its data from a source directory. While our system is running, we have to direct our program to write events to that directory. This is how Tensor Board learns what happened during our training run.
eventsDir :: FilePath
eventsDir = "/tmp/tensorflow/iris/logs/"
runIris :: FilePath -> FilePath -> IO ()
runIris trainingFile testingFile = withEventWriter eventsDir $ \eventWriter -> runSession $ do
...
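A quick note: all the logging functions we'll use in this article come from the tensorflow-logging package. If you're following along at home, your imports will look roughly like this (the exact export list may vary a little between library versions):
-- Rough sketch of the logging imports this article relies on.
-- These should all live in TensorFlow.Logging from the tensorflow-logging
-- package, but double-check against your installed version.
import TensorFlow.Logging
  ( withEventWriter
  , logGraph
  , logSummary
  , scalarSummary
  , histogramSummary
  , mergeAllSummaries
  , SummaryTensor
  )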
By itself though, this doesn’t write anything to that directory! To see the consequences of this, let’s boot up Tensor Board.
Running Tensor Board
Running our executable again doesn't bring up Tensor Board. It merely logs the information that Tensor Board uses. To actually see that information, we’ll run the tensorboard command.
>> tensorboard --logdir='/tmp/tensorflow/iris/logs'
Starting TensorBoard 47 at http://0.0.0.0:6006
Then we can point our web browser at the correct port. Since we haven't written anything to the log directory yet, there won’t be much for us to see other than some pretty graphics. So let’s start by logging our graph. This is actually quite easy! Remember our model? We can use the logGraph function combined with our event writer to see it.
model <- build createModel
logGraph eventWriter createModel
Now when we refresh Tensor Board, we’ll see our system’s graph.
But it’s very large and very confusing. The node names aren't meaningful, it’s not clear what data is going where, and we have no idea what’s going on with our error rate. Let’s make a couple of adjustments to fix this.
Adding Summaries
So the first step is to specify some measurements that we’ll have Tensor Board plot for us. One node we can use is a “scalar summary”. This gives us a summary of a particular value over the course of our training run. Let’s do this with our errorRate node, using the simple scalarSummary function.
errorRate_ <- render $ 1 - (reduceMean (cast correctPredictions))
scalarSummary "Error" errorRate_
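We aren't limited to a single scalar summary either. For instance, we could also track the training loss from our model the same way. Here's a hypothetical extra summary (it assumes the loss tensor from our training calculation is in scope):
-- Hypothetical extra summary: track the loss value alongside the error rate.
-- This assumes the `loss` tensor from our training calculation is in scope.
scalarSummary "Loss" loss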
The second type of summary is a histogram summary. We use this on a particular tensor to see the distribution of its values over the course of the run. Let’s do this with our second set of weights. We need to use readValue to go from a Variable to a Tensor.
(finalWeights, finalBiases, finalResults) <-
buildNNLayer numHiddenUnits irisLabels rectifiedHiddenResults
histogramSummary "Weights" (readValue finalWeights)
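We could just as easily watch the final biases too. The pattern is identical (another hypothetical summary):
-- Hypothetical extra summary: watch the distribution of the final biases.
histogramSummary "Biases" (readValue finalBiases)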
So let’s run our program again and refresh Tensor Board. We would expect these new values to show up under the Scalars and Histograms tabs. But they don’t! This is because we still need to write these results to our event writer, and this turns out to be a little complicated. First, before we start training, we have to create a tensor representing all our summaries.
logGraph eventWriter createModel
summaryTensor <- build mergeAllSummaries
Now if we had no placeholders, we could run this tensor whenever we wanted, and it would output the values. But our summary tensors depend on the input placeholders, which complicates matters. So here’s what we’ll do. We’ll only write out the summaries when we check our error rate (every 100 steps). To do this, we’ll change the errorRate function in our model to take the summary tensor as an extra argument. We’ll also have it return a ByteString in addition to the original Float.
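As an aside, if our summaries had no placeholder dependencies, there would be no plumbing to do at all. A rough, hypothetical sketch of that simpler case (using run from TensorFlow.Core, plus the same decodeMessageOrDie helper we'll use below) might look like this:
-- Hypothetical sketch: this only works when the merged summaries don't
-- depend on any placeholders, which is NOT the case for our Iris model.
summaryBytes <- run summaryTensor
let summary = decodeMessageOrDie (unScalar summaryBytes)
logSummary eventWriter 0 summary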
data Model = Model
{ train :: TensorData Float
-> TensorData Int64
-> Session ()
, errorRate :: TensorData Float
-> TensorData Int64
-> SummaryTensor
-> Session (Float, ByteString)
}
Within our model definition, we’ll use this extra parameter. It will run both the errorRate_ tensor AND the summary tensor together with the feeds:
return $ Model
  { train = ...
  , errorRate = \inputFeed outputFeed summaryTensor -> do
      (errorTensorResult, summaryTensorResult) <- runWithFeeds
        [ feed inputs inputFeed
        , feed outputs outputFeed
        ]
        (errorRate_, summaryTensor)
      return (unScalar errorTensorResult, unScalar summaryTensorResult)
  }
Now we need to modify our calls to errorRate below. We’ll pass the summary tensor as an argument, and get the bytes as output. We’ll write it to our event writer (after decoding), and then we’ll be done!
-- Training
forM_ ([0..1000] :: [Int]) $ \i -> do
trainingSample <- liftIO $ chooseRandomRecords trainingRecords
let (trainingInputs, trainingOutputs) = convertRecordsToTensorData trainingSample
(train model) trainingInputs trainingOutputs
when (i `mod` 100 == 0) $ do
(err, summaryBytes) <- (errorRate model) trainingInputs trainingOutputs summaryTensor
let summary = decodeMessageOrDie summaryBytes
liftIO $ putStrLn $ "Current training error " ++ show (err * 100)
logSummary eventWriter (fromIntegral i) summary
liftIO $ putStrLn ""
-- Testing
let (testingInputs, testingOutputs) = convertRecordsToTensorData testRecords
(testingError, _) <- (errorRate model) testingInputs testingOutputs summaryTensor
liftIO $ putStrLn $ "test error " ++ show (testingError * 100)
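One last piece of housekeeping for this step: decodeMessageOrDie and ByteString come from outside the core Tensor Flow modules. The extra imports should look roughly like this (module names are my best guess; adjust them to your proto-lens version):
-- Rough sketch of the extra imports for decoding the summary bytes.
import Data.ByteString (ByteString)
import Data.ProtoLens (decodeMessageOrDie)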
Now we can re-run our program and see what our summaries look like in Tensor Board!
Annotating our Graph
Now let’s look back to our graph. It’s still a bit confusing. We can clean it up a lot by creating “name scopes”. A name scope is part of the graph that we set aside under a single name. When Tensor Board generates our graph, it will create one big block for the scope. We can then zoom in and examine the individual nodes if we want.
We’ll make three different scopes. First, we’ll make a scope for each of the hidden layers of our neural network. This is quite easy, since we already have a function for creating these. All we have to do is make the function take an extra parameter for the name of the scope we want. Then we wrap the whole function within the withNameScope function.
buildNNLayer :: Int64 -> Int64 -> Tensor v Float -> Text
-> Build (Variable Float, Variable Float, Tensor Build Float)
buildNNLayer inputSize outputSize input layerName = withNameScope layerName $ do
weights <- truncatedNormal (vector [inputSize, outputSize]) >>= initializedVariable
bias <- truncatedNormal (vector [outputSize]) >>= initializedVariable
let results = (input `matMul` readValue weights) `add` readValue bias
return (weights, bias, results)
We supply our name further down in the code:
(hiddenWeights, hiddenBiases, hiddenResults) <-
buildNNLayer irisFeatures numHiddenUnits inputs "layer1"
let rectifiedHiddenResults = relu hiddenResults
(finalWeights, finalBiases, finalResults) <-
buildNNLayer numHiddenUnits irisLabels rectifiedHiddenResults "layer2"
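As far as I can tell, name scopes also nest, so we could group things even more finely by putting one withNameScope call inside another. Here's a toy, hypothetical sketch (the nestedScopes name and the computation are made up purely for illustration):
-- Hypothetical illustration: ops built here should show up under an
-- "outer/inner" grouping in the Tensor Board graph view.
nestedScopes :: Build (Tensor Value Float)
nestedScopes = withNameScope "outer" $ withNameScope "inner" $
  render $ scalar (1 :: Float) `add` scalar (2 :: Float)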
Now we’ll add a scope around all our error calculations. First, we combine these into a single action wrapped in withNameScope. Then, since we still need the errorRate_ and train_ steps, we return those from the block. That’s it!
(errorRate_, train_) <- withNameScope "error_calculation" $ do
actualOutput <- render $ cast $ argMax finalResults (scalar (1 :: Int64))
let correctPredictions = equal actualOutput outputs
er <- render $ 1 - (reduceMean (cast correctPredictions))
scalarSummary "Error" er
let outputVectors = oneHot outputs (fromIntegral irisLabels) 1 0
let loss = reduceMean $ fst $ softmaxCrossEntropyWithLogits finalResults outputVectors
let params = [hiddenWeights, hiddenBiases, finalWeights, finalBiases]
tr <- minimizeWith adam loss params
return (er, tr)
Now when we look at our graph, we see that it’s divided into three parts: our two layers, and our error calculation. All the information flows among these three parts (as well as the "Adam" optimizer portion).
Conclusion
By default, Tensor Board graphs can look a little messy. But by adding a little more information to the nodes and using scopes, you can paint a much clearer picture. You can see how the data flows from one end of the application to the other. We can also use summaries to track important information about our graph. We’ll use this most often for the loss function or error rate. Hopefully, we'll see it decline over time.
Next week we’ll add some more complexity to our neural networks. We'll see new tensors for convolution and max pooling. This will allow us to solve the more difficult MNIST digit recognition problem. Stay tuned!
If you’re itching to try out some Tensor Board functionality for yourself, check out our in-depth Tensor Flow guide. It goes into more detail about the practical aspects of using this library. If you want to get the Haskell Tensor Flow library running on your local machine, check it out! Trust me, it's a little complicated, unless you're a Stack wizard already!
And if this is your first exposure to Haskell, try it out! Take a look at our guide to getting started with the language!