Putting the Flow in Tensor Flow!
Last week we built our first neural network and used it on a real machine learning problem. We took the Iris data set and built a classifier that took in various flower measurements and determined, with decent accuracy, which species of flower they came from.
But we’ve still only seen half the story of Tensor Flow! We’ve constructed many tensors and combined them in interesting ways. We can imagine what is going on with the “flow”, but we haven’t seen a visual representation of that yet.
We’re in luck though, thanks to the Tensor Board application. With it, we can visualize the computation graph we've created. We can also track certain values throughout our program run. In this article, we’ll take our Iris example and show how to add Tensor Board features to it. Here's the GitHub repo with all the code so you can follow along!
Add an Event Writer
The first thing to understand about Tensor Board is that it gets its data from a source directory. While our system is running, we have to direct our program to write events to that directory. This is how Tensor Board learns what happened during our training run.
eventsDir :: FilePath
eventsDir = "/tmp/tensorflow/iris/logs/"
runIris :: FilePath -> FilePath -> IO ()
runIris trainingFile testingFile = withEventWriter eventsDir $ \eventWriter -> runSession $ do
...
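A quick note: all the logging functions we'll use in this article come from the tensorflow-logging package. If you're following along at home, your imports will look roughly like this (the exact export list may vary a little between library versions):
-- Rough sketch of the logging imports this article relies on.
-- These should all live in TensorFlow.Logging from the tensorflow-logging
-- package, but double-check against your installed version.
import TensorFlow.Logging
  ( withEventWriter
  , logGraph
  , logSummary
  , scalarSummary
  , histogramSummary
  , mergeAllSummaries
  , SummaryTensor
  )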
By itself though, this doesn’t write anything to that directory! To see the consequences of this, let’s boot up Tensor Board.
Running Tensor Board
Running our executable again doesn't bring up Tensor Board. It merely logs the information that Tensor Board uses. To actually see that information, we’ll run the tensorboard command.
>> tensorboard --logdir='/tmp/tensorflow/iris/logs'
Starting TensorBoard 47 at http://0.0.0.0:6006
Then we can point our web browser at the correct port. Since we haven't written anything to the log directory yet, there won’t be much for us to see other than some pretty graphics. So let’s start by logging our graph. This is actually quite easy! Remember our model? We can use the logGraph function combined with our event writer to see it.
model <- build createModel
logGraph eventWriter createModel
Now when we refresh Tensor Board, we’ll see our system’s graph.
But it’s very large and very confusing. The node names aren't meaningful, it’s not clear what data is going where, and we have no idea what’s going on with our error rate. Let’s make a couple of adjustments to fix this.
Adding Summaries
So the first step is to specify some measurements that we’ll have Tensor Board plot for us. One node we can use is a “scalar summary”. This gives us a summary of a particular value over the course of our training run. Let’s do this with our errorRate node, using the simple scalarSummary function.
errorRate_ <- render $ 1 - (reduceMean (cast correctPredictions))
scalarSummary "Error" errorRate_
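We aren't limited to a single scalar summary either. For instance, we could also track the training loss from our model the same way. Here's a hypothetical extra summary (it assumes the loss tensor from our training calculation is in scope):
-- Hypothetical extra summary: track the loss value alongside the error rate.
-- This assumes the `loss` tensor from our training calculation is in scope.
scalarSummary "Loss" loss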
The second type of summary is a histogram summary. We use this on a particular tensor to see the distribution of its values over the course of the run. Let’s do this with our second set of weights. We need to use readValue to go from a Variable to a Tensor.
(finalWeights, finalBiases, finalResults) <-
buildNNLayer numHiddenUnits irisLabels rectifiedHiddenResults
histogramSummary "Weights" (readValue finalWeights)
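We could just as easily watch the final biases too. The pattern is identical (another hypothetical summary):
-- Hypothetical extra summary: watch the distribution of the final biases.
histogramSummary "Biases" (readValue finalBiases)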
So let’s run our program again and refresh Tensor Board. We would expect these new values to show up under the Scalars and Histograms tabs. But they don’t! This is because we still need to write these results to our event writer, and this turns out to be a little complicated. First, before we start training, we have to create a tensor representing all our summaries.
logGraph eventWriter createModel
summaryTensor <- build mergeAllSummaries
Now if we had no placeholders, we could run this tensor whenever we wanted, and it would output the values. But our summary tensors depend on the input placeholders, which complicates matters. So here’s what we’ll do. We’ll only write out the summaries when we check our error rate (every 100 steps). To do this, we’ll change the errorRate function in our model to take the summary tensor as an extra argument. We’ll also have it return a ByteString in addition to the original Float.
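As an aside, if our summaries had no placeholder dependencies, there would be no plumbing to do at all. A rough, hypothetical sketch of that simpler case (using run from TensorFlow.Core, plus the same decodeMessageOrDie helper we'll use below) might look like this:
-- Hypothetical sketch: this only works when the merged summaries don't
-- depend on any placeholders, which is NOT the case for our Iris model.
summaryBytes <- run summaryTensor
let summary = decodeMessageOrDie (unScalar summaryBytes)
logSummary eventWriter 0 summary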
data Model = Model
{ train :: TensorData Float
-> TensorData Int64
-> Session ()
, errorRate :: TensorData Float
-> TensorData Int64
-> SummaryTensor
-> Session (Float, ByteString)
}
Within our model definition, we’ll use this extra parameter. It will run both the errorRate_ tensor AND the summary tensor together with the feeds:
return $ Model
  { train = ...
  , errorRate = \inputFeed outputFeed summaryTensor -> do
      (errorTensorResult, summaryTensorResult) <- runWithFeeds
        [ feed inputs inputFeed
        , feed outputs outputFeed
        ]
        (errorRate_, summaryTensor)
      return (unScalar errorTensorResult, unScalar summaryTensorResult)
  }
Now we need to modify our calls to errorRate below. We’ll pass the summary tensor as an argument, and get the bytes as output. We’ll write it to our event writer (after decoding), and then we’ll be done!
-- Training
forM_ ([0..1000] :: [Int]) $ \i -> do
trainingSample <- liftIO $ chooseRandomRecords trainingRecords
let (trainingInputs, trainingOutputs) = convertRecordsToTensorData trainingSample
(train model) trainingInputs trainingOutputs
when (i `mod` 100 == 0) $ do
(err, summaryBytes) <- (errorRate model) trainingInputs trainingOutputs summaryTensor
let summary = decodeMessageOrDie summaryBytes
liftIO $ putStrLn $ "Current training error " ++ show (err * 100)
logSummary eventWriter (fromIntegral i) summary
liftIO $ putStrLn ""
-- Testing
let (testingInputs, testingOutputs) = convertRecordsToTensorData testRecords
(testingError, _) <- (errorRate model) testingInputs testingOutputs summaryTensor
liftIO $ putStrLn $ "test error " ++ show (testingError * 100)
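One last piece of housekeeping for this step: decodeMessageOrDie and ByteString come from outside the core Tensor Flow modules. The extra imports should look roughly like this (module names are my best guess; adjust them to your proto-lens version):
-- Rough sketch of the extra imports for decoding the summary bytes.
import Data.ByteString (ByteString)
import Data.ProtoLens (decodeMessageOrDie)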
Now we can re-run our program and see what our summaries look like in Tensor Board!
Annotating our Graph
Now let’s look back to our graph. It’s still a bit confusing. We can clean it up a lot by creating “name scopes”. A name scope is part of the graph that we set aside under a single name. When Tensor Board generates our graph, it will create one big block for the scope. We can then zoom in and examine the individual nodes if we want.
We’ll make three different scopes. First, we’ll make a scope for each of the hidden layers of our neural network. This is quite easy, since we already have a function for creating these. All we have to do is make the function take an extra parameter for the name of the scope we want. Then we wrap the whole function within the withNameScope function.
buildNNLayer :: Int64 -> Int64 -> Tensor v Float -> Text
-> Build (Variable Float, Variable Float, Tensor Build Float)
buildNNLayer inputSize outputSize input layerName = withNameScope layerName $ do
weights <- truncatedNormal (vector [inputSize, outputSize]) >>= initializedVariable
bias <- truncatedNormal (vector [outputSize]) >>= initializedVariable
let results = (input `matMul` readValue weights) `add` readValue bias
return (weights, bias, results)
We supply our name further down in the code:
(hiddenWeights, hiddenBiases, hiddenResults) <-
buildNNLayer irisFeatures numHiddenUnits inputs "layer1"
let rectifiedHiddenResults = relu hiddenResults
(finalWeights, finalBiases, finalResults) <-
buildNNLayer numHiddenUnits irisLabels rectifiedHiddenResults "layer2"
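As far as I can tell, name scopes also nest, so we could group things even more finely by putting one withNameScope call inside another. Here's a toy, hypothetical sketch (the nestedScopes name and the computation are made up purely for illustration):
-- Hypothetical illustration: ops built here should show up under an
-- "outer/inner" grouping in the Tensor Board graph view.
nestedScopes :: Build (Tensor Value Float)
nestedScopes = withNameScope "outer" $ withNameScope "inner" $
  render $ scalar (1 :: Float) `add` scalar (2 :: Float)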
Now we’ll add a scope around all our error calculations. First, we combine these into a single action wrapped in withNameScope. Then, since we still need the errorRate_ and train_ steps, we return those from the block. That’s it!
(errorRate_, train_) <- withNameScope "error_calculation" $ do
actualOutput <- render $ cast $ argMax finalResults (scalar (1 :: Int64))
let correctPredictions = equal actualOutput outputs
er <- render $ 1 - (reduceMean (cast correctPredictions))
scalarSummary "Error" er
let outputVectors = oneHot outputs (fromIntegral irisLabels) 1 0
let loss = reduceMean $ fst $ softmaxCrossEntropyWithLogits finalResults outputVectors
let params = [hiddenWeights, hiddenBiases, finalWeights, finalBiases]
tr <- minimizeWith adam loss params
return (er, tr)
Now when we look at our graph, we see that it’s divided into three parts: our two layers, and our error calculation. All the information flows among these three parts (as well as the "Adam" optimizer portion).
Conclusion
By default, Tensor Board graphs can look a little messy. But by adding a little more information to the nodes and using scopes, you can paint a much clearer picture. You can see how the data flows from one end of the application to the other. We can also use summaries to track important information about our graph. We’ll use this most often for the loss function or error rate. Hopefully, we'll see it decline over time.
Next week we’ll add some more complexity to our neural networks. We'll see new tensors for convolution and max pooling. This will allow us to solve the more difficult MNIST digit recognition problem. Stay tuned!
If you’re itching to try out some Tensor Board functionality for yourself, check out our in-depth Tensor Flow guide. It goes into more detail about the practical aspects of using this library. If you want to get the Haskell Tensor Flow library running on your local machine, check it out! Trust me, it's a little complicated, unless you're a Stack wizard already!
And if this is your first exposure to Haskell, try it out! Take a look at our guide to getting started with the language!