Digging in Deep: Solving a Real Problem with Haskell Tensor Flow
Last week we got acquainted with the core concepts of Tensor Flow. We learned about the differences between constants, placeholders, and variable tensors. Both the Haskell and Python bindings have functions to represent these. The Python version was a bit simpler though. Once we had our tensors, we wrote a program that “learned” a simple linear equation.
This week, we’re going to solve an actual machine learning problem. We’re going to use the Iris data set, which contains measurements of different Iris flowers. Each flower belongs to one of three species. Our program will "learn" a function that chooses the species from the measurements. This function will involve a fully-connected neural network.
Formatting our Input
The first step in pretty much any machine learning problem is data processing. After all, our data doesn’t magically get resolved into Haskell data types. Luckily, Cassava is a great library to help us out. The Iris data set consists of data in .csv files that each have a header line and then a series of records. They look a bit like this:
120,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0
...
Each line contains one record. A record has four flower measurements and a final label. In this case, we have three species of flowers we are trying to classify: Iris Setosa, Iris Versicolor, and Iris Virginica. So the last column contains the numbers 0, 1, and 2, corresponding to these respective classes.
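If it helps to see that encoding spelled out, here’s a tiny hypothetical helper (we won’t actually need it anywhere else in the article; the name speciesName is my own invention, and Int64 comes from Data.Int in the appendix):

-- Map the numeric label from the last CSV column back to a species name.
speciesName :: Int64 -> String
speciesName 0 = "Iris Setosa"
speciesName 1 = "Iris Versicolor"
speciesName 2 = "Iris Virginica"
speciesName _ = "unknown label" -- shouldn't happen with this data set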
Let’s create a data type representing each record. Then we can parse the file line-by-line. Our IrisRecord type will contain the feature data and the resulting label. This type will act as a bridge between our raw data and the tensor format we’ll need to run our learning algorithm. We’ll derive the “Generic” typeclass for our record type, and use this to get FromRecord. Once our type has an instance for FromRecord, we can parse it with ease. As a note, throughout this article, I’ll be omitting the imports section from the code samples. I’ve included a full list of imports from these files as an appendix at the bottom. We'll also be using the OverloadedLists extension throughout.
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedLists #-}
...
data IrisRecord = IrisRecord
  { field1 :: Float
  , field2 :: Float
  , field3 :: Float
  , field4 :: Float
  , label :: Int64
  }
  deriving (Generic)

instance FromRecord IrisRecord
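As a quick sanity check that the derived instance behaves the way we expect, we can try decoding a single row by hand. This snippet is only an illustration, not part of the final program; decode, NoHeader, and pack all come from the imports listed in the appendix.

-- Decode one CSV row (no header) into a Vector of IrisRecord values.
singleRecord :: Either String (Vector IrisRecord)
singleRecord = decode NoHeader (pack "6.4,2.8,5.6,2.2,2\n")
-- This should evaluate to a Right vector holding one IrisRecord with label 2.
-- (Derive Show on IrisRecord if you want to print it out and check.)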
Now that we have our type, we’ll write a function, readIrisFromFile, that will read our data in from a CSV file.
readIrisFromFile :: FilePath -> IO (Vector IrisRecord)
readIrisFromFile fp = do
  contents <- readFile fp
  let contentsAsBs = pack contents
  let results = decode HasHeader contentsAsBs :: Either String (Vector IrisRecord)
  case results of
    Left err -> error err
    Right records -> return records
We won’t want to always feed our entire data set into our training system. So given a whole slew of these items, we should be able to pick out a random sample.
sampleSize :: Int
sampleSize = 10
chooseRandomRecords :: Vector IrisRecord -> IO (Vector IrisRecord)
chooseRandomRecords records = do
  let numRecords = Data.Vector.length records
  chosenIndices <- take sampleSize <$> shuffleM [0..(numRecords - 1)]
  return $ fromList $ map (records !) chosenIndices
Once we’ve selected our vector of records to use for each run, we’re still not done. We need to take these records and transform them into the TensorData that we’ll feed into our algorithm. We create items of TensorData by feeding in a shape and then a 1-dimensional vector of values. First, we need to know the shapes of our input and output. Both of these depend on the number of rows in the sample. The “input” will have a column for each of the four features in our set. The output, meanwhile, will have a single column for the label values.
irisFeatures :: Int64
irisFeatures = 4
irisLabels :: Int64
irisLabels = 3
convertRecordsToTensorData :: Vector IrisRecord -> (TensorData Float, TensorData Int64)
convertRecordsToTensorData records = (input, output)
  where
    numRecords = Data.Vector.length records
    input = encodeTensorData [fromIntegral numRecords, irisFeatures] (undefined)
    output = encodeTensorData [fromIntegral numRecords] (undefined)
Now all we need to do is take the various records and turn them into one-dimensional vectors to encode. Here’s the final function:
convertRecordsToTensorData :: Vector IrisRecord -> (TensorData Float, TensorData Int64)
convertRecordsToTensorData records = (input, output)
  where
    numRecords = Data.Vector.length records
    input = encodeTensorData [fromIntegral numRecords, irisFeatures]
      (fromList $ concatMap recordToInputs records)
    output = encodeTensorData [fromIntegral numRecords] (label <$> records)

recordToInputs :: IrisRecord -> [Float]
recordToInputs rec = [field1 rec, field2 rec, field3 rec, field4 rec]
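To see what that flattening produces, here’s a quick sketch with two made-up records (these aren’t rows from the actual data set):

-- Two hypothetical records, just for illustration:
exampleRecords :: Vector IrisRecord
exampleRecords = fromList
  [ IrisRecord 6.4 2.8 5.6 2.2 2
  , IrisRecord 5.0 2.3 3.3 1.0 1
  ]

-- concatMap recordToInputs exampleRecords
--   == [6.4, 2.8, 5.6, 2.2, 5.0, 2.3, 3.3, 1.0]
-- So the "input" TensorData has shape [2, 4] over those 8 values, and the
-- "output" TensorData has shape [2] over the labels [2, 1].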
Neural Network Basics
Now that we’ve got that out of the way, we can start writing our model. Remember, we want to perform two different actions with our model. First, we want to be able to take our training input and train the weights. Second, we want to be able to pass in a test data set and determine the error rate. We can represent these two different functions as a single Model object. Remember that the Session monad is where we run all our Tensor Flow activities. The training will run an action that alters the variables but returns nothing. The error rate calculation will return us a float value.
data Model = Model
  { train :: TensorData Float -- Training input
          -> TensorData Int64 -- Training output
          -> Session ()
  , errorRate :: TensorData Float -- Test input
              -> TensorData Int64 -- Test output
              -> Session Float
  }
Now we’re going to build a fully-connected neural network. We’ll have 4 input units (1 for each of the different features), and then we’ll have 3 output units (1 for each of the classes we’re trying to represent). In the middle, we’ll use a hidden layer consisting of 10 units. This means we’ll need two sets of weights and biases. We’ll write a function that, when given dimensions, will give us the variable tensors for each layer. We want the weight and bias tensors, plus the result tensor of the layer.
buildNNLayer :: Int64 -> Int64 -> Tensor v Float
             -> Build (Variable Float, Variable Float, Tensor Build Float)
buildNNLayer inputSize outputSize input = do
  weights <- truncatedNormal (vector [inputSize, outputSize]) >>= initializedVariable
  bias <- truncatedNormal (vector [outputSize]) >>= initializedVariable
  let results = (input `matMul` readValue weights) `add` readValue bias
  return (weights, bias, results)
We do this in the Build monad, which allows us to construct variables, among other things. We’ll use a truncatedNormal distribution for all our variables to keep things simple. We specify the size of each variable in a vector tensor, and then initialize them. Then we’ll create the resulting tensor by multiplying the input by our weights and adding the bias.
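If you like to keep track of tensor shapes, here’s roughly how they work out for the hidden layer we’ll build in the next section. The sizes come from the constants we defined earlier; this is just a sketch of the arithmetic, not extra code we need.

-- buildNNLayer irisFeatures numHiddenUnits inputs
-- creates weights of shape [4, 10] and a bias of shape [10]. For a batch of
-- n rows, the input [n, 4] `matMul` the weights [4, 10] gives [n, 10], and
-- `add` broadcasts the [10] bias across every one of those n rows.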
Constructing our Model
Now we’ll start building our Model object, again within the Build monad. We begin by specifying our input and output placeholders, as well as the number of hidden units. We’ll also use a batchSize of -1 to account for the fact that we want a variable number of input samples.
irisFeatures :: Int64
irisFeatures = 4
irisLabels :: Int64
irisLabels = 3
-- ^^ From above
createModel :: Build Model
createModel = do
  let batchSize = -1 -- Allows variable sized batches
  let numHiddenUnits = 10
  inputs <- placeholder [batchSize, irisFeatures]
  outputs <- placeholder [batchSize]
Then we’ll get the nodes for the two layers of variables, as well as their results. Between the layers, we’ll add a “rectifier” activation function, relu:
  (hiddenWeights, hiddenBiases, hiddenResults) <-
    buildNNLayer irisFeatures numHiddenUnits inputs
  let rectifiedHiddenResults = relu hiddenResults
  (finalWeights, finalBiases, finalResults) <-
    buildNNLayer numHiddenUnits irisLabels rectifiedHiddenResults
Now we have to get the inferred classes of each output. This means calling argMax to take the class with the highest probability. We’ll also cast the vector and then render it. These are some Haskell-Tensor-Flow specific terms for getting tensors to the right type. Next, we compare that against our output placeholders to see how many we got correct. Then we’ll make a node for calculating the error rate for this run.
  actualOutput <- render $ cast $ argMax finalResults (scalar (1 :: Int64))
  let correctPredictions = equal actualOutput outputs
  errorRate_ <- render $ 1 - (reduceMean (cast correctPredictions))
Now we have to actually do the work of training. First, we’ll make oneHot vectors for our expected outputs. This means converting the label 0 into the vector [1,0,0], and so on. We’ll compare these values against our results (before we took the max), and this gives us our loss function. Then we will make a list of the parameters we want to train. The adam optimizer will minimize our loss function while modifying the params.
  let outputVectors = oneHot outputs (fromIntegral irisLabels) 1 0
  let loss = reduceMean $ fst $ softmaxCrossEntropyWithLogits finalResults outputVectors
  let params = [hiddenWeights, hiddenBiases, finalWeights, finalBiases]
  train_ <- minimizeWith adam loss params
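To make the oneHot step concrete, here’s what it does to a small made-up batch of labels (again just a sketch, not another node in our graph):

-- With irisLabels = 3, a batch of labels [2, 1, 0] becomes these rows:
--   2 -> [0, 0, 1]
--   1 -> [0, 1, 0]
--   0 -> [1, 0, 0]
-- The 1 and 0 arguments we passed to oneHot above are the "on" and "off"
-- values that fill in each row.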
Now we’ve got our errorRate_ and train_ nodes ready. There's one last step here. We have to plug values in for the placeholders and create functions that will take in the tensor data. Remember the feed pattern from last week? We use it again here. Finally, our model is complete!
  return $ Model
    { train = \inputFeed outputFeed ->
        runWithFeeds
          [ feed inputs inputFeed
          , feed outputs outputFeed
          ]
          train_
    , errorRate = \inputFeed outputFeed -> unScalar <$>
        runWithFeeds
          [ feed inputs inputFeed
          , feed outputs outputFeed
          ]
          errorRate_
    }
Tying it all together
Now we’ll write our main function that will run the session. It will have three stages. In the preparation stage, we’ll load our data, and use the build function to get our model. Then we’ll train our model for 1000 steps by choosing samples and converting our records to data. Every 100 steps, we'll print the output. Finally, we’ll determine the resulting error ratio by using the test data.
runIris :: FilePath -> FilePath -> IO ()
runIris trainingFile testingFile = runSession $ do
  -- Preparation
  trainingRecords <- liftIO $ readIrisFromFile trainingFile
  testRecords <- liftIO $ readIrisFromFile testingFile
  model <- build createModel

  -- Training
  forM_ ([0..1000] :: [Int]) $ \i -> do
    trainingSample <- liftIO $ chooseRandomRecords trainingRecords
    let (trainingInputs, trainingOutputs) = convertRecordsToTensorData trainingSample
    (train model) trainingInputs trainingOutputs
    when (i `mod` 100 == 0) $ do
      err <- (errorRate model) trainingInputs trainingOutputs
      liftIO $ putStrLn $ "Current training error " ++ show (err * 100)
      liftIO $ putStrLn ""

  -- Testing
  let (testingInputs, testingOutputs) = convertRecordsToTensorData testRecords
  testingError <- (errorRate model) testingInputs testingOutputs
  liftIO $ putStrLn $ "test error " ++ show (testingError * 100)

  return ()
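To actually kick this off, we just need a main that points runIris at wherever the training and testing CSV files live. The file names below are placeholders; substitute the paths where you saved the Iris data set.

main :: IO ()
main = runIris "iris_train.csv" "iris_test.csv"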
Results
So when we actually run all this, we’ll see the following results, ending with the error rate on our test set:
Current training error 60.000004
Current training error 30.000002
Current training error 39.999996
Current training error 19.999998
Current training error 10.000002
Current training error 10.000002
Current training error 19.999998
Current training error 19.999998
Current training error 10.000002
Current training error 10.000002
Current training error 0.0
test error 3.333336
Our test sample size was 30, so this means we got 29/30 right this time around. Results vary from run to run, though (I obviously picked one of the better results I found). Since our sample size is so small, there's a lot of variance here (sometimes the error rate is as high as 40%). Generally we’ll want to train for longer and test on a larger data set, so that we get more consistent results, but this is a good start.
Conclusion
In this article we went over the basics of making a neural network using the Haskell Tensor Flow library. We made a fully-connected neural network and fed in real data we parsed using the Cassava library. This network was able to learn a function to classify flowers from the Iris data set. Considering the small amount of data, we got some good results.
Come back next week, where we’ll see how we can add some more summary information to our Tensor Flow graph. We’ll use the Tensor Board application to view our graph in a visual format.
For more details on installing the Haskell Tensor Flow system, check out our In-Depth Tensor Flow Tutorial. It should walk you through the important steps in running the code on your own machine.
Perhaps you’ve never tried Haskell before at all, and want to see what it’s like. Maybe I’ve convinced you that Haskell is in fact the future of AI. In that case, you should check out our Getting Started Checklist for some tools on starting with the language.
Appendix: All Imports
Documentation for Haskell Tensor Flow is still a major work in progress. So I want to make sure I explicitly list the modules you need to import for all the different functions we used here.
import Control.Monad (forM_, when)
import Control.Monad.IO.Class (liftIO)
import Data.ByteString.Lazy.Char8 (pack)
import Data.Csv (FromRecord, decode, HasHeader(..))
import Data.Int (Int64)
import Data.Vector (Vector, length, fromList, (!))
import GHC.Generics (Generic)
import System.Random.Shuffle (shuffleM)
import TensorFlow.Core (TensorData, Session, Build, render, runWithFeeds, feed, unScalar, build,
                        Tensor, encodeTensorData)
import TensorFlow.Minimize (minimizeWith, adam)
import TensorFlow.Ops (placeholder, truncatedNormal, add, matMul, relu,
                       argMax, scalar, cast, oneHot, reduceMean, softmaxCrossEntropyWithLogits,
                       equal, vector)
import TensorFlow.Session (runSession)
import TensorFlow.Variable (readValue, initializedVariable, Variable)