AI

Grenade! Dependently Typed Neural Networks

In the last couple weeks we explored one of the most complex topics I’ve presented on this blog. We examined potential runtime failures that can occur when using Tensor Flow. These included mismatched dimensions and missing placeholders. In an ideal world, we would catch these issues at compile time instead. At its current stage, the Haskell Tensor Flow library doesn’t support that. But we demonstrated that it is possible to add a layer to do this by using dependent types.

Now, I’m still very much of a novice at dependent types, so the solutions I presented were rather clunky. This week I'll show a better example of this concept from a different library. The Grenade library uses dependent types everywhere. It allows us to build verifiably-valid neural networks with extreme concision. So let’s dive in and see what it’s all about!

Shapes and Layers

The first thing to learn with this library is the two concepts of Shapes and Layers. Shapes are best compared to tensors from Tensor Flow, except that they exist at the type level. In Tensor Flow we could build tensors with arbitrary dimensions. Grenade currently only supports up to three dimensions. So the different shape types either start with D1, D2, or D3, depending on the dimensionality of the shape. Then each of these type constructors take a set of natural number parameters. So the following are all valid “Shape” types within Grenade:

D1 5
D2 4 12
D3 8 10 2

The first represents a vector with 5 elements. The second represents a matrix with 4 rows and 12 columns. And the third represents an 8x10x2 matrix (or tensor, if you like). The different numbers represent those values at the type level, not the term level. If this seems confusing, here’s a good tutorial that goes into more depth about the basics of dependent types. The most important idea is that something of type D1 5 can only have 5 elements. A vector of 4 or 6 elements will not type-check.

So now that we know about shapes, let’s examine layers. Layers describe relationships between our shapes. They encapsulate the transformations that happen on our data. The following are all valid layer types:

Relu
FullyConnected 10 20
Convolution 1 10 5 5 1 1

The layer Relu describes a layer that takes in data of any kind of shape and outputs the same shape. In between, it applies the relu activation function to the input data. Since it doesn’t change the shape, it doesn’t need any parameters.

A FullyConnected layer represents the canonical layer of a neural network. It has two parameters, one for the number of input neurons and one for the number of output neurons. In this case, the layer will take 10 inputs and produce 20 outputs.

A Convolution layer represents a 2D convolution like we saw with our MNIST network. This particular example has 1 input feature, 10 output features, uses a 5x5 patch size, and a 1x1 patch offset.

Describing a Network

Now that we have a basic grasp on shapes and layers, we can see how they fit together to create a full network. A network type has two type parameters. The second parameter is a list of the shapes that our data takes at any given point throughout the network. The first parameter is a list of the layers representing the transformations on the data. So let’s say we wanted to describe a very simple network. It will take 4 inputs and produce 10 outputs using a fully connected layer. Then it will perform an Relu activation. This network looks like this:

type SimpleNetwork = Network
  ‘[FullyConnected 4 10, Relu]
  ‘[ ‘D1 4, ‘D1 10, ‘D1 10]

The apostrophes in front of the lists and D1 terms indicated that these are promoted constructors. So they are types instead of terms. To “read” this type, we start with the first data format. We go to each successive data format by applying the transformation layer. So for instance we start with a 4-vector, and transform it into a 10-vector with a fully-connected layer. Then we transform that 10-vector into another 10-vector by applying relu. That’s all there is to it! We could apply another FullyConnected layer onto this that will have 3 outputs like so:

type SimpleNetwork = Network
  ‘[FullyConnected 4 10, Relu, FullyConnected 10 3]
  ‘[ ‘D1 4, ‘D1 10, ‘D1 10, `D1 3]

Let's look at MNIST to see a more complicated example. We'll start with a 28x28 image of data. Then we’ll perform the convolution layer I mentioned above. This gives us a 3-dimensional tensor of size 24x24x10. Then we can perform 2x2 max pooling on this, resulting in a 12x12x10 tensor. Finally, we can apply an Relu layer, which keeps it at the same size:

type MNISTStart = MNISTStart
  ‘[Convolution 1 10 5 5 1 1, Pooling 2 2 2 2, Relu]
  ‘[D2 28 28, D3 24 24 10, D3 12 12 10, D3 12 12 10]

Here’s what a full MNIST example might look like (per the README on the library’s Github page):

type MNIST = Network
    '[ Convolution 1 10 5 5 1 1, Pooling 2 2 2 2, Relu
     , Convolution 10 16 5 5 1 1, Pooling 2 2 2 2, FlattenLayer, Relu
     , FullyConnected 256 80, Logit, FullyConnected 80 10, Logit]
    '[ 'D2 28 28, 'D3 24 24 10, 'D3 12 12 10, 'D3 12 12 10
     , 'D3 8 8 16, 'D3 4 4 16, 'D1 256, 'D1 256
     , 'D1 80, 'D1 80, 'D1 10, 'D1 10]

This is a much simpler and more concise description of our network than we can get in Tensor Flow! Let’s examine the ways in which the library uses dependent types to its advantage.

The Magic of Dependent Types

Describing our network as a type seems like a strange idea if you’ve never used dependent types before. But it gives us a couple great perks!

The first major win we get is that it is very easy to generate the starting values of our network. Since it has a specific type, we can let type inference guide us! We don’t need any term level code that is specific to the shape of our network. All we need to do is attach the type signature and call randomNetwork!

randomSimple :: MonadRandom m => m SimpleNetwork
randomSimple = randomNetwork

This will give us all the initial values we need, so we can get going!

The second (and more important) win is that we can’t build an invalid network! Suppose we try to take our simple network and somehow format it incorrectly. For instance, we could say that instead of the input shape being of size 4, it’s of size 7:

type SimpleNetwork = Network
  ‘[FullyConnected 4 10, Relu, FullyConnected 10 3]
  ‘[ ‘D1 7, ‘D1 10, ‘D1 10, `D1 3]
-- ^^ Notice this 7

This will result in a compile error, since there is a mismatch between the layers. The first layer expects an input of 4, but the first data format is of length 7!

Could not deduce (Layer (FullyConnected 4 10) ('D1 7) ('D1 10))
        arising from a use of ‘randomNetwork’
      from the context: MonadRandom m
        bound by the type signature for:
                   randomSimple :: MonadRandom m => m SimpleNetwork
        at src/IrisGrenade.hs:29:1-48

In other words, it notices that the chain from D1 7 to D1 10 using a FullyConnected 4 10 layer is invalid. So it doesn’t let us make this network. The same thing would happen if we made the layers themselves invalid. For instance, we could make the output and input of the two fully-connected layers not match up:

-- We changed the second to take 20 as the number of input elements.
type SimpleNetwork = Network 
  '[FullyConnected 4 10, Relu, FullyConnected 20 3]
  '[ 'D1 4, 'D1 10, 'D1 20, 'D1 3]

…

/Users/jamesbowen/HTensor/src/IrisGrenade.hs:30:16: error:
    • Could not deduce (Layer (FullyConnected 20 3) ('D1 10) ('D1 3))
        arising from a use of ‘randomNetwork’
      from the context: MonadRandom m
        bound by the type signature for:
                   randomSimple :: MonadRandom m => m SimpleNetwork
        at src/IrisGrenade.hs:29:1-48

So Grenade makes our program much safer by providing compile time guarantees about our network's validity. Runtime errors due to dimensionality are impossible!

Training the Network on Iris

Now let’s do a quick run-through of how we actually train this neural network. Readers with a keen eye may have noticed that the SimpleNetwork we’ve built is the same network we used to train the Iris data set. So we’ll do a training run there, using the following steps:

  1. Write the network type and generate a random network from it
  2. Read our input data into a format that Grenade uses
  3. Write a function to run a training iteration.
  4. Run it!

1. Write the Network type and Generate Network

So we've already done this first step for the most part. We’ll adjust the names a little bit though. Note that I’ll include the imports list as an appendix to the post. Also, the code is on the grenade branch of my Haskell Tensor Flow repository in IrisGrenade.hs!

type IrisNetwork = Network 
  '[FullyConnected 4 10, Relu, FullyConnected 10 3]
  '[ 'D1 4, 'D1 10, 'D1 10, 'D1 3]

randomIris :: MonadRandom m => m IrisNetwork
randomIris = randomNetwork

runIris :: FilePath -> FilePath -> IO ()
runIris trainingFile testingFile = do
  initialNetwork <- randomIris
  ...

2. Take in our Input Data

We’ll make use of the readIrisFromFile function we used back when we first did Iris. Then we'll make a dependent type called IrisRow, which uses the S type. This S type is a container for a shape. We want our input data to use D1 4 for the 4 input features. Then our output data should use D1 3 for the three possible categories.

-- Dependent type on the dimensions of the row
type IrisRow = (S ('D1 4), S ('D1 3))

If we have malformed data, the types will not match up, so we’ll need to return a Maybe to ensure this succeeds. Note that we normalize the data by dividing by 8. This puts all the data between 0 and 1 and makes for better training results. Here's how we parse the data:

parseRecord :: IrisRecord -> Maybe IrisRow
parseRecord record = case (input, output) of
  (Just i, Just o) -> Just (i, o)
  _ -> Nothing
  where
    input = fromStorable $ VS.fromList $ float2Double <$>
      [ field1 record / 8.0, field2 record / 8.0, field3 record / 8.0, field4 record / 8.0]
    output = oneHot (fromIntegral $ label record)

Then we incorporate these into our main function:

runIris :: FilePath -> FilePath -> IO ()
runIris trainingFile testingFile = do
  initialNetwork <- randomIris
  trainingRecords <- readIrisFromFile trainingFile
  testRecords <- readIrisFromFile testingFile

  let trainingData = mapMaybe parseRecord (V.toList trainingRecords)
  let testData = mapMaybe parseRecord (V.toList testRecords)

  -- Catch if any were parsed as Nothing
  if length trainingData /= length trainingRecords || length testData /= length testRecords
    then putStrLn "Hmmm there were some problems parsing the data"
    else …

3. Write a Function to Train the Input Data

This is a multi-step process. First we’ll establish our learning parameters. We'll also write a function that will allow us to call the train function on a particular row element:

learningParams :: LearningParameters
learningParams = LearningParameters 0.01 0.9 0.0005

-- Train the network!
trainRow :: LearningParameters -> IrisNetwork -> IrisRow -> IrisNetwork
trainRow lp network (input, output) = train lp network input output

Next we’ll write two more helper functions that will help us test our results. The first will take the network and a test row. It will transform it into the predicted output and the actual output of the network. The second function will take these outputs and reverse the oneHot process to get the label out (0, 1, or 2).

-- Takes a test row, returns predicted output and actual output from the network.
testRow :: IrisNetwork -> IrisRow -> (S ('D1 3), S ('D1 3))
testRow net (rowInput, predictedOutput) = (predictedOutput, runNet net rowInput)

-- Goes from probability output vector to label
getLabels :: (S ('D1 3), S ('D1 3)) -> (Int, Int)
getLabels (S1D predictedLabel, S1D actualOutput) = 
  (maxIndex (extract predictedLabel), maxIndex (extract actualOutput))

Finally we’ll write a function that will take our training data, test data, the network, and an iteration number. It will return the newly trained network, and log some results about how we’re doing. We’ll first take only a sample of our training data and adjust our parameters so that learning gets slower. Then we'll train the network by folding over the sampled data.

run :: [IrisRow] -> [IrisRow] -> IrisNetwork -> Int -> IO IrisNetwork
run trainData testData network iterationNum = do
  sampledRecords <- V.toList <$> chooseRandomRecords (V.fromList trainData)
  -- Slowly drop the learning rate
  let revisedParams = learningParams 
        { learningRate = learningRate learningParams * 0.99 ^ iterationNum}
  let newNetwork = foldl' (trainRow revisedParams) network sampledRecords
  ....

Then we’ll wrap up the function by looking at our test data, and seeing how much we got right!

run :: [IrisRow] -> [IrisRow] -> IrisNetwork -> Int -> IO IrisNetwork
    run trainData testData network iterationNum = do
      sampledRecords <- V.toList <$> chooseRandomRecords (V.fromList trainData)
      -- Slowly drop the learning rate
      let revisedParams = learningParams 
            { learningRate = learningRate learningParams * 0.99 ^ iterationNum}
      let newNetwork = foldl' (trainRow revisedParams) network sampledRecords
      let labelVectors = fmap (testRow newNetwork) testData
      let labelValues = fmap getLabels labelVectors
      let total = length labelValues
      let correctEntries = length $ filter ((==) <$> fst <*> snd) labelValues
      putStrLn $ "Iteration: " ++ show iterationNum
      putStrLn $ show correctEntries ++ " correct out of: " ++ show total
      return newNetwork

4. Run it!

We’ll call this now from our main function, iterating 100 times, and we’re done!

runIris :: FilePath -> FilePath -> IO ()
runIris trainingFile testingFile = do
 ...
  if length trainingData /= length trainingRecords || length testData /= length testRecords
    then putStrLn "Hmmm there were some problems parsing the data"
    else foldM_ (run trainingData testData) initialNetwork [1..100]

Comparing to Tensor Flow

So now that we’ve looked at a different library, we can consider how it stacks up against Tensor Flow. So first, the advantages. Grenade's main advantage is that it provides dependent type facilities. This means it is more difficult to write incorrect programs. The basic networks you build are guaranteed to have the correct dimensionality. Additionally, it does not use a “placeholders” system, so you can avoid those kinds of errors too. This means you're likely to have fewer runtime bugs using Grenade.

Concision is another major strong point. The training code got a bit involved when translating our data into Grenade's format. But it’s no more complicated than Tensor Flow. When it comes down to the exact definition of the network itself, we do this in only a few lines with Grenade. It’s complicated to understand what those lines mean if you are new to dependent types. But after seeing a few simple examples you should be able to follow the general pattern.

Of course, none of this means that Tensor Flow is without its advantages. As we saw a couple weeks ago, it is not too difficult to add very thorough logging to your Tensor Flow program. The Tensor Board application will then give you excellent visualizations of this data. It is somewhat more difficult to get intermediate log results with Grenade. There is not too much transparency (that I have found at least) into the inner values of the network. The network types are composable though. So it is possible to get intermediate steps of your operation. But if you break your network into different types and stitch them together, you will remove some of the concision of the network.

Also, Tensor Flow also has a much richer ecosystem of machine learning tools to access. Grenade is still limited to a subset of the most common machine learning layers, like convolution and max pooling. Tensor Flow’s API allows approaches like support vector machines and linear models. So Tensor Flow offers you more options.

One question I may explore in a future article would be to compare the performance of the two libraries. My suspicion is that Tensor Flow is faster due to how it gets all its math down to the C-level. But I’m not too familiar yet with HMatrix (which Grenade depends on for its math) and its efficiency. So I could definitely be wrong.

Conclusion

Grenade provides some truly awesome facilities for building a concise neural network. A Grenade program can demonstrate at compile time that the network is well formed. It also allows an incredibly concise way to define what layers your neural network has. It doesn’t have the Google level support that Tensor Flow does. So it lacks many cool features like logging and visualizations. But it is quite a neat library for its scope. One thing I haven’t mentioned is its mechanics for Generative/Adversarial networks. I’d definitely like to try that out soon!

Grenade is a simpler library to incorporate into Stack compared to Tensor Flow. If you want to compare the two, you should check out our Haskell Tensor Flow guide so you can install TF and get started!

If you’ve never written a line of Haskell before, never fear! Download our Getting Started Checklist for some free resources to start your Haskell education!

Appendix: Compiler Extensions and Imports

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE TupleSections #-}
{-# LANGUAGE GADTs #-}

import           Control.Monad (foldM_)
import           Control.Monad.Random (MonadRandom)
import           Control.Monad.IO.Class (liftIO)
import           Data.Foldable (foldl')
import           Data.Maybe (mapMaybe)
import qualified Data.Vector.Storable as VS
import qualified Data.Vector as V
import           GHC.Float (float2Double)
import           Grenade
import           Grenade.Core.LearningParameters (LearningParameters(..))
import           Grenade.Core.Shape (fromStorable)
import           Grenade.Utils.OneHot (oneHot)
import           Numeric.LinearAlgebra (maxIndex)
import           Numeric.LinearAlgebra.Static (extract)

import           Processing (IrisRecord(..), readIrisFromFile, chooseRandomRecords)

Checking it's all in Place: Placeholders and Dependent Types

Last week we dove into the world of dependent types. We linked tensors with their shapes at the type level. This gave our program some extra type safety and allowed us to avoid certain runtime errors.

This week, we’re going to solve another runtime conundrum: missing placeholders. We’ll add some more dependent type machinery to ensure we've plugged in all the necessary placeholders! But we’ll see this is not as straightforward as shapes.

To follow along with the code in this article, take a look at this branch on my Haskell Tensor Flow Github repository. All the code for this article is in DepShape.hs. As usual, I've listed the necessary compiler extensions and imports at the bottom of this article. If you want to run the code yourself, you'll have to get Haskell and Tensor Flow running first. Take a look at our Haskell Tensor Flow guide for that!

Now to start, let’s remind ourselves what placeholders are in Tensor Flow and how we use them.

Placeholder Review

Placeholders represent tensors that can have different values on different application runs. This is often the case when we’re training on different samples of data. Here’s our very simple example in Python. We’ll create a couple placeholder tensors by providing their shapes and no values. Then when we actually run the session, we’ll provide a value for each of those tensors.

node1 = tf.placeholder(tf.float32)
node2 = tf.placeholder(tf.float32)
adderNode = tf.add(node1, node2)
sess = tf.Session()
result1 = sess.run(adderNode, {node1: 3, node2: 4.5 })

The weakness here is that there’s nothing forcing us to provide values for those tensors! We could try running our program without them and we’ll get a runtime crash:

...
sess = tf.Session()
result1 = sess.run(adderNode)
print(result1)
…

Terminal Output:

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder' with dtype float
   [[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Unfortunately, the Haskell Tensor Flow library doesn’t actually do any better here. When we want to fill in placeholders, we provide a list of “feeds”. But our program will still compile even if we pass an empty list! We’ll encounter similar runtime errors:

(node1 :: Tensor Value Float) <- placeholder [1]
(node2 :: Tensor Value Float) <- placeholder [1]
let adderNode = node1 `add` node2
let runStep = \node1Feed node2Feed -> runWithFeeds [] adderNode
runStep (encodeTensorData [1] input1) (encodeTensorData [1] input2)
…

Terminal Output:

TensorFlowException TF_INVALID_ARGUMENT "You must feed a value for placeholder tensor 'Placeholder_1' with dtype float and shape [1]\n\t [[Node: Placeholder_1 = Placeholder[dtype=DT_FLOAT, shape=[1], _device=\"/job:localhost/replica:0/task:0/cpu:0\"]()]]"

In the Iris and MNIST examples, we bury the call to runWithFeeds within our neural network API. We only provide a Model object. This model object forces us to provide the expected input and output tensors. So anyone using our model wouldn't make a manual runWithFeeds call.

data Model = Model
  { train :: TensorData Float
          -> TensorData Int64
          -> Session ()
  , errorRate :: TensorData Float
              -> TensorData Int64
              -> SummaryTensor
              -> Session (Float, ByteString)
  }

This isn’t a bad solution! But it’s interesting to see how we can push the envelope with dependent types, so let’s try that!

Adding More “Safe” Types

The first step we’ll take is to augment Tensor Flow’s TensorData type. We’ll want it to have shape information like SafeTensor and SafeShape. But we’ll also attach a name to each piece of data. This will allow us to identify which tensor to substitute the data in for. At the type level, we refer to this name as a Symbol.

data SafeTensorData a (n :: Symbol) (s :: [Nat]) where
  SafeTensorData :: (TensorType a) => TensorData a -> SafeTensorData a n s

Next, we’ll need to make some changes to our SafeTensor type. First, each SafeTensor will get a new type parameter. This parameter refers to a mapping of names (symbols) to shapes (which are still lists of naturals). We'll call this a placeholder list. So each tensor will have type-level information for the placeholders it depends on. Each different placeholder has a name and a shape.

data SafeTensor v a (s :: [Nat]) (p :: [(Symbol, [Nat])]) where
  SafeTensor :: (TensorType a) => Tensor v a -> SafeTensor v a s p

Now, recall when we substituted for placeholders, we used a list of feeds. But this list had no information about the names or dimensions of its feeds. Let's create a new type containing the different elements we need for our feeds. It should also contain the correct type information about the placeholder list. The first step of to define the type so that it has the list of placeholders it contains, like the SafeTensor.

data FeedList (pl :: [(Symbol, [Nat])]) where

This structure will look like a linked list, like our SafeShape. Thus we’ll start by defining an “empty” constructor:

data FeedList (pl :: [(Symbol, [Nat])]) where
  EmptyFeedList :: FeedList '[]

Now we’ll add a “Cons”-like constructor by creating yet another type operator :--:. Each “piece” of our linked list will contain two different items. First, the tensor we are substituting for. Next, it will have the data we’ll be using for the substitution. We can use type parameters to force their shapes and data types to match. Then we need the resulting placeholder type. We have to append the type-tuple containing the symbol and shape to the previous list. This completes our definition.

data FeedList (pl :: [(Symbol, [Nat])]) where
  EmptyFeedList :: FeedList '[]
  (:--:) :: (KnownSymbol n)
    => (SafeTensor Value a s p, SafeTensorData a n s) 
    -> FeedList pl
    -> FeedList ( '(n, s) ': pl)

infixr 5 :--:

Note that we force the tensor to be a Value tensor. We can only substitute data for rendered tensors, hence this restriction. Let's add a quick safeRender so we can render our SafeTensor items.

safeRender :: (MonadBuild m) => SafeTensor Build a s pl -> m (SafeTensor Value a s pl)
safeRender (SafeTensor t1) = do
  t2 <- render t1
  return $ SafeTensor t2

Making a Placeholder

Now we can write our safePlaceholder function. We’ll add a KnownSymbol as a type constraint. Then we’ll take a SafeShape to give ourselves the type information for the shape. The result is a new tensor that maps the symbol and the shape in the placeholder list.

safePlaceholder :: (MonadBuild m, TensorType a, KnownSymbol sym) => 
  SafeShape s -> m (SafeTensor Value a s '[ '(sym, s)])
safePlaceholder shp = do
  pl <- placeholder (toShape shp)
  return $ SafeTensor pl

This looks a little crazy, and it kind’ve is! But we’ve now created a tensor that stores its own placeholder information at the type level!

Updating Old Code

Now that we’ve done this, we’re also going to have to update some of our older code. The first part of this is pretty straightforward. We’ll need to change safeConstant so that it has the type information. It will have an empty list for the placeholders.

safeConstant :: (TensorType a, ShapeProduct s ~ n) => 
  Vector n a -> SafeShape s -> SafeTensor Build a s '[]
safeConstant elems shp = SafeTensor (constant (toShape shp) (toList elems))

Our mathematical operations will be a bit more tricky though. Consider adding two arbitrary tensors. They may share placeholder dependencies but may not. What should be the placeholder type for the resulting tensor? Obviously the union of the two placeholder maps of the input tensors! Luckily for us, we can use Union from the type-list library to represent this concept.

safeAdd :: (TensorType a, a /= Bool, TensorKind v)
  => SafeTensor v a s p1
  -> SafeTensor v a s p2
  -> SafeTensor Build a s (Union p1 p2)
safeAdd (SafeTensor t1) (SafeTensor t2) = SafeTensor (t1 `add` t2)

We’ll make the same update with matrix multiplication:

safeMatMul :: (TensorType a, a /= Bool, a /= Int8, a /= Int16,
               a /= Int64, a /= Word8, a /= ByteString, TensorKind v)
   => SafeTensor v a '[i,n] p1 -> SafeTensor v a '[n,o] p2 -> SafeTensor Build a '[i,o] (Union p1 p2)
safeMatMul (SafeTensor t1) (SafeTensor t2) = SafeTensor (t1 `matMul` t2)

Running with Placeholders

Now we have all the information we need to write our safeRun function. This will take a SafeTensor, and it will also take a FeedList with the same placeholder type. Remember, a FeedList contains a series of SafeTensorData items. They must match up symbol-for-symbol and shape-for-shape with the placeholders within the SafeTensor. Let’s look at the type signature:

safeRun :: (TensorType a, Fetchable (Tensor v a) r) =>
  FeedList pl -> SafeTensor v a s pl -> Session r

The Fetchable constraint enforces that we can actually get the “result” r out of our tensor. For instance, we can "fetch" a vector of floats out of a tensor that uses Float as its underlying value.

We’ll next define a tail-recursive helper function to build the vanilla “list of feeds” out of our FeedList. Through pattern matching, we can pick out the tensor to substitute for and the data we’re using. We can combine these into a feed and append to the growing list:

safeRun = ...
  where
    buildFeedList :: FeedList ss -> [Feed] -> [Feed]
    buildFeedList EmptyFeedList accum = accum
    buildFeedList ((SafeTensor tensor_, SafeTensorData data_) :--: rest) accum = 
      buildFeedList rest ((feed tensor_ data_) : accum)

Now all we have to do to finish up is call the normal runWithFeeds function with the list we’ve created!

safeRun :: (TensorType a, Fetchable (Tensor v a) r) =>
  FeedList pl -> SafeTensor v a s pl -> Session r
safeRun feeds (SafeTensor finalTensor) = runWithFeeds (buildFeedList feeds []) finalTensor
  where
  ...

And here’s what it looks like to use this in practice with our simple example. Notice the type signatures do get a little cumbersome. The signatures we place on the initial placeholder tensors are necessary. Otherwise the compiler wouldn't know what label we're giving them! The signature containing the union of the types is unnecessary. We can remove it if we want and let type inference do its work.

main3 :: IO (VN.Vector Float)
main3 = runSession $ do
  let (shape1 :: SafeShape '[2,2]) = fromJust $ fromShape (Shape [2,2])
  (a :: SafeTensor Value Float '[2,2] '[ '("a", '[2,2])]) <- safePlaceholder shape1
  (b :: SafeTensor Value Float '[2,2] '[ '("b", '[2,2])] ) <- safePlaceholder shape1
  let result = a `safeAdd` b
  (result_ :: SafeTensor Value Float '[2,2] '[ '("b", '[2,2]), '("a", '[2,2])]) <- safeRender result
  let (feedA :: Vector 4 Float) = fromJust $ fromList [1,2,3,4]
  let (feedB :: Vector 4 Float) = fromJust $ fromList [5,6,7,8]
  let fullFeedList = (b, safeEncodeTensorData shape1 feedB) :--:
                     (a, safeEncodeTensorData shape1 feedA) :--:
                     EmptyFeedList
  safeRun fullFeedList result_

{- It runs!
[6.0,8.0,10.0,12.0]
-}

Now suppose we make some mistakes with our types. Here we’ll take out the “A” feed from our feed list:

-- Let’s take out Feed A!
main = …
  let fullFeedList = (b, safeEncodeTensorData shape1 feedB) :--:
                     EmptyFeedList
  safeRun fullFeedList result_

{- Compiler Error!
• Couldn't match type ‘'['("a", '[2, 2])]’ with ‘'[]’
      Expected type: SafeTensor Value Float '[2, 2] '['("b", '[2, 2])]
        Actual type: SafeTensor
                       Value Float '[2, 2] '['("b", '[2, 2]), '("a", '[2, 2])]
-}

Here’s what happens when we try to substitute a vector with the wrong size. It will identify that we have the wrong number of elements!

main = …
  -- Wrong Size!
  let (feedA :: Vector 8 Float) = fromJust $ fromList [1,2,3,4,5,6,7,8]
  let (feedB :: Vector 4 Float) = fromJust $ fromList [5,6,7,8]
  let fullFeedList = (b, safeEncodeTensorData shape1 feedB) :--:
                     (a, safeEncodeTensorData shape1 feedA) :--:
                     EmptyFeedList
  safeRun fullFeedList result_

{- Compiler Error!
Couldn't match type ‘4’ with ‘8’
        arising from a use of ‘safeEncodeTensorData’
-}

Conclusion: Pros and Cons

So let’s take a step back and look at what we’ve constructed here. We’ve managed to provide ourselves with some pretty cool compile time guarantees. We’ve also added de-facto documentation to our code. Anyone familiar with the codebase can tell at a glance what placeholders we need for each tensor. It’s a lot harder now to write incorrect code. There are still error conditions of course. But if we’re smart we can write our code to deal with these all upfront. That way we can fail gracefully instead of throwing a random run-time crash somewhere.

But there are drawbacks. Imagine being a Haskell novice and walking into this codebase. You’ll have no real clue what’s going on (I wouldn’t have 2 months ago). The types are very cumbersome after a while, so continuing to write them down gets very tedious. Though as I mentioned, type inference can deal with a lot of that. But if you don’t track them, the type union can be finicky about the ordering of your placeholders. We could fix this with another type family though.

All these factors could present a real drag on development. But then again, tracking down run-time errors can also do this. Tensor Flow’s error messages can still be a little cryptic. This can make it hard to find root causes.

Since I’m still a novice with dependent types, this code was a little messy. Next week we’ll take a look at a more polished library that uses dependent types for neural networks. We’ll see how the Grenade library allows us to specify a learning system in just a few lines of code!

If you’re new to Haskell, I hope none of this dependent type madness scared you! The language is much easier than these last couple posts make it seem! Try it out, and download our Getting Started Checklist. It'll give you some instructions and tools to help you learn!

If you’re an experienced Haskeller and want to try out Tensor Flow, download our Tensor Flow Guide! It will walk you through incorporating the library into a Stack project!

Appendix: Compiler Extensions and Imports

{-# LANGUAGE GADTs                #-}
{-# LANGUAGE DataKinds            #-}
{-# LANGUAGE KindSignatures       #-}
{-# LANGUAGE TypeOperators        #-}
{-# LANGUAGE ScopedTypeVariables  #-}
{-# LANGUAGE TypeFamilies         #-}
{-# LANGUAGE FlexibleContexts     #-}
{-# LANGUAGE UndecidableInstances #-}

import           Data.ByteString (ByteString)
import           Data.Int (Int64, Int8, Int16)
import           Data.Maybe (fromJust)
import           Data.Proxy (Proxy(..))
import           Data.Type.List (Union)
import qualified Data.Vector as VN
import           Data.Vector.Sized (Vector, toList, fromList)
import           Data.Word (Word8)
import           GHC.TypeLits (Nat, KnownNat, natVal)
import           GHC.TypeLits

import           TensorFlow.Core
import           TensorFlow.Core (Shape(..), TensorType, Tensor, Build)
import           TensorFlow.Ops (constant, add, matMul, placeholder)
import           TensorFlow.Session (runSession, run)
import           TensorFlow.Tensor (TensorKind)

Deep Learning and Deep Types: Tensor Flow and Dependent Types

In the introduction to this series, one primary point I made was that Haskell is a safe language. There are a lot of errors we will catch at compile time, rather than runtime. Runtime errors can often be catastrophic to a system, so being able to reduce these is paramount. This is especially true when programming an autonomous car or drone. These objects will be out in the real world where they can hurt people if they malfunction.

So let’s take a look back at some of the code we’ve written over the last 3 or 4 weeks. Is it actually any safer? We’ll find the answer is, well, not so much. It's hard to verify certain properties about code. But the facilities for making this code safer do exist in Haskell! In the next couple articles we'll do some serious hacking with dependent types. We'll be able to prove some of these difficult properties of AI programs at compile time!

The next three articles will focus on dependent type programming. This is a difficult topic, so don’t worry if you can’t follow all the code examples completely. The main idea of making our machine learning code safer is what’s important! So without further ado, let’s rewind to the beginning to see where runtime issues can appear.

If you want to play with this code yourself, check out the dependent shapes branch on my Github repository! All the code for this article is in DepShape.hs Though if you want to get the code to run, you'll probably also need to get Haskell Tensor Flow working. Download our Haskell Tensor Flow Guide for instructions on that!

Issues with Python

Python, as an interpreted language, is definitely subject to runtime bugs. As I was first learning Tensor Flow, I came across a lot of these that were quite common. The two that stood out to me most were placeholder failures and dimension mismatches. For instance, let’s think back to one of the first examples. Our code will have a couple of placeholders, and we submit values for those when we run the session:

node1 = tf.placeholder(tf.float32)
node2 = tf.placeholder(tf.float32)
adderNode = tf.add(node1, node2)
sess = tf.Session()
result1 = sess.run(adderNode, {node1: 3, node2: 4.5 })

But there’s nothing stopping us from trying to run the session without submitting values. This will result in a runtime crash:

...
sess = tf.Session()
result1 = sess.run(adderNode)
print(result1)
…

Terminal Output:

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder' with dtype float
   [[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Another issue that came up from time to time was dimension mismatches. Certain operations need certain relationships between the dimensions of the tensors. For instance, you can’t add two vectors with different lengths:

node1 = tf.constant([3.0, 4.0, 5.0], dtype=tf.float32)
node2 = tf.constant([4.0, 16.0], dtype=tf.float32)
additionNode = tf.add(node1, node2)

sess = tf.Session()
result = sess.run(additionNode)
print(result)

…

Terminal Output:

ValueError: Dimensions must be equal, but are 3 and 2 for 'Add' (op: 'Add') with input shapes: [3], [2].

Again, we get a runtime crash. These seem like the kinds of problems we can solve at compile time.

Does Haskell Solve these Issues?

But anyone who takes a close look at the Haskell code I’ve written so far can see that it doesn’t solve these issues! Here’s a review of our basic placeholder example:

runPlaceholder :: Vector Float -> Vector Float -> IO (Vector Float)
runPlaceholder input1 input2 = runSession $ do
  (node1 :: Tensor Value Float) <- placeholder [1]
  (node2 :: Tensor Value Float) <- placeholder [1]
  let adderNode = node1 `add` node2
  let runStep = \node1Feed node2Feed -> runWithFeeds 
        [ feed node1 node1Feed
        , feed node2 node2Feed
        ] 
        adderNode
  runStep (encodeTensorData [1] input1) (encodeTensorData [1] input2)

Notice how the runWithFeeds function takes a list of Feed objects. The code would still compile fine if we supplied the empty list. Then it would face a fate no better than our Python code:

…
let runStep = \node1Feed node2Feed -> runWithFeeds [] adderNode
…

Terminal Output:

TensorFlowException TF_INVALID_ARGUMENT "You must feed a value for placeholder tensor 'Placeholder_1' with dtype float and shape [1]\n\t [[Node: Placeholder_1 = Placeholder[dtype=DT_FLOAT, shape=[1], _device=\"/job:localhost/replica:0/task:0/cpu:0\"]()]]"

For the second example of dimensionality, we can also make this mistake in Haskell. The following code compiles and will crash at runtime:

runSimple :: IO (Vector Float)
runSimple = runSession $ do
  let node1 = constant [3] [3 :: Float, 4, 5]
  let node2 = constant [2] [4 :: Float, 5]
  let additionNode = node1 `add` node2
  run additionNode
…

Terminal Output:
TensorFlowException TF_INVALID_ARGUMENT "Incompatible shapes: [3] vs. [2]\n\t [[Node: Add_2 = Add[T=DT_FLOAT, _device=\"/job:localhost/replica:0/task:0/cpu:0\"](Const_0, Const_1)]]"

At an even more basic level, we don’t even have to tell the truth about the shape of our vectors! We can give a bogus shape value and it will still compile!

let node1 = constant [3, 2, 3] [3 :: Float, 4, 5]
…

Terminal Output:
invalid tensor length: expected 18 got 3
CallStack (from HasCallStack):
  error, called at src/TensorFlow/Ops.hs:299:23 in tensorflow-ops-0.1.0.0-EWsy8DQdciaL8o6yb2fUKR:TensorFlow.Ops

Can we do better?

Now, we did do some things right. Let's think back to our Model type when we made neural networks.

data Model = Model
  { train :: TensorData Float
          -> TensorData Int64
          -> Session ()
  , errorRate :: TensorData Float
              -> TensorData Int64
              -> SummaryTensor
              -> Session (Float, ByteString)
  }

We exposed our training step as a function. This function forced the user to supply both of the tensors for the placeholders. This is good, but doesn't protect us from dimension issues.

When trying to solve these, we could write wrappers around every operation. Functions like add and matMul could return Maybe values. But this would be clunky. We could take this same step in Python. Granted, monads would allow the Haskell version to compose better. But it would be nicer if we could check our errors all at once, up front.

If we’re willing to dig quite a bit deeper, we can solve these problems! In the rest of this post, we’ll explore using dependent types to ensure dimensions are always correct. Getting placeholders right turns out to be a little more complicated though! So we’ll save that for next week’s post.

Checking Dimensions

Currently, the Tensor Types we’ve been dealing with have no type safety on the dimensions. Tensor Flow doesn't provide this information when interacting with the C library. So it’s impossible to enforce it at a low level. But this doesn’t stop us from writing wrappers that allow us to solve this.

To write these wrappers, we’re going to need to dive into dependent types. I’ll give a high level overview of what’s going on. But for some details on the basics, you should check out this tutorial . I’ll also give a shout-out to Renzo Carbonara, author of the Exinst library and other great Haskell things. He helped me a lot in crossing a couple big knowledge gaps for implementing dependent types.

Intro to Dependent Types: Sized Vectors

The simplest example for introducing dependent types is the idea of sized vectors. If you read the tutorial above, you'll see how they're implemented from scratch. A normal vector has a single type parameter, referring to what type of item the vector contains. A sized vector has an extra type parameter, and this type refers to the size of the vector. For instance, the following are valid sized vector types:

import Data.Vector.Sized (Vector, fromList)

vectorWith2 :: Vector 2 Int64
...
vectorWith6 :: Vector 6 Float
...

In the first type signature, 2 does not refer to the term 2. It refers to the type 2. That is, we’ve taken the term and promoted it to a type which has only a single value. The mechanics of how this works are confusing, but here’s the result. We can try to convert normal vectors to sized vectors. But the operation will fail if we don’t match up the size.

import Data.Vector.Sized (Vector, fromList)
import GHC.TypeLits (KnownNat)

-- fromList :: (KnownNat n) => [a] -> Maybe (Vector n a)

-- This results in a “Just” value!
success :: Maybe (Vector 2 Int64)
success = fromList [5,6]

-- The sizes don’t match, so we’ll get “Nothing”!
failure :: Maybe (Vector 2 Int64)
failure = fromList [3,1,5]

The KnownNat constraint allows us to specify that the type n refers to a single natural number. So now we can assign a type signature that encapsulates the size of the list.

A “Safe” Shape type

Now that we have a very basic understanding of dependent types, let's come up with a gameplan for Tensor Flow. The first step will be to make a new type that puts the shape into the type signature. We'll make a SafeShape type that mimics the sized vector type. Instead of storing a single number as the type, it will store the full list of dimensions. We want to create an API something like this:

-- fromShape :: Shape -> Maybe (SafeShape s)

-- Results in a “Just” value
goodShape :: Maybe (SafeShape ‘[2, 2])
goodShape = fromShape (Shape [2,2])

-- Results in Nothing
badShape :: Maybe (SafeShape ‘[2,2])
badShape = fromShape (Shape [3,3,2])

So to do this, we first define the SafeShape type. This follows the example of sized vectors. See the appendix below for compiler extensions and imports used throughout this article. In particular, you want GADTs and DataKinds.

data SafeShape (s :: [Nat]) where
  NilShape :: SafeShape '[]
  (:--) :: KnownNat m => Proxy m -> SafeShape s -> SafeShape (m ': s)

infixr 5 :--

Now we can define the toShape function. This will take our SafeShape and turn it into a normal Shape using proxies.

toShape :: SafeShape s -> Shape
toShape NilShape = Shape []
toShape ((pm :: Proxy m) :-- s) = Shape (fromInteger (natVal pm) : s')
  where
    (Shape s') = toShape s

Now for the reverse direction, we first have to make a class MkSafeShape. This class encapsulates all the types that we can turn into the SafeShape type. We’ll define instances of this class for all lists of naturals.

class MkSafeShape (s :: [Nat]) where
  mkSafeShape :: SafeShape s
instance MkSafeShape '[] where
  mkSafeShape = NilShape
instance (MkSafeShape s, KnownNat m) => MkSafeShape (m ': s) where
  mkSafeShape = Proxy :-- mkSafeShape

Now we can define our fromShape function using the MkSafeShape class. To check if it works, we’ll compare the resulting shape to the input shape and make sure they’re equal. Note this requires us to define a simple instance of Eq Shape.

instance Eq Shape where
  (==) (Shape s) (Shape r) = s == r

fromShape :: forall s. MkSafeShape s => Shape -> Maybe (SafeShape s)
fromShape shape = if toShape myShape == shape
  then Just myShape
  else Nothing
  where
    myShape = mkSafeShape :: SafeShape s

Now that we’ve done this for Shape, we can create a similar type for Tensor that will store the shape as a type parameter.

data SafeTensor v a (s :: [Nat]) where
  SafeTensor :: (TensorType a) => Tensor v a -> SafeTensor v a s

Using our Safe Types

So what has all this gotten us? Our next goal is to create a safeConstant function. This will let us create a SafeTensor wrapping a constant tensor and storing the shape. Remember, constant takes a shape and a vector without ensuring correlation between them. We want something like this:

safeConstant :: (TensorType a) => Vector n a -> SafeShape s -> SafeTensor Build a s
safeConstant elems shp = SafeTensor $ constant (toShape shp) (toList elems)

This will attach the given shape to the tensor. But there’s one piece missing. We also want to create a connection between the number of input elements and the shape. So something with shape [3,3,2] should force you to input a vector of length 18. And right now, there is no constraint between n and s.

We’ll add this with a type family called ShapeProduct. The instances will state that the correct natural type for a given list of naturals is the product of them. We define the second instance with recursion, so we'll need UndecidableInstances.

type family ShapeProduct (s :: [Nat]) :: Nat
type instance ShapeProduct '[] = 1
type instance ShapeProduct (m ': s) = m * ShapeProduct s

Now we’re almost done with this part! We can fix our safeConstant function by adding a constraint on the ShapeProduct between s and n.

safeConstant :: (TensorType a, ShapeProduct s ~ n) => Vector n a -> SafeShape s -> SafeTensor Build a s
safeConstant elems shp = SafeTensor $ constant (toShape shp) (toList elems)

Now we can write out a simple use of our safeConstant function as follows:

main :: IO (VN.Vector Int64)
main = runSession $ do
  let (shape1 :: SafeShape '[2,2]) = fromJust $ fromShape (Shape [2,2])
  let (elems1 :: Vector 4 Int64) = fromJust $ fromList [1,2,3,4]
  let (constant1 :: SafeTensor Build Int64 '[2,2]) = safeConstant elems1 shape1
  let (SafeTensor t) = constant1
  run t

We’re using fromJust as a shortcut here. But in a real program you would read your initial tensors in and check them as Maybe values. There's still the possibility for runtime failures. But this system has a couple advantages. First, it won't crash. We'll have the opportunity to handle it gracefully. Second, we do all the error checking up front. Once we've assigned types to everything, all the failure cases should be covered.

Going back to the last example, let's change something. For instance, we could make our vector have length 3 instead of 4. We’ll now get a compile error!

main :: IO (VN.Vector Int64)
main = runSession $ do
  let (shape1 :: SafeShape '[2,2]) = fromJust $ fromShape (Shape [2,2])
  let (elems1 :: Vector 3 Int64) = fromJust $ fromList [1,2,3]
  let (constant1 :: SafeTensor Build Int64 '[2,2]) = safeConstant elems1 shape1
  let (SafeTensor t) = constant1
  run t

…

    • Couldn't match type ‘4’ with ‘3’
        arising from a use of ‘safeConstant’
    • In the expression: safeConstant elems1 shape1
      In a pattern binding:
        (constant1 :: SafeTensor Build Int64 '[2, 2])
          = safeConstant elems1 shape1

Adding Type Safe Operations

Now that we’ve attached shape information to our tensors, we can define safer math operations. It's easy to write a safe addition function that ensures that the tensors have the same shape:

safeAdd :: (TensorType a, a /= Bool) => SafeTensor Build a s -> SafeTensor Build a s -> SafeTensor Build a s
safeAdd (SafeTensor t1) (SafeTensor t2) = SafeTensor (t1 `add` t2)

Here’s a similar matrix multiplication function. It ensures we have 2-dimensional shapes and that the dimensions work out. Notice the two tensors share the n dimension. It must be the column dimension of the first tensor and the row dimension of the second tensor:

safeMatMul :: (TensorType a, a /= Bool, a /= Int8, a /= Int16, a /= Int64, a /= Word8, a /= ByteString)
   => SafeTensor Build a '[i,n] -> SafeTensor Build a '[n,o] -> SafeTensor Build a '[i,o]
safeMatMul (SafeTensor t1) (SafeTensor t2) = SafeTensor (t1 `matMul` t2)

Here are these functions in action:

main2 :: IO (VN.Vector Float)
main2 = runSession $ do
  let (shape1 :: SafeShape '[4,3]) = fromJust $ fromShape (Shape [4,3])
  let (shape2 :: SafeShape '[3,2]) = fromJust $ fromShape (Shape [3,2])
  let (shape3 :: SafeShape '[4,2]) = fromJust $ fromShape (Shape [4,2])
  let (elems1 :: Vector 12 Float) = fromJust $ fromList [1,2,3,4,1,2,3,4,1,2,3,4]
  let (elems2 :: Vector 6 Float) = fromJust $ fromList [5,6,7,8,9,10]
  let (elems3 :: Vector 8 Float) = fromJust $ fromList [11,12,13,14,15,16,17,18]
  let (constant1 :: SafeTensor Build Float '[4,3]) = safeConstant elems1 shape1
  let (constant2 :: SafeTensor Build Float '[3,2]) = safeConstant elems2 shape2
  let (constant3 :: SafeTensor Build Float '[4,2]) = safeConstant elems3 shape3
  let (multTensor :: SafeTensor Build Float '[4,2]) = constant1 `safeMatMul` constant2
  let (addTensor :: SafeTensor Build Float '[4,2]) = multTensor `safeAdd` constant3
  let (SafeTensor finalTensor) = addTensor
  run finalTensor

And of course we’ll get compile errors if we use incorrect dimensions anywhere. Let’s say we change multTensor to use [4,3] as its type:

• Couldn't match type ‘2’ with ‘3’
      Expected type: SafeTensor Build Float '[4, 3]
        Actual type: SafeTensor Build Float '[4, 2]
    • In the expression: constant1 `safeMatMul` constant2
…
 • Couldn't match type ‘3’ with ‘2’
      Expected type: SafeTensor Build Float '[4, 2]
        Actual type: SafeTensor Build Float '[4, 3]
    • In the expression: multTensor `safeAdd` constant3
…
• Couldn't match type ‘2’ with ‘3’
      Expected type: SafeTensor Build Float '[4, 3]
        Actual type: SafeTensor Build Float '[4, 2]
    • In the second argument of ‘safeAdd’, namely ‘constant3’

Conclusion

In this exercise we got deep into the weeds of one of the most difficult topics to learn about in Haskell. Dependent types will make your head spin at first. But we saw a concrete example of how they can allow us to detect problematic code at compile time. They are a form of documentation that also enables us to verify that our code is correct in certain ways.

Types do not replace tests (especially behavioral tests). But in this instance there are at least a few different test cases we don’t need to worry about too much. Next week, we’ll see how we can apply these principles to verifying placeholders.

If you want to learn more about the nuts and bolts of using Haskell Tensor Flow, you should check out our Tensor Flow Guide. It will guide you through the basics of adding Tensor Flow to a simple Stack project.

Maybe you’ve never used Haskell before but I’ve convinced you that dependent types are the future. If you want to try it out, download our Getting Started Checklist. You can also learn how to create and organize Haskell projects using Stack! Checkout our Stack mini-course!

Appendix: Extensions and Imports

{-# LANGUAGE GADTs                #-}
{-# LANGUAGE DataKinds            #-}
{-# LANGUAGE KindSignatures       #-}
{-# LANGUAGE TypeOperators        #-}
{-# LANGUAGE ScopedTypeVariables  #-}
{-# LANGUAGE TypeFamilies         #-}
{-# LANGUAGE UndecidableInstances #-}

import           Data.ByteString (ByteString)
import           Data.Constraint (Constraint)
import           Data.Int (Int64, Int8, Int16)
import           Data.Maybe (fromJust)
import           Data.Proxy (Proxy(..))
import qualified Data.Vector as VN
import           Data.Vector.Sized (Vector(..), toList, fromList)
import           Data.Word (Word8)
import           GHC.TypeLits (Nat, KnownNat, natVal)
import           GHC.TypeLits

import           TensorFlow.Core
import           TensorFlow.Core (Shape(..), TensorType, Tensor, Build)
import           TensorFlow.Ops (constant, add, matMul)
import           TensorFlow.Session (runSession, run)

Deeper Still: Convolutional Neural Networks

Two weeks ago, we began our machine study in earnest by constructing a full neural network. But this network was still quite simple by deep learning standards. In this article, we’re going to tackle a much more difficult problem: image recognition. Of course, we’ll still be using a well known data set with well-known results, so this is only the tip of the iceberg. We'll be using the MNIST data set. This set classifies images of handwritten digits as the numbers 0-9. This problem is so well-known that the folks at Tensor Flow refer to it as the “Hello World” of machine learning.

We’ll start this problem by using a very similar approach to what we used with the Iris data set. We’ll make a fully-connected neural network with two layers, and then use the “Adam” optimizer. This will give us some decent results by our beginner standards. But MNIST is a well known problem with a very large data set. So we’re going to hold ourselves to a higher standard of accuracy this time. This will force us to use some more advanced techniques. But to start with, let’s examine what we need to change to adapt our Iris model to work for the MNIST problem. As with the last couple weeks, the code for all this is on Github if you want to follow along.

Re-use and Recycle!

Generally, we can re-use most of the code we had with Iris, which is good news! We still have to make a few adjustments here and there though. First, we’ll use some different constants. We’ll use mnistFeatures in place of irisFeatures, and mnistLabels instead of irisLabels. We’ll also bump up the size of our hidden layer and the number of samples we’ll draw on each iteration:

mnistFeatures :: Int64
mnistFeatures = 784

mnistLabels :: Int64
mnistLabels = 10

numHiddenUnits :: Int64
numHiddenUnits = 1024

sampleSize :: Int
sampleSize = 1000

We’ll also change our model to use Word8 as the result type instead Int64.

data Model = Model
 { train :: TensorData Float
         -> TensorData Word8 -- Used to be Int64
         -> Session ()
 , errorRate :: TensorData Float
             -> TensorData Word8 -- Used to be Int64
             -> SummaryTensor
             -> Session (Float, ByteString)
 }

Now we have to change how we get our input data. Our data isn’t in CSV format this time. We’ll use helper functions from the Tensor Flow library to extract the images and labels:

import TensorFlow.Examples.MNIST.Parse (readMNISTSamples, readMNISTLabels)
…
runDigits :: FilePath -> FilePath -> FilePath -> FilePath -> IO ()
runDigits trainImageFile trainLabelFile testImageFile testLabelFile = 
 withEventWriter eventsDir $ \eventWriter -> runSession $ do

   -- trainingImages, testImages :: [Vector Word8]
   trainingImages <- liftIO $ readMNISTSamples trainImageFile
   testImages <- liftIO $ readMNISTSamples testImageFile

   -- traininglabels, testLabels :: [Word8]
   trainingLabels <- liftIO $ readMNISTLabels trainLabelFile
   testLabels <- liftIO $ readMNISTLabels testLabelFile

   -- trainingRecords, testRecords :: Vector (Vector Word8, Word8)
   let trainingRecords = fromList $ zip trainingImages trainingLabels
   let testRecords = fromList $ zip testImages testLabels
   ...

Our “input” type consists of vectors of Word8 elements. These represent the intensity of various pixels. Our “output” type is Word8, referring to the actual labels (0-9). We read the images and labels from separate files. Then we zip them together to pass to our processing functions. We’ll have to make a few changes to these processing functions for this data set. First, we have to generalize the type of our randomization function:

-- Used to be IrisRecord Specific
chooseRandomRecords :: Vector a -> IO (Vector a)

Next we have to write a new encoding function that will put our data into the TensorData format. This looks like our old version, except dealing with the new tuple type instead of the IrisRecord.

convertDigitRecordsToTensorData 
 :: Vector (Vector Word8, Word8)
 -> (TensorData Float, TensorData Word8)
convertDigitRecordsToTensorData records = (input, output)
 where
   numRecords = Data.Vector.length records 
   input = encodeTensorData [fromIntegral numRecords, mnistFeatures]
     (fromList $ concatMap recordToInputs records)
   output = encodeTensorData [fromIntegral numRecords] (snd <$> records)
   recordToInputs :: (Vector Word8, Word8) -> [Float]
   recordToInputs rec = fromIntegral <$> (toList . fst) rec

And then we just have to substitute our new functions and parameters in, and we’ll be able to run our digit trainer!

Current training error 89.8
Current training error 19.300001
Current training error 13.300001
Current training error 11.199999
Current training error 8.700001
Current training error 6.5999985
Current training error 6.999999
Current training error 5.199999
Current training error 4.400003
Current training error 5.000001
Current training error 2.3000002

test error 6.830001

So our accuracy is 93.2%. This seems like an alright number. But imagine being a Post office and having 6.8% of your mail sorted into the wrong Zip Code! (This was the original use case of this data set). So let’s see if we can do better.

Convolution and Max Pooling

Now we could train our model longer. This will tend to improve our error rate. But we can also help ourselves by making our model more complex. The fundamental flaw with what we’ve got so far is that it doesn’t account for the 2D nature of the images. This means we're losing a ton of useful information. So the first thing we'll do is treat our images as being 28x28 tensors instead of 1x784. This way, our model can pick out specific areas that are significant for the identification of the digit.

One thing we want to account for is that our image might not be in the center of the frame. To account for this, we're going to apply convolution. When using convolution, we break the image into many different overlapping tiles. In our case, we’ll make our strides size “1” in every direction, and we’ll use a patch size of 5x5. So this means we’ll center a 5x5 tile around each different pixel in our image, and then come up with a score for it. That score tells us if this part of the image contains any important information. We can represent this score as a vector with many features.

So with 2D convolution, we'll be dealing with 4-dimensional tensors. The first dimension is the sample size. The second two dimensions are the shape of the image. The final dimension is the number of features of the "score" for each part of the image. So each original image starts our with a single feature for the "score" of each pixel. This score is the actual intensity of that pixel! Then each layer of convolution will act as a mini neural network per pixel, making as many features as we want.

The different sliding windows correspond to scores we store in the next layer. This example uses 3x3 patches; we'll use 5x5.

The different sliding windows correspond to scores we store in the next layer. This example uses 3x3 patches; we'll use 5x5.

Max pooling is a form of down-sampling. After our first convolution step, we’ll have scores on the 28x28 image. We’ll use 2x2 max-pooling, meaning we divide each image into 2x2 squares. Then we’ll make a new layer that is 14x14, using only the “best” score from each 2x2 box. This makes our model more efficient while keeping the most important information.

Simple demonstration of max-pooling

Simple demonstration of max-pooling

Implementing Convolutional Layers

We’ll do two rounds of convolution and max pooling. So we’ll make a function that creates a layer that performs these two steps. This will look a lot like our other neural network layer. We’ll take parameters for the size of the input and output channels of the layer, as well as the tensor itself. So our first step will be to create the weights and bias tensors using these parameters:

patchSize :: Int64
patchSize = 5

buildConvPoolLayer :: Int64 -> Int64 -> Tensor v Float -> Text
                  -> Build (Variable Float, Variable Float, Tensor Build Float)
buildConvPoolLayer inputChannels outputChannels input layerName = withNameScope layerName $ do
 weights <- truncatedNormal (vector weightsShape)
   >>= initializedVariable
 bias <- truncatedNormal (vector [outputChannels]) >>= initializedVariable
 ...
 where
   weightsShape :: [Int64]
   weightsShape = [patchSize, patchSize, inputChannels, outputChannels]

Now we’ll want to call our convolution and max pooling functions. These are still a little rough around the edges (the Haskell library is still quite young). The C versions of these functions have many optional, named attributes. For the moment there don’t seem to be any functions that use normal Haskell values for these arguments. Instead, we’ll be using OpAttr values, assign bytestring names to values.

where
 ...
 convStridesAttr = opAttr "strides" .~ ([1,1,1,1] :: [Int64])
 poolStridesAttr = opAttr "strides" .~ ([1,2,2,1] :: [Int64])
 poolKSizeAttr = opAttr "ksize" .~ ([1,2,2,1] :: [Int64])
 paddingAttr = opAttr "padding" .~ ("SAME" :: ByteString)
 dataFormatAttr = opAttr "data_format" .~ ("NHWC" :: ByteString)
 convAttrs = convStridesAttr . paddingAttr . dataFormatAttr
 poolAttrs = poolKSizeAttr . poolStridesAttr . paddingAttr . dataFormatAttr

The strides argument for convolution refers to how much we shift the window each time. The strides argument for pooling refers to how big the windows will be that we perform the pooling over. In this case, it's 2x2. Now that we have our attributes, we can call the library functions conv2D’ and maxPool’. This gives our resulting vector. We also throw in a call to relu between these steps.

buildConvPoolLayer :: Int64 -> Int64 -> Tensor v Float -> Text
                  -> Build (Variable Float, Variable Float, Tensor Build Float)
buildConvPoolLayer inputChannels outputChannels input layerName = withNameScope layerName $ do
 weights <- truncatedNormal (vector weightsShape)
   >>= initializedVariable
 bias <- truncatedNormal (vector [outputChannels]) >>= initializedVariable
 let conv = conv2D' convAttrs input (readValue weights)
       `add` readValue bias
 let results = maxPool' poolAttrs (relu conv)
 return (weights, bias, results)
 where
   ...

Modifying our Model

Now we’ll make a few updates to our model and we’ll be in good shape. First, we need to reshape our input data to be 4-dimensional. Then, we’ll apply the two convolution/pooling layers:

imageDimen :: Int32
imageDimen = 28

createModel :: Build Model
createModel = do
 let batchSize = -1 -- Allows variable sized batches
 let conv1OutputChannels = 32
 let conv2OutputChannels = 64
 let denseInputSize = 7 * 7 * 64 :: Int32 -- 3136
 let numHiddenUnits = 1024

 inputs <- placeholder [batchSize, mnistFeatures]
 outputs <- placeholder [batchSize]

 let inputImage = reshape inputs (vector [batchSize, imageDimen, imageDimen, 1])

 (convWeights1, convBiases1, convResults1) <- 
   buildConvPoolLayer 1 conv1OutputChannels inputImage "convLayer1"
 (convWeights2, convBiases2, convResults2) <-
   buildConvPoolLayer conv1OutputChannels conv2OutputChannels convResults1 "convLayer2"

Once we’re done with that, we’ll apply two fully-connected (dense) layers as we did before. Note we'll reshape our result from four dimensions back down to two:

let denseInput = reshape convResults2 (vector [batchSize, denseInputSize])
(denseWeights1, denseBiases1, denseResults1) <-
  buildNNLayer (fromIntegral denseInputSize) numHiddenUnits denseInput "denseLayer1"  
let rectifiedDenseResults = relu denseResults1
(denseWeights2, denseBiases2, denseResults2) <-
   buildNNLayer numHiddenUnits mnistLabels rectifiedDenseResults "denseLayer2"

And after that we can treat the rest of the model the same. We'll update the parameter names and add the new weights and biases to the params that the model can change.

As a review, let’s look at the dimensions of each of the intermediate tensors here. Then we can see the restrictions on the dimensions of the different operations. Each convolution step takes two four-dimensional tensors. The final dimension of argument 1 must match the third dimension of argument 2. Then the result will swap in the final dimension of argument 2. Meanwhile, pooling with a 2x2 stride size will take this resulting 4-dimensional tensor and halve each of the inner dimensions.

input: n x 784
inputImage: n x 28 x 28 x 1
convWeights1: 5 x 5 x 1 x 32
convBias1: 32
conv (first layer): n x 28 x 28 x 32
convResults1: n x 14 x 14 x 32
convWeights2:  5 x 5 x 32 x 64
conv (second layer): n x 14 x 14 x 64
convResults2: n x 7 x 7 x 64
denseInput: n x 3136
denseWeights1: 3136 x 1024
denseBias1: 1024
denseResults1: n x 1024
denseWeights2: 1024 x 10
denseBias2: 10
denseResults2: n x 10

So for each input, we’ll have a probability for all 10 of the possible inputs. We pick the greatest of these as the chosen label.

Results

We’ll run our model again, only this time we’ll use a smaller sample size (100 per training iteration). This allows us to train for more iterations (20000). This takes quite a while to train, but we get these results (printed every 1000 iterations).

Current training error 91.0
Current training error 6.0
Current training error 2.9999971
Current training error 2.9999971
Current training error 0.0
Current training error 0.0
Current training error 0.99999905
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0
Current training error 0.0

test error 1.1799991

Not too bad! Once it got going, we saw very few training errors, though still ended up with a model that was a tad overfit. The Tensor Flow MNIST expert tutorial suggests using a dropout factor. This helps diminish the effects of overfitting. But this option isn’t available to us yet in Haskell. Still, we got close to 99% accuracy, which is a success for us!

And here’s what our final graph looks like. Notice the extra layers we added for convolution:

Conclusion

So that’s it for convolutional neural networks! Our goal was to adapt our previous neural network model to recognize digits. Not only did we pick a harder problem, but we also wanted higher accuracy. We achieved this by using more advanced machine learning techniques. Convolution allowed us to use the full 2x2 nature of the image. It also checked for the digit no matter where it was in the image. Max pooling enabled us to make our algorithm more efficient while keeping the most important information.

If you’re itching to see what else Haskell can do with Tensor Flow, check out our Tensor Flow Guide. It’ll walk you through some of the trickier parts of getting the library working on your local machine. It will also go through the most important information you need to know about the types in this library.

If you’re new to Haskell, here are a couple more resources you can dig into before you try your hand at Tensor Flow. First, there’s our Getting Started Checklist. This will point you toward some helpful resources for learning the language. Next, you can check out our Stack mini-course so you can learn how to organize a project! If you want to use Tensor Flow in Haskell, Stack is a must!

Putting the Flow in Tensor Flow!

Last week we built our first neural network and used it on a real machine learning problem. We took the Iris data set and built a classifier that took in various flower measurements. It determined, with decent accuracy, the type of flower the measurements referred to.

But we’ve still only seen half the story of Tensor Flow! We’ve constructed many tensors and combined them in interesting ways. We can imagine what is going on with the “flow”, but we haven’t seen a visual representation of that yet.

We’re in luck though, thanks to the Tensor Board application. With it, we can visualize the computation graph we've created. We can also track certain values throughout our program run. In this article, we’ll take our Iris example and show how we can add Tensor Board features to it. Here's the Github repo with all the code so you can follow along!

Add an Event Writer

The first thing to understand about Tensor Board is that it gets its data from a source directory. While we’re running our system, we have to direct it to write events to that directory. This will allow Tensor Board to know what happened in our training run.

eventsDir :: FilePath
eventsDir = "/tmp/tensorflow/iris/logs/"

runIris :: FilePath -> FilePath -> IO ()
runIris trainingFile testingFile = withEventWriter eventsDir $ \eventWriter -> runSession $ do
...

By itself, this doesn’t write anything down into that directory though! To understand the consequences of this, let’s boot up tensor board.

Running Tensor Board

Running our executable again doesn't bring up tensor board. It merely logs the information that Tensor Board uses. To actually see that information, we’ll run the tensorboard command.

>> tensorboard --logdir=’/tmp/tensorflow/iris/logs’
Starting TensorBoard 47 at http://0.0.0.0:6006

Then we can point our web browser at the correct port. Since we haven't written anything to the file yet, there won’t be much for us to see other than some pretty graphics. So let’s start by logging our graph. This is actually quite easy! Remember our model? We can use the logGraph function combined with our event writer so we can see it.

model <- build createModel
logGraph eventWriter createModel

Now when we refresh Tensor Flow, we’ll see our system’s graph.

What the heck is going on here?

What the heck is going on here?

But, it’s very large and very confusing. The names of all the nodes are a little confusing, and it’s not clear what data is going where. Plus, we have no idea what’s going on with our error rate or anything like that. Let’s make a couple adjustments to fix this.

Adding Summaries

So the first step is to actually specify some measurements that we’ll have Tensor Board plot for us. One node we can use is a “scalar summary”. This provides us with a summary of a particular value over the course of our training run. Let’s do this with our errorRate node. We can use the simple scalarSummary function.

errorRate_ <- render $ 1 - (reduceMean (cast correctPredictions))
scalarSummary "Error" errorRate_

The second type of summary is a histogram summary. We use this on a particular tensor to see the distribution of its values over the course of the run. Let’s do this with our second set of weights. We need to use readValue to go from a Variable to a Tensor.

(finalWeights, finalBiases, finalResults) <-
  buildNNLayer numHiddenUnits irisLabels rectifiedHiddenResults
histogramSummary "Weights" (readValue finalWeights)

So let’s run tensor flow again. We would expect to see these new values show up under the Scalars and Histograms tabs. But they don’t. This is because we still to write these results to our event writer. And this turns out to be a little complicated. First, before we start training, we have to create a tensor representing all our summaries.

logGraph eventWriter createModel
summaryTensor <- build mergeAllSummaries

Now if we had no placeholders, we could run this tensor whenever we wanted, and it would output the values. But our summary tensors depend on the input placeholders, which complicates the matter. So here’s what we’ll do. We’ll only write out the summaries when we check our error rate (every 100 steps). To do this, we have to change our error rate in the model to take the summary tensor as an extra argument. We’ll also have it add a ByteString as a return value to the original Float.

data Model = Model
  { train :: TensorData Float
          -> TensorData Int64
          -> Session ()
  , errorRate :: TensorData Float
              -> TensorData Int64
              -> SummaryTensor
              -> Session (Float, ByteString)
  }

Within our model definition, we’ll use this extra parameter. It will run both the errorRate_ tensor AND the summary tensor together with the feeds:

return $ Model
  , train = ...
  , errorRate = \inputFeed outputFeed summaryTensor -> do
      (errorTensorResult, summaryTensorResult) <- runWithFeeds
        [ feed inputs inputFeed
        , feed outputs outputFeed
        ]
        (errorRate_, summaryTensor)
      return (unScalar errorTensorResult, unScalar summaryTensorResult)

Now we need to modify our calls to errorRate below. We’ll pass the summary tensor as an argument, and get the bytes as output. We’ll write it to our event writer (after decoding), and then we’ll be done!

-- Training
  forM_ ([0..1000] :: [Int]) $ \i -> do
    trainingSample <- liftIO $ chooseRandomRecords trainingRecords
    let (trainingInputs, trainingOutputs) = convertRecordsToTensorData trainingSample
    (train model) trainingInputs trainingOutputs
    when (i `mod` 100 == 0) $ do
      (err, summaryBytes) <- (errorRate model) trainingInputs trainingOutputs summaryTensor
      let summary = decodeMessageOrDie summaryBytes
      liftIO $ putStrLn $ "Current training error " ++ show (err * 100)
      logSummary eventWriter (fromIntegral i) summary

  liftIO $ putStrLn ""

  -- Testing
  let (testingInputs, testingOutputs) = convertRecordsToTensorData testRecords
  (testingError, _) <- (errorRate model) testingInputs testingOutputs summaryTensor
  liftIO $ putStrLn $ "test error " ++ show (testingError * 100)

Now we can see what our summaries look like by running tensor board again!

Scalar Summary of our Error Rate

Scalar Summary of our Error Rate

Histogram summary of our final weights.

Histogram summary of our final weights.

Annotating our Graph

Now let’s look back to our graph. It’s still a bit confusing. We can clean it up a lot by creating “name scopes”. A name scope is part of the graph that we set aside under a single name. When Tensor Board generates our graph, it will create one big block for the scope. We can then zoom in and examine the individual nodes if we want.

We’ll make three different scopes. First, we’ll make a scope for each of the hidden layers of our neural network. This is quite easy, since we already have a function for creating these. All we have to do is make the function take an extra parameter for the name of the scope we want. Then we wrap the whole function within the withNameScope function.

buildNNLayer :: Int64 -> Int64 -> Tensor v Float -> Text
             -> Build (Variable Float, Variable Float, Tensor Build Float)
buildNNLayer inputSize outputSize input layerName = withNameScope layerName $ do
  weights <- truncatedNormal (vector [inputSize, outputSize]) >>= initializedVariable
  bias <- truncatedNormal (vector [outputSize]) >>= initializedVariable
  let results = (input `matMul` readValue weights) `add` readValue bias
  return (weights, bias, results)

We supply our name further down in the code:

(hiddenWeights, hiddenBiases, hiddenResults) <- 
  buildNNLayer irisFeatures numHiddenUnits inputs "layer1"
let rectifiedHiddenResults = relu hiddenResults
(finalWeights, finalBiases, finalResults) <-
  buildNNLayer numHiddenUnits irisLabels rectifiedHiddenResults "layer2"

Now we’ll add a scope around all our error calculations. First, we combine these into an action wrapped in withNameScope. Then, observing that we need the errorRate_ and train_ steps, we return those from the block. That’s it!

(errorRate_, train_) <- withNameScope "error_calculation" $ do
    actualOutput <- render $ cast $ argMax finalResults (scalar (1 :: Int64))
    let correctPredictions = equal actualOutput outputs
    er <- render $ 1 - (reduceMean (cast correctPredictions))
    scalarSummary "Error" er

    let outputVectors = oneHot outputs (fromIntegral irisLabels) 1 0
    let loss = reduceMean $ fst $ softmaxCrossEntropyWithLogits finalResults outputVectors
    let params = [hiddenWeights, hiddenBiases, finalWeights, finalBiases]
    tr <- minimizeWith adam loss params
    return (er, tr)

Now when we look at our graph, we see that it’s divided into three parts: our two layers, and our error calculation. All the information flows among these three parts (as well as the "Adam" optimizer portion).

Much Better

Much Better

Conclusion

By default, Tensor Board graphs can look a little messy. But by adding a little more information to the nodes and using scopes, you can paint a much clearer picture. You can see how the data flows from one end of the application to the other. We can also use summaries to track important information about our graph. We’ll use this most often for the loss function or error rate. Hopefully, we'll see it decline over time.

Next week we’ll add some more complexity to our neural networks. We'll see new tensors for convolution and max pooling. This will allow us to solve the more difficult MNIST digit recognition problem. Stay tuned!

If you’re itching to try out some Tensor Board functionality for yourself, check out our in-depth Tensor Flow guide. It goes into more detail about the practical aspects of using this library. If you want to get the Haskell Tensor Flow library running on your local machine, check it out! Trust me, it's a little complicated, unless you're a Stack wizard already!

And if this is your first exposure to Haskell, try it out! Take a look at our guide to getting started with the language!

Digging in Deep: Solving a Real Problem with Haskell Tensor Flow

Last week we got acquainted with the core concepts of Tensor Flow. We learned about the differences between constants, placeholders, and variable tensors. Both the Haskell and Python bindings have functions to represent these. The Python version was a bit simpler though. Once we had our tensors, we wrote a program that “learned” a simple linear equation.

This week, we’re going to solve an actual machine learning problem. We’re going to use the Iris data set, which contains measurements of different Iris flowers. Each flower belongs to one of three species. Our program will "learn" a function choosing the species from the measurements. This function will involved a fully-connected neural network.

Formatting our Input

The first step in pretty much any machine learning problem is data processing. After all, our data doesn’t magically get resolved into Haskell data types. Luckily, Cassava is a great library to help us out. The Iris data set consists of data in .csv files that each have a header line and then a series of records. They look a bit like this:

120,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0
...

Each line contains one record. A record has four flower measurements, and a final label. In this case, we have three types of flowers we are trying to classify between: Iris Setosa, Iris Versicolor, and Iris Virginica. So the last column contains the numbers 0,1, and 2, corresponding to these respective classes.

Let's create a data type representing each record. Then we can parse the file line-by-line. Our IrisRecord type will contain the feature data and the resulting label. This type will act as a bridge between our raw data and the tensor format we’ll need to run our learning algorithm. We’ll derive the “Generic” typeclass for our record type, and use this to get FromRecord. Once our type has an instance for FromRecord, we can parse it with ease. As a note, throughout this article, I’ll be omitting the imports section from the code samples. I’ve included a full list of imports from these files as an appendix at the bottom. We'll also be using the OverloadedLists extension throughout.

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedLists #-}

...

data IrisRecord = IrisRecord
  { field1 :: Float
  , field2 :: Float
  , field3 :: Float
  , field4 :: Float
  , label  :: Int64
  }
  deriving (Generic)

instance FromRecord IrisRecord

Now that we have our type, we’ll write a function, readIrisFromFile, that will read our data in from a CSV file.

readIrisFromFile :: FilePath -> IO (Vector IrisRecord)
readIrisFromFile fp = do
  contents <- readFile fp
  let contentsAsBs = pack contents
  let results = decode HasHeader contentsAsBs :: Either String (Vector IrisRecord)
  case results of
    Left err -> error err
    Right records -> return records

We won’t want to always feed our entire data set into our training system. So given a whole slew of these items, we should be able to pick out a random sample.

sampleSize :: Int
sampleSize = 10

chooseRandomRecords :: Vector IrisRecord -> IO (Vector IrisRecord)
chooseRandomRecords records = do
  let numRecords = Data.Vector.length records
  chosenIndices <- take sampleSize <$> shuffleM [0..(numRecords - 1)]
  return $ fromList $ map (records !) chosenIndices

Once we’ve selected our vector of records to use for each run, we’re still not done. We need to take these records and transform them into the TensorData that we’ll feed into our algorithm. We create items of TensorData by feeding in a shape and then a 1-dimensional vector of values. First, we need to know the shapes of our input and output. Both of these depend on the number of rows in the sample. The “input” will have a column for each of the four features in our set. The output meanwhile will have a single column for the label values.

irisFeatures :: Int64
irisFeatures = 4

irisLabels :: Int64
irisLabels = 3

convertRecordsToTensorData :: Vector IrisRecord -> (TensorData Float, TensorData Int64)
convertRecordsToTensorData records = (input, output)
  where
    numRecords = Data.Vector.length records 
    input = encodeTensorData [fromIntegral numRecords, irisFeatures] (undefined)
    output = encodeTensorData [fromIntegral numRecords] (undefined)

Now all we need to do is take the various records and turn them into one dimensional vectors to encode. Here’s the final function:

convertRecordsToTensorData :: Vector IrisRecord -> (TensorData Float, TensorData Int64)
convertRecordsToTensorData records = (input, output)
  where
    numRecords = Data.Vector.length records 
    input = encodeTensorData [fromIntegral numRecords, irisFeatures]
      (fromList $ concatMap recordToInputs records)
    output = encodeTensorData [fromIntegral numRecords] (label <$> records)
    recordToInputs :: IrisRecord -> [Float]
    recordToInputs rec = [field1 rec, field2 rec, field3 rec, field4 rec]

Neural Network Basics

Now that we’ve got that out of the way, we can start writing our model. Remember, we want to perform two different actions with our model. First, we want to be able to take our training input and train the weights. Second, we want to be able to pass a test data set and determine the error rate. We can represent these two different functions as a single Model object. Remember the Session monad, where we run all our Tensor Flow activities. The training will run an action that alters the variables but returns nothing. The error rate calculation will return us a float value.

data Model = Model
  { train :: TensorData Float -- Training input
          -> TensorData Int64 -- Training output
          -> Session ()
  , errorRate :: TensorData Float -- Test input
              -> TensorData Int64 -- Test output
              -> Session Float
  }

Now we’re going to build a fully-connected neural network. We’ll have 4 input units (1 for each of the different features), and then we’ll have 3 output units (1 for each of the classes we’re trying to represent). In the middle, we’ll use a hidden layer consisting of 10 units. This means we’ll need two sets of weights and biases. We’ll write a function that, when given dimensions, will give us the variable tensors for each layer. We want the weight and bias tensors, plus the result tensor of the layer.

buildNNLayer :: Int64 -> Int64 -> Tensor v Float
             -> Build (Variable Float, Variable Float, Tensor Build Float)
buildNNLayer inputSize outputSize input = do
  weights <- truncatedNormal (vector [inputSize, outputSize]) >>= initializedVariable
  bias <- truncatedNormal (vector [outputSize]) >>= initializedVariable
  let results = (input `matMul` readValue weights) `add` readValue bias
  return (weights, bias, results)

We do this in the Build monad, which allows us to construct variables, among other things. We’ll use a truncatedNormal distribution for all our variables to keep things simple. We specify the size of each variable in a vector tensor, and then initialize them. Then we’ll create the resulting tensor by multiplying the input by our weights and adding the bias.

Constructing our Model

Now we’ll start building our Model object, again within the Build monad. We begin by specifying our input and output placeholders, as well the number of hidden units. We’ll also use a batchSize of -1 to account for the fact that we want a variable number of input samples.

irisFeatures :: Int64
irisFeatures = 4

irisLabels :: Int64
irisLabels = 3
-- ^^ From above

createModel :: Build Model
createModel = do
  let batchSize = -1 -- Allows variable sized batches
  let numHiddenUnits = 10
  inputs <- placeholder [batchSize, irisFeatures]
  outputs <- placeholder [batchSize]

Then we’ll get the nodes for the two layers of variables, as well as their results. Between the layers, we’ll add a “rectifier” activation function relu:

(hiddenWeights, hiddenBiases, hiddenResults) <- 
  buildNNLayer irisFeatures numHiddenUnits inputs
let rectifiedHiddenResults = relu hiddenResults
(finalWeights, finalBiases, finalResults) <-
  buildNNLayer numHiddenUnits irisLabels rectifiedHiddenResults

Now we have to get the inferred classes of each output. This means calling argMax to take the class with the highest probability. We’ll also cast the vector and then render it. These are some Haskell-Tensor-Flow specific terms for getting tensors to the right type. Next, we compare that against our output placeholders to see how many we got correct. Then we’ll make a node for calculating the error rate for this run.

actualOutput <- render $ cast $ argMax finalResults (scalar (1 :: Int64))
let correctPredictions = equal actualOutput outputs
errorRate_ <- render $ 1 - (reduceMean (cast correctPredictions))

Now we have to actually do the work of training. First, we’ll make oneHot vectors for our expected outputs. This means converting the label 0 into the vector [1,0,0], and so on. We’ll compare these values against our results (before we took the max), and this gives us our loss function. Then we will make a list of the parameters we want to train. The adam optimizer will minimize our loss function while modifying the params.

let outputVectors = oneHot outputs (fromIntegral irisLabels) 1 0
let loss = reduceMean $ fst $ softmaxCrossEntropyWithLogits finalResults outputVectors
let params = [hiddenWeights, hiddenBiases, finalWeights, finalBiases]
train_ <- minimizeWith adam loss params

Now we’ve got our errorRate_ and train_ nodes ready. There's one last step here. We have to plug in for the placeholder values and create functions that will take in the tensor data. Remember the feed pattern from last week? We use it again here. Finally, our model is complete!

return $ Model
  { train = \inputFeed outputFeed -> 
      runWithFeeds
        [ feed inputs inputFeed
        , feed outputs outputFeed
        ]
        train_
  , errorRate = \inputFeed outputFeed -> unScalar <$>
      runWithFeeds
        [ feed inputs inputFeed
        , feed outputs outputFeed
        ]
        errorRate_
  }

Tying it all together

Now we’ll write our main function that will run the session. It will have three stages. In the preparation stage, we’ll load our data, and use the build function to get our model. Then we’ll train our model for 1000 steps by choosing samples and converting our records to data. Every 100 steps, we'll print the output. Finally, we’ll determine the resulting error ratio by using the test data.

runIris :: FilePath -> FilePath -> IO ()
runIris trainingFile testingFile = runSession $ do
  -- Preparation
  trainingRecords <- liftIO $ readIrisFromFile trainingFile
  testRecords <- liftIO $ readIrisFromFile testingFile
  model <- build createModel

  -- Training
  forM_ ([0..1000] :: [Int]) $ \i -> do
    trainingSample <- liftIO $ chooseRandomRecords trainingRecords
    let (trainingInputs, trainingOutputs) = convertRecordsToTensorData trainingSample
    (train model) trainingInputs trainingOutputs
    when (i `mod` 100 == 0) $ do
      err <- (errorRate model) trainingInputs trainingOutputs
      liftIO $ putStrLn $ "Current training error " ++ show (err * 100)

  liftIO $ putStrLn ""

  -- Testing
  let (testingInputs, testingOutputs) = convertRecordsToTensorData testRecords
  testingError <- (errorRate model) testingInputs testingOutputs
  liftIO $ putStrLn $ "test error " ++ show (testingError * 100)

  return ()

Results

So when we actually run all this output, we’ll get the following results on our test set.

Current training error 60.000004
Current training error 30.000002
Current training error 39.999996
Current training error 19.999998
Current training error 10.000002
Current training error 10.000002
Current training error 19.999998
Current training error 19.999998
Current training error 10.000002
Current training error 10.000002
Current training error 0.0

test error 3.333336

Our test sample size was 30, so this means we got 29/30 this time around. Results change though from run to run (I obviously used the best results I found). Since our sample size is so small, we have high entropy here (sometimes the error rate is like 40%). Generally we’ll want to train longer on a larger test set, so that we get more consistent results, but this is a good start.

Conclusion

In this article we went over the basics of making a neural network using the Haskell Tensor Flow library. We made a fully-connected neural network and fed in real data we parsed using the Cassava library. This network was able to learn a function to classify flowers from the Iris data set. Considering the small amount of data, we got some good results.

Come back next week, where we’ll see how we can add some more summary information to our tensor flow graph. We’ll use the tensor board application to view our graph in a visual format.

For more details on installing the Haskell Tensor Flow system, check out our In-Depth Tensor Flow Tutorial. It should walk you through the important steps in running the code on your own machine.

Perhaps you’ve never tried Haskell before at all, and want to see what it’s like. Maybe I’ve convinced you that Haskell is in fact the future of AI. In that case, you should check out our Getting Started Checklist for some tools on starting with the language.

Appendix: All Imports

Documentation for Haskell Tensor Flow is still a major work in progress. So I want to make sure I explicitly list the modules you need to import for all the different functions we used here.

import Control.Monad (forM_, when)
import Control.Monad.IO.Class (liftIO)
import Data.ByteString.Lazy.Char8 (pack)
import Data.Csv (FromRecord, decode, HasHeader(..))
import Data.Int (Int64)
import Data.Vector (Vector, length, fromList, (!))
import GHC.Generics (Generic)
import System.Random.Shuffle (shuffleM)

import TensorFlow.Core (TensorData, Session, Build, render, runWithFeeds, feed, unScalar, build,
                        Tensor, encodeTensorData)
import TensorFlow.Minimize (minimizeWith, adam)
import TensorFlow.Ops (placeholder, truncatedNormal, add, matMul, relu,
                      argMax, scalar, cast, oneHot, reduceMean, softmaxCrossEntropyWithLogits, 
                      equal, vector)
import TensorFlow.Session (runSession)
import TensorFlow.Variable (readValue, initializedVariable, Variable)

Starting out with Haskell Tensor Flow

Last week we discussed the burgeoning growth of AI systems. We saw several examples of how those systems are impacting our lives more and more. I made the case that we ought to focus more on reliability when making architecture choices. After all, people’s lives might be at stake when we right code now. Naturally, I suggested Haskell as a prime candidate for developing reliable AI systems.

So now we’ll actually write some Haskell machine learning code. We'll focus on the Tensor Flow bindings library. I first got familiar with this library back at BayHac in April. I’ve spent the last couple months learning both Tensor Flow as a whole and the Haskell library. In this first article, we’ll go over the basic concepts of Tensor Flow. We'll see how they’re implemented in Python (the most common language for TF). We'll then translate these concepts to Haskell.

Note this series will not be a general introduction to the concept of machine learning. There is a fantastic series on Medium about that called Machine Learning is Fun! If you’re interested in learning the basic concepts, I highly recommend you check out part 1 of that series. Many of the ideas in my own article series will be a lot clearer with that background.

Tensors

Tensor Flow is a great name because it breaks the library down into the two essential concepts. First up are tensors. These are the primary vehicle of data representation in Tensor Flow. Low-dimensional tensors are actually quite intuitive. But there comes a point when you can’t really visualize what’s going on, so you have to let the theoretical idea guide you.

In the world of big data, we represent everything numerically. And when you have a group of numbers, a programmer’s natural instinct is to put those in an array.

[1.0, 2.0, 3.0, 6.7]

Now what do you do if you have a lot of different arrays of the same size and you want to associate them together? You make a 2-dimensional array (an array of arrays), which we also refer to as a matrix.

[[1.0, 2.0, 3.0, 6.7],
[5.0, 10.0, 3.0, 12.9],
[6.0, 12.0, 15.0, 13.6],
[7.0, 22.0, 8.0, 5.3]]

Most programmers are pretty familiar with these concepts. Tensors take this idea and keep extending it. What happens when you have a lot of matrices of the same size? You can group them together as an array of matrices. We could call this a three-dimensional matrix. But “tensor” is the term we’ll use for this data representation in all dimensions.

Every tensor has a degree. We could start with a single number. This is a tensor of degree 0. Then a normal array is a tensor of degree 1. Then a matrix is a tensor of degree 2. Our last example would be a tensor of degree 3. And you can keep adding these on to each other, ad infinitum.

Every tensor has a shape. The shape is an array representing the dimensions of the tensor. The length of this array will be the degree of the tensor. So a number will have the empty list as its shape. An array will have a list of length 1 containing the length of the array. A matrix will have a list of length 2 containing its number of rows and columns. And so on. There are a few different ways we can represent tensors in code, but we'll get to that in a bit.

Go with the Flow

The second important concept to understand is how Tensor Flow performs computations. Machine learning generally involves simple math operations. A lot of simple math operations. Since the scale is so large, we need to perform these operations as fast as possible. We need to use software and hardware that is optimized for this specific task. This necessitates having a low-level code representation of what’s going on. This is easier to achieve in a language like C, instead of Haskell or Python.

We could have the bulk of our code in Haskell, but perform the math in C using a Foreign Function Interface. But these interfaces have a large overhead, so this is likely to negate most of the gains we get from using C.

Tensor Flow’s solution to this problem is that we first build up a graph describing all our computations. Then once we have described that, we “run” our graph using a “session”. Thus it performs the entire language conversion process at once, so the overhead is lower.

If this sounds familiar, it's because this is how actions tend to work in Haskell (in some sense). We can, for instance, describe an IO action. And this action isn’t a series of commands that we execute the moment they show up in the code. Rather, the action is a description of the operations that our program will perform at some point. It’s also similar to the concept of Effectful programming. We’ll explore that topic in the future on this blog.

So what does our computation graph look like? We'll, each tensor is a node. Then we can make other nodes for "operations", that take tensors as input. For instance, we can “add” two tensors together, and this is another node. We’ll see in our example how we build up the computational graph, and then run it.

One of the awesome features of Tensor Flow is the Tensor Board application. It allows you to visualize your graph of computations. We’ll see how to do this later in the series.

Coding Tensors

So at this point we should start examining how we actually create tensors in our code. We’ll start by looking at how we do this in Python, since the concepts are a little easier to understand that way. There are three types of tensors we’ll consider. The first are “constants”. These represent a set of values that do not change. We can use these values throughout our model training process, and they'll be the same each time. Since we define the values for the tensor up front, there’s no need to give any size arguments. But we will specify the datatype that we’ll use for them.

import tensorflow as tf

node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0, dtype=tf.float32)

Now what can we actually do with these tensors? Well for a quick sample, let’s try adding them. This creates a new node in our graph that represents the addition of these two tensors. Then we can “run” that addition node to see the result. To encapsulate all our information, we’ll create a “Session”:

import tensorflow as tf

node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0, dtype=tf.float32)
additionNode = tf.add(node1, node2)

sess = tf.Session()
result = sess.run(additionNode)
print result

“””
Output:
7.0
“””

Next up are placeholders. These are values that we change each run. Generally, we will use these for the inputs to our model. By using placeholders, we'll be able to change the input and train on different values each time. When we “run” a session, we need to assign values to each of these nodes.

We don’t know the values that will go into a placeholder, but we still assign the type of data at construction. We can also assign a size if we like. So here’s a quick snippet showing how we initialize placeholders. Then we can assign different values with each run of the application. Even though our placeholder tensors don’t have values, we can still add them just as we could with constant tensors.

node1 = tf.placeholder(tf.float32)
node2 = tf.placeholder(tf.float32)
adderNode = tf.add(node1, node2)

sess = tf.Session()
result1 = sess.run(adderNode, {node1: 3, node2: 4.5 })
result2 = sess.run(adderNode, {node1: 2.7, node2: 8.9 })
print(result1)
print(result2)

"""
Output:
7.5
11.6
"""

The last type of tensor we’ll use are variables. These are the values that will constitute our “model”. Our goal is to find values for these parameters that will make our model fit the data well. We’ll supply a data type, as always. In this situation, we’ll also provide an initial constant value. Normally, we’d want to use a random distribution of some kind. The tensor won’t actually take on its value until we run a global variable initializer function. We’ll have to create this initializer and then have our session object run it before we get going.

w = tf.Variable([3], dtype=tf.float32)
b = tf.Variable([1], dtype=tf.float32)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

Now let’s use our variables to create a “model” of sorts. For this article we'll make a simple linear model. Let’s create additional nodes for our input tensor and the model itself. We’ll let w be the weights, and b be the “bias”. This means we’ll construct our final value by w*x + b, where x is the input.

w = tf.Variable([3], dtype=tf.float32)
b = tf.Variable([1], dtype=tf.float32)
x = tf.placeholder(dtype=tf.float32)
linear_model = w * x + b

Now, we want to know how good our model is. So let’s compare it to y, an input of our expected values. We’ll take the difference, square it, and then use the reduce_sum library function to get our “loss”. The loss measures the difference between what we want our model to represent and what it actually represents.

w = tf.Variable([3], dtype=tf.float32)
b = tf.Variable([1], dtype=tf.float32)
x = tf.placeholder(dtype=tf.float32)
linear_model = w * x + b
y = tf.placeholder(dtype=tf.float32)
squared_deltas = tf.square(linear_model - y)
loss = tf.reduce_sum(squared_deltas)

Each line here is a different tensor, or a new node in our graph. We’ll finish up our model by using the built in GradientDescentOptimizer with a learning rate of 0.01. We’ll set our training step as attempting to minimize the loss function.

optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

Now we’ll run the session, initialize the variables, and run our training step 1000 times. We’ll pass a series of inputs with their expected outputs. Let's try to learn the line y = 5x - 1. Our expected output y values will assume this.

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
for i in range(1000):
    sess.run(train, {x: [1, 2, 3, 4], y: [4,9,14,19]})

print(sess.run([W,b]))

At the end we print the weights and bias, and we see our results!

[array([ 4.99999475], dtype=float32), array([-0.99998516], dtype=float32)]

So we can see that our learned values are very close to the correct values of 5 and -1!

Representing Tensors in Haskell

So now at long last, I’m going to get into some of the details of how we apply these tensor concepts in Haskell. Like strings and numbers, we can’t have this one “Tensor” type in Haskell, since that type could really represent some very different concepts. For a deeper look at the tensor types we’re dealing with, check out our in depth guide.

In the meantime, let’s go through some simple code snippets replicating our efforts in Python. Here’s how we make a few constants and add them together. Do note the “overloaded lists” extension. It allows us to represent different types with the same syntax as lists. We use this with both Shape items and Vectors:

{-# LANGUAGE OverloadedLists #-}

import Data.Vector (Vector)
import TensorFlow.Ops (constant, add)
import TensorFlow.Session (runSession, run)

runSimple :: IO (Vector Float)
runSimple = runSession $ do
  let node1 = constant [1] [3 :: Float]
  let node2 = constant [1] [4 :: Float]
  let additionNode = node1 `add` node2
  run additionNode

main :: IO ()
main = do
  result <- runSimple
  print result

{-
Output:
[7.0]
-}

We use the constant function, which takes a Shape and then the value we want. We’ll create our addition node and then run it to get the output, which is a vector with a single float. We wrap everything in the runSession function. This encapsulates the initialization and running actions we saw in Python.

Now suppose we want placeholders. This is a little more complicated in Haskell. We’ll be using two placeholders, as we did in Python. We’ll initialized them with the placeholder function and a shape. We’ll take arguments to our function for the input values. To actually pass the parameters to fill in the placeholders, we have to use what we call a “feed”.

We know that our adderNode depends on two values. So we’ll write our run-step as a function that takes in two “feed” values, one for each placeholder. Then we’ll assign those feeds to the proper nodes using the feed function. We’ll put these in a list, and pass that list as an argument to runWithFeeds. Then, we wrap up by calling our run-step on our input data. We’ll have to encode the raw vectors as tensors though.

import TensorFlow.Core (Tensor, Value, feed, encodeTensorData)
import TensorFlow.Ops (constant, add, placeholder)
import TensorFlow.Session (runSession, run, runWithFeeds)

import Data.Vector (Vector)

runPlaceholder :: Vector Float -> Vector Float -> IO (Vector Float)
runPlaceholder input1 input2 = runSession $ do
  (node1 :: Tensor Value Float) <- placeholder [1]
  (node2 :: Tensor Value Float) <- placeholder [1]
  let adderNode = node1 `add` node2
  let runStep = \node1Feed node2Feed -> runWithFeeds 
        [ feed node1 node1Feed
        , feed node2 node2Feed
        ] 
        adderNode
  runStep (encodeTensorData [1] input1) (encodeTensorData [1] input2)

main :: IO ()
main = do
  result1 <- runPlaceholder [3.0] [4.5]
  result2 <- runPlaceholder [2.7] [8.9]
  print result1
  print result2

{-
Output:
[7.5]
[11.599999] -- Yay rounding issues!
-}

Now we’ll wrap up by going through the simple linear model scenario we already saw in Python. Once again, we’ll take two vectors as our inputs. These will be the values we try to match. Next, we’ll use the initializedVariable function to get our variables. We don’t need to call a global variable initializer. But this does affect the state of the session. Notice that we pull it out of the monad context, rather than using let. (We also did for placeholders.)

import TensorFlow.Core (Tensor, Value, feed, encodeTensorData, Scalar(..))
import TensorFlow.Ops (constant, add, placeholder, sub, reduceSum, mul)
import TensorFlow.GenOps.Core (square)
import TensorFlow.Variable (readValue, initializedVariable, Variable)
import TensorFlow.Session (runSession, run, runWithFeeds)
import TensorFlow.Minimize (gradientDescent, minimizeWith)

import Control.Monad (replicateM_)
import qualified Data.Vector as Vector
import Data.Vector (Vector)

runVariable :: Vector Float -> Vector Float -> IO (Float, Float)
runVariable xInput yInput = runSession $ do
  let xSize = fromIntegral $ Vector.length xInput
  let ySize = fromIntegral $ Vector.length yInput
  (w :: Variable Float) <- initializedVariable 3
  (b :: Variable Float) <- initializedVariable 1
  …

Next, we’ll make our placeholders and linear model. Then we’ll calculate our loss function in much the same way we did before. Then we’ll use the same feed trick to get our placeholders plugged in.

runVariable :: Vector Float -> Vector Float -> IO (Float, Float)
  ...
  (x :: Tensor Value Float) <- placeholder [xSize]
  let linear_model = ((readValue w) `mul` x) `add` (readValue b)
  (y :: Tensor Value Float) <- placeholder [ySize]
  let square_deltas = square (linear_model `sub` y)
  let loss = reduceSum square_deltas
  trainStep <- minimizeWith (gradientDescent 0.01) loss [w,b] 
  let trainWithFeeds = \xF yF -> runWithFeeds
        [ feed x xF
        , feed y yF
        ]
        trainStep
…

Finally, we’ll run our training step 1000 times on our input data. Then we’ll run our model one more time to pull out the values of our weights and bias. Then we’re done!

runVariable :: Vector Float -> Vector Float -> IO (Float, Float)
...
  replicateM_ 1000 
    (trainWithFeeds (encodeTensorData [xSize] xInput) (encodeTensorData [ySize] yInput))
  (Scalar w_learned, Scalar b_learned) <- run (readValue w, readValue b)
  return (w_learned, b_learned)

main :: IO ()
main = do
  results <- runVariable [1.0, 2.0, 3.0, 4.0] [4.0, 9.0, 14.0, 19.0]
  print results

{-
Output:
(4.9999948,-0.99998516)
-}

Conclusion

Hopefully this article gave you a taste of some of the possibilities of Tensor Flow in Haskell. We saw a quick introduction to the fundamentals of Tensor Flow. We saw three different kinds of tensors. We then saw code examples both in Python and in Haskell. Finally, we went over a very quick example of a simple linear model and saw how we could learn values to fit that model. Next week, we’ll do a more complicated learning problem. We’ll use the classic “Iris” flower data set and train a classifier using a full neural network.

If you want more details, you should check out FREE Haskell Tensor Flow Guide. It will walk you through using the Tensor Flow library as a dependency and getting a basic model running!

Perhaps you’re completely new to Haskell but intrigued by the possibilities of using it for machine learning or anything else. You should download our Getting Started Checklist! It has some great resources on installing Haskell and learning the core concepts.

The Future is Functional: Haskell and the AI-Native World

As regular readers of this blog know, I love talking about the future of Haskell as a language. I’m interested in ways we can shape the future of programming in a way that will help Haskell grow. I've mentioned network effects as a major hindrance a couple different times. Companies are reluctant to try Haskell since there aren't that many Haskell developers. As a result, fewer other developers will have the opportunity to get paid to learn Haskell. And the cycle continues.

Many perfectly smart people also have a bias against using Haskell in production code for a business. This stems from the idea that Haskell is an academic language. They see it as unsuited towards “Real World” problems. The best rebuttal to this point is to show the many uses of Haskell in creating systems that people use every day. Now, I can sit here and point the ease of creating web servers in Haskell. I could also point to the excellent mechanisms for designing front-end UIs. But there’s still one vital area in the future of programming that I have yet to address.

This is of course, the world of AI and machine learning. AI is slowly (or not so slowly) becoming a primary concern for pretty much any software based business. The last 5-10 years have seen the rise of “cloud native” architectures and systems. But we will soon be living in age when all major software systems will use AI and machine learning at their core. In short, we are about the enter the AI Native Future, as my company’s founder put it.

This will be the first in a series of articles where I explore the uses of Haskell in writing AI applications. In the coming weeks I’ll be focusing on using the Tensor Flow bindings for Haskell. Tensor Flow allows programmers to build simple but powerful applications. There are many tutorials in Python, but the Haskell library is still in early stages. So I'll go through the core concepts of this library and show their usage in Haskell.

But for me it’s not enough to show that we can use Haskell for AI applications. That’s hardly going to move the needle or change the status quo. My ultimate goal is to prove that it’s the best tool for these kinds of systems. But first, let’s get an idea of where AI is being used, and why it’s so important.

AI Will be Everywhere...

...and this doesn’t seem to be particularly controversial. Year-on-year there are more and more discoveries in AI research. We are now able to solve problems that were scarcely thinkable a few years ago. Advancements in NLP systems like IBM Watson have made it so that chatbots are popping up all over the place. Tensor Flow has put advanced deep learning techniques at the finger tips of every programmer. Systems are getting faster and faster.

On top of that, the implications for the general public are becoming more well known. Self-driving cars are roaming the streets in several major American cities. The idea of autonomous Amazon drones making deliveries seems a near certainty. Millions of people entrust part of their home systems to joint IoT/AI devices like Nest. AI is truly going to be everywhere soon.

AI Needs to be Safe

But the ubiquity of AI presents a large number of concerns as well. Software engineers are going to have a lot more responsibility. Our code will have to make difficult and potentially life-altering decisions. For instance, the design of self-driving cars carries many philosophical implications.

The plot thickens even more when we mix AI with the Internet of Things, another exploding market. In the last year, an attack brought down large parts of the internet using a bot-net of IoT devices. In the world of IoT, security still does not have paramount importance. But soon, more and more people will have cameras, audio recording devices, fire alarms and security systems hooked up to the internet. When this happens, their safety and privacy will depend on IoT security.

The need for safety and security suggest we may need to re-think some software paradigms. "Move fast and break things" is the prevailing mindset in many quarters. But this idea doesn't look so good if "breaking things" means someone's house burns.

Pure, Functional Programming is the Path to Safety

So how does this relate to Haskell? Well let’s consider the tradeoffs we face when we choose what language to develop in. Haskell, with its strong type system and compile-time guarantees is more reliable. We can catch a lot more errors at compile time when compared to languages like Javascript, or Python. In these languages, non-syntactic issues tend to only pop up at runtime. Programmers must lean even more heavily on testing systems to catch possible errors. But testing is difficult, and there’s still plenty of disagreement about the best methodologies.

The flip side of this is that it’s somewhat easier to write code in a language like Javascript. It's easier to cut corners in the type system and have more “dynamic” objects. So while we all want “reliable” software, we’re often willing to compromise to get code off the ground faster. This is the epitome of the "Move fast and break things" mindset.

However, the explosion in the safety concerns of our software has elevated the stakes. If someone’s web browser crashes from Javascript, it's no big deal. The user will reload the page and hopefully not trigger that condition again. If your app stops responding, your user might get frustrated and you’ll lose a customer. But when programming starts penetrating other markets, any error could be catastrophic. If a self driving car encounters a sudden bug and the system crashes, many people could die. So it is our ethical responsibility to figure out ways to make it impossible for our software to encounter these kinds of errors.

Haskell is in many respects a very safe language. This is why it’s trusted by large financial institutions, large data science firms, and even by a company working in autonomous flight control. When your code cannot have arbitrary side effects, it is far easier to prevent it from crashing. It is also easier to secure a system (like an IoT device) when you can prevent leaks from arbitrary effects. Often these techniques are present in Haskell but not other languages.

The field of dependent types is yet another area where we’ll be able to add more security to our programming. They'll enable even more compile-time guarantees of behavior. This can add a great deal of safety when used well. Haskell doesn’t have full support for dependent types yet, but it is in the works. In the meantime there are languages like Idris with first class support.

Of course, when it comes to AI and deep learning, getting these guarantees will be difficult. It's one thing to build a type that ensures you have a vector of a particular length. It's quite another to build a type ensuring that when your car sees a dozen people standing ahead of it, it must brake. But these are the sorts of challenges programmers will need to face in the AI Native Future. And if we want to ensure Haskell’s place in that future, we’ll have to show these results are possible.

Conclusion

It's obvious that AI and machine learning are the big fields of software engineering. They’ll continue to dominate the field for a long time. They can have an incredible impact on our lives. But by allowing this impact, we’re putting more of our safety in the hands of software engineers. This has major implications for how we develop software.

We often have to make tradeoffs between ease of development and reliability. A language like Haskell can offer us a lot of compile time guarantees about the behavior of our program. These guarantees are absent from many other languages. However, achieving these guarantees can introduce more pain into the development process.

But soon our code will be controlling things like self-driving cars, delivery drones, and home security devices. So we have an ethical responsibility to do everything in our power to make our code as reliable as possible. For this reason, Haskell is in a prime position when it comes to the AI Native Future. To take advantage of this, it will require a lot of work. Haskell programmers will have to develop language tools like dependent types to make Haskell even more reliable. We'll also have to contribute to libraries that will make it easy to write machine learning applications in Haskell.

With this in mind, the next few articles on this blog are all going to focus on using Haskell for machine learning. We’ll be starting by going through the basics of the Tensor Flow bindings for Haskell. You can get a sneak peek at some of that content by downloading our Haskell Tensor Flow tutorial!

If you’ve never programmed in Haskell before, you should try it out! We have two great resources for getting started. First, there’s the Getting Started Checklist. It will first walk you through downloading the language. Then it will point you in the directions of some other beginner materials. Second, there’s our Stack mini-course. This will walk you through the Stack tool, which makes it very easy to build projects, organize code, and get dependencies with Haskell.