How well does it work? Profiling in Haskell
I’ve said it before, but I’ll say it again. As much as we’d like to think it’s the case, our Haskell code doesn’t work just because it compiles. This is why we have test suites. But even if it passes our test suites this doesn’t mean it works as well as it could either. Sometimes we’ll realize that the code we wrote isn’t quite performant enough, so we’ll have to make improvements.
But improving our code can sometimes feel like taking shots in the dark. You'll spend a great deal of time tweaking a certain piece. Then you'll find you haven’t actually made much of a dent in the total run time of the application. Certain operations generally take longer, like database calls, network operations, and IO. So you can often have a decent idea of where to start. But it always helps to be sure. This is where benchmarking and profiling come in. We’re going to take a specific problem and learn how we can use some Haskell tools to zero in on the problem point.
As a note, the tools we’ll use require you to be organizing your code using Stack or Cabal. If you’ve never used either of these before, you should check out our Stack Mini Course! It'll teach you the basics of creating a project with Stack. You'll also learn the primary commands to use with Stack. It’s brand new and best of all FREE! Check it out! It’s our first course of any kind, so we’re looking for feedback!
The Problem
Our overarching problem for this article will be the “largest rectangle” problem. You can actually try to solve this problem yourself on Hackerrank under the name “John and Fences”. Imagine we have a series of vertical bars with varying heights placed next to each other. We want to find the area of the largest rectangle that we can draw over these bars that doesn’t include any empty space. Here’s a visualization of one such problem and solution:
In this example, we have posts with heights [2,5,7,4,1,8]
. The largest rectangle we can form has an area of 12, as we see with the highlighted squares.
This problem is pretty neat and clean to solve with Haskell, as it lends itself to a recursive solution. First let’s define a couple newtypes to illustrate our concepts for this problem. We’ll use a compiler extension to derive the Num
typeclass on our index type, as this will be useful later.
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
...
newtype FenceValues = FenceValues { unFenceValues :: [Int] }
newtype FenceIndex = FenceIndex { unFenceIndex :: Int }
deriving (Eq, Num, Ord)
-- Left Index is inclusive, right index is non-inclusive
newtype FenceInterval = FenceInterval { unFenceInterval :: (FenceIndex, FenceIndex) }
newtype FenceSolution = FenceSolution { unFenceSolution :: Int }
deriving (Eq, Show, Ord)
Next, we’ll define our primary function. It will take our FenceValues
, a list of integers, and return our solution.
largestRectangle :: FenceValues -> FenceSolution
largestRectangle values = ...
It in turn will call our recursive helper function. This function will calculate the largest rectangle over a specific interval. We can solve it recursively by using smaller and smaller intervals. We’ll start by calling it on the interval of the whole list.
largestRectangle :: FenceValues -> FenceSolution
largestRectangle values = largestRectangleAtIndices values
(FenceInterval (FenceIndex 0, FenceIndex (length (unFenceValues values))))
largestRectangleAtIndices :: FenceValues -> FenceInterval -> FenceSolution
largestRectangleAtIndices = ...
Now, to break this into recursive cases, we need some more information first. What we need is the index i
of the minimum height in this interval. One option is that we could make a rectangle spanning the whole interval with this height.
Any other "largest rectangle" won't use this particular index. So we can then divide our problem into two more cases. In the first, we'll find the largest rectangle on the interval to the left. In the second, we'll look to the right.
As your might realize, these two cases simply involve making recursive calls! Then we can easily compare their results. The only thing we need to add is a base case. Here are all these cases represented in code:
largestRectangleAtIndices :: FenceValues -> FenceInterval -> FenceSolution
largestRectangleAtIndices
values
interval@(FenceInterval (leftIndex, rightIndex)) =
-- Base Case: Checks if left + 1 >= right
if isBaseInterval interval
then FenceSolution (valueAtIndex values leftIndex)
-- Compare three cases
else max (max middleCase leftCase) rightCase
where
-- Find the minimum height and its index
(minIndex, minValue) = minimumHeightIndexValue values interval
-- Case 1: Use the minimum index
middleCase = FenceSolution $ (intervalSize interval) * minValue
-- Recursive call #1
leftCase = largestRectangleAtIndices values (FenceInterval (leftIndex, minIndex))
-- Guard against case where there is no "right" interval
rightCase = if minIndex + 1 == rightIndex
then FenceSolution (maxBound :: Int) -- Supply a "fake" solution that we'll ignore
-- Recursive call #2
else largestRectangleAtIndices values (FenceInterval (minIndex + 1, rightIndex))
And just like that, we’re actually almost finished. The only sticking point here is a few helper functions. Three of these are simple:
valueAtIndex :: FenceValues -> FenceIndex -> Int
valueAtIndex values index = (unFenceValues values) !! (unFenceIndex index)
isBaseInterval :: FenceInterval -> Bool
isBaseInterval (FenceInterval (FenceIndex left, FenceIndex right)) = left + 1 >= right
intervalSize :: FenceInterval -> Int
intervalSize (FenceInterval (FenceIndex left, FenceIndex right)) = right - left
Now we have to determine the minimum on this interval. Let’s do this in the most naive way, by scanning the whole interval with a fold.
minimumHeightIndexValue :: FenceValues -> FenceInterval -> (FenceIndex, Int)
minimumHeightIndexValue values (FenceInterval (FenceIndex left, FenceIndex right)) =
foldl minTuple (FenceIndex (-1), maxBound :: Int) valsInInterval
where
valsInInterval :: [(FenceIndex, Int)]
valsInInterval = drop left (take right (zip (FenceIndex <$> [0..]) (unFenceValues values)))
minTuple :: (FenceIndex, Int) -> (FenceIndex, Int) -> (FenceIndex, Int)
minTuple old@(_, heightOld) new@(_, heightNew) =
if heightNew < heightOld then new else old
And now we’re done!
Benchmarking our Code
Now, this is a neat little algorithmic solution, but we want to know if our code is efficient. We need to know if it will scale to larger input values. We can find the answer to this question by writing benchmarks. Benchmarks are a feature we can use in conjunction with Cabal and Stack. They work a lot like test suites. But instead of proving the correctness of our code, they’ll show us how fast our code runs under various circumstances. We’ll use the Criterion
library to do this. We’ll start by adding a section in our .cabal file for this benchmark:
benchmark fences-benchmarks
type: exitcode-stdio-1.0
hs-source-dirs: benchmark
main-is: fences-benchmark.hs
build-depends: base
, FencesExample
, criterion
, random
default-language: Haskell2010
Now we’ll make our file fences-benchmark.hs
, make it a Main
module and add a main
function. We’ll generate 6 lists, increasing in size by a factor of 10 each time. Then we’ll create a benchmark group and call the bench
function on each situation.
module Main where
import Criterion
import Criterion.Main (defaultMain)
import System.Random
import Lib
main :: IO ()
main = do
[l1, l2, l3, l4, l5, l6] <- mapM
randomList [1, 10, 100, 1000, 10000, 100000]
defaultMain
[ bgroup "fences tests"
[ bench "Size 1 Test" $ whnf largestRectangle l1
, bench "Size 10 Test" $ whnf largestRectangle l2
, bench "Size 100 Test" $ whnf largestRectangle l3
, bench "Size 1000 Test" $ whnf largestRectangle l4
, bench "Size 10000 Test" $ whnf largestRectangle l5
, bench "Size 100000 Test" $ whnf largestRectangle l6
]
]
-- Generate a list of a particular size
randomList :: Int -> IO FenceValues
randomList n = FenceValues <$> (sequence $ replicate n (randomRIO (1, 10000 :: Int)))
We’d normally run these benchmarks with stack bench
(or cabal bench
if you’re not using Stack). But we can also compile our code with the --profile
flag. This will automatically create a profiling report with more information about our code. Note using profiling requires re-compiling ALL the dependencies to use profiling as well. So you don't want to switch back and forth a lot.
>> stack bench --profile
Benchmark fences-benchmarks: RUNNING...
benchmarking fences tests/Size 1 Test
time 47.79 ns (47.48 ns .. 48.10 ns)
1.000 R² (0.999 R² .. 1.000 R²)
mean 47.78 ns (47.48 ns .. 48.24 ns)
std dev 1.163 ns (817.2 ps .. 1.841 ns)
variance introduced by outliers: 37% (moderately inflated)
benchmarking fences tests/Size 10 Test
time 3.324 μs (3.297 μs .. 3.356 μs)
0.999 R² (0.999 R² .. 1.000 R²)
mean 3.340 μs (3.312 μs .. 3.368 μs)
std dev 98.52 ns (79.65 ns .. 127.2 ns)
variance introduced by outliers: 38% (moderately inflated)
benchmarking fences tests/Size 100 Test
time 107.3 μs (106.3 μs .. 108.2 μs)
0.999 R² (0.999 R² .. 0.999 R²)
mean 107.2 μs (106.3 μs .. 108.4 μs)
std dev 3.379 μs (2.692 μs .. 4.667 μs)
variance introduced by outliers: 30% (moderately inflated)
benchmarking fences tests/Size 1000 Test
time 8.724 ms (8.596 ms .. 8.865 ms)
0.998 R² (0.997 R² .. 0.999 R²)
mean 8.638 ms (8.560 ms .. 8.723 ms)
std dev 228.8 μs (193.6 μs .. 272.8 μs)
benchmarking fences tests/Size 10000 Test
time 909.2 ms (899.3 ms .. 914.1 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 915.1 ms (914.6 ms .. 915.8 ms)
std dev 620.1 μs (136.0 as .. 664.8 μs)
variance introduced by outliers: 19% (moderately inflated)
benchmarking fences tests/Size 100000 Test
time 103.9 s (91.11 s .. 117.3 s)
0.997 R² (0.997 R² .. 1.000 R²)
mean 107.3 s (103.7 s .. 109.4 s)
std dev 3.258 s (0.0 s .. 3.702 s)
variance introduced by outliers: 19% (moderately inflated)
Benchmark fences-benchmarks: FINISH
So when we run this, we’ll find something...troubling. It takes a looong time to run the final benchmark on size 100000. On average, this case takes over 100 seconds...more than a minute and a half! We can further take note of how the average run time increases based on the size of the case. Let’s pare down the data a little bit:
Size 1: 47.78 ns
Size 10: 3.340 μs (increased ~70x)
Size 100: 107.2 μs (increased ~32x)
Size 1000: 8.638 ms (increased ~81x)
Size 10000: 915.1 ms (increased ~106x)
Size 100000: 107.3 s (increased ~117x)
Each time we increase the size of the problem by a factor of 10, the time spent increased by a factor closer to 100! This suggests our run time is O(n^2)
(check out this guide if you are unfamiliar with Big-O notation). We’d like to do better.
Determining the Problem
So we want to figure out why our code isn’t performing very well. Luckily, we already profiled our benchmark!. This outputs a specific file that we can look at, called fences-benchmark.prof
. It has some very interesting results:
COST CENTRE MODULE SRC %time %alloc
minimumHeightIndexValue.valsInInterval Lib src/Lib.hs:45:5-95 69.8 99.7
valueAtIndex Lib src/Lib.hs:51:1-74 29.3 0.0
We see that we have two big culprits taking a lot of time. First, there is our function that determines the minimum between a specific interval. The report is even more specific, calling out the specific offending part of the function. We spend a lot of time getting the different values for a specific interval. In second place, we have valueAtIndex
. This means we also spend a lot of time getting values out of our list.
First let’s be glad we’ve factored our code well. If we had written our entire solution in one big function, we wouldn’t have any leads here. This makes it much easier for us to analyze the problem. When examining the code, we see why both of these functions could produce O(n^2)
behavior.
Due to the number of recursive calls we make, we’ll call each of these functions O(n)
times. Then when we call valueAtIndex
, we use the (!!)
operator on our linked list. This takes O(n)
time. Scanning the whole interval for the minimum height has the same effect. In the worst case, we have to look at every element in the list!
I’m hand waving a bit here, but that is the basic result. When we call these O(n)
pieces O(n)
times, we get O(n^2)
time total.
Cliff Hanger Ending
We can actually solve this problem in O(n log n)
time, a dramatic improvement over the current O(n^2)
. But we’ll have to improve our data structures to accomplish this. First, we’ll store our values so that we can go from the index to the element in sub-linear time. This is easy. Second, we have to determine the index containing the minimum element within an arbitrary interval. This is a bit trickier to do in sub-linear time. We'll need a more advanced data structure.
To find out how we solve these problems, you’ll have to wait for part 2 of this series! Come back next week to the Monday Morning Haskell blog!
As a reminder, we’ve just published a FREE mini-course on Stack. It’ll teach you the basics of laying out a project and running commands on it using the Stack tool. You should enroll in the Monday Morning Haskell Academy to sign up! This is our first course of any kind, so we would love to get some feedback! Once you know about Stack, it'll be a lot easier to try this problem out for yourself!
In addition to Stack, recursion also featured pretty heavily in our solution here. If you’ve done any amount of functional programming you’ve seen recursion in action. But if you want to solidify your knowledge, you should download our FREE Recursion Workbook! It has two chapters worth of content on recursion and it has 10 practice problems you can work through!
Never programmed in Haskell? No problem! You can download our Getting Started Checklist and it’ll help you get Haskell installed. It’ll also point you in the direction of some great learning resources as well. Take a look!