Quick and Simple Benchmarking
As a programmer, you constantly have to ask, "How good is my code?". There are a lot of different ways code can be "good" or "bad", but one of the key indicators in many cases is time performance: how long does it take your code to run on particular inputs?
Haskell isn't generally regarded as one of the fastest languages out there; it's not designed specifically with performance in mind. Languages like C, C++, and, at a higher level, Rust, are geared specifically toward writing code that runs fast. But this doesn't mean performance is irrelevant in Haskell. If you're writing a real-world application that people are going to use, you need to be concerned with the user experience, and no user wants to wait several seconds for simple operations because you didn't bother to optimize your code.
So we want ways to measure the performance of our code, a process also known as benchmarking. The Criterion library is one of the main tools Haskell offers to help you do this. I've written about this library before in my series on Testing in Haskell.
But the library is a bit difficult to jump into if you're a beginner or if you're doing something lightweight. It can also take a lot longer to run than you expect, since it runs your code many times to get a statistically significant average. So in this article we're going to go over a more lightweight way to benchmark your code. It turns out this is a little trickier in Haskell than in other languages, so let's dig into some of those nuances.
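For a sense of the contrast, here's roughly what the Criterion approach looks like. This is a minimal sketch of my own (the benchmark name and list size are arbitrary), not code from that series:

import Criterion.Main (bench, defaultMain, nf)
import qualified Data.List as L

-- Criterion's 'nf' fully evaluates the result, and the harness runs the
-- function many times to build a statistical estimate of the runtime.
main :: IO ()
main = defaultMain
  [ bench "sort 10000 elements" $ nf L.sort [1..10000 :: Int] ]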
Timing in Python
Timing a function in Python is rather straightforward. Here's some code we can use to benchmark an operation op on a particular input_arg:
# time_py.py
import time
def time_op(op, input_arg):
    t1 = time.time()
    result = op(input_arg)
    t2 = time.time()
    return t2 - t1
We get the initial time t1, we perform the operation, then we get the current time again and return the difference.
As a very basic example, we can compare running the sorted function on lists of escalating lengths:
xs = [10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000]
for x in xs:
    # build a list with x elements and time how long it takes to sort it
    a = time_op(sorted, list(range(x)))
    print(a)
And running this program, we see that the amount of time increases with the number of elements in the list:
>> python3 time_py.py
2.6226043701171875e-06
1.1920928955078125e-06
5.245208740234375e-06
5.0067901611328125e-05
0.0007715225219726562
0.008384466171264648
0.08414745330810547
0.8780522346496582
Small list sizes don't take very long, and increasing the size of the list 10x increases the time by roughly 10-20x, consistent with the n log n nature of sorting. (Going from 100,000 to 1,000,000 elements, for instance, multiplies n log n by a factor of 10 * (6/5) = 12.)
Notably, these values don't feel like they reflect the total time in our program. This program takes 5-10 seconds to run on my machine, quite a bit more than the cumulative time shown here. But the values are at least directionally correct, indicating a longer time for larger operations.
First Haskell Attempt
Let's try repeating this code in Haskell. We'll need to work in the IO monad, since the equivalent of time.time() is a call to getCurrentTime. Here's what our code might look like:
import Control.Monad (forM_)
import qualified Data.List as L
import Data.Time.Clock
timeOp :: (a -> b) -> a -> IO NominalDiffTime
timeOp op input = do
  t1 <- getCurrentTime
  let result = op input
  t2 <- getCurrentTime
  return $ diffUTCTime t2 t1

main :: IO ()
main = forM_ sizes $ \x -> do
  let input = [1..x]
  diff <- timeOp L.sort input
  print diff

sizes :: [Int]
sizes =
  [ 10, 100, 1000, 10000
  , 100000, 1000000
  , 10000000, 100000000
  ]
When we run this program, we get some unhelpful results though…
0.000000576s
0.000000767s
0.000000684s
0.000000658s
0.000000751s
0.000000717s
0.00000075s
0.000000678s
All these times appear the same! What's happening?
Laziness at Work
What's happening, of course, is that Haskell's laziness is kicking in. Our program never actually needs the values in our sorted lists, so it never evaluates those lists and never performs the sorting!
So let's change things up a bit: we'll return our result along with the time, and print the last value of the result list, which will force the evaluation of the whole sorted list.
timeOp :: (a -> b) -> a -> IO (b, NominalDiffTime)
timeOp op input = do
  t1 <- getCurrentTime
  let result = op input
  t2 <- getCurrentTime
  return (result, diffUTCTime t2 t1)

main :: IO ()
main = forM_ sizes $ \x -> do
  let input = [1..x]
  (result, diff) <- timeOp L.sort input
  print (diff, last result)
Our result now looks like this:
(0.000000672s,10)
(0.00000061s,100)
(0.000000363s,1000)
(0.00000042s,10000)
(0.000000661s,100000)
(0.000001089s,1000000)
(0.000001185s,10000000)
(0.000001256s,100000000)
Unfortunately, this is still not what we're looking for! Though there's a slight increase in time, it's nowhere near the 10-20x growth we would like to see.
Haskell's laziness is still biting us. Our program is evaluating the list, but it's not evaluating the list when we would like it to. We want the whole evaluation to occur between our timing calls, but this doesn't happen. The complete list is only evaluated at the end of our loop, when we're about to print it out.
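To see this deferral concretely, here's a tiny sketch of my own (the trace call and the toy list aren't part of our benchmarking code, they're just for illustration): the "sorting now!" message doesn't appear where the let binding sits, only when print finally demands the result.

import qualified Data.List as L
import Debug.Trace (trace)

main :: IO ()
main = do
  -- The binding itself does no work; 'trace' only fires when the value is
  -- actually demanded.
  let result = trace "sorting now!" (L.sort [3, 1, 2 :: Int])
  putStrLn "timing calls would run here"  -- "sorting now!" has NOT printed yet
  print (last result)                     -- "sorting now!" appears here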
Printing Solutions
This can clue us in to some potential solutions though. Since printing is one of the sure-fire ways to ensure an expression is fully evaluated, we can add a Show constraint and print the value out inside the timing function. This removes the need to return the result from our function:
timeOp :: (Show b) => (a -> b) -> a -> IO NominalDiffTime
timeOp op input = do
  t1 <- getCurrentTime
  let result = op input
  print result
  t2 <- getCurrentTime
  return $ diffUTCTime t2 t1

main :: IO ()
main = forM_ sizes $ \x -> do
  let input = [1..x]
  diff <- timeOp L.sort input
  print diff
While technically this works, if we have non-trivial result values to print (like a ten-million-item list), then all the important data gets lost in the spam. Plus, our timing values now include the time it takes to print, which can end up being much larger than the operation itself!
We can get around this spam by instead printing the values to the /dev/null file, which simply discards all this information.
-- (this version also requires: import System.IO)
timeOp :: (Show b) => (a -> b) -> a -> IO NominalDiffTime
timeOp op input = do
  t1 <- getCurrentTime
  let result = op input
  withFile "/dev/null" WriteMode (\h -> hPutStrLn h (show result))
  t2 <- getCurrentTime
  return $ diffUTCTime t2 t1
With a sufficiently large amount of print output though, it is still possible for your program to crash, even when printing to /dev/null! I had to remove the one-hundred-million list size, but then I got the following output:
0.000048873s
0.000017368s
0.000058216s
0.00057228s
0.006825458s
0.09262504s
2.521403946s
Now we're getting values that seem accurate! But let's keep trying to improve how this function works!
Since printing large values can still crash our program, perhaps the more sensible choice is to provide a custom printing function instead of relying on the native Show instance. For our example, we can print the "last" value of the list, as we did before. This fully evaluates the result without printing a large amount of information.
timeOp :: (a -> b) -> a -> (b -> String) -> IO NominalDiffTime
timeOp op input toStr = do
  t1 <- getCurrentTime
  let result = op input
  withFile "/dev/null" WriteMode (\h -> hPutStrLn h (toStr result))
  t2 <- getCurrentTime
  return $ diffUTCTime t2 t1

main :: IO ()
main = forM_ sizes $ \x -> do
  let input = [1..x]
  diff <- timeOp L.sort input (show . last)
  print diff
This works, even running the largest size!
0.000153407s
0.000015817s
0.000028232s
0.000279035s
0.002366471s
0.098812637s
1.468213335s
17.183272162s
Notice though that the 10-million size has changed from 2.52 seconds to 1.46 seconds. Percentage-wise, that's actually a large difference! There can be a fair amount of variation from run to run, which is why Criterion performs many runs. However, our approach is still useful for estimating the order of magnitude of our timings, which is often all you really need when assessing a solution.
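If you want a bit more stability without reaching for Criterion, one simple option is to repeat the measurement a few times and average the results. Here's a minimal sketch built on the timeOp version above (timeOpAvg is just a made-up helper name, not from any library):

import Control.Monad (replicateM)

-- Run 'timeOp' n times and average the measurements. NominalDiffTime has
-- Num and Fractional instances, so this arithmetic works directly.
timeOpAvg :: Int -> (a -> b) -> a -> (b -> String) -> IO NominalDiffTime
timeOpAvg n op input toStr = do
  diffs <- replicateM n (timeOp op input toStr)
  return (sum diffs / fromIntegral n)

This is still nowhere near as rigorous as Criterion, but it smooths out some of the run-to-run noise.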
Strict IO Library
Now, Haskell does have tools besides printing to fully evaluate expressions and otherwise introduce strictness into our code. One of these is the seq function. This takes two arguments and simply returns the second, but still strictly evaluates the first.
-- Returns the second argument, but strictly evaluates the first
seq :: a -> b -> b
This is actually used under the hood of the foldl' function. This does a strict left fold, avoiding the memory leak issues in the regular version of foldl:
foldl' f z [] = z
foldl' f z (x:xs) =
  let z' = z `f` x
      -- Use 'seq' to evaluate z' early!
  in seq z' $ foldl' f z' xs
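One caveat worth noting here (my own aside): seq only evaluates its first argument to weak head normal form, meaning just the outermost constructor. For a list, that's only the first cons cell, so seq on its own wouldn't fully evaluate our sorted results. A tiny illustration (whnfOnly is a made-up name):

-- 'seq' stops at the outermost constructor, so the 'undefined' hiding inside
-- this list is never touched and no exception is thrown.
whnfOnly :: Int
whnfOnly = ([1, undefined, 3] :: [Int]) `seq` 42

-- A deep evaluation (e.g. 'deepseq' from Control.DeepSeq) would hit the
-- 'undefined' and throw an exception instead.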
You can also make particular fields of a data type strict, so that they are evaluated (to weak head normal form) as soon as a value of that type is constructed. You indicate this with a bang annotation in front of the field's type in the data definition.
-- The fields will be strictly evaluated
data MyStrictType = MyStrictType !Int !String
What I found most useful for our particular use case though is an older library called Strict IO (the strict-io package). This provides a number of utilities for performing IO actions strictly instead of lazily, built around its own SIO monad. We'll use 3 different functions from it:
-- Wrap an IO action taking no arguments and make it strict
wrap0 :: IO a -> SIO a
-- Wrap a pure value, evaluating it strictly
return' :: a -> SIO a
-- Run a strict IO action within the normal IO monad
run :: SIO a -> IO a
With these three pieces, our way forward is simple. We use wrap0 on our calls to getCurrentTime to ensure they happen immediately. Then we use return' to wrap the evaluation of our operation. And finally, we run our timeOp function from the IO monad.
import Control.DeepSeq (NFData)
import qualified System.IO.Strict as S
import qualified System.IO.Strict.Internals as S
timeOp :: (NFData b) => (a -> b) -> a -> S.SIO NominalDiffTime
timeOp op input = do
  t1 <- S.wrap0 getCurrentTime
  -- Binding through return' forces full evaluation of the result
  result <- S.return' (op input)
  t2 <- S.wrap0 getCurrentTime
  return $ diffUTCTime t2 t1

main :: IO ()
main = forM_ sizes $ \x -> do
  let input = [1..x]
  diff <- S.run (timeOp L.sort input)
  print diff
And this gives us the timing values we want, without relying on print as an evaluation crutch.
0.00000917s
0.000011727s
0.000049986s
0.000489229s
0.005072207s
0.102675824s
1.722589101s
20.870199397s
This means we don't need any extra "show"-ing functions as arguments to our timing operation. All we need is an NFData constraint on our evaluated type. This exists for most basic types as long as we import Control.DeepSeq from the deepseq package.
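If your operation returns a custom type rather than a list of basic values, you can usually get an NFData instance for free via GHC generics. Here's a small sketch (SortStats is a made-up example type, not from the article's code):

{-# LANGUAGE DeriveGeneric #-}

import Control.DeepSeq (NFData)
import GHC.Generics (Generic)

-- With a Generic instance, the empty NFData instance below picks up
-- deepseq's default implementation, so values can be fully evaluated.
data SortStats = SortStats
  { smallestVal :: Int
  , largestVal  :: Int
  } deriving (Show, Generic)

instance NFData SortStats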
As a weakness though, the strict-io package is not part of recent Stack resolver sets, so you'll need to list it as an extra-dep in stack.yaml. I have not found a more recent or "official" alternative. Also, the wrap0 function comes from a module labeled as Internals. The library was mainly geared towards basic IO actions like file reading rather than getting the current time, so wrap0 wasn't really intended to be part of the public API.
Conclusion
Now that we have a lighter-weight means of benchmarking our code (as compared to Criterion), we can explore ways that this helps us learn about problem solving! Come back next week and we'll see how!
And make sure to subscribe to our mailing list for more Haskell updates! If you're interested in problem solving content, there will be quite a lot of that coming up!