Monday Morning Haskell explores a variety of topics in the Haskell programming language, from the very basics to the best tools for use in a production environment. The author, James Bowen, is a software engineer in San Francisco.

Faster Code with Laziness

Most programming languages use an evaluation paradigm called eager evaluation. This means the moment the program execution reaches an expression, the expression is evaluated. This makes sense in imperative languages. An imperative program is a series of commands. To execute the commands in order, it is best to evaluate each expression immediately.

Haskell is different. As we’ve discussed, a Haskell program is actually a series of expressions to be evaluated. However, Haskell uses lazy evaluation. It only evaluates the expressions it absolutely needs to in order to produce the program’s output. Furthermore, it only evaluates an expression at the moment it is needed, rather than the moment it is encountered.
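
As a tiny illustration (a sketch): binding a name to an expression does not evaluate it, so even a binding to error is harmless until something demands its value.

```haskell
main :: IO ()
main = do
  -- This binding is never demanded, so the error inside it never fires.
  let _boom = error "this is never evaluated" :: Int
  putStrLn "the program runs fine"
```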

Laziness Under the Hood

So how does Haskell accomplish this? To understand, we have to take a deeper look at Haskell’s memory model. Consider this simple program:

main = do
  let myTuple = ("first", map (*2) [1,2,3,4])
  print "Hello"
  print $ fst myTuple
  print (head (snd myTuple))
  print (length (snd myTuple))
  print $ snd myTuple

When the first print statement ("Hello") happens, the expression myTuple is actually completely unevaluated, even though it is defined before the print statement. It is represented in memory by what is called a "thunk". Of course, our program knows how to evaluate this thunk when the time comes. But for now, no value is there. When it prints the first element of the tuple, it still doesn’t completely evaluate myTuple. When we call the fst function on myTuple, the runtime knows myTuple must be a tuple. Instead of seeing a single thunk at this point, the runtime sees myTuple as a tuple containing two unevaluated thunks.

Next, we print the first element of myTuple. Printing an expression forces it to be completely evaluated. So after this, the runtime sees myTuple as a tuple containing a fully evaluated string in its first position and an unevaluated thunk in its second position.

At the next step, we print the head of the second element of myTuple. This tells the runtime that this second element must be a non-empty list. If it were empty, our program would actually crash here. This forces the evaluation of the first element of that list (2). However, the remainder of the list remains an unevaluated thunk.

Next, we print the length of the list. Haskell will do enough work here to determine how many elements are in the list, but it will not actually evaluate any more items. The list is now an evaluated first element, and 3 unevaluated thunks. Finally, we print the full list. This evaluates the list in its entirety. If we did not do this last step, the final 3 elements would never be evaluated.
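
We can watch this deferral happen with Debug.Trace (a quick sketch; trace prints its message to stderr the first time its second argument is forced):

```haskell
import Debug.Trace (trace)

main :: IO ()
main = do
  let xs = map (\n -> trace ("evaluating " ++ show n) (n * 2)) [1, 2, 3, 4 :: Int]
  putStrLn "nothing evaluated yet"
  print (head xs)   -- forces only the first element, printing "evaluating 1"
  print (length xs) -- walks the list's spine without forcing the other elements
```

No "evaluating" message ever appears for 2, 3, or 4: length only needs the shape of the list, not its contents.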

Advantages of Laziness

So all of this seems extremely convoluted. How exactly does laziness help us? Well, the main benefit is that it can make our programs faster. Consider the following example:

function1 :: Int
function1 = function2 exp1 exp2 exp3
  where
    exp1 = reallyLongFunction 1234
    exp2 = reallyLongFunction 3151
    exp3 = reallyLongFunction 8571

function2 :: Int -> Int -> Int -> Int
function2 exp1 exp2 exp3 = if exp1 < 1000
  then exp2
  else if exp1 < 2000
    then exp3
    else exp1

Comparable code in C++ or Java would need to make all three expensive calls to reallyLongFunction before calling function2 with the results. But in Haskell, the program will not call reallyLongFunction until it absolutely needs to.

So in this example, we will always evaluate exp1 in order to determine the result of the if statement in function2. However, if we happen to have exp1 >= 2000, then we’ll never evaluate exp2 or exp3! We don’t need them! We’ll save ourselves from having to make the expensive calls to reallyLongFunction. As a result, our program will run faster.
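
We can even verify this by passing undefined for the arguments that go unused; since they are never evaluated, the program does not crash (a sketch reusing function2 from above):

```haskell
function2 :: Int -> Int -> Int -> Int
function2 exp1 exp2 exp3 = if exp1 < 1000
  then exp2
  else if exp1 < 2000
    then exp3
    else exp1

main :: IO ()
main = do
  -- First argument is >= 2000, so the other two are never demanded:
  print (function2 2468 undefined undefined)  -- prints 2468
  -- First argument is < 1000, so only the second is demanded:
  print (function2 500 42 undefined)          -- prints 42
```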

For another example, suppose we have a quicksort function. The quicksort algorithm works like this:

  1. Pick a partition element
  2. Move all elements smaller than the partition into one list
  3. Move all elements larger than the partition into another list
  4. Sort these two smaller lists recursively.
  5. Combine them with the partition.

In Haskell, we can leverage this function quite easily to just get the smallest 3 elements of a list:

smallest3 :: [Int] -> [Int]
smallest3 input = take 3 (quicksort input)

quicksort :: Ord a => [a] -> [a]
...

Since Haskell is lazy, we’ll never fully sort the larger side of each partition, even in the recursive calls. In an eager language, the program would run the full sorting algorithm before taking the first three elements. This would take much longer than we need.
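
Here is one simplistic quicksort that fits this pattern (a sketch to fill in the elided body; production sorts are more sophisticated):

```haskell
smallest3 :: [Int] -> [Int]
smallest3 input = take 3 (quicksort input)

quicksort :: Ord a => [a] -> [a]
quicksort [] = []
quicksort (x:xs) = quicksort smaller ++ (x : quicksort larger)
  where
    smaller = [a | a <- xs, a <= x]  -- elements at most the partition
    larger  = [a | a <- xs, a > x]   -- elements greater than the partition

main :: IO ()
main = print (smallest3 [9, 4, 7, 1, 8, 2, 6])  -- prints [1,2,4]
```

Because (++) produces its left-hand side incrementally, take 3 only ever demands elements from the smaller partitions.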

Another cool advantage of laziness is the possibility of infinite lists. Here are two ways to create infinite lists:

allTwos :: [Int]
allTwos = repeat 2

first4 :: [Int]
first4 = cycle [1,2,3,4]

Infinite lists can provide elegant solutions to many list problems. We can never do anything that forces the full infinite list to be evaluated by, say, printing it. But we can take the first n elements of an infinite list! The "infinite" remainder will stay as an unevaluated thunk.
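
For example (a small sketch), take pulls a finite prefix out of either infinite list above:

```haskell
allTwos :: [Int]
allTwos = repeat 2

first4 :: [Int]
first4 = cycle [1,2,3,4]

main :: IO ()
main = do
  print (take 5 allTwos)  -- prints [2,2,2,2,2]
  print (take 6 first4)   -- prints [1,2,3,4,1,2]
```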

Disadvantages of Laziness

Laziness does have some disadvantages though. In particular, when using recursion or recursive data structures, unevaluated thunks can build up in heap memory. If they consume all your memory, your program will crash. The classic culprit is the foldl function. The following usage will probably fail due to a memory leak.

foldl (+) 0 [1..10^7]

Laziness can also make it harder to reason about our code. Just because a number of expressions are defined in the same where clause does not mean that they will be evaluated at the same time. This can easily confuse beginners.

Introducing Strictness

Now, if we want, there are a few ways to "remove" laziness from our code and instead introduce strictness. The first is the seq function. It has the type:

seq :: a -> b -> b

It takes two arguments. The output is simply the second argument. However, the function is "strict" in its first argument: whenever the result of seq is demanded, the first argument is evaluated as well (to weak head normal form, meaning down to its outermost constructor). For example, consider the following:

>> seq 3 "Hello"
"Hello"
>> seq undefined 3
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
  undefined, called at <interactive>:2:5 in interactive:Ghci2
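
One caveat worth knowing: seq only evaluates its first argument to weak head normal form, i.e. to the outermost constructor. A tuple whose components are both undefined is already in WHNF, so seq never looks inside it (a small sketch):

```haskell
main :: IO ()
main =
  -- The pair constructor is the outermost layer; seq stops there
  -- and never touches the undefined components.
  print (seq ((undefined, undefined) :: (Int, Int)) "still fine")
```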

The foldl function has a strict counterpart foldl' (found in Data.List). Observe how it uses seq to avoid the memory leak problem:

foldl' f z [] = z
foldl' f z (x:xs) = let z' = z `f` x
  in seq z' $ foldl' f z' xs

Try the same addition example from above with foldl'. It works perfectly well. By forcing the evaluation of the running sum at each step, we prevent the buildup of unevaluated thunks on the heap. This avoids the memory leak.
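
For instance (a sketch), the sum that likely exhausts memory with foldl runs in constant space with the strict fold:

```haskell
import Data.List (foldl')

main :: IO ()
main = print (foldl' (+) 0 [1..10^7 :: Int])  -- prints 50000005000000
```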

We can also introduce strictness into our datatypes. Let’s compare two different Person types.

data Person = Person
  { pFirstName :: String
  , pLastName :: String
  , pAge :: Int }

data StrictPerson = StrictPerson
  { firstName :: !String
  , lastName :: !String
  , age :: !Int }

The StrictPerson data type (with the exclamation points) is strict in all its arguments. Whenever a StrictPerson object is evaluated, all its fields will be evaluated as well. This is not the case with the regular Person type. One difference is that we can use record syntax to create an incomplete person. We cannot create an incomplete StrictPerson:

-- This works! (though the compiler might give us a warning)
ageOfIncomplete :: Int
ageOfIncomplete = pAge incompletePerson
  where
    incompletePerson = Person { pAge = 25}

-- This will not compile
ageOfIncompleteStrict :: Int
ageOfIncompleteStrict = age incompletePerson
  where 
    incompletePerson = StrictPerson {age = 25}

Naturally, even the Person object would fail at runtime if we tried to access the undefined pFirstName field. In general, it is good to introduce strictness into your data types. The exception is if one of the fields is possibly unnecessary and expensive to compute.
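
We can see the difference at runtime as well. In this sketch (with the records trimmed to two fields), a lazy Person with an undefined field is fine so long as we never touch that field, while merely evaluating a StrictPerson forces every field:

```haskell
import Control.Exception (SomeException, evaluate, try)

data Person = Person
  { pFirstName :: String
  , pAge :: Int }

data StrictPerson = StrictPerson
  { firstName :: !String
  , age :: !Int }

main :: IO ()
main = do
  -- Lazy fields: the undefined name is never forced, so this is fine.
  print (pAge (Person undefined 25))  -- prints 25
  -- Strict fields: evaluating the constructor forces every field.
  result <- try (evaluate (StrictPerson undefined 25))
    :: IO (Either SomeException StrictPerson)
  case result of
    Left _  -> putStrLn "strict constructor forced the undefined field"
    Right _ -> putStrLn "no exception"
```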

Summary

  1. Haskell is a lazy language. It does not evaluate expressions until it absolutely must.
  2. This frequently allows our programs to save time by avoiding unnecessary computation, but they are at more of a risk to leak memory.
  3. There are ways of introducing strictness into our programs when we don’t want lazy evaluation.

If you're ready to try out your Haskell skills, check out our free workbook featuring content on recursion and higher order functions, plus 10 practice problems!
