Need to be Faster? Be Lazy!

In more procedural and object-oriented languages, we write code as a series of commands, and those commands are executed in the order we write them, no matter what. Consider this example:

int myFunction(int a, int b, int c) {
  int result1 = longCalculation1(a,b);
  int result2 = longCalculation2(b,c);
  int result3 = longCalculation3(a,c);
  if (result1 < 10) {
    return result1;
  } else if (result2 < 100) {
    return result2;
  } else {
    return result3;
  }
}

There’s a clear inefficiency here. No matter what, we’ll perform all three long-running calculations, even though we might not actually need all the results! We could rewrite the code to get around this.

int myFunction(int a, int b, int c) {
  int result1 = longCalculation1(a,b);
  if (result1 < 10) {
    return result1;
  } else {
    int result2 = longCalculation2(b,c);
    if (result2 < 100) {
      return result2;
    } else {
      int result3 = longCalculation3(a,c);
      return result3;
    }
  }
}

But now it’s a little less clear what’s going on. The code isn’t as readable. And there are some situations where this kind of refactoring is impossible. This is an inevitable consequence of the paradigm of eager evaluation in almost all mainstream languages. In Haskell we write expressions, rather than commands. Thus evaluation order is a little less clear. In fact, Haskell expressions are evaluated lazily. We don’t perform any calculations until we’re sure they’re needed! Let’s see how this works.

How Laziness Works

Here’s how we can write the function above in Haskell:

myFunction :: Int -> Int -> Int -> Int
myFunction a b c =
  let result1 = longCalculation1 a b
      result2 = longCalculation2 b c
      result3 = longCalculation3 a c
  in if result1 < 10
       then result1
       else if result2 < 100
         then result2
         else result3

While this seems semantically identical to the first C++ version, it actually runs as efficiently as the second version! In Haskell, result1, result2, and result3 get stored as “thunks”. GHC sets aside a piece of memory for each result and records the calculation that produces it, but it doesn’t perform that calculation until the result is actually needed.
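We can observe this thunk behavior directly. In the sketch below (the names are just for illustration), one of the bindings is an error thunk; because its value is never demanded, the error never fires:

```haskell
-- A minimal sketch: `expensive` is bound to an error thunk, but since
-- its value is never demanded, the error never fires.
lazyBranch :: Int
lazyBranch =
  let cheap     = 5
      expensive = error "this thunk is never evaluated!"
  in if cheap < 10
       then cheap
       else expensive
```

Evaluating lazyBranch gives us 5, and the expensive branch never runs. In an eager language, binding expensive would have blown up immediately.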

Here’s another example. Suppose we want all Pythagorean triples whose sum is less than 1000. That sounds like a tall order. But enter the following into GHCi, and you’ll see that it finishes very quickly!

>> let triples = [(a,b,c) | a <- [1..1000], b <- [1..1000], c <- [1..1000], a + b + c < 1000, (a ** 2) + (b ** 2) == (c ** 2)]

Did it really perform all that calculation so quickly? Of course not! If you now print triples, it will take a while to print everything out. But suppose we only want 5 examples. That doesn’t take long at all!

>> take 5 triples
[(3.0,4.0,5.0),(4.0,3.0,5.0),(5.0,12.0,13.0),(6.0,8.0,10.0),(7.0,24.0,25.0)]
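A side note: the `**` operator is floating-point exponentiation, which is why these results print as Doubles. If we wanted the triples as plain Ints, we could use the integer power operator `^` instead. Here’s a sketch, with a hypothetical intTriples name:

```haskell
-- The same comprehension with integer exponentiation (^), so the
-- triples come out as Ints instead of Doubles. It's still lazy:
-- nothing is computed until elements are demanded.
intTriples :: [(Int, Int, Int)]
intTriples =
  [ (a, b, c)
  | a <- [1..1000], b <- [1..1000], c <- [1..1000]
  , a + b + c < 1000
  , a ^ 2 + b ^ 2 == c ^ 2
  ]
```

Now take 5 intTriples gives the same five triples, without the trailing .0s.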

As we can see, an element is typically only evaluated when it’s needed by an IO action, such as print. If you print the result of a calculation in GHCi, the whole calculation has to be evaluated. In a compiled program, the same thing happens once the result (or an expression that depends on it) gets printed by main.
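We can watch this happen with the trace function from Debug.Trace, a debugging helper in base. In this small sketch, the message only appears at the moment the thunk is forced:

```haskell
import Debug.Trace (trace)

-- `trace` prints its message at the moment its value is evaluated,
-- letting us watch exactly when a thunk gets forced.
tracedSum :: Int
tracedSum = trace "evaluating now!" (2 + 3)
```

If you bind tracedSum in GHCi, nothing is printed. The moment you force it, say with print tracedSum, “evaluating now!” appears, followed by 5.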

Infinite Lists as a Consequence of Laziness

Besides potentially saving time, laziness has some other interesting consequences. One of these is that we can have data structures that can’t exist in other languages. For instance, we can define an “infinite” list:

>> let infList = [1..]

This list starts at 1, and each element counts up by 1, going up to infinity! But how is this possible? We don’t have an infinite amount of memory! The key is that we don’t actually evaluate any of the elements until we need them. For example, we can take the first 10 elements of an infinite list.

>> take 10 [1..]
[1,2,3,4,5,6,7,8,9,10]

Of course, if we try to print the entire list, we’ll run into problems!

>> [1..]
(Endless printing of numbers)

But there are some cool things we can do with infinite lists. For instance, it’s easy to match up a list of elements with the numeric index of the element in the list. We can do this by using zip in conjunction with an infinite list:

addIndex :: [a] -> [(Int, a)]
addIndex = zip [1..]

Or we could match every element with its index modulo 4:

addIndexMod4 :: [a] -> [(Int, a)]
addIndexMod4 = zip (cycle [0,1,2,3])
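Infinite lists can even be defined in terms of themselves. The classic example is the Fibonacci sequence, sketched below: laziness lets each element be computed from the two before it, only on demand.

```haskell
-- The classic self-referential infinite list: each Fibonacci number
-- is the sum of the two before it, computed only when demanded.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
```

For instance, take 8 fibs gives us [0,1,1,2,3,5,8,13].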

Disadvantages of Laziness

Haskell’s laziness isn’t a free lunch, though. It can get us into trouble sometimes. While it often saves us time, it can cost us in terms of space. This is apparent even in a simple example using foldl.

>> foldl (+) 0 [1..100000000]
Stack overflow!

When we add the numbers up through 100 million, we should be able to do it in constant memory: all we need is a running tally of the current sum. On the one hand, laziness means the entire list of a hundred million numbers is never fully in memory at once. But on the other hand, all the additions in that running tally also happen lazily! So at some point, our memory footprint actually looks like this:

(((((1 + 2) + 3) + 4) + …) + 100000000)

That is, all the individual numbers are in memory at the same time, because the + operations aren’t evaluated until they need to be! In situations like this, we want to introduce strictness into our code. We can do this with the seq function. This function is a little special. It takes two arguments and returns the second of them; however, it is strict in its first argument. That is, the first item we pass to it gets evaluated (to weak head normal form, which for a plain number means evaluated completely) before seq returns its result.
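To see seq’s strictness in isolation, here’s a small sketch (the names are just for illustration):

```haskell
-- Without seq, an argument we never use stays an unevaluated thunk:
neverForced :: Int
neverForced = snd (error "untouched thunk", 42)  -- evaluates to 42

-- With seq, the first argument is forced before the second is
-- returned, so evaluating this throws the error:
forcedBySeq :: Int
forcedBySeq = error "forced!" `seq` 42
```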

We can see this in use in the definition of foldl', the strict counterpart to foldl:

foldl' f accum [] = accum
foldl' f accum (x : xs) = let newAccum = f accum x
                          in seq newAccum $ foldl' f newAccum xs

The use of seq here causes Haskell to evaluate newAccum strictly, so we don't keep storing calculations in memory. Using this technique, we can now actually add up that list of integers!

>> foldl' (+) 0 [1..100000000]
5000000050000000
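In practice, we don’t have to define foldl' ourselves: it’s exported from Data.List (and, in recent versions of base, from the Prelude as well). Here’s a quick sketch of using the library version, with a hypothetical sumStrict wrapper:

```haskell
import Data.List (foldl')

-- The library's strict left fold keeps the accumulator evaluated as
-- it goes, so summing a large list runs in constant space.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0
```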

Conclusion

Laziness is another feature that makes Haskell pretty unique among programming languages. Like any language feature, it has its drawbacks: it’s one more way we have to reason about Haskell differently than we do about other languages. But it also has distinct advantages. It can make our code significantly faster in many cases, and it allows us to use structures, like infinite lists, that don’t exist in other languages.

Hopefully this has convinced you to give Haskell a try. Take a look at our Getting Started Checklist and get going!
