Buffering...Please Wait...

Today we continue our exploration of more obscure IO concepts with the idea of buffering. Buffering determines the precise mechanics of how our program reads from and writes to files. In the right circumstances, choosing the proper buffering mode can make your program a lot more efficient.

To start, let's consider the different options Haskell offers us. The BufferMode type has three options:

data BufferMode =
  NoBuffering |
  LineBuffering |
  BlockBuffering (Maybe Int)

Every handle has an assigned buffering mode. We can get and set this value using the appropriate functions:

hGetBuffering :: Handle -> IO BufferMode

hSetBuffering :: Handle -> BufferMode -> IO ()
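
As a quick illustration of these two functions together, here is a minimal sketch that sets a mode on a freshly opened file handle and reads it back. The helper name roundTripMode and the file path argument are my own inventions for this example, not part of System.IO.

```haskell
import System.IO

-- Hypothetical helper: set a buffering mode on a file handle, then ask
-- the handle which mode it is using. The caller supplies a writable path.
roundTripMode :: FilePath -> BufferMode -> IO BufferMode
roundTripMode path mode =
  withFile path WriteMode $ \h -> do
    hSetBuffering h mode
    hGetBuffering h
```

Calling roundTripMode "scratch.txt" LineBuffering from GHCi should hand back the same mode you passed in, since BufferMode has Show and Eq instances you can print or compare the result directly.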

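When I ran the program below, the terminal handles reported NoBuffering and the file handles reported BlockBuffering. Keep in mind that the terminal defaults depend on your environment: GHCi sets stdin and stdout to NoBuffering, while a compiled GHC program attached to a terminal will typically report LineBuffering for them.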

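import System.IO

main :: IO ()
main = do
  -- Print the default buffering mode of each handle
  hGetBuffering stdin >>= print
  hGetBuffering stdout >>= print
  -- "myfile.txt" must already exist to be opened in ReadMode
  openFile "myfile.txt" ReadMode >>= hGetBuffering >>= print
  openFile "myfile2.txt" WriteMode >>= hGetBuffering >>= print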

...

NoBuffering
NoBuffering
BlockBuffering Nothing
BlockBuffering Nothing

So far this seems like some nice trivia to know, but what do these terms actually mean?

Well, when your program reads and writes to files, it doesn't do the "writing" at the exact time you expect. When your program executes hPutStr or hPutStrLn, the given string will be added to the handle's buffer, but depending on the mode, it won't immediately be written out to the file.

If you use NoBuffering though, it will be written immediately. Once the buffer has even a single character, it will write this character to the file. If you use LineBuffering, it will wait until it encounters a newline character.

Finally, there is BlockBuffering. This constructor holds an optional number. The buffer won't write until it contains the given number of bytes. If the value is Nothing, the buffer size is left up to the implementation.
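
To see the three modes behave differently, here is a small sketch that writes a string under a given mode and then checks how many bytes have actually reached the file while the handle is still open. The helper name onDiskAfterWrite is hypothetical, and I'm assuming the directory package is available for getFileSize (it ships with GHC).

```haskell
import System.Directory (getFileSize)
import System.IO

-- Hypothetical helper: write a string under the given buffering mode and
-- report the file's on-disk size before the handle is closed.
onDiskAfterWrite :: FilePath -> BufferMode -> String -> IO Integer
onDiskAfterWrite path mode str = do
  h <- openFile path WriteMode
  hSetBuffering h mode
  hPutStr h str
  size <- getFileSize path  -- asks the file system, not the handle
  hClose h
  pure size
```

With GHC, I would expect onDiskAfterWrite "demo.txt" NoBuffering "abc" to report 3 bytes, while the same call with LineBuffering reports 0 because no newline has been seen yet. Passing "abc\n" with LineBuffering should report a non-zero size, and BlockBuffering Nothing should still report 0 for such a short string.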

This idea might sound dangerous to you. Does this mean that it's likely that your program will just leave data unwritten if it doesn't get the right amount? Well no. You can also flush buffers, which will cause them to write their information out no matter what. This happens automatically on important operations like hClose (remember to close your handles!). You can also do this manually with the hFlush function:

hFlush :: Handle -> IO ()
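
Here is a rough sketch of what a manual flush does to a block-buffered handle. It measures the file's size on disk just before and just after calling hFlush. The name sizesAroundFlush is made up for this example, and getFileSize comes from the directory package, which I'm assuming you have installed.

```haskell
import System.Directory (getFileSize)
import System.IO

-- Hypothetical helper: write a short string under block buffering and
-- report the on-disk size before and after an explicit flush.
sizesAroundFlush :: FilePath -> IO (Integer, Integer)
sizesAroundFlush path = do
  h <- openFile path WriteMode
  hSetBuffering h (BlockBuffering Nothing)
  hPutStr h "hello"           -- sits in the handle's buffer for now
  before <- getFileSize path
  hFlush h                    -- force the buffered bytes out to the file
  after <- getFileSize path
  hClose h
  pure (before, after)
```

On a typical GHC setup this should return (0, 5): nothing is on disk until the flush, and then all five bytes appear at once.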

For the most part, you won't notice the difference between buffer modes in normal programs. But under certain circumstances, it can make a big difference in performance. Writing to a file is a slow and expensive operation as far as programs are concerned, since each write typically means a call into the operating system. So doing fewer writes with larger amounts of data tends to be more efficient than doing more writes with smaller amounts of data.

Hopefully you can see now why BlockBuffering is an option. Typically, this is the most efficient way if you're writing a large amount of data, while NoBuffering is the least efficient.

To test this out, I wrote a simple program to write out one hundred thousand numbers to a file, and timed it with different buffer modes:

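import Control.Monad (forM_)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.IO

someFunc :: IO ()
someFunc = do
  let numbers = [1..100000]
  h <- openFile "number.txt" WriteMode
  -- Swap in LineBuffering or BlockBuffering Nothing to compare
  hSetBuffering h NoBuffering
  timestamp1 <- getCurrentTime
  forM_ numbers (hPrint h)
  -- hClose flushes the buffer, so the final write is included in the timing
  hClose h
  timestamp2 <- getCurrentTime
  print $ diffUTCTime timestamp2 timestamp1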

When running with NoBuffering, this operation took almost a full second: 0.93938s. However, when I changed to LineBuffering, it dropped to 0.2367s. Finally, with BlockBuffering Nothing, I got a blazing fast 0.05473s. That's around 17x faster than NoBuffering! So if you're writing a large amount of data to a file, this can make a definite difference!
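
If you'd like to repeat this comparison without editing the mode by hand each time, one possible approach is to take the buffer mode as a parameter. The sketch below is a variation on my program, not the exact code I timed: the names timeWrites and compareModes are made up for this example, and it uses getMonotonicTime from GHC.Clock (in base since 4.11) instead of getCurrentTime, so it needs no extra packages.

```haskell
import Control.Monad (forM_)
import GHC.Clock (getMonotonicTime)
import System.IO

-- Hypothetical helper: time how long it takes to write the numbers 1..n,
-- one per line, under the given buffering mode. Returns elapsed seconds.
timeWrites :: FilePath -> Int -> BufferMode -> IO Double
timeWrites path n mode = do
  h <- openFile path WriteMode
  hSetBuffering h mode
  start <- getMonotonicTime
  forM_ [1 .. n] (hPrint h)
  hClose h  -- closing flushes the buffer, so the flush is timed too
  end <- getMonotonicTime
  pure (end - start)

-- Run the same workload under each mode and print the timings.
compareModes :: FilePath -> Int -> IO ()
compareModes path n =
  forM_ [NoBuffering, LineBuffering, BlockBuffering Nothing] $ \mode -> do
    secs <- timeWrites path n mode
    putStrLn (show mode ++ ": " ++ show secs ++ "s")
```

Calling compareModes "number.txt" 100000 runs the same workload as above under all three modes. Your numbers will differ from mine depending on your machine and file system, but the ordering should be similar.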

If you're writing a program where write-performance is important, I hope this knowledge helps you! Even if not, it's good to know what kinds of things are happening under the hood. If you want to keep up to date with more Haskell knowledge, both obscure and obvious, make sure to subscribe to our monthly newsletter! If you're just starting out, this will give you access to resources like our Beginners Checklist and Recursion Workbook!
