Using Binary Mode in Haskell
So far in our IO adventures, we've only been dealing with plain text files. But a lot of data isn't meant to be read as string data. Some of the most interesting and important problems in computing today are about reading image data and processing it so our programs can understand what's going on. Executable program files are also in a binary format, rather than human readable. So today, we're going to explore how IO works with binary files.
First, it's important to understand that handles have encodings, which we can retrieve using hGetEncoding. For the most part, your files will default to UTF-8.
hGetEncoding :: Handle -> IO (Maybe TextEncoding)
main :: IO ()
main = do
  hGetEncoding stdin >>= print
  hGetEncoding stdout >>= print
  h <- openFile "testfile.txt" ReadMode
  hGetEncoding h >>= print
...
Just UTF-8
Just UTF-8
Just UTF-8
There are other encodings of course, like char8, latin1, and utf16. These are different ways of turning text into bytes, and each TextEncoding expression refers to one of these. If you know you have a file written in UTF-16, you can change the encoding using hSetEncoding:
hSetEncoding :: Handle -> TextEncoding -> IO ()
main :: IO ()
main = do
  h <- openFile "myutf16file.txt" ReadMode
  hSetEncoding h utf16
  myString <- hGetLine h
...
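To make the idea of "different ways of turning text into bytes" concrete, here is a minimal sketch that writes the same string with two encodings and reads one of them back. The helpers writeWith and readWith and both file names are made up for this illustration; only the System.IO functions are real.

```haskell
import System.IO

-- Write one line of text to a file using the given encoding.
writeWith :: TextEncoding -> FilePath -> String -> IO ()
writeWith enc path str = withFile path WriteMode $ \h -> do
  hSetEncoding h enc
  hPutStrLn h str

-- Read the first line of a file back, decoding with the given encoding.
readWith :: TextEncoding -> FilePath -> IO String
readWith enc path = withFile path ReadMode $ \h -> do
  hSetEncoding h enc
  hGetLine h

main :: IO ()
main = do
  -- Both file names are hypothetical scratch files for this sketch.
  writeWith utf8 "sample_utf8.txt" "hello"
  writeWith utf16 "sample_utf16.txt" "hello"
  -- Same text, different number of bytes on disk.
  withFile "sample_utf8.txt" ReadMode hFileSize >>= print
  withFile "sample_utf16.txt" ReadMode hFileSize >>= print
  -- Reading with the matching encoding recovers the original string.
  readWith utf16 "sample_utf16.txt" >>= putStrLn
```

The two printed sizes should differ even though the text is identical, which is exactly why the reader has to know which encoding the writer used.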
But now notice that hGetEncoding returns a Maybe value. For binary files, there is no encoding! We are only allowed to read raw data. You can set a handle to read as binary by using hSetBinaryMode h True, or by just using openBinaryFile.
hSetBinaryMode :: Handle -> Bool -> IO ()
openBinaryFile :: FilePath -> IOMode -> IO Handle
main :: IO ()
main = do
  h <- openBinaryFile "pic_1.bmp" ReadMode
...
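As a quick check of the Maybe behavior, this sketch asks the same file for its encoding three ways: as a text handle, as a binary handle, and as a text handle switched into binary mode. The helper encodingName and the scratch file name are assumptions for the example; the exact encoding reported for the text handle depends on your locale.

```haskell
import System.IO

-- The name of the encoding a handle reports, if it has one.
encodingName :: Handle -> IO (Maybe String)
encodingName h = fmap show <$> hGetEncoding h

main :: IO ()
main = do
  -- "encoding_demo.bin" is a hypothetical scratch file created here.
  writeFile "encoding_demo.bin" "abc"
  -- A text handle reports an encoding (which one depends on the locale).
  withFile "encoding_demo.bin" ReadMode encodingName >>= print
  -- A binary handle reports no encoding at all.
  withBinaryFile "encoding_demo.bin" ReadMode encodingName >>= print
  -- Switching an open text handle to binary mode drops its encoding.
  toggled <- withFile "encoding_demo.bin" ReadMode $ \h -> do
    hSetBinaryMode h True
    encodingName h
  print toggled
```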
When it comes to processing binary data, it is best to read your input into a ByteString rather than a String. Using the unpack function will then allow you to operate on the raw list of bytes:
import qualified Data.ByteString as B
import System.IO
main :: IO ()
main = do
  h <- openBinaryFile "pic_1.bmp" ReadMode
  -- unpack turns the ByteString into a plain list of bytes ([Word8])
  inputBytes <- B.unpack <$> B.hGetContents h
  print $ length inputBytes
In this example, I've opened up an image file and converted its data into a list of bytes (using the Word8 type).
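You don't always need the full list, since ByteString has its own versions of functions like length and take. Here is a small sketch using made-up bytes in place of a real file; firstBytes is a helper name I've invented for the example.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- The first n bytes of a ByteString as a plain list.
firstBytes :: Int -> B.ByteString -> [Word8]
firstBytes n = B.unpack . B.take n

main :: IO ()
main = do
  -- Made-up bytes standing in for file contents.
  let bytes = B.pack [66, 77, 54, 3, 0, 0]
  print (B.length bytes)           -- length without unpacking
  print (firstBytes 2 bytes)       -- a small slice as [Word8]
  print (length (B.unpack bytes))  -- same length, via the list
```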
Further processing of the image will require some knowledge of the image format. As a basic example, I made a 24-bit bitmap with horizontal stripes throughout. The size was 16 pixels by 16 pixels. With 3 bytes (24 bits) per pixel, the total size of the "image" would be 768 bytes. So upon seeing that my program above printed "822", I could figure out that the first 54 bytes were just header data.
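Rather than inferring the header size by subtraction, you can also read it out of the header itself. The sketch below assumes the common BMP layout (a 14-byte file header followed by a 40-byte info header) and a positive height field. All of the names here (leAt, BmpInfo, bmpInfo, rowBytes, le32, sampleHeader) are mine, and sampleHeader is a synthetic header built to resemble a 16x16, 24-bit image, not the actual bytes of my file.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Read an n-byte little-endian unsigned number starting at the given offset.
leAt :: Int -> Int -> B.ByteString -> Int
leAt offset n bytes =
  B.foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0
          (B.take n (B.drop offset bytes))

-- The handful of header fields this sketch cares about.
data BmpInfo = BmpInfo
  { pixelOffset  :: Int -- byte offset where the pixel data starts
  , widthPx      :: Int
  , heightPx     :: Int
  , bitsPerPixel :: Int
  } deriving (Show, Eq)

-- Field offsets assume a 14-byte file header plus a 40-byte info header.
bmpInfo :: B.ByteString -> BmpInfo
bmpInfo bytes = BmpInfo
  { pixelOffset  = leAt 10 4 bytes
  , widthPx      = leAt 18 4 bytes
  , heightPx     = leAt 22 4 bytes
  , bitsPerPixel = leAt 28 2 bytes
  }

-- Bytes per row; BMP rows are padded up to a multiple of 4 bytes.
rowBytes :: BmpInfo -> Int
rowBytes info = ((bitsPerPixel info * widthPx info + 31) `div` 32) * 4

-- Encode a number as 4 little-endian bytes (only used to build the sample).
le32 :: Int -> [Word8]
le32 n = [fromIntegral (n `shiftR` (8 * i)) | i <- [0 .. 3]]

-- Synthetic 54-byte header for a 16x16, 24-bit image (an assumption,
-- not the real contents of pic_1.bmp).
sampleHeader :: B.ByteString
sampleHeader = B.pack $
  [0x42, 0x4D] ++ le32 822 ++ [0, 0, 0, 0] ++ le32 54     -- file header
    ++ le32 40 ++ le32 16 ++ le32 16 ++ [1, 0] ++ [24, 0] -- start of info header
    ++ replicate 24 0                                     -- remaining fields

main :: IO ()
main = do
  let info = bmpInfo sampleHeader
  print info
  print (rowBytes info)
  -- Header size plus all rows gives the expected file size.
  print (pixelOffset info + rowBytes info * heightPx info)
```

For this sample header, the pixel offset comes out as 54 and each row as 48 bytes, so the computed total is 822. With a real file, you would pass the ByteString you read from the handle to bmpInfo instead.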
I could then separate my data into "lines" (48-byte chunks) and I successfully observed that each of these chunks followed a specific pattern. Many lines were all white (the only value was 255), and other lines had three repeating values.
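import Control.Monad (forM_)
import qualified Data.ByteString as B
import Data.List.Split (chunksOf)
import System.IO
main :: IO ()
main = do
  h <- openBinaryFile "pic_1.bmp" ReadMode
  inputBytes <- B.unpack <$> B.hGetContents h
  -- skip the 54-byte header, then split into 48-byte rows (16 pixels * 3 bytes)
  let lines = chunksOf 48 (drop 54 inputBytes)
  forM_ lines print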
...
[255, 255, 255, ...]
[36, 28, 237, 36, 28, 237, ...]
[255, 255, 255, ...]
[76, 177, 34, 76, 177, 34, ...]
[255, 255, 255, ...]
[36, 28, 237, 36, 28, 237, ...]
[255, 255, 255, ...]
[76, 177, 34, 76, 177, 34, ...]
[255, 255, 255, ...]
[0, 242, 255, 0, 242, 255, ...]
[255, 255, 255, ...]
[232, 162, 0, 232, 162, 0, ...]
[255, 255, 255, ...]
[0, 242, 255, 0, 242, 255, ...]
[255, 255, 255, ...]
[232, 162, 0, 232, 162, 0, ...]
Now that the data is broken into simple numbers, it would be possible to run many kinds of mathematical algorithms on it if there were some interesting data to process!
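As one small example of such processing, here is a sketch that groups a row's bytes into pixels and lists the distinct colors in it. It assumes the usual 24-bit BMP layout, where each pixel is stored as blue, green, red, so the triple gets reordered into (R, G, B). The names Pixel, rowToPixels and rowColors are invented for the example, and the two rows are rebuilt by hand from the output above rather than read from the file.

```haskell
import Data.List (nub)
import Data.Word (Word8)

-- An (R, G, B) triple.
type Pixel = (Word8, Word8, Word8)

-- Group a row of raw bytes into pixels. Assumes each pixel is stored as
-- blue, green, red. Trailing bytes that don't form a full pixel are dropped.
rowToPixels :: [Word8] -> [Pixel]
rowToPixels (b : g : r : rest) = (r, g, b) : rowToPixels rest
rowToPixels _ = []

-- The distinct colors in a row, in order of first appearance.
rowColors :: [Word8] -> [Pixel]
rowColors = nub . rowToPixels

main :: IO ()
main = do
  -- Two 48-byte rows shaped like the ones printed above.
  let whiteRow = replicate 48 255
      stripeRow = concat (replicate 16 [36, 28, 237])
  print (rowColors whiteRow)   -- prints [(255,255,255)]
  print (rowColors stripeRow)  -- prints [(237,28,36)]
```

Under that byte-order assumption, the repeating 36, 28, 237 pattern is a single red-ish color, which matches the idea that each striped row is one solid color.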
In our last couple of IO articles, we'll keep looking at some issues with binary data. If you want monthly summaries of what we're writing here at Monday Morning Haskell, make sure to subscribe to our monthly newsletter! This will also give you access to our subscriber resources!