Taking a Byte out of Strings

Earlier this week we learned about the Text type, which is a more efficient alternative to String. But there's one more set of string-y types we need to learn about, and these are "ByteStrings"!

The Text types capture a Unicode representation of character data. ByteString is lower-level and stores its information as raw bytes. A normal String is a list of Char values, but a ByteString is a packed array of Word8 values. A Word8 is an 8-bit (1-byte) unsigned integer!
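The following minimal sketch makes the "packed bytes" idea concrete. The name `bytes` is only for illustration. Indexing a ByteString with `B.index` returns a Word8, not a Char:

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Three raw bytes: 'H' (72), 'i' (105), and 255, the largest value a Word8 can hold.
bytes :: B.ByteString
bytes = B.pack [72, 105, 255]

-- Indexing gives back a number, not a character.
firstByte :: Word8
firstByte = B.index bytes 0

main :: IO ()
main = do
  print (B.length bytes) -- 3
  print firstByte        -- 72
  print (B.last bytes)   -- 255
```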

This means that in the main Data.ByteString module, the pack and unpack functions convert back and forth between [Word8] and ByteString, rather than between String and ByteString:

pack :: [Word8] -> ByteString

unpack :: ByteString -> [Word8]

This can make ByteStrings seem quite tedious to construct:

>> import qualified Data.ByteString as B
>> let b = B.pack [72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 33]
>> b
"Hello world!"
>> B.unpack b
[72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 33]
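Building that list of numbers by hand is tedious. Below is a sketch of a hypothetical helper, `stringToBytes`, which is not part of the library. It converts each Char with `ord` and `fromIntegral`. Converting to Word8 with `fromIntegral` keeps only the low 8 bits, so any character above code point 255 is silently mangled:

```haskell
import qualified Data.ByteString as B
import Data.Char (ord)

-- Hypothetical helper, for illustration only: one byte per Char.
-- fromIntegral :: Int -> Word8 wraps modulo 256, so wide characters are truncated.
stringToBytes :: String -> B.ByteString
stringToBytes = B.pack . map (fromIntegral . ord)

main :: IO ()
main = do
  print (B.unpack (stringToBytes "Hi!"))  -- [72,105,33]
  -- '\955' is 'λ' (code point 955), and 955 mod 256 = 187
  print (B.unpack (stringToBytes "\955")) -- [187]
```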

There are two ways around this. Once again, we can use the "OverloadedStrings" extension:

>> :set -XOverloadedStrings
>> let b = "Hello world!" :: B.ByteString
>> :t b
b :: B.ByteString
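In a source file, a LANGUAGE pragma at the top of the module enables the same trick. Here is a minimal sketch, where the name `greeting` is illustrative. Note that the string-literal instance for ByteString also keeps only the low 8 bits of each character, so it works best for ASCII literals:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B

-- With OverloadedStrings, a string literal can have type ByteString directly.
greeting :: B.ByteString
greeting = "Hello world!"

main :: IO ()
main = do
  print greeting            -- "Hello world!"
  print (B.length greeting) -- 12
```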

There is also another module called Data.ByteString.Char8. In this module, pack and unpack work with String instead of lists of Word8. Note, though, that each character is truncated to 8 bits. Any character with a code point above 255 (that is, anything beyond ASCII and Latin-1) will be silently corrupted.

>> import qualified Data.ByteString.Char8 as BC
>> let b = BC.pack "Hello world!"
>> b
"Hello world!"
>> :t b
b :: BC.ByteString
>> :t BC.unpack b
BC.unpack b :: [Char]
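A character outside the 8-bit range shows the truncation clearly. In this sketch, 'λ' is written as the escape \955 (code point 955). It comes back as a different character, because only 955 mod 256 = 187 survives:

```haskell
import qualified Data.ByteString.Char8 as BC

-- ASCII text survives a pack/unpack round trip unchanged.
asciiRoundTrip :: Bool
asciiRoundTrip = BC.unpack (BC.pack "Hello world!") == "Hello world!"

-- '\955' is 'λ'; Char8's pack keeps only its low 8 bits (955 mod 256 = 187).
lambdaBytes :: BC.ByteString
lambdaBytes = BC.pack "\955"

main :: IO ()
main = do
  print asciiRoundTrip          -- True
  print (BC.length lambdaBytes) -- 1
  print (BC.unpack lambdaBytes) -- "\187", which is '»', not 'λ'
```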

Like Text, ByteString comes in both strict and lazy versions. You convert between them the same way, with fromStrict and toStrict from the lazy module.

>> import qualified Data.ByteString as B
>> import qualified Data.ByteString.Lazy as BL
>> :set -XOverloadedStrings
>> let b1 = "Hello" :: B.ByteString
>> let b2 = "World" :: BL.ByteString
>> let b3 = BL.fromStrict b1
>> let b4 = BL.toStrict b2
>> :t b3
b3 :: BL.ByteString
>> :t b4
b4 :: B.ByteString
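The following sketch shows one reason to switch between the two, using illustrative names. A strict value and a lazy value are combined by lifting the strict one to lazy, appending with `<>`, and converting the result back to strict:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL

strictHello :: B.ByteString
strictHello = "Hello"

lazyWorld :: BL.ByteString
lazyWorld = "World"

-- Lift the strict value to lazy, append, then convert back to a strict ByteString.
combined :: B.ByteString
combined = BL.toStrict (BL.fromStrict strictHello <> " " <> lazyWorld)

main :: IO ()
main = print combined -- "Hello World"
```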

The last thing to note is that we can convert directly between Text and ByteString. This is often preferable to using String as a go-between, because that route (through Data.ByteString.Char8) can silently truncate characters. Converting directly also preserves strictness or laziness. The following functions exist in both Data.Text.Encoding, for strict Text and ByteString, and Data.Text.Lazy.Encoding, for the lazy versions.

The catch is that we need to know which encoding the bytes use. Most often this is UTF-8, which has the following functions:

encodeUtf8 :: Text -> ByteString

decodeUtf8 :: ByteString -> Text

Let's see these sorts of functions in action:

>> import qualified Data.Text as T
>> import qualified Data.ByteString as B
>> import Data.Text.Encoding
>> let t = T.pack "Hello world"
>> let b = encodeUtf8 t
>> b
"Hello world"
>> :t b
b :: B.ByteString
>> let t2 = decodeUtf8 b
>> t2
"Hello world"
>> :t t2
t2 :: T.Text
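With plain ASCII, the bytes look identical to the text, so it is worth checking a non-ASCII character too. In this sketch, UTF-8 stores 'λ' (\955) as two bytes. Decoding restores the original Text, unlike the truncating Char8 route:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8, encodeUtf8)

-- '\955' is 'λ', followed by an ASCII 'x'.
original :: T.Text
original = T.pack "\955x"

utf8Bytes :: B.ByteString
utf8Bytes = encodeUtf8 original

main :: IO ()
main = do
  print (B.unpack utf8Bytes)               -- [206,187,120]: two bytes for 'λ', one for 'x'
  print (decodeUtf8 utf8Bytes == original) -- True
```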

Encoding a Text as a ByteString always succeeds. Decoding a ByteString, however, can fail, because an arbitrary sequence of bytes is not necessarily valid UTF-8. When decodeUtf8 receives invalid input, it throws an exception. If you want to handle this case yourself, use one of the following functions instead:

decodeUtf8' :: ByteString -> Either UnicodeException Text

decodeUtf8With :: OnDecodeError -> ByteString -> Text
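Here is a minimal sketch of both approaches. It assumes an input containing the byte 0xFF, which can never appear in valid UTF-8. `decodeUtf8'` reports the failure as a Left. `decodeUtf8With` takes an error handler, and `lenientDecode` from Data.Text.Encoding.Error replaces each bad byte with the Unicode replacement character U+FFFD. The `safeDecode` wrapper is a hypothetical helper:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- 'h', an invalid byte (0xFF), then 'i'.
badBytes :: B.ByteString
badBytes = B.pack [104, 0xFF, 105]

-- Hypothetical helper: turn a decoding failure into Nothing instead of an exception.
safeDecode :: B.ByteString -> Maybe T.Text
safeDecode bs = either (const Nothing) Just (decodeUtf8' bs)

-- Replace invalid bytes with U+FFFD rather than failing.
lenient :: T.Text
lenient = decodeUtf8With lenientDecode badBytes

main :: IO ()
main = do
  print (safeDecode badBytes)            -- Nothing
  print (safeDecode (B.pack [104, 105])) -- Just "hi"
  print lenient                          -- "h\65533i"
```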

UTF-8 is not the only option. Data.Text.Encoding also handles UTF-16 and UTF-32. For these, you also need to know whether the data is "big-endian" (BE) or "little-endian" (LE), meaning the order of the bytes within each 16- or 32-bit unit. The function names spell this out:

encodeUtf16LE :: Text -> ByteString

decodeUtf32BEWith :: OnDecodeError -> ByteString -> Text
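The following sketch shows what endianness changes. It encodes the single character 'A' (code point 65) in a few of these forms, and only the byte order differs. The decoding call uses `lenientDecode` as its error handler:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf32BEWith, encodeUtf16BE, encodeUtf16LE, encodeUtf32BE)
import Data.Text.Encoding.Error (lenientDecode)

letterA :: T.Text
letterA = T.pack "A"

main :: IO ()
main = do
  print (B.unpack (encodeUtf16LE letterA)) -- [65,0]: low byte first
  print (B.unpack (encodeUtf16BE letterA)) -- [0,65]: high byte first
  print (B.unpack (encodeUtf32BE letterA)) -- [0,0,0,65]
  -- Decoding with the matching encoding gives back the original Text.
  print (decodeUtf32BEWith lenientDecode (encodeUtf32BE letterA) == letterA) -- True
```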

It can be difficult to keep all these conversions straight, so here's the TL;DR, with four conversions to remember:

  1. String <-> Text - Use Data.Text, with pack and unpack.
  2. String <-> ByteString - Use Data.ByteString.Char8, with pack and unpack (only safe for 8-bit characters such as ASCII).
  3. Text <-> ByteString - Use Data.Text.Encoding, with encodeUtf8 and decodeUtf8.
  4. Strict and Lazy - Use the "Lazy" module (Data.Text.Lazy or Data.ByteString.Lazy), with fromStrict and toStrict.
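All four conversions fit in one small module. This sketch uses illustrative names and an ASCII-only string, so the Char8 route is safe here:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Lazy as TL

str :: String
str = "Hello world"

-- 1. String <-> Text
txt :: T.Text
txt = T.pack str

-- 2. String <-> ByteString (only safe for 8-bit characters such as ASCII)
bytes :: B.ByteString
bytes = BC.pack str

-- 3. Text <-> ByteString, via UTF-8
utf8 :: B.ByteString
utf8 = TE.encodeUtf8 txt

-- 4. Strict <-> Lazy, for both Text and ByteString
lazyTxt :: TL.Text
lazyTxt = TL.fromStrict txt

lazyBytes :: BL.ByteString
lazyBytes = BL.fromStrict bytes

main :: IO ()
main = do
  print (T.unpack txt == str)            -- True
  print (BC.unpack bytes == str)         -- True
  print (utf8 == bytes)                  -- True: ASCII bytes are the same in UTF-8 and Char8
  print (TE.decodeUtf8 utf8 == txt)      -- True
  print (TL.toStrict lazyTxt == txt)     -- True
  print (BL.toStrict lazyBytes == bytes) -- True
```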

Like text, bytestring is a separate package, so you'll need to include it in your project with Stack or Cabal (or Nix). To learn how to use the Stack tool, sign up for our free Stack mini-course!
