Taking a Byte out of Strings
Earlier this week we learned about the Text type, which is a more efficient alternative to String. But there's one more set of string-y types we need to learn about, and these are "ByteStrings"!
The Text types capture a Unicode representation of character data. But ByteString is more low-level, storing its information at the "byte" level. A normal String is a list of the Char type, but a ByteString is conceptually a sequence of Word8 values, each an 8-bit (1 byte) unsigned integer! (Internally the bytes sit in a packed array rather than a linked list, but the API treats them as a sequence of Word8.)
This means that in the normal ByteString library, the pack and unpack functions are reserved for converting back and forth between [Word8] and ByteString, rather than a raw String:
pack :: [Word8] -> ByteString
unpack :: ByteString -> [Word8]
This can make it seem quite difficult to construct a ByteString, since every byte has to be spelled out as a number:
>> import qualified Data.ByteString as B
>> let b = B.pack [72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 33]
>> b
"Hello world!"
>> B.unpack b
[72,101,108,108,111,32,119,111,114,108,100,33]
There are two ways around this. Once again, we can use the "OverloadedStrings" extension:
>> :set -XOverloadedStrings
>> let b = "Hello world!" :: B.ByteString
>> :t b
b :: B.ByteString
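The GHCi session above uses :set, but in a source file the extension is normally switched on with a LANGUAGE pragma. Here is a minimal sketch of that; the name greeting is just an illustrative choice. Keep in mind that a ByteString literal goes through the same 8-bit truncation as Data.ByteString.Char8, so it is best kept to ASCII text.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString as B

-- With OverloadedStrings on, a string literal can be given the type ByteString.
-- (greeting is a hypothetical name used only for this sketch.)
greeting :: B.ByteString
greeting = "Hello world!"

main :: IO ()
main = do
  -- Show the raw bytes behind the literal.
  print (B.unpack greeting)
  -- The literal holds the same bytes we packed by hand earlier.
  print (greeting == B.pack [72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 33])
```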
There is also another module called Data.ByteString.Char8. In this module, the pack and unpack functions work on a String instead of a list of Word8 values. Note, though, that every character you use is truncated to 8 bits, so any character with a code point above 255 is silently corrupted. Your results may not be correct if you go beyond the simple ASCII character set.
>> import qualified Data.ByteString.Char8 as BC
>> let b = BC.pack "Hello world!"
>> b
"Hello world!"
>> :t b
b :: BC.ByteString
>> :t (BC.unpack b)
(BC.unpack b) :: [Char]
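To make the truncation warning concrete, here is a small sketch comparing an ASCII string with a single non-ASCII character. The character 'λ' (code point 955, written as the escape \955 to sidestep source-encoding issues) is an arbitrary choice for illustration, as are the names asciiBytes and lambdaBytes.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- ASCII characters fit in one byte, so nothing is lost.
asciiBytes :: [Word8]
asciiBytes = B.unpack (BC.pack "Hi")

-- 'λ' is code point 955 (0x3BB). Char8 keeps only the low 8 bits (0xBB = 187).
lambdaBytes :: [Word8]
lambdaBytes = B.unpack (BC.pack "\955")

main :: IO ()
main = do
  print asciiBytes   -- [72,105]
  print lambdaBytes  -- [187]
  -- Unpacking gives back '\187', not the original character.
  print (BC.unpack (BC.pack "\955") == "\955")  -- False
```

No error is raised at any point, which is why this kind of corruption is easy to miss.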
Like Text, ByteString comes in both strict and lazy versions, and you convert between them in the same way, with fromStrict and toStrict.
>> import qualified Data.ByteString as B
>> import qualified Data.ByteString.Lazy as BL
>> :set -XOverloadedStrings
>> let b1 = "Hello" :: B.ByteString
>> let b2 = "World" :: BL.ByteString
>> let b3 = BL.fromStrict b1
>> let b4 = BL.toStrict b2
>> :t b3
b3 :: BL.ByteString
>> :t b4
b4 :: B.ByteString
The last thing to note is that we can convert directly between Text and ByteString. This is often desirable because of the transcription errors that can occur when using String as a go-between. It also preserves strictness or laziness, since the following functions live in both Data.Text.Encoding and Data.Text.Lazy.Encoding.
The trick is that we have to have some idea of what encoding we are using. Most often, this is UTF-8. In this case, we can use the following functions:
encodeUtf8 :: Text -> ByteString
decodeUtf8 :: ByteString -> Text
Let's see these sorts of functions in action:
>> import qualified Data.Text as T
>> import qualified Data.ByteString as B
>> import Data.Text.Encoding
>> let t = T.pack "Hello world"
>> let b = encodeUtf8 t
>> b
"Hello world"
>> :t b
b :: B.ByteString
>> let t2 = decodeUtf8 b
>> t2
"Hello world"
>> :t t2
t2 :: T.Text
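The transcription errors mentioned earlier are easier to see with a non-ASCII character. The sketch below encodes a short Text containing 'λ' (an arbitrary example character, written as \955) and then decodes it two ways: directly with decodeUtf8, and through a String using Data.ByteString.Char8. All of the top-level names are hypothetical.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8, encodeUtf8)

original :: T.Text
original = T.pack "\955x"  -- two characters: 'λ' and 'x'

-- UTF-8 needs two bytes for 'λ' and one for 'x'.
utf8Bytes :: B.ByteString
utf8Bytes = encodeUtf8 original

-- Direct route: decode the bytes as UTF-8.
direct :: T.Text
direct = decodeUtf8 utf8Bytes

-- String go-between: each byte becomes its own Char, so 'λ' is split in two.
viaString :: T.Text
viaString = T.pack (BC.unpack utf8Bytes)

main :: IO ()
main = do
  print (B.unpack utf8Bytes)     -- [206,187,120]
  print (direct == original)     -- True
  print (viaString == original)  -- False
  print (T.length viaString)     -- 3
```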
Encoding a Text as a ByteString will always succeed. But decoding a ByteString might fail, because the raw bytes might not represent a valid sequence of characters. So decodeUtf8 throws an exception when it meets invalid input. If you want to handle that case yourself, you can use one of the following functions:
decodeUtf8' :: ByteString -> Either UnicodeException Text
decodeUtf8With :: OnDecodeError -> ByteString -> Text
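Here is one way these two functions might be used, as a sketch rather than a recommendation. The helper safeDecode and the sample bytes are made up for illustration; lenientDecode comes from Data.Text.Encoding.Error and swaps each invalid byte for the Unicode replacement character U+FFFD.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- The byte 0xFF never appears in valid UTF-8, so this input cannot be decoded.
badBytes :: B.ByteString
badBytes = B.pack [72, 105, 0xFF]

-- Hypothetical helper: turn a decoding failure into Nothing instead of an exception.
safeDecode :: B.ByteString -> Maybe T.Text
safeDecode bytes = case decodeUtf8' bytes of
  Left _err -> Nothing
  Right txt -> Just txt

-- Alternative: keep going and substitute U+FFFD for the invalid byte.
lenient :: T.Text
lenient = decodeUtf8With lenientDecode badBytes

main :: IO ()
main = do
  print (safeDecode (B.pack [72, 105]))  -- Just "Hi"
  print (safeDecode badBytes)            -- Nothing
  print (lenient == T.pack "Hi\65533")   -- True
```

Which one fits depends on whether bad input should stop processing (decodeUtf8') or merely be marked in the output (decodeUtf8With).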
Other encodings exist besides UTF-8, such as UTF-16 and UTF-32. For these, you also need to know whether the data is "big-endian" or "little-endian", which indicates the ordering of the bytes in the underlying representation:
encodeUtf16LE :: Text -> ByteString
decodeUtf32BEWith :: OnDecodeError -> ByteString -> Text
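A quick sketch of what the byte ordering means in practice: the same single character encoded as UTF-16 in both orders. The names sample, littleEndian, and bigEndian are illustrative only.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf16LE, encodeUtf16BE, encodeUtf16LE)
import Data.Word (Word8)

sample :: T.Text
sample = T.pack "A"  -- code point 65, which UTF-16 stores as the 16-bit value 0x0041

-- Little-endian puts the low byte first.
littleEndian :: [Word8]
littleEndian = B.unpack (encodeUtf16LE sample)

-- Big-endian puts the high byte first.
bigEndian :: [Word8]
bigEndian = B.unpack (encodeUtf16BE sample)

main :: IO ()
main = do
  print littleEndian  -- [65,0]
  print bigEndian     -- [0,65]
  -- Decoding with the matching byte order restores the original text.
  print (decodeUtf16LE (encodeUtf16LE sample) == sample)  -- True
```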
It can be difficult to keep all these conversions straight. But here's the TLDR, with 4 different conversions to remember:
- String <-> Text - Use Data.Text, with pack and unpack.
- String <-> ByteString - Use Data.ByteString.Char8, with pack and unpack.
- Text <-> ByteString - Use Data.Text.Encoding, with encodeUtf8 and decodeUtf8.
- Strict and Lazy - Use the "Lazy" module (Data.Text.Lazy or Data.ByteString.Lazy), with fromStrict and toStrict.
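The four conversions can be put side by side in one small file. This is only a sketch for reference: the value names are invented, and the input is plain ASCII so that the Char8 route is safe.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import Data.Text.Encoding (decodeUtf8, encodeUtf8)

str :: String
str = "Hello"

-- 1. String <-> Text
txt :: T.Text
txt = T.pack str

-- 2. String <-> ByteString (safe here because the input is ASCII)
bytes :: B.ByteString
bytes = BC.pack str

-- 3. Text <-> ByteString
encoded :: B.ByteString
encoded = encodeUtf8 txt

-- 4. Strict <-> Lazy, for both Text and ByteString
lazyTxt :: TL.Text
lazyTxt = TL.fromStrict txt

lazyBytes :: BL.ByteString
lazyBytes = BL.fromStrict bytes

main :: IO ()
main = do
  -- Each line checks that the reverse conversion restores the starting value.
  print (T.unpack txt == str)
  print (BC.unpack bytes == str)
  print (decodeUtf8 encoded == txt)
  print (TL.toStrict lazyTxt == txt, BL.toStrict lazyBytes == bytes)
```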
Like text, bytestring is a separate package, so you'll need to include it in your project with either Stack or Cabal (or Nix). To learn how to use the Stack tool, sign up for our free Stack mini-course!