Con-Text-ualizing Strings

In the past couple of weeks of string exploration, we've only considered the basic String type, which is just an alias for a list of characters:

type String = [Char]

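But it turns out that this representation has quite a few drawbacks. A String is a linked list, so every character costs several machine words and the characters are scattered around memory. There are other string representations that are more compact and support more efficient operations, which is paramount when you are parsing a large amount of data.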

But since the type system plays such a strong role in Haskell, each of these different string representations must have its own type. Today we'll talk about Text, which is one of these alternatives.

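Rather than storing a linked list of Char values, a Text value packs its Unicode characters into a single contiguous array (encoded as UTF-8 in version 2.0 and later of the text package, and as UTF-16 in earlier versions). This allows many operations to be much faster.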

But perhaps the first and most important thing to learn when you're a beginner is how to convert back and forth between a normal String and Text. This is done with the pack and unpack functions:

import Data.Text (Text, pack, unpack)

pack :: String -> Text

unpack :: Text -> String

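Because the underlying representations are a bit different, not every String survives the conversion unchanged. A Text can only hold valid Unicode scalar values, so pack replaces any surrogate code point (U+D800 through U+DFFF) with the replacement character U+FFFD. But if you're sticking to ordinary text, including the basic ASCII character set, everything will work fine.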

>> let s = "ABCD" :: String
>> let t = pack s
>> t
"ABCD"
>> :t t
t :: Text
>> unpack t
"ABCD"
>> :t unpack t
unpack t :: String
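
To make the conversion caveat concrete, here is a minimal sketch that checks whether a String comes back unchanged after a trip through Text. The helper name roundTrips and the sample values are made up for illustration. The last example relies on pack substituting U+FFFD for a lone surrogate code point, which is the documented behavior of the text package.

```haskell
import qualified Data.Text as T

-- Hypothetical helper: True when a String is unchanged by pack then unpack.
roundTrips :: String -> Bool
roundTrips s = T.unpack (T.pack s) == s

-- Plain ASCII survives the round trip.
asciiOk :: Bool
asciiOk = roundTrips "ABCD"

-- So do ordinary non-ASCII characters (here e-acute and a Greek lambda).
accentedOk :: Bool
accentedOk = roundTrips "caf\233 \955"

-- A lone surrogate code point does not: pack swaps it for U+FFFD.
surrogateOk :: Bool
surrogateOk = roundTrips "\xD800"
```

Here asciiOk and accentedOk are True, while surrogateOk is False.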

The Text library has many different functions for manipulating Text values. For example, append will combine two Text items, and cons will add a character to the front. Many of these share names with Prelude functions (length, map, filter and so on), so it is usually best to import this module in a qualified way.

>> import qualified Data.Text as T
>> let t1 = T.pack "Hello"
>> let t2 = T.pack "World"
>> T.append t1 (T.cons ' ' t2)
"Hello World"
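
Beyond append and cons, a few other functions from Data.Text come up constantly. The sketch below combines strip, splitOn, toUpper and intercalate. The helper names normalizeFields and joinFields are hypothetical, chosen only for this example.

```haskell
import qualified Data.Text as T

-- Hypothetical helper: split a line on commas, then trim and upper-case each field.
normalizeFields :: T.Text -> [T.Text]
normalizeFields line = map (T.toUpper . T.strip) (T.splitOn (T.pack ",") line)

-- Hypothetical helper: join fields back together with a visible separator.
joinFields :: [T.Text] -> T.Text
joinFields = T.intercalate (T.pack " | ")
```

For instance, joinFields (normalizeFields (T.pack " ab, cd ,ef ")) evaluates to "AB | CD | EF".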

Naturally, Text implements the IsString class we talked about a little while ago. So if you enable OverloadedStrings, you don't actually need to use pack to initialize it! You can just use a string literal.

>> import qualified Data.Text as T
>> :set -XOverloadedStrings
>> let t = "Hello" :: T.Text
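
In a source file rather than GHCi, the same extension is enabled with a LANGUAGE pragma at the top of the module. Here is a small sketch of what that might look like. The names greeting and greet are invented for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T

-- The literal is typed as Text directly, with no call to pack.
greeting :: T.Text
greeting = "Hello"

-- Literals and Text values combine with (<>), since Text is a Semigroup.
greet :: T.Text -> T.Text
greet name = greeting <> ", " <> name <> "!"
```

With this in place, greet "World" produces "Hello, World!".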

Now technically there are two different Text types. So far, we've referred to "strict" Text values. These must store all of their data in memory at once. However, we can also have "lazy" Text values. These make use of Haskell's laziness mechanics so that you can stream data in chunks without having to store it all at once. Almost all the operations are the same; they just come from the Data.Text.Lazy module instead of Data.Text!

>> import qualified Data.Text.Lazy as TL
>> let t1 = TL.pack "Hello"
>> let t2 = TL.pack "World"
>> TL.append t1 (TL.cons ' ' t2)
"Hello World"
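
One way to see the laziness in action is to build a Text that never ends and then take only a prefix of it. This is a minimal sketch using cycle and take from Data.Text.Lazy. The names endless and firstFive are just for illustration.

```haskell
import qualified Data.Text.Lazy as TL

-- A lazy Text with no end: "ab" repeated forever.
endless :: TL.Text
endless = TL.cycle (TL.pack "ab")

-- Taking a prefix terminates because only a finite part is ever evaluated.
firstFive :: TL.Text
firstFive = TL.take 5 endless
```

Here firstFive is "ababa". A strict Text could never represent endless, since it would have to hold the whole value in memory.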

You will often need to convert back and forth between the two. The Data.Text.Lazy module contains functions for doing this.

>> import qualified Data.Text as T
>> import qualified Data.Text.Lazy as TL
>> let t1 = T.pack "Hello"
>> let t2 = TL.pack "World"
>> let t3 = TL.fromStrict t1
>> let t4 = TL.toStrict t2
>> :t t3
t3 :: TL.Text
>> :t t4
t4 :: T.Text
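
The lazy module can also assemble a lazy Text from a list of strict pieces with fromChunks, which is handy when data arrives a piece at a time. The sketch below builds one that way and then flattens it back with toStrict. The names chunks, assembled and flattened are illustrative only.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- Strict pieces, as they might arrive one at a time.
chunks :: [T.Text]
chunks = map T.pack ["Hello", " ", "World"]

-- A lazy Text built from those strict pieces.
assembled :: TL.Text
assembled = TL.fromChunks chunks

-- Flattening back to a single strict Text forces everything into memory.
flattened :: T.Text
flattened = TL.toStrict assembled
```

Here flattened is "Hello World". Note that toStrict has to materialize the whole value, so it gives up the streaming benefit of the lazy type.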

Because these types live in the text package, and this package is not included in base, you'll need to know how dependencies work to use it in your projects. To learn more about this, take our free Stack mini-course! You'll learn how to make a project using Stack and add dependencies to it.
