Con-Text-ualizing Strings
In the past couple of weeks of string exploration, we've only considered the basic String type, which is just an alias for a list of characters:
type String = [Char]
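But it turns out that this representation has quite a few drawbacks. Each character sits in its own linked-list cell, which costs several machine words per character and scatters the data across memory. There are other string representations that are more compact and result in more efficient operations, which is paramount when you are parsing a large amount of data.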
But since the type system plays such a strong role in Haskell, each of these different string representations must have its own type. Today we'll talk about Text, which is one of these alternatives.
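Rather than storing a linked list of Char values, a Text value stores its characters in a packed, contiguous array of encoded Unicode data (UTF-8 as of text-2.0; earlier versions used UTF-16). This compact layout allows many operations to be much faster.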
But perhaps the first and most important thing to learn when you're a beginner is how to convert back and forth between a normal String and Text. This is done with the pack and unpack functions:
import Data.Text (Text, pack, unpack)
pack :: String -> Text
unpack :: Text -> String
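Because the underlying representations are a bit different, not every String survives the conversion unchanged. A Char in the surrogate range (U+D800 through U+DFFF) is not a valid Unicode scalar value, so pack replaces it with the replacement character U+FFFD. Ordinary text, including everything in the basic ASCII character set, will work fine.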
>> let s = "ABCD" :: String
>> let t = pack s
>> t
"ABCD"
>> :t t
Text
>> unpack t
"ABCD"
>> :t (unpack t)
[Char]
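To see the conversion behavior outside of GHCi, here is a minimal sketch of a round trip through Text. The names roundTrip, accented, and surrogate are made up for this illustration; only pack and unpack come from the library.

```haskell
import qualified Data.Text as T

-- Hypothetical helper: convert a String to Text and straight back.
roundTrip :: String -> String
roundTrip = T.unpack . T.pack

-- Non-ASCII text comes back unchanged.
accented :: String
accented = roundTrip "naïve café"

-- A lone surrogate is not a valid Unicode scalar value, so pack
-- swaps it for the replacement character U+FFFD.
surrogate :: String
surrogate = roundTrip "\xD800"
```

Evaluating accented should give back the original string, while surrogate should come back as "\xFFFD" rather than "\xD800".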
The Text library has many different functions for manipulating Text objects. For example, append will combine two Text items, and cons will add a character to the front. Many of these overlap with Prelude functions, so it is usually best to import this module in a qualified way.
>> import qualified Data.Text as T
>> let t1 = T.pack "Hello"
>> let t2 = T.pack "World"
>> T.append t1 (T.cons ' ' t2)
"Hello World"
Naturally, Text is an instance of the IsString class we talked about a little while ago. So if you enable OverloadedStrings, you don't actually need to use pack to initialize it! You can just use a string literal.
>> import qualified Data.Text as T
>> :set -XOverloadedStrings
>> let t = "Hello" :: T.Text
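In a source file, you would typically turn the extension on with a LANGUAGE pragma instead of :set. Here is a minimal sketch; title and excited are hypothetical names chosen for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T

-- With the pragma on, this literal is built through fromString,
-- so no explicit T.pack is needed.
title :: T.Text
title = "Hello"

-- A literal passed as an argument is inferred to be Text as well.
excited :: T.Text
excited = T.append title "!"
```

One thing to watch for: with the extension enabled, a literal with no type annotation and no surrounding context can become ambiguous, so signatures like the ones above help the compiler out.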
Now technically there are two different Text types. So far, we've referred to "strict" Text objects. These must store all of their data in memory at once. However, we can also have "lazy" Text objects. These make use of Haskell's laziness mechanics so that you can stream data without having to store it all at once. The operations have the same names; they just come from the Data.Text.Lazy module instead of Data.Text!
>> import qualified Data.Text.Lazy as TL
>> let t1 = TL.pack "Hello"
>> let t2 = TL.pack "World"
>> TL.append t1 (TL.cons ' ' t2)
"Hello World"
You will often need to convert back and forth between these two types. The Data.Text.Lazy module provides the fromStrict and toStrict functions for doing this.
>> import qualified Data.Text as T
>> import qualified Data.Text.Lazy as TL
>> let t1 = T.pack "Hello"
>> let t2 = TL.pack "World"
>> let t3 = TL.fromStrict t1
>> let t4 = TL.toStrict t2
>> :t t3
TL.Text
>> :t t4
T.Text
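As a small sketch of where these conversions come up, suppose you have a lazy Text but want to apply a function written for strict Text. The helper names below are hypothetical; only fromStrict, toStrict, and toUpper come from the library.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- Hypothetical helper: run a strict-only function over a lazy Text.
-- Note that TL.toStrict has to pull the whole value into memory.
viaStrict :: (T.Text -> T.Text) -> TL.Text -> TL.Text
viaStrict f = TL.fromStrict . f . TL.toStrict

-- Example use: uppercase a lazy Text with the strict toUpper.
shoutLazy :: TL.Text -> TL.Text
shoutLazy = viaStrict T.toUpper

-- Converting to lazy and back leaves a strict Text unchanged.
roundTripStrict :: T.Text -> T.Text
roundTripStrict = TL.toStrict . TL.fromStrict
```

This is only for illustration, since Data.Text.Lazy has its own toUpper. The point is that toStrict gives up the streaming benefit, so it is best kept for data you know is small.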
Because these types live in the text package, and this package is not included in base, you'll need to know how dependencies work in order to use it in your projects. To learn more about this, take our free Stack mini-course! You'll learn how to make a project using Stack and add dependencies to it.