Introduction

Welcome to the third and final part of our Haskell liftoff series! In case you missed them, here are the links to part 1 and part 2. In part 1, we covered the basics of installing the Haskell platform. Then we dug into writing some basic Haskell expressions in the interpreter. In part 2, we started writing our own functions in Haskell modules. We also learned a lot of cool syntax tricks to build bigger and better functions.

Now in part three, we’re going to wrap up by going more in depth with the type system. We’re going to learn how to build our own types. We’ll also learn some interesting tricks to make it easier to describe our types. Once you’re done with this article, you should download our Haskell Beginner Checklist! It will give you a great summary of the skills you learned in this series and a few ways to test your knowledge. If you want to take these skills and learn how to make a Haskell project with them, you should check out our Stack Mini Course as well!

Making a New Data Type

Now on to data types! To start, let’s make a new file called “MyData.hs”. The first thing we’re going to do is create our own type. Let’s say we’re trying to model someone’s TODO list. We’ll create a “Task” data type to represent each individual task on their list. We create a data type by first using the “data” keyword and following it up with the type name. Then we'll add an equals sign (=). The definition itself goes on the right-hand side:

module MyData where

data Task = ...

Notice that unlike the expressions and function names we used in the previous lessons, our type starts with a capital letter. This is what distinguishes types from normal expressions in Haskell. We’re now going to make our first constructor. A constructor is a special type of expression that allows us to create an object of our Task type. They have some similarities to constructors in, say, Java. But they’re also very different. Constructors have an uppercase name, and then they have a list of types. This list of types is the information contained by that constructor. In our case, we want our task to have a name, and an expected length of time (in minutes). We’ll represent the name with a string, and the length of time with an Int.

data Task = BasicTask String Int

And just like that, we can now start making Task objects! For instance, let’s define a couple basic tasks as expressions within our module:

task1 :: Task
task1 = BasicTask "Do assignment 1" 60

task2 :: Task
task2 = BasicTask "Do Laundry" 45
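One thing worth seeing early is that a constructor with fields behaves like an ordinary function, so we can pass it to other functions. The sketch below assumes the Task type from above. The names taskNames, taskTimes, and generatedTasks are made up for illustration, and the deriving clause (not covered in this article) is an extra so that the values can be printed and compared.

```haskell
-- Same Task type as above. The deriving clause is an addition for this
-- sketch so that tasks can be shown and compared with ==.
data Task = BasicTask String Int
  deriving (Show, Eq)

-- Hypothetical sample data for illustration.
taskNames :: [String]
taskNames = ["Do assignment 1", "Do Laundry"]

taskTimes :: [Int]
taskTimes = [60, 45]

-- BasicTask has type String -> Int -> Task, so zipWith can apply it
-- to the names and times pairwise.
generatedTasks :: [Task]
generatedTasks = zipWith BasicTask taskNames taskTimes
```

Asking the interpreter for :t BasicTask should confirm the function type mentioned in the comment.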

We could also load up our code in the interpreter to check that it still compiles and makes sense:

>> :l MyData.hs
>> :t task1
task1 :: Task
>> :t task2
task2 :: Task

Notice that the type of our expression is Task even though we construct the objects using the BasicTask constructor. Now in Java, we can have many constructors for the same type. We can also do this in Haskell but it looks a little different. Let’s define another type for the different locations where we can perform a task. We could perform a Task at school, the office, or at home. We’ll represent this by creating a constructor for each of these. We separate the constructors using the vertical bar |:

data Location = School |
  Office |
  Home

In this case, each of the constructors is a simple marker that has no parameters or data stored within it. We can technically define a separate expression for each of these:

schoolLocation :: Location
schoolLocation = School

officeLocation :: Location
officeLocation = Office

homeLocation :: Location
homeLocation = Home

But these expressions aren't any more useful than using the constructors themselves.

Now that we have a couple different types, we can actually have one of our types contain the other! We’ll add a new constructor to our task type. It will represent a more complicated task that also lists a location:

data Task = BasicTask String Int |
  ComplexTask String Int Location
...

complexTask :: Task
complexTask = ComplexTask "Write Memo" 30 Office

So this is very different from constructors in other languages. We can actually have different fields for different representations of our type. We can wrap completely different data depending on the constructor we use. This is awesome and gives us a lot of flexibility that other languages struggle to give us.
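To make this concrete, here is a small sketch of a function that handles both constructors by pattern matching. It assumes the Task and Location types defined above. The helpers locationName and describeTask are hypothetical names invented for this example.

```haskell
data Location = School | Office | Home

data Task = BasicTask String Int
          | ComplexTask String Int Location

-- Hypothetical helper: turn a Location into readable text.
locationName :: Location -> String
locationName School = "school"
locationName Office = "the office"
locationName Home   = "home"

-- One equation per constructor. Each pattern only sees the fields
-- that its own constructor stores.
describeTask :: Task -> String
describeTask (BasicTask name time) =
  name ++ " (" ++ show time ++ " min)"
describeTask (ComplexTask name time location) =
  name ++ " (" ++ show time ++ " min, at " ++ locationName location ++ ")"
```

Only the ComplexTask equation can mention a location, because only that constructor carries one.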

Parameterized Types

Another cool thing we can do with our type definitions is to leave one of the types as a type parameter. This means that one or more of the fields actually depends on a type that the person writing the code gets to select. Let’s suppose we have a type that has a few basic constructors for different amounts of time. This would restrict our description of the time for the sake of simplicity.

data TaskLength = QuarterHour |
  HalfHour |
  ThreeQuarterHour |
  Hour |
  HourAndHalf |
  TwoHours |
  ThreeHours

Now we might want to describe a task where the length of the task is an Int. But we might also want a task to be able to use this new task length type. We can do this by parameterizing the Task type like so:

data Task a = BasicTask String a |
  ComplexTask String a Location

The type a is now a mystery type that we can fill in as we please. But now whenever we list the Task type in a type signature, we have to fill in the proper definition:

task1 :: Task Int
task1 = BasicTask "Do assignment 1" 60

task1Different :: Task TaskLength
task1Different = BasicTask "Do assignment 1" Hour

task2 :: Task Int
task2 = BasicTask "Do Laundry" 45

complexTask :: Task TaskLength
complexTask = ComplexTask "Write Memo" HalfHour Office

We have to be careful though, since this can restrict our ability to do certain things. For instance, we cannot create a list that contains both task1 and complexTask. This is because the two expressions now have different types!

-- THIS WILL CAUSE A COMPILER ERROR
badTaskList :: [Task a]
badTaskList = [task1, complexTask]
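The restriction only applies when we need the tasks themselves in one list. A function that never touches the parameterized field can still work on any Task a. The sketch below assumes the parameterized Task type from above. The names taskTitle and titles are hypothetical, chosen for this example.

```haskell
data Location = School | Office | Home

data TaskLength = QuarterHour | HalfHour | ThreeQuarterHour | Hour
                | HourAndHalf | TwoHours | ThreeHours

data Task a = BasicTask String a
            | ComplexTask String a Location

-- Works for any Task a because it ignores the length field.
taskTitle :: Task a -> String
taskTitle (BasicTask name _)     = name
taskTitle (ComplexTask name _ _) = name

task1 :: Task Int
task1 = BasicTask "Do assignment 1" 60

complexTask :: Task TaskLength
complexTask = ComplexTask "Write Memo" HalfHour Office

-- This list is fine: both elements are Strings, even though the two
-- tasks have different types.
titles :: [String]
titles = [taskTitle task1, taskTitle complexTask]
```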

List Example

Speaking of lists, we can actually unravel a bit of the mystery about how lists are implemented now.

There is a lot of syntactic sugar that changes how we actually write lists in practice. But conceptually, lists are defined by two constructors, which we’ll call Nil and Cons. (The built-in list type spells them [] and : instead.)

data List a = Nil |
  Cons a (List a)

As we should expect, the List type has a single type parameter. This is what allows us to have either [Int] or [String]. The Nil constructor is an empty list. It contains no objects. So any time you’re using the [] expression, you’re effectively using Nil. Then the second constructor prepends a single element to another list. The type of the element must match the element type of the list, of course. When you use the : operator to prepend an element to a list, you are effectively using the Cons constructor.

emptyList :: [Int]
emptyList = [] -- Actually Nil

fullList :: [Int]
-- Equivalent to Cons 1 (Cons 2 (Cons 3 Nil))
-- More commonly written as [1,2,3]
fullList = 1 : 2 : 3 : []

Another cool thing here is that our data structure is recursive. We can see in the Cons constructor how a list contains another list as a parameter. This works fine as long as there’s some base case! In this situation, we have Nil. Imagine if we only had a single constructor and it took a recursive parameter. We’d be in a real pickle about how we create any list in the first place!
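Functions over a recursive type usually follow the shape of the type: one equation for the base case and one for the recursive case. Here is a minimal sketch using the List type above. The names toBuiltin, listLength, and sampleList are made up for this example.

```haskell
data List a = Nil
            | Cons a (List a)

-- Convert our List into a built-in list, one constructor at a time.
toBuiltin :: List a -> [a]
toBuiltin Nil           = []
toBuiltin (Cons x rest) = x : toBuiltin rest

-- Count the elements. Nil is the base case that stops the recursion.
listLength :: List a -> Int
listLength Nil           = 0
listLength (Cons _ rest) = 1 + listLength rest

sampleList :: List Int
sampleList = Cons 1 (Cons 2 (Cons 3 Nil))
```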

Record Syntax

So let’s go back to our basic, unparameterized Task data type. Suppose we don't care about the entire Task item. Rather, we want one of its pieces, like the name or time. As our code is now, the only real way to do that is to use a pattern match that reveals these fields.

import Data.Char (toUpper)

...

twiceLength :: Task -> Int
twiceLength (BasicTask name time) = 2 * time

capitalizedName :: Task -> String
capitalizedName (BasicTask name time) = map toUpper name

tripleTaskLength :: Task -> Task
tripleTaskLength (BasicTask name time) = BasicTask name (3 * time)

Now we can simplify this a teensy bit. You can use underscores instead of parameters that you won’t use. But even so, this can get very cumbersome if you have a data type that has a lot of fields. We could write our own functions allowing us to access individual fields. Of course, these will have to use pattern matching under the hood:

taskName :: Task -> String
taskName (BasicTask name _) = name

taskLength :: Task -> Int
taskLength (BasicTask _ time) = time

twiceLength :: Task -> Int
twiceLength task = 2 * (taskLength task)

capitalizedName :: Task -> String
capitalizedName task = map toUpper (taskName task)

tripleTaskLength :: Task -> Task
tripleTaskLength task = BasicTask (taskName task) (3 * (taskLength task))

But this approach doesn’t scale, since we’ll have to write these functions for every different field of every data type we create. And imagine how easy it is to use a "setter" method in Java. Compare that to tripleTaskLength above. We have to restate most of the existing fields, which is tedious. The exciting news is that we can get Haskell to write these functions for us using record syntax. To do this, all we have to do is assign each field a name in our data definition:

data Task = BasicTask
  { taskName :: String
  , taskLength :: Int }

Now we can write the same code WITHOUT the “getter” functions we wrote above.

-- These will now work WITHOUT our separate definitions for "taskName" and
-- "taskLength"
twiceLength :: Task -> Int
twiceLength task = 2 * (taskLength task)

capitalizedName :: Task -> String
capitalizedName task = map toUpper (taskName task)

Now when we construct tasks, we can still use the same BasicTask constructor we used before. But for code clarity, we can also initialize the object using record syntax, where we name the field:

task1 :: Task
task1 = BasicTask
  { taskName = "Do assignment 1"
  , taskLength = 60 }

task2 :: Task
task2 = BasicTask
  { taskName = "Do Laundry"
  , taskLength = 45 }

We can also write a “setter” more easily using record syntax. We use the previous task and then a list of “changes” to make within braces:

tripleTaskLength :: Task -> Task
tripleTaskLength task = task { taskLength = 3 * (taskLength task) }
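It can help to see that a record update never modifies the old value. It builds a new one and copies over any fields we don't mention. The sketch below assumes the record version of Task from above. The names original and renamedAndDoubled are hypothetical, and the deriving clause is added only so the values can be printed and compared.

```haskell
data Task = BasicTask
  { taskName :: String
  , taskLength :: Int }
  deriving (Show, Eq)

original :: Task
original = BasicTask { taskName = "Do Laundry", taskLength = 45 }

-- We can change several fields in one update. The value bound to
-- "original" stays exactly as it was.
renamedAndDoubled :: Task
renamedAndDoubled = original
  { taskName = "Do ALL the laundry"
  , taskLength = 2 * taskLength original }
```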

Generally, we only use record syntax when there is a single constructor for a data type. We can use different fields for different constructors, but our code becomes a bit less safe:

data Task = 
  BasicTask
    { taskName :: String,
      taskLength :: Int }
  |
  ComplexTask 
    { taskName :: String,
      taskLength :: Int,
      taskLocation :: Location }

The trouble with this system is that the compiler will generate a taskLocation function that will compile for any task. But the function will only be valid when called on a ComplexTask. So the following code will compile, even though it will cause a crash, and we want to avoid that:

causeError :: Location
causeError = taskLocation (BasicTask "Cause error" 10)
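One way to sidestep the crash is to write our own accessor that returns a Maybe value, so a missing location is represented by Nothing instead of a runtime error. This is only a sketch: safeTaskLocation is a hypothetical name, and it assumes you are already comfortable with the Maybe type from the standard library.

```haskell
data Location = School | Office | Home
  deriving (Show, Eq)

data Task =
  BasicTask
    { taskName :: String,
      taskLength :: Int }
  |
  ComplexTask
    { taskName :: String,
      taskLength :: Int,
      taskLocation :: Location }

-- Hypothetical safe accessor: Nothing when the task has no location.
safeTaskLocation :: Task -> Maybe Location
safeTaskLocation (ComplexTask _ _ location) = Just location
safeTaskLocation (BasicTask _ _)            = Nothing
```

Callers then have to handle the Nothing case explicitly, which the compiler can check for us.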

In addition, if fields in different constructors have different types, they can’t share the same name. This can be frustrating when we want to represent the same concept with different types. This example won’t compile because GHC cannot give the taskLength function a single type. It would have to be either Task -> Int or Task -> TaskLength.

data Task = 
  BasicTask
    { taskName :: String,
      taskLength :: Int }
  |
  ComplexTask 
    { taskName :: String,
      taskLength :: TaskLength, -- Note we use "TaskLength" and not an Int here!
      taskLocation :: Location }
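If the goal is to let the length be either an Int or a TaskLength, one possible alternative is to combine record syntax with the type parameter from earlier. The sketch below is an assumption about how you might restructure the type, not a fix for the definition above: every constructor of a given Task a still shares the same length type. The names intTask and enumTask are made up, and TaskLength is shortened to three constructors here.

```haskell
data Location = School | Office | Home

data TaskLength = QuarterHour | HalfHour | Hour
  deriving (Show, Eq)

data Task a =
  BasicTask
    { taskName :: String,
      taskLength :: a }
  |
  ComplexTask
    { taskName :: String,
      taskLength :: a,
      taskLocation :: Location }

-- taskLength now has the single type Task a -> a, so it compiles.
intTask :: Task Int
intTask = BasicTask { taskName = "Do Laundry", taskLength = 45 }

enumTask :: Task TaskLength
enumTask = ComplexTask "Write Memo" HalfHour Office
```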

The Type Keyword

So now we know most of the ins and outs of making our own data types. But there are times when you don’t need to do this. We can create new type names without making a completely new data structure. There are two ways to do this. The first is the type keyword. It allows you to create a synonym for a type, like the typedef keyword in C++. The most familiar example, as we’ve seen, is that a String is actually a list of characters:

type String = [Char]

A common use case for this is when you’ve combined many different types together in a tuple. It can be quite tedious to write this tuple down several times in your code:

makeTupleBigger :: (Int, String, Task) -> (Int, String, Task)
makeTupleBigger (intValue, stringValue, BasicTask name time) =
  (2 * intValue, map toUpper stringValue, BasicTask (map toUpper name) (2 * time))

A type synonym would make the signature here look a lot cleaner:

type TaskTuple = (Int, String, Task)

makeTupleBigger :: TaskTuple -> TaskTuple
makeTupleBigger (intValue, stringValue, BasicTask name time) =
  (2 * intValue, map toUpper stringValue, BasicTask (map toUpper name) (2 * time))

Of course, if this collection of items shows up a lot, it might be worth making a full data type for it. There are some reasons why type synonyms aren’t always the best choice. For one thing, they can lead to compile errors that can be difficult to muddle through. You’ve probably come across a few errors already where the compiler told you it expected a [Char]. It would have been far clearer if it had said String.

It can also lead to some unintuitive code. Suppose you use a basic tuple instead of a data type to represent a task. Someone might expect your Task type to be its own data type. Then they’ll be a little confused when you manipulate it like a tuple:

type Task = (String, Int)

twiceTaskLength :: Task -> Int
-- "snd task" is a little confusing here
twiceTaskLength task = 2 * (snd task)

Newtypes

The last topic we’ll cover is “newtypes”. These are like type synonyms in some ways, and like ADTs (algebraic data types, the kind we’ve been defining with the data keyword) in other ways. But they still have a unique place in Haskell and it is good to get accustomed to using them. Let’s suppose we want to have a new approach to representing TaskLength. We want to use a regular number, but we want it to have its own separate type. We can do this using “newtype”:

newtype TaskLength = TaskLength Int

The syntax for newtypes looks a lot like defining an ADT. However, a newtype definition can only have a single constructor, and that constructor can only take a single field. The big difference between an ADT and a newtype comes after your code is compiled. In this example, there won’t be a difference between the TaskLength and Int types at runtime, because the compiler erases the wrapper. This is good because a lot of code for Int types is specialized to run fast on hardware. If we were to make this a true ADT, this would not be the case:

-- Not as fast!
data TaskLength = TaskLength Int

But otherwise, we can do a lot of the same tricks with our newtype that we can do with ADTs. We can, for instance, use record syntax in the constructor for our newtype. This allows us to use a name to unwrap the inside value without pattern matching on the type. A frequent pattern when using record syntax is to use something like “unTypeName” as the field name. Also note that we can’t use the newtype value with the same functions as the original type. When we had type synonyms, we could do this, but it won’t work here:

data Task = BasicTask
  { taskName :: String
  , taskLength :: TaskLength }

newtype TaskLength = TaskLength
  { unTaskLength :: Int }

mkTask :: String -> Int -> Task
mkTask name time = BasicTask name (TaskLength time)

twiceLength :: Task -> Int
twiceLength task = 2 * (unTaskLength (taskLength task))
-- The following would be WRONG!
-- 2 * (taskLength task)

Now, TaskLength is effectively a wrapper type around an Int. This makes it seem a lot like a type synonym, except that we can’t simply use the Int value itself. As you can see in the examples above, we do have to go through the process of wrapping and unwrapping the value. This seems tedious. But it is quite useful because it solves the main problems we’ve seen from using type synonyms. Now if we make a mistake involving TaskLength, the compiler will tell us it’s about TaskLength. We won’t be wondering if there’s a synonym we’re missing! Suppose we have a function with several integral arguments. If we always use Int types, we can easily confuse the order of the arguments. But when we use a newtype, the compiler will catch this type of error for us.
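Here is a small sketch of the argument-order point. It reuses the TaskLength newtype from above and adds a second newtype, Priority, which is a hypothetical type invented for this example, along with the made-up function summarize.

```haskell
newtype TaskLength = TaskLength { unTaskLength :: Int }

-- Hypothetical second wrapper around Int, for illustration only.
newtype Priority = Priority { unPriority :: Int }

-- If both arguments were plain Ints, swapping them would compile silently.
-- With newtypes, summarize (Priority 1) (TaskLength 30) is a type error.
summarize :: TaskLength -> Priority -> String
summarize len priority =
  show (unTaskLength len) ++ " min, priority " ++ show (unPriority priority)

memoSummary :: String
memoSummary = summarize (TaskLength 30) (Priority 1)
```

Try swapping the two arguments in memoSummary and reloading the file to see the error the compiler reports.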

Conclusion

This wraps up our discussion on creating your own data types and is the conclusion of our Liftoff Series! If you need a refresher on the basics, don’t forget to check out part 1 and part 2. If you want a quick guide to refresh your memory about this series, you should download our free Beginner Checklist!

If you want to take the next step in your Haskell education, you should check out our Stack tutorial mini-course, which will walk you through how to use Stack and the Haskell platform to start making your own Haskell project!