James Bowen James Bowen

Functional Programming vs. Object Oriented Programming

Functional Programming (FP) and Object Oriented Programming (OOP) are the two most important programming paradigms in use today. In this article, we'll discuss these two different programming paradigms and compare their key differences, strengths and weaknesses. We'll also highlight a few specific ways Haskell fits into this discussion. Here's a quick outline if you want to skip around a bit!

What is a Programming Paradigm?

A paradigm is a way of thinking about a subject. It's a model against which we can compare examples of something.

In programming, there are many ways to write code to solve a particular task. Our tasks normally involve taking some kind of input, whether data from a database or commands from a user. A program's job is then to produce outputs of some kind, like updates in that database or images on the user's screen.

Programming paradigms help us to organize our thinking so that we can rapidly select an implementation path that makes sense to us and other developers looking at the code. Paradigms also provide mechanisms for reusing code, so that we don't have to start from scratch every time we write a new program.

The two dominant paradigms in programming today are Object Oriented Programming (OOP) and Functional Programming (FP).

The Object Oriented Paradigm

In object oriented programming, our program's main job is to maintain objects. Objects almost always store data, and they have particular ways of acting on other objects and being acted on by other objects (these are the object's methods). Objects often have mutable data - many actions you take on your objects are capable of changing some of the object's underlying data.

Object oriented programming allows code reuse through a system called inheritance. Objects belong to classes which share the same kinds of data and actions. Classes can inherit from a parent class (or multiple classes, depending on the language), so that they also have access to the data from the base class and some of the same code that manipulates it.

The Functional Paradigm

In functional programming, we think about programming in terms of functions. This idea is rooted in the mathematical idea of a function. A function in math is a process which takes some input (or a series of different inputs) and produces some kind of output. A simple example would be a function that takes an input number and produces the square of that number. Many functional languages emphasize pure functions, which produce the exact same output every time when given the same input.

In programming, we may view our entire program as a function. It is a means by which some kind of input (file data or user commands), is transformed into some kind of output (new files, messages on our terminal). Individual functions within our program might take smaller portions of this input and produce some piece of our output, or some intermediate result that is needed to eventually produce this output.

In functional programming, we still need to organize our data in some way. So some of the ideas of objects/classes are still used to combine separate pieces of data in meaningful ways. However, we generally do not attach "actions" to data in the same way that classes do in OOP languages.

Since we don't perform actions directly on our data, functional languages are more likely to use immutable data as a default, rather than mutable data. (We should note though that both paradigms use both kinds of data in their own ways).

Functional Programming vs. OOP

The main point of separation between these paradigms is the question of "what is the fundamental building block of my program?" In object oriented programming, our programs are structured around objects. Functions are things we can do to an object or with an object.

In functional programming, functions are always first class citizens - the main building block of our code. In object oriented programming, functions can be first class citizens, but they do not need to be. Even in languages where they can be, they often are not used in this way, since this isn't as natural within the object oriented paradigm.

Object Oriented Programming Languages

Many of the most popular programming languages are OOP languages. Java, for a long time the most widely used language, is perhaps the most archetypal OO language. All code must exist within an object, even in a simple "Hello World" program:

class MyProgram {
  public static void main(String[] args) {
    System.out.println("Hello World!");
  }
}

In this example, we could not write our 'main' function on its own, without the use of 'class MyProgram'.

Java has a single basic 'Object' class, and all other classes (including any new classes you write) must inherit from it for basic behaviors like memory allocation. Java classes only allow single inheritance. This means that a class cannot inherit from multiple different types. Thus, all Java classes you would use can be mapped out on a tree structure with 'Object' as the root of the tree.

Other object oriented languages use the general ideas of classes, objects, and inheritance, but with some differences. C++ and Python both allow multiple inheritance, so that a class can inherit behavior from multiple existing classes. While these are both OOP languages, they are also more flexible in allowing functions to exist outside of classes. A basic script in either of these languages need not use any classes. In Python, we'd just write:

if __name__ == "__main__":
  print("Hello World!")

In C++, this looks like:

int main() {
  std::cout << "Hello World!" << std::endl;
}

These languages also don't have such a strictly defined inheritance structure. You can create classes that do not inherit from anything else, and they'll still work.

FP Languages

Haskell is perhaps the language that is most identifiable with the functional paradigm. Its type system and compiler really force you to adopt functional ideas, especially around immutable data, pure functions, and tail call optimization. It also embraces lazy evaluation, which is aligned with FP principles, but not a requirement for a functional language.

There are several other programming languages that generally get associated with the functional paradigm include Clojure, OCaml, Lisp, Scala and Rust. These languages aren't all functional in the same way as Haskell; there are many notable differences. Lisp bills itself specifically as a multi-paradigm language, and Scala is built to cross-compile with Java! Meanwhile Rust's syntax looks more object oriented, but its inheritance system (traits) feel much more like Haskell. However, on balance, these languages express functional programming ideas much more than their counterparts.

Amongst the languages mentioned in the object oriented section, Python has the most FP features. It is more natural to write functions outside of your class objects, and concepts like higher order functions and lambda expressions are more idiomatic than in C++ or Java. This is part of the reason Python is often recommended for beginners, with another reason being that its syntax makes it a relatively simple language to learn.

Advantages of Functional Programming

Fewer Bugs

FP code has a deserved reputation for having fewer bugs. Anecdotally, I certainly find I have a much easier time writing bug free code in Haskell than Python. Many bugs in object oriented code are caused by the proliferation of mutable state. You might pass an object to a method and expect your object to come back unchanged...only to find that the method does in fact change your object's state. With objects, it's also very easy for unstated pre-conditions to pop up in class methods. If your object is not in the state you expect when the method is called, you'll end up with behavior you didn't intend.

A lot of function-based code makes these errors impossible by imposing immutable objects as the default, if not making it a near requirement, as Haskell does. When the function is the building block of your code, you must specify precisely what the inputs of the function are. This gives you more opportunities to determine pre-conditions for this data. It also ensures that the return results of the function are the primary way you affect the rest of your program.

Functions also tend to be easier to test than objects. It is often tricky to create objects with the precise state you want to assess in a unit test, whereas to test a function you only need to reproduce the inputs.

More Expressive, Reasonable Design

The more you work with functions as your building blocks, and the more you try to fill your code with pure functions, the easier it will be to reason about your code. Imagine you have a couple dozen fields on an object in OO code. If someone calls a function on that object, any of those fields could impact the result of the method call.

Functions give you the opportunity to narrow things down to the precise values that you actually need to perform the computation. They let you separate the essential information from superfluous information, making it more obvious what the responsibilities are for each part of your code.

Multithreading

You can do parallel programming no matter what programming language you're using, but the functional programming paradigm aligns very well with parallel processing. To kick off a new thread in any language, you pretty much always have to pass a function as an argument, and this is more natural in FP. And with pure functions that don't modify shared mutable objects, FP is generally much easier to break into parallelizable pieces that don't require complex locking schemes.

Disadvantages of Functional Programming

Intuition of Complete Objects

Functional programming can feel less intuitive than object oriented programming. Perhaps one reason for this is that object oriented programming allows us to reason about "complete" objects, whose state at any given time is properly defined.

Functions are, in a sense, incomplete. A function is not a what that you can hold as a picture in your head. A function is a how. Given some inputs, how do you produce the outputs? In other words, it's a procedure. And a procedure can only really be imagined as a concrete object once you've filled in its inputs. This is best exemplified by the fact that functions have no native 'Show' instance in Haskell.

>> show (+)
No instance for Show (Integer -> Integer -> Integer) arising from a use of 'show'

If you apply the '+' function to arguments (and so create what could be called an "object"), then we can print it. But until then, it doesn't make much sense. If objects are the building block of your code though, you could, hypothetically, print the state of the objects in your code every step of the way.

Mutable State can be Useful!

As much as mutable state can cause a lot of bugs, it is nonetheless a useful tool for many problems, and decidedly more intuitive for certain data structures. If we just imagine something like the "Snake" game, it has a 2D grid that remains mostly the same from tick to tick, with just a couple things updating. This is easier to capture with mutable data.

Web development is another area where mutable objects are extremely useful. Anytime the user enters information on the page, some object has to change! Web development in FP almost requires its own paradigm (see "Functional Reactive Programming"). Haskell can represent mutable data, but the syntax is more cumbersome; you essentially need a separate data structure. Likewise, other functional languages might make mutability easier than Haskell, but mutability is still, again, more intuitive when objects are your fundamental building block, rather than functions on those objects.

We can see this even with something as simple as loops. Haskell doesn't perform "for-loops" in the same way as other languages, because most for loops essentially rely on the notion that there is some kind of state updating on each iteration of the loop, even if that state is only the integer counter. To write loops in Haskell, you have to learn concepts like maps and folds, which require you to get very used to writing new functions on the fly.

A Full Introduction to Haskell (and its Functional Aspects)

So functional programming languages are perhaps a bit more difficult to learn, but can offer a significant payoff if you put in the time to master the skills. Ultimately, you can use either paradigm for most kinds of projects and keep your development productive. It's down to your personal preference which you try while building software.

If you really want to dive into functional programming though, Haskell is a great language, since it will force you to learn FP principles more than other functional languages. For a complete introduction to Haskell, you should take a look at Haskell From Scratch, our beginner-level course for those new to the language. It will teach you everything you need to know about syntax and fundamental concepts, while providing you with a ton of hands-on practice through exercises and projects.

Haskell From Scratch also includes Making Sense of Monads, our course that shows the more functional side of Haskell by teaching you about the critical concept of monads. With these two courses under your belt, you'll be well on your way to mastery of functional programming! Head over here to learn more about these courses!

Read More
James Bowen James Bowen

How to Write Comments in Haskell

Comments are often a simple item to learn, but there's a few ways we can get more sophisticated with them! This article is all about writing comments in Haskell. Here's a quick outline to get you started!

  • What is a Comment?
  • Single Line Comments
  • Multi-Line Comments
  • Inline Comments
  • Writing Formal Documentation Comments
  • Intro to Haddock
  • Basic Haddock Comments
  • Creating Our Haskell Report
  • Documenting the Module Header
  • Module Header Fields
  • Haddock Comments Below
  • Commenting Type Signatures
  • Commenting Constructors
  • Commenting Record Fields
  • Commenting Class Definitions
  • A Complete Introduction to the Haskell Programming Language

    What is a Comment?

    A comment is non-code note you write in a code file. You write it to explain what the code does or how it works, in order to help someone else reading it. Comments are ignored by a language's compiler or interpreter. There is usually some kind of syntax to comments to distinguish them from code. Writing comments in Haskell isn't much different from other programming languages. But in this article, we'll look extensively at Haddock, a more advanced program for writing nice-looking documentation.

    Single Line Comments

    The basic syntax for comments in Haskell is easy, even if it is unusual compared to more common programming languages. In languages like Java, Javascript and C++, you use two forward slashes to start a single line comment:

    int main() {
    // This line will print the string value "Hello, World!" to the console
    std::cerr << "Hello, World!" << std::endl;
    }

    But in Haskell, single line comments start with two hyphens, '--':

    -- This is our 'main' function, which will print a string value to the console
    main :: IO ()
    main = putStrLn "Hello World!"

    You can have these take up an entire line by themselves, or you can add a comment after a line of code. In this simple "Hello World" program, we place a comment at the end of the first line of code, giving instructions on what would need to happen if you extended the program.

    main :: IO ()
    main = -- Add 'do' to this line if you add another 'putStrLn' statement!
    putStrLn "Hello World!"

    Multi-Line Comments

    While you can always start multiple consecutive lines with whatever a comment line starts with in your language, many languages also have a specific way to make multiline comments. And generally speaking, this method has a "start" and an "end" sequence. For example, in C++ or Java, you start a multi line comment block with the characters '/' and end it with '/'

    /*
    This function returns a new list
    that is a reversed copy of the input. 
    
    It iterates through each value in the input 
    and uses 'push_front' on the new copy.
    */
    std::list<int> reverseList(const std::list<int>& ints) {
    std::list<int> result;
    for (const auto& i : ints) {
      result.push_front(i);
    }
    return result;
    }

    In Haskell, it is very similar. You use the brace and a hyphen character to open ('{-') and then the reverse to close the block ('-}').

    {- This function returns a new list
     that is a reversed copy of the input.
    
     It uses a tail recursive helper function.
    -}
    reverse :: [a] -> [a]
    reverse = reverseTail []
    where
      reverseTail acc [] = acc
      reverseTail acc (x : xs) = reverseTail (x : acc) xs

    Notice we don't have to start every line in the comment with double hyphens. Everything in there is part of the comment, until we reach the closing character sequence. Comments like these with multiple lines are also known as "block comments". They are useful because it is easy to add more information to the comment without adding any more formatting.

    Inline Comments

    While you generally use the brace/hyphen sequence to write a multiline comment, this format is surprisingly also useful for a particular form of single line comments. You can write an "inline" comment, where the content is in between operational code on that line.

    reverse :: [a] -> [a]
    reverse = reverseTail []
    where
      reverseTail {- Base Case -}      acc [] = acc
      reverseTail {- Recursive Case -} acc (x : xs) = reverseTail (x : acc) xs

    The fact that our code has a start and end sequence means that the compiler knows where the real code starts up again. This is impossible when you use double hyphens to signify a comment.

    Writing Formal Documentation Comments

    If the only people using this code will be you or a small team, the two above techniques are all you really need. They tell people looking at your source code (including your future self) why you have written things in a certain way, and how they should work. However, if other people will be using your code as a library without necessarily looking at the source code, there's a much deeper area you can explore. In these cases, you will want to write formal documentation comments. A documentation comment tells someone what a function does, generally without going into the details of how it works. More importantly, documentation comments are usually compiled into a format for someone to look at outside of the source code. These sorts of comments are aimed at people using your code as a library. They'll import your module into their own programs, rather than modifying it themselves. You need to answer questions they'll have like "How do I use this feature?", or "What argument do I need to provide for this function to work"? You should also consider having examples in this kind of documentation, since these can explain your library much better than plain statements. A simple code snippet often provides way more clarification than a long document of function descriptions.

    Intro to Haddock

    As I mentioned above, formal documentation needs to be compiled into a format that is more readable than source code. In most cases, this requires an additional tool. Doxygen, for example, is one tool that supports many programming languages, like C++ and Python. Haskell has a special tool called Haddock. Luckily, you probably don't need to go through any additional effort to install Haddock. If you used GHCup to install Haskell, then Haddock comes along with it automatically. (For a full walkthrough on getting Haskell installed, you can read our Startup Guide). It also integrates well with Haskell's package tools, Stack and Cabal. In this article we'll use it through Stack. So if you want to follow along, you should create a new Haskell project on your machine with Stack, calling it 'HaddockTest'. Then build the code before we add comments so you don't have to wait for it later:

    >> stack new HaddockTest
    >> cd HaddockTest
    >> stack build

    You can write all the code from the rest of the article in the file 'src/Lib.hs', which Stack creates by default.

    Basic Haddock Comments

    Now let's see how easy it is to write Haddock comments! To write basic comments, you just have to add a vertical bar character after the two hyphens:

    -- | Get the "block" distance of two 2D coordinate pairs
    manhattanDistance :: (Int, Int) -> (Int, Int) -> Int
    manhattanDistance (x1, y1) (x2, y2) = abs (x2 - x1) + abs (y2 - y1)

    It still works even if you add a second line without the vertical bar. All comment lines until the type signature or function definition will be considered part of the Haddock comment.

    -- | Get the "block" distance of two 2D coordinate pairs
    -- This is the sum of the absolute difference in x and y values.
    manhattanDistance :: (Int, Int) -> (Int, Int) -> Int
    manhattanDistance (x1, y1) (x2, y2) = abs (x2 - x1) + abs (y2 - y1)

    You can also make a block comment in the Haddock style. It involves the same character sequences as multi line comments, but once again, you just add a vertical bar after the start sequence. The end sequence does not need the bar:

    {-| Get the "block" distance of two 2D coordinate pairs
     This is the sum of the absolute difference in x and y values.
    -}
    manhattanDistance :: (Int, Int) -> (Int, Int) -> Int
    manhattanDistance (x1, y1) (x2, y2) = abs (x2 - x1) + abs (y2 - y1)

    No matter which of these options you use, your comment will look the same in the final document. Next, we'll see how to generate our Haddock document. To contrast Haddock comments with normal comments, we'll add a second function in our code with a "normal" single line comment. We also need to add both functions to the export list of our module at the top: `haskell module Lib ( someFunc, , manhattanDistance , euclidenDistance ) where

...

-- Get the Euclidean distance of two 2D coordinate pairs (not Haddock) euclideanDistance :: (Double, Double) -> (Double, Double) -> Double euclideanDistance (x1, y1) (x2, y2) = sqrt ((x2 - x1) ^ 2 + (y2 - y1) ^ 2)

Now let's create our document!
## Creating Our Haskell Report
To generate our document, we just use the following command:
```bash
>> stack haddock

This will compile our code. At the end of the process, it will also inform us about what percentage of the elements in our code used Haddock comments. For example:

25% (  1 /  4) in 'Lib'
  Missing documentation for:
    Module header
    someFunc (src/Lib.hs:7)
    euclideanDistance (src/Lib.hs:17)

As expected, 'euclideanDistance' is not considered to have a Haddock comment. We also haven't defined a Haddock comment for our module header. We'll do that in the next section. We'll get rid of the 'someFunc' expression, which is just a stub. This command will generate HTML files for us, most importantly an index file! They get generated in the '.stack-work' directory, usually in a folder that looks like '{project}/.stack-work/install/{os}/{hash}/{ghc_version}/doc/'. For example, the full path of my index file in this example is:

/home/HaddockTest/.stack-work/install/x86_64-linux-tinfo6/6af01190efdb20c14a771b6e2823b492cb22572e9ec30114989156919ec4ab3a/9.6.3/doc/index.html

You can open the file with your web browser, and you'll find a mostly blank page listing the modules in your project, which at this point should only be 'Lib'. If you click on 'Lib', it will take you to a page that looks like this:

We can see that all three expressions from our file are there, but only 'manhattanDistance' has its comment visible on the page. What's neat is that the type links all connect to documentation for the base libraries. If we click on 'Int', it will take us to the page for the 'base' package module 'Data.Int', giving documentation on 'Int' and other integer types.

Documenting the Module Header

In the picture above, you'll see a blank space between our module name and the 'Documentation' section. This is where the module header documentation should go. Let's see how to add this into our code. Just as Haddock comments for functions should go above their type signatures, the module comment should go above the module declaration. You can start it with the same format as you would have with other Haddock block comments:

{-| This module exposes a couple functions
    related to 2D distance calculation.
-}
module Lib
  ( manhattanDistance
  , euclideanDistance
  ) where

...

If you rerun 'stack haddock' and refresh your Haddock page, this comment will now appear under 'Lib' and above 'Documentation'. This is the simplest thing you can do to provide general information about the module.

Module Header Fields

However, there are also additional fields you can add to the header that Haddock will specifically highlight on the page. Suppose we update our block comment to have these lines:

{-|
Module: Lib
Description: A module for distance functions.
Copyright: (c) Monday Morning Haskell, 2023
License: MIT
Maintainer: person@mmhaskell.com

The module has two functions. One calculates the "Manhattan" distance, or "block" distance on integer 2D coordinates. The other calculates the Euclidean distance for a floating-point coordinate system.
-}
module Lib
  ( manhattanDistance
  , euclideanDistance
  ) where

...

At the bottom of the multi line comment, after all the lines for the fields, we can put a longer description, as you see. After adding this, removing 'someFunc', and making our prior comment on Euclidean distance a Haddock comment, we now get 100% marks on the documentation for this module when we recompile it:

100% (  3 /  3) in 'Lib'

And here's what our HTML page looks like now. Note how the fields we entered are populated in the small box in the upper right.

Note that the short description we gave is now visible next to the module name on the index page. This page still only contains the description below the fields.

Haddock Comments Below

So far, we've been using the vertical bar character to place Haddock comments above our type signatures. However, it is also possible to place comments below the type signatures, and this will introduce us to a new syntax technique that we'll use for other areas. The general idea is that we can use a caret character '^' instead of the vertical bar, indicating that the item we are commenting is "above" or "before" the comment. We can do this either with single line comments or block comments. Here's how we would use this technique with our existing functions:

manhattanDistance :: (Int, Int) -> (Int, Int) -> Int
-- ^ Get the "blocK" distance of two 2D coordinate pairs
manhattanDistance (x1, y1) (x2, y2) = abs (x2 - x1) + abs (y2 - y1)

euclideanDistance :: (Double, Double) -> (Double, Double) -> Double
{- ^ Get the Euclidean distance of two 2D coordinate pairs
     This uses the Pythagorean formula.
-}
euclideanDistance (x1, y1) (x2, y2) = sqrt ((x2 - x1) ^ 2 + (y2 - y1) ^ 2)

The comments will appear the same in the final documentation.

Commenting Type Signatures

The comments we've written so far have described each function as a unit. However, sometimes you want to make notes on specific function arguments. The most common way to write these comments in Haskell with Haddock is with the "above" style. Each argument goes on its own line with a "caret" Haddock comment after it. Here's an example:

-- | Given a base point and a list of other points, returns
-- the shortest distance from the base point to a point in the list.
shortestDistance ::
  (Double, Double) -> -- ^ The base point we are measuring from
  [(Double, Double)] -> -- ^ The list of alternative points
  Double
shortestDistance base [] = -1.0
shorestDistance base rest = minimum $ (map (euclideanDistance base) rest)

It is also possible to write these with the vertical bar above each argument, but then you will need a second line for the comment.

-- | Given a base point and a list of other points, returns
-- the shortest distance from the base point to a point in the list.
shortestDistance ::
  -- | The base point we are measuring from
  (Double, Double) ->
  -- | The list of alternative points
  [(Double, Double)] -> 
  Double
shortestDistance base [] = -1.0
shorestDistance base rest = minimum $ (map (euclideanDistance base) rest)

It is even possible to write the comments before AND on the same line as inline comments. However, this is less common since developers usually prefer seeing the type as the first thing on the line.

Commenting Constructors

You can also use Haddock comments for type definitions. Here is an example of a data type with different constructors. Each gets a comment.

data Direction =
  DUp    | -- ^ Positive y direction
  DRight | -- ^ Positive x direction
  DDown  | -- ^ Negative y direction
  DLeft    -- ^ Negative x direction

Commenting Record Fields

You can also comment record fields within a single constructor.

data Movement = Movement
  { direction :: Direction -- ^ Which way we are moving
  , distance  :: Int       -- ^ How far we are moving
  }

An important note is that if you have a constructor on the same line as its fields, a single caret comment will refer to the constructor, not to its last field.

data Point =
  Point2I Int Int       |      -- ^ 2d integral coordinate
  Point2D Double Double |      -- ^ 2d floating point coordinate
  Point3I Int Int Int   |      -- ^ 3d integral coordinate
  Point3D Double Double Double -- ^ 3d floating point coordinate

Commenting Class Definitions

As one final feature, we can add these sorts of comments to class definitions as well. With class functions, it is usually better to use "before" comments with the vertical bar. Unlike constructors and fields, an "after" comment will get associated with the argument, not the method.

{-| The Polar class describes objects which can be described
    in "polar" coordinates, with a magnitude and angle
-}
class Polar a where
  -- | The total length of the item
  magnitude :: a -> Double 
  -- | The angle (in radians) of the point around the z-axis
  angle :: a -> Double

Here's what all these new pieces look like in our documentation:

You can see the way that each comment is associated with a particular field or argument.

A Complete Introduction to the Haskell Programming Language

Of course, comments are useless if you have no code or projects to write them in! If you're a beginner to Haskell, the fastest way to get up to writing project-level code is our course, Haskell From Scratch! This course features hours of video lectures, over 100 programming exercises, and a final project to test your skills! Learn more about it on this page!

Read More
James Bowen James Bowen

How to Write “Hello World” in Haskell

In this article we're going to write the easiest program we can in the Haskell programming language. We're going to write a simple example program that prints "Hello World!" to the console. It's such a simple program that we can do it in one line! But it's still the first thing you should do when starting a new programming language. Even with such a simple program there are several details we can learn about writing a Haskell program. Here's a quick table of contents if you want to jump around!

Now let's get started!

Writing Haskell "Hello World"

To write our "Haskell Hello World" program, we just need to open a file named 'HelloWorld.hs' in our code editor and write the following line:

main = putStrLn "Hello World!"

This is all the code you need! With just this one line, there's still another way you could write it. You could use the function 'print' instead of 'putStrLn':

main = print "Hello World!"

These programs will both accomplish our goal, but their behavior is slightly different! But to explore this, we first need to run our program!

The Simplest Way to Run the Code

Hopefully you've already installed the Haskell language tools on your machine. The old way to do this was through Haskell Platform, but now you should use GHCup. You can read our Startup Guide for more instructions on that! But assuming you've installed everything, the simplest way to run your program is to use the 'runghc' command on your file:

>> runghc HelloWorld.hs

With the first version of our code using 'putStrLn', we'll see this printed to our terminal:

Hello World!

If we use 'print' instead, we'll get this output:

"Hello World!"

In the second example, there are quotation marks! To understand why this is, we need to understand a little more about types, which are extremely important in Haskell code.

Functional Programming and Types

Haskell is a functional programming language with a strong, static type system. Even something as simple as our "Hello World" program is comprised of expressions, and each of these expressions has a type. For that matter, our whole program has a type!

In fact, every Haskell program has the same type: 'IO ()'. The IO type signifies any expression which can perform Input/Output activities, like printing to the terminal and reading user input. Most functions you write in Haskell won't need to do these tasks. But since we're printing, we need the IO signifier. The second part of the type is the empty tuple, '()'. This is also referred to as the "unit type". When used following 'IO', it is similar to having a 'void' return value in other programming languages.

Now, our 'main' expression signifies our whole program, and we can explicitly declare it to have this type by putting a type signature above it in our code. We give the expression name, two colons, and then the type:

main :: IO ()
main = putStrLn "Hello World!"

Our program will run the same with the type signature. We didn't need to put it there, because GHC, the Haskell compiler, can usually infer the types of expressions. With more complicated programs, it can get stuck without explicit type signatures, but we don't have to worry about that right now.

Requirements of an Executable Haskell Program

Now if we gave any other type to our main function, we won't be able to run our program! Our file is supposed to be an entry point - the root of an executable program. And Haskell has several requirements for such files.

These files must have an expression named 'main'. This expression must have the type 'IO ()'. Finally, if we put a module name on our code, that module name should be Main. Module names go at the top of our file, prefaced by "module", and followed by the word "where". Here's how we can explicitly declare the name of our module:

module Main where

main :: IO ()
main = putStrLn "Hello World!"

Like the type signature on our function 'main', GHC could infer the module name as well. But let's try giving it a different module name:

module HelloWorld where

main :: IO ()
main = putStrLn "Hello World!"

For most Haskell modules you write, using the file name (minus the '.hs' extension) IS how you want to name the module. But runnable entry point modules are different. If we use the 'runghc' command on this code, it will still work. However, if we get into more specific behaviors of GHC, we'll see that Haskell treats our file differently if we don't use 'Main'.

Using the GHC Compiler

Instead of using 'runghc', a command designed mainly for one-off files like this, let's try to compile our code more directly using the Haskell compiler. Suppose we have used HelloWorld as the module name. What files does it produce when we compile it with the 'ghc' command?

>> ghc HelloWorld.hs
[1 of 1] Compiling HelloWorld       ( HelloWorld.hs, HelloWorld.o )
>> ls
HelloWorld.hi HelloWorld.hs HelloWorld.o

This produces two output files beside our source module. The '.hi' file is an interface file. The '.o' file is an object file. Unfortunately, neither of these are runnable! So let's try changing our module name back to Main.

module Main where

main :: IO ()
main = putStrLn "Hello World!"

Now we'll go back to the command line and run it again:

>> ghc HelloWorld.hs
[1 of 2] Compiling Main       ( HelloWorld.hs, HelloWorld.o )
[2 of 2] Linking HelloWorld
>> ls 
HelloWorld HelloWorld.hi HelloWorld.hs HelloWorld.o

This time, things are different! We now have two compilation steps. The first says 'Compiling Main', referring to our code module. The second says 'Linking HelloWorld'. This refers to the creation of the 'HelloWorld' file, which is executable code! (On Windows, this file will be called 'HelloWorld.exe'). We can "run" this file on the command line now, and our program will run!

>> ./HelloWorld
Hello World!

Using GHCI - The Haskell Interpreter

Now there's another simple way for us to run our code. We can also use the GHC Interpreter, known as GHCI. We open it with the command 'ghci' on our command line terminal. This brings us a prompt where we can enter Haskell expressions. We can also load code from our modules, using the ':load' command. Let's load our hello world program and run its 'main' function.

>> ghci
GHCI, version 9.4.7: https://www.haskell.org/ghc/   :? for help
ghci> :load HelloWorld
[1 of 2] Compiling Main          ( HelloWorld.hs, interpreted )
ghci> main
Hello World!

If we wanted, we could also just run our "Hello World" code in the interpreter itself:

ghci> putStrLn "Hello World!"
Hello World!

It's also possible to assign our string to a value and then use it in another expression:

ghci> let myString = "Hello World!"
ghci> putStrLn myString
Hello World!

A Closer Look at Our Types

A very useful function of GHCI is that it can tell us the types of our expressions. We just have to use the ':type' command, or ':t' for short. We have two expressions in our Haskell program: 'putStrLn', and "Hello World!". Let's look at their types. We'll start with "Hello World!":

ghci> :type "Hello World!"
"Hello World!" :: String

The type of "Hello World!" itself is a 'String'. This is the name given for a list of characters. We can look at the type of an individual character as well:

ghci> :type 'H'
'H' :: Char

What about 'putStrLn'?

ghci> :t putStrLn
putStrLn :: String -> IO ()

The type for 'putStrLn' looks like 'String -> IO ()'. Any type with an arrow in it ('->') is a function. It takes a 'String' as an input and it returns a value of type 'IO ()', which we've discussed. In order to apply a function, we place its argument next to it in our code. This is very different from other programming languages, where you usually need parentheses to apply a function on arguments. Once we apply a function, the type of the resulting expression is just whatever is on the right side of the arrow. So applying our string to the function 'putStrLn', we get 'IO ()' as the resulting type!

ghci> :t putStrLn "Hello World!"
putStrLn "Hello World!" :: IO ()

Compilation Errors

For a different example, let's see what happens if we try to use an integer with 'putStrLn':

ghci> putStrLn 5
No instance for (Num String) arising from the literal '5'

The 'putStrLn' function only works with values of the 'String' type, while 5 has a type more like 'Int'. So we can't use these expressions together.

A Quick Look At Type Classes

However, this is where 'print' comes in. Let's look at its type signature:

ghci> :t print
print :: Show a => a -> IO ()

Unlike 'putStrLn', the 'print' function takes a more generic input. A "type class" is a general category describing a behavior. Many different types can perform the behavior. One such class is 'Show'. The behavior is that Show-able items can be converted to strings for printing. The 'Int' type is part of this type class, so we can use 'print' with it!

ghci> print 5
5

When use 'show' on a string, Haskell adds quotation marks to the string. This is why it looks different to use 'print' instead of 'putStrLn' in our initial program:

ghci> print "Hello World!"
"Hello World!"

Echo - Another Example Program

Our Haskell "Hello World" program is the most basic example of a program we can write. It only showed one side of the input/output equation. Here's an "echo" program, which first waits for the user to enter some text on the command line and then prints that line back out:

main :: IO ()
main = do
  input <- getLine
  putStrLn input

Let's quickly check the type of 'getLine':

ghci> :t getLine
getLine :: IO String

We can see that 'getLine' is an IO action returning a string. When we use the backwards arrow '<-' in our code, this means we unwrap the IO value and get the result on the left side. So the type of 'input' in our code is just 'String', meaning we can then use it with 'putStrLn'! Then we use the 'do' keyword to string together two consecutive IO actions. Here's what it looks like to run the program. The first line is us entering input, the second line is our program repeating it back to us!

>> runghc Echo.hs
I'm entering input!
I'm entering input!

A Complete Introduction to the Haskell Programming Language

Our Haskell "Hello World" program is the most basic thing you can do with the language. But if you want a comprehensive look at the syntax and every fundamental concept of Haskell, you should take our beginners course, Haskell From Scratch.

You'll get several hours of video lectures, plus a lot of hands-on experience with 100+ exercise problems with automated testing.

All-in-all, you'll only need 10-15 hours to work through all the material, so within a couple weeks you'll be ready for action! Read more about the course here!

Read More
James Bowen James Bowen

Black Friday Sale: Last Day!

We've come to Cyber Monday, marking the last day of our Black Friday sale! Today is your last chance to get big discounts on all of our courses. You can get 20% by using the code BFSOLVE23 at checkout. Or you can subscribe to our mailing list to receive a 30% discount code. You must use these codes by the end of the day in order to get the discount!

Here's a final runthrough of the courses we have available, including our newest course, Solve.hs!

Solve.hs

We just released the first part of our newest course last week! These two detailed modules dive into the fundamentals of problem solving in Haskell. You'll get to rewrite the list type and most of its API from scratch, teaching you all the different ways you can write "loop" code in Haskell. Then you'll get an in-depth look at how data structures work in Haskell, including the quick process to learn a data structure from start to finish!

Course Page

Normal Price: $89 Sale Price: $71.20 Subscriber Price: $62.30

Haskell From Scratch

This is our extensive, 7-module beginners course. You'll get a complete introduction to Haskell's syntax and core concepts, including things like monads and tricky type conversions.

Course Page

Normal Price: $99 Sale Price: $79.20 Subscriber Price: $69.30

Practical Haskell

Practical Haskell is designed to break the idea that "Haskell is only an academic language". In our longest and most detailed course, you'll learn the ins and outs of communicating with a database in Haskell, building a web server, and connecting that server to a functional frontend page. You'll also learn about the flexibility that comes with Haskell's effect systems, as well as best practices for testing your code, including tricky test cases like IO based functions!

Course Page

Normal Price: $149 Sale Price: $119.20 Subscriber Price: $104.30

Making Sense of Monads

The first of our shorter, more targeted courses, Making Sense of Monads will teach you how to navigate monads, one of Haskell's defining concepts. This idea is a bit tricky at first but also quite important for unleashing Haskell's full power. The course is well suited to beginners who know all the basic syntax but want more conceptual practice.

Note that Making Sense of Monads is bundled with Haskell From Scratch. So if you buy the full beginners course, you'll get this in-depth look at monads for free!

Course Page

Normal Price: $29 Sale Price: $23.20 Subscriber Price: $20.30

Effectful Haskell

If Making Sense of Monads is best for teaching the basics of monads, Effectful Haskell will show you how to maximize the potential of this idea. You'll develop a more complete idea of what we mean by "effects" in your code. You'll see a variety of ways to incorporate them into your code and learn some interesting ideas about effect substitution!

Course Page

Normal Price: $39 Sale Price: $31.20 Subscriber Price: $27.30

Haskell Brain

Last, but not least, Haskell Brain will teach you how to perform machine learning tasks in Haskell with TensorFlow. There's a lot of steps involved in linking these two technologies. So while machine learning is a valuable skill to have in today's world, understanding the ways we can link software together is almost as valuable!

Course Page

Normal Price: $39 Sale Price: $31.20 Subscriber Price: $27.30

Conclusion

So don't miss out on this special offer! You can use the code BFSOLVE23 for 20% off, or you can subscribe to our mailing list to get a code for 30% off! This offer ends tonight, so don't wait!

Read More
James Bowen James Bowen

Spotlight: Quick, Focused Haskell Courses

A couple days ago I gave a brief spotlight on the longer, more in-depth courses I've written. The newest of these is Solve.hs, with its focus on problem solving, and the original two I wrote were Haskell From Scratch and Practical Haskell.

After my first two courses, I transitioned towards writing a few shorter courses. These are designed to teach vital concepts in a shorter period of time. They all consist of just a single module and have a shorter total lecture time (1.5 to 2 hours each). You can finish any of them in a concentrated 1-2 week effort. Today I'll give a brief summary of each of these, listed from most abstract to most practical, and easiest to hardest.

Remember, all of these are on sale at 20% off using the code BFSOLVE23 at checkout! You can also subscribe to our mailing list to get an even bigger discount, at 30% off!

Making Sense of Monads

This is for those of you who have been writing Haskell long enough that you've got the hang of the syntax, but you still struggle a bit to understand monads. You might look at parts of Modules 4 and 5 from Haskell From Scratch and think they look useful, but you don't think you need the rest of the course.

Making Sense of Monads really "zooms in" on Module 5. It goes deeper in understanding all of the simpler structures that help us understand monads, and it gives a sizable amount of practice with writing monadic code. You'll also get a crash course on parsing (a common use of monadic operations), and write two fairly complex parsers. So it's a great option if you want a shorter but more concentrated approach on some of the basics!

Effectful Haskell

Effectful Haskell takes a lot of the core ideas and concepts in Making Sense of Monads and goes one step beyond into the more practical realm of applying monadic effects in a program. You'll learn more abstractly what an effect is, but then also the different ways to incorporate polymorphic effects into your Haskell program. You'll see how to use monads and monad classes to swap effectful behaviors in your program, and why this is useful.

This course culminates in a similar (but smaller) project to Practical Haskell, where you'll deploy an effectful web server to Heroku.

Haskell Brain

This course is the hardest and most practically-oriented of this series. You will take on the challenge of incorporating TensorFlow and machine learning into Haskell. This is easier said than done, because TensorFlow has many dependencies beyond the normal packages you can simply pick up on Hackage. So you'll gain valuable experience going through this installation process, and then we'll run through some of the main information you need to know when it comes to creating tensors in Haskell, and building moderately complex models.

Conclusion

So while these courses are shorter, they still pack a decent amount of material! And with the subscriber discount, you can get each of them for less than $30! This offer will only last until Monday though, so make up your mind quickly!

Read More
James Bowen James Bowen

Spotlight: In-Depth Haskell Courses!

On Monday, I announced the release of Part 1 of Solve.hs, our newest course focused on problem solving. In addition, this entire week is our Black Friday sale, with two options to save on any of our courses, including the newest one. Your first option is to just use the code BFSOLVE23, which will get you 20% off. You can also get a special discount code by signing up for our mailing list, which will get you 30% off!

Some of our courses are shorter and more focused, while others are longer and cover more ground. Today on the blog I wanted to highlight the longer courses available at Monday Morning Haskell Academy. These go into a lot of detail on many different subjects, so they can fill a large gap in your Haskell awareness. However, each obviously has its own general subject area. Here they are, ordered from most beginner-level to most advanced.

Haskell From Scratch

Our very first course, Haskell From Scratch, is perfect for the curious reader of Haskell who has hardly (if ever) gotten around to writing any lines of Haskell code. Over the course of 7 modules, you'll learn everything you really need to get started writing some code. We'll go over a bit of basic setup, cover Haskell's core syntax extensively, and go over many of the important conceptual ideas that separate Haskell from other languages, like immutability, laziness, and monads. You will get many chances to try your new skills out with an abundant series of practice problems, as well as a final project!

Learn more about it on the course page!

Solve.hs

Solve.hs is our newest course, with the first two modules released this week! It will teach you the fundamental patterns of problem solving in Haskell. You'll start with an in-depth look at Haskell's lists and how we can use them to implement loop patterns. Then you'll learn A LOT of details about Haskell's data structures, including some implementation ideas that are unique to Haskell. It's a great choice if you have a solid grasp on the basic syntax and fundamental concepts of Haskell, but are looking to improve your fluency in the language.

Find out more by going here!

Practical Haskell

Practical Haskell is, by some distance, our longest and most detailed course, with close to 6 hours of lectures and an additional 4 hours of screencast content spread over 5 modules. You'll build different parts of a functioning website, starting with the database connections, then the server, and finally a frontend written in Elm. You'll also learn how to launch this server on Heroku, as well as some tips for best practices in testing.

Take a look on the MMH Academy page!

Conclusion

These courses are all great for helping you take the next big step in your Haskell journey. Until next Monday, you can get 20% off all of them using the code BFSOLVE23 when you check out! And you can get an even bigger discount (30%) if you subscribe to our mailing list! So don't miss out on this opportunity; you've only got 5 more days!

Read More
James Bowen James Bowen

New Course: Solve.hs!

A few weeks ago I hinted that, while I hadn't been able to publish consistently for much of this year, I had been working on something important. And now, I'm finally ready for the big reveal!

Today I am excited to announce the release of Part 1 of my new course Solve.hs. This course is focused on problem solving in Haskell. This announcement also kicks off the Black Friday sale week for Monday Morning Haskell, where you can get 20% off any of our courses with the code BFSOLVE23. And if you subscribe to our mailing list, you'll get an even bigger discount, 30% off on everything, including Solve.hs!

Why Problem Solving?

I settled on problem solving as a great topic for a course because it occupies a good middle ground between, on the one hand, super basic setup and syntax (covered by Setup.hs and Haskell From Scratch, respectively), and on the other hand, more complex practical topics covered by courses like Practical Haskell, Effectful Haskell, and Haskell Brain.

Problem solving is a relatively quick and frictionless path to getting familiar with a language (especially one like Haskell that defies many conventions from more common languages). Once you have your toolchain set up, it's not hard to find problems that will quickly stretch your understanding of the language and even teach you some pretty tricky concepts! With problem solving, it's also easier to do a lot of different problems and get a lot of "reps" in, so to speak.

My actual Haskell career trajectory with Haskell went pretty quickly from learning the basics to jumping into practical applications like web servers. This is entirely doable, but it does run the risk of pigeon-holing you into some pretty specific areas of understanding, causing you to skip many of the fundamentals.

Problem solving is a great tool to fill this gap. It forces you to learn about new techniques that you might not need on a particular project, but which will help you grow and be able to apply to future projects. Let's learn a bit more about the course itself.

The Course

This course release is exciting for me because, even just considering the first part, it's the longest and most detailed course I've written since Practical Haskell. It doesn't appear to be that long, consisting of only two modules. But by several important metrics, it actually rivals the 7-module Haskell From Scratch. It has nearly as much lecture material, about as many practice problems, and the amount of code you'll actually have to write for the exercises is quite a bit more!

Course Lecture Time Practice Problems Solution LOC*
Haskell From Scratch 4h05m ~180 1316
Solve.hs 3h47m ~180 2587

*Solution LOC measures the GitHub diff comparing my answers branches to the original

Solve.hs is good for users with many levels of Haskell experience. Beginner and intermediate Haskellers will benefit a lot from learning the core patterns in this first part, and even experienced Haskellers will find some of the practice problems challenging. To see why, let's look at the two modules currently on offer!

Module 1: Lists and Loop Patterns

In the first module, we'll do extensive work with lists. Lists are one of the basic building blocks of Haskell - the simplest collection type you can have. To use them well, and so become "fluent" in Haskell, you need to familiarize yourself with a lot of different patterns of how to use them.

In Module 1, we'll explicitly name these patterns and draw helpful comparisons to code from other languages. This will help you finally understand how to answer questions like "how do I write a for-loop in Haskell?"

You'll learn large parts of the List API by implementing the type from scratch, as well as dozens of its helper functions. This will help you drill in the patterns of how lists work, and what operations we can (and cannot) do efficiently with them.

Module 2: Data Structures

Once you're well-versed in all the things Haskell can do with lists, it will be time to learn about other structures that allow for different efficient operations. This is the subject of Module 2. Now I've written plenty about Data Structures before. In fact there are a couple ebooks you can check out if you want.

But this module goes way deeper. In Solve.hs, we'll look into how some of these structures are implemented under the hood, and specifically how they're implemented differently in Haskell compared to other languages.

There are plenty of practice problems to help you get a feel for each structure. You'll learn about some of my favorite "derived structures" - abstract ideas that use the core structures under the hood, but express specific operations cleanly. On top of that, you'll get to implement a couple different data structures from scratch, which is super helpful for training yourself to understand how these structures perform in real life.

Part 2 - 2024

I wanted to release Part 1 of this course now, since these two modules on their own can be super impactful for beginning and intermediate Haskellers. Also, Advent of Code is coming up, a time when I have focused this blog's attention on problem solving anyway for the last couple years. So this course makes a perfect match for those of you who want to dive into Advent of Code this year, writing all your solutions in Haskell!

However, since problem solving is such a rich field with many areas to explore, I plan to add two more modules to this course in 2024. Module 3 will cover Essential Algorithms, with a special focus on graph algorithms in Haskell. Then Module 4 will go in-depth with Parsing, since many problem solving contests like Advent of Code do require you to parse your problem input, often from odd formats. I also intend to incorporate some material related to scale and performance testing.

After the full course release, Part 1 will continue to be its own option for purchase. There is, however, no risk in buying Part 1 now. Anyone who purchases Part 1 before the full release will receive a discount coupon for the full course based on what they paid for Part 1.

So, if you're interested, head to the course sales page to learn more and enroll in the course! But first make sure to subscribe to the Monday Morning Haskell mailing list, which will give you a 30% discount code for this and all of our other courses! This offer will only be good until Monday, November 27th, so don't wait!

Read More
James Bowen James Bowen

Ballparking Solutions

In last week's article, we went over how to build a simple benchmarking library for ourselves by timing computations. Now most of the time, we're inclined to use benchmarking in a purely reactive way. We write a solution to a problem, and then document how long it takes.

But perhaps the most important use of benchmarking is proactive. In this article, we'll learn how to use ballpark estimates to guide our problem-solving process. This can save you a lot of time when you're working on performance-intensive problems, whether on Advent of Code, Hackerrank, or, of course, in the real world of software development!

The Uses of Ballparking

A ballpark estimate essentially allows us to provide an order of magnitude on the time we expect our solution to take. We should be able to abstractly define the runtime of our solution with Big-O notation. And given this designation, as well as a problem size, we should be able to determine if we'll get an answer within a second, a few seconds, a few minutes, or a few hours.

Using this ballpark estimate, we can then determine if our proposed solution is even feasible, without needing to implement the solution. Whether we're solving a problem on Advent of Code, Hackerrank, Leetcode, or in the real world, we can usually have some idea of the scale of a problem. AoC will let you observe the "large" problem input on which to test your code. Hackerrank will give you an idea of the bounds of various problem parameters.

These problem parameters can, in fact, tell us if a solution is acceptable before we even write that solution. For example, if my input problem size is 10000, roughly how long should my program take if it's O(n)? What about O(n log n)? What about O(n^2)?

Now it's up to situational judgment to determine what's "acceptable". As a general rule, no one wants to wait around for hours to wait for their code to finish. And yet, in real world machine learning development, taking several hours to train a model on the latest data is quite commonplace. However, Hackerrank will reject your solution (or at least not give you full points) if it takes more than a few seconds on the largest input. In Advent of Code meanwhile, you run the code locally, and you only need to get it once. So if your code takes 30 minutes or even an hour to run, you can still use it if you're sure the algorithm is correct (I've done this a few times).

Getting Ballpark Values

So how do we get an idea for these ballpark values? How do we estimate how an algorithm will perform on some data? Well we can use the library we built last time! All we have to do is select some known operations and time how long they take on varying sizes of inputs. So we can choose some candidate functions for different run times:

  1. O(n) -> Take the sum of a list of integers
  2. O(n log n) -> Sort a list of values
  3. O(n^2) -> Take the intersection of two sets of size n
  4. O(n^3) -> Populate a "cubic" multiplication table.

For each operation we can propose different sizes, and see what the times are. We'll do this with sum first. To start, let's write a function to generate a randomized list of integers, given a range and a size:

import Control.Monad (replicateM)
import System.Random (randomRIO)

randomList :: (Int, Int) -> Int -> IO [Int]
randomList rng n = replicateM n (randomRIO rng)

Now we'll generate random lists for many different sizes:

sizes :: [Int]
sizes = [10, 100, 1000, 10000, 100000, 1000000, 10000000]

main :: IO ()
main = do
  listsWithSizes <- zip sizes <$> mapM (randomList (1, 10000)) sizes
  ...

And now we just use our timeOp function from the last article on each list. We'll print out the time it takes, followed by the size:

import qualified System.IO.Strict as S

timeOp :: (NFData b) => (a -> b) -> a -> S.SIO NominalDiffTime
...

main :: IO ()
main = do
  listsWithSizes <- zip sizes <$> mapM (randomList (1, 10000)) sizes
  putStrLn "Sum"
  forM_ listsWithSizes $ \(sz, lst) -> do
    t <- S.run (timeOp sum lst)
    putStr (show t)
    putStrLn $ " (" <> show sz <> ")"

With these sizes, we get the following output:

Sum
0.000039492s (10)
0.000024104s (100)
0.000031986s (1000)
0.000257356s (10000)
0.001830953s (100000)
0.014561602s (1000000)
0.287729319s (10000000)

And so we can see that if our algorithm is O(n), we can get an answer in less than a second, even for extremely large input sizes! If we went up to 100 million, it would start taking multiple seconds.

Larger Size Examples

Obviously though, O(n) is generally the best case scenario we can get, and most problems don't have such a solution. So we can start trying more expensive solutions. So for O(n log n), we'll do sorting. Now that we've got our template, it's easy to write this code:

import qualified Data.List as L

main :: IO ()
main = do
  listsWithSizes <- zip sizes <$> mapM (randomList (1, 10000)) sizes
  putStrLn "Sorting"
  forM_ listsWithSizes $ \(sz, lst) -> do
    t <- S.run (timeOp L.sort lst)
    putStr (show t)
    putStrLn $ " (" <> show sz <> ")"

And we get our results. With the same sizes, we see much longer times.

0.000011084s (10)
0.000028496s (100)
0.000264609s (1000)
0.004021375s (10000)
0.086308336s (100000)
1.691808146s (1000000)
39.815963511s (10000000)

We can sort 10 million items in under a minute, so an O(n log n) algorithm on such a large input would probably be fine for something like Advent of Code, but not for Hackerrank. If your size is limited to one hundred thousand or smaller, such an algorithm will probably work on Hackerrank.

As we increase the difficulty of the algorithm, we can't include as many test sizes, since we start getting prohibitively long times earlier! Let's try list intersection, which, using the naive algorithm in Data.List, is a quadratic algorithm. We'll only run up to 100000 items:

main = do
  listsWithSizes <- zip sizes <$> mapM (randomList (1, 10000)) (take 5 sizes)
  putStrLn "Intersect"
  forM_ listsWithSizes $ \(sz, lst) -> do
    t <- S.run (timeOp (L.intersect lst) lst')
    putStr (show t)
    putStrLn $ " (" <> show sz <> ")"

Here are the results:

Intersect
0.000009277s (10)
0.000092984s (100)
0.005050921s (1000)
0.396993992s (10000)
10.751072816s (100000)

So 10000 items might be doable. However, by intersecting a list with itself, we're being generous. The problem takes longer if the two lists are disjoint:

main = do
  listsWithSizes <- zip sizes <$> mapM (randomList (1, 10000)) (take 5 sizes)
  altLists <- mapM (randomList (10001, 20000)) (take 5 sizes)
  putStrLn "Intersect"
  forM_ (zip listsWithSizes altLists) $ \((sz, lst), lst') -> do
    t <- S.run (timeOp (L.intersect lst) lst')
    putStr (show t)
    putStrLn $ " (" <> show sz <> ")"

Now it takes longer:

Intersect
0.000009074s (10)
0.000167983s (100)
0.010436625s (1000)
1.109146166s (10000)
215.779740028s (100000)

We're over a second for 10000, but 100000 is now quite prohibitive. Keep in mind though, this problem also has low constant factors. So while 10000 seems doable, a good general rule of thumb is that you should look for something faster than O(n^2) for this problem size.

For one last example, we'll do a cubic algorithm. This function produces a cubic multiplication table:

cubicTable :: [Int] -> [((Int, Int, Int), Int)]
cubicTable ns = [((x, y, z), x * y * z) | x <- ns, y <- ns, z <- ns]

We have to majorly shrink the sizes for this to be even tractable! Let's just try 10, 100 and 200.

main = do
  listsWithSizes <- zip sizes <$> mapM (randomList (1, 10000)) [10, 100, 200]
  putStrLn "Intersect"
  forM_ listsWithSizes $ \(sz, lst) -> do
    t <- S.run (timeOp cubicTable lst
    putStr (show t)
    putStrLn $ " (" <> show sz <> ")"

Already at size 200, we're taking several seconds.

Cubic Table
0.000397915s (10)
0.553257425s (100)
3.549556318s (200)

So if your problem size is larger than 100, you'll really need to find something better than a cubic algorithm.

With slower algorithms, it gets even worse. You may be able to get away with an exhaustive exponential or factorial algorithm, but only at size 10 or below. These kinds of algorithms are rarely tested in programming problems though, since they are the most brute force thing you can do.

Example: Fences Problem

Here's an example of applying this ballpark methodology. I run through this example in more detail in my Testing Series (including benchmarking). But the problem is a Hackerrank question called John and Fences where we are essentially looking to calculate the largest rectangular area we can find amidst fence segments of varying heights.

The solution here is recursive. We find the minimum height plank over an interval. We then either take that height multiplied by the length of the interval, or we recursively look strictly left of the interval and strictly right. We return the maximum of these three options.

At each recursive step, we need to keep finding the minimum over a particular interval. Naively, this takes O(n) time, and we may have to do this step O(n) times, giving an O(n^2) algorithm.

However, we can observe the constraints on the problem, and find that n could be as high as 10000.

This is our clue that O(n^2) may not be fast enough. And indeed, to get full credit, you need to find an algorithm that is O(n log n). You can do this with a Segment Tree, which lets you calculate the minimum over any interval in your fence in logarithmic time. You can read the series for a complete solution.

The power of ballparking is when you realize this fact before you start writing your code, skipping the slow implementation altogether.

Conclusion

For more tips and content on problem solving, make sure to subscribe to our mailing list! There's a special deal coming out next week that you won't want to miss!

Read More
James Bowen James Bowen

Quick and Simple Benchmarking

As a programmer, you have to constantly be asking, "How Good is my Code?". There are a lot of different ways code can be "good" or "bad", but one of the key indicators in many cases is the time performance of your code. How long does it take your code to run on particular inputs?

Haskell isn't generally regarded as one of the fastest languages out there; it's not designed specifically with performance in mind. Languages like C, C++, and, at a higher level, Rust, are specifically geared for writing code that runs faster. But this doesn't mean performance is irrelevant in Haskell. As long as you're writing a real world application that people are going to use, you need to be concerned with the user experience, and no user wants to be waiting around for several seconds on simple operations because you didn't bother to optimize your code.

So we want ways to measure the performance of our code, a process also known as benchmarking. The Criterion Library is one of the main tools Haskell offers to help you do this. I've written about this library before in my series on Testing in Haskell.

But the library is a bit difficult to jump into if you're just a beginner or if you're doing something lightweight. It can also take a lot longer to run than you were expecting, since it runs your code many times to get a statistically significant average. So in this article we're going to go over a more lightweight way to benchmark your code. It turns out though that this is a little trickier in Haskell than in other languages, so let's learn some of these nuances.

Timing in Python

Timing a function in Python is rather straightforward. Here's some code we can use to benchmark an operation op on a particular input_arg:

# time_py.py

import time

def time_op(op, input_arg):
  t1 = time.time()
  result = op(input_arg)
  t2 = time.time()
  return t2 - t1

We get the initial time t1, we perform the operation, and then we get the current time again, and we return the difference.

As a very basic example, we can compare running the sorted function on a group of lists of escalating lengths:

xs = [10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000]
for x in xs:
  a = time_op(sorted, xs)
  print(a)

And running this command, we see that the amount of time increases for the number of elements in the list:

>> python3 time_py.py

2.6226043701171875e-06
1.1920928955078125e-06
5.245208740234375e-06
5.0067901611328125e-05
0.0007715225219726562
0.008384466171264648
0.08414745330810547
0.8780522346496582

Small list sizes don't take very long, and increasing the size of the list 10x increases the time by 10-20x, consistent with the n log n nature of sorting.

Notably, these values don't feel like they reflect the total time in our program. This program takes 5-10 seconds to run on my machine, quite a bit more than the cumulative time shown here. But the values are at least directionally correct, indicating a longer time for larger operations.

First Haskell Attempt

Let's try repeating this code in Haskell. We'll need to work in the IO monad, since the equivalent to time.time() is a call to getCurrentTime. Here's what our code might look like:

import Control.Monad (forM_)
import qualified Data.List as L
import Data.Time.Clock

timeOp :: (a -> b) -> a -> IO NominalDiffTime
timeOp op input = do
  t1 <- getCurrentTime
  let result = op input
  t2 <- getCurrentTime
  return $ diffUTCTime t2 t1

main :: IO ()
main = forM_ sizes $ \x -> do
  let input = [1..x]
  diff <- timeOp L.sort input
  print diff

sizes :: [Int]
sizes =
  [ 10, 100, 1000, 10000
  , 100000, 1000000
  , 10000000, 100000000
  ]

When we run this program, we get some unhelpful results though…

0.000000576s
0.000000767s
0.000000684s
0.000000658s
0.000000751s
0.000000717s
0.00000075s
0.000000678s

All these times appear the same! What's happening?

Laziness at Work

What's happening, of course, is that Haskell's laziness is kicking in. Our program never demonstrates a need to use the values in our sorted lists. So it never actually evaluates these lists and performs the sorting!

So let's change things up a bit by returning our result along with the time, and printing the last value of the result list which will force the evaluation of the whole sorted list.

timeOp :: (a -> b) -> a -> IO (b, NominalDiffTime)
timeOp op input = do
  t1 <- getCurrentTime
  let result = op input
  t2 <- getCurrentTime
  return $ (result, diffUTCTime t2 t1)

main :: IO ()
main = forM_ sizes $ \x -> do
  let input = [1..x]
  (result, diff) <- timeOp L.sort input
  print (diff, last result)

Our result now looks like this:

(0.000000672s,10)
(0.00000061s,100)
(0.000000363s,1000)
(0.00000042s,10000)
(0.000000661s,100000)
(0.000001089s,1000000)
(0.000001185s,10000000)
(0.000001256s,100000000)

Unfortunately, this is still not what we're looking for! Though there's a bit of increase in time, it's nowhere near the 10-20x increase we would like to see.

Haskell's laziness is still biting us. Our program is evaluating the list, but it's not evaluating the list when we would like it too. We want the whole evaluation to occur between our timing calls, but this doesn't happen. The complete list is only evaluated at the end of our loop, when we're about to print it out.

Printing Solutions

This can clue us in to some potential solutions though. Since printing is one of the sure-fire ways to ensure an expression is fully evaluated, we can add a Show constraint and print the value out during the evaluation function. This removes the need to return this result from our function:

timeOp :: (Show b) => (a -> b) -> a -> IO NominalDiffTime
timeOp op input = do
  t1 <- getCurrentTime
  let result = op input
  print result
  t2 <- getCurrentTime
  return $ diffUTCTime t2 t1

main :: IO ()
main = forM_ sizes $ \x -> do
  let input = [1..x]
  diff <- timeOp L.sort input
  print diff

While technically this works, if we have non-trival result values to print (like a ten-million item list), then all the important data gets lost in this spam. Plus, our timing values now include the time it takes to print, which can end up being much larger!

We can get around this spam by instead printing the values to the /dev/null file, which simply discards all this information.

timeOp :: (Show b) => (a -> b) -> a -> IO NominalDiffTime
timeOp op input = do
  t1 <- getCurrentTime
  let result = op input
  withFile "/dev/null" WriteMode (\h -> hPutStrLn h (show result))
  t2 <- getCurrentTime
  return $ diffUTCTime t2 t1

With a sufficiently large amount of print information though, it is still possible that you can crash your program by printing to /dev/null! I had to remove the one hundred million list size, but then I got the following output:

0.000048873s
0.000017368s
0.000058216s
0.00057228s
0.006825458s
0.09262504s
2.521403946s

Now we're getting values that seem accurate! But let's keep trying to improve how this function works!

Since printing large values can still crash our program, perhaps the more sensible choice is to provide a custom printing function instead of relying on the native Show definition. For our example, we can print the "last" value of the list, like we used before. This fully evaluates the result, without printing a large amount of information.

timeOp :: (a -> b) -> a -> (b -> String) -> IO NominalDiffTime
timeOp op input toStr = do
  t1 <- getCurrentTime
  withFile "/dev/null" WriteMode (\h -> hPutStrLn h (toStr result))
  t2 <- getCurrentTime
  return $ diffUTCTime t2 t1

main :: IO ()
main = forM_ sizes $ \x -> do
  let input = [1..x]
  diff <- timeOp L.sort input (show . last)
  print diff

This works, even running the largest size!

0.000153407s
0.000015817s
0.000028232s
0.000279035s
0.002366471s
0.098812637s
1.468213335s
17.183272162s

Notice though that the 10-million size has changed from 2.52 seconds to 1.46 seconds. Percentage-wise, that's actually a large error! There can be a fair amount of variation from run to run. This is why Criterion does many runs. However, our approach is still useful in letting us estimate orders of magnitude on our timing, which is often all you really need in assessing a solution.

Strict IO Library

Now, Haskell does have tools besides printing to fully evaluate expressions and otherwise introduce strictness into our code. One of these is the seq function'. This takes two arguments and simply returns the second, but still strictly evaluates the first.

-- Returns the second argument, but strictly evaluates the first
seq :: a -> b -> b

This is actually used under the hood of the foldl' function. This does a strict left-fold, avoiding the memory leak issues in the regular version of foldl:

foldl' f z [] = z
foldl' f z (x:xs) =
  let z' = z `f` x 
  -- Use 'seq' to evaluate z' early!
  in  seq z' $ foldl' f z' xs

You can also make particular fields on a data type strict, so that they are fully evaluated immediately when an item of that type is constructed. You indicate this with a bang operator in front of the type in the data definition.

-- The fields will be strictly evaluated
data MyStrictType = MyStrictType !Int !String

What I found most useful for our particular use case though is an older library called Strict IO. This provides a number of utilities for performing IO actions strictly instead of lazily. It introduces its own module SIO to do this. We'll use 3 different functions from there:

-- Wrap an IO action taking no arguments and make it strict
wrap0 :: IO a -> SIO a

-- Wrap a pure value, evaluating it strictly
return' :: a -> SIO a

-- Run a strict IO action within the normal IO monad
run :: SIO a -> IO a

With these three pieces, our way forward is simple. We use wrap0 on our calls to getCurrentTime to ensure they happen immediately. Then we use return' to wrap the evaluation of our operation. And finally, we run our timeOp function from the IO monad.

import Control.DeepSeq (NFData)
import qualified System.IO.Strict as S
import qualified System.IO.Strict.Internals as S

timeOp :: (NFData b) => (a -> b) -> a -> S.SIO NominalDiffTime
timeOp op input = do
  t1 <- S.wrap0 getCurrentTime
  result <- S.return' (op input)
  t2 <- S.wrap0 getCurrentTime
  return $ (result, diffUTCTime t2 t1)

main :: IO ()
main = forM_ sizes $ \x -> do
  let input = [1..x]
  diff <- S.run (timeOp L.sort input)
  print diff

And this gives us the timing values we want, with no need for relying on print as an evaluation crutch.

0.00000917s
0.000011727s
0.000049986s
0.000489229s
0.005072207s
0.102675824s
1.722589101s
20.870199397s

This means we don't need any extra "show"-ing functions as arguments to our timing operation. All we need is an NFData constraint on our evaluated type. This exists for most basic types as long as we import Control.DeepSeq from the deepseq package.

As a weakness though, the strict-io package is not part of recent Stack resolver sets, so you'll need it as an extra-dep in stack.yaml. I have not found a more recent or "official" alternative though. Also, the wrap0 function comes from a module labeled as Internals. The library was mainly geared towards basic IO actions like file reading, and not getting the current time, so wrap0 wasn't intended to be part of the public API.

Conclusion

Now that we have a lighter-weight means of benchmarking our code (as compared to Criterion), we can explore ways that this helps us learn about problem solving! Come back next week and we'll see how!

And make sure to subscribe to our mailing list for more Haskell updates! If you're interested in problem solving content, there will be quite a lot of that coming up!

Read More
James Bowen James Bowen

Big News Soon!

If you’ve ever been a follower of this blog, you may have noticed that I have not been posting much this year - basically nothing in the last 5 months or so. This year I experienced the birth of my first child, and subsequently moved 2500 miles with my family. So unfortunately I haven’t been able to prioritize my weekly blog content.

But I haven’t been completely staying away from Haskell! In fact, one of the other items I decided to prioritize is a new, upcoming course - the largest and most detailed I’ve done since Practical Haskell!

It’s not quite ready for publishing yet, but will be coming soon! So keep your eyes on this space!

Read More
James Bowen James Bowen

Spring Sale: Final Day!

Today is the final day to subscribe and get 20% off any of our paid courses! Here are the potential sale prices you might get:

  1. Haskell From Scratch | Our Comprehensive Beginners Course | $79.20
  2. Practical Haskell | Learn about Useful Web Libraries and Project Concepts | 119.20
  3. Making Sense of Monads - Learn Haskell's Key Concept | $23.20
  4. Effectful Haskell - Take a Step further with Monadic Effect Systems | 31.20
  5. Haskell Brain - Combine Tensor Flow and Haskell | 31.20

In addition, you can also take a look at our new free course, Setup.hs. This course teaches you how to set up your basic Haskell toolchain, including IDE integrations!

So if you want that 20% discount code, make sure to subscribe to our mailing list before the end of the day!

Read More
James Bowen James Bowen

This is How to Build Haskell with GNU Make (and why it's worth trying)

In a previous article I showed the GHC commands you need to compile a basic Haskell executable without explicitly using the source files from its dependencies. But when you're writing your own Haskell code, 99% of the time you want to be using a Haskell build system like Stack or Cabal for your compilation needs instead of writing your own GHC commands. (And you can learn how to use Stack in my new free course, Setup.hs).

But part of my motivation for solving that problem was that I wanted to try an interesting experiment:

How can I build my Haskell code using GNU Make?

GNU Make is a generic build system that allows you to specify components of your project, map out their dependencies, and dictate how your build artifacts are generated and run.

I wanted to structure my source code the same way I would in a Cabal-style application, but rely on GNU Make to chain together the necessary GHC compilation commands. I did this to help gain a deeper understanding of how a Haskell build system could work under the hood.

In a Haskell project, we map out our project structure in the .cabal file. When we use GNU Make, our project is mapped out in the makefile. Here's the Makefile we'll ultimately be constructing:

GHC = ~/.ghcup/ghc/9.2.5/bin/ghc
BIN = ./bin
EXE = ${BIN}/hello

LIB_DIR = ${BIN}/lib
SRCS = $(wildcard src/*.hs)
LIB_OBJS = $(wildcard ${LIB_DIR}/*.o)

library: ${SRCS}
  @mkdir -p ${LIB_DIR}
  @${GHC} ${SRCS} -hidir ${LIB_DIR} -odir ${LIB_DIR}

generate_run: app/Main.hs library
  @mkdir -p ${BIN}
  @cp ${LIB_DIR}/*.hi ${BIN}
  @${GHC} -i${BIN} -c app/Main.hs -hidir ${BIN} -odir ${BIN}
  @${GHC} ${BIN}/Main.o ${LIB_OBJS} -o ${EXE}

run: generate_run
  @${EXE}

TEST_DIR = ${BIN}/test
TEST_EXE = ${TEST_DIR}/run_test

generate_test: test/Spec.hs library
  @mkdir -p ${TEST_DIR}
  @cp ${LIB_DIR}/*.hi ${TEST_DIR}
  @${GHC} -i${TEST_DIR} -c test/Spec.hs -hidir ${TEST_DIR} -odir ${TEST_DIR}
  @${GHC} ${TEST_DIR}/Main.o ${LIB_OBJS} -o ${TEST_EXE}

test: generate_test
  @${TEST_EXE}

clean:
  rm -rf ./bin

Over the course of this article, we'll build up this solution piece-by-piece. But first, let's understand exactly what Haskell code we're trying to build.

Our Source Code

We want to lay out our files like this, separating our source code (/src directory), from our executable code (/app) and our testing code (/test):

.
├── app
│   └── Main.hs
├── makefile
├── src
│   ├── MyFunction.hs
│   └── TryStrings.hs
└── test
    └── Spec.hs

Here's the source code for our three primary files:

-- src/MyStrings.hs
module MyStrings where

greeting :: String
greeting = "Hello"

-- src/MyFunction.hs
module MyFunction where

modifyString :: String -> String
modifyString x = base <> " " <> base
  where
    base = tail x <> [head x]

-- app/Main.hs
module Main where

import MyStrings (greeting)
import MyFunction (modifyString)

main :: IO ()
main = putStrLn (modifyString greeting)

And here's what our simple "Spec" test looks like. It doesn't use a testing library, it just prints different messages depending on whether or not we get the expected output from modifyString.

-- test/Spec.hs
module Main where

import MyFunction (modifyString)

main :: IO ()
main = do
  test "abcd" "bcda bcda"
  test "Hello" "elloH elloH"

test :: String -> String -> IO ()
test input expected = do
  let actual = modifyString input
  putStr $ "Testing case: " <> input <> ": "
  if expected /= actual
    then putStrLn $ "Incorrect result! Expected: " <> expected <> " Actual: " <> actual
    else putStrLn "Correct!"

The files are laid out the way we would expect for a basic Haskell application. We have our "library" code in the src directory. We have a single "executable" in the app directory. And we have a single "test suite" in the test directory. Instead of having a Project.cabal file at the root of our project, we'll have our makefile. (At the end, we'll actually compare our Makefile with an equivalent .cabal file).

But what does the Makefile look like? Well it would be overwhelming to construct it all at once. Let's begin slowly by treating our executable as a single file application.

Running a Single File Application

So for now, let's adjust Main.hs so it's an independent file without any dependencies on our library modules:

-- app/Main.hs
module Main where

main :: IO ()
main = putStrLn "Hello"

The simplest way to run this file is runghc. So let's create our first makefile rule that will do this. A rule has a name, a set of prerequisites, and then a set of commands to run. We'll call our rule run, and have it use runghc on app/Main.hs. We'll also include the app/Main.hs as a prerequisite, since the rule will run differently if that file changes.

run: app/Main.hs
  runghc app/Main.hs

And now we can run this run using make run, and it will work!

$ make run
runghc app/Main.hs
Hello

Notice that it prints the command we're running. We can change this by using the @ symbol in front of the command in our Makefile. We'll do this with almost all our commands:

run: app/Main.hs
  @runghc app/Main.hs

And it now runs our application without printing the command.

Using runghc is convenient, but if we want to use dependencies from different directories, we'll eventually need to use multiple stages of compilation. So we'll want to create two distinct rules. One that generates the executable using ghc, and another that actually runs the generated executable.

So let's create a generate_run rule that will produce the build artifacts, and then run will use them.

generate_run: app/Main.hs
  @ghc app/Main.hs

run: generate_run
  @./app/Main

Notice that run can depend on generate_run as a prerequisite, instead of the source file now. This also generates three build artifacts directly in our app directory: the interface file Main.hi, the object file Main.o, and the executable Main.

It's bad practice to mix build artifacts with source files, so let's use GHC's arguments (-hidir, -odir and -o) to store these artifacts in their own directory called bin.

generate_run: app/Main.hs
  @mkdir -p ./bin
  @ghc app/Main.hs -hidir ./bin -odir ./bin -o ./bin/hello

run: generate_run
  @./bin/hello

We can then add a third rule to "clean" our project. This would remove all binary files so that we can do a fresh recompilation if we want.

clean:
  rm -rf ./bin

For one final flourish in this section, we can use some variables. We can make one for the GHC compiler, referencing its absolute path instead of a symlink. This would make it easy to switch out the version if we wanted. We'll also add a variable for our bin directory and the hello executable, since these are used multiple times.

# Could easily switch versions if desired
# e.g. GHC = ~/.ghcup/ghc/9.4.4/bin/ghc
GHC = ~/.ghcup/ghc/9.2.5/bin/ghc
BIN = ./bin
EXE = ${BIN}/hello

generate_run: app/Main.hs
  @mkdir -p ${BIN}
  @${GHC} app/Main.hs -hidir ${BIN} -odir ${BIN} -o ${EXE}

run: generate_run
  @${EXE}

clean:
  rm -rf ./bin

And all this still works as expected!

$ generate_run
[1 of 1] Compiling Main (app/Main.hs, bin/Main.o)
Linking ./bin/hello
$ make run
Hello
$ make clean
rm -rf ./bin

So we have some basic rules for our executable. But remember our goal is to depend on a library. So let's add a new rule to generate the library objects.

Generating a Library

For this step, we would like to compile src/MyStrings.hs and src/MyFunction.hs. Each of these will generate an interface file (.hi) and an object file (.o). We want to place these artifacts in a specific library directory within our bin folder.

We'll do this by means of a new rule, library, which will use our two source files as its prerequisites. It will start by creating the library artifacts directory:

LIB_DIR = ${BIN}/lib

library: src/MyStrings.hs src/MyFunction.hs
  @mkdir -p ${LIB_DIR}
  ...

But now the only thing we have to do is use GHC on both of our source files, using LIB_DIR as the destination point.

LIB_DIR = ${BIN}/lib

library: src/MyStrings.hs src/MyFunction.hs
  @mkdir -p ${LIB_DIR}
  @ghc src/MyStrings.hs src/MyFunction.hs -hidir ${LIB_DIR} -odir ${LIB_DIR}

Now when we run the target, we'll see that it produces the desired files:

$ make library
$ ls ./bin/lib
MyFunction.hi MyFunction.o MyStrings.hi MyStrings.o

Right now though, if we added a new source file, we'd have to modify the rule in two places. We can fix this by adding a variable that uses wildcard to match all our source files in the directory (src/*.hs).

LIB_DIR = ${BIN}/lib
SRCS = $(wildcard src/*.hs)

library: ${SRCS}
  @mkdir -p ${LIB_DIR}
  @${GHC} ${SRCS} -hidir ${LIB_DIR} -odir ${LIB_DIR}

While we're learning about wildcard, let's make another variable to capture all the produced object files. We'll use this in the next section.

LIB_OBJS = $(wildcard ${LIB_DIR}/*.o)

So great! We're producing our library artifacts. How do we use them?

Linking the Library

In this section, we'll link our library code with our executable. We'll begin by assuming our Main file has gone back to its original form with imports, instead of the simplified form:

-- app/Main.hs
module Main where

import MyStrings (greeting)
import MyFunction (modifyString)

main :: IO ()
main = putStrLn (modifyString greeting)

We when try to generate_run, compilation fails because it cannot find the modules we're trying to import:

$ make generate_run
...
Could not find module 'MyStrings'
...
Could not find module 'MyFunction'

As we went over in the previous article, the general approach to compiling the Main module with its dependencies has two steps:

1. Compile with the -c option (to stop before the linking stage) using -i to point to a directory containing the interface files.

2. Compile the generated Main.o object file together with the library .o files to produce the executable.

So we'll be modifying our generate_main rule with some extra steps. First of course, it must now depend on the library rule. Then our first new command will be to copy the .hi files from the lib directory into the top-level bin directory.

generate_run: app/Main.hs library
  @mkdir -p ./bin
  @cp ${LIB_DIR}/*.hi ${BIN}
  ...

We could have avoided this step by generating the library artifacts in bin directly. I wanted to have a separate location for all of them though. And while there may be some way to direct the next command to find the headers in the lib directory, none of the obvious ways worked for me.

Regardless, our next step will be to modify the ghc call in this rule to use the -c and -i arguments. The rest stays the same:

generate_run: app/Main.hs library
  @mkdir -p ./bin
  @cp ${LIB_DIR}/*.hi ${BIN}
  @${GHC} -i${BIN} -c app/Main.hs -hidir ${BIN} -odir ${BIN}
  ...

Finally, we invoke our final ghc call, linking the .o files together. At the command line, this would look like:

$ ghc ./bin/Main.o ./bin/lib/MyStrings.o ./bin/lib/MyFunction.o -o ./bin/hello

Recalling our LIB_OBJS variable from up above, we can fill in the rule in our Makefile like so:

LIB_OBJS = $(wildcard ${LIB_DIR}/*.o)

generate_run: app/Main.hs library
  @mkdir -p ./bin
  @cp ${LIB_DIR}/*.hi ${BIN}
  @${GHC} -i${BIN} -c app/Main.hs -hidir ${BIN} -odir ${BIN}
  @${GHC} ${BIN}/Main.o ${LIB_OBJS} -o ${EXE}

And now our program will work as expected! We can clean it and jump straight to the make run rule, since this will run its prerequisites make library and make generate_run automatically.

$ make clean
rm -rf ./bin
$ make run
[1 of 2] Compiling MyFunction (src/MyFunction.hs, bin/lib/MyFunction.o)
[2 of 2] Compiling MyStrings (src/MyStrings.hs, bin/lib/MyStrings.o)
elloH elloH

So we've covered the library and an executable, but most Haskell projects have at least one test suite. So how would we implement that?

Adding a Test Suite

Well, a test suite is basically just a special executable. So we'll make another pair of rules, generate_test and test, that will mimic generate_run and run. Very little changes, except that we'll make another special directory within bin for our test artifacts.

TEST_DIR = ${BIN}/test
TEST_EXE = ${TEST_DIR}/run_test

generate_test: test/Spec.hs library
  @mkdir -p ${TEST_DIR}
  @cp ${LIB_DIR}/*.hi ${TEST_DIR}
  @${GHC} -i${TEST_DIR} -c test/Spec.hs -hidir ${TEST_DIR} -odir ${TEST_DIR}
  @${GHC} ${TEST_DIR}/Main.o ${LIB_OBJS} -o ${TEST_EXE}

test: generate_test
  @${TEST_EXE}

Of note here is that at the final step, we're still using Main.o instead of Spec.o. Since it's an executable module, it also compiles as Main.

But we can then use this to run our tests!

$ make clean
$ make test
[1 of 2] Compiling MyFunction (src/MyFunction.hs, bin/lib/MyFunction.o)
[2 of 2] Compiling MyStrings (src/MyStrings.hs, bin/lib/MyStrings.o)
Testing case: abcd: Correct!
Testing case: Hello: Correct!

So now we have all the different components we'd expect in a normal Haskell project. So it's interesting to consider how our makefile definition would compare against an equivalent .cabal file for this project.

Comparing to a Cabal File

Suppose we want to call our project HaskellMake and store its configuration in HaskellMake.cabal. We'd start our Cabal file with four metadata lines:

cabal-version: 1.12
name: HaskellMake
version: 0.1.0.0
build-type: Simple

Now our library would expose its two modules, using the src directory as its root. The only "dependency" is the Haskell base packages. Finally, default-language is a required field.

library
  exposed-modules:
      MyStrings
    , MyFunction
  hs-source-dirs:
      src
  build-depends:
      base
  default-language: Haskell2010

The executable would similarly describe where the files are located and state a base dependency as well as a dependency on the library itself.

executable hello
  main-is: Main.hs
  hs-source-dirs:
      app
  build-depends:
      base
    , HaskellMake
  default-language: Haskell2010

Finally, our test suite would look very similar to the executable, just with a different directory and filename.

test-suite make-test
  type: exitcode-stdio-1.0
  main-is: Spec.hs
  hs-source-dirs:
      test
  build-depends:
      base
    , HaskellMake
  default-language: Haskell2010

And, if we add a bit more boilerplate, we could actually then compile our code with Stack! First we need a stack.yaml specifying the resolver and the package location:

# stack.yaml
resolver: lts-20.12
packages:
  - .

Then we need Setup.hs:

-- Setup.hs

import Distribution.Simple
main = defaultMain

And now we could actually run our code!

$ stack build
$ stack exec hello
elloH elloH
$ stack test
Testing case: abcd: Correct!
Testing case: Hello: Correct!

Now observant viewers will note that we don't use any Hackage dependencies in our code - only base, which GHC always knows how to find. It would require a lot of work for us to replicate dependency management. We could download a .zip file with curl easily enough, but tracking the whole dependency tree would be extremely difficult.

And indeed, many engineers have spent a lot of time getting this process to work well with Stack and Cabal! So while it would be a useful exercise to try to do this manually with a simple dependency, I'll leave that for a future article.

When comparing the two file definitions, Undoubtedly, the .cabal definition is more concise and human readable, but it hides a lot of implementation details. Most of the time, this is a good thing! This is exactly what we expect from tools in general; they should allow us to work more quickly without having to worry about details.

But there are times where we might, on our time, want to occasionally try out a more adventurous path like we've done in this article that avoids relying too much on modern tooling. So why was this article a "useful exercise"™?

What's the Point?

So obviously, there's no chance this Makefile approach is suddenly going to supplant Cabal and Stack for building Haskell projects. Stack and Cabal are "better" for Haskell precisely because they account for the intricacies of Haskell development. In fact, by their design, GHC and Cabal both already incorporate some key ideas and features from GNU Make, especially with avoiding re-work through dependency calculation.

But there's a lot you can learn by trying this kind of exercise.

First of all, we learned about GNU Make. This tool can be very useful if you're constructing a program that combines components from different languages and systems. You could even build your Haskell code with Stack, but combine it with something else in a makefile.

A case and point for this is my recent work with Haskell and AWS. The commands for creating a docker image, authenticating to AWS and deploying it are lengthy and difficult to remember. A makefile can, at the very least, serve as a rudimentary aliasing tool. You could run make deploy and have it automatically rebuild your changes into a Docker image and deploy that to your server.

But beyond this, it's important to take time to deliberately understand how our tools work. Stack and Cabal are great tools. But if they seem like black magic to you, then it can be a good idea to spend some time understanding what is happening at an internal level - like how GHC is being used under the hood to create our build artifacts.

Most of the fun in programming comes in effectively applying tools to create useful programs quickly. But if you ever want to make good tools in the future, you have to understand what's happening at a deeper level! At least a couple times a year, you should strive to go one level deeper in your understanding of your programming stack.

For me this time, it was understanding just a little more about GHC. Next time I might dive into dependency management, or a different topic like the internal workings of Haskell data structures. These kinds of topics might not seem immediately applicable in your day job, but you'll be surprised at the times when deeper knowledge will pay dividends for you.

Getting Better at Haskell

But enough philosophizing. If you're completely new to Haskell, going "one level deeper" might simply mean the practical ability to use these tools at a basic level. If your knowledge is more intermediate, you might want to explore ways to improve your development process. These thoughts can lead to questions like:

1. What's the best way to set up my Haskell toolchain in 2023?

2. How do I get more efficient and effective as a Haskell programmer?

You can answer these questions by signing up for my new free course Setup.hs! This will teach how to install your Haskell toolchain with GHCup and get you started running and testing your code.

Best of all, it will teach you how to use the Haskell Language Server to get code hints in your editor, which can massively increase your rate of progress. You can read more about the course in this blog post.

If you subscribe to our monthly newsletter, you'll also get an extra bonus - a 20% discount on any of our paid courses. This offer is good for two more weeks (until May 1) so don't miss out!

Read More
James Bowen James Bowen

How to Make ChatGPT Go Around in Circles (with GHC and Haskell)

As part of my research for the recently released (and free!) Setup.hs course, I wanted to explore the different kinds of compilation commands you can run with GHC outside the context of a build system.

I wanted to know…

Can I use GHC to compile a Haskell module without its dependent source files?

The answer, obviously, should be yes. When you use Stack or Cabal to get dependencies from Hackage, you aren't downloading and recompiling all the source files for those libraries.

And I eventually managed to do it. It doesn't seem hard once you know the commands already:

$ mkdir bin
$ ghc src/MyStrings.hs src/MyFunction.hs -hidir ./bin -odir ./bin
$ ghc -c app/Main.hs -i./bin -hidir ./bin -odir ./bin
$ ghc bin/Main.o ./bin/MyStrings.o ./bin/MyFunction.o -o ./bin/hello
$ ./bin/hello
...

But, being unfamiliar with the inner workings of GHC, I struggled for a while to find this exact combination of commands, especially with their arguments.

So, like I did last week, I turned to the latest tool in the developer's toolkit: ChatGPT. But once again, everyone's new favorite pair programmer had some struggles of its own on the topic! So let's start by defining exactly the problem we're trying to solve.

The Setup

Let's start with a quick look at our initial file tree.

├── app
│   └── Main.hs
└── src
    ├── MyFunction.hs
    └── MyStrings.hs

This is meant to look the way I would organize my code in a Stack project. We have two "library" modules in the src directory, and one executable module in the app directory that will depend on the library modules. These files are all very simple:

-- src/MyStrings.hs
module MyStrings where

greeting :: String
greeting = "Hello"

-- src/MyFunction.hs
module MyFunction where

modifyString :: String -> String
modifyString x = base <> " " <> base
  where
    base = tail x <> [head x]

-- app/Main.hs
module Main where

import MyStrings (greeting)
import MyFunction (modifyString)

main :: IO ()
main = putStrLn (modifyString greeting)

Our goal is to compile and run the executable with two constraints:

1. Use only GHC (no Stack or Cabal involved)

2. Compile the library separately, so that the Main module could be compiled using only the library's build artifacts, and not the source files.

Trying to Compile

Now, there are two easy ways to compile this code if we're willing to violate our constraints (particularly the second one). If all three files are in the same directory, GHC can immediately find the modules we're importing, so we can just call ghc Main.hs.

└── src
    ├── Main.hs
    ├── MyFunction.hs
    └── MyStrings.hs

...

$ cd src
$ ghc Main.hs
$ ./Main
elloH elloH

If we keep the original file structure, it's still pretty easy if we're willing to violate constraint 2 by using the library source files in the ghc command:

$ ghc app/Main.hs src/MyStrings.hs src/MyFunction.hs
$ ./Main
elloH elloH

However, with my extra constraint of not using the library source files directly, things got a little more difficult. I originally thought "Hey, let's just include the build artifacts in the GHC call." Each library module generates an interface file (.hi) and an object file (.o). So let's try including those in the GHC call:

$ ghc src/MyStrings.hs src/MyFunction.hs
$ ls -l src/
MyFunction.hi MyFunction.hs MyFunction.o MyStrings.hi MyStrings.hs MyStrings.o
$ ghc app/Main.hs src/MyFunction.hi src/MyFunction.o src/MyStrings.hi src/MyStrings.o
...
Could not find module 'MyStrings'
...
Could not find module 'MyFunction'

As we can see, it doesn't work! Even though we're including the build artifacts as direct arguments of ghc, this doesn't help our main module to find them.

So, feeling a little too lazy to dig deep through GHC documentation, and feeling like this problem might have too many arbitrary constraints to phrase in a Google query, I decided to try our aforementioned new favorite tool.

Enter: ChatGPT

The First Session

(Conversation has been abbreviated for relevance)

Me

How do I link existing .o files with a new source module using the 'ghc' command?

Chat GPT

To link existing object (.o) files with a new source module using the ghc command, you need to include the object files in the list of input files passed to the ghc command.

Assuming you have an existing .o file called example.o and a new source module called MyModule.hs, you can link them together into an executable program using the following command:

$ ghc -o MyProgram MyModule.hs example.o

This command tells ghc to compile and link the MyModule.hs source file with the example.o object file and create an executable program called MyProgram. The -o flag specifies the output file name.

Me

Is it possible to use a ghc command to link a Haskell object file (.o) with a Haskell source file that imports the other module?

Chat GPT

Yes, it is possible to use the ghc command to link a Haskell object file (.o) with a Haskell source file that imports the other module. Here's an example:

Suppose you have two files, MyModule.hs and OtherModule.o, and MyModule.hs imports a module that is defined in OtherModule.o. To link the two files into an executable program, you can use the following command:

$ ghc -o MyProgram MyModule.hs OtherModule.o

This command tells ghc to compile and link the MyModule.hs source file with the OtherModule.o object file, and create an executable program called MyProgram.

Me

Hmmm I don't think this works. I created the following file OtherModule.hs:

module OtherModule where

greeting :: String
greeting = "Hello"

And I'm trying to include it in another file:

module Main where

import OtherModule (greeting)

main :: IO ()
main = putStrLn greeting

I started by compiling OtherModule:

$ ghc src/OtherModule.hs

Then I moved OtherModule.o to the app directory (with MyModule.hs), and tried running this command:

$ cd app
$ ghc -o MyProgram MyModule.hs OtherModule.o

I get the error:

"Could not find module 'OtherModule'"

Chat GPT

I apologize for the confusion in my previous answer. You are correct that simply linking the object file with the executable does not work in this case.

When GHC compiles a module, it generates an interface file (.hi) that contains information about the module's name, type signatures, and other details. The compiled object file (.o) contains the compiled machine code for the functions defined in the module.

When you try to import a module in a source file, GHC looks for the interface file (OtherModule.hi) to find information about the module's name and type signatures. If the interface file is not found, GHC will raise an error.

In your case, when you compile OtherModule.hs, GHC generates an interface file called OtherModule.hi. To compile MyModule.hs, GHC needs to be able to find OtherModule.hi in order to resolve the import of OtherModule. Simply linking the OtherModule.o object file is not enough.

To make this work, you need to compile OtherModule.hs and MyModule.hs together, like this:

ghc -o MyProgram MyModule.hs src/OtherModule.hs

This command tells GHC to compile MyModule.hs and OtherModule.hs together, and create an executable program called MyProgram.

I hope this clears up the confusion. Again, I apologize for the confusion in my previous answer.

Me

Ah, so I need to include both source files, and it can't be done by linking an object file with another source file?

Chat GPT

Yes, that's correct. When you import a module in a source file, GHC needs to be able to find the interface file (.hi) for that module in order to resolve the import. If the interface file is not found, GHC will raise an error.

When you compile a source file with GHC, it generates both an object file (.o) and an interface file (.hi). The object file contains the compiled machine code for the functions defined in the module, while the interface file contains information about the module's name, type signatures, and other details.

When you compile a source file that imports another module, GHC needs to be able to find the interface file for that module in order to compile the importing module. If the interface file is not found, GHC will raise an error.

So in your case, you need to compile both MyModule.hs and OtherModule.hs together so that GHC can generate the necessary interface files and link the resulting object files together to produce the executable program.

It is not possible to directly link an object file with a source file that imports it, as GHC needs the interface file to be present in order to compile the importing module.

Analysis

Suspiciously, ChatGPT suggests a less detailed command than the previous one that failed for me!

$ ghc -o MyProgram MyModule.hs OtherModule.o

It includes only the object file, instead of the object and interface file. And, of course this fails, like I showed above.

Now there's an interesting statement that will come into play later, though I didn't realize it!

Yes, that's correct. When you import a module in a source file, GHC needs to be able to find the interface file (.hi) for that module in order to resolve the import. If the interface file is not found, GHC will raise an error.

I assumed that to "find" the interface file meant providing it on the command line (as in my original command). So with that idea having failed, I fixated on one part of the final statement:

It is not possible to directly link an object file with a source file that imports it...

And concluded that getting the right information from ChatGPT might be a lost cause. It seemed like it was sure this was possible, and then just as sure that it was impossible. But details matter, as we'll see!

Working Around It

So feeling slightly defeated, I decided to poke around the areas of the code where Cabal actually saves packages to see if there was a pattern I could follow. And I managed to cobble together a convoluted workaround.

What I observed was that GHC could use any module from a library in the Cabal package database. On my system's setup, this could be found in the .ghcup cache:

$ cd ~/.ghcup/ghc/9.2.5/lib/ghc-9.2.5/
$ ls
array-0.5.4.0/
base-4.16.4.0/
...
package.conf.d/
$ ls package.conf.d
array-0.5.4.0.conf
base-4.16.4.0.conf
...

And indeed, the package directories did not have source files, only various forms of compiled object and interface files. And each package had a corresponding .conf file in the package.conf.d directory.

So I could...

  1. Make my own directory in the package database (MyPackage-0.1.0.0)
  2. Place the .hi and .o files in there
  3. Create a phony MyPackage-0.1.0.0.conf and put it in package.conf.d. (Base it on an existing .conf file but make some substitutions)

And this...technically...works. But it makes me feel a little bit like:

But manually manipulating a package database wasn't the solution I had in mind though. Surely there's something easier, right?

The Second Session

After a week of sitting on this, I decided to come back to ChatGPT and see if it had any other ideas. I decided to frame my question in a different way, stating a lot more details up front. Lo and behold, it had a different answer this time!

Me

Suppose I have a Haskell module called MyStrings:

module MyStrings where

greeting :: String
greeting = "Hello"

I can compile this module:

$ ghc MyStrings.hs

This generates an interface file, MyStrings.hi, and an object file, MyStrings.o.

Now suppose I have an executable file, Main.hs, that imports MyStrings:

module Main where

import MyStrings (greeting)

main :: IO ()
main = putStrLn greeting

Can I use ghc to compile this Main file by using the .hi and .o files, without including the imported source file, MyStrings.hs in the command?

Chat

Yes, you can use the .hi and .o files generated by compiling MyStrings.hs to compile Main.hs without including MyStrings.hs in the command. To do so, you can use the -c option to tell GHC to compile Main.hs to an object file, and then use the -o option to link the object files together into an executable:

$ ghc -c Main.hs
$ ghc Main.o MyStrings.o -o myprogram

The first command compiles Main.hs to an object file Main.o, without attempting to link it. The second command links Main.o with MyStrings.o and creates an executable called myprogram. The .hi files are not needed for linking, so you don't need to include them in the command.

Note that in order for this to work, the interface file MyStrings.hi must be present in the same directory as Main.hs when you compile it. If you move MyStrings.hi to a different directory, you'll need to use the -i option to tell GHC where to find it:

$ ghc -i/path/to/MyStrings -c Main.hs
$ ghc Main.o /path/to/MyStrings/MyStrings.o -o myprogram

Here, -i/path/to/MyStrings tells GHC to look for MyStrings.hi in the directory /path/to/MyStrings.

Analysis

So there are two key ingredients that weren't mentioned at all in our first pass: the -c option and the -i option.

Using -c performs compilation without linking. With this option, GHC is able to produce an object file for our Main module without needing its dependent objects. So .o files are unnecessary in this stage.

We still need the .hi files though. But instead of providing them on the command line, we use the -i argument to supply them. It's an odd argument, because we put the path right after the i, without any spacing.

After we're done with the first phase, then we can link all our object files together.

Solving It

And sure enough, this approach works!

$ ghc src/MyStrings.hs src/MyFunction.hs
$ ghc -c app/Main.hs -i./src
$ ghc app/Main.o ./src/MyStrings.o ./src/MyFunction.o -o hello
$ ./hello
elloH elloH

And if we want to be a little cleaner about putting our artifacts in a single location, we can use the -hidir and -odir arguments for storing everything in a bin directory.

$ mkdir bin
$ ghc src/MyStrings.hs src/MyFunction.hs -hidir ./bin -odir ./bin
$ ghc -c app/Main.hs -i./bin -hidir ./bin -odir ./bin
$ ghc bin/Main.o ./bin/MyStrings.o ./bin/MyFunction.o -o ./bin/hello
$ ./bin/hello
elloH elloH

And we're done! Our program is compiling as we wanted it to, without our "Main" compilation command directly using the library source files.

Conclusion

So with that fun little adventure concluded, what can we learn from this? Well first of all, prompts matter a great deal when you're using a Chatbot. The more detailed your prompt, and the more you spell out your assumptions, the more likely you'll get the answer you're looking for. My second prompt was waaay more detailed than my first prompt, and the solution was much better as a result.

But a more pertinent lesson for Haskellers might be that using GHC by itself can be a big pain. So if you're a beginner, you might be asking:

What's the normal way to build Haskell Code?

You can learn all about building and running your Haskell code in our new free course, Setup.hs. This course will teach you the easy steps to set up your Haskell toolchain, and show you how to build and run your code using Stack, Haskell's most popular build system. You'll even learn how to get Haskell integrations in several popular code editors so you can learn from your mistakes much more quickly. Learn more about it on the course page.

And if you subscribe to our monthly newsletter, you'll get a code for 20% off any of our paid courses until May 1st! So don't miss out on that offer!

Read More
James Bowen James Bowen

How to Stump ChatGPT in 1 Easy Step (Hint: Ask a Windows Question)

While doing some research for Setup.hs, my new free course, I confronted a fact of Haskell development with Stack.

The commands for running individual test suites are cumbersome.

For example, if you have a parse-tests suite in your Project.cabal file, you can run it (without running other suites) like so:

$ stack build Project:test:parse-tests

The structure isn't hard once you're used to it, but remembering the exact names of your test suites isn't always easy on larger projects. This leads to the question:

How can I create aliases for 'stack build' commands?

On Unix-based systems, it's quite easy. Just slap something like this in your .bashrc (or equivalent) and you're good to go:

alias ppt="stack build Project:test:parse-tests"

...

$ source ~/.bashrc
$ ppt
(Tests run)

Windows though is, as always, a special beast. And much as I generally prefer to use Windows Subsystem for Linux, I wanted to help any users who might be working with a pure Windows setup. This article will focus on:

How do I enable 'stack' aliases in Windows PowerShell?

Now, when I lay out the code I put in my PowerShell profile, it looks a little dense but fairly innocuous:

Function ProjectParseTest { Invoke-Expression "stack build Project:test:parse-tests" }
New-Alias -Name qpt -Value ProjectParseTest

But discovering the exact formula for getting all this to work was quite difficult. And as we'll see, in fact, even ChatGPT, the new kid on the block, struggles with this question. After a brief while struggling through Windows documentation, I turned to the chat bot. Let's review the first question:

ChatGPT Question 1

Me

How can I make command aliases in Windows 11 Powershell?

Chat GPT

You can create command aliases in Windows 11 PowerShell by using the New-Alias cmdlet. Here's an example that creates an alias ls for the Get-ChildItem cmdlet:

New-Alias -Name ls -Value Get-ChildItem

This will create an alias named ls that can be used just like the Get-ChildItem cmdlet. For example:

ls .

This will list the contents of the current directory, just as if you had typed Get-ChildItem ..

To persist your alias across sessions, you need to add it to your PowerShell profile. To do that, first determine the path to your profile by running the following command:

$ $profile

If the profile does not exist yet, you can create it by running the following command:

New-Item -Type File -Path $profile -Force

Then, open the profile in your favorite text editor and add the New-Alias command. Save the file and restart your PowerShell session. Your alias will be available every time you start PowerShell.

Analysis

Now, everything Chat has told me is technically true. The command it gives successfully creates an alias. Notably though, if you want to try this yourself, you should use a different alias, as Powershell already includes the alias ls in most recent Windows version (previously you would need to use the alias dir to get that effect).

Additionally, I found that the PROFILE variable was indeed set, but the file did not exist. In fact, the directory it was supposed to be in (Documents/WindowsPowerShell) did not exist either! So you'll have to make that before creating the file. But placing an alias command in there will persist the alias across sessions.

However, notice that this doesn't quite answer our question, because the subject of the alias was an established Windows expression (Get-ChildItem, also know as a cmdlet), rather than a terminal command we would write.

Let's suppose we just want to alias sb=stack build. It does not work to simply use a string for the command:

There are some clues to our eventual solution in the error message though. It tells us we need a cmdlet, a function, a script file, or an operable program.

Now stack by itself is an operable program. So we could actually do the following:

PS > New-Alias -Name st -Value "stack"
PS > st
(stack usage output)

But we want to provide arguments as part of the alias. So let's see if ChatGPT knows how. We'll see that its answers are not quite right.

ChatGPT Question 2

Me

How do I make the alias run a command, like "stack build"

Chat GPT

You can create an alias that runs a command by using the Invoke-Expression cmdlet to evaluate a string as a PowerShell expression. Here's an example that creates an alias sb for the stack build command:

New-Alias -Name sb -Value { Invoke-Expression "stack build" }

This will create an alias named sb that can be used to run the stack build command. For example:

sb

This will run the stack build command, just as if you had typed it into the PowerShell prompt.

As with the previous example, you can add this alias to your PowerShell profile to persist it across sessions.

Me

I get an error "Cannot evaluate parameter 'Value' because its argument is specified as a script block and there is no input"

How do I specify input in this alias?

Chat GPT

To specify input in a script block alias, you can use the Param keyword to declare input parameters and then pass the input to the underlying command. Here's an example that creates an alias sb for the stack build command, allowing you to specify the target to build as input:

New-Alias -Name sb -Value {
    Param ($target)
    Invoke-Expression "stack build $target"
}

This will create an alias named sb that can be used to run the stack build command with a specified target. For example:

sb my-target

This will run the stack build my-target command, just as if you had typed it into the PowerShell prompt.

As with the previous examples, you can add this alias to your PowerShell profile to persist it across sessions.

Analysis

Both of the proposed approaches fail! I commented on the issue explicitly after the first response, and then got the same error running the second approach ("Cannot evaluate parameter 'Value' because its argument is specified as a script block..."). So I gave up on my interrogation and started searching for documentation again.

Solving the Problem

I don't remember exactly where I found the solution. I think it involved cobbling together points from multiple Stack Overflow posts. Windows documentation was not very helpful. Let's recall the error message from earlier:

The term 'stack build' is not recognized as the name of a cmdlet, function, script file, or operable program.

We can't make an official Windows PS cmdlet out of our program, nor can we make an operable program with the arguments we want. We could make a script file, but running scripts from PowerShell is surprisingly difficult (there are some extra steps with permissions). The answer is that we can make a function that our alias will refer to.

This function will, incidentally, use the Invoke-Expression idea ChatGPT recommended as well, just not directly as the alias value!

The following code will go in our $PROFILE. First, we make a function that invokes our expression. We can give this function an arbitrary name, but I used a capitalized name (ProjectParseTest) to distinguish from any potential aliases.

Function ProjectParseTest { Invoke-Expression "stack build Project:test:parse-tests" }

Now we can use this function as the object of a New-Alias call! So we use the command ChatGPT suggested, just substituting our function for the -Value instead of providing a raw Invoke-Expression command.

Function ProjectParseTest { Invoke-Expression "stack build Project:test:parse-tests" }
New-Alias -Name ppt -Value ProjectParseTest

This alias succeeds now, and by putting this in my PowerShell profile, I can persist the alias across sessions!

$ ppt
(Test runs)

Haskell on Windows

Now aliases are just one small piece of the Haskell puzzle. So if you're trying to get started with Haskell, but don't have a Mac, and aren't familiar with Linux, you might want to know:

How do I set up my Haskell toolchain on Windows?

My new free course Setup.hs goes over all the basics of setting up your Haskell toolchain, including how to get started with code hints in three of the most popular editors out there. Plus, every lecture includes a walkthrough video for Windows* so you can learn what kinds of odd quirks you might come across! You can read more about the course in this article.

Plus, if you subscribe to our monthly newsletter, you'll also get a 20% discount code for all our paid courses that is good until May 1! So don't miss out on your chance to learn about Haskell!

*Separate walkthrough videos for MacOS and Linux are also included.

Read More
James Bowen James Bowen

New Free Course: Setup.hs!

You can read all the Haskell articles you want, but unless you write the code for yourself, you'll never get anywhere! But there are so many different tools and ideas floating around out there, so how are you supposed to know what to do? How do you even get started writing a Haskell project? And how can you make your development process as efficient as possible?

The first course I ever made, Your First Haskell Project, was designed to help beginners answer these questions. But over the years, it's become a bit dated, and I thought it would be good to sunset that course and replace it with a new alternative, Setup.hs. Like its predecessor, Setup.hs is totally free!

Setup.hs is a short course designed for many levels of Haskellers! Newcomers will learn all the basics of building and running your code. More experienced Haskellers will get some new tools for managing all your Haskell-related programs, as well as some tips for integrating Haskell features into your code editor!

Here's what you'll learn in the course:

  1. How to install and manage all of the core Haskell tools (GHC, Cabal, Stack)
  2. What components you need in your Haskell project and how you can build and run them all
  3. How to get your editor to use advanced features, like flagging compilation errors and providing autocomplete suggestions.

We'll do all of this in a hands-on way with detailed, step-by-step exercises!

Improvements

Setup.hs makes a few notable updates and improvements compared to Your First Haskell Project.

First, it uses GHCup to install all the necessary tools instead of the now-deprecated Haskell Platform. GHCup allows for seamless switching between the different versions of all our tools, which can be very useful when you have many projects on your system!

Second, it goes into more details about pretty much every topic, whether that's project organization, Stack snapshots, and extra dependencies.

Third and probably most importantly, Setup.hs will teach you how to get Haskell code hints in three of the most common code editors (VS Code, Vim & Emacs) using Haskell Language Server. Even if these lectures don't cover the particular editor you use, they'll give you a great idea of what you need to search for to learn how. I can't overstate how useful these kinds of integrations are. They'll massively speed up your development and, if you're a beginner, they'll rapidly accelerate your learning.

If all this sounds super interesting to you, head over to the course page and sign up!

Read More
James Bowen James Bowen

Series Spotlight: Monads and Functional Structures!

Every so often I like to spotlight some of the permanent series you can find on the skills page, which contains over a dozen tutorial series for you to follow! This week I’m highlighting my series on Monads and Functional Structures!

Monads are widely understood to be one of the trickier concepts for newcomers to Haskell, since they are very important, very abstract and conceptual, and do not really appear in most mainstream languages. There are a lot of monad tutorials out there on the internet, most of which are either too shallow, or too deep.

This series will help you understand the concept from the group up, starting with simpler abstract structures like functors and applicative functors.

Here’s an outline of the series:

  1. Functors

  2. Applicative Functors

  3. Monad Basics

  4. Reader and Writer Monads

  5. State Monad

  6. Monad Transformers

  7. Monad Laws

For a more in depth look at monads and the effects they help us implement in our code, you can check out our two courses, Making Sense of Monads and Effectful Haskell!

Read More
James Bowen James Bowen

GHC 9.6.1 Includes Javascript Backend

Some exciting news this week, as the release of GHC 9.6.1 includes the merger of a couple of different web-based backends - one for Web Assembly and one for Javascript. These features move Haskell in the direction of being a first-class web development language!

From the release notes:

The WebAssembly backend has been merged. This allows GHC to be built as a cross-compiler that targets wasm32-wasi and compiles Haskell code to self-contained WebAssembly modules that can be executed on a variety of different runtimes.

The JavaScript backend has been merged. GHC is now able to be built as a cross-compiler targeting the JavaScript platform.

This is a particularly exciting direction for me, since I’ve been exploring ways to use Haskell in web development for many years, but found a lot of the current approaches require a lot of onboarding work to really get going in a meaningful way. In my Practical Haskell course, I show how to do a basic integration of a Haskell Web Server and an Elm frontend. But I look forward to the day when I can re-do that section of the course entirely with Haskell!

Of course, merging the backends is just a first step - there’s a long way to go. A few caveats mentioned in the release notes as well:

There are a few caveats to be aware of [with the WebAssembly backend]:

To use the WebAssembly backend, one would need to follow the instructions on ghc-wasm-meta. The WebAssembly backend is not included in the GHC release bindists for the time being, nor is it supported by ghcup or stack yet.

The WebAssembly backend is still under active development. It’s presented in this GHC version as a technology preview, bugs and missing features are expected.

The [Javascript] backend should be considered a technology preview. As such it is not ready for use in production, is not distributed in the GHC release bindists and requires the user to manually build GHC as a cross-compiler. See the JavaScript backend wiki page on the GHC wiki for the current status, project roadmap, build instructions and demos.

Both of these backends are technology previews, meaning they’re only ready for the most adventurous Haskellers to start experimenting - a lot of setup work is still required. But it’s certainly an important step in the right direction! Since these are included with GHC 9.6.1, improvements are possible in further minor releases to GHC 9.6, rather than needing to wait for the next major release of GHC 9.8.

Read More
James Bowen James Bowen

Adding a Database to our AWS Server

In the last few articles on the blog, we've been exploring how to launch a Haskell web server using AWS. Here are the steps we've done so far:

  1. Create a local Docker Image
  2. Upload the Docker Image to ECR
  3. Deploy your Server using Elastic Beanstalk

In this final part of the series, we're going to learn to attach a database to our application.

There are a few gotchas to this. Setting up the database for first time use is a bit tricky, because we have to do some initial migrations. Then we need to use environment variables to ensure it works both locally and on the remote server. Let's get started.

A Basic Schema

Let's first assume we have a super basic schema using the Persistent library. (If you want some details on how this works, see our Real World Haskell series). We'll just have one type in our database, and users will use server endpoints to create or fetch these "text entries".

import           Database.Persist.Sql
import qualified Database.Persist.TH as PTH
import           Data.Text (Text)

PTH.share [PTH.mkPersist PTH.sqlSettings, PTH.mkMigrate "migrateAll"] [PTH.persistLowerCase|

  TextEntry sql=text_entries
    text Text

|]

An important product of this template Haskell sequence is the migrateAll function, which will run the proper commands to migrate a Postgres database to fit our schema by creating tables.

Whenever we first create a database, we have to make sure it's migrated. But before we even do that we have to make sure we've created a database for Postgres to use! Let's see the commands we need for this, and how to run them in Haskell.

Running Setup Commands

When you install Postgres on your machine, you'll have separate "databases" on your system to help you keep your data separate. Separating data allows each database to have its own "users" table without having any name conflicts, for one example. By default, Postgresql comes installed with a database called postgres.

But we don't want to use this to store our data. We want to create a separate database. We want to create this database if it's the first time we're running the server with a database. But otherwise, we just want to make sure its migrations are up to date.

Now, the command we would run to create this database is simple:

CREATE DATABASE quiz;

But we can first run this command to see if this database already exists:

SELECT datname FROM pg_database WHERE datname = 'quiz';

Both these commands assume we are connected to the postgres database.

Since these first two instructions are raw commands, we can run them using the Postgresql Simple library. Here's some code to do this.

createDBIfMissing :: String -> IO ()
createDBIfMissing connString = do
  connection <- connectPostgreSQL (pack connString)
  putStrLn "Checking/Creating 'quiz' Database"
  let checkQuery = "SELECT datname FROM pg_database WHERE datname = 'quiz';"
  (checkResult :: [Only String]) <- query_ connection checkQuery
  when (null checkResult) $ do
    putStrLn "Not found! Creating 'quiz' database!"
    let createQuery = "CREATE DATABASE quiz;"
    void $ execute_ connection createQuery

When we run checkQuery, it sees if the quiz database exists. If its result is null, then we'll run the additional command to create our database.

Once we have this function, we can write a wrapper that will create the database and then migrate it for our schema. Here's what this wrapper looks like:

migrateDb :: String -> String -> IO ()
migrateDb baseConnString quizConnString = do
  createDBIfMissing baseConnString
  putStrLn "Migrating Database"
  runPG quizConnString (runMigration migrateAll)

runPG :: String -> SqlPersistT (LoggingT IO) a -> IO a
runPG connString action = runStdoutLoggingT $
  withPostgresqlConn (pack connString) $ \backend ->
    runReaderT action backend

Notice migrateDb takes two different connection strings. One is for the base (postgres) database. The other is for our new quiz database. The creation queries run on the first, the migration runs on the second.

But how do we use these functions within our server?

Loading the URI

When we kick off our server, we have to load the database URI for our Postgres database. We'll use the format of {host}:{port}. If you're running it locally, this would just be localhost:5432. But when we deploy the server, we'll use a different URI. So let's write a function to load the host and port (separated by a colon) from an environment variable named DATABASE_URI.

loadDatabaseEnv :: IO (String, String)
loadDatabaseEnv = do
  dbEnv <- lookupEnv "DATABASE_URI"
  if isNothing dbEnv || ':' `notElem` fromJust dbEnv
    then return ("localhost", "5432")
    else return (span (/= ':') (fromJust dbEnv))

Now we need to construct the full Postgres connection string. This has the following general format:

host={host} port={port} dbname={dbname} user={user} password={password}

As a default value, you can often just have the username and password both be postgres (though of course this isn't recommended for a serious database). But let's make a function to substitute in the other values:

mkPostgresUri :: String -> String -> String -> String
mkPostgresUri host port dbname =
  "host='" <> host <> "' port=" <> tail port <> " dbname='" <> dbname <> "' user='postgres' password='postgres'"

Finally, we'll pull our different pieces together, get both URIs, and launch our server. In my example, I'm using a Servant server (more details on that in this article), and this will often require passing the database string as an argument.

server :: String -> Server QuizAPI
server connString = ...

runServer :: IO ()
runServer = do
  (host, port) <- loadDatabaseEnv
  let baseConnString = mkPostgresUri host port "postgres"
  let quizConnString = mkPostgresUri host port "quiz"
  migrateDb baseConnString quizConnString
  putStrLn "Running Server!"
  run 8080 (serve api (server quizConnString))

Having made all these modifications to our server, of course we have to rebuild and redeploy our docker image for that! We can create the new local image with:

docker build -t quiz-server .

Then for more detailed instructions on deploying it, refer to part 2 and part 3 of this series!

When you deploy the server, you'll find it's crashing of course, because we haven't configured the database! So let's get to the real meat of this article…setting up the database on AWS!

Create a Database with RDS

This process is not actually too challenging. The first thing we're going to do is use RDS (Relational Database Service) to set up our database. This is easily done from the AWS console.

  1. Select the RDS service
  2. Hit the orange "Create Database" button
  3. Go through the creation wizard, making sure to select "Postgres" and the "Free Tier" option (assuming you're just making a test app).

Most of the default options are fine, but as I mentioned above I specified postgres for the username and password of the database. I also unchecked the box for "Performance Insights" since this could lead to additional billing charges if you forget to turn it off.

Once you've created your database, you can then click the "databases" link on the sidebar, and select your new database. On that screen, you'll be able to see the "endpoint" and "port" of your database. These are the values you'll need for your environment!

Add Environment Variable

To connect your environment to the database, now you just have to add an environment variable! To do this, you have to access the configuration from the web portal:

  1. Go to the Elastic Beanstalk service
  2. Select "Environments" from the sidebar and then click the environment you have running your server.
  3. Click on the "Configuration" link on the side, and then select the "Edit" button in the "Software" section.
  4. At the very bottom, you'll find the "Environment Properties" section. Fill in DATABASE_URI as the key, and the {host}:{port} combination you got from your database in RDS.
  5. Click "Apply" to make the change!

By adding an environment variable, you are changing the configuration of your server, so it will reboot. Once it relaunches, you should find that it works, and you can persist information from your database!

Conclusion

Hopefully this series has helped you learn how to deploy your Haskell code to AWS! If you'd like to see all this article in video form, you can check out our YouTube video covering these steps!

For more tips on creating a "Real World" application, you can read our series on web skills! You can also download our Haskell Production checklist for some ideas of other libraries and tools you can use to improve your Haskell!

Read More
James Bowen James Bowen

Deploying a Haskell Server to AWS

In the last few articles, we've been talking about how to deploy a Haskell application using AWS. This is part 3 of the series. So if you haven't done parts 1 & 2, you should start there so you can follow along!

In Part 1, we wrote a Dockerfile and created a local Docker image containing a simple program for a Haskell web server.

In the Part 2, we pushed our container image to the AWS container registry (ECR). Notably, this involved creating an AWS account, downloading AWS command line tools and authenticating on the command line. We'll run a couple more of these commands today, so hopefully you're still authenticated!

But now that our container is uploaded, deploying that container is fairly straightforward. But it requires us to use a couple new concepts, as we'll see.

Adding ECR Permission

Before we get started, there's one step we have to take on the web portal. You need to give Elastic Beanstalk permission to download your ECR containers. You can do this using the IAM service from the AWS portal. Then follow these steps:

  1. Select "roles" on the left hand menu.
  2. Select "aws-elasticbeanstalk-ec2-role" in the list in the middle of the screen.
  3. Click "Add Permissions"
  4. Search for and select "AmazonEC2ContainerRegistryReadOnly"

Now let's get into the steps on our local machine.

Configuration File

There are multiple approaches to deploying a docker container, but the one that worked most easily for me was to create a file called Dockerrun.aws.json. (Full other methods, refer to the documentation). This approach involves a counter-intuitive idea. We're going to create a separate directory outside of our main project directory. We'll call it remote.

~/Quiz $ cd ..
~/ $ mkdir remote && cd remote

In this directory, we'll make a single file, called Dockerrun.aws.json. This will, of course, be a JSON file. It will be a very simple configuration file telling our application to use the docker image we pushed last time to ECR. We have to start it by specifying the version of the program (which is 1 because we're only using a single container).

{
  "AWSEBDockerrunVersion": "1",
  ...
}

Now we'll use tell it to use the Docker image we pushed last time by giving the URI under the Image object:

{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "165102442442.dkr.ecr.us-west-2.amazonaws.com/quiz-server"
  },
  ...
}

Finally, we'll specify the port, similar to a Dockerfile. We'll use 8080 both for the "Container" port and the "Host" port.

{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "165102442442.dkr.ecr.us-west-2.amazonaws.com/quiz-server"
  },
  "Ports": [{
    "ContainerPort": 8080,
    "HostPort": 8080
  }]
}

This is the only file we need in this directory! So now let's see what commands we need to run.

Creating the Application

Now we have two more steps that can largely be accomplished on the command line. First, we have to create an application. Then we have to create an environment to use for that application.

Before we can create an application though, we have to create a Git repository, just to store our single file! That's how AWS figures out what to push for configuration.

~/remote $ git init
~/remote $ git add .
~/remote $ git commit -m "First Commit"

Now we can create the application using the eb init command. We'll give our application the name quiz-server.

~/remote $ eb init -p docker quiz-server

You can then see your application on the web portal by accessing the "Elastic Beanstalk" service and clicking the "Applications" tab on the left menu.

Creating the Environment

Now we have to deploy an environment to deploy for our application. When first creating this environment, we use the eb create command. We'll give this environment the name quiz-server-env.

~/remote $ eb create quiz-server-env

This will take a while to deploy. But once it's done, you should be able to see it by clicking the "Environments" tab from the previous screen in the web portal. This will also show you the URL you can use to access your server. It's now successfully deployed!

Debugging

Sometimes, your deployment might fail. For example, you might misspell the name of your container. If you click on your environment (from the "Environments" tab), then you'll be able to access the "Logs" on the left hand menu. This can help you debug. If you need to change your configuration file, you'll need to commit it, though you don't need to push it to any remote repository. You instead use eb deploy to push your changes.

~/remote $ git add Dockerrun.aws.json
~/remote $ git commit -m "New Commit"
~/remote $ eb deploy

Now the deployment process should start again!

Video

You can also watch our YouTube video to see all these steps in action!

Conclusion

You now have enough information to deploy a Haskell web application to Heroku! We'll have one more installment in this series around adding a database to our application, so stay tuned for that! In the meantime, subscribe to our monthly newsletter so you can stay up to date with all the latest news!

Read More
James Bowen James Bowen

Pushing our Container to AWS ECR

In the first part of this blog series we saw how to create a local docker image containing a simple web server program. In order to run this server remotely, we have to upload this image somewhere to deploy it.

One service that lets us deploy docker images is Amazon Web Services (AWS). In this article, we're going to take the first step, and walk through the process of publishing our container image to the AWS Elastic Container Registry (ECR). Next time around, we'll see how to actually deploy our application using this image.

In principle, publishing the image is a simple task. But in my experience with AWS, the processes and documentation just aren't quite as clear as one would like them to be. There tend to be a lot of branches in their tutorials, and it's often not clear which path is the right path. The sheer amount of AWS-specific terminology can get extremely confusing, and this can make it hard to know if you've satisfied the prerequisites for the tutorial.

So in this article I'm going to be as explicit as possible, and include a video at the end so you can follow along. Here's the high level overview:

  1. Create an AWS account
  2. Create an ECR Repository
  3. Install the AWS Command Line Interface
  4. Login using the CLI
  5. Push the container using Docker

Create an AWS Account

First of course, you need to create an account with Amazon Web Services. This is a separate account from a normal Amazon account. But a massive gotcha is that you should not use the exact email address from your Amazon account. This can cause a weird loop preventing you from logging in successfully (see this Stack Overflow issue).

If you have Gmail though, it should work to use the '+' trick with email aliases. So you can have `name@gmail.comfor your Amazon account andname+aws@gmail.com` for your AWS account.

Create an ECR Repository

Next you'll need to login to your account on the web portal and create an ECR repository. To do this, you'll simply click the services tab and search for "Elastic Container Registry". Assuming you have no existing repositories, you'll be prompted with a description page of the service, and you'll want to find the "Get Started" button under the "Create a Repository" header off in the top right corner.

The only thing you need to do on the next page is to assign a name to the repository. The prefix of the repository will always have the format of {account-id}.dkr.ecr.{region}.amazonaws.com, where the account ID is a 12-digit number.

If you want, you can also set the repository as public, but my instructions will assume that you'd made a private repository. To finish up, you'll just click the "Create Repository" button at the bottom of the page. This part is also covered in the video at the bottom if you want to see it in action!

Install the AWS CLI

Our next few actions will happen on our local command line prompt. To interact with our AWS account, we'll need to install the AWS Command Line Interface. To install these tools, you can follow this user guide. It is fairly straightforward to follow once you select your operating system. You know it's succeeded when the command aws --version succeeds on your command line.

Login Using the CLI

Now assuming you created a private repository, you'll need to authenticate on the command line. The first step in this process is to create an access key. You can do this from the web portal by clicking your account name in the top right corner to open up a menu and then going to the "Security Credentials" page. There's a section for "Access Keys" about midpage, and you'll want to use "Create Access Key".

If you do this as a "root" user, AWS will warn you that this is not the advised practice and you should instead create such keys as an "IAM User". But it is possible to do use root for demonstration purposes.

You'll want to copy the "Access Key ID" and the key itself. The latter must be copied or downloaded before you leave the page (you can't come back to it later).

You can then login using the aws configure command in your command line terminal. This will ask you to enter your access key ID and then the key itself, as well as the region.

Now that you're authenticated with AWS, we have to allow AWS to login to Docker for us. The following command would give us the Docker password for AWS in the us-west-2 region:

>> aws ecr get-login-password --region us-west-2

We can pipe this password into the docker login command and connect to the repository we created with this command, where you should substitute your region and your account ID.

>> aws ecr get-login-password --region {region} | \
  docker login --username AWS --password-stdin {account-id}.dkr.ecr.{region}.amazonaws.com

Note how you actually do not need the repository name for this command! Just the prefix formed by your account and the region ID.

Pushing the Image

Now that we're authenticated, we just need to push the container image. We'll start by reminding ourselves of what our image ID is:

>> docker images
REPOSITORY TAG    IMAGE ID ...
quiz-server latest b9eab6a22b12 ...

The first thing we need to do is provide a "tag" for this image corresponding to the remote ECR repository we created. This requires the image ID and the full repository URI. We'll also attach :latest to indicate that this is the most recent push. Here's the specific command I used for my IDs:

>> docker tag b9eab6a22b12 165102442442.dkr.ecr.us-west-2.amazonaws.com/quiz-server:latest

Here's a more generic command template:

>> docker tag {image-id} {account-id}.dkr.ecr.{region}.amazonaws.com/{repo-name}:latest

Finally, we just need to push it using this new repository/tag combo! Here's what it looks like for me:

>> docker push 165102442442.dkr.ecr.us-west-2.amazonaws.com/quiz-server:latest

And more generically:

>> docker push {account-id}.dkr.ecr.{region}.amazonaws.com/{repo-name}:latest

You should then be able to see your image if you head to your ECR dashboard!

Video Walkthrough

If you want to see all this in action, you can head to YouTube and take a look at the video walkthrough! If you are enjoying this series, make sure to subscribe to our monthly newsletter!

Read More