Solve.hs Module 3 + Summer Sale!
After 6 months of hard work, I am happy to announce that Solve.hs now has a new module - Essential Algorithms! You’ll learn the “Haskell Way” to write all of the most important algorithms for solving coding problems, such as Breadth First Search, Dijkstra’s Algorithm, and more!
You can get a 20% discount code for this and all of our other courses by subscribing to our mailing list! Starting next week, the price for Solve.hs will go up to reflect the increased content. So if you subscribe and purchase this week, you’ll end up saving 40% vs. buying later!
So don’t miss out, head to the course sales page to buy today!
Functional Programming vs. Object Oriented Programming
Functional Programming (FP) and Object Oriented Programming (OOP) are the two most important programming paradigms in use today. In this article, we'll discuss these two different programming paradigms and compare their key differences, strengths and weaknesses. We'll also highlight a few specific ways Haskell fits into this discussion. Here's a quick outline if you want to skip around a bit!
- What is a Programming Paradigm?
- The Object Oriented Paradigm
- The Functional Paradigm
- Functional Programming vs. OOP
- OOP Languages
- FP Languages
- Advantages of Functional Programming
- Disadvantages of Functional Programming
- A Full Introduction to Haskell
What is a Programming Paradigm?
A paradigm is a way of thinking about a subject. It's a model against which we can compare examples of something.
In programming, there are many ways to write code to solve a particular task. Our tasks normally involve taking some kind of input, whether data from a database or commands from a user. A program's job is then to produce outputs of some kind, like updates in that database or images on the user's screen.
Programming paradigms help us to organize our thinking so that we can rapidly select an implementation path that makes sense to us and other developers looking at the code. Paradigms also provide mechanisms for reusing code, so that we don't have to start from scratch every time we write a new program.
The two dominant paradigms in programming today are Object Oriented Programming (OOP) and Functional Programming (FP).
The Object Oriented Paradigm
In object oriented programming, our program's main job is to maintain objects. Objects almost always store data, and they have particular ways of acting on other objects and being acted on by other objects (these are the object's methods). Objects often have mutable data - many actions you take on your objects are capable of changing some of the object's underlying data.
Object oriented programming allows code reuse through a system called inheritance. Objects belong to classes which share the same kinds of data and actions. Classes can inherit from a parent class (or multiple classes, depending on the language), so that they also have access to the data from the base class and some of the same code that manipulates it.
The Functional Paradigm
In functional programming, we think about programming in terms of functions. This idea is rooted in the mathematical idea of a function. A function in math is a process which takes some input (or a series of different inputs) and produces some kind of output. A simple example would be a function that takes an input number and produces the square of that number. Many functional languages emphasize pure functions, which produce the exact same output every time when given the same input.
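As a quick illustration, here's how the squaring example from above looks as a pure Haskell function:

```haskell
-- A pure function: the output depends only on the input,
-- so 'square 4' evaluates to 16 every single time.
square :: Int -> Int
square x = x * x
```

There's no hidden state anywhere that could change the answer between calls.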
In programming, we may view our entire program as a function. It is a means by which some kind of input (file data or user commands), is transformed into some kind of output (new files, messages on our terminal). Individual functions within our program might take smaller portions of this input and produce some piece of our output, or some intermediate result that is needed to eventually produce this output.
In functional programming, we still need to organize our data in some way. So some of the ideas of objects/classes are still used to combine separate pieces of data in meaningful ways. However, we generally do not attach "actions" to data in the same way that classes do in OOP languages.
Since we don't perform actions directly on our data, functional languages are more likely to use immutable data as a default, rather than mutable data. (We should note though that both paradigms use both kinds of data in their own ways).
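To illustrate with a hypothetical 'Point' type: "updating" a value in Haskell produces a brand new value, leaving the original untouched:

```haskell
-- A hypothetical immutable record type.
data Point = Point { px :: Int, py :: Int } deriving (Show, Eq)

-- Record update syntax builds a new Point;
-- the argument 'p' itself is never modified.
moveRight :: Point -> Point
moveRight p = p { px = px p + 1 }
```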
Functional Programming vs. OOP
The main point of separation between these paradigms is the question of "what is the fundamental building block of my program?" In object oriented programming, our programs are structured around objects. Functions are things we can do to an object or with an object.
In functional programming, functions are always first class citizens - the main building block of our code. In object oriented programming, functions can be first class citizens, but they do not need to be. Even in languages where they can be, they often are not used in this way, since this isn't as natural within the object oriented paradigm.
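Here's a small sketch of what "functions as first class citizens" looks like in practice: a function that takes another function as an ordinary argument:

```haskell
-- 'applyTwice' receives a function 'f' as an ordinary argument
-- and applies it two times to the value 'x'.
applyTwice :: (a -> a) -> a -> a
applyTwice f x = f (f x)
```

For example, `applyTwice (+3) 10` evaluates to 16.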
Object Oriented Programming Languages
Many of the most popular programming languages are OOP languages. Java, for a long time the most widely used language, is perhaps the most archetypal OO language. All code must exist within an object, even in a simple "Hello World" program:
class MyProgram {
  public static void main(String[] args) {
    System.out.println("Hello World!");
  }
}
In this example, we could not write our 'main' function on its own, without the use of 'class MyProgram'.
Java has a single basic 'Object' class, and all other classes (including any new classes you write) implicitly inherit from it, picking up basic behaviors like equality checks and string conversion. Java classes only allow single inheritance, meaning a class cannot inherit from multiple parent classes. Thus, all Java classes can be mapped out on a tree structure with 'Object' as the root of the tree.
Other object oriented languages use the general ideas of classes, objects, and inheritance, but with some differences. C++ and Python both allow multiple inheritance, so that a class can inherit behavior from multiple existing classes. While these are both OOP languages, they are also more flexible in allowing functions to exist outside of classes. A basic script in either of these languages need not use any classes. In Python, we'd just write:
if __name__ == "__main__":
    print("Hello World!")
In C++, this looks like:
int main() {
  std::cout << "Hello World!" << std::endl;
}
These languages also don't have such a strictly defined inheritance structure. You can create classes that do not inherit from anything else, and they'll still work.
FP Languages
Haskell is perhaps the language that is most identifiable with the functional paradigm. Its type system and compiler really force you to adopt functional ideas, especially around immutable data, pure functions, and recursion. It also embraces lazy evaluation, which is aligned with FP principles, but not a requirement for a functional language.
Several other programming languages are generally associated with the functional paradigm, including Clojure, OCaml, Lisp, Scala and Rust. These languages aren't all functional in the same way as Haskell; there are many notable differences. Lisp bills itself specifically as a multi-paradigm language, and Scala is built to interoperate with Java on the JVM! Meanwhile, Rust's syntax looks more object oriented, but its trait system feels much more like Haskell's typeclasses than class-based inheritance. On balance though, these languages express functional programming ideas much more than their counterparts.
Amongst the languages mentioned in the object oriented section, Python has the most FP features. It is more natural to write functions outside of your class objects, and concepts like higher order functions and lambda expressions are more idiomatic than in C++ or Java. This is part of the reason Python is often recommended for beginners, with another reason being that its syntax makes it a relatively simple language to learn.
Advantages of Functional Programming
Fewer Bugs
FP code has a deserved reputation for having fewer bugs. Anecdotally, I certainly find I have a much easier time writing bug free code in Haskell than Python. Many bugs in object oriented code are caused by the proliferation of mutable state. You might pass an object to a method and expect your object to come back unchanged...only to find that the method does in fact change your object's state. With objects, it's also very easy for unstated pre-conditions to pop up in class methods. If your object is not in the state you expect when the method is called, you'll end up with behavior you didn't intend.
A lot of function-based code makes these errors impossible by imposing immutable objects as the default, if not making it a near requirement, as Haskell does. When the function is the building block of your code, you must specify precisely what the inputs of the function are. This gives you more opportunities to determine pre-conditions for this data. It also ensures that the return results of the function are the primary way you affect the rest of your program.
Functions also tend to be easier to test than objects. It is often tricky to create objects with the precise state you want to assess in a unit test, whereas to test a function you only need to reproduce the inputs.
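As a sketch with a hypothetical 'applyDiscount' function: testing a pure function requires nothing beyond supplying inputs and checking the output:

```haskell
-- A hypothetical pure pricing function: no hidden object
-- state can influence the result.
applyDiscount :: Double -> Double -> Double
applyDiscount rate price = price * (1 - rate)

-- A "test" is just an equality check on inputs and outputs,
-- with a small tolerance for floating point arithmetic.
discountTest :: Bool
discountTest = abs (applyDiscount 0.2 100 - 80) < 1e-9
```

There's no object to construct and no internal state to arrange before the assertion.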
More Expressive, Reasonable Design
The more you work with functions as your building blocks, and the more you try to fill your code with pure functions, the easier it will be to reason about your code. Imagine you have a couple dozen fields on an object in OO code. If someone calls a function on that object, any of those fields could impact the result of the method call.
Functions give you the opportunity to narrow things down to the precise values that you actually need to perform the computation. They let you separate the essential information from superfluous information, making it more obvious what the responsibilities are for each part of your code.
Multithreading
You can do parallel programming no matter what programming language you're using, but the functional programming paradigm aligns very well with parallel processing. To kick off a new thread in any language, you pretty much always have to pass a function as an argument, and this is more natural in FP. And with pure functions that don't modify shared mutable objects, FP is generally much easier to break into parallelizable pieces that don't require complex locking schemes.
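As a minimal sketch (using 'forkIO' and 'MVar' from the base library's 'Control.Concurrent' modules), kicking off a pure computation on another thread might look like this:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- A pure function: safe to run on any thread, since it
-- touches no shared mutable state.
sumOfSquares :: [Int] -> Int
sumOfSquares xs = sum (map (\x -> x * x) xs)

-- Run the computation on a separate thread and pass the
-- result back through an MVar; no locking scheme is needed
-- around the computation itself.
inBackground :: [Int] -> IO Int
inBackground xs = do
  resultVar <- newEmptyMVar
  _ <- forkIO (putMVar resultVar (sumOfSquares xs))
  takeMVar resultVar
```

Because `sumOfSquares` is pure, nothing about it changes when it runs concurrently.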
Disadvantages of Functional Programming
Intuition of Complete Objects
Functional programming can feel less intuitive than object oriented programming. Perhaps one reason for this is that object oriented programming allows us to reason about "complete" objects, whose state at any given time is properly defined.
Functions are, in a sense, incomplete. A function is not a what that you can hold as a picture in your head. A function is a how. Given some inputs, how do you produce the outputs? In other words, it's a procedure. And a procedure can only really be imagined as a concrete object once you've filled in its inputs. This is best exemplified by the fact that functions have no native 'Show' instance in Haskell.
>> show (+)
No instance for (Show (Integer -> Integer -> Integer)) arising from a use of 'show'
If you apply the '+' function to arguments (and so create what could be called an "object"), then we can print it. But until then, it doesn't make much sense. If objects are the building block of your code though, you could, hypothetically, print the state of the objects in your code every step of the way.
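For example, once '+' is fully applied to its arguments, we get a concrete value with a perfectly good 'Show' instance:

```haskell
-- '(+) 2 3' is no longer a "how"; it is a concrete value.
applied :: Integer
applied = (+) 2 3

-- show applied == "5"
```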
Mutable State can be Useful!
As much as mutable state can cause a lot of bugs, it is nonetheless a useful tool for many problems, and decidedly more intuitive for certain data structures. If we just imagine something like the "Snake" game, it has a 2D grid that remains mostly the same from tick to tick, with just a couple things updating. This is easier to capture with mutable data.
Web development is another area where mutable objects are extremely useful. Anytime the user enters information on the page, some object has to change! Web development in FP almost requires its own paradigm (see "Functional Reactive Programming"). Haskell can represent mutable data, but the syntax is more cumbersome; you essentially need a separate data structure. Likewise, other functional languages might make mutability easier than Haskell, but mutability is still, again, more intuitive when objects are your fundamental building block, rather than functions on those objects.
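To illustrate the "separate data structure" point, here's a minimal sketch of mutable state in Haskell using 'IORef' from the base library:

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)

-- Mutation must go through an explicit IORef, and both
-- operations live in IO; a plain Int value can never change.
tick :: IORef Int -> IO Int
tick counter = do
  modifyIORef' counter (+ 1)
  readIORef counter
```

Compared to simply incrementing a field on an object, this is noticeably more ceremony.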
We can see this even with something as simple as loops. Haskell doesn't perform "for-loops" in the same way as other languages, because most for loops essentially rely on the notion that there is some kind of state updating on each iteration of the loop, even if that state is only the integer counter. To write loops in Haskell, you have to learn concepts like maps and folds, which require you to get very used to writing new functions on the fly.
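As a small sketch, here's how two common loop patterns translate: a counting loop becomes a fold, where the accumulator plays the role of the mutable total, and an element-transforming loop becomes a map:

```haskell
import Data.List (foldl')

-- "for (i = 1; i <= n; i++) total += i" becomes a fold.
sumTo :: Int -> Int
sumTo n = foldl' (+) 0 [1 .. n]

-- A loop that doubles each element becomes a map.
doubleAll :: [Int] -> [Int]
doubleAll = map (* 2)
```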
A Full Introduction to Haskell (and its Functional Aspects)
So functional programming languages are perhaps a bit more difficult to learn, but can offer a significant payoff if you put in the time to master the skills. Ultimately, you can use either paradigm for most kinds of projects and keep your development productive. It's down to your personal preference which you try while building software.
If you really want to dive into functional programming though, Haskell is a great language, since it will force you to learn FP principles more than other functional languages. For a complete introduction to Haskell, you should take a look at Haskell From Scratch, our beginner-level course for those new to the language. It will teach you everything you need to know about syntax and fundamental concepts, while providing you with a ton of hands-on practice through exercises and projects.
Haskell From Scratch also includes Making Sense of Monads, our course that shows the more functional side of Haskell by teaching you about the critical concept of monads. With these two courses under your belt, you'll be well on your way to mastery of functional programming! Head over here to learn more about these courses!
How to Write Comments in Haskell
Comments are often a simple item to learn, but there are a few ways we can get more sophisticated with them! This article is all about writing comments in Haskell. Here's a quick outline to get you started!
- What is a Comment?
- Single Line Comments
- Multi-Line Comments
- Inline Comments
- Writing Formal Documentation Comments
- Intro to Haddock
- Basic Haddock Comments
- Creating Our Haskell Report
- Documenting the Module Header
- Module Header Fields
- Haddock Comments Below
- Commenting Type Signatures
- Commenting Constructors
- Commenting Record Fields
- Commenting Class Definitions
A Complete Introduction to the Haskell Programming Language
What is a Comment?
A comment is a non-code note you write in a code file. You write it to explain what the code does or how it works, in order to help someone else reading it. Comments are ignored by a language's compiler or interpreter. There is usually some kind of special syntax for comments to distinguish them from code. Writing comments in Haskell isn't much different from other programming languages. But in this article, we'll also look extensively at Haddock, a more advanced tool for generating nice-looking documentation.
Single Line Comments
The basic syntax for comments in Haskell is easy, even if it is unusual compared to more common programming languages. In languages like Java, JavaScript and C++, you use two forward slashes to start a single line comment:
int main() {
  // This line will print the string value "Hello, World!" to the console
  std::cout << "Hello, World!" << std::endl;
}
But in Haskell, single line comments start with two hyphens, '--':
-- This is our 'main' function, which will print a string value to the console
main :: IO ()
main = putStrLn "Hello World!"
You can have these take up an entire line by themselves, or you can add a comment after a line of code. In this simple "Hello World" program, we place a comment at the end of the first line of code, giving instructions on what would need to happen if you extended the program.
main :: IO ()
main = -- Add 'do' to this line if you add another 'putStrLn' statement!
  putStrLn "Hello World!"
Multi-Line Comments
While you can always start multiple consecutive lines with your language's single line comment marker, many languages also have a specific way to make multi-line comments. Generally speaking, this method has a "start" and an "end" sequence. For example, in C++ or Java, you start a multi-line comment block with the characters '/*' and end it with '*/':
/* This function returns a new list that is a reversed copy of the input.
   It iterates through each value in the input and uses 'push_front' on the new copy.
*/
std::list<int> reverseList(const std::list<int>& ints) {
  std::list<int> result;
  for (const auto& i : ints) {
    result.push_front(i);
  }
  return result;
}
In Haskell, it is very similar. You use the brace and a hyphen character to open ('{-') and then the reverse to close the block ('-}').
{- This function returns a new list that is a reversed copy of the input.
   It uses a tail recursive helper function.
-}
reverse :: [a] -> [a]
reverse = reverseTail []
  where
    reverseTail acc [] = acc
    reverseTail acc (x : xs) = reverseTail (x : acc) xs
Notice we don't have to start every line in the comment with double hyphens. Everything in there is part of the comment, until we reach the closing character sequence. Comments like these with multiple lines are also known as "block comments". They are useful because it is easy to add more information to the comment without adding any more formatting.
Inline Comments
While you generally use the brace/hyphen sequence to write a multiline comment, this format is surprisingly also useful for a particular form of single line comments. You can write an "inline" comment, where the content is in between operational code on that line.
reverse :: [a] -> [a]
reverse = reverseTail []
  where
    reverseTail {- Base Case -} acc [] = acc
    reverseTail {- Recursive Case -} acc (x : xs) = reverseTail (x : acc) xs
The fact that this comment format has a start and an end sequence means the compiler knows exactly where the real code starts up again. This is impossible when you use double hyphens to signify a comment, since those run to the end of the line.
Writing Formal Documentation Comments
If the only people using this code will be you or a small team, the two techniques above are all you really need. They tell people looking at your source code (including your future self) why you have written things in a certain way, and how they should work.

However, if other people will be using your code as a library without necessarily looking at the source code, there's a much deeper area you can explore. In these cases, you will want to write formal documentation comments. A documentation comment tells someone what a function does, generally without going into the details of how it works. More importantly, documentation comments are usually compiled into a format for someone to look at outside of the source code.

These sorts of comments are aimed at people using your code as a library. They'll import your module into their own programs, rather than modifying it themselves. You need to answer the questions they'll have, like "How do I use this feature?" or "What argument do I need to provide for this function to work?" You should also consider including examples in this kind of documentation, since these can explain your library much better than plain statements. A simple code snippet often provides far more clarification than a long document of function descriptions.
Intro to Haddock
As I mentioned above, formal documentation needs to be compiled into a format that is more readable than source code. In most cases, this requires an additional tool. Doxygen, for example, is one tool that supports many programming languages, like C++ and Python. Haskell has a special tool called Haddock.

Luckily, you probably don't need to go through any additional effort to install Haddock. If you used GHCup to install Haskell, then Haddock comes along with it automatically. (For a full walkthrough on getting Haskell installed, you can read our Startup Guide.) It also integrates well with Haskell's package tools, Stack and Cabal.

In this article we'll use it through Stack. So if you want to follow along, you should create a new Haskell project on your machine with Stack, calling it 'HaddockTest'. Then build the code before we add comments, so you don't have to wait for it later:
>> stack new HaddockTest
>> cd HaddockTest
>> stack build
You can write all the code from the rest of the article in the file 'src/Lib.hs', which Stack creates by default.
Basic Haddock Comments
Now let's see how easy it is to write Haddock comments! To write basic comments, you just have to add a vertical bar character after the two hyphens:
-- | Get the "block" distance of two 2D coordinate pairs
manhattanDistance :: (Int, Int) -> (Int, Int) -> Int
manhattanDistance (x1, y1) (x2, y2) = abs (x2 - x1) + abs (y2 - y1)
It still works even if you add a second line without the vertical bar. All comment lines until the type signature or function definition will be considered part of the Haddock comment.
-- | Get the "block" distance of two 2D coordinate pairs
-- This is the sum of the absolute difference in x and y values.
manhattanDistance :: (Int, Int) -> (Int, Int) -> Int
manhattanDistance (x1, y1) (x2, y2) = abs (x2 - x1) + abs (y2 - y1)
You can also make a block comment in the Haddock style. It involves the same character sequences as multi line comments, but once again, you just add a vertical bar after the start sequence. The end sequence does not need the bar:
{-| Get the "block" distance of two 2D coordinate pairs
    This is the sum of the absolute difference in x and y values.
-}
manhattanDistance :: (Int, Int) -> (Int, Int) -> Int
manhattanDistance (x1, y1) (x2, y2) = abs (x2 - x1) + abs (y2 - y1)
No matter which of these options you use, your comment will look the same in the final document. Next, we'll see how to generate our Haddock document. To contrast Haddock comments with normal comments, we'll add a second function in our code with a "normal" single line comment. We also need to add both functions to the export list of our module at the top:
module Lib
  ( someFunc
  , manhattanDistance
  , euclideanDistance
  ) where

...

-- Get the Euclidean distance of two 2D coordinate pairs (not Haddock)
euclideanDistance :: (Double, Double) -> (Double, Double) -> Double
euclideanDistance (x1, y1) (x2, y2) = sqrt ((x2 - x1) ^ 2 + (y2 - y1) ^ 2)
Now let's create our document!
Creating Our Haskell Report
To generate our document, we just use the following command:
>> stack haddock
This will compile our code. At the end of the process, it will also inform us about what percentage of the elements in our code used Haddock comments. For example:
25% ( 1 / 4) in 'Lib'
Missing documentation for:
Module header
someFunc (src/Lib.hs:7)
euclideanDistance (src/Lib.hs:17)
As expected, 'euclideanDistance' is not considered to have a Haddock comment. We also haven't defined a Haddock comment for our module header. We'll do that in the next section. We'll also get rid of the 'someFunc' expression, which is just a stub.

This command generates HTML files for us, most importantly an index file! They get generated in the '.stack-work' directory, usually in a folder that looks like '{project}/.stack-work/install/{os}/{hash}/{ghc_version}/doc/'. For example, the full path of my index file in this example is:
/home/HaddockTest/.stack-work/install/x86_64-linux-tinfo6/6af01190efdb20c14a771b6e2823b492cb22572e9ec30114989156919ec4ab3a/9.6.3/doc/index.html
You can open the file with your web browser, and you'll find a mostly blank page listing the modules in your project, which at this point should only be 'Lib'. If you click on 'Lib', it will take you to a page that looks like this:
We can see that all three expressions from our file are there, but only 'manhattanDistance' has its comment visible on the page. What's neat is that the type links all connect to documentation for the base libraries. If we click on 'Int', it will take us to the page for the 'base' package module 'Data.Int', giving documentation on 'Int' and other integer types.
Documenting the Module Header
In the picture above, you'll see a blank space between our module name and the 'Documentation' section. This is where the module header documentation should go. Let's see how to add this into our code. Just as Haddock comments for functions should go above their type signatures, the module comment should go above the module declaration. You can start it with the same format as you would have with other Haddock block comments:
{-| This module exposes a couple functions
related to 2D distance calculation.
-}
module Lib
( manhattanDistance
, euclideanDistance
) where
...
If you rerun 'stack haddock' and refresh your Haddock page, this comment will now appear under 'Lib' and above 'Documentation'. This is the simplest thing you can do to provide general information about the module.
Module Header Fields
However, there are also additional fields you can add to the header that Haddock will specifically highlight on the page. Suppose we update our block comment to have these lines:
{-|
Module: Lib
Description: A module for distance functions.
Copyright: (c) Monday Morning Haskell, 2023
License: MIT
Maintainer: person@mmhaskell.com
The module has two functions. One calculates the "Manhattan" distance, or "block" distance on integer 2D coordinates. The other calculates the Euclidean distance for a floating-point coordinate system.
-}
module Lib
( manhattanDistance
, euclideanDistance
) where
...
At the bottom of the multi line comment, after all the lines for the fields, we can put a longer description, as you see. After adding this, removing 'someFunc', and making our prior comment on Euclidean distance a Haddock comment, we now get 100% marks on the documentation for this module when we recompile it:
100% ( 3 / 3) in 'Lib'
And here's what our HTML page looks like now. Note how the fields we entered are populated in the small box in the upper right.
Note that the short description we gave is now visible next to the module name on the index page. This page still only contains the description below the fields.
Haddock Comments Below
So far, we've been using the vertical bar character to place Haddock comments above our type signatures. However, it is also possible to place comments below the type signatures, and this will introduce us to a new syntax technique that we'll use for other areas. The general idea is that we can use a caret character '^' instead of the vertical bar, indicating that the item we are commenting is "above" or "before" the comment. We can do this either with single line comments or block comments. Here's how we would use this technique with our existing functions:
manhattanDistance :: (Int, Int) -> (Int, Int) -> Int
-- ^ Get the "block" distance of two 2D coordinate pairs
manhattanDistance (x1, y1) (x2, y2) = abs (x2 - x1) + abs (y2 - y1)
euclideanDistance :: (Double, Double) -> (Double, Double) -> Double
{- ^ Get the Euclidean distance of two 2D coordinate pairs
This uses the Pythagorean formula.
-}
euclideanDistance (x1, y1) (x2, y2) = sqrt ((x2 - x1) ^ 2 + (y2 - y1) ^ 2)
The comments will appear the same in the final documentation.
Commenting Type Signatures
The comments we've written so far have described each function as a unit. However, sometimes you want to make notes on specific function arguments. The most common way to write these comments in Haskell with Haddock is with the "above" style. Each argument goes on its own line with a "caret" Haddock comment after it. Here's an example:
-- | Given a base point and a list of other points, returns
-- the shortest distance from the base point to a point in the list.
shortestDistance ::
  (Double, Double) ->   -- ^ The base point we are measuring from
  [(Double, Double)] -> -- ^ The list of alternative points
  Double
shortestDistance base [] = -1.0
shortestDistance base rest = minimum (map (euclideanDistance base) rest)
It is also possible to write these with the vertical bar above each argument, but then you will need a second line for the comment.
-- | Given a base point and a list of other points, returns
-- the shortest distance from the base point to a point in the list.
shortestDistance ::
  -- | The base point we are measuring from
  (Double, Double) ->
  -- | The list of alternative points
  [(Double, Double)] ->
  Double
shortestDistance base [] = -1.0
shortestDistance base rest = minimum (map (euclideanDistance base) rest)
It is even possible to write the comments before AND on the same line as inline comments. However, this is less common since developers usually prefer seeing the type as the first thing on the line.
Commenting Constructors
You can also use Haddock comments for type definitions. Here is an example of a data type with different constructors. Each gets a comment.
data Direction =
  DUp |    -- ^ Positive y direction
  DRight | -- ^ Positive x direction
  DDown |  -- ^ Negative y direction
  DLeft    -- ^ Negative x direction
Commenting Record Fields
You can also comment record fields within a single constructor.
data Movement = Movement
  { direction :: Direction -- ^ Which way we are moving
  , distance :: Int        -- ^ How far we are moving
  }
An important note is that if you have a constructor on the same line as its fields, a single caret comment will refer to the constructor, not to its last field.
data Point =
  Point2I Int Int |            -- ^ 2d integral coordinate
  Point2D Double Double |      -- ^ 2d floating point coordinate
  Point3I Int Int Int |        -- ^ 3d integral coordinate
  Point3D Double Double Double -- ^ 3d floating point coordinate
Commenting Class Definitions
As one final feature, we can add these sorts of comments to class definitions as well. With class functions, it is usually better to use "before" comments with the vertical bar. Unlike constructors and fields, an "after" comment will get associated with the argument, not the method.
{-| The Polar class describes objects which can be described
in "polar" coordinates, with a magnitude and angle
-}
class Polar a where
  -- | The total length of the item
  magnitude :: a -> Double
  -- | The angle (in radians) of the point around the z-axis
  angle :: a -> Double
Here's what all these new pieces look like in our documentation:
You can see the way that each comment is associated with a particular field or argument.
A Complete Introduction to the Haskell Programming Language
Of course, comments are useless if you have no code or projects to write them in! If you're a beginner to Haskell, the fastest way to get up to writing project-level code is our course, Haskell From Scratch! This course features hours of video lectures, over 100 programming exercises, and a final project to test your skills! Learn more about it on this page!
How to Write “Hello World” in Haskell
In this article we're going to write the easiest program we can in the Haskell programming language. We're going to write a simple example program that prints "Hello World!" to the console. It's such a simple program that we can do it in one line! But it's still the first thing you should do when starting a new programming language. Even with such a simple program there are several details we can learn about writing a Haskell program. Here's a quick table of contents if you want to jump around!
- Writing Haskell "Hello World"
- The Simplest Way to Run the Code
- Functional Programming and Types
- Requirements of an Executable Haskell Program
- Using the GHC Compiler
- Using GHCI - The Haskell Interpreter
- A Closer Look at Our Types
- Compilation Errors
- A Quick Look At Type Classes
- Echo - Another Example Program
- A Complete Introduction to the Haskell Programming Language
Now let's get started!
Writing Haskell "Hello World"
To write our "Haskell Hello World" program, we just need to open a file named 'HelloWorld.hs' in our code editor and write the following line:
main = putStrLn "Hello World!"
This is all the code you need! With just this one line, there's still another way you could write it. You could use the function 'print' instead of 'putStrLn':
main = print "Hello World!"
These programs both accomplish our goal, but their behavior is slightly different! To explore this, we first need to run our program!
The Simplest Way to Run the Code
Hopefully you've already installed the Haskell language tools on your machine. The old way to do this was through Haskell Platform, but now you should use GHCup. You can read our Startup Guide for more instructions on that! But assuming you've installed everything, the simplest way to run your program is to use the 'runghc' command on your file:
>> runghc HelloWorld.hs
With the first version of our code using 'putStrLn', we'll see this printed to our terminal:
Hello World!
If we use 'print' instead, we'll get this output:
"Hello World!"
In the second example, there are quotation marks! To understand why this is, we need to understand a little more about types, which are extremely important in Haskell code.
Functional Programming and Types
Haskell is a functional programming language with a strong, static type system. Even something as simple as our "Hello World" program is composed of expressions, and each of these expressions has a type. For that matter, our whole program has a type!
In fact, every Haskell program has the same type: 'IO ()'. The IO type signifies any expression which can perform Input/Output activities, like printing to the terminal and reading user input. Most functions you write in Haskell won't need to do these tasks. But since we're printing, we need the IO signifier. The second part of the type is the empty tuple, '()'. This is also referred to as the "unit type". When used following 'IO', it is similar to having a 'void' return value in other programming languages.
Now, our 'main' expression signifies our whole program, and we can explicitly declare it to have this type by putting a type signature above it in our code. We give the expression name, two colons, and then the type:
main :: IO ()
main = putStrLn "Hello World!"
Our program will run the same with the type signature. We didn't need to put it there, because GHC, the Haskell compiler, can usually infer the types of expressions. With more complicated programs, it can get stuck without explicit type signatures, but we don't have to worry about that right now.
Requirements of an Executable Haskell Program
Now if we give any other type to our main function, we won't be able to run our program! Our file is supposed to be an entry point - the root of an executable program. And Haskell has several requirements for such files.
These files must have an expression named 'main'. This expression must have the type 'IO ()'. Finally, if we put a module name on our code, that module name should be Main. Module names go at the top of our file, prefaced by "module", and followed by the word "where". Here's how we can explicitly declare the name of our module:
module Main where
main :: IO ()
main = putStrLn "Hello World!"
Like the type signature on our function 'main', GHC could infer the module name as well. But let's try giving it a different module name:
module HelloWorld where
main :: IO ()
main = putStrLn "Hello World!"
For most Haskell modules you write, using the file name (minus the '.hs' extension) IS how you want to name the module. But runnable entry point modules are different. If we use the 'runghc' command on this code, it will still work. However, if we get into more specific behaviors of GHC, we'll see that Haskell treats our file differently if we don't use 'Main'.
Using the GHC Compiler
Instead of using 'runghc', a command designed mainly for one-off files like this, let's try to compile our code more directly using the Haskell compiler. Suppose we have used HelloWorld as the module name. What files does it produce when we compile it with the 'ghc' command?
>> ghc HelloWorld.hs
[1 of 1] Compiling HelloWorld ( HelloWorld.hs, HelloWorld.o )
>> ls
HelloWorld.hi HelloWorld.hs HelloWorld.o
This produces two output files beside our source module. The '.hi' file is an interface file. The '.o' file is an object file. Unfortunately, neither of these is runnable! So let's try changing our module name back to Main.
module Main where
main :: IO ()
main = putStrLn "Hello World!"
Now we'll go back to the command line and run it again:
>> ghc HelloWorld.hs
[1 of 2] Compiling Main ( HelloWorld.hs, HelloWorld.o )
[2 of 2] Linking HelloWorld
>> ls
HelloWorld HelloWorld.hi HelloWorld.hs HelloWorld.o
This time, things are different! We now have two compilation steps. The first says 'Compiling Main', referring to our code module. The second says 'Linking HelloWorld'. This refers to the creation of the 'HelloWorld' file, which is executable code! (On Windows, this file will be called 'HelloWorld.exe'). We can "run" this file on the command line now, and our program will run!
>> ./HelloWorld
Hello World!
Using GHCI - The Haskell Interpreter
Now there's another simple way for us to run our code. We can also use the GHC Interpreter, known as GHCI. We open it with the command 'ghci' on our command line terminal. This brings up a prompt where we can enter Haskell expressions. We can also load code from our modules using the ':load' command. Let's load our hello world program and run its 'main' function.
>> ghci
GHCi, version 9.4.7: https://www.haskell.org/ghc/ :? for help
ghci> :load HelloWorld
[1 of 2] Compiling Main ( HelloWorld.hs, interpreted )
ghci> main
Hello World!
If we wanted, we could also just run our "Hello World" code in the interpreter itself:
ghci> putStrLn "Hello World!"
Hello World!
It's also possible to assign our string to a value and then use it in another expression:
ghci> let myString = "Hello World!"
ghci> putStrLn myString
Hello World!
A Closer Look at Our Types
A very useful feature of GHCI is that it can tell us the types of our expressions. We just have to use the ':type' command, or ':t' for short. We have two expressions in our Haskell program: 'putStrLn', and "Hello World!". Let's look at their types. We'll start with "Hello World!":
ghci> :type "Hello World!"
"Hello World!" :: String
The type of "Hello World!" itself is a 'String'. This is the name given for a list of characters. We can look at the type of an individual character as well:
ghci> :type 'H'
'H' :: Char
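In fact, 'String' is nothing more than a type synonym for a list of 'Char' values ('[Char]'), which we can verify with a quick equality check:

```haskell
main :: IO ()
main = do
  -- A String is literally a list of characters
  print ("Hello" == ['H', 'e', 'l', 'l', 'o']) -- prints True
```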
What about 'putStrLn'?
ghci> :t putStrLn
putStrLn :: String -> IO ()
The type for 'putStrLn' looks like 'String -> IO ()'. Any type with an arrow in it ('->') is a function. It takes a 'String' as an input and it returns a value of type 'IO ()', which we've discussed. To apply a function, we place its argument next to it in our code. This is different from many other programming languages, where you usually need parentheses to apply a function to its arguments. Once we apply a function, the type of the resulting expression is just whatever is on the right side of the arrow. So applying 'putStrLn' to our string, we get 'IO ()' as the resulting type!
ghci> :t putStrLn "Hello World!"
putStrLn "Hello World!" :: IO ()
Compilation Errors
For a different example, let's see what happens if we try to use an integer with 'putStrLn':
ghci> putStrLn 5
No instance for (Num String) arising from the literal '5'
The 'putStrLn' function only works with values of the 'String' type, while 5 has a type more like 'Int'. So we can't use these expressions together.
A Quick Look At Type Classes
However, this is where 'print' comes in. Let's look at its type signature:
ghci> :t print
print :: Show a => a -> IO ()
Unlike 'putStrLn', the 'print' function takes a more generic input. A "type class" is a general category describing a behavior. Many different types can perform the behavior. One such class is 'Show'. The behavior is that Show-able items can be converted to strings for printing. The 'Int' type is part of this type class, so we can use 'print' with it!
ghci> print 5
5
When we use 'show' on a string, Haskell adds quotation marks to it. This is why the output looks different when we use 'print' instead of 'putStrLn' in our initial program:
ghci> print "Hello World!"
"Hello World!"
Echo - Another Example Program
Our Haskell "Hello World" program is the most basic example of a program we can write. It only showed one side of the input/output equation. Here's an "echo" program, which first waits for the user to enter some text on the command line and then prints that line back out:
main :: IO ()
main = do
  input <- getLine
  putStrLn input
Let's quickly check the type of 'getLine':
ghci> :t getLine
getLine :: IO String
We can see that 'getLine' is an IO action returning a string. When we use the backwards arrow '<-' in our code, this means we unwrap the IO value and get the result on the left side. So the type of 'input' in our code is just 'String', meaning we can then use it with 'putStrLn'! Then we use the 'do' keyword to string together two consecutive IO actions. Here's what it looks like to run the program. The first line is us entering input, the second line is our program repeating it back to us!
>> runghc Echo.hs
I'm entering input!
I'm entering input!
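We can extend the echo idea a bit further. In this sketch (the 'greeting' helper is our own hypothetical addition, not part of the original program), we chain three IO actions together with 'do' and mix in a pure function:

```haskell
-- A pure helper that builds the output string
greeting :: String -> String
greeting name = "Hello, " <> name <> "!"

main :: IO ()
main = do
  putStrLn "What's your name?"
  name <- getLine
  putStrLn (greeting name)
```

Running this, the program prints the prompt, waits for a line of input, and then greets you by name.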
A Complete Introduction to the Haskell Programming Language
Our Haskell "Hello World" program is the most basic thing you can do with the language. But if you want a comprehensive look at the syntax and every fundamental concept of Haskell, you should take our beginners course, Haskell From Scratch.
You'll get several hours of video lectures, plus a lot of hands-on experience with 100+ exercise problems with automated testing.
All-in-all, you'll only need 10-15 hours to work through all the material, so within a couple weeks you'll be ready for action! Read more about the course here!
Black Friday Sale: Last Day!
We've come to Cyber Monday, marking the last day of our Black Friday sale! Today is your last chance to get big discounts on all of our courses. You can get 20% by using the code BFSOLVE23 at checkout. Or you can subscribe to our mailing list to receive a 30% discount code. You must use these codes by the end of the day in order to get the discount!
Here's a final runthrough of the courses we have available, including our newest course, Solve.hs!
Solve.hs
We just released the first part of our newest course last week! These two detailed modules dive into the fundamentals of problem solving in Haskell. You'll get to rewrite the list type and most of its API from scratch, teaching you all the different ways you can write "loop" code in Haskell. Then you'll get an in-depth look at how data structures work in Haskell, including the quick process to learn a data structure from start to finish!
Normal Price: $89 Sale Price: $71.20 Subscriber Price: $62.30
Haskell From Scratch
This is our extensive, 7-module beginners course. You'll get a complete introduction to Haskell's syntax and core concepts, including things like monads and tricky type conversions.
Normal Price: $99 Sale Price: $79.20 Subscriber Price: $69.30
Practical Haskell
Practical Haskell is designed to break the idea that "Haskell is only an academic language". In our longest and most detailed course, you'll learn the ins and outs of communicating with a database in Haskell, building a web server, and connecting that server to a functional frontend page. You'll also learn about the flexibility that comes with Haskell's effect systems, as well as best practices for testing your code, including tricky test cases like IO based functions!
Normal Price: $149 Sale Price: $119.20 Subscriber Price: $104.30
Making Sense of Monads
The first of our shorter, more targeted courses, Making Sense of Monads will teach you how to navigate monads, one of Haskell's defining concepts. This idea is a bit tricky at first but also quite important for unleashing Haskell's full power. The course is well suited to beginners who know all the basic syntax but want more conceptual practice.
Note that Making Sense of Monads is bundled with Haskell From Scratch. So if you buy the full beginners course, you'll get this in-depth look at monads for free!
Normal Price: $29 Sale Price: $23.20 Subscriber Price: $20.30
Effectful Haskell
If Making Sense of Monads is best for teaching the basics of monads, Effectful Haskell will show you how to maximize the potential of this idea. You'll develop a more complete idea of what we mean by "effects" in your code. You'll see a variety of ways to incorporate them into your code and learn some interesting ideas about effect substitution!
Normal Price: $39 Sale Price: $31.20 Subscriber Price: $27.30
Haskell Brain
Last, but not least, Haskell Brain will teach you how to perform machine learning tasks in Haskell with TensorFlow. There are a lot of steps involved in linking these two technologies. So while machine learning is a valuable skill to have in today's world, understanding the ways we can link software together is almost as valuable!
Normal Price: $39 Sale Price: $31.20 Subscriber Price: $27.30
Conclusion
So don't miss out on this special offer! You can use the code BFSOLVE23 for 20% off, or you can subscribe to our mailing list to get a code for 30% off! This offer ends tonight, so don't wait!
Spotlight: Quick, Focused Haskell Courses
A couple days ago I gave a brief spotlight on the longer, more in-depth courses I've written. The newest of these is Solve.hs, with its focus on problem solving, and the original two I wrote were Haskell From Scratch and Practical Haskell.
After my first two courses, I transitioned towards writing a few shorter courses. These are designed to teach vital concepts in a shorter period of time. They all consist of just a single module and have a shorter total lecture time (1.5 to 2 hours each). You can finish any of them in a concentrated 1-2 week effort. Today I'll give a brief summary of each of these, listed from most abstract to most practical, and easiest to hardest.
Remember, all of these are on sale at 20% off using the code BFSOLVE23 at checkout! You can also subscribe to our mailing list to get an even bigger discount, at 30% off!
Making Sense of Monads
This is for those of you who have been writing Haskell long enough that you've got the hang of the syntax, but you still struggle a bit to understand monads. You might look at parts of Modules 4 and 5 from Haskell From Scratch and think they look useful, but you don't think you need the rest of the course.
Making Sense of Monads really "zooms in" on Module 5. It goes deeper in understanding all of the simpler structures that help us understand monads, and it gives a sizable amount of practice with writing monadic code. You'll also get a crash course on parsing (a common use of monadic operations), and write two fairly complex parsers. So it's a great option if you want a shorter but more concentrated approach on some of the basics!
Effectful Haskell
Effectful Haskell takes a lot of the core ideas and concepts in Making Sense of Monads and goes one step beyond into the more practical realm of applying monadic effects in a program. You'll learn more abstractly what an effect is, but then also the different ways to incorporate polymorphic effects into your Haskell program. You'll see how to use monads and monad classes to swap effectful behaviors in your program, and why this is useful.
This course culminates in a similar (but smaller) project to Practical Haskell, where you'll deploy an effectful web server to Heroku.
Haskell Brain
This course is the hardest and most practically-oriented of this series. You will take on the challenge of incorporating TensorFlow and machine learning into Haskell. This is easier said than done, because TensorFlow has many dependencies beyond the normal packages you can simply pick up on Hackage. So you'll gain valuable experience going through this installation process, and then we'll run through some of the main information you need to know when it comes to creating tensors in Haskell, and building moderately complex models.
Conclusion
So while these courses are shorter, they still pack a decent amount of material! And with the subscriber discount, you can get each of them for less than $30! This offer will only last until Monday though, so make up your mind quickly!
Spotlight: In-Depth Haskell Courses!
On Monday, I announced the release of Part 1 of Solve.hs, our newest course focused on problem solving. In addition, this entire week is our Black Friday sale, with two options to save on any of our courses, including the newest one. Your first option is to just use the code BFSOLVE23, which will get you 20% off. You can also get a special discount code by signing up for our mailing list, which will get you 30% off!
Some of our courses are shorter and more focused, while others are longer and cover more ground. Today on the blog I wanted to highlight the longer courses available at Monday Morning Haskell Academy. These go into a lot of detail on many different subjects, so they can fill a large gap in your Haskell awareness. However, each obviously has its own general subject area. Here they are, ordered from most beginner-level to most advanced.
Haskell From Scratch
Our very first course, Haskell From Scratch, is perfect for the curious reader of Haskell who has hardly (if ever) gotten around to writing any lines of Haskell code. Over the course of 7 modules, you'll learn everything you really need to get started writing some code. We'll go over a bit of basic setup, cover Haskell's core syntax extensively, and go over many of the important conceptual ideas that separate Haskell from other languages, like immutability, laziness, and monads. You will get many chances to try your new skills out with an abundant series of practice problems, as well as a final project!
Learn more about it on the course page!
Solve.hs
Solve.hs is our newest course, with the first two modules released this week! It will teach you the fundamental patterns of problem solving in Haskell. You'll start with an in-depth look at Haskell's lists and how we can use them to implement loop patterns. Then you'll learn A LOT of details about Haskell's data structures, including some implementation ideas that are unique to Haskell. It's a great choice if you have a solid grasp on the basic syntax and fundamental concepts of Haskell, but are looking to improve your fluency in the language.
Find out more by going here!
Practical Haskell
Practical Haskell is, by some distance, our longest and most detailed course, with close to 6 hours of lectures and an additional 4 hours of screencast content spread over 5 modules. You'll build different parts of a functioning website, starting with the database connections, then the server, and finally a frontend written in Elm. You'll also learn how to launch this server on Heroku, as well as some tips for best practices in testing.
Take a look on the MMH Academy page!
Conclusion
These courses are all great for helping you take the next big step in your Haskell journey. Until next Monday, you can get 20% off all of them using the code BFSOLVE23 when you check out! And you can get an even bigger discount (30%) if you subscribe to our mailing list! So don't miss out on this opportunity; you've only got 5 more days!
New Course: Solve.hs!
A few weeks ago I hinted that, while I hadn't been able to publish consistently for much of this year, I had been working on something important. And now, I'm finally ready for the big reveal!
Today I am excited to announce the release of Part 1 of my new course Solve.hs. This course is focused on problem solving in Haskell. This announcement also kicks off the Black Friday sale week for Monday Morning Haskell, where you can get 20% off any of our courses with the code BFSOLVE23. And if you subscribe to our mailing list, you'll get an even bigger discount, 30% off on everything, including Solve.hs!
Why Problem Solving?
I settled on problem solving as a great topic for a course because it occupies a good middle ground between, on the one hand, super basic setup and syntax (covered by Setup.hs and Haskell From Scratch, respectively), and on the other hand, more complex practical topics covered by courses like Practical Haskell, Effectful Haskell, and Haskell Brain.
Problem solving is a relatively quick and frictionless path to getting familiar with a language (especially one like Haskell that defies many conventions from more common languages). Once you have your toolchain set up, it's not hard to find problems that will quickly stretch your understanding of the language and even teach you some pretty tricky concepts! With problem solving, it's also easier to do a lot of different problems and get a lot of "reps" in, so to speak.
My own career trajectory with Haskell went pretty quickly from learning the basics to jumping into practical applications like web servers. This is entirely doable, but it does run the risk of pigeon-holing you into some pretty specific areas of understanding, causing you to skip many of the fundamentals.
Problem solving is a great tool to fill this gap. It forces you to learn about new techniques that you might not need on a particular project, but which will help you grow and be able to apply to future projects. Let's learn a bit more about the course itself.
The Course
This course release is exciting for me because, even just considering the first part, it's the longest and most detailed course I've written since Practical Haskell. It doesn't appear to be that long, consisting of only two modules. But by several important metrics, it actually rivals the 7-module Haskell From Scratch. It has nearly as much lecture material, about as many practice problems, and the amount of code you'll actually have to write for the exercises is quite a bit more!
Course | Lecture Time | Practice Problems | Solution LOC* |
---|---|---|---|
Haskell From Scratch | 4h05m | ~180 | 1316 |
Solve.hs | 3h47m | ~180 | 2587 |
*Solution LOC measures the GitHub diff comparing my answers branches to the original
Solve.hs is good for users with many levels of Haskell experience. Beginner and intermediate Haskellers will benefit a lot from learning the core patterns in this first part, and even experienced Haskellers will find some of the practice problems challenging. To see why, let's look at the two modules currently on offer!
Module 1: Lists and Loop Patterns
In the first module, we'll do extensive work with lists. Lists are one of the basic building blocks of Haskell - the simplest collection type you can have. To use them well, and so become "fluent" in Haskell, you need to familiarize yourself with a lot of different patterns of how to use them.
In Module 1, we'll explicitly name these patterns and draw helpful comparisons to code from other languages. This will help you finally understand how to answer questions like "how do I write a for-loop in Haskell?"
You'll learn large parts of the List API by implementing the type from scratch, as well as dozens of its helper functions. This will help you drill in the patterns of how lists work, and what operations we can (and cannot) do efficiently with them.
Module 2: Data Structures
Once you're well-versed in all the things Haskell can do with lists, it will be time to learn about other structures that allow for different efficient operations. This is the subject of Module 2. Now I've written plenty about Data Structures before. In fact there are a couple ebooks you can check out if you want.
But this module goes way deeper. In Solve.hs, we'll look into how some of these structures are implemented under the hood, and specifically how they're implemented differently in Haskell compared to other languages.
There are plenty of practice problems to help you get a feel for each structure. You'll learn about some of my favorite "derived structures" - abstract ideas that use the core structures under the hood, but express specific operations cleanly. On top of that, you'll get to implement a couple different data structures from scratch, which is super helpful for training yourself to understand how these structures perform in real life.
Part 2 - 2024
I wanted to release Part 1 of this course now, since these two modules on their own can be super impactful for beginning and intermediate Haskellers. Also, Advent of Code is coming up, a time when I have focused this blog's attention on problem solving anyway for the last couple years. So this course makes a perfect match for those of you who want to dive into Advent of Code this year, writing all your solutions in Haskell!
However, since problem solving is such a rich field with many areas to explore, I plan to add two more modules to this course in 2024. Module 3 will cover Essential Algorithms, with a special focus on graph algorithms in Haskell. Then Module 4 will go in-depth with Parsing, since many problem solving contests like Advent of Code do require you to parse your problem input, often from odd formats. I also intend to incorporate some material related to scale and performance testing.
After the full course release, Part 1 will continue to be its own option for purchase. There is, however, no risk in buying Part 1 now. Anyone who purchases Part 1 before the full release will receive a discount coupon for the full course based on what they paid for Part 1.
So, if you're interested, head to the course sales page to learn more and enroll in the course! But first make sure to subscribe to the Monday Morning Haskell mailing list, which will give you a 30% discount code for this and all of our other courses! This offer will only be good until Monday, November 27th, so don't wait!
Ballparking Solutions
In last week's article, we went over how to build a simple benchmarking library for ourselves by timing computations. Now most of the time, we're inclined to use benchmarking in a purely reactive way. We write a solution to a problem, and then document how long it takes.
But perhaps the most important use of benchmarking is proactive. In this article, we'll learn how to use ballpark estimates to guide our problem-solving process. This can save you a lot of time when you're working on performance-intensive problems, whether on Advent of Code, Hackerrank, or, of course, in the real world of software development!
The Uses of Ballparking
A ballpark estimate essentially allows us to provide an order of magnitude on the time we expect our solution to take. We should be able to abstractly define the runtime of our solution with Big-O notation. And given this designation, as well as a problem size, we should be able to determine if we'll get an answer within a second, a few seconds, a few minutes, or a few hours.
Using this ballpark estimate, we can then determine if our proposed solution is even feasible, without needing to implement the solution. Whether we're solving a problem on Advent of Code, Hackerrank, Leetcode, or in the real world, we can usually have some idea of the scale of a problem. AoC will let you observe the "large" problem input on which to test your code. Hackerrank will give you an idea of the bounds of various problem parameters.
These problem parameters can, in fact, tell us if a solution is acceptable before we even write that solution. For example, if my input problem size is 10000, roughly how long should my program take if it's O(n)? What about O(n log n)? What about O(n^2)?
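We can turn this kind of reasoning into a quick back-of-the-envelope calculator. As a rough sketch (the names here are our own, and the 10^8 simple-operations-per-second constant is an assumption about typical hardware, not a measurement):

```haskell
-- Assumed throughput: roughly 10^8 simple operations per second
opsPerSecond :: Double
opsPerSecond = 1e8

-- Estimate the runtime (in seconds) of an algorithm with the
-- given complexity function on an input of size n
estimateSeconds :: (Double -> Double) -> Int -> Double
estimateSeconds complexity n = complexity (fromIntegral n) / opsPerSecond

main :: IO ()
main = do
  let n = 10000
  print (estimateSeconds id n)                      -- O(n): about 1e-4 s
  print (estimateSeconds (\x -> x * logBase 2 x) n) -- O(n log n): about 1.3e-3 s
  print (estimateSeconds (\x -> x * x) n)           -- O(n^2): 1.0 s
```

So at n = 10000, an O(n^2) solution is already around a second under this assumption, and anything slower quickly becomes impractical.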
Now it's up to situational judgment to determine what's "acceptable". As a general rule, no one wants to wait around for hours for their code to finish. And yet, in real-world machine learning development, taking several hours to train a model on the latest data is quite commonplace. Hackerrank, however, will reject your solution (or at least not give you full points) if it takes more than a few seconds on the largest input. In Advent of Code, meanwhile, you run the code locally, and you only need to get the answer once. So if your code takes 30 minutes or even an hour to run, you can still use it if you're sure the algorithm is correct (I've done this a few times).
Getting Ballpark Values
So how do we get an idea for these ballpark values? How do we estimate how an algorithm will perform on some data? Well we can use the library we built last time! All we have to do is select some known operations and time how long they take on varying sizes of inputs. So we can choose some candidate functions for different run times:
- O(n) -> Take the sum of a list of integers
- O(n log n) -> Sort a list of values
- O(n^2) -> Take the intersection of two lists of size n
- O(n^3) -> Populate a "cubic" multiplication table
For each operation we can propose different sizes, and see what the times are. We'll do this with 'sum' first. To start, let's write a function to generate a randomized list of integers, given a range and a size:
import Control.Monad (replicateM)
import System.Random (randomRIO)
randomList :: (Int, Int) -> Int -> IO [Int]
randomList rng n = replicateM n (randomRIO rng)
Now we'll generate random lists for many different sizes:
sizes :: [Int]
sizes = [10, 100, 1000, 10000, 100000, 1000000, 10000000]
main :: IO ()
main = do
  listsWithSizes <- zip sizes <$> mapM (randomList (1, 10000)) sizes
  ...
And now we just use our 'timeOp' function from the last article on each list. We'll print out the time it takes, followed by the size:
import qualified System.IO.Strict as S
timeOp :: (NFData b) => (a -> b) -> a -> S.SIO NominalDiffTime
...
main :: IO ()
main = do
  listsWithSizes <- zip sizes <$> mapM (randomList (1, 10000)) sizes
  putStrLn "Sum"
  forM_ listsWithSizes $ \(sz, lst) -> do
    t <- S.run (timeOp sum lst)
    putStr (show t)
    putStrLn $ " (" <> show sz <> ")"
With these sizes, we get the following output:
Sum
0.000039492s (10)
0.000024104s (100)
0.000031986s (1000)
0.000257356s (10000)
0.001830953s (100000)
0.014561602s (1000000)
0.287729319s (10000000)
And so we can see that if our algorithm is O(n), we can get an answer in less than a second, even for extremely large input sizes! If we went up to 100 million, it would start taking multiple seconds.
Larger Size Examples
Obviously though, O(n) is generally the best case scenario we can get, and most problems don't have such a solution. So we can start trying more expensive solutions. So for O(n log n), we'll do sorting. Now that we've got our template, it's easy to write this code:
import qualified Data.List as L
main :: IO ()
main = do
  listsWithSizes <- zip sizes <$> mapM (randomList (1, 10000)) sizes
  putStrLn "Sorting"
  forM_ listsWithSizes $ \(sz, lst) -> do
    t <- S.run (timeOp L.sort lst)
    putStr (show t)
    putStrLn $ " (" <> show sz <> ")"
And we get our results. With the same sizes, we see much longer times.
0.000011084s (10)
0.000028496s (100)
0.000264609s (1000)
0.004021375s (10000)
0.086308336s (100000)
1.691808146s (1000000)
39.815963511s (10000000)
We can sort 10 million items in under a minute, so an O(n log n) algorithm on such a large input would probably be fine for something like Advent of Code, but not for Hackerrank. If your size is limited to one hundred thousand or smaller, such an algorithm will probably work on Hackerrank.
As we increase the difficulty of the algorithm, we can't include as many test sizes, since we start getting prohibitively long times earlier! Let's try list intersection, which, using the naive algorithm in Data.List, is a quadratic algorithm. We'll only run up to 100000 items:
main = do
  listsWithSizes <- zip sizes <$> mapM (randomList (1, 10000)) (take 5 sizes)
  putStrLn "Intersect"
  forM_ listsWithSizes $ \(sz, lst) -> do
    t <- S.run (timeOp (L.intersect lst) lst)
    putStr (show t)
    putStrLn $ " (" <> show sz <> ")"
Here are the results:
Intersect
0.000009277s (10)
0.000092984s (100)
0.005050921s (1000)
0.396993992s (10000)
10.751072816s (100000)
So 10000 items might be doable. However, by intersecting a list with itself, we're being generous. The problem takes longer if the two lists are disjoint:
main = do
  listsWithSizes <- zip sizes <$> mapM (randomList (1, 10000)) (take 5 sizes)
  altLists <- mapM (randomList (10001, 20000)) (take 5 sizes)
  putStrLn "Intersect"
  forM_ (zip listsWithSizes altLists) $ \((sz, lst), lst') -> do
    t <- S.run (timeOp (L.intersect lst) lst')
    putStr (show t)
    putStrLn $ " (" <> show sz <> ")"
Now it takes longer:
Intersect
0.000009074s (10)
0.000167983s (100)
0.010436625s (1000)
1.109146166s (10000)
215.779740028s (100000)
We're over a second for 10000, but 100000 is now quite prohibitive. Keep in mind though, this problem also has low constant factors. So while 10000 seems doable, a good general rule of thumb is that you should look for something faster than O(n^2) for this problem size.
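For instance, a set-based intersection achieves O(n log n). This is a sketch we're adding here (it's not part of the benchmark above), and it assumes an `Ord` constraint; it keeps the first list's order and duplicates, just like `L.intersect`:

```haskell
import qualified Data.Set as Set

-- An O(n log n) alternative to L.intersect: build a Set from the second
-- list (n log n), then filter the first list with log n membership checks.
fastIntersect :: Ord a => [a] -> [a] -> [a]
fastIntersect xs ys = filter (`Set.member` ySet) xs
  where ySet = Set.fromList ys
```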
For one last example, we'll do a cubic algorithm. This function produces a cubic multiplication table:
cubicTable :: [Int] -> [((Int, Int, Int), Int)]
cubicTable ns = [((x, y, z), x * y * z) | x <- ns, y <- ns, z <- ns]
We have to shrink the sizes drastically for this to be tractable at all! Let's just try 10, 100 and 200.
main = do
  listsWithSizes <- zip sizes <$> mapM (randomList (1, 10000)) [10, 100, 200]
  putStrLn "Cubic Table"
  forM_ listsWithSizes $ \(sz, lst) -> do
    t <- S.run (timeOp cubicTable lst)
    putStr (show t)
    putStrLn $ " (" <> show sz <> ")"
Already at size 200, we're taking several seconds.
Cubic Table
0.000397915s (10)
0.553257425s (100)
3.549556318s (200)
So if your problem size is larger than 100, you'll really need to find something better than a cubic algorithm.
With slower algorithms, it gets even worse. You may be able to get away with an exhaustive exponential or factorial algorithm, but only at size 10 or below. These kinds of algorithms are rarely the intended solution to programming problems though, since they are the most brute-force approach possible.
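To make that concrete, consider a hypothetical subset-sum check (our own illustration, not from a specific problem) that examines all 2^n subsequences; at n = 30 that's already over a billion subsets:

```haskell
import Data.List (subsequences)

-- Exhaustive O(2^n) subset-sum: check every subsequence of the input.
-- Only viable for very small n.
hasSubsetSum :: [Int] -> Int -> Bool
hasSubsetSum xs target = any ((== target) . sum) (subsequences xs)
```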
Example: Fences Problem
Here's an example of applying this ballpark methodology. I run through this example in more detail in my Testing Series (including benchmarking). The problem is a Hackerrank question called "John and Fences", where we are essentially looking to calculate the largest rectangular area we can find amidst fence segments of varying heights.
The solution here is recursive. We find the minimum height plank over an interval. We then either take that height multiplied by the length of the interval, or we recurse on the subintervals strictly left and strictly right of that minimum plank. We return the maximum of these three options.
At each recursive step, we need to keep finding the minimum over a particular interval. Naively, this takes O(n) time, and we may have to do this step O(n) times, giving an O(n^2) algorithm.
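Here's a sketch of that naive recursion (the names are hypothetical; the full solution in the series replaces the O(n) minimum scan with a Segment Tree):

```haskell
-- Naive O(n^2) recursion: an O(n) scan for the minimum plank,
-- repeated at up to O(n) recursive steps.
largestRect :: [Int] -> Int
largestRect [] = 0
largestRect hs = maximum
  [ minimum hs * length hs  -- span the whole interval at the minimum height
  , largestRect left        -- strictly left of the minimum plank
  , largestRect right       -- strictly right of the minimum plank
  ]
  where
    minIdx = snd $ minimum (zip hs [0..])  -- index of the minimum plank
    (left, _ : right) = splitAt minIdx hs
```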
However, we can observe the constraints on the problem, and find that n could be as high as 10000.
This is our clue that O(n^2) may not be fast enough. And indeed, to get full credit, you need to find an algorithm that is O(n log n). You can do this with a Segment Tree, which lets you calculate the minimum over any interval in your fence in logarithmic time. You can read the series for a complete solution.
The power of ballparking is that you realize this fact before you start writing your code, skipping the slow implementation altogether.
Conclusion
For more tips and content on problem solving, make sure to subscribe to our mailing list! There's a special deal coming out next week that you won't want to miss!
Quick and Simple Benchmarking
As a programmer, you have to constantly be asking, "How Good is my Code?". There are a lot of different ways code can be "good" or "bad", but one of the key indicators in many cases is the time performance of your code. How long does it take your code to run on particular inputs?
Haskell isn't generally regarded as one of the fastest languages out there; it's not designed specifically with performance in mind. Languages like C, C++, and, at a higher level, Rust, are specifically geared for writing code that runs faster. But this doesn't mean performance is irrelevant in Haskell. As long as you're writing a real world application that people are going to use, you need to be concerned with the user experience, and no user wants to be waiting around for several seconds on simple operations because you didn't bother to optimize your code.
So we want ways to measure the performance of our code, a process also known as benchmarking. The Criterion Library is one of the main tools Haskell offers to help you do this. I've written about this library before in my series on Testing in Haskell.
But the library is a bit difficult to jump into if you're just a beginner or if you're doing something lightweight. It can also take a lot longer to run than you were expecting, since it runs your code many times to get a statistically significant average. So in this article we're going to go over a more lightweight way to benchmark your code. It turns out though that this is a little trickier in Haskell than in other languages, so let's learn some of these nuances.
Timing in Python
Timing a function in Python is rather straightforward. Here's some code we can use to benchmark an operation `op` on a particular `input_arg`:
# time_py.py
import time

def time_op(op, input_arg):
    t1 = time.time()
    result = op(input_arg)
    t2 = time.time()
    return t2 - t1
We get the initial time `t1`, we perform the operation, and then we get the current time again and return the difference.
As a very basic example, we can compare running the `sorted` function on a group of lists of escalating lengths:
xs = [10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000]

for x in xs:
    a = time_op(sorted, list(range(x)))
    print(a)
And running this script, we see that the time increases with the number of elements in the list:
>> python3 time_py.py
2.6226043701171875e-06
1.1920928955078125e-06
5.245208740234375e-06
5.0067901611328125e-05
0.0007715225219726562
0.008384466171264648
0.08414745330810547
0.8780522346496582
Small list sizes don't take very long, and increasing the size of the list 10x increases the time by 10-20x, consistent with the `n log n` nature of sorting.
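We can sanity-check that factor: for an `n log n` algorithm, growing the input 10x multiplies the work by 10 * log(10n) / log(n), which works out to about 12 at these sizes. A quick sketch (a hypothetical helper, not part of the timing script):

```haskell
-- Expected slowdown for an n log n algorithm when the input grows 10x.
slowdown :: Double -> Double
slowdown n = (10 * n * log (10 * n)) / (n * log n)

-- slowdown 1e5 is about 12, since log 1e6 / log 1e5 == 6/5
```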
Notably, these values don't feel like they reflect the total time in our program. This program takes 5-10 seconds to run on my machine, quite a bit more than the cumulative time shown here. But the values are at least directionally correct, indicating a longer time for larger operations.
First Haskell Attempt
Let's try repeating this code in Haskell. We'll need to work in the IO monad, since the equivalent to `time.time()` is a call to `getCurrentTime`. Here's what our code might look like:
import Control.Monad (forM_)
import qualified Data.List as L
import Data.Time.Clock

timeOp :: (a -> b) -> a -> IO NominalDiffTime
timeOp op input = do
  t1 <- getCurrentTime
  let result = op input
  t2 <- getCurrentTime
  return $ diffUTCTime t2 t1

main :: IO ()
main = forM_ sizes $ \x -> do
  let input = [1..x]
  diff <- timeOp L.sort input
  print diff

sizes :: [Int]
sizes =
  [ 10, 100, 1000, 10000
  , 100000, 1000000
  , 10000000, 100000000
  ]
When we run this program, we get some unhelpful results though…
0.000000576s
0.000000767s
0.000000684s
0.000000658s
0.000000751s
0.000000717s
0.00000075s
0.000000678s
All these times appear the same! What's happening?
Laziness at Work
What's happening, of course, is that Haskell's laziness is kicking in. Our program never demonstrates a need to use the values in our sorted lists. So it never actually evaluates these lists and performs the sorting!
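To see the thunk behavior in isolation, here's a small standalone sketch (an illustration we're adding, not part of the timing code). Binding the expression costs nothing; the work only happens when the value is demanded:

```haskell
main :: IO ()
main = do
  let result = sum [1..10000000 :: Int]  -- just builds a thunk, no work yet
  putStrLn "Bound!"                      -- prints immediately
  print result                           -- the sum is computed here, on demand
```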
So let's change things up a bit by returning our result along with the time, and printing the last value of the result list, which will force the evaluation of the whole sorted list.
timeOp :: (a -> b) -> a -> IO (b, NominalDiffTime)
timeOp op input = do
  t1 <- getCurrentTime
  let result = op input
  t2 <- getCurrentTime
  return (result, diffUTCTime t2 t1)

main :: IO ()
main = forM_ sizes $ \x -> do
  let input = [1..x]
  (result, diff) <- timeOp L.sort input
  print (diff, last result)
Our result now looks like this:
(0.000000672s,10)
(0.00000061s,100)
(0.000000363s,1000)
(0.00000042s,10000)
(0.000000661s,100000)
(0.000001089s,1000000)
(0.000001185s,10000000)
(0.000001256s,100000000)
Unfortunately, this is still not what we're looking for! Though there's a bit of increase in time, it's nowhere near the 10-20x increase we would like to see.
Haskell's laziness is still biting us. Our program is evaluating the list, but not when we would like it to. We want the whole evaluation to occur between our timing calls, but this doesn't happen. The complete list is only evaluated at the end of our loop, when we're about to print it out.
Printing Solutions
This can clue us in to some potential solutions though. Since printing is one of the sure-fire ways to ensure an expression is fully evaluated, we can add a `Show` constraint and print the value out within our timing function. This removes the need to return the result from our function:
timeOp :: (Show b) => (a -> b) -> a -> IO NominalDiffTime
timeOp op input = do
  t1 <- getCurrentTime
  let result = op input
  print result
  t2 <- getCurrentTime
  return $ diffUTCTime t2 t1

main :: IO ()
main = forM_ sizes $ \x -> do
  let input = [1..x]
  diff <- timeOp L.sort input
  print diff
While this technically works, if we have non-trivial result values to print (like a ten-million item list), then all the important data gets lost in this spam. Plus, our timing values now include the time it takes to print, which can end up being much larger!
We can get around this spam by instead printing the values to the `/dev/null` file, which simply discards all this information.
import System.IO (IOMode(..), hPutStrLn, withFile)

timeOp :: (Show b) => (a -> b) -> a -> IO NominalDiffTime
timeOp op input = do
  t1 <- getCurrentTime
  let result = op input
  withFile "/dev/null" WriteMode (\h -> hPutStrLn h (show result))
  t2 <- getCurrentTime
  return $ diffUTCTime t2 t1
With a sufficiently large amount of print information though, it is still possible to crash your program by printing to `/dev/null`! I had to remove the one hundred million list size, but then I got the following output:
0.000048873s
0.000017368s
0.000058216s
0.00057228s
0.006825458s
0.09262504s
2.521403946s
Now we're getting values that seem accurate! But let's keep trying to improve how this function works!
Since printing large values can still crash our program, perhaps the more sensible choice is to provide a custom printing function instead of relying on the native `Show` definition. For our example, we can print the "last" value of the list, like we used before. This fully evaluates the result without printing a large amount of information.
timeOp :: (a -> b) -> a -> (b -> String) -> IO NominalDiffTime
timeOp op input toStr = do
  t1 <- getCurrentTime
  let result = op input
  withFile "/dev/null" WriteMode (\h -> hPutStrLn h (toStr result))
  t2 <- getCurrentTime
  return $ diffUTCTime t2 t1

main :: IO ()
main = forM_ sizes $ \x -> do
  let input = [1..x]
  diff <- timeOp L.sort input (show . last)
  print diff
This works, even running the largest size!
0.000153407s
0.000015817s
0.000028232s
0.000279035s
0.002366471s
0.098812637s
1.468213335s
17.183272162s
Notice though that the 10-million size has changed from 2.52 seconds to 1.46 seconds. Percentage-wise, that's actually a large error! There can be a fair amount of variation from run to run. This is why Criterion does many runs. However, our approach is still useful in letting us estimate orders of magnitude on our timing, which is often all you really need in assessing a solution.
Strict IO Library
Now, Haskell does have tools besides printing to fully evaluate expressions and otherwise introduce strictness into our code. One of these is the `seq` function. This takes two arguments and simply returns the second, but it strictly evaluates the first.
-- Returns the second argument, but strictly evaluates the first
seq :: a -> b -> b
This is actually used under the hood of the `foldl'` function, which does a strict left fold, avoiding the memory leak issues of the regular `foldl`:
foldl' f z [] = z
foldl' f z (x:xs) =
  let z' = z `f` x
  -- Use 'seq' to evaluate z' early!
  in seq z' $ foldl' f z' xs
You can also make particular fields on a data type strict, so that they are fully evaluated immediately when an item of that type is constructed. You indicate this with a bang operator in front of the type in the data definition.
-- The fields will be strictly evaluated
data MyStrictType = MyStrictType !Int !String
What I found most useful for our particular use case though is an older library called strict-io. This provides a number of utilities for performing IO actions strictly instead of lazily. It introduces its own `SIO` monad to do this. We'll use 3 different functions from there:
-- Wrap an IO action taking no arguments and make it strict
wrap0 :: IO a -> SIO a

-- Wrap a pure value, evaluating it strictly
return' :: a -> SIO a

-- Run a strict IO action within the normal IO monad
run :: SIO a -> IO a
With these three pieces, our way forward is simple. We use `wrap0` on our calls to `getCurrentTime` to ensure they happen immediately. Then we use `return'` to wrap the evaluation of our operation. And finally, we `run` our `timeOp` function from the IO monad.
import Control.DeepSeq (NFData)
import qualified System.IO.Strict as S
import qualified System.IO.Strict.Internals as S

timeOp :: (NFData b) => (a -> b) -> a -> S.SIO NominalDiffTime
timeOp op input = do
  t1 <- S.wrap0 getCurrentTime
  result <- S.return' (op input) -- return' forces full evaluation via NFData
  t2 <- S.wrap0 getCurrentTime
  return $ diffUTCTime t2 t1

main :: IO ()
main = forM_ sizes $ \x -> do
  let input = [1..x]
  diff <- S.run (timeOp L.sort input)
  print diff
And this gives us the timing values we want, with no need for relying on print as an evaluation crutch.
0.00000917s
0.000011727s
0.000049986s
0.000489229s
0.005072207s
0.102675824s
1.722589101s
20.870199397s
This means we don't need any extra "show"-ing functions as arguments to our timing operation. All we need is an `NFData` constraint on our evaluated type. This exists for most basic types as long as we import `Control.DeepSeq` from the `deepseq` package.
As a weakness though, the `strict-io` package is not part of recent Stack resolver sets, so you'll need it as an `extra-dep` in `stack.yaml`. I have not found a more recent or "official" alternative though. Also, the `wrap0` function comes from a module labeled as `Internals`. The library was mainly geared towards basic IO actions like file reading, not getting the current time, so `wrap0` wasn't intended to be part of the public API.
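For reference, the extra-dep entry might look like this (the resolver and version number here are assumptions; check Hackage for the current strict-io release):

```yaml
# stack.yaml
resolver: lts-20.12
extra-deps:
- strict-io-0.2.2
```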
Conclusion
Now that we have a lighter-weight means of benchmarking our code (as compared to Criterion), we can explore ways that this helps us learn about problem solving! Come back next week and we'll see how!
And make sure to subscribe to our mailing list for more Haskell updates! If you're interested in problem solving content, there will be quite a lot of that coming up!
Big News Soon!
If you’ve ever been a follower of this blog, you may have noticed that I have not been posting much this year - basically nothing in the last 5 months or so. This year I experienced the birth of my first child, and subsequently moved 2500 miles with my family. So unfortunately I haven’t been able to prioritize my weekly blog content.
But I haven’t been completely staying away from Haskell! In fact, one of the other items I decided to prioritize is a new, upcoming course - the largest and most detailed I’ve done since Practical Haskell!
It’s not quite ready for publishing yet, but will be coming soon! So keep your eyes on this space!
Spring Sale: Final Day!
Today is the final day to subscribe and get 20% off any of our paid courses! Here are the potential sale prices you might get:
- Haskell From Scratch | Our Comprehensive Beginners Course | $79.20
- Practical Haskell | Learn about Useful Web Libraries and Project Concepts | $119.20
- Making Sense of Monads | Learn Haskell's Key Concept | $23.20
- Effectful Haskell | Take a Step Further with Monadic Effect Systems | $31.20
- Haskell Brain | Combine TensorFlow and Haskell | $31.20
In addition, you can also take a look at our new free course, Setup.hs. This course teaches you how to set up your basic Haskell toolchain, including IDE integrations!
So if you want that 20% discount code, make sure to subscribe to our mailing list before the end of the day!
This is How to Build Haskell with GNU Make (and why it's worth trying)
In a previous article I showed the GHC commands you need to compile a basic Haskell executable without explicitly using the source files from its dependencies. But when you're writing your own Haskell code, 99% of the time you want to be using a Haskell build system like Stack or Cabal for your compilation needs instead of writing your own GHC commands. (And you can learn how to use Stack in my new free course, Setup.hs).
But part of my motivation for solving that problem was that I wanted to try an interesting experiment:
How can I build my Haskell code using GNU Make?
GNU Make is a generic build system that allows you to specify components of your project, map out their dependencies, and dictate how your build artifacts are generated and run.
I wanted to structure my source code the same way I would in a Cabal-style application, but rely on GNU Make to chain together the necessary GHC compilation commands. I did this to help gain a deeper understanding of how a Haskell build system could work under the hood.
In a Haskell project, we map out our project structure in the `.cabal` file. When we use GNU Make, our project is mapped out in the `makefile`. Here's the Makefile we'll ultimately be constructing:
GHC = ~/.ghcup/ghc/9.2.5/bin/ghc
BIN = ./bin
EXE = ${BIN}/hello
LIB_DIR = ${BIN}/lib
SRCS = $(wildcard src/*.hs)
LIB_OBJS = $(wildcard ${LIB_DIR}/*.o)

library: ${SRCS}
	@mkdir -p ${LIB_DIR}
	@${GHC} ${SRCS} -hidir ${LIB_DIR} -odir ${LIB_DIR}

generate_run: app/Main.hs library
	@mkdir -p ${BIN}
	@cp ${LIB_DIR}/*.hi ${BIN}
	@${GHC} -i${BIN} -c app/Main.hs -hidir ${BIN} -odir ${BIN}
	@${GHC} ${BIN}/Main.o ${LIB_OBJS} -o ${EXE}

run: generate_run
	@${EXE}

TEST_DIR = ${BIN}/test
TEST_EXE = ${TEST_DIR}/run_test

generate_test: test/Spec.hs library
	@mkdir -p ${TEST_DIR}
	@cp ${LIB_DIR}/*.hi ${TEST_DIR}
	@${GHC} -i${TEST_DIR} -c test/Spec.hs -hidir ${TEST_DIR} -odir ${TEST_DIR}
	@${GHC} ${TEST_DIR}/Main.o ${LIB_OBJS} -o ${TEST_EXE}

test: generate_test
	@${TEST_EXE}

clean:
	rm -rf ./bin
Over the course of this article, we'll build up this solution piece-by-piece. But first, let's understand exactly what Haskell code we're trying to build.
Our Source Code
We want to lay out our files like this, separating our source code (the `src` directory) from our executable code (`app`) and our testing code (`test`):
.
├── app
│   └── Main.hs
├── makefile
├── src
│   ├── MyFunction.hs
│   └── MyStrings.hs
└── test
    └── Spec.hs
Here's the source code for our three primary files:
-- src/MyStrings.hs
module MyStrings where
greeting :: String
greeting = "Hello"
-- src/MyFunction.hs
module MyFunction where

modifyString :: String -> String
modifyString x = base <> " " <> base
  where
    base = tail x <> [head x]
-- app/Main.hs
module Main where
import MyStrings (greeting)
import MyFunction (modifyString)
main :: IO ()
main = putStrLn (modifyString greeting)
And here's what our simple "Spec" test looks like. It doesn't use a testing library; it just prints different messages depending on whether or not we get the expected output from `modifyString`.
-- test/Spec.hs
module Main where

import MyFunction (modifyString)

main :: IO ()
main = do
  test "abcd" "bcda bcda"
  test "Hello" "elloH elloH"

test :: String -> String -> IO ()
test input expected = do
  let actual = modifyString input
  putStr $ "Testing case: " <> input <> ": "
  if expected /= actual
    then putStrLn $ "Incorrect result! Expected: " <> expected <> " Actual: " <> actual
    else putStrLn "Correct!"
The files are laid out the way we would expect for a basic Haskell application. We have our "library" code in the `src` directory. We have a single "executable" in the `app` directory. And we have a single "test suite" in the `test` directory. Instead of having a `Project.cabal` file at the root of our project, we'll have our `makefile`. (At the end, we'll actually compare our Makefile with an equivalent `.cabal` file.)
But what does the Makefile look like? Well it would be overwhelming to construct it all at once. Let's begin slowly by treating our executable as a single file application.
Running a Single File Application
So for now, let's adjust `Main.hs` so it's an independent file without any dependencies on our library modules:
-- app/Main.hs
module Main where
main :: IO ()
main = putStrLn "Hello"
The simplest way to run this file is `runghc`. So let's create our first `makefile` rule that will do this. A rule has a name, a set of prerequisites, and then a set of commands to run. We'll call our rule `run`, and have it use `runghc` on `app/Main.hs`. We'll also include `app/Main.hs` as a prerequisite, since the rule should run differently if that file changes.
run: app/Main.hs
	runghc app/Main.hs
And now we can invoke this rule using `make run`, and it will work!
, and it will work!
$ make run
runghc app/Main.hs
Hello
Notice that it prints the command we're running. We can change this by using the `@` symbol in front of the command in our Makefile. We'll do this with almost all our commands:
run: app/Main.hs
	@runghc app/Main.hs
And it now runs our application without printing the command.
Using `runghc` is convenient, but if we want to use dependencies from different directories, we'll eventually need multiple stages of compilation. So we'll want to create two distinct rules: one that generates the executable using `ghc`, and another that actually runs the generated executable.
So let's create a `generate_run` rule that will produce the build artifacts, and then `run` will use them.
generate_run: app/Main.hs
	@ghc app/Main.hs

run: generate_run
	@./app/Main
Notice that `run` can now depend on `generate_run` as a prerequisite, instead of the source file. This also generates three build artifacts directly in our `app` directory: the interface file `Main.hi`, the object file `Main.o`, and the executable `Main`.
It's bad practice to mix build artifacts with source files, so let's use GHC's arguments (`-hidir`, `-odir` and `-o`) to store these artifacts in their own directory called `bin`.
generate_run: app/Main.hs
	@mkdir -p ./bin
	@ghc app/Main.hs -hidir ./bin -odir ./bin -o ./bin/hello

run: generate_run
	@./bin/hello
We can then add a third rule to "clean" our project. This would remove all binary files so that we can do a fresh recompilation if we want.
clean:
	rm -rf ./bin
For one final flourish in this section, we can use some variables. We can make one for the GHC compiler, referencing its absolute path instead of a symlink. This makes it easy to switch out the version if we want. We'll also add variables for our `bin` directory and the `hello` executable, since these are used multiple times.
# Could easily switch versions if desired
# e.g. GHC = ~/.ghcup/ghc/9.4.4/bin/ghc
GHC = ~/.ghcup/ghc/9.2.5/bin/ghc
BIN = ./bin
EXE = ${BIN}/hello

generate_run: app/Main.hs
	@mkdir -p ${BIN}
	@${GHC} app/Main.hs -hidir ${BIN} -odir ${BIN} -o ${EXE}

run: generate_run
	@${EXE}

clean:
	rm -rf ./bin
And all this still works as expected!
$ make generate_run
[1 of 1] Compiling Main (app/Main.hs, bin/Main.o)
Linking ./bin/hello
$ make run
Hello
$ make clean
rm -rf ./bin
So we have some basic rules for our executable. But remember our goal is to depend on a library. So let's add a new rule to generate the library objects.
Generating a Library
For this step, we would like to compile `src/MyStrings.hs` and `src/MyFunction.hs`. Each of these will generate an interface file (`.hi`) and an object file (`.o`). We want to place these artifacts in a specific library directory within our `bin` folder.
We'll do this by means of a new rule, `library`, which will use our two source files as its prerequisites. It will start by creating the library artifacts directory:
LIB_DIR = ${BIN}/lib

library: src/MyStrings.hs src/MyFunction.hs
	@mkdir -p ${LIB_DIR}
	...
Then the only thing left to do is run GHC on both of our source files, using `LIB_DIR` as the destination:
LIB_DIR = ${BIN}/lib

library: src/MyStrings.hs src/MyFunction.hs
	@mkdir -p ${LIB_DIR}
	@ghc src/MyStrings.hs src/MyFunction.hs -hidir ${LIB_DIR} -odir ${LIB_DIR}
Now when we run the target, we'll see that it produces the desired files:
$ make library
$ ls ./bin/lib
MyFunction.hi MyFunction.o MyStrings.hi MyStrings.o
Right now though, if we added a new source file, we'd have to modify the rule in two places. We can fix this by adding a variable that uses `wildcard` to match all the source files in the directory (`src/*.hs`).
LIB_DIR = ${BIN}/lib
SRCS = $(wildcard src/*.hs)

library: ${SRCS}
	@mkdir -p ${LIB_DIR}
	@${GHC} ${SRCS} -hidir ${LIB_DIR} -odir ${LIB_DIR}
While we're learning about `wildcard`, let's make another variable to capture all the produced object files. We'll use this in the next section.
LIB_OBJS = $(wildcard ${LIB_DIR}/*.o)
So great! We're producing our library artifacts. How do we use them?
Linking the Library
In this section, we'll link our library code with our executable. We'll begin by assuming our `Main` file has gone back to its original form with imports, instead of the simplified form:
-- app/Main.hs
module Main where
import MyStrings (greeting)
import MyFunction (modifyString)
main :: IO ()
main = putStrLn (modifyString greeting)
When we try to `generate_run`, compilation fails because it cannot find the modules we're trying to import:
$ make generate_run
...
Could not find module 'MyStrings'
...
Could not find module 'MyFunction'
As we went over in the previous article, the general approach to compiling the `Main` module with its dependencies has two steps:
1. Compile with the `-c` option (to stop before the linking stage), using `-i` to point to a directory containing the interface files.
2. Compile the generated `Main.o` object file together with the library `.o` files to produce the executable.
So we'll be modifying our `generate_run` rule with some extra steps. First of course, it must now depend on the `library` rule. Then our first new command will be to copy the `.hi` files from the `lib` directory into the top-level `bin` directory.
generate_run: app/Main.hs library
	@mkdir -p ./bin
	@cp ${LIB_DIR}/*.hi ${BIN}
	...
We could have avoided this step by generating the library artifacts in `bin` directly. I wanted to have a separate location for all of them though. And while there may be some way to direct the next command to find the interface files in the `lib` directory, none of the obvious ways worked for me.
Regardless, our next step is to modify the `ghc` call in this rule to use the `-c` and `-i` arguments. The rest stays the same:
generate_run: app/Main.hs library
	@mkdir -p ./bin
	@cp ${LIB_DIR}/*.hi ${BIN}
	@${GHC} -i${BIN} -c app/Main.hs -hidir ${BIN} -odir ${BIN}
	...
Finally, we invoke our last `ghc` call, linking the `.o` files together. At the command line, this would look like:
$ ghc ./bin/Main.o ./bin/lib/MyStrings.o ./bin/lib/MyFunction.o -o ./bin/hello
Recalling our `LIB_OBJS` variable from up above, we can fill in the rule in our Makefile like so:
LIB_OBJS = $(wildcard ${LIB_DIR}/*.o)

generate_run: app/Main.hs library
	@mkdir -p ./bin
	@cp ${LIB_DIR}/*.hi ${BIN}
	@${GHC} -i${BIN} -c app/Main.hs -hidir ${BIN} -odir ${BIN}
	@${GHC} ${BIN}/Main.o ${LIB_OBJS} -o ${EXE}
And now our program works as expected! We can clean the project and jump straight to the `make run` rule, since this will run its prerequisites (`make library` and `make generate_run`) automatically.
$ make clean
rm -rf ./bin
$ make run
[1 of 2] Compiling MyFunction (src/MyFunction.hs, bin/lib/MyFunction.o)
[2 of 2] Compiling MyStrings (src/MyStrings.hs, bin/lib/MyStrings.o)
elloH elloH
So we've covered the library and an executable, but most Haskell projects have at least one test suite. How would we implement that?
Adding a Test Suite
Well, a test suite is basically just a special executable. So we'll make another pair of rules, `generate_test` and `test`, that will mimic `generate_run` and `run`. Very little changes, except that we'll make another special directory within `bin` for our test artifacts.
TEST_DIR = ${BIN}/test
TEST_EXE = ${TEST_DIR}/run_test

generate_test: test/Spec.hs library
	@mkdir -p ${TEST_DIR}
	@cp ${LIB_DIR}/*.hi ${TEST_DIR}
	@${GHC} -i${TEST_DIR} -c test/Spec.hs -hidir ${TEST_DIR} -odir ${TEST_DIR}
	@${GHC} ${TEST_DIR}/Main.o ${LIB_OBJS} -o ${TEST_EXE}

test: generate_test
	@${TEST_EXE}
Of note here: at the final step, we're still using `Main.o` instead of `Spec.o`. Since `Spec.hs` is an executable module, it also compiles as `Main`.
But we can then use this to run our tests!
$ make clean
$ make test
[1 of 2] Compiling MyFunction (src/MyFunction.hs, bin/lib/MyFunction.o)
[2 of 2] Compiling MyStrings (src/MyStrings.hs, bin/lib/MyStrings.o)
Testing case: abcd: Correct!
Testing case: Hello: Correct!
So now we have all the different components we'd expect in a normal Haskell project. It's interesting to consider how our `makefile` definition compares against an equivalent `.cabal` file for this project.
Comparing to a Cabal File
Suppose we want to call our project `HaskellMake` and store its configuration in `HaskellMake.cabal`. We'd start our Cabal file with four metadata lines:
cabal-version: 1.12
name: HaskellMake
version: 0.1.0.0
build-type: Simple
Now our library would expose its two modules, using the `src` directory as its root. The only "dependency" is the Haskell `base` package. Finally, `default-language` is a required field.
library
  exposed-modules:
      MyStrings
    , MyFunction
  hs-source-dirs:
      src
  build-depends:
      base
  default-language: Haskell2010
The executable would similarly describe where the files are located, and state a `base` dependency as well as a dependency on the library itself.
executable hello
  main-is: Main.hs
  hs-source-dirs:
      app
  build-depends:
      base
    , HaskellMake
  default-language: Haskell2010
Finally, our test suite would look very similar to the executable, just with a different directory and filename.
test-suite make-test
  type: exitcode-stdio-1.0
  main-is: Spec.hs
  hs-source-dirs:
      test
  build-depends:
      base
    , HaskellMake
  default-language: Haskell2010
And, if we add a bit more boilerplate, we could actually compile our code with Stack! First we need a `stack.yaml` specifying the resolver and the package location:
# stack.yaml
resolver: lts-20.12
packages:
- .
Then we need `Setup.hs`:
-- Setup.hs
import Distribution.Simple
main = defaultMain
And now we could actually run our code!
$ stack build
$ stack exec hello
elloH elloH
$ stack test
Testing case: abcd: Correct!
Testing case: Hello: Correct!
Now, observant readers will note that we don't use any Hackage dependencies in our code - only `base`, which GHC always knows how to find. It would require a lot of work to replicate dependency management ourselves. We could download a `.zip` file with `curl` easily enough, but tracking the whole dependency tree would be extremely difficult.
And indeed, many engineers have spent a lot of time getting this process to work well with Stack and Cabal! So while it would be a useful exercise to try to do this manually with a simple dependency, I'll leave that for a future article.
Comparing the two definitions, the `.cabal` version is undoubtedly more concise and human-readable, but it hides a lot of implementation details. Most of the time, this is a good thing! This is exactly what we expect from tools in general; they should allow us to work more quickly without having to worry about details.
But there are times when we might want to try a more adventurous path, like we've done in this article, that avoids relying too much on modern tooling. So why was this a useful exercise?
What's the Point?
So obviously, there's no chance this Makefile approach is suddenly going to supplant Cabal and Stack for building Haskell projects. Stack and Cabal are "better" for Haskell precisely because they account for the intricacies of Haskell development. In fact, by their design, GHC and Cabal both already incorporate some key ideas and features from GNU Make, especially with avoiding re-work through dependency calculation.
But there's a lot you can learn by trying this kind of exercise.
First of all, we learned about GNU Make. This tool can be very useful if you're constructing a program that combines components from different languages and systems. You could even build your Haskell code with Stack, but combine it with something else in a makefile.
A case in point is my recent work with Haskell and AWS. The commands for creating a Docker image, authenticating to AWS, and deploying it are lengthy and difficult to remember. A makefile can, at the very least, serve as a rudimentary aliasing tool. You could run make deploy and have it automatically rebuild your changes into a Docker image and deploy that to your server.
But beyond this, it's important to take time to deliberately understand how our tools work. Stack and Cabal are great tools. But if they seem like black magic to you, then it can be a good idea to spend some time understanding what is happening at an internal level - like how GHC is being used under the hood to create our build artifacts.
Most of the fun in programming comes in effectively applying tools to create useful programs quickly. But if you ever want to make good tools in the future, you have to understand what's happening at a deeper level! At least a couple times a year, you should strive to go one level deeper in your understanding of your programming stack.
For me this time, it was understanding just a little more about GHC. Next time I might dive into dependency management, or a different topic like the internal workings of Haskell data structures. These kinds of topics might not seem immediately applicable in your day job, but you'll be surprised at the times when deeper knowledge will pay dividends for you.
Getting Better at Haskell
But enough philosophizing. If you're completely new to Haskell, going "one level deeper" might simply mean the practical ability to use these tools at a basic level. If your knowledge is more intermediate, you might want to explore ways to improve your development process. These thoughts can lead to questions like:
1. What's the best way to set up my Haskell toolchain in 2023?
2. How do I get more efficient and effective as a Haskell programmer?
You can answer these questions by signing up for my new free course Setup.hs! This will teach you how to install your Haskell toolchain with GHCup and get you started running and testing your code.
Best of all, it will teach you how to use the Haskell Language Server to get code hints in your editor, which can massively increase your rate of progress. You can read more about the course in this blog post.
If you subscribe to our monthly newsletter, you'll also get an extra bonus - a 20% discount on any of our paid courses. This offer is good for two more weeks (until May 1) so don't miss out!
How to Make ChatGPT Go Around in Circles (with GHC and Haskell)
As part of my research for the recently released (and free!) Setup.hs course, I wanted to explore the different kinds of compilation commands you can run with GHC outside the context of a build system.
I wanted to know…
Can I use GHC to compile a Haskell module without its dependent source files?
The answer, obviously, should be yes. When you use Stack or Cabal to get dependencies from Hackage, you aren't downloading and recompiling all the source files for those libraries.
And I eventually managed to do it. It doesn't seem hard once you know the commands already:
$ mkdir bin
$ ghc src/MyStrings.hs src/MyFunction.hs -hidir ./bin -odir ./bin
$ ghc -c app/Main.hs -i./bin -hidir ./bin -odir ./bin
$ ghc bin/Main.o ./bin/MyStrings.o ./bin/MyFunction.o -o ./bin/hello
$ ./bin/hello
...
But, being unfamiliar with the inner workings of GHC, I struggled for a while to find this exact combination of commands, especially with their arguments.
So, like I did last week, I turned to the latest tool in the developer's toolkit: ChatGPT. But once again, everyone's new favorite pair programmer had some struggles of its own on the topic! So let's start by defining exactly the problem we're trying to solve.
The Setup
Let's start with a quick look at our initial file tree.
├── app
│ └── Main.hs
└── src
├── MyFunction.hs
└── MyStrings.hs
This is meant to look the way I would organize my code in a Stack project. We have two "library" modules in the src directory, and one executable module in the app directory that will depend on the library modules. These files are all very simple:
-- src/MyStrings.hs
module MyStrings where
greeting :: String
greeting = "Hello"
-- src/MyFunction.hs
module MyFunction where
modifyString :: String -> String
modifyString x = base <> " " <> base
where
base = tail x <> [head x]
-- app/Main.hs
module Main where
import MyStrings (greeting)
import MyFunction (modifyString)
main :: IO ()
main = putStrLn (modifyString greeting)
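Before compiling anything, it's worth tracing by hand what output we should expect. Here is the article's own code collapsed into a single self-contained file, with the evaluation spelled out in comments:

```haskell
module Main where

greeting :: String
greeting = "Hello"

-- For x = "Hello":
--   tail x = "ello", [head x] = "H", so base = "elloH"
--   and the result is "elloH elloH"
modifyString :: String -> String
modifyString x = base <> " " <> base
  where
    base = tail x <> [head x]

main :: IO ()
main = putStrLn (modifyString greeting)
```

So every successful build below should print "elloH elloH".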
Our goal is to compile and run the executable with two constraints:
1. Use only GHC (no Stack or Cabal involved)
2. Compile the library separately, so that the Main module could be compiled using only the library's build artifacts, and not the source files.
Trying to Compile
Now, there are two easy ways to compile this code if we're willing to violate our constraints (particularly the second one). If all three files are in the same directory, GHC can immediately find the modules we're importing, so we can just call ghc Main.hs.
└── src
├── Main.hs
├── MyFunction.hs
└── MyStrings.hs
...
$ cd src
$ ghc Main.hs
$ ./Main
elloH elloH
If we keep the original file structure, it's still pretty easy if we're willing to violate constraint 2 by using the library source files in the ghc command:
$ ghc app/Main.hs src/MyStrings.hs src/MyFunction.hs
$ ./Main
elloH elloH
However, with my extra constraint of not using the library source files directly, things got a little more difficult. I originally thought "Hey, let's just include the build artifacts in the GHC call." Each library module generates an interface file (.hi) and an object file (.o). So let's try including those in the GHC call:
$ ghc src/MyStrings.hs src/MyFunction.hs
$ ls -l src/
MyFunction.hi MyFunction.hs MyFunction.o MyStrings.hi MyStrings.hs MyStrings.o
$ ghc app/Main.hs src/MyFunction.hi src/MyFunction.o src/MyStrings.hi src/MyStrings.o
...
Could not find module 'MyStrings'
...
Could not find module 'MyFunction'
As we can see, it doesn't work! Even though we're including the build artifacts as direct arguments of ghc, this doesn't help our main module to find them.
So, feeling a little too lazy to dig deep through GHC documentation, and feeling like this problem might have too many arbitrary constraints to phrase in a Google query, I decided to try our aforementioned new favorite tool.
Enter: ChatGPT
The First Session
(Conversation has been abbreviated for relevance)
Me
How do I link existing .o files with a new source module using the 'ghc' command?
Chat GPT
To link existing object (.o) files with a new source module using the ghc command, you need to include the object files in the list of input files passed to the ghc command.
Assuming you have an existing .o file called example.o and a new source module called MyModule.hs, you can link them together into an executable program using the following command:
$ ghc -o MyProgram MyModule.hs example.o
This command tells ghc to compile and link the MyModule.hs source file with the example.o object file and create an executable program called MyProgram. The -o flag specifies the output file name.
Me
Is it possible to use a ghc command to link a Haskell object file (.o) with a Haskell source file that imports the other module?
Chat GPT
Yes, it is possible to use the ghc command to link a Haskell object file (.o) with a Haskell source file that imports the other module. Here's an example:
Suppose you have two files, MyModule.hs and OtherModule.o, and MyModule.hs imports a module that is defined in OtherModule.o. To link the two files into an executable program, you can use the following command:
$ ghc -o MyProgram MyModule.hs OtherModule.o
This command tells ghc to compile and link the MyModule.hs source file with the OtherModule.o object file, and create an executable program called MyProgram.
Me
Hmmm I don't think this works. I created the following file OtherModule.hs:
module OtherModule where
greeting :: String
greeting = "Hello"
And I'm trying to include it in another file:
module Main where
import OtherModule (greeting)
main :: IO ()
main = putStrLn greeting
I started by compiling OtherModule:
$ ghc src/OtherModule.hs
Then I moved OtherModule.o to the app directory (with MyModule.hs), and tried running this command:
$ cd app
$ ghc -o MyProgram MyModule.hs OtherModule.o
I get the error:
"Could not find module 'OtherModule'"
Chat GPT
I apologize for the confusion in my previous answer. You are correct that simply linking the object file with the executable does not work in this case.
When GHC compiles a module, it generates an interface file (.hi) that contains information about the module's name, type signatures, and other details. The compiled object file (.o) contains the compiled machine code for the functions defined in the module.
When you try to import a module in a source file, GHC looks for the interface file (OtherModule.hi) to find information about the module's name and type signatures. If the interface file is not found, GHC will raise an error.
In your case, when you compile OtherModule.hs, GHC generates an interface file called OtherModule.hi. To compile MyModule.hs, GHC needs to be able to find OtherModule.hi in order to resolve the import of OtherModule. Simply linking the OtherModule.o object file is not enough.
To make this work, you need to compile OtherModule.hs and MyModule.hs together, like this:
ghc -o MyProgram MyModule.hs src/OtherModule.hs
This command tells GHC to compile MyModule.hs and OtherModule.hs together, and create an executable program called MyProgram.
I hope this clears up the confusion. Again, I apologize for the confusion in my previous answer.
Me
Ah, so I need to include both source files, and it can't be done by linking an object file with another source file?
Chat GPT
Yes, that's correct. When you import a module in a source file, GHC needs to be able to find the interface file (.hi) for that module in order to resolve the import. If the interface file is not found, GHC will raise an error.
When you compile a source file with GHC, it generates both an object file (.o) and an interface file (.hi). The object file contains the compiled machine code for the functions defined in the module, while the interface file contains information about the module's name, type signatures, and other details.
When you compile a source file that imports another module, GHC needs to be able to find the interface file for that module in order to compile the importing module. If the interface file is not found, GHC will raise an error.
So in your case, you need to compile both MyModule.hs and OtherModule.hs together so that GHC can generate the necessary interface files and link the resulting object files together to produce the executable program.
It is not possible to directly link an object file with a source file that imports it, as GHC needs the interface file to be present in order to compile the importing module.
Analysis
Suspiciously, ChatGPT suggests a less detailed command than the previous one that failed for me!
$ ghc -o MyProgram MyModule.hs OtherModule.o
It includes only the object file, instead of both the object and interface files. And of course this fails, as I showed above.
Now there's an interesting statement that will come into play later, though I didn't realize it!
Yes, that's correct. When you import a module in a source file, GHC needs to be able to find the interface file (.hi) for that module in order to resolve the import. If the interface file is not found, GHC will raise an error.
I assumed that to "find" the interface file meant providing it on the command line (as in my original command). So with that idea having failed, I fixated on one part of the final statement:
It is not possible to directly link an object file with a source file that imports it...
And concluded that getting the right information from ChatGPT might be a lost cause. It seemed like it was sure this was possible, and then just as sure that it was impossible. But details matter, as we'll see!
Working Around It
So feeling slightly defeated, I decided to poke around the areas of the code where Cabal actually saves packages to see if there was a pattern I could follow. And I managed to cobble together a convoluted workaround.
What I observed was that GHC could use any module from a library in the Cabal package database. On my system's setup, this could be found in the .ghcup cache:
$ cd ~/.ghcup/ghc/9.2.5/lib/ghc-9.2.5/
$ ls
array-0.5.4.0/
base-4.16.4.0/
...
package.conf.d/
$ ls package.conf.d
array-0.5.4.0.conf
base-4.16.4.0.conf
...
And indeed, the package directories did not have source files, only various forms of compiled object and interface files. And each package had a corresponding .conf file in the package.conf.d directory.
So I could...
- Make my own directory in the package database (MyPackage-0.1.0.0)
- Place the .hi and .o files in there
- Create a phony MyPackage-0.1.0.0.conf and put it in package.conf.d (base it on an existing .conf file but make some substitutions)
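For illustration only, such a phony .conf might contain fields like these. Every value below is a guess modeled on the real .conf files in package.conf.d, not something from the article, and the hs-libraries field in particular assumes the object files have been archived into a library:

```
name:            MyPackage
version:         0.1.0.0
id:              MyPackage-0.1.0.0
key:             MyPackage-0.1.0.0
exposed:         True
exposed-modules: MyStrings MyFunction
import-dirs:     ${pkgroot}/MyPackage-0.1.0.0
library-dirs:    ${pkgroot}/MyPackage-0.1.0.0
hs-libraries:    HSMyPackage-0.1.0.0
```

After editing files under package.conf.d, the package cache typically needs refreshing with ghc-pkg recache.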
And this...technically...works. But it feels like quite a hack. Manually manipulating the package database wasn't the solution I had in mind. Surely there's something easier, right?
The Second Session
After a week of sitting on this, I decided to come back to ChatGPT and see if it had any other ideas. I decided to frame my question in a different way, stating a lot more details up front. Lo and behold, it had a different answer this time!
Me
Suppose I have a Haskell module called MyStrings:
module MyStrings where
greeting :: String
greeting = "Hello"
I can compile this module:
$ ghc MyStrings.hs
This generates an interface file, MyStrings.hi, and an object file, MyStrings.o.
Now suppose I have an executable file, Main.hs, that imports MyStrings:
module Main where
import MyStrings (greeting)
main :: IO ()
main = putStrLn greeting
Can I use ghc to compile this Main file by using the .hi and .o files, without including the imported source file, MyStrings.hs, in the command?
Chat GPT
Yes, you can use the .hi and .o files generated by compiling MyStrings.hs to compile Main.hs without including MyStrings.hs in the command. To do so, you can use the -c option to tell GHC to compile Main.hs to an object file, and then use the -o option to link the object files together into an executable:
$ ghc -c Main.hs
$ ghc Main.o MyStrings.o -o myprogram
The first command compiles Main.hs to an object file Main.o, without attempting to link it. The second command links Main.o with MyStrings.o and creates an executable called myprogram. The .hi files are not needed for linking, so you don't need to include them in the command.
Note that in order for this to work, the interface file MyStrings.hi must be present in the same directory as Main.hs when you compile it. If you move MyStrings.hi to a different directory, you'll need to use the -i option to tell GHC where to find it:
$ ghc -i/path/to/MyStrings -c Main.hs
$ ghc Main.o /path/to/MyStrings/MyStrings.o -o myprogram
Here, -i/path/to/MyStrings tells GHC to look for MyStrings.hi in the directory /path/to/MyStrings.
Analysis
So there are two key ingredients that weren't mentioned at all in our first pass: the -c option and the -i option.
Using -c performs compilation without linking. With this option, GHC is able to produce an object file for our Main module without needing its dependent objects. So .o files are unnecessary in this stage.
We still need the .hi files though. But instead of providing them on the command line, we use the -i argument to supply their location. It's an odd argument, because we put the path right after the i, without any spacing.
After we're done with the first phase, then we can link all our object files together.
Solving It
And sure enough, this approach works!
$ ghc src/MyStrings.hs src/MyFunction.hs
$ ghc -c app/Main.hs -i./src
$ ghc app/Main.o ./src/MyStrings.o ./src/MyFunction.o -o hello
$ ./hello
elloH elloH
And if we want to be a little cleaner about putting our artifacts in a single location, we can use the -hidir and -odir arguments to store everything in a bin directory.
$ mkdir bin
$ ghc src/MyStrings.hs src/MyFunction.hs -hidir ./bin -odir ./bin
$ ghc -c app/Main.hs -i./bin -hidir ./bin -odir ./bin
$ ghc bin/Main.o ./bin/MyStrings.o ./bin/MyFunction.o -o ./bin/hello
$ ./bin/hello
elloH elloH
And we're done! Our program is compiling as we wanted it to, without our "Main" compilation command directly using the library source files.
Conclusion
So with that fun little adventure concluded, what can we learn from this? Well first of all, prompts matter a great deal when you're using a Chatbot. The more detailed your prompt, and the more you spell out your assumptions, the more likely you'll get the answer you're looking for. My second prompt was waaay more detailed than my first prompt, and the solution was much better as a result.
But a more pertinent lesson for Haskellers might be that using GHC by itself can be a big pain. So if you're a beginner, you might be asking:
What's the normal way to build Haskell Code?
You can learn all about building and running your Haskell code in our new free course, Setup.hs. This course will teach you the easy steps to set up your Haskell toolchain, and show you how to build and run your code using Stack, Haskell's most popular build system. You'll even learn how to get Haskell integrations in several popular code editors so you can learn from your mistakes much more quickly. Learn more about it on the course page.
And if you subscribe to our monthly newsletter, you'll get a code for 20% off any of our paid courses until May 1st! So don't miss out on that offer!
How to Stump ChatGPT in 1 Easy Step (Hint: Ask a Windows Question)
While doing some research for Setup.hs, my new free course, I confronted a fact of Haskell development with Stack.
The commands for running individual test suites are cumbersome.
For example, if you have a parse-tests suite in your Project.cabal file, you can run it (without running other suites) like so:
$ stack build Project:test:parse-tests
The structure isn't hard once you're used to it, but remembering the exact names of your test suites isn't always easy on larger projects. This leads to the question:
How can I create aliases for 'stack build' commands?
On Unix-based systems, it's quite easy. Just slap something like this in your .bashrc (or equivalent) and you're good to go:
alias ppt="stack build Project:test:parse-tests"
...
$ source ~/.bashrc
$ ppt
(Tests run)
Windows though is, as always, a special beast. And much as I generally prefer to use Windows Subsystem for Linux, I wanted to help any users who might be working with a pure Windows setup. This article will focus on:
How do I enable 'stack' aliases in Windows PowerShell?
Now, when I lay out the code I put in my PowerShell profile, it looks a little dense but fairly innocuous:
Function ProjectParseTest { Invoke-Expression "stack build Project:test:parse-tests" }
New-Alias -Name qpt -Value ProjectParseTest
But discovering the exact formula for getting all this to work was quite difficult. And as we'll see, even ChatGPT, the new kid on the block, struggled with this question. After a while wrestling with Windows documentation, I turned to the chat bot. Let's review the first question:
ChatGPT Question 1
Me
How can I make command aliases in Windows 11 Powershell?
Chat GPT
You can create command aliases in Windows 11 PowerShell by using the New-Alias cmdlet. Here's an example that creates an alias ls for the Get-ChildItem cmdlet:
New-Alias -Name ls -Value Get-ChildItem
This will create an alias named ls that can be used just like the Get-ChildItem cmdlet. For example:
ls .
This will list the contents of the current directory, just as if you had typed Get-ChildItem .
To persist your alias across sessions, you need to add it to your PowerShell profile. To do that, first determine the path to your profile by running the following command:
$ $profile
If the profile does not exist yet, you can create it by running the following command:
New-Item -Type File -Path $profile -Force
Then, open the profile in your favorite text editor and add the New-Alias command. Save the file and restart your PowerShell session. Your alias will be available every time you start PowerShell.
Analysis
Now, everything Chat has told me is technically true. The command it gives successfully creates an alias. Notably though, if you want to try this yourself, you should use a different alias, as PowerShell already includes the alias ls in most recent Windows versions (previously you would need to use the alias dir to get that effect).
Additionally, I found that the PROFILE variable was indeed set, but the file did not exist. In fact, the directory it was supposed to be in (Documents/WindowsPowerShell) did not exist either! So you'll have to make that before creating the file. But placing an alias command in there will persist the alias across sessions.
However, notice that this doesn't quite answer our question, because the subject of the alias was an established Windows expression (Get-ChildItem, also known as a cmdlet), rather than a terminal command we would write.
Let's suppose we just want to alias sb=stack build. It does not work to simply use a string for the command:
There are some clues to our eventual solution in the error message though. It tells us we need a cmdlet, a function, a script file, or an operable program.
Now stack by itself is an operable program. So we could actually do the following:
PS > New-Alias -Name st -Value "stack"
PS > st
(stack usage output)
But we want to provide arguments as part of the alias. So let's see if ChatGPT knows how. We'll see that its answers are not quite right.
ChatGPT Question 2
Me
How do I make the alias run a command, like "stack build"
Chat GPT
You can create an alias that runs a command by using the Invoke-Expression cmdlet to evaluate a string as a PowerShell expression. Here's an example that creates an alias sb for the stack build command:
New-Alias -Name sb -Value { Invoke-Expression "stack build" }
This will create an alias named sb that can be used to run the stack build command. For example:
sb
This will run the stack build command, just as if you had typed it into the PowerShell prompt.
As with the previous example, you can add this alias to your PowerShell profile to persist it across sessions.
Me
I get an error "Cannot evaluate parameter 'Value' because its argument is specified as a script block and there is no input"
How do I specify input in this alias?
Chat GPT
To specify input in a script block alias, you can use the Param keyword to declare input parameters and then pass the input to the underlying command. Here's an example that creates an alias sb for the stack build command, allowing you to specify the target to build as input:
New-Alias -Name sb -Value {
Param ($target)
Invoke-Expression "stack build $target"
}
This will create an alias named sb that can be used to run the stack build command with a specified target. For example:
sb my-target
This will run the stack build my-target command, just as if you had typed it into the PowerShell prompt.
As with the previous examples, you can add this alias to your PowerShell profile to persist it across sessions.
Analysis
Both of the proposed approaches fail! I commented on the issue explicitly after the first response, and then got the same error running the second approach ("Cannot evaluate parameter 'Value' because its argument is specified as a script block..."). So I gave up on my interrogation and started searching for documentation again.
Solving the Problem
I don't remember exactly where I found the solution. I think it involved cobbling together points from multiple Stack Overflow posts. Windows documentation was not very helpful. Let's recall the error message from earlier:
The term 'stack build' is not recognized as the name of a cmdlet, function, script file, or operable program.
We can't make an official Windows PS cmdlet out of our program, nor can we make an operable program with the arguments we want. We could make a script file, but running scripts from PowerShell is surprisingly difficult (there are some extra steps with permissions). The answer is that we can make a function that our alias will refer to.
This function will, incidentally, use the Invoke-Expression idea ChatGPT recommended as well, just not directly as the alias value!
The following code will go in our $PROFILE. First, we make a function that invokes our expression. We can give this function an arbitrary name, but I used a capitalized name (ProjectParseTest) to distinguish it from any potential aliases.
Function ProjectParseTest { Invoke-Expression "stack build Project:test:parse-tests" }
Now we can use this function as the object of a New-Alias call! So we use the command ChatGPT suggested, just substituting our function for the -Value instead of providing a raw Invoke-Expression command.
Function ProjectParseTest { Invoke-Expression "stack build Project:test:parse-tests" }
New-Alias -Name ppt -Value ProjectParseTest
This alias succeeds now, and by putting this in my PowerShell profile, I can persist the alias across sessions!
$ ppt
(Test runs)
Haskell on Windows
Now aliases are just one small piece of the Haskell puzzle. So if you're trying to get started with Haskell, but don't have a Mac, and aren't familiar with Linux, you might want to know:
How do I set up my Haskell toolchain on Windows?
My new free course Setup.hs goes over all the basics of setting up your Haskell toolchain, including how to get started with code hints in three of the most popular editors out there. Plus, every lecture includes a walkthrough video for Windows* so you can learn what kinds of odd quirks you might come across! You can read more about the course in this article.
Plus, if you subscribe to our monthly newsletter, you'll also get a 20% discount code for all our paid courses that is good until May 1! So don't miss out on your chance to learn about Haskell!
*Separate walkthrough videos for MacOS and Linux are also included.
New Free Course: Setup.hs!
You can read all the Haskell articles you want, but unless you write the code for yourself, you'll never get anywhere! But there are so many different tools and ideas floating around out there, so how are you supposed to know what to do? How do you even get started writing a Haskell project? And how can you make your development process as efficient as possible?
The first course I ever made, Your First Haskell Project, was designed to help beginners answer these questions. But over the years, it's become a bit dated, and I thought it would be good to sunset that course and replace it with a new alternative, Setup.hs. Like its predecessor, Setup.hs is totally free!
Setup.hs is a short course designed for many levels of Haskellers! Newcomers will learn all the basics of building and running your code. More experienced Haskellers will get some new tools for managing all your Haskell-related programs, as well as some tips for integrating Haskell features into your code editor!
Here's what you'll learn in the course:
- How to install and manage all of the core Haskell tools (GHC, Cabal, Stack)
- What components you need in your Haskell project and how you can build and run them all
- How to get your editor to use advanced features, like flagging compilation errors and providing autocomplete suggestions.
We'll do all of this in a hands-on way with detailed, step-by-step exercises!
Improvements
Setup.hs makes a few notable updates and improvements compared to Your First Haskell Project.
First, it uses GHCup to install all the necessary tools instead of the now-deprecated Haskell Platform. GHCup allows for seamless switching between the different versions of all our tools, which can be very useful when you have many projects on your system!
Second, it goes into more detail about pretty much every topic, whether that's project organization, Stack snapshots, or extra dependencies.
Third and probably most importantly, Setup.hs will teach you how to get Haskell code hints in three of the most common code editors (VS Code, Vim & Emacs) using Haskell Language Server. Even if these lectures don't cover the particular editor you use, they'll give you a great idea of what you need to search for to learn how. I can't overstate how useful these kinds of integrations are. They'll massively speed up your development and, if you're a beginner, they'll rapidly accelerate your learning.
If all this sounds super interesting to you, head over to the course page and sign up!
Series Spotlight: Monads and Functional Structures!
Every so often I like to spotlight some of the permanent series you can find on the skills page, which contains over a dozen tutorial series for you to follow! This week I’m highlighting my series on Monads and Functional Structures!
Monads are widely understood to be one of the trickier concepts for newcomers to Haskell, since they are very important, very abstract and conceptual, and do not really appear in most mainstream languages. There are a lot of monad tutorials out there on the internet, most of which are either too shallow, or too deep.
This series will help you understand the concept from the ground up, starting with simpler abstract structures like functors and applicative functors.
For a more in depth look at monads and the effects they help us implement in our code, you can check out our two courses, Making Sense of Monads and Effectful Haskell!
GHC 9.6.1 Includes Javascript Backend
Some exciting news this week, as the release of GHC 9.6.1 includes the merger of a couple of different web-based backends - one for Web Assembly and one for Javascript. These features move Haskell in the direction of being a first-class web development language!
From the release notes:
The WebAssembly backend has been merged. This allows GHC to be built as a cross-compiler that targets wasm32-wasi and compiles Haskell code to self-contained WebAssembly modules that can be executed on a variety of different runtimes.
The JavaScript backend has been merged. GHC is now able to be built as a cross-compiler targeting the JavaScript platform.
This is a particularly exciting direction for me, since I’ve been exploring ways to use Haskell in web development for many years, but found a lot of the current approaches require a lot of onboarding work to really get going in a meaningful way. In my Practical Haskell course, I show how to do a basic integration of a Haskell Web Server and an Elm frontend. But I look forward to the day when I can re-do that section of the course entirely with Haskell!
Of course, merging the backends is just a first step - there’s a long way to go. A few caveats mentioned in the release notes as well:
There are a few caveats to be aware of [with the WebAssembly backend]:
To use the WebAssembly backend, one would need to follow the instructions on ghc-wasm-meta. The WebAssembly backend is not included in the GHC release bindists for the time being, nor is it supported by ghcup or stack yet.
The WebAssembly backend is still under active development. It's presented in this GHC version as a technology preview; bugs and missing features are expected.
The [Javascript] backend should be considered a technology preview. As such it is not ready for use in production, is not distributed in the GHC release bindists and requires the user to manually build GHC as a cross-compiler. See the JavaScript backend wiki page on the GHC wiki for the current status, project roadmap, build instructions and demos.
Both of these backends are technology previews, meaning they’re only ready for the most adventurous Haskellers to start experimenting - a lot of setup work is still required. But it’s certainly an important step in the right direction! Since these are included with GHC 9.6.1, improvements are possible in further minor releases to GHC 9.6, rather than needing to wait for the next major release of GHC 9.8.
Adding a Database to our AWS Server
In the last few articles on the blog, we've been exploring how to launch a Haskell web server using AWS. Here are the steps we've done so far:
- Create a local Docker Image
- Upload the Docker Image to ECR
- Deploy your Server using Elastic Beanstalk
In this final part of the series, we're going to learn to attach a database to our application.
There are a few gotchas to this. Setting up the database for first time use is a bit tricky, because we have to do some initial migrations. Then we need to use environment variables to ensure it works both locally and on the remote server. Let's get started.
A Basic Schema
Let's first assume we have a super basic schema using the Persistent library. (If you want some details on how this works, see our Real World Haskell series). We'll just have one type in our database, and users will use server endpoints to create or fetch these "text entries".
import Database.Persist.Sql
import qualified Database.Persist.TH as PTH
import Data.Text (Text)
PTH.share [PTH.mkPersist PTH.sqlSettings, PTH.mkMigrate "migrateAll"] [PTH.persistLowerCase|
  TextEntry sql=text_entries
    text Text
|]
An important product of this Template Haskell sequence is the migrateAll function, which will run the proper commands to migrate a Postgres database to fit our schema by creating tables.
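For context, the quasiquote also generates (roughly) a plain record type for our entity. The following is just a sketch of the generated code, not something you write yourself; the real generated code also includes Key types and PersistEntity instances:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

-- Roughly the record Persistent generates for the schema above:
-- one field per schema column, prefixed with the entity name.
data TextEntry = TextEntry
  { textEntryText :: Text
  } deriving (Show, Eq)
```

Endpoint handlers can then construct values like TextEntry "hello" and store them with the usual persistent functions.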
Whenever we first create a database, we have to make sure it's migrated. But before we even do that we have to make sure we've created a database for Postgres to use! Let's see the commands we need for this, and how to run them in Haskell.
Running Setup Commands
When you install Postgres on your machine, you'll have separate "databases" on your system to help you keep your data separate. This allows each database to have its own "users" table without any name conflicts, for example. By default, Postgres comes installed with a database called postgres.
But we don't want to use this database to store our data; we want to create a separate one. We should create this database the first time we run the server, but otherwise just make sure its migrations are up to date.
Now, the command we would run to create this database is simple:
CREATE DATABASE quiz;
But we can first run this command to see if this database already exists:
SELECT datname FROM pg_database WHERE datname = 'quiz';
Both of these commands assume we are connected to the postgres database.
Since these first two instructions are raw SQL commands, we can run them using the postgresql-simple library. Here's some code to do this.
import Control.Monad (void, when)
import Data.ByteString.Char8 (pack)
import Database.PostgreSQL.Simple

createDBIfMissing :: String -> IO ()
createDBIfMissing connString = do
  connection <- connectPostgreSQL (pack connString)
  putStrLn "Checking/Creating 'quiz' Database"
  let checkQuery = "SELECT datname FROM pg_database WHERE datname = 'quiz';"
  (checkResult :: [Only String]) <- query_ connection checkQuery
  when (null checkResult) $ do
    putStrLn "Not found! Creating 'quiz' database!"
    let createQuery = "CREATE DATABASE quiz;"
    void $ execute_ connection createQuery
When we run checkQuery, it checks whether the quiz database exists. If the result is an empty list, we run the additional command to create our database.
Once we have this function, we can write a wrapper that will create the database and then migrate it for our schema. Here's what this wrapper looks like:
import Control.Monad.Logger (LoggingT, runStdoutLoggingT)
import Control.Monad.Reader (runReaderT)
import Database.Persist.Postgresql (withPostgresqlConn)

migrateDb :: String -> String -> IO ()
migrateDb baseConnString quizConnString = do
  createDBIfMissing baseConnString
  putStrLn "Migrating Database"
  runPG quizConnString (runMigration migrateAll)

runPG :: String -> SqlPersistT (LoggingT IO) a -> IO a
runPG connString action = runStdoutLoggingT $
  withPostgresqlConn (pack connString) $ \backend ->
    runReaderT action backend
Notice migrateDb takes two different connection strings. One is for the base (postgres) database; the other is for our new quiz database. The creation queries run on the first, and the migration runs on the second.
But how do we use these functions within our server?
Loading the URI
When we kick off our server, we have to load the database URI for our Postgres database. We'll use the format {host}:{port}. If you're running locally, this would just be localhost:5432. But when we deploy the server, we'll use a different URI. So let's write a function to load the host and port (separated by a colon) from an environment variable named DATABASE_URI.
import System.Environment (lookupEnv)

loadDatabaseEnv :: IO (String, String)
loadDatabaseEnv = do
  dbEnv <- lookupEnv "DATABASE_URI"
  case dbEnv of
    Just uri | ':' `elem` uri ->
      let (host, port) = span (/= ':') uri
      in  return (host, drop 1 port)
    _ -> return ("localhost", "5432")
Now we need to construct the full Postgres connection string, which has the following general format:
host={host} port={port} dbname={dbname} user={user} password={password}
As a default, you can often just have the username and password both be postgres (though of course this isn't recommended for a serious database). Let's make a function to substitute in the other values:
mkPostgresUri :: String -> String -> String -> String
mkPostgresUri host port dbname =
  "host='" <> host <> "' port=" <> port <> " dbname='" <> dbname <> "' user='postgres' password='postgres'"
Finally, we'll pull our different pieces together, get both URIs, and launch our server. In my example, I'm using a Servant server (more details on that in this article), and this will often require passing the database string as an argument.
server :: String -> Server QuizAPI
server connString = ...

runServer :: IO ()
runServer = do
  (host, port) <- loadDatabaseEnv
  let baseConnString = mkPostgresUri host port "postgres"
  let quizConnString = mkPostgresUri host port "quiz"
  migrateDb baseConnString quizConnString
  putStrLn "Running Server!"
  run 8080 (serve api (server quizConnString))
Having made all these modifications to our server, we of course have to rebuild and redeploy our Docker image. We can create the new local image with:
docker build -t quiz-server .
Then for more detailed instructions on deploying it, refer to part 2 and part 3 of this series!
When you deploy the server, you'll find it crashes, of course, because we haven't configured the database yet! So let's get to the real meat of this article: setting up the database on AWS!
Create a Database with RDS
This process is not actually too challenging. The first thing we're going to do is use RDS (Relational Database Service) to set up our database. This is easily done from the AWS console.
- Select the RDS service
- Hit the orange "Create Database" button
- Go through the creation wizard, making sure to select "Postgres" and the "Free Tier" option (assuming you're just making a test app).
Most of the default options are fine, but as I mentioned above, I specified postgres for the username and password of the database. I also unchecked the box for "Performance Insights", since this could lead to additional billing charges if you forget to turn it off.
Once you've created your database, you can then click the "databases" link on the sidebar, and select your new database. On that screen, you'll be able to see the "endpoint" and "port" of your database. These are the values you'll need for your environment!
Add Environment Variable
To connect your environment to the database, now you just have to add an environment variable! To do this, you have to access the configuration from the web portal:
- Go to the Elastic Beanstalk service
- Select "Environments" from the sidebar and then click the environment you have running your server.
- Click on the "Configuration" link on the side, and then select the "Edit" button in the "Software" section.
- At the very bottom, you'll find the "Environment Properties" section. Fill in DATABASE_URI as the key, and the {host}:{port} combination you got from your database in RDS as the value.
- Click "Apply" to make the change!
By adding an environment variable, you are changing the configuration of your server, so it will reboot. Once it relaunches, you should find that it works, and you can persist information from your database!
Conclusion
Hopefully this series has helped you learn how to deploy your Haskell code to AWS! If you'd like to see this article in video form, you can check out our YouTube video covering these steps!
For more tips on creating a "Real World" application, you can read our series on web skills! You can also download our Haskell Production checklist for some ideas of other libraries and tools you can use to improve your Haskell!
Deploying a Haskell Server to AWS
In the last few articles, we've been talking about how to deploy a Haskell application using AWS. This is part 3 of the series. So if you haven't done parts 1 & 2, you should start there so you can follow along!
In Part 1, we wrote a Dockerfile and created a local Docker image containing a simple program for a Haskell web server.
In Part 2, we pushed our container image to the AWS container registry (ECR). Notably, this involved creating an AWS account, downloading the AWS command line tools, and authenticating on the command line. We'll run a couple more of these commands today, so hopefully you're still authenticated!
Now that our container is uploaded, deploying it is fairly straightforward, though it requires a couple of new concepts, as we'll see.
Adding ECR Permission
Before we get started, there's one step we have to take on the web portal. You need to give Elastic Beanstalk permission to download your ECR containers. You can do this using the IAM service from the AWS portal. Then follow these steps:
- Select "roles" on the left hand menu.
- Select "aws-elasticbeanstalk-ec2-role" in the list in the middle of the screen.
- Click "Add Permissions"
- Search for and select "AmazonEC2ContainerRegistryReadOnly"
Now let's get into the steps on our local machine.
Configuration File
There are multiple approaches to deploying a Docker container, but the one that worked most easily for me was to create a file called Dockerrun.aws.json. (For other methods, refer to the documentation.) This approach involves a counter-intuitive idea: we're going to create a separate directory outside of our main project directory. We'll call it remote.
~/Quiz $ cd ..
~/ $ mkdir remote && cd remote
In this directory, we'll make a single file called Dockerrun.aws.json. This will, of course, be a JSON file: a very simple configuration file telling our application to use the Docker image we pushed last time to ECR. We start by specifying the version of the configuration (which is 1, because we're only using a single container).
{
"AWSEBDockerrunVersion": "1",
...
}
Now we'll tell it to use the Docker image we pushed last time by giving the URI under the Image object:
{
"AWSEBDockerrunVersion": "1",
"Image": {
"Name": "165102442442.dkr.ecr.us-west-2.amazonaws.com/quiz-server"
},
...
}
Finally, we'll specify the port, similar to a Dockerfile. We'll use 8080 for both the "Container" port and the "Host" port.
{
"AWSEBDockerrunVersion": "1",
"Image": {
"Name": "165102442442.dkr.ecr.us-west-2.amazonaws.com/quiz-server"
},
"Ports": [{
"ContainerPort": 8080,
"HostPort": 8080
}]
}
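A stray comma in this file can break the whole deployment, so it's worth verifying it parses as valid JSON before committing. A quick sketch, assuming python3 is available (any JSON validator works just as well; the heredoc below simply recreates the same file shown above so the check is self-contained):

```shell
# Write the deployment config (same content as above) and validate it.
cat > Dockerrun.aws.json <<'EOF'
{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "165102442442.dkr.ecr.us-west-2.amazonaws.com/quiz-server"
  },
  "Ports": [{
    "ContainerPort": 8080,
    "HostPort": 8080
  }]
}
EOF
# json.tool exits non-zero on malformed JSON, so this only prints on success.
python3 -m json.tool Dockerrun.aws.json > /dev/null && echo "valid JSON"
```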
This is the only file we need in this directory! So now let's see what commands we need to run.
Creating the Application
Now we have two more steps that can largely be accomplished on the command line. First, we have to create an application. Then we have to create an environment to use for that application.
Before we can create an application though, we have to create a Git repository, just to store our single file! That's how AWS figures out what to push for configuration.
~/remote $ git init
~/remote $ git add .
~/remote $ git commit -m "First Commit"
Now we can create the application using the eb init command. We'll give our application the name quiz-server.
~/remote $ eb init -p docker quiz-server
You can then see your application on the web portal by accessing the "Elastic Beanstalk" service and clicking the "Applications" tab on the left menu.
Creating the Environment
Now we have to create an environment for our application. When first creating this environment, we use the eb create command. We'll give this environment the name quiz-server-env.
~/remote $ eb create quiz-server-env
This will take a while to deploy. But once it's done, you should be able to see it by clicking the "Environments" tab from the previous screen in the web portal. This will also show you the URL you can use to access your server. It's now successfully deployed!
Debugging
Sometimes, your deployment might fail. For example, you might misspell the name of your container. If you click on your environment (from the "Environments" tab), then you'll be able to access the "Logs" on the left hand menu. This can help you debug.
If you need to change your configuration file, you'll need to commit it, though you don't need to push it to any remote repository. You instead use eb deploy to push your changes.
~/remote $ git add Dockerrun.aws.json
~/remote $ git commit -m "New Commit"
~/remote $ eb deploy
Now the deployment process should start again!
Video
You can also watch our YouTube video to see all these steps in action!
Conclusion
You now have enough information to deploy a Haskell web application to AWS! We'll have one more installment in this series about adding a database to our application, so stay tuned for that! In the meantime, subscribe to our monthly newsletter so you can stay up to date with all the latest news!