Writing Our Own Structure: Tries in Haskell & Rust

Sep 22

In the last few weeks we’ve studied a few different graph problems. Graphs are interesting because they are a derived structure that we can represent in different ways to solve different problems. Today, we’ll solve a LeetCode problem that actually focuses on writing a data structure ourselves to satisfy certain requirements! Next week, we’ll use this structure to solve a problem.

If you want to improve your Haskell Data Structure skills, both with built-in types and in making your own types, your should check out Solve.hs, our problem solving course. Module 2 is heavily focused on data structures, so it will clear up a lot of blind spots you might have working with these in Haskell!

The Problem

Unlike our previous problems, we’re not trying to solve some peculiar question formulation with inputs and outputs. Our task today is to implement the basic functions for a Trie data structure. A Trie (pronounced “try”) is also known as a Prefix tree. They are most often used in the context of strings (though other stream-like types are also possible). We’ll make one that is effectively a container of strings that efficiently supports 3 operations:

Insert - Add a new word into our set
Search - Determine if we have previously inserted the given word into our tree
Starts With - Determine if we have inserted any word that has the given input as a prefix

The first two operations are typical of tree sets and hash sets, but the third operation is distinctive for a Trie.

We’ll implement these three functions, as well as provide a means of constructing an empty Trie. We’ll work with the constraint that all our input strings consist only of lowercase English letters.

The Algorithm

If at first you pronounced “Trie” like “Tree”, you’re not really wrong. Our core implementation strategy will be to create a recursive tree structure. It’s easiest if we start by visualizing a trie. Here is a tree structure corresponding to a Trie containing the words “at”, “ate”, “an”, “bank” and “band”.

:
       _
     /   \
    a     b
   / \     \
  t*  n*    a
 /           \
e*            n
             / \
            k*  d*

The top node is a blank space _ representing the root node. All other nodes in our tree correspond to letters. When we trace a path from the root to any node, we get a valid prefix of a word in our Trie. A star (*) indicates that a node corresponds to a complete word. Note that interior nodes can be complete, as is the case with at.

This suggests a structure for each node in the Trie. A node should store a boolean value telling us if the node completes a word. Other than that, all it needs is a map keying from characters to other nodes further down the tree.

We can use this structure to write all our function implementations in relatively simple recursive terms.

Haskell

Recursive data structures and functions are very natural in Haskell, so we’ll start with that implementation. We’ll make our data type and provide it with the two fields…the boolean indicating the end of a word, and the map of characters to additional Trie nodes.

data Trie = Trie Bool (M.Map Char Trie)

Note that even though a node is visually represented by a particular character, we don’t actually need to store the character on the node. The fact that we arrive at a particular node by keying on a character from its parent is enough.

Now let’s write our implementations, starting with insert. In Haskell, we don’t write functions “on” the type because we can’t mutate expressions. We write functions that take one instance of the type and return another. So our insertTrie function has this signature:

insertTrie :: String -> Trie -> Trie

We want to build a recursive structure, and we have an input (String) that breaks down recursively. This means we have two cases to deal with in this function. Either the string is empty, or it is not:

insertTrie :: String -> Trie -> Trie
insertTrie [] (Trie ends subs) = ...
insertTrie (c : cs) (Trie ends subs) = ...

If the string is empty, we don’t need to do much. We return a node with the same sub-tries, but its Bool field is True now! It’s good to remark on certain edge cases. For example, this tells us that we can insert the “empty” string into our Trie. We would just mark the “root” node as True!

insertTrie :: String -> Trie -> Trie
insertTrie [] (Trie _ subs) = Trie True subs
insertTrie (c : cs) (Trie ends subs) = ...

In the recursive case, we’ll be making a recursive call on a particular “sub” Trie. We want to “lookup” if a Trie for the character c already exists. If not, we’ll make a default one (with False and empty sub-tries map).

insertTrie :: String -> Trie -> Trie
insertTrie [] (Trie _ subs) = Trie True subs
insertTrie (c : cs) (Trie ends subs) =
  let sub = fromMaybe (Trie False M.empty) (M.lookup c subs)
  ...

Now we recursively insert the rest of the input into this sub-Trie:

insertTrie :: String -> Trie -> Trie
insertTrie [] (Trie _ subs) = Trie True subs
insertTrie (c : cs) (Trie ends subs) =
  let sub = fromMaybe (Trie False M.empty) (M.lookup c subs)
      newSub = insertTrie cs sub
  ...

Finally, we map this newSub into the original Trie, using c as the key for it:

insertTrie :: String -> Trie -> Trie
insertTrie [] (Trie _ subs) = Trie True subs
insertTrie (c : cs) (Trie ends subs) =
  let sub = fromMaybe (Trie False M.empty) (M.lookup c subs)
      newSub = insertTrie cs sub
  in  (Trie ends (M.insert c newSub subs))

The search and startsWith functions follow a similar pattern, pattern matching on the input string. With search, an empty string keys us to look at the Bool field for our Trie. If it’s True, then the word we are searching for was inserted into our Trie:

searchTrie :: String -> Trie -> Bool
searchTrie [] (Trie ends _) = ends
searchTrie (c : cs) (Trie _ subs) = ...

If not, we’ll check for the sub-Trie using the character c. If it doesn’t exist, then the word we’re looking for isn’t in our Trie. If it does, we recursively search for the “rest” of the string in that Trie:

searchTrie :: String -> Trie -> Bool
searchTrie [] (Trie ends _) = ends
searchTrie (c : cs) (Trie _ subs) = case M.lookup c subs of
  Nothing -> False
  Just sub -> searchTrie cs sub

Finally, startsWith is almost identical to search. The only difference is that if we reach the end of the input word, we always return True, as it only needs to be a prefix:

startsWithTrie :: String -> Trie -> Bool
startsWithTrie [] _ = True
startsWithTrie (c : cs) (Trie _ subs) = case M.lookup c subs of
  Nothing -> False
  Just sub -> startsWithTrie cs sub

There is one interesting case here. We always consider the empty string to be a valid prefix, even if we haven’t inserted anything into our Trie. Perhaps this doesn’t make sense to you, but LeetCode accepts this logic with our Rust solution. You could work around to accommodate it, but it results in code that is less clean.

Here’s the full Haskell solution:

data Trie = Trie Bool (M.Map Char Trie)

insertTrie :: String -> Trie -> Trie
insertTrie [] (Trie _ subs) = Trie True subs
insertTrie (c : cs) (Trie ends subs) =
  let sub = fromMaybe (Trie False M.empty) (M.lookup c subs)
      newSub = insertTrie cs sub
  in  (Trie ends (M.insert c newSub subs))

searchTrie :: String -> Trie -> Bool
searchTrie [] (Trie ends _) = ends
searchTrie (c : cs) (Trie _ subs) = case M.lookup c subs of
  Nothing -> False
  Just sub -> searchTrie cs sub

startsWithTrie :: String -> Trie -> Bool
startsWithTrie [] _ = True
startsWithTrie (c : cs) (Trie _ subs) = case M.lookup c subs of
  Nothing -> False
  Just sub -> startsWithTrie cs sub

Rust

The Rust solution follows the same algorithmic ideas, but the code looks quite a bit different. Rust allows mutable data structures, so it looks a bit more like your typical object oriented language in making structures. But there are some interesting quirks! Here’s the frame LeetCode gives you to work with:

struct Trie {
    ...
}

impl Trie {
    fn new() -> Self {
        ...
    }
    
    fn insert(&mut self, word: String) {
        ...
    }
    
    fn search(&self, word: String) -> bool {
        ...
    }

    fn starts_with(&self, prefix: String) -> bool {
        ...
    }
}

With Rust we define the fields of the struct, and then create an impl for the type with all the relevant functions. Each “class” method takes a self parameter, somewhat like Python. This can be mutable or not. A Rust “constructor” is typically done with a new function, as you see.

Despite Rust’s trickiness with ownership and borrowing, there are no obstacles to making a recursive data type in the same manner as our Haskell implementation. Here’s our struct definition, as well as the constructor:

use std::collections::HashMap;

struct Trie {
    endsWord: bool,
    nodes: HashMap<char, Trie>
}

impl Trie {
    fn new() -> Self {
        Trie {
            endsWord: false,
            nodes: HashMap::new()
        }
    }
    ...
}

When it comes to the main functions though, we don’t want to make them directly recursive. Each takes a String input, and constructing a new String that pops the first character actually isn’t efficient. Rust is also peculiar in that you cannot index into strings, due to ambiguity arising from character encodings. So our solution will be to use the Chars iterator to efficiently “pop” characters while being able to examine the characters that come next.

So let’s start by making ...It versions of all our functions that take Chars iterators. We’ll be able to call these functions recursively. We invoke each one from the base function by calling .chars() on the input string.

use std::collections::HashMap;
use std::str::Chars;

struct Trie {
    endsWord: bool,
    nodes: HashMap<char, Trie>
}

impl Trie {

    fn new() -> Self {
        Trie {
            endsWord: false,
            nodes: HashMap::new()
        }
    }
    
    fn insert(&mut self, word: String) {
        self.insertIt(word.chars());
    }

    fn insertIt(&mut self, mut iter: Chars) {
        ...
    }
    
    fn search(&self, word: String) -> bool {
        return self.searchIt(word.chars());
    }

    fn searchIt(&self, mut iter: Chars) -> bool {
        ...
    }

    fn starts_with(&self, prefix: String) -> bool {
        return self.startsWithIt(prefix.chars());
    }

    fn startsWithIt(&self, mut iter: Chars) -> bool {
        ...
    }
}

Now let’s zero in on these implementations, starting with insertIt. First we pattern match on the iterator. If it gives us None, we just end self.endsWith = true and we’re done.

impl Trie {
    ...
    fn insertIt(&mut self, mut iter: Chars) {
        if let Some(c) = iter.next() {
            ...
        } else {
            self.endsWord = true;
        }
    }
}

Now, just like in Haskell, if this node doesn’t have a sub-Trie for the character c yet, we insert a new Trie for c. Then we recursively call insertIt on this “subTrie”.

impl Trie {
    fn insertIt(&mut self, mut iter: Chars) {
        if let Some(c) = iter.next() {
            if !self.nodes.contains_key(&c) {
                self.nodes.insert(c, Trie::new());
            }
            if let Some(subTrie) = self.nodes.get_mut(&c) {
                subTrie.insertIt(iter);
            }
        } else {
            self.endsWord = true;
        }
    }
}

That’s it for insertion. Now for searching, we’ll follow the same pattern matching protocol. If the Chars iterator is empty, we just check if this node has endsWith set or not:

impl Trie {
   fn searchIt(&self, mut iter: Chars) -> bool {
        ...
        } else {
            return self.endsWord;
        }
    }
}

We check again for a sub-Trie under the key c. If it doesn’t exist, we return false. If it does, we just make a recursive call!

impl Trie {
   fn searchIt(&self, mut iter: Chars) -> bool {
            if let Some(subTrie) = self.nodes.get(&c) {
                return subTrie.searchIt(iter);
            } else {
                return false;
            }
        } else {
            return self.endsWord;
        }
    }
}

And with startsWith, it’s the same pattern. We do exactly the same thing as search, except that the else case is unambiguously true.

impl Trie {
    fn startsWithIt(&self, mut iter: Chars) -> bool {
        if let Some(c) = iter.next() {
            if let Some(subTrie) = self.nodes.get(&c) {
                return subTrie.startsWithIt(iter);
            } else {
                return false;
            }
        } else {
            return true;
        }
    }
}

Here’s our final Rust implementation:

use std::collections::HashMap;
use std::str::Chars;

struct Trie {
    endsWord: bool,
    nodes: HashMap<char, Trie>
}

impl Trie {

    fn new() -> Self {
        Trie {
            endsWord: false,
            nodes: HashMap::new()
        }
    }
    
    fn insert(&mut self, word: String) {
        self.insertIt(word.chars());
    }

    fn insertIt(&mut self, mut iter: Chars) {
        if let Some(c) = iter.next() {
            if !self.nodes.contains_key(&c) {
                self.nodes.insert(c, Trie::new());
            }
            if let Some(subTrie) = self.nodes.get_mut(&c) {
                subTrie.insertIt(iter);
            }
        } else {
            self.endsWord = true;
        }
    }
    
    fn search(&self, word: String) -> bool {
        return self.searchIt(word.chars());
    }

    fn searchIt(&self, mut iter: Chars) -> bool {
        if let Some(c) = iter.next() {
            if let Some(subTrie) = self.nodes.get(&c) {
                return subTrie.searchIt(iter);
            } else {
                return false;
            }
        } else {
            return self.endsWord;
        }
    }

    fn starts_with(&self, prefix: String) -> bool {
        return self.startsWithIt(prefix.chars());
    }

    fn startsWithIt(&self, mut iter: Chars) -> bool {
        if let Some(c) = iter.next() {
            if let Some(subTrie) = self.nodes.get(&c) {
                return subTrie.startsWithIt(iter);
            } else {
                return false;
            }
        } else {
            return true;
        }
    }
}

Conclusion

It’s always interesting to practice making recursive data structures in a new language. While Rust shares some things in common with Haskell, making data structures still feels more like other object-oriented languages than Haskell. Next week, we’ll put our Trie implementation to use by solving a problem that requires this data structure!

To learn more about writing your own data structures in Haskell, take our course, Solve.hs! Module 2 focuses heavily on data structures, and you’ll learn how to make some derived data structures that will improve your programs!

James Bowen