Apply the Trie: Word Search

Sep 29

Today will be a nice culmination of some of the work we’ve been doing with data structures and algorithms. In the past few weeks we’ve covered graph algorithms, particularly Depth First Search. And last week, we implemented the Trie data structure from scratch. Today we’ll solve a “Hard” problem (according to LeetCode) that pulls these pieces together!

For a comprehensive study of data structures and algorithms in Haskell, you should take a look at our course, Solve.hs. You’ll spend a full module on data structures in Haskell, and then another module learning about algorithms, especially graph algorithms!

The Problem

Today’s problem is Word Search II. In the first version of this problem, LeetCode asks you to determine if we can find a single word in a grid. In this second version, we receive an entire list of words, and we have to return the subset of those words that can be found in the grid.

Now the “search” mechanism on this grid is not simply a straight line search. We’ll use a limited Boggle Search. From each letter in a word, we can make any movement to the next letter, as long as it is horizontal or vertical (Boggle also allows diagonal, but we won’t).

So here’s an example grid:

CAND
XBIY
TENQ

Words like “CAN” and “TEN” are obviously allowed. But we can also use “CAB”, even though it doesn’t form a single straight line. Even better, we can use “CABINET”, snaking through all 3 rows. However, a word like “DIN” is disallowed since it would require a diagonal move. Also, we cannot “re-use” letters in the same word. So “TENET” would not be allowed, as it requires backtracking over the E and T.

We need to find the most efficient way to determine which of our input words can be found in the grid.

The Algorithm

This problem combines two elements we’ve recently worked with. First, we will use DFS ideas to actually search through the grid from a particular starting location. Second, we will use a Trie to store all of our input words. This will help us determine if we can stop searching. Once we find a string that is not a prefix in our Trie, we can discontinue the search branch.

Here’s a run-down of the solution:

Make a Trie from the Input Words
Search from each starting location in the grid, trying to add good words to a growing set of results.
At each location, add the character to our string. See if the resulting string is still a valid prefix of our Trie.
If it is, add the word to our results if the Trie indicates it is a valid word. Then search the neighboring locations, while keeping track of locations we’ve visited.
Once we no longer have a valid Trie, we can stop the line of searching

The obscures quite a few details, but from our last few weeks of work, those details shouldn’t be too difficult.

Updating Trie Functions

For both our solutions, we’ll assume we’re using the same Trie structure we built last week. We’ll need the general structure, as well as the insert function.

However, we won’t actually need the search or startsWith functions. Each time we call these, we traverse the full length of the string we’re querying. And the way our algorithm will work here, that will get quite inefficient (quadratic time overall).

Instead we’re going to rely on directly accessing sub-Tries, so that as we build our search word longer, we’ll get a Trie that assumes we’ve already searched that word in the main Trie. This will make subsequent searches faster.

To make this more convenient, we’ll just provide a function to get a Maybe Trie from the “sub-Tries” of the node we’re working with, based on the character. We’ll call this “popping” a Trie.

Here’s what it looks like in Rust:

impl Trie {
    ...
    fn pop(&self, c: char) -> Option<&Trie> {
        return self.nodes.get(&c);
    }
}

And in Haskell:

popTrie :: Char -> Trie -> Maybe Trie
popTrie c (Trie _ subs) = M.lookup c subs

For completeness, here’s the entire Rust version of the Trie that we’ll need for this problem:

use std::collections::HashMap;
use std::str::Chars;

struct Trie {
    endsWord: bool,
    nodes: HashMap<char, Trie>
}

impl Trie {

    fn new() -> Self {
        Trie {
            endsWord: false,
            nodes: HashMap::new()
        }
    }
    
    fn insert(&mut self, word: String) {
        self.insertIt(word.chars());
    }

    fn insertIt(&mut self, mut iter: Chars) {
        if let Some(c) = iter.next() {
            if !self.nodes.contains_key(&c) {
                self.nodes.insert(c, Trie::new());
            }
            if let Some(subTrie) = self.nodes.get_mut(&c) {
                subTrie.insertIt(iter);
            }
        } else {
            self.endsWord = true;
        }
    }

    fn pop(&self, c: char) -> Option<&Trie> {
        return self.nodes.get(&c);
    }
}

And here’s the full Haskell version:

data Trie = Trie Bool (M.Map Char Trie)

insertTrie :: String -> Trie -> Trie
insertTrie [] (Trie _ subs) = Trie True subs
insertTrie (c : cs) (Trie ends subs) =
  let sub = fromMaybe (Trie False M.empty) (M.lookup c subs)
      newSub = insertTrie cs sub
  in  (Trie ends (M.insert c newSub subs))

popTrie :: Char -> Trie -> Maybe Trie
popTrie c (Trie _ subs) = M.lookup c subs

Rust Solution

Now let’s move on to our solution, starting with Rust. As always with a graph problem, we’ll benefit from having a neighbors function. This will be very similar to functions we’ve written in the past few weeks, so we won’t dwell on it. In this case though, we’ll incorporate the visited set directly into this function, and exclude neighbors we’ve already seen:

pub fn neighbors(
        nr: usize,
        nc: usize,
        visitedLocs: &HashSet<(usize, usize)>,
        loc: (usize, usize)) -> Vec<(usize, usize)> {
    let r = loc.0;
    let c = loc.1;
    let mut results = Vec::new();
    if (r > 0 && !visitedLocs.contains(&(r - 1, c))) {
        results.push((r - 1, c));
    }
    if (c > 0 && !visitedLocs.contains(&(r, c - 1))) {
        results.push((r, c - 1));
    }
    if (r + 1 < nr && !visitedLocs.contains(&(r + 1, c))) {
        results.push((r + 1, c));
    }
    if (c + 1 < nc && !visitedLocs.contains(&(r, c + 1))) {
        results.push((r, c + 1));
    }
    return results;
}

Now, thinking back to the Islands example, we want to write a search function similar to the visit function we had before. The job of the visit function was to populate the visited set with all reachable tiles from the start. Our search function will populate a set of “results” with every word reachable from a certain location.

However, it will also require some immutable inputs, such as the board dimensions and the board itself. But it will also have several mutable, stateful items like the current Trie, the String we are building, and the current visited set. Here’s the signature we will use:

pub fn search(
        nr: usize,
        nc: usize,
        board: &Vec<Vec<char>>,
        trie: &Trie,
        loc: (usize, usize),
        visitedLocs: &mut HashSet<(usize, usize)>,
        currentStr: &mut String,
        seenWords: &mut HashSet<String>) {
    ...
}

This function has two tasks. First, assess the current location to see if the word we completed by arriving here should be added, or if it’s at least a prefix of a remaining word. If either is true, our second job is to find this location’s neighbors and recursively call them, continuing to grow our string and search for longer words.

Let’s write the code for the first part. Naturally, we need the character at this grid location. Then we need to query our Trie to “pop” the sub-trie associated with this character. If this sub-Trie doesn’t exist, we immediately return. Otherwise, we consider this location “visited” (add it to the set) and we push the new character onto the string. If our new sub-Trie “ends a word”, then we add this word to our results set!

pub fn search(
        nr: usize,
        nc: usize,
        board: &Vec<Vec<char>>,
        trie: &Trie,
        loc: (usize, usize),
        visitedLocs: &mut HashSet<(usize, usize)>,
        currentStr: &mut String,
        seenWords: &mut HashSet<String>) {
    let c = board[loc.0][loc.1];
    if let Some(subTrie) = trie.pop(c) {
        currentStr.push(c);
        visitedLocs.insert(loc);
        if subTrie.endsWord {
            seenWords.insert(currentStr.clone());
        }
        ...
    }
}

Now we get our neighbors and recursively search them, passing updated mutable values. But there’s one extra thing to include! After we are done searching our neighbors, we should “undo” our mutable changes to the visited set and the string.

We haven’t had to make this kind of “backtracking” change before. But we don’t want to permanently keep this location in this visited set, nor keep the string modified. When we return to our caller, we want these mutable values to be the same as how we got them. Otherwise, subsequent calls may be disturbed, and we’ll get incorrect answers!

pub fn search(
        nr: usize,
        nc: usize,
        board: &Vec<Vec<char>>,
        trie: &Trie,
        loc: (usize, usize),
        visitedLocs: &mut HashSet<(usize, usize)>,
        currentStr: &mut String,
        seenWords: &mut HashSet<String>) {
    let c = board[loc.0][loc.1];
    if let Some(subTrie) = trie.pop(c) {
        currentStr.push(c);
        visitedLocs.insert(loc);
        if subTrie.endsWord {
            seenWords.insert(currentStr.clone());
        }
        let ns = neighbors(nr, nc, visitedLocs, loc);
        for n in ns {
            search(nr, nc, board, subTrie, n, visitedLocs, currentStr, seenWords);
        }
        // Backtrack! Remove this location and pop this character
        visitedLocs.remove(&loc);
        currentStr.pop();
    }
}

This completes the search function. Now we just have to call it! We start our primary function by initializing our key values, especially our Trie. We need to insert all the starting words into a Trie that we create:

pub fn find_words(board: Vec<Vec<char>>, words: Vec<String>) -> Vec<String> {
    let mut trie = Trie::new();
    for word in &words {
        trie.insert(word.to_string());
    }
    let mut results = HashSet::new();
    let nr = board.len();
    let nc = board[0].len();
    ...
}

And now we just loop through each location in the grid and search it as a starting location! For an extra optimization, we can stop our search early if we have found all of our words.

pub fn find_words(board: Vec<Vec<char>>, words: Vec<String>) -> Vec<String> {
    let mut trie = Trie::new();
    for word in &words {
        trie.insert(word.to_string());
    }
    let mut results = HashSet::new();
    let nr = board.len();
    let nc = board[0].len();
    for i in 0..nr {
        for j in 0..nc {
            if results.len() < words.len() {
                let mut visited = HashSet::new();
                let mut curr = String::new();
                search(nr, nc, &board, &trie, (i, j), &mut visited, &mut curr, &mut results);
            }
        }
    }
    return results.into_iter().collect();
}

And we’re done! Here is our complete Rust solution!

pub fn neighbors(
        nr: usize,
        nc: usize,
        visitedLocs: &HashSet<(usize, usize)>,
        loc: (usize, usize)) -> Vec<(usize, usize)> {
    let r = loc.0;
    let c = loc.1;
    let mut results = Vec::new();
    if (r > 0 && !visitedLocs.contains(&(r - 1, c))) {
        results.push((r - 1, c));
    }
    if (c > 0 && !visitedLocs.contains(&(r, c - 1))) {
        results.push((r, c - 1));
    }
    if (r + 1 < nr && !visitedLocs.contains(&(r + 1, c))) {
        results.push((r + 1, c));
    }
    if (c + 1 < nc && !visitedLocs.contains(&(r, c + 1))) {
        results.push((r, c + 1));
    }
    return results;
}

pub fn search(
        nr: usize,
        nc: usize,
        board: &Vec<Vec<char>>,
        trie: &Trie,
        loc: (usize, usize),
        visitedLocs: &mut HashSet<(usize, usize)>,
        currentStr: &mut String,
        seenWords: &mut HashSet<String>) {
    let c = board[loc.0][loc.1];
    if let Some(subTrie) = trie.pop(c) {
        currentStr.push(c);
        visitedLocs.insert(loc);
        if subTrie.endsWord {
            seenWords.insert(currentStr.clone());
        }
        let ns = neighbors(nr, nc, visitedLocs, loc);
        for n in ns {
            search(nr, nc, board, subTrie, n, visitedLocs, currentStr, seenWords);
        }
        visitedLocs.remove(&loc);
        currentStr.pop();
    }
}

pub fn find_words(board: Vec<Vec<char>>, words: Vec<String>) -> Vec<String> {
    let mut trie = Trie::new();
    for word in &words {
        trie.insert(word.to_string());
    }
    let mut results = HashSet::new();
    let nr = board.len();
    let nc = board[0].len();
    for i in 0..nr {
        for j in 0..nc {
            if results.len() < words.len() {
                let mut visited = HashSet::new();
                let mut curr = String::new();
                search(nr, nc, &board, &trie, (i, j), &mut visited, &mut curr, &mut results);
            }
        }
    }
    return results.into_iter().collect();
}

Haskell Solution

Our Haskell solution starts with some of the same beats. We’ll create our initial Trie through insertion and define a familiar neighbors function:

findWords :: A.Array (Int, Int) Char -> [String] -> [String]
findWords board allWords = ...
  where
    ((minRow, minCol), (maxRow, maxCol)) = A.bounds board
    trie = foldr insertTrie (Trie False M.empty) allWords

    neighbors :: HS.HashSet (Int, Int) -> (Int, Int) -> [(Int, Int)]
    neighbors visited (r, c) =
      let up = if r > minRow && not (HS.member (r - 1, c) visited) then Just (r - 1, c) else Nothing
          left = if c > minCol && not (HS.member (r, c - 1) visited) then Just (r, c - 1) else Nothing
          down = if r < maxRow && not (HS.member (r + 1, c) visited) then Just (r + 1, c) else Nothing
          right = if c < maxCol && not (HS.member (r, c + 1) visited) then Just (r, c + 1) else Nothing
      in catMaybes [up, left, down, right]

    ...

Now let’s think about our search function. This function’s job is to update a set, given a particular location. We’re going to loop over this function with many locations. So we want the end of its signature to look like:

(Int, Int) -> HS.HashSet String -> HS.HashSet String

This will allow us to use it with foldr. But we still want to think about the mutable elements going into this function: the Trie, the visited set, and the accumulated string. These also change from call to call, but they should come earlier in the signature, since we can make them fixed over each loop of the function. So here’s what our type signature looks like:

search ::
    Trie ->
    HS.HashSet (Int, Int) ->
    String ->
    (Int, Int) ->
    HS.HashSet String ->
    HS.HashSet String

The first order of business in this function is to “pop” the try based on the character at this location and see if it exists. If not, we simply return our original set:

findWords :: A.Array (Int, Int) Char -> [String] -> [String]
findWords board allWords = ...
  where
    ...

    search ::
      Trie ->
      HS.HashSet (Int, Int) ->
      String ->
      (Int, Int) ->
      HS.HashSet String ->
      HS.HashSet String
    search trie' visited currentStr loc seenWords = case popTrie (board A.! loc) trie' of
      Nothing -> seenWords
      Just sub@(Trie ends _) -> ...

Now we update our current string and visited set, while adding the word to our results if the sub-Trie indicates we are at the end of a word:

findWords :: A.Array (Int, Int) Char -> [String] -> [String]
findWords board allWords = ...
  where
    ...

    search ::
      Trie ->
      HS.HashSet (Int, Int) ->
      String ->
      (Int, Int) ->
      HS.HashSet String ->
      HS.HashSet String
    search trie' visited currentStr loc seenWords = case popTrie (board A.! loc) trie' of
      Nothing -> seenWords
      Just sub@(Trie ends _) ->
        let currentStr' = board A.! loc : currentStr
            visited' = HS.insert loc visited
            seenWords' = if ends then HS.insert (reverse currentStr') seenWords else seenWords
        ...

And now we get our neighbors and loop through them with foldr. Observe how we define the function f that fixes the first three parameters with our new mutable values so we can cleanly call foldr.

findWords :: A.Array (Int, Int) Char -> [String] -> [String]
findWords board allWords = ...
  where
    ...

    search ::
      Trie ->
      HS.HashSet (Int, Int) ->
      String ->
      (Int, Int) ->
      HS.HashSet String ->
      HS.HashSet String
    search trie' visited currentStr loc seenWords = case popTrie (board A.! loc) trie' of
      Nothing -> seenWords
      Just sub@(Trie ends _) ->
        let currentStr' = board A.! loc : currentStr
            visited' = HS.insert loc visited
            seenWords' = if ends then HS.insert (reverse currentStr') seenWords else seenWords
            ns = neighbors visited loc
            f = search sub visited' currentStr'
        in  foldr f seenWords' ns

We’re almost done now! Having written the “inner” loop, we just have to write the “outer” loop that will loop through every location as a starting point.

findWords :: A.Array (Int, Int) Char -> [String] -> [String]
findWords board allWords = HS.toList result
  where
    ...
    trie = foldr insertTrie (Trie False M.empty) allWords
    result = foldr (search trie HS.empty "") HS.empty (A.indices board)

Here is our complete Haskell solution!

findWords :: A.Array (Int, Int) Char -> [String] -> [String]
findWords board allWords = HS.toList result
  where
    ((minRow, minCol), (maxRow, maxCol)) = A.bounds board
    trie = foldr insertTrie (Trie False M.empty) allWords

    neighbors :: HS.HashSet (Int, Int) -> (Int, Int) -> [(Int, Int)]
    neighbors visited (r, c) =
      let up = if r > minRow && not (HS.member (r - 1, c) visited) then Just (r - 1, c) else Nothing
          left = if c > minCol && not (HS.member (r, c - 1) visited) then Just (r, c - 1) else Nothing
          down = if r < maxRow && not (HS.member (r + 1, c) visited) then Just (r + 1, c) else Nothing
          right = if c < maxCol && not (HS.member (r, c + 1) visited) then Just (r, c + 1) else Nothing
      in catMaybes [up, left, down, right]

    search ::
      Trie ->
      HS.HashSet (Int, Int) ->
      String ->
      (Int, Int) ->
      HS.HashSet String ->
      HS.HashSet String
    search trie' visited currentStr loc seenWords = case popTrie (board A.! loc) trie' of
      Nothing -> seenWords
      Just sub@(Trie ends _) ->
        let currentStr' = board A.! loc : currentStr
            visited' = HS.insert loc visited
            seenWords' = if ends then HS.insert (reverse currentStr') seenWords else seenWords
            ns = neighbors visited loc
            f = search sub visited' currentStr'
        in  foldr f seenWords' ns
    
    result = foldr (search trie HS.empty "") HS.empty (A.indices board)

Conclusion

This problem brought together a lot of interesting solution components. We applied our Trie implementation from last week, and used several recurring ideas from graph search problems. Next week we’re going to switch gears a bit and start discussing dynamic programming.

To learn more about all of these problem concepts, you need to take a look at Solve.hs. It gives a fairly comprehensive look at problem solving concepts in Haskell. If you want to understand how to shape your functions to work with folds like we did in this article, you’ll learn about that in Module 1. If you want to implement and apply data structures like graphs and tries, Module 2 will teach you. And if you want practice writing and using key graph algorithms in Haskell, Module 3 will give you the experience you need!

James Bowen