Starting out with Graph Algorithms: Basic DFS

Sep 1

For a few weeks now, we’ve been tackling problems related to data structures, with a sprinkling of algorithmic ideas in there. Last week, we covered sets and heaps. Prior to that, we considered Matrices and the binary search algorithm.

This week, we’ll cover our first graph problem! Graph problems often build on a lot of fundamental layers. You need to understand the algorithm itself. Then you need to use the right data structures to apply it. And you’ll also still need the core problem solving patterns at your disposal. These 3 areas correspond to the first 3 modules of Solve.hs, our Haskell problem solving course! Check out that course to level up your Haskell skills!

The Problem

Today’s problem is called Number of Islands. We are given a 2D array as input, where every cell is either “land” or “sea” (either represented as the characters 1 and 0, or True and False). We want to find the number of distinct islands in this grid. Two “land” cells are part of the same island if we can draw a path from one cell to the other that only uses other land cells and other travels up, down, left and right (but not diagonally).

Let’s suppose we have this example:

This grid has 3 islands. The island in the top left corner comprises 7 connected cells. Then there’s a small island in the bottom right with only 2 cells. Finally, we have a third island in the middle right with 5 tiles. While it is diagonally adjacent to the first island, we do not count this as a connection.

The Algorithm

This is one of the most basic questions you’ll see that requires a graph search algorithm, like Depth-First-Search (DFS) or Breadth-First-Search (BFS). The basic principle is that we will select a starting coordinate for a search. We will use one of these algorithms to find all the land cells that are part of that cell’s island. We’ll then increment a counter for having found this island.

We need to track all the cells that are part of this island. We’ll then keep iterating for new start locations to find new islands, but we have to exclude any locations that have already been explored.

While BFS is certainly possible, the solutions we’ll write here will use DFS. Our solution will consist of 3 components:

A “neighbors” function that finds all adjacent land tiles to a given tile.
A “visit” function that will take a starting coordinate and populate a “visited” set with all of the cells on the same island as the starting coordinate.
A core “loop” that will consider each coordinate as a possible starting value for an island.

This ordering represents more of a “bottom up” approach to solving the problem. Going “top down” also works, and may be easier if you’re unfamiliar with graph algorithms. But as you get more practice with them, you’ll get a feel for knowing the bottom layers you need right away.

Rust Solution

To write our solution, we’ll write the components in our bottom up order. We’ll start with our neighbors function. This will take the island grid (LeetCode supplies us with a Vec<Vec<char>>) and the current location. We’ll represent locations as tuples, (usize, usize). This function will return a vector of locations.

use std::collections::HashSet;

pub fn neighbors(
        grid: &Vec<Vec<char>>,
        loc: &(usize,usize)) -> Vec<(usize,usize)> {
    ...
}

We’ll start this function by defining a few values. We want to know the length and width of the grid, as well as defining r and c to quickly reference the current location.

use std::collections::HashSet;

pub fn neighbors(
        grid: &Vec<Vec<char>>,
        loc: &(usize,usize)) -> Vec<(usize,usize)> {
    let m = grid.len();
    let n = grid[0].len();
    let r = loc.0;
    let c = loc.1;
    let mut result: Vec<(usize,usize)> = Vec::new();
    ...
}

Now we just have to look in each of the four directions. Each direction is included as long as it is a land tile and that it is not out of bounds. We’ll do our “visited” checks elsewhere.

pub fn neighbors(
        grid: &Vec<Vec<char>>,
        loc: &(usize,usize)) -> Vec<(usize,usize)> {
    let m = grid.len();
    let n = grid[0].len();
    let r = loc.0;
    let c = loc.1;
    let mut result: Vec<(usize,usize)> = Vec::new();
    if (r > 0 && grid[r - 1][c] == '1') {
        result.push((r - 1, c));
    }
    if (c > 0 && grid[r][c - 1] == '1') {
        result.push((r, c - 1));
    }
    if (r + 1 < m && grid[r + 1][c] == '1') {
        result.push((r + 1, c));
    }
    if (c + 1 < n && grid[r][c + 1] == '1') {
        result.push((r, c + 1));
    }
    return result;
}

Now let’s write the visit function. Remember, this function’s purpose is to populate the visited set starting from a certain location. We’ll use a HashSet of tuples for the visited set.

pub fn visit(
        grid: &Vec<Vec<char>>,
        visited: &mut HashSet<(usize,usize)>,
        loc: &(usize,usize)) {
    ...
}

First, we’ll check if this location is already visited and return if so. Otherwise we’ll insert it.

pub fn visit(
        grid: &Vec<Vec<char>>,
        visited: &mut HashSet<(usize,usize)>,
        loc: &(usize,usize)) {
    if (visited.contains(loc)) {
        return;
    visited.insert(*loc);
    ...
}

All we have to do now is find the neighbors of this location, and recursively “visit” each one of them!

pub fn visit(
        grid: &Vec<Vec<char>>,
        visited: &mut HashSet<(usize,usize)>,
        loc: &(usize,usize)) {
    if (visited.contains(loc)) {
        return;
    }
    visited.insert(*loc);
    let ns = neighbors(grid, visited, loc);
    for n in ns {
        visit(grid, &n);
    }
}

We’re not quite done, as we have to loop through our grid to call this function on each possible start. This isn’t so bad though. We start our function by defining key terms.

pub fn num_islands(grid: Vec<Vec<char>>) -> i32 {
    let m = grid.len();
    let n = grid[0].len();
    let mut visited: HashSet<(usize,usize)> = HashSet::new();
    let mut islandCount = 0;
    ...
    // islandCount will be our final result
    return islandCount;
}

Now we’ll “loop” through each possible starting location:

pub fn num_islands(grid: Vec<Vec>) -> i32 { let m = grid.len(); let n = grid[0].len(); let mut visited: HashSet<(usize,usize) data-preserve-html-node="true"> = HashSet::new(); let mut islandCount = 0; for row in 0..m { for col in 0..n { ... } // islandCount will be our final result return islandCount; }

The last question is, what do we do for each location? If the location is land AND it is still unvisited, we treat it as the start of a new island. This means we increase the island count and then “visit” the location. When we consider other cells on this island, they’re already visited, so we won’t increase the island count when we find them!

```rust
pub fn num_islands(grid: Vec<Vec<char>>) -> i32 {
    let m = grid.len();
    let n = grid[0].len();
    let mut visited: HashSet<(usize,usize)> = HashSet::new();
    let mut islandCount = 0;
    for row in 0..m {
        for col in 0..n {
            islandCount += 1;
            visit(&grid, &mut visited, &loc);
        }
    return islandCount;
}

Here’s our complete solution:

use std::collections::HashSet;

pub fn neighbors(
        grid: &Vec<Vec<char>>,
        loc: &(usize,usize)) -> Vec<(usize,usize)> {
    let m = grid.len();
    let n = grid[0].len();
    let r = loc.0;
    let c = loc.1;
    let mut result: Vec<(usize,usize)> = Vec::new();
    if (r > 0 && grid[r - 1][c] == '1') {
        result.push((r - 1, c));
    }
    if (c > 0 && grid[r][c - 1] == '1') {
        result.push((r, c - 1));
    }
    if (r + 1 < m && grid[r + 1][c] == '1') {
        result.push((r + 1, c));
    }
    if (c + 1 < n && grid[r][c + 1] == '1') {
        result.push((r, c + 1));
    }
    return result;
}

pub fn visit(
        grid: &Vec<Vec<char>>,
        visited: &mut HashSet<(usize,usize)>,
        loc: &(usize,usize)) {
    if (visited.contains(loc)) {
        return;
    }
    visited.insert(*loc);
    let ns = neighbors(grid, visited, loc);
    for n in ns {
        visit(grid, &n);
    }
}


pub fn num_islands(grid: Vec<Vec<char>>) -> i32 {
    let m = grid.len();
    let n = grid[0].len();
    let mut visited: HashSet<(usize,usize)> = HashSet::new();
    let mut islandCount = 0;
    for row in 0..m {
        for col in 0..n {
            let loc: (usize,usize) = (row,col);
            if grid[row][col] == '1' && !(visited.contains(&loc)) {
                islandCount += 1;
                visit(&grid, &mut visited, &loc);
            }
        }
    }
    return islandCount;
}

Haskell Solution

The structure of this solution translates well to Haskell, since it’s a recursive solution at its root. We’ll just use a couple of folds to handle the other loops. Let’s outline the solution:

numberOfIslands :: A.Array (Int, Int) Bool -> Int
numberOfIslands grid = ...
  where
    ((minRow, minCol), (maxRow, maxCol)) = A.bounds grid

    neighbors :: (Int, Int) -> [(Int, Int)]

    visit :: (Int, Int) -> HS.HashSet (Int, Int) -> HS.HashSet (Int, Int)

    loop :: (Int, Int) -> (Int, HS.HashSet (Int, Int)) -> (Int, HS.HashSet (Int, Int))

Since we’re writing our functions “in line”, we don’t need to pass the grid around like we did in our Rust solution (though inline functions are also possible there). What you should observe immediately is that visit and loop have a similar structure. They both fit into the a -> b -> b pattern we want for foldr! We’ll use this to great effect!

But first, let’s fill in neighbors. Each of the 4 directions requires the same two conditions we used before. We make sure it’s not out of bounds, and that the next tile is “land”. Here’s how we check the “up” direction:

numberOfIslands :: A.Array (Int, Int) Bool -> Int
numberOfIslands grid = ...
  where
    ((minRow, minCol), (maxRow, maxCol)) = A.bounds grid

    neighbors :: (Int, Int) -> [(Int, Int)]
    neighbors (row, col) = 
      let up =  if row > minRow && grid A.! (row - 1, col) then Just (row - 1, col) else Nothing
      ...

We return Nothing if it is not a valid neighbor. Then we just combine the four directional options with catMaybes to complete this helper:

numberOfIslands :: A.Array (Int, Int) Bool -> Int
numberOfIslands grid = ...
  where
    ((minRow, minCol), (maxRow, maxCol)) = A.bounds grid

    neighbors :: (Int, Int) -> [(Int, Int)]
    neighbors (row, col) = 
      let up =  if row > minRow && grid A.! (row - 1, col) then Just (row - 1, col) else Nothing
          left = if col > minCol && grid A.! (row, col - 1) then  Just (row, col - 1) else Nothing
          down = if row < maxRow && grid A.! (row + 1, col) then Just (row + 1, col) else Nothing
          right = if col < maxCol && grid A.! (row, col + 1) then Just (row, col + 1) else Nothing
      in  catMaybes [up, left, down, right]

Now we start the visit function by checking if we’ve already visited the location, and add it to the set if not:

numberOfIslands :: A.Array (Int, Int) Bool -> Int
numberOfIslands grid = ...
  where
    ((minRow, minCol), (maxRow, maxCol)) = A.bounds grid

    visit :: (Int, Int) -> HS.HashSet (Int, Int) -> HS.HashSet (Int, Int)
    visit coord visited = if HS.member coord visited then visited else
      let visited' = HS.insert coord visited
      ...

Now we have to get the neighbors and “loop” through the neighbors so that we keep the visited set updated. This is where we’ll apply our first fold. We’ll recursively fold over visit on each of the possible neighbors, which will give us the final visited set from this process. That’s all we need for this helper!

numberOfIslands :: A.Array (Int, Int) Bool -> Int
numberOfIslands grid = ...
  where
    ((minRow, minCol), (maxRow, maxCol)) = A.bounds grid

    visit :: (Int, Int) -> HS.HashSet (Int, Int) -> HS.HashSet (Int, Int)
    visit coord visited = if HS.member coord visited then visited else
      let visited' = HS.insert coord visited
      in  foldr visit visited’ ns

Now our loop function will consider only a single coordinate. We think of this as having two pieces of state. First, the number of accumulated islands (the Int). Second, we have the visited set. So we check if the coordinate is unvisited land. If so, we increase the count, and get our “new” visited set by calling visit on it. If not, we return the original inputs.

numberOfIslands :: A.Array (Int, Int) Bool -> Int
numberOfIslands grid = ...
  where
    loop :: (Int, Int) -> (Int, HS.HashSet (Int, Int)) -> (Int, HS.HashSet (Int, Int))
    loop coord (count, visited) = if grid A.! coord && not (HS.member coord visited)
      then (count + 1, visit coord visited)
      else (count, visited)

Now for the final flourish. Our loop also has the structure for foldr. So we’ll loop over all the indices of our array, which will give us the final number of islands and the visited set. Our final answer is just the fst of these:

numberOfIslands :: A.Array (Int, Int) Bool -> Int
numberOfIslands grid = fst (foldr loop (0, HS.empty) (A.indices grid))
  where

    loop :: (Int, Int) -> (Int, HS.HashSet (Int, Int)) -> (Int, HS.HashSet (Int, Int))

Here’s our final solution:

numberOfIslands :: A.Array (Int, Int) Bool -> Int
numberOfIslands grid = fst (foldr loop (0, HS.empty) (A.indices grid))
  where
    ((minRow, minCol), (maxRow, maxCol)) = A.bounds grid

    neighbors :: (Int, Int) -> [(Int, Int)]
    neighbors (row, col) = 
      let up =  if row > minRow && grid A.! (row - 1, col) then Just (row - 1, col) else Nothing
          left = if col > minCol && grid A.! (row, col - 1) then  Just (row, col - 1) else Nothing
          down = if row < maxRow && grid A.! (row + 1, col) then Just (row + 1, col) else Nothing
          right = if col < maxCol && grid A.! (row, col + 1) then Just (row, col + 1) else Nothing
      in  catMaybes [up, left, down, right]

    visit :: (Int, Int) -> HS.HashSet (Int, Int) -> HS.HashSet (Int, Int)
    visit coord visited = if HS.member coord visited then visited else
      let visited' = HS.insert coord visited
          ns = neighbors coord
      in  foldr visit visited' ns

    loop :: (Int, Int) -> (Int, HS.HashSet (Int, Int)) -> (Int, HS.HashSet (Int, Int))
    loop coord (count, visited) = if grid A.! coord && not (HS.member coord visited)
      then (count + 1, visit coord visited)
      else (count, visited)

A Note on the Graph Algorithm

It seems like we solved this without even really apply a “graph” algorithm! We just did a loop and a recursive call and everything worked out! There are a couple elements of this problem that make it one of the easiest graph problems out there.

Normally, we have to use some kind of structure to store a search state, telling us the next nodes in our graph to search. For BFS, this is a queue. For Dijkstra or A*, it is a heap. For DFS it is normally a stack. However, we are using the call stack to act as the stack for us!

When we make a recursive call to “visit” a location, we don’t need to keep track of which node we return to after we’re done. The function returns, and the prior node is already sitting there on the call stack.

The other simplifying factor is that we don’t need to do any backtracking or state restoration. Sometimes with a DFS, you need to “undo” some of the steps you took if you don’t find your goal. But this algorithm is just a space-filling algorithm. We are just trying to populate our “visited” set, and we never take nodes out of this set once we have visited them.

Conclusion

We’ve got a couple more graph problems coming up next. If you want to learn more about applying graph algorithms in Haskell (including implementing them for yourself!) check out Solve.hs, where Module 3 will teach you about algorithms including DFS, BFS, A* and beyond!

James Bowen