The Sliding Window in Haskell & Rust

Jun 23

In last week’s problem, we covered a two-pointer algorithm, and compared Rust and Haskell solutions as we have been for this whole series. Today, we’ll study a related concept, the sliding window problem. Whereas the general two-pointer problem can often be tackled by a single loop, we’ll have to use nested loops in this problem. This problem will also mark our first use of the Set data structure in this series.

If you want a deeper look at problem solving techniques in Haskell, you should enroll in our Solve.hs course! You’ll learn everything you need for general problem solving knowledge in Haskell, including data structures, algorithms, and parsing!

The Problem

Today’s LeetCode problem is Longest Substring without Repeating Characters. It’s a lengthy problem name, but the name basically tells you everything you need to know! We want to find a substring of our input that does not repeat any characters within the substring, and then get the longest such substring.

For example, abaca would give us an answer of 3, since we have the substringbac that consists of 3 unique characters. However, abaaca only gives us 2. There is no run of 3 characters where the three characters are all unique.

The Algorithm

The approach we’ll use, as mentioned above, is called a sliding window algorithm. In some ways, this is similar to the two-pointer approach last week. We’ll have, in a sense, two different pointers within our input. One dictates the “left end” of a window and one dictates the “right end” of a window. Unlike last week’s problem though, both pointers will move in the same direction, rather than converging from opposite directions.

The goal of a sliding window problem is “find a continuous subsequence of an input that matches the criteria”. And for many problems like ours, you want to find the longest such subsequence. The main process for a sliding window problem is this:

Grow the window by increasing the “right end” until (or while) the predicate is satisfied
Once you cannot grow the window any more, shrink the window by increasing the “left end” until we’re in a position to grow the window again.
Continue until one or both pointers go off the end of the input list.

So for our problem today, we want to “grow” our sliding window as long as we can get more unique characters. Once we hit a character we’ve already seen in our current window, we’ll need to shrink the window until that duplicate character is removed from the set.

As we’re doing this, we’ll need to keep track of the largest substring size we’ve seen so far.

Here are the steps we would take with the input abaca. At each step, we process a new input character.

1. Index 0 (‘a’) - window is “a” which is all unique.
2. Index 1 (‘b’) - window is “ab” which is all unique
3. Index 2 (‘a’) - window is “aba”, which is not all unique
3b. Shrink window, removing first ‘a’, so it is now “ba”
4. Index 3 (‘c’) - window is “bac”, which is all unique
5. Index 4 (‘a’) - window is “baca”, which is not unique
5b. Shrink window, remove ‘b’ and ‘a’, leaving “ca”

The largest unique window we saw was bac, so the final answer is 3.

Haskell Solution

For a change of pace, let’s discuss the Haskell approach first. Our algorithm is laid out in such a way that we can process one character at a time. Each character either grows the window, or forces it to shrink to accommodate the character. This means we can use a fold!

Let’s think about what state we need to track within this fold. Naturally, we want to track the current “set” of characters in our window. Each time we see the next character, we have to quickly determine if it’s already in the window. We’ll also want to track the largest set size we’ve seen so far, since by the end of the string our window might no longer reflect the largest subsequence.

With a general sliding window approach, you would also need to track both the start and the end index of your current window. In this problem though, we can get away with just tracking the start index. We can always derive the end index by taking the start index and adding the size of the set. And since we’re iterating through the characters anyway, we don’t need the end index to get the “next” character.

This means our fold-loop function will have this type signature:

-- State: (start index, set of letters, largest seen)
loop :: (Int, S.Set Char, Int) -> Char -> (Int, S.Set Char, Int)

Now, using our idea of “beginning from the end”, we can already write the invocation of this loop:

largestUniqueSubsequence :: String -> Int
largestUniqueSubsequence input = best
  where
    (_, _, best) = foldl loop (0, S.empty, 0) input

    loop :: (Int, S.Set Char, Int) -> Char -> (Int, S.Set Char, Int)
    ...

Using 0 for the start index right away is a little hand-wavy, since we haven’t actually added the first character to our set yet! But if we see a single character, we’ll always add it, and as we’ll see, the “adding” branch of our loop never increases this number.

With that in mind, let’s write this branch of our loop handler! If we have not seen the next character in the string, we keep the same start index (left side of the window isn’t moving), we add the character to our set, and we take the new size of the set as the “best” value if it’s greater than the original. We get the new size by adding 1 to the original set size.

largestUniqueSubsequence :: String -> Int
largestUniqueSubsequence input = best
  where
    (_, _, best) = foldl loop (0, S.empty, 0) input

   loop :: (Int, S.Set Char, Int) -> Char -> (Int, S.Set Char, Int)
   loop (startIndex, charSet, bestSoFar) c = if S.notMember c charSet
    then (startIndex, S.insert c charSet, max bestSoFar (S.size charSet + 1))
      else ...

Now we reach the tricky case! If we’ve already seen the next character, we need to remove characters from our set until we reach the instance of this character in the set. Since we might need to remove multiple characters, “shrinking” is an iterative process with a variable number of steps. This means it would be a while-loop in most languages, which means we need another recursive function!

The goal of this function is to change two of our stateful values (the start index and the character set) until we can once again have a unique character set with the new input character. So each iteration it takes the existing values for these, and will ultimately return updated values. Here’s its type signature:

shrink :: (Int, S.Set Char) -> Char -> (Int, S.Set Char)

Before we implement this, we can invoke it in our primary loop! When we’ve seen the new character in our set, we shrink the input to match this character, and then return these new stateful values along with our previous best (shrinking never increases the size).

largestUniqueSubsequence :: String -> Int
largestUniqueSubsequence input = best
  where
    (_, _, best) = foldl loop (0, S.empty, 0) input

   loop :: (Int, S.Set Char, Int) -> Char -> (Int, S.Set Char, Int)
   loop (startIndex, charSet, bestSoFar) c = if S.notMember c charSet
    then (startIndex, S.insert c charSet, max bestSoFar (S.size charSet + 1))
      else
        let (newStart, newSet) = shrink (startIndex, charSet) c
        in  (newStart, newSet, bestSoFar)

    shrink :: (Int, S.Set Char) -> Char -> (Int, S.Set Char)
    shrink = undefined

Now we implement “shrink” by considering the base case and recursive case. In the base case, the character at this index matches the new character we’ve trying to remove. So we can return the same set of characters, but increase the index.

In the recursive case, we still increase the index, but now we remove the character at the start index from the set without replacement. (Note how we need a vector for efficient indexing here).

largestUniqueSubsequence :: String -> Int
largestUniqueSubsequence input = best
  where
    (_, _, best) = foldl loop (0, S.empty, 0) input

   loop :: (Int, S.Set Char, Int) -> Char -> (Int, S.Set Char, Int)
   loop (startIndex, charSet, bestSoFar) c = if S.notMember c charSet
    then (startIndex, S.insert c charSet, max bestSoFar (S.size charSet + 1))
      else
        let (newStart, newSet) = shrink (startIndex, charSet) c
        in  (newStart, newSet, bestSoFar)

    shrink :: (Int, S.Set Char) -> Char -> (Int, S.Set Char)
    shrink (startIndex, charSet) c =
      let nextC = inputV V.! startIndex
          // Base Case: nextC is equal to newC
      in  if nextC == c then (startIndex + 1, charSet)
            // Recursive Case: Remove startIndex
            else shrink (startIndex + 1, S.delete nextC charSet) c

Now we have a complete Haskell solution!

Rust Solution

Now in our Rust solution, we’ll follow the same pattern we’ve been doing for these problems. We’ll set up our loop variables, write the loop, and handle the different cases in the loop. Because we had the nested recursive “shrink” function in Haskell, this will translate to a “while” loop in Rust, nested within our for-loop.

Here’s how we set up our loop variables:

pub fn length_of_longest_substring(s: String) -> i32 {
    let mut best = 0;
    let mut startIndex = 0;
    let inputV: Vec<char> = s.chars().collect();
    let mut charSet = HashSet::new();
    for c in s.chars() {
        ...
    }
}

Within the loop, we have the “easy” case, where the next character is not already in our set. We just insert it into our set, and we update best if we have a new maximum.

pub fn length_of_longest_substring(s: String) -> i32 {
    let mut best = 0;
    let mut startIndex = 0;
    let inputV: Vec<char> = s.chars().collect();
    let mut charSet = HashSet::new();
    for c in s.chars() {
        if charSet.contains(&c) {
            ...
        } else {
            charSet.insert(c);
            best = std::cmp::max(best, charSet.len());                
        }
    }
    return best as i32;
}

The Rust-specific oddity is that when we call contains on the HashSet, we must use &c, passing a reference to the character. In C++ we could just copy the character, or it could be handled by the function using const&. But Rust handles these things a little differently.

Now we get to the “tricky” case within our loop. How do we “shrink” our set to consume a new character?

In our case, we’ll actually just use the loop functionality of Rust, which works like while (true), requiring a manual break inside the loop. Our idea is that we’ll inspect the character at the “start” index of our window. If this character is the same as the new character, we will advance the start index (indicating we are dropping the old version), but then we’ll break. Otherwise, we’ll still increase the index, but we’ll remove the other character from the set as well.

Here’s what this loop looks like in relative isolation:

if charSet.contains(&c) {
    loop {
        // Look at “first” character of window
        let nextC = inputV[startIndex];
        if (nextC == c) {
            // If it’s the new character, we advance past it and break
            startIndex += 1;
            break;
        } else {
            // Otherwise, advance AND drop it from the set
            startIndex += 1;
            charSet.remove(&nextC);
        }
    }
} else {
    ...
}

The inner condition (nextC == c) feels a little flimsy to use with a while (true) loop. But it’s perfectly sound because of the invariant that if charSet contains c, we’ll necessarily find nextC == c before startIndex gets too large. We could also write it as a normal while loop, but loop is an interesting Rust-specific idea to bring in here.

Here’s our complete Rust solution!

pub fn length_of_longest_substring(s: String) -> i32 {
    let mut best = 0;
    let mut startIndex = 0;
    let inputV: Vec<char> = s.chars().collect();
    let mut charSet = HashSet::new();
    for c in s.chars() {
        if charSet.contains(&c) {
            loop {
                let nextC = inputV[startIndex];
                if (nextC == c) {
                    startIndex += 1;
                    break;
                } else {
                    startIndex += 1;
                    charSet.remove(&nextC);
                }
            }
        } else {
            charSet.insert(c);
            best = std::cmp::max(best, charSet.len());                
        }
    }
    return best as i32;
}

Conclusion

With today’s problem, we’ve covered another important problem-solving concept: the sliding window. We saw how this approach could work even with a fold in Haskell, considering one character at a time. We also saw how nested loops compare across Haskell and Rust.

For more problem solving tips and tricks, take a look at Solve.hs, our complete course on problem solving, data structures, and algorithms in Haskell. You’ll get tons of practice on problems like these so you can significantly level up your skills!

James Bowen