Buffer & Save with a Challenging Example

Jun 30

Welcome back to our series comparing LeetCode problems in Haskell and Rust. Today we’ll learn a new paradigm that I call “Buffer and Save”. This will also be the hardest problem we’ve done so far! The core loop structure isn’t that hard, but there are a couple layers of tricks to massage our data to get the final answer.

This will be the last problem we do that focuses strictly on string and list manipulation. The next set of problems we do will all rely on more advanced data structures or algorithmic ideas.

For more complete practice on problem solving in Haskell, check out Solve.hs, our newest course. This course will teach you everything you need to know about problem solving, data structures, and algorithms in Haskell. You’ll get loads of practice building structures and algorithms from scratch, which is very important for understanding and remembering how they work.

The Problem

Today’s problem is Text Justification. The idea here is that we are taking a list of words and a “maximum width” and printing out the words grouped into equal-width lines that are evenly spaced. Here’s an example input and output:

Example Input (list of 9 strings):
[“Study”, “Haskell”, “with”, “us”, “every”, “Monday”, “Morning”, “for”, “fun”]
Max Width: 16

Output (list of 4 strings):
“Study    Haskell”
“with   us  every”
“Monday   Morning”
“for fun         ”

There are a few notable rules, constraints, and edge cases. Here’s a list to sumarize them:

There is at least one word
No word is larger than the max width
All output strings must have max width as their length (including spaces)
The first word of every line is set to the left
The last line always has 1 space between words, and then enough spaces after the last word to read the max width.
All other lines with multiple words will align the final word all the way to the right
The spaces in non-final lines are distributed as evenly as possible, but extra spaces go between words to the left.

The final point is potentially the trickiest to understand. Consider the second line above, with us every. The max width is 16, and we have 3 words with a total of 11 characters. This leaves us 5 spaces. Having 3 words means 2 blanks, so the “left” blank gets 3 spaces and the “right” blank gets 2 spaces.

If you had a line with 5 words, a max width of 30, and 16 characters, you would place 4 spaces in the left two blanks, and 3 spaces in the right two blanks. The relative length of the words does not matter.

Words in Line: [“A”, “good”, “day”, “to”, “endure”]

Output Line:
“A    good    day   to   endure”

The Algorithm

As mentioned above, our main algorithmic idea could be called “buffer and save”. We’ve been defining all of our loops based on the state we must maintain between iterations of the loop. The buffer and save approach highlights two pieces of state for us:

The strings we’ve accumulated for our answer so far (the “result”)
A buffer of the strings in the “current” line we’re building.

So we’ll loop through the input words one at a time. We’ll consider if the next word can be added to the “current” line. If it would cause our current line to exceed the maximum width, we’ll “save” our current line and write it out to the “result” list, adding the required spaces.

To help our calculations, we’ll also include two other pieces of state in our loop:

The number of characters in our “current” line
The number of words in our “current” line

Finally, there’s the question of how to construct each output line. Combining the math with list-mechanics is a little tricky. But the central idea consists of 4 simple steps:

Find the number of spaces (subtract number of characters from max width)
Divide the number of spaces by the number of “blanks” (number of words - 1)
The quotient is the “base” number of spaces per blank
The remainder is the number of blanks (starting from the left) that get an extra space

The exact implementation of this idea differs between Haskell and Rust. Again this rests a lot on the “reverse” differences between Rust vectors and Haskell lists.

The final line has a slightly different (but easier) process. And we should note that the final line will still be in our buffer when we exit the loop! So we shouldn’t forget to add it to the result.

Haskell Solution

We know enough now to jump into our Haskell solution. Our solution should be organized around a loop. Since we go through the input word-by-word, this should follow a fold pattern. So here’s our outline:

justifyText :: [String] -> Int -> [String]
justifyText inputWords maxWidth = ...
  where
    -- f = ‘final’
    (fLine, fWordsInLine, fCharsInLine, result) = foldl loop ([], 0, 0, []) inputWords

    loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
    loop (currentLine, wordsInLine, charsInLine, currResult) newWord = ...

Let’s focus in on the choice we have to make in the loop. We need to determine if this new word fits in our current line. So we’ll get its length and add it to the number of characters in the line AND consider the number of words in the line. We count the words too since each word we already have requires at least one space!

-- (maxWidth is still in scope here)
loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
  let newWordLen = length newWord
  in  if newWordLen + charsInLine + wordsInLine > maxWidth
        then ...
        else ...

How do we fill in these choices? If we don’t overflow the line, we just append the new word, bump the count of the words, and add the new word’s length to the character count.

loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
  let newWordLen = length newWord
  in  if newWordLen + charsInLine + wordsInLine > maxWidth
        then ...
        else (newWord : currentLine, wordsInLine + 1, charsInLine + newWordLen, currResult)

The overflow case isn’t hard, but it does require us to have a function that can convert our current line into the final string. This function will also take the number of words and characters in this line. Assuming this function exists, we just make this new line, append it to result, and then reset our other stateful values so that they only reflect the “new word” as part of our current line.

loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
  let newWordLen = length newWord
      resultLine = makeLine currentLine wordsInLine charsInLine
  in  if newWordLen + charsInLine + wordsInLine > maxWidth
        then ([newWord], 1, newWordLen, resultLine : currResult)
        else (newWord : currentLine, wordsInLine + 1, charsInLine + newWordLen, currResult)

makeLine :: String -> Int -> Int -> String
makeLine = ...

Before we think about the makeLine implementation though, we just about have enough to fill in the rest of the “top” of our function definition. We’d just need another function for making the “final” line, since this is different from other lines. Then when we get our “final” state values, we’ll plug them into this function to get our final line, append this to the result, and reverse it all.

justifyText :: [String] -> Int -> [String]
justifyText inputWords maxWidth = 
  reverse (makeLineFinal flLine fWordsInLine fCharsInLine : result)
  where
    (fLine, fWordsInLine, fCharsInLine, result) = foldl loop ([], 0, 0, []) inputWords

    loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
    loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
      let newWordLen = length newWord
          resultLine = makeLine currentLine wordsInLine charsInLine
      in  if newWordLen + charsInLine + wordsInLine > maxWidth
            then ([newWord], 1, newWordLen, resultLine : currResult)
            else (newWord : currentLine, wordsInLine + 1, charsInLine + newWordLen, currResult)

    makeLine :: [String] -> Int -> Int -> String
    makeLine = ...

    makeLineFinal :: [String] -> Int -> Int -> String
    makeLineFinal = ...

Now let’s discuss forming these lines, starting with the general case. We can start with a couple edge cases. This should never be called with an empty list. And with a singleton, we just left-align the word and add the right number of spaces:

makeLine :: [String] -> Int -> Int -> String
makeLine [] _ _ = error "Cannot makeLine with empty string!"
makeLine [onlyWord] _ charsInLine =
  let extraSpaces = replicate (maxWidth - charsInLine) ' '
  in  onlyWord <> extraSpaces
makeLine (first : rest) wordsInLine charsInLine = ...

Now we’ll calculate the quotient and remainder to get the spacing sizes, as mentioned in our algorithm section. But how do we combine them? There are multiple ways, but the idea I thought of was to zip the tail of the list with the number of spaces it needs to append. Then we can fold it into a resulting list using a function like this:

-- (String, Int) is the next string and the number of spaces after it
combine :: String -> (String, Int) -> String
combine suffix (nextWord, numSpaces) =
  nextWord <> replicate numSpaces ' ' <> suffix

Remember while doing this that we’ve accumulated the words for each line in reverse order. So we want to append each one in succession, together with the number of spaces that come after it.

To use this function, we can “fold” over the “tail” of our current line, while using the first word in our list as the base of the fold! Don’t forget the quotRem math going on in here!

makeLine :: [String] -> Int -> Int -> String
makeLine [] _ _ = error "Cannot makeLine with empty string!"
makeLine [onlyWord] _ charsInLine =
  let extraSpaces = replicate (maxWidth - charsInLine) ' '
  in  onlyWord <> extraSpaces
makeLine (first : rest) wordsInLine charsInLine = ...
  let (baseNumSpaces, numWithExtraSpace) = quotRem (maxWidth - charsInLine) (wordsInLine - 1)
      baseSpaces = replicate (wordsInLine - 1 - numWithExtraSpace) baseNumSpaces
      extraSpaces = replicate numWithExtraSpace (baseNumSpaces + 1)
      wordsWithSpaces = zip rest (baseSpaces <> extraSpaces)
  in  foldl combine first wordsWithSpaces

combine :: String -> (String, Int) -> String
combine suffix (nextWord, numSpaces) =
  nextWord <> replicate numSpaces ' ' <> suffix

To make the final line, we can also leverage our combine function! It’s just a matter of combining each word in our input with the appropriate number of spaces. In this case, almost every word gets 1 space except for the last one (which comes first in our list). This just gets however many trailing spaces we need!

makeLineFinal :: [String] -> Int -> Int -> String
makeLineFinal [] _ _ = error "Cannot makeLine with empty string!"
makeLineFinal strs wordsInLine charsInLine =
  let trailingSpaces = maxWidth - charsInLine - (wordsInLine - 1)
  in  foldl combine "" (zip strs (trailingSpaces : repeat 1))

Putting all these pieces together, we have our complete solution!

justifyText :: [String] -> Int -> [String]
justifyText inputWords maxWidth = 
  reverse (makeLineFinal flLine fWordsInLine fCharsInLine : result)
  where
    (fLine, fWordsInLine, fCharsInLine, result) = foldl loop ([], 0, 0, []) inputWords

    loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
    loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
      let newWordLen = length newWord
          resultLine = makeLine currentLine wordsInLine charsInLine
      in  if newWordLen + charsInLine + wordsInLine > maxWidth
            then ([newWord], 1, newWordLen, resultLine : currResult)
            else (newWord : currentLine, wordsInLine + 1, charsInLine + newWordLen, currResult)

    makeLine :: [String] -> Int -> Int -> String
    makeLine [] _ _ = error "Cannot makeLine with empty string!"
    makeLine [onlyWord] _ charsInLine =
      let extraSpaces = replicate (maxWidth - charsInLine) ' '
      in  onlyWord <> extraSpaces
    makeLine (first : rest) wordsInLine charsInLine =
      let (baseNumSpaces, numWithExtraSpace) = quotRem (maxWidth - charsInLine) (wordsInLine - 1)
          baseSpaces = replicate (wordsInLine - 1 - numWithExtraSpace) baseNumSpaces
          extraSpaces = replicate numWithExtraSpace (baseNumSpaces + 1)
          wordsWithSpaces = zip rest (baseSpaces <> extraSpaces)
      in  foldl combine first wordsWithSpaces

    makeLineFinal :: [String] -> Int -> Int -> String
    makeLineFinal [] _ _ = error "Cannot makeLine with empty string!"
    makeLineFinal strs wordsInLine charsInLine =
      let trailingSpaces = maxWidth - charsInLine - (wordsInLine - 1)
      in  foldl combine "" (zip strs (trailingSpaces : repeat 1))

    combine :: String -> (String, Int) -> String
    combine suffix (nextWord, numSpaces) = nextWord <> replicate numSpaces ' ' <> suffix

Rust Solution

Now let’s put together our Rust solution. Since we have a reasonable outline from writing this in Haskell, let’s start with the simpler elements, makeLine and makeLineFinal. We’ll use library functions as much as possible for the string manipulation. For example, we can start makeLineFinal by using join on our input vector of strings.

pub fn make_line_final(
        currentLine: &Vec<&str>,
        max_width: usize,
        charsInLine: usize) -> String {
    let mut result = currentLine.join(" ");
    ...
}

Now we just need to calculate the number of trailing spaces, subtracting the number of characters in the joined string. We append this to the end by taking a blank space and using repeat for the correct number of times.

pub fn make_line_final(
        currentLine: &Vec<&str>,
        max_width: usize,
        charsInLine: usize) -> String {
    let mut result = currentLine.join(" ");
    let trailingSpaces = max_width - result.len();
    result.push_str(&" ".repeat(trailingSpaces));
    return result;
}

For those unfamiliar with Rust, the type of our input vector might seem odd. When we have &Vec<&str>, this means a reference to a vector of string slices. String slices are portions of a String that we hold a reference to, but they aren’t copied. However, when we join them, we make a new String result.

Also note that we aren’t passing wordsInLine as a separate parameter. We can get this value using .len() in constant time in Rust. In Haskell, length is O(n) so we don’t want to always do that.

Now for the general make_line function, we have the same type signature, but we start with our base case, where we only have one string in our current line. Again, we use repeat with the number of spaces.

pub fn make_line(
        currentLine: &Vec<&str>,
        max_width: usize,
        charsInLine: usize) -> String {
    let mut result = String::new();
    let n = currentLine.len();
    if (n == 1) {
        result.push_str(currentLine[0]);
        result.push_str(&" ".repeat(max_width - charsInLine));
        return result;
    }
    ...
}

Now we do the “math” portion of this. Rust doesn’t have a single quotRem function in its base library, so we calculate these values separately.

pub fn make_line(
        currentLine: &Vec<&str>,
        max_width: usize,
        charsInLine: usize) -> String {
    let mut result = String::new();
    let n = currentLine.len();
    if (n == 1) {
        result.push_str(currentLine[0]);
        result.push_str(&" ".repeat(max_width - charsInLine));
        return result;
    }
    let numSpaces = (max_width - charsInLine);
    let baseNumSpaces = numSpaces / (n - 1);
    let numWithExtraSpace = numSpaces % (n - 1);
    let mut i = 0;
    while i < n {
        ...
    }
    return result;
}

The while loop we’ll write here is instructive. We use an index instead of a for each pattern because the index tells us how many spaces to use. If our index is smaller than numWithExtraSpace, we add 1 to the base number of spaces. Otherwise we use the base until the index n - 1. This index has no extra spaces, so we’re done at that point!

pub fn make_line(
        currentLine: &Vec<&str>,
        max_width: usize,
        charsInLine: usize) -> String {
    let mut result = String::new();
    let n = currentLine.len();
    if (n == 1) {
        result.push_str(currentLine[0]);
        result.push_str(&" ".repeat(max_width - charsInLine));
        return result;
    }
    let numSpaces = (max_width - charsInLine);
    let baseNumSpaces = numSpaces / (n - 1);
    let numWithExtraSpace = numSpaces % (n - 1);
    let mut i = 0;
    while i < n {
        result.push_str(currentLine[i]);
        if i < numWithExtraSpace {
            result.push_str(&" ".repeat(baseNumSpaces + 1));
        } else if i < n - 1 {
            result.push_str(&" ".repeat(baseNumSpaces));
        }
        i += 1;
    }
    return result;
}

Now we frame our solution. Let’s start by setting up our state variables (again, omitting numWordsInLine). We’ll also redefine max_width as a usize value for ease of comparison later.

pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    ...
}

Now we’d like to frame our solution as a “for each” loop. However, this doesn’t work, for Rust-related reasons we’ll describe after the solution! Instead, we’ll use an index loop.

pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        ...
    }
}

We’ll get the word by index on each iteration, and use its length to see if we’ll exceed the max width. If not, we can safely push it onto currentLine and increase the character count:

pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        let word = &words[i];
        if word.len() + charsInLine + currentLine.len() > mw {
            ...
        } else {
            currentLine.push(&words[i]);
            charsInLine += word.len();
        }
    }
}

Now when we do exceed the max width, we have to push our current line onto result (calling make_line). We clear the current line, push our new word, and use its length for charsInLine.

pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        let word = &words[i];
        if word.len() + charsInLine + currentLine.len() > mw {
            result.push(make_line(&currentLine, mw, charsInLine));
            currentLine.clear();
            currentLine.push(&words[i]);
            charsInLine = word.len();
        } else {
            currentLine.push(&words[i]);
            charsInLine += word.len();
        }
    }
    ...
}

After our loop, we’ll just call make_line_final on whatever is left in our currentLine! Here’s our complete full_justify function that calls make_line and make_line_final as we wrote above:

pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        let word = &words[i];
        if word.len() + charsInLine + currentLine.len() > mw {
            result.push(make_line(&currentLine, mw, charsInLine));
            currentLine.clear();
            currentLine.push(&words[i]);
            charsInLine = word.len();
        } else {
            currentLine.push(&words[i]);
            charsInLine += word.len();
        }
    }
    result.push(make_line_final(&currentLine, mw, charsInLine));
    return result;
}

Why an Index Loop?

Inside our Rust loop, we have an odd pattern in getting the “word” for this iteration. We first assign word = &words[i], and then later on, when we push that word, we reference words[i] again, using currentLine.push(&words[i]).

Why do this? Why not currentLen.push(word)? And then, why can’t we just do for word in words as our loop?

If we write our loop as for word in words, then we cannot reference the value word after the loop. It is “scoped” to the loop. However, currentLine “outlives” the loop! We have to reference currentLine at the end when we make our final line.

To get around this, we would basically have to copy the word instead of using a string reference &str, but this is unnecessarily expensive.

These are the sorts of odd “lifetime” quirks you have to learn to deal with in Rust. Haskell is easier in that it spares us from thinking about this. But Rust gains a significant performance boost with these sorts of ideas.

Conclusion

This was definitely the most involved problem we’ve dealt with so far. We learned a new paradigm (buffer and save), and got some experience dealing with some of the odd quirks and edge cases of string manipulation, especially in Rust. It was a fairly tricky problem, as far as list manipulation goes. For an easier example of a buffer and save problem, try solving Merge Intervals.

If you want to level up your Haskell problem solving skills, you need to take our course Solve.hs. This course will teach you everything you need to know about problem solving, data structures, and algorithms in Haskell. After this course, you’ll be in great shape to deal with these sorts of LeetCode style problems as they come up in your projects.

James Bowen