Polymer Expansion

Oct 24

Today we're back with another Advent of Code walkthrough. We're doing the problem from Day 14 of last year. Here are a couple previous walkthroughs:

If you want to see the code for today, you can find it here on GitHub!

If you're enjoying these problem overviews, make sure to subscribe to our monthly newsletter!

Problem Statement

The subject of today's problem is "polymer expansion". What this means in programming terms is that we'll be taking a string and inserting new characters into it based on side-by-side pairs.

The puzzle input looks like this:

NNCB

NN -> C
NC -> B
CB -> H
...

The top line of the input is our "starter string". It's our base for expansion. The lines that follow are codes that explain how to expand each pair of characters.

So in our original string of four characters (NNCB), there are three pairs: NN, NC, and CB. With the exception of the start and end characters, each character appears in two different pairs. So for each pair, we find the corresponding "insertion character" and construct a new string where all the insertion characters come between their parent pairs. The first pair gives us a C, the second pair gives us a new B, and the third pair gets us a new H.

So our string for the second step becomes: NCNBCHB. We'll then repeat the expansion a certain number of times.

For the first part, we'll run 10 steps of the expansion algorithm. For the second part, we'll do 40 steps. Each time, our final answer comes from taking the number of occurrences of the most common letter in the final string, and subtracting the occurrences of the least common letter.

Utilities

The main utility we'll end up using for this problem is an occurrence map. I decided to make this general idea for counting the number of occurrences of some item, since it's such a common pattern in these puzzles. The most generic alias we could have is a map where the key and value are parameterized, though the expectation is that i is an Integral type:

type OccMapI a i = Map a i

The most common usage is counting items up from 0. Since this is an unsigned, non-negative number, we would use Word.

type OccMap a = Map a Word

However, for today's problem, we're gonna be dealing with big numbers! So just to be safe, we'll use the unbounded Integer type, and make a separate type definition for that.

type OccMapBig a = Map a Integer

We can make a couple useful helper functions for this occurrence map. First, we can add a certain number value to a key.

addKey :: (Ord a, Integral i) => OccMapI a i -> a -> i -> OccMapI a i
addKey prevMap key count = case M.lookup key prevMap of
    Nothing -> M.insert key count prevMap
    Just x -> M.insert key (x + count) prevMap

We can add a specialization of this for "incrementing" a key, adding 1 to its value. We won't use this for today's solution, but it helps in a lot of cases.

incKey :: (Ord a, Integral i) => OccMapI a i -> a -> OccMapI a i
incKey prevMap key = addKey prevMap key 1

Now with our utilities out of the way, let's start parsing our input!

Parsing the Input

First off, let's define the result types of our parsing process. The starter string comes on the first line, so that's a separate String. But then we need to create a mapping between character pairs and the resulting character. We'll eventually want these in a HashMap, so let's make a type alias for that.

type PairMap = HashMap (Char, Char) Char

Now for parsing, we need to parse the start string, an empty line, and then each line of the code mapping.

Since most of the input is in the code mapping lines, let's do that first. Each line consists of parsing three characters, just separated by the arrow. This is very straightforward with Megaparsec.

parsePairCode :: (MonadLogger m) => ParsecT Void Text m (Char, Char, Char)
parsePairCode = do
  input1 <- letterChar
  input2 <- letterChar
  string " -> "
  output <- letterChar
  return (input1, input2, output)

Now let's make a function to combine these character tuples into the map. This is a nice quick fold:

buildPairMap :: [(Char, Char, Char)] -> HashMap (Char, Char) Char
buildPairMap = foldl (\prevMap (c1, c2, c3) -> HM.insert (c1, c2) c3 prevMap) HM.empty

The rest of our parsing function then parses the starter string and a couple newline characters before we get our pair codes.

parseInput :: (MonadLogger m) => ParsecT Void Text m (String, PairMap)
parseInput = do
  starterCode <- some letterChar
  eol >> eol
  pairCodes <- sepEndBy1 parsePairCode eol
  return (starterCode, buildPairMap pairCodes)

Then it will be easy enough to use our parseFile function from previous days. Now let's figure out our solution approach.

A Naive Approach

Now at first, the polymer expansion seems like a fairly simple problem. The root of the issue is that we have to write a function to run one step of the expansion. In principle, this isn't a hard function. We loop through the original string, two letters at a time, and gradually construct the new string for the next step.

One way to handle this would be with a tail recursive helper function. We could accumulate the new string (in reverse) through an accumulator argument.

runExpand :: (MonadLogger m)
  => PairMap
  -> String -- Accumulator
  -> String -- Remaining String
  -> m String

The "base case" of this function is when we have only one character left. In this case, we append it to the accumulator and reverse it all.

runExpand :: (MonadLogger m) => PairMap -> String -> String -> m String
runExpand pairMap accum [lastChar] = return $ reverse (lastChar : accum)

For the recursive case, we imagine we have at least two characters remaining. We'll look these characters up in our map. Then we'll append the first character and the new character to our accumulator, and then recurse on the remainder (including the second character).

runExpand :: (MonadLogger m) => PairMap -> String -> String -> m String
runExpand _ accum [lastChar] = return $ reverse (lastChar : accum)
runExpand pairMap accum (firstChar: secondChar : rest) = do
  let insertChar = pairMap HM.! (nextChar, secondChar)
  runExpand pairMap (insertChar : firstChar : accum) (secondChar : rest)

There are some extra edge cases we could handle here, but this isn't going to be how we solve the problem. The approach works...in theory. In practice though, it only works for a small number of steps. Why? Well the problem description gives a hint: This polymer grows quickly. In fact, with each step, our string essentially doubles in size - exponential growth!

This sort of solution is good enough for the first part, running only 10 steps. However, as the string gets bigger and bigger, we'll run out of memory! So we need something more efficient.

A Better Approach

The key insight here is that we don't actually care about the order of the letters in the string at any given time. All we really need to think about is the number of each kind of pair that is present. How does this work?

Well recall some of our basic code pairs from the top:

NN -> C
NC -> B
CB -> H
BN -> B

With the starter string like NNCB, we have one NN pair, an NC pair, and CB pair. In the next step, the NN pair generates two new pairs. Because a C is inserted between the N, we lose the NN pair but gain a NC pair and a CN pair. So after expansion the number of resulting NC pairs is 1, and the number of CN pairs is 1.

However, this is true of every NN pair within our string! Suppose we instead start off this with:

NNCBNN

Now there are two NN pairs, meaning the resulting string will have two NC pairs and two CN pairs, as you can see by taking a closer look at the result: NCNBCHBBNCN.

So instead of keeping the complete string in memory, all we need to do is use the "occurrence map" utility to store the number of each pair for our current state. So we'll keep folding over an object of type OccMapBig (Char, Char).

The first step of our solution then is to construct our initial mapping from the starter code. We can do this by folding through the starter string in a similar way to the example code in the naive solution. We one or zero characters are left in our "remainder", that's a base case and we can return the map.

-- Same signature as naive approach
expandPolymerLong :: (MonadLogger m) => Int -> String -> PairMap -> m (Maybe Integer)
expandPolymerLong numSteps starterCode pairMap = do
  let starterMap = buildInitialMap M.empty starterCode
  ...
  where
    buildInitialMap :: OccMapBig (Char, Char) -> String -> OccMapBig (Char, Char)
    buildInitialMap prevMap "" = prevMap
    buildInitialMap prevMap [_] = prevMap
   ...

Now for the recursive case, we have at least two characters remaining, so we'll just increment the value for the key formed by these characters!

-- Same signature as naive approach
expandPolymerLong :: (MonadLogger m) => Int -> String -> PairMap -> m (Maybe Integer)
expandPolymerLong numSteps starterCode pairMap = do
  let starterMap = buildInitialMap M.empty starterCode
  ...
  where
    buildInitialMap :: OccMapBig (Char, Char) -> String -> OccMapBig (Char, Char)
    buildInitialMap prevMap "" = prevMap
    buildInitialMap prevMap [_] = prevMap
    buildInitialMap prevMap (firstChar : secondChar : rest) = buildInitialMap (incKey prevMap (firstChar, secondChar)) (secondChar : rest)

The key point, of course, is how to expand our map each step, so let's do this next!

A New Expansion

To run a single step in our naive solution, we could use a tail-recursive helper to gradually build up the new string (the "accumulator") from the old string (the "remainder" or "rest"). So our type signature looked like this:

runExpand :: (MonadLogger m)
  => PairMap
  -> String -- Accumulator
  -> String -- Remainder
  -> m String

For our new expansion step, we're instead taking one occurrence map and transforming it into a new occurrence map. For convenience, we'll include an integer argument keeping track of which step we're on, but we won't need to use it in the function. We'll do all this within expandPolymerLong so that we have access to the PairMap argument.

expandPolymerLong :: (MonadLogger m) => Int -> String -> PairMap -> m (Maybe Integer)
expandPolymerLong numSteps starterCode pairMap = do
  ...
  where
    runStep ::(MonadLogger m) => OccMapBig (Char, Char) -> Int -> m (OccMapBig (Char, Char))
    runStep = ...

The runStep function has a simple idea behind it though. We gradually reconstruct our occurrence map by folding through the pairs in the previous map. We'll make a new function runExpand to act as the folding function.

expandPolymerLong :: (MonadLogger m) => Int -> String -> PairMap -> m (Maybe Integer)
expandPolymerLong numSteps starterCode pairMap = do
  ...
  where
    runStep ::(MonadLogger m) => OccMapBig (Char, Char) -> Int -> m (OccMapBig (Char, Char))
    runStep prevMap _ = foldM runExpand M.empty (M.toList prevMap)

    runExpand :: (MonadLogger m) => OccMapBig (Char, Char) -> ((Char, Char), Integer) -> m (OccMapBig (Char, Char))
    runExpand = ...

For this function, we begin by looking up the two-character code in our map. If for whatever reason it doesn't exist, we'll move on, but it's worth logging an error message since this isn't supposed to happen.

runExpand :: (MonadLogger m) => OccMapBig (Char, Char) -> ((Char, Char), Integer) -> m (OccMapBig (Char, Char))
runExpand prevMap (code@(c1, c2), count) = case HM.lookup code pairMap of
  Nothing -> logErrorN ("Missing Code: " <> pack [c1, c2]) >> return prevMap
  Just newChar -> ...

Now once we've found the new character, we'll create our first new pair and our second new pair by inserting the new character with our previous characters.

runExpand :: (MonadLogger m) => OccMapBig (Char, Char) -> ((Char, Char), Integer) -> m (OccMapBig (Char, Char))
runExpand prevMap (code@(c1, c2), count) = case HM.lookup code pairMap of
  Nothing -> logErrorN ("Missing Code: " <> pack [c1, c2]) >> return prevMap
  Just newChar -> do
    let first = (c1, newChar)
        second = (newChar, c2)
  ...

And to wrap things up, we add the new count value for each of our new keys to the existing map! This is done with nested calls to addKey on our occurrence map.

runExpand :: (MonadLogger m) => OccMapBig (Char, Char) -> ((Char, Char), Integer) -> m (OccMapBig (Char, Char))
runExpand prevMap (code@(c1, c2), count) = case HM.lookup code pairMap of
  Nothing -> logErrorN ("Missing Code: " <> pack [c1, c2]) >> return prevMap
  Just newChar -> do
    let first = (c1, newChar)
        second = (newChar, c2)
  return $ addKey (addKey prevMap first count) second count

Rounding Up

Now we have our last task: finding the counts of the characters in the final string, and subtracting the minimum from the maximum. This requires us to first disassemble our mapping of pair counts into a mapping of individual character counts. This is another fold step. But just like before, we use nested calls to addKey on an occurrence map! See how countChars works below:

expandPolymerLong :: (MonadLogger m) => Int -> String -> PairMap -> m (Maybe Integer)
expandPolymerLong numSteps starterCode pairMap = do
  let starterMap = buildInitialMap M.empty starterCode
  finalOccMap <- foldM runStep starterMap [1..numSteps]
  let finalCharCountMap = foldl countChars M.empty (M.toList finalOccMap)
  ...
  where
    countChars :: OccMapBig Char -> ((Char, Char), Integer) -> OccMapBig Char
    countChars prevMap ((c1, c2), count) = addKey (addKey prevMap c1 count) c2 count

So we have a count of the characters in our final string...sort of. Recall that we added characters for each pair. Thus the number we're getting is basically doubled! So we want to divide each value by 2, with the exception of the first and last characters in the string. If these are the same, we have an edge case. We divide the number by 2 and then add an extra one. Otherwise, if a character has an odd value, it must be on the end, so we divide by two and round up. We sum up this logic with the quotRoundUp function, which we apply over our finalCharCountMap.

expandPolymerLong :: (MonadLogger m) => Int -> String -> PairMap -> m (Maybe Integer) expandPolymerLong numSteps starterCode pairMap = do let starterMap = buildInitialMap M.empty starterCode finalOccMap <- foldM runStep starterMap [1..numSteps] let finalCharCountMap = foldl countChars M.empty (M.toList finalOccMap) let finalCounts = map quotRoundUp (M.toList finalCharCountMap) ... where quotRoundUp :: (Char, Integer) -> Integer quotRoundUp (c, i) = if even i then quot i 2 + if head starterCode == c && last starterCode == c then 1 else 0 else quot i 2 + 1

And finally, we consider the list of outcomes and take the maximum minus the minimum!

```haskell
expandPolymerLong :: (MonadLogger m) => Int -> String -> PairMap -> m (Maybe Integer)
expandPolymerLong numSteps starterCode pairMap = do
  let starterMap = buildInitialMap M.empty starterCode
  finalOccMap <- foldM runStep starterMap [1..numSteps]
  let finalCharCountMap = foldl countChars M.empty (M.toList finalOccMap)
  let finalCounts = map quotRoundUp (M.toList finalCharCountMap)
  if null finalCounts
    then logErrorN "Final Occurrence Map is empty!" >> return Nothing
    else return $ Just $ fromIntegral (maximum finalCounts - minimum finalCounts)

  where
    buildInitialMap = ...
    runStep = ...
    runExpand = ...
    countChars = ...
    quotRoundUp = ...

Last of all, we combine input parsing with solving the problem. Our "easy" and "hard" solutions look the same, just with different numbers of steps.

solveDay14Easy :: String -> IO (Maybe Integer)
solveDay14Easy fp = runStdoutLoggingT $ do
  (starterCode, pairCodes) <- parseFile parseInput fp
  expandPolymerLong 10 starterCode pairCodes

solveDay14Hard :: String -> IO (Maybe Integer)
solveDay14Hard fp = runStdoutLoggingT $ do
  (starterCode, pairCodes) <- parseFile parseInput fp
  expandPolymerLong 40 starterCode pairCodes

Conclusion

Hopefully that solution makes sense to you! In case I left anything out of my solution, you can peruse the code on GitHub. Later this week, we'll have a video walkthrough of this solution!

If you're enjoying this content, make sure to subscribe to our monthly newsletter, which will also give you access to our Subscriber Resources!

In Depth

James Bowen