Day 6 - Parsing Unique Characters
Subscribe to Monday Morning Haskell!
Problem Overview
With today's problem, we're looping through a string and searching for the first sequence of a certain length with unique characters. For part 1, we have to find the index where our 4 most recent characters are all unique. For part 2, this number gets bumped to 14.
Relevant Utilities
This will be the first time we use an Occurrence Map (OccMap
) this year. A lot of problems rely on counting the occurrences of particular values. So I added a few wrappers and helpers to make this easy. So by using incKey
, we can bump up the stored value up by 1.
type OccMap a = OccMapI a Word
type OccMapI a i = Map a i
emptyOcc :: OccMap a
emptyOcc = M.empty
incKey :: (Ord a, Integral i) => OccMapI a i -> a -> OccMapI a i
incKey prevMap key = addKey prevMap key 1
decKey :: (Ord a, Integral i) => OccMapI a i -> a -> OccMapI a i
decKey prevMap key = case M.lookup key prevMap of
Nothing -> prevMap
Just 0 -> M.delete key prevMap
Just 1 -> M.delete key prevMap
Just x -> M.insert key (x - 1) prevMap
addKey :: (Ord a, Integral i) => OccMapI a i -> a -> i -> OccMapI a i
addKey prevMap key count = case M.lookup key prevMap of
Nothing -> M.insert key count prevMap
Just x -> M.insert key (x + count) prevMap
In this solution, we'll also use decKey
. Note that we delete
the key if the count gets down to 0. This will be important in our problem!
Solution Approach and Insights
When I initially approached this problem, I made a custom data type and stored the different characters as individual elements. This worked fine for 3 characters, but it was cumbersome for 14. So I rewrote the solution more generically. We track the most recent characters we've seen in two different structures simultaneously.
First, we use a sequence to track the order we received them, so that with each iteration, we'll drop one character from the front and add a new one to the back.
We'll also use an occurrence map to track the counts for each character type in the last 4 (or 14). We'll increment a character's key when it is added, and decrement when it is removed from the front. If at any point we have 14 keys in our occurrence map, we're done!
Parsing the Input
Today we're only parsing a string:
mjqjpqmgbljsphdztnvjfqwrcgsmlb
So the parser is trivial:
type InputType = String
parseInput :: (MonadLogger m) => ParsecT Void Text m InputType
parseInput = some letterChar
Getting the Solution
To solve our problem, we're going to need one primary function to process the characters. We'll parameterize this by the number of characters we need for a unique code. If we don't have enough characters to reach the unique number, we'll log an error and return the max integer.
processChars :: (MonadLogger m) => Int -> [Char] -> m Int
processChars numCharsNeeded input = if length input < numCharsNeeded
then logErrorN "Not enough chars!" >> return maxBound
else ...
Now we need to initialize our structures. We'll split the input string into its first part (up to the number of unique characters) and the rest. The first characters will go into a sequence as well as our occurrence map.
processChars :: (MonadLogger m) => Int -> [Char] -> m Int
processChars numCharsNeeded input = if length input < numCharsNeeded
then logErrorN "Not enough chars!" >> return maxBound
else do
let (firstChars, rest) = splitAt (numCharsNeeded - 1) input
seq = Seq.fromList firstChars
occ = foldl incKey emptyOcc firstChars
...
Now we need our recursive helper. This function will also be parameterized by the number of characters needed. The "state" for the helper will have an Int
for the current index we're at in the string. We'll also have the current queue of characters, as well as the occurrence map for the counts of each character.
Now for implementation, starting with the "base" case. This function should never reach the end of the input. If it does, we'll handle this error case in the same way as above.
processTail :: (MonadLogger m) => Int -> (Int, Seq.Seq Char, OccMap Char) -> [Char] -> m Int
processTail _ _ [] = logErrorN "No remaining chars!" >> return maxBound
processTail numCharsNeeded (count, seq, occ) (c : cs) = ...
Now break off the first piece of the sequence using Seq.viewl
so that we'll be able to modify the sequence later. We have another error case that should never be tripped.
processTail numCharsNeeded (count, seq, occ) (c : cs) = case Seq.viewl seq of
Seq.EmptyL -> logErrorN "Sequence is empty!" >> return maxBound
(first Seq.:< rest) -> do
...
Here's where we do the calculation. First, increment the value for our new character c
. At this point, we can check the size of our occurrence map. If it equals the number of characters we need, we're done! We can return the current count value.
Otherwise we'll recurse. We add the new character to the end of the queue and we decrement the occurrence map for the character we removed.
processTail :: (MonadLogger m) => Int -> (Int, Seq.Seq Char, OccMap Char) -> [Char] -> m Int
processTail _ _ [] = logErrorN "No remaining chars!" >> return maxBound
processTail numCharsNeeded (count, seq, occ) (c : cs) = case Seq.viewl seq of
Seq.EmptyL -> logErrorN "Sequence is empty!" >> return maxBound
(first Seq.:< rest) -> do
let occ' = incKey occ c
if M.size occ' == numCharsNeeded
then return count
else processTail numCharsNeeded (count + 1, rest Seq.|> c, decKey occ' first) cs
And now we just plug in the call to this helper into our original function!
processChars :: (MonadLogger m) => Int -> [Char] -> m Int
processChars numCharsNeeded input = if length input < numCharsNeeded
then logErrorN "Not enough chars!" >> return maxBound
else do
let (firstChars, rest) = splitAt (numCharsNeeded - 1) input
seq = Seq.fromList firstChars
occ = foldl incKey emptyOcc firstChars
processTail numCharsNeeded (numCharsNeeded, seq, occ) rest
Answering the Question
Now answering the questions is quite easy. We parameterize the calls with the different length values.
processInputEasy :: (MonadLogger m) => InputType -> m EasySolutionType
processInputEasy = processChars 4
processInputHard :: (MonadLogger m) => InputType -> m HardSolutionType
processInputHard = processChars 14
solveEasy :: FilePath -> IO (Maybe Int)
solveEasy fp = runStdoutLoggingT $ do
input <- parseFile parseInput fp
Just <$> processInputEasy input
solveHard :: FilePath -> IO (Maybe Int)
solveHard fp = runStdoutLoggingT $ do
input <- parseFile parseInput fp
Just <$> processInputHard input
And we're done!