Binary Packet Parsing
Today we're back with a new problem walkthrough, this time from Day 16 of last year's Advent of Code. In some sense, the parsing section for this problem is very easy - there's not much data to read from the file. In another sense, it's actually rather hard! This problem is about parsing a binary format, similar in some sense to how network packets work. It's a good exercise in handling a few different kinds of recursive cases.
As with the previous parts of this series, you can take a look at the code on GitHub here. This problem also has quite a few utilities, so you can observe those as well. This article is a deep-dive code walkthrough, so having the code handy to look at might be a good idea!
Problem Description
For this problem, we're decoding a binary packet. The packet is initially given as a hexadecimal string.
A0016C880162017C3686B18A3D4780
But we'll turn it into binary and start working strictly with ones and zeros. However, the decoding process gets complicated because the packet is structured in a recursive way. But let's go over some of the rules.
Packet Header
Every packet has a six-bit header. The first three bits give a "version number" for the packet. The next three bits give a "type ID". That part's easy.
Then there are a series of rules about the rest of the information in the packet.
Literals
If the type ID is 4, the packet is a "literal". We then parse the remainder of the packet in 5-bit chunks. The first bit tells us if it is the last chunk of the packet (0 means yes, 1 means there are more chunks). The four other bits in the chunk are used to construct the binary number that forms the "value" of the literal. The more chunks, the higher the number can be.
Operator Sizes
Packets that aren't literals are operators. This means they contain a variable number of subpackets.
Operators have one bit (after the 6-bit header) giving a "length" type. A length type of "1" tells us that the following 11 bits give the number of subpackets. If the length bit is "0", then the next 15 bits give the length of all the subpackets in bits.
The Packet Structure
We'll see how these work out as we parse them. But with this structure in mind, one thing we can immediately do is come up with a recursive data type for a packet. I ended up calling this PacketNode
since I thought of each as a node in a tree. It's pretty easy to see how to do this. We start with a base constructor for a Literal
packet that only stores the version and the packet value. Then we just add an Operator
constructor that will have a list of subpackets as well as a field for the operator type.
data PacketNode =
Literal Word8 Word64 |
Operator Word8 Word8 [PacketNode]
deriving (Show)
Once we've parsed the packet, the "questions to answer" are, for the easy part, to take the sum of all the packet versions in our packet, and then to actually calculate the packet value recursively for the hard part. When we get to that part, we'll see how we use the operators to determine the value.
Solution Approach
The initial "parsing" part of this problem is actually quite easy. But we can observe that even after we have our binary values, it's still a parsing problem! We'll have an easy enough time answering the question once we've parsed our input into a PacketNode
. So the core of the problem is parsing the ones and zeros into our PacketNode
.
Since this is a parsing problem, we can actually use Megaparsec
for the second part, instead of only for getting the input out of the file. Here's a possible signature for our core function:
-- More on this type later
data Bit = One | Zero
parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m PacketNode
Whereas we normally use Text
as the second type parameter to ParsecT
, we can also use any list type, and the library will know what to do! With this function, we'll eventually be able to break our solution into its different parts. But first, we should start with some useful helpers for all our binary parsing.
Binary Utilities
Binary logic comes up fairly often in Advent of Code, and there are quite a few different utilities we would want to use with these ones and zeros. We start with a data type to represent a single bit. For maximum efficiency, we'd want to use a BitVector
, but we aren't too worried about that. So we'll make a simple type with two constructors.
data Bit = Zero | One
deriving (Eq, Ord)
instance Show Bit where
show Zero = "0"
show One = "1"
Our first order of business is turning a hexadecimal character into a list of bits. Hexadecimal numbers encapsulate 4 bits. So, for example, 0
should be [Zero, Zero, Zero, Zero]
, 1
should be [Zero, Zero, Zero, One]
, and F
should be [One, One, One, One]
. This is a simple pattern match, but we'll also have a failure case.
parseHexChar :: (MonadLogger m) => Char -> MaybeT m [Bit]
parseHexChar '0' = return [Zero, Zero, Zero, Zero]
parseHexChar '1' = return [Zero, Zero, Zero, One]
parseHexChar '2' = return [Zero, Zero, One, Zero]
parseHexChar '3' = return [Zero, Zero, One, One]
parseHexChar '4' = return [Zero, One, Zero, Zero]
parseHexChar '5' = return [Zero, One, Zero, One]
parseHexChar '6' = return [Zero, One, One, Zero]
parseHexChar '7' = return [Zero, One, One, One]
parseHexChar '8' = return [One, Zero, Zero, Zero]
parseHexChar '9' = return [One, Zero, Zero, One]
parseHexChar 'A' = return [One, Zero, One, Zero]
parseHexChar 'B' = return [One, Zero, One, One]
parseHexChar 'C' = return [One, One, Zero, Zero]
parseHexChar 'D' = return [One, One, Zero, One]
parseHexChar 'E' = return [One, One, One, Zero]
parseHexChar 'F' = return [One, One, One, One]
parseHexChar c = logErrorN ("Invalid Hex Char: " <> pack [c]) >> mzero
If we wanted, we could also include lowercase, but this problem doesn't require it.
We also want to be able to turn a list of bits into a decimal number. We'll do this for a couple different sizes of numbers. For smaller numbers (8 bits or below), we might want to return a Word8
. For larger numbers we can do Word64
. Calculating the decimal number is a tail recursive process, where we track the accumulated sum and the current power of 2.
bitsToDecimal8 :: [Bit] -> Word8
bitsToDecimal8 bits = if length bits > 8
then error ("Too long! Use bitsToDecimal64! " ++ show bits)
else btd8 0 1 (reverse bits)
where
btd8 :: Word8 -> Word8 -> [Bit] -> Word8
btd8 accum _ [] = accum
btd8 accum mult (b : rest) = case b of
Zero -> btd8 accum (mult * 2) rest
One -> btd8 (accum + mult) (mult * 2) rest
bitsToDecimal64 :: [Bit] -> Word64
bitsToDecimal64 bits = if length bits > 64
then error ("Too long! Use bitsToDecimalInteger! " ++ (show $ bits))
else btd64 0 1 (reverse bits)
where
btd64 :: Word64 -> Word64 -> [Bit] -> Word64
btd64 accum _ [] = accum
btd64 accum mult (b : rest) = case b of
Zero -> btd64 accum (mult * 2) rest
One -> btd64 (accum + mult) (mult * 2) rest
Last of all, we should write a parser for reading a hexadecimal string from our file. This is easy, because Megaparsec already has a parser for a single hexadecimal character.
parseHexadecimal :: (MonadLogger m) => ParsecT Void Text m String
parseHexadecimal = some hexDigitChar
Basic Bit Parsing
With all these utilities in place, we can get started with parsing our list of bits. As mentioned above, we want a function that generally looks like this:
parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m PacketNode
However, we need one extra nuance. Because we have one layer that will parse several consecutive packets based on the number of bits parsed, we should also return this number as part of our function. In this way, we'll be able to determine if we're done with the subpackets of an operator packet.
parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m (PacketNode, Word64)
We'll also want a wrapper around this function so we can call it from a normal context with the list of bits as the input. This looks a lot like the existing utilities (e.g. for parsing a whole file). We use runParserT
from Megaparsec and do a case-branch on the result.
parseBits :: (MonadLogger m) => [Bit] -> MaybeT m PacketNode
parseBits bits = do
result <- runParserT parsePacketNode "Utils.hs" bits
case result of
Left e -> logErrorN ("Failed to parse: " <> (pack . show $ e)) >> mzero
Right (packet, _) -> return packet
We ignore the "size" of the parsed packet in the primary case, but we'll use its result in the recursive calls to parsePacketNode
!
Having done this, we can now start writing basic parser functions. To parse a single bit, we'll just wrap the anySingle
combinator from Megaparsec.
parseBit :: ParsecT Void [Bit] m Bit
parseBit = anySingle
If we want to parse a certain number of bits, we'll want to use the monadic count
combinator. Let's write a function that parses three bits and turns it into a Word8
, since we'll need this for the packet version and type ID.
parse3Bit :: ParsecT Void [Bit] m Word8
parse3Bit = bitsToDecimal8 <$> count 3 parseBit
We can then immediately use this to start filling in our parsing function!
parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m (PacketNode, Word64)
parsePacketNode = do
packetVersion <- parse3Bit
packetTypeId <- parse3Bit
...
Then the rest of the function will depend upon the different cases we might parse.
Parsing a Literal
We can start with the "literal" case. This parses the "value" contained within the packet. We need to track the number of bits we parse so we can use this result in our parent function!
parseLiteral :: ParsecT Void [Bit] m (Word64, Word64)
As explained above, we examine chunks 5 bits at a time, and we end the packet once we have a chunk that starts with 0. This is a "while" loop pattern, which suggests tail recursion as our solution!
We'll have two accumulator arguments. First, the series of bits that contribute to our literal value. Second, the number of bits we've parsed so far (which must include the signal bit).
parseLiteral :: ParsecT Void [Bit] m (Word64, Word64)
parseLiteral = parseLiteralTail [] 0
where
parseLiteralTail :: [Bit] -> Word64 -> ParsecT Void [Bit] m (Word64, Word64)
parseLiteralTail accumBits numBits = do
...
First, we'll parse the leading bit, followed by the four bits in the chunk value. We append these to our previously accumulated bits, and add 5 to the number of bits parsed:
parseLiteral :: ParsecT Void [Bit] m (Word64, Word64)
parseLiteral = parseLiteralTail [] 0
where
parseLiteralTail :: [Bit] -> Word64 -> ParsecT Void [Bit] m (Word64, Word64)
parseLiteralTail accumBits numBits = do
leadingBit <- parseBit
nextBits <- count 4 parseBit
let accum' = accumBits ++ nextBits
let numBits' = numBits + 5
...
If the leading bit is 0, we're done! We can return our value by converting our accumulated bits to decimal. Otherwise, we recurse with our new values.
parseLiteral :: ParsecT Void [Bit] m (Word64, Word64)
parseLiteral = parseLiteralTail [] 0
where
parseLiteralTail :: [Bit] -> Word64 -> ParsecT Void [Bit] m (Word64, Word64)
parseLiteralTail accumBits numBits = do
leadingBit <- parseBit
nextBits <- count 4 parseBit
let accum' = accumBits ++ nextBits
let numBits' = numBits + 5
if leadingBit == Zero
then return (bitsToDecimal64 accum', numBits')
else parseLiteralTail accum' numBits'
Then it's very easy to incorporate this into our primary function. We check the type ID, and if it's "4" (for a literal), we call this function, and return with the Literal
packet constructor.
parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m (PacketNode, Word64)
parsePacketNode = do
packetVersion <- parse3Bit
packetTypeId <- parse3Bit
if packetTypeId == 4
then do
(literalValue, literalBits) <- parseLiteral
return (Literal packetVersion literalValue, literalBits + 6)
else
...
Now we need to consider the "operator" cases and their subpackets.
Parsing from Number of Packets
We'll start with the simpler of these two cases, which is when we are parsing a specific number of subpackets. The first step, of course, is to parse the length type bit.
parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m (PacketNode, Word64)
parsePacketNode = do
packetVersion <- parse3Bit
packetTypeId <- parse3Bit
if packetTypeId == 4
then do
(literalValue, literalBits) <- parseLiteral
return (Literal packetVersion literalValue, literalBits + 6)
else do
lengthTypeId <- parseBit
if lengthTypeId == One
then do
...
First, we have to count out 11 bits and use that to determine the number of subpackets. Once we have this number, we just have to recursively call the parsePacketNode
function the given number of times.
parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m (PacketNode, Word64)
parsePacketNode = do
...
if packetTypeId == 4
then ...
else do
lengthTypeId <- parseBit
if lengthTypeId == One
then do
numberOfSubpackets <- bitsToDecimal64 <$> count 11 parseBit
subPacketsWithLengths <- replicateM (fromIntegral numberOfSubpackets) parsePacketNode
...
We'll unzip these results to get our list of packets and the lengths. To get our final packet length, we take the sum of the sizes, but we can't forget to add the header bits and the length type bit (7 bits), and the bits from the number of subpackets (11).
parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m (PacketNode, Word64)
parsePacketNode = do
...
if packetTypeId == 4
then ...
else do
lengthTypeId <- parseBit
if lengthTypeId == One
then do
numberOfSubpackets <- bitsToDecimal64 <$> count 11 parseBit
subPacketsWithLengths <- replicateM (fromIntegral numberOfSubpackets) parsePacketNode
let (subPackets, lengths) = unzip subPacketsWithLengths
return (Operator packetVersion packetTypeId subPackets, sum lengths + 7 + 11)
else
Parsing from Number of Bits
Parsing based on the number of bits in all the subpackets is a little more complicated, because we have more state to track. As we loop through the different subpackets, we need to keep track of how many bits we still have to parse. So we'll make a separate helper function.
parseForPacketLength :: (MonadLogger m) => Int -> Word64 -> [PacketNode] -> ParsecT Void [Bit] m ([PacketNode], Word64)
parseForPacketLength remainingBits accumBits prevPackets = ...
The base case comes when we have 0 bits remaining. Ideally, this occurs with exactly 0 bits. If it's a negative number, this is a problem. But if it's successful, we'll reverse the accumulated packets and return the number of bits we've accumulated.
parseForPacketLength :: (MonadLogger m) => Int -> Word64 -> [PacketNode] -> ParsecT Void [Bit] m ([PacketNode], Word64)
parseForPacketLength remainingBits accumBits prevPackets = if remainingBits <= 0
then do
if remainingBits < 0
then error "Failing"
else return (reverse prevPackets, accumBits)
else ...
In the recursive case, we make one new call to parsePacketNode
(the original function, not this helper). This gives us a new packet, and some more bits that we've parsed (this is why we've been tracking that number the whole time). So we can subtract the size from the remaining bits, and add it to the accumulated bits. And then we'll make the actual recursive call to this helper function.
parseForPacketLength :: (MonadLogger m) => Int -> Word64 -> [PacketNode] -> ParsecT Void [Bit] m ([PacketNode], Word64)
parseForPacketLength remainingBits accumBits prevPackets = if remainingBits <= 0
then do
if remainingBits < 0
then error "Failing"
else return (reverse prevPackets, accumBits)
else do
(newPacket, size) <- parsePacketNode
parseForPacketLength (remainingBits - fromIntegral size) (accumBits + fromIntegral size) (newPacket : prevPackets)
And that's all! All our different pieces fit together now and we're able to parse our packet!
Solving the Problems
Now that we've parsed the packet into our structure, the rest of the problem is actually quite easy and fun! We've created a straightforward recursive structure, and so we can loop through it in a straightforward recursive way. We'll just always use the Literal
as the base case, and then loop through the list of packets for the base case.
Let's start with summing the packet versions. This will return a Word64
since we could be adding a lot of package versions. With a Literal
package, we just immediately return the version.
sumPacketVersions :: PacketNode -> Word64
sumPacketVersions (Literal v _) = fromIntegral v
...
Then with operator packets, we just map over the sub-packets, take the sum of their versions, and then add the original packet's version.
sumPacketVersions :: PacketNode -> Word64
sumPacketVersions (Literal v _) = fromIntegral v
sumPacketVersions (Operator v _ packets) = fromIntegral v +
sum (map sumPacketVersions packets)
Now, for calculating the final packet value, we again start with the Literal
case, since we'll just return its value. Note that we'll do this monadically, since we'll have some failure conditions in the later parts.
calculatePacketValue :: MonadLogger m => PacketNode -> MaybeT m Word64
calculatePacketValue (Literal _ x) = return x
Now, for the first time in the problem, we actually have to care what the operators mean! Here's a summary of the first few operators:
0 = Sum of all subpackets
1 = Product of all subpackets
2 = Minimum of all subpackets
3 = Maximum of all subpackets
There are three other operators following the same basic pattern. They expect exactly two subpackets and perform a binary, boolean operator. If it is true, the value is 1. If the operation is false, the packet value is 0.
5 = Greater than operator (<)
6 = Less than operator (>)
7 = Equals operator (==)
For the first set of operations, we can recursively calculate the value of the sub-packets, and take the appropriate aggregate function over the list.
calculatePacketValue :: MonadLogger m => PacketNode -> MaybeT m Word64
calculatePacketValue (Literal _ x) = return x
calculatePacketValue (Operator _ 0 packets) = sum <$> mapM calculatePacketValue packets
calculatePacketValue (Operator _ 1 packets) = product <$> mapM calculatePacketValue packets
calculatePacketValue (Operator _ 2 packets) = minimum <$> mapM calculatePacketValue packets
calculatePacketValue (Operator _ 3 packets) = maximum <$> mapM calculatePacketValue packets
...
For the binary operations, we first have to verify that there are only two packets.
calculatePacketValue :: MonadLogger m => PacketNode -> MaybeT m Word64
...
calculatePacketValue (Operator _ 5 packets) = do
if length packets /= 2
then logErrorN "> operator '5' must have two packets!" >> mzero
else ...
Then we just de-structure the packets, calculate each value, compare them, and then return the appropriate value.
calculatePacketValue :: MonadLogger m => PacketNode -> MaybeT m Word64
...
calculatePacketValue (Operator _ 5 packets) = do
if length packets /= 2
then logErrorN "> operator '5' must have two packets!" >> mzero
else do
let [p1, p2] = packets
v1 <- calculatePacketValue p1
v2 <- calculatePacketValue p2
return (if v1 > v2 then 1 else 0)
calculatePacketValue (Operator _ 6 packets) = do
if length packets /= 2
then logErrorN "< operator '6' must have two packets!" >> mzero
else do
let [p1, p2] = packets
v1 <- calculatePacketValue p1
v2 <- calculatePacketValue p2
return (if v1 < v2 then 1 else 0)
calculatePacketValue (Operator _ 7 packets) = do
if length packets /= 2
then logErrorN "== operator '7' must have two packets!" >> mzero
else do
let [p1, p2] = packets
v1 <- calculatePacketValue p1
v2 <- calculatePacketValue p2
return (if v1 == v2 then 1 else 0)
calculatePacketValue p = do
logErrorN ("Invalid packet! " <> (pack . show $ p))
mzero
Concluding Code
To tie everything together, we just follow the steps.
- Parse the hexadecimal from the file
- Transform the hexadecimal string into a list of bits
- Parse the packet
- Answer the question
For the first part, we use sumPacketVersions
on the resulting packet.
solveDay16Easy :: String -> IO (Maybe Int)
solveDay16Easy fp = runStdoutLoggingT $ do
hexLine <- parseFile parseHexadecimal fp
result <- runMaybeT $ do
bitLine <- concatMapM parseHexChar hexLine
packet <- parseBits bitLine
return $ sumPacketVersions packet
return (fromIntegral <$> result)
And the "hard" solution is the same, except we use calculatePacketValue
instead.
solveDay16Hard :: String -> IO (Maybe Int)
solveDay16Hard fp = runStdoutLoggingT $ do
hexLine <- parseFile parseHexadecimal fp
result <- runMaybeT $ do
bitLine <- concatMapM parseHexChar hexLine
packet <- parseBits bitLine
calculatePacketValue packet
return (fromIntegral <$> result)
And we're done!
Conclusion
That's all for this solution! As always, you can take a look at the code on GitHub. Later this week I'll have the video walkthrough as well. To keep up with all the latest news, make sure to subscribe to our monthly newsletter! Subscribing will give you access to our subscriber resources, like our Beginners Checklist and our Production Checklist.