Binary Packet Parsing

Oct 31

Today we're back with a new problem walkthrough, this time from Day 16 of last year's Advent of Code. In some sense, the parsing section for this problem is very easy - there's not much data to read from the file. In another sense, it's actually rather hard! This problem is about parsing a binary format, similar in some sense to how network packets work. It's a good exercise in handling a few different kinds of recursive cases.

As with the previous parts of this series, you can take a look at the code on GitHub here. This problem also has quite a few utilities, so you can observe those as well. This article is a deep-dive code walkthrough, so having the code handy to look at might be a good idea!

Problem Description

For this problem, we're decoding a binary packet. The packet is initially given as a hexadecimal string.

A0016C880162017C3686B18A3D4780

But we'll turn it into binary and start working strictly with ones and zeros. However, the decoding process gets complicated because the packet is structured in a recursive way. But let's go over some of the rules.

Packet Header

Every packet has a six-bit header. The first three bits give a "version number" for the packet. The next three bits give a "type ID". That part's easy.

Then there are a series of rules about the rest of the information in the packet.

Literals

If the type ID is 4, the packet is a "literal". We then parse the remainder of the packet in 5-bit chunks. The first bit tells us if it is the last chunk of the packet (0 means yes, 1 means there are more chunks). The four other bits in the chunk are used to construct the binary number that forms the "value" of the literal. The more chunks, the higher the number can be.

Operator Sizes

Packets that aren't literals are operators. This means they contain a variable number of subpackets.

Operators have one bit (after the 6-bit header) giving a "length" type. A length type of "1" tells us that the following 11 bits give the number of subpackets. If the length bit is "0", then the next 15 bits give the length of all the subpackets in bits.

The Packet Structure

We'll see how these work out as we parse them. But with this structure in mind, one thing we can immediately do is come up with a recursive data type for a packet. I ended up calling this PacketNode since I thought of each as a node in a tree. It's pretty easy to see how to do this. We start with a base constructor for a Literal packet that only stores the version and the packet value. Then we just add an Operator constructor that will have a list of subpackets as well as a field for the operator type.

data PacketNode =
  Literal Word8 Word64 |
  Operator Word8 Word8 [PacketNode]
  deriving (Show)

Once we've parsed the packet, the "questions to answer" are, for the easy part, to take the sum of all the packet versions in our packet, and then to actually calculate the packet value recursively for the hard part. When we get to that part, we'll see how we use the operators to determine the value.

Solution Approach

The initial "parsing" part of this problem is actually quite easy. But we can observe that even after we have our binary values, it's still a parsing problem! We'll have an easy enough time answering the question once we've parsed our input into a PacketNode. So the core of the problem is parsing the ones and zeros into our PacketNode.

Since this is a parsing problem, we can actually use Megaparsec for the second part, instead of only for getting the input out of the file. Here's a possible signature for our core function:

-- More on this type later
data Bit = One | Zero

parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m PacketNode

Whereas we normally use Text as the second type parameter to ParsecT, we can also use any list type, and the library will know what to do! With this function, we'll eventually be able to break our solution into its different parts. But first, we should start with some useful helpers for all our binary parsing.

Binary Utilities

Binary logic comes up fairly often in Advent of Code, and there are quite a few different utilities we would want to use with these ones and zeros. We start with a data type to represent a single bit. For maximum efficiency, we'd want to use a BitVector, but we aren't too worried about that. So we'll make a simple type with two constructors.

data Bit = Zero | One
  deriving (Eq, Ord)

instance Show Bit where
  show Zero = "0"
  show One = "1"

Our first order of business is turning a hexadecimal character into a list of bits. Hexadecimal numbers encapsulate 4 bits. So, for example, 0 should be [Zero, Zero, Zero, Zero], 1 should be [Zero, Zero, Zero, One], and F should be [One, One, One, One]. This is a simple pattern match, but we'll also have a failure case.

parseHexChar :: (MonadLogger m) => Char -> MaybeT m [Bit]
parseHexChar '0' = return [Zero, Zero, Zero, Zero]
parseHexChar '1' = return [Zero, Zero, Zero, One]
parseHexChar '2' = return [Zero, Zero, One, Zero]
parseHexChar '3' = return [Zero, Zero, One, One]
parseHexChar '4' = return [Zero, One, Zero, Zero]
parseHexChar '5' = return [Zero, One, Zero, One]
parseHexChar '6' = return [Zero, One, One, Zero]
parseHexChar '7' = return [Zero, One, One, One]
parseHexChar '8' = return [One, Zero, Zero, Zero]
parseHexChar '9' = return [One, Zero, Zero, One]
parseHexChar 'A' = return [One, Zero, One, Zero]
parseHexChar 'B' = return [One, Zero, One, One]
parseHexChar 'C' = return [One, One, Zero, Zero]
parseHexChar 'D' = return [One, One, Zero, One]
parseHexChar 'E' = return [One, One, One, Zero]
parseHexChar 'F' = return [One, One, One, One]
parseHexChar c = logErrorN ("Invalid Hex Char: " <> pack [c]) >> mzero

If we wanted, we could also include lowercase, but this problem doesn't require it.

We also want to be able to turn a list of bits into a decimal number. We'll do this for a couple different sizes of numbers. For smaller numbers (8 bits or below), we might want to return a Word8. For larger numbers we can do Word64. Calculating the decimal number is a tail recursive process, where we track the accumulated sum and the current power of 2.

bitsToDecimal8 :: [Bit] -> Word8
bitsToDecimal8 bits = if length bits > 8
  then error ("Too long! Use bitsToDecimal64! " ++ show bits)
  else btd8 0 1 (reverse bits)
    where
      btd8 :: Word8 -> Word8 -> [Bit] -> Word8
      btd8 accum _ [] = accum
      btd8 accum mult (b : rest) = case b of
        Zero -> btd8 accum (mult * 2) rest
        One -> btd8 (accum + mult) (mult * 2) rest

bitsToDecimal64 :: [Bit] -> Word64
bitsToDecimal64 bits = if length bits > 64
  then error ("Too long! Use bitsToDecimalInteger! " ++ (show $ bits))
  else btd64 0 1 (reverse bits)
    where
      btd64 :: Word64 -> Word64 -> [Bit] -> Word64
      btd64 accum _ [] = accum
      btd64 accum mult (b : rest) = case b of
        Zero -> btd64 accum (mult * 2) rest
        One -> btd64 (accum + mult) (mult * 2) rest

Last of all, we should write a parser for reading a hexadecimal string from our file. This is easy, because Megaparsec already has a parser for a single hexadecimal character.

parseHexadecimal :: (MonadLogger m) => ParsecT Void Text m String
parseHexadecimal = some hexDigitChar

Basic Bit Parsing

With all these utilities in place, we can get started with parsing our list of bits. As mentioned above, we want a function that generally looks like this:

parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m PacketNode

However, we need one extra nuance. Because we have one layer that will parse several consecutive packets based on the number of bits parsed, we should also return this number as part of our function. In this way, we'll be able to determine if we're done with the subpackets of an operator packet.

parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m (PacketNode, Word64)

We'll also want a wrapper around this function so we can call it from a normal context with the list of bits as the input. This looks a lot like the existing utilities (e.g. for parsing a whole file). We use runParserT from Megaparsec and do a case-branch on the result.

parseBits :: (MonadLogger m) => [Bit] -> MaybeT m PacketNode
parseBits bits = do
  result <- runParserT parsePacketNode "Utils.hs" bits
  case result of
    Left e -> logErrorN ("Failed to parse: " <> (pack . show $ e)) >> mzero
    Right (packet, _) -> return packet

We ignore the "size" of the parsed packet in the primary case, but we'll use its result in the recursive calls to parsePacketNode!

Having done this, we can now start writing basic parser functions. To parse a single bit, we'll just wrap the anySingle combinator from Megaparsec.

parseBit :: ParsecT Void [Bit] m Bit
parseBit = anySingle

If we want to parse a certain number of bits, we'll want to use the monadic count combinator. Let's write a function that parses three bits and turns it into a Word8, since we'll need this for the packet version and type ID.

parse3Bit :: ParsecT Void [Bit] m Word8
parse3Bit = bitsToDecimal8 <$> count 3 parseBit

We can then immediately use this to start filling in our parsing function!

parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m (PacketNode, Word64)
parsePacketNode = do
  packetVersion <- parse3Bit
  packetTypeId <- parse3Bit
  ...

Then the rest of the function will depend upon the different cases we might parse.

Parsing a Literal

We can start with the "literal" case. This parses the "value" contained within the packet. We need to track the number of bits we parse so we can use this result in our parent function!

parseLiteral :: ParsecT Void [Bit] m (Word64, Word64)

As explained above, we examine chunks 5 bits at a time, and we end the packet once we have a chunk that starts with 0. This is a "while" loop pattern, which suggests tail recursion as our solution!

We'll have two accumulator arguments. First, the series of bits that contribute to our literal value. Second, the number of bits we've parsed so far (which must include the signal bit).

parseLiteral :: ParsecT Void [Bit] m (Word64, Word64)
parseLiteral = parseLiteralTail [] 0
  where
    parseLiteralTail :: [Bit] -> Word64 -> ParsecT Void [Bit] m (Word64, Word64)
    parseLiteralTail accumBits numBits = do
      ...

First, we'll parse the leading bit, followed by the four bits in the chunk value. We append these to our previously accumulated bits, and add 5 to the number of bits parsed:

parseLiteral :: ParsecT Void [Bit] m (Word64, Word64)
parseLiteral = parseLiteralTail [] 0
  where
    parseLiteralTail :: [Bit] -> Word64 -> ParsecT Void [Bit] m (Word64, Word64)
    parseLiteralTail accumBits numBits = do
      leadingBit <- parseBit
      nextBits <- count 4 parseBit
      let accum' = accumBits ++ nextBits
      let numBits' = numBits + 5
      ...

If the leading bit is 0, we're done! We can return our value by converting our accumulated bits to decimal. Otherwise, we recurse with our new values.

parseLiteral :: ParsecT Void [Bit] m (Word64, Word64)
parseLiteral = parseLiteralTail [] 0
  where
    parseLiteralTail :: [Bit] -> Word64 -> ParsecT Void [Bit] m (Word64, Word64)
    parseLiteralTail accumBits numBits = do
      leadingBit <- parseBit
      nextBits <- count 4 parseBit
      let accum' = accumBits ++ nextBits
      let numBits' = numBits + 5
      if leadingBit == Zero
        then return (bitsToDecimal64 accum', numBits')
        else parseLiteralTail accum' numBits'

Then it's very easy to incorporate this into our primary function. We check the type ID, and if it's "4" (for a literal), we call this function, and return with the Literal packet constructor.

parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m (PacketNode, Word64)
parsePacketNode = do
  packetVersion <- parse3Bit
  packetTypeId <- parse3Bit
  if packetTypeId == 4
    then do
      (literalValue, literalBits) <- parseLiteral
      return (Literal packetVersion literalValue, literalBits + 6)
    else
      ...

Now we need to consider the "operator" cases and their subpackets.

Parsing from Number of Packets

We'll start with the simpler of these two cases, which is when we are parsing a specific number of subpackets. The first step, of course, is to parse the length type bit.

parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m (PacketNode, Word64)
parsePacketNode = do
  packetVersion <- parse3Bit
  packetTypeId <- parse3Bit
  if packetTypeId == 4
    then do
      (literalValue, literalBits) <- parseLiteral
      return (Literal packetVersion literalValue, literalBits + 6)
    else do
      lengthTypeId <- parseBit
      if lengthTypeId == One
        then do
        ...

First, we have to count out 11 bits and use that to determine the number of subpackets. Once we have this number, we just have to recursively call the parsePacketNode function the given number of times.

parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m (PacketNode, Word64)
parsePacketNode = do
 ...
  if packetTypeId == 4
    then ...
    else do
      lengthTypeId <- parseBit
      if lengthTypeId == One
        then do
          numberOfSubpackets <- bitsToDecimal64 <$> count 11 parseBit
          subPacketsWithLengths <- replicateM (fromIntegral numberOfSubpackets) parsePacketNode
         ...

We'll unzip these results to get our list of packets and the lengths. To get our final packet length, we take the sum of the sizes, but we can't forget to add the header bits and the length type bit (7 bits), and the bits from the number of subpackets (11).

parsePacketNode :: (MonadLogger m) => ParsecT Void [Bit] m (PacketNode, Word64)
parsePacketNode = do
 ...
  if packetTypeId == 4
    then ...
    else do
      lengthTypeId <- parseBit
      if lengthTypeId == One
        then do
          numberOfSubpackets <- bitsToDecimal64 <$> count 11 parseBit
          subPacketsWithLengths <- replicateM (fromIntegral numberOfSubpackets) parsePacketNode
         let (subPackets, lengths) = unzip subPacketsWithLengths
          return (Operator packetVersion packetTypeId subPackets, sum lengths + 7 + 11)
        else

Parsing from Number of Bits

Parsing based on the number of bits in all the subpackets is a little more complicated, because we have more state to track. As we loop through the different subpackets, we need to keep track of how many bits we still have to parse. So we'll make a separate helper function.

parseForPacketLength :: (MonadLogger m) => Int -> Word64 -> [PacketNode] -> ParsecT Void [Bit] m ([PacketNode], Word64)
parseForPacketLength remainingBits accumBits prevPackets = ...

The base case comes when we have 0 bits remaining. Ideally, this occurs with exactly 0 bits. If it's a negative number, this is a problem. But if it's successful, we'll reverse the accumulated packets and return the number of bits we've accumulated.

parseForPacketLength :: (MonadLogger m) => Int -> Word64 -> [PacketNode] -> ParsecT Void [Bit] m ([PacketNode], Word64)
parseForPacketLength remainingBits accumBits prevPackets = if remainingBits <= 0
  then do
    if remainingBits < 0
      then error "Failing"
      else return (reverse prevPackets, accumBits)
  else ...

In the recursive case, we make one new call to parsePacketNode (the original function, not this helper). This gives us a new packet, and some more bits that we've parsed (this is why we've been tracking that number the whole time). So we can subtract the size from the remaining bits, and add it to the accumulated bits. And then we'll make the actual recursive call to this helper function.

parseForPacketLength :: (MonadLogger m) => Int -> Word64 -> [PacketNode] -> ParsecT Void [Bit] m ([PacketNode], Word64)
parseForPacketLength remainingBits accumBits prevPackets = if remainingBits <= 0
  then do
    if remainingBits < 0
      then error "Failing"
      else return (reverse prevPackets, accumBits)
  else do
    (newPacket, size) <- parsePacketNode
    parseForPacketLength (remainingBits - fromIntegral size) (accumBits + fromIntegral size) (newPacket : prevPackets)

And that's all! All our different pieces fit together now and we're able to parse our packet!

Solving the Problems

Now that we've parsed the packet into our structure, the rest of the problem is actually quite easy and fun! We've created a straightforward recursive structure, and so we can loop through it in a straightforward recursive way. We'll just always use the Literal as the base case, and then loop through the list of packets for the base case.

Let's start with summing the packet versions. This will return a Word64 since we could be adding a lot of package versions. With a Literal package, we just immediately return the version.

sumPacketVersions :: PacketNode -> Word64
sumPacketVersions (Literal v _) = fromIntegral v
...

Then with operator packets, we just map over the sub-packets, take the sum of their versions, and then add the original packet's version.

sumPacketVersions :: PacketNode -> Word64
sumPacketVersions (Literal v _) = fromIntegral v
sumPacketVersions (Operator v _ packets) = fromIntegral v +
  sum (map sumPacketVersions packets)

Now, for calculating the final packet value, we again start with the Literal case, since we'll just return its value. Note that we'll do this monadically, since we'll have some failure conditions in the later parts.

calculatePacketValue :: MonadLogger m => PacketNode -> MaybeT m Word64
calculatePacketValue (Literal _ x) = return x

Now, for the first time in the problem, we actually have to care what the operators mean! Here's a summary of the first few operators:

0 = Sum of all subpackets
1 = Product of all subpackets
2 = Minimum of all subpackets
3 = Maximum of all subpackets

There are three other operators following the same basic pattern. They expect exactly two subpackets and perform a binary, boolean operator. If it is true, the value is 1. If the operation is false, the packet value is 0.

5 = Greater than operator (<)
6 = Less than operator (>)
7 = Equals operator (==)

For the first set of operations, we can recursively calculate the value of the sub-packets, and take the appropriate aggregate function over the list.

calculatePacketValue :: MonadLogger m => PacketNode -> MaybeT m Word64
calculatePacketValue (Literal _ x) = return x
calculatePacketValue (Operator _ 0 packets) = sum <$> mapM calculatePacketValue packets
calculatePacketValue (Operator _ 1 packets) = product <$> mapM calculatePacketValue packets
calculatePacketValue (Operator _ 2 packets) = minimum <$> mapM calculatePacketValue packets
calculatePacketValue (Operator _ 3 packets) = maximum <$> mapM calculatePacketValue packets
...

For the binary operations, we first have to verify that there are only two packets.

calculatePacketValue :: MonadLogger m => PacketNode -> MaybeT m Word64
...
calculatePacketValue (Operator _ 5 packets) = do
  if length packets /= 2
    then logErrorN "> operator '5' must have two packets!" >> mzero
    else ...

Then we just de-structure the packets, calculate each value, compare them, and then return the appropriate value.

calculatePacketValue :: MonadLogger m => PacketNode -> MaybeT m Word64
...
calculatePacketValue (Operator _ 5 packets) = do
  if length packets /= 2
    then logErrorN "> operator '5' must have two packets!" >> mzero
    else do
      let [p1, p2] = packets
      v1 <- calculatePacketValue p1
      v2 <- calculatePacketValue p2
      return (if v1 > v2 then 1 else 0)
calculatePacketValue (Operator _ 6 packets) = do
  if length packets /= 2
    then logErrorN "< operator '6' must have two packets!" >> mzero
    else do
      let [p1, p2] = packets
      v1 <- calculatePacketValue p1
      v2 <- calculatePacketValue p2
      return (if v1 < v2 then 1 else 0)
calculatePacketValue (Operator _ 7 packets) = do
  if length packets /= 2
    then logErrorN "== operator '7' must have two packets!" >> mzero
    else do
      let [p1, p2] = packets
      v1 <- calculatePacketValue p1
      v2 <- calculatePacketValue p2
      return (if v1 == v2 then 1 else 0)
calculatePacketValue p = do
  logErrorN ("Invalid packet! " <> (pack . show $ p))
  mzero

Concluding Code

To tie everything together, we just follow the steps.

Parse the hexadecimal from the file
Transform the hexadecimal string into a list of bits
Parse the packet
Answer the question

For the first part, we use sumPacketVersions on the resulting packet.

solveDay16Easy :: String -> IO (Maybe Int)
solveDay16Easy fp = runStdoutLoggingT $ do
  hexLine <- parseFile parseHexadecimal fp
  result <- runMaybeT $ do
    bitLine <- concatMapM parseHexChar hexLine
    packet <- parseBits bitLine
    return $ sumPacketVersions packet
  return (fromIntegral <$> result)

And the "hard" solution is the same, except we use calculatePacketValue instead.

solveDay16Hard :: String -> IO (Maybe Int)
solveDay16Hard fp = runStdoutLoggingT $ do
  hexLine <- parseFile parseHexadecimal fp
  result <- runMaybeT $ do
    bitLine <- concatMapM parseHexChar hexLine
    packet <- parseBits bitLine
    calculatePacketValue packet
  return (fromIntegral <$> result)

And we're done!

Conclusion

That's all for this solution! As always, you can take a look at the code on GitHub. Later this week I'll have the video walkthrough as well. To keep up with all the latest news, make sure to subscribe to our monthly newsletter! Subscribing will give you access to our subscriber resources, like our Beginners Checklist and our Production Checklist.

In Depth

James Bowen