Day 7 - File System Shaving

Solution code on GitHub

All 2022 Problems

Subscribe to Monday Morning Haskell!

Problem Overview

Full Description

Our input is a series of file system commands that list the directories and files (with their sizes) on a computer. In the first part, we'll find every directory whose size is less than 100000 bytes and add their sizes. In the second part, we'll determine the smallest directory we need to delete to free up enough space to perform an update.

Solution Approach and Insights

As we loop through the different commands and outputs that are run, we want to track the current directory as a list of sub-directories. Then we can add each file's size to all directories along that tree. But we need to make sure our representation for each directory incorporates its full path as a list, and not just the top level.

Relevant Utilities

Once again, we're using OccMap, but this time it will be the OccMapBig alias that uses an unbounded Integer instead of Word, just because the sum of file sizes might get a bit large.

Parsing the Input

Here's a sample input:

$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k

It contains four kinds of lines:

  1. Change Directory commands (cd)
  2. List directory commands (ls)
  3. A directory listed by ls (stars with dir)
  4. A file listed by ls (starts with a file size)

We treat each line as its own kind of "command" (even the outputs, which aren't technically commands). We'll make this data type to represent the idea:

type InputType = [LineType]
type LineType = Command
data Command =
  ChangeDirectoryCommand String |
  ListDirectoryCommand |
  ListedDirectory String |
  ListedFile Integer String
  deriving (Show)

The parsing code isn't too hard. The main thing is we want an alternative parser for each command type.

parseLine :: (MonadLogger m) => ParsecT Void Text m LineType
parseLine = parseCD <|> parseLS <|> parseDir <|> parseFile
  where
    parseCD = ...
    parseLS = ...
    parseDir = ...
    parseFile = ...

Within these parsers, we'll also have some other alternatives, but nothing it too tricky. For instance, the cd parser has to account for cd .., cd / and then using a normal directory name.

parseLine :: (MonadLogger m) => ParsecT Void Text m LineType
parseLine = parseCD <|> parseLS <|> parseDir <|> parseFile
  where
    parseCD = do
      string "$ cd "
      dir <- (unpack <$> string "..") <|> (unpack <$> string "/") <|> some letterChar
      return $ ChangeDirectoryCommand dir
    parseLS = ...
    parseDir = ...
    parseFile = ...

Here's the complete parser, which we apply line-by-line.

parseInput :: (MonadLogger m) => ParsecT Void Text m InputType
parseInput = sepEndBy1 parseLine eol

parseLine :: (MonadLogger m) => ParsecT Void Text m LineType
parseLine = parseCD <|> parseLS <|> parseDir <|> parseFile
  where
    parseCD = do
      string "$ cd "
      dir <- (unpack <$> string "..") <|> (unpack <$> string "/") <|> some letterChar
      return $ ChangeDirectoryCommand dir
    parseLS = string "$ ls" >> return ListDirectoryCommand
    parseDir = do
      string "dir "
      dir <- some letterChar
      return $ ListedDirectory dir
    parseFile = do
      fileSize <- fromIntegral <$> parsePositiveNumber
      char ' '
      fileName <- some (letterChar <|> char '.')
      return $ ListedFile fileSize fileName

Getting the Solution

As we loop through different commands, we need to track the current directory we're in, as well as the sizes for each directory based on the files we've seen so far. Note we have to use the full path as the key. There are some duplicately named sub-directories, so we can't just use the relative name in our map!

data FSState = FSState
  { currentDirectory :: [String]
  , directoryMap :: OccMapBig [String]
  } deriving (Show)

So let's set up a fold to go through each line. Ultimately the directoryMap is the only item we need to solve the problem.

type EasySolutionType = OccMapBig [String]

processInputEasy :: (MonadLogger m) => InputType -> m EasySolutionType
processInputEasy inputs = directoryMap <$> solveFold inputs

solveFold :: (MonadLogger m) => [LineType] -> m FSState
solveFold = foldM foldLine initialFoldV

initialFoldV :: FSState
initialFoldV = FSState [] M.empty

foldLine :: (MonadLogger m) => FSState -> LineType -> m FSState
foldLine = ...

Now we have to determine how each command modifies the state. However, only two commands actually change the state. Changing the directory will obviously modify our current directory, and reading a file with its size will modify our map. But reading the list command and reading a new directory name that is listed don't actually modify our state. So we can set up this template.

foldLine :: (MonadLogger m) => FSState -> LineType -> m FSState
foldLine prevState command = case command of
  ChangeDirectoryCommand dir -> ...
  ListedFile size _ -> ...
  _ -> return prevState

Changing directory has three cases. If we change to "..", we remove a level from our hierarchy with tail. If we change to "/", we reset the hierarchy to just have this root element. Otherwise, we append the new directory to the front of the hierarchy.

foldLine :: (MonadLogger m) => FSState -> LineType -> m FSState
foldLine prevState command = case command of
  ChangeDirectoryCommand dir -> if dir == ".."
    then return $ prevState { currentDirectory = tail (currentDirectory prevState)}
    else if dir == "/"
      then return $ prevState { currentDirectory = ["/"]}
      else return $ prevState { currentDirectory = dir : currentDirectory prevState}
  ListedFile size _ -> ...
  _ -> return prevState

When we list a file, we have to go through all the subdirectories and add its size to their stored value. But remember, each subdirectory contains the full list. So we need the tails function in order to properly enumerate these. Here's a quick example of tails:

tails [1, 2, 3] = [[1, 2, 3], [2, 3], [3], []]

Then we need init on that result to exclude the empty list from tails. We fold through these options and use addKey with our occupancy map.

Here is the complete function:

foldLine :: (MonadLogger m) => FSState -> LineType -> m FSState
foldLine prevState command = case command of
  ChangeDirectoryCommand dir -> if dir == ".."
    then return $ prevState { currentDirectory = tail (currentDirectory prevState)}
    else if dir == "/"
      then return $ prevState { currentDirectory = ["/"]}
      else return $ prevState { currentDirectory = dir : currentDirectory prevState}
  ListedFile size _ -> do
    let allDirs = currentDirectory prevState
    let newDirMap = foldl (\mp d -> addKey mp d size) (directoryMap prevState) (init $ tails allDirs)
    return $ prevState { directoryMap = newDirMap}
  _ -> return prevState

Now we'll have the mapping from directory paths to sizes, so we can start answering the questions!

Answering the Question

For the first part, we filter the directory sizes by only looking for those under 100000 bytes. Then we take the sum of these.

findEasySolution :: (MonadLogger m) => EasySolutionType -> m (Maybe Integer)
findEasySolution dirMap = do
  let largePairs = filter (<= 100000) (M.elems dirMap)
  return $ Just $ sum largePairs

For the second part, we sort the directory sizes and find the first one that will give us the desired unused space.

findHardSolution :: (MonadLogger m) => HardSolutionType -> m (Maybe Integer)
findHardSolution dirMap = do
  let allDirSizes = sort (M.elems dirMap)
  let usedSpace = last allDirSizes
  let currentUnusedSpace = 70000000 - usedSpace
  return $ find (\i -> currentUnusedSpace + i >= 30000000) allDirSizes

Observe how we use the last element for the total "used" size. The largest size in our map should always be the root element, which contains all files! We don't want to sum the values in this list since otherwise we'd be double-counting files!

Now we just combine our functions and get our answers!

solveEasy :: FilePath -> IO (Maybe Integer)
solveEasy fp = runStdoutLoggingT $ do
  input <- parseFile parseInput fp
  result <- processInputEasy input
  findEasySolution result

solveHard :: FilePath -> IO (Maybe Integer)
solveHard fp = runStdoutLoggingT $ do
  input <- parseFile parseInput fp
  result <- processInputEasy input
  findHardSolution result

And we're done!

Video

YouTube Link

Previous
Previous

Day 8 - Scenic Tree Visibility

Next
Next

Day 6 - Parsing Unique Characters