Day 7 - File System Shaving
Subscribe to Monday Morning Haskell!
Problem Overview
Our input is a series of file system commands that list the directories and files (with their sizes) on a computer. In the first part, we'll find every directory whose size is less than 100000 bytes and add their sizes. In the second part, we'll determine the smallest directory we need to delete to free up enough space to perform an update.
Solution Approach and Insights
As we loop through the different commands and outputs that are run, we want to track the current directory as a list of sub-directories. Then we can add each file's size to all directories along that tree. But we need to make sure our representation for each directory incorporates its full path as a list, and not just the top level.
Relevant Utilities
Once again, we're using OccMap
, but this time it will be the OccMapBig
alias that uses an unbounded Integer
instead of Word
, just because the sum of file sizes might get a bit large.
Parsing the Input
Here's a sample input:
$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k
It contains four kinds of lines:
- Change Directory commands (
cd
) - List directory commands (
ls
) - A directory listed by
ls
(stars withdir
) - A file listed by
ls
(starts with a file size)
We treat each line as its own kind of "command" (even the outputs, which aren't technically commands). We'll make this data type to represent the idea:
type InputType = [LineType]
type LineType = Command
data Command =
ChangeDirectoryCommand String |
ListDirectoryCommand |
ListedDirectory String |
ListedFile Integer String
deriving (Show)
The parsing code isn't too hard. The main thing is we want an alternative parser for each command type.
parseLine :: (MonadLogger m) => ParsecT Void Text m LineType
parseLine = parseCD <|> parseLS <|> parseDir <|> parseFile
where
parseCD = ...
parseLS = ...
parseDir = ...
parseFile = ...
Within these parsers, we'll also have some other alternatives, but nothing it too tricky. For instance, the cd
parser has to account for cd ..
, cd /
and then using a normal directory name.
parseLine :: (MonadLogger m) => ParsecT Void Text m LineType
parseLine = parseCD <|> parseLS <|> parseDir <|> parseFile
where
parseCD = do
string "$ cd "
dir <- (unpack <$> string "..") <|> (unpack <$> string "/") <|> some letterChar
return $ ChangeDirectoryCommand dir
parseLS = ...
parseDir = ...
parseFile = ...
Here's the complete parser, which we apply line-by-line.
parseInput :: (MonadLogger m) => ParsecT Void Text m InputType
parseInput = sepEndBy1 parseLine eol
parseLine :: (MonadLogger m) => ParsecT Void Text m LineType
parseLine = parseCD <|> parseLS <|> parseDir <|> parseFile
where
parseCD = do
string "$ cd "
dir <- (unpack <$> string "..") <|> (unpack <$> string "/") <|> some letterChar
return $ ChangeDirectoryCommand dir
parseLS = string "$ ls" >> return ListDirectoryCommand
parseDir = do
string "dir "
dir <- some letterChar
return $ ListedDirectory dir
parseFile = do
fileSize <- fromIntegral <$> parsePositiveNumber
char ' '
fileName <- some (letterChar <|> char '.')
return $ ListedFile fileSize fileName
Getting the Solution
As we loop through different commands, we need to track the current directory we're in, as well as the sizes for each directory based on the files we've seen so far. Note we have to use the full path as the key. There are some duplicately named sub-directories, so we can't just use the relative name in our map!
data FSState = FSState
{ currentDirectory :: [String]
, directoryMap :: OccMapBig [String]
} deriving (Show)
So let's set up a fold to go through each line. Ultimately the directoryMap
is the only item we need to solve the problem.
type EasySolutionType = OccMapBig [String]
processInputEasy :: (MonadLogger m) => InputType -> m EasySolutionType
processInputEasy inputs = directoryMap <$> solveFold inputs
solveFold :: (MonadLogger m) => [LineType] -> m FSState
solveFold = foldM foldLine initialFoldV
initialFoldV :: FSState
initialFoldV = FSState [] M.empty
foldLine :: (MonadLogger m) => FSState -> LineType -> m FSState
foldLine = ...
Now we have to determine how each command modifies the state. However, only two commands actually change the state. Changing the directory will obviously modify our current directory, and reading a file with its size will modify our map. But reading the list command and reading a new directory name that is listed don't actually modify our state. So we can set up this template.
foldLine :: (MonadLogger m) => FSState -> LineType -> m FSState
foldLine prevState command = case command of
ChangeDirectoryCommand dir -> ...
ListedFile size _ -> ...
_ -> return prevState
Changing directory has three cases. If we change to ".."
, we remove a level from our hierarchy with tail
. If we change to "/"
, we reset the hierarchy to just have this root element. Otherwise, we append the new directory to the front of the hierarchy.
foldLine :: (MonadLogger m) => FSState -> LineType -> m FSState
foldLine prevState command = case command of
ChangeDirectoryCommand dir -> if dir == ".."
then return $ prevState { currentDirectory = tail (currentDirectory prevState)}
else if dir == "/"
then return $ prevState { currentDirectory = ["/"]}
else return $ prevState { currentDirectory = dir : currentDirectory prevState}
ListedFile size _ -> ...
_ -> return prevState
When we list a file, we have to go through all the subdirectories and add its size to their stored value. But remember, each subdirectory contains the full list. So we need the tails
function in order to properly enumerate these. Here's a quick example of tails
:
tails [1, 2, 3] = [[1, 2, 3], [2, 3], [3], []]
Then we need init
on that result to exclude the empty list from tails
. We fold through these options and use addKey
with our occupancy map.
Here is the complete function:
foldLine :: (MonadLogger m) => FSState -> LineType -> m FSState
foldLine prevState command = case command of
ChangeDirectoryCommand dir -> if dir == ".."
then return $ prevState { currentDirectory = tail (currentDirectory prevState)}
else if dir == "/"
then return $ prevState { currentDirectory = ["/"]}
else return $ prevState { currentDirectory = dir : currentDirectory prevState}
ListedFile size _ -> do
let allDirs = currentDirectory prevState
let newDirMap = foldl (\mp d -> addKey mp d size) (directoryMap prevState) (init $ tails allDirs)
return $ prevState { directoryMap = newDirMap}
_ -> return prevState
Now we'll have the mapping from directory paths to sizes, so we can start answering the questions!
Answering the Question
For the first part, we filter the directory sizes by only looking for those under 100000 bytes. Then we take the sum of these.
findEasySolution :: (MonadLogger m) => EasySolutionType -> m (Maybe Integer)
findEasySolution dirMap = do
let largePairs = filter (<= 100000) (M.elems dirMap)
return $ Just $ sum largePairs
For the second part, we sort the directory sizes and find the first one that will give us the desired unused space.
findHardSolution :: (MonadLogger m) => HardSolutionType -> m (Maybe Integer)
findHardSolution dirMap = do
let allDirSizes = sort (M.elems dirMap)
let usedSpace = last allDirSizes
let currentUnusedSpace = 70000000 - usedSpace
return $ find (\i -> currentUnusedSpace + i >= 30000000) allDirSizes
Observe how we use the last
element for the total "used" size. The largest size in our map should always be the root element, which contains all files! We don't want to sum the values in this list since otherwise we'd be double-counting files!
Now we just combine our functions and get our answers!
solveEasy :: FilePath -> IO (Maybe Integer)
solveEasy fp = runStdoutLoggingT $ do
input <- parseFile parseInput fp
result <- processInputEasy input
findEasySolution result
solveHard :: FilePath -> IO (Maybe Integer)
solveHard fp = runStdoutLoggingT $ do
input <- parseFile parseInput fp
result <- processInputEasy input
findHardSolution result
And we're done!