This one is more complicated. Going to start at the end of the description and work back.
OK, I did, but as I worked back I kept adding things to the core classes, so it doesn’t start as simple as I did.
We will need to: find all of the directories with a total size of at most 100,000, then calculate the sum of their total sizes. In the example above, these directories are a and e; the sum of their total sizes is 95,437 (94,853 + 584). (As in this example, this process can count files more than once!)
- a (dir)
- e (dir)
- i (file, size=584)
- f (file, size=29116)
- g (file, size=2557)
- h.lst (file, size=62596)
Define classes
Base FileObj & File
Just names and sizes and __str__.
Code
from dataclasses import dataclass, field@dataclassclass FileObj():"""Base class for File & Directory. Has a name""" name: str# Name of the file or dir.@dataclassclass File(FileObj):"""Files have name and size""" size: int# Size of a file is givendef__str__(me):returnf"{me.name} (file, size={me.size})"
Directories compute their .size from their members. This is naturally recursive. 😊
This started really simple & grew as I needed methods later. Now has:
parent so we can build path and know depth.
Starts as None.
Gets assigned when our parent adds us. (During __setitem__ now.)
Helps __str__ get indents right
Eliminates need to build separate list of dirs during input.
Provides [] getting and setting items in dir.
Implementation changed from list to dict.
Added subdirs to get just the kids that are dirs.
Evicted a recursive version of that to a helper (below)
Alert
Initializing _kids: [] turns all Directory into Ender chests. Beware the default-mutable!
Alert
Solutions that worked in Example failed in Part 1 because (a) data has multiple dirs of same name (in different locations), & (b) I ran the solution on the wrong arg.
Doing (b) definitely caused a too-low error. I think it was (a) that caused the too-high error. But I didn’t find (b) until the end and by then fixing that fixed all.
Code
@dataclassclass Directory(FileObj):"""Directories have kids (files or dirs), and PARENT""" parent: FileObj=None# My parent dir. Set when I'm appended. _kids: dict[str: FileObj] = field(default_factory=dict)@propertydef size(me):"""Dir size is sum of kids' sizes."""returnsum(f.size for f in me._kids.values())@propertydef path(me) ->str:"""The path/to/me/. <- Note slash placement."""try:returnf"{me.parent.path}{me.name}/"exceptAttributeError:returnf"{me.name}/"@propertydef depth(me):"""How far down the tree am me?"""return me.path.count("/") - me.path.count("//") -1@propertydef subdirs(me):"""My kids what could has kids. Not files."""return [x for x in me ifisinstance(x, Directory)]def append(me, obj: FileObj) ->None:"""Shortcut: just me.append(e) instead of me[e.name] = e.""" me[obj.name] = objdef__str__(me):"""Indent formatted for pretty-printing: whole tree below me.""" indent =" "* me.depth head =f"{me.name} (dir, size={me.size})" tail =f"\n".join(f"{indent} - {val}"for val in me) # uses __iter__returnf"{head}\n{tail}"def__iter__(me):"""Return iterator of objects (not names!) in _kids."""returniter(me._kids.values())def__getitem__(me, name: str) -> FileObj:"""Get member FileObj by name. Or die."""return me._kids[name]def__setitem__(me, name: str, arg: FileObj) ->None:"""Add {name: arg} to _kids; if arg is dir, set arg.parent->me.""" me._kids[name] = argifisinstance(arg, Directory):if arg.parent isnotNone:raise(ValueError, "We are stealing someone else's kid!") arg.parent = me
a = Directory("a")root.append(a)g, h = File("g", 2557), File("h.lst", size=62596)f2 = File("f", 512) # <- Another file named f!assert f2 isnot ffor item in [e, f2, g, h, Directory("k")]: a.append(item)print(root)assert a.size ==66249assert root.size == f.size + a.sizeassert root.depth ==0assert a.depth ==1assert e.depth ==2assert a["k"].depth ==2
/ (dir, size=95365)
- f (file, size=29116)
- a (dir, size=66249)
- e (dir, size=584)
- i (file, size=584)
- f (file, size=512)
- g (file, size=2557)
- h.lst (file, size=62596)
- k (dir, size=0)
Test the subdirs property at two depths.
assert [x.name for x in root.subdirs] == ['a']assert [x.name for x in a.subdirs] == ['e', 'k']
Get all subdirs
This is basically a filtered depth-first traversal.
This used to return dir objects. Changed to (name, size) tuples while trying to track down Part1 error that turned out to be something else entirely. Could change back but, eh.
Code
def get_all_subdirs(start: Directory) ->list[Directory]:"""Return flat list of (name, size) for _all_ dirs under `start` dir.""" my_dirs = [(x.name, x.size) for x in start.subdirs]for _dir in start.subdirs: my_dirs.extend(get_all_subdirs(_dir))return my_dirs
Return flat list of (name, size) for all dirs under start dir.
get_all_subdirs(root)
[('a', 66249), ('e', 584), ('k', 0)]
assert [x[0] for x in get_all_subdirs(root)] == ['a', 'e', 'k']
OK, looks good! Now need to read the data.
Example
example ="""$ cd /$ lsdir a14848514 b.txt8504156 c.datdir d$ cd a$ lsdir e29116 f2557 g62596 h.lst$ cd e$ ls584 i$ cd ..$ cd ..$ cd d$ ls4060174 j8033020 d.log5626152 d.ext7214296 k""".strip().split("\n")
Parse the dirs
Code
def get_dirs(commands: list[str]) -> Directory:"""Create dir tree from '/'. Return all dirs""" root = Directory("/") dirstack = [root]for i, line inenumerate(commands): cur_dir = dirstack[-1]match line.split(): case ("$", "ls"):continue case ("dir", dirname):try: cur_dir[dirname] # Should fail, unless twice ls same dirprint("*********** {dirname} already in {cur_dir.name}!")exceptKeyError: cur_dir.append(Directory(dirname)) case ("$", "cd", dirname):match dirname:case"/": dirstack = [root]case"..": dirstack.pop()case _: dirstack.append(cur_dir[dirname]) depth =len(dirstack) -1print(f'{"❚"* depth}\tfrom cd {dirname}\t in line {i}') case (size, name): f = File(name, int(size)) cur_dir.append(f)case _:raiseValueError(f"Unrecognized line: '{line}'")return rootdirs = get_dirs(example)
from cd / in line 0
❚ from cd a in line 6
❚❚ from cd e in line 12
❚ from cd .. in line 15
from cd .. in line 16
❚ from cd d in line 17
/ (dir, size=48381165)
- a (dir, size=94853)
- e (dir, size=584)
- i (file, size=584)
- f (file, size=29116)
- g (file, size=2557)
- h.lst (file, size=62596)
- b.txt (file, size=14848514)
- c.dat (file, size=8504156)
- d (dir, size=24933642)
- j (file, size=4060174)
- d.log (file, size=8033020)
- d.ext (file, size=5626152)
- k (file, size=7214296)
We’re told: > The total size of directory e is 584 because it contains a single file i of size 584 and no other directories. The directory a has total size 94853 because it contains files f (size 29116), g (size 2557), and h.lst (size 62596), plus file i indirectly (a contains e which contains i). Directory d has total size 24933642. As the outermost directory, / contains every file. Its total size is 48381165, the sum of the size of every file.
Goal: > To begin, find all of the directories with a total size of at most 100,000, then calculate the sum of their total sizes. In the example above, these directories are a and e; the sum of their total sizes is 95,437 (94,853 + 584). (As in this example, this process can count files more than once!)
The code below would be clearer if get_all_subdirs returned objects instead of (name, size) tuples.
Code
def sum_cutoff(start: Directory, cutoff=100_000) ->int:"""Return sum of sizes for dirs with size < cutoff.""" subdirs = get_all_subdirs(start) sizes = [x[1] for x in subdirs if x[1] <100_000]returnsum(x for x in sizes if x <= cutoff)
withopen(f"data/{NAME}_input.txt") as f: data = f.read()data = data.strip().split("\n")
Examine the data - because it broke code that worked on the example.
print(f"Data has {len(data)} commands. First few:")print("\n".join(x for x in data[:10]))lines_with_dir = [x for x in data if x.startswith('dir')]print(f"There are {len(lines_with_dir)} lines with 'dir'.")
Data has 1014 commands. First few:
$ cd /
$ ls
dir blgtdv
dir dbrfcz
dir fvspj
dir hbjmndt
dir hzg
dir jpjgdm
dir mtd
dir pcpf
There are 182 lines with 'dir'.
Note that names get reused!
fvspj = [row for row in data if"cd fvspj"in row]fvspj2 = [row for row in data if"dir fvspj"in row]print(f"There are {len(fvspj)} rows with 'cd fvspj'!")print(f"There are {len(fvspj2)} rows with 'dir fvspj'!")
There are 13 rows with 'cd fvspj'!
There are 13 rows with 'dir fvspj'!
part1_subdirs = get_all_subdirs(part1_tree)data_dirs = [x for x in data if x.startswith("$ cd") andnot x.endswith("..")]n_getall =len(part1_subdirs)f"{n_getall} from get_all_subdirs; {len(data)} commands, of which {len(data_dirs)} cd to dir."
'182 from get_all_subdirs; 1014 commands, of which 183 cd to dir.'
Guess 2: 214,201,382 -> Too high Oh gosh yes, it’s larger than size of /!
D’oh! While some cleanups of the classes and functions no doubt helped, the core problem was:
sum_cutoff(root)
Rather than
sum_cutoff(part1_tree)
Copy/paste error: “root” was the example.
Part 2
The total disk space on the file system is 70_000_000.
To run the update, you need unused space of at least 30_000_000.
Find the smallest directory that, if deleted, would free up enough space on the filesystem to run the update. What is the total size of that directory?
ToDo Special case
dirs["/"]
to get self?