Python Working with Directories Practice Problems & Exercises
Practice: Working with Directories
← Back to lessonEasy
Predict the output. Two directory-creation approaches are tested — one succeeds, one fails, and then the correct approach succeeds.
import os
import tempfile
with tempfile.TemporaryDirectory() as root:
# makedirs creates the full chain a/b/c
deep = os.path.join(root, "a", "b", "c")
os.makedirs(deep)
print(os.path.isdir(deep))
# mkdir only creates one level — parent must exist
single_deep = os.path.join(root, "x", "y")
try:
os.mkdir(single_deep)
except FileNotFoundError:
print("FileNotFoundError")
# mkdir works fine when parent exists
os.mkdir(os.path.join(root, "x"))
os.mkdir(os.path.join(root, "x", "y"))
print(os.path.isdir(os.path.join(root, "x", "y")))Solution
import os
import tempfile
with tempfile.TemporaryDirectory() as root:
deep = os.path.join(root, "a", "b", "c")
os.makedirs(deep)
print(os.path.isdir(deep))
single_deep = os.path.join(root, "x", "y")
try:
os.mkdir(single_deep)
except FileNotFoundError:
print("FileNotFoundError")
os.mkdir(os.path.join(root, "x"))
os.mkdir(os.path.join(root, "x", "y"))
print(os.path.isdir(os.path.join(root, "x", "y")))
Output:
True
FileNotFoundError
True
How it works: os.makedirs("a/b/c") creates all three levels in one call — equivalent to mkdir -p in Unix. os.mkdir("x/y") fails with FileNotFoundError because x does not exist yet. After creating x first, os.mkdir("x/y") succeeds because its parent now exists.
Key insight: os.makedirs(path, exist_ok=True) is the production-standard pattern for ensuring a directory exists. It handles both the "create from scratch" and "already exists" cases in a single atomic call. Only use os.mkdir() when you are certain the parent exists and want the explicit error if it does not.
Expected Output
True\nFileNotFoundError\nTrueHints
Hint 1: os.mkdir() creates exactly one directory. Its parent must already exist.
Hint 2: os.makedirs() creates all intermediate directories in the path chain.
Predict the output. The same directory is created twice — once without exist_ok and once with it — to demonstrate the error-suppression behavior.
import os
import tempfile
from pathlib import Path
with tempfile.TemporaryDirectory() as root:
target = os.path.join(root, "output", "logs")
# First creation succeeds
os.makedirs(target)
print(os.path.isdir(target))
# Second creation without exist_ok raises
try:
os.makedirs(target)
except FileExistsError:
print("FileExistsError")
# exist_ok=True silences the error
os.makedirs(target, exist_ok=True)
print(os.path.isdir(target))
# pathlib equivalent
Path(target).mkdir(parents=True, exist_ok=True)
print(Path(target).is_dir())Solution
import os
import tempfile
from pathlib import Path
with tempfile.TemporaryDirectory() as root:
target = os.path.join(root, "output", "logs")
os.makedirs(target)
print(os.path.isdir(target))
try:
os.makedirs(target)
except FileExistsError:
print("FileExistsError")
os.makedirs(target, exist_ok=True)
print(os.path.isdir(target))
Path(target).mkdir(parents=True, exist_ok=True)
print(Path(target).is_dir())
Output:
True
FileExistsError
True
True
How it works:
- The first
os.makedirs(target)createsoutput/logs— success. - Without
exist_ok=True, a second call on an already-existing directory raisesFileExistsError. os.makedirs(target, exist_ok=True)succeeds silently regardless of whether the directory already exists.Path(target).mkdir(parents=True, exist_ok=True)is the pathlib equivalent — identical behavior.
Key insight: The older check-then-create pattern if not os.path.exists(path): os.makedirs(path) has a race condition: another process could create the directory between the check and the creation, causing the makedirs call to still raise. Always use exist_ok=True to avoid this TOCTOU (time-of-check/time-of-use) race.
Expected Output
True\nFileExistsError\nTrue\nTrueHints
Hint 1: Without exist_ok=True, os.makedirs raises FileExistsError if the directory already exists.
Hint 2: Path.mkdir(parents=True, exist_ok=True) is the pathlib equivalent.
Predict the output. Two deletion strategies are tested against a non-empty directory to show when each raises and when each succeeds.
import os
import shutil
import tempfile
with tempfile.TemporaryDirectory() as root:
# Create a directory with content
project_dir = os.path.join(root, "project")
os.makedirs(os.path.join(project_dir, "src"))
open(os.path.join(project_dir, "README.md"), "w").close()
# os.rmdir fails on non-empty directory
try:
os.rmdir(project_dir)
except OSError:
print("OSError")
# shutil.rmtree deletes everything recursively
shutil.rmtree(project_dir)
print(os.path.isdir(project_dir))Solution
import os
import shutil
import tempfile
with tempfile.TemporaryDirectory() as root:
project_dir = os.path.join(root, "project")
os.makedirs(os.path.join(project_dir, "src"))
open(os.path.join(project_dir, "README.md"), "w").close()
try:
os.rmdir(project_dir)
except OSError:
print("OSError")
shutil.rmtree(project_dir)
print(os.path.isdir(project_dir))
Output:
OSError
False
How it works: os.rmdir() is the safe primitive — it only removes a directory that is completely empty. If the directory contains any files or subdirectories, it raises OSError (specifically [Errno 39] Directory not empty). shutil.rmtree() recursively deletes the directory and every file and subdirectory inside it, immediately and permanently.
Key insight: The safe deletion hierarchy is: (1) Path.unlink() for single files, (2) os.rmdir() for empty directories, (3) shutil.rmtree() for entire directory trees. Always guard shutil.rmtree() in production — validate the path is under a known prefix before calling it. A single typo can destroy entire directory trees with no recovery.
Expected Output
OSError\nFalseHints
Hint 1: os.rmdir() fails with OSError if the directory is not empty.
Hint 2: shutil.rmtree() deletes the directory and all its contents — no confirmation, no recycle bin.
Predict the output. A TemporaryDirectory is used as a context manager, work is done inside it, then its state is checked before and after the with block exits.
import tempfile
import os
from pathlib import Path
tmp_path = None
with tempfile.TemporaryDirectory() as tmp_dir:
tmp_path = tmp_dir
# Directory exists while inside the context
print(os.path.isdir(tmp_dir))
# Write a file into it
(Path(tmp_dir) / "work.txt").write_text("processing...")
print((Path(tmp_dir) / "work.txt").exists())
# After the with block — directory and all contents are gone
print(os.path.isdir(tmp_path))Solution
import tempfile
import os
from pathlib import Path
tmp_path = None
with tempfile.TemporaryDirectory() as tmp_dir:
tmp_path = tmp_dir
print(os.path.isdir(tmp_dir))
(Path(tmp_dir) / "work.txt").write_text("processing...")
print((Path(tmp_dir) / "work.txt").exists())
print(os.path.isdir(tmp_path))
Output:
True
True
False
How it works: tempfile.TemporaryDirectory() creates a temporary directory in the OS temp location (e.g., /tmp/tmpXYZABC). The with statement captures the directory path as a string. Inside the block, the directory and its contents exist normally. When the with block exits — whether normally or due to an exception — the __exit__ method calls shutil.rmtree() on the directory, deleting everything.
Key insight: Always prefer tempfile.TemporaryDirectory() over tempfile.mkdtemp() when possible. The context manager guarantees cleanup even if your code raises an exception. mkdtemp() creates a directory but returns you a path string — cleanup is your responsibility, which means it frequently leaks on error paths. The context manager pattern eliminates the entire cleanup responsibility.
Expected Output
True\nTrue\nFalseHints
Hint 1: tempfile.TemporaryDirectory() returns a context manager. The directory path is the string yielded by __enter__.
Hint 2: After the with block exits, the temp directory and all its contents are automatically deleted.
Medium
Use shutil copy functions to demonstrate the difference between copying a file and copying a tree, including pattern-based exclusion.
import os
import shutil
import tempfile
from pathlib import Path
with tempfile.TemporaryDirectory() as root:
root_path = Path(root)
# Build source tree
src = root_path / "src"
(src / "src" / ".git").mkdir(parents=True)
(src / "src" / "main.py").write_text("# main")
(src / "src" / "utils.pyc").write_bytes(b"\x00\x01")
# shutil.copy2: copy a single file
dest_file = root_path / "main_copy.py"
shutil.copy2(src / "src" / "main.py", dest_file)
print(dest_file.exists())
# shutil.copytree: copy the whole tree
dest_tree = root_path / "dst"
shutil.copytree(
src / "src",
dest_tree,
ignore=shutil.ignore_patterns(".git", "*.pyc"),
)
print((dest_tree / "main.py").exists())
print(not (dest_tree / "utils.pyc").exists())
print((dest_tree / ".git").exists())Solution
import os
import shutil
import tempfile
from pathlib import Path
with tempfile.TemporaryDirectory() as root:
root_path = Path(root)
src = root_path / "src"
(src / "src" / ".git").mkdir(parents=True)
(src / "src" / "main.py").write_text("# main")
(src / "src" / "utils.pyc").write_bytes(b"\x00\x01")
dest_file = root_path / "main_copy.py"
shutil.copy2(src / "src" / "main.py", dest_file)
print(dest_file.exists())
dest_tree = root_path / "dst"
shutil.copytree(
src / "src",
dest_tree,
ignore=shutil.ignore_patterns(".git", "*.pyc"),
)
print((dest_tree / "main.py").exists())
print(not (dest_tree / "utils.pyc").exists())
print((dest_tree / ".git").exists())
Output:
True
True
True
False
How it works:
-
shutil.copy2(src, dst)copies a single file. It preserves content, permission bits, and timestamps (modification time and access time). This is equivalent to the Unixcp -pcommand. -
shutil.copytree(src, dst, ignore=...)copies the entire directory tree.shutil.ignore_patterns(".git", "*.pyc")returns a callable that filters out.gitdirectories and*.pycfiles during the traversal. -
main.pyis copied.utils.pycis excluded by the pattern..gitis excluded by the pattern — so.gitdoes not exist in the destination.
Key insight: The four shutil copy functions trade off different metadata: copyfile() (content only), copy() (content + permissions), copy2() (content + permissions + timestamps), copytree() (full tree). In backup and deployment pipelines, copy2() is preferred because it preserves timestamps that incremental backup tools rely on. Always use ignore_patterns in copytree() to exclude .git, __pycache__, node_modules, and *.egg-info from copies.
Expected Output
True\nTrue\nTrue\nFalseHints
Hint 1: shutil.copy2() copies a single file, preserving content, permissions, and timestamps.
Hint 2: shutil.copytree() copies an entire directory tree. Use ignore=shutil.ignore_patterns() to skip files.
Hint 3: ignore_patterns(".git", "*.pyc") will exclude .git directories and .pyc files from the copy.
Use shutil.move() to rename a file in-place and then move a file into a target directory.
import shutil
import tempfile
from pathlib import Path
with tempfile.TemporaryDirectory() as root:
root_path = Path(root)
# Create source files and target directory
old_file = root_path / "old_name.txt"
old_file.write_text("content")
archive_dir = root_path / "archive"
archive_dir.mkdir()
# Rename in-place (same directory)
shutil.move(str(old_file), str(root_path / "new_name.txt"))
print(old_file.exists())
print((root_path / "new_name.txt").exists())
# Move into an existing directory — filename is preserved
src = root_path / "data.csv"
src.write_text("id,value")
shutil.move(str(src), str(archive_dir))
print(src.exists())
print((archive_dir / "data.csv").exists())Solution
import shutil
import tempfile
from pathlib import Path
with tempfile.TemporaryDirectory() as root:
root_path = Path(root)
old_file = root_path / "old_name.txt"
old_file.write_text("content")
archive_dir = root_path / "archive"
archive_dir.mkdir()
shutil.move(str(old_file), str(root_path / "new_name.txt"))
print(old_file.exists())
print((root_path / "new_name.txt").exists())
src = root_path / "data.csv"
src.write_text("id,value")
shutil.move(str(src), str(archive_dir))
print(src.exists())
print((archive_dir / "data.csv").exists())
Output:
False
True
False
True
How it works:
-
shutil.move(src, dst)is semantically identical to the Unixmvcommand. Whendstis a non-existing path, the file is renamed to that path.old_name.txtis gone;new_name.txtnow exists. -
When
dstis an existing directory,shutil.move()places the source file inside that directory, preserving the filename.data.csvends up atarchive/data.csv.
Key insight: Path.rename() is atomic on the same filesystem but raises OSError when moving across different filesystems (e.g., from /tmp to /home). shutil.move() is always safe — on the same filesystem it calls os.rename() (fast, atomic), and across filesystems it falls back to copy-then-delete (slower, not atomic). Use shutil.move() in production unless you have a specific reason to require atomic behavior.
Expected Output
False\nTrue\nFalse\nTrueHints
Hint 1: shutil.move() can move a file into an existing directory — it preserves the filename.
Hint 2: If the destination is a directory, the file is placed inside it with its original name.
Hint 3: Path.rename() raises OSError across filesystems. shutil.move() handles cross-filesystem moves transparently.
Implement a function that walks a directory tree and returns a dictionary mapping file extensions (without the dot) to their counts. Ignore files with no extension.
import os
import tempfile
from pathlib import Path
def count_by_extension(root_dir):
"""Return {ext: count} for all files in root_dir and its subdirectories."""
counts = {}
for dirpath, dirnames, filenames in os.walk(root_dir):
for fname in filenames:
_, ext = os.path.splitext(fname)
if ext:
key = ext.lstrip(".")
counts[key] = counts.get(key, 0) + 1
return dict(sorted(counts.items()))
with tempfile.TemporaryDirectory() as root:
p = Path(root)
(p / "src").mkdir()
(p / "data").mkdir()
for name in ["main.py", "utils.py", "test_main.py"]:
(p / "src" / name).write_text("")
for name in ["train.csv", "test.csv"]:
(p / "data" / name).write_text("")
(p / "README.txt").write_text("")
print(count_by_extension(root))Solution
import os
import tempfile
from pathlib import Path
def count_by_extension(root_dir):
counts = {}
for dirpath, dirnames, filenames in os.walk(root_dir):
for fname in filenames:
_, ext = os.path.splitext(fname)
if ext:
key = ext.lstrip(".")
counts[key] = counts.get(key, 0) + 1
return dict(sorted(counts.items()))
with tempfile.TemporaryDirectory() as root:
p = Path(root)
(p / "src").mkdir()
(p / "data").mkdir()
for name in ["main.py", "utils.py", "test_main.py"]:
(p / "src" / name).write_text("")
for name in ["train.csv", "test.csv"]:
(p / "data" / name).write_text("")
(p / "README.txt").write_text("")
print(count_by_extension(root))
Output:
{'csv': 2, 'py': 3, 'txt': 1}
How it works: os.walk(root_dir) visits every directory in the tree. For each directory, filenames is a list of filenames in that directory (without the path). os.path.splitext(fname) returns a (root, ext) tuple where ext includes the leading dot (e.g., ".py"). The if ext guard skips files with no extension (like Makefile or .gitignore). lstrip(".") removes the leading dot to produce the clean key. Results are sorted by key for deterministic output.
Key insight: os.path.splitext() only splits on the last dot, so archive.tar.gz yields (".gz"), not (".tar.gz"). For compound extensions, use pathlib.Path(fname).suffixes instead. The pattern of walking and aggregating is the basis for disk usage analyzers, duplicate finders, and code analysis tools.
Expected Output
{'py': 3, 'csv': 2, 'txt': 1}Hints
Hint 1: os.walk() yields (dirpath, dirnames, filenames) for every directory in the tree.
Hint 2: os.path.splitext(filename)[1] gives the extension including the dot. Strip the dot with [1:].
Hint 3: Use a collections.Counter or a plain dict to accumulate counts.
Use Path.rglob() to find all files modified within a given number of seconds. Verify the function finds recently created files and excludes older ones.
import time
import tempfile
from pathlib import Path
def find_recent_files(root_dir, within_seconds=60):
"""Return a sorted list of files modified within the last within_seconds."""
root = Path(root_dir)
cutoff = time.time() - within_seconds
return sorted(
p for p in root.rglob("*")
if p.is_file() and p.stat().st_mtime >= cutoff
)
with tempfile.TemporaryDirectory() as root:
p = Path(root)
(p / "sub").mkdir()
# Create two files right now — they should be found
new1 = p / "new_file.txt"
new2 = p / "sub" / "also_new.py"
new1.write_text("fresh")
new2.write_text("also fresh")
results = find_recent_files(root, within_seconds=30)
print(len(results))
# Results are sorted Path objects
names = {f.name for f in results}
print("new_file.txt" in names)
print("also_new.py" in names)Solution
import time
import tempfile
from pathlib import Path
def find_recent_files(root_dir, within_seconds=60):
root = Path(root_dir)
cutoff = time.time() - within_seconds
return sorted(
p for p in root.rglob("*")
if p.is_file() and p.stat().st_mtime >= cutoff
)
with tempfile.TemporaryDirectory() as root:
p = Path(root)
(p / "sub").mkdir()
new1 = p / "new_file.txt"
new2 = p / "sub" / "also_new.py"
new1.write_text("fresh")
new2.write_text("also fresh")
results = find_recent_files(root, within_seconds=30)
print(len(results))
names = {f.name for f in results}
print("new_file.txt" in names)
print("also_new.py" in names)
Output:
2
True
True
How it works: Path.rglob("*") generates every path (files and directories) under root recursively. The filter p.is_file() removes directories. p.stat().st_mtime returns the modification timestamp in seconds since the Unix epoch (January 1, 1970). The cutoff is time.time() - within_seconds — any file with st_mtime >= cutoff was modified within the window.
Key insight: Path.rglob("*") returns all paths; Path.rglob("*.py") applies a glob filter at the filename level. For filtering on file metadata (size, mtime, permissions), you still need to read the stat in a list comprehension or generator — the glob pattern only matches names. Use p.stat() once per file and cache the result if you need multiple attributes (size AND mtime) to avoid redundant system calls.
Expected Output
2\nTrue\nTrueHints
Hint 1: Path.rglob("*") matches all files and directories recursively.
Hint 2: p.stat().st_mtime returns the last modification time as a Unix timestamp (seconds since epoch).
Hint 3: Compare with time.time() - threshold_seconds to filter by recency.
Hard
Implement print_tree() that prints a directory tree in a human-readable format. Directories have a trailing /. Files and subdirectories are indented by 2 spaces per depth level. Everything is sorted alphabetically — directories first, then files.
import os
import tempfile
from pathlib import Path
def print_tree(root_dir, indent=0):
"""Print directory tree rooted at root_dir with sorted, indented output."""
root = Path(root_dir)
prefix = " " * indent
# Separate and sort entries
entries = sorted(root.iterdir(), key=lambda e: (e.is_file(), e.name))
for entry in entries:
if entry.is_dir():
print(prefix + entry.name + "/")
print_tree(entry, indent + 1)
else:
print(prefix + entry.name)
with tempfile.TemporaryDirectory() as root:
p = Path(root)
(p / "project" / "src").mkdir(parents=True)
(p / "project" / "tests").mkdir(parents=True)
(p / "project" / "src" / "main.py").write_text("")
(p / "project" / "src" / "utils.py").write_text("")
(p / "project" / "tests" / "test_main.py").write_text("")
(p / "project" / "README.md").write_text("")
print_tree(p / "project")Solution
import os
import tempfile
from pathlib import Path
def print_tree(root_dir, indent=0):
root = Path(root_dir)
prefix = " " * indent
entries = sorted(root.iterdir(), key=lambda e: (e.is_file(), e.name))
for entry in entries:
if entry.is_dir():
print(prefix + entry.name + "/")
print_tree(entry, indent + 1)
else:
print(prefix + entry.name)
with tempfile.TemporaryDirectory() as root:
p = Path(root)
(p / "project" / "src").mkdir(parents=True)
(p / "project" / "tests").mkdir(parents=True)
(p / "project" / "src" / "main.py").write_text("")
(p / "project" / "src" / "utils.py").write_text("")
(p / "project" / "tests" / "test_main.py").write_text("")
(p / "project" / "README.md").write_text("")
print_tree(p / "project")
Output:
project/
src/
main.py
utils.py
tests/
test_main.py
README.md
How it works: Path.iterdir() yields all direct children of a directory (non-recursive). The sort key (e.is_file(), e.name) sorts directories before files because is_file() returns False (0) for directories and True (1) for files — tuples compare element-by-element, so (False, "src") sorts before (True, "README.md"). Then within each group, names are sorted alphabetically. The recursion depth is tracked via indent, and " " * indent produces the correct prefix. The root directory name is printed with a trailing / by the caller one level up.
Key insight: Sorting by (is_file, name) is a compact way to enforce "directories first, files second" ordering without two separate passes. In the Unix tree command, the output uses box-drawing characters (├──, └──, │) — you can extend this implementation to use those characters by tracking whether each entry is the last sibling and passing the connecting prefix string down through the recursion.
Expected Output
project/\n src/\n main.py\n utils.py\n tests/\n test_main.py\n README.mdHints
Hint 1: Use recursion or os.walk with topdown=True to visit directories before their contents.
Hint 2: Track the depth level to compute the indent prefix (2 spaces per level).
Hint 3: Sort directory and file names for deterministic output.
Implement organize_by_extension() that moves all files in a flat directory into subdirectories named after each file extension. Files with no extension are left in place. Existing subdirectories are not moved.
import shutil
import tempfile
from pathlib import Path
def organize_by_extension(source_dir):
"""
Move each file in source_dir into a subdirectory named after its extension.
Files without an extension are left in place.
Returns the number of files moved.
"""
source = Path(source_dir)
moved = 0
for entry in list(source.iterdir()):
if entry.is_dir() or not entry.suffix:
continue
ext_dir = source / entry.suffix.lstrip(".")
ext_dir.mkdir(exist_ok=True)
shutil.move(str(entry), str(ext_dir / entry.name))
moved += 1
return moved
with tempfile.TemporaryDirectory() as root:
p = Path(root)
for name in ["report.pdf", "data.csv", "notes.txt", "archive.csv"]:
(p / name).write_text("")
(p / "Makefile").write_text("") # No extension — should stay
count = organize_by_extension(p)
print(count) # 4 files moved
print((p / "pdf" / "report.pdf").exists())
print((p / "csv" / "data.csv").exists())
print((p / "csv" / "archive.csv").exists())
print((p / "Makefile").exists()) # UntouchedSolution
import shutil
import tempfile
from pathlib import Path
def organize_by_extension(source_dir):
source = Path(source_dir)
moved = 0
for entry in list(source.iterdir()):
if entry.is_dir() or not entry.suffix:
continue
ext_dir = source / entry.suffix.lstrip(".")
ext_dir.mkdir(exist_ok=True)
shutil.move(str(entry), str(ext_dir / entry.name))
moved += 1
return moved
with tempfile.TemporaryDirectory() as root:
p = Path(root)
for name in ["report.pdf", "data.csv", "notes.txt", "archive.csv"]:
(p / name).write_text("")
(p / "Makefile").write_text("")
count = organize_by_extension(p)
print(count)
print((p / "pdf" / "report.pdf").exists())
print((p / "csv" / "data.csv").exists())
print((p / "csv" / "archive.csv").exists())
print((p / "Makefile").exists())
Output:
4
True
True
True
True
How it works: The function iterates over the directory contents using list(source.iterdir()) — the list() call is critical because iterating over a generator while modifying the directory (by moving files and creating subdirectories) can cause issues on some platforms. For each file entry with a suffix, an extension subdirectory is created with mkdir(exist_ok=True), then shutil.move() transfers the file. The if entry.is_dir() guard prevents accidentally moving newly created extension subdirectories. Makefile has no .suffix so it is skipped.
Key insight: The list() call around source.iterdir() materializes all entries before the loop begins. Without it, creating pdf/ and csv/ subdirectories during iteration could make them appear as entries in the generator — the is_dir() guard would catch them, but the behavior is platform-dependent and generally unsafe to rely on. Always materialize directory listings before mutating the directory structure you are iterating over.
Expected Output
3\nTrue\nTrue\nTrue\nTrueHints
Hint 1: Use Path.iterdir() to list files directly in the source directory.
Hint 2: Create a subdirectory named after each extension (without the dot) using mkdir(exist_ok=True).
Hint 3: Use shutil.move() to move each file into its extension subdirectory.
Hint 4: Skip directories and files with no extension.
Implement safe_rmtree() that deletes a directory tree only when it passes three safety checks: (1) path is under a required prefix, (2) path is not in a set of protected directories, (3) path exists and is a directory. Raise ValueError for safety violations and do nothing if the path does not exist.
import shutil
import tempfile
from pathlib import Path
def safe_rmtree(path, require_prefix=None):
"""
Delete a directory tree with safety guards.
Raises ValueError if:
- require_prefix is set and path is not under it
- path is a known protected directory (root, home)
"""
target = Path(path).resolve()
if require_prefix:
prefix = Path(require_prefix).resolve()
if not str(target).startswith(str(prefix) + "/") and target != prefix:
raise ValueError(f"outside prefix")
forbidden = {Path("/"), Path.home()}
if target in forbidden:
raise ValueError("protected path")
if not target.exists():
return
if not target.is_dir():
raise ValueError(f"not a directory: {target}")
shutil.rmtree(target)
with tempfile.TemporaryDirectory() as root:
p = Path(root)
build = p / "build"
build.mkdir()
(build / "artifact.o").write_text("")
# Normal deletion under allowed prefix
safe_rmtree(build, require_prefix=root)
print(build.exists())
# Attempt to delete something outside prefix
try:
safe_rmtree("/tmp/other", require_prefix=root)
except ValueError as e:
print("ValueError:", e)
# Attempt to delete home directory
try:
safe_rmtree(Path.home(), require_prefix=str(Path.home().parent))
except ValueError as e:
print("ValueError:", e)
# Deleting non-existent path is a no-op
safe_rmtree(p / "nonexistent", require_prefix=root)
print(build.exists())Solution
import shutil
import tempfile
from pathlib import Path
def safe_rmtree(path, require_prefix=None):
target = Path(path).resolve()
if require_prefix:
prefix = Path(require_prefix).resolve()
if not str(target).startswith(str(prefix) + "/") and target != prefix:
raise ValueError(f"outside prefix")
forbidden = {Path("/"), Path.home()}
if target in forbidden:
raise ValueError("protected path")
if not target.exists():
return
if not target.is_dir():
raise ValueError(f"not a directory: {target}")
shutil.rmtree(target)
with tempfile.TemporaryDirectory() as root:
p = Path(root)
build = p / "build"
build.mkdir()
(build / "artifact.o").write_text("")
safe_rmtree(build, require_prefix=root)
print(build.exists())
try:
safe_rmtree("/tmp/other", require_prefix=root)
except ValueError as e:
print("ValueError:", e)
try:
safe_rmtree(Path.home(), require_prefix=str(Path.home().parent))
except ValueError as e:
print("ValueError:", e)
safe_rmtree(p / "nonexistent", require_prefix=root)
print(build.exists())
Output:
False
ValueError: outside prefix
ValueError: protected path
False
How it works:
-
Path(path).resolve()converts the target to an absolute path and resolves any symlinks or..components. This prevents path traversal attacks — a caller cannot sneak"/tmp/safe_dir/../../../home/user"past the prefix check withoutresolve()catching it. -
The prefix check uses string prefix matching with an appended
/—startswith(str(prefix) + "/")ensures that a prefix of/tmp/foodoes not accidentally match/tmp/foobar. Thetarget != prefixclause also allows deleting the prefix root itself. -
The forbidden set blocks deletion of
/and the user's home directory. Extend this set with any additional paths that should never be deleted. -
Non-existent paths return silently — this is the "no-op on missing" contract. Callers should not need to check existence before calling.
-
buildwas already deleted in step 1, so the finalprint(build.exists())returnsFalse.
Key insight: This pattern is essential for any build system, test cleanup, or deployment script that calls shutil.rmtree(). The resolve() call is non-negotiable — without it, symlinks can be used to redirect deletion to arbitrary locations. In production, some teams add a dry-run mode that logs what would be deleted without actually deleting, and a required confirmation flag for paths above a certain size threshold.
Expected Output
True\nValueError: outside prefix\nValueError: protected path\nFalseHints
Hint 1: Use Path.resolve() to convert the target path to an absolute, symlink-resolved path before any checks.
Hint 2: Check that the resolved path starts with the required prefix using str(path).startswith(str(prefix)).
Hint 3: Maintain a set of forbidden paths — at minimum Path("/") and Path.home().
