Python Pathlib Deep Dive Practice Problems & Exercises
Easy
Predict the output. Three paths are constructed in different ways. Determine what each print produces.
from pathlib import PurePosixPath as Path
p1 = Path("/data/projects")
p2 = Path("data/projects")
p3 = Path("/data") / "projects" / "myapp"
print(p1)
print(p2)
print(p3)
Solution
Output:
/data/projects
data/projects
/data/projects/myapp
How it works: Path("/data/projects") creates an absolute path (starts with /). Path("data/projects") creates a relative path (no leading /). The / operator joins segments correctly — Path("/data") / "projects" / "myapp" produces /data/projects/myapp with proper separators.
Key insight: Path preserves whether a path is absolute or relative based on the initial string. The / operator is the Pythonic replacement for os.path.join() — it never accidentally concatenates strings without separators.
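The following small sketch shows which normalization pathlib does apply when a path object is built. It uses PurePosixPath and posixpath explicitly so the output does not depend on the host OS. The path strings are illustrative.

```python
import posixpath
from pathlib import PurePosixPath

# Redundant slashes, single-dot segments and a trailing slash are dropped
# when the object is built; ".." segments are kept (covered in a later problem).
messy = PurePosixPath("data//projects/./")
print(messy)  # data/projects

# Absolute vs relative is decided by the leading "/".
print(PurePosixPath("/data/projects").is_absolute())  # True
print(PurePosixPath("data/projects").is_absolute())   # False

# The / operator accepts str or path operands on the right.
joined = PurePosixPath("/data") / "projects" / PurePosixPath("myapp")
print(str(joined) == posixpath.join("/data", "projects", "myapp"))  # True
```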
Hints
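Hint 1: Path() accepts a string and creates a Path object. For inputs like these, the string representation matches what you passed in. pathlib only tidies redundant slashes, single-dot segments and trailing slashes.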
Hint 2: The / operator joins path segments — it works between Path objects and strings.
Predict the output. A path with multiple extensions is inspected using four different attributes.
from pathlib import PurePosixPath as Path
p = Path("/data/exports/report.final.csv")
print(p.name)
print(p.stem)
print(p.suffix)
print(p.suffixes)
Solution
Output:
report.final.csv
report.final
.csv
['.final', '.csv']
How it works:
- .name returns the final path component: report.final.csv
- .stem strips only the last suffix: report.final (not report)
- .suffix returns only the last extension: .csv
- .suffixes returns all dot-separated suffixes as a list: ['.final', '.csv']
Key insight: A common mistake is expecting .stem to strip all extensions. It only strips the last one. To get the bare filename, you can chain .stem once per extension (Path(p.stem).stem works for exactly two). Alternatively, p.name.split('.')[0] handles any number of extensions but returns an empty string for hidden files such as .bashrc. The .suffixes attribute is useful for files like .tar.gz where you need to know both extensions.
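If you need the bare name regardless of how many extensions there are, one option is to drop suffixes in a loop until none remain. The helper below is a minimal sketch of that idea. bare_stem is a hypothetical name, not a pathlib method. The sketch also shows where split('.') goes wrong.

```python
from pathlib import PurePosixPath

def bare_stem(path: PurePosixPath) -> str:
    """Hypothetical helper: strip every suffix, however many there are."""
    while path.suffix:
        path = path.with_suffix("")  # drops only the last suffix per pass
    return path.name

print(bare_stem(PurePosixPath("/data/exports/report.final.csv")))  # report
print(bare_stem(PurePosixPath("backup.tar.gz")))                   # backup
# pathlib treats a leading dot as part of the name, not a suffix...
print(bare_stem(PurePosixPath(".bashrc")))                         # .bashrc
# ...whereas naive string splitting returns an empty string.
print(".bashrc".split(".")[0] == "")                               # True
```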
Hints
Hint 1: .name returns the final component of the path (file name with extension).
Hint 2: .stem strips only the last suffix. .suffixes returns all suffixes as a list.
Predict the output. A deeply nested path is traversed using .parent, .parents, and .parts.
from pathlib import PurePosixPath as Path
p = Path("/data/projects/myapp/src/main.py")
print(p.parent)
print(p.parent.parent)
print(p.parents[2])
print(p.parts)
Solution
Output:
/data/projects/myapp/src
/data/projects/myapp
/data/projects
('/', 'data', 'projects', 'myapp', 'src', 'main.py')
How it works:
- .parent returns the directory containing the file: /data/projects/myapp/src
- Chaining .parent.parent goes two levels up: /data/projects/myapp
- .parents[0] is the same as .parent, .parents[1] is .parent.parent, and .parents[2] goes three levels up: /data/projects
- .parts splits the path into a tuple of every component, with '/' as the root anchor for absolute paths
Key insight: .parents is indexed from the file outward — index 0 is the immediate parent, and higher indices go further toward the root. The .parts tuple always starts with the anchor ('/' on POSIX, 'C:\\' on Windows) for absolute paths, or the first directory name for relative paths.
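The chain of ancestors is easier to see when printed in full. This sketch lists every entry in .parents and shows that a relative path's chain ends at '.', not at a root. It also checks the Windows anchor mentioned above using PureWindowsPath, which works on any OS. The example paths are illustrative.

```python
from pathlib import PurePosixPath, PureWindowsPath

p = PurePosixPath("/data/projects/myapp/src/main.py")
# .parents walks outward from the immediate parent to the root.
ancestors = [str(a) for a in p.parents]
print(ancestors)
# ['/data/projects/myapp/src', '/data/projects/myapp', '/data/projects', '/data', '/']

# For a relative path there is no anchor; the chain ends at '.'.
rel = PurePosixPath("data/projects/main.py")
print(rel.parts)                      # ('data', 'projects', 'main.py')
print([str(a) for a in rel.parents])  # ['data/projects', 'data', '.']

# On Windows-style paths the first part combines drive and root.
print(PureWindowsPath("C:/Users/me/notes.txt").parts)
# ('C:\\', 'Users', 'me', 'notes.txt')
```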
Hints
Hint 1: .parent returns the immediate parent directory as a Path object.
Hint 2: .parents returns an immutable sequence of all ancestor directories.
Hint 3: .parts returns a tuple of individual path components.
Predict the output. The / operator behaves differently when one of the operands is an absolute path.
from pathlib import PurePosixPath as Path
base = Path("/data/projects")
# What happens when the right side is absolute?
p1 = base / "/etc/config.yaml"
print(p1)
# Normal composition
p2 = base / "app" / "main.py"
print(p2)
# Is the result still a Path?
print(isinstance(p2, Path))
Solution
Output:
/etc/config.yaml
/data/projects/app/main.py
True
How it works: When the right operand of / is an absolute path (starts with /), it replaces the entire left side. So Path("/data/projects") / "/etc/config.yaml" discards /data/projects and returns Path("/etc/config.yaml"). This matches os.path.join() behavior — if any component is absolute, all previous components are thrown away.
Normal composition with relative segments ("app", "main.py") appends as expected.
Key insight: This is documented behavior, not a bug, and it keeps pathlib consistent with os.path.join(). It also makes joining untrusted input onto a base directory unsafe. If a user supplies an absolute path, the base is silently discarded and the result can point anywhere on the filesystem. Always validate user-supplied paths if you expect them to be relative.
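One way to act on that advice is to reject absolute and parent-traversal input before joining. join_under is a hypothetical helper, and it is purely lexical. It does not follow symlinks. For real directories, you would also call resolve() on the result and confirm it with is_relative_to() (Python 3.9+).

```python
from pathlib import PurePosixPath

def join_under(base: PurePosixPath, user_path: str) -> PurePosixPath:
    """Hypothetical helper: join untrusted input onto base, lexically.

    Rejects absolute paths and '..' segments; does NOT inspect the
    filesystem, so symlinks pointing outside base are not detected.
    """
    candidate = PurePosixPath(user_path)
    if candidate.is_absolute():
        raise ValueError(f"absolute path not allowed: {user_path!r}")
    if ".." in candidate.parts:
        raise ValueError(f"parent traversal not allowed: {user_path!r}")
    return base / candidate

base = PurePosixPath("/data/projects")
print(join_under(base, "app/main.py"))  # /data/projects/app/main.py
for bad in ["/etc/config.yaml", "../../etc/passwd"]:
    try:
        join_under(base, bad)
    except ValueError as exc:
        print(exc)
```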
Hints
Hint 1: When the right operand of / is an absolute path, it replaces the left side entirely.
Hint 2: This mirrors how os.path.join works — an absolute component resets the path.
Medium
Predict the output. Two glob patterns are used to find different file types in a directory. The code below creates the following directory structure:
project/
    data.csv
    report.csv
    summary.csv
    config.yaml
    src/
        main.py
        utils.py
from pathlib import Path
import tempfile
# Set up the directory structure
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "project"
    (base / "src").mkdir(parents=True)
    for f in ["data.csv", "report.csv", "summary.csv", "config.yaml"]:
        (base / f).touch()
    for f in ["main.py", "utils.py"]:
        (base / "src" / f).touch()
    # Glob for CSV files
    csv_files = sorted([p.name for p in base.glob("*.csv")])
    print(len(csv_files))
    print(csv_files)
    # Glob for YAML files
    yaml_files = sorted([p.name for p in base.glob("*.yaml")])
    print(len(yaml_files))
    print(yaml_files)
Solution
Output:
3
['data.csv', 'report.csv', 'summary.csv']
1
['config.yaml']
How it works: .glob("*.csv") matches all files in base whose name ends with .csv. It does not recurse into subdirectories — src/main.py and src/utils.py are not matched. The * wildcard matches any sequence of characters except the path separator.
.glob("*.yaml") finds only config.yaml. The glob returns a generator of Path objects — we extract .name to get just the filename.
Key insight: .glob() returns a lazy generator, not a list. For large directories with thousands of files, this is memory-efficient. Always convert to a list explicitly if you need to iterate multiple times or check the length. The pattern follows standard Unix glob rules — * matches everything except /, ? matches a single character, [abc] matches character classes.
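You can check the lazy behavior and the other wildcard rules directly. This is a small sketch that assumes a throwaway directory containing the illustrative files a1.csv, a2.csv, b1.csv and notes.txt.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for name in ["a1.csv", "a2.csv", "b1.csv", "notes.txt"]:
        (base / name).touch()

    matches = base.glob("*.csv")             # a lazy iterator, not a list
    first = sorted(p.name for p in matches)  # this pass consumes it
    second = list(matches)                   # already exhausted, so empty

    single = sorted(p.name for p in base.glob("a?.csv"))        # ? = exactly one character
    char_class = sorted(p.name for p in base.glob("[ab]1.csv"))  # [ab] = 'a' or 'b'

print(first)       # ['a1.csv', 'a2.csv', 'b1.csv']
print(second)      # []
print(single)      # ['a1.csv', 'a2.csv']
print(char_class)  # ['a1.csv', 'b1.csv']
```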
Hints
Hint 1: .glob() searches only the immediate directory (one level) unless the pattern contains **.
Hint 2: The pattern *.csv matches any file ending in .csv in the target directory.
Predict the output. An rglob call searches for all Python files across a nested project structure. The code below creates this layout:
myproject/
    src/
        main.py
        utils.py
        views.py
    tests/
        test_main.py
        test_utils.py
from pathlib import Path
import tempfile
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "myproject"
    (base / "src").mkdir(parents=True)
    (base / "tests").mkdir(parents=True)
    for f in ["main.py", "utils.py", "views.py"]:
        (base / "src" / f).touch()
    for f in ["test_main.py", "test_utils.py"]:
        (base / "tests" / f).touch()
    # rglob searches ALL subdirectories
    all_py = sorted([p.name for p in base.rglob("*.py")])
    print(len(all_py))
    print(all_py)
Solution
Output:
5
['main.py', 'test_main.py', 'test_utils.py', 'utils.py', 'views.py']
How it works: .rglob("*.py") recursively walks the entire directory tree under base and matches any file ending in .py. It finds files in both src/ and tests/ subdirectories. The result is sorted alphabetically by name.
Note that rglob("*.py") is equivalent to glob("**/*.py"). The ** pattern matches zero or more directories.
Key insight: .rglob() is extremely useful for project-wide operations like finding all Python files, all test files, or all config files. However, be careful with large directory trees — it walks every subdirectory. For performance-critical code, consider using .glob() with specific subdirectory patterns instead of a blanket recursive search.
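You can check that rglob("*.py") and glob("**/*.py") return the same results, and that ** also matches zero directories, by putting a file at the project root. This sketch adds a top-level setup.py, which is an assumption and not part of the exercise. listing() is a hypothetical helper that prints root-relative paths.

```python
import tempfile
from pathlib import Path

def listing(paths, root):
    # Hypothetical helper: root-relative paths with forward slashes, sorted.
    return sorted(p.relative_to(root).as_posix() for p in paths)

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "myproject"
    (base / "src").mkdir(parents=True)
    (base / "tests").mkdir()
    for rel in ["setup.py", "src/main.py", "tests/test_main.py"]:
        (base / rel).touch()

    via_rglob = listing(base.rglob("*.py"), base)
    via_glob = listing(base.glob("**/*.py"), base)   # ** also matches zero directories
    src_only = listing(base.glob("src/*.py"), base)  # one known subdirectory only

print(via_rglob)              # ['setup.py', 'src/main.py', 'tests/test_main.py']
print(via_glob == via_rglob)  # True
print(src_only)               # ['src/main.py']
```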
Hints
Hint 1: .rglob(pattern) is equivalent to .glob("**/" + pattern) — it searches all subdirectories recursively.
Hint 2: rglob returns a generator that walks the entire directory tree.
Predict the output. The .resolve() method is used to normalize paths containing .. and . segments.
from pathlib import PurePosixPath as Path
# Paths with navigation segments
p1 = Path("/data/projects/../projects/myapp")
p2 = Path("/data/projects/./myapp")
p3 = Path("/data/projects/myapp/../myapp/src/../src/main.py")
# PurePath drops "." segments when the object is built (p2 is already
# /data/projects/myapp), but it keeps ".." because it never consults the
# filesystem. Check that p1 still contains its ".." segment:
print(str(p1) == "/data/projects/../projects/myapp")
print(".." in p1.parts)
# The / operator does NOT collapse .. segments either
p4 = Path("/data") / "projects" / ".." / "projects" / "myapp"
print(".." in p4.parts)
# Only resolve() on a concrete Path cleans up .. segments
from pathlib import Path as ConcretePath
import tempfile
with tempfile.TemporaryDirectory() as tmp:
    real = ConcretePath(tmp) / "a" / "b"
    real.mkdir(parents=True)
    messy = ConcretePath(tmp) / "a" / "b" / ".." / "b"
    resolved = messy.resolve()
    print(resolved == real.resolve())
Solution
Output:
True
True
True
True
How it works:
- PurePosixPath preserves .. segments literally; it does not resolve them. The string representation of p1 is exactly what you passed in. Single dots are different: pathlib drops them at construction, so p2 is already /data/projects/myapp.
- .. appears as a literal part in .parts, because PurePath makes no filesystem calls.
- The / operator also preserves ..; it simply appends segments without normalization.
- Only .resolve() on a concrete Path (not PurePath) actually collapses .. segments by consulting the real filesystem. After resolution, both paths point to the same directory.
Key insight: There is a critical distinction between PurePath (string manipulation only) and Path (filesystem-aware). PurePath never touches the disk, so it cannot resolve .. — it does not know whether the directories exist. Always use .resolve() on concrete Path objects when you need a canonical, normalized absolute path. This is essential for comparing paths reliably: two different path strings might refer to the same file, and .resolve() makes them identical.
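You can see the difference between single and double dots without touching the disk. This sketch also shows posixpath.normpath, which collapses .. purely lexically. Unlike resolve(), it can give the wrong answer when a component is a symlink, so treat it as a string-level tool only.

```python
import posixpath
from pathlib import PurePosixPath

dotted = PurePosixPath("/data/projects/./myapp")
dotdot = PurePosixPath("/data/projects/../projects/myapp")

print(dotted)  # /data/projects/myapp  ('.' is dropped at construction)
print(dotdot)  # /data/projects/../projects/myapp  ('..' is kept)

# Lexical collapse of '..' with no filesystem access; only correct
# if 'projects' is not a symlink to somewhere else.
print(posixpath.normpath(str(dotdot)))  # /data/projects/myapp
```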
Hints
Hint 1: .resolve() returns an absolute path with all symlinks and .. segments resolved.
Hint 2: .resolve() always returns an absolute path, even if the original was relative.
Predict the output. The .relative_to() method computes paths relative to a base directory. One call succeeds, another raises an error.
from pathlib import PurePosixPath as Path
project = Path("/data/projects/myapp")
source = Path("/data/projects/myapp/src/main.py")
unrelated = Path("/etc/config.yaml")
# Compute relative path from project to source
rel = source.relative_to(project)
print(rel)
print(rel.parts[0] == "src")
# What happens with an unrelated path?
try:
    unrelated.relative_to(project)
    print("OK")
except ValueError:
    print("ValueError")
Solution
Output:
src/main.py
True
ValueError
How it works:
- source.relative_to(project) strips the /data/projects/myapp prefix from /data/projects/myapp/src/main.py, leaving src/main.py as a relative path.
- The result's first part is "src", so the check is True.
- unrelated.relative_to(project) fails because /etc/config.yaml does not start with /data/projects/myapp. With default arguments, relative_to cannot express /etc/config.yaml relative to /data/projects/myapp, because it does not generate .. segments.
Key insight: .relative_to() only works when the path is actually a descendant of the base. It does not compute ../../etc/config.yaml-style relative paths. For that, use os.path.relpath() or, in Python 3.12+, PurePath.relative_to(base, walk_up=True). In production code, always wrap .relative_to() in a try/except or check with .is_relative_to() (Python 3.9+) first.
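When the path might not be a descendant of the base, you can check first and fall back to a lexical computation. describe() is a hypothetical helper. It uses is_relative_to() (Python 3.9+) and posixpath.relpath(), which computes the answer from the path strings alone.

```python
import posixpath
from pathlib import PurePosixPath

def describe(path: PurePosixPath, base: PurePosixPath) -> str:
    """Hypothetical helper: path expressed relative to base, as a string."""
    if path.is_relative_to(base):           # Python 3.9+
        return str(path.relative_to(base))  # descendant: no '..' needed
    # Fallback: string-level computation that may emit '..' segments.
    return posixpath.relpath(str(path), str(base))

project = PurePosixPath("/data/projects/myapp")
print(describe(PurePosixPath("/data/projects/myapp/src/main.py"), project))  # src/main.py
print(describe(PurePosixPath("/etc/config.yaml"), project))  # ../../../etc/config.yaml
```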
Hints
Hint 1: .relative_to(base) strips the base prefix and returns the remainder as a relative path.
Hint 2: relative_to raises ValueError if the path is not relative to the given base.
Hard
Write a function that takes a directory path, finds all files (not directories), and groups them by extension. Return a dictionary mapping extension names (without the dot) to sorted lists of filenames.
from pathlib import Path
import tempfile
def organize_by_extension(directory):
    """Group files by extension. Return dict of ext -> sorted filenames."""
    groups = {}
    for p in sorted(directory.iterdir()):
        if p.is_file() and p.suffix:
            ext = p.suffix.lstrip(".")
            if ext not in groups:
                groups[ext] = []
            groups[ext].append(p.name)
    return dict(sorted(groups.items()))
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "downloads"
    base.mkdir()
    for f in ["index.html", "about.html", "styles.css",
              "reset.css", "app.js", "utils.js"]:
        (base / f).touch()
    result = organize_by_extension(base)
    for ext, files in result.items():
        print(f"{ext}: {files}")
Solution
Output:
css: ['reset.css', 'styles.css']
html: ['about.html', 'index.html']
js: ['app.js', 'utils.js']
How it works:
- .iterdir() yields all entries in the directory (files and subdirectories).
- .is_file() filters out directories.
- The .suffix check excludes files without extensions.
- .suffix returns the extension with the leading dot (.css), so .lstrip(".") strips it to css.
- Files are grouped by extension into a dictionary. Each value list comes out sorted because entries are visited in sorted order, and dict(sorted(...)) sorts the keys.
Key insight: This pattern is the foundation of file organizers, build tools, and static site generators. In production, you would also handle files with no extension, files with multiple extensions (.tar.gz), and hidden files (.gitignore). Using defaultdict(list) from collections would simplify the grouping logic.
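Here is a sketch of the defaultdict variant. It rests on one stated assumption: every dot-separated segment counts as part of the extension, so backup.tar.gz groups under tar.gz. Files without a suffix go under a made-up "(none)" key. The assumption misfires on names such as report.final.csv, which would be grouped under final.csv, so adjust it to your naming conventions. organize_by_full_extension is a hypothetical name.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

def organize_by_full_extension(directory: Path) -> dict[str, list[str]]:
    """Hypothetical variant: group by all suffixes, keep extensionless files."""
    groups = defaultdict(list)  # missing keys start as an empty list
    for p in sorted(directory.iterdir()):
        if not p.is_file():
            continue  # skip subdirectories
        # ".tar.gz" -> "tar.gz"; no suffix at all -> "(none)"
        ext = "".join(p.suffixes).lstrip(".") or "(none)"
        groups[ext].append(p.name)
    return dict(sorted(groups.items()))

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for name in ["backup.tar.gz", "notes.txt", "Makefile", ".gitignore"]:
        (base / name).touch()
    (base / "sub").mkdir()
    result = organize_by_full_extension(base)

for ext, files in result.items():
    print(f"{ext}: {files}")
# (none): ['.gitignore', 'Makefile']
# tar.gz: ['backup.tar.gz']
# txt: ['notes.txt']
```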
Hints
Hint 1: Use .suffix to get the file extension, then strip the leading dot to get the category name.
Hint 2: Group files by extension using a dictionary, then sort both keys and values.
Write a function that discovers all Python files in a project, classifies them as source or test files, and reports the counts. A file is a "test file" if any of its parent directories is named tests.
from pathlib import Path
import tempfile
def discover_project(root):
    """Find all .py files, classify as source or test."""
    py_files = sorted(root.rglob("*.py"))
    source_files = []
    test_files = []
    for f in py_files:
        rel = f.relative_to(root)
        if "tests" in rel.parts:
            test_files.append(rel)
        else:
            source_files.append(rel)
    return source_files, test_files
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "myproject"
    (root / "src").mkdir(parents=True)
    (root / "tests").mkdir(parents=True)
    for f in ["src/app.py", "src/utils.py",
              "tests/test_app.py", "tests/test_utils.py"]:
        (root / f).touch()
    source, tests = discover_project(root)
    all_files = sorted(source + tests, key=lambda p: str(p))
    print(f"Found {len(all_files)} Python files:")
    for f in all_files:
        print(f" {f}")
    print(f"Source files: {len(source)}")
    print(f"Test files: {len(tests)}")
Solution
Output:
Found 4 Python files:
 src/app.py
 src/utils.py
 tests/test_app.py
 tests/test_utils.py
Source files: 2
Test files: 2
How it works:
- .rglob("*.py") recursively finds every .py file under the project root.
- .relative_to(root) converts absolute paths to relative paths (e.g., src/app.py).
- Checking "tests" in rel.parts looks at whole path components, not at a substring of the full string. This correctly classifies tests/test_app.py as a test file without false positives on files like src/test_helpers.py.
Key insight: Walking the tree and classifying files by their path components is a common building block for project-file discovery of the kind that test runners and linters perform. Using .parts for classification is more robust than string matching: "tests" in str(path) would incorrectly match a file called contests/entry.py. Decompose paths into parts whenever you classify them.
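The problem statement only asks about parent directories. A stricter check can therefore look at rel.parent.parts and ignore the file name itself. is_test_file is a hypothetical helper. The sketch compares it with naive substring matching on a few illustrative paths.

```python
from pathlib import PurePosixPath

def is_test_file(rel: PurePosixPath) -> bool:
    """Hypothetical helper: True if any *directory* component is 'tests'."""
    return "tests" in rel.parent.parts  # the file name itself is not checked

paths = [PurePosixPath(s) for s in [
    "tests/test_app.py",
    "src/test_helpers.py",
    "contests/entry.py",
    "pkg/tests/unit/test_x.py",
]]
naive = ["tests" in str(p) for p in paths]
robust = [is_test_file(p) for p in paths]
print(naive)   # [True, False, True, True]   <- contests/ is a false positive
print(robust)  # [True, False, False, True]
```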
Hints
Hint 1: Use rglob to find all .py files, then relative_to to get paths relative to the project root.
Hint 2: Check if any parent directory is named "tests" to classify test files vs source files.
Write a function that generates rename plans — given a directory, a search string, and a replacement string, find all files whose stem contains the search string and produce (old_name, new_name) pairs. Do not actually rename; just print the plan.
from pathlib import Path
import tempfile
def plan_renames(directory, search, replace):
    """Find files with search in stem, return rename pairs."""
    pairs = []
    for p in sorted(directory.iterdir()):
        if p.is_file() and search in p.stem:
            new_stem = p.stem.replace(search, replace)
            new_path = p.with_stem(new_stem)
            pairs.append((p.name, new_path.name))
    return pairs
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "reports"
    base.mkdir()
    for f in ["report_2024.csv", "data_2024.csv", "summary_2024.csv",
              "backup_v1.json", "config_v1.json", "readme.txt"]:
        (base / f).touch()
    # Plan 1: update the year (only the CSV stems contain "2024")
    for old, new in plan_renames(base, "2024", "2025"):
        print(f"{old} -> {new}")
    # Plan 2: update the version (only the JSON stems contain "v1")
    for old, new in plan_renames(base, "v1", "v2"):
        print(f"{old} -> {new}")
Solution
Output:
data_2024.csv -> data_2025.csv
report_2024.csv -> report_2025.csv
summary_2024.csv -> summary_2025.csv
backup_v1.json -> backup_v2.json
config_v1.json -> config_v2.json
How it works:
- .iterdir() lists all entries in the directory, and sorted() puts them in name order. That is why data_2024.csv is planned before report_2024.csv, even though report_2024.csv was created first.
- .is_file() filters to files only.
- search in p.stem checks whether the stem contains the target string. Because .stem excludes the extension, a search string that appears only in the suffix (for example "csv") never matches.
- .with_stem(new_stem) creates a new Path with the stem replaced but the suffix preserved. This is cleaner than manual string manipulation.
- readme.txt is not matched by either search because its stem (readme) contains neither 2024 nor v1.
Key insight: .with_stem() (Python 3.9+) and .with_suffix() are the safe way to transform filenames. They only change the final component, so the directory portion cannot be corrupted. They also raise ValueError for an invalid result, for example a new name that contains a path separator. To perform the actual rename, you would call p.rename(new_path) on each pair. Always generate a dry-run plan first (as this function does) before executing destructive renames in production scripts.
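Executing the plan safely needs one more guard. On POSIX, Path.rename() silently replaces an existing target file, while on Windows it raises FileExistsError. apply_renames is a hypothetical helper that refuses to overwrite on every platform. Its exists() check and the rename are separate steps, so it is not safe against concurrent writers.

```python
import tempfile
from pathlib import Path

def apply_renames(directory: Path, pairs) -> None:
    """Hypothetical helper: execute (old_name, new_name) pairs from a dry-run plan.

    Refuses to overwrite an existing target. The check and the rename are
    not atomic, so another process could still create the target in between.
    """
    for old, new in pairs:
        src, dst = directory / old, directory / new
        if dst.exists():
            raise FileExistsError(f"refusing to overwrite {dst}")
        src.rename(dst)

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for name in ["data_2024.csv", "report_2024.csv"]:
        (base / name).touch()
    plan = [("data_2024.csv", "data_2025.csv"), ("report_2024.csv", "report_2025.csv")]
    apply_renames(base, plan)
    after = sorted(p.name for p in base.iterdir())

print(after)  # ['data_2025.csv', 'report_2025.csv']
```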
Hints
Hint 1: .with_stem() replaces the stem (filename without extension) while keeping the suffix.
Hint 2: Use string .replace() on the stem to transform just the part you need.
