pathlib - Modern Path Manipulation in Python

Reading time: ~16 minutes | Level: Foundation → Engineering

Here is code that most Python tutorials still teach:

import os

# Old style - fragile, verbose, error-prone
base = "/data/projects"
project = "myapp"
config_file = os.path.join(base, project, "config", "settings.json")
config_dir = os.path.dirname(config_file)

if not os.path.exists(config_dir):
    os.makedirs(config_dir)

if os.path.isfile(config_file):
    with open(config_file, "r") as f:
        content = f.read()

And here is the same code with pathlib, the modern approach added in Python 3.4:

from pathlib import Path

# Modern style - reads like what it is
config_file = Path("/data/projects") / "myapp" / "config" / "settings.json"
config_file.parent.mkdir(parents=True, exist_ok=True)

if config_file.is_file():
    content = config_file.read_text(encoding="utf-8")

The pathlib version is shorter, reads like English, is cross-platform without extra effort, and does not require juggling strings. This page covers pathlib at engineering depth - from the object model to glob patterns to cross-platform path manipulation.

What You Will Learn

  • Why pathlib.Path beats os.path strings - the OOP model, operator overloading, and cross-platform behavior
  • Creating paths: Path('.'), Path.home(), Path.cwd(), absolute paths
  • The / operator for path composition and why it is type-safe
  • All path attributes: .name, .stem, .suffix, .suffixes, .parent, .parents, .parts
  • Path inspection methods: .exists(), .is_file(), .is_dir(), .stat(), .resolve(), .absolute()
  • Reading and writing files directly through Path objects
  • Globbing: .glob(), .rglob() - lazy iterators for batch file processing
  • Directory creation, renaming, deletion
  • Cross-platform paths: PurePosixPath, PureWindowsPath for path manipulation without I/O
  • Real-world patterns: project structure discovery, batch file processing, config file location

Prerequisites

  • Python 3.6+ (pathlib was added in 3.4; since 3.6, open() and the os functions accept Path objects directly)
  • Familiarity with open() and basic file I/O
  • Understanding of the filesystem concept: files, directories, and paths

Mental Model: Path as an Object, Not a String

Old approach - path as a string:

  • "/data/projects/myapp/config.json" - just a string, no knowledge of filesystem structure
  • Operations require function calls: os.path.join(), os.path.basename()
  • Easy to create invalid paths: "/data/projects" + "myapp" = "/data/projectsmyapp"

New approach - path as an object (pathlib.Path):

  • Knows its own structure: .name, .parent, .suffix
  • Operations are methods and operators: path / "subdir", path.parent
  • Type-safe composition: Path / str = Path (never concatenates incorrectly)
  • OS-aware: uses / on Unix, \ on Windows automatically

For Path("/data/projects/myapp/config.json"):

| Attribute | Value |
| --- | --- |
| .parts | ('/', 'data', 'projects', 'myapp', 'config.json') |
| .parent | Path('/data/projects/myapp') |
| .parents | [/data/projects/myapp, /data/projects, /data, /] |
| .name | 'config.json' |
| .stem | 'config' |
| .suffix | '.json' |
| .suffixes | ['.json'] |
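
The attribute values listed for this path can be checked without touching the filesystem. A minimal sketch, assuming PurePosixPath is an acceptable stand-in for Path here (it is chosen only so the output is identical on every platform):

```python
from pathlib import PurePosixPath

# PurePosixPath always applies POSIX rules and performs no I/O,
# so these results do not depend on the machine running the code.
p = PurePosixPath("/data/projects/myapp/config.json")

print(p.parts)                      # ('/', 'data', 'projects', 'myapp', 'config.json')
print(p.parent)                     # /data/projects/myapp
print([str(a) for a in p.parents])  # ['/data/projects/myapp', '/data/projects', '/data', '/']
print(p.name, p.stem, p.suffix)     # config.json config .json
print(p.suffixes)                   # ['.json']
```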

Part 1 - Creating Path Objects

Basic Construction

from pathlib import Path

# From a string
p = Path("/usr/local/bin/python3")
print(type(p)) # <class 'pathlib.PosixPath'> (on Linux/macOS)
# <class 'pathlib.WindowsPath'> (on Windows)

# Relative path
p = Path("data/config.json")
print(p) # data/config.json

# Current directory
cwd = Path(".")
cwd = Path.cwd() # absolute path to current directory

# Home directory
home = Path.home() # /Users/alice (macOS), /home/alice (Linux), C:\Users\alice (Win)

# Root
root = Path("/") # Unix root

The / Operator - Type-Safe Path Composition

base = Path("/data/projects")
project_dir = base / "myapp" # Path('/data/projects/myapp')
config_dir = project_dir / "config" # Path('/data/projects/myapp/config')
settings = config_dir / "settings.json" # Path('/data/projects/myapp/config/settings.json')

# Chain it:
settings = Path("/data/projects") / "myapp" / "config" / "settings.json"

# Mix str and Path - at least one operand must be a Path:
subdir = Path("/data") / "logs" / "2024"
print(subdir) # /data/logs/2024

# WRONG - str / str does not work:
# result = "/data" / "logs" # TypeError: unsupported operand type(s) for /

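:::tip Why / is better than os.path.join() The / operator keeps the structure of a path visible, so a missing level is easier to spot than in a long os.path.join() call, and the result is a Path object rather than a bare string. One pitfall is shared by both forms: os.path.join("/data", "/config") returns "/config", and Path("/data") / "/config" likewise returns Path("/config"), because an absolute right-hand operand silently discards everything before it. Neither form raises an error, so validate untrusted segments before joining them. :::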
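
How the two forms treat an absolute right-hand operand is easy to verify. The sketch below uses posixpath and PurePosixPath so the result is the same on any platform; join_relative is a hypothetical helper, not part of pathlib, showing one way to reject absolute segments before joining:

```python
import posixpath
from pathlib import PurePosixPath

# Both forms let an absolute right-hand operand replace the left-hand side.
joined_with_os = posixpath.join("/data", "/config")
joined_with_slash = PurePosixPath("/data") / "/config"
print(joined_with_os)     # /config
print(joined_with_slash)  # /config


def join_relative(base, part):
    """Hypothetical helper: join `part` onto `base` only if `part` is relative."""
    candidate = PurePosixPath(part)
    if candidate.is_absolute():
        raise ValueError(f"expected a relative segment, got {part!r}")
    return base / candidate


print(join_relative(PurePosixPath("/data"), "config"))  # /data/config
```

Raising early like this turns a silent replacement into a visible error, which matters most when the segment comes from user input or a config file.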

Path from Parts

# pathlib.Path accepts multiple parts in the constructor
p = Path("/usr", "local", "bin", "python3")
print(p) # /usr/local/bin/python3

# Equivalent to:
p = Path("/usr") / "local" / "bin" / "python3"

# Absolute path from parts:
p = Path.home() / ".config" / "myapp" / "config.toml"
print(p) # /home/alice/.config/myapp/config.toml

Part 2 - Path Attributes

.name, .stem, .suffix, .suffixes

p = Path("/data/reports/sales_2024.csv.gz")

print(p.name) # 'sales_2024.csv.gz' - full filename with all extensions
print(p.stem) # 'sales_2024.csv' - filename without last extension
print(p.suffix) # '.gz' - last extension only
print(p.suffixes) # ['.csv', '.gz'] - all extensions

p2 = Path("/data/config.json")
print(p2.name) # 'config.json'
print(p2.stem) # 'config'
print(p2.suffix) # '.json'
print(p2.suffixes) # ['.json']

# Paths without extension:
p3 = Path("/usr/bin/python3")
print(p3.stem) # 'python3'
print(p3.suffix) # '' - empty string
print(p3.suffixes) # [] - empty list

.parent and .parents

p = Path("/data/projects/myapp/config/settings.json")

print(p.parent) # /data/projects/myapp/config
print(p.parent.parent) # /data/projects/myapp
print(p.parent.parent.parent) # /data/projects

# .parents is a sequence of all ancestor paths
for ancestor in p.parents:
    print(ancestor)
# /data/projects/myapp/config
# /data/projects/myapp
# /data/projects
# /data
# /

# Index into parents:
print(p.parents[0]) # /data/projects/myapp/config (immediate parent)
print(p.parents[2]) # /data/projects (grandparent.parent)
print(p.parents[-1]) # / (root, Python 3.10+)

.parts

p = Path("/data/projects/myapp/config.json")

print(p.parts)
# ('/', 'data', 'projects', 'myapp', 'config.json')

# Useful for cross-platform path decomposition
p_win = Path("C:/Users/Alice/Documents/file.txt")
print(p_win.parts)
# ('C:\\', 'Users', 'Alice', 'Documents', 'file.txt') (on Windows)

# Reconstruct from parts:
reconstructed = Path(*p.parts)
print(reconstructed) # /data/projects/myapp/config.json

Modifying Path Components

p = Path("/data/reports/sales_2024.csv")

# Change the filename
new_name = p.with_name("purchases_2024.csv")
print(new_name) # /data/reports/purchases_2024.csv

# Change just the extension
new_ext = p.with_suffix(".xlsx")
print(new_ext) # /data/reports/sales_2024.xlsx

# Change both
new = p.with_name("data").with_suffix(".json")
print(new) # /data/reports/data.json

# with_stem() - Python 3.9+
new_stem = p.with_stem("revenue_2024")
print(new_stem) # /data/reports/revenue_2024.csv

Part 3 - Path Inspection Methods

Existence and Type Checks

from pathlib import Path

p = Path("/etc/hosts")

# Does it exist at all?
print(p.exists()) # True (if /etc/hosts exists)

# What type is it?
print(p.is_file()) # True
print(p.is_dir()) # False
print(p.is_symlink()) # True if it's a symbolic link
print(p.is_absolute())# True (starts from root)

# Check relative path
rel = Path("data/config.json")
print(rel.is_absolute()) # False

.stat() - File Metadata

p = Path("/etc/hosts")
stat = p.stat()

print(stat.st_size) # file size in bytes: 285
print(stat.st_mtime) # modification time: 1705312321.4 (Unix timestamp)
print(stat.st_ctime) # metadata change time (Unix) / creation time (Windows)
print(stat.st_mode) # file permissions: 33188 (0o100644)

# Human-readable modification time:
from datetime import datetime
mtime = datetime.fromtimestamp(stat.st_mtime)
print(mtime) # 2024-01-15 10:32:01.423456

# File size in human units:
def human_size(size_bytes):
    for unit in ["B", "KB", "MB", "GB"]:
        if size_bytes < 1024:
            return f"{size_bytes:.1f} {unit}"
        size_bytes /= 1024
    return f"{size_bytes:.1f} TB"  # anything larger is reported in TB

p = Path("/usr/bin/python3")
if p.exists():
    print(human_size(p.stat().st_size)) # e.g. "4.2 MB"

.resolve() and .absolute()

from pathlib import Path

# .resolve() - make absolute AND resolve symlinks, .. and .
p = Path("../../data/../config.json")
resolved = p.resolve() # absolute path with all .. resolved
print(resolved) # /absolute/path/to/config.json

# .absolute() - make absolute without resolving symlinks or collapsing ..
# (documented since Python 3.11, though the method itself is older)
absolute = p.absolute()

# Practical use: always work with absolute paths
config = Path("config.json").resolve()
print(config.parent) # the current working directory, not the script's directory

# Common pattern: find path relative to the current script
from pathlib import Path

# __file__ is the current script's path (may be relative)
SCRIPT_DIR = Path(__file__).resolve().parent
DATA_DIR = SCRIPT_DIR / "data"
CONFIG_FILE = SCRIPT_DIR.parent / "config" / "settings.json"

# This works regardless of where you run the script from

Part 4 - Reading and Writing with Path

read_text() and write_text()

from pathlib import Path

config_path = Path("/etc/myapp/config.json")

# Read entire file as string - no explicit open() needed
if config_path.exists():
    content = config_path.read_text(encoding="utf-8")
    print(content)

# Write string to file - creates or overwrites
output_path = Path("/tmp/output.txt")
output_path.write_text("Hello, World!\n", encoding="utf-8")

# write_text() returns the number of characters written
result = output_path.write_text("data\n", encoding="utf-8")
print(result) # 5 - number of characters written

:::note read_text() / write_text() load the full file Like f.read(), Path.read_text() loads the entire file into memory. Use it for configuration files and small data files. For large files (logs, datasets), use open(path, ...) with iteration. :::

read_bytes() and write_bytes()

from pathlib import Path

# Read binary files
png_path = Path("/data/logo.png")
raw = png_path.read_bytes() # returns bytes
print(raw[:8]) # b'\x89PNG\r\n\x1a\n'

# Write binary data
binary_path = Path("/tmp/data.bin")
binary_path.write_bytes(b"\x00\x01\x02\x03")

Using Path with open()

from pathlib import Path

p = Path("/data/largefile.csv")

# Path objects work natively as the first argument to open()
with open(p, "r", encoding="utf-8") as f:
    for line in f:
        process(line)

# Or use the .open() method (same result):
with p.open("r", encoding="utf-8") as f:
    for line in f:
        process(line)

The .open() method on a Path object is exactly open(self, ...) - same parameters, same result. For large files, always use open(p, ...) with iteration rather than p.read_text().

Part 5 - Globbing: Finding Files by Pattern

.glob() - Match in a Directory

from pathlib import Path

data_dir = Path("/data/logs")

# Find all .log files in data_dir (not recursive)
for log_file in data_dir.glob("*.log"):
    print(log_file)
# /data/logs/app.log
# /data/logs/error.log
# /data/logs/access.log

# Find all .py files matching a pattern
src_dir = Path("/project/src")
for py_file in src_dir.glob("test_*.py"):
    print(py_file.name)
# test_models.py
# test_views.py

# Single directory wildcard:
for f in data_dir.glob("2024-*/access.log"):
    print(f)
# /data/logs/2024-01/access.log
# /data/logs/2024-02/access.log

.rglob() - Recursive Glob

from pathlib import Path

project = Path("/project")

# Find all Python files recursively
for py_file in project.rglob("*.py"):
    print(py_file)
# /project/src/models.py
# /project/src/views/home.py
# /project/tests/test_models.py
# ...

# Equivalent to glob("**/*.py"):
for py_file in project.glob("**/*.py"):
    print(py_file)

# Find all __init__.py files to discover packages
packages = list(project.rglob("__init__.py"))
print(f"Found {len(packages)} Python packages")

:::tip Glob returns lazy iterators .glob() and .rglob() return generators - they do not scan all files upfront. Wrapping in list() forces evaluation:

# Lazy - memory efficient for large trees:
for f in project.rglob("*.log"):
    process(f)

# Eager - loads all paths into memory first:
log_files = list(project.rglob("*.log"))
print(f"Found {len(log_files)} log files")

:::

Glob Patterns Reference

from pathlib import Path

p = Path("/data")

# * - match anything in one directory level
p.glob("*.txt") # all .txt in /data
p.glob("logs/app*.log") # app*.log in /data/logs

# ? - match exactly one character
p.glob("data_?.csv") # data_1.csv, data_A.csv, etc.

# [seq] - match one character from the set
p.glob("report_[0-9].pdf") # report_0.pdf through report_9.pdf

# ** - match any number of directory levels (rglob() prepends "**/" for you)
p.glob("**/*.py") # all .py files recursively

# Multiple extensions (not native glob - combine with logic):
py_files = list(p.rglob("*.py")) + list(p.rglob("*.pyx"))

Practical: Batch File Processing

from pathlib import Path
import json

def process_all_configs(config_dir):
    """Load and validate all JSON config files in a directory tree."""
    config_dir = Path(config_dir)
    errors = []
    configs = {}

    for config_file in sorted(config_dir.rglob("*.json")):
        try:
            content = config_file.read_text(encoding="utf-8")
            data = json.loads(content)
            # Use path relative to config_dir as key
            rel_path = config_file.relative_to(config_dir)
            configs[str(rel_path)] = data
            print(f"Loaded: {rel_path}")
        except json.JSONDecodeError as e:
            errors.append((config_file, str(e)))
            print(f"Invalid JSON: {config_file}: {e}")

    if errors:
        print(f"\n{len(errors)} file(s) had errors")

    return configs

configs = process_all_configs("/etc/myapp")

Part 6 - Directory Operations

mkdir() - Creating Directories

from pathlib import Path

# Create a single directory (fails if parent missing or already exists)
Path("/tmp/newdir").mkdir()

# Create with all parents (like mkdir -p):
Path("/tmp/deep/nested/dir").mkdir(parents=True, exist_ok=True)
# parents=True - create all missing parent directories
# exist_ok=True - do not raise if directory already exists

# Without exist_ok - raises if directory exists:
try:
    Path("/tmp").mkdir()
except FileExistsError:
    print("/tmp already exists")

# Practical pattern - always use both flags for idempotency:
output_dir = Path("/data/reports/2024/january")
output_dir.mkdir(parents=True, exist_ok=True)

iterdir() - List Directory Contents

from pathlib import Path

d = Path("/usr/bin")

# List all entries (not recursive - use glob for recursive)
for entry in d.iterdir():
    print(entry.name, "dir" if entry.is_dir() else "file")

# Filter: only files
files = [e for e in d.iterdir() if e.is_file()]

# Filter: only directories
subdirs = [e for e in d.iterdir() if e.is_dir()]

# Sorted listing:
for f in sorted(d.iterdir(), key=lambda p: p.name.lower()):
    print(f.name)

Renaming and Deleting Files

from pathlib import Path

# Rename a file (same as os.rename)
old = Path("/tmp/old_name.txt")
new = Path("/tmp/new_name.txt")
old.rename(new)

# Replace (same as os.replace - overwrites if target exists)
tmp = Path("/data/config.json.tmp")
target = Path("/data/config.json")
tmp.replace(target) # atomic on POSIX when both paths are on the same filesystem

# Delete a file
p = Path("/tmp/temp_file.txt")
p.unlink() # raises FileNotFoundError if missing

# Delete if exists (Python 3.8+):
p.unlink(missing_ok=True) # no error if file does not exist

# Equivalent old style:
from contextlib import suppress
with suppress(FileNotFoundError):
    p.unlink()

Deleting Directories

from pathlib import Path
import shutil

# Remove empty directory
Path("/tmp/emptydir").rmdir() # raises if not empty

# Remove directory and all contents (recursive):
shutil.rmtree("/tmp/build_artifacts") # still requires shutil

# Path.rmdir() only removes empty directories.
# For non-empty trees, shutil.rmtree() is still needed (it accepts Path objects):
build_dir = Path("/tmp/build")
if build_dir.exists():
    shutil.rmtree(build_dir)

Part 7 - Cross-Platform Paths: PurePath

PurePath classes let you manipulate path strings without any filesystem I/O. They are platform-independent - useful for testing, cross-platform path logic, and server-side path manipulation.

PurePosixPath - Unix Path Manipulation Anywhere

from pathlib import PurePosixPath

# Manipulate Unix paths on any platform (including Windows)
p = PurePosixPath("/home/alice/docs/report.pdf")
print(p.parent) # /home/alice/docs
print(p.name) # report.pdf
print(p.stem) # report
print(p.suffix) # .pdf

# Build Unix paths on Windows:
remote_path = PurePosixPath("/var/www/html") / "assets" / "style.css"
print(remote_path) # /var/www/html/assets/style.css
print(str(remote_path)) # '/var/www/html/assets/style.css'

PureWindowsPath - Windows Path Manipulation Anywhere

from pathlib import PureWindowsPath

# Manipulate Windows paths on Linux (for testing or path translation)
p = PureWindowsPath(r"C:\Users\Alice\Documents\report.docx")
print(p.drive) # C:
print(p.root) # \
print(p.parts) # ('C:\\', 'Users', 'Alice', 'Documents', 'report.docx')
print(p.name) # report.docx
print(p.parent) # C:\Users\Alice\Documents

# Windows paths are case-insensitive for comparison:
p1 = PureWindowsPath("C:/Windows/system32/notepad.exe")
p2 = PureWindowsPath("c:/WINDOWS/SYSTEM32/NOTEPAD.EXE")
print(p1 == p2) # True - case-insensitive

# Posix paths are case-sensitive:
from pathlib import PurePosixPath
p3 = PurePosixPath("/data/File.txt")
p4 = PurePosixPath("/data/file.txt")
print(p3 == p4) # False

Real-World Use: URL-to-Path Translation

from pathlib import PurePosixPath, Path

def url_path_to_local(url_path, base_dir):
    """
    Translate a URL path to a local filesystem path safely.
    Prevents path traversal attacks (../../etc/passwd).
    """
    # Parse the URL path using PurePosixPath (always Unix-style)
    url_parts = PurePosixPath(url_path)

    # Reject paths with traversal components
    if ".." in url_parts.parts:
        raise ValueError(f"Path traversal detected: {url_path}")

    # Build local path
    local_base = Path(base_dir).resolve()
    local_path = local_base

    for part in url_parts.parts:
        if part == "/":
            continue
        local_path = local_path / part

    # Verify the result is still under base_dir. Compare path components,
    # not string prefixes: "/var/www-evil" starts with "/var/www" as a string.
    resolved = local_path.resolve()
    try:
        resolved.relative_to(local_base)
    except ValueError:
        raise ValueError(f"Path escapes base directory: {local_path}") from None

    return resolved


# Safe:
p = url_path_to_local("/assets/css/style.css", "/var/www")
print(p) # /var/www/assets/css/style.css

# Unsafe - raises:
try:
    p = url_path_to_local("/../../etc/passwd", "/var/www")
except ValueError as e:
    print(e) # Path traversal detected: /../../etc/passwd

Part 8 - Real-World Patterns

Pattern 1: Project Structure Discovery

from pathlib import Path
from typing import Dict, List


def discover_project(root: Path) -> Dict[str, List[Path]]:
    """
    Discover key files and directories in a Python project.
    Returns a dict mapping category → list of paths.
    """
    root = root.resolve()

    return {
        "python_files": sorted(root.rglob("*.py")),
        "test_files": sorted(root.rglob("test_*.py")) +
                      sorted(root.rglob("*_test.py")),
        "config_files": (
            sorted(root.rglob("*.toml")) +
            sorted(root.rglob("*.yaml")) +
            sorted(root.rglob("*.yml")) +
            sorted(root.rglob("*.ini")) +
            sorted(root.rglob("*.cfg"))
        ),
        "requirements": sorted(root.rglob("requirements*.txt")),
        "notebooks": sorted(root.rglob("*.ipynb")),
        "data_files": sorted(root.rglob("*.csv")) + sorted(root.rglob("*.json")),
    }


def print_project_summary(root_path: str):
    # Resolve here as well, so relative_to() below matches the resolved
    # paths that discover_project() returns
    root = Path(root_path).resolve()
    if not root.is_dir():
        print(f"Not a directory: {root_path}")
        return

    structure = discover_project(root)

    print(f"Project: {root.name}")
    print(f"Location: {root}")
    print()

    for category, files in structure.items():
        if files:
            print(f"{category}: {len(files)} file(s)")
            for f in files[:5]: # show first 5
                print(f" {f.relative_to(root)}")
            if len(files) > 5:
                print(f" ... and {len(files) - 5} more")
            print()


print_project_summary("/Users/alice/projects/myapp")

Pattern 2: Batch File Renaming

from pathlib import Path


def normalize_filenames(directory, dry_run=True):
    """
    Normalize all filenames in a directory:
    - Lowercase
    - Replace spaces with underscores
    - Remove special characters

    dry_run=True: print what would happen without changing files.
    """
    import re
    directory = Path(directory)

    renames = []
    for f in sorted(directory.iterdir()):
        if f.is_dir():
            continue

        stem = f.stem.lower()
        stem = re.sub(r"\s+", "_", stem) # spaces to underscores
        stem = re.sub(r"[^a-z0-9_.-]", "", stem) # remove special chars
        stem = re.sub(r"_+", "_", stem) # collapse multiple underscores
        stem = stem.strip("_")

        new_name = stem + f.suffix.lower()
        new_path = f.parent / new_name

        if new_path != f:
            renames.append((f, new_path))

    if not renames:
        print("No renames needed.")
        return

    for old, new in renames:
        print(f" {old.name!r} → {new.name!r}")

    if not dry_run:
        for old, new in renames:
            old.rename(new)
        print(f"\nRenamed {len(renames)} file(s).")
    else:
        print(f"\nDry run: {len(renames)} rename(s) would occur. Pass dry_run=False to apply.")


normalize_filenames("/data/uploads", dry_run=True)

Pattern 3: Config File Location (XDG Standard)

from pathlib import Path
import os
import sys


def find_config_file(app_name: str, filename: str = "config.toml") -> Path:
    """
    Find a configuration file following platform conventions:
    - Linux/macOS: XDG_CONFIG_HOME (default: ~/.config/APP/filename)
    - Windows: APPDATA/APP/filename
    - Fallback: ~/APP/filename

    Returns the Path whether or not the file exists.
    """
    if sys.platform == "win32":
        base = Path(os.environ.get("APPDATA", Path.home()))
    else:
        xdg_config = os.environ.get("XDG_CONFIG_HOME", "")
        if xdg_config:
            base = Path(xdg_config)
        else:
            base = Path.home() / ".config"

    return base / app_name / filename


def load_or_create_config(app_name: str) -> dict:
    """Load config if it exists, or create a default one."""
    import json

    # The content is JSON, so ask for a .json filename rather than the .toml default
    config_path = find_config_file(app_name, "config.json")

    if config_path.exists():
        return json.loads(config_path.read_text(encoding="utf-8"))

    # Create default config
    default_config = {
        "version": 1,
        "debug": False,
        "log_level": "INFO",
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(
        json.dumps(default_config, indent=2) + "\n",
        encoding="utf-8"
    )
    print(f"Created default config: {config_path}")
    return default_config


config = load_or_create_config("myapp")
print(config)

Pattern 4: Smart File Organizer

from pathlib import Path
import shutil
from datetime import datetime


def organize_downloads(downloads_dir, target_dir, dry_run=True):
    """
    Organize files in downloads_dir into target_dir/YYYY-MM/extension/
    sorted by modification date.
    """
    downloads = Path(downloads_dir)
    target = Path(target_dir)

    operations = []

    for f in downloads.iterdir():
        if f.is_dir():
            continue

        # Get modification date
        mtime = f.stat().st_mtime
        date = datetime.fromtimestamp(mtime)
        month_dir = f"{date.year}-{date.month:02d}"

        # Get extension (without the dot, lowercase)
        ext = f.suffix.lstrip(".").lower() or "no_extension"

        dest_dir = target / month_dir / ext
        dest_file = dest_dir / f.name

        # Handle naming conflicts with files already in the target tree
        counter = 1
        while dest_file.exists():
            dest_file = dest_dir / f"{f.stem}_{counter}{f.suffix}"
            counter += 1

        operations.append((f, dest_file))

    print(f"Would organize {len(operations)} files:")
    for src, dst in operations[:10]:
        print(f" {src.name} → {dst.relative_to(target)}")
    if len(operations) > 10:
        print(f" ... and {len(operations) - 10} more")

    if not dry_run:
        for src, dst in operations:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
        print(f"\nOrganized {len(operations)} files.")


organize_downloads("/home/alice/Downloads", "/home/alice/Sorted", dry_run=True)

Interview Questions

Q1: Why is pathlib.Path preferred over os.path string manipulation?

Answer: Several reasons: (1) Readability - Path("/data") / "reports" / "2024.csv" reads like a path; os.path.join("/data", "reports", "2024.csv") reads like a function call. (2) Type safety - the / operator requires at least one operand to be a Path, so "a" / "b" raises TypeError and joining never degrades into plain string concatenation. (3) Methods over global functions - attributes and methods like .stem, .suffix, .parent, .exists(), .is_file() are on the object itself, not spread across os.path. (4) Cross-platform - Path uses \ on Windows and / on Unix automatically; string paths require manual handling. (5) Richer API - .read_text(), .write_text(), .mkdir(parents=True, exist_ok=True), .rglob() are on the object; os.path requires open(), os.makedirs(), glob.glob() separately.
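
A side-by-side sketch of points (1) and (3), deriving a sibling file name both ways. The file name is illustrative, and posixpath with PurePosixPath are used here only so the result is the same on every platform:

```python
import posixpath
from pathlib import PurePosixPath

raw = "/data/reports/sales_2024.csv"

# String style: several module-level functions, with strings passed between them
directory = posixpath.dirname(raw)
filename = posixpath.basename(raw)
stem, ext = posixpath.splitext(filename)
sibling = posixpath.join(directory, stem + ".json")

# Object style: the same facts are attributes and methods of one object
p = PurePosixPath(raw)
sibling_p = p.with_suffix(".json")

print(sibling)    # /data/reports/sales_2024.json
print(sibling_p)  # /data/reports/sales_2024.json
```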

Q2: What is the difference between .resolve() and .absolute()?

Answer: Both convert a relative path to an absolute path. The key difference is normalization: .resolve() resolves all symlinks, .. components, and . components - it returns the canonical path as the filesystem sees it. If the path does not exist, resolve() on Python 3.6+ (where strict defaults to False) still resolves as much as it can. .absolute() (documented since Python 3.11, although the method existed earlier) anchors the path at the current directory but does NOT resolve symlinks or normalize .. components. For most use cases, .resolve() is what you want - it gives you the definitive path. Use .absolute() when you want to avoid touching the filesystem during path resolution.
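
The difference is visible even without symlinks. A minimal sketch, assuming a hypothetical relative path (some_dir does not need to exist), that compares how each method handles a .. component:

```python
from pathlib import Path

# Hypothetical relative path; neither some_dir nor config.json has to exist
p = Path("some_dir") / ".." / "config.json"

abs_p = p.absolute()  # prepends the current directory, keeps ".." as written
res_p = p.resolve()   # also collapses ".." and follows any symlinks

print(".." in abs_p.parts)  # True
print(".." in res_p.parts)  # False
print(res_p.name)           # config.json
```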

Q3: What is the difference between Path.glob() and Path.rglob()?

Answer: .glob(pattern) searches only within the immediate directory, with optional subdirectory matching using **/ in the pattern. For example, path.glob("*.py") finds .py files only in path, and path.glob("**/*.py") finds them recursively. .rglob(pattern) is shorthand for path.glob("**/" + pattern) - it always searches recursively through all subdirectories. Both return lazy iterators (generators), so they do not load all results into memory at once. Use .rglob("*.py") for recursive searches, .glob("*.py") for single-directory searches.
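
A small throwaway tree makes the distinction concrete. This is a sketch with made-up file names; names() is a hypothetical helper that only normalizes the results for printing:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg" / "sub").mkdir(parents=True)
    for rel in ("top.py", "pkg/mid.py", "pkg/sub/deep.py", "pkg/notes.txt"):
        (root / rel).write_text("", encoding="utf-8")

    def names(paths):
        """Hypothetical helper: sorted, root-relative, forward-slash names."""
        return sorted(p.relative_to(root).as_posix() for p in paths)

    direct = names(root.glob("*.py"))       # this directory only
    recursive = names(root.rglob("*.py"))   # this directory and everything below
    explicit = names(root.glob("**/*.py"))  # the spelled-out form of rglob

print(direct)     # ['top.py']
print(recursive)  # ['pkg/mid.py', 'pkg/sub/deep.py', 'top.py']
print(explicit)   # ['pkg/mid.py', 'pkg/sub/deep.py', 'top.py']
```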

Q4: How does pathlib handle cross-platform path manipulation? What is the difference between Path, PurePath, PurePosixPath, and PureWindowsPath?

Answer: Path is the concrete class for actual filesystem operations - it automatically becomes PosixPath on Linux/macOS or WindowsPath on Windows. PurePath classes do path string manipulation without any filesystem access. PurePosixPath always uses Unix path semantics (forward slashes, case-sensitive) regardless of the OS you run on. PureWindowsPath always uses Windows path semantics (backslashes, drive letters, case-insensitive) regardless of OS. Use PurePosixPath when you need to manipulate server-side Unix paths on a Windows dev machine, parse URL paths, or test path logic without touching the filesystem. Use PureWindowsPath to manipulate Windows UNC paths or share paths on a Linux server.
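
To see the flavours side by side, the same raw string can be parsed under both sets of rules. A short sketch (the Windows path is illustrative); it also checks that the pure classes carry no I/O methods:

```python
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath

raw = r"C:\Users\Alice\report.docx"

as_windows = PureWindowsPath(raw)  # backslashes are separators, C: is a drive
as_posix = PurePosixPath(raw)      # backslashes are ordinary filename characters

print(as_windows.parts)  # ('C:\\', 'Users', 'Alice', 'report.docx')
print(as_posix.parts)    # ('C:\\Users\\Alice\\report.docx',)
print(as_windows.name)   # report.docx

# Path adds filesystem methods on top of the pure API
print(issubclass(Path, PurePath))        # True
print(hasattr(PurePosixPath, "exists"))  # False
print(hasattr(Path, "exists"))           # True
```

Picking the flavour explicitly matters whenever the path being parsed comes from a different system than the one running the code.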

Q5: How do you safely read a large directory tree without running out of memory?

Answer: Use .rglob() or .glob() as lazy iterators - do not wrap them in list() unless you need random access to all paths. The iterator yields one path at a time:

for py_file in project_dir.rglob("*.py"):
    analyze(py_file) # only one Path in memory at a time

This works for directories with millions of files. If you need sorting (which requires all paths), be aware that sorted(path.rglob("*.py")) will load all paths into memory first. For very large trees, process in streaming fashion and avoid sorting unless necessary.
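
When only the top few results are needed, a full sort can often be avoided. One possible approach, sketched here with a hypothetical largest_files helper, feeds the lazy iterator to heapq.nlargest, which keeps at most n candidates in memory while it consumes the stream:

```python
import heapq
import tempfile
from pathlib import Path


def largest_files(root, n=3):
    """Hypothetical helper: the n largest files under root as (size, path) pairs."""
    sized = (
        (p.stat().st_size, str(p))  # generator: nothing is scanned until consumed
        for p in Path(root).rglob("*")
        if p.is_file()
    )
    return heapq.nlargest(n, sized)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, size in [("a.bin", 10), ("b.bin", 300), ("c.bin", 20), ("d.bin", 5)]:
        (root / name).write_bytes(b"x" * size)
    top_two = [(size, Path(path).name) for size, path in largest_files(root, n=2)]

print(top_two)  # [(300, 'b.bin'), (20, 'c.bin')]
```

This sketch assumes files do not disappear between the is_file() check and the stat() call; a long-running scan over a live tree would need to catch OSError there.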

Q6: You have a list of file paths as strings from a configuration file. How would you safely join them with a base directory using pathlib?

Answer:

from pathlib import Path

base = Path("/data/projects")
user_paths = ["../../../etc/passwd", "reports/2024.csv", "config.json"]

safe_paths = []
for user_path in user_paths:
    # Construct the full path
    full = (base / user_path).resolve()

    # Verify it is still under base (path traversal protection)
    base_resolved = base.resolve()
    try:
        full.relative_to(base_resolved)
        safe_paths.append(full)
    except ValueError:
        print(f"Security: rejected path traversal attempt: {user_path!r}")

# ../../../etc/passwd is rejected, reports/2024.csv and config.json are accepted

The key is .resolve() to canonicalize the path, then .relative_to() to verify the result is still under the expected base directory. relative_to() raises ValueError if the path is not under the base, making it a clean check for path traversal attacks.
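
On Python 3.9+, is_relative_to() expresses the same containment check as a boolean. The sketch below uses illustrative paths and PurePosixPath for platform-independent output; it also shows why a plain string-prefix comparison is not a safe substitute:

```python
from pathlib import PurePosixPath

base = PurePosixPath("/var/www")
inside = PurePosixPath("/var/www/assets/app.js")
sibling = PurePosixPath("/var/www-evil/index.html")  # shares a string prefix only

# String prefix check: wrongly accepts the sibling directory
print(str(sibling).startswith(str(base)))  # True

# Component-wise check (Python 3.9+): compares whole path parts
print(sibling.is_relative_to(base))  # False
print(inside.is_relative_to(base))   # True
```

As with relative_to(), the comparison is purely lexical, so call .resolve() on real filesystem paths first.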

Practice Challenges

Beginner - Project File Counter

Write a function count_files(directory) that returns a dictionary mapping file extension to count of files with that extension in the directory tree.

# Example:
# count_files("/project") → {'.py': 42, '.json': 8, '.md': 12, '': 3}

Solution
from pathlib import Path
from collections import defaultdict


def count_files(directory):
    """
    Count files by extension in a directory tree.
    Returns a dict mapping extension (including dot) to count.
    Files without extension use empty string as key.
    """
    directory = Path(directory)
    if not directory.is_dir():
        raise NotADirectoryError(f"Not a directory: {directory}")

    counts = defaultdict(int)
    for path in directory.rglob("*"):
        if path.is_file():
            counts[path.suffix] += 1

    # Return as regular dict, sorted by count descending
    return dict(sorted(counts.items(), key=lambda item: -item[1]))


# Test
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)

    # Create test structure
    (root / "src").mkdir()
    (root / "src" / "models").mkdir()
    (root / "tests").mkdir()
    (root / "docs").mkdir()

    for i in range(5):
        (root / "src" / f"module_{i}.py").write_text("# code", encoding="utf-8")
    (root / "src" / "models" / "user.py").write_text("# model", encoding="utf-8")
    for i in range(3):
        (root / "tests" / f"test_{i}.py").write_text("# test", encoding="utf-8")
    (root / "docs" / "readme.md").write_text("# docs", encoding="utf-8")
    (root / "config.json").write_text("{}", encoding="utf-8")
    (root / "Makefile").write_text("all:", encoding="utf-8") # no extension

    result = count_files(tmpdir)
    print(result)
    # {'.py': 9, '.md': 1, '.json': 1, '': 1} (order of the tied entries may vary)

    for ext, count in result.items():
        label = ext if ext else "(no extension)"
        print(f" {label}: {count}")

Intermediate - Find Duplicate Files

Write a function find_duplicates(directory) that returns a dictionary mapping file content hash to a list of paths that have identical content.

Solution
from pathlib import Path
from collections import defaultdict
import hashlib


def file_hash(path: Path, chunk_size: int = 65536) -> str:
    """Compute SHA-256 hash of a file without loading it all into memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(directory):
    """
    Find duplicate files in a directory tree.
    Returns a dict: {hash: [Path, Path, ...]} for hashes with 2+ files.
    Skips files that cannot be read.
    """
    directory = Path(directory)
    hash_to_paths = defaultdict(list)

    # First pass: group files by size (cheap filter - different sizes can't be equal)
    size_to_paths = defaultdict(list)
    for path in directory.rglob("*"):
        if not path.is_file():
            continue
        try:
            size = path.stat().st_size
            size_to_paths[size].append(path)
        except OSError:
            continue

    # Second pass: hash only files that share a size
    for size, paths in size_to_paths.items():
        if len(paths) < 2:
            continue # unique size = definitely unique content
        for path in paths:
            try:
                h = file_hash(path)
                hash_to_paths[h].append(path)
            except OSError: # PermissionError is a subclass of OSError
                continue

    # Return only hashes with duplicates
    return {h: paths for h, paths in hash_to_paths.items() if len(paths) > 1}


# Test
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    (root / "a").mkdir()
    (root / "b").mkdir()

    # Create some duplicates
    content_1 = b"Hello, World!"
    content_2 = b"Different content"

    (root / "a" / "file1.txt").write_bytes(content_1)
    (root / "b" / "file2.txt").write_bytes(content_1) # duplicate of file1
    (root / "a" / "file3.txt").write_bytes(content_2)
    (root / "b" / "file4.txt").write_bytes(b"Unique content")

    dupes = find_duplicates(tmpdir)

    if dupes:
        print(f"Found {len(dupes)} group(s) of duplicate files:")
        for h, paths in dupes.items():
            print(f"\n Hash: {h[:16]}...")
            for p in paths:
                print(f" {p.relative_to(root)} ({p.stat().st_size} bytes)")
    else:
        print("No duplicates found.")

# Found 1 group(s) of duplicate files:
#
# Hash: dffd6021bb2bd5b0...
# a/file1.txt (13 bytes)
# b/file2.txt (13 bytes)
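
Once the groups are known, a common follow-up is estimating how much space the duplicates waste. The sketch below is one possible way to do that, not part of the exercise: reclaimable_bytes is a hypothetical helper, and the dictionary it receives is a hand-built stand-in for a find_duplicates() result with a placeholder key instead of a real hash.

```python
from pathlib import Path
import tempfile


def reclaimable_bytes(dupes):
    """Bytes freed if every group kept one file and dropped the rest."""
    total = 0
    for paths in dupes.values():
        # Files in one group have identical content, hence identical size.
        size = paths[0].stat().st_size
        total += size * (len(paths) - 1)
    return total


with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    copies = [root / f"copy{i}.txt" for i in range(3)]
    for path in copies:
        path.write_bytes(b"Hello, World!")  # 13 bytes each

    # Hand-built stand-in for a find_duplicates() result; the key is a placeholder.
    fake_result = {"placeholder-hash": copies}
    demo_total = reclaimable_bytes(fake_result)
    print(demo_total)

# 26
```

Three 13-byte copies leave two redundant files, hence 26 bytes.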

Advanced - File Watcher with Change Detection

Build a FileWatcher class that monitors a directory for changes (new files, deleted files, modified files) between two snapshots, using pathlib for all filesystem operations.

Solution
from pathlib import Path
from dataclasses import dataclass, field
from typing import Dict, List
import hashlib
import time


@dataclass
class FileSnapshot:
    """A snapshot of a directory's file state."""
    root: Path
    files: Dict[str, tuple] = field(default_factory=dict)
    # {relative_path_str: (size, mtime, hash_first_8kb)}

    @classmethod
    def take(cls, directory: Path):
        """Take a snapshot of all files in a directory tree."""
        snap = cls(root=directory.resolve())
        for path in directory.rglob("*"):
            if not path.is_file():
                continue
            try:
                stat = path.stat()
                # Read only the first 8KB for a quick content fingerprint
                # (read_bytes() would load the whole file before slicing)
                with path.open("rb") as f:
                    first_bytes = f.read(8192)
                quick_hash = hashlib.md5(first_bytes).hexdigest()
                rel = str(path.relative_to(directory))
                snap.files[rel] = (stat.st_size, stat.st_mtime, quick_hash)
            except OSError:  # also covers PermissionError
                continue
        return snap


@dataclass
class DirectoryDiff:
    """The differences between two directory snapshots."""
    added: List[str] = field(default_factory=list)
    deleted: List[str] = field(default_factory=list)
    modified: List[str] = field(default_factory=list)

    @property
    def has_changes(self):
        return bool(self.added or self.deleted or self.modified)

    def report(self):
        if not self.has_changes:
            print("No changes detected.")
            return
        if self.added:
            print(f"Added ({len(self.added)}):")
            for f in sorted(self.added):
                print(f" + {f}")
        if self.deleted:
            print(f"Deleted ({len(self.deleted)}):")
            for f in sorted(self.deleted):
                print(f" - {f}")
        if self.modified:
            print(f"Modified ({len(self.modified)}):")
            for f in sorted(self.modified):
                print(f" ~ {f}")


def diff_snapshots(before: FileSnapshot, after: FileSnapshot) -> DirectoryDiff:
    """Compare two snapshots and return the differences."""
    before_set = set(before.files.keys())
    after_set = set(after.files.keys())

    added = sorted(after_set - before_set)
    deleted = sorted(before_set - after_set)
    modified = sorted(
        f for f in before_set & after_set
        if before.files[f] != after.files[f]
    )

    return DirectoryDiff(added=added, deleted=deleted, modified=modified)


class FileWatcher:
    """Watch a directory for changes between poll intervals."""

    def __init__(self, directory):
        self.directory = Path(directory).resolve()
        self._last_snapshot = None

    def reset(self):
        """Take the initial snapshot."""
        self._last_snapshot = FileSnapshot.take(self.directory)
        print(f"Watching {self.directory} ({len(self._last_snapshot.files)} files)")
        return self

    def poll(self) -> DirectoryDiff:
        """Check for changes since last snapshot."""
        if self._last_snapshot is None:
            raise RuntimeError("Call reset() before poll()")
        new_snapshot = FileSnapshot.take(self.directory)
        diff = diff_snapshots(self._last_snapshot, new_snapshot)
        self._last_snapshot = new_snapshot
        return diff


# Demo
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    (root / "src").mkdir()

    # Create initial files
    (root / "src" / "main.py").write_text("print('hello')", encoding="utf-8")
    (root / "src" / "utils.py").write_text("def add(a, b): return a+b", encoding="utf-8")
    (root / "README.md").write_text("# My Project", encoding="utf-8")

    # Take initial snapshot
    watcher = FileWatcher(tmpdir).reset()

    # Simulate changes
    time.sleep(0.01)
    (root / "src" / "main.py").write_text("print('changed')", encoding="utf-8")  # modify
    (root / "src" / "new_module.py").write_text("# new", encoding="utf-8")  # add
    (root / "README.md").unlink()  # delete

    # Poll for changes
    diff = watcher.poll()
    diff.report()

# Watching /tmp/.../ (3 files)
# Added (1):
# + src/new_module.py
# Deleted (1):
# - README.md
# Modified (1):
# ~ src/main.py

Quick Reference

| Operation | pathlib | os.path equivalent |
| --- | --- | --- |
| Create path | Path("/data/file.txt") | "/data/file.txt" |
| Join paths | Path("/data") / "file.txt" | os.path.join("/data", "file.txt") |
| Get filename | p.name | os.path.basename(p) |
| Get stem | p.stem | os.path.splitext(os.path.basename(p))[0] |
| Get extension | p.suffix | os.path.splitext(p)[1] |
| Get parent | p.parent | os.path.dirname(p) |
| Make absolute (resolving symlinks) | p.resolve() | os.path.realpath(p) |
| Check exists | p.exists() | os.path.exists(p) |
| Check is file | p.is_file() | os.path.isfile(p) |
| Check is dir | p.is_dir() | os.path.isdir(p) |
| File size | p.stat().st_size | os.path.getsize(p) |
| Read text | p.read_text(encoding="utf-8") | open(p).read() |
| Write text | p.write_text(s, encoding="utf-8") | open(p, 'w').write(s) |
| Create dir | p.mkdir(parents=True, exist_ok=True) | os.makedirs(p, exist_ok=True) |
| List dir | p.iterdir() | os.listdir(p) |
| Find files | p.glob("*.py") | glob.glob(str(p / "*.py")) |
| Find recursive | p.rglob("*.py") | glob.glob(str(p / "**/*.py"), recursive=True) |
| Delete file | p.unlink(missing_ok=True) (Python 3.8+) | os.remove(p) + try/except |
| Rename | p.rename(new) | os.rename(p, new) |
| Change suffix | p.with_suffix(".txt") | os.path.splitext(p)[0] + ".txt" |
| Home directory | Path.home() | os.path.expanduser("~") |
| Current dir | Path.cwd() | os.getcwd() |
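
Several of these equivalences can be checked without touching the filesystem. The following is a minimal sketch under a few assumptions: the path /data/reports/summary.tar.gz is hypothetical, PurePosixPath stands in for Path, and posixpath (the POSIX flavour of os.path) is used so the comparison gives the same result on any operating system.

```python
import posixpath  # POSIX flavour of os.path, so results match on any OS
from pathlib import PurePosixPath

# Hypothetical path, used only for illustration; nothing is read from disk.
raw = "/data/reports/summary.tar.gz"
p = PurePosixPath(raw)

# Each entry pairs the pathlib result with its os.path-style counterpart.
rows = {
    "join": (str(PurePosixPath("/data") / "file.txt"),
             posixpath.join("/data", "file.txt")),
    "name": (p.name, posixpath.basename(raw)),
    "stem": (p.stem, posixpath.splitext(posixpath.basename(raw))[0]),
    "suffix": (p.suffix, posixpath.splitext(raw)[1]),
    "parent": (str(p.parent), posixpath.dirname(raw)),
    "with_suffix": (str(p.with_suffix(".txt")),
                    posixpath.splitext(raw)[0] + ".txt"),
}

for operation, (from_pathlib, from_os_path) in rows.items():
    assert from_pathlib == from_os_path, operation
    print(f"{operation:12} {from_pathlib}")
```

Note that .suffix and os.path.splitext both treat only the last extension (.gz) as the suffix, so the stem of this file is summary.tar.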

Key Takeaways

  • pathlib.Path treats filesystem paths as objects with attributes and methods, not as raw strings. This eliminates entire classes of bugs from string concatenation and makes path code more readable.
  • The / operator creates new Path objects by composing segments - it is type-safe, readable, and cross-platform (uses the correct separator automatically).
  • Path attributes (.name, .stem, .suffix, .parent, .parts) give you immediate access to components without function calls. Modification methods (.with_name(), .with_suffix(), and .with_stem(), the last of which requires Python 3.9+) return new Path objects.
  • .glob() and .rglob() return lazy generators - they scale to directories with millions of files without loading all paths into memory. Always use for f in path.rglob("*.py"): rather than list(path.rglob(...)) unless you need sorting or counting.
  • Path.read_text() and Path.write_text() are convenient for small files. For large files, use open(path, ...) with iteration.
  • PurePosixPath and PureWindowsPath let you manipulate path strings with platform-specific semantics without any filesystem I/O - useful for testing, cross-platform code, and server-side path manipulation.
  • Always call .resolve() on paths built from user input, then verify the result with .relative_to(base), which raises ValueError if the path falls outside base. Together these two steps prevent path traversal attacks.
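
The last takeaway can be sketched as a small helper. This is an illustrative outline rather than a complete security layer: resolve_inside is a hypothetical name, and the demo paths are made up. It resolves both the base and the candidate before comparing them, so ".." segments and symlinks are collapsed first.

```python
from pathlib import Path
import tempfile


def resolve_inside(base, user_path):
    """Return user_path resolved under base; raise ValueError if it escapes."""
    base = Path(base).resolve()
    candidate = (base / user_path).resolve()
    candidate.relative_to(base)  # raises ValueError when candidate is outside base
    return candidate


with tempfile.TemporaryDirectory() as tmpdir:
    # A path that stays under the base directory is accepted.
    inside = resolve_inside(tmpdir, "uploads/report.txt")
    print(inside.name)

    # A path that climbs out of the base directory is rejected.
    try:
        resolve_inside(tmpdir, "../../etc/passwd")
    except ValueError:
        escaped = True
    else:
        escaped = False
    print(escaped)
```

An absolute user_path is rejected as well, because the / operator discards the left-hand side when the right-hand side is absolute, and the resolved result then fails the relative_to check.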
© 2026 EngineersOfAI. All rights reserved.