pathlib - Modern Path Manipulation in Python

Reading time: ~16 minutes | Level: Foundation → Engineering

Here is code that most Python tutorials still teach:

import os

# Old style - fragile, verbose, error-prone
base = "/data/projects"
project = "myapp"
config_file = os.path.join(base, project, "config", "settings.json")
config_dir = os.path.dirname(config_file)

if not os.path.exists(config_dir):
    os.makedirs(config_dir)

if os.path.isfile(config_file):
    with open(config_file, "r") as f:
        content = f.read()

And here is the same code with pathlib, the modern approach added in Python 3.4:

from pathlib import Path

# Modern style - reads like what it is
config_file = Path("/data/projects") / "myapp" / "config" / "settings.json"
config_file.parent.mkdir(parents=True, exist_ok=True)

if config_file.is_file():
    content = config_file.read_text(encoding="utf-8")

The pathlib version is shorter, reads like English, is cross-platform without extra effort, and does not require juggling strings. This page covers pathlib at engineering depth - from the object model to glob patterns to cross-platform path manipulation.

What You Will Learn

Why pathlib.Path beats os.path strings - the OOP model, operator overloading, and cross-platform behavior
Creating paths: Path('.'), Path.home(), Path.cwd(), absolute paths
The / operator for path composition and why it is type-safe
All path attributes: .name, .stem, .suffix, .suffixes, .parent, .parents, .parts
Path inspection methods: .exists(), .is_file(), .is_dir(), .stat(), .resolve(), .absolute()
Reading and writing files directly through Path objects
Globbing: .glob(), .rglob() - lazy iterators for batch file processing
Directory creation, renaming, deletion
Cross-platform paths: PurePosixPath, PureWindowsPath for path manipulation without I/O
Real-world patterns: project structure discovery, batch file processing, config file location

Prerequisites

Python 3.6+ (pathlib was added in Python 3.4; since 3.6, open() and the os functions accept Path objects directly)
Familiarity with open() and basic file I/O
Understanding of the filesystem concept: files, directories, and paths

Mental Model: Path as an Object, Not a String

Old approach - path as a string:

"/data/projects/myapp/config.json" - just a string, no knowledge of filesystem structure
Operations require function calls: os.path.join(), os.path.basename()
Easy to create invalid paths: "/data/projects" + "myapp" = "/data/projectsmyapp"

New approach - path as an object (pathlib.Path):

Knows its own structure: .name, .parent, .suffix
Operations are methods and operators: path / "subdir", path.parent
Type-safe composition: Path / str = Path (never concatenates incorrectly)
OS-aware: uses / on Unix, \ on Windows automatically

For Path("/data/projects/myapp/config.json"):

Attribute	Value
`.parts`	`('/', 'data', 'projects', 'myapp', 'config.json')`
`.parent`	`Path('/data/projects/myapp')`
`.parents`	`[/data/projects/myapp, /data/projects, /data, /]`
`.name`	`'config.json'`
`.stem`	`'config'`
`.suffix`	`'.json'`
`.suffixes`	`['.json']`
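
The values in the table can be reproduced on any operating system with PurePosixPath, which exposes the same attributes as Path but never touches the filesystem. A minimal sketch, using the same illustrative path as the table:

```python
from pathlib import PurePosixPath

# PurePosixPath gives identical results on Linux, macOS, and Windows
p = PurePosixPath("/data/projects/myapp/config.json")

attributes = {
    "parts": p.parts,
    "parent": str(p.parent),
    "parents": [str(a) for a in p.parents],
    "name": p.name,
    "stem": p.stem,
    "suffix": p.suffix,
    "suffixes": p.suffixes,
}

for key, value in attributes.items():
    print(f".{key:<9}{value!r}")
```

Swapping PurePosixPath for Path gives the same output on Linux and macOS; on Windows the separators and the root would differ.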

Part 1 - Creating Path Objects

Basic Construction

from pathlib import Path

# From a string
p = Path("/usr/local/bin/python3")
print(type(p))    # <class 'pathlib.PosixPath'>  (on Linux/macOS)
                  # <class 'pathlib.WindowsPath'> (on Windows)

# Relative path
p = Path("data/config.json")
print(p)          # data/config.json

# Current directory
here = Path(".")   # relative path that refers to the current directory
cwd = Path.cwd()   # absolute path to the current directory

# Home directory
home = Path.home()   # /Users/alice (macOS), /home/alice (Linux), C:\Users\alice (Win)

# Root
root = Path("/")   # Unix root

The `/` Operator - Type-Safe Path Composition

base = Path("/data/projects")
project_dir = base / "myapp"              # Path('/data/projects/myapp')
config_dir = project_dir / "config"      # Path('/data/projects/myapp/config')
settings = config_dir / "settings.json"  # Path('/data/projects/myapp/config/settings.json')

# Chain it:
settings = Path("/data/projects") / "myapp" / "config" / "settings.json"

# Mix str and Path - at least one operand must be a Path:
subdir = Path("/data") / "logs" / "2024"
print(subdir)   # /data/logs/2024

# WRONG - str / str does not work:
# result = "/data" / "logs"   # TypeError: unsupported operand type(s) for /

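:::tip Why / is better than os.path.join()
The / operator keeps composition on Path objects, so a missing or extra level is easier to spot than in a long os.path.join() call, and string-concatenation bugs like "/data/projects" + "myapp" cannot happen. One behavior is shared, though: os.path.join("/data", "/config") returns "/config", and Path("/data") / "/config" likewise returns Path("/config"). An absolute right-hand segment silently discards everything before it and no error is raised, so validate untrusted segments before joining them.
:::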
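
It is worth checking how both APIs treat an absolute right-hand segment, since that decides whether untrusted input can escape a base directory. The sketch below uses posixpath and PurePosixPath so that it behaves the same on every platform; join_relative is a hypothetical helper, not part of pathlib.

```python
import posixpath
from pathlib import PurePosixPath

# posixpath is the Unix flavour of os.path, importable on any platform
joined_str = posixpath.join("/data", "/config")
joined_path = PurePosixPath("/data") / "/config"

print(joined_str)    # /config
print(joined_path)   # /config


def join_relative(base: PurePosixPath, segment: str) -> PurePosixPath:
    """Hypothetical guard: refuse absolute segments instead of discarding base."""
    if PurePosixPath(segment).is_absolute():
        raise ValueError(f"absolute segment not allowed: {segment!r}")
    return base / segment


print(join_relative(PurePosixPath("/data"), "config"))   # /data/config
```

Neither API raises for the absolute segment, so a guard like this (or a containment check after joining) has to be added explicitly wherever segments come from user input.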

Path from Parts

# pathlib.Path accepts multiple parts in the constructor
p = Path("/usr", "local", "bin", "python3")
print(p)   # /usr/local/bin/python3

# Equivalent to:
p = Path("/usr") / "local" / "bin" / "python3"

# Absolute path from parts:
p = Path.home() / ".config" / "myapp" / "config.toml"
print(p)   # /home/alice/.config/myapp/config.toml

Part 2 - Path Attributes

`.name`, `.stem`, `.suffix`, `.suffixes`

p = Path("/data/reports/sales_2024.csv.gz")

print(p.name)      # 'sales_2024.csv.gz'  - full filename with all extensions
print(p.stem)      # 'sales_2024.csv'     - filename without last extension
print(p.suffix)    # '.gz'               - last extension only
print(p.suffixes)  # ['.csv', '.gz']     - all extensions

p2 = Path("/data/config.json")
print(p2.name)     # 'config.json'
print(p2.stem)     # 'config'
print(p2.suffix)   # '.json'
print(p2.suffixes) # ['.json']

# Paths without extension:
p3 = Path("/usr/bin/python3")
print(p3.stem)     # 'python3'
print(p3.suffix)   # ''   - empty string
print(p3.suffixes) # []   - empty list

`.parent` and `.parents`

p = Path("/data/projects/myapp/config/settings.json")

print(p.parent)           # /data/projects/myapp/config
print(p.parent.parent)    # /data/projects/myapp
print(p.parent.parent.parent)   # /data/projects

# .parents is a sequence of all ancestor paths
for ancestor in p.parents:
    print(ancestor)
# /data/projects/myapp/config
# /data/projects/myapp
# /data/projects
# /data
# /

# Index into parents:
print(p.parents[0])   # /data/projects/myapp/config   (immediate parent)
print(p.parents[2])   # /data/projects                (grandparent.parent)
print(p.parents[-1])  # /                              (root, Python 3.10+)

`.parts`

p = Path("/data/projects/myapp/config.json")

print(p.parts)
# ('/', 'data', 'projects', 'myapp', 'config.json')

# Useful for cross-platform path decomposition
p_win = Path("C:/Users/Alice/Documents/file.txt")
print(p_win.parts)
# ('C:\\', 'Users', 'Alice', 'Documents', 'file.txt')  (on Windows)

# Reconstruct from parts:
reconstructed = Path(*p.parts)
print(reconstructed)   # /data/projects/myapp/config.json

Modifying Path Components

p = Path("/data/reports/sales_2024.csv")

# Change the filename
new_name = p.with_name("purchases_2024.csv")
print(new_name)   # /data/reports/purchases_2024.csv

# Change just the extension
new_ext = p.with_suffix(".xlsx")
print(new_ext)    # /data/reports/sales_2024.xlsx

# Change both
new = p.with_name("data").with_suffix(".json")
print(new)        # /data/reports/data.json

# with_stem() - Python 3.9+
new_stem = p.with_stem("revenue_2024")
print(new_stem)   # /data/reports/revenue_2024.csv
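
with_suffix() only touches the last extension, which matters for names like site.tar.gz. The sketch below shows one way to drop every extension; strip_all_suffixes is a hypothetical helper name, and it assumes every dot-separated tail really is an extension.

```python
from pathlib import PurePath, PurePosixPath


def strip_all_suffixes(path: PurePath) -> PurePath:
    """Hypothetical helper: drop every extension, not just the last one."""
    while path.suffix:
        path = path.with_suffix("")   # an empty suffix removes the last extension
    return path


archive = PurePosixPath("/data/backups/site.tar.gz")
print(archive.with_suffix(""))       # /data/backups/site.tar
print(strip_all_suffixes(archive))   # /data/backups/site
```

Be careful with version-like names: a file called release-1.2.3 would be reduced to release-1, because pathlib treats .2 and .3 as suffixes.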

Part 3 - Path Inspection Methods

Existence and Type Checks

from pathlib import Path

p = Path("/etc/hosts")

# Does it exist at all?
print(p.exists())     # True (if /etc/hosts exists)

# What type is it?
print(p.is_file())    # True
print(p.is_dir())     # False
print(p.is_symlink()) # True if it's a symbolic link
print(p.is_absolute())# True (starts from root)

# Check relative path
rel = Path("data/config.json")
print(rel.is_absolute())   # False

`.stat()` - File Metadata

p = Path("/etc/hosts")
stat = p.stat()

print(stat.st_size)    # file size in bytes: 285
print(stat.st_mtime)   # modification time: 1705312321.4 (Unix timestamp)
print(stat.st_ctime)   # metadata change time (Unix) / creation time (Windows)
print(stat.st_mode)    # file permissions: 33188 (0o100644)

# Human-readable modification time:
from datetime import datetime
mtime = datetime.fromtimestamp(stat.st_mtime)
print(mtime)   # 2024-01-15 10:32:01.423456

# File size in human units:
def human_size(size_bytes):
    for unit in ["B", "KB", "MB", "GB", "TB"]:
        if size_bytes < 1024:
            return f"{size_bytes:.1f} {unit}"
        size_bytes /= 1024
    return f"{size_bytes:.1f} PB"   # anything at or above 1024 TB

p = Path("/usr/bin/python3")
if p.exists():
    print(human_size(p.stat().st_size))   # e.g. "4.2 MB"
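
The raw st_mode integer packs the file type and the permission bits together. The standard stat module can decode it; the sketch below uses the example value 33188 from above rather than reading a real file.

```python
import stat

mode = 0o100644   # the st_mode value shown above (33188 in decimal)

print(stat.filemode(mode))       # -rw-r--r--
print(stat.S_ISREG(mode))        # True  - regular file
print(stat.S_ISDIR(mode))        # False - not a directory
print(oct(stat.S_IMODE(mode)))   # 0o644 - permission bits only
```

With a real file, pass path.stat().st_mode instead of the literal.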

`.resolve()` and `.absolute()`

from pathlib import Path

# .resolve() - make absolute AND resolve symlinks, .. and .
p = Path("../../data/../config.json")
resolved = p.resolve()   # absolute path with all .. resolved
print(resolved)   # /absolute/path/to/config.json

# .absolute() - make absolute WITHOUT resolving symlinks or collapsing ..
# (available since Python 3.4, but only documented since 3.11)
print(p.absolute())   # /current/dir/../../data/../config.json

# Practical use: always work with absolute paths
config = Path("config.json").resolve()
print(config.parent)   # the current working directory, not the script's directory

# Common pattern: find path relative to the current script
from pathlib import Path

# __file__ is the current script's path (may be relative)
SCRIPT_DIR = Path(__file__).resolve().parent
DATA_DIR = SCRIPT_DIR / "data"
CONFIG_FILE = SCRIPT_DIR.parent / "config" / "settings.json"

# This works regardless of where you run the script from
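
The difference between the two methods is easiest to see on a path that contains a .. component. A minimal sketch; the file names are illustrative and do not need to exist:

```python
from pathlib import Path

relative = Path("reports") / ".." / "config.json"

made_absolute = relative.absolute()   # prepends the cwd, keeps ".."
canonical = relative.resolve()        # collapses ".." and follows symlinks

print(made_absolute)
print(canonical)

print(".." in made_absolute.parts)   # True
print(".." in canonical.parts)       # False
print(canonical.name)                # config.json
```

Both results are absolute, but only the resolved form is safe to compare against other canonical paths.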

Part 4 - Reading and Writing with `Path`

`read_text()` and `write_text()`

from pathlib import Path

config_path = Path("/etc/myapp/config.json")

# Read entire file as string - no explicit open() needed
if config_path.exists():
    content = config_path.read_text(encoding="utf-8")
    print(content)

# Write string to file - creates or overwrites
output_path = Path("/tmp/output.txt")
output_path.write_text("Hello, World!\n", encoding="utf-8")

# write_text() returns the number of characters written
result = output_path.write_text("data\n", encoding="utf-8")
print(result)   # 5 - four letters plus the newline

:::note read_text() / write_text() load the full file
Like f.read(), Path.read_text() loads the entire file into memory. Use it for configuration files and small data files. For large files (logs, datasets), use open(path, ...) with iteration.
:::

`read_bytes()` and `write_bytes()`

from pathlib import Path

# Read binary files
png_path = Path("/data/logo.png")
raw = png_path.read_bytes()         # returns bytes
print(raw[:8])                       # b'\x89PNG\r\n\x1a\n'

# Write binary data
binary_path = Path("/tmp/data.bin")
binary_path.write_bytes(b"\x00\x01\x02\x03")

Using `Path` with `open()`

from pathlib import Path

p = Path("/data/largefile.csv")

# Path objects work natively as the first argument to open()
with open(p, "r", encoding="utf-8") as f:
    for line in f:
        process(line)

# Or use the .open() method (same result):
with p.open("r", encoding="utf-8") as f:
    for line in f:
        process(line)

The .open() method on a Path object is exactly open(self, ...) - same parameters, same result. For large files, always use open(p, ...) with iteration rather than p.read_text().

Part 5 - Globbing: Finding Files by Pattern

`.glob()` - Match in a Directory

from pathlib import Path

data_dir = Path("/data/logs")

# Find all .log files in data_dir (not recursive)
for log_file in data_dir.glob("*.log"):
    print(log_file)
    # /data/logs/app.log
    # /data/logs/error.log
    # /data/logs/access.log

# Find all .py files matching a pattern
src_dir = Path("/project/src")
for py_file in src_dir.glob("test_*.py"):
    print(py_file.name)
    # test_models.py
    # test_views.py

# Single directory wildcard:
for f in data_dir.glob("2024-*/access.log"):
    print(f)
    # /data/logs/2024-01/access.log
    # /data/logs/2024-02/access.log

`.rglob()` - Recursive Glob

from pathlib import Path

project = Path("/project")

# Find all Python files recursively
for py_file in project.rglob("*.py"):
    print(py_file)
    # /project/src/models.py
    # /project/src/views/home.py
    # /project/tests/test_models.py
    # ...

# Equivalent to glob("**/*.py"):
for py_file in project.glob("**/*.py"):
    print(py_file)

# Find all __init__.py files to discover packages
packages = list(project.rglob("__init__.py"))
print(f"Found {len(packages)} Python packages")

:::tip Glob returns lazy iterators
.glob() and .rglob() return generators - they do not scan all files upfront. Wrapping in list() forces evaluation:

# Lazy - memory efficient for large trees:
for f in project.rglob("*.log"):
    process(f)

# Eager - loads all paths into memory first:
log_files = list(project.rglob("*.log"))
print(f"Found {len(log_files)} log files")

:::
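
Because the result is an iterator, it can be cut short with the usual iterator tools. The sketch below builds a throwaway directory with tempfile, so the file names are illustrative only.

```python
import tempfile
from itertools import islice
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    for i in range(20):
        (root / f"app_{i:02d}.log").write_text("entry\n", encoding="utf-8")

    # Stop after 5 matches instead of collecting every path first
    first_five = list(islice(root.rglob("*.log"), 5))
    print(len(first_five))   # 5

    # next() with a default is a cheap "is there at least one match?" test
    has_logs = next(root.rglob("*.log"), None) is not None
    has_csv = next(root.rglob("*.csv"), None) is not None
    print(has_logs, has_csv)   # True False
```

The order in which matches arrive is not specified, so the five paths returned here are arbitrary; sort explicitly if the order matters.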

Glob Patterns Reference

from pathlib import Path

p = Path("/data")

# * - match anything in one directory level
p.glob("*.txt")            # all .txt in /data
p.glob("logs/app*.log")    # app*.log in /data/logs

# ? - match exactly one character
p.glob("data_?.csv")       # data_1.csv, data_A.csv, etc.

# [seq] - match one character from the set
p.glob("report_[0-9].pdf") # report_0.pdf through report_9.pdf

# ** - match any number of directory levels (rglob() adds a leading **/ for you)
p.glob("**/*.py")          # all .py files recursively

# Multiple extensions (not native glob - combine with logic):
py_files = list(p.rglob("*.py")) + list(p.rglob("*.pyx"))
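
Concatenating two rglob() calls walks the tree twice. An alternative sketch walks it once and filters on .suffix; the WANTED set and the file names are illustrative assumptions.

```python
import tempfile
from pathlib import Path

WANTED = {".py", ".pyx"}   # illustrative set of extensions

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    (root / "pkg").mkdir()
    for name in ["a.py", "b.pyx", "c.txt", "pkg/d.py", "pkg/e.PY"]:
        (root / name).write_text("", encoding="utf-8")

    # One walk of the tree; the suffix is compared case-insensitively
    matches = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in WANTED
    )
    print(matches)   # ['a.py', 'b.pyx', 'pkg/d.py', 'pkg/e.PY']
```

The set lookup also makes it easy to add or remove extensions without touching the traversal code.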

Practical: Batch File Processing

from pathlib import Path
import json

def process_all_configs(config_dir):
    """Load and validate all JSON config files in a directory tree."""
    config_dir = Path(config_dir)
    errors = []
    configs = {}

    for config_file in sorted(config_dir.rglob("*.json")):
        try:
            content = config_file.read_text(encoding="utf-8")
            data = json.loads(content)
            # Use path relative to config_dir as key
            rel_path = config_file.relative_to(config_dir)
            configs[str(rel_path)] = data
            print(f"Loaded: {rel_path}")
        except json.JSONDecodeError as e:
            errors.append((config_file, str(e)))
            print(f"Invalid JSON: {config_file}: {e}")

    if errors:
        print(f"\n{len(errors)} file(s) had errors")

    return configs

configs = process_all_configs("/etc/myapp")

Part 6 - Directory Operations

`mkdir()` - Creating Directories

from pathlib import Path

# Create a single directory (fails if parent missing or already exists)
Path("/tmp/newdir").mkdir()

# Create with all parents (like mkdir -p):
Path("/tmp/deep/nested/dir").mkdir(parents=True, exist_ok=True)
# parents=True  - create all missing parent directories
# exist_ok=True - do not raise if directory already exists

# Without exist_ok - raises if directory exists:
try:
    Path("/tmp").mkdir()
except FileExistsError:
    print("/tmp already exists")

# Practical pattern - always use both flags for idempotency:
output_dir = Path("/data/reports/2024/january")
output_dir.mkdir(parents=True, exist_ok=True)

`iterdir()` - List Directory Contents

from pathlib import Path

d = Path("/usr/bin")

# List all entries (not recursive - use glob for recursive)
for entry in d.iterdir():
    print(entry.name, "dir" if entry.is_dir() else "file")

# Filter: only files
files = [e for e in d.iterdir() if e.is_file()]

# Filter: only directories
subdirs = [e for e in d.iterdir() if e.is_dir()]

# Sorted listing:
for f in sorted(d.iterdir(), key=lambda p: p.name.lower()):
    print(f.name)

`rename()`, `replace()`, and `unlink()`

from pathlib import Path

# Rename a file (same as os.rename)
old = Path("/tmp/old_name.txt")
new = Path("/tmp/new_name.txt")
old.rename(new)

# Replace the target (same as os.replace - overwrites if target exists)
tmp = Path("/data/config.json.tmp")
target = Path("/data/config.json")
tmp.replace(target)   # atomic on POSIX when both paths are on the same filesystem

# Delete a file
p = Path("/tmp/temp_file.txt")
p.unlink()   # raises FileNotFoundError if missing

# Delete if exists (Python 3.8+):
p.unlink(missing_ok=True)   # no error if file does not exist

# Equivalent old style:
from contextlib import suppress
with suppress(FileNotFoundError):
    p.unlink()
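
replace() is the building block for a common "write then swap" pattern that prevents readers from seeing a half-written file. A minimal sketch, assuming the temporary file lives next to the target so both are on one filesystem; atomic_write_text is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, text: str, encoding: str = "utf-8") -> None:
    """Hypothetical helper: write to a sibling temp file, then swap it in."""
    target = Path(target)
    # Same directory as the target, so the final replace stays on one filesystem
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name, suffix=".tmp")
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(text)
        tmp_path.replace(target)          # readers see the old or the new file
    except BaseException:
        tmp_path.unlink(missing_ok=True)  # Python 3.8+
        raise


with tempfile.TemporaryDirectory() as tmpdir:
    config = Path(tmpdir) / "config.json"
    atomic_write_text(config, '{"version": 1}\n')
    atomic_write_text(config, '{"version": 2}\n')
    final_text = config.read_text(encoding="utf-8")
    leftovers = [p.name for p in Path(tmpdir).iterdir()]
    print(final_text, leftovers)
```

This sketch does not call os.fsync(), so it protects against partial writes seen by other processes, not against data loss on power failure.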

Deleting Directories

from pathlib import Path
import shutil

# Remove empty directory
Path("/tmp/emptydir").rmdir()   # raises if not empty

# Remove directory and all contents (recursive):
shutil.rmtree("/tmp/build_artifacts")   # still requires shutil

# Path.rmdir() only removes empty directories.
# For non-empty trees, shutil.rmtree() is still needed (it accepts Path objects):
build_dir = Path("/tmp/build")
if build_dir.exists():
    shutil.rmtree(build_dir)

Part 7 - Cross-Platform Paths: PurePath

PurePath classes let you manipulate path strings without any filesystem I/O. They are platform-independent - useful for testing, cross-platform path logic, and server-side path manipulation.

`PurePosixPath` - Unix Path Manipulation Anywhere

from pathlib import PurePosixPath

# Manipulate Unix paths on any platform (including Windows)
p = PurePosixPath("/home/alice/docs/report.pdf")
print(p.parent)   # /home/alice/docs
print(p.name)     # report.pdf
print(p.stem)     # report
print(p.suffix)   # .pdf

# Build Unix paths on Windows:
remote_path = PurePosixPath("/var/www/html") / "assets" / "style.css"
print(remote_path)   # /var/www/html/assets/style.css
print(str(remote_path))   # '/var/www/html/assets/style.css'

`PureWindowsPath` - Windows Path Manipulation Anywhere

from pathlib import PureWindowsPath

# Manipulate Windows paths on Linux (for testing or path translation)
p = PureWindowsPath(r"C:\Users\Alice\Documents\report.docx")
print(p.drive)    # C:
print(p.root)     # \
print(p.parts)    # ('C:\\', 'Users', 'Alice', 'Documents', 'report.docx')
print(p.name)     # report.docx
print(p.parent)   # C:\Users\Alice\Documents

# Windows paths are case-insensitive for comparison:
p1 = PureWindowsPath("C:/Windows/system32/notepad.exe")
p2 = PureWindowsPath("c:/WINDOWS/SYSTEM32/NOTEPAD.EXE")
print(p1 == p2)   # True - case-insensitive

# Posix paths are case-sensitive:
from pathlib import PurePosixPath
p3 = PurePosixPath("/data/File.txt")
p4 = PurePosixPath("/data/file.txt")
print(p3 == p4)   # False
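
The two pure flavours can also be converted into each other, which is handy when a Windows-style relative path has to become a URL or a path on a Unix server. A minimal sketch with illustrative paths:

```python
from pathlib import PurePosixPath, PureWindowsPath

win_rel = PureWindowsPath(r"assets\css\style.css")

# as_posix() renders the same path with forward slashes
print(win_rel.as_posix())             # assets/css/style.css

# Rebuilding from .parts switches flavour for relative paths
posix_rel = PurePosixPath(*win_rel.parts)
print(posix_rel)                      # assets/css/style.css

# Drive letters have no POSIX equivalent, so anchored paths need a policy
win_abs = PureWindowsPath(r"C:\Users\Alice\report.docx")
print(win_abs.as_posix())             # C:/Users/Alice/report.docx
print(win_abs.relative_to(win_abs.anchor).as_posix())   # Users/Alice/report.docx
```

Stripping the anchor, as in the last line, is only one possible policy; whether dropping the drive letter is acceptable depends on the application.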

Real-World Use: URL-to-Path Translation

from pathlib import PurePosixPath, Path

def url_path_to_local(url_path, base_dir):
    """
    Translate a URL path to a local filesystem path safely.
    Prevents path traversal attacks (../../etc/passwd).
    """
    # Parse the URL path using PurePosixPath (always Unix-style)
    url_parts = PurePosixPath(url_path)

    # Reject paths with traversal components
    if ".." in url_parts.parts:
        raise ValueError(f"Path traversal detected: {url_path}")

    # Build local path
    local_base = Path(base_dir).resolve()
    local_path = local_base

    for part in url_parts.parts:
        if part == "/":
            continue
        local_path = local_path / part

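    # Verify the result is still under base_dir (compare path components,
    # not string prefixes: "/var/www-evil" starts with "/var/www")
    resolved = local_path.resolve()
    try:
        resolved.relative_to(local_base)
    except ValueError:
        raise ValueError(f"Path escapes base directory: {local_path}") from None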

    return resolved


# Safe:
p = url_path_to_local("/assets/css/style.css", "/var/www")
print(p)   # /var/www/assets/css/style.css

# Unsafe - raises:
try:
    p = url_path_to_local("/../../etc/passwd", "/var/www")
except ValueError as e:
    print(e)   # Path traversal detected: /../../etc/passwd
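
Containment is best checked on path components rather than on string prefixes, because a sibling directory can share a textual prefix with the base. The sketch below uses PurePosixPath so it runs anywhere; is_within is a hypothetical helper name.

```python
from pathlib import PurePosixPath

base = PurePosixPath("/var/www")
sibling = PurePosixPath("/var/www-evil/index.html")   # NOT inside base
inside = PurePosixPath("/var/www/assets/app.js")


def is_within(path: PurePosixPath, root: PurePosixPath) -> bool:
    """Component-wise containment check (works on Python 3.4+)."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


print(str(sibling).startswith(str(base)))   # True  - string prefix is fooled
print(is_within(sibling, base))             # False - components differ
print(is_within(inside, base))              # True
```

On Python 3.9+ the built-in path.is_relative_to(root) performs the same component-wise check without the try/except.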

Part 8 - Real-World Patterns

Pattern 1: Project Structure Discovery

from pathlib import Path
from typing import Dict, List


def discover_project(root: Path) -> Dict[str, List[Path]]:
    """
    Discover key files and directories in a Python project.
    Returns a dict mapping category → list of paths.
    """
    root = root.resolve()

    return {
        "python_files": sorted(root.rglob("*.py")),
        "test_files": sorted(root.rglob("test_*.py")) +
                      sorted(root.rglob("*_test.py")),
        "config_files": (
            sorted(root.rglob("*.toml")) +
            sorted(root.rglob("*.yaml")) +
            sorted(root.rglob("*.yml")) +
            sorted(root.rglob("*.ini")) +
            sorted(root.rglob("*.cfg"))
        ),
        "requirements": sorted(root.rglob("requirements*.txt")),
        "notebooks": sorted(root.rglob("*.ipynb")),
        "data_files": sorted(root.rglob("*.csv")) + sorted(root.rglob("*.json")),
    }


def print_project_summary(root_path: str):
    root = Path(root_path).resolve()   # match the resolved paths discover_project() returns
    if not root.is_dir():
        print(f"Not a directory: {root_path}")
        return

    structure = discover_project(root)

    print(f"Project: {root.name}")
    print(f"Location: {root}")
    print()

    for category, files in structure.items():
        if files:
            print(f"{category}: {len(files)} file(s)")
            for f in files[:5]:   # show first 5
                print(f"  {f.relative_to(root)}")
            if len(files) > 5:
                print(f"  ... and {len(files) - 5} more")
            print()


print_project_summary("/Users/alice/projects/myapp")

Pattern 2: Batch File Renaming

from pathlib import Path


def normalize_filenames(directory, dry_run=True):
    """
    Normalize all filenames in a directory:
    - Lowercase
    - Replace spaces with underscores
    - Remove special characters

    dry_run=True: print what would happen without changing files.
    """
    import re
    directory = Path(directory)

    renames = []
    for f in sorted(directory.iterdir()):
        if f.is_dir():
            continue

        stem = f.stem.lower()
        stem = re.sub(r"\s+", "_", stem)          # spaces to underscores
        stem = re.sub(r"[^a-z0-9_.-]", "", stem)  # remove special chars
        stem = re.sub(r"_+", "_", stem)            # collapse multiple underscores
        stem = stem.strip("_")

        new_name = stem + f.suffix.lower()
        new_path = f.parent / new_name

        if new_path != f:
            renames.append((f, new_path))

    if not renames:
        print("No renames needed.")
        return

    for old, new in renames:
        print(f"  {old.name!r}  →  {new.name!r}")

    if not dry_run:
        for old, new in renames:
            old.rename(new)
        print(f"\nRenamed {len(renames)} file(s).")
    else:
        print(f"\nDry run: {len(renames)} rename(s) would occur. Pass dry_run=False to apply.")


normalize_filenames("/data/uploads", dry_run=True)
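
Normalizing names can map two different files onto the same target, and on POSIX rename() silently overwrites an existing file. A sketch of a pre-flight check over the planned (old, new) pairs; find_collisions is a hypothetical helper and the file names are made up for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_collisions(renames):
    """Hypothetical helper: group planned (old, new) renames by target path."""
    by_target = defaultdict(list)
    for old, new in renames:
        by_target[new].append(old)
    return {new: olds for new, olds in by_target.items() if len(olds) > 1}


uploads = PurePosixPath("/data/uploads")
planned = [
    (uploads / "My Report.PDF", uploads / "my_report.pdf"),
    (uploads / "my  report.pdf", uploads / "my_report.pdf"),
    (uploads / "Photo 1.JPG", uploads / "photo_1.jpg"),
]

collisions = find_collisions(planned)
for target, sources in collisions.items():
    print(f"{target.name}: {[s.name for s in sources]}")
# my_report.pdf: ['My Report.PDF', 'my  report.pdf']
```

Running a check like this before applying renames, and also testing new.exists() for files that already carry the normalized name, avoids losing data to an overwrite.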

Pattern 3: Config File Location (XDG Standard)

from pathlib import Path
import os
import sys


def find_config_file(app_name: str, filename: str = "config.toml") -> Path:
    """
    Find a configuration file following platform conventions:
    - Linux/macOS: XDG_CONFIG_HOME (default: ~/.config/APP/filename)
    - Windows:     APPDATA/APP/filename
    - Fallback:    ~/APP/filename

    Returns the Path whether or not the file exists.
    """
    if sys.platform == "win32":
        base = Path(os.environ.get("APPDATA", Path.home()))
    else:
        xdg_config = os.environ.get("XDG_CONFIG_HOME", "")
        if xdg_config:
            base = Path(xdg_config)
        else:
            base = Path.home() / ".config"

    return base / app_name / filename


def load_or_create_config(app_name: str) -> dict:
    """Load config if it exists, or create a default one."""
    import json

    config_path = find_config_file(app_name, "config.json")   # content is JSON, so use a .json name

    if config_path.exists():
        return json.loads(config_path.read_text(encoding="utf-8"))

    # Create default config
    default_config = {
        "version": 1,
        "debug": False,
        "log_level": "INFO",
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(
        json.dumps(default_config, indent=2) + "\n",
        encoding="utf-8"
    )
    print(f"Created default config: {config_path}")
    return default_config


config = load_or_create_config("myapp")
print(config)

Pattern 4: Smart File Organizer

from pathlib import Path
import shutil
from datetime import datetime


def organize_downloads(downloads_dir, target_dir, dry_run=True):
    """
    Organize files in downloads_dir into target_dir/YYYY-MM/extension/
    sorted by modification date.
    """
    downloads = Path(downloads_dir)
    target = Path(target_dir)

    operations = []

    for f in downloads.iterdir():
        if f.is_dir():
            continue

        # Get modification date
        mtime = f.stat().st_mtime
        date = datetime.fromtimestamp(mtime)
        month_dir = f"{date.year}-{date.month:02d}"

        # Get extension (without the dot, lowercase)
        ext = f.suffix.lstrip(".").lower() or "no_extension"

        dest_dir = target / month_dir / ext
        dest_file = dest_dir / f.name

        # Handle naming conflicts
        counter = 1
        while dest_file.exists():
            dest_file = dest_dir / f"{f.stem}_{counter}{f.suffix}"
            counter += 1

        operations.append((f, dest_file))

    print(f"{'Would organize' if dry_run else 'Organizing'} {len(operations)} files:")
    for src, dst in operations[:10]:
        print(f"  {src.name} → {dst.relative_to(target)}")
    if len(operations) > 10:
        print(f"  ... and {len(operations) - 10} more")

    if not dry_run:
        for src, dst in operations:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
        print(f"\nOrganized {len(operations)} files.")


organize_downloads("/home/alice/Downloads", "/home/alice/Sorted", dry_run=True)

Interview Questions

Q1: Why is `pathlib.Path` preferred over `os.path` string manipulation?

Answer: Several reasons: (1) Readability - Path("/data") / "reports" / "2024.csv" reads like a path; os.path.join("/data", "reports", "2024.csv") reads like a function call. (2) Type safety - the / operator requires a Path on at least one side, so "data" / "reports" raises TypeError instead of silently producing a malformed string the way "data" + "reports" does. (3) Methods over global functions - attributes and methods like .stem, .suffix, .parent, .exists(), .is_file() are on the object itself, not spread across os.path. (4) Cross-platform - Path uses \ on Windows and / on Unix automatically; string paths require manual handling. (5) Richer API - .read_text(), .write_text(), .mkdir(parents=True, exist_ok=True), .rglob() are on the object; os.path requires open(), os.makedirs(), glob.glob() separately.

Q2: What is the difference between `.resolve()` and `.absolute()`?

Answer: Both convert a relative path to an absolute path. The key difference is normalization: .resolve() resolves all symlinks, .. components, and . components - it returns the canonical path as the filesystem sees it. If the path does not exist, resolve() on Python 3.6+ still resolves as much as it can (pass strict=True to raise FileNotFoundError instead). .absolute() has existed since Python 3.4 but was only documented from Python 3.11; it prepends the current working directory and does NOT resolve symlinks or collapse .. components. For most use cases, .resolve() is what you want - it gives you the definitive path. Use .absolute() when symlinks must be preserved as written.

Q3: What is the difference between `Path.glob()` and `Path.rglob()`?

Answer: .glob(pattern) searches only within the immediate directory, with optional subdirectory matching using **/ in the pattern. For example, path.glob("*.py") finds .py files only in path, and path.glob("**/*.py") finds them recursively. .rglob(pattern) is shorthand for path.glob("**/" + pattern) - it always searches recursively through all subdirectories. Both return lazy iterators (generators), so they do not load all results into memory at once. Use .rglob("*.py") for recursive searches, .glob("*.py") for single-directory searches.

Q4: How does `pathlib` handle cross-platform path manipulation? What is the difference between `Path`, `PurePath`, `PurePosixPath`, and `PureWindowsPath`?

Answer: Path is the concrete class for actual filesystem operations - it automatically becomes PosixPath on Linux/macOS or WindowsPath on Windows. PurePath classes do path string manipulation without any filesystem access. PurePosixPath always uses Unix path semantics (forward slashes, case-sensitive) regardless of the OS you run on. PureWindowsPath always uses Windows path semantics (backslashes, drive letters, case-insensitive) regardless of OS. Use PurePosixPath when you need to manipulate server-side Unix paths on a Windows dev machine, parse URL paths, or test path logic without touching the filesystem. Use PureWindowsPath to manipulate Windows UNC paths or share paths on a Linux server.
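
The class relationships described in this answer can be checked directly. A minimal sketch; it only imports and inspects the classes, so it runs on any OS:

```python
from pathlib import Path, PosixPath, PurePath, PurePosixPath, PureWindowsPath, WindowsPath

print(issubclass(Path, PurePath))                 # True
print(issubclass(PosixPath, PurePosixPath))       # True
print(issubclass(WindowsPath, PureWindowsPath))   # True

# Path() picks the concrete class for the running OS
print(type(Path("demo.txt")).__name__)            # PosixPath or WindowsPath

# Pure classes of the "other" flavour can be instantiated anywhere
print(PureWindowsPath("C:/Temp").drive)           # C:
print(PurePosixPath("/tmp").root)                 # /
```

Instantiating the concrete class of the other platform (for example WindowsPath on Linux) is what fails; the pure classes carry no such restriction.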

Q5: How do you safely read a large directory tree without running out of memory?

Answer: Use .rglob() or .glob() as lazy iterators - do not wrap them in list() unless you need random access to all paths. The iterator yields one path at a time:

for py_file in project_dir.rglob("*.py"):
    analyze(py_file)   # only one Path in memory at a time

This works for directories with millions of files. If you need sorting (which requires all paths), be aware that sorted(path.rglob("*.py")) will load all paths into memory first. For very large trees, process in streaming fashion and avoid sorting unless necessary.
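
When only the top few results are needed, a full sort can be avoided. The sketch below feeds a generator into heapq.nlargest, which keeps just N candidates in memory; the directory and file names are generated for illustration.

```python
import heapq
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    for i in range(1, 11):
        (root / f"file_{i:02d}.bin").write_bytes(b"x" * i)   # sizes 1..10 bytes

    # Generator of (size, name) pairs; nothing is materialized as a list
    sizes = (
        (p.stat().st_size, p.name)
        for p in root.rglob("*")
        if p.is_file()
    )

    # nlargest keeps only the 3 best candidates in memory while streaming
    top_three = heapq.nlargest(3, sizes)
    print(top_three)
    # [(10, 'file_10.bin'), (9, 'file_09.bin'), (8, 'file_08.bin')]
```

The same approach works with heapq.nsmallest, or with a key function if the tuples carry Path objects instead of names.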

Q6: You have a list of file paths as strings from a configuration file. How would you safely join them with a base directory using pathlib?

Answer:

from pathlib import Path

base = Path("/data/projects")
user_paths = ["../../../etc/passwd", "reports/2024.csv", "config.json"]

safe_paths = []
for user_path in user_paths:
    # Construct the full path
    full = (base / user_path).resolve()

    # Verify it is still under base (path traversal protection)
    base_resolved = base.resolve()
    try:
        full.relative_to(base_resolved)
        safe_paths.append(full)
    except ValueError:
        print(f"Security: rejected path traversal attempt: {user_path!r}")

# ../../../etc/passwd is rejected, reports/2024.csv and config.json are accepted

The key is .resolve() to canonicalize the path, then .relative_to() to verify the result is still under the expected base directory. relative_to() raises ValueError if the path is not under the base, making it a clean check for path traversal attacks.

Practice Challenges

Beginner - Project File Counter

Write a function count_files(directory) that returns a dictionary mapping file extension to count of files with that extension in the directory tree.

# Example:
# count_files("/project") → {'.py': 42, '.json': 8, '.md': 12, '': 3}

Solution

from pathlib import Path
from collections import defaultdict


def count_files(directory):
    """
    Count files by extension in a directory tree.
    Returns a dict mapping extension (including dot) to count.
    Files without extension use empty string as key.
    """
    directory = Path(directory)
    if not directory.is_dir():
        raise NotADirectoryError(f"Not a directory: {directory}")

    counts = defaultdict(int)
    for path in directory.rglob("*"):
        if path.is_file():
            counts[path.suffix] += 1

    # Return as regular dict, sorted by count descending
    return dict(sorted(counts.items(), key=lambda item: -item[1]))


# Test
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)

    # Create test structure
    (root / "src").mkdir()
    (root / "src" / "models").mkdir()
    (root / "tests").mkdir()
    (root / "docs").mkdir()

    for i in range(5):
        (root / "src" / f"module_{i}.py").write_text("# code", encoding="utf-8")
    (root / "src" / "models" / "user.py").write_text("# model", encoding="utf-8")
    for i in range(3):
        (root / "tests" / f"test_{i}.py").write_text("# test", encoding="utf-8")
    (root / "docs" / "readme.md").write_text("# docs", encoding="utf-8")
    (root / "config.json").write_text("{}", encoding="utf-8")
    (root / "Makefile").write_text("all:", encoding="utf-8")  # no extension

    result = count_files(tmpdir)
    print(result)
    # {'.py': 9, '.md': 1, '.json': 1, '': 1}

    for ext, count in result.items():
        label = ext if ext else "(no extension)"
        print(f"  {label}: {count}")

Intermediate - Find Duplicate Files

Write a function find_duplicates(directory) that returns a dictionary mapping file content hash to a list of paths that have identical content.

Solution

from pathlib import Path
from collections import defaultdict
import hashlib


def file_hash(path: Path, chunk_size: int = 65536) -> str:
    """Compute SHA-256 hash of a file without loading it all into memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(directory):
    """
    Find duplicate files in a directory tree.
    Returns a dict: {hash: [Path, Path, ...]} for hashes with 2+ files.
    Skips files that cannot be read.
    """
    directory = Path(directory)
    hash_to_paths = defaultdict(list)

    # First pass: group files by size (cheap filter - different sizes can't be equal)
    size_to_paths = defaultdict(list)
    for path in directory.rglob("*"):
        if not path.is_file():
            continue
        try:
            size = path.stat().st_size
            size_to_paths[size].append(path)
        except OSError:
            continue

    # Second pass: hash only files that share a size
    for size, paths in size_to_paths.items():
        if len(paths) < 2:
            continue   # unique size = definitely unique content
        for path in paths:
            try:
                h = file_hash(path)
                hash_to_paths[h].append(path)
            except OSError:  # PermissionError is a subclass of OSError
                continue

    # Return only hashes with duplicates
    return {h: paths for h, paths in hash_to_paths.items() if len(paths) > 1}


# Test
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    (root / "a").mkdir()
    (root / "b").mkdir()

    # Create some duplicates
    content_1 = b"Hello, World!"
    content_2 = b"Different content"

    (root / "a" / "file1.txt").write_bytes(content_1)
    (root / "b" / "file2.txt").write_bytes(content_1)   # duplicate of file1
    (root / "a" / "file3.txt").write_bytes(content_2)
    (root / "b" / "file4.txt").write_bytes(b"Unique content")

    dupes = find_duplicates(tmpdir)

    if dupes:
        print(f"Found {len(dupes)} group(s) of duplicate files:")
        for h, paths in dupes.items():
            print(f"\n  Hash: {h[:16]}...")
            for p in paths:
                print(f"    {p.relative_to(root)}  ({p.stat().st_size} bytes)")
    else:
        print("No duplicates found.")

# Found 1 group(s) of duplicate files:
#
#   Hash: dffd6021bb2bd5b0...
#     a/file1.txt  (13 bytes)
#     b/file2.txt  (13 bytes)

Advanced - File Watcher with Change Detection

Build a FileWatcher class that monitors a directory for changes (new files, deleted files, modified files) between two snapshots, using pathlib for all filesystem operations.

Solution

from pathlib import Path
from dataclasses import dataclass, field
from typing import Dict, List
import hashlib
import time


@dataclass
class FileSnapshot:
    """A snapshot of a directory's file state."""
    root: Path
    files: Dict[str, tuple] = field(default_factory=dict)
    # {relative_path_str: (size, mtime, hash_first_8kb)}

    @classmethod
    def take(cls, directory: Path):
        """Take a snapshot of all files in a directory tree."""
        snap = cls(root=directory.resolve())
        for path in directory.rglob("*"):
            if not path.is_file():
                continue
            try:
                stat = path.stat()
                # Read only the first 8 KB for a quick content fingerprint
                # (read_bytes() would load the whole file before slicing)
                with path.open("rb") as f:
                    first_bytes = f.read(8192)
                quick_hash = hashlib.md5(first_bytes).hexdigest()
                # as_posix() keeps the keys identical across platforms
                rel = path.relative_to(directory).as_posix()
                snap.files[rel] = (stat.st_size, stat.st_mtime, quick_hash)
            except OSError:  # includes PermissionError
                continue
        return snap


@dataclass
class DirectoryDiff:
    """The differences between two directory snapshots."""
    added: List[str] = field(default_factory=list)
    deleted: List[str] = field(default_factory=list)
    modified: List[str] = field(default_factory=list)

    @property
    def has_changes(self):
        return bool(self.added or self.deleted or self.modified)

    def report(self):
        if not self.has_changes:
            print("No changes detected.")
            return
        if self.added:
            print(f"Added ({len(self.added)}):")
            for f in sorted(self.added):
                print(f"  + {f}")
        if self.deleted:
            print(f"Deleted ({len(self.deleted)}):")
            for f in sorted(self.deleted):
                print(f"  - {f}")
        if self.modified:
            print(f"Modified ({len(self.modified)}):")
            for f in sorted(self.modified):
                print(f"  ~ {f}")


def diff_snapshots(before: FileSnapshot, after: FileSnapshot) -> DirectoryDiff:
    """Compare two snapshots and return the differences."""
    before_set = set(before.files.keys())
    after_set = set(after.files.keys())

    added = sorted(after_set - before_set)
    deleted = sorted(before_set - after_set)
    modified = sorted(
        f for f in before_set & after_set
        if before.files[f] != after.files[f]
    )

    return DirectoryDiff(added=added, deleted=deleted, modified=modified)


class FileWatcher:
    """Watch a directory for changes between poll intervals."""

    def __init__(self, directory):
        self.directory = Path(directory).resolve()
        self._last_snapshot = None

    def reset(self):
        """Take the initial snapshot."""
        self._last_snapshot = FileSnapshot.take(self.directory)
        print(f"Watching {self.directory} ({len(self._last_snapshot.files)} files)")
        return self

    def poll(self) -> DirectoryDiff:
        """Check for changes since last snapshot."""
        if self._last_snapshot is None:
            raise RuntimeError("Call reset() before poll()")
        new_snapshot = FileSnapshot.take(self.directory)
        diff = diff_snapshots(self._last_snapshot, new_snapshot)
        self._last_snapshot = new_snapshot
        return diff


# Demo
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    (root / "src").mkdir()

    # Create initial files
    (root / "src" / "main.py").write_text("print('hello')", encoding="utf-8")
    (root / "src" / "utils.py").write_text("def add(a, b): return a+b", encoding="utf-8")
    (root / "README.md").write_text("# My Project", encoding="utf-8")

    # Take initial snapshot
    watcher = FileWatcher(tmpdir).reset()

    # Simulate changes
    time.sleep(0.01)
    (root / "src" / "main.py").write_text("print('changed')", encoding="utf-8")  # modify
    (root / "src" / "new_module.py").write_text("# new", encoding="utf-8")        # add
    (root / "README.md").unlink()                                                   # delete

    # Poll for changes
    diff = watcher.poll()
    diff.report()

# Watching /tmp/.../  (3 files)
# Added (1):
#   + src/new_module.py
# Deleted (1):
#   - README.md
# Modified (1):
#   ~ src/main.py

Quick Reference

Operation	pathlib	os.path equivalent
Create path	`Path("/data/file.txt")`	`"/data/file.txt"`
Join paths	`Path("/data") / "file.txt"`	`os.path.join("/data", "file.txt")`
Get filename	`p.name`	`os.path.basename(p)`
Get stem	`p.stem`	`os.path.splitext(os.path.basename(p))[0]`
Get extension	`p.suffix`	`os.path.splitext(p)[1]`
Get parent	`p.parent`	`os.path.dirname(p)`
Make absolute	`p.resolve()`	`os.path.realpath(p)` (`os.path.abspath(p)` does not follow symlinks)
Check exists	`p.exists()`	`os.path.exists(p)`
Check is file	`p.is_file()`	`os.path.isfile(p)`
Check is dir	`p.is_dir()`	`os.path.isdir(p)`
File size	`p.stat().st_size`	`os.path.getsize(p)`
Read text	`p.read_text(encoding="utf-8")`	`open(p).read()`
Write text	`p.write_text(s, encoding="utf-8")`	`open(p, 'w').write(s)`
Create dir	`p.mkdir(parents=True, exist_ok=True)`	`os.makedirs(p, exist_ok=True)`
List dir	`p.iterdir()`	`os.listdir(p)`
Find files	`p.glob("*.py")`	`glob.glob(str(p / "*.py"))`
Find recursive	`p.rglob("*.py")`	`glob.glob(str(p / "**/*.py"), recursive=True)`
Delete file	`p.unlink(missing_ok=True)` (Python 3.8+)	`os.remove(p)` + try/except
Rename	`p.rename(new)`	`os.rename(p, new)`
Change suffix	`p.with_suffix(".txt")`	`os.path.splitext(p)[0] + ".txt"`
Home directory	`Path.home()`	`os.path.expanduser("~")`
Current dir	`Path.cwd()`	`os.getcwd()`
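As a quick sanity check on the purely lexical rows of this table, the sketch below compares a few pathlib results with their os.path counterparts. The sample path is made up, and PurePosixPath with posixpath are used (rather than Path and os.path) only so the comparison behaves the same on every operating system.

```python
import posixpath
from pathlib import PurePosixPath

raw = "/data/projects/report.final.csv"  # hypothetical sample path
p = PurePosixPath(raw)

# Each entry pairs the pathlib result with the os.path-style result
pairs = {
    "name": (p.name, posixpath.basename(raw)),
    "stem": (p.stem, posixpath.splitext(posixpath.basename(raw))[0]),
    "suffix": (p.suffix, posixpath.splitext(raw)[1]),
    "parent": (str(p.parent), posixpath.dirname(raw)),
    "with_suffix": (
        str(p.with_suffix(".txt")),
        posixpath.splitext(raw)[0] + ".txt",
    ),
    "join": (
        str(p.parent / "notes.md"),
        posixpath.join(posixpath.dirname(raw), "notes.md"),
    ),
}

for label, (from_pathlib, from_os_path) in pairs.items():
    print(f"{label:12} {from_pathlib!r:40} match={from_pathlib == from_os_path}")
```

Rows that touch the filesystem (exists, stat, glob, resolve) are left out here because their results depend on the machine running the code.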

Key Takeaways

pathlib.Path treats filesystem paths as objects with attributes and methods, not as raw strings. This eliminates entire classes of bugs from string concatenation and makes path code more readable.
The / operator creates new Path objects by composing segments - it is type-safe, readable, and cross-platform (uses the correct separator automatically).
Path attributes (.name, .stem, .suffix, .parent, .parts) give you immediate access to components without function calls. Modification methods (.with_name(), .with_suffix(), and, from Python 3.9, .with_stem()) return new Path objects.
.glob() and .rglob() return lazy generators - they scale to directories with millions of files without loading all paths into memory. Always use for f in path.rglob("*.py"): rather than list(path.rglob(...)) unless you need sorting or counting.
Path.read_text() and Path.write_text() are convenient for small files. For large files, use open(path, ...) with iteration.
PurePosixPath and PureWindowsPath let you manipulate path strings with platform-specific semantics without any filesystem I/O - useful for testing, cross-platform code, and server-side path manipulation.
Always call .resolve() when accepting paths from user input, then verify with .relative_to(base) to prevent path traversal attacks.
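The large-file takeaway above can be sketched as follows. The helper name count_matching_lines and the sample log content are assumptions for illustration; the point is only that iterating over the file handle reads one line at a time, whereas read_text() would load the whole file into memory.

```python
import tempfile
from pathlib import Path


def count_matching_lines(path, needle):
    """Count lines containing needle, reading one line at a time."""
    count = 0
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:  # the file object yields lines lazily
            if needle in line:
                count += 1
    return count


with tempfile.TemporaryDirectory() as tmpdir:
    log = Path(tmpdir) / "app.log"  # small stand-in for a large log file
    log.write_text(
        "INFO start\nERROR disk full\nINFO retry\nERROR timeout\n",
        encoding="utf-8",
    )
    errors = count_matching_lines(log, "ERROR")
    print(errors)  # 2
```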
