The `os` Module - System Calls and Process Interaction

Reading time: ~18 minutes | Level: Foundation → Engineering

Here is a behavior that surprises most Python developers:

import os
import subprocess
import sys

os.environ["MY_SECRET"] = "hunter2"

# Start a child Python process (sys.executable = the interpreter running this script)
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('MY_SECRET'))"],
    capture_output=True, text=True
)
print(result.stdout.strip())  # hunter2

Assigning to os.environ changes the environment of the current process, and every child process started afterwards (via subprocess, os.system(), and similar) inherits a copy of it. That inheritance is how configuration leaks into tools and secrets get exposed in production.

The os module is a thin layer over the operating system's system calls. Understanding it means understanding how processes, files, permissions, and system resources actually work - not just how Python abstracts them.

What You Will Learn

The architectural difference between os, pathlib, and shutil - and when to use each
How os.path works and why pathlib replaces most of it in modern code
Directory listing with os.listdir() and os.scandir() - and why scandir is dramatically faster
Recursive traversal with os.walk() and how topdown controls the traversal order
File metadata with os.stat() - permissions, size, timestamps
Changing file permissions with os.chmod() and reading them with stat.S_IMODE
Process identity with os.getpid() and os.getppid()
Why os.system() is dangerous and how subprocess.run() replaces it safely
Cryptographically secure random bytes from os.urandom()

Prerequisites

Familiarity with Python file I/O (open(), read(), write())
Understanding of Python strings and f-strings
Basic knowledge of what a file system is (files, directories, paths)
Having completed the pathlib module (topic 04) is helpful but not required

The Big Picture: `os` vs `pathlib` vs `shutil`

These three modules are often confused. Here is when to use each:

| Module | Use for | Key APIs |
|---|---|---|
| `pathlib` | Path manipulation, file read/write, directory creation, glob patterns | `Path("/a/b/c")`, `p.exists()`, `p.read_text()`, `p.glob("*.py")`, `p.stat()` |
| `os` | Process/system info, permissions, environment vars, walking trees | `os.getpid()`, `os.environ`, `os.walk()`, `os.chmod()`, `os.urandom()`, `os.cpu_count()` |
| `shutil` | Copy/move/delete, high-level FS ops, archive handling, finding executables | `shutil.copy()`, `shutil.move()`, `shutil.rmtree()`, `shutil.which()`, `shutil.disk_usage()` |

:::tip Rule of Thumb For path manipulation, prefer pathlib. For system-level operations (process info, permissions, environment, random bytes), use os. For copying, moving, and deleting directory trees, use shutil. :::

Part 1 - `os.path`: The Classic Path Toolkit

os.path provides string-based path manipulation. It predates pathlib by decades. In Python 3.4+, pathlib is preferred for path manipulation - but os.path is still everywhere in existing codebases, so you must know it.

import os

path = "/home/alice/projects/myapp/config.yaml"

# Core os.path operations
print(os.path.basename(path))          # config.yaml
print(os.path.dirname(path))           # /home/alice/projects/myapp
print(os.path.splitext(path))          # ('/home/alice/projects/myapp/config', '.yaml')
print(os.path.split(path))             # ('/home/alice/projects/myapp', 'config.yaml')

# Building paths safely (handles OS-specific separators)
joined = os.path.join("/home/alice", "projects", "myapp", "config.yaml")
print(joined)                          # /home/alice/projects/myapp/config.yaml

# Checking path properties
print(os.path.exists(path))            # True or False depending on disk
print(os.path.isfile(path))            # True if it's a file
print(os.path.isdir(path))             # True if it's a directory
print(os.path.isabs(path))             # True - path starts with /
print(os.path.abspath("config.yaml"))  # /current/working/dir/config.yaml

The `pathlib` Equivalents

Every os.path operation has a pathlib equivalent. The pathlib version usually reads better because you compose operations with attributes, methods, and the / operator instead of nesting function calls:

from pathlib import Path
import os

path_str = "/home/alice/projects/myapp/config.yaml"
path = Path(path_str)

# os.path         →   pathlib
os.path.basename(path_str)    # config.yaml
path.name                     # config.yaml  ← cleaner

os.path.dirname(path_str)     # /home/alice/projects/myapp
path.parent                   # PosixPath('/home/alice/projects/myapp')

os.path.splitext(path_str)    # ('.../config', '.yaml')
path.stem, path.suffix        # 'config', '.yaml'

os.path.exists(path_str)      # True/False
path.exists()                 # True/False
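A few more equivalents show where the object style pays off: the / operator replaces os.path.join(), and with_suffix() replaces the split-and-concatenate dance with os.path.splitext(). This sketch uses posixpath (the module os.path refers to on Linux and macOS) and PurePosixPath, so the output is the same on every OS and nothing touches the disk.

```python
import posixpath
from pathlib import PurePosixPath

base = "/home/alice/projects"

# Joining: os.path.join (string functions) vs the / operator (path objects)
joined_str = posixpath.join(base, "myapp", "config.yaml")
joined_path = PurePosixPath(base) / "myapp" / "config.yaml"

# Changing an extension: splitext + concatenation vs with_suffix()
root, _ext = posixpath.splitext(joined_str)
as_json_str = root + ".json"
as_json_path = joined_path.with_suffix(".json")

print(joined_str)             # /home/alice/projects/myapp/config.yaml
print(joined_path)            # /home/alice/projects/myapp/config.yaml
print(as_json_str)            # /home/alice/projects/myapp/config.json
print(as_json_path)           # /home/alice/projects/myapp/config.json
print(joined_path.parts[-2])  # myapp
```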

:::note When os.path Still Makes Sense os.path is still useful when you are working with code that passes around plain strings, integrating with legacy APIs that only accept strings, or writing library code that must work without importing pathlib. :::

Part 2 - Current Working Directory

import os

# Get the current working directory
cwd = os.getcwd()
print(cwd)  # /Users/alice/projects/myapp

# Change the working directory
os.chdir("/tmp")
print(os.getcwd())  # /tmp

# Change back
os.chdir(cwd)
print(os.getcwd())  # /Users/alice/projects/myapp

:::danger os.chdir is a Code Smell os.chdir() mutates the process-wide working directory. If any other thread or code calls os.getcwd() after your chdir, it sees the new directory. This causes hard-to-debug race conditions in multithreaded applications.

The correct pattern is to build absolute paths with os.path.join(base, filename) or Path(base) / filename rather than changing directories. Reserve os.chdir() for short scripts where you control the entire process. :::
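If a short script really must change directory, a context manager at least guarantees the original directory comes back even when an exception is raised. Python 3.11+ ships contextlib.chdir() for exactly this; the sketch below uses a hypothetical helper named working_directory to show the same idea on older versions. It still mutates process-wide state, so it does not make chdir safe in multithreaded code.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily chdir into path, restoring the previous cwd on exit.

    Hypothetical helper; on Python 3.11+ contextlib.chdir(path) does the same.
    Other threads still see the changed cwd while the block runs.
    """
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # runs even if the block raised


if __name__ == "__main__":
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        with working_directory(tmp):
            print("inside:", os.getcwd())
        print("restored:", os.getcwd() == start)  # restored: True
```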

The Safe Pattern

import os
from pathlib import Path

# BAD: changing the global working directory
def process_files(directory):
    os.chdir(directory)           # global mutation - dangerous
    for f in os.listdir("."):
        process(f)

# GOOD: build absolute paths, never change directory
def process_files(directory):
    base = Path(directory).resolve()
    for f in base.iterdir():
        process(f)                # f is an absolute Path - safe

Part 3 - Listing Directory Contents

`os.listdir()`: Simple but Dumb

import os

entries = os.listdir("/tmp")
print(entries)
# ['file1.txt', 'file2.log', 'subdir', '.hidden']
# Returns: list of strings, names only, no metadata

os.listdir() returns a plain list of names. To get file type or size, you must make a separate os.stat() call for each entry - which means one system call per file.

`os.scandir()`: Faster and Smarter

import os

# scandir returns DirEntry objects - already have type and stat info
with os.scandir("/tmp") as entries:
    for entry in entries:
        print(f"{entry.name:30} is_file={entry.is_file()} is_dir={entry.is_dir()}")
        # file1.txt                      is_file=True  is_dir=False
        # subdir                         is_file=False is_dir=True

os.scandir() returns DirEntry objects that cache the file type information from the OS directory listing. On most filesystems, this means zero extra system calls to determine is_file() and is_dir().

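For a directory with 1,000 files, where you need to know which entries are regular files:

os.listdir() + os.path.isfile() per entry: a handful of directory-read syscalls (getdents64 on Linux, which returns entries in batches) + 1,000 stat syscalls
os.scandir(): the same directory-read syscalls and, on most Linux filesystems, no stat calls at all, because DirEntry reuses the d_type field of each struct dirent (filesystems that report DT_UNKNOWN force a stat fallback)

How much faster this is depends on the OS and filesystem. When os.walk() was switched to scandir in Python 3.5, the release notes reported it becoming roughly 3-5x faster on POSIX systems and 7-20x faster on Windows.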
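To see that the two approaches give the same answer while doing different amounts of work, here is a sketch that splits a directory into files and subdirectories both ways. The function names are illustrative. Only the scandir version can reuse the type information from the directory listing; the listdir version needs a stat-based os.path.isfile() check per entry.

```python
import os
import tempfile


def split_with_listdir(directory):
    """Names only from listdir, then one stat-based check per entry."""
    files, dirs = [], []
    for name in os.listdir(directory):
        full = os.path.join(directory, name)
        if os.path.isfile(full):        # one stat() syscall per entry
            files.append(name)
        elif os.path.isdir(full):       # possibly a second stat() syscall
            dirs.append(name)
    return sorted(files), sorted(dirs)


def split_with_scandir(directory):
    """DirEntry.is_file()/is_dir() usually answer from the cached d_type."""
    files, dirs = [], []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                files.append(entry.name)
            elif entry.is_dir():
                dirs.append(entry.name)
    return sorted(files), sorted(dirs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.txt", "b.log"):
            open(os.path.join(tmp, name), "w").close()
        os.mkdir(os.path.join(tmp, "sub"))
        print(split_with_listdir(tmp))  # (['a.txt', 'b.log'], ['sub'])
        print(split_with_scandir(tmp))  # (['a.txt', 'b.log'], ['sub'])
```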

Practical `os.scandir()` Usage

import os

def list_python_files(directory):
    """List all Python files in a directory (non-recursive)."""
    py_files = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(".py"):
                st = entry.stat()             # cached on the DirEntry after first call
                py_files.append({
                    "name": entry.name,
                    "path": entry.path,       # directory joined with name; absolute only if directory was
                    "size": st.st_size,
                    "modified": st.st_mtime,
                })
    return sorted(py_files, key=lambda x: x["name"])

# Usage
files = list_python_files("/Users/alice/myproject")
for f in files:
    print(f"{f['name']:30} {f['size']:8} bytes")

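:::tip DirEntry Attributes A DirEntry object has: name (filename), path (the scandir() argument joined with name), is_file(), is_dir(), is_symlink(), and stat(). is_file(), is_dir(), and stat() follow symlinks by default; pass follow_symlinks=False to inspect the link itself. stat() is cached on the DirEntry after the first call: on Windows it usually needs no extra system call, while on Unix the first call costs one stat syscall. :::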

Part 4 - Recursive Directory Traversal with `os.walk()`

os.walk() is one of the most useful functions in Python's standard library. It generates (dirpath, dirnames, filenames) tuples for every directory in a tree.

import os

# Basic traversal
for dirpath, dirnames, filenames in os.walk("/Users/alice/projects"):
    print(f"DIR:   {dirpath}")
    for fname in filenames:
        print(f"  FILE: {os.path.join(dirpath, fname)}")

What `os.walk()` Yields: Traversal Order

/project/
├── main.py
├── config.yaml
└── src/
    ├── models.py
    └── utils/
        └── helpers.py

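topdown=True (default) yields each directory before its subdirectories, root first. The order of names inside each list is whatever the OS returns, so do not rely on it: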

("/project", ["src"], ["main.py", "config.yaml"])
("/project/src", ["utils"], ["models.py"])
("/project/src/utils", [], ["helpers.py"])

topdown=False yields deepest-first:

("/project/src/utils", [], ["helpers.py"])
("/project/src", ["utils"], ["models.py"])
("/project", ["src"], ["main.py", "config.yaml"])
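You can reproduce both orders with a throwaway copy of the tree above. This sketch builds it under a temporary directory and records paths relative to the root; sorted() is applied to each listing because the order of names within a single directory is not guaranteed. The helper names are illustrative.

```python
import os
import tempfile


def build_sample_tree(root):
    """Create the /project layout from the diagram above under root."""
    os.makedirs(os.path.join(root, "src", "utils"))
    for rel in ("main.py", "config.yaml", "src/models.py", "src/utils/helpers.py"):
        with open(os.path.join(root, *rel.split("/")), "w"):
            pass  # create an empty file


def walk_order(root, topdown):
    """Return (relative_dir, sorted_dirnames, sorted_filenames) in walk order."""
    order = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=topdown):
        rel = os.path.relpath(dirpath, root).replace(os.sep, "/")
        order.append((rel, sorted(dirnames), sorted(filenames)))
    return order


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_sample_tree(tmp)
        for row in walk_order(tmp, topdown=True):
            print(row)
        # ('.', ['src'], ['config.yaml', 'main.py'])
        # ('src', ['utils'], ['models.py'])
        # ('src/utils', [], ['helpers.py'])
        print([rel for rel, _, _ in walk_order(tmp, topdown=False)])
        # ['src/utils', 'src', '.']
```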

Controlling Traversal: Pruning Subdirectories

With topdown=True, you can modify dirnames in-place to skip directories:

import os

def find_python_files(root, skip_dirs=None):
    """
    Recursively find all .py files, skipping specified directories.
    Modifying dirnames in-place prunes the traversal - no wasted work.
    """
    if skip_dirs is None:  # an explicit empty set means "skip nothing"
        skip_dirs = {".git", "__pycache__", ".venv", "node_modules"}
    py_files = []

    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Prune: remove directories we don't want to descend into
        # Must modify in-place (slice assignment), not reassign
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]

        for fname in filenames:
            if fname.endswith(".py"):
                full_path = os.path.join(dirpath, fname)
                py_files.append(full_path)

    return py_files

# Find all Python files in a project
files = find_python_files("/Users/alice/projects/myapp")
for f in files:
    print(f)
# /Users/alice/projects/myapp/main.py
# /Users/alice/projects/myapp/src/models.py
# /Users/alice/projects/myapp/src/utils/helpers.py

:::warning Modifying dirnames In-Place Use dirnames[:] = [...] (slice assignment), not dirnames = [...] (rebinding). Slice assignment modifies the original list object that os.walk holds a reference to. Rebinding creates a new list and leaves the original untouched - so os.walk still descends into all directories. :::
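The difference is easy to observe. This sketch (the helper name is illustrative) walks a temporary tree containing a node_modules directory twice, once rebinding dirnames and once with slice assignment, and records which directories were visited.

```python
import os
import tempfile


def visited_dirs(root, prune_in_place):
    """Walk root, trying to skip 'node_modules'; return visited relative dirs."""
    seen = []
    for dirpath, dirnames, _filenames in os.walk(root):
        seen.append(os.path.relpath(dirpath, root).replace(os.sep, "/"))
        kept = [d for d in dirnames if d != "node_modules"]
        if prune_in_place:
            dirnames[:] = kept   # mutates the list os.walk reads next
        else:
            dirnames = kept      # rebinds a local name; os.walk never sees it
    return sorted(seen)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "node_modules", "left-pad"))
        os.makedirs(os.path.join(tmp, "src"))
        print(visited_dirs(tmp, prune_in_place=False))
        # ['.', 'node_modules', 'node_modules/left-pad', 'src']
        print(visited_dirs(tmp, prune_in_place=True))
        # ['.', 'src']
```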

`topdown=False`: When You Need to Delete Directories

topdown=False yields deepest directories first. This is the correct mode for deleting directory trees - you must delete files before deleting their parent directory:

import os

def delete_empty_directories(root):
    """Remove all empty directories in a tree (bottom-up)."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
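        # dirnames was listed before the children were visited, so it can still
        # name subdirectories removed moments ago; re-check the disk instead.
        if not os.listdir(dirpath):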
            try:
                os.rmdir(dirpath)
                print(f"Removed empty dir: {dirpath}")
            except OSError as e:
                print(f"Could not remove {dirpath}: {e}")

Part 5 - File Metadata and Permissions

`os.stat()`: Everything About a File

import os
import stat
import time

info = os.stat("/etc/hosts")

print(f"Size:      {info.st_size} bytes")
print(f"Mode:      {oct(info.st_mode)}")         # e.g., 0o100644
print(f"UID:       {info.st_uid}")               # owner user ID
print(f"GID:       {info.st_gid}")               # owner group ID
print(f"Modified:  {time.ctime(info.st_mtime)}") # last modification time
print(f"Accessed:  {time.ctime(info.st_atime)}") # last access time
print(f"Changed:   {time.ctime(info.st_ctime)}") # metadata change time

Understanding Unix File Permissions

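The st_mode value 0o100644 packs three fields. Read its octal digits left to right: 10 is the file type (S_IFREG, a regular file; a directory shows up as 0o40755 and a symlink, via os.lstat(), as e.g. 0o120777), the next 0 holds the special bits (setuid = 4, setgid = 2, sticky = 1), and 644 are the user, group, and other permission digits.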

Permission bit values: 4 = read (r) · 2 = write (w) · 1 = execute (x) · 6 = rw- · 7 = rwx

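| Octal | Symbolic | Meaning |
|---|---|---|
| `0o644` | `rw-r--r--` | Owner rw, group r, other r - typical file |
| `0o755` | `rwxr-xr-x` | Owner rwx, group/other rx - executables and directories |
| `0o700` | `rwx------` | Owner only, private - e.g., the `~/.ssh` directory (private key files use `0o600`) |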
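Rather than reading octal digits by hand, the stat module can decode an st_mode value. This sketch (describe_mode is a hypothetical helper) works on literal mode integers, so it runs without touching the filesystem; in practice you would pass os.stat(path).st_mode, or os.lstat(path).st_mode to inspect a symlink itself.

```python
import stat


def describe_mode(mode):
    """Split an st_mode integer into file type, permission bits, and ls-style text."""
    if stat.S_ISDIR(mode):
        kind = "directory"
    elif stat.S_ISLNK(mode):
        kind = "symlink"
    elif stat.S_ISREG(mode):
        kind = "regular file"
    else:
        kind = "other"
    return kind, oct(stat.S_IMODE(mode)), stat.filemode(mode)


for mode in (0o100644, 0o40755, 0o120777, 0o100600):
    print(describe_mode(mode))
# ('regular file', '0o644', '-rw-r--r--')
# ('directory', '0o755', 'drwxr-xr-x')
# ('symlink', '0o777', 'lrwxrwxrwx')
# ('regular file', '0o600', '-rw-------')
```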

`os.chmod()`: Changing Permissions

import os
import stat

# Make a file executable
os.chmod("deploy.sh", 0o755)

# Make a private key file owner read/write only (typical for SSH private keys)
os.chmod("id_rsa", 0o600)

# Using stat constants (more readable)
os.chmod("script.py", stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR)
# stat.S_IRUSR = 0o400 (owner read)
# stat.S_IWUSR = 0o200 (owner write)
# stat.S_IXUSR = 0o100 (owner execute)
# Combined: 0o700

# Check current permissions
info = os.stat("deploy.sh")
permissions = stat.S_IMODE(info.st_mode)  # Extract permission bits only
print(oct(permissions))  # 0o755
print(bool(permissions & stat.S_IXUSR))   # True - owner can execute
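Passing an absolute mode to os.chmod() overwrites every permission bit. When you only want to change some bits, read the current mode, adjust it with bitwise operators, and write it back. A sketch, with a hypothetical helper name, that removes group and other write permission while leaving everything else alone (POSIX semantics; on Windows os.chmod() only toggles the read-only flag):

```python
import os
import stat
import tempfile


def remove_group_other_write(path):
    """Clear the group-write and other-write bits, keep all other bits."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = current & ~(stat.S_IWGRP | stat.S_IWOTH)
    os.chmod(path, new_mode)
    return new_mode


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        path = tmp.name
    try:
        os.chmod(path, 0o666)                        # rw-rw-rw-
        print(oct(remove_group_other_write(path)))   # 0o644 on POSIX
    finally:
        os.remove(path)
```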

Practical: Audit Files With Insecure Permissions

import os
import stat

def find_world_writable(directory):
    """Find files that are writable by anyone - a security risk."""
    risky_files = []
    for dirpath, dirnames, filenames in os.walk(directory):
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for fname in filenames:
            fpath = os.path.join(dirpath, fname)
            try:
                mode = os.stat(fpath).st_mode
                if mode & stat.S_IWOTH:  # World-writable bit
                    risky_files.append(fpath)
            except (PermissionError, FileNotFoundError):  # unreadable, or a broken symlink
                pass
    return risky_files

# Usage
risky = find_world_writable("/var/www/html")
for path in risky:
    print(f"RISKY: {path}")

Part 6 - File System Operations

Creating Directories

import os

# Create a single directory
os.mkdir("/tmp/mydir")           # Fails if parent doesn't exist

# Create nested directories (like mkdir -p)
os.makedirs("/tmp/a/b/c")        # Creates all intermediate dirs
os.makedirs("/tmp/a/b/c", exist_ok=True)  # No error if already exists

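:::tip Always Use exist_ok=True In production code, always use os.makedirs(path, exist_ok=True). Without it, makedirs raises FileExistsError whenever the directory already exists, which tempts people to guard the call with an os.path.exists() check first - and that check-then-create sequence is a classic TOCTOU (time-of-check-to-time-of-use) race if another process or thread creates the directory in between. exist_ok=True makes the call idempotent, although it still raises FileExistsError if the path exists as a regular file. :::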

Renaming and Moving Files

import os

# Rename/move within the same filesystem - atomic on POSIX.
# On Windows os.rename raises if the destination exists; os.replace() overwrites everywhere.
os.rename("/tmp/old_name.txt", "/tmp/new_name.txt")

# For cross-filesystem moves, use shutil.move() instead
import shutil
shutil.move("/tmp/file.txt", "/mnt/storage/file.txt")
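Because a same-filesystem rename is atomic on POSIX, it is the usual building block for writing files safely: write to a temporary file in the destination directory, then rename it over the target, so readers see either the old file or the new one, never a half-written one. A minimal sketch, assuming a hypothetical helper named atomic_write_text; it uses os.replace(), which, unlike os.rename(), also overwrites an existing destination on Windows (where the atomicity guarantees are weaker).

```python
import os
import tempfile


def atomic_write_text(path, text, encoding="utf-8"):
    """Write text to path so readers never observe a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the final rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())      # push file data to disk before the rename
        os.replace(tmp_path, path)    # atomic on POSIX; overwrites on Windows too
    except BaseException:
        # Remove the temp file if anything failed before the rename
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "settings.json")
        atomic_write_text(target, '{"mode": "old"}')
        atomic_write_text(target, '{"mode": "new"}')
        with open(target, encoding="utf-8") as f:
            print(f.read())  # {"mode": "new"}
```

Two caveats of this sketch: mkstemp() creates the file with mode 0o600, so chmod it afterwards if other users need to read it, and fully durable writes on POSIX usually also fsync the containing directory after the rename.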

Removing Files and Directories

import os

os.remove("file.txt")           # Remove a file (raises if directory)
os.unlink("file.txt")           # Alias for os.remove

os.rmdir("empty_dir")           # Remove EMPTY directory only

# For non-empty directories:
import shutil
shutil.rmtree("non_empty_dir")  # USE WITH CAUTION - no recycle bin

:::danger shutil.rmtree is Permanent shutil.rmtree() deletes the directory and all its contents permanently - there is no recycle bin or undo. Always double-check the path. A classic catastrophic bug is building the path from a variable that turns out to be empty or unset, so that something like shutil.rmtree(base_dir + suffix) ends up pointing at / or at the home directory. Validate the resolved path before deleting, and give destructive scripts a dry-run mode. :::
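One way to act on that advice is a small guard in front of shutil.rmtree(). The sketch below (safe_rmtree and its rules are hypothetical, not a standard API) resolves the path, refuses the filesystem root, the home directory, and anything not strictly inside an allowed base directory, and defaults to a dry run that only reports what would be deleted.

```python
import os
import shutil
import tempfile
from pathlib import Path


def safe_rmtree(target, allowed_base, dry_run=True):
    """Delete directory target only if it lies strictly inside allowed_base.

    Hypothetical helper. Returns the number of entries (target included)
    that would be or were removed.
    """
    target_path = Path(target).resolve()
    base_path = Path(allowed_base).resolve()

    # Refuse the filesystem root and the user's home directory outright
    forbidden = {Path(target_path.anchor), Path.home().resolve()}
    if target_path in forbidden:
        raise ValueError(f"refusing to delete {target_path}")
    if base_path not in target_path.parents:
        raise ValueError(f"{target_path} is not inside {base_path}")
    if not target_path.is_dir():
        raise ValueError(f"{target_path} is not an existing directory")

    # os.walk does not descend into directory symlinks by default
    count = 1 + sum(len(d) + len(f) for _, d, f in os.walk(target_path))
    if dry_run:
        print(f"[dry run] would delete {target_path} ({count} entries)")
    else:
        shutil.rmtree(target_path)
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        build = os.path.join(base, "build")
        os.makedirs(os.path.join(build, "cache"))
        safe_rmtree(build, allowed_base=base)                # report only
        safe_rmtree(build, allowed_base=base, dry_run=False)
        print(os.path.exists(build))                         # False
```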

Part 7 - Process Information

import os

# Current process ID
pid = os.getpid()
print(f"This process ID: {pid}")    # e.g., 12345

# Parent process ID
ppid = os.getppid()
print(f"Parent process ID: {ppid}") # e.g., 12300

# System info
cpus = os.cpu_count()
print(f"CPU cores: {cpus}")         # e.g., 8

# System load average (Unix only - not available on Windows)
try:
    load = os.getloadavg()
    print(f"Load avg (1m, 5m, 15m): {load}")  # e.g., (1.5, 1.2, 0.9)
except (AttributeError, OSError):  # missing on Windows; OSError if unobtainable
    print("getloadavg not available on this platform")

Why Process IDs Matter

import os

# Writing PID files (used by daemons to prevent duplicate instances)
pid_file = "/var/run/myapp.pid"

def write_pid_file():
    with open(pid_file, "w") as f:
        f.write(str(os.getpid()))

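def check_running():
    """Return True if the PID in pid_file belongs to a live process (POSIX only)."""
    try:
        with open(pid_file) as f:
            old_pid = int(f.read().strip())
        # Signal 0 = existence check, nothing is delivered. POSIX only: on Windows,
        # os.kill() with signal 0 terminates the target process.
        os.kill(old_pid, 0)
        return True           # Process exists
    except (FileNotFoundError, ValueError, ProcessLookupError):
        return False          # No file, garbage contents, or stale PID
    except PermissionError:
        return True           # Process exists but we can't signal it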

# Common in web servers, background workers, schedulers

Part 8 - Environment Variables

import os

# Read environment variables
path = os.environ["PATH"]          # KeyError if missing
home = os.environ.get("HOME")      # None if missing
port = os.environ.get("PORT", "8080")  # Default value

# Set environment variable (affects current process and future child processes)
os.environ["MY_APP_MODE"] = "production"

# Delete an environment variable
del os.environ["TEMP_VAR"]         # KeyError if missing
# or
os.environ.pop("TEMP_VAR", None)   # Safe - no error if missing

# Get all environment variables as a dict
env_dict = dict(os.environ)
for key, value in sorted(env_dict.items()):
    print(f"{key}={value}")
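To return to the leak from the introduction: when only one child process needs a variable, you can avoid mutating os.environ at all by passing an explicit env mapping to subprocess.run(). A sketch (MY_SECRET is just an example name); it uses sys.executable so the child is the same Python interpreter, and it copies os.environ so the child keeps PATH and other variables it may need.

```python
import os
import subprocess
import sys

child_code = "import os; print(os.environ.get('MY_SECRET'))"

# Copy the current environment and add the variable for this one child only
child_env = {**os.environ, "MY_SECRET": "hunter2"}
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=child_env, capture_output=True, text=True, check=True,
)
print(result.stdout.strip())       # hunter2
print("MY_SECRET" in os.environ)   # False, assuming it was not already set
```

Values in the mapping must be strings, just as with os.environ itself.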

:::note Full Coverage in Next Topic Environment variables have their own dedicated topic (06-Environment-Variables) covering the 12-factor app pattern, python-dotenv, Pydantic Settings, and security practices. This section covers just the os module mechanics. :::

Part 9 - `os.urandom()`: Cryptographically Secure Random Bytes

import os

# Generate 16 bytes of cryptographically secure random data
random_bytes = os.urandom(16)
print(random_bytes)               # b'\x8f\xc3\xb2...' (16 random bytes)
print(len(random_bytes))          # 16

# Generate a secure token (common for session IDs, CSRF tokens)
import secrets  # Python 3.6+ preferred API wrapping os.urandom
token = secrets.token_hex(32)     # 64-character hex string
print(token)                      # e.g., "a3f8c9d1e2..."

api_key = secrets.token_urlsafe(32)  # URL-safe base64
print(api_key)                       # e.g., "wI4Qp8..."

:::note os.urandom vs random os.urandom() reads from the operating system's cryptographically secure random number generator (the getrandom() syscall on modern Linux, /dev/urandom on other Unix systems, BCryptGenRandom on Windows). The random module is not cryptographically secure - never use random to generate passwords, tokens, or keys. Use os.urandom() directly or the secrets module (which wraps it with a friendlier API). :::

Part 10 - `os.system()` vs `subprocess.run()`

os.system() is one of those functions that exists in Python and should essentially never be used in production code.

Why `os.system()` Is Dangerous

import os

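# os.system - DO NOT USE
filename = "report 2024.pdf"
os.system(f"ls -la {filename}")
# The shell interprets this string: the space splits it into two arguments,
# "report" and "2024.pdf", so ls looks for two files that don't exist.
# If filename = "file.pdf; rm -rf /", the shell runs the second command too: shell injection!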

# Worse: no way to capture output
# os.system returns only the exit code (0 = success)
ret = os.system("ls /tmp")   # Prints to stdout directly
print(ret)                   # 0 (success) - output is gone

With os.system(): user_input = "report.pdf; rm -rf ~" → os.system(f"open {user_input}") → shell executes open report.pdf; rm -rf ~ → deletes home directory.

With subprocess.run(["open", user_input]): arguments are passed as a list, never interpreted by the shell. No shell metacharacters (;, &&, |, >) are processed.
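The list form is easy to verify: a hostile-looking argument reaches the child process as one literal string. This sketch passes such a string to a tiny Python child (sys.executable stands in for any program) and also shows shlex.quote(), the standard-library escape hatch for the rare case where you genuinely need a POSIX shell command string.

```python
import shlex
import subprocess
import sys

user_input = "report.pdf; echo pwned"

# List form: no shell, so ';' is just a character inside one argument
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())   # report.pdf; echo pwned

# If you truly need a shell string (POSIX sh), quote each untrusted piece
command = "ls -l " + shlex.quote(user_input)
print(command)                 # ls -l 'report.pdf; echo pwned'
```

shlex.quote() targets POSIX shells; it does not produce safe quoting for Windows cmd.exe.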

The Correct Way: `subprocess.run()`

import subprocess

# Safe - arguments are a list, no shell injection possible
result = subprocess.run(
    ["ls", "-la", "/tmp"],
    capture_output=True,    # Capture stdout and stderr
    text=True,              # Decode bytes to str
    check=True              # Raise CalledProcessError if exit code != 0
)

print(result.stdout)        # The ls output as a string
print(result.returncode)    # 0

# Handling errors
try:
    result = subprocess.run(
        ["python3", "nonexistent.py"],
        capture_output=True,
        text=True,
        check=True
    )
except subprocess.CalledProcessError as e:
    print(f"Command failed with code {e.returncode}")
    print(f"stderr: {e.stderr}")

# Passing user input safely - no string formatting needed
user_filename = "report 2024.pdf"
result = subprocess.run(
    ["wc", "-l", user_filename],   # Each argument is separate
    capture_output=True, text=True
)
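check=True covers non-zero exit codes, but two other failures are common in practice: the program is not installed (FileNotFoundError) and the program hangs (handled with timeout=, which makes subprocess.run() kill the child and raise subprocess.TimeoutExpired). A sketch of a wrapper that folds all three into one result; run_tool and its (ok, message) return shape are hypothetical.

```python
import subprocess
import sys


def run_tool(args, timeout=30):
    """Run args (a list) and return (ok, stdout_or_error_message)."""
    try:
        result = subprocess.run(
            args, capture_output=True, text=True, check=True, timeout=timeout
        )
        return True, result.stdout
    except FileNotFoundError:
        return False, f"executable not found: {args[0]}"
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    except subprocess.CalledProcessError as e:
        return False, f"exit code {e.returncode}: {e.stderr.strip()}"


if __name__ == "__main__":
    print(run_tool([sys.executable, "-c", "print('hi')"]))
    # (True, 'hi\n')
    print(run_tool(["definitely-not-a-real-command-xyz"]))
    # (False, 'executable not found: definitely-not-a-real-command-xyz')
    print(run_tool([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5))
    # (False, 'timed out after 0.5s')
```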

Part 11 - Real-World: Build Tool Integration

Here is a complete script combining os.walk, os.stat, os.makedirs, and subprocess.run to build a project report:

import os
import subprocess
import json
from datetime import datetime, timezone

def analyze_project(root_dir, output_dir):
    """
    Analyze a Python project: count files, find large files,
    run pylint, and write a JSON report.
    """
    os.makedirs(output_dir, exist_ok=True)

    stats = {
        "root": root_dir,
        "analyzed_at": datetime.now(timezone.utc).isoformat(),  # utcnow() is deprecated since 3.12
        "file_count": 0,
        "total_size_bytes": 0,
        "large_files": [],       # Files over 100KB
        "file_types": {},
        "pylint_score": None,
    }

    skip_dirs = {".git", "__pycache__", ".venv", "node_modules", ".mypy_cache"}

    for dirpath, dirnames, filenames in os.walk(root_dir, topdown=True):
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]

        for fname in filenames:
            fpath = os.path.join(dirpath, fname)
            try:
                info = os.stat(fpath)
                size = info.st_size
                ext = os.path.splitext(fname)[1].lower() or "(no ext)"

                stats["file_count"] += 1
                stats["total_size_bytes"] += size
                stats["file_types"][ext] = stats["file_types"].get(ext, 0) + 1

                if size > 100_000:  # 100KB
                    stats["large_files"].append({
                        "path": fpath,
                        "size_kb": round(size / 1024, 1),
                    })
            except (PermissionError, FileNotFoundError):  # unreadable, or a broken symlink
                pass

    # Run pylint on the project (safely, without shell=True)
    try:
        result = subprocess.run(
            ["python3", "-m", "pylint", root_dir, "--score=y"],
            capture_output=True,
            text=True,
            timeout=60
        )
        # Parse pylint score from last line: "Your code has been rated at 9.50/10"
        for line in result.stdout.splitlines():
            if "rated at" in line:
                score_part = line.split("rated at")[1].strip()
                stats["pylint_score"] = score_part.split("/")[0].strip()
    except (subprocess.TimeoutExpired, FileNotFoundError):
        stats["pylint_score"] = "unavailable"

    # Write report
    report_path = os.path.join(output_dir, "project_report.json")
    with open(report_path, "w", encoding="utf-8") as f:
        json.dump(stats, f, indent=2)

    print(f"Report written to {report_path}")
    print(f"Files analyzed: {stats['file_count']}")
    print(f"Total size: {stats['total_size_bytes'] / 1024:.1f} KB")
    return stats

# Run it
# analyze_project("/Users/alice/projects/myapp", "/tmp/reports")

Interview Questions

Q1: What is the difference between `os.listdir()` and `os.scandir()`, and when would you use each?

Answer: os.listdir() returns a plain list of filenames as strings. To determine file type (file vs directory) or size, you must make a separate os.stat() system call for each entry - O(n) additional syscalls.

os.scandir() returns DirEntry objects built from the directory listing the OS already produces. On Linux and macOS, readdir() supplies each entry's type in the d_type field of struct dirent; on Windows, FindFirstFile/FindNextFile supply the type plus size and timestamps. scandir exposes this as entry.is_file() and entry.is_dir() without additional syscalls (and on Windows, entry.stat() is free too). When os.walk() was switched to scandir in Python 3.5, it became roughly 3-5x faster on POSIX and 7-20x faster on Windows. Use scandir when you need to filter by type or access stat information. Use listdir only when you need a simple list of names and nothing else.

Q2: How does `os.walk()` allow you to prune directories, and why must you use slice assignment?

Answer: With topdown=True (the default), os.walk yields (dirpath, dirnames, filenames) and checks dirnames to decide which subdirectories to descend into. If you modify dirnames before the next iteration, os.walk respects the change.

The modification must be in-place using dirnames[:] = [...] (slice assignment). If you write dirnames = [...], you rebind the local variable to a new list object, but os.walk still holds a reference to the original list and will descend into all original directories. Slice assignment modifies the contents of the existing list object that both your code and os.walk share.

Q3: Why should you never use `os.system()` in production code?

Answer: Three reasons:

Shell injection: os.system() passes the command string to the shell for interpretation. User-controlled input in the string can contain shell metacharacters (;, &&, |, `) that execute arbitrary commands.
No output capture: os.system() writes stdout/stderr directly to the terminal and returns only the exit code. You cannot capture or process the output programmatically.
No timeout or structured error handling: os.system() blocks until the command finishes and reports failure only as a return code. subprocess.run() with check=True, timeout=N, and capture_output=True gives you exceptions on failure, output capture, and timeout protection. Use subprocess.run(["cmd", "arg1", "arg2"]) with a list of arguments - no shell interpretation, no injection risk.

Q4: What is the difference between `os.remove()` and `shutil.rmtree()`?

Answer: os.remove() (also os.unlink()) deletes a single file. Called on a directory it raises an OSError subclass (IsADirectoryError on Linux, PermissionError on macOS and Windows), and it raises FileNotFoundError if the file does not exist.

os.rmdir() deletes a single empty directory. It raises OSError if the directory contains any files or subdirectories.

shutil.rmtree() recursively deletes an entire directory tree - all files, subdirectories, and their contents. There is no confirmation, no recycle bin, and no undo. Always validate the path before calling it in production code.

Q5: What does `os.stat()` return, and what is the difference between `st_mtime`, `st_atime`, and `st_ctime`?

Answer: os.stat() returns a stat_result object with fields from the underlying POSIX stat(2) system call:

st_size: file size in bytes
st_mode: file type and permission bits (use stat.S_IMODE() to extract permissions)
st_uid, st_gid: owner user ID and group ID
st_mtime: modification time - when the file content was last changed
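st_atime: access time - when the file was last read (on Linux it is often updated lazily via the default relatime mount option, or not at all with noatime)
st_ctime: metadata change time on Unix (NOT creation time) - when permissions, owner, or link count changed (content writes update it too). On Windows it has historically held the creation time; Python 3.12 deprecates that use in favor of st_birthtime.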

The common confusion: on Unix/Linux, st_ctime is not creation time. Use st_mtime to detect file changes in build tools and cache invalidators.

Q6: How is `os.urandom()` different from the `random` module, and when must you use `os.urandom()`?

Answer: random is a pseudo-random number generator (Mersenne Twister), seeded from os.urandom() when the OS provides it and from the system time otherwise. It is statistically high-quality but not cryptographically secure - after observing 624 consecutive 32-bit outputs, an attacker can reconstruct the internal state and predict all future values.

os.urandom() reads from the OS cryptographically secure random number generator - the getrandom() syscall or /dev/urandom on Unix (backed by the kernel's CSPRNG, which is seeded from entropy sources such as interrupt timing and hardware RNGs), or BCryptGenRandom on Windows. The output is computationally infeasible to predict.

You must use os.urandom() (or the secrets module, which wraps it) for: session tokens, CSRF tokens, password reset links, API keys, encryption keys, nonces, and any value whose unpredictability has security implications. Use random only for simulations, games, and non-security random sampling.

Practice Challenges

Beginner - Directory File Counter

Write a function count_by_extension(directory) that returns a dictionary mapping each file extension (e.g., ".py", ".txt") to the number of files with that extension in the directory (non-recursive). Use os.scandir().

Solution

import os

def count_by_extension(directory):
    """
    Count files in a directory grouped by extension.
    Non-recursive. Uses os.scandir for efficiency.

    Args:
        directory: path to the directory to scan

    Returns:
        dict mapping extension -> count
        e.g., {'.py': 12, '.txt': 3, '(no ext)': 1}
    """
    counts = {}

    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                # os.path.splitext returns ('name', '.ext') or ('name', '')
                _, ext = os.path.splitext(entry.name)
                ext = ext.lower() if ext else "(no ext)"
                counts[ext] = counts.get(ext, 0) + 1

    return counts


# Demo
if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    result = count_by_extension(target)

    print(f"File types in {target}:")
    for ext, count in sorted(result.items(), key=lambda x: -x[1]):
        print(f"  {ext:15} {count:4} files")

# Example output for a Python project directory:
# File types in /Users/alice/myproject:
#   .py              47 files
#   .md               8 files
#   .yaml             3 files
#   (no ext)          2 files
#   .json             1 files

Intermediate - Recursive Duplicate Finder

Write a function find_duplicates(directory) that recursively scans a directory and returns a dict where keys are file sizes (in bytes) and values are lists of file paths that share that size. Include only sizes with more than one file - these are potential duplicates. Skip hidden directories and __pycache__.

Solution

import os
from collections import defaultdict

def find_duplicates(directory):
    """
    Find potential duplicate files by matching file size.
    Files with the same size are candidates for deduplication.
    (True deduplication requires content hashing - this is step 1.)

    Args:
        directory: root directory to scan recursively

    Returns:
        dict: {size_bytes: [list_of_paths]} for sizes with 2+ files
    """
    skip_dirs = {"__pycache__", ".git", ".venv", "node_modules", ".mypy_cache"}
    size_map = defaultdict(list)  # size -> [paths]

    for dirpath, dirnames, filenames in os.walk(directory, topdown=True):
        # Prune hidden dirs and known noisy dirs
        dirnames[:] = [
            d for d in dirnames
            if d not in skip_dirs and not d.startswith(".")
        ]

        for fname in filenames:
            if fname.startswith("."):
                continue  # Skip hidden files
            fpath = os.path.join(dirpath, fname)
            try:
                size = os.stat(fpath).st_size
                if size > 0:  # Skip empty files
                    size_map[size].append(fpath)
            except (PermissionError, FileNotFoundError):
                pass  # Skip inaccessible files

    # Keep only sizes with multiple files
    duplicates = {
        size: paths
        for size, paths in size_map.items()
        if len(paths) > 1
    }

    return duplicates


def report_duplicates(directory):
    """Print a human-readable duplicate report."""
    dupes = find_duplicates(directory)

    if not dupes:
        print("No potential duplicates found.")
        return

    total_wasted = 0
    print(f"Potential duplicates in {directory}:\n")

    for size, paths in sorted(dupes.items(), key=lambda x: -x[0]):
        size_kb = size / 1024
        wasted = size * (len(paths) - 1)  # Could save this many bytes
        total_wasted += wasted

        print(f"  Size: {size_kb:.1f} KB - {len(paths)} files")
        for path in paths:
            print(f"    {path}")
        print()

    print(f"Potential savings if deduplicated: {total_wasted / 1024:.1f} KB")


# Demo usage
# report_duplicates("/Users/alice/Downloads")

# Note: size matching is not conclusive - two files can have the same size
# but different contents. For reliable deduplication, hash the content:
import hashlib

def hash_file(path, chunk_size=65536):
    """Return the MD5 hex digest of a file's contents (fine for dedup, not for security)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def find_true_duplicates(directory):
    """Find files with identical content (two-pass: size then hash)."""
    # First pass: group by size (cheap)
    size_candidates = find_duplicates(directory)

    # Second pass: hash only the candidate files (expensive for large files)
    hash_map = defaultdict(list)
    for paths in size_candidates.values():
        for path in paths:
            try:
                digest = hash_file(path)
                hash_map[digest].append(path)
            except (PermissionError, FileNotFoundError):
                pass

    return {h: paths for h, paths in hash_map.items() if len(paths) > 1}
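MD5 is adequate for spotting accidental duplicates, but it is no longer collision-resistant. Someone who controls the files can craft two different files with the same MD5 digest. If the directory may hold untrusted uploads, a SHA-256 digest is the safer key for `hash_map`.

Below is a minimal sketch of a drop-in replacement for `hash_file`, assuming Python 3.8+ (for the walrus operator). `sha256_of_file` is a hypothetical name. The `hashlib.file_digest()` branch only runs on Python 3.11+, where that function exists.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    with open(path, "rb") as f:
        if hasattr(hashlib, "file_digest"):
            # Python 3.11+: hashlib does the chunked reading for us
            return hashlib.file_digest(f, "sha256").hexdigest()
        # Older Pythons: same chunked loop as hash_file, stronger algorithm
        h = hashlib.sha256()
        while chunk := f.read(chunk_size):
            h.update(chunk)
        return h.hexdigest()


# Usage: two files with identical content produce identical digests
with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, name) for name in ("a.txt", "b.txt")]
    for p in paths:
        with open(p, "wb") as f:
            f.write(b"same bytes")
    print(sha256_of_file(paths[0]) == sha256_of_file(paths[1]))  # True
```

Swapping it in only requires changing the `digest = hash_file(path)` line in `find_true_duplicates`.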

Advanced - Secure Deployment Script with Permission Auditing

Write a deploy_static_files(src_dir, dest_dir) function that:

Copies all non-hidden files from src_dir to dest_dir recursively using os.walk, os.makedirs, and shutil.copy2
After copying, audits each file and sets permissions: directories get 0o755, regular files get 0o644, files ending in .sh get 0o755
Detects any files that end up world-writable (stat.S_IWOTH) and raises a RuntimeError listing them
Returns a summary dict: {"copied": N, "permission_errors": [...]}

Solution

import os
import stat
import shutil
from pathlib import Path


class DeploymentError(RuntimeError):
    """Raised when deployment encounters security violations (a RuntimeError, as the task requires)."""


def deploy_static_files(src_dir, dest_dir):
    """
    Deploy static files from src_dir to dest_dir with secure permissions.

    Steps:
      1. Walk src_dir, skip hidden files/dirs
      2. Recreate directory structure in dest_dir
      3. Copy each file (preserving metadata with shutil.copy2)
      4. Set permissions: dirs=0o755, .sh files=0o755, others=0o644
      5. Audit for world-writable files and raise DeploymentError if found

    Args:
        src_dir: source directory path (str or Path)
        dest_dir: destination directory path (str or Path)

    Returns:
        dict: {"copied": int, "permission_errors": list[str]}

    Raises:
        DeploymentError: if any deployed file ends up world-writable
    """
    src_dir = str(src_dir)
    dest_dir = str(dest_dir)
    os.makedirs(dest_dir, exist_ok=True)

    summary = {"copied": 0, "permission_errors": []}
    skip_dirs = {".git", "__pycache__", ".venv", "node_modules"}

    # ── Phase 1: Copy files ───────────────────────────────────────────
    for dirpath, dirnames, filenames in os.walk(src_dir, topdown=True):
        # Skip hidden and noisy directories
        dirnames[:] = [
            d for d in dirnames
            if d not in skip_dirs and not d.startswith(".")
        ]

        # Compute the relative path from src_dir
        rel_path = os.path.relpath(dirpath, src_dir)
        dest_subdir = os.path.join(dest_dir, rel_path)
        os.makedirs(dest_subdir, exist_ok=True)

        for fname in filenames:
            if fname.startswith("."):
                continue  # Skip hidden files

            src_file = os.path.join(dirpath, fname)
            dest_file = os.path.join(dest_subdir, fname)

            try:
                shutil.copy2(src_file, dest_file)  # copy2 also copies permission bits and timestamps
                summary["copied"] += 1
            except PermissionError as e:
                summary["permission_errors"].append(f"copy failed: {src_file}: {e}")

    # ── Phase 2: Set permissions ──────────────────────────────────────
    for dirpath, dirnames, filenames in os.walk(dest_dir, topdown=True):
        # Set directory permissions
        try:
            os.chmod(dirpath, 0o755)
        except PermissionError as e:
            summary["permission_errors"].append(f"chmod dir failed: {dirpath}: {e}")

        for fname in filenames:
            fpath = os.path.join(dirpath, fname)
            # Shell scripts need execute bit; everything else gets 0o644
            target_mode = 0o755 if fname.endswith(".sh") else 0o644
            try:
                os.chmod(fpath, target_mode)
            except PermissionError as e:
                summary["permission_errors"].append(
                    f"chmod failed: {fpath}: {e}"
                )

    # ── Phase 3: Security audit ───────────────────────────────────────
    world_writable = []

    for dirpath, dirnames, filenames in os.walk(dest_dir):
        for fname in filenames:
            fpath = os.path.join(dirpath, fname)
            try:
                mode = os.stat(fpath).st_mode
                if mode & stat.S_IWOTH:  # world-writable bit set
                    world_writable.append(fpath)
            except PermissionError:
                pass

    if world_writable:
        file_list = "\n  ".join(world_writable)
        raise DeploymentError(
            f"Security violation: {len(world_writable)} world-writable files found "
            f"after deployment:\n  {file_list}"
        )

    print("Deployment complete:")
    print(f"  Files copied:   {summary['copied']}")
    print(f"  Perm errors:    {len(summary['permission_errors'])}")
    if summary["permission_errors"]:
        for err in summary["permission_errors"]:
            print(f"    WARNING: {err}")

    return summary


# Demo usage
if __name__ == "__main__":
    import tempfile

    # Set up a test source directory
    with tempfile.TemporaryDirectory() as src:
        # Create some files
        Path(src, "index.html").write_text("<h1>Hello</h1>")
        Path(src, "style.css").write_text("body { margin: 0; }")
        Path(src, "deploy.sh").write_text("#!/bin/bash\necho deploying")
        Path(src, "subdir").mkdir()
        Path(src, "subdir", "app.js").write_text("console.log('app');")

        with tempfile.TemporaryDirectory() as dest:
            result = deploy_static_files(src, dest)
            print(f"\nResult: {result}")

            # Verify permissions
            for dirpath, _, filenames in os.walk(dest):
                for fname in filenames:
                    fpath = os.path.join(dirpath, fname)
                    mode = stat.S_IMODE(os.stat(fpath).st_mode)
                    expected = 0o755 if fname.endswith(".sh") else 0o644
                    status = "OK" if mode == expected else "MISMATCH"
                    print(f"  [{status}] {fname}: {oct(mode)}")

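# Example output (file order within a directory depends on the filesystem;
# os.walk lists the top-level files before descending into subdir/):
# Deployment complete:
#   Files copied:   4
#   Perm errors:    0
#
# Result: {'copied': 4, 'permission_errors': []}
#   [OK] index.html: 0o644
#   [OK] style.css: 0o644
#   [OK] deploy.sh: 0o755
#   [OK] app.js: 0o644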
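Phase 2 is not just tidying up. `shutil.copy2()` copies the source file's permission bits along with its timestamps. A world-writable file in `src_dir` therefore arrives world-writable in `dest_dir` until the chmod pass fixes it.

The check below demonstrates that behavior. It is POSIX-only, and `copied_mode` is a hypothetical helper written just for this demonstration.

```python
import os
import shutil
import stat
import tempfile


def copied_mode(src_mode):
    """Copy a throwaway file with shutil.copy2 and return the copy's permission bits."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.txt")
        dst = os.path.join(tmp, "dst.txt")
        with open(src, "w") as f:
            f.write("data")
        os.chmod(src, src_mode)      # e.g. 0o666 = world-writable
        shutil.copy2(src, dst)       # copies content, then mode bits and timestamps
        return stat.S_IMODE(os.stat(dst).st_mode)


mode = copied_mode(0o666)
print(oct(mode))                     # 0o666 on POSIX systems
print(bool(mode & stat.S_IWOTH))     # True: the copy is world-writable too
```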

Quick Reference

Operation	Code	Notes
Current directory	`os.getcwd()`	Returns absolute path string
Change directory	`os.chdir(path)`	Avoid in production - mutates global state
List directory	`os.listdir(path)`	Returns list of name strings
Scan directory	`os.scandir(path)`	Iterator of `DirEntry` objects; type checks usually avoid extra syscalls
Walk recursively	`os.walk(path, topdown=True)`	Yields (dirpath, dirnames, filenames)
Prune walk	`dirnames[:] = [...]`	In-place slice assignment; requires `topdown=True`
File exists	`os.path.exists(path)`	Prefer `Path(p).exists()`
Is file	`os.path.isfile(path)`	Prefer `Path(p).is_file()`
Is directory	`os.path.isdir(path)`	Prefer `Path(p).is_dir()`
Join paths	`os.path.join(a, b, c)`	Prefer `Path(a) / b / c`
Split extension	`os.path.splitext(name)`	Returns `('root', '.ext')`; `splitext('a.tar.gz')` gives `('a.tar', '.gz')`
File metadata	`os.stat(path)`	Returns stat_result
File permissions	`stat.S_IMODE(os.stat(p).st_mode)`	Requires `import stat`
Set permissions	`os.chmod(path, 0o644)`	Octal mode
Create directory	`os.makedirs(path, exist_ok=True)`	Creates all intermediate dirs
Remove file	`os.remove(path)`	Single file only
Remove dir tree	`shutil.rmtree(path)`	Permanent - no undo
Rename/move	`os.rename(src, dst)`	Atomic on POSIX if same filesystem; `os.replace()` also overwrites on Windows
Current PID	`os.getpid()`	Integer process ID
Parent PID	`os.getppid()`	Integer parent process ID
CPU count	`os.cpu_count()`	Number of logical CPUs, or `None` if undetermined
System load	`os.getloadavg()`	Unix only: (1m, 5m, 15m) tuple
Secure random	`os.urandom(n)`	n bytes of cryptographic entropy
Run command	`subprocess.run([...], capture_output=True, text=True, check=True)`	Never use `os.system()`
Read env var	`os.environ.get("KEY", "default")`	Safe - returns default if missing
Set env var	`os.environ["KEY"] = "value"`	Affects current process and children
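The "atomic on POSIX if same filesystem" property is what makes the write-to-temp-then-rename pattern safe. Readers see either the old file or the complete new one, never a half-written file.

Below is a minimal sketch, assuming the target directory is writable. `atomic_write_text` is a hypothetical helper, not part of `os`. It uses `os.replace()` rather than `os.rename()` because `os.replace()` overwrites an existing destination on Windows as well, where `os.rename()` raises `FileExistsError`.

```python
import os
import tempfile


def atomic_write_text(path, text, encoding="utf-8"):
    """Replace path's contents so readers never observe a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem)
    # for the final rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())     # make sure the bytes reach the disk first
        os.replace(tmp_path, path)   # atomic swap; overwrites an existing file
    except BaseException:
        # Something failed before the swap: remove the temp file, then re-raise
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "config.json")
    atomic_write_text(target, '{"version": 1}')
    atomic_write_text(target, '{"version": 2}')
    with open(target, encoding="utf-8") as f:
        print(f.read())              # {"version": 2}
    print(os.listdir(tmp))           # ['config.json'] - no temp files left behind
```

One side effect is worth knowing. `tempfile.mkstemp()` creates the file with mode `0o600`, so the replaced file ends up owner-only. Call `os.chmod()` before `os.replace()` if it needs wider permissions.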

Key Takeaways

os is the thin wrapper around POSIX/Win32 system calls - it handles process info, permissions, environment variables, and filesystem operations that pathlib does not cover
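Use os.scandir() instead of os.listdir() whenever you need to know an entry's type - DirEntry.is_file() and DirEntry.is_dir() can usually answer from data returned by the directory read itself, avoiding one stat() call per entry (DirEntry.stat() still makes a system call on POSIX, though usually not on Windows)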
os.walk() with topdown=True lets you prune directory traversal by modifying dirnames[:] = [...] in-place; use topdown=False for bottom-up operations like directory deletion
os.chdir() mutates global process state - avoid it in library code; build absolute paths instead
os.chmod() and os.stat() give you full control over Unix file permissions; stat.S_IMODE() extracts the permission bits from the full mode value
Never use os.system() - it runs the command through a shell, so any interpolated input can inject extra commands, and it returns only an exit status rather than the command's output; use subprocess.run(["cmd", "arg"], capture_output=True, text=True, check=True) instead
os.urandom() provides cryptographically secure random bytes; the secrets module (Python 3.6+) wraps it with a friendlier API for tokens and keys
os.makedirs(path, exist_ok=True) is the safe way to create nested directories - it avoids the TOCTOU race in the check-then-create pattern (if not os.path.exists(p): os.makedirs(p)), because it does not fail when another process creates the directory first
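The opening example showed a secret set in `os.environ` leaking into a child process. `subprocess.run()` accepts an `env` mapping that replaces the inherited environment entirely. A defensive pattern is therefore to pass only the variables the child actually needs.

Below is a minimal POSIX-oriented sketch. `run_with_clean_env` is a hypothetical helper. On Windows, some children may also need `SystemRoot` in the mapping.

```python
import os
import subprocess
import sys


def run_with_clean_env(args, keep=("PATH",)):
    """Run args with an environment containing only the variable names in keep."""
    clean_env = {name: os.environ[name] for name in keep if name in os.environ}
    result = subprocess.run(
        args,
        capture_output=True,
        text=True,
        check=True,        # raise CalledProcessError on a non-zero exit code
        env=clean_env,     # replaces (does not extend) the inherited environment
    )
    return result.stdout.strip()


os.environ["MY_SECRET"] = "hunter2"
probe = [sys.executable, "-c", "import os; print(os.environ.get('MY_SECRET'))"]
print(run_with_clean_env(probe))                              # None
print(run_with_clean_env(probe, keep=("PATH", "MY_SECRET")))  # hunter2
```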
