Module 07 - File Handling and OS Interaction

Every production Python program interacts with the file system. Configuration files, logs, data pipelines, reports, model checkpoints - they all go through the file I/O stack. And yet, "reading a file" hides a surprising amount of engineering: buffering layers, encoding edge cases, atomicity guarantees, OS-level path abstractions, and serialization contracts.

This module makes those hidden layers visible.

What This Module Covers

Module 07 - File Handling and OS Interaction
│
├── Lesson 01: Reading Files
│   ├── open() - modes, encodings, buffering
│   ├── Text vs binary mode
│   ├── Reading large files without OOM
│   └── UnicodeDecodeError and encoding pitfalls
│
├── Lesson 02: Writing Files
│   ├── Write modes: 'w', 'a', 'x' (exclusive create)
│   ├── Atomic writes with temp files
│   ├── Flushing and fsync
│   └── Preventing data loss on crash
│
├── Lesson 03: Context Managers
│   ├── The with statement internals
│   ├── __enter__ / __exit__ protocol
│   ├── contextlib.contextmanager
│   └── Nested context managers
│
├── Lesson 04: Pathlib Deep Dive
│   ├── Path objects vs strings
│   ├── Cross-platform path construction
│   ├── Globbing, iterating, stat()
│   └── Why pathlib replaced os.path
│
├── Lesson 05: OS Module
│   ├── os.getcwd(), os.listdir(), os.stat()
│   ├── Process information: os.getpid(), os.getenv()
│   ├── File metadata and permissions
│   └── When to use os vs pathlib
│
├── Lesson 06: Environment Variables
│   ├── os.environ - reading, writing, defaults
│   ├── 12-factor app configuration
│   ├── .env files and python-dotenv
│   └── Secrets management patterns
│
├── Lesson 07: CSV Handling
│   ├── csv.reader / csv.writer
│   ├── csv.DictReader / csv.DictWriter
│   ├── Quoting, dialects, edge cases
│   └── Large CSV files without pandas
│
├── Lesson 08: JSON Handling
│   ├── json.load / json.dump
│   ├── Custom encoders and decoders
│   ├── JSON schema validation
│   └── Streaming large JSON
│
├── Lesson 09: Serialization Concepts
│   ├── pickle - when to use, when never to
│   ├── struct for binary data
│   ├── Protocol Buffers concept
│   └── Choosing the right serialization format
│
└── Lesson 10: Working with Directories
    ├── Creating, moving, deleting directories
    ├── shutil - copying and archiving
    ├── tempfile - safe temporary files
    └── Directory traversal with os.walk

Why This Module Is Not Optional

Consider what happens in a single FastAPI request handler for a document upload endpoint:

1. The request body (binary) is read from the network socket
2. It is written to a temporary file on disk
3. The temp file path is passed to a processing function
4. The result is written to an output path (atomically, so a crash doesn't leave a half-written file)
5. Metadata is written to a JSON sidecar file
6. The temp file is cleaned up

That is six distinct file I/O operations, each with its own failure mode. If you do not understand buffering, you might lose data. If you do not understand atomic writes, a power failure corrupts your output. If you do not understand encoding, a user whose name contains a non-ASCII character breaks your log pipeline.
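The six steps above can be sketched in plain Python, without the web framework. This is a minimal illustration rather than a reference implementation: `handle_upload`, `process`, and the `.partial` suffix are hypothetical names chosen for the example, and the "processing" is a stand-in that upper-cases the bytes.

```python
import json
import os
import tempfile
from pathlib import Path


def process(src: Path) -> bytes:
    """Hypothetical processing step: upper-cases the uploaded bytes."""
    return src.read_bytes().upper()


def handle_upload(body: bytes, out_path: Path) -> Path:
    """Store an upload, process it, and publish the result atomically."""
    out_path.parent.mkdir(parents=True, exist_ok=True)

    # Steps 1-2: spool the request body to a temporary file on disk.
    fd, tmp_name = tempfile.mkstemp(suffix=".upload")
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(body)

        # Step 3: hand the temp file path to the processing function.
        result = process(tmp_path)

        # Step 4: write next to the target, then rename over it.
        # os.replace() is atomic when both paths are on the same filesystem,
        # so readers see either the old file or the complete new one.
        partial = out_path.with_name(out_path.name + ".partial")
        with open(partial, "wb") as f:
            f.write(result)
            f.flush()             # Python's buffer -> operating system
            os.fsync(f.fileno())  # operating system cache -> storage device
        os.replace(partial, out_path)

        # Step 5: JSON sidecar with metadata about the result.
        sidecar = out_path.with_name(out_path.name + ".json")
        sidecar.write_text(json.dumps({"size": len(result)}), encoding="utf-8")
        return sidecar
    finally:
        # Step 6: remove the temp file even if an earlier step raised.
        tmp_path.unlink(missing_ok=True)
```

Note that only the output file is published atomically in this sketch. The sidecar is written directly, so a crash between steps 4 and 5 would leave a result without metadata; a stricter version would apply the same write-then-rename pattern to the sidecar.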

This module teaches you to do all of it correctly.

The Python I/O Stack

Before the individual lessons, it helps to understand the full stack that every open() call sits on top of:

Your Python code
       │
       │  open("file.txt", "r", encoding="utf-8")
       ▼
TextIOWrapper          ← handles encoding/decoding (str ↔ bytes)
       │
       ▼
BufferedReader         ← read buffer, commonly 8192 bytes (avoids a syscall per small read)
       │
       ▼
FileIO (RawIOBase)     ← thin wrapper around os.read() / os.write()
       │
       ▼
OS kernel              ← read() syscall → VFS → filesystem driver
       │
       ▼
Disk / NFS / tmpfs     ← actual storage
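
The top three layers of the diagram are ordinary Python objects and can be inspected on any file opened in text mode. The short sketch below assumes a throwaway temp file; the variable names are illustrative only.

```python
import io
import os
import tempfile

# Create a small UTF-8 file to open.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path, "w", encoding="utf-8") as f:
        f.write("héllo\n")

    with open(path, "r", encoding="utf-8") as f:
        text_layer = f             # decodes bytes into str
        buffered_layer = f.buffer  # holds the read buffer
        raw_layer = f.buffer.raw   # issues the actual OS-level reads
        layer_names = [
            type(layer).__name__
            for layer in (text_layer, buffered_layer, raw_layer)
        ]
        line = f.readline()
finally:
    os.remove(path)

print(layer_names)  # ['TextIOWrapper', 'BufferedReader', 'FileIO']

# Fallback buffer size used by the buffered layer; the exact value
# depends on the Python version and platform.
print(io.DEFAULT_BUFFER_SIZE)
```

Opening the same file with mode "rb" skips the text layer and returns the BufferedReader directly.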

Every lesson in this module touches some layer of this stack. By the end, you will understand why repeated small reads stay fast even on large files (the buffer serves most of them without a syscall), why you need f.flush() before os.fsync() (Python's buffer and the kernel's cache are two different buffer layers), and why passing encoding="utf-8" explicitly matters even on a Mac (the default comes from the platform's locale settings and is not always UTF-8).
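
Two of these points can be shown in a few lines. The sketch below is illustrative: `durable_write` is a hypothetical helper name, and cp1252 is picked only as an example of a non-UTF-8 encoding that a platform might default to.

```python
import os
import tempfile


def durable_write(path: str, text: str) -> None:
    """Write text and push it through both buffer layers before returning."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
        f.flush()             # layer 1: Python's buffer -> operating system
        os.fsync(f.fileno())  # layer 2: operating system cache -> storage


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    durable_write(path, "naïve café")
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

# The same bytes decode differently depending on the encoding assumed.
decoded_utf8 = raw.decode("utf-8")      # the original text
decoded_cp1252 = raw.decode("cp1252")   # garbled: each accented letter becomes two characters

print(len(raw))  # 12 bytes for 10 characters
print(decoded_utf8 == "naïve café")
print(decoded_cp1252 == decoded_utf8)
```

Calling os.fsync() without the preceding flush() would sync only what Python had already handed to the operating system, which may not include the most recent write.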

What You Will Be Able to Do

After this module you will:

- Read and write files correctly in both text and binary mode, choosing the right encoding and buffer size
- Write atomic file updates that survive process crashes without corrupting data
- Use pathlib.Path instead of os.path string manipulation everywhere
- Load and validate configuration from environment variables following the 12-factor pattern
- Parse CSV and JSON files including edge cases (quoted commas, custom encoders, large files)
- Choose the right serialization format for different use cases (JSON vs pickle vs struct vs protobuf)
- Safely create, copy, move, and delete files and directories
- Write production-grade code that cleans up temp files even when exceptions occur

These are not advanced topics - they are the everyday plumbing of backend Python engineering, and getting them right is what separates clean production code from fragile scripts.

Start with Lesson 01: Reading Files and work through the lessons in order. Each lesson is self-contained but builds on the I/O stack understanding introduced here.
