Module 07 - File Handling and OS Interaction
Every production Python program interacts with the file system. Configuration files, logs, data pipelines, reports, model checkpoints - they all go through the file I/O stack. And yet, "reading a file" hides a surprising amount of engineering: buffering layers, encoding edge cases, atomicity guarantees, OS-level path abstractions, and serialization contracts.
This module makes those hidden layers visible.
What This Module Covers
Module 07 - File Handling and OS Interaction
│
├── Lesson 01: Reading Files
│ ├── open() - modes, encodings, buffering
│ ├── Text vs binary mode
│ ├── Reading large files without OOM
│ └── UnicodeDecodeError and encoding pitfalls
│
├── Lesson 02: Writing Files
│ ├── Write modes: 'w', 'a', 'x' (exclusive create)
│ ├── Atomic writes with temp files
│ ├── Flushing and fsync
│ └── Preventing data loss on crash
│
├── Lesson 03: Context Managers
│ ├── The with statement internals
│ ├── __enter__ / __exit__ protocol
│ ├── contextlib.contextmanager
│ └── Nested context managers
│
├── Lesson 04: Pathlib Deep Dive
│ ├── Path objects vs strings
│ ├── Cross-platform path construction
│ ├── Globbing, iterating, stat()
│ └── Why pathlib replaced os.path
│
├── Lesson 05: OS Module
│ ├── os.getcwd(), os.listdir(), os.stat()
│ ├── Process information: os.getpid(), os.getenv()
│ ├── File metadata and permissions
│ └── When to use os vs pathlib
│
├── Lesson 06: Environment Variables
│ ├── os.environ - reading, writing, defaults
│ ├── 12-factor app configuration
│ ├── .env files and python-dotenv
│ └── Secrets management patterns
│
├── Lesson 07: CSV Handling
│ ├── csv.reader / csv.writer
│ ├── csv.DictReader / csv.DictWriter
│ ├── Quoting, dialects, edge cases
│ └── Large CSV files without pandas
│
├── Lesson 08: JSON Handling
│ ├── json.load / json.dump
│ ├── Custom encoders and decoders
│ ├── JSON schema validation
│ └── Streaming large JSON
│
├── Lesson 09: Serialization Concepts
│ ├── pickle - when to use, when never to
│ ├── struct for binary data
│ ├── Protocol Buffers concept
│ └── Choosing the right serialization format
│
└── Lesson 10: Working with Directories
  ├── Creating, moving, deleting directories
  ├── shutil - copying and archiving
  ├── tempfile - safe temporary files
  └── Directory traversal with os.walk
Why This Module Is Not Optional
Consider what happens in a single FastAPI request handler for a document upload endpoint:
- The request body (binary) is read from the network socket
- It is written to a temporary file on disk
- The temp file path is passed to a processing function
- The result is written to an output path (atomically, so a crash doesn't leave a half-written file)
- Metadata is written to a JSON sidecar file
- The temp file is cleaned up
That is six distinct I/O steps, each with its own failure mode. If you do not understand buffering, you might lose data that was written but never flushed. If you do not understand atomic writes, a crash or power failure can leave a corrupted, half-written output file. If you do not understand encoding, a user whose name contains a non-ASCII character breaks your log pipeline.
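The file-system half of that handler (steps 2 to 6) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the way FastAPI itself handles uploads: `process`, `atomic_write_bytes`, `handle_upload`, and the `.meta.json` sidecar name are hypothetical, and the request body is assumed to have already been read into `bytes`.

```python
import json
import os
import tempfile
from pathlib import Path


def process(src: Path) -> bytes:
    """Hypothetical processing step: upper-case the uploaded bytes."""
    return src.read_bytes().upper()


def atomic_write_bytes(dest: Path, data: bytes) -> None:
    """Write data so readers see either the old file or the complete new one."""
    # The temp file lives in the destination directory so that os.replace()
    # never has to cross a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=dest.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # Python's buffer -> OS page cache
            os.fsync(f.fileno())  # OS page cache -> storage device
        os.replace(tmp_name, dest)  # atomic rename over the destination
    except BaseException:
        # Something failed before the rename: remove the partial temp file.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def handle_upload(body: bytes, out_dir: Path, name: str) -> Path:
    out_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(suffix=".upload")
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "wb") as f:       # step 2: spool the body to disk
            f.write(body)
        result = process(tmp_path)           # step 3: hand the path to processing
        dest = out_dir / name
        atomic_write_bytes(dest, result)     # step 4: atomic output write
        meta = {"name": name, "input_bytes": len(body), "output_bytes": len(result)}
        atomic_write_bytes(                  # step 5: JSON sidecar
            dest.with_name(dest.name + ".meta.json"),
            json.dumps(meta, ensure_ascii=False).encode("utf-8"),
        )
        return dest
    finally:
        tmp_path.unlink(missing_ok=True)     # step 6: cleanup, even on exceptions
```

Calling `handle_upload(b"hello", Path("out"), "doc.txt")` would leave `out/doc.txt` and `out/doc.txt.meta.json` behind and nothing else. The sketch does not fsync the containing directory, which stricter durability requirements may also call for.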
This module teaches you to do all of it correctly.
The Python I/O Stack
Before the individual lessons, it helps to understand the full stack that every open() call sits on top of:
Your Python code
│
│ open("file.txt", "r", encoding="utf-8")
▼
TextIOWrapper ← handles encoding/decoding (str ↔ bytes)
│
▼
BufferedReader ← read buffer, commonly 8192 bytes (see io.DEFAULT_BUFFER_SIZE); avoids a syscall per small read
│
▼
FileIO (RawIOBase) ← thin wrapper around os.read() / os.write()
│
▼
OS kernel ← read() syscall → VFS → filesystem driver
│
▼
Disk / NFS / tmpfs ← actual storage
Every lesson in this module touches some layer of this stack. By the end, you will understand why reading a large file line by line stays fast (buffering turns many small reads into a few large syscalls), why you need f.flush() before os.fsync() (two different buffer layers: Python's and the kernel's), and why encoding="utf-8" matters even on a Mac (the default comes from the locale and is not always UTF-8).
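The layers in the diagram can be inspected directly on a file object. The following is a small sketch for exploration; the file name and sample text are arbitrary, and the buffer size printed at the end depends on the Python version and platform.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file.txt")

    with open(path, "w", encoding="utf-8") as f:
        f.write("na\u00efve caf\u00e9")  # 10 characters, two of them non-ASCII
        f.flush()                        # Python's buffer -> OS page cache
        os.fsync(f.fileno())             # OS page cache -> storage device

    with open(path, "r", encoding="utf-8") as f:
        # Walk down the stack: text layer -> buffered layer -> raw layer.
        layers = [
            type(f).__name__,
            type(f.buffer).__name__,
            type(f.buffer.raw).__name__,
        ]
        text = f.read()

    with open(path, "rb") as f:
        raw_bytes = f.read()

print(layers)                     # ['TextIOWrapper', 'BufferedReader', 'FileIO']
print(len(text), len(raw_bytes))  # 10 12
print(io.DEFAULT_BUFFER_SIZE)     # value depends on the Python version
```

The character count and byte count differ because each of the two accented letters takes two bytes in UTF-8, which is exactly the str ↔ bytes translation that TextIOWrapper performs.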
What You Will Be Able to Do
After this module you will:
- Read and write files correctly in both text and binary mode, choosing the right encoding and buffer size
- Write atomic file updates that survive process crashes without corrupting data
- Use pathlib.Path instead of os.path string manipulation everywhere
- Load and validate configuration from environment variables following the 12-factor pattern
- Parse CSV and JSON files including edge cases (quoted commas, custom encoders, large files)
- Choose the right serialization format for different use cases (JSON vs pickle vs struct vs protobuf)
- Safely create, copy, move, and delete files and directories
- Write production-grade code that cleans up temp files even when exceptions occur
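As a small taste of three of these outcomes, here is a hedged sketch. The path segments, the `DEMO_APP_PORT` variable name, and the sample CSV text are all made up for illustration.

```python
import csv
import io
import os
from pathlib import Path

# Path objects compose with "/" and expose their parts as attributes,
# with no manual string splitting or joining.
config_path = Path("etc") / "app" / "settings.json"  # hypothetical path
print(config_path.name, config_path.suffix)          # settings.json .json

# 12-factor style: read configuration from the environment, with a default,
# and convert it to the expected type at the boundary.
port = int(os.environ.get("DEMO_APP_PORT", "8000"))  # hypothetical variable

# The csv module handles quoted fields that contain the delimiter.
sample = 'name,city\n"Doe, Jane","Z\u00fcrich"\n'
rows = list(csv.reader(io.StringIO(sample)))
print(rows[1])  # ['Doe, Jane', 'Zürich']
```

Splitting the same CSV line on commas by hand would produce three fields instead of two, which is the kind of edge case Lesson 07 covers.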
These are not advanced topics - they are the everyday plumbing of backend Python engineering, and getting them right is what separates clean production code from fragile scripts.
Start with Lesson 01: Reading Files and work through the lessons in order. Each lesson is self-contained but builds on the I/O stack understanding introduced here.
