Module 06 - Error Handling and Defensive Engineering
Reading time: ~8 minutes | Level: Foundation → Engineering
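In August 2012, Knight Capital Group lost about $440 million in 45 minutes. A botched deployment left obsolete order-routing code on one of its servers, and a reused configuration flag switched that dead code back on. The system had sent automated error emails before the market opened, but they were not designed as alerts, and no one acted on them. No alert fired.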
The program continued running - executing trades - while operating in a completely broken state.
That is what happens when engineers treat error handling as an afterthought.
This module teaches error handling as a first-class engineering discipline: not something you bolt on at the end, but something you design into every function, every module, every system from the start.
What This Module Covers
This module has 10 lessons and 3 projects covering the complete error handling landscape in professional Python:
| # | Topic | What You Master |
|---|---|---|
| 01 | Exceptions Explained | What exceptions are, exception objects, propagation, BaseException vs Exception |
| 02 | try / except / finally | All 4 clauses, execution order, the else clause, cleanup guarantees |
| 03 | Exception Hierarchy | Full built-in hierarchy, which exceptions to catch, broad vs narrow |
| 04 | Raising Exceptions | raise, raise from, re-raising, chained exceptions |
| 05 | Custom Exceptions | Designing exception hierarchies, attributes, factories |
| 06 | Assertions and Invariants | assert, design-by-contract, when to assert vs raise |
| 07 | Defensive Programming | Input validation, fail fast, EAFP vs LBYL, guard clauses |
| 08 | Logging Basics | Python logging module, levels, handlers, structured logging |
| 09 | Debugging Strategies | pdb, breakpoint(), reading tracebacks, bisect debugging |
| 10 | Common Error Anti-Patterns | Bare except, swallowing exceptions, silent failures |
What Defensive Engineering Means
Defensive programming is not about being pessimistic.
It is about being honest about what can go wrong and designing your system to handle it explicitly - rather than hoping nothing breaks.
A defensive engineer asks:
- What if this file does not exist?
- What if this API returns `None`?
- What if the user passes a negative number where only positive numbers are valid?
- What if the database is temporarily unavailable?
- What if this function is called in a completely unexpected way?
And then handles every answer with explicit code.
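Here is a minimal sketch of how those answers become code. It uses a hypothetical `mean_positive` function. Each guard clause turns one of the questions above into an explicit check. The check fails fast, and its message says what was expected.

```python
from typing import Optional, Sequence


def mean_positive(values: Optional[Sequence[float]]) -> float:
    """Return the mean of strictly positive numbers, rejecting bad input early."""
    # "What if this API returns None?" - refuse it explicitly.
    if values is None:
        raise ValueError("values must be a sequence of numbers, got None")
    # An empty sequence has no mean; say so instead of hitting ZeroDivisionError.
    if len(values) == 0:
        raise ValueError("values must not be empty")
    for i, v in enumerate(values):
        # "What if the user passes a negative number?" - reject it with its position.
        # NaN also fails `v > 0`, so it is rejected here too.
        if not isinstance(v, (int, float)) or not v > 0:
            raise ValueError(f"values[{i}] must be a positive number, got {v!r}")
    return sum(values) / len(values)
```

Calling `mean_positive([2.0, 4.0])` returns `3.0`. Calling `mean_positive([1.0, -2.0])` raises a `ValueError` that names `values[1]`, so the caller learns exactly which input was wrong.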
The Professional Mindset
:::tip The Core Principle

Errors are not exceptional - they are guaranteed. A function that can fail will fail at 3am when you are asleep. Your job is to make sure it fails loudly, clearly, and recoverably - not silently.

:::
Professional Python error handling means:
- Raise exceptions with meaningful messages that explain what happened and what was expected
- Catch specific exceptions - never use a bare `except:`, which catches everything, including `KeyboardInterrupt` and `SystemExit`
- Log errors with enough context to reproduce and diagnose the problem
- Fail fast - detect invalid state as early as possible, not five function calls later
- Clean up resources - use `finally` or context managers to guarantee cleanup
- Chain exceptions - use `raise ... from ...` to preserve the original cause
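The sketch below shows several of these habits in one place. `load_config` and `ConfigError` are hypothetical names chosen for illustration. `open`, `json.load`, `json.JSONDecodeError`, and the `logging` calls come from the standard library.

```python
import json
import logging

logger = logging.getLogger(__name__)


class ConfigError(Exception):
    """Hypothetical domain exception for configuration problems."""


def load_config(path: str) -> dict:
    """Load a JSON config file, translating low-level errors into ConfigError."""
    try:
        # The with-statement closes the file even if json.load raises.
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError as exc:
        # Catch a specific exception, log context, and chain the original cause.
        logger.error("Config file missing: path=%s", path)
        raise ConfigError(f"config file {path!r} does not exist") from exc
    except json.JSONDecodeError as exc:
        # JSONDecodeError carries the line number and reason; log them.
        logger.error("Invalid JSON in %s at line %d: %s", path, exc.lineno, exc.msg)
        raise ConfigError(f"config file {path!r} is not valid JSON") from exc
```

Nothing here catches `Exception` broadly, so an unexpected bug or a Ctrl+C still propagates instead of being hidden. Callers that catch `ConfigError` can still inspect the original error through its `__cause__` attribute.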
AI/ML Connection
Error handling is disproportionately important in AI/ML systems:
- Training loops - a single corrupted batch can crash a run and throw away hours of GPU training if exceptions are not handled
- Data pipelines - one malformed record can silently skew statistics and, through them, corrupt model weights
- Model serving - an unhandled exception in a request handler typically surfaces as a generic HTTP 500, which tells the caller nothing and leaves no diagnosis unless it is logged
- Experiment logging - a logging layer that swallows exceptions can drop metrics without anyone noticing, and recovering them means rerunning the training
Every lesson in this module connects the concept to real ML scenarios. Error handling in production ML is not optional: it is the difference between a system that works and one that fails invisibly.
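One common policy for the data pipeline case has three parts:

- Reject malformed records one at a time.
- Log each rejection.
- Refuse to continue if too many records are bad.

The sketch below assumes records are dicts with hypothetical `feature` and `label` keys. The function names and the `max_bad_fraction` threshold are illustrative, not a standard API.

```python
import logging
import math
from typing import Iterable, List, Tuple

logger = logging.getLogger(__name__)


def parse_record(raw: dict) -> Tuple[float, int]:
    """Convert one raw record into (feature, label); raise ValueError if malformed."""
    try:
        feature = float(raw["feature"])
        label = int(raw["label"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"malformed record {raw!r}: {exc}") from exc
    # NaN or infinity would pass float() but silently poison means and gradients.
    if not math.isfinite(feature):
        raise ValueError(f"non-finite feature in {raw!r}")
    if label not in (0, 1):
        raise ValueError(f"label must be 0 or 1, got {label} in {raw!r}")
    return feature, label


def clean_batch(
    records: Iterable[dict], max_bad_fraction: float = 0.01
) -> List[Tuple[float, int]]:
    """Keep valid records, log each reject, and fail loudly if too many are bad."""
    good: List[Tuple[float, int]] = []
    bad = 0
    for i, raw in enumerate(records):
        try:
            good.append(parse_record(raw))
        except ValueError as exc:
            bad += 1
            logger.warning("Skipping record %d: %s", i, exc)
    total = len(good) + bad
    # Fail fast: a high reject rate usually means an upstream bug, not noise.
    if total and bad / total > max_bad_fraction:
        raise RuntimeError(f"{bad}/{total} records malformed; refusing to continue")
    return good
```

Whether to skip, quarantine, or abort on bad records is a design decision. What matters is that every rejection is counted and visible rather than silently dropped.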
Prerequisites
Before starting this module, you should be comfortable with:
- Python functions, arguments, return values
- Python data types: str, int, list, dict
- Basic control flow: if/else, for loops, while loops
- Importing modules
Projects in This Module
After completing the lessons, three projects apply everything in realistic engineering scenarios:
- Project 01 - Robust File Reader: Build a file reading system that gracefully handles missing files, encoding errors, permission errors, and partial reads - with proper logging throughout.
- Project 02 - Fault-Tolerant Calculator: Build a calculator that validates all inputs, handles arithmetic edge cases, raises meaningful custom exceptions, and logs every error with context.
- Project 03 - Error Logging CLI Tool: Build a CLI tool that reads a log file, classifies ERROR/WARNING/INFO lines, and produces a structured summary report with statistics.
Key Takeaways
- Error handling is a design discipline, not a debugging tool
- The Knight Capital incident, like many production outages, shows how much damage an undetected, unhandled failure can do
- Python's exception system is powerful when used correctly - and actively dangerous when abused
- This module treats `try`/`except` the same way a systems engineer treats fault tolerance: as a core requirement, not an optional extra
- Every pattern learned here applies directly to AI/ML training pipelines, data ingestion systems, and model serving infrastructure
