Module 06 - Error Handling and Defensive Engineering

Reading time: ~8 minutes | Level: Foundation → Engineering

In August 2012, Knight Capital Group lost about $440 million in roughly 45 minutes after a faulty deployment left obsolete trading code active on one of its servers. Warning emails the system generated before the market opened were not treated as alerts, and no safeguard stopped the orders.

The program continued running - executing trades - while operating in a completely broken state.

That is what happens when engineers treat error handling as an afterthought.

This module teaches error handling as a first-class engineering discipline: not something you bolt on at the end, but something you design into every function, every module, every system from the start.

What This Module Covers

This module has 10 lessons and 3 projects covering the complete error handling landscape in professional Python:

| # | Topic | What You Master |
|----|-------|-----------------|
| 01 | Exceptions Explained | What exceptions are, exception objects, propagation, BaseException vs Exception |
| 02 | try / except / finally | All 4 clauses, execution order, the else clause, cleanup guarantees |
| 03 | Exception Hierarchy | Full built-in hierarchy, which exceptions to catch, broad vs narrow |
| 04 | Raising Exceptions | raise, raise from, re-raising, chained exceptions |
| 05 | Custom Exceptions | Designing exception hierarchies, attributes, factories |
| 06 | Assertions and Invariants | assert, design-by-contract, when to assert vs raise |
| 07 | Defensive Programming | Input validation, fail fast, EAFP vs LBYL, guard clauses |
| 08 | Logging Basics | Python logging module, levels, handlers, structured logging |
| 09 | Debugging Strategies | pdb, breakpoint(), reading tracebacks, bisect debugging |
| 10 | Common Error Anti-Patterns | Bare except, swallowing exceptions, silent failures |

What Defensive Engineering Means

Defensive programming is not about being pessimistic.

It is about being honest about what can go wrong and designing your system to handle it explicitly - rather than hoping nothing breaks.

A defensive engineer asks:

  • What if this file does not exist?
  • What if this API returns None?
  • What if the user passes a negative number where only positive is valid?
  • What if the database is temporarily unavailable?
  • What if this function is called in a completely unexpected way?

And then handles every answer with explicit code.
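
As a minimal sketch of what "explicit code" can look like for a few of these questions, the example below uses guard clauses for a None or negative input and a narrow `except` for a missing file. The function names `mean_of_positive` and `read_text_or_default` are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def mean_of_positive(values):
    """Return the mean of a list of strictly positive numbers."""
    # Guard clause: an upstream call returned None instead of data.
    if values is None:
        raise ValueError("values is None; expected a list of positive numbers")
    # Guard clause: an empty list would otherwise cause a ZeroDivisionError.
    if not values:
        raise ValueError("values is empty; expected at least one number")
    # Guard clause: reject zero or negative numbers before computing anything.
    for v in values:
        if v <= 0:
            raise ValueError(f"expected only positive numbers, got {v!r}")
    return sum(values) / len(values)


def read_text_or_default(path, default=""):
    """Return the file's text, or `default` if the file does not exist."""
    try:
        return Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        # Only the "file is missing" case is handled; other errors propagate.
        return default
```

Whether a missing file should fall back to a default or raise is a design decision that depends on the caller. The point is that the choice is written down in code rather than left to chance.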

The Professional Mindset

:::tip The Core Principle
Errors are not exceptional - they are guaranteed. A function that can fail will fail at 3am when you are asleep. Your job is to make sure it fails loudly, clearly, and recoverably - not silently.
:::

Professional Python error handling means:

  1. Raise exceptions with meaningful messages that explain what happened and what was expected
  2. Catch specific exceptions - never use a bare except:, which catches everything, including KeyboardInterrupt and SystemExit
  3. Log errors with enough context to reproduce and diagnose the problem
  4. Fail fast - detect invalid state as early as possible, not five function calls later
  5. Clean up resources - use finally or context managers to guarantee cleanup
  6. Chain exceptions - use raise ... from ... to preserve the original cause
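
The six practices above can be combined in one small function. The following is a sketch under stated assumptions, not a reference implementation: `ConfigError` and `load_config` are hypothetical names, and the file is assumed to contain JSON.

```python
import json
import logging

logger = logging.getLogger(__name__)


class ConfigError(Exception):
    """Raised when a configuration file cannot be loaded (hypothetical)."""


def load_config(path):
    """Load a JSON config file, failing loudly with a chained exception."""
    # Fail fast: reject an invalid argument before touching the filesystem.
    if not isinstance(path, str) or not path:
        raise TypeError(f"path must be a non-empty string, got {path!r}")
    try:
        # The context manager closes the file even if parsing raises.
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError as exc:
        # Log with context, then chain so the original cause is preserved.
        logger.error("Config file missing: %s", path)
        raise ConfigError(f"config file not found: {path}") from exc
    except json.JSONDecodeError as exc:
        logger.error("Config file %s is not valid JSON (line %d)", path, exc.lineno)
        raise ConfigError(f"config file {path} is not valid JSON: {exc.msg}") from exc
```

Only the two failures this function knows how to describe are caught. Anything else, such as a permission error, propagates unchanged so that it is not hidden behind a misleading message.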

AI/ML Connection

Error handling is disproportionately important in AI/ML systems:

  • Training loops - a single corrupted batch can crash hours of GPU training if exceptions are not handled
  • Data pipelines - one malformed record can silently produce incorrect statistics that skew the trained model
  • Model serving - an unhandled exception in a serving endpoint returns a generic 500 to the caller and hides the real cause unless it is logged
  • Experiment logging - a logging error that swallows the exception means you lose irreplaceable training metrics
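
To make the data pipeline case concrete, here is a minimal sketch of a parser that logs every malformed record and refuses to continue when too many are bad, instead of quietly producing skewed data. The record format (`"feature,label"` strings), the names `parse_records` and `TooManyBadRecords`, and the 1% default limit are all assumptions made for this illustration.

```python
import logging

logger = logging.getLogger(__name__)


class TooManyBadRecords(Exception):
    """Raised when the share of malformed records exceeds the limit (hypothetical)."""


def parse_records(raw_records, max_bad_fraction=0.01):
    """Parse strings like "0.5,1" into (float, int) tuples."""
    parsed, bad = [], 0
    for index, raw in enumerate(raw_records):
        try:
            feature_text, label_text = raw.split(",")
            parsed.append((float(feature_text), int(label_text)))
        except ValueError as exc:
            # Covers a wrong field count as well as unparsable numbers.
            bad += 1
            logger.warning("Skipping malformed record %d (%r): %s", index, raw, exc)
    total = len(raw_records)
    # Fail loudly if skipping has gone beyond what the caller accepts.
    if total and bad / total > max_bad_fraction:
        raise TooManyBadRecords(
            f"{bad} of {total} records were malformed "
            f"(limit {max_bad_fraction:.0%})"
        )
    return parsed
```

Skipping a bad record is a recovery strategy, and it stays safe only because each skip is logged and the total is bounded. An unbounded skip would be the silent failure described above.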

Every lesson in this module connects the concept to real ML scenarios - because error handling in production ML is not optional, it is the difference between a system that works and one that fails invisibly.

Prerequisites

Before starting this module, you should be comfortable with:

  • Python functions, arguments, return values
  • Python data types: str, int, list, dict
  • Basic control flow: if/else, for loops, while loops
  • Importing modules

Projects in This Module

After completing the lessons, three projects apply everything in realistic engineering scenarios:

Project 01 - Robust File Reader Build a file reading system that gracefully handles missing files, encoding errors, permission errors, and partial reads - with proper logging throughout.

Project 02 - Fault-Tolerant Calculator Build a calculator that validates all inputs, handles arithmetic edge cases, raises meaningful custom exceptions, and logs every error with context.

Project 03 - Error Logging CLI Tool Build a CLI tool that reads a log file, classifies ERROR/WARNING/INFO lines, and produces a structured summary report with statistics.

Key Takeaways

  • Error handling is a design discipline, not a debugging tool
  • Incidents like Knight Capital show how costly it is when a system keeps running in a broken state and its warnings go unheeded
  • Python's exception system is powerful when used correctly - and actively dangerous when abused
  • This module treats try/except the same way a systems engineer treats fault tolerance: as a core requirement, not an optional extra
  • Every pattern learned here applies directly to AI/ML training pipelines, data ingestion systems, and model serving infrastructure
© 2026 EngineersOfAI. All rights reserved.