Defensive Programming - Writing Code That Fails Gracefully
Reading time: ~19 minutes | Level: Foundation → Engineering
Here is a bug that takes hours to find in production:
def add_tag(item, tags=[]):  # Mutable default argument!
    tags.append(item)
    return tags

print(add_tag("python"))  # ['python']
print(add_tag("flask"))   # ['python', 'flask'] ← WHY?
print(add_tag("numpy"))   # ['python', 'flask', 'numpy'] ← ???
['python']
['python', 'flask']
['python', 'flask', 'numpy']
The list [] is created once when the def statement executes - not on each call. Every call that uses the default shares the same list object. Tags from earlier calls persist into later calls. Silent, insidious, catastrophic in production.
This is defensive programming failure at the most basic level: a dangerous default value that violates the principle of least surprise. Every call should start with a clean slate. This page teaches you how to write code that cannot be surprised this way.
What You Will Learn
- What defensive programming is and why it matters for production systems
- Validate at system boundaries, trust internal code: the fundamental boundary principle
- Guard clauses: validate early, keep the happy path clean and flat
- Fail fast and fail loudly: detect errors immediately where they occur
- Complete input validation: type, range, format, and business rule checks
- Defensive copying: avoiding aliasing bugs with mutable inputs
- The mutable default argument trap and the correct None pattern
- Null object pattern: eliminate None propagation with empty-but-safe objects
- Circuit breaker pattern: stop hammering a failing external service
- Timeout and retry patterns: never block forever on external failures
- Real-world application in REST APIs, ML feature engineering, and database access
Prerequisites
- Python functions, arguments, and default values
- Python data structures: lists, dicts, and their mutability
- Python exception handling: try/except/raise (topics 01–04 of this module)
- Basic understanding of Python classes
What Defensive Programming Is
Defensive programming is the discipline of writing code that handles unexpected inputs and states gracefully rather than crashing or - worse - continuing silently with corrupted data.
It is not about being paranoid. It is about being precise about where trust boundaries lie.
The rule: validate aggressively at the boundary, trust completely inside it. Over-validation of already-validated data creates noise. Under-validation of external data creates vulnerabilities.
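A minimal sketch of this rule, assuming a hypothetical discount endpoint (the names handle_discount_request and _apply_discount are illustrative, not part of any framework). The boundary function validates the untrusted payload once; the private helper trusts what it receives and contains no checks.

```python
def _apply_discount(price: float, percent: float) -> float:
    """Internal helper: trusts that the boundary already validated its inputs."""
    return round(price * (1 - percent / 100), 2)


def handle_discount_request(payload: dict) -> float:
    """Boundary function: validates untrusted input once, then delegates."""
    try:
        price = float(payload["price"])
        percent = float(payload["percent"])
    except KeyError as e:
        raise ValueError(f"missing required field: {e.args[0]}") from e
    except (TypeError, ValueError) as e:
        raise ValueError("price and percent must be numeric") from e

    if price <= 0:
        raise ValueError(f"price must be positive, got {price}")
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be between 0 and 100, got {percent}")

    # Inside the boundary: no re-validation needed
    return _apply_discount(price, percent)


print(handle_discount_request({"price": "80", "percent": 25}))  # 60.0
```

If _apply_discount repeated the same range checks, it would add noise without adding safety, because every path into it already passes through the boundary.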
Part 1 - Guard Clauses: Validate Early, Stay Flat
A guard clause is an early return (or raise) that handles a degenerate case at the top of a function. It keeps the main logic flat and readable.
Without Guard Clauses (deeply nested)
# BAD: deeply nested, hard to read
def process_order(order):
    if order is not None:
        if order.get("items"):
            if order["total"] > 0:
                if order["customer_id"]:
                    # The actual business logic is buried 4 levels deep
                    inventory.reserve(order["items"])
                    payment.charge(order["customer_id"], order["total"])
                    return {"status": "confirmed"}
                else:
                    return {"status": "error", "reason": "missing customer"}
            else:
                return {"status": "error", "reason": "invalid total"}
        else:
            return {"status": "error", "reason": "no items"}
    else:
        return {"status": "error", "reason": "null order"}
With Guard Clauses (flat and readable)
# GOOD: fail fast, main logic is at the top level
def process_order(order: dict) -> dict:
    # Guard clauses: handle all the bad cases first
    if order is None:
        raise ValueError("order cannot be None")
    if not order.get("items"):
        raise ValueError("order must contain at least one item")
    if order.get("total", 0) <= 0:
        raise ValueError(f"order total must be positive, got {order.get('total')}")
    if not order.get("customer_id"):
        raise ValueError("order must have a customer_id")

    # Happy path: clean, flat, readable
    inventory.reserve(order["items"])
    payment.charge(order["customer_id"], order["total"])
    return {"status": "confirmed"}
Guard clauses reduce nesting, make the validation logic scannable at a glance, and move the happy path to the lowest indentation level. This is sometimes called the early return pattern or the return early, return often principle.
:::tip Guard Clauses in Class Methods
In class methods, guard clauses can also check the object's state before proceeding. if not self.is_connected: raise RuntimeError("Not connected - call connect() first") is a guard clause that enforces pre-conditions on object state.
:::
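The state guard described in the tip can be sketched as follows. Connection is a hypothetical class used only for illustration; it does not open a real network connection.

```python
class Connection:
    """Hypothetical client used only to illustrate a guard on object state."""

    def __init__(self) -> None:
        self.is_connected = False

    def connect(self) -> None:
        self.is_connected = True

    def send(self, message: str) -> str:
        # Guard clause on object state: refuse to run before connect()
        if not self.is_connected:
            raise RuntimeError("Not connected - call connect() first")
        # Guard clause on the argument
        if not message:
            raise ValueError("message cannot be empty")
        # Happy path
        return f"sent: {message}"


conn = Connection()
try:
    conn.send("hello")
except RuntimeError as e:
    print(e)  # Not connected - call connect() first

conn.connect()
print(conn.send("hello"))  # sent: hello
```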
Part 2 - Fail Fast and Fail Loudly
Fail fast means detecting and reporting errors at the point they occur, not 100 lines later when symptoms appear as confusing secondary failures.
Fail loudly means raising a specific, informative exception - not silently returning a wrong value, logging without raising, or swallowing an exception.
Fail Fast Example
# BAD: error propagates silently, symptoms appear far from the cause
def compute_statistics(data_path: str):
    with open(data_path) as f:
        rows = [line.strip().split(",") for line in f]
    # If a row has fewer than 3 columns, this fails with a cryptic IndexError;
    # if row[2] is not a number, it fails with a ValueError that names
    # neither the file nor the row
    values = [float(row[2]) for row in rows]
    return {"mean": sum(values) / len(values), "count": len(values)}

# GOOD: fail immediately at the point of the problem with context
def compute_statistics(data_path: str):
    rows = _load_csv(data_path)  # Validated and parsed data (helper defined elsewhere)
    values = []
    for row_num, row in enumerate(rows, start=1):
        if len(row) < 3:
            raise ValueError(
                f"Row {row_num} in '{data_path}' has only {len(row)} columns, "
                f"expected at least 3"
            )
        try:
            values.append(float(row[2]))
        except ValueError as e:
            # !r already adds quotes around the offending value
            raise ValueError(
                f"Row {row_num}, column 3 in '{data_path}': "
                f"cannot convert {row[2]!r} to float"
            ) from e
    if not values:
        raise ValueError(f"No numeric values found in '{data_path}'")
    return {"mean": sum(values) / len(values), "count": len(values)}
Fail Loudly: Never Swallow Exceptions
# BAD: swallowing exceptions silently
def get_user_config(user_id: int) -> dict:
    try:
        return db.query(Config).filter_by(user_id=user_id).first().to_dict()
    except Exception:
        return {}  # Silent failure - caller has no idea what went wrong

# BAD: logging but not raising (only acceptable at the very top of a call stack)
def process_payment(order_id: str):
    try:
        _charge_card(order_id)
    except Exception as e:
        logger.error(f"Payment failed: {e}")
        # Returns None - caller thinks payment succeeded!

# GOOD: let exceptions propagate with context
def get_user_config(user_id: int) -> dict:
    try:
        record = db.query(Config).filter_by(user_id=user_id).first()
    except SQLAlchemyError as e:
        raise DatabaseError(f"Failed to load config for user {user_id}") from e
    if record is None:
        return {}  # Legitimate absence of config - explicit, expected case
    return record.to_dict()
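The "from e" clause in the GOOD version is what keeps the original error attached. A small self-contained sketch, using a plain dict in place of a database and a hypothetical ConfigLoadError, shows that the low-level cause stays reachable on the new exception:

```python
class ConfigLoadError(Exception):
    """Hypothetical domain-level exception for this illustration."""


def load_config(store: dict, user_id: int) -> dict:
    try:
        return store[user_id]
    except KeyError as e:
        # `from e` records the original exception on __cause__
        raise ConfigLoadError(f"Failed to load config for user {user_id}") from e


try:
    load_config({}, 42)
except ConfigLoadError as err:
    print(err)                  # Failed to load config for user 42
    print(repr(err.__cause__))  # KeyError(42)
```

An uncaught chained exception prints both tracebacks, so the reader sees the domain-level message and the low-level cause together.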
:::danger Never Use Bare except: or except Exception: pass
except: pass is the most dangerous pattern in Python. It catches everything including SystemExit and KeyboardInterrupt, and silently discards the error. The program continues in an undefined state. If you must catch broadly for a teardown handler, always at minimum log the exception and usually re-raise it.
:::
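When a broad catch is genuinely needed, a sketch of the "log, then re-raise" shape might look like this. The run_job helper and its parameters are hypothetical; only the logging calls are standard library.

```python
import logging

logger = logging.getLogger(__name__)


def run_job(job, cleanup):
    """Run job(); on any failure, log it and re-raise. Cleanup always runs."""
    try:
        return job()
    except Exception:
        # logger.exception records the message plus the full traceback
        logger.exception("Job failed; re-raising after cleanup")
        raise  # bare `raise` re-raises the original exception unchanged
    finally:
        cleanup()


events = []


def failing_job():
    raise ValueError("boom")


try:
    run_job(failing_job, cleanup=lambda: events.append("cleaned"))
except ValueError as e:
    print(f"caller still sees: {e}")  # caller still sees: boom

print(events)  # ['cleaned']
```

Catching Exception rather than using a bare except leaves SystemExit and KeyboardInterrupt alone, since both derive from BaseException and not from Exception.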
Part 3 - Complete Input Validation
Input at system boundaries requires four layers of validation:
| Layer | Question to Ask | Example |
|---|---|---|
| 1 - Type Check | Is the value the right Python type? | isinstance(value, int), isinstance(value, str) |
| 2 - Range / Bounds Check | Is the value within acceptable numeric or length bounds? | 0 <= age <= 120, 0 < price, 1 <= len(name) <= 255 |
| 3 - Format Check | Does the value match an expected pattern or structure? | re.match for email/phone, UUID format, date parsing |
| 4 - Business Rule Check | Does the value satisfy domain-specific constraints? | "departure must be before arrival", "discount cannot exceed product price" |
Full Validation Example
import re
from datetime import date, timedelta

# Custom exceptions (from topic 05)
class BookingValidationError(ValueError):
    def __init__(self, field: str, value, reason: str):
        self.field = field
        self.value = value
        self.reason = reason
        super().__init__(f"Invalid '{field}': {reason} (got {value!r})")

def validate_hotel_booking(
    guest_name: str,
    checkin_date: date,
    checkout_date: date,
    num_guests: int,
    email: str,
) -> None:
    """Validate hotel booking parameters. Raises BookingValidationError on failure."""
    today = date.today()

    # ── Layer 1: Type checks ──────────────────────────────────────────────────
    if not isinstance(guest_name, str):
        raise BookingValidationError(
            "guest_name", guest_name,
            f"must be a string, got {type(guest_name).__name__}"
        )
    if not isinstance(checkin_date, date):
        raise BookingValidationError(
            "checkin_date", checkin_date, "must be a date object"
        )
    if not isinstance(checkout_date, date):
        raise BookingValidationError(
            "checkout_date", checkout_date, "must be a date object"
        )
    if not isinstance(num_guests, int):
        raise BookingValidationError(
            "num_guests", num_guests,
            f"must be an integer, got {type(num_guests).__name__}"
        )
    if not isinstance(email, str):
        raise BookingValidationError(
            "email", email, f"must be a string, got {type(email).__name__}"
        )

    # ── Layer 2: Range / bounds checks ────────────────────────────────────────
    guest_name = guest_name.strip()
    if len(guest_name) < 2:
        raise BookingValidationError(
            "guest_name", guest_name, "must be at least 2 characters"
        )
    if len(guest_name) > 100:
        raise BookingValidationError(
            "guest_name", guest_name, "must not exceed 100 characters"
        )
    if num_guests < 1:
        raise BookingValidationError(
            "num_guests", num_guests, "must be at least 1"
        )
    if num_guests > 20:
        raise BookingValidationError(
            "num_guests", num_guests, "cannot exceed 20 per booking"
        )

    # ── Layer 3: Format checks ────────────────────────────────────────────────
    email_pattern = r'^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$'
    if not re.match(email_pattern, email):
        raise BookingValidationError(
            "email", email, "does not match email format"
        )

    # ── Layer 4: Business rule checks ─────────────────────────────────────────
    if checkin_date < today:
        raise BookingValidationError(
            "checkin_date", checkin_date,
            f"cannot be in the past (today is {today})"
        )
    if checkout_date <= checkin_date:
        raise BookingValidationError(
            "checkout_date", checkout_date,
            f"must be after checkin_date ({checkin_date})"
        )
    max_stay = timedelta(days=90)
    if checkout_date - checkin_date > max_stay:
        raise BookingValidationError(
            "checkout_date", checkout_date,
            "stay cannot exceed 90 days"
        )
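# Usage
from datetime import date, timedelta

# Relative dates keep the example valid whenever it is run
checkin = date.today() + timedelta(days=30)
try:
    validate_hotel_booking(
        guest_name="Alice Chen",
        checkin_date=checkin,
        checkout_date=checkin + timedelta(days=5),
        num_guests=2,
        email="alice.chen@example.com",  # required argument, placeholder address
    )
    print("Booking validated successfully")
except BookingValidationError as e:
    print(f"Validation failed: {e}")
    print(f"  Field: {e.field}")
    print(f"  Value: {e.value!r}")
    print(f"  Reason: {e.reason}")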
Part 4 - Defensive Copying: Avoiding Aliasing Bugs
When your function stores or modifies a mutable argument, always copy it first. Otherwise the caller's object is modified behind their back.
The Aliasing Problem
class ShoppingCart:
    def __init__(self, initial_items: list):
        # BUG: we store a reference, not a copy
        self._items = initial_items  # Same object as caller's list!

    def add(self, item):
        self._items.append(item)

    def items(self):
        return self._items

# Caller does not expect their list to be modified
my_items = ["apple", "banana"]
cart = ShoppingCart(my_items)
cart.add("cherry")
print(my_items)  # ['apple', 'banana', 'cherry'] ← Caller's list was mutated!
This is an aliasing bug: two parts of the program share a reference to the same mutable object and one modifies it without the other expecting it.
Defensive Copying
class ShoppingCart:
    def __init__(self, initial_items: list):
        # GOOD: copy the input - we own our own list
        self._items = list(initial_items)  # Shallow copy

    def add(self, item):
        self._items.append(item)

    def items(self) -> list:
        # Also copy on the way out: callers cannot modify our internal list
        return list(self._items)

    def __repr__(self):
        return f"ShoppingCart({self._items!r})"

my_items = ["apple", "banana"]
cart = ShoppingCart(my_items)
cart.add("cherry")
print(my_items)      # ['apple', 'banana'] ← Unchanged
print(cart.items())  # ['apple', 'banana', 'cherry']
Deep vs Shallow Copy
import copy

def process_config(config: dict) -> dict:
    # Shallow copy: new dict, but nested objects still shared
    config_copy = config.copy()
    config_copy["processed"] = True

    # Deep copy: fully independent clone of the entire object graph
    config_deep = copy.deepcopy(config)
    config_deep["nested"]["key"] = "modified"  # Does not affect original
    return config_deep
| Copy Type | "name" key | "db" key (nested dict) |
|---|---|---|
| Shallow (config.copy()) | New binding to the same string object | Points to the same nested dict object - mutations affect the original |
| Deep (copy.deepcopy(config)) | New binding; the string object itself is still shared, which is safe because strings are immutable | Points to a new nested dict object - fully independent copy |
Rule of thumb: use shallow copy (list(), dict(), .copy()) when your object does not contain nested mutable objects. Use copy.deepcopy() when it does.
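The difference can be observed directly. The following sketch assumes a small config dict with the "name" and "db" keys used in the table above; the values are made up for illustration.

```python
import copy

config = {"name": "service", "db": {"host": "localhost", "port": 5432}}

shallow = config.copy()
deep = copy.deepcopy(config)

# Identity checks: is the nested dict the same object?
print(shallow["db"] is config["db"])  # True
print(deep["db"] is config["db"])     # False

# Mutating the nested dict through the shallow copy leaks into the original
shallow["db"]["port"] = 9999
print(config["db"]["port"])  # 9999
print(deep["db"]["port"])    # 5432

# Rebinding a top-level key on the copy does not touch the original
shallow["name"] = "renamed"
print(config["name"])  # service
```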
Part 5 - The Mutable Default Argument Trap
The opening example of this page shows this bug. Here is the complete explanation:
# WRONG: mutable default is created once at def time
def append_to(element, to=[]):
    to.append(element)
    return to

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2] ← Bug: 2 appended to same list as call 1
print(append_to(3))  # [1, 2, 3] ← Same list object, third call
Python creates the default value [] once when the def statement runs. The same list object is reused on every call that does not provide the argument.
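The shared object is visible on the function itself: Python stores positional default values in the function's __defaults__ tuple. A short sketch that inspects it between calls:

```python
def append_to(element, to=[]):
    to.append(element)
    return to


print(append_to.__defaults__)  # ([],)

first = append_to(1)
print(append_to.__defaults__)  # ([1],)

second = append_to(2)
print(append_to.__defaults__)  # ([1, 2],)

# Both calls returned the very same list object
print(first is second)  # True
```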
The Correct Pattern: Use None as the Sentinel
# CORRECT: use None as sentinel, create fresh list on each call
def append_to(element, to=None):
    if to is None:
        to = []  # New list created on each call
    to.append(element)
    return to

print(append_to(1))        # [1]
print(append_to(2))        # [2] ← Fresh list, correct
print(append_to(3, [10]))  # [10, 3] ← Caller-provided list, correct
This applies to all mutable types as defaults:
# All of these are wrong as default arguments:
def bad_1(data={}): ...   # mutable dict
def bad_2(items=[]): ...  # mutable list
def bad_3(s=set()): ...   # mutable set

# All corrected with None sentinel:
def good_1(data=None):
    if data is None:
        data = {}
    ...

def good_2(items=None):
    if items is None:
        items = []
    ...

def good_3(s=None):
    if s is None:
        s = set()
    ...
:::note When Mutable Defaults ARE Intentional
There is a rare legitimate use of mutable defaults: a cache shared across all calls. def cached(key, _cache={}): ... deliberately relies on the single shared dict so that results persist from one call to the next. This is intentional, should be clearly documented, and is almost always better replaced by functools.lru_cache.
:::
Part 6 - Null Object Pattern
Returning None to signal absence forces every caller to check for None before using the result. This creates None propagation: the check for None spreads across the codebase.
# BAD: returning None forces callers to check
def find_user(user_id: int):
    row = db.query("SELECT * FROM users WHERE id = ?", (user_id,))
    return dict(row) if row else None

user = find_user(999)
# Every caller must guard against None:
if user is not None:
    print(user["name"].upper())
    send_email(user["email"], "Welcome!")
The Null Object Pattern
Instead of returning None, return an object of the same type that is safe to use without checking:
# An empty/default user object that is safe to call methods on
class NullUser:
    name = ""
    email = ""
    is_authenticated = False
    permissions = frozenset()

    def has_permission(self, perm: str) -> bool:
        return False

    def __bool__(self) -> bool:
        return False  # Falsy - allows `if user:` checks

    def __repr__(self) -> str:
        return "NullUser()"

# A real user
class User:
    def __init__(self, name: str, email: str, permissions: set):
        self.name = name
        self.email = email
        self.permissions = permissions
        self.is_authenticated = True

    def has_permission(self, perm: str) -> bool:
        return perm in self.permissions

    def __bool__(self) -> bool:
        return True

def find_user(user_id: int):
    row = db.query("SELECT * FROM users WHERE id = ?", (user_id,))
    if not row:
        return NullUser()  # Safe null object, never None
    return User(row["name"], row["email"], set(row["permissions"]))

# Callers: no None checks needed
user = find_user(999)
if user:  # __bool__ on NullUser returns False
    send_welcome_email(user.email)

# This always works - NullUser.has_permission() safely returns False
if user.has_permission("admin"):
    show_admin_panel()
Null Object for Collections
For functions that return lists, always return an empty list instead of None:
# BAD
def get_user_orders(user_id: int):
    orders = db.query(...)
    return orders if orders else None  # Caller must check for None

# GOOD: empty list is safe to iterate, slice, and len()
def get_user_orders(user_id: int) -> list:
    orders = db.query(...)
    return orders if orders else []  # Always a list

# Caller code is clean:
for order in get_user_orders(user_id):
    process_order(order)
# (Loop body never executes for empty list - no check needed)
Part 7 - Circuit Breaker Pattern
When your code calls an external service (API, database, microservice), that service can fail. Without protection, your code will block until timeout, then retry, then block again, degrading the entire system.
The circuit breaker pattern stops calling a failing service for a cool-down period, returning an error immediately instead.
Simple Circuit Breaker Implementation
import time
from dataclasses import dataclass, field
from typing import Callable, Any

class CircuitOpenError(Exception):
    """Raised when the circuit breaker is open and calls are blocked."""
    pass

@dataclass
class CircuitBreaker:
    """A simple circuit breaker for protecting external service calls."""
    failure_threshold: int = 5      # failures before opening
    cooldown_seconds: float = 30.0  # seconds to wait before half-open probe
    _failure_count: int = field(default=0, init=False)
    _state: str = field(default="closed", init=False)  # closed, open, half-open
    _opened_at: float = field(default=0.0, init=False)

    def call(self, func: Callable, *args, **kwargs) -> Any:
        """Call func through the circuit breaker.

        Raises:
            CircuitOpenError: If the circuit is open.
            Any exception raised by func (and counts it as a failure).
        """
        if self._state == "open":
            elapsed = time.monotonic() - self._opened_at
            if elapsed >= self.cooldown_seconds:
                self._state = "half-open"
            else:
                remaining = self.cooldown_seconds - elapsed
                raise CircuitOpenError(
                    f"Circuit is OPEN. Retry in {remaining:.1f}s."
                )
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        self._failure_count = 0
        self._state = "closed"

    def _on_failure(self):
        self._failure_count += 1
        if self._failure_count >= self.failure_threshold:
            self._state = "open"
            self._opened_at = time.monotonic()

    @property
    def state(self) -> str:
        return self._state

# Usage
import requests

payment_breaker = CircuitBreaker(failure_threshold=3, cooldown_seconds=60.0)

def charge_payment(amount: float, card_token: str) -> dict:
    def _do_charge():
        response = requests.post(
            "https://api.payments.example.com/charge",
            json={"amount": amount, "token": card_token},
            timeout=5.0
        )
        response.raise_for_status()
        return response.json()

    try:
        return payment_breaker.call(_do_charge)
    except CircuitOpenError as e:
        # Return a safe degraded response instead of blocking
        return {"status": "unavailable", "message": str(e)}
    except requests.Timeout:
        return {"status": "timeout", "message": "Payment service timed out"}
    except requests.HTTPError as e:
        return {"status": "error", "message": str(e)}
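To watch the closed → open → half-open → closed cycle without waiting for a real cooldown, here is a condensed sketch. MiniBreaker is a simplified stand-in written for this demonstration, not the CircuitBreaker class above: it takes an injectable clock (an assumption added for testability) so time can be advanced by hand.

```python
import time


class CircuitOpenError(Exception):
    """Raised when calls are blocked because the circuit is open."""


class MiniBreaker:
    """Condensed breaker with an injectable clock so the demo needs no sleeping."""

    def __init__(self, failure_threshold, cooldown_seconds, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = "closed"

    def call(self, func):
        if self.state == "open":
            if self._clock() - self._opened_at < self.cooldown_seconds:
                raise CircuitOpenError("circuit is open")
            self.state = "half-open"  # cooldown elapsed: allow one probe
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self.state = "open"
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self.state = "closed"
        return result


now = [0.0]  # fake clock value, advanced manually below
breaker = MiniBreaker(failure_threshold=2, cooldown_seconds=30.0, clock=lambda: now[0])


def failing():
    raise ConnectionError("service down")


states = []
for _ in range(2):
    try:
        breaker.call(failing)
    except ConnectionError:
        pass
    states.append(breaker.state)

try:
    breaker.call(failing)  # rejected immediately; failing() is not invoked
except CircuitOpenError:
    states.append("blocked")

now[0] = 31.0               # pretend the cooldown has elapsed
breaker.call(lambda: "ok")  # the half-open probe succeeds
states.append(breaker.state)

print(states)  # ['closed', 'open', 'blocked', 'closed']
```

Neither this sketch nor the class above is thread-safe; a production breaker shared across threads would need a lock around the state transitions.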
Part 8 - Timeout and Retry Patterns
External services fail and slow down. Never let external failures block your code indefinitely.
Explicit Timeouts
import requests

# ALWAYS set timeouts on external calls
# (connect_timeout, read_timeout) in seconds
response = requests.get(
    "https://api.example.com/data",
    timeout=(3.05, 10)  # about 3s to connect, 10s to read response
)

# For urllib:
import urllib.request
urllib.request.urlopen("https://api.example.com/data", timeout=10)

# For database connections (SQLAlchemy):
from sqlalchemy import create_engine
engine = create_engine(
    "postgresql://...",
    connect_args={"connect_timeout": 5}  # passed through to the database driver
)
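Some calls expose no timeout parameter at all. One possible workaround, sketched here with the standard library's concurrent.futures, is to run the call in a worker thread and stop waiting after a deadline. The call_with_timeout helper and slow_lookup function are hypothetical names for this illustration. Note the limitation: the caller stops waiting, but the worker thread is not killed and keeps running until the function returns.

```python
import concurrent.futures
import time


def call_with_timeout(func, timeout_seconds, *args, **kwargs):
    """Run func in a worker thread and stop waiting after timeout_seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(
            f"{func.__name__} did not finish within {timeout_seconds}s"
        ) from None
    finally:
        # Do not block on the worker; the thread itself keeps running
        executor.shutdown(wait=False)


def slow_lookup():
    time.sleep(0.3)  # stands in for a slow external call
    return "late result"


try:
    call_with_timeout(slow_lookup, 0.05)
except TimeoutError as e:
    print(e)  # slow_lookup did not finish within 0.05s

print(call_with_timeout(lambda: "fast result", 1.0))  # fast result
```

Where a library offers a native timeout argument, as requests and urllib do, that remains the better choice because it actually aborts the underlying operation.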
Retry with Exponential Backoff
import time
import random
import functools
from typing import Tuple, Type

def retry(
    max_attempts: int = 3,
    exceptions: Tuple[Type[Exception], ...] = (Exception,),
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    exponential_base: float = 2.0,
    jitter: bool = True,
):
    """Decorator: retry a function on specified exceptions with exponential backoff.

    Args:
        max_attempts: Maximum total attempts (including first try).
        exceptions: Exception types that trigger a retry.
        base_delay: Initial delay in seconds.
        max_delay: Maximum delay in seconds.
        exponential_base: Multiplier for exponential backoff.
        jitter: Add random jitter to prevent thundering herd.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exception = None
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    last_exception = e
                    if attempt == max_attempts:
                        break  # No more retries
                    delay = min(
                        base_delay * (exponential_base ** (attempt - 1)),
                        max_delay
                    )
                    if jitter:
                        # Scale to a random 50-100% of the computed delay
                        delay *= (0.5 + random.random() * 0.5)
                    print(
                        f"Attempt {attempt}/{max_attempts} failed: {e}. "
                        f"Retrying in {delay:.2f}s..."
                    )
                    time.sleep(delay)
            raise RuntimeError(
                f"All {max_attempts} attempts failed. Last error: {last_exception}"
            ) from last_exception
        return wrapper
    return decorator

# Usage: retry HTTP requests that may fail transiently
import requests

@retry(max_attempts=3, exceptions=(requests.Timeout, requests.ConnectionError))
def fetch_user_data(user_id: int) -> dict:
    response = requests.get(
        f"https://api.example.com/users/{user_id}",
        timeout=5.0
    )
    response.raise_for_status()
    return response.json()

# Usage: retry database writes with backoff
from sqlalchemy.exc import OperationalError

@retry(
    max_attempts=5,
    exceptions=(OperationalError,),  # Transient DB errors
    base_delay=0.5,
    max_delay=30.0,
)
def save_to_database(record: dict):
    # get_db_session and MyModel are application-specific and defined elsewhere
    with get_db_session() as session:
        session.add(MyModel(**record))
        session.commit()
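The delay formula inside the decorator can be isolated to see the schedule it produces. This sketch reuses the same expression with jitter switched off; backoff_delay is a helper name introduced here for illustration.

```python
def backoff_delay(attempt, base_delay=1.0, exponential_base=2.0, max_delay=60.0):
    """Delay in seconds before the retry that follows failed attempt `attempt` (1-based)."""
    return min(base_delay * (exponential_base ** (attempt - 1)), max_delay)


print([backoff_delay(n) for n in range(1, 9)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

The delay doubles until it reaches max_delay and then stays flat. With jitter enabled, each value is multiplied by a random factor between 0.5 and 1.0, so clients that failed at the same moment do not all retry at the same moment.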
Part 9 - Real-World Application Patterns
REST API: Validation at the Boundary
Using FastAPI with Pydantic, the framework enforces validation at the boundary:
# Note: this example uses the Pydantic v1 API (validator, Field(regex=...)).
# Pydantic v2 renames these to field_validator and Field(pattern=...).
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field, validator
from datetime import date

app = FastAPI()

class BookingRequest(BaseModel):
    """Pydantic model - validation runs automatically on request parsing."""
    guest_name: str = Field(..., min_length=2, max_length=100)
    checkin_date: date
    checkout_date: date
    num_guests: int = Field(..., ge=1, le=20)
    email: str = Field(..., regex=r'^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$')

    @validator("checkout_date")
    def checkout_after_checkin(cls, checkout, values):
        # checkin_date is only present in values if it passed its own validation
        if "checkin_date" in values and checkout <= values["checkin_date"]:
            raise ValueError("checkout_date must be after checkin_date")
        return checkout

    @validator("checkin_date")
    def checkin_not_in_past(cls, checkin):
        if checkin < date.today():
            raise ValueError("checkin_date cannot be in the past")
        return checkin

@app.post("/bookings/")
def create_booking(request: BookingRequest):
    # If we reach here, all validation has passed
    # The happy path is clean - no validation noise
    booking_id = booking_service.create(
        guest_name=request.guest_name,
        checkin=request.checkin_date,
        checkout=request.checkout_date,
        guests=request.num_guests,
        email=request.email,
    )
    return {"booking_id": booking_id, "status": "confirmed"}
Defensive ML Feature Engineering
import numpy as np
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Transform raw data into model-ready features.

    This function validates all assumptions about the input data
    before transforming - fail fast at the data boundary.
    """
    # Type validation
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"Expected DataFrame, got {type(df).__name__}")

    # Schema validation
    required_columns = {"age", "income", "credit_score", "loan_amount"}
    missing = required_columns - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Data quality validation
    if df.empty:
        raise ValueError("Input DataFrame is empty")
    null_counts = df[list(required_columns)].isnull().sum()
    if null_counts.any():
        raise ValueError(
            f"Null values found in required columns:\n{null_counts[null_counts > 0]}"
        )

    # Range validation
    if (df["age"] < 18).any() or (df["age"] > 100).any():
        bad_ages = df.loc[(df["age"] < 18) | (df["age"] > 100), "age"]
        raise ValueError(f"Ages out of range [18, 100]: {bad_ages.values[:5]}")
    if (df["income"] < 0).any():
        raise ValueError("Negative income values found")
    if ((df["credit_score"] < 300) | (df["credit_score"] > 850)).any():
        raise ValueError("Credit scores must be in range [300, 850]")

    # Safe transformation - defensive copies
    result = df.copy()  # Never modify the input DataFrame

    # Feature engineering
    result["debt_to_income"] = result["loan_amount"] / result["income"].clip(lower=1)
    result["credit_tier"] = pd.cut(
        result["credit_score"],
        bins=[300, 580, 670, 740, 800, 850],
        labels=["poor", "fair", "good", "very_good", "exceptional"],
        include_lowest=True,  # otherwise a score of exactly 300 falls outside the first bin
    )
    result["log_income"] = np.log1p(result["income"])

    # Post-condition: no NaN in output features
    new_cols = ["debt_to_income", "log_income"]
    if result[new_cols].isnull().any().any():
        raise RuntimeError(
            "Feature engineering produced NaN values - check for zero income"
        )
    return result
Safe Database Access Pattern
from contextlib import contextmanager
from sqlalchemy.orm import Session

@contextmanager
def safe_transaction(db: Session):
    """Context manager for safe database transactions with rollback on error."""
    try:
        yield db
        db.commit()
    except Exception:
        db.rollback()
        raise  # Re-raise after rollback - never swallow DB errors

def update_user_balance(db: Session, user_id: int, delta: float) -> float:
    """Atomically update a user's balance. Returns new balance."""
    # Guard: validate inputs before touching the database
    if not isinstance(user_id, int) or user_id <= 0:
        raise ValueError(f"user_id must be a positive integer, got {user_id!r}")
    if not isinstance(delta, (int, float)):
        raise TypeError(f"delta must be numeric, got {type(delta).__name__}")

    with safe_transaction(db):
        # Lock the row to prevent race conditions
        user = (
            db.query(User)
            .filter(User.id == user_id)
            .with_for_update()  # SELECT ... FOR UPDATE
            .first()
        )
        if user is None:
            raise RecordNotFoundError("users", user_id)

        new_balance = user.balance + delta

        # Business rule: balance cannot go negative
        if new_balance < 0:
            raise InsufficientFundsError(
                account_id=str(user_id),
                amount=abs(delta),
                balance=user.balance
            )
        user.balance = new_balance
        # Commit happens automatically at end of `with safe_transaction`
    return new_balance
Interview Questions
Q1: What is the difference between defensive programming and paranoid programming?
Answer: Defensive programming validates at defined trust boundaries and trusts code inside those boundaries. Paranoid programming validates everywhere, including between internal functions that the same codebase fully controls. The practical difference: defensive programming adds validation at external inputs (user data, APIs, files, database rows) and trusts its own type-annotated internal functions. Paranoid programming re-validates inside every private helper function, adding noise and performance overhead without safety benefit. The key principle is "validate at the boundary, trust internally." Code that over-validates is harder to read, slower, and gives a false sense of security (the redundant checks may themselves have bugs).
Q2: Why is the mutable default argument def f(items=[]) a bug, and what is the correct pattern?
Answer: Python evaluates default argument values once when the def statement executes, not on each function call. The [] creates a single list object at definition time. Every call that uses the default receives the same list object. Mutations from one call (.append()) persist into subsequent calls. The correct pattern is to use None as a sentinel: def f(items=None): if items is None: items = []. This creates a fresh list on each call. The only legitimate use of a mutable default is an intentional shared cache, which should be clearly documented.
Q3: What is the "fail fast" principle and why is it important?
Answer: Fail fast means detecting and reporting an error at the exact point it occurs, not allowing the corrupted state or wrong value to propagate down the call stack where it eventually manifests as a confusing secondary error. For example: if a file row has the wrong number of columns, raise immediately at that row rather than letting the bad data reach a computation 50 lines later that fails with an IndexError. Fail fast is important because it: (1) makes errors dramatically easier to diagnose - the traceback points to the actual cause, (2) prevents corrupted data from being used, logged, or stored, (3) makes the error message relevant and specific rather than cryptic.
Q4: When should you use the Null Object pattern instead of returning None?
Answer: Use the Null Object pattern when the caller would need to check if result is not None: before every use of the return value, and when you can define a safe "empty" version of the return type that behaves correctly for all callers without the check. Classic cases: return an empty list instead of None for "no results found" (empty lists are safe to iterate and len()), return a NullUser object instead of None for "user not found" when callers need to call methods on the result. The pattern eliminates None propagation and makes calling code cleaner. Avoid it when the absence of a value IS semantically meaningful and callers must handle it differently from "found but empty" - in that case, None or raising an exception is more appropriate.
Q5: What is a circuit breaker and what problem does it solve?
Answer: A circuit breaker is a state machine wrapping calls to an external service. In CLOSED state, calls go through normally. If failures exceed a threshold, the circuit opens (transitions to OPEN state) and all calls immediately return an error without attempting the real call. After a cooldown period, one probe request is allowed (HALF-OPEN state); success resets to CLOSED, failure returns to OPEN. The problem it solves: without a circuit breaker, a slow or failing external service causes your code to block (waiting for timeouts) on every call, consuming threads/connections and cascading the failure through your system. The circuit breaker "short-circuits" the failing path, letting the rest of the system continue operating at degraded but functional level.
Q6: Explain the trust boundary principle in defensive programming. What should be validated at a REST API boundary?
Answer: The trust boundary principle says: code at the boundary validates everything; code inside the boundary trusts that its inputs have already been validated. For a REST API, the boundary is where the HTTP request enters the application. At that boundary you should validate: (1) types - are fields the right JSON type (string, number, boolean, not null where required)? (2) ranges - are numbers within acceptable bounds, are strings within min/max length? (3) format - do emails match the pattern, are dates parseable, are UUIDs valid? (4) business rules - is checkout_date after checkin_date, is the requested resource owned by the requesting user? Once the request passes all these checks and is represented as a typed, validated model (e.g., a Pydantic model in FastAPI), the business logic layer trusts that model completely without re-validating.
Practice Challenges
Beginner - Fix the Mutable Default and Add Guard Clauses
Problem: The following function has two defensive programming problems: a mutable default argument and deeply nested logic. Fix both.
def add_user_to_groups(username, groups=[]):
    if username:
        if isinstance(groups, list):
            if username not in groups:
                groups.append(username)
                return groups
            else:
                return groups
        else:
            return None
    else:
        return None
Solution
from typing import Optional


def add_user_to_groups(username: str, groups: Optional[list] = None) -> list:
    """Add username to groups if not already present.

    Args:
        username: The username to add. Must be a non-empty string.
        groups: The list of existing group members. Defaults to a new empty list.

    Returns:
        The updated list of group members. If a list was passed in, it is
        modified in place and the same object is returned.

    Raises:
        TypeError: If username is not a string or groups is not a list.
        ValueError: If username is empty or whitespace.
    """
    # Fix 1: None sentinel replaces mutable default
    if groups is None:
        groups = []

    # Fix 2: Guard clauses replace nested conditionals
    if not isinstance(username, str):
        raise TypeError(f"username must be a string, got {type(username).__name__}")
    if not isinstance(groups, list):
        raise TypeError(f"groups must be a list, got {type(groups).__name__}")
    username = username.strip()
    if not username:
        raise ValueError("username cannot be empty or whitespace")

    # Happy path - flat, readable
    if username not in groups:
        groups.append(username)
    return groups


# Test it
print(add_user_to_groups("alice"))           # ['alice']
print(add_user_to_groups("bob"))             # ['bob'] ← Fresh list, not ['alice', 'bob']!
print(add_user_to_groups("alice", ["bob"]))  # ['bob', 'alice']
print(add_user_to_groups("bob", ["bob"]))    # ['bob'] ← No duplicate

try:
    add_user_to_groups("")
except ValueError as e:
    print(f"ValueError: {e}")  # ValueError: username cannot be empty or whitespace

try:
    add_user_to_groups(42)
except TypeError as e:
    print(f"TypeError: {e}")  # TypeError: username must be a string, got int
Intermediate - Defensive File Parser with Full Validation
Problem: Write a function parse_transactions_csv(file_path) that reads a CSV of financial transactions. Each row must have columns: date, amount, type (debit or credit), description. The function must: (1) validate the file exists and is readable, (2) validate each row has the correct number of columns, (3) validate each field type and range, (4) collect all errors before raising (report all problems at once, not just the first), and (5) return a list of typed transaction dicts.
Solution
import csv
import math
import os
from datetime import datetime
from typing import List


class TransactionValidationError(Exception):
    """Raised when one or more rows in the CSV fail validation."""

    def __init__(self, errors: list):
        self.errors = errors
        error_lines = "\n".join(f"  Row {row}: {msg}" for row, msg in errors)
        super().__init__(f"{len(errors)} validation error(s):\n{error_lines}")


def parse_transactions_csv(file_path: str) -> List[dict]:
    """Parse and validate a transactions CSV file.

    Expected columns: date (YYYY-MM-DD), amount (positive float),
    type ('debit' or 'credit'), description (non-empty string).

    Args:
        file_path: Path to the CSV file.

    Returns:
        List of transaction dicts with typed values.

    Raises:
        FileNotFoundError: If file_path does not exist.
        PermissionError: If the file cannot be read.
        ValueError: If the file is empty or has no header.
        TransactionValidationError: If any row fails validation.
    """
    # Guard 1: file existence
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"File not found: {file_path!r}")
    if not os.access(file_path, os.R_OK):
        raise PermissionError(f"Cannot read file: {file_path!r}")

    # Read the file
    with open(file_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Guard 2: required headers
        if reader.fieldnames is None:
            raise ValueError(f"File is empty or has no header: {file_path!r}")
        required_cols = {"date", "amount", "type", "description"}
        missing_cols = required_cols - set(reader.fieldnames)
        if missing_cols:
            raise ValueError(
                f"Missing required columns: {missing_cols}. "
                f"Found: {set(reader.fieldnames)}"
            )
        expected_col_count = len(reader.fieldnames)
        rows = list(reader)

    if not rows:
        raise ValueError(f"File has header but no data rows: {file_path!r}")

    # Validate each row - collect ALL errors before raising
    errors = []
    transactions = []
    for row_num, row in enumerate(rows, start=2):  # start=2 (row 1 is header)
        # Validate column count. DictReader fills missing fields with None
        # and stores surplus fields in a list under the key None.
        surplus = row.get(None) or []
        present = sum(1 for key, value in row.items()
                      if key is not None and value is not None)
        col_count = present + len(surplus)
        if col_count != expected_col_count:
            errors.append(
                (row_num, f"expected {expected_col_count} columns, got {col_count}")
            )
            continue

        row_errors = []
        parsed = {}

        # Validate date
        date_str = row.get("date", "").strip()
        try:
            parsed["date"] = datetime.strptime(date_str, "%Y-%m-%d").date()
        except ValueError:
            row_errors.append(f"'date' must be YYYY-MM-DD, got {date_str!r}")

        # Validate amount (float() also accepts 'nan' and 'inf', so reject those)
        amount_str = row.get("amount", "").strip()
        try:
            amount = float(amount_str)
            if not math.isfinite(amount):
                row_errors.append(f"'amount' must be a finite number, got {amount_str!r}")
            elif amount <= 0:
                row_errors.append(f"'amount' must be positive, got {amount}")
            else:
                parsed["amount"] = amount
        except ValueError:
            row_errors.append(f"'amount' must be a number, got {amount_str!r}")

        # Validate type
        txn_type = row.get("type", "").strip().lower()
        if txn_type not in ("debit", "credit"):
            row_errors.append(
                f"'type' must be 'debit' or 'credit', got {txn_type!r}"
            )
        else:
            parsed["type"] = txn_type

        # Validate description
        desc = row.get("description", "").strip()
        if not desc:
            row_errors.append("'description' cannot be empty")
        elif len(desc) > 500:
            row_errors.append(
                f"'description' too long ({len(desc)} chars, max 500)"
            )
        else:
            parsed["description"] = desc

        if row_errors:
            for err in row_errors:
                errors.append((row_num, err))
        else:
            transactions.append(parsed)

    # Raise once with ALL errors collected
    if errors:
        raise TransactionValidationError(errors)
    return transactions
# Test
import os
import tempfile

sample_csv = """date,amount,type,description
2026-01-15,100.00,credit,Salary
2026-01-20,45.50,debit,Groceries
bad-date,-50,unknown,
2026-01-25,200.00,debit,Rent payment
"""

# Write to temp file and parse
with tempfile.NamedTemporaryFile(
    mode="w", suffix=".csv", delete=False, encoding="utf-8", newline=""
) as f:
    f.write(sample_csv)
    tmp_path = f.name

try:
    transactions = parse_transactions_csv(tmp_path)
    for t in transactions:
        print(t)
except TransactionValidationError as e:
    # str(e) already lists every row; e.errors holds the same data
    # as (row, message) tuples for programmatic use.
    print(f"Validation failed:\n{e}")
finally:
    os.unlink(tmp_path)  # clean up even if parsing raised

# Output (line 4 of the file has four errors):
# Validation failed:
# 4 validation error(s):
#   Row 4: 'date' must be YYYY-MM-DD, got 'bad-date'
#   Row 4: 'amount' must be positive, got -50.0
#   Row 4: 'type' must be 'debit' or 'credit', got 'unknown'
#   Row 4: 'description' cannot be empty
Advanced - Production-Grade API Client with Circuit Breaker, Retry, and Defensive Copying
Problem: Build a WeatherAPIClient class that: (1) validates all inputs defensively, (2) uses a circuit breaker to stop calling a failing API, (3) implements exponential backoff retry for transient errors, (4) returns defensive copies of its cached data, and (5) never exposes raw exceptions from the underlying HTTP library - always wraps them in domain exceptions.
Solution
import time
import random
import copy
import urllib.request
import urllib.error
import urllib.parse  # urllib.parse.quote is used when building the request URL
import json
from dataclasses import dataclass, field
from typing import Optional


# ── Domain exceptions ─────────────────────────────────────────────────────────
class WeatherClientError(Exception):
    """Base exception for WeatherAPIClient errors."""
    pass


class WeatherAPIUnavailableError(WeatherClientError):
    """The weather API is currently unavailable (circuit open)."""
    pass


class WeatherAPITimeoutError(WeatherClientError):
    """Request to weather API timed out."""

    def __init__(self, city: str, timeout: float):
        self.city = city
        self.timeout = timeout
        super().__init__(f"Weather API timed out after {timeout}s for city={city!r}")


class WeatherAPIResponseError(WeatherClientError):
    """Weather API returned an error response."""

    def __init__(self, city: str, status_code: int):
        self.city = city
        self.status_code = status_code
        super().__init__(f"Weather API returned HTTP {status_code} for city={city!r}")


class WeatherValidationError(WeatherClientError, ValueError):
    """Invalid input to the weather client."""
    pass


# ── Circuit breaker ───────────────────────────────────────────────────────────
@dataclass
class _CircuitBreaker:
    failure_threshold: int = 5
    cooldown_seconds: float = 30.0
    _failures: int = field(default=0, init=False)
    _state: str = field(default="closed", init=False)
    _opened_at: float = field(default=0.0, init=False)

    def is_available(self) -> bool:
        if self._state == "closed":
            return True
        if self._state == "open":
            if time.monotonic() - self._opened_at >= self.cooldown_seconds:
                self._state = "half-open"
                return True
            return False
        return True  # half-open: allow a probe (single-threaded use assumed)

    def on_success(self):
        self._failures = 0
        self._state = "closed"

    def on_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._state = "open"
            self._opened_at = time.monotonic()
# ── Main client ───────────────────────────────────────────────────────────────
class WeatherAPIClient:
    """Defensive HTTP client for the weather API.

    Features:
    - Input validation (type, range, format) before any network call
    - Circuit breaker: stops calling a failing API automatically
    - Exponential backoff retry for transient failures
    - Response caching with defensive copies on retrieval
    - Domain exception wrapping: never exposes urllib internals
    """
    VALID_UNITS = ("metric", "imperial", "standard")
    MAX_CITY_LENGTH = 100

    def __init__(
        self,
        api_key: str,
        base_url: str = "https://api.openweathermap.org/data/2.5",
        timeout_seconds: float = 5.0,
        max_retries: int = 3,
        circuit_failure_threshold: int = 5,
        circuit_cooldown_seconds: float = 30.0,
    ):
        # Input validation on construction
        if not isinstance(api_key, str) or not api_key.strip():
            raise WeatherValidationError("api_key must be a non-empty string")
        if timeout_seconds <= 0:
            raise WeatherValidationError(f"timeout_seconds must be positive, got {timeout_seconds}")
        if max_retries < 1:
            raise WeatherValidationError(f"max_retries must be >= 1, got {max_retries}")
        self._api_key = api_key
        self._base_url = base_url.rstrip("/")
        self._timeout = timeout_seconds
        self._max_retries = max_retries
        self._breaker = _CircuitBreaker(
            failure_threshold=circuit_failure_threshold,
            cooldown_seconds=circuit_cooldown_seconds,
        )
        # Cache: {"city:units" -> weather_dict}
        self._cache: dict = {}

    def _validate_city(self, city: str) -> str:
        """Validate and normalise city name. Returns stripped city."""
        if not isinstance(city, str):
            raise WeatherValidationError(
                f"city must be a string, got {type(city).__name__}"
            )
        city = city.strip()
        if not city:
            raise WeatherValidationError("city cannot be empty or whitespace")
        if len(city) > self.MAX_CITY_LENGTH:
            raise WeatherValidationError(
                f"city name too long ({len(city)} chars, max {self.MAX_CITY_LENGTH})"
            )
        return city

    def _validate_units(self, units: str) -> str:
        if units not in self.VALID_UNITS:
            raise WeatherValidationError(
                f"units must be one of {self.VALID_UNITS}, got {units!r}"
            )
        return units
    def _do_request(self, city: str, units: str) -> dict:
        """Make a single HTTP request. Raises domain exceptions on failure."""
        url = (
            f"{self._base_url}/weather"
            f"?q={urllib.parse.quote(city)}"
            f"&units={units}"
            f"&appid={self._api_key}"
        )
        try:
            with urllib.request.urlopen(url, timeout=self._timeout) as response:
                body = response.read().decode("utf-8")
                return json.loads(body)
        except TimeoutError as e:
            # Timeouts while reading the response surface here directly
            # (socket.timeout is an alias of TimeoutError on Python 3.10+).
            raise WeatherAPITimeoutError(city, self._timeout) from e
        except urllib.error.HTTPError as e:
            # Must come before URLError: HTTPError is a subclass of it
            raise WeatherAPIResponseError(city, e.code) from e
        except urllib.error.URLError as e:
            # Timeouts while connecting arrive wrapped in URLError
            if isinstance(e.reason, TimeoutError):
                raise WeatherAPITimeoutError(city, self._timeout) from e
            raise WeatherClientError(
                f"Network error fetching weather for {city!r}: {e.reason}"
            ) from e
        except (UnicodeDecodeError, json.JSONDecodeError) as e:
            # A malformed body must not leak a raw decoding exception
            raise WeatherClientError(
                f"Malformed response fetching weather for {city!r}"
            ) from e

    def _fetch_with_retry(self, city: str, units: str) -> dict:
        """Retry with exponential backoff on transient failures."""
        last_error = None
        transient = (WeatherAPITimeoutError, WeatherClientError)
        for attempt in range(1, self._max_retries + 1):
            try:
                return self._do_request(city, units)
            except WeatherAPIResponseError as e:
                # 4xx errors are not retryable
                if e.status_code < 500:
                    raise
                last_error = e
            except transient as e:
                last_error = e
            if attempt < self._max_retries:
                # Exponential backoff with jitter, capped at 30 seconds
                delay = (2 ** (attempt - 1)) * (0.5 + random.random() * 0.5)
                time.sleep(min(delay, 30.0))
        # The last underlying error stays reachable via __cause__
        raise WeatherClientError(
            f"All {self._max_retries} retries failed for city={city!r}"
        ) from last_error
    def get_weather(self, city: str, units: str = "metric") -> dict:
        """Fetch current weather for a city.

        Args:
            city: City name (non-empty string, max 100 chars).
            units: Unit system - 'metric', 'imperial', or 'standard'.

        Returns:
            Weather data dict (a defensive copy - callers can mutate safely).

        Raises:
            WeatherValidationError: Invalid arguments.
            WeatherAPIUnavailableError: Circuit breaker is open.
            WeatherAPIResponseError: API returned a non-retryable (4xx) status.
            WeatherClientError: All retries failed (the last timeout, 5xx, or
                network error is chained as __cause__), or the response
                could not be parsed.
        """
        # Validate inputs before touching the network
        city = self._validate_city(city)
        units = self._validate_units(units)

        # Check cache first - a cached answer needs no network call,
        # so it can be served even while the circuit is open
        cache_key = f"{city.lower()}:{units}"
        if cache_key in self._cache:
            # Return a defensive DEEP copy - callers cannot corrupt our cache
            return copy.deepcopy(self._cache[cache_key])

        # Check circuit breaker before attempting a network call
        if not self._breaker.is_available():
            raise WeatherAPIUnavailableError(
                f"Weather API circuit is OPEN - too many recent failures. "
                f"Try again in {self._breaker.cooldown_seconds}s."
            )

        # Fetch with retry
        try:
            data = self._fetch_with_retry(city, units)
        except WeatherAPIResponseError:
            # Only 4xx responses reach this point: the request was wrong,
            # the service itself is healthy, so the circuit is not penalised
            raise
        except WeatherClientError:
            self._breaker.on_failure()
            raise
        self._breaker.on_success()

        # Store a copy in cache - we keep our own reference, independent of caller
        self._cache[cache_key] = copy.deepcopy(data)
        # Return a copy to the caller - they get their own object
        return copy.deepcopy(data)

    def clear_cache(self) -> int:
        """Clear all cached responses. Returns number of entries cleared."""
        count = len(self._cache)
        self._cache.clear()
        return count

    @property
    def circuit_state(self) -> str:
        return self._breaker._state
# ── Demonstrate the defensive patterns ───────────────────────────────────────
client = WeatherAPIClient(api_key="test-key-123", timeout_seconds=5.0)

# Input validation fires immediately
try:
    client.get_weather("")
except WeatherValidationError as e:
    print(f"Validation: {e}")
    # Validation: city cannot be empty or whitespace

try:
    client.get_weather("London", units="kelvin")
except WeatherValidationError as e:
    print(f"Validation: {e}")
    # Validation: units must be one of ('metric', 'imperial', 'standard'), got 'kelvin'

try:
    client.get_weather(12345)
except WeatherValidationError as e:
    print(f"Validation: {e}")
    # Validation: city must be a string, got int

print(f"Circuit state: {client.circuit_state}")  # Circuit state: closed
Quick Reference
| Technique | Pattern | Use Case |
|---|---|---|
| Guard clause | if bad: raise at top of function | Flatten nested logic, validate early |
| Fail fast | Validate at point of data arrival | Prevent error propagation |
| Fail loudly | Always raise, never except: pass | Ensure errors are visible |
| Mutable default fix | def f(x=None): if x is None: x = [] | Any mutable default argument |
| Defensive copy in | self._data = list(external_input) | Prevent aliasing on received data |
| Defensive copy out | return list(self._data) | Prevent callers from corrupting internals |
| Deep copy | copy.deepcopy(obj) | Nested mutable structures |
| Null object | Return NullUser() instead of None | Eliminate None propagation in callers |
| Empty collection | Return [] or {} not None | Safe for iteration without None check |
| Circuit breaker | State machine wrapping external calls | Stop hammering a failing service |
| Explicit timeout | requests.get(url, timeout=(3, 10)) | Always on external HTTP/DB calls |
| Retry + backoff | Exponential delay between retries | Transient network/DB failures |
| Boundary validation | Full type + range + format + business rule checks | Any external data entry point |
| Trust internally | No re-validation inside private helpers | After boundary validation, trust typed objects |
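The three copying rows of the table can be seen side by side in a short sketch. The Playlist class and the sample data are hypothetical, chosen only to show copy in, copy out, and the case where a shallow copy is not enough.

```python
import copy


class Playlist:
    """Hypothetical class used only to illustrate copy in / copy out."""

    def __init__(self, tracks):
        # Copy in: later changes to the caller's list do not reach us
        self._tracks = list(tracks)

    def tracks(self):
        # Copy out: callers get their own list to mutate
        return list(self._tracks)


source = ["intro", "verse"]
playlist = Playlist(source)
source.append("injected")            # caller mutates their own list
playlist.tracks().append("outro")    # caller mutates the returned copy
print(playlist.tracks())             # ['intro', 'verse']

# A shallow copy still shares nested objects; deepcopy does not
nested = {"london": {"temp": 11}}
shallow = dict(nested)
deep = copy.deepcopy(nested)
nested["london"]["temp"] = 99
print(shallow["london"]["temp"])     # 99  (inner dict is shared)
print(deep["london"]["temp"])        # 11  (fully independent)
```

list(...) and dict(...) are enough when the elements are immutable (strings, numbers, tuples of those). Once the container holds other mutable containers, copy.deepcopy is the safer default, at the cost of copying more data.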
Key Takeaways
- Defensive programming is about knowing where the trust boundary is and validating aggressively there - not re-validating everywhere
- Guard clauses eliminate deep nesting by handling bad cases at the top of a function and keeping the happy path flat and readable
- Fail fast means detecting errors at the point they occur; fail loudly means raising specific exceptions with clear messages, never swallowing errors silently
- The mutable default argument trap (def f(items=[])) is one of Python's most notorious bugs - always use None as a sentinel and create fresh containers inside the function
- Defensive copying prevents aliasing bugs: copy mutable inputs on the way in and copy mutable outputs on the way out so callers cannot corrupt internal state
- The Null Object pattern eliminates None propagation by returning safe, empty-but-usable objects instead of None when an entity is absent
- External service calls always need timeouts (never block forever), retry with backoff (handle transient failures), and ideally a circuit breaker (stop cascading failures from a degraded service)
