
Static Analysis and Type Systems

The Shape Bug That Deployed to Production

The model was working in development. Unit tests passed. The integration test passed. The staging deployment ran without errors. The first day in production, a small fraction of user requests began returning garbage predictions - not exceptions, not 500 errors, just silent wrong answers. It took four hours to find the root cause.

A data preprocessing step expected tensors of shape (batch_size, sequence_length, features) - three dimensions. A refactoring six weeks earlier had changed the upstream pipeline to produce shape (batch_size, features, sequence_length) - same data, wrong axis order. The model consumed both shapes without raising an exception. When multiplied against the weight matrix, the shapes broadcast incorrectly through a fortunate numeric coincidence, producing output that was within a plausible range but systematically wrong.
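
A minimal sketch of the failure mode, using NumPy and made-up shapes rather than the original pipeline: an axis swap sails through a last-axis normalization without any exception - it just normalizes the wrong axis.

# axis_swap_sketch.py - illustrative only, shapes are hypothetical
import numpy as np

batch, seq_len, features = 4, 16, 8
correct = np.random.rand(batch, seq_len, features)   # (batch, seq, features)
swapped = correct.transpose(0, 2, 1)                  # (batch, features, seq)

# Normalizing over the last axis works for any last-axis size, so the swapped
# layout flows through without an exception - the numbers are just wrong.
norm_ok = (correct - correct.mean(-1, keepdims=True)) / correct.std(-1, keepdims=True)
norm_bad = (swapped - swapped.mean(-1, keepdims=True)) / swapped.std(-1, keepdims=True)
print(norm_ok.shape, norm_bad.shape)  # (4, 16, 8) (4, 8, 16) - both "work"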

This is not a Python-specific problem, but Python makes it worse than it needs to be. A statically typed language like Rust or Go would at least catch a changed function signature at compile time, and shape-aware type annotations can catch the swapped axes themselves. Python, with no type annotations and no shape checking, passes the tensor through silently.

The fix took twenty minutes. Four hours of debugging, twenty minutes of fixing. The real cost was the silently wrong predictions that had already gone out. And all of it was preventable.

Static type checking and shape annotations exist precisely to catch this class of bug before it reaches production. Not through elaborate testing - through the type checker reading your code and saying "this shape does not match." This lesson covers the full Python type safety stack: from basic type hints to mypy strict mode, pydantic v2 validation, Protocol types, and jaxtyping for tensor shape annotations.


Why This Exists - The Dynamic Typing Tax

Python's dynamic typing is deliberate. Guido van Rossum designed Python for rapid iteration, exploration, and readability. For scripting, data exploration, and prototyping, dynamic typing is a genuine advantage: you write less boilerplate, you can experiment with different types without declaration overhead, and you get flexible, generic code naturally.

But at scale, dynamic typing accumulates a tax. Every function signature is an implicit contract. Without annotations, that contract is undocumented: callers must read the function body or the docstring to understand what types are expected, what is returned, and what exceptions might be raised. In a codebase with 200,000 lines and 15 engineers, undocumented contracts generate bugs.

The specific bugs that dynamic typing enables:

  • Signature drift: a function signature changes but callers are not updated (no compile error)
  • Null propagation: None returned from one function is passed to another that does not accept it
  • Type narrowing failures: code assumes a value is a list but it might be None
  • Shape errors in ML: tensors passed with wrong dimensions, wrong dtypes, wrong device

PEP 484, proposed in 2014 and accepted for Python 3.5 in 2015, introduced optional type hints to Python. They are a description of intent - Python does not enforce them at runtime - but tools like mypy and pyright read them and report violations. The key insight of PEP 484 was that type annotations could be entirely optional and backward-compatible, enabling gradual adoption in existing codebases.
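
A small illustration of such a contract being checked (the function names are illustrative): with annotations in place, mypy flags the null-propagation bug from the list above before the code ever runs.

# null_propagation.py - illustrative example
def load_checkpoint_path(run_id: str) -> str | None:
    # Returns None when the run has no checkpoint yet
    return None

def restore(path: str) -> None:
    print(f"restoring from {path}")

# mypy flags this call: the argument is "str | None" but restore() expects "str"
restore(load_checkpoint_path("run-42"))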


Historical Context - From PEP 484 to the Modern Type Stack

The Python type system has evolved rapidly through a series of PEPs:

  • PEP 484 (2014): Introduced typing module with List, Dict, Optional, Union, Callable
  • PEP 526 (2016): Variable annotations (x: int = 5 instead of comments)
  • PEP 544 (2017): Protocols - structural subtyping (interfaces without inheritance)
  • PEP 563 (2017): Postponed evaluation of annotations (from __future__ import annotations)
  • PEP 589 (2019): TypedDict - typed dictionaries
  • PEP 612 (2020): ParamSpec - preserving parameter types through decorators
  • PEP 646 (2022): TypeVarTuple - variadic generics (important for tensor shapes)
  • PEP 673 (2022): Self type - cleaner return type for constructors and method chaining

The tooling side evolved in parallel. mypy (started by Jukka Lehtosalo in 2012, later developed at Dropbox) was the first mainstream Python type checker. pyright (Microsoft, 2019) arrived with faster incremental checking and became the engine behind Pylance in VS Code. Pydantic evolved from v1 (pure-Python validation) to v2 (Rust-powered validation core, roughly 5-50x faster). ruff (Astral, 2022) rewrote Python linting in Rust, making it 10-100x faster than flake8 while unifying many tools into one binary.


Python Type Hints - The Foundation

Type hints annotate function signatures and variables. The Python runtime ignores them; type checkers read them.

Basic Annotations

# basic_types.py
from typing import Optional, Union

# Function annotations
def compute_similarity(
    query: list[float],
    candidate: list[float],
    metric: str = "cosine",
) -> float:
    """Compute similarity between two vectors."""
    ...

# Variable annotations
learning_rate: float = 1e-3
model_name: str = "bert-base-uncased"
num_layers: int = 12
use_gpu: bool = True

# Optional type (value OR None)
checkpoint_path: Optional[str] = None # equivalent to str | None

# Union type (one of several types)
embedding: Union[list[float], None] = None # old style
embedding_new: list[float] | None = None # Python 3.10+ style (preferred)

Container Types

# Python 3.9+: lowercase generics (no need to import from typing)
from typing import Any

# Lists, dicts, tuples with element types
scores: list[float] = [0.9, 0.1, 0.85]
config: dict[str, Any] = {"lr": 1e-3, "epochs": 10}
coordinates: tuple[float, float] = (1.0, 2.0)

# Variable-length tuples
points: tuple[float, ...] = (1.0, 2.0, 3.0) # any number of floats

# Nested containers
batch: list[dict[str, list[float]]] = []

# Sets and frozensets
allowed_metrics: set[str] = {"cosine", "euclidean", "dot"}
immutable_tags: frozenset[str] = frozenset({"ml", "nlp"})

TypeVar and Generics

# generics.py
from typing import TypeVar, Generic, Sequence

T = TypeVar("T")
NumberT = TypeVar("NumberT", int, float)

def first_element(sequence: Sequence[T]) -> T:
    """Returns first element, preserving the exact type."""
    return sequence[0]

# Usage - type checker infers return type from input
x: int = first_element([1, 2, 3])        # x is int
y: str = first_element(["a", "b", "c"])  # y is str

# Generic class
class Stack(Generic[T]):
    def __init__(self) -> None:
        self._items: list[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

int_stack: Stack[int] = Stack()
int_stack.push(42)
value: int = int_stack.pop()

mypy - The Standard Type Checker

mypy reads your type annotations and reports type errors without running the code.

Installing and Running

pip install mypy

# Basic check
mypy my_module.py

# Check an entire package
mypy src/

# Strict mode (maximum checking)
mypy --strict src/

# With configuration file
mypy --config-file mypy.ini src/

mypy Configuration

# mypy.ini
# Note: configparser does not support inline comments, so explanations
# go on their own lines.
[mypy]
python_version = 3.11
# enable all strict checks
strict = True
# warn when a typed function returns Any
warn_return_any = True
warn_unused_configs = True
# all functions must have type annotations
disallow_untyped_defs = True
# partial annotations are not allowed
disallow_incomplete_defs = True
# type-check even unannotated functions
check_untyped_defs = True
disallow_untyped_decorators = True
no_implicit_optional = True
warn_redundant_casts = True
warn_unused_ignores = True
ignore_missing_imports = False

# Per-module overrides (for third-party libs without stubs)
[mypy-numpy.*]
ignore_missing_imports = True

[mypy-torch.*]
ignore_missing_imports = True

[mypy-sklearn.*]
ignore_missing_imports = True

mypy vs pyright Comparison

| Feature              | mypy                                      | pyright                                    |
|----------------------|-------------------------------------------|--------------------------------------------|
| Speed                | Moderate (can be slow on large codebases) | Fast (written in TypeScript, runs on Node) |
| Standards compliance | Excellent                                 | Excellent                                  |
| VS Code integration  | Via extension                             | Native (powers Pylance)                    |
| Inference quality    | Good                                      | Slightly better                            |
| Plugin ecosystem     | Good (pydantic plugin, etc.)              | Growing                                    |
| CI usage             | Very common                               | Growing                                    |

For new projects: use pyright in VS Code (via Pylance) for instant feedback, and run mypy in CI for the final gate.


Protocols - Structural Subtyping

Protocols (PEP 544) enable structural subtyping: a class satisfies a Protocol if it has the right methods and attributes, regardless of inheritance. This is Python's version of Go interfaces or Rust traits.

# protocols.py
from typing import Protocol, runtime_checkable

# Define an interface structurally
class Embedder(Protocol):
    def encode(self, text: str) -> list[float]: ...
    def batch_encode(self, texts: list[str]) -> list[list[float]]: ...

    @property
    def embedding_dim(self) -> int: ...

# Any class with these methods satisfies the Protocol
# NO inheritance required

class SentenceTransformerEmbedder:
    """Wraps sentence-transformers library."""

    def __init__(self, model_name: str) -> None:
        self._model_name = model_name
        self._dim = 768

    def encode(self, text: str) -> list[float]:
        # actual implementation would call sentence-transformers
        return [0.0] * self._dim

    def batch_encode(self, texts: list[str]) -> list[list[float]]:
        return [self.encode(t) for t in texts]

    @property
    def embedding_dim(self) -> int:
        return self._dim

class OpenAIEmbedder:
    """Wraps OpenAI embeddings API."""

    def __init__(self, model: str = "text-embedding-ada-002") -> None:
        self._model = model

    def encode(self, text: str) -> list[float]:
        # actual implementation would call the OpenAI embeddings API
        return [0.0] * 1536

    def batch_encode(self, texts: list[str]) -> list[list[float]]:
        return [self.encode(t) for t in texts]

    @property
    def embedding_dim(self) -> int:
        return 1536

# Both satisfy Embedder without inheriting from anything
def build_index(embedder: Embedder, documents: list[str]) -> list[list[float]]:
    """Works with any Embedder implementation."""
    return embedder.batch_encode(documents)

# Type checker accepts both:
embedder_a: Embedder = SentenceTransformerEmbedder("all-MiniLM-L6-v2")
embedder_b: Embedder = OpenAIEmbedder()

# runtime_checkable enables isinstance() checks
@runtime_checkable
class HasEncode(Protocol):
    def encode(self, text: str) -> list[float]: ...

print(isinstance(embedder_a, HasEncode))  # True at runtime

Protocols are particularly valuable in ML systems where you want to swap model backends, data loaders, or optimizers without creating inheritance hierarchies.


TypedDict - Typed Dictionaries

TypedDict lets you specify the exact keys and value types of a dictionary. This is common in ML config handling and API response parsing.

# typed_dicts.py
from typing import TypedDict, Required, NotRequired

class ModelConfig(TypedDict):
    model_name: str
    num_layers: int
    hidden_dim: int
    dropout: float
    learning_rate: float

# With required/optional fields (Python 3.11+)
class TrainingConfig(TypedDict, total=False):
    # total=False makes all fields optional by default
    epochs: int
    batch_size: int
    gradient_clip: float

class FullConfig(TypedDict):
    model: ModelConfig
    training: TrainingConfig
    dataset_path: Required[str]   # explicitly required
    output_dir: NotRequired[str]  # explicitly optional

# Usage - type checker validates dict literal shapes
config: FullConfig = {
    "model": {
        "model_name": "bert-base",
        "num_layers": 12,
        "hidden_dim": 768,
        "dropout": 0.1,
        "learning_rate": 2e-5,
    },
    "training": {
        "epochs": 10,
        "batch_size": 32,
    },
    "dataset_path": "/data/imdb",
}

# Type checker catches wrong types:
# config["model"]["num_layers"] = "twelve" # Error: expected int, got str

Literal Types - Narrowing Allowed Values

# literal_types.py
from typing import TYPE_CHECKING, Literal, overload

if TYPE_CHECKING:
    import torch

# Only specific values allowed
ActivationType = Literal["relu", "gelu", "swish", "tanh"]
DeviceType = Literal["cpu", "cuda", "mps"]
PrecisionType = Literal["float32", "float16", "bfloat16"]

def create_model(
    activation: ActivationType = "relu",
    device: DeviceType = "cuda",
    precision: PrecisionType = "float32",
) -> None:
    ...

# Type checker catches invalid literals:
# create_model(activation="sigmoid")  # Error: not a valid ActivationType

# @overload lets you specialize the signature per literal value; here the
# overloads document the accepted device literals while sharing one implementation
@overload
def get_device_tensor(device: Literal["cpu"]) -> "torch.Tensor": ...
@overload
def get_device_tensor(device: Literal["cuda"]) -> "torch.Tensor": ...
@overload
def get_device_tensor(device: str) -> "torch.Tensor": ...

def get_device_tensor(device: str) -> "torch.Tensor":
    import torch
    return torch.zeros(1, device=device)

Pydantic v2 - Runtime Type Validation

Type hints are checked at static analysis time (before running). Pydantic validates at runtime - when you actually receive data from an API, a config file, or user input.

# pydantic_config.py
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, field_validator, model_validator

# Define the config schema
class DatabaseConfig(BaseModel):
    host: str
    port: int = Field(default=5432, ge=1, le=65535)
    database: str
    username: str
    password: str = Field(min_length=8)

    model_config = ConfigDict(
        frozen=True,      # make instances immutable
        extra="forbid",   # raise error on unexpected fields
    )

class ModelConfig(BaseModel):
    name: str = Field(min_length=1)
    hidden_dim: int = Field(ge=64, le=8192)
    num_layers: int = Field(ge=1, le=128)
    dropout: float = Field(ge=0.0, le=1.0)
    learning_rate: float = Field(gt=0.0, lt=1.0)
    device: Literal["cpu", "cuda", "mps"] = "cuda"

    @field_validator("hidden_dim")
    @classmethod
    def hidden_dim_must_be_power_of_2(cls, v: int) -> int:
        if v & (v - 1) != 0:
            raise ValueError(f"hidden_dim must be a power of 2, got {v}")
        return v

    @model_validator(mode="after")
    def validate_dropout_for_depth(self) -> "ModelConfig":
        if self.dropout > 0.5 and self.num_layers < 4:
            raise ValueError(
                "High dropout (>0.5) on shallow networks (<4 layers) "
                "typically prevents convergence"
            )
        return self

class TrainingConfig(BaseModel):
    model: ModelConfig
    database: DatabaseConfig
    output_dir: str = Field(default="./checkpoints")
    max_epochs: int = Field(default=100, ge=1)
    early_stopping_patience: int = Field(default=10, ge=1)

# Validation happens on construction
try:
    config = TrainingConfig.model_validate({
        "model": {
            "name": "transformer",
            "hidden_dim": 512,   # power of 2 - ok
            "num_layers": 6,
            "dropout": 0.1,
            "learning_rate": 1e-4,
        },
        "database": {
            "host": "localhost",
            "database": "training_db",
            "username": "admin",
            "password": "securepassword",
        },
    })
    print(f"Config valid: {config.model.name}")
except Exception as e:
    print(f"Validation error: {e}")

# Load from a JSON file
import json
with open("config.json") as f:
    config = TrainingConfig.model_validate_json(f.read())

# Serialize back to dict or JSON
config_dict = config.model_dump()
config_json = config.model_dump_json(indent=2)

Pydantic v2 Performance

Pydantic v2's validation engine (pydantic-core) is written in Rust. Illustrative throughput figures (exact numbers depend on model complexity and hardware):

Pydantic v1 validate (pure Python): ~10,000 validations/second
Pydantic v2 validate (Rust core): ~150,000 validations/second
Speedup: ~15x

For inference services that validate every incoming request, this matters.
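
If you want numbers for your own schemas, a quick micro-benchmark is enough. A minimal sketch, using a hypothetical Request model (figures will vary by machine):

# bench_validation.py - minimal throughput sketch
import time

from pydantic import BaseModel

class Request(BaseModel):
    user_id: int
    query: str
    top_k: int = 10

payload = {"user_id": 42, "query": "nearest neighbors", "top_k": 5}

n = 100_000
start = time.perf_counter()
for _ in range(n):
    Request.model_validate(payload)
elapsed = time.perf_counter() - start
print(f"{n / elapsed:,.0f} validations/second")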


Type Narrowing and isinstance Guards

Type narrowing is the type checker's ability to refine the type of a variable after a conditional check.

# narrowing.py
from typing import Union

def process_config_value(
    value: Union[str, int, float, None],
) -> str:
    if value is None:
        return "default"
    # After the None check, the type checker knows value is str | int | float

    if isinstance(value, str):
        return value.upper()   # type checker knows: value is str here
    elif isinstance(value, int):
        return str(value)      # type checker knows: value is int here
    else:
        return f"{value:.4f}"  # type checker knows: value is float here

# Type narrowing with TypeGuard
from typing import TypeGuard

import numpy as np
import torch

def is_torch_tensor(array: Union[torch.Tensor, np.ndarray]) -> TypeGuard[torch.Tensor]:
    """TypeGuard tells the type checker that a True result narrows the type."""
    return isinstance(array, torch.Tensor)

def process_array(array: Union[torch.Tensor, np.ndarray]) -> None:
    if is_torch_tensor(array):
        # Type checker treats array as torch.Tensor in this branch,
        # so tensor-only attributes like .device are valid here
        print(array.device)

ParamSpec - Typing Decorators That Preserve Signatures

ParamSpec (PEP 612) solves the long-standing problem of decorators losing type information:

# paramspec_decorator.py
from __future__ import annotations

import logging
import time
from functools import wraps
from typing import TYPE_CHECKING, Callable, ParamSpec, TypeVar

if TYPE_CHECKING:
    import torch

P = ParamSpec("P")  # captures the parameter specification
R = TypeVar("R")    # captures the return type

def log_and_time(func: Callable[P, R]) -> Callable[P, R]:
    """Decorator that preserves the full signature of the decorated function."""
    @wraps(func)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        logger = logging.getLogger(func.__module__)
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            logger.info(f"{func.__name__} completed in {elapsed:.3f}s")
            return result
        except Exception as e:
            logger.error(f"{func.__name__} failed: {e}")
            raise
    return wrapper

@log_and_time
def train_epoch(
    model: torch.nn.Module,
    dataloader: torch.utils.data.DataLoader,
    optimizer: torch.optim.Optimizer,
    device: str = "cuda",
) -> float:
    """Returns average loss for the epoch."""
    total_loss = 0.0
    # ... training loop ...
    return total_loss

# With ParamSpec, the type checker knows train_epoch still has these parameters:
# train_epoch(model, dataloader, optimizer, device="cpu")  # correct
# train_epoch(model, device="cpu")  # Error: missing dataloader and optimizer

Without ParamSpec, the decorated function's type signature degenerates to (*args: Any, **kwargs: Any) -> Any, losing all type safety.
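
For contrast, a minimal sketch of the naive approach: typing the decorator with Callable[..., R] keeps the return type but erases the parameters, which is exactly the degradation described above.

from typing import Any, Callable, TypeVar

R = TypeVar("R")

def naive_decorator(func: Callable[..., R]) -> Callable[..., R]:
    # Callable[..., R] preserves the return type but erases parameter information,
    # so the type checker accepts any arguments on the wrapped function.
    def wrapper(*args: Any, **kwargs: Any) -> R:
        return func(*args, **kwargs)
    return wrapper

@naive_decorator
def scale(value: float, factor: float = 2.0) -> float:
    return value * factor

# mypy accepts this call even though the argument types are wrong:
# scale("oops", factor="nope")  # no error reported - parameters were erased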


dataclasses vs attrs vs pydantic

# comparison.py
from dataclasses import dataclass, field
import attrs
from pydantic import BaseModel, Field

# dataclasses: stdlib, zero dependencies, no validation
@dataclass
class DataclassConfig:
    model_name: str
    num_layers: int = 6
    hidden_dim: int = 512
    tags: list[str] = field(default_factory=list)
    # No runtime validation - num_layers="six" would be accepted

# attrs: more features than dataclasses (slots, validators, converters)
@attrs.define
class AttrsConfig:
    model_name: str = attrs.field()
    num_layers: int = attrs.field(
        default=6,
        validator=attrs.validators.instance_of(int),
    )
    hidden_dim: int = attrs.field(
        default=512,
        validator=attrs.validators.and_(
            attrs.validators.instance_of(int),
            attrs.validators.gt(0),
        ),
    )

# pydantic: fullest validation, JSON serialization, schema generation
class PydanticConfig(BaseModel):
    model_name: str
    num_layers: int = Field(default=6, ge=1, le=128)
    hidden_dim: int = Field(default=512, ge=64)
    # Full validation, JSON round-trip, OpenAPI schema generation

Decision guide:

  • Use dataclasses for simple data containers where you just need structured storage, no validation (the sketch after this list shows the runtime difference)
  • Use attrs when you want validation without Pydantic's HTTP/JSON feature set
  • Use pydantic for anything that crosses a boundary: config files, API requests/responses, ML experiment configs
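
The practical difference shows up at construction time. A minimal sketch, reusing DataclassConfig and PydanticConfig from the comparison above:

from pydantic import ValidationError

# dataclasses perform no runtime checks - the bad value slips through
silent = DataclassConfig(model_name="bert", num_layers="six")  # type: ignore[arg-type]
print(type(silent.num_layers))  # <class 'str'> - wrong type stored silently

# pydantic rejects the same input at the boundary
try:
    PydanticConfig(model_name="bert", num_layers="six")  # type: ignore[arg-type]
except ValidationError as e:
    print(e.error_count(), "validation error(s)")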

jaxtyping - Tensor Shape Annotations

jaxtyping brings shape and dtype annotations for NumPy, PyTorch, and JAX arrays into the type system. Combined with a runtime checker such as beartype (applied through the jaxtyped decorator), it enforces them at runtime.

# tensor_shapes.py
import math

import torch
from beartype import beartype
from jaxtyping import Bool, Float, Int, jaxtyped

# Shape annotations with jaxtyping:
# Float[Tensor, "batch seq features"] means:
# - dtype is float
# - shape has 3 dims named batch, seq, features

@jaxtyped(typechecker=beartype)  # enables runtime shape checking
def scaled_dot_product_attention(
    query: Float[torch.Tensor, "batch heads seq_q d_k"],
    key: Float[torch.Tensor, "batch heads seq_k d_k"],
    value: Float[torch.Tensor, "batch heads seq_k d_v"],
    mask: Bool[torch.Tensor, "batch 1 seq_q seq_k"] | None = None,
) -> Float[torch.Tensor, "batch heads seq_q d_v"]:
    """Standard scaled dot-product attention with shape checking."""
    d_k = query.shape[-1]
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)

    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))

    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, value)

# With runtime checking enabled, mismatched shapes raise an error:
# batch, heads, seq, d_k, d_v = 2, 8, 128, 64, 64
# q = torch.randn(batch, heads, seq, d_k)
# k = torch.randn(batch, heads, seq, d_k)
# v = torch.randn(batch, heads, seq, d_v)
# result = scaled_dot_product_attention(q, k, v)      # works
# result = scaled_dot_product_attention(q, k, v[:1])  # raises: "batch" dims disagree

# Named dimensions are checked for consistency across the arguments of one call
@jaxtyped(typechecker=beartype)
def compute_loss(
    logits: Float[torch.Tensor, "batch num_classes"],
    targets: Int[torch.Tensor, "batch"],
) -> Float[torch.Tensor, ""]:  # scalar output
    return torch.nn.functional.cross_entropy(logits, targets)

beartype for Runtime Checking

# beartype_demo.py
from beartype import beartype
from beartype.roar import BeartypeCallHintParamViolation

@beartype
def normalize_embeddings(
    embeddings: list[list[float]],
    epsilon: float = 1e-8,
) -> list[list[float]]:
    import math
    result = []
    for emb in embeddings:
        norm = math.sqrt(sum(x**2 for x in emb))
        result.append([x / (norm + epsilon) for x in emb])
    return result

# Correct call - works fine
normalize_embeddings([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]])

# Wrong type - raises BeartypeCallHintParamViolation at runtime
try:
    normalize_embeddings("not a list")
except BeartypeCallHintParamViolation as e:
    print(f"Type error caught: {e}")

ruff - The Modern Python Linter

ruff is a Python linter and formatter written in Rust. It replaces flake8, pylint, isort, pyupgrade, and several other tools with a single fast binary.

pip install ruff

# Lint a file or directory
ruff check src/

# Auto-fix issues where possible
ruff check --fix src/

# Format code (like Black, but faster)
ruff format src/

# Check specific rules
ruff check --select E,F,W src/ # pycodestyle, pyflakes

ruff Configuration

# pyproject.toml
[tool.ruff]
target-version = "py311"
line-length = 100
indent-width = 4

[tool.ruff.lint]
# Enable specific rule sets
select = [
"E", # pycodestyle errors
"F", # pyflakes (unused imports, undefined names)
"W", # pycodestyle warnings
"I", # isort (import sorting)
"N", # pep8 naming conventions
"UP", # pyupgrade (modernize Python syntax)
"B", # flake8-bugbear (likely bugs)
"C4", # flake8-comprehensions (list/dict comprehension improvements)
"ANN", # flake8-annotations (require type annotations)
"SIM", # flake8-simplify (simplifiable code patterns)
"RUF", # ruff-specific rules
]
ignore = [
"ANN101", # missing type annotation for self
"ANN102", # missing type annotation for cls
"E501", # line too long (handled by formatter)
]
fixable = ["I", "UP", "C4", "SIM"] # auto-fixable categories

[tool.ruff.lint.isort]
known-first-party = ["my_package"]
force-sort-within-sections = true

[tool.ruff.format]
quote-style = "double"
indent-style = "space"
docstring-code-format = true

Speed Comparison

ruff check (10,000 files): ~0.4 seconds
flake8 (10,000 files): ~30-60 seconds
pylint (10,000 files): ~120-300 seconds

ruff achieves 50-100x speedup over flake8 through parallel processing and compiled Rust implementation.


Pre-commit Hooks for Type Checking

Pre-commit hooks run your type checker and linter before every commit, preventing type errors from entering the codebase.

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.3.0
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.8.0
    hooks:
      - id: mypy
        additional_dependencies:
          - pydantic>=2.0
          - types-PyYAML
          - types-requests
          - torch  # torch has inline type hints in recent versions
        args: [--strict, --ignore-missing-imports]

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
        args: [--maxkb=500]

# Install pre-commit
pip install pre-commit

# Install the hooks
pre-commit install

# Run manually on all files
pre-commit run --all-files

# Run only mypy
pre-commit run mypy --all-files

Typed ML Pipeline - Complete Example

# typed_pipeline.py
"""A fully typed ML inference pipeline demonstrating mypy strict compliance."""
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, Protocol, TypeVar, runtime_checkable

import numpy as np
from jaxtyping import Float
from pydantic import BaseModel, Field

# --- Configuration ---

class EmbeddingModelConfig(BaseModel):
    model_path: str
    embedding_dim: int = Field(ge=64, le=4096)
    max_sequence_length: int = Field(default=512, ge=1, le=8192)
    device: str = "cpu"
    batch_size: int = Field(default=32, ge=1)

class ClassifierConfig(BaseModel):
    num_classes: int = Field(ge=2)
    threshold: float = Field(default=0.5, gt=0.0, lt=1.0)

class PipelineConfig(BaseModel):
    embedder: EmbeddingModelConfig
    classifier: ClassifierConfig

# --- Protocols ---

@runtime_checkable
class TextEmbedder(Protocol):
    """Protocol for any text embedding model."""

    def embed(
        self,
        texts: list[str],
    ) -> Float[np.ndarray, "batch embedding_dim"]:
        ...

    @property
    def embedding_dim(self) -> int:
        ...

@runtime_checkable
class BinaryClassifier(Protocol):
    """Protocol for any binary classification model."""

    def predict_proba(
        self,
        embeddings: Float[np.ndarray, "batch embedding_dim"],
    ) -> Float[np.ndarray, "batch 2"]:
        ...

# --- TypeVars for the generic pipeline ---

EmbedderT = TypeVar("EmbedderT", bound=TextEmbedder)
ClassifierT = TypeVar("ClassifierT", bound=BinaryClassifier)

@dataclass
class PredictionResult:
    text: str
    label: int
    confidence: float
    embedding: Float[np.ndarray, " embedding_dim"]

class InferencePipeline(Generic[EmbedderT, ClassifierT]):
    """Generic typed pipeline that works with any embedder/classifier pair."""

    def __init__(
        self,
        embedder: EmbedderT,
        classifier: ClassifierT,
        config: PipelineConfig,
    ) -> None:
        self._embedder = embedder
        self._classifier = classifier
        self._config = config

    def predict(self, texts: list[str]) -> list[PredictionResult]:
        """Run inference on a list of texts."""
        embeddings: Float[np.ndarray, "batch embedding_dim"] = (
            self._embedder.embed(texts)
        )
        probabilities: Float[np.ndarray, "batch 2"] = (
            self._classifier.predict_proba(embeddings)
        )

        results: list[PredictionResult] = []
        for i, text in enumerate(texts):
            positive_prob = float(probabilities[i, 1])
            label = int(positive_prob >= self._config.classifier.threshold)
            results.append(PredictionResult(
                text=text,
                label=label,
                confidence=positive_prob,
                embedding=embeddings[i],
            ))

        return results

    @classmethod
    def from_config(
        cls,
        config: PipelineConfig,
        embedder: EmbedderT,
        classifier: ClassifierT,
    ) -> InferencePipeline[EmbedderT, ClassifierT]:
        return cls(embedder, classifier, config)

mypy Strict Mode Setup

# Check compliance level by level:

# Level 1: catch obvious errors
mypy --ignore-missing-imports src/

# Level 2: require annotations on public APIs
mypy --disallow-untyped-defs --ignore-missing-imports src/

# Level 3: full strict mode (the goal for production ML code)
mypy --strict src/

# Suppress specific errors with inline comments (use sparingly)
result = some_untyped_library.do_thing() # type: ignore[no-any-return]

Production Engineering Notes

Incremental Adoption Strategy

Adding types to an existing codebase works best incrementally:

  1. Start with mypy --ignore-missing-imports and fix the obvious errors it reports
  2. Add --disallow-untyped-defs to the most critical modules first
  3. Add a CI check that fails if new files lack annotations
  4. Use # type: ignore for legacy code as a tracking mechanism, not a permanent fix (see the helper sketch after this list)
  5. Prioritize annotating the public API surface first (functions called from tests or external code)
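
To keep step 4 honest, track how many suppressions remain. A minimal sketch of a hypothetical helper script (not part of mypy) that counts # type: ignore comments per file:

# count_type_ignores.py - hypothetical helper for tracking suppression debt
from collections import Counter
from pathlib import Path

def count_ignores(root: str = "src") -> Counter[str]:
    counts: Counter[str] = Counter()
    for path in Path(root).rglob("*.py"):
        for line in path.read_text().splitlines():
            if "# type: ignore" in line:
                counts[str(path)] += 1
    return counts

if __name__ == "__main__":
    for module, n in count_ignores().most_common(10):
        print(f"{n:4d}  {module}")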

Type Stubs for Third-Party Libraries

Many libraries do not ship type stubs. The typeshed project maintains stubs for common libraries. Install additional stubs with:

pip install types-requests types-PyYAML types-redis types-boto3

For libraries with no stubs (older scikit-learn versions, some audio libraries):

# mypy.ini
[mypy-sklearn.*]
ignore_missing_imports = True

[mypy-librosa.*]
ignore_missing_imports = True

Type Checking in CI

# .github/workflows/type-check.yml
name: Type Check
on: [push, pull_request]

jobs:
  mypy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -e ".[dev]"
      - run: mypy --strict src/
        continue-on-error: false  # fail the PR on type errors

  ruff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install ruff
      - run: ruff check src/
      - run: ruff format --check src/

Common Mistakes

:::danger Using Any as a Type Escape Hatch

# WRONG - Any defeats the purpose of typing
from typing import Any

def process_data(data: Any) -> Any:
    return data["embeddings"]  # type checker cannot verify this

# RIGHT - be specific, use Union if needed
def process_data(
    data: dict[str, list[float]],
) -> list[float]:
    return data["embeddings"]

# If a type is genuinely unknown, document why:
def parse_json_response(response: str) -> dict[str, Any]:  # noqa: ANN401
    """External API response - shape validated by Pydantic after this."""
    import json
    return json.loads(response)

Any is infectious - once a value is typed as Any, operations on it return Any, spreading the loss of type information. Use it only at true boundaries (JSON parsing, OS interfaces) and validate immediately with Pydantic. :::

:::danger Annotating Without Checking (Type Theater)

# WRONG - annotations without running mypy are documentation, not type checking
def embed_texts(texts: list[str], model: "SomeModel") -> "list[float]":
    # These annotations will never be checked if mypy is not in CI
    ...

# RIGHT - type annotations are only useful if they are checked
# Add mypy to your CI pipeline and treat type errors as build failures
# A failing mypy check should block a merge request

Type annotations without a type checker are comments that drift. They give a false sense of safety. The annotations need to be verified by mypy or pyright on every commit. :::

:::warning TypedDict Keys Are Not Validated at Runtime

from typing import TypedDict

class Config(TypedDict):
    learning_rate: float
    num_epochs: int

# This passes at runtime - TypedDict has no runtime enforcement
config: Config = {"learning_rate": 1e-3, "num_epochs": 10, "extra_key": "surprise"}

# For runtime validation, use pydantic BaseModel, not TypedDict
# TypedDict is purely a static analysis hint

:::
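
For contrast, a minimal sketch of the same config as a Pydantic model; with extra="forbid" (an assumption about the desired strictness), the unexpected key is rejected at runtime:

from pydantic import BaseModel, ConfigDict, ValidationError

class StrictConfig(BaseModel):
    model_config = ConfigDict(extra="forbid")

    learning_rate: float
    num_epochs: int

try:
    StrictConfig.model_validate(
        {"learning_rate": 1e-3, "num_epochs": 10, "extra_key": "surprise"}
    )
except ValidationError as e:
    print(e)  # reports the unexpected 'extra_key' field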

:::warning jaxtyping Shape Names Are Not Enforced Across Calls by Default

from jaxtyping import Float
import torch

# These two functions use "batch" as a dimension name
# but they are independent - jaxtyping does not link them
def encode(x: Float[torch.Tensor, "batch features"]) -> Float[torch.Tensor, "batch hidden"]:
    ...

def decode(x: Float[torch.Tensor, "batch hidden"]) -> Float[torch.Tensor, "batch output"]:
    ...

# Dimension names are only checked for consistency within a single decorated
# call (e.g. under @jaxtyped(typechecker=beartype)), not across the call graph.
# For cross-function shape consistency, assert shapes explicitly at each
# boundary (see the sketch after this admonition) or cover the composed
# pipeline with shape-checking tests.

:::
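
A minimal sketch of such a boundary assertion (the dimension size and decoder weight are hypothetical stand-ins):

import torch

hidden_dim = 256  # hypothetical model dimension

def decode_safely(x: torch.Tensor) -> torch.Tensor:
    # Fail fast if the upstream encoder handed us the wrong layout
    assert x.ndim == 2 and x.shape[1] == hidden_dim, (
        f"expected (batch, {hidden_dim}), got {tuple(x.shape)}"
    )
    return x @ torch.randn(hidden_dim, 10)  # stand-in for the real decoder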


Interview Q&A

Q: What is the difference between nominal and structural subtyping, and how does Python support both?

A: Nominal subtyping means a type is a subtype if it explicitly inherits from the parent type - this is how Python's class inheritance works. class Dog(Animal) makes Dog a nominal subtype of Animal. Structural subtyping means a type is a subtype if it has the required methods and attributes, regardless of inheritance - this is how Go interfaces work and how Python's Protocol works. A class satisfies a Protocol by having the right methods, even if it has never heard of the Protocol. Python supports both: standard class inheritance is nominal, and Protocol (PEP 544) enables structural subtyping. Structural subtyping is more flexible - you can write a Protocol for "anything that has an embed() method returning a float list" and any library that happens to have such a method satisfies it, even third-party code you cannot modify. This is particularly useful in ML systems where you want to swap embedding backends or model implementations without creating deep inheritance hierarchies.

Q: When should you use Pydantic v2 vs a plain Python dataclass for configuration handling?

A: Use dataclasses when you need a lightweight data container with no external validation requirements - internal data structures, algorithm state, simple value objects. Use Pydantic when your data crosses a trust boundary: config files loaded from disk, API request/response bodies, user input, or experiment configuration that engineers edit directly. Pydantic provides three things dataclasses do not: runtime validation (it checks types and field constraints when you construct the object, not just at static analysis time), serialization (built-in JSON serialization, schema generation for OpenAPI, round-trip from JSON/dict), and helpful error messages (it reports exactly which field failed validation and why, with all violations at once rather than stopping at the first error). For ML training configs specifically, Pydantic is almost always the right choice: config files are edited by humans, they contain validated ranges (learning rate must be positive), and they need JSON serialization for experiment tracking systems like MLflow and W&B.

Q: Explain how ParamSpec enables type-safe decorators and why TypeVar alone was insufficient.

A: Before ParamSpec, when you wrapped a function in a decorator, you had to type the wrapper signature as (*args: Any, **kwargs: Any) -> R or similar, losing all information about the original function's parameter types. A decorated function like train_step(model: Model, lr: float) -> float would lose its signature and the type checker would accept train_step("wrong", "types") without complaint. ParamSpec captures the parameter specification of the original callable as a unit. When you write Callable[P, R] in the decorator signature and wrapper(*args: P.args, **kwargs: P.kwargs), the type checker can prove that wrapper accepts exactly the same parameters as the original function. After applying the decorator, the resulting function preserves the original signature including parameter names, types, and defaults. This means decorators for logging, timing, retrying, and caching no longer destroy the type information of the functions they wrap.

Q: What is type narrowing and how does TypeGuard work?

A: Type narrowing is the type checker's inference that, after a conditional check, a variable's type is more specific than its declared type. When you write if isinstance(x, str): x.upper(), the type checker knows that within the if block, x is definitely a str even if it was declared as str | int | None. Standard narrowing works with isinstance(), is None, is not None, truthy checks, and equality comparisons. TypeGuard extends this to user-defined type predicates. A function returning TypeGuard[SomeType] tells the type checker: "if this function returns True, the argument is narrowed to SomeType in the calling scope." For example, a function def is_2d_tensor(t: Tensor) -> TypeGuard[Float[Tensor, "h w"]] lets you write if is_2d_tensor(x): use_2d_ops(x) and the type checker will treat x as a typed 2D tensor in the if block. Note that TypeGuard is a promise to the type checker - the runtime enforcement depends on the predicate's correctness.
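
A minimal sketch of the predicate described in that answer (what counts as a "2D tensor" here - two dimensions and a floating dtype - is an assumption):

from typing import TypeGuard

import torch
from jaxtyping import Float

def is_2d_tensor(t: torch.Tensor) -> TypeGuard[Float[torch.Tensor, "h w"]]:
    # Returning True promises the checker that t is a 2D floating-point tensor
    return t.ndim == 2 and t.is_floating_point()

def use_2d_ops(x: Float[torch.Tensor, "h w"]) -> Float[torch.Tensor, "w h"]:
    return x.T

x = torch.randn(3, 4)
if is_2d_tensor(x):
    use_2d_ops(x)  # checker treats x as a typed 2D tensor in this branch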

Q: What makes ruff significantly faster than flake8, and what does it replace?

A: ruff achieves 50-100x speed improvements over flake8 through two main mechanisms: it is implemented in Rust (compiled, no interpreter overhead, efficient memory layout), and it processes files in parallel using all available CPU cores. flake8 is a Python process that processes files sequentially, with each plugin adding its own pass over the AST. ruff performs a single parse of each file and applies all checks in that single pass. ruff replaces flake8 (style and error checking), pylint (a subset of its rules), isort (import sorting), pyupgrade (modernizing Python syntax), flake8-bugbear (likely bug detection), flake8-comprehensions, and several other flake8 plugins. It also replaces Black as a code formatter (via ruff format). The practical benefit is that a CI step that took 60 seconds with flake8 and isort running separately takes under 1 second with ruff, enabling much tighter feedback loops in development.

Q: How does jaxtyping help prevent the shape bug described in this lesson's opening scenario?

A: The opening scenario involved a tensor with shape (batch, features, sequence) being passed to a function that expected (batch, sequence, features). jaxtyping makes this detectable in two ways. First, statically: if you annotate the function as accepting Float[Tensor, "batch seq features"] and the upstream function returns Float[Tensor, "batch features seq"], mypy or pyright can catch the mismatch at type-check time before the code runs. Second, at runtime: when you add @beartype to the receiving function, it checks the actual tensor shape against the annotation on each call. If the dimensions do not match (wrong number of dimensions or wrong named dimension order), it raises a BeartypeCallHintParamViolation with a helpful error message. The combination of static checking in CI and runtime checking in development tests means this class of shape bug becomes a caught exception with a clear error message rather than a silent wrong output that takes four hours to diagnose.


Summary

Python's type system has matured from an optional annotation syntax into a practical engineering tool. The combination of mypy (or pyright) for static checking, Pydantic for runtime boundary validation, Protocols for flexible interface definition, and jaxtyping for tensor shape contracts gives you a level of correctness guarantees that were previously only available in statically-typed languages.

The key shift in mindset: type annotations are not documentation. They are machine-readable contracts. A contract that is not checked by a machine is just a comment. Run mypy in CI, fail the build on type errors, and treat type annotations the same way you treat unit tests - as executable specifications of the code's intended behavior.

The shape bug that opened this lesson is preventable today. The tools exist. The question is whether you install them before or after the production incident.
