Module 12 - Explainability and Interpretability
Modern ML models achieve remarkable accuracy, but accuracy alone is not enough. Regulators demand explanations. Doctors need to trust diagnoses. Engineers must debug silent failures. This module covers the full spectrum - from intrinsically interpretable models to post-hoc explanation methods, deep-learning-specific techniques, counterfactuals, and production-grade explainability infrastructure.
Module Roadmap
Lessons
| # | Lesson | What You Learn |
|---|---|---|
| 01 | Interpretability vs Explainability | Definitions, taxonomy, regulatory landscape (GDPR, EU AI Act), the interpretability-accuracy tradeoff |
| 02 | SHAP Values | Shapley values from cooperative game theory, the four axioms, TreeSHAP, KernelSHAP, DeepSHAP, production use |
| 03 | LIME | Local linear approximations, the LIME objective, perturbation sampling, stability limitations |
| 04 | Feature Importance Methods | MDI vs permutation importance, PDP, ICE plots, ALE - and when each one lies to you |
| 05 | Attention as Explanation | Attention weights in transformers, the "attention is not explanation" debate, rollout and gradient-weighted attention |
| 06 | Saliency Maps for Vision | Vanilla gradients, Grad-CAM, Integrated Gradients, RISE - visualizing what CNNs look at |
| 07 | Counterfactual Explanations | Algorithmic recourse, the Wachter et al. formulation, DiCE, actionable counterfactuals |
| 08 | Explainability in Production | SHAP at inference time, explanation drift monitoring, audit trails, compliance dashboards |
| 09 | Evaluating Explanations | Faithfulness, stability, comprehensibility - how to measure whether explanations are actually correct |
Key Concepts
The interpretability-accuracy tradeoff - the widely held belief that more accurate models are inherently less interpretable. Cynthia Rudin (2019) challenges this directly: for tabular data, carefully engineered interpretable models often match black-box performance. The tradeoff is real in some domains (for example, image recognition with CNNs) but overstated in others.
Global vs local explanations - global explanations describe overall model behavior (which features matter most, on average). Local explanations describe a single prediction (why was this specific instance classified this way?). SHAP values can be aggregated for global insight or inspected per-prediction for local insight. LIME is inherently local.
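The local/global distinction can be sketched in a few lines. The example below is a minimal illustration, not the SHAP library: it uses a hypothetical linear scoring model (the feature names, weights, and background rows are all made up) and the closed form `w_j * (x_j - mean_j)`, which coincides with SHAP values for a linear model when features are treated as independent. Local attributions explain one row; averaging their absolute values across rows gives a global ranking.

```python
from statistics import mean

# Hypothetical linear scoring model: f(x) = BIAS + sum(w_j * x_j).
# Names and numbers are illustrative assumptions, not from a real dataset.
WEIGHTS = {"income": 0.5, "debt": -2.0, "age": 0.1}
BIAS = 1.0

# Small made-up background dataset that defines the "average" input.
BACKGROUND = [
    {"income": 4.0, "debt": 1.0, "age": 30.0},
    {"income": 6.0, "debt": 0.5, "age": 50.0},
    {"income": 5.0, "debt": 1.5, "age": 40.0},
]


def predict(x):
    """Score a single row with the linear model."""
    return BIAS + sum(WEIGHTS[f] * x[f] for f in WEIGHTS)


FEATURE_MEANS = {f: mean(row[f] for row in BACKGROUND) for f in WEIGHTS}
# For a linear model, the prediction at the mean input equals the mean prediction.
BASE_VALUE = predict(FEATURE_MEANS)


def local_attribution(x):
    """Local explanation: each feature's contribution to f(x) relative to BASE_VALUE."""
    return {f: WEIGHTS[f] * (x[f] - FEATURE_MEANS[f]) for f in WEIGHTS}


def global_importance(rows):
    """Global explanation: mean absolute local attribution per feature."""
    per_row = [local_attribution(r) for r in rows]
    return {f: mean(abs(a[f]) for a in per_row) for f in WEIGHTS}


if __name__ == "__main__":
    row = BACKGROUND[1]
    print("local :", local_attribution(row))
    print("global:", global_importance(BACKGROUND))
```

The local attributions for a row sum to `predict(row) - BASE_VALUE`, which is the additivity property that makes per-prediction values safe to aggregate. For non-linear models this closed form no longer applies, and an estimator such as TreeSHAP or KernelSHAP would be needed instead.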
The regulatory landscape - GDPR Article 22 gives data subjects in the EU (not only EU citizens) a right not to be subject to decisions based solely on automated processing that produce legal or similarly significant effects. The related right to meaningful information about the logic involved comes from Articles 13-15. The EU AI Act (2024) classifies high-risk AI systems (credit, hiring, medical, law enforcement) and mandates transparency, human oversight, and audit trails. In the US, FINRA Regulatory Notice 21-06 covers algorithmic trading; FDA guidance covers AI in medical devices.
