Module 12 - Explainability and Interpretability

Modern ML models achieve remarkable accuracy, but accuracy alone is not enough. Regulators demand explanations. Doctors need to trust diagnoses. Engineers must debug silent failures. This module covers the full spectrum - from intrinsically interpretable models to post-hoc explanation methods, deep-learning-specific techniques, counterfactuals, and production-grade explainability infrastructure.

Module Roadmap

Lessons

#	Lesson	What You Learn
01	Interpretability vs Explainability	Definitions, taxonomy, regulatory landscape (GDPR, EU AI Act), the interpretability-accuracy tradeoff
02	SHAP Values	Shapley values from cooperative game theory, the four axioms, TreeSHAP, KernelSHAP, DeepSHAP, production use
03	LIME	Local linear approximations, the LIME objective, perturbation sampling, stability limitations
04	Feature Importance Methods	MDI vs permutation importance, PDP, ICE plots, ALE - and when each one lies to you
05	Attention as Explanation	Attention weights in transformers, the "attention is not explanation" debate, rollout and gradient-weighted attention
06	Saliency Maps for Vision	Vanilla gradients, Grad-CAM, Integrated Gradients, RISE - visualizing what CNNs look at
07	Counterfactual Explanations	Algorithmic recourse, the Wachter et al. formulation, DiCE, actionable counterfactuals
08	Explainability in Production	SHAP at inference time, explanation drift monitoring, audit trails, compliance dashboards
09	Evaluating Explanations	Faithfulness, stability, comprehensibility - how to measure whether explanations are actually correct

Key Concepts

The interpretability-accuracy tradeoff - the widely held belief that more accurate models are inherently less interpretable. Cynthia Rudin (2019) challenges this directly: for tabular data, carefully engineered interpretable models often match black-box performance. The tradeoff is real in some domains (image recognition with CNNs) but overstated in others.

Global vs local explanations - global explanations describe overall model behavior (which features matter most, on average). Local explanations describe a single prediction (why was this specific instance classified this way?). SHAP values can be aggregated for global insight or inspected per-prediction for local insight. LIME is inherently local.
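The local/global distinction can be made concrete with a small sketch. It computes exact Shapley values by brute-force enumeration for a toy function, then averages their absolute values across instances to get a global ranking. Everything here is an illustrative assumption rather than part of the module: `model`, `data`, and `baseline` are hypothetical, "absent" features are simply replaced by baseline values, and mean absolute attribution is only one common way to aggregate. Production SHAP implementations (TreeSHAP, KernelSHAP) avoid this exponential enumeration.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical toy model: one additive term plus one interaction term.
    return 2.0 * x[0] + x[1] * x[2]


def shapley_values(f, x, baseline):
    """Exact Shapley values for a single instance (local explanation).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only feasible for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical instances and an all-zeros baseline.
data = [[1.0, 2.0, 3.0], [0.5, 0.0, 4.0], [2.0, 1.0, 1.0]]
baseline = [0.0, 0.0, 0.0]

# Local view: one attribution vector per prediction.
local = [shapley_values(model, x, baseline) for x in data]

# Global view: mean absolute attribution per feature across all instances.
n_features = len(baseline)
global_importance = [
    sum(abs(row[i]) for row in local) / len(local) for i in range(n_features)
]

for x, phi in zip(data, local):
    print("instance", x, "attributions", [round(p, 3) for p in phi])
print("global importance", [round(g, 3) for g in global_importance])
```

For each instance the attributions sum to `model(x) - model(baseline)`, which is the efficiency axiom. The interaction term's contribution is split evenly between the two features involved, so a feature can matter a great deal for one prediction and not at all for another even though its global score sits somewhere in between.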

The regulatory landscape - GDPR Article 22 gives data subjects in the EU a right not to be subject to decisions based solely on automated processing that produce legal or similarly significant effects. The related right to obtain meaningful information about the logic involved comes from the transparency and access provisions in Articles 13-15. The EU AI Act (2024) classifies high-risk AI systems (credit, hiring, medical, law enforcement) and mandates transparency, human oversight, and audit trails. In the US, FINRA Regulatory Notice 21-06 covers algorithmic trading; FDA guidance covers AI in medical devices.
