Module 12 - Explainability and Interpretability

Modern ML models achieve remarkable accuracy, but accuracy alone is not enough. Regulators demand explanations. Doctors need to trust diagnoses. Engineers must debug silent failures. This module covers the full spectrum - from intrinsically interpretable models to post-hoc explanation methods, deep-learning-specific techniques, counterfactuals, and production-grade explainability infrastructure.

Lessons

| # | Lesson | What You Learn |
|---|--------|----------------|
| 01 | Interpretability vs Explainability | Definitions, taxonomy, regulatory landscape (GDPR, EU AI Act), the interpretability-accuracy tradeoff |
| 02 | SHAP Values | Shapley values from cooperative game theory, the four axioms, TreeSHAP, KernelSHAP, DeepSHAP, production use |
| 03 | LIME | Local linear approximations, the LIME objective, perturbation sampling, stability limitations |
| 04 | Feature Importance Methods | MDI vs permutation importance, PDP, ICE plots, ALE - and when each one lies to you |
| 05 | Attention as Explanation | Attention weights in transformers, the "attention is not explanation" debate, rollout and gradient-weighted attention |
| 06 | Saliency Maps for Vision | Vanilla gradients, Grad-CAM, Integrated Gradients, RISE - visualizing what CNNs look at |
| 07 | Counterfactual Explanations | Algorithmic recourse, the Wachter et al. formulation, DiCE, actionable counterfactuals |
| 08 | Explainability in Production | SHAP at inference time, explanation drift monitoring, audit trails, compliance dashboards |
| 09 | Evaluating Explanations | Faithfulness, stability, comprehensibility - how to measure whether explanations are actually correct |

Key Concepts

The interpretability-accuracy tradeoff - the widely held belief that more accurate models are inherently less interpretable. Cynthia Rudin (2019) challenges this directly: for tabular data with meaningful features, carefully engineered interpretable models often match black-box performance. The tradeoff is real in some domains (image recognition with CNNs) but overstated in others.

Global vs local explanations - global explanations describe overall model behavior (which features matter most, on average). Local explanations describe a single prediction (why was this specific instance classified this way?). SHAP values can be aggregated for global insight or inspected per-prediction for local insight. LIME is inherently local.
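
To make the local/global distinction concrete, here is a minimal sketch in plain Python. It is not the `shap` library: the toy `predict` function, the four-row `data` set, and the choice to represent "absent" features by their dataset mean are all assumptions made for illustration. It computes exact Shapley values per instance by enumerating coalitions (feasible only for a handful of features), then averages their absolute values to get a global ranking.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance, by enumerating every coalition.

    Assumption: a feature that is "absent" from a coalition is replaced by its
    baseline value. Real SHAP variants handle absence in different ways.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model with one interaction term (features 0 and 2).
def predict(z):
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


# Hypothetical dataset; the baseline is the per-feature mean.
data = [[1.0, 2.0, 0.0], [3.0, 0.0, 2.0], [0.0, 1.0, 4.0], [2.0, 3.0, 2.0]]
baseline = [sum(col) / len(data) for col in zip(*data)]

# Local explanations: one attribution vector per prediction.
local = [shapley_values(predict, row, baseline) for row in data]

# Global explanation: mean absolute attribution per feature across the dataset.
global_importance = [
    sum(abs(phi[i]) for phi in local) / len(local) for i in range(len(baseline))
]

for row, phi in zip(data, local):
    print("local ", row, [round(p, 3) for p in phi])
print("global", [round(g, 3) for g in global_importance])
```

Each local vector sums to the gap between that row's prediction and the baseline prediction, which is what makes per-prediction attributions comparable and safe to aggregate. A method such as LIME fits a separate surrogate around each instance, so its coefficients do not come with that guarantee.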

The regulatory landscape - GDPR Article 22 gives data subjects in the EU a right not to be subject to solely automated decisions that produce legal or similarly significant effects, and Articles 13-15 give them a right to meaningful information about the logic involved. The EU AI Act (2024) classifies high-risk AI systems (credit, hiring, medical, law enforcement) and mandates transparency, human oversight, and audit trails. In the US, FINRA Regulatory Notice 21-06 covers algorithmic trading; FDA guidance covers AI in medical devices.

© 2026 EngineersOfAI. All rights reserved.