Skip to main content

SHAP Values - The Unified Theory of Feature Importance

Reading time: 45 min | Interview relevance: Very High - standard in industry ML interviews | Target roles: ML Engineer, AI Engineer, Data Scientist, Research Engineer


The Game Theory Insight That Changed ML Explainability

It is 2017. Scott Lundberg is a PhD student at the University of Washington, frustrated by the zoo of explanation methods that all claim to measure "feature importance" but give different, incompatible answers. LIME gives one result. Permutation importance gives another. The model's built-in feature importance gives a third. Which one is right? Under what conditions? Is there a principled way to decide?

Then he reads about cooperative game theory. Specifically, the Shapley value - a 1953 result from Lloyd Shapley (later a Nobel laureate in economics) that addresses a deceptively simple question: given a coalition of players jointly producing a value, how do you fairly attribute that value to individual players? Shapley proved that there is exactly one attribution method satisfying four natural fairness axioms simultaneously. The attribution is unique.

Lundberg has the insight: a machine learning prediction is a coalition game. The "players" are the features. The "value" is the model's output. The Shapley value is the unique fair attribution of that output to the features. He implements it, proves the connection rigorously, and shows that methods like LIME and the model's built-in importance are all approximations to - or special cases of - Shapley values. The unifying theory was there in 1953. It just had not been connected to ML yet.

The paper, "A Unified Approach to Interpreting Model Predictions," is published at NeurIPS 2017. It receives 10,000+ citations. SHAP becomes the industry standard for feature attribution. Every major ML library ships SHAP integration. Regulatory bodies reference SHAP methodology in guidance documents.

This lesson explains why Shapley values are the uniquely correct answer, how SHAP makes them computationally tractable, and how to use SHAP correctly in production.


Why Previous Feature Importance Methods Fell Short

Before SHAP, the field had several competing approaches, each with serious problems.

Gini importance (MDI) - built into scikit-learn's random forest. Fast, but systematically biased toward high-cardinality features. A continuous feature with 1,000 unique values scores higher than a binary feature even if they have equal predictive power. The bias comes from the way the algorithm accumulates node splits.

Permutation importance - shuffle a feature, measure the drop in accuracy. Model-agnostic and unbiased, but correlates: if two features are highly correlated, permuting one while leaving the other intact understates both features' importance.

LIME coefficients - fit a local linear model, read its weights. Unstable: different random seeds give meaningfully different explanations for the same prediction.

Gradients (neural networks) - the gradient of the output with respect to the input. Suffers from saturation: a feature with a large actual effect can have a near-zero gradient if the activation function is saturated.

Each method answers a slightly different question, under different assumptions, with different failure modes. Lundberg's contribution: formalize what "feature attribution" should mean axiomatically, then derive the unique correct answer.


The Cooperative Game Theory Foundation

In a cooperative game:

  • There is a set of players N={1,2,,p}N = \{1, 2, \ldots, p\}
  • There is a characteristic function v:2NRv: 2^N \to \mathbb{R} that assigns a value to every possible coalition (subset) of players
  • v()=0v(\emptyset) = 0 (the empty coalition has zero value)

The question: how do you distribute the total value v(N)v(N) fairly among the individual players?

Lloyd Shapley (1953) identified four fairness axioms that any reasonable attribution should satisfy:

Efficiency: The attribution values sum to the total: iNϕi=v(N)\sum_{i \in N} \phi_i = v(N).

Symmetry: If players ii and jj contribute equally to every coalition - v(S{i})=v(S{j})v(S \cup \{i\}) = v(S \cup \{j\}) for all SS - then ϕi=ϕj\phi_i = \phi_j.

Dummy: If player ii never contributes anything - v(S{i})=v(S)v(S \cup \{i\}) = v(S) for all SS - then ϕi=0\phi_i = 0.

Additivity: ϕi(v+w)=ϕi(v)+ϕi(w)\phi_i(v + w) = \phi_i(v) + \phi_i(w).

Shapley proved that there is exactly one attribution method satisfying all four axioms:

ϕi(v)=SN{i}S!(NS1)!N![v(S{i})v(S)]\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!(|N|-|S|-1)!}{|N|!} \left[ v(S \cup \{i\}) - v(S) \right]

The interpretation: ϕi\phi_i is the weighted average of player ii's marginal contribution across all possible orderings in which players join the coalition. The weight S!(NS1)!N!\frac{|S|!(|N|-|S|-1)!}{|N|!} is the probability that, in a uniformly random ordering of all players, player ii is preceded by exactly the players in SS.

Uniqueness is the key result. This is not just a good formula - it is the only formula satisfying all four axioms simultaneously.


From Game Theory to Machine Learning

Lundberg and Lee's insight: treat a model's prediction as a coalition game.

Players: the features f1,f2,,fpf_1, f_2, \ldots, f_p

Characteristic function: the model's expected output given a subset of features. For a feature subset SS:

v(S)=EXP(XXS=xS)[f(X)]v(S) = E_{X \sim P(X \mid X_S = x_S)}[f(X)]

SHAP value for feature jj:

ϕj(x)=SN{j}S!(pS1)!p![fS(xS{j})fS(xS)]\phi_j(x) = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!(p-|S|-1)!}{p!} \left[ f_S(x_{S \cup \{j\}}) - f_S(x_S) \right]

The additive decomposition - SHAP values sum to the difference between the prediction and the mean prediction:

f(x)=E[f(X)]+j=1pϕj(x)f(x) = E[f(X)] + \sum_{j=1}^{p} \phi_j(x)

This is what makes SHAP plots interpretable. The baseline is the mean model output E[f(X)]E[f(X)]. Each SHAP value ϕj(x)\phi_j(x) is the amount feature jj shifted the prediction away from the baseline. Positive SHAP values push the prediction above the baseline; negative values push it below. They sum exactly to the total deviation from the mean.


The Four Axioms Applied to ML

Efficiency: SHAP values sum exactly to f(x)E[f(X)]f(x) - E[f(X)]. No missing attribution, no double-counting.

Symmetry: If two features ii and jj are interchangeable (they contribute equally in every coalition), they receive equal SHAP values. Correlated features with equal predictive power share their attribution equally.

Dummy: A feature that has no effect on the model in any context receives SHAP value zero. This is the only feature importance method that provably respects this.

Additivity: SHAP values of an ensemble of models are the sum of SHAP values of the individual models.

note

These axioms are what make SHAP the uniquely correct method. Permutation importance violates symmetry (correlated features are systematically undervalued). Gradient-based attributions violate dummy in saturated regimes. SHAP satisfies all four - and it is the only method that does.


Computing SHAP Values

The naive Shapley computation requires evaluating the model on all 2p2^p subsets. For p=20p = 20 features, that is 1 million model evaluations. Completely intractable. SHAP becomes practical through three specialized algorithms:

TreeSHAP - Exact, Polynomial Time for Trees

For tree ensembles (XGBoost, LightGBM, scikit-learn forests), Lundberg et al. (2018) derived TreeSHAP - an exact algorithm with complexity O(TLD2)O(TLD^2) where TT is the number of trees, LL is the maximum number of leaves, and DD is the maximum tree depth.

The key insight: tree predictions are piecewise constant, and the conditional expectation E[f(X)XS=xS]E[f(X) \mid X_S = x_S] can be computed exactly by tracking node visit probabilities during a single tree traversal. Each tree traversal takes O(LD2)O(LD^2); multiply by TT trees for the total cost. For typical production models (500 trees, depth 8), TreeSHAP runs in under 1 millisecond per prediction.

This is exact - not an approximation. TreeSHAP computes the true Shapley values for tree models.

KernelSHAP - Model-Agnostic Approximation

For any model (neural networks, SVMs, custom functions), KernelSHAP approximates Shapley values using weighted linear regression on coalitions.

The algorithm:

  1. Sample MM subsets S{1,,p}S \subseteq \{1, \ldots, p\} uniformly with Shapley kernel weighting
  2. For each subset, create a masked input: use the instance's values for features in SS, replace others with background data samples
  3. Get model predictions for the masked inputs
  4. Fit a weighted linear regression on (z,f(z))(z', f(z)) where zz' is the binary coalition indicator

The Shapley kernel weight for coalition size S|S|:

πx(S)=(p1)(pS)S(pS)\pi_x(S) = \frac{(p-1)}{\binom{p}{|S|} |S|(p-|S|)}

DeepSHAP - Neural Networks

DeepSHAP builds on DeepLIFT (Shrikumar et al. 2017), using backpropagation rules to attribute outputs to inputs. DeepSHAP chains these approximations through the network layers. Faster than KernelSHAP for neural networks but is an approximation.


Manual Shapley Calculation from Scratch

To build intuition, let's compute Shapley values by hand for a 3-feature example where we enumerate all 23=82^3 = 8 coalitions.

import numpy as np
from itertools import combinations
from math import factorial

# Toy model: f(x1, x2, x3) = 2*x1 + 3*x2 + x1*x2 + x3
# Explain the prediction for x = (1, 1, 1)
# Background (E[f(X)]) approximated by single point (0, 0, 0)

def model(x1, x2, x3):
"""Nonlinear interaction between x1 and x2"""
return 2*x1 + 3*x2 + x1*x2 + x3

def v(coalition, x=(1, 1, 1), background=(0, 0, 0)):
"""
v(S) = E[f(X) | X_S = x_S]
Features in coalition take values from x, others from background.
"""
x1 = x[0] if 0 in coalition else background[0]
x2 = x[1] if 1 in coalition else background[1]
x3 = x[2] if 2 in coalition else background[2]
return model(x1, x2, x3)

N = [0, 1, 2] # feature indices
p = len(N)

def shapley_value(player, N, v):
phi = 0.0
others = [i for i in N if i != player]

for size in range(len(others) + 1):
for S in combinations(others, size):
S = list(S)
v_with = v(S + [player])
v_without = v(S)
marginal = v_with - v_without
weight = (factorial(len(S)) * factorial(p - len(S) - 1)) / factorial(p)
phi += weight * marginal

print(f" S={S}, v(S∪{{f{player}}})={v_with:.3f}, "
f"v(S)={v_without:.3f}, marginal={marginal:.3f}, "
f"weight={weight:.4f}, contrib={weight*marginal:.4f}")

return phi

f_x = model(1, 1, 1) # = 7
f_bg = model(0, 0, 0) # = 0
print(f"f(1,1,1) = {f_x}, f(0,0,0) = {f_bg}, deviation = {f_x - f_bg}")
print()

shap_values = []
for player in N:
print(f"Computing φ_{player+1} (x{player+1}):")
phi = shapley_value(player, N, v)
shap_values.append(phi)
print(f" → φ_{player+1} = {phi:.4f}\n")

print(f"SHAP values: {[f'{sv:.4f}' for sv in shap_values]}")
print(f"Sum: {sum(shap_values):.4f} (should equal {f_x - f_bg})")
# Output: φ₁=2.5, φ₂=4.0, φ₃=0.5, sum=7.0 ✓

Notice: x1x_1's direct coefficient is 2 but its Shapley value is 2.5. The extra 0.5 comes from the interaction term x1x2x_1 x_2 - Shapley distributes this interaction symmetrically between the two features.


Production Code: TreeSHAP and KernelSHAP

import numpy as np
import pandas as pd
import xgboost as xgb
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# ─── DATASET ─────────────────────────────────────────────────────────────────

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)

# ─── TREESHAP - EXACT AND FAST ────────────────────────────────────────────────

xgb_model = xgb.XGBClassifier(
n_estimators=200, max_depth=5, learning_rate=0.1,
random_state=42, eval_metric="logloss", verbosity=0
)
xgb_model.fit(X_train, y_train)
print(f"XGBoost accuracy: {xgb_model.score(X_test, y_test):.4f}")

explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_test)

print(f"\nSHAP values shape: {shap_values.shape}")
print(f"Expected value (baseline): {explainer.expected_value:.4f}")

# Verify efficiency axiom for first test instance
instance_margin = xgb_model.predict(
X_test.iloc[[0]], output_margin=True
)[0]
shap_sum = explainer.expected_value + shap_values[0].sum()
print(f"\nEfficiency axiom check:")
print(f" Model output (log-odds): {instance_margin:.6f}")
print(f" E[f] + sum(SHAP): {shap_sum:.6f}")
print(f" Match: {abs(instance_margin - shap_sum) < 1e-4}")

# Global importance - mean absolute SHAP
mean_abs_shap = np.abs(shap_values).mean(axis=0)
feature_importance = pd.Series(mean_abs_shap, index=X.feature_names)
print("\nTop 10 features (mean |SHAP|):")
print(feature_importance.sort_values(ascending=False).head(10).to_string())

# Local explanation for instance 5
i = 5
print(f"\nLocal explanation for instance {i}:")
print(f" Prediction: {xgb_model.predict(X_test.iloc[[i]])[0]}, "
f"prob={xgb_model.predict_proba(X_test.iloc[[i]])[0, 1]:.3f}")
for feat, sv in sorted(zip(X.feature_names, shap_values[i]),
key=lambda t: abs(t[1]), reverse=True)[:6]:
arrow = "↑" if sv > 0 else "↓"
print(f" {feat:35s} {sv:+.4f} {arrow}")

# ─── KERNELSHAP - MODEL AGNOSTIC ─────────────────────────────────────────────

# Summarize background with k-means (100 centers keeps cost down)
background = shap.kmeans(X_train, 100)

def predict_proba_positive(X):
return xgb_model.predict_proba(X)[:, 1]

kernel_explainer = shap.KernelExplainer(predict_proba_positive, background)

# Compute on a small batch - nsamples controls accuracy vs speed
shap_kernel = kernel_explainer.shap_values(
X_test.iloc[:20], nsamples=500, silent=True
)

correlation = np.corrcoef(
shap_values[:20].flatten(),
shap_kernel.flatten()
)[0, 1]
print(f"\nTreeSHAP vs KernelSHAP correlation (first 20 instances): {correlation:.4f}")
# Typically > 0.95

# ─── PRODUCTION PATTERN: EXPLAIN AT INFERENCE TIME ───────────────────────────

class ExplainablePredictor:
"""Serves predictions + SHAP explanations at inference time."""

def __init__(self, model, feature_names):
self.model = model
self.feature_names = feature_names
self.explainer = shap.TreeExplainer(model)

def predict_with_explanation(self, X_df, top_n=3):
predictions = self.model.predict(X_df)
probabilities = self.model.predict_proba(X_df)[:, 1]
sv = self.explainer.shap_values(X_df)

results = []
for i in range(len(X_df)):
drivers = sorted(
zip(self.feature_names, sv[i], X_df.iloc[i]),
key=lambda t: abs(t[1]), reverse=True
)[:top_n]
results.append({
"prediction": int(predictions[i]),
"probability": round(float(probabilities[i]), 4),
"top_drivers": [
{
"feature": feat,
"shap_value": round(float(s), 4),
"feature_value": round(float(fv), 4),
"direction": "increases_risk" if s > 0 else "decreases_risk"
}
for feat, s, fv in drivers
]
})
return results

predictor = ExplainablePredictor(xgb_model, list(X.feature_names))
explanations = predictor.predict_with_explanation(X_test.iloc[:3])
for i, exp in enumerate(explanations):
print(f"\nInstance {i}: pred={exp['prediction']}, prob={exp['probability']}")
for d in exp['top_drivers']:
print(f" {d['feature']:35s} SHAP={d['shap_value']:+.4f} "
f"(value={d['feature_value']:.3f}, {d['direction']})")

SHAP for Model Debugging and Monitoring

def find_spurious_feature_reliance(shap_values, feature_names,
suspect_features, threshold=0.2):
"""
Check if the model relies heavily on features that should be irrelevant
(e.g., row IDs, timestamps, data artifacts).
"""
report = {}
for feat in suspect_features:
if feat not in feature_names:
continue
idx = list(feature_names).index(feat)
feat_shap = shap_values[:, idx]
global_importance = np.abs(feat_shap).mean()
high_reliance_count = (np.abs(feat_shap) > threshold).sum()

report[feat] = {
"global_mean_abs_shap": float(global_importance),
"n_instances_above_threshold": int(high_reliance_count),
"pct_instances_above_threshold": float(high_reliance_count / len(feat_shap))
}
if global_importance > 0.01:
print(f"WARNING: model relies on '{feat}' - mean|SHAP|={global_importance:.4f}")

return report

# In production: flag if a feature that should be irrelevant has nonzero SHAP
find_spurious_feature_reliance(
shap_values,
list(X.feature_names),
suspect_features=["mean radius", "mean texture"], # example suspects
threshold=0.2
)

# ─── SHAP DRIFT MONITORING ────────────────────────────────────────────────────

import datetime

def shap_importance_snapshot(explainer, X_batch, feature_names):
"""
Compute per-feature mean |SHAP| for a batch.
Store daily; alert when any feature shifts by >20% from 7-day average.
"""
sv = explainer.shap_values(X_batch)
mean_abs = np.abs(sv).mean(axis=0)
return {
"date": datetime.date.today().isoformat(),
"n_predictions": len(X_batch),
"importance": {feat: float(imp)
for feat, imp in zip(feature_names, mean_abs)}
}

snapshot = shap_importance_snapshot(explainer, X_test, list(X.feature_names))
print(f"\nSHAP snapshot for {snapshot['date']}:")
top5 = sorted(snapshot["importance"].items(), key=lambda t: t[1], reverse=True)[:5]
for feat, imp in top5:
print(f" {feat:35s} mean|SHAP| = {imp:.4f}")

Common Mistakes

danger

Interpreting SHAP values as causal effects

A SHAP value of +0.34 for debt_to_income_ratio means the model uses that feature to push this prediction upward by 0.34 units in the model's output space. It does not mean that reducing the debt-to-income ratio causes the prediction to change by a proportional amount. SHAP is correlation-based attribution. If the model uses a proxy for a protected attribute (like zip code as a proxy for race), SHAP will faithfully report the model is using that proxy - but will not flag the proxy as discriminatory. Causal analysis requires additional methods beyond SHAP.

danger

Using KernelSHAP on large feature sets without controlling sample count

For a model with 200 features, each KernelSHAP call with nsamples=2000 invokes the model 400,000 times. At 1ms per inference, that is 400 seconds per explanation - unusable in production. Always use TreeSHAP for tree models. When KernelSHAP is necessary, use shap.kmeans() to summarize the background to 50-100 points and keep nsamples in the 200-500 range.

warning

Using SHAP for feature selection without accounting for correlated features

When two features are highly correlated, SHAP distributes their combined importance between them - each gets roughly half. If you drop the lower-SHAP one, you have removed a feature that the model would have happily substituted for the remaining one. Always inspect correlation structure before using SHAP for feature selection.

warning

Not setting a representative background dataset

The baseline E[f(X)]E[f(X)] depends entirely on the background dataset. A background that is not representative of the production distribution will produce SHAP values that do not correspond to production model behavior. Use a stratified sample of 100-1000 training instances, drawn from the same distribution as production inputs. Never use a single reference point (like all zeros) unless you have a domain-specific reason.


YouTube Resources

  • "A Unified Approach to Interpreting Model Predictions" - Scott Lundberg (NeurIPS 2017): Lundberg's original presentation. The derivation of the four axioms is laid out clearly.
  • "SHAP: From Theory to Practice" - PyData Global: Deep walkthrough of TreeSHAP internals and production patterns.
  • "Interpreting Machine Learning Models with SHAP" - Weights and Biases: Practical tutorial on all major SHAP plot types.
  • "Cooperative Game Theory and ML" - MLSS: Background on Shapley's original 1953 result and its extensions.

Interview Q&A

Q: Derive the Shapley value formula and explain what each term means.

The Shapley value for player ii in a coalition game:

ϕi=SN{i}S!(NS1)!N![v(S{i})v(S)]\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!(|N|-|S|-1)!}{|N|!} \left[ v(S \cup \{i\}) - v(S) \right]

The term v(S{i})v(S)v(S \cup \{i\}) - v(S) is player ii's marginal contribution when joining coalition SS. The weight S!(NS1)!N!\frac{|S|!(|N|-|S|-1)!}{|N|!} is the probability that, in a uniformly random ordering of all N|N| players, player ii joins exactly when SS players have already joined. The sum averages ii's marginal contribution over all possible orderings. This is the unique fair attribution satisfying all four axioms.

Q: What are the four Shapley axioms and why do they uniquely determine SHAP values?

Efficiency: attributions sum to f(x)E[f(X)]f(x) - E[f(X)]. Symmetry: interchangeable features get equal attribution. Dummy: zero-effect features get zero attribution. Additivity: attributions for a sum of games equal the sum of attributions. Shapley (1953) proved these four axioms uniquely determine the attribution formula - no other attribution satisfies all four. This uniqueness is SHAP's core claim: if you accept these axioms as the definition of fair attribution, SHAP is the only correct answer.

Q: What is TreeSHAP and why is it O(TLD2)O(TLD^2) rather than exponential?

Naive Shapley computation evaluates the model on all 2p2^p feature subsets. For tree models, Lundberg et al. (2018) showed that the conditional expectation E[f(X)XS=xS]E[f(X) \mid X_S = x_S] can be computed exactly in a single tree traversal by tracking node visit probabilities. At each internal node, the algorithm maintains a distribution over which leaf the instance would reach depending on which features are "known." This traversal is O(LD2)O(LD^2) per tree, multiplied by TT trees. The polynomial complexity replaces the exponential because the tree structure allows the conditional expectations to be accumulated in a single pass, rather than re-evaluated for each of the 2p2^p subsets.

Q: SHAP vs permutation importance - when would you use each?

Use TreeSHAP when: you have a tree model, you need local (per-prediction) explanations, you need exact values, or you need to serve explanations at inference time. Use permutation importance when: you have a non-tree model without a SHAP implementation, you want a quick global overview, or you need a method that is easy to explain to non-technical stakeholders. Key difference: permutation importance gives only global importance and is slow. TreeSHAP gives both local and global (by aggregating), is exact, and runs in under 1ms per prediction for typical tree ensembles.

Q: How would you use SHAP for model debugging in production?

Three patterns. First, detect spurious feature reliance: compute mean |SHAP| for each feature on a validation set; if a feature that should have no predictive value (row ID, timestamp) has nonzero importance, the model has learned a data artifact. Second, find subgroups where the model fails: cluster instances by their SHAP value vectors; clusters with unusual attributions often correspond to input subspaces where the model generalizes poorly. Third, monitor explanation drift: compute mean |SHAP| per feature on weekly production batches; a shift in which features drive predictions signals input distribution shift or model degradation, even before accuracy metrics show a change.


SHAP Interaction Values

Standard SHAP values measure the marginal contribution of each feature, averaging over all feature subsets. But they do not distinguish between the main effect of a feature and its interaction effects with other features.

SHAP interaction values (Lundberg et al. 2018) extend the Shapley framework to two-feature interactions. The SHAP interaction value for feature pair (i,j)(i, j) is:

Φij(x)=SN{i,j}S!(pS2)!2(p1)!δij(S)\Phi_{ij}(x) = \sum_{S \subseteq N \setminus \{i,j\}} \frac{|S|!(p-|S|-2)!}{2(p-1)!} \delta_{ij}(S)

where δij(S)=v(S{i,j})v(S{i})v(S{j})+v(S)\delta_{ij}(S) = v(S \cup \{i,j\}) - v(S \cup \{i\}) - v(S \cup \{j\}) + v(S) is the interaction term.

The SHAP main effect for feature ii is then:

Φii(x)=ϕi(x)jiΦij(x)\Phi_{ii}(x) = \phi_i(x) - \sum_{j \neq i} \Phi_{ij}(x)

And the original SHAP values satisfy: ϕi(x)=Φii(x)+jiΦij(x)\phi_i(x) = \Phi_{ii}(x) + \sum_{j \neq i} \Phi_{ij}(x)

This decomposition tells you: how much of feature ii's contribution comes from its main effect, and how much comes from interactions with each other feature?

import numpy as np
import xgboost as xgb
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import pandas as pd

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)

model = xgb.XGBClassifier(
n_estimators=100, max_depth=4, random_state=42,
eval_metric="logloss", verbosity=0
)
model.fit(X_train, y_train)

# Compute SHAP interaction values - returns (n_samples, n_features, n_features) array
# shap_interaction[i, j, k] = SHAP interaction value for features j and k in instance i
explainer = shap.TreeExplainer(model)
shap_interaction = explainer.shap_interaction_values(X_test[:50])

# The diagonal contains main effects; off-diagonal contains interaction effects
# shap_interaction is shape (n_samples, n_features, n_features)
print(f"SHAP interaction values shape: {shap_interaction.shape}")

# For binary classification, it may be a list [class0, class1]
if isinstance(shap_interaction, list):
sv_interact = shap_interaction[1]
else:
sv_interact = shap_interaction

# Mean absolute interaction between each pair of features
n_feat = len(X.columns)
mean_abs_interact = np.abs(sv_interact).mean(axis=0)

# Top 5 pairwise interactions
pairs = []
for i in range(n_feat):
for j in range(i + 1, n_feat):
pairs.append({
"feature_i": X.columns[i],
"feature_j": X.columns[j],
"mean_abs_interaction": mean_abs_interact[i, j]
})

top_interactions = sorted(pairs, key=lambda x: x["mean_abs_interaction"], reverse=True)[:5]
print("\nTop 5 pairwise SHAP interactions:")
for p in top_interactions:
print(f" {p['feature_i']:25s} × {p['feature_j']:25s} = {p['mean_abs_interaction']:.4f}")

# Main effects vs interactions for top feature
top_feature_idx = np.abs(sv_interact).mean(axis=0).diagonal().argsort()[-1]
top_feat_name = X.columns[top_feature_idx]
main_effect = np.abs(sv_interact[:, top_feature_idx, top_feature_idx]).mean()
total_interaction = np.abs(sv_interact[:, top_feature_idx, :]).mean() - main_effect

print(f"\nFeature: {top_feat_name}")
print(f" Main effect: {main_effect:.4f}")
print(f" Interaction: {total_interaction:.4f}")
print(f" Interaction fraction: {total_interaction / (main_effect + total_interaction):.1%}")

SHAP for Fairness Analysis

SHAP provides a principled tool for detecting model discrimination. The key technique: compute SHAP values for a protected attribute (or its proxy) and measure its contribution to predictions across demographic groups.

import numpy as np
import pandas as pd
import xgboost as xgb
import shap
from sklearn.model_selection import train_test_split

# Synthetic lending dataset with potential proxy discrimination
np.random.seed(42)
n = 5000

# Features
credit_score = np.random.normal(700, 80, n).clip(300, 850)
income = np.random.lognormal(10.8, 0.5, n) # log-normal income
debt_ratio = np.random.beta(3, 5, n)
employment_years = np.random.gamma(5, 2, n)
zip_density = np.random.choice([0, 1], n, p=[0.7, 0.3]) # 1 = urban/minority-dense zip
# Note: zip_density correlates with demographic characteristics in training data

# Target: default risk (zip_density should NOT be used but may be a proxy)
logit = (-0.003 * credit_score + 1.5 * debt_ratio
- 0.000005 * income - 0.02 * employment_years
+ 0.3 * zip_density) # zip_density included in DGP (realistic scenario)
prob_default = 1 / (1 + np.exp(-logit))
y = (np.random.rand(n) < prob_default).astype(int)

X = pd.DataFrame({
"credit_score": credit_score,
"income": income,
"debt_ratio": debt_ratio,
"employment_years": employment_years,
"zip_density": zip_density
})

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = xgb.XGBClassifier(
n_estimators=200, max_depth=4, random_state=42,
eval_metric="logloss", verbosity=0
)
model.fit(X_train, y_train)

explainer = shap.TreeExplainer(model)
shap_vals = explainer.shap_values(X_test)
if isinstance(shap_vals, list):
shap_vals = shap_vals[1]

# ── Fairness analysis: SHAP contribution of zip_density ────────────────────

zip_col_idx = list(X.columns).index("zip_density")
zip_shap = shap_vals[:, zip_col_idx]

print("SHAP-based fairness analysis for 'zip_density':")
print(f" Global mean |SHAP|: {np.abs(zip_shap).mean():.4f}")
print(f" Mean SHAP (urban=1): {zip_shap[X_test['zip_density'] == 1].mean():.4f}")
print(f" Mean SHAP (rural=0): {zip_shap[X_test['zip_density'] == 0].mean():.4f}")
print(f" Difference: {zip_shap[X_test['zip_density'] == 1].mean() - zip_shap[X_test['zip_density'] == 0].mean():.4f}")

# If mean SHAP for zip_density != 0 by demographic group,
# the model is using zip as a proxy.
# Threshold for concern: mean |SHAP| for zip_density > 5% of total mean |SHAP|
total_mean_abs = np.abs(shap_vals).mean()
zip_rel_importance = np.abs(zip_shap).mean() / total_mean_abs
print(f"\n Relative importance of zip_density: {zip_rel_importance:.1%}")
print(f" Status: {'FLAGGED for review' if zip_rel_importance > 0.05 else 'Within acceptable range'}")

# ── Approval rate disparity check ──────────────────────────────────────────

predictions = model.predict(X_test)
approval_urban = 1 - predictions[X_test["zip_density"] == 1].mean()
approval_rural = 1 - predictions[X_test["zip_density"] == 0].mean()
disparate_impact = approval_urban / approval_rural # 4/5ths rule: should be > 0.8

print(f"\n Approval rate (urban zip): {approval_urban:.3f}")
print(f" Approval rate (rural zip): {approval_rural:.3f}")
print(f" Disparate impact ratio: {disparate_impact:.3f}")
print(f" 4/5 rule: {'PASSES' if disparate_impact >= 0.8 else 'FAILS - investigate proxy discrimination'}")

This pattern - checking the SHAP contribution of a potential proxy attribute and comparing approval rates across groups - is a core component of any ML fairness audit. SHAP makes the mechanism visible; the four-fifths rule (used in US employment discrimination law) provides the threshold.


SHAP for Time-Series and Text

SHAP for Text (with Transformers)

For text models based on transformers, SHAP can be computed using the shap.Explainer with the masker approach:

# pip install transformers shap
try:
import shap
from transformers import pipeline

# Load a sentiment classifier
clf = pipeline("text-classification",
model="distilbert-base-uncased-finetuned-sst-2-english")

# SHAP explainer for text - masks tokens and observes prediction changes
explainer = shap.Explainer(clf)

texts = [
"The product quality is excellent and delivery was fast",
"Terrible service and the item arrived broken"
]

# Compute SHAP values - each token gets a SHAP value
shap_values = explainer(texts)

print("SHAP values for text classification:")
for i, text in enumerate(texts):
print(f"\nText: '{text}'")
tokens = shap_values[i].data
svs = shap_values[i].values
# Show tokens with their SHAP contributions
if len(svs.shape) > 1:
svs = svs[:, 1] # positive class SHAP values
for token, sv in sorted(zip(tokens, svs), key=lambda t: abs(t[1]), reverse=True)[:5]:
direction = "positive" if sv > 0 else "negative"
print(f" '{token}': {sv:+.4f} ({direction} sentiment contribution)")

except ImportError:
print("transformers or shap not configured for this environment.")
print("Text SHAP works by masking tokens and measuring prediction changes.")
print("The explainer treats masked tokens as 'unknown' (using a mask token or UNK).")

Practice Problems

  1. Prove that SHAP values satisfy the efficiency axiom for a 2-feature linear model f(x)=w1x1+w2x2+bf(x) = w_1 x_1 + w_2 x_2 + b. Show that ϕ1+ϕ2=f(x)E[f(X)]\phi_1 + \phi_2 = f(x) - E[f(X)].

  2. For a model with 10 features, you are computing KernelSHAP with nsamples=500. Estimate the total number of model evaluations required. If each model evaluation takes 2ms, how long will it take to compute SHAP values for 1,000 test instances? Is this acceptable for a real-time API?

  3. You have two features with correlation ρ=0.95\rho = 0.95 and equal predictive power. What do you expect SHAP to report for their individual importances compared to permutation importance? Which is more useful for feature selection?

  4. Your SHAP monitoring system detects that the mean |SHAP| for transaction_time doubled between last week and this week on a fraud detection model. List three possible root causes and how you would diagnose each.

  5. Design a SHAP-based fairness audit for a hiring algorithm. What protected attributes would you check? What SHAP analysis would you run? What thresholds would you use to flag potential discrimination?


TreeSHAP Internals - How the Algorithm Works

Understanding TreeSHAP's internal algorithm builds intuition for when it is exact and what its computational complexity actually means.

For a single decision tree, the key operation is computing E[f(X)XS=xS]E[f(X) \mid X_S = x_S] - the expected model output given that we know the values of features in set SS. For a tree model, this is:

E[f(X)XS=xS]=leaf lvalue(l)P(X reaches leaf lXS=xS)E[f(X) \mid X_S = x_S] = \sum_{\text{leaf } l} \text{value}(l) \cdot P(\text{X reaches leaf } l \mid X_S = x_S)

The challenge: compute this for all 2p2^p subsets SS efficiently.

The algorithm's key insight: as we traverse the tree from the root, we maintain, for each ancestor node, a vector of "coverage fractions" - what proportion of the background data that would reach this node comes from "known" vs "unknown" feature contributions. When we reach a split on a feature in SS (known), we follow the actual split. When we reach a split on a feature not in SS (unknown), we follow both branches, weighted by the proportion of training data going each way.

import numpy as np

def treeshap_single_tree_conceptual(tree_model, x, background_X):
"""
Conceptual (slow) implementation of TreeSHAP for a single decision tree.
This illustrates the algorithm; shap.TreeExplainer is 100x faster.

For each of the 2^p subsets S:
1. Traverse the tree with x for features in S
2. Use background distribution for features not in S
3. Compute the Shapley weight for this |S|
4. Accumulate marginal contributions

This is O(2^p * T) - the fast version is O(LD^2 * T).
"""
from sklearn.tree import DecisionTreeClassifier

def predict_with_subset(S, x, background_X, tree):
"""E[f(X) | X_S = x_S] by marginalization over background"""
predictions = []
for bg_row in background_X:
masked = bg_row.copy()
for feat_idx in S:
masked[feat_idx] = x[feat_idx]
predictions.append(tree.predict_proba([masked])[0, 1])
return np.mean(predictions)

n_features = len(x)
N = list(range(n_features))
shap_values = np.zeros(n_features)

# For each feature j, compute Shapley value
for j in N:
others = [i for i in N if i != j]
phi_j = 0.0

from itertools import combinations
from math import factorial

for size in range(len(others) + 1):
for S in combinations(others, size):
S = list(S)
weight = (factorial(len(S)) * factorial(n_features - len(S) - 1)
/ factorial(n_features))
v_with = predict_with_subset(S + [j], x, background_X, tree_model)
v_without = predict_with_subset(S, x, background_X, tree_model)
phi_j += weight * (v_with - v_without)

shap_values[j] = phi_j

return shap_values

# The fast version (shap.TreeExplainer) achieves the same result in O(LD^2 * T)
# by exploiting the tree structure to avoid re-traversing for each coalition.
# The key optimization: maintain a probability vector over paths at each node,
# and update it analytically as we decide which features are "known" vs "unknown."
print("TreeSHAP conceptual algorithm implemented above.")
print("The library implementation is identical in result, 100x+ faster in execution.")

The critical property: TreeSHAP gives EXACTLY the same result as the brute-force coalition enumeration above - but in polynomial rather than exponential time. This is not an approximation. It is the exact Shapley value, computed cleverly.


Comparing SHAP Across Model Versions

A powerful use of SHAP in production is model comparison: when you retrain a model, do the feature attributions change? Should they?

import numpy as np
import pandas as pd
import xgboost as xgb
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)

# Train model v1
model_v1 = xgb.XGBClassifier(
n_estimators=100, max_depth=3, random_state=42,
eval_metric="logloss", verbosity=0
)
model_v1.fit(X_train, y_train)

# Train model v2 (more complex)
model_v2 = xgb.XGBClassifier(
n_estimators=300, max_depth=6, random_state=42,
eval_metric="logloss", verbosity=0
)
model_v2.fit(X_train, y_train)

# Compare SHAP profiles
explainer_v1 = shap.TreeExplainer(model_v1)
explainer_v2 = shap.TreeExplainer(model_v2)
sv1 = explainer_v1.shap_values(X_test)
sv2 = explainer_v2.shap_values(X_test)

if isinstance(sv1, list):
sv1, sv2 = sv1[1], sv2[1]

importance_v1 = pd.Series(np.abs(sv1).mean(axis=0), index=X.feature_names, name="v1")
importance_v2 = pd.Series(np.abs(sv2).mean(axis=0), index=X.feature_names, name="v2")

comparison = pd.concat([importance_v1, importance_v2], axis=1)
comparison["rank_v1"] = comparison["v1"].rank(ascending=False).astype(int)
comparison["rank_v2"] = comparison["v2"].rank(ascending=False).astype(int)
comparison["rank_change"] = comparison["rank_v1"] - comparison["rank_v2"]

print("Feature importance comparison v1 vs v2:")
print(comparison.sort_values("v2", ascending=False).round(4).to_string())

# Flags: features that changed rank by more than 3
significant_changes = comparison[np.abs(comparison["rank_change"]) > 3]
if not significant_changes.empty:
print(f"\nWarning: {len(significant_changes)} features changed rank significantly:")
print(significant_changes[["rank_v1", "rank_v2", "rank_change"]].to_string())
print("Investigate whether these changes reflect genuine improvements or regression.")
else:
print("\nNo significant rank changes - models have similar feature reliance patterns.")

# Overall correlation between SHAP importance profiles
correlation = np.corrcoef(importance_v1.values, importance_v2.values)[0, 1]
print(f"\nSHAP importance profile correlation (v1 vs v2): {correlation:.4f}")
print("(> 0.95: very similar reliance patterns; < 0.8: substantially different)")

This comparison pattern is especially useful for catching regressions during model updates. If a new model achieves slightly higher accuracy but has completely different feature attribution patterns, that warrants investigation: either the new model learned something genuinely new, or it learned a spurious pattern that happened to boost held-out accuracy.


SHAP in a Complete MLOps Pipeline

Each step in this pipeline has SHAP as a first-class component:

  • Training: compute SHAP globally on validation set, inspect feature reliance before promotion
  • Validation: check for spurious features, run fairness analysis, verify no protected proxies
  • Model registry: serialize the shap.TreeExplainer alongside the model; ensures consistent explanations even when the library version changes
  • Inference API: return SHAP top-3 drivers with every prediction; cache for audit trail
  • Monitoring: weekly SHAP snapshots; alert on >20% importance shift for any feature
  • Audit trail: store SHAP values with prediction ID, model version, and timestamp; enables reconstruction of any historical explanation

The key practice: treat the shap.TreeExplainer as part of the model artifact. When you serialize your XGBoost model with model.save_model(), also pickle the explainer:

import pickle
import shap
import xgboost as xgb

# After training
model = xgb.XGBClassifier(...)
model.fit(X_train, y_train)
explainer = shap.TreeExplainer(model)

# Serialize as a bundle
model_bundle = {
"model": model,
"explainer": explainer,
"feature_names": list(X_train.columns),
"model_version": "v2.1.0",
"training_date": "2026-03-09",
"expected_value": explainer.expected_value
}

with open("model_bundle_v2.1.0.pkl", "wb") as f:
pickle.dump(model_bundle, f)

# Load at inference time
with open("model_bundle_v2.1.0.pkl", "rb") as f:
bundle = pickle.load(f)

# Predictions + explanations from the same versioned bundle
loaded_model = bundle["model"]
loaded_explainer = bundle["explainer"]

This ensures that historical predictions can always be re-explained using the exact explainer configuration active at the time of prediction - a regulatory requirement in many domains.

:::tip 🎮 Interactive Playground

Visualize this concept: Try the SHAP Values & Feature Attribution demo on the EngineersOfAI Playground - no code required.

:::

© 2026 EngineersOfAI. All rights reserved.