
LIME - Local Interpretable Model-Agnostic Explanations

Reading time: 40 min | Interview relevance: High - classic XAI interview topic | Target roles: ML Engineer, Data Scientist, AI Engineer, Research Engineer


The Newsgroup Classifier That Was Wrong for the Right Reasons

It is 2016. Marco Ribeiro is a PhD student at the University of Washington, and he is demonstrating a text classifier trained on the 20 Newsgroups dataset. The task: classify posts as written by an Atheist or a Christian. The model achieves 99% accuracy on the test set. The stakeholders are impressed.

Ribeiro is not satisfied. He applies a new method he is developing - LIME - to explain individual predictions. The result is alarming. For posts classified as "Atheist," the most important feature is the word "Posting." For posts classified as "Christian," it is "Host." These are not theological terms. They are email header artifacts: the metadata line "Posting-host:" that appears in the header of every email in the newsgroup archive.

The model did not learn to distinguish religious viewpoints. It learned to distinguish header formats. In the training and test data, Atheist posts happened to have a different header convention than Christian posts. The model found this pattern and exploited it. The 99% accuracy was real. The model was completely useless for any deployment scenario where the header artifacts were absent.

Without local explanations, this bug was invisible. Overall accuracy looked fine. Only by asking "what did the model look at for this specific prediction?" did the failure become apparent.

This is the founding moment of LIME. The paper, "Why Should I Trust You?: Explaining the Predictions of Any Classifier," is published at KDD 2016. The title is the right question. The answer is: show me what the model looks at.


Why Local Explanations Reveal What Accuracy Hides

Global accuracy metrics aggregate over the entire test distribution. If the model is correct 99% of the time, accuracy masks the 1% cases where it fails - and, more importantly, it masks the mechanism by which it succeeds. A model can be correct for entirely the wrong reasons.

Local explanations shift the question from "is this model usually right?" to "what does the model pay attention to for this specific prediction?" The difference matters enormously:

  • A radiology AI classifies chest X-rays for pneumonia. Overall AUC = 0.92. But LIME reveals that for many images, the model attends to a tag burned into the corner of the image that identifies the originating hospital, and the hospital correlates with pneumonia prevalence in the training data. The AI learned a site proxy, not pneumonia pathology.

  • A fraud detection model has 95% precision. LIME reveals that for many flagged transactions, the model relies primarily on transaction time - specifically late-night timing. Legitimate transactions from users in different time zones are being incorrectly flagged.

  • A loan approval model passes a fairness audit based on aggregate metrics. Local LIME explanations reveal that for applicants in certain zip codes, the model's top feature is the zip code itself - a proxy for race. The aggregate audit missed it; local explanations find it.

The core insight behind LIME: the model is a black box, but locally - in the neighborhood of any single prediction - it behaves approximately like a simple model. Find that local simple model, and you have an explanation.


Historical Context - From Global to Local

Before LIME, the dominant paradigm for model explanation was global: understand the overall model. Feature importance plots. Partial dependence plots. Model distillation. These methods are valuable but answer the wrong question for individual decisions.

The shift to local explanations did not happen in isolation. Two related methods bracket LIME, one shortly before it and one after:

Individual conditional expectation (ICE) (Goldstein et al. 2015): Instead of showing the average model response as a feature varies (PDP), show one line per instance. Reveals heterogeneous effects hidden by averaging.

Anchors (Ribeiro et al. 2018, the LIME team's follow-up): Instead of a linear approximation, find a simple rule that "anchors" the prediction: a set of conditions under which the model gives the same output with high probability.

LIME's innovation was not the idea of locality; statisticians had used local regression for decades. It was applying local linear approximation to the problem of explaining black-box ML predictions, for any model and any modality, under a single explicit objective function.


The LIME Objective

LIME finds a simple model $g$ from an interpretable model class $\mathcal{G}$ (typically linear models) that locally approximates the black-box model $f$ around a specific instance $x$:

$$\xi(x) = \arg\min_{g \in \mathcal{G}} \mathcal{L}(f, g, \pi_x) + \Omega(g)$$

Where:

  • $\mathcal{L}(f, g, \pi_x)$ measures how well $g$ approximates $f$ in the locality of $x$, weighted by proximity kernel $\pi_x$
  • $\Omega(g)$ is a complexity penalty on $g$ (e.g., number of features used in a linear model)

The proximity kernel $\pi_x$ weights samples by their distance to $x$:

$$\pi_x(z) = \exp\left(-\frac{D(x, z)^2}{\sigma^2}\right)$$

Samples close to $x$ (small $D(x, z)$) get weight near 1. Samples far from $x$ get exponentially downweighted. The bandwidth $\sigma$ controls how local the explanation is.

The final objective, expanded for a linear model $g(z') = w \cdot z'$:

$$\xi(x) = \arg\min_{w} \sum_{z, z'} \pi_x(z) \left(f(z) - w \cdot z'\right)^2 + \lambda \|w\|_1$$

This is a weighted lasso regression. The weights are the LIME explanation - the coefficients of the local linear approximation.


The LIME Algorithm - Step by Step

Step 1 - Sample the neighborhood: Generate $N$ perturbed instances $z$ near $x$. The perturbation strategy depends on data modality:

  • Tabular: $z'$ is a binary vector indicating which features are "present" (kept) or "absent" (replaced with background value). For each sample, randomly set some features to absent.
  • Text: $z'$ is a binary vector over words. Randomly zero out words in the instance to create perturbed documents.
  • Image: Segment the image into superpixels (contiguous regions). $z'$ is a binary vector over superpixels. Randomly grey out superpixels.

Step 2 - Get predictions: Feed each perturbed instance $z$ through the black-box model $f$ to get prediction $f(z)$.

Step 3 - Compute proximity weights: $\pi_x(z) = \exp(-D(x, z)^2 / \sigma^2)$ where $D$ is an appropriate distance measure (cosine for text, L2 for images, weighted L2 for tabular).

Step 4 - Fit weighted linear regression: Fit a linear model on $(z', f(z))$ pairs, weighted by $\pi_x(z)$. Apply L1 regularization (or feature selection via K-LASSO) to keep the explanation sparse.

Step 5 - Return coefficients: The weights of the linear model are the LIME explanation. Positive weight = the feature's presence pushes the prediction toward the class being explained. Negative weight = the feature's presence pushes away from it.
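Step 4 mentions K-LASSO without showing it. The sketch below is one way to do it, assuming scikit-learn's `lars_path`: walk the lasso regularization path, take the least-regularized point that uses at most K features, then refit those features with weighted least squares. The helper name `k_lasso` and the toy data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, lars_path

def k_lasso(z_prime, f_z, weights, k):
    """Select at most k features on the lasso path, then refit by weighted least squares."""
    sqrt_w = np.sqrt(weights)
    # Centering and scaling by sqrt(weights) turns the weighted problem
    # into an ordinary one that lars_path can solve.
    X = (z_prime - np.average(z_prime, axis=0, weights=weights)) * sqrt_w[:, None]
    y = (f_z - np.average(f_z, weights=weights)) * sqrt_w
    _, _, coefs = lars_path(X, y, method="lasso")  # coefs: (n_features, n_steps)

    selected = np.arange(z_prime.shape[1])
    # Walk back from the least-regularized end until at most k features are active.
    for step in range(coefs.shape[1] - 1, 0, -1):
        selected = np.flatnonzero(coefs[:, step])
        if len(selected) <= k:
            break

    refit = LinearRegression().fit(z_prime[:, selected], f_z, sample_weight=weights)
    return {int(j): float(c) for j, c in zip(selected, refit.coef_)}

# Toy check: 8 binary features, only the first three drive the output.
rng = np.random.default_rng(0)
z_prime = rng.integers(0, 2, size=(2000, 8)).astype(float)
f_z = (0.5 * z_prime[:, 0] - 0.4 * z_prime[:, 1] + 0.3 * z_prime[:, 2]
       + 0.05 * z_prime[:, 0] * z_prime[:, 1])
distances = np.sqrt((z_prime == 0).sum(axis=1))
weights = np.exp(-(distances ** 2) / (0.75 * np.sqrt(8)) ** 2)

explanation = k_lasso(z_prime, f_z, weights, k=3)
print(explanation)
```

Selecting along the path and then refitting without a penalty avoids the shrinkage that a plain lasso would leave in the reported coefficients.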


LIME for Tabular Data - From Scratch

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# ─── TRAIN BLACK-BOX MODEL ────────────────────────────────────────────────────

data = load_breast_cancer()
X = data.data.astype(float)
y = data.target
feature_names = list(data.feature_names)

X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)

# Normalize for distance computation
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train_s, y_train)
print(f"Random Forest accuracy: {rf.score(X_test_s, y_test):.4f}")

# ─── LIME FROM SCRATCH ────────────────────────────────────────────────────────

class TabularLIME:
    """
    LIME for tabular data: sample neighborhood by replacing features
    with mean values from the training distribution.
    """

    def __init__(self, model, X_train, feature_names,
                 n_samples=500, kernel_width=0.75, n_features=6):
        self.model = model
        self.X_train = X_train
        self.feature_names = feature_names
        self.n_samples = n_samples
        self.sigma = kernel_width * np.sqrt(X_train.shape[1])  # scale by dimensionality
        self.n_features = n_features

        # Statistics for generating perturbations
        self.feature_means = X_train.mean(axis=0)
        self.feature_stds = X_train.std(axis=0)

    def _proximity_kernel(self, distances):
        """Exponential kernel: closer samples get higher weight"""
        return np.exp(-(distances ** 2) / (self.sigma ** 2))

    def explain(self, instance, class_idx=1):
        """
        Explain the prediction for 'instance' (class_idx = class to explain).
        Returns: dict of {feature_name: lime_coefficient}
        """
        n_features = len(instance)

        # ── Step 1: Sample perturbations ──────────────────────────────────────
        # z' is a binary matrix: 1 = keep original, 0 = replace with mean
        z_prime = np.random.binomial(1, 0.5, size=(self.n_samples, n_features))

        # A row with every feature removed is reset to the unperturbed instance
        z_prime[z_prime.sum(axis=1) == 0, :] = 1

        # Build actual perturbations: blend instance and feature means
        z = np.where(z_prime == 1, instance, self.feature_means)

        # ── Step 2: Get black-box predictions ─────────────────────────────────
        f_z = self.model.predict_proba(z)[:, class_idx]

        # ── Step 3: Compute distance and kernel weights ────────────────────────
        # Distance in the binary space: sqrt of the number of replaced features
        distances = np.sqrt((z_prime == 0).sum(axis=1).astype(float))
        weights = self._proximity_kernel(distances)

        # ── Step 4: Fit weighted linear regression (Ridge) ────────────────────
        # Keep the explanation sparse with a two-stage fit: rank features by the
        # magnitude of their weighted-ridge coefficient, keep the top K, refit.
        # (SelectKBest.fit takes no sample_weight, so it would ignore the kernel.)
        k = min(self.n_features, n_features)
        full_fit = Ridge(alpha=1.0).fit(z_prime, f_z, sample_weight=weights)
        selected_features = np.sort(np.argsort(np.abs(full_fit.coef_))[-k:])
        z_selected = z_prime[:, selected_features]

        # Fit weighted ridge on selected features
        ridge = Ridge(alpha=1.0)
        ridge.fit(z_selected, f_z, sample_weight=weights)

        # ── Step 5: Return explanation ─────────────────────────────────────────
        explanation = {
            self.feature_names[i]: float(coef)
            for i, coef in zip(selected_features, ridge.coef_)
        }
        return dict(sorted(explanation.items(), key=lambda t: abs(t[1]), reverse=True))


lime_explainer = TabularLIME(rf, X_train_s, feature_names, n_samples=500)

# Explain a single test instance
instance_idx = 10
instance = X_test_s[instance_idx]
pred_class = rf.predict([instance])[0]
pred_prob = rf.predict_proba([instance])[0, 1]

print(f"\nLIME explanation for instance {instance_idx}:")
print(f"Prediction: class={pred_class}, prob={pred_prob:.3f}")

explanation = lime_explainer.explain(instance, class_idx=1)
print("\nFeature contributions (LIME coefficients):")
for feat, coef in list(explanation.items())[:6]:
    direction = "pushes toward positive" if coef > 0 else "pushes toward negative"
    print(f" {feat:35s} {coef:+.4f} ({direction})")

LIME for Text - Catching the Newsgroup Bug

import re
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Simulate a text classifier (newsgroup-style)
sample_texts = [
"God is real and the Bible is truth. Posting-host: christian-server.edu",
"Science explains everything. Religion is primitive. Posting-host: atheism-list.net",
"Prayer has guided my life. Jesus saves all who believe.",
"Evolution is fact. God is a human invention. Rational thought wins.",
"I believe in the Father the Son and the Holy Spirit",
"There is no evidence for any supernatural being. Logic matters."
]
labels = [1, 0, 1, 0, 1, 0] # 1=Christian, 0=Atheist

pipe = Pipeline([
("tfidf", TfidfVectorizer(max_features=100)),
("clf", LogisticRegression(random_state=42))
])
pipe.fit(sample_texts, labels)

def text_lime_explain(model, text, class_idx=1, n_samples=200, n_features=5):
    """
    LIME for text: randomly remove words, observe prediction changes,
    fit a local linear model on word presence/absence.
    """
    words = re.findall(r'\w+', text.lower())
    # dict.fromkeys de-duplicates while keeping first-occurrence order,
    # so the feature order is the same on every run
    unique_words = list(dict.fromkeys(words))

    if not unique_words:
        return {}

    # Perturbation: randomly remove words from the text
    perturbed_texts = []
    z_prime = []  # binary: word present (1) or absent (0)

    for _ in range(n_samples):
        # Each word is kept with probability 0.5
        keep = np.random.binomial(1, 0.5, size=len(unique_words))
        keep_set = {w for w, k in zip(unique_words, keep) if k == 1}

        # Reconstruct text with only kept words ("empty" is a placeholder
        # token for the case where every word was removed)
        new_text = " ".join(w for w in words if w in keep_set) or "empty"
        perturbed_texts.append(new_text)
        z_prime.append(keep)

    z_prime = np.array(z_prime, dtype=float)

    # Get model predictions for perturbed texts
    f_z = model.predict_proba(perturbed_texts)[:, class_idx]

    # Proximity weights: fewer removed words = closer to original
    distances = np.sqrt((z_prime == 0).sum(axis=1).astype(float))
    sigma = 0.75 * np.sqrt(len(unique_words))
    weights = np.exp(-(distances ** 2) / (sigma ** 2))

    # Fit weighted ridge regression
    from sklearn.linear_model import Ridge
    ridge = Ridge(alpha=1.0)
    ridge.fit(z_prime, f_z, sample_weight=weights)

    # Return top-N most important words
    word_coefs = sorted(
        zip(unique_words, ridge.coef_),
        key=lambda t: abs(t[1]), reverse=True
    )[:n_features]

    return dict(word_coefs)

# Explain the first text (a Christian post with header artifact)
text = sample_texts[0]
print(f"\nText: '{text}'")
print(f"Prediction: {pipe.predict([text])[0]} (1=Christian)")
explanation = text_lime_explain(pipe, text)
print("LIME word importance:")
for word, coef in explanation.items():
    direction = "→ Christian" if coef > 0 else "→ Atheist"
    print(f" '{word}': {coef:+.4f} ({direction})")
# If "posting" or "host" appear with high coefficients, the model is exploiting
# header artifacts - exactly the bug Ribeiro found in the original 2016 paper

LIME vs SHAP - When to Use Which

import time
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_classification
import shap

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

model = GradientBoostingClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# ── Timing comparison ──────────────────────────────────────────────────────

# SHAP (TreeExplainer)
t0 = time.perf_counter()
explainer_shap = shap.TreeExplainer(model)
shap_values = explainer_shap.shap_values(X_test[:10])
t_shap = time.perf_counter() - t0
print(f"TreeSHAP for 10 instances: {t_shap*1000:.1f}ms")

# LIME (using lime library)
try:
    import lime
    import lime.lime_tabular

    lime_explainer = lime.lime_tabular.LimeTabularExplainer(
        X_train, mode="classification", random_state=42
    )
    t0 = time.perf_counter()
    for i in range(10):
        exp = lime_explainer.explain_instance(
            X_test[i], model.predict_proba, num_features=10, num_samples=500
        )
    t_lime = time.perf_counter() - t0
    print(f"LIME for 10 instances: {t_lime*1000:.1f}ms")
    print(f"LIME is {t_lime/t_shap:.0f}x slower than TreeSHAP")
except ImportError:
    print("lime library not installed - install with: pip install lime")

# ── Stability comparison: rerun a from-scratch LIME with 5 different seeds ──

from sklearn.linear_model import Ridge

print("\nLIME stability check (same instance, 5 different random seeds):")
print("If coefficients differ significantly, the explanation is unstable.")

feature_names = [f"feature_{i}" for i in range(20)]
instance = X_test[0]

lime_results = []
for seed in range(5):
    rng = np.random.RandomState(seed)
    # Manually sample perturbations with this seed
    n_samples = 500
    z_prime = rng.binomial(1, 0.5, size=(n_samples, X_train.shape[1]))
    z = np.where(z_prime == 1, instance, X_train.mean(axis=0))
    f_z = model.predict_proba(z)[:, 1]

    distances = np.sqrt((z_prime == 0).sum(axis=1).astype(float))
    sigma = 0.75 * np.sqrt(X_train.shape[1])
    weights = np.exp(-(distances ** 2) / sigma ** 2)

    ridge = Ridge(alpha=1.0)
    ridge.fit(z_prime, f_z, sample_weight=weights)
    lime_results.append(ridge.coef_)

lime_results = np.array(lime_results)
stability_std = lime_results.std(axis=0)
most_unstable_idx = stability_std.argsort()[-5:][::-1]
for idx in most_unstable_idx:
    print(f" {feature_names[idx]:15s} "
          f"mean coef: {lime_results[:, idx].mean():+.4f} "
          f"std: {stability_std[idx]:.4f}")
print("(High std = unstable explanation - verify by running LIME multiple times)")

LIME for Image Classification

# Conceptual implementation of image LIME
# Uses superpixel segmentation and binary masking

try:
    from skimage.segmentation import quickshift
    from skimage.util import img_as_float
    import numpy as np

    def image_lime_explain(model_fn, image, n_segments=50,
                           n_samples=200, class_idx=0):
        """
        LIME for images:
        1. Segment image into superpixels
        2. Randomly mask superpixels (grey them out)
        3. Get model predictions
        4. Fit linear model on superpixel presence/absence

        Returns (coefficients, segments), where coefficients[i] is the
        importance of superpixel i. n_segments is currently unused: quickshift
        derives the superpixel count from kernel_size, max_dist, and ratio.
        """
        image_float = img_as_float(image)

        # Segment into superpixels
        segments = quickshift(image_float, kernel_size=4,
                              max_dist=200, ratio=0.2)
        n_superpixels = segments.max() + 1
        print(f" Image segmented into {n_superpixels} superpixels")

        # Sample perturbations
        z_prime = np.random.binomial(1, 0.5,
                                     size=(n_samples, n_superpixels))

        # Build perturbed images
        perturbed = []
        for mask_row in z_prime:
            perturbed_img = image_float.copy()
            for seg_idx, keep in enumerate(mask_row):
                if not keep:
                    # Grey out this superpixel
                    perturbed_img[segments == seg_idx] = 0.5
            perturbed.append(perturbed_img)

        # Get predictions (model_fn takes batch of images)
        f_z = model_fn(np.array(perturbed))[:, class_idx]

        # Proximity weights
        distances = np.sqrt((z_prime == 0).sum(axis=1).astype(float))
        sigma = 0.75 * np.sqrt(n_superpixels)
        weights = np.exp(-(distances ** 2) / sigma ** 2)

        # Fit weighted linear model
        from sklearn.linear_model import Ridge
        ridge = Ridge(alpha=1.0)
        ridge.fit(z_prime, f_z, sample_weight=weights)

        # Return superpixel importance scores plus the segmentation map
        return ridge.coef_, segments

    print("image_lime_explain() ready - pass a model_fn and image array")

except ImportError:
    print("scikit-image not installed. Install with: pip install scikit-image")

LIME in Production

Stability is your biggest concern. LIME is a Monte Carlo method - each run samples a different random neighborhood. The explanation can change between runs. Before deploying LIME-based explanations, validate stability:

def lime_stability_check(lime_explainer_fn, instance, n_runs=10, top_k=3):
    """
    Run LIME n_runs times on the same instance and measure explanation stability.
    Expects lime_explainer_fn to return a dict ordered by importance.
    Returns: stability score (1.0 = perfectly stable, 0.0 = completely random)
    """
    all_top_features = []
    for _ in range(n_runs):
        exp = lime_explainer_fn(instance)
        top_features = set(list(exp.keys())[:top_k])
        all_top_features.append(top_features)

    # Measure Jaccard similarity between all pairs of explanation sets
    n = len(all_top_features)
    similarities = []
    for i in range(n):
        for j in range(i + 1, n):
            intersection = len(all_top_features[i] & all_top_features[j])
            union = len(all_top_features[i] | all_top_features[j])
            similarities.append(intersection / union if union > 0 else 1.0)

    return np.mean(similarities)

# Use this in a pre-deployment validation pipeline:
# If stability < 0.8, increase n_samples or consider switching to SHAP
print("Stability check: run lime_stability_check before deploying any LIME explanation.")
print("If stability < 0.8, increase n_samples (try 1000-2000) or use TreeSHAP instead.")

Bandwidth selection matters. The kernel bandwidth $\sigma$ controls how local LIME is. A small $\sigma$ gives highly local explanations (accurate to the immediate neighborhood but not generalizable). A large $\sigma$ approaches a global linear approximation (more stable but less faithful to local behavior). The lime library's tabular default, 0.75 times the square root of the number of features (the same kernel_width=0.75 scaling used in the from-scratch code), is usually a reasonable starting point, but always validate.
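One rough way to see what a bandwidth does is to measure how many perturbed samples effectively contribute to the fit. The sketch below uses the Kish effective sample size on binary-mask perturbations; the three kernel widths and the 30-feature setup are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 5000, 30
z_prime = rng.integers(0, 2, size=(n_samples, n_features))
distances = np.sqrt((z_prime == 0).sum(axis=1))

def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    return weights.sum() ** 2 / (weights ** 2).sum()

ess_by_width = {}
for kernel_width in (0.25, 0.75, 3.0):
    sigma = kernel_width * np.sqrt(n_features)
    weights = np.exp(-(distances ** 2) / sigma ** 2)
    ess_by_width[kernel_width] = effective_sample_size(weights)
    print(f"kernel_width={kernel_width}: "
          f"effective samples ~ {ess_by_width[kernel_width]:.0f} of {n_samples}")
```

A narrow kernel concentrates the fit on a small fraction of the samples (more local, higher variance); a wide kernel weights almost every sample equally, which is the global-approximation regime described above.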

For production APIs: cache LIME explanations when possible. Since LIME explanations are stochastic, cache the result of multiple runs and return the most stable explanation (highest agreement between runs). For high-stakes decisions, show confidence intervals over LIME coefficients across multiple runs.
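A minimal sketch of the multi-run summary you might cache, assuming the explainer returns a `{feature: coefficient}` dict per call. The function name `aggregate_lime_runs` and the `noisy_explainer` stand-in are hypothetical, and the percentile band is an empirical run-to-run interval rather than a formal confidence interval.

```python
import numpy as np

def aggregate_lime_runs(explain_fn, instance, n_runs=20, interval=(2.5, 97.5)):
    """Run a stochastic explainer repeatedly and summarize each coefficient."""
    runs = [explain_fn(instance) for _ in range(n_runs)]
    features = sorted({name for run in runs for name in run})
    # A feature that a run did not select counts as coefficient 0 for that run.
    coefs = np.array([[run.get(name, 0.0) for name in features] for run in runs])
    low, high = np.percentile(coefs, interval, axis=0)
    return {
        name: {
            "mean": float(coefs[:, j].mean()),
            "low": float(low[j]),
            "high": float(high[j]),
            # The sign is trustworthy only if the whole band is on one side of 0.
            "sign_stable": bool(low[j] > 0 or high[j] < 0),
        }
        for j, name in enumerate(features)
    }

# Stand-in for a real LIME call: one strong feature and one pure-noise feature.
rng = np.random.default_rng(0)
def noisy_explainer(_instance):
    return {"strong": 0.5 + rng.normal(0, 0.02), "noise": rng.normal(0, 0.05)}

summary = aggregate_lime_runs(noisy_explainer, instance=None)
for name, stats in summary.items():
    print(name, stats)
```

Caching the summary instead of a single run means every caller sees the same explanation, and features whose band straddles zero can be suppressed before display.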


Common Mistakes

danger

Trusting a LIME explanation without checking stability

LIME is a Monte Carlo method. The same input can produce meaningfully different explanations on different runs. Before presenting any LIME explanation to a user or regulator, run LIME 5-10 times on the same instance and measure how consistent the top features are. If the top-3 features change across runs, the explanation is unreliable. Increase n_samples or, for tree models, switch to TreeSHAP, which is deterministic and exact.

danger

Using LIME to "explain" a model that is actually relying on artifacts

LIME faithfully explains what the model looks at - including spurious patterns. If your model uses header artifacts (like the newsgroup example), LIME will correctly show those as important. This is not a bug in LIME - it is LIME working correctly. The actionable response is not to distrust LIME but to investigate and fix the model.

warning

Setting the neighborhood too wide

If kernel_width is too large, distant perturbations receive nearly the same weight as nearby ones, and LIME effectively fits a global linear approximation that is not locally faithful. The explanation looks clean (few features, high coefficient magnitudes) but does not accurately represent the model's behavior near the specific instance. Always validate that the local linear model achieves high $R^2$ on the sampled neighborhood; low $R^2$ means the local linear approximation is poor.

warning

Using binary superpixel masking for images without checking superpixel quality

Image LIME segments the image and masks superpixels. If the segmentation is poor - for example, superpixels straddle object boundaries - the masked inputs are incoherent (half an object is grey, the other half is present). This confuses the model and produces noisy, unreliable predictions. Use appropriate segmentation hyperparameters for your image domain, or switch to RISE (randomized input sampling for explanations) which uses smoother masks.


YouTube Resources

  • "Why Should I Trust You?" - Marco Ribeiro (KDD 2016 presentation): The original paper presentation. Ribeiro walks through the newsgroup classifier example in real time.
  • "LIME Explained" - StatQuest with Josh Starmer: Excellent visual walkthrough of the perturbation and linear fitting steps.
  • "LIME vs SHAP" - Towards Data Science talk: Practical comparison of when each method is appropriate.
  • "Trust, Explanations, and ML" - NIPS 2016 tutorial: Broader context on why local explanations matter for human trust.

Interview Q&A

Q: Walk me through the LIME algorithm step by step.

LIME explains a prediction by locally approximating the black-box model with a simple interpretable model. Step 1: sample a neighborhood around the instance by perturbing the input. For tabular data, randomly zero out features (replace with background values). For text, randomly remove words. For images, randomly grey out superpixels. Step 2: feed each perturbed instance through the black-box model to get predictions. Step 3: weight each perturbed instance by its proximity to the original instance using an exponential kernel $\pi_x(z) = \exp(-D(x,z)^2/\sigma^2)$. Step 4: fit a weighted sparse linear regression on the perturbed inputs and their predictions. Step 5: return the regression coefficients as the explanation - positive coefficients indicate features that push toward the predicted class.

Q: What is the LIME objective function and what does each term mean?

$$\xi(x) = \arg\min_{g \in \mathcal{G}} \mathcal{L}(f, g, \pi_x) + \Omega(g)$$

$g$ is the local interpretable model (typically linear). $\mathcal{G}$ is the class of interpretable models. $\mathcal{L}(f, g, \pi_x)$ is the fidelity loss - how well $g$ approximates $f$ in the locality of $x$, weighted by the proximity kernel $\pi_x$. $\Omega(g)$ is the complexity penalty - typically the number of features in the linear model. The objective balances local fidelity (the explanation should match the black box in the neighborhood) with simplicity (the explanation should be short and interpretable).

Q: LIME vs SHAP - what are the key differences and when would you choose each?

LIME is a Monte Carlo approximation that fits a local linear model. It is model-agnostic (works for any model), handles any modality (tabular, text, image), but is unstable (different random seeds give different explanations), slower than TreeSHAP, and theoretically weaker (no axiomatic guarantees). SHAP is axiomatically grounded: Shapley values are the unique attribution satisfying four axioms (efficiency, symmetry, dummy, and additivity). TreeSHAP computes them exactly for tree ensembles in polynomial time, typically within milliseconds per instance, depending on the number and depth of the trees.

Choose LIME when: you have a non-tree model, you need to explain image or text predictions (LIME's superpixel and word masking are natural for these modalities), or when computational resources are not a constraint. Choose SHAP when: you have a tree model (TreeSHAP is exact and fast), you need theoretically guaranteed attributions, you need to serve explanations in a production API, or you need consistent explanations for audit purposes.

Q: What is the stability problem with LIME and how do you mitigate it?

LIME samples a neighborhood randomly. Each run uses a different set of perturbed instances. For the same input, different runs can produce meaningfully different top features - especially when the decision boundary is complex near the instance. Mitigation strategies: (1) Increase n_samples to 2000-5000 to reduce Monte Carlo variance. (2) Run LIME multiple times and report the intersection of top features. (3) Report confidence intervals over coefficients across runs. (4) Use SHAP instead when exact, deterministic explanations are required. (5) Compute a stability score (Jaccard similarity of top-K features across runs) and reject explanations below a threshold.

Q: How does LIME handle different data modalities (tabular, text, image)?

The core algorithm is the same for all modalities, but the perturbation and feature representation differ. For tabular data: features are the columns; perturbation replaces feature values with background statistics (mean or sample from training distribution); $z'$ is a binary vector (feature present/absent). For text: features are words or tokens; perturbation randomly removes words from the document; $z'$ is a binary vector over the words of the instance. For images: features are superpixels (contiguous image regions); perturbation greys out superpixels; $z'$ is a binary vector over superpixels. In all cases, the proximity kernel, weighted linear regression, and coefficient readout remain identical.


LIME with the Official Library

The lime library (Ribeiro et al.) provides reference implementations for all three modalities. Here is a typical usage pattern for tabular data:

# pip install lime
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# ─── Dataset ──────────────────────────────────────────────────────────────────

data = load_breast_cancer()
X = data.data.astype(float)
y = data.target
feature_names = list(data.feature_names)
class_names = list(data.target_names)

X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)

# Train a black-box model
model = GradientBoostingClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(f"GBM accuracy: {model.score(X_test, y_test):.4f}")

try:
    import lime
    import lime.lime_tabular

    # ─── Initialize LIME tabular explainer ────────────────────────────────────

    lime_explainer = lime.lime_tabular.LimeTabularExplainer(
        X_train,
        feature_names=feature_names,
        class_names=class_names,
        mode="classification",
        random_state=42
    )

    # ─── Explain a single instance ────────────────────────────────────────────

    instance_idx = 7
    instance = X_test[instance_idx]

    exp = lime_explainer.explain_instance(
        instance,
        model.predict_proba,
        num_features=8,    # top-8 features in explanation
        num_samples=2000,  # larger = more stable
        top_labels=1
    )
    # top_labels=1 explains only the predicted class, so look up which label
    # the explanation holds instead of relying on as_list()'s default label=1
    label = exp.available_labels()[0]

    print(f"\nLIME explanation for instance {instance_idx}:")
    print(f"Prediction: {class_names[model.predict([instance])[0]]}")
    print(f"Confidence: {model.predict_proba([instance])[0].max():.3f}")
    print(f"\nTop feature contributions (toward '{class_names[label]}'):")
    for feature, weight in exp.as_list(label=label):
        print(f" {feature:40s} {weight:+.4f}")

    # ─── Local fidelity score ─────────────────────────────────────────────────

    # How well does the local linear model approximate the black box in this neighborhood?
    # exp.score is the R^2 of the local linear model on the sampled neighborhood.
    # Depending on the lime version it is a float or a per-label dict; handle both.
    score = exp.score[label] if isinstance(exp.score, dict) else exp.score
    print(f"\nLocal fidelity (R² of linear approximation): {score:.4f}")
    print("(High R² = linear model captures local behavior well)")
    print("(Low R² = LIME explanation may not be reliable for this instance)")

    # ─── Batch explanation + stability audit ─────────────────────────────────

    def lime_batch_explain(explainer, model_fn, X_batch, n_runs=5, n_features=5):
        """
        Explain each instance in X_batch n_runs times and compute stability.
        Returns explanations and stability scores.
        """
        results = []
        for i, instance in enumerate(X_batch):
            run_features = []
            for run in range(n_runs):
                # explain_instance takes no random_state argument. The explainer's
                # internal RandomState advances on every call, so repeated runs
                # already draw different neighborhoods.
                exp = explainer.explain_instance(
                    instance, model_fn,
                    num_features=n_features,
                    num_samples=1000
                )
                top_feats = set(f for f, _ in exp.as_list()[:n_features])
                run_features.append(top_feats)

            # Compute pairwise Jaccard similarity
            n = len(run_features)
            sims = []
            for a in range(n):
                for b in range(a + 1, n):
                    inter = len(run_features[a] & run_features[b])
                    union = len(run_features[a] | run_features[b])
                    sims.append(inter / union if union > 0 else 1.0)

            stability = np.mean(sims)
            consensus = set.intersection(*run_features) if run_features else set()

            results.append({
                "instance_idx": i,
                "stability_score": stability,
                "consensus_features": list(consensus),
                "reliable": stability >= 0.75
            })

        return results

    print("\nRunning batch stability audit on 5 test instances (5 runs each)...")
    stability_results = lime_batch_explain(
        lime_explainer, model.predict_proba, X_test[:5]
    )
    for r in stability_results:
        status = "STABLE" if r["reliable"] else "UNSTABLE"
        print(f" Instance {r['instance_idx']}: stability={r['stability_score']:.3f} "
              f"[{status}], consensus features: {len(r['consensus_features'])}")

except ImportError:
    print("lime library not installed. Install with: pip install lime")

Anchors - LIME's Follow-Up

Ribeiro's team followed LIME with Anchors (2018) - a different local explanation approach that produces rules rather than coefficients. Where LIME says "this feature contributed +0.3", Anchors says "this prediction holds as long as debt_ratio > 0.4 AND credit_score < 700."

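Anchors come with two explicit quality measures. Precision is the fraction of instances satisfying the rule for which the model gives the same prediction; an anchor is only accepted when its estimated precision exceeds a chosen threshold. Coverage is the fraction of instances the rule applies to. This makes anchors more actionable for stakeholders who think in rules rather than coefficients.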

# pip install alibi (provides Anchors implementation)
try:
    from alibi.explainers import AnchorTabular
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    data = load_breast_cancer()
    X = data.data.astype(float)
    y = data.target
    feature_names = list(data.feature_names)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    rf = RandomForestClassifier(n_estimators=200, random_state=42)
    rf.fit(X_train, y_train)

    # Initialize Anchor explainer
    anchor_explainer = AnchorTabular(
        predictor=rf.predict,
        feature_names=feature_names
    )
    anchor_explainer.fit(X_train, disc_perc=(25, 50, 75))  # discretize continuous features

    # Explain instance
    instance = X_test[5]
    explanation = anchor_explainer.explain(instance, threshold=0.95)

    print("Anchor explanation:")
    print(f" Prediction: {rf.predict([instance])[0]}")
    print(f" Anchor rule: {' AND '.join(explanation.anchor)}")
    print(f" Precision: {explanation.precision:.3f}")
    print(f" Coverage: {explanation.coverage:.3f}")
    print(f" (Precision: among instances satisfying the rule, {explanation.precision:.1%} get the same prediction)")
    print(f" (Coverage: {explanation.coverage:.1%} of the data satisfies this rule)")

except ImportError:
    print("alibi not installed. Install with: pip install alibi")
    print("Anchors produce if-then rules as explanations.")
    print("Example: IF credit_score < 650 AND debt_ratio > 0.45 THEN deny loan")
    print("Key properties:")
    print(" Precision: P(model agrees with rule | rule applies) - want > 0.95")
    print(" Coverage: P(rule applies) - higher coverage = more general rule")

Anchors are often the right choice for communicating with non-technical stakeholders. A loan officer understands "IF debt-to-income ratio exceeds 45% AND credit score is below 650, THEN the model predicts default." They may not understand "the SHAP value for debt_ratio is +0.34."
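To make precision and coverage concrete, the sketch below scores a hand-written rule against a dataset using only NumPy. Everything in it is an illustrative assumption: `toy_model` is a stand-in black box, the two columns are hypothetical `debt_ratio` and `credit_score` features, and `rule_precision_coverage` is a helper written for this example. The real Anchors algorithm estimates precision over a perturbation distribution and searches for the rule itself; this only evaluates a rule you already have.

```python
import numpy as np

def rule_precision_coverage(X, model_predict, rule_mask_fn, target_label):
    """Score an if-then rule against a dataset.

    coverage  = fraction of rows where the rule applies
    precision = among those rows, fraction the model labels as target_label
    """
    mask = rule_mask_fn(X)
    coverage = float(mask.mean())
    if not mask.any():
        return float("nan"), coverage
    preds = model_predict(X[mask])
    precision = float(np.mean(preds == target_label))
    return precision, coverage

# Hypothetical toy data: columns are [debt_ratio, credit_score]
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 1000), rng.uniform(500, 850, 1000)])

def toy_model(X):
    # Stand-in black box: predicts 1 (default) when a linear risk score is high
    risk = 2.0 * X[:, 0] - (X[:, 1] - 500) / 350
    return (risk > 0.3).astype(int)

def rule(X):
    # IF debt_ratio > 0.45 AND credit_score < 650
    return (X[:, 0] > 0.45) & (X[:, 1] < 650)

precision, coverage = rule_precision_coverage(X, toy_model, rule, target_label=1)
print(f"precision={precision:.3f}, coverage={coverage:.3f}")
```

A rule with high precision but very low coverage is accurate yet applies to few cases, which is why the two numbers are usually reported together.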


The LIME Faithfulness Problem - Quantifying It

LIME's local $R^2$ score tells you how well the local linear model approximates the black box in the sampled neighborhood. Low $R^2$ means the linear approximation is poor and the explanation may be unreliable.

A systematic way to catch unfaithful LIME explanations:

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X = data.data.astype(float)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = GradientBoostingClassifier(n_estimators=200, max_depth=5, random_state=42)
model.fit(X_train, y_train)

try:
    import lime.lime_tabular

    explainer = lime.lime_tabular.LimeTabularExplainer(
        X_train, feature_names=list(data.feature_names),
        class_names=list(data.target_names), mode="classification"
    )

    # Compute local R² for each test instance
    fidelity_scores = []
    for i in range(min(50, len(X_test))):
        exp = explainer.explain_instance(
            X_test[i], model.predict_proba,
            num_features=10, num_samples=1000
        )
        fidelity_scores.append(exp.score)

    fidelity = np.array(fidelity_scores)
    print(f"Local fidelity (R²) across 50 test instances:")
    print(f" Mean: {fidelity.mean():.4f}")
    print(f" Std: {fidelity.std():.4f}")
    print(f" Min: {fidelity.min():.4f}")
    print(f" Instances with R² < 0.5 (unreliable): {(fidelity < 0.5).sum()}")
    print(f" Instances with R² > 0.9 (high fidelity): {(fidelity > 0.9).sum()}")
    print()
    print("Rule of thumb: only trust LIME explanations where R² > 0.7")
    print("For low-fidelity instances, report the R² alongside the explanation")
    print("so stakeholders understand the reliability of the explanation.")

except ImportError:
    print("lime library not installed.")

Including the local $R^2$ alongside every LIME explanation is a transparency best practice. An explanation with $R^2 = 0.95$ means the linear surrogate tracks the model closely in the sampled neighborhood. An explanation with $R^2 = 0.4$ should be flagged as uncertain, or replaced with a SHAP explanation.
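For intuition about where that number comes from, here is a minimal from-scratch sketch of a local weighted $R^2$, using NumPy only. It is not the `lime` library's implementation: the Gaussian perturbation scale, the kernel form, the `kernel_width` default, and the helper name `local_surrogate_r2` are all assumptions chosen for illustration. It perturbs around an instance, weights neighbors by proximity, fits a weighted ridge surrogate, and reports how much weighted variance the surrogate explains.

```python
import numpy as np

def local_surrogate_r2(black_box, x, n_samples=1000, scale=0.5,
                       kernel_width=0.75, alpha=1.0, seed=0):
    """Weighted R^2 of a ridge surrogate fitted around instance x."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    Z = x + rng.normal(0.0, scale, size=(n_samples, d))  # perturbed neighbors
    y = black_box(Z)                                     # black-box outputs

    # Proximity weights: closer neighbors count more (assumed kernel form)
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / (kernel_width ** 2))

    # Weighted ridge in closed form; the intercept column is not penalized
    A = np.column_stack([np.ones(n_samples), Z - x])
    reg = alpha * np.eye(d + 1)
    reg[0, 0] = 0.0
    beta = np.linalg.solve(A.T @ (w[:, None] * A) + reg, A.T @ (w * y))

    # Weighted R^2 = 1 - weighted SS_res / weighted SS_tot
    pred = A @ beta
    y_bar = np.average(y, weights=w)
    ss_res = np.sum(w * (y - pred) ** 2)
    ss_tot = np.sum(w * (y - y_bar) ** 2)
    return 1.0 - ss_res / ss_tot

x = np.array([0.3, -0.2])

def linear_fn(Z):
    # Locally linear black box: a linear surrogate should fit it well
    return 1.0 * Z[:, 0] - 2.0 * Z[:, 1] + 0.5

def wiggly_fn(Z):
    # Oscillates several times inside the neighborhood: a line cannot follow it
    return np.sin(8 * Z[:, 0]) * np.sin(8 * Z[:, 1])

r2_linear = local_surrogate_r2(linear_fn, x)
r2_wiggly = local_surrogate_r2(wiggly_fn, x)
print(f"linear black box: R^2 = {r2_linear:.3f}")
print(f"wiggly black box: R^2 = {r2_wiggly:.3f}")
```

The contrast between the two functions is the point: a low score says the model is not close to linear at the scale of the sampled neighborhood, so the coefficients should not be read as faithful attributions.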


Practice Problems

  1. Implement LIME from scratch for a binary text classifier. Your implementation should: sample neighborhoods by randomly removing words, weight samples with an exponential kernel over their cosine distance to the original document, fit a weighted ridge regression, and return the top-5 words and their coefficients. Test it on a sentiment classifier.

  2. You are deploying LIME explanations in a real-time API for a medical risk model. The model has 30 features. Each LIME call with num_samples=1000 takes 800ms on your server. This is too slow for real-time use. Describe three strategies to reduce latency while maintaining explanation quality.

  3. A colleague argues that LIME explanations are "good enough" for a regulated loan decision system because they are faster than SHAP. Write a technical memo explaining the limitations of LIME relative to SHAP for regulatory compliance, referencing the stability problem, the faithfulness issue, and the lack of axiomatic guarantees.

  4. You discover that a LIME explanation for a critical medical decision has a local $R^2 = 0.35$. What does this mean, and what should you do? Consider the options: (a) discard the explanation, (b) increase num_samples, (c) switch to SHAP, (d) report the explanation with a confidence warning.

  5. Compare LIME and Anchors as explanation methods. For each method, describe: (a) what the explanation artifact looks like, (b) who the best audience is (data scientist vs clinician vs regulator), (c) the key failure mode, (d) when you would choose it.


The Bottom Line on LIME

LIME was the paper that launched the modern XAI movement. Its contribution was not an algorithm with theoretical guarantees - SHAP's Shapley axioms supply those. Its contribution was proving the concept: predictions from any black-box model, in any modality, can be explained locally. That proof of concept unlocked the entire field.

In practice today, LIME has a specific niche:

  • Best for: image and text explanations where no fast SHAP variant fits the model, rapid prototyping of explanation systems, cases where a local linear approximation is genuinely informative
  • Not recommended for: regulated high-stakes decisions where explanation consistency matters, real-time APIs where sub-100ms latency is required, any case where the model is tree-based (use TreeSHAP instead)
  • Always validate: check local $R^2$, run stability checks, compare against SHAP when possible

The newsgroup classifier story is worth keeping in mind. LIME did not just explain the model - it caught a critical bug that accuracy metrics entirely missed. The most valuable use of any explanation method is not post-hoc documentation. It is real-time debugging during model development, before deployment, when you still have the ability to fix what you find.


Summary - LIME Method Comparison

Three rules for every LIME deployment:

  1. Always check local $R^2$. If it is below 0.7, increase num_samples or switch to SHAP.
  2. Always run stability checks. Run LIME 5 times per instance before shipping an explanation system. If the top-3 features change, increase samples or switch methods.
  3. Always validate against SHAP when possible. LIME and SHAP should agree on the broad strokes. Large divergences warrant investigation.
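The three rules can be folded into one small audit step. The sketch below is a hypothetical helper, not part of `lime` or `shap`: it assumes attributions have already been collected as plain `{feature_name: value}` dictionaries, takes the 0.7 fidelity threshold from rule 1, and reports top-k overlap with SHAP rather than enforcing a cutoff, since no cutoff is given above. The feature names and numbers are made up for illustration.

```python
def top_k(attributions, k=3):
    """Names of the k features with the largest absolute attribution."""
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

def audit_explanation(r2, lime_runs, shap_values, k=3, r2_min=0.7):
    """Apply the three checks to one instance.

    r2          : local R^2 reported with the LIME explanation
    lime_runs   : list of {feature: weight} dicts from repeated LIME runs
    shap_values : {feature: value} dict for the same instance
    """
    top_sets = [set(top_k(run, k)) for run in lime_runs]
    stable = all(s == top_sets[0] for s in top_sets)
    overlap = len(top_sets[0] & set(top_k(shap_values, k))) / k
    return {
        "fidelity_ok": r2 >= r2_min,   # rule 1
        "stable_top_k": stable,        # rule 2
        "shap_overlap": overlap,       # rule 3 (reported, not thresholded)
    }

# Hypothetical attributions for one loan applicant
lime_runs = [
    {"debt_ratio": 0.31, "credit_score": -0.22, "income": 0.05, "age": 0.01},
    {"debt_ratio": 0.28, "credit_score": -0.25, "income": 0.02, "age": 0.04},
]
shap_values = {"debt_ratio": 0.34, "credit_score": -0.20, "income": 0.06, "age": 0.00}

report = audit_explanation(r2=0.82, lime_runs=lime_runs, shap_values=shap_values)
print(report)
```

In this made-up case the fidelity check passes and the first run matches SHAP's top three, but the third-ranked feature flips between runs, so rule 2 would send the explanation back for more samples or a different method.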

LIME remains valuable specifically where SHAP has gaps: image classification explanations, NLP models without fast Shapley implementations, and rapid iteration during model debugging. In those contexts, LIME's flexibility and modality coverage make it the right tool. For tree-based models, TreeSHAP is the stronger production choice: it is exact, deterministic, and fast.

© 2026 EngineersOfAI. All rights reserved.