Model Explainability with SHAP and LIME: Opening the Black Box
The Explainability Crisis — Why “It Works” Isn’t Enough
Your gradient boosting model achieves 97% accuracy on the test set. The stakeholder nods approvingly, then asks: “Why did it reject this customer’s loan application?” You stare at 500 trees with 8 levels each and realize you have no idea.
This is the explainability crisis — and it’s not just a nice-to-have. GDPR’s “right to explanation,” healthcare regulations, and financial compliance all require that automated decisions be interpretable. A model that can’t explain itself is a liability.
Fortunately, two elegant frameworks let you explain any model without cracking it open. But first, let’s establish the landscape:
- Global vs. local explanations. Global explanations tell you what features matter overall (“income is the most important predictor”). Local explanations tell you why a specific prediction was made (“this applicant was rejected because their debt ratio is 0.85”). Both are valuable; they answer different questions.
- Intrinsic vs. post-hoc. Some models are interpretable by design — a shallow decision tree or logistic regression lets you read the decision rules directly. But when you use a gradient boosting ensemble or a neural network, you need post-hoc tools that explain the model after training.
The two dominant post-hoc approaches are SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations). SHAP comes from game theory — it asks “what is each feature’s fair contribution?” LIME comes from approximation theory — it asks “what simple model explains this prediction locally?” By the end of this post, you’ll know how both work under the hood, when to use each, and how to build a production explanation pipeline.
Shapley Values — Fair Credit from Game Theory
Before SHAP, there were Shapley values. In 1953, mathematician Lloyd Shapley solved a problem that seems unrelated to machine learning: if a group of players cooperate to earn a payout, how do you fairly distribute the earnings?
His answer was elegant. For each player, consider every possible coalition (subset) of other players. Measure the player’s marginal contribution — how much the payout increases when they join. Average this marginal contribution over all possible coalitions, weighted by how many ways each coalition can form. The result is the Shapley value.
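To make that averaging concrete, here's a tiny sketch with two players and invented coalition payouts (the payout numbers are mine, purely for illustration):

```python
from itertools import combinations
from math import factorial

# Made-up two-player game: A alone earns 10, B alone earns 20,
# together they earn 40 (cooperation creates 10 extra).
payout = {
    frozenset(): 0,
    frozenset({"A"}): 10,
    frozenset({"B"}): 20,
    frozenset({"A", "B"}): 40,
}

def shapley_value(player, players, payout):
    """Average the player's marginal contribution over all coalitions."""
    others = [p for p in players if p != player]
    n = len(players)
    total = 0.0
    for size in range(n):
        for combo in combinations(others, size):
            coalition = frozenset(combo)
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            marginal = payout[coalition | {player}] - payout[coalition]
            total += weight * marginal
    return total

phi_a = shapley_value("A", ["A", "B"], payout)
phi_b = shapley_value("B", ["A", "B"], payout)
print(phi_a, phi_b)  # 15.0 25.0
```

A's Shapley value is its stand-alone 10 plus half the synergy of 10; B's is 20 plus the other half — and the two values sum to the full payout of 40, which is the "fair split" intuition the general formula extends to n players.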
Applied to ML: features are “players,” the model’s prediction is the “payout,” and the Shapley value of feature i is its fair share of the prediction. The formula is:
φ_i = Σ_{S ⊆ N \ {i}} [ |S|! (|N| − |S| − 1)! / |N|! ] × [ f(S ∪ {i}) − f(S) ]
Where f(S) is the model’s prediction using only the features in set S (with absent features marginalized out), and N is the full feature set. This formula has four remarkable properties that uniquely define it — no other attribution method satisfies all four:
- Efficiency: The Shapley values sum to f(x) − E[f(x)]. Every bit of the prediction above the baseline is accounted for.
- Symmetry: If two features contribute identically in every coalition, they get identical Shapley values.
- Null player: A feature that never changes the prediction in any coalition gets a Shapley value of zero.
- Linearity: For a sum of two models, the Shapley values of the sum equal the sum of the individual Shapley values.
The catch? Computing exact Shapley values requires evaluating 2^n subsets for n features. With 20 features, that’s over a million subsets; with 100 features, it’s roughly 10^30 — far beyond any practical compute budget. Let’s see it in action for a small model where brute force is tractable:
```python
import itertools
import math
import numpy as np

# A simple loan model: f(x) = weighted sum + interaction term
def loan_model(income, credit_score, debt_ratio, employment_years):
    score = (0.3 * income + 0.25 * credit_score
             - 0.35 * debt_ratio + 0.1 * employment_years)
    # Add interaction: high income offsets high debt
    score += 0.15 * income * (1 - debt_ratio)
    return score

feature_names = ["income", "credit_score", "debt_ratio", "employment_years"]

# Applicant to explain (normalized 0-1 scale)
x = [0.7, 0.8, 0.4, 0.6]
# Background mean (population average)
bg = [0.5, 0.5, 0.5, 0.5]

def f_subset(subset_mask, instance, background):
    """Evaluate model with only subset features; others use background."""
    args = [instance[i] if subset_mask[i] else background[i] for i in range(4)]
    return loan_model(*args)

n = len(feature_names)
shapley_values = np.zeros(n)

for i in range(n):
    others = [j for j in range(n) if j != i]
    for size in range(n):  # coalition sizes 0..n-1
        for combo in itertools.combinations(others, size):
            mask_without = [j in combo for j in range(n)]
            mask_with = list(mask_without)
            mask_with[i] = True
            marginal = f_subset(mask_with, x, bg) - f_subset(mask_without, x, bg)
            weight = (math.factorial(size)
                      * math.factorial(n - size - 1)
                      / math.factorial(n))
            shapley_values[i] += weight * marginal

f_x = loan_model(*x)
f_bg = loan_model(*bg)
print(f"f(x) = {f_x:.4f}, E[f(x)] = {f_bg:.4f}")
print(f"f(x) - E[f(x)] = {f_x - f_bg:.4f}")
print(f"Sum of Shapley values = {shapley_values.sum():.4f}")
print()
for name, val in zip(feature_names, shapley_values):
    print(f"  {name:<20s} φ = {val:+.4f}")
```
Running this, you’ll see that the Shapley values sum exactly to f(x) - E[f(x)] — the efficiency property in action. Income and credit score get the largest positive contributions, while debt ratio gets a large negative contribution (pushing toward rejection). The interaction term (income × (1 - debt_ratio)) distributes credit across both income and debt ratio — Shapley values handle interactions naturally.
SHAP — Making Shapley Values Practical
Exact Shapley values are beautiful but exponentially expensive. In 2017, Scott Lundberg and Su-In Lee published “A Unified Approach to Interpreting Model Predictions” at NeurIPS, introducing SHAP — a family of algorithms that make Shapley values tractable for real models:
- KernelSHAP — Model-agnostic. Samples coalitions of features and fits a weighted linear regression where the weights come from the Shapley kernel. Works with any model but can be slow for many features.
- TreeSHAP — Specialized for tree-based models (random forests, gradient boosting, XGBoost). Exploits the tree structure to compute exact Shapley values in polynomial time — O(TLD²) where T is the number of trees, L is the max leaves, and D is the max depth. This is the gold standard for tree models.
- DeepSHAP — For neural networks. Combines Shapley values with DeepLift’s backpropagation rules to attribute predictions through network layers.
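To see what KernelSHAP’s weighted regression actually does, here’s a from-scratch miniature (my own toy model and baseline, not the shap package’s code): enumerate every coalition, weight it with the Shapley kernel, and read the Shapley values off a weighted linear regression. The huge weight on the empty and full coalitions is a soft stand-in for the hard constraints the real algorithm uses.

```python
import itertools
from math import comb

import numpy as np

# Toy model with an interaction (invented for this demo)
def model(x):
    return 2 * x[0] + x[1] - 3 * x[2] + x[0] * x[2]

x = np.array([1.0, 2.0, 0.5])   # instance to explain
bg = np.zeros(3)                # all-zeros baseline
n = len(x)

Z, y, w = [], [], []
for size in range(n + 1):
    for combo in itertools.combinations(range(n), size):
        z = np.zeros(n)
        z[list(combo)] = 1.0
        Z.append(z)
        y.append(model(np.where(z == 1, x, bg)))
        if size in (0, n):
            w.append(1e6)  # empty/full coalitions: effectively hard constraints
        else:
            # Shapley kernel weight for a coalition of this size
            w.append((n - 1) / (comb(n, size) * size * (n - size)))

Z, y, w = np.array(Z), np.array(y), np.array(w)
A = np.hstack([np.ones((len(Z), 1)), Z])  # intercept + coalition mask
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
phi = beta[1:]
print(phi)  # ≈ [2.25, 2.0, -1.25]: the x0·x2 interaction is split evenly
```

Because every coalition is enumerated here, the regression recovers the exact Shapley values; the real KernelSHAP samples coalitions instead, trading exactness for tractability.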
Let’s use the SHAP library on a real scikit-learn model. We’ll train a gradient boosting classifier on a synthetic loan dataset and explain individual predictions:
```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Generate synthetic loan data
rng = np.random.RandomState(42)
n_samples = 2000
income = rng.normal(55000, 15000, n_samples)
credit_score = rng.normal(680, 50, n_samples)
debt_ratio = rng.uniform(0.1, 0.9, n_samples)
employment_years = rng.exponential(5, n_samples).clip(0, 30)
loan_amount = rng.normal(25000, 10000, n_samples).clip(5000, None)

# Approval rule (with some noise)
approval_score = (0.3 * (income / 80000) + 0.25 * (credit_score / 800)
                  - 0.35 * debt_ratio + 0.1 * (employment_years / 15))
approved = (approval_score + rng.normal(0, 0.05, n_samples)) > 0.35

X = np.column_stack([income, credit_score, debt_ratio,
                     employment_years, loan_amount])
feature_names = ["income", "credit_score", "debt_ratio",
                 "employment_years", "loan_amount"]
y = approved.astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = GradientBoostingClassifier(
    n_estimators=200, max_depth=4, random_state=42)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")

# Explain with TreeSHAP
explainer = shap.TreeExplainer(model)
shap_values = explainer(X_test)

# Single prediction explanation
idx = 0  # First test instance
print(f"\nPrediction for applicant #{idx}: "
      f"{'Approved' if model.predict(X_test[idx:idx+1])[0] else 'Rejected'}")
print(f"Base value (avg prediction): {shap_values[idx].base_values:.3f}")
print("\nFeature contributions:")
for name, val in zip(feature_names, shap_values[idx].values):
    direction = "toward approval" if val > 0 else "toward rejection"
    print(f"  {name:<20s} {val:+.4f} ({direction})")

# To visualize: shap.plots.waterfall(shap_values[idx])
# To see global importance: shap.plots.beeswarm(shap_values)
```
TreeSHAP computes exact Shapley values for all 200 trees in milliseconds — orders of magnitude faster than brute force. The output shows each feature’s push toward approval or rejection, and the contributions sum exactly to the difference between this prediction and the average (the base value). The shap.plots.waterfall() call renders this as a waterfall chart, stepping feature by feature from the base value to the final prediction.
SHAP’s beeswarm plot gives you the global picture: for every feature, it shows the distribution of SHAP values across all predictions. High feature values in one color, low in another. You can instantly see that high income pushes toward approval (positive SHAP values) while high debt ratio pushes toward rejection (negative SHAP values) — and how much each feature matters relative to the others.
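The ordering the beeswarm shows is just mean |SHAP| per feature, which you can compute directly. Here shap_matrix is a made-up random array standing in for shap_values.values from the example above:

```python
import numpy as np

# Stand-in SHAP matrix: (n_samples, n_features), with different spreads
# per feature so the ranking is visible (values invented for the demo)
rng = np.random.RandomState(0)
shap_matrix = rng.normal(0, [0.30, 0.20, 0.25, 0.05, 0.02], size=(1000, 5))
feature_names = ["income", "credit_score", "debt_ratio",
                 "employment_years", "loan_amount"]

# Global importance = mean absolute SHAP value per feature
importance = np.abs(shap_matrix).mean(axis=0)
for name, imp in sorted(zip(feature_names, importance), key=lambda p: -p[1]):
    print(f"{name:<20s} mean |SHAP| = {imp:.3f}")
```

This is exactly the quantity the beeswarm (and SHAP’s bar plot) uses to order features from most to least important.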
Try It: SHAP Force Plot Explorer
Click an applicant to see how each feature pushes the prediction toward approval (blue, right) or rejection (red, left). The base value sits at the center.
LIME — Local Linear Approximations
In 2016, Marco Ribeiro, Sameer Singh, and Carlos Guestrin introduced LIME with a paper titled “Why Should I Trust You?” — arguably the best paper title in ML explainability. Where SHAP reaches for game theory, LIME reaches for something simpler: fit a line.
The core insight is that even though a model might be wildly nonlinear globally, it’s approximately linear in a small neighborhood around any single prediction. LIME exploits this by:
- Perturb: Generate many new samples near the instance you want to explain, by randomly toggling features on and off (using the training data’s statistics for replacement values).
- Predict: Get the black-box model’s predictions on all the perturbed samples.
- Weight: Assign proximity weights using an exponential kernel: π(z) = exp(−D(x,z)² / σ²), so samples closer to the original get higher weight.
- Fit: Train a weighted linear regression on the perturbed samples. The regression coefficients are the feature importances for this prediction.
LIME is model-agnostic by design. It only needs the model’s predict() function — not its internals. This means it works with anything: tree ensembles, neural networks, SVMs, even services behind an API where you only get predictions back.
Let’s build LIME from scratch to understand every step:
```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(model_predict, instance, training_data,
                 n_samples=1000, kernel_width=0.75, random_state=42):
    """Explain a single prediction using LIME from scratch."""
    rng = np.random.RandomState(random_state)
    n_features = len(instance)

    # Step 1: Generate binary perturbation vectors
    # Each row: 1 = use original feature, 0 = use background value
    perturbations = rng.randint(0, 2, size=(n_samples, n_features))
    perturbations[0] = np.ones(n_features)  # Always include the original

    # Step 2: Build actual samples from perturbation masks
    background = training_data.mean(axis=0)
    samples = np.where(perturbations, instance, background)

    # Step 3: Get model predictions on perturbed samples
    predictions = model_predict(samples)

    # Step 4: Compute proximity weights (exponential kernel)
    distances = np.sqrt(((perturbations - 1) ** 2).sum(axis=1))
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))

    # Step 5: Fit weighted linear regression
    local_model = Ridge(alpha=1.0)
    local_model.fit(perturbations, predictions, sample_weight=weights)
    return local_model.coef_, local_model.intercept_

# Use the same GradientBoostingClassifier from above
# (assumes model, X_train, X_test, and feature_names are already defined)
applicant = X_test[0]
coefs, intercept = lime_explain(
    model_predict=lambda x: model.predict_proba(x)[:, 1],
    instance=applicant,
    training_data=X_train,
    n_samples=2000,
)
print("LIME feature importances (local linear coefficients):")
for name, coef in zip(feature_names, coefs):
    direction = "toward approval" if coef > 0 else "toward rejection"
    print(f"  {name:<20s} {coef:+.4f} ({direction})")
```
Note the key differences from SHAP. We’re fitting a local linear model around a single point, not computing a global game-theoretic attribution. The linear coefficients tell us which features matter here — in this neighborhood of the feature space. Move to a different applicant and you’ll get a different local linear model with different coefficients.
The kernel_width parameter controls how “local” the explanation is. A small kernel width means only very nearby perturbations get high weight — giving a tighter, more local approximation. A large kernel width considers a broader neighborhood, which can be more stable but less faithful to the model’s behavior at that exact point.
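To see the locality trade-off numerically, here’s a self-contained 1-D toy (simplified from real LIME, but using the same exponential kernel): fit a weighted line to f(x) = x³ around x0 = 1, where the true local slope (the derivative) is 3.

```python
import numpy as np

# 1-D stand-in for LIME's neighborhood fit (assumptions: Gaussian
# perturbations, deterministic black box f(x) = x**3)
rng = np.random.RandomState(0)
f = lambda x: x ** 3
x0 = 1.0
samples = x0 + rng.normal(0.0, 1.0, 2000)  # perturbations around x0

def local_slope(kernel_width):
    # Proximity weights from the exponential kernel
    w = np.exp(-((samples - x0) ** 2) / kernel_width ** 2)
    # Weighted least-squares line fit: y ~ slope * x + intercept
    A = np.column_stack([samples, np.ones_like(samples)])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], f(samples) * sw, rcond=None)
    return coef[0]

print(local_slope(0.1))  # narrow kernel: close to the true derivative 3
print(local_slope(5.0))  # wide kernel: inflated by global curvature
```

The narrow kernel recovers the local derivative almost exactly; the wide kernel averages in the cubic’s global curvature and reports a much steeper slope — faithful to the neighborhood, not the point.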
Try It: LIME Neighborhood Explorer
Click anywhere on the canvas to select a point. LIME generates perturbation samples nearby (faded dots), then fits a local linear boundary (dashed line). Notice how the linear fit is accurate near your point but diverges from the true curved boundary further away.
SHAP vs. LIME — When to Use Which
Both SHAP and LIME explain individual predictions, but they take fundamentally different approaches. Here’s a head-to-head comparison to help you choose:
Theoretical guarantees. Within the class of additive feature attribution methods, SHAP is the only one that satisfies all four Shapley axioms (efficiency, symmetry, null player, linearity). This means SHAP values are uniquely “fair” in a mathematically rigorous sense. LIME has no such guarantee — it’s a heuristic that works well in practice but doesn’t come with a fairness proof.
Consistency. TreeSHAP is deterministic — run it twice on the same input and you get identical values. LIME uses random perturbations, so different runs can produce different explanations. This is a real problem in production systems where you need reproducible explanations for compliance. Let’s quantify the difference:
```python
from collections import defaultdict

import numpy as np
import shap

# Assumes model, X_train, X_test, feature_names, and lime_explain from earlier

# LIME stability: run 10 times with different seeds
lime_rankings = defaultdict(list)
applicant = X_test[0]
for seed in range(10):
    coefs, _ = lime_explain(
        model_predict=lambda x: model.predict_proba(x)[:, 1],
        instance=applicant,
        training_data=X_train,
        n_samples=2000,
        random_state=seed,
    )
    ranking = np.argsort(np.abs(coefs))[::-1]
    for rank, feat_idx in enumerate(ranking):
        lime_rankings[feature_names[feat_idx]].append(rank + 1)

print("LIME rank across 10 runs (lower = more important):")
for name in feature_names:
    ranks = lime_rankings[name]
    print(f"  {name:<20s} ranks: {ranks} "
          f"mean: {np.mean(ranks):.1f} std: {np.std(ranks):.2f}")

# SHAP stability: always the same
explainer = shap.TreeExplainer(model)
shap_vals = explainer(X_test[0:1])
shap_ranking = np.argsort(np.abs(shap_vals.values[0]))[::-1]
print("\nSHAP ranking (deterministic — run it 1000 times, same result):")
for rank, idx in enumerate(shap_ranking):
    print(f"  #{rank+1}: {feature_names[idx]:<20s} "
          f"SHAP = {shap_vals.values[0][idx]:+.4f}")
```
You’ll typically see LIME’s top feature stay consistent, but the rankings of features #2 and #3 may swap between runs. For a compliance report that says “the top reason for rejection was X,” this instability matters.
Speed. TreeSHAP is extremely fast for tree models — milliseconds for hundreds of trees. KernelSHAP (model-agnostic) is slow because it needs many model evaluations. LIME is faster than KernelSHAP for one-off explanations but slower than TreeSHAP.
Flexibility. LIME’s model-agnostic design is its superpower. It works with text classifiers (perturb by removing words), image classifiers (perturb by masking superpixels), and any model behind an API. SHAP’s specialized variants (TreeSHAP, DeepSHAP) only work with specific model types.
The decision guide: use TreeSHAP for any tree-based model (it’s exact, fast, and deterministic). Use LIME when you need a quick explanation for a non-standard model or want to explain text/image classifiers. Use KernelSHAP when you need SHAP’s theoretical guarantees but don’t have a specialized explainer for your model type.
Building an Explanation Pipeline for Production
Notebook-level explainability is nice. Production-level explainability is a different beast. When every prediction needs an explanation — and those explanations need to be fast, cacheable, and formatted for different audiences — you need a pipeline.
Here are the key considerations:
- Cache explanations alongside predictions. SHAP and LIME computations are expensive. Don’t recompute them every time someone asks “why?” — store the explanation when you first make the prediction.
- Monitor explanation drift. If your model’s SHAP values start shifting over time — say, income used to be the top feature but now credit score is — the model’s reasoning is changing. This is an early warning sign, even before accuracy drops. (See our model monitoring post for drift detection techniques.)
- Format for your audience. Engineers want raw SHAP values and feature names. Business stakeholders want “The loan was rejected primarily because the applicant’s debt-to-income ratio (0.85) exceeds the typical approved applicant’s ratio (0.45).”
- Handle failures gracefully. Explanations can be slow or fail. Set timeouts, provide fallback explanations (e.g., feature importance rankings instead of SHAP values), and never let the explanation pipeline block the prediction pipeline.
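The drift-monitoring bullet can be sketched in a few lines (a hypothetical helper, not part of any library): compare each feature’s share of total |SHAP| between a reference window and a recent one.

```python
import numpy as np

def explanation_drift(ref_shap, cur_shap, feature_names, threshold=0.10):
    """Flag features whose share of mean |SHAP| shifted more than threshold.

    ref_shap, cur_shap: (n_samples, n_features) arrays of SHAP values
    from a reference window and a recent window.
    """
    ref_share = np.abs(ref_shap).mean(axis=0)
    ref_share = ref_share / ref_share.sum()
    cur_share = np.abs(cur_shap).mean(axis=0)
    cur_share = cur_share / cur_share.sum()
    return {name: round(float(cur - ref), 3)
            for name, ref, cur in zip(feature_names, ref_share, cur_share)
            if abs(cur - ref) > threshold}

# Synthetic check: income dominated last month, credit_score dominates now
rng = np.random.RandomState(0)
ref = rng.normal(0, [0.5, 0.1, 0.1], size=(500, 3))
cur = rng.normal(0, [0.1, 0.5, 0.1], size=(500, 3))
drift = explanation_drift(ref, cur, ["income", "credit_score", "debt_ratio"])
print(drift)  # income's share fell, credit_score's share rose
```

A scheduled job that runs this over each day’s cached explanations gives you the early-warning signal before accuracy metrics move.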
```python
import json
import hashlib
import shap

class ExplanationPipeline:
    def __init__(self, model, feature_names, background_data):
        self.model = model
        self.feature_names = feature_names
        self.explainer = shap.TreeExplainer(model)
        self.background_stats = {
            name: {"mean": bg_mean, "std": bg_std}
            for name, bg_mean, bg_std in zip(
                feature_names,
                background_data.mean(axis=0),
                background_data.std(axis=0))
        }
        self.cache = {}

    def _cache_key(self, instance):
        return hashlib.md5(instance.tobytes()).hexdigest()

    def explain(self, instance, audience="technical"):
        key = self._cache_key(instance)
        if key not in self.cache:
            sv = self.explainer(instance.reshape(1, -1))
            self.cache[key] = {
                "shap_values": sv.values[0].tolist(),
                "base_value": float(sv.base_values[0]),
                "prediction": int(self.model.predict(
                    instance.reshape(1, -1))[0]),
                "probability": float(self.model.predict_proba(
                    instance.reshape(1, -1))[0, 1]),
            }
        result = self.cache[key]
        if audience == "technical":
            return self._format_technical(result, instance)
        return self._format_business(result, instance)

    def _format_technical(self, result, instance):
        return {
            "prediction": result["prediction"],
            "probability": round(result["probability"], 4),
            "base_value": round(result["base_value"], 4),
            "features": {
                name: {"value": float(instance[i]),
                       "shap": round(result["shap_values"][i], 4)}
                for i, name in enumerate(self.feature_names)
            },
        }

    def _format_business(self, result, instance):
        pairs = list(zip(self.feature_names, result["shap_values"]))
        top = sorted(pairs, key=lambda p: abs(p[1]), reverse=True)[:3]
        decision = "approved" if result["prediction"] == 1 else "rejected"
        reasons = []
        for name, sv in top:
            idx = self.feature_names.index(name)
            val = instance[idx]
            avg = self.background_stats[name]["mean"]
            direction = "above" if val > avg else "below"
            reasons.append(
                f"{name} ({val:.0f}) is {direction} average ({avg:.0f})")
        return {
            "decision": decision,
            "confidence": f"{result['probability']:.0%}",
            "top_reasons": reasons,
        }

# Usage
pipeline = ExplanationPipeline(model, feature_names, X_train)
applicant = X_test[0]

tech_explanation = pipeline.explain(applicant, audience="technical")
print("Technical:\n", json.dumps(tech_explanation, indent=2))

biz_explanation = pipeline.explain(applicant, audience="business")
print("\nBusiness:\n", json.dumps(biz_explanation, indent=2))
```
The business format gives a stakeholder everything they need: a decision, confidence level, and the top three reasons in plain English with comparisons to the average applicant. The technical format gives data scientists the raw SHAP values for further analysis. Both come from the same cached computation.
References & Further Reading
- Lundberg & Lee — “A Unified Approach to Interpreting Model Predictions” — The original SHAP paper (NeurIPS 2017). Introduces KernelSHAP and the connection between Shapley values and model explanation.
- Ribeiro, Singh, Guestrin — “Why Should I Trust You? Explaining the Predictions of Any Classifier” — The LIME paper (KDD 2016). Introduces local surrogate models for model-agnostic explanations.
- Shapley — “A Value for n-Person Games” — The original 1953 paper defining Shapley values in cooperative game theory.
- Molnar — “Interpretable Machine Learning” — A comprehensive free online book covering SHAP, LIME, and many other interpretability methods.
- Lundberg et al. — “From Local Explanations to Global Understanding with Explainable AI for Trees” — The TreeSHAP paper (Nature Machine Intelligence, 2020) explaining the polynomial-time algorithm for tree models.
Related DadOps posts: Decision Trees from Scratch (intrinsically interpretable models), Gradient Boosting from Scratch (the black box we’re explaining), Mechanistic Interpretability from Scratch (neural-network-specific explainability), Logistic Regression from Scratch (intrinsic interpretability baseline), ML Model Monitoring (monitoring explanation drift in production).