
Model Explainability with SHAP and LIME: Opening the Black Box

The Explainability Crisis — Why “It Works” Isn’t Enough

Your gradient boosting model achieves 97% accuracy on the test set. The stakeholder nods approvingly, then asks: “Why did it reject this customer’s loan application?” You stare at 500 trees with 8 levels each and realize you have no idea.

This is the explainability crisis — and it’s not just a nice-to-have. GDPR’s “right to explanation,” healthcare regulations, and financial compliance all require that automated decisions be interpretable. A model that can’t explain itself is a liability.

Fortunately, two elegant frameworks let you explain any model without cracking it open. But first, let’s establish the landscape:

The two dominant post-hoc approaches are SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations). SHAP comes from game theory — it asks “what is each feature’s fair contribution?” LIME comes from approximation theory — it asks “what simple model explains this prediction locally?” By the end of this post, you’ll know how both work under the hood, when to use each, and how to build a production explanation pipeline.

Shapley Values — Fair Credit from Game Theory

Before SHAP, there were Shapley values. In 1953, mathematician Lloyd Shapley solved a problem that seems unrelated to machine learning: if a group of players cooperate to earn a payout, how do you fairly distribute the earnings?

His answer was elegant. For each player, consider every possible coalition (subset) of other players. Measure the player’s marginal contribution — how much the payout increases when they join. Average this marginal contribution over all possible coalitions, weighted by how many ways each coalition can form. The result is the Shapley value.

Applied to ML: features are “players,” the model’s prediction is the “payout,” and the Shapley value of feature i is its fair share of the prediction. The formula is:

φ_i = Σ_{S ⊆ N \ {i}} [ |S|! (|N| − |S| − 1)! / |N|! ] × [ f(S ∪ {i}) − f(S) ]

Where f(S) is the model’s prediction using only the features in set S (with absent features marginalized out), and N is the full feature set. This formula has four remarkable properties that uniquely define it — no other attribution method satisfies all four:

  1. Efficiency: The Shapley values sum to f(x) − E[f(x)]. Every bit of the prediction above the baseline is accounted for.
  2. Symmetry: If two features contribute identically in every coalition, they get identical Shapley values.
  3. Null player: A feature that never changes the prediction in any coalition gets a Shapley value of zero.
  4. Linearity: For a sum of two models, the Shapley values of the sum equal the sum of the individual Shapley values.
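
The combinatorial weights in the formula make each Shapley value a true weighted average: summed over all coalitions of the other n − 1 features, they total exactly 1. A quick standalone check (a sketch, not part of the post's running example), using exact rational arithmetic to avoid float noise:

```python
from fractions import Fraction
from math import comb, factorial

def shapley_weight_total(n):
    """Sum the Shapley weight |S|!(n-|S|-1)!/n! over all coalitions S
    of the other n-1 features, grouped by coalition size s."""
    return sum(Fraction(comb(n - 1, s) * factorial(s) * factorial(n - s - 1),
                        factorial(n))
               for s in range(n))

for n in (2, 4, 10):
    print(n, shapley_weight_total(n))  # prints 1 for every n
```

Each size-s term collapses to C(n−1, s) · s!(n−1−s)!/n! = 1/n, and there are n sizes, so the total is always 1.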

The catch? Computing exact Shapley values requires evaluating 2^n subsets for n features. With 20 features, that’s over a million subsets. With 100 features, it’s roughly 10^30 — more model evaluations than you could run in the lifetime of the universe. Let’s see it in action for a small model where brute force is tractable:

import itertools
import math
import numpy as np

# A simple loan model: f(x) = weighted sum + interaction term
def loan_model(income, credit_score, debt_ratio, employment_years):
    score = (0.3 * income + 0.25 * credit_score
             - 0.35 * debt_ratio + 0.1 * employment_years)
    # Add interaction: high income offsets high debt
    score += 0.15 * income * (1 - debt_ratio)
    return score

feature_names = ["income", "credit_score", "debt_ratio", "employment_years"]
# Applicant to explain (normalized 0-1 scale)
x = [0.7, 0.8, 0.4, 0.6]
# Background mean (population average)
bg = [0.5, 0.5, 0.5, 0.5]

def f_subset(subset_mask, instance, background):
    """Evaluate model with only subset features; others use background."""
    args = [instance[i] if subset_mask[i] else background[i] for i in range(4)]
    return loan_model(*args)

n = len(feature_names)
shapley_values = np.zeros(n)

for i in range(n):
    others = [j for j in range(n) if j != i]
    for size in range(n):  # coalition sizes 0..n-1
        for combo in itertools.combinations(others, size):
            mask_without = [j in combo for j in range(n)]
            mask_with = list(mask_without)
            mask_with[i] = True
            marginal = f_subset(mask_with, x, bg) - f_subset(mask_without, x, bg)
            weight = (math.factorial(size)
                      * math.factorial(n - size - 1)
                      / math.factorial(n))
            shapley_values[i] += weight * marginal

f_x = loan_model(*x)
f_bg = loan_model(*bg)
print(f"f(x) = {f_x:.4f},  E[f(x)] = {f_bg:.4f}")
print(f"f(x) - E[f(x)] = {f_x - f_bg:.4f}")
print(f"Sum of Shapley values = {shapley_values.sum():.4f}")
print()
for name, val in zip(feature_names, shapley_values):
    print(f"  {name:<20s} φ = {val:+.4f}")

Running this, you’ll see that the Shapley values sum exactly to f(x) - E[f(x)] — the efficiency property in action. Income and credit score get the largest positive contributions, while debt ratio gets a large negative contribution (pushing toward rejection). The interaction term (income × (1 - debt_ratio)) distributes credit across both income and debt ratio — Shapley values handle interactions naturally.

SHAP — Making Shapley Values Practical

Exact Shapley values are beautiful but exponentially expensive. In 2017, Scott Lundberg and Su-In Lee published “A Unified Approach to Interpreting Model Predictions” at NeurIPS, introducing SHAP — a family of algorithms that make Shapley values tractable for real models:

  1. KernelSHAP: model-agnostic; approximates Shapley values by sampling coalitions and fitting a specially weighted linear model.
  2. TreeSHAP: exact and polynomial-time for tree ensembles such as random forests and gradient boosting.
  3. DeepSHAP: an approximation for neural networks that adapts DeepLIFT to the Shapley framework.

Let’s use the SHAP library on a real scikit-learn model. We’ll train a gradient boosting classifier on a synthetic loan dataset and explain individual predictions:

import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Generate synthetic loan data
rng = np.random.RandomState(42)
n_samples = 2000
income = rng.normal(55000, 15000, n_samples)
credit_score = rng.normal(680, 50, n_samples)
debt_ratio = rng.uniform(0.1, 0.9, n_samples)
employment_years = rng.exponential(5, n_samples).clip(0, 30)
loan_amount = rng.normal(25000, 10000, n_samples).clip(5000, None)

# Approval rule (with some noise)
approval_score = (0.3 * (income / 80000) + 0.25 * (credit_score / 800)
                  - 0.35 * debt_ratio + 0.1 * (employment_years / 15))
approved = (approval_score + rng.normal(0, 0.05, n_samples)) > 0.35

X = np.column_stack([income, credit_score, debt_ratio,
                     employment_years, loan_amount])
feature_names = ["income", "credit_score", "debt_ratio",
                 "employment_years", "loan_amount"]
y = approved.astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = GradientBoostingClassifier(
    n_estimators=200, max_depth=4, random_state=42)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")

# Explain with TreeSHAP
explainer = shap.TreeExplainer(model)
shap_values = explainer(X_test)

# Single prediction explanation
idx = 0  # First test instance
print(f"\nPrediction for applicant #{idx}: "
      f"{'Approved' if model.predict(X_test[idx:idx+1])[0] else 'Rejected'}")
print(f"Base value (avg prediction): {shap_values[idx].base_values:.3f}")
print(f"\nFeature contributions:")
for name, val in zip(feature_names, shap_values[idx].values):
    direction = "toward approval" if val > 0 else "toward rejection"
    print(f"  {name:<20s} {val:+.4f}  ({direction})")

# To visualize: shap.plots.waterfall(shap_values[idx])
# To see global importance: shap.plots.beeswarm(shap_values)

TreeSHAP computes exact Shapley values for all 200 trees in milliseconds — orders of magnitude faster than brute force. The output shows each feature’s push toward approval or rejection, and the contributions sum exactly to the difference between this prediction and the average (the base value). The shap.plots.waterfall() call renders this as a waterfall chart in which each feature’s bar shifts the prediction left (toward rejection) or right (toward approval) from the base value.

SHAP’s beeswarm plot gives you the global picture: for every feature, it shows the distribution of SHAP values across all predictions. High feature values in one color, low in another. You can instantly see that high income pushes toward approval (positive SHAP values) while high debt ratio pushes toward rejection (negative SHAP values) — and how much each feature matters relative to the others.
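
The beeswarm’s feature ranking boils down to one statistic: the mean absolute SHAP value per feature across the dataset. A minimal standalone sketch with mock SHAP values (the real matrix would come from the explainer output above):

```python
import numpy as np

# Mock SHAP values: rows = predictions, columns = features (illustrative numbers)
shap_matrix = np.array([
    [ 0.12, -0.30,  0.05],
    [-0.08,  0.25,  0.02],
    [ 0.20, -0.15, -0.01],
])
feature_names = ["income", "credit_score", "debt_ratio"]

# Global importance = mean |SHAP| per feature, the ordering the beeswarm uses
importance = np.abs(shap_matrix).mean(axis=0)
for name, imp in sorted(zip(feature_names, importance), key=lambda p: -p[1]):
    print(f"{name:<15s} {imp:.4f}")
```

Here credit_score ranks first despite its SHAP values pointing in both directions — magnitude, not sign, drives the ranking.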

LIME — Local Linear Approximations

In 2016, Marco Ribeiro, Sameer Singh, and Carlos Guestrin introduced LIME with a paper titled “Why Should I Trust You?” — arguably the best paper title in ML explainability. Where SHAP reaches for game theory, LIME reaches for something simpler: fit a line.

The core insight is that even though a model might be wildly nonlinear globally, it’s approximately linear in a small neighborhood around any single prediction. LIME exploits this by:

  1. Perturb: Generate many new samples near the instance you want to explain, by randomly toggling features on and off (using the training data’s statistics for replacement values).
  2. Predict: Get the black-box model’s predictions on all the perturbed samples.
  3. Weight: Assign proximity weights using an exponential kernel: π(z) = exp(−D(x,z)² / σ²), so samples closer to the original get higher weight.
  4. Fit: Train a weighted linear regression on the perturbed samples. The regression coefficients are the feature importances for this prediction.

LIME is model-agnostic by design. It only needs the model’s predict() function — not its internals. This means it works with anything: tree ensembles, neural networks, SVMs, even services behind an API where you only get predictions back.

Let’s build LIME from scratch to understand every step:

import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(model_predict, instance, training_data,
                 n_samples=1000, kernel_width=0.75, random_state=42):
    """Explain a single prediction using LIME from scratch."""
    rng = np.random.RandomState(random_state)
    n_features = len(instance)

    # Step 1: Generate binary perturbation vectors
    # Each row: 1 = use original feature, 0 = use background value
    perturbations = rng.randint(0, 2, size=(n_samples, n_features))
    perturbations[0] = np.ones(n_features)  # Always include the original

    # Step 2: Build actual samples from perturbation masks
    background = training_data.mean(axis=0)
    samples = np.where(perturbations, instance, background)

    # Step 3: Get model predictions on perturbed samples
    predictions = model_predict(samples)

    # Step 4: Compute proximity weights (exponential kernel)
    distances = np.sqrt(((perturbations - 1) ** 2).sum(axis=1))
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))

    # Step 5: Fit weighted linear regression
    local_model = Ridge(alpha=1.0)
    local_model.fit(perturbations, predictions, sample_weight=weights)

    return local_model.coef_, local_model.intercept_

# Use the same GradientBoostingClassifier from above
# (assumes model, X_train, X_test, and feature_names are already defined)
applicant = X_test[0]
coefs, intercept = lime_explain(
    model_predict=lambda x: model.predict_proba(x)[:, 1],
    instance=applicant,
    training_data=X_train,
    n_samples=2000
)

print("LIME feature importances (local linear coefficients):")
for name, coef in zip(feature_names, coefs):
    direction = "toward approval" if coef > 0 else "toward rejection"
    print(f"  {name:<20s} {coef:+.4f}  ({direction})")

Note the key differences from SHAP. We’re fitting a local linear model around a single point, not computing a global game-theoretic attribution. The linear coefficients tell us which features matter here — in this neighborhood of the feature space. Move to a different applicant and you’ll get a different local linear model with different coefficients.

The kernel_width parameter controls how “local” the explanation is. A small kernel width means only very nearby perturbations get high weight — giving a tighter, more local approximation. A large kernel width considers a broader neighborhood, which can be more stable but less faithful to the model’s behavior at that exact point.
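
The effect is easy to see numerically. A standalone sketch of the exponential kernel π(z) = exp(−D² / σ²) at two kernel widths (illustrative distances, not the post’s data):

```python
import numpy as np

# Distances of perturbed samples from the instance being explained
distances = np.array([0.0, 1.0, 2.0, 3.0])

for kernel_width in (0.5, 2.0):
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))
    print(f"sigma={kernel_width}: {np.round(weights, 4)}")
```

With σ = 0.5, a sample at distance 2 gets a weight near 10⁻⁷ — effectively only the closest perturbations shape the local fit. With σ = 2.0, that same sample still carries weight ≈ 0.37, so the linear model averages over a much wider neighborhood.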


SHAP vs. LIME — When to Use Which

Both SHAP and LIME explain individual predictions, but they take fundamentally different approaches. Here’s a head-to-head comparison to help you choose:

Theoretical guarantees. SHAP is the only explanation method that satisfies all four Shapley axioms (efficiency, symmetry, null player, linearity). This means SHAP values are uniquely “fair” in a mathematically rigorous sense. LIME has no such guarantee — it’s a heuristic that works well in practice but doesn’t come with a fairness proof.

Consistency. TreeSHAP is deterministic — run it twice on the same input and you get identical values. LIME uses random perturbations, so different runs can produce different explanations. This is a real problem in production systems where you need reproducible explanations for compliance. Let’s quantify the difference:

import numpy as np
from collections import defaultdict

# Assumes model, X_train, X_test, feature_names from earlier

# LIME stability: run 10 times with different seeds
lime_rankings = defaultdict(list)
applicant = X_test[0]

for seed in range(10):
    coefs, _ = lime_explain(
        model_predict=lambda x: model.predict_proba(x)[:, 1],
        instance=applicant,
        training_data=X_train,
        n_samples=2000,
        random_state=seed
    )
    ranking = np.argsort(np.abs(coefs))[::-1]
    for rank, feat_idx in enumerate(ranking):
        lime_rankings[feature_names[feat_idx]].append(rank + 1)

print("LIME rank across 10 runs (lower = more important):")
for name in feature_names:
    ranks = lime_rankings[name]
    print(f"  {name:<20s} ranks: {ranks}  "
          f"mean: {np.mean(ranks):.1f}  std: {np.std(ranks):.2f}")

# SHAP stability: always the same
import shap
explainer = shap.TreeExplainer(model)
shap_vals = explainer(X_test[0:1])
shap_ranking = np.argsort(np.abs(shap_vals.values[0]))[::-1]
print(f"\nSHAP ranking (deterministic, run it 1000 times — same result):")
for rank, idx in enumerate(shap_ranking):
    print(f"  #{rank+1}: {feature_names[idx]:<20s} "
          f"SHAP = {shap_vals.values[0][idx]:+.4f}")

You’ll typically see LIME’s top feature stay consistent, but the rankings of features #2 and #3 may swap between runs. For a compliance report that says “the top reason for rejection was X,” this instability matters.

Speed. TreeSHAP is extremely fast for tree models — milliseconds for hundreds of trees. KernelSHAP (model-agnostic) is slow because it needs many model evaluations. LIME is faster than KernelSHAP for one-off explanations but slower than TreeSHAP.

Flexibility. LIME’s model-agnostic design is its superpower. It works with text classifiers (perturb by removing words), image classifiers (perturb by masking superpixels), and any model behind an API. SHAP’s specialized variants (TreeSHAP, DeepSHAP) only work with specific model types.
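
For text, the “perturbation” is just word masking. A hypothetical sketch of the idea (not the lime library’s actual API): each mask row keeps or drops words, the black box scores every perturbed sentence, and LIME fits a weighted linear model on the masks.

```python
import numpy as np

rng = np.random.RandomState(0)
words = "loan denied due to high debt ratio".split()

# Each row: 1 = keep the word, 0 = drop it
masks = rng.randint(0, 2, size=(4, len(words)))
masks[0] = 1  # always include the unperturbed sentence

perturbed = [" ".join(w for w, keep in zip(words, m) if keep) for m in masks]
for sentence in perturbed:
    print(repr(sentence))
```

The same masking trick drives LIME’s image explanations, with superpixels standing in for words.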

The decision guide: use TreeSHAP for any tree-based model (it’s exact, fast, and deterministic). Use LIME when you need a quick explanation for a non-standard model or want to explain text/image classifiers. Use KernelSHAP when you need SHAP’s theoretical guarantees but don’t have a specialized explainer for your model type.

Building an Explanation Pipeline for Production

Notebook-level explainability is nice. Production-level explainability is a different beast. When every prediction needs an explanation — and those explanations need to be fast, cacheable, and formatted for different audiences — you need a pipeline.

Here are the key considerations:

import json
import hashlib
import shap

class ExplanationPipeline:
    def __init__(self, model, feature_names, background_data):
        self.model = model
        self.feature_names = feature_names
        self.explainer = shap.TreeExplainer(model)
        self.background_stats = {
            name: {"mean": bg_mean, "std": bg_std}
            for name, bg_mean, bg_std in zip(
                feature_names,
                background_data.mean(axis=0),
                background_data.std(axis=0))
        }
        self.cache = {}

    def _cache_key(self, instance):
        return hashlib.md5(instance.tobytes()).hexdigest()

    def explain(self, instance, audience="technical"):
        key = self._cache_key(instance)
        if key not in self.cache:
            sv = self.explainer(instance.reshape(1, -1))
            self.cache[key] = {
                "shap_values": sv.values[0].tolist(),
                "base_value": float(sv.base_values[0]),
                "prediction": int(self.model.predict(
                    instance.reshape(1, -1))[0]),
                "probability": float(self.model.predict_proba(
                    instance.reshape(1, -1))[0, 1])
            }

        result = self.cache[key]
        if audience == "technical":
            return self._format_technical(result, instance)
        return self._format_business(result, instance)

    def _format_technical(self, result, instance):
        return {
            "prediction": result["prediction"],
            "probability": round(result["probability"], 4),
            "base_value": round(result["base_value"], 4),
            "features": {
                name: {"value": float(instance[i]),
                       "shap": round(result["shap_values"][i], 4)}
                for i, name in enumerate(self.feature_names)
            }
        }

    def _format_business(self, result, instance):
        pairs = list(zip(self.feature_names, result["shap_values"]))
        top = sorted(pairs, key=lambda p: abs(p[1]), reverse=True)[:3]
        decision = "approved" if result["prediction"] == 1 else "rejected"
        reasons = []
        for name, sv in top:
            idx = self.feature_names.index(name)
            val = instance[idx]
            avg = self.background_stats[name]["mean"]
            direction = "above" if val > avg else "below"
            reasons.append(
                f"{name} ({val:g}) is {direction} average ({avg:g})")
        return {
            "decision": decision,
            "confidence": f"{result['probability']:.0%}",
            "top_reasons": reasons
        }

# Usage
pipeline = ExplanationPipeline(model, feature_names, X_train)
applicant = X_test[0]

tech_explanation = pipeline.explain(applicant, audience="technical")
print("Technical:\n", json.dumps(tech_explanation, indent=2))

biz_explanation = pipeline.explain(applicant, audience="business")
print("\nBusiness:\n", json.dumps(biz_explanation, indent=2))

The business format gives a stakeholder everything they need: a decision, confidence level, and the top three reasons in plain English with comparisons to the average applicant. The technical format gives data scientists the raw SHAP values for further analysis. Both come from the same cached computation.

References & Further Reading

Related DadOps posts: Decision Trees from Scratch (intrinsically interpretable models), Gradient Boosting from Scratch (the black box we’re explaining), Mechanistic Interpretability from Scratch (neural-network-specific explainability), Logistic Regression from Scratch (intrinsic interpretability baseline), ML Model Monitoring (monitoring explanation drift in production).