Purpose
This report documents the interpretability stage of the fraud detection pipeline using SHAP (SHapley Additive exPlanations).
By this point, the project already has trained tree-based models and calibrated variants. The next step is to inspect how the model reaches its risk scores:
- which features drive fraud predictions globally
- how feature effects behave across value ranges
- which feature combinations interact
- how a single flagged claim is explained case-by-case
This matters for a fraud decision system because a score alone is not enough. Operations teams need to understand why a claim was ranked as high risk, especially when the model affects investigation priority.
1. SHAP Setup in This Project
The interpretability workflow loads the prepared datasets and the calibrated tree-model bundle, then unwraps the base estimator when needed.
Datasets loaded
Two processed datasets are used in the project:
- Preprocessed dataset: shape (1000, 51)
- Trees dataset: shape (1000, 43)
Both retain the same target distribution:
- Non-fraud: 753
- Fraud: 247
- Fraud rate: 24.7%
Why the model is unwrapped before SHAP
The SHAP workflow loads the calibrated models, then extracts the underlying estimator from the calibration wrapper (`calibrated_classifiers_[0].estimator`) before running `TreeExplainer`.
That design is practical and correct:
- calibration improves probability quality
- SHAP explains the tree model structure itself
- the explanation step should inspect the actual decision logic learned by the tree ensemble
In plain language: calibration adjusts how the score is spoken, while SHAP inspects how the score was thought.
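A minimal sketch of that unwrapping step; the attribute names follow scikit-learn's `CalibratedClassifierCV` (with `base_estimator` as a fallback for older releases), and the helper name itself is illustrative:

```python
# Sketch: recover the tree model inside a CalibratedClassifierCV-style
# wrapper so TreeExplainer can inspect the actual ensemble.
def unwrap_base_estimator(model):
    calibrated = getattr(model, "calibrated_classifiers_", None)
    if calibrated:
        first = calibrated[0]
        # newer scikit-learn exposes .estimator, older releases .base_estimator
        return getattr(first, "estimator", None) or getattr(first, "base_estimator", first)
    return model  # not calibrated: explain the model directly
```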
2. Pipeline Recovery and Feature Name Alignment
This is one of the most important engineering steps in the SHAP notebook, and frankly one of the most annoying parts of real ML work.
Tree models are often trained inside a preprocessing pipeline (for encoding, scaling, etc.). Once the data passes through that pipeline:
- the original columns become transformed columns
- categorical variables may become many one-hot encoded features
- feature names can become hard to recover
Your SHAP workflow explicitly solves this.
What the code does
- Detects whether the model is a pipeline (`prep` + `clf`)
- Applies the exact same `prep.transform(...)` used during training
- Converts sparse matrices to dense if needed
- Extracts transformed feature names via `get_feature_names_out()`
- Runs a dimensionality sanity check before SHAP
Example from the run
For the selected XGBoost on Trees setup:
- pipeline was detected
- preprocessing was applied
- feature names were successfully extracted
- SHAP matrix was aligned to 171 transformed features
That last part is crucial. If feature names and columns are misaligned, SHAP plots become decorative fiction.
Code Snippet — Pipeline Recovery for SHAP

```python
# If the bundle wraps a preprocessing pipeline, replay the exact
# training-time transform so SHAP sees the same feature space the trees saw.
if hasattr(base_model, "named_steps") and "prep" in base_model.named_steps:
    prep = base_model.named_steps["prep"]
    clf = base_model.named_steps["clf"]
    X_train_t = prep.transform(X_train)
    X_test_t = prep.transform(X_test)
    # One-hot encoders often emit sparse matrices; convert to dense for SHAP.
    if hasattr(X_train_t, "toarray"):
        X_train_t = X_train_t.toarray()
        X_test_t = X_test_t.toarray()
    feature_names = list(prep.get_feature_names_out())
else:
    # No pipeline: the model was trained directly on the raw frame.
    clf = base_model
    X_train_t = X_train.values
    X_test_t = X_test.values
    feature_names = list(X_train.columns)
```
Why readers should care
This is the plumbing that keeps the interpretation honest. Without it, a chart might say “feature_42” is important, which is about as useful as a treasure map that says “dig somewhere.”
3. Global SHAP Analysis (Top Drivers)
The global SHAP step computes feature contributions on a subsample of training data
(X_train_t[:300]) and summarizes average absolute impact. This gives the overall driver
ranking for the selected model.
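That aggregation step can be sketched in a few lines, assuming `shap_values` is the `(n_samples, n_features)` array returned by the explainer on `X_train_t[:300]` (the helper name is illustrative):

```python
import numpy as np

def rank_global_drivers(shap_values, feature_names, top_k=10):
    """Rank features by mean absolute SHAP value across the sample."""
    mean_abs = np.abs(np.asarray(shap_values)).mean(axis=0)
    order = np.argsort(mean_abs)[::-1][:top_k]  # strongest drivers first
    return [(feature_names[i], float(mean_abs[i])) for i in order]
```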
Selected global model in the run
- Model: XGBoost
- Dataset: Trees
- Explainer: `shap.TreeExplainer`
Top global drivers (mean |SHAP|)
From your run, the strongest global drivers included:
- `incident_severity_Major Damage` — 0.7272
- `insured_hobbies_chess` — 0.2851
- `insured_hobbies_cross-fit` — 0.2101
- `injury_share` — 0.0516
- `months_as_customer` — 0.0389
- `vehicle_age` — 0.0353
- `incident_state_VA` — 0.0280
- `insured_occupation_exec-managerial` — 0.0278
- `days_since_bind` — 0.0263
- `witnesses` — 0.0259
What this tells the reader
This global ranking shows a mix of:
- incident severity signal (very strong)
- claim composition / financial structure signal (e.g., `injury_share`)
- tenure/context variables (`months_as_customer`, `days_since_bind`)
- encoded categorical effects (occupation, state, collision type, hobbies, auto model)
It also immediately raises a governance question around hobby-related features, because they rank unusually high. That’s exactly the kind of thing SHAP is supposed to surface early, before anyone gets too attached to leaderboard metrics.
4. Dependence Plots (How Continuous Features Behave)
Global importance tells you which features matter. Dependence plots show how they matter across their value range.
Your workflow explicitly generated dependence plots for a list of continuous features (when present in the transformed feature set), including:
- `policy_annual_premium`
- `days_since_bind`
- `total_claim_amount`
- `months_as_customer`
- `capital-loss`
- `capital-gains`
- `age`
- `incident_hour_of_the_day`
- `hour_cos`
- `hour_sin`
- `vehicle_age`
Why these plots are useful
A dependence plot helps answer questions like:
- Does risk increase steadily with higher values?
- Is there a threshold effect?
- Does the effect flatten after a point?
- Does the feature only matter in a certain region?
This is much more useful than a simple correlation number, because tree models often learn step-like or curved effects.
Simple metaphor
A correlation is like asking, “Does the road generally go uphill?”
A dependence plot shows the actual road:
- where it climbs
- where it flattens
- where it suddenly turns weird
That “weird part” is often where fraud signal lives.
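A crude numeric stand-in for a dependence plot: bin the feature by quantile and average the SHAP values per bin. This is a sketch with illustrative names, not the notebook's plotting code:

```python
import numpy as np

def shap_profile(feature_values, feature_shap, n_bins=5):
    """Mean SHAP contribution per quantile bin of one feature
    (a text-mode stand-in for a dependence plot)."""
    edges = np.quantile(feature_values, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, feature_values, side="right") - 1,
                   0, n_bins - 1)
    return [float(feature_shap[bins == b].mean())
            for b in range(n_bins) if np.any(bins == b)]
```

A profile that climbs and then flattens is the "road" described above; a sudden jump between adjacent bins hints at a learned threshold.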
5. SHAP Interaction Analysis (Feature Synergies)
Fraud patterns often come from combinations, not single variables.
Your SHAP workflow includes a dedicated interaction analysis module that:
- computes `shap_interaction_values(...)` on a small sample (`X_train_t[:80]`)
- calculates mean absolute interaction strength
- removes self-interactions (diagonal)
- ranks the strongest feature pairs
This is exactly the right move for a fraud system, because suspicious patterns are often conditional.
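The pair-ranking step can be sketched as follows, assuming `inter_values` is the `(n_samples, n_features, n_features)` array returned by `shap_interaction_values(...)` (helper name illustrative):

```python
import numpy as np

def top_interaction_pairs(inter_values, feature_names, top_k=5):
    """Rank unordered feature pairs by mean |SHAP interaction|,
    with self-interactions (the diagonal) removed."""
    mean_abs = np.abs(np.asarray(inter_values)).mean(axis=0)
    np.fill_diagonal(mean_abs, 0.0)  # drop self-interactions
    n = mean_abs.shape[0]
    pairs = [(feature_names[i], feature_names[j], float(mean_abs[i, j]))
             for i in range(n) for j in range(i + 1, n)]  # each pair once
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:top_k]
```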
Top 5 interactions from your run (XGBoost on Trees)
- `insured_hobbies_chess` × `incident_severity_Major Damage` — 0.1039 (moderate joint effect)
- `insured_hobbies_cross-fit` × `incident_severity_Major Damage` — 0.0876
- `insured_hobbies_chess` × `insured_hobbies_cross-fit` — 0.0267
- `insured_occupation_handlers-cleaners` × `incident_severity_Major Damage` — 0.0200
- `capital-loss` × `incident_severity_Major Damage` — 0.0185
What stands out
incident_severity_Major Damage appears repeatedly in the top interactions.
That suggests the model is not treating severity as a simple isolated signal. It acts more like a context amplifier:
- severity + claim composition
- severity + tenure
- severity + category-level encoded traits
From a product perspective, this is useful because it points toward composite investigation heuristics, not just one-column alert rules. It also reinforces why tree models earned their place earlier in the project: they can learn these local combinations naturally.
6. Targeted Interaction Visuals (Insurance-Specific Pairs)
Beyond ranking interactions, your workflow also creates targeted interaction scatter plots for selected insurance-relevant feature pairs.
Examples included:
- `umbrella_limit` × `incident_severity_Major Damage`
- `total_claim_amount` × `incident_severity_Major Damage`
- `vehicle_age` × `incident_hour_of_the_day`
- `policy_annual_premium` × `months_as_customer`
- `policy_annual_premium` × `injury_share`
Why this is excellent
This turns abstract “interaction strength” into something a human can actually inspect.
It helps readers see patterns like:
- whether a feature matters only under major damage
- whether premium behaves differently for short vs long customers
- whether claim composition changes effect direction
These charts are especially useful for your website because they show actual model behavior without forcing people to read a wall of metrics.
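A numeric companion to these scatter plots: split one feature's SHAP values by a binary context flag (say, the one-hot `incident_severity_Major Damage` column) and compare the mean effect in each regime. A sketch with illustrative names:

```python
import numpy as np

def conditional_effect(shap_values, feat_idx, flag_values):
    """Mean |SHAP| of one feature with the context flag on vs. off,
    e.g. does umbrella_limit only matter under Major Damage?"""
    flag = np.asarray(flag_values).astype(bool)
    on = float(np.abs(shap_values[flag, feat_idx]).mean()) if flag.any() else 0.0
    off = float(np.abs(shap_values[~flag, feat_idx]).mean()) if (~flag).any() else 0.0
    return on, off
```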
7. Case-Level SHAP (Waterfall Explanations)
Global analysis is great for trust. Case-level explanations are where SHAP becomes operational.
Your notebook includes two case-level workflows:
A) Waterfall plots (first case-level module)
- Selects fraud cases in the test set where:
  - predicted probability ≥ 0.5
  - true label = fraud
- Plots SHAP waterfall explanations for the selected cases
B) Extended waterfall run (dedicated block)
A second run explicitly selects:
- Dataset: Trees
- Model: ExtraTrees
- Threshold: 0.5
- Cases shown: up to 10 fraud cases
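Both runs boil down to the same selection filter. A sketch, with illustrative array names:

```python
import numpy as np

def select_flagged_fraud_cases(proba, y_true, threshold=0.5, max_cases=10):
    """Indices of test cases that are true fraud and scored at or above
    the probability threshold: the claims worth a waterfall plot."""
    mask = (np.asarray(proba) >= threshold) & (np.asarray(y_true) == 1)
    return np.flatnonzero(mask)[:max_cases]
```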
What a waterfall plot shows (simple explanation)
A waterfall plot starts from the model’s baseline risk (average prediction level), then stacks feature contributions:
- some push risk up
- some push risk down
The final value becomes the claim’s score. It’s basically a receipt for one prediction. That is exactly the format investigators and auditors tend to understand fastest.
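The "receipt" framing can be made literal in a few lines, assuming `base_value` is the explainer's expected value and `shap_row` is one case's SHAP vector (names illustrative):

```python
import numpy as np

def prediction_receipt(base_value, shap_row, feature_names, top_k=5):
    """Text waterfall for one claim: baseline, biggest pushes, final score."""
    order = np.argsort(np.abs(shap_row))[::-1][:top_k]
    lines = [f"baseline        {base_value:+.3f}"]
    lines += [f"{feature_names[i]:<15} {shap_row[i]:+.3f}" for i in order]
    lines.append(f"final           {base_value + float(np.sum(shap_row)):+.3f}")
    return lines
```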
8. Force Plots (Interactive Explanations)
Your workflow also generates SHAP force plots (JS mode) for the selected fraud cases.
A force plot is like a compact horizontal tug-of-war:
- one side pushes the score up
- the other side pushes it down
You also limited the plot to the top 15 features by absolute SHAP value for readability, which is a smart move. Force plots drawn over all 171 transformed features become visual soup very quickly.
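That readability trick is just an argsort applied before the force plot call. A sketch with illustrative names:

```python
import numpy as np

def trim_for_force_plot(shap_row, feature_row, feature_names, k=15):
    """Keep only the k strongest contributions so the force plot stays legible."""
    idx = np.argsort(np.abs(shap_row))[::-1][:k]
    return shap_row[idx], feature_row[idx], [feature_names[i] for i in idx]
```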
Why this matters for your website
Force plots are great for an interactive version of the project:
- they feel intuitive
- they work well as “open this case → inspect drivers”
- they support the PM framing of decision support, not just model scoring
9. What This SHAP Stage Adds to the System (Product Perspective)
This stage turns the fraud model into something a real team can reason about.
A) It supports model validation beyond metrics
Metrics tell you whether the model performs well. SHAP shows whether the model is using sensible logic or weird shortcuts. That is a very different question.
B) It exposes proxy-risk behavior early
The strong appearance of hobby features in global and interaction analysis is exactly the kind of thing that should trigger review. Even when performance looks good, these patterns may be unstable, socially problematic, or hard to justify in production. SHAP helps surface that before deployment.
C) It enables case review workflows
Waterfalls and force plots are useful for:
- QA
- model debugging
- analyst training
- stakeholder demos
- documentation
You can show one claim and explain the score without hand-waving.
D) It strengthens the “AI PM” angle
This stage is not about “nice plots.” It is about making the decision system:
- inspectable
- debuggable
- governable
That is exactly what mature AI product work looks like.
Key Takeaway
The SHAP stage opens the fraud model and shows how risk scores are built, feature by feature.
In this project, the interpretability workflow:
- correctly restores transformed feature names from the preprocessing pipeline
- computes global SHAP drivers on the tree-optimized feature representation
- inspects continuous feature behavior with dependence plots
- surfaces feature synergies through SHAP interaction analysis
- generates case-level waterfall and force explanations for flagged fraud claims
The global results show a strong role for incident severity, claim composition, and contextual features, while also surfacing potentially problematic proxy-like signals (such as hobby-related features) that deserve governance review.
That combination is exactly what interpretability is for: understanding the model’s strengths, and catching the parts that need adult supervision.