Purpose

This report documents the tree-model optimization stage of the fraud detection system.

By this point, the linear baseline had already been tuned and gave a useful, interpretable reference. It also showed a practical ceiling: fraud patterns in this dataset depend on interactions and local combinations, so performance starts to plateau when a model relies on one global decision surface.

Tree-based models were introduced to handle that structure more naturally. They can split the data into local regions and capture combinations such as:

  • short policy tenure + severe incident
  • missing documentation + unusual claim composition
  • category combinations that become risky only in certain contexts

A simple way to picture it:

  • Linear model → one long ruler across the page
  • Tree model → branching checklist with follow-up questions
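The contrast can be sketched with hypothetical tenure/severity features (not the project's real columns): a shallow tree isolates the "short tenure + severe incident" corner, while a single linear surface has to compromise across the whole plane.

```python
# Hypothetical features: fraud occurs only when short tenure AND severe incident.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
tenure = rng.uniform(0, 10, 1000)    # policy tenure, years
severity = rng.uniform(0, 5, 1000)   # incident severity score
X = np.column_stack([tenure, severity])
y = ((tenure < 2) & (severity > 3)).astype(int)  # fraud lives in one corner

# A depth-2 "branching checklist" can carve out the corner...
tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)
# ...while one global linear boundary cannot isolate it.
logit = LogisticRegression().fit(X, y)
print(tree.score(X, y), logit.score(X, y))
```

In this sketch the tree recovers the interaction almost perfectly; the linear model can only approximate the corner with one straight boundary.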

This stage also compares two dataset representations because model families “see” data differently:

  • Original dataset (50 features) — broader representation
  • Trees dataset (42 features) — tree-optimized representation

The central question is straightforward:

“Which tree model and feature representation produce the strongest fraud recall under operational constraints?”
. . .

1. Why Trees Were Trained Separately from Linear Models

Different model families need different kinds of feature representation. This is a design choice, not a cosmetic one.

How linear models read data

Linear models work best when the signal has already been translated into:

  • binary indicators
  • buckets
  • explicit threshold flags
  • one-hot encoded categories

That is why the earlier linear pipeline used more manual shaping.
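As a sketch of that manual shaping (column names are illustrative, not the project's actual schema), buckets and one-hot columns can be produced in a single transformer:

```python
# Illustrative shaping for a linear model: bucket a numeric column and
# one-hot a categorical one so each range/level gets its own coefficient.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

claims = pd.DataFrame({
    "months_as_customer": [3, 48, 120, 9],
    "incident_severity": ["Minor", "Major", "Total Loss", "Major"],
})

shaping = ColumnTransformer([
    # coarse tenure buckets a linear model can weight independently
    ("tenure_buckets",
     KBinsDiscretizer(n_bins=3, encode="onehot-dense", strategy="quantile"),
     ["months_as_customer"]),
    # one column per severity level
    ("severity_ohe",
     OneHotEncoder(handle_unknown="ignore"),
     ["incident_severity"]),
])

Xt = shaping.fit_transform(claims)
print(Xt.shape)  # 3 tenure buckets + 3 severity levels = 6 columns
```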

How tree models read data

Tree models can discover thresholds and interactions on their own. They benefit from richer structure and usually need less “flattening.” In some cases, too much bucketing removes useful detail.

That is why this project keeps a separate tree pathway and compares two representations directly.

Dataset setup used in this stage

  Dataset Version   Train Shape   Test Shape   Notes
  Original          (800, 50)     (200, 50)    Broader feature set
  Trees             (800, 42)     (200, 42)    Tree-optimized representation

Both splits were stratified, and the training data preserved the same fraud imbalance pattern (about 3.04 non-fraud per 1 fraud), so the comparison stayed fair.
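The stratified split described above can be sketched as follows (synthetic stand-in data; the real datasets carry 50 and 42 features):

```python
# Stratified split: the fraud rate is preserved in both train and test.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
y = (rng.uniform(size=1000) < 0.25).astype(int)  # roughly 3:1 non-fraud to fraud

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
# Stratification keeps the fraud rate nearly identical across the splits.
print(y_tr.mean(), y_te.mean())
```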

. . .

2. Optimization Strategy

All candidate tree-family models were tuned using the same process, so the comparison reflects model behavior rather than inconsistent setup.

Candidate models screened and tuned

  Model Family                  Why Included
  RandomForest                  Strong tabular baseline, captures interactions
  XGBoost                       Powerful boosting model, often excellent on structured data
  ExtraTrees                    High-randomness tree ensemble, often robust on noisy tabular data
  AdaBoost                      Sequential boosting with simpler learners
  Bagging (DecisionTree base)   Ensemble stability benchmark

Training and tuning setup

  Component             Configuration
  Cross-validation      5-fold StratifiedKFold
  Search method         RandomizedSearchCV
  Refit metric          F2
  Additional tracking   PR-AUC
  Objective priority    Recall-weighted fraud triage performance

Why F2?

F2 gives more weight to recall than precision. In fraud triage:

  • missing a fraud case (false negative) usually costs more
  • reviewing one extra legitimate claim (false positive) is still costly, but often less severe

That makes F2 a practical optimization target for this stage.
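A small worked example makes the weighting concrete: with recall at 0.5 and precision at about 0.67, F2 comes out below F1, because F2 punishes the missed frauds harder.

```python
# Numeric illustration of F2 vs F1 on a toy confusion pattern.
from sklearn.metrics import fbeta_score

# 4 true frauds: this classifier misses 2 (low recall) with 1 false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

f1 = fbeta_score(y_true, y_pred, beta=1)  # precision and recall weighted equally
f2 = fbeta_score(y_true, y_pred, beta=2)  # recall weighted 4x precision
print(f1, f2)  # F2 < F1 here because recall (0.50) lags precision (0.67)
```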

# Code Snippet — Tree Tuning Pattern (Conceptual)
# candidate_tree_model, param_grid, X_train, and y_train are placeholders
# supplied elsewhere in the pipeline.
from sklearn.model_selection import StratifiedKFold, RandomizedSearchCV
from sklearn.metrics import make_scorer, fbeta_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# F2 weights recall above precision (beta=2)
f2_scorer = make_scorer(fbeta_score, beta=2)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

pipe = Pipeline([
    # Trees are scale-invariant; the scaler is kept for pipeline consistency
    ("scaler", RobustScaler()),
    ("model", candidate_tree_model)
])

search = RandomizedSearchCV(
    estimator=pipe,
    param_distributions=param_grid,
    n_iter=30,
    scoring={"f2": f2_scorer, "pr_auc": "average_precision"},
    refit="f2",
    cv=cv,
    random_state=42,
    n_jobs=-1
)

search.fit(X_train, y_train)
best_model = search.best_estimator_

Why this matters: The model is tuned using the metric that matches the business problem, and the pipeline keeps preprocessing inside CV folds so evaluation stays clean.

. . .

3. Cross-Validation Results

Cross-validation is the first real stress test. It checks whether a model performs consistently before the held-out test set gets involved.

“Tree-optimized feature representation improves cross-validated F2 for every tree-family model.”

What this means

The tree-optimized representation improves CV F2 across every model family, and the gains are consistent rather than marginal. The feature representation is changing how well trees can split the risk space.

That’s the useful lesson here: feature engineering changes the “terrain” the model navigates.
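The comparison pattern looks like this sketch (synthetic stand-ins for the two representations; the real comparison used the tuned pipelines):

```python
# Same model family, same F2 scorer, two feature representations.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, fbeta_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

f2 = make_scorer(fbeta_score, beta=2)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

X_orig, y = make_classification(
    n_samples=800, n_features=50, weights=[0.75], random_state=0
)
X_trees = X_orig[:, :42]  # stand-in for the 42-feature tree-optimized set

for name, X in [("original", X_orig), ("trees", X_trees)]:
    scores = cross_val_score(
        RandomForestClassifier(random_state=42), X, y, scoring=f2, cv=cv
    )
    print(f"{name}: mean CV F2 = {scores.mean():.3f}")
```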

. . .

4. Best Hyperparameter Snapshot (Trees Dataset)

Hyperparameters tell you how each model prefers to behave on this fraud problem. The values below are the best-performing settings from CV on the Trees dataset.

“Top-performing tree models converged on fraud-aware settings: class weighting, controlled depth, and regularization.”

Plain-English interpretation

  • Class weighting / positive-class weighting appears repeatedly. That’s the models being told: “Fraud is rarer, pay attention.”
  • Depth limits + min_samples_leaf show up in strong models. That helps prevent memorizing noise.
  • XGBoost prefers a very low learning rate here, which means slower, more careful fitting.
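Search spaces that encode those fraud-aware preferences might look like this sketch (ranges are illustrative, not the project's exact values):

```python
# Illustrative param_distributions for RandomizedSearchCV, using the
# "model__" prefix because the estimator sits inside a Pipeline.
from scipy.stats import randint, uniform

rf_params = {
    "model__class_weight": ["balanced", "balanced_subsample"],  # "fraud is rarer, pay attention"
    "model__max_depth": randint(4, 16),                         # controlled depth
    "model__min_samples_leaf": randint(2, 20),                  # avoid memorizing noise
    "model__n_estimators": randint(200, 600),
}

xgb_params = {
    "model__scale_pos_weight": uniform(2.0, 4.0),  # positive-class weighting
    "model__learning_rate": uniform(0.01, 0.09),   # low rate: slower, more careful fitting
    "model__max_depth": randint(3, 8),
    "model__reg_lambda": uniform(0.5, 4.5),        # L2 regularization
}
```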
. . .

5. Test-Set Results (Threshold = 0.5)

After tuning, the best models were evaluated on the held-out test set using the standard threshold of 0.5 for a clean comparison.

“ExtraTrees achieved the strongest default-threshold F2 on the held-out test set, with high recall for fraud triage.”

What stands out

  • XGBoost had the strongest CV F2 during tuning.
  • ExtraTrees delivered the strongest test-set F2 at threshold 0.5.

That’s normal. Cross-validation and test evaluation often shuffle the order slightly. CV is the rehearsal; test is opening night.
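Default-threshold evaluation can be written as an explicit step, as in this sketch (model and data are synthetic stand-ins; the real run used the tuned artifacts and the held-out test set):

```python
# Score fraud probabilities, then apply an explicit decision threshold.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split

def evaluate_at_threshold(model, X_test, y_test, threshold=0.5):
    """Convert fraud probabilities to flags at a threshold, then score F2."""
    proba = model.predict_proba(X_test)[:, 1]
    y_pred = (proba >= threshold).astype(int)
    return fbeta_score(y_test, y_pred, beta=2)

# Demo on synthetic stand-in data
X, y = make_classification(n_samples=1000, weights=[0.75], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)
model = ExtraTreesClassifier(random_state=42).fit(X_tr, y_tr)
print(evaluate_at_threshold(model, X_te, y_te, threshold=0.5))
```

Making the threshold an explicit argument keeps the 0.5 comparison honest and sets up the later thresholding stage, where that value becomes a tunable operating point.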

. . .

6. Output Artifacts for Next Stages

The tuned tree models were saved as uncalibrated artifacts for later stages. That separation is intentional and useful:

  Stage                      Question It Answers
  Training / Tuning          Which model family and settings perform best?
  Threshold Design           How many claims get flagged, and how many frauds are caught?
  Calibration (next)         Do the predicted probabilities behave honestly?
  Interpretability (later)   Which features drove the score?

Keeping these stages separate makes the system easier to audit, explain, and maintain.
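Persisting the uncalibrated artifact is straightforward with joblib, as in this sketch (path and model are illustrative stand-ins for the tuned estimator):

```python
# Save the tuned model as an uncalibrated artifact; calibration happens later.
import os
import tempfile

import joblib
from sklearn.ensemble import ExtraTreesClassifier

best_model = ExtraTreesClassifier(random_state=42)  # stand-in for the tuned estimator
path = os.path.join(tempfile.mkdtemp(), "extratrees_uncalibrated.joblib")
joblib.dump(best_model, path)

# Later stages (thresholding, calibration) reload the exact same artifact.
loaded = joblib.load(path)
print(type(loaded).__name__)
```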

[Figure: minimalist abstract sequence showing six geometric stages of a machine learning model pipeline]
“Tree optimization selects candidate models. Calibration and interpretability are handled as separate audit steps.”
. . .

Key Takeaway

Tree optimization is the point where the fraud pipeline becomes operationally strong.

Using separate pathways for linear and tree models was the right design decision because the model families learn differently. The tree-optimized feature representation improved cross-validated F2 across all tree families and produced stronger test performance overall.

At the default threshold, ExtraTrees delivered the strongest held-out F2 with high recall. This uncalibrated ranking artifact will be passed directly into the thresholding and calibration stages.

In short: the modeling got stronger, and the candidate models are ready for operational evaluation.