The objective was not improving model accuracy.

The objective was engineering maturity:

  • Deterministic preprocessing
  • Reproducible training
  • Clear separation of prediction vs decision
  • Deployment-ready inference contract
  • Auditability

This transforms the project from “notebook experiment” into “production system component.”

. . .

System Overview

High-Level Architecture

Raw Claim Data
      ↓
Schema Validation
      ↓
Preprocessing Layer
      ↓
Feature Engineering
      ↓
Model Inference
      ↓
Calibration (optional)
      ↓
Threshold Policy
      ↓
Structured Decision Output

Key principle:

“Prediction is not a decision. The model outputs probabilities. The system applies policy.”
. . .

Core Design Decisions

Separation of Concerns

Layer              Responsibility                    Why
Preprocessing      Data cleaning & transformations   Prevents training/inference drift
Model              Probability estimation            Statistical task only
Calibration        Probability reliability           Mode-specific deployment
Threshold Policy   Business decision logic           Governance & flexibility

This prevents:

  • Threshold baked into model weights
  • Calibration hard-coded
  • Hidden feature mismatch
. . .

Pipeline Components

1. Input Schema Validation

Before any transformation:

  • Required columns checked
  • Data types validated
  • Feature order enforced
  • Unknown columns flagged
  • Missing fields handled explicitly

Failure behavior:

  • Hard error
  • No silent fallback

Why this matters:

Most production ML failures are not model errors. They are schema drift errors.
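
The validation rules above can be sketched as a small gate function. Column names and types here are hypothetical; the real required set would come from the feature metadata:

```python
# Illustrative schema check; column names and types are hypothetical.
REQUIRED_COLUMNS = {
    "claim_id": str,
    "claim_amount": float,
    "claim_hour": int,
}

def validate_claim_schema(record: dict) -> dict:
    """Validate one raw claim record. Hard-errors on any mismatch."""
    missing = set(REQUIRED_COLUMNS) - set(record)
    if missing:
        raise ValueError(f"Missing required fields: {sorted(missing)}")
    unknown = set(record) - set(REQUIRED_COLUMNS)
    if unknown:
        raise ValueError(f"Unknown fields (possible schema drift): {sorted(unknown)}")
    for field, expected in REQUIRED_COLUMNS.items():
        if not isinstance(record[field], expected):
            raise TypeError(
                f"{field}: expected {expected.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    # Enforce a canonical feature order for downstream layers.
    return {field: record[field] for field in REQUIRED_COLUMNS}
```

Note the failure behavior: exceptions, never silent coercion.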

. . .

2. Preprocessing Layer

Unified transformation module used in:

  • Training
  • Inference
  • Batch scoring
  • API scoring

Includes:

  • Missing value imputation
  • Categorical encoding
  • Cyclical time encoding (hour → sin/cos)
  • Missing flags
  • Ratio normalization
  • Scaling (linear model path only)

Visual: Shared Pipeline

                 ┌───────────────────┐
                 │  Base Transforms  │
                 └─────────┬─────────┘
                           │
           ┌───────────────┴───────────────┐
           │                               │
   Linear Model Path                Tree Model Path
   (Scaling applied)                 (No scaling)

Shared logic reduces duplicated feature code.
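
A minimal sketch of the shared-base / branched-path idea, including the cyclical hour encoding mentioned above. The scaling constant and field names are assumptions for illustration; in practice the scaler is fitted during training:

```python
import math

def encode_hour(hour: int) -> tuple:
    """Cyclical encoding: hour 0-23 mapped onto the unit circle (sin, cos)."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)

def base_transforms(record: dict) -> dict:
    """Transforms shared by both model paths."""
    out = dict(record)
    # Missing flag plus explicit imputation, never silent NaN handling.
    out["amount_missing"] = record.get("claim_amount") is None
    out["claim_amount"] = record.get("claim_amount") or 0.0
    out["hour_sin"], out["hour_cos"] = encode_hour(record["claim_hour"])
    return out

def preprocess(record: dict, model_path: str = "tree") -> dict:
    """Apply shared transforms, then path-specific scaling."""
    out = base_transforms(record)
    if model_path == "linear":
        # Hypothetical fixed divisor; a real scaler is fitted at training time.
        out["claim_amount"] = out["claim_amount"] / 10_000.0
    return out
```

Because both paths call `base_transforms`, the shared feature logic cannot drift between them.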

. . .

3. Feature Engineering Consolidation

All feature logic extracted into configuration:

feature_config.json

Contains:

  • Derived features
  • Ratio logic
  • Categorical mapping rules
  • Feature ordering

No feature engineering lives in notebooks anymore.

This reduces:

  • Hidden transformations
  • Experiment drift
  • Inconsistent inference behavior
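
A sketch of what consuming such a config could look like. The JSON shape, feature names, and mapping values here are invented for illustration, not the actual contents of feature_config.json:

```python
import json

# Hypothetical example of the feature_config.json shape.
FEATURE_CONFIG = json.loads("""
{
  "derived_features": {
    "amount_per_item": {"numerator": "claim_amount", "denominator": "item_count"}
  },
  "categorical_maps": {
    "channel": {"web": 0, "phone": 1, "agent": 2}
  },
  "feature_order": ["claim_amount", "item_count", "amount_per_item", "channel"]
}
""")

def apply_feature_config(record: dict, config: dict) -> list:
    """Build the ordered feature vector described entirely by the config."""
    feats = dict(record)
    for name, ratio in config["derived_features"].items():
        denom = feats[ratio["denominator"]] or 1  # guard divide-by-zero
        feats[name] = feats[ratio["numerator"]] / denom
    for col, mapping in config["categorical_maps"].items():
        feats[col] = mapping.get(feats[col], -1)  # -1 = unseen category
    return [feats[name] for name in config["feature_order"]]
```

Training and inference both call the same function with the same config file, so feature ordering and derivation cannot diverge.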
. . .

4. Model Training Wrapper

Encapsulated training function:

train_pipeline(config)

Outputs:

  • model.pkl
  • calibration.pkl (optional)
  • feature metadata
  • training summary JSON

All runs log:

  • Model version
  • Feature config hash
  • Hyperparameters
  • Evaluation metrics
  • Threshold config

Reproducing any training run becomes deterministic.
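
The run-logging side of the wrapper can be sketched as follows. The config keys, the `train_fn` callback, and artifact writing are simplified assumptions; the point is the hashed, versioned summary:

```python
import hashlib
import json
import time

def train_pipeline(config: dict, train_fn) -> dict:
    """Hypothetical training wrapper: trains via train_fn, returns a run summary.

    Artifact serialization (model.pkl, calibration.pkl) is omitted here;
    the sketch focuses on the metadata every run must log.
    """
    model, metrics = train_fn(config["hyperparameters"])
    feature_config_hash = hashlib.sha256(
        json.dumps(config["features"], sort_keys=True).encode()
    ).hexdigest()[:12]
    return {
        "model_version": config["model_version"],
        "feature_config_hash": feature_config_hash,
        "hyperparameters": config["hyperparameters"],
        "evaluation_metrics": metrics,
        "threshold_config": config["threshold_config"],
        "trained_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
```

Hashing the feature config (sorted keys, so ordering of the JSON file does not matter) ties every model artifact to the exact feature logic that produced it.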

. . .

Calibration Strategy

Mode                Purpose           Uses Calibration
Dashboard Mode      Risk ranking      Yes
Auto-Flagger Mode   Binary decision   Optional

Why?

Calibration is deployment-environment specific. It must not be fused into model weights.
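
To make the isolation concrete, here is a toy histogram-binning calibrator kept entirely outside the model object. This is a sketch only; production systems typically use isotonic or Platt scaling, and nothing here claims to be the project's actual implementation:

```python
from bisect import bisect_right

class BinnedCalibrator:
    """Toy histogram-binning calibrator, stored separately from model weights.

    Maps raw model scores to the empirical positive rate observed
    in each score bin on a held-out calibration set.
    """

    def __init__(self, n_bins: int = 10):
        self.edges = [i / n_bins for i in range(1, n_bins)]
        self.bin_rates = [0.0] * n_bins

    def fit(self, scores, labels):
        sums = [0.0] * len(self.bin_rates)
        counts = [0] * len(self.bin_rates)
        for s, y in zip(scores, labels):
            b = bisect_right(self.edges, s)
            sums[b] += y
            counts[b] += 1
        self.bin_rates = [
            (sums[i] / counts[i]) if counts[i] else 0.0
            for i in range(len(counts))
        ]
        return self

    def calibrate(self, score: float) -> float:
        """Return the calibrated probability for a raw score."""
        return self.bin_rates[bisect_right(self.edges, score)]
```

Because the calibrator is a separate artifact (the `calibration.pkl` of the training outputs), Dashboard Mode can load it while Auto-Flagger Mode skips it, with no change to the model.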

. . .

Threshold Policy Layer

Threshold logic isolated in:

threshold_policy.py

Inputs:

  • Fraud probability
  • Capacity pressure (optional)
  • Risk appetite mode

Outputs:

  • Decision label
  • Risk tier (P0–P3)
  • Explanation metadata

Visual: Prediction vs Policy

Model → 0.82 probability
        ↓
Threshold Policy (Aggressive Mode)
        ↓
Decision: FLAG (P0)

The model predicts. The policy decides. This separation is essential for governance.
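
A minimal sketch of such a policy layer. The threshold values and tier edges below are invented placeholders; in the real system they would live in versioned policy configuration:

```python
# Illustrative thresholds; real values belong in versioned policy config.
MODE_THRESHOLDS = {
    "conservative": 0.90,
    "default": 0.70,
    "aggressive": 0.60,
}
TIER_EDGES = [(0.80, "P0"), (0.60, "P1"), (0.40, "P2")]  # else P3

def apply_threshold_policy(probability: float, mode: str = "default") -> dict:
    """Turn a (calibrated) probability into a governed decision."""
    threshold = MODE_THRESHOLDS[mode]
    tier = next((t for edge, t in TIER_EDGES if probability >= edge), "P3")
    return {
        "decision": "flag" if probability >= threshold else "pass",
        "risk_tier": tier,
        "explanation": {
            "mode": mode,
            "threshold": threshold,
            "probability": probability,
        },
    }
```

Changing risk appetite is now a config change with an audit trail, not a model retrain.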

. . .

Output Contract

Standardized prediction schema:

{
  "claim_id": "string",
  "fraud_probability": 0.82,
  "calibrated_probability": 0.79,
  "decision": "flag",
  "risk_tier": "P0",
  "model_version": "v1.3.2",
  "timestamp": "ISO-8601"
}

Why structured outputs matter:

  • Downstream API stability
  • Batch job consistency
  • Audit trails
  • Explainability traceability
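
One way to enforce the contract is to build the record in exactly one place. A sketch using the field names from the schema above (the builder function itself is hypothetical):

```python
import datetime

def build_output(claim_id: str, raw_p: float, cal_p: float,
                 decision: str, tier: str, model_version: str) -> dict:
    """Assemble the standardized prediction record; field names follow the contract."""
    return {
        "claim_id": claim_id,
        "fraud_probability": round(raw_p, 4),
        "calibrated_probability": round(cal_p, 4),
        "decision": decision,
        "risk_tier": tier,
        "model_version": model_version,
        "timestamp": datetime.datetime.now(datetime.timezone.utc)
                             .strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Every consumer, whether API, batch job, or audit log, sees the same keys in the same shape.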
. . .

Testing Strategy

Implemented:

✔ Unit Tests

  • Preprocessing transformations
  • Schema validation
  • Feature generation stability

✔ Golden Row Regression Test

  • Fixed input
  • Fixed expected output
  • Detects unintended behavior changes

This prevents silent pipeline regressions.
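
The golden-row idea reduces to a frozen input paired with a frozen expected output. The toy pipeline below stands in for the real preprocessing code purely to show the test shape:

```python
# Frozen fixture: any diff in the output signals an unintended pipeline change.
GOLDEN_INPUT = {"claim_amount": 5000.0, "claim_hour": 6}
GOLDEN_EXPECTED = {"claim_amount": 5000.0, "claim_hour": 6,
                   "hour_bucket": "morning"}

def toy_pipeline(record: dict) -> dict:
    """Stand-in for the real preprocessing pipeline (illustrative logic only)."""
    out = dict(record)
    out["hour_bucket"] = "morning" if 5 <= record["claim_hour"] < 12 else "other"
    return out

def test_golden_row():
    """Fails the build if pipeline behavior changes without updating the fixture."""
    assert toy_pipeline(GOLDEN_INPUT) == GOLDEN_EXPECTED
```

When a behavior change is intentional, the fixture is updated in the same commit, so the diff is explicit and reviewable.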

. . .

Before vs After Consolidation

Before                     After
Notebook-dependent         Reusable pipeline module
Manual threshold tweaks    External threshold policy
Feature drift risk         Shared transform layer
Hard to audit              Versioned artifacts
Model-centric              Decision-system ready
. . .

Production Readiness Status

  • ✔ Deterministic preprocessing
  • ✔ Shared train/inference pipeline
  • ✔ Config-driven features
  • ✔ Calibration isolated
  • ✔ Policy isolated
  • ✔ Structured output contract
  • ✔ Version logging
  • ✔ Test coverage

Future extensions:

  • Data drift monitoring
  • Retraining triggers
  • SHAP precomputation layer
. . .

Strategic Outcome

The fraud detection system is now:

  • Portable
  • Deployable
  • Auditable
  • Governable
  • Extendable

This consolidation transforms a modeling project into a production-grade ML component suitable for regulated environments.