The objective was not improving model accuracy.
The objective was engineering maturity:
- Deterministic preprocessing
- Reproducible training
- Clear separation of prediction vs decision
- Deployment-ready inference contract
- Auditability
This transforms the project from “notebook experiment” into “production system component.”
System Overview
High-Level Architecture
```
Raw Claim Data
      ↓
Schema Validation
      ↓
Preprocessing Layer
      ↓
Feature Engineering
      ↓
Model Inference
      ↓
Calibration (optional)
      ↓
Threshold Policy
      ↓
Structured Decision Output
```
Key principle: each stage has a single responsibility and can be replaced independently. The model estimates probabilities; a separate policy layer turns them into decisions.
Core Design Decisions
Separation of Concerns
| Layer | Responsibility | Why |
|---|---|---|
| Preprocessing | Data cleaning & transformations | Prevent training/inference drift |
| Model | Probability estimation | Statistical task only |
| Calibration | Probability reliability | Depends on deployment mode |
| Threshold Policy | Business decision logic | Governance & flexibility |
This prevents:
- Thresholds baked into model weights
- Hard-coded calibration
- Hidden feature mismatches
Pipeline Components
1. Input Schema Validation
Before any transformation:
- Required columns checked
- Data types validated
- Feature order enforced
- Unknown columns flagged
- Missing fields handled explicitly
Failure behavior:
- Hard error
- No silent fallback
Why this matters:
Most production ML failures are not model errors. They are schema drift errors.
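A minimal sketch of such a validation gate, assuming a pandas input frame. The column names, dtypes, and the `SchemaError` type are illustrative, not taken from the actual codebase:
```python
import pandas as pd

# Illustrative schema: required columns and their expected dtypes.
REQUIRED_COLUMNS: dict[str, str] = {
    "claim_id": "object",
    "claim_amount": "float64",
    "claim_hour": "int64",
}

class SchemaError(ValueError):
    """Raised on any schema violation -- hard error, no silent fallback."""

def validate_schema(df: pd.DataFrame) -> pd.DataFrame:
    missing = set(REQUIRED_COLUMNS) - set(df.columns)
    if missing:
        raise SchemaError(f"Missing required columns: {sorted(missing)}")

    unknown = set(df.columns) - set(REQUIRED_COLUMNS)
    if unknown:
        raise SchemaError(f"Unknown columns: {sorted(unknown)}")

    for col, dtype in REQUIRED_COLUMNS.items():
        if str(df[col].dtype) != dtype:
            raise SchemaError(f"{col}: expected {dtype}, got {df[col].dtype}")

    # NaNs in required fields are left for the preprocessing layer,
    # which imputes them explicitly rather than silently here.
    # Re-indexing to the canonical column list enforces feature order.
    return df[list(REQUIRED_COLUMNS)]
```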
2. Preprocessing Layer
Unified transformation module used in:
- Training
- Inference
- Batch scoring
- API scoring
Includes:
- Missing value imputation
- Categorical encoding
- Cyclical time encoding (hour → sin/cos)
- Missing flags
- Ratio normalization
- Scaling (linear model path only)
Visual: Shared Pipeline
```
        ┌───────────────────┐
        │  Base Transforms  │
        └─────────┬─────────┘
                  │
        ┌─────────┴─────────┐
        │                   │
  Linear Model Path   Tree Model Path
  (Scaling applied)    (No scaling)
```
Shared logic reduces duplicated feature code.
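A condensed sketch of the branching, assuming pandas/NumPy; function and column names are illustrative:
```python
import numpy as np
import pandas as pd

def base_transforms(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Missing flag first, so imputation below doesn't erase the signal.
    out["claim_amount_missing"] = out["claim_amount"].isna().astype(int)
    out["claim_amount"] = out["claim_amount"].fillna(out["claim_amount"].median())
    # Cyclical time encoding: hour -> sin/cos keeps 23:00 adjacent to 00:00.
    hour = out.pop("claim_hour")
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    return out

def linear_path(df: pd.DataFrame, means: pd.Series, stds: pd.Series) -> pd.DataFrame:
    # Scaling statistics are fit once at training time and persisted;
    # reusing them here is what prevents train/inference drift.
    out = base_transforms(df)
    cols = means.index
    out[cols] = (out[cols] - means) / stds
    return out

def tree_path(df: pd.DataFrame) -> pd.DataFrame:
    return base_transforms(df)  # tree models are invariant to monotone scaling
```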
3. Feature Engineering Consolidation
All feature logic extracted into configuration:
feature_config.json
Contains:
- Derived features
- Ratio logic
- Categorical mapping rules
- Feature ordering
No feature engineering lives in notebooks anymore.
This reduces:
- Hidden transformations
- Experiment drift
- Inconsistent inference behavior
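The write-up doesn't show the file's contents; purely as an illustration, the four categories above might be laid out like this:
```json
{
  "feature_order": ["claim_amount", "amount_income_ratio", "hour_sin", "hour_cos"],
  "derived_features": {
    "amount_income_ratio": {"numerator": "claim_amount", "denominator": "annual_income"}
  },
  "categorical_mappings": {
    "claim_type": {"auto": 0, "property": 1, "other": 2}
  }
}
```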
4. Model Training Wrapper
Encapsulated training function:
train_pipeline(config)
Outputs:
- model.pkl
- calibration.pkl (optional)
- feature metadata
- training summary JSON
All runs log:
- Model version
- Feature config hash
- Hyperparameters
- Evaluation metrics
- Threshold config
Every run can be reproduced exactly from its logged configuration and artifacts.
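A sketch of what `train_pipeline(config)` could look like inside, assuming scikit-learn; the model class, config keys, and file layout are assumptions, not the documented API:
```python
import hashlib
import json
import pickle
from pathlib import Path

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def train_pipeline(config: dict, X, y, out_dir: str = "artifacts") -> dict:
    """Train, persist artifacts, and write the training summary JSON."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Hash the feature config so every run is traceable to the exact
    # feature definition it was trained with.
    feature_json = json.dumps(config["features"], sort_keys=True)
    feature_hash = hashlib.sha256(feature_json.encode()).hexdigest()[:12]

    model = LogisticRegression(**config["hyperparameters"]).fit(X, y)
    auc = roc_auc_score(y, model.predict_proba(X)[:, 1])  # real runs: held-out split

    with (out / "model.pkl").open("wb") as f:
        pickle.dump(model, f)

    summary = {
        "model_version": config["model_version"],
        "feature_config_hash": feature_hash,
        "hyperparameters": config["hyperparameters"],
        "metrics": {"roc_auc": auc},
        "threshold_config": config["threshold"],
    }
    (out / "training_summary.json").write_text(json.dumps(summary, indent=2))
    return summary
```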
Calibration Strategy
| Mode | Purpose | Uses Calibration |
|---|---|---|
| Dashboard Mode | Risk ranking | Yes |
| Auto-Flagger Mode | Binary decision | Optional |
Why?
Calibration is deployment-environment specific. It must not be fused into model weights.
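One way to keep calibration a separate, optional artifact, shown here with scikit-learn's `IsotonicRegression` (an assumption; the write-up names no calibration method, and an sklearn-style model with `predict_proba` is assumed):
```python
import pickle
from sklearn.isotonic import IsotonicRegression

def fit_calibrator(raw_probs, labels, path="calibration.pkl"):
    """Fit an isotonic map on held-out predictions and persist it separately."""
    calibrator = IsotonicRegression(out_of_bounds="clip").fit(raw_probs, labels)
    with open(path, "wb") as f:
        pickle.dump(calibrator, f)
    return calibrator

def score(model, X, calibrator=None):
    probs = model.predict_proba(X)[:, 1]
    # Dashboard Mode loads calibration.pkl; Auto-Flagger Mode may skip it.
    return calibrator.predict(probs) if calibrator is not None else probs
```
Because the calibrator is its own artifact, each deployment mode decides at load time whether to apply it, and the model weights stay untouched.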
Threshold Policy Layer
Threshold logic isolated in:
threshold_policy.py
Inputs:
- Fraud probability
- Capacity pressure (optional)
- Risk appetite mode
Outputs:
- Decision label
- Risk tier (P0–P3)
- Explanation metadata
Visual: Prediction vs Policy
```
Model → 0.82 probability
          ↓
Threshold Policy (Aggressive Mode)
          ↓
Decision: FLAG (P0)
```
The model predicts. The policy decides. This separation is essential for governance.
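A sketch of the decision function `threshold_policy.py` might expose; the mode names, thresholds, tier cutoffs, and capacity adjustment are illustrative placeholders, tuned only so the example matches the visual above:
```python
MODE_THRESHOLDS = {"aggressive": 0.50, "balanced": 0.70, "conservative": 0.85}
TIER_CUTOFFS = [(0.80, "P0"), (0.65, "P1"), (0.50, "P2"), (0.0, "P3")]

def decide(probability: float, mode: str = "balanced",
           capacity_pressure: float = 0.0) -> dict:
    """Map a fraud probability to a decision, tier, and explanation metadata.

    >>> decide(0.82, mode="aggressive")["decision"], decide(0.82, mode="aggressive")["risk_tier"]
    ('flag', 'P0')
    """
    # High capacity pressure raises the bar so the review queue stays bounded.
    threshold = MODE_THRESHOLDS[mode] + 0.10 * capacity_pressure
    tier = next(t for cutoff, t in TIER_CUTOFFS if probability >= cutoff)
    return {
        "decision": "flag" if probability >= threshold else "pass",
        "risk_tier": tier,
        "explanation": {
            "mode": mode,
            "threshold_used": round(threshold, 3),
            "capacity_pressure": capacity_pressure,
        },
    }
```
Changing risk appetite is then a config edit reviewed by the business, not a retrain.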
Output Contract
Standardized prediction schema:
```json
{
  "claim_id": "string",
  "fraud_probability": 0.82,
  "calibrated_probability": 0.79,
  "decision": "flag",
  "risk_tier": "P0",
  "model_version": "v1.3.2",
  "timestamp": "ISO-8601"
}
```
Why structured outputs matter:
- Downstream API stability
- Batch job consistency
- Audit trails
- Explainability traceability
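One possible Python-side binding of this contract; the dataclass is an illustration, with field names mirroring the JSON above:
```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class PredictionOutput:
    claim_id: str
    fraud_probability: float
    calibrated_probability: float
    decision: str
    risk_tier: str
    model_version: str
    timestamp: str = ""

    def to_record(self) -> dict:
        """Serialize to the JSON contract; timestamp defaults to now, in UTC."""
        record = asdict(self)
        record["timestamp"] = record["timestamp"] or datetime.now(timezone.utc).isoformat()
        return record
```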
Testing Strategy
Implemented:
✔ Unit Tests
- Preprocessing transformations
- Schema validation
- Feature generation stability
✔ Golden Row Regression Test
- Fixed input
- Fixed expected output
- Detects unintended behavior changes
This prevents silent pipeline regressions.
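A sketch of the golden-row test under pytest (assumed); `run_inference` stands in for whatever the pipeline's real end-to-end entry point is, and the fixture path is hypothetical:
```python
import json
from pathlib import Path

# Hypothetical fixture: one fixed input row plus its expected output.
GOLDEN = json.loads(Path("tests/fixtures/golden_row.json").read_text())

def test_golden_row_is_stable():
    result = run_inference(GOLDEN["input"])  # hypothetical entry point
    assert result["decision"] == GOLDEN["expected"]["decision"]
    assert result["risk_tier"] == GOLDEN["expected"]["risk_tier"]
    # Tolerance absorbs float round-trip noise; the model artifact is pinned,
    # so any larger shift means pipeline behavior actually changed.
    assert abs(result["fraud_probability"]
               - GOLDEN["expected"]["fraud_probability"]) < 1e-9
```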
Before vs After Consolidation
| Before | After |
|---|---|
| Notebook-dependent | Reusable pipeline module |
| Manual threshold tweaks | External threshold policy |
| Feature drift risk | Shared transform layer |
| Hard to audit | Versioned artifacts |
| Model-centric | Decision-system ready |
Production Readiness Status
- ✔ Deterministic preprocessing
- ✔ Shared train/inference pipeline
- ✔ Config-driven features
- ✔ Calibration isolated
- ✔ Policy isolated
- ✔ Structured output contract
- ✔ Version logging
- ✔ Test coverage
Future extensions:
- Data drift monitoring
- Retraining triggers
- SHAP precomputation layer
Strategic Outcome
The fraud detection system is now:
- Portable
- Deployable
- Auditable
- Governable
- Extendable
This consolidation transforms a modeling project into a production-grade ML component suitable for regulated environments.