Design: Fraud Detection - Real-Time Classification Under Extreme Imbalance

Reading time: ~25 min | Interview relevance: Critical | Roles: MLE

The Real Interview Moment

"Design a fraud detection system for a payment platform processing 10,000 transactions per second." You start describing a random forest classifier. The interviewer asks: "What's your positive rate?" You say "maybe 1%?" The interviewer responds: "In reality, it's 0.1%. With 10K TPS, that's 10 fraud cases per second and 9,990 legitimate ones. A model that approves everything is 99.9% accurate and catches nothing. Even at 99% recall and a 1% false positive rate, you falsely block about 100 legitimate transactions per second for every 10 frauds you catch. How do you handle this?"

Fraud detection is the interview question that tests whether you understand the real-world consequences of ML decisions - every false negative costs the company money, every false positive costs a customer their purchase.
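The arithmetic behind this exchange is worth being able to reproduce on a whiteboard. The sketch below is a minimal illustration, assuming the 10K TPS and 0.1% fraud rate from the prompt; the function name and the two operating points are hypothetical choices made for the example.

```python
def per_second_outcomes(tps: int, fraud_rate: float, recall: float, fpr: float) -> dict:
    """Expected confusion-matrix counts per second at a given operating point."""
    fraud = tps * fraud_rate
    legit = tps - fraud
    tp = fraud * recall          # fraud correctly blocked
    fp = legit * fpr             # legitimate transactions wrongly blocked
    fn = fraud - tp              # fraud that slips through
    tn = legit - fp              # legitimate transactions approved
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "accuracy": (tp + tn) / tps}


# Baseline that approves every transaction: high accuracy, zero fraud caught.
approve_all = per_second_outcomes(10_000, 0.001, recall=0.0, fpr=0.0)

# A model with 99% recall and a 1% false positive rate.
noisy = per_second_outcomes(10_000, 0.001, recall=0.99, fpr=0.01)

print(approve_all["accuracy"], approve_all["fn"])
print(noisy["accuracy"], noisy["fp"], noisy["tp"])
```

Both operating points score 99% accuracy or better, which is why accuracy tells you almost nothing here.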

What You Will Master

Handling extreme class imbalance (0.1% positive rate)
Real-time feature engineering for streaming transactions
Precision-recall trade-offs with business impact analysis
Adversarial model evolution (fraudsters adapt)
Multi-layer defense architecture
Online model updates to catch new fraud patterns
Rule-based + ML hybrid systems

The Complete Design

Step 1: Requirements (5 min)

Functional requirements:

Score every transaction in real-time: approve, decline, or send to manual review
Process 10K transactions per second (TPS)
Detect card-not-present (CNP) fraud, account takeover, and promo abuse

Non-functional requirements:

Latency: <50ms per transaction (synchronous in payment flow)
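Precision: >95% at operating threshold, with false positive rate <0.5% as a separate guardrail (at a 0.1% fraud rate and 80% recall, 95% precision implies an FPR near 0.004%, so precision is the binding constraint)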
Recall: >80% (catch 80%+ of fraud)
Dollar-weighted recall: >90% (high-value fraud matters more)

Interviewer's Perspective

The candidate who says "I'd optimize for F1 score" gets a Lean Hire at best. The candidate who says "I'd optimize for dollar-weighted recall at a fixed false positive rate of 0.5%, because blocking a legitimate $500 purchase costs us $50 in lost revenue plus customer lifetime value, while missing a $500 fraud costs us $500 directly" gets a Strong Hire. Translate ML metrics to business impact.
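To make the difference between count-based and dollar-weighted recall concrete, here is a minimal sketch. The tuple layout and the sample amounts are invented for illustration and are not real data.

```python
def recall_metrics(transactions):
    """transactions: iterable of (amount, is_fraud, was_caught) tuples.

    Returns (count_recall, dollar_recall) over the fraudulent transactions only.
    """
    fraud = [(amount, caught) for amount, is_fraud, caught in transactions if is_fraud]
    if not fraud:
        raise ValueError("no fraud cases to evaluate")
    count_recall = sum(1 for _, caught in fraud if caught) / len(fraud)
    dollar_recall = (
        sum(amount for amount, caught in fraud if caught)
        / sum(amount for amount, _ in fraud)
    )
    return count_recall, dollar_recall


sample = [
    (20.0, True, True),
    (35.0, True, True),
    (45.0, True, True),
    (900.0, True, False),   # one large fraud that was missed
    (60.0, False, False),   # legitimate, ignored by both recall metrics
]
count_recall, dollar_recall = recall_metrics(sample)
print(count_recall, dollar_recall)
```

In this toy sample the model catches three of four frauds by count but only a tenth of the fraud dollars, which is the gap the dollar-weighted metric is meant to expose.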

Step 2: Problem Formulation (5 min)

Business Goal	ML Objective	Primary Metric	Guardrails
Minimize fraud losses while maintaining customer experience	Binary classification: fraud vs. legitimate	Precision@Recall=0.8, Dollar-saved rate	False positive rate <0.5%, latency <50ms

Three-tier decision system:

Three-Tier Fraud Decision System - Auto-Approve (score < 0.1), Manual Review (0.1–0.7), Auto-Decline (≥ 0.7)
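The three-tier routing can be sketched as a small function. The thresholds 0.1 and 0.7 come from the diagram above; the function and constant names are hypothetical, and the boundary handling (0.1 goes to review, 0.7 goes to decline) follows the "< 0.1" and "≥ 0.7" notation.

```python
APPROVE_BELOW = 0.1        # scores strictly below this are auto-approved
DECLINE_AT_OR_ABOVE = 0.7  # scores at or above this are auto-declined


def decide(score: float) -> str:
    """Map a fraud score in [0, 1] to approve / review / decline."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score < APPROVE_BELOW:
        return "approve"
    if score >= DECLINE_AT_OR_ABOVE:
        return "decline"
    return "review"


print(decide(0.05), decide(0.1), decide(0.7))
```

Keeping the thresholds as named constants, separate from the model, lets the business move them without retraining.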

Step 3: Features & Data (8 min)

Feature Categories

Category	Examples	Computation
Transaction	Amount, currency, merchant category, card type, time of day	Available instantly
Velocity	Transactions in last 1h/24h/7d, unique merchants in 24h, amount spent in 24h	Streaming aggregation
Behavioral	Deviation from user's normal spending pattern, unusual merchant category, time-of-day anomaly	Requires user profile
Device/Network	IP geolocation, device fingerprint, proxy/VPN detection, distance from last transaction	Real-time lookup
Graph	Shared device with known fraudster, merchant fraud rate, card-merchant pair frequency	Batch + real-time

The Most Predictive Features (from industry experience)

Velocity features: Number and total amount of transactions in the last hour
Deviation features: How different is this transaction from the user's normal pattern?
Network features: Is this IP/device associated with previous fraud?
Merchant risk score: Historical fraud rate at this merchant

Common Trap

Many candidates list features without thinking about computation feasibility. "Average transaction amount over the last 30 days" is easy in batch but requires a streaming aggregation pipeline at 10K TPS. Always specify: Can this feature be computed within 50ms? For each feature, state whether it's pre-computed (batch), streamed (near-real-time), or available instantly.
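As an illustration of what "streaming aggregation" means for a velocity feature, the sketch below keeps a sliding window for a single card in memory. It is a simplified stand-in for the state a stream processor would maintain, not a production design: the class name is hypothetical, and it assumes events for a card arrive in timestamp order.

```python
from collections import deque


class VelocityWindow:
    """Count and total amount of one card's transactions in the last window_seconds."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._events = deque()   # (timestamp, amount), oldest first
        self._total = 0.0

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_amount = self._events.popleft()
            self._total -= old_amount

    def add(self, timestamp: float, amount: float) -> None:
        self._events.append((timestamp, amount))
        self._total += amount
        self._evict(timestamp)

    def features(self, now: float) -> dict:
        self._evict(now)
        return {"txn_count": len(self._events), "txn_amount": self._total}


window = VelocityWindow(window_seconds=3600)
window.add(0, 10.0)
window.add(1800, 20.0)
window.add(3700, 5.0)
print(window.features(now=3700))
```

Each update and lookup is amortized O(1), which is the property that makes the feature feasible inside a 50ms budget.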

Training Data

Labels: Chargebacks (30-90 day delay), manual review decisions (1-24 hour delay)
Challenge: Label delay - chargeback labels for recent transactions are still incomplete, so you train on weeks-old data but serve on today's transactions
Imbalance: 0.1% positive rate → use SMOTE, class weights, or focal loss
Adversarial drift: Fraud patterns change weekly as fraudsters adapt
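One common way to cope with label delay is to treat a transaction as "legitimate" only after the chargeback window has passed. The sketch below assumes the 90-day upper bound quoted above; the function name and the three-valued return are illustrative choices, not a prescribed method.

```python
from datetime import datetime, timedelta

CHARGEBACK_WINDOW = timedelta(days=90)  # upper end of the 30-90 day delay


def training_label(txn_time, chargeback_time, as_of):
    """Return 1 (fraud), 0 (legitimate), or None (label not yet mature)."""
    if chargeback_time is not None and chargeback_time <= as_of:
        return 1    # confirmed fraud is usable immediately
    if as_of - txn_time >= CHARGEBACK_WINDOW:
        return 0    # window elapsed with no chargeback
    return None     # too recent to trust as a negative


as_of = datetime(2024, 6, 1)
old_clean = training_label(datetime(2024, 1, 1), None, as_of)
recent_clean = training_label(datetime(2024, 5, 20), None, as_of)
recent_fraud = training_label(datetime(2024, 5, 20), datetime(2024, 5, 25), as_of)
print(old_clean, recent_clean, recent_fraud)
```

Rows that return None are excluded from training rather than counted as negatives, which avoids teaching the model that undiscovered fraud is legitimate.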

Step 4: Model (8 min)

The Progression

Fraud Detection Model Progression - Rules Engine → XGBoost → Ensemble → Online Learning with Graph

Why XGBoost is the production standard for fraud detection:

Handles tabular data with mixed feature types
Robust to missing values
Fast inference (<5ms per transaction)
Interpretable feature importance (needed for regulatory compliance)
Works well with class imbalance via scale_pos_weight
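For the last point, the XGBoost documentation suggests setting scale_pos_weight to roughly the ratio of negative to positive examples. A minimal sketch of that calculation follows, assuming a hypothetical training set of one million rows at the 0.1% fraud rate; the parameter dictionary only shows where the value would go and is not a tuned configuration.

```python
def scale_pos_weight(n_negative: int, n_positive: int) -> float:
    """Ratio of negatives to positives, the usual starting value for scale_pos_weight."""
    if n_positive <= 0:
        raise ValueError("need at least one positive example")
    return n_negative / n_positive


n_total = 1_000_000          # hypothetical training set size
n_positive = 1_000           # 0.1% fraud rate
n_negative = n_total - n_positive

params = {
    "objective": "binary:logistic",
    "eval_metric": "aucpr",  # area under the precision-recall curve
    "scale_pos_weight": scale_pos_weight(n_negative, n_positive),
}
print(params["scale_pos_weight"])
```

Heavy positive weighting shifts the predicted probabilities upward, so decision thresholds should be tuned on the weighted model's scores rather than carried over from an unweighted one.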

Why NOT deep learning (initially):

Tabular data - tree models typically outperform neural networks
Latency constraint - deep models are slower
Interpretability requirement - regulators require explainable decisions
Data volume - 0.1% positive rate means limited positive examples

Handling Class Imbalance

Technique	How It Works	When to Use
Class weights	Upweight positive class in loss function	Always - simplest approach
SMOTE	Generate synthetic positive examples	When positive examples are very few
Focal loss	Down-weight easy negatives	Neural network models
Threshold tuning	Adjust decision threshold post-training	Always - separate model from business decision
Cost-sensitive learning	Weight by transaction amount	When dollar impact matters more than count
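Threshold tuning from the table can be sketched as a search for the lowest threshold that keeps the false positive rate within budget. This is a simple O(n·m) illustration on toy data; the function name is hypothetical, and the example uses a 15% budget only because the toy set has seven negatives, whereas the design above would use 0.5%.

```python
def threshold_at_fpr(scores, labels, max_fpr):
    """Lowest threshold t (flag if score >= t) whose FPR stays <= max_fpr.

    Returns (threshold, recall, fpr), or None if no threshold satisfies the budget.
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need both classes to tune a threshold")
    best = None
    for t in sorted(set(scores), reverse=True):
        fpr = sum(s >= t for s in negatives) / len(negatives)
        if fpr > max_fpr:
            break  # FPR can only grow as the threshold drops further
        recall = sum(s >= t for s in positives) / len(positives)
        best = (t, recall, fpr)
    return best


scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05, 0.02]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
chosen = threshold_at_fpr(scores, labels, max_fpr=0.15)
print(chosen)
```

Because the threshold is chosen after training, the same model can serve different risk appetites without being retrained.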

Step 5: Serving (8 min)

Fraud Detection Real-Time Serving - Transaction → Rules Engine → Feature Store → ML Scoring → Decision

Key Architecture Decisions

Component	Decision	Rationale
Rules engine first	Block known fraud patterns before ML	Deterministic, fast, catches known attacks
Feature store	Redis with streaming updates (Kafka + Flink)	Sub-10ms feature lookups for velocity features
Model serving	XGBoost in C++ (treelite)	<5ms inference, no GPU needed
Fallback	Rules-only mode if ML is down	Higher false positive rate but still catches obvious fraud
Model updates	Retrain daily, deploy with shadow scoring	Fraud patterns evolve - stale models miss new attacks
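The "rules first, ML second, rules-only fallback" flow can be sketched as below. Everything named here is hypothetical: the blocklist contents, the $1,000 fallback rule, and the model_score callable stand in for real components, and a production system would enforce a timeout rather than rely on an exception alone.

```python
BLOCKLIST = {"c-blocked"}  # hypothetical set of blocked card IDs


def fallback_rule(txn: dict) -> str:
    # Stricter rule used only when the model is unavailable (assumed policy).
    return "review" if txn["amount"] >= 1_000 else "approve"


def route_transaction(txn: dict, model_score) -> tuple:
    """Return (decision, source) for one transaction."""
    if txn["card_id"] in BLOCKLIST:
        return "decline", "rules"            # deterministic rule fires first
    try:
        score = model_score(txn)
    except Exception:
        # Model down or timed out: degrade to rules-only mode.
        return fallback_rule(txn), "rules_fallback"
    if score < 0.1:
        return "approve", "ml"
    if score >= 0.7:
        return "decline", "ml"
    return "review", "ml"


def broken_model(txn):
    raise TimeoutError("model unavailable")


print(route_transaction({"card_id": "c-1", "amount": 5000.0}, broken_model))
print(route_transaction({"card_id": "c-1", "amount": 20.0}, lambda txn: 0.85))
```

Returning the decision source alongside the decision makes it easy to monitor how often the system is running in degraded mode.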

Why Rules + ML (Not Just ML)

Layer	Catches	Example
Rules engine	Known fraud patterns, sanctions, blacklists	Card on blocklist → instant decline
ML model	Complex, subtle patterns	Unusual velocity + new device + high amount → likely fraud
Manual review	Edge cases ML is uncertain about	Score 0.1-0.7, high-value transaction

Step 6: Evaluation & Iteration (8 min)

Offline Evaluation

Metric	Definition	Target
Precision @ FPR=0.5%	How precise when we block 0.5% of legitimate traffic	> 80%
Recall	% of fraud caught	> 80%
Dollar recall	% of fraud dollars caught	> 90%
AUC-PR	Area under precision-recall curve	> 0.7

Why not AUC-ROC? With a 0.1% positive rate, the false positive rate is computed over a huge pool of negatives, so AUC-ROC can look excellent while precision is poor. AUC-PR is much more informative for imbalanced problems.
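A short calculation shows why the same ROC operating point can mean very different precision depending on prevalence. The function name is hypothetical; the formula is the standard identity precision = TP / (TP + FP) expressed in terms of prevalence, recall, and FPR.

```python
def precision_at(prevalence: float, recall: float, fpr: float) -> float:
    """Precision implied by a (recall, FPR) operating point at a given fraud prevalence."""
    tp = prevalence * recall           # true positives per transaction
    fp = (1.0 - prevalence) * fpr      # false positives per transaction
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


# Same ROC point (80% recall, 0.5% FPR) at two prevalences.
imbalanced = precision_at(prevalence=0.001, recall=0.8, fpr=0.005)  # about 0.138
balanced = precision_at(prevalence=0.5, recall=0.8, fpr=0.005)      # about 0.994

# Upper bound on precision at 0.1% prevalence and 0.5% FPR (perfect recall).
ceiling = precision_at(prevalence=0.001, recall=1.0, fpr=0.005)     # about 0.167

print(round(imbalanced, 3), round(balanced, 3), round(ceiling, 3))
```

At a true 0.1% fraud rate, an FPR of 0.5% caps precision near 17%, so a high precision target at that FPR only makes sense on a rebalanced evaluation set or needs a much lower FPR. It is worth sanity-checking any metric targets against this identity before committing to them in an interview.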

Adversarial Evolution

The Fraudster–Model Arms Race - Pattern A → Model V1 catches → Pattern B → Model V2 catches → Pattern C...

This arms race means:

Retrain frequently: Daily or weekly, not monthly
Monitor for drift: Track fraud rate by segment
A/B test carefully: Don't expose treatment group to higher fraud risk
Shadow scoring: Score with new model but use old model's decisions, compare
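Shadow scoring can be summarized with a simple comparison of the two models' decisions on the same traffic. The sketch below assumes labels are already available for the sampled records and uses the 0.7 auto-decline threshold; the record layout, function name, and sample values are hypothetical.

```python
def shadow_report(records, threshold: float = 0.7) -> dict:
    """records: iterable of (old_score, new_score, is_fraud).

    The old model's decision is the one actually applied; the new model only scores.
    """
    total = disagreements = new_only_catches = new_only_false_blocks = 0
    for old_score, new_score, is_fraud in records:
        total += 1
        old_block = old_score >= threshold
        new_block = new_score >= threshold
        if old_block != new_block:
            disagreements += 1
        if new_block and not old_block:
            if is_fraud:
                new_only_catches += 1        # fraud only the new model would stop
            else:
                new_only_false_blocks += 1   # customers only the new model would block
    return {
        "disagreement_rate": disagreements / total if total else 0.0,
        "new_only_catches": new_only_catches,
        "new_only_false_blocks": new_only_false_blocks,
    }


records = [
    (0.2, 0.8, True),
    (0.1, 0.05, False),
    (0.9, 0.95, True),
    (0.3, 0.75, False),
    (0.8, 0.4, False),
]
report = shadow_report(records)
print(report)
```

Because the new model never affects live decisions, this comparison carries no added fraud exposure, which is the point of shadow scoring over a live A/B test.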

Practice Problems

Problem 1: Account Takeover Detection

Direction

A fraudster gains access to a legitimate user's account and makes transactions. The transactions look normal for that user. How do you detect this?

Key Insight

Account takeover (ATO) is harder than card fraud because the transactions match the user's profile. Key signals: login from new device/IP, password change followed by purchase, session behavior anomaly (navigation speed, click patterns), geographic impossibility (login from NYC then London in 1 hour). Build a separate ATO model that focuses on session and device features rather than transaction features.
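The geographic-impossibility signal reduces to a speed check between consecutive logins. A minimal sketch follows, using the haversine great-circle distance; the 1,000 km/h cutoff is an assumed value (roughly above airliner cruising speed), and the coordinates for New York and London are approximate.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def impossible_travel(prev, curr, max_kmh: float = 1000.0) -> bool:
    """prev/curr: (lat, lon, unix_seconds). True if the implied speed exceeds max_kmh."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    # Guard against zero or out-of-order timestamps.
    hours = max((curr[2] - prev[2]) / 3600.0, 1e-6)
    return distance / hours > max_kmh


nyc = (40.7128, -74.0060)
london = (51.5074, -0.1278)
one_hour = impossible_travel((*nyc, 0), (*london, 3600))
eight_hours = impossible_travel((*nyc, 0), (*london, 8 * 3600))
print(one_hour, eight_hours)
```

In practice IP geolocation is noisy (VPNs, mobile carriers), so this works better as a model feature than as a hard rule.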

Problem 2: Explain a Fraud Decision

Direction

A customer calls complaining their transaction was blocked. Your XGBoost model scored it as 0.85 (fraud). How do you explain the decision?

Key Insight

Use SHAP values to explain individual predictions: "This transaction was flagged because: (1) It was 5x your typical transaction amount (+0.15), (2) It came from a new device we haven't seen before (+0.12), (3) It was at a merchant category you've never used (+0.08)." This is not just a nice-to-have - depending on jurisdiction and product, financial regulations (the EU AI Act and ECOA are commonly cited) can require explainable automated decisions.
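Turning per-feature contributions into customer-facing reasons can be sketched as follows. The sketch assumes the SHAP values have already been computed elsewhere and uses the numbers from the example above; the feature names, descriptions, and function name are hypothetical.

```python
def top_reasons(contributions: dict, descriptions: dict, k: int = 3) -> list:
    """Return the k features that pushed the score most strongly toward fraud."""
    positive = [(name, value) for name, value in contributions.items() if value > 0]
    positive.sort(key=lambda item: item[1], reverse=True)
    return [f"{descriptions.get(name, name)} ({value:+.2f})" for name, value in positive[:k]]


contributions = {
    "amount_vs_typical": 0.15,
    "new_device": 0.12,
    "new_merchant_category": 0.08,
    "account_age": -0.05,   # pushed the score toward legitimate, so not reported
}
descriptions = {
    "amount_vs_typical": "Amount was 5x your typical transaction",
    "new_device": "New device we have not seen before",
    "new_merchant_category": "Merchant category you have never used",
}
reasons = top_reasons(contributions, descriptions)
print(reasons)
```

For tree models, SHAP contributions are typically expressed in log-odds units unless the explainer is configured otherwise, so the raw numbers should be translated into plain language rather than shown to customers as probabilities.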

Interview Cheat Sheet

Question Pattern	Framework	Key Phrases
"Design fraud detection"	Rules + ML + manual review three-tier	"Rules catch known patterns, ML catches subtle ones, manual review handles uncertainty"
"How do you handle imbalance?"	Multiple techniques	"Class weights, SMOTE for training, threshold tuning for deployment, evaluate with AUC-PR not AUC-ROC"
"How do you handle adversarial evolution?"	Continuous retraining	"Daily retraining, drift monitoring by segment, shadow scoring before deployment"
"Precision vs. recall?"	Business impact analysis	"Each false positive costs $X in lost revenue, each false negative costs $Y in fraud - optimize the total cost"

Spaced Repetition Checkpoints

Day 0: Draw the three-tier architecture (rules → ML → manual review). Explain why each layer exists.
Day 3: Explain 5 techniques for handling class imbalance. When would you use each?
Day 7: Design fraud detection for a ride-sharing platform in 45 minutes.
Day 14: Explain adversarial model drift and your retraining strategy.
Day 21: Mock interview with follow-ups on explainability, regulatory requirements, and real-time feature engineering.

What's Next

Ad Click Prediction - Another real-time classification problem with calibration requirements
Anomaly Detection - Unsupervised approach to detecting unusual patterns
