Design: Ad Click Prediction - Where ML Meets Revenue
Reading time: ~25 min | Interview relevance: High | Roles: MLE
The Real Interview Moment
"Design the ad click prediction system for a search engine or social media platform." You describe a logistic regression model that predicts clicks. The interviewer asks: "Your model predicts a 5% click probability for an ad, but the actual click rate is 3%. What happens?" You're not sure. The interviewer explains: "In a cost-per-click auction, we charge advertisers based on predicted click rates. A 5% prediction against a 3% actual rate is about 67% too high, so we overcharge advertisers by about 67%. They leave the platform. Calibration isn't a nice-to-have - it's a revenue requirement."
Ad click prediction is unique because model accuracy directly translates to revenue. An uncalibrated model doesn't just give bad recommendations - it breaks the ad auction economics.
What You Will Master
- Ad auction mechanics (second-price, VCG) and why calibration matters
- Feature engineering for ads (query-ad, user-ad, contextual features)
- Calibration techniques: Platt scaling, isotonic regression
- Real-time bidding architecture
- Training on delayed and partial feedback
- Multi-stage ranking for ad selection
The Complete Design
Step 1: Requirements (5 min)
Functional requirements:
- Predict P(click | user, query, ad) for ad selection and pricing
- Select top ads from 1M+ eligible ads per query
- Support multiple ad formats: search ads, display ads, video ads
Non-functional requirements:
- Latency: <20ms for ad scoring (ads compete with organic results)
- Calibration: Predicted CTR within 5% (relative) of actual CTR across all segments
- Throughput: 500K queries per second
- Freshness: New ads eligible within minutes of creation
The candidate who understands WHY calibration matters in ad systems - that predicted CTR feeds directly into the auction pricing equation - demonstrates real-world experience. If your model predicts 5% CTR but reality is 3%, you charge advertisers for 5% clickthrough rates they don't get. This is the #1 thing I test for in ad ML interviews.
Step 2: Problem Formulation (5 min)
The Ad Auction
For each query/impression:
- Eligible ads bid: `bid = advertiser_max_bid × P(click)`
- Ads ranked by: `rank_score = bid × quality_score`
- Winner pays: second-price auction → `cost = (next_bid / winner_CTR) + $0.01`
Critical insight: The predicted CTR (P(click)) directly determines both ranking and pricing. Poor calibration means:
- Over-predicted CTR → overcharge advertisers → they leave
- Under-predicted CTR → undercharge advertisers → lost revenue
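The auction formulas above can be traced with a small worked example. This is a minimal sketch, not a production auction: the `Ad` class, the `run_auction` helper, and the bids and CTRs are all made up for illustration, and it assumes at least two eligible ads.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    max_bid: float              # advertiser's maximum bid in dollars (hypothetical values)
    p_click: float              # predicted CTR from the model
    quality_score: float = 1.0


def run_auction(ads):
    """Rank ads and price the winner using the formulas from this section."""
    def bid(ad):
        return ad.max_bid * ad.p_click

    # rank_score = bid × quality_score
    ranked = sorted(ads, key=lambda ad: bid(ad) * ad.quality_score, reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    # cost = (next_bid / winner_CTR) + $0.01
    cost = bid(runner_up) / winner.p_click + 0.01
    return winner, cost


ads = [Ad("A", 2.00, 0.05), Ad("B", 3.00, 0.02), Ad("C", 1.00, 0.04)]
winner, cost = run_auction(ads)
print(winner.name, round(cost, 2))      # A 1.21

# Same auction, but ad A's predicted CTR is halved: the winner and the price both change.
ads_shifted = [Ad("A", 2.00, 0.025), Ad("B", 3.00, 0.02), Ad("C", 1.00, 0.04)]
winner_shifted, cost_shifted = run_auction(ads_shifted)
print(winner_shifted.name, round(cost_shifted, 2))   # B 2.51
```

Changing only the predicted CTR of one ad flips both the winner and the price, which is why the prediction has to match reality.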
| Business Goal | ML Objective | Primary Metric | Guardrails |
|---|---|---|---|
| Maximize ad revenue while maintaining advertiser ROI | Predict P(click \| user, query, ad) | Log-loss, calibration error | Revenue per query, advertiser churn rate |
Step 3: Features & Data (8 min)
Feature Categories
| Category | Features | Example |
|---|---|---|
| Query-Ad | Text match score, keyword match type (exact/phrase/broad), semantic similarity | Query "running shoes" + Ad "Nike Air Max" |
| Ad | Historical CTR, ad quality score, landing page quality, ad age, creative type | Ad with 2.5% historical CTR, image creative |
| User | Demographics, search history, past ad interactions, purchase intent signals | User who searched for "marathon training" yesterday |
| Context | Device, time of day, geographic location, search session depth | Mobile, 8pm, New York, 5th search in session |
| Advertiser | Account quality, bid amount, budget remaining, campaign objective | Advertiser with $10K daily budget, 60% spent |
Training Data
- Positive label: User clicked the ad
- Negative label: Ad was shown but not clicked
- CTR range: 1-5% for search ads, 0.1-0.5% for display ads
- Volume: Billions of impressions/day
- Label delay: Click happens within seconds, conversion (purchase) takes days
Ad click data has massive selection bias - you only observe clicks on ads that were shown, and they were shown because the old model ranked them highly. If you train naively on this data, you reinforce the old model's biases. Use exploration traffic (random ad selection on 1-5% of queries) to get unbiased training data, or use counterfactual learning.
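One way to make exploration traffic usable for training is to log the probability with which each ad was shown, then reweight examples by the inverse of that probability. The sketch below assumes an epsilon-greedy policy and inverse propensity weighting, which is only one of several counterfactual techniques; `choose_ad`, `to_weighted_example`, and the weight cap are hypothetical names and values.

```python
import random


def choose_ad(scores, epsilon=0.02, rng=random):
    """Epsilon-greedy selection. Returns (ad_id, propensity).

    With probability epsilon an ad is drawn uniformly at random; otherwise
    the top-scored ad is shown. The propensity is the total probability
    that this policy shows the chosen ad.
    """
    ad_ids = list(scores)
    best = max(ad_ids, key=lambda ad_id: scores[ad_id])
    explore = rng.random() < epsilon
    chosen = rng.choice(ad_ids) if explore else best
    propensity = epsilon / len(ad_ids)      # chance of being picked while exploring
    if chosen == best:
        propensity += 1.0 - epsilon         # plus the chance of being picked greedily
    return chosen, propensity


def to_weighted_example(features, clicked, propensity, max_weight=100.0):
    """Turn a logged impression into (features, label, weight) for training.

    The weight is 1 / propensity, capped to limit variance (the cap is an assumption).
    """
    weight = min(1.0 / propensity, max_weight)
    return features, int(clicked), weight


scores = {"a": 0.03, "b": 0.01, "c": 0.02, "d": 0.005}
chosen, propensity = choose_ad(scores, epsilon=0.2, rng=random.Random(0))
example = to_weighted_example({"ad": chosen}, clicked=False, propensity=propensity)
```

Rarely shown ads get large weights, so the trained model leans less on whatever the old ranker preferred.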
Step 4: Model (8 min)
The Progression
Why Logistic Regression Is Still Used
In ad prediction, LR has unique advantages:
- Naturally calibrated: Trained with log-loss, its sigmoid outputs tend to be well-calibrated probabilities
- Fast inference: O(n) in the number of active (non-zero) features - critical at 500K QPS
- Online learning: Easy to update with streaming data (FTRL optimizer)
- Interpretable: Feature weights explain predictions
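To make the online-learning point concrete, here is a minimal sketch of logistic regression trained with per-coordinate FTRL-Proximal updates over hashed binary features. The class name, hyperparameter defaults, and feature strings are illustrative assumptions; a real system would use vectorized, distributed code.

```python
import math
import zlib


class FTRLLogisticRegression:
    """Sparse logistic regression with FTRL-Proximal updates (illustrative sketch)."""

    def __init__(self, dim=2 ** 18, alpha=0.1, beta=1.0, l1=0.0, l2=0.0):
        self.dim, self.alpha, self.beta, self.l1, self.l2 = dim, alpha, beta, l1, l2
        self.z = {}   # per-coordinate adjusted gradient sums
        self.n = {}   # per-coordinate sums of squared gradients

    def _indices(self, features):
        # Feature hashing: map each feature string to a fixed-size index space.
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        return sorted({zlib.crc32(f.encode("utf-8")) % self.dim for f in features})

    def _weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0      # L1 keeps weak coordinates at exactly zero
        n = self.n.get(i, 0.0)
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2
        )

    @staticmethod
    def _sigmoid(s):
        s = max(-35.0, min(35.0, s))    # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def predict(self, features):
        return self._sigmoid(sum(self._weight(i) for i in self._indices(features)))

    def update(self, features, clicked):
        """One online step on a single impression; returns the pre-update prediction."""
        idx = self._indices(features)
        weights = {i: self._weight(i) for i in idx}
        p = self._sigmoid(sum(weights.values()))
        g = p - clicked     # log-loss gradient for binary features (x_i = 1)
        for i in idx:
            n = self.n.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * weights[i]
            self.n[i] = n + g * g
        return p


model = FTRLLogisticRegression()
for _ in range(200):
    model.update(["ad=good", "device=mobile"], 1)
    model.update(["ad=bad", "device=mobile"], 0)
```

Each update touches only the coordinates active in that impression, which is what keeps streaming updates cheap at high query volume.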
Facebook's approach (still widely used): Use GBDT to create feature transformations, then feed leaf indices into LR. Combines GBDT's feature engineering power with LR's calibration.
Calibration
| Technique | How It Works | When to Use |
|---|---|---|
| Platt scaling | Fit a logistic regression on model scores | Simple, works for well-behaved models |
| Isotonic regression | Fit a monotonic step function | More flexible, handles non-linear miscalibration |
| Temperature scaling | Divide logits by temperature T | Neural networks |
| Segment-wise calibration | Calibrate separately by segment (device, country) | When miscalibration varies by segment |
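As an illustration of the isotonic row, the monotonic step function can be fitted with the pool-adjacent-violators algorithm. This is a small pure-Python sketch on toy data; `fit_isotonic` and `calibrate` are hypothetical helper names, and a production system would more likely use a library implementation.

```python
import bisect


def fit_isotonic(scores, labels):
    """Pool Adjacent Violators: returns [(upper_score, calibrated_ctr), ...] sorted by score."""
    blocks = []     # each block: [label_sum, count, highest_score_in_block]
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1, score])
        # Merge while the previous block's mean exceeds the current one (monotonicity violated).
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def calibrate(blocks, score):
    """Map a raw model score to the calibrated value of the step it falls into."""
    uppers = [upper for upper, _ in blocks]
    i = min(bisect.bisect_left(uppers, score), len(blocks) - 1)
    return blocks[i][1]


raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
clicks = [0, 0, 1, 0, 1, 1]
steps = fit_isotonic(raw_scores, clicks)
print(calibrate(steps, 0.35))   # 0.5
```

The scores 0.3 and 0.4 are out of order relative to their labels, so they are pooled into one step with value 0.5. That pooling is how the fit handles non-linear miscalibration while keeping the ranking order.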
How to measure: Expected Calibration Error (ECE) - bin predictions, compare mean predicted vs. actual CTR in each bin.
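A minimal sketch of that measurement, assuming equal-width bins. With CTRs in the low single digits most predictions land in the first bin or two, so quantile bins or per-segment ECE may be more informative in practice; the function name and bin count are illustrative.

```python
def expected_calibration_error(preds, labels, n_bins=10):
    """Weighted average of |mean predicted CTR - actual CTR| over equal-width bins."""
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]   # [sum_pred, sum_label, count]
    for p, y in zip(preds, labels):
        b = min(int(p * n_bins), n_bins - 1)        # p == 1.0 goes in the last bin
        bins[b][0] += p
        bins[b][1] += y
        bins[b][2] += 1
    total = len(preds)
    ece = 0.0
    for sum_pred, sum_label, count in bins:
        if count:
            ece += (count / total) * abs(sum_pred / count - sum_label / count)
    return ece


# The scenario from the opening: every impression predicted at 5%, actual CTR 3%.
preds = [0.05] * 100
labels = [1] * 3 + [0] * 97
ece = expected_calibration_error(preds, labels)
print(round(ece, 4))    # 0.02
```

The absolute gap of 0.02 corresponds to a relative error of about 67% against the 3% actual CTR, far outside a 5% relative tolerance.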
Step 5: Serving (8 min)
Architecture Decisions
| Component | Decision | Rationale |
|---|---|---|
| Candidate selection | Inverted index on keywords + targeting criteria | Sub-5ms retrieval |
| Model serving | Feature-hashed LR or quantized model | <10ms scoring for 100+ ads |
| Feature store | In-memory cache (Memcached) for user/ad features | Ultra-low latency |
| Calibration | Post-scoring calibration layer | Can update calibration without retraining |
| Online learning | FTRL with hourly mini-batch updates | Adapt to CTR changes quickly |
Real-Time Bidding (RTB) Variant
For programmatic display ads, the flow is different:
- Publisher sends ad request to ad exchange
- Ad exchange sends bid requests to demand-side platforms (DSPs)
- Each DSP has <100ms to respond with a bid
- Highest bidder wins, ad is shown
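This means: The exchange's <100ms deadline is a hard ceiling, not a target. Aim for your entire scoring pipeline (feature lookup + model inference + bid calculation) to complete in <50ms including network latency, which leaves headroom for tail latency.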
Step 6: Evaluation & Iteration (8 min)
Offline Metrics
| Metric | What It Measures | Target |
|---|---|---|
| Log-loss | Prediction quality | Lower is better |
| AUC-ROC | Ranking quality | > 0.75 |
| Calibration error (ECE) | Predicted vs. actual CTR | < 5% relative error |
| Revenue impact (offline simulation) | Estimated revenue change | Positive |
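The first two rows of the table can be computed directly. The sketch below uses plain Python on toy data; the pairwise AUC is O(positives × negatives), so it is only suitable for small samples, and the function names are illustrative.

```python
import math


def log_loss(preds, labels, eps=1e-15):
    """Mean negative log-likelihood of the labels under the predicted probabilities."""
    total = 0.0
    for p, y in zip(preds, labels):
        p = min(max(p, eps), 1.0 - eps)     # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(preds)


def auc_roc(preds, labels):
    """Probability that a random clicked impression outscores a random unclicked one."""
    pos = [p for p, y in zip(preds, labels) if y == 1]
    neg = [p for p, y in zip(preds, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)   # ties count half
    return wins / (len(pos) * len(neg))


preds = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
halved = [p * 0.5 for p in preds]   # same ordering, different probabilities

print(auc_roc(preds, labels), auc_roc(halved, labels))      # 0.75 0.75
print(log_loss(preds, labels) != log_loss(halved, labels))  # True
```

Halving every prediction leaves AUC unchanged but changes log-loss, which is why ranking quality alone cannot certify a model whose probabilities feed into pricing.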
Online Evaluation
- A/B test: Split traffic, measure revenue per query, advertiser satisfaction, user experience
- Metric: Revenue is the primary metric, but monitor advertiser ROI (if advertisers lose money, they leave)
- Duration: 1-2 weeks, with daily monitoring for regressions
Practice Problems
Problem 1: Conversion Prediction
Direction
Beyond clicks, advertisers want to optimize for conversions (purchases). How do you design a conversion prediction model?
Key Insight
Conversions are much sparser than clicks (10-100x) and have long label delays (days to weeks). Solutions: (1) Use click prediction as an intermediate signal - P(conversion) = P(click) × P(conversion|click). (2) Handle label delay with delayed feedback models - initially train on clicks, update labels as conversions arrive. (3) Multi-task learning: predict click and conversion jointly. (4) Use value prediction (predicted revenue) not just binary conversion.
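Solutions (1) and (2) can be sketched in a few lines. The 7-day attribution window, the function names, and the rule of holding back unresolved clicks are assumptions for illustration; delayed feedback models in practice handle pending examples in more sophisticated ways.

```python
from datetime import datetime, timedelta

ATTRIBUTION_WINDOW = timedelta(days=7)   # assumed window; real systems vary


def p_conversion(p_click, p_conversion_given_click):
    """P(conversion) = P(click) × P(conversion | click)."""
    return p_click * p_conversion_given_click


def conversion_label(click_time, conversion_time, now, window=ATTRIBUTION_WINDOW):
    """Label a clicked impression for conversion training.

    Returns 1 if a conversion arrived inside the window, 0 if the window has
    closed without one, and None while the outcome is still pending.
    """
    if conversion_time is not None and conversion_time - click_time <= window:
        return 1
    if now - click_time >= window:
        return 0
    return None


now = datetime(2024, 1, 10)
converted = conversion_label(datetime(2024, 1, 1), datetime(2024, 1, 3), now)
expired = conversion_label(datetime(2024, 1, 1), None, now)
pending = conversion_label(datetime(2024, 1, 8), None, now)
print(converted, expired, pending)   # 1 0 None
```

Treating pending clicks as negatives would bias the model toward under-predicting conversions, which is the core difficulty that delayed feedback models address.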
Problem 2: New Ad Cold Start
Direction
A new advertiser creates their first ad. You have no historical performance data. How do you estimate CTR?
Key Insight
Cold start for ads: (1) Use content features (ad text, landing page quality) to estimate initial CTR. (2) Use similar-ad CTR as a prior (find ads with similar keywords/creative). (3) Exploration: show the ad to a small random sample, collect data quickly. (4) Thompson sampling: maintain uncertainty estimates, explore more when uncertain. Key trade-off: too much exploration wastes impressions on bad ads, too little means good new ads never get a chance.
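Points (2) and (4) combine naturally: start each new ad from a Beta prior centred on a similar-ad CTR, then let Thompson sampling decide how much to explore. The sketch below is a toy simulation; the prior values, ad names, and true CTRs are invented for illustration.

```python
import random


class BetaCTR:
    """Beta posterior over an ad's CTR, initialised from a similar-ad prior."""

    def __init__(self, prior_ctr=0.02, prior_strength=50):
        # prior_strength acts like a number of pseudo-impressions (assumed value)
        self.alpha = prior_ctr * prior_strength
        self.beta = (1.0 - prior_ctr) * prior_strength

    def update(self, clicked):
        if clicked:
            self.alpha += 1
        else:
            self.beta += 1

    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    def sample(self, rng):
        return rng.betavariate(self.alpha, self.beta)


def pick_ad(arms, rng):
    """Thompson sampling: draw one CTR per ad from its posterior, show the highest draw."""
    return max(arms, key=lambda name: arms[name].sample(rng))


rng = random.Random(7)
true_ctr = {"established_ad": 0.01, "new_ad": 0.10}   # hidden from the policy
arms = {name: BetaCTR() for name in true_ctr}
shown = {name: 0 for name in true_ctr}
for _ in range(5000):
    name = pick_ad(arms, rng)
    shown[name] += 1
    arms[name].update(rng.random() < true_ctr[name])
```

Ads with few impressions have wide posteriors and are occasionally sampled high, so they get explored. As evidence accumulates the posteriors narrow and traffic concentrates on the ad with the higher CTR.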
Interview Cheat Sheet
| Question Pattern | Framework | Key Phrases |
|---|---|---|
| "Design ad click prediction" | Scoring + calibration + auction | "Calibrated CTR feeds into the auction - miscalibration directly impacts revenue" |
| "Why is calibration important?" | Auction economics | "Predicted CTR × bid = rank score. Over-prediction → overcharging → advertiser churn" |
| "How do you handle billions of features?" | Feature hashing + sparse LR | "Feature hashing to fixed dimension, FTRL for online learning" |
| "How do you handle label delay?" | Delayed feedback models | "Train on clicks (immediate), update with conversions (delayed)" |
Spaced Repetition Checkpoints
- Day 0: Explain the ad auction formula. Why does calibration matter for pricing?
- Day 3: Compare LR vs. GBDT+LR vs. deep models for CTR prediction. Trade-offs?
- Day 7: Design ad ranking for a video platform in 45 minutes.
- Day 14: Explain Platt scaling and isotonic regression. When would you use each?
- Day 21: Mock interview with follow-ups on real-time bidding, online learning, and conversion prediction.
What's Next
- Content Moderation - Another classification problem with multi-modal inputs
- Fraud Detection - Similar real-time scoring with high business impact
