Data Scientist Problem List

Reading time: ~40 min | Interview relevance: Critical | Roles: Data Scientist, Applied Scientist, Analytics Data Scientist, Product Data Scientist

You are in a Data Scientist interview at a major tech company. The interviewer slides a laptop across the table and says: "We ran an A/B test for a new feature. The p-value is 0.03, but the product manager says the effect is too small to ship. Walk me through how you would analyze this situation." If your heart just skipped a beat, this list is for you.

Data Scientist interviews are demanding because they blend statistics, coding, machine learning, and business reasoning in one interview loop. This list of 45 problems covers all four pillars: Statistics & Experimentation, SQL & Data Manipulation, ML Modeling, and Business Case Studies.

Data Scientist Interview Structure

Round	Duration	What They Test	Weight
Statistics & Probability	45-60 min	Hypothesis testing, distributions, experimental design	25-30%
SQL & Coding	45-60 min	SQL queries, pandas, data manipulation	20-25%
ML / Modeling	45-60 min	Model building, feature engineering, evaluation	20-25%
Business Case / Product Sense	45-60 min	Metric definition, problem framing, communication	15-20%
Behavioral	30-45 min	Influence without authority, stakeholder management	10%

:::tip The Data Scientist Secret The best Data Scientists are not the best coders or the best statisticians. They are the best at translating vague business problems into precise analytical questions. Practice the translation, not just the execution. :::

Section 1: Statistics & Experimentation (15 Problems)

Statistics is the backbone of data science. These problems test your ability to reason about uncertainty, design experiments, and interpret results.

Hypothesis Testing & Inference

#	Problem	Difficulty	Time	Key Concept	Why It Matters	Company Tags
1	Design an A/B Test for a New Checkout Flow	Medium	25 min	Sample size, power analysis, MDE	The single most important DS skill	FAANG, All
2	A/B Test Shows p=0.04 but CI Includes Practically Zero Effect. Ship or Not?	Hard	30 min	Statistical vs. practical significance	Distinguishes senior from junior DS	Meta, Google, Airbnb
3	Multiple Comparison Problem: Testing 10 Variants Simultaneously	Medium	20 min	Bonferroni, FDR, family-wise error rate	Real experiments test many variants	Meta, Netflix, Microsoft
4	Design a Switchback Experiment for a Marketplace	Hard	30 min	Network effects, interference, switchback design	Standard A/B testing fails with network effects	Uber, Lyft, DoorDash, Airbnb
5	Analyze an A/B Test with Novelty Effect	Medium	25 min	Time-varying treatment effect, holdout analysis	New features often show inflated initial effects	Meta, Google, Netflix

Probability & Distributions

#	Problem	Difficulty	Time	Key Concept	Why It Matters	Company Tags
6	Estimate the Probability of Server Failure Given Alert Data	Medium	20 min	Bayes' theorem, conditional probability	Bayesian reasoning is fundamental	Google, Amazon, Meta
7	Model the Number of Customer Arrivals per Hour	Easy	15 min	Poisson distribution, rate estimation	Distribution selection for count data	All
8	Calculate Confidence Intervals for a Proportion	Easy	15 min	Normal approximation, Wilson interval	Basic inference that many candidates get wrong	All
9	Compare Two Methods Using Bootstrap Confidence Intervals	Medium	25 min	Bootstrap resampling, percentile method	Non-parametric inference for complex metrics	Meta, Google, Airbnb
10	Design a Sequential Testing Procedure (Peeking Problem)	Hard	30 min	Sequential analysis, alpha spending	Real experiments are monitored continuously	Netflix, Meta, Uber

Advanced Experimentation

#	Problem	Difficulty	Time	Key Concept	Why It Matters	Company Tags
11	Estimate Treatment Effect When Random Assignment Is Not Possible	Hard	30 min	Causal inference, propensity score matching, DiD	Observational data is more common than experiments	Meta, Google, Uber
12	Design Guardrail Metrics for an A/B Test	Medium	20 min	Guardrails, pre-specified boundaries, SRM check	Protect against shipping harmful changes	FAANG, All
13	Variance Reduction Techniques for Faster Experiments	Hard	30 min	CUPED, stratification, pre-experiment covariates	Faster experiments = faster iteration	Meta, Netflix, Microsoft
14	Long-Term Impact Estimation When Only Short-Term Data Exists	Hard	30 min	Surrogate metrics, long-term holdout	Most business outcomes are long-term	Google, Meta, Netflix
15	Simpson's Paradox in Experiment Analysis	Medium	20 min	Confounding, segmented analysis	Aggregate results can mislead	All

:::warning Statistics Red Flags These mistakes immediately concern interviewers:

Confusing the p-value with the probability that the hypothesis is true
Not checking assumptions (normality, independence) before applying tests
Using a plain t-test on a ratio metric that requires the delta method or bootstrap
Not accounting for multiple comparisons in multi-variant tests
Ignoring practical significance and focusing only on statistical significance :::
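The ratio-metric red flag can be made concrete with a small sketch. It assumes the randomization unit is the user and the metric is a ratio of two per-user totals (clicks over page views here), and it applies the standard delta-method approximation for the variance of a ratio of means. The function names and the synthetic data are hypothetical, chosen only for illustration.

```python
import random
from statistics import mean, variance


def sample_cov(a, b):
    """Sample covariance with the n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def ratio_metric_variance(numer, denom):
    """Delta-method variance of mean(numer) / mean(denom) across users."""
    n = len(numer)
    mu_y, mu_x = mean(numer), mean(denom)
    r = mu_y / mu_x
    var_y, var_x = variance(numer), variance(denom)
    cov_yx = sample_cov(numer, denom)
    # Var(R) ~= (var_y - 2*R*cov_yx + R^2*var_x) / (n * mu_x^2)
    return (var_y - 2 * r * cov_yx + r ** 2 * var_x) / (n * mu_x ** 2)


# Synthetic per-user data (illustration only): page views and clicks
rng = random.Random(42)
views = [rng.randint(1, 20) for _ in range(2000)]
clicks = [sum(rng.random() < 0.1 for _ in range(v)) for v in views]

ctr = sum(clicks) / sum(views)                    # the ratio metric
se = ratio_metric_variance(clicks, views) ** 0.5  # its standard error
print(f"CTR = {ctr:.4f}, delta-method SE = {se:.4f}")
```

To compare two arms, one would compute this variance separately for treatment and control, add the two, and use the square root as the standard error of the difference in ratios.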

Section 2: SQL & Data Manipulation (12 Problems)

Data Scientists live in SQL and pandas. These problems test fluency in data extraction and transformation.

SQL Problems

#	Problem	Difficulty	Time	Key Concept	Why It Matters	Company Tags
16	Calculate Daily Active Users (DAU), Weekly Active Users (WAU), and Stickiness	Medium	20 min	Date functions, COUNT DISTINCT, ratio computation	Core product metric computation	Meta, Google, Snap
17	Find Power Users (Users in Top 10% of Activity)	Medium	20 min	Window functions, NTILE/PERCENT_RANK	User segmentation drives product decisions	Meta, Uber, Airbnb
18	Compute Funnel Conversion Rates with Drop-Off Analysis	Medium	25 min	Sequential event joins, conditional aggregation	Funnel analysis is a DS bread-and-butter task	All
19	Detect Churned Users Who Reactivated	Hard	25 min	Self-join with temporal logic, gap detection	Win-back analysis is high business value	Spotify, Netflix, Uber
20	Build a Cohort Retention Table	Hard	30 min	Cohort join, pivot logic, date arithmetic	The canonical product analytics query	Meta, Airbnb, Spotify
21	Find the First Touchpoint That Led to Conversion	Medium	20 min	Attribution logic, FIRST_VALUE window function	Marketing attribution analysis	Google, Meta, Airbnb

Pandas Problems

#	Problem	Difficulty	Time	Key Concept	Why It Matters	Company Tags
22	Clean and Merge Multiple Messy CSV Files	Easy	20 min	Data cleaning, merge, type coercion	Real data is always messy	All
23	Compute Rolling Engagement Metrics with User Segmentation	Medium	25 min	GroupBy, rolling window, multi-level aggregation	Time-series feature engineering	Meta, Uber, Airbnb
24	Build a Feature Matrix from Event-Level Data	Medium	25 min	Pivot, aggregation, sparse features	Feature engineering from raw logs	Big Tech, Startups
25	Detect and Handle Outliers in Metric Data	Medium	20 min	IQR, z-score, Winsorization	Outliers distort metrics and model performance	All
26	Perform Time-Series Decomposition of Revenue Data	Medium	25 min	Trend, seasonality, residual decomposition	Understanding revenue patterns	All
27	Create Automated Summary Statistics Report	Easy	15 min	Descriptive stats, distribution visualization	First step of any analysis	All

Section 3: ML Modeling (10 Problems)

Data Scientist ML questions focus more on practical modeling decisions than algorithm implementation.

Model Building & Selection

#	Problem	Difficulty	Time	Key Concept	Why It Matters	Company Tags
28	Build a Churn Prediction Model and Explain Feature Importance	Medium	35 min	Classification, SHAP/feature importance, business action	Connects modeling to business impact	All
29	Predict Customer Lifetime Value (LTV)	Hard	35 min	Regression, censored data, cohort-based estimation	LTV drives acquisition and retention strategy	Meta, Airbnb, Netflix, Uber
30	Build a Propensity Model for Targeted Marketing	Medium	30 min	Binary classification, calibration, uplift modeling	Marketing optimization requires calibrated models	All
31	Forecast Daily Revenue for Next 90 Days	Medium	30 min	Time-series forecasting, seasonality, uncertainty quantification	Revenue forecasting is the highest-visibility DS task	All
32	Detect Fraudulent Transactions in Highly Imbalanced Data	Hard	35 min	Extreme imbalance, precision-recall tradeoff, cost-sensitive learning	Fraud detection is a classic DS problem	Stripe, PayPal, Amazon
33	Build a Recommendation System for a Content Platform	Medium	30 min	Collaborative filtering, content-based, hybrid	Recommendations drive engagement at every platform	Netflix, Spotify, Meta

Model Evaluation & Interpretation

#	Problem	Difficulty	Time	Key Concept	Why It Matters	Company Tags
34	Your Model Has 95% Accuracy but Stakeholders Don't Trust It. Diagnose.	Medium	25 min	Class imbalance, confusion matrix deep dive, calibration	Accuracy alone is misleading	All
35	Compare Two Models: One Has Better AUC, The Other Better Precision@K	Medium	25 min	Metric selection, business context, threshold optimization	Different metrics tell different stories	All
36	Explain a Black-Box Model to a Non-Technical Stakeholder	Medium	20 min	SHAP, partial dependence, plain language explanation	Communication is a core DS skill	All
37	Detect Data Drift in a Production Model	Medium	25 min	PSI, KS test, feature distribution monitoring	Models degrade over time in production	FAANG, Big Tech

Section 4: Business Case Studies (8 Problems)

Business case studies test your ability to frame problems, define metrics, and connect analysis to decisions.

#	Problem	Difficulty	Time	Key Concept	Why It Matters	Company Tags
38	A Key Metric Dropped 10% Overnight. Walk Through Your Investigation.	Medium	25 min	Root cause analysis, segmentation, data quality checks	The most common DS on-call scenario	FAANG, All
39	Define the Success Metrics for a New Social Feature	Medium	20 min	Metric hierarchy (north star, primary, guardrail)	Product sense is critical for product DS	Meta, Google, Snap
40	Should We Launch This Feature Based on Inconclusive A/B Test Results?	Hard	30 min	Decision under uncertainty, business judgment, cost of wrong decision	Textbook doesn't cover inconclusive results	Meta, Google, Netflix
41	Design a Data Strategy for a New Market Entry	Medium	25 min	Data collection, baseline establishment, success criteria	Data strategy drives business strategy	Uber, Airbnb, DoorDash
42	Evaluate Whether a Pricing Change Increased Revenue	Hard	30 min	Price elasticity, causal inference, confounders	Pricing analysis requires causal thinking	Uber, Airbnb, Amazon
43	Prioritize Three Potential ML Projects Given Resource Constraints	Medium	20 min	Impact estimation, feasibility assessment, ROI framework	Resource allocation is a key DS leadership skill	All
44	A Model Performs Well Offline but Poorly Online. Diagnose.	Hard	30 min	Train-serve skew, data leakage, feedback loops	The classic production ML problem	FAANG, Big Tech
45	Design a Metric for Measuring Marketplace Health	Medium	25 min	Two-sided metrics, supply-demand balance, leading indicators	Marketplace metrics are inherently complex	Uber, Airbnb, DoorDash

:::tip Business Case Framework For any business case, follow this structure:

Clarify the problem and business context
Define success metrics (primary + guardrails)
Hypothesize root causes or expected outcomes
Analyze using data (describe the analysis you would do)
Recommend a course of action with confidence level
Acknowledge risks and next steps :::

4-Week Data Scientist Study Plan

Week	Focus	Problems	Daily Load
Week 1	Statistics & Experimentation	#1-15	2-3 problems/day
Week 2	SQL & Data Manipulation	#16-27	2 problems/day
Week 3	ML Modeling	#28-37	1-2 problems/day (deeper)
Week 4	Business Cases + Review	#38-45 + review	1 case/day + review

Week 1: Statistics Deep Dive

Day 1: #1, #2 (A/B testing fundamentals)
Day 2: #3, #4 (multiple comparisons, switchback)
Day 3: #5, #6 (novelty effect, Bayes)
Day 4: #7, #8 (distributions, confidence intervals)
Day 5: #9, #10 (bootstrap, sequential testing)
Day 6: #11, #12 (causal inference, guardrails)
Day 7: #13, #14, #15 (variance reduction, long-term impact, Simpson's paradox)

Week 2: SQL & Pandas Sprint

Day 1: #16, #17 (DAU/WAU, power users)
Day 2: #18, #19 (funnels, churn detection)
Day 3: #20, #21 (retention, attribution)
Day 4: #22, #23 (data cleaning, rolling metrics)
Day 5: #24, #25 (feature matrix, outliers)
Day 6: #26, #27 (time-series decomposition, summary stats)
Day 7: Review all SQL problems without reference

Key Statistical Formulas to Know

Sample Size Calculation

n = (Z_alpha/2 + Z_beta)^2 * (2 * sigma^2) / delta^2

Where:
- n = required sample size per group (two groups of equal size)
- Z_alpha/2 = 1.96 for a two-sided test at alpha = 0.05 (95% confidence)
- Z_beta = 0.84 for 80% power
- sigma^2 = variance of the metric
- delta = minimum detectable effect (MDE)
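As a rough sketch, the formula above can be evaluated with Python's standard library. The function name and the example inputs (standard deviation 10, MDE 1) are hypothetical. The result is the sample size per group for a two-sided test comparing two means.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = (z_alpha + z_beta) ** 2 * (2 * sigma ** 2) / delta ** 2
    return ceil(n)  # round up: a fractional user cannot be sampled


# Hypothetical inputs: metric standard deviation 10, MDE of 1 unit
n = sample_size_per_group(sigma=10.0, delta=1.0)
print(n)
```

Halving the MDE roughly quadruples the required sample size, because delta enters the formula squared.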

Common Statistical Tests Cheat Sheet

Scenario	Test	Assumptions
Compare two means (large n)	Z-test	Normal approximation
Compare two means (small n)	t-test	Normality, equal variance
Compare two proportions	Chi-squared / Z-test for proportions	Large n for normal approx
Compare means of 3+ groups	ANOVA	Normality, equal variance
Non-normal distributions	Mann-Whitney U	Independent samples
Paired measurements	Paired t-test	Normal differences
Ratio metrics	Delta method or bootstrap	Depends on method
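To illustrate one row of the cheat sheet, here is a minimal sketch of the pooled two-proportion Z-test using only the standard library. The function name and the conversion counts are hypothetical, and the normal approximation assumes both groups are large.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided pooled Z-test for a difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled rate under the null hypothesis
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 500/10,000 conversions in control, 560/10,000 in treatment
z, p_value = two_proportion_ztest(500, 10_000, 560, 10_000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```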

Variance Reduction with CUPED

Y_adjusted = Y - theta * X

Where:
- Y = metric during experiment
- X = same metric pre-experiment (covariate)
- theta = Cov(Y, X) / Var(X)

Variance after adjustment: Var(Y_adjusted) = Var(Y) * (1 - Corr(Y, X)^2), so the variance falls by a fraction Corr(Y, X)^2
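The adjustment can be sketched on synthetic data as follows. One assumption goes beyond the formula above: the covariate is centered (X minus its mean), a common convention that leaves the mean of Y unchanged and does not affect the variance. The helper names and the simulated data are hypothetical. After adjustment the variance equals the original variance times 1 - Corr(Y, X)^2.

```python
import random
from statistics import mean, variance


def sample_cov(a, b):
    """Sample covariance with the n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def cuped_adjust(y, x):
    """Return Y - theta * (X - mean(X)) with theta = Cov(Y, X) / Var(X)."""
    theta = sample_cov(y, x) / variance(x)
    x_bar = mean(x)
    return [yi - theta * (xi - x_bar) for yi, xi in zip(y, x)]


# Synthetic data (illustration only): the in-experiment metric Y is
# correlated with the same metric measured pre-experiment, X
rng = random.Random(0)
x = [rng.gauss(10, 2) for _ in range(5000)]
y = [xi + rng.gauss(0, 1) for xi in x]

y_adj = cuped_adjust(y, x)
rho = sample_cov(y, x) / (variance(x) * variance(y)) ** 0.5
ratio = variance(y_adj) / variance(y)  # share of variance that remains
print(f"corr = {rho:.3f}, remaining variance share = {ratio:.3f}")
```

In a real experiment theta would typically be estimated on pooled treatment and control data, and the adjusted metric would then be compared across arms in the usual way.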

Problem Deep Dives

Problem 2: Statistical vs. Practical Significance

Scenario: An A/B test shows p=0.04 (significant at alpha=0.05). The 95% CI for the effect on revenue per user is [$0.002, $0.15]. The product change requires 2 engineers for 3 months.

Analysis Framework:

The test is statistically significant, but the lower bound of the CI ($0.002/user) is tiny
Calculate the conservative annual impact from the lower bound: 0.002 * DAU * 365 (this assumes the metric is revenue per user per day)
Compare against engineering cost (2 engineers * 3 months * salary)
Consider opportunity cost: what else could those engineers build?
Decision: If minimum impact < cost, don't ship despite significance

Key Insight: Statistical significance means the observed effect would be unlikely if the true effect were zero. It does not prove the effect is real, and it does not mean the effect is large enough to matter.
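The arithmetic in the framework can be sketched as below. The CI bounds come from the scenario. The DAU values and the monthly cost per engineer are hypothetical placeholders, and the effect is assumed to be revenue per user per day.

```python
def annual_impact(effect_per_user_per_day, dau, days=365):
    """Annualized revenue impact, assuming a per-user-per-day effect."""
    return effect_per_user_per_day * dau * days


def ship_decision(ci_low, dau, cost):
    """Rule from the framework: compare the lower-bound impact to the cost."""
    return "ship" if annual_impact(ci_low, dau) >= cost else "do not ship"


CI_LOW, CI_HIGH = 0.002, 0.15       # 95% CI bounds from the scenario
MONTHLY_COST_PER_ENGINEER = 20_000  # hypothetical fully loaded cost
COST = 2 * 3 * MONTHLY_COST_PER_ENGINEER  # 2 engineers * 3 months

for dau in (50_000, 1_000_000):     # hypothetical DAU values
    low = annual_impact(CI_LOW, dau)
    high = annual_impact(CI_HIGH, dau)
    verdict = ship_decision(CI_LOW, dau, COST)
    print(f"DAU={dau:,}: ${low:,.0f} to ${high:,.0f} vs cost ${COST:,} -> {verdict}")
```

Under these made-up numbers the same test result leads to opposite decisions at different scales, which is why the interviewer expects the impact calculation and not just a reading of the p-value.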

Problem 11: Causal Inference Without Random Assignment

Scenario: You want to measure the impact of a new onboarding flow, but it was rolled out to all new users in one region. You cannot run a randomized experiment.

Approaches:

Difference-in-Differences (DiD): Compare treated region before/after vs. control region before/after. Requires parallel trends assumption.
Propensity Score Matching: Match treated users to similar untreated users on observables. Requires no unmeasured confounders.
Synthetic Control: Create a weighted combination of control regions that matches the treated region pre-intervention.
Regression Discontinuity: If there is a sharp cutoff (e.g., date of rollout), compare users just before/after the cutoff.

When to use each:

Method	Best When	Key Assumption
DiD	Regional or temporal rollout	Parallel trends
PSM	Individual-level treatment variation	No unmeasured confounders
Synthetic Control	Few treated units (regions, countries)	Pre-treatment fit
RDD	Sharp cutoff exists	Continuity around cutoff
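For the regional rollout in this scenario, the DiD point estimate reduces to a difference of two before/after changes. The sketch below uses hypothetical weekly activation rates and a hypothetical function name. It gives a point estimate only and is meaningful only if the parallel trends assumption holds.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical weekly activation rates, for illustration only
treated_pre = [0.40, 0.41, 0.39]   # treated region before the new onboarding
treated_post = [0.46, 0.47, 0.45]  # treated region after the rollout
control_pre = [0.38, 0.37, 0.39]   # control region, same pre-period weeks
control_post = [0.40, 0.41, 0.39]  # control region, same post-period weeks

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(f"DiD estimate: {effect:+.3f}")
```

In practice the same estimate usually comes from a regression with a treated-by-post interaction term, which also yields a standard error.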

DS-Specific Patterns to Master

Pattern	Where It Appears	Problems
A/B test design and analysis	Nearly every DS interview	#1-5, #10, #12, #13, #40
Metric definition	Product DS roles	#39, #45
Causal inference	Quasi-experiments	#4, #11, #14, #42
Cohort analysis	Retention, LTV	#20, #29
Funnel analysis	Product optimization	#18, #38
Time-series reasoning	Forecasting, monitoring	#26, #31, #38
SQL window functions	Every DS SQL round	#16, #17, #19, #20, #21
Model interpretability	Stakeholder communication	#34, #36

Difficulty Distribution

Difficulty	Problems	Count
Easy	#7, #8, #22, #27	4
Medium	#1, #3, #5, #6, #9, #12, #15, #16, #17, #18, #21, #23, #24, #25, #26, #28, #30, #31, #33, #34, #35, #36, #37, #38, #39, #41, #43, #45	28
Hard	#2, #4, #10, #11, #13, #14, #19, #20, #29, #32, #40, #42, #44	13

Next Steps

After completing the Data Scientist problem list:

Easy Tier if you need more practice on fundamentals
Meta-Style Problems since Meta heavily hires product Data Scientists
Google-Style Problems for research-oriented DS roles
Section 15: Role-Specific Prep for the full Data Scientist preparation path
