
Data Scientist Problem List

Reading time: ~40 min | Interview relevance: Critical | Roles: Data Scientist, Applied Scientist, Analytics Data Scientist, Product Data Scientist

You are in a Data Scientist interview at a major tech company. The interviewer slides a laptop across the table and says: "We ran an A/B test for a new feature. The p-value is 0.03, but the product manager says the effect is too small to ship. Walk me through how you would analyze this situation." If your heart just skipped a beat, this list is for you.

Data Scientist interviews are uniquely challenging because they blend statistics, coding, machine learning, and business reasoning in ways that no other role does. This list of 45 problems covers all four pillars: Statistics & Experimentation, SQL & Data Manipulation, ML Modeling, and Business Case Studies.

Data Scientist Interview Structure

| Round | Duration | What They Test | Weight |
|---|---|---|---|
| Statistics & Probability | 45-60 min | Hypothesis testing, distributions, experimental design | 25-30% |
| SQL & Coding | 45-60 min | SQL queries, pandas, data manipulation | 20-25% |
| ML / Modeling | 45-60 min | Model building, feature engineering, evaluation | 20-25% |
| Business Case / Product Sense | 45-60 min | Metric definition, problem framing, communication | 15-20% |
| Behavioral | 30-45 min | Influence without authority, stakeholder management | 10% |

:::tip The Data Scientist Secret
The best Data Scientists are not the best coders or the best statisticians. They are the best at translating vague business problems into precise analytical questions. Practice the translation, not just the execution.
:::

Section 1: Statistics & Experimentation (15 Problems)

Statistics is the backbone of data science. These problems test your ability to reason about uncertainty, design experiments, and interpret results.

Hypothesis Testing & Inference

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 1 | Design an A/B Test for a New Checkout Flow | Medium | 25 min | Sample size, power analysis, MDE | The single most important DS skill | FAANG, All |
| 2 | A/B Test Shows p=0.04 but CI Includes Practically Zero Effect. Ship or Not? | Hard | 30 min | Statistical vs. practical significance | Distinguishes senior from junior DS | Meta, Google, Airbnb |
| 3 | Multiple Comparison Problem: Testing 10 Variants Simultaneously | Medium | 20 min | Bonferroni, FDR, family-wise error rate | Real experiments test many variants | Meta, Netflix, Microsoft |
| 4 | Design a Switchback Experiment for a Marketplace | Hard | 30 min | Network effects, interference, switchback design | Standard A/B testing fails with network effects | Uber, Lyft, DoorDash, Airbnb |
| 5 | Analyze an A/B Test with Novelty Effect | Medium | 25 min | Time-varying treatment effect, holdout analysis | New features often show inflated initial effects | Meta, Google, Netflix |

Probability & Distributions

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 6 | Estimate the Probability of Server Failure Given Alert Data | Medium | 20 min | Bayes' theorem, conditional probability | Bayesian reasoning is fundamental | Google, Amazon, Meta |
| 7 | Model the Number of Customer Arrivals per Hour | Easy | 15 min | Poisson distribution, rate estimation | Distribution selection for count data | All |
| 8 | Calculate Confidence Intervals for a Proportion | Easy | 15 min | Normal approximation, Wilson interval | Basic inference that many candidates get wrong | All |
| 9 | Compare Two Methods Using Bootstrap Confidence Intervals | Medium | 25 min | Bootstrap resampling, percentile method | Non-parametric inference for complex metrics | Meta, Google, Airbnb |
| 10 | Design a Sequential Testing Procedure (Peeking Problem) | Hard | 30 min | Sequential analysis, alpha spending | Real experiments are monitored continuously | Netflix, Meta, Uber |

Advanced Experimentation

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 11 | Estimate Treatment Effect When Random Assignment Is Not Possible | Hard | 30 min | Causal inference, propensity score matching, DiD | Observational data is more common than experiments | Meta, Google, Uber |
| 12 | Design Guardrail Metrics for an A/B Test | Medium | 20 min | Guardrails, pre-specified boundaries, SRM check | Protect against shipping harmful changes | FAANG, All |
| 13 | Variance Reduction Techniques for Faster Experiments | Hard | 30 min | CUPED, stratification, pre-experiment covariates | Faster experiments = faster iteration | Meta, Netflix, Microsoft |
| 14 | Long-Term Impact Estimation When Only Short-Term Data Exists | Hard | 30 min | Surrogate metrics, long-term holdout | Most business outcomes are long-term | Google, Meta, Netflix |
| 15 | Simpson's Paradox in Experiment Analysis | Medium | 20 min | Confounding, segmented analysis | Aggregate results can mislead | All |

:::warning Statistics Red Flags
These mistakes immediately concern interviewers:

  • Confusing the p-value with the probability that the hypothesis is true
  • Not checking assumptions (normality, independence) before applying tests
  • Using a t-test when a ratio metric requires the delta method or bootstrap
  • Not accounting for multiple comparisons in multi-variant tests
  • Ignoring practical significance and focusing only on statistical significance
:::
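To make the ratio-metric red flag concrete, here is a minimal sketch of the delta method for a ratio of means (for example, clicks per page view when users are the randomization unit). The function name `ratio_with_delta_se` and the per-user numbers are hypothetical, chosen only for illustration; with a sample this small the approximation is rough.

```python
from math import sqrt
from statistics import mean, variance


def ratio_with_delta_se(y, x):
    """Return mean(y) / mean(x) and its delta-method standard error.

    y and x are per-user totals (one entry per randomization unit).
    """
    n = len(y)
    y_bar, x_bar = mean(y), mean(x)
    r = y_bar / x_bar
    # Sample covariance between numerator and denominator.
    cov_xy = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x)) / (n - 1)
    # First-order Taylor expansion of the ratio around (y_bar, x_bar).
    var_r = (variance(y) - 2 * r * cov_xy + r ** 2 * variance(x)) / (n * x_bar ** 2)
    return r, sqrt(var_r)


# Hypothetical per-user data: clicks and page views for six users.
clicks = [2, 0, 5, 1, 3, 4]
views = [10, 4, 20, 6, 12, 15]

ratio, se = ratio_with_delta_se(clicks, views)
print(f"ratio = {ratio:.4f}, delta-method SE = {se:.4f}")
```

A naive t-test on pooled page views would treat each view as independent, which understates the variance when views cluster within users; the covariance term above is what accounts for that clustering.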

Section 2: SQL & Data Manipulation (12 Problems)

Data Scientists live in SQL and pandas. These problems test fluency in data extraction and transformation.

SQL Problems

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 16 | Calculate Daily Active Users (DAU), Weekly Active Users (WAU), and Stickiness | Medium | 20 min | Date functions, COUNT DISTINCT, ratio computation | Core product metric computation | Meta, Google, Snap |
| 17 | Find Power Users (Users in Top 10% of Activity) | Medium | 20 min | Window functions, NTILE/PERCENT_RANK | User segmentation drives product decisions | Meta, Uber, Airbnb |
| 18 | Compute Funnel Conversion Rates with Drop-Off Analysis | Medium | 25 min | Sequential event joins, conditional aggregation | Funnel analysis is a DS bread-and-butter task | All |
| 19 | Detect Churned Users Who Reactivated | Hard | 25 min | Self-join with temporal logic, gap detection | Win-back analysis is high business value | Spotify, Netflix, Uber |
| 20 | Build a Cohort Retention Table | Hard | 30 min | Cohort join, pivot logic, date arithmetic | The canonical product analytics query | Meta, Airbnb, Spotify |
| 21 | Find the First Touchpoint That Led to Conversion | Medium | 20 min | Attribution logic, FIRST_VALUE window function | Marketing attribution analysis | Google, Meta, Airbnb |

Pandas Problems

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 22 | Clean and Merge Multiple Messy CSV Files | Easy | 20 min | Data cleaning, merge, type coercion | Real data is always messy | All |
| 23 | Compute Rolling Engagement Metrics with User Segmentation | Medium | 25 min | GroupBy, rolling window, multi-level aggregation | Time-series feature engineering | Meta, Uber, Airbnb |
| 24 | Build a Feature Matrix from Event-Level Data | Medium | 25 min | Pivot, aggregation, sparse features | Feature engineering from raw logs | Big Tech, Startups |
| 25 | Detect and Handle Outliers in Metric Data | Medium | 20 min | IQR, z-score, Winsorization | Outliers distort metrics and model performance | All |
| 26 | Perform Time-Series Decomposition of Revenue Data | Medium | 25 min | Trend, seasonality, residual decomposition | Understanding revenue patterns | All |
| 27 | Create Automated Summary Statistics Report | Easy | 15 min | Descriptive stats, distribution visualization | First step of any analysis | All |

Section 3: ML Modeling (10 Problems)

Data Scientist ML questions focus more on practical modeling decisions than algorithm implementation.

Model Building & Selection

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 28 | Build a Churn Prediction Model and Explain Feature Importance | Medium | 35 min | Classification, SHAP/feature importance, business action | Connects modeling to business impact | All |
| 29 | Predict Customer Lifetime Value (LTV) | Hard | 35 min | Regression, censored data, cohort-based estimation | LTV drives acquisition and retention strategy | Meta, Airbnb, Netflix, Uber |
| 30 | Build a Propensity Model for Targeted Marketing | Medium | 30 min | Binary classification, calibration, uplift modeling | Marketing optimization requires calibrated models | All |
| 31 | Forecast Daily Revenue for Next 90 Days | Medium | 30 min | Time-series forecasting, seasonality, uncertainty quantification | Revenue forecasting is the highest-visibility DS task | All |
| 32 | Detect Fraudulent Transactions in Highly Imbalanced Data | Hard | 35 min | Extreme imbalance, precision-recall tradeoff, cost-sensitive learning | Fraud detection is a classic DS problem | Stripe, PayPal, Amazon |
| 33 | Build a Recommendation System for a Content Platform | Medium | 30 min | Collaborative filtering, content-based, hybrid | Recommendations drive engagement at every platform | Netflix, Spotify, Meta |

Model Evaluation & Interpretation

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 34 | Your Model Has 95% Accuracy but Stakeholders Don't Trust It. Diagnose. | Medium | 25 min | Class imbalance, confusion matrix deep dive, calibration | Accuracy alone is misleading | All |
| 35 | Compare Two Models: One Has Better AUC, The Other Better Precision@K | Medium | 25 min | Metric selection, business context, threshold optimization | Different metrics tell different stories | All |
| 36 | Explain a Black-Box Model to a Non-Technical Stakeholder | Medium | 20 min | SHAP, partial dependence, plain language explanation | Communication is a core DS skill | All |
| 37 | Detect Data Drift in a Production Model | Medium | 25 min | PSI, KS test, feature distribution monitoring | Models degrade over time in production | FAANG, Big Tech |

Section 4: Business Case Studies (8 Problems)

Business case studies test your ability to frame problems, define metrics, and connect analysis to decisions.

| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 38 | A Key Metric Dropped 10% Overnight. Walk Through Your Investigation. | Medium | 25 min | Root cause analysis, segmentation, data quality checks | The most common DS on-call scenario | FAANG, All |
| 39 | Define the Success Metrics for a New Social Feature | Medium | 20 min | Metric hierarchy (north star, primary, guardrail) | Product sense is critical for product DS | Meta, Google, Snap |
| 40 | Should We Launch This Feature Based on Inconclusive A/B Test Results? | Hard | 30 min | Decision under uncertainty, business judgment, cost of wrong decision | Textbooks don't cover inconclusive results | Meta, Google, Netflix |
| 41 | Design a Data Strategy for a New Market Entry | Medium | 25 min | Data collection, baseline establishment, success criteria | Data strategy drives business strategy | Uber, Airbnb, DoorDash |
| 42 | Evaluate Whether a Pricing Change Increased Revenue | Hard | 30 min | Price elasticity, causal inference, confounders | Pricing analysis requires causal thinking | Uber, Airbnb, Amazon |
| 43 | Prioritize Three Potential ML Projects Given Resource Constraints | Medium | 20 min | Impact estimation, feasibility assessment, ROI framework | Resource allocation is a key DS leadership skill | All |
| 44 | A Model Performs Well Offline but Poorly Online. Diagnose. | Hard | 30 min | Train-serve skew, data leakage, feedback loops | The classic production ML problem | FAANG, Big Tech |
| 45 | Design a Metric for Measuring Marketplace Health | Medium | 25 min | Two-sided metrics, supply-demand balance, leading indicators | Marketplace metrics are inherently complex | Uber, Airbnb, DoorDash |

:::tip Business Case Framework
For any business case, follow this structure:

  1. Clarify the problem and business context
  2. Define success metrics (primary + guardrails)
  3. Hypothesize root causes or expected outcomes
  4. Analyze using data (describe the analysis you would do)
  5. Recommend a course of action with confidence level
  6. Acknowledge risks and next steps
:::

4-Week Data Scientist Study Plan

| Week | Focus | Problems | Daily Load |
|---|---|---|---|
| Week 1 | Statistics & Experimentation | #1-15 | 2-3 problems/day |
| Week 2 | SQL & Data Manipulation | #16-27 | 2 problems/day |
| Week 3 | ML Modeling | #28-37 | 1-2 problems/day (deeper) |
| Week 4 | Business Cases + Review | #38-45 + review | 1 case/day + review |

Week 1: Statistics Deep Dive

Day 1: #1, #2 (A/B testing fundamentals)
Day 2: #3, #4 (multiple comparisons, switchback)
Day 3: #5, #6 (novelty effect, Bayes)
Day 4: #7, #8 (distributions, confidence intervals)
Day 5: #9, #10 (bootstrap, sequential testing)
Day 6: #11, #12 (causal inference, guardrails)
Day 7: #13, #14, #15 (variance reduction, long-term impact, Simpson's paradox)

Week 2: SQL & Pandas Sprint

Day 1: #16, #17 (DAU/WAU, power users)
Day 2: #18, #19 (funnels, churn detection)
Day 3: #20, #21 (retention, attribution)
Day 4: #22, #23 (data cleaning, rolling metrics)
Day 5: #24, #25 (feature matrix, outliers)
Day 6: #26, #27 (time-series decomposition, summary stats)
Day 7: Review all SQL problems without reference

Key Statistical Formulas to Know

Sample Size Calculation

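n = (Z_alpha/2 + Z_beta)^2 * (2 * sigma^2) / delta^2

Where:
- n = required sample size per group (two groups of equal size)
- Z_alpha/2 = 1.96 for 95% confidence (two-sided alpha = 0.05)
- Z_beta = 0.84 for 80% power
- sigma^2 = variance of the metric
- delta = minimum detectable effect (MDE), in the same units as the metric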
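The formula above can be sketched in a few lines of Python using only the standard library. The function name `sample_size_per_group` and the inputs (standard deviation 10, MDE 0.5) are hypothetical; the sketch assumes a two-sided test on means with equal variance and equal group sizes.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample z-test on means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = (z_alpha + z_beta) ** 2 * (2 * sigma ** 2) / delta ** 2
    return ceil(n)  # round up: a fractional user cannot be sampled


# Hypothetical inputs: metric standard deviation of 10, MDE of 0.5.
n = sample_size_per_group(sigma=10.0, delta=0.5)
print(f"required users per group: {n}")
```

Because n scales with 1 / delta^2, halving the MDE roughly quadruples the required sample, which is usually the first trade-off to raise when an interviewer asks how long the test must run.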

Common Statistical Tests Cheat Sheet

| Scenario | Test | Assumptions |
|---|---|---|
| Compare two means (large n) | Z-test | Normal approximation |
| Compare two means (small n) | t-test | Normality, equal variance (use Welch's t-test if variances differ) |
| Compare two proportions | Chi-squared / Z-test for proportions | Large n for normal approx |
| Compare means of 3+ groups | ANOVA | Normality, equal variance |
| Non-normal distributions | Mann-Whitney U | Independent samples |
| Paired measurements | Paired t-test | Normal differences |
| Ratio metrics | Delta method or bootstrap | Depends on method |
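As one worked row from the cheat sheet, here is a minimal sketch of the two-proportion Z-test with a pooled standard error. The function name `two_proportion_ztest` and the conversion counts are hypothetical, and the normal approximation assumes both groups are large.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 1,000 of 10,000 control users convert,
# 1,100 of 10,000 treatment users convert.
z, p_value = two_proportion_ztest(x1=1000, n1=10_000, x2=1100, n2=10_000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```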

Variance Reduction with CUPED

Y_adjusted = Y - theta * X

Where:
- Y = metric during experiment
- X = same metric pre-experiment (covariate)
- theta = Cov(Y, X) / Var(X)

Variance after adjustment: Var(Y_adjusted) = Var(Y) * (1 - Corr(Y, X)^2), so the variance falls by a fraction Corr(Y, X)^2
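A minimal sketch of the CUPED adjustment on synthetic data follows. The function name `cuped_adjust` and the data-generating numbers are hypothetical. The sketch also centers X at its mean, a common variant that leaves the mean of Y unchanged; the formula above omits that centering term.

```python
import random
from statistics import mean, variance


def cuped_adjust(y, x):
    """Return (adjusted values, theta) with theta = Cov(Y, X) / Var(X)."""
    x_bar, y_bar = mean(x), mean(y)
    cov_yx = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x)) / (len(y) - 1)
    theta = cov_yx / variance(x)
    # Centering X keeps mean(y_adj) equal to mean(y).
    y_adj = [yi - theta * (xi - x_bar) for yi, xi in zip(y, x)]
    return y_adj, theta


# Synthetic data (assumption): pre-period metric X drives in-experiment metric Y.
rng = random.Random(0)
x = [rng.gauss(100, 20) for _ in range(5000)]
y = [0.8 * xi + rng.gauss(0, 10) for xi in x]

y_adj, theta = cuped_adjust(y, x)
reduction = 1 - variance(y_adj) / variance(y)  # in-sample this equals Corr(Y, X)^2
print(f"theta = {theta:.3f}, variance removed = {reduction:.1%}")
```

In a real experiment theta is typically estimated once on pooled data from all arms so that the same adjustment applies to treatment and control.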

Problem Deep Dives

Problem 2: Statistical vs. Practical Significance

Scenario: An A/B test shows p=0.04 (significant at alpha=0.05). The 95% CI for the effect on revenue per user is [$0.002, $0.15]. The product change requires 2 engineers for 3 months.

Analysis Framework:

  1. The test is statistically significant, but the lower bound of the CI ($0.002/user) is tiny
  2. Calculate total expected impact: 0.002 * DAU * 365 = annual minimum impact (this assumes the metric is revenue per active user per day)
  3. Compare against engineering cost (2 engineers * 3 months * salary)
  4. Consider opportunity cost: what else could those engineers build?
  5. Decision: If minimum impact < cost, don't ship despite significance

Key Insight: Statistical significance means the observed effect would be unlikely if the true effect were zero. It does not mean the effect is large enough to matter.
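The arithmetic in the framework can be sketched as below. Only the CI bounds and the "2 engineers for 3 months" come from the scenario; the DAU of 50,000, the fully loaded cost of $200,000 per engineer per year, and the reading of the metric as daily revenue per active user are assumptions made for illustration.

```python
def annual_impact(effect_per_user_per_day, dau, days=365):
    """Annualized revenue impact for a per-user, per-day effect."""
    return effect_per_user_per_day * dau * days


def engineering_cost(engineers, months, annual_cost_per_engineer):
    """Cost of the build, prorated from an annual per-engineer cost."""
    return engineers * (months / 12) * annual_cost_per_engineer


ci_low, ci_high = 0.002, 0.15           # 95% CI bounds from the scenario
dau = 50_000                            # assumption
cost = engineering_cost(2, 3, 200_000)  # salary figure is an assumption

low_impact = annual_impact(ci_low, dau)
high_impact = annual_impact(ci_high, dau)
clears_cost_at_lower_bound = low_impact >= cost

print(f"annual impact range: ${low_impact:,.0f} to ${high_impact:,.0f}")
print(f"engineering cost: ${cost:,.0f}")
print(f"lower bound covers cost: {clears_cost_at_lower_bound}")
```

Under these assumed inputs the lower bound falls short of the cost while the upper bound far exceeds it, which is exactly the situation where the interviewer wants to hear about opportunity cost and the width of the interval rather than the p-value.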

Problem 11: Causal Inference Without Random Assignment

Scenario: You want to measure the impact of a new onboarding flow, but it was rolled out to all new users in one region. You cannot run a randomized experiment.

Approaches:

  1. Difference-in-Differences (DiD): Compare treated region before/after vs. control region before/after. Requires parallel trends assumption.
  2. Propensity Score Matching: Match treated users to similar untreated users on observables. Requires no unmeasured confounders.
  3. Synthetic Control: Create a weighted combination of control regions that matches the treated region pre-intervention.
  4. Regression Discontinuity: If there is a sharp cutoff (e.g., date of rollout), compare users just before/after the cutoff.

When to use each:

| Method | Best When | Key Assumption |
|---|---|---|
| DiD | Regional or temporal rollout | Parallel trends |
| PSM | Individual-level treatment variation | No unmeasured confounders |
| Synthetic Control | Few treated units (regions, countries) | Pre-treatment fit |
| RDD | Sharp cutoff exists | Continuity around cutoff |
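For the regional rollout in this scenario, the DiD estimate reduces to a difference of two before/after changes. The sketch below uses a hypothetical function name `diff_in_diff` and made-up weekly activation rates; it gives a point estimate only and is valid only if the parallel trends assumption holds.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """(treated post - treated pre) - (control post - control pre)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical weekly activation rates before and after the rollout.
treated_pre = [0.40, 0.42, 0.41]
treated_post = [0.48, 0.50, 0.49]
control_pre = [0.38, 0.39, 0.40]
control_post = [0.41, 0.42, 0.43]

estimate = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print(f"DiD estimate of the onboarding effect: {estimate:+.3f}")
```

Subtracting the control region's change removes whatever moved both regions over the same period (seasonality, marketing pushes), which a simple before/after comparison in the treated region would wrongly attribute to the new flow.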

DS-Specific Patterns to Master

| Pattern | Where It Appears | Problems |
|---|---|---|
| A/B test design and analysis | Nearly every DS interview | #1-5, #10, #12, #13, #40 |
| Metric definition | Product DS roles | #39, #45 |
| Causal inference | Quasi-experiments | #4, #11, #14, #42 |
| Cohort analysis | Retention, LTV | #20, #29 |
| Funnel analysis | Product optimization | #18, #38 |
| Time-series reasoning | Forecasting, monitoring | #26, #31, #38 |
| SQL window functions | Every DS SQL round | #16, #17, #19, #20, #21 |
| Model interpretability | Stakeholder communication | #34, #36 |

Difficulty Distribution

| Difficulty | Problems | Count |
|---|---|---|
| Easy | #7, #8, #22, #27 | 4 |
| Medium | #1, #3, #5, #6, #9, #12, #15, #16, #17, #18, #21, #23, #24, #25, #26, #28, #30, #31, #33, #34, #35, #36, #37, #38, #39, #41, #43, #45 | 28 |
| Hard | #2, #4, #10, #11, #13, #14, #19, #20, #29, #32, #40, #42, #44 | 13 |

Next Steps

After completing the Data Scientist problem list:

© 2026 EngineersOfAI. All rights reserved.