Data Scientist: 6-Week Prep Path

Reading time: ~40 min | Interview relevance: Critical | Roles: Data Scientist, Applied Scientist, Product Data Scientist, Analytics Data Scientist

The Real Interview Moment

The interviewer slides a laptop across the table. On the screen is a dataset from an e-commerce company. "We launched a new checkout flow two weeks ago. The product team says it increased revenue by 12%. The engineering team says it increased page load time by 300ms. The CEO wants to know if we should keep it. You have 45 minutes. Go."

This is the Data Scientist interview distilled to its essence. It is not about building the most sophisticated model or writing the most elegant code. It is about translating messy business questions into rigorous statistical analyses, and then communicating findings in a way that drives decisions.

The Data Scientist interview is unique because it tests something that no other AI/ML role tests as heavily: your ability to think critically about data and communicate insights to non-technical stakeholders. You need statistics, SQL, ML, and business sense -- all working together.

This 6-week plan will prepare you for every dimension.

Role Overview

What Data Scientists Do

Data Scientists extract insights from data to drive business decisions. They:

  • Design and analyze A/B tests and experiments
  • Build predictive models for business outcomes
  • Create dashboards and reports for stakeholders
  • Define and monitor key business metrics
  • Perform deep-dive analyses on product and user behavior
  • Collaborate with product, engineering, and leadership teams

Interview Format (Typical)

| Round | Duration | Focus |
| --- | --- | --- |
| Phone Screen | 45-60 min | SQL + probability/statistics basics |
| SQL Round | 45-60 min | Complex queries, window functions, optimization |
| Statistics / Probability | 45-60 min | Hypothesis testing, A/B testing, distributions |
| Case Study / Product Sense | 45-60 min | Business metric design, product analysis |
| ML / Modeling | 45-60 min | Feature engineering, model selection, evaluation |
| Behavioral / Presentation | 45-60 min | Communication, stakeholder management |

Focus Area Allocation

Data Scientist Interview Prep Time Allocation - Stats and Probability 25%, SQL and Data 25%, ML Fundamentals 20%, Business Cases 15%, Behavioral 15%

Breakdown by Skill

Statistics and Probability (25% -- ~35 hours total)

  • Probability: Bayes theorem, conditional probability, common distributions
  • Hypothesis testing: t-tests, chi-squared, ANOVA, multiple testing correction
  • A/B testing: power analysis, sample size calculation, sequential testing
  • Bayesian thinking: priors, posteriors, Bayesian A/B testing
  • Causal inference: difference-in-differences, instrumental variables, propensity scores

SQL and Data Manipulation (25% -- ~35 hours total)

  • Complex joins, subqueries, CTEs
  • Window functions: ROW_NUMBER, RANK, LAG, LEAD, running aggregates
  • Query optimization: execution plans, indexing strategies
  • pandas: groupby, merge, pivot, time series operations

ML Fundamentals (20% -- ~28 hours total)

  • Supervised learning: regression, classification, ensemble methods
  • Feature engineering: encoding, scaling, feature selection
  • Model evaluation: metrics, cross-validation, overfitting diagnosis
  • Practical ML: when to use what model and why

Business Case Studies (15% -- ~22 hours total)

  • Metric design: defining success metrics for a product
  • Root cause analysis: diagnosing metric changes
  • Product sense: understanding user behavior and business logic
  • Communication: presenting findings to non-technical audiences

Behavioral (15% -- ~22 hours total)

  • Stakeholder management stories
  • Project impact quantification
  • Handling ambiguity and conflicting priorities
  • Communication of technical concepts to non-technical people

6-Week Schedule Overview

Data Scientist 6-Week Prep Plan - gantt-style schedule: Stats and SQL weeks 1–2, A/B Testing and ML weeks 3–4, Mocks and Behavioral weeks 5–6

Week 1: Foundations -- Statistics and SQL

Goal: Refresh statistical foundations and build SQL fluency.

Daily time: 3 hours (weekdays), 5 hours (weekends)

Monday -- Probability Fundamentals

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 easy/medium SQL problems (HackerRank or LeetCode) |
| Lunch (20 min) | Read | ML Fundamentals probability section |
| Evening (90 min) | Study | Probability rules, conditional probability, Bayes theorem, independence, common distributions (normal, binomial, Poisson, exponential) |
| Night (15 min) | Review | Solve 3 probability brain teasers |

Probability problems to practice:

  • Given a fair coin, what is the expected number of flips to get two heads in a row?
  • A diagnostic test has 95% sensitivity and 99% specificity. If 1% of the population has the disease, what is P(disease | positive test)?
  • You roll two dice. What is P(sum = 7)?
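The three practice problems above have closed-form answers that are easy to check. The sketch below is one way to verify them with exact fractions; the function names are illustrative and not part of the original material.

```python
from fractions import Fraction
from itertools import product


def expected_flips_two_heads() -> Fraction:
    """Expected fair-coin flips until two heads in a row.

    Let E0 be the expected flips from scratch and E1 the expected flips
    once the last flip was a head:
        E0 = 1 + E0/2 + E1/2
        E1 = 1 + E0/2
    Substituting E1 into E0 gives E0 * (1 - 1/2 - 1/4) = 1 + 1/2.
    """
    half = Fraction(1, 2)
    return (1 + half) / (1 - half - half * half)


def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes theorem."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


def prob_two_dice_sum(target: int) -> Fraction:
    """P(sum of two fair dice == target) by enumerating all 36 outcomes."""
    outcomes = list(product(range(1, 7), repeat=2))
    hits = sum(1 for a, b in outcomes if a + b == target)
    return Fraction(hits, len(outcomes))


flips = expected_flips_two_heads()
posterior = posterior_given_positive(
    prevalence=Fraction(1, 100),
    sensitivity=Fraction(95, 100),
    specificity=Fraction(99, 100),
)
seven = prob_two_dice_sum(7)

print(flips)                        # 6
print(posterior, float(posterior))  # 95/194, roughly 0.49
print(seven)                        # 1/6
```

The diagnostic-test result is the one interviewers tend to probe: even with a 95% sensitive and 99% specific test, a positive result implies slightly under a 50% chance of disease, because the 1% base rate dominates.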

:::tip Bayes Theorem is Your Best Friend
Data Scientist interviews love Bayes theorem problems. Memorize the formula and practice until it is second nature:

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$

More importantly, develop the intuition: start with the base rate (prior), update with evidence (likelihood), and normalize.
:::

Tuesday -- Statistical Distributions and Estimation

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium SQL problems |
| Lunch (20 min) | Read | Central Limit Theorem and its applications |
| Evening (90 min) | Study | CLT, confidence intervals, maximum likelihood estimation, method of moments |
| Night (15 min) | Review | Calculate a 95% confidence interval by hand |
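For the "confidence interval by hand" exercise, it can help to check your arithmetic against a few lines of code. The following is a minimal sketch using only the standard library; the sample values are made up, and it uses the normal critical value (about 1.96), which is an approximation for a sample this small, where a t critical value would be more appropriate.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for the mean using x_bar +/- z * SE."""
    n = len(sample)
    x_bar = mean(sample)
    se = stdev(sample) / math.sqrt(n)  # s / sqrt(n); stdev uses n - 1
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return x_bar - z * se, x_bar + z * se


# Hypothetical order values, for illustration only.
sample = [12.0, 15.5, 9.8, 14.2, 11.1, 13.7, 10.4, 16.3, 12.9, 13.1]
low, high = mean_confidence_interval(sample)
print(f"mean={mean(sample):.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

Widening the confidence level (say, to 99%) increases `z` and therefore widens the interval; increasing `n` shrinks the standard error and narrows it.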

Wednesday -- Hypothesis Testing

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium SQL problems (JOINs, GROUP BY) |
| Lunch (20 min) | Read | Type I and Type II errors |
| Evening (90 min) | Study | Null and alternative hypotheses, p-values, significance level, power, t-tests (one-sample, two-sample, paired) |
| Night (15 min) | Review | Work through a complete hypothesis test example |

:::warning Understand p-values Correctly
A p-value is NOT the probability that the null hypothesis is true. It is the probability of observing data at least as extreme as what you observed, assuming the null hypothesis is true. This is a very common interview question and many candidates get it wrong.
:::
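To make the p-value definition concrete, the sketch below computes a two-sided p-value as a tail probability under the null, then simulates many experiments in which the null really is true. It assumes a z-test with known standard deviation, which is a simplification chosen for illustration; the function names and numbers are hypothetical.

```python
import math
import random
from statistics import NormalDist


def two_sided_p_value(sample_mean, mu0, sigma, n):
    """P(|Z| >= |z_observed|), computed assuming H0: mean == mu0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(trials=2000, n=50, alpha=0.05, seed=0):
    """Share of experiments rejecting H0 when H0 is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Data are drawn from the null distribution itself (mean 0, sd 1).
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        if two_sided_p_value(sample_mean, mu0=0.0, sigma=1.0, n=n) < alpha:
            rejections += 1
    return rejections / trials


# An observed mean 1.96 standard errors above mu0 gives p close to 0.05.
p_example = two_sided_p_value(sample_mean=10.392, mu0=10.0, sigma=2.0, n=100)
rate = false_positive_rate()
print(round(p_example, 3), rate)
```

The simulated rejection rate lands near `alpha`, which is the point: the p-value describes how the data behave when the null is true, and says nothing directly about the probability that the null is true.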

Thursday -- Advanced Hypothesis Testing

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium SQL problems (subqueries) |
| Lunch (20 min) | Read | Chi-squared tests and ANOVA |
| Evening (90 min) | Study | Chi-squared test, ANOVA, non-parametric tests (Mann-Whitney, Wilcoxon), multiple testing correction (Bonferroni, Benjamini-Hochberg) |
| Night (15 min) | Review | Decision tree for choosing the right statistical test |

Statistical Test Chooser - decision tree from comparing means or proportions or associations to the correct test: t-test, ANOVA, Mann-Whitney, chi-squared, or correlation

Friday -- SQL: JOINs and Aggregations

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 3 medium SQL problems |
| Lunch (20 min) | Read | Coding Interviews SQL section |
| Evening (90 min) | Study | INNER, LEFT, RIGHT, FULL OUTER, CROSS JOINs. GROUP BY, HAVING, DISTINCT, CASE WHEN |
| Night (15 min) | Review | Write a query to find the top 3 customers by revenue per month |

Saturday -- SQL: Window Functions

| Time | Activity | Details |
| --- | --- | --- |
| Morning (2 hrs) | Study | Window functions: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, SUM OVER, AVG OVER, NTILE |
| Afternoon (2 hrs) | Practice | Solve 8 window function problems |
| Evening (1 hr) | Review | Write queries for running totals, moving averages, and year-over-year comparisons |

:::tip Window Functions Are the SQL Interview Differentiator
Basic SQL (JOINs, GROUP BY) is table stakes. Window functions separate strong candidates from average ones. Practice these patterns:

  • Running totals: SUM(revenue) OVER (ORDER BY date)
  • Month-over-month growth: LAG(metric, 1) OVER (ORDER BY month)
  • Ranking within groups: ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC)
  • Moving averages: AVG(value) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

:::
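The patterns above are easier to internalize when you can run them. Here is a small sketch that exercises three of them against an in-memory SQLite database from Python; the `daily_sales` table, its columns, and the rows are hypothetical, and it assumes the bundled SQLite is version 3.25 or newer (the first release with window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_sales (sale_date TEXT, category TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?)",
    [
        ("2024-01-01", "books", 100.0),
        ("2024-01-02", "books", 150.0),
        ("2024-01-03", "books", 50.0),
        ("2024-01-01", "toys", 80.0),
        ("2024-01-02", "toys", 120.0),
    ],
)

query = """
SELECT
    sale_date,
    category,
    revenue,
    -- Running total within each category, ordered by date
    SUM(revenue) OVER (PARTITION BY category ORDER BY sale_date) AS running_total,
    -- Previous day's revenue in the same category (NULL on the first day)
    LAG(revenue, 1) OVER (PARTITION BY category ORDER BY sale_date) AS prev_revenue,
    -- Rank of each day within its category by revenue, highest first
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS revenue_rank
FROM daily_sales
ORDER BY category, sale_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Adding `PARTITION BY` is the main difference from the one-line patterns in the tip: without it, the running total and rank would span all categories at once.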

Sunday -- Week 1 Review

| Time | Activity | Details |
| --- | --- | --- |
| Morning (2 hrs) | Review | Redo 5 hardest SQL problems from the week |
| Afternoon (2 hrs) | Study | Review all statistical concepts; create a cheat sheet |
| Evening (1 hr) | Plan | Update resume per Resume and Portfolio |

:::note Week 1 Milestone Checkpoint

  • Solve Bayes theorem problems in under 3 minutes
  • Choose the correct hypothesis test for a given scenario
  • Write SQL with window functions confidently
  • Explain CLT, confidence intervals, and p-values accurately
  • Calculate sample statistics and construct confidence intervals by hand
  • Write a query with 3+ JOINs and window functions

:::

Week 2: Foundations -- A/B Testing and pandas

Goal: Master A/B testing methodology and data manipulation with pandas and SQL.

Daily time: 3 hours (weekdays), 5 hours (weekends)

Monday -- A/B Testing Fundamentals

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium SQL problems |
| Lunch (20 min) | Read | A/B testing at tech companies |
| Evening (90 min) | Study | A/B test design: control/treatment, randomization, sample size calculation, power analysis, guardrail metrics |
| Night (15 min) | Review | Calculate required sample size for an A/B test with specific parameters |
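For the sample size exercise, one common approach is the formula n = (z_{α/2} + z_β)² · 2σ² / δ², which also appears in the quick reference later in this guide. The sketch below applies it per group to a conversion-rate test, approximating σ² with the baseline variance p(1 − p); that approximation, the function name, and the example parameters are assumptions for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline_rate, min_detectable_effect,
                          alpha=0.05, power=0.80):
    """Users needed in EACH group for a two-sided test on a conversion rate.

    baseline_rate: control conversion rate, e.g. 0.10
    min_detectable_effect: absolute lift to detect, e.g. 0.01 (10% -> 11%)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = baseline_rate * (1 - baseline_rate)
    n = (z_alpha + z_beta) ** 2 * 2 * variance / min_detectable_effect ** 2
    return math.ceil(n)


n_needed = sample_size_per_group(baseline_rate=0.10, min_detectable_effect=0.01)
n_half_effect = sample_size_per_group(baseline_rate=0.10, min_detectable_effect=0.005)
print(n_needed, n_half_effect)
```

With these inputs the requirement comes out at roughly 14,000 users per group. Because δ is squared in the denominator, halving the detectable effect roughly quadruples the sample size, which is a trade-off worth stating out loud in an interview.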

Tuesday -- A/B Testing: Advanced Topics

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium/hard SQL problems |
| Lunch (20 min) | Read | Common A/B testing pitfalls |
| Evening (90 min) | Study | Network effects, novelty/primacy effects, multiple testing, peeking problem, sequential testing, Bayesian A/B testing |
| Night (15 min) | Review | List 5 reasons an A/B test result might be invalid |

:::danger A/B Testing Pitfalls That Fail Candidates

  1. Peeking: Repeatedly checking results and stopping at the first significant one, before reaching the required sample size, inflates the false positive rate
  2. Simpson's Paradox: Overall results can contradict segment-level results
  3. Survivorship bias: Only analyzing users who completed the flow
  4. Interference: Users in treatment affecting control users (network effects)
  5. Not accounting for multiple comparisons: Testing 20 independent metrics at alpha=0.05 yields about 1 false positive on average, and roughly a 64% chance of at least one, even when nothing changed

:::
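The multiple-comparisons pitfall is simple to quantify. The sketch below computes the family-wise error rate for independent tests and contrasts two corrections covered in Week 1, Bonferroni and Benjamini-Hochberg; the p-values are invented for illustration and the function names are not from any particular library.

```python
def family_wise_error_rate(num_tests, alpha=0.05):
    """P(at least one false positive) across independent tests, all nulls true."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_reject(p_values, alpha=0.05):
    """Reject only p-values at or below alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg_reject(p_values, fdr=0.05):
    """Reject the k smallest p-values, where k is the largest rank
    satisfying p_(k) <= (k / m) * fdr."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


fwer_20 = family_wise_error_rate(20)
p_values = [0.001, 0.012, 0.025, 0.038, 0.20]  # hypothetical metric p-values
bonf = bonferroni_reject(p_values)
bh = benjamini_hochberg_reject(p_values)
print(round(fwer_20, 3))  # about 0.642
print(bonf)
print(bh)
```

On this example Bonferroni keeps only the smallest p-value while Benjamini-Hochberg keeps four, which illustrates the usual trade-off: Bonferroni controls the chance of any false positive, whereas Benjamini-Hochberg controls the expected share of false discoveries and retains more power.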

Wednesday -- A/B Testing Case Studies

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium SQL problems |
| Lunch (20 min) | Read | Real A/B test case studies from tech companies |
| Evening (90 min) | Practice | Solve 3 A/B testing case studies: design the test, choose metrics, analyze results, make a recommendation |
| Night (15 min) | Review | Practice explaining your reasoning aloud |

Thursday -- pandas Fundamentals

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium SQL problems |
| Lunch (20 min) | Read | pandas vs SQL comparison |
| Evening (90 min) | Study | pandas: DataFrames, Series, indexing, filtering, groupby, merge, concat, pivot_table |
| Night (15 min) | Practice | Replicate a SQL query in pandas |

Friday -- pandas Advanced and EDA

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium/hard SQL problems |
| Lunch (20 min) | Read | EDA best practices |
| Evening (90 min) | Study | pandas: apply, map, time series resampling, string methods. Matplotlib/seaborn for quick visualizations |
| Night (15 min) | Practice | Perform EDA on a sample dataset (distributions, correlations, missing values) |

Saturday -- End-to-End Analysis Practice

| Time | Activity | Details |
| --- | --- | --- |
| Morning (2.5 hrs) | Practice | Given a dataset, perform complete analysis: EDA, hypothesis formulation, statistical testing, visualization, conclusion |
| Afternoon (1.5 hrs) | Study | Common data patterns: seasonality, trends, cohort effects, funnel analysis |
| Evening (1 hr) | Review | Practice presenting your analysis in 10 minutes |

Sunday -- Week 2 Review

| Time | Activity | Details |
| --- | --- | --- |
| Morning (2 hrs) | Review | Redo A/B testing problems; practice calculations |
| Afternoon (2 hrs) | SQL | Solve 5 hard SQL problems |
| Evening (1 hr) | Mock | First practice: explain an analysis to a non-technical partner |

:::note Week 2 Milestone Checkpoint

  • Design a complete A/B test with guardrail metrics and sample size calculation
  • Identify 5+ A/B testing pitfalls with mitigation strategies
  • Manipulate data fluently in both SQL and pandas
  • Perform end-to-end exploratory data analysis
  • Present statistical findings clearly to a non-technical audience
  • Solve hard SQL problems involving self-joins, CTEs, and window functions

:::

Week 3: Core Skills -- ML Fundamentals and Feature Engineering

Goal: Master practical ML for data science: model selection, feature engineering, and evaluation.

Daily time: 3.5 hours (weekdays), 5 hours (weekends)

Monday -- Supervised Learning: Regression

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium/hard SQL problems |
| Lunch (20 min) | Read | ML Fundamentals regression section |
| Evening (120 min) | Study | Linear regression, polynomial regression, regularization (Ridge, Lasso, Elastic Net), assumptions, diagnostics |
| Night (15 min) | Review | List the assumptions of linear regression and how to check them |

Tuesday -- Supervised Learning: Classification

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium SQL problems |
| Lunch (20 min) | Read | Classification metrics comparison |
| Evening (120 min) | Study | Logistic regression, decision trees, random forests, gradient boosting, SVM. When to use what |
| Night (15 min) | Review | Create a model selection decision tree |

Wednesday -- Model Evaluation Deep Dive

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium SQL problems |
| Lunch (20 min) | Read | Precision-recall trade-offs |
| Evening (120 min) | Study | Accuracy, precision, recall, F1, AUC-ROC, AUC-PR, log loss, calibration, cross-validation strategies |
| Night (15 min) | Review | When is accuracy a misleading metric? (imbalanced classes) |

:::tip The Metric Selection Question
Data Scientist interviews love asking: "Which metric would you use and why?" The answer is rarely plain accuracy, which can look excellent on imbalanced classes while the model misses every positive case. Consider:

  • Precision over recall: When false positives are costly (a spam filter that must not discard important emails)
  • Recall over precision: When false negatives are costly (disease screening)
  • AUC-ROC: When you need a threshold-independent measure
  • AUC-PR: When you have heavily imbalanced data
  • Business metric: When you can directly tie model performance to dollars

:::
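A small numeric example shows why accuracy alone misleads on imbalanced data. The sketch below computes accuracy, precision, recall, and F1 from confusion-matrix counts; the fraud scenario and all counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute basic metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical data: 1,000 transactions, of which 10 are fraudulent.
# Model A predicts "not fraud" for everything.
always_negative = classification_metrics(tp=0, fp=0, fn=10, tn=990)
# Model B catches 8 of the 10 frauds at the cost of 12 false alarms.
useful_model = classification_metrics(tp=8, fp=12, fn=2, tn=978)

print(always_negative)
print(useful_model)
```

Model A scores 99% accuracy while catching no fraud at all; Model B has slightly lower accuracy (98.6%) but 80% recall. Which one is "better" depends on the relative cost of a missed fraud versus a false alarm, which is exactly the reasoning interviewers want to hear.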

Thursday -- Feature Engineering

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium SQL problems |
| Lunch (20 min) | Read | Feature engineering best practices |
| Evening (120 min) | Study | Encoding categorical variables, handling missing data, feature scaling, feature selection (filter, wrapper, embedded methods), feature importance |
| Night (15 min) | Practice | Given a raw dataset description, list 10 features you would engineer |

Friday -- Practical ML: End-to-End Pipeline

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium SQL problems |
| Lunch (20 min) | Read | scikit-learn pipeline patterns |
| Evening (120 min) | Practice | Build an end-to-end ML pipeline: data cleaning, feature engineering, model training, hyperparameter tuning, evaluation |
| Night (15 min) | Review | Identify potential data leakage in your pipeline |

Saturday -- Unsupervised Learning and Dimensionality Reduction

| Time | Activity | Details |
| --- | --- | --- |
| Morning (2 hrs) | Study | K-means, hierarchical clustering, DBSCAN, PCA, t-SNE, UMAP |
| Afternoon (2 hrs) | Practice | Apply clustering to a customer segmentation problem |
| Evening (1 hr) | Mock | First ML mock: given a problem, propose a modeling approach (30 min) |

Sunday -- Week 3 Review

| Time | Activity | Details |
| --- | --- | --- |
| Morning (2 hrs) | Review | Revisit all ML concepts; create a one-page cheat sheet |
| Afternoon (2 hrs) | Practice | Solve 5 "what model would you use?" scenario questions |
| Evening (1 hr) | Behavioral | Draft 3 STAR stories about data science projects |

:::note Week 3 Milestone Checkpoint

  • Select the right model for a given problem with justification
  • Explain regularization (L1, L2) and when to use each
  • Evaluate models using appropriate metrics for imbalanced data
  • Engineer features from raw data descriptions
  • Build an end-to-end ML pipeline without data leakage
  • Apply and interpret clustering results

:::

Week 4: Core Skills -- Business Cases and Product Sense

Goal: Master business case study interviews and product metric design.

Daily time: 3.5 hours (weekdays), 5 hours (weekends)

Monday -- Metric Design Frameworks

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium/hard SQL problems |
| Lunch (20 min) | Read | Product metric frameworks (HEART, AARRR, North Star) |
| Evening (120 min) | Study | How to define success metrics, counter-metrics, guardrail metrics. HEART framework (Happiness, Engagement, Adoption, Retention, Task success) |
| Night (15 min) | Practice | Define metrics for 3 different products |

Tuesday -- Root Cause Analysis

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium/hard SQL problems |
| Lunch (20 min) | Read | Root cause analysis frameworks |
| Evening (120 min) | Practice | Solve 3 root cause analysis scenarios: "Daily active users dropped 10% week-over-week. What happened?" |
| Night (15 min) | Review | Develop a systematic debugging checklist for metric drops |

Metric Drop Root Cause Analysis - systematic flowchart checking data issues, external factors, product changes, and segment-specific causes for a metric decline

Wednesday -- Case Study Practice: E-Commerce

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium SQL problems |
| Lunch (20 min) | Read | E-commerce metrics primer |
| Evening (120 min) | Practice | Case study: "Design the metrics for a new marketplace feature. How would you measure success? What experiment would you run?" |
| Night (15 min) | Review | Practice presenting your case study answer in 15 minutes |

Thursday -- Case Study Practice: Social Media

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium SQL problems |
| Lunch (20 min) | Read | Social media engagement metrics |
| Evening (120 min) | Practice | Case study: "Instagram engagement is down among users aged 18-24. Diagnose the problem and propose solutions." |
| Night (15 min) | Review | Identify the data you would need to support your analysis |

Friday -- Causal Inference

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 medium SQL problems |
| Lunch (20 min) | Read | Causal inference overview |
| Evening (120 min) | Study | Observational studies vs experiments, confounding variables, difference-in-differences, propensity score matching, instrumental variables, regression discontinuity |
| Night (15 min) | Review | Explain when you cannot run an A/B test and what alternatives exist |

:::warning Not Everything Can Be A/B Tested
Interviewers will test whether you know when A/B testing is inappropriate and what to do instead:

  • Ethical constraints: Cannot randomly deny a safety feature
  • Network effects: Users influence each other
  • Long-term effects: Cannot wait months for results
  • Rare events: Not enough samples for statistical power
  • No randomization possible: Historical policy changes

Alternatives: difference-in-differences, propensity score matching, instrumental variables, regression discontinuity, interrupted time series.
:::
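Of the alternatives listed, difference-in-differences is the simplest to work through by hand. The sketch below computes the basic two-group, two-period estimate; the conversion rates are invented, and the estimate is only meaningful under the parallel-trends assumption (the treated group would have moved like the control group without the change).

```python
from statistics import mean


def difference_in_differences(treated_pre, treated_post,
                              control_pre, control_post):
    """(Change in treated group) minus (change in control group)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical weekly conversion rates (%) before and after a policy change
# that only affected the treated region.
effect = difference_in_differences(
    treated_pre=[4.0, 4.2, 4.1],
    treated_post=[5.0, 5.2, 5.1],
    control_pre=[3.8, 4.0, 3.9],
    control_post=[4.1, 4.3, 4.2],
)
print(round(effect, 2))  # 0.7 percentage points
```

The treated region rose by 1.0 point, but the control region rose by 0.3 points over the same period, so only 0.7 points are attributed to the change. A naive before/after comparison would have overstated the effect.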

Saturday -- Business Presentation Practice

| Time | Activity | Details |
| --- | --- | --- |
| Morning (2.5 hrs) | Practice | Complete case study: analyze a dataset, formulate insights, create a 3-slide summary, present findings |
| Afternoon (1.5 hrs) | Study | Time series basics: trends, seasonality, decomposition, forecasting |
| Evening (1 hr) | Mock | Case study mock: root cause analysis scenario (30 min) |

Sunday -- Week 4 Review

| Time | Activity | Details |
| --- | --- | --- |
| Morning (2 hrs) | Review | Redo all case studies from the week |
| Afternoon (2 hrs) | Practice | Rapid-fire metric design: define metrics for 10 products in 30 minutes |
| Evening (1 hr) | Behavioral | Add 2 STAR stories about business impact and stakeholder communication |

:::note Week 4 Milestone Checkpoint

  • Define success metrics for any product using the HEART framework
  • Diagnose a metric change using a systematic root cause analysis approach
  • Complete a business case study in 30-45 minutes
  • Explain causal inference methods and when to use each
  • Present data findings clearly in under 10 minutes
  • Know when A/B testing is inappropriate and propose alternatives

:::

Week 5: Polish -- Advanced Topics and Mock Interviews

Goal: Cover advanced DS topics, practice take-homes, and intensify mocks.

Daily time: 3.5 hours (weekdays), 5 hours (weekends)

Monday -- Advanced ML: Time Series and NLP

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 hard SQL problems |
| Lunch (20 min) | Read | Deep Learning overview (skim) |
| Evening (120 min) | Study | Time series: ARIMA, Prophet, feature engineering for time series. Basic NLP: TF-IDF, embeddings, sentiment analysis |
| Night (15 min) | Review | When to use time series models vs ML models for forecasting |

Tuesday -- System Design for Data Science

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 hard SQL problems |
| Lunch (20 min) | Read | ML System Design overview |
| Evening (120 min) | Study | Designing analytics pipelines, dashboard architecture, real-time metrics, experimentation platforms |
| Night (15 min) | Review | Design an experimentation platform at a high level |

Wednesday -- Take-Home Project Practice

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 1 hard SQL problem |
| Lunch (20 min) | Read | Take-Home Projects |
| Evening (120 min) | Project | Complete a mock take-home: analyze a dataset, build a model, write up findings |
| Night (15 min) | Review | Self-critique: Is your analysis rigorous? Are your conclusions justified? |

Thursday -- Company Research

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL practice | 2 company-specific SQL problems |
| Lunch (20 min) | Read | Company Guides |
| Evening (120 min) | Research | Target company data science blog posts, products, metrics, culture |
| Night (15 min) | Notes | Prepare company-specific talking points |

Friday -- Mock Interview Day

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | Warm-up | 1 easy SQL problem |
| Afternoon (3 hrs) | Mocks | SQL mock (45 min) + statistics/probability mock (45 min) + case study mock (45 min) |
| Evening (30 min) | Debrief | Catalog weaknesses for Week 6 focus |

Saturday -- Weakness Remediation

| Time | Activity | Details |
| --- | --- | --- |
| Morning (2.5 hrs) | Study | Deep dive into weakest area from mocks |
| Afternoon (1.5 hrs) | Practice | 5 targeted practice problems |
| Evening (1 hr) | Behavioral | Practice all STAR stories aloud |

Sunday -- Week 5 Review

| Time | Activity | Details |
| --- | --- | --- |
| Morning (2 hrs) | Review | Create comprehensive cheat sheets: statistics, SQL patterns, metric frameworks |
| Afternoon (2 hrs) | Practice | 5 rapid-fire case studies (10 minutes each) |
| Evening (1 hr) | Plan | Finalize Week 6 based on remaining gaps |

:::note Week 5 Milestone Checkpoint

  • Handle time series and basic NLP problems
  • Complete a take-home analysis project in under 4 hours
  • Pass SQL, statistics, and case study mocks with 7/10+ scores
  • Know target company's products, metrics, and data science culture
  • Have 6+ polished STAR stories ready
  • Handle rapid-fire case studies with structured frameworks

:::

Week 6: Final Week -- Simulations, Behavioral, and Confidence

Goal: Final mock interviews, behavioral polish, and mental preparation.

Daily time: 2.5 hours (weekdays), 4 hours (weekends)

Monday -- Light Review

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | SQL | 2 medium problems for flow |
| Lunch (20 min) | Read | Negotiation and Offers |
| Evening (60 min) | Review | Skim all cheat sheets |
| Night (15 min) | Rest | Light reading |

Tuesday -- Full Loop Simulation

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | Warm-up | 1 easy problem |
| Afternoon (3 hrs) | Mock | Full simulation: SQL + stats + case study + behavioral |
| Evening (30 min) | Debrief | Final notes |

Wednesday -- Targeted Review

| Time | Activity | Details |
| --- | --- | --- |
| Morning (45 min) | Study | Weakest area from mock |
| Evening (90 min) | Practice | 3 targeted problems |

Thursday -- Behavioral Final Prep

| Time | Activity | Details |
| --- | --- | --- |
| Morning (60 min) | Practice | All STAR stories aloud, timed |
| Lunch (20 min) | Read | Behavioral final tips |
| Evening (90 min) | Mock | Final behavioral mock |
| Night (15 min) | Prep | Questions to ask interviewers |

Friday -- Rest

| Time | Activity | Details |
| --- | --- | --- |
| Morning (30 min) | Logistics | Confirm schedule, test setup |
| Rest of day | Relax | Recharge |

Weekend -- Light and Rest

Light review Saturday. Full rest Sunday.

:::note Week 6 Final Assessment

  • Can solve complex SQL problems in under 20 minutes
  • Can design and analyze an A/B test from scratch
  • Can diagnose a metric change systematically
  • Can build and evaluate an ML model for a business problem
  • Can present findings clearly to non-technical stakeholders
  • Can answer probability/statistics questions confidently
  • Have prepared questions showing genuine curiosity about the company

:::

SQL Problem Categories to Master

Must-Solve Problem Types

| Category | Example | Difficulty |
| --- | --- | --- |
| Funnel analysis | Calculate conversion rates across steps | Medium |
| Retention cohorts | Monthly retention by signup cohort | Hard |
| Running totals | Cumulative revenue by category | Medium |
| Year-over-year | Compare metrics across time periods | Medium |
| Sessionization | Group user events into sessions | Hard |
| Self-joins | Find users who did A then B within 7 days | Hard |
| Ranking within groups | Top N items per category | Medium |
| Gap analysis | Find periods with no activity | Hard |
| Moving averages | 7-day rolling average of daily metrics | Medium |
| Percentiles | Median and P95 response times | Hard |

Sample SQL Problems

Problem 1: Retention Analysis. Given a user_activity table with user_id, activity_date, and signup_date, calculate the Day-1, Day-7, and Day-30 retention rates by signup month.

Problem 2: Funnel Conversion. Given tables page_views, add_to_cart, and purchases, calculate the conversion rate at each funnel step by device type, for the last 30 days.

Problem 3: Revenue Growth. Given an orders table, calculate the month-over-month revenue growth rate, and flag months where growth exceeded 20%.
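As one possible approach to Problem 3, the sketch below aggregates revenue by month in a CTE, pulls the previous month with `LAG`, and flags growth above 20%. The problem statement does not specify the `orders` schema, so the `order_id`, `order_date`, and `amount` columns and all rows here are assumptions; the query runs on SQLite (3.25 or newer) through Python's `sqlite3` module, and date functions will differ on other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2024-01-05", 1000.0),
        (2, "2024-01-20", 1000.0),
        (3, "2024-02-10", 2500.0),
        (4, "2024-03-03", 1500.0),
        (5, "2024-03-28", 1250.0),
    ],
)

query = """
WITH monthly AS (
    SELECT strftime('%Y-%m', order_date) AS month,
           SUM(amount) AS revenue
    FROM orders
    GROUP BY strftime('%Y-%m', order_date)
),
with_prev AS (
    SELECT month,
           revenue,
           LAG(revenue) OVER (ORDER BY month) AS prev_revenue
    FROM monthly
)
SELECT
    month,
    revenue,
    -- NULL for the first month, which has no previous month
    ROUND((revenue - prev_revenue) * 100.0 / prev_revenue, 1) AS growth_pct,
    CASE WHEN (revenue - prev_revenue) / prev_revenue > 0.20
         THEN 1 ELSE 0 END AS exceeded_20_pct
FROM with_prev
ORDER BY month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In an interview, it is worth calling out the edge cases explicitly: the first month has no prior period (growth is NULL and the flag falls through to 0), and months with no orders would be missing entirely unless you join against a calendar table.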

Statistics Quick Reference

Formulas You Must Know

| Concept | Formula | When to Use |
| --- | --- | --- |
| Sample mean | $\bar{x} = \frac{1}{n}\sum x_i$ | Always |
| Standard error | $SE = \frac{s}{\sqrt{n}}$ | Confidence intervals, hypothesis tests |
| Confidence interval | $\bar{x} \pm z_{\alpha/2} \cdot SE$ | Estimating population parameters |
| Z-test statistic | $z = \frac{\bar{x} - \mu_0}{SE}$ | Large sample hypothesis testing |
| T-test statistic | $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ | Small sample mean comparison |
| Sample size (A/B) | $n = \frac{(z_{\alpha/2} + z_\beta)^2 \cdot 2\sigma^2}{\delta^2}$ | Planning A/B tests |
| Bayes theorem | $P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}$ | Conditional probability problems |

Distribution Quick Reference

| Distribution | Use Case | Key Parameter |
| --- | --- | --- |
| Normal | Continuous data, CLT | mean, std dev |
| Binomial | Count of successes in n trials | n, p |
| Poisson | Count of events in a time period | lambda |
| Exponential | Time between events | lambda |
| Bernoulli | Single yes/no trial | p |
| Uniform | Equal probability outcomes | a, b |
| Geometric | Trials until first success | p |

Case Study Framework

Use this framework for every case study question:

  1. Clarify the question -- What are we trying to answer? Who is the stakeholder?
  2. Define metrics -- What does success look like? What are guardrail metrics?
  3. Formulate hypotheses -- What might explain the observed behavior?
  4. Design the analysis -- What data do you need? What methods will you use?
  5. Analyze and conclude -- Present findings with confidence levels
  6. Recommend action -- What should the business do? What are the risks?

Essential Resources

Handbook Chapters to Prioritize

| Priority | Chapter | When to Study |
| --- | --- | --- |
| Critical | ML Fundamentals | Weeks 2-4 |
| Critical | Coding Interviews (SQL focus) | Weeks 1-5 |
| High | Behavioral | Weeks 5-6 |
| High | ML System Design | Week 5 |
| Medium | Deep Learning | Week 5 (skim) |
| Medium | Company Guides | Week 5 |
| Medium | Take-Home Projects | Week 5 |
| Low | Negotiation | Week 6 |

Books

  • "Practical Statistics for Data Scientists" by Peter Bruce and Andrew Bruce
  • "Trustworthy Online Controlled Experiments" by Kohavi, Tang, and Xu
  • "Naked Statistics" by Charles Wheelan (for intuition building)

Practice Platforms

  • StrataScratch -- SQL and data science interview questions from real companies
  • Mode Analytics -- SQL practice with real datasets
  • DataLemur -- SQL interview questions by difficulty
  • Kaggle -- Datasets for analysis practice

Next Steps

You now have a complete 6-week roadmap for Data Scientist interview preparation.

The best data scientists are not just technically strong -- they are storytellers who translate data into decisions. Start practicing both skills today.

© 2026 EngineersOfAI. All rights reserved.