Data Scientist: 6-Week Prep Path

Reading time: ~40 min | Interview relevance: Critical | Roles: Data Scientist, Applied Scientist, Product Data Scientist, Analytics Data Scientist

The Real Interview Moment

The interviewer slides a laptop across the table. On the screen is a dataset from an e-commerce company. "We launched a new checkout flow two weeks ago. The product team says it increased revenue by 12%. The engineering team says it increased page load time by 300ms. The CEO wants to know if we should keep it. You have 45 minutes. Go."

This is the Data Scientist interview distilled to its essence. It is not about building the most sophisticated model or writing the most elegant code. It is about translating messy business questions into rigorous statistical analyses, and then communicating findings in a way that drives decisions.

The Data Scientist interview is unique because it tests something that no other AI/ML role tests as heavily: your ability to think critically about data and communicate insights to non-technical stakeholders. You need statistics, SQL, ML, and business sense -- all working together.

This 6-week plan will prepare you for every dimension.

Role Overview

What Data Scientists Do

Data Scientists extract insights from data to drive business decisions. They:

Design and analyze A/B tests and experiments
Build predictive models for business outcomes
Create dashboards and reports for stakeholders
Define and monitor key business metrics
Perform deep-dive analyses on product and user behavior
Collaborate with product, engineering, and leadership teams

Interview Format (Typical)

Round	Duration	Focus
Phone Screen	45-60 min	SQL + probability/statistics basics
SQL Round	45-60 min	Complex queries, window functions, optimization
Statistics / Probability	45-60 min	Hypothesis testing, A/B testing, distributions
Case Study / Product Sense	45-60 min	Business metric design, product analysis
ML / Modeling	45-60 min	Feature engineering, model selection, evaluation
Behavioral / Presentation	45-60 min	Communication, stakeholder management

Focus Area Allocation

Data Scientist Interview Prep Time Allocation - Stats and Probability 25%, SQL and Data 25%, ML Fundamentals 20%, Business Cases 15%, Behavioral 15%

Breakdown by Skill

Statistics and Probability (25% -- ~35 hours total)

Probability: Bayes theorem, conditional probability, common distributions
Hypothesis testing: t-tests, chi-squared, ANOVA, multiple testing correction
A/B testing: power analysis, sample size calculation, sequential testing
Bayesian thinking: priors, posteriors, Bayesian A/B testing
Causal inference: difference-in-differences, instrumental variables, propensity scores

SQL and Data Manipulation (25% -- ~35 hours total)

Complex joins, subqueries, CTEs
Window functions: ROW_NUMBER, RANK, LAG, LEAD, running aggregates
Query optimization: execution plans, indexing strategies
pandas: groupby, merge, pivot, time series operations

ML Fundamentals (20% -- ~28 hours total)

Supervised learning: regression, classification, ensemble methods
Feature engineering: encoding, scaling, feature selection
Model evaluation: metrics, cross-validation, overfitting diagnosis
Practical ML: when to use what model and why

Business Case Studies (15% -- ~22 hours total)

Metric design: defining success metrics for a product
Root cause analysis: diagnosing metric changes
Product sense: understanding user behavior and business logic
Communication: presenting findings to non-technical audiences

Behavioral (15% -- ~22 hours total)

Stakeholder management stories
Project impact quantification
Handling ambiguity and conflicting priorities
Communication of technical concepts to non-technical people

6-Week Schedule Overview

Data Scientist 6-Week Prep Plan - gantt-style schedule: Stats and SQL weeks 1–2, A/B Testing and ML weeks 3–4, Mocks and Behavioral weeks 5–6

Week 1: Foundations -- Statistics and SQL

Goal: Refresh statistical foundations and build SQL fluency.

Daily time: 3 hours (weekdays), 5 hours (weekends)

Monday -- Probability Fundamentals

Time	Activity	Details
Morning (45 min)	SQL practice	2 easy/medium SQL problems (HackerRank or LeetCode)
Lunch (20 min)	Read	ML Fundamentals probability section
Evening (90 min)	Study	Probability rules, conditional probability, Bayes theorem, independence, common distributions (normal, binomial, Poisson, exponential)
Night (15 min)	Review	Solve 3 probability brain teasers

Probability problems to practice:

Given a fair coin, what is the expected number of flips to get two heads in a row?
A diagnostic test has 95% sensitivity and 99% specificity. If 1% of the population has the disease, what is P(disease | positive test)?
You roll two dice. What is P(sum = 7)?

:::tip Bayes Theorem is Your Best Friend Data Scientist interviews love Bayes theorem problems. Memorize the formula and practice until it is second nature:

$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$

More importantly, develop the intuition: start with the base rate (prior), update with evidence (likelihood), and normalize. :::
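As a worked instance of the prior / likelihood / normalize steps, here is a minimal sketch that solves the diagnostic-test practice problem above (95% sensitivity, 99% specificity, 1% prevalence). The function name `posterior_given_positive` is illustrative and not taken from any library.

```python
def posterior_given_positive(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Return P(disease | positive test) using Bayes theorem."""
    # Prior times likelihood: P(positive | disease) * P(disease)
    true_positive_mass = sensitivity * prevalence
    # False alarms: P(positive | healthy) * P(healthy)
    false_positive_mass = (1.0 - specificity) * (1.0 - prevalence)
    # Normalize by P(positive), the sum of both ways to test positive
    return true_positive_mass / (true_positive_mass + false_positive_mass)


posterior = posterior_given_positive(prevalence=0.01, sensitivity=0.95, specificity=0.99)
print(f"P(disease | positive) = {posterior:.3f}")  # prints 0.490
```

Even with an accurate test, the posterior is just under 50% because the 1% base rate means false positives are about as numerous as true positives.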

Tuesday -- Statistical Distributions and Estimation

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium SQL problems
Lunch (20 min)	Read	Central Limit Theorem and its applications
Evening (90 min)	Study	CLT, confidence intervals, maximum likelihood estimation, method of moments
Night (15 min)	Review	Calculate a 95% confidence interval by hand
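To check a hand-calculated interval, a short sketch like the following can help. The summary statistics (n = 400, mean 52.3, standard deviation 8.0) are made-up illustration values, and the z-based interval assumes a sample large enough for the CLT to apply; for small samples a t critical value would be the usual choice.

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics, chosen only for illustration
n = 400
sample_mean = 52.3
sample_std = 8.0

standard_error = sample_std / math.sqrt(n)        # SE = s / sqrt(n)
z_critical = NormalDist().inv_cdf(0.975)          # about 1.96 for a 95% interval
margin = z_critical * standard_error

ci_lower = sample_mean - margin
ci_upper = sample_mean + margin
print(f"SE = {standard_error:.3f}, 95% CI = ({ci_lower:.3f}, {ci_upper:.3f})")
```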

Wednesday -- Hypothesis Testing

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium SQL problems (JOINs, GROUP BY)
Lunch (20 min)	Read	Type I and Type II errors
Evening (90 min)	Study	Null and alternative hypotheses, p-values, significance level, power, t-tests (one-sample, two-sample, paired)
Night (15 min)	Review	Work through a complete hypothesis test example

:::warning Understand p-values Correctly A p-value is NOT the probability that the null hypothesis is true. It is the probability of observing data at least as extreme as what you observed, assuming the null hypothesis is true. This is a very common interview question and many candidates get it wrong. :::
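The definition above can be made concrete with an exact calculation. This sketch assumes a hypothetical experiment (20 coin flips, 15 heads) and a null hypothesis of a fair coin; it sums the null probability of every outcome at least as far from 10 heads as the observed one. The helper name `two_sided_p_fair_coin` is illustrative.

```python
from math import comb


def two_sided_p_fair_coin(n_flips: int, n_heads: int) -> float:
    """Exact two-sided p-value for H0: the coin is fair."""
    observed_deviation = abs(n_heads - n_flips / 2)
    # Count outcomes at least as extreme as the observed one, in either direction
    extreme_outcomes = sum(
        comb(n_flips, k)
        for k in range(n_flips + 1)
        if abs(k - n_flips / 2) >= observed_deviation
    )
    # Under H0 every sequence of flips has probability 1 / 2**n_flips
    return extreme_outcomes / 2 ** n_flips


p_value = two_sided_p_fair_coin(n_flips=20, n_heads=15)
print(f"p-value = {p_value:.4f}")  # prints 0.0414
```

The result says that a fair coin would produce a result this lopsided about 4% of the time. It does not say there is a 4% chance the coin is fair.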

Thursday -- Advanced Hypothesis Testing

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium SQL problems (subqueries)
Lunch (20 min)	Read	Chi-squared tests and ANOVA
Evening (90 min)	Study	Chi-squared test, ANOVA, non-parametric tests (Mann-Whitney, Wilcoxon), multiple testing correction (Bonferroni, Benjamini-Hochberg)
Night (15 min)	Review	Decision tree for choosing the right statistical test

Statistical Test Chooser - decision tree from comparing means or proportions or associations to the correct test: t-test, ANOVA, Mann-Whitney, chi-squared, or correlation

Friday -- SQL: JOINs and Aggregations

Time	Activity	Details
Morning (45 min)	SQL practice	3 medium SQL problems
Lunch (20 min)	Read	Coding Interviews SQL section
Evening (90 min)	Study	INNER, LEFT, RIGHT, FULL OUTER, CROSS JOINs. GROUP BY, HAVING, DISTINCT, CASE WHEN
Night (15 min)	Review	Write a query to find the top 3 customers by revenue per month

Saturday -- SQL: Window Functions

Time	Activity	Details
Morning (2 hrs)	Study	Window functions: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, SUM OVER, AVG OVER, NTILE
Afternoon (2 hrs)	Practice	Solve 8 window function problems
Evening (1 hr)	Review	Write queries for running totals, moving averages, and year-over-year comparisons

:::tip Window Functions Are the SQL Interview Differentiator Basic SQL (JOINs, GROUP BY) is table stakes. Window functions separate strong candidates from average ones. Practice these patterns:

Running totals: SUM(revenue) OVER (ORDER BY date)
Month-over-month growth: LAG(metric, 1) OVER (ORDER BY month)
Ranking within groups: ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC)
Moving averages: AVG(value) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) :::
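The four patterns can be tried without a database server by running them through Python's built-in `sqlite3` module. This is a sketch under assumptions: the `daily_revenue` table and its rows are invented for illustration, the moving average uses a 2-row window to fit the tiny dataset, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, for illustration only
conn.execute("CREATE TABLE daily_revenue (day TEXT, category TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO daily_revenue VALUES (?, ?, ?)",
    [
        ("2024-01-01", "books", 100.0),
        ("2024-01-02", "books", 150.0),
        ("2024-01-03", "books", 120.0),
        ("2024-01-01", "toys", 80.0),
        ("2024-01-02", "toys", 60.0),
        ("2024-01-03", "toys", 90.0),
    ],
)

query = """
SELECT
    day,
    category,
    revenue,
    -- Running total within each category
    SUM(revenue) OVER (PARTITION BY category ORDER BY day) AS running_total,
    -- Previous day's revenue (NULL on the first day)
    LAG(revenue, 1) OVER (PARTITION BY category ORDER BY day) AS prev_revenue,
    -- Rank of each day within its category, best day first
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS revenue_rank,
    -- 2-day moving average (current row plus one preceding row)
    AVG(revenue) OVER (
        PARTITION BY category ORDER BY day
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
    ) AS moving_avg_2d
FROM daily_revenue
ORDER BY category, day
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that `LAG` returns NULL for the first row of each partition, so any growth calculation built on it needs to handle that case.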

Sunday -- Week 1 Review

Time	Activity	Details
Morning (2 hrs)	Review	Redo 5 hardest SQL problems from the week
Afternoon (2 hrs)	Study	Review all statistical concepts; create a cheat sheet
Evening (1 hr)	Plan	Update your resume, following the Resume and Portfolio chapter

:::note Week 1 Milestone Checkpoint

Solve Bayes theorem problems in under 3 minutes
Choose the correct hypothesis test for a given scenario
Write SQL with window functions confidently
Explain CLT, confidence intervals, and p-values accurately
Calculate sample statistics and construct confidence intervals by hand
Write a query with 3+ JOINs and window functions :::

Week 2: Foundations -- A/B Testing and pandas

Goal: Master A/B testing methodology and data manipulation with pandas and SQL.

Daily time: 3 hours (weekdays), 5 hours (weekends)

Monday -- A/B Testing Fundamentals

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium SQL problems
Lunch (20 min)	Read	A/B testing at tech companies
Evening (90 min)	Study	A/B test design: control/treatment, randomization, sample size calculation, power analysis, guardrail metrics
Night (15 min)	Review	Calculate required sample size for an A/B test with specific parameters

Tuesday -- A/B Testing: Advanced Topics

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium/hard SQL problems
Lunch (20 min)	Read	Common A/B testing pitfalls
Evening (90 min)	Study	Network effects, novelty/primacy effects, multiple testing, peeking problem, sequential testing, Bayesian A/B testing
Night (15 min)	Review	List 5 reasons an A/B test result might be invalid

:::danger A/B Testing Pitfalls That Fail Candidates

Peeking: Checking results before reaching required sample size inflates false positive rate
Simpson's Paradox: Overall results can contradict segment-level results
Survivorship bias: Only analyzing users who completed the flow
Interference: Users in treatment affecting control users (network effects)
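Not accounting for multiple comparisons: Testing 20 independent metrics at alpha=0.05 produces about 1 false positive on average, and a roughly 64% chance of at least one, even when the treatment has no effect :::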
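The multiple-comparisons arithmetic is easy to verify. The sketch below assumes 20 independent metrics with every null hypothesis true, which is a simplification since real metrics are usually correlated. It also shows the effect of a Bonferroni-adjusted threshold; the function name is illustrative.

```python
def family_wise_error_rate(n_tests: int, alpha: float) -> float:
    """P(at least one false positive) across independent tests with all nulls true."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_metrics = 20
alpha = 0.05

expected_false_positives = n_metrics * alpha              # 1 on average
fwer_uncorrected = family_wise_error_rate(n_metrics, alpha)

# Bonferroni: test each metric at alpha / n_metrics instead
bonferroni_alpha = alpha / n_metrics
fwer_bonferroni = family_wise_error_rate(n_metrics, bonferroni_alpha)

print(f"Expected false positives: {expected_false_positives:.1f}")
print(f"FWER without correction:  {fwer_uncorrected:.3f}")
print(f"FWER with Bonferroni:     {fwer_bonferroni:.3f}")
```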

Wednesday -- A/B Testing Case Studies

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium SQL problems
Lunch (20 min)	Read	Real A/B test case studies from tech companies
Evening (90 min)	Practice	Solve 3 A/B testing case studies: design the test, choose metrics, analyze results, make a recommendation
Night (15 min)	Review	Practice explaining your reasoning aloud

Thursday -- pandas Fundamentals

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium SQL problems
Lunch (20 min)	Read	pandas vs SQL comparison
Evening (90 min)	Study	pandas: DataFrames, Series, indexing, filtering, groupby, merge, concat, pivot_table
Night (15 min)	Practice	Replicate a SQL query in pandas

Friday -- pandas Advanced and EDA

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium/hard SQL problems
Lunch (20 min)	Read	EDA best practices
Evening (90 min)	Study	pandas: apply, map, time series resampling, string methods. Matplotlib/seaborn for quick visualizations
Night (15 min)	Practice	Perform EDA on a sample dataset (distributions, correlations, missing values)

Saturday -- End-to-End Analysis Practice

Time	Activity	Details
Morning (2.5 hrs)	Practice	Given a dataset, perform complete analysis: EDA, hypothesis formulation, statistical testing, visualization, conclusion
Afternoon (1.5 hrs)	Study	Common data patterns: seasonality, trends, cohort effects, funnel analysis
Evening (1 hr)	Review	Practice presenting your analysis in 10 minutes

Sunday -- Week 2 Review

Time	Activity	Details
Morning (2 hrs)	Review	Redo A/B testing problems; practice calculations
Afternoon (2 hrs)	SQL	Solve 5 hard SQL problems
Evening (1 hr)	Mock	First practice: explain an analysis to a non-technical partner

:::note Week 2 Milestone Checkpoint

Design a complete A/B test with guardrail metrics and sample size calculation
Identify 5+ A/B testing pitfalls with mitigation strategies
Manipulate data fluently in both SQL and pandas
Perform end-to-end exploratory data analysis
Present statistical findings clearly to a non-technical audience
Solve hard SQL problems involving self-joins, CTEs, and window functions :::

Week 3: Core Skills -- ML Fundamentals and Feature Engineering

Goal: Master practical ML for data science: model selection, feature engineering, and evaluation.

Daily time: 3.5 hours (weekdays), 5 hours (weekends)

Monday -- Supervised Learning: Regression

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium/hard SQL problems
Lunch (20 min)	Read	ML Fundamentals regression section
Evening (120 min)	Study	Linear regression, polynomial regression, regularization (Ridge, Lasso, Elastic Net), assumptions, diagnostics
Night (15 min)	Review	List the assumptions of linear regression and how to check them

Tuesday -- Supervised Learning: Classification

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium SQL problems
Lunch (20 min)	Read	Classification metrics comparison
Evening (120 min)	Study	Logistic regression, decision trees, random forests, gradient boosting, SVM. When to use what
Night (15 min)	Review	Create a model selection decision tree

Wednesday -- Model Evaluation Deep Dive

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium SQL problems
Lunch (20 min)	Read	Precision-recall trade-offs
Evening (120 min)	Study	Accuracy, precision, recall, F1, AUC-ROC, AUC-PR, log loss, calibration, cross-validation strategies
Night (15 min)	Review	When is accuracy a misleading metric? (imbalanced classes)

:::tip The Metric Selection Question Data Scientist interviews love asking: "Which metric would you use and why?" The answer is almost never just "accuracy." Consider:

Precision over recall: When false positives are costly (a spam filter that must not block important emails)
Recall over precision: When false negatives are costly (disease screening)
AUC-ROC: When you need a threshold-independent measure
AUC-PR: When you have heavily imbalanced data
Business metric: When you can directly tie model performance to dollars :::
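To see why accuracy misleads on imbalanced data, consider this sketch with made-up confusion-matrix counts: 1,000 cases of which only 10 are positive. The function `classification_metrics` is a hand-rolled helper for illustration, not a library API.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical model A: predicts "negative" for everyone
always_negative = classification_metrics(tp=0, fp=0, fn=10, tn=990)
# Hypothetical model B: catches 8 of 10 positives at the cost of 40 false alarms
detector = classification_metrics(tp=8, fp=40, fn=2, tn=950)

print("always negative:", always_negative)
print("detector:       ", detector)
```

Model A has the higher accuracy (99% versus about 96%) yet finds no positives at all, which is the kind of trade-off interviewers expect you to articulate.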

Thursday -- Feature Engineering

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium SQL problems
Lunch (20 min)	Read	Feature engineering best practices
Evening (120 min)	Study	Encoding categorical variables, handling missing data, feature scaling, feature selection (filter, wrapper, embedded methods), feature importance
Night (15 min)	Practice	Given a raw dataset description, list 10 features you would engineer

Friday -- Practical ML: End-to-End Pipeline

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium SQL problems
Lunch (20 min)	Read	scikit-learn pipeline patterns
Evening (120 min)	Practice	Build an end-to-end ML pipeline: data cleaning, feature engineering, model training, hyperparameter tuning, evaluation
Night (15 min)	Review	Identify potential data leakage in your pipeline

Saturday -- Unsupervised Learning and Dimensionality Reduction

Time	Activity	Details
Morning (2 hrs)	Study	K-means, hierarchical clustering, DBSCAN, PCA, t-SNE, UMAP
Afternoon (2 hrs)	Practice	Apply clustering to a customer segmentation problem
Evening (1 hr)	Mock	First ML mock: given a problem, propose a modeling approach (30 min)

Sunday -- Week 3 Review

Time	Activity	Details
Morning (2 hrs)	Review	Revisit all ML concepts; create a one-page cheat sheet
Afternoon (2 hrs)	Practice	Solve 5 "what model would you use?" scenario questions
Evening (1 hr)	Behavioral	Draft 3 STAR stories about data science projects

:::note Week 3 Milestone Checkpoint

Select the right model for a given problem with justification
Explain regularization (L1, L2) and when to use each
Evaluate models using appropriate metrics for imbalanced data
Engineer features from raw data descriptions
Build an end-to-end ML pipeline without data leakage
Apply and interpret clustering results :::

Week 4: Core Skills -- Business Cases and Product Sense

Goal: Master business case study interviews and product metric design.

Daily time: 3.5 hours (weekdays), 5 hours (weekends)

Monday -- Metric Design Frameworks

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium/hard SQL problems
Lunch (20 min)	Read	Product metric frameworks (HEART, AARRR, North Star)
Evening (120 min)	Study	How to define success metrics, counter-metrics, guardrail metrics. HEART framework (Happiness, Engagement, Adoption, Retention, Task success)
Night (15 min)	Practice	Define metrics for 3 different products

Tuesday -- Root Cause Analysis

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium/hard SQL problems
Lunch (20 min)	Read	Root cause analysis frameworks
Evening (120 min)	Practice	Solve 3 root cause analysis scenarios: "Daily active users dropped 10% week-over-week. What happened?"
Night (15 min)	Review	Develop a systematic debugging checklist for metric drops

Metric Drop Root Cause Analysis - systematic flowchart checking data issues, external factors, product changes, and segment-specific causes for a metric decline

Wednesday -- Case Study Practice: E-Commerce

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium SQL problems
Lunch (20 min)	Read	E-commerce metrics primer
Evening (120 min)	Practice	Case study: "Design the metrics for a new marketplace feature. How would you measure success? What experiment would you run?"
Night (15 min)	Review	Practice presenting your case study answer in 15 minutes

Thursday -- Case Study Practice: Social Media

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium SQL problems
Lunch (20 min)	Read	Social media engagement metrics
Evening (120 min)	Practice	Case study: "Instagram engagement is down among users aged 18-24. Diagnose the problem and propose solutions."
Night (15 min)	Review	Identify the data you would need to support your analysis

Friday -- Causal Inference

Time	Activity	Details
Morning (45 min)	SQL practice	2 medium SQL problems
Lunch (20 min)	Read	Causal inference overview
Evening (120 min)	Study	Observational studies vs experiments, confounding variables, difference-in-differences, propensity score matching, instrumental variables, regression discontinuity
Night (15 min)	Review	Explain when you cannot run an A/B test and what alternatives exist

:::warning Not Everything Can Be A/B Tested Interviewers will test whether you know when A/B testing is inappropriate and what to do instead:

Ethical constraints: Cannot randomly deny a safety feature
Network effects: Users influence each other
Long-term effects: Cannot wait months for results
Rare events: Not enough samples for statistical power
No randomization possible: Historical policy changes

Alternatives: difference-in-differences, propensity score matching, instrumental variables, regression discontinuity, interrupted time series. :::
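Of these alternatives, difference-in-differences is the one most easily shown with arithmetic. The numbers below are invented for illustration (average order value before and after a policy change in a treated and an untreated region), and the estimate is only meaningful under the parallel-trends assumption, that is, both groups would have moved together without the change.

```python
def diff_in_diff(treat_pre: float, treat_post: float,
                 control_pre: float, control_post: float) -> float:
    """Difference-in-differences estimate from four group means."""
    treated_change = treat_post - treat_pre        # effect + shared trend
    control_change = control_post - control_pre    # shared trend only
    return treated_change - control_change


# Hypothetical group means
effect = diff_in_diff(treat_pre=50.0, treat_post=58.0,
                      control_pre=48.0, control_post=51.0)
print(f"Estimated treatment effect: {effect:.1f}")  # prints 5.0
```

The naive before/after comparison in the treated group would report 8.0; subtracting the control group's change of 3.0 removes the part attributable to the shared trend.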

Saturday -- Business Presentation Practice

Time	Activity	Details
Morning (2.5 hrs)	Practice	Complete case study: analyze a dataset, formulate insights, create a 3-slide summary, present findings
Afternoon (1.5 hrs)	Study	Time series basics: trends, seasonality, decomposition, forecasting
Evening (1 hr)	Mock	Case study mock: root cause analysis scenario (30 min)

Sunday -- Week 4 Review

Time	Activity	Details
Morning (2 hrs)	Review	Redo all case studies from the week
Afternoon (2 hrs)	Practice	Rapid-fire metric design: define metrics for 10 products in 30 minutes
Evening (1 hr)	Behavioral	Add 2 STAR stories about business impact and stakeholder communication

:::note Week 4 Milestone Checkpoint

Define success metrics for any product using the HEART framework
Diagnose a metric change using a systematic root cause analysis approach
Complete a business case study in 30-45 minutes
Explain causal inference methods and when to use each
Present data findings clearly in under 10 minutes
Know when A/B testing is inappropriate and propose alternatives :::

Week 5: Polish -- Advanced Topics and Mock Interviews

Goal: Cover advanced DS topics, practice take-homes, and intensify mocks.

Daily time: 3.5 hours (weekdays), 5 hours (weekends)

Monday -- Advanced ML: Time Series and NLP

Time	Activity	Details
Morning (45 min)	SQL practice	2 hard SQL problems
Lunch (20 min)	Read	Deep Learning overview (skim)
Evening (120 min)	Study	Time series: ARIMA, Prophet, feature engineering for time series. Basic NLP: TF-IDF, embeddings, sentiment analysis
Night (15 min)	Review	When to use time series models vs ML models for forecasting

Tuesday -- System Design for Data Science

Time	Activity	Details
Morning (45 min)	SQL practice	2 hard SQL problems
Lunch (20 min)	Read	ML System Design overview
Evening (120 min)	Study	Designing analytics pipelines, dashboard architecture, real-time metrics, experimentation platforms
Night (15 min)	Review	Design an experimentation platform at a high level

Wednesday -- Take-Home Project Practice

Time	Activity	Details
Morning (45 min)	SQL practice	1 hard SQL problem
Lunch (20 min)	Read	Take-Home Projects
Evening (120 min)	Project	Complete a mock take-home: analyze a dataset, build a model, write up findings
Night (15 min)	Review	Self-critique: Is your analysis rigorous? Are your conclusions justified?

Thursday -- Company Research

Time	Activity	Details
Morning (45 min)	SQL practice	2 company-specific SQL problems
Lunch (20 min)	Read	Company Guides
Evening (120 min)	Research	Target company data science blog posts, products, metrics, culture
Night (15 min)	Notes	Prepare company-specific talking points

Friday -- Mock Interview Day

Time	Activity	Details
Morning (45 min)	Warm-up	1 easy SQL problem
Afternoon (3 hrs)	Mocks	SQL mock (45 min) + statistics/probability mock (45 min) + case study mock (45 min)
Evening (30 min)	Debrief	Catalog weaknesses for Week 6 focus

Saturday -- Weakness Remediation

Time	Activity	Details
Morning (2.5 hrs)	Study	Deep dive into weakest area from mocks
Afternoon (1.5 hrs)	Practice	5 targeted practice problems
Evening (1 hr)	Behavioral	Practice all STAR stories aloud

Sunday -- Week 5 Review

Time	Activity	Details
Morning (2 hrs)	Review	Create comprehensive cheat sheets: statistics, SQL patterns, metric frameworks
Afternoon (2 hrs)	Practice	5 rapid-fire case studies (10 minutes each)
Evening (1 hr)	Plan	Finalize Week 6 based on remaining gaps

:::note Week 5 Milestone Checkpoint

Handle time series and basic NLP problems
Complete a take-home analysis project in under 4 hours
Pass SQL, statistics, and case study mocks with 7/10+ scores
Know target company's products, metrics, and data science culture
Have 6+ polished STAR stories ready
Handle rapid-fire case studies with structured frameworks :::

Week 6: Final Week -- Simulations, Behavioral, and Confidence

Goal: Final mock interviews, behavioral polish, and mental preparation.

Daily time: 2.5 hours (weekdays), 4 hours (weekends)

Monday -- Light Review

Time	Activity	Details
Morning (45 min)	SQL	2 medium problems for flow
Lunch (20 min)	Read	Negotiation and Offers
Evening (60 min)	Review	Skim all cheat sheets
Night (15 min)	Rest	Light reading

Tuesday -- Full Loop Simulation

Time	Activity	Details
Morning (45 min)	Warm-up	1 easy problem
Afternoon (3 hrs)	Mock	Full simulation: SQL + stats + case study + behavioral
Evening (30 min)	Debrief	Final notes

Wednesday -- Targeted Review

Time	Activity	Details
Morning (45 min)	Study	Weakest area from mock
Evening (90 min)	Practice	3 targeted problems

Thursday -- Behavioral Final Prep

Time	Activity	Details
Morning (60 min)	Practice	All STAR stories aloud, timed
Lunch (20 min)	Read	Behavioral final tips
Evening (90 min)	Mock	Final behavioral mock
Night (15 min)	Prep	Questions to ask interviewers

Friday -- Rest

Time	Activity	Details
Morning (30 min)	Logistics	Confirm schedule, test setup
Rest of day	Relax	Recharge

Weekend -- Light and Rest

Light review Saturday. Full rest Sunday.

:::note Week 6 Final Assessment

Can solve complex SQL problems in under 20 minutes
Can design and analyze an A/B test from scratch
Can diagnose a metric change systematically
Can build and evaluate an ML model for a business problem
Can present findings clearly to non-technical stakeholders
Can answer probability/statistics questions confidently
Have prepared questions showing genuine curiosity about the company :::

SQL Problem Categories to Master

Must-Solve Problem Types

Category	Example	Difficulty
Funnel analysis	Calculate conversion rates across steps	Medium
Retention cohorts	Monthly retention by signup cohort	Hard
Running totals	Cumulative revenue by category	Medium
Year-over-year	Compare metrics across time periods	Medium
Sessionization	Group user events into sessions	Hard
Self-joins	Find users who did A then B within 7 days	Hard
Ranking within groups	Top N items per category	Medium
Gap analysis	Find periods with no activity	Hard
Moving averages	7-day rolling average of daily metrics	Medium
Percentiles	Median and P95 response times	Hard

Sample SQL Problems

Problem 1: Retention Analysis Given a user_activity table with user_id, activity_date, and signup_date, calculate the Day-1, Day-7, and Day-30 retention rates by signup month.

Problem 2: Funnel Conversion Given tables page_views, add_to_cart, and purchases, calculate the conversion rate at each funnel step by device type, for the last 30 days.

Problem 3: Revenue Growth Given an orders table, calculate the month-over-month revenue growth rate, and flag months where growth exceeded 20%.
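One possible approach to Problem 3 is sketched below, run through Python's `sqlite3` module so it can be executed locally. The `orders` schema (`order_id`, `order_date`, `amount`) and the rows are assumptions, since the problem statement does not specify columns. `strftime` is SQLite-specific; other engines would use something like `DATE_TRUNC`. The query also assumes every month has at least one order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, for illustration only
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2024-01-05", 100.0),
        (2, "2024-01-20", 100.0),
        (3, "2024-02-03", 250.0),
        (4, "2024-03-11", 150.0),
        (5, "2024-03-25", 125.0),
    ],
)

query = """
WITH monthly AS (
    -- Step 1: aggregate revenue per calendar month
    SELECT strftime('%Y-%m', order_date) AS month, SUM(amount) AS revenue
    FROM orders
    GROUP BY strftime('%Y-%m', order_date)
),
with_prev AS (
    -- Step 2: attach the previous month's revenue to each row
    SELECT month, revenue, LAG(revenue) OVER (ORDER BY month) AS prev_revenue
    FROM monthly
)
-- Step 3: compute growth and flag months above 20%
SELECT
    month,
    revenue,
    (revenue - prev_revenue) / prev_revenue AS growth_rate,
    CASE WHEN (revenue - prev_revenue) / prev_revenue > 0.20 THEN 1 ELSE 0 END
        AS exceeded_20pct
FROM with_prev
ORDER BY month
"""

growth_rows = conn.execute(query).fetchall()
for row in growth_rows:
    print(row)
```

The first month has no predecessor, so its growth rate is NULL and the flag falls through to 0. In an interview it is worth stating that choice out loud.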

Statistics Quick Reference

Formulas You Must Know

Concept	Formula	When to Use
Sample mean	$\bar{x} = \frac{1}{n}\sum x_i$	Always
Standard error	$SE = \frac{s}{\sqrt{n}}$	Confidence intervals, hypothesis tests
Confidence interval	$\bar{x} \pm z_{\alpha/2} \cdot SE$	Estimating population parameters
Z-test statistic	$z = \frac{\bar{x} - \mu_0}{SE}$	Large sample hypothesis testing
T-test statistic (two-sample)	$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$	Comparing two sample means, especially with small samples
Sample size (A/B)	$n = \frac{(z_{\alpha/2} + z_\beta)^2 \cdot 2\sigma^2}{\delta^2}$	Planning A/B tests ($n$ per group; $\delta$ is the minimum detectable difference)
Bayes theorem	$P(A\mid B) = \frac{P(B\mid A)P(A)}{P(B)}$	Conditional probability problems
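The sample size formula in the table can be evaluated with the standard library alone. In this sketch the inputs (standard deviation 10, minimum detectable difference 1) are illustrative, `z_beta` is taken as the normal quantile at the desired power, and the result is the count needed in each group, not in total.

```python
import math
from statistics import NormalDist


def sample_size_per_group(sigma: float, delta: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    n = (z_alpha + z_beta) ** 2 * 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)                             # round up to whole users


# Hypothetical inputs: sigma = 10, detect a difference of 1 unit
n_required = sample_size_per_group(sigma=10.0, delta=1.0)
print(f"Required sample size per group: {n_required}")
```

Because delta is squared in the denominator, halving the minimum detectable difference roughly quadruples the required sample size.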

Distribution Quick Reference

Distribution	Use Case	Key Parameter
Normal	Continuous data, CLT	mean, std dev
Binomial	Count of successes in n trials	n, p
Poisson	Count of events in a time period	lambda
Exponential	Time between events	lambda
Bernoulli	Single yes/no trial	p
Uniform	Equal probability outcomes	a, b
Geometric	Trials until first success	p

Case Study Framework

Use this framework for every case study question:

Clarify the question -- What are we trying to answer? Who is the stakeholder?
Define metrics -- What does success look like? What are guardrail metrics?
Formulate hypotheses -- What might explain the observed behavior?
Design the analysis -- What data do you need? What methods will you use?
Analyze and conclude -- Present findings with confidence levels
Recommend action -- What should the business do? What are the risks?

Essential Resources

Handbook Chapters to Prioritize

Priority	Chapter	When to Study
Critical	ML Fundamentals	Weeks 2-4
Critical	Coding Interviews (SQL focus)	Weeks 1-5
High	Behavioral	Weeks 5-6
High	ML System Design	Week 5
Medium	Deep Learning	Week 5 (skim)
Medium	Company Guides	Week 5
Medium	Take-Home Projects	Week 5
Low	Negotiation	Week 6

Books

"Practical Statistics for Data Scientists" by Peter Bruce and Andrew Bruce
"Trustworthy Online Controlled Experiments" by Kohavi, Tang, and Xu
"Naked Statistics" by Charles Wheelan (for intuition building)

Practice Platforms

StrataScratch -- SQL and data science interview questions from real companies
Mode Analytics -- SQL practice with real datasets
DataLemur -- SQL interview questions by difficulty
Kaggle -- Datasets for analysis practice

Next Steps

You now have a complete 6-week roadmap for Data Scientist interview preparation. If this path does not match your target role, consider:

MLE Prep Path -- If your role requires more model building and engineering
AI Engineer Prep Path -- If your role focuses on LLM applications
Data Engineer Prep Path -- If your role emphasizes data infrastructure over analysis

The best data scientists are not just technically strong -- they are storytellers who translate data into decisions. Start practicing both skills today.
