8 docs tagged with "experimentation"

Counterfactual Evaluation

Evaluate new ML policies using logged data from an old policy - inverse propensity scoring, doubly robust estimators, and offline policy evaluation for when A/B tests are too expensive.

Experimentation Platforms

Build and operate ML experimentation infrastructure - assignment services, metric computation pipelines, analysis tools, and the engineering required to scale from 3 to 30 experiments per month.

Interleaving Experiments

Use interleaving to compare ranking models with 10-25x better sensitivity than A/B tests - the technique behind fast iteration at search and recommendation companies.

Multi-Armed Bandits

Use multi-armed bandit algorithms to adaptively allocate traffic during experiments - learning faster than A/B tests while reducing regret.

Online Controlled Experiments

Design valid ML experiments by choosing the right randomization unit, handling network effects, detecting novelty effects, and managing holdout sets.

Shadow Mode Testing

Run new ML models against live production traffic without affecting users - catching silent failures, latency regressions, and behavioral differences before go-live.

Statistical Foundations for A/B Testing

Learn the statistical machinery behind A/B testing - null hypotheses, p-values, power, sample size calculation, and the mistakes that invalidate ML experiments.