Concentration Inequalities

Reading time: ~40 minutes | Interview relevance: High | Target roles: ML Engineer, AI Engineer, Research Scientist, Data Scientist

The ML Scenario That Motivates This Lesson

You train a model that achieves 95% accuracy on your test set of 1000 examples. How confident are you that the true accuracy on the full data distribution is close to 95%?

This is a question about how tightly the empirical average concentrates around its expectation. Concentration inequalities give you precise answers: "With probability at least 0.99, the true accuracy is within $\pm \epsilon$ of your test set accuracy."

Similarly: you're using mini-batch SGD with batch size 64. Why is your gradient estimate reliable? The Law of Large Numbers guarantees convergence; concentration inequalities tell you how fast.

Understanding these inequalities is essential for:

Evaluating statistical significance of benchmark results
Understanding PAC (Probably Approximately Correct) learning theory
Knowing when your test set is large enough
Explaining why larger batch sizes give more stable training

1. The Big Picture: From Bounds to Guarantees

All concentration inequalities answer the same question:

"How likely is a random variable to deviate far from its mean?"

Probability concentration around mean:

          P(X ≥ t)
             │   ←── This tail probability is what we bound
             │
  f(x)      │
    ┌───┐   │
    │   │   │   area = P(X ≥ t)
    │   │─────────────────────────►
    └───┘       t              x
     μ

Concentration inequality: P(X ≥ t) ≤ BOUND(t, μ, σ, n, ...)

Inequality	Requires	Bound Quality	Use Case
Markov	Non-negative, mean	Loose	Simple, very general
Chebyshev	Mean, variance	Moderate	Features with known variance
Hoeffding	Bounded range $[a,b]$	Tight	Bounded random variables
CLT (asymptotic)	Mean, variance, large $n$	Asymptotically exact	Normal approximation

2. Markov's Inequality

The simplest and most general concentration bound.

Statement

For a non-negative random variable $X$ and any $t > 0$ :

$P(X \geq t) \leq \frac{\mathbb{E}[X]}{t}$

Proof

$\mathbb{E}[X] = \int_0^\infty x f(x) dx \geq \int_t^\infty x f(x) dx \geq \int_t^\infty t f(x) dx = t \cdot P(X \geq t)$

Dividing both sides by $t$ : $P(X \geq t) \leq \mathbb{E}[X] / t$ .

Interpretation

Markov says: if the mean is $\mu$ , then the probability of being at least $k$ times the mean is at most $1/k$ :

$P(X \geq k\mu) \leq \frac{1}{k}$

$P(X \geq 2\mu) \leq 0.5$ (at most half the probability mass is above twice the mean)
$P(X \geq 10\mu) \leq 0.1$

ML Connection

Markov's inequality is used to bound the probability that the training loss of a randomly selected model is large:

$P(\mathcal{L}(\theta) \geq t) \leq \frac{\mathbb{E}[\mathcal{L}(\theta)]}{t}$

It is also the foundational lemma for proving Chebyshev's inequality.

import numpy as np

np.random.seed(42)
n_samples = 1_000_000

# Verify Markov's inequality empirically
# X ~ Exponential(1): E[X] = 1
X = np.random.exponential(scale=1.0, size=n_samples)

print("Markov's Inequality Verification (X ~ Exp(1), E[X]=1):")
print(f"{'t':>6} | {'P(X>=t) empirical':>20} | {'E[X]/t (bound)':>16} | {'Bound holds?':>12}")
print("-" * 65)
for t in [1, 2, 3, 5, 10]:
    empirical = (X >= t).mean()
    bound = 1.0 / t  # E[X]/t = 1/t
    print(f"{t:>6} | {empirical:>20.6f} | {bound:>16.4f} | {empirical <= bound + 1e-6}")

3. Chebyshev's Inequality

Uses both mean AND variance, giving a tighter bound than Markov.

Statement

For any random variable $X$ with mean $\mu$ and variance $\sigma^2$ , and any $k > 0$ :

$P(|X - \mu| \geq k) \leq \frac{\sigma^2}{k^2}$

Or equivalently, using $k = t\sigma$ :

$P(|X - \mu| \geq t\sigma) \leq \frac{1}{t^2}$

Proof

Apply Markov's inequality to the non-negative variable $Y = (X - \mu)^2$ :

$P(|X - \mu| \geq k) = P((X-\mu)^2 \geq k^2) \leq \frac{\mathbb{E}[(X-\mu)^2]}{k^2} = \frac{\sigma^2}{k^2}$

Comparison with the Gaussian (68-95-99.7 Rule)

Deviation	Chebyshev bound	Normal distribution
$\geq 2\sigma$	$\leq 25\%$	4.55%
$\geq 3\sigma$	$\leq 11\%$	0.27%
$\geq 5\sigma$	$\leq 4\%$	0.00006%
$\geq 10\sigma$	$\leq 1\%$	$\approx 10^{-23}$

Chebyshev is much looser - it works for any distribution, not just Gaussians.

import numpy as np

np.random.seed(42)
n_samples = 1_000_000

# Test on a non-Gaussian distribution (Poisson with lambda=4)
X = np.random.poisson(lam=4, size=n_samples)
mu    = X.mean()
sigma = X.std()

print(f"X ~ Poisson(4): mu={mu:.4f}, sigma={sigma:.4f}")
print(f"\nChebyshev's Inequality Verification:")
print(f"{'k (sigmas)':>12} | {'P(|X-mu|>=k*sigma) empirical':>30} | {'1/k^2 (bound)':>15}")
print("-" * 65)
for k in [1, 2, 3, 4, 5]:
    empirical = (np.abs(X - mu) >= k * sigma).mean()
    bound = 1.0 / k**2
    print(f"{k:>12} | {empirical:>30.6f} | {bound:>15.4f}")

Application: Confidence in Sample Mean

If $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ where $X_i$ are iid with mean $\mu$ and variance $\sigma^2$ :

$\text{Var}(\bar{X}_n) = \frac{\sigma^2}{n}$

Chebyshev gives:

$P\left(|\bar{X}_n - \mu| \geq \epsilon\right) \leq \frac{\sigma^2}{n\epsilon^2}$

To achieve $P(|\bar{X}_n - \mu| \geq \epsilon) \leq \delta$ , we need:

$n \geq \frac{\sigma^2}{\delta \epsilon^2}$

This is a sample complexity bound - how many samples do you need for a reliable estimate?

import numpy as np

# How many samples needed for reliable loss estimate?
sigma2 = 0.25   # variance of per-sample loss (assumed)
epsilon = 0.01  # we want estimate within 1% of true loss
delta = 0.05    # with 5% failure probability

n_chebyshev = sigma2 / (delta * epsilon**2)
print(f"Chebyshev sample complexity:")
print(f"  sigma^2={sigma2}, epsilon={epsilon}, delta={delta}")
print(f"  Need n >= {n_chebyshev:.0f} samples")
print(f"  (Very conservative -- Hoeffding will do better)")

4. Hoeffding's Inequality

For bounded random variables, Hoeffding's inequality gives exponentially tighter bounds than Chebyshev.

Statement

Let $X_1, \ldots, X_n$ be independent random variables with $a_i \leq X_i \leq b_i$ . Let $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ . Then for any $\epsilon > 0$ :

$P\left(\bar{X} - \mathbb{E}[\bar{X}] \geq \epsilon\right) \leq \exp\left(-\frac{2n^2\epsilon^2}{\sum_{i=1}^n (b_i-a_i)^2}\right)$

For iid $X_i \in [a, b]$ :

$P\left(\bar{X} - \mu \geq \epsilon\right) \leq \exp\left(-\frac{2n\epsilon^2}{(b-a)^2}\right)$

Two-sided version:

$P\left(|\bar{X} - \mu| \geq \epsilon\right) \leq 2\exp\left(-\frac{2n\epsilon^2}{(b-a)^2}\right)$

Sample Complexity

Setting the bound equal to $\delta$ and solving for $n$ :

$n \geq \frac{(b-a)^2}{2\epsilon^2} \ln\left(\frac{2}{\delta}\right)$

Compare to Chebyshev: Hoeffding's bound has a $\ln(1/\delta)$ factor, while Chebyshev has $1/\delta$ . For small $\delta$ , Hoeffding is much more efficient.

import numpy as np

# Hoeffding sample complexity
a, b = 0.0, 1.0   # bounded in [0, 1] (e.g., 0/1 loss)
epsilon = 0.01     # accuracy within 1%
delta = 0.05       # with 95% confidence

n_hoeffding = ((b - a)**2 / (2 * epsilon**2)) * np.log(2 / delta)
print(f"Hoeffding sample complexity:")
print(f"  epsilon={epsilon}, delta={delta}, range=[{a},{b}]")
print(f"  Need n >= {n_hoeffding:.0f} samples")

n_chebyshev = 0.25 / (delta * epsilon**2)  # for bounded [0,1]: sigma^2 <= 1/4
print(f"\nChebyshev sample complexity: n >= {n_chebyshev:.0f} (much worse)")

# Compare bounds for varying n
print(f"\nHoeffding bound for accuracy estimation (epsilon=0.02, [0,1] loss):")
print(f"{'n':>8} | {'Hoeffding bound P(err > 0.02)':>32}")
print("-" * 45)
epsilon = 0.02
for n in [100, 500, 1000, 5000, 10000]:
    bound = 2 * np.exp(-2 * n * epsilon**2 / (b-a)**2)
    print(f"{n:>8} | {bound:>32.6f}")

Generalization Bound via Hoeffding

In ML, the generalization error is:

$\text{gen-error} = \mathcal{L}_{\text{true}}(\theta) - \hat{\mathcal{L}}_{\text{test}}(\theta)$

For a model evaluated on $n$ test points with bounded loss in $[0, 1]$ :

$P\left(\mathcal{L}_{\text{true}} - \hat{\mathcal{L}}_{\text{test}} \geq \epsilon\right) \leq \exp(-2n\epsilon^2)$

With probability at least $1 - \delta$ :

$\mathcal{L}_{\text{true}} \leq \hat{\mathcal{L}}_{\text{test}} + \sqrt{\frac{\ln(1/\delta)}{2n}}$

import numpy as np

# Generalization bound: given test accuracy, bound true accuracy
def gen_bound(n_test, delta=0.05):
    """Upper bound on generalization gap with confidence 1-delta."""
    return np.sqrt(np.log(1/delta) / (2 * n_test))

print("Generalization bound (Hoeffding, confidence=95%):")
print(f"{'n_test':>10} | {'Bound epsilon':>15}")
print("-" * 30)
for n in [100, 500, 1000, 5000, 10000, 100000]:
    eps = gen_bound(n)
    print(f"{n:>10} | {eps:>15.4f}")

# Practical interpretation:
n = 10000
test_acc = 0.95
eps = gen_bound(n)
print(f"\nPractical: n={n}, test_acc={test_acc}")
print(f"True accuracy >= {test_acc - eps:.4f} with 95% confidence")

:::tip Hoeffding and Model Evaluation The Hoeffding bound is why having a held-out test set of at least 1000 examples is a standard recommendation. With $n=1000$ and $\delta=0.05$ : $\epsilon = \sqrt{\frac{\ln(1/0.05)}{2 \times 1000}} \approx 0.054$

So with 95% confidence, your true accuracy is within $\pm 5.4$ percentage points of the test set accuracy. With $n=10000$ , this tightens to $\pm 1.7$ percentage points. :::

5. Law of Large Numbers (LLN)

Weak Law of Large Numbers

Let $X_1, X_2, \ldots$ be iid random variables with mean $\mu < \infty$ . For any $\epsilon > 0$ :

$P\left(|\bar{X}_n - \mu| \geq \epsilon\right) \to 0 \quad \text{as } n \to \infty$

The sample average converges in probability to the true mean.

Strong Law of Large Numbers

Under slightly stronger conditions:

$P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$

The sample average converges to $\mu$ almost surely (with probability 1).

LLN in ML

ML Concept	LLN Interpretation
Training loss converges	$\hat{\mathcal{L}}_{\text{train}} \to \mathcal{L}_{\text{true}}$ as $N \to \infty$
SGD gradient is unbiased	Mini-batch gradient $\to$ true gradient as $B \to \infty$
Monte Carlo estimation	$\frac{1}{N}\sum_i g(x_i) \to \mathbb{E}[g(X)]$ as $N \to \infty$
Empirical risk minimization	Minimizing empirical risk $\to$ minimizing true risk

import numpy as np

np.random.seed(42)

# Demonstrate LLN: sample mean converges to true mean
# X ~ N(3, 4): mean=3
mu_true = 3.0
sigma    = 2.0

n_max = 10000
X = np.random.normal(mu_true, sigma, n_max)

ns = [10, 50, 100, 500, 1000, 5000, 10000]
running_means = [X[:n].mean() for n in ns]

print("Law of Large Numbers (X ~ N(3, 4)):")
print(f"True mean = {mu_true}")
print(f"\n{'n':>8} | {'Sample Mean':>12} | {'Error':>10}")
print("-" * 38)
for n, m in zip(ns, running_means):
    print(f"{n:>8} | {m:>12.6f} | {abs(m - mu_true):>10.6f}")

6. Central Limit Theorem (CLT)

Statement

Let $X_1, X_2, \ldots$ be iid random variables with mean $\mu$ and variance $\sigma^2 < \infty$ . Then:

$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } n \to \infty$

Or equivalently: $\bar{X}_n \approx \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$ for large $n$ .

This is the most profound theorem in probability: regardless of the original distribution (as long as it has finite variance), averages of many iid draws look Gaussian.

Why This Is Remarkable

Dice rolls are Uniform, but the average of 30 dice rolls is approximately Gaussian
Binomial with large $n$ is approximately Gaussian
Sample mean of any distribution with finite variance is approximately Gaussian

import numpy as np
from scipy.stats import norm, shapiro

np.random.seed(42)
n_experiments = 10000

# Starting distribution: Exponential (skewed, definitely not Gaussian)
lam = 2.0
mu_true    = 1.0 / lam   # E[X] = 1/lambda
sigma_true = 1.0 / lam   # Std[X] = 1/lambda

print("CLT Demonstration (X ~ Exponential(2), mu=0.5, sigma=0.5):")
print(f"\n{'n':>6} | {'Empirical mean':>14} | {'Empirical std':>13} | {'Shapiro p-val':>13}")
print("-" * 58)

for n in [1, 5, 30, 100]:
    samples = np.random.exponential(1/lam, size=(n_experiments, n))
    means   = samples.mean(axis=1)

    empirical_mean = means.mean()
    empirical_std  = means.std()
    _, p_val = shapiro(means[:500])  # Shapiro-Wilk test for normality

    print(f"{n:>6} | {empirical_mean:>14.6f} | {empirical_std:>13.6f} | {p_val:>13.4f}")
    theoretical_std = sigma_true / np.sqrt(n)
    print(f"{'':>6}   (expected: {mu_true:.6f}, expected std: {theoretical_std:.6f})")

CLT Applications in ML

# 1. Confidence intervals for model accuracy
# If we observe k correct out of n, the true accuracy has approximately:
# acc_hat ± z_{alpha/2} * sqrt(acc_hat * (1 - acc_hat) / n)

n_test, k_correct = 1000, 920
acc_hat = k_correct / n_test
se = np.sqrt(acc_hat * (1 - acc_hat) / n_test)   # standard error
z = 1.96  # 95% confidence interval

ci_low  = acc_hat - z * se
ci_high = acc_hat + z * se
print(f"\nAccuracy Confidence Interval (n={n_test}, correct={k_correct}):")
print(f"  Accuracy = {acc_hat:.4f}")
print(f"  95% CI = [{ci_low:.4f}, {ci_high:.4f}]")

# 2. z-test for comparing two models
def z_test_accuracy(n1, k1, n2, k2, alpha=0.05):
    """Test if two models have significantly different accuracy."""
    p1 = k1 / n1
    p2 = k2 / n2
    p_pool = (k1 + k2) / (n1 + n2)
    se = np.sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
    z_stat = (p1 - p2) / se
    p_value = 2 * (1 - norm.cdf(abs(z_stat)))
    return z_stat, p_value

z_stat, p_val = z_test_accuracy(1000, 920, 1000, 900)
print(f"\nModel A: 920/1000, Model B: 900/1000")
print(f"z-statistic = {z_stat:.4f}, p-value = {p_val:.4f}")
print(f"Significant at alpha=0.05? {p_val < 0.05}")

7. Comparison of All Inequalities

             Strength of bounds (stronger = better):

Markov       ■□□□□  (uses only mean, distribution-free but very loose)
Chebyshev    ■■□□□  (uses mean + variance, polynomial decay)
Hoeffding    ■■■■□  (uses boundedness, exponential decay - much better)
CLT (exact)  ■■■■■  (uses mean + variance, asymptotically exact but approximate)

import numpy as np

# Compare bounds for P(|Xbar - mu| >= epsilon) with n=100, epsilon=0.1
# X in [0,1], mu=0.5, sigma^2=0.25

n = 100
epsilon = 0.1
mu = 0.5
sigma2 = 0.25  # max variance for [0,1]

markov_bound = mu / (mu + epsilon)   # Markov on |X - mu|: crude
chebyshev_bound = sigma2 / (n * epsilon**2)
hoeffding_bound = 2 * np.exp(-2 * n * epsilon**2 / 1.0**2)  # range=1

# CLT-based (approximate normal)
from scipy.stats import norm
sigma_xbar = np.sqrt(sigma2 / n)
clt_bound = 2 * (1 - norm.cdf(epsilon / sigma_xbar))

print("Bounds for P(|Xbar - 0.5| >= 0.1) with n=100:")
print(f"  Markov:     {markov_bound:.6f}")
print(f"  Chebyshev:  {chebyshev_bound:.6f}")
print(f"  Hoeffding:  {hoeffding_bound:.6f}")
print(f"  CLT (approx): {clt_bound:.6f}  (best approximation)")

# Simulate true value
np.random.seed(42)
n_sim = 1_000_000
samples = np.random.uniform(0, 1, (n_sim, n))
xbars = samples.mean(axis=1)
empirical = (np.abs(xbars - 0.5) >= epsilon).mean()
print(f"  Empirical:  {empirical:.6f}")

8. Interview Q&A

Q1: State Hoeffding's inequality and explain how it applies to evaluating ML models.

A: Hoeffding's inequality states that for iid random variables $X_1, \ldots, X_n$ with $X_i \in [a, b]$ , the sample average $\bar{X}_n$ satisfies: $P(|\bar{X}_n - \mu| \geq \epsilon) \leq 2\exp(-2n\epsilon^2/(b-a)^2)$ . For ML model evaluation: if we have $n$ test examples and bounded loss (e.g., 0-1 loss in $[0,1]$ ), the test loss is an average of bounded random variables. Hoeffding tells us $P(|\hat{\mathcal{L}}_{\text{test}} - \mathcal{L}_{\text{true}}| \geq \epsilon) \leq 2e^{-2n\epsilon^2}$ . Inverting: with probability $\geq 1-\delta$ , $|\hat{\mathcal{L}}_{\text{test}} - \mathcal{L}_{\text{true}}| \leq \sqrt{\ln(2/\delta)/(2n)}$ . With $n=1000$ and $\delta=0.05$ , this gives $\epsilon \approx 0.054$ - so test accuracy estimates are within $\pm 5.4\%$ with 95% confidence.

Q2: What is the Central Limit Theorem and why does it justify using Gaussian approximations in ML?

A: The CLT states that the sum (or average) of $n$ iid random variables with finite mean $\mu$ and variance $\sigma^2$ converges in distribution to $\mathcal{N}(n\mu, n\sigma^2)$ as $n \to \infty$ , regardless of the underlying distribution. This justifies Gaussian approximations in ML because: (1) the training loss is an average of per-sample losses - with large $n$ , it's approximately Gaussian; (2) weight updates (gradients) are sums of per-sample gradients - approximately Gaussian by CLT; (3) confidence intervals for model accuracy use the Gaussian approximation to the Binomial (via CLT); (4) the test set accuracy of a model is a sample mean of 0-1 losses - approximately Gaussian with std $\approx \sqrt{p(1-p)/n}$ . In practice, $n \geq 30$ is often sufficient for a reasonable Gaussian approximation, depending on the original distribution's skewness.

Q3: What is the difference between the Law of Large Numbers and the Central Limit Theorem?

A: The LLN and CLT answer different questions about sample means. The LLN answers: "Does the sample mean converge to the true mean?" - Yes, with probability 1 (strong LLN) as $n \to \infty$ . It tells us about convergence but says nothing about the speed. The CLT answers: "What is the distribution of the sample mean for finite $n$ ?" - Approximately $\mathcal{N}(\mu, \sigma^2/n)$ . The CLT gives us the rate of convergence and the shape of the distribution. Together: LLN says the sample mean will eventually be exactly right; CLT says how spread out it is along the way and what shape that spread takes. In ML: LLN justifies using empirical risk minimization (minimize training loss as a proxy for true risk); CLT tells us how to compute confidence intervals and statistical tests on those empirical estimates.

Q4: How do concentration inequalities explain why larger batch sizes help in deep learning?

A: The mini-batch gradient $g_B = \frac{1}{B}\sum_{i \in \mathcal{B}} \nabla_\theta \ell_i$ is a sample mean of per-sample gradients. By Hoeffding's inequality (assuming bounded gradients, or Chebyshev with known variance), $P(\|g_B - \nabla\mathcal{L}\| \geq \epsilon) \leq C \exp(-C' B\epsilon^2)$ - the probability of a bad gradient estimate decreases exponentially in batch size $B$ . Equivalently, the variance of the gradient estimate is $\text{Var}(g_B) = \text{Var}(g_1)/B$ - halving with each doubling of batch size. Larger batches mean: (1) lower gradient variance, (2) more reliable gradient direction, (3) ability to use larger learning rates without divergence. However, very large batches can lead to sharp minima and worse generalization - there's an empirically observed "generalization gap" with very large batches, suggesting noise from small batches has a beneficial regularization effect.

Q5: How does Markov's inequality lead to Chebyshev's inequality?

A: Markov's inequality states that for any non-negative RV $X$ : $P(X \geq t) \leq \mathbb{E}[X]/t$ . To derive Chebyshev: for any RV $X$ with mean $\mu$ and variance $\sigma^2$ , we want to bound $P(|X - \mu| \geq k)$ . Note that $|X - \mu| \geq k$ is equivalent to $(X - \mu)^2 \geq k^2$ . The variable $Y = (X-\mu)^2$ is non-negative, so we can apply Markov: $P(Y \geq k^2) \leq \mathbb{E}[Y]/k^2 = \sigma^2/k^2$ . Therefore: $P(|X - \mu| \geq k) = P((X-\mu)^2 \geq k^2) \leq \sigma^2/k^2$ . This chain - Markov applied to $(X-\mu)^2$ - is the complete proof. The moral: Chebyshev is strictly stronger than Markov because it uses more information (variance, not just mean).

:::tip 🎮 Interactive Playground

Visualize this concept: Try the Concentration Inequalities demo on the EngineersOfAI Playground - no code required.

:::

The ML Scenario That Motivates This Lesson​

1. The Big Picture: From Bounds to Guarantees​

2. Markov's Inequality​

Statement​

Proof​

Interpretation​

ML Connection​

3. Chebyshev's Inequality​

Statement​

Proof​

Comparison with the Gaussian (68-95-99.7 Rule)​

Application: Confidence in Sample Mean​

4. Hoeffding's Inequality​

Statement​

Sample Complexity​

Generalization Bound via Hoeffding​

5. Law of Large Numbers (LLN)​

Weak Law of Large Numbers​

Strong Law of Large Numbers​

LLN in ML​

6. Central Limit Theorem (CLT)​

Statement​

Why This Is Remarkable​

CLT Applications in ML​

7. Comparison of All Inequalities​

8. Interview Q&A​

The ML Scenario That Motivates This Lesson

1. The Big Picture: From Bounds to Guarantees

2. Markov's Inequality

Statement

Proof

Interpretation

ML Connection

3. Chebyshev's Inequality

Statement

Proof

Comparison with the Gaussian (68-95-99.7 Rule)

Application: Confidence in Sample Mean

4. Hoeffding's Inequality

Statement

Sample Complexity

Generalization Bound via Hoeffding

5. Law of Large Numbers (LLN)

Weak Law of Large Numbers

Strong Law of Large Numbers

LLN in ML

6. Central Limit Theorem (CLT)

Statement

Why This Is Remarkable

CLT Applications in ML

7. Comparison of All Inequalities

8. Interview Q&A