Stationarity and Ergodicity - The Prerequisites for Time Series ML

Reading time: ~22 minutes | Level: Time Series Foundations → Production ML

Your ML team trains a gradient boosted model to predict energy demand using the past 3 years of hourly consumption data. Backtesting shows 95% accuracy. You deploy. The model is 40% off every winter morning.

The problem: energy demand has a trend component that grew over your training period. Your model learned to predict "what level does demand run at in this period" rather than "what is the deviation from the long-run trend." This is the classic consequence of training on non-stationary data without accounting for it.

Stationarity is the first thing you must check before applying any time series ML model.

What You Will Learn

Strict vs weak (covariance) stationarity - formal definitions
Why stationarity matters for ML: assumptions underlying forecasting models
Unit roots: the most common form of non-stationarity
Augmented Dickey-Fuller (ADF) test: implementation and interpretation
Differencing: transforming non-stationary to stationary
Trend decomposition and deseasonalization
Ergodicity: why time averages equal ensemble averages

Part 1 - Stationarity: Formal Definitions

Strict stationarity

A time series $\{X_t\}$ is strictly stationary if the joint distribution of any collection $(X_{t_1}, X_{t_2}, \ldots, X_{t_k})$ is the same as the joint distribution of $(X_{t_1 + \tau}, X_{t_2 + \tau}, \ldots, X_{t_k + \tau})$ for all $k$ , all time points $t_1, \ldots, t_k$ , and all shifts $\tau$ .

Informally: the statistical properties of the process do not change over time.

Weak (covariance) stationarity

A time series $\{X_t\}$ is weakly stationary (covariance stationary) if:

Constant mean: $E[X_t] = \mu$ for all $t$
Constant variance: $\text{Var}(X_t) = \sigma^2 < \infty$ for all $t$
Autocovariance depends only on lag: $\text{Cov}(X_t, X_{t+h}) = \gamma(h)$ depends only on $h$ , not on $t$

:::note Weak vs strict stationarity Strict stationarity implies weak stationarity (assuming finite variance). Weak stationarity does not imply strict stationarity in general. However, for Gaussian processes, weak and strict stationarity are equivalent (since Gaussians are fully characterized by first and second moments).

In ML practice, we almost always work with weak stationarity. :::

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

np.random.seed(42)
n = 500

# --- Stationary processes ---

# White noise: E[X_t]=0, Var(X_t)=1, Cov(X_t, X_s)=0 for t≠s
white_noise = np.random.randn(n)

# AR(1) stationary (|φ|<1): X_t = 0.7 X_{t-1} + ε_t
ar1_stationary = np.zeros(n)
for t in range(1, n):
    ar1_stationary[t] = 0.7 * ar1_stationary[t-1] + np.random.randn()

# --- Non-stationary processes ---

# Random walk (unit root): X_t = X_{t-1} + ε_t
random_walk = np.cumsum(np.random.randn(n))

# Trend-stationary: X_t = 0.02t + ε_t
trend_stationary = 0.02 * np.arange(n) + 0.5 * np.random.randn(n)

# Heteroskedastic (non-constant variance)
heteroskedastic = np.random.randn(n) * (1 + np.arange(n) / n)

# Check: does the mean/variance look constant over time?
for name, series in [('White noise', white_noise),
                     ('AR(1)', ar1_stationary),
                     ('Random walk', random_walk),
                     ('Trend', trend_stationary)]:
    first_half_mean = series[:n//2].mean()
    second_half_mean = series[n//2:].mean()
    first_half_var = series[:n//2].var()
    second_half_var = series[n//2:].var()
    print(f"{name}:")
    print(f"  Mean: {first_half_mean:.2f} (1st half) vs {second_half_mean:.2f} (2nd half)")
    print(f"  Var:  {first_half_var:.2f} (1st half) vs {second_half_var:.2f} (2nd half)")

Part 2 - Why Stationarity Matters for ML

The i.i.d. assumption violation

Most ML models (linear regression, XGBoost, neural networks) assume training examples are independently and identically distributed (i.i.d.). For time series:

Non-i.i.d.: Observations at nearby times are correlated
Non-stationary: The distribution changes over time

If you train on non-stationary data without addressing the non-stationarity, you are learning a relation that no longer holds at test time.

Train/test split on non-stationary data

import numpy as np
from sklearn.linear_model import LinearRegression
import pandas as pd

np.random.seed(42)
n = 1000

# Non-stationary series: trending upward
trend = np.linspace(0, 10, n)
noise = np.random.randn(n) * 0.5
y = trend + noise

# Feature: lag-1 value
X = y[:-1].reshape(-1, 1)
y_target = y[1:]

# BAD: random shuffle split (introduces future leakage)
from sklearn.model_selection import train_test_split
X_train_bad, X_test_bad, y_train_bad, y_test_bad = train_test_split(
    X, y_target, test_size=0.2, shuffle=True, random_state=42)  # WRONG

# CORRECT: time-based split - only use past to predict future
split_idx = int(0.8 * len(X))
X_train = X[:split_idx]
y_train = y_target[:split_idx]
X_test = X[split_idx:]
y_test = y_target[split_idx:]

# Train on random split (inflated performance due to future leakage)
model_bad = LinearRegression().fit(X_train_bad, y_train_bad)
score_bad = model_bad.score(X_test_bad, y_test_bad)

# Train on time split
model_correct = LinearRegression().fit(X_train, y_train)
score_correct = model_correct.score(X_test, y_test)

print(f"R² (random split, leaky): {score_bad:.4f}")    # Artificially high
print(f"R² (time split, honest):  {score_correct:.4f}")
# For trending data: both may be high here, but the insight holds for non-linear patterns

Distribution shift over time

import numpy as np

np.random.seed(42)
n = 1000

# Simulated financial data: regime change
# First 500 steps: low volatility bull market
# Next 500 steps: high volatility bear market
low_vol = np.random.randn(500) * 0.01 + 0.0005
high_vol = np.random.randn(500) * 0.03 - 0.001

returns = np.concatenate([low_vol, high_vol])
prices = np.exp(np.cumsum(returns)) * 100

print("First 500 observations (training-like period):")
print(f"  Mean return: {low_vol.mean():.6f}")
print(f"  Std return:  {low_vol.std():.6f}")

print("\nNext 500 observations (test-like period):")
print(f"  Mean return: {high_vol.mean():.6f}")
print(f"  Std return:  {high_vol.std():.6f}")

# A model trained on the first regime will massively underestimate volatility
# in the second regime - catastrophic for risk management

Part 3 - Unit Roots and the ADF Test

The unit root process

A time series has a unit root if:

$X_t = X_{t-1} + \epsilon_t \quad (\text{random walk})$

or more generally:

$X_t = \phi X_{t-1} + \epsilon_t \quad \text{with } |\phi| = 1$

The "unit" refers to $\phi = 1$ - the characteristic polynomial has a root at 1.

Properties of unit root processes:

Mean changes over time (trending or drifting)
Variance grows without bound: $\text{Var}(X_t) = t \sigma^2$
Shocks are permanent: a perturbation at time $t_0$ affects all future values
Standard regression on integrated series produces spurious correlations

The Augmented Dickey-Fuller (ADF) test

The ADF test is a hypothesis test for unit roots:

$H_0$ : Series has a unit root (non-stationary)
$H_1$ : Series is stationary

The test regression:

$\Delta X_t = \alpha + \beta t + \gamma X_{t-1} + \sum_{i=1}^p \delta_i \Delta X_{t-i} + \epsilon_t$

Test statistic: $t$ -statistic for $\gamma = 0$ . The augmented terms $\Delta X_{t-i}$ account for serial correlation.

Decision: If test statistic < critical value (usually at 5% level), reject $H_0$ → series is stationary.

import numpy as np
from statsmodels.tsa.stattools import adfuller

def adf_test(series: np.ndarray, name: str = '', verbose: bool = True) -> dict:
    """
    Augmented Dickey-Fuller test with interpretation.

    Returns: dict with test_statistic, p_value, is_stationary
    """
    result = adfuller(series, autolag='AIC')  # AIC selects optimal lag order

    output = {
        'test_statistic': result[0],
        'p_value': result[1],
        'n_lags_used': result[2],
        'critical_values': result[4],
        'is_stationary': result[1] < 0.05  # Common threshold
    }

    if verbose:
        print(f"\nADF Test: {name}")
        print(f"  Test Statistic: {result[0]:.4f}")
        print(f"  p-value:        {result[1]:.4f}")
        print(f"  Lags used:      {result[2]}")
        print(f"  Critical Values:")
        for level, cv in result[4].items():
            print(f"    {level}: {cv:.4f}")
        conclusion = "STATIONARY" if output['is_stationary'] else "NON-STATIONARY (unit root)"
        print(f"  Conclusion: {conclusion} (p < 0.05: {result[1] < 0.05})")

    return output

# Test different processes
np.random.seed(42)
n = 500

white_noise = np.random.randn(n)
random_walk = np.cumsum(np.random.randn(n))
ar1_stationary = np.zeros(n)
for t in range(1, n):
    ar1_stationary[t] = 0.7 * ar1_stationary[t-1] + np.random.randn()

adf_test(white_noise, "White Noise")
adf_test(ar1_stationary, "AR(1), φ=0.7")
adf_test(random_walk, "Random Walk")

# Real-world example: S&P 500 price level vs log returns
import pandas as pd
# Simulated S&P 500-like data
returns_sim = 0.001 + 0.015 * np.random.randn(252 * 5)  # 5 years daily
prices_sim = 3000 * np.exp(np.cumsum(returns_sim))

adf_test(prices_sim, "Price Level (non-stationary expected)")
adf_test(np.diff(np.log(prices_sim)), "Log Returns (stationary expected)")

Part 4 - Differencing to Achieve Stationarity

First differencing

If $X_t$ is a random walk (non-stationary), then $\Delta X_t = X_t - X_{t-1}$ is white noise (stationary):

$\Delta X_t = (X_{t-1} + \epsilon_t) - X_{t-1} = \epsilon_t$

In general, a series that requires $d$ rounds of differencing to become stationary is called "integrated of order $d$ ", denoted $I(d)$ .

import numpy as np
from statsmodels.tsa.stattools import adfuller

def difference_to_stationarity(series: np.ndarray, max_d: int = 3) -> tuple:
    """
    Apply differencing until ADF test accepts stationarity.
    Returns: (differenced_series, d)
    """
    d = 0
    current_series = series.copy()

    while d <= max_d:
        result = adfuller(current_series, autolag='AIC')
        p_value = result[1]

        print(f"d={d}: ADF p-value = {p_value:.4f} ({'stationary' if p_value < 0.05 else 'non-stationary'})")

        if p_value < 0.05:
            return current_series, d

        # Apply one more round of differencing
        current_series = np.diff(current_series)
        d += 1

    print(f"Warning: series still non-stationary after {max_d} differences")
    return current_series, max_d

np.random.seed(42)
# I(1) process: random walk
random_walk = np.cumsum(np.random.randn(500))
stationary_rw, d_rw = difference_to_stationarity(random_walk)
print(f"Random walk needed d={d_rw} differences")

# I(2) process: integrated twice
rw2 = np.cumsum(np.cumsum(np.random.randn(500)))
stationary_rw2, d_rw2 = difference_to_stationarity(rw2)
print(f"I(2) series needed d={d_rw2} differences")

Seasonal differencing

For seasonal data with period $s$ :

$\Delta_s X_t = X_t - X_{t-s}$

import numpy as np

def seasonal_difference(series: np.ndarray, s: int = 12) -> np.ndarray:
    """
    Seasonal differencing with period s.
    Removes seasonal non-stationarity without removing trend.
    """
    return series[s:] - series[:-s]

# Monthly data with annual seasonality
np.random.seed(42)
t = np.arange(120)  # 10 years monthly
seasonal = 5 * np.sin(2 * np.pi * t / 12)   # Annual cycle
trend = 0.05 * t
noise = np.random.randn(120)
series = seasonal + trend + noise

# Regular differencing removes trend but seasonal pattern remains
diff_1 = np.diff(series)

# Seasonal differencing removes seasonal component
diff_s = seasonal_difference(series, s=12)

# Both together (what SARIMA does)
diff_both = np.diff(seasonal_difference(series, s=12))

print("ADF p-values:")
for name, data in [("Original", series),
                   ("1st diff", diff_1),
                   ("Seasonal diff", diff_s),
                   ("Both diffs", diff_both)]:
    pval = adfuller(data, autolag='AIC')[1]
    print(f"  {name}: {pval:.4f}")

Part 5 - Trend Decomposition and Deseasonalization

For ML with time series, a common preprocessing pipeline:

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose, STL

def decompose_and_make_stationary(series: pd.Series, period: int = 12) -> dict:
    """
    Decompose time series into trend, seasonal, and residual.
    Return stationary residuals for ML modeling.
    """
    # STL (Seasonal-Trend decomposition using LOESS) - more robust than classical
    stl = STL(series, period=period, robust=True)
    result = stl.fit()

    trend = result.trend
    seasonal = result.seasonal
    residual = result.resid

    # The residual should be stationary (or close to it)
    pval_residual = adfuller(residual.dropna())[1]
    print(f"Residual ADF p-value: {pval_residual:.4f} "
          f"({'stationary' if pval_residual < 0.05 else 'non-stationary'})")

    return {
        'trend': trend,
        'seasonal': seasonal,
        'residual': residual,
        'original': series
    }

# ML workflow: predict residuals, then add back trend and seasonal
def ml_time_series_workflow(series: pd.Series, model, horizon: int = 12):
    """
    Proper ML time series pipeline:
    1. Decompose into trend + seasonal + residual
    2. Difference residuals to stationarity
    3. Train ML model on stationary residuals
    4. Predict residuals, reconstruct full forecast
    """
    components = decompose_and_make_stationary(series)
    residuals = components['residual'].dropna()

    # Feature engineering on stationary residuals
    # (lag features, rolling stats - valid only after stationarity)
    df = pd.DataFrame({'residual': residuals})
    for lag in [1, 2, 3, 6, 12]:
        df[f'lag_{lag}'] = df['residual'].shift(lag)

    df = df.dropna()
    X = df.drop('residual', axis=1)
    y = df['residual']

    # Train model (time-based split)
    split = int(0.8 * len(X))
    model.fit(X[:split], y[:split])
    residual_forecast = model.predict(X[split:split+horizon])

    # Reconstruct forecast by adding back trend and seasonal components
    # (In practice, forecast trend and seasonal separately)
    return residual_forecast

Part 6 - Ergodicity

Definition

A stationary process $\{X_t\}$ is ergodic if time averages converge to ensemble averages:

$\frac{1}{T} \sum_{t=1}^T g(X_t) \xrightarrow{T \to \infty} E[g(X_t)]$

In other words: a single long realization of the process contains enough information to estimate all population statistics.

Why ergodicity matters in ML

In ML, we have one realization of a time series (the observed data). We use time averages (e.g., sample mean, sample autocorrelation) to estimate population quantities. This is only valid if the process is ergodic.

Non-ergodic example: Suppose you measure one person's resting heart rate for 10,000 days. The time average is that person's average heart rate - but it tells you nothing about the population distribution of heart rates. If you want to model the population, you need many individuals (ensemble), not one long series.

import numpy as np

# Ergodic process: AR(1) with |φ| < 1
# Time average converges to population mean
np.random.seed(42)
n = 10000
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.7 * ar1[t-1] + np.random.randn()

# Time average at different T
for T in [100, 1000, 5000, 10000]:
    time_avg = ar1[:T].mean()
    print(f"T={T:6d}: time_avg = {time_avg:.4f}")  # Should converge to 0 (true mean)
# Converges to population mean E[X] = 0

# Non-ergodic: different realizations have different long-run means
# (like measuring different people's heart rates)
n_realizations = 5
individual_means = np.random.normal(65, 10, n_realizations)  # True heart rate per person
series_per_person = [mean + np.random.randn(1000) for mean in individual_means]

print("\nPer-person time averages (non-ergodic scenario):")
for i, (series, true_mean) in enumerate(zip(series_per_person, individual_means)):
    print(f"  Person {i+1}: time_avg={series.mean():.1f}, true_mean={true_mean:.1f}")
# Time averages are person-specific - not estimates of population mean

Interview Questions

Q1: What is the difference between strict and weak stationarity? Which do we need for ARIMA models?

Strict stationarity: The joint distribution of $(X_{t_1}, \ldots, X_{t_k})$ is invariant to time shifts for any collection of time points. This means ALL statistical properties (mean, variance, higher moments, entire distribution) are constant over time.

Weak (covariance) stationarity: Only requires:

Constant mean: $E[X_t] = \mu$
Constant variance: $\text{Var}(X_t) = \sigma^2 < \infty$
Autocovariance depends only on lag: $\text{Cov}(X_t, X_{t+h}) = \gamma(h)$

Key relationships:

Strict → weak (if variance is finite)
Weak → strict only for Gaussian processes (where first two moments determine the distribution)
In practice, weak stationarity is sufficient for most analysis

ARIMA requires: Weak stationarity for the differenced series. The Wold decomposition theorem guarantees that any weakly stationary process can be represented as an MA(∞) - the theoretical foundation for ARIMA models. Strict stationarity is not needed because ARIMA only models first and second moments.

ML applications: If your series is weakly stationary (confirmed by ADF test, constant rolling mean and variance), ARIMA and related models will work correctly. If only strictly stationary but not weakly (infinite variance - rare), standard ARIMA breaks down.

Q2: What is the Augmented Dickey-Fuller test, and how do you interpret its output?

The ADF test tests for unit roots:

$H_0$ : Series has a unit root (non-stationary)
$H_1$ : Series is stationary

The test regression: $\Delta X_t = \alpha + \beta t + \gamma X_{t-1} + \sum_{i=1}^p \delta_i \Delta X_{t-i} + \epsilon_t$

The test statistic is the $t$ -ratio for $\gamma$ . The "augmented" part (lags of $\Delta X_t$ ) controls for serial correlation that would invalidate the standard Dickey-Fuller test.

Interpreting output:

from statsmodels.tsa.stattools import adfuller
result = adfuller(series)
# result[0]: test statistic
# result[1]: p-value
# result[4]: critical values at 1%, 5%, 10%

# Decision rule:
# p-value < 0.05: reject H₀ → series is stationary
# p-value ≥ 0.05: fail to reject H₀ → series has unit root

Critical values: The ADF distribution is non-standard (left-skewed). Critical values at 1% ≈ -3.5, 5% ≈ -2.9, 10% ≈ -2.6. More negative test statistic = stronger evidence for stationarity.

Practical nuances:

Low power with short series (T < 100): test may fail to detect stationarity
Near-unit-root processes ( $\phi = 0.98$ ): ADF may not reject even though the process is technically stationary
Structural breaks: ADF may give misleading results if the series has a break mid-sample
Always use autolag='AIC' to let the model select the lag order automatically

When both ADF and KPSS agree:

ADF rejects, KPSS does not reject → stationary (high confidence)
ADF does not reject, KPSS rejects → unit root (high confidence)
Both reject or both don't → structural break or near-unit-root (investigate further)

Q3: Why is it dangerous to train an ML model on non-stationary time series data without preprocessing?

Problem 1: Spurious correlations. Two independent random walks $X_t$ and $Y_t$ can have very high correlation (often > 0.8) purely by chance, because both trend in the same direction. A model trained on the levels will learn this spurious relationship. In deployment, when the trends diverge, the model fails spectacularly. This is called "spurious regression."

Problem 2: Distribution shift. If the mean or variance of the series changes over time, the distribution seen at test time is different from training. The model has learned the wrong distribution. Example: training on 2015-2019 data and testing on 2020 (pandemic) data.

Problem 3: Violation of ARIMA assumptions. ARIMA requires a stationary process after differencing. Applying ARIMA to a non-stationary series without proper differencing gives biased parameter estimates.

Problem 4: Invalid standard errors. OLS standard errors assume stationarity. With integrated series, standard errors are incorrect → hypothesis tests are invalid → model selection (feature selection, regularization) is flawed.

Correct approach:

Test for stationarity (ADF test, visual inspection of rolling mean/std)
If non-stationary: difference, log-transform, or detrend
Model the stationary residuals
Convert predictions back to original scale

For deep learning: LSTMs and Transformers can technically model non-stationary series, but they are much more data-hungry to do so correctly. Pre-whitening (making the series stationary) dramatically reduces the amount of data needed for reliable learning.

Q4: What is ergodicity and why is it important for time series ML?

Ergodicity: A stationary process $\{X_t\}$ is ergodic if time averages (from a single realization) converge to ensemble averages (expectations over many independent realizations).

$\frac{1}{T} \sum_{t=1}^T X_t \xrightarrow{p} E[X_t], \quad \frac{1}{T} \sum_{t=1}^T X_t X_{t+h} \xrightarrow{p} E[X_t X_{t+h}]$

Why it matters for ML: In time series ML, we have ONE trajectory (e.g., 5 years of daily stock prices). We estimate population parameters (mean, variance, autocorrelation structure) from this single trajectory using time averages.

This is only valid if the process is ergodic. If not, the time average may converge to something specific to that realization, not the population parameter.

Practical ergodicity conditions: Processes with fast-decaying autocorrelation are ergodic. Specifically, if $\sum_{h=0}^\infty |\gamma(h)| < \infty$ (absolute summability of autocovariance), the process is ergodic for the mean. This holds for any stationary ARMA process.

Non-ergodic situations in ML:

Panel data: Each user in your recommendation system is a different "realization." Using one user's long history to estimate population-level patterns is non-ergodic. Need many users.
Structural breaks: Before and after a regime change, the process has different statistical properties - a single long series spanning the break is non-ergodic for the pre-break period.

Practical implication: If you are fitting ARIMA to your time series, you are implicitly assuming ergodicity. If the series is ergodic, your parameter estimates from the single observed series are consistent. If not, you need ensemble data or regime-specific modeling.

Q5: How would you preprocess a non-stationary financial time series for ML feature engineering?

Step-by-step pipeline:

1. Inspect visually and test:

# Rolling mean and variance should be constant
rolling_mean = pd.Series(prices).rolling(50).mean()
rolling_std = pd.Series(prices).rolling(50).std()
# If rolling_mean trends → non-stationary mean
# If rolling_std trends → non-stationary variance (GARCH effects in finance)

2. Apply appropriate transformation:

Log prices → log returns via first difference: eliminates multiplicative trend, achieves approximate stationarity
Seasonal data: STL decomposition → model residuals
Multiple seasonalities: FBProphet or MSTL

3. Test again:

log_returns = np.diff(np.log(prices))
adf_result = adfuller(log_returns)
# p-value should be < 0.01 for log returns - typically true for financial data

4. Engineer features from stationary series:

df = pd.DataFrame({'return': log_returns})
# Lag features (only on stationary series to avoid leakage)
for lag in [1, 2, 3, 5, 10]:
    df[f'lag_{lag}'] = df['return'].shift(lag)
# Rolling statistics
df['rolling_mean_5'] = df['return'].rolling(5).mean()
df['rolling_std_5'] = df['return'].rolling(5).std()
df['rolling_skew_20'] = df['return'].rolling(20).skew()

5. Time-based cross-validation (never shuffle):

from sklearn.model_selection import TimeSeriesSplit
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, val_idx in tscv.split(X):
    model.fit(X[train_idx], y[train_idx])
    score = model.score(X[val_idx], y[val_idx])

Key rule: All feature engineering must use only past information relative to the prediction target - no future leakage.

Quick Reference

Concept	Test	Fix
Unit root (random walk)	ADF, KPSS	First differencing
Seasonal non-stationarity	Plot seasonality	Seasonal differencing
Trending mean	Visual + ADF	Detrend or difference
Heteroskedasticity	Ljung-Box on squared residuals	GARCH model or log transform

ADF p-value	Interpretation
< 0.01	Strongly stationary
0.01–0.05	Stationary (reject H₀ at 5%)
0.05–0.10	Borderline - check visually
> 0.10	Non-stationary (unit root)

Next: Lesson 02: Autocorrelation and PACF →

:::tip 🎮 Interactive Playground

Visualize this concept: Try the Stationarity demo on the EngineersOfAI Playground - no code required.

:::

What You Will Learn​

Part 1 - Stationarity: Formal Definitions​

Strict stationarity​

Weak (covariance) stationarity​

Part 2 - Why Stationarity Matters for ML​

The i.i.d. assumption violation​

Train/test split on non-stationary data​

Distribution shift over time​

Part 3 - Unit Roots and the ADF Test​

The unit root process​

The Augmented Dickey-Fuller (ADF) test​

Part 4 - Differencing to Achieve Stationarity​

First differencing​

Seasonal differencing​

Part 5 - Trend Decomposition and Deseasonalization​

Part 6 - Ergodicity​

Definition​

Why ergodicity matters in ML​

Interview Questions​

Quick Reference​