ARIMA Models
Reading time: ~45 minutes Interview relevance: Very High - ARIMA is the classical forecasting baseline; every ML forecasting interview starts here Target roles: ML Engineer, Data Scientist, Financial ML, Research Engineer
The Real Interview Moment
You're presenting your LSTM forecasting model to a senior data scientist at a retail company. Your model achieves a MAPE of 8% on the test set. The senior DS asks: "Did you compare this to ARIMA?"
You did not.
"Why not? ARIMA with seasonal components often gets 5–7% MAPE on retail demand forecasting. Before you convince the business to run a GPU cluster for LSTMs, you need to prove that the complexity is justified. What's your ARIMA baseline?"
This scenario plays out constantly in industry. Neural networks are not always better than ARIMA. For short, well-behaved time series with strong seasonality, SARIMA frequently outperforms deep learning - at a fraction of the computational cost.
ARIMA is not a legacy technique you dismiss. It is the baseline you must beat and explain. Understanding its mathematics tells you exactly when it will and won't work - and what your deep learning model must be doing better.
Building Up to ARIMA
ARIMA combines three components: Autoregressive (AR), Integrated (I), and Moving Average (MA). Each adds expressive power for different temporal structures.
1. Autoregressive Model - AR(p)
An AR(p) model expresses the current value as a linear function of the most recent values:
where is white noise and are AR coefficients.
Backshift notation: Define the backshift operator such that . Then:
Stationarity condition for AR(p): The process is stationary if and only if all roots of the characteristic polynomial lie outside the unit circle in the complex plane.
For AR(1): for stationarity. For AR(2): The stationarity region is the triangle:
Unconditional mean: (centered). For a model with intercept : .
2. Moving Average Model - MA(q)
An MA(q) model expresses the current value as a linear combination of current and past noise shocks:
In backshift notation:
Always stationary: Any finite MA(q) is stationary regardless of .
Invertibility condition: The MA process is invertible (can be written as infinite AR) if all roots of lie outside the unit circle. Invertibility ensures a unique MA representation and is needed for parameter estimation.
Key property: ACF is exactly zero for lags . (This is why ACF identifies MA order.)
3. ARMA(p, q)
Combines both:
Expanded:
ARMA(p,q) is the most general stationary linear model. It can parsimoniously represent many processes that would require high-order pure AR or MA models.
4. Integration - ARIMA(p, d, q)
Most real-world series (prices, demand) are non-stationary - their mean drifts over time. The "I" in ARIMA handles this via differencing.
Define the difference operator :
- (first difference)
- (second difference)
ARIMA(p, d, q): Apply differences to achieve stationarity, then fit ARMA(p,q):
| Meaning | When to use | |
|---|---|---|
| 0 | No differencing (already stationary) | Stationary series |
| 1 | First difference | Random walks, trending series |
| 2 | Second difference | Series where growth rate is trending |
In practice, is almost always sufficient. Rarely .
SARIMA: Handling Seasonality
Seasonal patterns require seasonal AR/MA operators at lags
SARIMA:
where:
- : non-seasonal AR, integration, MA orders
- : seasonal AR, integration, MA orders
- : seasonal period (12 for monthly, 4 for quarterly, 7 for daily-weekly, 52 for weekly)
Example SARIMA(1,1,1)(1,1,1)[12]:
Python Implementation
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.stats.diagnostic import acorr_ljungbox
from sklearn.metrics import mean_squared_error, mean_absolute_error
import itertools
# ─── Generate test data ───────────────────────────────────────────────────────
np.random.seed(42)
n = 300
# AR(2) process (known ground truth for validation)
def generate_arima(n: int, phi: list = None, theta: list = None,
d: int = 0, sigma: float = 1.0) -> np.ndarray:
"""Generate ARIMA(p,d,q) data for testing."""
if phi is None: phi = []
if theta is None: theta = []
p = len(phi)
q = len(theta)
extra = max(p, q) + d + 20 # burn-in
eps = np.random.normal(0, sigma, n + extra)
x = np.zeros(n + extra)
for t in range(max(p, d), n + extra):
# AR component
ar_part = sum(phi[j] * x[t-1-j] for j in range(p))
# MA component
ma_part = sum(theta[j] * eps[t-1-j] for j in range(q)) if q > 0 else 0
x[t] = ar_part + ma_part + eps[t]
# Apply differencing removal (integration) if d > 0
result = x[extra:]
if d > 0:
result = np.cumsum(result) # undo differencing (simulate I(d))
return result
# Ground truth: ARIMA(2,1,1)
series_raw = generate_arima(n, phi=[0.6, -0.2], theta=[0.4], d=1)
series = pd.Series(series_raw, name="y")
print(f"Generated ARIMA(2,1,1) series: mean={series.mean():.2f}, std={series.std():.2f}")
The Box-Jenkins Methodology
The Box-Jenkins methodology is the systematic workflow for fitting ARIMA models.
┌─────────────────────────────────────────────────────┐
│ Box-Jenkins Methodology │
│ │
│ 1. Identification │
│ ├── Check stationarity (ADF test) │
│ ├── Determine d (differencing order) │
│ └── Read ACF/PACF to estimate p, q │
│ │
│ 2. Estimation │
│ ├── Fit ARIMA(p,d,q) via MLE │
│ └── Check parameter significance │
│ │
│ 3. Diagnostic Checking │
│ ├── Residuals ~ white noise? (Ljung-Box) │
│ ├── Residual ACF/PACF significant? │
│ └── Residuals normally distributed? │
│ │
│ 4. Forecasting │
│ ├── Point forecasts with confidence intervals │
│ └── Evaluate on held-out data │
└─────────────────────────────────────────────────────┘
class BoxJenkinsPipeline:
"""
Complete Box-Jenkins ARIMA workflow.
"""
def __init__(self, series: pd.Series, test_size: int = 30):
self.series = series
self.test_size = test_size
self.train = series[:-test_size]
self.test = series[-test_size:]
self.model_fit = None
self.best_order = None
# ── Step 1: Identification ─────────────────────────────────────────────
def step1_identify(self) -> dict:
"""Determine d, p, q from ADF test and ACF/PACF."""
print("=" * 60)
print("STEP 1: IDENTIFICATION")
print("=" * 60)
# Determine d: difference until stationary
d = 0
current = self.train.values.copy()
while d <= 2:
adf_stat, adf_p, _, _, crit, _ = adfuller(current)
print(f"ADF test (d={d}): stat={adf_stat:.4f}, p={adf_p:.4f} "
f"→ {'STATIONARY' if adf_p < 0.05 else 'NON-STATIONARY'}")
if adf_p < 0.05:
break
current = np.diff(current)
d += 1
print(f"\nDifferencing order d = {d}")
# After differencing, read ACF and PACF
acf_vals = acf(current, nlags=20, fft=True)
pacf_vals = pacf(current, nlags=20, method='ywmle')
ci = 1.96 / np.sqrt(len(current))
sig_acf = [k for k in range(1, 21) if abs(acf_vals[k]) > ci]
sig_pacf = [k for k in range(1, 21) if abs(pacf_vals[k]) > ci]
print(f"Significant ACF lags: {sig_acf[:5]}")
print(f"Significant PACF lags: {sig_pacf[:5]}")
# Heuristic p, q
p_candidate = min(sig_pacf[:1], default=[1])[0] if sig_pacf else 1
q_candidate = min(sig_acf[:1], default=[1])[0] if sig_acf else 1
print(f"\nInitial candidates: p={p_candidate}, d={d}, q={q_candidate}")
return {'d': d, 'p_initial': p_candidate, 'q_initial': q_candidate,
'sig_acf': sig_acf, 'sig_pacf': sig_pacf}
# ── Step 2: Estimation (grid search over p, q) ─────────────────────────
def step2_estimate(
self,
d: int = 1,
p_range: range = range(0, 4),
q_range: range = range(0, 4)
) -> pd.DataFrame:
"""Fit multiple ARIMA models, select by AIC."""
print("\n" + "=" * 60)
print("STEP 2: ESTIMATION (Model Selection by AIC)")
print("=" * 60)
results = []
for p, q in itertools.product(p_range, q_range):
try:
model = ARIMA(self.train, order=(p, d, q))
fitted = model.fit(method='innovations_mle')
results.append({
'p': p, 'd': d, 'q': q,
'aic': fitted.aic,
'bic': fitted.bic,
'order': (p, d, q)
})
except Exception:
pass # some configurations may not converge
df = pd.DataFrame(results).sort_values('aic')
print(f"\n{'Order':>15} | {'AIC':>10} | {'BIC':>10}")
print("-" * 40)
for _, row in df.head(5).iterrows():
order_str = f"({row['p']},{row['d']},{row['q']})"
print(f"{order_str:>15} | {row['aic']:>10.2f} | {row['bic']:>10.2f}")
best = df.iloc[0]
self.best_order = (int(best['p']), int(best['d']), int(best['q']))
print(f"\nBest model by AIC: ARIMA{self.best_order}")
# Fit best model
self.model_fit = ARIMA(self.train, order=self.best_order).fit()
print(f"\nModel Parameters:")
print(self.model_fit.summary().tables[1])
return df
# ── Step 3: Diagnostic Checking ─────────────────────────────────────────
def step3_diagnostics(self) -> dict:
"""Check residuals for white noise."""
print("\n" + "=" * 60)
print("STEP 3: DIAGNOSTIC CHECKING")
print("=" * 60)
residuals = self.model_fit.resid
# Ljung-Box test on residuals
lb_results = acorr_ljungbox(residuals, lags=[10, 20], return_df=True)
lb_p10 = lb_results['lb_pvalue'].iloc[0]
lb_p20 = lb_results['lb_pvalue'].iloc[1]
# Residual ACF (should all be insignificant)
res_acf = acf(residuals, nlags=20, fft=True)
ci = 1.96 / np.sqrt(len(residuals))
sig_res_acf = [k for k in range(1, 21) if abs(res_acf[k]) > ci]
# Normality test
from scipy.stats import shapiro, normaltest
_, normality_p = normaltest(residuals) # D'Agostino-Pearson
diagnostics = {
'lb_p_lag10': lb_p10,
'lb_p_lag20': lb_p20,
'residuals_white_noise': lb_p10 > 0.05 and lb_p20 > 0.05,
'sig_residual_acf_lags': sig_res_acf,
'normality_p': normality_p,
'residual_mean': residuals.mean(),
'residual_std': residuals.std()
}
print(f"Residual diagnostics:")
print(f" Ljung-Box p (lag 10): {lb_p10:.4f} "
f"→ {'✓ white noise' if lb_p10 > 0.05 else '✗ autocorrelation remains'}")
print(f" Ljung-Box p (lag 20): {lb_p20:.4f} "
f"→ {'✓ white noise' if lb_p20 > 0.05 else '✗ autocorrelation remains'}")
print(f" Sig. residual ACF lags: {sig_res_acf[:5]}")
print(f" Normality p-value: {normality_p:.4f} "
f"→ {'✓ normal' if normality_p > 0.05 else 'non-normal'}")
print(f" Residual mean: {residuals.mean():.6f} (should be ~0)")
if sig_res_acf:
print(f"\n ⚠ Significant residual ACF at lags {sig_res_acf[:3]}")
print(f" Consider increasing p or q to capture remaining structure.")
return diagnostics
# ── Step 4: Forecasting ─────────────────────────────────────────────────
def step4_forecast(self, steps: int = None) -> pd.DataFrame:
"""Generate forecasts with confidence intervals."""
print("\n" + "=" * 60)
print("STEP 4: FORECASTING")
print("=" * 60)
if steps is None:
steps = self.test_size
# In-sample fit
in_sample = self.model_fit.fittedvalues
# Out-of-sample forecast
forecast_result = self.model_fit.get_forecast(steps=steps)
forecast_mean = forecast_result.predicted_mean
forecast_ci = forecast_result.conf_int(alpha=0.05)
# Evaluation
# Align test with forecast (handle index alignment)
y_true = self.test.values
y_pred = forecast_mean.values[:len(y_true)]
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae = mean_absolute_error(y_true, y_pred)
mape = np.mean(np.abs((y_true - y_pred) / (y_true + 1e-8))) * 100
print(f"\nForecast metrics on {steps}-step ahead:")
print(f" RMSE: {rmse:.4f}")
print(f" MAE: {mae:.4f}")
print(f" MAPE: {mape:.2f}%")
# Naive baseline (last observed value)
naive_pred = np.full(len(y_true), self.train.iloc[-1])
naive_rmse = np.sqrt(mean_squared_error(y_true, naive_pred))
print(f"\n Naive baseline RMSE: {naive_rmse:.4f}")
print(f" ARIMA improvement: {(naive_rmse - rmse)/naive_rmse:.1%}")
return pd.DataFrame({
'forecast': forecast_mean,
'lower_95': forecast_ci.iloc[:, 0],
'upper_95': forecast_ci.iloc[:, 1]
})
# Run the full pipeline
pipeline = BoxJenkinsPipeline(series, test_size=30)
ident = pipeline.step1_identify()
model_df = pipeline.step2_estimate(d=ident['d'])
diag = pipeline.step3_diagnostics()
forecasts = pipeline.step4_forecast()
Model Selection: AIC vs BIC
Both AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) balance goodness-of-fit against model complexity.
where is the maximized likelihood, is the number of parameters, and is the sample size.
Choose lower AIC/BIC.
| Criterion | Penalty | Preference | When to use |
|---|---|---|---|
| AIC | Penalizes less | When forecasting accuracy is the goal; tends to select larger models | |
| BIC | Penalizes more | When interpretability matters; tends to select smaller models | |
| AICc | AIC corrected | Small samples () - use AICc instead of AIC |
def aic_bic_comparison(
series: pd.Series,
d: int = 1,
max_p: int = 4,
max_q: int = 4,
criterion: str = 'aic'
) -> tuple:
"""Grid search for best ARIMA order using AIC or BIC."""
best_score = np.inf
best_order = None
scores = {}
for p in range(max_p + 1):
for q in range(max_q + 1):
if p == 0 and q == 0:
continue
try:
model = ARIMA(series, order=(p, d, q))
fitted = model.fit()
score = fitted.aic if criterion == 'aic' else fitted.bic
scores[(p, d, q)] = score
if score < best_score:
best_score = score
best_order = (p, d, q)
except Exception:
pass
return best_order, best_score, scores
best_order_aic, best_aic, _ = aic_bic_comparison(series, d=1, criterion='aic')
best_order_bic, best_bic, _ = aic_bic_comparison(series, d=1, criterion='bic')
print(f"Best by AIC: ARIMA{best_order_aic}, AIC={best_aic:.2f}")
print(f"Best by BIC: ARIMA{best_order_bic}, BIC={best_bic:.2f}")
SARIMA in Practice: Monthly Retail Sales
def fit_sarima_monthly(
series: pd.Series,
test_months: int = 24
) -> dict:
"""
Fit SARIMA model to monthly data with annual seasonality.
Classic application: retail sales, energy consumption,
subscription counts - any KPI with monthly/quarterly patterns.
"""
train = series[:-test_months]
test = series[-test_months:]
print("Fitting SARIMA models for monthly data...")
print("Seasonal period: 12 months\n")
# Common SARIMA specifications for monthly data
sarima_configs = [
{'order': (1,1,1), 'seasonal_order': (1,1,1,12)},
{'order': (0,1,1), 'seasonal_order': (0,1,1,12)}, # airline model
{'order': (1,1,0), 'seasonal_order': (1,1,0,12)},
{'order': (2,1,1), 'seasonal_order': (1,1,1,12)},
{'order': (0,1,1), 'seasonal_order': (1,1,0,12)},
]
results = []
for config in sarima_configs:
try:
model = SARIMAX(
train,
order=config['order'],
seasonal_order=config['seasonal_order'],
enforce_stationarity=False,
enforce_invertibility=False
)
fitted = model.fit(disp=False)
# Forecast
forecast = fitted.get_forecast(steps=test_months)
y_pred = forecast.predicted_mean.values
y_true = test.values
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
results.append({
'config': f"SARIMA{config['order']}{config['seasonal_order']}",
'aic': fitted.aic,
'bic': fitted.bic,
'test_rmse': rmse,
'fitted': fitted
})
except Exception as e:
pass
df = pd.DataFrame([{k: v for k, v in r.items() if k != 'fitted'}
for r in results]).sort_values('aic')
print(f"{'Model':>40} | {'AIC':>8} | {'BIC':>8} | {'RMSE':>8}")
print("-" * 72)
for _, row in df.iterrows():
print(f"{row['config']:>40} | {row['aic']:>8.2f} | {row['bic']:>8.2f} | {row['test_rmse']:>8.4f}")
best_result = min(results, key=lambda x: x['aic'])
return best_result
# Generate monthly sales data with seasonality
np.random.seed(0)
n_months = 120 # 10 years of monthly data
t = np.arange(n_months)
# Seasonal pattern + trend + AR(1) structure
seasonal = 50 * np.sin(2 * np.pi * t / 12) + 20 * np.sin(2 * np.pi * 2 * t / 12)
trend = 2 * t
ar_noise = np.zeros(n_months)
for i in range(1, n_months):
ar_noise[i] = 0.7 * ar_noise[i-1] + np.random.normal(0, 15)
sales = 500 + trend + seasonal + ar_noise
sales_series = pd.Series(sales)
best = fit_sarima_monthly(sales_series, test_months=24)
print(f"\nBest model: {best['config']}, RMSE={best['test_rmse']:.4f}")
ARIMA vs Deep Learning: When Does Each Win?
def benchmark_arima_vs_naive(
series: pd.Series,
test_size: int = 24,
arima_order: tuple = (1,1,1)
) -> pd.DataFrame:
"""
Benchmark ARIMA against naive baselines.
ARIMA should beat these baselines; if not, something is wrong.
"""
train = series[:-test_size].values
test = series[-test_size:].values
results = {}
# Naive: last value
naive_last = np.full(test_size, train[-1])
results['Naive (last value)'] = np.sqrt(mean_squared_error(test, naive_last))
# Naive: seasonal (same period last year, for monthly data)
if len(train) >= 12:
naive_seasonal = train[-12:][:test_size] if test_size <= 12 else np.tile(train[-12:], 3)[:test_size]
results['Naive (seasonal)'] = np.sqrt(mean_squared_error(test, naive_seasonal))
# ARIMA
try:
model = ARIMA(train, order=arima_order)
fitted = model.fit()
arima_pred = fitted.forecast(steps=test_size)
results['ARIMA'] = np.sqrt(mean_squared_error(test, arima_pred))
except Exception as e:
results['ARIMA'] = float('nan')
# Exponential smoothing (simple)
from statsmodels.tsa.holtwinters import ExponentialSmoothing
hw_model = ExponentialSmoothing(train, trend='add', seasonal='add',
seasonal_periods=12).fit(optimized=True)
hw_pred = hw_model.forecast(test_size)
results['Holt-Winters'] = np.sqrt(mean_squared_error(test, hw_pred))
df = pd.DataFrame.from_dict(results, orient='index', columns=['RMSE'])
df = df.sort_values('RMSE')
print("Benchmark Results (RMSE):")
print(f"{'Model':<30} | {'RMSE':>10}")
print("-" * 45)
for model_name, row in df.iterrows():
print(f"{model_name:<30} | {row['RMSE']:>10.4f}")
return df
benchmark_arima_vs_naive(sales_series)
Decision framework for ARIMA vs neural networks:
| Factor | ARIMA wins | Neural net wins |
|---|---|---|
| Series length | < 200 observations | > 1000+ observations |
| Computational budget | Low (CPU, seconds) | High (GPU, hours) |
| Interpretability required | High (coefficients matter) | Low (black box OK) |
| Seasonality | Well-defined seasonal period | Complex, multi-scale |
| Exogenous variables | Few, linear relationships | Many, complex interactions |
| Distribution shift | Stable (in-distribution) | Adaptive models handle shift |
| Multiple series | Each series separate model | Single model generalizes |
Auto-ARIMA: Automated Order Selection
def auto_arima_simple(
series: pd.Series | np.ndarray,
max_p: int = 5,
max_q: int = 5,
max_d: int = 2,
criterion: str = 'aic',
verbose: bool = True
) -> tuple:
"""
Simple auto-ARIMA implementation: grid search with AIC/BIC.
For production, use:
- pmdarima.auto_arima (battle-tested, handles edge cases)
- statsforecast.AutoARIMA (fastest implementation, Nixtla)
"""
if isinstance(series, np.ndarray):
series = pd.Series(series)
# Determine d by ADF test
d = 0
current = series.values.copy()
while d < max_d:
_, adf_p, *_ = adfuller(current)
if adf_p < 0.05:
break
current = np.diff(current)
d += 1
best_score = np.inf
best_order = None
best_model = None
for p in range(max_p + 1):
for q in range(max_q + 1):
if p == 0 and q == 0:
continue
try:
model = ARIMA(series, order=(p, d, q))
fitted = model.fit(method='innovations_mle')
score = fitted.aic if criterion == 'aic' else fitted.bic
if score < best_score:
best_score = score
best_order = (p, d, q)
best_model = fitted
except Exception:
pass
if verbose:
print(f"Auto-ARIMA result: ARIMA{best_order}, {criterion.upper()}={best_score:.2f}")
return best_order, best_model
order, model = auto_arima_simple(series)
Online Forecasting: Recursive Update
In production ML systems, you often need to update forecasts as new data arrives, without refitting from scratch.
def recursive_arima_forecast(
initial_train: np.ndarray,
new_observations: np.ndarray,
order: tuple = (1, 1, 1),
refit_every: int = 10
) -> np.ndarray:
"""
Rolling walk-forward ARIMA forecast.
Strategy 1: Refit the model every `refit_every` periods
Strategy 2: Use model.apply() to update state without full refit (faster)
In production systems:
- Full refit: expensive but gets updated parameter estimates
- State update (Kalman filter approach): O(1) per observation
- Compromise: refit every N periods, state-update in between
"""
all_data = list(initial_train)
forecasts = []
current_model = ARIMA(all_data, order=order).fit(method='innovations_mle')
for i, y_new in enumerate(new_observations):
# Forecast next step
fc = current_model.forecast(steps=1)[0]
forecasts.append(fc)
# Add new observation
all_data.append(y_new)
# Refit periodically
if (i + 1) % refit_every == 0:
current_model = ARIMA(all_data, order=order).fit(method='innovations_mle')
return np.array(forecasts)
# Run walk-forward evaluation
train_size = 200
initial = series.values[:train_size]
holdout = series.values[train_size:]
rolling_preds = recursive_arima_forecast(initial, holdout, order=best_order_aic)
rolling_rmse = np.sqrt(mean_squared_error(holdout, rolling_preds))
print(f"\nWalk-forward ARIMA RMSE: {rolling_rmse:.4f}")
Interview Questions
Q1: Explain ARIMA(p,d,q) - what does each component mean?
AR(p) - Autoregressive of order p: The current value depends on the last observed values. Each lag has a coefficient . Captures "momentum" and "mean reversion" patterns. PACF cuts off at lag .
I(d) - Integrated of order d: The series is differenced times to achieve stationarity. removes linear trends (the difference is stationary). removes quadratic trends. Most real-world series need or .
MA(q) - Moving Average of order q: The current value depends on the last noise shocks . Captures how shocks propagate through the series. ACF cuts off at lag .
Combined ARIMA(p,d,q): Take the -th difference of the series to get a stationary series, then fit ARMA(p,q) to it.
Practical identification:
- Run ADF test → if non-stationary, difference once (d=1) and re-test
- Look at ACF of differenced series: if cuts off at q → MA(q) component
- Look at PACF of differenced series: if cuts off at p → AR(p) component
- Confirm with AIC/BIC grid search
Q2: What is the difference between AIC and BIC, and which should you use?
Both penalize model complexity (number of parameters ) to prevent overfitting:
BIC penalizes more for large (since when ), so BIC selects simpler models than AIC when sample sizes are moderate to large.
AIC is optimal for forecasting: It minimizes one-step-ahead prediction error asymptotically (Akaike, 1974). If your goal is accurate forecasts, use AIC.
BIC is consistent: It selects the true model order as (if the true model is in the candidate set). If you want to identify the true data-generating process, use BIC.
Practical rule of thumb:
- AIC → when you care about forecast accuracy (common in ML)
- BIC → when you care about model parsimony/interpretability
- If AIC and BIC agree: high confidence in the selected model
- If they disagree: AIC selects a larger model; compare both on a validation set
For small samples (), use AICc (corrected AIC) to avoid overfitting.
Q3: Why do you check residuals after fitting ARIMA? What patterns indicate a misspecified model?
After fitting ARIMA, the residuals should be white noise - no remaining temporal structure. If they're not, the model failed to capture some pattern.
What to check:
-
Ljung-Box test on residuals: Test : residuals are white noise. Reject () → model is misspecified.
-
Residual ACF and PACF: Should be insignificant at all lags.
- Significant spike at lag in ACF → add MA() term
- Significant spike at lag in PACF → add AR() term
- Spikes at seasonal lags (12, 24) → add seasonal SARIMA components
-
Residual mean ≈ 0: Non-zero mean → include intercept term.
-
Residual normality: For valid prediction intervals, residuals should be approximately normal. Non-normality doesn't affect point forecasts but invalidates confidence intervals.
-
Ljung-Box on squared residuals: Detects ARCH effects (heteroskedasticity). If significant → residuals have non-constant variance; fit GARCH after ARIMA.
Common misspecification patterns and fixes:
- ACF residual spike at lag 1 → q too small, increase q
- PACF residual spike at lag 2 → p too small, increase p
- Slow ACF decay in residuals → d insufficient, increase d
- Seasonal ACF spikes in residuals → need SARIMA
- Residual variance increases over time → transform series (log), or fit GARCH
Q4: When should you use ARIMA vs a neural network for forecasting?
Use ARIMA when:
- Short series (< 200 observations) - neural nets overfit
- Need prediction intervals (ARIMA has exact distributional theory)
- Interpretability required (AR/MA coefficients have clear meaning)
- Fast inference / no GPU (ARIMA is CPU-only, milliseconds)
- Strong simple seasonality (SARIMA handles this naturally)
- Prototyping / establishing a baseline
- Business stakeholders need to understand the model
Use neural networks when:
- Long series (1000+ observations)
- Multiple related series (one model generalizes across series)
- Rich exogenous covariates (promotional variables, weather, macroeconomic)
- Complex non-linear dynamics
- Irregular seasonality or multiple seasonal periods
- Transfer learning across series (pre-trained Chronos, TimeGPT)
- When you've proven ARIMA can't capture the dynamics
The key discipline: Always fit ARIMA first and measure its performance. A neural network that underperforms ARIMA is a red flag - either the data doesn't have complex temporal structure worth learning, or the neural net is improperly configured.
In practice, ensembles often win: ARIMA captures the linear temporal structure; a gradient boosted model captures non-linear residuals.
Q5: What are unit roots and why do they matter for ARIMA?
A unit root occurs when the AR characteristic polynomial has a root exactly on the unit circle (modulus = 1). The simplest example is AR(1) with :
This is a random walk - a unit root process. It is non-stationary: its variance grows without bound over time ().
Why unit roots matter for ARIMA:
-
Stationarity requirement: Standard ARIMA assumes the differenced series is stationary. If the original series has a unit root, differencing removes it and gives a stationary ARMA process.
-
Hypothesis testing: The ADF (Augmented Dickey-Fuller) test tests : unit root present vs : stationary. You must reject before fitting ARMA. If is not rejected → difference first.
-
Spurious regression risk: Regressing two independent random walks (both with unit roots) produces statistically significant coefficients even though they're completely unrelated. This is why you cannot apply OLS to non-stationary series directly.
-
Integration order: A series with unit roots is (integrated of order ). The ARIMA differencing parameter is the number of unit roots to remove.
-
Cointegration: If two series have a linear combination that is (stationary), they are cointegrated - they share a common stochastic trend. This is handled by VECM (Vector Error Correction Model), not ARIMA.
Key Takeaways
- AR(p): current value depends on past values; MA(q): depends on past shocks; ARIMA(p,d,q): difference times then fit ARMA(p,q)
- SARIMA handles seasonal patterns at period
- Box-Jenkins methodology: Identify (ADF + ACF/PACF) → Estimate (MLE) → Diagnose (residual ACF, Ljung-Box) → Forecast
- Use AIC for forecasting accuracy, BIC for model parsimony
- Residuals must be white noise - persistent residual ACF indicates model misspecification
- Always fit ARIMA before neural networks - it's the baseline you must beat and explain
Next: Lesson 05: State Space Models →
:::tip 🎮 Interactive Playground
Visualize this concept: Try the ARIMA Explorer demo on the EngineersOfAI Playground - no code required.
:::
