Skip to main content

Eigenvalues and Eigenvectors - How PageRank Ranks the Internet

Reading time: ~22 minutes | Level: Mathematical Foundations → ML Engineering

PageRank - the algorithm that made Google the dominant search engine - is computed by finding the principal eigenvector of a graph's link adjacency matrix. The company ran this computation on billions of web pages in the late 1990s. Eigenvalues are not a math exercise. They are how search engines rank the internet.

In ML, eigenvalues appear everywhere:

  • PCA's principal components are the eigenvectors of the data covariance matrix
  • The variance explained by each component is the corresponding eigenvalue
  • Spectral clustering cuts graphs using the eigenvectors of the graph Laplacian
  • The conditioning of an optimization problem depends on the eigenvalue ratio of the Hessian
  • Attention heads in transformers operate in eigenspaces of the weight matrices

This lesson teaches you to see eigenvalues not as algebraic tricks, but as the natural language of linear transformations.

What You Will Learn

  • What an eigenvector is geometrically: directions a matrix does not rotate
  • The characteristic polynomial: intuition, not memorization
  • Eigendecomposition: when it exists and what it reveals
  • Real symmetric matrices: guaranteed real eigenvalues and orthogonal eigenvectors
  • Power iteration: how eigenvalues are computed in practice (PageRank)
  • NumPy: np.linalg.eig, np.linalg.eigh, verification of eigenpairs
  • ML connections: PCA, covariance matrix, spectral methods, optimization conditioning

Prerequisites

Part 1 - What an Eigenvector Really Is

When you apply a matrix A to most vectors, two things happen: the vector rotates (changes direction) and scales (changes magnitude). But there are special directions that a matrix does not rotate - it only scales them.

An eigenvector of a matrix A is a non-zero vector v such that:

Av = λv

where λ is a scalar called the eigenvalue. The matrix transforms v into a scaled version of itself - same direction (or exact opposite), different length.

Geometric intuition

Think of it this way: if you apply a transformation to a rubber sheet, almost every point moves in a complicated way. But there are special directions - the eigenvector directions - along which points only slide back and forth (scale), never rotating.

A concrete example

Consider the matrix:

A = [3 1]
[0 2]

Apply A to the vector v₁ = [1, 0]:

Av₁ = [3·1 + 1·0] [3]
[0·1 + 2·0] = [0] = 3·[1, 0]

So v₁ = [1, 0] is an eigenvector with eigenvalue λ₁ = 3.

Apply A to the vector v₂ = [1, -1]:

Av₂ = [3·1 + 1·(-1)] [2]
[0·1 + 2·(-1)] = [-2] = 2·[1, -1]

So v₂ = [1, -1] is an eigenvector with eigenvalue λ₂ = 2.

Why eigenvectors reveal a matrix's "personality"

The eigenvectors of a matrix are its invariant directions - the directions along which the transformation is simplest (pure scaling). The eigenvalues tell you how much scaling occurs in each direction.

A matrix's eigenvalues determine:

  • Whether the transformation expands (|λ| > 1) or contracts (|λ| < 1) along each direction
  • Whether the transformation is invertible (all λ ≠ 0)
  • The long-run behavior of repeated applications (Aⁿv)

Part 2 - The Characteristic Polynomial (Intuition)

To find the eigenvalues of A, we need to solve:

Av = λv
Av - λv = 0
(A - λI)v = 0

For a non-zero vector v to satisfy (A - λI)v = 0, the matrix (A - λI) must be singular (non-invertible). A matrix is singular if and only if its determinant is zero:

det(A - λI) = 0

This equation is called the characteristic equation. Expanding the determinant gives a polynomial in λ called the characteristic polynomial. Its roots are the eigenvalues.

For a 2×2 matrix:

A = [a b]
[c d]

det(A - λI) = det([a-λ b ]) = (a-λ)(d-λ) - bc = 0
[c d-λ]

λ² - (a+d)λ + (ad-bc) = 0
λ² - tr(A)λ + det(A) = 0

Key insight: the sum of eigenvalues = trace(A) and the product of eigenvalues = det(A). These two facts let you verify your eigenvalue calculations.

Computing eigenvalues in NumPy

import numpy as np

# Example matrix
A = np.array([[3.0, 1.0],
[0.0, 2.0]])

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}") # [3. 2.]
print(f"Eigenvectors (columns):\n{eigenvectors}") # v₁=[1,0], v₂=[...,-1]

# Verify: Av = λv for each eigenpair
for i, (lam, v) in enumerate(zip(eigenvalues, eigenvectors.T)):
Av = A @ v
lam_v = lam * v
print(f"\nEigenpair {i+1}:")
print(f" Av = {Av}")
print(f" λv = {lam_v}")
print(f" Match: {np.allclose(Av, lam_v)}")

# Verify trace and determinant relationships
print(f"\nTrace(A) = {np.trace(A)}") # 5
print(f"Sum of eigenvalues = {eigenvalues.sum()}") # 5 ✓
print(f"Det(A) = {np.linalg.det(A):.1f}") # 6
print(f"Product of eigenvalues = {eigenvalues.prod():.1f}") # 6 ✓

Part 3 - Eigendecomposition: When a Matrix Splits Into Its Essence

If an n×n matrix A has n linearly independent eigenvectors v₁, v₂, ..., vₙ with eigenvalues λ₁, λ₂, ..., λₙ, then A can be eigendecomposed:

A = V Λ V⁻¹

Where:

  • V = [v₁ | v₂ | ... | vₙ] is the matrix whose columns are eigenvectors
  • Λ = diag(λ₁, λ₂, ..., λₙ) is the diagonal matrix of eigenvalues
  • V⁻¹ is the inverse of V

Why eigendecomposition is powerful

The eigendecomposition reveals the matrix's intrinsic structure:

  1. Matrix powers are easy: Aⁿ = V Λⁿ V⁻¹ (just raise the diagonal entries to the n-th power)
  2. Matrix functions are easy: f(A) = V f(Λ) V⁻¹ (apply f to each eigenvalue)
  3. Rank is visible: rank(A) = number of non-zero eigenvalues

When eigendecomposition exists

Not all matrices have eigendecompositions:

  • Eigendecomposition exists iff A has n linearly independent eigenvectors
  • Symmetric matrices always have eigendecomposition (with orthogonal eigenvectors)
  • Non-symmetric matrices may not have eigendecomposition (use SVD instead - Lesson 04)
import numpy as np

# Full eigendecomposition
A = np.array([[4.0, 2.0],
[1.0, 3.0]])

eigenvalues, V = np.linalg.eig(A)
Lambda = np.diag(eigenvalues)
V_inv = np.linalg.inv(V)

# Verify: A = V Λ V⁻¹
A_reconstructed = V @ Lambda @ V_inv
print(f"Reconstruction error: {np.linalg.norm(A - A_reconstructed):.2e}") # ≈ 0

# Compute matrix power using eigendecomposition
n = 10
A_power_10 = V @ np.diag(eigenvalues**10) @ V_inv
A_power_10_direct = np.linalg.matrix_power(A, 10)
print(f"A^10 match: {np.allclose(A_power_10, A_power_10_direct)}") # True

# Compute matrix exponential (used in ODE solvers, graph Laplacians)
A_exp = V @ np.diag(np.exp(eigenvalues)) @ V_inv
print(f"Matrix exponential computed via eigendecomposition")

Part 4 - Real Symmetric Matrices: The Special Case ML Engineers Must Know

Real symmetric matrices (A = Aᵀ) have extraordinary properties that make them central to ML:

The Spectral Theorem

Every real symmetric matrix has:

  1. All real eigenvalues (no complex numbers)
  2. Orthonormal eigenvectors (eigenvectors are perpendicular and unit length)
  3. Always diagonalizable (n independent eigenvectors always exist)

This means for a symmetric matrix S, the eigendecomposition is:

S = Q Λ Qᵀ (note: Qᵀ instead of Q⁻¹ because Q is orthogonal: Qᵀ = Q⁻¹)

Why symmetric matrices are everywhere in ML

Almost every matrix that appears in statistics and ML is symmetric:

MatrixWhy it's symmetricML use
Covariance matrix ΣΣᵢⱼ = Σⱼᵢ by definitionPCA, Gaussian models
Gram matrix XᵀX(XᵀX)ᵀ = XᵀXRidge regression, kernel methods
Correlation matrixSame as covariance, just normalizedFeature analysis
Graph Laplacian LL = D - A, both symmetricSpectral clustering, GNNs
Hessian matrix HH = ∂²L/∂θ² (for smooth L)Newton's method, curvature
import numpy as np

# Symmetric matrix - use np.linalg.eigh (not eig!)
# eigh is specifically for symmetric/Hermitian matrices:
# - Guaranteed real eigenvalues
# - Eigenvectors orthogonal by construction
# - Returns eigenvalues in ascending order
# - Faster and more numerically stable than eig

n = 5
A = np.random.randn(n, n)
S = A.T @ A # Always symmetric positive semidefinite

# Use eigh for symmetric matrices
eigenvalues, Q = np.linalg.eigh(S) # eigenvalues in ascending order

print(f"Eigenvalues (all real, ≥ 0): {eigenvalues}")
print(f"Are eigenvectors orthonormal? {np.allclose(Q.T @ Q, np.eye(n))}") # True

# Reconstruction: S = Q Λ Qᵀ
S_reconstructed = Q @ np.diag(eigenvalues) @ Q.T
print(f"Reconstruction error: {np.linalg.norm(S - S_reconstructed):.2e}") # ≈ 0

# Positive definite check: all eigenvalues > 0
print(f"Positive definite: {np.all(eigenvalues > 0)}")

# Positive semidefinite check: all eigenvalues ≥ 0 (allowing for numerical noise)
print(f"PSD: {np.all(eigenvalues >= -1e-10)}")

:::tip Always use np.linalg.eigh for symmetric matrices np.linalg.eig works for any square matrix but returns complex numbers (even for real symmetric matrices, due to numerical precision). np.linalg.eigh is specifically optimized for symmetric/Hermitian matrices:

  • Returns real eigenvalues (guaranteed)
  • Returns orthonormal eigenvectors
  • Is faster and more numerically stable
  • Returns eigenvalues in ascending order (convenient for PCA: largest last)

Rule: if your matrix is symmetric (which it is for covariance, Gram, Laplacian, Hessian), always use eigh. :::

Part 5 - Power Iteration: How PageRank Is Actually Computed

For a large matrix (billions × billions in Google's case), directly computing eigenvalues via the characteristic polynomial is infeasible. Power iteration is the practical algorithm for finding the largest eigenvalue (and corresponding eigenvector).

Algorithm

1. Start with a random vector v₀
2. Repeatedly: v_{k+1} = A v_k / ‖A v_k‖₂
3. Converges to: the eigenvector corresponding to the largest eigenvalue

Why it works

If we expand v₀ in the eigenvector basis: v₀ = c₁v₁ + c₂v₂ + ... + cₙvₙ

Then:

Aᵏv₀ = c₁λ₁ᵏv₁ + c₂λ₂ᵏv₂ + ... + cₙλₙᵏvₙ
= λ₁ᵏ(c₁v₁ + c₂(λ₂/λ₁)ᵏv₂ + ... + cₙ(λₙ/λ₁)ᵏvₙ)

If λ₁ is the largest eigenvalue (|λ₁| > |λᵢ| for i > 1), then (λᵢ/λ₁)ᵏ → 0 as k → ∞. The v₁ term dominates, and the iteration converges to v₁.

PageRank as power iteration

PageRank models the internet as a random walk: a "random surfer" follows links at random. The probability of being at each page after many steps converges to the principal eigenvector of the transition matrix.

import numpy as np

def power_iteration(
A: np.ndarray,
n_iterations: int = 100,
tol: float = 1e-8
) -> tuple:
"""
Find the dominant eigenvector and eigenvalue using power iteration.

Converges to the eigenvector corresponding to the largest |eigenvalue|.
Rate of convergence depends on |λ₁/λ₂| - larger gap = faster convergence.
"""
n = A.shape[0]
v = np.random.randn(n)
v /= np.linalg.norm(v) # start with unit vector

for i in range(n_iterations):
v_new = A @ v
eigenvalue = np.linalg.norm(v_new) # Rayleigh quotient approximation
v_new /= eigenvalue # normalize

if np.linalg.norm(v_new - v) < tol:
print(f"Converged at iteration {i+1}")
break
v = v_new

return eigenvalue, v

# Verify against np.linalg.eig
np.random.seed(42)
A = np.random.randn(5, 5)
A = A.T @ A # make symmetric positive definite

eigenval_pi, eigenvec_pi = power_iteration(A)

eigenvals_np, eigenvecs_np = np.linalg.eigh(A)
# eigh returns eigenvalues in ascending order → largest is last
largest_idx = np.argmax(eigenvals_np)

print(f"Power iteration λ₁: {eigenval_pi:.6f}")
print(f"NumPy λ₁: {eigenvals_np[largest_idx]:.6f}")
print(f"Match: {np.isclose(eigenval_pi, eigenvals_np[largest_idx], rtol=1e-4)}")

def pagerank(
adjacency: np.ndarray,
damping: float = 0.85,
n_iterations: int = 100
) -> np.ndarray:
"""
Compute PageRank scores via power iteration.

PageRank: r = damping * M @ r + (1-damping)/n * ones
where M is the column-normalized adjacency matrix.
"""
n = adjacency.shape[0]

# Column-normalize the adjacency matrix (each column sums to 1)
col_sums = adjacency.sum(axis=0)
col_sums[col_sums == 0] = 1 # avoid division by zero (dangling nodes)
M = adjacency / col_sums # transition probability matrix

# Power iteration on the PageRank equation
r = np.ones(n) / n # uniform initial distribution
teleport = (1 - damping) / n * np.ones(n)

for _ in range(n_iterations):
r_new = damping * M @ r + teleport
if np.linalg.norm(r_new - r) < 1e-10:
break
r = r_new

return r

# Example: 4-node graph
# 0 → 1, 0 → 2, 1 → 3, 2 → 1, 2 → 3, 3 → 0
adj = np.array([[0, 0, 0, 1],
[1, 0, 1, 0],
[1, 0, 0, 0],
[0, 1, 1, 0]], dtype=float)

ranks = pagerank(adj)
print(f"\nPageRank scores: {ranks}")
print(f"Highest-ranked node: {ranks.argmax()}")

Part 6 - NumPy: Complete Eigenvalue Reference

import numpy as np

# ── General matrices: np.linalg.eig ───────────────────────────────────────
A = np.array([[4, 2],
[1, 3]], dtype=float)

# Returns: eigenvalues (complex in general), eigenvectors as columns
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}") # [5. 2.]
print(f"Eigenvectors:\n{eigenvectors}") # columns are eigenvectors

# ── Symmetric matrices: np.linalg.eigh ────────────────────────────────────
S = A.T @ A # guaranteed symmetric

# Returns: real eigenvalues (ascending), orthonormal eigenvectors
eigenvalues_s, Q = np.linalg.eigh(S)
print(f"\nSymmetric eigenvalues (ascending): {eigenvalues_s}")
print(f"Orthonormal: {np.allclose(Q.T @ Q, np.eye(len(eigenvalues_s)))}")

# ── Eigenvalues only (faster than eig/eigh if eigenvectors not needed) ────
eigenvalues_only = np.linalg.eigvalsh(S) # symmetric
eigenvalues_only_gen = np.linalg.eigvals(A) # general

# ── Rayleigh quotient: eigenvalue of v w.r.t. A ───────────────────────────
def rayleigh_quotient(A: np.ndarray, v: np.ndarray) -> float:
"""
Estimate eigenvalue for a given vector v.
Equals the exact eigenvalue when v is an exact eigenvector.
"""
return float(v.T @ A @ v / (v.T @ v))

v = eigenvectors[:, 0] # first eigenvector
rq = rayleigh_quotient(A, v)
print(f"\nRayleigh quotient = {rq:.6f}")
print(f"Exact eigenvalue = {eigenvalues[0].real:.6f}")

# ── Condition number from eigenvalues (for symmetric PD matrices) ──────────
# For symmetric positive definite A: cond(A) = λmax / λmin
eigenvals_pos = np.linalg.eigvalsh(S)
cond_from_eigen = eigenvals_pos.max() / eigenvals_pos.min()
cond_from_linalg = np.linalg.cond(S)
print(f"\nCondition (from eigenvalues): {cond_from_eigen:.4f}")
print(f"Condition (np.linalg.cond): {cond_from_linalg:.4f}")

# ── PCA using eigendecomposition ──────────────────────────────────────────
def pca_via_eigen(X: np.ndarray, n_components: int) -> tuple:
"""
PCA via eigendecomposition of covariance matrix.
(sklearn uses SVD internally - see Lesson 04 for why)
"""
# Center the data
X_centered = X - X.mean(axis=0)
n = X.shape[0]

# Compute covariance matrix
cov = (X_centered.T @ X_centered) / (n - 1) # (d, d)

# Eigendecomposition of symmetric covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort in descending order (eigh returns ascending)
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

# Select top-k principal components
components = eigenvectors[:, :n_components] # (d, k)

# Project data
X_projected = X_centered @ components # (n, k)

# Explained variance ratio
explained_var = eigenvalues[:n_components] / eigenvalues.sum()

return X_projected, components, explained_var

# Test on synthetic data
np.random.seed(42)
n_samples, n_features = 200, 10
X = np.random.randn(n_samples, n_features) @ np.random.randn(n_features, 3) # rank-3 data

X_pca, components, var_ratio = pca_via_eigen(X, n_components=3)
print(f"\nPCA output shape: {X_pca.shape}") # (200, 3)
print(f"Explained variance: {var_ratio}") # ≈ [0.33, 0.33, 0.33]
print(f"Total explained: {var_ratio.sum():.4f}") # ≈ 1.0 (rank-3 data)

Part 7 - ML Connections: Eigenvalues in Real ML Systems

PCA: eigenvalues as explained variance

import numpy as np

# The eigenvalues of the covariance matrix ARE the variances
# along the principal component directions

np.random.seed(42)
n, d = 300, 5

# Create data with known variance structure
variances = [10.0, 5.0, 2.0, 0.5, 0.1] # variance along each principal axis
X = np.column_stack([
np.sqrt(var) * np.random.randn(n) for var in variances
])

# Compute covariance and its eigenvalues
X_centered = X - X.mean(axis=0)
cov = X_centered.T @ X_centered / (n - 1)
eigenvalues, _ = np.linalg.eigh(cov)
eigenvalues = eigenvalues[::-1] # descending

print("True variances:", variances)
print(f"Eigenvalues: {eigenvalues.round(2)}")
# Should match (up to sampling noise)

# Explained variance ratio (scree plot values)
explained = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained)
print(f"Explained variance ratio: {explained.round(3)}")
print(f"Cumulative explained: {cumulative.round(3)}")
# → 10/(10+5+2+0.5+0.1) ≈ 0.57 for first component

Spectral clustering: graph Laplacian eigenvectors

import numpy as np

def spectral_clustering(W: np.ndarray, k: int) -> np.ndarray:
"""
Spectral clustering: find k clusters using eigenvectors of the graph Laplacian.

Intuition: the k smallest eigenvectors of L encode the k-way partition
of the graph into weakly-connected components.
"""
n = W.shape[0]

# Graph Laplacian: L = D - W
# D is the diagonal degree matrix
degrees = W.sum(axis=1)
D = np.diag(degrees)
L = D - W # unnormalized Laplacian

# Normalized Laplacian: Lsym = D^(-1/2) L D^(-1/2)
D_inv_sqrt = np.diag(1.0 / np.sqrt(degrees + 1e-10))
L_sym = D_inv_sqrt @ L @ D_inv_sqrt

# The k smallest eigenvectors reveal the cluster structure
eigenvalues, eigenvectors = np.linalg.eigh(L_sym)
# eigh returns ascending order - take first k (smallest)
cluster_representation = eigenvectors[:, :k] # (n, k)

# k-means on the cluster_representation
# (in practice: sklearn.cluster.KMeans)
# Here: simple assignment for illustration
from numpy.random import choice
centers = cluster_representation[choice(n, k, replace=False)]

for _ in range(50): # k-means iterations
# Assign each point to nearest center
dists = np.array([np.linalg.norm(cluster_representation - c, axis=1)
for c in centers])
labels = np.argmin(dists, axis=0)
# Update centers
centers = np.array([cluster_representation[labels == j].mean(axis=0)
for j in range(k)])

return labels

# Note: the second smallest eigenvalue of L is the Fiedler value
# - it measures the "algebraic connectivity" of the graph
# Zero eigenvalues of L = number of disconnected components

Optimization: eigenvalue ratio = conditioning

import numpy as np

# The Hessian matrix of the loss function determines optimization difficulty
# Eigenvalue ratio = condition number = how "elongated" the loss landscape is

def simulate_gradient_descent_conditioning():
"""
Show how eigenvalue ratio affects gradient descent convergence.
Well-conditioned: eigenvalues similar → fast convergence
Ill-conditioned: eigenvalue ratio large → slow convergence
"""
np.random.seed(42)

for cond_number in [1.0, 10.0, 100.0, 1000.0]:
# Create a quadratic loss with given condition number
# Loss = 0.5 * x.T @ H @ x, optimal at x=0
# H has eigenvalues [1, cond_number]
Q = np.array([[1.0, 0.0],
[0.0, cond_number]]) # diagonal Hessian

# Gradient descent with learning rate η = 1/λmax
x = np.array([1.0, 1.0])
lr = 1.0 / cond_number

n_steps = 100
for step in range(n_steps):
grad = Q @ x # gradient of quadratic loss
x = x - lr * grad
if np.linalg.norm(x) < 1e-6:
print(f"cond={cond_number:6.0f}: converged at step {step+1}")
break
else:
print(f"cond={cond_number:6.0f}: not converged after {n_steps} steps, "
f"‖x‖={np.linalg.norm(x):.4f}")

simulate_gradient_descent_conditioning()

Part 8 - Common Failure Modes

:::danger Use eig vs eigh on symmetric matrices np.linalg.eig on a symmetric matrix can return complex eigenvalues due to floating-point errors, even though all eigenvalues are mathematically real. np.linalg.eigh guarantees real eigenvalues.

import numpy as np

S = np.array([[2.0, 1.0],
[1.0, 2.0]]) # symmetric

vals_eig, vecs_eig = np.linalg.eig(S)
vals_eigh, vecs_eigh = np.linalg.eigh(S)

print(f"eig eigenvalues: {vals_eig}") # might be [3.+0j, 1.+0j] (complex dtype)
print(f"eigh eigenvalues: {vals_eigh}") # [1.0, 3.0] (real, ascending)

# ALWAYS use eigh for covariance, Gram, Laplacian, Hessian matrices

:::

:::danger Eigenvectors are not unique - only directions are The eigenvector equation Av = λv is satisfied by any scalar multiple of v. NumPy returns unit eigenvectors, but the sign is arbitrary: v and -v are both valid eigenvectors. This matters when:

  • Comparing PCA results across runs (signs may flip)
  • Comparing eigenvectors from different implementations

Always compare directions, not the vectors themselves:

# WRONG: compare eigenvectors directly
assert np.allclose(v1, v2) # may fail due to sign flip

# RIGHT: compare cosine similarity (absolute value)
assert abs(np.dot(v1 / np.linalg.norm(v1),
v2 / np.linalg.norm(v2))) > 0.999

:::

:::tip The eigenvalue spectrum reveals model capacity For a neural network weight matrix W, np.linalg.eigvalsh(W.T @ W) gives the squared singular values. The number of large singular values = effective rank = how many independent directions the layer can represent. If most eigenvalues are near zero, the layer is low-rank and a LoRA-style decomposition could reduce it with little information loss. :::

Interview Questions

Q1: Why do symmetric matrices always have real eigenvalues?

Proof sketch (by contradiction): Suppose Sv = λv where S = Sᵀ and λ is complex.

Take the conjugate transpose: vᵀSᵀ = λ vᵀ → vᵀS = λ* v*ᵀ (since S = Sᵀ)

Multiply on the right by v: vᵀSv = λ v*ᵀv

But also from Sv = λv: vᵀSv = vᵀ(λv) = λ v*ᵀv

So λ vᵀv = λ vᵀv → (λ - λ) v*ᵀv = 0

Since v ≠ 0, vᵀv = ‖v‖² > 0, so λ = λ → λ is real.

ML implication: Covariance matrices, Gram matrices, graph Laplacians, and Hessians are all symmetric, so their eigenvalues are always real - you can always sort them, compare them, and use them as variance estimates.

Q2: What does a zero eigenvalue mean for an ML matrix?

A zero eigenvalue means the matrix is singular (non-invertible) and rank-deficient.

For the covariance matrix Σ of data X (n×d):

  • A zero eigenvalue means some feature is an exact linear combination of others (perfect multicollinearity)
  • In practice: zero eigenvalues usually indicate duplicate or highly correlated features
  • The number of zero eigenvalues = d - rank(X) = number of redundant dimensions

Practical consequences:

  • Cannot compute Σ⁻¹ (used in Mahalanobis distance, LDA, linear regression with ridge=0)
  • Ridge regression adds αI to ensure all eigenvalues ≥ α > 0 (no zero eigenvalues)
  • PCA discards components with zero (or near-zero) eigenvalues since they carry no variance

For a neural network weight matrix, a zero singular value (eigenvalue of WᵀW = 0) means that dimension is completely unused - motivation for weight pruning.

Q3: How is PCA related to eigendecomposition?

PCA finds the directions of maximum variance in data. Here is the mathematical chain:

  1. Center the data: X_c = X - mean(X)
  2. Compute covariance: Σ = X_cᵀ X_c / (n-1)
  3. Eigendecompose: Σ = Q Λ Qᵀ (symmetric, so orthogonal Q)
  4. Principal components: columns of Q (eigenvectors of Σ)
  5. Explained variance: eigenvalues in Λ (variance along each PC direction)
  6. Project: X_reduced = X_c @ Q[:, :k] (project onto top-k eigenvectors)

The key insight: eigenvectors of Σ are the directions of maximum variance because they solve the optimization problem max ‖Xv‖² subject to ‖v‖ = 1 - and the solution to this is the dominant eigenvector of XᵀX = (n-1)Σ.

In practice, sklearn.decomposition.PCA uses SVD (more numerically stable) rather than explicitly computing the covariance matrix eigendecomposition. Lesson 04 covers this.

Q4: Explain the connection between the trace, determinant, and eigenvalues.

For a matrix A with eigenvalues λ₁, λ₂, ..., λₙ:

Trace: tr(A) = λ₁ + λ₂ + ... + λₙ

  • The trace is the sum of eigenvalues
  • This follows from the characteristic polynomial: the coefficient of λⁿ⁻¹

Determinant: det(A) = λ₁ × λ₂ × ... × λₙ

  • The determinant is the product of eigenvalues
  • This follows from det(A - λI) evaluated at λ = 0

ML use cases:

  • Gaussian log-likelihood: log det(Σ) = Σᵢ log(λᵢ) - use np.linalg.slogdet for stability
  • Verify eigenvalue computation: check that sum = trace and product = det
  • Information content: trace(Σ) = total variance in data = sum of all feature variances
A = np.random.randn(4, 4)
A = A.T @ A # symmetric PD
eigs = np.linalg.eigvalsh(A)
print(f"trace(A) = {np.trace(A):.4f}")
print(f"sum(λᵢ) = {eigs.sum():.4f}")
print(f"det(A) = {np.linalg.det(A):.4f}")
print(f"prod(λᵢ) = {eigs.prod():.4f}")

Practice Challenges

Level 1: Predict

Challenge: Without computing, predict:

  1. What are the eigenvalues of the identity matrix Iₙ?
  2. What are the eigenvalues of 2·I₃?
  3. If A has eigenvalues {3, 5, 7}, what are the eigenvalues of A²?
Answer
  1. I: All eigenvalues = 1 (Iv = v = 1·v for any v, so every vector is an eigenvector with λ=1)
  2. 2I₃: All eigenvalues = 2 (2Iv = 2v, same reasoning)
  3. : eigenvalues are {9, 25, 49} - eigenvalues of Aⁿ are λₙ for each eigenvalue λ

Verification for (3): If Av = λv, then A²v = A(Av) = A(λv) = λ(Av) = λ(λv) = λ²v.

Level 2: Debug

Challenge: The following code is supposed to verify the eigendecomposition but prints False. Find the bug:

A = np.array([[1.0, 2.0], [3.0, 4.0]])
eigenvalues, V = np.linalg.eig(A)
A_reconstructed = V @ np.diag(eigenvalues) @ V
print(np.allclose(A, A_reconstructed)) # prints False
Answer

Bug: The reconstruction formula is A = V Λ V⁻¹, but the code uses V @ diag(λ) @ V (not V⁻¹).

The eigenvector matrix V is NOT generally orthogonal (that's only for symmetric matrices). For a general matrix, V⁻¹ ≠ Vᵀ.

# Fix 1: use V inverse
A_reconstructed = V @ np.diag(eigenvalues) @ np.linalg.inv(V)
print(np.allclose(A, A_reconstructed)) # True

# Fix 2: for SYMMETRIC matrices only, use V.T (since orthogonal → V⁻¹ = Vᵀ)
S = A.T @ A # make symmetric
eigenvalues_s, Q = np.linalg.eigh(S)
S_reconstructed = Q @ np.diag(eigenvalues_s) @ Q.T # Q.T = Q⁻¹ for orthogonal Q
print(np.allclose(S, S_reconstructed)) # True

Level 3: Design

Challenge: Implement the full PCA pipeline from scratch using eigendecomposition (not SVD). Then verify that your results match sklearn.decomposition.PCA.

Answer
import numpy as np

class PCAFromScratch:
"""PCA via eigendecomposition of covariance matrix."""

def __init__(self, n_components: int):
self.n_components = n_components
self.components_ = None # (n_components, n_features)
self.explained_variance_ = None
self.explained_variance_ratio_ = None
self.mean_ = None

def fit(self, X: np.ndarray) -> 'PCAFromScratch':
n, d = X.shape
self.mean_ = X.mean(axis=0)
X_centered = X - self.mean_

# Covariance matrix (d×d)
cov = X_centered.T @ X_centered / (n - 1)

# Eigendecomposition of symmetric matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort descending (eigh returns ascending)
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

# Store top k components
self.components_ = eigenvectors[:, :self.n_components].T # (k, d)
self.explained_variance_ = eigenvalues[:self.n_components]
self.explained_variance_ratio_ = eigenvalues[:self.n_components] / eigenvalues.sum()
return self

def transform(self, X: np.ndarray) -> np.ndarray:
X_centered = X - self.mean_
return X_centered @ self.components_.T # (n, k)

def fit_transform(self, X: np.ndarray) -> np.ndarray:
return self.fit(X).transform(X)

# Test
from sklearn.decomposition import PCA as SklearnPCA
np.random.seed(42)
X = np.random.randn(100, 10)
k = 3

my_pca = PCAFromScratch(n_components=k)
X_my = my_pca.fit_transform(X)

sk_pca = SklearnPCA(n_components=k)
X_sk = sk_pca.fit_transform(X)

# Note: sign of components may differ
print(f"Shape match: {X_my.shape == X_sk.shape}") # True
print(f"Explained variance ratio match: "
f"{np.allclose(my_pca.explained_variance_ratio_, sk_pca.explained_variance_ratio_)}")
# Note: sklearn uses SVD internally, but results should be equivalent

Quick Reference Cheatsheet

OperationMathNumPyNotes
Eigenvalues + vectors (general)Av=λvnp.linalg.eig(A)May return complex
Eigenvalues + vectors (symmetric)Sv=λvnp.linalg.eigh(S)Real, ascending order
Eigenvalues only (general)λnp.linalg.eigvals(A)Faster than eig
Eigenvalues only (symmetric)λnp.linalg.eigvalsh(S)Real, ascending
ReconstructionA = VΛV⁻¹V @ diag(λ) @ inv(V)General matrix
Reconstruction (symmetric)S = QΛQᵀQ @ diag(λ) @ Q.TOrthogonal Q
Trace = sum eigenvaluestr(A)np.trace(A)Quick sanity check
Det = product eigenvaluesdet(A)np.linalg.det(A)Quick sanity check
Rayleigh quotientvᵀAv/vᵀvv.T @ A @ v / v.T @ vEstimate eigenvalue
Condition number (SPD)λmax/λmineigenvalues[-1]/eigenvalues[0]After eigh

Key Takeaways

  • Eigenvectors are the invariant directions of a linear transformation - they are only scaled, not rotated, by the matrix
  • The eigenvalues measure how much scaling occurs along each eigenvector direction
  • Real symmetric matrices always have real eigenvalues and orthogonal eigenvectors - use np.linalg.eigh for covariance, Gram, and Laplacian matrices
  • Power iteration computes the dominant eigenvector iteratively - it is how PageRank is actually computed at scale
  • PCA finds principal components by eigendecomposing the covariance matrix - eigenvalues equal explained variance
  • The eigenvalue ratio (condition number) determines how quickly gradient descent converges - large ratio = slow convergence

Next: SVD and Matrix Decompositions - The Most Powerful Matrix Factorization →

:::tip 🎮 Interactive Playground

Visualize this concept: Try the Eigenvalues & Eigenvectors demo on the EngineersOfAI Playground - no code required.

:::

© 2026 EngineersOfAI. All rights reserved.