Eigenvalues and Eigenvectors - How PageRank Ranks the Internet

Reading time: ~22 minutes | Level: Mathematical Foundations → ML Engineering

PageRank - the algorithm that made Google the dominant search engine - is computed by finding the principal eigenvector of a graph's link adjacency matrix. The company ran this computation on billions of web pages in the late 1990s. Eigenvalues are not a math exercise. They are how search engines rank the internet.

In ML, eigenvalues appear everywhere:

PCA's principal components are the eigenvectors of the data covariance matrix
The variance explained by each component is the corresponding eigenvalue
Spectral clustering cuts graphs using the eigenvectors of the graph Laplacian
The conditioning of an optimization problem depends on the eigenvalue ratio of the Hessian
Attention heads in transformers operate in eigenspaces of the weight matrices

This lesson teaches you to see eigenvalues not as algebraic tricks, but as the natural language of linear transformations.

What You Will Learn

What an eigenvector is geometrically: directions a matrix does not rotate
The characteristic polynomial: intuition, not memorization
Eigendecomposition: when it exists and what it reveals
Real symmetric matrices: guaranteed real eigenvalues and orthogonal eigenvectors
Power iteration: how eigenvalues are computed in practice (PageRank)
NumPy: np.linalg.eig, np.linalg.eigh, verification of eigenpairs
ML connections: PCA, covariance matrix, spectral methods, optimization conditioning

Prerequisites

Part 1 - What an Eigenvector Really Is

When you apply a matrix A to most vectors, two things happen: the vector rotates (changes direction) and scales (changes magnitude). But there are special directions that a matrix does not rotate - it only scales them.

An eigenvector of a matrix A is a non-zero vector v such that:

Av = λv

where λ is a scalar called the eigenvalue. The matrix transforms v into a scaled version of itself - same direction (or exact opposite), different length.

Geometric intuition

Think of it this way: if you apply a transformation to a rubber sheet, almost every point moves in a complicated way. But there are special directions - the eigenvector directions - along which points only slide back and forth (scale), never rotating.

A concrete example

Consider the matrix:

A = [3  1]
    [0  2]

Apply A to the vector v₁ = [1, 0]:

Av₁ = [3·1 + 1·0]   [3]
      [0·1 + 2·0] = [0] = 3·[1, 0]

So v₁ = [1, 0] is an eigenvector with eigenvalue λ₁ = 3.

Apply A to the vector v₂ = [1, -1]:

Av₂ = [3·1 + 1·(-1)]   [2]
      [0·1 + 2·(-1)] = [-2] = 2·[1, -1]

So v₂ = [1, -1] is an eigenvector with eigenvalue λ₂ = 2.

Why eigenvectors reveal a matrix's "personality"

The eigenvectors of a matrix are its invariant directions - the directions along which the transformation is simplest (pure scaling). The eigenvalues tell you how much scaling occurs in each direction.

A matrix's eigenvalues determine:

Whether the transformation expands (|λ| > 1) or contracts (|λ| < 1) along each direction
Whether the transformation is invertible (all λ ≠ 0)
The long-run behavior of repeated applications (Aⁿv)

Part 2 - The Characteristic Polynomial (Intuition)

To find the eigenvalues of A, we need to solve:

Av = λv
Av - λv = 0
(A - λI)v = 0

For a non-zero vector v to satisfy (A - λI)v = 0, the matrix (A - λI) must be singular (non-invertible). A matrix is singular if and only if its determinant is zero:

det(A - λI) = 0

This equation is called the characteristic equation. Expanding the determinant gives a polynomial in λ called the characteristic polynomial. Its roots are the eigenvalues.

For a 2×2 matrix:

A = [a  b]
    [c  d]

det(A - λI) = det([a-λ  b  ]) = (a-λ)(d-λ) - bc = 0
                  [c    d-λ]

λ² - (a+d)λ + (ad-bc) = 0
λ² - tr(A)λ + det(A) = 0

Key insight: the sum of eigenvalues = trace(A) and the product of eigenvalues = det(A). These two facts let you verify your eigenvalue calculations.

Computing eigenvalues in NumPy

import numpy as np

# Example matrix
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}")       # [3. 2.]
print(f"Eigenvectors (columns):\n{eigenvectors}")  # v₁=[1,0], v₂=[...,-1]

# Verify: Av = λv for each eigenpair
for i, (lam, v) in enumerate(zip(eigenvalues, eigenvectors.T)):
    Av = A @ v
    lam_v = lam * v
    print(f"\nEigenpair {i+1}:")
    print(f"  Av  = {Av}")
    print(f"  λv  = {lam_v}")
    print(f"  Match: {np.allclose(Av, lam_v)}")

# Verify trace and determinant relationships
print(f"\nTrace(A) = {np.trace(A)}")                    # 5
print(f"Sum of eigenvalues = {eigenvalues.sum()}")      # 5 ✓
print(f"Det(A) = {np.linalg.det(A):.1f}")             # 6
print(f"Product of eigenvalues = {eigenvalues.prod():.1f}")  # 6 ✓

Part 3 - Eigendecomposition: When a Matrix Splits Into Its Essence

If an n×n matrix A has n linearly independent eigenvectors v₁, v₂, ..., vₙ with eigenvalues λ₁, λ₂, ..., λₙ, then A can be eigendecomposed:

A = V Λ V⁻¹

Where:

V = [v₁ | v₂ | ... | vₙ] is the matrix whose columns are eigenvectors
Λ = diag(λ₁, λ₂, ..., λₙ) is the diagonal matrix of eigenvalues
V⁻¹ is the inverse of V

Why eigendecomposition is powerful

The eigendecomposition reveals the matrix's intrinsic structure:

Matrix powers are easy: Aⁿ = V Λⁿ V⁻¹ (just raise the diagonal entries to the n-th power)
Matrix functions are easy: f(A) = V f(Λ) V⁻¹ (apply f to each eigenvalue)
Rank is visible: rank(A) = number of non-zero eigenvalues

When eigendecomposition exists

Not all matrices have eigendecompositions:

Eigendecomposition exists iff A has n linearly independent eigenvectors
Symmetric matrices always have eigendecomposition (with orthogonal eigenvectors)
Non-symmetric matrices may not have eigendecomposition (use SVD instead - Lesson 04)

import numpy as np

# Full eigendecomposition
A = np.array([[4.0, 2.0],
              [1.0, 3.0]])

eigenvalues, V = np.linalg.eig(A)
Lambda = np.diag(eigenvalues)
V_inv = np.linalg.inv(V)

# Verify: A = V Λ V⁻¹
A_reconstructed = V @ Lambda @ V_inv
print(f"Reconstruction error: {np.linalg.norm(A - A_reconstructed):.2e}")  # ≈ 0

# Compute matrix power using eigendecomposition
n = 10
A_power_10 = V @ np.diag(eigenvalues**10) @ V_inv
A_power_10_direct = np.linalg.matrix_power(A, 10)
print(f"A^10 match: {np.allclose(A_power_10, A_power_10_direct)}")  # True

# Compute matrix exponential (used in ODE solvers, graph Laplacians)
A_exp = V @ np.diag(np.exp(eigenvalues)) @ V_inv
print(f"Matrix exponential computed via eigendecomposition")

Part 4 - Real Symmetric Matrices: The Special Case ML Engineers Must Know

Real symmetric matrices (A = Aᵀ) have extraordinary properties that make them central to ML:

The Spectral Theorem

Every real symmetric matrix has:

All real eigenvalues (no complex numbers)
Orthonormal eigenvectors (eigenvectors are perpendicular and unit length)
Always diagonalizable (n independent eigenvectors always exist)

This means for a symmetric matrix S, the eigendecomposition is:

S = Q Λ Qᵀ    (note: Qᵀ instead of Q⁻¹ because Q is orthogonal: Qᵀ = Q⁻¹)

Why symmetric matrices are everywhere in ML

Almost every matrix that appears in statistics and ML is symmetric:

Matrix	Why it's symmetric	ML use
Covariance matrix Σ	Σᵢⱼ = Σⱼᵢ by definition	PCA, Gaussian models
Gram matrix XᵀX	(XᵀX)ᵀ = XᵀX	Ridge regression, kernel methods
Correlation matrix	Same as covariance, just normalized	Feature analysis
Graph Laplacian L	L = D - A, both symmetric	Spectral clustering, GNNs
Hessian matrix H	H = ∂²L/∂θ² (for smooth L)	Newton's method, curvature

import numpy as np

# Symmetric matrix - use np.linalg.eigh (not eig!)
# eigh is specifically for symmetric/Hermitian matrices:
# - Guaranteed real eigenvalues
# - Eigenvectors orthogonal by construction
# - Returns eigenvalues in ascending order
# - Faster and more numerically stable than eig

n = 5
A = np.random.randn(n, n)
S = A.T @ A  # Always symmetric positive semidefinite

# Use eigh for symmetric matrices
eigenvalues, Q = np.linalg.eigh(S)  # eigenvalues in ascending order

print(f"Eigenvalues (all real, ≥ 0): {eigenvalues}")
print(f"Are eigenvectors orthonormal? {np.allclose(Q.T @ Q, np.eye(n))}")  # True

# Reconstruction: S = Q Λ Qᵀ
S_reconstructed = Q @ np.diag(eigenvalues) @ Q.T
print(f"Reconstruction error: {np.linalg.norm(S - S_reconstructed):.2e}")  # ≈ 0

# Positive definite check: all eigenvalues > 0
print(f"Positive definite: {np.all(eigenvalues > 0)}")

# Positive semidefinite check: all eigenvalues ≥ 0 (allowing for numerical noise)
print(f"PSD: {np.all(eigenvalues >= -1e-10)}")

:::tip Always use np.linalg.eigh for symmetric matrices np.linalg.eig works for any square matrix but returns complex numbers (even for real symmetric matrices, due to numerical precision). np.linalg.eigh is specifically optimized for symmetric/Hermitian matrices:

Returns real eigenvalues (guaranteed)
Returns orthonormal eigenvectors
Is faster and more numerically stable
Returns eigenvalues in ascending order (convenient for PCA: largest last)

Rule: if your matrix is symmetric (which it is for covariance, Gram, Laplacian, Hessian), always use eigh. :::

Part 5 - Power Iteration: How PageRank Is Actually Computed

For a large matrix (billions × billions in Google's case), directly computing eigenvalues via the characteristic polynomial is infeasible. Power iteration is the practical algorithm for finding the largest eigenvalue (and corresponding eigenvector).

Algorithm

Start with a random vector v₀
Repeatedly: v_{k+1} = A v_k / ‖A v_k‖₂
Converges to: the eigenvector corresponding to the largest eigenvalue

Why it works

If we expand v₀ in the eigenvector basis: v₀ = c₁v₁ + c₂v₂ + ... + cₙvₙ

Then:

Aᵏv₀ = c₁λ₁ᵏv₁ + c₂λ₂ᵏv₂ + ... + cₙλₙᵏvₙ
      = λ₁ᵏ(c₁v₁ + c₂(λ₂/λ₁)ᵏv₂ + ... + cₙ(λₙ/λ₁)ᵏvₙ)

If λ₁ is the largest eigenvalue (|λ₁| > |λᵢ| for i > 1), then (λᵢ/λ₁)ᵏ → 0 as k → ∞. The v₁ term dominates, and the iteration converges to v₁.

PageRank as power iteration

PageRank models the internet as a random walk: a "random surfer" follows links at random. The probability of being at each page after many steps converges to the principal eigenvector of the transition matrix.

import numpy as np

def power_iteration(
    A: np.ndarray,
    n_iterations: int = 100,
    tol: float = 1e-8
) -> tuple:
    """
    Find the dominant eigenvector and eigenvalue using power iteration.

    Converges to the eigenvector corresponding to the largest |eigenvalue|.
    Rate of convergence depends on |λ₁/λ₂| - larger gap = faster convergence.
    """
    n = A.shape[0]
    v = np.random.randn(n)
    v /= np.linalg.norm(v)  # start with unit vector

    for i in range(n_iterations):
        v_new = A @ v
        eigenvalue = np.linalg.norm(v_new)  # Rayleigh quotient approximation
        v_new /= eigenvalue                  # normalize

        if np.linalg.norm(v_new - v) < tol:
            print(f"Converged at iteration {i+1}")
            break
        v = v_new

    return eigenvalue, v

# Verify against np.linalg.eig
np.random.seed(42)
A = np.random.randn(5, 5)
A = A.T @ A  # make symmetric positive definite

eigenval_pi, eigenvec_pi = power_iteration(A)

eigenvals_np, eigenvecs_np = np.linalg.eigh(A)
# eigh returns eigenvalues in ascending order → largest is last
largest_idx = np.argmax(eigenvals_np)

print(f"Power iteration λ₁: {eigenval_pi:.6f}")
print(f"NumPy λ₁:            {eigenvals_np[largest_idx]:.6f}")
print(f"Match: {np.isclose(eigenval_pi, eigenvals_np[largest_idx], rtol=1e-4)}")

def pagerank(
    adjacency: np.ndarray,
    damping: float = 0.85,
    n_iterations: int = 100
) -> np.ndarray:
    """
    Compute PageRank scores via power iteration.

    PageRank: r = damping * M @ r + (1-damping)/n * ones
    where M is the column-normalized adjacency matrix.
    """
    n = adjacency.shape[0]

    # Column-normalize the adjacency matrix (each column sums to 1)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1  # avoid division by zero (dangling nodes)
    M = adjacency / col_sums  # transition probability matrix

    # Power iteration on the PageRank equation
    r = np.ones(n) / n  # uniform initial distribution
    teleport = (1 - damping) / n * np.ones(n)

    for _ in range(n_iterations):
        r_new = damping * M @ r + teleport
        if np.linalg.norm(r_new - r) < 1e-10:
            break
        r = r_new

    return r

# Example: 4-node graph
# 0 → 1, 0 → 2, 1 → 3, 2 → 1, 2 → 3, 3 → 0
adj = np.array([[0, 0, 0, 1],
                [1, 0, 1, 0],
                [1, 0, 0, 0],
                [0, 1, 1, 0]], dtype=float)

ranks = pagerank(adj)
print(f"\nPageRank scores: {ranks}")
print(f"Highest-ranked node: {ranks.argmax()}")

Part 6 - NumPy: Complete Eigenvalue Reference

import numpy as np

# ── General matrices: np.linalg.eig ───────────────────────────────────────
A = np.array([[4, 2],
              [1, 3]], dtype=float)

# Returns: eigenvalues (complex in general), eigenvectors as columns
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}")     # [5. 2.]
print(f"Eigenvectors:\n{eigenvectors}")  # columns are eigenvectors

# ── Symmetric matrices: np.linalg.eigh ────────────────────────────────────
S = A.T @ A  # guaranteed symmetric

# Returns: real eigenvalues (ascending), orthonormal eigenvectors
eigenvalues_s, Q = np.linalg.eigh(S)
print(f"\nSymmetric eigenvalues (ascending): {eigenvalues_s}")
print(f"Orthonormal: {np.allclose(Q.T @ Q, np.eye(len(eigenvalues_s)))}")

# ── Eigenvalues only (faster than eig/eigh if eigenvectors not needed) ────
eigenvalues_only = np.linalg.eigvalsh(S)  # symmetric
eigenvalues_only_gen = np.linalg.eigvals(A)  # general

# ── Rayleigh quotient: eigenvalue of v w.r.t. A ───────────────────────────
def rayleigh_quotient(A: np.ndarray, v: np.ndarray) -> float:
    """
    Estimate eigenvalue for a given vector v.
    Equals the exact eigenvalue when v is an exact eigenvector.
    """
    return float(v.T @ A @ v / (v.T @ v))

v = eigenvectors[:, 0]  # first eigenvector
rq = rayleigh_quotient(A, v)
print(f"\nRayleigh quotient = {rq:.6f}")
print(f"Exact eigenvalue  = {eigenvalues[0].real:.6f}")

# ── Condition number from eigenvalues (for symmetric PD matrices) ──────────
# For symmetric positive definite A: cond(A) = λmax / λmin
eigenvals_pos = np.linalg.eigvalsh(S)
cond_from_eigen = eigenvals_pos.max() / eigenvals_pos.min()
cond_from_linalg = np.linalg.cond(S)
print(f"\nCondition (from eigenvalues): {cond_from_eigen:.4f}")
print(f"Condition (np.linalg.cond):   {cond_from_linalg:.4f}")

# ── PCA using eigendecomposition ──────────────────────────────────────────
def pca_via_eigen(X: np.ndarray, n_components: int) -> tuple:
    """
    PCA via eigendecomposition of covariance matrix.
    (sklearn uses SVD internally - see Lesson 04 for why)
    """
    # Center the data
    X_centered = X - X.mean(axis=0)
    n = X.shape[0]

    # Compute covariance matrix
    cov = (X_centered.T @ X_centered) / (n - 1)  # (d, d)

    # Eigendecomposition of symmetric covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Sort in descending order (eigh returns ascending)
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # Select top-k principal components
    components = eigenvectors[:, :n_components]  # (d, k)

    # Project data
    X_projected = X_centered @ components  # (n, k)

    # Explained variance ratio
    explained_var = eigenvalues[:n_components] / eigenvalues.sum()

    return X_projected, components, explained_var

# Test on synthetic data
np.random.seed(42)
n_samples, n_features = 200, 10
X = np.random.randn(n_samples, n_features) @ np.random.randn(n_features, 3)  # rank-3 data

X_pca, components, var_ratio = pca_via_eigen(X, n_components=3)
print(f"\nPCA output shape: {X_pca.shape}")      # (200, 3)
print(f"Explained variance: {var_ratio}")          # ≈ [0.33, 0.33, 0.33]
print(f"Total explained: {var_ratio.sum():.4f}")   # ≈ 1.0 (rank-3 data)

Part 7 - ML Connections: Eigenvalues in Real ML Systems

PCA: eigenvalues as explained variance

import numpy as np

# The eigenvalues of the covariance matrix ARE the variances
# along the principal component directions

np.random.seed(42)
n, d = 300, 5

# Create data with known variance structure
variances = [10.0, 5.0, 2.0, 0.5, 0.1]  # variance along each principal axis
X = np.column_stack([
    np.sqrt(var) * np.random.randn(n) for var in variances
])

# Compute covariance and its eigenvalues
X_centered = X - X.mean(axis=0)
cov = X_centered.T @ X_centered / (n - 1)
eigenvalues, _ = np.linalg.eigh(cov)
eigenvalues = eigenvalues[::-1]  # descending

print("True variances:", variances)
print(f"Eigenvalues:   {eigenvalues.round(2)}")
# Should match (up to sampling noise)

# Explained variance ratio (scree plot values)
explained = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained)
print(f"Explained variance ratio: {explained.round(3)}")
print(f"Cumulative explained:     {cumulative.round(3)}")
# → 10/(10+5+2+0.5+0.1) ≈ 0.57 for first component

Spectral clustering: graph Laplacian eigenvectors

import numpy as np

def spectral_clustering(W: np.ndarray, k: int) -> np.ndarray:
    """
    Spectral clustering: find k clusters using eigenvectors of the graph Laplacian.

    Intuition: the k smallest eigenvectors of L encode the k-way partition
    of the graph into weakly-connected components.
    """
    n = W.shape[0]

    # Graph Laplacian: L = D - W
    # D is the diagonal degree matrix
    degrees = W.sum(axis=1)
    D = np.diag(degrees)
    L = D - W  # unnormalized Laplacian

    # Normalized Laplacian: Lsym = D^(-1/2) L D^(-1/2)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(degrees + 1e-10))
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt

    # The k smallest eigenvectors reveal the cluster structure
    eigenvalues, eigenvectors = np.linalg.eigh(L_sym)
    # eigh returns ascending order - take first k (smallest)
    cluster_representation = eigenvectors[:, :k]  # (n, k)

    # k-means on the cluster_representation
    # (in practice: sklearn.cluster.KMeans)
    # Here: simple assignment for illustration
    from numpy.random import choice
    centers = cluster_representation[choice(n, k, replace=False)]

    for _ in range(50):  # k-means iterations
        # Assign each point to nearest center
        dists = np.array([np.linalg.norm(cluster_representation - c, axis=1)
                         for c in centers])
        labels = np.argmin(dists, axis=0)
        # Update centers
        centers = np.array([cluster_representation[labels == j].mean(axis=0)
                           for j in range(k)])

    return labels

# Note: the second smallest eigenvalue of L is the Fiedler value
# - it measures the "algebraic connectivity" of the graph
# Zero eigenvalues of L = number of disconnected components

Optimization: eigenvalue ratio = conditioning

import numpy as np

# The Hessian matrix of the loss function determines optimization difficulty
# Eigenvalue ratio = condition number = how "elongated" the loss landscape is

def simulate_gradient_descent_conditioning():
    """
    Show how eigenvalue ratio affects gradient descent convergence.
    Well-conditioned: eigenvalues similar → fast convergence
    Ill-conditioned: eigenvalue ratio large → slow convergence
    """
    np.random.seed(42)

    for cond_number in [1.0, 10.0, 100.0, 1000.0]:
        # Create a quadratic loss with given condition number
        # Loss = 0.5 * x.T @ H @ x, optimal at x=0
        # H has eigenvalues [1, cond_number]
        Q = np.array([[1.0, 0.0],
                      [0.0, cond_number]])  # diagonal Hessian

        # Gradient descent with learning rate η = 1/λmax
        x = np.array([1.0, 1.0])
        lr = 1.0 / cond_number

        n_steps = 100
        for step in range(n_steps):
            grad = Q @ x  # gradient of quadratic loss
            x = x - lr * grad
            if np.linalg.norm(x) < 1e-6:
                print(f"cond={cond_number:6.0f}: converged at step {step+1}")
                break
        else:
            print(f"cond={cond_number:6.0f}: not converged after {n_steps} steps, "
                  f"‖x‖={np.linalg.norm(x):.4f}")

simulate_gradient_descent_conditioning()

Part 8 - Common Failure Modes

:::danger Use eig vs eigh on symmetric matrices np.linalg.eig on a symmetric matrix can return complex eigenvalues due to floating-point errors, even though all eigenvalues are mathematically real. np.linalg.eigh guarantees real eigenvalues.

import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric

vals_eig, vecs_eig = np.linalg.eig(S)
vals_eigh, vecs_eigh = np.linalg.eigh(S)

print(f"eig  eigenvalues: {vals_eig}")   # might be [3.+0j, 1.+0j] (complex dtype)
print(f"eigh eigenvalues: {vals_eigh}")  # [1.0, 3.0] (real, ascending)

# ALWAYS use eigh for covariance, Gram, Laplacian, Hessian matrices

:::

:::danger Eigenvectors are not unique - only directions are The eigenvector equation Av = λv is satisfied by any scalar multiple of v. NumPy returns unit eigenvectors, but the sign is arbitrary: v and -v are both valid eigenvectors. This matters when:

Comparing PCA results across runs (signs may flip)
Comparing eigenvectors from different implementations

Always compare directions, not the vectors themselves:

# WRONG: compare eigenvectors directly
assert np.allclose(v1, v2)  # may fail due to sign flip

# RIGHT: compare cosine similarity (absolute value)
assert abs(np.dot(v1 / np.linalg.norm(v1),
               v2 / np.linalg.norm(v2))) > 0.999

:::

:::tip The eigenvalue spectrum reveals model capacity For a neural network weight matrix W, np.linalg.eigvalsh(W.T @ W) gives the squared singular values. The number of large singular values = effective rank = how many independent directions the layer can represent. If most eigenvalues are near zero, the layer is low-rank and a LoRA-style decomposition could reduce it with little information loss. :::

Interview Questions

Q1: Why do symmetric matrices always have real eigenvalues?

Proof sketch (by contradiction): Suppose Sv = λv where S = Sᵀ and λ is complex.

Take the conjugate transpose: vᵀSᵀ = λ vᵀ → vᵀS = λ* v*ᵀ (since S = Sᵀ)

Multiply on the right by v: vᵀSv = λ v*ᵀv

But also from Sv = λv: vᵀSv = vᵀ(λv) = λ v*ᵀv

So λ vᵀv = λ vᵀv → (λ - λ) v*ᵀv = 0

Since v ≠ 0, vᵀv = ‖v‖² > 0, so λ = λ → λ is real.

ML implication: Covariance matrices, Gram matrices, graph Laplacians, and Hessians are all symmetric, so their eigenvalues are always real - you can always sort them, compare them, and use them as variance estimates.

Q2: What does a zero eigenvalue mean for an ML matrix?

A zero eigenvalue means the matrix is singular (non-invertible) and rank-deficient.

For the covariance matrix Σ of data X (n×d):

A zero eigenvalue means some feature is an exact linear combination of others (perfect multicollinearity)
In practice: zero eigenvalues usually indicate duplicate or highly correlated features
The number of zero eigenvalues = d - rank(X) = number of redundant dimensions

Practical consequences:

Cannot compute Σ⁻¹ (used in Mahalanobis distance, LDA, linear regression with ridge=0)
Ridge regression adds αI to ensure all eigenvalues ≥ α > 0 (no zero eigenvalues)
PCA discards components with zero (or near-zero) eigenvalues since they carry no variance

For a neural network weight matrix, a zero singular value (eigenvalue of WᵀW = 0) means that dimension is completely unused - motivation for weight pruning.

Q3: How is PCA related to eigendecomposition?

PCA finds the directions of maximum variance in data. Here is the mathematical chain:

Center the data: X_c = X - mean(X)
Compute covariance: Σ = X_cᵀ X_c / (n-1)
Eigendecompose: Σ = Q Λ Qᵀ (symmetric, so orthogonal Q)
Principal components: columns of Q (eigenvectors of Σ)
Explained variance: eigenvalues in Λ (variance along each PC direction)
Project: X_reduced = X_c @ Q[:, :k] (project onto top-k eigenvectors)

The key insight: eigenvectors of Σ are the directions of maximum variance because they solve the optimization problem max ‖Xv‖² subject to ‖v‖ = 1 - and the solution to this is the dominant eigenvector of XᵀX = (n-1)Σ.

In practice, sklearn.decomposition.PCA uses SVD (more numerically stable) rather than explicitly computing the covariance matrix eigendecomposition. Lesson 04 covers this.

Q4: Explain the connection between the trace, determinant, and eigenvalues.

For a matrix A with eigenvalues λ₁, λ₂, ..., λₙ:

Trace: tr(A) = λ₁ + λ₂ + ... + λₙ

The trace is the sum of eigenvalues
This follows from the characteristic polynomial: the coefficient of λⁿ⁻¹

Determinant: det(A) = λ₁ × λ₂ × ... × λₙ

The determinant is the product of eigenvalues
This follows from det(A - λI) evaluated at λ = 0

ML use cases:

Gaussian log-likelihood: log det(Σ) = Σᵢ log(λᵢ) - use np.linalg.slogdet for stability
Verify eigenvalue computation: check that sum = trace and product = det
Information content: trace(Σ) = total variance in data = sum of all feature variances

A = np.random.randn(4, 4)
A = A.T @ A  # symmetric PD
eigs = np.linalg.eigvalsh(A)
print(f"trace(A) = {np.trace(A):.4f}")
print(f"sum(λᵢ)  = {eigs.sum():.4f}")
print(f"det(A)   = {np.linalg.det(A):.4f}")
print(f"prod(λᵢ) = {eigs.prod():.4f}")

Practice Challenges

Level 1: Predict

Challenge: Without computing, predict:

What are the eigenvalues of the identity matrix Iₙ?
What are the eigenvalues of 2·I₃?
If A has eigenvalues {3, 5, 7}, what are the eigenvalues of A²?

Answer

I: All eigenvalues = 1 (Iv = v = 1·v for any v, so every vector is an eigenvector with λ=1)
2I₃: All eigenvalues = 2 (2Iv = 2v, same reasoning)
A²: eigenvalues are {9, 25, 49} - eigenvalues of Aⁿ are λₙ for each eigenvalue λ

Verification for (3): If Av = λv, then A²v = A(Av) = A(λv) = λ(Av) = λ(λv) = λ²v.

Level 2: Debug

Challenge: The following code is supposed to verify the eigendecomposition but prints False. Find the bug:

A = np.array([[1.0, 2.0], [3.0, 4.0]])
eigenvalues, V = np.linalg.eig(A)
A_reconstructed = V @ np.diag(eigenvalues) @ V
print(np.allclose(A, A_reconstructed))  # prints False

Answer

Bug: The reconstruction formula is A = V Λ V⁻¹, but the code uses V @ diag(λ) @ V (not V⁻¹).

The eigenvector matrix V is NOT generally orthogonal (that's only for symmetric matrices). For a general matrix, V⁻¹ ≠ Vᵀ.

# Fix 1: use V inverse
A_reconstructed = V @ np.diag(eigenvalues) @ np.linalg.inv(V)
print(np.allclose(A, A_reconstructed))  # True

# Fix 2: for SYMMETRIC matrices only, use V.T (since orthogonal → V⁻¹ = Vᵀ)
S = A.T @ A  # make symmetric
eigenvalues_s, Q = np.linalg.eigh(S)
S_reconstructed = Q @ np.diag(eigenvalues_s) @ Q.T  # Q.T = Q⁻¹ for orthogonal Q
print(np.allclose(S, S_reconstructed))  # True

Level 3: Design

Challenge: Implement the full PCA pipeline from scratch using eigendecomposition (not SVD). Then verify that your results match sklearn.decomposition.PCA.

Answer

import numpy as np

class PCAFromScratch:
    """PCA via eigendecomposition of covariance matrix."""

    def __init__(self, n_components: int):
        self.n_components = n_components
        self.components_ = None       # (n_components, n_features)
        self.explained_variance_ = None
        self.explained_variance_ratio_ = None
        self.mean_ = None

    def fit(self, X: np.ndarray) -> 'PCAFromScratch':
        n, d = X.shape
        self.mean_ = X.mean(axis=0)
        X_centered = X - self.mean_

        # Covariance matrix (d×d)
        cov = X_centered.T @ X_centered / (n - 1)

        # Eigendecomposition of symmetric matrix
        eigenvalues, eigenvectors = np.linalg.eigh(cov)

        # Sort descending (eigh returns ascending)
        idx = np.argsort(eigenvalues)[::-1]
        eigenvalues = eigenvalues[idx]
        eigenvectors = eigenvectors[:, idx]

        # Store top k components
        self.components_ = eigenvectors[:, :self.n_components].T  # (k, d)
        self.explained_variance_ = eigenvalues[:self.n_components]
        self.explained_variance_ratio_ = eigenvalues[:self.n_components] / eigenvalues.sum()
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        X_centered = X - self.mean_
        return X_centered @ self.components_.T  # (n, k)

    def fit_transform(self, X: np.ndarray) -> np.ndarray:
        return self.fit(X).transform(X)

# Test
from sklearn.decomposition import PCA as SklearnPCA
np.random.seed(42)
X = np.random.randn(100, 10)
k = 3

my_pca = PCAFromScratch(n_components=k)
X_my = my_pca.fit_transform(X)

sk_pca = SklearnPCA(n_components=k)
X_sk = sk_pca.fit_transform(X)

# Note: sign of components may differ
print(f"Shape match: {X_my.shape == X_sk.shape}")  # True
print(f"Explained variance ratio match: "
      f"{np.allclose(my_pca.explained_variance_ratio_, sk_pca.explained_variance_ratio_)}")
# Note: sklearn uses SVD internally, but results should be equivalent

Quick Reference Cheatsheet

Operation	Math	NumPy	Notes
Eigenvalues + vectors (general)	Av=λv	`np.linalg.eig(A)`	May return complex
Eigenvalues + vectors (symmetric)	Sv=λv	`np.linalg.eigh(S)`	Real, ascending order
Eigenvalues only (general)	λ	`np.linalg.eigvals(A)`	Faster than eig
Eigenvalues only (symmetric)	λ	`np.linalg.eigvalsh(S)`	Real, ascending
Reconstruction	A = VΛV⁻¹	`V @ diag(λ) @ inv(V)`	General matrix
Reconstruction (symmetric)	S = QΛQᵀ	`Q @ diag(λ) @ Q.T`	Orthogonal Q
Trace = sum eigenvalues	tr(A)	`np.trace(A)`	Quick sanity check
Det = product eigenvalues	det(A)	`np.linalg.det(A)`	Quick sanity check
Rayleigh quotient	vᵀAv/vᵀv	`v.T @ A @ v / v.T @ v`	Estimate eigenvalue
Condition number (SPD)	λmax/λmin	`eigenvalues[-1]/eigenvalues[0]`	After eigh

Key Takeaways

Eigenvectors are the invariant directions of a linear transformation - they are only scaled, not rotated, by the matrix
The eigenvalues measure how much scaling occurs along each eigenvector direction
Real symmetric matrices always have real eigenvalues and orthogonal eigenvectors - use np.linalg.eigh for covariance, Gram, and Laplacian matrices
Power iteration computes the dominant eigenvector iteratively - it is how PageRank is actually computed at scale
PCA finds principal components by eigendecomposing the covariance matrix - eigenvalues equal explained variance
The eigenvalue ratio (condition number) determines how quickly gradient descent converges - large ratio = slow convergence

Next: SVD and Matrix Decompositions - The Most Powerful Matrix Factorization →

:::tip 🎮 Interactive Playground

Visualize this concept: Try the Eigenvalues & Eigenvectors demo on the EngineersOfAI Playground - no code required.

:::

What You Will Learn​

Prerequisites​

Part 1 - What an Eigenvector Really Is​

Geometric intuition​

A concrete example​

Why eigenvectors reveal a matrix's "personality"​

Part 2 - The Characteristic Polynomial (Intuition)​

For a 2×2 matrix:​

Computing eigenvalues in NumPy​

Part 3 - Eigendecomposition: When a Matrix Splits Into Its Essence​

Why eigendecomposition is powerful​

When eigendecomposition exists​

Part 4 - Real Symmetric Matrices: The Special Case ML Engineers Must Know​

The Spectral Theorem​

Why symmetric matrices are everywhere in ML​

Part 5 - Power Iteration: How PageRank Is Actually Computed​

Algorithm​

Why it works​

PageRank as power iteration​

Part 6 - NumPy: Complete Eigenvalue Reference​

Part 7 - ML Connections: Eigenvalues in Real ML Systems​

PCA: eigenvalues as explained variance​

Spectral clustering: graph Laplacian eigenvectors​

Optimization: eigenvalue ratio = conditioning​

Part 8 - Common Failure Modes​

Interview Questions​

Practice Challenges​

Level 1: Predict​

Level 2: Debug​

Level 3: Design​

Quick Reference Cheatsheet​

Key Takeaways​

What You Will Learn

Prerequisites

Part 1 - What an Eigenvector Really Is

Geometric intuition

A concrete example

Why eigenvectors reveal a matrix's "personality"

Part 2 - The Characteristic Polynomial (Intuition)

For a 2×2 matrix:

Computing eigenvalues in NumPy

Part 3 - Eigendecomposition: When a Matrix Splits Into Its Essence

Why eigendecomposition is powerful

When eigendecomposition exists

Part 4 - Real Symmetric Matrices: The Special Case ML Engineers Must Know

The Spectral Theorem

Why symmetric matrices are everywhere in ML

Part 5 - Power Iteration: How PageRank Is Actually Computed

Algorithm

Why it works

PageRank as power iteration

Part 6 - NumPy: Complete Eigenvalue Reference

Part 7 - ML Connections: Eigenvalues in Real ML Systems

PCA: eigenvalues as explained variance

Spectral clustering: graph Laplacian eigenvectors

Optimization: eigenvalue ratio = conditioning

Part 8 - Common Failure Modes

Interview Questions

Practice Challenges

Level 1: Predict

Level 2: Debug

Level 3: Design

Quick Reference Cheatsheet

Key Takeaways