Spectral Graph Theory - Graph Laplacian, Eigenvalues, and Spectral Clustering

Reading time: ~26 minutes | Level: Graph Theory → GNN Theory

To understand why Graph Convolutional Networks work - not just how to call GCNConv - you need spectral graph theory. When a GCN computes $\hat{A} H W$ , it is applying a learnable spectral filter to graph signals. Over-smoothing is a spectral phenomenon: deep GCNs amplify low-frequency (smooth) components and suppress high-frequency ones.

Spectral clustering, one of the most reliable clustering algorithms, is an eigenvalue problem on the graph Laplacian. Node2Vec's random walks implicitly approximate the spectral structure of the graph.

This lesson gives you the mathematical foundation to reason about all of this.

What You Will Learn

The graph Laplacian $L = D - A$ and its properties
Spectral decomposition: $L = U \Lambda U^T$
Graph signals and the graph Fourier transform
Eigenvalues as frequency: smooth vs high-frequency graph signals
The Fiedler vector and graph connectivity
Spectral clustering: normalized cuts and the algorithm
Chebyshev polynomial approximation: the bridge to GCNs

Part 1 - The Graph Laplacian

Definition

For a graph $G = (V, E)$ with adjacency matrix $A$ and degree matrix $D = \text{diag}(d_1, \ldots, d_n)$ :

$L = D - A$

Explicitly: $L_{ij} = \begin{cases} d_i & \text{if } i = j \\ -1 & \text{if } (i,j) \in E \\ 0 & \text{otherwise} \end{cases}$

For weighted graphs: $L_{ij} = -w(i,j)$ for $(i,j) \in E$ and $L_{ii} = \sum_j w(i,j)$ .

import numpy as np
import networkx as nx
import scipy.sparse as sp

G = nx.karate_club_graph()

# Compute Laplacian (various methods)
A = nx.to_numpy_array(G, dtype=np.float64)
D = np.diag(A.sum(axis=1))
L = D - A  # Unnormalized Laplacian

# Verify with networkx
L_nx = np.array(nx.laplacian_matrix(G).todense(), dtype=np.float64)
print(f"L matches networkx: {np.allclose(L, L_nx)}")  # True

# Properties
print(f"Symmetric: {np.allclose(L, L.T)}")          # True
print(f"Row sums: {L.sum(axis=1)[:5]}")              # All near 0
print(f"Trace = sum of degrees: {np.trace(L):.1f}, {sum(dict(G.degree()).values())}")

Key properties of the Laplacian

Property 1: $L$ is symmetric and positive semi-definite: $x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2 \geq 0 \quad \text{for all } x \in \mathbb{R}^n$

This quadratic form measures the "smoothness" of signal $x$ on the graph - it is zero if and only if $x$ is constant on each connected component.

Property 2: $L \mathbf{1} = \mathbf{0}$ - the constant vector is always in the null space.

Property 3: The number of zero eigenvalues of $L$ equals the number of connected components.

# Property 1: Quadratic form measures signal variation
x = np.random.randn(G.number_of_nodes())
smoothness = x.T @ L @ x
# This equals sum over edges of (x[i] - x[j])^2
manual = sum((x[i] - x[j])**2 for i, j in G.edges())
print(f"x^T L x = {smoothness:.4f}")
print(f"Σ(xi - xj)^2 = {manual:.4f}")
print(f"Equal: {np.isclose(smoothness, manual)}")  # True

# Property 3: zero eigenvalues count connected components
eigenvalues = np.linalg.eigvalsh(L)  # Sorted eigenvalues
n_components = np.sum(eigenvalues < 1e-8)
print(f"\nZero eigenvalues: {n_components}")        # Should equal nx.number_connected_components(G)
print(f"Connected components: {nx.number_connected_components(G)}")

Part 2 - Spectral Decomposition

Eigendecomposition of the Laplacian

Since $L$ is symmetric PSD, it has a full set of real, non-negative eigenvalues:

$L = U \Lambda U^T$

where:

$\Lambda = \text{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ with $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$
$U = [u_1, u_2, \ldots, u_n]$ are the orthonormal eigenvectors

import numpy as np

G = nx.karate_club_graph()
L = np.array(nx.laplacian_matrix(G).todense(), dtype=np.float64)

# Full eigendecomposition
eigenvalues, eigenvectors = np.linalg.eigh(L)   # eigh: symmetric, returns sorted values
# eigenvalues[0] ≈ 0 (Laplacian always has 0 eigenvalue)
# eigenvectors[:, i] = i-th eigenvector

print(f"Eigenvalues (sorted): {eigenvalues[:5]}")  # [~0, λ₂, λ₃, λ₄, λ₅]
print(f"Smallest eigenvalue: {eigenvalues[0]:.2e}")  # near 0
print(f"Fiedler value (λ₂): {eigenvalues[1]:.4f}")  # algebraic connectivity
print(f"Largest eigenvalue: {eigenvalues[-1]:.4f}")

# Verify L = U Λ U^T
Lambda = np.diag(eigenvalues)
U = eigenvectors
L_reconstructed = U @ Lambda @ U.T
print(f"Reconstruction error: {np.max(np.abs(L - L_reconstructed)):.2e}")  # near 0

The Fiedler vector (second eigenvector)

The Fiedler value $\lambda_2$ (second smallest eigenvalue) is the algebraic connectivity of the graph:

$\lambda_2 = 0$ : graph is disconnected
Large $\lambda_2$ : well-connected graph (hard to disconnect)
Small positive $\lambda_2$ : graph is barely connected (has a bottleneck)

The corresponding Fiedler vector $u_2$ identifies the optimal graph bisection:

# Fiedler vector analysis
fiedler_value = eigenvalues[1]
fiedler_vector = eigenvectors[:, 1]

print(f"Fiedler value (algebraic connectivity): {fiedler_value:.4f}")
print(f"Fiedler vector (first 10 values): {fiedler_vector[:10]}")

# Graph bisection: sign of Fiedler vector determines partition
partition_A = np.where(fiedler_vector >= 0)[0]
partition_B = np.where(fiedler_vector < 0)[0]
print(f"\nPartition A ({len(partition_A)} nodes): {sorted(partition_A)[:10]}...")
print(f"Partition B ({len(partition_B)} nodes): {sorted(partition_B)[:10]}...")

# Count cut edges
cut_edges = sum(1 for i, j in G.edges()
                if (fiedler_vector[i] >= 0) != (fiedler_vector[j] >= 0))
print(f"Cut edges: {cut_edges}")

Part 3 - Graph Signals and the Graph Fourier Transform

Graph signals

A graph signal is a function $f: V \to \mathbb{R}$ - it assigns a real number to each node. Examples:

Temperature at each sensor node in an IoT network
Infection probability at each node in an epidemic model
Node feature value at each dimension in a GNN

The graph Fourier transform (GFT)

The standard Fourier transform decomposes a function into sinusoidal components (eigenfunctions of the Laplacian on the real line). The Graph Fourier Transform does the same using the eigenvectors of the graph Laplacian:

$\hat{f} = U^T f \quad \text{(transform)}$ $f = U \hat{f} \quad \text{(inverse transform)}$

The coefficient $\hat{f}_k = u_k^T f$ measures how much the signal $f$ "aligns" with eigenvector $u_k$ .

import numpy as np

G = nx.karate_club_graph()
L = np.array(nx.laplacian_matrix(G).todense(), dtype=np.float64)
eigenvalues, U = np.linalg.eigh(L)

# Graph signal: club membership (the Zachary karate club split)
club_membership = np.array([1 if G.nodes[n]['club'] == 'Mr. Hi' else -1
                             for n in sorted(G.nodes())])

# Graph Fourier Transform
f_hat = U.T @ club_membership   # (n,) vector of Fourier coefficients

# Low-frequency components (small eigenvalue = smooth signal)
print("Low-frequency Fourier coefficients (smooth components):")
for i in range(5):
    print(f"  λ={eigenvalues[i]:.4f}: |f̂|={abs(f_hat[i]):.4f}")

# Inverse transform (reconstruction)
f_reconstructed = U @ f_hat
print(f"\nReconstruction error: {np.max(np.abs(f_reconstructed - club_membership)):.2e}")

# Low-pass filtering: keep only low-frequency components
K = 5  # Keep only first K eigenvectors
f_lowpass = U[:, :K] @ f_hat[:K]
print(f"\nLow-pass filtered signal (first 10 values):")
print(f_lowpass[:10])
# Smooth version of the original signal

Frequency interpretation

The eigenvalue $\lambda_k$ is the "frequency" of eigenvector $u_k$ :

Low frequency ( $\lambda_k \approx 0$ ): $u_k$ varies slowly across the graph - neighboring nodes have similar values. These capture community structure.
High frequency ( $\lambda_k \approx \lambda_{\max}$ ): $u_k$ alternates sign between connected nodes - maximum variation across edges. These capture noise and fine-grained structure.

This is the spectral foundation of over-smoothing: repeated application of a low-pass graph convolution (like GCN) attenuates high-frequency components and amplifies low-frequency ones, eventually making all node features identical within each connected component.

Part 4 - Normalized Laplacians

Symmetric normalized Laplacian

$L_{\text{sym}} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}$

Eigenvalues lie in $[0, 2]$ . Used in spectral GNNs and ChebNet.

Random walk normalized Laplacian

$L_{\text{rw}} = D^{-1} L = I - D^{-1} A$

Eigenvalues lie in $[0, 2]$ . Connection: $D^{-1} A$ is the transition matrix of a random walk on the graph.

import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G, dtype=np.float64)
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-10))
D_inv = np.diag(1.0 / (d + 1e-10))

I = np.eye(len(G.nodes()))

L_sym = I - D_inv_sqrt @ A @ D_inv_sqrt
L_rw = I - D_inv @ A

# Eigenvalues of symmetric normalized Laplacian
evals_sym = np.linalg.eigvalsh(L_sym)
print(f"L_sym eigenvalue range: [{evals_sym.min():.4f}, {evals_sym.max():.4f}]")
# Should be in [0, 2]

# The GCN normalized adjacency Ã = D̃^{-1/2} (A+I) D̃^{-1/2}
A_hat = A + I
D_hat = np.diag(A_hat.sum(axis=1))
D_hat_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D_hat) + 1e-10))
A_norm = D_hat_inv_sqrt @ A_hat @ D_hat_inv_sqrt
# A_norm = I - L_sym_with_self_loops
# Eigenvalues of A_norm ∈ [-1, 1] - bounded, training stable
evals_A_norm = np.linalg.eigvalsh(A_norm)
print(f"GCN normalized adjacency eigenvalue range: [{evals_A_norm.min():.4f}, {evals_A_norm.max():.4f}]")

Part 5 - Spectral Clustering

Spectral clustering uses the eigenvectors of the Laplacian to embed nodes into a low-dimensional space where standard clustering (e.g., k-means) works well.

The algorithm (Normalized Spectral Clustering - Shi & Malik)

Compute the normalized Laplacian L_sym = D^{-1/2} L D^{-1/2}
Compute the k smallest eigenvectors U = [u_1, ..., u_k]
Form matrix Y: normalize each row of U to unit length
Run k-means on rows of Y to get k clusters
Assign each node to the cluster of its row

import numpy as np
import networkx as nx
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def spectral_clustering(G: nx.Graph, n_clusters: int) -> np.ndarray:
    """
    Normalized spectral clustering (Shi-Malik / Ng-Jordan-Weiss).
    """
    n = G.number_of_nodes()

    # Build sparse normalized Laplacian
    import scipy.sparse as sp
    from scipy.sparse.linalg import eigsh

    L_sparse = sp.csr_matrix(nx.normalized_laplacian_matrix(G).astype(np.float64))

    # Compute k smallest eigenvectors (excluding λ=0 for connected graphs)
    # sigma=-0.01 shifts so we find eigenvalues near 0 efficiently
    eigenvalues, eigenvectors = eigsh(L_sparse, k=n_clusters, which='SM')

    # Sort by eigenvalue
    idx = np.argsort(eigenvalues)
    U = eigenvectors[:, idx]   # (n, k) matrix of eigenvectors

    # Normalize rows to unit length (Ng-Jordan-Weiss step)
    Y = normalize(U, norm='l2', axis=1)

    # K-means on the embedding
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    labels = kmeans.fit_predict(Y)

    return labels

# Example: karate club graph - should recover the two factions
G = nx.karate_club_graph()
labels = spectral_clustering(G, n_clusters=2)

# Ground truth: club membership
true_labels = np.array([1 if G.nodes[n]['club'] == 'Mr. Hi' else 0
                         for n in sorted(G.nodes())])

# Compute accuracy (handle label permutation)
from sklearn.metrics import adjusted_rand_score
ari = adjusted_rand_score(true_labels, labels)
print(f"Spectral clustering Adjusted Rand Index: {ari:.4f}")
# Should be high (~0.8+) for the karate club graph

print(f"Cluster 0: {np.where(labels==0)[0][:10]}...")
print(f"Cluster 1: {np.where(labels==1)[0][:10]}...")

Why spectral clustering works: normalized cuts

The normalized cut (NCut) objective minimizes:

$\text{NCut}(A, B) = \frac{\text{cut}(A, B)}{\text{vol}(A)} + \frac{\text{cut}(A, B)}{\text{vol}(B)}$

where $\text{cut}(A, B) = \sum_{i \in A, j \in B} w_{ij}$ and $\text{vol}(A) = \sum_{i \in A} d_i$ .

Finding the exact minimum is NP-hard. The spectral relaxation (replacing the binary partition vector with a continuous vector and solving the resulting generalized eigenvalue problem) gives the Fiedler vector - the 2nd eigenvector of $L_{\text{rw}}$ .

Part 6 - Connection to Graph Neural Networks

Spectral graph convolution

Classical signal processing: convolution in time domain = pointwise multiplication in frequency domain.

Graph convolution (in spectral domain): For filter $g_\theta$ parameterized by $\theta$ :

$g_\theta * x = U g_\theta(\Lambda) U^T x$

where $g_\theta(\Lambda) = \text{diag}(g_\theta(\lambda_1), \ldots, g_\theta(\lambda_n))$ applies the filter to each frequency component.

Problem: Computing $U$ and $U^T x$ requires full eigendecomposition - $O(n^3)$ . Intractable for large graphs.

ChebNet: Chebyshev polynomial approximation

ChebNet (Defferrard et al., 2016) approximates the spectral filter with Chebyshev polynomials:

$g_\theta(\Lambda) \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{\Lambda})$

where $\tilde{\Lambda} = 2\Lambda/\lambda_{\max} - I$ (eigenvalues scaled to $[-1, 1]$ ) and $T_k$ is the $k$ -th Chebyshev polynomial.

Using the recurrence $T_k(\tilde{L}) x = 2\tilde{L} T_{k-1}(\tilde{L}) x - T_{k-2}(\tilde{L}) x$ :

No eigendecomposition needed!
$K$ sparse matrix-vector products with $\tilde{L}$
$K$ -localized in the graph (aggregates $K$ -hop neighborhoods)

import torch
import torch.nn as nn
from torch_geometric.nn import ChebConv, GCNConv

# GCN (Kipf & Welling 2017): simplification of ChebNet with K=1
# K=1 Chebyshev approximation gives the GCN propagation rule:
# Z = Ã H W where Ã = D̃^{-1/2} (A+I) D̃^{-1/2}

# GCN layer
class GCNLayer(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.linear = nn.Linear(in_channels, out_channels, bias=False)

    def forward(self, x, A_norm):
        # x: (n, in_channels)
        # A_norm: (n, n) normalized adjacency with self-loops
        return torch.relu(A_norm @ self.linear(x))

# PyTorch Geometric implementation
gcn_conv = GCNConv(in_channels=64, out_channels=32)
cheb_conv = ChebConv(in_channels=64, out_channels=32, K=3)  # K=3 hops

Over-smoothing: a spectral perspective

With $L$ layers of GCN propagation, the feature matrix becomes:

$H^{(L)} = \hat{A}^L H^{(0)} W^{(0)} W^{(1)} \cdots W^{(L-1)}$

The spectral effect: $\hat{A}^L$ has eigenvalues $\mu_i^L$ where $\mu_i$ are the eigenvalues of $\hat{A} \in [-1, 1]$ . As $L \to \infty$ :

$|\mu_i| < 1$ : eigenvalues decay to 0 - high-frequency components vanish
$\mu_i = 1$ : only the constant (DC) component survives

All node features converge to the same value - over-smoothing. After ~5-10 GCN layers, nodes in the same connected component become indistinguishable.

import numpy as np

def simulate_over_smoothing(A_norm: np.ndarray, H: np.ndarray, n_layers: int) -> list:
    """
    Simulate GCN feature propagation and measure over-smoothing.
    Over-smoothing metric: mean pairwise feature distance.
    """
    from scipy.spatial.distance import pdist

    feature_diversity = [pdist(H).mean()]

    H_current = H.copy()
    for layer in range(n_layers):
        H_current = A_norm @ H_current   # No weights - pure propagation
        diversity = pdist(H_current).mean()
        feature_diversity.append(diversity)

    return feature_diversity

# Run on karate club graph
G = nx.karate_club_graph()
A = nx.to_numpy_array(G, dtype=np.float64)
n = A.shape[0]
A_hat = A + np.eye(n)
D_hat_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_hat_inv_sqrt @ A_hat @ D_hat_inv_sqrt

H0 = np.random.randn(n, 8)  # Random initial features
diversity = simulate_over_smoothing(A_norm, H0, n_layers=50)

print("Feature diversity (mean pairwise distance) vs layers:")
for layer in [0, 1, 2, 5, 10, 20, 50]:
    if layer < len(diversity):
        print(f"  Layer {layer:3d}: diversity = {diversity[layer]:.6f}")
# Diversity converges to near 0 after ~20 layers

Interview Questions

Q1: What is the graph Laplacian and what do its eigenvalues represent?

The graph Laplacian is $L = D - A$ where $D$ is the diagonal degree matrix and $A$ is the adjacency matrix.

Key properties:

Symmetric, positive semi-definite
Row sums are zero: $L \mathbf{1} = \mathbf{0}$
Quadratic form: $x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2$ - measures signal smoothness

Eigenvalue interpretation: Since $L$ is PSD, eigenvalues satisfy $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ .

$\lambda_1 = 0$ : always (constant vector is always in null space). The corresponding eigenvector $u_1 = \mathbf{1}/\sqrt{n}$ is constant.
Number of zero eigenvalues = number of connected components. If $G$ is disconnected, multiple eigenvectors are constant on each component.
$\lambda_2$ (Fiedler value) = algebraic connectivity. $\lambda_2 = 0$ iff graph is disconnected. Larger $\lambda_2$ = better connected, harder to bisect.
$\lambda_{\max}$ : bounded by $2 \times \max_i d_i$ for irregular graphs. For regular graphs, $\lambda_{\max} \leq 2d$ .

Frequency analogy: Eigenvector $u_k$ (corresponding to $\lambda_k$ ) is the $k$ -th "frequency mode" of the graph. Small $\lambda_k$ → smooth mode (neighbors have similar values). Large $\lambda_k$ → oscillating mode (neighbors have opposite values).

The graph Fourier transform $\hat{f} = U^T f$ decomposes any graph signal into contributions from each frequency mode, exactly like the classical Fourier transform on $\mathbb{R}$ .

Q2: Why does spectral clustering work? What does minimizing the normalized cut have to do with eigenvalues?

Normalized cut (NCut) measures partition quality: $\text{NCut}(A, B) = \frac{\text{cut}(A, B)}{\text{vol}(A)} + \frac{\text{cut}(A, B)}{\text{vol}(B)}$

A good partition has few edges between clusters (small cut) and balanced cluster sizes (large volumes). Minimizing NCut finds tight communities with clean separations.

The eigenvalue connection: Let $h \in \{0, 1\}^n$ be the indicator vector for partition $A$ . Then: $\text{NCut}(A, B) = \frac{h^T L h}{h^T D h}$

Minimizing this ratio over binary $h$ is the Rayleigh quotient problem - but with a binary constraint, it is NP-hard.

The spectral relaxation drops the binary constraint and minimizes over continuous $h$ : $\min_{h} \frac{h^T L h}{h^T D h} \quad \text{s.t.} \quad h^T D h = 1$

This is a generalized eigenvalue problem $L h = \lambda D h$ , equivalent to the standard eigenvalue problem $D^{-1/2} L D^{-1/2} v = \lambda v$ for $L_{\text{sym}}$ .

The solution is the Fiedler vector (2nd eigenvector of $L_{\text{rw}} = D^{-1}L$ ) - the continuous relaxation of the optimal bisection. Thresholding (sign or k-means) recovers a discrete partition.

For $k > 2$ clusters: use $k$ eigenvectors of $L_{\text{sym}}$ (the $k$ smallest non-zero eigenvalues). Each eigenvector captures a different "split" of the graph - together they embed nodes into $\mathbb{R}^k$ where k-means can find $k$ clusters.

Q3: What is the graph Fourier transform and how does it connect to GNN convolutions?

Classical Fourier transform: Decomposes a function $f: \mathbb{R} \to \mathbb{R}$ into sinusoids (eigenfunctions of the Laplacian $-d^2/dx^2$ ). High-frequency components correspond to rapid oscillations.

Graph Fourier Transform (GFT): Decomposes a graph signal $f: V \to \mathbb{R}$ into eigenvectors of the graph Laplacian $L = U\Lambda U^T$ : $\hat{f} = U^T f, \quad f = U \hat{f}$

Frequency = eigenvalue: small $\lambda$ → smooth signal, large $\lambda$ → rapidly alternating signal.

Graph convolution: In classical signal processing, convolution is pointwise multiplication in the Fourier domain. For graphs: $g_\theta * f = U g_\theta(\Lambda) U^T f$

where $g_\theta(\Lambda) = \text{diag}(g_\theta(\lambda_1), \ldots, g_\theta(\lambda_n))$ is a learnable spectral filter.

The GCN simplification (Kipf & Welling, 2017):

Approximate spectral filter by a polynomial: $g_\theta(\lambda) \approx \theta_0 + \theta_1 \lambda$
Use scaled Laplacian $\tilde{L} = 2L/\lambda_{\max} - I$ so eigenvalues in $[-1,1]$
With $\lambda_{\max} \approx 2$ (empirical): $g_\theta(\lambda) \approx \theta_0 - \theta_1 (I - A_{\text{norm}})$
Simplify to single parameter $\theta = \theta_0 = -\theta_1$ : $g_\theta(\hat{A}) = \theta \hat{A}$

This gives the GCN layer: $H^{(l+1)} = \sigma(\hat{A} H^{(l)} W^{(l)})$ - the famous normalized adjacency propagation, derived from spectral graph convolution theory.

Q4: What is over-smoothing in GNNs and how does spectral theory explain it?

Over-smoothing in GNNs: as the number of layers increases, all node features converge to the same vector, making nodes indistinguishable. Classification accuracy typically peaks at 2-4 layers and degrades with more layers.

Spectral explanation: The GCN propagation $H^{(L)} = \hat{A}^L H^{(0)} \cdot W$ applies the matrix $\hat{A}^L$ to the initial features. $\hat{A}$ has eigenvalues in $(-1, 1]$ .

After $L$ steps:

Eigenvalue $\mu_i$ → $\mu_i^L$
For $|\mu_i| < 1$ : $\mu_i^L \to 0$ exponentially - these eigenvector components are suppressed
For $\mu_i = 1$ : $\mu_i^L = 1$ - only the constant (lowest-frequency) component survives

As $L \to \infty$ , $H^{(L)}$ converges to $\alpha \mathbf{1} \mathbf{1}^T H^{(0)} \cdot W$ - all nodes have the same feature vector (the global mean, modulated by eigenvector of eigenvalue 1).

Solutions:

JK-Net (Jumping Knowledge Networks): Concatenate features from all layers - each layer captures different frequency range.
APPNP / PPR propagation: Uses $(1-\alpha)$ damping to prevent eigenvalue-1 components from dominating.
DropEdge: Randomly drops edges during training - reduces effective number of propagation steps.
PairNorm: Normalizes features to maintain distance distribution between nodes.
ResGCN: Skip connections prevent features from converging - gradient "highway" bypasses smoothing.

Q5: How does the Fiedler value quantify graph connectivity? What does it mean when λ₂ is small vs large?

The Fiedler value $\lambda_2$ (second smallest eigenvalue of $L$ ) is the algebraic connectivity of the graph.

Formal definition: From the variational characterization: $\lambda_2 = \min_{x \perp \mathbf{1}, \|x\|=1} x^T L x = \min_{x \perp \mathbf{1}, \|x\|=1} \sum_{(i,j) \in E} (x_i - x_j)^2$

This is the minimum "variation" of any non-constant signal on the graph.

Interpretation:

$\lambda_2 = 0$ : Graph is disconnected (two components can have constant signals with no variation across the cut).
Small positive $\lambda_2$ ( $\lambda_2 \ll 1$ ): Bottleneck graph. There is a cut set of few edges whose removal disconnects the graph. The Fiedler vector $u_2$ identifies this bottleneck. Examples: two cliques connected by a single edge, long chains.
Large $\lambda_2$ : Well-connected graph. Expander graph. Random walk mixes quickly ( $\text{mixing time} = O(\lambda_2^{-1} \log n)$ ). Hard to partition into two balanced halves.

Cheeger inequality (connecting $\lambda_2$ to the graph's conductance $\phi$ ): $\frac{\phi^2}{2} \leq \lambda_2 \leq 2\phi$

where $\phi = \min_S \frac{\text{cut}(S, V \setminus S)}{\min(\text{vol}(S), \text{vol}(V \setminus S))}$ is the conductance.

ML applications:

Random walk mixing time: $O(\lambda_2^{-1} \log n)$ - relevant for node2vec/DeepWalk-based GNN pretraining
GNN information propagation: small $\lambda_2$ means information between distant communities travels slowly through GNN layers
Graph generation quality: checking that generated graphs have similar $\lambda_2$ to real graphs

Quick Reference

Quantity	Formula	Interpretation
Graph Laplacian	$L = D - A$	Discrete second derivative on graph
Quadratic form	$x^T L x = \sum_{(i,j)} (x_i - x_j)^2$	Signal smoothness
Eigenvalues $\lambda_1 \leq \ldots \leq \lambda_n$	Always $\lambda_1 = 0$	Frequency spectrum
Number of zero eigenvalues	-	Number of connected components
Fiedler value $\lambda_2$	Second smallest eigenvalue	Algebraic connectivity
Fiedler vector $u_2$	Second eigenvector	Optimal bisection direction
Graph Fourier Transform	$\hat{f} = U^T f$	Decompose signal into modes
GCN propagation	$\hat{A} = D^{-1/2}(A+I)D^{-1/2}$	K=1 Chebyshev filter
Over-smoothing	$\hat{A}^L \to$ constant as $L \to \infty$	Low-pass filtering effect

Next: Lesson 05: Random Graphs and Network Models →

:::tip 🎮 Interactive Playground

Visualize this concept: Try the Spectral Graph Theory demo on the EngineersOfAI Playground - no code required.

:::

What You Will Learn​

Part 1 - The Graph Laplacian​

Definition​

Key properties of the Laplacian​

Part 2 - Spectral Decomposition​

Eigendecomposition of the Laplacian​

The Fiedler vector (second eigenvector)​

Part 3 - Graph Signals and the Graph Fourier Transform​

Graph signals​

The graph Fourier transform (GFT)​

Frequency interpretation​

Part 4 - Normalized Laplacians​

Symmetric normalized Laplacian​

Random walk normalized Laplacian​

Part 5 - Spectral Clustering​

The algorithm (Normalized Spectral Clustering - Shi & Malik)​

Why spectral clustering works: normalized cuts​

Part 6 - Connection to Graph Neural Networks​

Spectral graph convolution​

ChebNet: Chebyshev polynomial approximation​

Over-smoothing: a spectral perspective​

Interview Questions​

Quick Reference​