Spectral Graph Theory - Graph Laplacian, Eigenvalues, and Spectral Clustering
Reading time: ~26 minutes | Level: Graph Theory → GNN Theory
To understand why Graph Convolutional Networks work - not just how to call GCNConv - you need spectral graph theory. When a GCN computes , it is applying a learnable spectral filter to graph signals. Over-smoothing is a spectral phenomenon: deep GCNs amplify low-frequency (smooth) components and suppress high-frequency ones.
Spectral clustering, one of the most reliable clustering algorithms, is an eigenvalue problem on the graph Laplacian. Node2Vec's random walks implicitly approximate the spectral structure of the graph.
This lesson gives you the mathematical foundation to reason about all of this.
What You Will Learn
- The graph Laplacian and its properties
- Spectral decomposition:
- Graph signals and the graph Fourier transform
- Eigenvalues as frequency: smooth vs high-frequency graph signals
- The Fiedler vector and graph connectivity
- Spectral clustering: normalized cuts and the algorithm
- Chebyshev polynomial approximation: the bridge to GCNs
Part 1 - The Graph Laplacian
Definition
For a graph with adjacency matrix and degree matrix :
Explicitly:
For weighted graphs: for and .
import numpy as np
import networkx as nx
import scipy.sparse as sp
G = nx.karate_club_graph()
# Compute Laplacian (various methods)
A = nx.to_numpy_array(G, dtype=np.float64)
D = np.diag(A.sum(axis=1))
L = D - A # Unnormalized Laplacian
# Verify with networkx
L_nx = np.array(nx.laplacian_matrix(G).todense(), dtype=np.float64)
print(f"L matches networkx: {np.allclose(L, L_nx)}") # True
# Properties
print(f"Symmetric: {np.allclose(L, L.T)}") # True
print(f"Row sums: {L.sum(axis=1)[:5]}") # All near 0
print(f"Trace = sum of degrees: {np.trace(L):.1f}, {sum(dict(G.degree()).values())}")
Key properties of the Laplacian
Property 1: is symmetric and positive semi-definite:
This quadratic form measures the "smoothness" of signal on the graph - it is zero if and only if is constant on each connected component.
Property 2: - the constant vector is always in the null space.
Property 3: The number of zero eigenvalues of equals the number of connected components.
# Property 1: Quadratic form measures signal variation
x = np.random.randn(G.number_of_nodes())
smoothness = x.T @ L @ x
# This equals sum over edges of (x[i] - x[j])^2
manual = sum((x[i] - x[j])**2 for i, j in G.edges())
print(f"x^T L x = {smoothness:.4f}")
print(f"Σ(xi - xj)^2 = {manual:.4f}")
print(f"Equal: {np.isclose(smoothness, manual)}") # True
# Property 3: zero eigenvalues count connected components
eigenvalues = np.linalg.eigvalsh(L) # Sorted eigenvalues
n_components = np.sum(eigenvalues < 1e-8)
print(f"\nZero eigenvalues: {n_components}") # Should equal nx.number_connected_components(G)
print(f"Connected components: {nx.number_connected_components(G)}")
Part 2 - Spectral Decomposition
Eigendecomposition of the Laplacian
Since is symmetric PSD, it has a full set of real, non-negative eigenvalues:
where:
- with
- are the orthonormal eigenvectors
import numpy as np
G = nx.karate_club_graph()
L = np.array(nx.laplacian_matrix(G).todense(), dtype=np.float64)
# Full eigendecomposition
eigenvalues, eigenvectors = np.linalg.eigh(L) # eigh: symmetric, returns sorted values
# eigenvalues[0] ≈ 0 (Laplacian always has 0 eigenvalue)
# eigenvectors[:, i] = i-th eigenvector
print(f"Eigenvalues (sorted): {eigenvalues[:5]}") # [~0, λ₂, λ₃, λ₄, λ₅]
print(f"Smallest eigenvalue: {eigenvalues[0]:.2e}") # near 0
print(f"Fiedler value (λ₂): {eigenvalues[1]:.4f}") # algebraic connectivity
print(f"Largest eigenvalue: {eigenvalues[-1]:.4f}")
# Verify L = U Λ U^T
Lambda = np.diag(eigenvalues)
U = eigenvectors
L_reconstructed = U @ Lambda @ U.T
print(f"Reconstruction error: {np.max(np.abs(L - L_reconstructed)):.2e}") # near 0
The Fiedler vector (second eigenvector)
The Fiedler value (second smallest eigenvalue) is the algebraic connectivity of the graph:
- : graph is disconnected
- Large : well-connected graph (hard to disconnect)
- Small positive : graph is barely connected (has a bottleneck)
The corresponding Fiedler vector identifies the optimal graph bisection:
# Fiedler vector analysis
fiedler_value = eigenvalues[1]
fiedler_vector = eigenvectors[:, 1]
print(f"Fiedler value (algebraic connectivity): {fiedler_value:.4f}")
print(f"Fiedler vector (first 10 values): {fiedler_vector[:10]}")
# Graph bisection: sign of Fiedler vector determines partition
partition_A = np.where(fiedler_vector >= 0)[0]
partition_B = np.where(fiedler_vector < 0)[0]
print(f"\nPartition A ({len(partition_A)} nodes): {sorted(partition_A)[:10]}...")
print(f"Partition B ({len(partition_B)} nodes): {sorted(partition_B)[:10]}...")
# Count cut edges
cut_edges = sum(1 for i, j in G.edges()
if (fiedler_vector[i] >= 0) != (fiedler_vector[j] >= 0))
print(f"Cut edges: {cut_edges}")
Part 3 - Graph Signals and the Graph Fourier Transform
Graph signals
A graph signal is a function - it assigns a real number to each node. Examples:
- Temperature at each sensor node in an IoT network
- Infection probability at each node in an epidemic model
- Node feature value at each dimension in a GNN
The graph Fourier transform (GFT)
The standard Fourier transform decomposes a function into sinusoidal components (eigenfunctions of the Laplacian on the real line). The Graph Fourier Transform does the same using the eigenvectors of the graph Laplacian:
The coefficient measures how much the signal "aligns" with eigenvector .
import numpy as np
G = nx.karate_club_graph()
L = np.array(nx.laplacian_matrix(G).todense(), dtype=np.float64)
eigenvalues, U = np.linalg.eigh(L)
# Graph signal: club membership (the Zachary karate club split)
club_membership = np.array([1 if G.nodes[n]['club'] == 'Mr. Hi' else -1
for n in sorted(G.nodes())])
# Graph Fourier Transform
f_hat = U.T @ club_membership # (n,) vector of Fourier coefficients
# Low-frequency components (small eigenvalue = smooth signal)
print("Low-frequency Fourier coefficients (smooth components):")
for i in range(5):
print(f" λ={eigenvalues[i]:.4f}: |f̂|={abs(f_hat[i]):.4f}")
# Inverse transform (reconstruction)
f_reconstructed = U @ f_hat
print(f"\nReconstruction error: {np.max(np.abs(f_reconstructed - club_membership)):.2e}")
# Low-pass filtering: keep only low-frequency components
K = 5 # Keep only first K eigenvectors
f_lowpass = U[:, :K] @ f_hat[:K]
print(f"\nLow-pass filtered signal (first 10 values):")
print(f_lowpass[:10])
# Smooth version of the original signal
Frequency interpretation
The eigenvalue is the "frequency" of eigenvector :
- Low frequency (): varies slowly across the graph - neighboring nodes have similar values. These capture community structure.
- High frequency (): alternates sign between connected nodes - maximum variation across edges. These capture noise and fine-grained structure.
This is the spectral foundation of over-smoothing: repeated application of a low-pass graph convolution (like GCN) attenuates high-frequency components and amplifies low-frequency ones, eventually making all node features identical within each connected component.
Part 4 - Normalized Laplacians
Symmetric normalized Laplacian
Eigenvalues lie in . Used in spectral GNNs and ChebNet.
Random walk normalized Laplacian
Eigenvalues lie in . Connection: is the transition matrix of a random walk on the graph.
import numpy as np
import networkx as nx
G = nx.karate_club_graph()
A = nx.to_numpy_array(G, dtype=np.float64)
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-10))
D_inv = np.diag(1.0 / (d + 1e-10))
I = np.eye(len(G.nodes()))
L_sym = I - D_inv_sqrt @ A @ D_inv_sqrt
L_rw = I - D_inv @ A
# Eigenvalues of symmetric normalized Laplacian
evals_sym = np.linalg.eigvalsh(L_sym)
print(f"L_sym eigenvalue range: [{evals_sym.min():.4f}, {evals_sym.max():.4f}]")
# Should be in [0, 2]
# The GCN normalized adjacency à = D̃^{-1/2} (A+I) D̃^{-1/2}
A_hat = A + I
D_hat = np.diag(A_hat.sum(axis=1))
D_hat_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D_hat) + 1e-10))
A_norm = D_hat_inv_sqrt @ A_hat @ D_hat_inv_sqrt
# A_norm = I - L_sym_with_self_loops
# Eigenvalues of A_norm ∈ [-1, 1] - bounded, training stable
evals_A_norm = np.linalg.eigvalsh(A_norm)
print(f"GCN normalized adjacency eigenvalue range: [{evals_A_norm.min():.4f}, {evals_A_norm.max():.4f}]")
Part 5 - Spectral Clustering
Spectral clustering uses the eigenvectors of the Laplacian to embed nodes into a low-dimensional space where standard clustering (e.g., k-means) works well.
The algorithm (Normalized Spectral Clustering - Shi & Malik)
1. Compute the normalized Laplacian L_sym = D^{-1/2} L D^{-1/2}
2. Compute the k smallest eigenvectors U = [u_1, ..., u_k]
3. Form matrix Y: normalize each row of U to unit length
4. Run k-means on rows of Y to get k clusters
5. Assign each node to the cluster of its row
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize
def spectral_clustering(G: nx.Graph, n_clusters: int) -> np.ndarray:
"""
Normalized spectral clustering (Shi-Malik / Ng-Jordan-Weiss).
"""
n = G.number_of_nodes()
# Build sparse normalized Laplacian
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh
L_sparse = sp.csr_matrix(nx.normalized_laplacian_matrix(G).astype(np.float64))
# Compute k smallest eigenvectors (excluding λ=0 for connected graphs)
# sigma=-0.01 shifts so we find eigenvalues near 0 efficiently
eigenvalues, eigenvectors = eigsh(L_sparse, k=n_clusters, which='SM')
# Sort by eigenvalue
idx = np.argsort(eigenvalues)
U = eigenvectors[:, idx] # (n, k) matrix of eigenvectors
# Normalize rows to unit length (Ng-Jordan-Weiss step)
Y = normalize(U, norm='l2', axis=1)
# K-means on the embedding
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
labels = kmeans.fit_predict(Y)
return labels
# Example: karate club graph - should recover the two factions
G = nx.karate_club_graph()
labels = spectral_clustering(G, n_clusters=2)
# Ground truth: club membership
true_labels = np.array([1 if G.nodes[n]['club'] == 'Mr. Hi' else 0
for n in sorted(G.nodes())])
# Compute accuracy (handle label permutation)
from sklearn.metrics import adjusted_rand_score
ari = adjusted_rand_score(true_labels, labels)
print(f"Spectral clustering Adjusted Rand Index: {ari:.4f}")
# Should be high (~0.8+) for the karate club graph
print(f"Cluster 0: {np.where(labels==0)[0][:10]}...")
print(f"Cluster 1: {np.where(labels==1)[0][:10]}...")
Why spectral clustering works: normalized cuts
The normalized cut (NCut) objective minimizes:
where and .
Finding the exact minimum is NP-hard. The spectral relaxation (replacing the binary partition vector with a continuous vector and solving the resulting generalized eigenvalue problem) gives the Fiedler vector - the 2nd eigenvector of .
Part 6 - Connection to Graph Neural Networks
Spectral graph convolution
Classical signal processing: convolution in time domain = pointwise multiplication in frequency domain.
Graph convolution (in spectral domain): For filter parameterized by :
where applies the filter to each frequency component.
Problem: Computing and requires full eigendecomposition - . Intractable for large graphs.
ChebNet: Chebyshev polynomial approximation
ChebNet (Defferrard et al., 2016) approximates the spectral filter with Chebyshev polynomials:
where (eigenvalues scaled to ) and is the -th Chebyshev polynomial.
Using the recurrence :
- No eigendecomposition needed!
- sparse matrix-vector products with
- -localized in the graph (aggregates -hop neighborhoods)
import torch
import torch.nn as nn
from torch_geometric.nn import ChebConv, GCNConv
# GCN (Kipf & Welling 2017): simplification of ChebNet with K=1
# K=1 Chebyshev approximation gives the GCN propagation rule:
# Z = à H W where à = D̃^{-1/2} (A+I) D̃^{-1/2}
# GCN layer
class GCNLayer(nn.Module):
def __init__(self, in_channels: int, out_channels: int):
super().__init__()
self.linear = nn.Linear(in_channels, out_channels, bias=False)
def forward(self, x, A_norm):
# x: (n, in_channels)
# A_norm: (n, n) normalized adjacency with self-loops
return torch.relu(A_norm @ self.linear(x))
# PyTorch Geometric implementation
gcn_conv = GCNConv(in_channels=64, out_channels=32)
cheb_conv = ChebConv(in_channels=64, out_channels=32, K=3) # K=3 hops
Over-smoothing: a spectral perspective
With layers of GCN propagation, the feature matrix becomes:
The spectral effect: has eigenvalues where are the eigenvalues of . As :
- : eigenvalues decay to 0 - high-frequency components vanish
- : only the constant (DC) component survives
All node features converge to the same value - over-smoothing. After ~5-10 GCN layers, nodes in the same connected component become indistinguishable.
import numpy as np
def simulate_over_smoothing(A_norm: np.ndarray, H: np.ndarray, n_layers: int) -> list:
"""
Simulate GCN feature propagation and measure over-smoothing.
Over-smoothing metric: mean pairwise feature distance.
"""
from scipy.spatial.distance import pdist
feature_diversity = [pdist(H).mean()]
H_current = H.copy()
for layer in range(n_layers):
H_current = A_norm @ H_current # No weights - pure propagation
diversity = pdist(H_current).mean()
feature_diversity.append(diversity)
return feature_diversity
# Run on karate club graph
G = nx.karate_club_graph()
A = nx.to_numpy_array(G, dtype=np.float64)
n = A.shape[0]
A_hat = A + np.eye(n)
D_hat_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_hat_inv_sqrt @ A_hat @ D_hat_inv_sqrt
H0 = np.random.randn(n, 8) # Random initial features
diversity = simulate_over_smoothing(A_norm, H0, n_layers=50)
print("Feature diversity (mean pairwise distance) vs layers:")
for layer in [0, 1, 2, 5, 10, 20, 50]:
if layer < len(diversity):
print(f" Layer {layer:3d}: diversity = {diversity[layer]:.6f}")
# Diversity converges to near 0 after ~20 layers
Interview Questions
Q1: What is the graph Laplacian and what do its eigenvalues represent?
The graph Laplacian is where is the diagonal degree matrix and is the adjacency matrix.
Key properties:
- Symmetric, positive semi-definite
- Row sums are zero:
- Quadratic form: - measures signal smoothness
Eigenvalue interpretation: Since is PSD, eigenvalues satisfy .
- : always (constant vector is always in null space). The corresponding eigenvector is constant.
- Number of zero eigenvalues = number of connected components. If is disconnected, multiple eigenvectors are constant on each component.
- (Fiedler value) = algebraic connectivity. iff graph is disconnected. Larger = better connected, harder to bisect.
- : bounded by for irregular graphs. For regular graphs, .
Frequency analogy: Eigenvector (corresponding to ) is the -th "frequency mode" of the graph. Small → smooth mode (neighbors have similar values). Large → oscillating mode (neighbors have opposite values).
The graph Fourier transform decomposes any graph signal into contributions from each frequency mode, exactly like the classical Fourier transform on .
Q2: Why does spectral clustering work? What does minimizing the normalized cut have to do with eigenvalues?
Normalized cut (NCut) measures partition quality:
A good partition has few edges between clusters (small cut) and balanced cluster sizes (large volumes). Minimizing NCut finds tight communities with clean separations.
The eigenvalue connection: Let be the indicator vector for partition . Then:
Minimizing this ratio over binary is the Rayleigh quotient problem - but with a binary constraint, it is NP-hard.
The spectral relaxation drops the binary constraint and minimizes over continuous :
This is a generalized eigenvalue problem , equivalent to the standard eigenvalue problem for .
The solution is the Fiedler vector (2nd eigenvector of ) - the continuous relaxation of the optimal bisection. Thresholding (sign or k-means) recovers a discrete partition.
For clusters: use eigenvectors of (the smallest non-zero eigenvalues). Each eigenvector captures a different "split" of the graph - together they embed nodes into where k-means can find clusters.
Q3: What is the graph Fourier transform and how does it connect to GNN convolutions?
Classical Fourier transform: Decomposes a function into sinusoids (eigenfunctions of the Laplacian ). High-frequency components correspond to rapid oscillations.
Graph Fourier Transform (GFT): Decomposes a graph signal into eigenvectors of the graph Laplacian :
Frequency = eigenvalue: small → smooth signal, large → rapidly alternating signal.
Graph convolution: In classical signal processing, convolution is pointwise multiplication in the Fourier domain. For graphs:
where is a learnable spectral filter.
The GCN simplification (Kipf & Welling, 2017):
- Approximate spectral filter by a polynomial:
- Use scaled Laplacian so eigenvalues in
- With (empirical):
- Simplify to single parameter :
This gives the GCN layer: - the famous normalized adjacency propagation, derived from spectral graph convolution theory.
Q4: What is over-smoothing in GNNs and how does spectral theory explain it?
Over-smoothing in GNNs: as the number of layers increases, all node features converge to the same vector, making nodes indistinguishable. Classification accuracy typically peaks at 2-4 layers and degrades with more layers.
Spectral explanation: The GCN propagation applies the matrix to the initial features. has eigenvalues in .
After steps:
- Eigenvalue →
- For : exponentially - these eigenvector components are suppressed
- For : - only the constant (lowest-frequency) component survives
As , converges to - all nodes have the same feature vector (the global mean, modulated by eigenvector of eigenvalue 1).
Solutions:
- JK-Net (Jumping Knowledge Networks): Concatenate features from all layers - each layer captures different frequency range.
- APPNP / PPR propagation: Uses damping to prevent eigenvalue-1 components from dominating.
- DropEdge: Randomly drops edges during training - reduces effective number of propagation steps.
- PairNorm: Normalizes features to maintain distance distribution between nodes.
- ResGCN: Skip connections prevent features from converging - gradient "highway" bypasses smoothing.
Q5: How does the Fiedler value quantify graph connectivity? What does it mean when λ₂ is small vs large?
The Fiedler value (second smallest eigenvalue of ) is the algebraic connectivity of the graph.
Formal definition: From the variational characterization:
This is the minimum "variation" of any non-constant signal on the graph.
Interpretation:
- : Graph is disconnected (two components can have constant signals with no variation across the cut).
- Small positive (): Bottleneck graph. There is a cut set of few edges whose removal disconnects the graph. The Fiedler vector identifies this bottleneck. Examples: two cliques connected by a single edge, long chains.
- Large : Well-connected graph. Expander graph. Random walk mixes quickly (). Hard to partition into two balanced halves.
Cheeger inequality (connecting to the graph's conductance ):
where is the conductance.
ML applications:
- Random walk mixing time: - relevant for node2vec/DeepWalk-based GNN pretraining
- GNN information propagation: small means information between distant communities travels slowly through GNN layers
- Graph generation quality: checking that generated graphs have similar to real graphs
Quick Reference
| Quantity | Formula | Interpretation |
|---|---|---|
| Graph Laplacian | Discrete second derivative on graph | |
| Quadratic form | Signal smoothness | |
| Eigenvalues | Always | Frequency spectrum |
| Number of zero eigenvalues | - | Number of connected components |
| Fiedler value | Second smallest eigenvalue | Algebraic connectivity |
| Fiedler vector | Second eigenvector | Optimal bisection direction |
| Graph Fourier Transform | Decompose signal into modes | |
| GCN propagation | K=1 Chebyshev filter | |
| Over-smoothing | constant as | Low-pass filtering effect |
Next: Lesson 05: Random Graphs and Network Models →
:::tip 🎮 Interactive Playground
Visualize this concept: Try the Spectral Graph Theory demo on the EngineersOfAI Playground - no code required.
:::
