Graph Algorithms - BFS, DFS, Dijkstra, PageRank, and ML Feature Engineering

Reading time: ~24 minutes | Level: Graph Theory → ML Engineering

You are building a fraud detection system for a financial network. Each transaction is a directed edge, each account is a node. You want features that capture: how central is this account in the transaction network? Are there short cycles suggesting money laundering? Does this account sit on many shortest paths between other accounts?

These questions are answered by graph algorithms. They are the bridge between raw graph structure and ML-usable features. In the GNN era, knowing graph algorithms also tells you what information your model can theoretically compute - and what it cannot.

What You Will Learn

BFS and DFS: traversal, connected components, bipartite detection
Dijkstra's algorithm: shortest paths in weighted graphs
Topological sort: execution order for DAGs
Minimum spanning tree (Kruskal, Prim)
PageRank: random walk based node importance
Graph feature engineering for tabular ML and GNN initialization
Complexity analysis and practical Python implementations

Part 1 - Breadth-First Search (BFS)

Algorithm

BFS explores a graph level by level from a source node, using a queue:

BFS(G, source):
    Initialize: visited = {source}, queue = [source], distance = {source: 0}
    While queue is not empty:
        v = queue.pop_front()
        For each neighbor u of v:
            If u not in visited:
                visited.add(u)
                distance[u] = distance[v] + 1
                queue.append(u)
    Return distances

Time complexity: $O(|V| + |E|)$ Space complexity: $O(|V|)$

from collections import deque
import networkx as nx
import numpy as np

def bfs(adj: dict, source: int) -> dict:
    """
    BFS from source node.
    Returns shortest-path distances (unweighted) to all reachable nodes.
    """
    visited = {source}
    distance = {source: 0}
    queue = deque([source])

    while queue:
        v = queue.popleft()
        for u in adj.get(v, []):
            if u not in visited:
                visited.add(u)
                distance[u] = distance[v] + 1
                queue.append(u)

    return distance

# ML application: compute closeness centrality efficiently
def closeness_centrality_from_bfs(adj: dict, n: int) -> dict:
    """Closeness centrality = n-1 / sum(distances from v to all others)."""
    centrality = {}
    nodes = list(adj.keys())
    for v in nodes:
        dists = bfs(adj, v)
        if len(dists) > 1:
            avg_dist = sum(dists.values()) / (len(dists) - 1)
            centrality[v] = (len(dists) - 1) / (avg_dist * (n - 1)) if avg_dist > 0 else 0
        else:
            centrality[v] = 0
    return centrality

# BFS for finding connected components
def connected_components_bfs(adj: dict) -> list:
    """Find all connected components using BFS."""
    all_nodes = set(adj.keys())
    visited = set()
    components = []

    for start in all_nodes:
        if start not in visited:
            # BFS from unvisited node
            component = set()
            queue = deque([start])
            while queue:
                v = queue.popleft()
                if v not in visited:
                    visited.add(v)
                    component.add(v)
                    for u in adj.get(v, []):
                        if u not in visited:
                            queue.append(u)
            components.append(component)

    return components

BFS in GNNs: k-hop neighborhoods

BFS directly corresponds to GNN receptive fields - a $k$ -layer GNN aggregates exactly the $k$ -hop BFS neighborhood of each node:

import networkx as nx
from torch_geometric.utils import k_hop_subgraph
import torch

G = nx.karate_club_graph()
edge_index = torch.tensor(list(G.edges())).t().contiguous()
# Make undirected
edge_index = torch.cat([edge_index, edge_index.flip(0)], dim=1)

# 2-hop neighborhood of node 0 (what a 2-layer GNN "sees")
node_idx = 0
num_hops = 2
subset, sub_edge_index, mapping, edge_mask = k_hop_subgraph(
    node_idx=node_idx,
    num_hops=num_hops,
    edge_index=edge_index,
    num_nodes=G.number_of_nodes()
)
print(f"Nodes in {num_hops}-hop neighborhood of node 0: {subset.tolist()}")
print(f"Number of nodes: {len(subset)}")
# These are exactly the nodes a 2-layer GNN uses to compute node 0's representation

Part 2 - Depth-First Search (DFS)

DFS explores a graph by going as deep as possible before backtracking, using a stack (or recursion):

def dfs(adj: dict, source: int) -> list:
    """
    Iterative DFS from source.
    Returns nodes in DFS visit order.
    """
    visited = set()
    stack = [source]
    order = []

    while stack:
        v = stack.pop()
        if v not in visited:
            visited.add(v)
            order.append(v)
            # Add neighbors to stack (reverse order for left-to-right DFS)
            for u in reversed(list(adj.get(v, []))):
                if u not in visited:
                    stack.append(u)

    return order

def has_cycle_undirected(adj: dict) -> bool:
    """
    Detect cycle in undirected graph using DFS.
    A back edge (to a visited non-parent node) indicates a cycle.
    """
    visited = set()

    def dfs_cycle(v, parent):
        visited.add(v)
        for u in adj.get(v, []):
            if u not in visited:
                if dfs_cycle(u, v):
                    return True
            elif u != parent:
                return True  # Back edge to non-parent = cycle
        return False

    for v in adj:
        if v not in visited:
            if dfs_cycle(v, -1):
                return True
    return False

# DFS for topological sort
def topological_sort_dfs(adj: dict) -> list:
    """
    Topological sort using DFS (post-order traversal).
    Only valid for DAGs.
    """
    visited = set()
    result = []

    def dfs_topo(v):
        visited.add(v)
        for u in adj.get(v, []):
            if u not in visited:
                dfs_topo(u)
        result.append(v)  # Add to result AFTER processing all neighbors

    for v in adj:
        if v not in visited:
            dfs_topo(v)

    return result[::-1]  # Reverse post-order = topological order

Part 3 - Dijkstra's Shortest Path Algorithm

For weighted graphs with non-negative weights, Dijkstra's algorithm finds shortest paths from a source to all other vertices.

Key idea: Greedy - always expand the unvisited node with the smallest known distance.

Time complexity: $O((V + E) \log V)$ with a binary heap.

import heapq
import numpy as np

def dijkstra(adj_weighted: dict, source: int) -> dict:
    """
    Dijkstra's shortest path algorithm.
    adj_weighted[v] = [(neighbor, weight), ...] - adjacency list with weights.
    Returns: dict of shortest distances from source to all reachable nodes.
    """
    distances = {source: 0.0}
    pq = [(0.0, source)]  # (distance, node) priority queue

    while pq:
        dist_v, v = heapq.heappop(pq)

        # Skip if we've already found a shorter path
        if dist_v > distances.get(v, float('inf')):
            continue

        for u, weight in adj_weighted.get(v, []):
            new_dist = dist_v + weight
            if new_dist < distances.get(u, float('inf')):
                distances[u] = new_dist
                heapq.heappush(pq, (new_dist, u))

    return distances

# Example: road network
road_network = {
    'NYC': [('Boston', 215), ('DC', 225), ('Philly', 95)],
    'Boston': [('NYC', 215), ('Providence', 50)],
    'DC': [('NYC', 225), ('Philly', 140), ('Richmond', 110)],
    'Philly': [('NYC', 95), ('DC', 140)],
    'Providence': [('Boston', 50)],
    'Richmond': [('DC', 110)]
}

distances_from_nyc = dijkstra(road_network, 'NYC')
for city, dist in sorted(distances_from_nyc.items(), key=lambda x: x[1]):
    print(f"NYC → {city}: {dist} miles")

# scipy/networkx equivalents
import networkx as nx
G = nx.Graph()
for u, neighbors in road_network.items():
    for v, w in neighbors:
        G.add_edge(u, v, weight=w)

nx_distances = nx.single_source_dijkstra_path_length(G, 'NYC')
all_pairs = dict(nx.all_pairs_dijkstra_path_length(G))

Shortest path features for ML

All-pairs shortest path distances are powerful graph features:

import numpy as np
import networkx as nx

def compute_distance_matrix(G: nx.Graph) -> np.ndarray:
    """
    Compute all-pairs shortest path distance matrix.
    Useful for graph kernel methods and position encodings.
    O(n * (n + m) * log n) time - feasible for n < 10^4.
    """
    n = G.number_of_nodes()
    nodes = sorted(G.nodes())
    node_to_idx = {v: i for i, v in enumerate(nodes)}

    dist_matrix = np.full((n, n), np.inf)
    np.fill_diagonal(dist_matrix, 0)

    for source in nodes:
        dists = nx.single_source_shortest_path_length(G, source)
        for target, d in dists.items():
            dist_matrix[node_to_idx[source], node_to_idx[target]] = d

    return dist_matrix

G = nx.karate_club_graph()
D = compute_distance_matrix(G)
print(f"Distance matrix shape: {D.shape}")
print(f"Diameter (max distance): {D[D < np.inf].max():.0f}")
print(f"Average distance: {D[D < np.inf].mean():.2f}")

# Shortest path features as node embeddings (naive, for baselines)
# Each node is represented by its distances to all other nodes
node_features_from_distances = D   # (n, n) matrix - each row is a feature vector

Part 4 - Topological Sort

Topological sort orders nodes so that all edges go from earlier to later in the ordering. Only possible for DAGs.

Application in ML: Execution order for computation graphs, resolving feature dependencies, scheduling pipeline stages.

from collections import deque

def topological_sort_kahn(adj: dict) -> list:
    """
    Kahn's algorithm for topological sort.
    Uses BFS with in-degree tracking.
    O(V + E) time.
    """
    # Compute in-degrees
    all_nodes = set(adj.keys())
    for neighbors in adj.values():
        all_nodes.update(neighbors)

    in_degree = {v: 0 for v in all_nodes}
    for v in adj:
        for u in adj[v]:
            in_degree[u] += 1

    # Start with nodes that have no prerequisites
    queue = deque([v for v in all_nodes if in_degree[v] == 0])
    order = []

    while queue:
        v = queue.popleft()
        order.append(v)
        for u in adj.get(v, []):
            in_degree[u] -= 1
            if in_degree[u] == 0:
                queue.append(u)

    if len(order) < len(all_nodes):
        raise ValueError("Graph has a cycle - topological sort impossible")

    return order

# ML pipeline as a DAG
pipeline_dag = {
    'raw_data': ['clean_data'],
    'clean_data': ['features_A', 'features_B'],
    'features_A': ['model_input'],
    'features_B': ['model_input'],
    'model_input': ['train_model'],
    'train_model': ['evaluate'],
    'evaluate': []
}

execution_order = topological_sort_kahn(pipeline_dag)
print(f"Execution order: {execution_order}")
# ['raw_data', 'clean_data', 'features_A', 'features_B', 'model_input', 'train_model', 'evaluate']

Part 5 - Minimum Spanning Tree

A minimum spanning tree (MST) of a weighted graph is a spanning tree (connected subgraph with $|V|-1$ edges) with minimum total edge weight.

Applications in ML: Clustering, dimensionality reduction initialization, data manifold approximation.

import numpy as np
import heapq

def prim_mst(adj_weighted: dict) -> list:
    """
    Prim's algorithm for Minimum Spanning Tree.
    Time: O((V + E) log V) with a heap.
    Returns: list of edges (u, v, weight) in the MST.
    """
    if not adj_weighted:
        return []

    start = next(iter(adj_weighted))
    visited = {start}
    mst_edges = []

    # Heap of (weight, u, v) - u is in MST, v is candidate
    candidates = [(w, start, v) for v, w in adj_weighted[start]]
    heapq.heapify(candidates)

    while candidates and len(visited) < len(adj_weighted):
        weight, u, v = heapq.heappop(candidates)
        if v in visited:
            continue
        visited.add(v)
        mst_edges.append((u, v, weight))
        for neighbor, w in adj_weighted.get(v, []):
            if neighbor not in visited:
                heapq.heappush(candidates, (w, v, neighbor))

    return mst_edges

# NetworkX MST
import networkx as nx

G = nx.complete_graph(10)
for u, v in G.edges():
    G[u][v]['weight'] = np.random.rand()

mst = nx.minimum_spanning_tree(G, algorithm='kruskal')
print(f"MST: {mst.number_of_edges()} edges, total weight: {sum(d['weight'] for u,v,d in mst.edges(data=True)):.4f}")

# MST for clustering: edges in MST connect nearest neighbors in a specific sense
# Cutting the heaviest edges in the MST gives clusters

Part 6 - PageRank

PageRank models a random surfer who follows links randomly with probability $\alpha$ and teleports to a random node with probability $1-\alpha$ .

Definition: For node $v$ :

$\text{PR}(v) = \frac{1-\alpha}{n} + \alpha \sum_{u \to v} \frac{\text{PR}(u)}{\text{out\_degree}(u)}$

This is the stationary distribution of a Markov chain on the graph.

import numpy as np

def pagerank(adj: dict, alpha: float = 0.85, tol: float = 1e-8, maxiter: int = 100) -> dict:
    """
    Power iteration PageRank.
    alpha: damping factor (probability of following a link)
    """
    nodes = list(adj.keys())
    n = len(nodes)
    node_idx = {v: i for i, v in enumerate(nodes)}

    # Build transition matrix
    out_degrees = {v: len(adj.get(v, [])) for v in nodes}

    # Initialize uniform
    pr = {v: 1.0 / n for v in nodes}

    for iteration in range(maxiter):
        pr_new = {v: (1 - alpha) / n for v in nodes}

        for u in nodes:
            if out_degrees.get(u, 0) > 0:
                contribution = alpha * pr[u] / out_degrees[u]
                for v in adj.get(u, []):
                    pr_new[v] = pr_new.get(v, 0) + contribution
            else:
                # Dangling node: distribute uniformly (teleport)
                for v in nodes:
                    pr_new[v] += alpha * pr[u] / n

        # Check convergence
        diff = sum(abs(pr_new[v] - pr[v]) for v in nodes)
        pr = pr_new
        if diff < tol:
            print(f"PageRank converged in {iteration + 1} iterations")
            break

    # Normalize
    total = sum(pr.values())
    return {v: score / total for v, score in pr.items()}

# ML applications of PageRank
def personalized_pagerank(adj: dict, seed_nodes: list, alpha: float = 0.85) -> dict:
    """
    Personalized PageRank (PPR): teleport to seed nodes instead of uniform random.
    Used in: node recommendation, community detection, GNN pre-training.

    PPR from seed nodes captures the "influence cone" of those nodes.
    """
    nodes = list(adj.keys())
    n = len(nodes)

    # Teleportation distribution: uniform over seed nodes
    seed_set = set(seed_nodes)
    teleport = {v: (1.0 / len(seed_nodes)) if v in seed_set else 0.0
                for v in nodes}

    pr = {v: teleport[v] for v in nodes}
    out_degrees = {v: len(adj.get(v, [])) for v in nodes}

    for _ in range(100):
        pr_new = {v: (1 - alpha) * teleport[v] for v in nodes}
        for u in nodes:
            if out_degrees.get(u, 0) > 0:
                contribution = alpha * pr[u] / out_degrees[u]
                for v in adj.get(u, []):
                    pr_new[v] = pr_new.get(v, 0) + contribution
        pr = pr_new

    return pr

PageRank as a GNN baseline

Personalized PageRank (PPR) is directly connected to GNNs. APPNP (Approximate PPR propagation) uses PPR to define the feature propagation scheme:

$H^{(\text{final})} = (1-\alpha)(I - \alpha \hat{A})^{-1} H^{(0)}$

This is equivalent to infinite GNN layers with PPR-weighted influence, which avoids over-smoothing.

Part 7 - Graph Feature Engineering for ML

Before training a GNN, or when a GNN is overkill, compute structural features as inputs to gradient boosted trees or linear models:

import networkx as nx
import numpy as np
import pandas as pd

def extract_node_features(G: nx.Graph) -> pd.DataFrame:
    """
    Extract a comprehensive set of node-level graph features.
    Used as input features for XGBoost/LightGBM on graph datasets.
    """
    features = {}

    # Basic degree features
    features['degree'] = dict(G.degree())
    features['in_degree'] = dict(G.degree())   # Same for undirected

    # Centrality measures
    features['degree_centrality'] = nx.degree_centrality(G)
    features['closeness_centrality'] = nx.closeness_centrality(G)

    # Betweenness (expensive for large graphs - use approximation)
    n = G.number_of_nodes()
    if n <= 5000:
        features['betweenness_centrality'] = nx.betweenness_centrality(G)
    else:
        # Approximate with sampling
        features['betweenness_centrality'] = nx.betweenness_centrality(G, k=500)

    # PageRank
    features['pagerank'] = nx.pagerank(G, alpha=0.85)

    # Local structure
    features['clustering_coefficient'] = nx.clustering(G)
    triangles = nx.triangles(G)
    features['triangle_count'] = triangles

    # Core number (k-core decomposition: remove low-degree nodes iteratively)
    features['core_number'] = nx.core_number(G)

    # Aggregate into DataFrame
    nodes = sorted(G.nodes())
    df = pd.DataFrame({
        feat_name: [feat_dict[n] for n in nodes]
        for feat_name, feat_dict in features.items()
    }, index=nodes)

    return df

# Example usage
G = nx.karate_club_graph()
node_features_df = extract_node_features(G)
print(node_features_df.head())
print(f"Feature matrix shape: {node_features_df.shape}")

Interview Questions

Q1: Compare BFS and DFS. What are the time/space complexities and when would you use each?

Property	BFS	DFS
Data structure	Queue (FIFO)	Stack (LIFO) or recursion
Time complexity	$O(V + E)$	$O(V + E)$
Space complexity	$O(V)$ - stores entire frontier	$O(\text{depth})$ - only current path
Shortest paths	Yes (unweighted)	No
Cycle detection	Yes	Yes
Topological sort	Via Kahn's algorithm	Via post-order traversal

Use BFS when:

Finding shortest paths (unweighted graphs)
Exploring nodes level by level (GNN $k$ -hop neighborhoods)
Finding the closest node to a source
Level-order traversal of trees
Bipartite graph detection (2-color level by level)

Use DFS when:

Topological sort of a DAG
Finding strongly connected components (Tarjan's algorithm)
Cycle detection
Maze solving (find any path, not necessarily shortest)
Generating all connected components

Space complexity matters for large graphs: BFS stores the entire frontier (up to $O(V)$ nodes at the widest level). DFS only stores the current path from root to the deepest visited node ( $O(\text{depth})$ ). For balanced trees with $V$ nodes, depth = $O(\log V)$ but frontier = $O(V/2)$ at the last level - DFS is much more space-efficient. For long chains, DFS depth = $O(V)$ and may cause stack overflow - use iterative DFS.

Q2: Walk through Dijkstra's algorithm and explain why it fails for negative-weight edges.

Algorithm walkthrough for source = A on graph A-B(weight 1), A-C(weight 4), B-C(weight 2), B-D(weight 5), C-D(weight 1):

Initialize: distances = {A:0, B:∞, C:∞, D:∞}, heap = [(0, A)]
Pop (0, A): process A. Update neighbors: B→1, C→4. heap = [(1,B), (4,C)]
Pop (1, B): process B. Update: C→min(4, 1+2)=3, D→6. heap = [(3,C), (4,C), (6,D)]
Pop (3, C): process C. Update: D→min(6, 3+1)=4. heap = [(4,C_old), (4,D)]
Pop (4, C_old): already visited, skip.
Pop (4, D): process D. Done.
Final: {A:0, B:1, C:3, D:4} ✓

Why it fails for negative edges: Dijkstra's greedy correctness relies on the property: once a node is popped from the heap, its distance is finalized. This holds because all edge weights are non-negative - adding more edges can only increase the path length.

With a negative edge $(B, C, -10)$ : When we pop B with distance 1, we update C to $1 + (-10) = -9$ . But if we had already finalized C at distance 4 (when we popped it earlier), that earlier finalization was wrong.

Fix for negative edges: Use Bellman-Ford ( $O(VE)$ time) - it runs $V-1$ relaxation passes over all edges, allowing negative-weight corrections to propagate. For graphs without negative cycles, Bellman-Ford always finds the correct shortest paths.

In ML: Edge weights in similarity graphs are often non-negative (distances, costs), so Dijkstra is commonly applicable. For graphs where edges represent "rewards" (reinforcement learning state graphs), negative edges may appear - use Bellman-Ford or ensure non-negativity by shifting weights.

Q3: What is PageRank and how is Personalized PageRank used in GNNs?

PageRank models a random surfer who:

With probability $\alpha$ (damping factor, typically 0.85): follows a random outgoing link
With probability $1-\alpha$ : teleports to a uniformly random node

The stationary distribution of this Markov chain is the PageRank vector $\mathbf{p}$ : $\mathbf{p} = \alpha A^T D^{-1} \mathbf{p} + \frac{1-\alpha}{n} \mathbf{1}$

It assigns higher scores to nodes that are linked to by other high-score nodes.

Personalized PageRank (PPR): Teleport to a specific seed set rather than uniformly. For seed node $v$ , the PPR vector $\pi_v$ gives the "influence" of $v$ on all other nodes - nodes nearby in the graph have higher PPR scores.

PPR in GNNs (APPNP):

PPR can be seen as an infinite-depth GNN with dampened propagation: $\pi_v \approx (1-\alpha)(I - \alpha \hat{A})^{-1} e_v$

where $\hat{A}$ is the normalized adjacency. APPNP separates feature transformation from propagation:

Transform: $H = f_\theta(X)$ (MLP, no propagation)
Propagate: $Z = (1-\alpha)(I - \alpha \hat{A})^{-1} H$ (PPR propagation, no parameters)

Advantages of PPR propagation:

Avoids over-smoothing: the $(1-\alpha)$ factor prevents the infinitely-propagated features from converging to a constant
Long-range dependencies: PPR reaches nodes far away with appropriately decayed influence
No learnable parameters in propagation: faster convergence, less overfitting

Q4: How do you extract graph features for a tabular ML model when a GNN would be overkill?

When graph structure is available but labeled data is scarce or a GNN is computationally expensive, structural graph features provide powerful signals for gradient boosted trees.

Feature categories:

Node-level structural features (computed once, used as columns):

Degree (in/out), log-degree
Clustering coefficient (local transitivity)
Triangle count
Core number (k-core decomposition: robust to noise)
PageRank, Personalized PageRank from known seed nodes
Eigenvector centrality (for smaller graphs)

Neighborhood aggregate features (first-order neighborhood statistics):

Mean/max/min/std of neighbor degrees
Fraction of neighbors with a specific attribute
Number of common neighbors between nodes u and v (for edge prediction)

Topological features (global structure):

Eccentricity, betweenness (expensive - sample for large graphs)
Shortest path to important nodes (e.g., "hub" nodes)
Number of paths of length k

Feature engineering pipeline:

# 1. Compute features
features_df = extract_node_features(G)  # See Part 7 above

# 2. Aggregate neighborhood features
for n in G.nodes():
    nbr_degrees = [G.degree(u) for u in G.neighbors(n)]
    features_df.loc[n, 'mean_nbr_degree'] = np.mean(nbr_degrees) if nbr_degrees else 0
    features_df.loc[n, 'max_nbr_degree'] = max(nbr_degrees) if nbr_degrees else 0

# 3. Train XGBoost on these features
import xgboost as xgb
model = xgb.XGBClassifier()
model.fit(features_df[train_mask], labels[train_mask])

When to use GNN vs feature engineering:

GNN if: labeled data > 1000 nodes, GPU available, graph structure is primary signal
Feature engineering if: small labeled set, interpretability needed, quick prototyping, tabular model already in production

Q5: What is the k-core decomposition and why is it useful for identifying influential nodes?

k-core decomposition: The $k$ -core of a graph is the maximal subgraph where every node has at least $k$ neighbors within the subgraph. Every node is assigned its core number - the maximum $k$ such that it belongs to the $k$ -core.

Algorithm: Iteratively remove nodes with degree less than $k$ . Nodes removed in round $k$ have core number $k-1$ .

import networkx as nx
G = nx.karate_club_graph()
core_numbers = nx.core_number(G)
print(f"Core number distribution:")
from collections import Counter
print(Counter(core_numbers.values()))
# {4: 10, 3: 15, 2: 7, 1: 2} - 10 nodes in the 4-core

Why core number is useful:

Robustness: Unlike degree centrality (a single node can have high degree by connecting to periphery), core number measures embeddedness in a dense subgraph. High core-number nodes are robust influencers - removing a few of their connections doesn't reduce their influence.
Information spreading: In epidemic/influence spreading models, nodes with high core number are more likely to be in the "giant cascade" - they sit in densely connected regions that sustain propagation.
Fraud detection: Fraudulent accounts often form dense clusters (they mutually follow/transact to inflate metrics). High core numbers within a suspicious cluster indicate collusion.
Preprocessing for GNNs: Core-number-stratified sampling ensures training batches cover both core (dense) and periphery (sparse) nodes - important for learning models that generalize to both.
Community detection: The $k$ -core hierarchy gives a natural multi-resolution view of community structure - the densest communities appear in the highest- $k$ cores.

Quick Reference

Algorithm	Time	Space	Use Case
BFS	$O(V+E)$	$O(V)$	Shortest path (unweighted), component discovery
DFS	$O(V+E)$	$O(\text{depth})$	Topological sort, cycle detection, SCC
Dijkstra	$O((V+E)\log V)$	$O(V)$	Shortest path (non-negative weights)
Bellman-Ford	$O(VE)$	$O(V)$	Shortest path (negative weights)
Topological sort	$O(V+E)$	$O(V)$	DAG execution order
Kruskal MST	$O(E \log E)$	$O(V)$	Minimum spanning tree
PageRank	$O(k(V+E))$	$O(V)$	Node importance, recommendation
Core decomposition	$O(V+E)$	$O(V)$	Influence, dense subgraph detection

Next: Lesson 04: Spectral Graph Theory →

:::tip 🎮 Interactive Playground

Visualize this concept: Try the Graph Algorithms demo on the EngineersOfAI Playground - no code required.

:::

What You Will Learn​

Part 1 - Breadth-First Search (BFS)​

Algorithm​

BFS in GNNs: k-hop neighborhoods​

Part 2 - Depth-First Search (DFS)​

Part 3 - Dijkstra's Shortest Path Algorithm​

Shortest path features for ML​

Part 4 - Topological Sort​

Part 5 - Minimum Spanning Tree​

Part 6 - PageRank​

PageRank as a GNN baseline​

Part 7 - Graph Feature Engineering for ML​

Interview Questions​

Quick Reference​