Trees and Graphs

BFS, DFS, binary trees, BSTs, tries, graph representations, shortest paths, and topological sort - with ML-relevant practice problems and full Python solutions.

Reading time: ~40 min | Interview relevance: High | Roles: All AI/ML roles

The Real Interview Moment

You are in a virtual on-site at a major AI company. The interviewer shares a screen with a diagram: "Here is the dependency graph for our ML training pipeline. Node A is data ingestion, B is data validation, C is feature engineering (depends on A and B), D is model training (depends on C), E is evaluation (depends on D), and F is model registration (depends on D and E). We need to determine a valid execution order."

You immediately recognize this: topological sort on a DAG. You have seen this exact pattern in production ML systems - Airflow DAGs, Kubeflow pipelines, MLflow projects. You explain the approach: build an adjacency list, compute in-degrees, use a queue to process nodes with zero in-degree. Within ten minutes, you have clean, working code.

The follow-up comes: "What if there is a cycle? How do you detect it?" You explain that if you process fewer nodes than exist in the graph, there must be a cycle. The interviewer probes further: "Can you identify which nodes are in the cycle?" Now you are in territory where DFS with node coloring (white/gray/black) shines.

Trees and graphs are not abstract textbook concepts - they are the backbone of ML systems. This chapter prepares you for both the algorithmic questions and the practical applications.

Binary Trees

Core Traversals

Every tree problem builds on four traversals. Know them cold.

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

# ============ RECURSIVE TRAVERSALS ============

def inorder(root):
    """Left -> Root -> Right. Gives sorted order for BST.
    Time: O(n), Space: O(h) call stack where h = height
    """
    if not root:
        return []
    return inorder(root.left) + [root.val] + inorder(root.right)

def preorder(root):
    """Root -> Left -> Right. Useful for tree serialization.
    Time: O(n), Space: O(h)
    """
    if not root:
        return []
    return [root.val] + preorder(root.left) + preorder(root.right)

def postorder(root):
    """Left -> Right -> Root. Useful for deletion, expression evaluation.
    Time: O(n), Space: O(h)
    """
    if not root:
        return []
    return postorder(root.left) + postorder(root.right) + [root.val]

# ============ ITERATIVE TRAVERSALS ============

def inorder_iterative(root):
    """Iterative inorder using an explicit stack."""
    result = []
    stack = []
    current = root
    while current or stack:
        while current:
            stack.append(current)
            current = current.left
        current = stack.pop()
        result.append(current.val)
        current = current.right
    return result

def preorder_iterative(root):
    """Iterative preorder using a stack."""
    if not root:
        return []
    result = []
    stack = [root]
    while stack:
        node = stack.pop()
        result.append(node.val)
        # Push right first so left is processed first (LIFO)
        if node.right:
            stack.append(node.right)
        if node.left:
            stack.append(node.left)
    return result

# ============ LEVEL ORDER (BFS) ============

from collections import deque

def level_order(root):
    """BFS traversal - returns list of lists, one per level.
    Time: O(n), Space: O(w) where w = max width of tree
    """
    if not root:
        return []
    result = []
    queue = deque([root])
    while queue:
        level = []
        for _ in range(len(queue)):
            node = queue.popleft()
            level.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        result.append(level)
    return result

Traversal Comparison

Traversal	Order	Use Cases	ML Application
Inorder	L -> Root -> R	Get sorted order from BST	Sorted feature extraction from decision tree
Preorder	Root -> L -> R	Serialize tree, copy tree	Serialize model tree for storage
Postorder	L -> R -> Root	Delete tree, evaluate expressions	Compute subtree metrics bottom-up
Level Order (BFS)	Level by level	Find min depth, level averages	Layer-by-layer analysis of decision tree

60-Second Answer

If an interviewer asks "When would you use BFS vs DFS on a tree?", the answer is: Use BFS when you need level-by-level processing or finding the shortest path (min depth, level averages, right-side view). Use DFS when you need to process leaves before parents (path sums, tree diameter, subtree computations) or when the tree is very wide (BFS queue would be too large).

Common Binary Tree Problems

def max_depth(root):
    """Maximum depth of a binary tree.
    Time: O(n), Space: O(h)
    """
    if not root:
        return 0
    return 1 + max(max_depth(root.left), max_depth(root.right))

def is_balanced(root):
    """Check if tree is height-balanced (every subtree's height differs by at most 1).
    Time: O(n), Space: O(h)
    """
    def check(node):
        if not node:
            return 0
        left_h = check(node.left)
        right_h = check(node.right)
        if left_h == -1 or right_h == -1 or abs(left_h - right_h) > 1:
            return -1
        return 1 + max(left_h, right_h)
    return check(root) != -1

def diameter(root):
    """Diameter of binary tree (longest path between any two nodes).
    Time: O(n), Space: O(h)
    """
    max_diameter = [0]
    def depth(node):
        if not node:
            return 0
        left = depth(node.left)
        right = depth(node.right)
        max_diameter[0] = max(max_diameter[0], left + right)
        return 1 + max(left, right)
    depth(root)
    return max_diameter[0]

def lowest_common_ancestor(root, p, q):
    """Find the lowest common ancestor of two nodes.
    Time: O(n), Space: O(h)
    """
    if not root or root == p or root == q:
        return root
    left = lowest_common_ancestor(root.left, p, q)
    right = lowest_common_ancestor(root.right, p, q)
    if left and right:
        return root
    return left or right

def has_path_sum(root, target):
    """Check if any root-to-leaf path sums to target.
    ML context: Check if any branch of a decision tree
    produces a cumulative score above a threshold.
    Time: O(n), Space: O(h)
    """
    if not root:
        return False
    if not root.left and not root.right:
        return root.val == target
    return (has_path_sum(root.left, target - root.val) or
            has_path_sum(root.right, target - root.val))

Binary Search Trees (BSTs)

BSTs maintain the property: left.val < node.val < right.val. This enables O(log n) search, insert, and delete in balanced trees.

def search_bst(root, target):
    """Search for a value in a BST.
    Time: O(h), Space: O(h) recursive / O(1) iterative
    """
    if not root or root.val == target:
        return root
    if target < root.val:
        return search_bst(root.left, target)
    return search_bst(root.right, target)

def insert_bst(root, val):
    """Insert a value into a BST.
    Time: O(h), Space: O(h)
    """
    if not root:
        return TreeNode(val)
    if val < root.val:
        root.left = insert_bst(root.left, val)
    elif val > root.val:
        root.right = insert_bst(root.right, val)
    return root

def validate_bst(root, low=float('-inf'), high=float('inf')):
    """Validate that a tree is a valid BST.
    Time: O(n), Space: O(h)
    """
    if not root:
        return True
    if root.val <= low or root.val >= high:
        return False
    return (validate_bst(root.left, low, root.val) and
            validate_bst(root.right, root.val, high))

def kth_smallest_bst(root, k):
    """Find the kth smallest element in a BST.
    Approach: Inorder traversal gives sorted order.
    Time: O(h + k), Space: O(h)
    """
    stack = []
    current = root
    count = 0
    while current or stack:
        while current:
            stack.append(current)
            current = current.left
        current = stack.pop()
        count += 1
        if count == k:
            return current.val
        current = current.right
    return -1

Common Trap

When validating a BST, do not just check that left.val < node.val < right.val. You must check that all values in the left subtree are less than the node, and all values in the right subtree are greater. The correct approach passes valid ranges (low, high) down the recursion.

Tries (Prefix Trees)

Tries are essential for NLP-related problems: autocomplete, spell checking, prefix matching.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False
        self.count = 0  # Number of words with this prefix

class Trie:
    """Prefix tree for efficient string operations.

    ML context: Building prefix-based search for token vocabularies,
    autocomplete for text input, prefix matching in NLP pipelines.

    Insert: O(m), Search: O(m), Prefix search: O(m)
    where m = length of the word
    Space: O(total characters across all words)
    """
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
            node.count += 1
        node.is_end = True

    def search(self, word):
        """Return True if word exists in trie."""
        node = self._find_node(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):
        """Return True if any word starts with prefix."""
        return self._find_node(prefix) is not None

    def count_with_prefix(self, prefix):
        """Count words with given prefix."""
        node = self._find_node(prefix)
        return node.count if node else 0

    def _find_node(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return None
            node = node.children[char]
        return node

    def autocomplete(self, prefix, limit=5):
        """Return up to `limit` words starting with prefix.

        ML context: Autocomplete suggestions based on prefix.
        """
        node = self._find_node(prefix)
        if not node:
            return []
        results = []
        self._dfs_collect(node, prefix, results, limit)
        return results

    def _dfs_collect(self, node, current, results, limit):
        if len(results) >= limit:
            return
        if node.is_end:
            results.append(current)
        for char in sorted(node.children):
            self._dfs_collect(node.children[char], current + char, results, limit)

Graph Representations

Graph Representations - Choosing the Right One

# Adjacency list - most common for interviews
def build_adjacency_list(edges, directed=True):
    """Build adjacency list from edge list.
    Time: O(E), Space: O(V + E)
    """
    from collections import defaultdict
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        if not directed:
            graph[v].append(u)
    return graph

# Adjacency matrix - for dense graphs
def build_adjacency_matrix(n, edges, directed=True):
    """Build adjacency matrix from edge list.
    Time: O(V^2 + E), Space: O(V^2)
    """
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        if not directed:
            matrix[v][u] = 1
    return matrix

Graph Traversals: BFS and DFS

Breadth-First Search (BFS)

from collections import deque

def bfs(graph, start):
    """BFS traversal of a graph.
    Time: O(V + E), Space: O(V)
    """
    visited = set([start])
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

def shortest_path_bfs(graph, start, end):
    """Find shortest path in unweighted graph using BFS.

    ML context: Finding the shortest dependency chain
    between two stages in an ML pipeline.

    Time: O(V + E), Space: O(V)
    """
    if start == end:
        return [start]
    visited = set([start])
    queue = deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor == end:
                return path + [neighbor]
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [neighbor]))
    return []  # No path exists

def bfs_level_by_level(graph, start):
    """BFS that processes level by level (useful for distance calculations).
    Time: O(V + E), Space: O(V)
    """
    visited = set([start])
    queue = deque([start])
    levels = []
    while queue:
        level = []
        for _ in range(len(queue)):
            node = queue.popleft()
            level.append(node)
            for neighbor in graph.get(node, []):
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        levels.append(level)
    return levels

Depth-First Search (DFS)

def dfs_recursive(graph, start, visited=None):
    """DFS traversal (recursive).
    Time: O(V + E), Space: O(V) for visited set + O(V) call stack
    """
    if visited is None:
        visited = set()
    visited.add(start)
    result = [start]
    for neighbor in graph.get(start, []):
        if neighbor not in visited:
            result.extend(dfs_recursive(graph, neighbor, visited))
    return result

def dfs_iterative(graph, start):
    """DFS traversal (iterative with explicit stack).
    Time: O(V + E), Space: O(V)
    """
    visited = set()
    stack = [start]
    result = []
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        result.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                stack.append(neighbor)
    return result

def has_cycle_directed(graph, num_nodes):
    """Detect cycle in a directed graph using DFS with 3-color marking.

    WHITE (0) = unvisited
    GRAY (1)  = in current DFS path (being processed)
    BLACK (2) = fully processed

    If we encounter a GRAY node, we have found a back edge (cycle).

    ML context: Detecting circular dependencies in ML pipeline DAGs.

    Time: O(V + E), Space: O(V)
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {i: WHITE for i in range(num_nodes)}

    def dfs(node):
        color[node] = GRAY
        for neighbor in graph.get(node, []):
            if color[neighbor] == GRAY:
                return True  # Back edge found - cycle exists
            if color[neighbor] == WHITE and dfs(neighbor):
                return True
        color[node] = BLACK
        return False

    for node in range(num_nodes):
        if color[node] == WHITE:
            if dfs(node):
                return True
    return False

BFS vs DFS Comparison

Aspect	BFS	DFS
Data structure	Queue	Stack (or recursion)
Exploration pattern	Level by level	As deep as possible
Shortest path (unweighted)	Yes	No
Space complexity	O(width)	O(depth)
Best for	Shortest path, level-order, wide graphs	Cycle detection, path finding, deep graphs
ML use cases	Pipeline stage distance, layer analysis	Dependency detection, tree serialization

Topological Sort

Topological sort orders nodes in a DAG such that for every edge (u, v), u comes before v. This is critical for ML pipeline execution ordering.

from collections import deque, defaultdict

def topological_sort_kahn(graph, num_nodes):
    """Topological sort using Kahn's algorithm (BFS-based).

    ML context: Determining execution order of ML pipeline stages.
    data_ingestion -> preprocessing -> feature_engineering -> training -> evaluation

    Time: O(V + E), Space: O(V)
    """
    # Compute in-degrees
    in_degree = [0] * num_nodes
    for node in graph:
        for neighbor in graph[node]:
            in_degree[neighbor] += 1

    # Start with nodes that have no dependencies
    queue = deque([i for i in range(num_nodes) if in_degree[i] == 0])
    order = []

    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    # If we processed fewer nodes than exist, there is a cycle
    if len(order) != num_nodes:
        return []  # Cycle detected - no valid topological order
    return order

def topological_sort_dfs(graph, num_nodes):
    """Topological sort using DFS (adds nodes in reverse postorder).
    Time: O(V + E), Space: O(V)
    """
    visited = set()
    order = []

    def dfs(node):
        visited.add(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                dfs(neighbor)
        order.append(node)

    for node in range(num_nodes):
        if node not in visited:
            dfs(node)

    return order[::-1]  # Reverse postorder

# Example: ML Pipeline
def pipeline_execution_order():
    """Determine valid execution order for an ML pipeline."""
    stages = {
        0: 'data_ingestion',
        1: 'data_validation',
        2: 'feature_engineering',
        3: 'model_training',
        4: 'evaluation',
        5: 'model_registration'
    }

    # Dependencies (directed edges)
    graph = {
        0: [2],     # data_ingestion -> feature_engineering
        1: [2],     # data_validation -> feature_engineering
        2: [3],     # feature_engineering -> model_training
        3: [4, 5],  # model_training -> evaluation, model_registration
        4: [5],     # evaluation -> model_registration
        5: []
    }

    order = topological_sort_kahn(graph, 6)
    return [stages[i] for i in order]
    # Possible output: ['data_ingestion', 'data_validation',
    #                    'feature_engineering', 'model_training',
    #                    'evaluation', 'model_registration']

Instant Rejection

If you cannot implement topological sort in an ML interview, it is a significant gap. ML engineers work with DAG-based systems (Airflow, Kubeflow, Dagster) daily. Interviewers expect you to know Kahn's algorithm (BFS-based) or DFS-based topological sort from memory.

Shortest Path Algorithms

Dijkstra's Algorithm

import heapq

def dijkstra(graph, start):
    """Find shortest path from start to all other nodes in a weighted graph.

    graph[node] = [(neighbor, weight), ...]

    ML context: Finding the lowest-cost path through a model
    ensemble or pipeline with weighted processing costs.

    Time: O((V + E) log V), Space: O(V)
    """
    distances = {start: 0}
    heap = [(0, start)]  # (distance, node)
    visited = set()

    while heap:
        dist, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)

        for neighbor, weight in graph.get(node, []):
            new_dist = dist + weight
            if neighbor not in distances or new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))

    return distances

# Example: Finding cheapest compute path
# graph = {
#     'cpu': [('gpu_small', 5), ('tpu', 20)],
#     'gpu_small': [('gpu_large', 3), ('cpu', 5)],
#     'gpu_large': [('tpu', 2)],
#     'tpu': []
# }
# dijkstra(graph, 'cpu') -> {'cpu': 0, 'gpu_small': 5, 'gpu_large': 8, 'tpu': 10}

Union-Find (Disjoint Set)

class UnionFind:
    """Union-Find with path compression and union by rank.

    ML context: Clustering - grouping similar items,
    connected components in similarity graphs.

    Find: O(alpha(n)) ~ O(1) amortized
    Union: O(alpha(n)) ~ O(1) amortized
    """
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.count = n  # Number of connected components

    def find(self, x):
        """Find root with path compression."""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        """Union by rank. Returns True if x and y were in different components."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        self.count -= 1
        return True

    def connected(self, x, y):
        return self.find(x) == self.find(y)

def count_connected_components(n, edges):
    """Count connected components using Union-Find.

    ML context: Finding clusters in a similarity graph
    where edges connect similar data points.

    Time: O(E * alpha(V)) ~ O(E), Space: O(V)
    """
    uf = UnionFind(n)
    for u, v in edges:
        uf.union(u, v)
    return uf.count

Number of Islands (Classic Graph Problem)

def num_islands(grid):
    """Count the number of islands in a 2D grid.

    ML context: Connected component analysis in image segmentation.
    Each '1' is land, '0' is water. Count distinct land masses.

    Time: O(rows * cols), Space: O(rows * cols) for visited
    """
    if not grid:
        return 0

    rows, cols = len(grid), len(grid[0])
    visited = set()
    count = 0

    def dfs(r, c):
        if (r < 0 or r >= rows or c < 0 or c >= cols or
                grid[r][c] == '0' or (r, c) in visited):
            return
        visited.add((r, c))
        dfs(r + 1, c)
        dfs(r - 1, c)
        dfs(r, c + 1)
        dfs(r, c - 1)

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1' and (r, c) not in visited:
                dfs(r, c)
                count += 1

    return count

Practice Problems

Problem 1: Validate ML Pipeline DAG

Given an ML pipeline as a list of (source, destination) dependencies, determine if the pipeline is valid (no circular dependencies) and return a valid execution order.

# Input:  num_stages = 5
#         dependencies = [(0,1), (0,2), (1,3), (2,3), (3,4)]
# Output: [0, 1, 2, 3, 4] or [0, 2, 1, 3, 4] (any valid order)

# Input:  num_stages = 3
#         dependencies = [(0,1), (1,2), (2,0)]
# Output: [] (cycle detected - invalid pipeline)

Hint 1

This is a direct application of topological sort with cycle detection.

Hint 2

Use Kahn's algorithm. Build in-degree array. If the result has fewer nodes than num_stages, there is a cycle.

Hint 3

The graph is directed: dependency (a, b) means a must run before b, so add edge a -> b.

Solution

from collections import defaultdict, deque

def validate_pipeline(num_stages, dependencies):
    """Return valid execution order or empty list if cycle exists.
    Time: O(V + E), Space: O(V + E)
    """
    graph = defaultdict(list)
    in_degree = [0] * num_stages

    for src, dst in dependencies:
        graph[src].append(dst)
        in_degree[dst] += 1

    queue = deque([i for i in range(num_stages) if in_degree[i] == 0])
    order = []

    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    return order if len(order) == num_stages else []

Problem 2: Binary Tree Right Side View

Given a binary tree, return the values visible from the right side (the last node at each level).

# Input:     1
#           / \
#          2   3
#           \   \
#            5   4
# Output: [1, 3, 4]

Hint 1

Use level-order traversal (BFS). The right-side view is the last element of each level.

Hint 2

Alternatively, use DFS visiting right child first, and track the first node seen at each depth.

Hint 3

For BFS: process each level, and append only the last element. For DFS: if `depth == len(result)`, it is the first node at this depth from the right.

Solution

from collections import deque

def right_side_view(root):
    """Return right-side view of binary tree.
    Time: O(n), Space: O(w) for BFS / O(h) for DFS
    """
    if not root:
        return []
    result = []
    queue = deque([root])
    while queue:
        level_size = len(queue)
        for i in range(level_size):
            node = queue.popleft()
            if i == level_size - 1:
                result.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
    return result

Problem 3: Clone a Graph

Given a reference to a node in a connected undirected graph, return a deep copy of the graph.

# ML context: Cloning a computation graph for gradient checkpointing
# or creating a copy of a model architecture graph.

Hint 1

Use BFS or DFS. The key challenge is avoiding infinite loops with cycles - use a hash map to track which nodes have already been cloned.

Hint 2

Map: `original node -> cloned node`. Before processing neighbors, check if they have already been cloned.

Hint 3

Process: create clone, add to map, iterate through neighbors, clone each uncloned neighbor and recurse.

Solution

class GraphNode:
    def __init__(self, val=0, neighbors=None):
        self.val = val
        self.neighbors = neighbors if neighbors else []

def clone_graph(node):
    """Deep copy a graph using DFS + hash map.
    Time: O(V + E), Space: O(V)
    """
    if not node:
        return None

    cloned = {}  # original -> clone mapping

    def dfs(original):
        if original in cloned:
            return cloned[original]
        copy = GraphNode(original.val)
        cloned[original] = copy
        for neighbor in original.neighbors:
            copy.neighbors.append(dfs(neighbor))
        return copy

    return dfs(node)

Problem 4: Word Ladder (BFS Shortest Transformation)

Given a begin word, an end word, and a dictionary, find the shortest transformation sequence (each step changes one letter).

# Input:  beginWord = "hit", endWord = "cog"
#         wordList = ["hot", "dot", "dog", "lot", "log", "cog"]
# Output: 5 (hit -> hot -> dot -> dog -> cog)

# ML context: Analogous to finding the shortest path in
# a word embedding neighborhood graph.

Hint 1

Model as a graph problem: each word is a node, and two words are connected if they differ by exactly one character. Use BFS for shortest path.

Hint 2

Instead of checking all pairs of words (O(n^2 * m)), create a pattern map: "h*t" -> ["hit", "hot"]. This builds the graph in O(n * m).

Hint 3

BFS from beginWord. At each step, generate all wildcard patterns for the current word and find matching words from the pattern map.

Solution

from collections import defaultdict, deque

def ladder_length(begin_word, end_word, word_list):
    """Shortest word transformation sequence using BFS.
    Time: O(n * m^2) where n = words, m = word length
    Space: O(n * m)
    """
    if end_word not in word_list:
        return 0

    word_set = set(word_list)
    m = len(begin_word)

    # Build pattern map: "h*t" -> ["hit", "hot"]
    pattern_map = defaultdict(list)
    for word in word_set:
        for i in range(m):
            pattern = word[:i] + '*' + word[i + 1:]
            pattern_map[pattern].append(word)

    visited = {begin_word}
    queue = deque([(begin_word, 1)])

    while queue:
        word, depth = queue.popleft()
        for i in range(m):
            pattern = word[:i] + '*' + word[i + 1:]
            for neighbor in pattern_map[pattern]:
                if neighbor == end_word:
                    return depth + 1
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append((neighbor, depth + 1))

    return 0

Problem 5: Course Schedule (Cycle Detection)

Given the number of courses and prerequisite pairs, determine if it is possible to finish all courses (no circular dependencies).

# Input:  numCourses = 4, prerequisites = [[1,0], [2,1], [3,2]]
# Output: True (take 0 -> 1 -> 2 -> 3)

# Input:  numCourses = 2, prerequisites = [[0,1], [1,0]]
# Output: False (circular dependency)

Hint 1

This is cycle detection in a directed graph. A valid course schedule exists if and only if the prerequisite graph is a DAG.

Hint 2

Use topological sort (Kahn's algorithm). If you can process all nodes, there is no cycle.

Hint 3

Edge [a, b] means "to take course a, you must first take course b" - so the edge is b -> a.

Solution

from collections import defaultdict, deque

def can_finish(num_courses, prerequisites):
    """Determine if all courses can be completed.
    Time: O(V + E), Space: O(V + E)
    """
    graph = defaultdict(list)
    in_degree = [0] * num_courses

    for course, prereq in prerequisites:
        graph[prereq].append(course)
        in_degree[course] += 1

    queue = deque([i for i in range(num_courses) if in_degree[i] == 0])
    count = 0

    while queue:
        node = queue.popleft()
        count += 1
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    return count == num_courses

Problem 6: Serialize and Deserialize Binary Tree

Design functions to serialize a binary tree to a string and deserialize it back.

# ML context: Saving and loading decision tree models,
# serializing computation graphs for distributed training.

Hint 1

Use preorder traversal with a special marker for null nodes.

Hint 2

Serialize: preorder, writing "null" for empty nodes. Deserialize: read values in order, using "null" to know when a subtree is empty.

Hint 3

Use an iterator/index to track position during deserialization.

Solution

class Codec:
    """Serialize and deserialize a binary tree.
    Time: O(n) for both operations, Space: O(n)
    """
    def serialize(self, root):
        """Preorder traversal with null markers."""
        tokens = []
        def preorder(node):
            if not node:
                tokens.append('null')
                return
            tokens.append(str(node.val))
            preorder(node.left)
            preorder(node.right)
        preorder(root)
        return ','.join(tokens)

    def deserialize(self, data):
        """Reconstruct tree from serialized string."""
        tokens = iter(data.split(','))

        def build():
            val = next(tokens)
            if val == 'null':
                return None
            node = TreeNode(int(val))
            node.left = build()
            node.right = build()
            return node

        return build()

Algorithm Selection Guide

Graph and Tree Algorithm Selection Guide

Interview Cheat Sheet

Topic	Key Takeaway
Tree traversals	Know all 4 (inorder, preorder, postorder, level-order) both recursive and iterative
BST property	`left < root < right` for entire subtrees, not just immediate children
BFS	Queue-based, level-by-level, finds shortest path in unweighted graphs
DFS	Stack/recursion, goes deep first, used for cycle detection and path problems
Topological sort	Kahn's (BFS + in-degree) or DFS postorder reverse. Only works on DAGs
Cycle detection	DFS with white/gray/black coloring for directed; Union-Find for undirected
Dijkstra	Heap-based, for weighted shortest path. O((V+E) log V)
Union-Find	Near O(1) per operation with path compression + union by rank
Trie	O(m) insert/search for strings of length m. Used for prefix matching
Number of Islands	DFS/BFS flood fill on 2D grid. Classic connected components

Spaced Repetition Checkpoints

Day	Review Activity	Time
Day 0	Implement all four tree traversals from memory. Solve Number of Islands.	40 min
Day 3	Implement topological sort (Kahn's) from memory. Test with ML pipeline example.	20 min
Day 7	Solve 2 new tree problems and 1 graph problem from LeetCode.	30 min
Day 14	Implement Union-Find from scratch. Solve connected components problem.	20 min
Day 21	Without reference, write BFS shortest path, DFS cycle detection, and topological sort. Time yourself.	40 min

Next Steps

Trees and graphs handle hierarchical and network-structured data. Next, learn Dynamic Programming to tackle optimization problems that appear in sequence alignment, Viterbi decoding, and beam search.

The Real Interview Moment​

Binary Trees​

Core Traversals​

Traversal Comparison​

Common Binary Tree Problems​

Binary Search Trees (BSTs)​

Tries (Prefix Trees)​

Graph Representations​

Graph Traversals: BFS and DFS​

Breadth-First Search (BFS)​

Depth-First Search (DFS)​

BFS vs DFS Comparison​

Topological Sort​

Shortest Path Algorithms​

Dijkstra's Algorithm​

Union-Find (Disjoint Set)​

Number of Islands (Classic Graph Problem)​

Practice Problems​

Problem 1: Validate ML Pipeline DAG​

Problem 2: Binary Tree Right Side View​

Problem 3: Clone a Graph​

Problem 4: Word Ladder (BFS Shortest Transformation)​

Problem 5: Course Schedule (Cycle Detection)​

Problem 6: Serialize and Deserialize Binary Tree​

Algorithm Selection Guide​

Interview Cheat Sheet​

Spaced Repetition Checkpoints​

Next Steps​

The Real Interview Moment

Binary Trees

Core Traversals

Traversal Comparison

Common Binary Tree Problems

Binary Search Trees (BSTs)

Tries (Prefix Trees)

Graph Representations

Graph Traversals: BFS and DFS

Breadth-First Search (BFS)

Depth-First Search (DFS)

BFS vs DFS Comparison

Topological Sort

Shortest Path Algorithms

Dijkstra's Algorithm

Union-Find (Disjoint Set)

Number of Islands (Classic Graph Problem)

Practice Problems

Problem 1: Validate ML Pipeline DAG

Problem 2: Binary Tree Right Side View

Problem 3: Clone a Graph

Problem 4: Word Ladder (BFS Shortest Transformation)

Problem 5: Course Schedule (Cycle Detection)

Problem 6: Serialize and Deserialize Binary Tree

Algorithm Selection Guide

Interview Cheat Sheet

Spaced Repetition Checkpoints

Next Steps