DSA Foundations for ML

Big-O complexity, core data structures, space-time tradeoffs, and how each concept appears in real AI/ML interviews.

Reading time: ~35 min | Interview relevance: Critical | Roles: All AI/ML roles

The Real Interview Moment

You are twenty minutes into a phone screen with a Meta ML engineer. The first problem went well - a straightforward hash map question that you solved in twelve minutes. Now comes the follow-up: "Good. For the next problem, imagine you have a dataset of 100 million user embeddings, each a 512-dimensional vector. You need to find, for each user, the K most similar users. Walk me through your approach - not the ML part, but the data structure and algorithmic considerations."

You pause. This is not a LeetCode problem. There is no single "correct" answer. But the interviewer is testing something fundamental: do you understand how data structures and algorithmic complexity interact at scale? Can you reason about whether to use a heap, a KD-tree, or a hash-based approach? Do you understand why a naive O(n^2) pairwise comparison is catastrophically slow at 100 million entries?

The candidates who excel at this question are not necessarily the ones who memorized the most algorithms. They are the ones who deeply understand the foundations - what each data structure is good at, what it costs, and when to reach for it.

This chapter builds that foundation.

Big-O Complexity Analysis

Big-O notation describes how an algorithm's runtime or memory usage grows as the input size increases. For ML interviews, you need to do three things with Big-O:

Analyze the complexity of code you write
Compare approaches and justify your choice
Predict whether a solution will work at production scale

The Complexity Hierarchy

Complexity Hierarchy - Fastest to Slowest

What Each Complexity Means in Practice

Complexity	Name	n = 1,000	n = 1,000,000	n = 1,000,000,000	ML Context
O(1)	Constant	1 op	1 op	1 op	Hash table lookup, array index
O(log n)	Logarithmic	10 ops	20 ops	30 ops	Binary search, balanced BST lookup
O(n)	Linear	1K ops	1M ops	1B ops	Single pass through dataset
O(n log n)	Linearithmic	10K ops	20M ops	30B ops	Sorting, building a heap
O(n^2)	Quadratic	1M ops	1T ops	Not feasible	Pairwise distance matrix
O(2^n)	Exponential	~10^300	Not feasible	Not feasible	Brute-force subset selection
O(n!)	Factorial	Not feasible	Not feasible	Not feasible	Brute-force permutations

Instant Rejection

If your solution is O(n^2) and the interviewer says "the dataset has 10 million rows," you need to immediately recognize that n^2 = 10^14 operations, which would take hours or days. Failing to catch this is a strong negative signal. Always mentally plug in the problem's scale to validate your approach.

How to Analyze Complexity

Here is a systematic approach:

# Example 1: O(n) - single loop
def find_max(arr):
    """Scan through array once."""
    max_val = arr[0]               # O(1)
    for val in arr:                 # O(n) - loop runs n times
        if val > max_val:          # O(1) - comparison
            max_val = val          # O(1) - assignment
    return max_val                 # Total: O(n)

# Example 2: O(n^2) - nested loops
def has_duplicate_pair(arr):
    """Check all pairs - this is slow."""
    n = len(arr)
    for i in range(n):             # O(n)
        for j in range(i + 1, n):  # O(n) in worst case
            if arr[i] == arr[j]:   # O(1)
                return True
    return False                   # Total: O(n^2)

# Example 3: O(n) - same problem, better approach
def has_duplicate_set(arr):
    """Use a set for O(1) lookups."""
    seen = set()                   # O(1) space initially
    for val in arr:                # O(n) - loop runs n times
        if val in seen:            # O(1) average - set lookup
            return True
        seen.add(val)              # O(1) average - set insert
    return False                   # Total: O(n) time, O(n) space

60-Second Answer

To analyze complexity quickly: count the nested loops that depend on input size. One loop over n = O(n). Two nested loops over n = O(n^2). A loop with a halving step (like binary search) = O(log n). Sorting + one pass = O(n log n). If you use a recursive call that branches, think about the recursion tree.

Space Complexity

Space complexity is often overlooked but frequently asked about. Key rules:

# O(1) space - modifying input in place
def reverse_in_place(arr):
    left, right = 0, len(arr) - 1
    while left < right:
        arr[left], arr[right] = arr[right], arr[left]
        left += 1
        right -= 1
    # No extra data structures - O(1) space

# O(n) space - creating a new collection proportional to input
def get_unique(arr):
    return list(set(arr))  # Set can hold up to n elements - O(n) space

# O(n^2) space - 2D matrix
def pairwise_distances(points):
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]  # n x n matrix - O(n^2) space
    for i in range(n):
        for j in range(n):
            dist[i][j] = abs(points[i] - points[j])
    return dist

Amortized Complexity

Some operations are usually fast but occasionally slow. Amortized analysis averages over a sequence of operations:

Operation	Worst Case	Amortized	Why
Python list `.append()`	O(n)	O(1)	Resizes occasionally, but rare
Hash table insert	O(n)	O(1)	Rehashing is rare
Union-Find with path compression	O(log n)	O(alpha(n)) ~= O(1)	Near-constant in practice

Core Data Structures

Arrays (Python Lists)

Arrays are the most fundamental data structure and the most frequently tested. In Python, list is a dynamic array.

Operations and Complexity:

Operation	Time	Note
Access by index `arr[i]`	O(1)	Direct memory access
Append `arr.append(x)`	O(1) amortized	Occasional O(n) resize
Insert at position `arr.insert(i, x)`	O(n)	Shifts all elements after i
Delete by index `del arr[i]`	O(n)	Shifts all elements after i
Search (unsorted) `x in arr`	O(n)	Must scan entire array
Search (sorted) via `bisect`	O(log n)	Binary search
Slice `arr[i:j]`	O(j - i)	Creates a new list
Sort `arr.sort()`	O(n log n)	Timsort

When to Use in ML Interviews:

Storing sequences of predictions, features, or time-series data
Sliding window problems (rolling averages, max in window)
Two-pointer problems (merging sorted arrays, partitioning)
Matrix operations (2D arrays for feature matrices)

# Common array patterns in ML interviews

# 1. Prefix sum - precompute cumulative sums for range queries
def prefix_sum(arr):
    """Compute prefix sums for O(1) range sum queries."""
    n = len(arr)
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + arr[i]
    return prefix
    # Range sum [i, j] = prefix[j+1] - prefix[i]

# 2. Two pointers - O(n) approach for sorted arrays
def two_sum_sorted(arr, target):
    """Find two numbers that sum to target in a sorted array."""
    left, right = 0, len(arr) - 1
    while left < right:
        current = arr[left] + arr[right]
        if current == target:
            return [left, right]
        elif current < target:
            left += 1
        else:
            right -= 1
    return [-1, -1]

Linked Lists

Linked lists are less common in ML interviews but still appear, especially at Google and Amazon.

Operations and Complexity:

Operation	Time	Note
Access by index	O(n)	Must traverse from head
Insert at head	O(1)	Just update pointers
Insert at tail (with tail pointer)	O(1)	Just update pointers
Delete (given node)	O(1)	Update pointers
Search	O(n)	Must traverse

class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

# Reverse a linked list - classic interview problem
def reverse_list(head):
    """Reverse a singly linked list in place."""
    prev = None
    curr = head
    while curr:
        next_node = curr.next  # Save next
        curr.next = prev       # Reverse pointer
        prev = curr            # Advance prev
        curr = next_node       # Advance curr
    return prev                # New head

Company Variation

Linked list problems are common at Google (about 10-15% of coding questions) but rare at ML-focused companies like OpenAI or Anthropic. If you are targeting Google MLE specifically, practice 5-8 linked list problems. For most other companies, understanding the basics is sufficient.

Stacks and Queues

Stack (LIFO - Last In, First Out):

# Python list works as a stack
stack = []
stack.append(1)    # Push - O(1)
stack.append(2)
stack.append(3)
top = stack.pop()  # Pop - O(1), returns 3
top = stack[-1]    # Peek - O(1), returns 2

Queue (FIFO - First In, First Out):

from collections import deque

queue = deque()
queue.append(1)      # Enqueue - O(1)
queue.append(2)
queue.append(3)
front = queue.popleft()  # Dequeue - O(1), returns 1

When to Use in ML Interviews:

Stack: Expression parsing, DFS (iterative), matching brackets, monotonic stack for next-greater-element
Queue: BFS traversal, level-order tree traversal, task scheduling, rate limiting

# Monotonic stack - find next greater element
# ML relevance: finding the next time a metric exceeds current value
def next_greater_element(arr):
    """For each element, find the next element that is greater."""
    n = len(arr)
    result = [-1] * n
    stack = []  # Stack of indices
    for i in range(n):
        while stack and arr[i] > arr[stack[-1]]:
            idx = stack.pop()
            result[idx] = arr[i]
        stack.append(i)
    return result

# Example: arr = [4, 5, 2, 10, 8]
# Result:       [5, 10, 10, -1, -1]

Hash Maps (Python Dictionaries)

Hash maps are the single most important data structure for ML coding interviews. If you master one thing, master hash maps.

Operations and Complexity:

Operation	Average	Worst Case	Note
Get `d[key]`	O(1)	O(n)	Worst case with all collisions
Set `d[key] = val`	O(1)	O(n)	Amortized O(1) with resizing
Delete `del d[key]`	O(1)	O(n)	Rare worst case
Check `key in d`	O(1)	O(n)	Average case is what matters
Iterate all keys	O(n)	O(n)	Must visit every entry

from collections import Counter, defaultdict

# Counter - frequency counting (extremely common in interviews)
def top_k_features(feature_names, k):
    """Find the K most common features."""
    counts = Counter(feature_names)
    return counts.most_common(k)

# defaultdict - grouping pattern
def group_by_label(data):
    """Group data points by their label."""
    groups = defaultdict(list)
    for point, label in data:
        groups[label].append(point)
    return dict(groups)

# Two-sum with hash map - the most classic interview problem
def two_sum(nums, target):
    """Find indices of two numbers that sum to target."""
    seen = {}  # value -> index
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return []

Common Trap

Do not confuse Python's dict with an ordered data structure. While Python 3.7+ guarantees insertion order, hash maps do not have a natural "sorted" order. If you need sorted keys, use sorted(d.keys()) which is O(n log n), or use a different data structure like a balanced BST (SortedDict from sortedcontainers).

Heaps (Priority Queues)

Heaps are essential for "top-K" problems, which appear frequently in ML contexts (top-K predictions, K nearest neighbors, beam search).

Operations and Complexity:

Operation	Time	Note
Push	O(log n)	Bubble up
Pop (min/max)	O(log n)	Bubble down
Peek (min/max)	O(1)	Root element
Build heap from list	O(n)	Floyd's algorithm

import heapq

# Min-heap (Python's default)
heap = []
heapq.heappush(heap, 5)
heapq.heappush(heap, 2)
heapq.heappush(heap, 8)
smallest = heapq.heappop(heap)  # Returns 2

# Top-K largest elements - O(n log k)
def top_k_predictions(scores, k):
    """Return the K highest prediction scores."""
    # Use a min-heap of size K
    # This is O(n log k) - much better than sorting O(n log n) when k << n
    return heapq.nlargest(k, scores)

# For custom objects, use tuples (priority, value)
def top_k_with_labels(predictions, k):
    """Return K predictions with highest confidence."""
    # predictions = [(score, label), ...]
    # nlargest compares by first element of tuple
    return heapq.nlargest(k, predictions)

Trees

Trees appear in ML interviews primarily through binary trees, BSTs, and tries.

Binary Tree Operations:

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

# Three traversal orders - know all three
def inorder(root):
    """Left -> Root -> Right (gives sorted order for BST)."""
    if not root:
        return []
    return inorder(root.left) + [root.val] + inorder(root.right)

def preorder(root):
    """Root -> Left -> Right (used for serialization)."""
    if not root:
        return []
    return [root.val] + preorder(root.left) + preorder(root.right)

def postorder(root):
    """Left -> Right -> Root (used for expression evaluation)."""
    if not root:
        return []
    return postorder(root.left) + postorder(root.right) + [root.val]

# Level-order traversal using BFS - very common in interviews
from collections import deque

def level_order(root):
    """BFS traversal - returns list of lists, one per level."""
    if not root:
        return []
    result = []
    queue = deque([root])
    while queue:
        level = []
        for _ in range(len(queue)):
            node = queue.popleft()
            level.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        result.append(level)
    return result

ML Relevance of Trees:

Decision trees are literally trees - interview questions may ask you to traverse or build one
Expression trees for mathematical formulas in feature engineering
Trie for text autocompletion (relevant for NLP roles)

Graphs

Graphs are the most general data structure and appear in pipeline, dependency, and network problems.

Representations:

# Adjacency list - most common, best for sparse graphs
# ML use: feature dependency graphs, model pipeline DAGs
graph = {
    'data_loading': ['preprocessing', 'validation'],
    'preprocessing': ['feature_engineering'],
    'validation': ['feature_engineering'],
    'feature_engineering': ['model_training'],
    'model_training': ['evaluation'],
    'evaluation': []
}

# Adjacency matrix - used for dense graphs
# ML use: similarity matrices, attention weight matrices
import numpy as np
adj_matrix = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0]
])

# Edge list - simple but limited
# ML use: knowledge graph triples (subject, predicate, object)
edges = [
    ('user_A', 'follows', 'user_B'),
    ('user_B', 'follows', 'user_C'),
    ('user_A', 'likes', 'item_1'),
]

BFS and DFS:

from collections import deque

def bfs(graph, start):
    """Breadth-first search - explore level by level."""
    visited = set([start])
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

def dfs(graph, start, visited=None):
    """Depth-first search - explore as deep as possible first."""
    if visited is None:
        visited = set()
    visited.add(start)
    result = [start]
    for neighbor in graph.get(start, []):
        if neighbor not in visited:
            result.extend(dfs(graph, neighbor, visited))
    return result

Data Structure Selection Guide

This is the most practically useful table in this chapter. When you see a problem, ask yourself what operations you need, then pick the right structure:

I Need To...	Best Data Structure	Time	Alternative
Look up by key	Hash Map	O(1)	BST: O(log n)
Check membership	Hash Set	O(1)	Sorted array + binary search: O(log n)
Get min/max quickly	Heap	O(1) peek, O(log n) pop	Sorted array: O(1) but O(n) insert
Maintain sorted order	BST / SortedList	O(log n) all ops	Sorted array: O(n) insert
FIFO processing	Queue (deque)	O(1)	List: O(n) for popleft
LIFO processing	Stack (list)	O(1)	deque works too
Count frequencies	Counter (hash map)	O(n) build, O(1) query	Sort + scan: O(n log n)
Find top-K	Heap	O(n log k)	Sort: O(n log n)
Store sparse connections	Adjacency list	O(V + E) space	Adjacency matrix: O(V^2)
Range queries	Prefix sum array	O(n) build, O(1) query	Segment tree for updates

60-Second Answer

When stuck on which data structure to use, ask yourself three questions: (1) What operation do I need to be fast? (2) Do I need ordering? (3) How much extra space can I afford? The answers to these three questions almost always point to the right structure.

Space-Time Tradeoffs

Almost every algorithmic optimization involves trading space for time. Understanding this tradeoff is crucial:

Space-Time Tradeoff

Common Tradeoffs in ML Interviews

Problem	Slow Approach	Fast Approach	Tradeoff
Two Sum	O(n^2) time, O(1) space	O(n) time, O(n) space	Hash map caches complements
Sliding window max	O(nk) time, O(1) space	O(n) time, O(k) space	Deque maintains window
Frequency count	O(n^2) time, O(1) space	O(n) time, O(n) space	Counter stores counts
K nearest neighbors	O(n) per query, O(1) space	O(log n) per query, O(n) space	KD-tree indexes points
String matching	O(nm) time, O(1) space	O(n+m) time, O(m) space	Precompute failure function

When Space Matters in ML

In most coding interviews, O(n) extra space is acceptable. But in ML contexts, space can be a real constraint:

# Scenario: You have 1 billion user embeddings, each 512-dimensional float32
import sys

n_users = 1_000_000_000
embedding_dim = 512
bytes_per_float = 4

total_bytes = n_users * embedding_dim * bytes_per_float
total_gb = total_bytes / (1024 ** 3)
print(f"Total storage: {total_gb:.1f} GB")  # ~1907 GB

# A hash map with all user IDs as keys would add ~80 GB (8 bytes per pointer x 10B slots)
# This is why approximate methods (LSH, FAISS) are used in practice

Python-Specific DSA Knowledge

ML interviews are almost always in Python. Know these Python-specific details:

Built-in Data Structures Performance

Structure	Python Type	Key Performance Notes
Dynamic array	`list`	O(1) append, O(n) insert/delete at front
Hash map	`dict`	O(1) average for get/set/delete
Hash set	`set`	O(1) average for add/remove/check
Double-ended queue	`collections.deque`	O(1) append/pop from both ends
Counter	`collections.Counter`	O(n) build, O(1) query
Default dict	`collections.defaultdict`	Like dict but auto-initializes missing keys
Ordered dict	`collections.OrderedDict`	Maintains insertion order (less needed in 3.7+)
Named tuple	`collections.namedtuple`	Lightweight immutable class
Heap	`heapq` module	Min-heap only - negate values for max-heap
Sorted container	`sortedcontainers.SortedList`	O(log n) insert, not in stdlib

Python Gotchas That Lose You Points

# GOTCHA 1: Mutable default arguments
def append_to(element, lst=[]):  # BUG! Same list shared across calls
    lst.append(element)
    return lst

def append_to(element, lst=None):  # CORRECT
    if lst is None:
        lst = []
    lst.append(element)
    return lst

# GOTCHA 2: List comprehension vs generator for large data
# Bad - creates entire list in memory
total = sum([x ** 2 for x in range(10_000_000)])  # Wastes ~80MB

# Good - generator uses O(1) memory
total = sum(x ** 2 for x in range(10_000_000))    # O(1) space

# GOTCHA 3: Integer overflow - Python handles it natively!
# Unlike Java/C++, Python integers have arbitrary precision
big_num = 2 ** 1000  # This is fine in Python, would overflow in Java

# GOTCHA 4: Shallow vs deep copy
import copy
matrix = [[1, 2], [3, 4]]
shallow = matrix[:]           # Copies outer list, inner lists are shared
deep = copy.deepcopy(matrix)  # Copies everything

shallow[0][0] = 99
print(matrix[0][0])  # 99 - shallow copy shared the inner list!

Common Trap

A surprisingly common interview mistake: using list.index(x) inside a loop. This is O(n) per call, making your loop O(n^2). Use a hash map instead to get O(1) lookups.

Complexity of Common Python Operations

This table is worth memorizing. Interviewers will ask you the complexity of your code, and knowing Python's built-in costs is essential:

Operation	Complexity	Notes
`len(list)`	O(1)	Stored as attribute
`list.append(x)`	O(1) amortized
`list.pop()`	O(1)	Pop from end
`list.pop(0)`	O(n)	Shifts all elements - use `deque.popleft()`
`list[i]`	O(1)	Direct index
`x in list`	O(n)	Linear scan
`list.sort()`	O(n log n)	Timsort, stable
`sorted(list)`	O(n log n)	Returns new list
`list + list`	O(n + m)	Creates new list
`list.extend(other)`	O(k)	k = len(other)
`x in set`	O(1) average	Hash-based
`set.add(x)`	O(1) average
`set & set`	O(min(n, m))	Intersection
`set \| set`	O(n + m)	Union
`dict[key]`	O(1) average
`dict.keys()`	O(1)	Returns a view
`str.split()`	O(n)	n = length of string
`''.join(list)`	O(total chars)
`str + str` (in loop)	O(n^2) total	Use `''.join()` instead

Practice Problems

Problem 1: Analyze This Code

What is the time and space complexity of the following function?

def mystery(arr):
    n = len(arr)
    result = []
    seen = set()
    arr.sort()
    for i in range(n):
        if arr[i] not in seen:
            result.append(arr[i])
            seen.add(arr[i])
    return result

Hint 1

Break the function into parts: the sort, the loop, and the set operations.

Hint 2

The sort is O(n log n). The loop is O(n). Each set operation in the loop is O(1).

Solution

Time: O(n log n) - dominated by the sort. The loop is O(n) and set operations are O(1) average.

Space: O(n) - both result and seen can hold up to n elements.

Note: This function returns unique elements in sorted order. A simpler approach: return sorted(set(arr)) - same complexity, one line.

Problem 2: Choose the Right Data Structure

You are building a system that:

Receives a stream of (timestamp, model_accuracy) tuples
Must answer "What was the highest accuracy in the last K measurements?" in O(log K) time
New measurements arrive one at a time

Which data structure should you use? Why?

Hint 1

You need fast access to the maximum within a sliding window.

Hint 2

Consider a max-heap or a monotonic deque. What are the tradeoffs?

Solution

Best approach: Monotonic deque (O(1) amortized per query)

A monotonic decreasing deque maintains elements in decreasing order. The front is always the max. When a new element arrives:

Remove elements from the back that are smaller (they can never be the max again)
Add the new element to the back
Remove the front if it is outside the window

from collections import deque

class SlidingWindowMax:
    def __init__(self, k):
        self.k = k
        self.dq = deque()  # Stores (timestamp, accuracy) in decreasing order
        self.count = 0

    def add(self, timestamp, accuracy):
        # Remove elements smaller than new one from back
        while self.dq and self.dq[-1][1] <= accuracy:
            self.dq.pop()
        self.dq.append((timestamp, accuracy))
        self.count += 1
        # Remove elements outside window from front
        while self.dq[0][0] <= timestamp - self.k:
            self.dq.popleft()

    def get_max(self):
        return self.dq[0][1]  # O(1)

A max-heap would give O(log K) per query but is harder to remove old elements from (lazy deletion needed). The monotonic deque is the canonical solution.

Problem 3: Optimize This ML Code

The following code computes pairwise cosine similarity for a set of embeddings. It works but is too slow for 100K embeddings. Identify the complexity and suggest an optimization.

import math

def cosine_similarity_all_pairs(embeddings):
    """Compute cosine similarity between all pairs of embeddings."""
    n = len(embeddings)
    dim = len(embeddings[0])
    result = [[0.0] * n for _ in range(n)]

    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(embeddings[i][k] * embeddings[j][k] for k in range(dim))
            norm_i = math.sqrt(sum(x ** 2 for x in embeddings[i]))
            norm_j = math.sqrt(sum(x ** 2 for x in embeddings[j]))
            sim = dot / (norm_i * norm_j + 1e-8)
            result[i][j] = sim
            result[j][i] = sim
    return result

Hint 1

The outer loops are O(n^2). The inner loop over dimensions is O(d). What is recomputed unnecessarily?

Hint 2

Norms are recomputed for each pair. Precompute them. Also, consider using NumPy for vectorization.

Solution

Current complexity: O(n^2 * d) - for each of n^2/2 pairs, we compute a dot product in O(d) and two norms in O(d) each.

Optimization 1: Precompute norms - reduces constant factor by ~3x

def cosine_similarity_precomputed(embeddings):
    n = len(embeddings)
    dim = len(embeddings[0])
    # Precompute norms - O(n * d)
    norms = [math.sqrt(sum(x ** 2 for x in emb)) for emb in embeddings]
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(embeddings[i][k] * embeddings[j][k] for k in range(dim))
            sim = dot / (norms[i] * norms[j] + 1e-8)
            result[i][j] = sim
            result[j][i] = sim
    return result

Optimization 2: NumPy - orders of magnitude faster

import numpy as np

def cosine_similarity_numpy(embeddings):
    """Vectorized cosine similarity - same O(n^2 * d) but with BLAS-optimized matrix multiply."""
    E = np.array(embeddings)            # (n, d)
    norms = np.linalg.norm(E, axis=1, keepdims=True)  # (n, 1)
    E_normalized = E / (norms + 1e-8)   # (n, d)
    return E_normalized @ E_normalized.T # (n, n) - single matrix multiply

The NumPy version is still O(n^2 * d) but runs 100-1000x faster due to BLAS-optimized matrix multiplication. For truly large datasets (millions), you need approximate methods like FAISS.

Interview Cheat Sheet

Concept	Key Takeaway
Big-O	Count nested loops. Sort = O(n log n). Hash lookup = O(1).
Arrays	O(1) access, O(n) insert/delete. Use for sequential data.
Hash maps	O(1) average for everything. Your default choice for lookup problems.
Heaps	O(log n) push/pop, O(1) peek. Use for top-K and priority problems.
Stacks	LIFO, O(1) push/pop. Use for DFS, parsing, monotonic problems.
Queues	FIFO, O(1) with deque. Use for BFS, level-order traversal.
Trees	O(log n) for balanced BST. Know all three traversal orders.
Graphs	Adjacency list for sparse, matrix for dense. Know BFS and DFS.
Space-time tradeoff	Usually trade O(n) space for O(n) time improvement.
Python specifics	`list.pop(0)` is O(n) - use deque. String concat in loop is O(n^2) - use join.

Spaced Repetition Checkpoints

Day	Review Activity	Time
Day 0	Read this chapter. Write down the complexity of every Python operation from memory, then check.	35 min
Day 3	Without looking, fill in the "I Need To... / Best Data Structure" table. Check your answers.	15 min
Day 7	Solve Problem 2 again from scratch. Can you implement the monotonic deque solution?	20 min
Day 14	Explain the space-time tradeoff to someone (or write an explanation). Include two examples.	15 min
Day 21	Given a new problem, write down: (1) required operations, (2) best data structure, (3) expected complexity - before writing any code. Practice this on 3 problems.	30 min

Next Steps

Now that you understand the foundations, it is time to practice. Start with Arrays and Strings - the most frequently tested category in AI/ML coding interviews.

The Real Interview Moment​

Big-O Complexity Analysis​

The Complexity Hierarchy​

What Each Complexity Means in Practice​

How to Analyze Complexity​

Space Complexity​

Amortized Complexity​

Core Data Structures​

Arrays (Python Lists)​

Linked Lists​

Stacks and Queues​

Hash Maps (Python Dictionaries)​

Heaps (Priority Queues)​

Trees​

Graphs​

Data Structure Selection Guide​

Space-Time Tradeoffs​

Common Tradeoffs in ML Interviews​

When Space Matters in ML​

Python-Specific DSA Knowledge​

Built-in Data Structures Performance​

Python Gotchas That Lose You Points​

Complexity of Common Python Operations​

Practice Problems​

Problem 1: Analyze This Code​

Problem 2: Choose the Right Data Structure​

Problem 3: Optimize This ML Code​

Interview Cheat Sheet​

Spaced Repetition Checkpoints​

Next Steps​

The Real Interview Moment

Big-O Complexity Analysis

The Complexity Hierarchy

What Each Complexity Means in Practice

How to Analyze Complexity

Space Complexity

Amortized Complexity

Core Data Structures

Arrays (Python Lists)

Linked Lists

Stacks and Queues

Hash Maps (Python Dictionaries)

Heaps (Priority Queues)

Trees

Graphs

Data Structure Selection Guide

Space-Time Tradeoffs

Common Tradeoffs in ML Interviews

When Space Matters in ML

Python-Specific DSA Knowledge

Built-in Data Structures Performance

Python Gotchas That Lose You Points

Complexity of Common Python Operations

Practice Problems

Problem 1: Analyze This Code

Problem 2: Choose the Right Data Structure

Problem 3: Optimize This ML Code

Interview Cheat Sheet

Spaced Repetition Checkpoints

Next Steps