Dynamic Programming

Memoization vs tabulation, common DP patterns (knapsack, LCS, edit distance), and DP in ML contexts (Viterbi, CTC, beam search) - with practice problems and full Python solutions.

Reading time: ~40 min | Interview relevance: High | Roles: MLE, Research Engineer, AI Engineer

The Real Interview Moment

You are in the final round at a top AI research lab. The interviewer, a research scientist who published three NeurIPS papers last year, leans back in her chair and says: "Let us do something different from the usual LeetCode fare. I want you to implement the Viterbi algorithm for a simple Hidden Markov Model. Given transition probabilities, emission probabilities, and a sequence of observations, find the most likely sequence of hidden states."

You feel a jolt of recognition. This is dynamic programming - the same principle behind edit distance and longest common subsequence, but applied to a core ML algorithm. The state is (position in sequence, current hidden state). The recurrence is: the best way to be in state s at position t is the best way to be in some state s' at position t-1, multiplied by the transition probability from s' to s and the emission probability of the observation at t given state s.

You write it cleanly. The interviewer nods: "Good. Now, what is the relationship between Viterbi and beam search in sequence-to-sequence models?" You explain that beam search is essentially a width-limited version of the same principle - instead of tracking the best path to every state, you track only the top-K paths.

Dynamic programming is not just a coding interview pattern. It is the algorithmic backbone of sequence modeling, reinforcement learning, and optimization. This chapter teaches you both the interview patterns and the ML connections.

The Core Idea

Dynamic programming solves problems by breaking them into overlapping subproblems and storing the results to avoid recomputation.

When to Use Dynamic Programming

Two Properties Required for DP

Optimal substructure - The optimal solution to the problem can be constructed from optimal solutions to its subproblems
Overlapping subproblems - The same subproblems are solved multiple times

Memoization vs Tabulation

# ============ TOP-DOWN (MEMOIZATION) ============
# Start from the big problem, recurse, cache results

def fib_memo(n, memo={}):
    """Fibonacci with memoization.
    Time: O(n), Space: O(n)
    """
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]

# Better approach using functools
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_lru(n):
    """Fibonacci with lru_cache - cleaner memoization."""
    if n <= 1:
        return n
    return fib_lru(n - 1) + fib_lru(n - 2)

# ============ BOTTOM-UP (TABULATION) ============
# Start from smallest subproblems, build up to the answer

def fib_tab(n):
    """Fibonacci with tabulation.
    Time: O(n), Space: O(n)
    """
    if n <= 1:
        return n
    dp = [0] * (n + 1)
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]

# Space-optimized - only need last two values
def fib_optimal(n):
    """Fibonacci with O(1) space.
    Time: O(n), Space: O(1)
    """
    if n <= 1:
        return n
    prev2, prev1 = 0, 1
    for _ in range(2, n + 1):
        curr = prev1 + prev2
        prev2 = prev1
        prev1 = curr
    return prev1

Memoization vs Tabulation Comparison

Aspect	Memoization (Top-Down)	Tabulation (Bottom-Up)
Direction	Start from target, recurse down	Start from base cases, build up
Implementation	Recursive + cache	Iterative + array
What gets computed	Only needed subproblems	All subproblems
Stack overflow risk	Yes (deep recursion)	No
Easier to write	Often yes (more intuitive)	Sometimes - depends on the problem
Space optimization	Harder	Easier (can often reduce to O(1))

60-Second Answer

When stuck on a DP problem, use this framework: (1) Define the state - what variables uniquely identify a subproblem? (2) Write the recurrence - how does the answer for state (i) relate to smaller states? (3) Identify base cases. (4) Determine the computation order (which states depend on which). (5) Code it up - start with memoization if the recurrence is clear, switch to tabulation for optimization.

Pattern 1: 1D Dynamic Programming

Climbing Stairs / Fibonacci Family

def climb_stairs(n):
    """Number of ways to climb n stairs (1 or 2 steps at a time).

    Recurrence: dp[i] = dp[i-1] + dp[i-2]
    This IS Fibonacci.

    Time: O(n), Space: O(1)
    """
    if n <= 2:
        return n
    prev2, prev1 = 1, 2
    for _ in range(3, n + 1):
        curr = prev1 + prev2
        prev2 = prev1
        prev1 = curr
    return prev1

def min_cost_climbing(cost):
    """Minimum cost to reach the top. Can start from step 0 or 1.

    ML context: Minimum-cost path through a pipeline where
    each stage has a processing cost.

    Recurrence: dp[i] = cost[i] + min(dp[i-1], dp[i-2])

    Time: O(n), Space: O(1)
    """
    n = len(cost)
    if n <= 1:
        return 0

    prev2, prev1 = cost[0], cost[1]
    for i in range(2, n):
        curr = cost[i] + min(prev1, prev2)
        prev2 = prev1
        prev1 = curr
    return min(prev1, prev2)

def house_robber(nums):
    """Maximum amount you can rob without robbing adjacent houses.

    Recurrence: dp[i] = max(dp[i-1], dp[i-2] + nums[i])
    Either skip this house (dp[i-1]) or rob it (dp[i-2] + nums[i]).

    Time: O(n), Space: O(1)
    """
    if not nums:
        return 0
    if len(nums) == 1:
        return nums[0]

    prev2, prev1 = nums[0], max(nums[0], nums[1])
    for i in range(2, len(nums)):
        curr = max(prev1, prev2 + nums[i])
        prev2 = prev1
        prev1 = curr
    return prev1

Longest Increasing Subsequence (LIS)

def lis(nums):
    """Length of the longest increasing subsequence.

    ML context: Finding the longest monotonically improving
    sequence of model performance metrics.

    Approach 1: DP - O(n^2)
    dp[i] = length of LIS ending at index i
    Recurrence: dp[i] = max(dp[j] + 1) for all j < i where nums[j] < nums[i]

    Time: O(n^2), Space: O(n)
    """
    if not nums:
        return 0
    n = len(nums)
    dp = [1] * n  # Each element is a subsequence of length 1

    for i in range(1, n):
        for j in range(i):
            if nums[j] < nums[i]:
                dp[i] = max(dp[i], dp[j] + 1)

    return max(dp)

def lis_optimal(nums):
    """LIS using patience sorting - O(n log n).

    Maintain a list 'tails' where tails[i] is the smallest tail
    of all increasing subsequences of length i+1.

    Time: O(n log n), Space: O(n)
    """
    import bisect
    tails = []
    for num in nums:
        pos = bisect.bisect_left(tails, num)
        if pos == len(tails):
            tails.append(num)
        else:
            tails[pos] = num
    return len(tails)

Common Trap

A very common DP mistake: confusing "subsequence" with "subarray." A subsequence can skip elements (e.g., [1, 3, 5] from [1, 2, 3, 4, 5]). A subarray must be contiguous (e.g., [2, 3, 4]). Longest Increasing Subsequence is DP. Maximum Subarray sum is Kadane's algorithm (greedy, not DP).

Pattern 2: 2D Dynamic Programming

Edit Distance (Levenshtein Distance)

def edit_distance(word1, word2):
    """Minimum number of operations (insert, delete, replace)
    to convert word1 into word2.

    ML context: String similarity for NLP - spell checking,
    fuzzy matching, comparing model outputs to ground truth.
    Also: sequence alignment in bioinformatics.

    State: dp[i][j] = edit distance between word1[:i] and word2[:j]

    Recurrence:
    If word1[i-1] == word2[j-1]:
        dp[i][j] = dp[i-1][j-1]            # Characters match, no operation
    Else:
        dp[i][j] = 1 + min(
            dp[i-1][j],      # Delete from word1
            dp[i][j-1],      # Insert into word1
            dp[i-1][j-1]     # Replace
        )

    Time: O(m * n), Space: O(m * n), can be optimized to O(min(m, n))
    """
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases: converting empty string to/from word
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j],      # Delete
                    dp[i][j - 1],      # Insert
                    dp[i - 1][j - 1]   # Replace
                )

    return dp[m][n]

# Space-optimized to O(n)
def edit_distance_optimized(word1, word2):
    """Same algorithm, O(min(m,n)) space."""
    if len(word1) < len(word2):
        word1, word2 = word2, word1
    m, n = len(word1), len(word2)

    prev = list(range(n + 1))
    curr = [0] * (n + 1)

    for i in range(1, m + 1):
        curr[0] = i
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                curr[j] = prev[j - 1]
            else:
                curr[j] = 1 + min(prev[j], curr[j - 1], prev[j - 1])
        prev, curr = curr, prev

    return prev[n]

Longest Common Subsequence (LCS)

def lcs(text1, text2):
    """Length of the longest common subsequence.

    ML context: Measuring similarity between two sequences
    (e.g., comparing predicted vs actual token sequences,
    BLEU score computation involves LCS-like operations).

    State: dp[i][j] = LCS length of text1[:i] and text2[:j]

    Recurrence:
    If text1[i-1] == text2[j-1]:
        dp[i][j] = dp[i-1][j-1] + 1
    Else:
        dp[i][j] = max(dp[i-1][j], dp[i][j-1])

    Time: O(m * n), Space: O(m * n)
    """
    m, n = len(text1), len(text2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if text1[i - 1] == text2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    return dp[m][n]

def lcs_with_backtrack(text1, text2):
    """Return the actual LCS string, not just its length."""
    m, n = len(text1), len(text2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if text1[i - 1] == text2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Backtrack to find the actual subsequence
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if text1[i - 1] == text2[j - 1]:
            result.append(text1[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] > dp[i][j - 1]:
            i -= 1
        else:
            j -= 1

    return ''.join(reversed(result))

Pattern 3: Knapsack Problems

0/1 Knapsack

def knapsack_01(weights, values, capacity):
    """0/1 Knapsack - each item can be taken at most once.

    ML context: Feature selection with a budget constraint -
    each feature has a "cost" (computation time) and "value"
    (predictive power). Select features to maximize value
    within a compute budget.

    State: dp[i][w] = max value using first i items with capacity w

    Recurrence:
    dp[i][w] = max(
        dp[i-1][w],                              # Skip item i
        dp[i-1][w - weights[i]] + values[i]      # Take item i (if it fits)
    )

    Time: O(n * capacity), Space: O(n * capacity)
    """
    n = len(weights)
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]

    for i in range(1, n + 1):
        for w in range(capacity + 1):
            dp[i][w] = dp[i - 1][w]  # Skip item i
            if weights[i - 1] <= w:
                dp[i][w] = max(
                    dp[i][w],
                    dp[i - 1][w - weights[i - 1]] + values[i - 1]
                )

    return dp[n][capacity]

# Space-optimized to O(capacity)
def knapsack_01_optimized(weights, values, capacity):
    """O(capacity) space - iterate weights in reverse."""
    dp = [0] * (capacity + 1)
    for i in range(len(weights)):
        for w in range(capacity, weights[i] - 1, -1):  # Reverse!
            dp[w] = max(dp[w], dp[w - weights[i]] + values[i])
    return dp[capacity]

Unbounded Knapsack (Coin Change)

def coin_change(coins, amount):
    """Minimum number of coins to make the amount.
    Unbounded knapsack - each coin can be used unlimited times.

    State: dp[a] = minimum coins to make amount a

    Recurrence: dp[a] = min(dp[a - coin] + 1) for each coin

    Time: O(amount * len(coins)), Space: O(amount)
    """
    dp = [float('inf')] * (amount + 1)
    dp[0] = 0

    for a in range(1, amount + 1):
        for coin in coins:
            if coin <= a and dp[a - coin] != float('inf'):
                dp[a] = min(dp[a], dp[a - coin] + 1)

    return dp[amount] if dp[amount] != float('inf') else -1

def coin_change_count(coins, amount):
    """Count the number of ways to make the amount.

    State: dp[a] = number of ways to make amount a

    Recurrence: dp[a] += dp[a - coin] for each coin
    (iterate coins in outer loop to avoid counting permutations)

    Time: O(amount * len(coins)), Space: O(amount)
    """
    dp = [0] * (amount + 1)
    dp[0] = 1

    for coin in coins:  # Outer loop over coins - counts combinations
        for a in range(coin, amount + 1):
            dp[a] += dp[a - coin]

    return dp[amount]

Company Variation

At Google and Meta, you will likely see knapsack-style problems phrased in ML terms: "Given N features with prediction gains and computation costs, select the subset that maximizes prediction gain within a latency budget." Recognizing this as 0/1 knapsack is the key insight.

Pattern 4: String DP

Palindrome Problems

def longest_palindrome_substring(s):
    """Find the longest palindromic substring.

    Approach: Expand around center (not pure DP but often categorized here).
    For each position, try both odd-length and even-length palindromes.

    Time: O(n^2), Space: O(1)
    """
    if not s:
        return ""

    start, max_len = 0, 1

    def expand(left, right):
        nonlocal start, max_len
        while left >= 0 and right < len(s) and s[left] == s[right]:
            if right - left + 1 > max_len:
                start = left
                max_len = right - left + 1
            left -= 1
            right += 1

    for i in range(len(s)):
        expand(i, i)      # Odd-length palindromes
        expand(i, i + 1)  # Even-length palindromes

    return s[start:start + max_len]

def count_palindromic_substrings(s):
    """Count the number of palindromic substrings.
    Time: O(n^2), Space: O(1)
    """
    count = 0

    def expand(left, right):
        nonlocal count
        while left >= 0 and right < len(s) and s[left] == s[right]:
            count += 1
            left -= 1
            right += 1

    for i in range(len(s)):
        expand(i, i)
        expand(i, i + 1)

    return count

Word Break

def word_break(s, word_dict):
    """Can string s be segmented into words from the dictionary?

    ML context: Tokenization - can a string be segmented
    into valid tokens from a vocabulary?

    State: dp[i] = True if s[:i] can be segmented

    Recurrence: dp[i] = any(dp[j] and s[j:i] in word_dict) for j in [0, i)

    Time: O(n^2 * m) where m = max word length, Space: O(n)
    """
    n = len(s)
    word_set = set(word_dict)
    dp = [False] * (n + 1)
    dp[0] = True

    for i in range(1, n + 1):
        for j in range(i):
            if dp[j] and s[j:i] in word_set:
                dp[i] = True
                break

    return dp[n]

Pattern 5: DP in ML Algorithms

Viterbi Algorithm

def viterbi(observations, states, start_prob, trans_prob, emit_prob):
    """Viterbi algorithm - find the most likely sequence of hidden states.

    This is THE example of DP in ML. Used in:
    - Hidden Markov Models (HMMs) for POS tagging, speech recognition
    - CRF decoding in NER (named entity recognition)

    State: dp[t][s] = probability of the most likely path
           ending in state s at time t

    Recurrence:
    dp[t][s] = max over all previous states s' of:
        dp[t-1][s'] * trans_prob[s'][s] * emit_prob[s][observations[t]]

    Time: O(T * S^2) where T = sequence length, S = number of states
    Space: O(T * S)
    """
    T = len(observations)
    S = len(states)

    # dp[t][s] = (probability, previous_state)
    dp = [[None] * S for _ in range(T)]

    # Initialize: first observation
    for s in range(S):
        dp[0][s] = (
            start_prob[states[s]] * emit_prob[states[s]][observations[0]],
            None
        )

    # Fill: subsequent observations
    for t in range(1, T):
        for s in range(S):
            best_prob = -1
            best_prev = -1
            for s_prev in range(S):
                prob = (dp[t - 1][s_prev][0] *
                       trans_prob[states[s_prev]][states[s]] *
                       emit_prob[states[s]][observations[t]])
                if prob > best_prob:
                    best_prob = prob
                    best_prev = s_prev
            dp[t][s] = (best_prob, best_prev)

    # Backtrack: find the most likely path
    # Find best final state
    best_final = max(range(S), key=lambda s: dp[T - 1][s][0])
    path = [states[best_final]]

    current = best_final
    for t in range(T - 1, 0, -1):
        current = dp[t][current][1]
        path.append(states[current])

    return list(reversed(path)), dp[T - 1][best_final][0]

# Example usage:
# states = ['sunny', 'rainy']
# observations = ['walk', 'shop', 'clean']
# start_prob = {'sunny': 0.6, 'rainy': 0.4}
# trans_prob = {'sunny': {'sunny': 0.7, 'rainy': 0.3},
#               'rainy': {'sunny': 0.4, 'rainy': 0.6}}
# emit_prob = {'sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
#              'rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5}}

Beam Search as Width-Limited DP

def beam_search(initial_state, expand_fn, score_fn, beam_width, max_steps):
    """Beam search - a width-limited form of DP.

    Used in sequence-to-sequence models (machine translation,
    text generation) as an alternative to greedy decoding.

    Instead of tracking ALL states (like Viterbi), track only
    the top-K (beam_width) most promising partial sequences.

    Time: O(max_steps * beam_width * branching_factor)
    Space: O(beam_width * max_steps)
    """
    import heapq

    # Each beam entry: (negative_score, sequence)
    # Negative because heapq is a min-heap
    beams = [(-score_fn(initial_state), [initial_state])]

    for step in range(max_steps):
        candidates = []
        for neg_score, sequence in beams:
            current_state = sequence[-1]
            # Expand current state to get successors
            for next_state in expand_fn(current_state):
                new_sequence = sequence + [next_state]
                new_score = score_fn(new_sequence)
                candidates.append((-new_score, new_sequence))

        # Keep only top beam_width candidates
        beams = heapq.nsmallest(beam_width, candidates)

        # Check for termination (all beams ended)
        if not beams:
            break

    # Return best sequence
    best_neg_score, best_sequence = min(beams)
    return best_sequence, -best_neg_score

Sequence Alignment (Needleman-Wunsch)

def needleman_wunsch(seq1, seq2, match_score=1, mismatch_penalty=-1, gap_penalty=-2):
    """Global sequence alignment using Needleman-Wunsch algorithm.

    ML context: Protein/DNA sequence alignment in bioinformatics ML.
    This is a generalization of edit distance with customizable scores.

    Time: O(m * n), Space: O(m * n)
    """
    m, n = len(seq1), len(seq2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Initialize gap penalties
    for i in range(m + 1):
        dp[i][0] = i * gap_penalty
    for j in range(n + 1):
        dp[0][j] = j * gap_penalty

    # Fill table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score = match_score if seq1[i - 1] == seq2[j - 1] else mismatch_penalty
            dp[i][j] = max(
                dp[i - 1][j - 1] + score,    # Match/mismatch
                dp[i - 1][j] + gap_penalty,    # Gap in seq2
                dp[i][j - 1] + gap_penalty     # Gap in seq1
            )

    # Backtrack to find alignment
    alignment1, alignment2 = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            score = match_score if seq1[i - 1] == seq2[j - 1] else mismatch_penalty
            if dp[i][j] == dp[i - 1][j - 1] + score:
                alignment1.append(seq1[i - 1])
                alignment2.append(seq2[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + gap_penalty:
            alignment1.append(seq1[i - 1])
            alignment2.append('-')
            i -= 1
        else:
            alignment1.append('-')
            alignment2.append(seq2[j - 1])
            j -= 1

    return (''.join(reversed(alignment1)),
            ''.join(reversed(alignment2)),
            dp[m][n])

Instant Rejection

If you are interviewing for a research engineer or bioinformatics ML role and cannot explain the connection between edit distance and sequence alignment, it is a major red flag. They are the same algorithm with different scoring schemes. Edit distance minimizes operations; Needleman-Wunsch maximizes alignment score.

The DP Problem-Solving Framework

Quick Decision Table

Problem Pattern	State Definition	Recurrence Pattern	Time
Linear sequence	dp[i] = answer for first i elements	dp[i] depends on dp[i-1], dp[i-2], etc.	O(n) or O(n^2)
Two sequences	dp[i][j] = answer for seq1[:i], seq2[:j]	dp[i][j] depends on dp[i-1][j], dp[i][j-1], dp[i-1][j-1]	O(m*n)
Knapsack	dp[i][w] = answer using first i items with capacity w	dp[i][w] = max(skip, take)	O(n*W)
Interval	dp[i][j] = answer for subarray arr[i:j+1]	dp[i][j] depends on dp[i+1][j], dp[i][j-1], dp[i+1][j-1]	O(n^2)
Grid	dp[i][j] = answer at position (i, j)	dp[i][j] depends on dp[i-1][j], dp[i][j-1]	O(m*n)

Practice Problems

Problem 1: Maximum Profit from Model Training

You are training a model and can choose to retrain or keep the current checkpoint at each epoch. Retraining costs c[i] compute and gives v[i] accuracy improvement. You have a total compute budget B. Maximize total accuracy improvement.

# Input:  costs = [3, 4, 2, 5, 1], values = [2, 3, 1, 5, 2], budget = 7
# Output: 7 (select epochs with costs [3, 4] giving values [2, 5])
#         Wait - that's costs [3,4] = 7, values [2,3] = 5
#         Better: costs [2, 5] = 7, values [1, 5] = 6
#         Even better: costs [3, 2, 1] = 6, values [2, 1, 2] = 5
#         Best: costs [4, 2, 1] = 7, values [3, 1, 2] = 6
#         Actually: costs [3, 2, 1] = 6 <= 7, values [2, 1, 2] = 5
#         costs [4, 1] = 5, values [3, 2] = 5
#         costs [5, 1] = 6, values [5, 2] = 7  <-- this is 7!
# Output: 7

Hint 1

This is a 0/1 knapsack problem. Each epoch is an "item" with a weight (cost) and value (accuracy improvement).

Hint 2

State: dp[i][b] = max accuracy improvement using first i epochs with budget b.

Hint 3

Recurrence: dp[i][b] = max(dp[i-1][b], dp[i-1][b-costs[i]] + values[i]).

Solution

def max_training_profit(costs, values, budget):
    """0/1 Knapsack - maximize accuracy improvement within compute budget.
    Time: O(n * budget), Space: O(budget)
    """
    dp = [0] * (budget + 1)
    for i in range(len(costs)):
        for b in range(budget, costs[i] - 1, -1):
            dp[b] = max(dp[b], dp[b - costs[i]] + values[i])
    return dp[budget]

Problem 2: Token Sequence Similarity (Edit Distance)

Given two token sequences (e.g., model output and ground truth), compute the minimum number of token insertions, deletions, and substitutions to convert one into the other.

# Input:  seq1 = ["the", "cat", "sat", "on", "mat"]
#         seq2 = ["the", "dog", "sat", "on", "the", "mat"]
# Output: 2 (substitute "cat"->"dog", insert "the")

Hint 1

This is edit distance but on tokens instead of characters.

Hint 2

The algorithm is identical. State: dp[i][j] = edit distance between seq1[:i] and seq2[:j].

Hint 3

Compare tokens for equality instead of characters. Everything else stays the same.

Solution

def token_edit_distance(seq1, seq2):
    """Edit distance on token sequences.
    Time: O(m * n), Space: O(n)
    """
    m, n = len(seq1), len(seq2)
    prev = list(range(n + 1))
    curr = [0] * (n + 1)

    for i in range(1, m + 1):
        curr[0] = i
        for j in range(1, n + 1):
            if seq1[i - 1] == seq2[j - 1]:
                curr[j] = prev[j - 1]
            else:
                curr[j] = 1 + min(prev[j], curr[j - 1], prev[j - 1])
        prev, curr = curr, prev

    return prev[n]

Problem 3: Decode Ways

A message containing letters A-Z is encoded as numbers (A=1, B=2, ..., Z=26). Given a string of digits, count the number of ways to decode it.

# Input:  "226"
# Output: 3 ("BZ", "VF", "BBF")

# ML context: Analogous to CTC decoding where multiple
# alignments can produce the same output.

Hint 1

At each position, you can decode one digit (1-9) or two digits (10-26).

Hint 2

dp[i] = number of ways to decode s[:i]. dp[i] = dp[i-1] (if single digit valid) + dp[i-2] (if two digits valid).

Hint 3

Be careful with '0' - it cannot be decoded alone, only as part of "10" or "20".

Solution

def num_decodings(s):
    """Count decoding ways for a digit string.
    Time: O(n), Space: O(1)
    """
    if not s or s[0] == '0':
        return 0

    n = len(s)
    prev2 = 1  # dp[i-2]: empty string has 1 way
    prev1 = 1  # dp[i-1]: single valid digit has 1 way

    for i in range(1, n):
        curr = 0
        # Single digit decode
        if s[i] != '0':
            curr += prev1
        # Two digit decode
        two_digit = int(s[i - 1:i + 1])
        if 10 <= two_digit <= 26:
            curr += prev2
        prev2 = prev1
        prev1 = curr

    return prev1

Problem 4: Unique Paths in a Grid

A robot starts at (0,0) and wants to reach (m-1, n-1). It can only move right or down. Count the number of unique paths.

# Input:  m = 3, n = 7
# Output: 28

Hint 1

dp[i][j] = number of unique paths to reach (i, j). The robot can come from (i-1, j) or (i, j-1).

Hint 2

Recurrence: dp[i][j] = dp[i-1][j] + dp[i][j-1]. Base case: first row and first column are all 1s.

Hint 3

Space optimization: you only need the previous row, so use a 1D array.

Solution

def unique_paths(m, n):
    """Count unique paths in a grid.
    Time: O(m * n), Space: O(n)
    """
    dp = [1] * n
    for i in range(1, m):
        for j in range(1, n):
            dp[j] += dp[j - 1]
    return dp[n - 1]

Problem 5: Longest Common Subsequence of Token Sequences

Given two tokenized text sequences, find the length of their longest common subsequence and return the actual subsequence.

# Input:  tokens1 = ["the", "quick", "brown", "fox", "jumps"]
#         tokens2 = ["the", "slow", "brown", "fox", "runs"]
# Output: 3, ["the", "brown", "fox"]

Hint 1

Standard LCS algorithm, applied to lists of tokens instead of strings.

Hint 2

Build the DP table, then backtrack to find the actual subsequence.

Hint 3

When text1[i-1] == text2[j-1], add to LCS and move diagonally. Otherwise, move in the direction of the larger DP value.

Solution

def lcs_tokens(tokens1, tokens2):
    """LCS for token sequences with backtracking.
    Time: O(m * n), Space: O(m * n)
    """
    m, n = len(tokens1), len(tokens2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if tokens1[i - 1] == tokens2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Backtrack
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if tokens1[i - 1] == tokens2[j - 1]:
            result.append(tokens1[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] > dp[i][j - 1]:
            i -= 1
        else:
            j -= 1

    result.reverse()
    return dp[m][n], result

Problem 6: Partition Equal Subset Sum

Given an array of positive integers, determine if the array can be partitioned into two subsets with equal sum.

# Input:  [1, 5, 11, 5]
# Output: True (subsets [1, 5, 5] and [11])

# ML context: Balanced data splitting - can you partition a dataset
# into two groups with equal total weight?

Hint 1

If the total sum is odd, return False immediately. Otherwise, the problem reduces to: can you find a subset that sums to total/2?

Hint 2

This is a 0/1 knapsack problem where the "capacity" is total/2.

Hint 3

dp[j] = True if a subset sum of j is achievable. Iterate items, update dp in reverse order.

Solution

def can_partition(nums):
    """Check if array can be partitioned into two equal-sum subsets.
    Time: O(n * sum/2), Space: O(sum/2)
    """
    total = sum(nums)
    if total % 2 != 0:
        return False
    target = total // 2

    dp = [False] * (target + 1)
    dp[0] = True

    for num in nums:
        for j in range(target, num - 1, -1):
            dp[j] = dp[j] or dp[j - num]

    return dp[target]

Interview Cheat Sheet

Pattern	When to Use	Key Recurrence
1D DP (Fibonacci)	Linear sequence, each step depends on prev 1-2	dp[i] = f(dp[i-1], dp[i-2])
LIS	Longest increasing subsequence	dp[i] = max(dp[j]+1) for j < i, arr[j] < arr[i]
Edit Distance	String/sequence transformation	dp[i][j] = min(delete, insert, replace)
LCS	Longest common subsequence	dp[i][j] = dp[i-1][j-1]+1 if match, else max(skip)
0/1 Knapsack	Select items with weight/value	dp[i][w] = max(skip, take)
Coin Change	Minimum coins for target	dp[a] = min(dp[a-coin]+1) for each coin
Grid Paths	Count/optimize paths in 2D grid	dp[i][j] = dp[i-1][j] + dp[i][j-1]
Viterbi	Most likely hidden state sequence	dp[t][s] = max(dp[t-1][s'] * transition * emission)
Word Break	Can string be segmented?	dp[i] = any(dp[j] and s[j:i] in dict)

Spaced Repetition Checkpoints

Day	Review Activity	Time
Day 0	Read all patterns. Implement edit distance and coin change from scratch.	40 min
Day 3	Implement LCS with backtracking and 0/1 knapsack without looking at solutions.	25 min
Day 7	Solve 3 new DP problems from LeetCode (one 1D, one 2D, one knapsack variant).	45 min
Day 14	Implement the Viterbi algorithm from scratch with a small example. Verify manually.	30 min
Day 21	Without reference, classify 5 DP problems by pattern (1D, 2D, knapsack, string). Solve one from each category.	60 min

Next Steps

Dynamic programming handles optimization over sequences. Next, learn Sorting and Searching to master binary search variations, top-K selection, and sorting algorithms critical for data-heavy ML pipelines.

The Real Interview Moment​

The Core Idea​

Two Properties Required for DP​

Memoization vs Tabulation​

Memoization vs Tabulation Comparison​

Pattern 1: 1D Dynamic Programming​

Climbing Stairs / Fibonacci Family​

Longest Increasing Subsequence (LIS)​

Pattern 2: 2D Dynamic Programming​

Edit Distance (Levenshtein Distance)​

Longest Common Subsequence (LCS)​

Pattern 3: Knapsack Problems​

0/1 Knapsack​

Unbounded Knapsack (Coin Change)​

Pattern 4: String DP​

Palindrome Problems​

Word Break​

Pattern 5: DP in ML Algorithms​

Viterbi Algorithm​

Beam Search as Width-Limited DP​

Sequence Alignment (Needleman-Wunsch)​

The DP Problem-Solving Framework​

Quick Decision Table​

Practice Problems​

Problem 1: Maximum Profit from Model Training​

Problem 2: Token Sequence Similarity (Edit Distance)​

Problem 3: Decode Ways​

Problem 4: Unique Paths in a Grid​

Problem 5: Longest Common Subsequence of Token Sequences​

Problem 6: Partition Equal Subset Sum​

Interview Cheat Sheet​

Spaced Repetition Checkpoints​

Next Steps​

The Real Interview Moment

The Core Idea

Two Properties Required for DP

Memoization vs Tabulation

Memoization vs Tabulation Comparison

Pattern 1: 1D Dynamic Programming

Climbing Stairs / Fibonacci Family

Longest Increasing Subsequence (LIS)

Pattern 2: 2D Dynamic Programming

Edit Distance (Levenshtein Distance)

Longest Common Subsequence (LCS)

Pattern 3: Knapsack Problems

0/1 Knapsack

Unbounded Knapsack (Coin Change)

Pattern 4: String DP

Palindrome Problems

Word Break

Pattern 5: DP in ML Algorithms

Viterbi Algorithm

Beam Search as Width-Limited DP

Sequence Alignment (Needleman-Wunsch)

The DP Problem-Solving Framework

Quick Decision Table

Practice Problems

Problem 1: Maximum Profit from Model Training

Problem 2: Token Sequence Similarity (Edit Distance)

Problem 3: Decode Ways

Problem 4: Unique Paths in a Grid

Problem 5: Longest Common Subsequence of Token Sequences

Problem 6: Partition Equal Subset Sum

Interview Cheat Sheet

Spaced Repetition Checkpoints

Next Steps