Python Python Lists Internals Practice Problems & Exercises

Practice: Python Lists Internals

12 problems4 Easy5 Medium3 Hard⏱ 45-60 min

Easy

#1Track List GrowthEasy

sys.getsizeofover-allocation

Write a function that appends integers 0 through n-1 to an initially empty list and records the byte size of the list after every append. Return a list of (length, byte_size) tuples.

This reveals the over-allocation jumps that make append() amortized O(1).

Python

import sys

def track_growth(n):
    lst = []
    records = []
    for i in range(n):
        lst.append(i)
        records.append((len(lst), sys.getsizeof(lst)))
    return records

print(track_growth(10))

Solution

import sys

def track_growth(n):
    lst = []
    records = []
    for i in range(n):
        lst.append(i)
        records.append((len(lst), sys.getsizeof(lst)))
    return records

The byte size does not increase on every append. CPython pre-allocates extra capacity so that most appends just write a pointer into an existing slot. The size jumps (e.g., 88 to 120 to 184) happen only when the internal buffer is full and must be reallocated. This is the mechanism behind amortized O(1) append.

Starter Code

import sys

def track_growth(n):
    """Append integers 0..n-1 to an empty list.
    Return a list of (length, byte_size) tuples recorded
    after each append.
    """
    # TODO: implement
    pass

Expected Output

[(1, 88), (2, 88), (3, 88), (4, 88), (5, 120), (6, 120), (7, 120), (8, 120), (9, 184), (10, 184)]

Hints

Hint 1: Use sys.getsizeof(lst) after each append to measure the list object size.

Hint 2: The byte size jumps at certain lengths because CPython over-allocates capacity.

#2Detect Reallocation PointsEasy

over-allocationdynamic-array

Given an integer n, append integers 0 through n-1 to an empty list and return the list lengths at which the internal buffer was reallocated (i.e., where sys.getsizeof changed).

Python

import sys

def reallocation_points(n):
    lst = []
    prev_size = sys.getsizeof(lst)
    points = []
    for i in range(n):
        lst.append(i)
        curr_size = sys.getsizeof(lst)
        if curr_size != prev_size:
            points.append(len(lst))
            prev_size = curr_size
    return points

print(reallocation_points(20))

Solution

import sys

def reallocation_points(n):
    lst = []
    prev_size = sys.getsizeof(lst)
    points = []
    for i in range(n):
        lst.append(i)
        curr_size = sys.getsizeof(lst)
        if curr_size != prev_size:
            points.append(len(lst))
            prev_size = curr_size
    return points

Reallocations happen at exponentially spaced intervals (roughly at lengths 1, 5, 9, 17, 26, ...). The growth formula in CPython is approximately newsize + newsize // 8 + (3 if newsize < 9 else 6). This geometric spacing ensures the total copy cost across all reallocations is O(n), giving amortized O(1) per append.

Starter Code

import sys

def reallocation_points(n):
    """Append integers 0..n-1 to an empty list.
    Return a list of lengths at which the internal
    buffer was reallocated (byte size changed).
    """
    # TODO: implement
    pass

Expected Output

[1, 5, 9, 17]

Hints

Hint 1: Compare sys.getsizeof before and after each append.

Hint 2: Record the current length whenever the size changes from the previous measurement.

#3Pointer Memory vs Value MemoryEasy

memory-layoutsys.getsizeof

Write a function that takes a list and returns a dictionary breaking down its memory usage into: the list object itself, the total memory of all contained elements, and the grand total. Test it with [1, 2, 3, 4, 5].

This demonstrates that Python lists store pointers, not values — the actual objects live separately on the heap.

Python

import sys

def memory_breakdown(lst):
    list_size = sys.getsizeof(lst)
    elements_size = sum(sys.getsizeof(elem) for elem in lst)
    return {
        'list_object': list_size,
        'elements_total': elements_size,
        'grand_total': list_size + elements_size,
    }

print(memory_breakdown([1, 2, 3, 4, 5]))

Solution

import sys

def memory_breakdown(lst):
    list_size = sys.getsizeof(lst)
    elements_size = sum(sys.getsizeof(elem) for elem in lst)
    return {
        'list_object': list_size,
        'elements_total': elements_size,
        'grand_total': list_size + elements_size,
    }

A list of 5 small integers uses roughly 260 bytes — 120 bytes for the list container (56 bytes base overhead + 8 bytes per pointer slot, with over-allocation) plus 28 bytes per integer object (5 x 28 = 140). In C, int arr[5] would be just 20 bytes. This overhead is the cost of Python's flexibility — every element is a full heap-allocated object.

Starter Code

import sys

def memory_breakdown(lst):
    """Return a dict with:
    - 'list_object': sys.getsizeof(lst)
    - 'elements_total': sum of sys.getsizeof for each element
    - 'grand_total': sum of the above
    """
    # TODO: implement
    pass

Expected Output

{'list_object': 120, 'elements_total': 140, 'grand_total': 260}

Hints

Hint 1: sys.getsizeof(lst) gives the list container size (overhead + pointer array), not the elements.

Hint 2: Sum sys.getsizeof(elem) for each elem in lst to get the element memory.

#4Identify the O(n) TrapEasy

time-complexityinsertappend

Build a list containing integers from n-1 down to 0. The naive approach uses insert(0, i) in a loop, which is O(n^2). Write a function that achieves the same result in O(n) using append() and a single reverse().

Test with n=10.

Python

def build_reversed_list_fast(n):
    result = []
    for i in range(n):
        result.append(i)
    result.reverse()
    return result

print(build_reversed_list_fast(10))

Solution

def build_reversed_list_fast(n):
    result = []
    for i in range(n):
        result.append(i)
    result.reverse()
    return result

The key insight is complexity arithmetic. insert(0, x) shifts all existing elements right — that is O(n) per call, making a loop of n insertions O(n^2). Instead, append() is amortized O(1) per call (total O(n) for the loop), and reverse() is O(n) once. Total: O(n) + O(n) = O(n). For 100,000 elements, this is roughly 150x faster than the insert approach.

Starter Code

def build_reversed_list_fast(n):
    """Build a list [n-1, n-2, ..., 1, 0]
    using only append() and reverse().
    Do NOT use insert(0, x).
    """
    # TODO: implement
    pass

Expected Output

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Hints

Hint 1: insert(0, x) is O(n) because it shifts all elements. append() is amortized O(1).

Hint 2: Build the list in natural order with append(), then call reverse() once at the end.

Medium

#5Estimate Capacity from Size JumpsMedium

over-allocationcapacityinternals

Write a function that estimates the allocated capacity of a Python list (how many pointer slots exist, not just how many are used). Use the fact that on 64-bit CPython, an empty list is 56 bytes and each pointer slot is 8 bytes.

Test by printing len vs estimated cap for lists of various sizes.

Python

import sys

EMPTY_LIST_OVERHEAD = sys.getsizeof([])
POINTER_SIZE = 8

def estimate_capacity(lst):
    return (sys.getsizeof(lst) - EMPTY_LIST_OVERHEAD) // POINTER_SIZE

for n in [0, 1, 4, 5, 8, 9]:
    lst = list(range(n))
    print(f"len={n} cap={estimate_capacity(lst)}")

Solution

import sys

EMPTY_LIST_OVERHEAD = sys.getsizeof([])
POINTER_SIZE = 8

def estimate_capacity(lst):
    return (sys.getsizeof(lst) - EMPTY_LIST_OVERHEAD) // POINTER_SIZE

You cannot read allocated directly from Python, but you can derive it from the byte size. The formula (getsizeof(lst) - base) // 8 gives the number of pointer slots. Notice that list(range(5)) has capacity 8, not 5 — because list() uses the same over-allocation strategy. The gap between len and cap is the headroom that makes future appends cheap.

Starter Code

import sys

def estimate_capacity(lst):
    """Estimate the allocated capacity (number of pointer slots)
    of a list by examining its byte size.
    
    On 64-bit CPython:
    - Empty list overhead: 56 bytes
    - Each pointer slot: 8 bytes
    
    Return the estimated capacity as an integer.
    """
    # TODO: implement
    pass

Expected Output

len=0 cap=0
len=1 cap=4
len=4 cap=4
len=5 cap=8
len=8 cap=8
len=9 cap=16

Hints

Hint 1: Capacity = (sys.getsizeof(lst) - empty_list_overhead) // pointer_size.

Hint 2: On 64-bit CPython, empty list overhead is typically 56 bytes and each pointer is 8 bytes.

#6Safe Matrix ConstructorMedium

aliasinglist-multiplicationbug

Implement a function that creates a rows x cols matrix filled with a default value. The critical requirement: modifying one cell must not affect any other row. The naive [[default]*cols]*rows fails this because all rows alias the same list.

Test by setting matrix[0][0] = 1 and verifying only row 0 changes.

Python

def make_matrix(rows, cols, default=0):
    return [[default] * cols for _ in range(rows)]

matrix = make_matrix(3, 3)
matrix[0][0] = 1
print(matrix)

Solution

def make_matrix(rows, cols, default=0):
    return [[default] * cols for _ in range(rows)]

The list comprehension creates a new [default]*cols list on each iteration, so each row is an independent object. With [[default]*cols]*rows, the * operator copies the reference to the same inner list rows times. This is one of the most common Python bugs in production code, especially in numerical computing and grid-based algorithms.

Starter Code

def make_matrix(rows, cols, default=0):
    """Create a rows x cols matrix where each row
    is an independent list. Modifying one cell must
    NOT affect other rows.
    """
    # TODO: implement (do NOT use [[default]*cols]*rows)
    pass

Expected Output

[[1, 0, 0], [0, 0, 0], [0, 0, 0]]

Hints

Hint 1: [[val]*cols]*rows creates rows references to the SAME inner list — a classic aliasing bug.

Hint 2: Use a list comprehension to create a new inner list on each iteration.

#7Efficient Deduplication with OrderMedium

membership-testingO(n)-vs-O(n2)set

Write an O(n) deduplication function that preserves the order of first occurrences. The naive approach using x not in result_list is O(n^2) because in on a list is O(n). Use the right data structure for membership testing.

Test with [3, 1, 4, 1, 5, 9, 2, 6, 5, 3].

Python

def deduplicate(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(deduplicate([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]))

Solution

def deduplicate(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

The set provides O(1) membership testing, so the total cost is O(n) instead of O(n^2). This pattern — a list for ordered results plus a set for fast lookup — is fundamental in Python. An alternative one-liner is list(dict.fromkeys(items)), which exploits dict insertion-order preservation (Python 3.7+), but understanding the explicit version is essential for interviews.

Starter Code

def deduplicate(items):
    """Remove duplicates from items while preserving
    first-occurrence order. Must run in O(n) time.
    """
    # TODO: implement without using dict.fromkeys
    pass

Expected Output

[3, 1, 4, 5, 9, 2, 6]

Hints

Hint 1: Using `x not in result_list` for membership is O(n) per check — total O(n^2).

Hint 2: Maintain a set alongside the result list: set lookup is O(1), giving O(n) total.

#8Flatten Without Quadratic ConcatenationMedium

extend-vs-concatO(n2)-trapitertools

Write a function that flattens a list of lists into a single flat list. The naive approach result = result + chunk in a loop is O(n^2) because + creates a new list every iteration, copying all previous elements.

Test with [[1, 2, 3], [4, 5, 6], [7, 8, 9]].

Python

def flatten_chunks(chunks):
    result = []
    for chunk in chunks:
        result.extend(chunk)
    return result

print(flatten_chunks([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))

Solution

# Option 1: extend (most readable)
def flatten_chunks(chunks):
    result = []
    for chunk in chunks:
        result.extend(chunk)
    return result

# Option 2: itertools (most Pythonic)
from itertools import chain
def flatten_chunks(chunks):
    return list(chain.from_iterable(chunks))

# Option 3: list comprehension
def flatten_chunks(chunks):
    return [x for chunk in chunks for x in chunk]

The + operator creates a brand new list each time, copying all existing elements plus the new chunk. After k iterations with total n elements, the total copy work is O(n * k) in the worst case. extend() modifies the list in-place with amortized O(1) per element, giving O(n) total. This is one of the most common hidden quadratic traps in Python code.

Starter Code

def flatten_chunks(chunks):
    """Flatten a list of lists into a single list.
    Must be O(n) total elements, NOT O(n^2).
    Do NOT use result = result + chunk in a loop.
    """
    # TODO: implement
    pass

Expected Output

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Hints

Hint 1: result = result + chunk creates a new list each iteration — O(n^2) total copies.

Hint 2: Use extend() which appends in-place (amortized O(1) per element), or itertools.chain.from_iterable.

#9Operation Complexity ClassifierMedium

time-complexitylist-operations

Fill in a dictionary that maps 10 common list operations to their correct time complexity. This tests your knowledge of the full complexity table from the lesson.

Python

def classify_operations():
    return {
        'index': 'O(1)',
        'append': 'O(1)*',
        'pop_end': 'O(1)*',
        'pop_front': 'O(n)',
        'insert_front': 'O(n)',
        'membership': 'O(n)',
        'sort': 'O(n log n)',
        'len': 'O(1)',
        'slice': 'O(k)',
        'extend': 'O(k)*',
    }

ANSWER_KEY = {
    'index': 'O(1)',
    'append': 'O(1)*',
    'pop_end': 'O(1)*',
    'pop_front': 'O(n)',
    'insert_front': 'O(n)',
    'membership': 'O(n)',
    'sort': 'O(n log n)',
    'len': 'O(1)',
    'slice': 'O(k)',
    'extend': 'O(k)*',
}

result = classify_operations()
wrong = [k for k in ANSWER_KEY if result.get(k) != ANSWER_KEY[k]]
if wrong:
    for k in wrong:
        print(f"Wrong: {k} -> got {result.get(k)}, expected {ANSWER_KEY[k]}")
else:
    print("All correct!")

Solution

def classify_operations():
    return {
        'index': 'O(1)',        # Direct pointer arithmetic: base + i*8
        'append': 'O(1)*',      # Amortized — occasional O(n) resize
        'pop_end': 'O(1)*',     # Amortized — may trigger shrink
        'pop_front': 'O(n)',    # Must shift all elements left
        'insert_front': 'O(n)', # Must shift all elements right
        'membership': 'O(n)',   # Linear scan — use set for O(1)
        'sort': 'O(n log n)',   # Timsort
        'len': 'O(1)',          # Stored in ob_size field
        'slice': 'O(k)',        # Copies k pointers to new list
        'extend': 'O(k)*',     # Amortized — appends k elements
    }

The asterisk marks amortized complexity. append is amortized O(1) because most calls just write a pointer, but occasional reallocations cost O(n). len is O(1) because the length is stored directly in the ob_size field of the PyListObject struct — it does not count elements. The most commonly confused operations are pop(0) and insert(0, x), which people assume are O(1) like pop() and append().

Starter Code

def classify_operations():
    """Return a dict mapping each list operation
    to its time complexity string.
    
    Operations to classify:
    'index', 'append', 'pop_end', 'pop_front',
    'insert_front', 'membership', 'sort',
    'len', 'slice', 'extend'
    
    Use these complexity strings:
    'O(1)', 'O(1)*', 'O(n)', 'O(n log n)', 'O(k)', 'O(k)*'
    where * means amortized and k means slice/iterable size.
    """
    # TODO: implement
    pass

Expected Output

All correct!

Hints

Hint 1: append and pop_end are amortized O(1) — marked as O(1)*.

Hint 2: pop_front and insert_front must shift all elements, making them O(n).

Hard

#10Amortized Cost SimulatorHard

amortized-analysisdynamic-arraysimulation

Simulate a dynamic array from scratch. Start with capacity 0. On each append, if there is room, the cost is 1 (write one pointer). If the array is full, double the capacity — the cost is current_size + 1 (copy all existing elements plus write the new one).

Return the total cost, amortized cost per operation, number of resizes, and final capacity. Test with n=1000.

This proves empirically that doubling gives amortized O(1) — the amortized cost converges to about 2.0 for a growth factor of 2.

Python

def simulate_dynamic_array(n, growth_factor=2):
    size = 0
    capacity = 0
    total_cost = 0
    resize_count = 0

    for i in range(n):
        if size == capacity:
            new_capacity = max(1, int(capacity * growth_factor))
            total_cost += size + 1  # copy all + write new
            capacity = new_capacity
            resize_count += 1
        else:
            total_cost += 1  # just write the pointer
        size += 1

    return {
        'total_cost': total_cost,
        'amortized_cost': round(total_cost / n, 2),
        'resize_count': resize_count,
        'final_capacity': capacity,
    }

result = simulate_dynamic_array(1000)
print(f"total_cost={result['total_cost']}, amortized={result['amortized_cost']}, resizes={result['resize_count']}, final_capacity={result['final_capacity']}")

Solution

def simulate_dynamic_array(n, growth_factor=2):
    size = 0
    capacity = 0
    total_cost = 0
    resize_count = 0

    for i in range(n):
        if size == capacity:
            new_capacity = max(1, int(capacity * growth_factor))
            total_cost += size + 1
            capacity = new_capacity
            resize_count += 1
        else:
            total_cost += 1
        size += 1

    return {
        'total_cost': total_cost,
        'amortized_cost': round(total_cost / n, 2),
        'resize_count': resize_count,
        'final_capacity': capacity,
    }

The amortized cost converges to approximately 2.0 for a doubling strategy. This is because the geometric series of copy costs 1 + 2 + 4 + 8 + ... + n sums to roughly 2n. Adding the n non-resize appends gives about 3n total, so roughly 3.0 per append. CPython uses a growth factor of about 1.125 (not 2.0), which wastes less memory but has a slightly higher amortized constant. The tradeoff is space efficiency vs. copy frequency.

Starter Code

def simulate_dynamic_array(n, growth_factor=2):
    """Simulate a dynamic array that doubles capacity
    when full. Track the cost of each append:
    - 1 if there is room (just write the pointer)
    - current_length + 1 if a resize is needed
      (copy all elements + write the new one)
    
    Return a dict with:
    - 'total_cost': sum of all individual costs
    - 'amortized_cost': total_cost / n
    - 'resize_count': number of resizes
    - 'final_capacity': capacity after all appends
    """
    # TODO: implement
    pass

Expected Output

total_cost=2047, amortized=2.00, resizes=10, final_capacity=1024

Hints

Hint 1: Start with capacity=0. On the first append (or when size==capacity), double the capacity (or set to 1 if 0).

Hint 2: The resize cost is size+1 (copying existing elements plus inserting the new one). Non-resize cost is always 1.

#11Queue Performance BenchmarkHard

pop(0)-trapdequebenchmarking

Benchmark three approaches to implementing a queue of n items. Enqueue integers 0 to n-1, then dequeue all of them.

Strategy 1: Use a plain list with pop(0) for dequeue. Strategy 2: Build with append(), reverse(), then pop() from end. Strategy 3: Use collections.deque with popleft().

Return a dict of strategy names to elapsed times. Use n=50000 to see the difference clearly.

Python

import time
from collections import deque

def benchmark_queue(n):
    results = {}

    # Strategy 1: list with pop(0) — O(n^2)
    lst = list(range(n))
    start = time.perf_counter()
    while lst:
        lst.pop(0)
    results['list_pop0'] = round(time.perf_counter() - start, 4)

    # Strategy 2: list with reverse + pop from end — O(n)
    lst = list(range(n))
    start = time.perf_counter()
    lst.reverse()
    while lst:
        lst.pop()
    results['list_reverse'] = round(time.perf_counter() - start, 4)

    # Strategy 3: deque with popleft — O(n)
    dq = deque(range(n))
    start = time.perf_counter()
    while dq:
        dq.popleft()
    results['deque'] = round(time.perf_counter() - start, 4)

    return results

result = benchmark_queue(50000)
for name, elapsed in result.items():
    print(f"{name}: {elapsed}s")

Solution

import time
from collections import deque

def benchmark_queue(n):
    results = {}

    # Strategy 1: list with pop(0) — O(n^2)
    lst = list(range(n))
    start = time.perf_counter()
    while lst:
        lst.pop(0)
    results['list_pop0'] = round(time.perf_counter() - start, 4)

    # Strategy 2: list with reverse + pop from end — O(n)
    lst = list(range(n))
    start = time.perf_counter()
    lst.reverse()
    while lst:
        lst.pop()
    results['list_reverse'] = round(time.perf_counter() - start, 4)

    # Strategy 3: deque with popleft — O(n)
    dq = deque(range(n))
    start = time.perf_counter()
    while dq:
        dq.popleft()
    results['deque'] = round(time.perf_counter() - start, 4)

    return results

For n=50,000, pop(0) takes about 0.5-1.5 seconds while the other two take under 0.01 seconds. The pop(0) approach is O(n^2) because each pop shifts all remaining elements left — that is a memmove of (n-k) pointers on the k-th dequeue. deque uses a doubly-linked list of fixed-size blocks, so popleft() is truly O(1). This is why collections.deque exists: lists are not designed for front operations.

Starter Code

import time
from collections import deque

def benchmark_queue(n):
    """Compare three queue strategies for n enqueue+dequeue ops:
    1. list with pop(0)          — O(n^2)
    2. list with append+reverse  — O(n)
    3. deque with appendleft     — O(n)
    
    Return a dict mapping strategy name to elapsed seconds.
    """
    # TODO: implement all three strategies and time them
    pass

Expected Output

list_pop0: slow, list_reverse: fast, deque: fast

Hints

Hint 1: pop(0) shifts all remaining elements left on every call — O(n) per dequeue, O(n^2) total.

Hint 2: For the reverse strategy: append all items, then reverse, then pop from end. For deque: use appendleft or popleft.

#12Production Data Pipeline OptimizerHard

productionO(n2)-detectionrefactoring

The function process_pipeline_slow has three hidden O(n^2) traps. Your task:

Identify each quadratic trap and explain why it is O(n^2).
Write process_pipeline_fast that produces the same output in O(n * k) time, where k is the average number of tags per record.

Test with the sample records below.

Python

def process_pipeline_slow(records):
    # Bug 1: O(n^2) — list comprehension for membership on every iteration
    unique = []
    for r in records:
        if r['id'] not in [u['id'] for u in unique]:
            unique.append(r)

    # Bug 2: O(n^2) — pop(i) shifts elements on every deletion
    i = 0
    while i < len(unique):
        if unique[i]['score'] < 0.5:
            unique.pop(i)
        else:
            i += 1

    # Bug 3: O(n^2) — concatenation creates new list every iteration
    all_tags = []
    for r in unique:
        all_tags = all_tags + r.get('tags', [])

    return all_tags


def process_pipeline_fast(records):
    # Fix 1: set for O(1) membership
    seen_ids = set()
    unique = []
    for r in records:
        if r['id'] not in seen_ids:
            seen_ids.add(r['id'])
            unique.append(r)

    # Fix 2: list comprehension filter — no shifting
    filtered = [r for r in unique if r['score'] >= 0.5]

    # Fix 3: extend instead of concatenation
    all_tags = []
    for r in filtered:
        all_tags.extend(r.get('tags', []))

    return all_tags


records = [
    {'id': 1, 'score': 0.8, 'tags': ['ml', 'python']},
    {'id': 2, 'score': 0.3, 'tags': ['web']},
    {'id': 1, 'score': 0.9, 'tags': ['duplicate']},
    {'id': 3, 'score': 0.7, 'tags': ['data']},
    {'id': 4, 'score': 0.6, 'tags': ['ml', 'ops']},
]

slow_result = process_pipeline_slow(records)
fast_result = process_pipeline_fast(records)
print(fast_result)
assert slow_result == fast_result, "Results should match"

Solution

def process_pipeline_fast(records):
    # Fix 1: set for O(1) membership testing
    seen_ids = set()
    unique = []
    for r in records:
        if r['id'] not in seen_ids:
            seen_ids.add(r['id'])
            unique.append(r)

    # Fix 2: list comprehension — O(n), no element shifting
    filtered = [r for r in unique if r['score'] >= 0.5]

    # Fix 3: extend — O(k) per call, O(n*k) total
    all_tags = []
    for r in filtered:
        all_tags.extend(r.get('tags', []))

    return all_tags

Three fixes, three different list internals principles:

Dedup: The original builds [u['id'] for u in unique] on every iteration — O(n) per check, O(n^2) total. A set gives O(1) lookup.
Filter: pop(i) from the middle of a list shifts all elements after index i left. In the worst case (deleting every other element), this is O(n^2). A comprehension builds a new list in one O(n) pass.
Flatten: all_tags = all_tags + chunk allocates a new list and copies all previous elements on every iteration. For n records with k tags each, that is O(n^2 * k). extend() appends in-place with amortized O(1) per element.

This is the kind of code review that separates senior engineers from juniors — the slow version looks reasonable but collapses at scale.

Starter Code

def process_pipeline_slow(records):
    """This pipeline has THREE hidden O(n^2) traps.
    Identify and fix all of them.
    
    The pipeline should:
    1. Deduplicate by 'id' (keep first occurrence)
    2. Filter records where score >= 0.5
    3. Collect all tags into a flat list
    """
    # --- Bug 1: quadratic dedup ---
    unique = []
    for r in records:
        if r['id'] not in [u['id'] for u in unique]:
            unique.append(r)
    
    # --- Bug 2: quadratic filter via delete ---
    i = 0
    while i < len(unique):
        if unique[i]['score'] < 0.5:
            unique.pop(i)
        else:
            i += 1
    
    # --- Bug 3: quadratic flatten ---
    all_tags = []
    for r in unique:
        all_tags = all_tags + r.get('tags', [])
    
    return all_tags


def process_pipeline_fast(records):
    """Fix all three O(n^2) traps.
    Total complexity should be O(n * k)
    where k is avg tags per record.
    """
    # TODO: implement the fixed version
    pass

Expected Output

['ml', 'python', 'data', 'ml', 'ops']

Hints

Hint 1: Bug 1: building a list of IDs via comprehension each iteration is O(n). Use a set for seen IDs.

Hint 2: Bug 2: popping from the middle shifts elements. Use a list comprehension filter instead. Bug 3: result + chunk copies everything — use extend().

Practice: Python Lists Internals

Easy​

Medium​

Hard​

Easy

Medium

Hard