Python Python Lists Internals Practice Problems & Exercises
Practice: Python Lists Internals
← Back to lessonEasy
Write a function that appends integers 0 through n-1 to an initially empty list and records the byte size of the list after every append. Return a list of (length, byte_size) tuples.
This reveals the over-allocation jumps that make append() amortized O(1).
import sys
def track_growth(n):
lst = []
records = []
for i in range(n):
lst.append(i)
records.append((len(lst), sys.getsizeof(lst)))
return records
print(track_growth(10))Solution
import sys
def track_growth(n):
lst = []
records = []
for i in range(n):
lst.append(i)
records.append((len(lst), sys.getsizeof(lst)))
return records
The byte size does not increase on every append. CPython pre-allocates extra capacity so that most appends just write a pointer into an existing slot. The size jumps (e.g., 88 to 120 to 184) happen only when the internal buffer is full and must be reallocated. This is the mechanism behind amortized O(1) append.
import sys
def track_growth(n):
"""Append integers 0..n-1 to an empty list.
Return a list of (length, byte_size) tuples recorded
after each append.
"""
# TODO: implement
passExpected Output
[(1, 88), (2, 88), (3, 88), (4, 88), (5, 120), (6, 120), (7, 120), (8, 120), (9, 184), (10, 184)]Hints
Hint 1: Use sys.getsizeof(lst) after each append to measure the list object size.
Hint 2: The byte size jumps at certain lengths because CPython over-allocates capacity.
Given an integer n, append integers 0 through n-1 to an empty list and return the list lengths at which the internal buffer was reallocated (i.e., where sys.getsizeof changed).
import sys
def reallocation_points(n):
lst = []
prev_size = sys.getsizeof(lst)
points = []
for i in range(n):
lst.append(i)
curr_size = sys.getsizeof(lst)
if curr_size != prev_size:
points.append(len(lst))
prev_size = curr_size
return points
print(reallocation_points(20))Solution
import sys
def reallocation_points(n):
lst = []
prev_size = sys.getsizeof(lst)
points = []
for i in range(n):
lst.append(i)
curr_size = sys.getsizeof(lst)
if curr_size != prev_size:
points.append(len(lst))
prev_size = curr_size
return points
Reallocations happen at exponentially spaced intervals (roughly at lengths 1, 5, 9, 17, 26, ...). The growth formula in CPython is approximately newsize + newsize // 8 + (3 if newsize < 9 else 6). This geometric spacing ensures the total copy cost across all reallocations is O(n), giving amortized O(1) per append.
import sys
def reallocation_points(n):
"""Append integers 0..n-1 to an empty list.
Return a list of lengths at which the internal
buffer was reallocated (byte size changed).
"""
# TODO: implement
passExpected Output
[1, 5, 9, 17]Hints
Hint 1: Compare sys.getsizeof before and after each append.
Hint 2: Record the current length whenever the size changes from the previous measurement.
Write a function that takes a list and returns a dictionary breaking down its memory usage into: the list object itself, the total memory of all contained elements, and the grand total. Test it with [1, 2, 3, 4, 5].
This demonstrates that Python lists store pointers, not values — the actual objects live separately on the heap.
import sys
def memory_breakdown(lst):
list_size = sys.getsizeof(lst)
elements_size = sum(sys.getsizeof(elem) for elem in lst)
return {
'list_object': list_size,
'elements_total': elements_size,
'grand_total': list_size + elements_size,
}
print(memory_breakdown([1, 2, 3, 4, 5]))Solution
import sys
def memory_breakdown(lst):
list_size = sys.getsizeof(lst)
elements_size = sum(sys.getsizeof(elem) for elem in lst)
return {
'list_object': list_size,
'elements_total': elements_size,
'grand_total': list_size + elements_size,
}
A list of 5 small integers uses roughly 260 bytes — 120 bytes for the list container (56 bytes base overhead + 8 bytes per pointer slot, with over-allocation) plus 28 bytes per integer object (5 x 28 = 140). In C, int arr[5] would be just 20 bytes. This overhead is the cost of Python's flexibility — every element is a full heap-allocated object.
import sys
def memory_breakdown(lst):
"""Return a dict with:
- 'list_object': sys.getsizeof(lst)
- 'elements_total': sum of sys.getsizeof for each element
- 'grand_total': sum of the above
"""
# TODO: implement
passExpected Output
{'list_object': 120, 'elements_total': 140, 'grand_total': 260}Hints
Hint 1: sys.getsizeof(lst) gives the list container size (overhead + pointer array), not the elements.
Hint 2: Sum sys.getsizeof(elem) for each elem in lst to get the element memory.
Build a list containing integers from n-1 down to 0. The naive approach uses insert(0, i) in a loop, which is O(n^2). Write a function that achieves the same result in O(n) using append() and a single reverse().
Test with n=10.
def build_reversed_list_fast(n):
result = []
for i in range(n):
result.append(i)
result.reverse()
return result
print(build_reversed_list_fast(10))Solution
def build_reversed_list_fast(n):
result = []
for i in range(n):
result.append(i)
result.reverse()
return result
The key insight is complexity arithmetic. insert(0, x) shifts all existing elements right — that is O(n) per call, making a loop of n insertions O(n^2). Instead, append() is amortized O(1) per call (total O(n) for the loop), and reverse() is O(n) once. Total: O(n) + O(n) = O(n). For 100,000 elements, this is roughly 150x faster than the insert approach.
def build_reversed_list_fast(n):
"""Build a list [n-1, n-2, ..., 1, 0]
using only append() and reverse().
Do NOT use insert(0, x).
"""
# TODO: implement
passExpected Output
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]Hints
Hint 1: insert(0, x) is O(n) because it shifts all elements. append() is amortized O(1).
Hint 2: Build the list in natural order with append(), then call reverse() once at the end.
Medium
Write a function that estimates the allocated capacity of a Python list (how many pointer slots exist, not just how many are used). Use the fact that on 64-bit CPython, an empty list is 56 bytes and each pointer slot is 8 bytes.
Test by printing len vs estimated cap for lists of various sizes.
import sys
EMPTY_LIST_OVERHEAD = sys.getsizeof([])
POINTER_SIZE = 8
def estimate_capacity(lst):
return (sys.getsizeof(lst) - EMPTY_LIST_OVERHEAD) // POINTER_SIZE
for n in [0, 1, 4, 5, 8, 9]:
lst = list(range(n))
print(f"len={n} cap={estimate_capacity(lst)}")Solution
import sys
EMPTY_LIST_OVERHEAD = sys.getsizeof([])
POINTER_SIZE = 8
def estimate_capacity(lst):
return (sys.getsizeof(lst) - EMPTY_LIST_OVERHEAD) // POINTER_SIZE
You cannot read allocated directly from Python, but you can derive it from the byte size. The formula (getsizeof(lst) - base) // 8 gives the number of pointer slots. Notice that list(range(5)) has capacity 8, not 5 — because list() uses the same over-allocation strategy. The gap between len and cap is the headroom that makes future appends cheap.
import sys
def estimate_capacity(lst):
"""Estimate the allocated capacity (number of pointer slots)
of a list by examining its byte size.
On 64-bit CPython:
- Empty list overhead: 56 bytes
- Each pointer slot: 8 bytes
Return the estimated capacity as an integer.
"""
# TODO: implement
passExpected Output
len=0 cap=0
len=1 cap=4
len=4 cap=4
len=5 cap=8
len=8 cap=8
len=9 cap=16Hints
Hint 1: Capacity = (sys.getsizeof(lst) - empty_list_overhead) // pointer_size.
Hint 2: On 64-bit CPython, empty list overhead is typically 56 bytes and each pointer is 8 bytes.
Implement a function that creates a rows x cols matrix filled with a default value. The critical requirement: modifying one cell must not affect any other row. The naive [[default]*cols]*rows fails this because all rows alias the same list.
Test by setting matrix[0][0] = 1 and verifying only row 0 changes.
def make_matrix(rows, cols, default=0):
return [[default] * cols for _ in range(rows)]
matrix = make_matrix(3, 3)
matrix[0][0] = 1
print(matrix)Solution
def make_matrix(rows, cols, default=0):
return [[default] * cols for _ in range(rows)]
The list comprehension creates a new [default]*cols list on each iteration, so each row is an independent object. With [[default]*cols]*rows, the * operator copies the reference to the same inner list rows times. This is one of the most common Python bugs in production code, especially in numerical computing and grid-based algorithms.
def make_matrix(rows, cols, default=0):
"""Create a rows x cols matrix where each row
is an independent list. Modifying one cell must
NOT affect other rows.
"""
# TODO: implement (do NOT use [[default]*cols]*rows)
passExpected Output
[[1, 0, 0], [0, 0, 0], [0, 0, 0]]Hints
Hint 1: [[val]*cols]*rows creates rows references to the SAME inner list — a classic aliasing bug.
Hint 2: Use a list comprehension to create a new inner list on each iteration.
Write an O(n) deduplication function that preserves the order of first occurrences. The naive approach using x not in result_list is O(n^2) because in on a list is O(n). Use the right data structure for membership testing.
Test with [3, 1, 4, 1, 5, 9, 2, 6, 5, 3].
def deduplicate(items):
seen = set()
result = []
for item in items:
if item not in seen:
seen.add(item)
result.append(item)
return result
print(deduplicate([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]))Solution
def deduplicate(items):
seen = set()
result = []
for item in items:
if item not in seen:
seen.add(item)
result.append(item)
return result
The set provides O(1) membership testing, so the total cost is O(n) instead of O(n^2). This pattern — a list for ordered results plus a set for fast lookup — is fundamental in Python. An alternative one-liner is list(dict.fromkeys(items)), which exploits dict insertion-order preservation (Python 3.7+), but understanding the explicit version is essential for interviews.
def deduplicate(items):
"""Remove duplicates from items while preserving
first-occurrence order. Must run in O(n) time.
"""
# TODO: implement without using dict.fromkeys
passExpected Output
[3, 1, 4, 5, 9, 2, 6]Hints
Hint 1: Using `x not in result_list` for membership is O(n) per check — total O(n^2).
Hint 2: Maintain a set alongside the result list: set lookup is O(1), giving O(n) total.
Write a function that flattens a list of lists into a single flat list. The naive approach result = result + chunk in a loop is O(n^2) because + creates a new list every iteration, copying all previous elements.
Test with [[1, 2, 3], [4, 5, 6], [7, 8, 9]].
def flatten_chunks(chunks):
result = []
for chunk in chunks:
result.extend(chunk)
return result
print(flatten_chunks([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))Solution
# Option 1: extend (most readable)
def flatten_chunks(chunks):
result = []
for chunk in chunks:
result.extend(chunk)
return result
# Option 2: itertools (most Pythonic)
from itertools import chain
def flatten_chunks(chunks):
return list(chain.from_iterable(chunks))
# Option 3: list comprehension
def flatten_chunks(chunks):
return [x for chunk in chunks for x in chunk]
The + operator creates a brand new list each time, copying all existing elements plus the new chunk. After k iterations with total n elements, the total copy work is O(n * k) in the worst case. extend() modifies the list in-place with amortized O(1) per element, giving O(n) total. This is one of the most common hidden quadratic traps in Python code.
def flatten_chunks(chunks):
"""Flatten a list of lists into a single list.
Must be O(n) total elements, NOT O(n^2).
Do NOT use result = result + chunk in a loop.
"""
# TODO: implement
passExpected Output
[1, 2, 3, 4, 5, 6, 7, 8, 9]Hints
Hint 1: result = result + chunk creates a new list each iteration — O(n^2) total copies.
Hint 2: Use extend() which appends in-place (amortized O(1) per element), or itertools.chain.from_iterable.
Fill in a dictionary that maps 10 common list operations to their correct time complexity. This tests your knowledge of the full complexity table from the lesson.
def classify_operations():
return {
'index': 'O(1)',
'append': 'O(1)*',
'pop_end': 'O(1)*',
'pop_front': 'O(n)',
'insert_front': 'O(n)',
'membership': 'O(n)',
'sort': 'O(n log n)',
'len': 'O(1)',
'slice': 'O(k)',
'extend': 'O(k)*',
}
ANSWER_KEY = {
'index': 'O(1)',
'append': 'O(1)*',
'pop_end': 'O(1)*',
'pop_front': 'O(n)',
'insert_front': 'O(n)',
'membership': 'O(n)',
'sort': 'O(n log n)',
'len': 'O(1)',
'slice': 'O(k)',
'extend': 'O(k)*',
}
result = classify_operations()
wrong = [k for k in ANSWER_KEY if result.get(k) != ANSWER_KEY[k]]
if wrong:
for k in wrong:
print(f"Wrong: {k} -> got {result.get(k)}, expected {ANSWER_KEY[k]}")
else:
print("All correct!")Solution
def classify_operations():
return {
'index': 'O(1)', # Direct pointer arithmetic: base + i*8
'append': 'O(1)*', # Amortized — occasional O(n) resize
'pop_end': 'O(1)*', # Amortized — may trigger shrink
'pop_front': 'O(n)', # Must shift all elements left
'insert_front': 'O(n)', # Must shift all elements right
'membership': 'O(n)', # Linear scan — use set for O(1)
'sort': 'O(n log n)', # Timsort
'len': 'O(1)', # Stored in ob_size field
'slice': 'O(k)', # Copies k pointers to new list
'extend': 'O(k)*', # Amortized — appends k elements
}
The asterisk marks amortized complexity. append is amortized O(1) because most calls just write a pointer, but occasional reallocations cost O(n). len is O(1) because the length is stored directly in the ob_size field of the PyListObject struct — it does not count elements. The most commonly confused operations are pop(0) and insert(0, x), which people assume are O(1) like pop() and append().
def classify_operations():
"""Return a dict mapping each list operation
to its time complexity string.
Operations to classify:
'index', 'append', 'pop_end', 'pop_front',
'insert_front', 'membership', 'sort',
'len', 'slice', 'extend'
Use these complexity strings:
'O(1)', 'O(1)*', 'O(n)', 'O(n log n)', 'O(k)', 'O(k)*'
where * means amortized and k means slice/iterable size.
"""
# TODO: implement
passExpected Output
All correct!Hints
Hint 1: append and pop_end are amortized O(1) — marked as O(1)*.
Hint 2: pop_front and insert_front must shift all elements, making them O(n).
Hard
Simulate a dynamic array from scratch. Start with capacity 0. On each append, if there is room, the cost is 1 (write one pointer). If the array is full, double the capacity — the cost is current_size + 1 (copy all existing elements plus write the new one).
Return the total cost, amortized cost per operation, number of resizes, and final capacity. Test with n=1000.
This proves empirically that doubling gives amortized O(1) — the amortized cost converges to about 2.0 for a growth factor of 2.
def simulate_dynamic_array(n, growth_factor=2):
size = 0
capacity = 0
total_cost = 0
resize_count = 0
for i in range(n):
if size == capacity:
new_capacity = max(1, int(capacity * growth_factor))
total_cost += size + 1 # copy all + write new
capacity = new_capacity
resize_count += 1
else:
total_cost += 1 # just write the pointer
size += 1
return {
'total_cost': total_cost,
'amortized_cost': round(total_cost / n, 2),
'resize_count': resize_count,
'final_capacity': capacity,
}
result = simulate_dynamic_array(1000)
print(f"total_cost={result['total_cost']}, amortized={result['amortized_cost']}, resizes={result['resize_count']}, final_capacity={result['final_capacity']}")Solution
def simulate_dynamic_array(n, growth_factor=2):
size = 0
capacity = 0
total_cost = 0
resize_count = 0
for i in range(n):
if size == capacity:
new_capacity = max(1, int(capacity * growth_factor))
total_cost += size + 1
capacity = new_capacity
resize_count += 1
else:
total_cost += 1
size += 1
return {
'total_cost': total_cost,
'amortized_cost': round(total_cost / n, 2),
'resize_count': resize_count,
'final_capacity': capacity,
}
The amortized cost converges to approximately 2.0 for a doubling strategy. This is because the geometric series of copy costs 1 + 2 + 4 + 8 + ... + n sums to roughly 2n. Adding the n non-resize appends gives about 3n total, so roughly 3.0 per append. CPython uses a growth factor of about 1.125 (not 2.0), which wastes less memory but has a slightly higher amortized constant. The tradeoff is space efficiency vs. copy frequency.
def simulate_dynamic_array(n, growth_factor=2):
"""Simulate a dynamic array that doubles capacity
when full. Track the cost of each append:
- 1 if there is room (just write the pointer)
- current_length + 1 if a resize is needed
(copy all elements + write the new one)
Return a dict with:
- 'total_cost': sum of all individual costs
- 'amortized_cost': total_cost / n
- 'resize_count': number of resizes
- 'final_capacity': capacity after all appends
"""
# TODO: implement
passExpected Output
total_cost=2047, amortized=2.00, resizes=10, final_capacity=1024Hints
Hint 1: Start with capacity=0. On the first append (or when size==capacity), double the capacity (or set to 1 if 0).
Hint 2: The resize cost is size+1 (copying existing elements plus inserting the new one). Non-resize cost is always 1.
Benchmark three approaches to implementing a queue of n items. Enqueue integers 0 to n-1, then dequeue all of them.
Strategy 1: Use a plain list with pop(0) for dequeue.
Strategy 2: Build with append(), reverse(), then pop() from end.
Strategy 3: Use collections.deque with popleft().
Return a dict of strategy names to elapsed times. Use n=50000 to see the difference clearly.
import time
from collections import deque
def benchmark_queue(n):
results = {}
# Strategy 1: list with pop(0) — O(n^2)
lst = list(range(n))
start = time.perf_counter()
while lst:
lst.pop(0)
results['list_pop0'] = round(time.perf_counter() - start, 4)
# Strategy 2: list with reverse + pop from end — O(n)
lst = list(range(n))
start = time.perf_counter()
lst.reverse()
while lst:
lst.pop()
results['list_reverse'] = round(time.perf_counter() - start, 4)
# Strategy 3: deque with popleft — O(n)
dq = deque(range(n))
start = time.perf_counter()
while dq:
dq.popleft()
results['deque'] = round(time.perf_counter() - start, 4)
return results
result = benchmark_queue(50000)
for name, elapsed in result.items():
print(f"{name}: {elapsed}s")Solution
import time
from collections import deque
def benchmark_queue(n):
results = {}
# Strategy 1: list with pop(0) — O(n^2)
lst = list(range(n))
start = time.perf_counter()
while lst:
lst.pop(0)
results['list_pop0'] = round(time.perf_counter() - start, 4)
# Strategy 2: list with reverse + pop from end — O(n)
lst = list(range(n))
start = time.perf_counter()
lst.reverse()
while lst:
lst.pop()
results['list_reverse'] = round(time.perf_counter() - start, 4)
# Strategy 3: deque with popleft — O(n)
dq = deque(range(n))
start = time.perf_counter()
while dq:
dq.popleft()
results['deque'] = round(time.perf_counter() - start, 4)
return results
For n=50,000, pop(0) takes about 0.5-1.5 seconds while the other two take under 0.01 seconds. The pop(0) approach is O(n^2) because each pop shifts all remaining elements left — that is a memmove of (n-k) pointers on the k-th dequeue. deque uses a doubly-linked list of fixed-size blocks, so popleft() is truly O(1). This is why collections.deque exists: lists are not designed for front operations.
import time
from collections import deque
def benchmark_queue(n):
"""Compare three queue strategies for n enqueue+dequeue ops:
1. list with pop(0) — O(n^2)
2. list with append+reverse — O(n)
3. deque with appendleft — O(n)
Return a dict mapping strategy name to elapsed seconds.
"""
# TODO: implement all three strategies and time them
passExpected Output
list_pop0: slow, list_reverse: fast, deque: fastHints
Hint 1: pop(0) shifts all remaining elements left on every call — O(n) per dequeue, O(n^2) total.
Hint 2: For the reverse strategy: append all items, then reverse, then pop from end. For deque: use appendleft or popleft.
The function process_pipeline_slow has three hidden O(n^2) traps. Your task:
- Identify each quadratic trap and explain why it is O(n^2).
- Write
process_pipeline_fastthat produces the same output in O(n * k) time, where k is the average number of tags per record.
Test with the sample records below.
def process_pipeline_slow(records):
# Bug 1: O(n^2) — list comprehension for membership on every iteration
unique = []
for r in records:
if r['id'] not in [u['id'] for u in unique]:
unique.append(r)
# Bug 2: O(n^2) — pop(i) shifts elements on every deletion
i = 0
while i < len(unique):
if unique[i]['score'] < 0.5:
unique.pop(i)
else:
i += 1
# Bug 3: O(n^2) — concatenation creates new list every iteration
all_tags = []
for r in unique:
all_tags = all_tags + r.get('tags', [])
return all_tags
def process_pipeline_fast(records):
# Fix 1: set for O(1) membership
seen_ids = set()
unique = []
for r in records:
if r['id'] not in seen_ids:
seen_ids.add(r['id'])
unique.append(r)
# Fix 2: list comprehension filter — no shifting
filtered = [r for r in unique if r['score'] >= 0.5]
# Fix 3: extend instead of concatenation
all_tags = []
for r in filtered:
all_tags.extend(r.get('tags', []))
return all_tags
records = [
{'id': 1, 'score': 0.8, 'tags': ['ml', 'python']},
{'id': 2, 'score': 0.3, 'tags': ['web']},
{'id': 1, 'score': 0.9, 'tags': ['duplicate']},
{'id': 3, 'score': 0.7, 'tags': ['data']},
{'id': 4, 'score': 0.6, 'tags': ['ml', 'ops']},
]
slow_result = process_pipeline_slow(records)
fast_result = process_pipeline_fast(records)
print(fast_result)
assert slow_result == fast_result, "Results should match"Solution
def process_pipeline_fast(records):
# Fix 1: set for O(1) membership testing
seen_ids = set()
unique = []
for r in records:
if r['id'] not in seen_ids:
seen_ids.add(r['id'])
unique.append(r)
# Fix 2: list comprehension — O(n), no element shifting
filtered = [r for r in unique if r['score'] >= 0.5]
# Fix 3: extend — O(k) per call, O(n*k) total
all_tags = []
for r in filtered:
all_tags.extend(r.get('tags', []))
return all_tags
Three fixes, three different list internals principles:
-
Dedup: The original builds
[u['id'] for u in unique]on every iteration — O(n) per check, O(n^2) total. A set gives O(1) lookup. -
Filter:
pop(i)from the middle of a list shifts all elements after indexileft. In the worst case (deleting every other element), this is O(n^2). A comprehension builds a new list in one O(n) pass. -
Flatten:
all_tags = all_tags + chunkallocates a new list and copies all previous elements on every iteration. For n records with k tags each, that is O(n^2 * k).extend()appends in-place with amortized O(1) per element.
This is the kind of code review that separates senior engineers from juniors — the slow version looks reasonable but collapses at scale.
def process_pipeline_slow(records):
"""This pipeline has THREE hidden O(n^2) traps.
Identify and fix all of them.
The pipeline should:
1. Deduplicate by 'id' (keep first occurrence)
2. Filter records where score >= 0.5
3. Collect all tags into a flat list
"""
# --- Bug 1: quadratic dedup ---
unique = []
for r in records:
if r['id'] not in [u['id'] for u in unique]:
unique.append(r)
# --- Bug 2: quadratic filter via delete ---
i = 0
while i < len(unique):
if unique[i]['score'] < 0.5:
unique.pop(i)
else:
i += 1
# --- Bug 3: quadratic flatten ---
all_tags = []
for r in unique:
all_tags = all_tags + r.get('tags', [])
return all_tags
def process_pipeline_fast(records):
"""Fix all three O(n^2) traps.
Total complexity should be O(n * k)
where k is avg tags per record.
"""
# TODO: implement the fixed version
passExpected Output
['ml', 'python', 'data', 'ml', 'ops']Hints
Hint 1: Bug 1: building a list of IDs via comprehension each iteration is O(n). Use a set for seen IDs.
Hint 2: Bug 2: popping from the middle shifts elements. Use a list comprehension filter instead. Bug 3: result + chunk copies everything — use extend().
