Skip to main content

Python Data Structure Selection Strategy: Practice Problems & Exercises

Practice: Data Structure Selection Strategy

11 problems4 Easy4 Medium3 Hard40-55 min
← Back to lesson

Easy

#1Membership Check: List vs SetEasy
setmembershipO(1)-vs-O(n)

You receive a list of 5,000 banned usernames. Given a list of 10 usernames to check, determine which ones are banned. The naive approach uses in on the list directly. Refactor to use a set for O(1) lookups.

Python
# Setup — do not modify
banned_list = [f"bad_user_{i}" for i in range(5000)]
usernames_to_check = ["bad_user_42", "good_user", "bad_user_4999",
                       "alice", "bad_user_0", "bob",
                       "bad_user_2500", "carol", "bad_user_100", "dave"]

# Step 1: Convert banned_list to a set for O(1) lookups
banned_set = set(banned_list)

# Step 2: Check membership using the set
found_list = "bad_user_42" in banned_list   # O(n) — slow
found_set = "bad_user_42" in banned_set     # O(1) — fast

print(f"Found via list: {found_list}")
print(f"Found via set: {found_set}")
print(f"Banned count: {len(banned_set)}")
Solution
banned_list = [f"bad_user_{i}" for i in range(5000)]
usernames_to_check = ["bad_user_42", "good_user", "bad_user_4999",
"alice", "bad_user_0", "bob",
"bad_user_2500", "carol", "bad_user_100", "dave"]

banned_set = set(banned_list)

found_list = "bad_user_42" in banned_list
found_set = "bad_user_42" in banned_set

print(f"Found via list: {found_list}")
print(f"Found via set: {found_set}")
print(f"Banned count: {len(banned_set)}")

Why this matters:

  • "bad_user_42" in banned_list scans up to 5,000 elements — O(n).
  • "bad_user_42" in banned_set hashes the string and checks a bucket — O(1) average.
  • If you check 10 usernames against a list of 5,000: 10 x 5,000 = 50,000 comparisons.
  • With a set: one-time O(5,000) conversion + 10 x O(1) lookups = ~5,010 operations.
  • Rule of thumb: If you check membership more than once against the same collection, convert to a set first.
Expected Output
Found via list: True\nFound via set: True\nBanned count: 5000
Hints

Hint 1: A list membership check (`x in my_list`) scans every element — O(n) per check.

Hint 2: Converting the list to a set gives O(1) average membership checks. The one-time conversion is O(n), but each subsequent lookup is O(1).

#2Dedup While Preserving OrderEasy
setdict.fromkeysdeduplicationorder-preserving

Given a log stream with repeated event types, produce a deduplicated list that preserves the order of first appearance. Do NOT use a loop — find a one-liner using dict.fromkeys().

Python
log_events = ["error", "warning", "info", "error", "debug",
              "warning", "info", "critical", "error", "debug"]

# One-liner: use dict.fromkeys to dedup while preserving order
deduped = list(dict.fromkeys(log_events))

print(f"Deduped: {deduped}")
Solution
log_events = ["error", "warning", "info", "error", "debug",
"warning", "info", "critical", "error", "debug"]

deduped = list(dict.fromkeys(log_events))

print(f"Deduped: {deduped}")

Why dict.fromkeys beats alternatives:

set(log_events):
- Removes duplicates but LOSES order (hash-based, not insertion-ordered)
- Output could be: {'debug', 'critical', 'error', 'info', 'warning'}

Manual loop with seen-set:
seen = set()
result = []
for e in log_events:
if e not in seen:
seen.add(e)
result.append(e)
- Works, preserves order, but 5 lines for what should be trivial

dict.fromkeys(log_events):
- Keys are unique (duplicates silently dropped)
- dict preserves insertion order (Python 3.7+ guaranteed)
- One line, O(n) time, O(k) space where k = unique items

When to use each:

  • Need unique items, order doesn't matter: set()
  • Need unique items, order matters: dict.fromkeys() or list(dict.fromkeys())
  • Need unique items with extra logic per item: manual loop with seen set
Expected Output
Deduped: ['error', 'warning', 'info', 'debug', 'critical']
Hints

Hint 1: `set()` removes duplicates but does NOT preserve order.

Hint 2: `dict.fromkeys(sequence)` preserves insertion order (Python 3.7+) and silently drops duplicate keys. The values are all `None`, but `list(dict.fromkeys(seq))` gives you the unique items in original order.

#3Group Items with defaultdictEasy
defaultdictgroupingone-to-many

Given a list of (name, department) tuples, group employees by department. Use defaultdict(list) so you never need to check if a key exists before appending.

Python
from collections import defaultdict

employees = [
    ("Alice", "engineering"),
    ("Bob", "marketing"),
    ("Carol", "engineering"),
    ("Dave", "design"),
    ("Eve", "design"),
]

# Group by department using defaultdict
groups = defaultdict(list)
for name, dept in employees:
    groups[dept].append(name)

# Print each department
for dept in ["engineering", "marketing", "design"]:
    print(f"{dept}: {groups[dept]}")
Solution
from collections import defaultdict

employees = [
("Alice", "engineering"),
("Bob", "marketing"),
("Carol", "engineering"),
("Dave", "design"),
("Eve", "design"),
]

groups = defaultdict(list)
for name, dept in employees:
groups[dept].append(name)

for dept in ["engineering", "marketing", "design"]:
print(f"{dept}: {groups[dept]}")

defaultdict vs regular dict:

Regular dict approach (verbose, error-prone):
groups = {}
for name, dept in employees:
if dept not in groups: # Must check every time
groups[dept] = []
groups[dept].append(name)

defaultdict approach (clean):
groups = defaultdict(list)
for name, dept in employees:
groups[dept].append(name) # Auto-creates [] for new keys

Key insight: defaultdict(list) is the canonical pattern for one-to-many grouping in Python. The factory function (list) is called whenever a missing key is accessed, creating an empty list automatically.

Other common factories:

  • defaultdict(int) — counting (default 0)
  • defaultdict(set) — unique grouping
  • defaultdict(lambda: "unknown") — custom default value
Expected Output
engineering: ['Alice', 'Carol']\nmarketing: ['Bob']\ndesign: ['Dave', 'Eve']
Hints

Hint 1: `defaultdict(list)` auto-creates an empty list for any new key, so `groups[key].append(value)` never raises a `KeyError`.

Hint 2: A regular `dict` would require: `if key not in d: d[key] = []` before every append — error-prone and verbose.

#4Sliding Window with dequeEasy
dequemaxlensliding-window

Implement a sliding window of size 3 over a stream of numbers. Use deque(maxlen=3) to automatically maintain the window. Compute the moving average at each step.

Python
from collections import deque

stream = [1, 2, 3, 4, 5]
window = deque(maxlen=3)
averages = []

for val in stream:
    window.append(val)
    print(f"After {val}: {list(window)}")
    averages.append(sum(window) / len(window))

print(f"Moving averages: {averages}")
Solution
from collections import deque

stream = [1, 2, 3, 4, 5]
window = deque(maxlen=3)
averages = []

for val in stream:
window.append(val)
print(f"After {val}: {list(window)}")
averages.append(sum(window) / len(window))

print(f"Moving averages: {averages}")

Why deque(maxlen=k) beats a list for sliding windows:

List approach:
window = []
window.append(val)
if len(window) > k:
window.pop(0) # O(n) — shifts all elements left!

deque approach:
window = deque(maxlen=k)
window.append(val) # O(1) — auto-evicts oldest if full
  • list.pop(0) is O(n) because every remaining element must shift one position left.
  • deque.append() with maxlen is O(1) because the deque is backed by a doubly-linked list (or ring buffer in CPython) — the oldest element is simply unlinked.
  • For a stream of 1 million values with window size 1,000: list approach does ~1 billion shifts; deque does ~1 million constant-time appends.
Expected Output
After 1: [1]\nAfter 2: [1, 2]\nAfter 3: [1, 2, 3]\nAfter 4: [2, 3, 4]\nAfter 5: [3, 4, 5]\nMoving averages: [1.0, 1.5, 2.0, 3.0, 4.0]
Hints

Hint 1: `deque(maxlen=k)` automatically evicts the oldest element when a new one is appended beyond capacity — no manual bookkeeping needed.

Hint 2: For the moving average, compute `sum(window) / len(window)` after each append. Early windows (before the deque is full) will have fewer than `k` elements.


Medium

#5Refactor O(n squared) Duplicate FinderMedium
CountersetrefactoringO(n)-vs-O(n2)

The following function finds duplicate items in a list but runs in O(n squared) time. Refactor it to O(n) using Counter and a set. Preserve the order of first occurrence.

Python
from collections import Counter

def find_duplicates(lst):
    """Find all items appearing more than once. Preserve first-occurrence order. O(n) time."""
    counts = Counter(lst)
    seen = set()
    result = []
    for item in lst:
        if counts[item] > 1 and item not in seen:
            result.append(item)
            seen.add(item)
    return result

# Test
names = ["alice", "bob", "charlie", "alice", "dave",
         "charlie", "eve", "bob", "alice"]

print(f"Duplicates: {find_duplicates(names)}")
print(f"Time complexity: O(n)")
Solution
from collections import Counter

def find_duplicates(lst):
"""Find all items appearing more than once. Preserve first-occurrence order. O(n) time."""
counts = Counter(lst) # O(n) — one pass to count everything
seen = set()
result = []
for item in lst: # O(n) — one pass to collect in order
if counts[item] > 1 and item not in seen: # O(1) dict + O(1) set
result.append(item)
seen.add(item)
return result

names = ["alice", "bob", "charlie", "alice", "dave",
"charlie", "eve", "bob", "alice"]

print(f"Duplicates: {find_duplicates(names)}")
print(f"Time complexity: O(n)")

The naive O(n squared) version and why it fails:

# SLOW — do not use at scale
def find_dups_naive(lst):
result = []
for i, item in enumerate(lst):
if item in lst[i+1:]: # O(n) slice + O(n) scan per iteration
if item not in result: # O(n) scan of result list
result.append(item)
return result

Complexity comparison:

OperationNaiveRefactored
Count occurrencesO(n) per item x n items = O(n squared)Counter: O(n) single pass
Check "already added"item not in result — O(n) list scanitem not in seen — O(1) set check
TotalO(n squared)O(n)

For n = 100,000 items: naive does ~10 billion operations; refactored does ~200,000.

Expected Output
Duplicates: ['alice', 'charlie', 'bob']\nTime complexity: O(n)
Hints

Hint 1: The bottleneck in the naive version is `item in lst[i+1:]` inside a loop — that is O(n) per iteration, O(n squared) total.

Hint 2: Use `Counter` for a single O(n) pass to count occurrences, then collect items with count > 1 in a second O(n) pass. Use a `seen` set to preserve first-occurrence order without duplicates.

#6Top-K Frequent ElementsMedium
Countermost_commonheapqtop-k

Given a text document, find the top 3 most frequent words (excluding stop words). Use Counter.most_common() instead of sorting the entire frequency table.

Python
from collections import Counter
import re

text = """
Python is the language of data science. Python handles data processing,
data analysis, and data visualization. Engineers write Python code to
build systems. Clean code in Python makes Python the top choice.
Python code is readable code.
"""

stop_words = {"is", "the", "of", "and", "to", "in", "a", "an", "it", "for"}

# Step 1: Tokenize and filter
words = re.findall(r'\b\w+\b', text.lower())
filtered = [w for w in words if w not in stop_words]

# Step 2: Count and get top 3
counts = Counter(filtered)
top_3 = counts.most_common(3)

print(f"Top 3 words: {top_3}")
print(f"Used heapq internally: O(k + n*log(k))")
Solution
from collections import Counter
import re

text = """
Python is the language of data science. Python handles data processing,
data analysis, and data visualization. Engineers write Python code to
build systems. Clean code in Python makes Python the top choice.
Python code is readable code.
"""

stop_words = {"is", "the", "of", "and", "to", "in", "a", "an", "it", "for"}

words = re.findall(r'\b\w+\b', text.lower())
filtered = [w for w in words if w not in stop_words]

counts = Counter(filtered)
top_3 = counts.most_common(3)

print(f"Top 3 words: {top_3}")
print(f"Used heapq internally: O(k + n*log(k))")

Why most_common(k) beats sorting:

Approach 1 — Full sort:
sorted(counts.items(), key=lambda x: x[1], reverse=True)[:3]
Time: O(m * log(m)) where m = unique words
Sorts ALL items just to get top 3

Approach 2 — most_common(k):
counts.most_common(3)
Time: O(m + 3 * log(m)) ≈ O(m)
Uses heapq.nlargest internally — maintains a heap of size k

For m = 100,000 unique words:
Full sort: 100,000 * 17 ≈ 1.7M comparisons
most_common: 100,000 + 3*17 ≈ 100,051 operations
Speedup: ~17x

Common mistake: Using sorted() when you only need the top few items. most_common(k) is specifically optimized for this pattern.

Expected Output
Top 3 words: [('python', 6), ('data', 4), ('code', 3)]\nUsed heapq internally: O(k + n*log(k))
Hints

Hint 1: `Counter.most_common(k)` uses `heapq.nlargest` internally — it does NOT sort the entire dictionary.

Hint 2: The complexity is O(n) to build the Counter + O(k + m*log(k)) for `most_common(k)` where m is the number of unique items. This beats O(m*log(m)) from sorting all items.

#7Queue vs Stack: Choose the Right EndMedium
dequestackqueueFIFOLIFO

Implement both a FIFO queue and a LIFO stack using deque. Process three tasks through each and show the different ordering. Explain why list.pop(0) is the wrong choice for a queue.

Python
from collections import deque

tasks = ["task_1", "task_2", "task_3"]

# FIFO Queue — first in, first out
queue = deque()
for t in tasks:
    queue.append(t)

queue_order = []
while queue:
    queue_order.append(queue.popleft())    # O(1) — remove from left

# LIFO Stack — last in, first out
stack = deque()
for t in tasks:
    stack.append(t)

stack_order = []
while stack:
    stack_order.append(stack.pop())        # O(1) — remove from right

print(f"Queue (FIFO): {', '.join(queue_order)}")
print(f"Stack (LIFO): {', '.join(stack_order)}")
print(f"Deque is O(1) for both ends")
Solution
from collections import deque

tasks = ["task_1", "task_2", "task_3"]

# FIFO Queue — first in, first out
queue = deque()
for t in tasks:
queue.append(t)

queue_order = []
while queue:
queue_order.append(queue.popleft())

# LIFO Stack — last in, first out
stack = deque()
for t in tasks:
stack.append(t)

stack_order = []
while stack:
stack_order.append(stack.pop())

print(f"Queue (FIFO): {', '.join(queue_order)}")
print(f"Stack (LIFO): {', '.join(stack_order)}")
print(f"Deque is O(1) for both ends")

Why deque beats list for queues:

Operation | list | deque
-----------------+-------------+---------
append (right) | O(1) amort | O(1)
pop (right) | O(1) | O(1)
pop(0) / popleft | O(n) !! | O(1)
insert(0, x) | O(n) !! | O(1)
  • list.pop(0) shifts every remaining element one position left — O(n) per call.
  • Processing n items through a list-based queue: O(n squared) total.
  • deque.popleft() is O(1) — the deque is implemented as a doubly-linked list of fixed-size blocks.
  • Rule: Use deque any time you need to add or remove from both ends.
Expected Output
Queue (FIFO): task_1, task_2, task_3\nStack (LIFO): task_3, task_2, task_1\nDeque is O(1) for both ends
Hints

Hint 1: A queue (FIFO) processes items in the order they arrived: `append()` to add, `popleft()` to remove. A stack (LIFO) processes the most recent item first: `append()` to add, `pop()` to remove.

Hint 2: Using a list for a queue is a performance trap: `list.pop(0)` is O(n) because it shifts all elements. `deque.popleft()` is O(1).

#8Two-Sum with Dict LookupMedium
dicthash-maptwo-sumO(n)-vs-O(n2)

Given a list of numbers and a target sum, find two numbers that add up to the target. Return both the values and their indices. Use a dict for O(n) time instead of the O(n squared) brute-force approach.

Python
def two_sum(nums, target):
    """Find two numbers that sum to target. Return (val1, val2, idx1, idx2) or None."""
    seen = {}    # value -> index
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:       # O(1) dict lookup
            j = seen[complement]
            return (complement, num, j, i)
        seen[num] = i                # Store for future lookups
    return None

# Test
numbers = [11, 2, 15, 7, 3]
target = 9
result = two_sum(numbers, target)

if result:
    val1, val2, idx1, idx2 = result
    print(f"Pair found: ({val1}, {val2}) at indices ({idx1}, {idx2})")
print(f"Time complexity: O(n)")
Solution
def two_sum(nums, target):
"""Find two numbers that sum to target. Return (val1, val2, idx1, idx2) or None."""
seen = {}
for i, num in enumerate(nums):
complement = target - num
if complement in seen:
j = seen[complement]
return (complement, num, j, i)
seen[num] = i
return None

numbers = [11, 2, 15, 7, 3]
target = 9
result = two_sum(numbers, target)

if result:
val1, val2, idx1, idx2 = result
print(f"Pair found: ({val1}, {val2}) at indices ({idx1}, {idx2})")
print(f"Time complexity: O(n)")

Brute-force vs dict approach:

Brute-force:
for i in range(len(nums)):
for j in range(i+1, len(nums)):
if nums[i] + nums[j] == target: ...
Time: O(n^2) — checks every pair
Space: O(1)

Dict approach:
For each num, ask: "Is (target - num) in my dict?"
Time: O(n) — single pass, O(1) lookup each
Space: O(n) — stores up to n entries

This is the classic space-time tradeoff: spend O(n) extra memory to reduce time from O(n squared) to O(n). The dict acts as a lookup table that turns a nested search into a single-pass scan.

Interview note: This is the most-asked coding interview question (LeetCode #1). The interviewer wants to see you reach for a hash map, not nested loops.

Expected Output
Pair found: (2, 7) at indices (1, 3)\nTime complexity: O(n)
Hints

Hint 1: The brute-force approach checks every pair — O(n squared). The key insight: for each number `x`, you need to find if `target - x` exists somewhere in the array.

Hint 2: Build a dict mapping each number to its index as you iterate. For each new number, check if `target - num` is already in the dict. This gives O(1) lookup per element.


Hard

#9LRU Cache from OrderedDictHard
OrderedDictLRU-cachemove_to_endpopitem

Implement an LRU (Least Recently Used) cache with O(1) get and O(1) put using OrderedDict. The cache should have a fixed capacity and evict the least recently used item when full. After the operations below, "b" should be evicted because it was least recently used when "d" was inserted.

Python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._cache = OrderedDict()

    def get(self, key):
        if key not in self._cache:
            return None
        self._cache.move_to_end(key)   # Mark as most recently used
        return self._cache[key]

    def put(self, key, value):
        if key in self._cache:
            self._cache.move_to_end(key)
        self._cache[key] = value
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)   # Evict least recently used

    def items(self):
        return list(self._cache.items())

# Test sequence
cache = LRUCache(capacity=3)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)
# State: a -> b -> c (a is least recent)

cache.get("a")
# State: b -> c -> a (b is now least recent)

cache.put("d", 4)
# Capacity exceeded — evict "b" (least recent)
# State: c -> a -> d

print(f"cache.get('a'): {cache.get('a')}")
print(f"cache.get('b'): {cache.get('b')}")
print(f"cache.get('c'): {cache.get('c')}")
print(f"cache.get('d'): {cache.get('d')}")
print(f"Final state: {cache.items()}")
Solution
from collections import OrderedDict

class LRUCache:
def __init__(self, capacity):
self.capacity = capacity
self._cache = OrderedDict()

def get(self, key):
if key not in self._cache:
return None
self._cache.move_to_end(key)
return self._cache[key]

def put(self, key, value):
if key in self._cache:
self._cache.move_to_end(key)
self._cache[key] = value
if len(self._cache) > self.capacity:
self._cache.popitem(last=False)

def items(self):
return list(self._cache.items())

cache = LRUCache(capacity=3)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)
cache.get("a")
cache.put("d", 4)

print(f"cache.get('a'): {cache.get('a')}")
print(f"cache.get('b'): {cache.get('b')}")
print(f"cache.get('c'): {cache.get('c')}")
print(f"cache.get('d'): {cache.get('d')}")
print(f"Final state: {cache.items()}")

Why OrderedDict and not dict + list:

dict + list approach:
get: dict[key] O(1) + list.remove(key) O(n) + list.append(key) O(1) = O(n)
put: same O(n) — list.remove is the bottleneck

OrderedDict approach:
get: dict lookup O(1) + move_to_end O(1) = O(1)
put: dict insert O(1) + move_to_end O(1) + popitem O(1) = O(1)

How OrderedDict achieves O(1) for move_to_end: Internally, it maintains a doubly-linked list alongside the hash table. Each entry has prev/next pointers. move_to_end unlinks the node and relinks it at the tail — pure pointer manipulation, no scanning.

Production note: In real applications, consider functools.lru_cache for function memoization or cachetools.LRUCache for a production-tested implementation. The manual version above is what interviewers expect you to build.

Expected Output
cache.get('a'): 1\ncache.get('b'): None\ncache.get('c'): 3\ncache.get('d'): 4\nFinal state: [('c', 3), ('a', 1), ('d', 4)]
Hints

Hint 1: `OrderedDict.move_to_end(key)` moves a key to the end (most recent) in O(1). This is the critical operation that makes LRU possible without a separate linked list.

Hint 2: On `put`: if the key exists, update its value and move to end. If inserting a new key exceeds capacity, call `popitem(last=False)` to evict the least recently used (leftmost) item in O(1).

#10Multi-Source Config with ChainMapHard
ChainMaplayered-configpriority-mergelive-updates

Build a three-layer configuration system using ChainMap: request overrides (highest priority), user settings, and global defaults (lowest priority). Demonstrate that changes to underlying dicts are immediately visible without rebuilding the ChainMap.

Python
from collections import ChainMap

# Layer 3 (lowest priority): global defaults
defaults = {
    "theme": "light",
    "language": "en",
    "page_size": 20,
    "notifications": True,
    "timezone": "UTC",
    "currency": "USD",
}

# Layer 2: user settings (from DB)
user_settings = {"theme": "dark", "page_size": 50}

# Layer 1 (highest priority): per-request overrides
request_overrides = {}

# Build the ChainMap — searches left to right
config = ChainMap(request_overrides, user_settings, defaults)

# Test priority resolution
print(f"theme: {config['theme']}")           # dark — user wins over default
print(f"language: {config['language']}")      # en — falls through to default

# Test request override
request_overrides["page_size"] = 100
print(f"page_size (with override): {config['page_size']}")   # 100 — override wins

# Clear overrides
request_overrides.clear()
print(f"page_size (after clear): {config['page_size']}")     # 50 — back to user setting

# Live update — no rebuild needed
user_settings["notifications"] = False
print(f"notifications: {config['notifications']}")            # False — immediate

print(f"Full config keys: {len(config)}")
Solution
from collections import ChainMap

defaults = {
"theme": "light",
"language": "en",
"page_size": 20,
"notifications": True,
"timezone": "UTC",
"currency": "USD",
}

user_settings = {"theme": "dark", "page_size": 50}
request_overrides = {}

config = ChainMap(request_overrides, user_settings, defaults)

print(f"theme: {config['theme']}")
print(f"language: {config['language']}")

request_overrides["page_size"] = 100
print(f"page_size (with override): {config['page_size']}")

request_overrides.clear()
print(f"page_size (after clear): {config['page_size']}")

user_settings["notifications"] = False
print(f"notifications: {config['notifications']}")

print(f"Full config keys: {len(config)}")

Why ChainMap beats dict merging:

Dict merge approach:
config = {**defaults, **user_settings, **request_overrides}
- Creates a NEW dict with all keys copied — O(n) time + O(n) space
- NOT live: changes to user_settings are NOT reflected
- Must rebuild the merged dict every time any layer changes

ChainMap approach:
config = ChainMap(request_overrides, user_settings, defaults)
- Holds REFERENCES to the original dicts — O(1) construction
- LIVE: updating user_settings["theme"] is immediately visible via config
- No rebuild needed — just modify the underlying dict
- Lookup cost: O(k) where k = number of layers (typically 2-4)

When to use ChainMap:

  • Layered configuration (env vars, config file, defaults)
  • Scope chains (local, enclosing, global, builtin — Python's own LEGB rule)
  • Template rendering with nested contexts
  • Any scenario where you need priority-based lookup across multiple dicts that may change independently
Expected Output
theme: dark\nlanguage: en\npage_size (with override): 100\npage_size (after clear): 50\nnotifications: False\nFull config keys: 6
Hints

Hint 1: `ChainMap` takes multiple dicts and searches them in order — first dict with the key wins. It does NOT merge or copy the dicts.

Hint 2: Because ChainMap holds references (not copies), updating the underlying dicts is immediately visible through the ChainMap. This makes it ideal for layered configuration where different layers can change independently.

#11Choose the Right Structure: System DesignHard
heapqdefaultdictsetsystem-designmulti-structure

Design a product analytics tracker that supports four operations efficiently. For each operation, choose the optimal data structure. Implement all four:

  1. Top-K categories — track which product categories are browsed most (use Counter)
  2. Unique products viewed — count distinct products a user has seen (use set)
  3. Recent views — keep the last 3 products viewed (use deque)
  4. Category index — group products by category for recommendations (use defaultdict)
Python
from collections import Counter, defaultdict, deque

class ProductTracker:
    def __init__(self, recent_size=3):
        self.category_counts = Counter()         # Top-K categories
        self.unique_products = set()              # Unique product tracking
        self.recent_views = deque(maxlen=recent_size)  # Sliding recency window
        self.category_index = defaultdict(list)   # Product grouping by category

    def track_view(self, product_id, category):
        """Record a product view — updates all four structures."""
        self.category_counts[category] += 1
        self.unique_products.add(product_id)
        self.recent_views.append(product_id)
        if product_id not in self.category_index[category]:
            self.category_index[category].append(product_id)

    def top_categories(self, k=3):
        return self.category_counts.most_common(k)

    def unique_count(self):
        return len(self.unique_products)

    def get_recent(self):
        return list(self.recent_views)

    def recommend_from_category(self, category):
        return self.category_index.get(category, [])

# Simulate user session
tracker = ProductTracker(recent_size=3)
views = [
    ("prod_1", "electronics"), ("prod_2", "electronics"),
    ("prod_3", "books"), ("prod_4", "clothing"),
    ("prod_5", "electronics"), ("prod_1", "electronics"),  # repeat
    ("prod_6", "books"), ("prod_7", "books"),
    ("prod_8", "electronics"), ("prod_9", "clothing"),
]

for pid, cat in views:
    tracker.track_view(pid, cat)

print(f"Top 3 categories: {tracker.top_categories(3)}")
print(f"Unique products viewed: {tracker.unique_count()}")
print(f"Recent views: {tracker.get_recent()}")
print(f"Recommendations for electronics: {tracker.recommend_from_category('electronics')}")
Solution
from collections import Counter, defaultdict, deque

class ProductTracker:
def __init__(self, recent_size=3):
self.category_counts = Counter()
self.unique_products = set()
self.recent_views = deque(maxlen=recent_size)
self.category_index = defaultdict(list)

def track_view(self, product_id, category):
self.category_counts[category] += 1
self.unique_products.add(product_id)
self.recent_views.append(product_id)
if product_id not in self.category_index[category]:
self.category_index[category].append(product_id)

def top_categories(self, k=3):
return self.category_counts.most_common(k)

def unique_count(self):
return len(self.unique_products)

def get_recent(self):
return list(self.recent_views)

def recommend_from_category(self, category):
return self.category_index.get(category, [])

tracker = ProductTracker(recent_size=3)
views = [
("prod_1", "electronics"), ("prod_2", "electronics"),
("prod_3", "books"), ("prod_4", "clothing"),
("prod_5", "electronics"), ("prod_1", "electronics"),
("prod_6", "books"), ("prod_7", "books"),
("prod_8", "electronics"), ("prod_9", "clothing"),
]

for pid, cat in views:
tracker.track_view(pid, cat)

print(f"Top 3 categories: {tracker.top_categories(3)}")
print(f"Unique products viewed: {tracker.unique_count()}")
print(f"Recent views: {tracker.get_recent()}")
print(f"Recommendations for electronics: {tracker.recommend_from_category('electronics')}")

Data structure selection rationale:

Feature | Structure | Why
-----------------------+--------------------+----------------------------------
Top-K categories | Counter | O(1) increment, O(k + m*log(k)) top-k
Unique product count | set | O(1) add + dedup, O(1) len()
Recent N views | deque(maxlen=N) | O(1) append, auto-evicts oldest
Category → products | defaultdict(list) | O(1) append, no KeyError on new key

The engineering lesson: Real systems rarely use a single data structure. Each sub-problem has its own access pattern, and the right design composes multiple structures — each chosen for its O(1) operation on the critical path. This tracker handles 10 views with 4 constant-time updates each = 40 total O(1) operations, regardless of data size.

What would go wrong with a single list?

  • Top-K: scan entire list and count — O(n squared)
  • Unique count: len(set(all_products)) every time — O(n)
  • Recent N: slice [-3:] — O(1) but only if you keep everything in memory
  • Category lookup: filter entire list — O(n)
Expected Output
Top 3 categories: [('electronics', 5), ('books', 3), ('clothing', 2)]\nUnique products viewed: 7\nRecent views: ['prod_8', 'prod_7', 'prod_6']\nRecommendations for electronics: ['prod_2', 'prod_5', 'prod_8']
Hints

Hint 1: Think about what operation each feature needs: counting uses `Counter`, uniqueness uses `set`, recency uses `deque(maxlen=k)`, grouping uses `defaultdict(list)`.

Hint 2: The key engineering skill is selecting the right structure for each sub-problem and composing them. No single data structure solves everything — the art is in the combination.

© 2026 EngineersOfAI. All rights reserved.