Deque and Collections - Specialized Data Structures for Performance
Reading time: ~20 minutes | Level: Foundation → Engineering
Here is a benchmark that surprises most Python developers.
import time
from collections import deque
n = 200_000
# Front-insert 200k items into a list
lst = []
start = time.time()
for i in range(n):
lst.insert(0, i)
list_time = time.time() - start
# Front-insert 200k items into a deque
dq = deque()
start = time.time()
for i in range(n):
dq.appendleft(i)
deque_time = time.time() - start
print(f"list.insert(0): {list_time:.3f}s")
print(f"deque.appendleft(): {deque_time:.4f}s")
print(f"Speedup: {list_time / deque_time:.0f}x")
Actual output on a modern machine:
list.insert(0): 2.847s
deque.appendleft(): 0.0142s
Speedup: 200x
Two hundred times faster - same logical operation, different data structure.
Understanding why this happens, and knowing when each structure in the collections module applies, is the difference between code that works and code that scales.
What You Will Learn
- How CPython implements
dequeas a doubly-linked list of fixed-size memory blocks - not a plain linked list - Why
dequeguarantees O(1) at both ends whilelist.insert(0)is always O(n) - How
deque(maxlen=k)implements a space-efficient rotating buffer for sliding windows - When to reach for
OrderedDict- and its enduring usefulness for LRU cache patterns - How
namedtupleandtyping.NamedTuplecreate lightweight, typed, immutable records - How
ChainMapimplements Python's own variable-lookup scoping rules - The complete
collectionsmodule overview - what exists and exactly when to use each tool - Production patterns: BFS queues, sliding window max, configuration layering, LRU cache from scratch
Prerequisites
- Python lists and their internal structure (Lesson 01 - Lists Internals)
- Python dictionaries and hash tables (Lesson 06 - Dictionaries Deep Dive)
- Basic Big-O notation (understand O(1) vs O(n))
- Familiarity with Python classes and dataclasses helps for the NamedTuple section
Part 1 - Why Lists Fail at the Front
The O(n) Wall
From Lesson 01, you know that a Python list stores elements as an array of pointers in contiguous memory:
With n elements, that is n pointer copies - O(n). For a list with 200,000 elements, that is 200,000 copies per insert. Over 200,000 insertions, total work is 200,000 × 200,000 / 2 = 20 billion operations - O(n²) total.
This is why the list benchmark above took almost 3 seconds.
Part 2 - Deque Internal Architecture
Not a Simple Linked List
The intuitive fix for O(n) front insertion is a doubly-linked list - each node points to the previous and next node, so insertion at either end is O(1). But naive linked lists have poor cache performance: every node is a separate heap allocation, and traversal jumps across memory.
CPython's deque is smarter. It uses a doubly-linked list of fixed-size blocks, each block holding 64 pointers:
Each block: 64 PyObject* pointers = 64 x 8 = 512 bytes
Key design decisions:
| Property | Value | Benefit |
|---|---|---|
| Block size | 64 pointers | Cache-line friendly; fits in L1/L2 cache |
| Block linkage | Doubly-linked | O(1) block add/remove at either end |
| Append within block | Write to next slot | O(1), no allocation |
| Append across blocks | Allocate new block, link it | O(1), rare |
Random access dq[i] | Walk blocks | O(n) - not O(1) |
O(1) at Both Ends
appendleft and popleft write or read from the left edge of the leftmost block. If that block is exhausted, a new block is allocated and linked in - constant time:
appendleft(99) - space exists in leftblock:
Block 0: [_, _, ptr0, ptr1, ...ptr63]
↑ leftindex
After: [_, ptr99, ptr0, ptr1, ...ptr63]
↑ leftindex moves left
O(1) - no shifting of existing data.
This is fundamentally different from the list model. No shifting ever happens.
:::tip Why Not Just Use a Linked List?
A pure linked list (each node is a separate heap object) has O(1) insertions but terrible cache performance - every .next access is a cache miss. Deque's 64-element blocks keep related data together in cache, giving the best of both worlds: O(1) operations with cache-friendly block layout.
:::
Part 3 - Deque Operations and When to Use Each
Full API with Complexity
from collections import deque
dq = deque([1, 2, 3, 4, 5])
# O(1) operations - use freely
dq.append(6) # Add to right end: [1, 2, 3, 4, 5, 6]
dq.appendleft(0) # Add to left end: [0, 1, 2, 3, 4, 5, 6]
dq.pop() # Remove from right: 6, [0, 1, 2, 3, 4, 5]
dq.popleft() # Remove from left: 0, [1, 2, 3, 4, 5]
len(dq) # Length: 5 (O(1))
# O(n) operations - use sparingly
dq[2] # Index access: 3 (O(n) - walks blocks)
dq.rotate(2) # Rotate right 2 steps: [4, 5, 1, 2, 3]
dq.rotate(-1) # Rotate left 1 step: [5, 1, 2, 3, 4]
dq.extend([7, 8]) # Extend right: [5, 1, 2, 3, 4, 7, 8]
dq.extendleft([9, 10])# Extend left (reversed):[10, 9, 5, 1, 2, 3, 4, 7, 8]
dq.remove(9) # Remove first occurrence of 9 - O(n)
dq.count(1) # Count occurrences - O(n)
# Convert to list when you need indexing
as_list = list(dq) # O(n) copy
:::warning Deque Random Access is O(n)
dq[i] is NOT O(1). It walks the block chain from whichever end is closer. If you need frequent random access, use a list. Deque is optimized for ends only.
:::
Deque as a Queue (FIFO)
from collections import deque
# BFS - canonical use of deque as a FIFO queue
def bfs(graph, start):
visited = set()
queue = deque([start]) # Initialize with start node
visited.add(start)
order = []
while queue:
node = queue.popleft() # O(1) - dequeue from front
order.append(node)
for neighbor in graph[node]:
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor) # O(1) - enqueue at back
return order
graph = {
'A': ['B', 'C'],
'B': ['D', 'E'],
'C': ['F'],
'D': [], 'E': [], 'F': []
}
print(bfs(graph, 'A'))
# Output: ['A', 'B', 'C', 'D', 'E', 'F']
Using list.pop(0) instead of deque.popleft() would make this O(n²) in the total number of nodes - a known performance anti-pattern in graph algorithms.
Deque as a Stack (LIFO)
from collections import deque
# Deque as stack - though list works equally well for pure stacks
stack = deque()
stack.append(10) # push
stack.append(20)
stack.append(30)
print(stack.pop()) # 30
print(stack.pop()) # 20
# For pure LIFO, list is equally fast and more readable:
# stack = []
# stack.append(x) / stack.pop()
# Use deque when you also need popleft (e.g., sliding window with two-pointer)
:::note List vs Deque for Stacks
For pure LIFO stacks, list with append()/pop() is just as fast as deque and more memory-efficient. Reach for deque when you need both ends - queues, sliding windows, or bidirectional traversal.
:::
Part 4 - deque(maxlen): Rotating Buffer and Sliding Window
The Bounded Deque
Pass maxlen=k to create a deque that never exceeds k elements. When a new element is appended at the right and the deque is full, the leftmost element is automatically discarded - and vice versa for appendleft:
from collections import deque
# Last-5-events monitor
recent = deque(maxlen=5)
for i in range(10):
recent.append(i)
print(list(recent))
# Output:
# [0]
# [0, 1]
# [0, 1, 2]
# [0, 1, 2, 3]
# [0, 1, 2, 3, 4]
# [1, 2, 3, 4, 5] ← 0 evicted automatically
# [2, 3, 4, 5, 6]
# [3, 4, 5, 6, 7]
# [4, 5, 6, 7, 8]
# [5, 6, 7, 8, 9]
This is a circular buffer implemented in O(1) per operation, with zero manual bookkeeping.
Real-World: Sliding Window Sum
A common interview and production problem - compute the sum of the last k elements as a stream arrives:
from collections import deque
def sliding_window_sum(stream, k):
"""
Compute sum of last k elements in a data stream.
Time: O(1) per element (amortized)
Space: O(k)
"""
window = deque(maxlen=k)
window_sum = 0
results = []
for value in stream:
if len(window) == k:
window_sum -= window[0] # Subtract the element that will be evicted
window.append(value)
window_sum += value
if len(window) == k:
results.append(window_sum)
return results
sensor_data = [4, 2, 7, 1, 9, 3, 6, 8]
print(sliding_window_sum(sensor_data, k=3))
# Output: [13, 10, 17, 13, 18, 17]
# (4+2+7=13), (2+7+1=10), (7+1+9=17), (1+9+3=13), (9+3+6=18), (3+6+8=17)
Sliding Window Maximum (Classic Algorithm)
The harder version - find the maximum in each window. Naive O(nk) becomes O(n) with a monotonic deque:
from collections import deque
def sliding_window_max(nums, k):
"""
Sliding window maximum using a monotonic deque.
Maintains indices of "useful" elements - decreasing by value.
Time: O(n) - each element pushed/popped at most once.
Space: O(k)
"""
dq = deque() # stores indices, not values
result = []
for i, val in enumerate(nums):
# Remove indices that have left the window
while dq and dq[0] <= i - k:
dq.popleft()
# Remove indices whose values are smaller than current
# (they can never be the max while current element is in window)
while dq and nums[dq[-1]] < val:
dq.pop()
dq.append(i)
# Window is fully formed starting at index k-1
if i >= k - 1:
result.append(nums[dq[0]]) # Front is always the max index
return result
nums = [1, 3, -1, -3, 5, 3, 6, 7]
print(sliding_window_max(nums, k=3))
# Output: [3, 3, 5, 5, 6, 7]
# Trace:
# Window [1,3,-1] → max 3 (dq: [1] - index of 3)
# Window [3,-1,-3] → max 3 (dq: [1])
# Window [-1,-3,5] → max 5 (dq: [4] - index of 5)
# Window [-3,5,3] → max 5 (dq: [4])
# Window [5,3,6] → max 6 (dq: [6])
# Window [3,6,7] → max 7 (dq: [7])
:::tip Why This is O(n) and Not O(nk)
Each index is pushed into the deque exactly once and popped at most once. Over all n elements, total push+pop work is O(n), regardless of window size k. The naive approach of calling max(window) on each step scans k elements per step - O(nk) total.
:::
Part 5 - collections.OrderedDict
History: Why It Existed
Before Python 3.7, regular dict did not guarantee insertion-order. This created a real problem for any code that needed deterministic dict ordering - configuration serialization, protocol encoding, test reproducibility.
collections.OrderedDict was introduced in Python 2.7 to fill that gap. It maintains a doubly-linked list of keys alongside the hash table, guaranteeing insertion order.
Python 3.7+ made insertion-order preservation a language guarantee for regular dict. So does OrderedDict still matter?
Yes - For LRU Cache Pattern
OrderedDict has one critical method that plain dict lacks: move_to_end(key, last=True). This is O(1) and makes implementing an LRU (Least Recently Used) cache trivial:
from collections import OrderedDict
class LRUCache:
"""
O(1) get and put using OrderedDict.
Least recently used item is always at the left (front).
Most recently used item is always at the right (end).
"""
def __init__(self, capacity: int):
self.capacity = capacity
self.cache = OrderedDict()
def get(self, key: int) -> int:
if key not in self.cache:
return -1
# Move to end = mark as most recently used
self.cache.move_to_end(key)
return self.cache[key]
def put(self, key: int, value: int) -> None:
if key in self.cache:
self.cache.move_to_end(key)
self.cache[key] = value
if len(self.cache) > self.capacity:
# Remove least recently used = leftmost item
self.cache.popitem(last=False)
# Demo
lru = LRUCache(capacity=3)
lru.put(1, "one")
lru.put(2, "two")
lru.put(3, "three")
print(lru.get(1)) # "one" - key 1 moved to end (most recent)
lru.put(4, "four") # key 2 evicted (least recently used)
print(lru.get(2)) # -1 - evicted
print(lru.get(3)) # "three"
print(lru.get(4)) # "four"
OrderedDict Equality Semantics
OrderedDict also differs from dict in equality comparison - order matters:
from collections import OrderedDict
d1 = OrderedDict([('a', 1), ('b', 2)])
d2 = OrderedDict([('b', 2), ('a', 1)])
plain = {'a': 1, 'b': 2}
print(d1 == d2) # False - different order
print(d1 == plain) # True - OrderedDict vs dict ignores order
Part 6 - collections.namedtuple and typing.NamedTuple
The Problem namedtuple Solves
Plain tuples are positional - index 0 means nothing by itself:
# Unclear - what does [0] mean?
point = (10.5, 20.3)
distance = point[0] ** 2 + point[1] ** 2
# Worse: easy to swap x and y
def process(point):
lat, lng = point[0], point[1] # Is it (lat, lng) or (lng, lat)?
namedtuple gives tuple semantics (immutable, hashable, memory-efficient) with attribute-style access:
from collections import namedtuple
# Define the named tuple class
Point = namedtuple('Point', ['x', 'y'])
GeoCoord = namedtuple('GeoCoord', ['lat', 'lng', 'altitude'])
# Create instances - positional or keyword
p = Point(x=10.5, y=20.3)
loc = GeoCoord(lat=37.7749, lng=-122.4194, altitude=52)
# Attribute access - clear intent
print(p.x) # 10.5
print(loc.lat) # 37.7749
# Still a tuple - all tuple operations work
print(p[0]) # 10.5
print(len(p)) # 2
x, y = p # Unpacking works
# Immutable
# p.x = 99 # AttributeError - namedtuples are immutable
# Memory: same as tuple - no __dict__ overhead
import sys
print(sys.getsizeof(p)) # 56 bytes - same as (10.5, 20.3)
print(sys.getsizeof({'x': 10.5, 'y': 20.3})) # 232 bytes - dict is much larger
typing.NamedTuple - The Modern Version
typing.NamedTuple uses class syntax, supports type annotations, and allows default values:
from typing import NamedTuple, Optional
from datetime import datetime
class DatabaseConfig(NamedTuple):
host: str
port: int = 5432 # Default value
database: str = "appdb"
ssl: bool = True
timeout: Optional[float] = None
class MLTrainingRun(NamedTuple):
run_id: str
model_name: str
learning_rate: float
epochs: int
started_at: datetime
accuracy: Optional[float] = None # Filled in after training
# Construction
config = DatabaseConfig(host="db.prod.internal", port=5432)
print(config.host) # "db.prod.internal"
print(config.ssl) # True (default)
# Convert to dict for JSON serialization
print(config._asdict())
# OrderedDict([('host', 'db.prod.internal'), ('port', 5432),
# ('database', 'appdb'), ('ssl', True), ('timeout', None)])
# Replace (returns a new instance - immutable)
staging_config = config._replace(host="db.staging.internal", ssl=False)
print(staging_config.host) # "db.staging.internal"
print(config.host) # "db.prod.internal" - original unchanged
# Type-safe-ish usage
run = MLTrainingRun(
run_id="run-001",
model_name="bert-base",
learning_rate=2e-5,
epochs=3,
started_at=datetime.now()
)
print(run.model_name) # "bert-base"
namedtuple vs dataclass vs dict - When to Use Each
# namedtuple / NamedTuple: immutable record, tuple-compatible, memory-efficient
# Use for: return values from functions, configs, data records that won't change
Point = namedtuple('Point', ['x', 'y'])
# dataclass: mutable record, class-like, supports methods and post_init
from dataclasses import dataclass
@dataclass
class MutablePoint:
x: float
y: float
def distance_to_origin(self):
return (self.x**2 + self.y**2) ** 0.5
# dict: dynamic, flexible, but no attribute access, more memory
d = {'x': 1.0, 'y': 2.0}
| Feature | namedtuple | NamedTuple | dataclass | dict |
|---|---|---|---|---|
| Immutable | Yes | Yes | Optional | No |
| Type hints | No | Yes | Yes | No |
| Defaults | Limited | Yes | Yes | N/A |
| Tuple API | Yes | Yes | No | No |
| Memory | Low | Low | Medium | High |
| Hashable | Yes | Yes | Frozen only | No |
Part 7 - collections.ChainMap
The Problem: Layered Configuration
Real applications have layered configuration: command-line args override environment variables, which override config file values, which override defaults. Managing this with manual dict merging creates bugs:
# The naive approach loses priority when same key appears
defaults = {'debug': False, 'timeout': 30, 'retries': 3}
env_vars = {'timeout': 60}
cli_args = {'debug': True}
# Wrong: dict merge loses override semantics if order changes
merged = {**defaults, **env_vars, **cli_args}
print(merged) # {'debug': True, 'timeout': 60, 'retries': 3}
# This works, but updating a layer requires re-merging everything
ChainMap creates a live view over multiple dicts without merging. Lookups walk the chain in order; writes go only to the first map:
from collections import ChainMap
defaults = {'debug': False, 'timeout': 30, 'retries': 3, 'log_level': 'INFO'}
env_vars = {'timeout': 60, 'log_level': 'WARNING'}
cli_args = {'debug': True}
# First map has highest priority
config = ChainMap(cli_args, env_vars, defaults)
# Lookup walks the chain: cli_args → env_vars → defaults
print(config['debug']) # True (from cli_args)
print(config['timeout']) # 60 (from env_vars)
print(config['retries']) # 3 (from defaults)
print(config['log_level']) # 'WARNING' (from env_vars, shadows defaults)
# Write goes to the FIRST map only
config['timeout'] = 90
print(cli_args) # {'debug': True, 'timeout': 90} ← written here
print(env_vars) # {'timeout': 60, ...} ← unchanged
# Add a new layer at highest priority
override = {'retries': 5}
higher_priority = config.new_child(override)
print(higher_priority['retries']) # 5
print(config['retries']) # 3 - original chain unchanged
ChainMap for Python Scope Simulation
Python itself uses ChainMap semantics for variable lookup: local scope → enclosing scope → global scope → builtins. You can implement a simple interpreter scope with it:
from collections import ChainMap
class Scope:
"""Simulate nested variable scopes using ChainMap."""
def __init__(self, parent=None):
self._local = {}
if parent:
self._chain = ChainMap(self._local, parent._chain)
else:
self._chain = ChainMap(self._local)
def set(self, name, value):
self._local[name] = value # Always writes to local scope
def get(self, name):
return self._chain[name] # Walks chain: local → parent → ...
# Usage
global_scope = Scope()
global_scope.set('x', 10)
global_scope.set('PI', 3.14159)
local_scope = Scope(parent=global_scope)
local_scope.set('x', 99) # Shadows global x
print(local_scope.get('x')) # 99 (local)
print(local_scope.get('PI')) # 3.14159 (from global)
print(global_scope.get('x')) # 10 (global unchanged)
:::note ChainMap is a View, Not a Copy
ChainMap does not copy the underlying dicts. Changes to the original dicts are immediately visible through the ChainMap. This is the key difference from {**d1, **d2} which creates a snapshot at merge time.
:::
Part 8 - The Complete collections Module Overview
Python's collections module contains 9 main tools. Here is the full reference with when to use each:
from collections import (
deque, # Doubly-ended queue with O(1) at both ends
OrderedDict, # Dict with move_to_end() - LRU cache, order-sensitive ops
defaultdict, # Dict with automatic default factory - grouping, counting
Counter, # Frequency counting dict - most_common, arithmetic ops
namedtuple, # Factory for lightweight immutable tuple subclasses
ChainMap, # Live view over multiple dicts - config layering, scopes
UserDict, # Subclassable dict wrapper - extend dict safely
UserList, # Subclassable list wrapper - extend list safely
UserString, # Subclassable string wrapper - rarely needed
)
# Also from heapq (not collections but complementary):
import heapq # Min-heap: priority queues, top-k, Dijkstra
When to Reach for Each
| Your Use Case | Reach For |
|---|---|
| FIFO queue, BFS, task scheduling | deque |
| Front insertions in a loop | deque (vs O(n²) list) |
| Sliding window (fixed size) | deque(maxlen=k) |
| Rotating buffer, last-N events | deque(maxlen=k) |
| LRU cache pattern, order-sensitive | OrderedDict |
| Grouping by key (one-to-many) | defaultdict(list) |
| Counting / frequency analysis | Counter |
| Top-k most frequent | Counter.most_common(k) |
| Set operations on frequencies | Counter (+, -, &, |) |
| Immutable record, return type | namedtuple / NamedTuple |
| Typed immutable record with defaults | typing.NamedTuple |
| Layered config (CLI > env > defaults) | ChainMap |
| Simulating scopes / overlays | ChainMap |
| Extend dict with custom behavior | UserDict (not dict subclass) |
| Priority queue, min/max tracking | heapq |
Part 9 - Production Examples
Example 1: Real-Time Log Monitor with Bounded Buffer
from collections import deque
from datetime import datetime
class LogMonitor:
"""
Maintains a rolling window of the last N log entries.
Automatically evicts old entries. O(1) per append.
"""
def __init__(self, max_entries: int = 1000):
self.buffer = deque(maxlen=max_entries)
self.error_count = 0
def record(self, level: str, message: str):
entry = {
'ts': datetime.utcnow().isoformat(),
'level': level,
'msg': message
}
self.buffer.append(entry)
if level == 'ERROR':
self.error_count += 1
def recent_errors(self, n: int = 10):
return [e for e in self.buffer if e['level'] == 'ERROR'][-n:]
def snapshot(self):
return list(self.buffer) # O(n) copy for export
monitor = LogMonitor(max_entries=100)
monitor.record('INFO', 'Server started')
monitor.record('ERROR', 'DB connection timeout')
monitor.record('WARNING', 'High memory usage')
print(len(monitor.buffer)) # 3
print(monitor.recent_errors())
# [{'ts': '...', 'level': 'ERROR', 'msg': 'DB connection timeout'}]
Example 2: Configuration System with ChainMap
from collections import ChainMap
import os
def build_config():
"""
Three-layer config: defaults < environment < explicit overrides.
Live view - env changes reflected immediately.
"""
defaults = {
'host': 'localhost',
'port': 8080,
'workers': 4,
'debug': False,
'log_level': 'INFO',
}
# Read from environment (simulated here)
env = {
k.lower().replace('app_', ''): v
for k, v in os.environ.items()
if k.startswith('APP_')
}
overrides = {} # Populated by CLI parser in real app
return ChainMap(overrides, env, defaults)
config = build_config()
print(config['port']) # 8080 (from defaults, unless APP_PORT set in env)
print(config['workers']) # 4
# Temporarily override for a test without mutating env or defaults
test_config = config.new_child({'debug': True, 'workers': 1})
print(test_config['debug']) # True
print(config['debug']) # False - original unchanged
Example 3: Flight Booking Record with NamedTuple
from typing import NamedTuple, Optional
from datetime import datetime
class Flight(NamedTuple):
flight_id: str
origin: str
destination: str
departure: datetime
seats_available: int
price_usd: float
airline: str
duration_minutes: int
operated_by: Optional[str] = None # Code-share partner
# Immutable, memory-efficient, hashable (usable in sets/dict keys)
f = Flight(
flight_id="AA123",
origin="SFO",
destination="JFK",
departure=datetime(2026, 6, 15, 10, 30),
seats_available=42,
price_usd=349.00,
airline="American Airlines",
duration_minutes=320
)
print(f.flight_id) # "AA123"
print(f.origin) # "SFO"
# Use in a set (hashable because it's a tuple)
seen_flights = {f}
print(f in seen_flights) # True
# Serialize easily
print(f._asdict())
Interview Questions
Q1: What is the internal structure of collections.deque, and why is appendleft() O(1)?
Answer: CPython's deque is implemented as a doubly-linked list of fixed-size blocks, where each block holds 64 pointers to Python objects. Unlike a plain list (which is a single contiguous array), deque maintains a leftblock and rightblock pointer. appendleft() writes to the current leftindex of the leftblock. When the leftblock is full, a new block is allocated and linked in - a constant-time operation. No existing data is shifted. This contrasts with list.insert(0, x), which must copy all n existing pointers one position right before writing - O(n).
Q2: When would you still use OrderedDict in Python 3.9+, given that plain dict preserves insertion order?
Answer: For two reasons: (1) OrderedDict.move_to_end(key, last=True/False) moves a key to either end in O(1) - this is the core primitive for implementing LRU caches efficiently. Plain dict has no equivalent method. (2) OrderedDict.__eq__ is order-sensitive: two OrderedDicts are equal only if both the keys/values AND their order match. Plain dict.__eq__ is order-independent. Use OrderedDict when you implement LRU cache, when equality must be order-sensitive, or when you need explicit move_to_end control.
Q3: What is the difference between namedtuple and typing.NamedTuple, and when should you use each vs a dataclass?
Answer: Both produce immutable, tuple-based classes with named attribute access. namedtuple uses factory-function syntax; typing.NamedTuple uses class syntax with PEP 526 type annotations and supports default values per field. The key difference from dataclass: namedtuples ARE tuples (tuple unpacking, indexing, and hashing work); dataclasses are not tuples. Use namedtuple/NamedTuple when you need immutability, hashability, tuple compatibility (e.g., dict keys, set membership), or minimal memory footprint (no __dict__). Use dataclass when you need mutable fields, custom methods, or __post_init__ validation.
Q4: Explain how ChainMap differs from {**d1, **d2} merging. When does that difference matter?
Answer: {**d1, **d2} creates a copy - a new dict with all key-value pairs at merge time. Future changes to d1 or d2 are not reflected in the merged dict. ChainMap(d1, d2) creates a live view - lookups walk d1 then d2 at access time; no copy is made. Changes to d1 or d2 are immediately visible. ChainMap also makes it easy to add/remove layers with new_child() and parents. The difference matters for configuration systems where environment variables or command-line args can change at runtime, and for implementing scope chains (variables, templates, shells) where layers need to be added/removed independently.
Q5: How would you implement a sliding window maximum in O(n) time using deque?
Answer: Use a deque to maintain a monotonic decreasing queue of indices. For each new element: (1) Remove from the front any indices that have left the current window (index ≤ i - k). (2) Remove from the back any indices whose values are less than the current value - they can never be the window maximum while the current element is in scope. (3) Append the current index. The front of the deque is always the index of the current window's maximum. Each index is pushed and popped at most once, so total work is O(n). The naive approach of calling max() on each window slice is O(nk).
Q6: What happens when you call appendleft() on a deque(maxlen=5) that is already full?
Answer: The rightmost element is automatically discarded to make room. For a full deque [1, 2, 3, 4, 5] (left to right), appendleft(0) results in [0, 1, 2, 3, 4] - 5 is dropped from the right end. This is symmetric: append() on a full deque discards the leftmost element. This behavior is what makes deque(maxlen=k) a natural circular buffer - elements rotate out automatically with zero manual management. The operation is O(1) regardless of maxlen.
Practice Challenges
Level 1 - Predict the Output
What does this print?
from collections import deque
dq = deque([1, 2, 3, 4, 5], maxlen=5)
dq.append(6)
dq.appendleft(0)
print(list(dq))
print(len(dq))
Solution
Output:
[0, 2, 3, 4, 5]
5
After dq.append(6): [2, 3, 4, 5, 6] - 1 is evicted from the left (maxlen=5 is full, right append evicts leftmost).
After dq.appendleft(0): [0, 2, 3, 4, 5] - 6 is evicted from the right (left append evicts rightmost).
len(dq) is 5 - maxlen constrains the size.
Level 2 - Implement Last-N Unique URLs
Build a data structure that:
- Tracks the last N distinct URLs visited by a user
- When a URL is visited again, it moves to the most recent position (no duplicates)
- Reports the last-visited N URLs in order from oldest to newest recent
# Example:
tracker = RecentURLs(maxlen=3)
tracker.visit("google.com")
tracker.visit("github.com")
tracker.visit("google.com") # already seen - should move to most recent
tracker.visit("python.org")
print(tracker.recent())
# Expected: ["github.com", "google.com", "python.org"]
# google.com moved to end when revisited; github.com evicted when python.org added
Solution
from collections import deque, OrderedDict
class RecentURLs:
"""
Tracks last N distinct URLs.
Visit is O(1) amortized.
Uses OrderedDict for O(1) move_to_end.
"""
def __init__(self, maxlen: int):
self.maxlen = maxlen
self._seen = OrderedDict() # key=url, val=None; ordered by recency
def visit(self, url: str):
if url in self._seen:
self._seen.move_to_end(url) # O(1) - move to most recent
else:
self._seen[url] = None
if len(self._seen) > self.maxlen:
self._seen.popitem(last=False) # Evict least recently visited
def recent(self):
return list(self._seen.keys()) # Oldest to most recent
tracker = RecentURLs(maxlen=3)
tracker.visit("google.com")
tracker.visit("github.com")
tracker.visit("google.com")
tracker.visit("python.org")
print(tracker.recent())
# ["github.com", "google.com", "python.org"]
Why OrderedDict over deque here: We need O(1) lookup (to detect revisits) AND O(1) move-to-end. A deque doesn't support O(1) lookup; a plain dict doesn't support move_to_end. OrderedDict provides both.
Level 3 - Implement a Task Scheduler with Priority and FIFO Fallback
Design a task scheduler that:
- Processes tasks in priority order (lower number = higher priority)
- Among tasks of equal priority, processes them FIFO (first added, first processed)
- Supports
add_task(name, priority)andnext_task()operations next_task()must return tasks in the correct order
scheduler = TaskScheduler()
scheduler.add_task("Deploy infra", priority=1)
scheduler.add_task("Run tests", priority=2)
scheduler.add_task("Send report", priority=2)
scheduler.add_task("Hotfix prod", priority=1)
while scheduler.has_tasks():
print(scheduler.next_task())
# Expected output:
# Deploy infra (priority 1, added first)
# Hotfix prod (priority 1, added second)
# Run tests (priority 2, added first)
# Send report (priority 2, added second)
Solution
import heapq
from collections import defaultdict
from typing import Optional
class TaskScheduler:
"""
Priority queue with FIFO tiebreaking.
Uses heapq for O(log n) push/pop with priority ordering.
Sequence counter breaks ties - lower counter = added earlier.
Structure: heap of (priority, sequence, task_name)
"""
def __init__(self):
self._heap = [] # min-heap: (priority, seq, name)
self._counter = 0 # Monotonically increasing for FIFO
def add_task(self, name: str, priority: int):
heapq.heappush(self._heap, (priority, self._counter, name))
self._counter += 1 # O(log n) push
def next_task(self) -> Optional[str]:
if not self._heap:
return None
priority, seq, name = heapq.heappop(self._heap) # O(log n)
return name
def has_tasks(self) -> bool:
return bool(self._heap)
scheduler = TaskScheduler()
scheduler.add_task("Deploy infra", priority=1)
scheduler.add_task("Run tests", priority=2)
scheduler.add_task("Send report", priority=2)
scheduler.add_task("Hotfix prod", priority=1)
while scheduler.has_tasks():
print(scheduler.next_task())
# Output:
# Deploy infra
# Hotfix prod
# Run tests
# Send report
Why this works: The heap sorts by (priority, sequence, name). Tasks with priority 1 come before priority 2. Among tasks with the same priority, the lower sequence (earlier insertion) comes first - giving FIFO semantics within each priority level.
Alternative with defaultdict(deque): Maintain defaultdict(deque) keyed by priority + a sorted(priorities) sorted set. Gets complex; heap approach is simpler and has better complexity.
Quick Reference
| Tool | Import | Key Operations | When to Use |
|---|---|---|---|
deque | from collections import deque | appendleft, popleft, append, pop, rotate | BFS queues, sliding windows, front insertions |
deque(maxlen=k) | same | auto-evicts on full | Rotating buffers, last-N events, sliding window |
OrderedDict | from collections import OrderedDict | move_to_end, popitem(last=False/True) | LRU cache, order-sensitive equality |
namedtuple | from collections import namedtuple | ._asdict(), ._replace(), ._fields | Immutable records, return types |
NamedTuple | from typing import NamedTuple | Same + type hints + defaults | Typed immutable records |
defaultdict | from collections import defaultdict | auto-initializes missing keys | Grouping, counting, adjacency lists |
Counter | from collections import Counter | .most_common(n), +, -, &, ` | ` |
ChainMap | from collections import ChainMap | .new_child(), .parents | Config layering, scope chains |
heapq | import heapq | heappush, heappop, nlargest, nsmallest | Priority queues, top-k |
Key Takeaways
dequeis a doubly-linked list of 64-element blocks - O(1) at both ends, O(n) for random access; use it any time you need front insertions or a FIFO queuelist.insert(0, x)is O(n) and causes O(n²) behavior in loops - always preferdeque.appendleft()for front-insertion workloadsdeque(maxlen=k)is a self-managing circular buffer - the most concise way to implement sliding windows and last-N monitorsOrderedDict.move_to_end()is the key primitive for O(1) LRU cache; plaindictin Python 3.7+ preserves order but lacks this methodtyping.NamedTupleproduces immutable, hashable, memory-efficient records with full type annotations - prefer over bare tuples for any structured dataChainMapcreates a live layered view over multiple dicts - the natural tool for configuration systems where CLI > env > defaults- Every tool in the
collectionsmodule solves a specific, common performance problem; knowing which tool to reach for, and why, is what separates engineering from scripting
