Deque and Collections - Specialized Data Structures for Performance

Reading time: ~20 minutes | Level: Foundation → Engineering

Here is a benchmark that surprises most Python developers.

import time
from collections import deque

n = 200_000

# Front-insert 200k items into a list
lst = []
start = time.time()
for i in range(n):
    lst.insert(0, i)
list_time = time.time() - start

# Front-insert 200k items into a deque
dq = deque()
start = time.time()
for i in range(n):
    dq.appendleft(i)
deque_time = time.time() - start

print(f"list.insert(0):    {list_time:.3f}s")
print(f"deque.appendleft(): {deque_time:.4f}s")
print(f"Speedup: {list_time / deque_time:.0f}x")

Actual output on a modern machine:

list.insert(0):    2.847s
deque.appendleft(): 0.0142s
Speedup: 200x

Two hundred times faster - same logical operation, different data structure.

Understanding why this happens, and knowing when each structure in the collections module applies, is the difference between code that works and code that scales.

What You Will Learn

How CPython implements deque as a doubly-linked list of fixed-size memory blocks - not a plain linked list
Why deque guarantees O(1) at both ends while list.insert(0) is always O(n)
How deque(maxlen=k) implements a space-efficient rotating buffer for sliding windows
When to reach for OrderedDict - and its enduring usefulness for LRU cache patterns
How namedtuple and typing.NamedTuple create lightweight, typed, immutable records
How ChainMap implements Python's own variable-lookup scoping rules
The complete collections module overview - what exists and exactly when to use each tool
Production patterns: BFS queues, sliding window max, configuration layering, LRU cache from scratch

Prerequisites

Python lists and their internal structure (Lesson 01 - Lists Internals)
Python dictionaries and hash tables (Lesson 06 - Dictionaries Deep Dive)
Basic Big-O notation (understand O(1) vs O(n))
Familiarity with Python classes and dataclasses helps for the NamedTuple section

Part 1 - Why Lists Fail at the Front

The O(n) Wall

From Lesson 01, you know that a Python list stores elements as an array of pointers in contiguous memory:

With n elements, that is n pointer copies - O(n). For a list with 200,000 elements, that is 200,000 copies per insert. Over 200,000 insertions, total work is 200,000 × 200,000 / 2 = 20 billion operations - O(n²) total.

This is why the list benchmark above took almost 3 seconds.

Part 2 - Deque Internal Architecture

Not a Simple Linked List

The intuitive fix for O(n) front insertion is a doubly-linked list - each node points to the previous and next node, so insertion at either end is O(1). But naive linked lists have poor cache performance: every node is a separate heap allocation, and traversal jumps across memory.

CPython's deque is smarter. It uses a doubly-linked list of fixed-size blocks, each block holding 64 pointers:

Each block: 64 PyObject* pointers = 64 x 8 = 512 bytes

Key design decisions:

Property	Value	Benefit
Block size	64 pointers	Cache-line friendly; fits in L1/L2 cache
Block linkage	Doubly-linked	O(1) block add/remove at either end
Append within block	Write to next slot	O(1), no allocation
Append across blocks	Allocate new block, link it	O(1), rare
Random access `dq[i]`	Walk blocks	O(n) - not O(1)

O(1) at Both Ends

appendleft and popleft write or read from the left edge of the leftmost block. If that block is exhausted, a new block is allocated and linked in - constant time:

appendleft(99) - space exists in leftblock:
Block 0: [_, _, ptr0, ptr1, ...ptr63]
                ↑ leftindex
After:   [_, ptr99, ptr0, ptr1, ...ptr63]
                    ↑ leftindex moves left

O(1) - no shifting of existing data.

This is fundamentally different from the list model. No shifting ever happens.

:::tip Why Not Just Use a Linked List? A pure linked list (each node is a separate heap object) has O(1) insertions but terrible cache performance - every .next access is a cache miss. Deque's 64-element blocks keep related data together in cache, giving the best of both worlds: O(1) operations with cache-friendly block layout. :::

Part 3 - Deque Operations and When to Use Each

Full API with Complexity

from collections import deque

dq = deque([1, 2, 3, 4, 5])

# O(1) operations - use freely
dq.append(6)          # Add to right end:      [1, 2, 3, 4, 5, 6]
dq.appendleft(0)      # Add to left end:       [0, 1, 2, 3, 4, 5, 6]
dq.pop()              # Remove from right: 6,  [0, 1, 2, 3, 4, 5]
dq.popleft()          # Remove from left:  0,  [1, 2, 3, 4, 5]
len(dq)               # Length:                 5   (O(1))

# O(n) operations - use sparingly
dq[2]                 # Index access:           3   (O(n) - walks blocks)
dq.rotate(2)          # Rotate right 2 steps:  [4, 5, 1, 2, 3]
dq.rotate(-1)         # Rotate left 1 step:    [5, 1, 2, 3, 4]
dq.extend([7, 8])     # Extend right:          [5, 1, 2, 3, 4, 7, 8]
dq.extendleft([9, 10])# Extend left (reversed):[10, 9, 5, 1, 2, 3, 4, 7, 8]
dq.remove(9)          # Remove first occurrence of 9 - O(n)
dq.count(1)           # Count occurrences - O(n)

# Convert to list when you need indexing
as_list = list(dq)    # O(n) copy

:::warning Deque Random Access is O(n) dq[i] is NOT O(1). It walks the block chain from whichever end is closer. If you need frequent random access, use a list. Deque is optimized for ends only. :::

Deque as a Queue (FIFO)

from collections import deque

# BFS - canonical use of deque as a FIFO queue
def bfs(graph, start):
    visited = set()
    queue = deque([start])   # Initialize with start node
    visited.add(start)
    order = []

    while queue:
        node = queue.popleft()   # O(1) - dequeue from front
        order.append(node)

        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)  # O(1) - enqueue at back

    return order

graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F'],
    'D': [], 'E': [], 'F': []
}

print(bfs(graph, 'A'))
# Output: ['A', 'B', 'C', 'D', 'E', 'F']

Using list.pop(0) instead of deque.popleft() would make this O(n²) in the total number of nodes - a known performance anti-pattern in graph algorithms.

Deque as a Stack (LIFO)

from collections import deque

# Deque as stack - though list works equally well for pure stacks
stack = deque()
stack.append(10)     # push
stack.append(20)
stack.append(30)
print(stack.pop())   # 30
print(stack.pop())   # 20

# For pure LIFO, list is equally fast and more readable:
# stack = []
# stack.append(x) / stack.pop()
# Use deque when you also need popleft (e.g., sliding window with two-pointer)

:::note List vs Deque for Stacks For pure LIFO stacks, list with append()/pop() is just as fast as deque and more memory-efficient. Reach for deque when you need both ends - queues, sliding windows, or bidirectional traversal. :::

Part 4 - deque(maxlen): Rotating Buffer and Sliding Window

The Bounded Deque

Pass maxlen=k to create a deque that never exceeds k elements. When a new element is appended at the right and the deque is full, the leftmost element is automatically discarded - and vice versa for appendleft:

from collections import deque

# Last-5-events monitor
recent = deque(maxlen=5)

for i in range(10):
    recent.append(i)
    print(list(recent))

# Output:
# [0]
# [0, 1]
# [0, 1, 2]
# [0, 1, 2, 3]
# [0, 1, 2, 3, 4]
# [1, 2, 3, 4, 5]   ← 0 evicted automatically
# [2, 3, 4, 5, 6]
# [3, 4, 5, 6, 7]
# [4, 5, 6, 7, 8]
# [5, 6, 7, 8, 9]

This is a circular buffer implemented in O(1) per operation, with zero manual bookkeeping.

Real-World: Sliding Window Sum

A common interview and production problem - compute the sum of the last k elements as a stream arrives:

from collections import deque

def sliding_window_sum(stream, k):
    """
    Compute sum of last k elements in a data stream.
    Time: O(1) per element (amortized)
    Space: O(k)
    """
    window = deque(maxlen=k)
    window_sum = 0
    results = []

    for value in stream:
        if len(window) == k:
            window_sum -= window[0]   # Subtract the element that will be evicted
        window.append(value)
        window_sum += value
        if len(window) == k:
            results.append(window_sum)

    return results

sensor_data = [4, 2, 7, 1, 9, 3, 6, 8]
print(sliding_window_sum(sensor_data, k=3))
# Output: [13, 10, 17, 13, 18, 17]
# (4+2+7=13), (2+7+1=10), (7+1+9=17), (1+9+3=13), (9+3+6=18), (3+6+8=17)

Sliding Window Maximum (Classic Algorithm)

The harder version - find the maximum in each window. Naive O(nk) becomes O(n) with a monotonic deque:

from collections import deque

def sliding_window_max(nums, k):
    """
    Sliding window maximum using a monotonic deque.
    Maintains indices of "useful" elements - decreasing by value.
    Time: O(n) - each element pushed/popped at most once.
    Space: O(k)
    """
    dq = deque()   # stores indices, not values
    result = []

    for i, val in enumerate(nums):
        # Remove indices that have left the window
        while dq and dq[0] <= i - k:
            dq.popleft()

        # Remove indices whose values are smaller than current
        # (they can never be the max while current element is in window)
        while dq and nums[dq[-1]] < val:
            dq.pop()

        dq.append(i)

        # Window is fully formed starting at index k-1
        if i >= k - 1:
            result.append(nums[dq[0]])  # Front is always the max index

    return result

nums = [1, 3, -1, -3, 5, 3, 6, 7]
print(sliding_window_max(nums, k=3))
# Output: [3, 3, 5, 5, 6, 7]

# Trace:
# Window [1,3,-1]  → max 3  (dq: [1] - index of 3)
# Window [3,-1,-3] → max 3  (dq: [1])
# Window [-1,-3,5] → max 5  (dq: [4] - index of 5)
# Window [-3,5,3]  → max 5  (dq: [4])
# Window [5,3,6]   → max 6  (dq: [6])
# Window [3,6,7]   → max 7  (dq: [7])

:::tip Why This is O(n) and Not O(nk) Each index is pushed into the deque exactly once and popped at most once. Over all n elements, total push+pop work is O(n), regardless of window size k. The naive approach of calling max(window) on each step scans k elements per step - O(nk) total. :::

Part 5 - collections.OrderedDict

History: Why It Existed

Before Python 3.7, regular dict did not guarantee insertion-order. This created a real problem for any code that needed deterministic dict ordering - configuration serialization, protocol encoding, test reproducibility.

collections.OrderedDict was introduced in Python 2.7 to fill that gap. It maintains a doubly-linked list of keys alongside the hash table, guaranteeing insertion order.

Python 3.7+ made insertion-order preservation a language guarantee for regular dict. So does OrderedDict still matter?

Yes - For LRU Cache Pattern

OrderedDict has one critical method that plain dict lacks: move_to_end(key, last=True). This is O(1) and makes implementing an LRU (Least Recently Used) cache trivial:

from collections import OrderedDict

class LRUCache:
    """
    O(1) get and put using OrderedDict.
    Least recently used item is always at the left (front).
    Most recently used item is always at the right (end).
    """
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        # Move to end = mark as most recently used
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # Remove least recently used = leftmost item
            self.cache.popitem(last=False)

# Demo
lru = LRUCache(capacity=3)
lru.put(1, "one")
lru.put(2, "two")
lru.put(3, "three")
print(lru.get(1))     # "one" - key 1 moved to end (most recent)
lru.put(4, "four")    # key 2 evicted (least recently used)
print(lru.get(2))     # -1 - evicted
print(lru.get(3))     # "three"
print(lru.get(4))     # "four"

OrderedDict Equality Semantics

OrderedDict also differs from dict in equality comparison - order matters:

from collections import OrderedDict

d1 = OrderedDict([('a', 1), ('b', 2)])
d2 = OrderedDict([('b', 2), ('a', 1)])
plain = {'a': 1, 'b': 2}

print(d1 == d2)      # False - different order
print(d1 == plain)   # True - OrderedDict vs dict ignores order

Part 6 - collections.namedtuple and typing.NamedTuple

The Problem namedtuple Solves

Plain tuples are positional - index 0 means nothing by itself:

# Unclear - what does [0] mean?
point = (10.5, 20.3)
distance = point[0] ** 2 + point[1] ** 2

# Worse: easy to swap x and y
def process(point):
    lat, lng = point[0], point[1]   # Is it (lat, lng) or (lng, lat)?

namedtuple gives tuple semantics (immutable, hashable, memory-efficient) with attribute-style access:

from collections import namedtuple

# Define the named tuple class
Point = namedtuple('Point', ['x', 'y'])
GeoCoord = namedtuple('GeoCoord', ['lat', 'lng', 'altitude'])

# Create instances - positional or keyword
p = Point(x=10.5, y=20.3)
loc = GeoCoord(lat=37.7749, lng=-122.4194, altitude=52)

# Attribute access - clear intent
print(p.x)           # 10.5
print(loc.lat)       # 37.7749

# Still a tuple - all tuple operations work
print(p[0])          # 10.5
print(len(p))        # 2
x, y = p             # Unpacking works

# Immutable
# p.x = 99           # AttributeError - namedtuples are immutable

# Memory: same as tuple - no __dict__ overhead
import sys
print(sys.getsizeof(p))   # 56 bytes - same as (10.5, 20.3)
print(sys.getsizeof({'x': 10.5, 'y': 20.3}))  # 232 bytes - dict is much larger

typing.NamedTuple - The Modern Version

typing.NamedTuple uses class syntax, supports type annotations, and allows default values:

from typing import NamedTuple, Optional
from datetime import datetime

class DatabaseConfig(NamedTuple):
    host: str
    port: int = 5432                    # Default value
    database: str = "appdb"
    ssl: bool = True
    timeout: Optional[float] = None

class MLTrainingRun(NamedTuple):
    run_id: str
    model_name: str
    learning_rate: float
    epochs: int
    started_at: datetime
    accuracy: Optional[float] = None   # Filled in after training

# Construction
config = DatabaseConfig(host="db.prod.internal", port=5432)
print(config.host)       # "db.prod.internal"
print(config.ssl)        # True (default)

# Convert to dict for JSON serialization
print(config._asdict())
# OrderedDict([('host', 'db.prod.internal'), ('port', 5432),
#              ('database', 'appdb'), ('ssl', True), ('timeout', None)])

# Replace (returns a new instance - immutable)
staging_config = config._replace(host="db.staging.internal", ssl=False)
print(staging_config.host)  # "db.staging.internal"
print(config.host)          # "db.prod.internal" - original unchanged

# Type-safe-ish usage
run = MLTrainingRun(
    run_id="run-001",
    model_name="bert-base",
    learning_rate=2e-5,
    epochs=3,
    started_at=datetime.now()
)
print(run.model_name)    # "bert-base"

namedtuple vs dataclass vs dict - When to Use Each

# namedtuple / NamedTuple: immutable record, tuple-compatible, memory-efficient
# Use for: return values from functions, configs, data records that won't change
Point = namedtuple('Point', ['x', 'y'])

# dataclass: mutable record, class-like, supports methods and post_init
from dataclasses import dataclass
@dataclass
class MutablePoint:
    x: float
    y: float

    def distance_to_origin(self):
        return (self.x**2 + self.y**2) ** 0.5

# dict: dynamic, flexible, but no attribute access, more memory
d = {'x': 1.0, 'y': 2.0}

Feature	namedtuple	NamedTuple	dataclass	dict
Immutable	Yes	Yes	Optional	No
Type hints	No	Yes	Yes	No
Defaults	Limited	Yes	Yes	N/A
Tuple API	Yes	Yes	No	No
Memory	Low	Low	Medium	High
Hashable	Yes	Yes	Frozen only	No

Part 7 - collections.ChainMap

The Problem: Layered Configuration

Real applications have layered configuration: command-line args override environment variables, which override config file values, which override defaults. Managing this with manual dict merging creates bugs:

# The naive approach loses priority when same key appears
defaults = {'debug': False, 'timeout': 30, 'retries': 3}
env_vars = {'timeout': 60}
cli_args = {'debug': True}

# Wrong: dict merge loses override semantics if order changes
merged = {**defaults, **env_vars, **cli_args}
print(merged)  # {'debug': True, 'timeout': 60, 'retries': 3}
# This works, but updating a layer requires re-merging everything

ChainMap creates a live view over multiple dicts without merging. Lookups walk the chain in order; writes go only to the first map:

from collections import ChainMap

defaults = {'debug': False, 'timeout': 30, 'retries': 3, 'log_level': 'INFO'}
env_vars = {'timeout': 60, 'log_level': 'WARNING'}
cli_args = {'debug': True}

# First map has highest priority
config = ChainMap(cli_args, env_vars, defaults)

# Lookup walks the chain: cli_args → env_vars → defaults
print(config['debug'])      # True     (from cli_args)
print(config['timeout'])    # 60       (from env_vars)
print(config['retries'])    # 3        (from defaults)
print(config['log_level'])  # 'WARNING' (from env_vars, shadows defaults)

# Write goes to the FIRST map only
config['timeout'] = 90
print(cli_args)    # {'debug': True, 'timeout': 90}  ← written here
print(env_vars)    # {'timeout': 60, ...}              ← unchanged

# Add a new layer at highest priority
override = {'retries': 5}
higher_priority = config.new_child(override)
print(higher_priority['retries'])  # 5
print(config['retries'])           # 3  - original chain unchanged

ChainMap for Python Scope Simulation

Python itself uses ChainMap semantics for variable lookup: local scope → enclosing scope → global scope → builtins. You can implement a simple interpreter scope with it:

from collections import ChainMap

class Scope:
    """Simulate nested variable scopes using ChainMap."""
    def __init__(self, parent=None):
        self._local = {}
        if parent:
            self._chain = ChainMap(self._local, parent._chain)
        else:
            self._chain = ChainMap(self._local)

    def set(self, name, value):
        self._local[name] = value   # Always writes to local scope

    def get(self, name):
        return self._chain[name]    # Walks chain: local → parent → ...

# Usage
global_scope = Scope()
global_scope.set('x', 10)
global_scope.set('PI', 3.14159)

local_scope = Scope(parent=global_scope)
local_scope.set('x', 99)      # Shadows global x

print(local_scope.get('x'))   # 99   (local)
print(local_scope.get('PI'))  # 3.14159 (from global)
print(global_scope.get('x'))  # 10   (global unchanged)

:::note ChainMap is a View, Not a Copy ChainMap does not copy the underlying dicts. Changes to the original dicts are immediately visible through the ChainMap. This is the key difference from {**d1, **d2} which creates a snapshot at merge time. :::

Part 8 - The Complete collections Module Overview

Python's collections module contains 9 main tools. Here is the full reference with when to use each:

from collections import (
    deque,          # Doubly-ended queue with O(1) at both ends
    OrderedDict,    # Dict with move_to_end() - LRU cache, order-sensitive ops
    defaultdict,    # Dict with automatic default factory - grouping, counting
    Counter,        # Frequency counting dict - most_common, arithmetic ops
    namedtuple,     # Factory for lightweight immutable tuple subclasses
    ChainMap,       # Live view over multiple dicts - config layering, scopes
    UserDict,       # Subclassable dict wrapper - extend dict safely
    UserList,       # Subclassable list wrapper - extend list safely
    UserString,     # Subclassable string wrapper - rarely needed
)
# Also from heapq (not collections but complementary):
import heapq       # Min-heap: priority queues, top-k, Dijkstra

When to Reach for Each

Your Use Case	Reach For
FIFO queue, BFS, task scheduling	`deque`
Front insertions in a loop	`deque` (vs O(n²) list)
Sliding window (fixed size)	`deque(maxlen=k)`
Rotating buffer, last-N events	`deque(maxlen=k)`
LRU cache pattern, order-sensitive	`OrderedDict`
Grouping by key (one-to-many)	`defaultdict(list)`
Counting / frequency analysis	`Counter`
Top-k most frequent	`Counter.most_common(k)`
Set operations on frequencies	`Counter` (+, -, &, \|)
Immutable record, return type	`namedtuple` / `NamedTuple`
Typed immutable record with defaults	`typing.NamedTuple`
Layered config (CLI > env > defaults)	`ChainMap`
Simulating scopes / overlays	`ChainMap`
Extend dict with custom behavior	`UserDict` (not dict subclass)
Priority queue, min/max tracking	`heapq`

Part 9 - Production Examples

Example 1: Real-Time Log Monitor with Bounded Buffer

from collections import deque
from datetime import datetime

class LogMonitor:
    """
    Maintains a rolling window of the last N log entries.
    Automatically evicts old entries. O(1) per append.
    """
    def __init__(self, max_entries: int = 1000):
        self.buffer = deque(maxlen=max_entries)
        self.error_count = 0

    def record(self, level: str, message: str):
        entry = {
            'ts': datetime.utcnow().isoformat(),
            'level': level,
            'msg': message
        }
        self.buffer.append(entry)
        if level == 'ERROR':
            self.error_count += 1

    def recent_errors(self, n: int = 10):
        return [e for e in self.buffer if e['level'] == 'ERROR'][-n:]

    def snapshot(self):
        return list(self.buffer)  # O(n) copy for export

monitor = LogMonitor(max_entries=100)
monitor.record('INFO', 'Server started')
monitor.record('ERROR', 'DB connection timeout')
monitor.record('WARNING', 'High memory usage')
print(len(monitor.buffer))   # 3
print(monitor.recent_errors())
# [{'ts': '...', 'level': 'ERROR', 'msg': 'DB connection timeout'}]

Example 2: Configuration System with ChainMap

from collections import ChainMap
import os

def build_config():
    """
    Three-layer config: defaults < environment < explicit overrides.
    Live view - env changes reflected immediately.
    """
    defaults = {
        'host': 'localhost',
        'port': 8080,
        'workers': 4,
        'debug': False,
        'log_level': 'INFO',
    }

    # Read from environment (simulated here)
    env = {
        k.lower().replace('app_', ''): v
        for k, v in os.environ.items()
        if k.startswith('APP_')
    }

    overrides = {}  # Populated by CLI parser in real app

    return ChainMap(overrides, env, defaults)

config = build_config()
print(config['port'])      # 8080 (from defaults, unless APP_PORT set in env)
print(config['workers'])   # 4

# Temporarily override for a test without mutating env or defaults
test_config = config.new_child({'debug': True, 'workers': 1})
print(test_config['debug'])    # True
print(config['debug'])         # False - original unchanged

Example 3: Flight Booking Record with NamedTuple

from typing import NamedTuple, Optional
from datetime import datetime

class Flight(NamedTuple):
    flight_id: str
    origin: str
    destination: str
    departure: datetime
    seats_available: int
    price_usd: float
    airline: str
    duration_minutes: int
    operated_by: Optional[str] = None  # Code-share partner

# Immutable, memory-efficient, hashable (usable in sets/dict keys)
f = Flight(
    flight_id="AA123",
    origin="SFO",
    destination="JFK",
    departure=datetime(2026, 6, 15, 10, 30),
    seats_available=42,
    price_usd=349.00,
    airline="American Airlines",
    duration_minutes=320
)

print(f.flight_id)     # "AA123"
print(f.origin)        # "SFO"

# Use in a set (hashable because it's a tuple)
seen_flights = {f}
print(f in seen_flights)  # True

# Serialize easily
print(f._asdict())

Interview Questions

Q1: What is the internal structure of `collections.deque`, and why is `appendleft()` O(1)?

Answer: CPython's deque is implemented as a doubly-linked list of fixed-size blocks, where each block holds 64 pointers to Python objects. Unlike a plain list (which is a single contiguous array), deque maintains a leftblock and rightblock pointer. appendleft() writes to the current leftindex of the leftblock. When the leftblock is full, a new block is allocated and linked in - a constant-time operation. No existing data is shifted. This contrasts with list.insert(0, x), which must copy all n existing pointers one position right before writing - O(n).

Q2: When would you still use `OrderedDict` in Python 3.9+, given that plain `dict` preserves insertion order?

Answer: For two reasons: (1) OrderedDict.move_to_end(key, last=True/False) moves a key to either end in O(1) - this is the core primitive for implementing LRU caches efficiently. Plain dict has no equivalent method. (2) OrderedDict.__eq__ is order-sensitive: two OrderedDicts are equal only if both the keys/values AND their order match. Plain dict.__eq__ is order-independent. Use OrderedDict when you implement LRU cache, when equality must be order-sensitive, or when you need explicit move_to_end control.

Q3: What is the difference between `namedtuple` and `typing.NamedTuple`, and when should you use each vs a `dataclass`?

Answer: Both produce immutable, tuple-based classes with named attribute access. namedtuple uses factory-function syntax; typing.NamedTuple uses class syntax with PEP 526 type annotations and supports default values per field. The key difference from dataclass: namedtuples ARE tuples (tuple unpacking, indexing, and hashing work); dataclasses are not tuples. Use namedtuple/NamedTuple when you need immutability, hashability, tuple compatibility (e.g., dict keys, set membership), or minimal memory footprint (no __dict__). Use dataclass when you need mutable fields, custom methods, or __post_init__ validation.

Q4: Explain how `ChainMap` differs from `{d1, d2}` merging. When does that difference matter?

Answer: {**d1, **d2} creates a copy - a new dict with all key-value pairs at merge time. Future changes to d1 or d2 are not reflected in the merged dict. ChainMap(d1, d2) creates a live view - lookups walk d1 then d2 at access time; no copy is made. Changes to d1 or d2 are immediately visible. ChainMap also makes it easy to add/remove layers with new_child() and parents. The difference matters for configuration systems where environment variables or command-line args can change at runtime, and for implementing scope chains (variables, templates, shells) where layers need to be added/removed independently.

Q5: How would you implement a sliding window maximum in O(n) time using deque?

Answer: Use a deque to maintain a monotonic decreasing queue of indices. For each new element: (1) Remove from the front any indices that have left the current window (index ≤ i - k). (2) Remove from the back any indices whose values are less than the current value - they can never be the window maximum while the current element is in scope. (3) Append the current index. The front of the deque is always the index of the current window's maximum. Each index is pushed and popped at most once, so total work is O(n). The naive approach of calling max() on each window slice is O(nk).

Q6: What happens when you call `appendleft()` on a `deque(maxlen=5)` that is already full?

Answer: The rightmost element is automatically discarded to make room. For a full deque [1, 2, 3, 4, 5] (left to right), appendleft(0) results in [0, 1, 2, 3, 4] - 5 is dropped from the right end. This is symmetric: append() on a full deque discards the leftmost element. This behavior is what makes deque(maxlen=k) a natural circular buffer - elements rotate out automatically with zero manual management. The operation is O(1) regardless of maxlen.

Practice Challenges

Level 1 - Predict the Output

What does this print?

from collections import deque

dq = deque([1, 2, 3, 4, 5], maxlen=5)
dq.append(6)
dq.appendleft(0)
print(list(dq))
print(len(dq))

Solution

Output:

[0, 2, 3, 4, 5]
5

After dq.append(6): [2, 3, 4, 5, 6] - 1 is evicted from the left (maxlen=5 is full, right append evicts leftmost).

After dq.appendleft(0): [0, 2, 3, 4, 5] - 6 is evicted from the right (left append evicts rightmost).

len(dq) is 5 - maxlen constrains the size.

Level 2 - Implement Last-N Unique URLs

Build a data structure that:

Tracks the last N distinct URLs visited by a user
When a URL is visited again, it moves to the most recent position (no duplicates)
Reports the last-visited N URLs in order from oldest to newest recent

# Example:
tracker = RecentURLs(maxlen=3)
tracker.visit("google.com")
tracker.visit("github.com")
tracker.visit("google.com")   # already seen - should move to most recent
tracker.visit("python.org")
print(tracker.recent())
# Expected: ["github.com", "google.com", "python.org"]
# google.com moved to end when revisited; github.com evicted when python.org added

Solution

from collections import deque, OrderedDict

class RecentURLs:
    """
    Tracks last N distinct URLs.
    Visit is O(1) amortized.
    Uses OrderedDict for O(1) move_to_end.
    """
    def __init__(self, maxlen: int):
        self.maxlen = maxlen
        self._seen = OrderedDict()   # key=url, val=None; ordered by recency

    def visit(self, url: str):
        if url in self._seen:
            self._seen.move_to_end(url)   # O(1) - move to most recent
        else:
            self._seen[url] = None
            if len(self._seen) > self.maxlen:
                self._seen.popitem(last=False)  # Evict least recently visited

    def recent(self):
        return list(self._seen.keys())   # Oldest to most recent

tracker = RecentURLs(maxlen=3)
tracker.visit("google.com")
tracker.visit("github.com")
tracker.visit("google.com")
tracker.visit("python.org")
print(tracker.recent())
# ["github.com", "google.com", "python.org"]

Why OrderedDict over deque here: We need O(1) lookup (to detect revisits) AND O(1) move-to-end. A deque doesn't support O(1) lookup; a plain dict doesn't support move_to_end. OrderedDict provides both.

Level 3 - Implement a Task Scheduler with Priority and FIFO Fallback

Design a task scheduler that:

Processes tasks in priority order (lower number = higher priority)
Among tasks of equal priority, processes them FIFO (first added, first processed)
Supports add_task(name, priority) and next_task() operations
next_task() must return tasks in the correct order

scheduler = TaskScheduler()
scheduler.add_task("Deploy infra", priority=1)
scheduler.add_task("Run tests", priority=2)
scheduler.add_task("Send report", priority=2)
scheduler.add_task("Hotfix prod", priority=1)

while scheduler.has_tasks():
    print(scheduler.next_task())

# Expected output:
# Deploy infra      (priority 1, added first)
# Hotfix prod       (priority 1, added second)
# Run tests         (priority 2, added first)
# Send report       (priority 2, added second)

Solution

import heapq
from collections import defaultdict
from typing import Optional

class TaskScheduler:
    """
    Priority queue with FIFO tiebreaking.

    Uses heapq for O(log n) push/pop with priority ordering.
    Sequence counter breaks ties - lower counter = added earlier.

    Structure: heap of (priority, sequence, task_name)
    """
    def __init__(self):
        self._heap = []           # min-heap: (priority, seq, name)
        self._counter = 0         # Monotonically increasing for FIFO

    def add_task(self, name: str, priority: int):
        heapq.heappush(self._heap, (priority, self._counter, name))
        self._counter += 1        # O(log n) push

    def next_task(self) -> Optional[str]:
        if not self._heap:
            return None
        priority, seq, name = heapq.heappop(self._heap)  # O(log n)
        return name

    def has_tasks(self) -> bool:
        return bool(self._heap)

scheduler = TaskScheduler()
scheduler.add_task("Deploy infra", priority=1)
scheduler.add_task("Run tests", priority=2)
scheduler.add_task("Send report", priority=2)
scheduler.add_task("Hotfix prod", priority=1)

while scheduler.has_tasks():
    print(scheduler.next_task())

# Output:
# Deploy infra
# Hotfix prod
# Run tests
# Send report

Why this works: The heap sorts by (priority, sequence, name). Tasks with priority 1 come before priority 2. Among tasks with the same priority, the lower sequence (earlier insertion) comes first - giving FIFO semantics within each priority level.

Alternative with defaultdict(deque): Maintain defaultdict(deque) keyed by priority + a sorted(priorities) sorted set. Gets complex; heap approach is simpler and has better complexity.

Quick Reference

Tool	Import	Key Operations	When to Use
`deque`	`from collections import deque`	`appendleft`, `popleft`, `append`, `pop`, `rotate`	BFS queues, sliding windows, front insertions
`deque(maxlen=k)`	same	auto-evicts on full	Rotating buffers, last-N events, sliding window
`OrderedDict`	`from collections import OrderedDict`	`move_to_end`, `popitem(last=False/True)`	LRU cache, order-sensitive equality
`namedtuple`	`from collections import namedtuple`	`._asdict()`, `._replace()`, `._fields`	Immutable records, return types
`NamedTuple`	`from typing import NamedTuple`	Same + type hints + defaults	Typed immutable records
`defaultdict`	`from collections import defaultdict`	auto-initializes missing keys	Grouping, counting, adjacency lists
`Counter`	`from collections import Counter`	`.most_common(n)`, `+`, `-`, `&`, `	`
`ChainMap`	`from collections import ChainMap`	`.new_child()`, `.parents`	Config layering, scope chains
`heapq`	`import heapq`	`heappush`, `heappop`, `nlargest`, `nsmallest`	Priority queues, top-k

Key Takeaways

deque is a doubly-linked list of 64-element blocks - O(1) at both ends, O(n) for random access; use it any time you need front insertions or a FIFO queue
list.insert(0, x) is O(n) and causes O(n²) behavior in loops - always prefer deque.appendleft() for front-insertion workloads
deque(maxlen=k) is a self-managing circular buffer - the most concise way to implement sliding windows and last-N monitors
OrderedDict.move_to_end() is the key primitive for O(1) LRU cache; plain dict in Python 3.7+ preserves order but lacks this method
typing.NamedTuple produces immutable, hashable, memory-efficient records with full type annotations - prefer over bare tuples for any structured data
ChainMap creates a live layered view over multiple dicts - the natural tool for configuration systems where CLI > env > defaults
Every tool in the collections module solves a specific, common performance problem; knowing which tool to reach for, and why, is what separates engineering from scripting

What You Will Learn​

Prerequisites​

Part 1 - Why Lists Fail at the Front​

The O(n) Wall​

Part 2 - Deque Internal Architecture​

Not a Simple Linked List​

O(1) at Both Ends​

Part 3 - Deque Operations and When to Use Each​

Full API with Complexity​

Deque as a Queue (FIFO)​

Deque as a Stack (LIFO)​

Part 4 - deque(maxlen): Rotating Buffer and Sliding Window​

The Bounded Deque​

Real-World: Sliding Window Sum​

Sliding Window Maximum (Classic Algorithm)​

Part 5 - collections.OrderedDict​

History: Why It Existed​

Yes - For LRU Cache Pattern​

OrderedDict Equality Semantics​

Part 6 - collections.namedtuple and typing.NamedTuple​

The Problem namedtuple Solves​

typing.NamedTuple - The Modern Version​

namedtuple vs dataclass vs dict - When to Use Each​

Part 7 - collections.ChainMap​

The Problem: Layered Configuration​

ChainMap for Python Scope Simulation​

Part 8 - The Complete collections Module Overview​

When to Reach for Each​

Part 9 - Production Examples​

Example 1: Real-Time Log Monitor with Bounded Buffer​

Example 2: Configuration System with ChainMap​

Example 3: Flight Booking Record with NamedTuple​

Interview Questions​

Q1: What is the internal structure of collections.deque, and why is appendleft() O(1)?​

Q2: When would you still use OrderedDict in Python 3.9+, given that plain dict preserves insertion order?​

Q3: What is the difference between namedtuple and typing.NamedTuple, and when should you use each vs a dataclass?​

Q4: Explain how ChainMap differs from {**d1, **d2} merging. When does that difference matter?​

Q5: How would you implement a sliding window maximum in O(n) time using deque?​

Q6: What happens when you call appendleft() on a deque(maxlen=5) that is already full?​

Practice Challenges​

Level 1 - Predict the Output​

Level 2 - Implement Last-N Unique URLs​

Level 3 - Implement a Task Scheduler with Priority and FIFO Fallback​

Quick Reference​

Key Takeaways​

What You Will Learn

Prerequisites

Part 1 - Why Lists Fail at the Front

The O(n) Wall

Part 2 - Deque Internal Architecture

Not a Simple Linked List

O(1) at Both Ends

Part 3 - Deque Operations and When to Use Each

Full API with Complexity

Deque as a Queue (FIFO)

Deque as a Stack (LIFO)

Part 4 - deque(maxlen): Rotating Buffer and Sliding Window

The Bounded Deque

Real-World: Sliding Window Sum

Sliding Window Maximum (Classic Algorithm)

Part 5 - collections.OrderedDict

History: Why It Existed

Yes - For LRU Cache Pattern

OrderedDict Equality Semantics

Part 6 - collections.namedtuple and typing.NamedTuple

The Problem namedtuple Solves

typing.NamedTuple - The Modern Version

namedtuple vs dataclass vs dict - When to Use Each

Part 7 - collections.ChainMap

The Problem: Layered Configuration

ChainMap for Python Scope Simulation

Part 8 - The Complete collections Module Overview

When to Reach for Each

Part 9 - Production Examples

Example 1: Real-Time Log Monitor with Bounded Buffer

Example 2: Configuration System with ChainMap

Example 3: Flight Booking Record with NamedTuple

Interview Questions

Q1: What is the internal structure of `collections.deque`, and why is `appendleft()` O(1)?

Q2: When would you still use `OrderedDict` in Python 3.9+, given that plain `dict` preserves insertion order?

Q3: What is the difference between `namedtuple` and `typing.NamedTuple`, and when should you use each vs a `dataclass`?

Q4: Explain how `ChainMap` differs from `{d1, d2}` merging. When does that difference matter?

Q5: How would you implement a sliding window maximum in O(n) time using deque?

Q6: What happens when you call `appendleft()` on a `deque(maxlen=5)` that is already full?

Practice Challenges

Level 1 - Predict the Output

Level 2 - Implement Last-N Unique URLs

Level 3 - Implement a Task Scheduler with Priority and FIFO Fallback

Quick Reference

Key Takeaways