Tuples and Immutability - Engineering with Stable Data
Reading time: ~20 minutes | Level: Foundation → Engineering
Here is a question that reveals whether you actually understand Python's type system.
t = ([1, 2], [3, 4])
t[0].append(99)
print(t)
print(hash(t))
Most engineers expect this to either succeed completely or fail completely. The actual output:
([1, 2, 99], [3, 4])
TypeError: unhashable type: 'list'
The tuple "changed" - and yet it is still immutable. The hash() call failed - and yet the tuple exists and its structure was never reassigned.
These two results expose a profound distinction: immutability is not the same as constancy, and hashability requires deep immutability. Understanding why is the foundation of reliable Python engineering.
What You Will Learn
- How CPython implements tuples as
PyTupleObject- the actual C struct and why it differs structurally fromPyListObject - Why creating a tuple literal is faster than creating a list - compile-time constant folding in the bytecode compiler
- The critical distinction between shallow immutability (the tuple's pointers cannot change) and deep constancy (nothing reachable can change)
- How
__hash__works, why-1is a forbidden hash value, and the exact conditions under which a tuple is unhashable - Named tuples with
collections.namedtupleandtyping.NamedTuple- what each generates and when to use each - Structural unpacking and
*splat unpacking patterns engineers actually use in production - The tuple-as-return-value pattern and why it is Python's idiomatic multiple-return mechanism
- When to choose tuple over list: semantic intent, API contracts, dict keys, and measurable memory savings
Prerequisites
- Python variables and name binding (Module 3, Lesson 4)
- Python lists internals (Module 4, Lesson 1)
- Basic understanding of Big-O notation and pointer arithmetic
Part 1 - The Mental Model: What a Tuple Actually Is
What Most Engineers Think
Most engineers describe a tuple as "a list you cannot modify." This is approximately correct but misses everything important about how tuples work and why they exist.
A more precise definition: a tuple is a fixed-length, ordered sequence of object references whose reference array is allocated once at creation time and can never be reallocated, extended, or modified.
The CPython Internal Structure
In CPython, lists are defined in listobject.h and tuples in tupleobject.h. The difference is architecturally significant:
/* PyListObject - dynamic array, designed to grow */
typedef struct {
PyObject_VAR_HEAD
PyObject **ob_item; /* pointer to the pointer array on heap - can be realloc'd */
Py_ssize_t allocated; /* total capacity, usually > ob_size */
} PyListObject;
/* PyTupleObject - fixed-length inline array */
typedef struct {
PyObject_VAR_HEAD
PyObject *ob_item[1]; /* flexible array member - pointers stored INSIDE the struct */
} PyTupleObject;
The structural difference is decisive. Here is what memory looks like for each:
Key observations:
- The list stores its pointer array in a separately heap-allocated block - this block can be
realloc'd to grow. ThePyListObjectstruct holds aPyObject **pointer to this block. - The tuple stores its pointers inline, inside the struct itself using a C99 flexible array member (
ob_item[1]). One allocation, zero possibility of realloc. - The list has an
allocatedfield (capacity, always>= ob_size). The tuple has no such field -ob_sizeis both the length and the total capacity simultaneously. - Because there is no realloc mechanism, tuples are structurally immutable - not by convention, but because there is no C code path to extend them.
Part 2 - Memory: Tuples Are Smaller Than Lists
import sys
# Same integer data, different containers
lst = [1, 2, 3, 4, 5]
tup = (1, 2, 3, 4, 5)
print(sys.getsizeof(lst)) # 104 bytes (64-bit CPython 3.11+)
print(sys.getsizeof(tup)) # 80 bytes
The breakdown explains the difference:
| Field | List (64-bit CPython) | Tuple (64-bit CPython) |
|---|---|---|
ob_refcnt | 8 bytes (PyObject_VAR_HEAD) | 8 bytes |
*ob_type | 8 bytes | 8 bytes |
ob_size | 8 bytes | 8 bytes |
allocated | 8 bytes (tuples have NO allocated field) | - |
*ob_item ptr | 8 bytes (pointer to separate block) | - |
ob_item[5] | - | 40 bytes (5 pointers x 8 bytes, inline in struct) |
| struct total | 40 bytes + separate pointer block (with over-alloc padding) | 72 bytes inline |
sys.getsizeof | ~104 bytes for 5 elements | ~80 bytes |
:::tip Memory Savings at Scale If you have 10 million coordinate pairs in a data processing pipeline stored as tuples instead of lists, you save roughly 240 MB in struct overhead alone - before accounting for over-allocation slack in lists. That determines whether your application fits in RAM. :::
Measuring the Difference
import sys
for n in range(0, 11, 2):
lst = list(range(n))
tup = tuple(range(n))
diff = sys.getsizeof(lst) - sys.getsizeof(tup)
print(f"n={n:2d}: list={sys.getsizeof(lst):4d}B tuple={sys.getsizeof(tup):4d}B saved={diff:+d}B")
# Output (64-bit CPython 3.11):
# n= 0: list= 56B tuple= 40B saved=+16B
# n= 2: list= 72B tuple= 56B saved=+16B
# n= 4: list= 88B tuple= 72B saved=+16B
# n= 6: list= 104B tuple= 88B saved=+16B
# n= 8: list= 120B tuple= 104B saved=+16B
# n=10: list= 152B tuple= 120B saved=+32B
The tuple grows by exactly 8 bytes per element - one pointer, perfectly linear. The list grows in jumps because of over-allocation. The gap widens as lists trigger reallocations into larger capacity blocks.
Part 3 - Why Tuple Creation Is Faster: Compile-Time Constants
import timeit
list_time = timeit.timeit("[1, 2, 3, 4, 5]", number=10_000_000)
tuple_time = timeit.timeit("(1, 2, 3, 4, 5)", number=10_000_000)
print(f"List: {list_time:.3f}s")
print(f"Tuple: {tuple_time:.3f}s")
# Typical output:
# List: 0.520s
# Tuple: 0.072s ← roughly 7x faster
The reason is visible in the bytecode:
import dis
print("=== [1, 2, 3] ===")
dis.dis(compile("[1, 2, 3]", "<string>", "eval"))
print("\n=== (1, 2, 3) ===")
dis.dis(compile("(1, 2, 3)", "<string>", "eval"))
=== [1, 2, 3] ===
1 0 LOAD_CONST 0 (1)
2 LOAD_CONST 1 (2)
4 LOAD_CONST 2 (3)
6 BUILD_LIST 3 ← allocates a new list object at RUNTIME
8 RETURN_VALUE
=== (1, 2, 3) ===
1 0 LOAD_CONST 0 ((1, 2, 3)) ← entire tuple is ONE pre-built constant
2 RETURN_VALUE
The Python bytecode compiler recognizes that (1, 2, 3) - a tuple whose elements are all compile-time constants - is itself a constant. It evaluates the tuple once at compile time and stores the result as a bytecode constant. Every execution is a single LOAD_CONST - no allocation, no pointer writes, just loading a pre-existing object.
[1, 2, 3] cannot be a compile-time constant because lists are mutable. If multiple parts of code shared the same list literal, mutating it in one place would corrupt all other users. So Python must create a fresh list on every execution.
:::note CPython Empty Tuple Singleton
CPython caches the empty tuple as a global singleton. Every expression () or call to tuple() that produces an empty tuple returns the exact same object - no allocation:
a = ()
b = ()
print(a is b) # True - identical object, not just equal
Functions that return () on their fast path pay zero allocation cost.
:::
Part 4 - Immutability Is Not Constancy: The Most Important Distinction
This is the lesson's central insight. Read it carefully.
t = ([1, 2], [3, 4])
# The tuple's pointer slots cannot be reassigned
t[0] = [99, 100] # TypeError: 'tuple' object does not support item assignment
# But the LIST that t[0] points to can still be mutated
t[0].append(99)
print(t) # ([1, 2, 99], [3, 4]) ← tuple "changed"!
The tuple guarantees that its internal reference array is immutable - you cannot make t[0] point to a different object. But the tuple makes no guarantee about the objects those references point to. If those objects are mutable, they can still be mutated through their own interface.
The pointers in the tuple are unchanged. The list object at the other end changed.
This is called shallow immutability. The tuple guarantees its own pointer slots are frozen; it makes no guarantees about what those pointers point to.
Deep immutability - where nothing reachable through the structure can change - requires that every element also be immutable: integers, strings, other tuples with only immutable contents, frozensets.
:::danger Hashability Consequence This distinction has a critical practical consequence. A tuple containing mutable objects cannot be hashed and therefore cannot be used as a dict key or set member:
t = ([1, 2], [3, 4])
hash(t) # TypeError: unhashable type: 'list'
# But this works - all elements are immutable
t2 = ((1, 2), (3, 4))
hash(t2) # Works: 3713082716806266542 (session-specific)
:::
Part 5 - Hashability: Why Dicts and Sets Need It
How __hash__ Works
Every Python object usable as a dict key or set member must implement __hash__. The contract:
- Objects that compare equal must have equal hashes:
a == bimplieshash(a) == hash(b) - A hash must be stable - it must not change during an object's lifetime
- Hash collisions are allowed - different objects can share the same hash value
CPython computes a tuple's hash by combining the hashes of all its elements using a mixing algorithm derived from xxHash:
# Simplified conceptual approximation of CPython's tuplehash()
def tuple_hash_conceptual(t):
acc = 0x27D4EB2F165667C5 # seeded with Py_HashSecret
for item in t:
item_hash = hash(item) # raises TypeError if item is unhashable
acc = acc ^ (item_hash * 0x517CC1B727220A95 + 1) # simplified mixing
result = acc if acc != -1 else -2 # -1 is reserved
return result
Why -1 Is a Forbidden Hash Return Value
In CPython, C functions return -1 to signal an error - it is the standard "exception was raised" sentinel for C-level hash computations. If a legitimate hash happened to equal -1, the interpreter would misinterpret it as an error. CPython solves this by mapping -1 to -2 before returning to Python:
class NegativeOneHash:
def __hash__(self):
return -1 # Python will store this as -2
obj = NegativeOneHash()
print(hash(obj)) # -2 (CPython transparently remapped -1 → -2)
Verifying Hashability
# All of these work - every element is hashable
hash((1, 2, 3)) # integers are hashable
hash(("alice", 42, 3.14)) # str, int, float all hashable
hash(((1, 2), (3, 4))) # nested tuples of ints - fine
hash((frozenset({1, 2}), "hello")) # frozenset is hashable
# All of these fail - unhashable element somewhere in the structure
hash((1, [2, 3])) # TypeError: unhashable type: 'list'
hash(({"key": "val"},)) # TypeError: unhashable type: 'dict'
hash(((1, [2]),)) # TypeError: list nested inside inner tuple
# Practical consequence: dict keys
cache = {}
cache[(42, "us-east-1")] = "data" # Works - both elements hashable
cache[(42, ["us-east-1"])] = "data" # TypeError at this line
Part 6 - Named Tuples: Tuples with Semantic Labels
Anonymous tuples like (42.0, -73.5) carry no information about meaning. Named tuples add field names while remaining fully tuple-compatible.
collections.namedtuple - The Classic Factory Function
from collections import namedtuple
# Define the type (generates a new class at runtime)
Point = namedtuple("Point", ["x", "y"])
p = Point(x=3.0, y=4.0)
# Access by name (readable) or by index (tuple-compatible)
print(p.x) # 3.0
print(p[0]) # 3.0 ← still a real tuple
print(p) # Point(x=3.0, y=4.0)
# Fully compatible with all tuple protocols
x, y = p # unpacking works
print(len(p)) # 2
print(hash(p)) # hashable - it is a tuple
# Useful class methods
p2 = p._replace(y=7.0) # returns NEW Point with y changed (immutable!)
print(p2) # Point(x=3.0, y=7.0)
print(p) # Point(x=3.0, y=4.0) ← original unchanged
# _asdict: convert to a regular dict
print(p._asdict()) # {'x': 3.0, 'y': 4.0}
# _fields: introspect field names
print(Point._fields) # ('x', 'y')
# _make: create from any iterable
p3 = Point._make([5.0, 6.0])
What does namedtuple generate? It dynamically creates a class that inherits from tuple. The named fields are property descriptors on the class that call self[index] - zero extra memory per instance:
typing.NamedTuple - The Modern Type-Safe Approach
from typing import NamedTuple
class HTTPResponse(NamedTuple):
status_code: int
body: str
headers: dict = {} # default values supported
is_cached: bool = False
# Construction - positional or keyword
r = HTTPResponse(status_code=200, body='{"ok": true}')
print(r)
# HTTPResponse(status_code=200, body='{"ok": true}', headers={}, is_cached=False)
# Named access
print(r.status_code) # 200
print(r.body) # '{"ok": true}'
# Still a real tuple
print(isinstance(r, tuple)) # True
print(r[0]) # 200
# Methods can be defined in the class body
class Vector(NamedTuple):
x: float
y: float
def magnitude(self) -> float:
return (self.x**2 + self.y**2) ** 0.5
def dot(self, other: "Vector") -> float:
return self.x * other.x + self.y * other.y
v = Vector(3.0, 4.0)
print(v.magnitude()) # 5.0
print(v.dot(Vector(1.0, 0.0))) # 3.0
When to Choose Each
| Scenario | Recommended |
|---|---|
| Simple records in new code | typing.NamedTuple |
| Need type checking (mypy/pyright) | typing.NamedTuple |
| Need default field values | typing.NamedTuple |
| Need methods on the record | typing.NamedTuple |
| Legacy code using string-based field names | collections.namedtuple |
| Dynamically generating field names at runtime | collections.namedtuple |
| Performance-critical hot loop, avoid property lookup | plain tuple with known indices |
Part 7 - Structural Unpacking and Splat Patterns
Basic Unpacking
point = (3.0, 4.0)
x, y = point # two-element unpack - must match exactly
rgb = (255, 128, 0)
r, g, b = rgb
# Nested unpacking
matrix_row = ((1, 2), (3, 4))
(a, b), (c, d) = matrix_row
print(a, b, c, d) # 1 2 3 4
# Unpacking in a for loop - idiomatic Python
points = [(0, 0), (1, 2), (3, 4)]
for x, y in points:
print(f"Point: ({x}, {y})")
Splat (*) Unpacking - Capturing Variable Lengths
first, *rest = (1, 2, 3, 4, 5)
print(first) # 1
print(rest) # [2, 3, 4, 5] ← rest is always a LIST, not a tuple
*init, last = (1, 2, 3, 4, 5)
print(init) # [1, 2, 3, 4]
print(last) # 5
first, *middle, last = (1, 2, 3, 4, 5)
print(first) # 1
print(middle) # [2, 3, 4]
print(last) # 5
# Practical: parse semantic version tuples
version = (3, 11, 4, "final")
major, minor, *rest = version
print(f"Python {major}.{minor}") # Python 3.11
:::note Starred Variable Is Always a List
*rest in an unpack expression always captures into a regular Python list, even when unpacking a tuple. This is by design: the starred variable captures a variable number of elements, so a mutable list is the appropriate container.
:::
Unpacking in Function Calls
def distance(x1, y1, x2, y2):
return ((x2 - x1)**2 + (y2 - y1)**2) ** 0.5
p1 = (0.0, 0.0)
p2 = (3.0, 4.0)
# With splat - clean and expressive
d = distance(*p1, *p2)
print(d) # 5.0
Swap Without a Temporary Variable
a, b = 10, 20
a, b = b, a # RHS packs into tuple (20, 10), then unpacks into a, b
print(a, b) # 20 10
# CPython actually optimizes the simple two-variable swap:
# It uses ROT_TWO bytecode - no heap tuple is constructed at all.
Part 8 - Tuple as Return Value: Python's Multiple Return Mechanism
Python functions return exactly one object. The idiomatic way to return multiple values is to return a tuple - Python's syntax makes this feel like native multiple returns:
import math
def polar_to_cartesian(r: float, theta: float) -> tuple[float, float]:
x = r * math.cos(theta)
y = r * math.sin(theta)
return x, y # implicitly packs into a tuple
# Unpack immediately
x, y = polar_to_cartesian(5.0, math.pi / 4)
print(f"x={x:.3f}, y={y:.3f}") # x=3.536, y=3.536
# Or keep as tuple
coords = polar_to_cartesian(5.0, math.pi / 4)
print(type(coords)) # <class 'tuple'>
print(coords) # (3.5355339059327378, 3.5355339059327373)
Using NamedTuple for Rich Return Types
When functions return more than two or three values, a named tuple beats a plain tuple for readability:
from typing import NamedTuple
class ModelMetrics(NamedTuple):
accuracy: float
precision: float
recall: float
f1_score: float
def evaluate_model(y_true, y_pred) -> ModelMetrics:
# ... compute metrics ...
return ModelMetrics(accuracy=0.92, precision=0.88, recall=0.90, f1_score=0.89)
metrics = evaluate_model(y_true, y_pred)
# Named access - self-documenting code
print(f"Accuracy: {metrics.accuracy:.1%}") # Accuracy: 92.0%
print(f"F1 Score: {metrics.f1_score:.3f}") # F1 Score: 0.890
# Legacy unpack still works
acc, prec, rec, f1 = metrics
:::tip API Design with NamedTuple
When designing library functions that return multiple values, prefer typing.NamedTuple over plain tuples. Named tuples allow callers to use field names (self-documenting), survive refactoring when new optional fields with defaults are added, and appear with correct types in IDE auto-complete and type checkers.
:::
Part 9 - When to Choose Tuple Over List
The choice is primarily a semantic decision about communicating intent, not a performance decision.
| Choose TUPLE when | Choose LIST when |
|---|---|
Data represents a fixed-arity record - e.g., (user_id, timestamp, ip) | Collection of homogeneous items - e.g., [user_1, user_2, user_3] |
Structure should not change - e.g., DB config: ("localhost", 5432) | Items may be added or removed - e.g., an accumulator, a queue |
Needed as dict key or set member - e.g., cache[(user_id, region)] | Order matters but may change - e.g., sorted query results |
API contract signals immutability - return (min_val, max_val) | API contract signals mutability - return [result1, result2] |
| Row from a DB result set (fixed schema per row) | Batch of items to process (count varies by batch) |
The Dict Key Use Case - Definitive Example
# Tuple as composite cache key - lists cannot do this
from functools import lru_cache
@lru_cache(maxsize=1024)
def compute_route(origin: tuple[float, float],
destination: tuple[float, float]) -> float:
"""lru_cache requires hashable arguments - tuples work, lists do not."""
lat1, lon1 = origin
lat2, lon2 = destination
# expensive computation (route planning API, etc.)
return 42.7 # km
result = compute_route((37.7749, -122.4194), (34.0522, -118.2437))
print(result) # 42.7
# 2D grid indexing
grid: dict[tuple[int, int], str] = {}
for row in range(5):
for col in range(5):
grid[(row, col)] = f"cell_{row}_{col}"
print(grid[(2, 3)]) # cell_2_3
Semantic Signal in Type Annotations
Python's type system formalizes the semantic distinction:
# tuple[int, str, float] - exactly 3 elements, each with a specific type
# List[str] - variable number of strings
def parse_row(raw: str) -> tuple[int, str, float]:
# Returns a fixed-arity record
parts = raw.split(",")
return int(parts[0]), parts[1].strip(), float(parts[2])
def load_records(path: str) -> list[tuple[int, str, float]]:
# Variable number of fixed-arity records
with open(path) as f:
return [parse_row(line) for line in f]
Part 10 - CPython Small Tuple Optimization
CPython maintains a free list of recently deallocated small tuples for reuse. When a small tuple is freed, its memory is not immediately returned to the OS - it is kept on the free list. The next time a small tuple of the same size is created, CPython may reuse that memory instead of allocating fresh from the heap.
# Demonstrating free-list reuse (CPython implementation detail)
a = (1, 2, 3)
id_a = id(a)
del a # tuple is freed - may go to free list
b = (4, 5, 6) # new tuple of same size - may reuse freed memory
id_b = id(b)
print(id_a == id_b) # Often True on CPython - same memory was reused
:::note Implementation Detail Only
Free-list reuse is specific to CPython. PyPy, Jython, GraalPy, and other Python implementations do not guarantee this behavior. Never write code that relies on tuple identity (is) - only equality (==). Identity is an observable side effect of memory allocation, not a language guarantee.
:::
Interview Questions
Q1: What is the structural difference between PyListObject and PyTupleObject in CPython memory?
Answer: Both store arrays of 8-byte pointers to Python objects. The key difference is where that array lives. A PyListObject stores its pointer array in a separately heap-allocated block pointed to by its ob_item field - this block can be realloc'd when the list grows. The struct also has an allocated field tracking total capacity (always >= ob_size). A PyTupleObject uses a C99 flexible array member - the pointer array is stored inline, at the end of the struct itself in one contiguous allocation. There is no allocated field because there is no mechanism to grow the array. The result: tuples use one fewer allocation, have better cache locality for small sizes, consume less memory (no capacity field, no slack), and are physically incapable of being resized.
Q2: Why can a tuple contain a mutable object? Is that a contradiction?
Answer: It is not a contradiction - it is a precise feature. Tuple immutability applies exactly to the tuple's own reference slots: you cannot rebind t[0] to point to a different object, and you cannot add or remove slots. But the tuple makes no claim about the objects those references point to. If t[0] points to a list, t[0].append(x) mutates the list - you are sending a message to the list object through a pointer that happens to be stored in the tuple. The tuple's pointer to that list did not change; only the list's internal state changed. This is called shallow immutability. Deep immutability requires that all reachable objects also be immutable, which is not something Python enforces - it is the programmer's responsibility to ensure if needed.
Q3: Why is tuple creation from constant literals faster than list creation?
Answer: The Python bytecode compiler performs constant folding on tuple literals whose elements are all compile-time constants (integers, strings, floats, None, True, False, and other constant tuples). It evaluates (1, 2, 3) once at compile time and stores the resulting tuple object in the code object's co_consts array. At runtime, execution is a single LOAD_CONST bytecode instruction - no allocation, no pointer writes, just loading a pre-existing object. List literals cannot be constant-folded because lists are mutable: if the same list object were shared between all executions of the literal, mutating it anywhere would corrupt all users. A fresh list must be heap-allocated on every execution via BUILD_LIST. The speedup is typically 4-8x for small constant tuples.
Q4: Under what exact conditions is a tuple not hashable?
Answer: A tuple is not hashable when any element within it - at any depth - is not hashable. CPython computes a tuple's hash by recursively calling hash() on each element and mixing the results. If any hash() call raises TypeError (which all mutable built-in types do - lists, dicts, sets), the TypeError propagates out of the tuple's __hash__. So hash((1, (2, [3]))) fails because the inner list [3] is unhashable, even though it is nested two levels deep. The only way to guarantee a tuple is hashable is to ensure every element, and every element of every nested container, is immutable: integers, floats, strings, bytes, booleans, None, other fully-hashable tuples, and frozensets.
Q5: What does typing.NamedTuple give you that collections.namedtuple does not?
Answer: Both produce classes that inherit from tuple with named field access. typing.NamedTuple offers four additional capabilities: (1) Type annotations - field types are recorded and understood by static type checkers like mypy and Pyright, so point.x is known to be float rather than Any; (2) Default values - field: type = default works naturally in the class body; (3) Methods - you can define methods in the class body that operate on the record; (4) Readable syntax - the class-body form reads like a normal Python class, is easier to understand and extend, and appears correctly in IDE auto-complete. The runtime behavior and memory usage are identical - both produce tuple subclasses where named fields are property descriptors that call self[index].
Q6: In a high-throughput ML inference service, a function receives a feature vector as input and is called 50,000 times per second. The function uses @lru_cache. Should the vector be passed as a list or a tuple? Justify your answer with concrete technical reasons.
Answer: Pass it as a tuple. Three concrete reasons: (1) @lru_cache requires hashable arguments - lists are unhashable and will raise TypeError immediately when the decorator tries to use the argument as a cache key. Tuples of numeric values (floats, ints) are hashable. (2) Creation speed - if the feature vector is constructed from constants or fixed-size computations, tuple literals benefit from constant folding and over-allocation is avoided. For a vector assembled programmatically, tuple(vector_list) is a fast O(n) conversion that pays off if the cache hit rate is high. (3) Semantic correctness - a feature vector is a fixed-arity record of numeric features; it should not change after construction. Using a tuple communicates this contract to all readers and to type checkers, prevents accidental mutation inside the function, and enables the caching infrastructure. For truly hot paths, also consider numpy arrays with a hashable wrapper or pre-hashing the vector.
Practice Challenges
Beginner - Predict the Output
What does this code print, and why?
t = (1, 2, [3, 4])
t[2] += [5, 6]
print(t)
Solution
This code both raises a TypeError and modifies t. Here is the sequence:
t[2]evaluates to the list[3, 4][3, 4] += [5, 6]callslist.__iadd__, which callst[2].extend([5, 6])and mutates the list in-place to[3, 4, 5, 6], then returns the same list object- Python then attempts to assign the returned list back to
t[2]- this triggerstuple.__setitem__, which raisesTypeError: 'tuple' object does not support item assignment
The mutation in step 2 happened before the assignment attempt in step 3. When the TypeError is raised, the list is already [3, 4, 5, 6].
t = (1, 2, [3, 4])
try:
t[2] += [5, 6]
except TypeError as e:
print(e) # 'tuple' object does not support item assignment
print(t) # (1, 2, [3, 4, 5, 6]) - list was mutated before error!
Lesson: Never use augmented assignment (+=, -=, *=, etc.) on mutable objects stored inside a tuple. The mutation side effect occurs before the assignment exception, leaving the data in a mutated state with an exception raised - a genuinely confusing outcome.
Intermediate - Build a Typed Record System
You are writing a function that parses CSV rows into typed records for a financial data pipeline. Each row contains: trade_id (int), symbol (str), price (float), quantity (int), side (str: "buy" or "sell").
Requirements:
- Records must be usable as dict keys (for deduplication by
trade_id) - Callers must be able to unpack:
trade_id, symbol, price, qty, side = record - Type checkers must know that
record.priceisfloat - Records must be immutable after creation
Implement the record type and a parse_csv_row(line: str) -> TradeRecord function.
Solution
from typing import NamedTuple, Literal
class TradeRecord(NamedTuple):
trade_id: int
symbol: str
price: float
quantity: int
side: Literal["buy", "sell"]
def notional_value(self) -> float:
"""Total value of this trade in dollars."""
return self.price * self.quantity
def is_valid(self) -> bool:
return (
self.trade_id > 0
and len(self.symbol) > 0
and self.price > 0
and self.quantity > 0
and self.side in ("buy", "sell")
)
def parse_csv_row(line: str) -> TradeRecord:
"""Parse a CSV line into a TradeRecord.
Expected format: "trade_id,symbol,price,quantity,side"
Example: "1001,AAPL,172.50,100,buy"
"""
parts = line.strip().split(",")
if len(parts) != 5:
raise ValueError(f"Expected 5 fields, got {len(parts)}: {line!r}")
return TradeRecord(
trade_id=int(parts[0]),
symbol=parts[1].strip().upper(),
price=float(parts[2]),
quantity=int(parts[3]),
side=parts[4].strip().lower(),
)
# Usage
row = "1001,AAPL,172.50,100,buy"
record = parse_csv_row(row)
# Named access - readable and type-checked
print(f"{record.symbol}: {record.quantity} shares @ ${record.price:.2f}")
# AAPL: 100 shares @ $172.50
print(f"Notional: ${record.notional_value():,.2f}")
# Notional: $17,250.00
# Tuple unpack - legacy compatible
trade_id, symbol, price, qty, side = record
print(trade_id, symbol) # 1001 AAPL
# Use as dict key (deduplication by trade_id - simplified)
seen_trades: dict[int, TradeRecord] = {}
seen_trades[record.trade_id] = record
# Or use the whole record as a key (hashable - all fields are immutable scalars)
record_set: set[TradeRecord] = set()
record_set.add(record)
print(record in record_set) # True
Why typing.NamedTuple wins here:
trade_id,symbol,price,quantity,sideare all immutable scalars → hashable → usable as dict keys or in setsrecord.priceis known to befloatby mypy/Pyright - type errors caught at analysis timenotional_value()method lives on the class - no need for standalone utility functions- Tuple unpack
trade_id, symbol, price, qty, side = recordstill works for legacy callers
Advanced - Memoization Cache with Tuple Keys
Implement a generic memoization decorator that caches function results using the arguments as the cache key. It must handle lists and dicts in arguments by converting them to hashable equivalents, include a cache statistics counter, and raise a meaningful error when arguments cannot be made hashable.
# Target usage:
@memo
def compute_features(data: list[float], threshold: float) -> list[float]:
return [x for x in data if x > threshold]
r1 = compute_features([1.0, 2.5, 3.0], threshold=2.0) # computed
r2 = compute_features([1.0, 2.5, 3.0], threshold=2.0) # from cache
print(compute_features.cache_stats())
# {'hits': 1, 'misses': 1, 'size': 1}
Solution
import functools
from typing import Any, Callable
def make_hashable(obj: Any) -> Any:
"""Recursively convert common unhashable containers to hashable equivalents.
list → tuple (of recursively converted elements)
dict → tuple of sorted (key, value) pairs
set → frozenset (of recursively converted elements)
Other objects: returned as-is if hashable, raise TypeError if not.
"""
if isinstance(obj, list):
return tuple(make_hashable(item) for item in obj)
elif isinstance(obj, dict):
return tuple(sorted(
(make_hashable(k), make_hashable(v)) for k, v in obj.items()
))
elif isinstance(obj, set):
return frozenset(make_hashable(item) for item in obj)
else:
try:
hash(obj) # raises TypeError if not hashable
return obj
except TypeError:
raise TypeError(
f"Cannot make argument of type {type(obj).__name__!r} hashable. "
f"Value: {obj!r}"
) from None
def memo(func: Callable) -> Callable:
"""Memoization decorator supporting list/dict arguments via tuple conversion."""
cache: dict[tuple, Any] = {}
hits = 0
misses = 0
@functools.wraps(func)
def wrapper(*args, **kwargs):
nonlocal hits, misses
# Build a single hashable cache key from all arguments
try:
hashable_args = tuple(make_hashable(a) for a in args)
hashable_kwargs = tuple(
sorted((k, make_hashable(v)) for k, v in kwargs.items())
)
key = (hashable_args, hashable_kwargs)
except TypeError as e:
raise TypeError(
f"Cannot memoize call to {func.__name__!r}: {e}"
) from None
if key in cache:
hits += 1
return cache[key]
misses += 1
result = func(*args, **kwargs)
cache[key] = result
return result
def cache_stats() -> dict[str, int]:
return {"hits": hits, "misses": misses, "size": len(cache)}
def cache_clear() -> None:
nonlocal hits, misses
cache.clear()
hits = 0
misses = 0
wrapper.cache_stats = cache_stats # type: ignore[attr-defined]
wrapper.cache_clear = cache_clear # type: ignore[attr-defined]
return wrapper
# Demonstration
@memo
def compute_features(data: list[float], threshold: float) -> list[float]:
print(f" Computing for data={data}, threshold={threshold}")
return [x for x in data if x > threshold]
r1 = compute_features([1.0, 2.5, 3.0], threshold=2.0) # Computing...
r2 = compute_features([1.0, 2.5, 3.0], threshold=2.0) # From cache
r3 = compute_features([1.0, 2.5, 3.0], threshold=1.0) # Computing... (different threshold)
print(f"r1 = {r1}") # [2.5, 3.0]
print(f"r2 = {r2}") # [2.5, 3.0]
print(compute_features.cache_stats())
# {'hits': 1, 'misses': 2, 'size': 2}
# Keyword argument order independence
r4 = compute_features(threshold=2.0, data=[1.0, 2.5, 3.0]) # From cache - same key!
print(compute_features.cache_stats())
# {'hits': 2, 'misses': 2, 'size': 2}
Key design decisions explained:
- The cache key is
((hashable_args...), ((k1, v1), (k2, v2)...))- a tuple of two tuples. This ensures both positional and keyword arguments contribute to the key. - Keyword arguments are sorted by key name before hashing - so
f(a=1, b=2)andf(b=2, a=1)produce the same cache key. make_hashablerecurses so nested structures likelist[list[int]]are handled correctly.nonlocalallows the closure to mutate the counter integers from the outer scope.functools.wrapspreserves__name__,__doc__,__annotations__, and__module__on the wrapper.- This pattern appears in real production systems: FastAPI's dependency injection, pytest's fixture caching, and Django's
cache_pagedecorator all use similar argument-to-key conversion strategies.
Quick Reference
| Operation | Code | Complexity | Notes |
|---|---|---|---|
| Create literal | t = (1, 2, 3) | O(1) if constants | Compiler folds constant tuples |
| Create from iterable | tuple(iterable) | O(n) | Converts any iterable |
| Create empty | t = () | O(1) | Returns CPython singleton |
| Single element | t = (42,) | O(1) | Trailing comma required |
| Index access | t[i] | O(1) | Direct pointer arithmetic |
| Slice | t[i:j] | O(j-i) | Returns new tuple |
| Unpack | a, b = t | O(n) | Count must match |
| Splat unpack | first, *rest = t | O(n) | rest is a list |
| Length | len(t) | O(1) | Stored in ob_size |
| Membership | x in t | O(n) | Linear scan - use set for repeated |
| Concatenate | t1 + t2 | O(n+m) | Returns new tuple |
| Repeat | t * k | O(nk) | Returns new tuple |
| Hash | hash(t) | O(n) | Hashes all elements recursively |
| As dict key | d[t] = v | O(n) hash + O(1) lookup | All elements must be hashable |
| Named field | nt.field_name | O(1) | Property calls self[index] |
_replace | nt._replace(x=1) | O(n) | Returns new namedtuple |
_asdict | nt._asdict() | O(n) | Returns dict |
_fields | NT._fields | O(1) | Tuple of field name strings |
| Size in bytes | sys.getsizeof(t) | O(1) | ~40 + 8 per element (64-bit) |
| Compare equal | t1 == t2 | O(n) | Element-wise left-to-right |
Key Takeaways
- A CPython tuple is a single heap allocation - its pointer array is stored inline inside the struct using a flexible array member. There is no
allocatedcapacity field and no separate pointer block. This is why tuples are smaller and have better cache locality than equivalent lists. - Tuple literals of compile-time constants are evaluated once at compile time and stored as bytecode constants. Execution is a single
LOAD_CONSTinstruction - typically 4-8x faster than equivalent list creation. - Immutability is shallow: the tuple's reference slots cannot be reassigned, but the mutable objects those slots point to can still be mutated.
t = ([1,2],[3,4]); t[0].append(99)silently works and changest's visible content. - A tuple is hashable only when every element - recursively - is hashable. Any mutable object anywhere in the element graph breaks hashability and prevents use as a dict key or set member.
- A single-element tuple requires a trailing comma:
(42,)- without it,(42)is just the integer42in parentheses. The comma, not the parentheses, creates the tuple. typing.NamedTupleis the modern preferred form for named records - it provides type annotations, default values, method support, and IDE auto-complete, with zero extra memory per instance compared to a plain tuple.- Choose tuple for heterogeneous fixed-arity records (database rows, coordinates, API responses); choose list for homogeneous variable-length sequences (collections of items that grow or shrink). This semantic contract is the primary reason to pick one over the other.
- Multiple return values from functions are tuples:
return a, bpacks into(a, b)automatically. CPython optimizes simple two-variable swapsa, b = b, ato useROT_TWObytecode with no heap allocation.
