Mutable vs Immutable - Designing Reliable Data Structures
Reading time: ~22 minutes | Level: Foundation → Engineering
Here is one of the most common bugs in Python, appearing in production systems every day.
def add_tag(tags=[]):
tags.append("processed")
return tags
result1 = add_tag()
result2 = add_tag()
result3 = add_tag()
print(result1)
print(result2)
print(result3)
If you expected each call to return ["processed"], you have the wrong mental model about how Python handles default arguments.
The actual output:
['processed']
['processed', 'processed']
['processed', 'processed', 'processed']
Default mutable arguments are created once when the function is defined, not on each call. The same list object is reused - and grows - across every invocation.
This bug, and dozens like it, stem from a misunderstanding of Python's object model and the difference between mutability and rebinding.
What You Will Learn
- The three pillars of the Python object model:
id(),type(),value - What immutability actually means: you cannot change the object, only rebind the name
- Complete classification of mutable and immutable built-in types
- The
id()trick: small integer caching (-5 to 256) and string interning - Pass-by-object-reference: how Python function parameters actually work
- The mutable default argument anti-pattern - the cause and the canonical fix
- How immutability enables safe concurrent access without locks
@dataclass(frozen=True): enforcing immutability in custom classes- Why
+=behaves fundamentally differently for mutable vs immutable objects - Why string concatenation in a loop is O(n²) and the correct O(n) alternatives
Prerequisites
- Python variables and assignment
- Understanding of Python functions and parameters
- Basic familiarity with lists, dicts, and tuples
Part 1 - The Python Object Model: Three Pillars
Every Object Has Three Properties
In CPython, every object has exactly three properties:
x = 42
print(id(x)) # Identity - memory address in CPython (e.g., 140234567)
print(type(x)) # Type - <class 'int'>
print(x) # Value - 42
These three properties are fixed at creation time. For immutable types, the value cannot change. For mutable types, the value can change in place.
What "Immutable" Really Means
Immutability is a property of the object, not the variable name.
x = 42 # x binds to the integer object with value 42
x = 43 # x is rebound to a DIFFERENT integer object with value 43
# The original integer 42 was NOT changed - it cannot be
y = [1, 2, 3]
y.append(4) # The list object IS changed in place - [1, 2, 3, 4]
# y still binds to the same object (same id)
Part 2 - The Complete Type Classification
Immutable Types
These types cannot be changed after creation. Any operation that appears to "modify" them creates a new object:
# int - immutable
n = 100
id_before = id(n)
n += 1
print(id(n) == id_before) # False - n now points to a new int object (101)
# float - immutable
f = 3.14
# bool - immutable (bool is a subclass of int)
b = True
print(isinstance(b, int)) # True
print(b + 1) # 2
# str - immutable
s = "hello"
id_before = id(s)
s += " world"
print(id(s) == id_before) # False - new string object created
# tuple - immutable (but elements may be mutable!)
t = (1, 2, 3)
# t[0] = 99 - TypeError: 'tuple' object does not support item assignment
# frozenset - immutable
fs = frozenset({1, 2, 3})
# fs.add(4) - AttributeError: no attribute 'add'
# bytes - immutable
data = b"hello"
# data[0] = 65 - TypeError
Mutable Types
These types can be changed in place. The object's value changes, but the object's identity (id()) stays the same:
# list - mutable
lst = [1, 2, 3]
id_before = id(lst)
lst.append(4)
print(id(lst) == id_before) # True - same object, modified in place
# dict - mutable
d = {"a": 1}
id_before = id(d)
d["b"] = 2
print(id(d) == id_before) # True - same object
# set - mutable
s = {1, 2, 3}
s.add(4)
# bytearray - mutable (contrast with bytes)
ba = bytearray(b"hello")
ba[0] = 72 # Change first byte - no error, modified in place
print(ba) # bytearray(b'Hello')
# Custom class instances - mutable by default
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
p = Point(1, 2)
p.x = 99 # Mutable - attribute reassignment works
Part 3 - The id() Trick: Integer Caching and String Interning
Small Integer Caching (-5 to 256)
CPython pre-allocates integer objects for the range -5 to 256. When you create a small integer, Python returns the cached object rather than creating a new one.
a = 100
b = 100
print(a is b) # True - same cached object!
print(id(a) == id(b)) # True
a = 257
b = 257
print(a is b) # False (usually) - different objects outside the cache range
print(id(a) == id(b)) # False
This is why is comparisons on integers are unreliable - they give different results depending on the value.
CPython pre-allocates integer objects for the range [-5, 256]:
| Index | Object | Returned for |
|---|---|---|
| 0 | IntObject(-5) | a = -5 |
| 1 | IntObject(-4) | a = -4 |
| ... | ... | ... |
| 261 | IntObject(256) | a = 256 |
Any int in [-5, 256]: Python returns the cached object. Any int outside: Python creates a new object.
:::danger Never Use is to Compare Integers
a is b tests identity (same object in memory). a == b tests equality (same value). For integers, 42 is 42 is True due to caching, but 1000 is 1000 may be False. Always use == for value comparison. Reserve is for None, True, False, and explicit identity checks.
:::
String Interning
Python automatically interns short strings that look like identifiers (letters, digits, underscores). String literals in source code are often interned within the same module.
# Interned strings (looks like an identifier)
a = "hello"
b = "hello"
print(a is b) # True in CPython - same interned object
# Not interned (contains spaces)
a = "hello world"
b = "hello world"
print(a is b) # False (usually) - separate objects
# Explicit interning
import sys
a = sys.intern("my_custom_string_for_dict_key")
b = sys.intern("my_custom_string_for_dict_key")
print(a is b) # True - explicitly interned
# Use sys.intern() for strings used as dictionary keys in hot paths
# Identity comparison (is) is faster than equality comparison (==)
Part 4 - Pass-by-Object-Reference
Python is neither "pass by value" nor "pass by reference" in the traditional C sense. It is pass by object reference (also called "pass by assignment" or "call by sharing").
When you call a function, the function parameter becomes a new name that binds to the same object as the argument.
def modify(items, count):
items.append(99) # Modifies the list object (mutable) - visible outside
count += 1 # Creates a new int object - local rebinding only
print(f"Inside: items={items}, count={count}")
my_list = [1, 2, 3]
my_count = 0
modify(my_list, my_count)
print(f"Outside: my_list={my_list}, my_count={my_count}")
Output:
Inside: items=[1, 2, 3, 99], count=1
Outside: my_list=[1, 2, 3, 99], my_count=0
Key Rule
- Mutable argument: mutations to the object are visible to the caller (because caller and callee share the same object)
- Immutable argument: the function can only rebind its local name - the caller's binding is unaffected
- Reassigning the parameter:
items = [99]in the function body would rebind the local nameitems, not the caller'smy_list
Part 5 - The Mutable Default Argument Anti-Pattern
This is the single most common Python gotcha. Python evaluates default argument values once, at function definition time, not on each call.
# THE BUG
def add_tag(tag, tags=[]): # This list is created ONCE
tags.append(tag)
return tags
print(add_tag("alpha")) # ['alpha']
print(add_tag("beta")) # ['alpha', 'beta'] - the default list grew!
print(add_tag("gamma")) # ['alpha', 'beta', 'gamma']
The default list [] is stored as an attribute of the function object and shared across all calls that use it.
# Proof: inspect the function's default arguments
print(add_tag.__defaults__) # (['alpha', 'beta', 'gamma'],) - the accumulated list!
The Canonical Fix: Use None as Sentinel
def add_tag(tag, tags=None):
if tags is None:
tags = [] # Create a fresh list on each call
tags.append(tag)
return tags
print(add_tag("alpha")) # ['alpha']
print(add_tag("beta")) # ['beta'] - fresh list each time
print(add_tag("gamma")) # ['gamma']
# If you WANT to share across calls, be explicit about it
shared_list = ["initial"]
print(add_tag("alpha", shared_list)) # ['initial', 'alpha']
print(add_tag("beta", shared_list)) # ['initial', 'alpha', 'beta']
:::warning The Sentinel Pattern Is Mandatory
Using mutable default arguments (lists, dicts, sets) is a well-known anti-pattern. Every serious Python linter (pylint, flake8, mypy) flags it. The sentinel None pattern is the universal fix. Never use def f(items=[]), def f(config={}), or def f(visited=set()) in production code.
:::
When Mutable Defaults Are Intentional
There is one legitimate use case: function-level caches that should persist across calls:
def memoized_fib(n, _cache={0: 0, 1: 1}):
"""Fibonacci with a mutable default dict as a persistent cache."""
if n not in _cache:
_cache[n] = memoized_fib(n - 1, _cache) + memoized_fib(n - 2, _cache)
return _cache[n]
print(memoized_fib(10)) # 55
print(memoized_fib.__defaults__[0]) # {0: 0, 1: 1, 2: 1, 3: 2, ..., 10: 55}
This is intentional and documented - the mutable default is used as a persistent cache, not as a per-call fresh container. Prefer functools.lru_cache in production code.
Part 6 - The += Operator: Mutable vs Immutable Behavior
The += operator behaves fundamentally differently depending on the type it is applied to.
For Immutable Types: += Creates a New Object
x = 42
print(id(x)) # e.g., 9788544
x += 1
print(id(x)) # Different - x now points to a new int object (43)
# The original int 42 is unchanged; x was rebound
For Mutable Types: += Modifies In Place (via __iadd__)
lst = [1, 2, 3]
id_before = id(lst)
lst += [4, 5] # Calls list.__iadd__([4, 5]) - extends in place!
print(id(lst) == id_before) # True - same object
print(lst) # [1, 2, 3, 4, 5]
# Compare with +: creates a new list
lst2 = [1, 2, 3]
id_before = id(lst2)
lst2 = lst2 + [4, 5] # Creates a new list!
print(id(lst2) == id_before) # False - new object
The Critical Bug: += on Immutable Tuple Elements
t = ([1, 2], [3, 4])
# Can we += on a tuple element?
# t[0] += [99] - What happens?
try:
t[0] += [99]
except TypeError as e:
print(f"Error: {e}")
print(t)
Output:
Error: 'tuple' object does not support item assignment
...
([1, 2, 99], [3, 4])
This is the strangest behavior in Python. The += on t[0]:
- Calls
list.__iadd__([99])- succeeds, extendst[0]in place - Tries to assign the result back to
t[0]- fails (tuple is immutable)
So the list was modified, but the TypeError was still raised. Both things happened.
:::danger Mutable Objects in Tuples
Never use += on a mutable object stored inside an immutable container (tuple). The mutation succeeds but the assignment raises an exception, leaving the program in an inconsistent state. If you need a mutable sequence in a tuple-like structure, reconsider your data model.
:::
Part 7 - Immutability and Safe Concurrent Access
Immutable objects are inherently thread-safe. Because they cannot be changed, multiple threads can read them simultaneously with no risk of data races.
import threading
# SAFE: Sharing immutable data between threads
CONFIG = {
"host": "localhost", # str - immutable
"port": 5432, # int - immutable
"name": ("db", "v2"), # tuple of str - immutable
}
def read_config():
# No lock needed - CONFIG values are immutable, reading is safe
return CONFIG["host"], CONFIG["port"]
threads = [threading.Thread(target=read_config) for _ in range(100)]
for t in threads:
t.start()
for t in threads:
t.join()
# No race conditions - immutable data is concurrency-safe
# UNSAFE: Sharing mutable data between threads without locks
shared_list = []
def append_items():
for i in range(1000):
shared_list.append(i) # NOT thread-safe!
threads = [threading.Thread(target=append_items) for _ in range(4)]
# This can corrupt shared_list due to race conditions on the list's internals
# SAFE: Use a lock for mutable shared state
import threading
shared_list = []
lock = threading.Lock()
def append_items_safe():
for i in range(1000):
with lock:
shared_list.append(i)
Part 8 - Immutable Custom Classes with @dataclass(frozen=True)
Python's dataclasses module provides frozen=True, which makes a dataclass immutable after construction. The fields become read-only, and the class becomes hashable (usable as a dict key or set member).
from dataclasses import dataclass, field
import dataclasses
@dataclass(frozen=True)
class Point:
x: float
y: float
p = Point(1.0, 2.0)
print(p.x) # 1.0
# p.x = 99.0 # FrozenInstanceError: cannot assign to field 'x'
# Frozen dataclasses are hashable - can be used as dict keys or in sets
points = {p, Point(3.0, 4.0)}
cache = {p: "result_at_this_point"}
@dataclass(frozen=True)
class Config:
host: str
port: int
tags: tuple = field(default_factory=tuple) # Must use tuple, not list!
# tags: list = field(default_factory=list) # Would make it unhashable
def with_host(self, new_host: str) -> "Config":
"""Return a modified copy - the only way to 'change' a frozen instance."""
return dataclasses.replace(self, host=new_host)
base = Config(host="localhost", port=5432, tags=("primary",))
prod = base.with_host("prod-db.internal")
print(base.host) # "localhost" - unchanged
print(prod.host) # "prod-db.internal"
print(base == prod) # False - different values
print(hash(base)) # Works - frozen dataclasses are hashable
:::tip frozen=True Forces Good Design
When you use @dataclass(frozen=True), Python forces you to think about mutations as "return a new object with this change" rather than "modify this object". This is the functional programming style applied to Python, and it eliminates entire categories of aliasing bugs.
:::
Part 9 - String Concatenation: Why O(n²) and the Fix
The O(n²) Anti-Pattern
Strings are immutable. Every concatenation creates a new string object, copying all characters from both operands.
# BAD: O(n²) string building
parts = ["apple", "banana", "cherry", "date"] # imagine 10,000 parts
result = ""
for part in parts:
result += part # Each += creates a new string, copying everything so far
# Iteration 1: copies "apple" → cost: 5 chars
# Iteration 2: copies "applebanana" → cost: 11 chars
# Iteration 3: copies "applebananacherry" → cost: 17 chars
# ...
# Total copies: 5 + 11 + 17 + ... = O(n²)
For 10,000 strings of average length 10:
- Total cost ≈ 10,000 × 10,000 / 2 = 50 million character copies
# GOOD: O(n) string building with join
result = "".join(parts) # All parts joined in one pass - O(total length)
# Also good: str.join with a generator
result = ", ".join(part.upper() for part in parts)
# For building up a complex string incrementally, use a list then join
parts = []
for chunk in data_source:
parts.append(process(chunk))
result = "".join(parts)
Benchmark
import timeit
words = ["word"] * 10_000
def concat_plus():
result = ""
for w in words:
result += w
return result
def concat_join():
return "".join(words)
t_plus = timeit.timeit(concat_plus, number=100)
t_join = timeit.timeit(concat_join, number=100)
print(f"'+=' concatenation: {t_plus:.3f}s")
print(f"'join' method: {t_join:.3f}s")
print(f"Speedup: {t_plus/t_join:.1f}x")
Typical output:
'+=' concatenation: 0.812s
'join' method: 0.042s
Speedup: 19.3x
str.join is roughly 20x faster for 10,000 strings. The gap grows quadratically with input size.
:::note CPython String Optimization
CPython has an optimization for str += str in some cases (when the string's reference count is 1), but this optimization is fragile and disappears in many common patterns. Do not rely on it. Always use str.join for building strings from multiple parts.
:::
Part 10 - Immutability Patterns in Production Systems
Pattern 1: Named Tuples for Lightweight Immutable Records
from typing import NamedTuple
class DatabaseRecord(NamedTuple):
id: int
name: str
score: float
def with_score(self, new_score: float) -> "DatabaseRecord":
return self._replace(score=new_score)
# Immutable, hashable, memory-efficient (no __dict__)
record = DatabaseRecord(id=1, name="Alice", score=85.0)
updated = record.with_score(90.0)
print(record.score) # 85.0 - unchanged
print(updated.score) # 90.0
# Can be used as dict keys
cache = {record: "processed"}
Pattern 2: Tuple-Keyed Caches
# Using immutable tuples as cache keys - works because tuples are hashable
_cache: dict[tuple, float] = {}
def expensive_computation(x: float, y: float, z: float) -> float:
key = (x, y, z) # Tuple is immutable - safe and hashable as dict key
if key not in _cache:
_cache[key] = x ** 2 + y ** 2 + z ** 2 # simulate expensive work
return _cache[key]
# Lists cannot be used as keys - TypeError: unhashable type: 'list'
# key = [x, y, z] # Would fail on _cache[key]
Pattern 3: Functional-Style Record Updates
from dataclasses import dataclass, replace
@dataclass(frozen=True)
class UserState:
user_id: str
score: int
level: int
badges: tuple[str, ...]
def award_badge(state: UserState, badge: str) -> UserState:
"""Return a new state with the badge added - never mutate in place."""
return replace(state, badges=state.badges + (badge,))
def level_up(state: UserState) -> UserState:
"""Return a new state with incremented level."""
return replace(state, level=state.level + 1, score=state.score + 100)
# Each operation returns a new state - old states are preserved
state0 = UserState(user_id="U001", score=0, level=1, badges=())
state1 = award_badge(state0, "first_login")
state2 = level_up(state1)
state3 = award_badge(state2, "level_2_achieved")
# Full audit trail - all states preserved
print(state0.badges) # ()
print(state1.badges) # ('first_login',)
print(state3.badges) # ('first_login', 'level_2_achieved')
Interview Questions
Q1: What does "immutable" mean in Python, and does it mean the variable can't change?
Answer: Immutability is a property of the object, not the variable. An immutable object cannot change its value or internal state after creation. When you write x = 42; x = 43, you are not mutating the integer 42 - you are creating a new integer object (43) and rebinding the name x to it. The integer 42 still exists in memory until its reference count drops to zero. The variable x is just a name - it can be rebound to any object at any time. What cannot happen is some_int.value = 99 - integer objects have no mutable fields.
Q2: Explain the mutable default argument bug and the canonical fix.
Answer: Python evaluates default argument values at function definition time, not at each call. If you write def f(lst=[]), the empty list [] is created once and stored as f.__defaults__[0]. Every call that uses the default gets the same list object. Mutations accumulate across calls. The fix: use None as a sentinel value for mutable defaults, and create a fresh mutable object inside the function body: def f(lst=None): if lst is None: lst = []. This creates a new empty list on each call that uses the default.
Q3: How does += behave differently for int vs list?
Answer: For immutable types like int, x += 1 is equivalent to x = x + 1 - it creates a new integer object and rebinds x to it. The original integer is unchanged. id(x) will differ before and after. For mutable types like list, lst += [4] calls list.__iadd__([4]), which extends the list in place and returns the same list object. id(lst) is the same before and after. The difference matters when the mutable object is referenced through multiple names: b = a; a += [4] will cause b to also reflect the change for lists, but not for ints.
Q4: Why does Python cache small integers in the range -5 to 256?
Answer: CPython pre-allocates integer objects for values -5 through 256 at interpreter startup. This optimization exists because small integers are used extremely frequently in Python programs (loop counters, array indices, boolean values). By caching them, Python avoids repeatedly allocating and deallocating identical integer objects, reducing memory pressure and improving performance. The consequence: a = 100; b = 100; a is b is True because both names point to the same cached object. For values outside this range, a = 1000; b = 1000; a is b is typically False. This is why you must use == for value comparison, never is.
Q5: Why is string concatenation in a loop O(n²)?
Answer: Strings are immutable. Every result += piece creates a new string by allocating memory for len(result) + len(piece) characters and copying all characters from both strings into the new allocation. On iteration k, where result has accumulated k pieces of average length L, the copy costs k × L operations. Summing across all n iterations: L(1 + 2 + 3 + ... + n) = L × n(n+1)/2 = O(n²). The fix is "".join(pieces_list) - it calculates the total length in one pass, allocates once, then fills in each piece. Total cost: O(total characters) = O(n × L).
Q6: How does @dataclass(frozen=True) enforce immutability and what does it enable?
Answer: @dataclass(frozen=True) generates __setattr__ and __delattr__ methods that raise FrozenInstanceError on any attempt to set or delete an attribute after construction. It also generates __hash__ based on the fields (since frozen instances are reliably immutable, their hash cannot change). This enables: (1) using instances as dictionary keys, (2) storing in sets, (3) safe concurrent access without locks, (4) architectural clarity that an object represents a value, not a mutable entity. To "modify" a frozen dataclass, use dataclasses.replace(instance, field=new_value), which returns a new instance.
Practice Challenges
Beginner - Predict the Output
What does this print? Explain each line.
a = [1, 2, 3]
b = a
b += [4]
print(a)
print(b)
print(a is b)
x = 100
y = x
y += 1
print(x)
print(y)
print(x is y)
Solution
Output:
[1, 2, 3, 4]
[1, 2, 3, 4]
True
100
101
False
Explanation:
b = a makes b an alias for the same list object. b += [4] calls list.__iadd__([4]) which extends the list in place. Since a and b point to the same list, both see [1, 2, 3, 4]. a is b is True - the += did not create a new list.
y = x makes y another name for the integer object 100. y += 1 creates a new integer object 101 and rebinds y to it. x still points to 100. x is y is False because they now point to different integer objects (100 and 101).
Intermediate - Fix the Anti-Pattern
The following class has a mutable default argument bug that causes data to accumulate across instances. Identify the bug, explain why it happens, and rewrite the class correctly.
class RequestBuilder:
def __init__(self, method, path, headers={}):
self.method = method
self.path = path
self.headers = headers
def add_header(self, key, value):
self.headers[key] = value
return self
def build(self):
return {
"method": self.method,
"path": self.path,
"headers": self.headers,
}
# Usage
req1 = RequestBuilder("GET", "/api/users")
req1.add_header("Authorization", "Bearer token_1")
req2 = RequestBuilder("POST", "/api/items") # New request, no headers?
print(req2.build()) # What does this print?
Solution
Current output (buggy):
{'method': 'POST', 'path': '/api/items', 'headers': {'Authorization': 'Bearer token_1'}}
req2 sees req1's headers! Both req1.headers and req2.headers reference the same default dict {} that was created when the class was defined.
Fixed version:
from typing import Optional
class RequestBuilder:
def __init__(self, method: str, path: str, headers: Optional[dict] = None):
self.method = method
self.path = path
# Create a fresh dict for each instance - never share the default
self.headers = dict(headers) if headers is not None else {}
def add_header(self, key: str, value: str) -> "RequestBuilder":
self.headers[key] = value
return self
def build(self) -> dict:
return {
"method": self.method,
"path": self.path,
"headers": dict(self.headers), # Defensive copy on output
}
# Test correctness
req1 = RequestBuilder("GET", "/api/users")
req1.add_header("Authorization", "Bearer token_1")
req2 = RequestBuilder("POST", "/api/items")
print(req2.build()) # {'method': 'POST', 'path': '/api/items', 'headers': {}}
print(req1.build()) # {'method': 'GET', 'path': '/api/users', 'headers': {'Authorization': 'Bearer token_1'}}
# Each instance has its own independent headers dict
Bonus - immutable version using frozen dataclass:
from dataclasses import dataclass, replace, field
@dataclass(frozen=True)
class Request:
method: str
path: str
headers: tuple[tuple[str, str], ...] = field(default_factory=tuple)
def with_header(self, key: str, value: str) -> "Request":
"""Return a new Request with the header added."""
return replace(self, headers=self.headers + ((key, value),))
def headers_dict(self) -> dict:
return dict(self.headers)
req1 = Request("GET", "/api/users").with_header("Authorization", "Bearer token_1")
req2 = Request("POST", "/api/items")
print(req1.headers_dict()) # {'Authorization': 'Bearer token_1'}
print(req2.headers_dict()) # {}
Advanced - Design an Immutable Event Log
Design an EventLog class for a game state machine. Requirements:
- The log must be truly immutable once an event is appended - no mutation of existing entries
- "Appending" returns a new
EventLoginstance - The log must be hashable (to use as a dict key for caching game states)
- Support slicing:
log.last(n)returns the lastnevents as a newEventLog - Benchmark: compare your immutable log against a mutable list for 10,000 append operations
@dataclass(frozen=True)
class GameEvent:
tick: int
event_type: str
payload: tuple # Must be immutable for hashability
Solution
from dataclasses import dataclass, field
import timeit
import hashlib
@dataclass(frozen=True)
class GameEvent:
tick: int
event_type: str
payload: tuple # tuple of immutable values - hashable
@dataclass(frozen=True)
class EventLog:
"""
Immutable append-only event log backed by a tuple.
Each "append" returns a new EventLog - original is never modified.
Hashable: can be used as a dict key or stored in a set.
"""
_events: tuple[GameEvent, ...] = field(default_factory=tuple)
def append(self, event: GameEvent) -> "EventLog":
"""Return a new EventLog with the event added."""
return EventLog(_events=self._events + (event,))
def last(self, n: int) -> "EventLog":
"""Return a new EventLog with only the last n events."""
return EventLog(_events=self._events[-n:] if n > 0 else ())
def __len__(self) -> int:
return len(self._events)
def __getitem__(self, index):
return self._events[index]
def __iter__(self):
return iter(self._events)
def replay(self, until_tick: int) -> "EventLog":
"""Return log filtered to events up to a given tick."""
return EventLog(_events=tuple(e for e in self._events if e.tick <= until_tick))
# Test correctness
log0 = EventLog()
log1 = log0.append(GameEvent(tick=1, event_type="MOVE", payload=(0, 0, 1, 0)))
log2 = log1.append(GameEvent(tick=2, event_type="ATTACK", payload=(1, "sword")))
log3 = log2.append(GameEvent(tick=3, event_type="PICKUP", payload=(2, "shield")))
print(f"log0 length: {len(log0)}") # 0 - original unchanged
print(f"log3 length: {len(log3)}") # 3
# Immutable - each is a distinct object
print(log0 is log1) # False
print(log1 is log2) # False
# Hashable - can be used as dict key
state_cache = {}
state_cache[log3] = "game_state_snapshot"
print(state_cache[log3]) # "game_state_snapshot"
# Slicing
last_two = log3.last(2)
print(len(last_two)) # 2
print(last_two[0].event_type) # "ATTACK"
# Replay up to tick 2
early_log = log3.replay(until_tick=2)
print(len(early_log)) # 2 (ticks 1 and 2 only)
# --- Benchmark: immutable log vs mutable list ---
def immutable_append_test():
log = EventLog()
for i in range(1000):
log = log.append(GameEvent(tick=i, event_type="TICK", payload=(i,)))
return log
def mutable_append_test():
log = []
for i in range(1000):
log.append(GameEvent(tick=i, event_type="TICK", payload=(i,)))
return log
t_immutable = timeit.timeit(immutable_append_test, number=100)
t_mutable = timeit.timeit(mutable_append_test, number=100)
print(f"\nImmutable EventLog (1000 appends × 100): {t_immutable:.3f}s")
print(f"Mutable list (1000 appends × 100): {t_mutable:.3f}s")
print(f"Overhead ratio: {t_immutable/t_mutable:.1f}x")
Expected output:
Immutable EventLog (1000 appends × 100): 0.847s
Mutable list (1000 appends × 100): 0.009s
Overhead ratio: 94.1x
Design analysis:
- Each "append" allocates a new tuple (copies all N existing events + 1 new one) → O(n) per append → O(n²) total for n appends
- This is the fundamental cost of immutability: no in-place mutation means copying
- For game state logging, this is acceptable: logs are typically small (< 10,000 events), and the benefits (hashability, no aliasing bugs, safe concurrent reads) outweigh the cost
- For high-frequency logging (>10,000 events), use a persistent data structure library or batch immutable snapshots
Quick Reference
| Type | Mutable? | Hashable? | Examples |
|---|---|---|---|
int | No | Yes | 42, -5, 0 |
float | No | Yes | 3.14, 1.0 |
bool | No | Yes | True, False |
str | No | Yes | "hello", "" |
tuple | No* | Yes* | (1, 2), () |
frozenset | No | Yes | frozenset({1,2}) |
bytes | No | Yes | b"data" |
list | Yes | No | [1, 2, 3] |
dict | Yes | No | {"a": 1} |
set | Yes | No | {1, 2, 3} |
bytearray | Yes | No | bytearray(b"x") |
dataclass(frozen=True) | No | Yes | Custom records |
*Tuple is immutable and hashable only if all its elements are also immutable.
| Scenario | Use | Why |
|---|---|---|
| Dict key or set member | Immutable type | Must be hashable |
| Shared across threads | Immutable type | No race condition possible |
| Function default container | None sentinel | Avoid shared mutable default |
| Large evolving dataset | list or dict | In-place mutation is efficient |
| Configuration object | @dataclass(frozen=True) | Read-only, hashable, explicit |
| String building from parts | str.join(parts) | O(n) vs O(n²) for += |
| Cache key with multiple fields | tuple | Immutable, hashable, ordered |
Key Takeaways
- Python's object model has three pillars:
id()(identity),type(), and value - immutability means the value cannot change after the object is created - Rebinding a name (
x = 43) never modifies an object - it redirects the name to a different object; immutable objects appear to "change" only because the name gets rebound to a new object - Python is pass-by-object-reference: functions receive a binding to the same object - mutations to mutable arguments are visible to callers; rebinding the parameter is not
- The mutable default argument bug is caused by default values being evaluated once at definition time - always use
Noneas sentinel and create fresh containers inside the function body - For immutable types,
+=creates a new object and rebinds; for mutable types,+=calls__iadd__and modifies in place - understanding this difference prevents subtle aliasing bugs - String concatenation in a loop is O(n²) - always use
"".join(list_of_strings)which is O(n) by allocating once and filling in @dataclass(frozen=True)enforces immutability for custom classes, enables hashability, and communicates design intent - use it for value objects, configuration records, and anything shared across threads
