Memory References and Aliasing - Python's Object Reference Model
Reading time: ~24 minutes | Level: Foundation → Engineering
Here is a question that reveals whether you understand how Python really manages memory.
import sys
x = []
print(sys.getrefcount(x)) # What prints here?
y = x
z = x
print(sys.getrefcount(x)) # And here?
del y
del z
print(sys.getrefcount(x)) # And here?
If you expected 1, 3, 1 - you are close but missing one reference. The actual output:
2
4
2
The count is always one higher than you expect because sys.getrefcount() itself holds a temporary reference to the object while evaluating the argument.
Understanding reference counts - why they increment, when they drop to zero, and what the garbage collector does about cycles - is the foundation of understanding Python's memory model. Memory leaks in Python are real, and they all trace back to these mechanics.
What You Will Learn
- How CPython implements reference counting via
ob_refcntin thePyObjectC struct - The two-tier memory management system: reference counting + cyclic garbage collector
- What aliasing means: multiple names pointing to the same object, and when it causes bugs
id()function in CPython: the memory address interpretation and its limitationsisvs==: the identity vs equality distinction and when each is correct- Weak references (
weakrefmodule): breaking cycles without preventing garbage collection - What
delactually does - removes the name binding, does not destroy the object sys.getrefcount(obj): interpreting the count and why it is always higher than expected- Memory leaks in Python: circular references, global caches, closures holding large objects
- ASCII diagrams of reference graphs for nested and circular structures
- Real-world implications: list slices create new objects, generator expressions do not
Prerequisites
- Python variables and name binding
- Understanding of mutable vs immutable types
- Basic familiarity with Python classes and instances
Part 1 - CPython's Memory Management: Reference Counting
The PyObject Struct
In CPython, every Python object is represented in C memory by a structure that begins with a common header. The relevant portion:
/* From CPython: Include/object.h */
typedef struct _object {
Py_ssize_t ob_refcnt; /* Reference count */
PyTypeObject *ob_type; /* Pointer to type object */
} PyObject;
Every Python object - integer, string, list, instance of a custom class - starts with these two fields. ob_refcnt is the reference count: the number of places in the program that currently hold a reference to this object.
How Reference Counting Works
Every time a new reference to an object is created, CPython calls Py_INCREF(obj) which increments ob_refcnt. Every time a reference is removed, CPython calls Py_DECREF(obj) which decrements ob_refcnt. When ob_refcnt reaches zero, the object is immediately deallocated.
import sys
import ctypes
# Watch the reference count change
x = object() # ob_refcnt = 1 (bound to x)
y = x # ob_refcnt = 2 (also bound to y)
print(sys.getrefcount(x)) # 3 (x, y, and getrefcount's own arg)
z = [x] # ob_refcnt = 4 (also in z[0])
print(sys.getrefcount(x)) # 4 (x, y, z[0], getrefcount's arg)
del y # ob_refcnt = 3 (y binding removed)
del z # ob_refcnt = 2 (z list deleted → its contents decrefd)
print(sys.getrefcount(x)) # 2 (x, getrefcount's arg)
# When del x is called, ob_refcnt drops to 0 → object deallocated immediately
What Increments the Reference Count
x = [1, 2, 3] # refcount = 1
# Assignment to new name: +1
y = x # refcount = 2
# Appending to container: +1
lst = [x] # refcount = 3
# Function argument: +1 (during the call)
def show(obj):
# obj is a local reference to x - ob_refcnt incremented on call
print(sys.getrefcount(obj)) # 5 during call (x, y, lst[0], obj, getrefcount arg)
show(x)
# After return: local 'obj' is deleted → ob_refcnt decremented
# Attribute assignment: +1
class Container:
pass
c = Container()
c.data = x # refcount = 4 (c.data holds a reference)
Part 2 - The Two-Tier System: Reference Counting + Cyclic GC
Why Reference Counting Alone is Not Enough
Reference counting fails for circular references - objects that reference each other, forming a cycle. Even with no external references, their reference counts never reach zero.
import gc
# Create a circular reference
a = []
b = []
a.append(b) # a holds a reference to b → b.refcount = 2 (b + a[0])
b.append(a) # b holds a reference to a → a.refcount = 2 (a + b[0])
# Now delete the external references
del a # a.refcount drops to 1 (only b[0] holds it)
del b # b.refcount drops to 1 (only a[0] holds it)
# Neither ever reaches 0! They keep each other alive - memory leaked!
No external references exist - but refcounts are not 0. Reference counting alone cannot collect this cycle.
The Cyclic Garbage Collector
CPython includes a supplementary garbage collector (gc module) that detects and collects unreachable cycles. It runs periodically and looks for "islands" of objects that reference each other but are not reachable from any live variable.
import gc
class Node:
def __init__(self, name):
self.name = name
self.partner = None # Will be set to create cycle
# Create a cycle
n1 = Node("alpha")
n2 = Node("beta")
n1.partner = n2 # n1 → n2
n2.partner = n1 # n2 → n1 - cycle!
del n1
del n2
# n1 and n2's objects still exist in memory due to the cycle
# Force garbage collection
collected = gc.collect()
print(f"Garbage collected: {collected} objects")
# Output: Garbage collected: 4 objects (or similar - the cycle members + their dicts)
GC Configuration
import gc
# Check if gc is enabled
print(gc.isenabled()) # True by default
# Get current collection thresholds
# (generation 0 threshold, gen 1 threshold, gen 2 threshold)
print(gc.get_threshold()) # (700, 10, 10)
# Objects are promoted to generation 1 after surviving 700 gen-0 collections
# Objects are promoted to generation 2 after surviving 10 gen-1 collections
# Manual collection
gc.collect(0) # Collect generation 0 only (youngest, fastest)
gc.collect(1) # Collect generations 0 and 1
gc.collect() # Collect all generations (slowest, most thorough)
# Disable GC in performance-critical sections (only if you're sure there are no cycles)
gc.disable()
# ... performance-critical code with no circular references ...
gc.enable()
gc.collect() # Sweep up any cycles created during the disabled window
:::tip Performance Consideration
The cyclic garbage collector introduces periodic latency "stop-the-world" pauses. In high-throughput applications, you can tune gc.set_threshold() or disable gc during hot paths, but only after profiling confirms it is the bottleneck. Most Python programs should leave gc defaults alone.
:::
Part 3 - Aliasing: Multiple Names, One Object
What Aliasing Means
Aliasing occurs when two or more names in a program refer to the same object in memory.
a = [1, 2, 3]
b = a # b is an alias for a
print(a is b) # True - same object
print(id(a) == id(b)) # True - same memory address
b.append(4) # Mutation via b
print(a) # [1, 2, 3, 4] - visible through a because it is the same object
Aliasing in Function Calls
def clear_list(lst):
lst.clear() # Mutates the list object - visible to caller!
# lst = [] # This would rebind the LOCAL name lst - NOT visible to caller
data = [1, 2, 3, 4, 5]
clear_list(data)
print(data) # [] - the original list was cleared via aliasing
Aliasing in Nested Structures
The most dangerous aliasing is in nested structures where you do not realize objects are shared:
# This is the classic matrix aliasing bug
matrix = [[0] * 3] * 3
# What's in memory?
print(matrix[0] is matrix[1]) # True! Same list object!
print(matrix[1] is matrix[2]) # True! Same list object!
matrix[0][0] = 9
print(matrix)
# [[9, 0, 0], [9, 0, 0], [9, 0, 0]] - all three "rows" changed!
Part 4 - id() and is vs ==
id() in CPython
In CPython, id(obj) returns the memory address of the object. This address is the value of the pointer held by the reference.
x = "hello"
print(id(x)) # e.g., 140234567891456 - actual memory address in CPython
# Two objects at the same address cannot coexist simultaneously
# But addresses CAN be reused after deallocation!
a = [1, 2, 3]
address = id(a)
del a # a is deallocated (if no other references)
b = [4, 5, 6] # New allocation may reuse the same address
print(id(b) == address) # Possibly True - address reuse!
:::warning id() Uniqueness is Temporal, Not Absolute
id() only guarantees that two live objects have different ids. After an object is deallocated, its memory address can be reused by a new object. Never store id() values for later comparison - the original object may have been deallocated and the address reused.
:::
is vs ==: Identity vs Equality
# == : value equality (calls __eq__)
a = [1, 2, 3]
b = [1, 2, 3]
print(a == b) # True - same values
print(a is b) # False - different objects in memory
# is : identity (same ob_refcnt increment, same memory address)
c = a
print(a is c) # True - same object
# Correct uses of 'is':
x = None
if x is None: # Correct - None is a singleton
print("is None")
y = True
if y is True: # Technically works but use == for booleans
print("is True")
# WRONG use of 'is':
n = 1000
m = 1000
if n is m: # WRONG - may be True or False depending on CPython internals
print("same") # Do not rely on this! Use n == m
| Expression | Result | Why |
|---|---|---|
a = [1,2,3] / b = [1,2,3] | - | Different objects: id(a)=0x7f1234, id(b)=0x7f5678 |
c = a | - | Same object: id(c)=0x7f1234 |
a == b | True | Values are equal |
a is b | False | Different objects in memory |
a == c | True | Values equal (trivially - same object) |
a is c | True | Same object in memory |
Part 5 - The del Keyword: Unbinding, Not Destroying
del removes a name binding. It does not directly destroy the object. The object is only destroyed when its reference count reaches zero.
import sys
x = [1, 2, 3]
y = x # ob_refcnt = 2
del x # Removes the name 'x'. ob_refcnt drops to 1.
# The list is NOT destroyed - y still holds it.
print(y) # [1, 2, 3] - still alive via y
print(sys.getrefcount(y)) # 2 (y + getrefcount arg)
del y # ob_refcnt drops to 0 → list IS destroyed
# y = ... # NameError: name 'y' is not defined
del vs list.clear() vs = []
data = [1, 2, 3]
other = data # alias
# Option 1: del removes the name binding
del data # 'data' no longer exists; 'other' still references the list
# print(data) # NameError
# Option 2: .clear() empties the list object in place
data = [1, 2, 3]
other = data
data.clear() # Empties the ONE list object both names reference
print(other) # [] - other sees the change!
# Option 3: = [] rebinds the name to a new empty list
data = [1, 2, 3]
other = data
data = [] # Rebinds 'data' to a new list; 'other' is unchanged
print(other) # [1, 2, 3] - original list, unchanged
print(data) # [] - new empty list
:::note del in dict and list
del d[key] removes a key from a dict - this is not removing a name binding, it is calling dict.__delitem__(key). del lst[i] removes an element from a list - calls list.__delitem__(i). These operate on the object, not on a name binding. Only del variable_name removes a name from a namespace.
:::
Part 6 - sys.getrefcount(): Reading Reference Counts
import sys
x = [1, 2, 3]
print(sys.getrefcount(x)) # 2 (x + getrefcount's temporary arg)
# Why always n+1? The function call creates a temporary reference
# to the argument before the function executes.
y = x
print(sys.getrefcount(x)) # 3 (x, y, getrefcount's arg)
def hold(obj):
print(sys.getrefcount(obj)) # 4 during call (x, y, obj param, getrefcount arg)
hold(x)
print(sys.getrefcount(x)) # 3 again after hold() returns (x, y, getrefcount arg)
Reference Count Contributions
import sys
lst = []
# Start: 2 (lst + getrefcount)
print(sys.getrefcount(lst)) # 2
# Put it in a container
container = [lst]
print(sys.getrefcount(lst)) # 3 (lst, container[0], getrefcount)
# Put it in a dict
d = {"key": lst}
print(sys.getrefcount(lst)) # 4 (lst, container[0], d["key"], getrefcount)
# Assign to another name
alias = lst
print(sys.getrefcount(lst)) # 5
# Remove from container
container.pop()
print(sys.getrefcount(lst)) # 4
# Clear the dict
d.clear()
print(sys.getrefcount(lst)) # 3
# Remove alias
del alias
print(sys.getrefcount(lst)) # 2 (back to just lst + getrefcount)
Part 7 - Weak References: Breaking Cycles Without Preventing Collection
A weak reference is a reference to an object that does not increment the reference count. If the object's only remaining references are weak references, it is eligible for garbage collection.
import weakref
class ExpensiveObject:
def __init__(self, data):
self.data = data
def __del__(self):
print(f"ExpensiveObject deallocated: {self.data}")
# Normal (strong) reference
obj = ExpensiveObject("important data")
print(f"refcount after creation: {import_sys().getrefcount(obj) - 1}") # 1
# Create a weak reference
weak = weakref.ref(obj)
print(f"Is alive: {weak() is not None}") # True - obj is still alive
# The weak reference can be called to get the object (or None if dead)
retrieved = weak()
print(retrieved.data) # "important data"
del retrieved # Release this strong reference
print(f"Is alive: {weak() is not None}") # True - 'obj' still holds it
del obj # Remove the last strong reference
# ExpensiveObject deallocated: important data ← printed here
print(f"Is alive: {weak() is not None}") # False - object is gone
print(weak()) # None
Practical Use: WeakValueDictionary for Caches
The most common production use of weakref is building caches that do not prevent garbage collection of the cached objects.
import weakref
class WeakCache:
"""Cache that allows entries to be garbage-collected when no longer used."""
def __init__(self):
self._cache = weakref.WeakValueDictionary()
def get(self, key):
return self._cache.get(key) # Returns None if object was collected
def set(self, key, value):
self._cache[key] = value # Weak reference - won't prevent GC
# Usage
cache = WeakCache()
class BigModel:
def __init__(self, name):
self.name = name
model_a = BigModel("resnet50")
cache.set("resnet50", model_a)
print(cache.get("resnet50")) # <BigModel object>
del model_a # Last strong reference removed
# model_a is eligible for GC immediately
import gc; gc.collect()
print(cache.get("resnet50")) # None - automatically evicted from cache!
Weak References to Break Circular Reference Memory Leaks
import weakref
class TreeNode:
def __init__(self, value):
self.value = value
self.children = []
self.parent = None # Strong reference to parent → creates cycle!
class TreeNodeFixed:
def __init__(self, value):
self.value = value
self.children = []
self._parent_ref = None # Weak reference to parent
@property
def parent(self):
if self._parent_ref is None:
return None
return self._parent_ref() # Call to get the object (may return None)
@parent.setter
def parent(self, node):
self._parent_ref = weakref.ref(node) if node is not None else None
# With weak references, deleting the tree root allows the whole tree to be collected
root = TreeNodeFixed("root")
child = TreeNodeFixed("child")
root.children.append(child)
child.parent = root # Weak reference - does not prevent root from being GC'd
print(child.parent.value) # "root" - weak ref still valid
del root # If root is the last strong reference, it can be collected
Part 8 - Memory Leaks in Python
Despite garbage collection, Python programs can leak memory. The common causes:
Leak Type 1: Unbounded Global Caches
# Global cache that grows forever - classic memory leak
_RESULT_CACHE = {}
def get_result(key):
if key not in _RESULT_CACHE:
_RESULT_CACHE[key] = compute(key) # Stored forever - never evicted
return _RESULT_CACHE[key]
# Fix: use functools.lru_cache with a size limit
from functools import lru_cache
@lru_cache(maxsize=1000)
def get_result_safe(key):
return compute(key) # Automatically evicts when > 1000 entries
Leak Type 2: Closures Holding Large Objects
import sys
def make_adder(large_data):
# large_data is captured by the closure
total = sum(large_data)
def adder(x):
return x + total # Only uses 'total', but large_data is also kept alive!
# Because: closures capture entire local scopes in Python 2
# In Python 3, free variables are captured, but...
return adder
big_list = list(range(10_000_000)) # ~80 MB
adder = make_adder(big_list)
del big_list # Try to free the memory
# In CPython, closures capture free variables, not the entire frame
# But if 'large_data' is referenced inside the function, it IS captured
# SAFE version: extract only what you need before closing over it
def make_adder_safe(large_data):
total = sum(large_data)
# large_data is NOT captured by the closure - only total is
del large_data # Explicitly release reference before returning closure
def adder(x):
return x + total # Only captures 'total' - a single int
return adder
Leak Type 3: Circular References with __del__
import gc
class Problematic:
def __del__(self):
print("Cleaning up...") # Has a finalizer
# Circular reference involving objects with __del__ is especially problematic
# because CPython's cyclic GC cannot collect objects with __del__ in cycles
# (Python 3.4+ resolves this with PEP 442, but it's still worth understanding)
a = Problematic()
b = Problematic()
a.partner = b
b.partner = a
del a
del b
# gc.collect() will collect these in Python 3.4+ (PEP 442 resolved the issue)
# In Python 2, objects with __del__ in cycles were uncollectable!
gc.collect()
Diagnosing Memory Leaks
import gc
import sys
def find_leaks():
"""Diagnostic: find objects that survived multiple GC cycles."""
gc.collect()
gc.collect()
gc.collect() # Three passes to ensure old-generation is cleaned
# Get all live objects
all_objects = gc.get_objects()
# Count by type
from collections import Counter
type_counts = Counter(type(obj).__name__ for obj in all_objects)
for type_name, count in type_counts.most_common(20):
print(f" {count:>8,} {type_name}")
# Use objgraph for more detailed leak analysis
# pip install objgraph
# import objgraph
# objgraph.show_most_common_types(limit=10)
# objgraph.show_backrefs(obj, max_depth=3)
Part 9 - Reference Graph for Complex Structures
Understanding how objects form graphs in memory is essential for diagnosing leaks and aliasing bugs.
import sys
# A moderately complex structure
data = {
"users": [
{"name": "Alice", "scores": [85, 90, 88]},
{"name": "Bob", "scores": [72, 68, 75]},
],
"config": {"version": 2, "active": True},
}
Reference Count Contributions per Object
import sys
alice_record = {"name": "Alice", "scores": [85, 90, 88]}
data = {"users": [alice_record], "config": {}}
# alice_record is referenced by:
# 1. The name 'alice_record' in this scope
# 2. data["users"][0] (the list inside "users" holds a reference)
print(sys.getrefcount(alice_record)) # 3 (alice_record, list[0], getrefcount arg)
# The scores list is referenced by:
# 1. alice_record["scores"]
scores = alice_record["scores"]
print(sys.getrefcount(scores)) # 3 (scores local name, alice_record["scores"], getrefcount arg)
Part 10 - Slices Create New Objects; Generators Do Not
Understanding whether an operation creates a new object or shares the existing one is crucial for memory analysis.
import sys
# Python list slices CREATE new list objects
original = [1, 2, 3, 4, 5]
sliced = original[1:4]
print(original is sliced) # False - new list object
print(id(original) == id(sliced)) # False
# But the ELEMENTS are shared (shallow copy)
print(original[1] is sliced[0]) # True - same integer object (small int cache)
# Modifying the slice does NOT affect original (for immutable elements)
sliced[0] = 99
print(original) # [1, 2, 3, 4, 5] - unchanged
print(sliced) # [99, 3, 4]
# Generator expressions do NOT create a new collection
original_data = range(1_000_000)
# This creates NO new object until iterated
gen = (x * x for x in original_data)
print(sys.getsizeof(gen)) # ~200 bytes - just the generator state
# vs list comprehension - creates the full list immediately
lst = [x * x for x in original_data]
print(sys.getsizeof(lst)) # ~8+ MB
NumPy Views vs Python Copies
import numpy as np
# CRITICAL DIFFERENCE: NumPy slices are VIEWS, not copies
arr = np.array([1, 2, 3, 4, 5])
view = arr[1:4] # Creates a VIEW of the same memory
print(np.shares_memory(arr, view)) # True - same underlying buffer!
view[0] = 99
print(arr) # [ 1 99 3 4 5] - original modified through the view!
# Python list slice: independent copy
lst = [1, 2, 3, 4, 5]
sliced = lst[1:4] # Independent copy
sliced[0] = 99
print(lst) # [1, 2, 3, 4, 5] - unchanged
:::danger NumPy Views vs Python Copies
Python list slices always create independent copies. NumPy array slices create views that share memory with the original. Modifying a NumPy slice modifies the original. This is one of the most common bugs in data science code. Use arr[1:4].copy() to get an independent NumPy array.
:::
Part 11 - Real-World Debugging with gc and sys
Finding Unexpected References
import gc
import sys
def find_referrers(obj):
"""Find all objects that hold a reference to obj."""
gc.collect() # Ensure GC has run
referrers = gc.get_referrers(obj)
print(f"Object: {obj!r}")
print(f"Reference count: {sys.getrefcount(obj) - 1}") # -1 for getrefcount's own ref
print(f"Referrers ({len(referrers)}):")
for ref in referrers:
print(f" type={type(ref).__name__}: {repr(ref)[:100]}")
# Example: find what's keeping a large dict alive
large_dict = {"data": list(range(10_000))}
find_referrers(large_dict)
Checking for Cycles
import gc
# Enable gc debug flags to see cycle collection in action
# gc.set_debug(gc.DEBUG_LEAK) # Prints when uncollectable garbage is found
# gc.set_debug(gc.DEBUG_STATS) # Prints GC statistics each collection
# Check how many objects are in each generation
print(f"Gen 0: {len(gc.get_objects(generation=0))} objects") # Python 3.8+
print(f"Gen 1: {len(gc.get_objects(generation=1))} objects")
print(f"Gen 2: {len(gc.get_objects(generation=2))} objects")
# Count uncollectable garbage (objects that can't be freed)
gc.collect()
print(f"Garbage: {len(gc.garbage)} uncollectable objects")
# gc.garbage lists objects that GC found in cycles but couldn't collect
# (e.g., objects with __del__ in Python 2, or bugs in C extensions)
Interview Questions
Q1: How does CPython's reference counting work, and what is its fundamental limitation?
Answer: Every CPython object contains an ob_refcnt field in its PyObject header. When a new reference to the object is created (assignment, function call, storing in a container), CPython calls Py_INCREF() to increment ob_refcnt. When a reference is removed (variable goes out of scope, del, container cleared), Py_DECREF() is called. When ob_refcnt reaches zero, the object is immediately deallocated - no garbage collection pass needed. The fundamental limitation: circular references. If A references B and B references A, both have ob_refcnt ≥ 1 even with no external references. They never reach zero. CPython supplements reference counting with a cyclic garbage collector (gc module) to detect and collect such unreachable cycles.
Q2: What does del x actually do in Python?
Answer: del x removes the name binding x from the current namespace (a STORE_FAST is reversed to a DELETE_FAST in bytecode). It does not directly destroy the object that x was pointing to. The object is only destroyed when its ob_refcnt drops to zero, which happens only when all references to it are removed. If another name y = x was created before del x, the object remains alive via y. del x followed by using x raises NameError. del lst[i] and del d[key] are different - they call __delitem__, removing an element from a container rather than removing a name from a namespace.
Q3: What is the difference between is and ==?
Answer: is tests identity: it checks whether two names point to the same object in memory (id(a) == id(b)). == tests equality: it calls a.__eq__(b) which compares values. Two distinct objects can be equal (==) without being identical (is): [1,2,3] == [1,2,3] is True but [1,2,3] is [1,2,3] is False (two different list objects). The only correct uses of is are: comparing to None (x is None), comparing to True/False in specific type checks, and explicit identity checks where object sameness matters (e.g., "is this the same callback object?"). Never use is to compare integers, strings, or other value types - the result depends on CPython implementation details like integer caching and string interning.
Q4: How do weak references break circular reference cycles?
Answer: A weak reference (weakref.ref(obj)) points to an object without incrementing its ob_refcnt. The object can be garbage-collected even if weak references to it exist - when the last strong reference is removed, the object is deallocated and all weak references to it become None. In a parent-child tree, if parent → children are strong references (parent owns children) and child → parent is a weak reference (child merely observes parent), then the reference graph has no strong cycle. Deleting the root parent allows the entire subtree to be collected via reference counting alone, without needing the cyclic GC. weakref.WeakValueDictionary is used to build caches where entries are automatically removed when no other strong reference exists.
Q5: What causes a memory leak in Python, and how would you diagnose one?
Answer: Python memory leaks arise from: (1) Unbounded caches - dicts that grow without eviction (use functools.lru_cache or weakref.WeakValueDictionary). (2) Circular references with finalizers - in older Python (< 3.4), objects with __del__ in a cycle were uncollectable. (3) Closures holding large objects - a closure captures all free variables, including large objects that the inner function only partly uses. (4) Event listeners / callbacks - registering a callback in a global event system keeps the object alive indefinitely. Diagnosis tools: gc.get_objects() to count live objects by type, gc.get_referrers(obj) to find what's keeping an object alive, sys.getsizeof() for size measurement, and third-party libraries like objgraph for visual reference graphs.
Q6: Why does sys.getrefcount(x) always return at least 2 for a live object?
Answer: sys.getrefcount(x) evaluates its argument before calling the function. The evaluation of x passes a reference to the list into the function call, creating a temporary reference for the duration of the call. This is standard Python calling convention: when you write func(x), CPython increments ob_refcnt for the argument before entering the function body, and decrements it after the function returns. So the reference count inside sys.getrefcount is always at least 1 (the permanent binding x) + 1 (the temporary argument reference) = 2. The rule: "subtract 1 from sys.getrefcount() to get the actual reference count."
Practice Challenges
Beginner - Trace the Reference Count
Trace the reference count of the list [1, 2, 3] through each line:
import sys
a = [1, 2, 3] # Line 1
b = a # Line 2
c = [a, a] # Line 3
del b # Line 4
c.pop() # Line 5
del a # Line 6
# How many references does [1, 2, 3] have after line 6?
Solution
Line 1: a = [1,2,3] → refcount = 1 (just 'a')
Line 2: b = a → refcount = 2 (a + b)
Line 3: c = [a, a] → refcount = 4 (a + b + c[0] + c[1])
Line 4: del b → refcount = 3 (a + c[0] + c[1])
Line 5: c.pop() → refcount = 2 (a + c[0], one removed from c)
Line 6: del a → refcount = 1 (only c[0])
After line 6, the list [1, 2, 3] has 1 reference - held by c[0]. It will not be garbage collected until c is cleared or goes out of scope. Note: sys.getrefcount() would show 2 due to its own temporary reference.
Intermediate - Fix the Memory Leak
The following server stores session data. Identify the memory leak and fix it.
import threading
import time
class SessionStore:
def __init__(self):
self._sessions = {} # session_id → user data
self._lock = threading.Lock()
def create_session(self, session_id: str, user_data: dict) -> None:
with self._lock:
self._sessions[session_id] = user_data
def get_session(self, session_id: str) -> dict | None:
with self._lock:
return self._sessions.get(session_id)
def delete_session(self, session_id: str) -> None:
with self._lock:
self._sessions.pop(session_id, None)
# Simulate: 100,000 sessions created, only 10% explicitly deleted
store = SessionStore()
for i in range(100_000):
store.create_session(f"session_{i}", {"user_id": i, "data": "x" * 1000})
# Only 10% get deleted
for i in range(0, 100_000, 10):
store.delete_session(f"session_{i}")
# 90,000 sessions remain in memory indefinitely!
Solution
The leak: _sessions is an unbounded dict. Sessions that are never explicitly deleted (due to client disconnection, crash, etc.) remain in memory forever. In a production server handling millions of requests, this exhausts memory.
Fixed version with TTL-based eviction:
import threading
import time
import weakref
from dataclasses import dataclass, field
from typing import Optional
@dataclass
class Session:
session_id: str
user_data: dict
created_at: float = field(default_factory=time.monotonic)
last_accessed: float = field(default_factory=time.monotonic)
ttl_seconds: float = 1800.0 # 30 minutes
def is_expired(self) -> bool:
return time.monotonic() - self.last_accessed > self.ttl_seconds
def touch(self) -> None:
self.last_accessed = time.monotonic()
class TTLSessionStore:
"""Session store with automatic TTL-based expiration."""
def __init__(self, ttl_seconds: float = 1800.0, cleanup_interval: float = 300.0):
self._sessions: dict[str, Session] = {}
self._lock = threading.Lock()
self._ttl = ttl_seconds
self._cleanup_interval = cleanup_interval
self._start_cleanup_thread()
def _start_cleanup_thread(self):
"""Background thread to evict expired sessions."""
def cleanup_loop():
while True:
time.sleep(self._cleanup_interval)
self._evict_expired()
t = threading.Thread(target=cleanup_loop, daemon=True)
t.start()
def _evict_expired(self):
with self._lock:
expired = [sid for sid, s in self._sessions.items() if s.is_expired()]
for sid in expired:
del self._sessions[sid]
if expired:
print(f"Evicted {len(expired)} expired sessions")
def create_session(self, session_id: str, user_data: dict) -> None:
with self._lock:
self._sessions[session_id] = Session(
session_id=session_id,
user_data=user_data,
ttl_seconds=self._ttl,
)
def get_session(self, session_id: str) -> Optional[dict]:
with self._lock:
session = self._sessions.get(session_id)
if session is None:
return None
if session.is_expired():
del self._sessions[session_id]
return None
session.touch() # Update last access time
return session.user_data
def delete_session(self, session_id: str) -> None:
with self._lock:
self._sessions.pop(session_id, None)
def __len__(self) -> int:
with self._lock:
return len(self._sessions)
Alternative: use cachetools for production (battle-tested TTL cache)
from cachetools import TTLCache
import threading
class ProductionSessionStore:
def __init__(self, maxsize=100_000, ttl=1800):
# TTLCache automatically evicts entries older than ttl seconds
# maxsize prevents unbounded growth
self._cache = TTLCache(maxsize=maxsize, ttl=ttl)
self._lock = threading.Lock()
def create_session(self, session_id, user_data):
with self._lock:
self._cache[session_id] = user_data
def get_session(self, session_id):
with self._lock:
return self._cache.get(session_id)
Advanced - Implement a Reference-Tracking Allocator
Implement a TrackedObject mixin that tracks how many instances are alive and can detect when reference counts drop unexpectedly. Then demonstrate it can catch a circular reference memory leak.
# Goal: count live instances, detect potential leaks
class TrackedObject:
# ... implement this
class Node(TrackedObject):
def __init__(self, value):
self.value = value
self.next = None
# After this code, TrackedObject.live_count() should report correctly
n1 = Node(1)
n2 = Node(2)
n1.next = n2 # safe - no cycle
# TrackedObject.live_count() → 2
n1.next = n1 # cycle!
del n1
del n2
# Without gc.collect(), live_count may still report 2 (cycle leak)
# After gc.collect(), it should report 0
Solution
import gc
import sys
import weakref
from threading import Lock
class TrackedObject:
"""
Mixin that tracks live instances using weak references.
Uses WeakSet so tracking does not prevent garbage collection.
"""
_instances: "weakref.WeakSet[TrackedObject]" = weakref.WeakSet()
_lock = Lock()
def __init_subclass__(cls, **kwargs):
super().__init_subclass__(**kwargs)
# Each subclass gets its own WeakSet
cls._instances = weakref.WeakSet()
cls._lock = Lock()
def __init__(self):
with self.__class__._lock:
self.__class__._instances.add(self)
@classmethod
def live_count(cls) -> int:
"""Return the number of currently live instances."""
return len(cls._instances)
@classmethod
def check_for_leaks(cls) -> dict:
"""Run GC and check if live count changed."""
before = cls.live_count()
gc.collect()
after = cls.live_count()
return {
"before_gc": before,
"after_gc": after,
"leaked": before - after,
}
class Node(TrackedObject):
def __init__(self, value):
super().__init__() # Register with tracker
self.value = value
self.next = None
def __repr__(self):
return f"Node({self.value})"
# Test 1: Normal usage - no leak
n1 = Node(1)
n2 = Node(2)
n1.next = n2
print(f"Live nodes: {Node.live_count()}") # 2
del n1, n2
gc.collect()
print(f"Live nodes after del: {Node.live_count()}") # 0
# Test 2: Circular reference - leak detection
n1 = Node(10)
n2 = Node(20)
n1.next = n2
n2.next = n1 # Cycle!
del n1
del n2
leak_report = Node.check_for_leaks()
print(f"\nLeak report: {leak_report}")
# {'before_gc': 2, 'after_gc': 0, 'leaked': 2}
# 'before_gc' = 2 (reference counting couldn't collect the cycle)
# 'after_gc' = 0 (cyclic GC successfully collected them)
# 'leaked' = 2 (how many were collected by GC rather than refcounting)
print(f"Live nodes after GC: {Node.live_count()}") # 0
# Test 3: Mixed - some live, some leaked
a = Node(100) # Will stay alive
b = Node(200)
c = Node(300)
b.next = c
c.next = b # Cycle between b and c
del b, c # Try to delete b and c
print(f"\nAfter del b,c - live nodes: {Node.live_count()}") # 3 (a + b,c in cycle)
gc.collect()
print(f"After gc.collect() - live nodes: {Node.live_count()}") # 1 (only a)
del a
gc.collect()
print(f"After del a, gc - live nodes: {Node.live_count()}") # 0
Key design decisions:
WeakSetstores weak references to instances - the tracker does not prevent GC__init_subclass__gives each subclass its own separateWeakSet(not shared)check_for_leaks()compares count before and aftergc.collect()- the "leaked" count shows how many objects were held alive by cycles vs held by reference counting- Thread-safe via
LocksinceWeakSetis not thread-safe by default
Quick Reference
| Operation | Effect on ob_refcnt | Object Destroyed? |
|---|---|---|
a = obj | +1 | No |
del a | -1 | Only if count reaches 0 |
a = other | -1 for old, +1 for new | Old destroyed if count reaches 0 |
lst.append(obj) | +1 | No |
lst.pop() | -1 | If count reaches 0 |
func(obj) (during call) | +1 | No |
| Return from func | -1 | If count reaches 0 |
weakref.ref(obj) | 0 (no change) | Yes (when strong refs gone) |
| Tool | Purpose | Key Method |
|---|---|---|
sys.getrefcount(obj) | Current reference count | Returns count (subtract 1 for own ref) |
id(obj) | Object identity (memory address in CPython) | Compare with is |
gc.collect() | Force cyclic garbage collection | Returns # of unreachable objects |
gc.get_referrers(obj) | Find what holds a reference to obj | Returns list of referrers |
gc.get_objects() | All tracked objects | Useful for leak diagnosis |
weakref.ref(obj) | Create weak reference | Call result to get obj or None |
weakref.WeakValueDictionary | Dict that doesn't prevent GC of values | Auto-evicts when values collected |
Key Takeaways
- CPython uses reference counting (
ob_refcntinPyObject) as the primary memory management mechanism - when a count reaches zero, the object is immediately deallocated, no GC pass needed - Reference counting cannot handle circular references - CPython's supplementary cyclic GC (
gcmodule) detects unreachable islands of objects with non-zero reference counts - Aliasing - multiple names pointing to the same object - is the root cause of unintended mutation bugs; use
id()oristo detect aliasing del xremoves the name binding, not the object - the object lives as long as any reference to it existssys.getrefcount(x)always returns at least 2 - subtract 1 to get the real count (the extra 1 is the temporary argument reference)- Weak references (
weakref.ref,WeakValueDictionary) point to objects without incrementingob_refcnt, enabling caches and parent pointers that do not prevent garbage collection - Python list slices create new list objects (but share element references); NumPy array slices create views - a critical difference that causes silent data corruption
- Memory leaks in Python typically come from unbounded global caches, closures capturing large objects unnecessarily, and circular references involving objects with
__del__(pre-Python 3.4)
