Python Memory References Practice Problems & Exercises
Practice: Memory References and Aliasing
← Back to lessonEasy
Predict the output of each comparison. Think carefully about when Python creates new objects versus reusing existing ones.
a = [1, 2, 3] b = a print(a is b) c = [1, 2, 3] print(a is c) print(a == c) x = None y = None print(x is y) p = [10, 20] q = [10, 20] print(p is q)
Solution
True
False
True
True
False
How it works:
a is bisTruebecauseb = amakesban alias — both names point to the same list object in memory.a is cisFalsebecausec = [1, 2, 3]creates a brand new list object, even though the contents are identical.a == cisTruebecause==compares values, and both lists contain[1, 2, 3].x is yisTruebecauseNoneis a singleton — there is exactly oneNoneobject in the interpreter.p is qisFalsebecause each[10, 20]literal creates a separate list object.
Key insight: Use is only for singletons (None, True, False) or when you explicitly need identity checks. Use == for value comparison in all other cases.
Expected Output
True\nFalse\nTrue\nTrue\nFalseHints
Hint 1: The `is` operator checks whether two names point to the exact same object in memory (same id).
Hint 2: `==` checks value equality by calling __eq__. Two distinct list objects with the same contents are equal but not identical.
Predict the output at each print call. Remember that sys.getrefcount() adds 1 to the true count.
import sys x = [] print(sys.getrefcount(x)) y = x z = [x] print(sys.getrefcount(x)) del y print(sys.getrefcount(x)) z.clear() print(sys.getrefcount(x))
Solution
2
4
3
2
Step-by-step trace:
x = []— refcount is 1 (justx).sys.getrefcount(x)adds its own temporary reference, so prints2.y = xadds +1.z = [x]adds +1 (stored inz[0]). Now the object has 3 real references (x,y,z[0]). With getrefcount's temporary:4.del yremoves one reference. Real count: 2 (x,z[0]). With getrefcount:3.z.clear()removes the list element, dropping thez[0]reference. Real count: 1 (x). With getrefcount:2.
The +1 rule: Always subtract 1 from sys.getrefcount() to get the actual number of references in your code. The function itself holds a temporary reference to its argument for the duration of the call.
Expected Output
2\n4\n3\n2Hints
Hint 1: sys.getrefcount() always returns one more than the actual count because the function argument itself is a temporary reference.
Hint 2: Storing an object in a list increments its reference count. Removing it from the list decrements it.
Predict the output. Three different ways to "remove" a list — only one actually destroys the data for all aliases.
# Scenario 1: del removes the name, not the object a = [1, 2, 3] b = a del a print(b) # Scenario 2: rebinding creates a new object c = [1, 2, 3] d = c c = [] print(d) # Scenario 3: clear() mutates in place e = [1, 2, 3] f = e e.clear() print(f) # Scenario 4: reassignment does not affect aliases g = [1, 2, 3] h = g g = [4, 5, 6] print(h)
Solution
[1, 2, 3]
[1, 2, 3]
[]
[1, 2, 3]
What each operation actually does:
- Scenario 1 —
del a: Removes the nameafrom the namespace. The list object still exists becausebholds a reference.delnever destroys the object directly. - Scenario 2 —
c = []: Rebinds the namecto a new empty list. The original[1, 2, 3]object is unchanged and still referenced byd. - Scenario 3 —
e.clear(): Mutates the list object in place, emptying it. Sincefis an alias to the same object,falso sees an empty list. - Scenario 4 —
g = [4, 5, 6]: Rebindsgto a completely new list.hstill references the original[1, 2, 3].
Key insight: Only in-place mutation (clear(), append(), extend(), [i] = val) affects all aliases. Rebinding (=) and del only change the name, not the object.
Expected Output
[1, 2, 3]\n[1, 2, 3]\n[]\n[1, 2, 3]Hints
Hint 1: del removes the name binding, not the object. If another name still references the object, it stays alive.
Hint 2: list.clear() mutates the object in place — all aliases see the change. Rebinding with = [] creates a new object.
Predict the output. This is one of the most common aliasing bugs in Python.
# The buggy way bad_matrix = [[0] * 3] * 3 print(bad_matrix[0] is bad_matrix[1]) bad_matrix[0][0] = 9 print(bad_matrix) # The correct way good_matrix = [[0] * 3 for _ in range(3)] print(good_matrix[0] is good_matrix[1]) good_matrix[0][0] = 9 print(good_matrix)
Solution
True
[[9, 0, 0], [9, 0, 0], [9, 0, 0]]
False
[[9, 0, 0], [0, 0, 0], [0, 0, 0]]
Why the bug happens:
[[0] * 3] * 3 first creates a single inner list [0, 0, 0], then the outer * 3 puts three references to that same object in the outer list. All three "rows" are the same object in memory, so modifying one modifies all of them.
The list comprehension [[0] * 3 for _ in range(3)] evaluates [0] * 3 three separate times, creating three independent list objects. Modifying one row has no effect on the others.
Memory picture:
Bad: outer -> [ptr, ptr, ptr] --> [0, 0, 0] (ONE object)
Good: outer -> [ptr1, ptr2, ptr3]
| | |
v v v
[0,0,0] [0,0,0] [0,0,0] (THREE objects)
Rule of thumb: Never use * n to duplicate mutable objects inside a list. Always use a comprehension or explicit loop.
Expected Output
True\n[[9, 0, 0], [9, 0, 0], [9, 0, 0]]\nFalse\n[[9, 0, 0], [0, 0, 0], [0, 0, 0]]Hints
Hint 1: `[[0] * 3] * 3` creates one inner list object and puts three references to it in the outer list.
Hint 2: A list comprehension `[[0] * 3 for _ in range(3)]` creates three independent inner lists.
Medium
Predict the output. Pay attention to which functions mutate the argument and which rebind the local name.
def append_to(lst, val):
lst.append(val)
def replace_list(lst, new_lst):
lst = new_lst # Rebinds the local name only
def extend_copy(lst, val):
copy = list(lst)
copy.append(val)
return copy
data = [1, 2, 3]
append_to(data, 99)
print(data)
alias = data
print(alias)
original = [10, 20]
replace_list(original, [77, 88])
print(original)
result = extend_copy(original, 99)
print(result)Solution
[1, 2, 3, 99]
[1, 2, 3, 99]
[10, 20]
[10, 20, 99]
Analysis:
append_to(data, 99)—lstis an alias fordata. Callinglst.append(val)mutates the original list.datais now[1, 2, 3, 99].alias = datawas set beforeappend_to, but sincealiasanddatapoint to the same object,aliasalso shows[1, 2, 3, 99].replace_list(original, [77, 88])— inside the function,lst = new_lstrebinds the local variablelstto[77, 88]. The caller'soriginalis untouched because rebinding a local name does not affect the caller's reference.extend_copy(original, 99)— creates a new list vialist(lst), appends to the copy, and returns it. The original is unchanged.
Key insight: Python uses "pass by object reference" — the function receives a reference to the same object, not a copy. Mutation through that reference is visible to the caller. Rebinding the local parameter is not.
Expected Output
[1, 2, 3, 99]\n[1, 2, 3, 99]\n[10, 20]\n[10, 20, 99]Hints
Hint 1: When a mutable object is passed to a function, the parameter is an alias to the same object. Mutations via the parameter are visible to the caller.
Hint 2: Rebinding the parameter with = inside the function only changes the local name — the caller does not see the rebinding.
Write a function that traces the reference count of an object through a series of operations, correctly accounting for the sys.getrefcount() overhead. Show how each container operation increments or decrements the count.
import sys
def trace_refcount(label, obj):
count = sys.getrefcount(obj) - 1 # Subtract getrefcount's own reference
print(f"{label:<20}{count}")
def demonstrate_contributions():
obj = object()
trace_refcount("After creation:", obj)
alias = obj
trace_refcount("After alias:", obj)
container = [obj]
trace_refcount("After list store:", obj)
d = {"key": obj}
trace_refcount("After dict store:", obj)
container.pop()
trace_refcount("After list remove:", obj)
d.clear()
trace_refcount("After dict clear:", obj)
del alias
trace_refcount("After del alias:", obj)
demonstrate_contributions()Solution
import sys
def trace_refcount(label, obj):
count = sys.getrefcount(obj) - 1
print(f"{label:<20}{count}")
def demonstrate_contributions():
obj = object()
trace_refcount("After creation:", obj) # 1: just 'obj'
alias = obj
trace_refcount("After alias:", obj) # 2: obj + alias
container = [obj]
trace_refcount("After list store:", obj) # 3: obj + alias + container[0]
d = {"key": obj}
trace_refcount("After dict store:", obj) # 4: obj + alias + container[0] + d["key"]
container.pop()
trace_refcount("After list remove:", obj) # 3: obj + alias + d["key"]
d.clear()
trace_refcount("After dict clear:", obj) # 2: obj + alias
del alias
trace_refcount("After del alias:", obj) # 1: just obj
demonstrate_contributions()
Reference count sources:
Each of these operations increments ob_refcnt:
- Assignment to a name (
alias = obj) - Storage in a list (
container = [obj]orlst.append(obj)) - Storage as a dict value (
d["key"] = obj) - Passing as a function argument (temporary, during the call)
- Storage as an attribute (
instance.attr = obj)
Each of these decrements ob_refcnt:
del namelist.pop(),list.clear(),del lst[i]dict.clear(),del d[key],dict.pop(key)- Function argument going out of scope (when the call returns)
- Rebinding a name (
alias = something_else)
When ob_refcnt reaches 0, CPython immediately deallocates the object.
import sys
def trace_refcount(label, obj):
"""Print the reference count minus the getrefcount overhead."""
# TODO: return the actual reference count (subtract getrefcount's +1)
pass
def demonstrate_contributions():
"""Show how different operations affect reference count."""
obj = object()
# TODO: print refcount after each step:
# 1. After creation (just one name)
# 2. After assigning to another name
# 3. After storing in a list
# 4. After storing in a dict value
# 5. After removing from the list
# 6. After clearing the dict
# 7. After deleting the alias
pass
demonstrate_contributions()Expected Output
After creation: 1\nAfter alias: 2\nAfter list store: 3\nAfter dict store: 4\nAfter list remove: 3\nAfter dict clear: 2\nAfter del alias: 1Hints
Hint 1: sys.getrefcount(obj) always returns the actual count + 1 because the function argument is itself a reference.
Hint 2: Storing an object in a list (append, index assignment) increments refcount. Removing it (pop, clear, del lst[i]) decrements it.
Hint 3: Dict values also hold references. Clearing the dict releases those references.
Write utility functions that detect aliasing bugs in nested data structures. find_shared_references finds all index pairs that point to the same object. is_safe_matrix checks if a matrix has any aliased rows.
def find_shared_references(data):
shared = []
for i in range(len(data)):
for j in range(i + 1, len(data)):
if data[i] is data[j]:
shared.append((i, j))
return shared
def is_safe_matrix(matrix):
for i in range(len(matrix)):
for j in range(i + 1, len(matrix)):
if matrix[i] is matrix[j]:
return False
return True
# Test with aliased structure
row_a = [1, 2, 3]
row_b = [4, 5, 6]
data = [row_a, row_b, row_a, row_b, row_a]
print(f"Shared pairs: {find_shared_references(data)}")
print(f"Is safe: {is_safe_matrix(data)}")
# Fix it
fixed = [list(row) for row in data]
print(f"Fixed matrix safe: {is_safe_matrix(fixed)}")Solution
def find_shared_references(data):
"""Find all pairs of indices sharing the same object."""
shared = []
for i in range(len(data)):
for j in range(i + 1, len(data)):
if data[i] is data[j]:
shared.append((i, j))
return shared
def is_safe_matrix(matrix):
"""Return True if no two rows are the same object."""
for i in range(len(matrix)):
for j in range(i + 1, len(matrix)):
if matrix[i] is matrix[j]:
return False
return True
Output:
Shared pairs: [(0, 2), (0, 4), (2, 4), (1, 3)]
Is safe: False
Fixed matrix safe: True
Why this matters in production:
Aliasing bugs are silent — they do not raise exceptions. The program runs, produces wrong results, and the bug is extremely hard to trace because the mutation happens in a completely different part of the code from where the corrupted data is read.
The fix pattern: list(row) creates a shallow copy of each row, breaking the alias. For deeply nested structures, use copy.deepcopy(). The is_safe_matrix check can be added as a debug assertion:
assert is_safe_matrix(my_matrix), "Matrix has aliased rows — mutation will corrupt data"
Performance note: The find_shared_references function is O(n^2) where n is the number of elements. For large structures, you can optimize by grouping elements by id():
from collections import defaultdict
def find_shared_fast(data):
id_to_indices = defaultdict(list)
for i, item in enumerate(data):
id_to_indices[id(item)].append(i)
return [(indices[i], indices[j])
for indices in id_to_indices.values() if len(indices) > 1
for i in range(len(indices)) for j in range(i + 1, len(indices))]
def find_shared_references(data):
"""Given a list of lists, find all pairs of indices that share
the same inner list object (are aliases, not just equal).
Return a list of (i, j) tuples where i < j and data[i] is data[j].
"""
# TODO: implement
pass
def is_safe_matrix(matrix):
"""Return True if no two rows in the matrix are the same object.
A matrix is 'safe' if modifying one row cannot affect another.
"""
# TODO: implement
passExpected Output
Shared pairs: [(0, 2), (0, 4), (2, 4), (1, 3)]\nIs safe: False\nFixed matrix safe: TrueHints
Hint 1: Use the `is` operator to check object identity, not `==` which checks value equality.
Hint 2: Compare every pair (i, j) where i < j. If data[i] is data[j], they are aliases.
Hint 3: A matrix is safe if no pair of rows shares identity.
Demonstrate the difference between objects with circular references (which require the cyclic GC) and objects without cycles (which are cleaned up by reference counting alone).
import gc
def create_cycle():
gc.disable()
gc.collect() # Clear any existing garbage first
a = []
b = []
a.append(b)
b.append(a)
del a
del b
# a and b keep each other alive — refcount never reaches 0
collected = gc.collect()
gc.enable()
return collected
def create_clean():
gc.disable()
gc.collect()
a = [1, 2, 3]
b = [4, 5, 6]
c = a # alias, no cycle
del a
del b
del c
# All refcounts reach 0 — cleaned up by reference counting
collected = gc.collect()
gc.enable()
return collected
cycle_count = create_cycle()
print(f"Cycle collection: collected > 0: {cycle_count > 0}")
clean_count = create_clean()
print(f"Clean collection: collected == 0: {clean_count == 0}")Solution
import gc
def create_cycle():
gc.disable()
gc.collect() # Clear existing garbage
a = []
b = []
a.append(b) # a[0] -> b, so b.refcount = 2 (b, a[0])
b.append(a) # b[0] -> a, so a.refcount = 2 (a, b[0])
del a # a.refcount = 1 (only b[0])
del b # b.refcount = 1 (only a[0])
# Neither reaches 0 — they hold each other alive
collected = gc.collect()
gc.enable()
return collected
def create_clean():
gc.disable()
gc.collect()
a = [1, 2, 3]
b = [4, 5, 6]
c = a
del a # refcount of [1,2,3] drops to 1 (just c)
del b # refcount of [4,5,6] drops to 0 — immediately deallocated
del c # refcount of [1,2,3] drops to 0 — immediately deallocated
collected = gc.collect()
gc.enable()
return collected
Why the cyclic GC is necessary:
Reference counting is fast and deterministic — objects are freed the instant their refcount hits zero. But it cannot detect "islands" of objects that reference each other with no external references. The cyclic GC uses a "trial deletion" algorithm:
- For every container object tracked by GC, temporarily subtract the internal references.
- Objects whose adjusted count is zero are unreachable — they form an island.
- Collect the entire island at once.
Performance implication: The cyclic GC runs periodically (controlled by gc.get_threshold()), introducing brief "stop-the-world" pauses. For most programs this is negligible. For latency-sensitive applications, you can tune or disable the GC — but only if you are confident no cycles exist.
import gc
def create_cycle():
"""Create a circular reference and return the count of objects
collected when gc.collect() is called.
"""
# TODO:
# 1. Disable automatic GC
# 2. Create two objects that reference each other
# 3. Delete the external references
# 4. Run gc.collect() and return the number of collected objects
# 5. Re-enable GC
pass
def create_clean():
"""Create objects WITHOUT circular references.
Show that reference counting alone handles cleanup.
Return the count from gc.collect() (should be 0 or very small).
"""
# TODO: create and delete objects with no cycles
passExpected Output
Cycle collection: collected > 0: True\nClean collection: collected == 0: TrueHints
Hint 1: Disable GC with gc.disable() before creating cycles so they are not automatically collected.
Hint 2: Two lists that append each other form a cycle: a.append(b); b.append(a)
Hint 3: gc.collect() returns the number of unreachable objects found and collected.
Hard
Implement a WeakCache that stores objects using weak references. Cached objects are automatically evicted when all strong references outside the cache are deleted.
import weakref
import gc
class HeavyObject:
"""Simulates a large object that supports weak references."""
def __init__(self, name):
self.name = name
def __repr__(self):
return f"HeavyObject({self.name})"
class WeakCache:
def __init__(self):
self._store = weakref.WeakValueDictionary()
def get(self, key):
return self._store.get(key)
def put(self, key, value):
self._store[key] = value
def size(self):
return len(self._store)
def contains(self, key):
return key in self._store
cache = WeakCache()
obj_a = HeavyObject("alpha")
obj_b = HeavyObject("beta")
cache.put("alpha", obj_a)
cache.put("beta", obj_b)
print(f"After put: size={cache.size()}, contains alpha: {cache.contains('alpha')}")
print(f"Retrieved: {cache.get('alpha')}")
del obj_a
gc.collect()
print(f"After del + GC: size={cache.size()}, contains alpha: {cache.contains('alpha')}")
print(f"Beta still alive: {cache.contains('beta')}")
del obj_b
gc.collect()
print(f"After del beta + GC: size={cache.size()}")Solution
import weakref
import gc
class HeavyObject:
def __init__(self, name):
self.name = name
def __repr__(self):
return f"HeavyObject({self.name})"
class WeakCache:
def __init__(self):
self._store = weakref.WeakValueDictionary()
def get(self, key):
return self._store.get(key)
def put(self, key, value):
self._store[key] = value
def size(self):
return len(self._store)
def contains(self, key):
return key in self._store
How WeakValueDictionary works internally:
- When you store
cache["key"] = obj, the dictionary creates aweakref.ref(obj, callback)instead of a strong reference. - The callback is triggered when
objis garbage collected — it automatically removes the dictionary entry. cache["key"]transparently dereferences the weak reference, returning the object if it is still alive or raisingKeyErrorif it was collected.
Why basic types do not work: int, str, list, tuple, and dict are implemented in C without the __weakref__ slot that weak references require. Custom classes include __weakref__ by default. You can explicitly prevent it with __slots__ (without including __weakref__).
Production use case: Web frameworks use weak caches for expensive computations (template compilation, query plans). The cache speeds up repeated access, but does not prevent garbage collection when memory pressure is high. This is fundamentally different from functools.lru_cache, which holds strong references and requires an explicit maxsize to bound memory.
import weakref
import gc
class WeakCache:
"""A cache that allows entries to be garbage collected
when no strong references remain outside the cache.
Unlike a regular dict cache, this does not cause memory leaks
for objects that are no longer in use.
"""
def __init__(self):
# TODO: initialize with a WeakValueDictionary
pass
def get(self, key):
"""Return cached object or None if not present / collected."""
pass
def put(self, key, value):
"""Store value in cache (weak reference)."""
pass
def size(self):
"""Return number of live entries."""
pass
def contains(self, key):
"""Check if key is in cache and object is still alive."""
passExpected Output
After put: size=2, contains alpha: True\nRetrieved: HeavyObject(alpha)\nAfter del + GC: size=1, contains alpha: False\nBeta still alive: True\nAfter del beta + GC: size=0Hints
Hint 1: weakref.WeakValueDictionary stores weak references as values. When the last strong reference to a value is deleted, the entry is automatically removed.
Hint 2: Basic types like int, str, list do not support weak references. Use a custom class as the cached object.
Hint 3: After deleting strong references, call gc.collect() to ensure the weak references are cleaned up.
Implement a tree node that uses weak references for parent pointers. This prevents circular references between parent and child nodes, allowing the garbage collector to clean up entire subtrees when the root is deleted.
import weakref
import gc
class TreeNode:
def __init__(self, value):
self.value = value
self.children = []
self._parent_ref = None
@property
def parent(self):
if self._parent_ref is None:
return None
return self._parent_ref()
@parent.setter
def parent(self, node):
if node is None:
self._parent_ref = None
else:
self._parent_ref = weakref.ref(node)
def add_child(self, child_node):
self.children.append(child_node)
child_node.parent = self
def depth(self):
d = 0
current = self.parent
while current is not None:
d += 1
current = current.parent
return d
def path_to_root(self):
path = [self.value]
current = self.parent
while current is not None:
path.append(current.value)
current = current.parent
return path
# Build a tree
root = TreeNode("root")
child_a = TreeNode("child_a")
child_b = TreeNode("child_b")
grandchild = TreeNode("grandchild")
root.add_child(child_a)
root.add_child(child_b)
child_a.add_child(grandchild)
print(f"Root children: {len(root.children)}")
print(f"Child A parent: {child_a.parent.value}")
print(f"Grandchild parent: {grandchild.parent.value}")
print(f"Grandchild depth: {grandchild.depth()}")
print(f"Path to root: {grandchild.path_to_root()}")
# Delete the root — weak parent refs become None
del root
gc.collect()
print(f"After del root — grandchild parent alive: {grandchild.parent is not None}")
print(f"Grandchild still accessible: {grandchild.value}")Solution
import weakref
class TreeNode:
def __init__(self, value):
self.value = value
self.children = []
self._parent_ref = None
@property
def parent(self):
if self._parent_ref is None:
return None
return self._parent_ref() # Call the weakref to get the object or None
@parent.setter
def parent(self, node):
if node is None:
self._parent_ref = None
else:
self._parent_ref = weakref.ref(node)
def add_child(self, child_node):
self.children.append(child_node)
child_node.parent = self
def depth(self):
d = 0
current = self.parent
while current is not None:
d += 1
current = current.parent
return d
def path_to_root(self):
path = [self.value]
current = self.parent
while current is not None:
path.append(current.value)
current = current.parent
return path
Why weak parent references matter:
Without weak references, the parent-child relationship creates a reference cycle:
- Parent holds strong references to children via
self.children - Children hold strong references to parent via
self.parent - Deleting the root variable does not free any node — they all keep each other alive
With weak parent references:
- Parent holds strong references to children (ownership)
- Children hold weak references to parent (observation only)
- Deleting the root reduces its refcount to 0 (no strong references from children), which triggers immediate deallocation via reference counting
- This cascades: root's deallocation removes strong references to children, reducing their refcounts
The _parent_ref() call pattern: A weakref.ref object is callable. Calling it returns the referenced object if it is still alive, or None if it has been garbage collected. This is why the parent property getter calls self._parent_ref() — the parentheses dereference the weak reference.
After del root: The grandchild's parent property returns None for child_a (assuming child_a was only referenced through the tree). But grandchild itself is still alive if we hold a direct reference to it. The weak reference cleanly breaks the cycle without losing the ability to navigate the tree while it exists.
import weakref
import gc
import sys
class TreeNode:
"""A tree node that uses weak references for parent pointers
to avoid circular reference memory leaks.
"""
def __init__(self, value):
self.value = value
self.children = []
self._parent_ref = None # weak reference to parent
@property
def parent(self):
"""Return the parent node, or None if no parent or parent was collected."""
# TODO: dereference the weak reference
pass
@parent.setter
def parent(self, node):
"""Set parent using a weak reference."""
# TODO: store a weak reference to node
pass
def add_child(self, child_node):
"""Add a child and set its parent to self."""
# TODO: add to children list and set parent
pass
def depth(self):
"""Return depth of this node (root = 0)."""
# TODO: walk up parent chain
pass
def path_to_root(self):
"""Return list of values from this node to root."""
# TODO: collect values walking up
passExpected Output
Root children: 2\nChild A parent: root\nGrandchild parent: child_a\nGrandchild depth: 2\nPath to root: ['grandchild', 'child_a', 'root']\nAfter del root — grandchild parent alive: False\nGrandchild still accessible: grandchildHints
Hint 1: Store the parent as weakref.ref(node) in the setter. In the getter, call self._parent_ref() to dereference.
Hint 2: If the parent has been garbage collected, the weak reference returns None.
Hint 3: The depth and path_to_root methods must handle the case where parent returns None (the node is the root or parent was collected).
Build a MemoryProfiler that takes snapshots of live objects, diffs them to find types that are growing unexpectedly, and traces referrers to diagnose what is keeping leaked objects alive. This is how real memory leak investigations work.
import gc
import sys
from collections import Counter
class MemoryProfiler:
def __init__(self):
self._snapshots = []
def snapshot(self, label=""):
gc.collect()
gc.collect()
gc.collect()
objects = gc.get_objects()
counts = Counter(type(obj).__name__ for obj in objects)
self._snapshots.append({"label": label, "counts": counts})
def diff(self, idx_a=0, idx_b=-1):
a = self._snapshots[idx_a]["counts"]
b = self._snapshots[idx_b]["counts"]
all_types = set(a.keys()) | set(b.keys())
result = []
for t in all_types:
ca = a.get(t, 0)
cb = b.get(t, 0)
delta = cb - ca
if delta != 0:
result.append((t, ca, cb, delta))
result.sort(key=lambda x: abs(x[3]), reverse=True)
return result
def find_referrers(self, obj, max_depth=2):
gc.collect()
referrers = gc.get_referrers(obj)
results = []
for ref in referrers:
type_name = type(ref).__name__
snippet = repr(ref)[:80]
results.append((type_name, snippet))
return results
def detect_leaks(self, threshold=10):
diffs = self.diff()
return [(t, ca, cb, d) for t, ca, cb, d in diffs if d > threshold]
# --- Simulate a memory leak ---
class LeakyItem:
def __init__(self, data):
self.data = data
profiler = MemoryProfiler()
profiler.snapshot("baseline")
print(f"Snapshot 0 (baseline): types tracked > 0: {len(profiler._snapshots[0]['counts']) > 0}")
# Create a leak: store objects in a global list and "forget" about it
_leaked_storage = []
for i in range(500):
_leaked_storage.append(LeakyItem("x" * 100))
print("Leaked 500 LeakyItem objects")
profiler.snapshot("after leak")
print(f"Snapshot 1 (after leak): types tracked > 0: {len(profiler._snapshots[1]['counts']) > 0}")
# Detect the leak
leaks = profiler.detect_leaks(threshold=100)
for type_name, before, after, delta in leaks:
if type_name == "LeakyItem":
print(f"Leak detected: {type_name} increased by {delta}")
# Find what is keeping a leaked item alive
referrers = profiler.find_referrers(_leaked_storage[0])
has_list = any(t == "list" for t, _ in referrers)
print(f"Referrers of a leaked item: found list referrer: {has_list}")
# Cleanup
_leaked_storage.clear()
profiler.snapshot("after cleanup")
cleanup_diffs = profiler.diff(1, 2)
leaky_delta = 0
for t, ca, cb, d in cleanup_diffs:
if t == "LeakyItem":
leaky_delta = d
break
print(f"After cleanup: LeakyItem delta <= 0: {leaky_delta <= 0}")Solution
import gc
from collections import Counter
class MemoryProfiler:
def __init__(self):
self._snapshots = []
def snapshot(self, label=""):
gc.collect()
gc.collect()
gc.collect() # Three passes for thorough collection
objects = gc.get_objects()
counts = Counter(type(obj).__name__ for obj in objects)
self._snapshots.append({"label": label, "counts": counts})
def diff(self, idx_a=0, idx_b=-1):
a = self._snapshots[idx_a]["counts"]
b = self._snapshots[idx_b]["counts"]
all_types = set(a.keys()) | set(b.keys())
result = []
for t in all_types:
ca = a.get(t, 0)
cb = b.get(t, 0)
delta = cb - ca
if delta != 0:
result.append((t, ca, cb, delta))
result.sort(key=lambda x: abs(x[3]), reverse=True)
return result
def find_referrers(self, obj, max_depth=2):
gc.collect()
referrers = gc.get_referrers(obj)
results = []
for ref in referrers:
type_name = type(ref).__name__
snippet = repr(ref)[:80]
results.append((type_name, snippet))
return results
def detect_leaks(self, threshold=10):
diffs = self.diff()
return [(t, ca, cb, d) for t, ca, cb, d in diffs if d > threshold]
How this mirrors real memory leak debugging:
- Baseline snapshot: Before the suspected leak, capture a count of all live objects by type.
- Trigger the leak: Run the code path you suspect is leaking.
- Second snapshot: Capture counts again.
- Diff: Compare the two snapshots. Types with large positive deltas are potential leaks.
- Find referrers: For a leaked object,
gc.get_referrers()tells you exactly what is keeping it alive — usually an unbounded list, dict, or global cache.
Why three gc.collect() calls: CPython's GC uses generational collection (gen 0, gen 1, gen 2). A single gc.collect() only collects the youngest generation by default. Calling it three times ensures objects have been promoted through all generations and collected.
Production tools that use this pattern:
tracemalloc(stdlib) — tracks memory allocations with file/line infoobjgraph— generates visual reference graphspympler— tracks object sizes and growth over time
Common leak sources found this way:
- Global dictionaries that grow without eviction
- Closures capturing large objects
- Event listeners / callbacks registered but never unregistered
- C extensions that increment refcount without decrementing
import gc
import sys
from collections import Counter
class MemoryProfiler:
"""A diagnostic tool that snapshots live objects and detects leaks
by comparing snapshots over time.
"""
def __init__(self):
self._snapshots = []
def snapshot(self, label=""):
"""Take a snapshot of live object counts by type.
Store it with a label for later comparison.
"""
# TODO:
# 1. Force full GC collection
# 2. Get all tracked objects
# 3. Count by type name
# 4. Store in self._snapshots
pass
def diff(self, idx_a=0, idx_b=-1):
"""Compare two snapshots and return types with changed counts.
Returns list of (type_name, count_a, count_b, delta) sorted by |delta| descending.
"""
# TODO: compute difference between two snapshots
pass
def find_referrers(self, obj, max_depth=2):
"""Find what is keeping obj alive.
Returns a list of (type, repr_snippet) for each referrer.
"""
# TODO: use gc.get_referrers()
pass
def detect_leaks(self, threshold=10):
"""Compare first and last snapshot.
Return types whose count increased by more than threshold.
"""
# TODO: filter diff results
passExpected Output
Snapshot 0 (baseline): types tracked > 0: True\nLeaked 500 LeakyItem objects\nSnapshot 1 (after leak): types tracked > 0: True\nLeak detected: LeakyItem increased by 500\nReferrers of a leaked item: found list referrer: True\nAfter cleanup: LeakyItem delta <= 0: TrueHints
Hint 1: gc.collect() should be called before gc.get_objects() to ensure all collectable garbage is cleaned up first.
Hint 2: Use collections.Counter on type(obj).__name__ for all gc.get_objects() to build the snapshot.
Hint 3: The diff should show types that increased (potential leaks) and decreased (freed objects).
Hint 4: gc.get_referrers(obj) returns all objects that hold a reference to obj — useful for finding what is keeping a leaked object alive.
