Reference Counting - How CPython Manages Memory at the C Level
Reading time: ~30 minutes | Level: Intermediate → Engineering
Before reading further, predict every output:
import sys
a = [1, 2, 3]
print(sys.getrefcount(a)) # ?
b = a
print(sys.getrefcount(a)) # ?
del b
print(sys.getrefcount(a)) # ?
Show Answer
Output:
2
3
2
Most engineers expect 1, 2, 1. The actual values are one higher than expected because sys.getrefcount(a) itself creates a temporary reference - passing a as an argument increments its reference count for the duration of the getrefcount call.
- After a = [1, 2, 3]: one reference (a in your namespace). getrefcount adds one → 2
- After b = a: two references (a and b). getrefcount adds one → 3
- After del b: one reference (a again). getrefcount adds one → 2
The rule: always subtract 1 from sys.getrefcount()'s result to get the true reference count.
This is not a quirk - it reveals exactly how CPython's memory management works at the C level. Every Python object is a C struct with an ob_refcnt field. Every assignment, function call, and deletion modifies this field. Understanding it at this depth means understanding why objects sometimes live longer than expected, why __del__ is unreliable, why cycles require a separate garbage collector, and why weakref exists.
What You Will Learn
- Every PyObject has ob_refcnt: the reference count field in C
- How reference counts change: assignment, deletion, function calls, returns
- sys.getrefcount() always adds 1 - why and how to account for it
- Reading raw refcounts with ctypes (educational tool, not production code)
- When refcount hits 0: tp_dealloc is called, memory returned immediately
- Strengths of reference counting: deterministic destruction, immediate __del__ calls
- Weaknesses: reference cycles are not collected by refcounting alone
- Reference cycles: a.next = b; b.next = a - refcounts never reach zero
- The weakref module: references that don't increment refcount
- WeakValueDictionary and WeakKeyDictionary for caches that don't prevent GC
- Why contextlib.contextmanager is safer than __del__ for resource cleanup
Prerequisites
- Lesson 01: CPython Architecture - you need to understand PyObject at the C level
- Lesson 03: Disassembly with dis - helpful for seeing LOAD/STORE operations that trigger refcount changes
- Familiarity with Python's del statement and __del__ method
Part 1 - Every Object Is a PyObject
The C Structure
Every Python object in CPython is represented at the C level as a PyObject struct:
// From CPython's Include/object.h (simplified)
typedef struct _object {
    Py_ssize_t ob_refcnt;    // reference count
    PyTypeObject *ob_type;   // pointer to the type object
} PyObject;
Every single Python object - integers, strings, lists, functions, classes, modules - starts with these two fields. ob_refcnt is a Py_ssize_t (a signed integer as wide as a pointer: 64 bits on 64-bit systems). ob_type points to the object's type.
When you write a = [1, 2, 3] in Python, CPython:
- Allocates memory for a new list object
- Sets ob_refcnt = 1
- Sets ob_type to point to list
- Stores a pointer to the object in the name a in the local namespace
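The header layout described above can be observed from Python. What follows is a minimal sketch, not production code. It assumes a standard CPython build in which the object header is exactly the two pointer-sized fields shown; debug and free-threaded builds use a different layout, so the sketch checks object.__basicsize__ and refuses to guess otherwise. The helper name read_type_pointer is invented for this illustration.

```python
import ctypes

POINTER_SIZE = ctypes.sizeof(ctypes.c_void_p)

def read_type_pointer(obj):
    """Return the raw ob_type pointer from obj's header, or None if this
    build does not use the plain two-field PyObject layout."""
    if object.__basicsize__ != 2 * POINTER_SIZE:
        return None  # different header layout; do not guess at offsets
    # ob_refcnt fills the first pointer-sized slot, ob_type the second.
    return ctypes.c_void_p.from_address(id(obj) + POINTER_SIZE).value

a = [1, 2, 3]
type_ptr = read_type_pointer(a)
if type_ptr is not None:
    # In CPython id() is the object's address, so ob_type should equal id(list).
    print(type_ptr == id(type(a)))
```

The comparison works because the type object list is itself a PyObject living at a fixed address, and every list instance stores that address in its ob_type field.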
Why Reference Counting?
CPython chose reference counting for its simplicity and determinism:
- Immediate reclamation: when ob_refcnt drops to 0, memory is freed at that exact moment - no GC pause, no delay
- Deterministic __del__: finalizers are called immediately when the last reference drops
- Low overhead: incrementing/decrementing a counter is cheap - no periodic GC scans needed for simple cases
- Cache-friendly: small objects are freed and reused quickly, staying warm in CPU cache
The tradeoff: reference counting cannot collect cycles (objects that reference each other). Python's cyclic GC handles that separately (covered in Lesson 06).
Part 2 - How Reference Counts Change
The Six Operations That Change Refcounts
import sys
obj = object()
print(sys.getrefcount(obj) - 1) # 1 (subtract the getrefcount argument ref)
# 1. Assignment: +1
other = obj
print(sys.getrefcount(obj) - 1) # 2
# 2. Adding to a container: +1
lst = [obj]
print(sys.getrefcount(obj) - 1) # 3
# 3. Passing to a function: +1 (during the call)
def show_refcount(x):
    # Inside the call: x is another reference to obj
    print(sys.getrefcount(x) - 1) # 4 (obj + other + lst[0] + x)
show_refcount(obj)
# After the call: x is gone, refcount back to 3
print(sys.getrefcount(obj) - 1) # 3
# 4. Removing from container: -1
lst.remove(obj)
print(sys.getrefcount(obj) - 1) # 2
# 5. del statement: -1
del other
print(sys.getrefcount(obj) - 1) # 1
# 6. Reassignment: -1 for old object
new_obj = object()
obj = new_obj # 'obj' no longer points to original object
# original object's refcount is now 0 → freed immediately
[Diagram: The Refcount Lifecycle]
Function Calls and Returns
Function calls are the highest-frequency refcount operations in typical Python programs:
import sys
def inspect_refcount(x, label):
    # x is another reference - adds 1
    count = sys.getrefcount(x) - 1 # subtract getrefcount's arg ref
    print(f"{label}: {count}")
mylist = [1, 2, 3]
inspect_refcount(mylist, "first call") # typically 2 inside the function: the name 'mylist' plus the parameter 'x'
# After the function returns, the parameter 'x' is gone
# and the refcount drops back to 1
print(sys.getrefcount(mylist) - 1) # 1
inspect_refcount(mylist, "second call") # typically 2 again, for the same reason
# Storing a return value
def get_list():
    result = [1, 2, 3]
    # result has refcount 1 (the local name 'result')
    return result # refcount stays 1 - ownership passes to the caller
    # The local 'result' disappears with the frame, but the object survives
    # because the return value carries the reference to the caller
received = get_list()
# received now holds the reference - refcount = 1
Part 3 - sys.getrefcount and ctypes
sys.getrefcount Always Adds 1
The extra reference is not a bug - it is correct behavior:
import sys
a = "hello"
# When sys.getrefcount(a) is called:
# 1. Python evaluates the argument 'a' and places a new reference in the
#    argument slot - this is the +1
# 2. getrefcount() receives that reference
# 3. getrefcount() reads ob_refcnt and returns it
# 4. The argument reference is released - ob_refcnt decrements
# The returned value is one higher than the "true" count at the call site
print(sys.getrefcount(a) - 1) # subtract 1 to estimate the real count
# Note: literals and interned strings are shared, so their counts are not small.
# "hello" is also referenced by the code object's constants, and on CPython 3.12+
# many interned strings are immortal (PEP 683) and report a very large fixed value
print(sys.getrefcount("") - 1) # large - "" is shared by the whole interpreter
Reading Raw Refcounts with ctypes (Educational Only)
For educational understanding, you can read the raw ob_refcnt field directly using ctypes:
import ctypes
import sys
a = [1, 2, 3]
# In CPython, id(a) returns the memory address of the object
addr = id(a)
# Read the first field of PyObject (ob_refcnt) as a Py_ssize_t.
# c_ssize_t matches Py_ssize_t everywhere; c_long is only 32 bits on 64-bit Windows.
raw_refcount = ctypes.c_ssize_t.from_address(addr).value
getrefcount_result = sys.getrefcount(a)
print(f"Raw ob_refcnt via ctypes: {raw_refcount}")
print(f"sys.getrefcount(a): {getrefcount_result}")
print(f"getrefcount - raw: {getrefcount_result - raw_refcount}")
# On a standard (non-free-threaded) build, getrefcount is typically 1 higher than raw ob_refcnt
:::danger Do Not Use ctypes.from_address in Production Code
Reading raw memory addresses bypasses all of Python's safety guarantees. CPython never moves objects, but the object at id(a) could be freed, and its address reused, between the id() call and the from_address() call if you don't hold a strong reference. The field layout also differs between builds. This technique is for learning internals only. In production, use sys.getrefcount() and subtract 1.
:::
Small Integer and String Interning
CPython maintains a cache of small integers and interns certain strings. These shared objects report refcounts that say nothing about your own code:
import sys
# Small integers (-5 to 256) are cached and shared by the whole interpreter
print(sys.getrefcount(0) - 1) # very high; on CPython 3.12+ a fixed large value, because these objects are immortal (PEP 683)
print(sys.getrefcount(1) - 1) # similarly high
print(sys.getrefcount(257) - 1) # small - not cached, though the literal is also held by the code object's constants
# Interned strings also have high refcounts
print(sys.getrefcount("") - 1) # high - empty string is used pervasively
print(sys.getrefcount("hello") - 1) # depends on usage in the current session
This caching is why a = 1; b = 1; a is b returns True - a and b point to the same cached integer object.
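String interning can be made explicit with sys.intern. The sketch below builds two equal strings at run time (so the compiler has no chance to merge them) and shows that interning maps both to one shared object. The sample text is arbitrary.

```python
import sys

# Build two equal strings at run time so they are not merged as constants.
parts = ["ref", "count", " demo"]
s1 = "".join(parts)
s2 = "".join(parts)

print(s1 == s2)   # equal contents
print(s1 is s2)   # identity here is an implementation detail; do not rely on it

# sys.intern returns the canonical copy from the interned-strings table,
# so both calls hand back the very same object.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)
```

Because every user of an interned string shares one object, its reference count reflects the whole interpreter's usage rather than the names in your module.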
Part 4 - When Refcount Hits Zero: tp_dealloc
Immediate Deallocation
When ob_refcnt decrements to 0, CPython calls the type's tp_dealloc function immediately - no delay, no queue:
class Tracked:
    def __init__(self, name):
        self.name = name
        print(f" Created: {self.name}")
    def __del__(self):
        print(f" Destroyed: {self.name}")
print("Creating a")
a = Tracked("A")
print("Creating b = a")
b = a
print("del a")
del a # refcount drops from 2 to 1 - NOT destroyed yet
print("del b")
del b # refcount drops from 1 to 0 - IMMEDIATELY destroyed here
print("After del b")
Output:
Creating a
Created: A
Creating b = a
del a
del b
Destroyed: A ← called IMMEDIATELY when del b executes
After del b
The destruction happens at the del b line, not at the end of the script and not at garbage collection time. This is deterministic destruction - a key advantage of reference counting over tracing GC.
What tp_dealloc Does
For a list object, list_dealloc (the C function) does:
- Decrements ob_refcnt of every element in the list (which may trigger their own deallocation cascade)
- Frees the internal array
- Returns the list object's memory to pymalloc
For a dict, a string, a custom class instance - each type has its own tp_dealloc that cleans up type-specific resources before returning memory.
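The cascade described for list_dealloc can be modelled in a few lines of pure Python. This is a toy simulation, not CPython's implementation: ToyObject, incref, decref, and dealloc are invented names that mimic the roles of PyObject, Py_INCREF, Py_DECREF, and tp_dealloc.

```python
class ToyObject:
    """Toy stand-in for a PyObject: a counter plus the objects it refers to."""
    def __init__(self, name, children=()):
        self.name = name
        self.refcnt = 0
        self.children = list(children)
        for child in self.children:
            incref(child)          # a container owns one reference per element

freed = []                         # records deallocation order

def incref(obj):
    obj.refcnt += 1

def decref(obj):
    obj.refcnt -= 1
    if obj.refcnt == 0:
        dealloc(obj)               # plays the role of tp_dealloc

def dealloc(obj):
    for child in obj.children:     # release every element first
        decref(child)
    obj.children.clear()           # "free the internal array"
    freed.append(obj.name)         # "return the memory to the allocator"

x = ToyObject("x")
y = ToyObject("y")
incref(y)                          # some other owner also references y
container = ToyObject("container", [x, y])
incref(container)                  # the name bound to the container

decref(container)                  # the last reference to the container goes away
print(freed)                       # ['x', 'container']
print(y.refcnt)                    # 1
```

Element x dies with the container because the container held its only reference, while y survives because another owner still counts.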
Part 5 - Reference Cycles: Refcounting's Weakness
The Cycle Problem
Reference counting has one fundamental weakness: it cannot collect objects involved in reference cycles.
import sys
a = []
b = []
a.append(b) # a holds reference to b
b.append(a) # b holds reference to a - cycle created
print(sys.getrefcount(a) - 1) # 2: local name 'a' + b[0]
print(sys.getrefcount(b) - 1) # 2: local name 'b' + a[0]
del a
# a's name is gone - but b[0] still holds a reference
# refcount of the list originally named 'a' is now 1
del b
# b's name is gone - but a[0] still holds a reference (a[0] IS the b list)
# refcount of the list originally named 'b' is now 1
# Both objects have refcount 1 - neither ever reaches 0
# Neither is ever freed by reference counting alone
# This is a memory leak (until the cyclic GC runs)
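The effect of the cyclic collector on such a pair can be observed directly. The sketch below uses a small class instead of lists (lists cannot be weakly referenced) and a weak reference as a probe. Box and probe are names chosen for this illustration. Automatic collection is switched off for the duration so that only the explicit gc.collect() call can reclaim the cycle.

```python
import gc
import weakref

class Box:
    """Plain class so instances can be weakly referenced."""
    def __init__(self):
        self.other = None

was_enabled = gc.isenabled()
gc.disable()                 # keep the automatic collector out of the picture
try:
    a, b = Box(), Box()
    a.other, b.other = b, a  # two-object cycle
    probe = weakref.ref(a)   # observes the object without keeping it alive

    del a, b
    alive_after_del = probe() is not None      # the cycle keeps both alive

    gc.collect()                               # cyclic GC finds the unreachable pair
    alive_after_collect = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print(alive_after_del, alive_after_collect)    # True False
```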
Self-Referential Objects
# The simplest possible cycle
a = []
a.append(a) # a contains itself
print(sys.getrefcount(a) - 1) # 2: name 'a' + a[0]
del a
# refcount drops to 1 - the list still holds a reference to itself
# it will NEVER reach 0 via refcounting
# Only the cyclic GC can collect this
Instance Cycles
Cycles are common with parent-child relationships in object graphs:
class Node:
    def __init__(self, val):
        self.val = val
        self.parent = None
        self.children = []
    def add_child(self, child):
        child.parent = self # child → parent reference
        self.children.append(child) # parent → child reference
        # CYCLE: parent.children[i].parent is parent
root = Node("root")
child = Node("child")
root.add_child(child)
del root
del child
# Both objects still have refcount > 0 due to the cycle
# Memory is only reclaimed when the cyclic GC runs
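One common way to avoid this parent/child cycle is to make the back-pointer weak. The following is a sketch under that assumption. TreeNode and its parent property are illustrative names, not part of the lesson's Node class.

```python
import weakref

class TreeNode:
    def __init__(self, val):
        self.val = val
        self._parent = None          # will hold a weakref.ref, not the node
        self.children = []

    @property
    def parent(self):
        # Dereference the weak reference; None if unset or the parent is gone.
        return self._parent() if self._parent is not None else None

    def add_child(self, child):
        child._parent = weakref.ref(self)   # child -> parent: weak
        self.children.append(child)         # parent -> child: strong

root = TreeNode("root")
child = TreeNode("child")
root.add_child(child)
print(child.parent is root)   # True

del root                      # the only strong reference to the root is gone
print(child.parent)           # None
```

With only one strong direction there is no cycle, so the root is reclaimed by reference counting alone as soon as its last name is deleted.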
Part 6 - Weak References: References That Don't Count
The weakref Module
A weak reference points to an object without incrementing its ob_refcnt. If the only remaining references to an object are weak references, the object is freed:
import weakref
import sys
class MyClass:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return f"MyClass({self.name!r})"
obj = MyClass("example")
print(sys.getrefcount(obj) - 1) # 1 (just the 'obj' name)
# Create a weak reference - does NOT increment ob_refcnt
ref = weakref.ref(obj)
print(sys.getrefcount(obj) - 1) # still 1 - weak ref doesn't count
# Access the object through the weak reference
print(ref()) # MyClass('example') - dereference to get the live object
del obj # refcount drops to 0 - object freed IMMEDIATELY
print(ref()) # None - object is gone, weak ref returns None
WeakValueDictionary for Caches
The most practical use of weak references in production is WeakValueDictionary - a cache that does not prevent its values from being garbage collected:
import weakref
class ExpensiveObject:
    def __init__(self, key):
        self.key = key
        self.data = [0] * 10_000 # large data
    def __repr__(self):
        return f"ExpensiveObject({self.key!r})"
# Normal dict: keeps objects alive even if nothing else references them
strong_cache = {}
strong_cache["a"] = ExpensiveObject("a")
strong_cache["b"] = ExpensiveObject("b")
# Even if no code uses these objects, they stay in memory as long as
# strong_cache exists
# WeakValueDictionary: objects are freed when no strong references remain
weak_cache = weakref.WeakValueDictionary()
obj_a = ExpensiveObject("a")
obj_b = ExpensiveObject("b")
weak_cache["a"] = obj_a
weak_cache["b"] = obj_b
print(dict(weak_cache)) # {'a': ExpensiveObject('a'), 'b': ExpensiveObject('b')}
del obj_b
# obj_b's only strong reference was the name 'obj_b'
# refcount drops to 0 - freed immediately
import gc; gc.collect()
print(dict(weak_cache)) # {'a': ExpensiveObject('a')} - 'b' was cleaned up
WeakKeyDictionary
WeakKeyDictionary uses weak references for the keys - useful for attaching metadata to objects without preventing their collection:
import weakref
class Widget:
    pass
metadata = weakref.WeakKeyDictionary()
btn = Widget()
label = Widget()
metadata[btn] = {"type": "button", "text": "Click me"}
metadata[label] = {"type": "label", "text": "Hello"}
print(len(metadata)) # 2
del label
# label's refcount drops to 0 - freed
# WeakKeyDictionary automatically removes the entry
import gc; gc.collect()
print(len(metadata)) # 1 - entry for label was cleaned up automatically
:::tip Use weakref.WeakValueDictionary for Caches That Shouldn't Prevent Garbage Collection
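In-memory caches of objects that are expensive to build but safe to recompute (object caches, computed result caches) are good candidates for WeakValueDictionary as their backing store. This prevents the cache from becoming a memory leak - objects are freed when the rest of the application no longer needs them, and the cache silently drops the entry. If the object is needed again, recompute and re-cache. Two caveats: values must support weak references (int, str, tuple, list, and dict instances do not), and an entry survives only while something else holds a strong reference to the value.
import weakref
import functools
def weak_memoize(func):
    """Memoization that doesn't prevent GC of results."""
    cache = weakref.WeakValueDictionary()
    @functools.wraps(func)
    def wrapper(*args):
        result = cache.get(args)
        if result is None:
            # Keep a strong reference in 'result' while caching; reading the
            # value back from the cache could fail if it was already collected
            result = func(*args)
            cache[args] = result
        return result
    return wrapper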
:::
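Not every value can be stored in a WeakValueDictionary, because not every type supports weak references. The sketch below shows the failure for a plain list and the usual workaround of a trivial subclass. WeakableList is a name invented for this example.

```python
import weakref

cache = weakref.WeakValueDictionary()

# Built-in containers such as list do not support weak references.
try:
    cache["plain"] = [1, 2, 3]
    plain_ok = True
except TypeError:
    plain_ok = False

# A trivial subclass does, because its instances gain weak-reference support.
class WeakableList(list):
    pass

rows = WeakableList([1, 2, 3])
cache["rows"] = rows
subclass_ok = cache.get("rows") is rows

print(plain_ok, subclass_ok)   # False True
```

The entry for rows stays in the cache only while the name rows (or another strong reference) keeps the object alive.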
weakref.finalize: Run Code When an Object Is Freed
weakref.finalize registers a callback to run when an object is garbage collected, without preventing collection:
import weakref
class Connection:
    def __init__(self, host):
        self.host = host
        print(f"Connected to {host}")
def on_connection_freed(host):
    print(f"Connection to {host} was freed - cleaning up")
conn = Connection("db.example.com")
weakref.finalize(conn, on_connection_freed, conn.host)
del conn # refcount → 0, object freed, finalize callback runs immediately
# Output: Connection to db.example.com was freed - cleaning up
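The object returned by weakref.finalize is itself useful: it can be called early, and it reports whether the cleanup is still pending. A short sketch follows. TempResource, release, and log are illustrative names.

```python
import weakref

class TempResource:
    def __init__(self, name):
        self.name = name

log = []

def release(name):
    log.append(f"released {name}")

res = TempResource("scratch")
# Pass plain data (res.name), never res itself: a finalizer whose arguments
# reference the object would keep it alive and never fire.
finalizer = weakref.finalize(res, release, res.name)
print(finalizer.alive)   # True

finalizer()              # run the cleanup explicitly
print(finalizer.alive)   # False

del res                  # the callback does not run a second time
print(log)               # ['released scratch']
```

A finalizer runs at most once, which makes it a reasonable safety net behind an explicit close() or a context manager.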
Part 7 - __del__ and Resource Cleanup
__del__: The Finalizer
__del__ is called when an object's reference count reaches 0 (or when the cyclic GC collects it). For simple objects without cycles, this happens deterministically at the del point:
class FileWrapper:
    def __init__(self, path):
        self.path = path
        self._file = open(path, 'w')
        print(f"Opened {path}")
    def write(self, data):
        self._file.write(data)
    def __del__(self):
        if not self._file.closed:
            self._file.close()
            print(f"Closed {self.path}")
fw = FileWrapper("/tmp/test.txt")
fw.write("hello")
del fw # __del__ called immediately - file closed
Why __del__ Is Unreliable
__del__ is not guaranteed to be called in all situations:
# Problem 1: __del__ is NOT called on objects in cycles
# (until the cyclic GC runs - which may be never if GC is disabled)
class Node:
    def __init__(self):
        self.ref = None
    def __del__(self):
        print("Node deleted")
a = Node()
b = Node()
a.ref = b
b.ref = a # cycle - __del__ may never be called if GC is disabled
# Problem 2: __del__ is NOT reliably called at interpreter shutdown
# CPython clears global variables during shutdown; objects referenced only
# by globals may be in a partially torn-down state when __del__ runs
# Problem 3: Exceptions in __del__ do not propagate to the caller
class Buggy:
    def __del__(self):
        raise RuntimeError("Error in __del__") # reported as a warning on stderr, not raised
:::warning __del__ Is Not Guaranteed to Run at Process Exit
At interpreter shutdown, CPython tears down modules and their globals in an unspecified order. The globals a __del__ method relies on (logging, database connections, file handles accessed through globals) may already have been deleted or set to None, so the method can crash or silently fail. CPython also does not guarantee that __del__ is called at all for objects that still exist when the interpreter exits. Do not rely on __del__ for critical cleanup at process exit.
import logging
logger = logging.getLogger(__name__)
class Resource:
    def __del__(self):
        # UNRELIABLE: at process exit, logging module may already be torn down
        logger.info("Resource freed") # logger might be None!
:::
Context Managers Are Better
Use context managers for deterministic resource cleanup. They are explicit, composable, and guaranteed to run:
from contextlib import contextmanager
@contextmanager
def managed_resource(name):
    """Deterministic resource management - always releases, even on exceptions."""
    print(f"Acquiring {name}")
    resource = {"name": name, "active": True}
    try:
        yield resource
    finally:
        resource["active"] = False
        print(f"Released {name}") # ALWAYS runs, even if body raises
with managed_resource("database_connection") as conn:
    print(f"Using {conn['name']}")
    # raise RuntimeError("oops") # 'Released' still prints
# Output:
# Acquiring database_connection
# Using database_connection
# Released database_connection
:::danger Circular References Involving __del__ in Python < 3.4 Could Leak Memory Forever
Before Python 3.4 (PEP 442), if an object with __del__ was part of a reference cycle, CPython could not collect it safely - calling __del__ might resurrect the cycle. These objects were placed in gc.garbage and never freed. This was a major source of memory leaks.
Python 3.4+ (PEP 442) fixed this: __del__ is now safe to call even on objects in cycles. But if you run Python < 3.4, or work with libraries that claim Python 2/3 compatibility, be aware that combining __del__ with cycles is dangerous.
:::
:::note del x Decrements Refcount; It Does Not Immediately Destroy the Object
del x removes the name x from the current namespace and decrements the referenced object's ob_refcnt by 1. If ob_refcnt drops to 0, the object is freed immediately. But if other references exist (another name, a list element, a function's local variable), the object lives on. del x only guarantees that x no longer names the object - nothing about the object's lifetime.
a = [1, 2, 3]
b = a
del a # ob_refcnt: 2 → 1 - list survives, accessible via b
print(b) # [1, 2, 3] - still alive
del b # ob_refcnt: 1 → 0 - list freed NOW
:::
Part 8 - Refcount in Practice
Memory Management Without a GC Pause
One of the most significant advantages of reference counting is predictable memory usage. In a long-running server, acyclic objects are freed the moment they are no longer needed, so garbage does not pile up waiting for a collector pass:
import sys
def process_records(records):
    """Process each record independently - no accumulation."""
    for record in records:
        # transform() and store() stand in for application code
        transformed = transform(record) # refcount = 1 (the local name)
        store(transformed) # refcount temporarily 2 during call
        # After store() returns, if store didn't retain a reference:
        # transformed's refcount drops to 1
        # At next iteration, transformed is rebound - old object refcount → 0 → freed
    # Memory usage stays bounded - at most two transformed objects exist at once,
    # because each old object is freed as soon as the name is rebound
Compare to a tracing GC (Java, Go, Ruby): unreachable objects accumulate until the next collection cycle runs, which can raise peak memory well above the size of the live data.
Debugging Unexpected Memory Retention
When objects live longer than expected, sys.getrefcount can help find unexpected references:
import sys
import gc
class MyObject:
    pass
obj = MyObject()
expected_count = 1
# Why is refcount higher than expected?
count = sys.getrefcount(obj) - 1
if count > expected_count:
    print(f"Unexpected references: {count}")
    # Find who holds the references
    referrers = gc.get_referrers(obj)
    for referrer in referrers:
        print(f" Referenced by: {type(referrer).__name__}: {referrer!r}")
Common Mistakes
Mistake 1 - Interpreting sys.getrefcount Without Subtracting 1
import sys
a = "unique_string_" + str(id({})) # unlikely to be interned
print(sys.getrefcount(a)) # prints 2 - beginner thinks there are 2 references!
# Wrong: there is only 1 reference (the name 'a'); the extra 1 is getrefcount's arg
print(sys.getrefcount(a) - 1) # correct: 1
Mistake 2 - Relying on __del__ for Critical Resource Cleanup
# Wrong: __del__ may not run, or may run in wrong state at shutdown
class DatabaseConnection:
    def __del__(self):
        self.conn.close() # UNRELIABLE
# Right: always use a context manager
class DatabaseConnection:
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self.conn.close() # GUARANTEED to run
        return False
with DatabaseConnection() as db:
    db.query("SELECT 1")
Mistake 3 - Creating Cycles Unintentionally
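# Common cycle pattern: callback holds reference to object that holds callback
class EventEmitter:
    def __init__(self):
        self.handlers = []
    def on(self, handler):
        self.handlers.append(handler)
    def do_something(self):
        pass
def setup():
    emitter = EventEmitter()
    def my_handler():
        emitter.do_something() # closure captures 'emitter' - creates cycle!
    # emitter → handlers → my_handler → closure cell → emitter
    emitter.on(my_handler)
    # When setup() returns, the local name 'emitter' goes away, but the
    # refcount won't reach 0 due to the cycle
# (At module level, my_handler would look up 'emitter' in globals instead of
# capturing it, and del emitter would free the object immediately.)
# Fix: use weakref in the handler
import weakref
def setup_without_cycle():
    emitter = EventEmitter()
    emitter_ref = weakref.ref(emitter)
    def my_handler():
        e = emitter_ref()
        if e is not None:
            e.do_something() # no cycle - weakref doesn't increment refcount
    emitter.on(my_handler)
    return emitter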
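When handlers are bound methods, the emitter itself can hold them weakly with weakref.WeakMethod, so registering a listener never extends its lifetime. The following is one possible sketch, not the lesson's EventEmitter. Emitter, Listener, and emit are illustrative names.

```python
import weakref

class Emitter:
    def __init__(self):
        self._handlers = []

    def on(self, bound_method):
        # WeakMethod is needed because a bound method object is created on
        # each attribute access and would die at once under plain weakref.ref.
        self._handlers.append(weakref.WeakMethod(bound_method))

    def emit(self, payload):
        delivered = 0
        for ref in list(self._handlers):
            handler = ref()
            if handler is None:
                self._handlers.remove(ref)   # owner is gone; drop the entry
            else:
                handler(payload)
                delivered += 1
        return delivered

class Listener:
    def __init__(self):
        self.seen = []

    def on_event(self, payload):
        self.seen.append(payload)

emitter = Emitter()
listener = Listener()
emitter.on(listener.on_event)

first = emitter.emit("a")    # listener is alive and receives the event
del listener
second = emitter.emit("b")   # dead handler is skipped and removed
print(first, second)         # 1 0
```

The trade-off is that a listener with no other strong reference disappears silently, so callers must keep their listeners alive for as long as they want events.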
Mistake 4 - Using a Strong Cache When WeakValueDictionary Is Appropriate
# Wrong: cache holds objects alive even after all other references drop
_cache = {}
def get_user(user_id):
    if user_id not in _cache:
        _cache[user_id] = load_user(user_id) # user kept alive by cache forever
    return _cache[user_id]
# Right: cache drops entries when objects are no longer needed elsewhere
import weakref
_cache = weakref.WeakValueDictionary()
def get_user(user_id):
    user = _cache.get(user_id)
    if user is None:
        user = load_user(user_id)
        _cache[user_id] = user
    return user
Graded Practice Challenges
Level 1 - Predict the Output
Question 1: What does this print?
import sys
x = [1, 2, 3]
y = x
z = [x, x]
print(sys.getrefcount(x) - 1)
Show Answer
Output: 4
The list [1, 2, 3] is referenced by:
- Name x
- Name y
- z[0]
- z[1]
So ob_refcnt = 4. sys.getrefcount(x) adds 1 for its own argument, returning 5. Subtract 1 → 4.
Question 2: What does this print, and when does "Destroyed" appear?
class Obj:
    def __init__(self, name): self.name = name
    def __del__(self): print(f"Destroyed {self.name}")
print("A")
a = Obj("alpha")
print("B")
b = a
print("C")
del a
print("D")
del b
print("E")
Show Answer
Output:
A
B
C
D
Destroyed alpha
E
- After del a: refcount drops from 2 to 1. Object survives (b still holds it).
- After del b: refcount drops from 1 to 0. __del__ called immediately.
- "Destroyed alpha" appears between "D" and "E" - proving deterministic, immediate destruction.
Question 3: What does this print?
import weakref
class Node:
    pass
n = Node()
ref = weakref.ref(n)
print(ref() is n) # ?
del n
print(ref()) # ?
Show Answer
Output:
True
None
ref() dereferences the weak reference, returning the live Node object. ref() is n is True - same object. After del n, the only remaining reference was the name n. The weak reference did not count. Refcount → 0, object freed. ref() now returns None.
Question 4: What does this print?
import sys
def f(x):
    return sys.getrefcount(x)
a = object()
print(f(a) - 1)
Show Answer
Output: 2
Inside f, x is a reference to the same object as a. So ob_refcnt = 2 (name a in outer scope + parameter x in f). sys.getrefcount(x) adds 1 for its own argument → returns 3. Subtract 1 → 2.
Question 5: What does this print?
import sys
a = []
a.append(a) # self-reference
print(sys.getrefcount(a) - 1)
del a
# The list is now inaccessible from Python code
# but its refcount is still 1 (a[0] points to itself)
# What happens to it?
Show Answer
Output: 2
Before del a: ob_refcnt = 2 (name a + a[0] which is a itself). getrefcount adds 1 → returns 3. Subtract 1 → 2.
After del a: ob_refcnt drops from 2 to 1. The list still holds a reference to itself via a[0]. The refcount will never reach 0 through reference counting alone. The list is a memory leak until the cyclic garbage collector runs and detects the unreachable cycle. This is exactly the problem covered in Lesson 06.
Level 2 - Debug Challenge
Find and fix all issues:
import weakref
# Bug 1: cache that prevents GC of large objects
class ImageCache:
    def __init__(self):
        self._cache = {} # strong references - images never freed
    def get(self, path):
        if path not in self._cache:
            self._cache[path] = load_image(path)
        return self._cache[path]
# Bug 2: __del__ used for critical network cleanup
class NetworkClient:
    def __init__(self, host):
        self.socket = connect(host)
    def __del__(self):
        self.socket.close() # unreliable at shutdown
# Bug 3: unintentional cycle via callback
class Button:
    def __init__(self):
        self.on_click = None
button = Button()
def handler():
    print(f"Button at {id(button)} clicked") # refers to button strongly
button.on_click = handler
# button → on_click → handler → button
# (through handler's globals here; through a closure cell if this code runs inside a function)
# Bug 4: misreading sys.getrefcount
import sys
data = {"key": "value"}
print(f"References to data: {sys.getrefcount(data)}") # reports wrong number
Show Solution
Bug 1 - Strong cache prevents GC:
class ImageCache:
    def __init__(self):
        self._cache = weakref.WeakValueDictionary() # images freed when not in use
    def get(self, path):
        img = self._cache.get(path)
        if img is None:
            img = load_image(path)
            self._cache[path] = img
        return img
Bug 2 - __del__ for critical cleanup:
class NetworkClient:
    def __init__(self, host):
        self.socket = connect(host)
    def close(self):
        self.socket.close()
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self.close() # guaranteed - use with 'with' statement
        return False
with NetworkClient(host) as client:
    client.do_work()
Bug 3 - Cycle via closure capturing button:
button = Button()
button_ref = weakref.ref(button)
def handler():
    b = button_ref() # dereference weakref - no cycle
    if b is not None:
        print(f"Button at {id(b)} clicked")
button.on_click = handler
Bug 4 - Misreading sys.getrefcount:
import sys
data = {"key": "value"}
# Always subtract 1 - getrefcount adds a temporary reference for its argument
print(f"References to data: {sys.getrefcount(data) - 1}")
Level 3 - Design Challenge
Design a RefTracker context manager that:
- On entry, records the id() and initial sys.getrefcount() of a given object
- On exit, reports whether the refcount changed
- Has a snapshot() method that logs the current refcount delta
- Raises a warning (not an error) if the refcount on exit is higher than on entry (potential leak)
- Works correctly in a with statement: with RefTracker(obj) as tracker: ...
Show Reference Solution
import sys
import warnings
import weakref
class RefTracker:
    """
    Context manager that tracks reference count changes for an object.
    Useful for debugging unexpected memory retention in tests and profiling.
    Usage:
        obj = MyClass()
        with RefTracker(obj) as tracker:
            do_something_with(obj)
            tracker.snapshot("after do_something")
        # Prints report on exit
    """
    def __init__(self, obj, name: str | None = None):
        # Use weakref so RefTracker itself doesn't inflate the refcount
        # Note: not all objects are weakly referenceable (int, str are not)
        try:
            self._ref = weakref.ref(obj)
        except TypeError:
            # Fall back to strong reference for non-weakreferenceable objects
            self._ref = lambda: obj
        self._name = name or repr(obj)
        self._initial_count = None
        self._snapshots: list[tuple[str, int]] = []
    def _get_count(self) -> int | None:
        """Get current refcount, subtracting 1 for getrefcount's own arg."""
        obj = self._ref()
        if obj is None:
            return None
        # getrefcount adds 1 for its argument + 1 for self._ref() temporary
        # We subtract 2 to compensate for both
        return sys.getrefcount(obj) - 2
    def __enter__(self):
        self._initial_count = self._get_count()
        return self
    def snapshot(self, label: str = "") -> int | None:
        """Record a refcount snapshot and print the delta."""
        count = self._get_count()
        if count is None:
            print(f" [{label}] Object '{self._name}' has been freed")
            self._snapshots.append((label, -1))
            return None
        delta = count - self._initial_count
        sign = "+" if delta >= 0 else ""
        print(f" [{label}] refcount={count} (delta={sign}{delta})")
        self._snapshots.append((label, count))
        return count
    def __exit__(self, exc_type, exc_val, exc_tb):
        final_count = self._get_count()
        obj = self._ref()
        print(f"\nRefTracker report for '{self._name}':")
        print(f" Initial refcount: {self._initial_count}")
        if final_count is None:
            print(f" Final refcount: <freed>")
        else:
            delta = final_count - self._initial_count
            sign = "+" if delta >= 0 else ""
            print(f" Final refcount: {final_count} ({sign}{delta})")
            if delta > 0:
                warnings.warn(
                    f"RefTracker: '{self._name}' has {delta} more reference(s) "
                    f"on exit than entry. Possible memory leak.",
                    ResourceWarning,
                    stacklevel=2,
                )
        if self._snapshots:
            print(f" Snapshots: {len(self._snapshots)}")
        return False # don't suppress exceptions
# Usage
import gc
class SomeObject:
    def __init__(self, data):
        self.data = data
obj = SomeObject([1, 2, 3])
with RefTracker(obj, name="SomeObject") as tracker:
    tracker.snapshot("initial")
    extra_ref = obj # adds a reference
    tracker.snapshot("after extra_ref = obj")
    del extra_ref # removes it
    tracker.snapshot("after del extra_ref")
# Output:
# [initial] refcount=1 (delta=+0)
# [after extra_ref = obj] refcount=2 (delta=+1)
# [after del extra_ref] refcount=1 (delta=+0)
#
# RefTracker report for 'SomeObject':
# Initial refcount: 1
# Final refcount: 1 (+0)
# Snapshots: 3
Design decisions:
- Uses weakref.ref so RefTracker itself does not inflate the refcount
- Subtracts 2 from getrefcount result: one for the argument slot, one for the self._ref() temporary
- ResourceWarning (not RuntimeError) - a warning is appropriate for a diagnostic tool
- Falls back to strong reference for non-weakly-referenceable objects (int, str, etc.)
Key Takeaways
- Every Python object is a C PyObject struct with ob_refcnt (reference count) and ob_type (type pointer) as its first two fields
- ob_refcnt increments on assignment, adding to a container, and passing as a function argument; decrements on del, removing from a container, and function return
- sys.getrefcount(obj) always returns a count that is 1 higher than the true count - the argument itself creates a temporary reference; always subtract 1
- When ob_refcnt reaches 0, tp_dealloc is called immediately and memory is returned - this is deterministic destruction with no GC pause
- Reference counting is fast and deterministic but cannot collect reference cycles - objects that reference each other in a loop will never reach ob_refcnt = 0
- del x decrements the refcount by 1; it does not guarantee immediate destruction if other references exist
- weakref.ref, WeakValueDictionary, and WeakKeyDictionary create references that do not increment ob_refcnt - essential for caches that should not prevent GC
- __del__ is called immediately when refcount reaches 0 for non-cyclic objects but is unreliable at interpreter shutdown and for objects in cycles; use context managers (with / __exit__) for deterministic resource cleanup
- Circular references involving __del__ in Python < 3.4 could leak memory forever (PEP 442 fixed this in Python 3.4+)
- The cyclic garbage collector (Lesson 06) handles what reference counting cannot: detecting and collecting unreachable cycles
What's Next
Lesson 06 covers CPython's cyclic garbage collector - the generational, mark-and-sweep collector that handles reference cycles. You will learn how three-generation collection works, how to tune GC thresholds for batch workloads, why gc.freeze() exists for forking servers, and how to diagnose memory leaks with gc.get_referrers() and tracemalloc.
