Skip to main content

Memory References and Aliasing - Python's Object Reference Model

Reading time: ~24 minutes | Level: Foundation → Engineering

Here is a question that reveals whether you understand how Python really manages memory.

import sys

x = []
print(sys.getrefcount(x)) # What prints here?

y = x
z = x
print(sys.getrefcount(x)) # And here?

del y
del z
print(sys.getrefcount(x)) # And here?

If you expected 1, 3, 1 - you are close but missing one reference. The actual output:

2
4
2

The count is always one higher than you expect because sys.getrefcount() itself holds a temporary reference to the object while evaluating the argument.

Understanding reference counts - why they increment, when they drop to zero, and what the garbage collector does about cycles - is the foundation of understanding Python's memory model. Memory leaks in Python are real, and they all trace back to these mechanics.

What You Will Learn

  • How CPython implements reference counting via ob_refcnt in the PyObject C struct
  • The two-tier memory management system: reference counting + cyclic garbage collector
  • What aliasing means: multiple names pointing to the same object, and when it causes bugs
  • id() function in CPython: the memory address interpretation and its limitations
  • is vs ==: the identity vs equality distinction and when each is correct
  • Weak references (weakref module): breaking cycles without preventing garbage collection
  • What del actually does - removes the name binding, does not destroy the object
  • sys.getrefcount(obj): interpreting the count and why it is always higher than expected
  • Memory leaks in Python: circular references, global caches, closures holding large objects
  • ASCII diagrams of reference graphs for nested and circular structures
  • Real-world implications: list slices create new objects, generator expressions do not

Prerequisites

  • Python variables and name binding
  • Understanding of mutable vs immutable types
  • Basic familiarity with Python classes and instances

Part 1 - CPython's Memory Management: Reference Counting

The PyObject Struct

In CPython, every Python object is represented in C memory by a structure that begins with a common header. The relevant portion:

/* From CPython: Include/object.h */
typedef struct _object {
Py_ssize_t ob_refcnt; /* Reference count */
PyTypeObject *ob_type; /* Pointer to type object */
} PyObject;

Every Python object - integer, string, list, instance of a custom class - starts with these two fields. ob_refcnt is the reference count: the number of places in the program that currently hold a reference to this object.

How Reference Counting Works

Every time a new reference to an object is created, CPython calls Py_INCREF(obj) which increments ob_refcnt. Every time a reference is removed, CPython calls Py_DECREF(obj) which decrements ob_refcnt. When ob_refcnt reaches zero, the object is immediately deallocated.

import sys
import ctypes

# Watch the reference count change
x = object() # ob_refcnt = 1 (bound to x)

y = x # ob_refcnt = 2 (also bound to y)
print(sys.getrefcount(x)) # 3 (x, y, and getrefcount's own arg)

z = [x] # ob_refcnt = 4 (also in z[0])
print(sys.getrefcount(x)) # 4 (x, y, z[0], getrefcount's arg)

del y # ob_refcnt = 3 (y binding removed)
del z # ob_refcnt = 2 (z list deleted → its contents decrefd)
print(sys.getrefcount(x)) # 2 (x, getrefcount's arg)

# When del x is called, ob_refcnt drops to 0 → object deallocated immediately

What Increments the Reference Count

x = [1, 2, 3] # refcount = 1

# Assignment to new name: +1
y = x # refcount = 2

# Appending to container: +1
lst = [x] # refcount = 3

# Function argument: +1 (during the call)
def show(obj):
# obj is a local reference to x - ob_refcnt incremented on call
print(sys.getrefcount(obj)) # 5 during call (x, y, lst[0], obj, getrefcount arg)

show(x)
# After return: local 'obj' is deleted → ob_refcnt decremented

# Attribute assignment: +1
class Container:
pass
c = Container()
c.data = x # refcount = 4 (c.data holds a reference)

Part 2 - The Two-Tier System: Reference Counting + Cyclic GC

Why Reference Counting Alone is Not Enough

Reference counting fails for circular references - objects that reference each other, forming a cycle. Even with no external references, their reference counts never reach zero.

import gc

# Create a circular reference
a = []
b = []
a.append(b) # a holds a reference to b → b.refcount = 2 (b + a[0])
b.append(a) # b holds a reference to a → a.refcount = 2 (a + b[0])

# Now delete the external references
del a # a.refcount drops to 1 (only b[0] holds it)
del b # b.refcount drops to 1 (only a[0] holds it)
# Neither ever reaches 0! They keep each other alive - memory leaked!

No external references exist - but refcounts are not 0. Reference counting alone cannot collect this cycle.

The Cyclic Garbage Collector

CPython includes a supplementary garbage collector (gc module) that detects and collects unreachable cycles. It runs periodically and looks for "islands" of objects that reference each other but are not reachable from any live variable.

import gc

class Node:
def __init__(self, name):
self.name = name
self.partner = None # Will be set to create cycle

# Create a cycle
n1 = Node("alpha")
n2 = Node("beta")
n1.partner = n2 # n1 → n2
n2.partner = n1 # n2 → n1 - cycle!

del n1
del n2
# n1 and n2's objects still exist in memory due to the cycle

# Force garbage collection
collected = gc.collect()
print(f"Garbage collected: {collected} objects")
# Output: Garbage collected: 4 objects (or similar - the cycle members + their dicts)

GC Configuration

import gc

# Check if gc is enabled
print(gc.isenabled()) # True by default

# Get current collection thresholds
# (generation 0 threshold, gen 1 threshold, gen 2 threshold)
print(gc.get_threshold()) # (700, 10, 10)
# Objects are promoted to generation 1 after surviving 700 gen-0 collections
# Objects are promoted to generation 2 after surviving 10 gen-1 collections

# Manual collection
gc.collect(0) # Collect generation 0 only (youngest, fastest)
gc.collect(1) # Collect generations 0 and 1
gc.collect() # Collect all generations (slowest, most thorough)

# Disable GC in performance-critical sections (only if you're sure there are no cycles)
gc.disable()
# ... performance-critical code with no circular references ...
gc.enable()
gc.collect() # Sweep up any cycles created during the disabled window

:::tip Performance Consideration The cyclic garbage collector introduces periodic latency "stop-the-world" pauses. In high-throughput applications, you can tune gc.set_threshold() or disable gc during hot paths, but only after profiling confirms it is the bottleneck. Most Python programs should leave gc defaults alone. :::

Part 3 - Aliasing: Multiple Names, One Object

What Aliasing Means

Aliasing occurs when two or more names in a program refer to the same object in memory.

a = [1, 2, 3]
b = a # b is an alias for a

print(a is b) # True - same object
print(id(a) == id(b)) # True - same memory address

b.append(4) # Mutation via b
print(a) # [1, 2, 3, 4] - visible through a because it is the same object

Aliasing in Function Calls

def clear_list(lst):
lst.clear() # Mutates the list object - visible to caller!
# lst = [] # This would rebind the LOCAL name lst - NOT visible to caller

data = [1, 2, 3, 4, 5]
clear_list(data)
print(data) # [] - the original list was cleared via aliasing

Aliasing in Nested Structures

The most dangerous aliasing is in nested structures where you do not realize objects are shared:

# This is the classic matrix aliasing bug
matrix = [[0] * 3] * 3
# What's in memory?

print(matrix[0] is matrix[1]) # True! Same list object!
print(matrix[1] is matrix[2]) # True! Same list object!

matrix[0][0] = 9
print(matrix)
# [[9, 0, 0], [9, 0, 0], [9, 0, 0]] - all three "rows" changed!

Part 4 - id() and is vs ==

id() in CPython

In CPython, id(obj) returns the memory address of the object. This address is the value of the pointer held by the reference.

x = "hello"
print(id(x)) # e.g., 140234567891456 - actual memory address in CPython

# Two objects at the same address cannot coexist simultaneously
# But addresses CAN be reused after deallocation!
a = [1, 2, 3]
address = id(a)
del a # a is deallocated (if no other references)
b = [4, 5, 6] # New allocation may reuse the same address
print(id(b) == address) # Possibly True - address reuse!

:::warning id() Uniqueness is Temporal, Not Absolute id() only guarantees that two live objects have different ids. After an object is deallocated, its memory address can be reused by a new object. Never store id() values for later comparison - the original object may have been deallocated and the address reused. :::

is vs ==: Identity vs Equality

# == : value equality (calls __eq__)
a = [1, 2, 3]
b = [1, 2, 3]
print(a == b) # True - same values
print(a is b) # False - different objects in memory

# is : identity (same ob_refcnt increment, same memory address)
c = a
print(a is c) # True - same object

# Correct uses of 'is':
x = None
if x is None: # Correct - None is a singleton
print("is None")

y = True
if y is True: # Technically works but use == for booleans
print("is True")

# WRONG use of 'is':
n = 1000
m = 1000
if n is m: # WRONG - may be True or False depending on CPython internals
print("same") # Do not rely on this! Use n == m
ExpressionResultWhy
a = [1,2,3] / b = [1,2,3]-Different objects: id(a)=0x7f1234, id(b)=0x7f5678
c = a-Same object: id(c)=0x7f1234
a == bTrueValues are equal
a is bFalseDifferent objects in memory
a == cTrueValues equal (trivially - same object)
a is cTrueSame object in memory

Part 5 - The del Keyword: Unbinding, Not Destroying

del removes a name binding. It does not directly destroy the object. The object is only destroyed when its reference count reaches zero.

import sys

x = [1, 2, 3]
y = x # ob_refcnt = 2

del x # Removes the name 'x'. ob_refcnt drops to 1.
# The list is NOT destroyed - y still holds it.

print(y) # [1, 2, 3] - still alive via y
print(sys.getrefcount(y)) # 2 (y + getrefcount arg)

del y # ob_refcnt drops to 0 → list IS destroyed
# y = ... # NameError: name 'y' is not defined

del vs list.clear() vs = []

data = [1, 2, 3]
other = data # alias

# Option 1: del removes the name binding
del data # 'data' no longer exists; 'other' still references the list
# print(data) # NameError

# Option 2: .clear() empties the list object in place
data = [1, 2, 3]
other = data
data.clear() # Empties the ONE list object both names reference
print(other) # [] - other sees the change!

# Option 3: = [] rebinds the name to a new empty list
data = [1, 2, 3]
other = data
data = [] # Rebinds 'data' to a new list; 'other' is unchanged
print(other) # [1, 2, 3] - original list, unchanged
print(data) # [] - new empty list

:::note del in dict and list del d[key] removes a key from a dict - this is not removing a name binding, it is calling dict.__delitem__(key). del lst[i] removes an element from a list - calls list.__delitem__(i). These operate on the object, not on a name binding. Only del variable_name removes a name from a namespace. :::

Part 6 - sys.getrefcount(): Reading Reference Counts

import sys

x = [1, 2, 3]
print(sys.getrefcount(x)) # 2 (x + getrefcount's temporary arg)

# Why always n+1? The function call creates a temporary reference
# to the argument before the function executes.

y = x
print(sys.getrefcount(x)) # 3 (x, y, getrefcount's arg)

def hold(obj):
print(sys.getrefcount(obj)) # 4 during call (x, y, obj param, getrefcount arg)

hold(x)
print(sys.getrefcount(x)) # 3 again after hold() returns (x, y, getrefcount arg)

Reference Count Contributions

import sys

lst = []

# Start: 2 (lst + getrefcount)
print(sys.getrefcount(lst)) # 2

# Put it in a container
container = [lst]
print(sys.getrefcount(lst)) # 3 (lst, container[0], getrefcount)

# Put it in a dict
d = {"key": lst}
print(sys.getrefcount(lst)) # 4 (lst, container[0], d["key"], getrefcount)

# Assign to another name
alias = lst
print(sys.getrefcount(lst)) # 5

# Remove from container
container.pop()
print(sys.getrefcount(lst)) # 4

# Clear the dict
d.clear()
print(sys.getrefcount(lst)) # 3

# Remove alias
del alias
print(sys.getrefcount(lst)) # 2 (back to just lst + getrefcount)

Part 7 - Weak References: Breaking Cycles Without Preventing Collection

A weak reference is a reference to an object that does not increment the reference count. If the object's only remaining references are weak references, it is eligible for garbage collection.

import weakref

class ExpensiveObject:
def __init__(self, data):
self.data = data

def __del__(self):
print(f"ExpensiveObject deallocated: {self.data}")

# Normal (strong) reference
obj = ExpensiveObject("important data")
print(f"refcount after creation: {import_sys().getrefcount(obj) - 1}") # 1

# Create a weak reference
weak = weakref.ref(obj)
print(f"Is alive: {weak() is not None}") # True - obj is still alive

# The weak reference can be called to get the object (or None if dead)
retrieved = weak()
print(retrieved.data) # "important data"
del retrieved # Release this strong reference

print(f"Is alive: {weak() is not None}") # True - 'obj' still holds it

del obj # Remove the last strong reference
# ExpensiveObject deallocated: important data ← printed here
print(f"Is alive: {weak() is not None}") # False - object is gone
print(weak()) # None

Practical Use: WeakValueDictionary for Caches

The most common production use of weakref is building caches that do not prevent garbage collection of the cached objects.

import weakref

class WeakCache:
"""Cache that allows entries to be garbage-collected when no longer used."""

def __init__(self):
self._cache = weakref.WeakValueDictionary()

def get(self, key):
return self._cache.get(key) # Returns None if object was collected

def set(self, key, value):
self._cache[key] = value # Weak reference - won't prevent GC

# Usage
cache = WeakCache()

class BigModel:
def __init__(self, name):
self.name = name

model_a = BigModel("resnet50")
cache.set("resnet50", model_a)

print(cache.get("resnet50")) # <BigModel object>

del model_a # Last strong reference removed
# model_a is eligible for GC immediately

import gc; gc.collect()
print(cache.get("resnet50")) # None - automatically evicted from cache!

Weak References to Break Circular Reference Memory Leaks

import weakref

class TreeNode:
def __init__(self, value):
self.value = value
self.children = []
self.parent = None # Strong reference to parent → creates cycle!

class TreeNodeFixed:
def __init__(self, value):
self.value = value
self.children = []
self._parent_ref = None # Weak reference to parent

@property
def parent(self):
if self._parent_ref is None:
return None
return self._parent_ref() # Call to get the object (may return None)

@parent.setter
def parent(self, node):
self._parent_ref = weakref.ref(node) if node is not None else None

# With weak references, deleting the tree root allows the whole tree to be collected
root = TreeNodeFixed("root")
child = TreeNodeFixed("child")
root.children.append(child)
child.parent = root # Weak reference - does not prevent root from being GC'd

print(child.parent.value) # "root" - weak ref still valid
del root # If root is the last strong reference, it can be collected

Part 8 - Memory Leaks in Python

Despite garbage collection, Python programs can leak memory. The common causes:

Leak Type 1: Unbounded Global Caches

# Global cache that grows forever - classic memory leak
_RESULT_CACHE = {}

def get_result(key):
if key not in _RESULT_CACHE:
_RESULT_CACHE[key] = compute(key) # Stored forever - never evicted
return _RESULT_CACHE[key]

# Fix: use functools.lru_cache with a size limit
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_result_safe(key):
return compute(key) # Automatically evicts when > 1000 entries

Leak Type 2: Closures Holding Large Objects

import sys

def make_adder(large_data):
# large_data is captured by the closure
total = sum(large_data)

def adder(x):
return x + total # Only uses 'total', but large_data is also kept alive!
# Because: closures capture entire local scopes in Python 2
# In Python 3, free variables are captured, but...

return adder

big_list = list(range(10_000_000)) # ~80 MB
adder = make_adder(big_list)

del big_list # Try to free the memory
# In CPython, closures capture free variables, not the entire frame
# But if 'large_data' is referenced inside the function, it IS captured

# SAFE version: extract only what you need before closing over it
def make_adder_safe(large_data):
total = sum(large_data)
# large_data is NOT captured by the closure - only total is
del large_data # Explicitly release reference before returning closure

def adder(x):
return x + total # Only captures 'total' - a single int

return adder

Leak Type 3: Circular References with __del__

import gc

class Problematic:
def __del__(self):
print("Cleaning up...") # Has a finalizer

# Circular reference involving objects with __del__ is especially problematic
# because CPython's cyclic GC cannot collect objects with __del__ in cycles
# (Python 3.4+ resolves this with PEP 442, but it's still worth understanding)

a = Problematic()
b = Problematic()
a.partner = b
b.partner = a

del a
del b

# gc.collect() will collect these in Python 3.4+ (PEP 442 resolved the issue)
# In Python 2, objects with __del__ in cycles were uncollectable!
gc.collect()

Diagnosing Memory Leaks

import gc
import sys

def find_leaks():
"""Diagnostic: find objects that survived multiple GC cycles."""
gc.collect()
gc.collect()
gc.collect() # Three passes to ensure old-generation is cleaned

# Get all live objects
all_objects = gc.get_objects()

# Count by type
from collections import Counter
type_counts = Counter(type(obj).__name__ for obj in all_objects)

for type_name, count in type_counts.most_common(20):
print(f" {count:>8,} {type_name}")

# Use objgraph for more detailed leak analysis
# pip install objgraph
# import objgraph
# objgraph.show_most_common_types(limit=10)
# objgraph.show_backrefs(obj, max_depth=3)

Part 9 - Reference Graph for Complex Structures

Understanding how objects form graphs in memory is essential for diagnosing leaks and aliasing bugs.

import sys

# A moderately complex structure
data = {
"users": [
{"name": "Alice", "scores": [85, 90, 88]},
{"name": "Bob", "scores": [72, 68, 75]},
],
"config": {"version": 2, "active": True},
}

Reference Count Contributions per Object

import sys

alice_record = {"name": "Alice", "scores": [85, 90, 88]}
data = {"users": [alice_record], "config": {}}

# alice_record is referenced by:
# 1. The name 'alice_record' in this scope
# 2. data["users"][0] (the list inside "users" holds a reference)
print(sys.getrefcount(alice_record)) # 3 (alice_record, list[0], getrefcount arg)

# The scores list is referenced by:
# 1. alice_record["scores"]
scores = alice_record["scores"]
print(sys.getrefcount(scores)) # 3 (scores local name, alice_record["scores"], getrefcount arg)

Part 10 - Slices Create New Objects; Generators Do Not

Understanding whether an operation creates a new object or shares the existing one is crucial for memory analysis.

import sys

# Python list slices CREATE new list objects
original = [1, 2, 3, 4, 5]
sliced = original[1:4]

print(original is sliced) # False - new list object
print(id(original) == id(sliced)) # False

# But the ELEMENTS are shared (shallow copy)
print(original[1] is sliced[0]) # True - same integer object (small int cache)

# Modifying the slice does NOT affect original (for immutable elements)
sliced[0] = 99
print(original) # [1, 2, 3, 4, 5] - unchanged
print(sliced) # [99, 3, 4]
# Generator expressions do NOT create a new collection
original_data = range(1_000_000)

# This creates NO new object until iterated
gen = (x * x for x in original_data)
print(sys.getsizeof(gen)) # ~200 bytes - just the generator state

# vs list comprehension - creates the full list immediately
lst = [x * x for x in original_data]
print(sys.getsizeof(lst)) # ~8+ MB

NumPy Views vs Python Copies

import numpy as np

# CRITICAL DIFFERENCE: NumPy slices are VIEWS, not copies
arr = np.array([1, 2, 3, 4, 5])
view = arr[1:4] # Creates a VIEW of the same memory

print(np.shares_memory(arr, view)) # True - same underlying buffer!

view[0] = 99
print(arr) # [ 1 99 3 4 5] - original modified through the view!

# Python list slice: independent copy
lst = [1, 2, 3, 4, 5]
sliced = lst[1:4] # Independent copy
sliced[0] = 99
print(lst) # [1, 2, 3, 4, 5] - unchanged

:::danger NumPy Views vs Python Copies Python list slices always create independent copies. NumPy array slices create views that share memory with the original. Modifying a NumPy slice modifies the original. This is one of the most common bugs in data science code. Use arr[1:4].copy() to get an independent NumPy array. :::

Part 11 - Real-World Debugging with gc and sys

Finding Unexpected References

import gc
import sys

def find_referrers(obj):
"""Find all objects that hold a reference to obj."""
gc.collect() # Ensure GC has run
referrers = gc.get_referrers(obj)
print(f"Object: {obj!r}")
print(f"Reference count: {sys.getrefcount(obj) - 1}") # -1 for getrefcount's own ref
print(f"Referrers ({len(referrers)}):")
for ref in referrers:
print(f" type={type(ref).__name__}: {repr(ref)[:100]}")

# Example: find what's keeping a large dict alive
large_dict = {"data": list(range(10_000))}
find_referrers(large_dict)

Checking for Cycles

import gc

# Enable gc debug flags to see cycle collection in action
# gc.set_debug(gc.DEBUG_LEAK) # Prints when uncollectable garbage is found
# gc.set_debug(gc.DEBUG_STATS) # Prints GC statistics each collection

# Check how many objects are in each generation
print(f"Gen 0: {len(gc.get_objects(generation=0))} objects") # Python 3.8+
print(f"Gen 1: {len(gc.get_objects(generation=1))} objects")
print(f"Gen 2: {len(gc.get_objects(generation=2))} objects")

# Count uncollectable garbage (objects that can't be freed)
gc.collect()
print(f"Garbage: {len(gc.garbage)} uncollectable objects")
# gc.garbage lists objects that GC found in cycles but couldn't collect
# (e.g., objects with __del__ in Python 2, or bugs in C extensions)

Interview Questions

Q1: How does CPython's reference counting work, and what is its fundamental limitation?

Answer: Every CPython object contains an ob_refcnt field in its PyObject header. When a new reference to the object is created (assignment, function call, storing in a container), CPython calls Py_INCREF() to increment ob_refcnt. When a reference is removed (variable goes out of scope, del, container cleared), Py_DECREF() is called. When ob_refcnt reaches zero, the object is immediately deallocated - no garbage collection pass needed. The fundamental limitation: circular references. If A references B and B references A, both have ob_refcnt ≥ 1 even with no external references. They never reach zero. CPython supplements reference counting with a cyclic garbage collector (gc module) to detect and collect such unreachable cycles.

Q2: What does del x actually do in Python?

Answer: del x removes the name binding x from the current namespace (a STORE_FAST is reversed to a DELETE_FAST in bytecode). It does not directly destroy the object that x was pointing to. The object is only destroyed when its ob_refcnt drops to zero, which happens only when all references to it are removed. If another name y = x was created before del x, the object remains alive via y. del x followed by using x raises NameError. del lst[i] and del d[key] are different - they call __delitem__, removing an element from a container rather than removing a name from a namespace.

Q3: What is the difference between is and ==?

Answer: is tests identity: it checks whether two names point to the same object in memory (id(a) == id(b)). == tests equality: it calls a.__eq__(b) which compares values. Two distinct objects can be equal (==) without being identical (is): [1,2,3] == [1,2,3] is True but [1,2,3] is [1,2,3] is False (two different list objects). The only correct uses of is are: comparing to None (x is None), comparing to True/False in specific type checks, and explicit identity checks where object sameness matters (e.g., "is this the same callback object?"). Never use is to compare integers, strings, or other value types - the result depends on CPython implementation details like integer caching and string interning.

Q4: How do weak references break circular reference cycles?

Answer: A weak reference (weakref.ref(obj)) points to an object without incrementing its ob_refcnt. The object can be garbage-collected even if weak references to it exist - when the last strong reference is removed, the object is deallocated and all weak references to it become None. In a parent-child tree, if parent → children are strong references (parent owns children) and child → parent is a weak reference (child merely observes parent), then the reference graph has no strong cycle. Deleting the root parent allows the entire subtree to be collected via reference counting alone, without needing the cyclic GC. weakref.WeakValueDictionary is used to build caches where entries are automatically removed when no other strong reference exists.

Q5: What causes a memory leak in Python, and how would you diagnose one?

Answer: Python memory leaks arise from: (1) Unbounded caches - dicts that grow without eviction (use functools.lru_cache or weakref.WeakValueDictionary). (2) Circular references with finalizers - in older Python (< 3.4), objects with __del__ in a cycle were uncollectable. (3) Closures holding large objects - a closure captures all free variables, including large objects that the inner function only partly uses. (4) Event listeners / callbacks - registering a callback in a global event system keeps the object alive indefinitely. Diagnosis tools: gc.get_objects() to count live objects by type, gc.get_referrers(obj) to find what's keeping an object alive, sys.getsizeof() for size measurement, and third-party libraries like objgraph for visual reference graphs.

Q6: Why does sys.getrefcount(x) always return at least 2 for a live object?

Answer: sys.getrefcount(x) evaluates its argument before calling the function. The evaluation of x passes a reference to the list into the function call, creating a temporary reference for the duration of the call. This is standard Python calling convention: when you write func(x), CPython increments ob_refcnt for the argument before entering the function body, and decrements it after the function returns. So the reference count inside sys.getrefcount is always at least 1 (the permanent binding x) + 1 (the temporary argument reference) = 2. The rule: "subtract 1 from sys.getrefcount() to get the actual reference count."

Practice Challenges

Beginner - Trace the Reference Count

Trace the reference count of the list [1, 2, 3] through each line:

import sys

a = [1, 2, 3] # Line 1
b = a # Line 2
c = [a, a] # Line 3
del b # Line 4
c.pop() # Line 5
del a # Line 6
# How many references does [1, 2, 3] have after line 6?
Solution
Line 1: a = [1,2,3] → refcount = 1 (just 'a')
Line 2: b = a → refcount = 2 (a + b)
Line 3: c = [a, a] → refcount = 4 (a + b + c[0] + c[1])
Line 4: del b → refcount = 3 (a + c[0] + c[1])
Line 5: c.pop() → refcount = 2 (a + c[0], one removed from c)
Line 6: del a → refcount = 1 (only c[0])

After line 6, the list [1, 2, 3] has 1 reference - held by c[0]. It will not be garbage collected until c is cleared or goes out of scope. Note: sys.getrefcount() would show 2 due to its own temporary reference.

Intermediate - Fix the Memory Leak

The following server stores session data. Identify the memory leak and fix it.

import threading
import time

class SessionStore:
def __init__(self):
self._sessions = {} # session_id → user data
self._lock = threading.Lock()

def create_session(self, session_id: str, user_data: dict) -> None:
with self._lock:
self._sessions[session_id] = user_data

def get_session(self, session_id: str) -> dict | None:
with self._lock:
return self._sessions.get(session_id)

def delete_session(self, session_id: str) -> None:
with self._lock:
self._sessions.pop(session_id, None)

# Simulate: 100,000 sessions created, only 10% explicitly deleted
store = SessionStore()
for i in range(100_000):
store.create_session(f"session_{i}", {"user_id": i, "data": "x" * 1000})

# Only 10% get deleted
for i in range(0, 100_000, 10):
store.delete_session(f"session_{i}")

# 90,000 sessions remain in memory indefinitely!
Solution

The leak: _sessions is an unbounded dict. Sessions that are never explicitly deleted (due to client disconnection, crash, etc.) remain in memory forever. In a production server handling millions of requests, this exhausts memory.

Fixed version with TTL-based eviction:

import threading
import time
import weakref
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Session:
session_id: str
user_data: dict
created_at: float = field(default_factory=time.monotonic)
last_accessed: float = field(default_factory=time.monotonic)
ttl_seconds: float = 1800.0 # 30 minutes

def is_expired(self) -> bool:
return time.monotonic() - self.last_accessed > self.ttl_seconds

def touch(self) -> None:
self.last_accessed = time.monotonic()


class TTLSessionStore:
"""Session store with automatic TTL-based expiration."""

def __init__(self, ttl_seconds: float = 1800.0, cleanup_interval: float = 300.0):
self._sessions: dict[str, Session] = {}
self._lock = threading.Lock()
self._ttl = ttl_seconds
self._cleanup_interval = cleanup_interval
self._start_cleanup_thread()

def _start_cleanup_thread(self):
"""Background thread to evict expired sessions."""
def cleanup_loop():
while True:
time.sleep(self._cleanup_interval)
self._evict_expired()

t = threading.Thread(target=cleanup_loop, daemon=True)
t.start()

def _evict_expired(self):
with self._lock:
expired = [sid for sid, s in self._sessions.items() if s.is_expired()]
for sid in expired:
del self._sessions[sid]
if expired:
print(f"Evicted {len(expired)} expired sessions")

def create_session(self, session_id: str, user_data: dict) -> None:
with self._lock:
self._sessions[session_id] = Session(
session_id=session_id,
user_data=user_data,
ttl_seconds=self._ttl,
)

def get_session(self, session_id: str) -> Optional[dict]:
with self._lock:
session = self._sessions.get(session_id)
if session is None:
return None
if session.is_expired():
del self._sessions[session_id]
return None
session.touch() # Update last access time
return session.user_data

def delete_session(self, session_id: str) -> None:
with self._lock:
self._sessions.pop(session_id, None)

def __len__(self) -> int:
with self._lock:
return len(self._sessions)

Alternative: use cachetools for production (battle-tested TTL cache)

from cachetools import TTLCache
import threading

class ProductionSessionStore:
def __init__(self, maxsize=100_000, ttl=1800):
# TTLCache automatically evicts entries older than ttl seconds
# maxsize prevents unbounded growth
self._cache = TTLCache(maxsize=maxsize, ttl=ttl)
self._lock = threading.Lock()

def create_session(self, session_id, user_data):
with self._lock:
self._cache[session_id] = user_data

def get_session(self, session_id):
with self._lock:
return self._cache.get(session_id)

Advanced - Implement a Reference-Tracking Allocator

Implement a TrackedObject mixin that tracks how many instances are alive and can detect when reference counts drop unexpectedly. Then demonstrate it can catch a circular reference memory leak.

# Goal: count live instances, detect potential leaks
class TrackedObject:
# ... implement this

class Node(TrackedObject):
def __init__(self, value):
self.value = value
self.next = None

# After this code, TrackedObject.live_count() should report correctly
n1 = Node(1)
n2 = Node(2)
n1.next = n2 # safe - no cycle
# TrackedObject.live_count() → 2

n1.next = n1 # cycle!
del n1
del n2
# Without gc.collect(), live_count may still report 2 (cycle leak)
# After gc.collect(), it should report 0
Solution
import gc
import sys
import weakref
from threading import Lock

class TrackedObject:
"""
Mixin that tracks live instances using weak references.
Uses WeakSet so tracking does not prevent garbage collection.
"""
_instances: "weakref.WeakSet[TrackedObject]" = weakref.WeakSet()
_lock = Lock()

def __init_subclass__(cls, **kwargs):
super().__init_subclass__(**kwargs)
# Each subclass gets its own WeakSet
cls._instances = weakref.WeakSet()
cls._lock = Lock()

def __init__(self):
with self.__class__._lock:
self.__class__._instances.add(self)

@classmethod
def live_count(cls) -> int:
"""Return the number of currently live instances."""
return len(cls._instances)

@classmethod
def check_for_leaks(cls) -> dict:
"""Run GC and check if live count changed."""
before = cls.live_count()
gc.collect()
after = cls.live_count()
return {
"before_gc": before,
"after_gc": after,
"leaked": before - after,
}


class Node(TrackedObject):
def __init__(self, value):
super().__init__() # Register with tracker
self.value = value
self.next = None

def __repr__(self):
return f"Node({self.value})"


# Test 1: Normal usage - no leak
n1 = Node(1)
n2 = Node(2)
n1.next = n2
print(f"Live nodes: {Node.live_count()}") # 2

del n1, n2
gc.collect()
print(f"Live nodes after del: {Node.live_count()}") # 0

# Test 2: Circular reference - leak detection
n1 = Node(10)
n2 = Node(20)
n1.next = n2
n2.next = n1 # Cycle!

del n1
del n2

leak_report = Node.check_for_leaks()
print(f"\nLeak report: {leak_report}")
# {'before_gc': 2, 'after_gc': 0, 'leaked': 2}
# 'before_gc' = 2 (reference counting couldn't collect the cycle)
# 'after_gc' = 0 (cyclic GC successfully collected them)
# 'leaked' = 2 (how many were collected by GC rather than refcounting)

print(f"Live nodes after GC: {Node.live_count()}") # 0

# Test 3: Mixed - some live, some leaked
a = Node(100) # Will stay alive
b = Node(200)
c = Node(300)
b.next = c
c.next = b # Cycle between b and c

del b, c # Try to delete b and c

print(f"\nAfter del b,c - live nodes: {Node.live_count()}") # 3 (a + b,c in cycle)
gc.collect()
print(f"After gc.collect() - live nodes: {Node.live_count()}") # 1 (only a)

del a
gc.collect()
print(f"After del a, gc - live nodes: {Node.live_count()}") # 0

Key design decisions:

  • WeakSet stores weak references to instances - the tracker does not prevent GC
  • __init_subclass__ gives each subclass its own separate WeakSet (not shared)
  • check_for_leaks() compares count before and after gc.collect() - the "leaked" count shows how many objects were held alive by cycles vs held by reference counting
  • Thread-safe via Lock since WeakSet is not thread-safe by default

Quick Reference

OperationEffect on ob_refcntObject Destroyed?
a = obj+1No
del a-1Only if count reaches 0
a = other-1 for old, +1 for newOld destroyed if count reaches 0
lst.append(obj)+1No
lst.pop()-1If count reaches 0
func(obj) (during call)+1No
Return from func-1If count reaches 0
weakref.ref(obj)0 (no change)Yes (when strong refs gone)
ToolPurposeKey Method
sys.getrefcount(obj)Current reference countReturns count (subtract 1 for own ref)
id(obj)Object identity (memory address in CPython)Compare with is
gc.collect()Force cyclic garbage collectionReturns # of unreachable objects
gc.get_referrers(obj)Find what holds a reference to objReturns list of referrers
gc.get_objects()All tracked objectsUseful for leak diagnosis
weakref.ref(obj)Create weak referenceCall result to get obj or None
weakref.WeakValueDictionaryDict that doesn't prevent GC of valuesAuto-evicts when values collected

Key Takeaways

  • CPython uses reference counting (ob_refcnt in PyObject) as the primary memory management mechanism - when a count reaches zero, the object is immediately deallocated, no GC pass needed
  • Reference counting cannot handle circular references - CPython's supplementary cyclic GC (gc module) detects unreachable islands of objects with non-zero reference counts
  • Aliasing - multiple names pointing to the same object - is the root cause of unintended mutation bugs; use id() or is to detect aliasing
  • del x removes the name binding, not the object - the object lives as long as any reference to it exists
  • sys.getrefcount(x) always returns at least 2 - subtract 1 to get the real count (the extra 1 is the temporary argument reference)
  • Weak references (weakref.ref, WeakValueDictionary) point to objects without incrementing ob_refcnt, enabling caches and parent pointers that do not prevent garbage collection
  • Python list slices create new list objects (but share element references); NumPy array slices create views - a critical difference that causes silent data corruption
  • Memory leaks in Python typically come from unbounded global caches, closures capturing large objects unnecessarily, and circular references involving objects with __del__ (pre-Python 3.4)
© 2026 EngineersOfAI. All rights reserved.