Memory References and Aliasing - Python's Object Reference Model

Reading time: ~24 minutes | Level: Foundation → Engineering

Here is a question that reveals whether you understand how Python really manages memory.

import sys

x = []
print(sys.getrefcount(x))   # What prints here?

y = x
z = x
print(sys.getrefcount(x))   # And here?

del y
del z
print(sys.getrefcount(x))   # And here?

If you expected 1, 3, 1 - you are close but missing one reference. The actual output:

2
4
2

The count is always one higher than you expect because sys.getrefcount() itself holds a temporary reference to the object while evaluating the argument.

Understanding reference counts - why they increment, when they drop to zero, and what the garbage collector does about cycles - is the foundation of understanding Python's memory model. Memory leaks in Python are real, and they all trace back to these mechanics.

What You Will Learn

How CPython implements reference counting via ob_refcnt in the PyObject C struct
The two-tier memory management system: reference counting + cyclic garbage collector
What aliasing means: multiple names pointing to the same object, and when it causes bugs
id() function in CPython: the memory address interpretation and its limitations
is vs ==: the identity vs equality distinction and when each is correct
Weak references (weakref module): breaking cycles without preventing garbage collection
What del actually does - removes the name binding, does not destroy the object
sys.getrefcount(obj): interpreting the count and why it is always higher than expected
Memory leaks in Python: circular references, global caches, closures holding large objects
ASCII diagrams of reference graphs for nested and circular structures
Real-world implications: list slices create new objects, generator expressions do not

Prerequisites

Python variables and name binding
Understanding of mutable vs immutable types
Basic familiarity with Python classes and instances

Part 1 - CPython's Memory Management: Reference Counting

The PyObject Struct

In CPython, every Python object is represented in C memory by a structure that begins with a common header. The relevant portion:

/* From CPython: Include/object.h */
typedef struct _object {
    Py_ssize_t ob_refcnt;    /* Reference count */
    PyTypeObject *ob_type;   /* Pointer to type object */
} PyObject;

Every Python object - integer, string, list, instance of a custom class - starts with these two fields. ob_refcnt is the reference count: the number of places in the program that currently hold a reference to this object.

How Reference Counting Works

Every time a new reference to an object is created, CPython calls Py_INCREF(obj) which increments ob_refcnt. Every time a reference is removed, CPython calls Py_DECREF(obj) which decrements ob_refcnt. When ob_refcnt reaches zero, the object is immediately deallocated.

import sys
import ctypes

# Watch the reference count change
x = object()   # ob_refcnt = 1 (bound to x)

y = x          # ob_refcnt = 2 (also bound to y)
print(sys.getrefcount(x))  # 3 (x, y, and getrefcount's own arg)

z = [x]        # ob_refcnt = 4 (also in z[0])
print(sys.getrefcount(x))  # 4 (x, y, z[0], getrefcount's arg)

del y          # ob_refcnt = 3 (y binding removed)
del z          # ob_refcnt = 2 (z list deleted → its contents decrefd)
print(sys.getrefcount(x))  # 2 (x, getrefcount's arg)

# When del x is called, ob_refcnt drops to 0 → object deallocated immediately

What Increments the Reference Count

x = [1, 2, 3]   # refcount = 1

# Assignment to new name: +1
y = x            # refcount = 2

# Appending to container: +1
lst = [x]        # refcount = 3

# Function argument: +1 (during the call)
def show(obj):
    # obj is a local reference to x - ob_refcnt incremented on call
    print(sys.getrefcount(obj))   # 5 during call (x, y, lst[0], obj, getrefcount arg)

show(x)
# After return: local 'obj' is deleted → ob_refcnt decremented

# Attribute assignment: +1
class Container:
    pass
c = Container()
c.data = x       # refcount = 4 (c.data holds a reference)

Part 2 - The Two-Tier System: Reference Counting + Cyclic GC

Why Reference Counting Alone is Not Enough

Reference counting fails for circular references - objects that reference each other, forming a cycle. Even with no external references, their reference counts never reach zero.

import gc

# Create a circular reference
a = []
b = []
a.append(b)   # a holds a reference to b → b.refcount = 2 (b + a[0])
b.append(a)   # b holds a reference to a → a.refcount = 2 (a + b[0])

# Now delete the external references
del a   # a.refcount drops to 1 (only b[0] holds it)
del b   # b.refcount drops to 1 (only a[0] holds it)
# Neither ever reaches 0! They keep each other alive - memory leaked!

No external references exist - but refcounts are not 0. Reference counting alone cannot collect this cycle.

The Cyclic Garbage Collector

CPython includes a supplementary garbage collector (gc module) that detects and collects unreachable cycles. It runs periodically and looks for "islands" of objects that reference each other but are not reachable from any live variable.

import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None   # Will be set to create cycle

# Create a cycle
n1 = Node("alpha")
n2 = Node("beta")
n1.partner = n2   # n1 → n2
n2.partner = n1   # n2 → n1 - cycle!

del n1
del n2
# n1 and n2's objects still exist in memory due to the cycle

# Force garbage collection
collected = gc.collect()
print(f"Garbage collected: {collected} objects")
# Output: Garbage collected: 4 objects (or similar - the cycle members + their dicts)

GC Configuration

import gc

# Check if gc is enabled
print(gc.isenabled())   # True by default

# Get current collection thresholds
# (generation 0 threshold, gen 1 threshold, gen 2 threshold)
print(gc.get_threshold())   # (700, 10, 10)
# Objects are promoted to generation 1 after surviving 700 gen-0 collections
# Objects are promoted to generation 2 after surviving 10 gen-1 collections

# Manual collection
gc.collect(0)   # Collect generation 0 only (youngest, fastest)
gc.collect(1)   # Collect generations 0 and 1
gc.collect()    # Collect all generations (slowest, most thorough)

# Disable GC in performance-critical sections (only if you're sure there are no cycles)
gc.disable()
# ... performance-critical code with no circular references ...
gc.enable()
gc.collect()   # Sweep up any cycles created during the disabled window

:::tip Performance Consideration The cyclic garbage collector introduces periodic latency "stop-the-world" pauses. In high-throughput applications, you can tune gc.set_threshold() or disable gc during hot paths, but only after profiling confirms it is the bottleneck. Most Python programs should leave gc defaults alone. :::

Part 3 - Aliasing: Multiple Names, One Object

What Aliasing Means

Aliasing occurs when two or more names in a program refer to the same object in memory.

a = [1, 2, 3]
b = a           # b is an alias for a

print(a is b)   # True - same object
print(id(a) == id(b))   # True - same memory address

b.append(4)     # Mutation via b
print(a)        # [1, 2, 3, 4] - visible through a because it is the same object

Aliasing in Function Calls

def clear_list(lst):
    lst.clear()   # Mutates the list object - visible to caller!
    # lst = []    # This would rebind the LOCAL name lst - NOT visible to caller

data = [1, 2, 3, 4, 5]
clear_list(data)
print(data)   # [] - the original list was cleared via aliasing

Aliasing in Nested Structures

The most dangerous aliasing is in nested structures where you do not realize objects are shared:

# This is the classic matrix aliasing bug
matrix = [[0] * 3] * 3
# What's in memory?

print(matrix[0] is matrix[1])   # True! Same list object!
print(matrix[1] is matrix[2])   # True! Same list object!

matrix[0][0] = 9
print(matrix)
# [[9, 0, 0], [9, 0, 0], [9, 0, 0]] - all three "rows" changed!

Part 4 - `id()` and `is` vs `==`

`id()` in CPython

In CPython, id(obj) returns the memory address of the object. This address is the value of the pointer held by the reference.

x = "hello"
print(id(x))   # e.g., 140234567891456 - actual memory address in CPython

# Two objects at the same address cannot coexist simultaneously
# But addresses CAN be reused after deallocation!
a = [1, 2, 3]
address = id(a)
del a           # a is deallocated (if no other references)
b = [4, 5, 6]  # New allocation may reuse the same address
print(id(b) == address)   # Possibly True - address reuse!

:::warning id() Uniqueness is Temporal, Not Absolute id() only guarantees that two live objects have different ids. After an object is deallocated, its memory address can be reused by a new object. Never store id() values for later comparison - the original object may have been deallocated and the address reused. :::

`is` vs `==`: Identity vs Equality

# == : value equality (calls __eq__)
a = [1, 2, 3]
b = [1, 2, 3]
print(a == b)   # True - same values
print(a is b)   # False - different objects in memory

# is : identity (same ob_refcnt increment, same memory address)
c = a
print(a is c)   # True - same object

# Correct uses of 'is':
x = None
if x is None:        # Correct - None is a singleton
    print("is None")

y = True
if y is True:        # Technically works but use == for booleans
    print("is True")

# WRONG use of 'is':
n = 1000
m = 1000
if n is m:           # WRONG - may be True or False depending on CPython internals
    print("same")    # Do not rely on this! Use n == m

Expression	Result	Why
`a = [1,2,3]` / `b = [1,2,3]`	-	Different objects: `id(a)=0x7f1234`, `id(b)=0x7f5678`
`c = a`	-	Same object: `id(c)=0x7f1234`
`a == b`	`True`	Values are equal
`a is b`	`False`	Different objects in memory
`a == c`	`True`	Values equal (trivially - same object)
`a is c`	`True`	Same object in memory

Part 5 - The `del` Keyword: Unbinding, Not Destroying

del removes a name binding. It does not directly destroy the object. The object is only destroyed when its reference count reaches zero.

import sys

x = [1, 2, 3]
y = x            # ob_refcnt = 2

del x            # Removes the name 'x'. ob_refcnt drops to 1.
                 # The list is NOT destroyed - y still holds it.

print(y)         # [1, 2, 3] - still alive via y
print(sys.getrefcount(y))   # 2 (y + getrefcount arg)

del y            # ob_refcnt drops to 0 → list IS destroyed
# y = ...        # NameError: name 'y' is not defined

`del` vs `list.clear()` vs `= []`

data = [1, 2, 3]
other = data   # alias

# Option 1: del removes the name binding
del data       # 'data' no longer exists; 'other' still references the list
# print(data)  # NameError

# Option 2: .clear() empties the list object in place
data = [1, 2, 3]
other = data
data.clear()           # Empties the ONE list object both names reference
print(other)           # [] - other sees the change!

# Option 3: = [] rebinds the name to a new empty list
data = [1, 2, 3]
other = data
data = []              # Rebinds 'data' to a new list; 'other' is unchanged
print(other)           # [1, 2, 3] - original list, unchanged
print(data)            # [] - new empty list

:::note del in dict and list del d[key] removes a key from a dict - this is not removing a name binding, it is calling dict.__delitem__(key). del lst[i] removes an element from a list - calls list.__delitem__(i). These operate on the object, not on a name binding. Only del variable_name removes a name from a namespace. :::

Part 6 - `sys.getrefcount()`: Reading Reference Counts

import sys

x = [1, 2, 3]
print(sys.getrefcount(x))   # 2 (x + getrefcount's temporary arg)

# Why always n+1? The function call creates a temporary reference
# to the argument before the function executes.

y = x
print(sys.getrefcount(x))   # 3 (x, y, getrefcount's arg)

def hold(obj):
    print(sys.getrefcount(obj))   # 4 during call (x, y, obj param, getrefcount arg)

hold(x)
print(sys.getrefcount(x))   # 3 again after hold() returns (x, y, getrefcount arg)

Reference Count Contributions

import sys

lst = []

# Start: 2 (lst + getrefcount)
print(sys.getrefcount(lst))   # 2

# Put it in a container
container = [lst]
print(sys.getrefcount(lst))   # 3 (lst, container[0], getrefcount)

# Put it in a dict
d = {"key": lst}
print(sys.getrefcount(lst))   # 4 (lst, container[0], d["key"], getrefcount)

# Assign to another name
alias = lst
print(sys.getrefcount(lst))   # 5

# Remove from container
container.pop()
print(sys.getrefcount(lst))   # 4

# Clear the dict
d.clear()
print(sys.getrefcount(lst))   # 3

# Remove alias
del alias
print(sys.getrefcount(lst))   # 2 (back to just lst + getrefcount)

Part 7 - Weak References: Breaking Cycles Without Preventing Collection

A weak reference is a reference to an object that does not increment the reference count. If the object's only remaining references are weak references, it is eligible for garbage collection.

import weakref

class ExpensiveObject:
    def __init__(self, data):
        self.data = data

    def __del__(self):
        print(f"ExpensiveObject deallocated: {self.data}")

# Normal (strong) reference
obj = ExpensiveObject("important data")
print(f"refcount after creation: {import_sys().getrefcount(obj) - 1}")  # 1

# Create a weak reference
weak = weakref.ref(obj)
print(f"Is alive: {weak() is not None}")   # True - obj is still alive

# The weak reference can be called to get the object (or None if dead)
retrieved = weak()
print(retrieved.data)   # "important data"
del retrieved   # Release this strong reference

print(f"Is alive: {weak() is not None}")   # True - 'obj' still holds it

del obj   # Remove the last strong reference
# ExpensiveObject deallocated: important data ← printed here
print(f"Is alive: {weak() is not None}")   # False - object is gone
print(weak())   # None

Practical Use: WeakValueDictionary for Caches

The most common production use of weakref is building caches that do not prevent garbage collection of the cached objects.

import weakref

class WeakCache:
    """Cache that allows entries to be garbage-collected when no longer used."""

    def __init__(self):
        self._cache = weakref.WeakValueDictionary()

    def get(self, key):
        return self._cache.get(key)   # Returns None if object was collected

    def set(self, key, value):
        self._cache[key] = value   # Weak reference - won't prevent GC

# Usage
cache = WeakCache()

class BigModel:
    def __init__(self, name):
        self.name = name

model_a = BigModel("resnet50")
cache.set("resnet50", model_a)

print(cache.get("resnet50"))   # <BigModel object>

del model_a   # Last strong reference removed
# model_a is eligible for GC immediately

import gc; gc.collect()
print(cache.get("resnet50"))   # None - automatically evicted from cache!

Weak References to Break Circular Reference Memory Leaks

import weakref

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.children = []
        self.parent = None   # Strong reference to parent → creates cycle!

class TreeNodeFixed:
    def __init__(self, value):
        self.value = value
        self.children = []
        self._parent_ref = None   # Weak reference to parent

    @property
    def parent(self):
        if self._parent_ref is None:
            return None
        return self._parent_ref()   # Call to get the object (may return None)

    @parent.setter
    def parent(self, node):
        self._parent_ref = weakref.ref(node) if node is not None else None

# With weak references, deleting the tree root allows the whole tree to be collected
root = TreeNodeFixed("root")
child = TreeNodeFixed("child")
root.children.append(child)
child.parent = root   # Weak reference - does not prevent root from being GC'd

print(child.parent.value)   # "root" - weak ref still valid
del root   # If root is the last strong reference, it can be collected

Part 8 - Memory Leaks in Python

Despite garbage collection, Python programs can leak memory. The common causes:

Leak Type 1: Unbounded Global Caches

# Global cache that grows forever - classic memory leak
_RESULT_CACHE = {}

def get_result(key):
    if key not in _RESULT_CACHE:
        _RESULT_CACHE[key] = compute(key)   # Stored forever - never evicted
    return _RESULT_CACHE[key]

# Fix: use functools.lru_cache with a size limit
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_result_safe(key):
    return compute(key)   # Automatically evicts when > 1000 entries

Leak Type 2: Closures Holding Large Objects

import sys

def make_adder(large_data):
    # large_data is captured by the closure
    total = sum(large_data)

    def adder(x):
        return x + total   # Only uses 'total', but large_data is also kept alive!
        # Because: closures capture entire local scopes in Python 2
        # In Python 3, free variables are captured, but...

    return adder

big_list = list(range(10_000_000))   # ~80 MB
adder = make_adder(big_list)

del big_list   # Try to free the memory
# In CPython, closures capture free variables, not the entire frame
# But if 'large_data' is referenced inside the function, it IS captured

# SAFE version: extract only what you need before closing over it
def make_adder_safe(large_data):
    total = sum(large_data)
    # large_data is NOT captured by the closure - only total is
    del large_data   # Explicitly release reference before returning closure

    def adder(x):
        return x + total   # Only captures 'total' - a single int

    return adder

Leak Type 3: Circular References with `del`

import gc

class Problematic:
    def __del__(self):
        print("Cleaning up...")   # Has a finalizer

# Circular reference involving objects with __del__ is especially problematic
# because CPython's cyclic GC cannot collect objects with __del__ in cycles
# (Python 3.4+ resolves this with PEP 442, but it's still worth understanding)

a = Problematic()
b = Problematic()
a.partner = b
b.partner = a

del a
del b

# gc.collect() will collect these in Python 3.4+ (PEP 442 resolved the issue)
# In Python 2, objects with __del__ in cycles were uncollectable!
gc.collect()

Diagnosing Memory Leaks

import gc
import sys

def find_leaks():
    """Diagnostic: find objects that survived multiple GC cycles."""
    gc.collect()
    gc.collect()
    gc.collect()   # Three passes to ensure old-generation is cleaned

    # Get all live objects
    all_objects = gc.get_objects()

    # Count by type
    from collections import Counter
    type_counts = Counter(type(obj).__name__ for obj in all_objects)

    for type_name, count in type_counts.most_common(20):
        print(f"  {count:>8,}  {type_name}")

# Use objgraph for more detailed leak analysis
# pip install objgraph
# import objgraph
# objgraph.show_most_common_types(limit=10)
# objgraph.show_backrefs(obj, max_depth=3)

Part 9 - Reference Graph for Complex Structures

Understanding how objects form graphs in memory is essential for diagnosing leaks and aliasing bugs.

import sys

# A moderately complex structure
data = {
    "users": [
        {"name": "Alice", "scores": [85, 90, 88]},
        {"name": "Bob",   "scores": [72, 68, 75]},
    ],
    "config": {"version": 2, "active": True},
}

Reference Count Contributions per Object

import sys

alice_record = {"name": "Alice", "scores": [85, 90, 88]}
data = {"users": [alice_record], "config": {}}

# alice_record is referenced by:
# 1. The name 'alice_record' in this scope
# 2. data["users"][0] (the list inside "users" holds a reference)
print(sys.getrefcount(alice_record))   # 3 (alice_record, list[0], getrefcount arg)

# The scores list is referenced by:
# 1. alice_record["scores"]
scores = alice_record["scores"]
print(sys.getrefcount(scores))   # 3 (scores local name, alice_record["scores"], getrefcount arg)

Part 10 - Slices Create New Objects; Generators Do Not

Understanding whether an operation creates a new object or shares the existing one is crucial for memory analysis.

import sys

# Python list slices CREATE new list objects
original = [1, 2, 3, 4, 5]
sliced = original[1:4]

print(original is sliced)   # False - new list object
print(id(original) == id(sliced))   # False

# But the ELEMENTS are shared (shallow copy)
print(original[1] is sliced[0])   # True - same integer object (small int cache)

# Modifying the slice does NOT affect original (for immutable elements)
sliced[0] = 99
print(original)   # [1, 2, 3, 4, 5] - unchanged
print(sliced)     # [99, 3, 4]

# Generator expressions do NOT create a new collection
original_data = range(1_000_000)

# This creates NO new object until iterated
gen = (x * x for x in original_data)
print(sys.getsizeof(gen))   # ~200 bytes - just the generator state

# vs list comprehension - creates the full list immediately
lst = [x * x for x in original_data]
print(sys.getsizeof(lst))   # ~8+ MB

NumPy Views vs Python Copies

import numpy as np

# CRITICAL DIFFERENCE: NumPy slices are VIEWS, not copies
arr = np.array([1, 2, 3, 4, 5])
view = arr[1:4]   # Creates a VIEW of the same memory

print(np.shares_memory(arr, view))   # True - same underlying buffer!

view[0] = 99
print(arr)   # [ 1 99  3  4  5] - original modified through the view!

# Python list slice: independent copy
lst = [1, 2, 3, 4, 5]
sliced = lst[1:4]   # Independent copy
sliced[0] = 99
print(lst)   # [1, 2, 3, 4, 5] - unchanged

:::danger NumPy Views vs Python Copies Python list slices always create independent copies. NumPy array slices create views that share memory with the original. Modifying a NumPy slice modifies the original. This is one of the most common bugs in data science code. Use arr[1:4].copy() to get an independent NumPy array. :::

Part 11 - Real-World Debugging with gc and sys

Finding Unexpected References

import gc
import sys

def find_referrers(obj):
    """Find all objects that hold a reference to obj."""
    gc.collect()   # Ensure GC has run
    referrers = gc.get_referrers(obj)
    print(f"Object: {obj!r}")
    print(f"Reference count: {sys.getrefcount(obj) - 1}")   # -1 for getrefcount's own ref
    print(f"Referrers ({len(referrers)}):")
    for ref in referrers:
        print(f"  type={type(ref).__name__}: {repr(ref)[:100]}")

# Example: find what's keeping a large dict alive
large_dict = {"data": list(range(10_000))}
find_referrers(large_dict)

Checking for Cycles

import gc

# Enable gc debug flags to see cycle collection in action
# gc.set_debug(gc.DEBUG_LEAK)   # Prints when uncollectable garbage is found
# gc.set_debug(gc.DEBUG_STATS)  # Prints GC statistics each collection

# Check how many objects are in each generation
print(f"Gen 0: {len(gc.get_objects(generation=0))} objects")   # Python 3.8+
print(f"Gen 1: {len(gc.get_objects(generation=1))} objects")
print(f"Gen 2: {len(gc.get_objects(generation=2))} objects")

# Count uncollectable garbage (objects that can't be freed)
gc.collect()
print(f"Garbage: {len(gc.garbage)} uncollectable objects")
# gc.garbage lists objects that GC found in cycles but couldn't collect
# (e.g., objects with __del__ in Python 2, or bugs in C extensions)

Interview Questions

Q1: How does CPython's reference counting work, and what is its fundamental limitation?

Answer: Every CPython object contains an ob_refcnt field in its PyObject header. When a new reference to the object is created (assignment, function call, storing in a container), CPython calls Py_INCREF() to increment ob_refcnt. When a reference is removed (variable goes out of scope, del, container cleared), Py_DECREF() is called. When ob_refcnt reaches zero, the object is immediately deallocated - no garbage collection pass needed. The fundamental limitation: circular references. If A references B and B references A, both have ob_refcnt ≥ 1 even with no external references. They never reach zero. CPython supplements reference counting with a cyclic garbage collector (gc module) to detect and collect such unreachable cycles.

Q2: What does `del x` actually do in Python?

Answer: del x removes the name binding x from the current namespace (a STORE_FAST is reversed to a DELETE_FAST in bytecode). It does not directly destroy the object that x was pointing to. The object is only destroyed when its ob_refcnt drops to zero, which happens only when all references to it are removed. If another name y = x was created before del x, the object remains alive via y. del x followed by using x raises NameError. del lst[i] and del d[key] are different - they call __delitem__, removing an element from a container rather than removing a name from a namespace.

Q3: What is the difference between `is` and `==`?

Answer: is tests identity: it checks whether two names point to the same object in memory (id(a) == id(b)). == tests equality: it calls a.__eq__(b) which compares values. Two distinct objects can be equal (==) without being identical (is): [1,2,3] == [1,2,3] is True but [1,2,3] is [1,2,3] is False (two different list objects). The only correct uses of is are: comparing to None (x is None), comparing to True/False in specific type checks, and explicit identity checks where object sameness matters (e.g., "is this the same callback object?"). Never use is to compare integers, strings, or other value types - the result depends on CPython implementation details like integer caching and string interning.

Q4: How do weak references break circular reference cycles?

Answer: A weak reference (weakref.ref(obj)) points to an object without incrementing its ob_refcnt. The object can be garbage-collected even if weak references to it exist - when the last strong reference is removed, the object is deallocated and all weak references to it become None. In a parent-child tree, if parent → children are strong references (parent owns children) and child → parent is a weak reference (child merely observes parent), then the reference graph has no strong cycle. Deleting the root parent allows the entire subtree to be collected via reference counting alone, without needing the cyclic GC. weakref.WeakValueDictionary is used to build caches where entries are automatically removed when no other strong reference exists.

Q5: What causes a memory leak in Python, and how would you diagnose one?

Answer: Python memory leaks arise from: (1) Unbounded caches - dicts that grow without eviction (use functools.lru_cache or weakref.WeakValueDictionary). (2) Circular references with finalizers - in older Python (< 3.4), objects with __del__ in a cycle were uncollectable. (3) Closures holding large objects - a closure captures all free variables, including large objects that the inner function only partly uses. (4) Event listeners / callbacks - registering a callback in a global event system keeps the object alive indefinitely. Diagnosis tools: gc.get_objects() to count live objects by type, gc.get_referrers(obj) to find what's keeping an object alive, sys.getsizeof() for size measurement, and third-party libraries like objgraph for visual reference graphs.

Q6: Why does `sys.getrefcount(x)` always return at least 2 for a live object?

Answer: sys.getrefcount(x) evaluates its argument before calling the function. The evaluation of x passes a reference to the list into the function call, creating a temporary reference for the duration of the call. This is standard Python calling convention: when you write func(x), CPython increments ob_refcnt for the argument before entering the function body, and decrements it after the function returns. So the reference count inside sys.getrefcount is always at least 1 (the permanent binding x) + 1 (the temporary argument reference) = 2. The rule: "subtract 1 from sys.getrefcount() to get the actual reference count."

Practice Challenges

Beginner - Trace the Reference Count

Trace the reference count of the list [1, 2, 3] through each line:

import sys

a = [1, 2, 3]          # Line 1
b = a                  # Line 2
c = [a, a]             # Line 3
del b                  # Line 4
c.pop()                # Line 5
del a                  # Line 6
# How many references does [1, 2, 3] have after line 6?

Solution

Line 1: a = [1,2,3]     → refcount = 1  (just 'a')
Line 2: b = a            → refcount = 2  (a + b)
Line 3: c = [a, a]       → refcount = 4  (a + b + c[0] + c[1])
Line 4: del b            → refcount = 3  (a + c[0] + c[1])
Line 5: c.pop()          → refcount = 2  (a + c[0], one removed from c)
Line 6: del a            → refcount = 1  (only c[0])

After line 6, the list [1, 2, 3] has 1 reference - held by c[0]. It will not be garbage collected until c is cleared or goes out of scope. Note: sys.getrefcount() would show 2 due to its own temporary reference.

Intermediate - Fix the Memory Leak

The following server stores session data. Identify the memory leak and fix it.

import threading
import time

class SessionStore:
    def __init__(self):
        self._sessions = {}   # session_id → user data
        self._lock = threading.Lock()

    def create_session(self, session_id: str, user_data: dict) -> None:
        with self._lock:
            self._sessions[session_id] = user_data

    def get_session(self, session_id: str) -> dict | None:
        with self._lock:
            return self._sessions.get(session_id)

    def delete_session(self, session_id: str) -> None:
        with self._lock:
            self._sessions.pop(session_id, None)

# Simulate: 100,000 sessions created, only 10% explicitly deleted
store = SessionStore()
for i in range(100_000):
    store.create_session(f"session_{i}", {"user_id": i, "data": "x" * 1000})

# Only 10% get deleted
for i in range(0, 100_000, 10):
    store.delete_session(f"session_{i}")

# 90,000 sessions remain in memory indefinitely!

Solution

The leak: _sessions is an unbounded dict. Sessions that are never explicitly deleted (due to client disconnection, crash, etc.) remain in memory forever. In a production server handling millions of requests, this exhausts memory.

Fixed version with TTL-based eviction:

import threading
import time
import weakref
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Session:
    session_id: str
    user_data: dict
    created_at: float = field(default_factory=time.monotonic)
    last_accessed: float = field(default_factory=time.monotonic)
    ttl_seconds: float = 1800.0   # 30 minutes

    def is_expired(self) -> bool:
        return time.monotonic() - self.last_accessed > self.ttl_seconds

    def touch(self) -> None:
        self.last_accessed = time.monotonic()


class TTLSessionStore:
    """Session store with automatic TTL-based expiration."""

    def __init__(self, ttl_seconds: float = 1800.0, cleanup_interval: float = 300.0):
        self._sessions: dict[str, Session] = {}
        self._lock = threading.Lock()
        self._ttl = ttl_seconds
        self._cleanup_interval = cleanup_interval
        self._start_cleanup_thread()

    def _start_cleanup_thread(self):
        """Background thread to evict expired sessions."""
        def cleanup_loop():
            while True:
                time.sleep(self._cleanup_interval)
                self._evict_expired()

        t = threading.Thread(target=cleanup_loop, daemon=True)
        t.start()

    def _evict_expired(self):
        with self._lock:
            expired = [sid for sid, s in self._sessions.items() if s.is_expired()]
            for sid in expired:
                del self._sessions[sid]
            if expired:
                print(f"Evicted {len(expired)} expired sessions")

    def create_session(self, session_id: str, user_data: dict) -> None:
        with self._lock:
            self._sessions[session_id] = Session(
                session_id=session_id,
                user_data=user_data,
                ttl_seconds=self._ttl,
            )

    def get_session(self, session_id: str) -> Optional[dict]:
        with self._lock:
            session = self._sessions.get(session_id)
            if session is None:
                return None
            if session.is_expired():
                del self._sessions[session_id]
                return None
            session.touch()   # Update last access time
            return session.user_data

    def delete_session(self, session_id: str) -> None:
        with self._lock:
            self._sessions.pop(session_id, None)

    def __len__(self) -> int:
        with self._lock:
            return len(self._sessions)

Alternative: use cachetools for production (battle-tested TTL cache)

from cachetools import TTLCache
import threading

class ProductionSessionStore:
    def __init__(self, maxsize=100_000, ttl=1800):
        # TTLCache automatically evicts entries older than ttl seconds
        # maxsize prevents unbounded growth
        self._cache = TTLCache(maxsize=maxsize, ttl=ttl)
        self._lock = threading.Lock()

    def create_session(self, session_id, user_data):
        with self._lock:
            self._cache[session_id] = user_data

    def get_session(self, session_id):
        with self._lock:
            return self._cache.get(session_id)

Advanced - Implement a Reference-Tracking Allocator

Implement a TrackedObject mixin that tracks how many instances are alive and can detect when reference counts drop unexpectedly. Then demonstrate it can catch a circular reference memory leak.

# Goal: count live instances, detect potential leaks
class TrackedObject:
    # ... implement this

class Node(TrackedObject):
    def __init__(self, value):
        self.value = value
        self.next = None

# After this code, TrackedObject.live_count() should report correctly
n1 = Node(1)
n2 = Node(2)
n1.next = n2   # safe - no cycle
# TrackedObject.live_count() → 2

n1.next = n1   # cycle!
del n1
del n2
# Without gc.collect(), live_count may still report 2 (cycle leak)
# After gc.collect(), it should report 0

Solution

import gc
import sys
import weakref
from threading import Lock

class TrackedObject:
    """
    Mixin that tracks live instances using weak references.
    Uses WeakSet so tracking does not prevent garbage collection.
    """
    _instances: "weakref.WeakSet[TrackedObject]" = weakref.WeakSet()
    _lock = Lock()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Each subclass gets its own WeakSet
        cls._instances = weakref.WeakSet()
        cls._lock = Lock()

    def __init__(self):
        with self.__class__._lock:
            self.__class__._instances.add(self)

    @classmethod
    def live_count(cls) -> int:
        """Return the number of currently live instances."""
        return len(cls._instances)

    @classmethod
    def check_for_leaks(cls) -> dict:
        """Run GC and check if live count changed."""
        before = cls.live_count()
        gc.collect()
        after = cls.live_count()
        return {
            "before_gc": before,
            "after_gc": after,
            "leaked": before - after,
        }


class Node(TrackedObject):
    def __init__(self, value):
        super().__init__()   # Register with tracker
        self.value = value
        self.next = None

    def __repr__(self):
        return f"Node({self.value})"


# Test 1: Normal usage - no leak
n1 = Node(1)
n2 = Node(2)
n1.next = n2
print(f"Live nodes: {Node.live_count()}")   # 2

del n1, n2
gc.collect()
print(f"Live nodes after del: {Node.live_count()}")   # 0

# Test 2: Circular reference - leak detection
n1 = Node(10)
n2 = Node(20)
n1.next = n2
n2.next = n1   # Cycle!

del n1
del n2

leak_report = Node.check_for_leaks()
print(f"\nLeak report: {leak_report}")
# {'before_gc': 2, 'after_gc': 0, 'leaked': 2}
# 'before_gc' = 2 (reference counting couldn't collect the cycle)
# 'after_gc' = 0 (cyclic GC successfully collected them)
# 'leaked' = 2 (how many were collected by GC rather than refcounting)

print(f"Live nodes after GC: {Node.live_count()}")   # 0

# Test 3: Mixed - some live, some leaked
a = Node(100)           # Will stay alive
b = Node(200)
c = Node(300)
b.next = c
c.next = b   # Cycle between b and c

del b, c   # Try to delete b and c

print(f"\nAfter del b,c - live nodes: {Node.live_count()}")  # 3 (a + b,c in cycle)
gc.collect()
print(f"After gc.collect() - live nodes: {Node.live_count()}")  # 1 (only a)

del a
gc.collect()
print(f"After del a, gc - live nodes: {Node.live_count()}")  # 0

Key design decisions:

WeakSet stores weak references to instances - the tracker does not prevent GC
__init_subclass__ gives each subclass its own separate WeakSet (not shared)
check_for_leaks() compares count before and after gc.collect() - the "leaked" count shows how many objects were held alive by cycles vs held by reference counting
Thread-safe via Lock since WeakSet is not thread-safe by default

Quick Reference

Operation	Effect on `ob_refcnt`	Object Destroyed?
`a = obj`	+1	No
`del a`	-1	Only if count reaches 0
`a = other`	-1 for old, +1 for new	Old destroyed if count reaches 0
`lst.append(obj)`	+1	No
`lst.pop()`	-1	If count reaches 0
`func(obj)` (during call)	+1	No
Return from func	-1	If count reaches 0
`weakref.ref(obj)`	0 (no change)	Yes (when strong refs gone)

Tool	Purpose	Key Method
`sys.getrefcount(obj)`	Current reference count	Returns count (subtract 1 for own ref)
`id(obj)`	Object identity (memory address in CPython)	Compare with `is`
`gc.collect()`	Force cyclic garbage collection	Returns # of unreachable objects
`gc.get_referrers(obj)`	Find what holds a reference to obj	Returns list of referrers
`gc.get_objects()`	All tracked objects	Useful for leak diagnosis
`weakref.ref(obj)`	Create weak reference	Call result to get obj or None
`weakref.WeakValueDictionary`	Dict that doesn't prevent GC of values	Auto-evicts when values collected

Key Takeaways

CPython uses reference counting (ob_refcnt in PyObject) as the primary memory management mechanism - when a count reaches zero, the object is immediately deallocated, no GC pass needed
Reference counting cannot handle circular references - CPython's supplementary cyclic GC (gc module) detects unreachable islands of objects with non-zero reference counts
Aliasing - multiple names pointing to the same object - is the root cause of unintended mutation bugs; use id() or is to detect aliasing
del x removes the name binding, not the object - the object lives as long as any reference to it exists
sys.getrefcount(x) always returns at least 2 - subtract 1 to get the real count (the extra 1 is the temporary argument reference)
Weak references (weakref.ref, WeakValueDictionary) point to objects without incrementing ob_refcnt, enabling caches and parent pointers that do not prevent garbage collection
Python list slices create new list objects (but share element references); NumPy array slices create views - a critical difference that causes silent data corruption
Memory leaks in Python typically come from unbounded global caches, closures capturing large objects unnecessarily, and circular references involving objects with __del__ (pre-Python 3.4)

What You Will Learn​

Prerequisites​

Part 1 - CPython's Memory Management: Reference Counting​

The PyObject Struct​

How Reference Counting Works​

What Increments the Reference Count​

Part 2 - The Two-Tier System: Reference Counting + Cyclic GC​

Why Reference Counting Alone is Not Enough​

The Cyclic Garbage Collector​

GC Configuration​

Part 3 - Aliasing: Multiple Names, One Object​

What Aliasing Means​

Aliasing in Function Calls​

Aliasing in Nested Structures​

Part 4 - id() and is vs ==​

id() in CPython​

is vs ==: Identity vs Equality​

Part 5 - The del Keyword: Unbinding, Not Destroying​

del vs list.clear() vs = []​

Part 6 - sys.getrefcount(): Reading Reference Counts​

Reference Count Contributions​

Part 7 - Weak References: Breaking Cycles Without Preventing Collection​

Practical Use: WeakValueDictionary for Caches​

Weak References to Break Circular Reference Memory Leaks​

Part 8 - Memory Leaks in Python​

Leak Type 1: Unbounded Global Caches​

Leak Type 2: Closures Holding Large Objects​

Leak Type 3: Circular References with __del__​

Diagnosing Memory Leaks​

Part 9 - Reference Graph for Complex Structures​

Reference Count Contributions per Object​

Part 10 - Slices Create New Objects; Generators Do Not​

NumPy Views vs Python Copies​

Part 11 - Real-World Debugging with gc and sys​

Finding Unexpected References​

Checking for Cycles​

Interview Questions​

Q1: How does CPython's reference counting work, and what is its fundamental limitation?​

Q2: What does del x actually do in Python?​

Q3: What is the difference between is and ==?​

Q4: How do weak references break circular reference cycles?​

Q5: What causes a memory leak in Python, and how would you diagnose one?​

Q6: Why does sys.getrefcount(x) always return at least 2 for a live object?​

Practice Challenges​

Beginner - Trace the Reference Count​

Intermediate - Fix the Memory Leak​

Advanced - Implement a Reference-Tracking Allocator​

Quick Reference​

Key Takeaways​

What You Will Learn

Prerequisites

Part 1 - CPython's Memory Management: Reference Counting

The PyObject Struct

How Reference Counting Works

What Increments the Reference Count

Part 2 - The Two-Tier System: Reference Counting + Cyclic GC

Why Reference Counting Alone is Not Enough

The Cyclic Garbage Collector

GC Configuration

Part 3 - Aliasing: Multiple Names, One Object

What Aliasing Means

Aliasing in Function Calls

Aliasing in Nested Structures

Part 4 - `id()` and `is` vs `==`

`id()` in CPython

`is` vs `==`: Identity vs Equality

Part 5 - The `del` Keyword: Unbinding, Not Destroying

`del` vs `list.clear()` vs `= []`

Part 6 - `sys.getrefcount()`: Reading Reference Counts

Reference Count Contributions

Part 7 - Weak References: Breaking Cycles Without Preventing Collection

Practical Use: WeakValueDictionary for Caches

Weak References to Break Circular Reference Memory Leaks

Part 8 - Memory Leaks in Python

Leak Type 1: Unbounded Global Caches

Leak Type 2: Closures Holding Large Objects

Leak Type 3: Circular References with `del`

Diagnosing Memory Leaks

Part 9 - Reference Graph for Complex Structures

Reference Count Contributions per Object

Part 10 - Slices Create New Objects; Generators Do Not

NumPy Views vs Python Copies

Part 11 - Real-World Debugging with gc and sys

Finding Unexpected References

Checking for Cycles

Interview Questions

Q1: How does CPython's reference counting work, and what is its fundamental limitation?

Q2: What does `del x` actually do in Python?

Q3: What is the difference between `is` and `==`?

Q4: How do weak references break circular reference cycles?

Q5: What causes a memory leak in Python, and how would you diagnose one?

Q6: Why does `sys.getrefcount(x)` always return at least 2 for a live object?

Practice Challenges

Beginner - Trace the Reference Count

Intermediate - Fix the Memory Leak

Advanced - Implement a Reference-Tracking Allocator

Quick Reference

Key Takeaways