Memory Profiling - tracemalloc, memory_profiler, objgraph, and pympler
Reading time: ~35 minutes | Level: Intermediate → Engineering
Before reading further, predict what this program prints and what the output means:
import tracemalloc
tracemalloc.start()
data = [i * 2 for i in range(100_000)]
snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")
for stat in top[:3]:
    print(stat)
Answer:
<ipython-input>:5: size=3906 KiB, count=1, average=3906 KiB
<ipython-input>:1: size=8 B, count=1, average=8 B
/usr/lib/python3.12/tracemalloc.py:67: size=8 B, count=1, average=8 B
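The first line is the most important: line 5 (the list comprehension) accounts for about 3906 KiB, which covers the list's pointer array plus the int objects it holds. Each entry shows:
- size: total bytes currently allocated by code at that line
- count: number of memory blocks currently allocated there
- average: bytes per block (size divided by count)
Treat the figures as illustrative: on a real run, count for this line is normally close to 100 000 rather than 1, because each int outside CPython's small-integer cache is its own memory block.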
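The same fields can be read programmatically instead of parsed from the printed line. A minimal sketch, assuming the variable names below (they are illustrative and not part of the lesson's program):

```python
import tracemalloc

tracemalloc.start()
data = [i * 2 for i in range(100_000)]  # keep a reference so the memory stays allocated
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

top = snapshot.statistics("lineno")[0]  # statistics are sorted largest first
frame = top.traceback[0]                # "lineno" grouping keeps a single frame
print(f"{frame.filename}:{frame.lineno}")
print(f"size={top.size} bytes, blocks={top.count}, average={top.size // top.count} bytes")
```

Reading `size` and `count` directly is handy when the numbers feed a log line or an alert threshold rather than a terminal.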
Most developers have never seen this output. tracemalloc shows you which line of your code is responsible for each allocation - not just the total memory used by the process. This is the tool that turns a vague "the service is leaking memory" complaint into a specific file and line number.
Memory bugs are among the hardest bugs to diagnose because they are often invisible until a service crashes under load. Python's garbage collector handles most memory automatically, but it cannot help you when your own code holds references longer than needed. This lesson gives you the full toolkit for diagnosing, measuring, and fixing memory issues in real Python applications.
What You Will Learn
- Why sys.getsizeof() lies to you about container sizes
- tracemalloc: taking snapshots, reading statistics, comparing snapshots to find leaks
- memory_profiler: line-by-line memory increment for a function
- objgraph: finding what is holding a reference to an object you expected to be collected
- pympler: recursively measuring true object size
- Common memory leak patterns in Django, asyncio, and long-running services
- __slots__ memory savings - proven with tracemalloc
Prerequisites
- Lesson 05 (Reference Counting) - you need to understand how Python decides when to free memory
- Lesson 06 (Garbage Collection) - cyclic garbage and the gc module
- Basic understanding of Python containers (list, dict, set)
Part 1 - sys.getsizeof: The Shallow Lie
sys.getsizeof() returns the shallow size of an object - the memory used by the object's own data structure, not including any objects it references.
import sys
# Shallow size: the list object and its pointer array - not the integers inside
small_list = [1, 2, 3]
print(sys.getsizeof(small_list)) # 88 bytes (on CPython 3.12, 64-bit)
# The full footprint is larger:
# Each int object is ~28 bytes; 3 ints = 84 bytes
# Plus the list itself = 88 bytes
# Actual memory: 88 + 84 = 172 bytes - sys.getsizeof reports only 88
# Shallow size of a dict - just the hash table skeleton
d = {"a": 1, "b": 2}
print(sys.getsizeof(d)) # 184 bytes on CPython 3.12 (232 on some older versions)
# An empty dict is much smaller on modern CPython - the hash table is
# allocated on the first insert, not up front
empty_dict = {}
print(sys.getsizeof(empty_dict)) # 64 bytes on CPython 3.12
The Shallow vs Deep Size Diagram
Never use sys.getsizeof to measure the real memory usage of containers. sys.getsizeof([1, 2, 3]) returns ~88 bytes on a 64-bit system, while the list plus its three integer objects occupies ~172 bytes (88 + 3 × 28). For nested structures like list[list[dict[str, Any]]], the undercount can be orders of magnitude worse.
The list stores only pointers to the integers - sys.getsizeof counts the pointer array, not the pointed-to objects.
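The gap between shallow and actual size can be checked directly by adding the elements' own sizes to the container's. A minimal sketch, assuming a hypothetical helper `one_level_size` that follows references only one level deep:

```python
import sys

def one_level_size(container):
    """Shallow size of the container plus the shallow size of each direct element."""
    return sys.getsizeof(container) + sum(sys.getsizeof(item) for item in container)

values = [1000, 2000, 3000]  # ints outside CPython's small-integer cache
shallow = sys.getsizeof(values)
with_items = one_level_size(values)
print(shallow, with_items)
```

This still undercounts nested containers, since it stops after one level; a fully recursive measurement needs to track which objects it has already visited.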
When sys.getsizeof IS Useful
sys.getsizeof is correct for scalar objects that contain no references:
import sys
print(sys.getsizeof(0)) # 24 bytes - small int
print(sys.getsizeof(10**100)) # 72 bytes - large int (twelve 30-bit digits plus header)
print(sys.getsizeof(3.14)) # 24 bytes - float
print(sys.getsizeof("hello")) # 54 bytes - 5-char string
print(sys.getsizeof(b"hello")) # 38 bytes - 5-byte bytes object
# Safe to use for profiling individual scalars or comparing
# the size of different string representations:
print(sys.getsizeof("hello world")) # 60 bytes
print(sys.getsizeof(b"hello world")) # 44 bytes - bytes is smaller than str
Part 2 - tracemalloc: Surgical Memory Profiling
tracemalloc is the standard library answer to "which line of my code is allocating the most memory?" It hooks into CPython's memory allocator and records a traceback for every allocation.
Start, Snapshot, Read
import tracemalloc
# Start tracing - call this as early as possible
tracemalloc.start()
# --- your code under test ---
result = {}
for i in range(10_000):
    result[f"key_{i}"] = list(range(i % 100))
# ---
# Take a snapshot of all current allocations
snapshot = tracemalloc.take_snapshot()
# Group by source line - shows which line allocated the most
stats = snapshot.statistics("lineno")
print("=== Top 5 memory allocations ===")
for stat in stats[:5]:
    print(stat)
=== Top 5 memory allocations ===
example.py:7: size=2938 KiB, count=10000, average=300 B
example.py:6: size=441 KiB, count=10000, average=45 B
example.py:5: size=1 KiB, count=1, average=1 KiB
statistics() Grouping Keys
# Group by source file - useful for large projects
stats_by_file = snapshot.statistics("filename")
for stat in stats_by_file[:5]:
    print(stat)
# Group by traceback - shows the full call chain
stats_by_tb = snapshot.statistics("traceback")
for stat in stats_by_tb[:3]:
    print(stat)
    for line in stat.traceback.format():
        print(" ", line)
tracemalloc.compare_to(): Finding Leaks Between Snapshots
This is the most powerful tracemalloc feature for leak detection. Take a baseline snapshot before the suspected leak, run the leaking code, take a second snapshot, and compare.
import tracemalloc
tracemalloc.start()
# Baseline - before the suspected leak
snapshot_before = tracemalloc.take_snapshot()
# --- run the code you suspect is leaking ---
cache = {}
for i in range(5000):
    cache[f"session_{i}"] = {"user": f"user_{i}", "data": list(range(100))}
# Imagine this cache is never cleared - it grows forever
# ---
snapshot_after = tracemalloc.take_snapshot()
# Compare: positive size = new allocations since baseline
top_stats = snapshot_after.compare_to(snapshot_before, "lineno")
print("=== New allocations since baseline ===")
for stat in top_stats[:5]:
    print(stat)
=== New allocations since baseline ===
leak_demo.py:8: size=2400 KiB (+2400 KiB), count=5000 (+5000), average=491 B
leak_demo.py:7: size=391 KiB (+391 KiB), count=5001 (+5001), average=80 B
The +2400 KiB tells you exactly how much new memory line 8 allocated between your two snapshots.
Always use tracemalloc.take_snapshot() BEFORE and AFTER the suspected leaking operation, then call snapshot_after.compare_to(snapshot_before, "lineno"). The diff isolates exactly what your code allocated during that interval and cuts through the noise of allocations from imports and startup.
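The before/after comparison can be wrapped in a small helper that also filters out frames from tracemalloc itself and the import machinery. A sketch under stated assumptions: `top_growth` and its return shape are hypothetical, while `Filter`, `filter_traces`, `compare_to`, and `size_diff` are standard tracemalloc API.

```python
import tracemalloc

def top_growth(before, after, limit=5):
    """Return (location, bytes_added) for the lines that grew most between two snapshots."""
    # Exclude frames from tracemalloc itself and the import machinery
    filters = (
        tracemalloc.Filter(False, tracemalloc.__file__),
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    )
    before = before.filter_traces(filters)
    after = after.filter_traces(filters)
    diffs = after.compare_to(before, "lineno")
    growth = [d for d in diffs if d.size_diff > 0]
    # Keep only growth and order it so the largest increase comes first
    growth.sort(key=lambda d: d.size_diff, reverse=True)
    return [(str(d.traceback), d.size_diff) for d in growth[:limit]]

tracemalloc.start()
baseline = tracemalloc.take_snapshot()
cache = {f"session_{i}": list(range(100)) for i in range(1_000)}  # simulated leak
current = tracemalloc.take_snapshot()
tracemalloc.stop()

for location, added in top_growth(baseline, current):
    print(f"{location}: +{added / 1024:.1f} KiB")
```

Filtering before comparing keeps the report focused on application code, which matters most in services with heavy startup imports.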
Reading the Full Traceback
For deep investigation, statistics("traceback") gives the full call chain:
import tracemalloc
tracemalloc.start(25) # keep up to 25 frames in traceback
def inner():
    return [0] * 10_000
def middle():
    return inner()
def outer():
    return middle()
data = outer()
snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics("traceback")
for stat in stats[:1]:
    print(f"Total: {stat.size / 1024:.1f} KiB")
    print("Traceback:")
    for line in stat.traceback.format():
        print(" ", line)
Total: 80.0 KiB
Traceback:
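File "example.py", line 14
data = outer()
File "example.py", line 11
return middle()
File "example.py", line 8
return inner()
File "example.py", line 5
return [0] * 10_000
Since Python 3.7, format() lists frames from oldest to most recent, like a regular Python traceback, so the allocating line comes last. tracemalloc frames record only the file and line number, not the function name.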
tracemalloc adds roughly 10–30% memory overhead and slows allocation-heavy code. Disable it in production deployments unless you are actively investigating a memory issue. Use tracemalloc.stop() when done and tracemalloc.is_tracing() to check state before conditionally enabling.
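The overhead can be measured rather than assumed, because tracemalloc reports both the memory it is tracing and the memory it uses for its own bookkeeping. A minimal sketch, assuming a hypothetical helper `tracing_overhead` that leaves tracing in whatever state it found it:

```python
import tracemalloc

def tracing_overhead(workload, nframe=1):
    """Run workload() under tracemalloc; return (traced_bytes, peak_bytes, tracemalloc_bytes)."""
    already_tracing = tracemalloc.is_tracing()
    if not already_tracing:
        tracemalloc.start(nframe)
    try:
        result = workload()  # keep the result alive while the counters are read
        current, peak = tracemalloc.get_traced_memory()
        own = tracemalloc.get_tracemalloc_memory()  # memory used by tracemalloc itself
    finally:
        if not already_tracing:
            tracemalloc.stop()  # only stop tracing that this helper started
    del result
    return current, peak, own

current, peak, own = tracing_overhead(lambda: [str(i) for i in range(50_000)])
print(f"traced={current / 1024:.0f} KiB, peak={peak / 1024:.0f} KiB, "
      f"tracemalloc itself={own / 1024:.0f} KiB")
```

Comparing the last figure against the first gives a workload-specific overhead ratio, which is more useful than a general percentage.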
Part 3 - memory_profiler: Line-by-Line Memory Increments
While tracemalloc shows allocations by source line across your whole program, memory_profiler shows the memory increment of each line inside a decorated function. It is the memory equivalent of line_profiler.
Installation and the @profile Decorator
pip install memory_profiler
# memory_demo.py
from memory_profiler import profile
@profile
def build_data_structures():
    # Line-by-line memory is shown after each statement
    numbers = list(range(1_000_000)) # ~8 MB
    squares = [x * x for x in numbers] # ~8 MB more
    del numbers # reclaim ~8 MB
    lookup = {x: x * x for x in range(100_000)} # ~10 MB
    del squares # reclaim ~8 MB
    return lookup
if __name__ == "__main__":
    build_data_structures()
Run with:
python -m memory_profiler memory_demo.py
Output:
Line # Mem usage Increment Line Contents
================================================
4 45.3 MiB 45.3 MiB @profile
5 def build_data_structures():
6 52.9 MiB +7.6 MiB numbers = list(range(1_000_000))
7 60.4 MiB +7.5 MiB squares = [x * x for x in numbers]
8 52.9 MiB -7.5 MiB del numbers
9 63.1 MiB +10.2 MiB lookup = {x: x * x for x in range(100_000)}
10 55.6 MiB -7.5 MiB del squares
11 55.6 MiB 0.0 MiB return lookup
The Increment column is the key insight: positive values are new allocations, negative values are reclaimed memory.
mprof: Sampling Over Time
mprof records memory usage sampled over time - useful for finding gradual leaks in long-running processes:
# Record memory usage while running a script
mprof run my_service.py
# Plot the recorded memory timeline
mprof plot
# List recorded runs
mprof list
This generates a timeline showing RSS (Resident Set Size) sampled every 0.1 seconds. A steadily rising line with no downward trend indicates a leak.
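The sampling idea can be approximated with the standard library alone. The sketch below is not mprof: it samples tracemalloc's traced total from a background thread instead of reading RSS from outside the process, and `sample_traced_memory` and `leaky_workload` are hypothetical names.

```python
import threading
import time
import tracemalloc

def sample_traced_memory(workload, interval=0.05):
    """Run workload() while a background thread samples tracemalloc's traced total."""
    samples = []  # (seconds_since_start, traced_bytes)
    done = threading.Event()
    started = time.perf_counter()

    def sampler():
        while not done.is_set():
            current, _peak = tracemalloc.get_traced_memory()
            samples.append((time.perf_counter() - started, current))
            done.wait(interval)

    tracemalloc.start()
    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        workload()
    finally:
        done.set()
        thread.join()
        # Final sample so the end state is always recorded
        samples.append((time.perf_counter() - started, tracemalloc.get_traced_memory()[0]))
        tracemalloc.stop()
    return samples

def leaky_workload(store=[]):  # mutable default used deliberately to simulate a leak
    for _ in range(20):
        store.append(bytearray(100_000))
        time.sleep(0.01)

timeline = sample_traced_memory(leaky_workload)
print(f"first={timeline[0][1]} bytes, last={timeline[-1][1]} bytes")
```

The resulting list of (time, bytes) pairs can be plotted or logged; a series that only climbs is the same signal as a rising mprof line.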
Part 4 - objgraph: Following Reference Chains
objgraph answers the question: "Why hasn't this object been garbage collected yet?" It shows which objects are holding references to your object - the retention path.
pip install objgraph
Most Common Types
import objgraph
# Which Python types have the most live instances right now? (counts, not bytes)
objgraph.show_most_common_types(limit=10)
dict 2847
function 1053
tuple 891
list 672
set 234
type 198
weakref 145
cell 132
show_growth(): What Appeared Since Last Check
import objgraph
objgraph.show_growth(limit=5) # first call records the baseline (and prints current counts)
# run suspected leaking code
cache = {}
for i in range(100):
    cache[f"key_{i}"] = {"value": i, "data": list(range(i))}
# show what grew since baseline
objgraph.show_growth(limit=5)
dict +102
list +101
str +200
int +10
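The growth check itself is simple enough to sketch with the standard library, which helps when objgraph is not installed. This is a simplified stand-in, not objgraph's implementation: `type_counts` is a hypothetical helper, and only objects tracked by the garbage collector (containers such as dict and list, not str or int) are counted.

```python
import gc
from collections import Counter

def type_counts():
    """Count gc-tracked objects by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())

before = type_counts()
cache = {f"key_{i}": {"value": i, "data": list(range(i))} for i in range(100)}
after = type_counts()

growth = after - before  # Counter subtraction keeps only positive differences
for name, delta in growth.most_common(5):
    print(f"{name:<12} +{delta}")
```

Running the comparison repeatedly under steady load shows which types keep climbing, which is the starting point for a backreference search.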
show_refs() and show_backrefs(): Visualising the Object Graph
import objgraph
# What does this object reference?
x = {"key": [1, 2, 3], "other": (4, 5)}
objgraph.show_refs([x], filename="refs.png")
# What is referencing this object? (retention path)
leaky_list = [1, 2, 3]
global_cache = {"session_1": leaky_list}
objgraph.show_backrefs(leaky_list, filename="backrefs.png")
# Shows: leaky_list ← global_cache ← module globals
show_backrefs is invaluable when you expected an object to be garbage collected but it is not - it shows you exactly which chain of references is keeping it alive.
Part 5 - pympler: True Recursive Object Size
pympler's asizeof.asizeof() recursively follows all references and returns the true total memory footprint of an object and everything it references.
pip install pympler
from pympler import asizeof
# Shallow size - misleading
import sys
data = {"key": [1, 2, 3, "hello", (4, 5)]}
print(sys.getsizeof(data)) # 232 bytes on older CPython, 184 on 3.12 - the dict's own table only
# True size - all referenced objects included
print(asizeof.asizeof(data)) # ~600+ bytes - full recursive measurement
Comparing sys.getsizeof vs asizeof
import sys
from pympler import asizeof
class Config:
    def __init__(self):
        self.settings = {f"key_{i}": f"value_{i}" for i in range(100)}
        self.history = list(range(1000))
cfg = Config()
print(f"sys.getsizeof: {sys.getsizeof(cfg):>10} bytes")
# sys.getsizeof: 48 bytes - the instance itself, not the attributes it holds
print(f"asizeof: {asizeof.asizeof(cfg):>10} bytes")
# asizeof: 58432 bytes - everything cfg references, recursively
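The idea behind a recursive measurement can be sketched in a few lines. This is a simplified illustration, not pympler's algorithm: `deep_sizeof` is a hypothetical helper that handles only common containers and instance `__dict__`s, and it will disagree with asizeof on details such as alignment and slots.

```python
import sys

def deep_sizeof(obj, _seen=None):
    """Sum sys.getsizeof over obj and everything reachable from it, counting each object once."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:
        return 0  # already counted (shared object or reference cycle)
    seen.add(id(obj))
    total = sys.getsizeof(obj)
    if isinstance(obj, dict):
        total += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        total += sum(deep_sizeof(item, seen) for item in obj)
    if hasattr(obj, "__dict__"):
        total += deep_sizeof(vars(obj), seen)  # instance attributes
    return total

data = {"key": [1, 2, 3, "hello", (4, 5)]}
print(sys.getsizeof(data), deep_sizeof(data))
```

The `seen` set is the important design choice: without it, shared objects would be counted repeatedly and reference cycles would recurse forever.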
muppy: Heap Snapshot
from pympler import muppy, summary
# Get all Python objects currently in the heap
all_objects = muppy.get_objects()
# Summarise by type
summ = summary.summarize(all_objects)
summary.print_(summ)
types | # objects | total size
============================ | =========== | ============
dict | 2847 | 3.08 MB
function | 1053 | 148.90 KB
tuple | 891 | 84.24 KB
list | 672 | 74.12 KB
Part 6 - Common Memory Leak Patterns
Understanding the tools is only half the battle. Here are the patterns that cause real-world Python memory leaks.
Pattern 1: Unbounded Global Caches
# anti-pattern: cache that only grows
# (run_db_query stands for your own data-access function)
_results_cache = {}
def expensive_query(user_id: int, query: str) -> dict:
    key = (user_id, query)
    if key not in _results_cache:
        _results_cache[key] = run_db_query(user_id, query)
    return _results_cache[key]
# After 24 hours of traffic: _results_cache has millions of entries
# Fix: use functools.lru_cache with a maxsize, or an LRU dict
from functools import lru_cache
@lru_cache(maxsize=10_000)
def expensive_query_fixed(user_id: int, query: str) -> dict:
    return run_db_query(user_id, query)
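Whether a bounded cache is actually staying bounded can be verified at runtime with `cache_info()`. A minimal sketch, assuming a hypothetical `lookup` function as a stand-in for the expensive query:

```python
from functools import lru_cache

@lru_cache(maxsize=100)
def lookup(user_id: int) -> tuple:
    # Stand-in for an expensive query
    return (user_id, user_id * 2)

for user_id in range(1_000):
    lookup(user_id)

info = lookup.cache_info()
print(info)  # named tuple with hits, misses, maxsize, currsize
lookup.cache_clear()  # releases every cached result at once
```

Logging `currsize` alongside the hit rate shows both that the cache is capped and whether the cap is large enough to be useful.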
Pattern 2: Event Listeners Not Unregistered
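# anti-pattern in an event system
class EventBus:
    _listeners: dict[str, list] = {} # class-level - lives forever
    @classmethod
    def subscribe(cls, event: str, callback):
        cls._listeners.setdefault(event, []).append(callback)
        # BUG: callback captures 'self' of the subscriber object
        # subscriber cannot be GC'd because the EventBus holds a reference
# Fix: hold callbacks weakly. Bound methods need weakref.WeakMethod:
# a plain weakref.ref to a bound method dies immediately, because the
# bound-method object itself is a temporary.
import weakref
class EventBusSafe:
    _listeners: dict[str, list] = {}
    @staticmethod
    def _weak(callback):
        # Bound methods carry __self__; plain functions do not
        if hasattr(callback, "__self__"):
            return weakref.WeakMethod(callback)
        return weakref.ref(callback)
    @classmethod
    def subscribe(cls, event: str, callback):
        cls._listeners.setdefault(event, []).append(cls._weak(callback))
    @classmethod
    def unsubscribe(cls, event: str, callback):
        listeners = cls._listeners.get(event, [])
        # Drop dead references and the callback being removed
        cls._listeners[event] = [l for l in listeners if l() is not None and l() != callback]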
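One detail worth checking when holding callbacks weakly is how bound methods behave, since each attribute access creates a fresh bound-method object. The sketch below compares `weakref.ref` with `weakref.WeakMethod`; the `Subscriber` class is hypothetical, and the immediate expiry of the plain reference relies on CPython's reference counting.

```python
import gc
import weakref

class Subscriber:
    def on_event(self, payload):
        return payload

sub = Subscriber()

plain_ref = weakref.ref(sub.on_event)          # refers to a temporary bound-method object
method_ref = weakref.WeakMethod(sub.on_event)  # tracks the instance and the function instead

print(plain_ref() is None)       # the temporary bound method is already gone
print(method_ref() is not None)  # still resolvable while sub is alive

del sub
gc.collect()
print(method_ref() is None)      # the subscriber was free to be collected
```

For plain functions and lambdas `weakref.ref` is fine, provided something else keeps the function alive for as long as it should receive events.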
Pattern 3: Django QuerySets Cached in Class Attributes
# anti-pattern: QuerySet pinned to the class
class ProductView:
    # The QuerySet object is created ONCE at class definition time.
    # QuerySets are lazy, so the query runs on first evaluation; after that
    # the cached results stay pinned to the class forever (and go stale)
    all_products = Product.objects.filter(active=True) # WRONG
# Fix: use a property or method
class ProductViewFixed:
    @property
    def all_products(self):
        return Product.objects.filter(active=True) # fresh queryset each time
Pattern 4: Large Objects Captured in Closures
# anti-pattern: closure captures a large object
def make_processor(large_dataset: list) -> callable:
    # large_dataset is captured by the closure
    # it cannot be GC'd as long as the returned function is alive
    def process(x):
        return x in large_dataset # 'large_dataset' is closed over
    return process
dataset = list(range(1_000_000))
processor = make_processor(dataset)
del dataset # does NOT free the list - the closure still holds it
# Fix: capture only what you need
def make_processor_fixed(large_dataset: list) -> callable:
    lookup = set(large_dataset) # convert to set for O(1) lookup
    del large_dataset # drop this function's reference to the list
    def process(x):
        return x in lookup # only the set is captured
    return process
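What a closure has captured can be inspected directly through the function's `__closure__` cells, which is a quick way to confirm a suspected retention. A minimal sketch using a smaller dataset than the example above:

```python
def make_processor(large_dataset):
    def process(x):
        return x in large_dataset
    return process

dataset = list(range(1_000))
processor = make_processor(dataset)

# co_freevars names the captured variables; __closure__ holds one cell per name
for name, cell in zip(processor.__code__.co_freevars, processor.__closure__):
    print(name, type(cell.cell_contents).__name__, len(cell.cell_contents))

captured = processor.__closure__[0].cell_contents
```

If `captured` turns out to be a large object that the inner function barely uses, that is the cue to capture a smaller derived value instead.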
Part 7 - Practical Leak Debugging Workflow
Complete Workflow Example
import tracemalloc
import gc
def find_leak(leaky_function, iterations: int = 100):
    """
    Profile memory growth of leaky_function over multiple calls.
    Returns the top allocation sites.
    """
    gc.collect()
    tracemalloc.start(10) # 10 frames of traceback
    snapshot_before = tracemalloc.take_snapshot()
    for _ in range(iterations):
        leaky_function()
    gc.collect() # collect cyclic garbage before final snapshot
    snapshot_after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    diff = snapshot_after.compare_to(snapshot_before, "lineno")
    return diff[:10]
# Usage
def suspect_function():
    # simulate accumulation in a module-level dict
    import sys
    module = sys.modules[__name__]
    if not hasattr(module, "_leak_store"):
        module._leak_store = {}
    # Use a growing key: id(object()) can reuse the address of the freed
    # temporary and keep overwriting the same entry, hiding the leak
    module._leak_store[len(module._leak_store)] = list(range(100))
results = find_leak(suspect_function, iterations=50)
for stat in results:
    print(stat)
Part 8 - __slots__ Memory Savings: Proven with tracemalloc
__slots__ was introduced in the OOP module. Here we prove the savings with tracemalloc rather than just asserting them.
import tracemalloc
class WithDict:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y
class WithSlots:
    __slots__ = ("x", "y")
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y
N = 100_000
# Measure WithDict
tracemalloc.start()
objects_with_dict = [WithDict(i, i * 2) for i in range(N)]
snap_dict = tracemalloc.take_snapshot()
tracemalloc.stop()
# Measure WithSlots
tracemalloc.start()
objects_with_slots = [WithSlots(i, i * 2) for i in range(N)]
snap_slots = tracemalloc.take_snapshot()
tracemalloc.stop()
def total_size(snapshot):
    return sum(s.size for s in snapshot.statistics("filename"))
print(f"WithDict total: {total_size(snap_dict) / 1024 / 1024:.2f} MB")
print(f"WithSlots total: {total_size(snap_slots) / 1024 / 1024:.2f} MB")
WithDict total: 56.32 MB
WithSlots total: 19.07 MB
__slots__ eliminates the per-instance __dict__ (a dictionary is ~232 bytes), leaving only the slot descriptors. For 100 000 objects, the savings are over 37 MB - a 66% reduction.
import sys
d = WithDict(1, 2)
s = WithSlots(1, 2)
print(sys.getsizeof(d)) # 48 bytes
print(sys.getsizeof(s)) # 48 bytes - identical for getsizeof
print(sys.getsizeof(d.__dict__)) # 232 bytes - the hidden cost of __dict__
# s has no __dict__ - AttributeError if you try s.__dict__
This confirms why sys.getsizeof is inadequate: it reports the same size for both classes, hiding the 232-byte per-instance __dict__ overhead of WithDict.
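A per-instance figure is often easier to reason about than a total. The sketch below derives one from tracemalloc's running counters; `bytes_per_instance` is a hypothetical helper, and the absolute numbers depend on the CPython version, so only the relative difference should be relied on.

```python
import tracemalloc

class WithDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class WithSlots:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y

def bytes_per_instance(cls, n=10_000):
    """Approximate traced bytes per instance, including the list slot that holds it."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    instances = [cls(0, 0) for _ in range(n)]  # 0 is a cached int, so only instances are new
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / len(instances)

dict_cost = bytes_per_instance(WithDict)
slots_cost = bytes_per_instance(WithSlots)
print(f"WithDict:  {dict_cost:.0f} bytes per instance")
print(f"WithSlots: {slots_cost:.0f} bytes per instance")
```

Passing the cached integer 0 for both attributes keeps attribute values out of the measurement, so the difference reflects the instance layout alone.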
Graded Practice
Level 1 - Predict the Output
import sys
data = [1, 2, 3, 4, 5]
print(sys.getsizeof(data))
nested = [[1, 2], [3, 4], [5, 6]]
print(sys.getsizeof(nested))
print(sys.getsizeof(nested) > sys.getsizeof(data))
Answer:
104
80
False
These values are for 64-bit CPython 3.12; exact numbers vary by version. The outer nested list has only 3 slots, so its shallow size is smaller than that of the 5-element data list, even though nested holds more objects overall. Neither measurement includes the integers or inner lists referenced. sys.getsizeof of a list is approximately 56 + 8 * (allocated slots) bytes - only the list header and pointer array. CPython may round the slot count up, which is why data reports 104 rather than 96.
Level 2 - Debug This Code
This code is supposed to profile how much memory a function uses. Why does it fail, and why might it report almost no growth even after that failure is fixed?
import tracemalloc
def profile_function(func):
    snapshot_before = tracemalloc.take_snapshot()
    func()
    snapshot_after = tracemalloc.take_snapshot()
    diff = snapshot_after.compare_to(snapshot_before, "lineno")
    return diff
def allocate_data():
    return [i * 2 for i in range(100_000)]
results = profile_function(allocate_data)
for stat in results[:3]:
    print(stat)
Answer:
tracemalloc.start() was never called. tracemalloc.take_snapshot() requires tracing to be active - if you call it without first calling tracemalloc.start(), it raises RuntimeError: the tracemalloc module must be tracing memory allocations.
Additionally, the return value of allocate_data() is discarded - the list is allocated and immediately freed. Even with tracing active, the comparison might show zero net change because the allocation and deallocation both happen before the second snapshot.
Fixed version:
import tracemalloc
def profile_function(func):
    tracemalloc.start() # must start tracing
    snapshot_before = tracemalloc.take_snapshot()
    result = func() # keep result to prevent early GC
    snapshot_after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    diff = snapshot_after.compare_to(snapshot_before, "lineno")
    return diff, result # return result to keep it alive
def allocate_data():
    return [i * 2 for i in range(100_000)]
results, _ = profile_function(allocate_data)
for stat in results[:3]:
    print(stat)
Level 3 - Design Challenge
You are the lead engineer on a Python web service that processes user sessions. After 24 hours of load testing, the service's RSS grows from 200 MB to 1.8 GB without restarting. The service handles roughly 50 000 sessions per hour.
Design a complete memory profiling plan:
- What tools would you use and in what order?
- Write the tracemalloc instrumentation code you would add to the service.
- What are the three most likely root causes given this context, and how would you confirm each?
- How would you ensure the profiling instrumentation itself does not cause a secondary memory issue in production?
Answer:
Step 1: Hypothesis formation
Given 50 000 sessions/hour over 24 hours = 1.2 million sessions, and growth from 200 MB to 1.8 GB (+1.6 GB), that is roughly 1.3 KB per session that is never freed. The pattern points to a cache or session store that accumulates session objects.
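The per-session estimate is worth reproducing, since it sets the expectation for what a single leaked object should weigh. A short check of the arithmetic, assuming decimal units (1 GB = 10^9 bytes):

```python
sessions = 50_000 * 24                 # sessions handled during the 24-hour test
growth_bytes = 1.8e9 - 200e6           # RSS growth, using decimal GB and MB
per_session = growth_bytes / sessions  # average bytes retained per session
print(f"{sessions:,} sessions, {per_session:,.0f} bytes retained per session")
```

If a sampled leaked object measures far more or far less than this figure, the leak is probably not one object per session and the hypothesis needs revisiting.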
Step 2: Profiling plan
Tools in order:
1. tracemalloc - identify which line is responsible for allocations
2. objgraph.show_growth() - confirm which Python type is growing
3. objgraph.show_backrefs() - find the retention chain on a sample leaked object
4. pympler.asizeof() - measure the true size of a single leaked object
Step 3: tracemalloc instrumentation
import tracemalloc
import logging
import threading
import time
_snapshot_lock = threading.Lock()
_baseline_snapshot = None
def start_leak_detection():
    """Call at service startup."""
    global _baseline_snapshot
    tracemalloc.start(10)
    time.sleep(30) # let startup allocations settle
    _baseline_snapshot = tracemalloc.take_snapshot()
    logging.info("Memory baseline captured")
def report_allocations_since_baseline():
    """Call periodically (e.g., every hour) via a background task."""
    global _baseline_snapshot
    if _baseline_snapshot is None:
        return
    with _snapshot_lock:
        current = tracemalloc.take_snapshot()
        diff = current.compare_to(_baseline_snapshot, "lineno")
        for stat in diff[:10]:
            logging.warning("MEMORY_GROWTH: %s", stat)
        # Roll the baseline forward to track incremental growth
        _baseline_snapshot = current
Step 4: Three most likely root causes
- Session dict never evicted: A _sessions = {} global dict holds session data indexed by session ID and is never cleared. Confirm by objgraph.show_growth() showing dict count growing linearly with session count, and show_backrefs() showing a session dict referenced by a module-level variable.
- Middleware or decorator capturing request objects: A logging decorator closes over the request object (which holds the full request body). Confirm by finding HttpRequest or similar objects in objgraph.show_most_common_types() with count proportional to total requests processed.
- asyncio tasks not awaited or cancelled: In async services, tasks that are started but never awaited or cancelled (for example, ones blocked forever on a queue or socket) stay registered with the event loop along with everything they reference. Confirm with len(asyncio.all_tasks()) growing over time.
Step 5: Production safety
- Keep the traceback depth modest, e.g. tracemalloc.start(10): the default stores a single frame, and every extra frame adds memory and CPU overhead per allocation
- Sample: only call report_allocations_since_baseline() every 30–60 minutes, not per request
- Use a feature flag to enable/disable tracing without restarting the service
- Monitor the overhead: compare p99 latency with and without tracemalloc active; if overhead exceeds 15%, reduce snapshot frequency
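The feature-flag point can be sketched as a small idempotent toggle. The names here are assumptions: `sync_tracing` is a hypothetical helper and `MEMORY_TRACING` is a hypothetical environment variable; a real service might read the flag from its configuration store and call the helper from a periodic task.

```python
import os
import tracemalloc

def sync_tracing(enabled: bool, nframe: int = 10) -> bool:
    """Start or stop tracemalloc to match `enabled`; return True if the state changed."""
    if enabled and not tracemalloc.is_tracing():
        tracemalloc.start(nframe)
        return True
    if not enabled and tracemalloc.is_tracing():
        tracemalloc.stop()  # also discards the traces collected so far
        return True
    return False

# MEMORY_TRACING is a hypothetical flag name used only for this sketch
flag_on = os.environ.get("MEMORY_TRACING", "0") == "1"
sync_tracing(flag_on)
```

Because stopping discards collected traces, any baseline snapshot should be retaken after tracing is switched back on.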
Key Takeaways
- sys.getsizeof() measures shallow size only - the object header and its own data structure, never the objects it references. It is correct for scalars but dangerously misleading for containers.
- tracemalloc is the built-in standard library solution for finding which line of code is responsible for allocations. Use compare_to() between two snapshots to isolate leaks.
- memory_profiler's @profile decorator shows the memory increment of each line inside a function - the closest thing Python has to a line-level memory profiler.
- objgraph.show_backrefs() answers "why is this object still alive?" by showing the reference chain that prevents garbage collection.
- pympler.asizeof() gives the true recursive size of an object and everything it references - use it when you need an honest size measurement.
- The four most common Python memory leak patterns are: unbounded global caches, event listeners not unregistered, Django QuerySets cached on class attributes, and large objects captured in closures.
- __slots__ eliminates the per-instance __dict__ (~232 bytes). Across 100 000 objects the dict alone accounts for over 22 MB, and the tracemalloc measurement above showed a total saving of over 37 MB.
- A systematic leak investigation always goes: tracemalloc baseline → reproduce → compare snapshots → objgraph backrefs → fix → verify flat diff.
