
Memory Profiling - tracemalloc, memory_profiler, objgraph, and pympler

Reading time: ~35 minutes | Level: Intermediate → Engineering

Before reading further, predict what this program prints and what the output means:

import tracemalloc

tracemalloc.start()

data = [i * 2 for i in range(100_000)]
snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")

for stat in top[:3]:
    print(stat)

Answer (representative output; exact figures vary with the Python version and platform):

<ipython-input>:5: size=3530 KiB, count=99873, average=36 B
<ipython-input>:1: size=8 B, count=1, average=8 B
/usr/lib/python3.12/tracemalloc.py:67: size=8 B, count=1, average=8 B

The first line is the most important: line 5 (the list comprehension) is responsible for about 3.4 MiB spread across roughly 100 000 memory blocks. Those blocks are the list's pointer array plus one int object per element (CPython caches the small integers up to 256, so those need no new block). Each entry shows:

  • size: total bytes currently allocated by code at that line
  • count: number of memory blocks currently allocated there (roughly one per object)
  • average: bytes per block

Most developers have never seen this output. tracemalloc shows you which line of your code is responsible for each allocation - not just the total memory used by the process. This is the tool that turns a vague "the service is leaking memory" complaint into a specific file and line number.
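The same fields can be read as attributes instead of being parsed from the printed line. The following is a minimal sketch that reuses the program above; the names `top` and `frame` are illustrative only.

```python
import tracemalloc

tracemalloc.start()
data = [i * 2 for i in range(100_000)]
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# statistics() returns entries sorted from largest to smallest
top = snapshot.statistics("lineno")[0]
frame = top.traceback[0]  # with the "lineno" key each entry has one frame

print(f"{frame.filename}:{frame.lineno}")
print(f"size={top.size} bytes, blocks={top.count}, "
      f"average={top.size // top.count} bytes")
```

Reading `size` and `count` directly is convenient when the numbers go to a log or a metrics system rather than a terminal.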

Memory bugs are among the hardest bugs to diagnose because they are often invisible until a service crashes under load. Python's garbage collector handles most memory automatically, but it cannot help you when your own code holds references longer than needed. This lesson gives you the full toolkit for diagnosing, measuring, and fixing memory issues in real Python applications.

What You Will Learn

  • Why sys.getsizeof() lies to you about container sizes
  • tracemalloc: taking snapshots, reading statistics, comparing snapshots to find leaks
  • memory_profiler: line-by-line memory increment for a function
  • objgraph: finding what is holding a reference to an object you expected to be collected
  • pympler: recursively measuring true object size
  • Common memory leak patterns in Django, asyncio, and long-running services
  • __slots__ memory savings - proven with tracemalloc

Prerequisites

  • Lesson 05 (Reference Counting) - you need to understand how Python decides when to free memory
  • Lesson 06 (Garbage Collection) - cyclic garbage and the gc module
  • Basic understanding of Python containers (list, dict, set)

Part 1 - sys.getsizeof: The Shallow Lie

sys.getsizeof() returns the shallow size of an object - the memory used by the object's own data structure, not including any objects it references.

import sys

# Shallow size of the list object and its pointer array - not the integers inside
small_list = [1, 2, 3]
print(sys.getsizeof(small_list)) # 88 bytes (on CPython 3.12, 64-bit)

# The real footprint is larger:
# Each int object is ~28 bytes; 3 ints = 84 bytes
# Plus the list itself = 88 bytes
# Total: 88 + 84 = 172 bytes - sys.getsizeof reports only 88
# (CPython caches small ints such as 1, 2, 3, so these particular values are
# shared; larger values cost the full ~28 bytes each)

# Shallow size of a dict - just the hash table skeleton
d = {"a": 1, "b": 2}
print(sys.getsizeof(d)) # 232 bytes on CPython 3.8-3.10; newer versions report less (about 184)

# An empty dict is much smaller - the hash table is only allocated on first insert
empty_dict = {}
print(sys.getsizeof(empty_dict)) # 64 bytes

The Shallow vs Deep Size Diagram

danger

Never use sys.getsizeof to measure the real memory usage of containers. sys.getsizeof([1, 2, 3]) returns ~88 bytes on a 64-bit system, but the list and its three integer objects together account for ~172 bytes. The list only stores pointers to the integers - sys.getsizeof counts the pointer array, not the pointed-to objects. For nested structures like list[list[dict[str, Any]]], the undercount can reach orders of magnitude.
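The gap between shallow and deep size can be measured with the standard library alone. This is a minimal sketch, not a replacement for a real measuring tool: `deep_getsizeof` is a hypothetical helper that only descends into built-in containers and ignores instance attributes.

```python
import sys

def deep_getsizeof(obj, _seen=None):
    """Hypothetical helper: shallow size of obj plus everything it contains."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:          # count shared objects only once
        return 0
    seen.add(id(obj))
    total = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            total += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            total += deep_getsizeof(item, seen)
    return total

values = [1000, 2000, 3000]
shallow = sys.getsizeof(values)
deep = deep_getsizeof(values)
print(f"shallow={shallow} bytes, deep={deep} bytes")
```

The `_seen` set matters: without it, an object referenced from two places would be counted twice and the total would overstate real memory use.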

When sys.getsizeof IS Useful

sys.getsizeof is correct for scalar objects that contain no references:

import sys

print(sys.getsizeof(0)) # 24 bytes on CPython <= 3.11, 28 on 3.12+
print(sys.getsizeof(10**100)) # 72 bytes - large ints grow by 4 bytes per 30 bits
print(sys.getsizeof(3.14)) # 24 bytes - float
print(sys.getsizeof("hello")) # 54 bytes on CPython <= 3.11 (46 on 3.12+) - 5-char string
print(sys.getsizeof(b"hello")) # 38 bytes - 5-byte bytes object

# Safe to use for profiling individual scalars or comparing
# the size of different string representations:
print(sys.getsizeof("hello world")) # 60 bytes on CPython <= 3.11 (52 on 3.12+)
print(sys.getsizeof(b"hello world")) # 44 bytes - bytes is smaller than str

Part 2 - tracemalloc: Surgical Memory Profiling

tracemalloc is the standard library answer to "which line of my code is allocating the most memory?" It hooks into CPython's memory allocator and records a traceback for every allocation.

Start, Snapshot, Read

import tracemalloc

# Start tracing - call this as early as possible
tracemalloc.start()

# --- your code under test ---
result = {}
for i in range(10_000):
    result[f"key_{i}"] = list(range(i % 100))
# ---

# Take a snapshot of all current allocations
snapshot = tracemalloc.take_snapshot()

# Group by source line - shows which line allocated the most
stats = snapshot.statistics("lineno")

print("=== Top 5 memory allocations ===")
for stat in stats[:5]:
    print(stat)

Output:

=== Top 5 memory allocations ===
example.py:7: size=2938 KiB, count=10000, average=300 B
example.py:6: size=441 KiB, count=10000, average=45 B
example.py:5: size=1 KiB, count=1, average=1 KiB

statistics() Grouping Keys

# Group by source file - useful for large projects
stats_by_file = snapshot.statistics("filename")
for stat in stats_by_file[:5]:
    print(stat)

# Group by traceback - shows the full call chain
stats_by_tb = snapshot.statistics("traceback")
for stat in stats_by_tb[:3]:
    print(stat)
    for line in stat.traceback.format():
        print(" ", line)

Snapshot.compare_to(): Finding Leaks Between Snapshots

This is the most powerful tracemalloc feature for leak detection. Take a baseline snapshot before the suspected leak, run the leaking code, take a second snapshot, and compare.

import tracemalloc

tracemalloc.start()

# Baseline - before the suspected leak
snapshot_before = tracemalloc.take_snapshot()

# --- run the code you suspect is leaking ---
cache = {}
for i in range(5000):
    cache[f"session_{i}"] = {"user": f"user_{i}", "data": list(range(100))}
    # Imagine this cache is never cleared - it grows forever
# ---

snapshot_after = tracemalloc.take_snapshot()

# Compare: positive size = new allocations since baseline
top_stats = snapshot_after.compare_to(snapshot_before, "lineno")

print("=== New allocations since baseline ===")
for stat in top_stats[:5]:
    print(stat)

Output:

=== New allocations since baseline ===
leak_demo.py:8: size=2400 KiB (+2400 KiB), count=5000 (+5000), average=491 B
leak_demo.py:7: size=391 KiB (+391 KiB), count=5001 (+5001), average=80 B

The +2400 KiB tells you exactly how much new memory line 8 allocated between your two snapshots.

tip

Always use tracemalloc.take_snapshot() BEFORE and AFTER the suspected leaking operation, then call snapshot_after.compare_to(snapshot_before, "lineno"). The diff isolates exactly what your code allocated during that interval and cuts through the noise of allocations from imports and startup.
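Remaining noise from the import machinery and from tracemalloc itself can be dropped with `Snapshot.filter_traces()` before comparing. The sketch below assumes those two sources are the noise you want to hide; the `noise` tuple is illustrative and the patterns should be adapted to your own service.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()
cache = {f"session_{i}": list(range(100)) for i in range(1000)}
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Filter(False, pattern) excludes traces whose filename matches the pattern
noise = (
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, "<frozen importlib._bootstrap_external>"),
    tracemalloc.Filter(False, tracemalloc.__file__),
)

diff = after.filter_traces(noise).compare_to(before.filter_traces(noise), "lineno")
for stat in diff[:3]:
    print(stat)
```

Both snapshots are filtered with the same rules so that the comparison stays like for like.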

Reading the Full Traceback

For deep investigation, statistics("traceback") gives the full call chain:

import tracemalloc

tracemalloc.start(25) # keep up to 25 frames in traceback

def inner():
    return [0] * 10_000

def middle():
    return inner()

def outer():
    return middle()

data = outer()
snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics("traceback")

for stat in stats[:1]:
    print(f"Total: {stat.size / 1024:.1f} KiB")
    print("Traceback:")
    for line in stat.traceback.format():
        print(" ", line)

Output (frames are listed oldest first; tracemalloc records the file and line of each frame, not the function name):

Total: 78.2 KiB
Traceback:
    File "example.py", line 14
      data = outer()
    File "example.py", line 12
      return middle()
    File "example.py", line 9
      return inner()
    File "example.py", line 6
      return [0] * 10_000

note

tracemalloc adds roughly 10–30% memory overhead and slows allocation-heavy code. Disable it in production deployments unless you are actively investigating a memory issue. Use tracemalloc.stop() when done and tracemalloc.is_tracing() to check state before conditionally enabling.
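One way to keep tracing switched on only while it is needed is a small context manager built on `is_tracing()`, `start()` and `stop()`. This is a sketch under that assumption; `traced` is a hypothetical helper, not part of tracemalloc.

```python
import contextlib
import tracemalloc

@contextlib.contextmanager
def traced(nframe: int = 1):
    """Hypothetical helper: trace inside the block, leave prior state untouched."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start(nframe)
    try:
        yield
    finally:
        if started_here:          # do not stop tracing someone else started
            tracemalloc.stop()

with traced():
    payload = [bytes(1024) for _ in range(1000)]
    current, peak = tracemalloc.get_traced_memory()   # bytes traced now / at peak
    overhead = tracemalloc.get_tracemalloc_memory()   # bytes used by tracemalloc itself

print(f"current={current} peak={peak} tracemalloc overhead={overhead}")
```

`get_tracemalloc_memory()` gives a direct reading of the bookkeeping cost, which is a more reliable guide than a fixed percentage.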

Part 3 - memory_profiler: Line-by-Line Memory Increments

While tracemalloc shows allocations by source line across your whole program, memory_profiler shows the memory increment of each line inside a decorated function. It is the memory equivalent of line_profiler.

Installation and the @profile Decorator

pip install memory_profiler
# memory_demo.py
from memory_profiler import profile

@profile
def build_data_structures():
    numbers = list(range(1_000_000)) # ~8 MB of pointers, plus the int objects
    squares = [x * x for x in numbers] # a second list of the same length
    del numbers # release the first list
    lookup = {x: x * x for x in range(100_000)} # dict of 100 000 entries
    del squares # release the second list
    return lookup

if __name__ == "__main__":
    build_data_structures()

Run with:

python -m memory_profiler memory_demo.py

Output:

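Line #    Mem usage    Increment   Line Contents
================================================
     4     45.3 MiB     45.3 MiB   @profile
     5                             def build_data_structures():
     6     52.9 MiB     +7.6 MiB       numbers = list(range(1_000_000))
     7     60.4 MiB     +7.5 MiB       squares = [x * x for x in numbers]
     8     52.9 MiB     -7.5 MiB       del numbers
     9     63.1 MiB    +10.2 MiB       lookup = {x: x * x for x in range(100_000)}
    10     55.6 MiB     -7.5 MiB       del squares
    11     55.6 MiB      0.0 MiB       return lookup

The Increment column is the key insight: positive values are new allocations, negative values are reclaimed memory. Treat the figures above as illustrative. On a real run the increments for the two lists are larger, because each list also owns about a million int objects of ~28 bytes each, and freed memory is not always returned to the operating system, so a negative increment can be smaller than the matching positive one.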

mprof: Sampling Over Time

mprof records memory usage sampled over time - useful for finding gradual leaks in long-running processes:

# Record memory usage while running a script
mprof run my_service.py

# Plot the recorded memory timeline
mprof plot

# List recorded runs
mprof list

This generates a timeline showing RSS (Resident Set Size) sampled every 0.1 seconds. A steadily rising line with no downward trend indicates a leak.

Part 4 - objgraph: Following Reference Chains

objgraph answers the question: "Why hasn't this object been garbage collected yet?" It shows which objects are holding references to your object - the retention path.

pip install objgraph

Most Common Types

import objgraph

# Which Python types have the most live instances right now?
objgraph.show_most_common_types(limit=10)
dict 2847
function 1053
tuple 891
list 672
set 234
type 198
weakref 145
cell 132
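A rough standard-library approximation of this table is a `Counter` over `gc.get_objects()`. The sketch assumes that counting garbage-collector-tracked objects by type name is close enough to what objgraph reports; `most_common_types` is a hypothetical helper.

```python
import gc
from collections import Counter

def most_common_types(limit: int = 10):
    """Hypothetical helper: instance counts per type for gc-tracked objects."""
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(limit)

for name, count in most_common_types(5):
    print(f"{name:<12}{count}")
```

These are instance counts, not byte totals, so a type with a few very large objects can rank low here and still dominate memory.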

show_growth(): What Appeared Since Last Check

import objgraph

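objgraph.show_growth(limit=5) # first call records the baseline (and prints the current counts)

# run suspected leaking code
cache = {}
for i in range(100):
    cache[f"key_{i}"] = {"value": i, "data": list(range(i))}

# show what grew since the baseline
objgraph.show_growth(limit=5)

Output of the second call (type, current count, increase; the counts are illustrative):

dict 2948 +101
list 772 +100

str and int do not appear: objgraph counts only objects tracked by the garbage collector, and strings and integers are not tracked.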

show_refs() and show_backrefs(): Visualising the Object Graph

import objgraph

# What does this object reference?
x = {"key": [1, 2, 3], "other": (4, 5)}
objgraph.show_refs([x], filename="refs.png")

# What is referencing this object? (retention path)
leaky_list = [1, 2, 3]
global_cache = {"session_1": leaky_list}

# Wrap the target in a list: a bare list argument is treated as a list of objects to graph
objgraph.show_backrefs([leaky_list], filename="backrefs.png")
# Shows: leaky_list ← global_cache ← module globals

show_backrefs is invaluable when you expected an object to be garbage collected but it is not - it shows you exactly which chain of references is keeping it alive.
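When rendering a graph is not practical, the first hop of the retention path can be read with `gc.get_referrers()` from the standard library. A minimal sketch using the same two objects as above; the `holders` and `held_by_cache` names are illustrative.

```python
import gc

leaky_list = [1, 2, 3]
global_cache = {"session_1": leaky_list}

# Every gc-tracked container that directly references leaky_list
holders = [r for r in gc.get_referrers(leaky_list) if isinstance(r, dict)]
held_by_cache = any(r is global_cache for r in holders)

print(f"{len(holders)} dict(s) reference the list; cache among them: {held_by_cache}")
```

The module's own globals dict normally shows up as a holder too, because the name `leaky_list` is itself a reference.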

Part 5 - pympler: True Recursive Object Size

pympler.asizeof() recursively follows all references and returns the true total memory footprint of an object and everything it references.

pip install pympler
from pympler import asizeof

# Shallow size - misleading
import sys
data = {"key": [1, 2, 3, "hello", (4, 5)]}
print(sys.getsizeof(data)) # ~200 bytes - the dict's own hash table only

# True size - all referenced objects included
print(asizeof.asizeof(data)) # ~600+ bytes - full recursive measurement

Comparing sys.getsizeof vs asizeof

import sys
from pympler import asizeof

class Config:
    def __init__(self):
        self.settings = {f"key_{i}": f"value_{i}" for i in range(100)}
        self.history = list(range(1000))

cfg = Config()

print(f"sys.getsizeof: {sys.getsizeof(cfg):>10} bytes")
# sys.getsizeof: 48 bytes - the instance itself, none of its attributes

print(f"asizeof: {asizeof.asizeof(cfg):>10} bytes")
# asizeof: 58432 bytes - everything cfg references, recursively

muppy: Heap Snapshot

from pympler import muppy, summary

# Get all Python objects currently in the heap
all_objects = muppy.get_objects()

# Summarise by type
summ = summary.summarize(all_objects)
summary.print_(summ)
types | # objects | total size
============================ | =========== | ============
dict | 2847 | 3.08 MB
function | 1053 | 148.90 KB
tuple | 891 | 84.24 KB
list | 672 | 74.12 KB

Part 6 - Common Memory Leak Patterns

Understanding the tools is only half the battle. Here are the patterns that cause real-world Python memory leaks.

Pattern 1: Unbounded Global Caches

# anti-pattern: cache that only grows
_results_cache = {}

def expensive_query(user_id: int, query: str) -> dict:
    key = (user_id, query)
    if key not in _results_cache:
        _results_cache[key] = run_db_query(user_id, query)
    return _results_cache[key]

# After 24 hours of traffic: _results_cache has millions of entries
# Fix: use functools.lru_cache with a maxsize, or an LRU dict
from functools import lru_cache

@lru_cache(maxsize=10_000)
def expensive_query_fixed(user_id: int, query: str) -> dict:
    return run_db_query(user_id, query)
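The bound can be checked with `cache_info()`. In this sketch `run_db_query` is a hypothetical stand-in for the real database call, and `maxsize` is set to 100 so the effect is visible quickly.

```python
from functools import lru_cache

def run_db_query(user_id: int, query: str) -> dict:
    """Hypothetical stand-in for a real database call."""
    return {"user_id": user_id, "query": query}

@lru_cache(maxsize=100)
def expensive_query(user_id: int, query: str) -> dict:
    return run_db_query(user_id, query)

# 1000 distinct keys pass through, but the cache never holds more than 100
for user_id in range(1_000):
    expensive_query(user_id, "SELECT 1")

info = expensive_query.cache_info()
print(info)
```

Exporting `currsize` and the hit/miss counters as metrics is a cheap way to confirm that a cache is bounded and that its size is doing useful work.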

Pattern 2: Event Listeners Not Unregistered

# anti-pattern in an event system
class EventBus:
    _listeners: dict[str, list] = {} # class-level - lives forever

    @classmethod
    def subscribe(cls, event: str, callback):
        cls._listeners.setdefault(event, []).append(callback)
        # BUG: callback captures 'self' of the subscriber object
        # subscriber cannot be GC'd because the EventBus holds a reference

# Fix: hold callbacks weakly. Bound methods need weakref.WeakMethod,
# because a plain weakref.ref to a bound method dies immediately
import inspect
import weakref

class EventBusSafe:
    _listeners: dict[str, list] = {}

    @staticmethod
    def _weak(callback):
        if inspect.ismethod(callback):
            return weakref.WeakMethod(callback)
        return weakref.ref(callback)

    @classmethod
    def subscribe(cls, event: str, callback):
        cls._listeners.setdefault(event, []).append(cls._weak(callback))

    @classmethod
    def unsubscribe(cls, event: str, callback):
        listeners = cls._listeners.get(event, [])
        # Drop dead references along with the callback being removed
        cls._listeners[event] = [ref for ref in listeners if ref() is not None and ref() != callback]
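One detail is worth verifying when callbacks are held weakly: a bound method is a temporary object, so a plain `weakref.ref` to it is dead almost at once, while `weakref.WeakMethod` tracks the underlying instance. The sketch below demonstrates both behaviours on CPython; the `Subscriber` class and variable names are illustrative.

```python
import gc
import weakref

class Subscriber:
    def on_event(self, payload):
        return payload

sub = Subscriber()

# The bound method object is freed as soon as the call returns
plain_ref_dead = weakref.ref(sub.on_event)() is None

strong_listeners = [sub.on_event]                    # keeps `sub` alive
weak_listeners = [weakref.WeakMethod(sub.on_event)]  # does not

probe = weakref.ref(sub)
del sub
gc.collect()
alive_with_strong = probe() is not None   # the strong listener still pins it

strong_listeners.clear()
gc.collect()
alive_after_clear = probe() is not None   # nothing strong is left
dead_callback = weak_listeners[0]() is None

print(plain_ref_dead, alive_with_strong, alive_after_clear, dead_callback)
```

With weak listeners the bus has to tolerate callbacks that resolve to `None` and prune them, since a subscriber can vanish at any time.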

Pattern 3: Django QuerySets Cached in Class Attributes

# anti-pattern: QuerySet pinned to the class
class ProductView:
    # QuerySets are lazy, so nothing runs at class definition time. But the
    # first iteration fills the queryset's result cache, and because the
    # queryset lives on the class, those rows stay in memory (and go stale)
    # for the life of the process
    all_products = Product.objects.filter(active=True) # WRONG

# Fix: use a property or method
class ProductViewFixed:
    @property
    def all_products(self):
        return Product.objects.filter(active=True) # fresh queryset each time

Pattern 4: Large Objects Captured in Closures

# anti-pattern: closure captures a large object
from collections.abc import Callable

def make_processor(large_dataset: list) -> Callable:
    # large_dataset is captured by the closure
    # it cannot be GC'd as long as the returned function is alive
    def process(x):
        return x in large_dataset # 'large_dataset' is closed over
    return process

dataset = list(range(1_000_000))
processor = make_processor(dataset)
del dataset # does NOT free the list - the closure still holds it

# Fix: capture only what you need
def make_processor_fixed(large_dataset: list) -> Callable:
    lookup = set(large_dataset) # convert to set for O(1) lookup
    del large_dataset # drop this function's reference to the list
    def process(x):
        return x in lookup # only the set is captured
    return process
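What a closure holds on to can be inspected directly through the function's `__closure__` and `__code__.co_freevars` attributes. A minimal sketch with a smaller dataset; the variable names are illustrative.

```python
def make_processor(large_dataset):
    def process(x):
        return x in large_dataset
    return process

dataset = list(range(1000))
processor = make_processor(dataset)

# Names of the captured variables, and the objects they currently point to
captured_names = processor.__code__.co_freevars
captured_value = processor.__closure__[0].cell_contents

print(captured_names, captured_value is dataset)
```

This check is useful when a long-lived callback is suspected of pinning a large object: each cell lists exactly one captured reference.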

Part 7 - Practical Leak Debugging Workflow

Complete Workflow Example

import gc
import sys
import tracemalloc

def find_leak(leaky_function, iterations: int = 100):
    """
    Profile memory growth of leaky_function over multiple calls.
    Returns the top allocation sites.
    """
    gc.collect()
    tracemalloc.start(10) # 10 frames of traceback

    snapshot_before = tracemalloc.take_snapshot()

    for _ in range(iterations):
        leaky_function()

    gc.collect() # collect cyclic garbage before final snapshot
    snapshot_after = tracemalloc.take_snapshot()

    tracemalloc.stop()

    diff = snapshot_after.compare_to(snapshot_before, "lineno")
    return diff[:10]

# Usage
def suspect_function():
    # simulate accumulation in a module-level dict
    module = sys.modules[__name__]
    if not hasattr(module, "_leak_store"):
        module._leak_store = {}
    # Use a fresh key per call: id(object()) can repeat, because the
    # temporary object is freed at once and its address reused
    key = len(module._leak_store)
    module._leak_store[key] = list(range(100))

results = find_leak(suspect_function, iterations=50)
for stat in results:
    print(stat)

Part 8 - __slots__ Memory Savings: Proven with tracemalloc

__slots__ was introduced in the OOP module. Here we prove the savings with tracemalloc rather than just asserting them.

import tracemalloc

class WithDict:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

class WithSlots:
    __slots__ = ("x", "y")

    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

N = 100_000

# Measure WithDict
tracemalloc.start()
objects_with_dict = [WithDict(i, i * 2) for i in range(N)]
snap_dict = tracemalloc.take_snapshot()
tracemalloc.stop()

# Measure WithSlots
tracemalloc.start()
objects_with_slots = [WithSlots(i, i * 2) for i in range(N)]
snap_slots = tracemalloc.take_snapshot()
tracemalloc.stop()

def total_size(snapshot):
    return sum(s.size for s in snapshot.statistics("filename"))

print(f"WithDict total: {total_size(snap_dict) / 1024 / 1024:.2f} MB")
print(f"WithSlots total: {total_size(snap_slots) / 1024 / 1024:.2f} MB")

Output:

WithDict total: 56.32 MB
WithSlots total: 19.07 MB

__slots__ eliminates the per-instance __dict__ and stores attribute values in fixed slots inside the object itself. For 100 000 objects, the saving in this run is over 37 MB - a 66% reduction. The exact figures depend on the CPython version: instances of the same class share their dict keys, and newer releases store attribute values more compactly, which narrows the gap.

import sys

d = WithDict(1, 2)
s = WithSlots(1, 2)

print(sys.getsizeof(d)) # 48 bytes (may differ slightly by version)
print(sys.getsizeof(s)) # 48 bytes - identical for getsizeof
print(sys.getsizeof(d.__dict__)) # the hidden cost of __dict__ (232 bytes for a small standalone dict; instance dicts can be smaller)
# s has no __dict__ - AttributeError if you try s.__dict__

This confirms why sys.getsizeof is inadequate: it reports the same size for both classes, hiding the per-instance __dict__ overhead of WithDict.
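The memory saving comes with a behavioural trade-off that is easy to demonstrate: a slotted instance has no `__dict__`, so it cannot accept attributes outside its declared slots. A minimal sketch using the same two classes:

```python
class WithDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class WithSlots:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

d = WithDict(1, 2)
s = WithSlots(1, 2)

d.z = 3                       # fine: stored in d.__dict__
has_dict = hasattr(s, "__dict__")

try:
    s.z = 3                   # not a declared slot
except AttributeError:
    rejected = True
else:
    rejected = False

print(has_dict, rejected, d.__dict__)
```

Code that relies on adding attributes dynamically, or on `vars(obj)`, needs adjusting before a class is converted to `__slots__`.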

Graded Practice

Level 1 - Predict the Output

import sys

data = [1, 2, 3, 4, 5]
print(sys.getsizeof(data))

nested = [[1, 2], [3, 4], [5, 6]]
print(sys.getsizeof(nested))
print(sys.getsizeof(nested) > sys.getsizeof(data))
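Answer
104
80
False

The data list has 5 slots and the outer nested list has only 3, so nested reports the smaller size even though it holds more data in total. Neither measurement includes the integers or inner lists referenced. sys.getsizeof of a list is approximately 56 + 8 * len(list) bytes on 64-bit CPython - only the list object and its pointer array. Recent CPython versions round the capacity of a list built from constants up to an even number of slots, which is why data reports 104 rather than 96; older versions may print 96. The comparison is False either way.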
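The linear relationship can be checked without depending on how list literals are allocated. The sketch below assumes 64-bit or 32-bit CPython and uses `[None] * n`, which allocates exactly `n` slots; the names `pointer`, `header` and `sizes` are illustrative.

```python
import struct
import sys

pointer = struct.calcsize("P")   # bytes per pointer on this platform
header = sys.getsizeof([])       # size of an empty list

# [None] * n allocates exactly n slots, with no spare capacity
sizes = {n: sys.getsizeof([None] * n) for n in (0, 1, 10, 100)}

for n, size in sizes.items():
    print(f"len={n:<4} getsizeof={size:<6} header + {pointer}*len = {header + pointer * n}")
```

Lists that grow through `append` usually report more than this formula, because CPython reserves spare capacity to make future appends cheap.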

Level 2 - Debug This Code

This code is supposed to profile how much memory a function uses. Why does it fail, and why might it report no growth even once that is fixed?

import tracemalloc

def profile_function(func):
    snapshot_before = tracemalloc.take_snapshot()
    func()
    snapshot_after = tracemalloc.take_snapshot()
    diff = snapshot_after.compare_to(snapshot_before, "lineno")
    return diff

def allocate_data():
    return [i * 2 for i in range(100_000)]

results = profile_function(allocate_data)
for stat in results[:3]:
    print(stat)

Answer

tracemalloc.start() was never called. tracemalloc.take_snapshot() requires tracing to be active - if you call it without first calling tracemalloc.start(), it raises RuntimeError: the tracemalloc module must be tracing memory allocations to take a snapshot.

Additionally, the return value of allocate_data() is discarded - the list is allocated and immediately freed. Even with tracing active, the comparison might show zero net change because the allocation and deallocation both happen before the second snapshot.
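The first failure can be reproduced in isolation. This sketch stops tracing first in case something else enabled it, then captures the error message instead of letting it propagate:

```python
import tracemalloc

# Make sure tracing is off, whatever the interpreter was started with
if tracemalloc.is_tracing():
    tracemalloc.stop()

try:
    tracemalloc.take_snapshot()
except RuntimeError as exc:
    message = str(exc)
else:
    message = ""

print(message)
```

A guard such as `if not tracemalloc.is_tracing(): tracemalloc.start()` at the top of a profiling helper avoids the error without restarting a trace that is already running.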

Fixed version:

import tracemalloc

def profile_function(func):
    tracemalloc.start() # must start tracing
    snapshot_before = tracemalloc.take_snapshot()
    result = func() # keep result to prevent early GC
    snapshot_after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    diff = snapshot_after.compare_to(snapshot_before, "lineno")
    return diff, result # return result to keep it alive

def allocate_data():
    return [i * 2 for i in range(100_000)]

results, _ = profile_function(allocate_data)
for stat in results[:3]:
    print(stat)

Level 3 - Design Challenge

You are the lead engineer on a Python web service that processes user sessions. After 24 hours of load testing, the service's RSS grows from 200 MB to 1.8 GB without restarting. The service handles roughly 50 000 sessions per hour.

Design a complete memory profiling plan:

  1. What tools would you use and in what order?
  2. Write the tracemalloc instrumentation code you would add to the service.
  3. What are the three most likely root causes given this context, and how would you confirm each?
  4. How would you ensure the profiling instrumentation itself does not cause a secondary memory issue in production?
Answer

Step 1: Hypothesis formation

Given 50 000 sessions/hour over 24 hours = 1.2 million sessions, and growth from 200 MB to 1.8 GB (+1.6 GB), that is roughly 1.3 KB per session that is never freed. The pattern points to a cache or session store that accumulates session objects.
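The estimate is a short calculation, shown here with decimal units (1 GB = 1000 MB) as an assumption:

```python
sessions = 50_000 * 24                  # sessions handled during the test
growth_bytes = (1_800 - 200) * 10**6    # 200 MB -> 1.8 GB, in bytes

leaked_per_session = growth_bytes / sessions
print(f"{sessions:,} sessions, {leaked_per_session:.0f} bytes retained per session")
```

A per-session figure of this size is consistent with a small dict or session object being retained, rather than, say, whole request bodies.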

Step 2: Profiling plan

Tools in order:

  1. tracemalloc - identify which line is responsible for allocations
  2. objgraph.show_growth() - confirm which Python type is growing
  3. objgraph.show_backrefs() - find the retention chain on a sample leaked object
  4. pympler.asizeof() - measure the true size of a single leaked object

Step 3: tracemalloc instrumentation

import tracemalloc
import logging
import threading
import time

_snapshot_lock = threading.Lock()
_baseline_snapshot = None

def start_leak_detection():
    """Call at service startup, from a background thread (it sleeps for 30 s)."""
    global _baseline_snapshot
    tracemalloc.start(10)
    time.sleep(30) # let startup allocations settle
    with _snapshot_lock:
        _baseline_snapshot = tracemalloc.take_snapshot()
    logging.info("Memory baseline captured")

def report_allocations_since_baseline():
    """Call periodically (e.g., every hour) via a background task."""
    global _baseline_snapshot
    with _snapshot_lock:
        if _baseline_snapshot is None:
            return
        current = tracemalloc.take_snapshot()
        diff = current.compare_to(_baseline_snapshot, "lineno")
        for stat in diff[:10]:
            logging.warning("MEMORY_GROWTH: %s", stat)
        # Roll the baseline forward to track incremental growth
        _baseline_snapshot = current

Step 4: Three most likely root causes

  1. Session dict never evicted: A _sessions = {} global dict holds session data indexed by session ID and is never cleared. Confirm by objgraph.show_growth() showing dict count growing linearly with session count, and show_backrefs() showing a session dict referenced by a module-level variable.

  2. Middleware or decorator capturing request objects: A logging decorator closes over the request object (which holds the full request body). Confirm by finding HttpRequest or similar objects in objgraph.show_most_common_types() with count proportional to total requests processed.

  3. asyncio tasks that never finish: In async services, tasks that wait forever (for example on a queue or socket that never resolves), or that are kept in a collection that is never pruned, hold on to their coroutine frames and everything those frames reference. Confirm with len(asyncio.all_tasks()) growing over time.
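The third check can be sketched with a handful of deliberately stuck tasks. `stuck` and `main` are hypothetical names, and a real service would log the task count periodically rather than return it:

```python
import asyncio

async def stuck():
    # Waits on an event that nobody ever sets
    await asyncio.Event().wait()

async def main():
    background = [asyncio.create_task(stuck()) for _ in range(5)]
    await asyncio.sleep(0)                    # let the tasks start
    pending = len(asyncio.all_tasks())        # includes main's own task

    for task in background:
        task.cancel()
    await asyncio.gather(*background, return_exceptions=True)
    return pending, len(asyncio.all_tasks())

before, after = asyncio.run(main())
print(f"unfinished tasks before cleanup: {before}, after: {after}")
```

`asyncio.all_tasks()` only returns tasks that have not finished, so a count that climbs with traffic points at tasks that never complete rather than at finished ones awaiting collection.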

Step 5: Production safety

  • Keep the traceback depth modest: tracemalloc.start(10) stores up to 10 frames per allocation (the default is 1), and deeper tracebacks multiply overhead
  • Sample: only call report_allocations_since_baseline() every 30–60 minutes, not per request
  • Use a feature flag to enable/disable tracing without restarting the service
  • Monitor the overhead: compare p99 latency with and without tracemalloc active; if overhead exceeds 15%, reduce snapshot frequency

Key Takeaways

  • sys.getsizeof() measures shallow size only - the object header and its own data structure, never the objects it references. It is correct for scalars but dangerously misleading for containers.
  • tracemalloc is the built-in standard library solution for finding which line of code is responsible for allocations. Use compare_to() between two snapshots to isolate leaks.
  • memory_profiler's @profile decorator shows the memory increment of each line inside a function - the closest thing Python has to a line-level memory profiler.
  • objgraph.show_backrefs() answers "why is this object still alive?" by showing the reference chain that prevents garbage collection.
  • pympler.asizeof() gives the true recursive size of an object and everything it references - use it when you need an honest size measurement.
  • The four most common Python memory leak patterns are: unbounded global caches, event listeners not unregistered, Django QuerySets cached on class attributes, and large objects captured in closures.
  • __slots__ removes the per-instance __dict__. In the 100 000-object measurement above this saved over 37 MB (about 66%) - measurable with tracemalloc.
  • A systematic leak investigation always goes: tracemalloc baseline → reproduce → compare snapshots → objgraph backrefs → fix → verify flat diff.
© 2026 EngineersOfAI. All rights reserved.