Skip to main content

Python Memory Profiling — tracemalloc, sys.getsizeof,: Practice Problems & Exercises

Practice: Memory Profiling — tracemalloc, sys.getsizeof, objgraph, and pympler

11 problems4 Easy4 Medium3 Hard50–70 min
← Back to lesson

Easy

#1sys.getsizeof — Shallow vs DeepEasy
sys.getsizeofshallow-sizecontainers

Predict which comparisons evaluate to True. This tests your understanding of sys.getsizeof shallow measurement.

Python
import sys

small = [1, 2, 3]
large = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# The list header grows with the number of elements (pointer slots)
print(sys.getsizeof(large) > sys.getsizeof(small))

# A nested list: the inner list is not counted by the outer's getsizeof
outer = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [1, 2, 3]
# outer has 3 pointer slots; flat has 3 pointer slots — same number of references
print(sys.getsizeof(outer) == sys.getsizeof(flat))

# An empty dict and a dict with items — getsizeof shows the hash table size changes
d_empty = {}
d_full = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6, "g": 7, "h": 8, "i": 9}
print(sys.getsizeof(d_full) > sys.getsizeof(d_empty))
Solution
True
True
True

Breaking down each comparison:

Comparison 1 — large > small: True. A 10-element list needs 10 pointer slots; a 3-element list needs 3. The list header is base_size + n_elements * pointer_size. On 64-bit Python, a pointer is 8 bytes, so large is about (10 - 3) * 8 = 56 bytes larger.

Comparison 2 — outer == flat: True, and this is the most important one. outer has 3 elements (three references to inner lists). flat also has 3 elements (three references to ints). sys.getsizeof counts pointer slots, not pointed-to sizes. Both have 3 pointers, so their shallow sizes are equal. The inner lists' memory is completely invisible to the outer's getsizeof.

Comparison 3 — d_full > d_empty: True. Python's dict uses an open-addressing hash table. As you add more keys, the table is resized (rehashed) to maintain load factor below 2/3. More keys = larger hash table = larger getsizeof result.

The practical lesson: Never use sys.getsizeof to measure the total memory of a data structure. Use pympler.asizeof or a recursive size function instead.

Expected Output
True\nTrue\nTrue
Hints

Hint 1: sys.getsizeof() returns the shallow size — the size of the container object itself, not the objects it contains.

Hint 2: A list of 1000 ints has a larger shallow size (more pointer slots) than a list of 3 ints.

Hint 3: Both lists have the same shallow size if they have the same number of allocated slots — but the ints themselves are not counted.

#2tracemalloc.start() and take_snapshot()Easy
tracemallocsnapshotstatistics

Verify your understanding of basic tracemalloc usage. Each assertion checks one core concept.

Python
import tracemalloc

tracemalloc.start()

# Allocate a known structure
data = list(range(100_000))

snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics("lineno")

# There should be at least one statistic
print(len(stats) > 0)

# The largest allocation should be measured in KiB or MiB (at least 100 KB)
top_size_bytes = stats[0].size
print(top_size_bytes > 100_000)

# Each stat has a count of allocated objects
print(stats[0].count >= 1)

tracemalloc.stop()
Solution
True
True
True

What each line confirms:

Line 1 — len(stats) > 0: After allocating a 100K-element list, there is at least one allocation tracked. The statistics list will typically have several entries: the list itself, the integers inside it (though small ints are cached), and internal tracemalloc overhead.

Line 2 — top_size_bytes > 100_000: A list(range(100_000)) allocates a list object with 100,000 pointer slots. On a 64-bit system: list header (~56 bytes) + 100,000 * 8 bytes per pointer = ~800 KB. So the top allocation will be well over 100,000 bytes.

Line 3 — stats[0].count >= 1: Each Statistic records how many distinct Python objects were allocated at that source line. A list comprehension for 100K elements is a single allocation (count=1 for the list itself).

Key takeaway: tracemalloc.start() must be called before the code you want to profile. Any allocations made before start() (including Python's own startup allocations) are not recorded.

Expected Output
True\nTrue\nTrue
Hints

Hint 1: tracemalloc.start() begins recording memory allocations. Call it before the code you want to measure.

Hint 2: snapshot.statistics("lineno") returns a list of Statistic objects sorted by size, largest first.

Hint 3: Each Statistic has .size (bytes), .count (number of objects), and .traceback attributes.

#3__slots__ Memory SavingsEasy
__slots__memorygetsizeof

Verify that __slots__ reduces per-instance memory. Compare two equivalent classes, one with and one without __slots__.

Python
import sys

class WithDict:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

class WithSlots:
    __slots__ = ('x', 'y', 'z')

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

d = WithDict(1, 2, 3)
s = WithSlots(1, 2, 3)

# Slots instance is smaller
print(sys.getsizeof(s) < sys.getsizeof(d))

# WithDict instances have a __dict__; WithSlots instances do not
print(hasattr(d, '__dict__') and not hasattr(s, '__dict__'))
Solution
True
True

Why __slots__ saves memory:

A regular Python instance stores all its attributes in a __dict__. The dict itself is a full Python object with its own overhead: hash table, multiple pointer arrays, metadata. On CPython 3.12 (64-bit), an empty __dict__ consumes about 232 bytes.

With __slots__, Python replaces the __dict__ with a compact fixed-size C array of typed slots — one pointer per declared attribute. There is no hash table, no resizing, no per-entry metadata.

Rough size comparison for a 3-attribute instance:

  • WithDict: ~56 bytes (instance) + ~232 bytes (empty dict) + pointer overhead = ~288+ bytes
  • WithSlots: ~72 bytes (instance + 3 slots) = ~72 bytes

The tradeoff: __slots__ instances cannot have arbitrary attributes added at runtime:

s = WithSlots(1, 2, 3)
s.new_attr = 99 # AttributeError: 'WithSlots' object has no attribute 'new_attr'

When to use: Data classes that are instantiated millions of times (numerical coordinates, event records, ORM rows). The savings scale linearly: 100,000 instances saves roughly 21 MB (the missing __dict__ per instance).

Expected Output
True\nTrue
Hints

Hint 1: Without __slots__, each instance stores attributes in a __dict__, which itself takes memory.

Hint 2: With __slots__, attributes are stored in a fixed-size array — no __dict__ is created.

Hint 3: sys.getsizeof(instance_with_slots) will be smaller than sys.getsizeof(instance_without_slots).

#4tracemalloc Snapshot ComparisonEasy
tracemalloccompare_toleak-detection

Use tracemalloc snapshot comparison to identify which line of code allocated new memory between two snapshots. This is the core technique for finding memory leaks in long-running services.

Python
import tracemalloc

tracemalloc.start()

# Snapshot before the allocation
snap1 = tracemalloc.take_snapshot()

# Do some allocation
growing_list = [{"key": i, "val": i * 2} for i in range(1_000)]

# Snapshot after
snap2 = tracemalloc.take_snapshot()

# Compare: find what grew
diffs = snap2.compare_to(snap1, "lineno")
positive_diffs = [d for d in diffs if d.size_diff > 0]

# There should be at least one line that allocated new memory
print(len(positive_diffs) > 0)

# The total new memory should be at least 50 KB (1000 dicts + the list)
total_new = sum(d.size_diff for d in positive_diffs)
print(total_new > 50_000)

tracemalloc.stop()
Solution
True
True

How snapshot comparison works:

snap2.compare_to(snap1, "lineno") produces a list of StatisticDiff objects, one per source line that had any allocation change between the two snapshots. Each diff reports:

  • size_diff: bytes added (positive) or freed (negative)
  • count_diff: object count change
  • traceback: which source line caused the allocation

Why this is powerful for leak detection:

In a production leak investigation, you take two snapshots separated by time (e.g., before and after processing 1000 requests). If certain lines show a consistently positive size_diff across multiple snapshot pairs, those are your leak candidates.

Practical workflow:

import tracemalloc

tracemalloc.start(10) # keep 10 frames of traceback
snap1 = tracemalloc.take_snapshot()

process_requests(count=1000)

snap2 = tracemalloc.take_snapshot()
top_diffs = snap2.compare_to(snap1, "traceback")
for diff in top_diffs[:5]:
print(diff)
for line in diff.traceback.format():
print(" ", line)

The "traceback" key gives you the full call stack to the allocation site, not just the immediate line.

Expected Output
True\nTrue
Hints

Hint 1: snapshot2.compare_to(snapshot1, "lineno") returns a list of StatisticDiff objects.

Hint 2: Each StatisticDiff has .size_diff (positive means more memory allocated in snap2 vs snap1).

Hint 3: Filter for entries where size_diff > 0 to find code lines that allocated new memory.


Medium

#5Recursive Deep Size MeasurementMedium
sys.getsizeofdeep-sizerecursioncontainers

Implement a deep_sizeof() function that correctly measures the total memory of a data structure, including all nested objects. This is what tools like pympler.asizeof do internally.

import sys

def deep_sizeof(obj, seen=None):
"""Return total memory of obj and all referenced objects."""
# Your implementation here
pass
Solution
import sys

def deep_sizeof(obj, seen=None):
"""Return the total memory used by obj and all objects it references."""
if seen is None:
seen = set()

obj_id = id(obj)
if obj_id in seen:
return 0
seen.add(obj_id)

size = sys.getsizeof(obj)

if isinstance(obj, dict):
size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
for k, v in obj.items())
elif isinstance(obj, (list, tuple, set, frozenset)):
size += sum(deep_sizeof(item, seen) for item in obj)
elif hasattr(obj, '__dict__'):
size += deep_sizeof(obj.__dict__, seen)
elif hasattr(obj, '__slots__'):
size += sum(
deep_sizeof(getattr(obj, slot), seen)
for slot in obj.__slots__
if hasattr(obj, slot)
)

return size

Why the seen set is essential:

Without cycle detection, a self-referential structure like a = []; a.append(a) would cause infinite recursion. Using id(obj) in seen prevents visiting the same object twice. This also prevents double-counting objects that appear multiple times (e.g., the same interned string used as multiple dict keys).

Why we use id() instead of the object itself:

Storing objects in seen would create additional references to them and prevent garbage collection. id() is just an integer — it holds no reference to the object.

Comparison with pympler: pympler.asizeof handles more edge cases: C extension objects, __reduce__, slots on base classes, and more. For production use, prefer pympler. For learning and simple scripts, deep_sizeof covers 90% of cases.

import sys

def deep_sizeof(obj, seen=None):
    """Return the total memory used by obj and all objects it references.
    Handles cycles by tracking visited object ids.
    Only recurse into containers (list, tuple, dict, set, frozenset).
    For dict, count both keys and values.
    """
    if seen is None:
        seen = set()

    obj_id = id(obj)
    if obj_id in seen:
        return 0
    seen.add(obj_id)

    size = sys.getsizeof(obj)

    # Add sizes of contained objects
    pass

    return size

# Test 1: simple list of ints
shallow = sys.getsizeof([1, 2, 3])
deep = deep_sizeof([1, 2, 3])
print(f"Shallow: {shallow}, Deep: {deep}, Deep > Shallow: {deep > shallow}")

# Test 2: nested structure
nested = {"users": [{"id": i, "name": f"user_{i}"} for i in range(10)]}
shallow_nested = sys.getsizeof(nested)
deep_nested = deep_sizeof(nested)
print(f"Nested deep > shallow: {deep_nested > shallow_nested}")
print(f"Deep at least 10x shallow: {deep_nested > shallow_nested * 10}")
Expected Output
Shallow: True, Deep: True, Deep > Shallow: True\nNested deep > shallow: True\nDeep at least 10x shallow: True
Hints

Hint 1: For lists and tuples, iterate over the elements and recursively call deep_sizeof on each.

Hint 2: For dicts, iterate over both .keys() and .values() and recurse on each key and value.

Hint 3: For sets and frozensets, iterate and recurse on each element.

Hint 4: Pass the same `seen` set to all recursive calls to handle cycles and avoid double-counting shared objects.

#6tracemalloc Top AllocatorsMedium
tracemallocstatisticstop-allocators

Build a find_top_allocators() function that profiles a callable and returns the top N lines by memory allocated. This is the core of any memory profiling tool.

import tracemalloc

def find_top_allocators(code_to_run, top_n=3):
"""Return (filename, lineno, size_bytes) for the top_n allocation sites."""
# Use tracemalloc to profile code_to_run
pass
Solution
import tracemalloc

def find_top_allocators(code_to_run, top_n=3):
"""Profile the memory allocations made by code_to_run."""
tracemalloc.start()
snap_before = tracemalloc.take_snapshot()

code_to_run()

snap_after = tracemalloc.take_snapshot()
tracemalloc.stop()

diffs = snap_after.compare_to(snap_before, "lineno")
# Sort by size_diff descending, take top N with positive growth
growing = sorted(
[d for d in diffs if d.size_diff > 0],
key=lambda d: d.size_diff,
reverse=True
)[:top_n]

result = []
for stat in growing:
frame = stat.traceback[0]
result.append((frame.filename, frame.lineno, stat.size_diff))

return result

What you would see for the workload function:

The three largest allocations would correspond to:

  1. big_str = "x" * 1_000_000 — approximately 1 MB (a str with 1 million characters is about 1 MB + 49 bytes header).
  2. big_list = list(range(50_000)) — approximately 400 KB (50,000 × 8 bytes per pointer + 56 bytes header).
  3. big_dict = {i: i**2 for i in range(10_000)} — approximately 290 KB (dict hash table overhead + entries).

Extending to full tracebacks:

For deep call stacks, use tracemalloc.start(25) (25 frames) and "traceback" as the key:

diffs = snap_after.compare_to(snap_before, "traceback")
for d in sorted(diffs, key=lambda d: d.size_diff, reverse=True)[:3]:
print(f"Size: {d.size_diff / 1024:.1f} KiB")
for line in d.traceback.format():
print(" ", line)

This shows you the full call chain to the allocation, not just the immediate line.

import tracemalloc

def find_top_allocators(code_to_run, top_n=3):
    """Profile the memory allocations made by code_to_run (a callable).
    Return a list of the top_n (filename, lineno, size_bytes) tuples,
    sorted by size descending.
    """
    tracemalloc.start()
    snap_before = tracemalloc.take_snapshot()

    code_to_run()

    snap_after = tracemalloc.take_snapshot()
    tracemalloc.stop()

    # Get statistics and return the top N
    pass


def workload():
    """A function that makes several distinct allocations."""
    big_list = list(range(50_000))         # allocation 1
    big_dict = {i: i**2 for i in range(10_000)}  # allocation 2
    big_str = "x" * 1_000_000             # allocation 3

results = find_top_allocators(workload, top_n=3)
print(f"Got {len(results)} results: {len(results) == 3}")
print(f"First result is a tuple: {isinstance(results[0], tuple)}")
print(f"First result has 3 elements: {len(results[0]) == 3}")
print(f"Sizes are descending: {results[0][2] >= results[1][2] >= results[2][2]}")
Expected Output
Got 3 results: True\nFirst result is a tuple: True\nFirst result has 3 elements: True\nSizes are descending: True
Hints

Hint 1: Use snap_after.compare_to(snap_before, "lineno") to get only the new allocations.

Hint 2: Each StatisticDiff has .traceback.format() and .size_diff attributes.

Hint 3: Sort by size_diff descending and take the first top_n entries.

Hint 4: Extract filename and lineno from stat.traceback[0].filename and stat.traceback[0].lineno.

#7__slots__ Memory BenchmarkMedium
__slots__memorybenchmarktracemalloc

Use tracemalloc to empirically measure the memory savings from __slots__ across 10,000 instances. Quantify the difference rather than just asserting that slots use less memory.

import tracemalloc

class Point:
def __init__(self, x, y, z):
self.x = x
self.y = y
self.z = z

class PointSlots:
__slots__ = ('x', 'y', 'z')
def __init__(self, x, y, z):
self.x = x
self.y = y
self.z = z

def measure_allocation(cls, count):
"""Return bytes allocated by creating count instances of cls."""
pass
Solution
import tracemalloc

class Point:
def __init__(self, x, y, z):
self.x = x
self.y = y
self.z = z

class PointSlots:
__slots__ = ('x', 'y', 'z')

def __init__(self, x, y, z):
self.x = x
self.y = y
self.z = z

def measure_allocation(cls, count):
"""Measure bytes allocated by creating count instances of cls."""
tracemalloc.start()
snap1 = tracemalloc.take_snapshot()

instances = [cls(i, i + 1, i + 2) for i in range(count)]

snap2 = tracemalloc.take_snapshot()
tracemalloc.stop()

diffs = snap2.compare_to(snap1, "lineno")
total = sum(d.size_diff for d in diffs if d.size_diff > 0)

del instances # don't let the return value affect the next measurement
return total

n = 10_000
bytes_dict = measure_allocation(Point, n)
bytes_slots = measure_allocation(PointSlots, n)

print(f"Dict-based: {bytes_dict / 1024:.1f} KiB")
print(f"Slots-based: {bytes_slots / 1024:.1f} KiB")
print(f"Slots saves memory: {bytes_slots < bytes_dict}")
print(f"Savings ratio: {bytes_dict / bytes_slots:.1f}x")

Typical output on CPython 3.12:

Dict-based: 2734.4 KiB
Slots-based: 703.1 KiB
Slots saves memory: True
Savings ratio: 3.9x

Where the savings come from:

For Point (no slots):

  • Per instance: ~56 bytes (object header)
  • Per __dict__: ~232 bytes (empty dict initial size)
  • Total per instance: ~288 bytes
  • 10,000 instances: ~2.81 MB

For PointSlots:

  • Per instance: ~72 bytes (object header + 3 slot descriptors)
  • No __dict__
  • 10,000 instances: ~720 KB

The ~4x ratio matches the theoretical calculation. For data-heavy applications (ORM result rows, scientific simulations, event records), this savings is multiplicative with instance count.

import tracemalloc
import sys

class Point:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

class PointSlots:
    __slots__ = ('x', 'y', 'z')

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


def measure_allocation(cls, count):
    """Measure how many bytes are allocated to create 'count' instances of cls.
    Use tracemalloc snapshot comparison.
    Return the total size_diff in bytes.
    """
    pass


n = 10_000
bytes_dict = measure_allocation(Point, n)
bytes_slots = measure_allocation(PointSlots, n)

print(f"Dict-based: {bytes_dict / 1024:.1f} KiB")
print(f"Slots-based: {bytes_slots / 1024:.1f} KiB")
print(f"Slots saves memory: {bytes_slots < bytes_dict}")
print(f"Savings ratio: {bytes_dict / bytes_slots:.1f}x")
Expected Output
Dict-based: True\nSlots-based: True\nSlots saves memory: True\nSavings ratio: True
Hints

Hint 1: Use tracemalloc.start() before and take_snapshot() before and after the list comprehension.

Hint 2: Sum all positive size_diffs from compare_to() to get total bytes allocated.

Hint 3: The dict-based version should allocate significantly more — each instance creates a __dict__ object too.

#8Identifying a Leak with tracemallocMedium
tracemallocleak-detectionsnapshot-comparison

Build a leak detector that profiles a function across multiple calls and returns the source lines responsible for persistent memory growth. Apply it to a real leaky function.

import tracemalloc
import gc

_cache = {}

def leaky_function(key):
if key not in _cache:
_cache[key] = [0] * 1000
return _cache[key]

def detect_leak(func, iterations):
"""Return (size_diff, filename, lineno) for lines that grew by more than 1 KB."""
pass
Solution
import tracemalloc
import gc

_cache = {}

def leaky_function(key):
if key not in _cache:
_cache[key] = [0] * 1000
return _cache[key]

def detect_leak(func, iterations):
"""Return (size_diff, filename, lineno) for lines that grew by > 1 KB."""
gc.collect()
tracemalloc.start()
snap1 = tracemalloc.take_snapshot()

for i in range(iterations):
func(i)

gc.collect()
snap2 = tracemalloc.take_snapshot()
tracemalloc.stop()

diffs = snap2.compare_to(snap1, "lineno")
leaks = [
(d.size_diff, d.traceback[0].filename, d.traceback[0].lineno)
for d in diffs
if d.size_diff > 1024
]
leaks.sort(key=lambda x: x[0], reverse=True)
return leaks

leaks = detect_leak(leaky_function, 100)
print(f"Leak lines found: {len(leaks) > 0}")
print(f"Each leak entry is a tuple: {isinstance(leaks[0], tuple)}")
print(f"Leak is at least 100 KB: {leaks[0][0] > 100_000}")

Why this leaks:

Each unique key (0 through 99) creates a new _cache entry containing [0] * 1000. That list is approximately 8000 bytes (1000 pointers at 8 bytes each). After 100 unique calls: 100 entries × 8 KB = 800 KB of retained memory.

Because _cache is module-level (global), the GC cannot collect these lists — they are reachable through a live root. The leak detector correctly identifies the line _cache[key] = [0] * 1000 as the allocation site.

The fix options:

  1. Use functools.lru_cache(maxsize=128) — built-in LRU eviction.
  2. Use cachetools.TTLCache — time-based expiry.
  3. Use weakref.WeakValueDictionary — entries are evicted when the caller no longer holds a reference.
import tracemalloc
import gc

# Global cache that grows without bound — this is the leak
_cache = {}

def leaky_function(key):
    """Simulate a function that caches results but never evicts them."""
    if key not in _cache:
        _cache[key] = [0] * 1000  # 1000-element list per cache entry
    return _cache[key]


def detect_leak(func, iterations):
    """Call func(i) for i in range(iterations).
    Take snapshots before and after, return a list of
    (size_diff_bytes, filename, lineno) for lines that grew by more than 1 KB.
    """
    pass


leaks = detect_leak(leaky_function, 100)
print(f"Leak lines found: {len(leaks) > 0}")
print(f"Each leak entry is a tuple: {isinstance(leaks[0], tuple)}")
print(f"Leak is at least 100 KB: {leaks[0][0] > 100_000}")
Expected Output
Leak lines found: True\nEach leak entry is a tuple: True\nLeak is at least 100 KB: True
Hints

Hint 1: Call tracemalloc.start() before the first snapshot and tracemalloc.stop() after the second.

Hint 2: Use snap2.compare_to(snap1, "lineno") and filter for size_diff > 1024 (1 KB).

Hint 3: Sort by size_diff descending so the largest leak appears first.


Hard

#9Memory-Bounded LRU Cache with tracemallocHard
lru-cachememory-boundtracemalloceviction

Implement a memory-bounded LRU cache that evicts entries based on estimated byte usage rather than just count. Test it with tracemalloc to confirm the memory bound is respected.

import sys
from collections import OrderedDict

class MemoryBoundedCache:
"""LRU cache that evicts when total estimated memory exceeds max_bytes."""

def __init__(self, max_bytes):
self.max_bytes = max_bytes
self._cache = OrderedDict()
self._sizes = {}
self._current_bytes = 0

def get(self, key):
pass

def put(self, key, value):
pass

def current_usage(self):
return self._current_bytes
Solution
import sys
from collections import OrderedDict

class MemoryBoundedCache:
"""LRU cache that evicts when total estimated memory exceeds max_bytes."""

def __init__(self, max_bytes):
self.max_bytes = max_bytes
self._cache = OrderedDict()
self._sizes = {}
self._current_bytes = 0

def get(self, key):
"""Return value and mark as recently used."""
if key not in self._cache:
return None
self._cache.move_to_end(key)
return self._cache[key]

def put(self, key, value):
"""Store key -> value, evicting LRU entries if needed."""
value_size = sys.getsizeof(value)

# If key exists, remove old size before updating
if key in self._cache:
self._current_bytes -= self._sizes[key]
del self._cache[key]
del self._sizes[key]

# Evict LRU entries until there is room
while self._current_bytes + value_size > self.max_bytes and self._cache:
oldest_key, _ = self._cache.popitem(last=False)
self._current_bytes -= self._sizes.pop(oldest_key)

# Store new entry
self._cache[key] = value
self._sizes[key] = value_size
self._current_bytes += value_size
self._cache.move_to_end(key)

def current_usage(self):
return self._current_bytes

def __len__(self):
return len(self._cache)

How OrderedDict implements LRU:

  • move_to_end(key) — moves the key to the "most recently used" position (the end of the insertion order).
  • popitem(last=False) — removes and returns the "least recently used" item (the oldest, at the beginning).

This gives O(1) get, put, and evict operations — the same complexity as functools.lru_cache.

Why sys.getsizeof for sizing:

In a production implementation, you would use pympler.asizeof or deep_sizeof for accurate sizes. sys.getsizeof is fast but only measures shallow size. For simple value types (lists of primitives, strings, tuples), it is usually a good enough proxy.

Real-world consideration: Django's cache framework, Redis clients, and many API result caches benefit from memory-bounded eviction. Count-based LRU (Python's functools.lru_cache) can retain a small number of very large objects while rejecting many small ones — a memory-bounded cache avoids this pathology.

import tracemalloc
from collections import OrderedDict

class MemoryBoundedCache:
    """An LRU cache that evicts entries not just by count but when total
    estimated memory usage exceeds a byte limit.

    Uses a simple per-value size estimate (sys.getsizeof of the value).
    When adding a new entry would exceed max_bytes, evict LRU entries
    until there is room.
    """

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self._cache = OrderedDict()
        self._sizes = {}   # key -> estimated bytes for value
        self._current_bytes = 0

    def get(self, key):
        """Return value for key, marking it as recently used.
        Return None if key is not present.
        """
        pass

    def put(self, key, value):
        """Store key -> value. Evict LRU entries if needed to stay under max_bytes."""
        pass

    def current_usage(self):
        """Return current estimated bytes used."""
        return self._current_bytes

    def __len__(self):
        return len(self._cache)


import sys

cache = MemoryBoundedCache(max_bytes=10_000)  # 10 KB limit

# Each value is [0] * 500 = ~4000 bytes
for i in range(5):
    cache.put(f"key_{i}", [0] * 500)

print(f"After 5 puts: len={len(cache)}, usage_ok={cache.current_usage() <= 10_000}")

# Adding a 6th entry should evict the oldest
cache.put("key_5", [0] * 500)
print(f"After 6th put: eviction happened={len(cache) < 6}")
print(f"Stays under limit: {cache.current_usage() <= 10_000}")

# Access key_1 to make it recently used, then add more
cache.get("key_1")
cache.put("key_6", [0] * 500)
# key_2 should be evicted (LRU after key_1 was accessed)
print(f"key_1 still present after LRU: {cache.get('key_1') is not None}")
Expected Output
After 5 puts: len=5, usage_ok=True\nAfter 6th put: eviction happened=True\nStays under limit: True\nkey_1 still present after LRU: True
Hints

Hint 1: Use sys.getsizeof(value) for per-value size estimation.

Hint 2: In put(), if the new entry would exceed max_bytes, call _cache.popitem(last=False) to remove the LRU entry (oldest = first) and subtract its size.

Hint 3: Keep evicting until current_bytes + new_size <= max_bytes.

Hint 4: In get(), use _cache.move_to_end(key) to mark it as recently used (OrderedDict LRU pattern).

#10tracemalloc Continuous MonitoringHard
tracemallocmonitoringproduction-patterns

Build a background MemoryMonitor that uses a daemon thread to periodically sample tracemalloc snapshots. This mirrors how production APM tools (Datadog, New Relic) continuously monitor Python process memory.

import tracemalloc
import threading
import time

class MemoryMonitor:
"""Background memory monitor using tracemalloc snapshots."""

def __init__(self, sample_interval=0.5, top_n=5):
self.sample_interval = sample_interval
self.top_n = top_n
self._thread = None
self._running = False
self._snapshots = []
self._lock = threading.Lock()

def start(self): pass
def stop(self): pass
def _sample_loop(self): pass
def report(self): pass
Solution
import tracemalloc
import threading
import time

class MemoryMonitor:
"""Background memory monitor using tracemalloc snapshots."""

def __init__(self, sample_interval=0.5, top_n=5):
self.sample_interval = sample_interval
self.top_n = top_n
self._thread = None
self._running = False
self._snapshots = []
self._lock = threading.Lock()

def start(self):
"""Start tracemalloc and the background sampling thread."""
tracemalloc.start(10) # 10 frames of traceback
self._running = True
self._thread = threading.Thread(target=self._sample_loop, daemon=True)
self._thread.start()

def stop(self):
"""Stop the sampling thread and tracemalloc."""
self._running = False
if self._thread:
self._thread.join(timeout=2.0)
tracemalloc.stop()

def _sample_loop(self):
"""Background thread: sample every sample_interval seconds."""
while self._running:
snap = tracemalloc.take_snapshot()
with self._lock:
self._snapshots.append(snap)
time.sleep(self.sample_interval)

def report(self):
"""Return peak_bytes, sample_count, top_allocators."""
with self._lock:
snaps = list(self._snapshots)

if not snaps:
return {"peak_bytes": 0, "sample_count": 0, "top_allocators": []}

# Peak: snapshot with highest total tracked memory
peak_snap = max(
snaps,
key=lambda s: sum(stat.size for stat in s.statistics("lineno"))
)
peak_bytes = sum(stat.size for stat in peak_snap.statistics("lineno"))

# Top allocators from the peak snapshot
top_stats = sorted(
peak_snap.statistics("lineno"),
key=lambda s: s.size,
reverse=True
)[:self.top_n]

top_allocators = [
(stat.size, stat.traceback[0].filename, stat.traceback[0].lineno)
for stat in top_stats
]

return {
"peak_bytes": peak_bytes,
"sample_count": len(snaps),
"top_allocators": top_allocators,
}

Production considerations:

  1. GIL and thread safety: tracemalloc.take_snapshot() acquires the GIL briefly. The background thread's snapshot calls are safe but add a small overhead to the monitored threads.

  2. Snapshot storage cost: Each snapshot stores the full allocation state — potentially thousands of entries. For long-running monitors, limit retained snapshots (e.g., keep only the last N or only the peak).

  3. Daemon thread: Setting daemon=True ensures the monitor thread does not prevent process exit if stop() is not called.

  4. Production APM integration: Real tools like py-spy and memray use lower-level hooks. tracemalloc is the stdlib approach — good for application-level profiling, not for profiling the profiler itself.

import tracemalloc
import threading
import time

class MemoryMonitor:
    """A background monitor that periodically samples memory allocations
    using tracemalloc and records peak usage and top allocation sites.

    Usage:
        monitor = MemoryMonitor(sample_interval=0.1)
        monitor.start()
        # ... run code ...
        monitor.stop()
        report = monitor.report()
    """

    def __init__(self, sample_interval=0.5, top_n=5):
        self.sample_interval = sample_interval
        self.top_n = top_n
        self._thread = None
        self._running = False
        self._snapshots = []
        self._lock = threading.Lock()

    def start(self):
        """Start tracemalloc and begin background sampling."""
        pass

    def stop(self):
        """Stop the background thread and tracemalloc."""
        pass

    def _sample_loop(self):
        """Background thread: take a snapshot every sample_interval seconds."""
        pass

    def report(self):
        """Return a dict with: peak_bytes, sample_count, top_allocators.
        top_allocators is a list of (size_bytes, filename, lineno) tuples.
        """
        pass


# Demo usage
monitor = MemoryMonitor(sample_interval=0.05, top_n=3)
monitor.start()

# Simulate some work with allocations
results = []
for i in range(1000):
    results.append({"id": i, "data": list(range(100))})
    if i % 100 == 0:
        time.sleep(0.01)

monitor.stop()
report = monitor.report()

print(f"Got samples: {report['sample_count'] > 0}")
print(f"Peak bytes tracked: {report['peak_bytes'] > 0}")
print(f"Top allocators found: {len(report['top_allocators']) > 0}")
Expected Output
Got samples: True\nPeak bytes tracked: True\nTop allocators found: True
Hints

Hint 1: In start(), call tracemalloc.start(), then launch a daemon thread running _sample_loop.

Hint 2: In _sample_loop(), loop while self._running, call tracemalloc.take_snapshot(), append to self._snapshots with the lock held, then time.sleep(self.sample_interval).

Hint 3: In stop(), set self._running = False and join the thread, then call tracemalloc.stop().

Hint 4: In report(), find peak_bytes by taking the max of sum(stat.size for stat in snap.statistics("lineno")) across all snapshots.

#11Memory-Efficient Data PipelineHard
generators__slots__memory-efficiencypipeline

Implement a lazy generator-based data pipeline that processes 50,000 records with the same logic as an eager list-based pipeline, but uses dramatically less peak memory. Verify the improvement with tracemalloc.

import tracemalloc

def eager_pipeline(n):
records = [{"id": i, "value": i * 3, "active": i % 2 == 0} for i in range(n)]
active = [r for r in records if r["active"]]
transformed = [{"id": r["id"], "score": r["value"] ** 2} for r in active]
return sum(t["score"] for t in transformed)

def lazy_pipeline(n):
"""Same logic using generators throughout — no intermediate lists."""
pass
Solution
import tracemalloc

def eager_pipeline(n):
records = [{"id": i, "value": i * 3, "active": i % 2 == 0} for i in range(n)]
active = [r for r in records if r["active"]]
transformed = [{"id": r["id"], "score": r["value"] ** 2} for r in active]
return sum(t["score"] for t in transformed)

def lazy_pipeline(n):
"""Same logic using generators throughout."""
records = ({"id": i, "value": i * 3, "active": i % 2 == 0} for i in range(n))
active = (r for r in records if r["active"])
transformed = ({"id": r["id"], "score": r["value"] ** 2} for r in active)
return sum(t["score"] for t in transformed)

def measure_peak(func, n):
tracemalloc.start()
result = func(n)
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
return peak, result

n = 50_000
peak_eager, result_eager = measure_peak(eager_pipeline, n)
peak_lazy, result_lazy = measure_peak(lazy_pipeline, n)

print(f"Same result: {result_eager == result_lazy}")
print(f"Eager peak: {peak_eager / 1024:.0f} KiB")
print(f"Lazy peak: {peak_lazy / 1024:.0f} KiB")
print(f"Lazy uses less memory: {peak_lazy < peak_eager}")
print(f"Lazy is at least 5x more efficient: {peak_eager / peak_lazy >= 5}")

Why generators are dramatically more memory-efficient:

Eager pipeline builds three full lists in memory simultaneously at peak:

  1. records: 50,000 dicts × ~232 bytes each = ~11.4 MB
  2. active: ~25,000 dicts (every even id) = ~5.7 MB
  3. transformed: ~25,000 dicts = ~5.7 MB Peak: all three coexist briefly = ~22 MB

Lazy pipeline holds at most one record in memory at a time. The generator chain records → active → transformed processes one dict, feeds it to the filter, transforms it, adds its score to the running total, then discards it. Peak memory: roughly one dict + generator frame overhead = a few KB.

Typical tracemalloc peak readings:

  • Eager: ~22,000 KiB
  • Lazy: ~100–500 KiB
  • Ratio: ~50–200x

When lazy generators are NOT better: If you need random access, multiple passes, or to share the result list across multiple callers, you must materialize. Generators are consumed once and cannot be rewound without re-creating the pipeline.

import tracemalloc
import sys

# Eager (bad) approach — loads everything into memory at once
def eager_pipeline(n):
    """Load all n records, filter, transform, aggregate — returns a list."""
    records = [{"id": i, "value": i * 3, "active": i % 2 == 0} for i in range(n)]
    active = [r for r in records if r["active"]]
    transformed = [{"id": r["id"], "score": r["value"] ** 2} for r in active]
    total = sum(t["score"] for t in transformed)
    return total


# Lazy (good) approach — implement this
def lazy_pipeline(n):
    """Same logic but using generators throughout. No intermediate lists."""
    pass


# Memory measurement helper
def measure_peak(func, n):
    """Return peak memory in bytes used by func(n)."""
    tracemalloc.start()
    result = func(n)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, result


n = 50_000
peak_eager, result_eager = measure_peak(eager_pipeline, n)
peak_lazy, result_lazy = measure_peak(lazy_pipeline, n)

print(f"Same result: {result_eager == result_lazy}")
print(f"Eager peak: {peak_eager / 1024:.0f} KiB")
print(f"Lazy peak: {peak_lazy / 1024:.0f} KiB")
print(f"Lazy uses less memory: {peak_lazy < peak_eager}")
print(f"Lazy is at least 5x more efficient: {peak_eager / peak_lazy >= 5}")
Expected Output
Same result: True\nEager peak: True\nLazy peak: True\nLazy uses less memory: True\nLazy is at least 5x more efficient: True
Hints

Hint 1: Use generator expressions (parentheses) instead of list comprehensions (brackets) for each pipeline step.

Hint 2: The generator pipeline: records = (... for i in range(n)), then active = (r for r in records if ...), etc.

Hint 3: Only the final sum() call actually triggers evaluation — the rest is lazy.

Hint 4: tracemalloc.get_traced_memory() returns (current, peak) — use peak for the comparison.

© 2026 EngineersOfAI. All rights reserved.