Python Memory Optimization Practice Problems & Exercises

Practice: Memory Optimization

11 problems · 3 Easy · 4 Medium · 4 Hard · 60–90 min
Easy

#1 __slots__ Memory Saving (Easy)
Tags: __slots__, instance-memory, optimization

Compare memory usage of a regular class vs a __slots__ class for a Point with x, y, z coordinates.

import sys
import tracemalloc

class RegularPoint:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

class SlottedPoint:
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

rp = RegularPoint(1.0, 2.0, 3.0)
sp = SlottedPoint(1.0, 2.0, 3.0)

# sys.getsizeof() is shallow: it does not count the per-instance __dict__,
# so add the dict's size for the regular class.
regular_size = sys.getsizeof(rp) + sys.getsizeof(rp.__dict__)
slotted_size = sys.getsizeof(sp)

print(f"Regular: {regular_size} bytes")
print(f"Slotted: {slotted_size} bytes")
print(f"Savings per instance: {regular_size - slotted_size} bytes")

# Verify __slots__ prevents dynamic attributes
try:
    sp.w = 4.0
    print("ERROR: should not allow new attribute")
except AttributeError:
    print("__slots__ correctly prevents dynamic attributes")

print("__slots__ class uses less memory per instance")
Solution
import sys

class RegularPoint:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

class SlottedPoint:
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

rp = RegularPoint(1.0, 2.0, 3.0)
sp = SlottedPoint(1.0, 2.0, 3.0)

# sys.getsizeof() is shallow: it does not count the per-instance __dict__,
# so add the dict's size for the regular class.
regular_size = sys.getsizeof(rp) + sys.getsizeof(rp.__dict__)
slotted_size = sys.getsizeof(sp)

print(f"Regular: {regular_size} bytes")
print(f"Slotted: {slotted_size} bytes")
print(f"Savings per instance: {regular_size - slotted_size} bytes")

try:
    sp.w = 4.0
except AttributeError:
    print("__slots__ correctly prevents dynamic attributes")

print("__slots__ class uses less memory per instance")

__slots__ mechanics:

  • Regular class: each instance has a __dict__ (the attribute namespace dict), a __weakref__ slot, and the object header. Together these typically cost 100-250 bytes per instance, depending on the CPython version. Note that sys.getsizeof(obj) is shallow and does not include the __dict__.
  • __slots__ class: replaces __dict__ with fixed-size descriptors. ~48-56 bytes per instance.
  • Savings are typically 100-200 bytes per instance; for 1 million instances that is 100-200 MB saved.
  • Limitation: cannot add new attributes at runtime; __dict__ is gone.
  • __weakref__ is also removed unless explicitly added to __slots__.
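
As a rough cross-check of the points above, the following sketch measures retained memory for many instances with tracemalloc rather than relying on per-object sizes, and also shows the missing __weakref__ support. The helper names traced_bytes and supports_weakref are illustrative assumptions, not part of the exercise, and the exact numbers vary by CPython version.

```python
import tracemalloc
import weakref


class RegularPoint:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


class SlottedPoint:
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


def traced_bytes(factory, count=50_000):
    """Bytes still allocated after building `count` instances (hypothetical helper)."""
    tracemalloc.start()
    instances = [factory(1.0, 2.0, 3.0) for _ in range(count)]
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current


def supports_weakref(obj):
    """True if a weak reference to `obj` can be created (hypothetical helper)."""
    try:
        weakref.ref(obj)
        return True
    except TypeError:
        return False


regular_bytes = traced_bytes(RegularPoint)
slotted_bytes = traced_bytes(SlottedPoint)

# Per-instance figures include the 8-byte list slot that holds each object.
print(f"Regular: {regular_bytes / 50_000:.0f} bytes per instance")
print(f"Slotted: {slotted_bytes / 50_000:.0f} bytes per instance")
print(f"Regular supports weakref: {supports_weakref(RegularPoint(0, 0, 0))}")
print(f"Slotted supports weakref: {supports_weakref(SlottedPoint(0, 0, 0))}")
```

Measuring the whole collection sidesteps the shallow-size pitfall of sys.getsizeof(), at the cost of including the list's own pointer array in both totals.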
Expected Output
__slots__ class uses less memory per instance
Hints

Hint 1: __slots__ removes the per-instance __dict__, typically saving 100-200 bytes per instance.

Hint 2: Use sys.getsizeof() to compare instance sizes. It is shallow, so add sys.getsizeof(obj.__dict__) for the regular class.

#2 Generator vs List Memory (Easy)
Tags: generator, list, memory, lazy-evaluation

Measure the peak memory difference between a list comprehension and a generator expression for large data.

import tracemalloc

def sum_squares_list(n):
    squares = [i * i for i in range(n)]
    return sum(squares)

def sum_squares_gen(n):
    squares = (i * i for i in range(n))
    return sum(squares)

n = 500_000

tracemalloc.start()
sum_squares_list(n)
_, peak_list = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
sum_squares_gen(n)
_, peak_gen = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"List peak: {peak_list / 1e6:.1f} MB")
print(f"Generator peak: {peak_gen / 1024:.0f} KB")
print(f"Ratio: {peak_list / peak_gen:.0f}x")
print("Generator uses dramatically less memory than list")
Solution
import tracemalloc

def sum_squares_list(n):
    return sum([i * i for i in range(n)])

def sum_squares_gen(n):
    return sum(i * i for i in range(n))

n = 500_000

tracemalloc.start()
sum_squares_list(n)
_, peak_list = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
sum_squares_gen(n)
_, peak_gen = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"List peak: {peak_list / 1e6:.1f} MB")
print(f"Generator peak: {peak_gen / 1024:.0f} KB")
print(f"Ratio: {peak_list / peak_gen:.0f}x")
print("Generator uses dramatically less memory than list")

Generator memory model:

  • List [i*i for i in range(n)]: allocates n integer objects + n pointer slots in the list array. For n=500k that is about 4 MB of pointers plus 28-32 bytes per integer object, roughly 20 MB in total on 64-bit CPython.
  • Generator (i*i for i in range(n)): stores only the generator state (current i, code pointer). ~100-200 bytes regardless of n.
  • Use generators whenever you only need to iterate once and don't need random access.
  • If you need to iterate multiple times, materialize to a list the first time and reuse it.
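
The last two points can be illustrated with a small sketch. It assumes CPython, where the shallow size of a generator object does not depend on how many items it will eventually produce; the variable names are illustrative only.

```python
import sys

small = (i * i for i in range(10))
large = (i * i for i in range(10_000_000))

# Both generator objects have the same shallow size: nothing is precomputed.
same_size = sys.getsizeof(small) == sys.getsizeof(large)

# A generator can be consumed only once.
squares = (i * i for i in range(5))
first_pass = sum(squares)   # consumes 0, 1, 4, 9, 16
second_pass = sum(squares)  # generator is already exhausted

# Materialize once when several passes are needed.
cached = [i * i for i in range(5)]
totals = (sum(cached), max(cached))

print(same_size, first_pass, second_pass, totals)
```

The second pass silently yields nothing rather than raising an error, which is why materializing to a list is the safer choice whenever the data is read more than once.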
Expected Output
Generator uses dramatically less memory than list
Hints

Hint 1: A list materializes all values at once. A generator computes one at a time.

Hint 2: Use tracemalloc to measure peak memory for both.

#3 array Module vs List (Easy)
Tags: array, memory-efficient, typed-array, float-storage

Compare memory usage of array.array vs a list for storing 100,000 floats.

import array
import sys

n = 100_000

float_list = [float(i) for i in range(n)]
float_array = array.array("d", range(n)) # "d" = double (C double = 8 bytes)

list_size = sys.getsizeof(float_list) + sum(sys.getsizeof(x) for x in float_list[:1000]) * n // 1000
array_size = sys.getsizeof(float_array)

print(f"List approx size: {list_size / 1e6:.1f} MB")
print(f"array.array size: {array_size / 1e6:.1f} MB")
print(f"Ratio: {list_size / array_size:.1f}x")
print("array is more memory-efficient than list for numeric data")
Solution
import array
import sys

n = 100_000

float_list = [float(i) for i in range(n)]
float_array = array.array("d", range(n))

# List: pointer array + individual Python float objects
list_array_overhead = sys.getsizeof(float_list)
float_obj_size = sys.getsizeof(1.0) # ~24 bytes
total_list = list_array_overhead + float_obj_size * n

array_size = sys.getsizeof(float_array)

print(f"List approx: {total_list / 1e6:.1f} MB")
print(f"array.array: {array_size / 1e6:.1f} MB")
print(f"Ratio: {total_list / array_size:.1f}x")
print("array is more memory-efficient than list for numeric data")

array.array vs list:

  • Python list: an array of pointers to Python objects. Each float is a 24-byte heap-allocated object + 8-byte pointer = 32 bytes minimum.
  • array.array("d"): stores raw C doubles, 8 bytes each. No object overhead.
  • For 100,000 floats: list ~3.2MB, array ~0.8MB — roughly 4x more efficient.
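  • Type codes: "b" int8, "h" int16, "i" int32, "q" int64, "f" float32, "d" float64. Sizes are platform-dependent minimums ("l" is a C long, which is 4 bytes on Windows and 8 bytes on most 64-bit Unix systems); check array.itemsize to be sure.
  • For mathematical work on large numeric arrays, use NumPy (np.array): it stores data as compactly as array.array and adds fast vectorized operations.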
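
The per-element sizes above can be checked directly through array.itemsize. This is a minimal sketch assuming a mainstream platform where a C double is 8 bytes and a C float is 4 bytes; the variable names are illustrative.

```python
import array
import sys

n = 100_000
values = [float(i) for i in range(n)]

doubles = array.array("d", values)  # C double per element
singles = array.array("f", values)  # C float per element, lower precision

# Raw payload is simply element size times element count.
payload_d = doubles.itemsize * len(doubles)
payload_f = singles.itemsize * len(singles)

# Estimate for the list: pointer array plus one float object per element.
list_estimate = sys.getsizeof(values) + n * sys.getsizeof(1.0)

print(f"array('d') payload: {payload_d} bytes")
print(f"array('f') payload: {payload_f} bytes")
print(f"list estimate:      {list_estimate} bytes")
```

Switching from "d" to "f" halves the payload again, but only when single precision is acceptable for the data.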
Expected Output
array is more memory-efficient than list for numeric data
Hints

Hint 1: array.array stores primitive C types (not Python objects). Bytes per element = C type size.

Hint 2: A Python float in a list = 24-byte object + 8-byte pointer. In array.array("d") = 8 bytes.


Medium

#4 struct Module for Compact Binary Storage (Medium)
Tags: struct, binary-packing, memory, serialization

Use struct to pack a list of records into a compact binary buffer and compare sizes.

import struct
import sys

# Each record: id (4 bytes), x (4 bytes float), y (4 bytes float), flags (1 byte)
RECORD_FORMAT = "!IffB" # big-endian: uint32, float, float, uint8
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

def pack_records(records):
    buf = bytearray(len(records) * RECORD_SIZE)
    for i, r in enumerate(records):
        struct.pack_into(RECORD_FORMAT, buf, i * RECORD_SIZE,
                         r["id"], r["x"], r["y"], r["flags"])
    return bytes(buf)

def unpack_records(buf):
    n = len(buf) // RECORD_SIZE
    return [
        {"id": v[0], "x": v[1], "y": v[2], "flags": v[3]}
        for i in range(n)
        for v in [struct.unpack_from(RECORD_FORMAT, buf, i * RECORD_SIZE)]
    ]

records = [{"id": i, "x": float(i), "y": float(i * 2), "flags": i % 256} for i in range(1000)]
packed = pack_records(records)
unpacked = unpack_records(packed)

dict_size = sum(sys.getsizeof(r) + sum(sys.getsizeof(v) for v in r.values()) for r in records[:10]) * 100
struct_size = len(packed)

print(f"Dict list approx: {dict_size / 1024:.1f} KB")
print(f"struct buffer: {struct_size / 1024:.1f} KB")
print(f"Ratio: {dict_size / struct_size:.1f}x")
print("struct packs 100 records into much less space than dicts")
Solution
import struct
import sys

RECORD_FORMAT = "!IffB"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

def pack_records(records):
    buf = bytearray(len(records) * RECORD_SIZE)
    for i, r in enumerate(records):
        struct.pack_into(RECORD_FORMAT, buf, i * RECORD_SIZE,
                         r["id"], r["x"], r["y"], r["flags"])
    return bytes(buf)

def unpack_records(buf):
    n = len(buf) // RECORD_SIZE
    return [
        struct.unpack_from(RECORD_FORMAT, buf, i * RECORD_SIZE)
        for i in range(n)
    ]

records = [{"id": i, "x": float(i), "y": float(i * 2), "flags": i % 256} for i in range(1000)]
packed = pack_records(records)
unpacked = unpack_records(packed)

print(f"struct buffer: {len(packed)} bytes ({len(packed)/1024:.1f} KB)")
print(f"Records per byte: {len(records)/len(packed):.3f}")
print(f"Bytes per record: {len(packed)/len(records):.1f} (vs ~200+ for dict)")
print("struct packs 100 records into much less space than dicts")

struct module efficiency:

  • struct.pack_into avoids allocating a new bytes object per call — writes directly into a pre-allocated buffer.
  • "!IffB" = big-endian !, uint32 I (4 bytes), float f (4 bytes), float f (4 bytes), uint8 B (1 byte) = 13 bytes per record.
  • A Python dict with 4 string keys + 4 values typically uses around 300 bytes or more, depending on the CPython version. Struct: 13 bytes, which is more than 20x more compact.
  • Use case: network protocols, binary file formats, memory-mapped arrays, inter-process shared memory.
  • ctypes.Structure is similar but provides C-style struct access with attribute names.
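
One possible variation on the points above uses a precompiled struct.Struct object, which avoids re-parsing the format string on every call, together with iter_unpack for decoding. The tuple-based record layout and the variable names are assumptions made for this sketch.

```python
import struct

# Precompiled format: big-endian uint32, float32, float32, uint8 (13 bytes, no padding).
RECORD = struct.Struct("!IffB")

records = [(i, float(i), float(i * 2), i % 256) for i in range(1000)]

# Write every record into one pre-allocated buffer.
buf = bytearray(RECORD.size * len(records))
for index, record in enumerate(records):
    RECORD.pack_into(buf, index * RECORD.size, *record)

# iter_unpack walks the buffer in RECORD.size steps and yields tuples.
decoded = list(RECORD.iter_unpack(buf))

# Under native alignment ("@") the compiler-style padding depends on field order.
native_size = struct.calcsize("@BIff")
packed_size = struct.calcsize("!BIff")

print(f"{RECORD.size} bytes per record, {len(buf)} bytes total")
print(f"native layout: {native_size} bytes, packed layout: {packed_size} bytes")
```

The values here are small integers stored as floats, so they survive the float32 round trip exactly; arbitrary doubles would lose precision with the "f" code.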
Expected Output
struct packs 100 records into much less space than dicts
Hints

Hint 1: struct.pack(format, *values) packs values into a binary buffer. Format codes: B=uint8, H=uint16, I=uint32, f=float32.

Hint 2: struct.calcsize(format) tells you the byte size of a packed format.

#5 Memory-Efficient String Interning (Medium)
Tags: string-interning, sys.intern, memory, deduplication

Use sys.intern to deduplicate a large number of repeated strings and measure memory savings.

import sys
import tracemalloc

N = 50000
CATEGORIES = ["alpha", "beta", "gamma", "delta", "epsilon"]

# Without interning: each string is built at runtime, so equal values
# are separate objects
tracemalloc.start()
without_intern = [f"{CATEGORIES[i % 5]}_label" for i in range(N)]
_, peak_no_intern = tracemalloc.get_traced_memory()
tracemalloc.stop()

# With interning: all equal strings share one object
tracemalloc.start()
with_intern = [sys.intern(f"{CATEGORIES[i % 5]}_label") for i in range(N)]
_, peak_intern = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Verify identity sharing
a = sys.intern("hello_world_unique")
b = sys.intern("hello_world_unique")
print(f"Interned strings are same object: {a is b}")

print(f"Without intern peak: {peak_no_intern/1024:.1f} KB")
print(f"With intern peak: {peak_intern/1024:.1f} KB")
print("Interned strings share memory — deduplication verified")
Solution
import sys
import tracemalloc

N = 50000
CATEGORIES = ["alpha", "beta", "gamma", "delta", "epsilon"]

tracemalloc.start()
# Strings built at runtime are separate objects even when equal
without_intern = [f"{CATEGORIES[i % 5]}_label" for i in range(N)]
_, peak_no_intern = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
with_intern = [sys.intern(f"{CATEGORIES[i % 5]}_label") for i in range(N)]
_, peak_intern = tracemalloc.get_traced_memory()
tracemalloc.stop()

a = sys.intern("hello_world_unique")
b = sys.intern("hello_world_unique")
print(f"Interned strings are same object: {a is b}")
print(f"Without intern peak: {peak_no_intern/1024:.1f} KB")
print(f"With intern peak: {peak_intern/1024:.1f} KB")
print("Interned strings share memory — deduplication verified")

String interning:

  • sys.intern(s) stores s in a global table. Returns the same object for all equal interned strings.
  • CPython automatically interns identifiers and identifier-like string constants at compile time. This is an implementation detail, and strings built at runtime are not interned automatically.
  • For large collections of repeated strings (log levels, category names, column names), explicit interning saves significant memory.
  • a is b tests identity (same object in memory). Two equal non-interned strings have a == b but may not satisfy a is b.
  • Interned strings are not immortal in CPython: keep a reference to the value returned by sys.intern() to benefit from it. Avoid interning dynamic strings that rarely repeat, since each one only adds an intern-table entry for as long as it is alive.
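
The identity behavior described above can be seen with strings that are built at runtime. This is a minimal sketch for CPython; the sample strings and variable names are illustrative.

```python
import sys

parts = ["category", "alpha"]

# Two runtime-built strings: equal values, separate objects.
a = "_".join(parts)
b = "_".join(parts)
equal_values = a == b
distinct_objects = a is not b

# Interning maps both to one canonical object.
ia = sys.intern(a)
ib = sys.intern(b)
shared_object = ia is ib

# Deduplicating a larger collection: 2000 labels, only two distinct objects.
names = ["alpha", "beta"] * 1000
labels = [sys.intern("_".join(["category", name])) for name in names]
unique_objects = len({id(label) for label in labels})

print(equal_values, distinct_objects, shared_object, unique_objects)
```

The list keeps every interned string alive, which is what makes the deduplication pay off; a temporary interned string with no remaining references is simply discarded.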
Expected Output
Interned strings share memory — deduplication verified
Hints

Hint 1: sys.intern(s) returns a canonical version of the string — all interned equal strings are the same object.

Hint 2: Use "is" to verify identity (same object), "==" to verify equality.

#6 Lazy Loading with __getattr__ (Medium)
Tags: lazy-loading, __getattr__, memory, deferred-computation

Implement lazy attribute loading — expensive data is only loaded when first accessed.

import tracemalloc

class LazyDataModel:
    def __init__(self, model_id):
        self.model_id = model_id
        self._loaded = set()

    def __getattr__(self, name):
        if name.startswith("_") or name == "model_id":
            raise AttributeError(name)
        # Simulate expensive load
        if name == "embeddings":
            data = list(range(10000))  # 10k ints
        elif name == "metadata":
            data = {f"key_{i}": i for i in range(1000)}
        else:
            raise AttributeError(f"No attribute {name!r}")

        object.__setattr__(self, name, data)  # cache in instance dict
        self._loaded.add(name)
        return data

# Measure: creating many models without accessing their data
tracemalloc.start()
models = [LazyDataModel(i) for i in range(100)]
_, peak_init = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Peak after creating 100 models: {peak_init/1024:.1f} KB")

# Access embeddings on one model
tracemalloc.start()
_ = models[0].embeddings
_, peak_after = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Peak after loading one embedding: {peak_after/1024:.1f} KB")
print("Attributes loaded only when accessed, not on init")
Solution
import tracemalloc

class LazyDataModel:
    def __init__(self, model_id):
        self.model_id = model_id
        self._loaded = set()

    def __getattr__(self, name):
        if name.startswith("_") or name == "model_id":
            raise AttributeError(name)
        if name == "embeddings":
            data = list(range(10000))
        elif name == "metadata":
            data = {f"key_{i}": i for i in range(1000)}
        else:
            raise AttributeError(f"No attribute {name!r}")
        object.__setattr__(self, name, data)
        self._loaded.add(name)
        return data

tracemalloc.start()
models = [LazyDataModel(i) for i in range(100)]
_, peak_init = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Peak after creating 100 models: {peak_init/1024:.1f} KB")

tracemalloc.start()
_ = models[0].embeddings
_, peak_after = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Peak after loading one embedding: {peak_after/1024:.1f} KB")
print("Attributes loaded only when accessed, not on init")

Lazy loading pattern:

  • __getattr__ is only called when normal attribute lookup (instance dict, class dict, MRO) fails.
  • object.__setattr__(self, name, data) stores the result in the instance dict directly — bypassing any custom __setattr__.
  • After the first access, the attribute lives in instance.__dict__, so __getattr__ is not called again (normal lookup succeeds).
  • Pattern: create 100 model objects cheaply; only pay the memory cost of loading data for models you actually use.
  • This is how ORM lazy loading works: accessing a relationship attribute triggers a database query on first access.
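
To make the lookup order above concrete, the sketch below counts how often __getattr__ actually runs. LazyReport, its rows attribute, and the load_calls counter are hypothetical names introduced only for this illustration.

```python
class LazyReport:
    def __init__(self, report_id):
        self.report_id = report_id
        self.load_calls = 0

    def __getattr__(self, name):
        # Reached only when normal lookup (instance dict, class, MRO) fails.
        if name != "rows":
            raise AttributeError(name)
        self.load_calls += 1
        rows = list(range(1000))  # stand-in for an expensive load
        self.__dict__["rows"] = rows  # cache so normal lookup succeeds next time
        return rows


report = LazyReport(1)
loaded_before = "rows" in vars(report)

first = report.rows   # triggers __getattr__ and caches the result
second = report.rows  # served from the instance dict

loaded_after = "rows" in vars(report)
print(loaded_before, loaded_after, report.load_calls, first is second)
```

Raising AttributeError for every unknown name keeps hasattr() and copy/pickle probing well behaved, which is easy to break when __getattr__ returns a default instead.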
Expected Output
Attributes loaded only when accessed, not on init
Hints

Hint 1: __getattr__ is called only when normal attribute lookup fails — perfect for lazy initialization.

Hint 2: Store loaded data in the instance dict so __getattr__ is not called again for the same attribute.

#7 Memory-Mapped Files (Medium)
Tags: mmap, memory-mapped, large-file, random-access

Use mmap to read specific sections of a large file without loading the entire file into memory.

import mmap
import os
import tempfile
import tracemalloc

# Create a 1MB test file
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
    tmpfile = f.name
    f.write(b"0123456789" * 100000)  # 1MB

# Read with mmap: only the requested pages are loaded
tracemalloc.start()
with open(tmpfile, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Read only 100 bytes from position 500000
        chunk = mm[500000:500100]
        size = len(mm)
_, peak_mmap = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Read entire file into memory
tracemalloc.start()
with open(tmpfile, "rb") as f:
    data = f.read()
_, peak_full = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"mmap peak: {peak_mmap/1024:.1f} KB")
print(f"full read peak: {peak_full/1024:.1f} KB")
print(f"chunk: {chunk[:20]}...")
print("mmap accesses file data without loading entire file into memory")

os.unlink(tmpfile)
Solution
import mmap
import os
import tempfile
import tracemalloc

with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
    tmpfile = f.name
    f.write(b"0123456789" * 100000)

tracemalloc.start()
with open(tmpfile, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        chunk = mm[500000:500100]
        size = len(mm)
_, peak_mmap = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
with open(tmpfile, "rb") as f:
    data = f.read()
_, peak_full = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"mmap peak: {peak_mmap/1024:.1f} KB")
print(f"full read peak: {peak_full/1024:.1f} KB")
print(f"chunk: {chunk[:20]}...")
print("mmap accesses file data without loading entire file into memory")

os.unlink(tmpfile)

mmap use cases:

  • Large files where you only need specific sections (binary indices, database pages, log files).
  • Random access without loading everything: mm[offset:offset+size] loads only those OS pages.
  • mmap.mmap(fd, 0) maps the entire file. The OS uses virtual memory — actual physical pages are loaded on demand.
  • Write access: mmap.mmap(fd, 0, access=mmap.ACCESS_WRITE) — changes are reflected in the file.
  • mmap.ACCESS_COPY creates a private copy-on-write mapping — writes do not affect the file.
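
The difference between the two writable access modes above can be sketched as follows. The temporary file and its contents are made up for the illustration, and the file is opened in "r+b" mode in both cases as a conservative assumption.

```python
import mmap
import os
import tempfile

# Create a small scratch file to map.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"0123456789" * 1000)
    path = tmp.name

try:
    # ACCESS_COPY: writes go to a private copy-on-write mapping.
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as private:
            private[0:4] = b"COPY"
            private_view = bytes(private[0:4])
    with open(path, "rb") as f:
        after_copy = f.read(4)  # file on disk is untouched

    # ACCESS_WRITE: writes go through to the underlying file.
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as shared:
            shared[0:4] = b"WRIT"
            shared.flush()
    with open(path, "rb") as f:
        after_write = f.read(4)
finally:
    os.unlink(path)

print(private_view, after_copy, after_write)
```

Slice assignment on a mapping must keep the length unchanged, since a mapped region cannot grow or shrink through item assignment.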
Expected Output
mmap accesses file data without loading entire file into memory
Hints

Hint 1: mmap.mmap(fd, 0) maps the entire file. Access it like a bytes object — the OS loads pages on demand.

Hint 2: mmap only loads the pages you actually read — perfect for random access into large files.


Hard

#8 Copy-on-Write with weakref (Hard)
Tags: copy-on-write, weakref, memory, deferred-copy

Implement a copy-on-write wrapper that shares data between instances until mutation.

import copy

class CopyOnWrite:
    def __init__(self, data):
        self._data = data
        self._owned = False  # False = shared, True = own copy

    def _ensure_owned(self):
        if not self._owned:
            self._data = copy.deepcopy(self._data)
            self._owned = True

    def get(self, key):
        return self._data[key]

    def set(self, key, value):
        self._ensure_owned()
        self._data[key] = value

    def __repr__(self):
        return f"CoW(owned={self._owned}, data={self._data})"

original_data = {"a": 1, "b": 2, "c": 3}
cow1 = CopyOnWrite(original_data)
cow2 = CopyOnWrite(original_data)

# Both share the same dict initially
print(f"Shared: {cow1._data is cow2._data}")

# Read doesn't trigger copy
_ = cow1.get("a")
print(f"After read: cow1 owned={cow1._owned}, cow2 owned={cow2._owned}")

# Write triggers copy on cow1 only
cow1.set("a", 99)
print(f"After write: cow1 owned={cow1._owned}, cow2 owned={cow2._owned}")
print(f"cow1.a={cow1.get('a')}, cow2.a={cow2.get('a')} (unchanged)")
print("Shared data until mutation; copy triggered on write")
Solution
import copy

class CopyOnWrite:
    def __init__(self, data):
        self._data = data
        self._owned = False

    def _ensure_owned(self):
        if not self._owned:
            self._data = copy.deepcopy(self._data)
            self._owned = True

    def get(self, key):
        return self._data[key]

    def set(self, key, value):
        self._ensure_owned()
        self._data[key] = value

original_data = {"a": 1, "b": 2, "c": 3}
cow1 = CopyOnWrite(original_data)
cow2 = CopyOnWrite(original_data)

print(f"Shared: {cow1._data is cow2._data}")
_ = cow1.get("a")
print(f"After read: cow1 owned={cow1._owned}, cow2 owned={cow2._owned}")
cow1.set("a", 99)
print(f"After write: cow1 owned={cow1._owned}, cow2 owned={cow2._owned}")
print(f"cow1.a={cow1.get('a')}, cow2.a={cow2.get('a')}")
print("Shared data until mutation; copy triggered on write")

Copy-on-write pattern:

  • Reads are zero-copy — all instances share the original object.
  • Only the first write triggers a deep copy of the data for that instance.
  • Use case: function arguments that need pass-by-value semantics with share-until-modified performance.
  • Operating systems use COW for fork(): child processes share the parent's memory pages until one side writes. In CPython, reference-count updates are writes too, so shared pages are gradually copied.
  • Production: pandas provides an opt-in Copy-on-Write mode in 2.x to reduce unexpected mutations and memory usage.
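
One design detail behind the pattern above is why the wrapper takes a deep copy rather than a shallow one on first write. The sketch below uses a made-up nested dict to show how a shallow copy would still leak mutations into the shared original.

```python
import copy

shared = {"tags": ["a", "b"], "count": 2}

# Shallow copy: new outer dict, but the inner list is still shared.
shallow = copy.copy(shared)
shallow["tags"].append("c")
leaked = list(shared["tags"])  # the original sees the append

# Deep copy: nested containers are duplicated as well.
deep = copy.deepcopy(shared)
deep["tags"].append("d")
isolated = list(shared["tags"])  # the original is unaffected this time

print(leaked, isolated, deep["tags"])
```

For flat data with immutable values a shallow copy would be enough and cheaper, so the deep copy is a safety choice rather than a requirement of the pattern.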
Expected Output
Shared data until mutation; copy triggered on write
Hints

Hint 1: Store a reference to the original data. On first write, make a copy and work on it.

Hint 2: Use a flag to track whether a copy has been made.

#9 Memory Pool Pattern (Hard)
Tags: memory-pool, object-reuse, allocation-overhead, pooling

Implement an object pool that reuses expensive-to-create objects instead of allocating new ones each time.

import tracemalloc

class ExpensiveBuffer:
    def __init__(self, size=1024):
        self.data = bytearray(size)
        self.in_use = False

    def reset(self):
        self.data[:] = b"\x00" * len(self.data)
        self.in_use = False

class BufferPool:
    def __init__(self, pool_size=10, buffer_size=1024):
        self._pool = [ExpensiveBuffer(buffer_size) for _ in range(pool_size)]
        self._stats = {"allocations": 0, "reuses": 0}

    def acquire(self):
        for buf in self._pool:
            if not buf.in_use:
                buf.in_use = True
                self._stats["reuses"] += 1
                return buf
        # Pool exhausted: allocate new (not ideal)
        self._stats["allocations"] += 1
        buf = ExpensiveBuffer()
        buf.in_use = True
        return buf

    def release(self, buf):
        buf.reset()

# With pooling
pool = BufferPool(pool_size=5)
tracemalloc.start()
for _ in range(100):
    buf = pool.acquire()
    buf.data[0] = 42
    pool.release(buf)
_, peak_pool = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Without pooling
tracemalloc.start()
for _ in range(100):
    buf = ExpensiveBuffer()
    buf.data[0] = 42
_, peak_no_pool = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Pool reuses: {pool._stats['reuses']} / extra allocs: {pool._stats['allocations']}")
print(f"Pool peak: {peak_pool/1024:.1f} KB")
print(f"No-pool peak: {peak_no_pool/1024:.1f} KB")
print("Pool reuses objects — fewer allocations than create-and-discard")
Solution
import tracemalloc

class ExpensiveBuffer:
    def __init__(self, size=1024):
        self.data = bytearray(size)
        self.in_use = False

    def reset(self):
        for i in range(len(self.data)):
            self.data[i] = 0
        self.in_use = False

class BufferPool:
    def __init__(self, pool_size=10, buffer_size=1024):
        self._pool = [ExpensiveBuffer(buffer_size) for _ in range(pool_size)]
        self._stats = {"allocations": 0, "reuses": 0}

    def acquire(self):
        for buf in self._pool:
            if not buf.in_use:
                buf.in_use = True
                self._stats["reuses"] += 1
                return buf
        self._stats["allocations"] += 1
        buf = ExpensiveBuffer()
        buf.in_use = True
        return buf

    def release(self, buf):
        buf.reset()

pool = BufferPool(pool_size=5)
tracemalloc.start()
for _ in range(100):
    buf = pool.acquire()
    buf.data[0] = 42
    pool.release(buf)
_, peak_pool = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
for _ in range(100):
    buf = ExpensiveBuffer()
    buf.data[0] = 42
_, peak_no_pool = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Pool reuses: {pool._stats['reuses']}")
print(f"Pool peak: {peak_pool/1024:.1f} KB")
print(f"No-pool peak: {peak_no_pool/1024:.1f} KB")
print("Pool reuses objects — fewer allocations than create-and-discard")

Object pool benefits:

  • Eliminates repeated allocation and deallocation of expensive objects (network buffers, DB connections, worker threads).
  • Reduces GC pressure: fewer short-lived objects means fewer GC cycles.
  • Predictable memory: pool size is fixed; no unexpected growth.
  • reset() clears state so released objects are safe to reuse — critical for security (clear sensitive data).
  • Python's concurrent.futures.ThreadPoolExecutor (worker threads) and database connection pools such as SQLAlchemy's use this pattern.
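
A possible variation on the pool above hands buffers out through a context manager, so a release cannot be forgotten, and keeps free buffers in a deque for constant-time acquire. SimpleBufferPool and borrow are hypothetical names for this sketch; unlike the exercise's pool, it also keeps overflow buffers once they are returned.

```python
from collections import deque
from contextlib import contextmanager


class SimpleBufferPool:
    def __init__(self, pool_size=4, buffer_size=1024):
        self._buffer_size = buffer_size
        self._free = deque(bytearray(buffer_size) for _ in range(pool_size))
        self.created = pool_size  # total buffers ever allocated

    @contextmanager
    def borrow(self):
        # Take a free buffer, or allocate one if the pool is exhausted.
        if self._free:
            buf = self._free.popleft()
        else:
            buf = bytearray(self._buffer_size)
            self.created += 1
        try:
            yield buf
        finally:
            buf[:] = bytes(len(buf))  # zero the contents before reuse
            self._free.append(buf)


pool = SimpleBufferPool(pool_size=4)
for _ in range(100):
    with pool.borrow() as buf:
        buf[0] = 42

created_after_loop = pool.created
first_byte_after_release = buf[0]
print(created_after_loop, first_byte_after_release)
```

Because the buffer is returned in a finally clause, it goes back to the pool even when the body of the with block raises.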
Expected Output
Pool reuses objects — fewer allocations than create-and-discard
Hints

Hint 1: A memory pool pre-allocates objects and hands them out. Returned objects are reset and reused.

Hint 2: Use tracemalloc to compare peak memory with and without pooling.

#10 Compact Record Storage with namedtuple (Hard)
Tags: namedtuple, dataclass, memory, compact-records

Compare memory usage of dict, regular class, dataclass, and namedtuple for the same record data.

import sys
import tracemalloc
from collections import namedtuple
from dataclasses import dataclass

RecordNamedTuple = namedtuple("Record", ["id", "x", "y", "label"])

@dataclass
class RecordDataclass:
    id: int
    x: float
    y: float
    label: str

class RecordClass:
    def __init__(self, id, x, y, label):
        self.id = id
        self.x = x
        self.y = y
        self.label = label

@dataclass
class RecordSlotted:
    __slots__ = ("id", "x", "y", "label")
    id: int
    x: float
    y: float
    label: str

n = 50000
label = "category_A"

def measure_peak(factory):
    tracemalloc.start()
    records = [factory(i, float(i), float(i * 2), label) for i in range(n)]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

results = {
    "dict": measure_peak(lambda i, x, y, l: {"id": i, "x": x, "y": y, "label": l}),
    "namedtuple": measure_peak(lambda i, x, y, l: RecordNamedTuple(i, x, y, l)),
    "dataclass": measure_peak(lambda i, x, y, l: RecordDataclass(i, x, y, l)),
    "class": measure_peak(lambda i, x, y, l: RecordClass(i, x, y, l)),
    "slotted": measure_peak(lambda i, x, y, l: RecordSlotted(i, x, y, l)),
}

for name, peak in sorted(results.items(), key=lambda x: x[1]):
    print(f"{name:12s}: {peak/1e6:.1f} MB peak")

print("namedtuple is more memory-efficient than dict or regular class")
Solution
import tracemalloc
from collections import namedtuple
from dataclasses import dataclass

RecordNamedTuple = namedtuple("Record", ["id", "x", "y", "label"])

@dataclass
class RecordDataclass:
    id: int
    x: float
    y: float
    label: str

class RecordClass:
    def __init__(self, id, x, y, label):
        self.id = id
        self.x = x
        self.y = y
        self.label = label

n = 50000
label = "category_A"

def measure_peak(factory):
    tracemalloc.start()
    records = [factory(i, float(i), float(i * 2), label) for i in range(n)]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

results = {
    "dict": measure_peak(lambda i, x, y, l: {"id": i, "x": x, "y": y, "label": l}),
    "namedtuple": measure_peak(lambda i, x, y, l: RecordNamedTuple(i, x, y, l)),
    "dataclass": measure_peak(lambda i, x, y, l: RecordDataclass(i, x, y, l)),
    "class": measure_peak(lambda i, x, y, l: RecordClass(i, x, y, l)),
}

for name, peak in sorted(results.items(), key=lambda x: x[1]):
    print(f"{name:12s}: {peak/1e6:.1f} MB peak")

print("namedtuple is more memory-efficient than dict or regular class")

Memory ranking (typical, ascending):

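  1. namedtuple: tuple subclass with the same overhead as a tuple (about 40 bytes + 8 bytes per field on 64-bit CPython 3.8+).
  2. Regular class with __slots__: comparable to namedtuple (object header + 8 bytes per slot), no __dict__.
  3. Regular class: object header plus a per-instance __dict__, typically 100-250 bytes depending on the CPython version.
  4. dataclass: same as regular class by default (has __dict__). Use @dataclass(slots=True) (Python 3.10+) for slot efficiency.
  5. dict: roughly 200 bytes for a four-key dict, before counting the values.

namedtuple and __slots__ classes are the most compact options; namedtuple is the natural choice when you don't need mutability.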

Expected Output
namedtuple is more memory-efficient than dict or regular class
Hints

Hint 1: namedtuple is a tuple subclass — has tuple memory overhead (very low) plus slot-like access.

Hint 2: Compare sys.getsizeof() for dict, regular class, and namedtuple with the same fields.

#11 Chunked Processing for Large Datasets (Hard)
Tags: chunking, memory, large-dataset, streaming

Process a large simulated dataset in fixed-size chunks to bound peak memory usage.

import tracemalloc

def data_source(total_records):
    """Simulates a large data source; yields records lazily."""
    for i in range(total_records):
        yield {"id": i, "value": float(i), "tags": [i % 5, i % 7]}

def chunked(iterable, size):
    """Yield successive chunks of `size` items."""
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def process_chunk(chunk):
    return sum(r["value"] for r in chunk)

def process_all_chunked(total, chunk_size):
    total_value = 0
    for chunk in chunked(data_source(total), chunk_size):
        total_value += process_chunk(chunk)
    return total_value

def process_all_at_once(total):
    records = list(data_source(total))
    return sum(r["value"] for r in records)

total = 200_000
chunk_size = 1_000

tracemalloc.start()
r1 = process_all_chunked(total, chunk_size)
_, peak_chunked = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
r2 = process_all_at_once(total)
_, peak_at_once = tracemalloc.get_traced_memory()
tracemalloc.stop()

assert abs(r1 - r2) < 0.001, "Results must match"
print(f"Chunked peak: {peak_chunked/1e6:.1f} MB")
print(f"All-at-once: {peak_at_once/1e6:.1f} MB")
print(f"Ratio: {peak_at_once/peak_chunked:.1f}x")
print("Chunked processing uses bounded memory regardless of dataset size")
Solution
import tracemalloc

def data_source(total_records):
    for i in range(total_records):
        yield {"id": i, "value": float(i), "tags": [i % 5, i % 7]}

def chunked(iterable, size):
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def process_chunk(chunk):
    return sum(r["value"] for r in chunk)

def process_all_chunked(total, chunk_size):
    return sum(process_chunk(chunk) for chunk in chunked(data_source(total), chunk_size))

def process_all_at_once(total):
    return sum(r["value"] for r in list(data_source(total)))

total = 200_000
chunk_size = 1_000

tracemalloc.start()
r1 = process_all_chunked(total, chunk_size)
_, peak_chunked = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
r2 = process_all_at_once(total)
_, peak_at_once = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Chunked peak: {peak_chunked/1e6:.1f} MB")
print(f"All-at-once: {peak_at_once/1e6:.1f} MB")
print(f"Ratio: {peak_at_once/peak_chunked:.1f}x")
print("Chunked processing uses bounded memory regardless of dataset size")

Chunked processing:

  • Peak memory is bounded by roughly one chunk (1,000 records) regardless of total dataset size.
  • All-at-once: peak memory = entire dataset (200,000 records).
  • Chunk size tuning: larger chunks amortize per-chunk overhead; smaller chunks lower peak memory.
  • Production: choose the chunk size by measurement. Keeping a chunk within a few megabytes (around L3 cache size, ~4-16MB) is a reasonable starting point.
  • This pattern is used by pandas read_csv(chunksize=N), SQLAlchemy yield_per(N), and database cursor-based iteration.
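
The chunking helper can also be written with itertools.islice, which pulls at most size items per step from a shared iterator. This is a sketch of an alternative, not the exercise's reference solution; Python 3.12 and later also ship itertools.batched, which yields tuples instead of lists.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of up to `size` items without materializing the input."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def records(total):
    """Lazy stand-in for a large data source (hypothetical)."""
    for i in range(total):
        yield {"id": i, "value": float(i)}


small_example = list(chunked(range(7), 3))

# Only one chunk of records is alive at a time while summing.
chunked_total = sum(
    sum(r["value"] for r in chunk) for chunk in chunked(records(10_000), 1_000)
)
largest_chunk = max(len(chunk) for chunk in chunked(records(2_500), 1_000))

print(small_example, chunked_total, largest_chunk)
```

Calling iter() once up front matters: passing a list straight to islice inside the loop would restart from the first element on every iteration.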
Expected Output
Chunked processing uses bounded memory regardless of dataset size
Hints

Hint 1: Process the dataset in fixed-size chunks. Only one chunk lives in memory at a time.

Hint 2: Use a generator to yield chunks lazily — no pre-allocation of the entire dataset.

© 2026 EngineersOfAI. All rights reserved.