Python CPython Architecture Practice Problems & Exercises
Practice: CPython Architecture
Easy
Demonstrate CPython's integer caching behaviour. Show which integers are cached (returning the same object) and which are not.
Solution
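def check_cache(value):
    # Build both integers at runtime. Binding one argument to two names
    # (a = value; b = value) would make `a is b` True for every value.
    a = int(str(value))
    b = int(str(value))
    return a is b

for val in [256, 257, -5, -6]:
    print(f"a is b ({val}): {check_cache(val)}")
Why this happens:
CPython pre-allocates integer objects for values in the range -5 to 256 at interpreter startup. Depending on the version, they live in a static array in Objects/longobject.c or in the runtime's statically allocated state. Any time an operation produces an integer in this range, CPython returns a pointer to the pre-existing object, and no new allocation occurs.
For integers outside this range (like 257), each runtime operation that produces the value allocates a fresh PyLongObject on the heap. That is why check_cache constructs its two integers with int(str(value)): two separately constructed objects are equal under ==, but is returns False. Identical literals in the same code object are a different matter, because the compiler stores them as one shared constant.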
Warning: Never use is to compare integers in production code. Rely on == for value equality. The caching boundary (256) is an implementation detail that could change between CPython versions.
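To see where the cache ends on a particular interpreter, you can probe a range of values. This is a minimal sketch: is_cached and cached_range are hypothetical helper names, and the result reflects whatever the running build does. If the cache matches the range described above, it should report (-5, 256).

```python
def is_cached(n):
    """Return True if two runtime-constructed ints with value n are one object."""
    # int(str(n)) forces construction at runtime, avoiding shared compile-time constants.
    return int(str(n)) is int(str(n))

def cached_range(lo=-300, hi=300):
    """Return (min, max) of the values in [lo, hi] that appear to be cached."""
    cached = [n for n in range(lo, hi + 1) if is_cached(n)]
    return (cached[0], cached[-1]) if cached else None

print(cached_range())
```

The probe only inspects the window [lo, hi], so widen it if you suspect a larger cache.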
Expected Output
a is b (256): True
a is b (257): False
a is b (-5): True
a is b (-6): False
Hints
Hint 1: CPython caches small integers in the range -5 to 256. Any integer in this range always returns the same object.
Hint 2: Use the `is` operator (identity, not equality) to check whether two names refer to the exact same object in memory.
Explore CPython's string interning. Show which strings are automatically interned and how to force interning with sys.intern().
Solution
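import sys

s1 = "hello"
s2 = "hello"
print(f"'hello' interned: {s1 is s2}")

# Build the strings at runtime: two identical literals in one module
# share a single constant, which would hide the effect.
s3 = " ".join(["hello", "world"])
s4 = " ".join(["hello", "world"])
print(f"'hello world' auto-interned: {s3 is s4}")

s5 = sys.intern(s3)
s6 = sys.intern(s4)
print(f"'hello world' after intern(): {s5 is s6}")
How CPython decides to intern strings:
CPython automatically interns string literals that satisfy the identifier rule: only ASCII letters, digits, and underscores. These strings are candidates for dictionary keys and attribute lookups, so deduplication speeds up these operations. The exact rule is an implementation detail and has changed between CPython versions.
Strings with spaces, punctuation, or non-ASCII characters are NOT auto-interned by default, and strings built at runtime (for example with str.join) are never interned automatically. The solution builds "hello world" at runtime for that reason: two identical literals in the same module are stored as one shared constant, so comparing them with is would report True regardless of interning. sys.intern() manually adds a string to the global intern table; subsequent calls with the same value return the pre-existing object.
Performance use case: When you have a large data pipeline with millions of repeated string values (e.g., category names or HTTP methods), interning those strings reduces memory because every occurrence shares one object. It can also speed up dictionary lookups: the key comparison checks identity first (is, a single pointer comparison) and falls back to a character-by-character equality check only when the objects differ.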
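The pipeline use case can be sketched as follows. The row format and the load_methods helper are made up for illustration; the only real API used is sys.intern().

```python
import sys

def load_methods(raw_rows):
    """Intern the repeated HTTP-method field of each row as it is ingested."""
    # split() creates a new str per row; intern() maps equal values to one object.
    return [sys.intern(row.split(" ", 1)[0]) for row in raw_rows]

rows = ["GET /index.html", "POST /login", "GET /about.html"]
methods = load_methods(rows)

# Both "GET" values came from separate split() calls, yet after
# interning they are one shared object.
print(methods[0] is methods[2])
```

Interning at the point of ingestion means the duplicate strings become garbage immediately instead of living as long as the dataset does.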
Expected Output
'hello' interned: True
'hello world' auto-interned: False
'hello world' after intern(): True
Hints
Hint 1: CPython automatically interns string literals that look like identifiers (only letters, digits, underscores). Strings with spaces are not auto-interned.
Hint 2: Use `sys.intern()` to force interning of any string. After interning, two variables holding the same string value will point to the same object.
Use sys.getsizeof() to measure the memory footprint of common Python objects. What is the base overhead of each type?
Solution
import sys

objects = [
    ("int", 42),
    ("float", 3.14),
    ("bool", True),
    ("str ('')", ""),
    ("str ('hello')", "hello"),
    ("list ([])", []),
    ("list ([1,2,3])", [1, 2, 3]),
]

for label, obj in objects:
    print(f"{label} size: {sys.getsizeof(obj)} bytes")
What these numbers reveal about PyObject layout:
Every CPython object starts with the PyObject header (16 bytes on 64-bit systems):
- ob_refcnt: 8-byte reference count
- ob_type: 8-byte pointer to the type object (&PyLong_Type, &PyFloat_Type, etc.)
Beyond that each type adds its own fields:
- int (PyLongObject): 28 bytes for a small value: the 16-byte header, an 8-byte size field, and the digit array (minimum 1 digit of 4 bytes)
- float (PyFloatObject): 24 bytes: header plus a C double (8 bytes)
- bool: same as int (True and False are PyLongObject instances; bool is a subclass of int)
- str: 49 bytes empty, +1 byte per ASCII character (compact ASCII representation)
- list: 56 bytes empty, +8 bytes per element slot (a pointer array)
sys.getsizeof() is shallow: the list [1, 2, 3] reports 88 bytes (the 56-byte list object plus four 8-byte slots, because CPython may over-allocate a spare slot) but does not count the memory of the three int objects stored in it. Exact figures vary between CPython versions and platforms.
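To account for referenced objects as well, you have to walk the container yourself. The helper below is a rough sketch, not a standard-library function: deep_getsizeof is a hypothetical name, it only descends into the built-in containers listed in its docstring, and it counts shared objects once.

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Approximate size of obj plus everything it references.

    Only recurses into list, tuple, set, frozenset and dict; anything
    else is counted shallowly. Objects reachable twice are counted once.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size

data = [1, 2, 3]
print(sys.getsizeof(data), deep_getsizeof(data))
```

The second number is larger because it adds the three int objects to the list's own footprint.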
Expected Output
int size: 28 bytes
float size: 24 bytes
bool size: 28 bytes
str ('') size: 49 bytes
str ('hello') size: 54 bytes
list ([]) size: 56 bytes
list ([1,2,3]) size: 88 bytes
Hints
Hint 1: `sys.getsizeof()` returns the size of the Python object itself in bytes — it does NOT include the memory of referenced objects (shallow size only).
Hint 2: Every Python object has a fixed overhead: ob_refcnt (8 bytes) + ob_type pointer (8 bytes) = 16 bytes minimum. Each type then adds its own fields.
Write a function describe_implementation(name) that returns the key characteristics of each Python implementation.
Solution
def describe_implementation(name):
    # Summary table; the flags describe each project's default build.
    implementations = {
        "CPython": {
            "language": "C",
            "execution": "bytecode-interpreted",
            "jit": False,
            "gil": True,
            "key_advantage": "reference implementation, widest compatibility",
        },
        "PyPy": {
            "language": "RPython",
            "execution": "JIT-compiled",
            "jit": True,
            "gil": True,
            "key_advantage": "5-10x faster for CPU-bound loops",
        },
        "Jython": {
            "language": "Java",
            "execution": "JVM bytecode",
            "jit": True,
            "gil": False,
            "key_advantage": "true Java threading, Java library access",
        },
        "GraalPy": {
            "language": "Java/Truffle",
            "execution": "GraalVM JIT",
            "jit": True,
            "gil": True,  # GraalPy also runs Python code under a GIL
            "key_advantage": "polyglot interop on GraalVM",
        },
    }
    return implementations.get(name, {})

for impl in ["CPython", "PyPy", "Jython", "GraalPy"]:
    info = describe_implementation(impl)
    print(f"{impl}: execution={info['execution']}, jit={info['jit']}, gil={info['gil']}")
When to choose an alternative implementation:
- PyPy: CPU-bound numeric code (simulations, parsers, text processing) where the JIT can warm up and optimize hot loops. Not suitable when you need C extensions that are CPython-specific.
- Jython: Applications that need deep integration with a Java codebase or JVM ecosystem. Note: released versions of Jython target Python 2.7; Python 3 support is still in development.
- GraalPy: Polyglot applications mixing Python with JavaScript, R, or Java in a single process. Note that GraalPy currently runs Python code under a GIL as well, so it does not by itself provide thread-level parallelism for Python code.
- CPython: Everything else. It has the broadest library support, the largest community, and is what most production systems run.
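Before relying on any implementation-specific behaviour, it helps to check which interpreter is actually running. A small sketch using the standard platform and sys modules (current_implementation is a hypothetical helper name):

```python
import platform
import sys

def current_implementation():
    """Return (name, version) for the interpreter running this code."""
    name = platform.python_implementation()  # e.g. "CPython" or "PyPy"
    version = ".".join(str(part) for part in sys.implementation.version[:3])
    return name, version

name, version = current_implementation()
print(f"Running on {name} {version}")
```

The returned name could be passed to describe_implementation() to look up the matching entry.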
Expected Output
See solution for classification and explanations
Hints
Hint 1: CPython is the reference implementation written in C. PyPy uses a JIT compiler. Jython compiles to JVM bytecode. GraalPy runs on GraalVM.
Hint 2: Consider: does it compile to bytecode? Does it have a JIT? What is its primary advantage over CPython?
Medium
Use sys.settrace to observe every step of the eval loop as it executes a simple function. Record each event and the corresponding line number.
import sys

def build_tracer(events):
    def tracer(frame, event, arg):
        # Report line numbers relative to the start of the traced function.
        relative_line = frame.f_lineno - frame.f_code.co_firstlineno + 1
        events.append((event, f"line {relative_line}"))
        return tracer  # keep tracing inside this frame
    return tracer

def target_function(x):
    y = x * 2
    z = y + 1
    return z

events = []
sys.settrace(build_tracer(events))
result = target_function(5)
sys.settrace(None)

for event, location in events:
    print(f"{event:<10} {location}")
print(f"\nResult: {result}")
Solution
import sys

def build_tracer(events):
    def tracer(frame, event, arg):
        events.append((event, f"line {frame.f_lineno - frame.f_code.co_firstlineno + 1}"))
        return tracer
    return tracer

def target_function(x):
    y = x * 2
    z = y + 1
    return z

events = []
sys.settrace(build_tracer(events))
result = target_function(5)
sys.settrace(None)

for event, location in events:
    print(f"{event:<10} {location}")
print(f"\nResult: {result}")
How the eval loop uses trace hooks:
The CPython eval loop (Python/ceval.c) reports several kinds of trace event. The three recorded in this exercise are:
- call: fired when a new frame is pushed (function call begins). The frame object contains f_locals, f_globals, f_code, etc.
- line: fired before executing the first bytecode instruction of each new source line. This is what Python debuggers use to implement line-by-line stepping.
- return: fired just before the frame is popped (function returns). arg is the return value.
sys.settrace is the foundation of pdb, coverage.py, and most Python debuggers. The performance cost is significant: with a trace function installed, the interpreter calls back into Python for every line executed, which can slow code down by an order of magnitude or more. Profilers that only need function-level data use sys.setprofile instead (fires on call/return, not on every line).
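For comparison, here is a sketch of the sys.setprofile alternative mentioned above. The collect_profile_events helper is a hypothetical name; it records only the event names delivered for the profiled function's own frame, so you can see that no line events arrive.

```python
import sys

def target_function(x):
    return len([x]) + 1

def collect_profile_events(func, *args):
    """Run func under sys.setprofile and return the event names seen in its frame."""
    events = []

    def profiler(frame, event, arg):
        # Keep only events whose frame belongs to func itself.
        if frame.f_code is func.__code__:
            events.append(event)

    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(None)  # always uninstall the hook
    return events

print(collect_profile_events(target_function, 5))
```

Profile hooks also report calls into C functions (such as len here), which trace hooks do not.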
import sys

def build_tracer(events):
    """Return a trace function that records (event, lineno) pairs.
    Capture 'call', 'line', and 'return' events.
    """
    pass

def target_function(x):
    y = x * 2
    z = y + 1
    return z
Expected Output
call       line 1
line       line 2
line       line 3
line       line 4
return     line 4

Result: 11
Hints
Hint 1: `sys.settrace(fn)` installs a global trace function. It is called with (frame, event, arg). Valid events are "call", "line", "return", "exception".
Hint 2: Your trace function must return itself to keep tracing inside the called function. Return None to stop tracing a function after the first call.
Use tracemalloc to observe CPython's memory allocation patterns. Show the difference between small-object allocations (handled by pymalloc) and large allocations (handled by system malloc).
import tracemalloc
import sys
tracemalloc.start()
# Allocate objects of various sizes
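# Payloads of 1-512 bytes. Each bytes object also carries a fixed header,
# so only the smaller ones stay within pymalloc's 512-byte limit.
small_objects = [bytes(size) for size in range(1, 513)]
# Payloads of 513+ bytes: always above the limit, served by system malloc.
large_objects = [bytes(size) for size in range(513, 600)]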
snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics("lineno")
# Show total current memory
current, peak = tracemalloc.get_traced_memory()
print(f"Current traced memory: {current:,} bytes")
print(f"Peak traced memory: {peak:,} bytes")
print(f"\nsmall_objects list size (sys.getsizeof): {sys.getsizeof(small_objects)} bytes")
print(f"large_objects list size (sys.getsizeof): {sys.getsizeof(large_objects)} bytes")
# Single-object size comparison
tiny = bytes(10)
big = bytes(1000)
print(f"\nbytes(10) size: {sys.getsizeof(tiny)} bytes")
print(f"bytes(1000) size: {sys.getsizeof(big)} bytes")
tracemalloc.stop()
Solution
import tracemalloc
import sys
tracemalloc.start()
small_objects = [bytes(size) for size in range(1, 513)]
large_objects = [bytes(size) for size in range(513, 600)]
snapshot = tracemalloc.take_snapshot()
current, peak = tracemalloc.get_traced_memory()
print(f"Current traced memory: {current:,} bytes")
print(f"Peak traced memory: {peak:,} bytes")
print(f"\nsmall_objects list size (sys.getsizeof): {sys.getsizeof(small_objects)} bytes")
print(f"large_objects list size (sys.getsizeof): {sys.getsizeof(large_objects)} bytes")
tiny = bytes(10)
big = bytes(1000)
print(f"\nbytes(10) size: {sys.getsizeof(tiny)} bytes")
print(f"bytes(1000) size: {sys.getsizeof(big)} bytes")
tracemalloc.stop()
CPython's three-tier memory allocator:
Python objects
|
pymalloc (Python's custom allocator)
| handles allocations <= 512 bytes
| uses arenas -> pools -> blocks (256 KB arenas and 4 KB pools in older releases; newer 64-bit builds use larger ones)
|
glibc malloc / jemalloc (system allocator)
| handles allocations > 512 bytes
| also used for pymalloc's arena allocation itself
|
OS mmap / brk (virtual memory)
Why pymalloc exists:
- Python creates and destroys millions of small objects (function frames, tuples, dicts) per second
- malloc/free for tiny objects has high overhead (alignment overhead, per-allocation bookkeeping)
- pymalloc manages fixed-size blocks within pre-allocated pools, making allocation and deallocation O(1) with minimal fragmentation
- Objects larger than 512 bytes bypass pymalloc entirely and go to the system allocator
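The following sketch estimates which payload sizes keep a bytes object under the threshold. It assumes the 512-byte limit quoted above (PYMALLOC_LIMIT is a name introduced here, not something read from the interpreter) and uses sys.getsizeof() as a stand-in for the size of the allocation request.

```python
import sys

PYMALLOC_LIMIT = 512  # assumed threshold, taken from the text above

overhead = sys.getsizeof(b"")  # fixed cost of an empty bytes object
largest_small_payload = PYMALLOC_LIMIT - overhead

for payload in (10, largest_small_payload, largest_small_payload + 1, 1000):
    total = sys.getsizeof(bytes(payload))
    route = "pymalloc" if total <= PYMALLOC_LIMIT else "system allocator"
    print(f"bytes({payload}): {total} bytes total -> {route} (judged by size)")
```

The classification is inferred from size alone; the script does not observe which allocator actually served the request.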
Expected Output
See solution for allocation size analysis
Hints
Hint 1: CPython uses pymalloc for objects 512 bytes or smaller. Larger objects go directly to the system malloc. Use `tracemalloc` to observe allocation sizes.
Hint 2: tracemalloc.take_snapshot() captures all current allocations. Filter by filename to isolate your code.
Build a function that inspects a Python type object and reports its key properties: name, MRO, whether it is a C builtin or Python-defined type, and which dunder methods it defines.
Solution
def inspect_type(obj):
    t = type(obj)
    mro = [c.__name__ for c in t.__mro__]
    # Heuristic: types living in the builtins module are implemented in C.
    is_builtin = t.__module__ == "builtins"
    dunders = [attr for attr in dir(t) if attr.startswith("__") and attr.endswith("__")]
    print(f"Object: {repr(obj)[:40]}")
    print(f"Type: {t.__name__}")
    print(f"Module: {t.__module__}")
    print(f"Is C type: {is_builtin}")
    print(f"MRO: {' -> '.join(mro)}")
    print(f"Dunders: {len(dunders)} ({', '.join(dunders[:5])}...)")
    print()

for obj in [42, "hello", [], {}, lambda: None]:
    inspect_type(obj)
What a PyTypeObject contains:
In CPython's C source, every Python type is a PyTypeObject struct with over 40 fields, including:
- tp_name: the type's qualified name
- tp_basicsize: size of an instance in bytes
- tp_alloc / tp_dealloc: memory allocation/deallocation hooks
- tp_repr, tp_str: implement repr() and str()
- tp_as_number, tp_as_sequence, tp_as_mapping: protocol suites
- tp_richcompare: implements ==, !=, <, etc.
- tp_methods: table of methods exposed to Python
When you evaluate a + b, the interpreter goes straight to the type's tp_as_number->nb_add slot. When you access obj.__add__ by name, Python instead searches the __dict__ of each class in the type's MRO, and the first match in the MRO wins. For classes defined in Python, CPython keeps the two views in sync by filling the slot with a wrapper that calls the __add__ method found through the MRO.
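A small illustration of both points, using made-up classes: the first __add__ found along the MRO is the one the + operator uses, and an attribute stored on the instance does not change that, because operator dispatch consults the type.

```python
class Base:
    def __add__(self, other):
        return "Base.__add__"

class Child(Base):
    def __add__(self, other):
        return "Child.__add__"

class GrandChild(Child):
    pass  # defines no __add__, so lookup continues along the MRO

obj = GrandChild()
print([cls.__name__ for cls in GrandChild.__mro__])
print(obj + 1)

# An attribute stored on the instance is ignored by the + operator,
# because operator dispatch looks at the type, not the instance.
obj.__add__ = lambda other: "instance attribute"
print(obj + 1)
```

Both additions print Child.__add__, since Child is the first class in the MRO that defines the method.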
def inspect_type(obj):
    """Print key information about the type of an object.
    Show: type name, MRO, slots (selected __dunder__ methods),
    and whether it is a built-in or user-defined type.
    """
    pass
Expected Output
See solution for type inspection output
Hints
Hint 1: Every Python type is itself a `type` object (a PyTypeObject in C). Access it via `type(obj)` or `obj.__class__`.
Hint 2: Use `type(obj).__mro__` for the method resolution order, `dir(type(obj))` for attributes, and `type(obj).__module__` to distinguish builtins from user types.
Predict the output before running. Explain every True or False result.
# Case 1: Cached range
a = 100
b = 100
print(f"Case 1: {a is b}") # ?
# Case 2: Outside cache, separate statements
x = 300
y = 300
print(f"Case 2: {x is y}") # ?
# Case 3: Outside cache, same expression
p, q = 300, 300
print(f"Case 3: {p is q}") # ?
# Case 4: Arithmetic result
m = 150 + 150
n = 300
print(f"Case 4: {m is n}") # ?
# Case 5: Negative edge
neg_a = -5
neg_b = -5
print(f"Case 5: {neg_a is neg_b}") # ?
neg_c = -6
neg_d = -6
print(f"Case 6: {neg_c is neg_d}") # ?
Solution
The answers depend on how the code is compiled. Run as a script, where the whole file becomes one code object:
Case 1: True
Case 2: True
Case 3: True
Case 4: True
Case 5: True
Case 6: True
Typed statement by statement into the interactive REPL, where each statement is compiled separately, Cases 2, 4, and 6 are usually False.
Explanation of each case:
Case 1 (True): 100 is in the cache range (-5..256). Both a and b point to the pre-allocated integer object for 100.
Case 2 (context-dependent): 300 is outside the cache. In a script, the compiler stores each distinct constant once per code object, so x and y load the same object and the result is True. In the REPL, the two statements are compiled separately, each creates its own PyLongObject on the heap, and the result is False.
Case 3 (True): p, q = 300, 300 is a single statement, so both values come from the same code object and refer to the same constant, even in the REPL.
Case 4 (context-dependent): CPython folds 150 + 150 to 300 at compile time. In a script the folded result is normally merged with the literal 300 in the same code object (True). In the REPL the two statements are compiled separately (False).
Case 5 (True): -5 is the lower boundary of the cache. Cached.
Case 6 (context-dependent): -6 is below the cache boundary. In a script both names normally load the same constant (True). In the REPL they are two separate heap allocations (False).
Key takeaway: Integer caching is a CPython implementation detail. Compile-time constant merging and folding add further behaviour that makes object identity of large integers unpredictable. Never write code that relies on is for numeric comparison.
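You can observe the effect of compilation units directly with compile() and exec(). This is a sketch for illustration: the first comparison should be True on CPython because both names load one shared constant, while the second is typically False on standard builds and is shown without a guaranteed result.

```python
source = "x = 300\ny = 300\n"

# One compilation unit: the literal 300 is stored once in co_consts.
code = compile(source, "<demo>", "exec")
print(code.co_consts)

namespace = {}
exec(code, namespace)
print(namespace["x"] is namespace["y"])

# Two compilation units, as in the REPL: each has its own constants.
ns_a, ns_b = {}, {}
exec(compile("x = 300", "<stmt 1>", "exec"), ns_a)
exec(compile("y = 300", "<stmt 2>", "exec"), ns_b)
print(ns_a["x"] is ns_b["y"])
```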
Expected Output
See solution: output depends on context of assignment
Hints
Hint 1: The integer cache applies to the range -5..256. But be careful: the cache is per-interpreter-startup and assignments from the SAME expression can behave differently than separate assignments.
Hint 2: Consider: does `a, b = 300, 300` in a single tuple literal allow CPython to reuse the same object for both? What about assigning in separate statements?
Hard
Measure and quantify the overhead of the CPython eval loop. Isolate the cost of: function call overhead, a single bytecode instruction, and a C-level builtin call.
import timeit

N = 5_000_000

def empty_func():
    pass

def single_add(x):
    return x + 1

def use_builtin():
    return len([])

results = {}
# Function call + return overhead
results["empty_call"] = timeit.timeit(empty_func, number=N) / N * 1e9
# Single add instruction overhead (includes the lambda wrapper call)
results["single_add"] = timeit.timeit(lambda: single_add(5), number=N) / N * 1e9
# Builtin call (C-level)
results["len_builtin"] = timeit.timeit(use_builtin, number=N) / N * 1e9
# Pure Python loop overhead (no useful work)
loop_code = """
total = 0
for i in range(100):
    total += i
"""
results["loop_100"] = timeit.timeit(loop_code, number=N // 100) / (N // 100) * 1e9

print("Overhead measurements (nanoseconds per call/operation):")
print("-" * 50)
for name, ns in results.items():
    print(f" {name:<20} {ns:>10.1f} ns")
Solution
import timeit

N = 5_000_000

def empty_func():
    pass

def single_add(x):
    return x + 1

def use_builtin():
    return len([])

results = {}
results["empty_call"] = timeit.timeit(empty_func, number=N) / N * 1e9
results["single_add"] = timeit.timeit(lambda: single_add(5), number=N) / N * 1e9
results["len_builtin"] = timeit.timeit(use_builtin, number=N) / N * 1e9
loop_code = """
total = 0
for i in range(100):
    total += i
"""
results["loop_100"] = timeit.timeit(loop_code, number=N // 100) / (N // 100) * 1e9

print("Overhead measurements (nanoseconds per call/operation):")
print("-" * 50)
for name, ns in results.items():
    print(f" {name:<20} {ns:>10.1f} ns")
Typical results on modern hardware:
empty_call ~ 50–80 ns (frame creation + return)
single_add ~ 100–150 ns (lambda call + function call + LOAD_FAST + LOAD_CONST + BINARY_OP + RETURN_VALUE)
len_builtin ~ 80–120 ns (call + list creation + C dispatch + RETURN_VALUE)
loop_100 ~ 3000–6000 ns total for 100 iterations
What each number reveals:
- Empty call (~60 ns): CPython must create a new frame object, push it onto the call stack, execute RESUME + RETURN_VALUE, and destroy the frame. This is purely interpreter overhead with zero useful work.
- Single add: Adds ~50-80 ns on top of the call overhead. Part of that is the extra lambda wrapper call that timeit invokes; the rest is LOAD_FAST + LOAD_CONST + BINARY_OP (which includes type checking and int.__add__ dispatch) + RETURN_VALUE.
- len_builtin: The len() call drops into C code immediately, with no Python-level type dispatch, so it is often comparable to or faster than a Python + operation.
- Loop overhead: The per-iteration cost of FOR_ITER + STORE_FAST + BINARY_OP + JUMP_BACKWARD accumulates fast. 100 iterations of a trivial loop takes 3–6 microseconds, or 30–60 ns per iteration; for a million-element dataset that is 30–60 ms of pure interpreter overhead.
The core lesson: By these figures a Python-level function call costs roughly 50–100 ns, and each simple bytecode instruction inside a loop costs on the order of 10 ns. NumPy and similar libraries are fast because they move the work into C, executing millions of operations in a single Python instruction (CALL).
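To check which instructions a function really executes on your interpreter, the dis module can list them. Opcode names differ between CPython versions, so the sketch below prints whatever the running version produces rather than assuming a fixed list.

```python
import dis

def single_add(x):
    return x + 1

# Collect the opcode names the eval loop dispatches for one call.
opnames = [ins.opname for ins in dis.get_instructions(single_add)]
print(opnames)
print(f"{len(opnames)} instructions per call")
```

Dividing a measured per-call time by this count gives a rough per-instruction figure, although the call itself dominates for functions this small.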
import timeit
import ctypes

def measure_overhead():
    """Measure the per-bytecode-instruction overhead of the CPython eval loop.
    Compare: doing nothing (pass), a single addition, and a C-level operation.
    Express overhead in nanoseconds per operation.
    """
    pass
Expected Output
See solution for timing measurements and analysis
Hints
Hint 1: Use `timeit.timeit()` with a large number of iterations (1_000_000+) to get stable measurements. Divide the total time by the iteration count to get per-call cost.
Hint 2: Compare: an empty function call vs a function with `pass` vs a single `x + 1` vs `len([])`. The difference between them isolates different components of overhead.
Investigate CPython's object identity model using id() and ctypes. Show that id() is a memory address, demonstrate address reuse after deallocation, and use ctypes to look up a live object by address.
import ctypes
import sys

# Part 1: id() is a memory address
x = object()
addr = id(x)
print(f"id(x) = {addr}")
print(f"hex address: {hex(addr)}")

# Part 2: Look up live object by address (only safe on live references!)
y = [1, 2, 3]
y_id = id(y)
recovered = ctypes.cast(y_id, ctypes.py_object).value
print(f"\nOriginal: {y}")
print(f"Recovered by address: {recovered}")
print(f"Same object: {y is recovered}")

# Part 3: Address reuse after deallocation
# Keep the id but drop the object
ids_seen = []
for _ in range(5):
    temp = object()
    ids_seen.append(id(temp))
    # Rebinding temp on the next iteration drops the previous object's
    # refcount to 0, so it is deallocated immediately.
print(f"\nFirst 5 object() ids:")
for i in ids_seen:
    print(f" {hex(i)}")

# Are any addresses reused?
unique_ids = len(set(ids_seen))
print(f"Unique addresses: {unique_ids} out of 5")
print(f"Address reuse observed: {unique_ids < 5}")
Solution
import ctypes
import sys

x = object()
addr = id(x)
print(f"id(x) = {addr}")
print(f"hex address: {hex(addr)}")

y = [1, 2, 3]
y_id = id(y)
recovered = ctypes.cast(y_id, ctypes.py_object).value
print(f"\nOriginal: {y}")
print(f"Recovered by address: {recovered}")
print(f"Same object: {y is recovered}")

ids_seen = []
for _ in range(5):
    temp = object()
    ids_seen.append(id(temp))
print(f"\nFirst 5 object() ids:")
for i in ids_seen:
    print(f" {hex(i)}")

unique_ids = len(set(ids_seen))
print(f"Unique addresses: {unique_ids} out of 5")
print(f"Address reuse observed: {unique_ids < 5}")
Key concepts this demonstrates:
1. id() is a memory address in CPython. The Python language spec only guarantees that id() returns a unique integer for the lifetime of the object. CPython's specific guarantee is stronger: it returns the object's actual memory address. Other implementations (PyPy, Jython) may return different values.
2. Address reuse is real and surprising. Because CPython reference-counts objects, a temporary object inside a loop is deallocated as soon as the loop variable is rebound to a new object. The freed memory can be claimed by a later allocation, often within an iteration or two, producing the same id(). This is why two equal id() values recorded at different times do NOT prove that they came from the same object: one object may have been created after the other was destroyed.
3. ctypes.cast(id(x), ctypes.py_object).value is dangerous. If the object has been deallocated, this dereferences a dangling pointer and will crash Python or return garbage. Only use this on objects you know are alive (held by a live reference).
The practical danger:
stale_id = id(object())  # the temporary object is freed as soon as id() returns
b = object()             # may be placed at the address that was just freed
# stale_id == id(b) is possible even though the two objects never coexisted,
# which is why identity checks across non-overlapping lifetimes are meaningless.
import ctypes

def investigate_identity():
    """Demonstrate:
    1. id() returns the memory address of a CPython object
    2. id() of a dead object can equal id() of a new object (address reuse)
    3. Use ctypes to look up a live object by its id (address)
    """
    pass
Expected Output
See solution for memory address investigation
Hints
Hint 1: In CPython, `id(obj)` returns the memory address of the object as an integer. This is noted in the documentation for the built-in `id()` function: "CPython implementation detail: This is the address of the object in memory."
Hint 2: After an object is garbage collected, its memory address can be reused by a new object. This means two objects that existed at different times can have the same id. Use `ctypes.cast(id(x), ctypes.py_object).value` to dereference an id back to an object — only safe on live objects!
Build a self-grading CPython architecture quiz engine. The quiz should cover: integer caching, string interning, the eval loop, PyObject layout, and pymalloc. Include at least 5 questions with multiple-choice options and detailed explanations.
Solution
def make_quiz():
    return [
        {
            "q": "What is the CPython integer cache range?",
            "choices": ["A: 0 to 100", "B: -5 to 256", "C: -128 to 127", "D: 0 to 255"],
            "answer": "B",
            "explanation": "CPython pre-allocates integers from -5 to 256 inclusive. These 262 objects are created at interpreter startup and reused for all references to values in this range.",
        },
        {
            "q": "What does sys.getsizeof() measure?",
            "choices": [
                "A: The total memory used by an object including referenced objects",
                "B: The shallow size of the object itself (not referenced objects)",
                "C: The size of the object on disk",
                "D: The size of the object's type",
            ],
            "answer": "B",
            "explanation": "sys.getsizeof() returns the shallow size: just the object header and its direct data. A list of 1000 large strings reports only the list's pointer array size, not the strings themselves.",
        },
        {
            "q": "Which strings does CPython automatically intern?",
            "choices": [
                "A: All string literals",
                "B: Strings shorter than 20 characters",
                "C: Strings that look like identifiers (letters, digits, underscores only)",
                "D: Strings that appear more than once in a module",
            ],
            "answer": "C",
            "explanation": "CPython interns strings that match the identifier pattern. These are candidates for dictionary keys and attribute lookups. Strings with spaces or special characters are not auto-interned.",
        },
        {
            "q": "What happens when an object's reference count reaches zero in CPython?",
            "choices": [
                "A: It is added to a garbage collection queue",
                "B: It is immediately deallocated via tp_dealloc",
                "C: It is moved to a dead-objects list for batch cleanup",
                "D: It persists until the next GC cycle",
            ],
            "answer": "B",
            "explanation": "CPython's reference counting is synchronous. When ob_refcnt drops to zero, tp_dealloc is called immediately. This is why del on a non-cyclic object frees memory right away, unlike Java's GC.",
        },
        {
            "q": "What is the threshold for CPython's pymalloc allocator?",
            "choices": ["A: 256 bytes", "B: 512 bytes", "C: 1024 bytes", "D: 4096 bytes"],
            "answer": "B",
            "explanation": "pymalloc handles allocations of 512 bytes or less. Larger allocations bypass pymalloc and go directly to the system allocator (malloc). This threshold was increased from 256 in Python 3.",
        },
    ]

def run_quiz(questions):
    score = 0
    for i, q in enumerate(questions, 1):
        print(f"\nQ{i}: {q['q']}")
        for choice in q["choices"]:
            print(f" {choice}")
        ans = input("Your answer (A/B/C/D): ").strip().upper()
        if ans == q["answer"]:
            print("Correct!")
            score += 1
        else:
            print(f"Wrong. Correct answer: {q['answer']}")
            print(f"Explanation: {q['explanation']}")
    print(f"\nFinal score: {score}/{len(questions)}")

# To run interactively:
# run_quiz(make_quiz())

# Automated verification (for non-interactive use):
quiz = make_quiz()
print(f"Quiz loaded: {len(quiz)} questions")
for i, q in enumerate(quiz, 1):
    print(f"Q{i}: {q['q'][:60]}... [Answer: {q['answer']}]")
Extension challenges:
- Add a "hint" field and implement a hint system that deducts a point if used
- Add a "difficulty" field and sort questions from easy to hard
- Track time per question with time.perf_counter()
- Randomise question and answer order to prevent memorisation
- Export incorrect questions to a review file for spaced repetition
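As a starting point for the review-file extension, scoring can be separated from input() so it is testable without a terminal. This is one possible sketch, not part of the reference solution: grade is a hypothetical helper, and the two sample questions are shortened placeholders that follow the dict structure used by make_quiz().

```python
def grade(questions, answers):
    """Score a list of answer letters against the quiz without calling input().

    Returns (score, missed), where missed holds the questions answered wrongly.
    """
    score = 0
    missed = []
    for question, given in zip(questions, answers):
        if given.strip().upper() == question["answer"]:
            score += 1
        else:
            missed.append(question)
    return score, missed

sample = [
    {"q": "Cache range?", "choices": ["A: 0 to 100", "B: -5 to 256"],
     "answer": "B", "explanation": "placeholder"},
    {"q": "pymalloc threshold?", "choices": ["A: 256 bytes", "B: 512 bytes"],
     "answer": "B", "explanation": "placeholder"},
]
score, missed = grade(sample, ["b", "A"])
print(f"{score}/{len(sample)} correct, {len(missed)} to review")
```

The missed list is what you would write to the review file.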
def make_quiz():
    """Return a list of question dicts covering CPython architecture
    (keys: "q", "choices", "answer", "explanation"). Then write a
    run_quiz() function that presents each question, checks the answer,
    and shows the explanation if wrong.
    """
    pass

def run_quiz(questions):
    pass
Expected Output
Interactive quiz: see solution for full implementation
Hints
Hint 1: Structure each question as a dict with keys: "q" (question text), "choices" (list of options), "answer" (correct letter), "explanation" (detailed explanation).
Hint 2: For the runner, iterate questions, print choices, take input, compare to the answer key, and show explanations for wrong answers. Track score.
