Python CPython Architecture Practice Problems & Exercises

Practice: CPython Architecture

11 problems · 4 Easy · 4 Medium · 3 Hard · 50–70 min

Easy

#1 Integer Cache Boundaries · Easy
integer-cache · object-identity · is-operator

Demonstrate CPython's integer caching behaviour. Show which integers are cached (returning the same object) and which are not.

Python
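import sys

def check_cache(value):
    # Build the integer twice at runtime. Binding the same argument to two
    # names would make `a is b` trivially True for every value.
    a = int(str(value))
    b = int(str(value))
    return a is b

for val in [256, 257, -5, -6]:
    print(f"a is b ({val}): {check_cache(val)}")
Solution
import sys

def check_cache(value):
    # Build the integer twice at runtime. Binding the same argument to two
    # names would make `a is b` trivially True for every value.
    a = int(str(value))
    b = int(str(value))
    return a is b

for val in [256, 257, -5, -6]:
    print(f"a is b ({val}): {check_cache(val)}")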

Why this happens:

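CPython pre-allocates integer objects for values in the range -5 to 256. The array that holds them has lived in different places across versions (historically a static array in Objects/longobject.c). Any time an operation produces an integer in this range, CPython returns a pointer to the pre-existing object and no new allocation occurs.

For integers outside this range (like 257), CPython allocates a fresh PyLongObject on the heap each time one is created at runtime. Two separate int("257") calls therefore produce two separate objects, so is returns False even though the values are equal. Equal integer literals inside one compiled file are a special case: the compiler stores them as a single shared constant, so a reliable test has to build its integers at runtime, for example with int(str(value)).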

Warning: Never use is to compare integers in production code. Rely on == for value equality. The caching boundary (256) is an implementation detail that could change between CPython versions.
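
To see where the cache ends on the interpreter you are actually running, a minimal sketch can probe outward from 0 with integers built at runtime. The helper names `is_cached` and `find_cache_bounds` and the probe window are illustrative assumptions, and the result describes only the current build.

```python
def is_cached(n):
    # int(str(n)) forces a runtime lookup or allocation instead of
    # reusing a compile-time constant.
    return int(str(n)) is int(str(n))

def find_cache_bounds(lo=-50, hi=2000):
    """Walk outward from 0 and return the (lowest, highest) cached ints found."""
    low = 0
    while low - 1 >= lo and is_cached(low - 1):
        low -= 1
    high = 0
    while high + 1 <= hi and is_cached(high + 1):
        high += 1
    return low, high

low, high = find_cache_bounds()
print(f"cached range on this interpreter: {low}..{high}")
```

On a CPython build with the range described above this reports -5..256. A different result on another version or implementation is exactly why the boundary should not be relied on.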

Expected Output
a is b (256): True
a is b (257): False
a is b (-5): True
a is b (-6): False
Hints

Hint 1: CPython caches small integers in the range -5 to 256. Any integer in this range always returns the same object.

Hint 2: Use the `is` operator (identity, not equality) to check whether two names refer to the exact same object in memory.

#2 String Interning Check · Easy
string-interning · object-identity · sys.intern

Explore CPython's string interning. Show which strings are automatically interned and how to force interning with sys.intern().

Python
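import sys

# Identifier-like literal: interned automatically.
s1 = "hello"
s2 = "hello"
print(f"'hello' interned: {s1 is s2}")

# Build these strings at runtime. Two equal literals in one file would be
# merged into a single constant by the compiler, which hides the effect.
s3 = " ".join(["hello", "world"])
s4 = " ".join(["hello", "world"])
print(f"'hello world' auto-interned: {s3 is s4}")

s5 = sys.intern(s3)
s6 = sys.intern(s4)
print(f"'hello world' after intern(): {s5 is s6}")
Solution
import sys

# Identifier-like literal: interned automatically.
s1 = "hello"
s2 = "hello"
print(f"'hello' interned: {s1 is s2}")

# Build these strings at runtime. Two equal literals in one file would be
# merged into a single constant by the compiler, which hides the effect.
s3 = " ".join(["hello", "world"])
s4 = " ".join(["hello", "world"])
print(f"'hello world' auto-interned: {s3 is s4}")

s5 = sys.intern(s3)
s6 = sys.intern(s4)
print(f"'hello world' after intern(): {s5 is s6}")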

How CPython decides to intern strings:

CPython automatically interns string literals that satisfy the identifier rule: only ASCII letters, digits, and underscores. These strings are candidates for dictionary keys and attribute lookups, so deduplication speeds up these operations. The exact rules are an implementation detail and have changed between versions.

Strings with spaces, punctuation, or non-ASCII characters are generally NOT auto-interned, and strings created at runtime never are. Equal literals inside one compiled file can still share an object because the compiler merges equal constants, so a reliable test has to build its strings at runtime, for example with " ".join(...). sys.intern() manually adds a string to the global intern table, and subsequent calls with the same value return the pre-existing object.

Performance use case: When you have a large data pipeline with millions of repeated string values (e.g., category names or HTTP methods), interning those strings reduces memory and speeds up dictionary lookups, because a lookup with an interned key can succeed on an identity check (a single pointer comparison) before it ever falls back to a character-by-character equality comparison.
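
The memory effect described above can be sketched by counting distinct objects before and after interning. The `METHODS` list and the way the runtime strings are produced are illustrative assumptions that stand in for values parsed from real input.

```python
import sys

METHODS = ["GET", "POST"]

# Simulate values parsed from input: each one is a fresh str object
# created at runtime, even when the text repeats.
raw = ["".join(list(METHODS[i % 2])) for i in range(1000)]
interned = [sys.intern(s) for s in raw]

distinct_raw = len({id(s) for s in raw})
distinct_interned = len({id(s) for s in interned})
print(f"distinct objects before interning: {distinct_raw}")
print(f"distinct objects after interning:  {distinct_interned}")
```

On CPython the first count is typically 1000 and the second is 2: one shared object per distinct value.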

Expected Output
'hello' interned: True
'hello world' auto-interned: False
'hello world' after intern(): True
Hints

Hint 1: CPython automatically interns string literals that look like identifiers (only letters, digits, underscores). Strings with spaces are not auto-interned.

Hint 2: Use `sys.intern()` to force interning of any string. After interning, two variables holding the same string value will point to the same object.

#3 Inspect PyObject with sys.getsizeof · Easy
PyObject · memory-layout · sys.getsizeof

Use sys.getsizeof() to measure the memory footprint of common Python objects. What is the base overhead of each type?

Python
import sys

objects = [
    ("int", 42),
    ("float", 3.14),
    ("bool", True),
    ("str ('')", ""),
    ("str ('hello')", "hello"),
    ("list ([])", []),
    ("list ([1,2,3])", [1, 2, 3]),
]

for label, obj in objects:
    print(f"{label} size: {sys.getsizeof(obj)} bytes")
Solution
import sys

objects = [
    ("int", 42),
    ("float", 3.14),
    ("bool", True),
    ("str ('')", ""),
    ("str ('hello')", "hello"),
    ("list ([])", []),
    ("list ([1,2,3])", [1, 2, 3]),
]

for label, obj in objects:
    print(f"{label} size: {sys.getsizeof(obj)} bytes")

What these numbers reveal about PyObject layout:

Every CPython object starts with the PyObject header (16 bytes on 64-bit systems):

  • ob_refcnt — 8-byte reference count
  • ob_type — 8-byte pointer to the type object (&PyLong_Type, &PyFloat_Type, etc.)

Beyond that each type adds its own fields. The figures below are what a 64-bit CPython 3.9–3.11 build typically reports; other versions and platforms differ (the empty str, for example, is smaller on 3.12 and later):

  • int (PyLongObject): 28 bytes (the 16-byte header, an 8-byte size field, and a digit array of at least one 4-byte digit)
  • float (PyFloatObject): 24 bytes (header plus a C double of 8 bytes)
  • bool: same as int (True and False are PyLongObject instances; bool is a subclass of int)
  • str: 49 bytes empty, +1 byte per ASCII character (compact ASCII representation)
  • list: 56 bytes empty, +8 bytes per allocated element slot (a pointer array); CPython may allocate more slots than the list currently uses

sys.getsizeof() is shallow: the list [1, 2, 3] reports 88 bytes for the list object itself (56 bytes plus four 8-byte slots, one of them spare) but does not count the memory of the three int objects stored in it.
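
Because sys.getsizeof() is shallow, measuring a whole structure means walking it yourself. The following is a minimal sketch; `deep_getsizeof` is a hypothetical helper that handles only the built-in containers, not arbitrary objects with attributes.

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Approximate size of obj plus everything its built-in containers hold."""
    if seen is None:
        seen = set()
    if id(obj) in seen:           # count shared objects once and survive cycles
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size

data = [1000, 2000, 3000]
print(f"shallow: {sys.getsizeof(data)} bytes")
print(f"deep:    {deep_getsizeof(data)} bytes")
```

The `seen` set keeps an object that is referenced twice from being counted twice, which matters for structures that share sub-objects.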

Expected Output
int size: 28 bytes
float size: 24 bytes
bool size: 28 bytes
str ('') size: 49 bytes
str ('hello') size: 54 bytes
list ([]) size: 56 bytes
list ([1,2,3]) size: 88 bytes
Hints

Hint 1: `sys.getsizeof()` returns the size of the Python object itself in bytes — it does NOT include the memory of referenced objects (shallow size only).

Hint 2: Every Python object has a fixed overhead: ob_refcnt (8 bytes) + ob_type pointer (8 bytes) = 16 bytes minimum. Each type then adds its own fields.

#4 CPython vs Alternative Implementations · Easy
cpython · pypy · jython · implementations

Write a function describe_implementation(name) that returns the key characteristics of each Python implementation.

Solution
def describe_implementation(name):
    implementations = {
        "CPython": {
            "language": "C",
            "execution": "bytecode-interpreted",
            "jit": False,
            "gil": True,
            "key_advantage": "reference implementation, widest compatibility",
        },
        "PyPy": {
            "language": "RPython",
            "execution": "JIT-compiled",
            "jit": True,
            "gil": True,
            "key_advantage": "5-10x faster for CPU-bound loops",
        },
        "Jython": {
            "language": "Java",
            "execution": "JVM bytecode",
            "jit": True,
            "gil": False,
            "key_advantage": "true Java threading, Java library access",
        },
        "GraalPy": {
            "language": "Java/Truffle",
            "execution": "GraalVM JIT",
            "jit": True,
            "gil": False,
            "key_advantage": "polyglot, no GIL",
        },
    }
    return implementations.get(name, {})

for impl in ["CPython", "PyPy", "Jython", "GraalPy"]:
    info = describe_implementation(impl)
    print(f"{impl}: execution={info['execution']}, jit={info['jit']}, gil={info['gil']}")

When to choose an alternative implementation:

  • PyPy: CPU-bound numeric code (simulations, parsers, text processing) where the JIT can warm up and optimize hot loops. Not suitable when you need C extensions that are CPython-specific.
  • Jython: Applications that need deep integration with a Java codebase or JVM ecosystem. Note: released versions of Jython implement Python 2.7; Python 3 support is still in development.
  • GraalPy: Polyglot applications mixing Python with JavaScript, R, or Java in a single process. Also useful if you need true thread-level parallelism without a GIL.
  • CPython: Everything else. It has the broadest library support, the largest community, and is what most production systems run.
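
Code that must behave differently per implementation can ask the interpreter which one it is running on. A minimal sketch using the standard `platform` and `sys` modules follows; the helper name `current_implementation` is an illustrative assumption.

```python
import platform
import sys

def current_implementation():
    """Return (implementation name, version string) for the running interpreter."""
    name = platform.python_implementation()   # e.g. "CPython" or "PyPy"
    version = ".".join(str(part) for part in sys.implementation.version[:3])
    return name, version

name, version = current_implementation()
print(f"Running on {name} {version} (sys.implementation.name={sys.implementation.name!r})")
```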
Expected Output
See solution for classification and explanations
Hints

Hint 1: CPython is the reference implementation written in C. PyPy uses a JIT compiler. Jython compiles to JVM bytecode. GraalPy runs on GraalVM.

Hint 2: Consider: does it compile to bytecode? Does it have a JIT? What is its primary advantage over CPython?


Medium

#5 Observe the Eval Loop with sys.settrace · Medium
eval-loop · sys.settrace · tracing

Use sys.settrace to observe every step of the eval loop as it executes a simple function. Record each event and the corresponding line number.

Python
import sys

def build_tracer(events):
    def tracer(frame, event, arg):
        # Report line numbers relative to the first line of the traced function.
        events.append((event, f"line {frame.f_lineno - frame.f_code.co_firstlineno + 1}"))
        return tracer
    return tracer

def target_function(x):
    y = x * 2
    z = y + 1
    return z

events = []
sys.settrace(build_tracer(events))
result = target_function(5)
sys.settrace(None)

for event, location in events:
    print(f"{event:<10} {location}")

print(f"\nResult: {result}")
Solution
import sys

def build_tracer(events):
    def tracer(frame, event, arg):
        events.append((event, f"line {frame.f_lineno - frame.f_code.co_firstlineno + 1}"))
        return tracer
    return tracer

def target_function(x):
    y = x * 2
    z = y + 1
    return z

events = []
sys.settrace(build_tracer(events))
result = target_function(5)
sys.settrace(None)

for event, location in events:
    print(f"{event:<10} {location}")

print(f"\nResult: {result}")

How the eval loop uses trace hooks:

The CPython eval loop (Python/ceval.c) checks for trace hooks at three points:

  1. call — fired when a new frame is pushed (function call begins). The frame object contains f_locals, f_globals, f_code, etc.
  2. line — fired before executing the first bytecode instruction of each new source line. This is what Python debuggers use to implement line-by-line stepping.
  3. return — fired just before the frame is popped (function returns). arg is the return value.

sys.settrace is the foundation of pdb, coverage.py, and most Python debuggers. The performance cost is significant: enabling a trace function makes CPython check the hook on every instruction, roughly 10-20x overhead. Production profilers use sys.setprofile instead (fires only on call/return, not every line).
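
For comparison, the same kind of function can be observed through sys.setprofile, which reports calls and returns but no line events. This is a minimal sketch; `collect_profile_events` and `target` are illustrative names, and the filter on the code object is an assumption made to keep unrelated events out of the result.

```python
import sys

def target(x):
    y = x * 2
    return y + 1

def collect_profile_events(func, *args):
    """Run func under sys.setprofile and return the events seen for its frame."""
    events = []

    def profiler(frame, event, arg):
        # Keep only events that belong to func's own code object; this also
        # drops the C-call events produced by sys.setprofile itself.
        if frame.f_code is func.__code__:
            events.append(event)

    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(None)
    return events

print(collect_profile_events(target, 5))   # ['call', 'return']
```

Two events are recorded for the whole function here, where a trace function would also be called once for every line executed.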

import sys

def build_tracer(events):
    """Return a trace function that records (event, lineno) pairs.
    Capture 'call', 'line', and 'return' events.
    """
    pass

def target_function(x):
    y = x * 2
    z = y + 1
    return z
Expected Output
call       line 1
line       line 2
line       line 3
line       line 4
return     line 4

Result: 11
Hints

Hint 1: `sys.settrace(fn)` installs a global trace function. It is called with (frame, event, arg). Valid events are "call", "line", "return", "exception".

Hint 2: The global trace function is called for each "call" event and must return a local trace function (usually itself) to receive "line" and "return" events for that frame. Returning None skips tracing for that frame.

#6 pymalloc: Small vs Large Allocations · Medium
pymalloc · memory-allocator · tracemalloc

Use tracemalloc to observe CPython's memory allocation patterns. Show the difference between small-object allocations (handled by pymalloc) and large allocations (handled by system malloc).

Python
import tracemalloc
import sys

tracemalloc.start()

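# Allocate objects of various sizes. A bytes object carries a 33-byte header
# on 64-bit CPython, so sys.getsizeof(bytes(n)) is n + 33.
small_objects = [bytes(size) for size in range(1, 480)]    # 34–512 bytes in total: pymalloc range
large_objects = [bytes(size) for size in range(480, 600)]  # 513+ bytes in total: system malloc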

snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics("lineno")

# Show total current memory
current, peak = tracemalloc.get_traced_memory()
print(f"Current traced memory: {current:,} bytes")
print(f"Peak traced memory:    {peak:,} bytes")
print(f"\nsmall_objects list size (sys.getsizeof): {sys.getsizeof(small_objects)} bytes")
print(f"large_objects list size (sys.getsizeof): {sys.getsizeof(large_objects)} bytes")

# Single-object size comparison
tiny = bytes(10)
big = bytes(1000)
print(f"\nbytes(10) size:   {sys.getsizeof(tiny)} bytes")
print(f"bytes(1000) size: {sys.getsizeof(big)} bytes")

tracemalloc.stop()
Solution
import tracemalloc
import sys

tracemalloc.start()

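# A bytes object has a 33-byte header on 64-bit CPython, so payloads of
# 1–479 bytes stay within pymalloc's 512-byte limit.
small_objects = [bytes(size) for size in range(1, 480)]
large_objects = [bytes(size) for size in range(480, 600)]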

snapshot = tracemalloc.take_snapshot()

current, peak = tracemalloc.get_traced_memory()
print(f"Current traced memory: {current:,} bytes")
print(f"Peak traced memory: {peak:,} bytes")
print(f"\nsmall_objects list size (sys.getsizeof): {sys.getsizeof(small_objects)} bytes")
print(f"large_objects list size (sys.getsizeof): {sys.getsizeof(large_objects)} bytes")

tiny = bytes(10)
big = bytes(1000)
print(f"\nbytes(10) size: {sys.getsizeof(tiny)} bytes")
print(f"bytes(1000) size: {sys.getsizeof(big)} bytes")

tracemalloc.stop()

CPython's three-tier memory allocator:

Python objects
|
pymalloc (Python's custom allocator)
| handles allocations <= 512 bytes
| uses arenas (256 KB; larger in newer CPython versions) -> pools (4 KB; larger in newer versions) -> blocks
|
glibc malloc / jemalloc (system allocator)
| handles allocations > 512 bytes
| also used for pymalloc's arena allocation itself
|
OS mmap / brk (virtual memory)

Why pymalloc exists:

  • Python creates and destroys millions of small objects (function frames, tuples, dicts) per second
  • malloc/free for tiny objects has high overhead (alignment overhead, per-allocation bookkeeping)
  • pymalloc manages fixed-size blocks within pre-allocated pools, making allocation and deallocation O(1) with minimal fragmentation
  • Objects larger than 512 bytes bypass pymalloc entirely and go to the system allocator
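
The 512-byte rule can be turned into a rough classifier. This is only a sketch: `likely_allocator` is a hypothetical helper that guesses from the shallow size, and it ignores details such as GC headers and separately allocated item arrays.

```python
import sys

SMALL_REQUEST_THRESHOLD = 512  # bytes; the pymalloc limit described above

def likely_allocator(obj):
    """Guess which allocator served obj, based on its shallow size alone.

    Approximation: container types carry extra headers and allocate their
    item arrays separately, so this is most meaningful for flat objects
    such as bytes.
    """
    size = sys.getsizeof(obj)
    return "pymalloc" if size <= SMALL_REQUEST_THRESHOLD else "system malloc"

for payload in (10, 479, 480, 1000):
    obj = bytes(payload)
    print(f"bytes({payload}): {sys.getsizeof(obj)} bytes -> {likely_allocator(obj)}")
```

For a view of the real arenas and pools, CPython also provides sys._debugmallocstats(), which prints allocator statistics to stderr.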
Expected Output
See solution for allocation size analysis
Hints

Hint 1: CPython uses pymalloc for objects 512 bytes or smaller. Larger objects go directly to the system malloc. Use `tracemalloc` to observe allocation sizes.

Hint 2: tracemalloc.take_snapshot() captures all current allocations. Filter by filename to isolate your code.

#7 Type Object Inspection · Medium
type-object · PyTypeObject · MRO · type-system

Build a function that inspects a Python type object and reports its key properties: name, MRO, whether it is a C builtin or Python-defined type, and which dunder methods it defines.

Solution
def inspect_type(obj):
    t = type(obj)
    mro = [c.__name__ for c in t.__mro__]
    is_builtin = t.__module__ == "builtins"
    dunders = [attr for attr in dir(t) if attr.startswith("__") and attr.endswith("__")]

    print(f"Object: {repr(obj)[:40]}")
    print(f"Type: {t.__name__}")
    print(f"Module: {t.__module__}")
    print(f"Is C type: {is_builtin}")
    print(f"MRO: {' -> '.join(mro)}")
    print(f"Dunders: {len(dunders)} ({', '.join(dunders[:5])}...)")
    print()

for obj in [42, "hello", [], {}, lambda: None]:
    inspect_type(obj)

What a PyTypeObject contains:

In CPython's C source, every Python type is a PyTypeObject struct with over 40 fields, including:

  • tp_name — the type's qualified name
  • tp_basicsize — size of an instance in bytes
  • tp_alloc / tp_dealloc — memory allocation/deallocation hooks
  • tp_repr, tp_str — implement repr() and str()
  • tp_as_number, tp_as_sequence, tp_as_mapping — protocol suites
  • tp_richcompare — implements ==, !=, <, etc.
  • tp_methods — table of methods exposed to Python

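The + operator and an explicit obj.__add__ lookup take different routes to the same code. For a + b, CPython calls the nb_add slot in the type's tp_as_number table. For obj.__add__, it performs a normal attribute lookup, which walks the __dict__ of each class in the MRO, and the first class that defines the name wins. The two stay in sync because defining __add__ in a Python class fills the slot, and C types expose their slots as wrapper methods.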
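
A small experiment makes the role of the type object visible: operators consult the type (through its MRO), not the instance. The classes `Base` and `Child` below are illustrative only.

```python
class Base:
    def __add__(self, other):
        return "Base.__add__"

class Child(Base):
    pass

c = Child()
# Shadow __add__ on the instance only; the type's slot is untouched.
c.__add__ = lambda other: "instance attribute"

print([k.__name__ for k in Child.__mro__])  # ['Child', 'Base', 'object']
print(c + 1)          # Base.__add__  (the operator uses the type's slot, found via the MRO)
print(c.__add__(1))   # instance attribute  (explicit lookup finds the instance attribute)
```

The operator ignores the attribute stored on the instance because it goes straight to the slot on type(c), which Child inherits from Base.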

def inspect_type(obj):
    """Print key information about the type of an object.
    Show: type name, MRO, slots (selected __dunder__ methods),
    and whether it is a built-in or user-defined type.
    """
    pass
Expected Output
See solution for type inspection output
Hints

Hint 1: Every Python type is itself a `type` object (a PyTypeObject in C). Access it via `type(obj)` or `obj.__class__`.

Hint 2: Use `type(obj).__mro__` for the method resolution order, `dir(type(obj))` for attributes, and `type(obj).__module__` to distinguish builtins from user types.

#8 Predict the Output: Integer Identity Edge Cases · Medium
integer-cache · object-identity · predict-output

Predict the output before running. Explain every True or False result.

Python
# Case 1: Cached range
a = 100
b = 100
print(f"Case 1: {a is b}")   # ?

# Case 2: Outside cache, separate statements
x = 300
y = 300
print(f"Case 2: {x is y}")   # ?

# Case 3: Outside cache, same expression
p, q = 300, 300
print(f"Case 3: {p is q}")   # ?

# Case 4: Arithmetic result
m = 150 + 150
n = 300
print(f"Case 4: {m is n}")   # ?

# Case 5: Negative edge
neg_a = -5
neg_b = -5
print(f"Case 5: {neg_a is neg_b}")   # ?

neg_c = -6
neg_d = -6
print(f"Case 6: {neg_c is neg_d}")   # ?
Solution
The answers depend on how the code is compiled.

Run as a script (the whole file is one code object), recent CPython versions typically print True for all six cases.

Typed line by line in the interactive interpreter (each statement is compiled separately):

Case 1: True
Case 2: False
Case 3: True
Case 4: False
Case 5: True
Case 6: False

Explanation of each case:

Case 1 (True): 100 is in the cache range (-5..256). Both a and b point to the pre-allocated integer object for 100.

Case 2 (False interactively, True in a script): 300 is outside the cache. When the two assignments are compiled separately, each compilation creates its own PyLongObject for 300, so the identities differ. When they are compiled together, the compiler stores equal constants once per code object, and x and y both refer to that single constant.

Case 3 (True): p, q = 300, 300 is a single statement, so both values come from the same code object's constants and refer to the same object in either context.

Case 4 (False interactively, usually True in a script): CPython folds 150 + 150 to the constant 300 at compile time. In a script that folded constant is normally merged with the literal 300 on the next line; interactively, m and n come from different compilations. The result can vary between CPython versions.

Case 5 (True): -5 is the lower boundary of the cache. Cached.

Case 6 (False interactively, usually True in a script): -6 is below the cache boundary, so separately compiled statements produce two heap allocations. In a script the two -6 constants are normally merged like any other equal constants.

Key takeaway: Integer caching is a CPython implementation detail, and the compiler's constant folding and constant merging make the identity of larger integers depend on how the code was compiled. Never write code that relies on is for numeric comparison.
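
The difference between the two contexts can be reproduced from a single file by compiling the statements together or separately. This is a minimal sketch with illustrative helper names; the identity results are implementation details and may differ on other versions or implementations.

```python
def same_code_object():
    # Both assignments are compiled together, so they can share one constant.
    ns = {}
    exec(compile("x = 300\ny = 300\n", "<demo>", "exec"), ns)
    return ns["x"] is ns["y"]

def separate_code_objects():
    # Each statement is compiled on its own, as in an interactive session.
    ns = {}
    exec(compile("x = 300\n", "<demo-1>", "exec"), ns)
    exec(compile("y = 300\n", "<demo-2>", "exec"), ns)
    return ns["x"] is ns["y"]

print(f"compiled together:   {same_code_object()}")
print(f"compiled separately: {separate_code_objects()}")
```

On CPython the first line typically reports True and the second False, which mirrors the script and interactive answers for Case 2.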

Expected Output
See solution — output depends on context of assignment
Hints

Hint 1: The integer cache applies to the range -5..256. But be careful: the cache is per-interpreter-startup and assignments from the SAME expression can behave differently than separate assignments.

Hint 2: Consider: does `a, b = 300, 300` in a single tuple literal allow CPython to reuse the same object for both? What about assigning in separate statements?


Hard

#9 Measure eval Loop Overhead · Hard
eval-loop · performance · timeit · overhead

Measure and quantify the overhead of the CPython eval loop. Isolate the cost of: function call overhead, a single bytecode instruction, and a C-level builtin call.

Solution
import timeit

N = 5_000_000

def empty_func():
    pass

def single_add(x):
    return x + 1

def use_builtin():
    return len([])

results = {}

# Function call + return overhead
results["empty_call"] = timeit.timeit(empty_func, number=N) / N * 1e9

# Single add (the timing also includes the call through the lambda wrapper)
results["single_add"] = timeit.timeit(lambda: single_add(5), number=N) / N * 1e9

# Builtin call (C-level)
results["len_builtin"] = timeit.timeit(use_builtin, number=N) / N * 1e9

# Pure Python loop overhead (no useful work)
loop_code = """
total = 0
for i in range(100):
    total += i
"""
results["loop_100"] = timeit.timeit(loop_code, number=N // 100) / (N // 100) * 1e9

print("Overhead measurements (nanoseconds per call/operation):")
print("-" * 50)
for name, ns in results.items():
    print(f"  {name:<20} {ns:>10.1f} ns")

Typical results on modern hardware:

empty_call ~ 50–80 ns (frame creation + return)
single_add ~ 100–150 ns (lambda call + call + LOAD_FAST + LOAD_CONST + BINARY_OP + RETURN)
len_builtin ~ 80–120 ns (call + C dispatch + RETURN)
loop_100 ~ 3000–6000 ns total for 100 iterations

What each number reveals:

  • Empty call (~60 ns): CPython must create a new frame object, push it onto the call stack, execute RESUME + RETURN_VALUE, and destroy the frame. This is purely interpreter overhead with zero useful work.
  • Single add: Adds ~50-80 ns on top of the call overhead. Part of that is the extra call through the lambda wrapper used for timing; the rest is LOAD_FAST + LOAD_CONST + BINARY_OP (which includes type checking and dispatch to integer addition) + RETURN_VALUE.
  • len_builtin: The len() call drops into C code immediately — no Python-level type dispatch — so it is often comparable to or faster than a Python + operation.
  • Loop overhead: The per-iteration cost of FOR_ITER + STORE_FAST + BINARY_OP + JUMP_BACKWARD accumulates fast. 100 iterations of a trivial loop takes 3–6 microseconds — for a million-element dataset that is 30–60 ms of pure interpreter overhead.

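The core lesson: Every bytecode instruction pays eval-loop overhead. By the figures above, a trivial loop iteration of four or so instructions costs 30–60 ns, which is roughly 10 ns per instruction, and a Python-level function call costs 50 ns or more. NumPy and similar libraries are fast because they move the work into C, executing millions of operations in a single Python instruction (CALL).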
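
The same effect can be seen without NumPy by comparing a Python-level loop with the built-in sum(), which runs its loop in C. This is a minimal sketch; the function names and the choice of N are illustrative, and absolute timings depend on the machine and Python version.

```python
import timeit

N = 100_000

def python_sum(n):
    total = 0
    for i in range(n):    # one trip through the eval loop per element
        total += i
    return total

def builtin_sum(n):
    return sum(range(n))  # the loop runs inside C

assert python_sum(N) == builtin_sum(N)

for func in (python_sum, builtin_sum):
    seconds = timeit.timeit(lambda: func(N), number=20) / 20
    print(f"{func.__name__:<12} {seconds * 1e9 / N:6.1f} ns per element")
```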

import timeit
import ctypes

def measure_overhead():
    """Measure the per-bytecode-instruction overhead of the CPython eval loop.
    Compare: doing nothing (pass), a single addition, and a C-level operation.
    Express overhead in nanoseconds per operation.
    """
    pass
Expected Output
See solution for timing measurements and analysis
Hints

Hint 1: Use `timeit.timeit()` with a large number of iterations (1_000_000+) to get stable measurements. Divide the total time by the iteration count to get per-call cost.

Hint 2: Compare an empty function (a body of just `pass`), a single `x + 1`, and `len([])`. The differences between them isolate different components of overhead.

#10 Object Identity Through the Object Model · Hard
id · object-identity · memory-reuse · ctypes

Investigate CPython's object identity model using id() and ctypes. Show that id() is a memory address, demonstrate address reuse after deallocation, and use ctypes to look up a live object by address.

Solution
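import ctypes
import sys

# Part 1: id() is a memory address
x = object()
addr = id(x)
print(f"id(x) = {addr}")
print(f"hex address: {hex(addr)}")

# Part 2: Look up a live object by address (only safe on live references!)
y = [1, 2, 3]
y_id = id(y)
recovered = ctypes.cast(y_id, ctypes.py_object).value
print(f"\nOriginal: {y}")
print(f"Recovered by address: {recovered}")
print(f"Same object: {y is recovered}")

# Part 3: Address reuse after deallocation
ids_seen = []
for _ in range(5):
    temp = object()
    ids_seen.append(id(temp))
    # Rebinding temp on the next iteration drops the previous object's
    # refcount to 0, so it is deallocated immediately and its address
    # becomes available to a later object().

print(f"\nFirst 5 object() ids:")
for i in ids_seen:
    print(f"  {hex(i)}")

# Are any addresses reused?
unique_ids = len(set(ids_seen))
print(f"Unique addresses: {unique_ids} out of 5")
print(f"Address reuse observed: {unique_ids < 5}")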

Key concepts this demonstrates:

1. id() is a memory address in CPython. The Python language spec only guarantees that id() returns a unique integer for the lifetime of the object. CPython's specific guarantee is stronger: it returns the object's actual memory address. Other implementations (PyPy, Jython) may return different values.

2. Address reuse is real and surprising. Because CPython reference-counts objects, a temporary object inside a loop is immediately deallocated when the loop variable is reassigned. The freed memory can be claimed by the very next allocation — producing the same id(). This is why id(a) == id(b) does NOT mean a and b are the same object: one may have been created after the other was destroyed.

3. ctypes.cast(id(x), ctypes.py_object).value is dangerous. If the object has been deallocated, this dereferences a dangling pointer and will crash Python or return garbage. Only use this on objects you know are alive (held by a live reference).
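
When code genuinely needs to get from an id back to an object, a safer pattern than casting raw addresses is to keep a registry of weak references. This is a minimal sketch; `Node`, `register`, and `lookup` are hypothetical names, and the approach only works for types that support weak references (instances of user-defined classes do, plain object() instances do not).

```python
import gc
import weakref

class Node:
    """Plain user-defined class; its instances support weak references."""

_registry = weakref.WeakValueDictionary()

def register(obj):
    """Remember obj by id without keeping it alive."""
    _registry[id(obj)] = obj
    return id(obj)

def lookup(obj_id):
    """Return the live object registered under obj_id, or None if it is gone."""
    return _registry.get(obj_id)

node = Node()
node_id = register(node)
print(lookup(node_id) is node)   # True while a strong reference exists

del node
gc.collect()                     # not needed on CPython, harmless elsewhere
print(lookup(node_id))           # None: the entry vanished with the object
```

A lookup for a dead object returns None instead of dereferencing a dangling pointer. A stale id can still match a newer registered object that reuses the address, so stored ids remain unsuitable as long-term keys.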

The practical danger:

a = object()
old_id = id(a)
del a                      # refcount hits 0; the object is deallocated immediately
b = object()               # a new object, possibly at the same address
print(old_id == id(b))     # often True in CPython, although a and b never coexisted
# Identity checks across non-overlapping lifetimes are meaningless:
# compare live objects with `is`, not stored id() values.
import ctypes

def investigate_identity():
    """Demonstrate:
    1. id() returns the memory address of a CPython object
    2. id() of a dead object can equal id() of a new object (address reuse)
    3. Use ctypes to look up a live object by its id (address)
    """
    pass
Expected Output
See solution for memory address investigation
Hints

Hint 1: In CPython, `id(obj)` returns the memory address of the object as an integer. This is documented in the reference for the built-in id() function: "CPython implementation detail: This is the address of the object in memory."

Hint 2: After an object is garbage collected, its memory address can be reused by a new object. This means two objects that existed at different times can have the same id. Use `ctypes.cast(id(x), ctypes.py_object).value` to dereference an id back to an object — only safe on live objects!

#11 Build a CPython Architecture Quiz Engine · Hard
cpython · architecture · quiz · internals

Build a self-grading CPython architecture quiz engine. The quiz should cover: integer caching, string interning, the eval loop, PyObject layout, and pymalloc. Include at least 5 questions with multiple-choice options and detailed explanations.

Solution
def make_quiz():
    return [
        {
            "q": "What is the CPython integer cache range?",
            "choices": ["A: 0 to 100", "B: -5 to 256", "C: -128 to 127", "D: 0 to 255"],
            "answer": "B",
            "explanation": "CPython pre-allocates integers from -5 to 256 inclusive. These 262 objects are created at interpreter startup and reused for all references to values in this range.",
        },
        {
            "q": "What does sys.getsizeof() measure?",
            "choices": [
                "A: The total memory used by an object including referenced objects",
                "B: The shallow size of the object itself (not referenced objects)",
                "C: The size of the object on disk",
                "D: The size of the object's type",
            ],
            "answer": "B",
            "explanation": "sys.getsizeof() returns the shallow size: just the object header and its direct data. A list of 1000 large strings reports only the list's pointer array size, not the strings themselves.",
        },
        {
            "q": "Which strings does CPython automatically intern?",
            "choices": [
                "A: All string literals",
                "B: Strings shorter than 20 characters",
                "C: Strings that look like identifiers (letters, digits, underscores only)",
                "D: Strings that appear more than once in a module",
            ],
            "answer": "C",
            "explanation": "CPython interns strings that match the identifier pattern. These are candidates for dictionary keys and attribute lookups. Strings with spaces or special characters are not auto-interned.",
        },
        {
            "q": "What happens when an object's reference count reaches zero in CPython?",
            "choices": [
                "A: It is added to a garbage collection queue",
                "B: It is immediately deallocated via tp_dealloc",
                "C: It is moved to a dead-objects list for batch cleanup",
                "D: It persists until the next GC cycle",
            ],
            "answer": "B",
            "explanation": "CPython's reference counting is synchronous. When ob_refcnt drops to zero, tp_dealloc is called immediately. This is why del on a non-cyclic object frees memory right away, unlike Java's GC.",
        },
        {
            "q": "What is the threshold for CPython's pymalloc allocator?",
            "choices": ["A: 256 bytes", "B: 512 bytes", "C: 1024 bytes", "D: 4096 bytes"],
            "answer": "B",
            "explanation": "pymalloc handles allocations of 512 bytes or less. Larger allocations bypass pymalloc and go directly to the system allocator (malloc). This threshold was increased from 256 in Python 3.3.",
        },
    ]

def run_quiz(questions):
    score = 0
    for i, q in enumerate(questions, 1):
        print(f"\nQ{i}: {q['q']}")
        for choice in q["choices"]:
            print(f"  {choice}")
        ans = input("Your answer (A/B/C/D): ").strip().upper()
        if ans == q["answer"]:
            print("Correct!")
            score += 1
        else:
            print(f"Wrong. Correct answer: {q['answer']}")
            print(f"Explanation: {q['explanation']}")
    print(f"\nFinal score: {score}/{len(questions)}")

# To run interactively:
# run_quiz(make_quiz())

# Automated verification (for non-interactive use):
quiz = make_quiz()
print(f"Quiz loaded: {len(quiz)} questions")
for i, q in enumerate(quiz, 1):
    print(f"Q{i}: {q['q'][:60]}... [Answer: {q['answer']}]")

Extension challenges:

  • Add a "hint" field and implement a hint system that deducts a point if used
  • Add a "difficulty" field and sort questions from easy to hard
  • Track time per question with time.perf_counter()
  • Randomise question and answer order to prevent memorisation
  • Export incorrect questions to a review file for spaced repetition
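
As a starting point for the randomisation challenge, the answer order can be shuffled while keeping the answer key correct. This is one possible sketch, not the reference solution: `shuffle_choices` is a hypothetical helper, and it assumes four choices with unique texts in the "A: text" format used by the quiz.

```python
import random

def shuffle_choices(question, rng=None):
    """Return a copy of question with its choices shuffled and relabelled A-D."""
    rng = rng or random.Random()
    letters = "ABCD"
    # Strip the "A: " style prefixes, remembering which text is correct.
    texts = [choice.split(": ", 1)[1] for choice in question["choices"]]
    correct_text = texts[letters.index(question["answer"])]
    rng.shuffle(texts)
    return {
        **question,
        "choices": [f"{letter}: {text}" for letter, text in zip(letters, texts)],
        "answer": letters[texts.index(correct_text)],
    }

sample = {
    "q": "What is the threshold for CPython's pymalloc allocator?",
    "choices": ["A: 256 bytes", "B: 512 bytes", "C: 1024 bytes", "D: 4096 bytes"],
    "answer": "B",
    "explanation": "pymalloc handles allocations of 512 bytes or less.",
}
shuffled = shuffle_choices(sample, random.Random(0))
print(shuffled["choices"], shuffled["answer"])
```

Passing a seeded random.Random makes the shuffle reproducible in tests, while the default gives a different order on every run.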
def make_quiz():
    """Return a list of question dicts with the keys "q", "choices",
    "answer", and "explanation", covering CPython architecture. Then
    write a run_quiz() function that presents each question, checks
    the answer, and shows the explanation if wrong.
    """
    pass

def run_quiz(questions):
    pass
Expected Output
Interactive quiz — see solution for full implementation
Hints

Hint 1: Structure each question as a dict with keys: "q" (question text), "choices" (list of options), "answer" (correct letter), "explanation" (detailed explanation).

Hint 2: For the runner, iterate questions, print choices, take input, compare to the answer key, and show explanations for wrong answers. Track score.

© 2026 EngineersOfAI. All rights reserved.