Python CPython Architecture Practice Problems & Exercises

Practice: CPython Architecture

11 problems · 4 Easy · 4 Medium · 3 Hard · ⏱ 50–70 min

Easy

#1 Integer Cache Boundaries (Easy)

integer-cache · object-identity · is-operator

Demonstrate CPython's integer caching behaviour. Show which integers are cached (returning the same object) and which are not.

Python

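def check_cache(value):
    # Build each int at runtime. Assigning the same argument to two names
    # would compare an object with itself and always return True.
    a = int(str(value))
    b = int(str(value))
    return a is b

for val in [256, 257, -5, -6]:
    print(f"a is b ({val}): {check_cache(val)}")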

Solution

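def check_cache(value):
    # Build each int at runtime. Assigning the same argument to two names
    # would compare an object with itself and always return True.
    a = int(str(value))
    b = int(str(value))
    return a is b

for val in [256, 257, -5, -6]:
    print(f"a is b ({val}): {check_cache(val)}")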

Why this happens:

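CPython pre-allocates integer objects for values in the range -5 to 256 at interpreter startup. Historically these lived in a static array in Objects/longobject.c; recent versions keep them in the interpreter's runtime state. Whenever an operation produces an integer in this range, CPython returns a pointer to the pre-existing object and no new allocation occurs.

For integers outside this range (like 257), CPython allocates a fresh PyLongObject on the heap each time the value is computed at runtime. Two separate computations create two separate objects, so is returns False even though the values are equal. Literals are a special case: equal constants compiled into the same code object are stored once, so two literal 257s in one function can still be the same object.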

Warning: Never use is to compare integers in production code. Rely on == for value equality. The caching boundary (256) is an implementation detail that could change between CPython versions.
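The boundary can also be probed empirically instead of being taken on faith. The sketch below is a minimal illustration that assumes CPython; is_cached and find_cache_bounds are hypothetical helper names, and the scan window of -20 to 300 is an arbitrary choice. Each integer is built twice at runtime so the compiler cannot merge the two values into one constant.

```python
def is_cached(n):
    """Return True if two runtime-built ints equal to n are the same object."""
    # int(str(n)) performs a fresh conversion each time, so identity here
    # reflects the small-integer cache, not compile-time constant sharing.
    return int(str(n)) is int(str(n))


def find_cache_bounds(low=-20, high=300):
    """Scan [low, high] and return (smallest, largest) cached value found."""
    cached = [n for n in range(low, high + 1) if is_cached(n)]
    return (cached[0], cached[-1]) if cached else None


if __name__ == "__main__":
    print(find_cache_bounds())
```

On a CPython build with the range described above, the reported bounds should match -5 and 256. Other implementations may report something else entirely, which is the point of the warning.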

Expected Output

a is b (256): True
a is b (257): False
a is b (-5): True
a is b (-6): False

Hints

Hint 1: CPython caches small integers in the range -5 to 256. Any integer in this range always returns the same object.

Hint 2: Use the `is` operator (identity, not equality) to check whether two names refer to the exact same object in memory.

#2 String Interning Check (Easy)

string-interning · object-identity · sys.intern

Explore CPython's string interning. Show which strings are automatically interned and how to force interning with sys.intern().

Python

import sys

s1 = "hello"
s2 = "hello"
print(f"'hello' interned: {s1 is s2}")

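# Build the strings at runtime: two equal literals in one module would be
# merged into a single constant by the compiler and compare identical.
s3 = " ".join(["hello", "world"])
s4 = " ".join(["hello", "world"])
print(f"'hello world' auto-interned: {s3 is s4}")

s5 = sys.intern(s3)
s6 = sys.intern(s4)
print(f"'hello world' after intern(): {s5 is s6}")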

Solution

import sys

s1 = "hello"
s2 = "hello"
print(f"'hello' interned: {s1 is s2}")

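# Build the strings at runtime: two equal literals in one module would be
# merged into a single constant by the compiler and compare identical.
s3 = " ".join(["hello", "world"])
s4 = " ".join(["hello", "world"])
print(f"'hello world' auto-interned: {s3 is s4}")

s5 = sys.intern(s3)
s6 = sys.intern(s4)
print(f"'hello world' after intern(): {s5 is s6}")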

How CPython decides to intern strings:

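CPython automatically interns string literals that satisfy the identifier rule: only ASCII letters, digits, and underscores. These strings are candidates for dictionary keys and attribute lookups, so deduplication speeds up these operations. The exact rules are an implementation detail and have shifted between versions.

Strings with spaces, punctuation, or non-ASCII characters are NOT auto-interned by default, and neither are strings built at runtime. Equal literals inside one code object can still end up as the same object, because the compiler stores duplicate constants once; that is constant merging, not interning. sys.intern() manually adds a string to the global intern table, and subsequent calls with the same value return the pre-existing object.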

Performance use case: When you have a large data pipeline with millions of repeated string values (e.g., category names or HTTP methods), interning those strings reduces memory and speeds up dictionary lookups because identity comparison (is, a single pointer comparison) replaces equality comparison (character-by-character scan).
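A small sketch of that pipeline use case follows. The log lines and the extract_methods helper are hypothetical, invented only for illustration; the one real API used is sys.intern().

```python
import sys


def extract_methods(log_lines):
    """Return the HTTP method of each line, interned so repeats share one object."""
    # split() creates a new str object per line; sys.intern() maps equal
    # strings onto a single canonical object.
    return [sys.intern(line.split(" ", 1)[0]) for line in log_lines]


# Hypothetical sample data for illustration.
log_lines = ["GET /index.html", "POST /login", "GET /about", "GET /index.html"]
methods = extract_methods(log_lines)
distinct_objects = len({id(m) for m in methods})
print(f"{len(methods)} values, {distinct_objects} distinct string objects")
```

Without the sys.intern() call, each "GET" would typically be its own object. Whether the memory saved is worth the extra call depends on how often values repeat, so measure before adopting it.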

Expected Output

'hello' interned: True
'hello world' auto-interned: False
'hello world' after intern(): True

Hints

Hint 1: CPython automatically interns string literals that look like identifiers (only letters, digits, underscores). Strings with spaces are not auto-interned.

Hint 2: Use `sys.intern()` to force interning of any string. After interning, two variables holding the same string value will point to the same object.

#3 Inspect PyObject with sys.getsizeof (Easy)

PyObject · memory-layout · sys.getsizeof

Use sys.getsizeof() to measure the memory footprint of common Python objects. What is the base overhead of each type?

Python

import sys

objects = [
    ("int", 42),
    ("float", 3.14),
    ("bool", True),
    ("str ('')", ""),
    ("str ('hello')", "hello"),
    ("list ([])", []),
    ("list ([1,2,3])", [1, 2, 3]),
]

for label, obj in objects:
    print(f"{label} size: {sys.getsizeof(obj)} bytes")

Solution

import sys

objects = [
    ("int", 42),
    ("float", 3.14),
    ("bool", True),
    ("str ('')", ""),
    ("str ('hello')", "hello"),
    ("list ([])", []),
    ("list ([1,2,3])", [1, 2, 3]),
]

for label, obj in objects:
    print(f"{label} size: {sys.getsizeof(obj)} bytes")

What these numbers reveal about PyObject layout:

Every CPython object starts with the PyObject header (16 bytes on 64-bit systems):

ob_refcnt — 8-byte reference count
ob_type — 8-byte pointer to the type object (&PyLong_Type, &PyFloat_Type, etc.)

Beyond that each type adds its own fields:

int (PyLongObject): 28 bytes, made up of the 16-byte header, an 8-byte size field, and the digit array (at least one 4-byte digit)
float (PyFloatObject): 24 bytes, the header plus a C double (8 bytes)
bool: same as int (True and False are PyLongObject instances; bool is a subclass of int)
str: 49 bytes empty on CPython 3.11 and earlier (41 bytes on 3.12 and later), +1 byte per ASCII character (compact ASCII representation)
list: 56 bytes empty, +8 bytes per allocated element slot (a pointer array)

sys.getsizeof() is shallow: the list [1, 2, 3] reports 88 bytes here (the 56-byte list object plus four 8-byte slots, because CPython may allocate more slots than the list currently uses) but does not count the memory of the three int objects stored in it.
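To see how far the shallow figure is from the real footprint, one option is to walk the container and add up the sizes of everything it references. The deep_size helper below is a hypothetical sketch, not a standard-library function; it only descends into the built-in containers listed in the code and ignores instance attributes, so treat its result as an estimate.

```python
import sys


def deep_size(obj, seen=None):
    """Estimate the total size of obj plus the objects it contains."""
    seen = set() if seen is None else seen
    if id(obj) in seen:
        return 0  # already counted: shared or cyclic reference
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_size(item, seen) for item in obj)
    return size


numbers = [1000, 2000, 3000]
print(f"shallow: {sys.getsizeof(numbers)} bytes, deep: {deep_size(numbers)} bytes")
```

The seen set keeps an object that appears several times from being counted more than once, which also prevents infinite recursion on self-referencing containers.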

Expected Output

int size: 28 bytes
float size: 24 bytes
bool size: 28 bytes
str ('') size: 49 bytes
str ('hello') size: 54 bytes
list ([]) size: 56 bytes
list ([1,2,3]) size: 88 bytes

Hints

Hint 1: `sys.getsizeof()` returns the size of the Python object itself in bytes — it does NOT include the memory of referenced objects (shallow size only).

Hint 2: Every Python object has a fixed overhead: ob_refcnt (8 bytes) + ob_type pointer (8 bytes) = 16 bytes minimum. Each type then adds its own fields.

#4 CPython vs Alternative Implementations (Easy)

cpython · pypy · jython · implementations

Write a function describe_implementation(name) that returns the key characteristics of each Python implementation.

def describe_implementation(name):
    implementations = {
        "CPython": {
            "language": "C",
            "execution": "bytecode-interpreted",
            "jit": False,
            "gil": True,
            "key_advantage": "reference implementation, widest compatibility",
        },
        "PyPy": {
            "language": "RPython",
            "execution": "JIT-compiled",
            "jit": True,
            "gil": True,
            "key_advantage": "5-10x faster for CPU-bound loops",
        },
        "Jython": {
            "language": "Java",
            "execution": "JVM bytecode",
            "jit": True,
            "gil": False,
            "key_advantage": "true Java threading, Java library access",
        },
        "GraalPy": {
            "language": "Java/Truffle",
            "execution": "GraalVM JIT",
            "jit": True,
            "gil": True,
            "key_advantage": "polyglot interop on GraalVM",
        },
    }
    return implementations.get(name, {})

for impl in ["CPython", "PyPy", "Jython", "GraalPy"]:
    info = describe_implementation(impl)
    print(f"{impl}: execution={info['execution']}, jit={info['jit']}, gil={info['gil']}")

Solution

def describe_implementation(name):
    implementations = {
        "CPython": {
            "language": "C",
            "execution": "bytecode-interpreted",
            "jit": False,
            "gil": True,
            "key_advantage": "reference implementation, widest compatibility",
        },
        "PyPy": {
            "language": "RPython",
            "execution": "JIT-compiled",
            "jit": True,
            "gil": True,
            "key_advantage": "5-10x faster for CPU-bound loops",
        },
        "Jython": {
            "language": "Java",
            "execution": "JVM bytecode",
            "jit": True,
            "gil": False,
            "key_advantage": "true Java threading, Java library access",
        },
        "GraalPy": {
            "language": "Java/Truffle",
            "execution": "GraalVM JIT",
            "jit": True,
            "gil": True,
            "key_advantage": "polyglot interop on GraalVM",
        },
    }
    return implementations.get(name, {})

for impl in ["CPython", "PyPy", "Jython", "GraalPy"]:
    info = describe_implementation(impl)
    print(f"{impl}: execution={info['execution']}, jit={info['jit']}, gil={info['gil']}")

When to choose an alternative implementation:

PyPy: CPU-bound pure-Python code (simulations, parsers, text processing) where the JIT can warm up and optimize hot loops. Less suitable when you depend on C extensions built against the CPython C API, which PyPy supports only through a slower compatibility layer.
Jython: Applications that need deep integration with a Java codebase or JVM ecosystem. Note: Jython's stable releases implement Python 2.7, and Python 3 support is still in development.
GraalPy: Polyglot applications mixing Python with JavaScript, R, or Java in a single process. Like CPython, it currently uses a GIL, so it does not by itself give you GIL-free threading.
CPython: Everything else. It has the broadest library support, the largest community, and is what most production systems run.
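Before choosing, it helps to confirm which implementation a given environment is actually running. The sketch below uses only the standard platform and sys modules; current_implementation is a hypothetical helper name.

```python
import platform
import sys


def current_implementation():
    """Describe the interpreter that is executing this code."""
    return {
        "name": platform.python_implementation(),   # e.g. "CPython" or "PyPy"
        "version": platform.python_version(),
        "cache_tag": sys.implementation.cache_tag,  # tag used in .pyc file names
    }


info = current_implementation()
for key, value in info.items():
    print(f"{key}: {value}")
```

Code that depends on implementation details such as the integer cache or reference counting can use this kind of check to skip or adapt tests on other interpreters.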

Expected Output

See solution for classification and explanations

Hints

Hint 1: CPython is the reference implementation written in C. PyPy uses a JIT compiler. Jython compiles to JVM bytecode. GraalPy runs on GraalVM.

Hint 2: Consider: does it compile to bytecode? Does it have a JIT? What is its primary advantage over CPython?

Medium

#5 Observe the Eval Loop with sys.settrace (Medium)

eval-loop · sys.settrace · tracing

Use sys.settrace to observe every step of the eval loop as it executes a simple function. Record each event and the corresponding line number.

Python

import sys

def build_tracer(events):
    def tracer(frame, event, arg):
        events.append((event, f"line {frame.f_lineno - frame.f_code.co_firstlineno + 1}"))
        return tracer
    return tracer

def target_function(x):
    y = x * 2
    z = y + 1
    return z

events = []
sys.settrace(build_tracer(events))
result = target_function(5)
sys.settrace(None)

for event, location in events:
    print(f"{event:<10} {location}")

print(f"\nResult: {result}")

Solution

import sys

def build_tracer(events):
    def tracer(frame, event, arg):
        events.append((event, f"line {frame.f_lineno - frame.f_code.co_firstlineno + 1}"))
        return tracer
    return tracer

def target_function(x):
    y = x * 2
    z = y + 1
    return z

events = []
sys.settrace(build_tracer(events))
result = target_function(5)
sys.settrace(None)

for event, location in events:
    print(f"{event:<10} {location}")

print(f"\nResult: {result}")

How the eval loop uses trace hooks:

The CPython eval loop (Python/ceval.c) reports several kinds of events to the trace function, including exception and opcode events. This exercise records three:

call — fired when a new frame is pushed (function call begins). The frame object contains f_locals, f_globals, f_code, etc.
line — fired before executing the first bytecode instruction of each new source line. This is what Python debuggers use to implement line-by-line stepping.
return — fired just before the frame is popped (function returns). arg is the return value.

sys.settrace is the foundation of pdb, coverage.py, and most Python debuggers. The performance cost is significant: the interpreter has to call back into Python for every new source line, and traced code commonly runs an order of magnitude slower, depending on how much work the trace function does. Profilers that only need function-level data use sys.setprofile instead, which fires on call and return events but not on every line.
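For comparison, here is a minimal sketch of the sys.setprofile variant. record_calls and double_plus_one are hypothetical names chosen for illustration; the profile function has the same (frame, event, arg) signature as a trace function but never receives line events.

```python
import sys


def record_calls(func, *args):
    """Run func(*args) under sys.setprofile and return (result, events)."""
    events = []

    def profiler(frame, event, arg):
        # Keep only Python-level call/return events; events for C functions
        # (c_call, c_return, c_exception) are ignored in this sketch.
        if event in ("call", "return"):
            events.append((event, frame.f_code.co_name))

    sys.setprofile(profiler)
    try:
        result = func(*args)
    finally:
        sys.setprofile(None)  # always remove the hook, even on error
    return result, events


def double_plus_one(x):
    y = x * 2
    return y + 1


result, events = record_calls(double_plus_one, 5)
print(result, events)
```

The try/finally matters in practice: a profile hook left installed after an exception keeps slowing down everything that runs afterwards.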

Starter Code

import sys

def build_tracer(events):
    """Return a trace function that records (event, lineno) pairs.
    Capture 'call', 'line', and 'return' events.
    """
    pass

def target_function(x):
    y = x * 2
    z = y + 1
    return z

Expected Output

call       line 1
line       line 2
line       line 3
line       line 4
return     line 4

Result: 11

Hints

Hint 1: `sys.settrace(fn)` installs a global trace function. It is called with (frame, event, arg). Valid events are "call", "line", "return", "exception".

Hint 2: Your trace function must return itself (or another local trace function) from the "call" event to keep receiving "line" and "return" events inside that frame. Returning None from the "call" event turns off tracing for that frame.

#6 pymalloc: Small vs Large Allocations (Medium)

pymalloc · memory-allocator · tracemalloc

Use tracemalloc to observe CPython's memory allocation patterns. Show the difference between small-object allocations (handled by pymalloc) and large allocations (handled by system malloc).

Python

import tracemalloc
import sys

tracemalloc.start()

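# Allocate objects of various sizes. pymalloc's 512-byte limit applies to the whole
# object, and a bytes object carries a 33-byte header on 64-bit builds, so the
# cutoff falls at a payload of 479 bytes rather than 512.
small_objects = [bytes(size) for size in range(1, 513)]    # payloads of 1–512 bytes
large_objects = [bytes(size) for size in range(513, 600)]  # payloads of 513+ bytes: always system malloc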

snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics("lineno")

# Show total current memory
current, peak = tracemalloc.get_traced_memory()
print(f"Current traced memory: {current:,} bytes")
print(f"Peak traced memory:    {peak:,} bytes")
print(f"\nsmall_objects list size (sys.getsizeof): {sys.getsizeof(small_objects)} bytes")
print(f"large_objects list size (sys.getsizeof): {sys.getsizeof(large_objects)} bytes")

# Single-object size comparison
tiny = bytes(10)
big = bytes(1000)
print(f"\nbytes(10) size:   {sys.getsizeof(tiny)} bytes")
print(f"bytes(1000) size: {sys.getsizeof(big)} bytes")

tracemalloc.stop()

Solution

import tracemalloc
import sys

tracemalloc.start()

small_objects = [bytes(size) for size in range(1, 513)]
large_objects = [bytes(size) for size in range(513, 600)]

snapshot = tracemalloc.take_snapshot()

current, peak = tracemalloc.get_traced_memory()
print(f"Current traced memory: {current:,} bytes")
print(f"Peak traced memory:    {peak:,} bytes")
print(f"\nsmall_objects list size (sys.getsizeof): {sys.getsizeof(small_objects)} bytes")
print(f"large_objects list size (sys.getsizeof): {sys.getsizeof(large_objects)} bytes")

tiny = bytes(10)
big = bytes(1000)
print(f"\nbytes(10) size:   {sys.getsizeof(tiny)} bytes")
print(f"bytes(1000) size: {sys.getsizeof(big)} bytes")

tracemalloc.stop()

CPython's three-tier memory allocator:

Python objects
      |
   pymalloc (Python's custom allocator)
      |  handles allocations <= 512 bytes
      |  uses arenas -> pools -> blocks (256 KB arenas and 4 KB pools before 3.10; 1 MiB and 16 KiB on 64-bit builds since)
      |
   glibc malloc / jemalloc (system allocator)
      |  handles allocations > 512 bytes
      |  also used for pymalloc's arena allocation itself
      |
   OS mmap / brk (virtual memory)

Why pymalloc exists:

Python creates and destroys millions of small objects (function frames, tuples, dicts) per second
malloc/free for tiny objects has high overhead (alignment overhead, per-allocation bookkeeping)
pymalloc manages fixed-size blocks within pre-allocated pools, making allocation and deallocation O(1) with minimal fragmentation
Objects larger than 512 bytes bypass pymalloc entirely and go to the system allocator
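The 512-byte rule applies to the size of the whole allocation, header included, which is easy to overlook when sizing payloads. The sketch below makes that concrete for bytes objects. PYMALLOC_LIMIT is the threshold quoted above, hard-coded as an assumption, and likely_allocator is a hypothetical helper that only guesses from sys.getsizeof(); for garbage-collector-tracked objects such as lists the reported size also includes a GC header, so the guess is rougher there.

```python
import sys

# Threshold quoted in the text above; hard-coded here as an assumption.
PYMALLOC_LIMIT = 512


def likely_allocator(obj):
    """Guess which allocator served obj, based on its reported size."""
    return "pymalloc" if sys.getsizeof(obj) <= PYMALLOC_LIMIT else "system malloc"


header = sys.getsizeof(b"")  # fixed overhead of an empty bytes object
for payload in (10, PYMALLOC_LIMIT - header, PYMALLOC_LIMIT - header + 1, 1000):
    obj = bytes(payload)
    print(f"bytes({payload}): {sys.getsizeof(obj)} bytes -> {likely_allocator(obj)}")
```

The two middle payloads sit on either side of the boundary, which shows that the cutoff for bytes is the limit minus the header, not a 512-byte payload.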

Expected Output

See solution for allocation size analysis

Hints

Hint 1: CPython uses pymalloc for objects 512 bytes or smaller. Larger objects go directly to the system malloc. Use `tracemalloc` to observe allocation sizes.

Hint 2: tracemalloc.take_snapshot() captures all current allocations. Filter by filename to isolate your code.

#7 Type Object Inspection (Medium)

type-object · PyTypeObject · MRO · type-system

Build a function that inspects a Python type object and reports its key properties: name, MRO, whether it is a C builtin or Python-defined type, and which dunder methods it defines.

def inspect_type(obj):
    t = type(obj)
    mro = [c.__name__ for c in t.__mro__]
    is_builtin = t.__module__ == "builtins"
    dunders = [attr for attr in dir(t) if attr.startswith("__") and attr.endswith("__")]

    print(f"Object:    {repr(obj)[:40]}")
    print(f"Type:      {t.__name__}")
    print(f"Module:    {t.__module__}")
    print(f"Is C type: {is_builtin}")
    print(f"MRO:       {' -> '.join(mro)}")
    print(f"Dunders:   {len(dunders)} ({', '.join(dunders[:5])}...)")
    print()

for obj in [42, "hello", [], {}, lambda: None]:
    inspect_type(obj)

Solution

def inspect_type(obj):
    t = type(obj)
    mro = [c.__name__ for c in t.__mro__]
    is_builtin = t.__module__ == "builtins"
    dunders = [attr for attr in dir(t) if attr.startswith("__") and attr.endswith("__")]

    print(f"Object:    {repr(obj)[:40]}")
    print(f"Type:      {t.__name__}")
    print(f"Module:    {t.__module__}")
    print(f"Is C type: {is_builtin}")
    print(f"MRO:       {' -> '.join(mro)}")
    print(f"Dunders:   {len(dunders)} ({', '.join(dunders[:5])}...)")
    print()

for obj in [42, "hello", [], {}, lambda: None]:
    inspect_type(obj)

What a PyTypeObject contains:

In CPython's C source, every Python type is a PyTypeObject struct with over 40 fields, including:

tp_name — the type's qualified name
tp_basicsize — size of an instance in bytes
tp_alloc / tp_dealloc — memory allocation/deallocation hooks
tp_repr, tp_str — implement repr() and str()
tp_as_number, tp_as_sequence, tp_as_mapping — protocol suites
tp_richcompare — implements ==, !=, <, etc.
tp_methods — table of methods exposed to Python

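When you access obj.__add__, Python searches the __dict__ of each class in the type's MRO for the name __add__; for C-implemented types the entry it finds is a wrapper around the tp_as_number->nb_add slot. The + operator skips the name lookup and calls the nb_add slot of the operand's type directly. The MRO determines which class's version is used: the first match in the MRO wins.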
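The "first match in the MRO wins" rule can be reproduced in a few lines of Python. defining_class is a hypothetical helper written for illustration, and Base and Child are made-up classes; the sketch ignores instance dictionaries and descriptor details and only reports which class supplies a name.

```python
def defining_class(obj, name):
    """Return the first class in type(obj)'s MRO whose own __dict__ defines name."""
    for cls in type(obj).__mro__:
        if name in vars(cls):
            return cls
    return None


class Base:
    def greet(self):
        return "base"


class Child(Base):
    pass


print(defining_class(Child(), "greet"))   # supplied by Base, not Child
print(defining_class(True, "__add__"))    # bool inherits __add__ from int
```

Checking vars(cls) instead of hasattr(cls, name) is deliberate: hasattr would also succeed for inherited attributes and so would always stop at the first class.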

Starter Code

def inspect_type(obj):
    """Print key information about the type of an object.
    Show: type name, MRO, slots (selected __dunder__ methods),
    and whether it is a built-in or user-defined type.
    """
    pass

Expected Output

See solution for type inspection output

Hints

Hint 1: Every Python type is itself a `type` object (a PyTypeObject in C). Access it via `type(obj)` or `obj.__class__`.

Hint 2: Use `type(obj).__mro__` for the method resolution order, `dir(type(obj))` for attributes, and `type(obj).__module__` to distinguish builtins from user types.

#8 Predict the Output: Integer Identity Edge Cases (Medium)

integer-cache · object-identity · predict-output

Predict the output before running. Explain every True or False result.

Python

# Case 1: Cached range
a = 100
b = 100
print(f"Case 1: {a is b}")   # ?

# Case 2: Outside cache, separate statements
x = 300
y = 300
print(f"Case 2: {x is y}")   # ?

# Case 3: Outside cache, same expression
p, q = 300, 300
print(f"Case 3: {p is q}")   # ?

# Case 4: Arithmetic result
m = 150 + 150
n = 300
print(f"Case 4: {m is n}")   # ?

# Case 5: Negative edge
neg_a = -5
neg_b = -5
print(f"Case 5: {neg_a is neg_b}")   # ?

neg_c = -6
neg_d = -6
print(f"Case 6: {neg_c is neg_d}")   # ?

Solution

Run as a script (the whole module is one code object), typical output:

Case 1: True
Case 2: True    (both 300 literals share one constant)
Case 3: True    (both 300 literals share one constant)
Case 4: True    (150 + 150 is folded to 300 at compile time and merged with the literal)
Case 5: True
Case 6: True    (both -6 literals share one constant)

Typed line by line into the REPL (each statement is compiled separately), Cases 2, 4, and 6 print False; the others are unchanged.

Explanation of each case:

Case 1 (True): 100 is in the cache range (-5..256). Both a and b point to the pre-allocated integer object for 100.

Case 2 (True in a script, False in the REPL): 300 is outside the cache, so identity depends on how the constants are compiled. In a script, the compiler stores equal constants once per code object, and x and y both point to that single object. In the REPL each statement is compiled on its own, so the two assignments create two PyLongObject instances with the same value but different identities.

Case 3 (True): p, q = 300, 300 is a single statement, so both 300s always land in the same code object. The compiler keeps one entry for 300 in co_consts, and both p and q end up pointing to it.

Case 4 (True in a script, False in the REPL): CPython folds 150 + 150 to the constant 300 at compile time. In a script the folded value is typically merged with the literal 300 on the next line; in the REPL the two statements are compiled separately and yield distinct objects.

Case 5 (True): -5 is the lower boundary of the cache. Cached.

Case 6 (True in a script, False in the REPL): -6 is below the cache boundary, so it behaves like 300: one shared constant within a code object, two separate heap allocations across separately compiled statements.

Key takeaway: Integer caching is a CPython implementation detail. Compile-time constant folding and constant merging add further behaviour that makes object identity of large integers unpredictable. Never write code that relies on is for numeric comparison.
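One way to see why identity depends on compilation context is to inspect the constants table of a compiled code object. The sketch below assumes CPython; the source string and variable names are made up for illustration. It compiles two assignments together, as a script would, and counts how many entries in co_consts hold the value 300.

```python
source = "x = 300\ny = 300\n"
code = compile(source, "<example>", "exec")

# Count how many entries in the constants table hold the value 300.
copies = sum(1 for const in code.co_consts if const == 300)

# Run the compiled module body and compare the two resulting objects.
namespace = {}
exec(code, namespace)
same_object = namespace["x"] is namespace["y"]

print(f"copies of 300 in co_consts: {copies}, x is y: {same_object}")
```

Compiling "x = 300" and "y = 300" with two separate compile() calls mimics the REPL: each code object then carries its own constant, and the identity check is no longer guaranteed to succeed.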

Expected Output

See solution — output depends on context of assignment

Hints

Hint 1: The integer cache applies to the range -5..256. But be careful: the cache is per-interpreter-startup and assignments from the SAME expression can behave differently than separate assignments.

Hint 2: Consider: does `a, b = 300, 300` in a single tuple literal allow CPython to reuse the same object for both? What about assigning in separate statements?

Hard

#9 Measure eval Loop Overhead (Hard)

eval-loop · performance · timeit · overhead

Measure and quantify the overhead of the CPython eval loop. Isolate the cost of: function call overhead, a single bytecode instruction, and a C-level builtin call.

import timeit

N = 5_000_000

def empty_func():
    pass

def single_add(x):
    return x + 1

def use_builtin():
    return len([])

results = {}

# Function call + return overhead
results["empty_call"] = timeit.timeit(empty_func, number=N) / N * 1e9

# Single add instruction overhead
results["single_add"] = timeit.timeit(lambda: single_add(5), number=N) / N * 1e9

# Builtin call (C-level)
results["len_builtin"] = timeit.timeit(use_builtin, number=N) / N * 1e9

# Pure Python loop overhead (no useful work)
loop_code = """
total = 0
for i in range(100):
    total += i
"""
results["loop_100"] = timeit.timeit(loop_code, number=N // 100) / (N // 100) * 1e9

print("Overhead measurements (nanoseconds per call/operation):")
print("-" * 50)
for name, ns in results.items():
    print(f"  {name:<20} {ns:>10.1f} ns")

Solution

import timeit

N = 5_000_000

def empty_func():
    pass

def single_add(x):
    return x + 1

def use_builtin():
    return len([])

results = {}

results["empty_call"] = timeit.timeit(empty_func, number=N) / N * 1e9
results["single_add"] = timeit.timeit(lambda: single_add(5), number=N) / N * 1e9
results["len_builtin"] = timeit.timeit(use_builtin, number=N) / N * 1e9

loop_code = """
total = 0
for i in range(100):
    total += i
"""
results["loop_100"] = timeit.timeit(loop_code, number=N // 100) / (N // 100) * 1e9

print("Overhead measurements (nanoseconds per call/operation):")
print("-" * 50)
for name, ns in results.items():
    print(f"  {name:<20} {ns:>10.1f} ns")

Typical results on modern hardware:

empty_call           ~  50–80 ns    (frame creation + return)
single_add           ~ 100–150 ns   (lambda call + call + LOAD_FAST + LOAD_CONST + BINARY_OP + RETURN)
len_builtin          ~  80–120 ns   (call + C dispatch + RETURN)
loop_100             ~  3000–6000 ns total for 100 iterations

What each number reveals:

Empty call (~60 ns): CPython must set up a new frame, push it onto the call stack, execute RESUME + RETURN_VALUE (opcode names vary by version), and tear the frame down. This is purely interpreter overhead with zero useful work.
Single add: Adds ~50-80 ns on top of the empty call. Part of that is the extra lambda call used to pass the argument; the rest is LOAD_FAST + LOAD_CONST + BINARY_OP (which includes type checking and int.__add__ dispatch) + RETURN_VALUE.
len_builtin: The len() call drops into C code immediately, with no Python-level method dispatch, so even with the cost of building the empty list it is often comparable to or faster than a Python + operation wrapped in a function.
Loop overhead: The per-iteration cost of FOR_ITER + STORE_FAST + BINARY_OP + JUMP_BACKWARD accumulates fast. 100 iterations of a trivial loop takes 3–6 microseconds, or 30–60 ns per iteration; for a million-element dataset that is 30–60 ms of pure interpreter overhead.

The core lesson: By these figures, the CPython eval loop costs on the order of 10 ns or less per bytecode instruction and roughly 50–100 ns per Python-level function call on modern hardware. NumPy and similar libraries are fast because they move the work into C, executing millions of operations in a single Python instruction (CALL).
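The same effect shows up without NumPy: the built-in sum() runs its loop in C, while an equivalent for loop pays the eval-loop cost on every iteration. The sketch below is a rough comparison only; python_sum is a hypothetical helper, the list size and repeat count are arbitrary, and absolute timings vary by machine and CPython version.

```python
import timeit


def python_sum(values):
    """Add up values with an explicit loop, one bytecode round trip per item."""
    total = 0
    for v in values:
        total += v
    return total


values = list(range(10_000))
REPEATS = 200  # arbitrary; raise it for steadier numbers

loop_s = timeit.timeit(lambda: python_sum(values), number=REPEATS)
builtin_s = timeit.timeit(lambda: sum(values), number=REPEATS)

per_item_loop = loop_s / (REPEATS * len(values)) * 1e9
per_item_builtin = builtin_s / (REPEATS * len(values)) * 1e9
print(f"for loop: {per_item_loop:.1f} ns per item")
print(f"sum():    {per_item_builtin:.1f} ns per item")
```

Both versions produce the same total; only where the loop runs differs, so the gap between the two numbers is a direct estimate of per-iteration interpreter overhead.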

Starter Code

import timeit
import ctypes

def measure_overhead():
    """Measure the per-bytecode-instruction overhead of the CPython eval loop.
    Compare: doing nothing (pass), a single addition, and a C-level operation.
    Express overhead in nanoseconds per operation.
    """
    pass

Expected Output

See solution for timing measurements and analysis

Hints

Hint 1: Use `timeit.timeit()` with a large number of iterations (1_000_000+) to get stable measurements. Divide the total time by the iteration count to get per-call cost.

Hint 2: Compare: an empty function call (a body of just `pass`) vs a single `x + 1` vs `len([])`. The differences between them isolate different components of overhead.

#10 Object Identity Through the Object Model (Hard)

id · object-identity · memory-reuse · ctypes

Investigate CPython's object identity model using id() and ctypes. Show that id() is a memory address, demonstrate address reuse after deallocation, and use ctypes to look up a live object by address.

import ctypes
import sys

# Part 1: id() is a memory address
x = object()
addr = id(x)
print(f"id(x) = {addr}")
print(f"hex address: {hex(addr)}")

# Part 2: Look up live object by address (only safe on live references!)
y = [1, 2, 3]
y_id = id(y)
recovered = ctypes.cast(y_id, ctypes.py_object).value
print(f"\nOriginal: {y}")
print(f"Recovered by address: {recovered}")
print(f"Same object: {y is recovered}")

# Part 3: Address reuse after deallocation
# Record each id, then let the object die when the name is rebound
ids_seen = []
for _ in range(5):
    temp = object()
    ids_seen.append(id(temp))
    # The previous object is freed when temp is rebound on the next iteration
    # (refcount -> 0), so its address becomes available for a later object()

print(f"\nFirst 5 object() ids:")
for i in ids_seen:
    print(f"  {hex(i)}")

# Are any addresses reused?
unique_ids = len(set(ids_seen))
print(f"Unique addresses: {unique_ids} out of 5")
print(f"Address reuse observed: {unique_ids < 5}")

Solution

import ctypes
import sys

x = object()
addr = id(x)
print(f"id(x) = {addr}")
print(f"hex address: {hex(addr)}")

y = [1, 2, 3]
y_id = id(y)
recovered = ctypes.cast(y_id, ctypes.py_object).value
print(f"\nOriginal: {y}")
print(f"Recovered by address: {recovered}")
print(f"Same object: {y is recovered}")

ids_seen = []
for _ in range(5):
    temp = object()
    ids_seen.append(id(temp))

print(f"\nFirst 5 object() ids:")
for i in ids_seen:
    print(f"  {hex(i)}")

unique_ids = len(set(ids_seen))
print(f"Unique addresses: {unique_ids} out of 5")
print(f"Address reuse observed: {unique_ids < 5}")

Key concepts this demonstrates:

1. id() is a memory address in CPython. The Python language only guarantees that id() returns an integer that is unique and constant for the lifetime of the object. Returning the object's actual memory address is a CPython implementation detail. Other implementations (PyPy, Jython) may return different values.

2. Address reuse is real and surprising. Because CPython reference-counts objects, a temporary object inside a loop is deallocated as soon as the loop rebinds its name to a new object. The freed memory can be claimed by a later allocation, producing the same id(). This is why two equal id() values recorded at different times do NOT prove you saw the same object: one may have been created after the other was destroyed.

3. ctypes.cast(id(x), ctypes.py_object).value is dangerous. If the object has been deallocated, this dereferences a dangling pointer and will crash Python or return garbage. Only use this on objects you know are alive (held by a live reference).
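If code really needs to get from an id back to an object, a weak reference avoids the dangling-pointer problem. This is a different technique from the ctypes lookup above, shown only as a safer alternative. The sketch below uses the standard weakref module; Node, register, and lookup are hypothetical names, and a small class is used because plain object() instances cannot be weakly referenced.

```python
import weakref


class Node:
    """Minimal class whose instances support weak references."""

    def __init__(self, label):
        self.label = label


# Maps id -> object without keeping the object alive.
_registry = weakref.WeakValueDictionary()


def register(obj):
    """Remember obj by its id and return that id."""
    _registry[id(obj)] = obj
    return id(obj)


def lookup(obj_id):
    """Return the live object for obj_id, or None if it has been freed."""
    return _registry.get(obj_id)


node = Node("a")
node_id = register(node)
print(lookup(node_id) is node)  # the object is alive, so the lookup finds it
del node                        # drop the last strong reference
print(lookup(node_id))          # the entry disappears with the object
```

Once the object is gone the lookup reports that fact instead of dereferencing freed memory, at the cost of having to register objects up front.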

The practical danger:

a = id(object())  # the temporary object is freed as soon as id() returns
b = id(object())  # a new object may be placed at the same address
print(a == b)     # often True in CPython, yet the two objects never coexisted
# Equal ids recorded across non-overlapping lifetimes say nothing about
# identity, which is why such comparisons are meaningless.

Starter Code

import ctypes

def investigate_identity():
    """Demonstrate:
    1. id() returns the memory address of a CPython object
    2. id() of a dead object can equal id() of a new object (address reuse)
    3. Use ctypes to look up a live object by its id (address)
    """
    pass

Expected Output

See solution for memory address investigation

Hints

Hint 1: In CPython, `id(obj)` returns the memory address of the object as an integer. The documentation for `id()` notes this as a "CPython implementation detail: This is the address of the object in memory."

Hint 2: After an object is garbage collected, its memory address can be reused by a new object. This means two objects that existed at different times can have the same id. Use `ctypes.cast(id(x), ctypes.py_object).value` to dereference an id back to an object — only safe on live objects!

#11 Build a CPython Architecture Quiz Engine (Hard)

cpython · architecture · quiz · internals

Build a self-grading CPython architecture quiz engine. The quiz should cover: integer caching, string interning, the eval loop, PyObject layout, and pymalloc. Include at least 5 questions with multiple-choice options and detailed explanations.

def make_quiz():
    return [
        {
            "q": "What is the CPython integer cache range?",
            "choices": ["A: 0 to 100", "B: -5 to 256", "C: -128 to 127", "D: 0 to 255"],
            "answer": "B",
            "explanation": "CPython pre-allocates integers from -5 to 256 inclusive. These 262 objects are created at interpreter startup and reused for all references to values in this range.",
        },
        {
            "q": "What does sys.getsizeof() measure?",
            "choices": [
                "A: The total memory used by an object including referenced objects",
                "B: The shallow size of the object itself (not referenced objects)",
                "C: The size of the object on disk",
                "D: The size of the object's type",
            ],
            "answer": "B",
            "explanation": "sys.getsizeof() returns the shallow size — just the object header and its direct data. A list of 1000 large strings reports only the list's pointer array size, not the strings themselves.",
        },
        {
            "q": "Which strings does CPython automatically intern?",
            "choices": [
                "A: All string literals",
                "B: Strings shorter than 20 characters",
                "C: Strings that look like identifiers (letters, digits, underscores only)",
                "D: Strings that appear more than once in a module",
            ],
            "answer": "C",
            "explanation": "CPython interns strings that match the identifier pattern. These are candidates for dictionary keys and attribute lookups. Strings with spaces or special characters are not auto-interned.",
        },
        {
            "q": "What happens when an object's reference count reaches zero in CPython?",
            "choices": [
                "A: It is added to a garbage collection queue",
                "B: It is immediately deallocated via tp_dealloc",
                "C: It is moved to a dead-objects list for batch cleanup",
                "D: It persists until the next GC cycle",
            ],
            "answer": "B",
            "explanation": "CPython's reference counting is synchronous. When ob_refcnt drops to zero, tp_dealloc is called immediately. This is why del on a non-cyclic object frees memory right away, unlike Java's GC.",
        },
        {
            "q": "What is the threshold for CPython's pymalloc allocator?",
            "choices": ["A: 256 bytes", "B: 512 bytes", "C: 1024 bytes", "D: 4096 bytes"],
            "answer": "B",
            "explanation": "pymalloc handles allocations of 512 bytes or less. Larger allocations bypass pymalloc and go directly to the system allocator (malloc). This threshold was increased from 256 in Python 3.",
        },
    ]

def run_quiz(questions):
    score = 0
    for i, q in enumerate(questions, 1):
        print(f"\nQ{i}: {q['q']}")
        for choice in q["choices"]:
            print(f"  {choice}")
        ans = input("Your answer (A/B/C/D): ").strip().upper()
        if ans == q["answer"]:
            print("Correct!")
            score += 1
        else:
            print(f"Wrong. Correct answer: {q['answer']}")
            print(f"Explanation: {q['explanation']}")
    print(f"\nFinal score: {score}/{len(questions)}")

# To run interactively:
# run_quiz(make_quiz())

# Automated verification (for non-interactive use):
quiz = make_quiz()
print(f"Quiz loaded: {len(quiz)} questions")
for i, q in enumerate(quiz, 1):
    print(f"Q{i}: {q['q'][:60]}... [Answer: {q['answer']}]")

Solution

def make_quiz():
    return [
        {
            "q": "What is the CPython integer cache range?",
            "choices": ["A: 0 to 100", "B: -5 to 256", "C: -128 to 127", "D: 0 to 255"],
            "answer": "B",
            "explanation": "CPython pre-allocates integers from -5 to 256 inclusive. These 262 objects are created at interpreter startup and reused for all references to values in this range.",
        },
        {
            "q": "What does sys.getsizeof() measure?",
            "choices": [
                "A: The total memory used by an object including referenced objects",
                "B: The shallow size of the object itself (not referenced objects)",
                "C: The size of the object on disk",
                "D: The size of the object's type",
            ],
            "answer": "B",
            "explanation": "sys.getsizeof() returns the shallow size — just the object header and its direct data. A list of 1000 large strings reports only the list's pointer array size, not the strings themselves.",
        },
        {
            "q": "Which strings does CPython automatically intern?",
            "choices": [
                "A: All string literals",
                "B: Strings shorter than 20 characters",
                "C: Strings that look like identifiers (letters, digits, underscores only)",
                "D: Strings that appear more than once in a module",
            ],
            "answer": "C",
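            "explanation": "CPython automatically interns string constants that look like identifiers (letters, digits, and underscores only), because such strings are likely to be used as dictionary keys and in attribute lookups. Strings with spaces or special characters are generally not auto-interned; the exact rules are an implementation detail.",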
        },
        {
            "q": "What happens when an object's reference count reaches zero in CPython?",
            "choices": [
                "A: It is added to a garbage collection queue",
                "B: It is immediately deallocated via tp_dealloc",
                "C: It is moved to a dead-objects list for batch cleanup",
                "D: It persists until the next GC cycle",
            ],
            "answer": "B",
            "explanation": "CPython's reference counting is synchronous. When ob_refcnt drops to zero, tp_dealloc is called immediately. This is why deleting the last reference to a non-cyclic object frees it right away, unlike a tracing collector such as Java's GC.",
        },
        {
            "q": "What is the largest request size handled by CPython's pymalloc allocator?",
            "choices": ["A: 256 bytes", "B: 512 bytes", "C: 1024 bytes", "D: 4096 bytes"],
            "answer": "B",
            "explanation": "pymalloc handles allocations of 512 bytes or less. Larger allocations bypass pymalloc and go directly to the system allocator (malloc). The threshold was raised from 256 to 512 bytes in Python 3.3.",
        },
    ]

def run_quiz(questions):
    score = 0
    for i, q in enumerate(questions, 1):
        print(f"\nQ{i}: {q['q']}")
        for choice in q["choices"]:
            print(f"  {choice}")
        ans = input("Your answer (A/B/C/D): ").strip().upper()
        if ans == q["answer"]:
            print("Correct!")
            score += 1
        else:
            print(f"Wrong. Correct answer: {q['answer']}")
            print(f"Explanation: {q['explanation']}")
    print(f"\nFinal score: {score}/{len(questions)}")

quiz = make_quiz()
print(f"Quiz loaded: {len(quiz)} questions")
for i, q in enumerate(quiz, 1):
    print(f"Q{i}: {q['q'][:60]}... [Answer: {q['answer']}]")
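
Two of the quiz answers can be checked empirically rather than taken on trust. The sketch below is one possible way to do that on CPython; the names Node, freed_immediately, and shallow_vs_elements are hypothetical helpers introduced for illustration, and the behaviour shown is CPython-specific.

```python
import sys
import weakref


class Node:
    """Plain object used only to observe deallocation (hypothetical helper)."""


def freed_immediately():
    """Return True if an object is gone as soon as its last reference is deleted."""
    obj = Node()
    ref = weakref.ref(obj)  # a weak reference does not keep obj alive
    del obj                 # drops the only strong reference
    return ref() is None    # on CPython the object is already deallocated


def shallow_vs_elements():
    """Return (shallow size of a list, summed size of the strings it holds)."""
    items = [str(i) * 1000 for i in range(1000)]
    shallow = sys.getsizeof(items)                    # list header + pointer array
    elements = sum(sys.getsizeof(s) for s in items)   # the strings themselves
    return shallow, elements


print(f"Freed immediately: {freed_immediately()}")
shallow, elements = shallow_vs_elements()
print(f"List shallow size: {shallow} bytes, elements: {elements} bytes")
```

The exact byte counts depend on the CPython version and platform, so only the relationship between the two numbers is meaningful: the shallow size of the list is far smaller than the combined size of its elements.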

Extension challenges:

Add a "hint" field and implement a hint system that deducts a point if used
Add a "difficulty" field and sort questions from easy to hard
Track time per question with time.perf_counter()
Randomise question and answer order to prevent memorisation
Export incorrect questions to a review file for spaced repetition
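
As a starting point for the randomisation challenge, here is a minimal sketch that shuffles the choices of one question and relabels them. The function name shuffle_question and its rng parameter are assumptions made for illustration; the sketch also assumes at most four choices, each written in the "A: text" form used by the solution, with no duplicate choice texts.

```python
import random


def shuffle_question(question, rng=random):
    """Return a copy of one question dict with shuffled, relabelled choices."""
    letters = "ABCD"[: len(question["choices"])]
    # Strip the "A: " style prefix so the text can be relabelled after shuffling.
    texts = [choice.split(": ", 1)[1] for choice in question["choices"]]
    correct_text = texts[letters.index(question["answer"])]
    rng.shuffle(texts)
    shuffled = dict(question)  # shallow copy; the original dict is left unchanged
    shuffled["choices"] = [f"{letter}: {text}" for letter, text in zip(letters, texts)]
    shuffled["answer"] = letters[texts.index(correct_text)]
    return shuffled


def shuffle_quiz(questions, rng=random):
    """Return a new list with both question order and choice order randomised."""
    shuffled = [shuffle_question(q, rng) for q in questions]
    rng.shuffle(shuffled)
    return shuffled
```

Passing a seeded random.Random instance as rng makes the order reproducible, which is convenient when testing; for example, run_quiz(shuffle_quiz(make_quiz())) would present a freshly shuffled quiz.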

Starter Code

def make_quiz():
    """Return a list of question dicts covering CPython architecture.

    Each dict has the keys "q", "choices", "answer", and "explanation".
    """
    pass

def run_quiz(questions):
    """Present each question, check the answer, and show the
    explanation if wrong. Track and print the final score.
    """
    pass

Expected Output

Interactive quiz — see solution for full implementation

Hints

Hint 1: Structure each question as a dict with keys: "q" (question text), "choices" (list of options), "answer" (correct letter), "explanation" (detailed explanation).

Hint 2: For the runner, iterate questions, print choices, take input, compare to the answer key, and show explanations for wrong answers. Track score.
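
Because run_quiz() depends on input(), it is awkward to test. One option, sketched here under the assumption that questions follow the dict layout from Hint 1, is to separate scoring from the interactive loop. The name score_answers is hypothetical and not part of the original solution.

```python
def score_answers(questions, answers):
    """Score answer letters against the quiz without calling input().

    Returns the number of correct answers and the text of each missed question.
    """
    score = 0
    missed = []
    for question, given in zip(questions, answers):
        # Normalise the same way the interactive runner does.
        if given.strip().upper() == question["answer"]:
            score += 1
        else:
            missed.append(question["q"])
    return score, missed
```

An interactive runner could collect the answers first and then call score_answers(), which keeps the comparison logic in one place.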
