CPython Architecture - The Interpreter at Engineering Depth
Reading time: ~40 minutes | Level: Intermediate → Engineering
Before reading further, predict the exact output of this program, line by line:
import sys
print(sys.implementation.name) # ?
print(sys.version) # ?
x = []
print(type(x)) # ?
print(type(type(x))) # ?
print(type(type(type(x)))) # ?
Write out every line you expect to see, in order.
Most developers get the first two lines roughly right. Almost no one predicts that the last two type() calls return identical values. The actual output (on CPython 3.12):
cpython
3.12.0 (main, Oct 2 2023, ...) [GCC ...]
<class 'list'>
<class 'type'>
<class 'type'>
type(type(x)) is type. type(type(type(x))) is also type. The chain bottoms out and loops. Why?
Because in CPython, type is its own metaclass. list is an instance of type. type is an instance of type. At the C level, everything bottoms out at PyTypeObject - the C struct that represents every Python type. type is the PyTypeObject for type objects themselves, and its own ob_type pointer points back at itself.
This is not a quirk. It is the foundation of Python's object model. Understanding it means understanding CPython.
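The loop can be checked from Python itself. The snippet below is a minimal sketch using only built-ins; the variable name `chain` is illustrative.

```python
# Walk the type chain of a list instance and confirm where it loops.
x = []
chain = [type(x), type(type(x)), type(type(type(x)))]
print(chain)                      # [<class 'list'>, <class 'type'>, <class 'type'>]

print(type(type) is type)         # True: type is its own metaclass
print(isinstance(list, type))     # True: list is an instance of type
print(isinstance(type, object))   # True: type is also an ordinary object
print(issubclass(type, object))   # True: type inherits from object
```

The last two lines show the other half of the picture: `type` is an instance of itself, but it still sits under `object` in the inheritance hierarchy.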
What You Will Learn
- What CPython is and how it differs from PyPy, Jython, and MicroPython
- The full execution pipeline: from .py source file to result
- The CPython eval loop: how ceval.c executes bytecode
- PyObject: what every Python object looks like in memory
- The integer cache (-5 to 256) and string interning
- sys.getrefcount() and id() in CPython
- The small object allocator (pymalloc): arenas, pools, and blocks
- When to consider alternative Python implementations
Prerequisites
- Module 03 Overview
- Python Foundation: functions, classes, modules
- Comfortable with the idea that Python compiles to bytecode (even if you have not seen it yet)
Part 1 - What CPython Is
The Reference Implementation
Python is a language specification. CPython is the reference implementation of that specification - the interpreter the Python core team maintains at github.com/python/cpython, written in C.
When someone says "Python is slow" they mean CPython. When someone says "Python has a GIL" they mean CPython. When someone says "Python compiles to bytecode" they mean CPython's compiler. The language specification does not mandate any of these things - they are CPython implementation choices.
There are several Python implementations:
| Implementation | Written In | Target Use Case | Notes |
|---|---|---|---|
| CPython | C | General purpose, reference implementation | Default; what you are almost certainly running |
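| PyPy | Python (RPython) | CPU-bound performance | JIT compiler; often several times faster than CPython on long-running pure-Python code, more on tight loops; tracing GC instead of reference counting; has a GIL, like CPython |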
| Jython | Java | JVM integration | Runs on JVM; can call Java libraries; Python 2 only as of 2024 |
| MicroPython | C | Microcontrollers, embedded | Designed to fit in about 256 KB of code space and 16 KB of RAM; implements a subset of Python 3 and of the standard library |
| GraalPy | Java | Polyglot (GraalVM) | Part of GraalVM; experimental Python-Java-JS interoperability |
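To find out which of these implementations a program is running on, the standard library exposes the name directly. This is a small sketch; `running_on_cpython` is a hypothetical helper, and the `platform` module may be missing on minimal implementations such as MicroPython.

```python
import platform
import sys

impl_name = sys.implementation.name             # e.g. 'cpython', 'pypy'
impl_label = platform.python_implementation()   # e.g. 'CPython', 'PyPy'
print(impl_name, impl_label, sys.version_info[:3])

def running_on_cpython() -> bool:
    """Hypothetical helper: gate CPython-only tricks behind an explicit check."""
    return sys.implementation.name == "cpython"

print(running_on_cpython())
```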
:::note CPython Is Single-Threaded Per Interpreter Due to the GIL
CPython's memory management is not thread-safe at the C level. The Global Interpreter Lock (GIL) ensures only one thread executes Python bytecode at a time. This simplifies CPython's implementation and makes single-threaded code fast, but limits CPU-bound parallelism. Lesson 04 covers the GIL in full.
:::
Part 2 - The Execution Pipeline
From Source to Result
When CPython runs a .py file, it passes through six stages:
Each stage transforms the representation:
1. Tokenizer (tokenize module, Parser/tokenizer.c in CPython source)
The tokenizer converts raw source text into a flat stream of tokens - keywords, identifiers, operators, literals, and delimiters. At this stage, for i in range(10): becomes [NAME:'for', NAME:'i', NAME:'in', NAME:'range', OP:'(', NUMBER:'10', OP:')', OP:':'].
import tokenize
import io
source = "x = 1 + 2"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
for tok in tokens:
    print(tok)
# TokenInfo(type=1 (NAME), string='x', ...)
# TokenInfo(type=55 (OP), string='=', ...)   # 54 before Python 3.12; the numeric code varies by version
# TokenInfo(type=2 (NUMBER), string='1', ...)
# ... and so on
2. Parser (Grammar/python.gram for the PEG grammar, the default since Python 3.9; Parser/Python.asdl defines the AST node types)
The parser converts the token stream into an Abstract Syntax Tree (AST) - a tree where each node represents a syntactic construct. The ast module gives you direct access:
import ast
tree = ast.parse("x = 1 + 2")
print(ast.dump(tree, indent=2))
# Module(
# body=[
# Assign(
# targets=[Name(id='x', ...)],
# value=BinOp(
# left=Constant(value=1),
# op=Add(),
# right=Constant(value=2)))],
# ...)
3. Compiler (Python/compile.c)
The compiler walks the AST and emits bytecode instructions. This is where variable scoping is resolved, closures are detected, and optimisations (like constant folding) are applied.
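Both effects can be observed from Python. The sketch below assumes CPython; constant folding is an optimisation rather than a language guarantee, and the names `seconds` and `make_counter` are illustrative.

```python
import dis

# Constant folding: 60 * 60 * 24 is computed at compile time.
code = compile("seconds = 60 * 60 * 24", "<example>", "exec")
print(code.co_consts)    # contains 86400 on CPython, not 60 and 24 separately
print(code.co_names)     # ('seconds',)
dis.dis(code)            # shows a single constant load followed by a store

# Scope resolution: the compiler marks 'count' as a closure variable.
def make_counter():
    count = 0
    def bump():
        nonlocal count
        count += 1
        return count
    return bump

bump = make_counter()
print(make_counter.__code__.co_cellvars)   # ('count',) - captured by an inner function
print(bump.__code__.co_freevars)           # ('count',) - borrowed from the outer scope
```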
4. .pyc file (__pycache__/module.cpython-312.pyc)
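The compiled bytecode of an imported module is cached to disk as a .pyc file. The next time the module is imported, CPython checks whether the source has changed (by modification time and size, or by a hash of the source). If not, it loads the cached .pyc directly, skipping the tokeniser, parser, and compiler. The script you run directly (python script.py) is compiled in memory each time and is not cached.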
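The cache file name is derived from the interpreter's cache tag, which is why different Python versions can share one `__pycache__` directory. A short sketch, where `mymodule.py` is a hypothetical file name (it does not need to exist for this call):

```python
import importlib.util
import sys

# cache_tag identifies the implementation and version, e.g. 'cpython-312'.
tag = sys.implementation.cache_tag

# Where the bytecode for mymodule.py would be cached.
pyc_path = importlib.util.cache_from_source("mymodule.py")

print(tag)
print(pyc_path)   # e.g. __pycache__/mymodule.cpython-312.pyc
```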
5. Code object (in memory)
The bytecode is stored in a code object - a types.CodeType instance containing the bytecode bytes, constants, variable names, and metadata. Lesson 02 covers code objects in full.
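Every function carries its code object in `__code__`, so the fields mentioned above can be inspected without touching any files. The function `area` is an illustrative example, not part of the lesson's code.

```python
import types

def area(width, height):
    scale = 2.5
    return width * height * scale

code = area.__code__
print(isinstance(code, types.CodeType))   # True
print(code.co_name)                       # 'area'
print(code.co_argcount)                   # 2
print(code.co_varnames)                   # ('width', 'height', 'scale')
print(code.co_consts)                     # includes 2.5
print(type(code.co_code).__name__, len(code.co_code))   # raw bytecode is a bytes object
```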
6. Eval loop (Python/ceval.c)
The eval loop executes the bytecode, one instruction at a time.
Part 3 - The CPython Eval Loop
ceval.c: The Heart of CPython
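The eval loop in Python/ceval.c is a C function - _PyEval_EvalFrameDefault - that is, conceptually, a giant switch statement (on compilers that support computed gotos, CPython dispatches through a jump table instead). Each case handles one bytecode opcode: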
// Simplified conceptual version of ceval.c's main loop (not the real source)
PyObject *
_PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int throwflag) {
    // 3.11+ signature; earlier versions took a PyFrameObject *f
    // ...
    for (;;) {
        opcode = NEXTOPCODE();
        switch (opcode) {
            case LOAD_FAST: {
                PyObject *value = GETLOCAL(oparg);
                Py_INCREF(value);
                PUSH(value);
                break;
            }
            case BINARY_OP: {
                PyObject *right = POP();
                PyObject *left = TOP();
                PyObject *result = binary_op(left, right, oparg); // stand-in for the real operator dispatch
                Py_DECREF(left);  // release the references the stack held on the operands
                Py_DECREF(right);
                SET_TOP(result);
                break;
            }
            case RETURN_VALUE: {
                retval = POP();
                goto return_or_yield;
            }
            // ... ~150 more cases
        }
    }
}
Three key structures drive the eval loop:
The frame object (PyFrameObject / _PyInterpreterFrame in 3.11+): holds the execution context for one function call - the code object, local variables, the instruction pointer, and the value stack.
The value stack: a stack of PyObject * pointers that opcodes push to and pop from. LOAD_FAST pushes a local variable. BINARY_OP pops two operands, computes a result, and pushes it back. RETURN_VALUE pops the return value and exits the frame.
The instruction pointer: an index into co_code (the bytecode bytes) that advances with each instruction.
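The interaction of these three structures is easier to see in a toy model. The sketch below is not how CPython is implemented; it is a hypothetical Python stack machine with three opcodes named after the real ones, and `run`, `BINARY_OPS`, and the `(opname, arg)` instruction format are all assumptions made for illustration.

```python
import operator

# Hypothetical, heavily simplified instruction set modelled on the opcodes above.
BINARY_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def run(program, fastlocals):
    """Execute a list of (opname, arg) pairs against a value stack."""
    stack = []   # the value stack
    ip = 0       # the instruction pointer
    while True:
        opname, arg = program[ip]
        ip += 1
        if opname == "LOAD_FAST":
            stack.append(fastlocals[arg])        # index into the locals array
        elif opname == "BINARY_OP":
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[arg](left, right))
        elif opname == "RETURN_VALUE":
            return stack.pop()                   # leave the "frame"
        else:
            raise ValueError(f"unknown opcode {opname!r}")

# Roughly the shape of `return a + b * c`, with a, b, c in local slots 0, 1, 2.
program = [
    ("LOAD_FAST", 0),
    ("LOAD_FAST", 1),
    ("LOAD_FAST", 2),
    ("BINARY_OP", "*"),
    ("BINARY_OP", "+"),
    ("RETURN_VALUE", None),
]
result = run(program, [2, 3, 4])
print(result)   # 14
```

The real loop differs in many ways (reference counting, error handling, inline caches), but the push/pop discipline is the same.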
Part 4 - PyObject: Everything in Memory
The Fundamental C Struct
Every Python object in memory begins with the same header. In C:
// From Include/object.h (simplified)
typedef struct _object {
    Py_ssize_t ob_refcnt;    // reference count
    PyTypeObject *ob_type;   // pointer to type object
} PyObject;
// An integer extends PyObject. Python ints are arbitrary precision, so the value
// is a variable-length array of digits (layout up to 3.11; 3.12 packs sign and size into a tag field):
typedef struct {
    PyObject ob_base;        // inherits ob_refcnt and ob_type
    Py_ssize_t ob_size;      // number of digits; its sign is the sign of the integer
    digit ob_digit[1];       // the magnitude, as 30-bit digits on typical builds
} PyLongObject;
// A list extends PyObject:
typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size;      // number of items
    PyObject **ob_item;      // pointer to array of PyObject pointers
    Py_ssize_t allocated;    // allocated capacity
} PyListObject;
Every Python object - integers, strings, lists, functions, classes, modules - is a PyObject at its base. The first field is always the reference count. The second is always the type pointer. Type-specific data follows.
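The header can be read back from Python with ctypes. This is a CPython-only sketch that assumes a default (non-debug, GIL-enabled) build, where `ob_refcnt` sits at offset 0 and `ob_type` is the pointer immediately after it; other builds lay the header out differently.

```python
import ctypes

x = []
addr = id(x)   # CPython-specific: the address of the object's header

# Assumption: Py_ssize_t ob_refcnt at offset 0, then the ob_type pointer.
refcnt = ctypes.c_ssize_t.from_address(addr).value
type_ptr = ctypes.c_void_p.from_address(
    addr + ctypes.sizeof(ctypes.c_ssize_t)
).value

print(refcnt)                      # a small positive number
print(type_ptr == id(type(x)))     # True: ob_type points at the list type object
```

Reading memory this way is for exploration only; writing to these addresses can crash the interpreter.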
:::tip id(obj) is the Memory Address in CPython
In CPython, id(obj) returns the memory address of the object's PyObject struct. This is why two objects with the same value but different identities have different id() values:
a = [1, 2, 3]
b = [1, 2, 3]
print(id(a)) # e.g. 140234567890
print(id(b)) # e.g. 140234567950 - different address, different object
print(a == b) # True - same value
print(a is b) # False - different objects
This is a CPython implementation detail. The language specification only guarantees that id() is unique for the lifetime of an object - not that it is a memory address.
:::
sys.getrefcount() - Inspecting Reference Counts
import sys
x = []
print(sys.getrefcount(x)) # 2 - one for x, one for the getrefcount argument
y = x
print(sys.getrefcount(x)) # 3 - x, y, and the argument
def use(obj):
    print(sys.getrefcount(obj)) # 4 inside the call - x, y, argument to use(), argument to getrefcount()
use(x)
print(sys.getrefcount(x)) # back to 3 after use() returns
:::warning sys.getrefcount() Always Returns One Higher Than You Expect
The argument passed to sys.getrefcount() itself creates a temporary reference, so the count is always at least 1 higher than the "real" count. This is expected behaviour - account for it when interpreting results.
:::
Part 5 - The Integer Cache and String Interning
Integer Cache: -5 to 256
At CPython startup, the interpreter pre-allocates PyLongObject structs for every integer from -5 to 256. These are stored in a static array and reused for every reference to these values:
# Integers in the cached range share identity
a = 100
b = 100
print(a is b) # True - same PyLongObject in memory
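# Integers outside the cache are allocated on demand
a = int("1000") # built at runtime, so the compiler cannot merge the two constants
b = int("1000")
print(a is b) # False - two separate PyLongObject allocations
The range -5 to 256 covers the integer values that appear most often in typical programs (loop counters, small indices, byte values), so caching them avoids a large number of tiny allocations. Note that two equal literals in the same file, such as a = 1000 and b = 1000, usually share one constant object, so a is b can be True in a script and False at the REPL. That is one more reason not to rely on integer identity.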
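The cache boundaries can be found empirically. The sketch below builds each value twice at runtime and checks identity; `shares_identity` is a hypothetical helper, and the printed range is what CPython 3.12 produces, not something the language promises.

```python
def shares_identity(n: int) -> bool:
    # Parse the value twice at runtime so the compiler cannot merge constants.
    a = int(str(n))
    b = int(str(n))
    return a is b

cached = [n for n in range(-20, 400) if shares_identity(n)]
print(cached[0], cached[-1])   # -5 256 on CPython 3.12
```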
:::danger Never Use is to Compare Integer Values
Integer identity (is) only works reliably for -5 to 256 because of the cache. For any other integers, is tests object identity, not value equality. Always use == for value comparison:
# DO NOT DO THIS:
user_age = get_age_from_database() # might return 257
if user_age is 257: # SyntaxWarning in 3.8+, may silently fail
    ...
# ALWAYS DO THIS:
if user_age == 257:
    ...
Python 3.8+ issues a SyntaxWarning for is comparisons with literals precisely because of this footgun.
:::
String Interning
CPython also interns certain strings - stores a single canonical copy and reuses it for all references to the same value. This is called string interning.
String constants in source code that look like identifiers (only ASCII letters, digits, and underscores) are interned automatically at compile time; strings built at runtime are not:
a = "hello"
b = "hello"
print(a is b) # True - automatically interned
a = "hello world" # contains a space
b = "hello world"
print(a is b) # May be True or False - implementation-dependent
import sys
a = sys.intern("hello world") # force interning
b = sys.intern("hello world")
print(a is b) # True - both point to the interned copy
String interning is used heavily in CPython for dictionary key lookups (most dictionary keys are identifier-like strings) and attribute access.
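The difference between a compile-time constant and a string built at runtime shows up as soon as the two are compared by identity. A minimal sketch, assuming CPython; the variable names are illustrative.

```python
import sys

literal = "hello"                  # identifier-like constant, interned at compile time
built = "".join(["hel", "lo"])     # equal value, but created at runtime

print(built == literal)            # True: same value
print(built is literal)            # False on CPython: a separate object

interned = sys.intern(built)       # returns the canonical copy for this value
print(interned is literal)         # True on CPython
```

Because `sys.intern()` returns the canonical object rather than modifying its argument, the result has to be kept and used in place of the original string.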
Part 6 - CPython's Small Object Allocator (pymalloc)
Why a Custom Allocator?
Python programs create and destroy vast numbers of small objects - integers, tuples, strings, function argument lists. The system malloc/free cycle is too slow for this volume. CPython uses a custom allocator called pymalloc for objects up to 512 bytes.
The Three-Level Hierarchy
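- Arena: a large chunk obtained from the OS - 1 MiB on 64-bit builds since CPython 3.10, 256 KiB on 32-bit builds and in earlier versions. CPython maintains a list of arenas. Arenas are freed back to the OS only when entirely empty.
- Pool: a fixed-size slice of an arena - 16 KiB when arenas are 1 MiB, 4 KiB otherwise. Each pool is dedicated to a single size class - all blocks within a pool are the same size (e.g., all 32-byte blocks). Size classes are spaced 16 bytes apart on 64-bit builds (16, 32, ..., 512 bytes: 32 classes) and 8 bytes apart on 32-bit builds (64 classes).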
- Block: a single allocation unit within a pool. When you create a Python object, pymalloc finds a free block of the appropriate size class and returns it.
Why This Is Fast
- No fragmentation within a pool: all blocks are the same size, so any freed block can immediately serve the next allocation of that size class.
- No system call overhead: allocations are served from pre-allocated arenas - no mmap or sbrk per object.
- Cache-friendly: objects of similar size tend to be co-located in the same pool, improving CPU cache utilisation.
Objects larger than 512 bytes bypass pymalloc entirely and go directly to the system allocator.
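The size-class rule is simple rounding. The sketch below is a hypothetical model, not pymalloc's code: `size_class` and its `alignment` parameter are assumptions, where alignment is the spacing between size classes (16 matches 64-bit builds since CPython 3.8, 8 matches older and 32-bit builds).

```python
import sys

SMALL_REQUEST_THRESHOLD = 512   # bytes; larger requests go to the system allocator

def size_class(nbytes: int, alignment: int = 16):
    """Hypothetical helper: block size pymalloc would serve, or None if it would not."""
    if nbytes <= 0 or nbytes > SMALL_REQUEST_THRESHOLD:
        return None
    return -(-nbytes // alignment) * alignment   # round up to a multiple of alignment

# sys.getsizeof is used here as an approximation of the allocation request size.
for obj in (0, 1.5, (1, 2, 3), "x" * 1000):
    n = sys.getsizeof(obj)
    print(type(obj).__name__, n, size_class(n))
```

The 1000-character string reports `None`: it is too large for pymalloc and would be served by the system allocator.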
Part 7 - Alternative Implementations
When to Consider Something Other Than CPython
PyPy is the most mature alternative. It compiles Python to machine code using a Just-In-Time compiler:
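# PyPy's JIT can make this kind of tight numerical loop many times faster:
def compute():
    total = 0
    for i in range(10_000_000):
        total += i * i
    return total
As a rough order of magnitude, CPython runs a loop like this at around 10 million iterations per second on current desktop hardware, while PyPy can reach 100 million or more once the JIT has compiled the loop to native machine code. Exact figures depend on the hardware and on the interpreter version.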
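Throughput figures like these are worth measuring on your own machine rather than taking on trust. A minimal timing sketch follows; the loop size is reduced to keep the run short, and the `n` parameter is an addition for that purpose.

```python
import time

def compute(n: int = 1_000_000) -> int:
    total = 0
    for i in range(n):
        total += i * i
    return total

start = time.perf_counter()
result = compute()
elapsed = time.perf_counter() - start

print(f"{1_000_000 / elapsed:,.0f} iterations/sec")
```

Running the same file under CPython and PyPy gives a direct comparison; for PyPy, use a longer loop or repeat the call so the JIT warm-up is amortised.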
When to use PyPy:
- CPU-bound numerical computation
- Long-running processes where JIT warmup is amortised
- When you cannot use NumPy/C extensions for the hot path
When to stay with CPython:
- You rely on C extensions that PyPy does not support, or that run slowly through its C-API compatibility layer
- You need the latest Python version immediately (PyPy lags by ~6–18 months)
- Your code is mostly I/O-bound (the GIL releases during I/O; CPython and PyPy are similar)
MicroPython for embedded systems - STM32, ESP32, Raspberry Pi Pico:
# MicroPython on an ESP32 microcontroller
from machine import Pin
import time
led = Pin(2, Pin.OUT)
while True:
    led.toggle()
    time.sleep_ms(500)
MicroPython is designed to fit in about 256 KB of code space and 16 KB of RAM. It implements a subset of Python 3's syntax and a small subset of the standard library.
:::danger Never Write Code That Depends on CPython Implementation Details Not in the Language Spec
The integer cache, string interning, id() returning a memory address, the specific layout of __code__ attributes, the .pyc format - none of these are guaranteed by the Python language specification. Code that relies on them will break on PyPy, Jython, or future CPython versions.
# FRAGILE - do not do this:
assert a is b # assumes integer cache
# CORRECT - language-spec guaranteed:
assert a == b # value equality is always correct
# FRAGILE - do not do this:
address = id(obj) # assume it is a memory address
# CORRECT - language-spec guaranteed:
# id() is unique for the object's lifetime; that's all you can rely on
The one exception: code inside CPython itself, or code in C extensions that explicitly targets CPython. There, knowing the implementation is necessary.
:::
Part 8 - Practical Implications
Why Local Variables Are Faster Than Global Variables
import dis
def fast(n):
    total = 0
    for i in range(n):
        total += i
    return total
dis.dis(fast)
# LOAD_FAST (total, i) - fast: index into local variable array
# vs
# LOAD_GLOBAL - slower: dict lookup in globals/builtins
LOAD_FAST is an array index operation on the frame's fastlocals array - O(1) with a single C array access. LOAD_GLOBAL is a dict lookup, which is also O(1) but involves hashing and is slower than a direct array index.
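Which opcodes a function uses can also be checked programmatically instead of reading the `dis` listing by eye. The sketch below assumes CPython; newer versions emit variants such as combined or borrowed forms of `LOAD_FAST`, so it matches on the prefix.

```python
import dis

def fast(n):
    total = 0
    for i in range(n):
        total += i
    return total

opnames = {ins.opname for ins in dis.get_instructions(fast)}
local_loads = sorted(name for name in opnames if name.startswith("LOAD_FAST"))

print(local_loads)                 # LOAD_FAST and, on newer versions, related variants
print("LOAD_GLOBAL" in opnames)    # True: range is looked up in globals/builtins
```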
This is why experienced Python developers sometimes write:
def hot_loop(data):
    # Pull the built-in into a local for the hot path
    _len = len # local reference to the built-in
    total = 0
    for item in data:
        total += _len(item) # LOAD_FAST, not LOAD_GLOBAL
    return total
This is a micro-optimisation, and since CPython 3.11 the specialising interpreter caches global lookups, so the gain is usually small. Profile before doing it.
Why __pycache__ Exists
import importlib.util
import marshal
# Read a .pyc file manually (assumes mymodule.py has been imported at least once)
pyc_path = importlib.util.cache_from_source("mymodule.py") # __pycache__/mymodule.cpython-312.pyc on 3.12
with open(pyc_path, "rb") as f:
    magic = f.read(4) # Python version magic number
    flags = f.read(4) # bit flags (0 = timestamp-based)
    timestamp = f.read(4) # source modification time (hash-based files store a source hash in these 8 bytes)
    size = f.read(4) # source file size
    code = marshal.loads(f.read()) # the actual code object
print(code.co_filename)
print(code.co_consts)
The .pyc file means Python skips the tokeniser, parser, and compiler on every subsequent import. For large applications with hundreds of modules, this is a meaningful startup time saving.
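The whole round trip (compile to a .pyc, read it back, run the code object) fits in a few lines. This sketch writes a throwaway module into a temporary directory; the file name `mymodule.py` and its contents are made up for the example.

```python
import importlib.util
import marshal
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "mymodule.py"       # hypothetical module
    source.write_text("ANSWER = 6 * 7\n")

    # Writes the .pyc under __pycache__ and returns its path.
    pyc_path = py_compile.compile(str(source), doraise=True)

    with open(pyc_path, "rb") as f:
        header = f.read(16)               # magic, flags, then two 4-byte fields
        code = marshal.loads(f.read())    # the module's code object

magic_ok = header[:4] == importlib.util.MAGIC_NUMBER
namespace = {}
exec(code, namespace)                     # run the cached code object directly
print(magic_ok, namespace["ANSWER"])      # True 42
```

The magic number check is what stops one Python version from loading bytecode written by another.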
Key Takeaways
- CPython is the reference implementation of Python, written in C. PyPy, Jython, and MicroPython are alternative implementations with different performance characteristics and tradeoffs
- Every Python object in memory begins with ob_refcnt (reference count) and ob_type (pointer to type) - this is the PyObject header; type-specific data follows
- The execution pipeline is: source → tokeniser → parser → AST → compiler → bytecode → .pyc cache → eval loop
- The eval loop in ceval.c is a giant switch on opcodes, operating on a value stack within a frame object
- CPython pre-allocates integers -5 to 256 at startup - references to these values always point to the same objects; never use is to compare integers (use ==)
- String interning reuses a single canonical object for identifier-like strings; sys.intern() forces interning for other strings
- id(obj) returns the memory address in CPython; this is an implementation detail, not a language guarantee
- sys.getrefcount(obj) shows the reference count (always +1 due to the argument reference); use this to debug reference leaks
- pymalloc is CPython's custom small object allocator: arenas → pools (one size class each) → blocks (up to 512 bytes); faster than system malloc for small objects
- LOAD_FAST (local variable) is faster than LOAD_GLOBAL (module-level lookup) because it is a direct array index vs a dict lookup
Graded Practice Challenges
Level 1 - Predict the Output
Question 1: What does this print?
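a = -5
b = -5
print(a is b)
c = int("-6") # built at runtime so the compiler cannot merge the two constants
d = int("-6")
print(c is d)
Show Answer
Output:
True
False
-5 is within the integer cache range (-5 to 256), so a and b point to the same pre-allocated PyLongObject. -6 is outside the cache range, so each int("-6") call allocates a new object and c and d have different identities. With plain literals (c = -6 and d = -6 in the same file) the compiler usually stores a single constant object, so that version prints True when run as a script and False when typed line by line at the REPL.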
Question 2: What does this print?
import sys
x = "hello"
y = "hello"
print(x is y)
a = sys.intern("hello world")
b = sys.intern("hello world")
print(a is b)
Show Answer
Output:
True
True
"hello" is an identifier-like string and is automatically interned by CPython, so x is y is True. "hello world" contains a space and would not normally be interned, but sys.intern() forces interning, so a is b is True after both calls.
Question 3: What does this print?
import sys
x = []
print(sys.getrefcount(x))
y = x
z = x
print(sys.getrefcount(x))
del y
print(sys.getrefcount(x))
Show Answer
Output:
2
4
3
After x = []: refcount is 2 (one for x, one for the getrefcount argument). After y = x; z = x: refcount is 4 (x, y, z, plus getrefcount argument). After del y: refcount is 3 (x, z, plus getrefcount argument). Each getrefcount call itself adds 1.
Question 4: What does this print?
x = []
print(type(x))
print(type(type(x)))
print(type(type(type(x))))
print(type(x) is type(type(x)))
Show Answer
Output:
<class 'list'>
<class 'type'>
<class 'type'>
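False
type(x) is list (the list class). type(list) is type (the metaclass). type(type) is also type - type is its own metaclass. At the C level, PyTypeObject for type has its ob_type pointing back to itself. The last line compares type(x), which is list, with type(type(x)), which is type; those are different objects, so it prints False. The chain only becomes self-referential one level up: type(type(x)) is type(type(type(x))) is True.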
Question 5: What does this print?
import sys
a = 256
b = 256
print(id(a) == id(b))
print(id(a) is id(b))
Show Answer
Output:
True
False
id(a) and id(b) return the same integer value (the memory address of the cached 256 object), so id(a) == id(b) is True. However, id(a) is id(b) compares whether the two return values are the same object. Each call to id() returns an integer, and that integer (the address value) is likely larger than 256 - so it is outside the cache range, and each id() call creates a fresh PyLongObject. Therefore is is False.
Level 2 - Debug Challenge
Find and explain the bug. This code is supposed to count references to track whether an object is being held somewhere unexpectedly:
import sys
def check_refs(obj):
    count = sys.getrefcount(obj)
    if count > 2:
        print(f"WARNING: {count} references to object - possible leak")
    else:
        print(f"OK: {count} references")
my_data = {"key": "value"}
check_refs(my_data) # developer expects "OK: 2 references" - prints what?
cache = {}
cache["data"] = my_data
check_refs(my_data) # developer expects "WARNING: 3 references" - prints what?
Show Solution
The bug: the threshold > 2 ignores one reference. Inside check_refs, the object is referenced by the caller's variable (my_data), by the function's own parameter (obj), and by the temporary argument to sys.getrefcount(). On CPython 3.12 the first call therefore sees a count of 3 and prints "WARNING: 3 references to object - possible leak", even though nothing unexpected holds the object. The second call sees 4 (the three above plus cache["data"]) and prints "WARNING: 4 references to object - possible leak".
The developer accounted for the getrefcount +1 but not for the +1 added by wrapping the call in a helper function. The baseline for "held only by the caller" inside check_refs is 3, not 2, so the check warns on every call and tells you nothing.
Corrected and clarified version:
import sys
def check_refs(obj, expected_holders: int = 1):
    """
    Check reference count.
    expected_holders: how many places you expect to hold a reference
    (not counting this function's own parameter).
    Inside this function two extra references always exist: the obj
    parameter and the temporary argument to sys.getrefcount.
    """
    actual = sys.getrefcount(obj)
    # actual = expected_holders + 2 (obj parameter + getrefcount arg) if no extra refs
    normal = expected_holders + 2
    if actual > normal:
        extra = actual - normal
        print(f"WARNING: {extra} unexpected extra reference(s) - possible leak")
    else:
        print(f"OK: refcount={actual} (expected {normal})")
my_data = {"key": "value"}
check_refs(my_data, expected_holders=1) # OK: refcount=3 (my_data + obj + arg)
cache = {}
cache["data"] = my_data
check_refs(my_data, expected_holders=2) # OK: refcount=4 (my_data + cache + obj + arg)
check_refs(my_data, expected_holders=1) # WARNING: 1 unexpected extra reference(s)
Level 3 - Design Challenge
Design a RefTracker context manager that:
- Records the reference count of a target object on entry
- Records the reference count on exit
- Warns if the reference count on exit is higher than on entry (indicating a reference was not released)
- Works correctly accounting for the sys.getrefcount +1 artefact
- Is usable in a test suite to assert no reference leaks in a code block
# Target usage:
data = {"key": "value"}
with RefTracker(data) as tracker:
process(data) # some function that might or might not hold a reference
tracker.assert_no_leak() # raises AssertionError if refcount grew
Show Reference Solution
import sys
class RefTracker:
    """
    Context manager to detect reference count leaks.
    Usage:
        with RefTracker(obj) as tracker:
            do_something(obj)
        tracker.assert_no_leak()
    """
    def __init__(self, obj):
        self._obj = obj
        self.count_before = None
        self.count_after = None
    def __enter__(self):
        # sys.getrefcount adds 1 for its own argument.
        # We subtract 1 to get the "real" count at this point.
        # We also subtract 1 for the RefTracker.__init__ storing self._obj.
        # The net: count_before = getrefcount - 2
        self.count_before = sys.getrefcount(self._obj) - 2
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        # On exit, self._obj still holds a reference (+1 we account for).
        # getrefcount argument adds another +1.
        # Net: count_after = getrefcount - 2
        self.count_after = sys.getrefcount(self._obj) - 2
        return False # do not suppress exceptions
    @property
    def delta(self):
        """Net change in reference count."""
        if self.count_before is None or self.count_after is None:
            raise RuntimeError("RefTracker must be used as a context manager")
        return self.count_after - self.count_before
    def assert_no_leak(self):
        if self.delta > 0:
            raise AssertionError(
                f"Reference leak detected: refcount grew by {self.delta}. "
                f"Before: {self.count_before}, After: {self.count_after}. "
                f"Something is holding a reference to the object."
            )
    def __repr__(self):
        return (
            f"RefTracker(before={self.count_before}, "
            f"after={self.count_after}, delta={self.delta})"
        )
# Example: a function that correctly releases references
def safe_process(obj):
    local_ref = obj # creates a reference
    result = str(local_ref)
    # local_ref goes out of scope when the function returns - reference released
    return result
# Example: a function that leaks (holds reference after return)
_cache = []
def leaky_process(obj):
    _cache.append(obj) # holds a reference forever
# Usage in tests:
data = {"key": "value"}
with RefTracker(data) as tracker:
    safe_process(data)
tracker.assert_no_leak() # passes - safe_process released its ref
print(tracker) # RefTracker(before=1, after=1, delta=0)
# Drop the first tracker before reusing the name. It still holds a reference to data,
# which would be counted on entry and released on rebinding, hiding the leak below.
del tracker
with RefTracker(data) as tracker:
    leaky_process(data)
try:
    tracker.assert_no_leak() # raises AssertionError
except AssertionError as e:
    print(f"Caught: {e}")
# Caught: Reference leak detected: refcount grew by 1. Before: 1, After: 2. ...
Key design decisions:
- The -2 offset in __enter__ and __exit__ accounts for: the sys.getrefcount argument (+1) and self._obj inside RefTracker (+1). This normalises to the "external" reference count.
- __exit__ returns False so that exceptions propagate normally - the tracker does not swallow errors.
- assert_no_leak() is separate from __exit__ so you can inspect tracker.delta in different ways in test code.
- In pytest, you would typically call tracker.assert_no_leak() at the end of the test rather than in __exit__, so that test assertions still run even if the block raised.
What's Next
Lesson 02 covers Bytecode Inspection - the code object and all its attributes, how .pyc files are structured on disk, and how to read bytecode with the marshal module. You have seen in this lesson that the compiler produces bytecode and the eval loop executes it. Now you will look inside the bytecode itself: what fields does a code object carry, what do the raw bytes mean, and how does CPython map them back to your source file for tracebacks?
