
Execution Model in Practice - Source to Bytecode to PVM

Reading time: ~25 minutes | Level: Foundation → Engineering

Consider this snippet. What do you think happens when Python "runs" it?

```python
def add(a, b):
    return a + b

result = add(3, 4)
print(result)
```

Most people answer: "Python reads the code top to bottom and executes it." That answer is incomplete enough to cause real bugs. Python does not simply "read and run." It transforms your source text through a structured 5-stage pipeline before a single instruction executes. The pipeline determines when errors are caught, why imports behave the way they do, how scope lookup works, and why the mutable default argument trap exists. Engineers who understand this pipeline debug faster, write better code, and answer interview questions confidently.

What You Will Learn

  • The complete 5-stage execution pipeline: source → tokenizer → parser/AST → compiler → bytecode → PVM
  • How to inspect bytecode using the dis module and what each instruction means
  • What a stack frame contains, its lifetime, and how the call stack operates as LIFO
  • The architectural distinction between heap-allocated objects and stack-frame references
  • What .pyc files are, where they live, and when CPython regenerates them
  • Why if __name__ == "__main__" works and what it has to do with the execution model
  • Exactly what happens during import, why modules execute only once, and how sys.modules caches them
  • The difference between syntax errors (caught at parse time) and runtime errors (caught during PVM execution)
  • The LEGB rule as a frame-lookup algorithm, not merely a conceptual model
  • The CPython Global Interpreter Lock (GIL) - what it protects and its consequence for threads
  • Pitfalls: NameError from execution order and the mutable default argument trap

Prerequisites

  • Python installed (3.10+ recommended)
  • Familiarity with defining and calling functions
  • Basic understanding of variables and assignment

The 5-Stage Execution Pipeline

When you run python script.py, CPython executes five distinct stages. Each stage has a specific input and output. Understanding the boundary between stages is the key to diagnosing errors correctly.

Stage 1 - Source Reading

CPython reads the .py file as raw bytes, applying the encoding declaration (# -*- coding: utf-8 -*-) or defaulting to UTF-8. This stage is trivial - it simply loads text into memory. No analysis happens here.
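In CPython the decoding actually happens as part of the tokenizer, but the standard library exposes the same detection rules through tokenize.detect_encoding(). The sketch below uses an illustrative helper name, source_encoding, to show the UTF-8 default, a coding declaration, and a UTF-8 byte-order mark. Note that the returned names are normalized, so "latin-1" comes back as "iso-8859-1".

```python
import io
import tokenize

def source_encoding(raw: bytes) -> str:
    # detect_encoding() inspects at most the first two lines for a BOM or a coding cookie
    encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return encoding

print(source_encoding(b"x = 1\n"))                             # utf-8 (the default)
print(source_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))  # iso-8859-1 (normalized name)
print(source_encoding(b"\xef\xbb\xbfx = 1\n"))                 # utf-8-sig (BOM present)
```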

Stage 2 - Tokenization (Lexing)

The tokenizer (lexer) converts the raw character stream into a flat sequence of tokens. A token is a classified unit: NAME (identifier), NUMBER (literal), OP (operator), NEWLINE, INDENT, DEDENT, and so on. You can inspect tokens directly:

```python
import io
import token
import tokenize

source = "x = 10 + 5"
tokens = tokenize.generate_tokens(io.StringIO(source).readline)
for tok in tokens:
    # token.tok_name maps the numeric token type (its value differs between versions) to a stable name
    print(token.tok_name[tok.type], repr(tok.string))
```

```
NAME 'x'
OP '='
NUMBER '10'
OP '+'
NUMBER '5'
NEWLINE ''
ENDMARKER ''
```

Syntax errors at this stage are SyntaxError: invalid character or similar - the tokenizer rejects characters that cannot form valid tokens.

Stage 3 - Parsing into an Abstract Syntax Tree (AST)

The parser consumes the token stream and builds an Abstract Syntax Tree (AST) - a hierarchical data structure that represents the grammatical structure of your program. The AST strips away formatting details (whitespace, comments) and retains only the semantically meaningful structure.

```python
import ast

source = "x = 10 + 5"
tree = ast.parse(source)
print(ast.dump(tree, indent=2))
```

Output (Python 3.12; the exact layout varies slightly between versions):

```
Module(
  body=[
    Assign(
      targets=[
        Name(id='x', ctx=Store())],
      value=BinOp(
        left=Constant(value=10),
        op=Add(),
        right=Constant(value=5)))],
  type_ignores=[])
```

Syntax errors at this stage are the classic SyntaxError: invalid syntax - a mismatched parenthesis, a missing colon after def, or an invalid expression that the grammar does not accept. These errors are caught before any code executes.
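Because compilation (stages 2-4) must finish before stage 5 begins, a syntax error anywhere in a piece of source stops every line of it from running, including the lines above the error. In this small sketch, exec() receives each snippet as a string, and the first line of each snippet would record in the ran list that it had executed.

```python
ran = []  # the first line of each snippet appends here if it ever executes

samples = {
    # Tokenizer-level error: the string literal on line 2 is never closed
    "tokenizer": 'ran.append("tok")\nx = "unterminated\n',
    # Parser-level error: every token is valid, but the grammar rejects "def broken(:"
    "parser": 'ran.append("parse")\ndef broken(:\n    pass\n',
}

error_lines = {}
for stage, src in samples.items():
    try:
        exec(src, {"ran": ran})  # exec() must compile the whole string before running any of it
    except SyntaxError as exc:
        error_lines[stage] = exc.lineno
        print(f"{stage}: line {exc.lineno}: {exc.msg}")

print("statements executed:", ran)  # [] - compilation failed before line 1 could run
```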

Stage 4 - Bytecode Compilation

The compiler walks the AST and emits bytecode: a sequence of compact instructions understood by the Python Virtual Machine. Bytecode is platform-independent but specific to each Python version. Since Python 3.6, each instruction occupies a 2-byte code unit (opcode + argument), and since 3.11 some instructions are also followed by inline cache entries. Bytecode is not machine code. It is an intermediate representation that trades execution speed for portability and simplicity.

The compiled bytecode is packaged into a code object (a PyCodeObject in CPython's C source), which contains:

  • The bytecode instruction sequence
  • Constants referenced by the code
  • Names of variables used
  • Metadata: filename, line number table, argument counts
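The built-in compile() runs stages 2-4 on a string and returns the code object directly, so you can inspect its attributes before anything executes. The example also shows an optimization in CPython's compiler. It evaluates the constant expression 60 * 60 * 24 ahead of time (constant folding), so the constants contain the result 86400.

```python
code = compile("seconds = 60 * 60 * 24", "<demo>", "exec")  # stages 2-4 in one call

print(type(code).__name__)      # code
print(code.co_filename)         # <demo>
print(code.co_names)            # ('seconds',) - names the bytecode refers to
print(86400 in code.co_consts)  # True - the product was folded at compile time
```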

Stage 5 - Execution by the Python Virtual Machine (PVM)

The PVM is a stack-based interpreter implemented in C (for CPython). It reads bytecode instructions one at a time, manipulates a value stack, creates and destroys stack frames, and manages the call stack. The PVM is a tight loop (ceval.c in CPython's source) that dispatches on each opcode.
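The standard library lets you drive the pipeline one stage at a time. ast.parse() covers tokenizing and parsing, compile() turns the AST into a code object, and exec() hands that code object to the PVM. In this minimal sketch, a plain dict stands in for the module namespace.

```python
import ast

source = """
total = sum(range(5))
label = f"total={total}"
"""

tree = ast.parse(source)                    # stages 2-3: source text -> tokens -> AST
code = compile(tree, "<pipeline>", "exec")  # stage 4: AST -> code object (bytecode)
namespace = {}
exec(code, namespace)                       # stage 5: the PVM runs the bytecode

print(namespace["total"], namespace["label"])  # 10 total=10
```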

note

CPython is the reference implementation. Other implementations (PyPy, Jython, MicroPython) compile to different targets but share the same execution model conceptually. PyPy uses a JIT compiler that re-compiles hot bytecode to native machine code at runtime.

Inspecting Bytecode with dis

The dis module exposes the bytecode compilation stage. It is an essential engineering tool for understanding performance, debugging closures, and demystifying Python's behavior.

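```python
import dis

# A small function to disassemble
def add(a, b):
    result = a + b
    return result

dis.dis(add)
```

Output (Python 3.12; opcodes and offsets differ in other versions):

```
  4           0 RESUME                   0

  5           2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (b)
              6 BINARY_OP                0 (+)
             10 STORE_FAST               2 (result)

  6          12 LOAD_FAST                2 (result)
             14 RETURN_VALUE
```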

Reading the Output

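Each row in the dis output has four columns:

```
LINE_NUM  OFFSET  OPCODE     ARGUMENT (name)
5         2       LOAD_FAST  0 (a)
```

  • LINE_NUM: the source line that generated this instruction (printed only on the first instruction of each line)
  • OFFSET: byte offset of the instruction within the bytecode string. Each instruction occupies a 2-byte code unit since Python 3.6. From 3.11, some instructions such as BINARY_OP are followed by hidden inline cache entries, which is why the offset jumps from 6 to 10.
  • OPCODE: the operation name
  • ARGUMENT: numeric argument to the opcode, with a human-readable name in parentheses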
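dis.get_instructions() exposes the same columns programmatically, yielding one Instruction object per row. This short sketch prints the offset, opcode name, and argument for the add function, and collects the offsets. The offsets are always even because instructions are laid out in 2-byte code units.

```python
import dis

def add(a, b):
    result = a + b
    return result

# Each Instruction carries the offset, the opcode name, and a human-readable argument
for ins in dis.get_instructions(add):
    print(f"{ins.offset:>4} {ins.opname:<20} {ins.argrepr}")

offsets = [ins.offset for ins in dis.get_instructions(add)]
```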

Key Opcodes

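| Opcode | What It Does |
|---|---|
| LOAD_FAST | Push a local variable's value onto the value stack |
| STORE_FAST | Pop the top of the value stack into a local variable slot |
| LOAD_GLOBAL | Look up a name in the module globals, then the builtins, and push its value |
| LOAD_CONST | Push a constant (number, string, None) onto the stack |
| BINARY_OP | Pop two values, apply a binary operator, push the result (Python 3.11+; earlier versions use BINARY_ADD and similar) |
| CALL | Call a callable with arguments from the stack (Python 3.11+; earlier versions use CALL_FUNCTION and related opcodes) |
| RETURN_VALUE | Return the top of the stack to the caller |
| RESUME | Marks the start (or resumption) of a code object; a no-op for your program, used for internal tracing, debugging, and optimization checks (Python 3.11+) |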
tip

Use dis.dis() on any function to understand why one version of code is faster than another. Fewer instructions generally means less interpreter overhead. Instruction count is only a rough guide, though, because expensive instructions such as function calls and attribute lookups dominate the cost.
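To see the difference between LOAD_FAST and LOAD_GLOBAL in practice, compare two small functions. The names SCALE, uses_global, uses_local, and opnames are illustrative only. Opcode names vary between versions (newer releases fuse some consecutive local loads into combined instructions), so the check only looks for LOAD_GLOBAL.

```python
import dis

SCALE = 3

def uses_global(x):
    return x * SCALE      # SCALE is never assigned here, so it compiles to LOAD_GLOBAL

def uses_local(x, scale=SCALE):
    return x * scale      # scale is a parameter, so it compiles to a fast-local load

def opnames(fn):
    return [ins.opname for ins in dis.get_instructions(fn)]

print(opnames(uses_global))
print(opnames(uses_local))
```

Binding a global to a parameter default (scale=SCALE) is an old micro-optimization, but it also freezes the value at definition time.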

Inspecting a Code Object Directly

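```python
def greet(name):
    return "Hello, " + name

code = greet.__code__
print("Argument count:", code.co_argcount)
print("Local variables:", code.co_varnames)
print("Constants:", code.co_consts)
print("Bytecode (hex):", code.co_code.hex())
```

Output (Python 3.12; the leading None in co_consts is the slot CPython reserves for a docstring, and the raw bytecode differs between versions):

```
Argument count: 1
Local variables: ('name',)
Constants: (None, 'Hello, ')
Bytecode (hex): 9700...
```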

Stack Frames - The Invisible Structure

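Every function call in Python creates a stack frame. The frame is the execution context for one invocation of one function. CPython exposes frames as PyFrameObject (what sys._getframe() returns). Since Python 3.11, it uses a lighter internal structure for each call and builds the full frame object only when something asks for it.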

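What a Stack Frame Contains

  • The code object being executed (f_code): bytecode, constants, and metadata
  • A fast-locals array: indexed slots holding the function's local variables
  • References to the module's global namespace (f_globals) and the builtins (f_builtins)
  • f_back: a reference to the calling frame, which is what tracebacks walk
  • The current instruction position (f_lasti)
  • The value stack that bytecode instructions push onto and pop from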

Frame Lifetime

A frame is created when a function is called and destroyed when the function returns (or when an exception propagates out of it). CPython works hard to make this cheap. Versions up to 3.10 reused frame objects from a free list, and since 3.11 frames are allocated from a per-thread stack of preallocated memory rather than individually.

You can inspect the current frame at runtime:

```python
import sys

def show_frame():
    frame = sys._getframe()
    print("Function:", frame.f_code.co_name)
    print("File:    ", frame.f_code.co_filename)
    print("Line:    ", frame.f_lineno)
    print("Locals:  ", frame.f_locals)

show_frame()
```

The Call Stack is LIFO

When functions call other functions, frames stack up. The call stack is a Last In, First Out (LIFO) structure.

```python
def a():
    b()

def b():
    c()

def c():
    import traceback
    traceback.print_stack()

a()
```

Output (written to stderr, with the file saved as demo.py):

```
  File "demo.py", line 11, in <module>
    a()
  File "demo.py", line 2, in a
    b()
  File "demo.py", line 5, in b
    c()
  File "demo.py", line 9, in c
    traceback.print_stack()
```

Frames are pushed on call, popped on return. When c() returns, its frame is destroyed and execution resumes in b()'s frame exactly where it left off - this is tracked by f_lasti (last instruction index).
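Each frame's f_back attribute points at the frame that called it, so walking those links from the innermost frame reproduces the LIFO call stack. This sketch uses sys._getframe(), which is a CPython implementation detail: the leading underscore signals that other implementations need not provide it. The helper name call_chain is illustrative.

```python
import sys

def call_chain():
    """Walk f_back links from the caller's frame out to the outermost frame."""
    frame = sys._getframe(1)  # 1 = skip call_chain's own frame
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back  # the frame that called this one
    return names

def a():
    return b()

def b():
    return c()

def c():
    return call_chain()

print(a())  # starts with ['c', 'b', 'a', ...]: innermost (most recent) frame first
```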

warning

Python's default recursion limit is 1000 frames. Deeply recursive algorithms will raise RecursionError: maximum recursion depth exceeded. Check the limit with sys.getrecursionlimit() and change it with sys.setrecursionlimit(), but prefer iterative solutions. functools.lru_cache removes repeated calls in recursive problems, but it does not reduce the maximum recursion depth.
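The limit counts frames, so a recursive function whose depth grows with its input fails once the input is large enough, while an iterative version uses a single frame. Here is a sketch with two illustrative functions:

```python
import sys

def sum_recursive(n):
    # One new frame per call: recursion depth grows linearly with n
    return 0 if n == 0 else n + sum_recursive(n - 1)

def sum_iterative(n):
    # A single frame, no matter how large n is
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

limit = sys.getrecursionlimit()
print("recursion limit:", limit)  # 1000 unless something has changed it

try:
    sum_recursive(limit * 2)
    hit_limit = False
except RecursionError as exc:
    hit_limit = True
    print("RecursionError:", exc)

print(sum_iterative(limit * 2))
```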

Heap vs Stack - Where Objects Actually Live

This distinction trips up developers coming from languages with value semantics (C, Go, Rust).

Every Python object - int, str, list, dict, your custom class instance, even True and None - lives on the heap. A variable name in a frame is merely a reference (a pointer) to a heap object. Assignment (x = 42) does not store 42 in the frame; it stores a reference to the int object on the heap that holds 42.

This is why:

```python
a = [1, 2, 3]
b = a          # b is another reference to the SAME list object
b.append(4)
print(a)       # [1, 2, 3, 4] - a and b point to the same heap object
```

And why reassignment does not mutate:

```python
a = [1, 2, 3]
b = a
b = [9, 9, 9]  # b now points to a NEW list object on the heap
print(a)       # [1, 2, 3] - a is unchanged
```
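The same rule applies to function arguments. A call binds the parameter name in the new frame to the caller's object, so a mutation is visible to the caller but a rebinding is not. Here is a sketch with two illustrative functions:

```python
def rebind(items):
    items = [0]       # rebinds the callee's local name only; the caller's list is untouched

def mutate(items):
    items.append(0)   # mutates the shared heap object; the caller sees the change

data = [1, 2, 3]
alias = data          # a second reference to the same list

rebind(data)
print(data)           # [1, 2, 3]
mutate(data)
print(data)           # [1, 2, 3, 0]
print(alias is data)  # True: one object, two names
```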

.pyc Files and __pycache__

CPython caches the bytecode compilation result to avoid recompiling unchanged source files on every run.

Where They Live

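```shell
# After running: python main.py   (where main.py contains "import mymodule")
ls __pycache__/
mymodule.cpython-312.pyc
```

Only imported modules are cached this way. The file you run directly (main.py here) is compiled on every run and is never written to __pycache__.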

The naming convention is <module>.cpython-<version>.pyc. The cpython-312 tag means this bytecode was compiled by CPython 3.12. Different Python versions produce incompatible bytecode, so each version gets its own .pyc file.
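The import system computes this path with importlib.util.cache_from_source(), and the version tag comes from sys.implementation.cache_tag. The project/mymodule.py path below is hypothetical, and the file does not need to exist.

```python
import importlib.util
import os
import sys

# Map a source path to the .pyc path the import system would use for it
pyc = importlib.util.cache_from_source(os.path.join("project", "mymodule.py"))

print(pyc)                           # e.g. project/__pycache__/mymodule.cpython-312.pyc
print(sys.implementation.cache_tag)  # e.g. cpython-312
```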

What Is Inside a .pyc File

A .pyc file contains:

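  1. A 4-byte magic number (identifies the bytecode format, which changes between Python versions)
  2. A 4-byte bit field of flags (0 for the default timestamp-based .pyc; nonzero for hash-based .pyc files, PEP 552)
  3. For timestamp-based files, the source modification time (4 bytes) followed by the source size (4 bytes); hash-based files store an 8-byte hash of the source in that space instead
  4. The marshalled code object (serialized bytecode + constants + metadata)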
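You can read the header back directly. The sketch below writes a throwaway module (mymodule.py in a temporary directory, purely for illustration) and compiles it with py_compile in timestamp mode. It then unpacks the 16-byte header as little-endian 32-bit fields and unmarshals the code object that follows. The marshal format is version-specific, so the data can only be loaded back by the same interpreter that wrote it.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile

source = b"GREETING = 'hello'\n"

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "mymodule.py")  # hypothetical module for the demo
    with open(src_path, "wb") as f:
        f.write(source)
    src_mtime = int(os.stat(src_path).st_mtime) & 0xFFFFFFFF

    # Request timestamp mode explicitly (SOURCE_DATE_EPOCH would otherwise switch to hash mode)
    pyc_path = py_compile.compile(
        src_path,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(pyc_path, "rb") as f:
        data = f.read()

magic = data[:4]
flags = int.from_bytes(data[4:8], "little")   # 0 means timestamp-based
mtime = int.from_bytes(data[8:12], "little")  # source modification time
size = int.from_bytes(data[12:16], "little")  # source size in bytes
code = marshal.loads(data[16:])               # everything after the header is the code object

namespace = {}
exec(code, namespace)

print(magic == importlib.util.MAGIC_NUMBER)  # True
print(flags, mtime == src_mtime, size)       # 0 True 19
print(namespace["GREETING"])                 # hello
```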

When Is a .pyc Regenerated?

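For the default timestamp-based .pyc files, CPython checks that the magic number matches the running interpreter. It then compares the .py file's modification timestamp and size against the values stored in the .pyc header. If anything differs, the source is recompiled and a new .pyc is written. This means: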

  • First run of a new file: always compiles
  • Unchanged file: loads .pyc directly, skipping stages 2–4
  • Modified file: recompiles automatically
tip

.pyc files belong in .gitignore. They are build artifacts and are regenerated on demand. Add **/__pycache__/ to your .gitignore.

__name__ == "__main__" - Explained via Execution Model

This is one of Python's most frequently used but least understood patterns. The execution model explains it completely.

When CPython executes a .py file directly (python script.py), it sets the module's __name__ attribute to the string "__main__" before executing any code in that file.

When the same file is imported by another module, __name__ is set to the module's name (e.g., "script" for script.py).

```python
# script.py

def train_model():
    print("Training started...")

print(f"Module name: {__name__}")

if __name__ == "__main__":
    train_model()
```

```shell
# Direct execution:
python script.py
# Output: Module name: __main__
#         Training started...

# Imported:
python -c "import script"
# Output: Module name: script
#         (train_model is NOT called)
```

This pattern exists because of how import interacts with the pipeline. Importing a module compiles it (stages 2-4) and then executes all of its top-level code (stage 5). Without the __name__ guard, importing a module would trigger its side effects (database connections, network calls, training loops).
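runpy.run_path() executes a file under whatever __name__ you choose, which makes the mechanism easy to observe without two terminal sessions. This sketch writes a throwaway script.py to a temporary directory. run_path() returns a copy of the module's globals after execution.

```python
import os
import runpy
import tempfile

SCRIPT = '''
ran_main_block = False
if __name__ == "__main__":
    ran_main_block = True
'''

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "script.py")  # hypothetical script for the demo
    with open(path, "w") as f:
        f.write(SCRIPT)

    as_main = runpy.run_path(path, run_name="__main__")  # like: python script.py
    as_import = runpy.run_path(path, run_name="script")   # like: import script

print(as_main["__name__"], as_main["ran_main_block"])      # __main__ True
print(as_import["__name__"], as_import["ran_main_block"])  # script False
```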

note

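Every Python module is a first-class object. The __name__ attribute is set by the interpreter, not by anything in your code. The import machinery sets it for imported modules, and interpreter startup sets it to "__main__" for the file you run. The __main__ module is special: it is always the entry point of execution.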

Import Execution - What Really Happens on import

The import statement is far more than a "load this file" instruction.

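The Full Import Sequence

When import mymodule executes, CPython roughly performs these steps:

  1. Check sys.modules. If "mymodule" is already there, bind that object and stop.
  2. Find the module: ask each finder on sys.meta_path for a module spec (where the module lives and which loader handles it), searching sys.path for ordinary modules. If nothing is found, raise ModuleNotFoundError.
  3. Create an empty module object from the spec and insert it into sys.modules before any of its code runs.
  4. Load the code: reuse a valid .pyc from __pycache__, or compile the source (stages 1-4).
  5. Execute the module's top-level code in the module's namespace (stage 5).
  6. Bind the name mymodule in the importing module's namespace.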
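One way to see the import sequence concretely is to replay it by hand with importlib's public API. This closely follows the "importing a source file directly" recipe in the importlib documentation. It is a simplified sketch for a single source file, not how import is implemented internally. import_step_by_step and demo_plugin are hypothetical names.

```python
import importlib.util
import os
import sys
import tempfile

def import_step_by_step(name, path):
    """Simplified replay of the import sequence for one source file."""
    # Cache check: a hit returns immediately and nothing below runs
    if name in sys.modules:
        return sys.modules[name]
    # Find: build a spec describing where the module is and which loader handles it
    spec = importlib.util.spec_from_file_location(name, path)
    # Create and register an empty module object BEFORE executing any of its code
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    try:
        # Load and execute: compile (or reuse a cached .pyc), then run the top-level code
        spec.loader.exec_module(module)
    except BaseException:
        sys.modules.pop(name, None)  # do not leave a half-initialized module cached
        raise
    return module

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_plugin.py")  # hypothetical module file
    with open(path, "w") as f:
        f.write("TOKEN = object()  # a fresh object every time this file executes\n")

    first = import_step_by_step("demo_plugin", path)
    second = import_step_by_step("demo_plugin", path)  # served from sys.modules

print(first is second, first.TOKEN is second.TOKEN)  # True True
```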

Why Modules Execute Only Once

sys.modules is a dictionary mapping module names to module objects. On the first import mymodule, the module object is registered in sys.modules["mymodule"] and its code executes. Every subsequent import mymodule anywhere in the process finds the cached object and returns it immediately. The cache check is the very first thing import does, so nothing else in the sequence runs.

```python
import sys

import os
print("os" in sys.modules)  # True

# Importing again costs almost nothing:
import os  # Returns cached module instantly
```

This has important consequences:

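```python
# module_a.py
import module_b  # module_a is already in sys.modules; module_b executes here

# module_b.py
import module_a  # finds the partially initialized module_a in sys.modules
# Names that module_a defines after its "import module_b" line do not exist yet,
# so using them here fails (AttributeError, or ImportError for "from module_a import ...")
```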
warning

Circular imports are a symptom of poor module design. If module A imports module B which imports module A, the second import gets a partially-initialized module object. Refactor by moving shared code to a third module or deferring imports to function scope.
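A partially initialized module is easy to observe. The sketch below writes two hypothetical modules, circ_a and circ_b, that import each other. It then records whether circ_b can see a name that circ_a assigns after its own import line.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical modules that import each other
    with open(os.path.join(tmp, "circ_a.py"), "w") as f:
        f.write("import circ_b\nVALUE_A = 1\n")
    with open(os.path.join(tmp, "circ_b.py"), "w") as f:
        f.write(
            "import circ_a  # gets the half-executed circ_a from sys.modules\n"
            "A_READY = hasattr(circ_a, 'VALUE_A')  # VALUE_A is not assigned yet\n"
        )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the new files are visible to the finders
    try:
        import circ_a
    finally:
        sys.path.remove(tmp)

print(circ_a.circ_b.A_READY)  # False - circ_b saw a partially initialized circ_a
print(circ_a.VALUE_A)         # 1 - circ_a finished executing afterwards
```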

Runtime Errors vs Syntax Errors - Which Stage They Come From

Understanding the pipeline tells you exactly when each class of error is detected.

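| Error Type | Stage Detected | Example |
|---|---|---|
| SyntaxError (invalid token) | Tokenizer (Stage 2) | print("hello) (unterminated string literal) |
| SyntaxError (invalid grammar) | Parser (Stage 3) | def def foo(): |
| IndentationError / TabError | Tokenizer (Stage 2) or Parser (Stage 3) | Inconsistent tabs and spaces (TabError, tokenizer); missing indented block after def (parser) |
| NameError | PVM (Stage 5) | Using an undefined variable |
| TypeError | PVM (Stage 5) | 1 + "a" |
| ZeroDivisionError | PVM (Stage 5) | x / 0 |
| AttributeError | PVM (Stage 5) | None.upper() |
| ModuleNotFoundError (a subclass of ImportError) | PVM (Stage 5), when the import statement runs | import nonexistent_module |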

The critical insight: all runtime errors occur during stage 5 (PVM execution), which means the code compiled successfully - the error is in the program's logic or data, not its structure.

```python
# This compiles fine. The error only appears when this line executes.
def risky(data):
    return data["key"]  # KeyError at runtime if "key" is missing

# This fails at parse time - no bytecode is ever generated.
# In the same file, it stops the whole file from compiling, risky() included.
def broken(:  # SyntaxError: invalid syntax
    pass
```

LEGB Scope Rule - As a Frame Lookup Algorithm

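LEGB is more than a mnemonic, but it is not a single runtime search either. The compiler classifies every name in a function ahead of time. Names assigned in the function become fast-local slots (LOAD_FAST), and names taken from an enclosing function become closure cells (LOAD_DEREF). All other names compile to LOAD_GLOBAL, which at run time searches the module's globals dict and then the builtins. Module-level and class-body code uses LOAD_NAME instead, which checks the local namespace, then globals, then builtins.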

x = "global" # G - stored in module's __dict__

def outer():
x = "enclosing" # E - stored in outer's frame

def inner():
# x = "local" # L - would be stored in inner's frame
print(x) # No local x → walks to E → finds "enclosing"

inner()

outer() # prints: enclosing
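The compiler's decisions are visible on the code object. A variable captured from an enclosing function is listed in co_freevars and read with LOAD_DEREF through a cell stored in the function's __closure__. It is never looked up by name at run time. This sketch mirrors the example above, except that outer() returns inner so it can be inspected.

```python
import dis

x = "global"

def outer():
    x = "enclosing"
    def inner():
        return x  # compiled as a free variable of inner
    return inner

inner = outer()
ops = {ins.opname for ins in dis.get_instructions(inner)}

print(inner.__code__.co_freevars)          # ('x',)
print("LOAD_DEREF" in ops)                 # True: read through a cell, not a dict lookup
print(inner.__closure__[0].cell_contents)  # enclosing
print(inner())                             # enclosing
```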

The compiler decides at compile time whether a name in a function is local or global, based on whether it is ever assigned within that function (unless a global or nonlocal declaration says otherwise). This is why you get UnboundLocalError (not NameError) when you read a name before assigning it in the same function:

```python
x = 10

def broken():
    print(x)  # UnboundLocalError! Python sees x = 20 below and marks x as local.
    x = 20    # This assignment makes x a LOCAL name for the entire function.

broken()
```

The compiler sees the assignment x = 20 and marks x as a local variable for the whole function body. When LOAD_FAST x executes before STORE_FAST x, the slot is empty - hence UnboundLocalError.
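You can check the local/global classification on the code object, where co_varnames lists every name the compiler treated as local. This sketch compares the failing function with a version that declares global x. A global declaration is one fix, though passing x in as a parameter or using a different name is often the better design.

```python
x = 10

def broken():
    print(x)  # compiled as a read of LOCAL x
    x = 20

def fixed():
    global x  # tells the compiler that x refers to the module global
    print(x)
    x = 20

print("x" in broken.__code__.co_varnames)  # True: the compiler classified x as local
print("x" in fixed.__code__.co_varnames)   # False: x is global inside fixed()

try:
    broken()
except UnboundLocalError as exc:
    caught = exc
    print(isinstance(exc, NameError), exc)  # True, plus a version-dependent message

fixed()   # prints 10
print(x)  # 20
```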

The CPython GIL - Brief Engineering Perspective

CPython's Global Interpreter Lock (GIL) is a mutex that protects the interpreter's internal state. Only one thread can execute Python bytecode at any given moment.

Why does it exist? CPython manages memory with reference counting. Every object has an ob_refcnt field. Without the GIL, two threads simultaneously incrementing/decrementing reference counts could corrupt the count, causing use-after-free or memory leaks. The GIL makes reference counting thread-safe without per-object locking overhead.

Consequence: CPU-bound multi-threaded Python programs do not gain speed from multiple cores. For CPU parallelism, use multiprocessing (separate processes, each with its own interpreter and GIL) or native extensions that release the GIL during computation (NumPy, for example, releases it during many array operations).

I/O-bound programs work fine with threads: the GIL is released during I/O waits, allowing other threads to execute.
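A quick way to see the GIL being released is to run several blocking waits on a thread pool. In this sketch, time.sleep() stands in for a real blocking I/O call. That substitution is an assumption for the demo, but sleep() also releases the GIL while waiting. Five 0.2-second waits should finish in roughly 0.2 seconds rather than 1 second.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    time.sleep(delay)  # releases the GIL while blocked, like a socket or file read
    return delay

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fake_io, [0.2] * 5))
elapsed = time.perf_counter() - start

print(f"{len(results)} waits of 0.2s finished in {elapsed:.2f}s")  # roughly 0.2s, not 1.0s
```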

note

Python 3.13 introduced an experimental "free-threaded" build (PEP 703, enabled with the --disable-gil build flag) that removes the GIL. It is opt-in, and in 3.13 it is explicitly experimental, so check its status for your Python version before relying on it in production.

Pitfalls from the Execution Model

Pitfall 1 - NameError from Execution Order

```python
print(message)     # NameError: name 'message' is not defined
message = "hello"  # This line has not executed yet when print runs
```

Python executes statements sequentially. There is no hoisting (unlike JavaScript's var declarations). The parser accepts message as a valid name, and the error only surfaces at runtime. It happens when the LOAD_NAME instruction for message executes (module-level code uses LOAD_NAME rather than LOAD_GLOBAL) and the name is not yet in the global namespace.
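Execution order also explains why a function body may refer to a name defined later in the file. The name is looked up when the call executes, not when the def statement runs. Here is a sketch (greet and early_error are illustrative names):

```python
def greet():
    # The body runs only when greet() is called, so `message` is looked up at call time
    return message

try:
    greet()             # called before the assignment below has executed
except NameError as exc:
    early_error = exc
    print("too early:", exc)

message = "hello"       # assigned after the def, but before the next call
print(greet())          # hello
```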

Pitfall 2 - The Mutable Default Argument Trap

This is one of the most common Python pitfalls caused by misunderstanding the execution model:

```python
def append_to(item, collection=[]):  # DEFAULT EVALUATED ONCE, WHEN THE def STATEMENT RUNS
    collection.append(item)
    return collection

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2]  ← SURPRISE
print(append_to(3))  # [1, 2, 3]  ← the same list object every call
```

Why this happens: Default argument values are evaluated once, when the def statement itself runs as bytecode (stage 5). The default value [] creates one list object on the heap, and the function object keeps a reference to it in its __defaults__ tuple for as long as the function exists. Every call that omits collection gets that same list object, so changes made to it persist across calls.
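You can observe the shared list directly through the function's __defaults__ attribute, which holds the default values created when the def statement ran:

```python
def append_to(item, collection=[]):
    collection.append(item)
    return collection

shared_default = append_to.__defaults__[0]  # the one list created when `def` ran

first = append_to(1)
second = append_to(2)

print(first is second is shared_default)  # True: every call got the same object
print(append_to.__defaults__)             # ([1, 2],)
```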

The fix:

```python
def append_to(item, collection=None):
    if collection is None:
        collection = []  # New list created on each call
    collection.append(item)
    return collection
```
danger

Never use a mutable object ([], {}, set()) as a default argument value. This trap catches experienced developers off-guard because it looks like the default "resets" on each call - it does not. The default is evaluated exactly once.

Interview Questions and Answers

Q1: What are the five stages of CPython's execution pipeline, and which stage catches syntax errors?

A: The five stages are: (1) source reading, (2) tokenization (lexing), (3) parsing into an AST, (4) bytecode compilation, and (5) PVM execution. Syntax errors are caught at stages 2 and 3 - before any bytecode is generated or executed. A SyntaxError: invalid syntax from the parser means the code's grammatical structure is malformed. Runtime errors like NameError, TypeError, and ZeroDivisionError occur during stage 5.

Q2: What does dis.dis() show, and what do LOAD_FAST and RETURN_VALUE mean?

A: dis.dis() disassembles a function's code object and prints its bytecode instructions in human-readable form. LOAD_FAST pushes a local variable's value from the frame's local array onto the PVM's value stack - it is the fastest variable lookup because local variables are indexed by position, not looked up by name in a dictionary. RETURN_VALUE pops the top of the value stack and returns it to the calling frame, causing the current frame to be destroyed and the PVM to resume executing the caller's bytecode.

Q3: What is a stack frame, what does it contain, and when is it destroyed?

A: A stack frame is the execution context for a single function invocation. CPython exposes it as a PyFrameObject, and since 3.11 it keeps a lighter internal frame and creates the full object only on demand. It contains:

  • the code object (bytecode + constants + metadata)
  • a fast-locals array (indexed slots for local variables)
  • references to the global namespace (f_globals) and the builtins (f_builtins)
  • a pointer to the calling frame (f_back, which tracebacks walk; enclosing-scope lookups use closure cells instead)
  • the current instruction position (f_lasti)
  • the value stack for the PVM's operand operations

A frame is created when the function is called and released when the function returns or propagates an exception to the caller.

Q4: What is a .pyc file, where does it live, and when does CPython regenerate it?

A: A .pyc file is the cached output of CPython's bytecode compilation step for an imported module. It is stored in the __pycache__/ subdirectory adjacent to the source file, named <module>.cpython-<version>.pyc. The file contains a magic number (interpreter version identifier), a flags field, the source file's timestamp and size (or a source hash, for hash-based .pyc files), and the marshalled code object. CPython regenerates the .pyc whenever the source file's modification timestamp or size changes from what is recorded in the .pyc header, or when no .pyc exists. If the source is unchanged, CPython loads the .pyc directly, skipping tokenization, parsing, and compilation.

Q5: Why does if __name__ == "__main__" prevent code from running on import?

A: When CPython executes a file directly (e.g., python script.py), it sets the module-level __name__ variable to "__main__" before running any code. When the same file is imported by another module, __name__ is set to the module's file-based name (e.g., "script"). Because import causes CPython to execute all of the module's top-level code (stage 5), any top-level statements run unconditionally on import. Guarding side-effecting code with if __name__ == "__main__" ensures it runs only when the file is the direct entry point. This is not a special language feature - it is a simple conditional check on a variable set by the interpreter.

Q6: Why do modules execute only once even if imported multiple times, and what is sys.modules?

A: sys.modules is a dictionary that maps module names (strings) to their module objects. The import machinery checks this cache first: if sys.modules["mymodule"] exists, the cached object is returned immediately without re-executing the module's code. Only on the first import does CPython load, compile, and execute the module, then store the resulting module object in sys.modules. This ensures that module-level state (global variables, registered callbacks, singleton objects) is initialized exactly once per process. You can force a re-import using importlib.reload(module), which re-executes the module's code in its existing namespace.

Graded Practice Challenges

Level 1 - Predict the Output

Challenge: What does this code print, and at which stage of the pipeline does each issue (if any) manifest?

```python
x = 5

def compute():
    y = x * 2
    return y

print(compute())
print(x)
```

Output:

```
10
5
```

Explanation: The function compute() accesses x via the LEGB rule. It finds x in the Global (module-level) scope because there is no local x in compute. The compiler emits LOAD_GLOBAL x for the reference inside compute, and the PVM resolves it in the module's globals at run time. No errors occur at any stage. x in the module scope remains 5 because compute() never assigns to x; it only reads it.

Challenge: Does this raise an error? If so, which stage catches it and what type?

```python
def calculate():
    print(total)
    total = 100
    return total

calculate()
```

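Error: UnboundLocalError: cannot access local variable 'total' where it is not associated with a value (Python 3.11+; older versions say "local variable 'total' referenced before assignment")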

Stage: PVM (Stage 5 - runtime). The code is syntactically valid and compiles without error. However, the compiler (stage 4) sees the assignment total = 100 inside calculate() and marks total as a local variable for the entire function body. At runtime, when the PVM executes LOAD_FAST total (for the print call), the local slot for total is empty because STORE_FAST total has not yet executed. This is an UnboundLocalError, a subclass of NameError.

Level 2 - Debug the Code

Challenge: This code has a subtle bug rooted in the execution model. Find it, explain why it happens, and fix it.

```python
def make_counter(start=0, history=[]):
    history.append(start)
    return history

print(make_counter(1))
print(make_counter(2))
print(make_counter(3))
```

Expected behavior: each call returns a list containing only the current start value. Actual behavior is different. Why?


Actual output:

```
[1]
[1, 2]
[1, 2, 3]
```

Root cause: The default value [] for history is evaluated once, when the def statement executes (stage 5, when the function definition is processed as bytecode). A single list object is created on the heap and stored in the function object's __defaults__ tuple. Every call that does not pass an explicit history argument receives a reference to this same list object. Mutating it with .append() permanently modifies it.

Fix:

```python
def make_counter(start=0, history=None):
    if history is None:
        history = []  # New list object created for each call
    history.append(start)
    return history

print(make_counter(1))  # [1]
print(make_counter(2))  # [2]
print(make_counter(3))  # [3]
```

The sentinel pattern (None as default, create the mutable inside the function body) is the canonical Python fix for mutable default arguments.

Level 3 - Design and Explain

Challenge: You are building a plugin system. Each plugin is a Python module loaded with importlib.import_module(). A colleague complains that calling load_plugin("analytics") twice crashes on the second call because the plugin's __init__ function re-registers handlers that are already registered.

Design a solution using your knowledge of sys.modules and the import model. Then explain: could you use the __name__ guard in the plugin module to solve this?


Solution using sys.modules:

```python
import importlib
import sys

def load_plugin(name: str):
    """Load a plugin module exactly once. Return cached module on repeat calls."""
    module_key = f"plugins.{name}"

    if module_key in sys.modules:
        # Module already imported and initialized - return cached object
        return sys.modules[module_key]

    # First import: executes module top-level code (including __init__)
    module = importlib.import_module(module_key)
    return module
```

Because sys.modules caches module objects after the first import, subsequent calls to load_plugin("analytics") return the cached module without re-executing its top-level code. This is the standard behavior of import itself - we are just making it explicit and controllable.

Alternative: guard inside the plugin module:

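```python
# plugins/analytics.py
# importlib.reload() re-executes this file in the module's existing namespace,
# so reuse the flag if it already exists instead of resetting it to False.
_initialized = globals().get("_initialized", False)

def _init():
    global _initialized
    if _initialized:
        return
    # ... register handlers here ...
    _initialized = True

_init()  # Registers on first import; does nothing on reload
```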

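Why the __name__ guard does NOT solve this: if __name__ == "__main__" only distinguishes "was this file run directly" from "was it imported." It cannot distinguish "first import" from "second import," and ordinary imports never need it to, because a module is executed only once and then cached in sys.modules. The double registration the colleague describes therefore cannot come from two normal imports of the same name. It happens when something re-executes the module anyway. One cause is importlib.reload(), which re-runs the code even though the module is already in sys.modules. Another is the same file being loaded under two different module names (for example "analytics" and "plugins.analytics"), which creates two separate cache entries. The correct fix is to avoid reload() and load each plugin under a single name, or to use the _initialized guard pattern shown above.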
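You can check that importlib.reload() re-executes top-level code in the existing namespace using two throwaway modules. naive_plugin and guarded_plugin are hypothetical names, and the counter stands in for handler registration. A flag assigned unconditionally is reset on every reload, while a flag that reuses the existing value survives.

```python
import importlib
import os
import sys
import tempfile

# The counter itself survives reloads, so it reveals how many times "registration" ran
COUNTER = "_init_count = globals().get('_init_count', 0)\n"
PLUGINS = {
    # Naive flag: reassigned to False every time the file executes
    "naive_plugin": COUNTER + "_initialized = False\n",
    # Guarded flag: reuses the value already in the module namespace, if any
    "guarded_plugin": COUNTER + "_initialized = globals().get('_initialized', False)\n",
}
BODY = (
    "if not _initialized:\n"
    "    _init_count += 1  # stands in for registering handlers\n"
    "    _initialized = True\n"
)

counts = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, header in PLUGINS.items():
        with open(os.path.join(tmp, name + ".py"), "w") as f:
            f.write(header + BODY)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        for name in PLUGINS:
            module = importlib.import_module(name)
            importlib.reload(module)  # re-executes the file in the SAME namespace
            importlib.reload(module)
            counts[name] = module._init_count
    finally:
        sys.path.remove(tmp)

print(counts)  # {'naive_plugin': 3, 'guarded_plugin': 1}
```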

Key insight: The sys.modules cache is the authoritative answer. The __name__ == "__main__" guard solves a completely different problem (preventing side effects on import vs. direct execution).

Quick Reference Cheatsheet

| Concept | Key Facts |
|---|---|
| Execution stages | Source → Tokenizer → Parser/AST → Compiler → PVM |
| Syntax errors caught at | Tokenizer (Stage 2) or Parser (Stage 3) - before execution |
| Runtime errors caught at | PVM (Stage 5) - during execution |
| dis.dis(fn) | Prints bytecode instructions for a function |
| LOAD_FAST | Push local variable onto value stack (fastest lookup) |
| LOAD_GLOBAL | Look up name in the module globals dict, then the builtins |
| RETURN_VALUE | Return top of stack to caller; destroy current frame |
| Stack frame contains | Code object, locals array, globals ref, builtins ref, f_back (calling frame), f_lasti |
| Call stack order | LIFO - last frame pushed is first popped |
| Objects live on | Heap (all Python objects, including int, str, None) |
| Variable names live in | Stack frames (as references/pointers to heap objects) |
| .pyc location | __pycache__/<module>.cpython-<version>.pyc |
| .pyc regeneration | When source timestamp/size changes from stored header |
| __name__ when run directly | "__main__" |
| __name__ when imported | Module name string (e.g., "mymodule") |
| sys.modules | Dict of all imported modules; prevents re-execution on re-import |
| LEGB lookup | Local → Enclosing → Global → Built-in |
| Mutable default trap | Default value evaluated once at def execution; shared across all calls |
| CPython GIL | One thread executes bytecode at a time; protects reference counting |
| Recursion limit | 1000 by default; check with sys.getrecursionlimit() |

Key Takeaways

  • Python's execution is a structured 5-stage pipeline - source text is never "directly run." Each stage has a specific responsibility and failure mode.
  • Syntax errors are caught before execution (stages 2–3). Runtime errors (NameError, TypeError, ZeroDivisionError) occur during stage 5 (PVM execution).
  • The dis module reveals what Python actually executes. LOAD_FAST is the most efficient variable lookup; LOAD_GLOBAL requires a dictionary search.
  • Every function call creates a stack frame containing local variables, a reference to globals, and a pointer to the calling frame. Frames are LIFO and destroyed on return.
  • All Python objects live on the heap. Variable names in frames are references (pointers) to heap objects - assignment rebinds the reference, it does not copy the object.
  • .pyc files cache bytecode in __pycache__/ and are automatically regenerated when source changes. They are build artifacts and do not belong in version control.
  • __name__ == "__main__" works because CPython sets __name__ to "__main__" for the entry-point module and to the module's name for imported modules.
  • import executes module top-level code exactly once per process. sys.modules caches the resulting module object - subsequent imports return the cached object instantly.
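  • LEGB is the order in which CPython resolves names: Local slots → Enclosing function cells (closures) → Module globals → Builtins. The compiler fixes which scope a name belongs to; only the global and builtin lookups are dictionary searches at run time.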
  • The mutable default argument trap exists because default values are evaluated when the def statement executes - once - not on each function call.
  • The CPython GIL ensures only one thread runs bytecode at a time, protecting reference counting. Use multiprocessing for CPU-bound parallelism.
© 2026 EngineersOfAI. All rights reserved.