Python Disassembly with dis Practice Problems & Exercises
Practice: Disassembly with dis
← Back to lessonEasy
Disassemble a simple function and answer the questions about each instruction.
import dis
def add_and_double(x, y):
result = x + y
return result * 2
dis.dis(add_and_double)After running, answer:
- Which opcode loads the parameter
x? What argument index does it use? - Which opcode performs the addition?
- Which opcode stores the result into
result? - Which opcode loads the constant
2? - What is the last opcode executed before the function returns?
Solution
import dis
def add_and_double(x, y):
result = x + y
return result * 2
dis.dis(add_and_double)
Annotated disassembly:
2 RESUME 0 # Entry point / hot-path indicator
3 LOAD_FAST 0 (x) # Push local[0] = x onto stack
LOAD_FAST 1 (y) # Push local[1] = y onto stack
BINARY_OP 0 (+) # Pop y,x; push x+y
STORE_FAST 2 (result) # Pop top; store in local[2]
4 LOAD_FAST 2 (result) # Push local[2] = result
LOAD_CONST 1 (2) # Push constant 2
BINARY_OP 5 (*) # Pop 2, result; push result*2
RETURN_VALUE # Return top of stack
Answers:
LOAD_FASTloadsx. It uses argument index0, meaning index 0 inco_varnames(which contains('x', 'y', 'result')).BINARY_OP 0 (+)performs addition. The0is the operator code for+.STORE_FAST 2 (result)pops the top of stack and stores it inco_varnames[2]=result.LOAD_CONST 1 (2)loads the constant at index 1 ofco_consts. (Index 0 is alwaysNone.)RETURN_VALUEpops and returns the top of the stack. It is always the last instruction of any function.
Reading the stack evolution:
After LOAD_FAST x: [x]
After LOAD_FAST y: [x, y]
After BINARY_OP +: [x+y]
After STORE_FAST result: []
After LOAD_FAST result: [x+y]
After LOAD_CONST 2: [x+y, 2]
After BINARY_OP *: [(x+y)*2]
After RETURN_VALUE: [] (returned (x+y)*2)
Expected Output
See solution — focus on identifying each instructionHints
Hint 1: Run `dis.dis(fn)` and read each column: offset, optional line number, opcode name, argument index, (human-readable argument).
Hint 2: LOAD_FAST loads a local variable. LOAD_CONST loads a literal constant. BINARY_OP performs an arithmetic or comparison operation. RETURN_VALUE returns the top of the stack.
Disassemble this function and identify which LOAD opcode is used for each distinct value. Explain why each one uses the opcode it does.
import dis
MAX_SIZE = 100 # module-level global
def process(data, multiplier):
length = len(data)
if length > MAX_SIZE:
return None
return [x * multiplier for x in data]
dis.dis(process)Solution
import dis
MAX_SIZE = 100
def process(data, multiplier):
length = len(data)
if length > MAX_SIZE:
return None
return [x * multiplier for x in data]
dis.dis(process)
LOAD opcode mapping:
| Value | Opcode | Reason |
|---|---|---|
data (param) | LOAD_FAST | Local variable — array index lookup |
multiplier (param) | LOAD_FAST | Local variable — array index lookup |
length (local) | LOAD_FAST | Local variable assigned in function body |
len (builtin) | LOAD_GLOBAL | Module-level name lookup (builtins fall through to globals) |
MAX_SIZE (global) | LOAD_GLOBAL | Module-level variable |
None | LOAD_CONST | Constant literal |
Why this matters for performance:
LOAD_FASTis an array index lookup — O(1), one C array accessLOAD_GLOBALis a dictionary lookup — O(1) average, but involves hashing and a dict probeLOAD_CONSTis also an array index lookup — same cost as LOAD_FAST
This is why the classic Python optimization tip "cache a global in a local at the top of a tight loop" works:
# Slow: LOAD_GLOBAL on every iteration
for i in range(1000):
result = math.sqrt(i)
# Faster: one LOAD_GLOBAL, then LOAD_FAST on each iteration
sqrt = math.sqrt
for i in range(1000):
result = sqrt(i)
Expected Output
See solution — spot which LOAD opcode is used for each valueHints
Hint 1: LOAD_FAST is for local variables and parameters (fast because it is an array index lookup). LOAD_GLOBAL is for module-level names or builtins. LOAD_CONST is for literal values baked into the bytecode.
Hint 2: Function parameters are always locals, so they always use LOAD_FAST. Module-level variables and imported names use LOAD_GLOBAL. Numbers, strings, None, True, False use LOAD_CONST.
Trace the value stack through this disassembly manually. Fill in the stack state after each instruction.
import dis
def compute(a, b):
return (a + b) * (a - b)
dis.dis(compute)
# Now trace manually for a=5, b=2:
# Expected result: (5+2) * (5-2) = 7 * 3 = 21
print("\nExpected result:", compute(5, 2))Fill in the stack after each instruction:
Instruction Stack (bottom -> top)
RESUME []
LOAD_FAST (a=5) ?
LOAD_FAST (b=2) ?
BINARY_OP + ?
LOAD_FAST (a=5) ?
LOAD_FAST (b=2) ?
BINARY_OP - ?
BINARY_OP * ?
RETURN_VALUE ?
Solution
import dis
def compute(a, b):
return (a + b) * (a - b)
dis.dis(compute)
print("\nExpected result:", compute(5, 2))
Stack trace for compute(5, 2):
Instruction Stack (bottom -> top) Notes
RESUME [] Entry point
LOAD_FAST (a=5) [5] Push a
LOAD_FAST (b=2) [5, 2] Push b
BINARY_OP + [7] Pop 5,2; push 5+2=7
LOAD_FAST (a=5) [7, 5] Push a again
LOAD_FAST (b=2) [7, 5, 2] Push b again
BINARY_OP - [7, 3] Pop 5,2; push 5-2=3
BINARY_OP * [21] Pop 7,3; push 7*3=21
RETURN_VALUE [] Return 21 to caller
Key insight: The expression (a + b) * (a - b) requires loading a and b twice each because CPython is a stack machine — it cannot reuse a computed sub-result that has already been popped off the stack. If you needed to use (a + b) twice, you would need a DUP_TOP instruction or assign the result to a local variable first.
Stack depth observation: The maximum depth here is 3 (after both LOAD_FAST a instructions before the subtraction). The co_stacksize attribute of this function's code object will be 3 — CPython pre-allocates this many slots in the frame's evaluation stack.
Expected Output
Final stack state: [result], where result = 15Hints
Hint 1: Work through each instruction one at a time. Keep a list representing the stack. LOAD instructions push values, BINARY_OP pops two and pushes one, STORE pops one.
Hint 2: The stack grows to the right. Start with an empty stack []. At RETURN_VALUE, the top of the stack is the return value.
Compare the bytecode for an if/else statement versus a ternary expression that performs the same logic.
import dis
def with_if(x):
if x > 0:
result = "positive"
else:
result = "non-positive"
return result
def with_ternary(x):
return "positive" if x > 0 else "non-positive"
print("=== if/else ===")
dis.dis(with_if)
print("\n=== ternary ===")
dis.dis(with_ternary)
# Count instructions
if_count = len(list(dis.get_instructions(with_if)))
tern_count = len(list(dis.get_instructions(with_ternary)))
print(f"\nif/else instruction count: {if_count}")
print(f"ternary instruction count: {tern_count}")Solution
import dis
def with_if(x):
if x > 0:
result = "positive"
else:
result = "non-positive"
return result
def with_ternary(x):
return "positive" if x > 0 else "non-positive"
print("=== if/else ===")
dis.dis(with_if)
print("\n=== ternary ===")
dis.dis(with_ternary)
if_count = len(list(dis.get_instructions(with_if)))
tern_count = len(list(dis.get_instructions(with_ternary)))
print(f"\nif/else instruction count: {if_count}")
print(f"ternary instruction count: {tern_count}")
Typical bytecode comparison:
The if/else version has more instructions because it uses STORE_FAST to save to a local variable result, then LOAD_FAST result to return it. The ternary version avoids this intermediate storage — it puts the chosen string directly on the stack and returns it immediately.
Both use a conditional jump opcode (POP_JUMP_IF_FALSE or similar) for the x > 0 check. The jump targets differ slightly, but the core logic is the same.
When does the ternary save instructions?
The ternary is more concise at the bytecode level when:
- The result is used immediately (not assigned to a variable used later)
- The condition and both branches are simple expressions
When the branches are complex multi-line blocks, use if/else for readability — the bytecode difference is negligible.
Important: Neither form is significantly faster than the other. The jump overhead is the same. Prefer whichever is more readable for humans.
Expected Output
See solution — compare jump instructionsHints
Hint 1: Both `if/else` and ternary expressions compile to conditional jump opcodes. Look for `POP_JUMP_IF_FALSE`, `POP_JUMP_IF_TRUE`, or `JUMP_FORWARD`.
Hint 2: The ternary expression `x if cond else y` compiles to a conditional jump that either loads `x` or `y`. A full `if/else` block may produce slightly different jump patterns.
Medium
Disassemble a for loop. Identify the GET_ITER, FOR_ITER, and JUMP_BACKWARD opcodes. Trace the loop structure and count per-iteration instruction cost.
import dis
def sum_squares(n):
total = 0
for i in range(n):
total += i * i
return total
print("=== Disassembly ===")
dis.dis(sum_squares)
# Instruction count analysis
instructions = list(dis.get_instructions(sum_squares))
print(f"\nTotal instructions in function: {len(instructions)}")
loop_opcodes = ["STORE_FAST", "LOAD_FAST", "BINARY_OP", "FOR_ITER", "JUMP_BACKWARD"]
loop_instr = [i for i in instructions if i.opname in loop_opcodes]
print(f"Loop-related instructions: {len(loop_instr)}")Solution
import dis
def sum_squares(n):
total = 0
for i in range(n):
total += i * i
return total
print("=== Disassembly ===")
dis.dis(sum_squares)
instructions = list(dis.get_instructions(sum_squares))
print(f"\nTotal instructions in function: {len(instructions)}")
loop_opcodes = ["STORE_FAST", "LOAD_FAST", "BINARY_OP", "FOR_ITER", "JUMP_BACKWARD"]
loop_instr = [i for i in instructions if i.opname in loop_opcodes]
print(f"Loop-related instructions: {len(loop_instr)}")
Annotated loop structure:
LOAD_CONST 0 # Push 0 (initial total)
STORE_FAST total # total = 0
LOAD_GLOBAL range # Push range
LOAD_FAST n # Push n
CALL 1 # range(n) -> iterator (still a range object)
GET_ITER # Convert to iterator -> iter(range(n))
[LOOP START]
FOR_ITER <end> # Call next(); if StopIteration, jump to <end>
STORE_FAST i # i = next_value
LOAD_FAST total # Push total
LOAD_FAST i # Push i
LOAD_FAST i # Push i
BINARY_OP * # i * i
BINARY_OP += # total += i*i (in-place add)
STORE_FAST total # Store result
JUMP_BACKWARD <loop> # Go back to FOR_ITER
[LOOP END]
LOAD_FAST total
RETURN_VALUE
Per-iteration instructions (approximate): FOR_ITER + STORE_FAST + LOAD_FAST(3) + BINARY_OP(2) + STORE_FAST + JUMP_BACKWARD = ~9 instructions per iteration.
For n = 1,000,000 that is ~9 million bytecode dispatches just for this loop. This is why vectorized NumPy operations (which execute the loop in C) are orders of magnitude faster for numeric work.
import dis
def sum_squares(n):
total = 0
for i in range(n):
total += i * i
return total
# 1. Disassemble and identify the loop structure
# 2. Label: where does the loop begin, where does FOR_ITER jump on exhaustion?
# 3. Count how many instructions execute per loop iterationExpected Output
See solution for loop structure analysisHints
Hint 1: A for loop compiles to: GET_ITER (convert iterable to iterator), then a loop of FOR_ITER (call next()) / body / JUMP_BACKWARD. When the iterator is exhausted, FOR_ITER jumps past the end of the loop.
Hint 2: Count only the instructions inside the loop body (between FOR_ITER and JUMP_BACKWARD). These execute N times for N iterations.
Compare the bytecode of a list built with a comprehension versus a for loop with .append(). Find the LIST_APPEND opcode and explain why it is faster than the CALL approach.
import dis
import types
def build_with_loop(n):
result = []
for i in range(n):
result.append(i * i)
return result
def build_with_comp(n):
return [i * i for i in range(n)]
print("=== Loop version ===")
dis.dis(build_with_loop)
print("\n=== Comprehension outer ===")
dis.dis(build_with_comp)
# The comprehension has a nested code object — disassemble it too
for const in build_with_comp.__code__.co_consts:
if isinstance(const, types.CodeType):
print(f"\n=== Comprehension inner [{const.co_name}] ===")
dis.dis(const)Solution
import dis
import types
def build_with_loop(n):
result = []
for i in range(n):
result.append(i * i)
return result
def build_with_comp(n):
return [i * i for i in range(n)]
print("=== Loop version ===")
dis.dis(build_with_loop)
print("\n=== Comprehension outer ===")
dis.dis(build_with_comp)
for const in build_with_comp.__code__.co_consts:
if isinstance(const, types.CodeType):
print(f"\n=== Comprehension inner [{const.co_name}] ===")
dis.dis(const)
The key difference:
Loop version per-iteration (inside the loop):
LOAD_FAST result # Push result list
LOAD_ATTR append # Look up .append method (dict lookup!)
LOAD_FAST i # Push i
LOAD_FAST i # Push i
BINARY_OP * # i*i
CALL 1 # Call append(i*i) — creates call frame
POP_TOP # Discard None return value of append
Comprehension inner per-iteration:
LOAD_FAST i # Push i (from .0 iterator arg)
LOAD_FAST i # Push i
BINARY_OP * # i*i
LIST_APPEND 1 # Direct C-level append, no method lookup, no CALL
Why LIST_APPEND is faster:
- No attribute lookup —
LOAD_ATTRis a dictionary lookup through the object's type - No function call overhead —
CALLcreates a new frame, pushes arguments, dispatches - No
POP_TOP—LIST_APPENDdoes not produce a return value
LIST_APPEND is implemented in ceval.c as a direct call to PyList_Append() — it skips all the Python method dispatch machinery entirely.
Expected Output
See solution for LIST_APPEND vs CALL comparisonHints
Hint 1: A list comprehension compiles with the specialized LIST_APPEND opcode inside its inner code object. A loop uses LOAD_ATTR to look up .append, then CALL to invoke it.
Hint 2: Use dis.dis() on both. For the comprehension, also disassemble the nested code object (found in the outer function's co_consts).
Compare the bytecode generated by three Python string formatting approaches: f-strings, .format(), and % formatting.
import dis
name = "world"
count = 42
def use_fstring(name, count):
return f"Hello, {name}! Count: {count}"
def use_format(name, count):
return "Hello, {}! Count: {}".format(name, count)
def use_percent(name, count):
return "Hello, %s! Count: %d" % (name, count)
for fn in [use_fstring, use_format, use_percent]:
instructions = list(dis.get_instructions(fn))
print(f"\n--- {fn.__name__} ({len(instructions)} instructions) ---")
dis.dis(fn)Solution
import dis
def use_fstring(name, count):
return f"Hello, {name}! Count: {count}"
def use_format(name, count):
return "Hello, {}! Count: {}".format(name, count)
def use_percent(name, count):
return "Hello, %s! Count: %d" % (name, count)
for fn in [use_fstring, use_format, use_percent]:
instructions = list(dis.get_instructions(fn))
print(f"\n--- {fn.__name__} ({len(instructions)} instructions) ---")
dis.dis(fn)
Bytecode patterns for each approach:
f-string (FORMAT_VALUE + BUILD_STRING):
LOAD_CONST "Hello, "
LOAD_FAST name
FORMAT_VALUE # Convert name to str (fast, specialized opcode)
LOAD_CONST "! Count: "
LOAD_FAST count
FORMAT_VALUE # Convert count to str
BUILD_STRING 4 # Concatenate 4 string pieces
No attribute lookup, no function call machinery.
.format() (LOAD_ATTR + CALL):
LOAD_CONST "Hello, {}! Count: {}"
LOAD_METHOD format # Dict lookup for .format method
LOAD_FAST name
LOAD_FAST count
CALL_METHOD 2 # Invokes str.format() — full Python/C dispatch
Requires attribute lookup and a full method call.
% formatting (BINARY_OP %):
LOAD_CONST "Hello, %s! Count: %d"
LOAD_FAST name
LOAD_FAST count
BUILD_TUPLE 2 # Pack args into tuple
BINARY_OP % # String formatting via __mod__ dispatch
One BINARY_OP but it invokes complex C-level parsing of the format string.
Performance ranking: f-strings are typically fastest for simple cases (no complex format specs), followed by %, then .format(). For format specs like :.2f, the ranking may change. In practice, the difference is tiny unless you are in a tight loop formatting millions of strings.
Expected Output
See solution for opcode comparison across three formatting stylesHints
Hint 1: f-strings compile to FORMAT_VALUE + BUILD_STRING opcodes. str.format() compiles to LOAD_METHOD + CALL. % formatting compiles to BINARY_OP %.
Hint 2: Count the total instructions for each. f-strings are generally the most efficient because FORMAT_VALUE is a specialized opcode with no attribute lookup overhead.
Predict the co_stacksize for each function before running the code. Then verify your predictions.
def f1(x):
return x + 1
def f2(x, y, z):
return x + y + z
def f3(x):
return x * x + x * 2 + 1
def f4(n):
return sum(range(n))
for fn in [f1, f2, f3, f4]:
print(f"{fn.__name__}: co_stacksize = {fn.__code__.co_stacksize}")Predict before running:
f1:x + 1— what is the max stack depth?f2:x + y + z— does adding more terms deepen the stack?f3:x*x + x*2 + 1— more complex expressionf4:sum(range(n))— a function call
Solution
def f1(x):
return x + 1
def f2(x, y, z):
return x + y + z
def f3(x):
return x * x + x * 2 + 1
def f4(n):
return sum(range(n))
for fn in [f1, f2, f3, f4]:
print(f"{fn.__name__}: co_stacksize = {fn.__code__.co_stacksize}")
Expected outputs:
f1: co_stacksize = 2
f2: co_stacksize = 2
f3: co_stacksize = 2
f4: co_stacksize = 3
Stack trace for each:
f1 (x + 1): LOAD_FAST x → depth 1, LOAD_CONST 1 → depth 2, BINARY_OP → depth 1. Max = 2.
f2 (x + y + z): This evaluates left-to-right as (x + y) + z. LOAD x → 1, LOAD y → 2, BINARY_OP → 1, LOAD z → 2, BINARY_OP → 1. Max = 2. Notice that a chain of additions does NOT grow the stack — intermediate results are consumed immediately.
f3 (x*x + x*2 + 1): Evaluates as ((x*x) + (x*2)) + 1. The subexpressions xx and x2 each need depth 2. The result is accumulated. Max = 2. The compiler evaluates left to right, consuming results eagerly.
f4 (sum(range(n))): LOAD_GLOBAL sum → 1, LOAD_GLOBAL range → 2, LOAD_FAST n → 3, CALL range(n) → 2, CALL sum(...) → 1. Max = 3 because at the peak, both sum and range are on the stack simultaneously before the range(n) call.
Key insight: co_stacksize is computed by the compiler (not at runtime) as the worst-case stack depth across all code paths. It is used to pre-allocate the frame's value array — there is no dynamic resizing of the stack during execution.
Expected Output
See solution for stack depth predictions and explanationsHints
Hint 1: co_stacksize is the MAXIMUM depth the stack reaches at any point during execution. Trace through the instructions and track the high-water mark.
Hint 2: Each LOAD operation +1, each BINARY_OP -1 (pops 2, pushes 1), CALL pops args+1 (function) and pushes 1 result, STORE_FAST -1.
Hard
Build a bytecode profiler that analyses a function's compiled code and flags potentially expensive instruction patterns, particularly LOAD_GLOBAL inside loops.
import dis
from collections import Counter
def bytecode_profile(func):
instructions = list(dis.get_instructions(func))
counts = Counter(i.opname for i in instructions)
print(f"=== Bytecode Profile: {func.__name__} ===")
print(f"Total instructions: {len(instructions)}")
print()
# Top opcodes
print("Top 8 opcodes:")
for opname, count in counts.most_common(8):
bar = "=" * count
print(f" {opname:<25} {count:>3} {bar}")
print()
# Flag expensive patterns
warnings = []
in_loop = False
for i, instr in enumerate(instructions):
if instr.opname == "GET_ITER":
in_loop = True
if instr.opname == "JUMP_BACKWARD":
in_loop = False
if in_loop and instr.opname == "LOAD_GLOBAL":
warnings.append(f" LOAD_GLOBAL '{instr.argval}' inside loop at offset {instr.offset}")
if warnings:
print("Performance warnings:")
for w in warnings:
print(w)
else:
print("No performance warnings.")
# Test: a function with LOAD_GLOBAL in a loop
import math
def compute_norms(vectors):
results = []
for v in vectors:
norm = math.sqrt(v[0]**2 + v[1]**2) # LOAD_GLOBAL math in loop
results.append(norm)
return results
bytecode_profile(compute_norms)
Solution
import dis
from collections import Counter
def bytecode_profile(func):
instructions = list(dis.get_instructions(func))
counts = Counter(i.opname for i in instructions)
print(f"=== Bytecode Profile: {func.__name__} ===")
print(f"Total instructions: {len(instructions)}")
print()
print("Top 8 opcodes:")
for opname, count in counts.most_common(8):
bar = "=" * count
print(f" {opname:<25} {count:>3} {bar}")
print()
warnings = []
in_loop = False
for i, instr in enumerate(instructions):
if instr.opname == "GET_ITER":
in_loop = True
if instr.opname == "JUMP_BACKWARD":
in_loop = False
if in_loop and instr.opname == "LOAD_GLOBAL":
warnings.append(f" LOAD_GLOBAL '{instr.argval}' inside loop at offset {instr.offset}")
if warnings:
print("Performance warnings:")
for w in warnings:
print(w)
else:
print("No performance warnings.")
import math
def compute_norms(vectors):
results = []
for v in vectors:
norm = math.sqrt(v[0]**2 + v[1]**2)
results.append(norm)
return results
bytecode_profile(compute_norms)
The LOAD_GLOBAL-in-loop optimization:
When you write math.sqrt inside a loop, CPython executes:
LOAD_GLOBAL math— dictionary lookup in the module's globals (hash + probe)LOAD_ATTR sqrt— dictionary lookup on the math module object (another hash + probe)
These two dictionary lookups happen on every iteration. For a loop with 1 million iterations, that is 2 million unnecessary dict lookups.
The fix:
sqrt = math.sqrt # One LOAD_GLOBAL, one LOAD_ATTR, outside the loop
def compute_norms_fast(vectors):
results = []
for v in vectors:
norm = sqrt(v[0]**2 + v[1]**2) # LOAD_FAST — array index, no hashing
results.append(norm)
return results
This is exactly what CPython's bytecode shows: the sqrt local is loaded with LOAD_FAST instead of LOAD_GLOBAL + LOAD_ATTR. Real-world speedup: 15-30% for math-heavy loops.
import dis
from collections import Counter
def bytecode_profile(func):
"""Analyze a function's bytecode and produce a performance profile:
- Total instruction count
- Opcode frequency distribution
- Estimated 'hot' vs 'cold' operations
- Flag any potentially expensive patterns (LOAD_GLOBAL in loops, etc.)
"""
passExpected Output
See solution for full profiler outputHints
Hint 1: Use dis.get_instructions() to iterate all instructions. Count by opname. Flag LOAD_GLOBAL as "potentially slow" (dict lookup vs LOAD_FAST array access).
Hint 2: To detect LOAD_GLOBAL inside a loop, look for LOAD_GLOBAL instructions that appear between a GET_ITER and a JUMP_BACKWARD in the instruction sequence.
Build a partial decompiler that translates CPython bytecode back into pseudocode. This is how tools like uncompyle6 work.
import dis
def bytecode_to_pseudocode(func):
instructions = list(dis.get_instructions(func))
stack = []
output = []
binary_ops = {
0: "+", 1: "&", 2: "//", 3: "@", 4: "<<", 5: "*",
6: "%", 7: "|", 8: "**", 9: ">>", 10: "-",
11: "/", 12: "^", 13: "+=", 14: "&=",
}
compare_ops = {0: "<", 1: "<=", 2: "==", 3: "!=", 4: ">", 5: ">="}
for instr in instructions:
op = instr.opname
if op == "RESUME":
continue
elif op in ("LOAD_FAST", "LOAD_GLOBAL", "LOAD_DEREF"):
stack.append(str(instr.argval))
elif op == "LOAD_CONST":
stack.append(repr(instr.argval))
elif op == "BINARY_OP":
rhs = stack.pop()
lhs = stack.pop()
sym = binary_ops.get(instr.arg, "?")
stack.append(f"({lhs} {sym} {rhs})")
elif op == "COMPARE_OP":
rhs = stack.pop()
lhs = stack.pop()
sym = compare_ops.get(instr.arg, "?")
stack.append(f"({lhs} {sym} {rhs})")
elif op == "STORE_FAST":
val = stack.pop() if stack else "?"
output.append(f" {instr.argval} = {val}")
elif op in ("POP_JUMP_IF_FALSE", "POP_JUMP_FORWARD_IF_FALSE"):
cond = stack.pop() if stack else "?"
output.append(f" if not {cond}: goto offset {instr.argval}")
elif op == "RETURN_VALUE":
val = stack.pop() if stack else "?"
output.append(f" return {val}")
else:
output.append(f" # [{op} {instr.argval}]")
return "\n".join(output)
# Test
def hypotenuse(a, b):
c_sq = a * a + b * b
return c_sq ** 0.5
print(f"def {hypotenuse.__name__}(a, b):")
print(bytecode_to_pseudocode(hypotenuse))
Solution
import dis
def bytecode_to_pseudocode(func):
instructions = list(dis.get_instructions(func))
stack = []
output = []
binary_ops = {
0: "+", 1: "&", 2: "//", 3: "@", 4: "<<", 5: "*",
6: "%", 7: "|", 8: "**", 9: ">>", 10: "-",
11: "/", 12: "^", 13: "+=", 14: "&=",
}
compare_ops = {0: "<", 1: "<=", 2: "==", 3: "!=", 4: ">", 5: ">="}
for instr in instructions:
op = instr.opname
if op == "RESUME":
continue
elif op in ("LOAD_FAST", "LOAD_GLOBAL", "LOAD_DEREF"):
stack.append(str(instr.argval))
elif op == "LOAD_CONST":
stack.append(repr(instr.argval))
elif op == "BINARY_OP":
rhs = stack.pop()
lhs = stack.pop()
sym = binary_ops.get(instr.arg, "?")
stack.append(f"({lhs} {sym} {rhs})")
elif op == "COMPARE_OP":
rhs = stack.pop()
lhs = stack.pop()
sym = compare_ops.get(instr.arg, "?")
stack.append(f"({lhs} {sym} {rhs})")
elif op == "STORE_FAST":
val = stack.pop() if stack else "?"
output.append(f" {instr.argval} = {val}")
elif op in ("POP_JUMP_IF_FALSE", "POP_JUMP_FORWARD_IF_FALSE"):
cond = stack.pop() if stack else "?"
output.append(f" if not {cond}: goto offset {instr.argval}")
elif op == "RETURN_VALUE":
val = stack.pop() if stack else "?"
output.append(f" return {val}")
else:
output.append(f" # [{op} {instr.argval}]")
return "\n".join(output)
def hypotenuse(a, b):
c_sq = a * a + b * b
return c_sq ** 0.5
print(f"def {hypotenuse.__name__}(a, b):")
print(bytecode_to_pseudocode(hypotenuse))
Expected pseudocode output:
def hypotenuse(a, b):
c_sq = ((a * a) + (b * b))
return (c_sq ** 0.5)
Why decompilation is hard in general:
This simple version handles straight-line code well but struggles with:
- Control flow: Jumps create branches that require reconstructing
if/while/forstructures from raw offsets (this is the core of decompiler research) - Function calls:
CALLpops N arguments plus the function — needs special handling - Comprehensions: Nested code objects must be decompiled separately and inlined
- Exception handling:
SETUP_FINALLY,PUSH_EXC_INFO, etc. create complex control flow graphs
Real decompilers (uncompyle6, decompile3) use control flow graph analysis to reconstruct the original structure. They build a graph of basic blocks connected by jump edges, then match patterns against known compiler output templates to reconstruct if, while, for, try/except from the raw jump instructions.
import dis
def bytecode_to_pseudocode(func):
"""Translate a function's bytecode into human-readable pseudocode.
Handle: LOAD_FAST, LOAD_CONST, LOAD_GLOBAL, BINARY_OP,
STORE_FAST, COMPARE_OP, POP_JUMP_IF_FALSE, RETURN_VALUE.
Use a stack to track operands and build expressions.
"""
passExpected Output
See solution for pseudocode outputHints
Hint 1: Model the value stack as a Python list of strings. LOAD_ instructions push name strings; BINARY_OP pops two strings and pushes a combined expression string like "(a + b)".
Hint 2: For STORE_FAST, emit an assignment statement. For RETURN_VALUE, emit "return <top>". Jumps are the hardest part — simplify by emitting "if <condition>: goto <offset>".
Build a complexity scorer that assigns a numeric score to a function based on its bytecode characteristics. Higher scores indicate code that is likely slower.
import dis
from dataclasses import dataclass, field
from typing import List
@dataclass
class ComplexityReport:
function_name: str
total_instructions: int
call_count: int
load_global_count: int
loop_count: int
complexity_score: float
warnings: List[str] = field(default_factory=list)
def score_complexity(func) -> ComplexityReport:
instructions = list(dis.get_instructions(func))
loop_depth = 0
call_count = 0
load_global_count = 0
loop_count = 0
score = 0
warnings = []
for instr in instructions:
op = instr.opname
score += 1 # baseline per instruction
if op == "GET_ITER":
loop_depth += 1
loop_count += 1
score += 10
elif op == "JUMP_BACKWARD":
loop_depth = max(0, loop_depth - 1)
elif op in ("CALL", "CALL_FUNCTION", "CALL_METHOD"):
call_count += 1
score += 5
elif op == "LOAD_GLOBAL":
load_global_count += 1
score += 3
if loop_depth > 0:
score += 20
warnings.append(
f"LOAD_GLOBAL '{instr.argval}' inside loop at offset {instr.offset}"
)
return ComplexityReport(
function_name=func.__name__,
total_instructions=len(instructions),
call_count=call_count,
load_global_count=load_global_count,
loop_count=loop_count,
complexity_score=score,
warnings=warnings,
)
def print_report(report):
print(f"\n=== {report.function_name} ===")
print(f" Total instructions: {report.total_instructions}")
print(f" CALL count: {report.call_count}")
print(f" LOAD_GLOBAL count: {report.load_global_count}")
print(f" Loop count: {report.loop_count}")
print(f" Complexity score: {report.complexity_score}")
if report.warnings:
print(f" Warnings:")
for w in report.warnings:
print(f" {w}")
# Test functions
import math
def simple(x, y):
return x + y
def loop_with_globals(data):
result = []
for item in data:
result.append(math.sqrt(item))
return result
def loop_with_local(data):
sqrt = math.sqrt
result = []
for item in data:
result.append(sqrt(item))
return result
for fn in [simple, loop_with_globals, loop_with_local]:
print_report(score_complexity(fn))
Solution
import dis
from dataclasses import dataclass, field
from typing import List
@dataclass
class ComplexityReport:
function_name: str
total_instructions: int
call_count: int
load_global_count: int
loop_count: int
complexity_score: float
warnings: List[str] = field(default_factory=list)
def score_complexity(func) -> ComplexityReport:
instructions = list(dis.get_instructions(func))
loop_depth = 0
call_count = 0
load_global_count = 0
loop_count = 0
score = 0
warnings = []
for instr in instructions:
op = instr.opname
score += 1
if op == "GET_ITER":
loop_depth += 1
loop_count += 1
score += 10
elif op == "JUMP_BACKWARD":
loop_depth = max(0, loop_depth - 1)
elif op in ("CALL", "CALL_FUNCTION", "CALL_METHOD"):
call_count += 1
score += 5
elif op == "LOAD_GLOBAL":
load_global_count += 1
score += 3
if loop_depth > 0:
score += 20
warnings.append(
f"LOAD_GLOBAL '{instr.argval}' inside loop at offset {instr.offset}"
)
return ComplexityReport(
function_name=func.__name__,
total_instructions=len(instructions),
call_count=call_count,
load_global_count=load_global_count,
loop_count=loop_count,
complexity_score=score,
warnings=warnings,
)
def print_report(report):
print(f"\n=== {report.function_name} ===")
print(f" Total instructions: {report.total_instructions}")
print(f" CALL count: {report.call_count}")
print(f" LOAD_GLOBAL count: {report.load_global_count}")
print(f" Loop count: {report.loop_count}")
print(f" Complexity score: {report.complexity_score}")
if report.warnings:
print(f" Warnings:")
for w in report.warnings:
print(f" {w}")
import math
def simple(x, y):
return x + y
def loop_with_globals(data):
result = []
for item in data:
result.append(math.sqrt(item))
return result
def loop_with_local(data):
sqrt = math.sqrt
result = []
for item in data:
result.append(sqrt(item))
return result
for fn in [simple, loop_with_globals, loop_with_local]:
print_report(score_complexity(fn))
Expected complexity scores:
simple: low (5-10) — no loops, no globals, two LOAD_FASTloop_with_globals: high (50+) — loop with LOAD_GLOBALmathandappendon every iterationloop_with_local: medium (25-35) — loop without LOAD_GLOBAL inside the body (sqrt is LOAD_FAST)
Extensions:
- Add scoring for nested loops (multiply by nesting depth)
- Flag
LOAD_ATTRinside loops (attribute lookups are also dict-based) - Add a
suggest_fix()method that generates specific optimization advice based on warnings - Integrate with
ast.parse()to also report cyclomatic complexity alongside bytecode complexity
import dis
from dataclasses import dataclass, field
from typing import List
@dataclass
class ComplexityReport:
function_name: str
total_instructions: int
call_count: int
load_global_count: int
loop_count: int
complexity_score: float
warnings: List[str] = field(default_factory=list)
def score_complexity(func) -> ComplexityReport:
"""Score a function's bytecode complexity.
Scoring rules:
+1 per instruction (baseline)
+5 per CALL instruction (expensive dispatch)
+3 per LOAD_GLOBAL (dict lookup vs array access)
+10 per loop (GET_ITER = one loop)
+20 if LOAD_GLOBAL appears inside a loop
"""
passExpected Output
See solution for complexity scoring reportHints
Hint 1: Track loop state with a counter that increments on GET_ITER and decrements on JUMP_BACKWARD. A LOAD_GLOBAL when the counter > 0 is inside a loop.
Hint 2: Accumulate the score using the rules given. Package everything into the ComplexityReport dataclass.
