Skip to main content

Python Disassembly with dis Practice Problems & Exercises

Practice: Disassembly with dis

11 problems4 Easy4 Medium3 Hard55–75 min
← Back to lesson

Easy

#1Read Your First DisassemblyEasy
disLOAD_FASTBINARY_OPRETURN_VALUE

Disassemble a simple function and answer the questions about each instruction.

Python
import dis

def add_and_double(x, y):
    result = x + y
    return result * 2

dis.dis(add_and_double)

After running, answer:

  1. Which opcode loads the parameter x? What argument index does it use?
  2. Which opcode performs the addition?
  3. Which opcode stores the result into result?
  4. Which opcode loads the constant 2?
  5. What is the last opcode executed before the function returns?
Solution
import dis

def add_and_double(x, y):
result = x + y
return result * 2

dis.dis(add_and_double)

Annotated disassembly:

2 RESUME 0 # Entry point / hot-path indicator
3 LOAD_FAST 0 (x) # Push local[0] = x onto stack
LOAD_FAST 1 (y) # Push local[1] = y onto stack
BINARY_OP 0 (+) # Pop y,x; push x+y
STORE_FAST 2 (result) # Pop top; store in local[2]
4 LOAD_FAST 2 (result) # Push local[2] = result
LOAD_CONST 1 (2) # Push constant 2
BINARY_OP 5 (*) # Pop 2, result; push result*2
RETURN_VALUE # Return top of stack

Answers:

  1. LOAD_FAST loads x. It uses argument index 0, meaning index 0 in co_varnames (which contains ('x', 'y', 'result')).
  2. BINARY_OP 0 (+) performs addition. The 0 is the operator code for +.
  3. STORE_FAST 2 (result) pops the top of stack and stores it in co_varnames[2] = result.
  4. LOAD_CONST 1 (2) loads the constant at index 1 of co_consts. (Index 0 is always None.)
  5. RETURN_VALUE pops and returns the top of the stack. It is always the last instruction of any function.

Reading the stack evolution:

After LOAD_FAST x: [x]
After LOAD_FAST y: [x, y]
After BINARY_OP +: [x+y]
After STORE_FAST result: []
After LOAD_FAST result: [x+y]
After LOAD_CONST 2: [x+y, 2]
After BINARY_OP *: [(x+y)*2]
After RETURN_VALUE: [] (returned (x+y)*2)
Expected Output
See solution — focus on identifying each instruction
Hints

Hint 1: Run `dis.dis(fn)` and read each column: offset, optional line number, opcode name, argument index, (human-readable argument).

Hint 2: LOAD_FAST loads a local variable. LOAD_CONST loads a literal constant. BINARY_OP performs an arithmetic or comparison operation. RETURN_VALUE returns the top of the stack.

#2LOAD_FAST vs LOAD_GLOBAL vs LOAD_CONSTEasy
LOAD_FASTLOAD_GLOBALLOAD_CONSTopcodes

Disassemble this function and identify which LOAD opcode is used for each distinct value. Explain why each one uses the opcode it does.

Python
import dis

MAX_SIZE = 100  # module-level global

def process(data, multiplier):
    length = len(data)
    if length > MAX_SIZE:
        return None
    return [x * multiplier for x in data]

dis.dis(process)
Solution
import dis

MAX_SIZE = 100

def process(data, multiplier):
length = len(data)
if length > MAX_SIZE:
return None
return [x * multiplier for x in data]

dis.dis(process)

LOAD opcode mapping:

ValueOpcodeReason
data (param)LOAD_FASTLocal variable — array index lookup
multiplier (param)LOAD_FASTLocal variable — array index lookup
length (local)LOAD_FASTLocal variable assigned in function body
len (builtin)LOAD_GLOBALModule-level name lookup (builtins fall through to globals)
MAX_SIZE (global)LOAD_GLOBALModule-level variable
NoneLOAD_CONSTConstant literal

Why this matters for performance:

  • LOAD_FAST is an array index lookup — O(1), one C array access
  • LOAD_GLOBAL is a dictionary lookup — O(1) average, but involves hashing and a dict probe
  • LOAD_CONST is also an array index lookup — same cost as LOAD_FAST

This is why the classic Python optimization tip "cache a global in a local at the top of a tight loop" works:

# Slow: LOAD_GLOBAL on every iteration
for i in range(1000):
result = math.sqrt(i)

# Faster: one LOAD_GLOBAL, then LOAD_FAST on each iteration
sqrt = math.sqrt
for i in range(1000):
result = sqrt(i)
Expected Output
See solution — spot which LOAD opcode is used for each value
Hints

Hint 1: LOAD_FAST is for local variables and parameters (fast because it is an array index lookup). LOAD_GLOBAL is for module-level names or builtins. LOAD_CONST is for literal values baked into the bytecode.

Hint 2: Function parameters are always locals, so they always use LOAD_FAST. Module-level variables and imported names use LOAD_GLOBAL. Numbers, strings, None, True, False use LOAD_CONST.

#3Trace the Value Stack by HandEasy
value-stackstack-tracebytecode

Trace the value stack through this disassembly manually. Fill in the stack state after each instruction.

Python
import dis

def compute(a, b):
    return (a + b) * (a - b)

dis.dis(compute)

# Now trace manually for a=5, b=2:
# Expected result: (5+2) * (5-2) = 7 * 3 = 21
print("\nExpected result:", compute(5, 2))

Fill in the stack after each instruction:

Instruction Stack (bottom -> top)
RESUME []
LOAD_FAST (a=5) ?
LOAD_FAST (b=2) ?
BINARY_OP + ?
LOAD_FAST (a=5) ?
LOAD_FAST (b=2) ?
BINARY_OP - ?
BINARY_OP * ?
RETURN_VALUE ?
Solution
import dis

def compute(a, b):
return (a + b) * (a - b)

dis.dis(compute)
print("\nExpected result:", compute(5, 2))

Stack trace for compute(5, 2):

Instruction Stack (bottom -> top) Notes
RESUME [] Entry point
LOAD_FAST (a=5) [5] Push a
LOAD_FAST (b=2) [5, 2] Push b
BINARY_OP + [7] Pop 5,2; push 5+2=7
LOAD_FAST (a=5) [7, 5] Push a again
LOAD_FAST (b=2) [7, 5, 2] Push b again
BINARY_OP - [7, 3] Pop 5,2; push 5-2=3
BINARY_OP * [21] Pop 7,3; push 7*3=21
RETURN_VALUE [] Return 21 to caller

Key insight: The expression (a + b) * (a - b) requires loading a and b twice each because CPython is a stack machine — it cannot reuse a computed sub-result that has already been popped off the stack. If you needed to use (a + b) twice, you would need a DUP_TOP instruction or assign the result to a local variable first.

Stack depth observation: The maximum depth here is 3 (after both LOAD_FAST a instructions before the subtraction). The co_stacksize attribute of this function's code object will be 3 — CPython pre-allocates this many slots in the frame's evaluation stack.

Expected Output
Final stack state: [result], where result = 15
Hints

Hint 1: Work through each instruction one at a time. Keep a list representing the stack. LOAD instructions push values, BINARY_OP pops two and pushes one, STORE pops one.

Hint 2: The stack grows to the right. Start with an empty stack []. At RETURN_VALUE, the top of the stack is the return value.

#4Spot the Opcode Difference: if vs ternaryEasy
POP_JUMP_IF_FALSEternaryconditionalopcodes

Compare the bytecode for an if/else statement versus a ternary expression that performs the same logic.

Python
import dis

def with_if(x):
    if x > 0:
        result = "positive"
    else:
        result = "non-positive"
    return result

def with_ternary(x):
    return "positive" if x > 0 else "non-positive"

print("=== if/else ===")
dis.dis(with_if)

print("\n=== ternary ===")
dis.dis(with_ternary)

# Count instructions
if_count = len(list(dis.get_instructions(with_if)))
tern_count = len(list(dis.get_instructions(with_ternary)))
print(f"\nif/else instruction count:  {if_count}")
print(f"ternary instruction count:  {tern_count}")
Solution
import dis

def with_if(x):
if x > 0:
result = "positive"
else:
result = "non-positive"
return result

def with_ternary(x):
return "positive" if x > 0 else "non-positive"

print("=== if/else ===")
dis.dis(with_if)

print("\n=== ternary ===")
dis.dis(with_ternary)

if_count = len(list(dis.get_instructions(with_if)))
tern_count = len(list(dis.get_instructions(with_ternary)))
print(f"\nif/else instruction count: {if_count}")
print(f"ternary instruction count: {tern_count}")

Typical bytecode comparison:

The if/else version has more instructions because it uses STORE_FAST to save to a local variable result, then LOAD_FAST result to return it. The ternary version avoids this intermediate storage — it puts the chosen string directly on the stack and returns it immediately.

Both use a conditional jump opcode (POP_JUMP_IF_FALSE or similar) for the x > 0 check. The jump targets differ slightly, but the core logic is the same.

When does the ternary save instructions?

The ternary is more concise at the bytecode level when:

  1. The result is used immediately (not assigned to a variable used later)
  2. The condition and both branches are simple expressions

When the branches are complex multi-line blocks, use if/else for readability — the bytecode difference is negligible.

Important: Neither form is significantly faster than the other. The jump overhead is the same. Prefer whichever is more readable for humans.

Expected Output
See solution — compare jump instructions
Hints

Hint 1: Both `if/else` and ternary expressions compile to conditional jump opcodes. Look for `POP_JUMP_IF_FALSE`, `POP_JUMP_IF_TRUE`, or `JUMP_FORWARD`.

Hint 2: The ternary expression `x if cond else y` compiles to a conditional jump that either loads `x` or `y`. A full `if/else` block may produce slightly different jump patterns.


Medium

#5Disassemble a For Loop and Trace FOR_ITERMedium
FOR_ITERGET_ITERJUMP_BACKWARDloop-bytecode

Disassemble a for loop. Identify the GET_ITER, FOR_ITER, and JUMP_BACKWARD opcodes. Trace the loop structure and count per-iteration instruction cost.

Python
import dis

def sum_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

print("=== Disassembly ===")
dis.dis(sum_squares)

# Instruction count analysis
instructions = list(dis.get_instructions(sum_squares))
print(f"\nTotal instructions in function: {len(instructions)}")

loop_opcodes = ["STORE_FAST", "LOAD_FAST", "BINARY_OP", "FOR_ITER", "JUMP_BACKWARD"]
loop_instr = [i for i in instructions if i.opname in loop_opcodes]
print(f"Loop-related instructions: {len(loop_instr)}")
Solution
import dis

def sum_squares(n):
total = 0
for i in range(n):
total += i * i
return total

print("=== Disassembly ===")
dis.dis(sum_squares)

instructions = list(dis.get_instructions(sum_squares))
print(f"\nTotal instructions in function: {len(instructions)}")

loop_opcodes = ["STORE_FAST", "LOAD_FAST", "BINARY_OP", "FOR_ITER", "JUMP_BACKWARD"]
loop_instr = [i for i in instructions if i.opname in loop_opcodes]
print(f"Loop-related instructions: {len(loop_instr)}")

Annotated loop structure:

LOAD_CONST 0 # Push 0 (initial total)
STORE_FAST total # total = 0

LOAD_GLOBAL range # Push range
LOAD_FAST n # Push n
CALL 1 # range(n) -> iterator (still a range object)
GET_ITER # Convert to iterator -> iter(range(n))

[LOOP START]
FOR_ITER <end> # Call next(); if StopIteration, jump to <end>
STORE_FAST i # i = next_value

LOAD_FAST total # Push total
LOAD_FAST i # Push i
LOAD_FAST i # Push i
BINARY_OP * # i * i
BINARY_OP += # total += i*i (in-place add)
STORE_FAST total # Store result

JUMP_BACKWARD <loop> # Go back to FOR_ITER
[LOOP END]

LOAD_FAST total
RETURN_VALUE

Per-iteration instructions (approximate): FOR_ITER + STORE_FAST + LOAD_FAST(3) + BINARY_OP(2) + STORE_FAST + JUMP_BACKWARD = ~9 instructions per iteration.

For n = 1,000,000 that is ~9 million bytecode dispatches just for this loop. This is why vectorized NumPy operations (which execute the loop in C) are orders of magnitude faster for numeric work.

import dis

def sum_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# 1. Disassemble and identify the loop structure
# 2. Label: where does the loop begin, where does FOR_ITER jump on exhaustion?
# 3. Count how many instructions execute per loop iteration
Expected Output
See solution for loop structure analysis
Hints

Hint 1: A for loop compiles to: GET_ITER (convert iterable to iterator), then a loop of FOR_ITER (call next()) / body / JUMP_BACKWARD. When the iterator is exhausted, FOR_ITER jumps past the end of the loop.

Hint 2: Count only the instructions inside the loop body (between FOR_ITER and JUMP_BACKWARD). These execute N times for N iterations.

#6List Comprehension vs Loop: Opcode ShowdownMedium
LIST_APPENDlist-comprehensionCALLperformance

Compare the bytecode of a list built with a comprehension versus a for loop with .append(). Find the LIST_APPEND opcode and explain why it is faster than the CALL approach.

Python
import dis
import types

def build_with_loop(n):
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def build_with_comp(n):
    return [i * i for i in range(n)]

print("=== Loop version ===")
dis.dis(build_with_loop)

print("\n=== Comprehension outer ===")
dis.dis(build_with_comp)

# The comprehension has a nested code object — disassemble it too
for const in build_with_comp.__code__.co_consts:
    if isinstance(const, types.CodeType):
        print(f"\n=== Comprehension inner [{const.co_name}] ===")
        dis.dis(const)
Solution
import dis
import types

def build_with_loop(n):
result = []
for i in range(n):
result.append(i * i)
return result

def build_with_comp(n):
return [i * i for i in range(n)]

print("=== Loop version ===")
dis.dis(build_with_loop)

print("\n=== Comprehension outer ===")
dis.dis(build_with_comp)

for const in build_with_comp.__code__.co_consts:
if isinstance(const, types.CodeType):
print(f"\n=== Comprehension inner [{const.co_name}] ===")
dis.dis(const)

The key difference:

Loop version per-iteration (inside the loop):

LOAD_FAST result # Push result list
LOAD_ATTR append # Look up .append method (dict lookup!)
LOAD_FAST i # Push i
LOAD_FAST i # Push i
BINARY_OP * # i*i
CALL 1 # Call append(i*i) — creates call frame
POP_TOP # Discard None return value of append

Comprehension inner per-iteration:

LOAD_FAST i # Push i (from .0 iterator arg)
LOAD_FAST i # Push i
BINARY_OP * # i*i
LIST_APPEND 1 # Direct C-level append, no method lookup, no CALL

Why LIST_APPEND is faster:

  1. No attribute lookup — LOAD_ATTR is a dictionary lookup through the object's type
  2. No function call overhead — CALL creates a new frame, pushes arguments, dispatches
  3. No POP_TOPLIST_APPEND does not produce a return value

LIST_APPEND is implemented in ceval.c as a direct call to PyList_Append() — it skips all the Python method dispatch machinery entirely.

Expected Output
See solution for LIST_APPEND vs CALL comparison
Hints

Hint 1: A list comprehension compiles with the specialized LIST_APPEND opcode inside its inner code object. A loop uses LOAD_ATTR to look up .append, then CALL to invoke it.

Hint 2: Use dis.dis() on both. For the comprehension, also disassemble the nested code object (found in the outer function's co_consts).

#7f-string vs format() vs % Bytecode ComparisonMedium
FORMAT_VALUEBUILD_STRINGf-stringstring-formatting

Compare the bytecode generated by three Python string formatting approaches: f-strings, .format(), and % formatting.

Python
import dis

name = "world"
count = 42

def use_fstring(name, count):
    return f"Hello, {name}! Count: {count}"

def use_format(name, count):
    return "Hello, {}! Count: {}".format(name, count)

def use_percent(name, count):
    return "Hello, %s! Count: %d" % (name, count)

for fn in [use_fstring, use_format, use_percent]:
    instructions = list(dis.get_instructions(fn))
    print(f"\n--- {fn.__name__} ({len(instructions)} instructions) ---")
    dis.dis(fn)
Solution
import dis

def use_fstring(name, count):
return f"Hello, {name}! Count: {count}"

def use_format(name, count):
return "Hello, {}! Count: {}".format(name, count)

def use_percent(name, count):
return "Hello, %s! Count: %d" % (name, count)

for fn in [use_fstring, use_format, use_percent]:
instructions = list(dis.get_instructions(fn))
print(f"\n--- {fn.__name__} ({len(instructions)} instructions) ---")
dis.dis(fn)

Bytecode patterns for each approach:

f-string (FORMAT_VALUE + BUILD_STRING):

LOAD_CONST "Hello, "
LOAD_FAST name
FORMAT_VALUE # Convert name to str (fast, specialized opcode)
LOAD_CONST "! Count: "
LOAD_FAST count
FORMAT_VALUE # Convert count to str
BUILD_STRING 4 # Concatenate 4 string pieces

No attribute lookup, no function call machinery.

.format() (LOAD_ATTR + CALL):

LOAD_CONST "Hello, {}! Count: {}"
LOAD_METHOD format # Dict lookup for .format method
LOAD_FAST name
LOAD_FAST count
CALL_METHOD 2 # Invokes str.format() — full Python/C dispatch

Requires attribute lookup and a full method call.

% formatting (BINARY_OP %):

LOAD_CONST "Hello, %s! Count: %d"
LOAD_FAST name
LOAD_FAST count
BUILD_TUPLE 2 # Pack args into tuple
BINARY_OP % # String formatting via __mod__ dispatch

One BINARY_OP but it invokes complex C-level parsing of the format string.

Performance ranking: f-strings are typically fastest for simple cases (no complex format specs), followed by %, then .format(). For format specs like :.2f, the ranking may change. In practice, the difference is tiny unless you are in a tight loop formatting millions of strings.

Expected Output
See solution for opcode comparison across three formatting styles
Hints

Hint 1: f-strings compile to FORMAT_VALUE + BUILD_STRING opcodes. str.format() compiles to LOAD_METHOD + CALL. % formatting compiles to BINARY_OP %.

Hint 2: Count the total instructions for each. f-strings are generally the most efficient because FORMAT_VALUE is a specialized opcode with no attribute lookup overhead.

#8Predict the Stack DepthMedium
co_stacksizestack-depthpredict-output

Predict the co_stacksize for each function before running the code. Then verify your predictions.

Python
def f1(x):
    return x + 1

def f2(x, y, z):
    return x + y + z

def f3(x):
    return x * x + x * 2 + 1

def f4(n):
    return sum(range(n))

for fn in [f1, f2, f3, f4]:
    print(f"{fn.__name__}: co_stacksize = {fn.__code__.co_stacksize}")

Predict before running:

  • f1: x + 1 — what is the max stack depth?
  • f2: x + y + z — does adding more terms deepen the stack?
  • f3: x*x + x*2 + 1 — more complex expression
  • f4: sum(range(n)) — a function call
Solution
def f1(x):
return x + 1

def f2(x, y, z):
return x + y + z

def f3(x):
return x * x + x * 2 + 1

def f4(n):
return sum(range(n))

for fn in [f1, f2, f3, f4]:
print(f"{fn.__name__}: co_stacksize = {fn.__code__.co_stacksize}")

Expected outputs:

f1: co_stacksize = 2
f2: co_stacksize = 2
f3: co_stacksize = 2
f4: co_stacksize = 3

Stack trace for each:

f1 (x + 1): LOAD_FAST x → depth 1, LOAD_CONST 1 → depth 2, BINARY_OP → depth 1. Max = 2.

f2 (x + y + z): This evaluates left-to-right as (x + y) + z. LOAD x → 1, LOAD y → 2, BINARY_OP → 1, LOAD z → 2, BINARY_OP → 1. Max = 2. Notice that a chain of additions does NOT grow the stack — intermediate results are consumed immediately.

f3 (x*x + x*2 + 1): Evaluates as ((x*x) + (x*2)) + 1. The subexpressions xx and x2 each need depth 2. The result is accumulated. Max = 2. The compiler evaluates left to right, consuming results eagerly.

f4 (sum(range(n))): LOAD_GLOBAL sum → 1, LOAD_GLOBAL range → 2, LOAD_FAST n → 3, CALL range(n) → 2, CALL sum(...) → 1. Max = 3 because at the peak, both sum and range are on the stack simultaneously before the range(n) call.

Key insight: co_stacksize is computed by the compiler (not at runtime) as the worst-case stack depth across all code paths. It is used to pre-allocate the frame's value array — there is no dynamic resizing of the stack during execution.

Expected Output
See solution for stack depth predictions and explanations
Hints

Hint 1: co_stacksize is the MAXIMUM depth the stack reaches at any point during execution. Trace through the instructions and track the high-water mark.

Hint 2: Each LOAD operation +1, each BINARY_OP -1 (pops 2, pushes 1), CALL pops args+1 (function) and pushes 1 result, STORE_FAST -1.


Hard

#9Write a Bytecode-Based ProfilerHard
disprofilingopcode-frequencyLOAD_GLOBAL

Build a bytecode profiler that analyses a function's compiled code and flags potentially expensive instruction patterns, particularly LOAD_GLOBAL inside loops.

import dis
from collections import Counter

def bytecode_profile(func):
instructions = list(dis.get_instructions(func))
counts = Counter(i.opname for i in instructions)

print(f"=== Bytecode Profile: {func.__name__} ===")
print(f"Total instructions: {len(instructions)}")
print()

# Top opcodes
print("Top 8 opcodes:")
for opname, count in counts.most_common(8):
bar = "=" * count
print(f" {opname:<25} {count:>3} {bar}")
print()

# Flag expensive patterns
warnings = []
in_loop = False
for i, instr in enumerate(instructions):
if instr.opname == "GET_ITER":
in_loop = True
if instr.opname == "JUMP_BACKWARD":
in_loop = False
if in_loop and instr.opname == "LOAD_GLOBAL":
warnings.append(f" LOAD_GLOBAL '{instr.argval}' inside loop at offset {instr.offset}")

if warnings:
print("Performance warnings:")
for w in warnings:
print(w)
else:
print("No performance warnings.")

# Test: a function with LOAD_GLOBAL in a loop
import math

def compute_norms(vectors):
results = []
for v in vectors:
norm = math.sqrt(v[0]**2 + v[1]**2) # LOAD_GLOBAL math in loop
results.append(norm)
return results

bytecode_profile(compute_norms)
Solution
import dis
from collections import Counter

def bytecode_profile(func):
instructions = list(dis.get_instructions(func))
counts = Counter(i.opname for i in instructions)

print(f"=== Bytecode Profile: {func.__name__} ===")
print(f"Total instructions: {len(instructions)}")
print()

print("Top 8 opcodes:")
for opname, count in counts.most_common(8):
bar = "=" * count
print(f" {opname:<25} {count:>3} {bar}")
print()

warnings = []
in_loop = False
for i, instr in enumerate(instructions):
if instr.opname == "GET_ITER":
in_loop = True
if instr.opname == "JUMP_BACKWARD":
in_loop = False
if in_loop and instr.opname == "LOAD_GLOBAL":
warnings.append(f" LOAD_GLOBAL '{instr.argval}' inside loop at offset {instr.offset}")

if warnings:
print("Performance warnings:")
for w in warnings:
print(w)
else:
print("No performance warnings.")

import math

def compute_norms(vectors):
results = []
for v in vectors:
norm = math.sqrt(v[0]**2 + v[1]**2)
results.append(norm)
return results

bytecode_profile(compute_norms)

The LOAD_GLOBAL-in-loop optimization:

When you write math.sqrt inside a loop, CPython executes:

  1. LOAD_GLOBAL math — dictionary lookup in the module's globals (hash + probe)
  2. LOAD_ATTR sqrt — dictionary lookup on the math module object (another hash + probe)

These two dictionary lookups happen on every iteration. For a loop with 1 million iterations, that is 2 million unnecessary dict lookups.

The fix:

sqrt = math.sqrt # One LOAD_GLOBAL, one LOAD_ATTR, outside the loop
def compute_norms_fast(vectors):
results = []
for v in vectors:
norm = sqrt(v[0]**2 + v[1]**2) # LOAD_FAST — array index, no hashing
results.append(norm)
return results

This is exactly what CPython's bytecode shows: the sqrt local is loaded with LOAD_FAST instead of LOAD_GLOBAL + LOAD_ATTR. Real-world speedup: 15-30% for math-heavy loops.

import dis
from collections import Counter

def bytecode_profile(func):
    """Analyze a function's bytecode and produce a performance profile:
    - Total instruction count
    - Opcode frequency distribution
    - Estimated 'hot' vs 'cold' operations
    - Flag any potentially expensive patterns (LOAD_GLOBAL in loops, etc.)
    """
    pass
Expected Output
See solution for full profiler output
Hints

Hint 1: Use dis.get_instructions() to iterate all instructions. Count by opname. Flag LOAD_GLOBAL as "potentially slow" (dict lookup vs LOAD_FAST array access).

Hint 2: To detect LOAD_GLOBAL inside a loop, look for LOAD_GLOBAL instructions that appear between a GET_ITER and a JUMP_BACKWARD in the instruction sequence.

#10Decompile Bytecode Back to PseudocodeHard
disdecompilationpseudocodereverse-engineering

Build a partial decompiler that translates CPython bytecode back into pseudocode. This is how tools like uncompyle6 work.

import dis

def bytecode_to_pseudocode(func):
instructions = list(dis.get_instructions(func))
stack = []
output = []

binary_ops = {
0: "+", 1: "&", 2: "//", 3: "@", 4: "<<", 5: "*",
6: "%", 7: "|", 8: "**", 9: ">>", 10: "-",
11: "/", 12: "^", 13: "+=", 14: "&=",
}
compare_ops = {0: "<", 1: "<=", 2: "==", 3: "!=", 4: ">", 5: ">="}

for instr in instructions:
op = instr.opname

if op == "RESUME":
continue
elif op in ("LOAD_FAST", "LOAD_GLOBAL", "LOAD_DEREF"):
stack.append(str(instr.argval))
elif op == "LOAD_CONST":
stack.append(repr(instr.argval))
elif op == "BINARY_OP":
rhs = stack.pop()
lhs = stack.pop()
sym = binary_ops.get(instr.arg, "?")
stack.append(f"({lhs} {sym} {rhs})")
elif op == "COMPARE_OP":
rhs = stack.pop()
lhs = stack.pop()
sym = compare_ops.get(instr.arg, "?")
stack.append(f"({lhs} {sym} {rhs})")
elif op == "STORE_FAST":
val = stack.pop() if stack else "?"
output.append(f" {instr.argval} = {val}")
elif op in ("POP_JUMP_IF_FALSE", "POP_JUMP_FORWARD_IF_FALSE"):
cond = stack.pop() if stack else "?"
output.append(f" if not {cond}: goto offset {instr.argval}")
elif op == "RETURN_VALUE":
val = stack.pop() if stack else "?"
output.append(f" return {val}")
else:
output.append(f" # [{op} {instr.argval}]")

return "\n".join(output)

# Test
def hypotenuse(a, b):
c_sq = a * a + b * b
return c_sq ** 0.5

print(f"def {hypotenuse.__name__}(a, b):")
print(bytecode_to_pseudocode(hypotenuse))
Solution
import dis

def bytecode_to_pseudocode(func):
instructions = list(dis.get_instructions(func))
stack = []
output = []

binary_ops = {
0: "+", 1: "&", 2: "//", 3: "@", 4: "<<", 5: "*",
6: "%", 7: "|", 8: "**", 9: ">>", 10: "-",
11: "/", 12: "^", 13: "+=", 14: "&=",
}
compare_ops = {0: "<", 1: "<=", 2: "==", 3: "!=", 4: ">", 5: ">="}

for instr in instructions:
op = instr.opname

if op == "RESUME":
continue
elif op in ("LOAD_FAST", "LOAD_GLOBAL", "LOAD_DEREF"):
stack.append(str(instr.argval))
elif op == "LOAD_CONST":
stack.append(repr(instr.argval))
elif op == "BINARY_OP":
rhs = stack.pop()
lhs = stack.pop()
sym = binary_ops.get(instr.arg, "?")
stack.append(f"({lhs} {sym} {rhs})")
elif op == "COMPARE_OP":
rhs = stack.pop()
lhs = stack.pop()
sym = compare_ops.get(instr.arg, "?")
stack.append(f"({lhs} {sym} {rhs})")
elif op == "STORE_FAST":
val = stack.pop() if stack else "?"
output.append(f" {instr.argval} = {val}")
elif op in ("POP_JUMP_IF_FALSE", "POP_JUMP_FORWARD_IF_FALSE"):
cond = stack.pop() if stack else "?"
output.append(f" if not {cond}: goto offset {instr.argval}")
elif op == "RETURN_VALUE":
val = stack.pop() if stack else "?"
output.append(f" return {val}")
else:
output.append(f" # [{op} {instr.argval}]")

return "\n".join(output)

def hypotenuse(a, b):
c_sq = a * a + b * b
return c_sq ** 0.5

print(f"def {hypotenuse.__name__}(a, b):")
print(bytecode_to_pseudocode(hypotenuse))

Expected pseudocode output:

def hypotenuse(a, b):
c_sq = ((a * a) + (b * b))
return (c_sq ** 0.5)

Why decompilation is hard in general:

This simple version handles straight-line code well but struggles with:

  • Control flow: Jumps create branches that require reconstructing if/while/for structures from raw offsets (this is the core of decompiler research)
  • Function calls: CALL pops N arguments plus the function — needs special handling
  • Comprehensions: Nested code objects must be decompiled separately and inlined
  • Exception handling: SETUP_FINALLY, PUSH_EXC_INFO, etc. create complex control flow graphs

Real decompilers (uncompyle6, decompile3) use control flow graph analysis to reconstruct the original structure. They build a graph of basic blocks connected by jump edges, then match patterns against known compiler output templates to reconstruct if, while, for, try/except from the raw jump instructions.

import dis

def bytecode_to_pseudocode(func):
    """Translate a function's bytecode into human-readable pseudocode.
    Handle: LOAD_FAST, LOAD_CONST, LOAD_GLOBAL, BINARY_OP,
    STORE_FAST, COMPARE_OP, POP_JUMP_IF_FALSE, RETURN_VALUE.
    Use a stack to track operands and build expressions.
    """
    pass
Expected Output
See solution for pseudocode output
Hints

Hint 1: Model the value stack as a Python list of strings. LOAD_ instructions push name strings; BINARY_OP pops two strings and pushes a combined expression string like "(a + b)".

Hint 2: For STORE_FAST, emit an assignment statement. For RETURN_VALUE, emit "return <top>". Jumps are the hardest part — simplify by emitting "if <condition>: goto <offset>".

#11Build an Opcode-Based Complexity ScorerHard
discomplexityCALLLOAD_GLOBALloops

Build a complexity scorer that assigns a numeric score to a function based on its bytecode characteristics. Higher scores indicate code that is likely slower.

import dis
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComplexityReport:
function_name: str
total_instructions: int
call_count: int
load_global_count: int
loop_count: int
complexity_score: float
warnings: List[str] = field(default_factory=list)

def score_complexity(func) -> ComplexityReport:
instructions = list(dis.get_instructions(func))
loop_depth = 0
call_count = 0
load_global_count = 0
loop_count = 0
score = 0
warnings = []

for instr in instructions:
op = instr.opname
score += 1 # baseline per instruction

if op == "GET_ITER":
loop_depth += 1
loop_count += 1
score += 10
elif op == "JUMP_BACKWARD":
loop_depth = max(0, loop_depth - 1)
elif op in ("CALL", "CALL_FUNCTION", "CALL_METHOD"):
call_count += 1
score += 5
elif op == "LOAD_GLOBAL":
load_global_count += 1
score += 3
if loop_depth > 0:
score += 20
warnings.append(
f"LOAD_GLOBAL '{instr.argval}' inside loop at offset {instr.offset}"
)

return ComplexityReport(
function_name=func.__name__,
total_instructions=len(instructions),
call_count=call_count,
load_global_count=load_global_count,
loop_count=loop_count,
complexity_score=score,
warnings=warnings,
)

def print_report(report):
print(f"\n=== {report.function_name} ===")
print(f" Total instructions: {report.total_instructions}")
print(f" CALL count: {report.call_count}")
print(f" LOAD_GLOBAL count: {report.load_global_count}")
print(f" Loop count: {report.loop_count}")
print(f" Complexity score: {report.complexity_score}")
if report.warnings:
print(f" Warnings:")
for w in report.warnings:
print(f" {w}")

# Test functions
import math

def simple(x, y):
return x + y

def loop_with_globals(data):
result = []
for item in data:
result.append(math.sqrt(item))
return result

def loop_with_local(data):
sqrt = math.sqrt
result = []
for item in data:
result.append(sqrt(item))
return result

for fn in [simple, loop_with_globals, loop_with_local]:
print_report(score_complexity(fn))
Solution
import dis
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComplexityReport:
function_name: str
total_instructions: int
call_count: int
load_global_count: int
loop_count: int
complexity_score: float
warnings: List[str] = field(default_factory=list)

def score_complexity(func) -> ComplexityReport:
instructions = list(dis.get_instructions(func))
loop_depth = 0
call_count = 0
load_global_count = 0
loop_count = 0
score = 0
warnings = []

for instr in instructions:
op = instr.opname
score += 1

if op == "GET_ITER":
loop_depth += 1
loop_count += 1
score += 10
elif op == "JUMP_BACKWARD":
loop_depth = max(0, loop_depth - 1)
elif op in ("CALL", "CALL_FUNCTION", "CALL_METHOD"):
call_count += 1
score += 5
elif op == "LOAD_GLOBAL":
load_global_count += 1
score += 3
if loop_depth > 0:
score += 20
warnings.append(
f"LOAD_GLOBAL '{instr.argval}' inside loop at offset {instr.offset}"
)

return ComplexityReport(
function_name=func.__name__,
total_instructions=len(instructions),
call_count=call_count,
load_global_count=load_global_count,
loop_count=loop_count,
complexity_score=score,
warnings=warnings,
)

def print_report(report):
print(f"\n=== {report.function_name} ===")
print(f" Total instructions: {report.total_instructions}")
print(f" CALL count: {report.call_count}")
print(f" LOAD_GLOBAL count: {report.load_global_count}")
print(f" Loop count: {report.loop_count}")
print(f" Complexity score: {report.complexity_score}")
if report.warnings:
print(f" Warnings:")
for w in report.warnings:
print(f" {w}")

import math

def simple(x, y):
return x + y

def loop_with_globals(data):
result = []
for item in data:
result.append(math.sqrt(item))
return result

def loop_with_local(data):
sqrt = math.sqrt
result = []
for item in data:
result.append(sqrt(item))
return result

for fn in [simple, loop_with_globals, loop_with_local]:
print_report(score_complexity(fn))

Expected complexity scores:

  • simple: low (5-10) — no loops, no globals, two LOAD_FAST
  • loop_with_globals: high (50+) — loop with LOAD_GLOBAL math and append on every iteration
  • loop_with_local: medium (25-35) — loop without LOAD_GLOBAL inside the body (sqrt is LOAD_FAST)

Extensions:

  • Add scoring for nested loops (multiply by nesting depth)
  • Flag LOAD_ATTR inside loops (attribute lookups are also dict-based)
  • Add a suggest_fix() method that generates specific optimization advice based on warnings
  • Integrate with ast.parse() to also report cyclomatic complexity alongside bytecode complexity
import dis
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComplexityReport:
    function_name: str
    total_instructions: int
    call_count: int
    load_global_count: int
    loop_count: int
    complexity_score: float
    warnings: List[str] = field(default_factory=list)

def score_complexity(func) -> ComplexityReport:
    """Score a function's bytecode complexity.
    
    Scoring rules:
      +1 per instruction (baseline)
      +5 per CALL instruction (expensive dispatch)
      +3 per LOAD_GLOBAL (dict lookup vs array access)
      +10 per loop (GET_ITER = one loop)
      +20 if LOAD_GLOBAL appears inside a loop
    """
    pass
Expected Output
See solution for complexity scoring report
Hints

Hint 1: Track loop state with a counter that increments on GET_ITER and decrements on JUMP_BACKWARD. A LOAD_GLOBAL when the counter > 0 is inside a loop.

Hint 2: Accumulate the score using the rules given. Package everything into the ComplexityReport dataclass.

© 2026 EngineersOfAI. All rights reserved.