Python Disassembly with dis Practice Problems & Exercises

Practice: Disassembly with dis

11 problems4 Easy4 Medium3 Hard⏱ 55–75 min

Easy

#1Read Your First DisassemblyEasy

disLOAD_FASTBINARY_OPRETURN_VALUE

Disassemble a simple function and answer the questions about each instruction.

Python

import dis

def add_and_double(x, y):
    result = x + y
    return result * 2

dis.dis(add_and_double)

After running, answer:

Which opcode loads the parameter x? What argument index does it use?
Which opcode performs the addition?
Which opcode stores the result into result?
Which opcode loads the constant 2?
What is the last opcode executed before the function returns?

Solution

import dis

def add_and_double(x, y):
    result = x + y
    return result * 2

dis.dis(add_and_double)

Annotated disassembly:

  2      RESUME          0          # Entry point / hot-path indicator
  3      LOAD_FAST       0 (x)      # Push local[0] = x onto stack
         LOAD_FAST       1 (y)      # Push local[1] = y onto stack
         BINARY_OP       0 (+)      # Pop y,x; push x+y
         STORE_FAST      2 (result) # Pop top; store in local[2]
  4      LOAD_FAST       2 (result) # Push local[2] = result
         LOAD_CONST      1 (2)      # Push constant 2
         BINARY_OP       5 (*)      # Pop 2, result; push result*2
         RETURN_VALUE               # Return top of stack

Answers:

LOAD_FAST loads x. It uses argument index 0, meaning index 0 in co_varnames (which contains ('x', 'y', 'result')).
BINARY_OP 0 (+) performs addition. The 0 is the operator code for +.
STORE_FAST 2 (result) pops the top of stack and stores it in co_varnames[2] = result.
LOAD_CONST 1 (2) loads the constant at index 1 of co_consts. (Index 0 is always None.)
RETURN_VALUE pops and returns the top of the stack. It is always the last instruction of any function.

Reading the stack evolution:

After LOAD_FAST x:       [x]
After LOAD_FAST y:       [x, y]
After BINARY_OP +:       [x+y]
After STORE_FAST result: []
After LOAD_FAST result:  [x+y]
After LOAD_CONST 2:      [x+y, 2]
After BINARY_OP *:       [(x+y)*2]
After RETURN_VALUE:      []   (returned (x+y)*2)

Expected Output

See solution — focus on identifying each instruction

Hints

Hint 1: Run `dis.dis(fn)` and read each column: offset, optional line number, opcode name, argument index, (human-readable argument).

Hint 2: LOAD_FAST loads a local variable. LOAD_CONST loads a literal constant. BINARY_OP performs an arithmetic or comparison operation. RETURN_VALUE returns the top of the stack.

#2LOAD_FAST vs LOAD_GLOBAL vs LOAD_CONSTEasy

LOAD_FASTLOAD_GLOBALLOAD_CONSTopcodes

Disassemble this function and identify which LOAD opcode is used for each distinct value. Explain why each one uses the opcode it does.

Python

import dis

MAX_SIZE = 100  # module-level global

def process(data, multiplier):
    length = len(data)
    if length > MAX_SIZE:
        return None
    return [x * multiplier for x in data]

dis.dis(process)

Solution

import dis

MAX_SIZE = 100

def process(data, multiplier):
    length = len(data)
    if length > MAX_SIZE:
        return None
    return [x * multiplier for x in data]

dis.dis(process)

LOAD opcode mapping:

Value	Opcode	Reason
`data` (param)	`LOAD_FAST`	Local variable — array index lookup
`multiplier` (param)	`LOAD_FAST`	Local variable — array index lookup
`length` (local)	`LOAD_FAST`	Local variable assigned in function body
`len` (builtin)	`LOAD_GLOBAL`	Module-level name lookup (builtins fall through to globals)
`MAX_SIZE` (global)	`LOAD_GLOBAL`	Module-level variable
`None`	`LOAD_CONST`	Constant literal

Why this matters for performance:

LOAD_FAST is an array index lookup — O(1), one C array access
LOAD_GLOBAL is a dictionary lookup — O(1) average, but involves hashing and a dict probe
LOAD_CONST is also an array index lookup — same cost as LOAD_FAST

This is why the classic Python optimization tip "cache a global in a local at the top of a tight loop" works:

# Slow: LOAD_GLOBAL on every iteration
for i in range(1000):
    result = math.sqrt(i)

# Faster: one LOAD_GLOBAL, then LOAD_FAST on each iteration
sqrt = math.sqrt
for i in range(1000):
    result = sqrt(i)

Expected Output

See solution — spot which LOAD opcode is used for each value

Hints

Hint 1: LOAD_FAST is for local variables and parameters (fast because it is an array index lookup). LOAD_GLOBAL is for module-level names or builtins. LOAD_CONST is for literal values baked into the bytecode.

Hint 2: Function parameters are always locals, so they always use LOAD_FAST. Module-level variables and imported names use LOAD_GLOBAL. Numbers, strings, None, True, False use LOAD_CONST.

#3Trace the Value Stack by HandEasy

value-stackstack-tracebytecode

Trace the value stack through this disassembly manually. Fill in the stack state after each instruction.

Python

import dis

def compute(a, b):
    return (a + b) * (a - b)

dis.dis(compute)

# Now trace manually for a=5, b=2:
# Expected result: (5+2) * (5-2) = 7 * 3 = 21
print("\nExpected result:", compute(5, 2))

Fill in the stack after each instruction:

Instruction          Stack (bottom -> top)
RESUME               []
LOAD_FAST (a=5)      ?
LOAD_FAST (b=2)      ?
BINARY_OP +          ?
LOAD_FAST (a=5)      ?
LOAD_FAST (b=2)      ?
BINARY_OP -          ?
BINARY_OP *          ?
RETURN_VALUE         ?

Solution

import dis

def compute(a, b):
    return (a + b) * (a - b)

dis.dis(compute)
print("\nExpected result:", compute(5, 2))

Stack trace for compute(5, 2):

Instruction          Stack (bottom -> top)   Notes
RESUME               []                      Entry point
LOAD_FAST (a=5)      [5]                     Push a
LOAD_FAST (b=2)      [5, 2]                  Push b
BINARY_OP +          [7]                     Pop 5,2; push 5+2=7
LOAD_FAST (a=5)      [7, 5]                  Push a again
LOAD_FAST (b=2)      [7, 5, 2]               Push b again
BINARY_OP -          [7, 3]                  Pop 5,2; push 5-2=3
BINARY_OP *          [21]                    Pop 7,3; push 7*3=21
RETURN_VALUE         []                      Return 21 to caller

Key insight: The expression (a + b) * (a - b) requires loading a and b twice each because CPython is a stack machine — it cannot reuse a computed sub-result that has already been popped off the stack. If you needed to use (a + b) twice, you would need a DUP_TOP instruction or assign the result to a local variable first.

Stack depth observation: The maximum depth here is 3 (after both LOAD_FAST a instructions before the subtraction). The co_stacksize attribute of this function's code object will be 3 — CPython pre-allocates this many slots in the frame's evaluation stack.

Expected Output

Final stack state: [result], where result = 15

Hints

Hint 1: Work through each instruction one at a time. Keep a list representing the stack. LOAD instructions push values, BINARY_OP pops two and pushes one, STORE pops one.

Hint 2: The stack grows to the right. Start with an empty stack []. At RETURN_VALUE, the top of the stack is the return value.

#4Spot the Opcode Difference: if vs ternaryEasy

POP_JUMP_IF_FALSEternaryconditionalopcodes

Compare the bytecode for an if/else statement versus a ternary expression that performs the same logic.

Python

import dis

def with_if(x):
    if x > 0:
        result = "positive"
    else:
        result = "non-positive"
    return result

def with_ternary(x):
    return "positive" if x > 0 else "non-positive"

print("=== if/else ===")
dis.dis(with_if)

print("\n=== ternary ===")
dis.dis(with_ternary)

# Count instructions
if_count = len(list(dis.get_instructions(with_if)))
tern_count = len(list(dis.get_instructions(with_ternary)))
print(f"\nif/else instruction count:  {if_count}")
print(f"ternary instruction count:  {tern_count}")

Solution

import dis

def with_if(x):
    if x > 0:
        result = "positive"
    else:
        result = "non-positive"
    return result

def with_ternary(x):
    return "positive" if x > 0 else "non-positive"

print("=== if/else ===")
dis.dis(with_if)

print("\n=== ternary ===")
dis.dis(with_ternary)

if_count = len(list(dis.get_instructions(with_if)))
tern_count = len(list(dis.get_instructions(with_ternary)))
print(f"\nif/else instruction count:  {if_count}")
print(f"ternary instruction count:  {tern_count}")

Typical bytecode comparison:

The if/else version has more instructions because it uses STORE_FAST to save to a local variable result, then LOAD_FAST result to return it. The ternary version avoids this intermediate storage — it puts the chosen string directly on the stack and returns it immediately.

Both use a conditional jump opcode (POP_JUMP_IF_FALSE or similar) for the x > 0 check. The jump targets differ slightly, but the core logic is the same.

When does the ternary save instructions?

The ternary is more concise at the bytecode level when:

The result is used immediately (not assigned to a variable used later)
The condition and both branches are simple expressions

When the branches are complex multi-line blocks, use if/else for readability — the bytecode difference is negligible.

Important: Neither form is significantly faster than the other. The jump overhead is the same. Prefer whichever is more readable for humans.

Expected Output

See solution — compare jump instructions

Hints

Hint 1: Both `if/else` and ternary expressions compile to conditional jump opcodes. Look for `POP_JUMP_IF_FALSE`, `POP_JUMP_IF_TRUE`, or `JUMP_FORWARD`.

Hint 2: The ternary expression `x if cond else y` compiles to a conditional jump that either loads `x` or `y`. A full `if/else` block may produce slightly different jump patterns.

Medium

#5Disassemble a For Loop and Trace FOR_ITERMedium

FOR_ITERGET_ITERJUMP_BACKWARDloop-bytecode

Disassemble a for loop. Identify the GET_ITER, FOR_ITER, and JUMP_BACKWARD opcodes. Trace the loop structure and count per-iteration instruction cost.

Python

import dis

def sum_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

print("=== Disassembly ===")
dis.dis(sum_squares)

# Instruction count analysis
instructions = list(dis.get_instructions(sum_squares))
print(f"\nTotal instructions in function: {len(instructions)}")

loop_opcodes = ["STORE_FAST", "LOAD_FAST", "BINARY_OP", "FOR_ITER", "JUMP_BACKWARD"]
loop_instr = [i for i in instructions if i.opname in loop_opcodes]
print(f"Loop-related instructions: {len(loop_instr)}")

Solution

import dis

def sum_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

print("=== Disassembly ===")
dis.dis(sum_squares)

instructions = list(dis.get_instructions(sum_squares))
print(f"\nTotal instructions in function: {len(instructions)}")

loop_opcodes = ["STORE_FAST", "LOAD_FAST", "BINARY_OP", "FOR_ITER", "JUMP_BACKWARD"]
loop_instr = [i for i in instructions if i.opname in loop_opcodes]
print(f"Loop-related instructions: {len(loop_instr)}")

Annotated loop structure:

LOAD_CONST 0            # Push 0 (initial total)
STORE_FAST total        # total = 0

LOAD_GLOBAL range       # Push range
LOAD_FAST n             # Push n
CALL 1                  # range(n) -> iterator (still a range object)
GET_ITER                # Convert to iterator -> iter(range(n))

[LOOP START]
FOR_ITER <end>          # Call next(); if StopIteration, jump to <end>
STORE_FAST i            # i = next_value

LOAD_FAST total         # Push total
LOAD_FAST i             # Push i
LOAD_FAST i             # Push i
BINARY_OP *             # i * i
BINARY_OP +=            # total += i*i  (in-place add)
STORE_FAST total        # Store result

JUMP_BACKWARD <loop>    # Go back to FOR_ITER
[LOOP END]

LOAD_FAST total
RETURN_VALUE

Per-iteration instructions (approximate): FOR_ITER + STORE_FAST + LOAD_FAST(3) + BINARY_OP(2) + STORE_FAST + JUMP_BACKWARD = ~9 instructions per iteration.

For n = 1,000,000 that is ~9 million bytecode dispatches just for this loop. This is why vectorized NumPy operations (which execute the loop in C) are orders of magnitude faster for numeric work.

Starter Code

import dis

def sum_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# 1. Disassemble and identify the loop structure
# 2. Label: where does the loop begin, where does FOR_ITER jump on exhaustion?
# 3. Count how many instructions execute per loop iteration

Expected Output

See solution for loop structure analysis

Hints

Hint 1: A for loop compiles to: GET_ITER (convert iterable to iterator), then a loop of FOR_ITER (call next()) / body / JUMP_BACKWARD. When the iterator is exhausted, FOR_ITER jumps past the end of the loop.

Hint 2: Count only the instructions inside the loop body (between FOR_ITER and JUMP_BACKWARD). These execute N times for N iterations.

#6List Comprehension vs Loop: Opcode ShowdownMedium

LIST_APPENDlist-comprehensionCALLperformance

Compare the bytecode of a list built with a comprehension versus a for loop with .append(). Find the LIST_APPEND opcode and explain why it is faster than the CALL approach.

Python

import dis
import types

def build_with_loop(n):
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def build_with_comp(n):
    return [i * i for i in range(n)]

print("=== Loop version ===")
dis.dis(build_with_loop)

print("\n=== Comprehension outer ===")
dis.dis(build_with_comp)

# The comprehension has a nested code object — disassemble it too
for const in build_with_comp.__code__.co_consts:
    if isinstance(const, types.CodeType):
        print(f"\n=== Comprehension inner [{const.co_name}] ===")
        dis.dis(const)

Solution

import dis
import types

def build_with_loop(n):
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def build_with_comp(n):
    return [i * i for i in range(n)]

print("=== Loop version ===")
dis.dis(build_with_loop)

print("\n=== Comprehension outer ===")
dis.dis(build_with_comp)

for const in build_with_comp.__code__.co_consts:
    if isinstance(const, types.CodeType):
        print(f"\n=== Comprehension inner [{const.co_name}] ===")
        dis.dis(const)

The key difference:

Loop version per-iteration (inside the loop):

LOAD_FAST result       # Push result list
LOAD_ATTR append       # Look up .append method (dict lookup!)
LOAD_FAST i            # Push i
LOAD_FAST i            # Push i
BINARY_OP *            # i*i
CALL 1                 # Call append(i*i) — creates call frame
POP_TOP                # Discard None return value of append

Comprehension inner per-iteration:

LOAD_FAST i            # Push i (from .0 iterator arg)
LOAD_FAST i            # Push i
BINARY_OP *            # i*i
LIST_APPEND 1          # Direct C-level append, no method lookup, no CALL

Why LIST_APPEND is faster:

No attribute lookup — LOAD_ATTR is a dictionary lookup through the object's type
No function call overhead — CALL creates a new frame, pushes arguments, dispatches
No POP_TOP — LIST_APPEND does not produce a return value

LIST_APPEND is implemented in ceval.c as a direct call to PyList_Append() — it skips all the Python method dispatch machinery entirely.

Expected Output

See solution for LIST_APPEND vs CALL comparison

Hints

Hint 1: A list comprehension compiles with the specialized LIST_APPEND opcode inside its inner code object. A loop uses LOAD_ATTR to look up .append, then CALL to invoke it.

Hint 2: Use dis.dis() on both. For the comprehension, also disassemble the nested code object (found in the outer function's co_consts).

#7f-string vs format() vs % Bytecode ComparisonMedium

FORMAT_VALUEBUILD_STRINGf-stringstring-formatting

Compare the bytecode generated by three Python string formatting approaches: f-strings, .format(), and % formatting.

Python

import dis

name = "world"
count = 42

def use_fstring(name, count):
    return f"Hello, {name}! Count: {count}"

def use_format(name, count):
    return "Hello, {}! Count: {}".format(name, count)

def use_percent(name, count):
    return "Hello, %s! Count: %d" % (name, count)

for fn in [use_fstring, use_format, use_percent]:
    instructions = list(dis.get_instructions(fn))
    print(f"\n--- {fn.__name__} ({len(instructions)} instructions) ---")
    dis.dis(fn)

Solution

import dis

def use_fstring(name, count):
    return f"Hello, {name}! Count: {count}"

def use_format(name, count):
    return "Hello, {}! Count: {}".format(name, count)

def use_percent(name, count):
    return "Hello, %s! Count: %d" % (name, count)

for fn in [use_fstring, use_format, use_percent]:
    instructions = list(dis.get_instructions(fn))
    print(f"\n--- {fn.__name__} ({len(instructions)} instructions) ---")
    dis.dis(fn)

Bytecode patterns for each approach:

f-string (FORMAT_VALUE + BUILD_STRING):

LOAD_CONST "Hello, "
LOAD_FAST name
FORMAT_VALUE           # Convert name to str (fast, specialized opcode)
LOAD_CONST "! Count: "
LOAD_FAST count
FORMAT_VALUE           # Convert count to str
BUILD_STRING 4         # Concatenate 4 string pieces

No attribute lookup, no function call machinery.

.format() (LOAD_ATTR + CALL):

LOAD_CONST "Hello, {}! Count: {}"
LOAD_METHOD format     # Dict lookup for .format method
LOAD_FAST name
LOAD_FAST count
CALL_METHOD 2          # Invokes str.format() — full Python/C dispatch

Requires attribute lookup and a full method call.

% formatting (BINARY_OP %):

LOAD_CONST "Hello, %s! Count: %d"
LOAD_FAST name
LOAD_FAST count
BUILD_TUPLE 2          # Pack args into tuple
BINARY_OP %            # String formatting via __mod__ dispatch

One BINARY_OP but it invokes complex C-level parsing of the format string.

Performance ranking: f-strings are typically fastest for simple cases (no complex format specs), followed by %, then .format(). For format specs like :.2f, the ranking may change. In practice, the difference is tiny unless you are in a tight loop formatting millions of strings.

Expected Output

See solution for opcode comparison across three formatting styles

Hints

Hint 1: f-strings compile to FORMAT_VALUE + BUILD_STRING opcodes. str.format() compiles to LOAD_METHOD + CALL. % formatting compiles to BINARY_OP %.

Hint 2: Count the total instructions for each. f-strings are generally the most efficient because FORMAT_VALUE is a specialized opcode with no attribute lookup overhead.

#8Predict the Stack DepthMedium

co_stacksizestack-depthpredict-output

Predict the co_stacksize for each function before running the code. Then verify your predictions.

Python

def f1(x):
    return x + 1

def f2(x, y, z):
    return x + y + z

def f3(x):
    return x * x + x * 2 + 1

def f4(n):
    return sum(range(n))

for fn in [f1, f2, f3, f4]:
    print(f"{fn.__name__}: co_stacksize = {fn.__code__.co_stacksize}")

Predict before running:

f1: x + 1 — what is the max stack depth?
f2: x + y + z — does adding more terms deepen the stack?
f3: x*x + x*2 + 1 — more complex expression
f4: sum(range(n)) — a function call

Solution

def f1(x):
    return x + 1

def f2(x, y, z):
    return x + y + z

def f3(x):
    return x * x + x * 2 + 1

def f4(n):
    return sum(range(n))

for fn in [f1, f2, f3, f4]:
    print(f"{fn.__name__}: co_stacksize = {fn.__code__.co_stacksize}")

Expected outputs:

f1: co_stacksize = 2
f2: co_stacksize = 2
f3: co_stacksize = 2
f4: co_stacksize = 3

Stack trace for each:

f1 (x + 1): LOAD_FAST x → depth 1, LOAD_CONST 1 → depth 2, BINARY_OP → depth 1. Max = 2.

f2 (x + y + z): This evaluates left-to-right as (x + y) + z. LOAD x → 1, LOAD y → 2, BINARY_OP → 1, LOAD z → 2, BINARY_OP → 1. Max = 2. Notice that a chain of additions does NOT grow the stack — intermediate results are consumed immediately.

f3 (x*x + x*2 + 1): Evaluates as ((x*x) + (x*2)) + 1. The subexpressions xx and x2 each need depth 2. The result is accumulated. Max = 2. The compiler evaluates left to right, consuming results eagerly.

f4 (sum(range(n))): LOAD_GLOBAL sum → 1, LOAD_GLOBAL range → 2, LOAD_FAST n → 3, CALL range(n) → 2, CALL sum(...) → 1. Max = 3 because at the peak, both sum and range are on the stack simultaneously before the range(n) call.

Key insight: co_stacksize is computed by the compiler (not at runtime) as the worst-case stack depth across all code paths. It is used to pre-allocate the frame's value array — there is no dynamic resizing of the stack during execution.

Expected Output

See solution for stack depth predictions and explanations

Hints

Hint 1: co_stacksize is the MAXIMUM depth the stack reaches at any point during execution. Trace through the instructions and track the high-water mark.

Hint 2: Each LOAD operation +1, each BINARY_OP -1 (pops 2, pushes 1), CALL pops args+1 (function) and pushes 1 result, STORE_FAST -1.

Hard

#9Write a Bytecode-Based ProfilerHard

disprofilingopcode-frequencyLOAD_GLOBAL

Build a bytecode profiler that analyses a function's compiled code and flags potentially expensive instruction patterns, particularly LOAD_GLOBAL inside loops.

import dis
from collections import Counter

def bytecode_profile(func):
    instructions = list(dis.get_instructions(func))
    counts = Counter(i.opname for i in instructions)

    print(f"=== Bytecode Profile: {func.__name__} ===")
    print(f"Total instructions: {len(instructions)}")
    print()

    # Top opcodes
    print("Top 8 opcodes:")
    for opname, count in counts.most_common(8):
        bar = "=" * count
        print(f"  {opname:<25} {count:>3}  {bar}")
    print()

    # Flag expensive patterns
    warnings = []
    in_loop = False
    for i, instr in enumerate(instructions):
        if instr.opname == "GET_ITER":
            in_loop = True
        if instr.opname == "JUMP_BACKWARD":
            in_loop = False
        if in_loop and instr.opname == "LOAD_GLOBAL":
            warnings.append(f"  LOAD_GLOBAL '{instr.argval}' inside loop at offset {instr.offset}")

    if warnings:
        print("Performance warnings:")
        for w in warnings:
            print(w)
    else:
        print("No performance warnings.")

# Test: a function with LOAD_GLOBAL in a loop
import math

def compute_norms(vectors):
    results = []
    for v in vectors:
        norm = math.sqrt(v[0]**2 + v[1]**2)  # LOAD_GLOBAL math in loop
        results.append(norm)
    return results

bytecode_profile(compute_norms)

Solution

import dis
from collections import Counter

def bytecode_profile(func):
    instructions = list(dis.get_instructions(func))
    counts = Counter(i.opname for i in instructions)

    print(f"=== Bytecode Profile: {func.__name__} ===")
    print(f"Total instructions: {len(instructions)}")
    print()

    print("Top 8 opcodes:")
    for opname, count in counts.most_common(8):
        bar = "=" * count
        print(f"  {opname:<25} {count:>3}  {bar}")
    print()

    warnings = []
    in_loop = False
    for i, instr in enumerate(instructions):
        if instr.opname == "GET_ITER":
            in_loop = True
        if instr.opname == "JUMP_BACKWARD":
            in_loop = False
        if in_loop and instr.opname == "LOAD_GLOBAL":
            warnings.append(f"  LOAD_GLOBAL '{instr.argval}' inside loop at offset {instr.offset}")

    if warnings:
        print("Performance warnings:")
        for w in warnings:
            print(w)
    else:
        print("No performance warnings.")

import math

def compute_norms(vectors):
    results = []
    for v in vectors:
        norm = math.sqrt(v[0]**2 + v[1]**2)
        results.append(norm)
    return results

bytecode_profile(compute_norms)

The LOAD_GLOBAL-in-loop optimization:

When you write math.sqrt inside a loop, CPython executes:

LOAD_GLOBAL math — dictionary lookup in the module's globals (hash + probe)
LOAD_ATTR sqrt — dictionary lookup on the math module object (another hash + probe)

These two dictionary lookups happen on every iteration. For a loop with 1 million iterations, that is 2 million unnecessary dict lookups.

The fix:

sqrt = math.sqrt  # One LOAD_GLOBAL, one LOAD_ATTR, outside the loop
def compute_norms_fast(vectors):
    results = []
    for v in vectors:
        norm = sqrt(v[0]**2 + v[1]**2)  # LOAD_FAST — array index, no hashing
        results.append(norm)
    return results

This is exactly what CPython's bytecode shows: the sqrt local is loaded with LOAD_FAST instead of LOAD_GLOBAL + LOAD_ATTR. Real-world speedup: 15-30% for math-heavy loops.

Starter Code

import dis
from collections import Counter

def bytecode_profile(func):
    """Analyze a function's bytecode and produce a performance profile:
    - Total instruction count
    - Opcode frequency distribution
    - Estimated 'hot' vs 'cold' operations
    - Flag any potentially expensive patterns (LOAD_GLOBAL in loops, etc.)
    """
    pass

Expected Output

See solution for full profiler output

Hints

Hint 1: Use dis.get_instructions() to iterate all instructions. Count by opname. Flag LOAD_GLOBAL as "potentially slow" (dict lookup vs LOAD_FAST array access).

Hint 2: To detect LOAD_GLOBAL inside a loop, look for LOAD_GLOBAL instructions that appear between a GET_ITER and a JUMP_BACKWARD in the instruction sequence.

#10Decompile Bytecode Back to PseudocodeHard

disdecompilationpseudocodereverse-engineering

Build a partial decompiler that translates CPython bytecode back into pseudocode. This is how tools like uncompyle6 work.

import dis

def bytecode_to_pseudocode(func):
    instructions = list(dis.get_instructions(func))
    stack = []
    output = []

    binary_ops = {
        0: "+", 1: "&", 2: "//", 3: "@", 4: "<<", 5: "*",
        6: "%", 7: "|", 8: "**", 9: ">>", 10: "-",
        11: "/", 12: "^", 13: "+=", 14: "&=",
    }
    compare_ops = {0: "<", 1: "<=", 2: "==", 3: "!=", 4: ">", 5: ">="}

    for instr in instructions:
        op = instr.opname

        if op == "RESUME":
            continue
        elif op in ("LOAD_FAST", "LOAD_GLOBAL", "LOAD_DEREF"):
            stack.append(str(instr.argval))
        elif op == "LOAD_CONST":
            stack.append(repr(instr.argval))
        elif op == "BINARY_OP":
            rhs = stack.pop()
            lhs = stack.pop()
            sym = binary_ops.get(instr.arg, "?")
            stack.append(f"({lhs} {sym} {rhs})")
        elif op == "COMPARE_OP":
            rhs = stack.pop()
            lhs = stack.pop()
            sym = compare_ops.get(instr.arg, "?")
            stack.append(f"({lhs} {sym} {rhs})")
        elif op == "STORE_FAST":
            val = stack.pop() if stack else "?"
            output.append(f"  {instr.argval} = {val}")
        elif op in ("POP_JUMP_IF_FALSE", "POP_JUMP_FORWARD_IF_FALSE"):
            cond = stack.pop() if stack else "?"
            output.append(f"  if not {cond}: goto offset {instr.argval}")
        elif op == "RETURN_VALUE":
            val = stack.pop() if stack else "?"
            output.append(f"  return {val}")
        else:
            output.append(f"  # [{op} {instr.argval}]")

    return "\n".join(output)

# Test
def hypotenuse(a, b):
    c_sq = a * a + b * b
    return c_sq ** 0.5

print(f"def {hypotenuse.__name__}(a, b):")
print(bytecode_to_pseudocode(hypotenuse))

Solution

import dis

def bytecode_to_pseudocode(func):
    instructions = list(dis.get_instructions(func))
    stack = []
    output = []

    binary_ops = {
        0: "+", 1: "&", 2: "//", 3: "@", 4: "<<", 5: "*",
        6: "%", 7: "|", 8: "**", 9: ">>", 10: "-",
        11: "/", 12: "^", 13: "+=", 14: "&=",
    }
    compare_ops = {0: "<", 1: "<=", 2: "==", 3: "!=", 4: ">", 5: ">="}

    for instr in instructions:
        op = instr.opname

        if op == "RESUME":
            continue
        elif op in ("LOAD_FAST", "LOAD_GLOBAL", "LOAD_DEREF"):
            stack.append(str(instr.argval))
        elif op == "LOAD_CONST":
            stack.append(repr(instr.argval))
        elif op == "BINARY_OP":
            rhs = stack.pop()
            lhs = stack.pop()
            sym = binary_ops.get(instr.arg, "?")
            stack.append(f"({lhs} {sym} {rhs})")
        elif op == "COMPARE_OP":
            rhs = stack.pop()
            lhs = stack.pop()
            sym = compare_ops.get(instr.arg, "?")
            stack.append(f"({lhs} {sym} {rhs})")
        elif op == "STORE_FAST":
            val = stack.pop() if stack else "?"
            output.append(f"  {instr.argval} = {val}")
        elif op in ("POP_JUMP_IF_FALSE", "POP_JUMP_FORWARD_IF_FALSE"):
            cond = stack.pop() if stack else "?"
            output.append(f"  if not {cond}: goto offset {instr.argval}")
        elif op == "RETURN_VALUE":
            val = stack.pop() if stack else "?"
            output.append(f"  return {val}")
        else:
            output.append(f"  # [{op} {instr.argval}]")

    return "\n".join(output)

def hypotenuse(a, b):
    c_sq = a * a + b * b
    return c_sq ** 0.5

print(f"def {hypotenuse.__name__}(a, b):")
print(bytecode_to_pseudocode(hypotenuse))

Expected pseudocode output:

def hypotenuse(a, b):
  c_sq = ((a * a) + (b * b))
  return (c_sq ** 0.5)

Why decompilation is hard in general:

This simple version handles straight-line code well but struggles with:

Control flow: Jumps create branches that require reconstructing if/while/for structures from raw offsets (this is the core of decompiler research)
Function calls: CALL pops N arguments plus the function — needs special handling
Comprehensions: Nested code objects must be decompiled separately and inlined
Exception handling: SETUP_FINALLY, PUSH_EXC_INFO, etc. create complex control flow graphs

Real decompilers (uncompyle6, decompile3) use control flow graph analysis to reconstruct the original structure. They build a graph of basic blocks connected by jump edges, then match patterns against known compiler output templates to reconstruct if, while, for, try/except from the raw jump instructions.

Starter Code

import dis

def bytecode_to_pseudocode(func):
    """Translate a function's bytecode into human-readable pseudocode.
    Handle: LOAD_FAST, LOAD_CONST, LOAD_GLOBAL, BINARY_OP,
    STORE_FAST, COMPARE_OP, POP_JUMP_IF_FALSE, RETURN_VALUE.
    Use a stack to track operands and build expressions.
    """
    pass

Expected Output

See solution for pseudocode output

Hints

Hint 1: Model the value stack as a Python list of strings. LOAD_ instructions push name strings; BINARY_OP pops two strings and pushes a combined expression string like "(a + b)".

Hint 2: For STORE_FAST, emit an assignment statement. For RETURN_VALUE, emit "return <top>". Jumps are the hardest part — simplify by emitting "if <condition>: goto <offset>".

#11Build an Opcode-Based Complexity ScorerHard

discomplexityCALLLOAD_GLOBALloops

Build a complexity scorer that assigns a numeric score to a function based on its bytecode characteristics. Higher scores indicate code that is likely slower.

import dis
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComplexityReport:
    function_name: str
    total_instructions: int
    call_count: int
    load_global_count: int
    loop_count: int
    complexity_score: float
    warnings: List[str] = field(default_factory=list)

def score_complexity(func) -> ComplexityReport:
    instructions = list(dis.get_instructions(func))
    loop_depth = 0
    call_count = 0
    load_global_count = 0
    loop_count = 0
    score = 0
    warnings = []

    for instr in instructions:
        op = instr.opname
        score += 1  # baseline per instruction

        if op == "GET_ITER":
            loop_depth += 1
            loop_count += 1
            score += 10
        elif op == "JUMP_BACKWARD":
            loop_depth = max(0, loop_depth - 1)
        elif op in ("CALL", "CALL_FUNCTION", "CALL_METHOD"):
            call_count += 1
            score += 5
        elif op == "LOAD_GLOBAL":
            load_global_count += 1
            score += 3
            if loop_depth > 0:
                score += 20
                warnings.append(
                    f"LOAD_GLOBAL '{instr.argval}' inside loop at offset {instr.offset}"
                )

    return ComplexityReport(
        function_name=func.__name__,
        total_instructions=len(instructions),
        call_count=call_count,
        load_global_count=load_global_count,
        loop_count=loop_count,
        complexity_score=score,
        warnings=warnings,
    )

def print_report(report):
    print(f"\n=== {report.function_name} ===")
    print(f"  Total instructions:  {report.total_instructions}")
    print(f"  CALL count:          {report.call_count}")
    print(f"  LOAD_GLOBAL count:   {report.load_global_count}")
    print(f"  Loop count:          {report.loop_count}")
    print(f"  Complexity score:    {report.complexity_score}")
    if report.warnings:
        print(f"  Warnings:")
        for w in report.warnings:
            print(f"    {w}")

# Test functions
import math

def simple(x, y):
    return x + y

def loop_with_globals(data):
    result = []
    for item in data:
        result.append(math.sqrt(item))
    return result

def loop_with_local(data):
    sqrt = math.sqrt
    result = []
    for item in data:
        result.append(sqrt(item))
    return result

for fn in [simple, loop_with_globals, loop_with_local]:
    print_report(score_complexity(fn))

Solution

import dis
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComplexityReport:
    function_name: str
    total_instructions: int
    call_count: int
    load_global_count: int
    loop_count: int
    complexity_score: float
    warnings: List[str] = field(default_factory=list)

def score_complexity(func) -> ComplexityReport:
    instructions = list(dis.get_instructions(func))
    loop_depth = 0
    call_count = 0
    load_global_count = 0
    loop_count = 0
    score = 0
    warnings = []

    for instr in instructions:
        op = instr.opname
        score += 1

        if op == "GET_ITER":
            loop_depth += 1
            loop_count += 1
            score += 10
        elif op == "JUMP_BACKWARD":
            loop_depth = max(0, loop_depth - 1)
        elif op in ("CALL", "CALL_FUNCTION", "CALL_METHOD"):
            call_count += 1
            score += 5
        elif op == "LOAD_GLOBAL":
            load_global_count += 1
            score += 3
            if loop_depth > 0:
                score += 20
                warnings.append(
                    f"LOAD_GLOBAL '{instr.argval}' inside loop at offset {instr.offset}"
                )

    return ComplexityReport(
        function_name=func.__name__,
        total_instructions=len(instructions),
        call_count=call_count,
        load_global_count=load_global_count,
        loop_count=loop_count,
        complexity_score=score,
        warnings=warnings,
    )

def print_report(report):
    print(f"\n=== {report.function_name} ===")
    print(f"  Total instructions:  {report.total_instructions}")
    print(f"  CALL count:          {report.call_count}")
    print(f"  LOAD_GLOBAL count:   {report.load_global_count}")
    print(f"  Loop count:          {report.loop_count}")
    print(f"  Complexity score:    {report.complexity_score}")
    if report.warnings:
        print(f"  Warnings:")
        for w in report.warnings:
            print(f"    {w}")

import math

def simple(x, y):
    return x + y

def loop_with_globals(data):
    result = []
    for item in data:
        result.append(math.sqrt(item))
    return result

def loop_with_local(data):
    sqrt = math.sqrt
    result = []
    for item in data:
        result.append(sqrt(item))
    return result

for fn in [simple, loop_with_globals, loop_with_local]:
    print_report(score_complexity(fn))

Expected complexity scores:

simple: low (5-10) — no loops, no globals, two LOAD_FAST
loop_with_globals: high (50+) — loop with LOAD_GLOBAL math and append on every iteration
loop_with_local: medium (25-35) — loop without LOAD_GLOBAL inside the body (sqrt is LOAD_FAST)

Extensions:

Add scoring for nested loops (multiply by nesting depth)
Flag LOAD_ATTR inside loops (attribute lookups are also dict-based)
Add a suggest_fix() method that generates specific optimization advice based on warnings
Integrate with ast.parse() to also report cyclomatic complexity alongside bytecode complexity

Starter Code

import dis
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComplexityReport:
    function_name: str
    total_instructions: int
    call_count: int
    load_global_count: int
    loop_count: int
    complexity_score: float
    warnings: List[str] = field(default_factory=list)

def score_complexity(func) -> ComplexityReport:
    """Score a function's bytecode complexity.
    
    Scoring rules:
      +1 per instruction (baseline)
      +5 per CALL instruction (expensive dispatch)
      +3 per LOAD_GLOBAL (dict lookup vs array access)
      +10 per loop (GET_ITER = one loop)
      +20 if LOAD_GLOBAL appears inside a loop
    """
    pass

Expected Output

See solution for complexity scoring report

Hints

Hint 1: Track loop state with a counter that increments on GET_ITER and decrements on JUMP_BACKWARD. A LOAD_GLOBAL when the counter > 0 is inside a loop.

Hint 2: Accumulate the score using the rules given. Package everything into the ComplexityReport dataclass.

Practice: Disassembly with dis

Easy​

Medium​

Hard​

Easy

Medium

Hard