Disassembly with `dis` - Reading CPython Bytecode

Reading time: ~40 minutes | Level: Intermediate → Engineering

Before reading further, predict what dis.dis(mystery) will print, and what mystery(3) and mystery(-2) will return:

import dis

def mystery(x):
    return x * 2 if x > 0 else -x

dis.dis(mystery)

Write out the opcode sequence you expect to see, in order.

Almost no one gets this right without having read bytecode before. The actual dis.dis output (Python 3.12; offsets and raw argument numbers differ between versions):

  3           0 RESUME                   0

  4           2 LOAD_FAST                0 (x)
              4 LOAD_CONST               1 (0)
              6 COMPARE_OP              68 (>)
             10 POP_JUMP_IF_FALSE        5 (to 22)
             12 LOAD_FAST                0 (x)
             14 LOAD_CONST               2 (2)
             16 BINARY_OP                5 (*)
             20 RETURN_VALUE
        >>   22 LOAD_FAST                0 (x)
             24 UNARY_NEGATIVE
             26 RETURN_VALUE

mystery(3) returns 6. mystery(-2) returns 2.

Three things should surprise you here. First: the source reads value-first (x * 2 if ...), but the bytecode is condition-first - x > 0 is evaluated before either branch, so the instruction order is backward from how you read the expression. Second: there is no "if" instruction. The conditional is a comparison followed by POP_JUMP_IF_FALSE, which skips over the "true" branch when the condition fails; only one branch ever executes, so x * 2 is never computed for a negative x. Third: there are two RETURN_VALUE instructions - CPython generates one per branch rather than one shared exit point.

Once you understand why, you understand something about how CPython's compiler works. That is what this lesson builds.

What You Will Learn

The dis.dis(), dis.disassemble(), and dis.get_instructions() API
How to read disassembly output: offset, line number, opcode name, argument, comment
The key opcodes and what they do on the value stack
Value stack evolution step by step through a real example
How equivalent Python patterns compile differently (or identically)
Practical performance insights from bytecode comparison
The dis.Bytecode object for structured programmatic access

Prerequisites

Lesson 01: CPython Architecture (the eval loop, the value stack)
Lesson 02: Bytecode Inspection (the code object and its attributes)

Part 1 - The `dis` Module API

`dis.dis()` - Human-Readable Disassembly

dis.dis() disassembles a function, method, class, module, string of source, or bytes-like object and prints the result:

import dis

def add(a, b):
    return a + b

dis.dis(add)

Output (Python 3.12):

  3           0 RESUME                   0

  4           2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (b)
              6 BINARY_OP                0 (+)
             10 RETURN_VALUE

Reading the Output Format

Each line has up to five fields, plus an optional jump-target marker:

  4           2 LOAD_FAST                0 (a)
  ^           ^ ^                        ^ ^
  |           | |                        | +-- human-readable comment (variable name, const value)
  |           | |                        +---- opcode argument (index or value)
  |           | +----------------------------- opcode name
  |           +------------------------------- bytecode offset (in bytes from start of co_code)
  +------------------------------------------- source line number (only shown at first instruction per line)

The optional >> marker appears just before the offset when the instruction is a jump target:

        >>   22 LOAD_FAST                0 (x)
        ^^   ^^
        ||   ++--- bytecode offset of this instruction
        ++-------- ">>" marks this instruction as a jump target

The offset is the byte position of this instruction within co_code. Since Python 3.6, every instruction is a 2-byte code unit (opcode byte + argument byte), so offsets are always even. Since Python 3.11, some instructions are followed by inline cache entries (extra 2-byte units that dis hides by default), so consecutive offsets can differ by more than 2. For example, BINARY_OP occupies 4 bytes in Python 3.12 because one cache entry follows it.
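One way to check the offset rule on your own interpreter is to print the gap between consecutive offsets. This is a minimal sketch; offset_gaps is a helper name made up for illustration, not part of dis. Gaps larger than 2 bytes point to inline cache entries on Python 3.11+.

```python
import dis

def add(a, b):
    return a + b

def offset_gaps(func):
    """Return (opname, offset, size_in_bytes) for every instruction except the last."""
    instrs = list(dis.get_instructions(func))
    return [
        (cur.opname, cur.offset, nxt.offset - cur.offset)
        for cur, nxt in zip(instrs, instrs[1:])
    ]

for opname, offset, size in offset_gaps(add):
    print(f"{offset:3d}  {opname:<15s} {size} bytes")
```

The exact sizes depend on the interpreter version, so treat the printed numbers as observations about your build rather than fixed facts.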

`dis.get_instructions()` - Structured Access

dis.get_instructions() returns an iterator of dis.Instruction named tuples - useful when writing tools that process bytecode programmatically:

import dis

def greet(name):
    return f"Hello, {name}"

for instr in dis.get_instructions(greet):
    print(
        f"{instr.offset:4d}  {instr.opname:<20s}  "
        f"arg={instr.argval!r:20}  "
        f"line={instr.starts_line}"
    )

dis.Instruction fields:

opname: the instruction name as a string
opcode: the numeric opcode
arg: the raw argument integer
argval: the resolved argument (e.g., the actual variable name instead of the index)
argrepr: human-readable representation
offset: byte offset in co_code
starts_line: source line number if this instruction starts a new line, else None
is_jump_target: True if this instruction is a jump target

`dis.Bytecode` - Object-Oriented Interface

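dis.Bytecode wraps a function, method, code object, or string of source code and provides iterable and printable access to its bytecode (it is not indexable; convert it with list() if you need random access):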

import dis

def compute(x, y):
    return x ** 2 + y ** 2

bc = dis.Bytecode(compute)
print(bc.info())         # summary of the code object
print(bc.dis())          # same as dis.dis() but returned as a string

for instr in bc:
    if instr.opname == "BINARY_OP":
        print(f"Binary operation: {instr.argrepr} at offset {instr.offset}")

Part 2 - Key Opcodes Explained

Load and Store Opcodes

These opcodes move values between the frame's storage and the value stack:

Opcode	What it does	Speed
`LOAD_FAST`	Push a local variable (from `co_varnames`) onto the stack	Fast - direct array index
`STORE_FAST`	Pop from stack; store in local variable array	Fast - direct array index
`LOAD_GLOBAL`	Push a global name (look up in `f_globals` then `f_builtins`)	Slower - dict lookup
`STORE_GLOBAL`	Pop from stack; store in `f_globals`	Slower - dict store
`LOAD_CONST`	Push a constant (from `co_consts`)	Fast - direct array index
`LOAD_DEREF`	Push a value from a cell (closure variable)	Medium - cell dereference
`STORE_DEREF`	Pop from stack; store in a cell	Medium - cell dereference
`LOAD_ATTR`	Pop object; push `getattr(object, name)`	Slowest - attribute lookup

:::tip LOAD_FAST Is Faster Than LOAD_GLOBAL LOAD_FAST is an indexed access into the frame's local variable array - essentially fastlocals[i]. LOAD_GLOBAL involves a dictionary hash lookup in f_globals, then potentially another in f_builtins.

For extremely hot loops that access a global function many thousands of times, pulling it into a local variable makes a measurable difference:

import math

def hot_loop_global(data):
    return [math.sqrt(x) for x in data]   # LOAD_GLOBAL math, LOAD_ATTR sqrt each iteration

def hot_loop_local(data):
    _sqrt = math.sqrt   # one LOAD_GLOBAL + LOAD_ATTR, then STORE_FAST
    return [_sqrt(x) for x in data]       # LOAD_FAST _sqrt each iteration

# Profile before optimising - only do this in measured hot paths

This is a micro-optimisation. Apply it only after profiling confirms it is the bottleneck. :::

Function Call Opcodes

In Python 3.10 and earlier:

CALL_FUNCTION n - call a function with n positional arguments
CALL_FUNCTION_KW n - call with keyword arguments
CALL_FUNCTION_EX - call with *args and **kwargs unpacking

In Python 3.11+:

PUSH_NULL - push a NULL marker for the call protocol
CALL n - unified call instruction replacing CALL_FUNCTION, CALL_FUNCTION_KW, and CALL_METHOD (CALL_FUNCTION_EX is still used for *args / **kwargs unpacking)
PRECALL n - setup before CALL (3.11 only, removed in 3.12)

import dis

def caller():
    return len([1, 2, 3])

dis.dis(caller)
# Python 3.12 output:
#   LOAD_GLOBAL          1 (NULL + len)  -- pushes NULL marker and len
#   BUILD_LIST           0               -- builds []
#   LOAD_CONST           1 ((1, 2, 3))
#   LIST_EXTEND          1
#   CALL                 1               -- calls len([1, 2, 3])
#   RETURN_VALUE
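Because the call opcode names differ by version, a tool can derive them from dis.opmap instead of hardcoding them. The sketch below is an illustration only; CALL_OPNAMES and call_instructions are hypothetical names, not part of dis.

```python
import dis

# Every opcode name starting with "CALL" that this interpreter defines.
# On 3.12+ this also includes CALL_INTRINSIC_*, which are internal helpers
# rather than Python-level function calls.
CALL_OPNAMES = {name for name in dis.opmap if name.startswith("CALL")}

def caller():
    return len([1, 2, 3])

def call_instructions(func):
    """Return the call-like instructions found in func."""
    return [i for i in dis.get_instructions(func) if i.opname in CALL_OPNAMES]

for instr in call_instructions(caller):
    print(instr.opname, instr.arg)
```

The same helper finds CALL_FUNCTION on Python 3.10 and CALL on 3.11+ without any version check.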

Build Opcodes

These create new collection objects:

Opcode	What it builds
`BUILD_LIST n`	Pop `n` items from stack; push a list
`BUILD_TUPLE n`	Pop `n` items from stack; push a tuple
`BUILD_SET n`	Pop `n` items from stack; push a set
`BUILD_MAP n`	Pop `2n` items (alternating key, value); push a dict
`BUILD_STRING n`	Pop `n` strings; concatenate; push result
`FORMAT_VALUE`	Format a value for an f-string (with optional format spec)

import dis

def build_examples():
    a = [1, 2, 3]
    b = (4, 5)
    c = {"x": 1}

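dis.dis(build_examples)
# Python 3.12 (abbreviated):
# BUILD_LIST   0                -- pushes an empty list
# LOAD_CONST   1 ((1, 2, 3))    -- three or more constant items are stored as one tuple
# LIST_EXTEND  1                -- extends the list with the tuple's items
# STORE_FAST   0 (a)
# LOAD_CONST   2 ((4, 5))       -- a constant tuple needs no BUILD_TUPLE
# STORE_FAST   1 (b)
# LOAD_CONST   3 ('x')
# LOAD_CONST   4 (1)
# BUILD_MAP    1                -- pops 1 key/value pair, pushes {'x': 1}
# STORE_FAST   2 (c)
# ...
# BUILD_LIST n / BUILD_TUPLE n with n > 0 appear when the items are not all constants, e.g. [x, y]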
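To see the BUILD opcodes with non-zero arguments, build the collections from variables rather than constants. This is a small sketch; build_from_locals is an illustrative function, not taken from the lesson.

```python
import dis

def build_from_locals(x, y):
    # Items come from locals, so the compiler cannot fold them into constants
    return [x, y], (x, y), {"k": x}

build_ops = [
    (i.opname, i.arg)
    for i in dis.get_instructions(build_from_locals)
    if i.opname.startswith("BUILD_")
]
print(build_ops)
```

The final BUILD_TUPLE with argument 3 packs the three results into the tuple that the function returns.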

Iteration Opcodes

import dis

def sum_squares(items):
    total = 0
    for x in items:
        total += x * x
    return total

dis.dis(sum_squares)

Key iteration opcodes:

GET_ITER - calls iter() on the top-of-stack object; pushes the iterator
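FOR_ITER n - calls next() on the iterator; if the iterator is exhausted, jump forward past the loop body (n is a relative distance in 2-byte code units; dis prints the resolved target offset); else push the value and continue
The loop body executes; at the end, an unconditional JUMP_BACKWARD returns to FOR_ITER (JUMP_ABSOLUTE in Python 3.10 and earlier)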
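The loop skeleton described above can be pulled out of a disassembly by filtering on opcode names. This is a sketch; LOOP_OPNAMES is an illustrative set that covers the names used by several CPython versions, and not every name appears on every version.

```python
import dis

def sum_squares(items):
    total = 0
    for x in items:
        total += x * x
    return total

# Names vary by version: JUMP_ABSOLUTE (<= 3.10), JUMP_BACKWARD (3.11+), END_FOR (3.12+)
LOOP_OPNAMES = {"GET_ITER", "FOR_ITER", "JUMP_BACKWARD", "JUMP_ABSOLUTE", "END_FOR"}

loop_skeleton = [
    (i.offset, i.opname, i.argval)
    for i in dis.get_instructions(sum_squares)
    if i.opname in LOOP_OPNAMES
]
for offset, opname, target in loop_skeleton:
    print(f"{offset:3d}  {opname:<14s} {target}")
```

For jump instructions, argval is the resolved target offset, so you can read off where the loop exits and where the back edge lands.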

Arithmetic and Comparison Opcodes

Since Python 3.11, arithmetic uses a single BINARY_OP instruction with an argument encoding the operation (earlier versions had separate opcodes such as BINARY_ADD and INPLACE_ADD):

`BINARY_OP` argument	Operation
0	`+`
1	`&`
2	`//`
3	`<<`
4	`@` (matmul)
5	`*`
6	`%`
7	`|`
8	`**`
9	`>>`
10	`-`
11	`/`
12	`^`
13	`+=` (in-place)
...	(in-place variants)

COMPARE_OP handles the rich comparison operators (<, <=, ==, !=, >, >=). Since Python 3.9, in / not in compile to CONTAINS_OP and is / is not compile to IS_OP.
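A quick way to find out which opcode an operator uses on your interpreter is to compile a one-line expression and look for the operator instruction. This sketch assumes Python 3.9 or later; operator_opname is a hypothetical helper.

```python
import dis

def operator_opname(expression):
    """Return (opname, argrepr) of the first operator instruction in an expression."""
    code = compile(expression, "<demo>", "eval")
    for instr in dis.get_instructions(code):
        # BINARY_OP / COMPARE_OP / CONTAINS_OP / IS_OP, or BINARY_ADD etc. on 3.10
        if instr.opname.endswith("_OP") or instr.opname.startswith("BINARY_"):
            return instr.opname, instr.argrepr
    return None

for expr in ("a + b", "a < b", "a in b", "a is not b"):
    print(f"{expr:<12s} -> {operator_opname(expr)}")
```

Compiling in "eval" mode keeps the listing short: the names a and b are never looked up because the code object is only inspected, not executed.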

Jump Opcodes

Opcode	Behaviour
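`POP_JUMP_IF_FALSE n`	Pop; if falsy, jump to the target (`dis` prints the resolved offset as `to N`)
`POP_JUMP_IF_TRUE n`	Pop; if truthy, jump to the target (`dis` prints the resolved offset as `to N`)
`JUMP_FORWARD n`	Unconditional jump forward by `n` 2-byte code units
`JUMP_BACKWARD n`	Unconditional jump backward (for loops; Python 3.11+)
`JUMP_IF_FALSE_OR_POP`	Short-circuit `and` (Python 3.11 and earlier; removed in 3.12)
`JUMP_IF_TRUE_OR_POP`	Short-circuit `or` (Python 3.11 and earlier; removed in 3.12)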
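Since jump opcode names shift between versions, it is often safer to identify jumps through the dis.hasjrel and dis.hasjabs lists than by name. The sketch below lists every jump in a small function and cross-checks the targets against the instructions that dis marks as jump targets; pick is an illustrative function.

```python
import dis

def pick(a, b, c):
    if a and b:
        return c
    return None

instrs = list(dis.get_instructions(pick))

# Opcodes that take a relative or absolute jump argument on this interpreter
jump_opcodes = set(dis.hasjrel) | set(dis.hasjabs)

jumps = [(i.offset, i.opname, i.argval) for i in instrs if i.opcode in jump_opcodes]
targets = {i.offset for i in instrs if i.is_jump_target}

for offset, opname, target in jumps:
    print(f"{offset:3d}  {opname:<20s} -> {target}")
```

Every target printed here corresponds to a line that dis.dis() would mark with >>.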

Part 3 - Value Stack Evolution

Step-by-Step Through `a + b`

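The value stack is a LIFO stack of PyObject * pointers within the frame. Here is how LOAD_FAST a → LOAD_FAST b → BINARY_OP + evolves the stack: it starts empty ([]); LOAD_FAST a pushes a ([a]); LOAD_FAST b pushes b ([a, b]); BINARY_OP + pops both operands and pushes their sum ([a + b]), which RETURN_VALUE then pops and returns.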

Every opcode either pushes, pops, or both. The compiler statically computes the maximum stack depth (co_stacksize) so CPython can allocate the right amount of space in the frame.
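The push/pop bookkeeping can be reproduced with dis.stack_effect(), which reports the net stack change of a single instruction. This is a sketch for straight-line code only (it ignores jumps); trace_stack_depth is a hypothetical helper.

```python
import dis

def add(a, b):
    return a + b

def trace_stack_depth(func):
    """Yield (opname, depth_after) for straight-line code; jumps are not handled."""
    depth = 0
    for instr in dis.get_instructions(func):
        # instr.arg is None for opcodes without an argument, which stack_effect expects
        depth += dis.stack_effect(instr.opcode, instr.arg)
        yield instr.opname, depth

depths = list(trace_stack_depth(add))
for opname, depth in depths:
    print(f"{opname:<24s} depth after: {depth}")
print("co_stacksize:", add.__code__.co_stacksize)
```

The running depth peaks once both operands are on the stack, and the compiler's co_stacksize is at least that peak.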

Following the Mystery Function

Let's trace mystery(3) step by step:

def mystery(x):
    return x * 2 if x > 0 else -x

dis output (Python 3.12):
  RESUME               0
  LOAD_FAST            0 (x)      # push x=3                          stack: [3]
  LOAD_CONST           1 (0)      # push 0                            stack: [3, 0]
  COMPARE_OP           68 (>)     # pop 0,3; 3>0=True; push True      stack: [True]
  POP_JUMP_IF_FALSE    to 22      # pop True; truthy so do NOT jump   stack: []
  LOAD_FAST            0 (x)      # push x=3                          stack: [3]
  LOAD_CONST           2 (2)      # push 2                            stack: [3, 2]
  BINARY_OP            5 (*)      # pop 2,3; push 6                   stack: [6]
  RETURN_VALUE                    # pop 6; return 6

Now trace mystery(-2):

  RESUME               0
  LOAD_FAST            0 (x)      # push x=-2                         stack: [-2]
  LOAD_CONST           1 (0)      # push 0                            stack: [-2, 0]
  COMPARE_OP           68 (>)     # pop 0,-2; -2>0=False; push False  stack: [False]
  POP_JUMP_IF_FALSE    to 22      # pop False; falsy so JUMP to 22    stack: []
  # the "true" branch (offsets 12-20) is skipped entirely
  >>  22:
  LOAD_FAST            0 (x)      # push x=-2                         stack: [-2]
  UNARY_NEGATIVE                  # pop -2; push 2                    stack: [2]
  RETURN_VALUE                    # pop 2; return 2

Notice: the stack is empty at the moment of the jump. POP_JUMP_IF_FALSE consumes the comparison result, so both branches start from the same stack depth and nothing is left behind. Only one branch runs: x * 2 is never computed for mystery(-2), and -x is never computed for mystery(3). This is how CPython's compiler handles conditional expressions - it evaluates the condition first, then either falls through into the "true" branch or jumps to the "false" branch, and each branch ends with its own RETURN_VALUE.
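Evaluation order can also be checked at runtime, independent of any disassembly listing. The sketch below passes a small recording object through mystery and logs which operators are actually invoked; Probe and operations_for are illustrative names, not part of the lesson's code.

```python
class Probe:
    """Number-like wrapper that records which operations are applied to it."""

    def __init__(self, value, log):
        self.value = value
        self.log = log

    def __gt__(self, other):
        self.log.append(">")
        return self.value > other

    def __mul__(self, other):
        self.log.append("*")
        return self.value * other

    def __neg__(self):
        self.log.append("neg")
        return -self.value


def mystery(x):
    return x * 2 if x > 0 else -x


def operations_for(value):
    """Run mystery on a Probe and return (result, operations in execution order)."""
    log = []
    result = mystery(Probe(value, log))
    return result, log


print(operations_for(3))    # (6, ['>', '*'])
print(operations_for(-2))   # (2, ['>', 'neg'])
```

The comparison always runs first, and exactly one of the two branch operations follows it. This is what the language defines for conditional expressions.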

:::note Opcodes Changed Significantly in Python 3.11+ Python 3.11 introduced the "specialising adaptive interpreter": after a piece of code has run enough times, CPython replaces generic opcodes with specialised ones. LOAD_GLOBAL might become LOAD_GLOBAL_MODULE (a cached lookup in the module's globals that skips the builtins fallback). BINARY_OP for two integers might become BINARY_OP_ADD_INT. These specialisations are invisible to your Python code but make it faster. By default dis.dis() shows the original (non-specialised) opcodes; pass adaptive=True (Python 3.11+) to see the specialised forms. Python 3.12 extended this further with more specialised opcodes. :::
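To observe specialisation yourself, compare the opcode names before and after warming a function up. This is a sketch under two assumptions: the adaptive keyword exists only on Python 3.11+, and the number of calls needed before CPython specialises is an implementation detail, so the "after" list may or may not differ on your build. opnames is a hypothetical helper.

```python
import dis
import sys

def add_ints(a, b):
    return a + b

def opnames(func, adaptive=False):
    """Opcode names for func; the adaptive flag is only honoured on Python 3.11+."""
    if sys.version_info >= (3, 11):
        return [i.opname for i in dis.get_instructions(func, adaptive=adaptive)]
    return [i.opname for i in dis.get_instructions(func)]

before = opnames(add_ints, adaptive=True)
for _ in range(1000):   # warm-up count is a guess; the real threshold is internal
    add_ints(1, 2)
after = opnames(add_ints, adaptive=True)

print("generic:       ", opnames(add_ints))
print("before warm-up:", before)
print("after warm-up: ", after)
```

If specialisation happened, the "after" list typically shows names such as BINARY_OP_ADD_INT in place of BINARY_OP, while the generic listing stays unchanged.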

Part 4 - Comparing Equivalent Python Patterns

`x = x + 1` vs `x += 1`

import dis

def plus_assign(x):
    x = x + 1
    return x

def inplace(x):
    x += 1
    return x

print("=== x = x + 1 ===")
dis.dis(plus_assign)
print("=== x += 1 ===")
dis.dis(inplace)

Expected output pattern:

=== x = x + 1 ===
  LOAD_FAST    0 (x)
  LOAD_CONST   1 (1)
  BINARY_OP    0 (+)     # creates a new object
  STORE_FAST   0 (x)
  LOAD_FAST    0 (x)
  RETURN_VALUE

=== x += 1 ===
  LOAD_FAST    0 (x)
  LOAD_CONST   1 (1)
  BINARY_OP   13 (+=)    # in-place if supported by the type
  STORE_FAST   0 (x)
  LOAD_FAST    0 (x)
  RETURN_VALUE

For integers, += and + produce the same result because integers are immutable - there is no in-place operation. For mutable types like lists, += calls __iadd__ which modifies in place and is faster:

a = [1, 2, 3]
b = a
a += [4]      # calls a.__iadd__([4]) - modifies a in place; b also sees the change
print(b)      # [1, 2, 3, 4] - same object

a = [1, 2, 3]
b = a
a = a + [4]   # creates a new list; a now points to new object; b unchanged
print(b)      # [1, 2, 3]
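When two functions differ by only an instruction or two, a textual diff of their disassemblies makes the difference easy to spot. This is a minimal sketch using difflib from the standard library; listing and bytecode_diff are hypothetical helpers.

```python
import difflib
import dis

def plus_assign(x):
    x = x + 1
    return x

def inplace(x):
    x += 1
    return x

def listing(func):
    """One 'OPNAME argrepr' string per instruction, without offsets or line numbers."""
    return [f"{i.opname} {i.argrepr}".rstrip() for i in dis.get_instructions(func)]

def bytecode_diff(f, g):
    """Unified diff of two functions' instruction listings (changed lines only)."""
    return list(difflib.unified_diff(
        listing(f), listing(g),
        fromfile=f.__name__, tofile=g.__name__, lineterm="", n=0,
    ))

print("\n".join(bytecode_diff(plus_assign, inplace)))
```

Offsets and line numbers are left out of the listing on purpose, so that only real instruction differences show up in the diff.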

List Comprehension vs `for` Loop

import dis

def list_comp(items):
    return [x * 2 for x in items]

def for_loop(items):
    result = []
    for x in items:
        result.append(x * 2)
    return result

dis.dis(list_comp)

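The key difference depends on the Python version. In Python 3.11 and earlier, list_comp shows a MAKE_FUNCTION instruction - the comprehension compiles to a hidden <listcomp> code object that runs in its own scope; the outer function creates this inner function and calls it with the result of GET_ITER. Since Python 3.12 (PEP 709), comprehensions are inlined into the enclosing function, so there is no MAKE_FUNCTION or CALL and the loop appears directly in list_comp's own bytecode. In both cases the list is built with LIST_APPEND.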

The for_loop version uses explicit LOAD_ATTR (to get result.append) and CALL on every iteration.

In practice, list comprehensions are faster than equivalent for loops for two reasons:

The LIST_APPEND opcode is a direct C call, bypassing Python attribute lookup
The comprehension body runs in a tight inner loop with no LOAD_ATTR overhead for append
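Whether a comprehension lives in a hidden code object or is inlined can be checked by looking for code objects among the function's constants. This is a sketch; nested_code_objects and uses_opname are hypothetical helpers.

```python
import dis
import types

def list_comp(items):
    return [x * 2 for x in items]

def nested_code_objects(func):
    """Code objects stored in func's constants (for example a hidden <listcomp>)."""
    return [c for c in func.__code__.co_consts if isinstance(c, types.CodeType)]

def uses_opname(code, name):
    """True if the given code object contains an instruction with this opname."""
    return any(i.opname == name for i in dis.get_instructions(code))

nested = nested_code_objects(list_comp)
print("nested code objects:", [c.co_name for c in nested])
print("LIST_APPEND in outer function:", uses_opname(list_comp.__code__, "LIST_APPEND"))
for code in nested:
    print(f"LIST_APPEND in {code.co_name}:", uses_opname(code, "LIST_APPEND"))
```

On interpreters that inline comprehensions, the nested list is empty and LIST_APPEND shows up in the outer function. Otherwise it shows up inside the nested code object.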

f-string vs `str()` vs `.format()`

import dis

name = "world"

def use_fstring():
    return f"Hello, {name}"

def use_str():
    return "Hello, " + str(name)

def use_format():
    return "Hello, {}".format(name)

dis.dis(use_fstring)
dis.dis(use_str)
dis.dis(use_format)

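F-strings compile to (Python 3.12 and earlier; 3.13 replaces FORMAT_VALUE with FORMAT_SIMPLE / FORMAT_WITH_SPEC):

LOAD_CONST "Hello, " - the literal part of the string
LOAD_GLOBAL name (or LOAD_FAST if local)
FORMAT_VALUE - calls __format__ on the value
BUILD_STRING n - concatenates n string pieces

"Hello, " + str(name) compiles to:

LOAD_CONST "Hello, "
LOAD_GLOBAL str - load the str type
LOAD_GLOBAL name
CALL 1 - call str(name)
BINARY_OP + - concatenate the two strings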

"...".format(name) compiles to:

LOAD_CONST "Hello, {}" - load the format string
LOAD_ATTR format - attribute lookup on the string
LOAD_GLOBAL name
CALL 1

F-strings are fastest for simple substitutions because FORMAT_VALUE + BUILD_STRING avoids the overhead of a full CALL instruction. For complex cases (multiple conversions, nested expressions), the difference is negligible.

:::warning dis Output Varies Across Python Versions - Do Not Hardcode Opcode Numbers Opcode names and numbers change between Python minor versions. Python 3.11 renamed and reorganised many opcodes. Python 3.12 added new specialised opcodes.

# DO NOT do this - will break on different Python versions:
assert instr.opcode == 90   # hardcoded opcode number

# DO this - use the name:
assert instr.opname == "LOAD_FAST"

# Or use the dis module's own mapping:
import dis
opcode_number = dis.opmap["LOAD_FAST"]   # safe - looks up current version's mapping

:::

`for` Loop vs Generator Expression in `sum()`

import dis

data = range(10)

def sum_loop():
    total = 0
    for x in data:
        total += x
    return total

def sum_genexpr():
    return sum(x for x in data)

def sum_direct():
    return sum(data)

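sum_direct is the fastest - it passes the iterable directly to the C implementation of sum, so no Python bytecode runs per item. sum_genexpr does not get the same shortcut: the generator expression still compiles to a separate code object, CPython creates a generator object from it, and sum resumes that generator once per item, so bytecode executes for every element. sum_loop also executes Python bytecode for every iteration. Which of those two is slower depends on the Python version and the data, so measure with timeit rather than assuming.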
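A rough way to compare the three variants is timeit. The sketch below uses a larger range than the snippet above so that the loop work dominates; the absolute numbers depend on your machine and Python version, so no expected timings are shown.

```python
import timeit

data = range(10_000)

def sum_loop():
    total = 0
    for x in data:
        total += x
    return total

def sum_genexpr():
    return sum(x for x in data)

def sum_direct():
    return sum(data)

# All three must agree before their timings are worth comparing
assert sum_loop() == sum_genexpr() == sum_direct()

for func in (sum_loop, sum_genexpr, sum_direct):
    seconds = timeit.timeit(func, number=200)
    print(f"{func.__name__:<12s} {seconds:.4f} s for 200 calls")
```

For serious measurements, repeat the timing several times (timeit.repeat) and compare the minimum rather than a single run.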

Part 5 - Practical Uses of `dis`

Understanding Why Locals Beat Globals in Hot Loops

import dis
import math

def global_access():
    result = 0
    for i in range(1000):
        result += math.sqrt(i)   # LOAD_GLOBAL + LOAD_ATTR on every iteration
    return result

def local_access():
    sqrt = math.sqrt             # one LOAD_GLOBAL + LOAD_ATTR, then STORE_FAST
    result = 0
    for i in range(1000):
        result += sqrt(i)        # LOAD_FAST on every iteration
    return result

# Verify with dis:
dis.dis(global_access)
# Inner loop body includes: LOAD_GLOBAL math, LOAD_ATTR sqrt

dis.dis(local_access)
# Inner loop body includes: LOAD_FAST sqrt  (much cheaper)

Spotting Unnecessary Attribute Lookups

import dis

class Processor:
    def __init__(self):
        self.count = 0

    def process_slow(self, items):
        for item in items:
            self.count += 1         # LOAD_FAST self, LOAD_ATTR count - every iteration
            self.count += item

    def process_fast(self, items):
        count = self.count          # one attribute load
        for item in items:
            count += 1              # LOAD_FAST count - no attribute lookup
            count += item
        self.count = count          # one attribute store

# dis shows the difference clearly:
dis.dis(Processor.process_slow)    # LOAD_ATTR count appears in the loop
dis.dis(Processor.process_fast)    # LOAD_ATTR only appears outside the loop

Using `dis` to Verify a Refactoring

import dis

# Before: multiple separate attribute lookups
def before(obj):
    obj.x = obj.x + 1
    obj.y = obj.y + 1
    obj.z = obj.z + 1

# After: reading attributes once
def after(obj):
    x, y, z = obj.x, obj.y, obj.z
    obj.x = x + 1
    obj.y = y + 1
    obj.z = z + 1

# Use dis to count LOAD_ATTR and STORE_ATTR instructions:
def count_opname(func, name):
    return sum(1 for i in dis.get_instructions(func) if i.opname == name)

print(f"before LOAD_ATTR:  {count_opname(before, 'LOAD_ATTR')}")   # 3
print(f"after  LOAD_ATTR:  {count_opname(after, 'LOAD_ATTR')}")    # 3 (same - loads still needed)
print(f"before STORE_ATTR: {count_opname(before, 'STORE_ATTR')}")  # 3
print(f"after  STORE_ATTR: {count_opname(after, 'STORE_ATTR')}")   # 3 (same)
# In this case, dis confirms the refactoring didn't reduce attribute ops -
# profile before optimising

:::danger Reading Bytecode Does Not Mean You Should Optimise at the Bytecode Level Understanding bytecode is a diagnostic tool, not an optimisation guide. The correct workflow is:

Profile first - use cProfile, line_profiler, or py-spy to find the actual bottleneck
Identify the hot path - where does the program spend its time?
Use dis to understand - confirm your mental model of what the hot path is doing
Optimise at the Python level - use better algorithms, data structures, or libraries (NumPy, etc.)
Only reach for C extensions as a last resort - Cython, cffi, or ctypes

Micro-optimising bytecode for non-hot paths is premature optimisation. A function called once at startup is not worth byte-counting. A function called 10 million times in the inner loop is. :::
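The "profile first" step of the workflow above can be sketched with cProfile and pstats from the standard library. The functions slow_part, fast_part, and workload are placeholders invented for this example.

```python
import cProfile
import io
import pstats

def slow_part(n):
    return sum(i * i for i in range(n))

def fast_part(n):
    return n * 2

def workload():
    return slow_part(200_000) + fast_part(10)

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Collect the report in a string so it can be printed or inspected
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Only after a report like this points at a specific function is it worth opening that function with dis.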

Part 6 - `dis.Bytecode` for Programmatic Analysis

For building tools that analyse bytecode, dis.Bytecode provides a clean programmatic interface:

import dis

def analyse_function(func):
    """Report opcodes used, sorted by frequency."""
    from collections import Counter

    bc = dis.Bytecode(func)
    opcode_counts = Counter(instr.opname for instr in bc)

    print(f"Function: {func.__name__}")
    print(f"Total instructions: {sum(opcode_counts.values())}")
    print("Opcode frequency:")
    for opname, count in opcode_counts.most_common():
        print(f"  {opname:<25s} {count}")
    print()

def complex_example(data):
    result = []
    for item in data:
        if item > 0:
            result.append(item * 2)
        elif item < 0:
            result.append(-item)
    return result

analyse_function(complex_example)

Finding All Jump Targets

import dis

def find_branches(func):
    """Find all jump instructions (conditional and unconditional) plus FOR_ITER in a function."""
    branches = []
    for instr in dis.get_instructions(func):
        if "JUMP" in instr.opname or instr.opname in ("FOR_ITER",):
            branches.append({
                "offset": instr.offset,
                "opname": instr.opname,
                "target": instr.argval,
                "line": instr.starts_line,
            })
    return branches

def example(x, items):
    if x > 0:
        for item in items:
            if item:
                return item
    return None

for branch in find_branches(example):
    print(branch)

Key Takeaways

dis.dis(func) prints human-readable disassembly; dis.get_instructions(func) returns structured Instruction objects; dis.Bytecode(func) provides an object-oriented interface
Each disassembly line shows: source line number (when it changes), bytecode offset, opcode name, argument, and a human-readable comment
The >> prefix marks a jump target - an offset that some other instruction jumps to
LOAD_FAST (local variable) is an array index operation - significantly faster than LOAD_GLOBAL (dict lookup) or LOAD_ATTR (attribute lookup)
The value stack is a LIFO stack; opcodes push, pop, or both; co_stacksize is the statically computed maximum depth
x += 1 compiles to BINARY_OP += (in-place attempt); x = x + 1 compiles to BINARY_OP + (new object); for integers (immutable) the result is identical; for lists (mutable) += is faster
List comprehensions build their result with LIST_APPEND (in a separate hidden code object up to Python 3.11, inlined into the enclosing function since 3.12); this is faster than explicit for + append because it avoids per-iteration attribute lookup
F-strings use FORMAT_VALUE + BUILD_STRING - faster than str() or .format() for simple substitutions
Always use instr.opname (string) rather than instr.opcode (number) when writing tools - opcode numbers change between Python versions
Profile before optimising - bytecode inspection is a diagnostic tool, not a guide to premature micro-optimisation

Graded Practice Challenges

Level 1 - Predict the Output

Question 1: How many LOAD_FAST instructions does this function contain?

import dis

def process(a, b, c):
    x = a + b
    y = x * c
    return x + y

count = sum(1 for i in dis.get_instructions(process) if i.opname == "LOAD_FAST")
print(count)

Show Answer

Output: 6

Trace the loads of local variables statement by statement:

x = a + b: LOAD_FAST a, LOAD_FAST b (2 loads)
y = x * c: LOAD_FAST x, LOAD_FAST c (2 loads)
return x + y: LOAD_FAST x, LOAD_FAST y (2 loads)

Total: 6 LOAD_FAST instructions.

(This is the expected count in CPython 3.10–3.12. The count varies by version: Python 3.13 can fuse adjacent loads into a single LOAD_FAST_LOAD_FAST instruction, so fewer plain LOAD_FAST instructions appear there.)

Question 2: What is the difference in the dis output between these two functions?

import dis

def f1():
    x = [1, 2, 3]
    return x

def f2():
    return [1, 2, 3]

dis.dis(f1)
dis.dis(f2)

Show Answer

f1 has a STORE_FAST x and then a LOAD_FAST x before RETURN_VALUE - one extra round-trip through the local variable array. f2 has no STORE_FAST/LOAD_FAST at all - the list goes directly from the list-building instructions (BUILD_LIST + LIST_EXTEND) to RETURN_VALUE.

CPython does not eliminate the unnecessary store-and-load in f1: its compiler performs only limited peephole optimisations and does not remove assignments to locals. f2 is strictly shorter at the bytecode level, although the difference is only two cheap instructions.

Question 3: What does this print?

import dis

def short_circuit(a, b):
    return a or b

instructions = list(dis.get_instructions(short_circuit))
jump_instrs = [i for i in instructions if "JUMP" in i.opname]
print(len(jump_instrs))
print(jump_instrs[0].opname)

Show Answer

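Output (Python 3.11 and earlier):

1
JUMP_IF_TRUE_OR_POP

Output (Python 3.12+):

1
POP_JUMP_IF_TRUE

The or operator short-circuits with a single conditional jump: if the left operand is truthy, jump (keeping it on the stack as the result); if falsy, discard it and evaluate the right operand. Up to Python 3.11 this is one instruction, JUMP_IF_TRUE_OR_POP. Python 3.12 removed that opcode; the compiler instead emits COPY 1 to duplicate the left operand, POP_JUMP_IF_TRUE to test the copy, and POP_TOP to discard the original on the fall-through path. Either way there is exactly one jump instruction.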

Question 4: True or False - a list comprehension and its equivalent for loop always produce the same bytecode?

def list_comp():
    return [x for x in range(5)]

def for_loop():
    result = []
    for x in range(5):
        result.append(x)
    return result

Show Answer

False. They are semantically equivalent but produce different bytecode. In Python 3.11 and earlier the list comprehension compiles to a nested code object (a hidden <listcomp> function) that is created with MAKE_FUNCTION and then called; since Python 3.12 (PEP 709) it is inlined into the enclosing function. In both cases it builds the list with LIST_APPEND. The for loop instead looks up result.append (LOAD_METHOD up to 3.11, LOAD_ATTR from 3.12) and performs a call on each iteration. They are different instruction sequences, and the comprehension version is generally faster due to the optimised LIST_APPEND opcode.

Question 5: What does this print?

import dis

def f(x):
    if x:
        return 1
    return 2

has_two_returns = sum(
    1 for i in dis.get_instructions(f) if i.opname == "RETURN_VALUE"
)
print(has_two_returns)

Show Answer

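Output: 2 on Python 3.11 and earlier; 0 on Python 3.12 and 3.13.

CPython generates a separate return instruction for each return path through the function rather than merging them into a single exit point. Up to Python 3.11 both are RETURN_VALUE (each preceded by LOAD_CONST). Python 3.12 added RETURN_CONST, which returns a constant in one instruction; because return 1 and return 2 both return constants, f then contains two RETURN_CONST instructions and no RETURN_VALUE. A version-tolerant count tests i.opname.startswith("RETURN_") instead, which gives 2 on all of these versions.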

Level 2 - Debug Challenge

A developer uses dis to try to confirm that a performance optimisation worked. Find the flaw in their reasoning:

import dis

# Original version
def process_original(items):
    result = []
    for item in items:
        result.append(item * 2)
    return result

# "Optimised" version - developer claims it avoids attribute lookup
def process_optimised(items):
    result = []
    append = result.append   # cache the bound method
    for item in items:
        append(item * 2)
    return result

# Developer's analysis:
orig_attrs = sum(1 for i in dis.get_instructions(process_original) if i.opname == "LOAD_ATTR")
opt_attrs = sum(1 for i in dis.get_instructions(process_optimised) if i.opname == "LOAD_ATTR")

print(f"Original LOAD_ATTR count: {orig_attrs}")    # prints 1 (for .append in loop)
print(f"Optimised LOAD_ATTR count: {opt_attrs}")    # prints 1 (for .append in setup)
print("Optimisation saved:", orig_attrs - opt_attrs, "attribute lookups per call")
# prints "Optimisation saved: 0 attribute lookups per call"
# Developer concludes: "no improvement - not worth it"

Show Solution

The flaw: The developer is counting LOAD_ATTR instructions in the function definition, not in the loop body. The dis output shows the bytecode statically - it does not account for how many times each instruction executes at runtime.

In process_original, LOAD_ATTR append is inside the for loop - it executes once per item in items. In process_optimised, LOAD_ATTR append is outside the loop - it executes exactly once regardless of how many items there are.

Correct analysis - count instructions per loop iteration, not per function:

import dis

def instructions_in_loop(func):
    """Roughly identify which instructions are inside a for loop."""
    instrs = list(dis.get_instructions(func))
    # Find FOR_ITER (start of loop) and JUMP_BACKWARD (end of loop)
    for_iter_offsets = [i.offset for i in instrs if i.opname == "FOR_ITER"]
    jump_back_offsets = [i.offset for i in instrs if i.opname == "JUMP_BACKWARD"]

    if not for_iter_offsets or not jump_back_offsets:
        return []

    loop_start = for_iter_offsets[0]
    loop_end = jump_back_offsets[-1]

    return [
        i for i in instrs
        if loop_start < i.offset <= loop_end
    ]

orig_loop = instructions_in_loop(process_original)
opt_loop = instructions_in_loop(process_optimised)

orig_load_attr = sum(1 for i in orig_loop if i.opname == "LOAD_ATTR")
opt_load_attr = sum(1 for i in opt_loop if i.opname == "LOAD_ATTR")

print(f"Original LOAD_ATTR per iteration: {orig_load_attr}")    # 1
print(f"Optimised LOAD_ATTR per iteration: {opt_load_attr}")    # 0
print(f"Optimisation saves {orig_load_attr - opt_load_attr} LOAD_ATTR per iteration")
# Optimisation saves 1 LOAD_ATTR per iteration - meaningful at large scale

The optimisation is real. For 1 million items, it saves 1 million LOAD_ATTR operations. The original analysis was wrong because it counted total instructions per function call, not per loop iteration.

Level 3 - Design Challenge

Design a BytecodeProfiler class that:

Accepts any Python function
Analyses the bytecode to classify instructions by category (loads, stores, calls, builds, jumps, returns, arithmetic)
Identifies which instructions are inside loop bodies vs outside loops
Produces a cost estimate by weighting instruction categories (e.g., LOAD_ATTR is more expensive than LOAD_FAST)
Reports a human-readable summary with a hotspot warning if expensive instructions are inside loops

# Target usage:
def slow_func(items):
    result = []
    for item in items:
        result.append(item.strip().upper())
    return result

profiler = BytecodeProfiler(slow_func)
profiler.report()
# Function: slow_func
# Total instructions: N
# Instructions in loop body: M
# Estimated cost per iteration: K units
# HOTSPOT WARNING: LOAD_ATTR found inside loop body (3 occurrences)
#   Consider caching: str.strip, str.upper, list.append

Show Reference Solution

import dis
from collections import defaultdict


# Relative cost weights per opcode category (rough estimates, not measurements)
INSTRUCTION_COSTS = {
    "LOAD_FAST": 1,
    "STORE_FAST": 1,
    "LOAD_CONST": 1,
    "LOAD_GLOBAL": 3,
    "STORE_GLOBAL": 3,
    "LOAD_ATTR": 5,
    "STORE_ATTR": 5,
    "LOAD_DEREF": 2,
    "STORE_DEREF": 2,
    "CALL": 10,
    "BINARY_OP": 2,
    "COMPARE_OP": 2,
    "BUILD_LIST": 2,
    "BUILD_TUPLE": 2,
    "BUILD_MAP": 3,  # the dict-building opcode is BUILD_MAP; there is no BUILD_DICT
    "FOR_ITER": 3,
    "GET_ITER": 2,
    "JUMP_BACKWARD": 1,
    "POP_JUMP_IF_FALSE": 1,
    "POP_JUMP_IF_TRUE": 1,
    "RETURN_VALUE": 1,
}

EXPENSIVE_IN_LOOP = {"LOAD_ATTR", "STORE_ATTR", "LOAD_GLOBAL", "CALL"}


class BytecodeProfiler:
    def __init__(self, func):
        self._func = func
        self._instructions = list(dis.get_instructions(func))

    def _find_loop_range(self):
        """Return (start_offset, end_offset) for the first for loop, or None."""
        for_iter = next(
            (i for i in self._instructions if i.opname == "FOR_ITER"), None
        )
        jump_back = next(
            (i for i in reversed(self._instructions) if i.opname == "JUMP_BACKWARD"),
            None,
        )
        if for_iter and jump_back:
            return (for_iter.offset, jump_back.offset)
        return None

    def _classify(self, instr):
        name = instr.opname
        if name.startswith("LOAD"):
            return "load"
        if name.startswith("STORE"):
            return "store"
        if name in ("CALL", "CALL_FUNCTION", "CALL_FUNCTION_KW"):
            return "call"
        if name.startswith("BUILD"):
            return "build"
        if "JUMP" in name or name in ("FOR_ITER", "GET_ITER"):
            return "control"
        if name in ("BINARY_OP", "COMPARE_OP", "UNARY_NEGATIVE", "UNARY_NOT"):
            return "arithmetic"
        if name == "RETURN_VALUE":
            return "return"
        return "other"

    def _cost(self, instr):
        return INSTRUCTION_COSTS.get(instr.opname, 2)

    def report(self):
        loop_range = self._find_loop_range()
        loop_instrs = []
        outer_instrs = []

        for instr in self._instructions:
            if loop_range and loop_range[0] < instr.offset <= loop_range[1]:
                loop_instrs.append(instr)
            else:
                outer_instrs.append(instr)

        total = len(self._instructions)
        loop_count = len(loop_instrs)
        loop_cost = sum(self._cost(i) for i in loop_instrs)

        categories = defaultdict(int)
        for instr in self._instructions:
            categories[self._classify(instr)] += 1

        print(f"Function: {self._func.__name__}")
        print(f"Total instructions: {total}")
        print(f"Instructions in loop body: {loop_count}")
        print(f"Estimated cost per loop iteration: {loop_cost} units")
        print()
        print("Instruction categories:")
        for cat, count in sorted(categories.items(), key=lambda x: -x[1]):
            print(f"  {cat:<12s} {count}")

        # Hotspot warnings
        if loop_instrs:
            print()
            hotspots = [
                i for i in loop_instrs if i.opname in EXPENSIVE_IN_LOOP
            ]
            if hotspots:
                # Group operands by opcode (defaultdict is imported at module top).
                by_opname = defaultdict(list)
                for i in hotspots:
                    by_opname[i.opname].append(i.argrepr)
                print("HOTSPOT WARNING: Expensive operations inside loop body:")
                # Most frequent opcode first; operands sorted for stable output.
                for opname, attrs in sorted(
                    by_opname.items(), key=lambda kv: -len(kv[1])
                ):
                    names = ", ".join(sorted(set(attrs)))
                    print(f"  {opname} ({len(attrs)} occurrences): {names}")
                print("  Consider caching attribute lookups and globals outside the loop.")
            else:
                print("No hotspot warnings - loop body looks clean.")


# Demo:
def slow_func(items):
    result = []
    for item in items:
        result.append(item.strip().upper())
    return result

profiler = BytecodeProfiler(slow_func)
profiler.report()
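The hotspot warning printed by the demo suggests caching attribute lookups outside the loop. A minimal sketch of that rewrite follows, together with a `timeit` measurement to check whether it helps on your interpreter. `cached_func` and the sample `data` are hypothetical names introduced for illustration. Only `result.append` can be hoisted safely here, because `item.strip` and `item.upper` are bound to a different object on every iteration.

```python
import timeit


def slow_func(items):
    result = []
    for item in items:
        result.append(item.strip().upper())
    return result


def cached_func(items):
    # Hypothetical rewrite: look up result.append once, before the loop,
    # so the loop body loads a local name instead of an attribute.
    result = []
    append = result.append
    for item in items:
        append(item.strip().upper())
    return result


data = ["  alpha ", "beta  ", " gamma"] * 1000

# Both versions must agree before a timing comparison means anything.
assert slow_func(data) == cached_func(data)

slow_t = timeit.timeit(lambda: slow_func(data), number=50)
cached_t = timeit.timeit(lambda: cached_func(data), number=50)
print(f"slow_func:   {slow_t:.4f}s")
print(f"cached_func: {cached_t:.4f}s")
```

The numbers vary by CPython version and machine, so treat the bytecode report as a hint about where to look and the timing as the evidence.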

Key design decisions:

_find_loop_range identifies the loop body by finding FOR_ITER (loop header) and JUMP_BACKWARD (loop tail) - instructions between these offsets are inside the loop. It reports a single range, so functions with several loops need more work, and it relies on JUMP_BACKWARD, which exists in CPython 3.11 and later.
The cost weights in INSTRUCTION_COSTS are rough relative heuristics, not profiled measurements - the tool is for directional guidance, not precise benchmarking.
EXPENSIVE_IN_LOOP flags LOAD_ATTR, STORE_ATTR, LOAD_GLOBAL, and CALL as worth investigating when found in loop bodies.
The tool complements - it does not replace - a real profiler like cProfile or py-spy.
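The reference solution reports one loop range per function. One possible way to extend it to several or nested loops is to treat every jump whose target lies before the jump itself as a loop back edge. The sketch below assumes that `argval` on a jump instruction holds the target offset, which is how `dis` reports jumps. `find_loops` and `nested` are hypothetical names, not part of the solution above.

```python
import dis


def find_loops(func):
    """Return sorted (start_offset, end_offset) pairs, one per backward jump."""
    jump_opcodes = set(dis.hasjrel) | set(dis.hasjabs)
    loops = []
    for instr in dis.get_instructions(func):
        # A jump whose target precedes its own offset closes a loop.
        if instr.opcode in jump_opcodes and instr.argval < instr.offset:
            loops.append((instr.argval, instr.offset))
    return sorted(loops)


def nested(rows):
    total = 0
    for row in rows:
        for value in row:
            total += value
    return total


for start, end in find_loops(nested):
    print(f"loop from offset {start} to {end}")
```

A `continue` statement can also compile to a backward jump to the same header, so a loop may appear more than once in the result. Grouping the pairs by start offset and keeping the largest end offset gives one range per loop.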

What's Next

Lesson 04 covers The GIL Explained - what CPython's Global Interpreter Lock actually is, why it exists, what it protects, how it interacts with threads and I/O, when it matters in practice, and what Python 3.12+ is doing to weaken it. You have now seen the eval loop at the bytecode level. The GIL is what controls which thread gets to run that eval loop at any given moment.
