The Iterator Protocol - How Python's for Loop Really Works

Reading time: ~30 minutes | Level: Intermediate → Engineering

Before reading further, predict every output:

class NumberRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

r = NumberRange(1, 4)
for n in r:
    print(n)

# Second loop:
for n in r:
    print(n)

Answer:

This raises TypeError: 'NumberRange' object is not iterable on the very first loop. NumberRange has no __iter__ method, so Python cannot iterate over it.

If you added an __iter__ method, the first loop would print 1, 2, 3. Whether the second loop also prints depends on the design: if __iter__ returns self (the object is its own iterator and also defines __next__), the second loop prints nothing because the iterator is exhausted. If __iter__ returns a new iterator each time (the object is a reusable iterable, which is what you get when __iter__ is written as a generator function that yields self.start, self.start + 1, ...), the second loop prints 1, 2, 3 again.

This is the central question: are you building an iterable or an iterator?
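As a minimal sketch of the generator-based option mentioned in the answer (the class name `ReusableNumberRange` is hypothetical, chosen so it does not clash with the quiz class), writing `__iter__` as a generator function gives every loop its own fresh generator:

```python
class ReusableNumberRange:
    """Hypothetical variant of NumberRange whose __iter__ is a generator function."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # Each call builds a brand-new generator with its own position.
        n = self.start
        while n < self.stop:
            yield n
            n += 1


r = ReusableNumberRange(1, 4)
first_pass = list(r)
second_pass = list(r)
print(first_pass)    # [1, 2, 3]
print(second_pass)   # [1, 2, 3]
```

Because the position lives in the generator rather than on `r`, the object itself never becomes exhausted.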

Now consider: every for loop, every list(), every zip(), and every in check on an object that does not define __contains__ goes through this same protocol. Understanding the iterator protocol at this depth lets you build lazy data pipelines, custom sequence types, and efficient streaming processors.

What You Will Learn

The iterator protocol: __iter__ and __next__, and StopIteration
The critical distinction between iterables and iterators - and why it matters
How Python's for loop desugars into explicit protocol calls
Implementing a reusable iterable (separate iterator class) vs a one-shot iterator
iter() with a sentinel: reading until EOF
next() with a default: avoiding StopIteration
The itertools module: chain, islice, zip_longest, groupby, takewhile, dropwhile
Building lazy evaluation pipelines

Prerequisites

Lesson 01: Lambda Expressions - callable objects
Lesson 02: map, filter, reduce - higher-order functions consuming iterables
Lesson 03: Generators and yield - generator functions return iterators
Comfortable writing classes with __init__ and instance methods

Part 1 - The Iterator Protocol

Two Methods, One Contract

The iterator protocol consists of exactly two methods:

class SomeIterator:
    def __iter__(self):
        return self   # an iterator is its own iterator

    def __next__(self):
        # Return the next value, or:
        raise StopIteration  # signal exhaustion

An object that implements both __iter__ and __next__ is an iterator.

An object that implements __iter__ (returning an iterator) is an iterable, whether or not it also implements __next__.

The built-in functions iter(obj) and next(obj) call these methods:

iter(obj)      # calls obj.__iter__()
next(obj)      # calls obj.__next__()

A Minimal Iterator

class CountUp:
    """An iterator that counts from start to stop (exclusive)."""

    def __init__(self, start: int, stop: int):
        self.current = start
        self.stop = stop

    def __iter__(self):
        return self   # this object IS the iterator

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

counter = CountUp(1, 4)
print(next(counter))   # 1
print(next(counter))   # 2
print(next(counter))   # 3
print(next(counter))   # raises StopIteration

Once StopIteration is raised, the iterator is exhausted. Calling next() again on an exhausted iterator should continue raising StopIteration (not reset).
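The rule that an exhausted iterator keeps raising StopIteration can be checked with a short sketch. `CountUpSketch` is a hypothetical copy of the CountUp class, redefined only so the snippet runs on its own:

```python
class CountUpSketch:
    """Same shape as CountUp; redefined here to keep the snippet self-contained."""

    def __init__(self, start, stop):
        self.current = start
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


counter = CountUpSketch(1, 3)
seen = [next(counter), next(counter)]

# Every further call must raise StopIteration again; nothing resets.
stop_count = 0
for _ in range(3):
    try:
        next(counter)
    except StopIteration:
        stop_count += 1

print(seen)         # [1, 2]
print(stop_count)   # 3
```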

Part 2 - Iterable vs Iterator: The Critical Distinction

The Class Diagram

Every iterator is also an iterable (because it has __iter__ returning itself). But not every iterable is an iterator.

# list is an iterable - it has __iter__
numbers = [1, 2, 3]
print(hasattr(numbers, '__iter__'))   # True
print(hasattr(numbers, '__next__'))   # False  ← not an iterator

# iter() creates an iterator from an iterable
it = iter(numbers)
print(hasattr(it, '__iter__'))    # True
print(hasattr(it, '__next__'))    # True  ← now it's an iterator

print(next(it))   # 1
print(next(it))   # 2
print(next(it))   # 3
print(next(it))   # StopIteration
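The hasattr checks above can also be written with the abstract base classes in collections.abc, which is a common way to express the same distinction. This is a sketch; note that `isinstance(x, Iterable)` only detects `__iter__`, so it would miss an object that is iterable solely through the legacy `__getitem__` protocol.

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
it = iter(numbers)

checks = {
    "list is Iterable": isinstance(numbers, Iterable),
    "list is Iterator": isinstance(numbers, Iterator),
    "list_iterator is Iterable": isinstance(it, Iterable),
    "list_iterator is Iterator": isinstance(it, Iterator),
    "iter(it) is it": iter(it) is it,   # an iterator returns itself
}

for label, result in checks.items():
    print(f"{label}: {result}")
# list is Iterable: True
# list is Iterator: False
# list_iterator is Iterable: True
# list_iterator is Iterator: True
# iter(it) is it: True
```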

Why This Matters: Reusability

A list is reusable in multiple for loops because each for loop calls iter(list) to get a fresh iterator. If the list itself were the iterator, the second loop would produce nothing:

numbers = [10, 20, 30]

for n in numbers:
    print(n)    # 10, 20, 30

for n in numbers:
    print(n)    # 10, 20, 30 - fresh iterator created by iter()

But an iterator used in two loops behaves differently:

it = iter([10, 20, 30])

for n in it:
    print(n)    # 10, 20, 30

for n in it:
    print(n)    # nothing - iterator is exhausted

:::warning Iterators Are Stateful and Single-Use
An iterator maintains position state. Once exhausted, it stays exhausted. zip(), map(), filter(), enumerate(), and generator expressions all return iterators - single-use objects. If you need to iterate over them twice, either convert to a list first or recreate them.

doubled = map(lambda x: x * 2, [1, 2, 3])

print(list(doubled))   # [2, 4, 6]
print(list(doubled))   # [] - exhausted!

# Fix: recreate, or convert to list first
items = [1, 2, 3]
doubled_list = list(map(lambda x: x * 2, items))
print(doubled_list)    # [2, 4, 6]
print(doubled_list)    # [2, 4, 6] - list is reusable

:::

Part 3 - The for Loop Desugared

What Python Actually Does

The for loop is syntactic sugar over explicit iterator protocol calls. This:

for item in iterable:
    process(item)

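is roughly equivalent to the following (ignoring the optional else clause; _iterator stands in for an internal reference that has no name in real code):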

_iterator = iter(iterable)      # calls iterable.__iter__()
while True:
    try:
        item = next(_iterator)  # calls _iterator.__next__()
    except StopIteration:
        break
    process(item)

The Flowchart

This desugaring has important implications:

break abandons the iterator mid-stream - the remaining items are never consumed
continue skips the loop body but calls next() on the next iteration as normal
Modifying the iterable during iteration is unsafe: adding or removing list items while looping over the list raises no error but can skip items or loop forever, while changing the size of a dict or set raises RuntimeError (for dicts: RuntimeError: dictionary changed size during iteration)
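The last implication is easy to reproduce. The sketch below uses made-up sample data to show a list silently skipping an item, a dict raising RuntimeError, and the usual workaround of iterating over a snapshot:

```python
# Lists: no error, but the iterator tracks an index, so a removal shifts items past it.
values = [1, 2, 3, 4]
visited = []
for v in values:
    visited.append(v)
    if v == 2:
        values.remove(v)   # 3 slides into the slot that was just visited
print(visited)   # [1, 2, 4]

# Dicts: changing the size mid-iteration is detected on the next step.
counts = {"a": 1, "b": 2}
error_message = None
try:
    for key in counts:
        counts[key + "_copy"] = counts[key]
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)   # dictionary changed size during iteration

# Workaround: iterate over a snapshot of the keys and mutate the original.
counts = {"a": 1, "b": 2}
for key in list(counts):
    counts[key + "_copy"] = counts[key]
print(sorted(counts))   # ['a', 'a_copy', 'b', 'b_copy']
```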

Verifying the Desugaring

class TracingIterable:
    """Prints every protocol call to show what the for loop does."""

    def __init__(self, data):
        self._data = data

    def __iter__(self):
        print(f"  __iter__ called → returning TracingIterator")
        return TracingIterator(iter(self._data))


class TracingIterator:
    def __init__(self, inner):
        self._inner = inner
        self._count = 0

    def __iter__(self):
        return self

    def __next__(self):
        self._count += 1
        try:
            value = next(self._inner)
            print(f"  __next__ call #{self._count} → {value}")
            return value
        except StopIteration:
            print(f"  __next__ call #{self._count} → StopIteration")
            raise


obj = TracingIterable([10, 20, 30])
for x in obj:
    print(f"    loop body: x = {x}")

Output:

  __iter__ called → returning TracingIterator
  __next__ call #1 → 10
    loop body: x = 10
  __next__ call #2 → 20
    loop body: x = 20
  __next__ call #3 → 30
    loop body: x = 30
  __next__ call #4 → StopIteration
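The first implication listed earlier, that break abandons the iterator mid-stream, follows from the same desugaring: the loop simply stops calling next(), and the iterator keeps its position. A small sketch with made-up readings:

```python
readings = iter([3, 8, 15, 4, 23])

first_large = None
for value in readings:
    if value > 10:
        first_large = value
        break            # the loop stops pulling; the iterator is not reset

# Whatever was not pulled is still available to the next consumer.
remaining = list(readings)

print(first_large)   # 15
print(remaining)     # [4, 23]
```

Had `readings` been the list itself rather than an iterator over it, the second consumer would have started again from the beginning.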

Part 4 - Implementing a Reusable Iterable

The Separate Iterator Class Pattern

To make your object reusable across multiple for loops, separate the iterable from the iterator. The iterable's __iter__ returns a new iterator object each time:

class NumberRange:
    """A reusable iterable over a range of integers."""

    def __init__(self, start: int, stop: int, step: int = 1):
        self.start = start
        self.stop = stop
        self.step = step

    def __iter__(self):
        # Returns a NEW iterator every time - the object remains reusable
        return NumberRangeIterator(self.start, self.stop, self.step)

    def __repr__(self):
        return f"NumberRange({self.start}, {self.stop}, {self.step})"


class NumberRangeIterator:
    """The stateful iterator - created fresh by NumberRange.__iter__."""

    def __init__(self, start: int, stop: int, step: int):
        self.current = start
        self.stop = stop
        self.step = step

    def __iter__(self):
        return self   # iterator is its own iterator

    def __next__(self) -> int:
        if self.current >= self.stop:
            raise StopIteration
        value = self.current
        self.current += self.step
        return value


r = NumberRange(1, 10, 2)

# First loop
print(list(r))   # [1, 3, 5, 7, 9]

# Second loop - produces same result because __iter__ returns a NEW iterator
print(list(r))   # [1, 3, 5, 7, 9]

# Two simultaneous iterations are independent
it1 = iter(r)
it2 = iter(r)
print(next(it1))   # 1
print(next(it1))   # 3
print(next(it2))   # 1 - it2 is independent of it1

:::tip Make __iter__ Return a New Iterator for Reusable Objects
If your class represents a data container (a collection, a range, a dataset), its __iter__ should return a new iterator each time. This is how list, tuple, dict, set, and range all work. Only make an object its own iterator if it is inherently single-use (like a file handle or a network stream).

# Reusable iterable pattern (preferred for containers)
class MyCollection:
    def __iter__(self):
        return MyCollectionIterator(self._data)  # new object each time

# One-shot iterator pattern (for inherently stateful streams)
class MyStream:
    def __iter__(self):
        return self   # same object - exhausts on first full traversal

:::

One-Shot Iterator Pattern (When Appropriate)

For inherently sequential, stateful sources - file reads, network streams, generators - an object being its own iterator is correct:

class FileLineIterator:
    """Iterates over lines in a file, skipping blank lines and comments."""

    def __init__(self, filepath: str):
        self._file = open(filepath, encoding="utf-8")

    def __iter__(self):
        return self   # one-shot: file position is shared state

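    def __next__(self) -> str:
        if self._file.closed:                   # already exhausted: keep raising
            raise StopIteration
        while True:
            line = self._file.readline()
            if not line:                        # EOF
                self._file.close()
                raise StopIteration
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                return stripped

    def __del__(self):
        # getattr guards against open() having failed inside __init__
        f = getattr(self, "_file", None)
        if f is not None and not f.closed:
            f.close()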
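For comparison, the same one-shot behavior can be sketched as a generator function, which lets a with block handle closing instead of `__del__`. The function name `iter_config_lines` and the temporary file contents are assumptions made for this illustration:

```python
import os
import tempfile


def iter_config_lines(filepath):
    """Hypothetical generator version: yields non-blank, non-comment lines."""
    # The file is closed when the generator finishes or is closed early.
    with open(filepath, encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                yield stripped


# Demo with a temporary file so the snippet is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".cfg", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("# comment\n\nhost = localhost\n  port = 8080  \n")
    path = tmp.name

try:
    lines = list(iter_config_lines(path))
finally:
    os.remove(path)

print(lines)   # ['host = localhost', 'port = 8080']
```

The class-based version remains useful when the iterator needs extra methods or attributes; the generator is shorter when it does not.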

Part 5 - `iter()` with Sentinel and `next()` with Default

`iter(callable, sentinel)` - Reading Until a Value

iter() has a two-argument form that repeatedly calls a callable until it returns the sentinel value:

# iter(callable, sentinel)
# Behaves like this generator:
# def call_until(callable, sentinel):
#     while True:
#         value = callable()
#         if value == sentinel:
#             return          # ends iteration; the caller sees StopIteration
#         yield value

import io

data = io.StringIO("line1\nline2\nline3\n")

# Read lines until readline() returns "" (EOF)
for line in iter(data.readline, ""):
    print(repr(line))
# 'line1\n'
# 'line2\n'
# 'line3\n'

This is idiomatic for reading fixed-size chunks from a binary file:

import functools

def read_in_chunks(filepath: str, chunk_size: int = 4096):
    """Yields chunks of a binary file without loading it all into memory."""
    with open(filepath, "rb") as f:
        reader = functools.partial(f.read, chunk_size)
        for chunk in iter(reader, b""):
            yield chunk

# Usage (/dev/urandom exists on Unix-like systems only)
for chunk in read_in_chunks("/dev/urandom", 1024):
    print(len(chunk))   # 1024
    break   # just one chunk for demo
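The sentinel form is not limited to files: any zero-argument callable works. As a sketch, the hypothetical work queue below is drained until an assumed marker value `"STOP"` comes out (the marker is an illustration, not a library constant):

```python
from collections import deque

# Hypothetical queue contents; everything after "STOP" is left untouched.
pending = deque(["resize", "compress", "upload", "STOP", "never-reached"])

# popleft is called repeatedly until it returns the sentinel.
processed = [task for task in iter(pending.popleft, "STOP")]

print(processed)       # ['resize', 'compress', 'upload']
print(list(pending))   # ['never-reached']
```

The sentinel itself is consumed from the queue but never yielded to the loop.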

`next(iterator, default)` - Avoiding StopIteration

The two-argument form of next() returns a default value instead of raising StopIteration:

it = iter([1, 2, 3])

print(next(it, None))    # 1
print(next(it, None))    # 2
print(next(it, None))    # 3
print(next(it, None))    # None - instead of StopIteration
print(next(it, None))    # None - stays at default

# Useful for "get first matching item" without try/except
numbers = [4, 7, 2, 9, 1]
first_even = next((n for n in numbers if n % 2 == 0), None)
print(first_even)   # 4

first_over_100 = next((n for n in numbers if n > 100), -1)
print(first_over_100)   # -1 (sentinel default)
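One caveat with a default such as None or -1 is that it can collide with a legitimate value in the data. A common workaround, sketched here with a hypothetical helper `first_match`, is to use a private sentinel object as the default:

```python
_MISSING = object()   # unique object: cannot be equal to any real item


def first_match(items, predicate):
    """Hypothetical helper: first item satisfying predicate, else LookupError."""
    found = next((x for x in items if predicate(x)), _MISSING)
    if found is _MISSING:
        raise LookupError("no item matched the predicate")
    return found


values = [0, None, 5]
match = first_match(values, lambda x: x is None)
print(match)   # None - a genuine match, not "nothing found"

try:
    first_match(values, lambda x: x == 99)
except LookupError as exc:
    print(exc)   # no item matched the predicate
```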

:::danger Do Not Confuse iter(obj) with obj.__iter__()
iter(obj) and obj.__iter__() are not identical. iter() adds two important behaviors:

iter(obj) calls the type's __iter__ and validates that the result is an iterator (has __next__), raising TypeError otherwise. Calling obj.__iter__() directly skips this check.
iter(callable, sentinel) does not call __iter__ at all - it wraps the callable in an iterator.

Always use the built-in iter() and next() rather than calling dunder methods directly:

# Wrong - bypasses validation, skips sentinel form
it = obj.__iter__()

# Right
it = iter(obj)

:::
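The validation step can be observed with a deliberately broken class. `BrokenIterable` is a hypothetical example whose `__iter__` returns a list, which is iterable but is not an iterator:

```python
class BrokenIterable:
    """Hypothetical class whose __iter__ returns something without __next__."""

    def __iter__(self):
        return [1, 2, 3]


obj = BrokenIterable()

# Direct dunder call: no validation, the list is handed back as-is.
direct = obj.__iter__()
print(type(direct).__name__)   # list

# Built-in iter(): rejects the result because it is not an iterator.
raised = False
try:
    iter(obj)
except TypeError:
    raised = True
print(raised)   # True
```

A for loop over `obj` fails the same way, since it goes through the same check.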

Part 6 - Built-in Iterators You Already Use

:::note zip(), enumerate(), map(), filter() All Return Iterators
In Python 3 these built-ins return iterators, not lists. This is a deliberate memory-efficiency design - they do not evaluate the full sequence upfront.

z = zip([1, 2, 3], ["a", "b", "c"])
print(type(z))         # <class 'zip'>
print(hasattr(z, '__next__'))  # True - it's an iterator

e = enumerate(["x", "y"])
print(type(e))         # <class 'enumerate'>

m = map(str, [1, 2, 3])
print(type(m))         # <class 'map'>

f = filter(None, [0, 1, 2, None, 3])
print(type(f))         # <class 'filter'>

Consequence: passing zip(...) to two for loops consumes it on the first pass. Wrap in list() if you need multiple passes.

:::

Part 7 - The `itertools` Module

itertools provides a suite of memory-efficient building blocks for combinatorial and streaming operations. Its functions return lazy iterators (tee returns a tuple of them), so nothing is computed until you consume the result.

Combining Iterables: `chain`

from itertools import chain

# chain concatenates multiple iterables lazily
combined = chain([1, 2, 3], [4, 5], [6])
print(list(combined))   # [1, 2, 3, 4, 5, 6]

# chain.from_iterable: flatten one level of nesting
nested = [[1, 2], [3, 4], [5, 6]]
flat = list(chain.from_iterable(nested))
print(flat)   # [1, 2, 3, 4, 5, 6]

Slicing: `islice`

islice slices an iterator without materializing it - essential for taking the first N items from an infinite iterator:

from itertools import islice, count

# count() is an infinite iterator: 0, 1, 2, 3, ...
infinite = count(start=10, step=3)

# islice(iterable, stop) or islice(iterable, start, stop, step)
first_five = list(islice(infinite, 5))
print(first_five)   # [10, 13, 16, 19, 22]

# Skipping items: islice(data, 2, 7) yields indices 2..6 (islice takes positional arguments only)
data = iter(range(100))
middle = list(islice(data, 2, 7))
print(middle)   # [2, 3, 4, 5, 6]
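Because islice advances the underlying iterator rather than copying it, repeated islice calls on one shared iterator pick up where the previous call stopped. A minimal batching sketch built on that behavior follows; the helper name `in_batches` is hypothetical, it requires Python 3.8+ for the walrus operator, and Python 3.12 added a built-in `itertools.batched` for the same job.

```python
from itertools import islice


def in_batches(iterable, size):
    """Hypothetical helper: lazily yield lists of up to `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)   # one shared iterator, so each islice resumes
    while batch := list(islice(it, size)):
        yield batch


batches = list(in_batches(range(7), 3))
print(batches)   # [[0, 1, 2], [3, 4, 5], [6]]
```

Only one batch is held in memory at a time when the result is consumed in a loop instead of being wrapped in list().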

Zipping Unevenly: `zip_longest`

Standard zip() stops at the shortest iterable. zip_longest pads with a fill value:

from itertools import zip_longest

names = ["Alice", "Bob", "Charlie"]
scores = [95, 87]

# zip stops at length 2
for pair in zip(names, scores):
    print(pair)
# ('Alice', 95)
# ('Bob', 87)

# zip_longest pads missing values
for pair in zip_longest(names, scores, fillvalue=0):
    print(pair)
# ('Alice', 95)
# ('Bob', 87)
# ('Charlie', 0)

Grouping Consecutive Items: `groupby`

groupby groups consecutive elements with the same key - similar to SQL GROUP BY but requires the input to be sorted by the key first:

from itertools import groupby

# Input MUST be sorted by the key you group on
events = [
    {"type": "click", "x": 10},
    {"type": "click", "x": 20},
    {"type": "move",  "x": 30},
    {"type": "move",  "x": 40},
    {"type": "click", "x": 50},
]

# Sort by type first, then group
sorted_events = sorted(events, key=lambda e: e["type"])
for event_type, group in groupby(sorted_events, key=lambda e: e["type"]):
    items = list(group)
    print(f"{event_type}: {len(items)} events")
# click: 3 events
# move: 2 events

:::warning groupby Requires Sorted Input
groupby only groups consecutive equal keys. If the data is unsorted, you get multiple groups for the same key. Always sort by the grouping key before passing to groupby.

from itertools import groupby

data = [1, 1, 2, 1, 1]   # unsorted - 1 appears in two separate runs

for key, group in groupby(data):
    print(key, list(group))
# 1 [1, 1]
# 2 [2]
# 1 [1, 1]  ← 1 appears again as a separate group!

:::
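When sorting is undesirable (for example, because the original order within each group matters or the data is large), a plain dictionary is a common alternative to groupby. This sketch uses collections.defaultdict and made-up event tuples; it is a different technique from groupby, not a variant of it, and it holds all groups in memory at once.

```python
from collections import defaultdict

# Sample data, deliberately unsorted by event type.
events = [("click", 10), ("move", 30), ("click", 50)]

by_type = defaultdict(list)
for event_type, x in events:
    by_type[event_type].append(x)   # non-consecutive keys land in the same list

print(dict(by_type))   # {'click': [10, 50], 'move': [30]}
```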

Conditional Iteration: `takewhile` and `dropwhile`

from itertools import takewhile, dropwhile

data = [2, 4, 6, 7, 8, 10]

# takewhile: yield items while predicate is True, stop at first False
evens_until_odd = list(takewhile(lambda x: x % 2 == 0, data))
print(evens_until_odd)   # [2, 4, 6]

# dropwhile: skip items while predicate is True, then yield all remaining
after_first_odd = list(dropwhile(lambda x: x % 2 == 0, data))
print(after_first_odd)   # [7, 8, 10]

Part 8 - Lazy Evaluation Pipelines

The real power of iterators is composing them into lazy pipelines: data flows through each stage one item at a time, without materializing intermediate lists.

from itertools import islice
import csv
import io

# Simulate a large CSV-like data source
raw_data = """
id,name,score,active
1,Alice,95,true
2,Bob,42,false
3,Charlie,88,true
4,Dave,30,false
5,Eve,91,true
6,Frank,55,true
""".strip()

def parse_csv_rows(text: str):
    """Yields dicts from CSV text, one row at a time."""
    reader = csv.DictReader(io.StringIO(text))
    yield from reader

def is_active(row: dict) -> bool:
    return row["active"] == "true"

def above_threshold(threshold: int):
    def predicate(row: dict) -> bool:
        return int(row["score"]) >= threshold
    return predicate

def extract_name(row: dict) -> str:
    return row["name"]

# Lazy pipeline - nothing is computed until iteration
pipeline = (
    extract_name(row)
    for row in parse_csv_rows(raw_data)
    if is_active(row) and above_threshold(70)(row)
)

# Consume only the first 3 results
top_names = list(islice(pipeline, 3))
print(top_names)   # ['Alice', 'Charlie', 'Eve']

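The key insight: the pipeline pulls one row from parse_csv_rows, runs it through the filter and then the transform, and yields the result - before the next row is read. When the source is itself streamed (for example, an open file rather than an in-memory string like raw_data), memory usage stays constant regardless of the input size.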
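The one-item-at-a-time flow can be made visible by recording each step. In this sketch the three stage functions and the `trace` list are hypothetical and exist only to expose the order of operations:

```python
trace = []


def source():
    for n in [1, 2, 3]:
        trace.append(f"read {n}")
        yield n


def keep_odd(items):
    for n in items:
        if n % 2:
            trace.append(f"keep {n}")
            yield n


def square(items):
    for n in items:
        trace.append(f"square {n}")
        yield n * n


results = list(square(keep_odd(source())))

print(results)   # [1, 9]
print(trace)
# ['read 1', 'keep 1', 'square 1', 'read 2', 'read 3', 'keep 3', 'square 3']
```

Item 1 passes through every stage before item 2 is even read, which is the interleaving the paragraph above describes.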

Comparing Eager vs Lazy

import sys

data = range(1_000_000)

# Eager - materializes all 333,334 results
eager = [x * 2 for x in data if x % 3 == 0]
print(f"Eager:  {sys.getsizeof(eager):,} bytes")      # on the order of 2.8 MB for the list's pointer array alone (varies by Python version)

# Lazy - the generator holds no results, just the computation recipe
lazy = (x * 2 for x in data if x % 3 == 0)
print(f"Lazy:   {sys.getsizeof(lazy):,} bytes")        # on the order of 100-200 bytes (varies by Python version)

# Both produce the same values when consumed
print(next(lazy))    # 0
print(next(lazy))    # 6

Common Mistakes

Mistake 1 - Using the Iterator in Two Places

import csv

# Wrong - iterator exhausts after first use (and the file is never closed)
rows = csv.reader(open("data.csv"))
headers = list(rows)         # consumes entire iterator
data = list(rows)            # empty - iterator is exhausted

# Right - materialize once, then slice the list
with open("data.csv", newline="") as f:
    all_rows = list(csv.reader(f))
headers = all_rows[0]        # first row (assumes the file is not empty)
data = all_rows[1:]
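If the file is too large to materialize, another option is to pull only the header with next() and keep streaming the rest from the same reader. In this sketch io.StringIO and the sample text stand in for a real open file:

```python
import csv
import io

# io.StringIO stands in for an open file so the snippet runs anywhere.
csv_text = "id,name\n1,Alice\n2,Bob\n"

reader = csv.reader(io.StringIO(csv_text))
headers = next(reader, None)   # consumes only the first row; None if the input is empty

# The same reader continues from the second row.
records = [dict(zip(headers, row)) for row in reader] if headers else []

print(headers)   # ['id', 'name']
print(records)   # [{'id': '1', 'name': 'Alice'}, {'id': '2', 'name': 'Bob'}]
```

Here the single-use nature of the iterator is an advantage: the header row is guaranteed not to reappear among the data rows.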

Mistake 2 - Missing `return self` in `__iter__` on an Iterator

# Wrong - iterator is not iterable, can't be used in for loops
class BadIterator:
    def __next__(self):
        ...
    # no __iter__ → TypeError: 'BadIterator' object is not iterable

# Right
class GoodIterator:
    def __iter__(self):
        return self       # iterators must be their own iterables

    def __next__(self):
        ...

Mistake 3 - Consuming a Generator Twice

# Wrong - generator is a one-shot iterator
def generate_numbers():
    yield from range(5)

gen = generate_numbers()
print(sum(gen))       # 10 - consumed
print(sum(gen))       # 0  - already exhausted

# Right - call the generator function again
print(sum(generate_numbers()))   # 10
print(sum(generate_numbers()))   # 10

Mistake 4 - Forgetting groupby Needs Sorted Input

# Wrong - data not sorted by key
from itertools import groupby
data = [("a", 1), ("b", 2), ("a", 3)]
groups = {k: list(v) for k, v in groupby(data, key=lambda x: x[0])}
print(groups)   # {'a': [('a', 3)], 'b': [('b', 2)]}
# The second 'a' run silently overwrote the first - ('a', 1) is lost!

# Right - sort first
data_sorted = sorted(data, key=lambda x: x[0])
groups = {k: list(v) for k, v in groupby(data_sorted, key=lambda x: x[0])}
print(groups)   # {'a': [('a', 1), ('a', 3)], 'b': [('b', 2)]}

Graded Practice Challenges

Level 1 - Predict the Output

Question 1: What does this print?

nums = [10, 20, 30]
it = iter(nums)

print(next(it))
print(next(it))

for n in it:
    print(n)

for n in it:
    print(n)

Answer:

Output:

10
20
30

next(it) consumes 10 and 20. The first for loop calls iter(it) - which returns it itself (iterators are their own iterators) - and then consumes 30 via next(). The second for loop also calls iter(it), gets it back, and immediately hits StopIteration with no output.

Question 2: What does this print?

from itertools import takewhile

data = [1, 3, 5, 6, 7, 9]
result = list(takewhile(lambda x: x % 2 != 0, data))
print(result)

Answer:

Output:

[1, 3, 5]

takewhile yields items while the predicate returns True. It stops permanently at the first False - even if later items would satisfy the predicate. 6 is even so takewhile stops there. 7 and 9 are odd but are never reached.

Question 3: What does this print?

from itertools import chain

a = iter([1, 2])
b = iter([3, 4])
c = chain(a, b)

print(next(c))
print(next(a))
print(list(c))

Answer:

Output:

1
2
[3, 4]

chain(a, b) creates a lazy chain that draws from a first, then b. next(c) pulls from a, returning 1. next(a) pulls directly from a, returning 2 - and also advances the shared iterator state that c holds. When list(c) is called, a is now exhausted, so chain moves to b and returns [3, 4].

Question 4: What does this print?

def make_range(n):
    class Range:
        def __iter__(self):
            self.i = 0
            return self
        def __next__(self):
            if self.i >= n:
                raise StopIteration
            self.i += 1
            return self.i
    return Range()

r = make_range(3)
print(list(r))
print(list(r))

Answer:

Output:

[1, 2, 3]
[1, 2, 3]

This is a subtle case. Range.__iter__ resets self.i = 0 each time it is called. Because list() (like a for loop) calls iter(r) first, which calls r.__iter__() and resets the counter, each full pass starts over - even though Range is its own iterator. The pattern is valid but fragile: there is still only one counter, so two overlapping iterations (nested loops over r, or two iter(r) handles) interfere with each other. A separate iterator class avoids that, which is why it is the standard approach.
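The difference between resetting in `__iter__` and using a separate iterator shows up as soon as two iterations overlap. In this sketch, `ResettingRange` is a hypothetical stand-in for the Range class in the question, compared against the built-in range:

```python
class ResettingRange:
    """Hypothetical stand-in: its own iterator, with state reset in __iter__."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        self.i = 0
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        self.i += 1
        return self.i


r = ResettingRange(2)
pairs = [(a, b) for a in r for b in r]   # inner loop shares the outer loop's counter
print(pairs)   # [(1, 1), (1, 2)]

proper = range(1, 3)                     # separate iterator per iter() call
expected = [(a, b) for a in proper for b in proper]
print(expected)   # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

The inner loop exhausts the single shared counter, so the outer loop ends after its first item.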

Question 5: What does this print?

from itertools import islice

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

fib = fibonacci()
first_eight = list(islice(fib, 8))
print(first_eight)

next_two = list(islice(fib, 2))
print(next_two)

Answer:

Output:

[0, 1, 1, 2, 3, 5, 8, 13]
[21, 34]

fibonacci() returns an infinite generator. islice(fib, 8) lazily takes the first 8 values without stopping the generator. The generator fib is still alive after islice completes. The second islice(fib, 2) continues from where the generator left off - the 9th and 10th Fibonacci numbers, 21 and 34.

Level 2 - Debug Challenge

Find and fix all bugs:

from itertools import groupby

class EventLog:
    def __init__(self, events):
        self.events = events

    def __iter__(self):
        return self   # bug 1

    def __next__(self):
        if not self.events:
            raise StopIteration
        return self.events.pop(0)

log = EventLog(["click", "click", "move", "click"])

# First pass: count events
total = sum(1 for _ in log)
print(f"Total events: {total}")

# Second pass: iterate again - bug 2
for event in log:
    print(event)

# Group by event type - bug 3
events = ["click", "move", "click", "click", "move"]
for etype, group in groupby(events):
    print(etype, len(list(group)))

Solution:

Bugs:

EventLog.__iter__ returns self, making EventLog a one-shot iterator. Each call to iter(log) returns the same object with shared (destructive) state. self.events.pop(0) mutates the list - after the first full pass, self.events is empty. The second loop sees nothing.
Directly caused by bug 1: log is exhausted after the first sum(1 for _ in log). A proper iterable should return a new iterator each time.
groupby(events) groups consecutive equal items. The input ["click", "move", "click", "click", "move"] has "click" appearing non-consecutively. Sort first for meaningful grouping.

Fixed version:

from itertools import groupby

class EventLog:
    def __init__(self, events):
        self._events = list(events)   # preserve original

    def __iter__(self):
        # Fix 1 & 2: return a new iterator each time
        return iter(list(self._events))


log = EventLog(["click", "click", "move", "click"])

# First pass: count events
total = sum(1 for _ in log)
print(f"Total events: {total}")   # 4

# Second pass: works now - __iter__ returns a fresh iterator
for event in log:
    print(event)   # click, click, move, click

# Fix 3: sort before groupby
events = ["click", "move", "click", "click", "move"]
for etype, group in groupby(sorted(events)):
    print(etype, len(list(group)))
# click 3
# move 2

Level 3 - Design Challenge

Design a DataPipeline class that:

Accepts a source iterable at construction time
Supports chaining .filter(predicate), .map(transform), and .take(n) methods
Each method returns a new DataPipeline (immutable, chainable)
Evaluation is lazy - nothing is computed until iteration begins
Is reusable: calling list(pipeline) twice produces the same result

Reference solution:

from itertools import islice
from typing import Callable, Iterable, Iterator


class DataPipeline:
    """
    An immutable, lazy, chainable data pipeline.

    Each transformation method returns a new DataPipeline.
    Evaluation only happens when the pipeline is iterated.
    Reusable because the source is re-iterated on each __iter__ call.
    """

    def __init__(self, source: Iterable):
        # Store the source - it must itself be reusable (a list, range, etc.);
        # a one-shot iterator here would make the pipeline single-use too
        self._source = source
        self._transforms: list[Callable] = []

    def _clone_with(self, transform: Callable) -> "DataPipeline":
        """Return a new pipeline with one additional transform stage."""
        new = DataPipeline(self._source)
        new._transforms = self._transforms + [transform]
        return new

    def filter(self, predicate: Callable) -> "DataPipeline":
        """Lazily filter items by predicate."""
        return self._clone_with(lambda it: (x for x in it if predicate(x)))

    def map(self, transform: Callable) -> "DataPipeline":
        """Lazily transform each item."""
        return self._clone_with(lambda it: (transform(x) for x in it))

    def take(self, n: int) -> "DataPipeline":
        """Lazily take the first n items."""
        return self._clone_with(lambda it: islice(it, n))

    def __iter__(self) -> Iterator:
        # Build the pipeline fresh from the source each time
        result = iter(self._source)
        for transform in self._transforms:
            result = transform(result)
        return result

    def __repr__(self) -> str:
        return (
            f"DataPipeline(source={self._source!r}, "
            f"stages={len(self._transforms)})"
        )


# Usage
data = range(1, 21)   # 1..20

pipeline = (
    DataPipeline(data)
    .filter(lambda x: x % 2 == 0)    # keep evens: 2,4,6,...,20
    .map(lambda x: x ** 2)            # square: 4,16,36,...,400
    .filter(lambda x: x > 50)         # keep > 50: 64,100,...,400
    .take(4)                           # first 4
)

print(list(pipeline))   # [64, 100, 144, 196]
print(list(pipeline))   # [64, 100, 144, 196] - reusable!

# Branching pipelines - base pipeline is not modified
base = DataPipeline(range(10)).filter(lambda x: x % 2 == 0)
doubled = base.map(lambda x: x * 2)
squared = base.map(lambda x: x ** 2)

# range(10) is 0..9; evens: 0, 2, 4, 6, 8
print(list(doubled))   # [0, 4, 8, 12, 16]
print(list(squared))   # [0, 4, 16, 36, 64]

Design decisions:

_transforms is a list of callables each taking an iterator and returning an iterator - this is the pipeline stage pattern
__iter__ rebuilds the full pipeline from the source each time, enabling reusability
_clone_with returns a new pipeline rather than mutating - immutable by design
islice handles take without materializing the full sequence

Key Takeaways

The iterator protocol requires two methods: __iter__ (returns iterator) and __next__ (returns next value or raises StopIteration)
An iterable has __iter__; an iterator has both __iter__ and __next__ - an iterator is always iterable, but not vice versa
list, tuple, dict, and set are iterables, not iterators - iter(collection) creates a fresh iterator each time
The for loop desugars to: _it = iter(obj) followed by repeated next(_it) until StopIteration
Make __iter__ return a new iterator object for reusable containers; return self only for inherently stateful, single-use streams
iter(callable, sentinel) repeatedly calls a callable until it returns the sentinel - idiomatic for reading files in chunks
next(iterator, default) returns a default instead of raising StopIteration - essential for "first matching item" patterns
zip(), map(), filter(), enumerate() all return iterators (single-use) - not lists
itertools.chain concatenates, islice slices lazily, zip_longest pads short iterables, groupby groups consecutive equal keys (sort first!), takewhile/dropwhile stop/skip based on a predicate
Lazy pipelines pass one item through all stages at a time, keeping memory constant regardless of input size

What's Next

Lesson 05 covers decorators - callables that wrap other callables to extend their behavior. Decorators are built directly on top of the function-as-first-class-object concept and use closures under the hood. You will learn the @ syntax desugaring, functools.wraps, decorator factories with arguments, class-based decorators, stacking behavior, and the production patterns (timing, retry, caching, rate limiting) that appear in every serious Python codebase.
