The Iterator Protocol - How Python's for Loop Really Works
Reading time: ~30 minutes | Level: Intermediate → Engineering
Before reading further, predict every output:
class NumberRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

r = NumberRange(1, 4)
for n in r:
    print(n)

# Second loop:
for n in r:
    print(n)
Show Answer
This raises TypeError: 'NumberRange' object is not iterable on the very first loop. NumberRange has no __iter__ method, so Python cannot iterate over it.
Had NumberRange defined __iter__, the first loop would print 1, 2, 3. Whether the second loop also prints depends on the design: if __iter__ returns self (the object is its own iterator), the second loop prints nothing because the iterator is exhausted. If __iter__ returns a new iterator each time (the object is a reusable iterable, which is what you get when __iter__ is written as a generator function), the second loop prints 1, 2, 3 again.
This is the central question: are you building an iterable or an iterator?
Now consider: every for loop, every list(), every zip(), and every in check on an object that does not define __contains__ goes through this same protocol. Understanding the iterator protocol at this depth lets you build lazy data pipelines, custom sequence types, and efficient streaming processors.
What You Will Learn
- The iterator protocol: __iter__ and __next__, and StopIteration
- The critical distinction between iterables and iterators - and why it matters
- How Python's for loop desugars into explicit protocol calls
- Implementing a reusable iterable (separate iterator class) vs a one-shot iterator
- iter() with a sentinel: reading until EOF
- next() with a default: avoiding StopIteration
- The itertools module: chain, islice, zip_longest, groupby, takewhile, dropwhile
- Building lazy evaluation pipelines
Prerequisites
- Lesson 01: Lambda Expressions - callable objects
- Lesson 02: map, filter, reduce - higher-order functions consuming iterables
- Lesson 03: Generators and yield - generator functions return iterators
- Comfortable writing classes with __init__ and instance methods
Part 1 - The Iterator Protocol
Two Methods, One Contract
The iterator protocol consists of exactly two methods:
class SomeIterator:
    def __iter__(self):
        return self  # an iterator is its own iterator

    def __next__(self):
        # Return the next value, or:
        raise StopIteration  # signal exhaustion
An object that implements both __iter__ and __next__ is an iterator.
An object that implements only __iter__ (returning an iterator) is an iterable.
The built-in functions iter(obj) and next(obj) call these methods:
iter(obj) # calls obj.__iter__()
next(obj) # calls obj.__next__()
A Minimal Iterator
class CountUp:
    """An iterator that counts from start to stop (exclusive)."""

    def __init__(self, start: int, stop: int):
        self.current = start
        self.stop = stop

    def __iter__(self):
        return self  # this object IS the iterator

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

counter = CountUp(1, 4)
print(next(counter))  # 1
print(next(counter))  # 2
print(next(counter))  # 3
print(next(counter))  # raises StopIteration
Once StopIteration is raised, the iterator is exhausted. Calling next() again on an exhausted iterator should continue raising StopIteration (not reset).
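A quick check of that rule, as a minimal sketch that uses a built-in list iterator as a stand-in for any well-behaved iterator (the variable names are illustrative):

```python
# A list iterator stands in for any well-behaved iterator.
it = iter([1, 2])

consumed = [next(it), next(it)]  # [1, 2]

# Every further call keeps raising StopIteration; the iterator never resets.
stop_count = 0
for _ in range(3):
    try:
        next(it)
    except StopIteration:
        stop_count += 1

print(consumed, stop_count)  # [1, 2] 3
```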
Part 2 - Iterable vs Iterator: The Critical Distinction
Every iterator is also an iterable (because it has __iter__ returning itself). But not every iterable is an iterator.
# list is an iterable - it has __iter__
numbers = [1, 2, 3]
print(hasattr(numbers, '__iter__')) # True
print(hasattr(numbers, '__next__')) # False ← not an iterator
# iter() creates an iterator from an iterable
it = iter(numbers)
print(hasattr(it, '__iter__')) # True
print(hasattr(it, '__next__')) # True ← now it's an iterator
print(next(it)) # 1
print(next(it)) # 2
print(next(it)) # 3
print(next(it)) # StopIteration
Why This Matters: Reusability
A list is reusable in multiple for loops because each for loop calls iter(list) to get a fresh iterator. If the list itself were the iterator, the second loop would produce nothing:
numbers = [10, 20, 30]
for n in numbers:
    print(n)  # 10, 20, 30
for n in numbers:
    print(n)  # 10, 20, 30 - fresh iterator created by iter()
But an iterator used in two loops behaves differently:
it = iter([10, 20, 30])
for n in it:
    print(n)  # 10, 20, 30
for n in it:
    print(n)  # nothing - iterator is exhausted
:::warning Iterators Are Stateful and Single-Use
An iterator maintains position state. Once exhausted, it stays exhausted. zip(), map(), filter(), enumerate(), and generator expressions all return iterators - single-use objects. If you need to iterate over them twice, either convert to a list first or recreate them.
doubled = map(lambda x: x * 2, [1, 2, 3])
print(list(doubled)) # [2, 4, 6]
print(list(doubled)) # [] - exhausted!
# Fix: recreate, or convert to list first
items = [1, 2, 3]
doubled_list = list(map(lambda x: x * 2, items))
print(doubled_list) # [2, 4, 6]
print(doubled_list) # [2, 4, 6] - list is reusable
:::
Part 3 - The for Loop Desugared
What Python Actually Does
The for loop is syntactic sugar over explicit iterator protocol calls. This:
for item in iterable:
    process(item)
is essentially equivalent to (ignoring the optional else clause):
_iterator = iter(iterable)  # calls iterable.__iter__()
while True:
    try:
        item = next(_iterator)  # calls _iterator.__next__()
    except StopIteration:
        break
    process(item)
This desugaring has important implications:
- break abandons the iterator mid-stream - the remaining items are never consumed
- continue skips the rest of the loop body but calls next() on the next iteration as normal
- Modifying the iterable during iteration is error-prone: adding or removing dict or set entries raises RuntimeError (for example, "dictionary changed size during iteration"), while mutating a list raises nothing but can silently skip or repeat items
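The point about mutation can be checked directly. The sketch below (variable names are illustrative) shows a list that is mutated mid-loop silently skipping an element, a dict that changes size raising RuntimeError, and the common workaround of iterating over a snapshot:

```python
# Removing from a list while iterating: no error, but an item gets skipped.
numbers = [1, 2, 3, 4]
seen = []
for n in numbers:
    seen.append(n)
    if n == 2:
        numbers.remove(n)  # shifts 3 into the slot the iterator already passed
print(seen)  # [1, 2, 4]

# Adding a key to a dict while iterating: detected on the next step.
counts = {"a": 1}
error_message = None
try:
    for key in counts:
        counts[key + "x"] = 0
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)  # dictionary changed size during iteration

# Workaround: iterate over a copy and mutate the original.
numbers = [1, 2, 3, 4]
for n in list(numbers):
    if n % 2 == 0:
        numbers.remove(n)
print(numbers)  # [1, 3]
```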
Verifying the Desugaring
class TracingIterable:
    """Prints every protocol call to show what the for loop does."""

    def __init__(self, data):
        self._data = data

    def __iter__(self):
        print(f" __iter__ called → returning TracingIterator")
        return TracingIterator(iter(self._data))

class TracingIterator:
    def __init__(self, inner):
        self._inner = inner
        self._count = 0

    def __iter__(self):
        return self

    def __next__(self):
        self._count += 1
        try:
            value = next(self._inner)
            print(f" __next__ call #{self._count} → {value}")
            return value
        except StopIteration:
            print(f" __next__ call #{self._count} → StopIteration")
            raise

obj = TracingIterable([10, 20, 30])
for x in obj:
    print(f" loop body: x = {x}")
Output:
__iter__ called → returning TracingIterator
__next__ call #1 → 10
loop body: x = 10
__next__ call #2 → 20
loop body: x = 20
__next__ call #3 → 30
loop body: x = 30
__next__ call #4 → StopIteration
Part 4 - Implementing a Reusable Iterable
The Separate Iterator Class Pattern
To make your object reusable across multiple for loops, separate the iterable from the iterator. The iterable's __iter__ returns a new iterator object each time:
class NumberRange:
    """A reusable iterable over a range of integers."""

    def __init__(self, start: int, stop: int, step: int = 1):
        self.start = start
        self.stop = stop
        self.step = step

    def __iter__(self):
        # Returns a NEW iterator every time - the object remains reusable
        return NumberRangeIterator(self.start, self.stop, self.step)

    def __repr__(self):
        return f"NumberRange({self.start}, {self.stop}, {self.step})"

class NumberRangeIterator:
    """The stateful iterator - created fresh by NumberRange.__iter__."""

    def __init__(self, start: int, stop: int, step: int):
        self.current = start
        self.stop = stop
        self.step = step

    def __iter__(self):
        return self  # iterator is its own iterator

    def __next__(self) -> int:
        if self.current >= self.stop:
            raise StopIteration
        value = self.current
        self.current += self.step
        return value

r = NumberRange(1, 10, 2)
# First loop
print(list(r))  # [1, 3, 5, 7, 9]
# Second loop - produces same result because __iter__ returns a NEW iterator
print(list(r))  # [1, 3, 5, 7, 9]
# Two simultaneous iterations are independent
it1 = iter(r)
it2 = iter(r)
print(next(it1))  # 1
print(next(it1))  # 3
print(next(it2))  # 1 - it2 is independent of it1
:::tip Make __iter__ Return a New Iterator for Reusable Objects
If your class represents a data container (a collection, a range, a dataset), its __iter__ should return a new iterator each time. This is how list, tuple, dict, set, and range all work. Only make an object its own iterator if it is inherently single-use (like a file handle or a network stream).
# Reusable iterable pattern (preferred for containers)
class MyCollection:
    def __iter__(self):
        return MyCollectionIterator(self._data)  # new object each time

# One-shot iterator pattern (for inherently stateful streams)
class MyStream:
    def __iter__(self):
        return self  # same object - exhausts on first full traversal
:::
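When a separate iterator class feels heavy, one common shortcut is to write `__iter__` as a generator function: each call then returns a fresh generator, which is itself an iterator. A minimal sketch, assuming a hypothetical `Countdown` class that is not part of this lesson's code:

```python
class Countdown:
    """Reusable iterable: __iter__ is a generator function."""

    def __init__(self, start: int):
        self.start = start

    def __iter__(self):
        # Calling __iter__ creates a new generator each time,
        # so every loop gets its own independent position.
        n = self.start
        while n > 0:
            yield n
            n -= 1


c = Countdown(3)
first_pass = list(c)
second_pass = list(c)
print(first_pass, second_pass)  # [3, 2, 1] [3, 2, 1]
```

The trade-off is that the iterator's state lives inside the generator frame, so it is harder to inspect than attributes on an explicit iterator class.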
One-Shot Iterator Pattern (When Appropriate)
For inherently sequential, stateful sources - file reads, network streams, generators - an object being its own iterator is correct:
class FileLineIterator:
    """Iterates over lines in a file, skipping blank lines and comments."""

    def __init__(self, filepath: str):
        self._file = open(filepath, encoding="utf-8")

    def __iter__(self):
        return self  # one-shot: file position is shared state

    def __next__(self) -> str:
        while True:
            line = self._file.readline()
            if not line:  # EOF
                self._file.close()
                raise StopIteration
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                return stripped

    def __del__(self):
        if not self._file.closed:
            self._file.close()
Part 5 - iter() with Sentinel and next() with Default
iter(callable, sentinel) - Reading Until a Value
iter() has a two-argument form that repeatedly calls a callable until it returns the sentinel value:
# iter(callable, sentinel)
# Roughly equivalent to this generator body:
# while True:
#     value = callable()
#     if value == sentinel:
#         return  # ends the iteration (a generator must not raise StopIteration itself)
#     yield value
import io

data = io.StringIO("line1\nline2\nline3\n")
# Read lines until readline() returns "" (EOF)
for line in iter(data.readline, ""):
    print(repr(line))
# 'line1\n'
# 'line2\n'
# 'line3\n'
This is idiomatic for reading fixed-size chunks from a binary file:
import functools

def read_in_chunks(filepath: str, chunk_size: int = 4096):
    """Yields chunks of a binary file without loading it all into memory."""
    with open(filepath, "rb") as f:
        reader = functools.partial(f.read, chunk_size)
        for chunk in iter(reader, b""):
            yield chunk

# Usage
for chunk in read_in_chunks("/dev/urandom", 1024):
    process(chunk)
    break  # just one chunk for demo
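The same pattern can be tried without touching the filesystem. This sketch swaps in an in-memory `io.BytesIO` for the file (the 10-byte payload and the chunk size of 4 are arbitrary choices for illustration):

```python
import functools
import io

buffer = io.BytesIO(b"0123456789")

# read(4) returns b"" at EOF, which matches the sentinel and ends iteration.
chunks = list(iter(functools.partial(buffer.read, 4), b""))
print(chunks)  # [b'0123', b'4567', b'89']
```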
next(iterator, default) - Avoiding StopIteration
The two-argument form of next() returns a default value instead of raising StopIteration:
it = iter([1, 2, 3])
print(next(it, None)) # 1
print(next(it, None)) # 2
print(next(it, None)) # 3
print(next(it, None)) # None - instead of StopIteration
print(next(it, None)) # None - stays at default
# Useful for "get first matching item" without try/except
numbers = [4, 7, 2, 9, 1]
first_even = next((n for n in numbers if n % 2 == 0), None)
print(first_even) # 4
first_over_100 = next((n for n in numbers if n > 100), -1)
print(first_over_100) # -1 (sentinel default)
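One caveat: if `None` is itself a legitimate item, `next(it, None)` cannot distinguish "found None" from "found nothing". A common workaround is a private sentinel object. A minimal sketch, where `_MISSING` and `first_match` are hypothetical names introduced here:

```python
_MISSING = object()  # unique marker that cannot appear in user data


def first_match(iterable, predicate):
    """Return the first item satisfying predicate, or raise LookupError."""
    result = next((x for x in iterable if predicate(x)), _MISSING)
    if result is _MISSING:
        raise LookupError("no matching item")
    return result


values = [3, None, 8]
print(first_match(values, lambda x: x is None))  # None (a real hit)
print(first_match(values, lambda x: x == 8))     # 8
```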
:::danger Do Not Confuse iter(obj) with obj.__iter__()
iter(obj) and obj.__iter__() are not identical. iter() adds two important behaviors:
- For non-iterator iterables, iter(obj) calls obj.__iter__() and validates the result is an iterator (has __next__). Calling obj.__iter__() directly skips this check.
- iter(callable, sentinel) does not call __iter__ at all - it wraps the callable in an iterator.
Always use the built-in iter() and next() rather than calling dunder methods directly:
# Wrong - bypasses validation, skips sentinel form
it = obj.__iter__()
# Right
it = iter(obj)
:::
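The validation step, plus a fallback that `obj.__iter__()` cannot provide (the legacy `__getitem__` sequence protocol), can be observed directly. In this sketch, `Broken` and `Squares` are hypothetical classes written only for the demonstration:

```python
class Broken:
    def __iter__(self):
        return 42  # not an iterator: int has no __next__


direct = Broken().__iter__()  # no check: silently returns 42

try:
    iter(Broken())
    validated = False
except TypeError:
    validated = True  # iter() rejected the non-iterator result

print(direct, validated)  # 42 True


class Squares:
    """No __iter__ at all; only the legacy sequence protocol."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError
        return index * index


# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
print(list(iter(Squares())))           # [0, 1, 4]
print(hasattr(Squares(), "__iter__"))  # False
```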
Part 6 - Built-in Iterators You Already Use
:::note zip(), enumerate(), map(), filter() All Return Iterators
In Python 3, these built-ins return iterators, not lists. This is a deliberate memory-efficiency design - they do not evaluate the full sequence upfront.
z = zip([1, 2, 3], ["a", "b", "c"])
print(type(z)) # <class 'zip'>
print(hasattr(z, '__next__')) # True - it's an iterator
e = enumerate(["x", "y"])
print(type(e)) # <class 'enumerate'>
m = map(str, [1, 2, 3])
print(type(m)) # <class 'map'>
f = filter(None, [0, 1, 2, None, 3])
print(type(f)) # <class 'filter'>
Consequence: passing zip(...) to two for loops consumes it on the first pass. Wrap in list() if you need multiple passes.
:::
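A related surprise: an `in` test against an iterator also advances it, because membership testing falls back to pulling items one by one until a match is found. A small sketch (names are illustrative):

```python
squares = map(lambda x: x * x, [1, 2, 3, 4])

found = 4 in squares       # consumes 1 and 4 while searching
remaining = list(squares)  # only what is left after the match

print(found, remaining)  # True [9, 16]
```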
Part 7 - The itertools Module
itertools provides a suite of memory-efficient iterators for combinatorial and streaming operations. All itertools functions return iterators (lazy by default).
Combining Iterables: chain
from itertools import chain
# chain concatenates multiple iterables lazily
combined = chain([1, 2, 3], [4, 5], [6])
print(list(combined)) # [1, 2, 3, 4, 5, 6]
# chain.from_iterable: flatten one level of nesting
nested = [[1, 2], [3, 4], [5, 6]]
flat = list(chain.from_iterable(nested))
print(flat) # [1, 2, 3, 4, 5, 6]
Slicing: islice
islice slices an iterator without materializing it - essential for taking the first N items from an infinite iterator:
from itertools import islice, count
# count() is an infinite iterator: 0, 1, 2, 3, ...
infinite = count(start=10, step=3)
# islice(iterable, stop) or islice(iterable, start, stop, step)
first_five = list(islice(infinite, 5))
print(first_five) # [10, 13, 16, 19, 22]
# Skipping items: islice(iterable, 2, 7) - start and stop must be passed positionally
data = iter(range(100))
middle = list(islice(data, 2, 7))
print(middle) # [2, 3, 4, 5, 6]
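`islice` also combines well with the two-argument `iter()` from Part 5 to split any iterable into fixed-size batches. A sketch, where `batches` is a hypothetical helper written for this example (Python 3.12 ships a similar `itertools.batched`):

```python
from itertools import islice


def batches(iterable, size):
    """Return an iterator of lists holding up to `size` items each."""
    it = iter(iterable)
    # Each call slices the next `size` items off the shared iterator;
    # an empty list means the source is exhausted and acts as the sentinel.
    return iter(lambda: list(islice(it, size)), [])


result = list(batches(range(7), 3))
print(result)  # [[0, 1, 2], [3, 4, 5], [6]]
```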
Zipping Unevenly: zip_longest
Standard zip() stops at the shortest iterable. zip_longest pads with a fill value:
from itertools import zip_longest
names = ["Alice", "Bob", "Charlie"]
scores = [95, 87]
# zip stops at length 2
for pair in zip(names, scores):
    print(pair)
# ('Alice', 95)
# ('Bob', 87)
# zip_longest pads missing values
for pair in zip_longest(names, scores, fillvalue=0):
    print(pair)
# ('Alice', 95)
# ('Bob', 87)
# ('Charlie', 0)
Grouping Consecutive Items: groupby
groupby groups consecutive elements with the same key - similar to SQL GROUP BY but requires the input to be sorted by the key first:
from itertools import groupby
# Input MUST be sorted by the key you group on
events = [
    {"type": "click", "x": 10},
    {"type": "click", "x": 20},
    {"type": "move", "x": 30},
    {"type": "move", "x": 40},
    {"type": "click", "x": 50},
]
# Sort by type first, then group
sorted_events = sorted(events, key=lambda e: e["type"])
for event_type, group in groupby(sorted_events, key=lambda e: e["type"]):
    items = list(group)
    print(f"{event_type}: {len(items)} events")
# click: 3 events
# move: 2 events
:::warning groupby Requires Sorted Input
groupby only groups consecutive equal keys. If the data is unsorted, you get multiple groups for the same key. Always sort by the grouping key before passing to groupby.
from itertools import groupby
data = [1, 1, 2, 1, 1] # unsorted - 1 appears in two separate runs
for key, group in groupby(data):
    print(key, list(group))
# 1 [1, 1]
# 2 [2]
# 1 [1, 1] ← 1 appears again as a separate group!
:::
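The consecutive-run behavior is not always a drawback. When the runs themselves are what you want, as in run-length encoding, unsorted input is exactly right. A minimal sketch (`run_length_encode` is a name chosen for illustration):

```python
from itertools import groupby


def run_length_encode(text):
    """Collapse each run of repeated characters into (char, count)."""
    return [(char, sum(1 for _ in run)) for char, run in groupby(text)]


encoded = run_length_encode("aaabccaa")
print(encoded)  # [('a', 3), ('b', 1), ('c', 2), ('a', 2)]
```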
Conditional Iteration: takewhile and dropwhile
from itertools import takewhile, dropwhile
data = [2, 4, 6, 7, 8, 10]
# takewhile: yield items while predicate is True, stop at first False
evens_until_odd = list(takewhile(lambda x: x % 2 == 0, data))
print(evens_until_odd) # [2, 4, 6]
# dropwhile: skip items while predicate is True, then yield all remaining
after_first_odd = list(dropwhile(lambda x: x % 2 == 0, data))
print(after_first_odd) # [7, 8, 10]
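One caveat worth checking when `takewhile` is applied to a one-shot iterator rather than a list: it has to read the first failing item to know when to stop, so that item is consumed and lost. A short sketch:

```python
from itertools import takewhile

stream = iter([2, 4, 6, 7, 8, 10])

head = list(takewhile(lambda x: x % 2 == 0, stream))
rest = list(stream)  # 7 was consumed by takewhile when the test failed

print(head, rest)  # [2, 4, 6] [8, 10]
```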
Part 8 - Lazy Evaluation Pipelines
The real power of iterators is composing them into lazy pipelines: data flows through each stage one item at a time, without materializing intermediate lists.
from itertools import islice
import csv
import io

# Simulate a large CSV-like data source
raw_data = """
id,name,score,active
1,Alice,95,true
2,Bob,42,false
3,Charlie,88,true
4,Dave,30,false
5,Eve,91,true
6,Frank,55,true
""".strip()

def parse_csv_rows(text: str):
    """Yields dicts from CSV text, one row at a time."""
    reader = csv.DictReader(io.StringIO(text))
    yield from reader

def is_active(row: dict) -> bool:
    return row["active"] == "true"

def above_threshold(threshold: int):
    def predicate(row: dict) -> bool:
        return int(row["score"]) >= threshold
    return predicate

def extract_name(row: dict) -> str:
    return row["name"]

# Lazy pipeline - nothing is computed until iteration
pipeline = (
    extract_name(row)
    for row in parse_csv_rows(raw_data)
    if is_active(row) and above_threshold(70)(row)
)
# Consume only the first 3 results
top_names = list(islice(pipeline, 3))
print(top_names)  # ['Alice', 'Charlie', 'Eve']
The key insight: parse_csv_rows reads one row, passes it to the filter, passes it to the transform, and yields the result - before reading the next row. Memory usage stays constant regardless of the input file size.
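The interleaving can be made visible by logging from each stage. In this sketch, `source`, `keep_even`, and `double` are hypothetical generator stages, and `log` records the order in which they run:

```python
log = []


def source(n):
    for i in range(n):
        log.append(f"read {i}")
        yield i


def keep_even(items):
    for x in items:
        if x % 2 == 0:
            log.append(f"keep {x}")
            yield x


def double(items):
    for x in items:
        log.append(f"double {x}")
        yield x * 2


pipeline = double(keep_even(source(4)))
first = next(pipeline)  # pulls just enough input to produce one result

print(first)  # 0
print(log)    # ['read 0', 'keep 0', 'double 0']
```

Only one item has been read at this point; the remaining three are not touched until something asks the pipeline for more.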
Comparing Eager vs Lazy
import sys
data = range(1_000_000)
# Eager - materializes every result (333,334 items) in memory
eager = [x * 2 for x in data if x % 3 == 0]
print(f"Eager: {sys.getsizeof(eager):,} bytes")  # ~2,800,000 bytes for the list object alone (exact value varies)
# Lazy - the generator holds no results, just the computation recipe
lazy = (x * 2 for x in data if x % 3 == 0)
print(f"Lazy: {sys.getsizeof(lazy):,} bytes")  # on the order of 100-200 bytes, depending on Python version
# Both produce the same values when consumed
print(next(lazy)) # 0
print(next(lazy)) # 6
Common Mistakes
Mistake 1 - Using the Iterator in Two Places
import csv

# Wrong - iterator exhausts after first use
rows = csv.reader(open("data.csv"))
headers = list(rows)  # consumes entire iterator
data = list(rows)  # empty - iterator is exhausted
# Right - read into a list once, then index and slice it
with open("data.csv", newline="") as f:
    all_rows = list(csv.reader(f))
headers = all_rows[0]  # the header row itself
data = all_rows[1:]
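When the file is too large to load as a list, another option is to take the header with `next()` and keep streaming the rest from the same iterator. A sketch using an in-memory `io.StringIO` in place of `data.csv` (the sample rows are made up):

```python
import csv
import io

source = io.StringIO("id,name\n1,Alice\n2,Bob\n")
reader = csv.reader(source)

header = next(reader)           # consumes only the first row
data = [row for row in reader]  # the iterator continues from row two

print(header)  # ['id', 'name']
print(data)    # [['1', 'Alice'], ['2', 'Bob']]
```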
Mistake 2 - Missing return self in __iter__ on an Iterator
# Wrong - iterator is not iterable, can't be used in for loops
class BadIterator:
    def __next__(self):
        ...
    # no __iter__ → TypeError: 'BadIterator' object is not iterable

# Right
class GoodIterator:
    def __iter__(self):
        return self  # iterators must be their own iterables

    def __next__(self):
        ...
Mistake 3 - Consuming a Generator Twice
# Wrong - generator is a one-shot iterator
def generate_numbers():
    yield from range(5)
gen = generate_numbers()
print(sum(gen)) # 10 - consumed
print(sum(gen)) # 0 - already exhausted
# Right - call the generator function again
print(sum(generate_numbers())) # 10
print(sum(generate_numbers())) # 10
Mistake 4 - Forgetting groupby Needs Sorted Input
# Wrong - data not sorted by key
from itertools import groupby
data = [("a", 1), ("b", 2), ("a", 3)]
groups = {k: list(v) for k, v in groupby(data, key=lambda x: x[0])}
print(groups)  # {'a': [('a', 3)], 'b': [('b', 2)]}
# groupby yields key 'a' twice, and the second 'a' group silently overwrites the first in the dict!
# Right - sort first
data_sorted = sorted(data, key=lambda x: x[0])
groups = {k: list(v) for k, v in groupby(data_sorted, key=lambda x: x[0])}
print(groups) # {'a': [('a', 1), ('a', 3)], 'b': [('b', 2)]}
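If sorting is undesirable (for example, because it would require materializing or reordering a stream), a plain dictionary groups non-adjacent keys in a single pass. A sketch using `collections.defaultdict`, offered as an alternative rather than as a replacement for `groupby`:

```python
from collections import defaultdict

data = [("a", 1), ("b", 2), ("a", 3)]

groups = defaultdict(list)
for item in data:
    groups[item[0]].append(item)  # key order follows first appearance

print(dict(groups))  # {'a': [('a', 1), ('a', 3)], 'b': [('b', 2)]}
```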
Graded Practice Challenges
Level 1 - Predict the Output
Question 1: What does this print?
nums = [10, 20, 30]
it = iter(nums)
print(next(it))
print(next(it))
for n in it:
    print(n)
for n in it:
    print(n)
Show Answer
Output:
10
20
30
next(it) consumes 10 and 20. The first for loop calls iter(it) - which returns it itself (iterators are their own iterators) - and then consumes 30 via next(). The second for loop also calls iter(it), gets it back, and immediately hits StopIteration with no output.
Question 2: What does this print?
from itertools import takewhile
data = [1, 3, 5, 6, 7, 9]
result = list(takewhile(lambda x: x % 2 != 0, data))
print(result)
Show Answer
Output:
[1, 3, 5]
takewhile yields items while the predicate returns True. It stops permanently at the first False - even if later items would satisfy the predicate. 6 is even so takewhile stops there. 7 and 9 are odd but are never reached.
Question 3: What does this print?
from itertools import chain
a = iter([1, 2])
b = iter([3, 4])
c = chain(a, b)
print(next(c))
print(next(a))
print(list(c))
Show Answer
Output:
1
2
[3, 4]
chain(a, b) creates a lazy chain that draws from a first, then b. next(c) pulls from a, returning 1. next(a) pulls directly from a, returning 2 - and also advances the shared iterator state that c holds. When list(c) is called, a is now exhausted, so chain moves to b and returns [3, 4].
Question 4: What does this print?
def make_range(n):
    class Range:
        def __iter__(self):
            self.i = 0
            return self

        def __next__(self):
            if self.i >= n:
                raise StopIteration
            self.i += 1
            return self.i

    return Range()

r = make_range(3)
print(list(r))
print(list(r))
Show Answer
Output:
[1, 2, 3]
[1, 2, 3]
This is a subtle case. Range.__iter__ resets self.i = 0 each time it is called. Because the for loop (and list()) calls iter(r) first, which calls r.__iter__(), which resets the counter, the range is reusable - even though Range is its own iterator. This is an unusual but valid pattern: reset state in __iter__. The standard approach is a separate iterator class, but resetting in __iter__ achieves the same effect for loops that run one after another.
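One limitation of resetting state in `__iter__` is worth seeing in action: all loops over the object share a single counter, so nested iteration misbehaves. A sketch contrasting that design with a generator-based `__iter__` (`ResettingRange`, `FreshRange`, and `pairs` are hypothetical names):

```python
class ResettingRange:
    """Its own iterator; __iter__ resets the shared counter."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        self.i = 0
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        self.i += 1
        return self.i


class FreshRange:
    """Returns an independent generator on every __iter__ call."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        yield from range(1, self.n + 1)


def pairs(r):
    # Nested loops over the SAME object: two iterations are live at once.
    return [(a, b) for a in r for b in r]


print(pairs(ResettingRange(2)))  # [(1, 1), (1, 2)]
print(pairs(FreshRange(2)))      # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

The inner loop exhausts the shared counter, so the outer loop of `ResettingRange` stops after its first item.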
Question 5: What does this print?
from itertools import islice
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

fib = fibonacci()
first_eight = list(islice(fib, 8))
print(first_eight)
next_two = list(islice(fib, 2))
print(next_two)
Show Answer
Output:
[0, 1, 1, 2, 3, 5, 8, 13]
[21, 34]
fibonacci() returns an infinite generator. islice(fib, 8) lazily takes the first 8 values without stopping the generator. The generator fib is still alive after islice completes. The second islice(fib, 2) continues from where the generator left off - the 9th and 10th Fibonacci numbers, 21 and 34.
Level 2 - Debug Challenge
Find and fix all bugs:
from itertools import groupby

class EventLog:
    def __init__(self, events):
        self.events = events

    def __iter__(self):
        return self  # bug 1

    def __next__(self):
        if not self.events:
            raise StopIteration
        return self.events.pop(0)

log = EventLog(["click", "click", "move", "click"])
# First pass: count events
total = sum(1 for _ in log)
print(f"Total events: {total}")
# Second pass: iterate again - bug 2
for event in log:
    print(event)
# Group by event type - bug 3
events = ["click", "move", "click", "click", "move"]
for etype, group in groupby(events):
    print(etype, len(list(group)))
Show Solution
Bugs:
- Bug 1: EventLog.__iter__ returns self, making EventLog a one-shot iterator. Each call to iter(log) returns the same object with shared (destructive) state. self.events.pop(0) mutates the list - after the first full pass, self.events is empty. The second loop sees nothing.
- Bug 2: Directly caused by bug 1: log is exhausted after the first sum(1 for _ in log). A proper iterable should return a new iterator each time.
- Bug 3: groupby(events) groups consecutive equal items. The input ["click", "move", "click", "click", "move"] has "click" appearing non-consecutively. Sort first for meaningful grouping.
Fixed version:
from itertools import groupby

class EventLog:
    def __init__(self, events):
        self._events = list(events)  # preserve original

    def __iter__(self):
        # Fix 1 & 2: return a new iterator each time
        return iter(list(self._events))

log = EventLog(["click", "click", "move", "click"])
# First pass: count events
total = sum(1 for _ in log)
print(f"Total events: {total}")  # 4
# Second pass: works now - __iter__ returns a fresh iterator
for event in log:
    print(event)  # click, click, move, click
# Fix 3: sort before groupby
events = ["click", "move", "click", "click", "move"]
for etype, group in groupby(sorted(events)):
    print(etype, len(list(group)))
# click 3
# move 2
Level 3 - Design Challenge
Design a DataPipeline class that:
- Accepts a source iterable at construction time
- Supports chaining .filter(predicate), .map(transform), and .take(n) methods
- Each method returns a new DataPipeline (immutable, chainable)
- Evaluation is lazy - nothing is computed until iteration begins
- Is reusable: calling list(pipeline) twice produces the same result
Show Reference Solution
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")

class DataPipeline:
    """
    An immutable, lazy, chainable data pipeline.
    Each transformation method returns a new DataPipeline.
    Evaluation only happens when the pipeline is iterated.
    Reusable because the source is re-iterated on each __iter__ call.
    """

    def __init__(self, source: Iterable):
        # Store the source - it must itself be reusable (a list, range, etc.).
        # A one-shot iterator or generator would be exhausted after the first pass.
        self._source = source
        self._transforms: list[Callable] = []

    def _clone_with(self, transform: Callable) -> "DataPipeline":
        """Return a new pipeline with one additional transform stage."""
        new = DataPipeline(self._source)
        new._transforms = self._transforms + [transform]
        return new

    def filter(self, predicate: Callable) -> "DataPipeline":
        """Lazily filter items by predicate."""
        return self._clone_with(lambda it: (x for x in it if predicate(x)))

    def map(self, transform: Callable) -> "DataPipeline":
        """Lazily transform each item."""
        return self._clone_with(lambda it: (transform(x) for x in it))

    def take(self, n: int) -> "DataPipeline":
        """Lazily take the first n items."""
        return self._clone_with(lambda it: islice(it, n))

    def __iter__(self) -> Iterator:
        # Build the pipeline fresh from the source each time
        result = iter(self._source)
        for transform in self._transforms:
            result = transform(result)
        return result

    def __repr__(self) -> str:
        return (
            f"DataPipeline(source={self._source!r}, "
            f"stages={len(self._transforms)})"
        )
# Usage
data = range(1, 21)  # 1..20
pipeline = (
    DataPipeline(data)
    .filter(lambda x: x % 2 == 0)  # keep evens: 2,4,6,...,20
    .map(lambda x: x ** 2)  # square: 4,16,36,...,400
    .filter(lambda x: x > 50)  # keep > 50: 64,100,...,400
    .take(4)  # first 4
)
print(list(pipeline))  # [64, 100, 144, 196]
print(list(pipeline))  # [64, 100, 144, 196] - reusable!

# Branching pipelines - base pipeline is not modified
base = DataPipeline(range(10)).filter(lambda x: x % 2 == 0)
doubled = base.map(lambda x: x * 2)
squared = base.map(lambda x: x ** 2)
# range(10): 0..9, evens: 0,2,4,6,8
print(list(doubled))  # [0, 4, 8, 12, 16]
print(list(squared))  # [0, 4, 16, 36, 64]
Design decisions:
- _transforms is a list of callables each taking an iterator and returning an iterator - this is the pipeline stage pattern
- __iter__ rebuilds the full pipeline from the source each time, enabling reusability
- _clone_with returns a new pipeline rather than mutating - immutable by design
- islice handles take without materializing the full sequence
Key Takeaways
- The iterator protocol requires two methods: __iter__ (returns iterator) and __next__ (returns next value or raises StopIteration)
- An iterable has __iter__; an iterator has both __iter__ and __next__ - an iterator is always iterable, but not vice versa
- list, tuple, dict, and set are iterables, not iterators - iter(collection) creates a fresh iterator each time
- The for loop desugars to: _it = iter(obj) followed by repeated next(_it) until StopIteration
- Make __iter__ return a new iterator object for reusable containers; return self only for inherently stateful, single-use streams
- iter(callable, sentinel) repeatedly calls a callable until it returns the sentinel - idiomatic for reading files in chunks
- next(iterator, default) returns a default instead of raising StopIteration - essential for "first matching item" patterns
- zip(), map(), filter(), enumerate() all return iterators (single-use) - not lists
- itertools.chain concatenates, islice slices lazily, zip_longest pads short iterables, groupby groups consecutive equal keys (sort first!), takewhile/dropwhile stop/skip based on a predicate
- Lazy pipelines pass one item through all stages at a time, keeping memory constant regardless of input size
What's Next
Lesson 05 covers decorators - callables that wrap other callables to extend their behavior. Decorators are built directly on top of the function-as-first-class-object concept and use closures under the hood. You will learn the @ syntax desugaring, functools.wraps, decorator factories with arguments, class-based decorators, stacking behavior, and the production patterns (timing, retry, caching, rate limiting) that appear in every serious Python codebase.
