Python Designing Clean APIs Practice Problems & Exercises
Practice: Designing Clean APIs
Easy
The function below works correctly but has a terrible name. Rename it (and its internal variable) so that a caller can understand what it does without reading the body.
def filter_active_users(users):
    """Return only users whose 'active' field is True."""
    active = [u for u in users if u.get("active")]
    return active

# Test
users = [
    {"name": "Alice", "active": True},
    {"name": "Bob", "active": False},
    {"name": "Carol", "active": True},
]
result = filter_active_users(users)
print(f"Active users: {result}")
print(f"Total active: {len(result)}")
Solution
# BEFORE: bad name
def do(d):
    r = [u for u in d if u.get("active")]
    return r

# AFTER: intent is clear from the name alone
def filter_active_users(users):
    """Return only users whose 'active' field is True."""
    active = [u for u in users if u.get("active")]
    return active

users = [
    {"name": "Alice", "active": True},
    {"name": "Bob", "active": False},
    {"name": "Carol", "active": True},
]
result = filter_active_users(users)
print(f"Active users: {result}")
print(f"Total active: {len(result)}")
Naming principles applied:
- do(d) violates every naming rule: no verb phrase, no indication of what d is, no hint at the return type.
- filter_active_users(users) tells you: (1) it filters, (2) it returns active items, (3) it works on users.
- The variable r becomes active, which describes what it holds.
The newspaper test: Can you describe the function in one verb phrase? "Filter active users" — passes. "Do d" — fails.
Expected Output
Active users: [{'name': 'Alice', 'active': True}, {'name': 'Carol', 'active': True}]
Total active: 2
Hints
Hint 1: Function names should be verb phrases that describe WHAT the function does, not vague words like "do", "run", or "process".
Hint 2: Use the prefix table: `get_` for read-only retrieval, `filter_` or descriptive verb for transformations. The function filters users by active status, so a name like `get_active_users` or `filter_active_users` communicates intent.
The function below takes too many positional arguments. Refactor it so that data remains positional but all other parameters are keyword-only (using the * separator). This prevents callers from passing unlabeled values.
def create_report(
    data,
    *,
    title,
    author,
    format="pdf",
    include_charts=True,
):
    """Create a report from data with the given configuration."""
    return f"Report: {title} by {author} ({format}, charts={include_charts})"

# These calls MUST use keyword names; positional won't work
report1 = create_report(
    [1, 2, 3],
    title="Q4 Summary",
    author="Alice",
)
print(report1)
report2 = create_report(
    [4, 5, 6],
    title="Monthly",
    author="Bob",
    format="html",
    include_charts=False,
)
print(report2)
Solution
# BEFORE: all positional, caller can mix up order
def create_report(data, title, author, format="pdf", include_charts=True):
    return f"Report: {title} by {author} ({format}, charts={include_charts})"

# Dangerous call: which string is title, which is author?
# create_report([1,2,3], "Q4 Summary", "Alice", "pdf", True)

# AFTER: keyword-only after *
def create_report(
    data,
    *,
    title,
    author,
    format="pdf",
    include_charts=True,
):
    """Create a report from data with the given configuration."""
    return f"Report: {title} by {author} ({format}, charts={include_charts})"

report1 = create_report(
    [1, 2, 3],
    title="Q4 Summary",
    author="Alice",
)
print(report1)
report2 = create_report(
    [4, 5, 6],
    title="Monthly",
    author="Bob",
    format="html",
    include_charts=False,
)
print(report2)
Why keyword-only matters:
BEFORE (positional):
create_report(data, "Q4", "Alice", "pdf", True)
# What does True mean? What if you swap "Q4" and "Alice"?
AFTER (keyword-only):
create_report(data, title="Q4", author="Alice", include_charts=True)
# Every argument is labeled — impossible to confuse
Rule of thumb: If a function has more than 2-3 parameters, make everything after the first one or two keyword-only. The * separator costs nothing at runtime but prevents entire categories of bugs.
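To see the enforcement in action, the sketch below reuses the `create_report` signature from this exercise and shows that Python rejects an unlabeled call with a `TypeError` before the function body runs. The helper `try_positional_call` is a hypothetical name added only for illustration.

```python
def create_report(data, *, title, author, format="pdf", include_charts=True):
    """Same signature as the exercise solution: only `data` is positional."""
    return f"Report: {title} by {author} ({format}, charts={include_charts})"


def try_positional_call():
    """Hypothetical helper: attempt an unlabeled call and report what happens."""
    try:
        # title and author are keyword-only, so this call never reaches the body.
        create_report([1, 2, 3], "Q4 Summary", "Alice")
    except TypeError as exc:
        return f"rejected: {type(exc).__name__}"
    return "accepted"


print(try_positional_call())
print(create_report([1, 2, 3], title="Q4 Summary", author="Alice"))
```

The first line printed is `rejected: TypeError`; the labeled call succeeds as before.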
Expected Output
Report: Q4 Summary by Alice (pdf, charts=True)
Report: Monthly by Bob (html, charts=False)
Hints
Hint 1: Place a bare `*` after the positional parameter `data` in the function signature. Everything after `*` becomes keyword-only.
Hint 2: Keyword-only arguments force callers to write `title="Q4 Summary"` instead of just `"Q4 Summary"` — making every call site self-documenting.
The two functions below have inconsistent return types — they return None when there are no results, forcing callers to check for None before iterating. Fix both functions to always return a list (empty list for no results).
# Simulated database
tag_db = {1: ["python", "tutorial"], 2: ["javascript", "react"]}
score_db = {"alice": [95, 87], "bob": [72, 68, 91]}

def get_tags(post_id):
    """Return tags for a post. Always returns a list."""
    return tag_db.get(post_id, [])

def get_scores(username):
    """Return scores for a user. Always returns a list."""
    return score_db.get(username, [])

# Callers can iterate safely without None checks
tags1 = get_tags(1)
print(f"Tags for post 1: {tags1}")
tags_missing = get_tags(999)
print(f"Tags for post 999: {tags_missing}")
print(f"Can iterate empty: {[t.upper() for t in tags_missing] == []}")
scores1 = get_scores("alice")
print(f"Scores for alice: {scores1}")
scores_missing = get_scores("unknown")
print(f"Scores for unknown: {scores_missing}")
Solution
# BEFORE: inconsistent, returns None or list
def get_tags_bad(post_id):
    tags = tag_db.get(post_id)
    if not tags:
        return None  # Caller must check for None!
    return tags

def get_scores_bad(username):
    if username not in score_db:
        return None  # Caller must check for None!
    return score_db[username]

# AFTER: always returns a list
tag_db = {1: ["python", "tutorial"], 2: ["javascript", "react"]}
score_db = {"alice": [95, 87], "bob": [72, 68, 91]}

def get_tags(post_id):
    """Return tags for a post. Always returns a list."""
    return tag_db.get(post_id, [])

def get_scores(username):
    """Return scores for a user. Always returns a list."""
    return score_db.get(username, [])

tags1 = get_tags(1)
print(f"Tags for post 1: {tags1}")
tags_missing = get_tags(999)
print(f"Tags for post 999: {tags_missing}")
print(f"Can iterate empty: {[t.upper() for t in tags_missing] == []}")
scores1 = get_scores("alice")
print(f"Scores for alice: {scores1}")
scores_missing = get_scores("unknown")
print(f"Scores for unknown: {scores_missing}")
Why consistent return types matter:
BEFORE (returns None or list):
tags = get_tags(999)
# tags is None, so the next line crashes:
for t in tags:  # TypeError: 'NoneType' object is not iterable
    print(t.upper())
# Caller must write defensive code EVERY time:
tags = get_tags(999)
if tags is not None:
    for t in tags:
        print(t.upper())
AFTER (always returns list):
tags = get_tags(999)  # returns []
for t in tags:  # safe: iterating an empty list does nothing
    print(t.upper())
Rule: Prefer empty collections over None for "no results." Reserve None for cases where absence has a distinct semantic meaning (e.g., find_user() returns None meaning "user does not exist" vs returning a user dict).
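One detail worth guarding against: `dict.get(key, [])` hands back the stored list itself when the key exists, so a caller that mutates the result also mutates the simulated database. The sketch below assumes the same `tag_db` as this exercise; returning a copy with `list(...)` is an added assumption about what callers expect, not part of the original solution.

```python
tag_db = {1: ["python", "tutorial"], 2: ["javascript", "react"]}


def get_tags(post_id):
    """Return a fresh list of tags for a post; empty list if the post is unknown."""
    # list(...) copies the stored list so callers cannot change tag_db by accident.
    return list(tag_db.get(post_id, []))


tags = get_tags(1)
tags.append("draft")      # only the caller's copy changes

print(tag_db[1])          # the stored tags are untouched
print(get_tags(999))      # unknown post still yields an empty list
```

The contract stays the same (always a list), and the copy keeps one caller's edits from leaking into the next caller's results.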
Expected Output
Tags for post 1: ['python', 'tutorial']
Tags for post 999: []
Can iterate empty: True
Scores for alice: [95, 87]
Scores for unknown: []
Hints
Hint 1: A function that returns a list should ALWAYS return a list — including an empty list `[]` for "no results". Never return `None` when the caller expects a list.
Hint 2: Returning `None` for "not found" forces every caller to write `if result is not None:` before iterating. An empty list lets callers iterate safely without any check.
The function below uses a boolean flag reverse to control sort order. At the call site, sort_values(data, True) is ambiguous. Refactor to use a SortOrder enum so call sites are self-documenting.
from enum import Enum

class SortOrder(Enum):
    ASCENDING = "ascending"
    DESCENDING = "descending"

def sort_values(values, *, order=SortOrder.ASCENDING):
    """Sort a list of values in the specified order."""
    descending = order == SortOrder.DESCENDING
    return sorted(values, reverse=descending)

# Test: call sites are now unambiguous
data = [3, 1, 4, 1, 5, 2]
asc = sort_values(data, order=SortOrder.ASCENDING)
print(f"Ascending: {asc}")
desc = sort_values(data, order=SortOrder.DESCENDING)
print(f"Descending: {desc}")
# Default is ascending
default = sort_values(data)
print(f"Ascending (explicit): {default}")
Solution
# BEFORE: boolean flag
def sort_values_bad(values, reverse=False):
    return sorted(values, reverse=reverse)

# Call site: what does True mean?
# sort_values_bad([3,1,4], True)  # Reverse what? Sort? Order?

# AFTER: enum makes intent explicit
from enum import Enum

class SortOrder(Enum):
    ASCENDING = "ascending"
    DESCENDING = "descending"

def sort_values(values, *, order=SortOrder.ASCENDING):
    """Sort a list of values in the specified order."""
    descending = order == SortOrder.DESCENDING
    return sorted(values, reverse=descending)

data = [3, 1, 4, 1, 5, 2]
asc = sort_values(data, order=SortOrder.ASCENDING)
print(f"Ascending: {asc}")
desc = sort_values(data, order=SortOrder.DESCENDING)
print(f"Descending: {desc}")
default = sort_values(data)
print(f"Ascending (explicit): {default}")
Why enums beat booleans:
BEFORE:
sort_values(data, True) # What does True mean??
sort_values(data, False) # And False??
AFTER:
sort_values(data, order=SortOrder.DESCENDING) # Crystal clear
sort_values(data, order=SortOrder.ASCENDING) # No ambiguity
Bonus: Enums are extensible. If you later need SortOrder.RANDOM or SortOrder.STABLE_DESCENDING, you add a member — no boolean gymnastics needed. Booleans only offer two states; enums offer as many as you need.
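A small sketch of that extensibility, assuming a hypothetical third member `UNSORTED` (not in the original exercise) that returns the values in input order. The `isinstance` check is also an addition: it shows how an enum parameter can reject a stray boolean outright.

```python
from enum import Enum


class SortOrder(Enum):
    ASCENDING = "ascending"
    DESCENDING = "descending"
    UNSORTED = "unsorted"  # hypothetical new member, added for illustration


def sort_values(values, *, order=SortOrder.ASCENDING):
    """Sort values in the requested order; reject anything that is not a SortOrder."""
    if not isinstance(order, SortOrder):
        raise TypeError(f"order must be a SortOrder, got {order!r}")
    if order is SortOrder.UNSORTED:
        return list(values)  # copy of the input, original order preserved
    return sorted(values, reverse=(order is SortOrder.DESCENDING))


data = [3, 1, 4, 1, 5, 2]
print(sort_values(data, order=SortOrder.UNSORTED))
print(sort_values(data, order=SortOrder.DESCENDING))
```

Existing call sites that pass `ASCENDING` or `DESCENDING` keep working unchanged after the new member is added.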
Expected Output
Ascending: [1, 1, 2, 3, 4, 5]
Descending: [5, 4, 3, 2, 1, 1]
Ascending (explicit): [1, 1, 2, 3, 4, 5]
Hints
Hint 1: A boolean flag `reverse=True` at a call site leaves the reader guessing: "reverse what?" An enum like `SortOrder.DESCENDING` is unambiguous.
Hint 2: Create a `SortOrder` enum with `ASCENDING` and `DESCENDING` members. Replace the boolean parameter with a keyword-only `order: SortOrder` parameter.
Medium
The function below violates single responsibility — it validates, creates, saves, and notifies all in one place. Split it into focused functions that each do one thing. Then compose them in a coordinator function.
def create_user(name, email):
    """Create a user dict from name and email."""
    return {"name": name, "email": email}

def validate_user(user):
    """Validate that a user dict has required fields with valid data."""
    if not user.get("name"):
        raise ValueError("Name is required")
    if not user.get("email") or "@" not in user["email"]:
        raise ValueError("Valid email is required")
    return True

def save_user(user):
    """Persist a user to the database."""
    print(f"Saved: {user['email']}")
    return True

def notify_user(user):
    """Send a welcome notification to the user."""
    print(f"Notified: {user['email']}")
    return True

def register_user(name, email):
    """Orchestrate user registration: create, validate, save, notify."""
    user = create_user(name, email)
    print(f"User: {user}")
    valid = validate_user(user)
    print(f"Valid: {valid}")
    save_user(user)
    notify_user(user)

# Test
register_user("Alice", "[email protected]")
Solution
# BEFORE: one function does everything
def register_user_bad(name, email):
    if not name:
        raise ValueError("Name required")
    if "@" not in email:
        raise ValueError("Invalid email")
    user = {"name": name, "email": email}
    # save to db...
    print(f"Saved: {email}")
    # send notification...
    print(f"Notified: {email}")
    return user

# AFTER: each function does one thing
def create_user(name, email):
    """Create a user dict from name and email."""
    return {"name": name, "email": email}

def validate_user(user):
    """Validate that a user dict has required fields with valid data."""
    if not user.get("name"):
        raise ValueError("Name is required")
    if not user.get("email") or "@" not in user["email"]:
        raise ValueError("Valid email is required")
    return True

def save_user(user):
    """Persist a user to the database."""
    print(f"Saved: {user['email']}")
    return True

def notify_user(user):
    """Send a welcome notification to the user."""
    print(f"Notified: {user['email']}")
    return True

def register_user(name, email):
    """Orchestrate user registration: create, validate, save, notify."""
    user = create_user(name, email)
    print(f"User: {user}")
    valid = validate_user(user)
    print(f"Valid: {valid}")
    save_user(user)
    notify_user(user)
Single responsibility benefits:
Testability:
- test_validate_user() — no DB, no network, just logic
- test_save_user() — mock only the DB
- test_notify_user() — mock only the mailer
Reusability:
- validate_user() can be used in update_user() too
- save_user() works for any user dict, not just new registrations
- notify_user() can be reused for password resets
Debuggability:
- If notifications fail, you know it is in notify_user()
- The monolithic version could fail anywhere in 20 lines
The newspaper test for each function:
- create_user: "Create a user dict." Pass.
- validate_user: "Validate user data." Pass.
- save_user: "Save user to database." Pass.
- notify_user: "Send welcome notification." Pass.
- register_user: "Orchestrate registration." Pass (it delegates, does not implement).
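The testability claim can be checked directly. The sketch below exercises `validate_user` in isolation with plain assertions; the test function names, the `expect_value_error` helper, and the placeholder addresses are hypothetical. No database or mailer stand-in is needed because the function only inspects a dict.

```python
def validate_user(user):
    """Same logic as the exercise solution."""
    if not user.get("name"):
        raise ValueError("Name is required")
    if not user.get("email") or "@" not in user["email"]:
        raise ValueError("Valid email is required")
    return True


def expect_value_error(user, message):
    """Hypothetical helper: assert that validate_user rejects `user` with `message`."""
    try:
        validate_user(user)
    except ValueError as exc:
        assert str(exc) == message
    else:
        raise AssertionError("expected ValueError")


def test_valid_user_passes():
    assert validate_user({"name": "Alice", "email": "alice@example.com"}) is True


def test_missing_name_is_rejected():
    expect_value_error({"name": "", "email": "alice@example.com"}, "Name is required")


def test_email_without_at_sign_is_rejected():
    expect_value_error({"name": "Alice", "email": "alice.example.com"}, "Valid email is required")


# Run the tests directly; a test runner such as pytest would discover them by name.
for test in (test_valid_user_passes, test_missing_name_is_rejected,
             test_email_without_at_sign_is_rejected):
    test()
print("3 validation tests passed")
```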
Expected Output
User: {'name': 'Alice', 'email': '[email protected]'}
Valid: True
Saved: [email protected]
Notified: [email protected]
Hints
Hint 1: The original function does four things: validate, create, save, and notify. Each should be its own function with a single purpose.
Hint 2: Apply the newspaper test: describe each function in one sentence without "and" or "or." If you cannot, split further. `create_user`, `validate_user`, `save_user`, `notify_user` each pass this test.
Design three functions with clear, consistent error handling contracts. Each function demonstrates a different but internally consistent pattern: raise on invalid input, raise on domain violations, or return None for lookups.
def divide(a, b):
    """Divide a by b. Raises ZeroDivisionError if b is zero."""
    if b == 0:
        raise ZeroDivisionError("Cannot divide by zero")
    return a / b

def parse_age(value):
    """Parse a string into a valid age. Raises ValueError on invalid input."""
    try:
        age = int(value)
    except (ValueError, TypeError):
        raise ValueError(f"Cannot parse age from: {value!r}")
    if age < 0 or age > 150:
        raise ValueError(f"Age out of valid range: {age}")
    return age

def find_user(username):
    """Look up a user by username. Returns dict or None if not found."""
    users = {
        "alice": {"name": "Alice", "email": "[email protected]"},
        "bob": {"name": "Bob", "email": "[email protected]"},
    }
    return users.get(username)

# Test divide
result = divide(10, 3)
print(f"divide(10, 3): {result:.4f}")
try:
    divide(10, 0)
except ZeroDivisionError:
    print("divide(10, 0): raised ZeroDivisionError")

# Test parse_age
print(f"parse_age('25'): {parse_age('25')}")
try:
    parse_age("abc")
except ValueError:
    print("parse_age('abc'): raised ValueError")
try:
    parse_age("-5")
except ValueError:
    print("parse_age('-5'): raised ValueError")

# Test find_user
user = find_user("alice")
print(f"find_user('alice'): {user['email']}")
missing = find_user("unknown")
print(f"find_user('unknown'): {missing}")
Solution
def divide(a, b):
    """Divide a by b. Raises ZeroDivisionError if b is zero."""
    if b == 0:
        raise ZeroDivisionError("Cannot divide by zero")
    return a / b

def parse_age(value):
    """Parse a string into a valid age. Raises ValueError on invalid input."""
    try:
        age = int(value)
    except (ValueError, TypeError):
        raise ValueError(f"Cannot parse age from: {value!r}")
    if age < 0 or age > 150:
        raise ValueError(f"Age out of valid range: {age}")
    return age

def find_user(username):
    """Look up a user by username. Returns dict or None if not found."""
    users = {
        "alice": {"name": "Alice", "email": "[email protected]"},
        "bob": {"name": "Bob", "email": "[email protected]"},
    }
    return users.get(username)

result = divide(10, 3)
print(f"divide(10, 3): {result:.4f}")
try:
    divide(10, 0)
except ZeroDivisionError:
    print("divide(10, 0): raised ZeroDivisionError")
print(f"parse_age('25'): {parse_age('25')}")
try:
    parse_age("abc")
except ValueError:
    print("parse_age('abc'): raised ValueError")
try:
    parse_age("-5")
except ValueError:
    print("parse_age('-5'): raised ValueError")
user = find_user("alice")
print(f"find_user('alice'): {user['email']}")
missing = find_user("unknown")
print(f"find_user('unknown'): {missing}")
Three error handling contracts:
Contract 1 — Always returns or raises (computation):
divide(a, b) -> float # always float
divide(a, 0) -> raises # never returns None for errors
Contract 2 — Always returns or raises (validation):
parse_age("25") -> int # always int
parse_age("abc") -> raises # invalid input = exception
parse_age("-5") -> raises # domain violation = exception
Contract 3 — Returns value or None (lookup):
find_user("alice") -> dict # found
find_user("xxx") -> None # not found (absence, not error)
When to raise vs return None:
- Raise when the caller gave you bad input (wrong type, out of range, impossible operation). The caller made a mistake.
- Return None when the input is valid but the item does not exist. "Not found" is a normal outcome, not an error.
- Never mix both in the same function — pick one contract and stick to it.
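At the call site, the two contracts read differently, as the sketch below tries to show. `USERS` and `greet` are hypothetical names introduced for illustration; `parse_age` follows the exercise, with `from None` added as one idiomatic way to hide the internal `int()` traceback from the caller.

```python
USERS = {"alice": {"name": "Alice"}}  # hypothetical stand-in for a user store


def find_user(username):
    """Lookup contract: returns a dict, or None when the user does not exist."""
    return USERS.get(username)


def parse_age(value):
    """Validation contract: returns an int or raises ValueError."""
    try:
        age = int(value)
    except (ValueError, TypeError):
        raise ValueError(f"Cannot parse age from: {value!r}") from None
    if age < 0 or age > 150:
        raise ValueError(f"Age out of valid range: {age}")
    return age


def greet(username, raw_age):
    """Hypothetical caller that uses each contract the way it is meant to be used."""
    user = find_user(username)
    if user is None:              # absence is a normal outcome: branch on it
        return "unknown user"
    try:
        age = parse_age(raw_age)  # bad input is an error: catch it
    except ValueError as exc:
        return f"invalid age ({exc})"
    return f"{user['name']} is {age}"


print(greet("alice", "30"))
print(greet("bob", "30"))
print(greet("alice", "abc"))
```

The lookup is handled with an `if`, the validation with `try`/`except`; neither function forces the caller to do both.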
Expected Output
divide(10, 3): 3.3333
divide(10, 0): raised ZeroDivisionError
parse_age('25'): 25
parse_age('abc'): raised ValueError
parse_age('-5'): raised ValueError
find_user('alice'): [email protected]
find_user('unknown'): None
Hints
Hint 1: Functions that compute or get values should raise exceptions on invalid input rather than returning None. Reserve None for "not found" semantics in lookup functions.
Hint 2: Design three contracts: (1) `divide` always returns float or raises, (2) `parse_age` always returns int or raises, (3) `find_user` returns dict or None (lookup semantics). Each is consistent within its own contract.
The function below mixes I/O (database lookup, logging) with business logic (discount calculation). Refactor so the discount logic is a pure function that can be tested without any mocks or database.
def compute_price(
    base_price,
    *,
    is_premium=False,
    years_active=0,
):
    """Compute final price after applying loyalty discount.

    Pure function: no I/O, no side effects, fully testable.
    """
    if is_premium:
        discount = min(0.20 + years_active * 0.01, 0.35)
    else:
        discount = 0.0
    return round(base_price * (1 - discount), 2)

# Test pure logic: no mocks needed!
print(f"Basic (0 years): ${compute_price(100, is_premium=False):.2f}")
print(f"Premium (0 years): ${compute_price(100, is_premium=True):.2f}")
print(f"Premium (5 years): ${compute_price(100, is_premium=True, years_active=5):.2f}")
print(f"Premium (20 years): ${compute_price(100, is_premium=True, years_active=20):.2f}")
print("Loyalty discount capped at 35%")
Solution
# BEFORE: impure, mixes I/O with logic
def get_price_bad(user_id, base_price):
    user = db.get_user(user_id)  # I/O: database read
    logger.info(f"Computing price for {user_id}")  # side effect: logging
    if user.is_premium:
        discount = min(0.20 + user.years_active * 0.01, 0.35)
    else:
        discount = 0.0
    return round(base_price * (1 - discount), 2)

# Testing requires: mock db, mock logger, create fake user (painful)

# AFTER: pure logic separated from I/O
def compute_price(
    base_price,
    *,
    is_premium=False,
    years_active=0,
):
    """Compute final price after applying loyalty discount.

    Pure function: no I/O, no side effects, fully testable.
    """
    if is_premium:
        discount = min(0.20 + years_active * 0.01, 0.35)
    else:
        discount = 0.0
    return round(base_price * (1 - discount), 2)

# I/O wrapper (lives at the boundary of the system)
def get_user_price(user_id, base_price):
    """Fetch user and compute their price. I/O at the boundary."""
    # user = db.get_user(user_id)  # I/O happens here
    # return compute_price(base_price, is_premium=user.is_premium, years_active=user.years_active)
    pass

# Test the PURE logic: zero mocks
print(f"Basic (0 years): ${compute_price(100, is_premium=False):.2f}")
print(f"Premium (0 years): ${compute_price(100, is_premium=True):.2f}")
print(f"Premium (5 years): ${compute_price(100, is_premium=True, years_active=5):.2f}")
print(f"Premium (20 years): ${compute_price(100, is_premium=True, years_active=20):.2f}")
print("Loyalty discount capped at 35%")
Pure vs impure comparison:
IMPURE — get_price_bad(user_id, 100):
Requires: database mock, logger mock, fake user object
Test setup: ~15 lines of mocking boilerplate
Risk: test might fail due to mock setup, not logic bugs
PURE — compute_price(100, is_premium=True, years_active=5):
Requires: nothing
Test: assert compute_price(100, is_premium=True, years_active=5) == 75.0
Risk: only fails if the logic is wrong
The pattern: Push I/O to the edges of your system. Keep business logic pure in the middle. The thin I/O wrapper fetches data, calls the pure function, and saves the result.
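A minimal sketch of that shape, assuming a hypothetical `fetch_user` callable and `User` record in place of the real database layer (neither appears in the original). Passing the fetcher in as an argument is one possible design; it keeps even the wrapper testable with a plain function.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical user record; the real one would come from the database layer."""
    is_premium: bool
    years_active: int


def compute_price(base_price, *, is_premium=False, years_active=0):
    """Pure pricing logic from the exercise: no I/O, no side effects."""
    discount = min(0.20 + years_active * 0.01, 0.35) if is_premium else 0.0
    return round(base_price * (1 - discount), 2)


def get_user_price(user_id, base_price, *, fetch_user):
    """Thin I/O wrapper: fetch the data, then delegate to the pure function."""
    user = fetch_user(user_id)  # the only I/O in this sketch
    return compute_price(
        base_price,
        is_premium=user.is_premium,
        years_active=user.years_active,
    )


# In production fetch_user would query the database; here a dict lookup stands in.
fake_db = {42: User(is_premium=True, years_active=5)}
print(get_user_price(42, 100, fetch_user=fake_db.__getitem__))
```

All pricing rules stay in `compute_price`; the wrapper contains no branching worth testing on its own.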
Expected Output
Basic (0 years): $100.00
Premium (0 years): $80.00
Premium (5 years): $75.00
Premium (20 years): $65.00
Loyalty discount capped at 35%
Hints
Hint 1: A pure function takes all data as arguments and returns a result with no side effects. It never calls a database, API, or logger.
Hint 2: Extract the pricing logic into a pure function that takes `is_premium` and `years_active` as inputs. The impure wrapper fetches the user data and calls the pure function.
The function below has 7 parameters — too many to keep straight. Refactor by grouping the delivery options into an EmailConfig dataclass, reducing the function signature to essentials plus a config object.
from dataclasses import dataclass

@dataclass
class EmailConfig:
    """Configuration for email delivery."""
    html: bool = True
    retries: int = 3
    timeout_seconds: int = 30

def send_email(
    to,
    subject,
    body,
    *,
    config=None,
):
    """Send an email with the given configuration."""
    if config is None:
        config = EmailConfig()
    print("Sending email...")
    print(f" to: {to}")
    print(f" subject: {subject}")
    print(f" html: {config.html}, retries: {config.retries}, timeout: {config.timeout_seconds}s")

# Default config
send_email("[email protected]", "Hello", "Hi Alice!")
# Custom config
urgent_config = EmailConfig(html=False, retries=1, timeout_seconds=10)
send_email("[email protected]", "Alert", "System down!", config=urgent_config)
Solution
# BEFORE: too many parameters
def send_email_bad(to, subject, body, html=True, retries=3,
                   timeout=30, track_opens=False):
    pass

# Call site is messy:
# send_email_bad("[email protected]", "Hi", "body", True, 3, 30, False)

# AFTER: grouped into config dataclass
from dataclasses import dataclass

@dataclass
class EmailConfig:
    """Configuration for email delivery."""
    html: bool = True
    retries: int = 3
    timeout_seconds: int = 30

def send_email(
    to,
    subject,
    body,
    *,
    config=None,
):
    """Send an email with the given configuration."""
    if config is None:
        config = EmailConfig()
    print("Sending email...")
    print(f" to: {to}")
    print(f" subject: {subject}")
    print(f" html: {config.html}, retries: {config.retries}, timeout: {config.timeout_seconds}s")

# Default config
send_email("[email protected]", "Hello", "Hi Alice!")
# Custom config
urgent_config = EmailConfig(html=False, retries=1, timeout_seconds=10)
send_email("[email protected]", "Alert", "System down!", config=urgent_config)
Why config objects beat long parameter lists:
1. Reusable — define once, pass to many calls:
prod_config = EmailConfig(retries=5, timeout_seconds=60)
send_email("[email protected]", "Test1", "...", config=prod_config)
send_email("[email protected]", "Test2", "...", config=prod_config)
2. Testable — config is a plain data object:
assert EmailConfig().retries == 3
assert EmailConfig(retries=1).retries == 1
3. Extensible — add a field without changing callers:
@dataclass
class EmailConfig:
    html: bool = True
    retries: int = 3
    timeout_seconds: int = 30
    track_opens: bool = False  # New! No existing calls break.
4. Self-documenting — IDE shows all fields and defaults
Rule of thumb: When 3+ parameters "travel together" and configure the same concern, extract them into a dataclass.
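A config object can also be derived from another one instead of being rebuilt field by field. The sketch below assumes the `EmailConfig` fields from this exercise; marking the class `frozen=True` is an optional hardening step not used in the original solution, and `dataclasses.replace` is the standard-library function for copying a dataclass with some fields changed.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)  # frozen is an added assumption, not in the exercise
class EmailConfig:
    """Configuration for email delivery."""
    html: bool = True
    retries: int = 3
    timeout_seconds: int = 30


default_config = EmailConfig()
# Copy the defaults and override only what differs.
urgent_config = replace(default_config, retries=1, timeout_seconds=10)
print(urgent_config)

try:
    default_config.retries = 99  # shared configs cannot be edited in place
except FrozenInstanceError:
    print("config is read-only")
```

With a frozen config, a value shared across many `send_email` calls cannot be changed by one caller behind another's back.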
Expected Output
Sending email...
 to: [email protected]
 subject: Hello
 html: True, retries: 3, timeout: 30s
Sending email...
 to: [email protected]
 subject: Alert
 html: False, retries: 1, timeout: 10s
Hints
Hint 1: When a function has more than 3-4 related parameters, group them into a dataclass. This gives you named fields, defaults, type hints (documented, though not enforced at runtime), and a clean repr.
Hint 2: The `EmailConfig` dataclass holds delivery options (html, retries, timeout). The `send_email` function takes the essential args (to, subject, body) plus the config object.
Hard
The function below is a worst-case API — cryptic name, mystery parameters, magic integers, positional booleans, and no documentation. Redesign it from scratch applying every clean API principle from the lesson.
from enum import Enum
from typing import Optional, Sequence

class Aggregation(Enum):
    SUM = "sum"
    AVERAGE = "average"
    MAX = "max"

def aggregate_top_values(
    values: Sequence[float],
    aggregation: Aggregation,
    *,
    top_n: Optional[int] = None,
    use_absolute: bool = False,
) -> float:
    """Aggregate the top-N values from a list.

    Args:
        values: Input list of numeric values.
        aggregation: Aggregation method (SUM, AVERAGE, MAX).
        top_n: If given, consider only the top N values. None means all.
        use_absolute: If True, take absolute value of each element first.

    Returns:
        The aggregated result as a float.

    Raises:
        ValueError: If values is empty.
    """
    if not values:
        raise ValueError("Cannot aggregate empty list")
    processed = [abs(v) for v in values] if use_absolute else list(values)
    processed.sort(reverse=True)
    selected = processed[:top_n] if top_n is not None else processed
    if aggregation == Aggregation.SUM:
        return float(sum(selected))
    if aggregation == Aggregation.AVERAGE:
        return sum(selected) / len(selected)
    if aggregation == Aggregation.MAX:
        return float(max(selected))
    raise ValueError(f"Unknown aggregation: {aggregation}")

# Clean call sites: self-documenting
result1 = aggregate_top_values(
    [-3, 1, -5, 2],
    Aggregation.SUM,
    use_absolute=True,
    top_n=2,
)
print(f"Top 2 (sum, absolute): {result1}")
result2 = aggregate_top_values([1, 2, 3, 4], Aggregation.AVERAGE)
print(f"All (average): {result2}")
result3 = aggregate_top_values([3, 1, 5, 2], Aggregation.MAX, top_n=3)
print(f"Top 3 (max): {int(result3)}")
Solution
# BEFORE: the worst API
def do(x, y, z=1, flag=False, out=None):
    if flag:
        x = [abs(v) for v in x]
    x = sorted(x, reverse=True)[:z]
    if y == "sum":
        result = sum(x)
    elif y == "avg":
        result = sum(x) / len(x) if x else 0
    elif y == "max":
        result = max(x) if x else 0
    if out == "print":
        print(result)
    return result

# Problems:
# 1. Name "do" says nothing
# 2. x, y, z, flag, out: meaningless parameter names
# 3. z=1 is a magic default: it silently keeps only the top value
# 4. y is a string mode flag, easy to typo
# 5. flag is a positional boolean
# 6. out="print" mixes I/O with computation
# 7. No type annotations, no docstring
# AFTER: clean API
from enum import Enum
from typing import Optional, Sequence

class Aggregation(Enum):
    SUM = "sum"
    AVERAGE = "average"
    MAX = "max"

def aggregate_top_values(
    values: Sequence[float],
    aggregation: Aggregation,
    *,
    top_n: Optional[int] = None,
    use_absolute: bool = False,
) -> float:
    """Aggregate the top-N values from a list.

    Args:
        values: Input list of numeric values.
        aggregation: Aggregation method (SUM, AVERAGE, MAX).
        top_n: If given, consider only the top N values. None means all.
        use_absolute: If True, take absolute value of each element first.

    Returns:
        The aggregated result as a float.

    Raises:
        ValueError: If values is empty.
    """
    if not values:
        raise ValueError("Cannot aggregate empty list")
    processed = [abs(v) for v in values] if use_absolute else list(values)
    processed.sort(reverse=True)
    selected = processed[:top_n] if top_n is not None else processed
    if aggregation == Aggregation.SUM:
        return float(sum(selected))
    if aggregation == Aggregation.AVERAGE:
        return sum(selected) / len(selected)
    if aggregation == Aggregation.MAX:
        return float(max(selected))
    raise ValueError(f"Unknown aggregation: {aggregation}")

result1 = aggregate_top_values(
    [-3, 1, -5, 2],
    Aggregation.SUM,
    use_absolute=True,
    top_n=2,
)
print(f"Top 2 (sum, absolute): {result1}")
result2 = aggregate_top_values([1, 2, 3, 4], Aggregation.AVERAGE)
print(f"All (average): {result2}")
result3 = aggregate_top_values([3, 1, 5, 2], Aggregation.MAX, top_n=3)
print(f"Top 3 (max): {int(result3)}")
Every fix mapped to a principle:
| Before | After | Principle |
|---|---|---|
| do | aggregate_top_values | Naming communicates intent |
| x | values | Descriptive parameter names |
| y = "sum" | Aggregation.SUM | Enum replaces magic strings |
| z = 1 (magic default, keeps only the top value) | top_n = None | None means "no limit"; no magic values |
| flag (positional bool) | use_absolute (keyword-only) | Boolean flag anti-pattern fixed |
| out = "print" | Removed | Single responsibility: I/O is not this function's job |
| No annotations | Full annotations | Type annotations as documentation |
| No docstring | Google-style docstring | Documents args, returns, raises |
| Returns int or float | Always returns float | Consistent return type |
Expected Output
Top 2 (sum, absolute): 8.0
All (average): 2.5
Top 3 (max): 5
Hints
Hint 1: Identify every problem: cryptic name, positional booleans, string mode flag, inconsistent return behavior, no type annotations, no docstring.
Hint 2: Apply all principles: (1) descriptive function name, (2) enum for aggregation mode, (3) keyword-only args, (4) type annotations, (5) consistent float return, (6) Google-style docstring.
Implement a QueryBuilder that constructs a database query configuration step by step. The builder should support method chaining (fluent interface) and produce an immutable QueryConfig result. Validate that a table is set before building.
from dataclasses import dataclass

@dataclass(frozen=True)
class QueryConfig:
    """Immutable query configuration: cannot be modified after creation."""
    table: str
    filters: tuple
    order_by: str
    limit: int
    offset: int

class QueryBuilder:
    """Fluent builder for constructing QueryConfig objects."""

    def __init__(self):
        self._table = None
        self._filters = []
        self._order_by = None
        self._limit = 100
        self._offset = 0

    def from_table(self, table):
        """Set the target table."""
        self._table = table
        return self

    def where(self, column, operator, value):
        """Add a filter condition."""
        self._filters.append((column, operator, value))
        return self

    def order_by(self, column):
        """Set the sort column."""
        self._order_by = column
        return self

    def limit(self, n):
        """Set the maximum number of results."""
        self._limit = n
        return self

    def offset(self, n):
        """Set the starting offset."""
        self._offset = n
        return self

    def build(self):
        """Validate and produce an immutable QueryConfig."""
        if not self._table:
            raise ValueError("Table is required: call .from_table() first")
        return QueryConfig(
            table=self._table,
            filters=tuple(self._filters),
            order_by=self._order_by or self._table,
            limit=self._limit,
            offset=self._offset,
        )

# Fluent chained usage
config1 = (
    QueryBuilder()
    .from_table("users")
    .where("age", ">", 21)
    .where("status", "=", "active")
    .order_by("name")
    .limit(10)
    .build()
)
print(config1)
config2 = (
    QueryBuilder()
    .from_table("orders")
    .where("total", ">", 100)
    .order_by("created_at")
    .limit(50)
    .build()
)
print(config2)
Solution
from dataclasses import dataclass

@dataclass(frozen=True)
class QueryConfig:
    """Immutable query configuration: cannot be modified after creation."""
    table: str
    filters: tuple
    order_by: str
    limit: int
    offset: int

class QueryBuilder:
    """Fluent builder for constructing QueryConfig objects."""

    def __init__(self):
        self._table = None
        self._filters = []
        self._order_by = None
        self._limit = 100
        self._offset = 0

    def from_table(self, table):
        self._table = table
        return self  # enables chaining

    def where(self, column, operator, value):
        self._filters.append((column, operator, value))
        return self

    def order_by(self, column):
        self._order_by = column
        return self

    def limit(self, n):
        self._limit = n
        return self

    def offset(self, n):
        self._offset = n
        return self

    def build(self):
        if not self._table:
            raise ValueError("Table is required: call .from_table() first")
        return QueryConfig(
            table=self._table,
            filters=tuple(self._filters),
            order_by=self._order_by or self._table,
            limit=self._limit,
            offset=self._offset,
        )

config1 = (
    QueryBuilder()
    .from_table("users")
    .where("age", ">", 21)
    .where("status", "=", "active")
    .order_by("name")
    .limit(10)
    .build()
)
print(config1)
config2 = (
    QueryBuilder()
    .from_table("orders")
    .where("total", ">", 100)
    .order_by("created_at")
    .limit(50)
    .build()
)
print(config2)
Builder pattern anatomy:
(
    QueryBuilder()                   # 1. Create a mutable builder
    .from_table("users")             # 2. Set the required field
    .where("age", ">", 21)           # 3. Add an optional filter (chainable)
    .where("status", "=", "active")  # 4. Add another filter
    .order_by("name")                # 5. Set the sort column
    .limit(10)                       # 6. Cap the number of results
    .build()                         # 7. Validate + produce immutable result
)
Why builders beat constructors for complex objects:
- Readable: Each step is named, as in `.where("age", ">", 21)`, instead of relying on positional args.
- Flexible: Optional steps can be skipped; only `.from_table()` is required.
- Safe: `build()` validates before producing the immutable result. The `frozen=True` dataclass prevents accidental mutation after construction.
- Composable: You can pass a partially-built builder around and let different parts of the system add their constraints.
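The "Safe" and "Composable" points can be checked directly. The following is a minimal sketch using a trimmed-down copy of the builder (table, filters, and limit only); the helper `add_tenant_scope` and the `tenant_id` column are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class QueryConfig:
    table: str
    filters: tuple
    limit: int


class QueryBuilder:
    """Trimmed-down builder: table, filters, and limit only."""

    def __init__(self):
        self._table = None
        self._filters = []
        self._limit = 100

    def from_table(self, table):
        self._table = table
        return self

    def where(self, column, operator, value):
        self._filters.append((column, operator, value))
        return self

    def build(self):
        if not self._table:
            raise ValueError("Table is required: call .from_table() first")
        return QueryConfig(self._table, tuple(self._filters), self._limit)


def add_tenant_scope(builder, tenant_id):
    """Hypothetical helper: another part of the system adds its own constraint."""
    return builder.where("tenant_id", "=", tenant_id)


# Safe: build() rejects an incomplete configuration.
try:
    QueryBuilder().where("age", ">", 21).build()
except ValueError as exc:
    missing_table_error = str(exc)

# Composable: a partially built builder is handed to a helper before build().
builder = QueryBuilder().from_table("users").where("age", ">", 21)
config = add_tenant_scope(builder, 42).build()

# Immutable: assigning to a field of the frozen dataclass raises.
try:
    config.limit = 5
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True

print(missing_table_error)
print(config.filters)
print(mutation_blocked)
```

Because this builder mutates itself and returns `self`, the helper and the caller share one object; any code holding the builder sees the added filter.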
When to use the builder pattern:
- Object has many optional fields with sensible defaults
- Construction requires validation that spans multiple fields
- You want the final object to be immutable
- You want call sites to read like a declarative specification
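As an illustration of validation that spans multiple fields, here is a small sketch. `PageBuilder`, `PageConfig`, and the rule "an offset requires an explicit sort column" are assumptions made up for this example, not part of the exercise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageConfig:
    table: str
    order_by: Optional[str]
    limit: int
    offset: int


class PageBuilder:
    """Hypothetical builder whose build() checks fields against each other."""

    def __init__(self, table):
        self._table = table
        self._order_by = None
        self._limit = 100
        self._offset = 0

    def order_by(self, column):
        self._order_by = column
        return self

    def limit(self, n):
        self._limit = n
        return self

    def offset(self, n):
        self._offset = n
        return self

    def build(self):
        # Single-field checks.
        if self._limit <= 0:
            raise ValueError("limit must be positive")
        if self._offset < 0:
            raise ValueError("offset must not be negative")
        # Cross-field check (assumed rule): paging needs a stable order.
        if self._offset > 0 and self._order_by is None:
            raise ValueError("offset requires order_by")
        return PageConfig(self._table, self._order_by, self._limit, self._offset)


page = PageBuilder("orders").order_by("created_at").limit(50).offset(100).build()
print(page)

try:
    PageBuilder("orders").offset(100).build()
except ValueError as exc:
    error = str(exc)
    print(error)
```

Since the checks run in `build()`, the order of the chained calls does not matter; a constructor with positional arguments would have to validate everything at once.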
Expected Output
QueryConfig(table='users', filters=(('age', '>', 21), ('status', '=', 'active')), order_by='name', limit=10, offset=0)
QueryConfig(table='orders', filters=(('total', '>', 100),), order_by='created_at', limit=50, offset=0)
Hints
Hint 1: A builder collects configuration step by step and produces a final immutable object. Each method returns `self` to enable chaining.
Hint 2: The `build()` method validates the configuration and returns a frozen dataclass or namedtuple. Once built, the config cannot be modified — this prevents accidental mutation.
Implement a lazy, chainable Pipeline class that supports filter, map, sort_by, limit, collect, and reduce. All transformations should be deferred until collect() or reduce() is called. Each chainable method returns a new Pipeline (does not mutate the original).
from functools import reduce as functools_reduce


class Pipeline:
    """Lazy, chainable data transformation pipeline."""

    def __init__(self, source):
        self._source = source
        self._ops = []

    def _clone_with(self, op):
        """Create a new Pipeline with an additional operation."""
        new = Pipeline(self._source)
        new._ops = self._ops + [op]
        return new

    def filter(self, predicate):
        """Keep elements where predicate returns True."""
        return self._clone_with(lambda data, p=predicate: [x for x in data if p(x)])

    def map(self, transform):
        """Apply transform to every element."""
        return self._clone_with(lambda data, f=transform: [f(x) for x in data])

    def sort_by(self, key, *, descending=False):
        """Sort elements by a key name or function."""
        if isinstance(key, str):
            key_fn = lambda x, k=key: x[k]
        else:
            key_fn = key
        return self._clone_with(
            lambda data, kf=key_fn, desc=descending: sorted(data, key=kf, reverse=desc)
        )

    def limit(self, n):
        """Keep only the first n elements."""
        return self._clone_with(lambda data, n=n: data[:n])

    def collect(self):
        """Execute all operations and return a list."""
        result = list(self._source)
        for op in self._ops:
            result = op(result)
        return result

    def reduce(self, func, initial):
        """Execute operations and fold into a single value."""
        return functools_reduce(func, self.collect(), initial)
# Test data
records = [
    {"name": "Alice", "score": 80, "active": True},
    {"name": "Bob", "score": 95, "active": False},
    {"name": "Carol", "score": 70, "active": True},
    {"name": "Dave", "score": 88, "active": True},
]
# Chain: filter active -> boost scores -> sort -> top 2
top = (
    Pipeline(records)
    .filter(lambda r: r["active"])
    .map(lambda r: {"name": r["name"], "score": round(r["score"] * 1.1, 1)})
    .sort_by("score", descending=True)
    .limit(2)
    .collect()
)
print(f"Top scorers: {top}")
# Reduce: sum of active scores
total = (
    Pipeline(records)
    .filter(lambda r: r["active"])
    .reduce(lambda acc, r: acc + r["score"], 0)
)
print(f"Total active score: {total}")
# Reuse: same pipeline, different terminal
names = (
    Pipeline(records)
    .filter(lambda r: r["active"])
    .sort_by("score", descending=True)
    .map(lambda r: r["name"])
    .collect()
)
print(f"Names: {names}")
Solution
from functools import reduce as functools_reduce


class Pipeline:
    """Lazy, chainable data transformation pipeline."""

    def __init__(self, source):
        self._source = source
        self._ops = []

    def _clone_with(self, op):
        """Create a new Pipeline with an additional operation."""
        new = Pipeline(self._source)
        new._ops = self._ops + [op]
        return new

    def filter(self, predicate):
        return self._clone_with(lambda data, p=predicate: [x for x in data if p(x)])

    def map(self, transform):
        return self._clone_with(lambda data, f=transform: [f(x) for x in data])

    def sort_by(self, key, *, descending=False):
        if isinstance(key, str):
            key_fn = lambda x, k=key: x[k]
        else:
            key_fn = key
        return self._clone_with(
            lambda data, kf=key_fn, desc=descending: sorted(data, key=kf, reverse=desc)
        )

    def limit(self, n):
        return self._clone_with(lambda data, n=n: data[:n])

    def collect(self):
        result = list(self._source)
        for op in self._ops:
            result = op(result)
        return result

    def reduce(self, func, initial):
        return functools_reduce(func, self.collect(), initial)
records = [
    {"name": "Alice", "score": 80, "active": True},
    {"name": "Bob", "score": 95, "active": False},
    {"name": "Carol", "score": 70, "active": True},
    {"name": "Dave", "score": 88, "active": True},
]
top = (
    Pipeline(records)
    .filter(lambda r: r["active"])
    .map(lambda r: {"name": r["name"], "score": round(r["score"] * 1.1, 1)})
    .sort_by("score", descending=True)
    .limit(2)
    .collect()
)
print(f"Top scorers: {top}")
total = (
    Pipeline(records)
    .filter(lambda r: r["active"])
    .reduce(lambda acc, r: acc + r["score"], 0)
)
print(f"Total active score: {total}")
names = (
    Pipeline(records)
    .filter(lambda r: r["active"])
    .sort_by("score", descending=True)
    .map(lambda r: r["name"])
    .collect()
)
print(f"Names: {names}")
Key design decisions:
1. LAZY: operations are stored, not executed:
    pending = Pipeline(data).filter(fn).map(fn)  # no work done yet
    pending.collect()                            # NOW all ops run in sequence
2. IMMUTABLE CHAINS: each method returns a NEW Pipeline:
    base = Pipeline(data).filter(active)
    branch_a = base.limit(5)           # does not modify base
    branch_b = base.sort_by("score")   # does not modify base
3. LAMBDA CAPTURE: the default-argument trick binds the value at creation time:
    lambda data, p=predicate: ...  # p is bound at creation time
    # Each method call here has its own scope, so a plain closure would also
    # capture the right predicate. The default argument makes the binding
    # explicit and guards against late binding if lambdas are built in a loop.
4. FLUENT INTERFACE: returning a Pipeline from each method enables chaining:
    Pipeline(data).filter(fn).map(fn).limit(5).collect()
    # Reads like a sentence: "filter, then map, then limit, then collect"
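The first three decisions can be observed with a small experiment. This is a sketch using a trimmed-down copy of `Pipeline` (filter, limit, collect only); the recording predicate `is_even` and the `calls` list are illustrative additions.

```python
class Pipeline:
    """Trimmed-down copy of the Pipeline above: filter, limit, collect."""

    def __init__(self, source):
        self._source = source
        self._ops = []

    def _clone_with(self, op):
        new = Pipeline(self._source)
        new._ops = self._ops + [op]
        return new

    def filter(self, predicate):
        return self._clone_with(lambda data, p=predicate: [x for x in data if p(x)])

    def limit(self, n):
        return self._clone_with(lambda data, n=n: data[:n])

    def collect(self):
        result = list(self._source)
        for op in self._ops:
            result = op(result)
        return result


calls = []  # records every element the predicate is asked about


def is_even(x):
    calls.append(x)
    return x % 2 == 0


# 1. Lazy: building the chain does not call the predicate.
base = Pipeline(range(6)).filter(is_even)
calls_before_collect = len(calls)

# 2. Immutable chains: branching adds an op to the branch, not to `base`.
first_two = base.limit(2)
ops_in_base = len(base._ops)
ops_in_branch = len(first_two._ops)

evens = base.collect()
calls_after_collect = len(calls)

# 3. Late binding bites when closures are created in a shared scope,
#    such as a loop or comprehension.
late = [lambda x: x + n for n in range(3)]        # all see the final n
bound = [lambda x, n=n: x + n for n in range(3)]  # each keeps its own n
late_results = [f(10) for f in late]
bound_results = [f(10) for f in bound]

print(calls_before_collect, calls_after_collect)  # 0 6
print(ops_in_base, ops_in_branch)                 # 1 2
print(evens)                                      # [0, 2, 4]
print(late_results, bound_results)                # [12, 12, 12] [10, 11, 12]
```

Note that "lazy" here means deferred, not streaming: each stored operation still materializes a full list, so `first_two.collect()` would run the predicate on all six elements before keeping two.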
Clean API principles applied:
- Naming: `filter`, `map`, `sort_by`, `limit`, `collect`, `reduce` are all familiar verbs from functional programming.
- Keyword-only: `sort_by(key, *, descending=False)` prevents `sort_by("name", True)`.
- Consistent returns: chainable methods return `Pipeline`, terminal methods return the result type.
- Single responsibility: each chainable method does exactly one thing, which is to add one operation to the chain.
- Least surprise: `collect()` returns a list, `reduce()` returns a single value.
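The keyword-only point is easy to verify. Below is a minimal sketch with a standalone function that mirrors the `sort_by` signature; it is a stand-in for illustration, not the method itself.

```python
def sort_by(records, key, *, descending=False):
    """Standalone stand-in for Pipeline.sort_by, used only to show the signature."""
    return sorted(records, key=lambda r: r[key], reverse=descending)


rows = [{"name": "Bob"}, {"name": "Alice"}]

# The flag must be spelled out, so the call site documents itself.
print(sort_by(rows, "name", descending=True))

try:
    sort_by(rows, "name", True)  # positional flag is rejected
except TypeError as exc:
    positional_error = str(exc)
    print(positional_error)
```

The bare `*` in the signature is what makes every parameter after it keyword-only, so a reader never has to guess what a stray `True` means.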
Expected Output
Top scorers: [{'name': 'Dave', 'score': 96.8}, {'name': 'Alice', 'score': 88.0}]
Total active score: 238
Names: ['Dave', 'Alice', 'Carol']
Hints
Hint 1: Each method (filter, map, sort_by, limit) should store the operation but NOT execute it. Return a new Pipeline instance with the operation appended.
Hint 2: The `collect()` method applies all stored operations in order. The `reduce()` method applies them and then folds. Lazy evaluation means no work happens until collect/reduce is called.
