Code Smells - Recognizing and Eliminating Bad Patterns
Reading time: ~24 minutes | Level: Foundation → Engineering
# Can you spot the five code smells in this 15-line function?
def proc(u, data, x=[], flag=True, flag2=False):
# process the user data
result = None
for i in range(len(data)):
d = data[i]
if d != None:
if d['t'] == 1 or d['t'] == 2 or d['t'] == 3:
if u['active'] == True:
x.append(d['v'])
result = sum(x) / len(x)
return result
Every line of this function is a code smell. The mutable default argument (x=[]) is a live bug. The parameter names (u, d, flag) are opaque. The if d != None should be if d is not None. The repeated d['t'] comparisons cry out for a set membership check. The nested conditionals hide the logic. No comment explains the purpose.
Code smells are not bugs. They are patterns that indicate the code was written under pressure, grew without design, or accumulated debt over time. A code smell does not cause an immediate failure - it causes the code to become progressively harder to change, understand, and trust. Left unaddressed, smells compound: a function that is too long also tends to have magic numbers, nested conditionals, and inconsistent names.
Learning to name code smells is a professional skill. Once you can say "this is Feature Envy" or "this is a Boolean Trap," you can communicate problems precisely with your team and apply the correct refactoring rather than guessing.
What You Will Learn
- The formal definition of a code smell and why it matters to name them
- Long Function and how to apply Extract Function systematically
- Long Parameter List and the Parameter Object solution
- Duplicate Code (DRY violations) and how to identify copy-paste programming
- Dead Code - unused imports, commented-out code, unreachable branches
- Magic Numbers and Strings - hardcoded values that obscure intent
- Inconsistent Names across a codebase
- Nested Conditionals (the arrow anti-pattern) and guard clauses
- Feature Envy - a method that belongs in another class
- God Class / God Function - the single-responsibility violation at scale
- Primitive Obsession - passing raw strings and ints for domain concepts
- Comments as Excuses - the worst kind of comment
- Overly Defensive Code - try/except everywhere, None checks on every line
- Boolean Trap - positional boolean arguments that mean nothing at the call site
- Mutable Default Arguments - Python's most notorious subtle bug
Prerequisites
- Understanding of Python functions, classes, and modules
- Familiarity with the refactoring concepts from the previous lesson
- Basic understanding of
dataclass,enum, and type hints
What Is a Code Smell?
The term was introduced by Kent Beck and popularized by Martin Fowler's book Refactoring. A code smell is a surface indication that usually corresponds to a deeper problem in the system. Like a biological smell, it is a signal worth investigating - not always a confirmed problem, but never a signal to ignore.
Smells differ from bugs:
| Bug | Code Smell | |
|---|---|---|
| Breaks tests? | Yes | No |
| Immediate user impact? | Usually | Rarely |
| Harm caused | Incorrect behavior | Increased cost of change |
| Fix urgency | Immediate | During refactoring cycles |
The cost of a code smell is measured in time: the extra time it takes to understand the code, the extra time it takes to change it safely, and the extra time required to debug the inevitable bug that the smell eventually enables.
Smell 1: Long Function
A function that spans more than 20-30 lines almost always has more than one responsibility. Long functions are harder to test (you must construct the full context to trigger any internal branch), harder to name accurately, and harder to reuse.
Before
def generate_invoice(order_id: int) -> dict:
order = db.query("SELECT * FROM orders WHERE id = ?", order_id)
if not order:
raise ValueError(f"Order {order_id} not found")
items = db.query("SELECT * FROM order_items WHERE order_id = ?", order_id)
customer = db.query("SELECT * FROM customers WHERE id = ?", order["customer_id"])
subtotal = 0
for item in items:
product = db.query("SELECT * FROM products WHERE id = ?", item["product_id"])
line_price = product["price"] * item["quantity"]
if item.get("discount_percent"):
line_price *= (1 - item["discount_percent"] / 100)
subtotal += line_price
if customer["tier"] == "gold":
subtotal *= 0.95
elif customer["tier"] == "platinum":
subtotal *= 0.90
tax_rates = {"CA": 0.0725, "NY": 0.08, "TX": 0.0625}
tax_rate = tax_rates.get(customer["state"], 0.07)
tax = subtotal * tax_rate
invoice_number = f"INV-{order_id:06d}"
return {
"invoice_number": invoice_number,
"customer": customer["name"],
"items": items,
"subtotal": round(subtotal, 2),
"tax": round(tax, 2),
"total": round(subtotal + tax, 2),
}
After
TIER_DISCOUNTS = {"gold": 0.05, "platinum": 0.10}
STATE_TAX_RATES = {"CA": 0.0725, "NY": 0.08, "TX": 0.0625}
DEFAULT_TAX_RATE = 0.07
def calculate_line_price(item: dict) -> float:
product = db.query("SELECT * FROM products WHERE id = ?", item["product_id"])
price = product["price"] * item["quantity"]
if item.get("discount_percent"):
price *= 1 - item["discount_percent"] / 100
return price
def calculate_subtotal(items: list[dict]) -> float:
return sum(calculate_line_price(item) for item in items)
def apply_tier_discount(subtotal: float, customer_tier: str) -> float:
discount = TIER_DISCOUNTS.get(customer_tier, 0.0)
return subtotal * (1 - discount)
def calculate_tax(subtotal: float, state: str) -> float:
rate = STATE_TAX_RATES.get(state, DEFAULT_TAX_RATE)
return subtotal * rate
def generate_invoice(order_id: int) -> dict:
order = db.query("SELECT * FROM orders WHERE id = ?", order_id)
if not order:
raise ValueError(f"Order {order_id} not found")
items = db.query("SELECT * FROM order_items WHERE order_id = ?", order_id)
customer = db.query("SELECT * FROM customers WHERE id = ?", order["customer_id"])
subtotal = apply_tier_discount(calculate_subtotal(items), customer["tier"])
tax = calculate_tax(subtotal, customer["state"])
return {
"invoice_number": f"INV-{order_id:06d}",
"customer": customer["name"],
"items": items,
"subtotal": round(subtotal, 2),
"tax": round(tax, 2),
"total": round(subtotal + tax, 2),
}
Refactoring: Extract Function.
Smell 2: Long Parameter List
A function with more than four parameters is hard to call correctly. The caller must track the order and meaning of each argument. Long parameter lists also tend to grow over time: each new requirement adds one more parameter.
Before
def send_email(
to_address, subject, body, from_address,
reply_to, cc, bcc, is_html, track_opens,
campaign_id, template_id, attachments
):
...
After
from dataclasses import dataclass, field
@dataclass
class EmailMessage:
to_address: str
subject: str
body: str
reply_to: str | None = None
cc: list[str] = field(default_factory=list)
bcc: list[str] = field(default_factory=list)
is_html: bool = False
track_opens: bool = True
campaign_id: str | None = None
template_id: str | None = None
attachments: list[str] = field(default_factory=list)
def send_email(message: EmailMessage) -> None:
...
The call site now reads clearly and is immune to argument-order mistakes:
send_email(EmailMessage(
subject="Your invoice is ready",
body="<p>Please find your invoice attached.</p>",
is_html=True,
attachments=["invoice_1234.pdf"],
))
Refactoring: Introduce Parameter Object.
Smell 3: Duplicate Code
Duplicate code - the same logic copy-pasted into two or more places - is one of the most expensive smells. When the logic changes (and it always does), every copy must be updated. One copy always gets missed.
Before
def get_admin_report(start_date, end_date):
orders = db.query(
"SELECT * FROM orders WHERE created_at BETWEEN ? AND ? AND status != 'cancelled'",
start_date, end_date
)
total = sum(o["total"] for o in orders)
return {"count": len(orders), "total": total, "orders": orders}
def get_manager_report(start_date, end_date, region):
orders = db.query(
"SELECT * FROM orders WHERE created_at BETWEEN ? AND ? AND status != 'cancelled' AND region = ?",
start_date, end_date, region
)
total = sum(o["total"] for o in orders) # identical logic
return {"count": len(orders), "total": total, "orders": orders}
After
def summarize_orders(orders: list[dict]) -> dict:
"""Build a summary dict from a list of order records."""
return {
"count": len(orders),
"total": sum(o["total"] for o in orders),
"orders": orders,
}
def get_active_orders(start_date, end_date, region: str | None = None) -> list[dict]:
"""Fetch non-cancelled orders, optionally filtered by region."""
if region:
return db.query(
"SELECT * FROM orders WHERE created_at BETWEEN ? AND ? "
"AND status != 'cancelled' AND region = ?",
start_date, end_date, region,
)
return db.query(
"SELECT * FROM orders WHERE created_at BETWEEN ? AND ? AND status != 'cancelled'",
start_date, end_date,
)
def get_admin_report(start_date, end_date) -> dict:
return summarize_orders(get_active_orders(start_date, end_date))
def get_manager_report(start_date, end_date, region: str) -> dict:
return summarize_orders(get_active_orders(start_date, end_date, region))
Refactoring: Extract Function; apply DRY (Don't Repeat Yourself).
Smell 4: Dead Code
Dead code is code that can never be reached, is never called, or was commented out and never deleted. It adds noise, confuses readers, and wastes maintenance attention.
# Dead import
import os # Never used anywhere in this file
# Dead function - nothing calls this anymore
def _old_validate_email(email):
return "@" in email
# Commented-out code - should be deleted, not parked here
# def send_slack_notification(channel, message):
# requests.post(SLACK_WEBHOOK, json={"channel": channel, "text": message})
def process_event(event):
if event["type"] == "login":
handle_login(event)
elif event["type"] == "logout":
handle_logout(event)
else:
pass # dead: the else was never cleaned up after a feature removal
After
def process_event(event: dict) -> None:
if event["type"] == "login":
handle_login(event)
elif event["type"] == "logout":
handle_logout(event)
# Unhandled event types are silently ignored by design (see ADR-0017)
Tool: ruff check --select F401 removes unused imports automatically. vulture detects unreachable functions and variables. Version control is the real archive - deleted code lives in git history; there is no reason to keep it in comments.
Smell 5: Magic Numbers and Strings
A magic number or string is a literal value embedded in code with no explanation of its purpose. The value 650 means nothing. MINIMUM_CREDIT_SCORE = 650 is self-documenting.
Before
def is_eligible(user):
return user.age >= 18 and user.credit_score >= 650
def calculate_fee(amount):
if amount < 1000:
return amount * 0.029 + 0.30
return amount * 0.022 + 0.30
if response.status_code == 429:
time.sleep(60)
retry()
After
MINIMUM_ELIGIBLE_AGE = 18
MINIMUM_CREDIT_SCORE = 650
SMALL_TRANSACTION_THRESHOLD = 1000.00
SMALL_TRANSACTION_RATE = 0.029 # 2.9%
LARGE_TRANSACTION_RATE = 0.022 # 2.2%
FIXED_TRANSACTION_FEE = 0.30 # $0.30 per transaction
HTTP_TOO_MANY_REQUESTS = 429
RATE_LIMIT_RETRY_DELAY_SECONDS = 60
def is_eligible(user) -> bool:
return user.age >= MINIMUM_ELIGIBLE_AGE and user.credit_score >= MINIMUM_CREDIT_SCORE
def calculate_fee(amount: float) -> float:
rate = SMALL_TRANSACTION_RATE if amount < SMALL_TRANSACTION_THRESHOLD else LARGE_TRANSACTION_RATE
return amount * rate + FIXED_TRANSACTION_FEE
if response.status_code == HTTP_TOO_MANY_REQUESTS:
time.sleep(RATE_LIMIT_RETRY_DELAY_SECONDS)
retry()
Refactoring: Replace Magic Number with Symbolic Constant. For constrained string/integer sets, prefer Enum.
Smell 6: Inconsistent Names
Using different names for the same concept across a codebase forces the reader to constantly resolve mental aliases. user, usr, u, current_user, and the_user are all the same concept written by different engineers on different days.
Before
# In auth.py
def get_current_user(token): ...
# In orders.py
def create_order(usr, items): ...
# In billing.py
def charge(u, amount): ...
# In models.py
class UserProfile: # But the DB table is called "accounts"
...
After
Establish a team glossary and enforce it with type annotations:
# Glossary decision: the concept is always "user", Python type is User
# In auth.py
def get_current_user(token: str) -> "User": ...
# In orders.py
def create_order(user: "User", items: list["OrderItem"]) -> "Order": ...
# In billing.py
def charge(user: "User", amount: "Money") -> "Receipt": ...
# In models.py
class User: # DB table aliased in ORM config, concept name wins
...
Type annotations are your best ally for consistency. If every function that accepts a user object uses the type annotation User, a type checker will flag any mismatch and grep will find every usage site.
Smell 7: Nested Conditionals (The Arrow Anti-Pattern)
Deeply nested if/else structures - sometimes four or five levels deep - force the reader to track a growing stack of context. The code visually resembles an arrow pointing right.
Before
def process_withdrawal(user, account, amount):
if user is not None:
if user.is_verified:
if account is not None:
if account.balance >= amount:
if amount <= user.daily_limit:
account.balance -= amount
return "success"
else:
return "exceeds_daily_limit"
else:
return "insufficient_funds"
else:
return "account_not_found"
else:
return "user_not_verified"
else:
return "user_not_found"
After: Guard Clauses
def process_withdrawal(user, account, amount: float) -> str:
if user is None:
return "user_not_found"
if not user.is_verified:
return "user_not_verified"
if account is None:
return "account_not_found"
if account.balance < amount:
return "insufficient_funds"
if amount > user.daily_limit:
return "exceeds_daily_limit"
account.balance -= amount
return "success"
Each guard clause is one line, one reason, one exit. The happy path - the last two lines - is at zero indentation. The reader can scan the failure conditions in a few seconds without tracking any nesting context.
Refactoring: Collapse Else Clause, Replace Nested Conditional with Guard Clauses.
Smell 8: Feature Envy
A method has Feature Envy when it uses the data and methods of another class more than its own. It is a sign the method is in the wrong class.
Before
class Order:
def __init__(self, customer, items, discount_code):
self.customer = customer
self.items = items
self.discount_code = discount_code
class InvoiceGenerator:
def calculate_customer_discount(self, order: Order) -> float:
# This method only touches order.customer - it envies Customer
if order.customer.tier == "gold":
return 0.10
elif order.customer.tier == "platinum":
return 0.15
elif order.customer.loyalty_years >= 5:
return 0.08
return 0.0
calculate_customer_discount is on InvoiceGenerator but exclusively uses customer data. The method belongs on Customer.
After
class Customer:
def __init__(self, tier: str, loyalty_years: int):
self.tier = tier
self.loyalty_years = loyalty_years
def discount_rate(self) -> float:
"""Return this customer's applicable discount rate."""
if self.tier == "gold":
return 0.10
if self.tier == "platinum":
return 0.15
if self.loyalty_years >= 5:
return 0.08
return 0.0
class InvoiceGenerator:
def calculate_customer_discount(self, order: "Order") -> float:
# One line - InvoiceGenerator no longer envies Customer
return order.customer.discount_rate()
Refactoring: Move Method.
Smell 9: God Class / God Function
A God Class is a class that knows everything and does everything. It becomes the bottleneck for every feature request. A God Function is the same problem at the function level - one function that handles end-to-end processing of an entire domain.
Before (Abbreviated)
class App:
def login(self, email, password): ...
def logout(self): ...
def create_order(self, items): ...
def cancel_order(self, order_id): ...
def process_payment(self, amount, card): ...
def refund_payment(self, payment_id): ...
def send_email(self, to, subject, body): ...
def send_sms(self, phone, body): ...
def generate_report(self, report_type): ...
def upload_file(self, file, bucket): ...
def resize_image(self, path, width, height): ...
def geocode_address(self, address): ...
# 40 more methods...
After: Split by Responsibility
class AuthService:
def login(self, email: str, password: str) -> "User": ...
def logout(self, user: "User") -> None: ...
class OrderService:
def create_order(self, user: "User", items: list) -> "Order": ...
def cancel_order(self, order_id: int) -> None: ...
class PaymentService:
def charge(self, amount: float, card: "Card") -> "Payment": ...
def refund(self, payment_id: int) -> "Refund": ...
class NotificationService:
def send_email(self, message: "EmailMessage") -> None: ...
def send_sms(self, message: "SmsMessage") -> None: ...
Each service has a single, nameable responsibility. Each can be developed, tested, and replaced independently.
Refactoring: Extract Class; apply Single Responsibility Principle.
Smell 10: Primitive Obsession
Primitive Obsession is using raw built-in types - str, int, float - to represent domain concepts that have their own validation rules and behavior.
Before
def create_account(
email: str, # Could be any string
phone: str, # Could be any string
currency: str, # Is "EUR" valid? How about "Euros"? "eur"?
amount: float, # Dollars? Cents? Which currency?
country_code: str, # 2-letter ISO? 3-letter? No enforcement
) -> dict:
...
Nothing prevents a caller from passing currency="not-a-currency" or mixing up dollar amounts with cent amounts.
After
from dataclasses import dataclass
from enum import Enum
class Currency(str, Enum):
USD = "USD"
EUR = "EUR"
GBP = "GBP"
JPY = "JPY"
class CountryCode(str, Enum):
US = "US"
GB = "GB"
DE = "DE"
JP = "JP"
@dataclass(frozen=True)
class Email:
address: str
def __post_init__(self):
if "@" not in self.address or "." not in self.address.split("@")[-1]:
raise ValueError(f"Invalid email: {self.address!r}")
@dataclass(frozen=True)
class Money:
amount_cents: int # Always store monetary values in the smallest unit
currency: Currency
@classmethod
def from_dollars(cls, dollars: float, currency: Currency = Currency.USD) -> "Money":
return cls(amount_cents=round(dollars * 100), currency=currency)
def __add__(self, other: "Money") -> "Money":
if self.currency != other.currency:
raise ValueError(f"Cannot add {self.currency} and {other.currency}")
return Money(self.amount_cents + other.amount_cents, self.currency)
def create_account(
email: Email,
phone: str,
currency: Currency,
initial_deposit: Money,
country: CountryCode,
) -> dict:
...
Refactoring: Replace Primitive with Object; use Enum for any constrained string or integer set.
Smell 11: Comments as Excuses
The worst comments are those that apologize for the code, acknowledge that it is broken, or promise a future fix that never comes.
# BAD: excuses, not explanations
result = x * 1.0625 # I'm not sure why this works but don't touch it
# TODO: fix this properly someday
data = data[:-1] # hack to remove trailing newline
# This is terrible, I know, but we needed to ship
if status == 2 or status == 7 or status == 11:
force_refresh()
When you feel the urge to write # hack or # not sure why, the code needs to be understood and fixed - not annotated.
After
# The 6.25% is the federal excise tax rate under 26 USC section 4251
TAX_RATE = 0.0625
result = x * (1 + TAX_RATE)
# Strip the trailing newline that the legacy v2.0 API appends to all responses.
# This can be removed when we drop support for clients below v2.1.
data = data.rstrip("\n")
# These statuses indicate cache-invalidating events per the event schema
# documented at internal/docs/event-schema.md
CACHE_INVALIDATING_STATUSES = {2, 7, 11}
if status in CACHE_INVALIDATING_STATUSES:
force_refresh()
The rule: Comments should explain WHY, not WHAT. If a comment explains what the code does, rewrite the code until the comment is unnecessary.
Smell 12: Overly Defensive Code
Defensive code is good: validate inputs at system boundaries, handle errors, check preconditions. Over-defensive code wraps everything in try/except, adds None checks before every attribute access, and swallows errors it cannot meaningfully respond to.
Before
def get_user_email(user_id):
try:
if user_id is not None:
user = None
try:
user = db.get_user(user_id)
except Exception:
pass
if user is not None:
if hasattr(user, "email"):
if user.email is not None:
return user.email
except Exception:
pass
return None
A database connection failure is silently treated the same as a missing user. The caller receives None and cannot distinguish the two cases.
After
def get_user_email(user_id: int) -> str:
"""Return the email address for the given user_id.
Raises:
UserNotFoundError: if no user exists with that id.
DatabaseError: if the database query fails.
"""
user = db.get_user(user_id) # DatabaseError propagates to the caller
if user is None:
raise UserNotFoundError(f"No user found with id {user_id}")
return user.email
Be defensive at the system boundary: validate user input, handle network timeouts, check file existence before reading. Trust your own code and let unexpected errors propagate where they can be logged, monitored, and fixed at the right layer.
Smell 13: Boolean Trap
A Boolean Trap occurs when a function takes one or more boolean positional parameters whose meaning is invisible at the call site.
Before
def render_widget(data, True, False, True):
...
# At the call site: what do True, False, True mean?
render_widget(data, True, False, True)
After: Keyword-Only Arguments
def render_widget(
data: dict,
*, # Force all following args to be keyword-only
show_border: bool = True,
is_read_only: bool = False,
animate: bool = True,
) -> str:
...
# At the call site: completely self-documenting
render_widget(data, show_border=True, is_read_only=False, animate=True)
The * in the signature makes it syntactically impossible to pass those booleans positionally. For richer cases, replace multiple booleans with an Enum:
from enum import Enum
class WidgetMode(Enum):
INTERACTIVE = "interactive"
READ_ONLY = "read_only"
PREVIEW = "preview"
def render_widget(data: dict, mode: WidgetMode = WidgetMode.INTERACTIVE) -> str:
...
render_widget(data, mode=WidgetMode.READ_ONLY)
Refactoring: Use keyword-only arguments (*); replace boolean flags with Enum types.
Smell 14: Mutable Default Arguments
This is Python's most notorious subtle bug. When you use a mutable object - a list, dict, or set - as a default argument, that object is created once at function definition time and shared across all calls that rely on the default.
The Bug
def add_item(item, items=[]):
items.append(item)
return items
print(add_item("apple")) # ['apple']
print(add_item("banana")) # ['apple', 'banana'] <- not a fresh list!
print(add_item("cherry")) # ['apple', 'banana', 'cherry']
The [] is created once when Python parses the def statement. All calls that omit items share the same list, which accumulates entries across calls. This is almost never the intended behavior.
After: The Canonical Fix
def add_item(item: str, items: list[str] | None = None) -> list[str]:
"""Return a new list with item appended."""
if items is None:
items = []
items.append(item)
return items
print(add_item("apple")) # ['apple']
print(add_item("banana")) # ['banana'] <- correct: fresh list each call
None is immutable - safe as a default. The first line of the function body creates a fresh list when no argument is provided.
The Intentional Exception
Mutable defaults are sometimes used intentionally for simple caching, where persistent state across calls is the goal. This should always be documented:
def cached_fetch(url: str, _cache: dict = {}) -> str:
"""Fetch url, caching results across calls via the module-level _cache dict.
Note: _cache is intentionally mutable. Use functools.lru_cache for
production caching with size limits and thread safety.
"""
if url not in _cache:
_cache[url] = requests.get(url).text
return _cache[url]
Smell 15: Data Clumps
Data Clumps are groups of data items that always appear together but are not formally grouped. Wherever you see (latitude, longitude), (start_date, end_date), or (first_name, last_name, email), these clumps belong in their own type.
Before
def calculate_distance(lat1, lon1, lat2, lon2): ...
def format_coordinates(lat, lon, precision=4): ...
def is_within_radius(center_lat, center_lon, point_lat, point_lon, radius_km): ...
The four-parameter form of is_within_radius requires the caller to know the correct order of eight values. One transposition silently gives wrong answers.
After
from dataclasses import dataclass
import math
@dataclass(frozen=True)
class Coordinates:
latitude: float
longitude: float
def distance_to(self, other: "Coordinates") -> float:
"""Return great-circle distance to other in kilometers."""
R = 6371.0
lat1 = math.radians(self.latitude)
lon1 = math.radians(self.longitude)
lat2 = math.radians(other.latitude)
lon2 = math.radians(other.longitude)
dlat, dlon = lat2 - lat1, lon2 - lon1
a = math.sin(dlat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2)**2
return R * 2 * math.asin(math.sqrt(a))
def is_within_radius(self, center: "Coordinates", radius_km: float) -> bool:
return self.distance_to(center) <= radius_km
def __str__(self) -> str:
return f"{self.latitude:.4f}, {self.longitude:.4f}"
Refactoring: Introduce Value Object / dataclass.
Interview Questions
Q1: What is a code smell and how does it differ from a bug?
Answer: A code smell is a surface indication of a deeper design problem - a pattern in code that suggests structural issues rather than incorrect behavior. Unlike bugs, code smells do not break tests or cause immediate failures. Their harm is indirect: they make code harder to understand, change, and maintain over time, increasing the cost of every future modification. The classic examples are Long Function, Duplicate Code, and Magic Numbers. The term was coined by Kent Beck and popularized by Martin Fowler's Refactoring.
Q2: Explain the mutable default argument problem in Python and the canonical fix.
Answer: When a mutable object such as a list or dict is used as a default parameter value in Python, that object is created once at function definition time, not at each call. All calls that rely on the default share the same object, so mutations accumulate across calls - a list grows indefinitely across calls, for example. The canonical fix is to use None as the default sentinel: def f(items=None): items = [] if items is None else items. None is immutable and safe as a default. The function body creates a fresh mutable object on each call that needs one.
Q3: What is the Boolean Trap and how do keyword-only arguments eliminate it?
Answer: The Boolean Trap occurs when a function takes boolean positional arguments that are meaningless at the call site - render(data, True, False, True) is impossible to understand without reading the function definition. Keyword-only arguments, enforced by placing * in the parameter list, make it syntactically impossible to pass those values positionally. The call site becomes render(data, show_border=True, is_readonly=False, animate=True), which is completely self-documenting. For more complex cases, an Enum replaces multiple booleans with a single named state value.
Q4: Describe Feature Envy and how you recognize it in code.
Answer: Feature Envy occurs when a method uses the data or methods of another class more than its own. You recognize it when a method primarily accesses attributes of one of its parameters: order.customer.tier, order.customer.loyalty_years, order.customer.discount_rate(). The method is doing work that belongs on the class whose data it envies. The fix is to move the method - or its core logic - to the class being accessed most. The original method becomes a one-liner that delegates to the moved version. The result is that each class owns the logic that operates on its own data.
Q5: When is a comment appropriate and when does it indicate a code smell?
Answer: A comment is appropriate when it explains WHY - the business rationale, historical context, a non-obvious algorithmic constraint, or a link to an external specification. A comment is a smell when it explains WHAT the code does, because that means the code is not clear enough to be self-explanatory. "# Loop through items and calculate total" is a smell - rename the function calculate_item_total() and the comment becomes redundant. "# Using exponential backoff per RFC 7231 section 6.6.4" is a good comment - it explains a non-obvious design decision. Comments that apologize - "# hack", "# not sure why this works" - are the clearest signal that the underlying code needs to be fixed, not annotated.
Q6: What is Primitive Obsession and what Python constructs best fix it?
Answer: Primitive Obsession is using raw built-in types (str, int, float) to represent domain concepts that have their own validation rules and behavior. A parameter typed as currency: str can receive any string - "USD", "Euros", "invalid" - without a validation error. The fixes in Python: use Enum for any constrained set of string or integer values (it prevents invalid members at construction time), use @dataclass with __post_init__ validation for compound concepts with their own rules (like Email or Money), and use frozen=True when the value should be immutable. These types consolidate validation at the boundary, make function signatures self-documenting, and enable the type checker to catch misuse statically.
Practice Challenges
Beginner - Identify and Name Five Smells
Read this function and name every code smell you can find. For each smell, identify the refactoring that fixes it.
def calc(x, y, z, a=[], b=True, c=False):
# calculate the result
r = None
if x != None:
if x > 0:
if y != None:
t = x * 1.13 + y * 0.85
if z == 1:
t = t * 0.9
elif z == 2:
t = t * 0.8
elif z == 3:
t = t * 0.75
if b == True:
t = t * 1.1
a.append(t)
r = sum(a) / len(a)
return r
Solution
Smells identified:
- Mutable Default Argument (
a=[]) - Bug:aaccumulates across calls. Fix: usea=None, create[]inside the body. - Magic Numbers (
1.13,0.85,0.9,0.8,0.75,1.1) - Fix: replace with named constants. - Inconsistent / Meaningless Names (
x,y,z,a,b,c,r,t) - Fix: rename to descriptive identifiers. - Boolean Trap / Redundant Comparison (
if b == True) - Fix: useif b:. - Nested Conditionals (Arrow Anti-Pattern) - Fix: guard clauses.
- Comment as WHAT (
# calculate the result) - Fix: delete it; rename the function instead.
TAX_RATE = 1.13
SHIPPING_RATE = 0.85
CATEGORY_MULTIPLIERS = {1: 0.90, 2: 0.80, 3: 0.75}
PREMIUM_SURCHARGE = 1.10
def calculate_running_average(
base_price: float,
shipping_cost: float,
category: int,
include_premium: bool = True,
history: list[float] | None = None,
) -> float | None:
if history is None:
history = []
if base_price is None or base_price <= 0:
return None
if shipping_cost is None:
return None
total = base_price * TAX_RATE + shipping_cost * SHIPPING_RATE
total *= CATEGORY_MULTIPLIERS.get(category, 1.0)
if include_premium:
total *= PREMIUM_SURCHARGE
history.append(total)
return sum(history) / len(history)
Intermediate - Eliminate Feature Envy and a God Function
This ReportGenerator class has Feature Envy (it obsesses over User data) and a large generate method doing too much. Refactor it by moving logic to the right classes and splitting responsibilities.
class User:
def __init__(self, name, email, tier, region, signup_date):
self.name = name
self.email = email
self.tier = tier
self.region = region
self.signup_date = signup_date
class ReportGenerator:
def generate(self, user, orders, report_type, include_tax, currency, fmt):
if user.tier == "free":
discount = 0
elif user.tier == "pro":
discount = 0.10
elif user.tier == "enterprise":
discount = 0.20
else:
discount = 0
subtotal = sum(o["amount"] for o in orders)
discounted = subtotal * (1 - discount)
if include_tax:
tax_rates = {"US": 0.08, "EU": 0.20, "UK": 0.20}
tax = discounted * tax_rates.get(user.region, 0.0)
else:
tax = 0
total = discounted + tax
if fmt == "json":
return {"user": user.name, "total": total, "currency": currency}
elif fmt == "csv":
return f"{user.name},{total},{currency}"
elif fmt == "text":
return f"Report for {user.name}: {total} {currency}"
Solution
from dataclasses import dataclass
from enum import Enum
class UserTier(str, Enum):
FREE = "free"
PRO = "pro"
ENTERPRISE = "enterprise"
class Region(str, Enum):
US = "US"
EU = "EU"
UK = "UK"
class ReportFormat(str, Enum):
JSON = "json"
CSV = "csv"
TEXT = "text"
TIER_DISCOUNTS: dict[UserTier, float] = {
UserTier.FREE: 0.00,
UserTier.PRO: 0.10,
UserTier.ENTERPRISE: 0.20,
}
REGIONAL_TAX_RATES: dict[Region, float] = {
Region.US: 0.08,
Region.EU: 0.20,
Region.UK: 0.20,
}
@dataclass
class User:
name: str
email: str
tier: UserTier
region: Region
signup_date: str
def discount_rate(self) -> float:
"""Return this user's discount rate based on their tier."""
return TIER_DISCOUNTS.get(self.tier, 0.0)
def tax_rate(self) -> float:
"""Return the applicable tax rate for this user's region."""
return REGIONAL_TAX_RATES.get(self.region, 0.0)
@dataclass
class ReportData:
user: User
total: float
currency: str
def as_json(self) -> dict:
return {"user": self.user.name, "total": self.total, "currency": self.currency}
def as_csv(self) -> str:
return f"{self.user.name},{self.total},{self.currency}"
def as_text(self) -> str:
return f"Report for {self.user.name}: {self.total} {self.currency}"
def render(self, fmt: ReportFormat) -> "dict | str":
renderers = {
ReportFormat.JSON: self.as_json,
ReportFormat.CSV: self.as_csv,
ReportFormat.TEXT: self.as_text,
}
return renderers[fmt]()
class ReportGenerator:
def generate(
self,
user: User,
orders: list[dict],
report_type: str,
include_tax: bool,
currency: str,
fmt: ReportFormat,
) -> "dict | str":
subtotal = sum(o["amount"] for o in orders)
discounted = subtotal * (1 - user.discount_rate())
tax = discounted * user.tax_rate() if include_tax else 0.0
total = discounted + tax
return ReportData(user=user, total=total, currency=currency).render(fmt)
Advanced - Fix All Smells in a Real-World Webhook Handler
This function has seven distinct smells. Identify each, explain why it is a smell, and produce a fully refactored version.
def handle_webhook(payload, headers, db, cache, logger, notifier, retry=0):
try:
sig = headers.get('X-Signature')
if sig:
if sig == hashlib.sha256((payload + 'secret123').encode()).hexdigest():
data = json.loads(payload)
if data:
if data.get('type') == 'payment.completed':
order = db.get(data['order_id'])
if order:
order['status'] = 4
db.save(order)
cache.delete('order_' + str(data['order_id']))
notifier.send(order['user_email'], 'Payment received', 'Your payment was processed.')
elif data.get('type') == 'payment.failed':
order = db.get(data['order_id'])
if order:
order['status'] = 5
db.save(order)
notifier.send(order['user_email'], 'Payment failed', 'Please update your payment method.')
except:
pass
Solution
Smells identified:
- Magic Numbers -
4,5(order statuses with no names) - Magic String -
'secret123'(hardcoded secret in business logic) - Nested Conditionals - five levels; arrow anti-pattern
- Overly Defensive Code - bare
except: passsilences all errors including database failures - Duplicate Code - identical
db.get/db.save/notifier.sendpattern duplicated per event type - Long Parameter List - seven parameters; several are infrastructure services that should be injected differently
- God Function - handles signature verification, JSON parsing, event dispatch, and database/cache updates in one place
import hashlib
import json
import logging
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable
logger = logging.getLogger(__name__)
# Load from environment in production - never hardcode secrets
import os
WEBHOOK_SECRET = os.environ.get("WEBHOOK_SECRET", "secret123")
class OrderStatus(IntEnum):
PAYMENT_COMPLETED = 4
PAYMENT_FAILED = 5
@dataclass
class WebhookContext:
"""Infrastructure services needed to process webhook events."""
db: object
cache: object
notifier: object
def verify_signature(payload: str, signature: str) -> bool:
"""Return True if signature matches the HMAC of payload using WEBHOOK_SECRET."""
expected = hashlib.sha256((payload + WEBHOOK_SECRET).encode()).hexdigest()
return signature == expected
def update_order_status(order_id: int, status: OrderStatus, ctx: WebhookContext) -> dict | None:
"""Fetch, update, and save an order's status. Returns None if order not found."""
order = ctx.db.get(order_id)
if order is None:
logger.warning("Webhook referenced unknown order %s", order_id)
return None
order["status"] = status
ctx.db.save(order)
return order
def handle_payment_completed(order_id: int, ctx: WebhookContext) -> None:
order = update_order_status(order_id, OrderStatus.PAYMENT_COMPLETED, ctx)
if order is None:
return
ctx.cache.delete(f"order_{order_id}")
ctx.notifier.send(
order["user_email"],
"Payment received",
"Your payment was processed successfully.",
)
def handle_payment_failed(order_id: int, ctx: WebhookContext) -> None:
order = update_order_status(order_id, OrderStatus.PAYMENT_FAILED, ctx)
if order is None:
return
ctx.notifier.send(
order["user_email"],
"Payment failed",
"Please update your payment method to continue your subscription.",
)
EventHandler = Callable[[int, WebhookContext], None]
EVENT_HANDLERS: dict[str, EventHandler] = {
"payment.completed": handle_payment_completed,
"payment.failed": handle_payment_failed,
}
def handle_webhook(payload: str, headers: dict, ctx: WebhookContext) -> None:
"""Verify, parse, and dispatch an incoming webhook event."""
signature = headers.get("X-Signature", "")
if not verify_signature(payload, signature):
logger.warning("Webhook received with invalid signature - rejecting")
return
try:
data = json.loads(payload)
except json.JSONDecodeError as exc:
logger.error("Malformed JSON in webhook payload: %s", exc)
return
event_type = data.get("type")
handler = EVENT_HANDLERS.get(event_type)
if handler is None:
logger.info("No handler registered for webhook event type %r", event_type)
return
try:
handler(data["order_id"], ctx)
except Exception:
# Log with full traceback and re-raise so the HTTP layer can return 500
# and the webhook provider will retry delivery
logger.exception("Unhandled error processing %r for order %s", event_type, data.get("order_id"))
raise
Quick Reference
| Smell | Signal | Refactoring |
|---|---|---|
| Long Function | More than 25 lines; multiple comment sections | Extract Function |
| Long Parameter List | More than 4 parameters | Introduce Parameter Object |
| Duplicate Code | Same logic in two or more locations | Extract Function; DRY |
| Dead Code | Unused imports, commented-out code, unreachable branches | Delete it |
| Magic Numbers / Strings | Unexplained literals in logic | Replace with Named Constant or Enum |
| Inconsistent Names | Same concept with multiple names | Rename; establish team glossary |
| Nested Conditionals | Arrow anti-pattern; indentation depth 4+ | Guard Clauses / Early Return |
| Feature Envy | Method uses another class's data more than its own | Move Method |
| God Class / Function | One class/function handles everything | Extract Class; Single Responsibility |
| Primitive Obsession | Raw str/int/float for domain concepts | Replace with Value Object; Enum |
| Comments as Excuses | # hack, # not sure why, unapologetic TODOs | Fix the code; write WHY comments |
| Overly Defensive | Bare except: pass, None checks everywhere | Validate at boundary; let errors propagate |
| Boolean Trap | f(True, False, True) - meaningless at call site | Keyword-only args (*); Enum |
| Mutable Default Arg | def f(items=[]): - list shared across calls | Use None sentinel; create object in body |
| Data Clumps | Same group of primitives always travels together | Introduce Value Object / dataclass |
Key Takeaways
- A code smell is not a bug - it is a warning sign of structural problems that make code progressively more expensive to change.
- Mutable default arguments (
def f(items=[])) are a live bug in Python, not just a smell - the object is shared across all calls. - The Boolean Trap (
f(True, False)) is eliminated by keyword-only arguments (*in the signature) or Enum types. - Bare
except: passis never acceptable in production code - it silently swallows failures that should be logged, monitored, and fixed. - Dead code is not preserved by comments - version control preserves it. Delete without guilt.
- Good comments explain WHY; smelly comments explain WHAT or apologize for HOW.
- Primitive Obsession is best fixed with
@dataclassfor grouped domain concepts andEnumfor constrained value sets.
