Code Smells - Recognizing and Eliminating Bad Patterns

Reading time: ~24 minutes | Level: Foundation → Engineering

# Can you spot the five code smells in this 15-line function?
def proc(u, data, x=[], flag=True, flag2=False):
    # process the user data
    result = None
    for i in range(len(data)):
        d = data[i]
        if d != None:
            if d['t'] == 1 or d['t'] == 2 or d['t'] == 3:
                if u['active'] == True:
                    x.append(d['v'])
                    result = sum(x) / len(x)
    return result

Every line of this function is a code smell. The mutable default argument (x=[]) is a live bug. The parameter names (u, d, flag) are opaque. The if d != None should be if d is not None. The repeated d['t'] comparisons cry out for a set membership check. The nested conditionals hide the logic. No comment explains the purpose.

Code smells are not bugs. They are patterns that indicate the code was written under pressure, grew without design, or accumulated debt over time. A code smell does not cause an immediate failure - it causes the code to become progressively harder to change, understand, and trust. Left unaddressed, smells compound: a function that is too long also tends to have magic numbers, nested conditionals, and inconsistent names.

Learning to name code smells is a professional skill. Once you can say "this is Feature Envy" or "this is a Boolean Trap," you can communicate problems precisely with your team and apply the correct refactoring rather than guessing.

What You Will Learn

The formal definition of a code smell and why it matters to name them
Long Function and how to apply Extract Function systematically
Long Parameter List and the Parameter Object solution
Duplicate Code (DRY violations) and how to identify copy-paste programming
Dead Code - unused imports, commented-out code, unreachable branches
Magic Numbers and Strings - hardcoded values that obscure intent
Inconsistent Names across a codebase
Nested Conditionals (the arrow anti-pattern) and guard clauses
Feature Envy - a method that belongs in another class
God Class / God Function - the single-responsibility violation at scale
Primitive Obsession - passing raw strings and ints for domain concepts
Comments as Excuses - the worst kind of comment
Overly Defensive Code - try/except everywhere, None checks on every line
Boolean Trap - positional boolean arguments that mean nothing at the call site
Mutable Default Arguments - Python's most notorious subtle bug

Prerequisites

Understanding of Python functions, classes, and modules
Familiarity with the refactoring concepts from the previous lesson
Basic understanding of dataclass, enum, and type hints

What Is a Code Smell?

The term was introduced by Kent Beck and popularized by Martin Fowler's book Refactoring. A code smell is a surface indication that usually corresponds to a deeper problem in the system. Like a biological smell, it is a signal worth investigating - not always a confirmed problem, but never a signal to ignore.

Smells differ from bugs:

	Bug	Code Smell
Breaks tests?	Yes	No
Immediate user impact?	Usually	Rarely
Harm caused	Incorrect behavior	Increased cost of change
Fix urgency	Immediate	During refactoring cycles

The cost of a code smell is measured in time: the extra time it takes to understand the code, the extra time it takes to change it safely, and the extra time required to debug the inevitable bug that the smell eventually enables.

Smell 1: Long Function

A function that spans more than 20-30 lines almost always has more than one responsibility. Long functions are harder to test (you must construct the full context to trigger any internal branch), harder to name accurately, and harder to reuse.

Before

def generate_invoice(order_id: int) -> dict:
    order = db.query("SELECT * FROM orders WHERE id = ?", order_id)
    if not order:
        raise ValueError(f"Order {order_id} not found")

    items = db.query("SELECT * FROM order_items WHERE order_id = ?", order_id)
    customer = db.query("SELECT * FROM customers WHERE id = ?", order["customer_id"])

    subtotal = 0
    for item in items:
        product = db.query("SELECT * FROM products WHERE id = ?", item["product_id"])
        line_price = product["price"] * item["quantity"]
        if item.get("discount_percent"):
            line_price *= (1 - item["discount_percent"] / 100)
        subtotal += line_price

    if customer["tier"] == "gold":
        subtotal *= 0.95
    elif customer["tier"] == "platinum":
        subtotal *= 0.90

    tax_rates = {"CA": 0.0725, "NY": 0.08, "TX": 0.0625}
    tax_rate = tax_rates.get(customer["state"], 0.07)
    tax = subtotal * tax_rate

    invoice_number = f"INV-{order_id:06d}"
    return {
        "invoice_number": invoice_number,
        "customer": customer["name"],
        "items": items,
        "subtotal": round(subtotal, 2),
        "tax": round(tax, 2),
        "total": round(subtotal + tax, 2),
    }

After

TIER_DISCOUNTS = {"gold": 0.05, "platinum": 0.10}
STATE_TAX_RATES = {"CA": 0.0725, "NY": 0.08, "TX": 0.0625}
DEFAULT_TAX_RATE = 0.07

def calculate_line_price(item: dict) -> float:
    product = db.query("SELECT * FROM products WHERE id = ?", item["product_id"])
    price = product["price"] * item["quantity"]
    if item.get("discount_percent"):
        price *= 1 - item["discount_percent"] / 100
    return price

def calculate_subtotal(items: list[dict]) -> float:
    return sum(calculate_line_price(item) for item in items)

def apply_tier_discount(subtotal: float, customer_tier: str) -> float:
    discount = TIER_DISCOUNTS.get(customer_tier, 0.0)
    return subtotal * (1 - discount)

def calculate_tax(subtotal: float, state: str) -> float:
    rate = STATE_TAX_RATES.get(state, DEFAULT_TAX_RATE)
    return subtotal * rate

def generate_invoice(order_id: int) -> dict:
    order = db.query("SELECT * FROM orders WHERE id = ?", order_id)
    if not order:
        raise ValueError(f"Order {order_id} not found")
    items = db.query("SELECT * FROM order_items WHERE order_id = ?", order_id)
    customer = db.query("SELECT * FROM customers WHERE id = ?", order["customer_id"])

    subtotal = apply_tier_discount(calculate_subtotal(items), customer["tier"])
    tax = calculate_tax(subtotal, customer["state"])

    return {
        "invoice_number": f"INV-{order_id:06d}",
        "customer": customer["name"],
        "items": items,
        "subtotal": round(subtotal, 2),
        "tax": round(tax, 2),
        "total": round(subtotal + tax, 2),
    }

Refactoring: Extract Function.

Smell 2: Long Parameter List

A function with more than four parameters is hard to call correctly. The caller must track the order and meaning of each argument. Long parameter lists also tend to grow over time: each new requirement adds one more parameter.

Before

def send_email(
    to_address, subject, body, from_address,
    reply_to, cc, bcc, is_html, track_opens,
    campaign_id, template_id, attachments
):
    ...

After

from dataclasses import dataclass, field

@dataclass
class EmailMessage:
    to_address: str
    subject: str
    body: str
    from_address: str = "[email protected]"
    reply_to: str | None = None
    cc: list[str] = field(default_factory=list)
    bcc: list[str] = field(default_factory=list)
    is_html: bool = False
    track_opens: bool = True
    campaign_id: str | None = None
    template_id: str | None = None
    attachments: list[str] = field(default_factory=list)

def send_email(message: EmailMessage) -> None:
    ...

The call site now reads clearly and is immune to argument-order mistakes:

send_email(EmailMessage(
    to_address="[email protected]",
    subject="Your invoice is ready",
    body="<p>Please find your invoice attached.</p>",
    is_html=True,
    attachments=["invoice_1234.pdf"],
))

Refactoring: Introduce Parameter Object.

Smell 3: Duplicate Code

Duplicate code - the same logic copy-pasted into two or more places - is one of the most expensive smells. When the logic changes (and it always does), every copy must be updated. One copy always gets missed.

Before

def get_admin_report(start_date, end_date):
    orders = db.query(
        "SELECT * FROM orders WHERE created_at BETWEEN ? AND ? AND status != 'cancelled'",
        start_date, end_date
    )
    total = sum(o["total"] for o in orders)
    return {"count": len(orders), "total": total, "orders": orders}

def get_manager_report(start_date, end_date, region):
    orders = db.query(
        "SELECT * FROM orders WHERE created_at BETWEEN ? AND ? AND status != 'cancelled' AND region = ?",
        start_date, end_date, region
    )
    total = sum(o["total"] for o in orders)  # identical logic
    return {"count": len(orders), "total": total, "orders": orders}

After

def summarize_orders(orders: list[dict]) -> dict:
    """Build a summary dict from a list of order records."""
    return {
        "count": len(orders),
        "total": sum(o["total"] for o in orders),
        "orders": orders,
    }

def get_active_orders(start_date, end_date, region: str | None = None) -> list[dict]:
    """Fetch non-cancelled orders, optionally filtered by region."""
    if region:
        return db.query(
            "SELECT * FROM orders WHERE created_at BETWEEN ? AND ? "
            "AND status != 'cancelled' AND region = ?",
            start_date, end_date, region,
        )
    return db.query(
        "SELECT * FROM orders WHERE created_at BETWEEN ? AND ? AND status != 'cancelled'",
        start_date, end_date,
    )

def get_admin_report(start_date, end_date) -> dict:
    return summarize_orders(get_active_orders(start_date, end_date))

def get_manager_report(start_date, end_date, region: str) -> dict:
    return summarize_orders(get_active_orders(start_date, end_date, region))

Refactoring: Extract Function; apply DRY (Don't Repeat Yourself).

Smell 4: Dead Code

Dead code is code that can never be reached, is never called, or was commented out and never deleted. It adds noise, confuses readers, and wastes maintenance attention.

# Dead import
import os  # Never used anywhere in this file

# Dead function - nothing calls this anymore
def _old_validate_email(email):
    return "@" in email

# Commented-out code - should be deleted, not parked here
# def send_slack_notification(channel, message):
#     requests.post(SLACK_WEBHOOK, json={"channel": channel, "text": message})

def process_event(event):
    if event["type"] == "login":
        handle_login(event)
    elif event["type"] == "logout":
        handle_logout(event)
    else:
        pass  # dead: the else was never cleaned up after a feature removal

After

def process_event(event: dict) -> None:
    if event["type"] == "login":
        handle_login(event)
    elif event["type"] == "logout":
        handle_logout(event)
    # Unhandled event types are silently ignored by design (see ADR-0017)

Tool: ruff check --select F401 removes unused imports automatically. vulture detects unreachable functions and variables. Version control is the real archive - deleted code lives in git history; there is no reason to keep it in comments.

Smell 5: Magic Numbers and Strings

A magic number or string is a literal value embedded in code with no explanation of its purpose. The value 650 means nothing. MINIMUM_CREDIT_SCORE = 650 is self-documenting.

Before

def is_eligible(user):
    return user.age >= 18 and user.credit_score >= 650

def calculate_fee(amount):
    if amount < 1000:
        return amount * 0.029 + 0.30
    return amount * 0.022 + 0.30

if response.status_code == 429:
    time.sleep(60)
    retry()

After

MINIMUM_ELIGIBLE_AGE = 18
MINIMUM_CREDIT_SCORE = 650

SMALL_TRANSACTION_THRESHOLD = 1000.00
SMALL_TRANSACTION_RATE = 0.029      # 2.9%
LARGE_TRANSACTION_RATE = 0.022      # 2.2%
FIXED_TRANSACTION_FEE  = 0.30       # $0.30 per transaction

HTTP_TOO_MANY_REQUESTS = 429
RATE_LIMIT_RETRY_DELAY_SECONDS = 60

def is_eligible(user) -> bool:
    return user.age >= MINIMUM_ELIGIBLE_AGE and user.credit_score >= MINIMUM_CREDIT_SCORE

def calculate_fee(amount: float) -> float:
    rate = SMALL_TRANSACTION_RATE if amount < SMALL_TRANSACTION_THRESHOLD else LARGE_TRANSACTION_RATE
    return amount * rate + FIXED_TRANSACTION_FEE

if response.status_code == HTTP_TOO_MANY_REQUESTS:
    time.sleep(RATE_LIMIT_RETRY_DELAY_SECONDS)
    retry()

Refactoring: Replace Magic Number with Symbolic Constant. For constrained string/integer sets, prefer Enum.

Smell 6: Inconsistent Names

Using different names for the same concept across a codebase forces the reader to constantly resolve mental aliases. user, usr, u, current_user, and the_user are all the same concept written by different engineers on different days.

Before

# In auth.py
def get_current_user(token): ...

# In orders.py
def create_order(usr, items): ...

# In billing.py
def charge(u, amount): ...

# In models.py
class UserProfile:  # But the DB table is called "accounts"
    ...

After

Establish a team glossary and enforce it with type annotations:

# Glossary decision: the concept is always "user", Python type is User

# In auth.py
def get_current_user(token: str) -> "User": ...

# In orders.py
def create_order(user: "User", items: list["OrderItem"]) -> "Order": ...

# In billing.py
def charge(user: "User", amount: "Money") -> "Receipt": ...

# In models.py
class User:  # DB table aliased in ORM config, concept name wins
    ...

Type annotations are your best ally for consistency. If every function that accepts a user object uses the type annotation User, a type checker will flag any mismatch and grep will find every usage site.

Smell 7: Nested Conditionals (The Arrow Anti-Pattern)

Deeply nested if/else structures - sometimes four or five levels deep - force the reader to track a growing stack of context. The code visually resembles an arrow pointing right.

Before

def process_withdrawal(user, account, amount):
    if user is not None:
        if user.is_verified:
            if account is not None:
                if account.balance >= amount:
                    if amount <= user.daily_limit:
                        account.balance -= amount
                        return "success"
                    else:
                        return "exceeds_daily_limit"
                else:
                    return "insufficient_funds"
            else:
                return "account_not_found"
        else:
            return "user_not_verified"
    else:
        return "user_not_found"

After: Guard Clauses

def process_withdrawal(user, account, amount: float) -> str:
    if user is None:
        return "user_not_found"
    if not user.is_verified:
        return "user_not_verified"
    if account is None:
        return "account_not_found"
    if account.balance < amount:
        return "insufficient_funds"
    if amount > user.daily_limit:
        return "exceeds_daily_limit"

    account.balance -= amount
    return "success"

Each guard clause is one line, one reason, one exit. The happy path - the last two lines - is at zero indentation. The reader can scan the failure conditions in a few seconds without tracking any nesting context.

Refactoring: Collapse Else Clause, Replace Nested Conditional with Guard Clauses.

Smell 8: Feature Envy

A method has Feature Envy when it uses the data and methods of another class more than its own. It is a sign the method is in the wrong class.

Before

class Order:
    def __init__(self, customer, items, discount_code):
        self.customer = customer
        self.items = items
        self.discount_code = discount_code

class InvoiceGenerator:
    def calculate_customer_discount(self, order: Order) -> float:
        # This method only touches order.customer - it envies Customer
        if order.customer.tier == "gold":
            return 0.10
        elif order.customer.tier == "platinum":
            return 0.15
        elif order.customer.loyalty_years >= 5:
            return 0.08
        return 0.0

calculate_customer_discount is on InvoiceGenerator but exclusively uses customer data. The method belongs on Customer.

After

class Customer:
    def __init__(self, tier: str, loyalty_years: int):
        self.tier = tier
        self.loyalty_years = loyalty_years

    def discount_rate(self) -> float:
        """Return this customer's applicable discount rate."""
        if self.tier == "gold":
            return 0.10
        if self.tier == "platinum":
            return 0.15
        if self.loyalty_years >= 5:
            return 0.08
        return 0.0

class InvoiceGenerator:
    def calculate_customer_discount(self, order: "Order") -> float:
        # One line - InvoiceGenerator no longer envies Customer
        return order.customer.discount_rate()

Refactoring: Move Method.

Smell 9: God Class / God Function

A God Class is a class that knows everything and does everything. It becomes the bottleneck for every feature request. A God Function is the same problem at the function level - one function that handles end-to-end processing of an entire domain.

Before (Abbreviated)

class App:
    def login(self, email, password): ...
    def logout(self): ...
    def create_order(self, items): ...
    def cancel_order(self, order_id): ...
    def process_payment(self, amount, card): ...
    def refund_payment(self, payment_id): ...
    def send_email(self, to, subject, body): ...
    def send_sms(self, phone, body): ...
    def generate_report(self, report_type): ...
    def upload_file(self, file, bucket): ...
    def resize_image(self, path, width, height): ...
    def geocode_address(self, address): ...
    # 40 more methods...

After: Split by Responsibility

class AuthService:
    def login(self, email: str, password: str) -> "User": ...
    def logout(self, user: "User") -> None: ...

class OrderService:
    def create_order(self, user: "User", items: list) -> "Order": ...
    def cancel_order(self, order_id: int) -> None: ...

class PaymentService:
    def charge(self, amount: float, card: "Card") -> "Payment": ...
    def refund(self, payment_id: int) -> "Refund": ...

class NotificationService:
    def send_email(self, message: "EmailMessage") -> None: ...
    def send_sms(self, message: "SmsMessage") -> None: ...

Each service has a single, nameable responsibility. Each can be developed, tested, and replaced independently.

Refactoring: Extract Class; apply Single Responsibility Principle.

Smell 10: Primitive Obsession

Primitive Obsession is using raw built-in types - str, int, float - to represent domain concepts that have their own validation rules and behavior.

Before

def create_account(
    email: str,          # Could be any string
    phone: str,          # Could be any string
    currency: str,       # Is "EUR" valid? How about "Euros"? "eur"?
    amount: float,       # Dollars? Cents? Which currency?
    country_code: str,   # 2-letter ISO? 3-letter? No enforcement
) -> dict:
    ...

Nothing prevents a caller from passing currency="not-a-currency" or mixing up dollar amounts with cent amounts.

After

from dataclasses import dataclass
from enum import Enum

class Currency(str, Enum):
    USD = "USD"
    EUR = "EUR"
    GBP = "GBP"
    JPY = "JPY"

class CountryCode(str, Enum):
    US = "US"
    GB = "GB"
    DE = "DE"
    JP = "JP"

@dataclass(frozen=True)
class Email:
    address: str

    def __post_init__(self):
        if "@" not in self.address or "." not in self.address.split("@")[-1]:
            raise ValueError(f"Invalid email: {self.address!r}")

@dataclass(frozen=True)
class Money:
    amount_cents: int    # Always store monetary values in the smallest unit
    currency: Currency

    @classmethod
    def from_dollars(cls, dollars: float, currency: Currency = Currency.USD) -> "Money":
        return cls(amount_cents=round(dollars * 100), currency=currency)

    def __add__(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError(f"Cannot add {self.currency} and {other.currency}")
        return Money(self.amount_cents + other.amount_cents, self.currency)

def create_account(
    email: Email,
    phone: str,
    currency: Currency,
    initial_deposit: Money,
    country: CountryCode,
) -> dict:
    ...

Refactoring: Replace Primitive with Object; use Enum for any constrained string or integer set.

Smell 11: Comments as Excuses

The worst comments are those that apologize for the code, acknowledge that it is broken, or promise a future fix that never comes.

# BAD: excuses, not explanations
result = x * 1.0625  # I'm not sure why this works but don't touch it

# TODO: fix this properly someday
data = data[:-1]  # hack to remove trailing newline

# This is terrible, I know, but we needed to ship
if status == 2 or status == 7 or status == 11:
    force_refresh()

When you feel the urge to write # hack or # not sure why, the code needs to be understood and fixed - not annotated.

After

# The 6.25% is the federal excise tax rate under 26 USC section 4251
TAX_RATE = 0.0625
result = x * (1 + TAX_RATE)

# Strip the trailing newline that the legacy v2.0 API appends to all responses.
# This can be removed when we drop support for clients below v2.1.
data = data.rstrip("\n")

# These statuses indicate cache-invalidating events per the event schema
# documented at internal/docs/event-schema.md
CACHE_INVALIDATING_STATUSES = {2, 7, 11}
if status in CACHE_INVALIDATING_STATUSES:
    force_refresh()

The rule: Comments should explain WHY, not WHAT. If a comment explains what the code does, rewrite the code until the comment is unnecessary.

Smell 12: Overly Defensive Code

Defensive code is good: validate inputs at system boundaries, handle errors, check preconditions. Over-defensive code wraps everything in try/except, adds None checks before every attribute access, and swallows errors it cannot meaningfully respond to.

Before

def get_user_email(user_id):
    try:
        if user_id is not None:
            user = None
            try:
                user = db.get_user(user_id)
            except Exception:
                pass
            if user is not None:
                if hasattr(user, "email"):
                    if user.email is not None:
                        return user.email
    except Exception:
        pass
    return None

A database connection failure is silently treated the same as a missing user. The caller receives None and cannot distinguish the two cases.

After

def get_user_email(user_id: int) -> str:
    """Return the email address for the given user_id.

    Raises:
        UserNotFoundError: if no user exists with that id.
        DatabaseError: if the database query fails.
    """
    user = db.get_user(user_id)   # DatabaseError propagates to the caller
    if user is None:
        raise UserNotFoundError(f"No user found with id {user_id}")
    return user.email

Be defensive at the system boundary: validate user input, handle network timeouts, check file existence before reading. Trust your own code and let unexpected errors propagate where they can be logged, monitored, and fixed at the right layer.

Smell 13: Boolean Trap

A Boolean Trap occurs when a function takes one or more boolean positional parameters whose meaning is invisible at the call site.

Before

def render_widget(data, True, False, True):
    ...

# At the call site: what do True, False, True mean?
render_widget(data, True, False, True)

After: Keyword-Only Arguments

def render_widget(
    data: dict,
    *,                             # Force all following args to be keyword-only
    show_border: bool = True,
    is_read_only: bool = False,
    animate: bool = True,
) -> str:
    ...

# At the call site: completely self-documenting
render_widget(data, show_border=True, is_read_only=False, animate=True)

The * in the signature makes it syntactically impossible to pass those booleans positionally. For richer cases, replace multiple booleans with an Enum:

from enum import Enum

class WidgetMode(Enum):
    INTERACTIVE = "interactive"
    READ_ONLY   = "read_only"
    PREVIEW     = "preview"

def render_widget(data: dict, mode: WidgetMode = WidgetMode.INTERACTIVE) -> str:
    ...

render_widget(data, mode=WidgetMode.READ_ONLY)

Refactoring: Use keyword-only arguments (*); replace boolean flags with Enum types.

Smell 14: Mutable Default Arguments

This is Python's most notorious subtle bug. When you use a mutable object - a list, dict, or set - as a default argument, that object is created once at function definition time and shared across all calls that rely on the default.

The Bug

def add_item(item, items=[]):
    items.append(item)
    return items

print(add_item("apple"))   # ['apple']
print(add_item("banana"))  # ['apple', 'banana']  <- not a fresh list!
print(add_item("cherry"))  # ['apple', 'banana', 'cherry']

The [] is created once when Python parses the def statement. All calls that omit items share the same list, which accumulates entries across calls. This is almost never the intended behavior.

After: The Canonical Fix

def add_item(item: str, items: list[str] | None = None) -> list[str]:
    """Return a new list with item appended."""
    if items is None:
        items = []
    items.append(item)
    return items

print(add_item("apple"))   # ['apple']
print(add_item("banana"))  # ['banana']  <- correct: fresh list each call

None is immutable - safe as a default. The first line of the function body creates a fresh list when no argument is provided.

The Intentional Exception

Mutable defaults are sometimes used intentionally for simple caching, where persistent state across calls is the goal. This should always be documented:

def cached_fetch(url: str, _cache: dict = {}) -> str:
    """Fetch url, caching results across calls via the module-level _cache dict.

    Note: _cache is intentionally mutable. Use functools.lru_cache for
    production caching with size limits and thread safety.
    """
    if url not in _cache:
        _cache[url] = requests.get(url).text
    return _cache[url]

Smell 15: Data Clumps

Data Clumps are groups of data items that always appear together but are not formally grouped. Wherever you see (latitude, longitude), (start_date, end_date), or (first_name, last_name, email), these clumps belong in their own type.

Before

def calculate_distance(lat1, lon1, lat2, lon2): ...
def format_coordinates(lat, lon, precision=4): ...
def is_within_radius(center_lat, center_lon, point_lat, point_lon, radius_km): ...

The four-parameter form of is_within_radius requires the caller to know the correct order of eight values. One transposition silently gives wrong answers.

After

from dataclasses import dataclass
import math

@dataclass(frozen=True)
class Coordinates:
    latitude: float
    longitude: float

    def distance_to(self, other: "Coordinates") -> float:
        """Return great-circle distance to other in kilometers."""
        R = 6371.0
        lat1 = math.radians(self.latitude)
        lon1 = math.radians(self.longitude)
        lat2 = math.radians(other.latitude)
        lon2 = math.radians(other.longitude)
        dlat, dlon = lat2 - lat1, lon2 - lon1
        a = math.sin(dlat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2)**2
        return R * 2 * math.asin(math.sqrt(a))

    def is_within_radius(self, center: "Coordinates", radius_km: float) -> bool:
        return self.distance_to(center) <= radius_km

    def __str__(self) -> str:
        return f"{self.latitude:.4f}, {self.longitude:.4f}"

Refactoring: Introduce Value Object / dataclass.

Interview Questions

Q1: What is a code smell and how does it differ from a bug?

Answer: A code smell is a surface indication of a deeper design problem - a pattern in code that suggests structural issues rather than incorrect behavior. Unlike bugs, code smells do not break tests or cause immediate failures. Their harm is indirect: they make code harder to understand, change, and maintain over time, increasing the cost of every future modification. The classic examples are Long Function, Duplicate Code, and Magic Numbers. The term was coined by Kent Beck and popularized by Martin Fowler's Refactoring.

Q2: Explain the mutable default argument problem in Python and the canonical fix.

Answer: When a mutable object such as a list or dict is used as a default parameter value in Python, that object is created once at function definition time, not at each call. All calls that rely on the default share the same object, so mutations accumulate across calls - a list grows indefinitely across calls, for example. The canonical fix is to use None as the default sentinel: def f(items=None): items = [] if items is None else items. None is immutable and safe as a default. The function body creates a fresh mutable object on each call that needs one.

Q3: What is the Boolean Trap and how do keyword-only arguments eliminate it?

Answer: The Boolean Trap occurs when a function takes boolean positional arguments that are meaningless at the call site - render(data, True, False, True) is impossible to understand without reading the function definition. Keyword-only arguments, enforced by placing * in the parameter list, make it syntactically impossible to pass those values positionally. The call site becomes render(data, show_border=True, is_readonly=False, animate=True), which is completely self-documenting. For more complex cases, an Enum replaces multiple booleans with a single named state value.

Q4: Describe Feature Envy and how you recognize it in code.

Answer: Feature Envy occurs when a method uses the data or methods of another class more than its own. You recognize it when a method primarily accesses attributes of one of its parameters: order.customer.tier, order.customer.loyalty_years, order.customer.discount_rate(). The method is doing work that belongs on the class whose data it envies. The fix is to move the method - or its core logic - to the class being accessed most. The original method becomes a one-liner that delegates to the moved version. The result is that each class owns the logic that operates on its own data.

Q5: When is a comment appropriate and when does it indicate a code smell?

Answer: A comment is appropriate when it explains WHY - the business rationale, historical context, a non-obvious algorithmic constraint, or a link to an external specification. A comment is a smell when it explains WHAT the code does, because that means the code is not clear enough to be self-explanatory. "# Loop through items and calculate total" is a smell - rename the function calculate_item_total() and the comment becomes redundant. "# Using exponential backoff per RFC 7231 section 6.6.4" is a good comment - it explains a non-obvious design decision. Comments that apologize - "# hack", "# not sure why this works" - are the clearest signal that the underlying code needs to be fixed, not annotated.

Q6: What is Primitive Obsession and what Python constructs best fix it?

Answer: Primitive Obsession is using raw built-in types (str, int, float) to represent domain concepts that have their own validation rules and behavior. A parameter typed as currency: str can receive any string - "USD", "Euros", "invalid" - without a validation error. The fixes in Python: use Enum for any constrained set of string or integer values (it prevents invalid members at construction time), use @dataclass with __post_init__ validation for compound concepts with their own rules (like Email or Money), and use frozen=True when the value should be immutable. These types consolidate validation at the boundary, make function signatures self-documenting, and enable the type checker to catch misuse statically.

Practice Challenges

Beginner - Identify and Name Five Smells

Read this function and name every code smell you can find. For each smell, identify the refactoring that fixes it.

def calc(x, y, z, a=[], b=True, c=False):
    # calculate the result
    r = None
    if x != None:
        if x > 0:
            if y != None:
                t = x * 1.13 + y * 0.85
                if z == 1:
                    t = t * 0.9
                elif z == 2:
                    t = t * 0.8
                elif z == 3:
                    t = t * 0.75
                if b == True:
                    t = t * 1.1
                a.append(t)
                r = sum(a) / len(a)
    return r

Solution

Smells identified:

Mutable Default Argument (a=[]) - Bug: a accumulates across calls. Fix: use a=None, create [] inside the body.
Magic Numbers (1.13, 0.85, 0.9, 0.8, 0.75, 1.1) - Fix: replace with named constants.
Inconsistent / Meaningless Names (x, y, z, a, b, c, r, t) - Fix: rename to descriptive identifiers.
Boolean Trap / Redundant Comparison (if b == True) - Fix: use if b:.
Nested Conditionals (Arrow Anti-Pattern) - Fix: guard clauses.
Comment as WHAT (# calculate the result) - Fix: delete it; rename the function instead.

TAX_RATE         = 1.13
SHIPPING_RATE    = 0.85
CATEGORY_MULTIPLIERS = {1: 0.90, 2: 0.80, 3: 0.75}
PREMIUM_SURCHARGE = 1.10

def calculate_running_average(
    base_price: float,
    shipping_cost: float,
    category: int,
    include_premium: bool = True,
    history: list[float] | None = None,
) -> float | None:
    if history is None:
        history = []
    if base_price is None or base_price <= 0:
        return None
    if shipping_cost is None:
        return None

    total = base_price * TAX_RATE + shipping_cost * SHIPPING_RATE
    total *= CATEGORY_MULTIPLIERS.get(category, 1.0)
    if include_premium:
        total *= PREMIUM_SURCHARGE

    history.append(total)
    return sum(history) / len(history)

Intermediate - Eliminate Feature Envy and a God Function

This ReportGenerator class has Feature Envy (it obsesses over User data) and a large generate method doing too much. Refactor it by moving logic to the right classes and splitting responsibilities.

class User:
    def __init__(self, name, email, tier, region, signup_date):
        self.name = name
        self.email = email
        self.tier = tier
        self.region = region
        self.signup_date = signup_date

class ReportGenerator:
    def generate(self, user, orders, report_type, include_tax, currency, fmt):
        if user.tier == "free":
            discount = 0
        elif user.tier == "pro":
            discount = 0.10
        elif user.tier == "enterprise":
            discount = 0.20
        else:
            discount = 0

        subtotal = sum(o["amount"] for o in orders)
        discounted = subtotal * (1 - discount)

        if include_tax:
            tax_rates = {"US": 0.08, "EU": 0.20, "UK": 0.20}
            tax = discounted * tax_rates.get(user.region, 0.0)
        else:
            tax = 0

        total = discounted + tax

        if fmt == "json":
            return {"user": user.name, "total": total, "currency": currency}
        elif fmt == "csv":
            return f"{user.name},{total},{currency}"
        elif fmt == "text":
            return f"Report for {user.name}: {total} {currency}"

Solution

from dataclasses import dataclass
from enum import Enum

class UserTier(str, Enum):
    FREE       = "free"
    PRO        = "pro"
    ENTERPRISE = "enterprise"

class Region(str, Enum):
    US = "US"
    EU = "EU"
    UK = "UK"

class ReportFormat(str, Enum):
    JSON = "json"
    CSV  = "csv"
    TEXT = "text"

TIER_DISCOUNTS: dict[UserTier, float] = {
    UserTier.FREE:       0.00,
    UserTier.PRO:        0.10,
    UserTier.ENTERPRISE: 0.20,
}

REGIONAL_TAX_RATES: dict[Region, float] = {
    Region.US: 0.08,
    Region.EU: 0.20,
    Region.UK: 0.20,
}

@dataclass
class User:
    name: str
    email: str
    tier: UserTier
    region: Region
    signup_date: str

    def discount_rate(self) -> float:
        """Return this user's discount rate based on their tier."""
        return TIER_DISCOUNTS.get(self.tier, 0.0)

    def tax_rate(self) -> float:
        """Return the applicable tax rate for this user's region."""
        return REGIONAL_TAX_RATES.get(self.region, 0.0)

@dataclass
class ReportData:
    user: User
    total: float
    currency: str

    def as_json(self) -> dict:
        return {"user": self.user.name, "total": self.total, "currency": self.currency}

    def as_csv(self) -> str:
        return f"{self.user.name},{self.total},{self.currency}"

    def as_text(self) -> str:
        return f"Report for {self.user.name}: {self.total} {self.currency}"

    def render(self, fmt: ReportFormat) -> "dict | str":
        renderers = {
            ReportFormat.JSON: self.as_json,
            ReportFormat.CSV:  self.as_csv,
            ReportFormat.TEXT: self.as_text,
        }
        return renderers[fmt]()

class ReportGenerator:
    def generate(
        self,
        user: User,
        orders: list[dict],
        report_type: str,
        include_tax: bool,
        currency: str,
        fmt: ReportFormat,
    ) -> "dict | str":
        subtotal = sum(o["amount"] for o in orders)
        discounted = subtotal * (1 - user.discount_rate())
        tax = discounted * user.tax_rate() if include_tax else 0.0
        total = discounted + tax
        return ReportData(user=user, total=total, currency=currency).render(fmt)

Advanced - Fix All Smells in a Real-World Webhook Handler

This function has seven distinct smells. Identify each, explain why it is a smell, and produce a fully refactored version.

def handle_webhook(payload, headers, db, cache, logger, notifier, retry=0):
    try:
        sig = headers.get('X-Signature')
        if sig:
            if sig == hashlib.sha256((payload + 'secret123').encode()).hexdigest():
                data = json.loads(payload)
                if data:
                    if data.get('type') == 'payment.completed':
                        order = db.get(data['order_id'])
                        if order:
                            order['status'] = 4
                            db.save(order)
                            cache.delete('order_' + str(data['order_id']))
                            notifier.send(order['user_email'], 'Payment received', 'Your payment was processed.')
                    elif data.get('type') == 'payment.failed':
                        order = db.get(data['order_id'])
                        if order:
                            order['status'] = 5
                            db.save(order)
                            notifier.send(order['user_email'], 'Payment failed', 'Please update your payment method.')
    except:
        pass

Solution

Smells identified:

Magic Numbers - 4, 5 (order statuses with no names)
Magic String - 'secret123' (hardcoded secret in business logic)
Nested Conditionals - five levels; arrow anti-pattern
Overly Defensive Code - bare except: pass silences all errors including database failures
Duplicate Code - identical db.get / db.save / notifier.send pattern duplicated per event type
Long Parameter List - seven parameters; several are infrastructure services that should be injected differently
God Function - handles signature verification, JSON parsing, event dispatch, and database/cache updates in one place

import hashlib
import json
import logging
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable

logger = logging.getLogger(__name__)

# Load from environment in production - never hardcode secrets
import os
WEBHOOK_SECRET = os.environ.get("WEBHOOK_SECRET", "secret123")

class OrderStatus(IntEnum):
    PAYMENT_COMPLETED = 4
    PAYMENT_FAILED    = 5

@dataclass
class WebhookContext:
    """Infrastructure services needed to process webhook events."""
    db: object
    cache: object
    notifier: object


def verify_signature(payload: str, signature: str) -> bool:
    """Return True if signature matches the HMAC of payload using WEBHOOK_SECRET."""
    expected = hashlib.sha256((payload + WEBHOOK_SECRET).encode()).hexdigest()
    return signature == expected


def update_order_status(order_id: int, status: OrderStatus, ctx: WebhookContext) -> dict | None:
    """Fetch, update, and save an order's status. Returns None if order not found."""
    order = ctx.db.get(order_id)
    if order is None:
        logger.warning("Webhook referenced unknown order %s", order_id)
        return None
    order["status"] = status
    ctx.db.save(order)
    return order


def handle_payment_completed(order_id: int, ctx: WebhookContext) -> None:
    order = update_order_status(order_id, OrderStatus.PAYMENT_COMPLETED, ctx)
    if order is None:
        return
    ctx.cache.delete(f"order_{order_id}")
    ctx.notifier.send(
        order["user_email"],
        "Payment received",
        "Your payment was processed successfully.",
    )


def handle_payment_failed(order_id: int, ctx: WebhookContext) -> None:
    order = update_order_status(order_id, OrderStatus.PAYMENT_FAILED, ctx)
    if order is None:
        return
    ctx.notifier.send(
        order["user_email"],
        "Payment failed",
        "Please update your payment method to continue your subscription.",
    )


EventHandler = Callable[[int, WebhookContext], None]

EVENT_HANDLERS: dict[str, EventHandler] = {
    "payment.completed": handle_payment_completed,
    "payment.failed":    handle_payment_failed,
}


def handle_webhook(payload: str, headers: dict, ctx: WebhookContext) -> None:
    """Verify, parse, and dispatch an incoming webhook event."""
    signature = headers.get("X-Signature", "")
    if not verify_signature(payload, signature):
        logger.warning("Webhook received with invalid signature - rejecting")
        return

    try:
        data = json.loads(payload)
    except json.JSONDecodeError as exc:
        logger.error("Malformed JSON in webhook payload: %s", exc)
        return

    event_type = data.get("type")
    handler = EVENT_HANDLERS.get(event_type)
    if handler is None:
        logger.info("No handler registered for webhook event type %r", event_type)
        return

    try:
        handler(data["order_id"], ctx)
    except Exception:
        # Log with full traceback and re-raise so the HTTP layer can return 500
        # and the webhook provider will retry delivery
        logger.exception("Unhandled error processing %r for order %s", event_type, data.get("order_id"))
        raise

Quick Reference

Smell	Signal	Refactoring
Long Function	More than 25 lines; multiple comment sections	Extract Function
Long Parameter List	More than 4 parameters	Introduce Parameter Object
Duplicate Code	Same logic in two or more locations	Extract Function; DRY
Dead Code	Unused imports, commented-out code, unreachable branches	Delete it
Magic Numbers / Strings	Unexplained literals in logic	Replace with Named Constant or Enum
Inconsistent Names	Same concept with multiple names	Rename; establish team glossary
Nested Conditionals	Arrow anti-pattern; indentation depth 4+	Guard Clauses / Early Return
Feature Envy	Method uses another class's data more than its own	Move Method
God Class / Function	One class/function handles everything	Extract Class; Single Responsibility
Primitive Obsession	Raw `str`/`int`/`float` for domain concepts	Replace with Value Object; Enum
Comments as Excuses	`# hack`, `# not sure why`, unapologetic TODOs	Fix the code; write WHY comments
Overly Defensive	Bare `except: pass`, None checks everywhere	Validate at boundary; let errors propagate
Boolean Trap	`f(True, False, True)` - meaningless at call site	Keyword-only args (`*`); Enum
Mutable Default Arg	`def f(items=[]):` - list shared across calls	Use `None` sentinel; create object in body
Data Clumps	Same group of primitives always travels together	Introduce Value Object / dataclass

Key Takeaways

A code smell is not a bug - it is a warning sign of structural problems that make code progressively more expensive to change.
Mutable default arguments (def f(items=[])) are a live bug in Python, not just a smell - the object is shared across all calls.
The Boolean Trap (f(True, False)) is eliminated by keyword-only arguments (* in the signature) or Enum types.
Bare except: pass is never acceptable in production code - it silently swallows failures that should be logged, monitored, and fixed.
Dead code is not preserved by comments - version control preserves it. Delete without guilt.
Good comments explain WHY; smelly comments explain WHAT or apologize for HOW.
Primitive Obsession is best fixed with @dataclass for grouped domain concepts and Enum for constrained value sets.

What You Will Learn​

Prerequisites​

What Is a Code Smell?​

Smell 1: Long Function​

Before​

After​

Smell 2: Long Parameter List​

Before​

After​

Smell 3: Duplicate Code​

Before​

After​

Smell 4: Dead Code​

After​

Smell 5: Magic Numbers and Strings​

Before​

After​

Smell 6: Inconsistent Names​

Before​

After​

Smell 7: Nested Conditionals (The Arrow Anti-Pattern)​

Before​

After: Guard Clauses​

Smell 8: Feature Envy​

Before​

After​

Smell 9: God Class / God Function​

Before (Abbreviated)​

After: Split by Responsibility​

Smell 10: Primitive Obsession​

Before​

After​

Smell 11: Comments as Excuses​

After​

Smell 12: Overly Defensive Code​

Before​

After​

Smell 13: Boolean Trap​

Before​

After: Keyword-Only Arguments​

Smell 14: Mutable Default Arguments​

The Bug​

After: The Canonical Fix​

The Intentional Exception​

Smell 15: Data Clumps​

Before​

After​

Interview Questions​

Q1: What is a code smell and how does it differ from a bug?​

Q2: Explain the mutable default argument problem in Python and the canonical fix.​

Q3: What is the Boolean Trap and how do keyword-only arguments eliminate it?​

Q4: Describe Feature Envy and how you recognize it in code.​

Q5: When is a comment appropriate and when does it indicate a code smell?​

Q6: What is Primitive Obsession and what Python constructs best fix it?​

Practice Challenges​

Beginner - Identify and Name Five Smells​

Intermediate - Eliminate Feature Envy and a God Function​

Advanced - Fix All Smells in a Real-World Webhook Handler​

Quick Reference​

Key Takeaways​

What You Will Learn

Prerequisites

What Is a Code Smell?

Smell 1: Long Function

Before

After

Smell 2: Long Parameter List

Before

After

Smell 3: Duplicate Code

Before

After

Smell 4: Dead Code

After

Smell 5: Magic Numbers and Strings

Before

After

Smell 6: Inconsistent Names

Before

After

Smell 7: Nested Conditionals (The Arrow Anti-Pattern)

Before

After: Guard Clauses

Smell 8: Feature Envy

Before

After

Smell 9: God Class / God Function

Before (Abbreviated)

After: Split by Responsibility

Smell 10: Primitive Obsession

Before

After

Smell 11: Comments as Excuses

After

Smell 12: Overly Defensive Code

Before

After

Smell 13: Boolean Trap

Before

After: Keyword-Only Arguments

Smell 14: Mutable Default Arguments

The Bug

After: The Canonical Fix

The Intentional Exception

Smell 15: Data Clumps

Before

After

Interview Questions

Q1: What is a code smell and how does it differ from a bug?

Q2: Explain the mutable default argument problem in Python and the canonical fix.

Q3: What is the Boolean Trap and how do keyword-only arguments eliminate it?

Q4: Describe Feature Envy and how you recognize it in code.

Q5: When is a comment appropriate and when does it indicate a code smell?

Q6: What is Primitive Obsession and what Python constructs best fix it?

Practice Challenges

Beginner - Identify and Name Five Smells

Intermediate - Eliminate Feature Envy and a God Function

Advanced - Fix All Smells in a Real-World Webhook Handler

Quick Reference

Key Takeaways