Refactoring Techniques - Improving Code Without Breaking It

Reading time: ~22 minutes | Level: Foundation → Engineering

# Before refactoring - 17 lines, one function, zero clarity
def process(u, items, disc=0):
    t = 0
    for i in items:
        p = i['price']
        if i['type'] == 1:
            p = p * 0.9
        elif i['type'] == 2:
            p = p * 0.8
        elif i['type'] == 3:
            p = p * 0.75
        if disc > 0:
            p = p - (p * disc / 100)
        t += p
    if u['premium']:
        t = t * 0.95
    tax = t * 0.08
    return {'subtotal': t, 'tax': tax, 'total': t + tax}

# After refactoring - same behavior, zero mystery
CATEGORY_DISCOUNTS = {1: 0.10, 2: 0.20, 3: 0.25}
PREMIUM_DISCOUNT = 0.05
TAX_RATE = 0.08

def apply_category_discount(price: float, item_type: int) -> float:
    discount = CATEGORY_DISCOUNTS.get(item_type, 0.0)
    return price * (1 - discount)

def apply_coupon_discount(price: float, coupon_percent: float) -> float:
    return price * (1 - coupon_percent / 100)

def apply_premium_discount(subtotal: float, is_premium: bool) -> float:
    return subtotal * (1 - PREMIUM_DISCOUNT) if is_premium else subtotal

def calculate_order_total(user: dict, items: list, coupon_percent: float = 0) -> dict:
    subtotal = sum(
        apply_coupon_discount(apply_category_discount(item["price"], item["type"]), coupon_percent)
        for item in items
    )
    subtotal = apply_premium_discount(subtotal, user["premium"])
    tax = subtotal * TAX_RATE
    return {"subtotal": subtotal, "tax": tax, "total": subtotal + tax}

For any non-negative coupon percentage, these two functions produce the same results (up to floating-point rounding). The second one took 20 minutes to produce from the first. Those 20 minutes are what separate code you dread from code you trust.
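A quick way to back up that claim is to run both versions on the same inputs and compare the results. The sketch below copies both functions in unchanged; the sample cart and coupon values are made up for illustration. Results are compared with `math.isclose` because `p - (p * disc / 100)` and `p * (1 - disc / 100)` can round differently in the last bit of a float. Coupons are limited to non-negative values because the original skips the coupon when `disc <= 0`, while the refactored version always applies it.

```python
import math
from itertools import product

# --- Original version, copied unchanged ---
def process(u, items, disc=0):
    t = 0
    for i in items:
        p = i['price']
        if i['type'] == 1:
            p = p * 0.9
        elif i['type'] == 2:
            p = p * 0.8
        elif i['type'] == 3:
            p = p * 0.75
        if disc > 0:
            p = p - (p * disc / 100)
        t += p
    if u['premium']:
        t = t * 0.95
    tax = t * 0.08
    return {'subtotal': t, 'tax': tax, 'total': t + tax}

# --- Refactored version, copied unchanged ---
CATEGORY_DISCOUNTS = {1: 0.10, 2: 0.20, 3: 0.25}
PREMIUM_DISCOUNT = 0.05
TAX_RATE = 0.08

def apply_category_discount(price: float, item_type: int) -> float:
    discount = CATEGORY_DISCOUNTS.get(item_type, 0.0)
    return price * (1 - discount)

def apply_coupon_discount(price: float, coupon_percent: float) -> float:
    return price * (1 - coupon_percent / 100)

def apply_premium_discount(subtotal: float, is_premium: bool) -> float:
    return subtotal * (1 - PREMIUM_DISCOUNT) if is_premium else subtotal

def calculate_order_total(user: dict, items: list, coupon_percent: float = 0) -> dict:
    subtotal = sum(
        apply_coupon_discount(apply_category_discount(item["price"], item["type"]), coupon_percent)
        for item in items
    )
    subtotal = apply_premium_discount(subtotal, user["premium"])
    tax = subtotal * TAX_RATE
    return {"subtotal": subtotal, "tax": tax, "total": subtotal + tax}

# --- Side-by-side comparison ---
def versions_agree(items: list, coupon_percent: float, premium: bool) -> bool:
    """True if both versions return matching subtotal, tax, and total."""
    user = {"premium": premium}
    old = process(user, items, coupon_percent)
    new = calculate_order_total(user, items, coupon_percent)
    return all(math.isclose(old[k], new[k], rel_tol=1e-12) for k in ("subtotal", "tax", "total"))

# Hypothetical sample cart: one item per category plus an unknown type (4)
SAMPLE_ITEMS = [
    {"price": 19.99, "type": 1},
    {"price": 5.00, "type": 2},
    {"price": 120.00, "type": 3},
    {"price": 42.50, "type": 4},
]

for coupon, premium in product([0, 10, 15, 33.3], [False, True]):
    assert versions_agree(SAMPLE_ITEMS, coupon, premium), (coupon, premium)
```

A negative coupon, for example `versions_agree(SAMPLE_ITEMS, -10, False)`, returns False. That is exactly the kind of edge case a side-by-side check surfaces before it reaches production.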

Refactoring is the discipline of improving the internal structure of code without changing its observable behavior. It is not rewriting. It is not optimizing. It is not adding features. It is a precise, test-guarded surgery on the shape of your code - one small step at a time.

What You Will Learn

The formal definition of refactoring and what distinguishes it from rewriting
The cardinal rule: why you must never refactor untested code
Extract Function - breaking monoliths into focused units
Rename - the most impactful and underrated refactoring
Extract Variable - replacing expressions with meaning
Replace Magic Numbers with named constants
Extract Class - when data and state accumulate in a function
Replace Conditional with Polymorphism or dispatch tables
Introduce Parameter Object - replacing long argument lists
Collapse Else Clause (guard clauses / early return)
The complete refactoring workflow and when NOT to refactor

Prerequisites

Comfort writing Python functions and classes
Basic familiarity with writing tests (pytest or unittest)
Understanding of the code smells concept is helpful but not required

What Refactoring IS (and IS NOT)

Refactoring has a precise definition, given by Martin Fowler in his book Refactoring: a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior.

That last clause - "without changing observable behavior" - is what separates refactoring from every other kind of code change.

Activity	Changes behavior?	Is it refactoring?
Renaming a variable for clarity	No	Yes
Extracting a function	No	Yes
Fixing a bug	Yes	No
Adding a feature	Yes	No
Rewriting a module from scratch	Sometimes	No
Optimizing a slow algorithm	Sometimes	No

This distinction matters operationally. When you are refactoring, your tests should stay green at every step. If a test fails after a refactoring step, you either introduced a bug or your test was testing implementation rather than behavior. Either way, you roll back, not push forward.

Why Refactor at All?

Code accumulates complexity over time - not through malice but through momentum. A function that started at 15 lines grows to 80 because each new requirement adds 5 more. A class that handled one concern gains responsibilities as the system evolves. This is called technical debt: a loan you take out in the form of expedient code that you must eventually repay with interest (debugging time, onboarding time, change fear).

Refactoring is the repayment mechanism. Small, frequent refactors prevent debt from compounding.

# What 18 months of feature additions looks like
def send_email(user_id, template_id, subject, body, attachments=None,
               cc=None, bcc=None, reply_to=None, from_name=None,
               from_email=None, track_opens=True, track_clicks=True,
               schedule_at=None, campaign_id=None, tags=None):
    # 140 more lines...
    pass

This function started as send_email(user_id, template_id). Nobody planned for 15 parameters. Each parameter was added for a legitimate reason. The refactoring discipline says: when you add the 5th parameter, that is the moment to introduce a parameter object. Not later - now.

The Cardinal Rule: Never Refactor Without Tests

The most dangerous situation in software is refactoring code that has no tests. Without tests, you cannot verify that your structural change left the behavior intact. You are navigating in the dark.

# This function has no tests. DO NOT refactor it yet.
def calculate_tax(income, filing_status, state, has_dependents, year):
    # 90 lines of tax bracket logic
    pass

Before touching this function, you write tests. Not after - before. This is called characterization testing: you run the existing code with a variety of inputs and record its outputs. Those outputs become your test cases, even if the outputs are wrong. They document what the code currently does.

# Step 1: Write characterization tests BEFORE changing anything
import pytest

def test_calculate_tax_single_no_dependents():
    result = calculate_tax(75000, "single", "CA", False, 2024)
    assert result == pytest.approx(18750.0)  # This is what the code returns TODAY

def test_calculate_tax_married_with_dependents():
    result = calculate_tax(120000, "married", "TX", True, 2024)
    assert result == pytest.approx(22400.0)

def test_calculate_tax_zero_income():
    result = calculate_tax(0, "single", "NY", False, 2024)
    assert result == pytest.approx(0.0)

Now you have a safety net. Every refactoring step you take, you run these tests. If they pass, you have preserved behavior. If they fail, you broke something and you revert.
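Hand-writing expected values works for a few cases. For broader coverage, a common variant (often called a golden master or snapshot test) records the current outputs once and compares against them on every later run. A minimal sketch, assuming the code under test is a pure function; `legacy_shipping_fee` is a hypothetical stand-in for the untested code, and the JSON snapshot format is an arbitrary choice.

```python
import json
from itertools import product
from pathlib import Path

def legacy_shipping_fee(weight, express):
    """Hypothetical legacy function standing in for the untested code."""
    if express:
        return round(weight * 1.2 + 5, 2)
    return round(weight * 0.5, 2)

def record_outputs(func, inputs):
    """Run func on every argument tuple; key each result by its JSON-encoded arguments."""
    return {json.dumps(list(args)): func(*args) for args in inputs}

def check_against_snapshot(func, inputs, snapshot_path: Path) -> list[str]:
    """First run writes the snapshot; later runs return the keys whose output changed."""
    current = record_outputs(func, inputs)
    if not snapshot_path.exists():
        snapshot_path.write_text(json.dumps(current, indent=2, sort_keys=True))
        return []
    recorded = json.loads(snapshot_path.read_text())
    return [key for key, value in current.items() if recorded.get(key) != value]

# Cover the input space systematically rather than picking a few cases by hand
INPUTS = list(product([0, 1.5, 10, 250], [False, True]))
```

Commit the snapshot file alongside the tests. During the refactoring, `check_against_snapshot(legacy_shipping_fee, INPUTS, Path("shipping_fee.snapshot.json"))` should keep returning an empty list. An intended difference, such as a bug fix, belongs in a separate, non-refactoring commit that also updates the snapshot.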

Extract Function

Extract Function is the most fundamental refactoring. When a function does more than one thing - or when a block of code requires a comment to explain it - extract it into a named function.

The Signal

def process_order(order_data):
    # Validate the order
    if not order_data.get("items"):
        raise ValueError("Order must have items")
    if not order_data.get("customer_id"):
        raise ValueError("Order must have a customer")
    total = sum(item["price"] * item["quantity"] for item in order_data["items"])
    if total <= 0:
        raise ValueError("Order total must be positive")

    # Apply discounts
    discount = 0
    if order_data.get("coupon_code") == "SAVE10":
        discount = total * 0.10
    elif order_data.get("coupon_code") == "SAVE20":
        discount = total * 0.20
    discounted_total = total - discount

    # Calculate tax
    state = order_data.get("shipping_state", "CA")
    TAX_RATES = {"CA": 0.0725, "NY": 0.08, "TX": 0.0625}
    tax_rate = TAX_RATES.get(state, 0.07)
    tax = discounted_total * tax_rate

    # Save to database
    db_record = {
        "customer_id": order_data["customer_id"],
        "items": order_data["items"],
        "subtotal": discounted_total,
        "tax": tax,
        "total": discounted_total + tax,
        "status": "pending",
    }
    # ... db.save(db_record)
    return db_record

This function has four responsibilities signaled by four comments. Each comment is a refactoring opportunity: the comment title becomes the function name.

After Extract Function

def validate_order(order_data: dict) -> None:
    """Raise ValueError if order_data is missing required fields."""
    if not order_data.get("items"):
        raise ValueError("Order must have items")
    if not order_data.get("customer_id"):
        raise ValueError("Order must have a customer")
    total = sum(item["price"] * item["quantity"] for item in order_data["items"])
    if total <= 0:
        raise ValueError("Order total must be positive")


COUPON_DISCOUNTS = {"SAVE10": 0.10, "SAVE20": 0.20}

def apply_discount(subtotal: float, coupon_code: str | None) -> float:
    """Return subtotal after applying coupon discount, if any."""
    rate = COUPON_DISCOUNTS.get(coupon_code or "", 0.0)
    return subtotal * (1 - rate)


STATE_TAX_RATES = {"CA": 0.0725, "NY": 0.08, "TX": 0.0625}
DEFAULT_TAX_RATE = 0.07

def calculate_tax(subtotal: float, state: str) -> float:
    """Return tax amount for the given subtotal and state."""
    rate = STATE_TAX_RATES.get(state, DEFAULT_TAX_RATE)
    return subtotal * rate


def build_order_record(order_data: dict, subtotal: float, tax: float) -> dict:
    """Assemble the database record for a new order."""
    return {
        "customer_id": order_data["customer_id"],
        "items": order_data["items"],
        "subtotal": subtotal,
        "tax": tax,
        "total": subtotal + tax,
        "status": "pending",
    }


def process_order(order_data: dict) -> dict:
    """Validate, price, and build a pending order record."""
    validate_order(order_data)
    raw_total = sum(item["price"] * item["quantity"] for item in order_data["items"])
    subtotal = apply_discount(raw_total, order_data.get("coupon_code"))
    tax = calculate_tax(subtotal, order_data.get("shipping_state", "CA"))
    return build_order_record(order_data, subtotal, tax)

The orchestration function process_order now reads like a table of contents. Each helper is independently testable. You can test discount logic without building a full order. You can test tax calculation with a single float.

The Extraction Heuristic

A block of code is a candidate for extraction when:

It requires a comment to explain what it does
It could be useful in more than one place
It can be given a name that explains intent rather than mechanism
It operates on a distinct subset of the function's variables

Rename

Renaming is the most frequently performed refactoring and one of the most impactful. The right name eliminates the need for a comment and makes the code self-documenting.

# Before: what does 'd' mean? what does 'calc' do?
def calc(d, r, t):
    return d * (1 + r) ** t

# After: no comment needed
def future_value(principal: float, annual_rate: float, years: int) -> float:
    return principal * (1 + annual_rate) ** years

Safe Renaming in Practice

Modern IDEs (PyCharm, VS Code with Pylance) perform rename refactoring with cross-file awareness. Use them. Manual find-and-replace is fragile: it rewrites matches inside comments and string literals that you may not want to change, and it also hits unrelated identifiers that merely contain the old name.

When you do not have IDE support:

# Find all usages before renaming (-w matches whole words only)
grep -rnw "old_function_name" src/ tests/

# After renaming, verify nothing is broken
grep -rnw "old_function_name" src/ tests/
# This should return no results (apart from any deprecated alias you keep on purpose)

# Run tests
pytest

For public APIs (functions called from outside your module), renaming requires a deprecation period:

import warnings

def calculate_order_total(user, items, coupon_percent=0):
    """Calculate total cost for an order."""
    # ... implementation ...

# Provide an alias during transition
def calc_total(user, items, disc=0):
    """Deprecated: use calculate_order_total instead."""
    warnings.warn(
        "calc_total is deprecated, use calculate_order_total",
        DeprecationWarning,
        stacklevel=2,
    )
    return calculate_order_total(user, items, coupon_percent=disc)
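It is worth pinning the alias down with a test so the deprecation path keeps working until it is removed. A minimal sketch using only the standard library's `warnings.catch_warnings`. The body of `calculate_order_total` here is a hypothetical placeholder, not the real pricing logic.

```python
import warnings

def calculate_order_total(user, items, coupon_percent=0):
    """Hypothetical placeholder body: sum item prices, then apply the coupon."""
    subtotal = sum(item["price"] for item in items)
    return subtotal * (1 - coupon_percent / 100)

def calc_total(user, items, disc=0):
    """Deprecated: use calculate_order_total instead."""
    warnings.warn(
        "calc_total is deprecated, use calculate_order_total",
        DeprecationWarning,
        stacklevel=2,
    )
    return calculate_order_total(user, items, coupon_percent=disc)

def test_calc_total_warns_and_forwards():
    items = [{"price": 40.0}, {"price": 60.0}]
    with warnings.catch_warnings(record=True) as caught:
        # DeprecationWarning is ignored by default outside __main__, so force it on
        warnings.simplefilter("always")
        result = calc_total({}, items, disc=10)
    # The alias must forward the renamed keyword argument correctly
    assert result == calculate_order_total({}, items, coupon_percent=10)
    assert len(caught) == 1
    assert issubclass(caught[0].category, DeprecationWarning)
```

With pytest, `with pytest.warns(DeprecationWarning):` does the same job in one line.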

Extract Variable

When a complex expression appears in a conditional or computation, extract it into a named intermediate variable. The name serves as documentation.

Before

# What is this condition checking?
if (user.subscription_end_date - datetime.now()).days < 30 and not user.auto_renew and user.payment_method is not None:
    send_renewal_reminder(user)

This line requires the reader to parse the entire boolean expression to understand the intent. It is also not reusable - if the same check appears elsewhere, it will be duplicated.

After

subscription_expiring_soon = (user.subscription_end_date - datetime.now()).days < 30
has_auto_renewal_disabled = not user.auto_renew
has_valid_payment_method = user.payment_method is not None

should_send_renewal_reminder = (
    subscription_expiring_soon
    and has_auto_renewal_disabled
    and has_valid_payment_method
)

if should_send_renewal_reminder:
    send_renewal_reminder(user)

Or, if the check naturally belongs to the user object, extract it into a method on that class:

def needs_renewal_reminder(self) -> bool:
    """True if this user should receive a subscription renewal reminder."""
    expiring_soon = (self.subscription_end_date - datetime.now()).days < 30
    return expiring_soon and not self.auto_renew and self.payment_method is not None

The Pattern in Comprehensions

# Before: hard to read
result = [item["price"] * item["quantity"] * (1 - item.get("discount", 0)) for item in cart if item["in_stock"] and item["price"] > 0]

# After: extracted variables
valid_items = [item for item in cart if item["in_stock"] and item["price"] > 0]
line_totals = [
    item["price"] * item["quantity"] * (1 - item.get("discount", 0))
    for item in valid_items
]

Replace Magic Numbers with Named Constants

A magic number is a literal (usually numeric, sometimes a string) that appears in code with no explanation of what it represents. The number 86400 means nothing. SECONDS_PER_DAY = 86400 is self-documenting.

Before

import time

def is_session_expired(last_active_timestamp: float) -> bool:
    return (time.time() - last_active_timestamp) > 3600

def can_retry(attempt_count: int) -> bool:
    return attempt_count < 3

def calculate_late_fee(days_overdue: int, amount: float) -> float:
    if days_overdue <= 7:
        return amount * 0.02
    return amount * 0.05

After

import time

SESSION_TIMEOUT_SECONDS = 3600        # 1 hour
MAX_RETRY_ATTEMPTS = 3
GRACE_PERIOD_DAYS = 7
STANDARD_LATE_FEE_RATE = 0.02        # 2%
OVERDUE_LATE_FEE_RATE = 0.05         # 5%

def is_session_expired(last_active_timestamp: float) -> bool:
    return (time.time() - last_active_timestamp) > SESSION_TIMEOUT_SECONDS

def can_retry(attempt_count: int) -> bool:
    return attempt_count < MAX_RETRY_ATTEMPTS

def calculate_late_fee(days_overdue: int, amount: float) -> float:
    if days_overdue <= GRACE_PERIOD_DAYS:
        return amount * STANDARD_LATE_FEE_RATE
    return amount * OVERDUE_LATE_FEE_RATE

For domain-specific status values, prefer an enum over bare constants:

from enum import IntEnum

class OrderStatus(IntEnum):
    DRAFT = 1
    PENDING = 2
    CONFIRMED = 3
    SHIPPED = 4
    DELIVERED = 5
    CANCELLED = 6

# Before
if order.status == 4:
    notify_tracking(order)

# After
if order.status == OrderStatus.SHIPPED:
    notify_tracking(order)

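The enum version is grep-able and type-checkable, and converting a raw value with OrderStatus(value) raises ValueError for anything that is not a defined status. Because this is an IntEnum, its members still compare equal to plain integers. That eases migration from integer database columns, but nothing stops a stray order.status == 4 from creeping back in; a plain Enum closes that gap if you do not need integer interoperability.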
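The enum also gives you a single checkpoint for raw values arriving from a database or API. A minimal sketch; `parse_status` is a hypothetical helper name, not part of the original code.

```python
from enum import IntEnum

class OrderStatus(IntEnum):
    DRAFT = 1
    PENDING = 2
    CONFIRMED = 3
    SHIPPED = 4
    DELIVERED = 5
    CANCELLED = 6

def parse_status(raw: int) -> OrderStatus:
    """Convert a raw stored value into an OrderStatus, failing loudly on unknown values."""
    try:
        return OrderStatus(raw)
    except ValueError:
        raise ValueError(f"Unknown order status: {raw!r}") from None

print(parse_status(4).name)       # SHIPPED
print(OrderStatus.SHIPPED == 4)   # True: IntEnum members are still ints
try:
    parse_status(7)
except ValueError as exc:
    print(exc)                    # Unknown order status: 7
```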

Extract Class

When a function accumulates too many parameters that travel together, or when a module accumulates too many functions that operate on the same data, it is time to extract a class.

Signal: Parameters That Always Travel Together

def send_report(
    report_title: str,
    report_body: str,
    recipient_email: str,
    recipient_name: str,
    sender_email: str,
    sender_name: str,
    cc_list: list[str],
) -> bool:
    ...

def archive_report(
    report_title: str,
    report_body: str,
    author_email: str,
    storage_path: str,
) -> str:
    ...

The parameters report_title and report_body travel together everywhere, and so does each email address with its display name. Extract them:

from dataclasses import dataclass

@dataclass
class Report:
    title: str
    body: str
    author_email: str

@dataclass
class EmailRecipient:
    email: str
    name: str

def send_report(report: Report, to: EmailRecipient, cc: list[EmailRecipient] | None = None) -> bool:
    ...

def archive_report(report: Report, storage_path: str) -> str:
    ...

Signal: A Growing Bag of State in Module-Level Variables

# Before: state scattered across module
_connection = None
_retry_count = 0
_last_error = None
_is_connected = False

def connect(host, port): ...
def disconnect(): ...
def send(data): ...
def get_status(): ...

# After: state encapsulated in a class
class DatabaseConnection:
    def __init__(self, host: str, port: int):
        self._host = host
        self._port = port
        self._connection = None
        self._retry_count = 0
        self._last_error: Exception | None = None

    def connect(self) -> None: ...
    def disconnect(self) -> None: ...
    def send(self, data: bytes) -> None: ...

    @property
    def is_connected(self) -> bool:
        return self._connection is not None
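One practical payoff of the class is that each instance owns its state, so two connections can coexist and every test can build a fresh one. Module-level globals make both awkward. A minimal runnable sketch; the "connection" is an in-memory list standing in for a real driver object, and the method bodies and host names are assumptions for illustration.

```python
class DatabaseConnection:
    """State that used to live in module globals, now owned by one object."""

    def __init__(self, host: str, port: int):
        self._host = host
        self._port = port
        self._connection: list[bytes] | None = None  # in-memory stand-in for a real connection
        self._retry_count = 0
        self._last_error: Exception | None = None

    def connect(self) -> None:
        self._connection = []

    def disconnect(self) -> None:
        self._connection = None

    def send(self, data: bytes) -> None:
        if self._connection is None:
            self._last_error = ConnectionError(f"not connected to {self._host}:{self._port}")
            raise self._last_error
        self._connection.append(data)

    @property
    def is_connected(self) -> bool:
        return self._connection is not None

primary = DatabaseConnection("db-primary.internal", 5432)
replica = DatabaseConnection("db-replica.internal", 5432)
primary.connect()
primary.send(b"PING")
print(primary.is_connected, replica.is_connected)  # True False
```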

Replace Conditional with Polymorphism (or a Dispatch Table)

Long if/elif chains that switch on a type or category are a maintenance liability: every new type requires modifying the chain, and the chain grows without bound.

Before

def calculate_shipping(order_type: str, weight: float, distance: float) -> float:
    if order_type == "standard":
        return weight * 0.5 + distance * 0.01
    elif order_type == "express":
        return weight * 1.2 + distance * 0.05 + 5.00
    elif order_type == "overnight":
        return weight * 2.0 + distance * 0.08 + 15.00
    elif order_type == "freight":
        return weight * 0.3 + distance * 0.02
    else:
        raise ValueError(f"Unknown order type: {order_type}")

After: Dispatch Table

from typing import Callable

ShippingCalculator = Callable[[float, float], float]

SHIPPING_CALCULATORS: dict[str, ShippingCalculator] = {
    "standard":  lambda w, d: w * 0.5  + d * 0.01,
    "express":   lambda w, d: w * 1.2  + d * 0.05 + 5.00,
    "overnight": lambda w, d: w * 2.0  + d * 0.08 + 15.00,
    "freight":   lambda w, d: w * 0.3  + d * 0.02,
}

def calculate_shipping(order_type: str, weight: float, distance: float) -> float:
    calculator = SHIPPING_CALCULATORS.get(order_type)
    if calculator is None:
        raise ValueError(f"Unknown order type: {order_type}")
    return calculator(weight, distance)

Adding a new shipping type now requires adding one entry to the dictionary - no if/elif modification needed.
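Because this is a refactoring, the old and new versions should agree on every input, including the error path. A minimal sketch of that check: both versions are copied in with `_old`/`_new` suffixes added only for the comparison. Each lambda performs the same arithmetic in the same order as its branch, so exact equality is expected.

```python
from itertools import product
from typing import Callable

# --- Original if/elif version (renamed for the comparison) ---
def calculate_shipping_old(order_type: str, weight: float, distance: float) -> float:
    if order_type == "standard":
        return weight * 0.5 + distance * 0.01
    elif order_type == "express":
        return weight * 1.2 + distance * 0.05 + 5.00
    elif order_type == "overnight":
        return weight * 2.0 + distance * 0.08 + 15.00
    elif order_type == "freight":
        return weight * 0.3 + distance * 0.02
    else:
        raise ValueError(f"Unknown order type: {order_type}")

# --- Dispatch-table version (renamed for the comparison) ---
ShippingCalculator = Callable[[float, float], float]

SHIPPING_CALCULATORS: dict[str, ShippingCalculator] = {
    "standard":  lambda w, d: w * 0.5  + d * 0.01,
    "express":   lambda w, d: w * 1.2  + d * 0.05 + 5.00,
    "overnight": lambda w, d: w * 2.0  + d * 0.08 + 15.00,
    "freight":   lambda w, d: w * 0.3  + d * 0.02,
}

def calculate_shipping_new(order_type: str, weight: float, distance: float) -> float:
    calculator = SHIPPING_CALCULATORS.get(order_type)
    if calculator is None:
        raise ValueError(f"Unknown order type: {order_type}")
    return calculator(weight, distance)

def outcome(func, *args):
    """Return the result, or the exception type, so error cases are compared too."""
    try:
        return func(*args)
    except ValueError as exc:
        return type(exc)

def mismatches() -> list[tuple]:
    # "drone" is a deliberately unknown type to exercise the error path
    order_types = ["standard", "express", "overnight", "freight", "drone"]
    grid = product(order_types, [0.0, 1.5, 20.0], [0.0, 10.0, 2500.0])
    return [
        args for args in grid
        if outcome(calculate_shipping_old, *args) != outcome(calculate_shipping_new, *args)
    ]

print(mismatches())  # []
```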

After: Polymorphism (for Complex Logic per Type)

When the per-type logic is too complex for a lambda, subclasses give you the same open/closed property:

from abc import ABC, abstractmethod

class ShippingMethod(ABC):
    @abstractmethod
    def calculate(self, weight: float, distance: float) -> float: ...

    @abstractmethod
    def estimated_days(self) -> int: ...

class StandardShipping(ShippingMethod):
    def calculate(self, weight: float, distance: float) -> float:
        return weight * 0.5 + distance * 0.01

    def estimated_days(self) -> int:
        return 5

class OvernightShipping(ShippingMethod):
    def calculate(self, weight: float, distance: float) -> float:
        return weight * 2.0 + distance * 0.08 + 15.00

    def estimated_days(self) -> int:
        return 1

SHIPPING_METHODS: dict[str, ShippingMethod] = {
    "standard": StandardShipping(),
    "overnight": OvernightShipping(),
}

def get_shipping_method(order_type: str) -> ShippingMethod:
    method = SHIPPING_METHODS.get(order_type)
    if method is None:
        raise ValueError(f"Unknown shipping type: {order_type}")
    return method
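When different modules define their own shipping methods, the lookup dictionary can be filled by a registration decorator instead of being edited by hand. A minimal sketch; `register_shipping_method` is a hypothetical name, and the two classes are the ones shown above.

```python
from abc import ABC, abstractmethod

class ShippingMethod(ABC):
    @abstractmethod
    def calculate(self, weight: float, distance: float) -> float: ...

    @abstractmethod
    def estimated_days(self) -> int: ...

SHIPPING_METHODS: dict[str, ShippingMethod] = {}

def register_shipping_method(name: str):
    """Class decorator: instantiate the decorated class and register it under `name`."""
    def decorator(cls: type[ShippingMethod]) -> type[ShippingMethod]:
        if name in SHIPPING_METHODS:
            raise ValueError(f"Duplicate shipping type: {name}")
        SHIPPING_METHODS[name] = cls()
        return cls
    return decorator

@register_shipping_method("standard")
class StandardShipping(ShippingMethod):
    def calculate(self, weight: float, distance: float) -> float:
        return weight * 0.5 + distance * 0.01

    def estimated_days(self) -> int:
        return 5

@register_shipping_method("overnight")
class OvernightShipping(ShippingMethod):
    def calculate(self, weight: float, distance: float) -> float:
        return weight * 2.0 + distance * 0.08 + 15.00

    def estimated_days(self) -> int:
        return 1

def get_shipping_method(order_type: str) -> ShippingMethod:
    method = SHIPPING_METHODS.get(order_type)
    if method is None:
        raise ValueError(f"Unknown shipping type: {order_type}")
    return method

print(sorted(SHIPPING_METHODS))  # ['overnight', 'standard']
```

One caveat of import-time registration: a method is only available once the module that defines it has been imported.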

Introduce Parameter Object

When a function signature grows beyond four parameters - especially when those parameters frequently travel together - introduce a dataclass or TypedDict to group them.

Before

def create_user(
    first_name: str,
    last_name: str,
    email: str,
    age: int,
    country: str,
    timezone: str,
    newsletter_opt_in: bool,
    referral_code: str | None = None,
) -> dict:
    ...

def update_user(
    user_id: int,
    first_name: str,
    last_name: str,
    email: str,
    country: str,
    timezone: str,
) -> dict:
    ...

After

from dataclasses import dataclass

@dataclass
class UserProfile:
    first_name: str
    last_name: str
    email: str
    country: str
    timezone: str
    age: int | None = None
    newsletter_opt_in: bool = False
    referral_code: str | None = None

    @property
    def full_name(self) -> str:
        return f"{self.first_name} {self.last_name}"

def create_user(profile: UserProfile) -> dict:
    ...

def update_user(user_id: int, profile: UserProfile) -> dict:
    ...

The parameter object has three compounding benefits: it reduces call-site noise, it gives you a natural home for derived properties like full_name, and when you need to add a new field to the user concept, you change one dataclass rather than every function signature.

Collapse Else Clause (Guard Clauses / Early Return)

Deeply nested conditionals - sometimes called the "arrow anti-pattern" - are hard to read because the reader must track multiple levels of indentation mentally.

Before: Arrow Anti-Pattern

def process_payment(user, order, payment_method):
    if user is not None:
        if user.is_active:
            if order is not None:
                if order.total > 0:
                    if payment_method.is_valid():
                        charge = payment_method.charge(order.total)
                        if charge.success:
                            order.status = "paid"
                            return charge.transaction_id
                        else:
                            return None
                    else:
                        return None
                else:
                    return None
            else:
                return None
        else:
            return None
    else:
        return None

After: Guard Clauses (Early Return)

def process_payment(user, order, payment_method) -> str | None:
    if user is None:
        return None
    if not user.is_active:
        return None
    if order is None:
        return None
    if order.total <= 0:
        return None
    if not payment_method.is_valid():
        return None

    charge = payment_method.charge(order.total)
    if not charge.success:
        return None

    order.status = "paid"
    return charge.transaction_id

The guard-clause version eliminates all nesting. The happy path is the last few lines, at the function body's base indentation level. The failure paths are at the top, each with a single reason for failure. The reader can scan the guards quickly and focus on what happens when all conditions are met.

The pattern works by inverting the condition: instead of if condition: do_thing, write if not condition: return early. This keeps the main logic at the lowest indentation level.

The Refactoring Workflow

Effective refactoring follows a tight loop that prevents accumulating broken intermediate states.

Ensure tests exist and pass
Identify one code smell
Apply one refactoring technique
Run tests - they must all pass
Commit (the commit message is the refactoring name: "Extract Function: validate_order")
Repeat

Step 5 - commit after each refactoring - is underappreciated. It gives you a clean history that documents the evolution of the code, and it gives you safe rollback points if a later change introduces a problem.

Tooling That Helps

# Run tests in watch mode during refactoring
pip install pytest-watch
ptw src/ tests/  # re-runs pytest whenever watched files change

# Or with pytest directly
pytest --tb=short -q

# Check for unused imports and variables after renames
ruff check src/ --select F401,F841

# Detect complexity (functions that need extraction)
radon cc src/ -a -nc  # pip install radon; -nc shows only blocks ranked C or worse
# Rank C starts at cyclomatic complexity 11, so these functions are candidates for extraction

Commit Message Convention for Refactoring

refactor: Extract Function - validate_order from process_order

No behavior changes. Tests pass. Extracted order validation logic
into a dedicated validate_order() function to improve testability
and shorten process_order.

When NOT to Refactor

Refactoring judgment includes knowing when to leave code alone.

Near a deadline. Refactoring introduces risk. If you are 48 hours from a production release, a "quick" refactor that breaks something is catastrophic. Add a TODO comment, file a ticket, and refactor after the release.

Without tests. This is the cardinal rule revisited. If you cannot write characterization tests (the code is too tangled, the dependencies are too difficult to inject), the refactoring risk is too high. Invest in making the code testable first.

Code you do not understand. Before you can safely restructure code, you must understand what it does. Spend time reading, add tests that document behavior, and only then refactor. Refactoring code you do not understand is guessing.

In shared/legacy code without coordination. If ten people depend on a module, a rename or interface change affects all of them. Coordinate first, provide deprecation paths, and communicate changes through your team's standard process.

When the real problem is a rewrite. Some code is not worth refactoring. If a module is fundamentally misdesigned - wrong abstractions at the core, impossible to test, built on an obsolete dependency - a targeted rewrite with a clean interface is better than incremental improvement. The way to identify this case: if each refactoring reveals two new problems rather than one fewer, you are fighting the architecture.

Interview Questions

Q1: What is the difference between refactoring and rewriting?

Answer: Refactoring changes the internal structure of code while preserving its external behavior - the same inputs produce the same outputs before and after. A rewrite replaces code entirely, often changing interfaces, abstractions, and sometimes behavior. Refactoring is done in small, verifiable steps with tests passing at each step. A rewrite is a larger investment that temporarily removes the safety net of existing tests. In practice, refactoring is preferred because it maintains system stability; a rewrite is warranted only when the existing design is fundamentally broken.

Q2: Why is it dangerous to refactor code that has no tests?

Answer: Without tests, you have no automated way to verify that your structural changes preserved the original behavior. Human review alone is unreliable - subtle behavioral changes in edge cases are easy to miss. The risk compounds as the refactoring grows larger. The professional discipline is to write characterization tests first: run the existing code with representative inputs, record the outputs, and make those test cases. Only then refactor, running the tests after every change.

Q3: Explain the Extract Function refactoring and what signals indicate it is needed.

Answer: Extract Function takes a block of code within a function and moves it into a new, named function. The signals that indicate it is needed include: a comment explaining what a block does (the comment title becomes the function name), a block that operates on a distinct subset of the variables, logic that could be useful in more than one place, and any function that grows beyond 20-30 lines. After extraction, the original function calls the new one, and the new function can be independently tested. The most important outcome is that each function has a single responsibility that its name accurately describes.

Q4: What is the guard clause pattern and why does it improve readability?

Answer: Guard clauses (also called early returns) are conditional checks at the top of a function that return immediately when a precondition is not met. They replace deeply nested if/else structures - the "arrow anti-pattern" - with a flat structure where failure cases are handled upfront and the happy path sits at the lowest indentation level. This improves readability because the reader can scan the failure conditions quickly and then focus on the main logic without tracking nested context. It also removes the trailing else branches that would otherwise have to be matched, by indentation, to a distant if.

Q5: When should you use a dispatch table instead of if/elif chains?

Answer: A dispatch table (a dictionary mapping keys to functions or callables) is appropriate when the if/elif chain switches on a type, category, or string value and each branch does something distinct. The dispatch table is open for extension without modification: adding a new type means adding a dictionary entry, not modifying the branching logic. This matters most when the set of types is expected to grow over time, or when different parts of the codebase need to register their own handlers. If the branching logic is trivial (two or three cases with no expectation of growth), an if/elif is simpler and clearer.

Q6: What does "Introduce Parameter Object" mean and when should you apply it?

Answer: Introduce Parameter Object groups a set of parameters that repeatedly travel together into a single dataclass or class. It is appropriate when a function takes more than four parameters, especially when the same group of parameters appears in multiple related functions. The parameter object has two benefits: it reduces the noise at call sites (one argument instead of six), and it provides a natural place to add derived properties and validation logic related to that concept. In Python, @dataclass is the idiomatic choice for parameter objects when you do not need behavior beyond data holding and basic property computation.

Practice Challenges

Beginner - Extract Four Functions from a Monolith

The following function does four things. Extract each responsibility into a named helper function, keeping process_signup as the orchestrator.

def process_signup(email, password, username, invite_code):
    # Validate inputs
    if not email or "@" not in email:
        raise ValueError("Invalid email")
    if len(password) < 8:
        raise ValueError("Password too short")
    if not username or len(username) < 3:
        raise ValueError("Username too short")

    # Check invite code
    valid_codes = {"BETA2024", "EARLYBIRD", "FRIENDS"}
    if invite_code not in valid_codes:
        raise ValueError("Invalid invite code")

    # Hash password (simplified)
    import hashlib
    hashed = hashlib.sha256(password.encode()).hexdigest()

    # Build user record
    return {
        "email": email.lower().strip(),
        "username": username.lower(),
        "password_hash": hashed,
        "invite_code": invite_code,
        "status": "pending_verification",
    }

Solution

import hashlib

VALID_INVITE_CODES = {"BETA2024", "EARLYBIRD", "FRIENDS"}
MIN_PASSWORD_LENGTH = 8
MIN_USERNAME_LENGTH = 3

def validate_signup_inputs(email: str, password: str, username: str) -> None:
    """Raise ValueError if any signup field fails validation."""
    if not email or "@" not in email:
        raise ValueError("Invalid email")
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError("Password too short")  # original message kept: it is observable behavior
    if not username or len(username) < MIN_USERNAME_LENGTH:
        raise ValueError("Username too short")

def validate_invite_code(invite_code: str) -> None:
    """Raise ValueError if the invite code is not recognized."""
    if invite_code not in VALID_INVITE_CODES:
        raise ValueError("Invalid invite code")

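def hash_password(password: str) -> str:
    """Return a SHA-256 hex digest of the given password (simplified, as in the original)."""
    # Unsalted SHA-256 is too fast for real password storage; production code
    # should use a dedicated password-hashing function instead.
    return hashlib.sha256(password.encode()).hexdigest()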

def build_user_record(email: str, username: str, password_hash: str, invite_code: str) -> dict:
    """Assemble the initial user record for persistence."""
    return {
        "email": email.lower().strip(),
        "username": username.lower(),
        "password_hash": password_hash,
        "invite_code": invite_code,
        "status": "pending_verification",
    }

def process_signup(email: str, password: str, username: str, invite_code: str) -> dict:
    """Validate and build a new user record from signup data."""
    validate_signup_inputs(email, password, username)
    validate_invite_code(invite_code)
    password_hash = hash_password(password)
    return build_user_record(email, username, password_hash, invite_code)
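One payoff of Extract Function is that each extracted piece can now be tested on its own, without pushing a full signup through the pipeline. The sketch below repeats two helpers from the solution so that it runs standalone. It tests them with plain assert-based functions. Their `test_` names are illustrative; pytest would collect them if you use it, and running the file directly works without it.

```python
# Minimal copies of two helpers from the solution so this sketch runs on its own.
VALID_INVITE_CODES = {"BETA2024", "EARLYBIRD", "FRIENDS"}


def validate_invite_code(invite_code: str) -> None:
    if invite_code not in VALID_INVITE_CODES:
        raise ValueError("Invalid invite code")


def build_user_record(email: str, username: str, password_hash: str, invite_code: str) -> dict:
    return {
        "email": email.lower().strip(),
        "username": username.lower(),
        "password_hash": password_hash,
        "invite_code": invite_code,
        "status": "pending_verification",
    }


# Each test exercises exactly one extracted function.
def test_unknown_invite_code_is_rejected() -> None:
    try:
        validate_invite_code("NOPE")
    except ValueError as exc:
        assert str(exc) == "Invalid invite code"
    else:
        raise AssertionError("expected ValueError")


def test_known_invite_code_is_accepted() -> None:
    validate_invite_code("BETA2024")  # must not raise


def test_user_record_normalizes_email_and_username() -> None:
    record = build_user_record("  Ada@Example.COM ", "AdaL", "hash123", "FRIENDS")
    assert record["email"] == "ada@example.com"
    assert record["username"] == "adal"
    assert record["status"] == "pending_verification"


if __name__ == "__main__":
    test_unknown_invite_code_is_rejected()
    test_known_invite_code_is_accepted()
    test_user_record_normalizes_email_and_username()
    print("all tests passed")
```

With the monolith, each of these checks would have needed a valid email, password, and username just to reach the line under test.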

Intermediate - Apply Three Refactoring Techniques

Apply Extract Variable, Replace Magic Numbers with Constants, and Collapse Else Clause to this function:

def get_discount_rate(user, cart_total):
    if user is not None:
        if user.is_active:
            if cart_total >= 100:
                if user.loyalty_years >= 3 and not user.has_pending_payment:
                    if cart_total >= 500:
                        return 0.20
                    else:
                        return 0.15
                else:
                    if cart_total >= 500:
                        return 0.10
                    else:
                        return 0.05
            else:
                return 0.0
        else:
            return 0.0
    else:
        return 0.0

Solution

# Named constants replacing magic numbers
MIN_DISCOUNT_THRESHOLD = 100
LARGE_ORDER_THRESHOLD = 500
LOYALTY_YEARS_REQUIRED = 3

LOYAL_LARGE_ORDER_DISCOUNT = 0.20
LOYAL_STANDARD_DISCOUNT    = 0.15
STANDARD_LARGE_DISCOUNT    = 0.10
STANDARD_DISCOUNT          = 0.05
NO_DISCOUNT                = 0.0

def get_discount_rate(user, cart_total: float) -> float:
    """Return the discount rate for a user's cart.

    Loyal customers (3+ years, no pending payment) receive higher rates.
    Minimum cart total of $100 required for any discount.
    """
    # Guard clauses (collapsed else)
    if user is None:
        return NO_DISCOUNT
    if not user.is_active:
        return NO_DISCOUNT
    if cart_total < MIN_DISCOUNT_THRESHOLD:
        return NO_DISCOUNT

    # Extract variables for complex conditions
    is_loyal_customer = user.loyalty_years >= LOYALTY_YEARS_REQUIRED and not user.has_pending_payment
    is_large_order = cart_total >= LARGE_ORDER_THRESHOLD

    # Clean dispatch logic (no else needed after an early return)
    if is_loyal_customer:
        return LOYAL_LARGE_ORDER_DISCOUNT if is_large_order else LOYAL_STANDARD_DISCOUNT
    return STANDARD_LARGE_DISCOUNT if is_large_order else STANDARD_DISCOUNT
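How do you know the refactored version really behaves the same? One option is a characterization test. Keep the original nested function as an oracle and compare it against the new one on every combination of boundary inputs. The sketch below assumes users can be modeled with `types.SimpleNamespace`. `legacy_get_discount_rate` and `characterize` are illustrative names, not part of the exercise.

```python
from itertools import product
from types import SimpleNamespace


# --- Behavioral oracle: the original arrow-shaped function, unchanged ---
def legacy_get_discount_rate(user, cart_total):
    if user is not None:
        if user.is_active:
            if cart_total >= 100:
                if user.loyalty_years >= 3 and not user.has_pending_payment:
                    if cart_total >= 500:
                        return 0.20
                    else:
                        return 0.15
                else:
                    if cart_total >= 500:
                        return 0.10
                    else:
                        return 0.05
            else:
                return 0.0
        else:
            return 0.0
    else:
        return 0.0


# --- Refactored version, condensed from the solution ---
MIN_DISCOUNT_THRESHOLD = 100
LARGE_ORDER_THRESHOLD = 500
LOYALTY_YEARS_REQUIRED = 3
LOYAL_LARGE_ORDER_DISCOUNT = 0.20
LOYAL_STANDARD_DISCOUNT = 0.15
STANDARD_LARGE_DISCOUNT = 0.10
STANDARD_DISCOUNT = 0.05
NO_DISCOUNT = 0.0


def get_discount_rate(user, cart_total: float) -> float:
    if user is None:
        return NO_DISCOUNT
    if not user.is_active:
        return NO_DISCOUNT
    if cart_total < MIN_DISCOUNT_THRESHOLD:
        return NO_DISCOUNT
    is_loyal_customer = user.loyalty_years >= LOYALTY_YEARS_REQUIRED and not user.has_pending_payment
    is_large_order = cart_total >= LARGE_ORDER_THRESHOLD
    if is_loyal_customer:
        return LOYAL_LARGE_ORDER_DISCOUNT if is_large_order else LOYAL_STANDARD_DISCOUNT
    return STANDARD_LARGE_DISCOUNT if is_large_order else STANDARD_DISCOUNT


def characterize(old, new) -> int:
    """Assert old and new agree on every boundary combination; return the case count."""
    users = [None] + [
        SimpleNamespace(is_active=active, loyalty_years=years, has_pending_payment=pending)
        for active, years, pending in product([True, False], [0, 2, 3, 10], [True, False])
    ]
    # Totals sit on and just below each threshold, where off-by-one bugs hide
    totals = [0, 99.99, 100, 499.99, 500, 1000]
    checked = 0
    for user, total in product(users, totals):
        assert old(user, total) == new(user, total), f"mismatch for {user!r}, total={total}"
        checked += 1
    return checked


if __name__ == "__main__":
    print(characterize(legacy_get_discount_rate, get_discount_rate), "cases match")
```

Choosing inputs on and just below each threshold (100, 500, 3 years) is what gives this check its teeth. Suppose a refactor accidentally turned a `>=` into a `>`. The comparison at exactly 100 or 500 would fail immediately.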

Advanced - Replace Conditional with Dispatch Table and Introduce Parameter Object

This function handles different notification types with a growing if/elif chain and a long parameter list. Refactor it using both techniques:

def send_notification(user_id, notif_type, subject, body, email, phone,
                      push_token, priority, retry_count, schedule_at):
    if notif_type == "email":
        if priority == "high":
            send_email_immediately(email, subject, body)
        else:
            schedule_email(email, subject, body, schedule_at)
    elif notif_type == "sms":
        if len(body) > 160:
            body = body[:157] + "..."
        send_sms(phone, body, priority)
    elif notif_type == "push":
        send_push(push_token, subject, body, priority)
    elif notif_type == "in_app":
        store_in_app_notification(user_id, subject, body)
    else:
        raise ValueError(f"Unknown notification type: {notif_type}")

Solution

from dataclasses import dataclass
from enum import Enum
from typing import Callable
import datetime

class NotificationType(str, Enum):
    EMAIL  = "email"
    SMS    = "sms"
    PUSH   = "push"
    IN_APP = "in_app"

class Priority(str, Enum):
    HIGH   = "high"
    NORMAL = "normal"
    LOW    = "low"

SMS_MAX_LENGTH = 160

@dataclass
class NotificationRequest:
    user_id: int
    notif_type: NotificationType
    subject: str
    body: str
    email: str | None = None
    phone: str | None = None
    push_token: str | None = None
    priority: Priority = Priority.NORMAL
    retry_count: int = 3
    schedule_at: datetime.datetime | None = None

def _send_email_notification(req: NotificationRequest) -> None:
    if req.priority == Priority.HIGH:
        send_email_immediately(req.email, req.subject, req.body)
    else:
        schedule_email(req.email, req.subject, req.body, req.schedule_at)

def _send_sms_notification(req: NotificationRequest) -> None:
    # Truncate to the SMS limit, reserving 3 characters for the ellipsis
    body = req.body if len(req.body) <= SMS_MAX_LENGTH else req.body[: SMS_MAX_LENGTH - 3] + "..."
    send_sms(req.phone, body, req.priority)

def _send_push_notification(req: NotificationRequest) -> None:
    send_push(req.push_token, req.subject, req.body, req.priority)

def _send_in_app_notification(req: NotificationRequest) -> None:
    store_in_app_notification(req.user_id, req.subject, req.body)

NotificationHandler = Callable[[NotificationRequest], None]

NOTIFICATION_HANDLERS: dict[NotificationType, NotificationHandler] = {
    NotificationType.EMAIL:  _send_email_notification,
    NotificationType.SMS:    _send_sms_notification,
    NotificationType.PUSH:   _send_push_notification,
    NotificationType.IN_APP: _send_in_app_notification,
}

def send_notification(req: NotificationRequest) -> None:
    """Dispatch a notification to the appropriate channel."""
    handler = NOTIFICATION_HANDLERS.get(req.notif_type)
    if handler is None:
        raise ValueError(f"Unknown notification type: {req.notif_type}")
    handler(req)
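The dispatch table makes new channels non-invasive: `send_notification` itself never changes. A common variation, not used in the solution above, fills the table with a registration decorator, so each handler declares its own key next to its definition. The sketch below is a minimal, self-contained version of that idea. `Channel`, `register`, `dispatch`, and the two sender functions are hypothetical names chosen for illustration.

```python
from enum import Enum
from typing import Callable


class Channel(str, Enum):
    EMAIL = "email"
    SMS = "sms"


Handler = Callable[[str], str]
HANDLERS: dict[Channel, Handler] = {}


def register(channel: Channel) -> Callable[[Handler], Handler]:
    """Decorator that adds the decorated function to the dispatch table."""
    def decorator(func: Handler) -> Handler:
        if channel in HANDLERS:
            raise ValueError(f"Handler already registered for {channel.value}")
        HANDLERS[channel] = func
        return func
    return decorator


@register(Channel.EMAIL)
def send_email(body: str) -> str:
    return f"email: {body}"  # stand-in for a real email call


@register(Channel.SMS)
def send_sms(body: str) -> str:
    return f"sms: {body}"  # stand-in for a real SMS call


def dispatch(channel: Channel, body: str) -> str:
    """Look up the handler for a channel; unknown channels fail loudly."""
    handler = HANDLERS.get(channel)
    if handler is None:
        raise ValueError(f"Unknown channel: {channel}")
    return handler(body)


if __name__ == "__main__":
    print(dispatch(Channel.SMS, "Your code is 123456"))
```

Adding a channel then takes one new enum member plus one decorated function, and `dispatch` stays untouched. The trade-off is that the table is assembled as a side effect of importing the handler module. Some teams prefer the explicit dict literal in the solution precisely because it lists every channel in one place.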

Quick Reference

| Refactoring | Signal to Apply | Core Mechanic |
| --- | --- | --- |
| Extract Function | Comment explaining a block; function > 25 lines | Move block to named function, replace with call |
| Rename | Ambiguous name, single-letter variable | IDE rename or grep; add deprecation alias for public API |
| Extract Variable | Complex boolean or arithmetic expression | Assign expression to a descriptive name |
| Replace Magic Number | Literal with no explanation in context | Define `CONSTANT_NAME = value` at module level |
| Extract Class | Parameters that always travel together; growing module state | Group into `@dataclass` or class |
| Replace Conditional | `if/elif` switching on type/category | Dict mapping keys to functions/classes |
| Introduce Parameter Object | More than 4 parameters traveling together | Create `@dataclass` with those fields |
| Collapse Else / Guard Clause | Deeply nested conditionals; arrow anti-pattern | Invert condition, return early at top |
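The table's Rename row recommends a deprecation alias when the renamed function is part of a public API. Here is a minimal sketch using only the standard library's `warnings` module. It borrows the opening example's rename of `process` to `calculate_order_total`, but the signature (a plain list of prices) is a hypothetical simplification for illustration.

```python
import warnings


def calculate_order_total(prices: list[float]) -> float:
    """New, descriptive name (simplified, hypothetical signature)."""
    return sum(prices)


def process(prices: list[float]) -> float:
    """Deprecated alias kept so existing callers keep working during the rename."""
    warnings.warn(
        "process() is deprecated; use calculate_order_total() instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this alias
    )
    return calculate_order_total(prices)
```

Old call sites keep working and get a pointer to the new name. Once they have migrated, you can delete the alias in its own small commit. Note that Python hides `DeprecationWarning` by default outside `__main__` and test runners, so pytest or `python -W default` is where callers will actually see it.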

Key Takeaways

Refactoring is a structural change with no behavior change - tests must stay green at every step.
The cardinal rule: never refactor code without tests. Write characterization tests first.
Extract Function is the most important refactoring - apply it whenever a comment explains what a block does.
Guard clauses (early returns) eliminate nesting and make the happy path the lowest-indented code.
Replace magic numbers with named constants so the purpose of a value is visible at its point of use.
Dispatch tables (dicts of callables) replace if/elif chains and make adding new types non-invasive.
Commit after every refactoring step. A clean history of small changes is its own form of documentation.
