
Refactoring Techniques - Improving Code Without Breaking It

Reading time: ~22 minutes | Level: Foundation → Engineering

# Before refactoring - 17 lines, one function, zero clarity
def process(u, items, disc=0):
    t = 0
    for i in items:
        p = i['price']
        if i['type'] == 1:
            p = p * 0.9
        elif i['type'] == 2:
            p = p * 0.8
        elif i['type'] == 3:
            p = p * 0.75
        if disc > 0:
            p = p - (p * disc / 100)
        t += p
    if u['premium']:
        t = t * 0.95
    tax = t * 0.08
    return {'subtotal': t, 'tax': tax, 'total': t + tax}

# After refactoring - same behavior, zero mystery
CATEGORY_DISCOUNTS = {1: 0.10, 2: 0.20, 3: 0.25}
PREMIUM_DISCOUNT = 0.05
TAX_RATE = 0.08

def apply_category_discount(price: float, item_type: int) -> float:
    discount = CATEGORY_DISCOUNTS.get(item_type, 0.0)
    return price * (1 - discount)

def apply_coupon_discount(price: float, coupon_percent: float) -> float:
    return price * (1 - coupon_percent / 100)

def apply_premium_discount(subtotal: float, is_premium: bool) -> float:
    return subtotal * (1 - PREMIUM_DISCOUNT) if is_premium else subtotal

def calculate_order_total(user: dict, items: list, coupon_percent: float = 0) -> dict:
    subtotal = sum(
        apply_coupon_discount(apply_category_discount(item["price"], item["type"]), coupon_percent)
        for item in items
    )
    subtotal = apply_premium_discount(subtotal, user["premium"])
    tax = subtotal * TAX_RATE
    return {"subtotal": subtotal, "tax": tax, "total": subtotal + tax}

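These two functions produce the same results (up to floating-point rounding) for any non-negative coupon percentage. The only edge case is a negative disc: the original silently ignores it, and the new version would apply it as a surcharge. The second function took 20 minutes to produce from the first. Those 20 minutes are what separate code you dread from code you trust.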

Refactoring is the discipline of improving the internal structure of code without changing its observable behavior. It is not rewriting. It is not optimizing. It is not adding features. It is a precise, test-guarded surgery on the shape of your code - one small step at a time.
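The claim that the two versions behave the same can be checked mechanically. Here is a minimal sketch that assumes both versions from the top of this article are pasted into one module. It feeds both functions the same randomly generated orders and compares every output field with math.isclose, because the coupon arithmetic is grouped differently and can differ in the last bits. Coupons are kept non-negative: the original's if disc > 0 guard ignores negative values, while the refactored version would apply them. check_equivalence is an illustrative helper name.

```python
import math
import random

# --- Original version (copied from above) ---
def process(u, items, disc=0):
    t = 0
    for i in items:
        p = i["price"]
        if i["type"] == 1:
            p = p * 0.9
        elif i["type"] == 2:
            p = p * 0.8
        elif i["type"] == 3:
            p = p * 0.75
        if disc > 0:
            p = p - (p * disc / 100)
        t += p
    if u["premium"]:
        t = t * 0.95
    tax = t * 0.08
    return {"subtotal": t, "tax": tax, "total": t + tax}

# --- Refactored version (copied from above) ---
CATEGORY_DISCOUNTS = {1: 0.10, 2: 0.20, 3: 0.25}
PREMIUM_DISCOUNT = 0.05
TAX_RATE = 0.08

def apply_category_discount(price, item_type):
    return price * (1 - CATEGORY_DISCOUNTS.get(item_type, 0.0))

def apply_coupon_discount(price, coupon_percent):
    return price * (1 - coupon_percent / 100)

def apply_premium_discount(subtotal, is_premium):
    return subtotal * (1 - PREMIUM_DISCOUNT) if is_premium else subtotal

def calculate_order_total(user, items, coupon_percent=0):
    subtotal = sum(
        apply_coupon_discount(apply_category_discount(item["price"], item["type"]), coupon_percent)
        for item in items
    )
    subtotal = apply_premium_discount(subtotal, user["premium"])
    tax = subtotal * TAX_RATE
    return {"subtotal": subtotal, "tax": tax, "total": subtotal + tax}

# --- Equivalence check ---
def check_equivalence(trials: int = 1_000, seed: int = 0) -> int:
    """Compare both versions on random orders; return how many orders were checked."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        items = [
            # types 0 and 4 are unknown categories, which should get no discount
            {"price": round(rng.uniform(0, 500), 2), "type": rng.choice([0, 1, 2, 3, 4])}
            for _ in range(rng.randint(0, 10))
        ]
        user = {"premium": rng.random() < 0.5}
        coupon = rng.choice([0, 5, 10, 12.5, 50])  # non-negative coupons only
        old = process(user, items, coupon)
        new = calculate_order_total(user, items, coupon)
        for key in ("subtotal", "tax", "total"):
            assert math.isclose(old[key], new[key], rel_tol=1e-9, abs_tol=1e-9), (key, old, new)
    return trials
```

If an assertion fires, its message tuple names the field that diverged and shows both result dictionaries. This kind of check complements the test suite but does not replace it.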

What You Will Learn

  • The formal definition of refactoring and what distinguishes it from rewriting
  • The cardinal rule: why you must never refactor untested code
  • Extract Function - breaking monoliths into focused units
  • Rename - the most impactful and underrated refactoring
  • Extract Variable - replacing expressions with meaning
  • Replace Magic Numbers with named constants
  • Extract Class - when data and state accumulate in a function
  • Replace Conditional with Polymorphism or dispatch tables
  • Introduce Parameter Object - replacing long argument lists
  • Collapse Else Clause (guard clauses / early return)
  • The complete refactoring workflow and when NOT to refactor

Prerequisites

  • Comfort writing Python functions and classes
  • Basic familiarity with writing tests (pytest or unittest)
  • Understanding of the code smells concept is helpful but not required

What Refactoring IS (and IS NOT)

Refactoring has a precise definition, popularized by Martin Fowler in his book Refactoring: a change to the internal structure of software that makes it easier to understand and cheaper to modify, without changing its observable behavior.

That last clause - "without changing observable behavior" - is what separates refactoring from every other kind of code change.

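Activity                         | Changes behavior? | Is it refactoring?
---------------------------------|-------------------|-------------------
Renaming a variable for clarity  | No                | Yes
Extracting a function            | No                | Yes
Fixing a bug                     | Yes               | No
Adding a feature                 | Yes               | No
Rewriting a module from scratch  | Sometimes         | No
Optimizing a slow algorithm      | Sometimes         | No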

This distinction matters operationally. When you are refactoring, your tests should stay green at every step. If a test fails after a refactoring step, you either introduced a bug or your test was testing implementation rather than behavior. Either way, you roll back, not push forward.

Why Refactor at All?

Code accumulates complexity over time - not through malice but through momentum. A function that started at 15 lines grows to 80 because each new requirement adds 5 more. A class that handled one concern gains responsibilities as the system evolves. This is called technical debt: a loan you take out in the form of expedient code that you must eventually repay with interest (debugging time, onboarding time, change fear).

Refactoring is the repayment mechanism. Small, frequent refactors prevent debt from compounding.

# What 18 months of feature additions looks like
def send_email(user_id, template_id, subject, body, attachments=None,
               cc=None, bcc=None, reply_to=None, from_name=None,
               from_email=None, track_opens=True, track_clicks=True,
               schedule_at=None, campaign_id=None, tags=None):
    # 140 more lines...
    pass

This function started as send_email(user_id, template_id). Nobody planned for 15 parameters. Each parameter was added for a legitimate reason. The refactoring discipline says: when you add the 5th parameter, that is the moment to introduce a parameter object. Not later - now.

The Cardinal Rule: Never Refactor Without Tests

The most dangerous situation in software is refactoring code that has no tests. Without tests, you cannot verify that your structural change left the behavior intact. You are navigating in the dark.

# This function has no tests. DO NOT refactor it yet.
def calculate_tax(income, filing_status, state, has_dependents, year):
    # 90 lines of tax bracket logic
    pass

Before touching this function, you write tests. Not after - before. This is called characterization testing: you run the existing code with a variety of inputs and record its outputs. Those outputs become your test cases, even if the outputs are wrong. They document what the code currently does.

# Step 1: Write characterization tests BEFORE changing anything
import pytest

def test_calculate_tax_single_no_dependents():
    result = calculate_tax(75000, "single", "CA", False, 2024)
    assert result == 18750.0  # This is what the code returns TODAY

def test_calculate_tax_married_with_dependents():
    result = calculate_tax(120000, "married", "TX", True, 2024)
    assert result == 22400.0

def test_calculate_tax_zero_income():
    result = calculate_tax(0, "single", "NY", False, 2024)
    assert result == 0.0

Now you have a safety net. Every refactoring step you take, you run these tests. If they pass, you have preserved behavior. If they fail, you broke something and you revert.
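When a function has many input combinations, writing each characterization test by hand gets tedious. A common variation, often called a golden master or approval test, records the outputs for a whole grid of inputs once. After every refactoring step, you compare against that recording. The sketch below assumes the outputs are JSON-serializable. legacy_shipping_fee is a hypothetical stand-in for the untested code, and record_golden_master and check_against_golden_master are illustrative helpers, not a library API.

```python
import itertools
import json
from pathlib import Path

def legacy_shipping_fee(weight, express):
    """Hypothetical stand-in for untested legacy code."""
    fee = 5 if weight < 1 else 5 + (weight - 1) * 2
    if express:
        fee = fee * 1.5
    return round(fee, 2)

def record_golden_master(func, input_grid, path):
    """Run func on every combination of inputs and save args + results as JSON."""
    cases = [
        {"args": list(args), "result": func(*args)}
        for args in itertools.product(*input_grid)
    ]
    Path(path).write_text(json.dumps(cases, indent=2))
    return cases

def check_against_golden_master(func, path):
    """Re-run every recorded case; fail if any result differs. Returns the case count."""
    cases = json.loads(Path(path).read_text())
    mismatches = [case for case in cases if func(*case["args"]) != case["result"]]
    # The message is only built when the assertion fails, so mismatches[0] is safe.
    assert not mismatches, f"{len(mismatches)} behavior change(s), first: {mismatches[0]}"
    return len(cases)
```

Record once against the untouched code and commit the JSON file. Then call check_against_golden_master after each step. Exact equality is a deliberate choice here. If a refactoring legitimately reorders floating-point arithmetic, compare with a tolerance instead.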

Extract Function

Extract Function is the most fundamental refactoring. When a function does more than one thing - or when a block of code requires a comment to explain it - extract it into a named function.

The Signal

def process_order(order_data):
    # Validate the order
    if not order_data.get("items"):
        raise ValueError("Order must have items")
    if not order_data.get("customer_id"):
        raise ValueError("Order must have a customer")
    total = sum(item["price"] * item["quantity"] for item in order_data["items"])
    if total <= 0:
        raise ValueError("Order total must be positive")

    # Apply discounts
    discount = 0
    if order_data.get("coupon_code") == "SAVE10":
        discount = total * 0.10
    elif order_data.get("coupon_code") == "SAVE20":
        discount = total * 0.20
    discounted_total = total - discount

    # Calculate tax
    state = order_data.get("shipping_state", "CA")
    TAX_RATES = {"CA": 0.0725, "NY": 0.08, "TX": 0.0625}
    tax_rate = TAX_RATES.get(state, 0.07)
    tax = discounted_total * tax_rate

    # Save to database
    db_record = {
        "customer_id": order_data["customer_id"],
        "items": order_data["items"],
        "subtotal": discounted_total,
        "tax": tax,
        "total": discounted_total + tax,
        "status": "pending",
    }
    # ... db.save(db_record)
    return db_record

This function has four responsibilities signaled by four comments. Each comment is a refactoring opportunity: the comment title becomes the function name.

After Extract Function

def validate_order(order_data: dict) -> None:
    """Raise ValueError if order_data is missing required fields."""
    if not order_data.get("items"):
        raise ValueError("Order must have items")
    if not order_data.get("customer_id"):
        raise ValueError("Order must have a customer")
    total = sum(item["price"] * item["quantity"] for item in order_data["items"])
    if total <= 0:
        raise ValueError("Order total must be positive")


COUPON_DISCOUNTS = {"SAVE10": 0.10, "SAVE20": 0.20}

def apply_discount(subtotal: float, coupon_code: str | None) -> float:
    """Return subtotal after applying coupon discount, if any."""
    rate = COUPON_DISCOUNTS.get(coupon_code or "", 0.0)
    return subtotal * (1 - rate)


STATE_TAX_RATES = {"CA": 0.0725, "NY": 0.08, "TX": 0.0625}
DEFAULT_TAX_RATE = 0.07

def calculate_tax(subtotal: float, state: str) -> float:
    """Return tax amount for the given subtotal and state."""
    rate = STATE_TAX_RATES.get(state, DEFAULT_TAX_RATE)
    return subtotal * rate


def build_order_record(order_data: dict, subtotal: float, tax: float) -> dict:
    """Assemble the database record for a new order."""
    return {
        "customer_id": order_data["customer_id"],
        "items": order_data["items"],
        "subtotal": subtotal,
        "tax": tax,
        "total": subtotal + tax,
        "status": "pending",
    }


def process_order(order_data: dict) -> dict:
    """Validate, price, and build a pending order record."""
    validate_order(order_data)
    raw_total = sum(item["price"] * item["quantity"] for item in order_data["items"])
    subtotal = apply_discount(raw_total, order_data.get("coupon_code"))
    tax = calculate_tax(subtotal, order_data.get("shipping_state", "CA"))
    return build_order_record(order_data, subtotal, tax)

The orchestration function process_order now reads like a table of contents. Each helper is independently testable. You can test discount logic without building a full order. You can test tax calculation with nothing more than a number and a state code.

The Extraction Heuristic

A block of code is a candidate for extraction when:

  • It requires a comment to explain what it does
  • It could be useful in more than one place
  • It can be given a name that explains intent rather than mechanism
  • It operates on a distinct subset of the function's variables

Rename

Renaming is the most frequently performed refactoring and one of the most impactful. The right name eliminates the need for a comment and makes the code self-documenting.

# Before: what does 'd' mean? what does 'calc' do?
def calc(d, r, t):
    return d * (1 + r) ** t

# After: no comment needed
def compound_interest(principal: float, annual_rate: float, years: int) -> float:
    return principal * (1 + annual_rate) ** years

Safe Renaming in Practice

Modern IDEs (PyCharm, VS Code with Pylance) perform rename refactoring with full cross-file awareness. Use them. Manual find-and-replace is fragile because it catches strings inside comments and string literals that you may not want to rename.

When you do not have IDE support:

# Find all usages before renaming
grep -rn "old_function_name" src/ tests/

# After renaming, verify nothing is broken
grep -rn "old_function_name" src/ tests/
# This should return no results

# Run tests
pytest

For public APIs (functions called from outside your module), renaming requires a deprecation period:

import warnings

def calculate_order_total(user, items, coupon_percent=0):
    """Calculate total cost for an order."""
    # ... implementation ...

# Provide an alias during transition
def calc_total(user, items, disc=0):
    """Deprecated: use calculate_order_total instead."""
    warnings.warn(
        "calc_total is deprecated, use calculate_order_total",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this wrapper
    )
    return calculate_order_total(user, items, coupon_percent=disc)
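It is worth checking both halves of that contract: the old name still returns the same result, and it actually emits the warning. The sketch below uses only the standard warnings module. The body of calculate_order_total is a hypothetical stand-in so the example runs on its own, and old_alias_still_works_and_warns is an illustrative helper name. Inside a pytest suite, with pytest.warns(DeprecationWarning): does the same job more concisely.

```python
import warnings

def calculate_order_total(user, items, coupon_percent=0):
    """Hypothetical stand-in body so the example runs on its own."""
    subtotal = sum(item["price"] for item in items) * (1 - coupon_percent / 100)
    return {"subtotal": subtotal}

def calc_total(user, items, disc=0):
    """Deprecated: use calculate_order_total instead."""
    warnings.warn(
        "calc_total is deprecated, use calculate_order_total",
        DeprecationWarning,
        stacklevel=2,
    )
    return calculate_order_total(user, items, coupon_percent=disc)

def old_alias_still_works_and_warns() -> bool:
    """True if the alias returns the same result AND emits a DeprecationWarning."""
    items = [{"price": 40.0}, {"price": 60.0}]
    with warnings.catch_warnings(record=True) as caught:
        # DeprecationWarning is ignored by default unless triggered from __main__,
        # so enable every warning inside this block.
        warnings.simplefilter("always")
        old_result = calc_total({}, items, disc=10)
    new_result = calculate_order_total({}, items, coupon_percent=10)
    warned = any(issubclass(w.category, DeprecationWarning) for w in caught)
    return old_result == new_result and warned
```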

Extract Variable

When a complex expression appears in a conditional or computation, extract it into a named intermediate variable. The name serves as documentation.

Before

# What is this condition checking?
if (user.subscription_end_date - datetime.now()).days < 30 and not user.auto_renew and user.payment_method is not None:
    send_renewal_reminder(user)

This line requires the reader to parse the entire boolean expression to understand the intent. It is also not reusable - if the same check appears elsewhere, it will be duplicated.

After

subscription_expiring_soon = (user.subscription_end_date - datetime.now()).days < 30
has_auto_renewal_disabled = not user.auto_renew
has_valid_payment_method = user.payment_method is not None

should_send_renewal_reminder = (
    subscription_expiring_soon
    and has_auto_renewal_disabled
    and has_valid_payment_method
)

if should_send_renewal_reminder:
    send_renewal_reminder(user)

Or, if the check naturally belongs to the user object, move it into a method on that class:

class User:
    # ... existing fields and methods ...

    def needs_renewal_reminder(self) -> bool:
        """True if this user should receive a subscription renewal reminder."""
        expiring_soon = (self.subscription_end_date - datetime.now()).days < 30
        return expiring_soon and not self.auto_renew and self.payment_method is not None

The Pattern in Comprehensions

# Before: hard to read
result = [item["price"] * item["quantity"] * (1 - item.get("discount", 0)) for item in cart if item["in_stock"] and item["price"] > 0]

# After: extracted variables
valid_items = [item for item in cart if item["in_stock"] and item["price"] > 0]
line_totals = [
    item["price"] * item["quantity"] * (1 - item.get("discount", 0))
    for item in valid_items
]

Replace Magic Numbers with Named Constants

A magic number is a numeric literal (or string) that appears in code with no explanation of what it represents. The number 86400 means nothing. SECONDS_PER_DAY = 86400 is self-documenting.

Before

import time

def is_session_expired(last_active_timestamp: float) -> bool:
    return (time.time() - last_active_timestamp) > 3600

def can_retry(attempt_count: int) -> bool:
    return attempt_count < 3

def calculate_late_fee(days_overdue: int, amount: float) -> float:
    if days_overdue <= 7:
        return amount * 0.02
    return amount * 0.05

After

SESSION_TIMEOUT_SECONDS = 3600 # 1 hour
MAX_RETRY_ATTEMPTS = 3
GRACE_PERIOD_DAYS = 7
STANDARD_LATE_FEE_RATE = 0.02 # 2%
OVERDUE_LATE_FEE_RATE = 0.05 # 5%

def is_session_expired(last_active_timestamp: float) -> bool:
    return (time.time() - last_active_timestamp) > SESSION_TIMEOUT_SECONDS

def can_retry(attempt_count: int) -> bool:
    return attempt_count < MAX_RETRY_ATTEMPTS

def calculate_late_fee(days_overdue: int, amount: float) -> float:
    if days_overdue <= GRACE_PERIOD_DAYS:
        return amount * STANDARD_LATE_FEE_RATE
    return amount * OVERDUE_LATE_FEE_RATE

For domain-specific status values, prefer an enum over bare constants:

from enum import IntEnum

class OrderStatus(IntEnum):
    DRAFT = 1
    PENDING = 2
    CONFIRMED = 3
    SHIPPED = 4
    DELIVERED = 5
    CANCELLED = 6

# Before
if order.status == 4:
    notify_tracking(order)

# After
if order.status == OrderStatus.SHIPPED:
    notify_tracking(order)

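The enum version is grep-able and type-checkable. It also lets you reject invalid status values at the boundary: converting a raw value with OrderStatus(value) raises ValueError for anything outside the defined set. Because IntEnum members still compare equal to plain integers, old comparisons like order.status == 4 keep working while you migrate call sites.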
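A short sketch of that boundary conversion follows. parse_status is a hypothetical helper, for example called when reading a status column from a database row:

```python
from enum import IntEnum

class OrderStatus(IntEnum):
    DRAFT = 1
    PENDING = 2
    CONFIRMED = 3
    SHIPPED = 4
    DELIVERED = 5
    CANCELLED = 6

def parse_status(raw: int) -> OrderStatus:
    """Convert a raw stored value into an OrderStatus.

    Raises ValueError if raw is not one of the defined statuses.
    """
    return OrderStatus(raw)  # value lookup: OrderStatus(4) is OrderStatus.SHIPPED
```

parse_status(4) returns OrderStatus.SHIPPED, and parse_status(99) raises ValueError. Because OrderStatus.SHIPPED == 4 is True for an IntEnum, unconverted legacy comparisons keep working during the migration.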

Extract Class

When a function accumulates too many parameters that travel together, or when a module accumulates too many functions that operate on the same data, it is time to extract a class.

Signal: Parameters That Always Travel Together

def send_report(
    report_title: str,
    report_body: str,
    recipient_email: str,
    recipient_name: str,
    sender_email: str,
    sender_name: str,
    cc_list: list[str],
) -> bool:
    ...

def archive_report(
    report_title: str,
    report_body: str,
    author_email: str,
    storage_path: str,
) -> str:
    ...

The parameters report_title and report_body travel together everywhere, and every email address is paired with a name. Extract both groups, folding the author into the report:

from dataclasses import dataclass

@dataclass
class Report:
    title: str
    body: str
    author_email: str

@dataclass
class EmailRecipient:
    email: str
    name: str

def send_report(
    report: Report,
    sender: EmailRecipient,
    to: EmailRecipient,
    cc: list[EmailRecipient] | None = None,
) -> bool:
    ...

def archive_report(report: Report, storage_path: str) -> str:
    ...

Signal: A Growing Bag of State in Module-Level Variables

# Before: state scattered across module
_connection = None
_retry_count = 0
_last_error = None
_is_connected = False

def connect(host, port): ...
def disconnect(): ...
def send(data): ...
def get_status(): ...

# After: state encapsulated in a class
class DatabaseConnection:
    def __init__(self, host: str, port: int):
        self._host = host
        self._port = port
        self._connection = None
        self._retry_count = 0
        self._last_error: Exception | None = None

    def connect(self) -> None: ...
    def disconnect(self) -> None: ...
    def send(self, data: bytes) -> None: ...

    @property
    def is_connected(self) -> bool:
        return self._connection is not None

Replace Conditional with Polymorphism (or a Dispatch Table)

Long if/elif chains that switch on a type or category are a maintenance liability: every new type requires modifying the chain, and the chain grows without bound.

Before

def calculate_shipping(order_type: str, weight: float, distance: float) -> float:
    if order_type == "standard":
        return weight * 0.5 + distance * 0.01
    elif order_type == "express":
        return weight * 1.2 + distance * 0.05 + 5.00
    elif order_type == "overnight":
        return weight * 2.0 + distance * 0.08 + 15.00
    elif order_type == "freight":
        return weight * 0.3 + distance * 0.02
    else:
        raise ValueError(f"Unknown order type: {order_type}")

After: Dispatch Table

from typing import Callable

ShippingCalculator = Callable[[float, float], float]

SHIPPING_CALCULATORS: dict[str, ShippingCalculator] = {
    "standard": lambda w, d: w * 0.5 + d * 0.01,
    "express": lambda w, d: w * 1.2 + d * 0.05 + 5.00,
    "overnight": lambda w, d: w * 2.0 + d * 0.08 + 15.00,
    "freight": lambda w, d: w * 0.3 + d * 0.02,
}

def calculate_shipping(order_type: str, weight: float, distance: float) -> float:
    calculator = SHIPPING_CALCULATORS.get(order_type)
    if calculator is None:
        raise ValueError(f"Unknown order type: {order_type}")
    return calculator(weight, distance)

Adding a new shipping type now requires adding one entry to the dictionary - no if/elif modification needed.
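When different modules need to contribute their own entries, a small registration decorator keeps the table open for extension: no module has to edit a shared dictionary literal. In this sketch, register_shipping is a hypothetical helper name, and only two of the four rates from above are registered for brevity. Rejecting duplicate keys is a design choice, so that two modules cannot silently overwrite each other.

```python
from typing import Callable

ShippingCalculator = Callable[[float, float], float]
SHIPPING_CALCULATORS: dict[str, ShippingCalculator] = {}

def register_shipping(order_type: str) -> Callable[[ShippingCalculator], ShippingCalculator]:
    """Decorator that adds a calculator to the dispatch table under order_type."""
    def decorator(func: ShippingCalculator) -> ShippingCalculator:
        if order_type in SHIPPING_CALCULATORS:
            raise ValueError(f"Duplicate shipping type: {order_type}")
        SHIPPING_CALCULATORS[order_type] = func
        return func  # the function stays usable under its own name
    return decorator

@register_shipping("standard")
def standard_shipping(weight: float, distance: float) -> float:
    return weight * 0.5 + distance * 0.01

@register_shipping("express")
def express_shipping(weight: float, distance: float) -> float:
    return weight * 1.2 + distance * 0.05 + 5.00

def calculate_shipping(order_type: str, weight: float, distance: float) -> float:
    calculator = SHIPPING_CALCULATORS.get(order_type)
    if calculator is None:
        raise ValueError(f"Unknown order type: {order_type}")
    return calculator(weight, distance)
```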

After: Polymorphism (for Complex Logic per Type)

When the per-type logic is too complex for a lambda, subclasses give you the same open/closed property:

from abc import ABC, abstractmethod

class ShippingMethod(ABC):
    @abstractmethod
    def calculate(self, weight: float, distance: float) -> float: ...

    @abstractmethod
    def estimated_days(self) -> int: ...

class StandardShipping(ShippingMethod):
    def calculate(self, weight: float, distance: float) -> float:
        return weight * 0.5 + distance * 0.01

    def estimated_days(self) -> int:
        return 5

class OvernightShipping(ShippingMethod):
    def calculate(self, weight: float, distance: float) -> float:
        return weight * 2.0 + distance * 0.08 + 15.00

    def estimated_days(self) -> int:
        return 1

SHIPPING_METHODS: dict[str, ShippingMethod] = {
    "standard": StandardShipping(),
    "overnight": OvernightShipping(),
}

def get_shipping_method(order_type: str) -> ShippingMethod:
    method = SHIPPING_METHODS.get(order_type)
    if method is None:
        raise ValueError(f"Unknown shipping type: {order_type}")
    return method

Introduce Parameter Object

When a function signature grows beyond four parameters - especially when those parameters frequently travel together - introduce a dataclass or TypedDict to group them.

Before

def create_user(
    first_name: str,
    last_name: str,
    email: str,
    age: int,
    country: str,
    timezone: str,
    newsletter_opt_in: bool,
    referral_code: str | None = None,
) -> dict:
    ...

def update_user(
    user_id: int,
    first_name: str,
    last_name: str,
    email: str,
    country: str,
    timezone: str,
) -> dict:
    ...

After

from dataclasses import dataclass

@dataclass
class UserProfile:
    first_name: str
    last_name: str
    email: str
    country: str
    timezone: str
    age: int | None = None
    newsletter_opt_in: bool = False
    referral_code: str | None = None

    @property
    def full_name(self) -> str:
        return f"{self.first_name} {self.last_name}"

def create_user(profile: UserProfile) -> dict:
    ...

def update_user(user_id: int, profile: UserProfile) -> dict:
    ...

The parameter object has two compounding benefits: it reduces call-site noise, and it gives you a natural home for derived properties like full_name. It also means that when you need to add a new field to the user concept, you change one dataclass, not every function signature.

Collapse Else Clause (Guard Clauses / Early Return)

Deeply nested conditionals - sometimes called the "arrow anti-pattern" - are hard to read because the reader must track multiple levels of indentation mentally.

Before: Arrow Anti-Pattern

def process_payment(user, order, payment_method):
    if user is not None:
        if user.is_active:
            if order is not None:
                if order.total > 0:
                    if payment_method.is_valid():
                        charge = payment_method.charge(order.total)
                        if charge.success:
                            order.status = "paid"
                            return charge.transaction_id
                        else:
                            return None
                    else:
                        return None
                else:
                    return None
            else:
                return None
        else:
            return None
    else:
        return None

After: Guard Clauses (Early Return)

def process_payment(user, order, payment_method) -> str | None:
    if user is None:
        return None
    if not user.is_active:
        return None
    if order is None:
        return None
    if order.total <= 0:
        return None
    if not payment_method.is_valid():
        return None

    charge = payment_method.charge(order.total)
    if not charge.success:
        return None

    order.status = "paid"
    return charge.transaction_id

The guard-clause version eliminates the nesting. The happy path is the last few lines, sitting at the function's base indentation level. The failure paths are at the top, each with a single reason for failure. The reader can scan the guards quickly and focus on what happens when all conditions are met.

The pattern works by inverting the condition: instead of if condition: do_thing, write if not condition: return early. This keeps the main logic at the lowest indentation level.

The Refactoring Workflow

Effective refactoring follows a tight loop that prevents accumulating broken intermediate states.

1. Ensure tests exist and pass
2. Identify one code smell
3. Apply one refactoring technique
4. Run tests - they must all pass
5. Commit (the commit message is the refactoring name: "Extract Function: validate_order")
6. Repeat

Step 5 - commit after each refactoring - is underappreciated. It gives you a clean history that documents the evolution of the code, and it gives you safe rollback points if a later change introduces a problem.

Tooling That Helps

# Run tests in watch mode during refactoring
pip install pytest-watch
ptw  # re-runs pytest whenever a file under the current directory changes

# Or with pytest directly
pytest --tb=short -q

# Check for unused imports and variables after renames
ruff check src/ --select F401,F841

# Detect complexity (functions that need extraction)
pip install radon
radon cc src/ -a -nc
# -nc lists only blocks ranked C or worse (cyclomatic complexity above 10),
# which are candidates for extraction

Commit Message Convention for Refactoring

refactor: Extract Function - validate_order from process_order

No behavior changes. Tests pass. Extracted order validation logic
into a dedicated validate_order() function to improve testability
and shorten process_order.

When NOT to Refactor

Refactoring judgment includes knowing when to leave code alone.

Near a deadline. Refactoring introduces risk. If you are 48 hours from a production release, a "quick" refactor that breaks something is catastrophic. Add a TODO comment, file a ticket, and refactor after the release.

Without tests. This is the cardinal rule revisited. If you cannot write characterization tests (the code is too tangled, the dependencies are too difficult to inject), the refactoring risk is too high. Invest in making the code testable first.

Code you do not understand. Before you can safely restructure code, you must understand what it does. Spend time reading, add tests that document behavior, and only then refactor. Refactoring code you do not understand is guessing.

In shared/legacy code without coordination. If ten people depend on a module, a rename or interface change affects all of them. Coordinate first, provide deprecation paths, and communicate changes through your team's standard process.

When the real problem is a rewrite. Some code is not worth refactoring. If a module is fundamentally misdesigned - wrong abstractions at the core, impossible to test, built on an obsolete dependency - a targeted rewrite with a clean interface is better than incremental improvement. The way to identify this case: if each refactoring reveals two new problems rather than one fewer, you are fighting the architecture.

Interview Questions

Q1: What is the difference between refactoring and rewriting?

Answer: Refactoring changes the internal structure of code while preserving its external behavior - the same inputs produce the same outputs before and after. A rewrite replaces code entirely, often changing interfaces, abstractions, and sometimes behavior. Refactoring is done in small, verifiable steps with tests passing at each step. A rewrite is a larger investment that temporarily removes the safety net of existing tests. In practice, refactoring is preferred because it maintains system stability; a rewrite is warranted only when the existing design is fundamentally broken.

Q2: Why is it dangerous to refactor code that has no tests?

Answer: Without tests, you have no automated way to verify that your structural changes preserved the original behavior. Human review alone is unreliable - subtle behavioral changes in edge cases are easy to miss. The risk compounds as the refactoring grows larger. The professional discipline is to write characterization tests first: run the existing code with representative inputs, record the outputs, and make those test cases. Only then refactor, running the tests after every change.

Q3: Explain the Extract Function refactoring and what signals indicate it is needed.

Answer: Extract Function takes a block of code within a function and moves it into a new, named function. The signals that indicate it is needed include: a comment explaining what a block does (the comment title becomes the function name), a block that operates on a distinct subset of the variables, logic that could be useful in more than one place, and any function that grows beyond 20-30 lines. After extraction, the original function calls the new one, and the new function can be independently tested. The most important outcome is that each function has a single responsibility that its name accurately describes.

Q4: What is the guard clause pattern and why does it improve readability?

Answer: Guard clauses (also called early returns) are conditional checks at the top of a function that return immediately when a precondition is not met. They replace deeply nested if/else structures - the "arrow anti-pattern" - with a flat structure where failure cases are handled upfront and the happy path is at the lowest indentation level. This improves readability because the reader can scan the failure conditions quickly and then focus on the main logic without tracking nested context. It also removes else branches that sit far below the if they belong to, which are a common source of confusion.

Q5: When should you use a dispatch table instead of if/elif chains?

Answer: A dispatch table (a dictionary mapping keys to functions or callables) is appropriate when the if/elif chain switches on a type, category, or string value and each branch does something distinct. The dispatch table is open for extension without modification: adding a new type means adding a dictionary entry, not modifying the branching logic. This matters most when the set of types is expected to grow over time, or when different parts of the codebase need to register their own handlers. If the branching logic is trivial (two or three cases with no expectation of growth), an if/elif is simpler and clearer.

Q6: What does "Introduce Parameter Object" mean and when should you apply it?

Answer: Introduce Parameter Object groups a set of parameters that repeatedly travel together into a single dataclass or class. It is appropriate when a function takes more than four parameters, especially when the same group of parameters appears in multiple related functions. The parameter object has two benefits: it reduces the noise at call sites (one argument instead of six), and it provides a natural place to add derived properties and validation logic related to that concept. In Python, @dataclass is the idiomatic choice for parameter objects when you do not need behavior beyond data holding and basic property computation.

Practice Challenges

Beginner - Extract Four Functions from a Monolith

The following function does four things. Extract each responsibility into a named helper function, keeping process_signup as the orchestrator.

def process_signup(email, password, username, invite_code):
    # Validate inputs
    if not email or "@" not in email:
        raise ValueError("Invalid email")
    if len(password) < 8:
        raise ValueError("Password too short")
    if not username or len(username) < 3:
        raise ValueError("Username too short")

    # Check invite code
    valid_codes = {"BETA2024", "EARLYBIRD", "FRIENDS"}
    if invite_code not in valid_codes:
        raise ValueError("Invalid invite code")

    # Hash password (simplified)
    import hashlib
    hashed = hashlib.sha256(password.encode()).hexdigest()

    # Build user record
    return {
        "email": email.lower().strip(),
        "username": username.lower(),
        "password_hash": hashed,
        "invite_code": invite_code,
        "status": "pending_verification",
    }
Solution
import hashlib

VALID_INVITE_CODES = {"BETA2024", "EARLYBIRD", "FRIENDS"}
MIN_PASSWORD_LENGTH = 8
MIN_USERNAME_LENGTH = 3

def validate_signup_inputs(email: str, password: str, username: str) -> None:
    """Raise ValueError if any signup field fails validation."""
    # Error messages are unchanged: they are observable behavior that callers may rely on.
    if not email or "@" not in email:
        raise ValueError("Invalid email")
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError("Password too short")
    if not username or len(username) < MIN_USERNAME_LENGTH:
        raise ValueError("Username too short")

def validate_invite_code(invite_code: str) -> None:
    """Raise ValueError if the invite code is not recognized."""
    if invite_code not in VALID_INVITE_CODES:
        raise ValueError("Invalid invite code")

def hash_password(password: str) -> str:
    """Return a SHA-256 hex digest of the given password."""
    # Simplified, as in the original exercise. Real systems should use a slow,
    # salted password hash such as hashlib.scrypt, bcrypt, or argon2.
    return hashlib.sha256(password.encode()).hexdigest()

def build_user_record(email: str, username: str, password_hash: str, invite_code: str) -> dict:
    """Assemble the initial user record for persistence."""
    return {
        "email": email.lower().strip(),
        "username": username.lower(),
        "password_hash": password_hash,
        "invite_code": invite_code,
        "status": "pending_verification",
    }

def process_signup(email: str, password: str, username: str, invite_code: str) -> dict:
    """Validate and build a new user record from signup data."""
    validate_signup_inputs(email, password, username)
    validate_invite_code(invite_code)
    password_hash = hash_password(password)
    return build_user_record(email, username, password_hash, invite_code)

Intermediate - Apply Three Refactoring Techniques

Apply Extract Variable, Replace Magic Numbers with Constants, and Collapse Else Clause to this function:

def get_discount_rate(user, cart_total):
    if user is not None:
        if user.is_active:
            if cart_total >= 100:
                if user.loyalty_years >= 3 and not user.has_pending_payment:
                    if cart_total >= 500:
                        return 0.20
                    else:
                        return 0.15
                else:
                    if cart_total >= 500:
                        return 0.10
                    else:
                        return 0.05
            else:
                return 0.0
        else:
            return 0.0
    else:
        return 0.0
Solution
# Named constants replacing magic numbers
MIN_DISCOUNT_THRESHOLD = 100
LARGE_ORDER_THRESHOLD = 500
LOYALTY_YEARS_REQUIRED = 3

LOYAL_LARGE_ORDER_DISCOUNT = 0.20
LOYAL_STANDARD_DISCOUNT = 0.15
STANDARD_LARGE_DISCOUNT = 0.10
STANDARD_DISCOUNT = 0.05
NO_DISCOUNT = 0.0

def get_discount_rate(user, cart_total: float) -> float:
    """Return the discount rate for a user's cart.

    Loyal customers (3+ years, no pending payment) receive higher rates.
    Minimum cart total of $100 required for any discount.
    """
    # Guard clauses (collapsed else)
    if user is None:
        return NO_DISCOUNT
    if not user.is_active:
        return NO_DISCOUNT
    if cart_total < MIN_DISCOUNT_THRESHOLD:
        return NO_DISCOUNT

    # Extract variables for complex conditions
    is_loyal_customer = user.loyalty_years >= LOYALTY_YEARS_REQUIRED and not user.has_pending_payment
    is_large_order = cart_total >= LARGE_ORDER_THRESHOLD

    # Pick the rate; no else is needed after a return
    if is_loyal_customer:
        return LOYAL_LARGE_ORDER_DISCOUNT if is_large_order else LOYAL_STANDARD_DISCOUNT
    return STANDARD_LARGE_DISCOUNT if is_large_order else STANDARD_DISCOUNT
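To confirm that the solution is a true refactoring, you can run both versions against every combination of inputs, including values on both sides of each threshold. This is the characterization-test idea from earlier, applied to the exercise. In this sketch, get_discount_rate_original is the nested version copied from the exercise, and SimpleNamespace stands in for a real user object. The attribute names are taken from the exercise, and compare_all is an illustrative helper name.

```python
import itertools
from types import SimpleNamespace

def get_discount_rate_original(user, cart_total):
    """The nested version from the exercise, kept only for comparison."""
    if user is not None:
        if user.is_active:
            if cart_total >= 100:
                if user.loyalty_years >= 3 and not user.has_pending_payment:
                    if cart_total >= 500:
                        return 0.20
                    else:
                        return 0.15
                else:
                    if cart_total >= 500:
                        return 0.10
                    else:
                        return 0.05
            else:
                return 0.0
        else:
            return 0.0
    else:
        return 0.0

# Refactored solution from above (docstring omitted)
MIN_DISCOUNT_THRESHOLD = 100
LARGE_ORDER_THRESHOLD = 500
LOYALTY_YEARS_REQUIRED = 3
LOYAL_LARGE_ORDER_DISCOUNT = 0.20
LOYAL_STANDARD_DISCOUNT = 0.15
STANDARD_LARGE_DISCOUNT = 0.10
STANDARD_DISCOUNT = 0.05
NO_DISCOUNT = 0.0

def get_discount_rate(user, cart_total: float) -> float:
    if user is None:
        return NO_DISCOUNT
    if not user.is_active:
        return NO_DISCOUNT
    if cart_total < MIN_DISCOUNT_THRESHOLD:
        return NO_DISCOUNT
    is_loyal_customer = user.loyalty_years >= LOYALTY_YEARS_REQUIRED and not user.has_pending_payment
    is_large_order = cart_total >= LARGE_ORDER_THRESHOLD
    if is_loyal_customer:
        return LOYAL_LARGE_ORDER_DISCOUNT if is_large_order else LOYAL_STANDARD_DISCOUNT
    return STANDARD_LARGE_DISCOUNT if is_large_order else STANDARD_DISCOUNT

def compare_all() -> int:
    """Assert both versions agree on every combination; return the number of cases."""
    users = [None] + [
        SimpleNamespace(is_active=active, loyalty_years=years, has_pending_payment=pending)
        for active, years, pending in itertools.product([True, False], [0, 2, 3, 5], [True, False])
    ]
    # Values on both sides of each threshold: boundaries are where refactors slip.
    cart_totals = [0, 99.99, 100, 250, 499.99, 500, 1000]
    checked = 0
    for user, total in itertools.product(users, cart_totals):
        expected = get_discount_rate_original(user, total)
        actual = get_discount_rate(user, total)
        assert actual == expected, (user, total, expected, actual)
        checked += 1
    return checked
```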

Advanced - Replace Conditional with Dispatch Table and Introduce Parameter Object

This function handles different notification types with a growing if/elif chain and a long parameter list. Refactor it using both techniques:

```python
def send_notification(user_id, notif_type, subject, body, email, phone,
                      push_token, priority, retry_count, schedule_at):
    if notif_type == "email":
        if priority == "high":
            send_email_immediately(email, subject, body)
        else:
            schedule_email(email, subject, body, schedule_at)
    elif notif_type == "sms":
        if len(body) > 160:
            body = body[:157] + "..."
        send_sms(phone, body, priority)
    elif notif_type == "push":
        send_push(push_token, subject, body, priority)
    elif notif_type == "in_app":
        store_in_app_notification(user_id, subject, body)
    else:
        raise ValueError(f"Unknown notification type: {notif_type}")
```
Solution
```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable
import datetime

# send_email_immediately, schedule_email, send_sms, send_push and
# store_in_app_notification are the same delivery functions the original calls.

class NotificationType(str, Enum):
    EMAIL = "email"
    SMS = "sms"
    PUSH = "push"
    IN_APP = "in_app"

class Priority(str, Enum):
    HIGH = "high"
    NORMAL = "normal"
    LOW = "low"

SMS_MAX_LENGTH = 160

@dataclass
class NotificationRequest:
    user_id: int
    notif_type: NotificationType
    subject: str
    body: str
    email: str | None = None
    phone: str | None = None
    push_token: str | None = None
    priority: Priority = Priority.NORMAL
    retry_count: int = 3
    schedule_at: datetime.datetime | None = None

def _send_email_notification(req: NotificationRequest) -> None:
    if req.priority == Priority.HIGH:
        send_email_immediately(req.email, req.subject, req.body)
    else:
        schedule_email(req.email, req.subject, req.body, req.schedule_at)

def _send_sms_notification(req: NotificationRequest) -> None:
    # Truncate to the SMS limit, leaving room for the three-character "..." suffix
    body = req.body if len(req.body) <= SMS_MAX_LENGTH else req.body[:SMS_MAX_LENGTH - 3] + "..."
    send_sms(req.phone, body, req.priority)

def _send_push_notification(req: NotificationRequest) -> None:
    send_push(req.push_token, req.subject, req.body, req.priority)

def _send_in_app_notification(req: NotificationRequest) -> None:
    store_in_app_notification(req.user_id, req.subject, req.body)

NotificationHandler = Callable[[NotificationRequest], None]

NOTIFICATION_HANDLERS: dict[NotificationType, NotificationHandler] = {
    NotificationType.EMAIL: _send_email_notification,
    NotificationType.SMS: _send_sms_notification,
    NotificationType.PUSH: _send_push_notification,
    NotificationType.IN_APP: _send_in_app_notification,
}

def send_notification(req: NotificationRequest) -> None:
    """Dispatch a notification to the appropriate channel."""
    handler = NOTIFICATION_HANDLERS.get(req.notif_type)
    if handler is None:
        raise ValueError(f"Unknown notification type: {req.notif_type}")
    handler(req)
```
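Most of the notification refactor moves code without changing it, but the SMS branch rewrites the truncation as a conditional expression, so it deserves its own characterization check. The sketch below isolates that logic in two hypothetical helpers, `truncate_original` (copied from the if/elif version) and `truncate_refactored` (the conditional-expression form, with `157` written as `SMS_MAX_LENGTH - 3`), and compares them at the lengths around the 160-character limit.

```python
SMS_MAX_LENGTH = 160

def truncate_original(body: str) -> str:
    # Logic copied from the "sms" branch of the original if/elif chain
    if len(body) > 160:
        body = body[:157] + "..."
    return body

def truncate_refactored(body: str) -> str:
    # Conditional-expression form; SMS_MAX_LENGTH - 3 leaves room for "..."
    return body if len(body) <= SMS_MAX_LENGTH else body[:SMS_MAX_LENGTH - 3] + "..."

# Boundary lengths: empty, just under, exactly at, just over, and far over the limit
for length in (0, 159, 160, 161, 500):
    body = "x" * length
    assert truncate_original(body) == truncate_refactored(body), length
    assert len(truncate_refactored(body)) <= SMS_MAX_LENGTH
```

Lengths 160 and 161 are the interesting cases: a 160-character body passes through untouched, while a 161-character body becomes 157 characters plus the ellipsis, exactly 160 in total.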

Quick Reference

| Refactoring | Signal to Apply | Core Mechanic |
|---|---|---|
| Extract Function | Comment explaining a block; function > 25 lines | Move block to named function, replace with call |
| Rename | Ambiguous name, single-letter variable | IDE rename or grep; add deprecation alias for public API |
| Extract Variable | Complex boolean or arithmetic expression | Assign expression to a descriptive name |
| Replace Magic Number | Literal with no explanation in context | Define CONSTANT_NAME = value at module level |
| Extract Class | Parameters that always travel together; growing module state | Group into @dataclass or class |
| Replace Conditional | if/elif switching on type/category | Dict mapping keys to functions/classes |
| Introduce Parameter Object | More than 4 parameters traveling together | Create @dataclass with those fields |
| Collapse Else / Guard Clause | Deeply nested conditionals; arrow anti-pattern | Invert condition, return early at top |
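The Rename row suggests keeping a deprecation alias when the old name belongs to a public API, so existing callers keep working while they migrate. A minimal sketch of one common pattern, built on the standard `warnings` and `functools` modules, is shown below; the names `ct` (old) and `calculate_tax` (new), the `deprecated_alias` helper, and the 8% rate are illustrative assumptions.

```python
import functools
import warnings

TAX_RATE = 0.08  # illustrative rate for this sketch

def calculate_tax(subtotal: float) -> float:
    """Return the tax owed on a subtotal (the new, descriptive name)."""
    return subtotal * TAX_RATE

def deprecated_alias(new_func, old_name: str):
    """Wrap new_func so calls through the old name warn, then delegate."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name}() is deprecated; use {new_func.__name__}() instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not this wrapper
        )
        return new_func(*args, **kwargs)
    return wrapper

# Old public name, kept only for backward compatibility during the migration
ct = deprecated_alias(calculate_tax, "ct")

if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning regardless of default filters
        ct(100.0)
    print(caught[0].message)
```

Because the alias delegates instead of duplicating logic, the old name can be deleted in a later release without touching `calculate_tax` itself.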

Key Takeaways

  • Refactoring is a structural change with no behavior change - tests must stay green at every step.
  • The cardinal rule: never refactor code without tests. Write characterization tests first.
  • Extract Function is the most important refactoring - apply it whenever a comment explains what a block does.
  • Guard clauses (early returns) eliminate nesting and make the happy path the lowest-indented code.
  • Replace magic numbers with named constants so the purpose of a value is visible at its point of use.
  • Dispatch tables (dicts of callables) replace if/elif chains and make adding new types non-invasive.
  • Commit after every refactoring step. A clean history of small changes is its own form of documentation.
© 2026 EngineersOfAI. All rights reserved.