Python Refactoring Techniques Practice Problems & Exercises
Practice: Refactoring Techniques
← Back to lessonEasy
The function below has three distinct responsibilities separated by comments. Extract each block into a named helper function so that calculate_invoice becomes a pure orchestrator.
def calculate_invoice(items, tax_rate, discount_code):
"""Calculate invoice subtotal, tax, and total."""
# Compute subtotal from items
subtotal = sum(item["price"] * item["qty"] for item in items)
# Apply discount
discounts = {"SAVE10": 0.10, "SAVE20": 0.20}
rate = discounts.get(discount_code, 0.0)
subtotal = subtotal * (1 - rate)
# Compute tax and total
tax = round(subtotal * tax_rate, 2)
total = round(subtotal + tax, 2)
return {"subtotal": subtotal, "tax": tax, "total": total}
items = [{"price": 15.0, "qty": 2}, {"price": 5.0, "qty": 3}]
result = calculate_invoice(items, 0.08, "SAVE10")
print(f"Subtotal: {result['subtotal']}")
print(f"Tax: {result['tax']}")
print(f"Total: {result['total']}")Solution
# BEFORE — three responsibilities in one function
# def calculate_invoice(items, tax_rate, discount_code): ...
# AFTER — each responsibility is a named function
DISCOUNT_RATES = {"SAVE10": 0.10, "SAVE20": 0.20}
def compute_subtotal(items):
"""Sum price * qty for all items."""
return sum(item["price"] * item["qty"] for item in items)
def apply_discount(subtotal, discount_code):
"""Return subtotal after applying coupon, if any."""
rate = DISCOUNT_RATES.get(discount_code, 0.0)
return subtotal * (1 - rate)
def compute_tax_and_total(subtotal, tax_rate):
"""Return (tax, total) tuple for a given subtotal and rate."""
tax = round(subtotal * tax_rate, 2)
return tax, round(subtotal + tax, 2)
def calculate_invoice(items, tax_rate, discount_code):
"""Orchestrate subtotal, discount, and tax calculation."""
subtotal = compute_subtotal(items)
subtotal = apply_discount(subtotal, discount_code)
tax, total = compute_tax_and_total(subtotal, tax_rate)
return {"subtotal": subtotal, "tax": tax, "total": total}
items = [{"price": 15.0, "qty": 2}, {"price": 5.0, "qty": 3}]
result = calculate_invoice(items, 0.08, "SAVE10")
print(f"Subtotal: {result['subtotal']}")
print(f"Tax: {result['tax']}")
print(f"Total: {result['total']}")
What changed:
calculate_invoiceshrank from 10 lines of logic to 4 delegating calls.compute_subtotal,apply_discount, andcompute_tax_and_totalare independently testable — you can test discount logic without building a full item list.- The constant
DISCOUNT_RATESmoved to module level, making it visible and easy to extend.
The heuristic: If a comment explains what a block does, that comment title is the name of the extracted function.
Expected Output
Subtotal: 45.0\nTax: 3.6\nTotal: 48.6Hints
Hint 1: Every comment that says "# do X" is a refactoring signal — the comment title becomes the function name. Identify the three comment-separated blocks and extract each into its own function.
Hint 2: The orchestrator function should only call the helpers in sequence. After extraction it should be 4–5 lines with no arithmetic logic of its own.
The three functions below are littered with unexplained numeric literals. Replace every magic number with a named constant defined at the top of the module.
import time
SESSION_TIMEOUT_SECONDS = 3600 # 1 hour
MAX_RETRY_ATTEMPTS = 3
GRACE_PERIOD_DAYS = 7
STANDARD_LATE_FEE_RATE = 0.025 # 2.5%
OVERDUE_LATE_FEE_RATE = 0.05 # 5%
def is_session_expired(last_active: float) -> bool:
return (time.time() - last_active) > SESSION_TIMEOUT_SECONDS
def can_retry(attempt_count: int) -> bool:
return attempt_count < MAX_RETRY_ATTEMPTS
def calculate_late_fee(days_overdue: int, amount: float) -> float:
if days_overdue <= GRACE_PERIOD_DAYS:
return amount * STANDARD_LATE_FEE_RATE
return amount * OVERDUE_LATE_FEE_RATE
# Tests
last_active = time.time() - 4000 # more than 1 hour ago
print(f"Session expired: {is_session_expired(last_active)}")
print(f"Can retry: {can_retry(2)}")
print(f"Late fee: {calculate_late_fee(10, 250.0)}")Solution
# BEFORE — magic numbers scattered everywhere
import time
def is_session_expired_bad(last_active):
return (time.time() - last_active) > 3600 # What is 3600?
def can_retry_bad(attempt_count):
return attempt_count < 3 # Why 3?
def calculate_late_fee_bad(days_overdue, amount):
if days_overdue <= 7: # What is 7?
return amount * 0.025 # And 0.025?
return amount * 0.05 # And 0.05?
# AFTER — constants named at module level
SESSION_TIMEOUT_SECONDS = 3600 # 1 hour
MAX_RETRY_ATTEMPTS = 3
GRACE_PERIOD_DAYS = 7
STANDARD_LATE_FEE_RATE = 0.025 # 2.5%
OVERDUE_LATE_FEE_RATE = 0.05 # 5%
def is_session_expired(last_active: float) -> bool:
return (time.time() - last_active) > SESSION_TIMEOUT_SECONDS
def can_retry(attempt_count: int) -> bool:
return attempt_count < MAX_RETRY_ATTEMPTS
def calculate_late_fee(days_overdue: int, amount: float) -> float:
if days_overdue <= GRACE_PERIOD_DAYS:
return amount * STANDARD_LATE_FEE_RATE
return amount * OVERDUE_LATE_FEE_RATE
last_active = time.time() - 4000
print(f"Session expired: {is_session_expired(last_active)}")
print(f"Can retry: {can_retry(2)}")
print(f"Late fee: {calculate_late_fee(10, 250.0)}")
Three benefits of naming constants:
- Readability —
SESSION_TIMEOUT_SECONDSexplains what 3600 means;3600does not. - Single source of truth — change the value in one place; all functions update automatically.
- Grep-ability — searching
SESSION_TIMEOUT_SECONDSfinds every usage; searching3600also matches phone numbers, port numbers, and anything else that happens to be 3600.
Expected Output
Session expired: True\nCan retry: True\nLate fee: 12.5Hints
Hint 1: Every bare numeric literal that is not 0 or 1 is a magic number. Ask yourself: "Would a new team member know what 3600 or 0.05 means here without reading the whole function?" If not, it needs a name.
Hint 2: Define constants at the top of the module in UPPER_SNAKE_CASE. Put a short comment explaining the unit or domain meaning (e.g., `# 1 hour in seconds`).
The condition inside should_trigger_alert is a single dense boolean expression. Extract meaningful intermediate variables so the final condition reads like plain English.
def should_trigger_alert(cpu_percent, queue_depth, error_rate, maintenance_mode):
"""Return True if alert conditions are met."""
is_overloaded = cpu_percent > 90
has_queue_backup = queue_depth > 500
has_high_errors = error_rate > 0.05
is_eligible = not maintenance_mode
should_alert = is_eligible and (is_overloaded or has_queue_backup or has_high_errors)
return should_alert
# Test
print(f"Should alert: {should_trigger_alert(95, 100, 0.01, False)}")
print(f"Should alert: {should_trigger_alert(95, 100, 0.01, True)}")Solution
# BEFORE — single impenetrable boolean expression
def should_trigger_alert_bad(cpu_percent, queue_depth, error_rate, maintenance_mode):
return (not maintenance_mode and
(cpu_percent > 90 or queue_depth > 500 or error_rate > 0.05))
# AFTER — extracted variables, reads like English
def should_trigger_alert(cpu_percent, queue_depth, error_rate, maintenance_mode):
"""Return True if alert conditions are met."""
is_overloaded = cpu_percent > 90
has_queue_backup = queue_depth > 500
has_high_errors = error_rate > 0.05
is_eligible = not maintenance_mode
should_alert = is_eligible and (is_overloaded or has_queue_backup or has_high_errors)
return should_alert
print(f"Should alert: {should_trigger_alert(95, 100, 0.01, False)}")
print(f"Should alert: {should_trigger_alert(95, 100, 0.01, True)}")
What changes at review time:
BEFORE:
return (not maintenance_mode and
(cpu_percent > 90 or queue_depth > 500 or error_rate > 0.05))
# Reviewer must decode every threshold inline
AFTER:
is_overloaded = cpu_percent > 90
has_queue_backup = queue_depth > 500
has_high_errors = error_rate > 0.05
is_eligible = not maintenance_mode
should_alert = is_eligible and (is_overloaded or has_queue_backup or has_high_errors)
# Each line answers one question — logic is obvious
Bonus: extracted variables are trivially unit-testable. You can verify the threshold for "overloaded" independently of the overall alerting logic.
Expected Output
Should alert: True\nShould alert: FalseHints
Hint 1: A complex boolean condition that spans one long line is hard to parse. Break it into named intermediate variables — each variable name describes one aspect of the check.
Hint 2: Think of each sub-expression as answering a yes/no question: "Is the server overloaded?", "Are requests backed up?", "Is the error rate high?" Name variables accordingly.
The function below uses five levels of nesting — the "arrow anti-pattern." Refactor it using guard clauses (early returns) so the happy path is flat.
def generate_shipment_id(order, warehouse, carrier):
"""Generate a shipment ID string or return None if inputs are invalid."""
if order is None:
return None
if not order.get("confirmed"):
return None
if warehouse is None:
return None
if not warehouse.get("has_stock"):
return None
if carrier is None:
return None
state = warehouse["state"]
order_num = order["number"]
return f"{state}-{order_num}"
# Tests
order_ok = {"confirmed": True, "number": "89423"}
warehouse_ok = {"has_stock": True, "state": "TX"}
carrier_ok = {"name": "FedEx"}
print(f"Result: {generate_shipment_id(order_ok, warehouse_ok, carrier_ok)}")
print(f"Result: {generate_shipment_id(None, warehouse_ok, carrier_ok)}")
print(f"Result: {generate_shipment_id(order_ok, warehouse_ok, None)}")
print(f"Result: {generate_shipment_id({'confirmed': False, 'number': '1'}, warehouse_ok, carrier_ok)}")Solution
# BEFORE — arrow anti-pattern (deeply nested)
def generate_shipment_id_bad(order, warehouse, carrier):
if order is not None:
if order.get("confirmed"):
if warehouse is not None:
if warehouse.get("has_stock"):
if carrier is not None:
state = warehouse["state"]
order_num = order["number"]
return f"{state}-{order_num}"
else:
return None
else:
return None
else:
return None
else:
return None
else:
return None
# AFTER — guard clauses, flat happy path
def generate_shipment_id(order, warehouse, carrier):
"""Generate a shipment ID string or return None if inputs are invalid."""
if order is None:
return None
if not order.get("confirmed"):
return None
if warehouse is None:
return None
if not warehouse.get("has_stock"):
return None
if carrier is None:
return None
state = warehouse["state"]
order_num = order["number"]
return f"{state}-{order_num}"
order_ok = {"confirmed": True, "number": "89423"}
warehouse_ok = {"has_stock": True, "state": "TX"}
carrier_ok = {"name": "FedEx"}
print(f"Result: {generate_shipment_id(order_ok, warehouse_ok, carrier_ok)}")
print(f"Result: {generate_shipment_id(None, warehouse_ok, carrier_ok)}")
print(f"Result: {generate_shipment_id(order_ok, warehouse_ok, None)}")
print(f"Result: {generate_shipment_id({'confirmed': False, 'number': '1'}, warehouse_ok, carrier_ok)}")
The pattern — invert and return early:
BEFORE (nested):
if condition_A:
if condition_B:
do_work() # buried under 2 levels
AFTER (guard):
if not condition_A: return None
if not condition_B: return None
do_work() # at top level, no nesting
Rule: Failure cases belong at the top. The happy path belongs at the bottom, flat. A reviewer scanning the function sees all the ways it can fail before seeing what it does on success.
Expected Output
Result: TX-89423\nResult: None\nResult: None\nResult: NoneHints
Hint 1: Every nested `if condition: ... else: return None` can be inverted to a guard clause: `if not condition: return None`. Apply this inversion to each level of nesting, working from the outermost inward.
Hint 2: After applying guard clauses, the happy path (the actual work) should sit at the lowest indentation level with no nesting around it.
Medium
The function below contains four distinct responsibilities. Extract each into a named helper, move the inline constants to module level, and keep process_signup as a clean orchestrator.
import hashlib
VALID_INVITE_CODES = {"BETA2024", "EARLYBIRD", "FRIENDS"}
MIN_PASSWORD_LENGTH = 8
MIN_USERNAME_LENGTH = 3
def validate_signup_inputs(email, password, username):
"""Raise ValueError if any signup field fails validation."""
if not email or "@" not in email:
raise ValueError("Invalid email")
if len(password) < MIN_PASSWORD_LENGTH:
raise ValueError("Password too short")
if not username or len(username) < MIN_USERNAME_LENGTH:
raise ValueError("Username too short")
def validate_invite_code(invite_code):
"""Raise ValueError if the invite code is not recognized."""
if invite_code not in VALID_INVITE_CODES:
raise ValueError("Invalid invite code")
def hash_password(password):
"""Return a SHA-256 hex digest of the password."""
return hashlib.sha256(password.encode()).hexdigest()
def build_user_record(email, username, password_hash, invite_code):
"""Assemble the initial user record dict."""
return {
"email": email.lower().strip(),
"username": username.lower(),
"password_hash": password_hash,
"invite_code": invite_code,
"status": "pending_verification",
}
def process_signup(email, password, username, invite_code):
"""Validate and build a new user record from signup data."""
validate_signup_inputs(email, password, username)
validate_invite_code(invite_code)
password_hash = hash_password(password)
return build_user_record(email, username, password_hash, invite_code)
result = process_signup("[email protected]", "123456", "alice", "BETA2024")
print(result)Solution
# BEFORE — four responsibilities in one function
import hashlib
def process_signup_bad(email, password, username, invite_code):
# Validate inputs
if not email or "@" not in email:
raise ValueError("Invalid email")
if len(password) < 8:
raise ValueError("Password too short")
if not username or len(username) < 3:
raise ValueError("Username too short")
# Check invite code
valid_codes = {"BETA2024", "EARLYBIRD", "FRIENDS"}
if invite_code not in valid_codes:
raise ValueError("Invalid invite code")
# Hash password
hashed = hashlib.sha256(password.encode()).hexdigest()
# Build user record
return {
"email": email.lower().strip(),
"username": username.lower(),
"password_hash": hashed,
"invite_code": invite_code,
"status": "pending_verification",
}
# AFTER — extracted and named
VALID_INVITE_CODES = {"BETA2024", "EARLYBIRD", "FRIENDS"}
MIN_PASSWORD_LENGTH = 8
MIN_USERNAME_LENGTH = 3
def validate_signup_inputs(email, password, username):
if not email or "@" not in email:
raise ValueError("Invalid email")
if len(password) < MIN_PASSWORD_LENGTH:
raise ValueError("Password too short")
if not username or len(username) < MIN_USERNAME_LENGTH:
raise ValueError("Username too short")
def validate_invite_code(invite_code):
if invite_code not in VALID_INVITE_CODES:
raise ValueError("Invalid invite code")
def hash_password(password):
return hashlib.sha256(password.encode()).hexdigest()
def build_user_record(email, username, password_hash, invite_code):
return {
"email": email.lower().strip(),
"username": username.lower(),
"password_hash": password_hash,
"invite_code": invite_code,
"status": "pending_verification",
}
def process_signup(email, password, username, invite_code):
validate_signup_inputs(email, password, username)
validate_invite_code(invite_code)
password_hash = hash_password(password)
return build_user_record(email, username, password_hash, invite_code)
print(result)
Testability gains:
BEFORE:
One test must exercise all four paths via process_signup()
To test invite validation you must also pass valid email + password
AFTER:
test_validate_signup_inputs() — only needs email/password/username
test_validate_invite_code() — only needs invite string
test_hash_password() — only needs a string
test_build_user_record() — only needs clean inputs
Each helper has a single reason to change and a single reason to fail.
Expected Output
{'email': '[email protected]', 'username': 'alice', 'password_hash': '8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92', 'invite_code': 'BETA2024', 'status': 'pending_verification'}Hints
Hint 1: There are four comment-labeled blocks: validate inputs, check invite code, hash password, build user record. Each becomes one function. The orchestrator calls all four in sequence.
Hint 2: Move the constants (VALID_INVITE_CODES, MIN_PASSWORD_LENGTH, MIN_USERNAME_LENGTH) to module level. Named constants prevent the same magic number appearing in multiple places.
Refactor get_discount_rate by (1) replacing all magic numbers with named constants, (2) applying guard clauses to collapse the nesting, and (3) extracting intermediate boolean variables.
MIN_DISCOUNT_THRESHOLD = 100
LARGE_ORDER_THRESHOLD = 500
LOYALTY_YEARS_REQUIRED = 3
LOYAL_LARGE_DISCOUNT = 0.20
LOYAL_STANDARD_DISCOUNT = 0.15
STANDARD_LARGE_DISCOUNT = 0.10
STANDARD_DISCOUNT = 0.05
NO_DISCOUNT = 0.0
def get_discount_rate(user, cart_total):
"""Return discount rate based on user loyalty and cart size."""
if user is None:
return NO_DISCOUNT
if not user.get("is_active"):
return NO_DISCOUNT
if cart_total < MIN_DISCOUNT_THRESHOLD:
return NO_DISCOUNT
is_loyal = user.get("loyalty_years", 0) >= LOYALTY_YEARS_REQUIRED and not user.get("has_pending_payment")
is_large = cart_total >= LARGE_ORDER_THRESHOLD
if is_loyal:
return LOYAL_LARGE_DISCOUNT if is_large else LOYAL_STANDARD_DISCOUNT
return STANDARD_LARGE_DISCOUNT if is_large else STANDARD_DISCOUNT
# Tests
loyal_user = {"is_active": True, "loyalty_years": 5, "has_pending_payment": False}
standard_user = {"is_active": True, "loyalty_years": 1, "has_pending_payment": False}
inactive_user = {"is_active": False, "loyalty_years": 5, "has_pending_payment": False}
print(f"Rate: {get_discount_rate(loyal_user, 600)}")
print(f"Rate: {get_discount_rate(loyal_user, 200)}")
print(f"Rate: {get_discount_rate(standard_user, 600)}")
print(f"Rate: {get_discount_rate(standard_user, 200)}")
print(f"Rate: {get_discount_rate(inactive_user, 600)}")
print(f"Rate: {get_discount_rate(None, 600)}")Solution
# BEFORE — nested arrow pattern with magic numbers
def get_discount_rate_bad(user, cart_total):
if user is not None:
if user.get("is_active"):
if cart_total >= 100:
if user.get("loyalty_years", 0) >= 3 and not user.get("has_pending_payment"):
if cart_total >= 500:
return 0.20
else:
return 0.15
else:
if cart_total >= 500:
return 0.10
else:
return 0.05
else:
return 0.0
else:
return 0.0
else:
return 0.0
# AFTER — named constants + guard clauses + extracted variables
MIN_DISCOUNT_THRESHOLD = 100
LARGE_ORDER_THRESHOLD = 500
LOYALTY_YEARS_REQUIRED = 3
LOYAL_LARGE_DISCOUNT = 0.20
LOYAL_STANDARD_DISCOUNT = 0.15
STANDARD_LARGE_DISCOUNT = 0.10
STANDARD_DISCOUNT = 0.05
NO_DISCOUNT = 0.0
def get_discount_rate(user, cart_total):
if user is None:
return NO_DISCOUNT
if not user.get("is_active"):
return NO_DISCOUNT
if cart_total < MIN_DISCOUNT_THRESHOLD:
return NO_DISCOUNT
is_loyal = user.get("loyalty_years", 0) >= LOYALTY_YEARS_REQUIRED and not user.get("has_pending_payment")
is_large = cart_total >= LARGE_ORDER_THRESHOLD
if is_loyal:
return LOYAL_LARGE_DISCOUNT if is_large else LOYAL_STANDARD_DISCOUNT
return STANDARD_LARGE_DISCOUNT if is_large else STANDARD_DISCOUNT
loyal_user = {"is_active": True, "loyalty_years": 5, "has_pending_payment": False}
standard_user = {"is_active": True, "loyalty_years": 1, "has_pending_payment": False}
inactive_user = {"is_active": False, "loyalty_years": 5, "has_pending_payment": False}
print(f"Rate: {get_discount_rate(loyal_user, 600)}")
print(f"Rate: {get_discount_rate(loyal_user, 200)}")
print(f"Rate: {get_discount_rate(standard_user, 600)}")
print(f"Rate: {get_discount_rate(standard_user, 200)}")
print(f"Rate: {get_discount_rate(inactive_user, 600)}")
print(f"Rate: {get_discount_rate(None, 600)}")
Three techniques applied together:
1. Named constants — 0.20 becomes LOYAL_LARGE_DISCOUNT
2. Guard clauses — 5 nested if/else collapses to 3 early returns
3. Extract variable — complex boolean becomes is_loyal / is_large
The refactored body is 8 lines vs 18 lines of nesting. Every line now has an obvious purpose.
Expected Output
Rate: 0.2\nRate: 0.15\nRate: 0.1\nRate: 0.05\nRate: 0.0\nRate: 0.0Hints
Hint 1: Apply guard clauses first to flatten the nesting, then name every numeric literal. The function has five distinct exit conditions — each becomes a guard at the top of the refactored function.
Hint 2: After guards, the remaining logic is a simple 2-by-2 dispatch based on is_loyal and is_large_order. Extract both boolean sub-expressions into named variables before writing the final return.
The function below contains three forms of dead code: an unreachable branch, a variable assigned but never used, and commented-out logic that will never be restored. Clean all of it up.
def format_currency(amount: float, currency: str = "USD") -> str:
"""Format a monetary amount with currency symbol."""
SYMBOLS = {"USD": "$", "EUR": "EUR ", "GBP": "GBP "}
symbol = SYMBOLS.get(currency, currency + " ")
formatted = f"{amount:,.2f}"
result = f"{currency} {formatted}"
return result
print(f"Formatted: {format_currency(1234.567)}")
print(f"Formatted: {format_currency(99.9, 'EUR')}")
print(f"Formatted: {format_currency(0.5, 'GBP')}")Solution
# BEFORE — three kinds of dead code
def format_currency_bad(amount: float, currency: str = "USD") -> str:
SYMBOLS = {"USD": "$", "EUR": "EUR ", "GBP": "GBP "}
symbol = SYMBOLS.get(currency, currency + " ") # DEAD: symbol never used
formatted = f"{amount:,.2f}"
# DEAD: unreachable — amount is a float, can never be a string
if isinstance(amount, str):
formatted = amount
result = f"{currency} {formatted}"
# DEAD: commented-out legacy code that will never be un-commented
# legacy_format = "%s%.2f" % (symbol, amount)
# return legacy_format
return result
# AFTER — dead code removed
def format_currency(amount: float, currency: str = "USD") -> str:
"""Format a monetary amount with currency symbol."""
formatted = f"{amount:,.2f}"
return f"{currency} {formatted}"
print(f"Formatted: {format_currency(1234.567)}")
print(f"Formatted: {format_currency(99.9, 'EUR')}")
print(f"Formatted: {format_currency(0.5, 'GBP')}")
Three categories of dead code removed:
1. Unused variable: symbol computed, never referenced
-> Delete SYMBOLS dict and the symbol assignment entirely
2. Unreachable branch: amount is typed float; isinstance(amount, str) always False
-> Delete the if block
3. Commented-out code: legacy_format and the return below it
-> Delete both lines; use version control for history
Why commented-out code is worse than deletion:
- It creates confusion: "Is this coming back? Is it broken? Do I need to understand it?"
- It makes the file longer, increasing cognitive load on every reader.
- Git history preserves deleted code — you can always recover it. Comments cannot be
git blamed.
Expected Output
Formatted: USD 1,234.57\nFormatted: EUR 99.90\nFormatted: GBP 0.50Hints
Hint 1: Dead code includes: branches that can never execute, variables that are assigned but never read, and functions that are defined but never called. Identify each dead section and remove it entirely.
Hint 2: Be precise: if a variable is used in one branch but not another, it is not dead code — only the specific unused assignment or branch is dead.
Each function below has a single-use intermediate variable that adds no clarity. Inline the expression directly into the return statement. Also fix the one variable that should stay extracted.
def build_connection_string(host, port, user, password, db_name):
"""Build a PostgreSQL DSN string."""
return f"postgresql://{user}:{password}@{host}:{port}/{db_name}"
def build_api_endpoint(base_url, version, resource, page, limit):
"""Build a paginated API endpoint URL."""
versioned_base = f"{base_url}/v{version}"
query = f"page={page}&limit={limit}"
endpoint = f"{versioned_base}/{resource}?{query}"
return endpoint
print(f"Connection string: {build_connection_string('db.prod.internal', 5432, 'admin', 'secret', 'appdb')}")
print(f"Endpoint: {build_api_endpoint('https://api.example.com', 2, 'users', 1, 50)}")Solution
# BEFORE — unnecessary single-use variables
def build_connection_string_bad(host, port, user, password, db_name):
dsn = f"postgresql://{user}:{password}@{host}:{port}/{db_name}"
return dsn # dsn used exactly once, immediately — no value added
def build_api_endpoint_bad(base_url, version, resource, page, limit):
versioned_base = f"{base_url}/v{version}" # multi-use: appears in endpoint
query = f"page={page}&limit={limit}" # multi-use: appears in endpoint
endpoint = f"{versioned_base}/{resource}?{query}" # single-use: could inline
return endpoint
# AFTER
# Inline single-use intermediates; keep variables that appear more than once
# or whose name adds meaning not present in the expression
def build_connection_string(host, port, user, password, db_name):
"""Build a PostgreSQL DSN string."""
return f"postgresql://{user}:{password}@{host}:{port}/{db_name}"
def build_api_endpoint(base_url, version, resource, page, limit):
"""Build a paginated API endpoint URL."""
versioned_base = f"{base_url}/v{version}"
query = f"page={page}&limit={limit}"
endpoint = f"{versioned_base}/{resource}?{query}"
return endpoint
print(f"Connection string: {build_connection_string('db.prod.internal', 5432, 'admin', 'secret', 'appdb')}")
print(f"Endpoint: {build_api_endpoint('https://api.example.com', 2, 'users', 1, 50)}")
The inlining decision rule:
Keep the variable if it:
- Is used more than once
- Has a name that explains a non-obvious concept
- Makes a complex expression readable
Inline the variable if it:
- Is used exactly once, immediately after assignment
- Has a name that just mirrors the expression ("dsn = f'...'")
- Its removal makes the code shorter without losing clarity
Note: versioned_base and query in build_api_endpoint are each referenced once but kept — they break a three-part f-string into understandable chunks. The judgment call is whether the name adds meaning. versioned_base does; dsn does not (it just repeats "the connection string").
Expected Output
Connection string: postgresql://admin:[email protected]:5432/appdb\nEndpoint: https://api.example.com/v2/users?page=1&limit=50Hints
Hint 1: Inline a variable when (a) it is used exactly once, (b) immediately after assignment, and (c) the expression being assigned is already self-explanatory. The variable adds no clarity — it only adds a line.
Hint 2: Contrast: `result = expensive_lookup(); return result` — inline it. But `threshold = config.get("timeout") * 1.5` assigned to `timeout_with_buffer` used three lines later — keep it (the name adds meaning).
Hard
Refactor calculate_shipping by replacing the if/elif chain with a dispatch table (a dictionary mapping shipping types to callables). Adding a new type should require only one line of code.
from typing import Callable
ShippingFn = Callable[[float, float], float]
SHIPPING_CALCULATORS: dict[str, ShippingFn] = {
"standard": lambda w, d: w * 0.5 + d * 0.01,
"express": lambda w, d: w * 1.2 + d * 0.05 + 5.00,
"overnight": lambda w, d: w * 2.0 + d * 0.08 + 15.00,
"freight": lambda w, d: w * 0.3 + d * 0.02,
}
def calculate_shipping(order_type: str, weight: float, distance: float) -> float:
"""Calculate shipping cost using a dispatch table."""
calculator = SHIPPING_CALCULATORS.get(order_type)
if calculator is None:
raise ValueError(f"Unknown order type: {order_type}")
return calculator(weight, distance)
for otype in ["standard", "express", "overnight", "freight"]:
cost = calculate_shipping(otype, 10.0, 200.0)
print(f"{otype}: {cost}")Solution
# BEFORE — if/elif chain, grows with every new type
def calculate_shipping_bad(order_type: str, weight: float, distance: float) -> float:
if order_type == "standard":
return weight * 0.5 + distance * 0.01
elif order_type == "express":
return weight * 1.2 + distance * 0.05 + 5.00
elif order_type == "overnight":
return weight * 2.0 + distance * 0.08 + 15.00
elif order_type == "freight":
return weight * 0.3 + distance * 0.02
else:
raise ValueError(f"Unknown order type: {order_type}")
# AFTER — dispatch table
from typing import Callable
ShippingFn = Callable[[float, float], float]
SHIPPING_CALCULATORS: dict[str, ShippingFn] = {
"standard": lambda w, d: w * 0.5 + d * 0.01,
"express": lambda w, d: w * 1.2 + d * 0.05 + 5.00,
"overnight": lambda w, d: w * 2.0 + d * 0.08 + 15.00,
"freight": lambda w, d: w * 0.3 + d * 0.02,
}
def calculate_shipping(order_type: str, weight: float, distance: float) -> float:
"""Calculate shipping cost using a dispatch table."""
calculator = SHIPPING_CALCULATORS.get(order_type)
if calculator is None:
raise ValueError(f"Unknown order type: {order_type}")
return calculator(weight, distance)
for otype in ["standard", "express", "overnight", "freight"]:
cost = calculate_shipping(otype, 10.0, 200.0)
print(f"{otype}: {cost}")
Open/closed comparison:
BEFORE (if/elif):
Adding "same_day" requires editing calculate_shipping()
Every edit risks breaking existing branches
Test coverage must grow for every new branch
AFTER (dispatch table):
Adding "same_day" = one dict entry:
"same_day": lambda w, d: w * 3.0 + d * 0.12 + 25.00
calculate_shipping() itself never changes
Existing tests remain unaffected
When to prefer classes over lambdas: When the per-type logic exceeds what fits comfortably in a single expression, extract it to a named function or subclass. The dictionary still maps the key to the callable — the callable just happens to be a function rather than a lambda.
Expected Output
standard: 14.5\nexpress: 34.6\novernight: 51.0\nfreight: 11.1Hints
Hint 1: A dictionary mapping the `order_type` string to a callable eliminates the if/elif chain entirely. The callable takes `(weight, distance)` and returns the cost. Lambda functions work for simple formulas.
Hint 2: The function becomes: look up the calculator, raise ValueError if absent, call it. Adding a new shipping type means adding one dict entry — no modification of branching logic.
The two functions below share a long list of parameters that always travel together. Introduce a UserProfile dataclass, move the shared fields into it, and add a full_name property. Update both function signatures.
from dataclasses import dataclass
@dataclass
class UserProfile:
first_name: str
last_name: str
email: str
country: str
timezone: str
newsletter_opt_in: bool = False
referral_code: str | None = None
@property
def full_name(self) -> str:
return f"{self.first_name} {self.last_name}"
def create_user(profile: UserProfile, region: str) -> dict:
"""Create a new user record."""
return {
"email": profile.email,
"name": profile.full_name,
"country": profile.country,
"timezone": profile.timezone,
"region": region,
"newsletter": profile.newsletter_opt_in,
}
def update_user_timezone(user_id: int, profile: UserProfile) -> dict:
"""Update the timezone for an existing user."""
return {"user_id": user_id, "email": profile.email, "timezone": profile.timezone}
# Test
profile = UserProfile(
first_name="Alice",
last_name="Smith",
email="[email protected]",
country="US",
timezone="America/New_York",
)
created = create_user(profile, "us-east-1")
print(f"Created: {created['email']} ({created['name']}) in {created['region']}")
profile.timezone = "Europe/London"
updated = update_user_timezone(42, profile)
print(f"Updated: {updated['email']} timezone={updated['timezone']}")
print(f"Full name: {profile.full_name}")Solution
# BEFORE — long parameter lists duplicated across functions
def create_user_bad(
first_name, last_name, email, country, timezone,
newsletter_opt_in=False, referral_code=None, region="us-east-1"
):
return {
"email": email,
"name": f"{first_name} {last_name}",
"country": country,
"timezone": timezone,
"region": region,
"newsletter": newsletter_opt_in,
}
def update_user_timezone_bad(user_id, first_name, last_name, email, country, timezone):
return {"user_id": user_id, "email": email, "timezone": timezone}
# AFTER — parameter object groups shared fields
from dataclasses import dataclass
@dataclass
class UserProfile:
first_name: str
last_name: str
email: str
country: str
timezone: str
newsletter_opt_in: bool = False
referral_code: str | None = None
@property
def full_name(self) -> str:
return f"{self.first_name} {self.last_name}"
def create_user(profile: UserProfile, region: str) -> dict:
return {
"email": profile.email,
"name": profile.full_name,
"country": profile.country,
"timezone": profile.timezone,
"region": region,
"newsletter": profile.newsletter_opt_in,
}
def update_user_timezone(user_id: int, profile: UserProfile) -> dict:
return {"user_id": user_id, "email": profile.email, "timezone": profile.timezone}
profile = UserProfile(
first_name="Alice",
last_name="Smith",
country="US",
timezone="America/New_York",
)
created = create_user(profile, "us-east-1")
print(f"Created: {created['email']} ({created['name']}) in {created['region']}")
profile.timezone = "Europe/London"
updated = update_user_timezone(42, profile)
print(f"Updated: {updated['email']} timezone={updated['timezone']}")
print(f"Full name: {profile.full_name}")
Two compounding benefits:
1. Call-site noise reduction:
BEFORE: create_user("Alice", "Smith", "[email protected]", "US", "America/New_York", False, None, "us-east-1")
AFTER: create_user(profile, "us-east-1")
2. Derived properties have a home:
BEFORE: every function computes f"{first_name} {last_name}"
AFTER: profile.full_name — defined once, used everywhere
Adding a new field later:
Before: change every function signature that uses the user concept.
After: add one field to UserProfile — all functions that accept UserProfile automatically have access.
Expected Output
Created: [email protected] (Alice Smith) in us-east-1\nUpdated: [email protected] timezone=Europe/London\nFull name: Alice SmithHints
Hint 1: Identify the group of parameters that appear together in multiple functions: first_name, last_name, email, country, timezone. These belong in a dataclass called UserProfile.
Hint 2: Derived properties like full_name belong on the dataclass — not on the calling code. After introducing the object, check whether any logic in the old functions can move to properties on the class.
Apply all five refactoring techniques to send_notification: introduce a parameter object, replace the growing if/elif chain with a dispatch table, extract per-channel helper functions, replace magic numbers, and apply a guard clause for the unknown-type error path.
from dataclasses import dataclass
from enum import Enum
class Channel(str, Enum):
EMAIL = "email"
SMS = "sms"
PUSH = "push"
IN_APP = "in_app"
class Priority(str, Enum):
HIGH = "high"
NORMAL = "normal"
SMS_MAX_LENGTH = 160
@dataclass
class NotificationRequest:
user_id: int
channel: Channel
subject: str
body: str
email: str | None = None
phone: str | None = None
push_token: str | None = None
priority: Priority = Priority.NORMAL
def _send_email(req: NotificationRequest) -> None:
label = "high priority" if req.priority == Priority.HIGH else "normal priority"
print(f"email sent to {req.email} ({label})")
def _send_sms(req: NotificationRequest) -> None:
body = req.body if len(req.body) <= SMS_MAX_LENGTH else req.body[:157] + "..."
truncated = len(req.body) > SMS_MAX_LENGTH
print(f"sms sent to {req.phone} (truncated: {truncated})")
def _send_push(req: NotificationRequest) -> None:
print(f"push sent to {req.push_token} (priority: {req.priority.value})")
def _send_in_app(req: NotificationRequest) -> None:
print(f"in_app stored for user {req.user_id}")
CHANNEL_HANDLERS = {
Channel.EMAIL: _send_email,
Channel.SMS: _send_sms,
Channel.PUSH: _send_push,
Channel.IN_APP: _send_in_app,
}
def send_notification(req: NotificationRequest) -> None:
"""Dispatch a notification to the appropriate channel handler."""
handler = CHANNEL_HANDLERS.get(req.channel)
if handler is None:
raise ValueError(f"Unknown channel: {req.channel}")
handler(req)
# Tests
send_notification(NotificationRequest(1, Channel.EMAIL, "Hi", "Body", email="[email protected]", priority=Priority.HIGH))
send_notification(NotificationRequest(2, Channel.SMS, "Hi", "Short body", phone="+15551234"))
send_notification(NotificationRequest(3, Channel.PUSH, "Hi", "Body", push_token="tok_abc123"))
send_notification(NotificationRequest(7, Channel.IN_APP, "Hi", "Body"))
try:
send_notification(NotificationRequest(5, "fax", "Hi", "Body"))
except ValueError as e:
print(f"ValueError: {e}")Solution
# BEFORE — monolithic function with 10 parameters and growing if/elif
def send_notification_bad(user_id, notif_type, subject, body, email, phone,
push_token, priority, retry_count, schedule_at):
if notif_type == "email":
if priority == "high":
print(f"email sent to {email} (high priority)")
else:
print(f"email sent to {email} (normal priority)")
elif notif_type == "sms":
if len(body) > 160:
body = body[:157] + "..."
print(f"sms sent to {phone}")
elif notif_type == "push":
print(f"push sent to {push_token}")
elif notif_type == "in_app":
print(f"in_app stored for user {user_id}")
else:
raise ValueError(f"Unknown notification type: {notif_type}")
# AFTER — all five techniques applied
from dataclasses import dataclass
from enum import Enum
class Channel(str, Enum):
EMAIL = "email"
SMS = "sms"
PUSH = "push"
IN_APP = "in_app"
class Priority(str, Enum):
HIGH = "high"
NORMAL = "normal"
SMS_MAX_LENGTH = 160 # characters — carrier limit
@dataclass
class NotificationRequest:
user_id: int
channel: Channel
subject: str
body: str
email: str | None = None
phone: str | None = None
push_token: str | None = None
priority: Priority = Priority.NORMAL
def _send_email(req: NotificationRequest) -> None:
label = "high priority" if req.priority == Priority.HIGH else "normal priority"
print(f"email sent to {req.email} ({label})")
def _send_sms(req: NotificationRequest) -> None:
body = req.body if len(req.body) <= SMS_MAX_LENGTH else req.body[:157] + "..."
truncated = len(req.body) > SMS_MAX_LENGTH
print(f"sms sent to {req.phone} (truncated: {truncated})")
def _send_push(req: NotificationRequest) -> None:
print(f"push sent to {req.push_token} (priority: {req.priority.value})")
def _send_in_app(req: NotificationRequest) -> None:
print(f"in_app stored for user {req.user_id}")
CHANNEL_HANDLERS = {
Channel.EMAIL: _send_email,
Channel.SMS: _send_sms,
Channel.PUSH: _send_push,
Channel.IN_APP: _send_in_app,
}
def send_notification(req: NotificationRequest) -> None:
"""Dispatch a notification to the appropriate channel handler."""
handler = CHANNEL_HANDLERS.get(req.channel)
if handler is None:
raise ValueError(f"Unknown channel: {req.channel}")
handler(req)
send_notification(NotificationRequest(1, Channel.EMAIL, "Hi", "Body", email="[email protected]", priority=Priority.HIGH))
send_notification(NotificationRequest(2, Channel.SMS, "Hi", "Short body", phone="+15551234"))
send_notification(NotificationRequest(3, Channel.PUSH, "Hi", "Body", push_token="tok_abc123"))
send_notification(NotificationRequest(7, Channel.IN_APP, "Hi", "Body"))
try:
send_notification(NotificationRequest(5, "fax", "Hi", "Body"))
except ValueError as e:
print(f"ValueError: {e}")
Techniques applied and their payoffs:
1. Parameter object (NotificationRequest):
- 10 positional args -> 1 typed dataclass
- Call sites are self-documenting: email="alice@..."
2. Enum (Channel, Priority):
- "email" -> Channel.EMAIL (typo-proof, IDE-completable)
- Type checker catches Channel.FAX before runtime
3. Extract function (_send_email, _send_sms, ...):
- Each channel's logic lives in its own function
- Independently testable — test SMS truncation in isolation
4. Dispatch table (CHANNEL_HANDLERS):
- Adding "slack" = one dict entry + one private function
- send_notification() never changes
5. Guard clause (if handler is None: raise):
- Unknown channel detected at entry, not buried in elif
- Error message names the unknown value explicitly
Adding a new notification channel now requires zero changes to send_notification. That is the goal of every refactoring in this set.
Expected Output
email sent to [email protected] (high priority)\nsms sent to +15551234 (truncated: False)\npush sent to tok_abc123 (priority: normal)\nin_app stored for user 7\nValueError: Unknown channel: faxHints
Hint 1: Work through the techniques in order: (1) introduce a NotificationRequest dataclass, (2) replace string literals with an Enum, (3) extract per-channel handlers as private functions, (4) build the dispatch table, (5) apply guard clause for unknown channel.
Hint 2: The SMS truncation logic (160-character limit) is a named constant waiting to be extracted. The per-channel helpers each receive the full NotificationRequest object — they pick out only what they need.
