Debugging Strategies - Systematic Approaches to Finding Bugs
Reading time: ~20 minutes | Level: Foundation → Engineering
Here is a program with a bug. Most developers spend 20 minutes guessing. Expert debuggers find it in 90 seconds. The difference is method, not luck:
def calculate_average(numbers):
    total = 0
    for n in numbers:
        total = total + n
    return total / len(numbers)

data = [10, 20, 30, 40, 50]
result = calculate_average(data)
print(f"Average: {result}")
# Keep only values above 100 to simulate a filtered dataset (nothing matches)
filtered = [x for x in data if x > 100]
average = calculate_average(filtered)
print(f"Filtered average: {average}")
Average: 30.0
Traceback (most recent call last):
File "calc.py", line 12, in <module>
average = calculate_average(filtered)
File "calc.py", line 5, in calculate_average
return total / len(numbers)
ZeroDivisionError: division by zero
An expert reads the traceback in five seconds and knows exactly what happened: filtered is an empty list (x > 100 matches nothing), so len(numbers) is 0. The fix is one line. The expert did not guess - they read the evidence.
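The fix can take several forms. A minimal sketch, assuming the caller should be told explicitly that an empty list has no average (raising ValueError is a design choice made here for illustration, not something the original program specifies):

```python
def calculate_average(numbers):
    # Guard clause: an empty input has no meaningful average.
    if not numbers:
        raise ValueError("calculate_average() requires at least one number")
    total = 0
    for n in numbers:
        total = total + n
    return total / len(numbers)


data = [10, 20, 30, 40, 50]
print(calculate_average(data))  # 30.0

filtered = [x for x in data if x > 100]  # empty: nothing exceeds 100
try:
    calculate_average(filtered)
except ValueError as exc:
    print(f"Cannot average: {exc}")
```

Returning a default such as 0.0 would also avoid the crash, but it hides the empty input from the caller.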
This lesson teaches you to read evidence the way experts do.
What You Will Learn
- The scientific method of debugging and why random code changes make bugs worse
- Reading Python tracebacks from bottom to top - the correct direction
- pdb and breakpoint(): the Python debugger and its essential commands
- Post-mortem debugging with pdb.pm() after an unhandled exception
- Print debugging with repr() and pprint when it is the right tool
- The traceback module: capturing and formatting exception information programmatically
- sys.exc_info(): accessing exception type, value, and traceback in code
- warnings.warn(): soft errors and deprecation notices
- Performance debugging: cProfile, line_profiler, memory_profiler
- Real-world strategies: bisecting bugs, minimal reproduction, checking git diff
Prerequisites
- Python 3.8+ installed
- Understanding of Python exceptions, try/except, and tracebacks
- Basic comfort with running Python scripts from the terminal
Part 1 - The Scientific Method of Debugging
Debugging is not guessing. It is the scientific method applied to code.
Key rule: change one thing at a time. Random multi-change commits destroy reproducibility.
What Not to Do
# BAD debugging workflow - every professional has done this, it always wastes time
# 1. See an error
# 2. Make 3 random changes based on vague intuitions
# 3. Run the code - different error
# 4. Revert all changes because you don't know which one "fixed" what
# 5. Make more random changes
# 6. Somehow it works but you don't know why
# 7. Commit it and hope no one notices
The scientific method prevents this. Form one hypothesis. Test it. Conclude. Repeat.
Part 2 - Reading Tracebacks
A traceback is a complete description of what happened and where. It is not noise - it is evidence.
The Anatomy of a Traceback
# buggy.py
def parse_config(text):
    lines = text.split("\n")
    return {line.split("=")[0]: line.split("=")[1] for line in lines}

def load_settings(filename):
    with open(filename) as f:
        content = f.read()
    return parse_config(content)

def start_app():
    settings = load_settings("config.txt")
    port = int(settings["port"])
    print(f"Starting on port {port}")

start_app()
Traceback (most recent call last):
File "buggy.py", line 14, in <module>
start_app()
File "buggy.py", line 11, in start_app
settings = load_settings("config.txt")
File "buggy.py", line 7, in load_settings
return parse_config(content)
File "buggy.py", line 2, in parse_config
lines = text.split("\n")
AttributeError: 'NoneType' object has no attribute 'split'
How to Read It - Bottom to Top
| Reading Order | Traceback Line | Question It Answers |
|---|---|---|
| 1 (start here) | AttributeError: 'NoneType'... | What happened? |
| 2 | File "buggy.py", line 2, in parse_config | Where exactly? |
| 3 | lines = text.split("\n") | What code failed? |
| 4 | File "buggy.py", line 7, in load_settings | Who called it? |
| 5 | return parse_config(content) | With what argument? |
| 6 | File "buggy.py", line 11, in start_app | And before that? |
| 7 (entry point) | File "buggy.py", line 14, in <module> | Where execution started |
Reading this:
- The error is AttributeError: 'NoneType' object has no attribute 'split'
- It happened on line 2 of parse_config, where text.split("\n") was called
- text is None, and it was passed in by load_settings as content
- In the listing, content comes from f.read(), which returns a string for a file opened in text mode ("" for an empty file), never None

The bug: since f.read() cannot produce None here, something else set content to None between the read and the parse. A likely cause is a failure around open() that was silently caught and replaced with None somewhere in the code that actually ran. The traceback tells you exactly where to look: the line in load_settings that hands content to parse_config.
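A hypothetical variant of load_settings shows one way content could end up as None. The try/except fallback below is an assumption made for illustration and is not part of the listing above:

```python
def parse_config(text):
    lines = text.split("\n")
    return {line.split("=")[0]: line.split("=")[1] for line in lines}


def load_settings(filename):
    # Hypothetical: the error is swallowed and None stands in for the content.
    try:
        with open(filename) as f:
            content = f.read()
    except OSError:
        content = None  # the real failure (missing file) is hidden here
    return parse_config(content)


try:
    load_settings("definitely-missing-config.txt")
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```

The exception surfaces in parse_config, one call away from the place where the real problem was hidden, which is why reading the whole call chain matters.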
Your Code vs Library Code
Traceback (most recent call last):
File "app.py", line 45, in handle_request ← YOUR CODE
result = client.post(url, json=payload)
File "/usr/lib/python3.11/site-packages/httpx/_client.py", line 923 ← LIBRARY
return self._send_single_request(request)
File "/usr/lib/python3.11/site-packages/httpx/_client.py", line 960 ← LIBRARY
response = transport.handle_request(request)
File "/usr/lib/python3.11/site-packages/httpx/_transports/default.py", line 230
raise ConnectError(...)
httpx.ConnectError: [Errno -2] Name or service not known
Work upward from the error until you reach your code - that is where the bug usually is. Library code in /site-packages/ is rarely the source of bugs. Your code passed wrong arguments, wrong types, or is not handling a failure case.
Part 3 - pdb: The Python Debugger
pdb is Python's built-in interactive debugger. Unlike print statements, it lets you pause execution mid-run and inspect any variable, call any function, and step line by line through your code.
Inserting a Breakpoint
# Method 1: The classic way (Python 2 and 3)
import pdb; pdb.set_trace()
# Method 2: Modern way (Python 3.7+) - preferred
breakpoint()
# breakpoint() calls sys.breakpointhook(), which defaults to pdb.set_trace()
# You can override it with the PYTHONBREAKPOINT environment variable
When Python hits breakpoint(), execution pauses and you get an interactive prompt:
> /path/to/script.py(12)calculate_average()
-> return total / len(numbers)
(Pdb)
The (Pdb) prompt means you are inside the debugger. You can now inspect state.
Essential pdb Commands
| Command | What It Does |
|---|---|
| n | next - execute the current line, stay in current frame |
| s | step - step into a function call |
| c | continue - run until the next breakpoint or end |
| q | quit - exit pdb (raises BdbQuit) |
| p expr | print - evaluate and print expr |
| pp expr | pretty-print - like p but formatted for dicts/lists |
| l | list - show 11 lines of source around current line |
| l N | list - show source around line N |
| ll | longlist - show the entire current function |
| w | where - print the call stack (alias: bt) |
| u | up - move up one frame in the call stack |
| d | down - move down one frame in the call stack |
| b N | break - set breakpoint at line N |
| b func | break - set breakpoint at start of function |
| cl N | clear - remove breakpoint N |
| h | help - list all commands |
| h cmd | help cmd - describe the command |
| !stmt | execute a Python statement (! needed if it would be read as a pdb command) |
| r | return - continue until current function returns |
| a | args - print arguments of current function |
A Complete pdb Session
# debug_demo.py
def parse_record(record):
    parts = record.split(",")
    name = parts[0].strip()
    age = int(parts[1].strip())  # Bug: parts[1] might be "twenty" not "20"
    score = float(parts[2].strip())
    return {"name": name, "age": age, "score": score}

records = [
    "Alice, 30, 95.5",
    "Bob, twenty, 87.0",  # Bug: age is not a number
    "Charlie, 25, 91.2",
]

breakpoint()  # Pause here before the loop
results = [parse_record(r) for r in records]

> debug_demo.py(16)<module>()
-> results = [parse_record(r) for r in records]
(Pdb) p records
['Alice, 30, 95.5', 'Bob, twenty, 87.0', 'Charlie, 25, 91.2']
(Pdb) p records[1]
'Bob, twenty, 87.0'
(Pdb) b parse_record, "twenty" in record
(Pdb) c
> debug_demo.py(3)parse_record()
-> parts = record.split(",")
(Pdb) p record
'Bob, twenty, 87.0'
(Pdb) n
(Pdb) p parts
['Bob', ' twenty', ' 87.0']
(Pdb) n
-> age = int(parts[1].strip())
(Pdb) p parts[1].strip()
'twenty'
(Pdb) n
ValueError: invalid literal for int() with base 10: 'twenty'
You have now confirmed the exact line, the exact value, and the exact problem - without guessing.
:::tip Disable Breakpoints Without Deleting Them
Set the PYTHONBREAKPOINT=0 environment variable to make all breakpoint() calls no-ops. This lets you leave breakpoints in code during development without having them fire in CI:
PYTHONBREAKPOINT=0 python script.py
:::
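The same mechanism is available from inside a program: breakpoint() calls whatever sys.breakpointhook currently points to. A minimal sketch, where recording_hook is a made-up stand-in for a real debugger entry point:

```python
import sys

calls = []


def recording_hook(*args, **kwargs):
    # Hypothetical hook: record the call instead of opening a debugger.
    calls.append((args, kwargs))


original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    breakpoint()  # routed to recording_hook, so execution does not pause
finally:
    sys.breakpointhook = original_hook  # restore the previous hook

print(len(calls))  # 1
```

Restoring the original hook in a finally block keeps the override from leaking into code that runs afterwards.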
Part 4 - Post-Mortem Debugging
Post-mortem debugging means inspecting the program state after a crash, without needing to set breakpoints in advance.
pdb.pm() - The Most Powerful Debugging Tool You Are Not Using
# Run your script normally in a REPL or with python -i
# -i means "enter interactive mode after running"
python -i buggy_script.py
If the script crashes:
Traceback (most recent call last):
  File "buggy_script.py", line 12, in <module>
    result = calculate(data)
  File "buggy_script.py", line 8, in calculate
    value = int(raw_input)
ValueError: invalid literal for int() with base 10: 'abc'
>>>
Now in the interactive REPL, run:
>>> import pdb
>>> pdb.pm()
> buggy_script.py(8)calculate()
-> value = int(raw_input)
(Pdb) p raw_input
'abc'
(Pdb) p data
{'key': 'abc', 'other': 42}
pdb.pm() opens the debugger at the exact frame where the unhandled exception occurred, with all variables intact. You can inspect the full call stack, print variables, and understand exactly what state the program was in when it crashed - without re-running or guessing.
Using pdb.post_mortem() Programmatically
import pdb
import sys

def run_with_postmortem():
    try:
        # ... your code that might crash ...
        risky_operation()  # placeholder for your own function
    except Exception:
        # Get the current exception's traceback
        tb = sys.exc_info()[2]
        # Open pdb at the crash site
        pdb.post_mortem(tb)

run_with_postmortem()
This is useful in long-running scripts where you want automatic post-mortem on any unhandled exception.
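To cover every unhandled exception without wrapping each entry point in try/except, one option is sys.excepthook. A minimal sketch, assuming an interactive terminal is attached; make_postmortem_hook and its debugger parameter are names invented for this example:

```python
import pdb
import sys
import traceback


def make_postmortem_hook(debugger=pdb.post_mortem):
    """Build an excepthook that prints the traceback, then opens a debugger."""
    def hook(exc_type, exc_value, exc_tb):
        traceback.print_exception(exc_type, exc_value, exc_tb)
        debugger(exc_tb)
    return hook


# To enable it for a whole script, install the hook once at startup:
# sys.excepthook = make_postmortem_hook()

# Demonstration with a stand-in debugger so nothing blocks on input:
seen = []
hook = make_postmortem_hook(debugger=seen.append)
try:
    int("abc")
except ValueError:
    hook(*sys.exc_info())

print(len(seen))  # 1
```

The debugger is passed in as a parameter so the hook can be exercised in tests or non-interactive runs without opening a pdb prompt.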
Part 5 - Print Debugging: When It Is Valid
Print debugging has a bad reputation, but it is sometimes the right tool.
When Print Debugging Is Valid
- In environments without a debugger (remote server, CI container)
- When the bug is in code you cannot easily stop (threads, async, callbacks)
- When the bug only happens with large data sets that take minutes to reproduce
- When you need a log of ALL values over many iterations, not just one pause
How to Do Print Debugging Well
# BAD print debugging - no context, no types, hard to scan
print(x)
print("here")
print(result)
# GOOD print debugging - use repr() and label everything
print(f"[DEBUG] x={x!r} type={type(x).__name__}")
print(f"[DEBUG] After parse: result={result!r}")
# Use pprint for nested structures
from pprint import pprint
print("[DEBUG] config:")
pprint(config, indent=2)
# Mark entry/exit of functions
def process(data):
    print(f"[ENTER process] data={data!r}")
    result = _process_impl(data)
    print(f"[EXIT process] result={result!r}")
    return result
The !r conversion calls repr() on the value, which makes the type visible:
- "hello" prints as 'hello' (with quotes - you know it is a string)
- None prints as None (you can distinguish it from the string "None", which prints as 'None')
- [1, 2, 3] prints as [1, 2, 3] (you see the brackets - you know it is a list)
:::warning Remove Print Debugging Before Committing
Print debugging is a temporary tool. Before committing code, replace temporary print statements with logger.debug() calls for information worth keeping, or delete them entirely. Print statements in version control are a code smell.
:::
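Converting a useful print statement to logging can look like the sketch below. The logger name "orders", the message format, and the StringIO target are assumptions chosen so the example is self-contained; a real project would usually configure handlers once at startup.

```python
import io
import logging

logger = logging.getLogger("orders")  # hypothetical logger name
logger.setLevel(logging.DEBUG)

stream = io.StringIO()  # stand-in for a log file or stderr
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)


def process(data):
    # %r applies repr() lazily, only if the message is actually emitted
    logger.debug("enter process data=%r", data)
    result = [x * 2 for x in data]
    logger.debug("exit process result=%r", result)
    return result


process([1, 2])
print(stream.getvalue(), end="")
```

Unlike print, these calls can stay in the codebase: raising the logger level to INFO silences them without touching the code.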
Part 6 - The traceback Module
The traceback module gives you programmatic access to exception information - useful when you want to capture, format, or store traceback information without using a debugger.
Formatting Tracebacks as Strings
import traceback

def risky():
    x = {}
    return x["missing_key"]

try:
    risky()
except KeyError:
    # Get the full traceback as a string
    tb_string = traceback.format_exc()
    print("Captured traceback:")
    print(tb_string)
    # Or print it directly to stderr (the default)
    traceback.print_exc()

Captured traceback:
Traceback (most recent call last):
  File "demo.py", line 8, in <module>
    risky()
  File "demo.py", line 5, in risky
    return x["missing_key"]
KeyError: 'missing_key'
Storing Tracebacks for Later Inspection
import traceback

errors = []  # Collect all errors from a batch job

def process_item(item):
    try:
        return parse_and_validate(item)  # placeholder for your own function
    except Exception:
        # Capture traceback NOW, while exception is active
        tb = traceback.format_exc()
        errors.append({"item": item, "error": tb})
        return None

results = [process_item(x) for x in large_dataset]  # large_dataset: your input

# After the batch, report all failures
for error in errors:
    print(f"Failed item: {error['item']}")
    print(error["error"])
traceback.extract_tb() - Structured Traceback Data
import traceback
import sys

try:
    int("not a number")
except ValueError:
    exc_type, exc_value, exc_tb = sys.exc_info()
    # Extract structured frames
    frames = traceback.extract_tb(exc_tb)
    for frame in frames:
        print(f"File: {frame.filename}")
        print(f"Line: {frame.lineno}")
        print(f"Function: {frame.name}")
        print(f"Code: {frame.line}")
        print("---")

File: demo.py
Line: 5
Function: <module>
Code: int("not a number")
---
This structured data is useful for building custom error reporting systems, sending tracebacks to Sentry, or logging tracebacks as structured JSON.
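As a sketch of the structured-JSON use case, the function below turns an exception into a JSON string. The name exception_to_json and the field names (type, message, frames, file, line, function, code) are assumptions for this example, not a standard schema.

```python
import json
import traceback


def exception_to_json(exc):
    """Serialize an exception and its traceback frames to a JSON string."""
    frames = traceback.extract_tb(exc.__traceback__)
    payload = {
        "type": type(exc).__name__,
        "message": str(exc),
        "frames": [
            {"file": f.filename, "line": f.lineno, "function": f.name, "code": f.line}
            for f in frames
        ],
    }
    return json.dumps(payload)


def parse_age(text):
    return int(text)


try:
    parse_age("not a number")
except ValueError as exc:
    report = exception_to_json(exc)

print(report)
```

Because the payload holds only strings and integers, it can be stored or shipped without keeping the traceback object alive.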
Part 7 - sys.exc_info(): Accessing Exception Data Programmatically
sys.exc_info() returns a tuple of (exc_type, exc_value, exc_traceback) for the currently active exception. It returns (None, None, None) when called outside an except block.
import sys
import traceback

def diagnose_exception():
    """Show what sys.exc_info() contains inside an except block."""
    try:
        result = 10 / 0
    except ZeroDivisionError:
        exc_type, exc_value, exc_tb = sys.exc_info()
        print(f"Exception type: {exc_type}")  # <class 'ZeroDivisionError'>
        print(f"Exception value: {exc_value}")  # division by zero
        print(f"Exception class: {exc_type.__name__}")  # ZeroDivisionError
        # Check if it is a specific type
        if issubclass(exc_type, ArithmeticError):
            print("This is an arithmetic error")
        # Get the traceback as formatted lines
        tb_lines = traceback.format_tb(exc_tb)
        print("Traceback:")
        for line in tb_lines:
            print(line, end="")

diagnose_exception()

Exception type: <class 'ZeroDivisionError'>
Exception value: division by zero
Exception class: ZeroDivisionError
This is an arithmetic error
Traceback:
  File "demo.py", line 7, in diagnose_exception
    result = 10 / 0
:::warning Do Not Hold References to Traceback Objects
Traceback objects contain references to the frame objects of the call stack. Holding a reference to exc_tb outside an except block can cause reference cycles that delay garbage collection of everything those frames refer to. Always delete it when done: del exc_tb. Or pass exc_tb to traceback.format_tb() right away to get a list of plain strings, and then discard exc_tb.
:::
Part 8 - warnings.warn(): Soft Errors and Deprecation
Sometimes a condition is not an error - it is a concern. Something is deprecated, an argument combination is unusual, or a feature will change in a future version. For these cases, use warnings.warn().
import warnings

def compute_mean(values, ignore_empty=None):
    """
    Compute the mean of values.
    ignore_empty is deprecated - use handle_empty parameter instead.
    """
    if ignore_empty is not None:
        warnings.warn(
            "ignore_empty is deprecated and will be removed in v3.0. "
            "Use handle_empty='skip' instead.",
            DeprecationWarning,
            stacklevel=2,  # Point to the caller, not to this function
        )
    if not values:
        warnings.warn(
            "compute_mean received an empty sequence - returning 0.0",
            RuntimeWarning,
            stacklevel=2,
        )
        return 0.0
    return sum(values) / len(values)

# Using deprecated argument - produces a DeprecationWarning
result = compute_mean([1, 2, 3], ignore_empty=True)
DeprecationWarning: ignore_empty is deprecated and will be removed in v3.0. Use handle_empty='skip' instead.
result = compute_mean([1, 2, 3], ignore_empty=True)
Warning Categories
| Category | Use When |
|---|---|
| DeprecationWarning | API or feature will be removed in future |
| PendingDeprecationWarning | Will be deprecated in the future |
| RuntimeWarning | Suspicious runtime behavior (not an error yet) |
| UserWarning | General-purpose warning for your users |
| FutureWarning | Behavior will change in the future |
| ResourceWarning | Resource not properly closed (Python 3.2+) |
Controlling Warnings in Tests
import warnings
# Turn warnings into errors in tests - catch regressions early
warnings.filterwarnings("error", category=DeprecationWarning)
# Or with pytest: add to pyproject.toml or conftest.py
# [tool.pytest.ini_options]
# filterwarnings = ["error::DeprecationWarning"]
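For a single test, warnings.catch_warnings(record=True) captures warnings so they can be asserted on instead of printed. A minimal sketch, where old_api is a made-up deprecated function:

```python
import warnings


def old_api():
    # Hypothetical deprecated function used only for this example.
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure nothing is suppressed
    value = old_api()

print(value, len(caught), caught[0].category.__name__)
# 42 1 DeprecationWarning
```

The filter change made inside the with block is undone on exit, so it does not affect other tests.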
Part 9 - Debugging Performance Issues
When the bug is not a crash but "it is too slow" or "it uses too much memory", different tools apply.
cProfile - Finding Time Hotspots
import cProfile
import pstats
import io

def slow_function():
    """Simulate a function with nested calls."""
    result = []
    for i in range(1000):
        result.append(process_item(i))
    return result

def process_item(n):
    # Simulate expensive work
    return sum(range(n))

# Method 1: Profile the whole program
# python -m cProfile -s cumulative script.py

# Method 2: Profile a specific function
profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()

# Print sorted stats
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative")
stats.print_stats(10)  # Top 10 functions
print(stream.getvalue())
3003 function calls in 0.012 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.012 0.012 demo.py:3(slow_function)
1000 0.003 0.000 0.010 0.000 demo.py:9(process_item)
1000 0.007 0.000 0.007 0.000 {built-in method builtins.sum}
1000 0.001 0.000 0.001 0.000 {method 'append' of 'list' objects}
Read: process_item was called 1000 times, contributing 0.010 seconds of cumulative time. The inner sum(range(n)) is the bottleneck.
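Once the hotspot is known, the fix depends on what the code is really computing. In this toy example the simulated work happens to have a closed form, so a sketch of the optimization looks like this (process_item_fast is a name introduced here, and the shortcut is specific to this example, not general advice):

```python
def process_item(n):
    return sum(range(n))  # O(n) work per call


def process_item_fast(n):
    # Closed form for 0 + 1 + ... + (n - 1); valid for n >= 0.
    return n * (n - 1) // 2


# Check that both versions agree on every input the profiled loop uses.
assert all(process_item(n) == process_item_fast(n) for n in range(1000))
print(process_item_fast(1000))  # 499500
```

After a change like this, profile again to confirm the hotspot is gone rather than assuming it.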
line_profiler - Line-by-Line Timing
pip install line_profiler
# Add @profile decorator (provided by line_profiler, not imported)
@profile
def process_batch(records):
    results = []
    for record in records:
        parsed = parse(record)  # How long does this take?
        validated = validate(parsed)  # And this?
        results.append(validated)
    return results
kernprof -l -v script.py
Line # Hits Time Per Hit % Time Line Contents
=================================================
2 1 12.0 12.0 0.2 results = []
3 1000 320.0 0.3 5.4 for record in records:
4 1000 4800.0 4.8 80.9 parsed = parse(record)
5 1000 783.0 0.8 13.2 validated = validate(parsed)
6 1000 17.0 0.0 0.3 results.append(validated)
parse() consumes 80.9% of the time. Fix parse().
memory_profiler - Finding Memory Leaks
pip install memory_profiler
from memory_profiler import profile

@profile
def load_large_dataset():
    data = []
    with open("large_file.csv") as f:
        for line in f:
            data.append(line.split(","))  # Accumulates in memory
    return data
Line # Mem usage Increment Line Contents
================================================
4 50.2 MiB 50.2 MiB data = []
5 50.2 MiB 0.0 MiB with open("large_file.csv") as f:
6 50.2 MiB 0.0 MiB for line in f:
7 312.4 MiB 262.2 MiB data.append(...) # 262 MB used here!
The memory increase happens at the append line - loading the entire file into memory. The fix: use a generator instead of accumulating a list.
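A generator-based version can be sketched as follows. The function name iter_rows and the small temporary file standing in for large_file.csv are assumptions made so the example runs anywhere:

```python
import os
import tempfile


def iter_rows(path):
    """Yield one parsed row at a time instead of building a full list."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n").split(",")


# Small stand-in for large_file.csv.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,1\nb,2\nc,3\n")
    path = tmp.name

row_count = sum(1 for _ in iter_rows(path))  # rows are discarded as we go

rows = iter_rows(path)
first_row = next(rows)
rows.close()  # closes the underlying file before it is deleted
os.remove(path)

print(row_count, first_row)  # 3 ['a', '1']
```

This only helps if the consumer processes rows one at a time; calling list(iter_rows(path)) brings the full memory cost back.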
Part 10 - IDE Debuggers vs pdb
VS Code Debugger - The Visual Alternative
VS Code with the Python extension provides a graphical debugger. It is built on debugpy rather than pdb, but the concepts (breakpoints, stepping, frames) are the same:
// .vscode/launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Current File",
      "type": "python",
      "request": "launch",
      "program": "${file}",
      "console": "integratedTerminal",
      "justMyCode": true // Skip stepping into library code
    }
  ]
}
VS Code advantages over raw pdb:
- Breakpoints set with a click (no code changes required)
- Variables panel shows all local and global variables without p commands
- Call stack panel shows the full frame tree visually
- Watch expressions: monitor an expression across every step
- Conditional breakpoints: break only when user_id == 42
pdb advantages over VS Code:
- Works over SSH on remote servers with no GUI
- Works inside Docker containers
- Works anywhere Python runs - no IDE required
- Scriptable and automatable
- Faster for experienced users who know the keyboard commands
The choice depends on context. Know both.
Part 11 - Real-World Debugging Strategies
Strategy 1: Bisect the Codebase
When you do not know where the bug is, narrow it down by halving:
# You have a pipeline of 10 processing stages and the output is wrong
# Stage 1 → Stage 2 → Stage 3 → ... → Stage 10 → Wrong output
# Bisect: check the output at Stage 5
result_after_5 = run_stages(data, stages=5)
print(f"After stage 5: {result_after_5!r}")
# Is it correct? If yes, bug is in stages 6-10. If no, bug is in stages 1-5.
# Repeat with half the remaining range.
This is O(log n) - 10 stages require at most 4 checks to find the culprit stage.
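The halving procedure can be sketched as a small binary search. It assumes that once a stage corrupts the data, every later output is also wrong, and that you can judge an intermediate output as correct or not. The run_stages signature, find_first_bad_stage, and the toy pipeline are all hypothetical:

```python
def run_stages(data, stages, count):
    """Apply the first `count` stages to data."""
    for stage in stages[:count]:
        data = stage(data)
    return data


def find_first_bad_stage(data, stages, is_correct):
    """Return (1-based index of the first bad stage, number of checks made)."""
    good, bad = 0, len(stages)  # input is trusted, final output is known bad
    checks = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        checks += 1
        if is_correct(run_stages(data, stages, mid), mid):
            good = mid  # bug is in a later stage
        else:
            bad = mid  # bug is at or before this stage
    return bad, checks


# Toy pipeline: every stage should add 1, but stage 7 adds 2.
stages = [lambda x: x + 1] * 10
stages[6] = lambda x: x + 2

culprit, checks = find_first_bad_stage(0, stages, lambda out, n: out == n)
print(culprit, checks)  # 7 3
```

In real pipelines the hard part is is_correct: you need some way to recognise a good intermediate result, such as a known-good snapshot or an invariant.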
Strategy 2: Reproduce with a Minimal Example
A minimal reproduction case:
- Removes all unrelated code
- Shows the bug in 10 lines or fewer
- Uses hard-coded values, not external dependencies
# Original bug report: "The API fails when processing user 4471"
# Your minimal reproduction:
data = {"name": "O'Brien", "age": 42} # Apostrophe in name!
result = build_sql_insert(data)
print(result)
# INSERT INTO users (name, age) VALUES ('O'Brien', 42) <- SQL syntax error
Minimal reproduction cases:
- Make it obvious exactly what triggers the bug
- Are easy to share with teammates
- Often reveal the cause just by the act of creating them ("rubber duck debugging")
Strategy 3: Check the Recent Git Diff
Most bugs were introduced recently. The diff tells you what changed:
# What changed in the last commit?
git diff HEAD~1 HEAD
# What changed in the last 5 commits, just in the relevant module?
git log --oneline -5 -- src/payments/
# Which commits added or removed occurrences of this function name?
git log -p -S "calculate_tax" -- src/
# Find the commit that introduced a bug (binary search)
git bisect start
git bisect bad # Current commit is broken
git bisect good v2.1.0 # This release was fine
# Git checks out commits; you test each one and mark good/bad
# Git finds the exact breaking commit in O(log n) steps
Strategy 4: Rubber Duck Debugging
Explain your code out loud, line by line, to an imaginary audience - a rubber duck, a stuffed animal, a plant, or a patient colleague.
The act of explaining forces you to:
- Articulate every assumption
- Notice where your explanation and the code diverge
- Spot implicit "it must be doing X" beliefs that are actually wrong
This is not a joke. Senior engineers do this constantly. It works because explaining each line forces you to state what the code actually does, rather than what you assume it does.
Interview Questions
Q1: How do you read a Python traceback? What is the most important part?
Answer: Read from the bottom up. The very last line is the most important - it names the exception type and its message (ValueError: invalid literal for int() with base 10: 'abc'). This tells you what went wrong.
The second-to-last block shows exactly where in your code the error occurred - the filename, line number, function name, and the actual line of code.
Work upward from there to understand the call chain: who called that function, and who called that. Stop climbing when you reach library code in /site-packages/ - the bug is usually in your code that passed incorrect data to the library, not in the library itself.
Q2: What is the difference between pdb.set_trace() and breakpoint()?
Answer: breakpoint() was added in Python 3.7 and is the modern equivalent of import pdb; pdb.set_trace(). The important difference is that breakpoint() calls sys.breakpointhook, which can be overridden:
- Setting PYTHONBREAKPOINT=0 makes all breakpoint() calls no-ops (useful to disable them in CI without deleting them from code)
- Setting PYTHONBREAKPOINT=pudb.set_trace switches to a different debugger (pudb) without changing code
- pdb.set_trace() always invokes pdb specifically - you cannot redirect it with an environment variable
Use breakpoint() in all new code.
Q3: What is post-mortem debugging and when is it useful?
Answer: Post-mortem debugging means opening the debugger at the crash site after an unhandled exception, without needing to set breakpoints in advance.
It is useful when:
- You do not know where the crash will happen (cannot set a breakpoint in advance)
- The crash only happens with specific data that is hard to reproduce in the debugger
- You want to inspect the full program state at the moment of failure without re-running
Usage:
# In interactive mode (python -i script.py):
import pdb; pdb.pm()  # Opens debugger at the crash site

# Programmatically:
import sys, pdb
try:
    main()
except Exception:
    pdb.post_mortem(sys.exc_info()[2])
Q4: What does traceback.format_exc() return, and how is it different from str(exception)?
Answer: traceback.format_exc() returns the full formatted traceback string - including the "Traceback (most recent call last):" header, all the frame entries with file/line/function info, and the final exception line.
str(exception) returns only the exception message - just the string argument passed to the exception constructor.
Example:
import traceback

try:
    int("abc")
except ValueError as e:
    print(str(e))  # invalid literal for int() with base 10: 'abc'
    print(traceback.format_exc())  # Full multi-line traceback
Use traceback.format_exc() when you need to log or store the full traceback. Use str(e) when you only need the human-readable error message.
Q5: What is cProfile and how do you use it to find a performance bottleneck?
Answer: cProfile is Python's built-in deterministic profiler. It records every function call, counting how many times each function was called and measuring the time spent in each one.
Usage:
# Profile a script from the command line
python -m cProfile -s cumulative my_script.py | head -20
Or in code:
import cProfile
cProfile.run("my_function(args)", sort="cumulative")
The output shows ncalls (call count), tottime (time in function, excluding callees), and cumtime (time including all nested calls). Sort by cumtime to find the highest-impact functions. The function at the top with the most cumulative time is your first optimization target.
Q6: When is print debugging appropriate, and how should it be done correctly?
Answer: Print debugging is appropriate when:
- The environment does not support a debugger (remote server, CI, Docker container with no terminal)
- The bug only manifests after many iterations (needs a log of all values, not a single pause)
- The code uses threads, async, or callbacks that make pdb difficult to use interactively
How to do it correctly:
- Label every print with a prefix like [DEBUG] so you can find and remove them later
- Use the !r conversion (calls repr()) to see the type and exact value: f"x={x!r}"
- Use pprint.pprint() for complex nested structures
- Remove or convert to logger.debug() before committing - print statements in production code are a code smell
Practice Challenges
Beginner - Read and Diagnose a Traceback
Given this traceback, identify: (1) what went wrong, (2) in which function, (3) what the probable root cause is, and (4) how to fix it.
Traceback (most recent call last):
File "app.py", line 24, in <module>
send_report(users)
File "app.py", line 18, in send_report
for user in users:
File "app.py", line 12, in process_users
email = user["email"].lower()
AttributeError: 'NoneType' object has no attribute 'lower'
Solution
1. What went wrong:
AttributeError: 'NoneType' object has no attribute 'lower' - .lower() was called on None instead of a string.
2. In which function:
process_users at line 12 in app.py, on the expression user["email"].lower().
3. Probable root cause:
user["email"] returned None. This means one of the user dictionaries in the dataset has "email": None: the key exists, but its value is None (a missing key would have raised KeyError instead). The user was likely created without an email address, or the email field was explicitly set to None.
4. How to fix it:
# Option A: Guard with a check before calling .lower() (inside the loop over users)
email = user.get("email")
if email is not None:
    email = email.lower()
else:
    # Handle missing email: skip user, use default, log warning
    logger.warning("User %s has no email address", user.get("id"))
    continue

# Option B: Use a default
email = (user.get("email") or "").lower()

# Option C: Validate user data at the boundary (best practice)
if not isinstance(user.get("email"), str):
    raise ValueError(f"User {user['id']} missing required email field")
The root fix is to ensure user records always have a valid email string, validated at the point of data entry or import.
Intermediate - Use pdb to Find a Hidden Bug
The following function has a bug that only appears with specific input. Use breakpoint() to investigate. Identify the exact line and condition that triggers the bug.
def merge_configs(base: dict, override: dict) -> dict:
    """
    Deep merge override into base. Lists in override EXTEND base lists.
    Scalars in override REPLACE base scalars.
    """
    result = dict(base)
    for key, value in override.items():
        if key in result and isinstance(result[key], list):
            result[key] = result[key] + value  # Extend list
        else:
            result[key] = value  # Scalar override
    return result

# Test cases
base = {"host": "localhost", "port": 5432, "tags": ["prod"]}
override_a = {"port": 6000}
override_b = {"tags": ["v2", "blue"]}
override_c = {"tags": "not-a-list"}  # Tricky case

print(merge_configs(base, override_a))
print(merge_configs(base, override_b))
print(merge_configs(base, override_c))  # This one has a bug
Solution
The Bug:
When override_c = {"tags": "not-a-list"} is merged:
- result["tags"] is ["prod"] (a list)
- value is "not-a-list" (a string)
- The condition isinstance(result[key], list) is True
- So the code runs result["tags"] = ["prod"] + "not-a-list", which raises TypeError: can only concatenate list (not "str") to list
Debugging with pdb:
def merge_configs(base: dict, override: dict) -> dict:
    result = dict(base)
    for key, value in override.items():
        breakpoint()  # Pause here on each iteration
        if key in result and isinstance(result[key], list):
            result[key] = result[key] + value
        else:
            result[key] = value
    return result
At the pdb prompt:
(Pdb) p key
'tags'
(Pdb) p value
'not-a-list'
(Pdb) p type(value)
<class 'str'>
(Pdb) p result[key]
['prod']
# AHA: result[key] is a list but value is a string - + will fail
The Fix:
def merge_configs(base: dict, override: dict) -> dict:
    result = dict(base)
    for key, value in override.items():
        if (key in result
                and isinstance(result[key], list)
                and isinstance(value, list)):  # Both must be lists
            result[key] = result[key] + value
        else:
            result[key] = value  # Scalar override (including str replacing list)
    return result

# Now override_c replaces the list with the string
print(merge_configs(base, override_c))
# {'host': 'localhost', 'port': 5432, 'tags': 'not-a-list'}
Whether this is correct behavior depends on requirements, but at least it does not crash. The point is that the bug was only visible when result[key] was a list but value was not - a condition breakpoint() revealed immediately.
Advanced - Profile and Optimize a Slow Function
The function below is correct but too slow for production use. Profile it with cProfile, identify the bottleneck, and rewrite it to be at least 10x faster.
import time
import random

def find_duplicates(numbers: list) -> list:
    """Return a list of numbers that appear more than once."""
    duplicates = []
    for i, num in enumerate(numbers):
        for j, other in enumerate(numbers):
            if i != j and num == other and num not in duplicates:
                duplicates.append(num)
    return duplicates

# Test with large data
data = [random.randint(1, 500) for _ in range(5000)]
start = time.perf_counter()
result = find_duplicates(data)
elapsed = time.perf_counter() - start
print(f"Found {len(result)} duplicates in {elapsed:.3f}s")
Solution
Step 1: Profile to confirm the bottleneck
import cProfile
import random

data = [random.randint(1, 500) for _ in range(5000)]
# Sort by cumulative time so the most expensive call chains appear first
cProfile.run("find_duplicates(data)", sort="cumulative")
25004998 function calls in 14.2 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
1 14.1 14.1 14.2 14.2 solution.py:5(find_duplicates)
Almost all of the time (a tottime of 14.1 s out of 14.2 s) is spent inside find_duplicates itself rather than in functions it calls. The double loop is O(n squared): 5000 x 5000 = 25,000,000 iterations.
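When the one-line cProfile.run report is too noisy, the same data can be collected and trimmed programmatically. The following is a minimal sketch using the standard cProfile and pstats modules. The smaller input size is an assumption chosen so the example finishes quickly, and find_duplicates is repeated only to keep the block self-contained.

```python
import cProfile
import io
import pstats
import random


def find_duplicates(numbers: list) -> list:
    """Same quadratic function as in the exercise."""
    duplicates = []
    for i, num in enumerate(numbers):
        for j, other in enumerate(numbers):
            if i != j and num == other and num not in duplicates:
                duplicates.append(num)
    return duplicates


# Smaller input than the exercise so the sketch finishes quickly
data = [random.randint(1, 50) for _ in range(300)]

profiler = cProfile.Profile()
profiler.enable()
find_duplicates(data)
profiler.disable()

# Collect the report into a string instead of printing straight to stdout
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)  # show only the top 5 entries
report = buffer.getvalue()
print(report)
```

Capturing the report as a string makes it easy to log or attach to a bug report. The timings will differ from machine to machine.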
Step 2: Identify the algorithm problem
The double loop compares every element to every other element, which is O(n squared). Each time a matching pair is found, num not in duplicates also scans the duplicates list linearly, which pushes the worst case toward O(n cubed).
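One way to see the quadratic growth without timing anything is to count how often the inner loop body runs as the input doubles. This is an illustrative sketch: count_inner_iterations is a hypothetical helper, not part of the exercise.

```python
def count_inner_iterations(n: int) -> int:
    """Count how many times the inner loop body runs for an input of size n."""
    numbers = list(range(n))  # contents do not matter for the count
    iterations = 0
    for _ in numbers:
        for _ in numbers:
            iterations += 1
    return iterations


for n in (100, 200, 400):
    print(f"n={n}  inner iterations={count_inner_iterations(n)}")
```

Each doubling of n multiplies the count by four (10,000, then 40,000, then 160,000), which is the signature of an O(n squared) loop.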
Step 3: Rewrite with an O(n) algorithm
from collections import Counter
import time
import random

def find_duplicates_fast(numbers: list) -> list:
    """Return a list of numbers that appear more than once. O(n) time."""
    counts = Counter(numbers)
    return [num for num, count in counts.items() if count > 1]

# Compare performance
data = [random.randint(1, 500) for _ in range(5000)]

start = time.perf_counter()
result_slow = find_duplicates(data)
slow_time = time.perf_counter() - start

start = time.perf_counter()
result_fast = find_duplicates_fast(data)
fast_time = time.perf_counter() - start

print(f"Slow: {slow_time:.3f}s Fast: {fast_time:.6f}s")
print(f"Speedup: {slow_time / fast_time:.0f}x")
print(f"Results match: {set(result_slow) == set(result_fast)}")
# Output:
# Slow: 14.234s Fast: 0.000892s
# Speedup: 15952x
# Results match: True
What cProfile revealed and why:
The profiler showed that almost all time was spent in find_duplicates itself, so the problem was the function's own algorithm rather than something it called. The double loop was O(n squared). Counter uses a hash map: building it is O(n), and reading it back is O(n). The fix is choosing the right data structure; cProfile does not pick it for you, but it shows you which function to rethink.
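To make the hash-map idea concrete, here is roughly what the Counter-based version does, written out with a plain dict. This is a sketch for illustration only: find_duplicates_manual is a hypothetical name, and Counter's real implementation differs in detail.

```python
from collections import Counter


def find_duplicates_manual(numbers: list) -> list:
    """Count occurrences in one pass, then keep values seen more than once."""
    counts = {}
    for num in numbers:
        # Dict lookup and update are O(1) on average, so this pass is O(n)
        counts[num] = counts.get(num, 0) + 1
    # Second pass runs over distinct values only
    return [num for num, count in counts.items() if count > 1]


sample = [3, 1, 3, 2, 1, 3]
manual = find_duplicates_manual(sample)
with_counter = [n for n, c in Counter(sample).items() if c > 1]
print(manual, with_counter)
```

Both approaches visit each element once and rely on constant-time average lookups, which is where the speedup over the nested loops comes from.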
Quick Reference
| Technique | Command / Code | Use Case |
|---|---|---|
| Set breakpoint | breakpoint() | Pause execution for interactive inspection |
| Disable all breakpoints | PYTHONBREAKPOINT=0 python script.py | CI environments |
| Post-mortem debugging | python -i script.py then import pdb; pdb.pm() | Inspect state after crash |
| pdb: next line | n | Step over a line |
| pdb: step into | s | Enter a function call |
| pdb: continue | c | Run to next breakpoint |
| pdb: print value | p expr | Inspect a variable |
| pdb: call stack | w | See where you are |
| pdb: up/down frame | u / d | Navigate the call stack |
| Format traceback | traceback.format_exc() | Capture traceback as string |
| Print traceback | traceback.print_exc() | Print current exception traceback |
| Exception info | sys.exc_info() | Get (type, value, tb) tuple |
| Soft warning | warnings.warn("msg", DeprecationWarning) | Non-fatal advisory |
| Profile script | python -m cProfile -s cumulative script.py | Find time bottlenecks |
| Line profiler | kernprof -l -v script.py | Line-by-line timing |
| Pretty print | from pprint import pprint; pprint(obj) | Readable complex structures |
| repr of value | f"x={x!r}" | Show type info in debug output |
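Several rows of the table can be combined in one short script. The sketch below uses only standard-library calls (warnings.warn, sys.exc_info, traceback.format_exc). The function old_average is a hypothetical deprecated helper invented for this illustration.

```python
import sys
import traceback
import warnings


def old_average(numbers):
    """Hypothetical deprecated helper used only for this illustration."""
    warnings.warn("old_average is deprecated", DeprecationWarning, stacklevel=2)
    return sum(numbers) / len(numbers)


# Record warnings so the sketch can inspect them instead of writing to stderr
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        old_average([])
    except ZeroDivisionError:
        exc_type, exc_value, exc_tb = sys.exc_info()  # (type, value, tb) tuple
        tb_text = traceback.format_exc()  # full traceback as a string

print(f"type={exc_type.__name__} value={exc_value!r}")
print(tb_text)
print(f"warning: {caught[0].message}")
```

The warning is advisory and does not stop execution, while the exception does. Capturing the traceback as a string lets you log it without re-raising.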
Key Takeaways
- Debugging is the scientific method: observe the error, form a specific hypothesis, test it with a minimal experiment, conclude - never make multiple simultaneous guesses
- Read tracebacks from the bottom up - the last line names what failed, the second-to-last shows where, and you work upward to find where in your code the bad data originated
- breakpoint() (Python 3.7+) is the modern way to open the interactive pdb debugger; set PYTHONBREAKPOINT=0 to disable all breakpoints without touching code
- Post-mortem debugging with pdb.pm() is the most powerful technique for understanding crashes - it lets you inspect all state at the moment of failure after the fact
- traceback.format_exc() captures the full traceback as a string for logging or storage; sys.exc_info() provides structured (type, value, tb) access inside except blocks
- cProfile finds time bottlenecks in minutes: run it, sort by cumtime, and fix the top function - most performance bugs are algorithm or data structure choices, not micro-optimizations
- The fastest path to a bug is usually: reproduce minimally, check the recent git diff, and use git bisect to narrow down which change introduced it
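The idea behind bisecting can be sketched in a few lines of Python. This is an illustration of the binary search that a tool like git bisect automates, not its actual implementation. The commit names and the is_bad check are hypothetical, and the sketch assumes every change before the culprit is good and every change from the culprit onward is bad.

```python
def first_bad_index(changes: list, is_bad) -> int:
    """Binary search for the first change where is_bad(change) is True.

    Assumes at least one change is bad and that all changes after the
    first bad one are also bad.
    """
    low, high = 0, len(changes) - 1
    while low < high:
        mid = (low + high) // 2
        if is_bad(changes[mid]):
            high = mid      # culprit is at mid or earlier
        else:
            low = mid + 1   # culprit is after mid
    return low


# Hypothetical history in which commit "c6" introduced the bug
commits = [f"c{i}" for i in range(10)]
checked = []


def is_bad(commit: str) -> bool:
    """Stand-in for 'check out this commit and run the failing test'."""
    checked.append(commit)
    return int(commit[1:]) >= 6


culprit = commits[first_bad_index(commits, is_bad)]
print(f"First bad commit: {culprit} (tested {len(checked)} of {len(commits)})")
```

Only four of the ten commits are tested here. The number of checks grows with the logarithm of the history length, which is why bisecting stays practical even across hundreds of changes.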
