Recursion Fundamentals - Thinking in Self-Reference
Reading time: ~20 minutes | Level: Foundation → Engineering
Here is a function that seems like it should loop forever - but it terminates and produces the right answer:
def factorial(n):
if n == 0:
return 1 # base case: stop here
return n * factorial(n - 1) # recursive case: reduce the problem
print(factorial(5)) # 120
print(factorial(10)) # 3628800
print(factorial(0)) # 1
And here is a number that will crash it:
print(factorial(10000))
# RecursionError: maximum recursion depth exceeded
Recursion is elegant. It is also dangerous if you do not understand what is happening under the hood. By the end of this page, you will know exactly how each call creates a new stack frame, why Python limits recursion to around 1000 calls by default, when recursion is the right choice, and how to make it fast.
What You Will Learn
- What recursion is and why it requires both a base case and a recursive case
- How each recursive call creates a new stack frame - visualized with ASCII diagrams
- Python's recursion limit (
sys.getrecursionlimit()), why it exists, and what happens when you hit it - Classic recursive algorithms: factorial, Fibonacci, binary search, tree traversal
- Why naive Fibonacci is O(2^n) and how memoization makes it O(n) - with timing proof
- When recursion is the natural tool (trees, divide-and-conquer, parsing)
- When NOT to use recursion (unbounded depth, performance-critical loops)
- Real-world recursion: directory trees, JSON schema validation, expression parsers
Prerequisites
- Comfortable with Python functions, arguments, and return values (topics 01–06)
- Understanding of Python's call stack at a conceptual level
- Familiarity with lists and dicts
Mental Model: A Function Calling a Smaller Version of Itself
The core idea of recursion is problem reduction: solve a problem by reducing it to a simpler version of the same problem, until you reach a version so simple it has a direct answer.
Every recursive function needs exactly two things:
- Base case: the termination condition - a direct answer for the simplest input
- Recursive case: reduces the problem toward the base case
If you omit the base case, the function calls itself forever (until Python raises RecursionError). If the recursive case does not make progress toward the base case, same result.
Part 1 - The Two Required Parts
Base Case
The base case is a condition where the function returns a result without calling itself. It is the anchor that stops the recursion.
def countdown(n):
if n <= 0: # base case
print("Go!")
return
print(n)
countdown(n - 1) # recursive case: n decreases each call
countdown(5)
# 5
# 4
# 3
# 2
# 1
# Go!
What Happens Without a Base Case
def broken(n):
print(n)
broken(n + 1) # no base case - grows forever
broken(0)
# 0
# 1
# 2
# ...
# RecursionError: maximum recursion depth exceeded while calling a Python object
Python raises RecursionError at approximately 1000 frames deep (the exact limit depends on available stack space and can be queried and adjusted - more on this shortly).
Part 2 - Call Stack Mechanics: What Happens at Each Call
Every function call in Python creates a stack frame - a block of memory holding the function's local variables, arguments, and the return address. Recursive calls create a chain of frames, one per active call.
Let us trace factorial(4) frame by frame:
Each frame stays alive and suspended until the call it is waiting for returns. This is why n recursive calls consume n stack frames simultaneously.
Part 3 - Python's Recursion Limit
Python deliberately caps the recursion depth:
import sys
print(sys.getrecursionlimit()) # 1000 (default on most systems)
The limit exists because each stack frame consumes memory (typically 1–4 KB), and the OS allocates a fixed amount of stack space per thread (usually 1–8 MB). If recursion could go arbitrarily deep, a simple bug (missing base case) would crash the entire interpreter process. Python's limit turns a process crash into a catchable RecursionError.
Catching RecursionError
import sys
def deep(n):
return deep(n + 1)
try:
deep(0)
except RecursionError as e:
print(f"Caught: {e}")
print(f"Current limit: {sys.getrecursionlimit()}")
# Caught: maximum recursion depth exceeded
# Current limit: 1000
Adjusting the Limit
import sys
sys.setrecursionlimit(5000) # increase - use with caution
:::danger Raising the Recursion Limit
sys.setrecursionlimit() raises the limit, but the OS stack is still bounded. Setting it too high causes a segmentation fault (a hard process crash, not a Python exception) because Python overruns the OS-allocated stack memory. Only raise the limit if you have a specific, bounded use case and have profiled the actual stack consumption. The safe alternative is usually to convert the algorithm to iteration.
:::
Part 4 - Classic Recursive Algorithms
Factorial
def factorial(n):
if n < 0:
raise ValueError(f"factorial undefined for negative n: {n}")
if n == 0:
return 1
return n * factorial(n - 1)
print(factorial(0)) # 1
print(factorial(5)) # 120
print(factorial(10)) # 3628800
Sum of a List (Recursive Decomposition)
def recursive_sum(lst):
if not lst: # base case: empty list sums to 0
return 0
return lst[0] + recursive_sum(lst[1:]) # head + sum of tail
print(recursive_sum([1, 2, 3, 4, 5])) # 15
print(recursive_sum([])) # 0
Binary Search
def binary_search(arr, target, low=0, high=None):
if high is None:
high = len(arr) - 1
if low > high: # base case: search space exhausted
return -1
mid = (low + high) // 2
if arr[mid] == target:
return mid # base case: found
if arr[mid] < target:
return binary_search(arr, target, mid + 1, high) # search right
return binary_search(arr, target, low, mid - 1) # search left
nums = [1, 3, 5, 7, 9, 11, 13, 15]
print(binary_search(nums, 7)) # 3
print(binary_search(nums, 6)) # -1
print(binary_search(nums, 1)) # 0
print(binary_search(nums, 15)) # 7
Tree Traversal
Recursion shines on trees. A tree node is either a leaf (base case) or a node with children (recurse into each):
class TreeNode:
def __init__(self, val, left=None, right=None):
self.val = val
self.left = left
self.right = right
def inorder(node):
"""In-order traversal: left → root → right."""
if node is None: # base case: empty subtree
return []
return inorder(node.left) + [node.val] + inorder(node.right)
# 4
# / \
# 2 6
# / \ / \
# 1 3 5 7
tree = TreeNode(4,
TreeNode(2, TreeNode(1), TreeNode(3)),
TreeNode(6, TreeNode(5), TreeNode(7))
)
print(inorder(tree)) # [1, 2, 3, 4, 5, 6, 7]
The recursive structure of the code mirrors the recursive structure of the data. This is why recursion is natural for trees: each call handles one node, and the children are handled by recursive calls.
Part 5 - Fibonacci: Naive O(2^n) vs Memoized O(n)
Fibonacci is the canonical example for showing why naive recursion can be catastrophically slow.
Naive Recursive Fibonacci
def fib_naive(n):
if n <= 1:
return n
return fib_naive(n - 1) + fib_naive(n - 2)
print(fib_naive(10)) # 55
print(fib_naive(30)) # 832040 (takes noticeable time)
# fib_naive(50) would take ~10 minutes
Why It Is O(2^n)
fib_naive(n) calls itself an exponential number of times. Every subproblem is recomputed from scratch. For n=50, there are roughly 2^50 (about 10^15) calls.
Memoized Fibonacci - O(n)
from functools import lru_cache
@lru_cache(maxsize=None)
def fib(n):
if n <= 1:
return n
return fib(n - 1) + fib(n - 2)
print(fib(10)) # 55
print(fib(50)) # 12586269025
print(fib(100)) # 354224848179261915075
print(fib(300)) # (a very large number, computed instantly)
Or hand-rolled with a dict:
_cache = {}
def fib_memo(n):
if n <= 1:
return n
if n in _cache:
return _cache[n]
result = fib_memo(n - 1) + fib_memo(n - 2)
_cache[n] = result
return result
Timing Comparison
import time
# Naive
start = time.perf_counter()
fib_naive(35)
naive_time = time.perf_counter() - start
# Memoized (first call builds the cache)
start = time.perf_counter()
fib(35)
memo_time = time.perf_counter() - start
print(f"Naive: {naive_time:.4f}s")
print(f"Memoized: {memo_time:.6f}s")
# Naive: 2.8431s
# Memoized: 0.000012s
# Speedup: roughly 230,000x for n=35
Memoization transforms the call tree into a call chain - each unique n is computed exactly once.
Part 6 - Iterative Equivalent: Same Power, Different Tradeoffs
Any recursive algorithm can be rewritten iteratively. Sometimes the iterative version is clearer; sometimes recursion is clearer.
Iterative Factorial
def factorial_iterative(n):
result = 1
for i in range(2, n + 1):
result *= i
return result
print(factorial_iterative(10)) # 3628800
print(factorial_iterative(100)) # works fine - no stack limit
Iterative Fibonacci
def fib_iterative(n):
if n <= 1:
return n
a, b = 0, 1
for _ in range(2, n + 1):
a, b = b, a + b
return b
print(fib_iterative(100)) # 354224848179261915075
Trade-off Summary
| Recursion | Iteration |
|---|---|
| Mirrors recursive data structures | Usually more memory-efficient (O(1) stack space vs O(n)) |
| Elegant for trees and graphs | No recursion limit risk |
| Risk: stack overflow | Sometimes harder to read |
| Easy to reason about correctness | Risk: off-by-one errors in loops |
| Natural for divide-and-conquer | May need explicit stack (list) to simulate recursion |
Part 7 - When Recursion Is the Right Tool
Trees and Graphs
Any hierarchical data structure is naturally recursive. File systems, JSON documents, HTML DOM, AST (abstract syntax trees), org charts - all are trees, and recursion fits them perfectly.
import os
def count_python_files(directory):
"""Recursively count .py files under directory."""
count = 0
for entry in os.scandir(directory):
if entry.is_file() and entry.name.endswith(".py"):
count += 1
elif entry.is_dir():
count += count_python_files(entry.path) # recurse into subdir
return count
# print(count_python_files("/your/project"))
Divide and Conquer
Merge sort, quicksort, and binary search all divide the problem in half at each step. The recursive structure makes correctness arguments clear:
def merge_sort(arr):
if len(arr) <= 1: # base case: a list of 0 or 1 elements is sorted
return arr
mid = len(arr) // 2
left = merge_sort(arr[:mid]) # sort the left half
right = merge_sort(arr[mid:]) # sort the right half
return merge(left, right) # combine two sorted halves
def merge(left, right):
result = []
i = j = 0
while i < len(left) and j < len(right):
if left[i] <= right[j]:
result.append(left[i]); i += 1
else:
result.append(right[j]); j += 1
result.extend(left[i:])
result.extend(right[j:])
return result
print(merge_sort([5, 2, 8, 1, 9, 3])) # [1, 2, 3, 5, 8, 9]
JSON Schema Validation
Nested JSON documents validate recursively - if a field's schema says it contains an object, validate that object's fields recursively:
def validate(data, schema):
"""
Minimal recursive JSON schema validator.
schema: dict with 'type' key and optional 'properties'.
"""
expected_type = schema["type"]
type_map = {"string": str, "integer": int,
"number": float, "boolean": bool, "object": dict}
if expected_type == "object":
if not isinstance(data, dict):
raise ValueError(f"Expected object, got {type(data).__name__}")
for key, sub_schema in schema.get("properties", {}).items():
if key not in data:
raise ValueError(f"Missing required key: {key}")
validate(data[key], sub_schema) # recurse into each field
else:
if not isinstance(data, type_map[expected_type]):
raise ValueError(
f"Expected {expected_type}, got {type(data).__name__}"
)
schema = {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"},
"score": {"type": "number"},
}
}
validate({"name": "Alice", "age": 30, "score": 98.5}, schema)
print("Valid!") # Valid!
try:
validate({"name": "Bob", "age": "thirty", "score": 85.0}, schema)
except ValueError as e:
print(f"Invalid: {e}") # Invalid: Expected integer, got str
Part 8 - When NOT to Use Recursion
Unbounded Depth
If the recursion depth is proportional to user input and can be arbitrarily large (e.g., reading a million-line file line by line), use iteration. Recursion would hit the limit.
Performance-Critical Inner Loops
Python function calls are relatively expensive (setting up a frame takes time). In a tight inner loop that executes millions of times, iterative code can be significantly faster.
When the Iterative Version Is Equally Clear
# Recursive - elegant but unnecessary for this case
def sum_list(lst):
if not lst:
return 0
return lst[0] + sum_list(lst[1:])
# Iterative - equally readable, no stack risk, built-in is best
total = sum([1, 2, 3, 4, 5])
Use the simplest tool that solves the problem clearly and safely.
:::tip When to choose recursion Use recursion when the problem itself is recursive - trees, graphs, divide-and-conquer, parsers. Use iteration when the problem is fundamentally sequential (process each item in order). If both are equally clear, prefer iteration for its predictable memory usage. :::
Interview Questions
Q1: What are the two required parts of every recursive function?
Answer: Every recursive function needs a base case and a recursive case. The base case is a condition where the function returns a result directly, without calling itself - it terminates the recursion. The recursive case calls the function with a strictly simpler input, making progress toward the base case. Without a base case, recursion continues until Python raises RecursionError. Without a recursive case that reduces the problem, the base case is never reached and recursion also goes infinite.
Q2: What is Python's default recursion limit and why does it exist?
Answer: The default is approximately 1000 frames, queryable via sys.getrecursionlimit(). It exists because each call frame consumes memory on the OS call stack, which is bounded (typically 1–8 MB per thread). Without a limit, a missing base case would exhaust the OS stack and crash the entire Python process with a segmentation fault - an unrecoverable error. Python's limit converts that crash into a catchable RecursionError. You can raise the limit with sys.setrecursionlimit(), but setting it too high still risks a segfault.
Q3: Why is naive recursive Fibonacci O(2^n) and how does memoization fix it?
Answer: Naive recursive Fibonacci calls fib(n-1) and fib(n-2). Each of those calls fib twice more, creating an exponential call tree. fib(n-2) is computed independently by both fib(n) and fib(n-1), and this redundant computation cascades down the tree. The total number of calls is approximately 2^n.
Memoization stores each computed value in a cache (keyed by n). Before computing, it checks the cache. If the result is there, it returns immediately - O(1). This collapses the exponential call tree into a linear call chain: fib(n) computes fib(n-1), which computes fib(n-2), and so on, each unique value computed once. Total complexity: O(n) time and O(n) space.
Q4: How do you convert a recursive function to an iterative one?
Answer: The general strategy depends on the algorithm:
For linear recursion (factorial, list sum): use a loop with an accumulator variable.
For tree/graph recursion: use an explicit stack (a Python list) to simulate the call stack. Push the initial problem, then in a loop: pop a problem, solve or decompose it, push subproblems back. This is equivalent to what Python does internally with the call stack.
Example converting recursive DFS to iterative:
def dfs_iterative(graph, start):
visited = set()
stack = [start]
while stack:
node = stack.pop()
if node not in visited:
visited.add(node)
stack.extend(graph[node]) # push neighbors
return visited
Q5: When is recursion the natural choice over iteration?
Answer: Recursion is natural when the data structure or problem is itself recursive. Key indicators:
- The problem decomposes into smaller instances of the same problem (divide and conquer: merge sort, binary search).
- The data is hierarchical: trees, graphs, nested JSON, file system directories.
- The algorithm is a parser or evaluator for a grammar that is defined recursively.
- You are implementing a mathematical definition that is recursive (factorial, Fibonacci, Ackermann).
In these cases, the recursive structure of the code mirrors the recursive structure of the data, making correctness much easier to reason about. For purely sequential problems (process each item in a flat list), iteration is simpler.
Q6: What is the real-world risk of using recursion on production data?
Answer: The main risk is hitting RecursionError when input depth is unbounded. If a user submits deeply nested JSON (common in certain APIs), a deeply nested directory tree, or a graph with long paths, a naive recursive implementation will crash. Production-grade recursive code should:
- Validate or cap input depth before recursing.
- Use
sys.setrecursionlimit()conservatively if deeper recursion is truly needed. - Prefer iterative implementations with explicit stacks for algorithms that will process user-controlled data of unknown depth.
- Use memoization (
functools.lru_cache) to avoid exponential recomputation.
Practice Challenges
Beginner: Recursive Power Function
Write a recursive function power(base, exp) that computes base raised to exp without using ** or pow(). Handle exp == 0 as the base case. Test with positive integers.
Solution
def power(base, exp):
"""
Compute base ** exp recursively.
Requires exp to be a non-negative integer.
"""
if exp < 0:
raise ValueError("exp must be non-negative")
if exp == 0: # base case: anything^0 = 1
return 1
return base * power(base, exp - 1) # reduce exp by 1 each call
print(power(2, 0)) # 1
print(power(2, 10)) # 1024
print(power(3, 4)) # 81
print(power(5, 3)) # 125
print(power(2, 8)) # 256
# Trace for power(2, 3):
# power(2, 3) = 2 * power(2, 2)
# = 2 * power(2, 1)
# = 2 * power(2, 0)
# = 1
# = 2 * 1 = 2
# = 2 * 2 = 4
# = 2 * 4 = 8 ✓
# Note: Python's built-in ** is dramatically more efficient for large
# exponents using fast exponentiation (O(log n) multiplications),
# but this illustrates the recursive structure clearly.
Intermediate: Flatten a Nested List
Write a recursive function flatten(lst) that takes an arbitrarily nested list and returns a flat list of all the leaf values. For example, flatten([1, [2, [3, 4], 5], 6]) returns [1, 2, 3, 4, 5, 6]. The nesting can be arbitrarily deep.
Solution
def flatten(lst):
"""
Recursively flatten an arbitrarily nested list.
Base case: element is not a list → yield it as a leaf.
Recursive case: element is a list → flatten it.
"""
result = []
for item in lst:
if isinstance(item, list):
result.extend(flatten(item)) # recurse into sub-list
else:
result.append(item) # base case: leaf value
return result
print(flatten([1, 2, 3])) # [1, 2, 3]
print(flatten([1, [2, 3], 4])) # [1, 2, 3, 4]
print(flatten([1, [2, [3, 4], 5], 6])) # [1, 2, 3, 4, 5, 6]
print(flatten([[[[1]]], [[[2]]]])) # [1, 2]
print(flatten([])) # []
print(flatten([1, [2, [], [3]], 4])) # [1, 2, 3, 4]
# Works with mixed types
print(flatten(["a", ["b", ["c", "d"]], "e"])) # ['a', 'b', 'c', 'd', 'e']
# Real-world use: deeply nested API responses
nested_config = {
"servers": [
{"ips": ["10.0.0.1", "10.0.0.2"]},
{"ips": ["10.0.1.1"]},
]
}
all_ips = flatten([s["ips"] for s in nested_config["servers"]])
print(all_ips) # ['10.0.0.1', '10.0.0.2', '10.0.1.1']
Advanced: Directory Tree Size Calculator with Memoization
Write a function dir_size(path) that recursively computes the total size in bytes of all files under a directory. Then profile it on a real directory and add memoization for repeated calls on the same path. Handle permission errors gracefully.
Solution
import os
import functools
import time
def dir_size(path):
"""
Recursively compute total size of all files under `path`.
Handles PermissionError for directories we cannot read.
Returns total size in bytes.
"""
total = 0
try:
with os.scandir(path) as entries:
for entry in entries:
try:
if entry.is_file(follow_symlinks=False):
total += entry.stat(follow_symlinks=False).st_size
elif entry.is_dir(follow_symlinks=False):
total += dir_size(entry.path) # recurse
except PermissionError:
pass # skip files we cannot read
except PermissionError:
pass # skip directories we cannot enter
return total
def human_readable(size_bytes):
for unit in ["B", "KB", "MB", "GB", "TB"]:
if size_bytes < 1024:
return f"{size_bytes:.1f} {unit}"
size_bytes /= 1024
return f"{size_bytes:.1f} PB"
# --- Version with memoization for repeated queries ---
_size_cache = {}
def dir_size_cached(path):
"""
Memoized version: caches result for each path.
Note: cache is invalidated manually - real code would use file
system change notifications or a time-based TTL.
"""
real_path = os.path.realpath(path) # resolve symlinks
if real_path in _size_cache:
return _size_cache[real_path]
size = dir_size(real_path)
_size_cache[real_path] = size
return size
# --- Demo ---
target = os.path.expanduser("~") # measure home directory
start = time.perf_counter()
size = dir_size(target)
elapsed = time.perf_counter() - start
print(f"Size of ~: {human_readable(size)}")
print(f"Computed in {elapsed:.2f}s")
# Second call uses cache
start = time.perf_counter()
size2 = dir_size_cached(target)
elapsed2 = time.perf_counter() - start
print(f"Cached result: {human_readable(size2)} in {elapsed2:.4f}s")
# Recursive depth example - count directory nesting levels
def max_depth(path, current_depth=0):
"""Find the maximum directory nesting depth."""
max_d = current_depth
try:
with os.scandir(path) as entries:
for entry in entries:
if entry.is_dir(follow_symlinks=False):
try:
d = max_depth(entry.path, current_depth + 1)
max_d = max(max_d, d)
except (PermissionError, RecursionError):
pass
except PermissionError:
pass
return max_d
# This shows how recursion depth can reach the Python limit on deep trees
# For very deep trees, the iterative version with an explicit stack is safer
Quick Reference
| Concept | Syntax / Example | Notes |
|---|---|---|
| Define recursive function | def f(n): if n==0: return 1; return n * f(n-1) | Base case first |
| Get recursion limit | sys.getrecursionlimit() | Default ~1000 |
| Set recursion limit | sys.setrecursionlimit(5000) | Use cautiously |
| Catch RecursionError | try: f(n) except RecursionError: ... | Treat like any exception |
| Memoize with decorator | @functools.lru_cache(maxsize=None) | Caches all call results |
| Hand-rolled memoization | cache={}; if n in cache: return cache[n] | Dict keyed by args |
| Iterative with stack | stack=[start]; while stack: node=stack.pop() | Simulate call stack |
| Flatten nested list | for item in lst: if isinstance(item, list): ... | Classic recursive pattern |
| Tree traversal | if node is None: return []; return inorder(left)+[val]+inorder(right) | Base case = None |
| Recursion depth danger | Deep user input → RecursionError | Validate or use iteration |
Key Takeaways
- Every recursive function needs a base case (direct answer, no self-call) and a recursive case (reduce the problem and call self with a smaller input)
- Each recursive call creates a new stack frame that stays alive until the call returns -
ncalls deep meansnframes simultaneously on the stack - Python's default recursion limit is approximately 1000 frames;
RecursionErroris thrown when exceeded; the limit exists to prevent OS-level stack exhaustion - Naive recursive Fibonacci is O(2^n) because it recomputes overlapping subproblems; memoization collapses the exponential tree into a linear chain, making it O(n)
- Recursion is natural when the data structure or problem is itself recursive - trees, graphs, divide-and-conquer, parsers all benefit from recursive solutions
- For unbounded input depth or performance-critical code, prefer iterative solutions with explicit stacks; they have the same expressive power but no recursion limit risk
functools.lru_cacheis the idiomatic Python way to add memoization; it handles cache key hashing, thread safety, and cache statistics automatically
