
TDD Principles - Write the Test First, Let Failure Guide the Design

Reading time: ~30 minutes | Level: Intermediate → Engineering

Before reading further, run this test mentally. What happens when you execute it?

# Write this test FIRST - before any implementation exists
def test_password_validator():
    validator = PasswordValidator(min_length=8, require_digit=True)
    assert validator.validate("abc") == False
    assert validator.validate("abcdefg") == False
    assert validator.validate("abcdefg1") == True

Answer:
NameError: name 'PasswordValidator' is not defined

That is exactly the point.

In TDD, a NameError is not a failure - it is the first red test. It tells you precisely what to build next. Write the minimum code to eliminate that error: define PasswordValidator. Then run again. With an empty class you first get TypeError: PasswordValidator() takes no arguments, because the constructor does not yet accept min_length and require_digit. Add an __init__ that takes them. Run again and you get AttributeError: 'PasswordValidator' object has no attribute 'validate'. Write validate. Run again. Now the assertion fails because the logic is missing. Write the logic. Run: green. Now refactor. That cycle - Red → Green → Refactor - is the complete TDD heartbeat.

The insight: starting from a failing test forces you to think about the API you want before you think about the implementation you will build. You define the interface as a consumer first, not as a builder. This surfaces usability problems immediately - before you have sunk hours into an implementation that turns out to be awkward to call.

Now consider: many production bugs are introduced when code that was written without tests is later changed. The code worked at the time, a requirement changed, and the refactored version broke a subtle edge case. TDD makes every requirement explicit before it becomes code - and every change visible before it reaches a user.

What You Will Learn

  • Red → Green → Refactor cycle: the TDD heartbeat
  • What "minimal code" means and why it matters (emergent design)
  • The three laws of TDD (Uncle Bob): the rules that keep you honest
  • Worked full example: BankAccount built test-first, 8 tests, each adding one behaviour
  • Test naming conventions: test_<method>_<scenario>_<expected> and Given/When/Then
  • The test pyramid: unit, integration, and E2E - proportions and tradeoffs
  • When TDD works well and when it creates friction
  • Outside-in (London school) vs inside-out (Detroit school) TDD
  • Regression tests: turning every bug report into a failing test first

Prerequisites

  • Familiarity with pytest (writing and running tests)
  • Basic Python classes and exceptions
  • Lesson 02 (pytest) helps but is not required

Part 1 - The Red-Green-Refactor Cycle

The Heartbeat of TDD

Three phases, three rules:

  • Red: Write one test for the next behaviour. Run it. It must fail - if it passes without any code change, the test is wrong or the behaviour already exists.
  • Green: Write the simplest code that makes the test pass. Not the best code. Not the general case. The minimum.
  • Refactor: Clean up the implementation. Remove duplication. Improve names. The test suite is your safety net - you can refactor aggressively because the tests will tell you immediately if you break something.
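
To make the Refactor phase concrete, here is a minimal sketch based on the PasswordValidator from the opening example. The helper names `_long_enough` and `_has_required_digit`, the `RefactoredPasswordValidator` class, and the `check_validator` function are hypothetical, introduced only for illustration. The point is that the same assertions pass before and after the clean-up.

```python
class PasswordValidator:
    """Green-phase version: correct, but every rule lives in one method."""

    def __init__(self, min_length, require_digit):
        self.min_length = min_length
        self.require_digit = require_digit

    def validate(self, password):
        if len(password) < self.min_length:
            return False
        if self.require_digit and not any(c.isdigit() for c in password):
            return False
        return True


class RefactoredPasswordValidator(PasswordValidator):
    """Refactor-phase version: same behaviour, each rule given a name."""

    def validate(self, password):
        return self._long_enough(password) and self._has_required_digit(password)

    def _long_enough(self, password):
        return len(password) >= self.min_length

    def _has_required_digit(self, password):
        return not self.require_digit or any(c.isdigit() for c in password)


def check_validator(validator_cls):
    """The assertions from the opening test, run against either version."""
    validator = validator_cls(min_length=8, require_digit=True)
    assert validator.validate("abc") is False
    assert validator.validate("abcdefg") is False
    assert validator.validate("abcdefg1") is True


# Both versions satisfy the same test, which is what makes the refactor safe.
check_validator(PasswordValidator)
check_validator(RefactoredPasswordValidator)
```

In a real project you would edit the class in place rather than subclass it; the two versions sit side by side here only so both can be checked in one run.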

What "Minimal Code" Means

"Minimal code" is not a shortcut - it is a discipline. Writing more than the test requires:

  • Implements behaviour that has no test yet (and therefore no specification)
  • Creates complexity that may need to be undone when requirements change
  • Delays the feedback that comes from writing the next test first

# Test says: validate("abc") == False (too short)
# Minimal code: hardcode False - DO THIS
class PasswordValidator:
    def __init__(self, min_length, require_digit):
        self.min_length = min_length
        self.require_digit = require_digit

    def validate(self, password):
        return False  # minimal - makes the current test pass

# The test for "abcdefg1" == True will force you to implement real logic
# Only then do you generalise

This looks absurd - but it is the discipline. When you write return False, the next test forces you to implement len(password) >= self.min_length. When you have that, the digit test forces any(c.isdigit() for c in password). The design emerges from the tests, not from speculation about what you might need.

Part 2 - The Three Laws of TDD

Robert C. Martin ("Uncle Bob") formulated three laws that define the TDD discipline:

Law 1 - You may not write production code unless it is to make a failing unit test pass.

No code without a red test first. This is the hardest rule to follow under time pressure.

Law 2 - You may not write more of a unit test than is sufficient to fail.

Stop writing the test as soon as it fails. One failing assertion is enough. Write the production code to fix it, then continue the test.

Law 3 - You may not write more production code than is sufficient to pass the currently failing test.

Hardcode the return value if it makes the test pass. The next test will force generalization.

# Following the three laws for PasswordValidator:

# Test 1 - Law 1: write the test first
def test_validate_short_password_returns_false():
    v = PasswordValidator(min_length=8, require_digit=True)
    assert v.validate("abc") == False

# Law 3: write minimum to pass
class PasswordValidator:
    def __init__(self, min_length, require_digit):
        self.min_length = min_length
        self.require_digit = require_digit

    def validate(self, password):
        return False  # hardcoded - sufficient to pass test 1

# Test 2 - Law 1: next failing behaviour
def test_validate_valid_password_returns_true():
    v = PasswordValidator(min_length=8, require_digit=True)
    assert v.validate("abcdefg1") == True

# Law 3: hardcoded False now fails - generalise just enough (__init__ unchanged)
class PasswordValidator:
    def __init__(self, min_length, require_digit):
        self.min_length = min_length
        self.require_digit = require_digit

    def validate(self, password):
        return len(password) >= self.min_length

# Test 3 - Law 1: right length but no digit
def test_validate_no_digit_returns_false():
    v = PasswordValidator(min_length=8, require_digit=True)
    assert v.validate("abcdefgh") == False

# Red again: the length check alone returns True here - add the digit rule
class PasswordValidator:
    def __init__(self, min_length, require_digit):
        self.min_length = min_length
        self.require_digit = require_digit

    def validate(self, password):
        if len(password) < self.min_length:
            return False
        if self.require_digit and not any(c.isdigit() for c in password):
            return False
        return True

# All three tests are now green, and every rule was demanded by a failing test

:::note TDD Is Not About the Tests
TDD is a design technique. The tests are a by-product. The real output is code that was written to be testable - which means it has small functions, clear boundaries, and minimal coupling. The test suite you accumulate is a regression safety net. But the primary value was in using tests to drive the design decisions.
:::

Part 3 - Worked Example: BankAccount Built Test-First

We will build a BankAccount class from scratch using TDD. Each test adds exactly one new behaviour. Watch how the design emerges.

Setup

# test_bank_account.py
import pytest
# bank_account.py does not exist yet

Test 1 - Create an Account with a Starting Balance

def test_new_account_has_correct_balance():
    account = BankAccount(initial_balance=100)
    assert account.balance == 100

Run → RED: NameError: name 'BankAccount' is not defined

Minimum code:

# bank_account.py
# (also add "from bank_account import BankAccount" at the top of test_bank_account.py)
class BankAccount:
    def __init__(self, initial_balance: float = 0):
        self.balance = initial_balance

Run → GREEN.

Test 2 - Deposit Increases Balance

def test_deposit_increases_balance():
    account = BankAccount(initial_balance=100)
    account.deposit(50)
    assert account.balance == 150

Run → RED: AttributeError: 'BankAccount' object has no attribute 'deposit'

Minimum code:

    def deposit(self, amount: float) -> None:
        self.balance += amount

Run → GREEN.

Test 3 - Withdraw Decreases Balance

def test_withdraw_decreases_balance():
    account = BankAccount(initial_balance=100)
    account.withdraw(30)
    assert account.balance == 70

Minimum code:

    def withdraw(self, amount: float) -> None:
        self.balance -= amount

Test 4 - Overdraft Raises an Exception

def test_withdraw_more_than_balance_raises_error():
    account = BankAccount(initial_balance=50)
    with pytest.raises(ValueError, match="Insufficient funds"):
        account.withdraw(100)

Minimum code (extend withdraw with a balance check):

    def withdraw(self, amount: float) -> None:
        if amount > self.balance:
            raise ValueError(f"Insufficient funds: balance {self.balance}, requested {amount}")
        self.balance -= amount

Run → GREEN. All 4 tests pass, including the 3 written earlier.

Test 5 - Deposit of Zero or Negative Is Invalid

def test_deposit_non_positive_amount_raises_error():
    account = BankAccount(initial_balance=100)
    with pytest.raises(ValueError):
        account.deposit(0)
    with pytest.raises(ValueError):
        account.deposit(-10)

Minimum code:

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError(f"Deposit amount must be positive, got {amount}")
        self.balance += amount

Test 6 - Transaction History

def test_transaction_history_records_deposits_and_withdrawals():
    account = BankAccount(initial_balance=0)
    account.deposit(100)
    account.withdraw(40)
    assert account.history == [
        {"type": "deposit", "amount": 100, "balance": 100},
        {"type": "withdraw", "amount": 40, "balance": 60},
    ]

RED: AttributeError: 'BankAccount' object has no attribute 'history'

Minimum code (add history list, update deposit and withdraw to record):

class BankAccount:
    def __init__(self, initial_balance: float = 0):
        self.balance = initial_balance
        self.history: list[dict] = []

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError(f"Deposit amount must be positive, got {amount}")
        self.balance += amount
        self.history.append({"type": "deposit", "amount": amount, "balance": self.balance})

    def withdraw(self, amount: float) -> None:
        if amount > self.balance:
            raise ValueError(f"Insufficient funds: balance {self.balance}, requested {amount}")
        self.balance -= amount
        self.history.append({"type": "withdraw", "amount": amount, "balance": self.balance})

Test 7 - Transfer Between Accounts

def test_transfer_moves_funds_between_accounts():
    source = BankAccount(initial_balance=200)
    destination = BankAccount(initial_balance=50)
    source.transfer(75, destination)
    assert source.balance == 125
    assert destination.balance == 125

Minimum code:

    def transfer(self, amount: float, destination: "BankAccount") -> None:
        self.withdraw(amount)        # raises ValueError if insufficient funds
        destination.deposit(amount)  # raises ValueError if amount <= 0

Notice: transfer reuses withdraw and deposit - we get the validation and history recording for free because the design emerged from tests. This is the emergent design benefit of TDD.
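
One way to check the "for free" claim is a pair of tests that inspect both accounts after a transfer. The sketch below repeats the BankAccount as it stands after Test 7 so that it runs on its own. The two test names are hypothetical, and plain try/except is used instead of pytest.raises so that the example needs only the standard library.

```python
class BankAccount:
    """BankAccount as it stands after Test 7 of the walkthrough."""

    def __init__(self, initial_balance: float = 0):
        self.balance = initial_balance
        self.history: list[dict] = []

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError(f"Deposit amount must be positive, got {amount}")
        self.balance += amount
        self.history.append({"type": "deposit", "amount": amount, "balance": self.balance})

    def withdraw(self, amount: float) -> None:
        if amount > self.balance:
            raise ValueError(f"Insufficient funds: balance {self.balance}, requested {amount}")
        self.balance -= amount
        self.history.append({"type": "withdraw", "amount": amount, "balance": self.balance})

    def transfer(self, amount: float, destination: "BankAccount") -> None:
        self.withdraw(amount)
        destination.deposit(amount)


def test_transfer_records_history_on_both_accounts():
    source = BankAccount(initial_balance=200)
    destination = BankAccount(initial_balance=50)
    source.transfer(75, destination)
    # No history code was written for transfer; it comes from the reused methods.
    assert source.history == [{"type": "withdraw", "amount": 75, "balance": 125}]
    assert destination.history == [{"type": "deposit", "amount": 75, "balance": 125}]


def test_failed_transfer_leaves_destination_untouched():
    source = BankAccount(initial_balance=10)
    destination = BankAccount(initial_balance=0)
    try:
        source.transfer(75, destination)
    except ValueError:
        pass  # expected: withdraw rejects the overdraft before deposit runs
    else:
        raise AssertionError("expected ValueError for insufficient funds")
    assert destination.balance == 0
    assert destination.history == []
```

Both tests pass against the Test 7 code without changes, so in a strict cycle they would document existing behaviour rather than drive new code.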

Test 8 - Transfer to Self Raises an Error

def test_transfer_to_self_raises_error():
    account = BankAccount(initial_balance=100)
    with pytest.raises(ValueError, match="Cannot transfer to same account"):
        account.transfer(50, account)

Minimum code:

    def transfer(self, amount: float, destination: "BankAccount") -> None:
        if destination is self:
            raise ValueError("Cannot transfer to same account")
        self.withdraw(amount)
        destination.deposit(amount)

The Complete BankAccount - Emerged from Tests

# bank_account.py - final implementation after 8 TDD cycles
class BankAccount:
    def __init__(self, initial_balance: float = 0):
        self.balance = initial_balance
        self.history: list[dict] = []

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError(f"Deposit amount must be positive, got {amount}")
        self.balance += amount
        self.history.append({"type": "deposit", "amount": amount, "balance": self.balance})

    def withdraw(self, amount: float) -> None:
        if amount > self.balance:
            raise ValueError(
                f"Insufficient funds: balance {self.balance}, requested {amount}"
            )
        self.balance -= amount
        self.history.append({"type": "withdraw", "amount": amount, "balance": self.balance})

    def transfer(self, amount: float, destination: "BankAccount") -> None:
        if destination is self:
            raise ValueError("Cannot transfer to same account")
        self.withdraw(amount)
        destination.deposit(amount)

Eight tests, each adding one behaviour: the class emerged incrementally, with zero speculation.

Part 4 - Test Naming Conventions

Good test names make failures self-documenting:

Convention 1: test_<method>_<scenario>_<expected>

def test_withdraw_sufficient_funds_decreases_balance(): ...
def test_withdraw_insufficient_funds_raises_value_error(): ...
def test_deposit_positive_amount_increases_balance(): ...
def test_deposit_zero_raises_value_error(): ...
def test_transfer_to_self_raises_value_error(): ...

When test_withdraw_insufficient_funds_raises_value_error fails in CI, you know exactly what broke without reading the test body.

Convention 2: Given/When/Then (BDD-style)

class TestBankAccountWithdrawal:

    def test_given_sufficient_balance_when_withdraw_then_balance_decreases(self):
        # Given
        account = BankAccount(initial_balance=100)
        # When
        account.withdraw(40)
        # Then
        assert account.balance == 60

    def test_given_insufficient_balance_when_withdraw_then_raises(self):
        # Given
        account = BankAccount(initial_balance=30)
        # When / Then
        with pytest.raises(ValueError):
            account.withdraw(100)

Given/When/Then is verbose but makes the test structure explicit for complex scenarios.

:::tip If Writing the Test Is Hard, the Design Is Wrong
When you struggle to write a test for a function - it needs too many setup steps, too many mocks, or returns something hard to assert on - that is a signal the function is doing too much. TDD surfaces API usability problems before you have invested in the implementation. A function that is painful to test is painful to use from real code too.
:::
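
As an illustration of that signal, the sketch below contrasts a function that is awkward to test with a split version that is not. Both functions, their names, and the `name,balance` CSV layout are hypothetical examples, not part of the BankAccount walkthrough.

```python
import csv
import io


def print_overdrawn_from_file(path):
    """Hard to test: needs a real file on disk and captured stdout."""
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            if float(row["balance"]) < 0:
                print(row["name"])


def find_overdrawn(rows):
    """Easy to test: plain data in, plain data out."""
    return [row["name"] for row in rows if float(row["balance"]) < 0]


def test_find_overdrawn_returns_only_negative_balances():
    rows = [
        {"name": "alice", "balance": "120.00"},
        {"name": "bob", "balance": "-15.50"},
    ]
    assert find_overdrawn(rows) == ["bob"]


def test_find_overdrawn_accepts_csv_reader_rows():
    # The same function works on parsed CSV without touching the disk.
    reader = csv.DictReader(io.StringIO("name,balance\ncarol,-1\ndave,0\n"))
    assert find_overdrawn(reader) == ["carol"]
```

Writing the test first tends to push you towards the second shape, because a test that needs a temporary file and output capture is tedious to write before any code exists.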

Part 5 - The Test Pyramid

Unit tests (base of pyramid): Test a single function or class in isolation. Mock all external dependencies. Fast enough to run on every file save. These are the tests TDD produces most naturally.

Integration tests (middle): Test two or more components working together - your code calling a real database, your HTTP client talking to a test server. Slower but catch integration bugs that unit tests cannot.

End-to-End tests (apex): Test the whole system from the user's perspective. Extremely slow, often brittle, but verify that the product actually works. Keep these few and focused on critical user journeys.

The 10/20/70 heuristic: 10% E2E, 20% integration, 70% unit. This is not a law - it is a starting point. Adjust based on your system's risk profile.
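
To show the difference between the two lower layers, here is a minimal sketch that tests the same function twice: once against an in-memory stand-in (unit) and once against a real SQLite database (integration). The names `can_afford`, `InMemoryBalances`, and `SqliteBalances`, and the `accounts` table, are assumptions made up for this example.

```python
import sqlite3


class InMemoryBalances:
    """Test double for the unit test: no database involved."""

    def __init__(self, balances):
        self._balances = dict(balances)

    def get(self, name):
        return self._balances[name]


class SqliteBalances:
    """Real collaborator for the integration test."""

    def __init__(self, connection):
        self._connection = connection

    def get(self, name):
        row = self._connection.execute(
            "SELECT balance FROM accounts WHERE name = ?", (name,)
        ).fetchone()
        return row[0]


def can_afford(balances, name, price):
    """Unit under test: a decision built on top of a balance lookup."""
    return balances.get(name) >= price


def test_can_afford_unit():
    balances = InMemoryBalances({"alice": 100})
    assert can_afford(balances, "alice", 80) is True
    assert can_afford(balances, "alice", 120) is False


def test_can_afford_integration():
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
        connection.execute("INSERT INTO accounts VALUES (?, ?)", ("alice", 100))
        # Exercises the SQL, the schema, and the decision logic together.
        assert can_afford(SqliteBalances(connection), "alice", 80) is True
    finally:
        connection.close()
```

The unit test checks only the decision logic. The integration test also catches a mistyped column name or a wrong query, at the cost of setting up a schema first.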

:::warning 100% Test Coverage Does Not Mean Correct Code
Coverage measures which lines ran, not whether your assertions are meaningful. A test that calls every line but asserts only assert result is not None achieves 100% coverage and catches almost nothing. Tests must assert the right things. Coverage is a floor, not a ceiling.
:::

Part 6 - When TDD Works Well and When It Creates Friction

TDD Works Well When

  • Requirements are well-defined. You know what the function must do before writing it. A payment validator, a data parser, a sorting algorithm - the contract is clear.
  • You are building pure logic. Functions with deterministic inputs and outputs are easy to test first.
  • You are fixing a bug. Turn the bug report into a failing test first. The test demonstrates the bug. Fix the code. The test proves the fix. This is regression-test TDD at its most valuable.
  • You are refactoring. The existing test suite is your safety net.

TDD Creates Friction When

  • Requirements are unclear or changing rapidly. Writing tests for a design you do not yet understand produces tests that will be deleted or rewritten. Explore first, test after the design stabilizes.
  • You are building UI. Visual layout and interaction patterns are hard to specify precisely before you see them. Explore, then extract testable logic from the UI and test that.
  • You are spiking or prototyping. Throw-away code does not need a TDD discipline. Once the spike reveals the right design, rewrite it test-first.
  • The codebase has no test infrastructure. Adding TDD to legacy code with no existing tests requires building the infrastructure first. Start by writing tests for the most critical paths, not by trying to TDD from the beginning.

:::danger Never Delete Failing Tests to Make CI Green
A failing test is a specification of behaviour that is not met. Deleting it silences the specification - you no longer know the behaviour is broken, but the user still does. Fix the code, or fix the test if the requirement genuinely changed and the old test is now incorrect. Never delete a test to make the build pass.
:::

Part 7 - Outside-In vs Inside-Out TDD

Two schools of TDD have different views on where to start:

Detroit School (Inside-Out / Classic TDD)

Start with the smallest units. Build from the bottom up. Units are tested in isolation. Integration tests verify that the units compose correctly. This is the approach shown in the BankAccount example.

# Detroit: start with the core domain model
def test_account_balance():  # core domain
    ...
def test_account_deposit():  # core domain
    ...
# Later: test the service that uses BankAccount
def test_payment_service_charges_account():  # higher-level
    ...

London School (Outside-In / Mockist TDD)

Start from the user-facing behaviour. Write a failing acceptance test at the highest level. Then work inward, mocking collaborators that do not exist yet. The mocks define the interface contracts between layers before implementation.

# London: start from the acceptance test
# (mocker is the pytest-mock fixture; shop is the package being designed)
import shop

def test_user_can_complete_purchase(mocker):  # top-level behaviour
    mock_payment = mocker.patch("shop.payment_service.charge")
    mock_inventory = mocker.patch("shop.inventory.reserve")
    mock_payment.return_value = {"status": "ok"}
    mock_inventory.return_value = True

    result = shop.checkout(cart={"item": "book", "qty": 1}, card="4111...")
    assert result["status"] == "confirmed"

# Now implement payment_service and inventory,
# writing their own unit tests as you go

Which to use: Detroit school produces tests that are less coupled to implementation structure (they test behaviour, not collaboration). London school produces tests that specify interfaces upfront - valuable when building large systems where multiple engineers work on different layers simultaneously.
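
For a version of the outside-in idea that runs without any extra packages, the sketch below injects the collaborators through the constructor and uses `unittest.mock.Mock` from the standard library. The `Checkout` class and the `charge(card, cart)` and `reserve(item, qty)` signatures are assumptions invented for this example; in London-style work those signatures are exactly what the test is meant to pin down.

```python
from unittest.mock import Mock


class Checkout:
    """Hypothetical top-level object; collaborators are injected, not imported."""

    def __init__(self, payment_service, inventory):
        self._payment_service = payment_service
        self._inventory = inventory

    def checkout(self, cart, card):
        if not self._inventory.reserve(cart["item"], cart["qty"]):
            return {"status": "out_of_stock"}
        receipt = self._payment_service.charge(card, cart)
        if receipt["status"] != "ok":
            return {"status": "payment_failed"}
        return {"status": "confirmed"}


def test_user_can_complete_purchase():
    payment_service = Mock()
    payment_service.charge.return_value = {"status": "ok"}
    inventory = Mock()
    inventory.reserve.return_value = True

    cart = {"item": "book", "qty": 1}
    result = Checkout(payment_service, inventory).checkout(cart, card="test-card")

    assert result["status"] == "confirmed"
    # The mocks double as a record of the contract between layers.
    inventory.reserve.assert_called_once_with("book", 1)
    payment_service.charge.assert_called_once_with("test-card", cart)


def test_out_of_stock_does_not_charge_card():
    payment_service = Mock()
    inventory = Mock()
    inventory.reserve.return_value = False

    result = Checkout(payment_service, inventory).checkout(
        {"item": "book", "qty": 1}, card="test-card"
    )

    assert result["status"] == "out_of_stock"
    payment_service.charge.assert_not_called()
```

The call assertions are also where the coupling comes from: if `Checkout` later reserves stock in a different way, these tests break even when the user-visible result is unchanged.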

:::tip Neither School Is Universally Correct
Use Detroit-style TDD for well-understood domain logic where the algorithm is the main challenge. Use London-style when you are designing a system's layer boundaries and want the tests to define the contracts between layers before any implementation exists.
:::

Part 8 - Regression Tests: Every Bug Is a Test Opportunity

The most immediately valuable TDD application in production work: turning every bug report into a test first.

# Bug report: "A transfer with a negative amount is rejected, but the sender's balance still goes up"

# Step 1: Write the failing regression test BEFORE touching code
def test_transfer_negative_amount_leaves_source_balance_unchanged():
    account = BankAccount(initial_balance=100.0)
    destination = BankAccount(initial_balance=0.0)
    with pytest.raises(ValueError):
        account.transfer(-10.0, destination)
    assert account.balance == 100.0  # the buggy code leaves 110.0 here
    assert destination.balance == 0.0

# Step 2: Run → RED (confirms the bug is reproducible and the test captures it)
# withdraw(-10.0) passes the balance check and adds 10.0 before deposit raises

# Step 3: Fix the bug
    def withdraw(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError(f"Withdrawal amount must be positive, got {amount}")
        if amount > self.balance:
            raise ValueError(...)
        self.balance -= amount
        self.history.append(...)

# Step 4: Run → GREEN (test passes, bug is fixed)
# Step 5: This test now runs in CI forever - the bug cannot silently return

Regression tests serve two purposes: they prove the bug is fixed, and they prevent the bug from ever returning undetected.

Graded Practice Challenges

Level 1 - Predict and Identify

Question 1: In strict TDD, what must be true about a test before you write any production code for it?

Answer:

The test must fail (be in the RED state) before you write production code. If you write a test and it passes without any new production code, either the behaviour already exists or the test is wrong (it does not actually verify what you think it does). Either way, a test you have never seen fail has not yet shown that it can catch anything.

Question 2: Which of these test names gives the most information when it fails in CI?

A: test_account_1
B: test_withdraw
C: test_withdraw_insufficient_balance_raises_value_error
D: test_bank_account_behaviour

Answer:

C: test_withdraw_insufficient_balance_raises_value_error

Name C tells you: the method being tested (withdraw), the scenario (insufficient_balance), and the expected outcome (raises_value_error). When this fails in CI, an engineer reading the test summary knows immediately what broke without opening the test file.

Names A, B, and D are all too vague - test_account_1 says nothing, test_withdraw does not specify which scenario, test_bank_account_behaviour covers too much.

Question 3: A developer writes all their tests after the implementation is complete. What benefit of TDD do they lose?

Answer:

API usability feedback. When you write tests first, you experience the API as a consumer before building it as a producer. This surfaces awkward interfaces, unclear naming, and missing return values before they are baked into the implementation.

Additionally: tests written after implementation tend to test the implementation as it was written, not as it should behave. The test writer knows how the function works and unconsciously tests the happy paths the implementation already handles. Test-first writing forces you to think about all the scenarios that matter to the caller.

Question 4: In the test pyramid, why should unit tests be the largest layer?

Answer:

Unit tests are fast (microseconds to milliseconds), isolated (no external dependencies to set up or tear down), precise (a unit test failure points directly to the broken function), and cheap to maintain (they test a single responsibility). Having many unit tests means most feedback comes in milliseconds on every file save.

E2E tests are slow (seconds to minutes), brittle (any UI change can break them), and expensive to maintain. Having few E2E tests means they run less often and cover only the most critical paths. A pyramid with a large E2E base and few unit tests is an "ice cream cone" - slow, fragile, and expensive.

Level 2 - Debug and Fix

This TDD sequence has several violations of TDD principles. Identify each one and explain the correct approach:

# Violation 1: writing too much code in one step
# After a single test: test_account_has_balance(),
# the developer writes the entire class:
class BankAccount:
    def __init__(self, balance=0): self.balance = balance
    def deposit(self, amount):
        if amount <= 0: raise ValueError()
        self.balance += amount
    def withdraw(self, amount):
        if amount > self.balance: raise ValueError()
        self.balance -= amount
    def transfer(self, amount, dest):
        self.withdraw(amount); dest.deposit(amount)
    def history(self): return self._history

# Violation 2: writing a test that already passes
def test_new_account_balance_is_zero():
    account = BankAccount()  # BankAccount already exists and has balance=0 default
    assert account.balance == 0
# Developer runs this - it passes immediately without any new code

# Violation 3: deleting a failing test
# CI is red because test_transfer_to_self_raises_error fails.
# Developer removes the test to unblock the release.

# Violation 4: writing tests after the fact
# Developer builds the entire PaymentService, then writes tests
# to match what the code already does, achieving 100% coverage.

Solution:

Violation 1 - Too much code in one step:

This violates Laws 1 and 3. Everything beyond __init__ is production code that no failing test asked for (Law 1), and it is far more than is needed to pass the one test that exists (Law 3). After test_account_has_balance(), the only code allowed is enough to make that test pass: class BankAccount: def __init__(self, balance=0): self.balance = balance. All other methods should be added one test at a time. Writing the full class in one step skips the feedback loop that would have caught API design issues early.

Violation 2 - Test that already passes:

A test that passes before any implementation is written is not a red test - it is either testing already-existing behaviour (which has no value in this cycle) or it does not verify anything meaningful. In TDD you should watch the test fail first. If a test passes immediately, it is not driving any new development. Investigate: is the behaviour already covered? Is the assertion too weak?

Violation 3 - Deleting the failing test:

Never delete a failing test to make CI green. test_transfer_to_self_raises_error describes a real requirement: transferring to yourself should raise an error. Deleting the test means this bug ships to production silently. The correct action: implement the fix (if destination is self: raise ValueError), then confirm the test passes.

Violation 4 - Tests written after implementation:

Tests written after implementation test the implementation as it was built, not as it should behave. The developer unconsciously avoids testing scenarios the implementation does not handle. The API design is already locked - the tests cannot surface usability problems. The resulting tests tend to be implementation-coupled rather than behaviour-focused. TDD requires tests first so the failing test drives the implementation, not the other way around.

Level 3 - Design Challenge

Using strict TDD (write each test first, then the minimum code), build a RateLimiter class that:

  1. Accepts max_calls and window_seconds in its constructor
  2. Has an allow(key: str) -> bool method that returns True if the key has made fewer than max_calls in the last window_seconds, False otherwise
  3. Tracks separate call counts per key
  4. Resets the count for a key after window_seconds have elapsed

Write the 5 tests you would write first (in TDD order), then provide the final implementation.

Reference solution:

The 5 tests, in TDD order:

import pytest
from unittest.mock import patch

from ratelimiter import RateLimiter

# Test 1 - First call for a new key is always allowed
def test_allow_first_call_returns_true():
    limiter = RateLimiter(max_calls=3, window_seconds=60)
    assert limiter.allow("user_123") is True

# Test 2 - Calls within the limit are allowed
def test_allow_within_limit_returns_true():
    limiter = RateLimiter(max_calls=3, window_seconds=60)
    assert limiter.allow("user_123") is True
    assert limiter.allow("user_123") is True
    assert limiter.allow("user_123") is True

# Test 3 - Call exceeding the limit is rejected
def test_allow_exceeding_limit_returns_false():
    limiter = RateLimiter(max_calls=3, window_seconds=60)
    limiter.allow("user_123")
    limiter.allow("user_123")
    limiter.allow("user_123")
    assert limiter.allow("user_123") is False  # 4th call rejected

# Test 4 - Different keys have separate limits
def test_allow_different_keys_are_independent():
    limiter = RateLimiter(max_calls=1, window_seconds=60)
    assert limiter.allow("user_a") is True
    assert limiter.allow("user_a") is False  # user_a exhausted
    assert limiter.allow("user_b") is True   # user_b unaffected

# Test 5 - Window expiry resets the count
def test_allow_after_window_expires_resets_count():
    limiter = RateLimiter(max_calls=1, window_seconds=60)
    with patch("ratelimiter.time.time") as mock_time:
        mock_time.return_value = 1000.0
        assert limiter.allow("user_123") is True
        assert limiter.allow("user_123") is False  # exhausted at t=1000

        mock_time.return_value = 1061.0  # window expired
        assert limiter.allow("user_123") is True  # reset

Final implementation (emerged from the tests):

# ratelimiter.py
import time


class RateLimiter:
    """
    Sliding-window rate limiter: allows up to max_calls per key per window_seconds.
    Thread-safety not guaranteed (add a Lock for production use in threaded code).
    """

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        # {key: [timestamp_of_call, ...]}
        self._calls: dict[str, list[float]] = {}

    def allow(self, key: str) -> bool:
        now = time.time()
        window_start = now - self.window_seconds

        # Get existing timestamps for this key, filtering out expired ones
        timestamps = self._calls.get(key, [])
        timestamps = [t for t in timestamps if t > window_start]

        if len(timestamps) >= self.max_calls:
            self._calls[key] = timestamps  # save pruned list
            return False

        timestamps.append(now)
        self._calls[key] = timestamps
        return True

Design decisions driven by each test:

  • Test 1 forced the class, its constructor, and a basic allow method
  • Test 2 pinned the boundary: every call up to max_calls is allowed
  • Test 3 forced call counting and the >= max_calls check
  • Test 4 forced the counts to be kept per key, which the dict provides naturally
  • Test 5 forced the sliding window - timestamps instead of a simple counter, filtered by window_start

The sliding window design emerged from the test requirements, not from upfront design.
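
Test 5 controls time by patching "ratelimiter.time.time", which ties the test to how the module imports time. One alternative, sketched below, is to let the limiter accept a clock callable. The `clock` parameter and the `FakeClock` helper are assumptions added for this sketch and are not part of the reference solution above.

```python
import time


class RateLimiter:
    """Sliding-window limiter with an injectable clock (hypothetical parameter)."""

    def __init__(self, max_calls, window_seconds, clock=time.time):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock
        self._calls = {}  # {key: [timestamp_of_call, ...]}

    def allow(self, key):
        now = self._clock()
        window_start = now - self.window_seconds
        # Keep only the calls that are still inside the window.
        recent = [t for t in self._calls.get(key, []) if t > window_start]
        allowed = len(recent) < self.max_calls
        if allowed:
            recent.append(now)
        self._calls[key] = recent
        return allowed


class FakeClock:
    """Test helper: a callable clock that the test moves forward by hand."""

    def __init__(self, start=1000.0):
        self.now = start

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def test_allow_after_window_expires_resets_count():
    clock = FakeClock()
    limiter = RateLimiter(max_calls=1, window_seconds=60, clock=clock)
    assert limiter.allow("user_123") is True
    assert limiter.allow("user_123") is False  # exhausted at t=1000
    clock.advance(61)                          # window expired
    assert limiter.allow("user_123") is True   # reset
```

Production callers can omit `clock` and get `time.time` by default, so the public behaviour matches the patched version while the test no longer depends on the module's import path.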

Key Takeaways

  • TDD is Red → Green → Refactor: write one failing test, write the minimum code to pass it, clean up, repeat
  • "Minimum code" is not a shortcut - it is a discipline that prevents speculative generalization and lets design emerge from real requirements
  • The three laws: no production code without a failing test; no more test than sufficient to fail; no more code than sufficient to pass
  • TDD is a design technique first. The tests are a byproduct. The primary value is using tests to drive API decisions before implementation locks them in
  • Test names should be self-documenting: test_<method>_<scenario>_<expected> makes CI failures readable without opening the test
  • The test pyramid: many fast unit tests at the base, fewer integration tests in the middle, very few E2E tests at the top
  • TDD works best for well-defined logic, algorithms, and bug fixes; it creates friction for exploratory work, UI, and unclear requirements
  • Detroit school (inside-out) builds from core domain up; London school (outside-in) starts from acceptance tests and mocks inward
  • Turn every bug report into a failing test first - this proves the bug, drives the fix, and permanently prevents regression
  • Never delete a failing test to make CI green; it is a specification of behaviour that the code does not meet - fix the code

What's Next

Lesson 05 covers code coverage - coverage.py, line vs branch coverage, the .coveragerc configuration, pytest-cov, and mutation testing with mutmut. You will see why 100% line coverage can still miss bugs, how coverage.py uses sys.settrace() under the hood (connecting back to the internals lesson), and why chasing coverage numbers without meaningful assertions is the coverage trap.

© 2026 EngineersOfAI. All rights reserved.