Skip to main content

Code Coverage - Measuring What You Test (and What You Miss)

Reading time: ~25 minutes | Level: Intermediate → Engineering

Before reading further, predict how much of this code is covered by the test:

def absolute_value(x):
if x >= 0:
return x
return -x

def test_absolute_value():
assert absolute_value(5) == 5
Show Answer

It depends which coverage metric you use.

Line coverage: 100%. Both return x and return -x are executable lines. The test calls absolute_value(5), which executes the if x >= 0 check, enters the if body, and hits return x. Both return lines are in the function... wait, actually return -x is never reached with input 5. Line coverage counts lines that executed. return -x is line 4 - it never ran.

Let's be precise: with input 5, lines 2 (if x >= 0), 3 (return x) execute. Line 4 (return -x) does not. Line coverage is 75% with this test.

But if coverage.py reports line coverage and the function has two branches - the if True path and the if False path - branch coverage is 50%: only the positive branch was exercised. The negative branch (x < 0) was never tested.

The puzzle is a demonstration that the metric you use changes what gaps you see. A codebase with "100% line coverage" can still have entire execution paths - half of every conditional - that were never run during testing.

Now consider: your CI pipeline shows "coverage: 94%." A new engineer asks "does that mean 94% of our code is tested?" The answer is: it means 94% of lines were executed during the test run. It says nothing about whether those executions verified correct behaviour, whether all branches were taken, whether every condition combination was tested, or whether your tests actually assert anything meaningful. Coverage is a floor, not a guarantee.

What You Will Learn

  • Line, branch, condition, and path coverage: differences and what each catches
  • How coverage.py works: sys.settrace() and why it connects to the Python internals lesson
  • coverage run, coverage report, coverage html, coverage xml
  • .coveragerc and pyproject.toml configuration: omit, exclude_lines, branch = true
  • pytest-cov: --cov, --cov-report, --cov-fail-under
  • Reading HTML reports: red lines, branch indicator arrows
  • # pragma: no cover: when and how to use it legitimately
  • Coverage in CI: fail-under thresholds, trend tracking, badges
  • Mutation testing with mutmut: why 90% coverage can still miss real bugs

Prerequisites

  • pytest basics (writing and running tests)
  • Module 03 Lesson 04 (Python internals - sys.settrace, frame objects) is helpful but not required
  • Familiarity with pyproject.toml or .ini configuration files

Part 1 - Coverage Metrics: Line, Branch, Condition, Path

Line Coverage

Line coverage counts executable lines that were run at least once:

def absolute_value(x):
if x >= 0: # line 2: executed ✅
return x # line 3: executed ✅ (x=5 path)
return -x # line 4: NOT executed ❌

def test_absolute_value():
assert absolute_value(5) == 5

# Line coverage: 3 of 4 lines = 75%
# (The 'def' line and 'if' line always run; only one return runs)

What line coverage misses: any branch not taken. Every if/else pair gives you line coverage credit for both sides if each line runs at least once, but does not tell you whether both branches were tested.

Branch Coverage

Branch coverage counts decision points where both outcomes (True and False) must be exercised:

def process_order(order):
if order["amount"] > 1000: # branch: True AND False must both run
apply_discount(order)
if order["status"] == "vip": # branch: True AND False must both run
send_vip_email(order)
return order

# Test with only {"amount": 500, "status": "regular"}:
# Branch 1 (amount > 1000): only False branch taken - 50% branch coverage
# Branch 2 (status == "vip"): only False branch taken - 50% branch coverage
# Line coverage: 100% (all lines run except discount and email lines)
# Branch coverage: 50%

Branch coverage requires at least two tests: one that takes each side of every conditional. It catches far more logic bugs than line coverage.

Condition Coverage and Path Coverage

Condition (MC/DC) coverage: Every sub-expression in a compound condition must independently affect the outcome:

def is_eligible(age, has_id, is_member):
if age >= 18 and has_id and is_member:
return True
return False

# Branch coverage: needs True case and False case for the overall if
# Condition coverage: needs each of age>=18, has_id, is_member to independently
# cause the condition to be True AND False
# This requires at minimum 4 tests - impractical for most code

Path coverage: Every unique execution path through a function is tested. For a function with N independent binary conditions, there are 2^N paths. Path coverage is rarely achievable in practice - it is primarily a theoretical metric and a tool for understanding combinatorial test explosion.

Practical recommendation: Aim for branch coverage in production. It catches the most bugs per test written without the exponential explosion of path coverage. Line coverage is a useful sanity check but is not sufficient.

:::tip Aim for Branch Coverage, Not Just Line Coverage Enable branch = true in your .coveragerc or pyproject.toml. Branch coverage adds minimal overhead to measurement and catches an entire class of bugs that line coverage misses: logic errors in conditionals where one branch was never tested. Most teams that discover a production bug "despite 90% coverage" are measuring lines, not branches. :::

Part 2 - How coverage.py Works: sys.settrace

The Tracing Hook

coverage.py does not parse your code or instrument it at the bytecode level (unlike some other tools). It uses sys.settrace() - Python's built-in tracing hook:

import sys

def my_trace(frame, event, arg):
"""Called by CPython for every line executed, function call, and return."""
if event == "line":
# frame.f_code.co_filename = which file
# frame.f_lineno = which line number
print(f"Executing: {frame.f_code.co_filename}:{frame.f_lineno}")
return my_trace # return self to continue tracing

sys.settrace(my_trace)

# Now every line executed triggers my_trace
x = 1 + 2
y = x * 3
# Output:
# Executing: /path/to/script.py:10
# Executing: /path/to/script.py:11

sys.settrace(None) # stop tracing

coverage.py installs a highly optimized version of this trace function when you run coverage run. For every line that executes, it records the (filename, line_number) pair. For branch coverage, it tracks transitions between lines to determine which branches were taken.

:::note coverage.py Uses sys.settrace - The Same Mechanism as Debuggers This is why running tests under coverage is slower than running them without coverage. Every executed line triggers the trace callback. sys.settrace is a global, single-slot hook - only one trace function can be active at a time. This means you cannot run coverage.py and pdb simultaneously in their standard forms. The internals lesson on sys.settrace (Module 03) explains the frame objects that the trace function receives. :::

Why Coverage Slows Tests

The overhead is proportional to how many lines execute. A tight loop that runs 10 million iterations:

# Without coverage: loop executes in ~100ms
result = sum(i * i for i in range(10_000_000))

# With coverage: trace callback fires ~10 million times - may take 5x longer
# coverage.py optimizes this extensively, but the fundamental overhead remains

For test suites measuring minutes rather than seconds, coverage overhead is usually acceptable (10-30% slowdown). For microbenchmarks or performance-sensitive test scenarios, run coverage separately, not on every test run.

Part 3 - coverage.py Commands

Basic Workflow

# Install
pip install coverage

# Step 1: Run your test suite under coverage
coverage run -m pytest tests/

# Step 2: View a summary report in the terminal
coverage report

# Example output:
# Name Stmts Miss Cover
# ---------------------------------------------
# payment.py 42 5 88%
# bank_account.py 67 0 100%
# utils/email.py 23 8 65%
# ---------------------------------------------
# TOTAL 132 13 90%

# Step 3: Generate an HTML report (open in browser)
coverage html
# Creates htmlcov/index.html - click on any file to see red/green line highlighting

# Step 4: Generate XML (for CI integration - SonarQube, Codecov, etc.)
coverage xml
# Creates coverage.xml

# Show missing lines inline
coverage report --show-missing
# payment.py 42 5 88% 23-25, 41, 67
# Line numbers of uncovered lines shown in last column

Branch Coverage

# Enable branch coverage measurement
coverage run --branch -m pytest tests/

coverage report --branch
# Name Stmts Miss Branch BrPart Cover
# -------------------------------------------------
# payment.py 42 2 18 3 88%
# BrPart = branches partially covered (one arm missing)

Part 4 - Configuration: .coveragerc and pyproject.toml

.coveragerc File

# .coveragerc
[run]
branch = true
source = mypackage
omit =
*/tests/*
*/migrations/*
setup.py
conftest.py

[report]
exclude_lines =
# Standard pragmas
pragma: no cover

# Abstract methods never called directly
raise NotImplementedError

# Type checking imports (never run at runtime)
if TYPE_CHECKING:

# __repr__ and __str__ are tested implicitly or not worth testing
def __repr__
def __str__

# Main guard
if __name__ == .__main__.:

show_missing = true
skip_covered = false

[html]
directory = htmlcov

[xml]
output = coverage.xml

pyproject.toml (Modern Approach)

# pyproject.toml
[tool.coverage.run]
branch = true
source = ["mypackage"]
omit = [
"*/tests/*",
"*/migrations/*",
"setup.py",
]

[tool.coverage.report]
exclude_lines = [
"pragma: no cover",
"raise NotImplementedError",
"if TYPE_CHECKING:",
"def __repr__",
"if __name__ == .__main__.:",
]
show_missing = true
fail_under = 85

[tool.coverage.html]
directory = "htmlcov"

omit vs exclude_lines

  • omit: excludes entire files from measurement. Use for test files themselves, migrations, generated code, vendored libraries.
  • exclude_lines: excludes specific patterns of lines within measured files. Use for unreachable defensive code, platform-specific blocks, TYPE_CHECKING guards.

Part 5 - pytest-cov: Coverage Integrated with pytest

# Install
pip install pytest-cov

# Run with coverage
pytest --cov=mypackage tests/

# Multiple source directories
pytest --cov=mypackage --cov=utils tests/

# With branch coverage
pytest --cov=mypackage --cov-branch tests/

# Generate HTML report alongside terminal output
pytest --cov=mypackage --cov-report=html tests/

# Multiple report formats at once
pytest --cov=mypackage \
--cov-report=term-missing \
--cov-report=html \
--cov-report=xml \
tests/

# Fail if coverage falls below threshold (useful in CI)
pytest --cov=mypackage --cov-fail-under=85 tests/
# Exit code 2 if coverage < 85% - CI pipeline fails

# Show which lines are missing
pytest --cov=mypackage --cov-report=term-missing tests/
# payment.py 42 5 88% 23-25, 41, 67

pytest-cov in pyproject.toml

[tool.pytest.ini_options]
addopts = "--cov=mypackage --cov-report=term-missing --cov-report=html"

This bakes coverage into every pytest run - every developer and CI pipeline gets coverage by default.

Part 6 - Reading HTML Coverage Reports

The HTML report (coverage html) generates an interactive browser view. Each file shows:

  • Green lines: executed during the test run
  • Red lines: never executed - uncovered
  • Yellow/orange lines: partially covered (branch coverage only) - line ran but not all branches taken
# Example function - what the HTML report shows

def validate_age(age): # green - always runs
if age < 0: # green - always runs (branch partially covered)
raise ValueError("Age cannot be negative") # RED - negative branch never tested
if age > 150: # green - always runs (branch partially covered)
raise ValueError("Age unrealistically high") # RED - >150 branch never tested
return True # green - runs when valid

Branch indicators in the HTML report show arrows: a filled arrow means a branch was taken; an empty arrow means it was not. Hovering over partially-covered lines shows which branch was missed: "Branch 1→4 not taken (condition was always False)."

Part 7 - # pragma: no cover

# pragma: no cover tells coverage.py to exclude a line or block from measurement:

# Legitimate uses of pragma: no cover

# 1. Platform-specific code that cannot run on the test platform
import sys
if sys.platform == "win32": # pragma: no cover
def get_registry_value(key):
import winreg
return winreg.QueryValue(winreg.HKEY_LOCAL_MACHINE, key)

# 2. Debug-only code that is never active in tests
if os.environ.get("DEBUG_VERBOSE"): # pragma: no cover
print(f"Processing item: {item}")

# 3. The __main__ guard
if __name__ == "__main__": # pragma: no cover
main()

# 4. Abstract method bodies that raise NotImplementedError
class BaseProcessor:
def process(self, data): # pragma: no cover
raise NotImplementedError("Subclasses must implement process()")

# 5. Type-checking-only imports
from typing import TYPE_CHECKING
if TYPE_CHECKING: # pragma: no cover
from myapp.models import User

:::danger Never Exclude Meaningful Code with # pragma: no cover # pragma: no cover is not a way to achieve a coverage number goal. It is for genuinely untestable code: platform-specific blocks that cannot run in CI, __main__ guards, debug logging. Never use it to exclude logic branches that are simply inconvenient to test - those branches are exactly what coverage exists to flag. If you find yourself adding pragma: no cover to business logic, the correct action is to write the test. :::

Part 8 - Coverage in CI

GitHub Actions Example

# .github/workflows/test.yml
name: Tests

on: [push, pull_request]

jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"

- name: Install dependencies
run: pip install -e ".[dev]"

- name: Run tests with coverage
run: |
pytest \
--cov=mypackage \
--cov-report=xml \
--cov-report=term-missing \
--cov-fail-under=85

- name: Upload coverage to Codecov
uses: codecov/codecov-action@v4
with:
file: coverage.xml
fail_ci_if_error: true

Setting a Meaningful Threshold

# Start conservatively - measure current coverage first
pytest --cov=mypackage --cov-report=term-missing

# Set threshold slightly below current value
# (raise it incrementally as coverage improves)
pytest --cov=mypackage --cov-fail-under=80

# In pyproject.toml for persistent enforcement
[tool.coverage.report]
fail_under = 85

The threshold strategy: measure what you have, set the threshold 2-5% below it (to avoid blocking CI immediately), then raise it as you add tests. Setting the threshold at 100% immediately on a legacy codebase is a mistake - it blocks all development until everything is covered.

Coverage Badges

<!-- In your README.md - after setting up Codecov or Coveralls -->
[![Coverage](https://codecov.io/gh/yourorg/yourrepo/branch/main/graph/badge.svg)](https://codecov.io/gh/yourorg/yourrepo)

Coverage badges in the README communicate the project's test quality at a glance to contributors and users.

:::warning 100% Coverage Does Not Mean Your Code Is Correct Coverage measures execution. Execution does not imply verification. A test that calls every line and asserts only assert result is not None achieves 100% coverage while catching almost no bugs. The quality of what tests assert matters as much as - often more than - which lines they execute. :::

Part 9 - Mutation Testing with mutmut

The Coverage Limitation

def is_adult(age):
return age >= 18

def test_is_adult():
assert is_adult(20) is True
assert is_adult(15) is False
# Line coverage: 100%
# Branch coverage: 100%

Both metrics show 100%. But what if a developer accidentally changes >= to >?

def is_adult(age):
return age > 18 # bug: 18-year-olds no longer qualify

The test suite still passes - is_adult(20) is still True, is_adult(15) is still False. The off-by-one bug at the boundary is not caught because no test exercises is_adult(18).

This is what mutation testing detects.

How mutmut Works

mutmut systematically modifies your source code - creating mutants - and runs your test suite against each mutant. If the tests catch the mutation (a test fails), the mutant is killed. If the tests pass with the mutated code, the mutant survives - indicating a gap in your test suite.

# Install
pip install mutmut

# Run mutation testing on your package
mutmut run --paths-to-mutate mypackage/

# View results
mutmut results
# Survived mutants are the ones your tests missed:
# mypackage/payment.py:L23 [#42]
# mypackage/validator.py:L15 [#17]

# See what a specific mutant looks like
mutmut show 42
# --- payment.py
# +++ payment.py
# @@ -23 +23 @@
# - if amount > 0:
# + if amount >= 0:

Common Mutations mutmut Generates

# Original → Mutation → What it tests

age >= 18 → age > 18 # tests boundary conditions
x == y → x != y # tests equality logic
count += 1 → count -= 1 # tests increment direction
return Truereturn False # tests return value
results.append(x) → results # tests that append is called
if condition:if not condition: # tests conditional logic
a and b → a or b # tests boolean operators

Interpreting Results

mutmut results

# Output:
# Survived: 15
# Killed: 203
# Total: 218
# Mutation score: 93.1%

# mutation score = killed / total * 100
# Higher is better - surviving mutants are test gaps

A mutation score of 93% means 7% of plausible bugs would not be caught by your tests. The surviving mutants tell you exactly which conditions, boundaries, and logic branches need better assertions.

:::note Mutation Testing Is Slow - Use It Selectively mutmut runs your entire test suite once per mutant. A codebase with 200 functions and 1000 mutants takes 1000 test runs. Use mutation testing on your most critical modules - payment logic, authentication, data validation - not the entire codebase at once. Run it in CI on a schedule (nightly or weekly) rather than on every commit. :::

The Coverage Trap: High Coverage, Low Quality Tests

# This achieves 100% line and branch coverage - and catches almost nothing

def divide(a, b):
if b == 0:
raise ZeroDivisionError("Cannot divide by zero")
return a / b

# "Coverage trap" tests - they run every line but assert nothing meaningful
def test_divide():
try:
result = divide(10, 2)
except Exception:
pass # silently swallow all errors

def test_divide_zero():
try:
divide(10, 0)
except ZeroDivisionError:
pass # catches the right exception but asserts nothing about it

# Coverage: 100%. Bugs caught: 0.
# divide(10, 2) could return 42 and both tests would still pass.

# Correct tests that actually verify behaviour:
def test_divide_positive_numbers():
assert divide(10, 2) == 5.0

def test_divide_raises_on_zero_denominator():
with pytest.raises(ZeroDivisionError, match="Cannot divide by zero"):
divide(10, 0)

Coverage tells you which lines ran. It cannot tell you whether the assertions are meaningful. This is why coverage is necessary but not sufficient.

Part 10 - Putting It Together: Production Coverage Setup

# pyproject.toml - complete production coverage configuration

[tool.coverage.run]
branch = true
source = ["mypackage"]
omit = [
"*/tests/*",
"*/migrations/*",
"mypackage/vendor/*",
"setup.py",
]
parallel = false
data_file = ".coverage"

[tool.coverage.report]
exclude_lines = [
"pragma: no cover",
"raise NotImplementedError",
"if TYPE_CHECKING:",
"def __repr__",
"def __str__",
"if __name__ == .__main__.:",
"@(abc\\.)?abstractmethod",
]
show_missing = true
skip_covered = false
fail_under = 85
precision = 1

[tool.coverage.html]
directory = "htmlcov"
title = "MyPackage Coverage Report"

[tool.coverage.xml]
output = "coverage.xml"

[tool.pytest.ini_options]
addopts = [
"--cov=mypackage",
"--cov-report=term-missing",
"--cov-report=html",
"--cov-report=xml",
"--cov-fail-under=85",
]
# Developer workflow
pytest # runs tests + coverage automatically via addopts
open htmlcov/index.html # review uncovered lines in browser

# CI workflow
pytest --cov-fail-under=85 # fail build if coverage drops below 85%

Graded Practice Challenges

Level 1 - Predict and Identify

Question 1: This function achieves 100% line coverage with one test. What coverage metric would catch the missing test case, and what is the missing case?

def classify_temperature(temp):
if temp < 0:
return "freezing"
elif temp < 20:
return "cold"
elif temp < 30:
return "comfortable"
else:
return "hot"

def test_classify():
assert classify_temperature(-5) == "freezing"
assert classify_temperature(10) == "cold"
assert classify_temperature(25) == "comfortable"
assert classify_temperature(35) == "hot"
Show Answer

Actually this test achieves both 100% line coverage AND 100% branch coverage - all four branches are exercised by the four test cases. This is a well-tested function.

The interesting question is: what is not tested? The boundary values: 0, 20, 30 exactly. These are the off-by-one errors that mutation testing with mutmut would catch - it would mutate < 0 to <= 0 or > 0 and verify whether your tests catch the change.

Add boundary tests to prevent off-by-one mutations from surviving:

def test_classify_boundaries():
assert classify_temperature(0) == "cold" # not "freezing"
assert classify_temperature(20) == "comfortable" # not "cold"
assert classify_temperature(30) == "hot" # not "comfortable"
assert classify_temperature(-1) == "freezing"
assert classify_temperature(19) == "cold"

Question 2: What does BrPart mean in this coverage report --branch output?

Name Stmts Miss Branch BrPart Cover
-------------------------------------------------
validator.py 30 1 12 3 89%
Show Answer

BrPart means branches partially covered - branches where one outcome (True or False) was exercised but the other was not. In this case, 3 branches in validator.py were partially covered: the conditional ran, but only one of its two outcomes was tested.

For example:

if config.get("strict_mode"): # BrPart: only tested when False (strict_mode absent)
apply_strict_validation(data)

If no test ever provides a config with strict_mode=True, this branch is partially covered. The if line ran (so it counts for line coverage), but the True path was never taken (branch coverage gap).

Question 3: What does this # pragma: no cover usage tell you, and is it legitimate?

def process_payment(amount, currency):
if amount > 1_000_000: # pragma: no cover
raise ValueError("Amount exceeds maximum transaction limit")
return _charge(amount, currency)
Show Answer

This is NOT a legitimate use of # pragma: no cover. The > 1_000_000 validation is business-critical logic - a missing test for it means an attacker could potentially submit a $10M transaction without triggering the limit (if there were a bug in the condition). This should have a test:

def test_process_payment_raises_on_excessive_amount():
with pytest.raises(ValueError, match="exceeds maximum"):
process_payment(1_500_000, "usd")

The # pragma: no cover here is being used to artificially inflate the coverage number by excluding an inconvenient test case. Legitimate uses are: platform-specific code that cannot run in CI, __main__ guards, TYPE_CHECKING blocks, and raise NotImplementedError in abstract base classes.

Question 4: A project has 95% line coverage but mutation testing shows a 60% mutation score. What does this indicate?

Show Answer

It indicates that most of the code runs during tests, but the tests do not verify the correct behaviour. 40% of plausible bugs - changed operators, inverted conditions, modified boundary values - would not cause any test to fail.

This is the coverage trap: tests that exist to execute lines rather than to assert outcomes. Common causes:

  • assert result is not None instead of assert result == expected_value
  • Tests that call functions but do not check return values
  • try/except blocks that swallow assertion errors
  • Tests written after implementation that mirror the implementation structure rather than testing behaviour

The fix: improve assertion quality, not just add more tests. Each test should assert a specific expected value, error message, or state change.

Level 2 - Debug and Fix

Find all coverage configuration issues in this setup:

# Issue 1: This test provides 100% line coverage but the function has a bug
def clamp(value, min_val, max_val):
if value < min_val:
return min_val
if value > max_val:
return max_val
return value

def test_clamp():
assert clamp(5, 0, 10) == 5 # middle case
assert clamp(-5, 0, 10) == 0 # below min
# Missing: above max case and boundary values

# Issue 2: coveragerc excludes critical code
# .coveragerc
[report]
exclude_lines =
if amount > 0:
if user.is_authenticated:
raise ValidationError

# Issue 3: CI threshold set too low to be useful
pytest --cov=mypackage --cov-fail-under=10

# Issue 4: pragma misused on business logic
def calculate_tax(income, rate):
if income > 0: # pragma: no cover
return income * rate
return 0
Show Solution

Issue 1 - Incomplete test for clamp:

The test covers the value < min_val branch and the min_val <= value <= max_val branch but never tests value > max_val. Add:

def test_clamp():
assert clamp(5, 0, 10) == 5 # within range
assert clamp(-5, 0, 10) == 0 # below min
assert clamp(15, 0, 10) == 10 # above max - MISSING
assert clamp(0, 0, 10) == 0 # boundary: exactly min
assert clamp(10, 0, 10) == 10 # boundary: exactly max

Issue 2 - Excluding critical logic:

The .coveragerc excludes if amount > 0:, if user.is_authenticated:, and raise ValidationError - these are exactly the conditions that should be tested. Excluding condition checks from coverage means payment logic and authentication guards have no coverage enforcement. Remove these exclusions and write the tests.

Issue 3 - Threshold of 10% is meaningless:

A threshold of 10% means CI only fails if coverage drops below 10%. Any codebase with at least one test will pass this. The threshold should be set based on the project's actual coverage (measure first), then set 2-5% below that and raise it incrementally:

# Measure first
pytest --cov=mypackage --cov-report=term-missing
# → Current coverage: 72%

# Set threshold just below current
pytest --cov=mypackage --cov-fail-under=70
# → Raises it to 75% next sprint, 80% the sprint after

Issue 4 - pragma: no cover on business logic:

if income > 0: is the core condition that determines whether tax is calculated at all. Excluding it means a bug like if income > 1_000_000: (wrong threshold) would never be caught by coverage enforcement. Remove the pragma and write the test:

def test_calculate_tax_on_positive_income():
assert calculate_tax(50_000, 0.20) == 10_000.0

def test_calculate_tax_on_zero_income():
assert calculate_tax(0, 0.20) == 0

def test_calculate_tax_on_negative_income():
assert calculate_tax(-100, 0.20) == 0

Level 3 - Design Challenge

Design a CoverageGatekeeper CI utility that:

  1. Reads a coverage.xml file and parses line and branch coverage percentages
  2. Compares against configurable thresholds per-module (e.g., payment.py requires 95%, utils.py requires 80%)
  3. Has a global fallback threshold for modules without specific requirements
  4. Outputs a pass/fail summary with per-module details
  5. Returns a non-zero exit code if any module fails its threshold
  6. Reads configuration from a dict (simulating a pyproject.toml parse)
Show Reference Solution
# coverage_gate.py
import sys
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class ModuleResult:
name: str
line_rate: float
branch_rate: float
required: float
passed: bool

@property
def effective_rate(self) -> float:
"""Use branch_rate if available, else line_rate."""
return self.branch_rate if self.branch_rate > 0 else self.line_rate


@dataclass
class GatekeeperConfig:
global_threshold: float = 0.80
module_thresholds: dict[str, float] = field(default_factory=dict)


def parse_coverage_xml(xml_path: str) -> dict[str, tuple[float, float]]:
"""
Parse coverage.xml and return {module_name: (line_rate, branch_rate)}.
line_rate and branch_rate are floats between 0.0 and 1.0.
"""
tree = ET.parse(xml_path)
root = tree.getroot()
results = {}

for package in root.iter("package"):
for cls in package.iter("class"):
filename = cls.get("filename", "")
line_rate = float(cls.get("line-rate", 0))
branch_rate = float(cls.get("branch-rate", 0))
results[filename] = (line_rate, branch_rate)

return results


def evaluate_coverage(
coverage_data: dict[str, tuple[float, float]],
config: GatekeeperConfig,
) -> list[ModuleResult]:
"""Evaluate each module against its threshold."""
results = []

for module, (line_rate, branch_rate) in coverage_data.items():
# Look up module-specific threshold, fall back to global
threshold = config.global_threshold
for pattern, rate in config.module_thresholds.items():
if pattern in module:
threshold = rate
break

effective = branch_rate if branch_rate > 0 else line_rate
passed = effective >= threshold

results.append(ModuleResult(
name=module,
line_rate=line_rate,
branch_rate=branch_rate,
required=threshold,
passed=passed,
))

return sorted(results, key=lambda r: r.effective_rate)


def run_gate(xml_path: str, config: GatekeeperConfig) -> int:
"""
Main entry point. Returns 0 if all modules pass, 1 otherwise.
"""
try:
coverage_data = parse_coverage_xml(xml_path)
except (FileNotFoundError, ET.ParseError) as e:
print(f"ERROR: Could not read {xml_path}: {e}", file=sys.stderr)
return 1

results = evaluate_coverage(coverage_data, config)
failures = [r for r in results if not r.passed]

print(f"\nCoverage Gate Results ({xml_path})")
print("=" * 60)
for r in results:
status = "PASS" if r.passed else "FAIL"
rate_pct = r.effective_rate * 100
req_pct = r.required * 100
branch_info = f" (branch: {r.branch_rate:.0%})" if r.branch_rate > 0 else ""
print(
f" [{status}] {r.name:<35} "
f"{rate_pct:5.1f}% >= {req_pct:.0f}% required"
f"{branch_info}"
)

print("=" * 60)
if failures:
print(f"\nFAILED: {len(failures)} module(s) below threshold")
for f in failures:
print(f" - {f.name}: {f.effective_rate:.1%} < {f.required:.0%}")
return 1

print(f"\nPASSED: all {len(results)} module(s) meet coverage requirements")
return 0


# Usage
if __name__ == "__main__": # pragma: no cover
config = GatekeeperConfig(
global_threshold=0.80,
module_thresholds={
"payment.py": 0.95, # payment logic: high bar
"auth.py": 0.95, # authentication: high bar
"utils/email.py": 0.70, # email utils: lower bar
},
)
exit_code = run_gate("coverage.xml", config)
sys.exit(exit_code)

Design decisions:

  • parse_coverage_xml is a pure function that returns a plain dict - easy to unit test without real XML files by passing constructed dicts to evaluate_coverage
  • GatekeeperConfig is a dataclass with explicit defaults - serializable from pyproject.toml parsing
  • effective_rate uses branch coverage when available (preferred) and falls back to line coverage
  • Module matching uses substring (if pattern in module) - simple to understand, sufficient for most path patterns; a production version might use fnmatch or regex
  • The run_gate function returns an exit code (0/1) rather than raising - makes it composable in CI scripts without exception handling

Key Takeaways

  • Line coverage measures which lines ran; branch coverage measures which decisions were tested on both sides. Enable branch = true in .coveragerc - it catches an entire class of bugs that line coverage misses
  • coverage.py uses sys.settrace() - the same hook debuggers use - to trace every executed line; this is why coverage measurement slows test runs
  • Run coverage run -m pytest, then coverage report --show-missing for terminal output and coverage html for an interactive browser report showing red/green lines and branch arrows
  • Configure coverage.py in .coveragerc or pyproject.toml: use omit to exclude entire files (migrations, tests), use exclude_lines to exclude specific patterns (abstract methods, type-checking guards)
  • pytest-cov integrates coverage into pytest: --cov-fail-under=85 fails the build if coverage drops below the threshold - enforce this in CI
  • # pragma: no cover is for genuinely untestable code: platform-specific blocks, __main__ guards, TYPE_CHECKING imports. Never use it to exclude business logic to hit a coverage number
  • Set CI thresholds based on measured current coverage, 2-5% below it, then raise incrementally
  • 100% coverage does not mean correct code - it means every line ran. Tests must assert the right things; coverage cannot verify assertion quality
  • Mutation testing (mutmut) fills the gap: it mutates your source code and verifies that at least one test fails per mutation. Surviving mutants are bugs your tests would miss
  • The coverage trap: tests that exist to execute lines rather than verify behaviour can achieve 100% coverage while catching almost no bugs. Use mutation testing on critical modules to expose weak assertions

What's Next

Lesson 06 covers linting and formatting - ruff, pylint, mypy for type checking, black and isort for formatting, and pre-commit hooks to enforce quality automatically before code reaches CI.

© 2026 EngineersOfAI. All rights reserved.