Code Coverage - Measuring What You Test (and What You Miss)

Reading time: ~25 minutes | Level: Intermediate → Engineering

Before reading further, predict how much of this code is covered by the test:

def absolute_value(x):
    if x >= 0:
        return x
    return -x

def test_absolute_value():
    assert absolute_value(5) == 5

Show Answer

It depends which coverage metric you use.

Line coverage: 75%. With input 5, line 1 (the def, which runs when the module is imported), line 2 (if x >= 0), and line 3 (return x) execute. Line 4 (return -x) never runs, so three of the function's four lines were executed.

Branch coverage: 50%. The if has two outcomes, True and False. The test exercised only the True outcome; the negative path (x < 0) was never taken.

The puzzle is a demonstration that the metric you use changes what gaps you see. A codebase with "100% line coverage" can still have entire execution paths - half of every conditional - that were never run during testing.

Now consider: your CI pipeline shows "coverage: 94%." A new engineer asks "does that mean 94% of our code is tested?" The answer is: it means 94% of lines were executed during the test run. It says nothing about whether those executions verified correct behaviour, whether all branches were taken, whether every condition combination was tested, or whether your tests actually assert anything meaningful. Coverage is a floor, not a guarantee.

What You Will Learn

Line, branch, condition, and path coverage: differences and what each catches
How coverage.py works: sys.settrace() and why it connects to the Python internals lesson
coverage run, coverage report, coverage html, coverage xml
.coveragerc and pyproject.toml configuration: omit, exclude_lines, branch = true
pytest-cov: --cov, --cov-report, --cov-fail-under
Reading HTML reports: red lines, branch indicator arrows
# pragma: no cover: when and how to use it legitimately
Coverage in CI: fail-under thresholds, trend tracking, badges
Mutation testing with mutmut: why 90% coverage can still miss real bugs

Prerequisites

pytest basics (writing and running tests)
Module 03 Lesson 04 (Python internals - sys.settrace, frame objects) is helpful but not required
Familiarity with pyproject.toml or .ini configuration files

Part 1 - Coverage Metrics: Line, Branch, Condition, Path

Line Coverage

Line coverage counts executable lines that were run at least once:

def absolute_value(x):
    if x >= 0:      # line 2: executed ✅
        return x    # line 3: executed ✅ (x=5 path)
    return -x       # line 4: NOT executed ❌

def test_absolute_value():
    assert absolute_value(5) == 5

# Line coverage: 3 of 4 lines = 75%
# (The 'def' line and 'if' line always run; only one return runs)

What line coverage misses: branches that have no lines of their own. An if without an else has an implicit "skip the body" path. A single test that enters the body executes every line, so line coverage reports 100% even though the skip path never ran.
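A minimal sketch of that blind spot, using a hypothetical shipping_label function that is not part of this lesson's codebase. One test executes every line, yet the path that skips the if body hides a crash:

```python
def shipping_label(order):
    """Hypothetical example: build an upper-cased note for an order dict."""
    note = None
    if order.get("gift"):
        note = "gift"
    return note.upper()   # AttributeError when the if body was skipped


def test_gift_order():
    # Executes every line of shipping_label, so line coverage is 100%
    assert shipping_label({"gift": True}) == "GIFT"


test_gift_order()  # passes

# The implicit "skip the body" path was never exercised, and it is broken:
try:
    shipping_label({"gift": False})
except AttributeError as exc:
    print(f"untested path fails: {exc}")
```

Branch measurement would flag the if line as partially covered, which is the hint that a second test is needed.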

Branch Coverage

Branch coverage counts decision points where both outcomes (True and False) must be exercised:

def process_order(order):
    if order["amount"] > 1000:     # branch: True AND False must both run
        apply_discount(order)
    if order["status"] == "vip":   # branch: True AND False must both run
        send_vip_email(order)
    return order

# Test with only {"amount": 500, "status": "regular"}:
# Branch 1 (amount > 1000): only the False outcome is taken
# Branch 2 (status == "vip"): only the False outcome is taken
# Line coverage: 4 of 6 lines = 67% (the discount and email lines never run)
# Branch coverage: 2 of 4 branch outcomes = 50%

Full branch coverage of this function needs at least two tests, so that each conditional is taken both ways. Branch coverage exposes untested paths that line coverage cannot see.

Condition Coverage and Path Coverage

Condition coverage (in its strict form, MC/DC: modified condition/decision coverage): every sub-expression in a compound condition must be shown to independently affect the outcome:

def is_eligible(age, has_id, is_member):
    if age >= 18 and has_id and is_member:
        return True
    return False

# Branch coverage: needs True case and False case for the overall if
# Condition coverage: needs each of age>=18, has_id, is_member to independently
# cause the condition to be True AND False
# This requires at minimum 4 tests (N + 1 for N conditions) - more effort than most code justifies

Path coverage: Every unique execution path through a function is tested. For a function with N independent binary conditions, there are up to 2^N paths. Path coverage is rarely achievable in practice - it is primarily a theoretical metric and a tool for understanding combinatorial test explosion.
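To make the difference in test counts concrete, here is a small sketch for is_eligible. The mcdc_cases list is one possible MC/DC-style set (an all-True baseline plus one case per flipped condition), chosen for illustration rather than taken from any tool's output:

```python
from itertools import product


def is_eligible(age, has_id, is_member):
    if age >= 18 and has_id and is_member:
        return True
    return False


# Baseline where every condition is True, then one case per condition
# flipped to False: each flip alone changes the overall result.
mcdc_cases = [
    ((18, True, True), True),
    ((17, True, True), False),    # only age >= 18 changed
    ((18, False, True), False),   # only has_id changed
    ((18, True, False), False),   # only is_member changed
]
for args, expected in mcdc_cases:
    assert is_eligible(*args) is expected

# Exhaustive combinations of the three conditions: 2^3 inputs
all_inputs = list(product([17, 18], [True, False], [True, False]))
print(len(mcdc_cases), len(all_inputs))  # 4 8
```

Four cases against eight looks harmless for three conditions; at ten conditions the same comparison is 11 against 1024.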

Practical recommendation: Aim for branch coverage in production. It catches the most bugs per test written without the exponential explosion of path coverage. Line coverage is a useful sanity check but is not sufficient.

:::tip Aim for Branch Coverage, Not Just Line Coverage
Enable branch = true in your .coveragerc or pyproject.toml. Branch coverage adds minimal overhead to measurement and catches an entire class of bugs that line coverage misses: logic errors in conditionals where one branch was never tested. Teams that discover a production bug "despite 90% coverage" are often measuring lines, not branches.
:::

Part 2 - How `coverage.py` Works: `sys.settrace`

The Tracing Hook

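coverage.py does not rewrite your source or inject counters into the bytecode (unlike some other tools). It parses the source only to work out which lines are executable; to see what actually ran, it uses sys.settrace() - Python's built-in tracing hook. (Recent releases can optionally use sys.monitoring on Python 3.12+, but the idea is the same.)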

import sys

def my_trace(frame, event, arg):
    """Called by CPython for call, line, return, and exception events."""
    if event == "line":
        # frame.f_code.co_filename = which file
        # frame.f_lineno = which line number
        print(f"Executing: {frame.f_code.co_filename}:{frame.f_lineno}")
    return my_trace   # return the local trace function to keep tracing this frame

def work():
    x = 1 + 2
    y = x * 3

# settrace only affects frames created after this call,
# so the traced code must live in a function that is called afterwards
sys.settrace(my_trace)
work()
sys.settrace(None)   # stop tracing

# Output (one line per executed line of work(); numbers depend on the file layout):
# Executing: /path/to/script.py:11
# Executing: /path/to/script.py:12

coverage.py installs a highly optimized version of this trace function when you run coverage run. For every line that executes, it records the (filename, line_number) pair. For branch coverage, it tracks transitions between lines to determine which branches were taken.
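The recording step can be sketched in a few lines of pure Python. This is a toy, not coverage.py's implementation (its real tracer is far more optimized and tracks more detail); trace_lines and its relative line numbering are names and conventions invented for this example:

```python
import sys


def absolute_value(x):
    if x >= 0:
        return x
    return -x


def trace_lines(func, *args):
    """Run func(*args); return (executed_lines, arcs), numbered from the def line."""
    code = func.__code__
    executed = set()
    arcs = set()
    last = [None]

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None               # ignore frames from other functions
        if event == "line":
            rel = frame.f_lineno - code.co_firstlineno + 1
            executed.add(rel)
            if last[0] is not None:
                arcs.add((last[0], rel))   # transition between two lines
            last[0] = rel
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)        # restore whatever was installed before
    return executed, arcs


lines, arcs = trace_lines(absolute_value, 5)
print(sorted(lines))   # [2, 3]
print(sorted(arcs))    # [(2, 3)]

lines, arcs = trace_lines(absolute_value, -3)
print(sorted(lines))   # [2, 4]
print(sorted(arcs))    # [(2, 4)]
```

Neither call alone sees both (2, 3) and (2, 4). Comparing the recorded transitions with the transitions that are possible is, in essence, how a missed branch is detected.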

:::note coverage.py Uses sys.settrace - The Same Mechanism as Debuggers
This is why running tests under coverage is slower than running them without coverage. Every executed line triggers the trace callback. sys.settrace is a global, single-slot hook - only one trace function can be active at a time. This means you cannot run coverage.py and pdb simultaneously in their standard forms. The internals lesson on sys.settrace (Module 03) explains the frame objects that the trace function receives.
:::

Why Coverage Slows Tests

The overhead is proportional to how many lines execute. Consider a tight loop that runs 10 million iterations:

# Without coverage: the loop runs at normal interpreter speed
result = sum(i * i for i in range(10_000_000))

# With coverage: the trace callback fires at least ~10 million times - the loop
# can take several times longer (exact figures depend on machine and version)
# coverage.py optimizes this extensively, but the fundamental overhead remains

For test suites that take minutes rather than seconds, coverage overhead is usually acceptable. A slowdown in the region of 10-30% is a common rule of thumb, though loop-heavy code can fare worse. For microbenchmarks or performance-sensitive test scenarios, run coverage as a separate job, not on every test run.
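One way to get a feel for the cost on your own machine is to time the same function with and without a trace function installed. The sketch below uses a do-nothing tracer written in Python, so it overstates what coverage.py's optimized tracer costs; busy, noop_tracer, and timed are illustrative names:

```python
import sys
import time


def busy(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


def noop_tracer(frame, event, arg):
    # No bookkeeping at all; returning itself keeps line events flowing
    return noop_tracer


def timed(func, *args, tracer=None):
    """Return (result, seconds) for func(*args), optionally under a trace function."""
    previous = sys.gettrace()
    sys.settrace(tracer)
    start = time.perf_counter()
    try:
        result = func(*args)
    finally:
        elapsed = time.perf_counter() - start
        sys.settrace(previous)
    return result, elapsed


plain_result, plain_time = timed(busy, 200_000)
traced_result, traced_time = timed(busy, 200_000, tracer=noop_tracer)

print(f"plain:  {plain_time:.4f}s")
print(f"traced: {traced_time:.4f}s ({traced_time / plain_time:.1f}x)")
```

The results are identical; only the wall-clock time changes. The ratio printed here is specific to this loop and should not be read as a general figure for coverage.py.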

Part 3 - `coverage.py` Commands

Basic Workflow

# Install
pip install coverage

# Step 1: Run your test suite under coverage
coverage run -m pytest tests/

# Step 2: View a summary report in the terminal
coverage report

# Example output:
# Name                      Stmts   Miss  Cover
# ---------------------------------------------
# payment.py                   42      5    88%
# bank_account.py              67      0   100%
# utils/email.py               23      8    65%
# ---------------------------------------------
# TOTAL                       132     13    90%

# Step 3: Generate an HTML report (open in browser)
coverage html
# Creates htmlcov/index.html - click on any file to see red/green line highlighting

# Step 4: Generate XML (for CI integration - SonarQube, Codecov, etc.)
coverage xml
# Creates coverage.xml

# Show missing lines inline
coverage report --show-missing
# payment.py    42    5    88%   23-25, 41, 67
# Line numbers of uncovered lines shown in last column

Branch Coverage

# Enable branch coverage measurement
coverage run --branch -m pytest tests/

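# No extra flag is needed here: the report shows Branch and BrPart columns
# whenever the data was collected with --branch
coverage report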
# Name           Stmts   Miss Branch BrPart  Cover
# -------------------------------------------------
# payment.py        42      2     18      3    88%
# BrPart = branches partially covered (one arm missing)
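Note that the Cover column changes meaning once branches are measured. A rough sketch of the arithmetic, assuming coverage.py combines executed statements and executed branch outcomes into a single ratio; the missing_branches value is hypothetical, because the terminal report shows BrPart rather than the number of missed outcomes:

```python
def combined_cover(stmts, miss, branches, missing_branches):
    """Percentage of statements plus branch outcomes that were executed."""
    executed = (stmts - miss) + (branches - missing_branches)
    possible = stmts + branches
    return 100.0 * executed / possible if possible else 100.0


# 42 statements (2 missed), 18 branch outcomes (assume 5 were never taken)
print(round(combined_cover(42, 2, 18, 5), 1))   # 88.3

# Same statements with branch measurement off
print(round(combined_cover(42, 2, 0, 0), 1))    # 95.2
```

Turning on branch measurement can therefore lower the reported percentage without any code or test changing.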

Part 4 - Configuration: `.coveragerc` and `pyproject.toml`

`.coveragerc` File

# .coveragerc
[run]
branch = true
source = mypackage
omit =
    */tests/*
    */migrations/*
    setup.py
    conftest.py

[report]
exclude_lines =
    # Standard pragmas
    pragma: no cover

    # Abstract methods never called directly
    raise NotImplementedError

    # Type checking imports (never run at runtime)
    if TYPE_CHECKING:

    # __repr__ and __str__ are tested implicitly or not worth testing
    def __repr__
    def __str__

    # Main guard
    if __name__ == .__main__.:

show_missing = true
skip_covered = false

[html]
directory = htmlcov

[xml]
output = coverage.xml

`pyproject.toml` (Modern Approach)

# pyproject.toml
[tool.coverage.run]
branch = true
source = ["mypackage"]
omit = [
    "*/tests/*",
    "*/migrations/*",
    "setup.py",
]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "raise NotImplementedError",
    "if TYPE_CHECKING:",
    "def __repr__",
    "if __name__ == .__main__.:",
]
show_missing = true
fail_under = 85

[tool.coverage.html]
directory = "htmlcov"

`omit` vs `exclude_lines`

omit: excludes entire files from measurement. Use for test files themselves, migrations, generated code, vendored libraries.
exclude_lines: a list of regular expressions matched against each line of the measured files. A matching line is excluded, and if that line introduces a block (an if, a def), the whole block is excluded with it. Use for unreachable defensive code, platform-specific blocks, TYPE_CHECKING guards.
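Because exclude_lines entries are regular expressions, it helps to check what a pattern matches before committing it. A minimal sketch using re.search on individual lines; it only illustrates per-line matching and does not reproduce coverage.py's block-exclusion logic. EXCLUDE_PATTERNS and is_excluded are names made up for this example:

```python
import re

EXCLUDE_PATTERNS = [
    r"pragma: no cover",
    r"raise NotImplementedError",
    r"if TYPE_CHECKING:",
    r"if __name__ == .__main__.:",   # each '.' matches either quote character
]


def is_excluded(line, patterns=EXCLUDE_PATTERNS):
    """True if any pattern is found anywhere in the line."""
    return any(re.search(pattern, line) for pattern in patterns)


samples = [
    'if __name__ == "__main__":',
    "if __name__ == '__main__':",
    "    raise NotImplementedError('subclass me')",
    "    return amount * rate",
]
for line in samples:
    print(f"{is_excluded(line)!s:<5} | {line}")
```

A broad pattern such as raise matches far more than intended, which is how business logic ends up silently excluded.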

Part 5 - `pytest-cov`: Coverage Integrated with pytest

# Install
pip install pytest-cov

# Run with coverage
pytest --cov=mypackage tests/

# Multiple source directories
pytest --cov=mypackage --cov=utils tests/

# With branch coverage
pytest --cov=mypackage --cov-branch tests/

# Generate HTML report alongside terminal output
pytest --cov=mypackage --cov-report=html tests/

# Multiple report formats at once
pytest --cov=mypackage \
       --cov-report=term-missing \
       --cov-report=html \
       --cov-report=xml \
       tests/

# Fail if coverage falls below threshold (useful in CI)
pytest --cov=mypackage --cov-fail-under=85 tests/
# Exits with a non-zero status if coverage < 85% - CI pipeline fails

# Show which lines are missing
pytest --cov=mypackage --cov-report=term-missing tests/
# payment.py   42   5   88%   23-25, 41, 67

`pytest-cov` in `pyproject.toml`

[tool.pytest.ini_options]
addopts = "--cov=mypackage --cov-report=term-missing --cov-report=html"

This bakes coverage into every pytest run - every developer and CI pipeline gets coverage by default.

Part 6 - Reading HTML Coverage Reports

The HTML report (coverage html) generates an interactive browser view. Each file shows:

Green lines: executed during the test run
Red lines: never executed - uncovered
Yellow/orange lines: partially covered (branch coverage only) - line ran but not all branches taken

# Example function - what the HTML report shows

def validate_age(age):                              # green - runs
    if age < 0:                                     # yellow - runs, but only the False outcome was taken
        raise ValueError("Age cannot be negative")  # red - negative input never tested
    if age > 150:                                   # yellow - runs, but only the False outcome was taken
        raise ValueError("Age unrealistically high") # red - age > 150 never tested
    return True                                     # green - runs when valid

With branch measurement enabled, the HTML report marks each partially covered line with the jump that never happened, shown as an annotation such as 2 ↛ 3. Hovering over the annotation explains the miss in words, along the lines of "line 2 didn't jump to line 3 because the condition on line 2 was never true" (exact wording varies between coverage.py versions).
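Turning those red and yellow lines green takes one test per untaken outcome. A sketch of such a test set follows; the test names and the expect_value_error helper are illustrative, and the helper exists only so the example runs without pytest installed (pytest.raises is the idiomatic form):

```python
def validate_age(age):
    if age < 0:
        raise ValueError("Age cannot be negative")
    if age > 150:
        raise ValueError("Age unrealistically high")
    return True


def expect_value_error(age, fragment):
    """Tiny stand-in for pytest.raises(ValueError, match=...)."""
    try:
        validate_age(age)
    except ValueError as exc:
        assert fragment in str(exc)
    else:
        raise AssertionError(f"validate_age({age}) did not raise")


def test_valid_age():
    assert validate_age(30) is True                   # both ifs: False outcome


def test_negative_age():
    expect_value_error(-1, "negative")                # first if: True outcome


def test_unrealistic_age():
    expect_value_error(151, "unrealistically high")   # second if: True outcome


for test in (test_valid_age, test_negative_age, test_unrealistic_age):
    test()
```

Each test also asserts on the result or the error message, so the lines are verified rather than merely executed.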

Part 7 - `# pragma: no cover`

# pragma: no cover tells coverage.py to exclude a line or block from measurement:

# Legitimate uses of pragma: no cover

# 1. Platform-specific code that cannot run on the test platform
import os
import sys
if sys.platform == "win32":  # pragma: no cover
    def get_registry_value(key):
        import winreg
        return winreg.QueryValue(winreg.HKEY_LOCAL_MACHINE, key)

# 2. Debug-only code that is never active in tests
if os.environ.get("DEBUG_VERBOSE"):  # pragma: no cover
    print(f"Processing item: {item}")

# 3. The __main__ guard
if __name__ == "__main__":  # pragma: no cover
    main()

# 4. Abstract method bodies that raise NotImplementedError
class BaseProcessor:
    def process(self, data):  # pragma: no cover
        raise NotImplementedError("Subclasses must implement process()")

# 5. Type-checking-only imports
from typing import TYPE_CHECKING
if TYPE_CHECKING:  # pragma: no cover
    from myapp.models import User

:::danger Never Exclude Meaningful Code with # pragma: no cover
The # pragma: no cover comment is not a way to achieve a coverage number goal. It is for genuinely untestable code: platform-specific blocks that cannot run in CI, __main__ guards, debug logging. Never use it to exclude logic branches that are simply inconvenient to test - those branches are exactly what coverage exists to flag. If you find yourself adding pragma: no cover to business logic, the correct action is to write the test.
:::

Part 8 - Coverage in CI

GitHub Actions Example

# .github/workflows/test.yml
name: Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install -e ".[dev]"

      - name: Run tests with coverage
        run: |
          pytest \
            --cov=mypackage \
            --cov-report=xml \
            --cov-report=term-missing \
            --cov-fail-under=85

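      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          files: coverage.xml
          fail_ci_if_error: true
          # v4 of the action expects an upload token; the secret name is an assumption
          token: ${{ secrets.CODECOV_TOKEN }}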

Setting a Meaningful Threshold

# Start conservatively - measure current coverage first
pytest --cov=mypackage --cov-report=term-missing

# Set threshold slightly below current value
# (raise it incrementally as coverage improves)
pytest --cov=mypackage --cov-fail-under=80

# In pyproject.toml for persistent enforcement
[tool.coverage.report]
fail_under = 85

The threshold strategy: measure what you have, set the threshold 2-5% below it (to avoid blocking CI immediately), then raise it as you add tests. Setting the threshold at 100% immediately on a legacy codebase is a mistake - it blocks all development until everything is covered.
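The "measure, then set the bar slightly lower" rule is simple enough to script. A small sketch, where suggest_fail_under and its default margin are assumptions made for illustration rather than part of any tool:

```python
import math


def suggest_fail_under(current_pct, margin=2.0):
    """Suggest a whole-number fail-under value a few points below current coverage."""
    if not 0.0 <= current_pct <= 100.0:
        raise ValueError("coverage must be a percentage between 0 and 100")
    return max(0, math.floor(current_pct - margin))


print(suggest_fail_under(72.0))               # 70
print(suggest_fail_under(91.6, margin=5.0))   # 86
```

Re-running this after each round of new tests gives the next value to ratchet the threshold up to.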

Coverage Badges

<!-- In your README.md - after setting up Codecov or Coveralls -->
[![Coverage](https://codecov.io/gh/yourorg/yourrepo/branch/main/graph/badge.svg)](https://codecov.io/gh/yourorg/yourrepo)

Coverage badges in the README communicate the project's current coverage level at a glance to contributors and users. As the warning below explains, that number is not the same thing as test quality.

:::warning 100% Coverage Does Not Mean Your Code Is Correct
Coverage measures execution. Execution does not imply verification. A test that calls every line and asserts only assert result is not None achieves 100% coverage while catching almost no bugs. The quality of what tests assert matters as much as - often more than - which lines they execute.
:::

Part 9 - Mutation Testing with `mutmut`

The Coverage Limitation

def is_adult(age):
    return age >= 18

def test_is_adult():
    assert is_adult(20) is True
    assert is_adult(15) is False
    # Line coverage: 100%
    # Branch coverage: 100%

Both metrics show 100%. But what if a developer accidentally changes >= to >?

def is_adult(age):
    return age > 18   # bug: 18-year-olds no longer qualify

The test suite still passes - is_adult(20) is still True, is_adult(15) is still False. The off-by-one bug at the boundary is not caught because no test exercises is_adult(18).

This is what mutation testing detects.

How `mutmut` Works

mutmut systematically modifies your source code - creating mutants - and runs your test suite against each mutant. If the tests catch the mutation (a test fails), the mutant is killed. If the tests pass with the mutated code, the mutant survives - indicating a gap in your test suite.
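The kill/survive loop can be imitated in a few lines. This is a toy: the mutants are produced by hand-picked string replacements, which is not how mutmut generates them, and every name below (MUTATIONS, load, weak_suite, strong_suite, mutation_score) is invented for the sketch:

```python
ORIGINAL = "def is_adult(age):\n    return age >= 18\n"

# Hand-written (old, new) replacements standing in for a real mutation engine
MUTATIONS = [(">=", ">"), (">=", "<"), ("18", "19")]


def load(source):
    """Compile a source string and return its is_adult function."""
    namespace = {}
    exec(source, namespace)
    return namespace["is_adult"]


def weak_suite(is_adult):
    return is_adult(20) is True and is_adult(15) is False


def strong_suite(is_adult):
    # Same checks plus the boundary value
    return weak_suite(is_adult) and is_adult(18) is True


def mutation_score(suite):
    """Return (killed, total) for the given suite."""
    assert suite(load(ORIGINAL)), "suite must pass on the unmutated code"
    killed = 0
    for old, new in MUTATIONS:
        mutant = load(ORIGINAL.replace(old, new, 1))
        if not suite(mutant):     # a failing suite kills the mutant
            killed += 1
    return killed, len(MUTATIONS)


print(mutation_score(weak_suite))     # (1, 3)
print(mutation_score(strong_suite))   # (3, 3)
```

Both suites give 100% line coverage of is_adult; only the one that tests the boundary kills every mutant.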

# Install
pip install mutmut

# Run mutation testing on your package
# (mutmut 2.x command-line syntax; newer releases read the paths from configuration instead)
mutmut run --paths-to-mutate mypackage/

# View results
mutmut results
# Survived mutants are the ones your tests missed:
# mypackage/payment.py:L23 [#42]
# mypackage/validator.py:L15 [#17]

# See what a specific mutant looks like
mutmut show 42
# --- payment.py
# +++ payment.py
# @@ -23 +23 @@
# -    if amount > 0:
# +    if amount >= 0:

Common Mutations `mutmut` Generates

# Original → Mutation → What it tests

age >= 18          →  age > 18           # tests boundary conditions
x == y             →  x != y             # tests equality logic
count += 1         →  count -= 1         # tests increment direction
return True        →  return False       # tests return value
results.append(x)  →  results           # tests that append is called
if condition:      →  if not condition:  # tests conditional logic
a and b            →  a or b            # tests boolean operators

Interpreting Results

mutmut results

# Output:
# Survived: 15
# Killed: 203
# Total: 218
# Mutation score: 93.1%

# mutation score = killed / total * 100
# Higher is better - surviving mutants are test gaps

A mutation score of 93% means 7% of the generated mutants survived: small, plausible changes to the code that no test noticed. The surviving mutants tell you exactly which conditions, boundaries, and logic branches need better assertions.

:::note Mutation Testing Is Slow - Use It Selectively
mutmut runs your entire test suite once per mutant. A codebase with 200 functions and 1000 mutants takes 1000 test runs. Use mutation testing on your most critical modules - payment logic, authentication, data validation - not the entire codebase at once. Run it in CI on a schedule (nightly or weekly) rather than on every commit.
:::

The Coverage Trap: High Coverage, Low Quality Tests

import pytest

# This achieves 100% line and branch coverage - and catches almost nothing

def divide(a, b):
    if b == 0:
        raise ZeroDivisionError("Cannot divide by zero")
    return a / b

# "Coverage trap" tests - they run every line but assert nothing meaningful
def test_divide():
    try:
        result = divide(10, 2)
    except Exception:
        pass   # silently swallow all errors

def test_divide_zero():
    try:
        divide(10, 0)
    except ZeroDivisionError:
        pass   # catches the right exception but asserts nothing about it

# Coverage: 100%. Bugs caught: 0.
# divide(10, 2) could return 42 and both tests would still pass.

# Correct tests that actually verify behaviour:
def test_divide_positive_numbers():
    assert divide(10, 2) == 5.0

def test_divide_raises_on_zero_denominator():
    with pytest.raises(ZeroDivisionError, match="Cannot divide by zero"):
        divide(10, 0)

Coverage tells you which lines ran. It cannot tell you whether the assertions are meaningful. This is why coverage is necessary but not sufficient.

Part 10 - Putting It Together: Production Coverage Setup

# pyproject.toml - complete production coverage configuration

[tool.coverage.run]
branch = true
source = ["mypackage"]
omit = [
    "*/tests/*",
    "*/migrations/*",
    "mypackage/vendor/*",
    "setup.py",
]
parallel = false
data_file = ".coverage"

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "raise NotImplementedError",
    "if TYPE_CHECKING:",
    "def __repr__",
    "def __str__",
    "if __name__ == .__main__.:",
    "@(abc\\.)?abstractmethod",
]
show_missing = true
skip_covered = false
fail_under = 85
precision = 1

[tool.coverage.html]
directory = "htmlcov"
title = "MyPackage Coverage Report"

[tool.coverage.xml]
output = "coverage.xml"

[tool.pytest.ini_options]
addopts = [
    "--cov=mypackage",
    "--cov-report=term-missing",
    "--cov-report=html",
    "--cov-report=xml",
    "--cov-fail-under=85",
]

# Developer workflow
pytest                          # runs tests + coverage automatically via addopts
open htmlcov/index.html         # review uncovered lines in browser

# CI workflow
pytest --cov-fail-under=85      # fail build if coverage drops below 85%

Graded Practice Challenges

Level 1 - Predict and Identify

Question 1: This function achieves 100% line coverage with one test. What coverage metric would catch the missing test case, and what is the missing case?

def classify_temperature(temp):
    if temp < 0:
        return "freezing"
    elif temp < 20:
        return "cold"
    elif temp < 30:
        return "comfortable"
    else:
        return "hot"

def test_classify():
    assert classify_temperature(-5) == "freezing"
    assert classify_temperature(10) == "cold"
    assert classify_temperature(25) == "comfortable"
    assert classify_temperature(35) == "hot"

Show Answer

Actually this test achieves both 100% line coverage AND 100% branch coverage - all four branches are exercised by the four test cases. This is a well-tested function.

The interesting question is: what is not tested? The boundary values: 0, 20, 30 exactly. These are the off-by-one errors that mutation testing with mutmut would catch - it would mutate < 0 to <= 0 or > 0 and verify whether your tests catch the change.

Add boundary tests to prevent off-by-one mutations from surviving:

def test_classify_boundaries():
    assert classify_temperature(0) == "cold"      # not "freezing"
    assert classify_temperature(20) == "comfortable"  # not "cold"
    assert classify_temperature(30) == "hot"          # not "comfortable"
    assert classify_temperature(-1) == "freezing"
    assert classify_temperature(19) == "cold"

Question 2: What does BrPart mean in this coverage report output (data collected with --branch)?

Name           Stmts   Miss Branch BrPart  Cover
-------------------------------------------------
validator.py      30      1     12      3    89%

Show Answer

BrPart means branches partially covered - branches where one outcome (True or False) was exercised but the other was not. In this case, 3 branches in validator.py were partially covered: the conditional ran, but only one of its two outcomes was tested.

For example:

if config.get("strict_mode"):   # BrPart: only tested when False (strict_mode absent)
    apply_strict_validation(data)

If no test ever provides a config with strict_mode=True, this branch is partially covered. The if line ran (so it counts for line coverage), but the True path was never taken (branch coverage gap).

Question 3: What does this # pragma: no cover usage tell you, and is it legitimate?

def process_payment(amount, currency):
    if amount > 1_000_000:  # pragma: no cover
        raise ValueError("Amount exceeds maximum transaction limit")
    return _charge(amount, currency)

Show Answer

This is NOT a legitimate use of # pragma: no cover. The > 1_000_000 validation is business-critical logic - a missing test for it means an attacker could potentially submit a $10M transaction without triggering the limit (if there were a bug in the condition). This should have a test:

def test_process_payment_raises_on_excessive_amount():
    with pytest.raises(ValueError, match="exceeds maximum"):
        process_payment(1_500_000, "usd")

The # pragma: no cover here is being used to artificially inflate the coverage number by excluding an inconvenient test case. Legitimate uses are: platform-specific code that cannot run in CI, __main__ guards, TYPE_CHECKING blocks, and raise NotImplementedError in abstract base classes.

Question 4: A project has 95% line coverage but mutation testing shows a 60% mutation score. What does this indicate?

Show Answer

It indicates that most of the code runs during tests, but the tests do not verify the correct behaviour. 40% of the generated mutants - changed operators, inverted conditions, modified boundary values - did not cause any test to fail.

This is the coverage trap: tests that exist to execute lines rather than to assert outcomes. Common causes:

assert result is not None instead of assert result == expected_value
Tests that call functions but do not check return values
try/except blocks that swallow assertion errors
Tests written after implementation that mirror the implementation structure rather than testing behaviour

The fix: improve assertion quality, not just add more tests. Each test should assert a specific expected value, error message, or state change.

Level 2 - Debug and Fix

Find all coverage configuration issues in this setup:

# Issue 1: This test passes but leaves one branch of the function unexercised
def clamp(value, min_val, max_val):
    if value < min_val:
        return min_val
    if value > max_val:
        return max_val
    return value

def test_clamp():
    assert clamp(5, 0, 10) == 5    # middle case
    assert clamp(-5, 0, 10) == 0   # below min
    # Missing: above max case and boundary values

# Issue 2: coveragerc excludes critical code
# .coveragerc
[report]
exclude_lines =
    if amount > 0:
    if user.is_authenticated:
    raise ValidationError

# Issue 3: CI threshold set too low to be useful
pytest --cov=mypackage --cov-fail-under=10

# Issue 4: pragma misused on business logic
def calculate_tax(income, rate):
    if income > 0:  # pragma: no cover
        return income * rate
    return 0

Show Solution

Issue 1 - Incomplete test for clamp:

The test covers the value < min_val branch and the min_val <= value <= max_val branch but never tests value > max_val. Add:

def test_clamp():
    assert clamp(5, 0, 10) == 5     # within range
    assert clamp(-5, 0, 10) == 0    # below min
    assert clamp(15, 0, 10) == 10   # above max - MISSING
    assert clamp(0, 0, 10) == 0     # boundary: exactly min
    assert clamp(10, 0, 10) == 10   # boundary: exactly max

Issue 2 - Excluding critical logic:

The .coveragerc excludes if amount > 0:, if user.is_authenticated:, and raise ValidationError - these are exactly the conditions that should be tested. Excluding condition checks from coverage means payment logic and authentication guards have no coverage enforcement. Remove these exclusions and write the tests.

Issue 3 - Threshold of 10% is meaningless:

A threshold of 10% means CI only fails if coverage drops below 10%. Any codebase with at least one test will pass this. The threshold should be set based on the project's actual coverage (measure first), then set 2-5% below that and raise it incrementally:

# Measure first
pytest --cov=mypackage --cov-report=term-missing
# → Current coverage: 72%

# Set threshold just below current
pytest --cov=mypackage --cov-fail-under=70
# → Raise it to 75% next sprint, 80% the sprint after

Issue 4 - pragma: no cover on business logic:

if income > 0: is the core condition that determines whether tax is calculated at all. Excluding it means a bug like if income > 1_000_000: (wrong threshold) would never be caught by coverage enforcement. Remove the pragma and write the test:

def test_calculate_tax_on_positive_income():
    assert calculate_tax(50_000, 0.20) == 10_000.0

def test_calculate_tax_on_zero_income():
    assert calculate_tax(0, 0.20) == 0

def test_calculate_tax_on_negative_income():
    assert calculate_tax(-100, 0.20) == 0

Level 3 - Design Challenge

Design a CoverageGatekeeper CI utility that:

Reads a coverage.xml file and parses line and branch coverage percentages
Compares against configurable thresholds per-module (e.g., payment.py requires 95%, utils.py requires 80%)
Has a global fallback threshold for modules without specific requirements
Outputs a pass/fail summary with per-module details
Returns a non-zero exit code if any module fails its threshold
Reads configuration from a dict (simulating a pyproject.toml parse)

Show Reference Solution

# coverage_gate.py
import sys
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class ModuleResult:
    name: str
    line_rate: float
    branch_rate: float
    required: float
    passed: bool

    @property
    def effective_rate(self) -> float:
        """Use branch_rate if available, else line_rate."""
        return self.branch_rate if self.branch_rate > 0 else self.line_rate


@dataclass
class GatekeeperConfig:
    global_threshold: float = 0.80
    module_thresholds: dict[str, float] = field(default_factory=dict)


def parse_coverage_xml(xml_path: str) -> dict[str, tuple[float, float]]:
    """
    Parse coverage.xml and return {module_name: (line_rate, branch_rate)}.
    line_rate and branch_rate are floats between 0.0 and 1.0.
    """
    tree = ET.parse(xml_path)
    root = tree.getroot()
    results = {}

    for package in root.iter("package"):
        for cls in package.iter("class"):
            filename = cls.get("filename", "")
            line_rate = float(cls.get("line-rate", 0))
            branch_rate = float(cls.get("branch-rate", 0))
            results[filename] = (line_rate, branch_rate)

    return results


def evaluate_coverage(
    coverage_data: dict[str, tuple[float, float]],
    config: GatekeeperConfig,
) -> list[ModuleResult]:
    """Evaluate each module against its threshold."""
    results = []

    for module, (line_rate, branch_rate) in coverage_data.items():
        # Look up module-specific threshold, fall back to global
        threshold = config.global_threshold
        for pattern, rate in config.module_thresholds.items():
            if pattern in module:
                threshold = rate
                break

        effective = branch_rate if branch_rate > 0 else line_rate
        passed = effective >= threshold

        results.append(ModuleResult(
            name=module,
            line_rate=line_rate,
            branch_rate=branch_rate,
            required=threshold,
            passed=passed,
        ))

    return sorted(results, key=lambda r: r.effective_rate)


def run_gate(xml_path: str, config: GatekeeperConfig) -> int:
    """
    Main entry point. Returns 0 if all modules pass, 1 otherwise.
    """
    try:
        coverage_data = parse_coverage_xml(xml_path)
    except (FileNotFoundError, ET.ParseError) as e:
        print(f"ERROR: Could not read {xml_path}: {e}", file=sys.stderr)
        return 1

    results = evaluate_coverage(coverage_data, config)
    failures = [r for r in results if not r.passed]

    print(f"\nCoverage Gate Results ({xml_path})")
    print("=" * 60)
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        rate_pct = r.effective_rate * 100
        req_pct = r.required * 100
        branch_info = f" (branch: {r.branch_rate:.0%})" if r.branch_rate > 0 else ""
        print(
            f"  [{status}] {r.name:<35} "
            f"{rate_pct:5.1f}% (required: {req_pct:.0f}%)"
            f"{branch_info}"
        )

    print("=" * 60)
    if failures:
        print(f"\nFAILED: {len(failures)} module(s) below threshold")
        for f in failures:
            print(f"  - {f.name}: {f.effective_rate:.1%} < {f.required:.0%}")
        return 1

    print(f"\nPASSED: all {len(results)} module(s) meet coverage requirements")
    return 0


# Usage
if __name__ == "__main__":  # pragma: no cover
    config = GatekeeperConfig(
        global_threshold=0.80,
        module_thresholds={
            "payment.py": 0.95,       # payment logic: high bar
            "auth.py": 0.95,          # authentication: high bar
            "utils/email.py": 0.70,   # email utils: lower bar
        },
    )
    exit_code = run_gate("coverage.xml", config)
    sys.exit(exit_code)

Design decisions:

parse_coverage_xml isolates the file I/O and returns a plain dict - evaluate_coverage is then easy to unit test without real XML files by passing it constructed dicts
GatekeeperConfig is a dataclass with explicit defaults - it can be populated directly from values parsed out of pyproject.toml
effective_rate uses branch coverage when available (preferred) and falls back to line coverage; because a branch rate of 0 is read as "not available", a module with genuinely 0% branch coverage is also judged on its line rate
Module matching uses substring (if pattern in module) - simple to understand, sufficient for most path patterns; a production version might use fnmatch or regex
The run_gate function returns an exit code (0/1) rather than raising - makes it composable in CI scripts without exception handling
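
The substring-versus-fnmatch trade-off mentioned above can be sketched as follows. This is a minimal illustration, not part of the gatekeeper code: resolve_threshold and the glob patterns are hypothetical names chosen for this example, and the first matching pattern (in dict insertion order) is assumed to win.

```python
from fnmatch import fnmatchcase


def resolve_threshold(
    module: str,
    module_thresholds: dict[str, float],
    global_threshold: float = 0.80,
) -> float:
    """Return the threshold of the first glob pattern that matches `module`."""
    for pattern, rate in module_thresholds.items():
        # fnmatchcase compares case-sensitively on every platform.
        # In fnmatch-style patterns, "*" also matches "/".
        if fnmatchcase(module, pattern):
            return rate
    return global_threshold


thresholds = {
    "*/payment.py": 0.95,   # any payment.py one or more directories deep
    "utils/*": 0.70,        # everything under utils/
}

print(resolve_threshold("billing/payment.py", thresholds))    # 0.95
print(resolve_threshold("utils/email.py", thresholds))        # 0.7
# Substring matching on "payment.py" would also catch repayment.py;
# the glob pattern does not, so the global threshold applies.
print(resolve_threshold("billing/repayment.py", thresholds))  # 0.8
```

The glob version avoids accidental matches such as repayment.py, at the cost of patterns that have to spell out directory structure explicitly.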

Key Takeaways

Line coverage measures which lines ran; branch coverage measures which decisions were tested on both sides. Enable branch = true in .coveragerc - it catches an entire class of bugs that line coverage misses
coverage.py uses sys.settrace() - the same hook debuggers use - to trace every executed line; this is why coverage measurement slows test runs
Run coverage run -m pytest, then coverage report --show-missing for terminal output and coverage html for an interactive browser report showing red/green lines and branch arrows
Configure coverage.py in .coveragerc or pyproject.toml: use omit to exclude entire files (migrations, tests), use exclude_lines to exclude specific patterns (abstract methods, type-checking guards)
pytest-cov integrates coverage into pytest: --cov-fail-under=85 fails the build if coverage drops below the threshold - enforce this in CI
# pragma: no cover is for genuinely untestable code: platform-specific blocks, __main__ guards, TYPE_CHECKING imports. Never use it to exclude business logic to hit a coverage number
Set the CI threshold 2-5 percentage points below your currently measured coverage, then raise it incrementally as coverage improves
100% coverage does not mean correct code - it means every line ran. Tests must assert the right things; coverage cannot verify assertion quality
Mutation testing (mutmut) fills the gap: it mutates your source code and verifies that at least one test fails per mutation. Surviving mutants are bugs your tests would miss
The coverage trap: tests that exist to execute lines rather than verify behaviour can achieve 100% coverage while catching almost no bugs. Use mutation testing on critical modules to expose weak assertions
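
The last three takeaways can be made concrete with a hand-written mutant. This is a simplified sketch of the idea behind mutation testing, not how mutmut is implemented: is_adult, is_adult_mutant, weak_test, and strong_test are hypothetical names invented for this illustration.

```python
def is_adult(age: int) -> bool:
    """Original implementation."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Hand-written mutant: `>=` replaced by `>`."""
    return age > 18


def weak_test(func) -> bool:
    """Executes every line of func, but only checks values far from the boundary."""
    return func(30) is True and func(5) is False


def strong_test(func) -> bool:
    """Also checks the boundary, where the original and the mutant disagree."""
    return func(18) is True and func(17) is False and weak_test(func)


# Both tests pass on the original, and both give it 100% line coverage.
print(weak_test(is_adult), strong_test(is_adult))                # True True

# The weak test still passes on the mutant (the mutant "survives");
# the strong test fails on it (the mutant is "killed").
print(weak_test(is_adult_mutant), strong_test(is_adult_mutant))  # True False
```

Both tests report identical coverage, so only the mutant distinguishes them: a surviving mutant points at an assertion that is missing, here the check at age 18.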

What's Next

Lesson 06 covers linting and formatting - ruff, pylint, mypy for type checking, black and isort for formatting, and pre-commit hooks to enforce quality automatically before code reaches CI.
