Comments vs Docstrings - Documentation as Executable Metadata
Reading time: ~22 minutes | Level: Foundation → Engineering
Look at these two functions. Both are "documented." Only one documents anything useful.
# Function A
def calculate_retry_delay(attempt: int) -> float:
    # multiply attempt by 2
    return attempt * 2.0

# Function B
def calculate_retry_delay(attempt: int) -> float:
    """
    Return the linear backoff delay in seconds for a given attempt number.

    Adds 2 seconds of delay per attempt; jitter is added by the caller.
    Attempt numbers are 0-indexed: attempt 0 returns 0.0 (no delay before
    the first try), attempt 1 returns 2.0, attempt 2 returns 4.0, and so on.

    The maximum value returned is not capped here - apply a cap at the call site
    using min(calculate_retry_delay(n), MAX_DELAY).

    Args:
        attempt: Zero-indexed retry attempt number. Must be non-negative.

    Returns:
        Delay in seconds as a float.
    """
    return float(attempt * 2)
Function A's comment tells you what the multiplication operator does - something anyone can read from the code. Function B's docstring tells you what the return value means in context, what the index convention is, and what the caller is responsible for (adding jitter and capping the delay). It is also accessible at runtime, queryable by help(), readable by IDEs, and parseable by documentation generators. The comment in Function A is noise. The docstring in Function B is part of the API contract.
Understanding the difference - and the engineering reasons behind it - is what this lesson covers.
What You Will Learn
By the end of this lesson you will be able to:
- Identify the exact boundary between comments and docstrings
- Write comments that add genuine value rather than restating code
- Write docstrings in all three professional formats (Google, NumPy, Sphinx)
- Use help(), inspect.getdoc(), and the __doc__ attribute to introspect your code at runtime
- Understand how documentation generators consume docstrings
- Explain the modern role of type hints alongside docstrings
- Recognize - and fix - the four most destructive documentation anti-patterns
Prerequisites
- Python functions, classes, and modules (basic definitions)
- Variables and data types
- An interest in writing code that other people can actually use
The Three-Layer Documentation Philosophy
Python documentation has three distinct layers, each with a different purpose and different audience:
| Layer | Tool | Answers | Audience | Runtime accessible? |
|---|---|---|---|---|
| 1: Code itself | Variable/function names, small functions | WHAT does this do? | Anyone reading source | - |
| 2: Comments (#) | Inline comments | WHY is it done this way? | Future maintainers | No - stripped at parse time |
| 3: Docstrings ("""...""") | Module/class/function docstrings | HOW do I use this? | Callers, library users, API consumers | Yes - stored in __doc__ |
When the code itself is clear enough, no comment is needed. When a comment is needed, that is a signal to ask whether the code could be made clearer instead. When a function, class, or module forms part of any public or internal API, a docstring is mandatory - not because someone told you to write one, but because without it, callers cannot use the API without reading the implementation.
Part 1 - Comments
The # Syntax and PEP 8 Style Rules
A comment begins at the # character and extends to the end of the line. Python's style guide (PEP 8) has specific rules:
# Block comment: starts with # followed by a single space.
# It sits on its own line, indented to the same level as the code it describes.
x = 10  # Inline comment: at least two spaces before #, one space after.
#This is wrong - no space after # (PEP 8 violation)
x = 10 #Also wrong - only one space before # and none after it
Inline comments are appropriate for brief clarifications on a single expression. Block comments are used for explanations that span a concept rather than a single line. Do not mix the two styles on the same conceptual unit.
When to Write a Comment vs When to Refactor
This is the most important judgment call in writing comments. Before writing any comment, ask: can the code be rewritten so that the comment becomes unnecessary?
# Bad: the comment explains what the operator does
# Multiply quantity by unit price
total = qty * price
# Bad: the variable name already tells you this - comment is pure noise
n = 10 # number of retries
# Good: refactoring eliminates the comment entirely
total_price = quantity * unit_price
max_retries = 10
Now consider this:
# Bad: obvious code, useless comment
users = get_users()
# Loop through users
for user in users:
    send_email(user)

# Good: same code, comment explains the NON-OBVIOUS decision
# BCC all users rather than individual sends to avoid rate limiting
# on the transactional email provider (limit: 100 req/min per IP).
for user in users:
    queue_for_batch_send(user)
The rule of thumb: if the comment could have been written by someone who understood only the programming language and not the problem domain, it is probably restating the obvious. Comments earn their place when they convey domain knowledge, historical context, or engineering trade-offs that the code cannot express.
Good Comment Examples - Explaining the "Why"
# Use a set for O(1) lookup; the list version was the bottleneck
# identified in the 2025-11 profiling session (see perf/report-001.md).
seen_ids = set()
# We intentionally do not cache this result: the underlying data
# changes frequently and stale cache hits caused the P0 incident
# on 2026-01-14 (see incident report INC-4892).
user_data = fetch_fresh_user_data(user_id)
# Keycloak returns roles in a nested structure under 'realm_access'.
# This differs from the OAuth2 spec - see Keycloak issue #12345.
roles = token_payload.get("realm_access", {}).get("roles", [])
# NOTE: this comparison must use `is` not `==` because MyLibrary
# overrides __eq__ to return a non-boolean type (a known upstream bug).
if result is None:
    return default_value
Each of these comments carries information that cannot be read from the code: a performance decision with evidence, a safety constraint from an incident, a third-party quirk, and a workaround for a known bug. These are worth writing.
Comments for Algorithms
For non-trivial algorithms, a comment that names the algorithm and cites a reference is more valuable than a line-by-line explanation:
# Binary search - O(log n). See CLRS 3rd ed., Section 2.3.
# Invariant: target is in arr[low..high] if it exists.
low, high = 0, len(arr) - 1
while low <= high:
    mid = (low + high) // 2
    if arr[mid] == target:
        return mid
    elif arr[mid] < target:
        low = mid + 1
    else:
        high = mid - 1
return -1
:::tip Comment Formatting for TODOs and FIXMEs
Most teams and IDEs recognize specific comment prefixes:
# TODO: replace with asyncio version once we upgrade to Python 3.12
# FIXME: this silently drops items when the queue is full
# HACK: workaround for upstream bug, remove after library v2.1 release
# NOTE: thread safety not guaranteed - callers must hold the lock
Use these consistently so they are searchable and visible in IDE annotations.
:::
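Because these prefixes are regular, they are easy to collect with a script. The following is a minimal sketch rather than a replacement for an IDE or linter: it uses the standard-library tokenize module so that only real comments are matched, not # characters inside string literals. The find_markers helper and the sample source are hypothetical names introduced for illustration.

```python
import io
import re
import tokenize

# Recognizes the four prefixes listed above, with an optional colon.
MARKER = re.compile(r"#\s*(TODO|FIXME|HACK|NOTE)\b:?\s*(.*)")


def find_markers(source: str) -> list:
    """Return (line_number, marker, text) for each marker comment in source."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only COMMENT tokens are inspected, so string literals are ignored.
        if tok.type == tokenize.COMMENT:
            match = MARKER.match(tok.string)
            if match:
                found.append((tok.start[0], match.group(1), match.group(2)))
    return found


# Hypothetical sample source used only for this demonstration.
sample = (
    "queue = []\n"
    "# TODO: replace with asyncio version\n"
    "queue.append(1)  # FIXME: drops items when full\n"
    'label = "# TODO: not a comment"\n'
)

for line_number, marker, text in find_markers(sample):
    print(line_number, marker, text)
```

The string literal on the last line of the sample is skipped because the tokenizer classifies it as a STRING token, not a COMMENT.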
Part 2 - Docstrings
What a Docstring Is at the Language Level
A docstring is a string literal - not a comment - placed as the first statement of a module, class, function, or method. Python's compiler specifically looks for this pattern and stores the string in the object's __doc__ attribute.
def greet(name: str) -> str:
    """Return a personalized greeting string."""
    return f"Hello, {name}!"

print(greet.__doc__)        # Return a personalized greeting string.
print(type(greet.__doc__))  # <class 'str'>
This is fundamentally different from a comment. Comments are stripped during parsing and have no runtime presence. Docstrings are part of the compiled bytecode and persist as attributes of the object they document. They can be read, formatted, and processed by any Python code - at runtime.
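The difference can be observed directly. The sketch below parses a small, hypothetical source string with the standard-library ast module and then executes it: the comment leaves no trace in the syntax tree or on the function object, while the docstring appears in both.

```python
import ast

# Hypothetical source used only for this demonstration.
SOURCE = '''
def with_comment():
    # Return the integer 1.
    return 1

def with_docstring():
    """Return the integer 1."""
    return 1
'''

tree = ast.parse(SOURCE)

# The comment never reaches the syntax tree; the docstring does.
docs_in_tree = {
    node.name: ast.get_docstring(node)
    for node in tree.body
    if isinstance(node, ast.FunctionDef)
}

# Compile and run the same tree to inspect the resulting function objects.
namespace = {}
exec(compile(tree, "<demo>", "exec"), namespace)

print(docs_in_tree)
print(namespace["with_comment"].__doc__)    # None
print(namespace["with_docstring"].__doc__)  # Return the integer 1.
```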
Where Docstrings Live
Docstrings are valid at four levels: module, class, function, and method. This example shows a module, a class, and two methods:
"""Module docstring - describes the module's purpose and public API overview."""
import sys
from typing import Optional

class PaymentProcessor:
    """
    Class docstring - describes what instances represent and the overall API.

    Handles payment authorization, capture, and refund operations against
    the configured payment gateway. Thread-safe for concurrent use.
    """

    def __init__(self, api_key: str, sandbox: bool = False) -> None:
        """
        Method docstring - describes this specific method.

        Initialize the processor with credentials and environment config.
        """
        self.api_key = api_key
        self.sandbox = sandbox

    def charge(self, amount_cents: int, token: str) -> dict:
        """Authorize and capture a payment. Returns the gateway response dict."""
        ...
PEP 257 Conventions
PEP 257 is the official docstring style guide. Its key rules:
For single-line docstrings:
def add(a: int, b: int) -> int:
    """Return the sum of a and b."""  # Correct: all on one line
    return a + b

# Wrong: opening quotes on separate line
def add(a: int, b: int) -> int:
    """
    Return the sum of a and b.
    """

# Wrong: closing quotes not on last line of content
def add(a: int, b: int) -> int:
    """Return the sum of a and b.
    """
For multi-line docstrings:
def process_payment(amount: float, currency: str) -> dict:
    """
    Process a payment and return the transaction result.

    The first line is a summary - one sentence, imperative mood ("Return",
    "Process", "Compute" - not "Returns", "Processes", "Computes").
    A blank line separates the summary from the extended description.

    The extended description explains behavior, edge cases, side effects,
    and anything else a caller needs to know.

    The closing triple-quote sits on its own line for multi-line docstrings.
    """
    ...
PEP 257 specifies that the first line of any docstring should be a short, imperative summary ("Do X" not "Does X"). This is the line that appears in help() listings and IDEs.
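Tools that build listings typically pull out just that first line. A minimal sketch of the idea, assuming a hypothetical helper named summary_line (not a standard-library function):

```python
import inspect


def summary_line(obj) -> str:
    """Return the first line of obj's docstring, or an empty string."""
    doc = inspect.getdoc(obj)
    return doc.splitlines()[0] if doc else ""


def process_payment(amount: float, currency: str) -> dict:
    """
    Process a payment and return the transaction result.

    The extended description is ignored by summary_line().
    """
    return {"amount": amount, "currency": currency}


def undocumented():
    return None


print(summary_line(process_payment))
print(repr(summary_line(undocumented)))
```

Because inspect.getdoc() drops the blank line that follows the opening quotes, the helper returns the summary regardless of whether the summary sits on the same line as the opening quotes or on the next one.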
The Three Professional Docstring Formats
PEP 257 defines the structural rules. Three style conventions define how to document parameters, return values, exceptions, and examples. You must choose one format and apply it consistently throughout a project.
Google Style
Widely used in open-source Python projects. Uses indented section headers with no special markup:
def fetch_user(user_id: int, include_deleted: bool = False) -> Optional[dict]:
"""
Fetch a user record from the database by ID.
Queries the primary database. Returns None if the user does not exist.
Deleted users are excluded by default; pass include_deleted=True to
retrieve them, which is required for audit log display.
Args:
user_id: The integer primary key of the user record.
include_deleted: When True, soft-deleted users are included
in the result set. Defaults to False.
Returns:
A dict with keys 'id', 'email', 'name', 'created_at', and optionally
'deleted_at'. Returns None if no matching record is found.
Raises:
DatabaseConnectionError: If the database is unreachable.
ValueError: If user_id is not a positive integer.
Example:
>>> user = fetch_user(42)
>>> print(user['email'])
"""
...
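Examples written with the >>> prompt can be executed by the standard-library doctest module, which helps keep them from drifting out of date. The sketch below is one way to run them programmatically; it reuses the retry-delay function from the introduction with an Example section added purely for illustration.

```python
import doctest


def calculate_retry_delay(attempt: int) -> float:
    """Return the delay in seconds before a given retry attempt.

    Example:
        >>> calculate_retry_delay(0)
        0.0
        >>> calculate_retry_delay(3)
        6.0
    """
    return float(attempt * 2)


# Collect the examples from the docstring and run each one.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
tests = finder.find(
    calculate_retry_delay,
    name="calculate_retry_delay",
    globs={"calculate_retry_delay": calculate_retry_delay},
)
for test in tests:
    runner.run(test)

# summarize() returns a result object with failed and attempted counts.
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```

In a real project the same check is usually run through python -m doctest or a test runner rather than by hand.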
NumPy Style
Standard in scientific Python (NumPy, SciPy, pandas). More verbose, uses dashes to underline section headers:
def normalize(array, axis=0, ddof=1):
"""
Normalize an array to zero mean and unit variance.
Parameters
----------
array : numpy.ndarray
Input data array of shape (n_samples, n_features) or (n_samples,).
axis : int, optional
Axis along which to compute statistics. Default is 0 (column-wise).
ddof : int, optional
Delta degrees of freedom for std calculation. Default is 1 (sample std).
Returns
-------
numpy.ndarray
Normalized array of the same shape as input.
Raises
------
ValueError
If the standard deviation along the specified axis is zero, indicating
a constant column which cannot be normalized.
Notes
-----
This uses the sample standard deviation (ddof=1) by default, which is
appropriate for training data normalization. Use ddof=0 for population
statistics.
Examples
--------
>>> import numpy as np
>>> normalize(np.array([[1.0, 2.0], [3.0, 4.0]]))
array([[-0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
"""
...
Sphinx / reStructuredText Style
The oldest format; native to Sphinx, Python's standard documentation generator. Uses :param:, :type:, :returns:, and :raises: fields:
def divide(numerator: float, denominator: float) -> float:
"""
Divide two numbers, raising an error for zero denominators.
Unlike Python's built-in division, this function provides a domain-specific
error message that includes both operands for easier debugging.
:param numerator: The dividend.
:type numerator: float
:param denominator: The divisor. Must not be zero.
:type denominator: float
:returns: The quotient of numerator / denominator.
:rtype: float
:raises ZeroDivisionError: When denominator equals zero, with both
operands included in the error message.
Usage::
result = divide(10.0, 3.0)
# result == 3.333...
"""
...
:::note Choosing a Format
Google style is most readable in source code and is preferred for application code. NumPy style is standard for libraries that produce API references with parameter tables. Sphinx style is necessary when your project uses Sphinx and you want parameters to appear as formatted fields in the generated HTML. Pick one format per project and enforce it with a linter such as pydocstyle or ruff.
:::
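Whichever format you pick, its regularity is what lets tools read it. As a rough illustration only, the sketch below extracts parameter descriptions from a Google-style Args: section. The parse_google_args helper is hypothetical and assumes 4-space indentation under the section header; real projects should rely on an established parser such as the one in Sphinx's napoleon extension.

```python
import inspect
import re

# Matches a parameter line indented one level (4 spaces) under "Args:",
# with an optional "(type)" after the name.
_PARAM = re.compile(r"^ {4}(\w+)(?: *\([^)]*\))?: *(.*)$")


def parse_google_args(func) -> dict:
    """Return {parameter_name: description} from a Google-style Args: section."""
    doc = inspect.getdoc(func) or ""
    params = {}
    current = None
    in_args = False
    for line in doc.splitlines():
        if line.strip() == "Args:":
            in_args = True
            continue
        if not in_args or not line.strip():
            continue
        if not line.startswith(" "):
            break  # a dedented line starts the next section
        match = _PARAM.match(line)
        if match:
            current = match.group(1)
            params[current] = match.group(2)
        elif current is not None:
            # Deeper-indented lines continue the previous description.
            params[current] += " " + line.strip()
    return params


def fetch_user(user_id, include_deleted=False):
    """Fetch a user record from the database by ID.

    Args:
        user_id: The integer primary key of the user record.
        include_deleted: When True, soft-deleted users are included
            in the result set. Defaults to False.

    Returns:
        A dict, or None if no matching record is found.
    """


for name, description in parse_google_args(fetch_user).items():
    print(f"{name}: {description}")
```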
The __doc__ Attribute and help()
Because docstrings are stored as __doc__, you can access and manipulate them programmatically:
class Vector:
"""A 2D vector with arithmetic operations."""
def __init__(self, x: float, y: float) -> None:
"""Initialize with x and y components."""
self.x = x
self.y = y
def magnitude(self) -> float:
"""Return the Euclidean magnitude (L2 norm) of this vector."""
return (self.x ** 2 + self.y ** 2) ** 0.5
print(Vector.__doc__) # A 2D vector with arithmetic operations.
print(Vector.magnitude.__doc__) # Return the Euclidean magnitude (L2 norm) of this vector.
print(Vector.__init__.__doc__) # Initialize with x and y components.
# help() uses __doc__ to format interactive documentation
help(Vector)
help() formats the entire class - the class docstring, the __init__ signature, the method list, and each method's docstring - into a readable help page in the REPL. This is what you get when you type help(str) or help(list) - the same mechanism, reading __doc__ attributes all the way down.
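The text that help() displays can also be captured as a string, which is convenient for tests or custom tooling. A small sketch using pydoc.render_doc() with the plain-text renderer; the Vector class is repeated here so the snippet runs on its own.

```python
import pydoc


class Vector:
    """A 2D vector with arithmetic operations."""

    def __init__(self, x: float, y: float) -> None:
        """Initialize with x and y components."""
        self.x = x
        self.y = y

    def magnitude(self) -> float:
        """Return the Euclidean magnitude (L2 norm) of this vector."""
        return (self.x ** 2 + self.y ** 2) ** 0.5


# render_doc() returns the text that help() would page; the plaintext
# renderer avoids the overstrike characters used for bold in terminals.
help_text = pydoc.render_doc(Vector, renderer=pydoc.plaintext)
print(help_text)
```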
inspect.getdoc() and inspect.signature()
The inspect module provides cleaner access than reading __doc__ directly:
import inspect
def compute_tax(income: float, rate: float = 0.21) -> float:
"""
Compute tax liability for a given income and flat rate.
Args:
income: Gross income in the base currency unit.
rate: Tax rate as a decimal (0.21 = 21%). Defaults to 0.21.
Returns:
Tax owed as a float in the same currency unit as income.
"""
return income * rate
# inspect.getdoc() trims leading whitespace and normalizes indentation.
# Before Python 3.13, raw __doc__ keeps the indentation of the enclosing block;
# from 3.13 on the compiler strips it, and getdoc() gives the same result everywhere.
print(inspect.getdoc(compute_tax))
# inspect.signature() returns the call signature as a Signature object
sig = inspect.signature(compute_tax)
print(sig) # (income: float, rate: float = 0.21) -> float
print(sig.parameters) # read-only ordered mapping of parameter name -> Parameter object
print(sig.return_annotation) # <class 'float'>
inspect.getdoc() is the better choice for display purposes because it returns consistently dedented text on every Python version, whereas the raw attribute keeps the source indentation on versions before 3.13. Always use inspect.getdoc() when building tooling that renders docstrings.
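The cleaning step itself is available as inspect.cleandoc(), which inspect.getdoc() applies to the docstring it finds. The sketch below runs it on a plain string shaped like a docstring from an indented method body, so the result does not depend on the interpreter version.

```python
import inspect

# A raw docstring as it appears in source inside an indented method body.
raw = """
        Compute tax liability.

        Args:
            income: Gross income.
        """

cleaned = inspect.cleandoc(raw)

print(repr(raw.splitlines()[1]))  # still carries the 8-space indentation
print(cleaned)
```

The common 8-space margin is removed, the relative indentation of the Args: entry is preserved, and the blank lines at the start and end are dropped.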
Module-Level Docstrings
The first statement of a module file should be a docstring if the module is part of any public or internal API:
"""
payment.stripe_client
~~~~~~~~~~~~~~~~~~~~~
Thin wrapper around the Stripe Python SDK that enforces company-wide
retry behavior, logging conventions, and error normalization.
All monetary amounts are in integer cents (ISO 4217 minor units) to
avoid floating-point rounding errors. Currency codes follow ISO 4217.
Typical usage::
from payment.stripe_client import StripeClient
client = StripeClient(api_key=settings.STRIPE_KEY)
result = client.charge(amount_cents=1999, currency="USD", token=tok)
Public API:
StripeClient - main class
StripeError - base exception class
ChargeResult - named tuple for successful charges
Environment:
STRIPE_API_KEY - required; set in .env or CI secrets
STRIPE_SANDBOX - optional; set to '1' to use test endpoints
"""
from __future__ import annotations
import os
from typing import NamedTuple
The module __doc__ is accessible as module.__doc__ after import, and appears in help(module). It is also the content Sphinx uses as the landing page for a module's API reference.
Type Hints vs Docstrings - The Modern Approach
Type annotations (PEP 484, Python 3.5+) and docstrings serve overlapping but distinct purposes:
| Type Hints | Docstrings |
|---|---|
| Machine-readable | Human-readable |
| Checked by mypy/pyright | Not statically checked |
| In function signature | In the body |
| Describe the TYPE of input | Describe the MEANING of input |
| Cannot explain semantics | Can explain semantics |
| Cannot describe side effects | Can describe side effects |
| Cannot give examples | Can give examples |
With modern type hints, you no longer need to document the type in the docstring - the signature already carries that information:
# Pre-type-hints style (still common in older codebases)
def send_email(recipient, subject, body, cc=None):
"""
Send an email.
Args:
recipient (str): Email address of the primary recipient.
subject (str): Subject line (max 998 characters per RFC 5322).
body (str): Plain-text body. HTML is not supported.
cc (list of str, optional): Additional recipients. Defaults to None.
Returns:
bool: True if accepted by the SMTP server, False otherwise.
"""
# Modern style: type information lives in the signature
def send_email(
    recipient: str,
    subject: str,
    body: str,
    cc: Optional[list[str]] = None,
) -> bool:
    """
    Submit an email for delivery via the configured SMTP server.

    The subject line is truncated to 998 characters per RFC 5322 if longer.
    HTML content in body is not rendered - use the HTML variant instead.
    The return value reflects acceptance by the MTA, not delivery success.

    Args:
        recipient: Primary recipient's email address.
        subject: Subject line; truncated to 998 chars if exceeded.
        body: Plain-text message body.
        cc: Additional recipients who receive a copy. Defaults to None,
            meaning no additional recipients.

    Returns:
        True if the MTA accepted the message; False on rejection.
        Does not indicate successful delivery to the final inbox.
    """
In the modern style, the Args: section documents the semantics and constraints, not the type - the type annotation already covers that. The docstring becomes denser with meaning per line.
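The split of responsibilities is visible at runtime. In the sketch below, typing.get_type_hints() recovers the types from the signature, while the range constraint on rate exists only in the docstring and in an explicit check. The validation logic is an illustrative addition, not part of the earlier compute_tax example.

```python
from typing import get_type_hints


def compute_tax(income: float, rate: float = 0.21) -> float:
    """Compute tax owed for a gross income at a flat rate.

    Args:
        income: Gross income in the base currency unit. Must be non-negative.
        rate: Tax rate as a decimal between 0.0 and 1.0 inclusive.

    Raises:
        ValueError: If income is negative or rate is outside [0.0, 1.0].
    """
    if income < 0:
        raise ValueError(f"income must be non-negative, got {income}")
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"rate must be between 0.0 and 1.0, got {rate}")
    return income * rate


# The machine-readable layer: types only.
hints = get_type_hints(compute_tax)
print(hints)

# 2.5 is a perfectly valid float, so only the documented constraint rejects it.
try:
    compute_tax(100.0, 2.5)
except ValueError as exc:
    print(exc)
```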
:::warning Do Not Duplicate the Signature in the Docstring
A common anti-pattern, especially in auto-generated docstrings:
def add(a: int, b: int) -> int:
"""
Add two integers and return an integer.
Args:
a (int): First integer.
b (int): Second integer.
Returns:
int: The sum.
"""
return a + b
This docstring adds zero information beyond the signature. A useful docstring for this function would say something the signature cannot: for example, that the + operator is used (so subclasses that override __add__ are supported), or that the function is a no-op beyond the built-in + and callers should simply use a + b directly.
:::
Documentation Generation - How Sphinx, pdoc, and mkdocs Work
When a project's docstrings are well-written, documentation generators can produce complete, formatted API references automatically.
Sphinx is the oldest and most powerful. It uses RST-format docstrings or, with the napoleon extension, Google and NumPy styles. The autodoc extension imports your modules and reads __doc__ to generate API pages. CPython and Django both use Sphinx for their official docs.
pdoc is simpler and zero-configuration. Run pdoc mypackage and it renders all docstrings as HTML with navigation. It supports Google style natively.
mkdocs with mkdocstrings uses Markdown files for narrative documentation and pulls in docstrings automatically for API references. It is the modern default for many open-source Python libraries.
The implication: every docstring you write is not just for the person reading the source file. It becomes a rendered page in documentation sites, an entry in IDE pop-ups, and the content of help() in the REPL. Write accordingly.
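At their core, these generators import a module, walk its public members, and read each docstring. The sketch below imitates that loop in a few lines; collect_api_docs and the in-memory demo module are hypothetical stand-ins, and real generators do far more (cross-references, signatures, templates).

```python
import inspect
import types


def collect_api_docs(module) -> dict:
    """Map each public function name in module to its summary line, or None."""
    entries = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue  # private helpers are not part of the public API
        doc = inspect.getdoc(obj)
        entries[name] = doc.splitlines()[0] if doc else None
    return entries


# Build a small in-memory module so the example is self-contained.
demo = types.ModuleType("demo", "Hypothetical demo module.")
exec(
    '''
def area(width, height):
    """Return the area of a rectangle."""
    return width * height

def undocumented(x):
    return x

def _private():
    """Never listed."""
''',
    demo.__dict__,
)

for name, summary in collect_api_docs(demo).items():
    print(f"{name}: {summary or 'MISSING DOCSTRING'}")
```

A loop like this also doubles as a quick audit for public functions that have no docstring at all.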
Pitfalls and Anti-Patterns
Anti-Pattern 1: Commented-Out Code in Production
# This is extremely common and extremely wrong:
def process_order(order_id: int) -> dict:
    # old_result = fetch_from_legacy_system(order_id)
    # if old_result:
    #     return transform_legacy(old_result)
    result = fetch_from_new_system(order_id)
    return result
Commented-out code is a maintenance hazard. Future developers cannot know: Was this intentionally disabled? Is it a safe rollback option? Is it still tested? Does it still work? The correct tool for preserving previous implementations is version control - git log can show you any previous version of the function. Delete commented-out code and commit. If you need it, git revert or git show will retrieve it.
Anti-Pattern 2: Comments That Lie
# Validate the email format
username = extract_username(email)
This comment says "validate" but the code does something different - it extracts a username. The comment is wrong. Wrong comments are worse than no comments because they actively mislead. Comments drift as code changes. If you rename a function, refactor a block, or change an algorithm, you must update any comment that refers to it. If you do not have discipline to update comments, do not write them - clear code is better than misleading documentation.
Anti-Pattern 3: Docstrings That Only Restate the Signature
# Useless - adds no information
def get_user(user_id: int) -> Optional[User]:
"""Get user by user_id. Returns User or None."""
...
# Useful - tells callers what they need to know
def get_user(user_id: int) -> Optional[User]:
"""
Retrieve a User instance from the database by primary key.
Queries the read replica for performance. Deleted users (where
deleted_at is not null) are never returned - use get_user_including_deleted()
for audit operations. The result is cached in the request-local cache
for the lifetime of the current request.
Returns None if no active user with the given ID exists.
"""
...
Anti-Pattern 4: Missing Docstrings on Public API
Any function, class, or module that is imported and used by other code without the caller reading its source is part of a public or internal API. It requires a docstring. The test is simple: could someone use this function correctly from the signature and docstring alone, without reading the body? If the answer is no, the docstring is incomplete.
:::warning The Outdated Comment Problem
Outdated comments - comments that once accurately described the code but no longer do after a refactor - are a recurring source of maintenance bugs. A developer reading an outdated comment forms an incorrect mental model and makes a change based on wrong assumptions. Prefer code that does not need comments. When you must comment, treat comment updates as mandatory during code review.
:::
Comments in Configuration Files
Not all comments in a Python project live in .py files. Configuration and data files in modern Python projects also use comments:
TOML (pyproject.toml):
[tool.ruff.lint]
# Enable the pydocstyle (D) rules, which require docstrings on public modules, classes, and functions
select = ["D"]
[tool.ruff.lint.per-file-ignores]
# Test files do not need docstrings on every helper function
"tests/**" = ["D"]
YAML (CI configuration):
env:
# PYTHONUNBUFFERED ensures subprocess output streams in real time
PYTHONUNBUFFERED: "1"
# PYTHONUTF8 avoids encoding errors on Windows CI runners
PYTHONUTF8: "1"
Both TOML and YAML use # for comments, and neither supports block comments: every comment line must start with #.
Interview Questions and Answers
Q1: What is the __doc__ attribute? Where does Python store it and how is it populated?
__doc__ is a string attribute present on modules, classes, functions, and methods. Python populates it automatically during compilation: when the parser encounters a string literal as the first statement of a module, class, or function body, it stores that string as the __doc__ attribute of the resulting object. The attribute is set to None for objects with no docstring. Because __doc__ is a regular attribute on a regular Python object, you can read it, assign to it, format it, and use it in any Python expression. Comments, by contrast, are discarded at parse time and have no runtime representation whatsoever.
Q2: What are the three main docstring formats? How do they differ and when would you choose each?
Google style uses indented section headers (Args:, Returns:, Raises:, Example:) with colon-terminated labels. It is the most readable in raw source code. Choose it for application code and most open-source libraries. NumPy style uses section headers (Parameters, Returns, Notes) underlined with dashes. It is standard in scientific Python (NumPy, pandas, scikit-learn) and is preferred when generating reference documentation with parameter tables. Sphinx / reStructuredText style uses :param name:, :type name:, :returns:, :rtype:, and :raises ExcType: fields. Choose it when you are using Sphinx directly without the napoleon extension and want Sphinx to render the fields natively. All three convey the same information; the difference is syntax and the tooling that consumes them.
Q3: Explain PEP 257's rule for the first line of any docstring. Why is the imperative mood specified?
PEP 257 asks for the first line to be a concise, imperative-mood summary - "Return the square root of n" rather than "Returns the square root of n" or "This function returns the square root of n." The imperative mood is chosen because it treats the function as a command: calling sqrt(n) issues the command "return the square root of n." This parallels how Python's own standard library documents its functions. The first-line summary is also the most visible part of the docstring: it appears in help() listings, IDE hover text, and the summary tables generated by Sphinx. PEP 257 specifies that it should fit on one line, end with a period, and not be a "signature" that merely restates the function's parameters.
Q4: When should you use type hints and when should you use docstrings for documenting parameters? Is there overlap?
Type hints document the type of a parameter - the Python type system that static checkers like mypy and pyright can verify. Docstrings document the semantics - what the parameter means, valid ranges, domain constraints, relationships to other parameters, and any invariants the caller must maintain. There is intentional overlap in older codebases where type hints were not used: the docstring would include both the type and the semantics. In modern Python (3.9+), the canonical approach is to put types in annotations and put everything else in the docstring. A parameter like rate: float in the signature is still better documented in the docstring as "Tax rate as a decimal between 0.0 and 1.0 inclusive" - the type hint says float, but it cannot say that 2.5 would be semantically wrong.
Q5: What does inspect.getdoc() do differently than accessing obj.__doc__ directly?
inspect.getdoc() trims the common leading whitespace from multi-line docstrings. Because docstrings inside functions and methods are typically indented to match the surrounding code, the raw __doc__ attribute on Python versions before 3.13 includes that indentation as leading whitespace on every continuation line (from 3.13 on, the compiler strips it). inspect.getdoc() calls inspect.cleandoc() internally, which strips the common indentation prefix so the output is properly left-aligned regardless of where in the source the docstring appears. It also expands tabs to spaces and removes blank lines at the start and end of the docstring. In addition, when a class, method, or property has no docstring of its own, getdoc() falls back to the one inherited from a base class. Always use inspect.getdoc() when displaying docstrings programmatically - for example, in a custom help system, a CLI --help implementation, or a documentation renderer.
Q6: How do the python -O and -OO flags relate to docstrings, and when does this matter in practice?
Running Python with -O (capital letter O) removes assert statements and code guarded by __debug__, but keeps docstrings. Running with -OO does the same and additionally discards docstrings from compiled bytecode, so __doc__ attributes become None at runtime. This reduces memory usage and bytecode size, which can matter in memory-constrained embedded systems or in very large applications with thousands of functions. In normal application development and production server deployments, -OO is rarely used because the docstring storage cost is negligible. The practical implication: do not write code that depends on __doc__ being non-None in production - either document that the code requires non-optimized execution, or make the docstring-dependent code gracefully handle None.
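Which optimization level removes docstrings can be checked in-process, because the built-in compile() accepts an optimize argument whose values 0, 1, and 2 correspond to no flag, -O, and -OO. The sketch below also shows one way to tolerate a missing docstring; doc_at_level and describe are hypothetical helper names.

```python
# Hypothetical source compiled at different optimization levels.
SOURCE = 'def f():\n    """Explain f."""\n    return 1\n'


def doc_at_level(level: int):
    """Compile SOURCE at the given optimization level and return f.__doc__."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=level), namespace)
    return namespace["f"].__doc__


def describe(func) -> str:
    """Return a description that tolerates a missing docstring."""
    return func.__doc__ or f"{func.__name__}: no description available"


for level in (0, 1, 2):
    print(level, repr(doc_at_level(level)))

print(describe(doc_at_level))
```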
Graded Practice Challenges
Level 1 - Predict the Output
Without running the code, write what will be printed to the terminal.
import inspect
def calculate(x: float, y: float) -> float:
"""Compute the weighted average of x and y, with x weighted at 70%."""
return x * 0.7 + y * 0.3
class Report:
"""Monthly performance report."""
pass
print(calculate.__doc__)
print(Report.__doc__)
print(calculate.__doc__ is None)
print(inspect.getdoc(calculate))
Show Answer
Compute the weighted average of x and y, with x weighted at 70%.
Monthly performance report.
False
Compute the weighted average of x and y, with x weighted at 70%.
Line 1: calculate.__doc__ returns the raw docstring string. Because the docstring is a single line, there is no indented continuation line for inspect.getdoc() to clean up.
Line 2: Report.__doc__ returns the class docstring string.
Line 3: calculate.__doc__ is a non-empty string, so is None evaluates to False.
Line 4: inspect.getdoc(calculate) returns the same string as __doc__ in this case because there is no extra indentation to strip. If the docstring had spanned several indented lines, getdoc() would have removed the common indentation, and on Python versions before 3.13 the result would differ from the raw __doc__.
Level 2 - Debug the Documentation
The following code has three distinct documentation problems. Identify each one, explain why it is a problem, and show the corrected version.
def transfer_funds(
from_account: str,
to_account: str,
amount: float,
currency: str = "USD",
) -> bool:
# Transfer funds between accounts
"""
Transfer funds.
Args:
from_account (str): source account
to_account (str): destination account
amount (float): amount to transfer
currency (str): currency
Returns:
bool: result
"""
if amount <= 0:
raise ValueError("Amount must be positive")
# result = old_transfer_api(from_account, to_account, amount)
return _execute_transfer(from_account, to_account, amount, currency)
Show Answer
Problem 1: Comment before the docstring
The # Transfer funds between accounts comment sits directly above the docstring. Because comments are not statements, the docstring is still the first statement and __doc__ is still populated, but the comment is redundant: it says nothing the docstring does not already say. Delete the comment.
Problem 2: Docstring duplicates type annotations without adding meaning
Every Args: entry says (str): or (float): - information already in the type annotations. The descriptions ("source account", "amount to transfer", "currency") add no domain information. Returns: bool: result is completely uninformative. A caller reading this docstring learns nothing beyond what the signature already conveys.
Problem 3: Commented-out code
# result = old_transfer_api(from_account, to_account, amount) is commented-out production code. This should be deleted and preserved in version control history instead.
Corrected version:
def transfer_funds(
    from_account: str,
    to_account: str,
    amount: float,
    currency: str = "USD",
) -> bool:
    """
    Move funds between two accounts atomically via the payment service.

    Debits from_account and credits to_account in a single transaction.
    If either leg fails, the entire transfer is rolled back. The operation
    is idempotent with respect to the transfer ID generated internally.

    Args:
        from_account: Account identifier of the sender (format: "ACC-XXXXXX").
        to_account: Account identifier of the recipient (format: "ACC-XXXXXX").
        amount: Transfer amount in the minor currency unit (e.g., cents for USD).
            Must be strictly positive.
        currency: ISO 4217 three-letter currency code. Defaults to "USD".

    Returns:
        True if the transfer completed successfully and was confirmed
        by the payment service. False if the service accepted the request
        but confirmation is pending (caller should poll for status).

    Raises:
        ValueError: If amount is not positive.
        InsufficientFundsError: If from_account lacks the required balance.
        AccountNotFoundError: If either account ID does not exist.
    """
    if amount <= 0:
        raise ValueError("Amount must be positive")
    return _execute_transfer(from_account, to_account, amount, currency)
Level 3 - Design Challenge
Design and implement a decorator @require_docstring that:
- Can be applied to any function
- Raises ValueError at decoration time (when the decorator is applied, not when the function is called) if the function has no docstring
- Raises ValueError if the docstring is shorter than 20 characters (too short to be meaningful)
- Leaves the function's behavior, signature, __name__, and __doc__ completely unchanged
- Produces a clear error message that includes the function name and the actual docstring length
Then write a second decorator @document(summary, *, author=None) that:
- Accepts a required summary string and an optional author keyword argument
- Prepends the summary (and author if provided) to the function's existing __doc__ (or sets it if there is none)
- Also adds a __documented_by__ attribute to the function
Show both decorators and demonstrate their use.
Answer
import inspect
from typing import Callable, Optional, TypeVar

F = TypeVar("F", bound=Callable)


def require_docstring(func: F) -> F:
    """
    Decorator that enforces a meaningful docstring on the decorated function.

    Raises ValueError at decoration time (not at call time) if the function
    has no docstring or if the docstring is shorter than 20 characters.

    Args:
        func: The function to validate.

    Returns:
        The original function, unmodified.

    Raises:
        ValueError: If the docstring is missing or shorter than 20 characters.
    """
    doc = inspect.getdoc(func)
    if doc is None:
        raise ValueError(
            f"Function '{func.__name__}' has no docstring. "
            f"All public functions must be documented."
        )
    if len(doc) < 20:
        raise ValueError(
            f"Function '{func.__name__}' has a docstring of only "
            f"{len(doc)} characters (minimum: 20). "
            f"Docstring: {doc!r}"
        )
    # Return the function completely unchanged - no wrapper needed
    return func
def document(summary: str, *, author: Optional[str] = None) -> Callable[[F], F]:
    """
    Decorator factory that prepends a summary line to a function's docstring.

    If author is provided, it is included in the prepended text and also
    stored on the function as __documented_by__.

    Args:
        summary: A brief description to prepend to the existing docstring.
        author: Optional name or identifier of the documenting author.

    Returns:
        A decorator that modifies the function's __doc__ in place.
    """
    def decorator(func: F) -> F:
        # Build the text to prepend. The "Author: ..." line format is an
        # illustrative choice, not an established convention.
        header = summary if author is None else f"{summary}\n\nAuthor: {author}"
        existing = inspect.getdoc(func) or ""
        if existing:
            func.__doc__ = f"{header}\n\n{existing}"
        else:
            func.__doc__ = header
        if author is not None:
            func.__documented_by__ = author  # type: ignore[attr-defined]
        return func
    return decorator
# --- Demonstration ---

@require_docstring
def compute_tax(income: float, rate: float = 0.21) -> float:
    """
    Compute flat-rate tax on gross income.

    Returns income multiplied by rate, rounded to two decimal places.
    """
    return round(income * rate, 2)


# The summary text and author name here are placeholder values for the demo.
@document("Legacy helper kept for backward compatibility.", author="platform-team")
def _legacy_hash(value: str) -> int:
    """
    Compute a non-cryptographic hash of value for legacy table lookup.

    Uses the FNV-1a algorithm. Not suitable for security-sensitive use.
    """
    h = 2166136261
    for char in value:
        h ^= ord(char)
        h = (h * 16777619) & 0xFFFFFFFF
    return h


print(compute_tax.__doc__)
print("---")
print(_legacy_hash.__doc__)
print("---")
print(getattr(_legacy_hash, "__documented_by__", "not set"))

# This should raise ValueError at decoration time
try:
    @require_docstring
    def bad_function():
        """Too short."""
        pass
except ValueError as exc:
    print(f"Caught: {exc}")
The key design insight for require_docstring: because we only need to validate - not wrap - the function, we return the original function object directly rather than a wrapper. This preserves the function's identity completely. The validation happens once at decoration time, which is exactly the right moment: you discover missing documentation when the module is loaded, not when the function is first called at runtime. The document decorator uses inspect.getdoc() rather than func.__doc__ directly to handle indentation normalization before prepending the summary.
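The two points in that explanation can be checked directly. The following is a minimal sketch, not part of the original answer: sample and validate_only are hypothetical names, and the raw docstring text is assigned by hand so the result does not depend on how a given interpreter version stores docstring indentation (some newer Python versions already strip part of it at compile time).

```python
import inspect


def sample():
    pass


# Assign the raw text explicitly so the comparison below behaves the same
# on every interpreter version.
sample.__doc__ = "\n    Summary line.\n\n    More detail here.\n    "

raw = sample.__doc__             # exactly what was stored
clean = inspect.getdoc(sample)   # indentation and surrounding blank lines removed

print(repr(raw))
print(repr(clean))


def validate_only(func):
    """Hypothetical validate-only decorator: check, then return func itself."""
    if not inspect.getdoc(func):
        raise ValueError(f"{func.__name__} has no docstring")
    return func


decorated = validate_only(sample)

# No wrapper was created, so the decorated name is the very same object.
print(decorated is sample)
```

Because validate_only hands back the object it received, attributes such as __name__ and __doc__ need no copying, which is why functools.wraps is unnecessary for this kind of decorator.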
Quick Reference Cheatsheet
| Topic | Detail |
|---|---|
| Comment syntax | # comment text - # followed by one space |
| Inline comment spacing | Two spaces before #, one space after |
| Docstring placement | First statement of module, class, function, or method |
| Single-line docstring | """Do X and return Y.""" - all on one line, ends with . |
| Multi-line docstring | Summary line, blank line, extended description, closing """ on own line |
| __doc__ attribute | func.__doc__, Class.__doc__, module.__doc__ |
| Normalized docstring | inspect.getdoc(obj) - strips leading indentation |
| Function signature | inspect.signature(func) - returns Signature object |
| Interactive docs | help(func) - reads __doc__ and formats for terminal |
| Google style | Args:, Returns:, Raises:, Example: sections |
| NumPy style | Parameters, Returns, Notes with dashed underlines |
| Sphinx style | :param name:, :type name:, :returns:, :rtype: |
| Type info in docstring | Omit type if using type annotations - let hints carry it |
| Commented-out code | Delete it; use git log to retrieve previous versions |
| Imperative mood | "Return X" not "Returns X" - PEP 257 |
| Optimized mode | python -OO strips docstrings (__doc__ becomes None); plain python -O removes only assert statements |
| Module docstring | First statement in .py file before any imports |
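As a side-by-side illustration of the three style rows in the table, here is a hedged sketch of one small function documented three ways. The clamp_* functions are hypothetical examples, not taken from any library, and types are left to the annotations wherever the style allows it.

```python
import inspect


def clamp_google(value: float, low: float, high: float) -> float:
    """Return value limited to the inclusive range [low, high].

    Args:
        value: Number to limit.
        low: Lower bound of the range.
        high: Upper bound of the range. Must not be less than low.

    Returns:
        low if value is below the range, high if above, otherwise value.

    Raises:
        ValueError: If low is greater than high.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


def clamp_numpy(value: float, low: float, high: float) -> float:
    """Return value limited to the inclusive range [low, high].

    Parameters
    ----------
    value
        Number to limit.
    low
        Lower bound of the range.
    high
        Upper bound of the range. Must not be less than low.

    Returns
    -------
    float
        low if value is below the range, high if above, otherwise value.

    Raises
    ------
    ValueError
        If low is greater than high.
    """
    return clamp_google(value, low, high)


def clamp_sphinx(value: float, low: float, high: float) -> float:
    """Return value limited to the inclusive range [low, high].

    :param value: Number to limit.
    :param low: Lower bound of the range.
    :param high: Upper bound of the range. Must not be less than low.
    :returns: low if value is below the range, high if above, otherwise value.
    :raises ValueError: If low is greater than high.
    """
    return clamp_google(value, low, high)


# All three carry the same contract; only the markup differs.
for func in (clamp_google, clamp_numpy, clamp_sphinx):
    print(func.__name__, "->", inspect.getdoc(func).splitlines()[0])
```

The summary line and the semantic content are identical in all three versions, which is the point of picking one format per project: the choice affects tooling and readability, not what the contract says.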
Key Takeaways
- Comments and docstrings serve fundamentally different audiences at different times: comments communicate intent to future maintainers reading source code, docstrings communicate interface contracts to callers who may never see the source.
- Comments are stripped at parse time and have no runtime presence; docstrings are stored in __doc__ and are accessible to help(), inspect, IDEs, and documentation generators at runtime.
- The best comment is no comment - if the code can be rewritten to be self-explanatory, that is always preferable to annotating confusing code.
- When you do write comments, explain the "why" (the non-obvious decision, the domain constraint, the historical context), never the "what" (which is readable from the code itself).
- Write docstrings in one consistent format across a project - Google style for most application code, NumPy style for scientific libraries, Sphinx style for projects using Sphinx without the napoleon extension.
- Modern Python uses type annotations for type information and docstrings for semantic meaning; do not duplicate type information in the Args: section if your function has annotations.
- Commented-out code is a maintenance hazard; delete it and rely on version control - the commit history is your safety net.
- An outdated comment that contradicts the actual code behavior is worse than no comment at all; treat comment updates as a mandatory part of every code review.
- The python -OO optimization flag strips docstrings, so __doc__ is None at runtime (plain -O removes only assert statements); code that depends on __doc__ being non-None must handle this case or document the requirement.
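
One way to handle the missing-docstring case from the last takeaway is to treat the docstring as optional wherever it is read. This is a minimal sketch under that assumption; describe, ping, and undocumented are hypothetical names used only for illustration. An undocumented function stands in for what any function looks like once docstrings have been stripped (Python does this under -OO).

```python
import inspect


def describe(func) -> str:
    """Return the first docstring line of func, or a fallback if there is none."""
    doc = inspect.getdoc(func)  # None when no docstring is available
    if not doc:
        return f"{func.__name__}: (no documentation available)"
    return f"{func.__name__}: {doc.splitlines()[0]}"


def ping():
    """Check that the service is reachable."""


def undocumented():
    pass


print(describe(ping))
print(describe(undocumented))
```

The fallback branch keeps tools such as CLI help builders or plugin registries working when docstrings are absent, instead of failing on an attribute access against None.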
