Request-Response Lifecycle - Every Step From Client to Handler and Back
Reading time: ~30 minutes | Level: Intermediate → Engineering
Before reading further, trace this request:
curl -X POST https://api.example.com/users \
  -H "Authorization: Bearer token123" \
  -H "Content-Type: application/json" \
  -d '{"name": "Alice", "email": "alice@example.com"}'
Most developers describe four steps: "the client sends the request, the server receives it, the handler runs, the response goes back." The actual lifecycle has more than fifteen distinct steps - and each step is a possible failure point with a specific HTTP status code, a specific error message, and a specific log entry that tells you exactly where in the stack the failure occurred.
Knowing the full lifecycle is not academic. When a request returns 502 Bad Gateway in production, you need to know immediately: is the problem in Nginx (reverse proxy layer), in Uvicorn (ASGI server layer), in your middleware, in your route handler, or in your database? Each one requires a different fix. Engineers who understand all fifteen steps debug production incidents in minutes. Engineers who understand four steps debug them in hours.
What You Will Learn
- The client-side steps before any bytes reach your server
- Network infrastructure: load balancers, reverse proxies, SSL termination
- Server-side OS and socket layers
- WSGI and ASGI server internals - how Gunicorn and Uvicorn work
- The full middleware stack and its execution order
- Route matching, dependency resolution, and Pydantic validation
- Handler execution and response serialisation
- What status codes each layer produces on failure
- X-Request-ID propagation for distributed tracing
- Content negotiation and keep-alive connection reuse
Prerequisites
- Lesson 01 (HTTP Deep Dive) - HTTP methods, headers, status codes
- Lesson 04 (FastAPI) - middleware, dependency injection, ASGI
- Lesson 06 (Middleware) - middleware stack order and patterns
Part 1 - The Full 15-Step Lifecycle
Each numbered step is a distinct failure domain with its own error signature. The following sections walk through each layer in depth.
Part 2 - Client Side: Before Any Server Sees the Request
Step 1 - DNS Resolution
The client does not know api.example.com's IP address. It queries a DNS resolver (typically your ISP's or a configured resolver like 8.8.8.8). The resolver walks the DNS hierarchy: root → .com nameservers → example.com's authoritative nameserver → returns an A or AAAA record.
DNS caches results for the TTL (time-to-live) of the record. A low TTL (60 seconds) allows fast failover when you change IP addresses. A high TTL (3600 seconds) reduces resolver load but slows down IP changes.
Failure signature: DNS resolution failure produces a client-side error, not an HTTP status. curl reports Could not resolve host. Your server never sees the request. DNS failures are invisible in server logs.
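To observe this step in isolation, the sketch below asks the operating system's resolver for a host's addresses using only the standard library. The resolve helper is a name invented for this illustration. A failed lookup raises socket.gaierror, which is the Python counterpart of curl's Could not resolve host.

```python
import socket


def resolve(host: str, port: int = 443) -> list[str]:
    """Return the distinct IP addresses the OS resolver reports for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling resolve("api.example.com") inside a try/except socket.gaierror block separates "the name does not resolve" from every later failure: if it raises, no TCP connection was attempted and no server log entry can exist.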
Steps 2–3 - TCP Connection and TLS Handshake
The client opens a TCP connection to the resolved IP on port 443 (HTTPS). The TCP three-way handshake (SYN → SYN-ACK → ACK) establishes a reliable bidirectional stream. This takes one round-trip time (RTT) - typically 1–100 ms depending on geography.
The TLS handshake follows immediately. TLS 1.3 (current standard) requires one RTT for the handshake (ClientHello → ServerHello + Certificate + Finished → ClientFinished). TLS 1.2 required two RTTs. The handshake negotiates:
- The cipher suite (e.g., TLS_AES_128_GCM_SHA256)
- The server's certificate (proves identity, signed by a trusted CA)
- Session keys (derived from Diffie-Hellman key exchange - the server's private key never leaves the server)
Failure signatures:
| Failure | Client error | HTTP status |
|---|---|---|
| TCP connection refused | Connection refused | - (no HTTP) |
| TCP timeout | Connection timed out | - (no HTTP) |
| TLS certificate expired | SSL certificate problem | - (no HTTP) |
| TLS certificate mismatch | hostname doesn't match | - (no HTTP) |
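In a Python client, the same failures surface as exceptions rather than status codes. The sketch below maps standard-library exception types to the step that failed. The failing_layer function and its labels are illustrative, not part of any library.

```python
import socket
import ssl


def failing_layer(exc: BaseException) -> str:
    """Map a client-side connection exception to the lifecycle step that failed."""
    # All of these are OSError subclasses, so test the most specific first.
    if isinstance(exc, socket.gaierror):
        return "DNS (step 1)"
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "TLS certificate (step 3)"
    if isinstance(exc, ssl.SSLError):
        return "TLS handshake (step 3)"
    if isinstance(exc, ConnectionRefusedError):
        return "TCP refused (step 2)"
    if isinstance(exc, TimeoutError):
        # A stalled TLS handshake also raises TimeoutError, so this label
        # is a best guess rather than a certainty.
        return "TCP timeout (step 2)"
    return "unknown"
```

None of these exceptions carries an HTTP status, which matches the last column of the table: the request never reached a layer that speaks HTTP.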
Step 4 - HTTP Request Serialisation
The client serialises the request into bytes:
POST /users HTTP/1.1\r\n
Host: api.example.com\r\n
Authorization: Bearer token123\r\n
Content-Type: application/json\r\n
Content-Length: 47\r\n
Accept: application/json\r\n
\r\n
{"name": "Alice", "email": "alice@example.com"}
HTTP/2 frames this differently (binary, multiplexed over a single TCP connection), but the semantic content is the same. The TLS layer encrypts these bytes before sending.
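The serialisation step can be reproduced in a few lines. This sketch builds the HTTP/1.1 bytes by hand, assuming the body uses the placeholder address alice@example.com (an assumption; at 47 bytes it agrees with the Content-Length shown above). Real clients do this for you.

```python
import json

# Placeholder payload; the email address is an assumption for illustration.
body = json.dumps({"name": "Alice", "email": "alice@example.com"}).encode("utf-8")

headers = [
    ("Host", "api.example.com"),
    ("Authorization", "Bearer token123"),
    ("Content-Type", "application/json"),
    ("Content-Length", str(len(body))),  # counted in bytes, not characters
    ("Accept", "application/json"),
]

request_line = b"POST /users HTTP/1.1\r\n"
header_block = "".join(f"{name}: {value}\r\n" for name, value in headers).encode("ascii")
# A blank line (a bare CRLF) separates the headers from the body.
raw_request = request_line + header_block + b"\r\n" + body
```

Content-Length must be the byte length of the encoded body. A client that counts characters instead will send a wrong length as soon as the body contains non-ASCII text.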
Part 3 - Network Infrastructure
Step 5 - Load Balancer (L4 vs L7)
L4 load balancers (TCP level - HAProxy in TCP mode, AWS NLB) operate on IP addresses and ports. They forward raw TCP streams without inspecting HTTP content. They cannot route based on URL path or headers.
L7 load balancers (HTTP level - AWS ALB, HAProxy in HTTP mode, GCP HTTPS LB) terminate the TCP/TLS connection and inspect the HTTP content. They can:
- Route to different backend pools based on URL path (/api/* → API servers, /static/* → S3)
- Perform health checks by sending real HTTP requests (GET /health)
- Rewrite headers, add X-Forwarded-For, strip internal headers
- Implement sticky sessions, circuit breaking, and retry logic
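Failure signatures: A load balancer that cannot reach any healthy backend typically returns 502 Bad Gateway or 503 Service Unavailable, depending on the product. When a backend accepts the request but does not answer in time, the load balancer returns 504 Gateway Timeout.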
Step 6 - Reverse Proxy: Nginx
Nginx sits between the load balancer and your Python application server. It handles:
- SSL termination (if the LB passes TLS traffic through): decrypts HTTPS, forwards plain HTTP upstream
- Header injection: adds X-Forwarded-For (original client IP), X-Real-IP, X-Forwarded-Proto: https
- Connection pooling: maintains persistent HTTP connections to upstream (Uvicorn) - avoiding per-request TCP handshake overhead
- Static file serving: serves /static/ directly from disk, never reaching Python
- Buffering: buffers the full request body before passing to upstream (protects slow Python apps from slow clients)
- Rate limiting: limit_req_zone - coarse IP-level rate limiting before your application
upstream fastapi_app {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    keepalive 32;  # persistent connections to upstream
}
server {
    listen 443 ssl http2;
    server_name api.example.com;
    ssl_certificate /etc/ssl/api.example.com.crt;
    ssl_certificate_key /etc/ssl/api.example.com.key;
    location / {
        proxy_pass http://fastapi_app;
        proxy_http_version 1.1;          # upstream keepalive needs HTTP/1.1
        proxy_set_header Connection "";  # ... and a cleared Connection header
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 30s;
        proxy_connect_timeout 5s;
    }
    location /static/ {
        root /var/www;
        expires 1y;
    }
}
Failure signatures: Nginx returns errors to the client if the upstream is unreachable:
| Nginx error | HTTP status | Cause |
|---|---|---|
| Upstream refused connection | 502 Bad Gateway | Uvicorn not running |
| Upstream read timeout | 504 Gateway Timeout | Handler took too long |
| Client body too large | 413 Content Too Large | Exceeds client_max_body_size |
Part 4 - Server Side: OS and ASGI Server
Step 8 - OS: TCP Socket Accept
The operating system's network stack receives the TCP segment, performs IP routing, and deposits the data into a kernel receive buffer associated with the listening socket. The ASGI server calls accept() to receive a new connection file descriptor. The OS returns the raw bytes - it knows nothing about HTTP.
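The accept() call is easy to observe directly. In this minimal sketch, one process plays both client and server on the loopback interface. The accept_one_request name and the /health request line are made up for the example.

```python
import socket


def accept_one_request() -> bytes:
    """Listen on a free local port, accept one connection, return its raw bytes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen()
    port = listener.getsockname()[1]

    # Play the client in the same process. The kernel completes the TCP
    # handshake and queues the connection until accept() is called.
    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(b"GET /health HTTP/1.1\r\nHost: localhost\r\n\r\n")

    conn, _peer = listener.accept()  # a new file descriptor for this connection
    data = conn.recv(65536)          # raw bytes: the OS does no HTTP parsing

    for sock in (conn, client, listener):
        sock.close()
    return data
```

What comes back is an undifferentiated byte string. Turning it into a method, a path, and headers is the job of the next step.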
Step 9 - ASGI Server: Uvicorn
Uvicorn is an ASGI server built on Python's asyncio and httptools (a fast HTTP parser written in C). A single Uvicorn worker runs a single asyncio event loop.
Incoming bytes (from OS)
↓
httptools parser → HTTP headers, method, path, query string, body
↓
Build ASGI scope dict:
{
    "type": "http",
    "method": "POST",
    "path": "/users",
    "query_string": b"",
    "headers": [(b"authorization", b"Bearer token123"), ...],
    "client": ("127.0.0.1", 54321),
    "server": ("0.0.0.0", 8000),
}
↓
await app(scope, receive, send) ← calls your FastAPI app object
In production, Uvicorn is typically managed by Gunicorn (gunicorn -w 4 -k uvicorn.workers.UvicornWorker). Gunicorn is the process manager: it spawns multiple Uvicorn worker processes, monitors them, restarts crashed workers, and handles graceful shutdown. Each Uvicorn worker is a full asyncio event loop capable of handling thousands of concurrent connections.
WSGI and ASGI are interfaces, not implementations. WSGI (def app(environ, start_response)) is a synchronous calling convention. ASGI (async def app(scope, receive, send)) is an async calling convention. Neither interface is magic - they are contracts that define how the server calls your framework. The performance difference comes from what the framework does inside the interface: WSGI blocks a thread per request; ASGI multiplexes many requests on one event loop.
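To make the ASGI contract concrete, here is a complete application written against the interface alone, plus a tiny driver that plays the server's role. No framework or server is imported; call_app is a hypothetical stand-in for what Uvicorn does after parsing the request.

```python
import asyncio


async def app(scope, receive, send):
    """A complete ASGI application: no framework, just the calling convention."""
    assert scope["type"] == "http"
    body = f"{scope['method']} {scope['path']}".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": body})


async def call_app(method: str, path: str) -> list[dict]:
    """Play the server's role: build a scope, call the app, collect its messages."""
    scope = {"type": "http", "method": method, "path": path,
             "query_string": b"", "headers": []}
    sent: list[dict] = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app(scope, receive, send)
    return sent


messages = asyncio.run(call_app("POST", "/users"))
```

A FastAPI instance is called with exactly these three arguments. Everything the framework adds (routing, validation, dependency injection) happens between the call and the two send() messages.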
Part 5 - Framework Layer: Middleware, Routing, Validation
Steps 10–11 - Middleware Stack
FastAPI (via Starlette) executes middleware in a specific order that is critical to understand. Middleware added last is outermost - it wraps everything added before it. Given:
app.add_middleware(AuthMiddleware) # added first → innermost
app.add_middleware(LoggingMiddleware) # added second → middle
app.add_middleware(CORSMiddleware) # added last → outermost
Execution order:
Request: CORSMiddleware → LoggingMiddleware → AuthMiddleware → Handler
Response: AuthMiddleware → LoggingMiddleware → CORSMiddleware → Client
Each middleware calls await call_next(request) to pass the request deeper. The response flows back through middleware in reverse as call_next returns.
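The wrapping order is easier to trust once you have watched it run. The sketch below is a simplified model, not Starlette's implementation: each middleware is a plain async function wrapping the next one, and build_stack mimics the rule that the middleware added last ends up outermost. All names are illustrative.

```python
import asyncio

calls: list[str] = []


def make_middleware(name: str):
    """Build a middleware factory that records when it is entered and left."""
    def wrap(inner):
        async def middleware(request):
            calls.append(f"{name} in")
            response = await inner(request)  # same role as call_next(request)
            calls.append(f"{name} out")
            return response
        return middleware
    return wrap


async def handler(request):
    calls.append("handler")
    return "response"


def build_stack(endpoint, added_in_order):
    """Wrap so that the middleware added last ends up outermost."""
    app = endpoint
    for wrap in added_in_order:
        app = wrap(app)
    return app


stack = build_stack(handler, [make_middleware("Auth"),
                              make_middleware("Logging"),
                              make_middleware("CORS")])
asyncio.run(stack("request"))
```

After the run, calls holds the sequence CORS in, Logging in, Auth in, handler, Auth out, Logging out, CORS out, which is the request and response order shown above.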
Failure signatures from middleware:
| Middleware | Failure | Status code |
|---|---|---|
| CORS | Origin not in allowed list | 400 or no CORS headers (browser blocks) |
| Authentication | Missing or invalid token | 401 Unauthorized |
| Rate limiting | Too many requests | 429 Too Many Requests |
| Trusted host | Host header not in allowlist | 400 Bad Request |
Middleware order matters in ways that can create security holes. Authentication middleware must run before any business logic. If you accidentally place logging middleware that reads request.state.user_id before authentication middleware sets it, you get AttributeError. More critically: if you place rate limiting before authentication, unauthenticated requests consume rate limit budget - potential for denial-of-service amplification.
Step 12 - Route Matching
Starlette's router matches the incoming method and path against registered routes. It extracts path parameters from the URL:
POST /users/42/orders/99
Route: /users/{user_id}/orders/{order_id}
Match: user_id=42, order_id=99 (as strings initially)
If no route matches: 404 Not Found. If the path matches but the method does not: 405 Method Not Allowed (with an Allow header listing accepted methods).
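The 404 versus 405 distinction falls out naturally from how matching works. The following sketch uses a regular expression per route; it illustrates the idea only and is not Starlette's router. ROUTES, compile_path, and match are hypothetical names.

```python
import re

# Hypothetical route table: (method, path template).
ROUTES = [
    ("GET", "/users/{user_id}"),
    ("POST", "/users/{user_id}/orders/{order_id}"),
]


def compile_path(template: str) -> re.Pattern:
    """Turn '/users/{user_id}' into a regex with one named group per parameter."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


def match(method: str, path: str):
    """Return (status, path_params, allowed_methods) for an incoming request."""
    allowed = set()
    for route_method, template in ROUTES:
        found = compile_path(template).match(path)
        if not found:
            continue
        if route_method == method:
            return 200, found.groupdict(), set()
        allowed.add(route_method)  # the path matched, the method did not
    if allowed:
        return 405, {}, allowed    # these would go in the Allow header
    return 404, {}, set()
```

The extracted parameters are strings at this point: match("POST", "/users/42/orders/99") yields user_id "42", not the integer 42. Conversion to typed values happens later, in validation.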
Step 13 - Dependency Resolution
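FastAPI builds a directed acyclic graph (DAG) of all Depends() calls for the matched endpoint and resolves them in topological order (leaves first). Dependencies are awaited one at a time, even when they are async and independent of each other. Within a single request each dependency's result is cached, so a dependency shared by several others runs only once.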
If any dependency raises HTTPException, the entire request fails immediately with that status code - the handler never runs.
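Leaves-first resolution with a per-request cache can be sketched in plain Python. This is a toy model, not FastAPI's resolver: GRAPH, solve, and the three dependency functions are invented for the example.

```python
import asyncio


async def get_settings():
    return {"db_url": "postgresql://example"}


async def get_db(settings):
    return f"connection to {settings['db_url']}"


async def get_current_user(db):
    return {"id": 88, "via": db}


# Hypothetical graph: each name maps to (function, names it depends on).
GRAPH = {
    "settings": (get_settings, []),
    "db": (get_db, ["settings"]),
    "user": (get_current_user, ["db"]),
}


async def solve(name: str, cache: dict, order: list) -> object:
    """Resolve one dependency, resolving everything it depends on first."""
    if name in cache:  # each dependency runs at most once per request
        return cache[name]
    func, needs = GRAPH[name]
    args = [await solve(dep, cache, order) for dep in needs]
    cache[name] = await func(*args)
    order.append(name)
    return cache[name]


order: list[str] = []
user = asyncio.run(solve("user", {}, order))
```

The order list ends up as settings, db, user. If get_db raised an exception, get_current_user would never be called, which is the short-circuit behaviour described above.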
Step 14 - Pydantic Validation
FastAPI collects parameters from three sources:
- Path parameters: extracted from the URL by the router (string → typed by Pydantic)
- Query parameters: parsed from ?key=value pairs in the URL
- Request body: read via the receive() callable, parsed as JSON, validated against the Pydantic model
If validation fails at any parameter, FastAPI raises RequestValidationError, which the default exception handler converts to 422 Unprocessable Entity with a structured body identifying every failed field.
Step 15 - Handler Execution
The handler runs with all parameters validated and typed. Any exception that is not caught here propagates to FastAPI's exception handler system. An uncaught non-HTTPException becomes 500 Internal Server Error.
Part 6 - Content Negotiation
HTTP allows clients to declare what content types they can accept and what type the request body is.
Request headers:
Content-Type: application/json ← format of the request body
Accept: application/json ← what the client wants in return
Accept-Encoding: gzip, deflate, br ← compression algorithms client supports
Accept-Language: en-US,en;q=0.9 ← language preference
FastAPI's default behaviour:
- If Content-Type is not application/json and the endpoint expects a JSON body, FastAPI returns 422 (Pydantic cannot parse the body as JSON).
- By default, FastAPI returns application/json responses regardless of Accept. If you need content negotiation (returning XML vs JSON based on Accept), you must implement it manually.
- GZipMiddleware, once added, handles Accept-Encoding: gzip automatically - it compresses responses larger than its minimum_size threshold (500 bytes by default).
What happens when Content-Type mismatches:
# Sending form data to a JSON endpoint
curl -X POST https://api.example.com/users \
-H "Content-Type: application/x-www-form-urlencoded" \
  -d "email=alice@example.com"
# Result: 422 - Pydantic tried to parse URL-encoded bytes as JSON and failed
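If you do need to honour Accept, the manual implementation starts with parsing the header. The sketch below handles q-values and wildcards but ignores media type parameters other than q. parse_accept and negotiate are illustrative names, not FastAPI APIs.

```python
from typing import Optional


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, most preferred first."""
    entries = []
    for part in header.split(","):
        media_type, *params = [piece.strip() for piece in part.split(";")]
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when omitted
        for param in params:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type.lower(), q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(accept: str, supported: list[str]) -> Optional[str]:
    """Pick the supported media type the client prefers, or None (respond 406)."""
    for media_type, q in parse_accept(accept or "*/*"):
        if q == 0:
            continue  # q=0 means "explicitly not acceptable"
        if media_type == "*/*":
            return supported[0]
        for candidate in supported:
            main_type = candidate.split("/")[0]
            if media_type in (candidate, f"{main_type}/*"):
                return candidate
    return None
```

A handler could call negotiate(request.headers.get("accept", ""), ["application/json", "application/xml"]) and choose a response class from the result, returning 406 Not Acceptable when it is None.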
Part 7 - Request IDs and Distributed Tracing
Without request IDs, correlating log entries across a microservice architecture is nearly impossible. A single user action might produce log entries in 5 different services, 20 different log lines each - all interleaved with other users' requests.
The solution is to generate a UUID at the earliest possible point (the outermost middleware) and propagate it through every layer:
import uuid
import logging

import structlog
from fastapi import FastAPI, Request
from starlette.middleware.base import BaseHTTPMiddleware

logger = logging.getLogger(__name__)
app = FastAPI()

class RequestIDMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Use client-provided ID (from upstream service) or generate a new one
        request_id = request.headers.get("X-Request-ID") or str(uuid.uuid4())
        request.state.request_id = request_id
        # Bind to structured logging context so all log lines in this request
        # include the request_id automatically
        with structlog.contextvars.bound_contextvars(request_id=request_id):
            response = await call_next(request)
        # Propagate to client so they can report it in bug reports
        response.headers["X-Request-ID"] = request_id
        return response

# Register it after all other middleware so that it is outermost
app.add_middleware(RequestIDMiddleware)
Propagate the request ID to downstream services:
import httpx
from fastapi import Request

# get_service_token() is assumed to be defined elsewhere in your application
async def call_downstream_service(request: Request, data: dict) -> dict:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://internal-service/api/endpoint",
            json=data,
            headers={
                "X-Request-ID": request.state.request_id,  # forward the ID
                "Authorization": f"Bearer {get_service_token()}",
            },
        )
        return response.json()
With this pattern, when a customer reports "my request failed, I got X-Request-ID: abc-123", you search all service logs for request_id=abc-123 and see the complete call chain in chronological order.
Add X-Request-ID middleware to every service you build. It is the single highest-leverage observability investment you can make. The implementation is 10 lines of code. The debugging value in a distributed system is enormous - it converts hour-long production investigations into 2-minute log searches.
Part 8 - Keep-Alive and Connection Reuse
Opening a new TCP connection for every HTTP request is expensive: DNS lookup + TCP handshake + TLS handshake costs 100–400 ms before the first byte of your request arrives. HTTP keep-alive reuses the same TCP connection for multiple requests.
# Without keep-alive: one TCP connection per request
Client → DNS → TCP handshake → TLS handshake → POST /users → response → connection closed
Client → DNS (cached) → TCP handshake → TLS handshake → GET /users/1 → response → closed
# With keep-alive: one connection, multiple requests
Client → DNS → TCP handshake → TLS handshake → [persistent connection]
    → POST /users → response (connection stays open)
    → GET /users/1 → response (same connection)
    → GET /users/2 → response (same connection)
    → ... idle timeout → connection closed
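The nginx.conf shipped with most packages sets keepalive_timeout 65, so Nginx keeps idle client connections open for 65 seconds (the compiled-in default is 75 seconds). Nginx also maintains a pool of persistent connections to Uvicorn (keepalive 32 in the upstream block) - avoiding per-request TCP overhead between Nginx and Python. For HTTP upstreams, the Nginx documentation says to set proxy_http_version 1.1 and clear the Connection header so that this pool is actually used.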
HTTP/2 takes this further: it multiplexes multiple requests over a single TCP connection simultaneously (not sequentially). A browser loading a page with 50 resources sends all 50 requests in parallel over one HTTP/2 connection.
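Connection reuse can be verified with the standard library alone. In this sketch a throwaway local server answers two requests from one http.client.HTTPConnection; the client's source port staying the same shows that both requests travelled over a single TCP connection. The handler and function names are invented for the demonstration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 connections are persistent by default

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length tells the client where this response ends, so the
        # socket can stay open for the next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def client_ports_for_two_requests() -> tuple[int, int]:
    """Send two requests on one connection; return the client port used for each."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        ports = []
        for path in ("/users/1", "/users/2"):
            conn.request("GET", path)
            conn.getresponse().read()  # drain the body before reusing the socket
            ports.append(conn.sock.getsockname()[1])
        conn.close()
        return ports[0], ports[1]
    finally:
        server.shutdown()
        server.server_close()
```

If the server closed the connection after each response, the second request would open a new socket and the two port numbers would differ.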
Part 9 - Where Errors Occur at Each Layer
This table maps each step to its failure mode and resulting status code:
| Step | Layer | Failure | Status / Error |
|---|---|---|---|
| 1 | DNS | Name not found | Client error - no HTTP |
| 2 | TCP | Connection refused | Client error - no HTTP |
| 3 | TLS | Certificate invalid | Client error - no HTTP |
| 5 | Load balancer | All backends down | 502 Bad Gateway |
| 5 | Load balancer | Backend timeout | 504 Gateway Timeout |
| 6 | Nginx | client_max_body_size exceeded | 413 Content Too Large |
| 6 | Nginx | Upstream not running | 502 Bad Gateway |
| 6 | Nginx | Upstream read timeout | 504 Gateway Timeout |
| 9 | Uvicorn | Worker crashed | 502 Bad Gateway (Nginx sees disconnect) |
| 11 | Auth middleware | Invalid token | 401 Unauthorized |
| 11 | Rate limit middleware | Quota exceeded | 429 Too Many Requests |
| 11 | CORS middleware | Origin rejected | 400 / no CORS headers |
| 12 | Router | Path not found | 404 Not Found |
| 12 | Router | Method not allowed | 405 Method Not Allowed |
| 13 | Dependencies | Auth check fails | 401 / 403 |
| 14 | Pydantic | Invalid body/params | 422 Unprocessable Entity |
| 15 | Handler | Unhandled exception | 500 Internal Server Error |
| 15 | Handler | HTTPException(404) raised | 404 Not Found |
Never log request bodies that may contain passwords, credit card numbers, or other sensitive data. The POST body to /login contains the user's plaintext password. The POST body to /payments may contain a card number. Logging these to your centralised log system creates a compliance violation and a security breach vector. Log only safe fields: method, path, status code, duration, request ID, and authenticated user ID.
Part 10 - Deploying: Gunicorn + Uvicorn Worker Pattern
# Production deployment command
gunicorn myapp.main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 \
  --timeout 30 \
  --keepalive 5 \
  --access-logfile - \
  --error-logfile - \
  --log-level info
                 ┌─────────────────────────────────┐
                 │            Gunicorn             │
                 │  (process manager, master PID)  │
                 └────────────────┬────────────────┘
        ┌─────────────────────────┼─────────────────────────┐
        ↓                         ↓                         ↓
┌──────────────┐          ┌──────────────┐          ┌──────────────┐
│   Uvicorn    │          │   Uvicorn    │          │   Uvicorn    │
│   Worker 1   │          │   Worker 2   │   ...    │   Worker N   │
│   (asyncio   │          │   (asyncio   │          │   (asyncio   │
│ event loop)  │          │ event loop)  │          │ event loop)  │
└──────────────┘          └──────────────┘          └──────────────┘
Worker count rule of thumb: (2 × CPU cores) + 1. For a 4-core machine, run 9 workers. Each worker handles thousands of concurrent async requests. If your handlers are CPU-bound, more workers help. If they are I/O-bound (database calls, HTTP calls), fewer workers with async are more efficient.
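The rule of thumb is simple arithmetic, but it has a knock-on effect worth checking: every worker is a separate process with its own database connection pool. The helpers below are a hypothetical sketch, and the pool numbers are examples rather than recommendations.

```python
import os
from typing import Optional


def suggested_workers(cpu_cores: Optional[int] = None) -> int:
    """Apply the (2 x cores) + 1 rule of thumb."""
    cores = cpu_cores or os.cpu_count() or 1
    return 2 * cores + 1


def max_db_connections(workers: int, pool_size: int, max_overflow: int = 0) -> int:
    """Upper bound on connections if every worker process fills its own pool."""
    return workers * (pool_size + max_overflow)
```

For a 4-core machine, suggested_workers(4) gives 9. With a pool of 5 connections per worker and no overflow, max_db_connections(9, 5) gives 45, which has to fit within the database's connection limit.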
Graded Practice
Level 1 - Identify the Layer
For each error, identify which layer produced it and what the most likely cause is:
1. curl: (6) Could not resolve host: api.example.com
2. HTTP/1.1 413 Request Entity Too Large
3. HTTP/1.1 502 Bad Gateway with an Nginx HTML body
4. HTTP/1.1 422 Unprocessable Entity with a JSON body listing field errors
5. HTTP/1.1 404 Not Found with a JSON body {"detail": "Not Found"}
6. HTTP/1.1 504 Gateway Timeout
Show Answer
1. DNS layer. The DNS resolver could not find api.example.com. Possible causes: DNS record deleted, DNS propagation delay after a change, resolver misconfiguration, or network connectivity to the resolver is broken. No bytes reached the server.
2. Nginx layer. The request body exceeds Nginx's client_max_body_size directive (default 1 MB). Your Python application never saw the request. Fix: increase client_max_body_size in nginx.conf if the large body is legitimate (e.g., file upload endpoint).
3. Nginx layer (upstream failure). Nginx received an invalid or no response from the upstream server (Uvicorn). Likely causes: Uvicorn process crashed, is not running, or the upstream port is wrong. Check systemctl status for your application and journalctl for crash logs.
4. Pydantic validation layer (FastAPI). The request body or query parameters did not match the endpoint's declared types or constraints. The JSON body identifies exactly which fields failed. This is the expected behaviour when a client sends malformed data - it is not a server bug.
5. Router layer (FastAPI/Starlette). The path /whatever is not registered in the application's route table. The JSON body (rather than an HTML body) confirms that Nginx forwarded the request successfully to FastAPI, and FastAPI's 404 handler responded. If the body were HTML, Nginx itself would be the source.
6. Nginx layer (upstream timeout). Nginx waited for the upstream (Uvicorn/FastAPI) to respond but the timeout expired (proxy_read_timeout). Likely causes: the handler is performing a slow database query, waiting on an external HTTP call, or is deadlocked. The request did reach the application - look in application logs for the matching request that never completed.
Level 2 - Debug the Production Incident
A customer reports: "My request to POST /orders returned a 500 error. The request ID in the header is X-Request-ID: f4a2b3c1-dead-beef-0000-aabbccddeeff."
You search the logs and find:
INFO [f4a2b3c1] POST /orders user_id=88 status=500 duration_ms=12043
ERROR [f4a2b3c1] sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 0
reached, connection timed out, timeout 10
- What is the root cause?
- Why did it take 12 seconds to fail?
- What are three fixes, in order of urgency?
- What monitoring would have caught this before the customer reported it?
Show Answer
- Root cause: SQLAlchemy connection pool exhaustion. The application has a connection pool configured with pool_size=5, max_overflow=0. All 5 database connections are in use by other requests when this request tries to acquire one. The request waits 10 seconds (the pool_timeout setting) for a connection to become available, then fails with TimeoutError, which propagates as a 500.
- Why 12 seconds: 10-second pool timeout + ~2 seconds of processing before the DB call. The request reached the handler, did some work, then hit the database call and waited the full 10-second timeout before failing.
- Three fixes in order of urgency:
  - Immediate (minutes): Increase max_overflow temporarily to let the pool grow: engine = create_async_engine(DATABASE_URL, pool_size=5, max_overflow=10). This is a band-aid - it lets the application open more connections under load but does not fix the underlying cause.
  - Short term (hours): Investigate what is holding connections open. Are any transactions uncommitted? Are there long-running queries? Is the pool size appropriate for the number of Uvicorn workers? Rule of thumb: (pool_size + max_overflow) per worker × number of workers ≤ database max_connections.
  - Medium term (days): Add a connection pool monitoring metric (SQLAlchemy emits pool events). Alert when pool utilisation exceeds 80%. Add a shorter pool_timeout (5 seconds instead of 10) so failures are faster and less painful. Add circuit breaking to avoid cascading failures if the database becomes unavailable.
- Monitoring that would have caught this first:
  - A gauge metric for pool usage (e.g. sqlalchemy_pool_size_used) - alert at 80% utilisation
  - A p99 request latency alert - 12-second responses are 100× above normal; a latency alert would have fired before the first customer complaint
  - An error rate alert on 5xx responses - even a single 500 in a 5-minute window should page on-call for a critical endpoint like /orders
Level 3 - Design Challenge
You are adding request tracing to an existing FastAPI application. The system has:
- An API gateway (Kong) that generates X-Request-ID headers
- Three internal microservices (User Service, Order Service, Payment Service)
- PostgreSQL databases, one per service
- A centralised logging system (Elasticsearch)
- Each service is a separate FastAPI application
Design a complete request tracing system that:
- Preserves request IDs across service boundaries
- Adds trace context to every log line without manual passing
- Allows you to reconstruct the full request chain for any request ID
- Does not require changes to individual handler functions
Show Answer
Architecture: Context-var-based propagation + structured logging
The key insight is to use Python's contextvars.ContextVar - a per-async-task variable that is automatically inherited by tasks spawned from the current context. This allows middleware to set a request ID once, and all logging within that request's execution context automatically includes it.
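Before wiring this into middleware, it helps to see the isolation property on its own. In this standard-library sketch two simulated requests run concurrently; each sets the same ContextVar, yet neither sees the other's value. The function names are illustrative.

```python
import asyncio
from contextvars import ContextVar

request_id_var: ContextVar[str] = ContextVar("request_id", default="-")


async def deep_in_the_call_stack() -> str:
    # No request_id argument: the value comes from the current context.
    await asyncio.sleep(0)  # yield so the two requests interleave
    return request_id_var.get()


async def handle(request_id: str) -> str:
    request_id_var.set(request_id)
    return await deep_in_the_call_stack()


async def main() -> list[str]:
    # gather() wraps each coroutine in its own Task, and every Task runs in
    # a copy of the context, so the two set() calls cannot see each other.
    return await asyncio.gather(handle("req-A"), handle("req-B"))


results = asyncio.run(main())
```

A module-level global would fail this test: after the interleaving sleep, both calls would read whichever ID was written last.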
Step 1 - Request context storage:
# shared/context.py
from contextvars import ContextVar
from typing import Optional
request_id_var: ContextVar[Optional[str]] = ContextVar("request_id", default=None)
service_name_var: ContextVar[str] = ContextVar("service_name", default="unknown")
Step 2 - Middleware that extracts/generates request ID:
# shared/middleware.py
import uuid

from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware

from .context import request_id_var, service_name_var

class TracingMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, service_name: str):
        super().__init__(app)
        self.service_name = service_name

    async def dispatch(self, request: Request, call_next):
        # Accept from upstream or generate new
        request_id = (
            request.headers.get("X-Request-ID")
            or request.headers.get("X-Kong-Request-ID")
            or str(uuid.uuid4())
        )
        # Set context vars - inherited by all coroutines in this request
        token_rid = request_id_var.set(request_id)
        token_svc = service_name_var.set(self.service_name)
        request.state.request_id = request_id
        try:
            response = await call_next(request)
            response.headers["X-Request-ID"] = request_id
            return response
        finally:
            # Reset context vars after request completes
            request_id_var.reset(token_rid)
            service_name_var.reset(token_svc)
Step 3 - Structured logging filter that reads context vars:
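# shared/logging.py
import logging

from .context import request_id_var, service_name_var

class RequestContextFilter(logging.Filter):
    def filter(self, record):
        record.request_id = request_id_var.get()
        record.service = service_name_var.get()
        return True

# Configure in each service's startup, after log handlers are installed.
# Attach the filter to the handlers rather than to the root logger:
# logger-level filters are skipped for records propagated from child loggers.
for handler in logging.getLogger().handlers:
    handler.addFilter(RequestContextFilter())
# Use JSON formatter (e.g., python-json-logger) so Elasticsearch can index fields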
Step 4 - HTTP client that propagates request ID to downstream services:
# shared/http_client.py
import httpx

from .context import request_id_var

class TracedAsyncClient:
    """Thin wrapper around httpx.AsyncClient that forwards tracing headers."""

    async def request(self, method: str, url: str, **kwargs) -> httpx.Response:
        # Copy so the caller's headers dict is never mutated
        headers = dict(kwargs.pop("headers", None) or {})
        request_id = request_id_var.get()
        if request_id:
            headers["X-Request-ID"] = request_id
        async with httpx.AsyncClient() as client:
            return await client.request(method, url, headers=headers, **kwargs)
Step 5 - Per-service setup (no handler changes):
# order_service/main.py
from fastapi import FastAPI
from shared.middleware import TracingMiddleware
app = FastAPI(title="Order Service")
app.add_middleware(TracingMiddleware, service_name="order-service")
# All handlers automatically log with request_id and service fields
Result: Every log line in every service includes {"request_id": "f4a2b3c1-...", "service": "order-service", ...}. An Elasticsearch query for request_id: "f4a2b3c1-..." returns all log lines from all three services in chronological order - the complete trace of one user's request, without any manual instrumentation in handlers.
Key Takeaways
- A single HTTP request traverses 15+ distinct steps from client to handler and back - DNS, TCP, TLS, load balancer, reverse proxy, OS socket, ASGI server, middleware stack, router, dependency injection, Pydantic validation, handler, response serialisation, and back out through middleware. Each layer is a distinct failure domain with its own error signature and status code.
- Every non-HTTP error happens before Nginx. If curl reports a connection error rather than an HTTP status code, the problem is in DNS, TCP, or TLS - not your application. Server logs will be empty.
- 502 Bad Gateway from Nginx means your Python application is unreachable - Uvicorn crashed, is not running, or the port is wrong. 504 Gateway Timeout means the application is running but too slow.
- 422 Unprocessable Entity is always from your framework's validation layer - the request reached FastAPI but the body or parameters did not match the declared schema. It is not a server error; it is a client sending malformed data.
- WSGI and ASGI are calling conventions, not magic. WSGI blocks a thread per request. ASGI multiplexes many requests on one event loop. The concurrency benefit of ASGI is only realised if every I/O call inside async def uses await - otherwise you block the event loop and lose all the benefit.
- X-Request-ID middleware is the highest-leverage observability investment in a distributed system. Generate a UUID at the outermost middleware, propagate it to downstream services via HTTP headers, include it in every log line, and return it to the client. This makes production incidents debuggable in minutes instead of hours.
- Content-Type and Accept are separate concerns. Content-Type describes the request body format. Accept describes what format the client wants in the response. FastAPI validates Content-Type (defaults to requiring application/json) but ignores Accept (always returns JSON). Mismatching Content-Type produces 422.
- Keep-alive connection reuse eliminates 100–400 ms of overhead per request. Nginx maintains persistent connections to both clients and upstream Uvicorn workers. HTTP/2 goes further by multiplexing requests over a single connection.
- Never log request bodies on authentication or payment endpoints. Plaintext passwords and card numbers in log files are compliance violations and attack vectors. Log only method, path, status, duration, request ID, and user ID.
- Middleware order is a security property. Authentication must run before any middleware or handler that trusts request.state.user. Rate limiting before authentication allows unauthenticated traffic to exhaust rate limits. Getting middleware order wrong creates subtle security holes that are invisible in unit tests.
