
Request-Response Lifecycle - Every Step From Client to Handler and Back

Reading time: ~30 minutes | Level: Intermediate → Engineering

Before reading further, trace this request:

curl -X POST https://api.example.com/users \
-H "Authorization: Bearer token123" \
-H "Content-Type: application/json" \
-d '{"name": "Alice", "email": "alice@example.com"}'

Most developers describe four steps: "the client sends the request, the server receives it, the handler runs, the response goes back." The actual lifecycle has more than fifteen distinct steps - and each step is a possible failure point with a specific HTTP status code, a specific error message, and a specific log entry that tells you exactly where in the stack the failure occurred.

Knowing the full lifecycle is not academic. When a request returns 502 Bad Gateway in production, you need to know immediately: is the problem in Nginx (reverse proxy layer), in Uvicorn (ASGI server layer), in your middleware, in your route handler, or in your database? Each one requires a different fix. Engineers who understand all fifteen steps debug production incidents in minutes. Engineers who understand four steps debug them in hours.

What You Will Learn

  • The client-side steps before any bytes reach your server
  • Network infrastructure: load balancers, reverse proxies, SSL termination
  • Server-side OS and socket layers
  • WSGI and ASGI server internals - how Gunicorn and Uvicorn work
  • The full middleware stack and its execution order
  • Route matching, dependency resolution, and Pydantic validation
  • Handler execution and response serialisation
  • What status codes each layer produces on failure
  • X-Request-ID propagation for distributed tracing
  • Content negotiation and keep-alive connection reuse

Prerequisites

  • Lesson 01 (HTTP Deep Dive) - HTTP methods, headers, status codes
  • Lesson 04 (FastAPI) - middleware, dependency injection, ASGI
  • Lesson 06 (Middleware) - middleware stack order and patterns

Part 1 - The Full 15-Step Lifecycle

Each numbered step is a distinct failure domain with its own error signature. The following sections walk through each layer in depth.

Part 2 - Client Side: Before Any Server Sees the Request

Step 1 - DNS Resolution

The client does not know api.example.com's IP address. It queries a DNS resolver (typically your ISP's or a configured resolver like 8.8.8.8). The resolver walks the DNS hierarchy: root → .com nameservers → example.com's authoritative nameserver → returns an A or AAAA record.

DNS caches results for the TTL (time-to-live) of the record. A low TTL (60 seconds) allows fast failover when you change IP addresses. A high TTL (3600 seconds) reduces resolver load but slows down IP changes.

Failure signature: DNS resolution failure produces a client-side error, not an HTTP status. curl reports Could not resolve host. Your server never sees the request. DNS failures are invisible in server logs.

Steps 2–3 - TCP Connection and TLS Handshake

The client opens a TCP connection to the resolved IP on port 443 (HTTPS). The TCP three-way handshake (SYN → SYN-ACK → ACK) establishes a reliable bidirectional stream. This takes one round-trip time (RTT) - typically 1–100 ms depending on geography.

The TLS handshake follows immediately. TLS 1.3 (current standard) requires one RTT for the handshake (ClientHello → ServerHello + Certificate + Finished → ClientFinished). TLS 1.2 required two RTTs. The handshake negotiates:

  • The cipher suite (e.g., TLS_AES_128_GCM_SHA256)
  • The server's certificate (proves identity, signed by a trusted CA)
  • Session keys (derived from Diffie-Hellman key exchange - the server's private key never leaves the server)

Failure signatures:

| Failure | Client error | HTTP status |
| --- | --- | --- |
| TCP connection refused | Connection refused | - (no HTTP) |
| TCP timeout | Connection timed out | - (no HTTP) |
| TLS certificate expired | SSL certificate problem | - (no HTTP) |
| TLS certificate mismatch | hostname doesn't match | - (no HTTP) |
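
The same triage can be done in code. The sketch below is a hypothetical helper (the function name and the returned labels are illustrative, not part of any library) that maps the exception a Python client would see at connection time to the step that most likely failed, assuming the standard socket and ssl exception types:

```python
import socket
import ssl


def classify_client_error(exc: BaseException) -> str:
    """Map a connection-time exception to the lifecycle step that failed."""
    if isinstance(exc, socket.gaierror):
        return "step 1: DNS resolution"
    # SSLCertVerificationError is a subclass of SSLError, so test it first.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "step 3: TLS certificate"
    if isinstance(exc, ssl.SSLError):
        return "step 3: TLS handshake"
    if isinstance(exc, ConnectionRefusedError):
        return "step 2: TCP connection refused"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "step 2: TCP timeout"
    return "unknown"


print(classify_client_error(ConnectionRefusedError()))
```

None of these branches ever produces an HTTP status code, which is the point: the failure happened before any HTTP exchange took place.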

Step 4 - HTTP Request Serialisation

The client serialises the request into bytes:

POST /users HTTP/1.1\r\n
Host: api.example.com\r\n
Authorization: Bearer token123\r\n
Content-Type: application/json\r\n
Content-Length: 47\r\n
Accept: application/json\r\n
\r\n
{"name": "Alice", "email": "alice@example.com"}

HTTP/2 frames this differently (binary, multiplexed over a single TCP connection), but the semantic content is the same. The TLS layer encrypts these bytes before sending.
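
To make the HTTP/1.1 wire format concrete, here is a minimal sketch that builds the same bytes by hand. The helper name serialise_request is hypothetical, and the email address alice@example.com is an assumption; with it the JSON body is exactly 47 bytes, which matches the Content-Length shown above.

```python
import json


def serialise_request(method, path, headers, body):
    """Build the raw HTTP/1.1 bytes for a request that carries a body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    lines.append(f"Content-Length: {len(body)}")
    # A blank line (\r\n\r\n) separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


body = json.dumps({"name": "Alice", "email": "alice@example.com"}).encode("utf-8")
raw = serialise_request(
    "POST",
    "/users",
    {
        "Host": "api.example.com",
        "Authorization": "Bearer token123",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    body,
)
print(len(body))
print(raw.decode("ascii"))
```

Real clients add more headers (User-Agent, for example) and may order them differently; only the structure of request line, headers, blank line, and body is fixed.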

Part 3 - Network Infrastructure

Step 5 - Load Balancer (L4 vs L7)

L4 load balancers (TCP level - HAProxy in TCP mode, AWS NLB) operate on IP addresses and ports. They forward raw TCP streams without inspecting HTTP content. They cannot route based on URL path or headers.

L7 load balancers (HTTP level - AWS ALB, HAProxy in HTTP mode, GCP HTTPS LB) terminate the TCP/TLS connection and inspect the HTTP content. They can:

  • Route to different backend pools based on URL path (/api/* → API servers, /static/* → S3)
  • Perform health checks by sending real HTTP requests (GET /health)
  • Rewrite headers, add X-Forwarded-For, strip internal headers
  • Implement sticky sessions, circuit breaking, and retry logic

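Failure signatures: A load balancer that cannot reach any healthy backend typically returns 502 Bad Gateway or 503 Service Unavailable, depending on the product (HAProxy, for example, answers 503 when no server is available). When a backend does not respond in time, the load balancer returns 504 Gateway Timeout.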

Step 6 - Reverse Proxy: Nginx

Nginx sits between the load balancer and your Python application server. It handles:

  • SSL termination (if the LB passes TLS traffic through): decrypts HTTPS, forwards plain HTTP upstream
  • Header injection: adds X-Forwarded-For (original client IP), X-Real-IP, X-Forwarded-Proto: https
  • Connection pooling: maintains persistent HTTP connections to upstream (Uvicorn) - avoiding per-request TCP handshake overhead
  • Static file serving: serves /static/ directly from disk, never reaching Python
  • Buffering: buffers the full request body before passing to upstream (protects slow Python apps from slow clients)
  • Rate limiting: limit_req_zone - coarse IP-level rate limiting before your application

upstream fastapi_app {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    keepalive 32; # persistent connections to upstream
}

server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate /etc/ssl/api.example.com.crt;
    ssl_certificate_key /etc/ssl/api.example.com.key;

    location / {
        proxy_pass http://fastapi_app;
        proxy_http_version 1.1;          # upstream keepalive needs HTTP/1.1
        proxy_set_header Connection "";  # clear "Connection: close" so connections are reused
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 30s;
        proxy_connect_timeout 5s;
    }

    location /static/ {
        root /var/www;
        expires 1y;
    }
}
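
On the application side, the X-Forwarded-For header that Nginx builds is only trustworthy up to the proxies you control, because clients can send their own value. The following is a minimal sketch, assuming a hypothetical helper client_ip and an example trusted network of 10.0.0.0/8: it walks the header from right to left and returns the first address that is not one of your own proxies.

```python
import ipaddress


def client_ip(x_forwarded_for: str, trusted_proxies: list[str]) -> str:
    """Return the right-most X-Forwarded-For address that is not a trusted proxy."""
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]
    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    if not hops:
        raise ValueError("empty X-Forwarded-For header")
    for hop in reversed(hops):
        addr = ipaddress.ip_address(hop)
        if not any(addr in net for net in trusted):
            return hop
    # Every hop was a trusted proxy; fall back to the left-most entry.
    return hops[0]


# A client that spoofs "1.2.3.4" still resolves to its real address.
print(client_ip("1.2.3.4, 203.0.113.7, 10.0.0.5", ["10.0.0.0/8"]))
```

Reading the left-most entry instead would let any client choose the IP address your rate limiter and audit log see.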

Failure signatures: Nginx returns errors to the client if the upstream is unreachable:

| Nginx error | HTTP status | Cause |
| --- | --- | --- |
| Upstream refused connection | 502 Bad Gateway | Uvicorn not running |
| Upstream read timeout | 504 Gateway Timeout | Handler took too long |
| Client body too large | 413 Content Too Large | Exceeds client_max_body_size |

Part 4 - Server Side: OS and ASGI Server

Step 8 - OS: TCP Socket Accept

The operating system's network stack receives the TCP segment, performs IP routing, and deposits the data into a kernel receive buffer associated with the listening socket. The ASGI server calls accept() to receive a new connection file descriptor. The OS returns the raw bytes - it knows nothing about HTTP.
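
The accept() step can be reproduced with nothing but the standard library. This is a sketch over the loopback interface, not how Uvicorn is implemented (Uvicorn drives sockets through the asyncio event loop); it only shows that what comes out of the kernel is a new connection socket and a run of untyped bytes.

```python
import socket

# Listening socket: what an application server creates at startup.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.settimeout(5)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen()

# Client side: the kernel completes the TCP handshake during connect().
client = socket.create_connection(server.getsockname(), timeout=5)
client.sendall(b"POST /users HTTP/1.1\r\nHost: api.example.com\r\n\r\n")

# accept() returns a new socket (a new file descriptor) for this one connection.
conn, peer = server.accept()
conn.settimeout(5)
raw = conn.recv(65536)  # raw bytes; the OS has no notion of HTTP
print(peer, raw[:20])

for sock in (conn, client, server):
    sock.close()
```

Everything that turns those bytes into a method, a path, and headers happens in user space, in the next step.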

Step 9 - ASGI Server: Uvicorn

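Uvicorn is an ASGI server built on Python's asyncio. When installed with its standard extras it parses HTTP with httptools (a fast HTTP parser written in C) and runs on uvloop; without them it falls back to the pure-Python h11 parser and the default asyncio event loop. A single Uvicorn worker runs a single asyncio event loop.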

Incoming bytes (from OS)
  ↓
httptools parser → HTTP headers, method, path, query string, body
  ↓
Build ASGI scope dict:
{
    "type": "http",
    "method": "POST",
    "path": "/users",
    "query_string": b"",
    "headers": [(b"authorization", b"Bearer token123"), ...],
    "client": ("127.0.0.1", 54321),
    "server": ("0.0.0.0", 8000),
}
  ↓
await app(scope, receive, send) ← calls your FastAPI app object

In production, Uvicorn is typically managed by Gunicorn (gunicorn -w 4 -k uvicorn.workers.UvicornWorker). Gunicorn is the process manager: it spawns multiple Uvicorn worker processes, monitors them, restarts crashed workers, and handles graceful shutdown. Each Uvicorn worker is a full asyncio event loop capable of handling thousands of concurrent connections.

note

WSGI and ASGI are interfaces, not implementations. WSGI (def app(environ, start_response)) is a synchronous calling convention. ASGI (async def app(scope, receive, send)) is an async calling convention. Neither interface is magic - they are contracts that define how the server calls your framework. The performance difference comes from what the framework does inside the interface: WSGI blocks a thread per request; ASGI multiplexes many requests on one event loop.
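
To see that ASGI really is just a calling convention, the sketch below defines a bare ASGI application and calls it directly, with no server and no framework. The call_once driver is a hypothetical stand-in for what Uvicorn does; the scope it builds is trimmed to the keys this toy app reads.

```python
import asyncio


async def app(scope, receive, send):
    """A bare ASGI application: no framework, just the calling convention."""
    assert scope["type"] == "http"
    request = await receive()  # an "http.request" message carrying the body
    body = b'{"received": %d}' % len(request["body"])
    await send({
        "type": "http.response.start",
        "status": 201,
        "headers": [(b"content-type", b"application/json")],
    })
    await send({"type": "http.response.body", "body": body})


async def call_once(app, method, path, body=b""):
    """Stand-in for the server: build a scope, call the app, collect its messages."""
    scope = {"type": "http", "method": method, "path": path,
             "query_string": b"", "headers": []}
    sent = []

    async def receive():
        return {"type": "http.request", "body": body, "more_body": False}

    async def send(message):
        sent.append(message)

    await app(scope, receive, send)
    return sent


messages = asyncio.run(call_once(app, "POST", "/users", b'{"name": "Alice"}'))
print(messages[0]["status"], messages[1]["body"])
```

A FastAPI instance is called in exactly the same way; it simply does far more between receive() and send().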

Part 5 - Framework Layer: Middleware, Routing, Validation

Steps 10–11 - Middleware Stack

FastAPI (via Starlette) executes middleware in a specific order that is critical to understand. Middleware added last is outermost - it wraps everything added before it. Given:

app.add_middleware(AuthMiddleware) # added first → innermost
app.add_middleware(LoggingMiddleware) # added second → middle
app.add_middleware(CORSMiddleware) # added last → outermost

Execution order:

Request: CORSMiddleware → LoggingMiddleware → AuthMiddleware → Handler
Response: AuthMiddleware → LoggingMiddleware → CORSMiddleware → Client

Each middleware calls await call_next(request) to pass the request deeper. The response flows back through middleware in reverse as call_next returns.
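
The wrapping order is easier to believe once you see it run. This is a toy model in plain synchronous Python, not Starlette's implementation: each middleware wraps the stack built so far, so the one added last ends up outermost. All names here are illustrative.

```python
calls = []


def make_middleware(name):
    """Return a middleware factory that records when it runs."""
    def factory(inner):
        def middleware(request):
            calls.append(f"{name} request")
            response = inner(request)  # plays the role of `await call_next(request)`
            calls.append(f"{name} response")
            return response
        return middleware
    return factory


def handler(request):
    calls.append("handler")
    return "201 Created"


def build_stack(handler, added):
    """Wrap in the order added, so the last middleware added is outermost."""
    stack = handler
    for factory in added:
        stack = factory(stack)
    return stack


added = [make_middleware("Auth"), make_middleware("Logging"), make_middleware("CORS")]
app = build_stack(handler, added)
app({"path": "/users"})
print(calls)
```

The recorded order is CORS, Logging, Auth on the way in and the reverse on the way out, matching the execution order listed above.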

Failure signatures from middleware:

| Middleware | Failure | Status code |
| --- | --- | --- |
| CORS | Origin not in allowed list | 400 or no CORS headers (browser blocks) |
| Authentication | Missing or invalid token | 401 Unauthorized |
| Rate limiting | Too many requests | 429 Too Many Requests |
| Trusted host | Host header not in allowlist | 400 Bad Request |

warning

Middleware order matters in ways that can create security holes. Authentication middleware must run before any business logic. If you accidentally place logging middleware that reads request.state.user_id before authentication middleware sets it, you get AttributeError. More critically: if you place rate limiting before authentication, unauthenticated requests consume rate limit budget - potential for denial-of-service amplification.

Step 12 - Route Matching

Starlette's router matches the incoming method and path against registered routes. It extracts path parameters from the URL:

POST /users/42/orders/99

Route: /users/{user_id}/orders/{order_id}
Match: user_id=42, order_id=99 (as strings initially)

If no route matches: 404 Not Found. If the path matches but the method does not: 405 Method Not Allowed (with an Allow header listing accepted methods).
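
The 404 versus 405 distinction falls out of a two-stage check: does any route match the path, and if so, does one of them accept the method? Below is a minimal sketch of that logic, assuming a hypothetical three-entry route table; Starlette's real router also handles typed convertors, mounts, and redirects, none of which appear here.

```python
import re

ROUTES = [
    ("POST", "/users"),
    ("GET", "/users/{user_id}"),
    ("POST", "/users/{user_id}/orders/{order_id}"),
]


def compile_path(template):
    """Turn '/users/{user_id}' into a regex with one named group per parameter."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


def match(method, path):
    """Return (outcome, path_params, allowed_methods)."""
    allowed = set()
    for route_method, template in ROUTES:
        found = compile_path(template).match(path)
        if not found:
            continue
        if route_method == method:
            return "matched", found.groupdict(), None
        allowed.add(route_method)  # path matched, method did not
    if allowed:
        return "405 Method Not Allowed", {}, sorted(allowed)
    return "404 Not Found", {}, None


print(match("POST", "/users/42/orders/99"))
print(match("DELETE", "/users/42"))
```

Note that the extracted parameters are still strings at this point; converting "42" to an integer is the validation layer's job.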

Step 13 - Dependency Resolution

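FastAPI builds a directed acyclic graph (DAG) of all Depends() calls for the matched endpoint and resolves it leaves first, so a dependency's own sub-dependencies are always resolved before it runs. Resolution within a request is sequential: each dependency is awaited in turn (sync dependencies run in a threadpool) rather than being scheduled concurrently. By default, a dependency that appears more than once in the same request is called once and its result is reused (use_cache=True).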

If any dependency raises HTTPException, the entire request fails immediately with that status code - the handler never runs.
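
The leaves-first order and the short-circuit on failure can be shown with a toy resolver. This is not FastAPI's implementation: Depends, HTTPError, and resolve below are simplified stand-ins written for this sketch, and everything is synchronous.

```python
import inspect


class Depends:
    """Marker that names the callable a parameter depends on."""
    def __init__(self, dependency):
        self.dependency = dependency


class HTTPError(Exception):
    def __init__(self, status_code):
        super().__init__(status_code)
        self.status_code = status_code


def resolve(func, cache, order):
    """Resolve func's dependencies depth-first, then call func itself."""
    kwargs = {}
    for name, param in inspect.signature(func).parameters.items():
        if isinstance(param.default, Depends):
            dep = param.default.dependency
            if dep not in cache:  # a shared dependency runs once per request
                cache[dep] = resolve(dep, cache, order)
            kwargs[name] = cache[dep]
    order.append(func.__name__)
    return func(**kwargs)


def get_db():
    return "db-connection"


def get_current_user(db=Depends(get_db)):
    return {"id": 88, "via": db}


def create_order(user=Depends(get_current_user), db=Depends(get_db)):
    return f"order for user {user['id']}"


def require_token():
    raise HTTPError(401)


def protected(token=Depends(require_token)):
    return "secret"


order = []
result = resolve(create_order, {}, order)
print(order, result)
```

Calling resolve(protected, {}, []) raises HTTPError(401) from require_token, and protected itself is never invoked, which mirrors the rule that a failing dependency stops the request before the handler runs.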

Step 14 - Pydantic Validation

FastAPI collects parameters from three sources:

  • Path parameters: extracted from the URL by the router (string → typed by Pydantic)
  • Query parameters: parsed from ?key=value pairs in the URL
  • Request body: read via receive() callable, parsed as JSON, validated against the Pydantic model

If validation fails at any parameter, FastAPI raises RequestValidationError, which the default exception handler converts to 422 Unprocessable Entity with a structured body identifying every failed field.
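
The important property of this step is that it reports every failed field at once rather than stopping at the first. The sketch below imitates that behaviour in plain Python; it is a toy, not Pydantic, and the error shape is only loosely modelled on FastAPI's 422 body. The SCHEMA layout and the validate helper are assumptions made for illustration.

```python
SCHEMA = {
    "user_id": ("path", int),
    "limit": ("query", int),
}


def validate(params, schema):
    """Coerce each field to its declared type and collect every failure."""
    values, errors = {}, []
    for name, (source, expected_type) in schema.items():
        raw = params.get(name)
        if raw is None:
            errors.append({"loc": [source, name], "msg": "field required"})
            continue
        try:
            values[name] = expected_type(raw)
        except (TypeError, ValueError):
            errors.append({"loc": [source, name],
                           "msg": f"not a valid {expected_type.__name__}"})
    return values, errors


values, errors = validate({"user_id": "42", "limit": "ten"}, SCHEMA)
if errors:
    print(422, {"detail": errors})
```

With valid input such as {"user_id": "42", "limit": "10"}, the error list is empty and the handler receives real integers instead of strings.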

Step 15 - Handler Execution

The handler runs with all parameters validated and typed. Any exception that is not caught here propagates to FastAPI's exception handler system. An uncaught non-HTTPException becomes 500 Internal Server Error.

Part 6 - Content Negotiation

HTTP allows clients to declare what content types they can accept and what type the request body is.

Request headers:
Content-Type: application/json ← format of the request body
Accept: application/json ← what the client wants in return
Accept-Encoding: gzip, deflate, br ← compression algorithms client supports
Accept-Language: en-US,en;q=0.9 ← language preference

FastAPI's default behaviour:

  • If Content-Type is not application/json and the endpoint expects a JSON body, FastAPI returns 422 (Pydantic cannot parse the body as JSON).
  • FastAPI always returns application/json responses regardless of Accept. If you need content negotiation (returning XML vs JSON based on Accept), you must implement it manually.
  • GZipMiddleware, once added with app.add_middleware(GZipMiddleware), handles Accept-Encoding: gzip automatically - it compresses responses larger than its minimum_size threshold (500 bytes by default).
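
If you do implement negotiation on Accept yourself, the core of it is parsing q-values and picking the best type you can produce. The following is a simplified sketch with a hypothetical pick_media_type helper; it ignores the specificity rules and media-type parameters that a complete implementation of the HTTP semantics would need.

```python
def pick_media_type(accept_header, available):
    """Return the available media type the client prefers, or None (respond 406)."""
    preferences = []
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type, q = fields[0], 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if media_type:
            preferences.append((q, media_type))
    # Highest q first; the sort is stable, so header order breaks ties.
    preferences.sort(key=lambda item: item[0], reverse=True)
    for q, wanted in preferences:
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for candidate in available:
            if wanted in ("*/*", candidate) or (
                wanted.endswith("/*") and candidate.startswith(wanted[:-1])
            ):
                return candidate
    return None


supported = ["application/json", "application/xml"]
print(pick_media_type("application/xml;q=0.8, application/json", supported))
```

A handler could call this with request.headers.get("accept", "*/*") and choose a response class based on the result.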

What happens when Content-Type mismatches:

# Sending form data to a JSON endpoint
curl -X POST https://api.example.com/users \
-H "Content-Type: application/x-www-form-urlencoded" \
-d 'name=Alice'
# Result: 422 - Pydantic tried to parse URL-encoded bytes as JSON and failed

Part 7 - Request IDs and Distributed Tracing

Without request IDs, correlating log entries across a microservice architecture is nearly impossible. A single user action might produce log entries in 5 different services, 20 different log lines each - all interleaved with other users' requests.

The solution is to generate a UUID at the earliest possible point (the outermost middleware) and propagate it through every layer:

import uuid
import logging

import structlog
from fastapi import FastAPI, Request
from starlette.middleware.base import BaseHTTPMiddleware

logger = logging.getLogger(__name__)
app = FastAPI()


class RequestIDMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Use client-provided ID (from upstream service) or generate a new one
        request_id = request.headers.get("X-Request-ID") or str(uuid.uuid4())
        request.state.request_id = request_id

        # Bind to structured logging context so all log lines in this request
        # include the request_id automatically (structlog must be configured
        # with the structlog.contextvars.merge_contextvars processor)
        with structlog.contextvars.bound_contextvars(request_id=request_id):
            response = await call_next(request)

        # Propagate to client so they can report it in bug reports
        response.headers["X-Request-ID"] = request_id
        return response


# Add this middleware last so that it is outermost
app.add_middleware(RequestIDMiddleware)

Propagate the request ID to downstream services:

import httpx
from fastapi import Request


async def call_downstream_service(request: Request, data: dict) -> dict:
    # get_service_token() is assumed to be defined elsewhere in the application
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://internal-service/api/endpoint",
            json=data,
            headers={
                "X-Request-ID": request.state.request_id,  # forward the ID
                "Authorization": f"Bearer {get_service_token()}",
            },
        )
        return response.json()

With this pattern, when a customer reports "my request failed, I got X-Request-ID: abc-123", you search all service logs for request_id=abc-123 and see the complete call chain in chronological order.

tip

Add X-Request-ID middleware to every service you build. It is the single highest-leverage observability investment you can make. The implementation is 10 lines of code. The debugging value in a distributed system is enormous - it converts hour-long production investigations into 2-minute log searches.

Part 8 - Keep-Alive and Connection Reuse

Opening a new TCP connection for every HTTP request is expensive: DNS lookup + TCP handshake + TLS handshake costs 100–400 ms before the first byte of your request arrives. HTTP keep-alive reuses the same TCP connection for multiple requests.

# Without keep-alive: one TCP connection per request
Client → DNS → TCP handshake → TLS handshake → POST /users → response → connection closed
Client → DNS (cached) → TCP handshake → TLS handshake → GET /users/1 → response → closed

# With keep-alive: one connection, multiple requests
Client → DNS → TCP handshake → TLS handshake → [persistent connection]
→ POST /users → response (connection stays open)
→ GET /users/1 → response (same connection)
→ GET /users/2 → response (same connection)
→ ... idle timeout → connection closed

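Nginx keeps idle client connections open for 75 seconds by default (keepalive_timeout 75s); many packaged nginx.conf files set keepalive_timeout 65 instead. Nginx also maintains a pool of persistent connections to Uvicorn (keepalive 32 in the upstream block) - avoiding per-request TCP overhead between Nginx and Python. Upstream keep-alive only takes effect when the proxy location also sets proxy_http_version 1.1 and clears the Connection header.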

HTTP/2 takes this further: it multiplexes multiple requests over a single TCP connection simultaneously (not sequentially). A browser loading a page with 50 resources sends all 50 requests in parallel over one HTTP/2 connection.
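
The saving from connection reuse is simple arithmetic. The sketch below counts network round trips only, under stated assumptions: one RTT for the TCP handshake, one RTT for a TLS 1.3 handshake, one RTT per request, DNS already cached, and zero server processing time. The function name is illustrative.

```python
def total_latency_ms(requests, rtt_ms, keep_alive):
    """Round-trip cost of a batch of sequential requests to one host."""
    setup = 2 * rtt_ms       # TCP handshake + TLS 1.3 handshake
    per_request = rtt_ms     # request out, response back
    if keep_alive:
        return setup + requests * per_request
    return requests * (setup + per_request)


# Ten sequential requests at a 50 ms round-trip time.
print(total_latency_ms(10, 50, keep_alive=False))
print(total_latency_ms(10, 50, keep_alive=True))
```

Under these assumptions the batch costs 1500 ms without keep-alive and 600 ms with it; the gap widens with higher RTT and with TLS 1.2, which needs a second handshake round trip.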

Part 9 - Where Errors Occur at Each Layer

This table maps each step to its failure mode and resulting status code:

| Step | Layer | Failure | Status / Error |
| --- | --- | --- | --- |
| 1 | DNS | Name not found | Client error - no HTTP |
| 2 | TCP | Connection refused | Client error - no HTTP |
| 3 | TLS | Certificate invalid | Client error - no HTTP |
| 5 | Load balancer | All backends down | 502 Bad Gateway (503 on some products) |
| 5 | Load balancer | Backend timeout | 504 Gateway Timeout |
| 6 | Nginx | client_max_body_size exceeded | 413 Content Too Large |
| 6 | Nginx | Upstream not running | 502 Bad Gateway |
| 6 | Nginx | Upstream read timeout | 504 Gateway Timeout |
| 9 | Uvicorn | Worker crashed | 502 Bad Gateway (Nginx sees disconnect) |
| 11 | Auth middleware | Invalid token | 401 Unauthorized |
| 11 | Rate limit middleware | Quota exceeded | 429 Too Many Requests |
| 11 | CORS middleware | Origin rejected | 400 / no CORS headers |
| 12 | Router | Path not found | 404 Not Found |
| 12 | Router | Method not allowed | 405 Method Not Allowed |
| 13 | Dependencies | Auth check fails | 401 / 403 |
| 14 | Pydantic | Invalid body/params | 422 Unprocessable Entity |
| 15 | Handler | Unhandled exception | 500 Internal Server Error |
| 15 | Handler | HTTPException(404) raised | 404 Not Found |

danger

Never log request bodies that may contain passwords, credit card numbers, or other sensitive data. The POST body to /login contains the user's plaintext password. The POST body to /payments may contain a card number. Logging these to your centralised log system creates a compliance violation and a security breach vector. Log only safe fields: method, path, status code, duration, request ID, and authenticated user ID.

Part 10 - Deploying: Gunicorn + Uvicorn Worker Pattern

# Production deployment command
gunicorn myapp.main:app \
--workers 4 \
--worker-class uvicorn.workers.UvicornWorker \
--bind 0.0.0.0:8000 \
--timeout 30 \
--keepalive 5 \
--access-logfile - \
--error-logfile - \
--log-level info

                ┌─────────────────────────────────┐
                │            Gunicorn             │
                │  (process manager, master PID)  │
                └────────────────┬────────────────┘
       ┌─────────────────────────┼─────────────────────────┐
       ↓                         ↓                         ↓
┌──────────────┐          ┌──────────────┐          ┌──────────────┐
│   Uvicorn    │          │   Uvicorn    │          │   Uvicorn    │
│   Worker 1   │          │   Worker 2   │   ...    │   Worker N   │
│   (asyncio   │          │   (asyncio   │          │   (asyncio   │
│  event loop) │          │  event loop) │          │  event loop) │
└──────────────┘          └──────────────┘          └──────────────┘

Worker count rule of thumb: (2 × CPU cores) + 1. For a 4-core machine, run 9 workers. Each worker handles thousands of concurrent async requests. If your handlers are CPU-bound, more workers help. If they are I/O-bound (database calls, HTTP calls), fewer workers with async are more efficient.
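
As a quick check of the rule of thumb, here is a small sketch; recommended_workers is a hypothetical helper, and the formula is only a starting point to be tuned against real load.

```python
import os


def recommended_workers(cpu_cores=None):
    """Apply the (2 x cores) + 1 starting point for the Gunicorn worker count."""
    cores = cpu_cores or os.cpu_count() or 1
    return 2 * cores + 1


print(recommended_workers(4))   # 4-core machine
print(recommended_workers())    # whatever this machine reports
```

The result can be passed to Gunicorn's --workers flag, for example from a deployment script.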

Graded Practice

Level 1 - Identify the Layer

For each error, identify which layer produced it and what the most likely cause is:

  1. curl: (6) Could not resolve host: api.example.com
  2. HTTP/1.1 413 Request Entity Too Large
  3. HTTP/1.1 502 Bad Gateway with an Nginx HTML body
  4. HTTP/1.1 422 Unprocessable Entity with a JSON body listing field errors
  5. HTTP/1.1 404 Not Found with a JSON body {"detail": "Not Found"}
  6. HTTP/1.1 504 Gateway Timeout

Answer

  1. DNS layer. The DNS resolver could not find api.example.com. Possible causes: DNS record deleted, DNS propagation delay after a change, resolver misconfiguration, or network connectivity to the resolver is broken. No bytes reached the server.

  2. Nginx layer. The request body exceeds Nginx's client_max_body_size directive (default 1 MB). Your Python application never saw the request. Fix: increase client_max_body_size in nginx.conf if the large body is legitimate (e.g., file upload endpoint).

  3. Nginx layer (upstream failure). Nginx received an invalid or no response from the upstream server (Uvicorn). Likely causes: Uvicorn process crashed, is not running, or the upstream port is wrong. Check systemctl status for your application and journalctl for crash logs.

  4. Pydantic validation layer (FastAPI). The request body or query parameters did not match the endpoint's declared types or constraints. The JSON body identifies exactly which fields failed. This is the expected behaviour when a client sends malformed data - it is not a server bug.

  5. Router layer (FastAPI/Starlette). The path /whatever is not registered in the application's route table. The JSON body (rather than an HTML body) confirms that Nginx forwarded the request successfully to FastAPI, and FastAPI's 404 handler responded. If the body was HTML, Nginx itself would be the source.

  6. Nginx layer (upstream timeout). Nginx waited for the upstream (Uvicorn/FastAPI) to respond but the timeout expired (proxy_read_timeout). Likely causes: the handler is performing a slow database query, waiting on an external HTTP call, or is deadlocked. The request did reach the application - look in application logs for the matching request that never completed.

Level 2 - Debug the Production Incident

A customer reports: "My request to POST /orders returned a 500 error. The request ID in the header is X-Request-ID: f4a2b3c1-dead-beef-0000-aabbccddeeff."

You search the logs and find:

INFO [f4a2b3c1] POST /orders user_id=88 status=500 duration_ms=12043
ERROR [f4a2b3c1] sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 0
reached, connection timed out, timeout 10
  1. What is the root cause?
  2. Why did it take 12 seconds to fail?
  3. What are three fixes, in order of urgency?
  4. What monitoring would have caught this before the customer reported it?

Answer

  1. Root cause: SQLAlchemy connection pool exhaustion. The application has a connection pool configured with pool_size=5, max_overflow=0. All 5 database connections are in use by other requests when this request tries to acquire one. The request waits 10 seconds (the timeout setting) for a connection to become available, then fails with TimeoutError, which propagates as a 500.

  2. 12 seconds = 10 second pool timeout + ~2 seconds of processing before the DB call. The request reached the handler, did some work, then hit the database call and waited the full 10-second timeout before failing.

  3. Three fixes in order of urgency:

    Immediate (minutes): Increase max_overflow temporarily to let the pool grow:

    engine = create_async_engine(DATABASE_URL, pool_size=5, max_overflow=10)

    This is a band-aid - it lets each worker open more connections (up to pool_size + max_overflow), which pushes more load onto the database, but it does not fix the underlying cause.

    Short term (hours): Investigate what is holding connections open. Are any transactions uncommitted? Are there long-running queries? Is the pool size appropriate for the number of Uvicorn workers? Rule of thumb: (pool_size + max_overflow) per worker × number of workers ≤ database max_connections.

    Medium term (days): Add a connection pool monitoring metric (SQLAlchemy emits pool events). Alert when pool utilisation exceeds 80%. Add a shorter pool_timeout (5 seconds instead of 10) so failures are faster and less painful. Add circuit breaking to avoid cascading failures if the database becomes unavailable.

  4. Monitoring that would have caught this first:

    • A gauge metric for pool connections in use (for example, sqlalchemy_pool_size_used) - alert at 80% utilisation
    • A p99 request latency alert - 12-second responses are 100× above normal; a latency alert would have fired before the first customer complaint
    • An error rate alert on 5xx responses - even a single 500 in a 5-minute window should page on-call for a critical endpoint like /orders
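
The pool-sizing rule of thumb from fix 3 is worth checking with numbers before changing any setting. This is a minimal sketch with hypothetical helper names; the limit of 100 is an assumption based on PostgreSQL's default max_connections, and reserved stands for connections you keep free for migrations and admin sessions.

```python
def max_db_connections(workers, pool_size, max_overflow):
    """Worst-case connections opened by one service: every worker fills its pool."""
    return workers * (pool_size + max_overflow)


def fits(workers, pool_size, max_overflow, db_max_connections, reserved=0):
    """True if the worst case stays within the database's connection limit."""
    worst_case = max_db_connections(workers, pool_size, max_overflow)
    return worst_case <= db_max_connections - reserved


# The incident configuration: 4 workers, pool_size=5, max_overflow=0.
print(max_db_connections(4, 5, 0), fits(4, 5, 0, 100))
# The band-aid applied to 9 workers: pool_size=5, max_overflow=10.
print(max_db_connections(9, 5, 10), fits(9, 5, 10, 100))
```

The second case needs up to 135 connections against a limit of 100, so raising max_overflow without checking the worker count simply moves the failure from the pool to the database.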

Level 3 - Design Challenge

You are adding request tracing to an existing FastAPI application. The system has:

  • An API gateway (Kong) that generates X-Request-ID headers
  • Three internal microservices (User Service, Order Service, Payment Service)
  • PostgreSQL databases, one per service
  • A centralised logging system (Elasticsearch)
  • Each service is a separate FastAPI application

Design a complete request tracing system that:

  1. Preserves request IDs across service boundaries
  2. Adds trace context to every log line without manual passing
  3. Allows you to reconstruct the full request chain for any request ID
  4. Does not require changes to individual handler functions

Answer

Architecture: Context-var-based propagation + structured logging

The key insight is to use Python's contextvars.ContextVar - a per-async-task variable that is automatically inherited by tasks spawned from the current context. This allows middleware to set a request ID once, and all logging within that request's execution context automatically includes it.

Step 1 - Request context storage:

# shared/context.py
from contextvars import ContextVar
from typing import Optional

request_id_var: ContextVar[Optional[str]] = ContextVar("request_id", default=None)
service_name_var: ContextVar[str] = ContextVar("service_name", default="unknown")

Step 2 - Middleware that extracts/generates request ID:

# shared/middleware.py
from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware
import uuid
from .context import request_id_var, service_name_var


class TracingMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, service_name: str):
        super().__init__(app)
        self.service_name = service_name

    async def dispatch(self, request: Request, call_next):
        # Accept from upstream or generate new
        request_id = (
            request.headers.get("X-Request-ID")
            or request.headers.get("X-Kong-Request-ID")
            or str(uuid.uuid4())
        )
        # Set context vars - inherited by all coroutines in this request
        token_rid = request_id_var.set(request_id)
        token_svc = service_name_var.set(self.service_name)

        request.state.request_id = request_id

        try:
            response = await call_next(request)
            response.headers["X-Request-ID"] = request_id
            return response
        finally:
            # Reset context vars after request completes
            request_id_var.reset(token_rid)
            service_name_var.reset(token_svc)

Step 3 - Structured logging filter that reads context vars:

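# shared/logging.py
import logging
from .context import request_id_var, service_name_var


class RequestContextFilter(logging.Filter):
    def filter(self, record):
        record.request_id = request_id_var.get()
        record.service = service_name_var.get()
        return True


# Configure in each service's startup, after the log handlers are installed.
# Attach the filter to the root logger's handlers, not to the root logger itself:
# logger-level filters are skipped for records propagated from child loggers.
for handler in logging.getLogger().handlers:
    handler.addFilter(RequestContextFilter())
# Use JSON formatter (e.g., python-json-logger) so Elasticsearch can index fields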

Step 4 - HTTP client that propagates request ID to downstream services:

# shared/http_client.py
import httpx
from .context import request_id_var


class TracedAsyncClient:
    """Thin wrapper around httpx.AsyncClient that forwards tracing headers."""

    async def request(self, method: str, url: str, **kwargs) -> httpx.Response:
        # Copy so the caller's headers dict is not mutated
        headers = dict(kwargs.pop("headers", None) or {})
        request_id = request_id_var.get()
        if request_id:
            headers["X-Request-ID"] = request_id
        # A new client per call is simple but gives up connection pooling
        async with httpx.AsyncClient() as client:
            return await client.request(method, url, headers=headers, **kwargs)

Step 5 - Per-service setup (no handler changes):

# order_service/main.py
from fastapi import FastAPI
from shared.middleware import TracingMiddleware

app = FastAPI(title="Order Service")
app.add_middleware(TracingMiddleware, service_name="order-service")
# All handlers automatically log with request_id and service fields

Result: Every log line in every service includes {"request_id": "f4a2b3c1-...", "service": "order-service", ...}. An Elasticsearch query for request_id: "f4a2b3c1-..." returns all log lines from all three services in chronological order - the complete trace of one user's request, without any manual instrumentation in handlers.
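
The claim that context variables keep concurrent requests apart can be verified without FastAPI. The sketch below uses only the standard library and is a demonstration of the mechanism, not the shared package above: two simulated requests run concurrently as asyncio tasks, and a filter on the log handler stamps each line with the ID of the request that produced it. ListHandler and the logger names are made up for the demo.

```python
import asyncio
import contextvars
import logging

request_id_var = contextvars.ContextVar("request_id", default=None)


class RequestContextFilter(logging.Filter):
    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


class ListHandler(logging.Handler):
    """Collects formatted lines in memory so the result is easy to inspect."""
    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


handler = ListHandler()
handler.setFormatter(logging.Formatter("[%(request_id)s] %(message)s"))
handler.addFilter(RequestContextFilter())  # on the handler, so child loggers are covered

parent = logging.getLogger("demo")
parent.addHandler(handler)
parent.setLevel(logging.INFO)
parent.propagate = False
logger = logging.getLogger("demo.orders")  # a child logger, as a handler module would use


async def business_logic():
    await asyncio.sleep(0)  # yield so the two requests interleave
    logger.info("charging card")


async def handle_request(request_id):
    request_id_var.set(request_id)  # what the tracing middleware does
    logger.info("request started")
    await business_logic()


async def main():
    # Two concurrent "requests": each task runs in its own copy of the context.
    await asyncio.gather(handle_request("req-A"), handle_request("req-B"))


asyncio.run(main())
print(handler.lines)
```

Each line carries the ID of its own request even though the two tasks interleave, and business_logic never receives the ID as an argument.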

Key Takeaways

  • A single HTTP request traverses 15+ distinct steps on its way to the handler and back - DNS, TCP, TLS, load balancer, reverse proxy, OS socket, ASGI server, middleware stack, router, dependency injection, Pydantic validation, handler, response serialisation, and back out through middleware. Each layer is a distinct failure domain with its own error signature and status code.
  • Every non-HTTP error happens before Nginx. If curl reports a connection error rather than an HTTP status code, the problem is in DNS, TCP, or TLS - not your application. Server logs will be empty.
  • 502 Bad Gateway from Nginx means your Python application is unreachable - Uvicorn crashed, is not running, or the port is wrong. 504 Gateway Timeout means the application is running but too slow.
  • 422 Unprocessable Entity is always from your framework's validation layer - the request reached FastAPI but the body or parameters did not match the declared schema. It is not a server error; it is a client sending malformed data.
  • WSGI and ASGI are calling conventions, not magic. WSGI blocks a thread per request. ASGI multiplexes many requests on one event loop. The concurrency benefit of ASGI is only realised if every I/O call inside async def uses await - otherwise you block the event loop and lose all the benefit.
  • X-Request-ID middleware is the highest-leverage observability investment in a distributed system. Generate a UUID at the outermost middleware, propagate it to downstream services via HTTP headers, include it in every log line, and return it to the client. This makes production incidents debuggable in minutes instead of hours.
  • Content-Type and Accept are separate concerns. Content-Type describes the request body format. Accept describes what format the client wants in the response. FastAPI validates Content-Type (defaults to requiring application/json) but ignores Accept (always returns JSON). Mismatching Content-Type produces 422.
  • Keep-alive connection reuse eliminates 100–400 ms of overhead per request. Nginx maintains persistent connections to both clients and upstream Uvicorn workers. HTTP/2 goes further by multiplexing requests over a single connection.
  • Never log request bodies on authentication or payment endpoints. Plaintext passwords and card numbers in log files are compliance violations and attack vectors. Log only method, path, status, duration, request ID, and user ID.
  • Middleware order is a security property. Authentication must run before any middleware or handler that trusts request.state.user. Rate limiting before authentication allows unauthenticated traffic to exhaust rate limits. Getting middleware order wrong creates subtle security holes that are invisible in unit tests.
© 2026 EngineersOfAI. All rights reserved.