HTTP Deep Dive - What Actually Travels Over the Wire

Reading time: ~35 minutes | Level: Intermediate → Engineering

Before reading further, predict what this code does step by step:

import socket

# Raw HTTP request - no library, no abstraction
sock = socket.create_connection(("httpbin.org", 80))
sock.sendall(b"GET /get HTTP/1.1\r\nHost: httpbin.org\r\nConnection: close\r\n\r\n")
response = b""
while chunk := sock.recv(4096):
    response += chunk
sock.close()
print(response[:500].decode())

This sends a raw HTTP/1.1 GET request directly over a TCP socket and receives the full HTTP response. The output will look like:

HTTP/1.1 200 OK
Date: Thu, 05 Mar 2026 10:00:00 GMT
Content-Type: application/json
Content-Length: 312
Connection: close
Server: gunicorn/19.9.0

{
"args": {},
"headers": {
"Host": "httpbin.org"
},
"url": "http://httpbin.org/get"
}

At its core, this is what requests does: it wraps this socket interaction. The requests library adds connection pooling, keep-alive, retry logic, cookie management, authentication helpers, SSL/TLS, and a cleaner API. But at the wire level, every HTTP/1.1 request is exactly this: a formatted text message over a TCP connection.

Understanding this demystifies every HTTP library, every framework error message, and every network debugging session.

When requests.get("https://api.github.com/users/python") hangs in production and you have no idea why, you are debugging a socket. When a ConnectionResetError fires after 300 requests, it is the connection pool telling you the server closed a keep-alive connection. When you see SSLError: certificate verify failed, it is the TLS handshake failing before a single byte of HTTP is sent. These are not library bugs - they are the wire asserting itself. This lesson makes the wire legible.

What You Will Learn

  • HTTP/1.1 wire format: every field, every delimiter, exact byte representation
  • HTTP method semantics: safe, idempotent - why these properties matter for caching and retries
  • Status code families: what each family means and which codes matter in production
  • Critical headers: Content-Type, Authorization, Cache-Control, ETag, X-Request-ID
  • HTTP request/response lifecycle: DNS → TCP → TLS → HTTP → response → connection reuse
  • HTTP/2: multiplexing and header compression - why it matters for high-throughput APIs
  • HTTP/3 and QUIC: what changed and why
  • requests: Session, HTTPAdapter, retry, timeout, hooks
  • httpx: async-capable, HTTP/2 support, drop-in familiar API
  • Connection pooling: why requests.get() in a loop is a performance anti-pattern

Prerequisites

  • Python functions, loops, and basic networking concepts (TCP/IP at a high level)
  • Basic familiarity with requests.get() at the usage level
  • Comfort with dictionaries and string formatting

Part 1 - The HTTP/1.1 Wire Format

Anatomy of an HTTP Request

An HTTP/1.1 request is plain text with a precise structure. Every field matters:

GET /users/123?include=orders HTTP/1.1\r\n
Host: api.example.com\r\n
Accept: application/json\r\n
Authorization: Bearer eyJhbGciOiJSUzI1NiJ9...\r\n
Accept-Encoding: gzip, deflate, br\r\n
User-Agent: python-requests/2.31.0\r\n
X-Request-ID: f47ac10b-58cc-4372-a567-0e02b2c3d479\r\n
\r\n

Breaking it down:

| Component | Description |
| --- | --- |
| GET | HTTP method |
| /users/123?include=orders | Request target (path + query string) |
| HTTP/1.1 | Protocol version |
| \r\n | CRLF - carriage return + line feed - required between each line |
| Host: api.example.com | Required header in HTTP/1.1; tells the server which virtual host to route to |
| Blank line (\r\n\r\n) | Signals end of headers; everything after is the body |

For a POST request with a JSON body:

POST /users HTTP/1.1\r\n
Host: api.example.com\r\n
Content-Type: application/json\r\n
Content-Length: 47\r\n
Authorization: Bearer eyJhbGciOiJSUzI1NiJ9...\r\n
\r\n
{"name": "Alice", "email": "[email protected]"}

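Content-Length tells the server how many bytes to read as the body. If it is wrong, the request is broken: too short and the server truncates the body (and may treat the leftover bytes as the start of the next request); too long and the server waits for bytes that never arrive until it times out.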
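To make the framing rules concrete, here is a minimal sketch that assembles a request by hand and derives Content-Length from the encoded body. The helper name build_request and the sample payload are illustrative assumptions, not part of any library.

```python
import json
from typing import Optional


def build_request(method: str, path: str, host: str, payload: Optional[dict] = None) -> bytes:
    """Assemble a raw HTTP/1.1 request (hypothetical helper, for illustration only)."""
    # ensure_ascii=False keeps non-ASCII characters as-is, so bytes != characters
    body = b"" if payload is None else json.dumps(payload, ensure_ascii=False).encode("utf-8")
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append("Content-Type: application/json")
        lines.append(f"Content-Length: {len(body)}")  # counts bytes, not characters
    # Every header line ends with CRLF; an empty line separates headers from body
    head = "\r\n".join(lines).encode("ascii") + b"\r\n\r\n"
    return head + body


raw = build_request("POST", "/users", "api.example.com", {"name": "Zoë"})
head, _, body = raw.partition(b"\r\n\r\n")
print(head.decode("ascii"))
print(len(body.decode("utf-8")), "characters,", len(body), "bytes")
```

The sample body is 15 characters but 16 bytes, because "ë" takes two bytes in UTF-8. Computing the header from the encoded bytes rather than the string length is what keeps the two in agreement.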

Anatomy of an HTTP Response

HTTP/1.1 201 Created\r\n
Content-Type: application/json\r\n
Content-Length: 86\r\n
Location: /users/456\r\n
X-Request-ID: f47ac10b-58cc-4372-a567-0e02b2c3d479\r\n
ETag: "33a64df551425fcc55e4d42a148795d9f25f89d"\r\n
\r\n
{"id": 456, "name": "Alice", "email": "[email protected]", "created_at": "2026-03-05"}

The status line has three parts: protocol version, status code, and reason phrase. The reason phrase is informational and ignored by machines - the status code is what matters.
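As a rough sketch of how a client reads this format, the function below splits a raw response into status code, reason phrase, headers, and body. The name parse_response is hypothetical, and the parser deliberately ignores chunked transfer encoding and repeated header names.

```python
def parse_response(raw: bytes) -> tuple[int, str, dict[str, str], bytes]:
    """Split a raw HTTP/1.1 response into (status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: version SP status-code SP reason-phrase (reason may be empty)
    parts = status_line.split(" ", 2)
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return code, reason, headers, body


raw = (
    b"HTTP/1.1 201 Created\r\n"
    b"Content-Type: application/json\r\n"
    b"Location: /users/456\r\n"
    b"\r\n"
    b'{"id": 456}'
)
code, reason, headers, body = parse_response(raw)
print(code, reason, headers["location"], body)
```

Only the integer status code drives any decision here; the reason phrase is carried along purely for display.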

Part 2 - HTTP Methods: Semantics, Not Just Conventions

HTTP methods have two properties that define how infrastructure (proxies, CDNs, browsers, load balancers) treats requests:

  • Safe: the request does not modify server state (read-only). Infrastructure can retry safe requests freely.
  • Idempotent: repeating the request N times has the same effect as sending it once. Infrastructure can retry idempotent requests after a network failure.

| Method | Safe | Idempotent | Use |
| --- | --- | --- | --- |
| GET | Yes | Yes | Read a resource. Cache-able by default. |
| HEAD | Yes | Yes | Like GET but no body. Used to check existence, get headers, validate cache. |
| OPTIONS | Yes | Yes | Discover allowed methods. Used in CORS preflight. |
| POST | No | No | Create a resource or trigger an action. Never cache. Never auto-retry. |
| PUT | No | Yes | Full replace of a resource. Idempotent: sending the same PUT twice results in the same state. |
| PATCH | No | No (by default) | Partial update. Not idempotent unless the patch is defined that way. |
| DELETE | No | Yes | Delete a resource. First call deletes; subsequent calls are 404 or 204. Same end state. |

:::danger Never Use GET for State-Changing Operations
Proxies, CDNs, and browsers cache GET requests. A browser may serve a cached response for a GET. A CDN may serve the same cached GET to a million users. A monitoring tool may probe your GET endpoint every 30 seconds. If your GET route deletes data, creates records, or triggers payments, any of these will cause production incidents. Use POST, PUT, PATCH, or DELETE for operations with side effects - no exceptions.
:::

:::note Idempotency in Practice
PUT /users/123 with {"name": "Alice"} is idempotent - sending it ten times leaves the user named Alice. POST /users with {"name": "Alice"} is not - sending it ten times creates ten Alice records. This is why payment APIs use idempotency keys on POST requests: the key makes the server treat repeated POSTs as idempotent on the application layer.
:::
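The idempotency-key idea can be sketched with an in-memory stand-in for the server side. PaymentService and create_charge are hypothetical names; a real service would persist the keys and expire them.

```python
import uuid


class PaymentService:
    """In-memory stand-in for a server that honours idempotency keys (hypothetical)."""

    def __init__(self):
        self._results_by_key = {}  # idempotency key -> stored result
        self.charges = []

    def create_charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A repeated key returns the stored result instead of charging again
        if idempotency_key in self._results_by_key:
            return self._results_by_key[idempotency_key]
        charge = {"id": len(self.charges) + 1, "amount_cents": amount_cents}
        self.charges.append(charge)
        self._results_by_key[idempotency_key] = charge
        return charge


service = PaymentService()
key = str(uuid.uuid4())                    # the client generates one key per logical operation
first = service.create_charge(key, 1999)
retry = service.create_charge(key, 1999)   # e.g. a retry after a timeout
print(first == retry, len(service.charges))
```

The retry returns the original charge and no second record is created, which is what makes the POST safe to repeat at the application layer.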

Part 3 - Status Code Families

1xx - Informational

Rarely seen in Python client code. 100 Continue tells the client it may send a large body after confirming the server will accept it.

2xx - Success

| Code | Name | When to use |
| --- | --- | --- |
| 200 OK | Success | GET, PUT, PATCH responses with a body |
| 201 Created | Created | POST response after creating a resource; include Location header |
| 204 No Content | No content | DELETE, or PUT/PATCH with no response body |

3xx - Redirection

| Code | Name | Behaviour |
| --- | --- | --- |
| 301 Moved Permanently | Permanent redirect | Browser/client updates bookmarks; method may change to GET |
| 302 Found | Temporary redirect | Method may change to GET on redirect (legacy) |
| 307 Temporary Redirect | Temporary, method preserved | Client must repeat with same method (POST stays POST) |
| 308 Permanent Redirect | Permanent, method preserved | Same as 307 but permanent |

requests follows 301/302 redirects automatically and changes POST to GET - which is the standard browser behaviour but wrong for API clients that POST to an endpoint that has moved. Set allow_redirects=False and handle redirects explicitly for POST/PUT.

4xx - Client Error

The client sent a request the server could not process. Retrying without changing the request is pointless.

| Code | Name | When |
| --- | --- | --- |
| 400 Bad Request | Malformed request | Syntax error, missing field |
| 401 Unauthorized | Not authenticated | No credentials or invalid credentials (misleading name) |
| 403 Forbidden | Not authorized | Authenticated but lacks permission |
| 404 Not Found | Resource not found | Resource does not exist at this URL |
| 409 Conflict | Conflict | Duplicate resource, version conflict |
| 422 Unprocessable Entity | Validation failed | Syntactically valid but semantically invalid (FastAPI uses this) |
| 429 Too Many Requests | Rate limited | Check Retry-After header before retrying |

:::warning 401 vs 403: The Naming Confusion
401 Unauthorized should have been called "Unauthenticated" - it means the request lacks valid credentials. 403 Forbidden should have been called "Unauthorized" - it means the request has credentials but lacks permission. The naming is historical and cannot be changed. In your API, return 401 when credentials are missing or invalid, and 403 when the authenticated identity lacks permission.
:::

5xx - Server Error

The server failed to process a valid request. Clients should retry with backoff; requests may succeed after the server recovers.

| Code | Name | When |
| --- | --- | --- |
| 500 Internal Server Error | Unhandled exception | Bug in server code |
| 502 Bad Gateway | Gateway received invalid response | Upstream service returned garbage |
| 503 Service Unavailable | Server temporarily down | Overloaded or in maintenance |
| 504 Gateway Timeout | Gateway timed out waiting | Upstream service too slow |
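The families above translate into a small decision function on the client side. This is a sketch of one possible policy, not a standard: the set of transient statuses and the rule "retry only idempotent methods" are assumptions drawn from this lesson, and status_family and should_retry are hypothetical names.

```python
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "PUT", "DELETE"}
TRANSIENT_STATUSES = {429, 500, 502, 503, 504}  # assumed policy; tune per API

FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_family(status_code: int) -> str:
    """Map a status code to its family using the first digit."""
    return FAMILIES.get(status_code // 100, "unknown")


def should_retry(method: str, status_code: int) -> bool:
    """Retry only transient failures, and only for idempotent methods."""
    return status_code in TRANSIENT_STATUSES and method.upper() in IDEMPOTENT_METHODS


for method, status in [("GET", 503), ("POST", 503), ("GET", 404), ("DELETE", 429)]:
    print(method, status, status_family(status), should_retry(method, status))
```

A 429 is in the retryable set only on the understanding that the caller waits for Retry-After first; retrying it immediately makes the rate limiting worse.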

Part 4 - Critical Headers

Request Headers

import requests

response = requests.get(
    "https://api.example.com/users/123",
    headers={
        # Tell the server what format you want back
        "Accept": "application/json",

        # Tell the server you accept compressed responses
        "Accept-Encoding": "gzip, deflate, br",

        # Bearer token authentication
        "Authorization": "Bearer eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJ1c2VyXzEyMyJ9...",

        # Distributed tracing - attach to every request in a chain
        "X-Request-ID": "f47ac10b-58cc-4372-a567-0e02b2c3d479",

        # Conditional GET - only return body if resource changed since ETag
        "If-None-Match": '"33a64df551425fcc55e4d42a148795d9f25f89d"',
    }
)

Response Headers

| Header | Meaning |
| --- | --- |
| Content-Type: application/json; charset=utf-8 | Body is JSON, UTF-8 encoded |
| Content-Length: 1024 | Body is exactly 1024 bytes |
| Location: /users/456 | Used with 201 Created or 3xx redirects |
| ETag: "abc123" | Opaque identifier for the current version of the resource |
| Last-Modified: Thu, 05 Mar 2026 10:00:00 GMT | When the resource was last changed |
| Cache-Control: max-age=3600, private | Cache this response for 1 hour, only in private (browser) caches |
| Cache-Control: no-store | Never cache this response |
| Retry-After: 60 | Wait 60 seconds before retrying (used with 429 and 503) |
| X-Request-ID: f47ac10... | Echo back the request ID for distributed tracing |
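Retry-After deserves a closer look because it comes in two forms: a number of seconds, as in the table, or an HTTP date. The sketch below handles both using only the standard library; retry_after_seconds is a hypothetical helper name, and returning None for unparseable values is a design choice rather than a rule.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After header value into seconds to wait, or None if invalid."""
    value = value.strip()
    if value.isdigit():                      # form 1: delay in seconds, e.g. "60"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # form 2: HTTP date
    except (TypeError, ValueError):          # exception type varies by Python version
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())  # a date in the past means "retry now"


reference = datetime(2026, 3, 5, 10, 0, 0, tzinfo=timezone.utc)
print(retry_after_seconds("60"))
print(retry_after_seconds("Thu, 05 Mar 2026 10:01:00 GMT", now=reference))
```

Passing now explicitly keeps the function deterministic in tests; in production code the default of the current UTC time applies.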

Authorization Patterns

import base64
import requests

# Placeholder values for illustration
url = "https://api.example.com/users/123"
access_token = "your-access-token"
key = "your-api-key-here"

# Basic Auth - base64-encode "username:password"
credentials = base64.b64encode(b"alice:secret_password").decode()
headers = {"Authorization": f"Basic {credentials}"}
# OR use requests' built-in:
response = requests.get(url, auth=("alice", "secret_password"), timeout=10)

# Bearer Token - JWT or opaque token
headers = {"Authorization": f"Bearer {access_token}"}

# API Key - varies by service; often in a custom header
headers = {"X-API-Key": key}
# OR in query string (less secure - ends up in logs):
response = requests.get(url, params={"api_key": key}, timeout=10)

ETag and Conditional Requests

ETags enable efficient cache validation - the client only downloads the body when it has changed:

import requests

session = requests.Session()

# First request - server returns ETag
response = session.get("https://api.example.com/users/123")
etag = response.headers.get("ETag")  # e.g. '"33a64df551..."'
data = response.json()  # parse and cache locally

# Later - conditional GET: only download if resource changed
response = session.get(
    "https://api.example.com/users/123",
    headers={"If-None-Match": etag}
)

if response.status_code == 304:  # Not Modified
    # Use cached data - body is empty, ETag still valid
    pass
elif response.status_code == 200:
    # Resource changed - parse fresh data and update local cache
    data = response.json()
    etag = response.headers.get("ETag")
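For the other half of the exchange, here is a simplified sketch of the decision a server makes when it receives If-None-Match. The function names and the hash-based ETag are assumptions for illustration; real servers also handle weak validators, lists of ETags, and the * wildcard.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the body (one common approach, not the only one)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, if_none_match: Optional[str]) -> tuple[int, dict, bytes]:
    """Return (status, headers, payload) for a GET with an optional If-None-Match."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""   # unchanged: send headers only
    return 200, {"ETag": etag}, body      # new or changed: send the full body


resource = b'{"id": 123, "name": "Alice"}'
status, headers, payload = conditional_get(resource, None)
print(status, len(payload))

status, headers, payload = conditional_get(resource, headers["ETag"])
print(status, len(payload))
```

The second call returns 304 with an empty payload, which is the bandwidth saving the client code above relies on.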

Part 5 - HTTP Request/Response Lifecycle

For a cold HTTPS request to a new host, there are three round trips before the first byte of HTTP:

  1. DNS resolution (0–300ms depending on TTL and resolver)
  2. TCP handshake (one RTT - typically 20–200ms)
  3. TLS 1.3 handshake (one RTT for a full handshake; TLS 1.2 needs two)

Then the HTTP request and response (one RTT minimum for small responses). This is why connection reuse (keep-alive) and connection pooling matter so much - they eliminate round trips 2 and 3 for subsequent requests to the same host.
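A back-of-the-envelope model shows the size of the effect. This is a sketch under stated assumptions: one RTT each for the TCP and TLS 1.3 handshakes, one RTT per HTTP exchange, DNS and server processing time ignored. The function name and the 80 ms figure are illustrative.

```python
def total_latency_ms(n_requests: int, rtt_ms: float, reuse_connection: bool) -> float:
    """Estimate network latency for n sequential requests to one host."""
    handshake_ms = 2 * rtt_ms                 # TCP (1 RTT) + TLS 1.3 (1 RTT)
    handshakes = 1 if reuse_connection else n_requests
    exchange_ms = n_requests * rtt_ms         # one RTT per request/response
    return handshakes * handshake_ms + exchange_ms


cold = total_latency_ms(10, rtt_ms=80, reuse_connection=False)
warm = total_latency_ms(10, rtt_ms=80, reuse_connection=True)
print(f"new connection each time: {cold:.0f} ms")
print(f"one reused connection:    {warm:.0f} ms")
```

With these numbers the ten requests cost 2400 ms on fresh connections against 960 ms on a reused one; the difference is entirely handshake time.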

Part 6 - Connection Management and the requests Library

Why requests.get() in a Loop Is Wrong

import requests

users = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# WRONG - creates a new TCP + TLS connection for each request
# 10 requests = 10 DNS lookups + 10 TCP handshakes + 10 TLS handshakes
for user_id in users:
    response = requests.get(f"https://api.example.com/users/{user_id}")
    process(response.json())  # process() stands in for your own handling code

import requests

# RIGHT - Session reuses the connection pool
session = requests.Session()

# Session persists: cookies, auth headers, base headers, connection pool
session.headers.update({
    "Authorization": "Bearer your-token",
    "Accept": "application/json",
})

for user_id in users:
    response = session.get(f"https://api.example.com/users/{user_id}")
    process(response.json())

:::warning requests.get() Creates a New Connection Each Time
Every call to requests.get(), requests.post(), etc. at the module level creates a new Session internally and tears it down after the call. For a single request, this is fine. For multiple requests to the same host - even two - use a Session explicitly. The connection pool reuses the TCP and TLS layer, so every request after the first skips both handshakes and pays only for the HTTP exchange itself.
:::

Timeouts: Always Set Them Explicitly

import requests

# WRONG - no timeout; this can block forever
response = requests.get("https://slow-api.example.com/data")

# RIGHT - connect timeout and read timeout separately
response = requests.get(
    "https://api.example.com/data",
    timeout=(3.05, 30)
    # (connect_timeout, read_timeout)
    # connect: how long to wait to establish the connection
    # read: how long to wait between bytes once connected
)

# OR a single value - applies to both connect and read
response = requests.get("https://api.example.com/data", timeout=10)

:::tip Always Set Explicit Timeouts
A requests call without a timeout will block indefinitely if the server stops responding mid-transfer. In production, this means a single slow upstream API can exhaust your thread pool, taking down the entire service. Set timeout=(connect_seconds, read_seconds) on every request. The connect timeout should be short (1–5 seconds); the read timeout depends on the expected response size and server processing time.
:::

Retry Logic with HTTPAdapter

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_session_with_retry() -> requests.Session:
    """
    Production-grade requests Session with:
    - 3 retries on transient failures
    - Exponential backoff between retries (backoff_factor=0.5)
    - Retry only on 5xx errors and connection failures
    - NOT retrying on 4xx (client errors are not transient)
    """
    session = requests.Session()

    retry_strategy = Retry(
        total=3,
        backoff_factor=0.5,  # delay doubles per retry; exact schedule depends on the urllib3 version
        status_forcelist=[500, 502, 503, 504],
        allowed_methods=["GET", "HEAD", "OPTIONS"],  # safe + idempotent only
        raise_on_status=False,  # don't raise; let the caller check status
    )

    adapter = HTTPAdapter(
        max_retries=retry_strategy,
        pool_connections=10,  # number of connection pools (one per host)
        pool_maxsize=20,  # max connections per pool
    )

    session.mount("https://", adapter)
    session.mount("http://", adapter)

    return session


session = create_session_with_retry()
response = session.get("https://api.example.com/data", timeout=(3, 30))
response.raise_for_status()  # raises HTTPError for 4xx/5xx after retries exhausted
data = response.json()
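To see what exponential backoff means numerically, the sketch below computes a delay schedule from the textbook formula base × 2^attempt, with a cap and optional jitter. It is a generic illustration and not necessarily the exact schedule urllib3's Retry applies; backoff_delays and its parameters are hypothetical.

```python
import random
from typing import Callable


def backoff_delays(
    retries: int,
    base: float = 0.5,
    cap: float = 30.0,
    jitter: float = 0.0,
    rng: Callable[[], float] = random.random,
) -> list[float]:
    """Return the wait before each retry: base * 2**attempt, capped, plus jitter."""
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        delay += rng() * jitter   # spreads out clients that failed at the same moment
        delays.append(delay)
    return delays


print(backoff_delays(3))             # no jitter: deterministic schedule
print(backoff_delays(8, cap=10.0))   # the cap stops the delays growing without bound
```

Jitter matters when many clients fail together: without it they all retry at the same instants and hit the recovering server in synchronized waves.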

:::danger Never Disable SSL Verification in Production
requests.get(url, verify=False) disables certificate validation entirely. It silences the SSL warning, but it does not fix the underlying problem - it opens the connection to man-in-the-middle attacks. Any intermediate network node (corporate proxy, cloud load balancer, ISP) can intercept and read the traffic. Fix SSL errors properly: update the CA bundle (pip install --upgrade certifi), specify the correct CA file (verify="/path/to/ca-bundle.crt"), or debug the certificate chain. Never ship verify=False in production code.
:::

Request Hooks

import requests
import logging

logger = logging.getLogger(__name__)

def log_response_time(response, *args, **kwargs):
    """Hook called after every response. Logs timing for observability."""
    elapsed = response.elapsed.total_seconds()
    logger.info(
        "HTTP %s %s → %d in %.3fs",
        response.request.method,
        response.request.url,
        response.status_code,
        elapsed,
    )
    if elapsed > 1.0:
        logger.warning("Slow response: %.3fs for %s", elapsed, response.request.url)

session = requests.Session()
session.hooks["response"].append(log_response_time)

# Every response through this session logs timing automatically
response = session.get("https://api.example.com/users/123", timeout=10)

Part 7 - httpx: The Modern Alternative

httpx is a near drop-in replacement for requests that adds async support and HTTP/2:

import asyncio

import httpx

# Synchronous - near-identical API to requests
with httpx.Client(timeout=10) as client:
    response = client.get("https://api.example.com/users/123")
    data = response.json()

# Async - same API shape, awaited methods
async def fetch_users(user_ids: list[int]) -> list[dict]:
    """Fetch multiple users concurrently - not possible with synchronous requests without threads."""
    async with httpx.AsyncClient(timeout=10) as client:
        tasks = [
            client.get(f"https://api.example.com/users/{uid}")
            for uid in user_ids
        ]
        responses = await asyncio.gather(*tasks)
        return [r.json() for r in responses]

# HTTP/2 - multiplexes multiple requests over one connection
# (requires the optional extra: pip install "httpx[http2]")
async def fetch_with_http2():
    async with httpx.AsyncClient(http2=True, timeout=10) as client:
        response = await client.get("https://api.example.com/users/123")
        return response.json()

asyncio.run(fetch_with_http2())

| Feature | requests | httpx |
| --- | --- | --- |
| Synchronous | Yes | Yes |
| Async | No | Yes |
| HTTP/2 | No | Yes (with httpx[http2]) |
| API familiarity | Reference | Very similar to requests |
| Streaming | Yes | Yes |
| Test client | No (use responses) | Yes (built-in httpx.MockTransport) |

Part 8 - HTTP/2 and HTTP/3

HTTP/2: Multiplexing and Header Compression

HTTP/1.1 has two performance problems:

  1. Head-of-line blocking: requests on a single connection are serialized. If response #1 is large, response #2 waits.
  2. Header repetition: every request sends the full header set (often 500–1000 bytes of repeated Cookie, User-Agent, Authorization).

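HTTP/2 solves both:

Multiplexing splits every request and response into frames tagged with a stream ID, so many exchanges interleave on one connection and a large response #1 no longer holds up response #2.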

HPACK header compression maintains a table of previously sent headers on both ends of the connection. After the first request sends Authorization: Bearer eyJ..., subsequent requests can send a short index reference (often a single byte) instead of the full value. For APIs with large auth tokens, this can cut per-request header overhead substantially (on the order of 60–90% when headers repeat from request to request).
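The indexing idea can be shown with a toy encoder. This is a deliberately simplified illustration, not real HPACK: the actual format also has a static table, Huffman coding, table size limits with eviction, and a binary wire encoding. ToyHeaderTable is a hypothetical name.

```python
class ToyHeaderTable:
    """Toy model of header indexing: a repeated header becomes an integer reference."""

    def __init__(self):
        self._index = {}  # (name, value) -> table position

    def encode(self, headers: dict[str, str]) -> list:
        encoded = []
        for name, value in headers.items():
            entry = (name.lower(), value)
            if entry in self._index:
                encoded.append(self._index[entry])       # already known: send the index
            else:
                self._index[entry] = len(self._index) + 1
                encoded.append(entry)                    # first time: send the literal
        return encoded


table = ToyHeaderTable()
request_headers = {
    "Authorization": "Bearer eyJhbGciOiJSUzI1NiJ9...",
    "Accept": "application/json",
}
print(table.encode(request_headers))  # first request: full name/value pairs
print(table.encode(request_headers))  # second request: small integers only
```

The second call emits two integers in place of two full headers, which is the source of the savings on connections that carry many similar requests.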

Server push (rarely used in APIs): the server can send responses the client has not asked for yet. It was designed for pushing CSS/JS assets to web browsers, saw little adoption (Chrome has since disabled it by default), and is almost never part of API design.

HTTP/3 and QUIC

HTTP/3 replaces TCP with QUIC (UDP-based, with reliability built in):

  • Eliminates TCP head-of-line blocking at the transport layer (HTTP/2 still suffers this at TCP level)
  • Faster connection establishment (0-RTT for resumed sessions)
  • Better performance on lossy mobile networks (packet loss only blocks one stream, not all)

Python support for HTTP/3 is early-stage as of 2026. Most production Python APIs are HTTP/1.1 or HTTP/2. QUIC matters more for CDN edge nodes and mobile clients than for server-to-server API calls.

Part 9 - A Production HTTP Client Pattern

import uuid
import logging
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from typing import Any

logger = logging.getLogger(__name__)


class APIClient:
    """
    Production HTTP client for a single upstream API.

    Features:
    - Connection pooling via Session
    - Retry with exponential backoff on 5xx
    - Explicit timeouts on every request
    - Request ID injection for distributed tracing
    - Response time logging
    - Consistent error handling
    """

    def __init__(self, base_url: str, api_key: str, timeout: tuple = (3, 30)):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

        self._session = requests.Session()
        self._session.headers.update({
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        })

        retry = Retry(
            total=3,
            backoff_factor=0.5,
            status_forcelist=[500, 502, 503, 504],
            allowed_methods=["GET", "HEAD", "DELETE"],
        )
        adapter = HTTPAdapter(max_retries=retry, pool_maxsize=20)
        self._session.mount("https://", adapter)
        self._session.mount("http://", adapter)

        self._session.hooks["response"].append(self._log_response)

    def _log_response(self, response, *args, **kwargs):
        elapsed = response.elapsed.total_seconds()
        logger.info(
            "API %s %s → %d (%.3fs)",
            response.request.method,
            response.request.url,
            response.status_code,
            elapsed,
        )

    def _request(self, method: str, path: str, **kwargs) -> requests.Response:
        url = f"{self.base_url}/{path.lstrip('/')}"
        request_id = str(uuid.uuid4())

        # Copy the caller's headers so their dict is never mutated
        headers = dict(kwargs.pop("headers", None) or {})
        headers["X-Request-ID"] = request_id

        response = self._session.request(
            method,
            url,
            headers=headers,
            timeout=self.timeout,
            **kwargs,
        )
        response.raise_for_status()
        return response

    def get(self, path: str, **kwargs) -> Any:
        return self._request("GET", path, **kwargs).json()

    def post(self, path: str, body: dict, **kwargs) -> Any:
        return self._request("POST", path, json=body, **kwargs).json()

    def patch(self, path: str, body: dict, **kwargs) -> Any:
        return self._request("PATCH", path, json=body, **kwargs).json()

    def delete(self, path: str, **kwargs) -> None:
        self._request("DELETE", path, **kwargs)

    def close(self):
        self._session.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
        return False


# Usage
with APIClient("https://api.example.com", api_key="sk-...") as client:
    user = client.get("/users/123")
    client.patch("/users/123", body={"email": "[email protected]"})
    client.delete("/users/456")

:::note HTTP Is Stateless
Every HTTP request is independent. The server has no memory of previous requests from the same client. Cookies, sessions, JWTs, and API keys are all application-layer illusions built on top of a fundamentally stateless protocol. Each request must carry all the context the server needs to process it. This is a feature - it makes HTTP services horizontally scalable, because any server instance can handle any request.
:::

Graded Practice Challenges

Level 1 - Predict and Identify

Question 1: What is the minimum number of network round trips needed before an HTTP response body can start arriving on a cold HTTPS connection?

Three round trips before HTTP:

  1. DNS resolution (not strictly a round trip in the HTTP sense, but a network call)
  2. TCP handshake (SYN / SYN-ACK / ACK = one RTT)
  3. TLS 1.3 handshake (one RTT for the key exchange)

Then one more RTT for the HTTP request and the first response bytes to arrive.

Total: 3–4 RTTs on a cold connection. On a warm connection (Session with keep-alive), steps 2 and 3 are skipped - the HTTP request goes out immediately. This is why Session reuse matters.

Question 2: Which HTTP methods are both safe AND idempotent? Which are idempotent but not safe? Which are neither?

Safe AND idempotent (read-only, repeatable): GET, HEAD, OPTIONS

Idempotent but NOT safe (write operations with stable end state): PUT, DELETE

Neither safe nor idempotent: POST, PATCH

The practical consequence: GET, HEAD, and OPTIONS can be retried automatically by any infrastructure (proxy, load balancer, retry adapter). PUT and DELETE can be retried by the application if a network error occurs mid-flight. POST and PATCH must not be automatically retried - a duplicate POST creates two records; a duplicate PATCH may apply the change twice (unless the server implements idempotency keys).

Question 3: What does this code actually do differently from a plain requests.get() call?

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
session.mount("https://", HTTPAdapter(pool_maxsize=20))
response = session.get("https://api.example.com/users/1", timeout=(3, 30))

Three differences from requests.get():

  1. Connection pooling: the Session holds up to 20 open HTTPS connections to api.example.com. Subsequent requests to the same host reuse existing connections, skipping the TCP + TLS handshake (saving 50–400ms per request depending on RTT).

  2. Timeout: (3, 30) sets a 3-second connect timeout and a 30-second read timeout. Plain requests.get() without timeout blocks indefinitely. This is the most important difference for production safety.

  3. Reusable configuration: Session persists headers, auth, and cookies across requests. Any subsequent call to session.get() or session.post() reuses the same pool and configuration without re-initialization overhead.

Question 4: A service makes 1000 API calls per hour to the same external host. Currently it uses requests.get() at the module level. Estimate how much time is wasted on connection overhead compared to using a Session.

Assumptions: typical RTT to a public HTTPS endpoint is 80ms. Each cold connection requires:

  • TCP handshake: ~80ms (1 RTT)
  • TLS 1.3 handshake: ~80ms (1 RTT)
  • Total overhead per request: ~160ms

With requests.get() (new connection each time):

  • 1000 requests × 160ms = 160 seconds of wasted connection overhead per hour

With Session (connection pool reused):

  • First request: 160ms overhead
  • Subsequent 999 requests: ~0ms connection overhead
  • Total overhead: ~160ms

Net savings: ~160 seconds per hour that would otherwise be spent purely on redundant TCP + TLS handshakes. At higher request rates this becomes a significant latency and resource cost.

Level 2 - Debug and Fix

Find and fix all problems in this HTTP client code:

import requests

def fetch_all_users(user_ids):
    results = []
    for uid in user_ids:
        resp = requests.get(f"https://api.example.com/users/{uid}")
        if resp.status_code == 200:
            results.append(resp.json())
        elif resp.status_code == 500:
            results.append(fetch_all_users([uid]))  # retry
    return results

def create_user(name, email):
    resp = requests.get(
        "https://api.example.com/users",
        params={"name": name, "email": email}
    )
    return resp.json()

def delete_user(uid):
    resp = requests.post(
        "https://api.example.com/users",
        json={"user_id": uid, "action": "delete"},
        verify=False
    )
    return resp.status_code == 200

Bug 1 - No Session, no timeout: requests.get() at module level creates a new connection for every call, which is expensive for a list of user IDs, and without a timeout any single call can block indefinitely.

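Bug 2 - Naive retry recurses without bound: fetch_all_users([uid]) is called again on every 500 response, so if the server stays down the recursion only ends when Python raises RecursionError. It also appends the returned list rather than a user dict, which corrupts the shape of the results.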

Bug 3 - Wrong method for create: requests.get() for creating a user is wrong semantically and dangerous - GET is cacheable and safe. Use POST.

Bug 4 - Wrong method for delete: requests.post() for a delete operation violates REST conventions and is not idempotent as expressed.

Bug 5 - verify=False: SSL verification disabled - man-in-the-middle vulnerability in production.

Fixed version:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session() -> requests.Session:
    session = requests.Session()
    retry = Retry(
        total=3,
        backoff_factor=0.5,
        status_forcelist=[500, 502, 503, 504],
        allowed_methods=["GET", "DELETE"],
    )
    session.mount("https://", HTTPAdapter(max_retries=retry, pool_maxsize=10))
    return session

_SESSION = make_session()

def fetch_all_users(user_ids: list[int]) -> list[dict]:
    results = []
    for uid in user_ids:
        response = _SESSION.get(
            f"https://api.example.com/users/{uid}",
            timeout=(3, 30),
        )
        response.raise_for_status()  # raises on 4xx/5xx; retry handles 5xx
        results.append(response.json())
    return results

def create_user(name: str, email: str) -> dict:
    response = _SESSION.post(
        "https://api.example.com/users",
        json={"name": name, "email": email},
        timeout=(3, 30),
    )
    response.raise_for_status()
    return response.json()

def delete_user(uid: int) -> None:
    response = _SESSION.delete(
        f"https://api.example.com/users/{uid}",
        timeout=(3, 30),
    )
    response.raise_for_status()

Level 3 - Design Challenge

Design a RateLimitedHTTPClient class that:

  1. Wraps requests.Session
  2. Respects Retry-After response headers on 429 responses
  3. Maintains a configurable per-host request rate (e.g. max 10 requests/second)
  4. Logs all rate limiting events with host, wait duration, and request URL
  5. Raises RateLimitExceeded after a configurable maximum wait time

Reference solution:

import time
import logging
import threading
from collections import defaultdict, deque
from typing import Optional
from urllib.parse import urlparse

import requests

logger = logging.getLogger(__name__)


class RateLimitExceeded(Exception):
    pass


class RateLimitedHTTPClient:
    """
    HTTP client that enforces per-host rate limits and respects Retry-After.

    Args:
        requests_per_second: Maximum requests per second per host
            (sliding window; values below 1 are treated as 1)
        max_wait_seconds: Maximum time to wait on a 429 before raising
        session: Optional pre-configured requests.Session
    """

    def __init__(
        self,
        requests_per_second: float = 10.0,
        max_wait_seconds: float = 60.0,
        session: Optional[requests.Session] = None,
    ):
        self.requests_per_second = requests_per_second
        self.max_wait_seconds = max_wait_seconds
        self._session = session or requests.Session()

        # Per-host sliding window of send times (actual and reserved)
        self._lock = threading.Lock()
        self._request_times: dict[str, deque] = defaultdict(deque)

    def _host(self, url: str) -> str:
        return urlparse(url).netloc

    def _throttle(self, url: str) -> None:
        """Block until the per-host rate limit allows the next request."""
        host = self._host(url)
        limit = max(1, int(self.requests_per_second))

        with self._lock:
            now = time.monotonic()
            times = self._request_times[host]

            # Forget send times that have left the 1-second window
            while times and times[0] <= now - 1.0:
                times.popleft()

            # The next request may go out one second after the request
            # `limit` places back, or immediately if the window has room
            send_at = now
            if len(times) >= limit:
                send_at = max(now, times[-limit] + 1.0)
            times.append(send_at)  # reserve the slot before releasing the lock

        # Sleep outside the lock so a busy host does not stall other hosts
        wait = send_at - now
        if wait > 0:
            logger.info("Rate throttle: host=%s wait=%.3fs url=%s", host, wait, url)
            time.sleep(wait)

    def _handle_429(self, response: requests.Response, url: str) -> None:
        retry_after = response.headers.get("Retry-After", "").strip()
        try:
            wait = float(retry_after)
        except (ValueError, TypeError):
            wait = 5.0  # default backoff if Retry-After is missing or invalid

        if wait > self.max_wait_seconds:
            raise RateLimitExceeded(
                f"Server requested {wait}s wait on {url}; "
                f"max_wait_seconds={self.max_wait_seconds}"
            )

        logger.warning(
            "Rate limited (429): host=%s url=%s - waiting %.1fs (Retry-After: %r)",
            self._host(url), url, wait, retry_after
        )
        time.sleep(wait)

    def request(self, method: str, url: str, **kwargs) -> requests.Response:
        kwargs.setdefault("timeout", (3, 30))
        max_attempts = 5

        for attempt in range(1, max_attempts + 1):
            self._throttle(url)
            response = self._session.request(method, url, **kwargs)

            if response.status_code != 429:
                return response

            if attempt == max_attempts:
                break
            self._handle_429(response, url)

        raise RateLimitExceeded(
            f"Still rate limited after {max_attempts} attempts: {url}"
        )

    def get(self, url: str, **kwargs) -> requests.Response:
        return self.request("GET", url, **kwargs)

    def post(self, url: str, **kwargs) -> requests.Response:
        return self.request("POST", url, **kwargs)

    def close(self):
        self._session.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
        return False


# Usage (process() stands in for your own handling code)
with RateLimitedHTTPClient(requests_per_second=5, max_wait_seconds=30) as client:
    for user_id in range(100):
        response = client.get(f"https://api.example.com/users/{user_id}")
        response.raise_for_status()
        process(response.json())

Design decisions:

  • Per-host sliding window prevents one busy host from affecting requests to other hosts
  • _handle_429 separates server-requested rate limiting (via Retry-After) from client-side throttling - they have different semantics and different wait durations
  • max_wait_seconds provides a circuit breaker for server-requested waits that would block the caller for unacceptable durations
  • Thread lock on the request time window makes the client safe for multi-threaded use
  • The request() method is the single dispatch point - all HTTP methods go through it, making rate limiting and retry logic uniform

Key Takeaways

  • HTTP is plain text over a TCP socket: request line + headers + blank line + optional body. Every HTTP library, framework, and debugging tool operates on this format
  • GET, HEAD, OPTIONS are safe and idempotent - infrastructure can retry them freely. PUT, DELETE are idempotent but not safe. POST and PATCH are neither - never auto-retry them
  • Status code families: 2xx (success), 3xx (redirect), 4xx (client error - fix the request), 5xx (server error - retry with backoff)
  • 401 means unauthenticated (missing/invalid credentials); 403 means unauthorized (lacks permission). 429 means rate limited - check Retry-After before retrying
  • ETags enable conditional GET: send If-None-Match with the ETag and get 304 Not Modified (with no body) when the resource has not changed
  • A cold HTTPS connection requires DNS + TCP handshake + TLS handshake before the first HTTP byte - 3+ round trips. Session reuse eliminates handshake overhead for subsequent requests
  • Always set explicit timeouts: timeout=(connect_seconds, read_seconds). A hanging request without a timeout blocks a thread or coroutine indefinitely
  • Use Session for multiple requests to the same host. requests.get() at module level creates a new connection every time
  • Never use verify=False in production. Fix SSL errors properly with correct CA bundles
  • HTTPAdapter with Retry handles transient 5xx failures with exponential backoff - only safe for idempotent methods
  • httpx provides near-identical API to requests with added async support and HTTP/2 multiplexing
  • HTTP/2 multiplexes multiple streams over one connection and compresses repeated headers - significant throughput improvement for high-request-rate API clients

What's Next

Lesson 02 covers REST principles - the architectural style that gives HTTP methods their meaning in API design. You will see two designs for the same API, understand why one is "RESTful" and the other is not, and learn the specific rules (URL design, method semantics, status codes, error formats, pagination, versioning) that make REST APIs predictable, cacheable, and maintainable at scale.

© 2026 EngineersOfAI. All rights reserved.