
JSON Serialization - Production-Grade Encoding and Decoding

Reading time: ~30 minutes | Level: Intermediate → Engineering

Before reading further, run this in your head:

import json
from datetime import datetime
from decimal import Decimal
from uuid import UUID

data = {
    "id": UUID("550e8400-e29b-41d4-a716-446655440000"),
    "created_at": datetime.now(),
    "price": Decimal("9.99"),
    "tags": {1, 2, 3},
}

print(json.dumps(data)) # ?
TypeError: Object of type UUID is not JSON serializable

This is the first thing your API crashes on when it graduates from toy examples to real data. UUID, datetime, Decimal, Enum, dataclass, SQLAlchemy model - none of them serialize out of the box. The standard library's json module natively handles only dict, list, tuple, str, int, float, bool, and None, and almost everything in production is something else.

This lesson shows you how to fix this properly: not with a one-line hack, but with a production-grade encoder, awareness of which library to reach for when performance matters, and the footguns that silently corrupt data.

What You Will Learn

  • What Python's json module can and cannot serialize natively
  • Writing custom JSONEncoder subclasses for production APIs
  • Handling datetime, Decimal, UUID, Enum, dataclass, and bytes
  • json.dumps kwargs that matter: indent, sort_keys, ensure_ascii, separators
  • orjson: Rust-based, 3–5× faster, handles datetime/UUID/dataclass natively (numpy via an option flag)
  • msgspec: schema-based, even faster, with validation built in
  • Choosing the right JSON library for your workload
  • Content negotiation: Content-Type: application/json; charset=utf-8
  • Streaming large responses with NDJSON (Newline-Delimited JSON)
  • Pydantic's JSON serialization and its mode='json' gotcha

Prerequisites

  • Lesson 04 (FastAPI) - request/response model context
  • Lesson 05 (Request-Response Lifecycle) - where serialization happens in the stack
  • Basic familiarity with dataclasses and Enum

Part 1 - What the Standard Library Can Serialize

Python's json module supports exactly these types natively:

| Python type | JSON output |
|---|---|
| dict | object ({}) |
| list, tuple | array ([]) |
| str | string ("") |
| int | number (integer) |
| float | number (decimal) |
| True / False | true / false |
| None | null |

Any other type raises TypeError: Object of type X is not JSON serializable. There is no fallback and no coercion. The only leeway is that subclasses of the types above are encoded as their base type.

import json

# These work
json.dumps({"name": "Alice", "age": 30, "active": True})
# '{"name": "Alice", "age": 30, "active": true}'

json.dumps([1, 2, (3, 4)]) # tuple → array
# '[1, 2, [3, 4]]'

json.dumps(None)
# 'null'

# These all raise TypeError
from datetime import datetime
from decimal import Decimal
from uuid import UUID
import enum

json.dumps(datetime.now()) # TypeError: Object of type datetime is not JSON serializable
json.dumps(Decimal("9.99")) # TypeError: Object of type Decimal is not JSON serializable
json.dumps(UUID("550e8400-e29b-41d4-a716-446655440000")) # TypeError
json.dumps({1, 2, 3}) # TypeError: Object of type set is not JSON serializable
json.dumps(b"hello") # TypeError: Object of type bytes is not JSON serializable

Part 2 - json.dumps Kwargs That Matter

Before building custom encoders, understand the formatting controls:

import json

data = {"b": 2, "a": 1, "nested": {"z": 26, "y": 25}}

# Pretty print for debugging / developer-facing responses
json.dumps(data, indent=2)
# {
#   "b": 2,
#   "a": 1,
#   "nested": {
#     "z": 26,
#     "y": 25
#   }
# }

# Sort keys for deterministic output (important for caching and hashing)
json.dumps(data, indent=2, sort_keys=True)
# {
#   "a": 1,
#   "b": 2,
#   "nested": {
#     "y": 25,
#     "z": 26
#   }
# }

# Compact output for APIs (smaller payload, faster transmission)
json.dumps(data, separators=(",", ":"))
# '{"b":2,"a":1,"nested":{"z":26,"y":25}}'

# ensure_ascii=False: preserve unicode characters instead of escaping them
# Default is True - emojis and non-ASCII become \uXXXX escape sequences
json.dumps({"greeting": "こんにちは"}, ensure_ascii=True)
# '{"greeting": "\\u3053\\u3093\\u306b\\u3061\\u306f"}'

json.dumps({"greeting": "こんにちは"}, ensure_ascii=False)
# '{"greeting": "こんにちは"}'
# Smaller output, correct UTF-8 when Content-Type includes charset=utf-8

Tip

Use separators=(",", ":") in production APIs - it removes the space after , and : that json.dumps includes by default. On a response with thousands of keys, this saves measurable bytes. Use indent=2 only for developer-facing endpoints or logging.
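The sort_keys and separators settings above combine into a common pattern: a canonical JSON string that can be hashed into a cache key. Here is a minimal sketch. canonical_json and cache_key are hypothetical helper names. This is not a full canonicalization scheme such as RFC 8785 (JSON Canonicalization Scheme), which also pins down number formatting.

```python
import hashlib
import json


def canonical_json(obj) -> str:
    """Deterministic, compact JSON: sorted keys and no padding spaces."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def cache_key(obj) -> str:
    """Hash the canonical form so logically equal dicts share one key."""
    return hashlib.sha256(canonical_json(obj).encode("utf-8")).hexdigest()


a = {"b": 2, "a": 1, "nested": {"z": 26, "y": 25}}
b = {"a": 1, "nested": {"y": 25, "z": 26}, "b": 2}  # same data, different insertion order

print(canonical_json(a))             # {"a":1,"b":2,"nested":{"y":25,"z":26}}
print(cache_key(a) == cache_key(b))  # True
print(len(json.dumps(a)), len(canonical_json(a)))  # default output is longer
```

The hash is taken over sorted, whitespace-free output. As a result, two dicts with the same content but different insertion order map to the same key.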

Part 3 - Custom Encoders: The default() Method

The correct way to handle non-serializable types is to subclass json.JSONEncoder and override default():

import base64
import dataclasses
import enum
import json
from datetime import date, datetime
from decimal import Decimal
from uuid import UUID


class ProductionJSONEncoder(json.JSONEncoder):
    """
    Production-grade JSON encoder that handles all common types
    that the standard library cannot serialize.
    """

    def default(self, obj):
        # datetime → ISO 8601 string (pass UTC-aware values in production;
        # naive values serialize without any offset).
        # Checked before date because datetime is a subclass of date.
        if isinstance(obj, datetime):
            return obj.isoformat()

        # date → ISO 8601 date string
        if isinstance(obj, date):
            return obj.isoformat()

        # Decimal → string to preserve exact precision
        # NEVER convert to float - see the warning below
        if isinstance(obj, Decimal):
            return str(obj)

        # UUID → canonical string form
        if isinstance(obj, UUID):
            return str(obj)

        # Enum → its value. str/int-mixin enums (StrEnum, IntEnum) are already
        # str/int instances, so they are encoded natively and never reach here.
        if isinstance(obj, enum.Enum):
            return obj.value

        # dataclass → dict (recursive, handles nested dataclasses)
        if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
            return dataclasses.asdict(obj)

        # set/frozenset → sorted list (sorted for determinism)
        if isinstance(obj, (set, frozenset)):
            return sorted(obj, key=str)

        # bytes → base64-encoded string
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode("ascii")

        # Fallback to the parent - raises TypeError for truly unsupported types
        return super().default(obj)

Use it with json.dumps:

import json
from datetime import datetime
from decimal import Decimal
from uuid import UUID

data = {
    "id": UUID("550e8400-e29b-41d4-a716-446655440000"),
    "created_at": datetime(2024, 3, 15, 12, 0, 0),
    "price": Decimal("9.99"),
    "tags": {1, 2, 3},
}

result = json.dumps(data, cls=ProductionJSONEncoder, separators=(",", ":"))
print(result)
# {"id":"550e8400-e29b-41d4-a716-446655440000","created_at":"2024-03-15T12:00:00","price":"9.99","tags":[1,2,3]}

Or as a convenience wrapper:

import json
from typing import Any


def dumps(obj: Any, **kwargs: Any) -> str:
    """Drop-in replacement for json.dumps with production encoder."""
    kwargs.setdefault("cls", ProductionJSONEncoder)
    kwargs.setdefault("ensure_ascii", False)
    return json.dumps(obj, **kwargs)


def loads(s: str | bytes) -> Any:
    """Thin wrapper around json.loads for symmetry (may return dict, list, str, ...)."""
    return json.loads(s)

One-Off Serialization with default Parameter

For quick, non-reusable serialization, use the default kwarg instead of a full subclass:

import json
from datetime import datetime
from uuid import UUID

# Inline default function - used for single-call overrides
result = json.dumps(
{"id": UUID("550e8400-e29b-41d4-a716-446655440000"), "ts": datetime.now()},
default=str, # str() everything that isn't natively serializable
)
# Works, and str(Decimal("9.99")) == "9.99" happens to be fine,
# but str(datetime.now()) uses a space instead of the ISO 8601 "T" separator
# Use only for debugging - not production

Warning

default=str is convenient for debugging but not for production. str(datetime.now()) produces 2024-03-15 12:00:00.123456, with a space where ISO 8601 puts a T, so consumers that expect strict ISO 8601 may reject it. str(Decimal("9.99")) happens to produce "9.99", which is correct, but you are relying on accident rather than intent. default=str also stringifies any unexpected object instead of failing loudly. Always use an explicit encoder in production code.
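If you like the one-off default= style, pass an explicit function instead of str. This sketch assumes only datetime, Decimal and UUID need handling. json_default is a hypothetical name. Unlike default=str, an unexpected type still raises TypeError instead of being silently stringified.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal
from uuid import UUID


def json_default(obj):
    """Explicit per-type conversions for json.dumps(default=...)."""
    if isinstance(obj, datetime):
        return obj.isoformat()  # ISO 8601 with the "T" separator
    if isinstance(obj, (Decimal, UUID)):
        return str(obj)  # exact decimal digits / canonical UUID text
    # Anything unexpected fails loudly instead of being str()-ed
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


payload = {
    "id": UUID("550e8400-e29b-41d4-a716-446655440000"),
    "ts": datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc),
    "price": Decimal("9.99"),
}
print(json.dumps(payload, default=json_default, separators=(",", ":")))
# {"id":"550e8400-e29b-41d4-a716-446655440000","ts":"2024-03-15T12:00:00+00:00","price":"9.99"}
```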

Part 4 - The Production Problem: Types You Will Encounter

datetime and Timezone Awareness

from datetime import datetime, timezone

# Naive datetime (no timezone) - ambiguous, never use in APIs
naive = datetime(2024, 3, 15, 12, 0, 0)
naive.isoformat() # "2024-03-15T12:00:00"
# What timezone? No one knows.

# Aware datetime (explicit UTC) - unambiguous
aware = datetime(2024, 3, 15, 12, 0, 0, tzinfo=timezone.utc)
aware.isoformat() # "2024-03-15T12:00:00+00:00"

# Or use .now(tz=timezone.utc) - always aware
datetime.now(tz=timezone.utc).isoformat() # "2024-03-15T12:00:00.123456+00:00"

Always store and transmit UTC. Always include timezone offset in JSON output.
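Here is a minimal sketch of that rule. It assumes callers always pass aware datetimes, and to_utc_iso is a hypothetical helper. It converts to UTC before formatting and refuses naive values. Consumers can then parse the offset back with datetime.fromisoformat() (Python 3.7+).

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso(dt: datetime) -> str:
    """Serialize an aware datetime as ISO 8601 in UTC; reject naive values."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before serializing")
    return dt.astimezone(timezone.utc).isoformat()


# 09:00 at UTC-03:00 is 12:00 UTC
local = datetime(2024, 3, 15, 9, 0, tzinfo=timezone(timedelta(hours=-3)))
wire = to_utc_iso(local)
print(wire)  # 2024-03-15T12:00:00+00:00

# The "+00:00" form parses back losslessly
parsed = datetime.fromisoformat(wire)
print(parsed == local)  # True: aware datetimes compare by instant
```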

Decimal and Financial Data

from decimal import Decimal
import json

price = Decimal("9.99")

# WRONG - float loses precision
json.dumps({"price": float(price)}) # '{"price": 9.99}' - looks fine
# But:
float(Decimal("0.1")) + float(Decimal("0.2")) # 0.30000000000000004
# IEEE 754 floating point cannot represent 0.1 or 0.2 exactly

# CORRECT - serialize as string, reconstruct as Decimal on the other side
json.dumps({"price": str(price)}) # '{"price": "9.99"}'
# The consumer parses "9.99" back to Decimal, not float

Danger

Never serialize monetary values as JSON number. IEEE 754 float cannot represent most decimal fractions exactly - 0.1 + 0.2 == 0.30000000000000004 in Python and in JavaScript. For financial APIs, serialize Decimal to str and document in your OpenAPI schema that the field is a "type": "string", "format": "decimal". JavaScript clients must parse it as a string and use a decimal library (e.g., decimal.js) - never parseFloat().
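On the consuming side, a Python client rebuilds Decimal from the string field. If a third party sends money as JSON numbers anyway, json.loads(..., parse_float=Decimal) builds Decimal from the original digits instead of going through float. Here is a small stdlib-only sketch:

```python
import json
from decimal import Decimal

# Producer: money as strings, per the rule above
wire = json.dumps({"price": "0.10", "balance": "0.20"})

# Consumer: rebuild Decimal from the string fields explicitly
data = json.loads(wire)
total = Decimal(data["price"]) + Decimal(data["balance"])
print(total)  # 0.30

# Legacy payload with JSON numbers: parse_float receives the literal text
# ("0.10"), so Decimal never sees a float approximation
legacy = json.loads('{"price": 0.10, "balance": 0.20}', parse_float=Decimal)
print(legacy["price"] + legacy["balance"])  # 0.30
```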

Enum Types

import enum
import json

class TaskStatus(str, enum.Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"

class Priority(int, enum.Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

# str Enum: obj.value is a string - serialize naturally
# int Enum: obj.value is an int - also serializes naturally
# But plain Enum: obj.value could be anything

class Color(enum.Enum):
    RED = "red"

json.dumps(Color.RED) # TypeError - not serializable
json.dumps(Color.RED.value) # '"red"' - correct

# With ProductionJSONEncoder:
json.dumps({"color": Color.RED}, cls=ProductionJSONEncoder)
# '{"color": "red"}'

dataclass Models

import dataclasses
import json
from datetime import datetime

@dataclasses.dataclass
class Task:
    id: int
    title: str
    created_at: datetime
    completed: bool = False

task = Task(id=1, title="Write tests", created_at=datetime(2024, 3, 15))

# Without custom encoder:
json.dumps(dataclasses.asdict(task)) # Still fails: datetime is not serializable

# With ProductionJSONEncoder (default separators keep the spaces):
json.dumps(task, cls=ProductionJSONEncoder)
# '{"id": 1, "title": "Write tests", "created_at": "2024-03-15T00:00:00", "completed": false}'

Danger

Never serialize SQLAlchemy models directly - not with __dict__, not by iterating attributes. __dict__ contains SQLAlchemy's internal _sa_instance_state and silently omits attributes that have not been loaded yet. Reading attributes one by one with getattr() triggers lazy-loaded relationships: one extra query per object in a list (the N+1 problem). Always convert SQLAlchemy models to plain dicts or Pydantic models first, with explicit field selection. The serialization layer must never touch the database.

Part 5 - Pydantic's JSON Serialization

Pydantic v2 handles serialization as a first-class feature:

from pydantic import BaseModel
from datetime import datetime
from uuid import UUID
from decimal import Decimal

class OrderResponse(BaseModel):
    id: UUID
    created_at: datetime
    total: Decimal
    status: str

order = OrderResponse(
    id=UUID("550e8400-e29b-41d4-a716-446655440000"),
    created_at=datetime(2024, 3, 15, 12, 0, 0),
    total=Decimal("49.99"),
    status="confirmed",
)

# model_dump() → Python dict (datetime stays as datetime object)
d = order.model_dump()
print(type(d["created_at"])) # <class 'datetime.datetime'>
print(type(d["total"])) # <class 'decimal.Decimal'>

# model_dump(mode="json") → dict with JSON-serializable values
d_json = order.model_dump(mode="json")
print(type(d_json["created_at"])) # <class 'str'> - "2024-03-15T12:00:00"
print(type(d_json["total"])) # <class 'str'> - "49.99"

# model_dump_json() → JSON string directly (fastest path)
json_str = order.model_dump_json()
print(json_str)
# '{"id":"550e8400-e29b-41d4-a716-446655440000","created_at":"2024-03-15T12:00:00","total":"49.99","status":"confirmed"}'

# model_json_schema() → JSON Schema for OpenAPI docs
import json
print(json.dumps(OrderResponse.model_json_schema(), indent=2))

Danger

model_dump() vs model_dump(mode="json") is the most common Pydantic serialization bug. model_dump() returns a Python dict where datetime is still a datetime object - passing this dict to json.dumps() without a custom encoder will still raise TypeError. Always use model_dump(mode="json") when you need a dict to pass to standard json.dumps, or use model_dump_json() directly to get the final JSON string.

Part 6 - orjson: The Production Standard

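orjson is a Rust-based JSON library that handles most of the types the stdlib cannot (datetime, UUID, dataclass, Enum) at 3–5× the speed. It does not serialize Decimal natively, so Decimal still needs a default= hook: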

pip install orjson
import orjson
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from uuid import UUID

@dataclass
class Task:
    id: UUID
    title: str
    created_at: datetime
    price: Decimal

task = Task(
    id=UUID("550e8400-e29b-41d4-a716-446655440000"),
    title="Write tests",
    created_at=datetime(2024, 3, 15, 12, 0, 0, tzinfo=timezone.utc),
    price=Decimal("9.99"),
)

# orjson does not serialize Decimal natively - supply a default= hook
def default(obj):
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError

# orjson.dumps() returns bytes, not str
result: bytes = orjson.dumps(task, default=default)
print(result)
# b'{"id":"550e8400-e29b-41d4-a716-446655440000","title":"Write tests","created_at":"2024-03-15T12:00:00+00:00","price":"9.99"}'

# Decode to str when needed
result.decode()

# orjson handles natively: datetime, date, time, UUID, dataclass, Enum
# numpy arrays need option=orjson.OPT_SERIALIZE_NUMPY; Decimal and pandas objects need default=

# Options (passed as flags, not kwargs)
orjson.dumps(task, default=default, option=orjson.OPT_INDENT_2) # pretty-print
orjson.dumps(task, default=default, option=orjson.OPT_SORT_KEYS) # sorted keys
orjson.dumps({1: "a"}, option=orjson.OPT_NON_STR_KEYS) # allow non-str dict keys
orjson.dumps(task, default=default, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS) # combine with |

# orjson.loads() accepts str, bytes, or bytearray
data = orjson.loads(b'{"id": "550e8400-e29b-41d4-a716-446655440000"}')

API Differences from stdlib

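| Feature | stdlib json | orjson |
|---|---|---|
| Return type of dumps | str | bytes |
| Custom types | via cls= or default= | via default= only |
| datetime | TypeError | native (ISO 8601) |
| UUID | TypeError | native (string) |
| dataclass | TypeError | native |
| numpy.ndarray | TypeError | native with option=orjson.OPT_SERIALIZE_NUMPY |
| Decimal | TypeError | TypeError (needs default=) |
| Speed | baseline | 3–5× faster |
| Pretty-printing | indent=2 | option=orjson.OPT_INDENT_2 |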

Tip

Use orjson in production FastAPI apps. It is a near-drop-in replacement for the JSON layer. orjson.dumps returns bytes instead of str, but FastAPI's Response class accepts bytes directly, and orjson handles datetime and UUID without any encoder configuration. It does not take over automatically when installed, though. You opt in explicitly, either by returning Response(content=orjson.dumps(...)) as below or by using FastAPI's ORJSONResponse class.

Using orjson with FastAPI

from datetime import datetime, timezone
from uuid import UUID

import orjson
from fastapi import FastAPI
from fastapi.responses import Response

app = FastAPI()

@app.get("/tasks/{task_id}")
async def get_task(task_id: UUID):
    task = {
        "id": task_id,
        "title": "Write tests",
        "created_at": datetime.now(tz=timezone.utc),
    }
    # Return bytes directly - orjson handles UUID and datetime
    return Response(
        content=orjson.dumps(task),
        media_type="application/json",
    )

Part 7 - msgspec: Schema-Based, Maximum Speed

msgspec takes a different approach: define a schema, get both validation and serialization:

pip install msgspec
import msgspec
import msgspec.json
from datetime import datetime
from uuid import UUID

# Define a schema (similar to dataclass or Pydantic model)
class Task(msgspec.Struct):
    id: UUID
    title: str
    created_at: datetime
    completed: bool = False

task = Task(
    id=UUID("550e8400-e29b-41d4-a716-446655440000"),
    title="Write tests",
    created_at=datetime(2024, 3, 15, 12, 0, 0),
)

# Encode → bytes (typically faster than orjson for Struct types)
encoded: bytes = msgspec.json.encode(task)

# Decode + validate in one step
decoded: Task = msgspec.json.decode(encoded, type=Task)
print(decoded.title) # Write tests
print(type(decoded.id)) # <class 'uuid.UUID'> - UUID is reconstructed, not left as str

# msgspec.json.decode validates types during decode
# Pass invalid data and it raises a clear error pointing at the bad field
try:
    msgspec.json.decode(
        b'{"id": "not-a-uuid", "title": "test", "created_at": "2024-03-15T12:00:00"}',
        type=Task,
    )
except msgspec.ValidationError as e:
    print(e) # something like: Invalid UUID - at `$.id`

Benchmark Comparison

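These are indicative relative numbers for a typical API response serialization task (10,000 iterations). Treat them as rough guidance: results vary with payload shape and library versions, so benchmark your own data.

| Library | Encode (relative) | Decode (relative) | Custom types |
|---|---|---|---|
| json (stdlib) | 1× (baseline) | 1× (baseline) | Encoder class needed |
| orjson | ~5× faster | ~3× faster | datetime/UUID/dataclass native |
| msgspec | ~8× faster | ~5× faster | Schema-based, validation included |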

Part 8 - Choosing a JSON Library

Part 9 - Content Negotiation

Every JSON HTTP response must set the correct Content-Type header:

Content-Type: application/json; charset=utf-8
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import JSONResponse, Response
import orjson

app = FastAPI()

# JSONResponse sets Content-Type: application/json automatically
@app.get("/tasks")
async def list_tasks():
    return JSONResponse(content={"tasks": []})
# Content-Type: application/json

# For orjson (returns bytes), use Response with explicit media_type
@app.get("/tasks/fast")
async def list_tasks_fast():
    return Response(
        content=orjson.dumps({"tasks": []}),
        media_type="application/json",
    )
# Content-Type: application/json

# Clients may request specific content types via Accept header
@app.get("/tasks/negotiated")
async def list_tasks_negotiated(request: Request):
    accept = request.headers.get("Accept", "application/json")
    data = {"tasks": []}

    if "application/json" in accept or "*/*" in accept:
        return Response(
            content=orjson.dumps(data),
            media_type="application/json; charset=utf-8",
        )
    # 406 Not Acceptable
    raise HTTPException(status_code=406, detail="Only application/json supported")
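The substring check in list_tasks_negotiated also matches application/json;q=0, which explicitly means "not acceptable". The sketch below is a little stricter: it reads q-values and lets the most specific media range win. json_quality is a hypothetical helper. It ignores media-type parameters other than q; RFC 9110, section 12.5.1 has the full rules.

```python
from typing import Optional


def json_quality(accept_header: Optional[str]) -> float:
    """Return the q-value an Accept header gives application/json (0.0 = refused)."""
    if not accept_header:
        return 1.0  # no Accept header: the client accepts any media type
    ranges = {}
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue  # tolerate empty list elements such as "a/b,,c/d"
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # design choice: a malformed q-value counts as "refused"
        ranges[parts[0].lower()] = q
    # The most specific matching range decides
    for candidate in ("application/json", "application/*", "*/*"):
        if candidate in ranges:
            return ranges[candidate]
    return 0.0


print(json_quality("text/html, application/json;q=0.9"))  # 0.9
print(json_quality("application/json;q=0, */*"))          # 0.0
```

In the route, a test such as json_quality(request.headers.get("Accept")) > 0 could stand in for the substring check.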
Note

json.loads() accepts a str, or bytes/bytearray holding JSON encoded as UTF-8, UTF-16, or UTF-32. For bytes input it detects the encoding itself (Python 3.6+). With a client such as httpx or requests, response.json() is the simplest correct choice. json.loads(response.content) is also fine, because the raw bytes go straight to the decoder's own encoding detection. json.loads(response.text) adds a text-decoding step that depends on the client's charset handling, so prefer one of the other two.
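It helps to know what json.loads() itself does with bytes, independent of any HTTP client. The quick stdlib check below (Python 3.6+) also shows why decoding bytes yourself with the wrong codec is the real risk.

```python
import json

doc = {"greeting": "こんにちは"}
text = json.dumps(doc, ensure_ascii=False)

# bytes input: json.loads detects UTF-8, UTF-16 or UTF-32 on its own
for encoding in ("utf-8", "utf-16", "utf-32"):
    assert json.loads(text.encode(encoding)) == doc

# Decoding the bytes yourself with the wrong codec still "parses",
# but the result is silently wrong (mojibake)
mojibake = text.encode("utf-8").decode("latin-1")
print(json.loads(mojibake) == doc)  # False
```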

Part 10 - Streaming Large Responses with NDJSON

For large datasets, returning a single JSON array means building the entire response in memory before sending the first byte. NDJSON (Newline-Delimited JSON) streams one record per line:

from typing import AsyncIterator

import orjson
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def generate_tasks() -> AsyncIterator[bytes]:
    """Yield NDJSON: one JSON object per line."""
    # Simulates a database cursor: each record is built on demand,
    # so the full 10,000-row result is never held in memory
    for i in range(1, 10001):  # 10,000 tasks
        task = {"id": i, "title": f"Task {i}", "status": "pending"}
        # Each line: JSON object + newline
        yield orjson.dumps(task) + b"\n"

@app.get("/tasks/stream")
async def stream_tasks():
    return StreamingResponse(
        generate_tasks(),
        media_type="application/x-ndjson",
    )

NDJSON clients process records as they arrive - no need to wait for all 10,000 records:

import httpx
import orjson

with httpx.stream("GET", "http://localhost:8000/tasks/stream") as response:
    for line in response.iter_lines():
        if line:  # skip empty lines
            task = orjson.loads(line)
            print(task["title"])

Note

NDJSON (application/x-ndjson) and JSON Lines (application/jsonl) are different names for the same format: one JSON object per line, newline-separated, no outer array wrapper. Use it for: log export APIs, data export endpoints, real-time event streams, and any response with more than a few hundred records.
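The NDJSON framing itself needs nothing beyond the stdlib, which is handy for export scripts and tests. Here is a minimal sketch; iter_ndjson and read_ndjson are hypothetical helpers. Each title contains an embedded newline. Splitting on b"\n" is still safe, because compact json.dumps output escapes newlines inside strings.

```python
import io
import json
from typing import Iterable, Iterator


def iter_ndjson(records: Iterable[dict]) -> Iterator[bytes]:
    """Encode records lazily: one compact JSON object per line."""
    for record in records:
        # Without indent=, json.dumps emits no raw newlines (newlines inside
        # strings are escaped), so every record stays on one line
        line = json.dumps(record, separators=(",", ":"), ensure_ascii=False)
        yield line.encode("utf-8") + b"\n"


def read_ndjson(lines: Iterable[bytes]) -> Iterator[dict]:
    """Decode one record per line, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


# b"".join collects the lines only to build an in-memory test file;
# a server would yield each line as it is produced
records = ({"id": i, "title": f"Task {i}\nline two"} for i in range(1, 4))
buffer = io.BytesIO(b"".join(iter_ndjson(records)))
decoded = list(read_ndjson(buffer))
print(len(decoded), repr(decoded[0]["title"]))  # 3 'Task 1\nline two'
```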

Graded Practice

Level 1 - Predict the Output

For each snippet, predict whether it succeeds or raises TypeError, and if it succeeds, what the output is:

1a:

import json
print(json.dumps({"values": (1, 2, 3), "active": None}))

1b:

import json
from decimal import Decimal
print(json.dumps({"price": Decimal("9.99")}))

1c:

import json
import enum

class Color(str, enum.Enum):
    RED = "red"

print(json.dumps({"color": Color.RED}))

1d:

import json
from datetime import datetime

print(json.dumps({"ts": datetime.now()}, default=str))

Answer

1a - Succeeds:

{"values": [1, 2, 3], "active": null}

tuple is converted to JSON array just like list. None becomes null.

1b - Raises TypeError:

TypeError: Object of type Decimal is not JSON serializable

Decimal is not in the stdlib JSON type table. Even though Decimal("9.99") looks like a number, the JSON encoder does not know how to handle it.

1c - Succeeds:

{"color": "red"}

Because Color inherits from str, Color.RED is a str instance. json.dumps encodes str subclasses using their string content, so default() is never consulted. A plain Enum without the str mixin (class Color(enum.Enum)) would raise TypeError: Object of type Color is not JSON serializable. In that case, either serialize .value explicitly or use ProductionJSONEncoder, which returns obj.value for Enum instances.

1d - Succeeds, but produces a non-standard format:

{"ts": "2024-03-15 12:00:00.123456"}

default=str calls str() on the datetime object. Python's str(datetime(...)) produces "2024-03-15 12:00:00.123456" - not ISO 8601 ("2024-03-15T12:00:00.123456"). Most JSON parsers and APIs expect the T separator. For production, use obj.isoformat() in a proper encoder, not str().

Level 2 - Debug the Encoder

A developer wrote this custom encoder for a FastAPI app. Find and fix all bugs:

import json
from datetime import datetime
from decimal import Decimal
from uuid import UUID

class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.strftime("%Y-%m-%d") # Bug 1

        if isinstance(obj, Decimal):
            return float(obj) # Bug 2

        if isinstance(obj, UUID):
            return obj.bytes # Bug 3

        return super().default(obj)

# Test data
data = {
    "id": UUID("550e8400-e29b-41d4-a716-446655440000"),
    "created_at": datetime(2024, 3, 15, 14, 30, 0),
    "price": Decimal("0.10"),
    "balance": Decimal("0.20"),
}

result = json.loads(json.dumps(data, cls=MyEncoder))
# What is wrong with result["price"] + result["balance"]?

Answer

Bug 1 - strftime("%Y-%m-%d") drops the time component:

strftime("%Y-%m-%d") produces "2024-03-15" - a date string, not a datetime string. For a datetime object, you lose hours, minutes, seconds, and timezone. The fix:

# Correct: ISO 8601 with time component
if isinstance(obj, datetime):
    return obj.isoformat()
# Produces "2024-03-15T14:30:00" or "2024-03-15T14:30:00+00:00" if timezone-aware

Bug 2 - float(Decimal) loses precision:

float(Decimal("0.10")) # 0.1
float(Decimal("0.20")) # 0.2
float(Decimal("0.10")) + float(Decimal("0.20"))
# 0.30000000000000004 - IEEE 754 cannot represent 0.1 or 0.2 exactly

# Once Bug 3 is fixed (so json.dumps succeeds),
# result["price"] + result["balance"] == 0.30000000000000004, NOT 0.30

The fix: serialize Decimal to str. The receiving side must parse it as Decimal, not float:

if isinstance(obj, Decimal):
    return str(obj) # "0.10" - exact, reconstructable

Bug 3 - UUID.bytes is bytes, which is also not serializable:

uuid_obj.bytes returns the 16-byte raw binary representation of the UUID - not a string. bytes is not in the stdlib JSON type table, so json.dumps will call default() again on the bytes object, hit the super().default(obj) fallback, and raise TypeError.

The fix: serialize UUID as its canonical string form:

if isinstance(obj, UUID):
    return str(obj) # "550e8400-e29b-41d4-a716-446655440000"

Fixed encoder:

class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat() # Full ISO 8601 with time
        if isinstance(obj, Decimal):
            return str(obj) # Exact decimal string
        if isinstance(obj, UUID):
            return str(obj) # Canonical UUID string
        return super().default(obj)

Level 3 - Design a Serialization Layer

You are building the serialization layer for a financial API that serves:

  1. A React frontend that calls parseFloat() on number fields (it cannot be changed)
  2. A Python microservice that uses Decimal for all arithmetic
  3. A third-party audit service that expects ISO 8601 UTC datetimes with explicit Z suffix (e.g., "2024-03-15T12:00:00Z")
  4. A data export endpoint that returns 500,000 transaction records

Design requirements:

  • The React frontend must receive amounts as float (the team insists, despite the precision risk)
  • The Python microservice must receive amounts as strings so it can reconstruct Decimal
  • The audit service requires "2024-03-15T12:00:00Z" format, not +00:00 offset
  • The export endpoint must stream - not load 500K records into memory

Design the complete solution including: how to produce different formats for different consumers, the datetime format difference, and the streaming architecture.

Answer

The core insight: you need per-consumer serialization, not a single universal encoder. The design uses content negotiation and separate response models.

1 - Datetime format: +00:00 vs Z

Python's datetime.isoformat() produces +00:00 for UTC. The audit service requires Z. A wrapper:

from datetime import datetime, timezone

def to_utc_z(dt: datetime) -> str:
    """Convert datetime to ISO 8601 with 'Z' suffix (audit service format)."""
    if dt.tzinfo is None:
        raise ValueError("Naive datetime - always use UTC-aware datetimes in production")
    utc = dt.astimezone(timezone.utc)
    # %S has no fractional part, so microseconds are dropped
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")

# vs standard
datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc).isoformat() # "2024-03-15T12:00:00+00:00"
to_utc_z(datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)) # "2024-03-15T12:00:00Z"
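to_utc_z formats with %S, so any fractional seconds are dropped. The audit service may accept fractional seconds; that is an assumption to confirm against its spec. If it does, a variant can keep them by replacing the "+00:00" suffix with "Z". to_utc_z_precise is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def to_utc_z_precise(dt: datetime) -> str:
    """ISO 8601 UTC with 'Z' suffix, keeping fractional seconds if present."""
    if dt.tzinfo is None:
        raise ValueError("Naive datetime - attach a timezone first")
    # After astimezone(timezone.utc), isoformat() always ends in "+00:00"
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


print(to_utc_z_precise(datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)))
# 2024-03-15T12:00:00Z
print(to_utc_z_precise(datetime(2024, 3, 15, 12, 0, 0, 123456, tzinfo=timezone.utc)))
# 2024-03-15T12:00:00.123456Z
print(to_utc_z_precise(datetime(2024, 3, 15, 14, 0, tzinfo=timezone(timedelta(hours=2)))))
# 2024-03-15T12:00:00Z
```

Python consumers should note that datetime.fromisoformat() accepts the Z suffix only from Python 3.11 onward.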

2 - Per-consumer Pydantic response models

from pydantic import BaseModel, field_serializer
from datetime import datetime
from decimal import Decimal
from uuid import UUID

# For React frontend: amounts as float (acknowledged precision risk)
class TransactionForFrontend(BaseModel):
    id: UUID
    amount: float # float accepted by the team
    timestamp: datetime

    model_config = {"from_attributes": True}

    @field_serializer("timestamp")
    def serialize_timestamp(self, dt: datetime) -> str:
        return dt.isoformat() # "+00:00" format is fine for React

# For Python microservice: amounts as string Decimal
class TransactionForMicroservice(BaseModel):
    id: UUID
    amount: Decimal # preserved as string in JSON

    model_config = {"from_attributes": True}

# For audit service: Z-suffix datetimes
class TransactionForAudit(BaseModel):
    id: UUID
    amount: Decimal
    timestamp: datetime

    model_config = {"from_attributes": True}

    @field_serializer("timestamp")
    def serialize_timestamp(self, dt: datetime) -> str:
        return to_utc_z(dt) # "Z" suffix format

3 - Routing with Accept header content negotiation

from typing import Optional
from uuid import UUID

import orjson
from fastapi import FastAPI, Header
from fastapi.responses import Response, StreamingResponse

app = FastAPI()

@app.get("/transactions/{txn_id}")
async def get_transaction(
    txn_id: UUID,
    accept: Optional[str] = Header(default="application/json"),
):
    # fetch_transaction is an application-specific data-access helper (not shown)
    txn = await fetch_transaction(txn_id) # returns SQLAlchemy model

    # Route to per-consumer response model based on Accept header
    if accept and "application/vnd.audit+json" in accept:
        model = TransactionForAudit.model_validate(txn)
    elif accept and "application/vnd.microservice+json" in accept:
        model = TransactionForMicroservice.model_validate(txn)
    else:
        # Default: frontend format
        model = TransactionForFrontend.model_validate(txn)

    return Response(
        content=model.model_dump_json(),
        media_type="application/json",
    )

4 - Streaming 500K records with NDJSON

from fastapi import Depends
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

# Transaction (ORM model) and get_db (session dependency) are app-specific, not shown
async def stream_transactions(session: AsyncSession, limit: int = 500_000):
    """Yield transactions as NDJSON - one record per line, never all in memory."""
    stmt = select(Transaction).limit(limit).execution_options(stream_results=True)

    result = await session.stream(stmt)
    async for txn in result.scalars():
        record = {
            "id": str(txn.id),
            "amount": str(txn.amount), # Decimal as string
            "timestamp": to_utc_z(txn.timestamp),
        }
        yield orjson.dumps(record) + b"\n"

# Declare this route BEFORE /transactions/{txn_id}: routes match in order, so
# otherwise "export" is captured as txn_id and rejected as an invalid UUID (422)
@app.get("/transactions/export")
async def export_transactions(session: AsyncSession = Depends(get_db)):
    return StreamingResponse(
        stream_transactions(session),
        media_type="application/x-ndjson",
        headers={
            "Content-Disposition": "attachment; filename=transactions.ndjson",
            "X-Record-Format": "ndjson",
        },
    )

Key design decisions:

  • Per-consumer models over a universal encoder: different consumers have genuinely different requirements. A single encoder cannot satisfy "float for React, string for Python, Z-suffix for audit" simultaneously. Explicit models make the contract visible and testable.
  • model_dump_json() over json.dumps(model.model_dump()): Pydantic's native JSON serialization path is faster and handles types correctly. Never pass model_dump() (Python objects) to json.dumps without a custom encoder.
  • stream_results=True in SQLAlchemy: tells the database driver to use a server-side cursor, fetching rows in chunks rather than loading the full result set. Without this, 500K rows are fetched into Python memory before the first byte is streamed.
  • orjson.dumps in the stream: faster than stdlib for high-throughput streaming; no encoder configuration needed for the types used.

Key Takeaways

  • stdlib json natively handles only dict, list/tuple, str, int, float, bool, and None (plus subclasses of these). Everything else needs a custom encoder or a different library.
  • json.JSONEncoder.default() is the correct extension point - subclass it for production, use default=str only for debugging.
  • Never serialize Decimal as float: IEEE 754 floating-point cannot represent 0.1 or 0.2 exactly. Serialize financial values to str and document the field format in your OpenAPI schema.
  • Always use timezone-aware datetime: naive datetimes are ambiguous in APIs. Store UTC, serialize with .isoformat() (produces +00:00) or strftime("%Y-%m-%dT%H:%M:%SZ") (produces Z suffix) depending on consumer requirements.
  • orjson is the production standard for FastAPI: 3–5× faster than stdlib, handles datetime/UUID/dataclass natively (numpy with OPT_SERIALIZE_NUMPY; Decimal still needs default=), and its bytes output is fine for Response(content=...).
  • msgspec adds schema-based validation on decode - use it when you want the speed of orjson plus type-safe deserialization without Pydantic's overhead.
  • model_dump(mode="json") vs model_dump(): only mode="json" produces a dict with JSON-serializable values. Plain model_dump() returns Python objects - datetime is still datetime, Decimal is still Decimal.
  • NDJSON for large responses: never load 500K records into memory to build a JSON array. Stream with StreamingResponse + yield orjson.dumps(record) + b"\n". Use SQLAlchemy's stream_results=True to avoid loading the query result into memory either.
  • Content negotiation is real: set Content-Type: application/json; charset=utf-8 on all JSON responses. For multi-consumer APIs, use Accept header routing to select per-consumer serialization models.
  • Never serialize SQLAlchemy models directly: lazy-loaded relationships trigger N+1 queries at serialization time. Always convert to a Pydantic model or plain dict first.
© 2026 EngineersOfAI. All rights reserved.