JSON Handling - Serialization, Deserialization, and Edge Cases

Reading time: ~18 minutes | Level: Foundation → Engineering

Here is an error that trips up most developers the first time they hit it in production:

import json
from datetime import datetime
from decimal import Decimal

data = {
    "user": "alice",
    "created_at": datetime.now(),
    "balance": Decimal("99.99"),
}

print(json.dumps(data))

Output:

TypeError: Object of type datetime is not JSON serializable

The json module serializes only the Python types that map onto JSON's six types. Everything else - datetime, UUID, Decimal, bytes, custom objects - raises TypeError. Knowing exactly which types fail and exactly how to handle them is the difference between a working REST API and a production incident at 2 AM.

What You Will Learn

The six JSON types and their exact Python equivalents
json.dumps() and json.loads() for string-based serialization
json.dump() and json.load() for file-based serialization
indent, sort_keys, and separators parameters and when to use each
How to handle non-serializable types: datetime, UUID, Decimal, bytes, custom objects
Custom encoders with json.JSONEncoder and the default() method
Custom decoders with object_hook for round-trip fidelity
json.JSONDecodeError: what causes it and how to handle it gracefully
ensure_ascii=False for Unicode-rich data
Performance: when to reach for orjson or ujson

Prerequisites

Python 3.10+ with the json module (standard library - no install needed); several examples use the X | Y union syntax in type hints
Understanding of Python dicts, lists, and basic types
Familiarity with reading and writing files (see lessons 01 and 02 of this module)
Basic understanding of context managers (lesson 03)

Mental Model: JSON Is a Typed Subset of Python

JSON is not Python. It is a language-independent text format with exactly six types:

JSON Type	JSON Example	Python Type
object	`{"key": "value"}`	`dict`
array	`[1, 2, 3]`	`list`
string	`"hello"`	`str`
number	`42` or `3.14`	`int` or `float`
boolean	`true` or `false`	`True` or `False`
null	`null`	`None`

Not in JSON: datetime, UUID, Decimal, bytes, set, tuple (json.dumps() writes a tuple as an array, so it loads back as a list), custom objects, complex, frozenset, ...

This mismatch is the source of every JSON serialization problem. Python's type system is far richer than JSON's. The json module handles the six core mappings automatically. Everything else is your responsibility.
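
The mismatch also shows up for values that serialize without any error. A minimal sketch (the variable names are illustrative): a tuple and an integer dict key both pass through json.dumps(), but neither comes back as the type that went in.

```python
import json

original = {"point": (1, 2), 7: "seven"}

encoded = json.dumps(original)
restored = json.loads(encoded)

print(encoded)    # {"point": [1, 2], "7": "seven"}
print(restored)   # {'point': [1, 2], '7': 'seven'}
```

The tuple became a list and the key 7 became the string "7", because JSON has only arrays and string keys.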

Part 1 - The Four Core Functions

`json.dumps()` - Python Object to JSON String

import json

data = {
    "name": "Alice",
    "age": 30,
    "scores": [95, 87, 92],
    "active": True,
    "profile": None,
}

json_string = json.dumps(data)
print(json_string)
# {"name": "Alice", "age": 30, "scores": [95, 87, 92], "active": true, "profile": null}

print(type(json_string))
# <class 'str'>

Notice the automatic type conversions:

Python True becomes JSON true
Python None becomes JSON null
Python dict becomes JSON object
Python list becomes JSON array

`json.loads()` - JSON String to Python Object

import json

json_string = '{"name": "Alice", "age": 30, "active": true, "profile": null}'

data = json.loads(json_string)
print(data)
# {'name': 'Alice', 'age': 30, 'active': True, 'profile': None}

print(type(data))           # <class 'dict'>
print(type(data["active"])) # <class 'bool'>
print(data["profile"])      # None

The same mappings apply in reverse:

JSON true becomes Python True
JSON false becomes Python False
JSON null becomes Python None

`json.dump()` - Python Object to JSON File

import json

config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "name": "appdb",
    },
    "debug": False,
    "max_connections": 100,
}

with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)

# config.json now contains:
# {
#   "database": {
#     "host": "localhost",
#     "port": 5432,
#     "name": "appdb"
#   },
#   "debug": false,
#   "max_connections": 100
# }

`json.load()` - JSON File to Python Object

import json

with open("config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

print(config["database"]["host"])  # localhost
print(config["debug"])             # False
print(type(config["database"]))    # <class 'dict'>

:::note Always specify encoding
Always open JSON files with encoding="utf-8". RFC 8259 requires UTF-8 for JSON exchanged between systems. If you omit the encoding parameter, Python uses the platform default, which on Windows is often a legacy code page such as cp1252.
:::

Part 2 - Formatting Parameters

`indent` - Human-Readable Output

import json

data = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}

# Compact (default)
compact = json.dumps(data)
print(compact)
# {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}

# Indented - for config files, logging, debugging
readable = json.dumps(data, indent=2)
print(readable)
# {
#   "users": [
#     {
#       "id": 1,
#       "name": "Alice"
#     },
#     {
#       "id": 2,
#       "name": "Bob"
#     }
#   ]
# }

`sort_keys` - Deterministic Output

import json

data = {"zebra": 1, "apple": 2, "mango": 3}

print(json.dumps(data))
# {"zebra": 1, "apple": 2, "mango": 3}  - dict insertion order (Python 3.7+)

print(json.dumps(data, sort_keys=True))
# {"apple": 2, "mango": 3, "zebra": 1}  - alphabetical

:::tip Use sort_keys for reproducible hashing
When you need to hash JSON (e.g., for caching or checksums), use sort_keys=True to ensure the same dict always produces the same JSON string regardless of insertion order.

import hashlib, json

def dict_hash(d: dict) -> str:
    canonical = json.dumps(d, sort_keys=True, separators=(',', ':'))
    return hashlib.sha256(canonical.encode()).hexdigest()

:::
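
To see why the sorting matters, here is a small sketch that hashes two dicts with identical content but different insertion order. The sample dicts are made up for illustration.

```python
import hashlib
import json

def dict_hash(d: dict) -> str:
    canonical = json.dumps(d, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"user": "alice", "page": 2}
b = {"page": 2, "user": "alice"}  # same content, different insertion order

same_hash = dict_hash(a) == dict_hash(b)
same_default_text = json.dumps(a) == json.dumps(b)

print(same_hash)          # True
print(same_default_text)  # False
```

Without sort_keys=True the two serialized strings differ, so their hashes would differ too.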

`separators` - Compact JSON for Network Transmission

import json

data = {"event": "click", "x": 100, "y": 200}

# Default separators include spaces: (', ', ': ')
default = json.dumps(data)
print(f"Default: {len(default)} bytes  →  {default}")
# Default: 38 bytes  →  {"event": "click", "x": 100, "y": 200}

# Compact separators - no extra whitespace
compact = json.dumps(data, separators=(',', ':'))
print(f"Compact: {len(compact)} bytes  →  {compact}")
# Compact: 33 bytes  →  {"event":"click","x":100,"y":200}

Format	Use for
`indent=2`	Config files, responses for human review
`separators=(',', ':')`	Network APIs, high-throughput logging (compact)
Default	General use, debugging

Part 3 - Non-Serializable Types and How to Handle Each

The Problem

import json
from datetime import datetime
from decimal import Decimal
import uuid

# These all raise TypeError:
json.dumps(datetime.now())          # TypeError: datetime not serializable
json.dumps(Decimal("3.14"))         # TypeError: Decimal not serializable
json.dumps(uuid.uuid4())            # TypeError: UUID not serializable
json.dumps(b"raw bytes")            # TypeError: bytes not serializable
json.dumps({1, 2, 3})               # TypeError: set not serializable

Solution 1: Manual Conversion Before Serializing

The simplest approach for one-off cases:

import json
from datetime import datetime
from decimal import Decimal
import uuid

data = {
    "user_id": str(uuid.uuid4()),          # UUID → str
    "created_at": datetime.now().isoformat(),  # datetime → str
    "balance": float(Decimal("99.99")),    # Decimal → float
    "tags": list({"python", "api"}),       # set → list
}

print(json.dumps(data, indent=2))
# {
#   "user_id": "a3f4...",
#   "created_at": "2024-01-15T14:30:00.123456",
#   "balance": 99.99,
#   "tags": ["python", "api"]
# }

:::warning Float precision loss
Converting Decimal("99.99") to float introduces floating-point representation errors. For financial data, serialize as a string instead: str(Decimal("99.99")) → "99.99". Deserialize back with Decimal(data["balance"]).
:::
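
A short sketch of the two ways to keep decimal amounts exact. The first carries the value as a string, as recommended above. The second uses the parse_float parameter of json.loads() when the producer already emits a bare JSON number. The field name "balance" is just an example.

```python
import json
from decimal import Decimal

# Floats cannot represent most decimal fractions exactly
print(0.1 + 0.2 == 0.3)                                    # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True

# Option 1: carry the amount as a string
payload = json.dumps({"balance": str(Decimal("99.99"))})
balance = Decimal(json.loads(payload)["balance"])

# Option 2: parse JSON numbers with a fraction straight into Decimal
parsed = json.loads('{"balance": 99.99}', parse_float=Decimal)

print(balance, parsed["balance"])   # 99.99 99.99
```

Option 2 only helps on the reading side. On the writing side the stdlib encoder still needs a custom hook for Decimal.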

Solution 2: Custom Encoder Class

For systematic handling across your entire application:

import json
from datetime import datetime, date
from decimal import Decimal
import uuid

class EngineeringEncoder(json.JSONEncoder):
    """Production-grade JSON encoder handling common Python types."""

    def default(self, obj):
        # Called for every object the default encoder cannot handle
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, date):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return str(obj)  # Preserve exact representation
        if isinstance(obj, uuid.UUID):
            return str(obj)
        if isinstance(obj, bytes):
            return obj.decode("utf-8")  # Text only; binary data raises UnicodeDecodeError (use base64 instead)
        if isinstance(obj, (set, frozenset)):
            return sorted(obj)  # Sort for deterministic output (elements must be mutually comparable)
        # For any other type, call the parent (raises TypeError)
        return super().default(obj)


# Use with cls= parameter
data = {
    "event_id": uuid.uuid4(),
    "timestamp": datetime.now(),
    "amount": Decimal("1234.56"),
    "raw": b"hello",
    "tags": {"python", "backend"},
}

result = json.dumps(data, cls=EngineeringEncoder, indent=2)
print(result)
# {
#   "event_id": "3f2c8b...",
#   "timestamp": "2024-01-15T14:30:00.123456",
#   "amount": "1234.56",
#   "raw": "hello",
#   "tags": ["backend", "python"]
# }
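
The encoder above decodes bytes as UTF-8, which only works for text. For arbitrary binary data, base64 is the usual alternative. A minimal sketch, assuming the receiver knows the field is base64-encoded; encode_binary is a hypothetical helper, not part of the json module.

```python
import base64
import json

def encode_binary(obj):
    """default= hook: represent bytes as base64 text (hypothetical helper)."""
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(obj).decode("ascii")
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

raw = b"\xff\x00\x10binary"   # not valid UTF-8, so .decode("utf-8") would fail

payload = json.dumps({"blob": raw}, default=encode_binary)
restored = base64.b64decode(json.loads(payload)["blob"])

print(restored == raw)   # True
```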

Solution 3: `default` Function Parameter

For lightweight one-off needs without a full class:

import json
from datetime import datetime
from decimal import Decimal

def encode_extended(obj):
    if isinstance(obj, datetime):
        return {"__type__": "datetime", "value": obj.isoformat()}
    if isinstance(obj, Decimal):
        return {"__type__": "decimal", "value": str(obj)}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

data = {
    "created": datetime(2024, 1, 15, 14, 30),
    "price": Decimal("29.99"),
}

print(json.dumps(data, default=encode_extended, indent=2))
# {
#   "created": {"__type__": "datetime", "value": "2024-01-15T14:30:00"},
#   "price": {"__type__": "decimal", "value": "29.99"}
# }

Part 4 - Custom Decoders with `object_hook`

object_hook is called on every JSON object (dict) after parsing. Use it to restore original Python types - achieving true round-trip serialization.

import json
from datetime import datetime
from decimal import Decimal

def decode_extended(obj):
    """Restore special types encoded with __type__ markers."""
    if "__type__" not in obj:
        return obj  # Regular dict - return as-is

    type_name = obj["__type__"]
    value = obj["value"]

    if type_name == "datetime":
        return datetime.fromisoformat(value)
    if type_name == "decimal":
        return Decimal(value)

    return obj  # Unknown type - return dict unchanged

# Round-trip example
original = {
    "event": "purchase",
    "timestamp": datetime(2024, 1, 15, 14, 30),
    "amount": Decimal("99.99"),
}

# Encode
json_str = json.dumps(original, default=encode_extended)

# Decode - restores original Python types
restored = json.loads(json_str, object_hook=decode_extended)

print(restored["timestamp"])         # 2024-01-15 14:30:00
print(type(restored["timestamp"]))   # <class 'datetime.datetime'>
print(restored["amount"])            # 99.99
print(type(restored["amount"]))      # <class 'decimal.Decimal'>
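
One detail worth knowing when writing hooks: object_hook runs as each JSON object is completed, so nested objects reach the hook before the object that contains them. The sketch below records the call order; spy and calls are illustrative names.

```python
import json

calls = []

def spy(obj: dict) -> dict:
    """Record the keys of each JSON object as the parser finishes it."""
    calls.append(sorted(obj))
    return obj

json.loads('{"outer": {"inner": {"leaf": 1}}}', object_hook=spy)

print(calls)   # [['leaf'], ['inner'], ['outer']]
```

This is why a type-marker hook can rebuild a container whose fields are themselves marked values: the inner values are already restored by the time the outer dict arrives.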

Part 5 - Error Handling

`json.JSONDecodeError`

json.loads() raises json.JSONDecodeError (a subclass of ValueError) when the input is not valid JSON:

import json

def safe_parse(text: str) -> dict | None:
    """Parse JSON with graceful error handling."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        print(f"JSON parse error at line {e.lineno}, col {e.colno}: {e.msg}")
        print(f"Problem text: {e.doc[max(0, e.pos-20):e.pos+20]!r}")
        return None

# Common causes of JSONDecodeError:
safe_parse("{'key': 'value'}")   # Single quotes - not valid JSON
# JSON parse error at line 1, col 2: Expecting property name enclosed in double quotes

safe_parse('{"key": undefined}') # undefined is JavaScript, not JSON
# JSON parse error at line 1, col 9: Expecting value

safe_parse('{"key": "value",}')  # Trailing comma - not allowed in JSON
# JSON parse error at line 1, col 17: Expecting property name enclosed in double quotes
# (Python 3.13+ reports an illegal trailing comma and points at the comma instead)

safe_parse("")                    # Empty string
# JSON parse error at line 1, col 1: Expecting value
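
Because JSONDecodeError subclasses ValueError, existing except ValueError handlers catch it too. The sketch below uses a made-up two-line input to show how lineno, colno, and pos relate.

```python
import json

bad = '{"a": 1,\n "b": }'   # value missing on the second line

try:
    json.loads(bad)
except ValueError as e:      # JSONDecodeError is a ValueError subclass
    error = e

print(type(error).__name__)        # JSONDecodeError
print(error.msg)                   # Expecting value
print(error.lineno, error.colno)   # 2 7
print(error.pos)                   # 15
```

lineno and colno are 1-based and meant for humans; pos is a 0-based index into the original string, which is what the slicing in safe_parse() relies on.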

Defensive Parsing Pattern

import json
import logging

logger = logging.getLogger(__name__)

def parse_api_response(response_text: str, request_id: str) -> dict:
    """
    Parse an API response body, always returning a usable dict.
    Logs errors with context for debugging production issues.
    """
    if not response_text or not response_text.strip():
        logger.warning("Empty response body for request %s", request_id)
        return {"error": "empty_response"}

    try:
        return json.loads(response_text)
    except json.JSONDecodeError as e:
        logger.error(
            "Failed to parse JSON for request %s: %s (pos=%d)",
            request_id, e.msg, e.pos,
        )
        # Log a snippet for debugging (avoid logging full response in case it contains PII)
        snippet = response_text[:200]
        logger.debug("Response snippet: %r", snippet)
        return {"error": "json_parse_error", "detail": e.msg}
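
One gap remains in the pattern above: json.loads() succeeds on any valid JSON value, so a body like [1, 2, 3] comes back as a list rather than a dict. A minimal sketch of a top-level type check follows; parse_json_object is a hypothetical helper name.

```python
import json

def parse_json_object(text: str) -> dict:
    """Parse text and insist that the top-level JSON value is an object."""
    value = json.loads(text)
    if not isinstance(value, dict):
        raise ValueError(f"expected a JSON object, got {type(value).__name__}")
    return value

print(parse_json_object('{"ok": true}'))   # {'ok': True}

try:
    parse_json_object("[1, 2, 3]")         # valid JSON, but a list
except ValueError as e:
    message = str(e)
    print(message)                         # expected a JSON object, got list
```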

Part 6 - `ensure_ascii` for Unicode Data

By default, json.dumps() escapes all non-ASCII characters:

import json

data = {
    "message": "こんにちは",  # Japanese: "Hello"
    "currency": "€100",
    "emoji": "✓",
}

# Default: everything escaped to ASCII-safe sequences
print(json.dumps(data))
# {"message": "\u3053\u3093\u306b\u3061\u306f", "currency": "\u20ac100", "emoji": "\u2713"}

# ensure_ascii=False: write Unicode characters directly
print(json.dumps(data, ensure_ascii=False))
# {"message": "こんにちは", "currency": "€100", "emoji": "✓"}

:::tip Use ensure_ascii=False for modern APIs
Both outputs are valid JSON - any compliant parser handles both. But ensure_ascii=False produces smaller output and is human-readable. Use it whenever you're working with multilingual data and writing to UTF-8 files or HTTP responses with Content-Type: application/json; charset=utf-8.
:::

# Correct pattern for writing international JSON to file
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
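
To put a number on the size difference, this sketch encodes the same small document both ways and compares the UTF-8 byte counts. The sample value is illustrative.

```python
import json

data = {"message": "こんにちは"}

escaped = json.dumps(data).encode("utf-8")
direct = json.dumps(data, ensure_ascii=False).encode("utf-8")

print(len(escaped), len(direct))                    # 45 30
print(json.loads(escaped) == json.loads(direct))    # True
```

Each escaped character costs six bytes (\uXXXX), while these characters take three bytes each in UTF-8. Both forms parse to the same value.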

Part 7 - Serializing Custom Objects

Approach 1: `__dict__` Serialization

For simple objects, dump the __dict__ attribute:

import json

class User:
    def __init__(self, user_id, name, email):
        self.user_id = user_id
        self.name = name
        self.email = email

user = User(42, "Alice", "alice@example.com")

# Serialize via __dict__
print(json.dumps(user.__dict__))
# {"user_id": 42, "name": "Alice", "email": "alice@example.com"}

Approach 2: `to_dict()` Method

Add explicit serialization control to your class:

import json
from datetime import datetime

class Event:
    def __init__(self, name, occurred_at, severity):
        self.name = name
        self.occurred_at = occurred_at  # datetime
        self.severity = severity

    def to_dict(self) -> dict:
        return {
            "name": self.name,
            "occurred_at": self.occurred_at.isoformat(),
            "severity": self.severity,
        }

    @classmethod
    def from_dict(cls, data: dict) -> "Event":
        return cls(
            name=data["name"],
            occurred_at=datetime.fromisoformat(data["occurred_at"]),
            severity=data["severity"],
        )

event = Event("deploy", datetime.now(), "info")

# Serialize
json_str = json.dumps(event.to_dict())

# Deserialize - fully restores the object
restored = Event.from_dict(json.loads(json_str))
print(restored.name)            # deploy
print(type(restored.occurred_at))  # <class 'datetime.datetime'>

Approach 3: Encoder with `isinstance` Dispatch

The cleanest production pattern for systems with many custom types:

import json
from datetime import datetime
from decimal import Decimal
import uuid
from dataclasses import dataclass, asdict

@dataclass
class Product:
    product_id: uuid.UUID
    name: str
    price: Decimal
    created_at: datetime

class AppEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, uuid.UUID):
            return str(obj)
        if isinstance(obj, Decimal):
            return str(obj)
        if isinstance(obj, datetime):
            return obj.isoformat()
        # Dataclasses: convert to dict first, then individual fields encode recursively
        if hasattr(obj, "__dataclass_fields__"):
            return asdict(obj)
        return super().default(obj)

product = Product(
    product_id=uuid.uuid4(),
    name="Widget Pro",
    price=Decimal("49.99"),
    created_at=datetime.now(),
)

print(json.dumps(product, cls=AppEncoder, indent=2))
# {
#   "product_id": "b4c2...",
#   "name": "Widget Pro",
#   "price": "49.99",
#   "created_at": "2024-01-15T14:30:00.123456"
# }

Part 8 - Performance: When the Standard Library Is Not Fast Enough

The standard json module is implemented in C (via _json), but third-party libraries go much further:

Library	Speed vs stdlib	Cross-lang	Custom types	Install
`json` (stdlib)	1x (baseline)	Yes	Manual	Built-in
`orjson`	~2x–10x (workload-dependent)	Yes	Automatic*	`pip install orjson`
`ujson`	2x–5x	Yes	Limited	`pip install ujson`
`msgpack`	Fast + binary	Yes	Manual	`pip install msgpack`

* orjson natively handles datetime, UUID, and dataclasses; numpy arrays require option=orjson.OPT_SERIALIZE_NUMPY.

`orjson` - The Production Standard for High Throughput

import orjson
from datetime import datetime
from decimal import Decimal
import uuid

data = {
    "event_id": uuid.uuid4(),
    "timestamp": datetime.now(),
    "value": 42,
}

# orjson.dumps returns bytes (not str) - faster for network I/O
json_bytes = orjson.dumps(data)
print(json_bytes)
# b'{"event_id":"b4c2...","timestamp":"2024-01-15T14:30:00.123456","value":42}'

# orjson handles datetime and UUID natively - no custom encoder needed!

# Deserialize
restored = orjson.loads(json_bytes)
print(restored["value"])  # 42

# orjson does NOT restore datetime objects on load - they stay as strings
# This is the same behavior as stdlib json
print(type(restored["timestamp"]))  # <class 'str'>

When to Use Each Library

# stdlib json - default choice; zero dependencies
import json
data = json.dumps(payload)

# orjson - high-throughput APIs, event streaming, ML feature stores
# > 10,000 serializations/second, native datetime/UUID/numpy support
import orjson
data = orjson.dumps(payload)  # Returns bytes

# ujson - drop-in replacement for stdlib, moderate speedup
import ujson
data = ujson.dumps(payload)   # Returns str like stdlib

:::warning orjson returns bytes
orjson.dumps() returns bytes, not str. When writing to a file opened in text mode, you must decode first: f.write(orjson.dumps(data).decode()). Or open the file in binary mode: open("file.json", "wb").
:::

Part 9 - Real-World Patterns

Pattern 1: REST API Response Parsing

import json
import urllib.request
from datetime import datetime

def fetch_github_user(username: str) -> dict:
    """Fetch GitHub user data from the public API."""
    url = f"https://api.github.com/users/{username}"

    with urllib.request.urlopen(url) as response:
        raw = response.read().decode("utf-8")

    data = json.loads(raw)

    # Extract only what we need; convert types
    return {
        "login": data["login"],
        "id": data["id"],
        "repos": data["public_repos"],
        # GitHub returns ISO 8601 strings - parse to datetime
        "created": datetime.fromisoformat(data["created_at"].replace("Z", "+00:00")),
        "bio": data.get("bio"),  # May be null → None
    }

# user = fetch_github_user("gvanrossum")
# print(user["created"])  # 2011-01-25 18:44:36+00:00

Pattern 2: Append-Only JSON Log (JSONL Format)

JSON Lines (.jsonl) - one JSON object per line - is the standard format for structured logs and ML training data:

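import json
from datetime import datetime, timezone

def log_event(filepath: str, event_type: str, data: dict) -> None:
    """Append a structured event to a JSON Lines log file."""
    record = {
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
        "ts": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),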
        "event": event_type,
        **data,
    }
    with open(filepath, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, separators=(',', ':')) + "\n")

def read_log(filepath: str):
    """Read all events from a JSON Lines log file."""
    with open(filepath, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Usage
log_event("events.jsonl", "user_login", {"user_id": 42, "ip": "10.0.0.1"})
log_event("events.jsonl", "purchase", {"user_id": 42, "amount": 99.99})

for event in read_log("events.jsonl"):
    print(event["event"], event["ts"])
# user_login  2024-01-15T14:30:00.000000Z
# purchase    2024-01-15T14:30:01.234567Z

Pattern 3: Config File with Defaults

import copy
import json
from pathlib import Path

DEFAULT_CONFIG = {
    "database": {"host": "localhost", "port": 5432},
    "debug": False,
    "log_level": "INFO",
}

def load_config(config_path: str | Path) -> dict:
    """
    Load a JSON config file, falling back to defaults for missing keys.
    Raises ValueError if the file exists but is not valid JSON.
    """
    path = Path(config_path)

    if not path.exists():
        # Deep copy so callers cannot mutate the nested defaults
        return copy.deepcopy(DEFAULT_CONFIG)

    with path.open("r", encoding="utf-8") as f:
        try:
            user_config = json.load(f)
        except json.JSONDecodeError as e:
            raise ValueError(f"Config file {path} is not valid JSON: {e}") from e

    # One-level merge: user config overrides defaults; nested dicts are merged key by key
    config = copy.deepcopy(DEFAULT_CONFIG)
    for key, value in user_config.items():
        if isinstance(value, dict) and key in config and isinstance(config[key], dict):
            config[key] = {**config[key], **value}
        else:
            config[key] = value

    return config
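
The merge in load_config() handles one level of nesting. If a config nests deeper than that, a recursive merge is one option. This is a sketch under that assumption; deep_merge and the sample keys are hypothetical.

```python
import copy

def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Return a new dict: overrides win, nested dicts are merged recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

defaults = {"database": {"host": "localhost", "pool": {"min": 1, "max": 10}}, "debug": False}
overrides = {"database": {"pool": {"max": 50}}, "debug": True}

config = deep_merge(defaults, overrides)
print(config)
# {'database': {'host': 'localhost', 'pool': {'min': 1, 'max': 50}}, 'debug': True}
```

Neither input is modified, so the defaults can be reused safely across calls.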

Pattern 4: Feature Store Serialization (ML Context)

import json
import numpy as np
from datetime import datetime, timezone

class FeatureStoreEncoder(json.JSONEncoder):
    """Encoder for ML feature data including numpy types."""

    def default(self, obj):
        # numpy scalars
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        # numpy arrays - convert to nested lists
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)

# Simulated feature vector
features = {
    "user_id": np.int64(12345),
    "embedding": np.array([0.1, 0.2, 0.3, 0.4]),
    "click_rate": np.float32(0.045),
    "computed_at": datetime.now(timezone.utc),  # utcnow() is deprecated since Python 3.12
}

json_str = json.dumps(features, cls=FeatureStoreEncoder)
print(json_str)
# {"user_id": 12345, "embedding": [0.1, 0.2, 0.3, 0.4], "click_rate": 0.04500000178813934, "computed_at": "2024-01-15T..."}

Interview Questions

Q1: What are the six JSON types, and what do they map to in Python?

Answer: JSON has exactly six types:

object maps to Python dict
array maps to Python list
string maps to Python str
number maps to Python int (if it has no fraction or exponent) or float (if it has a decimal point or an exponent, e.g. 3.14 or 1e3)
true/false map to Python True/False
null maps to Python None

Everything else in Python - datetime, UUID, Decimal, bytes, set, custom objects - must be explicitly converted before JSON serialization.
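
The number mapping is easy to check directly. A small sketch with arbitrary sample values:

```python
import json

print(type(json.loads("42")))      # <class 'int'>
print(type(json.loads("42.0")))    # <class 'float'>
print(type(json.loads("1e3")))     # <class 'float'>
print(json.loads("1e3"))           # 1000.0

# Python ints are arbitrary precision, so large integers survive parsing
big = json.loads("123456789012345678901234567890")
print(big == 123456789012345678901234567890)   # True
```

Other consumers (JavaScript in particular) may read every number as a double, so large integers are often sent as strings when they must stay exact.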

Q2: What is the difference between `json.dumps()` and `json.dump()`?

Answer: json.dumps() serializes a Python object to a string (the s stands for "string"). json.dump() serializes to a file-like object - any object with a .write() method. Both accept the same keyword arguments (indent, sort_keys, cls, default, etc.). Use dumps() when you need the JSON as a string in memory (e.g., for an HTTP response body, for hashing). Use dump() when writing directly to a file to avoid holding the entire string in memory.
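
A quick way to demonstrate the relationship in an interview is to point dump() at an in-memory text buffer. This sketch uses io.StringIO as the file-like object; the sample data is made up.

```python
import io
import json

data = {"name": "Alice", "scores": [95, 87, 92]}

buffer = io.StringIO()               # any object with .write() works
json.dump(data, buffer, indent=2)

same = buffer.getvalue() == json.dumps(data, indent=2)
print(same)   # True
```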

Q3: How do you serialize a `datetime` object to JSON? How do you deserialize it back?

Answer: datetime is not JSON-serializable by default. There are two main approaches:

Simple (no round-trip guarantee): datetime.now().isoformat() produces a string like "2024-01-15T14:30:00". Deserialize with datetime.fromisoformat(s).
Round-trip with type markers:

# Encode
def encode(obj):
    if isinstance(obj, datetime):
        return {"__type__": "datetime", "value": obj.isoformat()}
    raise TypeError

# Decode
def decode(obj):
    if obj.get("__type__") == "datetime":
        return datetime.fromisoformat(obj["value"])
    return obj

json.dumps(data, default=encode)
json.loads(json_str, object_hook=decode)

Use object_hook to restore the Python type during deserialization.

Q4: You need to hash a dict to use as a cache key. How do you do it correctly with JSON?

Answer: Use json.dumps(d, sort_keys=True, separators=(',', ':')) to get a canonical representation. Dicts preserve insertion order in Python 3.7+, so two dicts with the same content built in a different order serialize to different strings unless sort_keys=True is set. separators=(',', ':') removes the optional whitespace, which keeps the canonical form compact and independent of formatting defaults.

import json, hashlib

def cache_key(params: dict) -> str:
    canonical = json.dumps(params, sort_keys=True, separators=(',', ':'))
    return hashlib.sha256(canonical.encode()).hexdigest()

Q5: What is `object_hook` in `json.loads()` and when would you use it?

Answer: object_hook is a callable that is called for every JSON object (dict) parsed. The return value replaces the default dict. It enables custom deserialization - turning type-annotated dicts back into proper Python objects.

Use it when you control both the encoder and decoder and want true round-trip fidelity. For example, if you encode datetime as {"__type__": "datetime", "value": "..."}, your object_hook checks for "__type__" and reconstructs the datetime. Without object_hook, you would need to walk the deserialized dict manually.

Q6: When should you use `orjson` instead of the standard `json` module?

Answer: Use orjson when:

You are serializing more than ~10,000 JSON objects per second (high-throughput APIs, event streams, ML inference servers)
Your data contains datetime, UUID, or numpy arrays - orjson handles datetime and UUID natively, and numpy arrays with option=orjson.OPT_SERIALIZE_NUMPY, without a custom encoder
You are writing JSON to network sockets where bytes output is more efficient than str

orjson is substantially faster than stdlib json (its own benchmarks report roughly 10x for dumps and 2x for loads, depending on the data), in part because it is implemented in Rust. The main difference is that orjson.dumps() returns bytes, not str. This is fine for file I/O in binary mode or HTTP response bodies, but requires .decode() if you need a string.

Practice Challenges

Beginner: Build a Simple Config File Manager

Write a module that loads a JSON config file on startup and saves updates back to disk.

Requirements:

load_config(path) - load from file, return dict; create file with defaults if it doesn't exist
save_config(path, config) - save dict to file with indent=2
get(path, key, default=None) - get a value from config
set_value(path, key, value) - update a value and immediately persist (named set_value to avoid shadowing the built-in set)

Solution

import json
from pathlib import Path

DEFAULTS = {
    "theme": "dark",
    "language": "en",
    "notifications": True,
    "max_retries": 3,
}

def load_config(path: str | Path) -> dict:
    """Load config from JSON file, creating it with defaults if absent."""
    path = Path(path)

    if not path.exists():
        config = DEFAULTS.copy()
        save_config(path, config)
        return config

    with path.open("r", encoding="utf-8") as f:
        try:
            return json.load(f)
        except json.JSONDecodeError as e:
            print(f"Warning: config file corrupted ({e}), using defaults")
            return DEFAULTS.copy()

def save_config(path: str | Path, config: dict) -> None:
    """Save config dict to JSON file with readable formatting."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)

    with path.open("w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, sort_keys=True)
        f.write("\n")  # Trailing newline - POSIX convention

def get(path: str | Path, key: str, default=None):
    """Get a single value from the config file."""
    config = load_config(path)
    return config.get(key, default)

def set_value(path: str | Path, key: str, value) -> None:
    """Update a single config value and persist immediately."""
    config = load_config(path)
    config[key] = value
    save_config(path, config)


# Demo
config_path = "/tmp/demo_config.json"

# First load creates the file with defaults
config = load_config(config_path)
print(config)
# {'theme': 'dark', 'language': 'en', 'notifications': True, 'max_retries': 3}

# Update a value
set_value(config_path, "theme", "light")
set_value(config_path, "max_retries", 5)

# Read back
print(get(config_path, "theme"))       # light
print(get(config_path, "max_retries")) # 5
print(get(config_path, "missing", 42)) # 42 (default)

# Verify file contents
with open(config_path, encoding="utf-8") as f:
    print(f.read())
# {
#   "language": "en",
#   "max_retries": 5,
#   "notifications": true,
#   "theme": "light"
# }

Intermediate: Full Round-Trip Serializer for Custom Types

Build a SmartJSON class that handles datetime, Decimal, UUID, set, and dataclasses - with full round-trip fidelity (deserializing restores original Python types).

Solution

import json
from datetime import datetime
from decimal import Decimal
import uuid
from dataclasses import dataclass, asdict

# Type marker key
TYPE_KEY = "__python_type__"

class SmartEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return {TYPE_KEY: "datetime", "v": obj.isoformat()}
        if isinstance(obj, Decimal):
            return {TYPE_KEY: "decimal", "v": str(obj)}
        if isinstance(obj, uuid.UUID):
            return {TYPE_KEY: "uuid", "v": str(obj)}
        if isinstance(obj, (set, frozenset)):
            return {TYPE_KEY: "set", "v": sorted(obj, key=repr)}  # Keep element types; sort by repr for stable output
        if hasattr(obj, "__dataclass_fields__"):
            return {TYPE_KEY: "dataclass", "cls": type(obj).__name__, "v": asdict(obj)}
        return super().default(obj)


def smart_decoder(obj: dict):
    """object_hook that restores Python types from type-annotated dicts."""
    if TYPE_KEY not in obj:
        return obj

    kind = obj[TYPE_KEY]
    val = obj["v"]

    if kind == "datetime":
        return datetime.fromisoformat(val)
    if kind == "decimal":
        return Decimal(val)
    if kind == "uuid":
        return uuid.UUID(val)
    if kind == "set":
        return set(val)
    if kind == "dataclass":
        # Note: restoring to dict since we don't have the class in scope here
        # In production, maintain a registry of dataclass types
        return val

    return obj  # Unknown type - pass through


class SmartJSON:
    """Drop-in replacement for json module with extended type support."""

    @staticmethod
    def dumps(obj, **kwargs) -> str:
        return json.dumps(obj, cls=SmartEncoder, **kwargs)

    @staticmethod
    def loads(s: str, **kwargs):
        return json.loads(s, object_hook=smart_decoder, **kwargs)

    @staticmethod
    def dump(obj, fp, **kwargs) -> None:
        json.dump(obj, fp, cls=SmartEncoder, **kwargs)

    @staticmethod
    def load(fp, **kwargs):
        return json.load(fp, object_hook=smart_decoder, **kwargs)


# Test round-trips
@dataclass
class Order:
    order_id: str
    amount: Decimal
    created: datetime

data = {
    "session_id": uuid.UUID("12345678-1234-5678-1234-567812345678"),
    "timestamp": datetime(2024, 1, 15, 14, 30, 0),
    "price": Decimal("1234.56"),
    "tags": {"python", "backend", "v2"},
    "order": Order("ORD-1", Decimal("49.90"), datetime(2024, 1, 15, 9, 0, 0)),
}

# Encode
encoded = SmartJSON.dumps(data, indent=2)
print(encoded)

# Decode - restores the built-in types; the dataclass comes back as a dict
restored = SmartJSON.loads(encoded)

print(type(restored["session_id"]))   # <class 'uuid.UUID'>
print(type(restored["timestamp"]))    # <class 'datetime.datetime'>
print(type(restored["price"]))        # <class 'decimal.Decimal'>
print(type(restored["tags"]))         # <class 'set'>
print(type(restored["order"]))        # <class 'dict'> (see smart_decoder)

# Verify values survived round-trip exactly
assert restored["price"] == Decimal("1234.56")  # No float precision loss!
assert restored["timestamp"] == datetime(2024, 1, 15, 14, 30, 0)
assert restored["order"]["amount"] == Decimal("49.90")  # Nested types restored too
print("All round-trip assertions passed.")
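The decoder comment above suggests keeping a registry of dataclass types so that dataclasses can be rebuilt instead of coming back as plain dicts. Here is a minimal sketch of that idea. The names `DATACLASS_REGISTRY`, `register`, `encode_extra`, `decode_extra`, and `Invoice`, and the `"__type__"` tag, are hypothetical and chosen for this illustration; they are not part of the solution above.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from decimal import Decimal

# Hypothetical registry: class name -> class. Only registered classes
# can ever be instantiated by the decoder.
DATACLASS_REGISTRY = {}


def register(cls):
    """Class decorator that records a dataclass for later decoding."""
    DATACLASS_REGISTRY[cls.__name__] = cls
    return cls


@register
@dataclass
class Invoice:
    invoice_id: str
    amount: Decimal


def encode_extra(obj):
    """default= hook: tag dataclass instances and Decimals."""
    if is_dataclass(obj) and not isinstance(obj, type):
        # Shallow field dict, so nested values pass through this hook too
        payload = {f.name: getattr(obj, f.name) for f in fields(obj)}
        return {"__type__": "dataclass", "cls": type(obj).__name__, "v": payload}
    if isinstance(obj, Decimal):
        return {"__type__": "decimal", "v": str(obj)}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def decode_extra(obj):
    """object_hook: rebuild tagged values; inner objects are decoded first."""
    kind = obj.get("__type__")
    if kind == "decimal":
        return Decimal(obj["v"])
    if kind == "dataclass":
        cls = DATACLASS_REGISTRY.get(obj["cls"])
        if cls is None:
            return obj["v"]  # Unregistered class: fall back to the plain dict
        return cls(**obj["v"])
    return obj


original = Invoice("INV-1", Decimal("19.99"))
text = json.dumps({"invoice": original}, default=encode_extra)
restored = json.loads(text, object_hook=decode_extra)
print(restored["invoice"] == original)
```

Looking classes up in an explicit registry, rather than importing whatever name appears in the payload, keeps untrusted JSON from choosing which class gets instantiated.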

Advanced: High-Throughput JSONL Pipeline

Build an event processing pipeline that reads a JSONL log file, filters and transforms events, and writes results to a new JSONL file. Handle malformed lines gracefully. Benchmark the stdlib json version against orjson.

Solution

import json
import time
import random
from datetime import datetime, timedelta
from pathlib import Path
from typing import Iterator

# ── Generate sample data ─────────────────────────────────────────────────────

def generate_events(path: str, count: int = 10_000) -> None:
    """Generate a sample JSONL event log."""
    event_types = ["page_view", "click", "purchase", "search", "logout"]

    base_time = datetime(2024, 1, 1)

    with open(path, "w", encoding="utf-8") as f:
        for i in range(count):
            ts = base_time + timedelta(seconds=i * 0.5)
            event = {
                "id": i,
                "type": random.choice(event_types),
                "user_id": random.randint(1, 1000),
                "ts": ts.isoformat() + "Z",
                "value": round(random.uniform(0, 1000), 2),
            }
            f.write(json.dumps(event, separators=(',', ':')) + "\n")

        # Inject some bad lines
        f.write("not json at all\n")
        f.write('{"incomplete": \n')
        f.write("\n")  # Empty line


# ── Pipeline with stdlib json ─────────────────────────────────────────────────

def read_jsonl(path: str) -> Iterator[dict]:
    """Yield parsed events, skipping malformed lines."""
    with open(path, "r", encoding="utf-8") as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                print(f"  Skipping bad line {line_num}: {e.msg}")


def process_events(
    input_path: str,
    output_path: str,
    event_filter: str,
    min_value: float,
) -> int:
    """
    Filter events by type and minimum value, write to new JSONL file.
    Returns count of events written.
    """
    written = 0

    with open(output_path, "w", encoding="utf-8") as out_f:
        for event in read_jsonl(input_path):
            if event.get("type") != event_filter:
                continue
            if event.get("value", 0) < min_value:
                continue

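            # Transform: add processing timestamp (UTC). datetime.utcnow() is
            # deprecated since Python 3.12, so format time.gmtime() instead.
            event["processed_at"] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())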

            out_f.write(json.dumps(event, separators=(',', ':')) + "\n")
            written += 1

    return written


# ── Benchmark ────────────────────────────────────────────────────────────────

def benchmark():
    input_path = "/tmp/events.jsonl"
    output_path = "/tmp/purchases.jsonl"

    print("Generating 10,000 events...")
    generate_events(input_path, 10_000)

    print("\nProcessing with stdlib json:")
    start = time.perf_counter()
    count = process_events(input_path, output_path, "purchase", 100.0)
    elapsed = time.perf_counter() - start
    print(f"  Wrote {count} purchase events in {elapsed:.4f}s")

    # Try orjson if available
    try:
        import orjson

        def process_events_orjson(input_path, output_path, event_filter, min_value):
            written = 0
            with open(input_path, "rb") as in_f, open(output_path, "wb") as out_f:
                for line in in_f:
                    line = line.strip()
                    if not line:
                        continue
                    try:
                        event = orjson.loads(line)
                    except orjson.JSONDecodeError:
                        continue
                    if event.get("type") != event_filter:
                        continue
                    if event.get("value", 0) < min_value:
                        continue
                    # UTC timestamp; datetime.utcnow() is deprecated since Python 3.12
                    event["processed_at"] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
                    out_f.write(orjson.dumps(event) + b"\n")
                    written += 1
            return written

        print("\nProcessing with orjson:")
        start = time.perf_counter()
        count = process_events_orjson(input_path, "/tmp/purchases_orjson.jsonl", "purchase", 100.0)
        elapsed = time.perf_counter() - start
        print(f"  Wrote {count} purchase events in {elapsed:.4f}s")

    except ImportError:
        print("\norjson not installed. Install with: pip install orjson")

    # Verify output (guard against an empty result set)
    events = list(read_jsonl(output_path))
    print("\nVerification: first purchase event:")
    if events:
        print(json.dumps(events[0], indent=2))

benchmark()
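# Example output (counts vary with the random data; timings are illustrative
# and depend on the machine):
# Generating 10,000 events...
# Processing with stdlib json:
#   Skipping bad line 10001: Expecting value
#   Skipping bad line 10002: Expecting value
#   Wrote ~1800 purchase events in 0.0234s
# Processing with orjson:
#   Wrote ~1800 purchase events in 0.0031s  (≈7x faster)
#
# About 1 in 5 events is a purchase (~2000) and about 90% of those have
# value >= 100, hence roughly 1800 rows. The truncated line fails with
# "Expecting value" because parsing stops right after the colon.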
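The pipeline above prints a message for each malformed line and moves on. When rejected lines need to be inspected or replayed later, one option is to collect them alongside the parsed events. The sketch below assumes that approach; `parse_jsonl` and the shape of the reject records are hypothetical names for this illustration, and it reads from an in-memory stream so it runs without touching the filesystem.

```python
import io
import json


def parse_jsonl(stream):
    """Return (events, rejects) for a text stream of JSONL lines."""
    events, rejects = [], []
    for line_num, line in enumerate(stream, 1):
        text = line.strip()
        if not text:
            continue  # Blank lines are skipped, not treated as errors
        try:
            events.append(json.loads(text))
        except json.JSONDecodeError as e:
            # e.msg and e.colno are attributes of json.JSONDecodeError
            rejects.append(
                {"line": line_num, "error": e.msg, "col": e.colno, "raw": text}
            )
    return events, rejects


sample = io.StringIO(
    '{"id": 1}\n'
    "not json at all\n"
    '{"incomplete": \n'
    "\n"
    '{"id": 2}\n'
)
events, rejects = parse_jsonl(sample)

print(f"{len(events)} parsed, {len(rejects)} rejected")
for reject in rejects:
    # Each reject is itself JSON-serializable, so it can go to a JSONL file
    print(json.dumps(reject))
```

Keeping the raw text and the line number in each reject record makes it possible to fix and re-ingest bad lines without rescanning the original file.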

Quick Reference

Operation	Syntax	Notes
Object to JSON string	`json.dumps(obj)`	Returns `str`
JSON string to object	`json.loads(s)`	Returns Python type
Object to JSON file	`json.dump(obj, f)`	`f` must be open for writing
JSON file to object	`json.load(f)`	`f` must be open for reading
Pretty print	`json.dumps(obj, indent=2)`	Indent in spaces
Sorted keys	`json.dumps(obj, sort_keys=True)`	Alphabetical key order
Compact output	`json.dumps(obj, separators=(',', ':'))`	No spaces, smaller payload
Unicode direct	`json.dumps(obj, ensure_ascii=False)`	Write non-ASCII as-is
Custom encoder class	`json.dumps(obj, cls=MyEncoder)`	Subclass `json.JSONEncoder`
Custom encoder function	`json.dumps(obj, default=fn)`	`fn(obj)` must return serializable value
Custom decoder	`json.loads(s, object_hook=fn)`	Called for every JSON object
Handle parse errors	`json.JSONDecodeError`	Subclass of `ValueError`
Python→JSON `True`	`true`	Case-sensitive
Python→JSON `None`	`null`	Case-sensitive
Python→JSON `dict`	`{}` object	Keys are written as strings; `int`, `float`, `bool`, `None` keys are coerced, other key types raise `TypeError`
Python→JSON `tuple`	`[]` array	Tuples become arrays
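The last two rows of the table are easy to get wrong in round-trips, so here is a short illustration using only the standard library. It shows that non-string keys and tuples are accepted on the way out but do not come back as the same Python types.

```python
import json

# int and None keys are coerced to JSON strings on output
encoded = json.dumps({1: "a", None: "b"})
print(encoded)

# After a round-trip the keys are strings, not the original types
print(json.loads(encoded))

# Tuples are written as arrays and come back as lists
point = json.loads(json.dumps({"point": (3, 4)}))
print(point)

# Key types outside str/int/float/bool/None are rejected...
try:
    json.dumps({(1, 2): "tuple key"})
except TypeError as e:
    print(f"TypeError: {e}")

# ...unless skipkeys=True, which silently drops those entries
skipped = json.dumps({(1, 2): "dropped", "kept": 1}, skipkeys=True)
print(skipped)
```

If the original key or container types matter, convert them explicitly after loading rather than relying on the round-trip.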

Key Takeaways

JSON has exactly six types: object, array, string, number, boolean, null - everything else requires explicit handling
Use json.dumps() / json.loads() for string round-trips; use json.dump() / json.load() for file I/O
Always open JSON files with encoding="utf-8" - RFC 8259 requires UTF-8 for JSON exchanged between systems, and Python's default file encoding depends on the platform
indent=2 for human-readable output; separators=(',', ':') for compact network payloads; sort_keys=True for deterministic hashing
Extend json.JSONEncoder and override default() for systematic custom-type handling across your application
Use object_hook in json.loads() to achieve full round-trip fidelity - restoring original Python types on deserialization
For financial data: serialize Decimal as str, not float, to avoid floating-point precision loss
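At high throughput (10k+ ops/sec), reach for orjson - it is typically several times faster than the stdlib (measure on your own payloads), serializes datetime and UUID natively, and handles numpy arrays when you pass option=orjson.OPT_SERIALIZE_NUMPY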
JSONL (one JSON object per line) is the standard format for structured logs, event streams, and ML training datasets
