
JSON Handling - Serialization, Deserialization, and Edge Cases

Reading time: ~18 minutes | Level: Foundation → Engineering

Here is a snippet that trips up most developers the first time they hit it in production:

import json
from datetime import datetime
from decimal import Decimal

data = {
    "user": "alice",
    "created_at": datetime.now(),
    "balance": Decimal("99.99"),
}

print(json.dumps(data))

Output:

TypeError: Object of type datetime is not JSON serializable

The json module natively serializes only the Python types that map onto JSON's six types. Everything else - datetime, UUID, Decimal, bytes, custom objects - raises TypeError. Knowing exactly which types fail and exactly how to handle them is the difference between a working REST API and a production incident at 2 AM.

What You Will Learn

  • The six JSON types and their exact Python equivalents
  • json.dumps() and json.loads() for string-based serialization
  • json.dump() and json.load() for file-based serialization
  • indent, sort_keys, and separators parameters and when to use each
  • How to handle non-serializable types: datetime, UUID, Decimal, bytes, custom objects
  • Custom encoders with json.JSONEncoder and the default() method
  • Custom decoders with object_hook for round-trip fidelity
  • json.JSONDecodeError: what causes it and how to handle it gracefully
  • ensure_ascii=False for Unicode-rich data
  • Performance: when to reach for orjson or ujson

Prerequisites

  • Python 3.10+ with the json module (standard library - no install needed); several examples use the X | Y union syntax added in 3.10
  • Understanding of Python dicts, lists, and basic types
  • Familiarity with reading and writing files (see lessons 01 and 02 of this module)
  • Basic understanding of context managers (lesson 03)

Mental Model: JSON Is a Typed Subset of Python

JSON is not Python. It is a language-independent text format with exactly six types:

| JSON Type | JSON Example | Python Type |
|---|---|---|
| object | {"key": "value"} | dict |
| array | [1, 2, 3] | list |
| string | "hello" | str |
| number | 42 or 3.14 | int or float |
| boolean | true or false | True or False |
| null | null | None |

Not in JSON: datetime, UUID, Decimal, bytes, set, tuple, custom objects, complex, frozenset, ...

This mismatch is the source of every JSON serialization problem. Python's type system is far richer than JSON's. The json module handles the six core mappings automatically. Everything else is your responsibility.

Part 1 - The Four Core Functions

json.dumps() - Python Object to JSON String

import json

data = {
    "name": "Alice",
    "age": 30,
    "scores": [95, 87, 92],
    "active": True,
    "profile": None,
}

json_string = json.dumps(data)
print(json_string)
# {"name": "Alice", "age": 30, "scores": [95, 87, 92], "active": true, "profile": null}

print(type(json_string))
# <class 'str'>

Notice the automatic type conversions:

  • Python True becomes JSON true
  • Python None becomes JSON null
  • Python dict becomes JSON object
  • Python list becomes JSON array

json.loads() - JSON String to Python Object

import json

json_string = '{"name": "Alice", "age": 30, "active": true, "profile": null}'

data = json.loads(json_string)
print(data)
# {'name': 'Alice', 'age': 30, 'active': True, 'profile': None}

print(type(data)) # <class 'dict'>
print(type(data["active"])) # <class 'bool'>
print(data["profile"]) # None

For these core types, the conversions are symmetric:

  • JSON true becomes Python True
  • JSON false becomes Python False
  • JSON null becomes Python None
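The symmetry covers the six core types, but two common Python shapes do not survive a round trip: tuples come back as lists, and non-string dict keys come back as strings. A minimal sketch (the sample values are illustrative, not from any real payload):

```python
import json

original = {"point": (1, 2), 1: "int key"}

# dumps() writes the tuple as a JSON array and the int key as a JSON string
text = json.dumps(original)
print(text)      # {"point": [1, 2], "1": "int key"}

restored = json.loads(text)
print(restored)  # {'point': [1, 2], '1': 'int key'}
print(restored == original)  # False
```

If a caller depends on tuples or integer keys, convert them back explicitly after loading.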

json.dump() - Python Object to JSON File

import json

config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "name": "appdb",
    },
    "debug": False,
    "max_connections": 100,
}

with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)

# config.json now contains:
# {
#   "database": {
#     "host": "localhost",
#     "port": 5432,
#     "name": "appdb"
#   },
#   "debug": false,
#   "max_connections": 100
# }

json.load() - JSON File to Python Object

import json

with open("config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

print(config["database"]["host"]) # localhost
print(config["debug"]) # False
print(type(config["database"])) # <class 'dict'>

:::note Always specify encoding
Always open JSON files with encoding="utf-8". RFC 8259 requires UTF-8 for JSON exchanged between systems. Omitting the encoding parameter uses the platform default, which is often not UTF-8 on Windows.
:::

Part 2 - Formatting Parameters

indent - Human-Readable Output

import json

data = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}

# Compact (default)
compact = json.dumps(data)
print(compact)
# {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}

# Indented - for config files, logging, debugging
readable = json.dumps(data, indent=2)
print(readable)
# {
#   "users": [
#     {
#       "id": 1,
#       "name": "Alice"
#     },
#     {
#       "id": 2,
#       "name": "Bob"
#     }
#   ]
# }

sort_keys - Deterministic Output

import json

data = {"zebra": 1, "apple": 2, "mango": 3}

print(json.dumps(data))
# {"zebra": 1, "apple": 2, "mango": 3} - dict insertion order (Python 3.7+)

print(json.dumps(data, sort_keys=True))
# {"apple": 2, "mango": 3, "zebra": 1} - alphabetical

:::tip Use sort_keys for reproducible hashing
When you need to hash JSON (e.g., for caching or checksums), use sort_keys=True to ensure the same dict always produces the same JSON string regardless of insertion order.

import hashlib, json

def dict_hash(d: dict) -> str:
    canonical = json.dumps(d, sort_keys=True, separators=(',', ':'))
    return hashlib.sha256(canonical.encode()).hexdigest()

:::
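To see why the canonical form matters, the sketch below hashes two dicts that hold the same data in a different insertion order. It repeats the helper from the tip so it runs on its own; the sample dicts are illustrative.

```python
import hashlib
import json

def dict_hash(d: dict) -> str:
    # Sorted keys + fixed separators give one canonical string per dict content
    canonical = json.dumps(d, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"user": "alice", "role": "admin"}
b = {"role": "admin", "user": "alice"}  # same content, different insertion order

print(json.dumps(a) == json.dumps(b))  # False - plain dumps() follows insertion order
print(dict_hash(a) == dict_hash(b))    # True  - canonical form ignores it
```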

separators - Compact JSON for Network Transmission

import json

data = {"event": "click", "x": 100, "y": 200}

# Default separators include spaces: (', ', ': ')
default = json.dumps(data)
print(f"Default: {len(default)} bytes → {default}")
# Default: 38 bytes → {"event": "click", "x": 100, "y": 200}

# Compact separators - no extra whitespace
compact = json.dumps(data, separators=(',', ':'))
print(f"Compact: {len(compact)} bytes → {compact}")
# Compact: 33 bytes → {"event":"click","x":100,"y":200}

| Format | Use for |
|---|---|
| indent=2 | Config files, responses for human review |
| separators=(',', ':') | Network APIs, high-throughput logging (compact) |
| Default | General use, debugging |

Part 3 - Non-Serializable Types and How to Handle Each

The Problem

import json
from datetime import datetime
from decimal import Decimal
import uuid

# These all raise TypeError:
json.dumps(datetime.now()) # TypeError: datetime not serializable
json.dumps(Decimal("3.14")) # TypeError: Decimal not serializable
json.dumps(uuid.uuid4()) # TypeError: UUID not serializable
json.dumps(b"raw bytes") # TypeError: bytes not serializable
json.dumps({1, 2, 3}) # TypeError: set not serializable

Solution 1: Manual Conversion Before Serializing

The simplest approach for one-off cases:

import json
from datetime import datetime
from decimal import Decimal
import uuid

data = {
    "user_id": str(uuid.uuid4()), # UUID → str
    "created_at": datetime.now().isoformat(), # datetime → str
    "balance": float(Decimal("99.99")), # Decimal → float
    "tags": list({"python", "api"}), # set → list
}

print(json.dumps(data, indent=2))
# {
#   "user_id": "a3f4...",
#   "created_at": "2024-01-15T14:30:00.123456",
#   "balance": 99.99,
#   "tags": [
#     "python",
#     "api"
#   ]
# }
# (the order of "tags" is arbitrary because it comes from a set)

:::warning Float precision loss
Converting Decimal("99.99") to float introduces floating-point representation errors. For financial data, serialize as a string instead: str(Decimal("99.99")) → "99.99". Deserialize back with Decimal(data["balance"]).
:::
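The difference is easy to reproduce. The sketch below sends the same price through JSON once as a float and once as a string, then does a small calculation with each result. It also shows the standard parse_float argument of json.loads(), which is one way to read numeric literals as Decimal when you do not control the producer. The "price" key and values are illustrative.

```python
import json
from decimal import Decimal

price = Decimal("0.10")

# Via float: binary floating point cannot represent 0.1 exactly
as_float = json.loads(json.dumps({"price": float(price)}))["price"]
print(as_float * 3 == 0.3)  # False

# Via string: the exact decimal digits survive the round trip
as_text = json.loads(json.dumps({"price": str(price)}))["price"]
restored = Decimal(as_text)
print(restored * 3 == Decimal("0.30"))  # True

# Reading JSON number literals directly as Decimal
parsed = json.loads('{"price": 0.10}', parse_float=Decimal)
print(parsed["price"])  # 0.10
```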

Solution 2: Custom Encoder Class

For systematic handling across your entire application:

import json
from datetime import datetime, date
from decimal import Decimal
import uuid

class EngineeringEncoder(json.JSONEncoder):
    """Production-grade JSON encoder handling common Python types."""

    def default(self, obj):
        # Called for every object the default encoder cannot handle
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, date):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return str(obj) # Preserve exact representation
        if isinstance(obj, uuid.UUID):
            return str(obj)
        if isinstance(obj, bytes):
            return obj.decode("utf-8") # Or use base64 for binary data
        if isinstance(obj, (set, frozenset)):
            return sorted(obj) # Sort for deterministic output
        # For any other type, call the parent (raises TypeError)
        return super().default(obj)


# Use with cls= parameter
data = {
    "event_id": uuid.uuid4(),
    "timestamp": datetime.now(),
    "amount": Decimal("1234.56"),
    "raw": b"hello",
    "tags": {"python", "backend"},
}

result = json.dumps(data, cls=EngineeringEncoder, indent=2)
print(result)
# {
#   "event_id": "3f2c8b...",
#   "timestamp": "2024-01-15T14:30:00.123456",
#   "amount": "1234.56",
#   "raw": "hello",
#   "tags": [
#     "backend",
#     "python"
#   ]
# }
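The encoder above decodes bytes as UTF-8, which only works when the bytes really are text. For arbitrary binary data, base64 is the usual alternative. The sketch below is one way to do it with a default= function; encode_binary is a hypothetical helper name, and the receiver has to know that the field is base64.

```python
import base64
import json

def encode_binary(obj):
    """Hypothetical default= hook: base64-encode bytes instead of decoding them as text."""
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(obj).decode("ascii")
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

payload = {"blob": b"\xff\x00\x10"}  # not valid UTF-8, so .decode("utf-8") would fail

text = json.dumps(payload, default=encode_binary)
print(text)  # {"blob": "/wAQ"}

# The receiving side reverses the encoding explicitly
raw = base64.b64decode(json.loads(text)["blob"])
print(raw == payload["blob"])  # True
```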

Solution 3: default Function Parameter

For lightweight one-off needs without a full class:

import json
from datetime import datetime
from decimal import Decimal

def encode_extended(obj):
    if isinstance(obj, datetime):
        return {"__type__": "datetime", "value": obj.isoformat()}
    if isinstance(obj, Decimal):
        return {"__type__": "decimal", "value": str(obj)}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

data = {
    "created": datetime(2024, 1, 15, 14, 30),
    "price": Decimal("29.99"),
}

print(json.dumps(data, default=encode_extended, indent=2))
# {
#   "created": {
#     "__type__": "datetime",
#     "value": "2024-01-15T14:30:00"
#   },
#   "price": {
#     "__type__": "decimal",
#     "value": "29.99"
#   }
# }

Part 4 - Custom Decoders with object_hook

object_hook is called on every JSON object (dict) after parsing. Use it to restore original Python types - achieving true round-trip serialization.

import json
from datetime import datetime
from decimal import Decimal

def decode_extended(obj):
    """Restore special types encoded with __type__ markers."""
    if "__type__" not in obj:
        return obj # Regular dict - return as-is

    type_name = obj["__type__"]
    value = obj["value"]

    if type_name == "datetime":
        return datetime.fromisoformat(value)
    if type_name == "decimal":
        return Decimal(value)

    return obj # Unknown type - return dict unchanged

# Round-trip example
original = {
    "event": "purchase",
    "timestamp": datetime(2024, 1, 15, 14, 30),
    "amount": Decimal("99.99"),
}

# Encode
json_str = json.dumps(original, default=encode_extended)

# Decode - restores original Python types
restored = json.loads(json_str, object_hook=decode_extended)

print(restored["timestamp"]) # 2024-01-15 14:30:00
print(type(restored["timestamp"])) # <class 'datetime.datetime'>
print(restored["amount"]) # 99.99
print(type(restored["amount"])) # <class 'decimal.Decimal'>
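One detail that makes this pattern work with nested data: object_hook runs on the innermost objects first, so by the time the hook sees an outer dict, its nested values have already been converted. The sketch below records the call order with a throwaway hook; spy and calls are illustrative names.

```python
import json

calls = []

def spy(obj):
    # Record the keys of each JSON object as the parser hands it over
    calls.append(sorted(obj))
    return obj

json.loads('{"outer": {"inner": {"x": 1}}}', object_hook=spy)
print(calls)  # [['x'], ['inner'], ['outer']]
```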

Part 5 - Error Handling

json.JSONDecodeError

json.loads() raises json.JSONDecodeError (a subclass of ValueError) when the input is not valid JSON:

import json

def safe_parse(text: str) -> dict | None:
    """Parse JSON with graceful error handling."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        print(f"JSON parse error at line {e.lineno}, col {e.colno}: {e.msg}")
        print(f"Problem text: {e.doc[max(0, e.pos-20):e.pos+20]!r}")
        return None

# Common causes of JSONDecodeError:
safe_parse("{'key': 'value'}") # Single quotes - not valid JSON
# JSON parse error at line 1, col 2: Expecting property name enclosed in double quotes

safe_parse('{"key": undefined}') # undefined is JavaScript, not JSON
# JSON parse error at line 1, col 9: Expecting value

safe_parse('{"key": "value",}') # Trailing comma - not allowed in JSON
# JSON parse error at line 1, col 17: Expecting property name enclosed in double quotes
# (Python 3.13+ words this differently and points at the comma itself)

safe_parse("") # Empty string
# JSON parse error at line 1, col 1: Expecting value

Defensive Parsing Pattern

import json
import logging

logger = logging.getLogger(__name__)

def parse_api_response(response_text: str, request_id: str) -> dict:
    """
    Parse an API response body, always returning a usable dict.
    Logs errors with context for debugging production issues.
    """
    if not response_text or not response_text.strip():
        logger.warning("Empty response body for request %s", request_id)
        return {"error": "empty_response"}

    try:
        parsed = json.loads(response_text)
    except json.JSONDecodeError as e:
        logger.error(
            "Failed to parse JSON for request %s: %s (pos=%d)",
            request_id, e.msg, e.pos,
        )
        # Log only a snippet: the full response may contain PII
        snippet = response_text[:200]
        logger.debug("Response snippet: %r", snippet)
        return {"error": "json_parse_error", "detail": e.msg}

    # Valid JSON is not necessarily an object: "[1, 2]" and "42" parse without error
    if not isinstance(parsed, dict):
        logger.error("Expected a JSON object for request %s, got %s",
                     request_id, type(parsed).__name__)
        return {"error": "unexpected_json_type"}
    return parsed

Part 6 - ensure_ascii for Unicode Data

By default, json.dumps() escapes all non-ASCII characters:

import json

data = {
    "message": "こんにちは", # Japanese: "Hello"
    "currency": "€100",
    "emoji": "✓",
}

# Default: everything escaped to ASCII-safe sequences
print(json.dumps(data))
# {"message": "\u3053\u3093\u306b\u3061\u306f", "currency": "\u20ac100", "emoji": "\u2713"}

# ensure_ascii=False: write Unicode characters directly
print(json.dumps(data, ensure_ascii=False))
# {"message": "こんにちは", "currency": "€100", "emoji": "✓"}

:::tip Use ensure_ascii=False for modern APIs
Both outputs are valid JSON - any compliant parser handles both. But ensure_ascii=False produces smaller output and is human-readable. Use it whenever you're working with multilingual data and writing to UTF-8 files or HTTP responses with Content-Type: application/json; charset=utf-8.
:::

# Correct pattern for writing international JSON to file
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
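The "smaller output" claim can be checked directly. The sketch below encodes one illustrative string both ways, confirms that a parser sees the same value, and compares the UTF-8 byte length of each form.

```python
import json

data = {"message": "こんにちは"}

escaped = json.dumps(data)                     # \uXXXX escapes, pure ASCII
direct = json.dumps(data, ensure_ascii=False)  # raw Unicode characters

# Both forms parse back to the same Python value
print(json.loads(escaped) == json.loads(direct))  # True

# Each escape costs 6 bytes; the same character is 3 bytes in UTF-8
print(len(escaped.encode("utf-8")) > len(direct.encode("utf-8")))  # True
```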

Part 7 - Serializing Custom Objects

Approach 1: __dict__ Serialization

For simple objects, dump the __dict__ attribute:

import json

class User:
    def __init__(self, user_id, name, email):
        self.user_id = user_id
        self.name = name
        self.email = email

user = User(42, "Alice", "alice@example.com")

# Serialize via __dict__
print(json.dumps(user.__dict__))
# {"user_id": 42, "name": "Alice", "email": "alice@example.com"}

Approach 2: to_dict() Method

Add explicit serialization control to your class:

import json
from datetime import datetime

class Event:
    def __init__(self, name, occurred_at, severity):
        self.name = name
        self.occurred_at = occurred_at # datetime
        self.severity = severity

    def to_dict(self) -> dict:
        return {
            "name": self.name,
            "occurred_at": self.occurred_at.isoformat(),
            "severity": self.severity,
        }

    @classmethod
    def from_dict(cls, data: dict) -> "Event":
        return cls(
            name=data["name"],
            occurred_at=datetime.fromisoformat(data["occurred_at"]),
            severity=data["severity"],
        )

event = Event("deploy", datetime.now(), "info")

# Serialize
json_str = json.dumps(event.to_dict())

# Deserialize - fully restores the object
restored = Event.from_dict(json.loads(json_str))
print(restored.name) # deploy
print(type(restored.occurred_at)) # <class 'datetime.datetime'>

Approach 3: Encoder with isinstance Dispatch

The cleanest production pattern for systems with many custom types:

import json
from datetime import datetime
from decimal import Decimal
import uuid
from dataclasses import dataclass, asdict

@dataclass
class Product:
    product_id: uuid.UUID
    name: str
    price: Decimal
    created_at: datetime

class AppEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, uuid.UUID):
            return str(obj)
        if isinstance(obj, Decimal):
            return str(obj)
        if isinstance(obj, datetime):
            return obj.isoformat()
        # Dataclasses: convert to dict first, then individual fields encode recursively
        if hasattr(obj, "__dataclass_fields__"):
            return asdict(obj)
        return super().default(obj)

product = Product(
    product_id=uuid.uuid4(),
    name="Widget Pro",
    price=Decimal("49.99"),
    created_at=datetime.now(),
)

print(json.dumps(product, cls=AppEncoder, indent=2))
# {
#   "product_id": "b4c2...",
#   "name": "Widget Pro",
#   "price": "49.99",
#   "created_at": "2024-01-15T14:30:00.123456"
# }

Part 8 - Performance: When the Standard Library Is Not Fast Enough

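The standard json module ships with a C accelerator (_json), but third-party libraries go further:

| Library | Speed vs stdlib | Cross-lang | Custom types | Install |
|---|---|---|---|---|
| json (stdlib) | 1x (baseline) | Yes | Manual | Built-in |
| orjson | roughly 2x (loads) to 10x (dumps); varies by payload | Yes | Automatic* | pip install orjson |
| ujson | 2x–5x | Yes | Limited | pip install ujson |
| msgpack | Fast + binary | Yes | Manual | pip install msgpack |

* orjson natively handles datetime, UUID, and dataclasses; numpy arrays require option=orjson.OPT_SERIALIZE_NUMPY. Decimal is not handled natively and still needs a default= function.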

orjson - The Production Standard for High Throughput

import orjson
from datetime import datetime
from decimal import Decimal
import uuid

data = {
    "event_id": uuid.uuid4(),
    "timestamp": datetime.now(),
    "value": 42,
}

# orjson.dumps returns bytes (not str) - faster for network I/O
json_bytes = orjson.dumps(data)
print(json_bytes)
# b'{"event_id":"b4c2...","timestamp":"2024-01-15T14:30:00.123456","value":42}'

# orjson handles datetime and UUID natively - no custom encoder needed!

# Deserialize
restored = orjson.loads(json_bytes)
print(restored["value"]) # 42

# orjson does NOT restore datetime objects on load - they stay as strings
# This is the same behavior as stdlib json
print(type(restored["timestamp"])) # <class 'str'>

When to Use Each Library

# stdlib json - default choice; zero dependencies
import json
data = json.dumps(payload)

# orjson - high-throughput APIs, event streaming, ML feature stores
# > 10,000 serializations/second, native datetime/UUID/numpy support
import orjson
data = orjson.dumps(payload) # Returns bytes

# ujson - drop-in replacement for stdlib, moderate speedup
import ujson
data = ujson.dumps(payload) # Returns str like stdlib

:::warning orjson returns bytes
orjson.dumps() returns bytes, not str. When writing to a file opened in text mode, you must decode first: f.write(orjson.dumps(data).decode()). Or open the file in binary mode: open("file.json", "wb").
:::
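One way to keep the bytes/str difference from leaking through a codebase is to hide the choice of library behind a single function that always returns bytes. The sketch below is an assumption about how such a wrapper could look, not a pattern from either library; dumps_bytes is a hypothetical name, and orjson is treated as optional.

```python
import json

try:
    import orjson  # third-party and optional: pip install orjson

    def dumps_bytes(obj) -> bytes:
        # orjson already returns compact UTF-8 bytes
        return orjson.dumps(obj)
except ImportError:
    def dumps_bytes(obj) -> bytes:
        # Fallback: compact stdlib output, encoded to match orjson's return type
        return json.dumps(obj, separators=(",", ":")).encode("utf-8")

body = dumps_bytes({"event": "click", "x": 100})
print(type(body))        # <class 'bytes'>
print(json.loads(body))  # {'event': 'click', 'x': 100}
```

The two backends only agree on types that both support. A payload containing datetime or UUID serializes under orjson but raises TypeError in the fallback unless you add a default= function.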

Part 9 - Real-World Patterns

Pattern 1: REST API Response Parsing

import json
import urllib.request
from datetime import datetime

def fetch_github_user(username: str) -> dict:
    """Fetch GitHub user data from the public API."""
    url = f"https://api.github.com/users/{username}"

    with urllib.request.urlopen(url) as response:
        raw = response.read().decode("utf-8")

    data = json.loads(raw)

    # Extract only what we need; convert types
    return {
        "login": data["login"],
        "id": data["id"],
        "repos": data["public_repos"],
        # GitHub returns ISO 8601 strings - parse to datetime
        "created": datetime.fromisoformat(data["created_at"].replace("Z", "+00:00")),
        "bio": data.get("bio"), # May be null → None
    }

# user = fetch_github_user("gvanrossum")
# print(user["created"]) # 2011-01-25 18:44:36+00:00

Pattern 2: Append-Only JSON Log (JSONL Format)

JSON Lines (.jsonl) - one JSON object per line - is the standard format for structured logs and ML training data:

import json
from datetime import datetime, timezone

def log_event(filepath: str, event_type: str, data: dict) -> None:
    """Append a structured event to a JSON Lines log file."""
    record = {
        # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12
        "ts": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "event": event_type,
        **data,
    }
    with open(filepath, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, separators=(',', ':')) + "\n")

def read_log(filepath: str):
    """Read all events from a JSON Lines log file."""
    with open(filepath, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Usage
log_event("events.jsonl", "user_login", {"user_id": 42, "ip": "10.0.0.1"})
log_event("events.jsonl", "purchase", {"user_id": 42, "amount": 99.99})

for event in read_log("events.jsonl"):
    print(event["event"], event["ts"])
# user_login 2024-01-15T14:30:00.000000Z
# purchase 2024-01-15T14:30:01.234567Z

Pattern 3: Config File with Schema Validation

import copy
import json
from pathlib import Path

DEFAULT_CONFIG = {
    "database": {"host": "localhost", "port": 5432},
    "debug": False,
    "log_level": "INFO",
}

def load_config(config_path: str | Path) -> dict:
    """
    Load a JSON config file, falling back to defaults for missing keys.
    Raises ValueError if the file is not valid JSON or is not a JSON object.
    """
    path = Path(config_path)

    if not path.exists():
        # Deep copy so callers cannot mutate the nested defaults
        return copy.deepcopy(DEFAULT_CONFIG)

    with path.open("r", encoding="utf-8") as f:
        try:
            user_config = json.load(f)
        except json.JSONDecodeError as e:
            raise ValueError(f"Config file {path} is not valid JSON: {e}") from e

    if not isinstance(user_config, dict):
        raise ValueError(f"Config file {path} must contain a JSON object")

    # One-level merge: user values override defaults; nested dicts merge key by key
    config = copy.deepcopy(DEFAULT_CONFIG)
    for key, value in user_config.items():
        if isinstance(value, dict) and key in config and isinstance(config[key], dict):
            config[key] = {**config[key], **value}
        else:
            config[key] = value

    return config
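The loader merges and parses but does not check the shape of the result. A minimal validation step could look like the sketch below. REQUIRED_KEYS and validate_config are hypothetical names, and the required keys and types are assumptions taken from the default config in this pattern; a real project might use a schema library instead.

```python
# Hypothetical schema: key name -> expected Python type
REQUIRED_KEYS = {"database": dict, "debug": bool, "log_level": str}

def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means the config is valid."""
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in config:
            problems.append(f"missing required key: {key}")
        elif not isinstance(config[key], expected):
            actual = type(config[key]).__name__
            problems.append(f"{key} should be {expected.__name__}, got {actual}")
    return problems

good = {"database": {"host": "localhost"}, "debug": False, "log_level": "INFO"}
bad = {"database": "localhost", "debug": False}

print(validate_config(good))  # []
print(validate_config(bad))
# ['database should be dict, got str', 'missing required key: log_level']
```

Returning a list of problems instead of raising on the first one lets a startup script report every config mistake at once.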

Pattern 4: Feature Store Serialization (ML Context)

import json
import numpy as np
from datetime import datetime, timezone

class FeatureStoreEncoder(json.JSONEncoder):
    """Encoder for ML feature data including numpy types."""

    def default(self, obj):
        # numpy scalars
        if isinstance(obj, (np.integer,)):
            return int(obj)
        if isinstance(obj, (np.floating,)):
            return float(obj)
        # numpy arrays - convert to nested lists
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)

# Simulated feature vector
features = {
    "user_id": np.int64(12345),
    "embedding": np.array([0.1, 0.2, 0.3, 0.4]),
    "click_rate": np.float32(0.045),
    # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
    "computed_at": datetime.now(timezone.utc),
}

json_str = json.dumps(features, cls=FeatureStoreEncoder)
print(json_str)
# {"user_id": 12345, "embedding": [0.1, 0.2, 0.3, 0.4], "click_rate": 0.04500000178813934, "computed_at": "2024-01-15T..."}

Interview Questions

Q1: What are the six JSON types, and what do they map to in Python?

Answer: JSON has exactly six types:

  • object maps to Python dict
  • array maps to Python list
  • string maps to Python str
  • number maps to Python int (if it has no fraction or exponent) or float (if it has a decimal point or an exponent, e.g. 1.0 or 1e5)
  • true/false map to Python True/False
  • null maps to Python None

Everything else in Python - datetime, UUID, Decimal, bytes, set, custom objects - must be explicitly converted before JSON serialization.

Q2: What is the difference between json.dumps() and json.dump()?

Answer: json.dumps() serializes a Python object to a string (the s stands for "string"). json.dump() serializes to a file-like object - any object with a .write() method. Both accept the same keyword arguments (indent, sort_keys, cls, default, etc.). Use dumps() when you need the JSON as a string in memory (e.g., for an HTTP response body, for hashing). Use dump() when writing directly to a file to avoid holding the entire string in memory.
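A quick way to demonstrate the "any object with a .write() method" point in an interview is to dump into an in-memory buffer. The sketch below uses io.StringIO as a stand-in for a file; the sample data is illustrative.

```python
import io
import json

data = {"name": "Alice", "scores": [95, 87]}

# StringIO has a .write() method, so json.dump() accepts it like a real file
buffer = io.StringIO()
json.dump(data, buffer)

# The text written is the same text dumps() would return
print(buffer.getvalue() == json.dumps(data))  # True
```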

Q3: How do you serialize a datetime object to JSON? How do you deserialize it back?

Answer: datetime is not JSON-serializable by default. There are two main approaches:

  1. Simple (no round-trip guarantee): datetime.now().isoformat() produces a string like "2024-01-15T14:30:00". Deserialize with datetime.fromisoformat(s).

  2. Round-trip with type markers:

# Encode
def encode(obj):
    if isinstance(obj, datetime):
        return {"__type__": "datetime", "value": obj.isoformat()}
    raise TypeError

# Decode
def decode(obj):
    if obj.get("__type__") == "datetime":
        return datetime.fromisoformat(obj["value"])
    return obj

json.dumps(data, default=encode)
json.loads(json_str, object_hook=decode)

Use object_hook to restore the Python type during deserialization.

Q4: You need to hash a dict to use as a cache key. How do you do it correctly with JSON?

Answer: Use json.dumps(d, sort_keys=True, separators=(',', ':')) to get a canonical representation. Without sort_keys=True, two dicts with the same content but different insertion order produce different strings, because dicts preserve insertion order (Python 3.7+) and json.dumps() writes keys in that order. separators=(',', ':') is not required for correctness as long as every producer uses the same setting; it removes whitespace so the hashed string is shorter.

import json, hashlib

def cache_key(params: dict) -> str:
    canonical = json.dumps(params, sort_keys=True, separators=(',', ':'))
    return hashlib.sha256(canonical.encode()).hexdigest()

Q5: What is object_hook in json.loads() and when would you use it?

Answer: object_hook is a callable that is called for every JSON object (dict) parsed. The return value replaces the default dict. It enables custom deserialization - turning type-annotated dicts back into proper Python objects.

Use it when you control both the encoder and decoder and want true round-trip fidelity. For example, if you encode datetime as {"__type__": "datetime", "value": "..."}, your object_hook checks for "__type__" and reconstructs the datetime. Without object_hook, you would need to walk the deserialized dict manually.

Q6: When should you use orjson instead of the standard json module?

Answer: Use orjson when:

  1. You are serializing more than ~10,000 JSON objects per second (high-throughput APIs, event streams, ML inference servers)
  2. Your data contains datetime, UUID, dataclasses, or numpy arrays - orjson handles the first three natively and numpy arrays when you pass option=orjson.OPT_SERIALIZE_NUMPY
  3. You are writing JSON to network sockets where bytes output is more efficient than str

orjson is implemented in Rust and is substantially faster than stdlib json: its documentation describes dumps() as roughly 10x and loads() as roughly 2x the speed of json, though the gain depends on the payload, so measure on your own data. The main difference is that orjson.dumps() returns bytes, not str. This is fine for file I/O in binary mode or HTTP response bodies, but requires .decode() if you need a string.

Practice Challenges

Beginner: Build a Simple Config File Manager

Write a module that loads a JSON config file on startup and saves updates back to disk.

Requirements:

  • load_config(path) - load from file, return dict; create file with defaults if it doesn't exist
  • save_config(path, config) - save dict to file with indent=2
  • get(path, key, default=None) - get a value from config
  • set_value(path, key, value) - update a value and immediately persist (named set_value so it does not shadow the built-in set)
Solution
import json
from pathlib import Path

DEFAULTS = {
    "theme": "dark",
    "language": "en",
    "notifications": True,
    "max_retries": 3,
}

def load_config(path: str | Path) -> dict:
    """Load config from JSON file, creating it with defaults if absent."""
    path = Path(path)

    if not path.exists():
        config = DEFAULTS.copy()
        save_config(path, config)
        return config

    with path.open("r", encoding="utf-8") as f:
        try:
            return json.load(f)
        except json.JSONDecodeError as e:
            print(f"Warning: config file corrupted ({e}), using defaults")
            return DEFAULTS.copy()

def save_config(path: str | Path, config: dict) -> None:
    """Save config dict to JSON file with readable formatting."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)

    with path.open("w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, sort_keys=True)
        f.write("\n") # Trailing newline - POSIX convention

def get(path: str | Path, key: str, default=None):
    """Get a single value from the config file."""
    config = load_config(path)
    return config.get(key, default)

def set_value(path: str | Path, key: str, value) -> None:
    """Update a single config value and persist immediately."""
    config = load_config(path)
    config[key] = value
    save_config(path, config)


# Demo
config_path = "/tmp/demo_config.json"

# First load creates the file with defaults
config = load_config(config_path)
print(config)
# {'theme': 'dark', 'language': 'en', 'notifications': True, 'max_retries': 3}
# (insertion order on the first run; later runs read the key-sorted file back)

# Update a value
set_value(config_path, "theme", "light")
set_value(config_path, "max_retries", 5)

# Read back
print(get(config_path, "theme")) # light
print(get(config_path, "max_retries")) # 5
print(get(config_path, "missing", 42)) # 42 (default)

# Verify file contents
with open(config_path, encoding="utf-8") as f:
    print(f.read())
# {
#   "language": "en",
#   "max_retries": 5,
#   "notifications": true,
#   "theme": "light"
# }

Intermediate: Full Round-Trip Serializer for Custom Types

Build a SmartJSON class that handles datetime, Decimal, UUID, set, and dataclasses - with full round-trip fidelity (deserializing restores original Python types).

Solution
import json
from datetime import datetime
from decimal import Decimal
import uuid
from dataclasses import dataclass, asdict, fields

# Type marker key
TYPE_KEY = "__python_type__"

class SmartEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return {TYPE_KEY: "datetime", "v": obj.isoformat()}
        if isinstance(obj, Decimal):
            return {TYPE_KEY: "decimal", "v": str(obj)}
        if isinstance(obj, uuid.UUID):
            return {TYPE_KEY: "uuid", "v": str(obj)}
        if isinstance(obj, (set, frozenset)):
            # Elements are stringified, so only sets of str round-trip exactly;
            # frozenset also comes back as a plain set
            return {TYPE_KEY: "set", "v": sorted(str(i) for i in obj)}
        if hasattr(obj, "__dataclass_fields__"):
            return {TYPE_KEY: "dataclass", "cls": type(obj).__name__, "v": asdict(obj)}
        return super().default(obj)


def smart_decoder(obj: dict):
    """object_hook that restores Python types from type-annotated dicts."""
    if TYPE_KEY not in obj:
        return obj

    kind = obj[TYPE_KEY]
    val = obj["v"]

    if kind == "datetime":
        return datetime.fromisoformat(val)
    if kind == "decimal":
        return Decimal(val)
    if kind == "uuid":
        return uuid.UUID(val)
    if kind == "set":
        return set(val)
    if kind == "dataclass":
        # Note: restoring to dict since we don't have the class in scope here
        # In production, maintain a registry of dataclass types
        return val

    return obj # Unknown type - pass through


class SmartJSON:
    """Drop-in replacement for json module with extended type support."""

    @staticmethod
    def dumps(obj, **kwargs) -> str:
        return json.dumps(obj, cls=SmartEncoder, **kwargs)

    @staticmethod
    def loads(s: str, **kwargs):
        return json.loads(s, object_hook=smart_decoder, **kwargs)

    @staticmethod
    def dump(obj, fp, **kwargs) -> None:
        json.dump(obj, fp, cls=SmartEncoder, **kwargs)

    @staticmethod
    def load(fp, **kwargs):
        return json.load(fp, object_hook=smart_decoder, **kwargs)


# Test round-trips
@dataclass
class Order:
    order_id: str
    amount: Decimal
    created: datetime

data = {
    "session_id": uuid.UUID("12345678-1234-5678-1234-567812345678"),
    "timestamp": datetime(2024, 1, 15, 14, 30, 0),
    "price": Decimal("1234.56"),
    "tags": {"python", "backend", "v2"},
}

# Encode
encoded = SmartJSON.dumps(data, indent=2)
print(encoded)

# Decode - restores all original types
restored = SmartJSON.loads(encoded)

print(type(restored["session_id"])) # <class 'uuid.UUID'>
print(type(restored["timestamp"])) # <class 'datetime.datetime'>
print(type(restored["price"])) # <class 'decimal.Decimal'>
print(type(restored["tags"])) # <class 'set'>

# Verify values survived round-trip exactly
assert restored["price"] == Decimal("1234.56") # No float precision loss!
assert restored["timestamp"] == datetime(2024, 1, 15, 14, 30, 0)
print("All round-trip assertions passed.")
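The decoder in this solution returns dataclasses as plain dicts and mentions a registry as the production approach. The sketch below shows one possible shape for such a registry. REGISTRY, register, encode, decode, and Point are hypothetical names introduced for illustration, and the marker key matches the TYPE_KEY used in this solution. It assumes flat dataclasses: asdict() flattens nested dataclasses into dicts, so nested instances would need extra handling.

```python
import json
from dataclasses import dataclass, asdict, is_dataclass

TYPE_KEY = "__python_type__"
REGISTRY = {}  # class name -> dataclass type

def register(cls):
    """Class decorator: make a dataclass restorable by name."""
    REGISTRY[cls.__name__] = cls
    return cls

@register
@dataclass
class Point:
    x: int
    y: int

def encode(obj):
    # is_dataclass() is also true for the class itself, so exclude types
    if is_dataclass(obj) and not isinstance(obj, type):
        return {TYPE_KEY: "dataclass", "cls": type(obj).__name__, "v": asdict(obj)}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode(obj):
    if obj.get(TYPE_KEY) == "dataclass":
        cls = REGISTRY.get(obj["cls"])
        if cls is not None:
            return cls(**obj["v"])  # rebuild the instance from its fields
    return obj  # unregistered or ordinary dict: leave unchanged

text = json.dumps({"origin": Point(1, 2)}, default=encode)
restored = json.loads(text, object_hook=decode)
print(restored["origin"])  # Point(x=1, y=2)
```

Looking classes up in an explicit registry, instead of importing by name from the payload, keeps untrusted JSON from instantiating arbitrary types.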

Advanced: High-Throughput JSONL Pipeline

Build an event processing pipeline that reads a JSONL log file, filters and transforms events, and writes results to a new JSONL file. Handle malformed lines gracefully. Benchmark the stdlib json version against orjson.

Solution
import json
import time
import random
from datetime import datetime, timedelta
from pathlib import Path
from typing import Iterator

# ── Generate sample data ─────────────────────────────────────────────────────

def generate_events(path: str, count: int = 10_000) -> None:
    """Generate a sample JSONL event log."""
    event_types = ["page_view", "click", "purchase", "search", "logout"]

    base_time = datetime(2024, 1, 1)

    with open(path, "w", encoding="utf-8") as f:
        for i in range(count):
            ts = base_time + timedelta(seconds=i * 0.5)
            event = {
                "id": i,
                "type": random.choice(event_types),
                "user_id": random.randint(1, 1000),
                "ts": ts.isoformat() + "Z",
                "value": round(random.uniform(0, 1000), 2),
            }
            f.write(json.dumps(event, separators=(',', ':')) + "\n")

        # Inject some bad lines
        f.write("not json at all\n")
        f.write('{"incomplete": \n')
        f.write("\n") # Empty line


# ── Pipeline with stdlib json ─────────────────────────────────────────────────

def read_jsonl(path: str) -> Iterator[dict]:
    """Yield parsed events, skipping malformed lines."""
    with open(path, "r", encoding="utf-8") as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                print(f" Skipping bad line {line_num}: {e.msg}")


def process_events(
    input_path: str,
    output_path: str,
    event_filter: str,
    min_value: float,
) -> int:
    """
    Filter events by type and minimum value, write to new JSONL file.
    Returns count of events written.
    """
    written = 0

    with open(output_path, "w", encoding="utf-8") as out_f:
        for event in read_jsonl(input_path):
            if event.get("type") != event_filter:
                continue
            if event.get("value", 0) < min_value:
                continue

            # Transform: add processing timestamp
            event["processed_at"] = datetime.utcnow().isoformat() + "Z"

            out_f.write(json.dumps(event, separators=(',', ':')) + "\n")
            written += 1

    return written


# ── Benchmark ────────────────────────────────────────────────────────────────

def benchmark():
input_path = "/tmp/events.jsonl"
output_path = "/tmp/purchases.jsonl"

print("Generating 10,000 events...")
generate_events(input_path, 10_000)

print("\nProcessing with stdlib json:")
start = time.perf_counter()
count = process_events(input_path, output_path, "purchase", 100.0)
elapsed = time.perf_counter() - start
print(f" Wrote {count} purchase events in {elapsed:.4f}s")

# Try orjson if available
try:
import orjson

def process_events_orjson(input_path, output_path, event_filter, min_value):
written = 0
with open(input_path, "rb") as in_f, open(output_path, "wb") as out_f:
for line in in_f:
line = line.strip()
if not line:
continue
try:
event = orjson.loads(line)
except orjson.JSONDecodeError:
continue
if event.get("type") != event_filter:
continue
if event.get("value", 0) < min_value:
continue
event["processed_at"] = datetime.utcnow().isoformat() + "Z"
out_f.write(orjson.dumps(event) + b"\n")
written += 1
return written

print("\nProcessing with orjson:")
start = time.perf_counter()
count = process_events_orjson(input_path, "/tmp/purchases_orjson.jsonl", "purchase", 100.0)
elapsed = time.perf_counter() - start
print(f" Wrote {count} purchase events in {elapsed:.4f}s")

except ImportError:
print("\norjson not installed. Install with: pip install orjson")

# Verify output
events = list(read_jsonl(output_path))
print(f"\nVerification: first purchase event:")
print(json.dumps(events[0], indent=2))

benchmark()
# Generating 10,000 events...
# Processing with stdlib json:
# Skipping bad line 10001: Expecting value
# Skipping bad line 10002: Expecting property name enclosed in double quotes
# Wrote ~476 purchase events in 0.0234s
# Processing with orjson:
# Wrote ~476 purchase events in 0.0031s (≈7x faster)
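
The reader in the solution prints each malformed line as it goes. In a pipeline you may prefer to collect those errors so they can be counted or logged afterwards. The following is a minimal sketch of that idea, not part of the original solution: parse_jsonl and the errors list are hypothetical names, and it reads from an in-memory io.StringIO instead of a file so it runs anywhere.

```python
import io
import json
from typing import Iterable, Iterator, List, Tuple


def parse_jsonl(lines: Iterable[str], errors: List[Tuple[int, str]]) -> Iterator[dict]:
    """Yield parsed objects; record (line_number, message) for malformed lines."""
    for line_num, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue  # blank lines are skipped silently, not counted as errors
        try:
            yield json.loads(line)
        except json.JSONDecodeError as e:
            errors.append((line_num, e.msg))


# Same kinds of bad lines the exercise injects: garbage, a truncated object, a blank line
sample = io.StringIO('{"id": 1}\nnot json at all\n{"incomplete": \n\n{"id": 2}\n')

errors: List[Tuple[int, str]] = []
events = list(parse_jsonl(sample, errors))

print(f"parsed {len(events)} events, skipped {len(errors)} bad lines")
for line_num, msg in errors:
    print(f"  line {line_num}: {msg}")
```

Because parse_jsonl is a generator, the errors list is only complete once the generator has been fully consumed.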

Quick Reference

| Operation | Syntax | Notes |
| --- | --- | --- |
| Object to JSON string | json.dumps(obj) | Returns str |
| JSON string to object | json.loads(s) | Returns Python type |
| Object to JSON file | json.dump(obj, f) | f must be open for writing |
| JSON file to object | json.load(f) | f must be open for reading |
| Pretty print | json.dumps(obj, indent=2) | Indent in spaces |
| Sorted keys | json.dumps(obj, sort_keys=True) | Alphabetical key order |
| Compact output | json.dumps(obj, separators=(',', ':')) | No spaces, smaller payload |
| Unicode direct | json.dumps(obj, ensure_ascii=False) | Write non-ASCII as-is |
| Custom encoder class | json.dumps(obj, cls=MyEncoder) | Subclass json.JSONEncoder |
| Custom encoder function | json.dumps(obj, default=fn) | fn(obj) must return serializable value |
| Custom decoder | json.loads(s, object_hook=fn) | Called for every JSON object |
| Handle parse errors | json.JSONDecodeError | Subclass of ValueError |
| Python→JSON True | true | Case-sensitive |
| Python→JSON None | null | Case-sensitive |
| Python→JSON dict | {} object | Keys must be strings |
| Python→JSON tuple | [] array | Tuples become arrays |
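
Several of the rows above can be exercised together in a few lines. The sketch below is illustrative only: encode_extra, decode_extra, safe_loads, and the "__datetime__" tag are hypothetical names chosen for this example, not a convention defined by the json module.

```python
import json
from datetime import datetime


def encode_extra(obj):
    """default= hook: called only for objects json cannot serialize on its own."""
    if isinstance(obj, datetime):
        return {"__datetime__": obj.isoformat()}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def decode_extra(d):
    """object_hook: called for every decoded JSON object, innermost first."""
    if "__datetime__" in d:
        return datetime.fromisoformat(d["__datetime__"])
    return d


def safe_loads(s):
    """Return the parsed value, or None when the text is not valid JSON."""
    try:
        return json.loads(s)
    except json.JSONDecodeError:
        return None


record = {"when": datetime(2024, 1, 15, 14, 30), "point": (1, 2), "ok": True, "note": None}

# Compact, key-sorted output; the tuple becomes an array, True -> true, None -> null
text = json.dumps(record, default=encode_extra, sort_keys=True, separators=(",", ":"))
print(text)
# {"note":null,"ok":true,"point":[1,2],"when":{"__datetime__":"2024-01-15T14:30:00"}}

restored = json.loads(text, object_hook=decode_extra)
print(restored["when"])    # datetime restored by the hook
print(restored["point"])   # [1, 2]: the tuple does not survive the round trip
print(safe_loads("{oops"))  # None
```

Note that the tuple comes back as a list: JSON has a single array type, so that distinction is lost unless you tag it yourself in the same way as the datetime.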

Key Takeaways

  • JSON has exactly six types: object, array, string, number, boolean, null - everything else requires explicit handling
  • Use json.dumps() / json.loads() for string round-trips; use json.dump() / json.load() for file I/O
  • Always open JSON files with encoding="utf-8" - RFC 8259 requires UTF-8 for JSON exchanged between systems, and Python's default file encoding depends on the platform
  • indent=2 for human-readable output; separators=(',', ':') for compact network payloads; sort_keys=True for deterministic hashing
  • Extend json.JSONEncoder and override default() for systematic custom-type handling across your application
  • Use object_hook in json.loads() to achieve full round-trip fidelity - restoring original Python types on deserialization
  • For financial data: serialize Decimal as str, not float, to avoid floating-point precision loss
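  • At high throughput (10k+ ops/sec), reach for orjson - it is typically several times faster than the stdlib (≈7x in the pipeline benchmark above; the exact gain depends on the workload) and serializes datetime and UUID natively, plus numpy arrays when you pass option=orjson.OPT_SERIALIZE_NUMPY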
  • JSONL (one JSON object per line) is the standard format for structured logs, event streams, and ML training datasets
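
Two of the takeaways above, deterministic output for hashing and Decimal serialized as str, can be checked with a short sketch. The helper names encode_decimal and fingerprint are hypothetical, and SHA-256 is only one reasonable choice of hash.

```python
import hashlib
import json
from decimal import Decimal


def encode_decimal(obj):
    """default= hook: emit Decimal as a string so no float rounding occurs."""
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def fingerprint(obj) -> str:
    """Hash a canonical form: sorted keys and fixed separators."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), default=encode_decimal
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"user": "alice", "balance": Decimal("99.99")}
b = {"balance": Decimal("99.99"), "user": "alice"}  # same data, different key order

# Key order no longer affects the hash
print(fingerprint(a) == fingerprint(b))  # True

# The Decimal travels as a string and is rebuilt exactly on the way back
payload = json.dumps(a, default=encode_decimal)
print(payload)  # {"user": "alice", "balance": "99.99"}
balance = Decimal(json.loads(payload)["balance"])
print(balance == Decimal("99.99"))  # True
```

Without sort_keys=True the two dicts would serialize with different key orders and therefore hash differently, even though they compare equal in Python.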
© 2026 EngineersOfAI. All rights reserved.