JSON Mode and Tool/Function Schemas
Opening Scenario: The Provider-Side Guarantee
When OpenAI introduced Structured Outputs in August 2024 (separate from the earlier JSON mode, which shipped in November 2023), it was a quiet but significant announcement. For the first time, a frontier API provider was offering a hard guarantee rather than a best effort: if you use Structured Outputs with a supported model and a supported schema, every completed response conforms to your JSON schema. No retry logic is needed for schema conformance, although refusals and truncated responses still have to be handled.
The engineering implication was substantial. Teams that had been maintaining complex retry and parsing infrastructure could simplify their code dramatically. Teams that had been blocked by the 0.5-1% residual failure rate of Instructor + JSON mode (for applications requiring absolute guarantees like medical records or financial transactions) now had a path forward.
This lesson covers how to use these provider-side structured generation features effectively - and where their limits are, which is where Outlines (Lesson 3) and Instructor (Lesson 4) step in.
JSON Mode vs Structured Outputs: The Critical Distinction
OpenAI has two different "make it return JSON" features, and they are often confused:
| Feature | Guarantee | Schema Enforcement | Release |
|---|---|---|---|
| response_format={"type": "json_object"} | Valid JSON syntax | None - model decides structure | November 2023 |
| response_format={"type": "json_schema", "json_schema": {...}} | Valid JSON + schema conformance | Exact schema adherence | August 2024 |
| client.beta.chat.completions.parse(response_format=Model) | Valid JSON + Pydantic model | Exact Pydantic model adherence | August 2024 |
JSON mode gives you valid JSON. Structured Outputs gives you valid JSON that matches your schema. They are entirely different reliability levels.
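The difference is visible in the request payload itself. The following is a minimal sketch of the two `response_format` values; the helper names `json_mode_format` and `structured_output_format` and the `person_schema` dictionary are hypothetical and exist only for illustration.

```python
from typing import Any


def json_mode_format() -> dict[str, Any]:
    """JSON mode: asks for valid JSON syntax only; no schema is attached."""
    return {"type": "json_object"}


def structured_output_format(name: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Structured Outputs: the schema travels with the request and is enforced."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": True},
    }


# Hypothetical schema for the person-extraction example used in this lesson.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
    },
    "required": ["name", "age", "email"],
    "additionalProperties": False,
}

loose = json_mode_format()
strict = structured_output_format("person", person_schema)
print(loose)
print(strict["json_schema"]["name"], strict["json_schema"]["strict"])
```

Either dictionary would be passed as `response_format` to `client.chat.completions.create(...)`; only the second one carries a schema for the provider to enforce.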
JSON Mode: What It Does and Doesn't Guarantee
from openai import OpenAI
import json
client = OpenAI()
# JSON mode - guarantees valid JSON, nothing else
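response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "Extract person data as JSON with fields: name, age, email",
        },
        {
            # Illustrative input; any text containing the three fields works
            "role": "user",
            "content": "Alice Johnson is 32. Reach her at alice@example.com.",
        },
    ],
    response_format={"type": "json_object"},  # JSON mode
    temperature=0,
)
raw_text = response.choices[0].message.content
# Valid JSON is guaranteed only for complete responses; a reply cut off by
# max_tokens (finish_reason == "length") can still fail to parse.
data = json.loads(raw_text)
# What you GET:
# {"name": "Alice Johnson", "age": 32, "email": "alice@example.com"}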
# What you might ALSO get (all valid JSON, all can break your code):
# {"person_name": "Alice Johnson", ...} <- field name drift
# {"name": "Alice", "surname": "Johnson", ...} <- schema divergence
# {"name": "Alice Johnson", "age": "32", ...} <- string instead of int
# {"name": "Alice Johnson", "info": {...}} <- unexpected structure
# {"result": {"name": "Alice Johnson", ...}} <- extra wrapping
# JSON mode does NOT prevent any of these
The practical consequence: JSON mode eliminates parse errors but does not eliminate schema validation errors. Your code must still validate the parsed JSON against your expected structure.
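As a minimal sketch of that validation step, the check below uses only the standard library and rejects the drift cases listed above. The `EXPECTED_FIELDS` mapping and the function names `validate_person` and `describe` are assumptions made for this example; in practice a Pydantic model would usually play this role.

```python
import json

# Expected shape of the JSON-mode output: field name -> required Python type.
EXPECTED_FIELDS = {"name": str, "age": int, "email": str}


def validate_person(raw_text: str) -> dict:
    """Parse JSON-mode output and reject any drift from the expected shape."""
    data = json.loads(raw_text)
    if not isinstance(data, dict):
        raise ValueError("top-level value is not an object")
    missing = EXPECTED_FIELDS.keys() - data.keys()
    extra = data.keys() - EXPECTED_FIELDS.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for field, expected_type in EXPECTED_FIELDS.items():
        value = data[field]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"{field!r} should be {expected_type.__name__}")
    return data


def describe(raw_text: str) -> str:
    """Return "ok" or a short description of the first problem found."""
    try:
        validate_person(raw_text)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return f"rejected: {exc}"
    return "ok"


print(describe('{"name": "Alice Johnson", "age": 32, "email": "alice@example.com"}'))
print(describe('{"person_name": "Alice Johnson", "age": 32, "email": "alice@example.com"}'))
print(describe('{"name": "Alice Johnson", "age": "32", "email": "alice@example.com"}'))
```

The first sample passes; the second and third are syntactically valid JSON that JSON mode would happily return, yet both are rejected by the shape check.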
OpenAI Structured Outputs: The Guarantee
Structured Outputs use constrained decoding at OpenAI's inference infrastructure level. The result is schema-guaranteed output:
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional, List
import json
client = OpenAI()
class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    line_total: float

class Invoice(BaseModel):
    vendor_name: str
    invoice_number: str
    invoice_date: str
    line_items: List[LineItem]
    subtotal: float
    tax: float
    total: float
    currency: str

def extract_invoice_structured(document_text: str) -> Invoice:
    """
    Extract invoice using OpenAI Structured Outputs.
    Guaranteed to return a valid Invoice - no retry needed.
    """
    response = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",  # Must support structured outputs
        messages=[
            {
                "role": "system",
                "content": "Extract invoice data from documents.",
            },
            {
                "role": "user",
                "content": f"Extract this invoice:\n\n{document_text}",
            },
        ],
        response_format=Invoice,  # Pydantic model directly
        temperature=0,
    )
    # response.choices[0].message.parsed is already an Invoice object
    # No json.loads(), no validation, no error handling needed for structure
    return response.choices[0].message.parsed
# Usage
invoice = extract_invoice_structured("""
INVOICE #INV-2024-001
From: Acme Corp, 123 Main St
To: Your Company
Items:
Widget A × 5 @ $10.00 = $50.00
Widget B × 3 @ $25.00 = $75.00
Subtotal: $125.00
Tax (8%): $10.00
Total: $135.00
""")
print(invoice.vendor_name) # "Acme Corp"
print(invoice.line_items[0].quantity) # 5
print(type(invoice.total)) # <class 'float'>
Handling Refusals with Structured Outputs
When using Structured Outputs, the model might still refuse to produce an output (for safety reasons or if the document is irrelevant):
response = client.beta.chat.completions.parse(
model="gpt-4o-2024-08-06",
messages=[...],
response_format=Invoice,
)
message = response.choices[0].message
# Always check for refusal BEFORE accessing .parsed
if message.refusal:
    print(f"Model refused: {message.refusal}")
    # Handle refusal: log it, route to human review, etc.
else:
    invoice = message.parsed
    # invoice is guaranteed to be a valid Invoice
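The refusal check can be wrapped once so callers cannot forget it. This is a minimal sketch, not part of the OpenAI SDK: `ModelRefusal` and `unwrap_parsed` are hypothetical names, and `SimpleNamespace` objects stand in for SDK message objects so the helper can be exercised offline.

```python
from types import SimpleNamespace


class ModelRefusal(Exception):
    """Raised when the model declines instead of returning parsed output."""


def unwrap_parsed(message):
    """Return message.parsed, or raise if the model refused or nothing was parsed."""
    refusal = getattr(message, "refusal", None)
    if refusal:
        raise ModelRefusal(refusal)
    parsed = getattr(message, "parsed", None)
    if parsed is None:
        raise ValueError("message has neither a refusal nor a parsed payload")
    return parsed


# Stand-ins for SDK message objects (assumption: only .refusal and .parsed matter here).
ok_message = SimpleNamespace(refusal=None, parsed={"vendor_name": "Acme Corp"})
refused_message = SimpleNamespace(refusal="I can't help with that.", parsed=None)

print(unwrap_parsed(ok_message))
try:
    unwrap_parsed(refused_message)
except ModelRefusal as exc:
    print(f"Route to human review: {exc}")
```

Raising a dedicated exception keeps the refusal path separate from ordinary errors, which makes it easier to log refusals or send them to review.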
Tool/Function Calling: The Practical Structured Data Pattern
Function calling (now called "tool calling") was introduced before Structured Outputs and remains the primary way to get structured data from LLMs in most production systems. The key insight: when you ask a model to "call a function," it outputs a structured argument object - which is exactly what you need for structured data extraction.
from openai import OpenAI
import json
client = OpenAI()
# Define the "function" the model should call
# This is really just structured output via the function calling API
tools = [
{
"type": "function",
"function": {
"name": "extract_contact_info",
"description": "Extract contact information from text",
"parameters": {
"type": "object",
"properties": {
"full_name": {
"type": "string",
"description": "Person's full name"
},
"email": {
"type": "string",
"description": "Email address"
},
"phone": {
"type": "string",
"description": "Phone number in format (XXX) XXX-XXXX"
},
"company": {
"type": "string",
"description": "Company name if mentioned"
}
},
"required": ["full_name"],
"additionalProperties": False,
}
}
}
]
response = client.chat.completions.create(
model="gpt-4o",
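messages=[
{
"role": "user",
# Illustrative input text; the original message content was not preserved
"content": "Contact John Smith at Acme Corp: john.smith@example.com, (555) 123-4567",
}
],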
tools=tools,
tool_choice={"type": "function", "function": {"name": "extract_contact_info"}},
# tool_choice="auto" lets the model decide whether to call
# "required" forces a tool call (useful for extraction)
)
# Parse the tool call arguments
tool_call = response.choices[0].message.tool_calls[0]
contact = json.loads(tool_call.function.arguments)
print(contact)
# {"full_name": "John Smith", "company": "Acme Corp",
# "email": "john.smith@example.com", "phone": "(555) 123-4567"}
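Indexing `tool_calls[0]` and calling `json.loads` directly assumes the model called the expected tool and produced parseable arguments. A more defensive version might look like the sketch below; `parse_tool_arguments` is a hypothetical helper, and the `SimpleNamespace` objects only imitate the attribute layout of the SDK's response message.

```python
import json
from types import SimpleNamespace


def parse_tool_arguments(message, expected_name: str) -> dict:
    """Return the decoded arguments of the first call to expected_name."""
    tool_calls = getattr(message, "tool_calls", None) or []
    for call in tool_calls:
        if call.function.name != expected_name:
            continue
        try:
            args = json.loads(call.function.arguments)
        except json.JSONDecodeError as exc:
            raise ValueError(f"arguments for {expected_name} are not valid JSON") from exc
        if not isinstance(args, dict):
            raise ValueError(f"arguments for {expected_name} are not a JSON object")
        return args
    raise LookupError(f"model did not call {expected_name}")


# Offline stand-in for response.choices[0].message (assumed attribute layout).
fake_message = SimpleNamespace(
    tool_calls=[
        SimpleNamespace(
            function=SimpleNamespace(
                name="extract_contact_info",
                arguments='{"full_name": "John Smith", "company": "Acme Corp"}',
            )
        )
    ]
)

contact = parse_tool_arguments(fake_message, "extract_contact_info")
print(contact["full_name"])
```

The helper separates three failure modes (no call, malformed JSON, wrong JSON type) so each can be logged or retried differently.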
Anthropic Tool Use: Parallel Approach
Anthropic's API uses a similar but distinct tool use pattern:
import anthropic
import json
from pydantic import BaseModel
from typing import Optional
class SentimentAnalysis(BaseModel):
    sentiment: str
    confidence: float
    key_phrases: list[str]
    explanation: str

def extract_with_anthropic_tools(text: str) -> SentimentAnalysis:
    """Extract structured data using Anthropic tool use."""
    client = anthropic.Anthropic()
    # Define the tool
    tools = [
        {
            "name": "analyze_sentiment",
            "description": "Analyze the sentiment of text and return structured results",
            "input_schema": {
                "type": "object",
                "properties": {
                    "sentiment": {
                        "type": "string",
                        "enum": ["positive", "negative", "neutral", "mixed"],
                        "description": "The overall sentiment"
                    },
                    "confidence": {
                        "type": "number",
                        "minimum": 0.0,
                        "maximum": 1.0,
                        "description": "Confidence score 0-1"
                    },
                    "key_phrases": {
                        "type": "array",
                        "items": {"type": "string"},
                        "maxItems": 5,
                        "description": "Key phrases that indicate sentiment"
                    },
                    "explanation": {
                        "type": "string",
                        "maxLength": 200,
                        "description": "Brief explanation of the sentiment"
                    }
                },
                "required": ["sentiment", "confidence", "key_phrases", "explanation"]
            }
        }
    ]
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=tools,
        tool_choice={"type": "tool", "name": "analyze_sentiment"},  # Force tool use
        messages=[
            {"role": "user", "content": f"Analyze sentiment: {text}"}
        ],
    )
    # Find the tool use block in the response
    for block in response.content:
        if block.type == "tool_use":
            return SentimentAnalysis(**block.input)
    raise ValueError("No tool use block in response")
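The `SentimentAnalysis` model above checks types, but not the `enum`, `minimum`/`maximum`, `maxItems`, or `maxLength` constraints declared in `input_schema`. Whether a provider hard-enforces such keywords depends on the API and configuration, so a sketch like the following, which assumes they are not enforced, can re-check them client-side. `check_sentiment_payload` is a hypothetical helper written for this example.

```python
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral", "mixed"}


def check_sentiment_payload(payload: dict) -> list[str]:
    """Return a list of constraint violations; an empty list means the payload passes."""
    problems = []
    if payload.get("sentiment") not in ALLOWED_SENTIMENTS:
        problems.append("sentiment is not one of the enum values")
    confidence = payload.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be a number in [0, 1]")
    phrases = payload.get("key_phrases")
    if not isinstance(phrases, list) or len(phrases) > 5:
        problems.append("key_phrases must be a list of at most 5 items")
    explanation = payload.get("explanation")
    if not isinstance(explanation, str) or len(explanation) > 200:
        problems.append("explanation must be a string of at most 200 characters")
    return problems


good = {
    "sentiment": "positive",
    "confidence": 0.9,
    "key_phrases": ["great battery", "fast shipping"],
    "explanation": "The reviewer praises the product.",
}
bad = {**good, "sentiment": "ecstatic", "confidence": 1.7}

print(check_sentiment_payload(good))
print(check_sentiment_payload(bad))
```

The same constraints could be expressed on the Pydantic model with `Literal` and `Field(ge=0, le=1)`; the standalone function is used here only to keep the example dependency-free.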
Parallel Tool Calls: Multiple Structured Outputs
One of the most powerful tool calling features is getting multiple structured outputs in a single API call:
from openai import OpenAI
import json
client = OpenAI()
# Define multiple extraction tools
tools = [
{
"type": "function",
"function": {
"name": "extract_entities",
"description": "Extract named entities from text",
"parameters": {
"type": "object",
"properties": {
"persons": {"type": "array", "items": {"type": "string"}},
"organizations": {"type": "array", "items": {"type": "string"}},
"locations": {"type": "array", "items": {"type": "string"}},
"dates": {"type": "array", "items": {"type": "string"}},
},
"required": ["persons", "organizations", "locations", "dates"],
},
},
},
{
"type": "function",
"function": {
"name": "extract_sentiment",
"description": "Analyze document sentiment",
"parameters": {
"type": "object",
"properties": {
"overall_sentiment": {
"type": "string",
"enum": ["positive", "negative", "neutral"],
},
"sentiment_score": {"type": "number"},
"key_phrases": {"type": "array", "items": {"type": "string"}},
},
"required": ["overall_sentiment", "sentiment_score"],
},
},
},
{
"type": "function",
"function": {
"name": "extract_topics",
"description": "Identify main topics",
"parameters": {
"type": "object",
"properties": {
"primary_topic": {"type": "string"},
"secondary_topics": {"type": "array", "items": {"type": "string"}},
"category": {
"type": "string",
"enum": ["politics", "business", "technology", "science", "sports", "entertainment", "other"],
},
},
"required": ["primary_topic", "category"],
},
},
},
]
def analyze_document_parallel(text: str) -> dict:
    """
    Run three extractions in a single API call using parallel tool use.
    More efficient than three separate calls.
    """
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "Analyze the document using all available tools.",
            },
            {"role": "user", "content": text},
        ],
        tools=tools,
        tool_choice="required",  # Must use at least one tool
        parallel_tool_calls=True,  # Allow multiple tools in one response
    )
    results = {}
    for tool_call in response.choices[0].message.tool_calls:
        fn_name = tool_call.function.name
        fn_args = json.loads(tool_call.function.arguments)
        results[fn_name] = fn_args
    return results
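# A single API call can return all three analyses, but tool_choice="required"
# only guarantees at least one tool call, hence the .get() defaults below.
# news_article is assumed to hold the document text.
analysis = analyze_document_parallel(news_article)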
entities = analysis.get("extract_entities", {})
sentiment = analysis.get("extract_sentiment", {})
topics = analysis.get("extract_topics", {})
print(f"Persons: {entities.get('persons', [])}")
print(f"Sentiment: {sentiment.get('overall_sentiment')}")
print(f"Category: {topics.get('category')}")
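Because `tool_choice="required"` forces at least one call rather than all three, it can help to detect which analyses are missing and decide whether to re-request them. The sketch below works on the `results` dictionary shape returned by `analyze_document_parallel`; `split_results` and the sample data are hypothetical.

```python
EXPECTED_TOOLS = {"extract_entities", "extract_sentiment", "extract_topics"}


def split_results(results: dict, expected: set[str]) -> tuple[dict, set[str]]:
    """Separate the tool outputs that arrived from the tool names that are missing."""
    present = {name: args for name, args in results.items() if name in expected}
    missing = expected - set(present)
    return present, missing


# Sample output in which the model skipped the topics tool.
sample_results = {
    "extract_entities": {"persons": ["Ada Lovelace"], "organizations": [], "locations": [], "dates": []},
    "extract_sentiment": {"overall_sentiment": "neutral", "sentiment_score": 0.0},
}

present, missing = split_results(sample_results, EXPECTED_TOOLS)
print(sorted(present))
print(sorted(missing))
```

A caller could issue a follow-up request with `tool_choice` pinned to each missing tool, or treat a partial result as acceptable, depending on the application.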
Schema Design for Tool Calling
The quality of your tool schema directly affects extraction quality. Here are the key principles:
# PRINCIPLE 1: Use enums for bounded values - don't use open strings
# BAD: Open string for category
{
"name": "category",
"type": "string",
"description": "The category of the document"
}
# Model might output: "business news", "Business", "BUSINESS", "financial news"
# GOOD: Bounded enum
{
"name": "category",
"type": "string",
"enum": ["politics", "business", "technology", "science", "sports", "other"],
"description": "The category of the document"
}
# Model will output exactly one of the 6 options
# PRINCIPLE 2: Be specific in descriptions - they ARE prompt engineering
# BAD: Vague description
{"name": "date", "type": "string", "description": "The date"}
# GOOD: Specific format instruction
{
"name": "date",
"type": ["string", "null"],
"description": "Date in ISO format YYYY-MM-DD. Use null if date is not mentioned.",
"pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"
}
# PRINCIPLE 3: Set reasonable limits on arrays
# BAD: Unbounded array
{"name": "tags", "type": "array", "items": {"type": "string"}}
# Model might output 50 tags for a simple document
# GOOD: Bounded array
{
"name": "tags",
"type": "array",
"items": {"type": "string"},
"minItems": 1,
"maxItems": 8,
"description": "1-8 relevant tags for the document"
}
# PRINCIPLE 4: Use additionalProperties: false to prevent hallucinated fields
# BAD: Open schema
{
"type": "object",
"properties": {"name": {"type": "string"}}
# Missing: "additionalProperties": false
}
# Model might add {"name": "Alice", "id": "12345", "source": "LinkedIn"}
# GOOD: Closed schema
{
"type": "object",
"properties": {"name": {"type": "string"}},
"required": ["name"],
"additionalProperties": false # No extra fields allowed
}
# PRINCIPLE 5: Make optional fields explicit with clear null handling
# BAD: Implicit optional
{
"type": "object",
"properties": {
"name": {"type": "string"},
"email": {"type": "string"} # What if email is missing?
},
"required": ["name"] # Only "name" required - email is optional
}
# Model behavior for missing email: omit field? Empty string? "unknown"?
# GOOD: Explicit nullable
{
"type": "object",
"properties": {
"name": {"type": "string"},
"email": {
"oneOf": [{"type": "string"}, {"type": "null"}],
"description": "Email address, or null if not mentioned"
}
},
"required": ["name", "email"] # Both required, email can be null
}
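Principles 4 and 5 are mechanical enough to apply with small helpers rather than by hand. The sketch below is one possible approach; `nullable` and `closed_object` are hypothetical helper names, and the nullable field is written with the `"type": [..., "null"]` spelling, which expresses the same idea as the two-branch `oneOf` form shown above.

```python
from typing import Any


def nullable(json_type: str, description: str) -> dict[str, Any]:
    """A field that must be present but may be null when the data is absent."""
    return {"type": [json_type, "null"], "description": description}


def closed_object(properties: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """An object schema with every property required and no extra fields allowed."""
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
        "additionalProperties": False,
    }


contact_schema = closed_object(
    {
        "name": {"type": "string", "description": "Person's full name"},
        "email": nullable("string", "Email address, or null if not mentioned"),
        "category": {
            "type": "string",
            "enum": ["politics", "business", "technology", "science", "sports", "other"],
            "description": "The category of the document",
        },
    }
)

print(contact_schema["required"])
print(contact_schema["properties"]["email"]["type"])
```

Generating `required` from the property names removes the most common slip, which is adding a property and forgetting to list it.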
When to Use Tool Calling vs JSON Schema vs Outlines
Limitations of Provider Structured Outputs
Schema Complexity Limits
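OpenAI's Structured Outputs accept only a subset of JSON Schema. The documented limits have included:
- A cap on total object properties per schema, nested objects included (100 at launch), plus a cap on nesting depth
- Restrictions on anyOf in some configurations (for example, the root schema must be an object rather than an anyOf)
- All properties must be listed in required; optional data is expressed as a nullable type instead
- additionalProperties: false on every object (additionalProperties: true is not accepted)
- Recursive schemas (a type that contains itself) need care: OpenAI's documentation lists them as supported via $ref, so test your SDK and model version before relying on them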
# A recursive model. Support for recursion has depended on SDK and model
# version, so test it before relying on it:
from pydantic import BaseModel
from typing import Optional, List

class TreeNode(BaseModel):
    value: str
    children: Optional[List["TreeNode"]] = None  # Recursive reference

# Workaround when recursion is rejected or depth must be bounded: unroll explicitly
class TreeNodeL3(BaseModel):
    value: str

class TreeNodeL2(BaseModel):
    value: str
    children: Optional[List[TreeNodeL3]] = None

class TreeNodeL1(BaseModel):
    value: str
    children: Optional[List[TreeNodeL2]] = None
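The unrolled classes follow a regular pattern, so a depth-limited schema can also be generated instead of hand-written. The following is a sketch under that assumption, producing a plain JSON Schema dictionary; `tree_schema` and `nesting_depth` are hypothetical helpers.

```python
def tree_schema(depth: int) -> dict:
    """Build a non-recursive tree schema that allows exactly `depth` levels of nodes."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    node = {
        "type": "object",
        "properties": {"value": {"type": "string"}},
        "required": ["value"],
        "additionalProperties": False,
    }
    if depth > 1:
        # Children are required but nullable, so leaves above the last level stay valid.
        node["properties"]["children"] = {
            "type": ["array", "null"],
            "items": tree_schema(depth - 1),
        }
        node["required"].append("children")
    return node


def nesting_depth(schema: dict) -> int:
    """Count how many node levels a schema built by tree_schema allows."""
    children = schema["properties"].get("children")
    return 1 if children is None else 1 + nesting_depth(children["items"])


three_levels = tree_schema(3)
print(nesting_depth(three_levels))
```

Keeping the depth as a parameter makes it easy to stay under whatever nesting cap the provider documents.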
Dynamic Schemas
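If your schema changes based on runtime data (e.g., the user configures custom fields), be aware that OpenAI processes each new schema server-side before it can constrain decoding. There is no separate registration step, but the first request with a previously unseen schema incurs extra latency while the schema is compiled and cached. Applications that generate a fresh schema on most requests pay that cost repeatedly, which limits dynamic schema applications.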
For dynamic schemas, use Instructor (which sends the schema as a prompt each time) or Outlines (which recompiles the FSM for each new schema, with caching for repeated schemas).
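Whichever library handles the schema, runtime-configured fields are easier to cache when the schema is built deterministically. The sketch below shows one way to do that; `schema_from_fields` and `schema_cache_key` are hypothetical helpers, and the `{field_name: json_type}` configuration format is an assumption for illustration.

```python
import hashlib
import json


def schema_from_fields(fields: dict[str, str]) -> dict:
    """Build a closed object schema from {field_name: json_type} configuration."""
    return {
        "type": "object",
        "properties": {
            name: {"type": [json_type, "null"]}
            for name, json_type in sorted(fields.items())
        },
        "required": sorted(fields),
        "additionalProperties": False,
    }


def schema_cache_key(schema: dict) -> str:
    """Hash a canonical JSON encoding so equal schemas share one cache entry."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = schema_from_fields({"customer_id": "string", "priority": "integer"})
b = schema_from_fields({"priority": "integer", "customer_id": "string"})
c = schema_from_fields({"customer_id": "string"})

print(schema_cache_key(a) == schema_cache_key(b))  # True: field order does not matter
print(schema_cache_key(a) == schema_cache_key(c))  # False: different field sets
```

A local cache keyed this way lets a compiled artifact (for example, a compiled constraint for a local model) be reused when two users configure the same fields in a different order.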
Production Usage Patterns
import asyncio
from openai import AsyncOpenAI
from pydantic import BaseModel
from typing import List, Optional
import time
class ProductReview(BaseModel):
    product_name: str
    rating: int
    pros: List[str]
    cons: List[str]
    would_recommend: bool
    target_audience: Optional[str] = None

client = AsyncOpenAI()

async def extract_review(review_text: str, semaphore: asyncio.Semaphore) -> ProductReview:
    """Extract structured review data with rate limiting."""
    async with semaphore:
        response = await client.beta.chat.completions.parse(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": "Extract product review data into a structured format.",
                },
                {"role": "user", "content": review_text},
            ],
            response_format=ProductReview,
            temperature=0,
        )
        message = response.choices[0].message
        if message.refusal:
            # Return a default for refused inputs
            return ProductReview(
                product_name="Unknown",
                rating=0,
                pros=[],
                cons=["Could not extract review"],
                would_recommend=False,
            )
        return message.parsed

async def process_reviews_batch(reviews: List[str]) -> List[ProductReview]:
    """Process many reviews concurrently with rate limiting."""
    # Limit to 20 concurrent API calls
    semaphore = asyncio.Semaphore(20)
    tasks = [extract_review(review, semaphore) for review in reviews]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Handle any exceptions from gather
    structured_results = []
    for result in results:
        if isinstance(result, Exception):
            print(f"Review extraction failed: {result}")
            structured_results.append(None)
        else:
            structured_results.append(result)
    return structured_results
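The semaphore-plus-`gather` pattern above can be exercised without network access by swapping the API call for a stub. This is a minimal sketch: `fake_extract` is a hypothetical stand-in for the real extraction call, and `bounded_gather` additionally records peak concurrency so the limit can be verified.

```python
import asyncio


async def fake_extract(text: str) -> dict:
    """Stub for an API call: sleeps briefly and fails on empty input."""
    await asyncio.sleep(0.01)
    if not text.strip():
        raise ValueError("empty review")
    return {"length": len(text)}


async def bounded_gather(items, worker, limit: int):
    """Run worker over items with at most `limit` in flight; return (results, peak)."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def run(item):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await worker(item)
            finally:
                in_flight -= 1

    # return_exceptions=True keeps one failure from cancelling the whole batch.
    results = await asyncio.gather(*(run(i) for i in items), return_exceptions=True)
    return results, peak


results, peak = asyncio.run(
    bounded_gather(["good", "", "also good", "fine"], fake_extract, limit=2)
)
for item in results:
    print("failed:" if isinstance(item, Exception) else "ok:", item)
print("peak concurrency:", peak)
```

Results come back in input order, with exception objects in the positions of failed items, which is the same shape `process_reviews_batch` handles.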
Common Mistakes
:::danger Confusing JSON Mode and Structured Outputs
JSON mode (response_format={"type": "json_object"}) guarantees valid JSON syntax but not schema conformance. Structured Outputs (response_format=Model with client.beta.chat.completions.parse()) guarantees JSON + exact schema conformance. These are entirely different reliability levels. Many engineers implement what they believe is "structured outputs" but are actually using JSON mode, then wonder why their pipeline still has field drift and type errors. Always verify which API you are calling.
:::
:::warning Not Checking for Refusals with Structured Outputs
When using client.beta.chat.completions.parse(), always check message.refusal before accessing message.parsed. If the model refuses (for safety reasons, or if the content doesn't match the schema's intent), message.parsed will be None and message.refusal will contain the refusal message. Code that reads fields from message.parsed without checking will then raise an AttributeError on None.
:::
:::warning Schema Too Complex for Structured Outputs
OpenAI Structured Outputs cap the total number of properties in a schema (about 100 in this lesson's reference version). Complex schemas with many nested objects can exceed this, in which case the request is typically rejected with a 400 error whose message does not always make the cause obvious. For schemas beyond these limits, use Instructor (which sends the schema as prompt context without server-side precompilation limits) or split the extraction into multiple smaller steps.
:::
Interview Q&A
Q1: What is the difference between OpenAI JSON mode and Structured Outputs, and when would you use each?
JSON mode (response_format={"type": "json_object"}) uses a server-side instruction that tells the model to focus on producing valid JSON. It guarantees syntactically valid JSON but not schema conformance - field names can drift, types can be wrong, required fields can be missing. Structured Outputs (client.beta.chat.completions.parse(response_format=Model)) uses constrained decoding on OpenAI's infrastructure to enforce exact schema adherence. Use JSON mode when you only need valid JSON and your code can tolerate schema variations with Pydantic validation + Instructor retry. Use Structured Outputs when you need absolute guarantee of schema conformance (medical records, financial transactions) and can accept the schema complexity limits.
Q2: Explain how tool/function calling achieves structured output and why it often outperforms JSON mode.
Tool calling works by asking the model to "invoke a function" with arguments matching the function's parameter schema. The model outputs a structured JSON object in the tool_calls[i].function.arguments field (function_call.arguments in the older, deprecated single-function API), designed specifically for tool argument format. It often outperforms JSON mode for three reasons: (1) OpenAI and Anthropic models are specifically trained to produce well-formed tool arguments as a distinct task - the training signal is different from generic JSON generation; (2) The semantic framing ("fill in this function's parameters") may activate better attention to the schema constraints; (3) Tool arguments are a distinct API format that providers may enforce more strictly than response_format=json_object. The tool_choice="required" or tool_choice={"type": "function", "function": {"name": "..."}} parameters force tool use, so the response arrives as tool arguments rather than free text.
Q3: How do parallel tool calls work, and what are their advantages?
Parallel tool calls allow the model to invoke multiple tools in a single API response. Instead of making three separate API calls to extract entities, sentiment, and topics from a document, you define three tools and the model can call all three in one response. The parallel_tool_calls parameter controls this in OpenAI's API. Advantages: (1) Reduced latency - one API round trip instead of three; (2) Reduced cost - one context window processed instead of three (the input text is sent once); (3) Consistency - all three extractions happen over the same context with the same understanding; (4) Lower rate-limit pressure - one request counts against request quotas instead of three. The tradeoffs: response parsing is slightly more complex (iterate over message.tool_calls), and the model is not guaranteed to call every tool, so missing results must be handled.
Q4: What schema design principles maximize extraction quality with tool calling?
Five key principles: (1) Use enum for bounded string values - never open-ended strings for fields with known valid options (sentiment, category, priority, status). The model's uncertainty is focused within the valid options rather than producing arbitrary text. (2) Write descriptions as instructions - field descriptions are part of the prompt. "The invoice total in USD as a decimal number, e.g. 450.00" is better than "The total." (3) Set additionalProperties: false - prevents the model from adding hallucinated fields. (4) Make all fields required with explicit null handling for optional data - "oneOf": [{"type": "string"}, {"type": "null"}] with the field listed in the required array is more reliable than leaving the field out of required. (5) Add examples in descriptions for ambiguous fields - "Start date in ISO format, e.g. '2024-01-15'" eliminates date format ambiguity.
Q5: When would you choose tool calling over Outlines/constrained decoding for structured output?
Use tool calling when: (1) You are using an API provider (OpenAI, Anthropic, Gemini) and don't want to run your own model; (2) The schema is well within complexity limits (under 100 fields); (3) A 99-99.9% reliability target is sufficient (not absolute zero); (4) You value the simplicity of the tool calling API over the setup cost of local model deployment + Outlines; (5) You need to use a frontier model (GPT-4o, Claude 3.5) that is unavailable as a local model. Choose Outlines when: (1) Running local models (cost, privacy, or performance reasons); (2) Absolute zero structural failure rate is required; (3) Schema complexity exceeds tool calling limits; (4) Schema changes dynamically at runtime (Outlines recompiles efficiently); (5) Retry latency from Instructor is unacceptable for your SLA.
