A complete guide to native JSON mode, OpenAI Structured Outputs, tool calling for structured data, Anthropic tool use, parallel tool calls, and schema design best practices.

JSON Mode and Tool/Function Schemas

Opening Scenario: The Provider-Side Guarantee

When OpenAI introduced Structured Outputs in August 2024 (separate from the earlier JSON mode, released in November 2023), it was a quiet but significant announcement. For the first time, a frontier API provider was offering a hard guarantee: if you use Structured Outputs with a supported model, every completed response conforms exactly to your JSON schema. No retry logic is needed for schema conformance, although refusals and truncated responses still have to be handled.

The engineering implication was substantial. Teams that had been maintaining complex retry and parsing infrastructure could simplify their code dramatically. Teams that had been blocked by the 0.5-1% residual failure rate of Instructor + JSON mode (for applications requiring absolute guarantees like medical records or financial transactions) now had a path forward.

This lesson covers how to use these provider-side structured generation features effectively - and where their limits are, which is where Outlines (Lesson 3) and Instructor (Lesson 4) step in.

JSON Mode vs Structured Outputs: The Critical Distinction

OpenAI has two different "make it return JSON" features, and they are often confused:

Feature	Guarantee	Schema Enforcement	Release
`response_format={"type": "json_object"}`	Valid JSON syntax	None - model decides structure	November 2023
`response_format={"type": "json_schema", "json_schema": {...}}`	Valid JSON + schema conformance	Exact schema adherence	August 2024
`client.beta.chat.completions.parse(response_format=Model)`	Valid JSON + Pydantic model	Exact Pydantic model adherence	August 2024

JSON mode gives you valid JSON. Structured Outputs gives you valid JSON that matches your schema. They are entirely different reliability levels.
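To make the second table row concrete, here is a minimal sketch of what a full `json_schema` payload can look like. The helper name `json_schema_format` and the `person` schema are illustrative assumptions, not part of the lesson. The `name`, `strict`, and `schema` keys follow the request shape OpenAI documents for Structured Outputs, but check them against the current API reference before relying on them.

```python
import json


def json_schema_format(name: str, schema: dict) -> dict:
    """Wrap a JSON Schema in a response_format payload for Structured Outputs."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


# Illustrative schema for the person-extraction example used in this lesson.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
    },
    "required": ["name", "age", "email"],
    "additionalProperties": False,
}

response_format = json_schema_format("person", person_schema)
print(json.dumps(response_format, indent=2))
```

The resulting dict would be passed as `response_format=response_format` in place of `{"type": "json_object"}`.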

JSON Mode: What It Does and Doesn't Guarantee

from openai import OpenAI
import json

client = OpenAI()

# JSON mode - guarantees valid JSON, nothing else
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "Extract person data as JSON with fields: name, age, email",
        },
        {"role": "user", "content": "Alice Johnson, 32, alice@example.com"},
    ],
    response_format={"type": "json_object"},  # JSON mode
    temperature=0,
)

raw_text = response.choices[0].message.content
# Succeeds for complete responses; output cut off at the token limit
# (finish_reason == "length") can still be invalid JSON
data = json.loads(raw_text)

# What you GET:
# {"name": "Alice Johnson", "age": 32, "email": "alice@example.com"}

# What you might ALSO get (all valid JSON, all can break your code):
# {"person_name": "Alice Johnson", ...}       <- field name drift
# {"name": "Alice", "surname": "Johnson", ...}  <- schema divergence
# {"name": "Alice Johnson", "age": "32", ...} <- string instead of int
# {"name": "Alice Johnson", "info": {...}}    <- unexpected structure
# {"result": {"name": "Alice Johnson", ...}}  <- extra wrapping

# JSON mode does NOT prevent any of these

The practical consequence: JSON mode eliminates parse errors but does not eliminate schema validation errors. Your code must still validate the parsed JSON against your expected structure.
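As a rough illustration of that validation step, the sketch below checks a JSON-mode response against the three expected fields using only the standard library. `EXPECTED_FIELDS` and `validate_person` are hypothetical names chosen for this example; in a real pipeline a Pydantic model would typically do the same job with less code.

```python
import json

# Expected shape for the person-extraction example: field name -> required type.
EXPECTED_FIELDS = {"name": str, "age": int, "email": str}


def validate_person(raw_text: str) -> list[str]:
    """Return a list of schema problems; an empty list means the payload is usable."""
    data = json.loads(raw_text)  # JSON mode covers this step, not the checks below
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly
        elif isinstance(data[field], bool) or not isinstance(data[field], expected_type):
            problems.append(f"wrong type for {field}: {type(data[field]).__name__}")
    for field in sorted(data.keys() - EXPECTED_FIELDS.keys()):
        problems.append(f"unexpected field: {field}")
    return problems


# Field-name drift: valid JSON, wrong structure
print(validate_person('{"person_name": "Alice Johnson", "age": 32, "email": "alice@example.com"}'))
```

Each failure mode listed in the comments above (field drift, string-typed numbers, extra wrapping) surfaces here as a non-empty problem list rather than as a crash further downstream.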

OpenAI Structured Outputs: The Guarantee

Structured Outputs use constrained decoding at OpenAI's inference infrastructure level. The result is schema-guaranteed output:

from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional, List
import json

client = OpenAI()


class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    line_total: float


class Invoice(BaseModel):
    vendor_name: str
    invoice_number: str
    invoice_date: str
    line_items: List[LineItem]
    subtotal: float
    tax: float
    total: float
    currency: str


def extract_invoice_structured(document_text: str) -> Invoice:
    """
    Extract invoice using OpenAI Structured Outputs.
    Guaranteed to return a valid Invoice - no retry needed.
    """
    response = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",  # Must support structured outputs
        messages=[
            {
                "role": "system",
                "content": "Extract invoice data from documents.",
            },
            {
                "role": "user",
                "content": f"Extract this invoice:\n\n{document_text}",
            },
        ],
        response_format=Invoice,  # Pydantic model directly
        temperature=0,
    )

    # response.choices[0].message.parsed is already an Invoice object
    # No json.loads(), no validation, no error handling needed for structure
    return response.choices[0].message.parsed


# Usage
invoice = extract_invoice_structured("""
INVOICE #INV-2024-001
From: Acme Corp, 123 Main St
To: Your Company
Items:
  Widget A × 5 @ $10.00 = $50.00
  Widget B × 3 @ $25.00 = $75.00
Subtotal: $125.00
Tax (8%): $10.00
Total: $135.00
""")

print(invoice.vendor_name)        # "Acme Corp"
print(invoice.line_items[0].quantity)  # 5
print(type(invoice.total))        # <class 'float'>

Handling Refusals with Structured Outputs

When using Structured Outputs, the model might still refuse to produce an output (for safety reasons or if the document is irrelevant):

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[...],
    response_format=Invoice,
)

message = response.choices[0].message

# Always check for refusal BEFORE accessing .parsed
if message.refusal:
    print(f"Model refused: {message.refusal}")
    # Handle refusal: log it, route to human review, etc.
else:
    invoice = message.parsed
    # invoice is guaranteed to be a valid Invoice

Tool/Function Calling: The Practical Structured Data Pattern

Function calling (now called "tool calling") was introduced before Structured Outputs and remains the primary way to get structured data from LLMs in most production systems. The key insight: when you ask a model to "call a function," it outputs a structured argument object - which is exactly what you need for structured data extraction.

from openai import OpenAI
import json

client = OpenAI()

# Define the "function" the model should call
# This is really just structured output via the function calling API
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_contact_info",
            "description": "Extract contact information from text",
            "parameters": {
                "type": "object",
                "properties": {
                    "full_name": {
                        "type": "string",
                        "description": "Person's full name"
                    },
                    "email": {
                        "type": "string",
                        "description": "Email address"
                    },
                    "phone": {
                        "type": "string",
                        "description": "Phone number in format (XXX) XXX-XXXX"
                    },
                    "company": {
                        "type": "string",
                        "description": "Company name if mentioned"
                    }
                },
                "required": ["full_name"],
                "additionalProperties": False,
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Extract contact info: John Smith at Acme Corp, john.smith@example.com, (555) 123-4567"
        }
    ],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "extract_contact_info"}},
    # tool_choice="auto" lets the model decide whether to call
    # "required" forces a tool call (useful for extraction)
)

# Parse the tool call arguments
tool_call = response.choices[0].message.tool_calls[0]
contact = json.loads(tool_call.function.arguments)
print(contact)
# {"full_name": "John Smith", "company": "Acme Corp",
#  "email": "john.smith@example.com", "phone": "(555) 123-4567"}

Anthropic Tool Use: Parallel Approach

Anthropic's API uses a similar but distinct tool use pattern:

import anthropic
import json
from pydantic import BaseModel
from typing import Optional


class SentimentAnalysis(BaseModel):
    sentiment: str
    confidence: float
    key_phrases: list[str]
    explanation: str


def extract_with_anthropic_tools(text: str) -> SentimentAnalysis:
    """Extract structured data using Anthropic tool use."""
    client = anthropic.Anthropic()

    # Define the tool
    tools = [
        {
            "name": "analyze_sentiment",
            "description": "Analyze the sentiment of text and return structured results",
            "input_schema": {
                "type": "object",
                "properties": {
                    "sentiment": {
                        "type": "string",
                        "enum": ["positive", "negative", "neutral", "mixed"],
                        "description": "The overall sentiment"
                    },
                    "confidence": {
                        "type": "number",
                        "minimum": 0.0,
                        "maximum": 1.0,
                        "description": "Confidence score 0-1"
                    },
                    "key_phrases": {
                        "type": "array",
                        "items": {"type": "string"},
                        "maxItems": 5,
                        "description": "Key phrases that indicate sentiment"
                    },
                    "explanation": {
                        "type": "string",
                        "maxLength": 200,
                        "description": "Brief explanation of the sentiment"
                    }
                },
                "required": ["sentiment", "confidence", "key_phrases", "explanation"]
            }
        }
    ]

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=tools,
        tool_choice={"type": "tool", "name": "analyze_sentiment"},  # Force tool use
        messages=[
            {"role": "user", "content": f"Analyze sentiment: {text}"}
        ],
    )

    # Find the tool use block in the response
    for block in response.content:
        if block.type == "tool_use":
            return SentimentAnalysis(**block.input)

    raise ValueError("No tool use block in response")

Parallel Tool Calls: Multiple Structured Outputs

One of the most powerful tool calling features is getting multiple structured outputs in a single API call:

from openai import OpenAI
import json

client = OpenAI()

# Define multiple extraction tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_entities",
            "description": "Extract named entities from text",
            "parameters": {
                "type": "object",
                "properties": {
                    "persons": {"type": "array", "items": {"type": "string"}},
                    "organizations": {"type": "array", "items": {"type": "string"}},
                    "locations": {"type": "array", "items": {"type": "string"}},
                    "dates": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["persons", "organizations", "locations", "dates"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "extract_sentiment",
            "description": "Analyze document sentiment",
            "parameters": {
                "type": "object",
                "properties": {
                    "overall_sentiment": {
                        "type": "string",
                        "enum": ["positive", "negative", "neutral"],
                    },
                    "sentiment_score": {"type": "number"},
                    "key_phrases": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["overall_sentiment", "sentiment_score"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "extract_topics",
            "description": "Identify main topics",
            "parameters": {
                "type": "object",
                "properties": {
                    "primary_topic": {"type": "string"},
                    "secondary_topics": {"type": "array", "items": {"type": "string"}},
                    "category": {
                        "type": "string",
                        "enum": ["politics", "business", "technology", "science", "sports", "entertainment", "other"],
                    },
                },
                "required": ["primary_topic", "category"],
            },
        },
    },
]


def analyze_document_parallel(text: str) -> dict:
    """
    Run three extractions in a single API call using parallel tool use.
    More efficient than three separate calls.
    """
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "Analyze the document using all available tools.",
            },
            {"role": "user", "content": text},
        ],
        tools=tools,
        tool_choice="required",  # Must use at least one tool
        parallel_tool_calls=True,  # Allow multiple tools in one response
    )

    results = {}
    for tool_call in response.choices[0].message.tool_calls:
        fn_name = tool_call.function.name
        fn_args = json.loads(tool_call.function.arguments)
        results[fn_name] = fn_args

    return results


# A single API call can return all three analyses
# (news_article is any document string you have already loaded)
analysis = analyze_document_parallel(news_article)
entities = analysis.get("extract_entities", {})
sentiment = analysis.get("extract_sentiment", {})
topics = analysis.get("extract_topics", {})

print(f"Persons: {entities.get('persons', [])}")
print(f"Sentiment: {sentiment.get('overall_sentiment')}")
print(f"Category: {topics.get('category')}")
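One caveat with keying results by tool name is that the model may call a tool twice or skip one entirely. The sketch below groups arguments into lists and reports which expected tools never ran. `collect_tool_calls` and `EXPECTED_TOOLS` are hypothetical names, and the stand-in objects only mimic the `function.name` / `function.arguments` attributes used in the code above.

```python
import json
from collections import defaultdict
from types import SimpleNamespace

EXPECTED_TOOLS = {"extract_entities", "extract_sentiment", "extract_topics"}


def collect_tool_calls(tool_calls, expected=EXPECTED_TOOLS):
    """Group parsed arguments by tool name and list expected tools that never ran."""
    grouped = defaultdict(list)
    for call in tool_calls or []:  # tool_calls may be None when no tool was called
        grouped[call.function.name].append(json.loads(call.function.arguments))
    missing = sorted(set(expected) - set(grouped))
    return dict(grouped), missing


def fake_call(name: str, args: dict):
    """Build a stand-in tool call for a dry run without the API."""
    return SimpleNamespace(function=SimpleNamespace(name=name, arguments=json.dumps(args)))


grouped, missing = collect_tool_calls([
    fake_call("extract_entities", {"persons": ["Alice"]}),
    fake_call("extract_topics", {"primary_topic": "earnings", "category": "business"}),
])
print(missing)
```

A non-empty `missing` list is a signal to issue a follow-up call that forces the skipped tool, rather than silently treating the analysis as empty.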

Schema Design for Tool Calling

The quality of your tool schema directly affects extraction quality. Here are the key principles:

# PRINCIPLE 1: Use enums for bounded values - don't use open strings

# BAD: Open string for category
{
    "name": "category",
    "type": "string",
    "description": "The category of the document"
}
# Model might output: "business news", "Business", "BUSINESS", "financial news"

# GOOD: Bounded enum
{
    "name": "category",
    "type": "string",
    "enum": ["politics", "business", "technology", "science", "sports", "other"],
    "description": "The category of the document"
}
# Model will output exactly one of the 6 options


# PRINCIPLE 2: Be specific in descriptions - they ARE prompt engineering

# BAD: Vague description
{"name": "date", "type": "string", "description": "The date"}

# GOOD: Specific format instruction (type allows the null the description asks for)
{
    "name": "date",
    "type": ["string", "null"],
    "description": "Date in ISO format YYYY-MM-DD. Use null if date is not mentioned.",
    "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"
}


# PRINCIPLE 3: Set reasonable limits on arrays

# BAD: Unbounded array
{"name": "tags", "type": "array", "items": {"type": "string"}}
# Model might output 50 tags for a simple document

# GOOD: Bounded array
{
    "name": "tags",
    "type": "array",
    "items": {"type": "string"},
    "minItems": 1,
    "maxItems": 8,
    "description": "1-8 relevant tags for the document"
}


# PRINCIPLE 4: Use additionalProperties: false to prevent hallucinated fields

# BAD: Open schema
{
    "type": "object",
    "properties": {"name": {"type": "string"}}
    # Missing: "additionalProperties": false
}
# Model might add {"name": "Alice", "id": "12345", "source": "LinkedIn"}

# GOOD: Closed schema
{
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
    "additionalProperties": false  # No extra fields allowed
}


# PRINCIPLE 5: Make optional fields explicit with clear null handling

# BAD: Implicit optional
{
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"}  # What if email is missing?
    },
    "required": ["name"]  # Only "name" required - email is optional
}
# Model behavior for missing email: omit field? Empty string? "unknown"?

# GOOD: Explicit nullable
{
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {
            "type": ["string", "null"],
            "description": "Email address, or null if not mentioned"
        }
    },
    "required": ["name", "email"]  # Both required, email can be null
}
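Principles 4 and 5 are mechanical enough to apply automatically. The sketch below closes every object and turns optional properties into required-but-nullable ones. `to_strict_schema` is a hypothetical helper, not a library function, and it deliberately handles only plain `type` / `properties` / `items` schemas: `$ref`, `anyOf`, and similar keywords are left untouched.

```python
import copy


def to_strict_schema(schema: dict) -> dict:
    """Return a copy where objects are closed and every property is required.

    Properties that were optional in the input become nullable instead.
    """
    closed = copy.deepcopy(schema)
    _close(closed)
    return closed


def _close(node: dict) -> None:
    node_type = node.get("type")
    if node_type == "object":
        properties = node.get("properties", {})
        originally_required = set(node.get("required", []))
        for name, prop in properties.items():
            _close(prop)  # recurse first, while "type" is still a plain string
            if name not in originally_required:
                _make_nullable(prop)
        node["required"] = list(properties)
        node["additionalProperties"] = False
    elif node_type == "array" and isinstance(node.get("items"), dict):
        _close(node["items"])


def _make_nullable(prop: dict) -> None:
    prop_type = prop.get("type")
    if isinstance(prop_type, str) and prop_type != "null":
        prop["type"] = [prop_type, "null"]
    elif isinstance(prop_type, list) and "null" not in prop_type:
        prop["type"] = prop_type + ["null"]
    if "enum" in prop and None not in prop["enum"]:
        prop["enum"] = prop["enum"] + [None]  # null must also be a legal enum value


loose = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
    "required": ["name"],
}
print(to_strict_schema(loose))
```

Running the helper once at startup keeps hand-written schemas readable while the version sent to the model stays closed and explicit about nulls.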

When to Use Tool Calling vs JSON Schema vs Outlines

Limitations of Provider Structured Outputs

Schema Complexity Limits

OpenAI's Structured Outputs have documented limits:

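Maximum 100 object properties per schema at launch, counted across all nested objects (the documented figure has changed over time, so check the current one)
Limited nesting depth (5 levels at launch)
The root of the schema must be an object; anyOf is not allowed at the top level
Every property must be listed in required, and every object must set additionalProperties: false (model optional data as nullable fields)

# Recursive model. OpenAI documents support for recursion via $ref, but the
# output depth is unbounded and other providers or tools may reject it.
from pydantic import BaseModel
from typing import Optional, List

class TreeNode(BaseModel):
    value: str
    children: Optional[List["TreeNode"]] = None  # Recursive: refers to itself

# Alternative when you need a hard depth bound: limit depth explicitly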
class TreeNodeL3(BaseModel):
    value: str

class TreeNodeL2(BaseModel):
    value: str
    children: Optional[List[TreeNodeL3]] = None

class TreeNodeL1(BaseModel):
    value: str
    children: Optional[List[TreeNodeL2]] = None
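A pre-flight check can catch an oversized schema before the API does. The sketch below counts object properties and nesting depth in a plain JSON Schema dict. `schema_stats` and `fits_limits` are hypothetical helpers, the default `max_depth=5` is an assumption to adjust to the provider's current documentation (only the 100-property figure comes from the list above), and `$ref` is not followed, so references need to be resolved first.

```python
def schema_stats(schema: dict) -> tuple[int, int]:
    """Return (total object properties, max object nesting depth) for a schema."""
    if schema.get("type") == "array":
        return schema_stats(schema.get("items", {}))  # arrays add no depth themselves
    if schema.get("type") != "object":
        return 0, 0
    total, deepest_child = 0, 0
    for prop in schema.get("properties", {}).values():
        child_total, child_depth = schema_stats(prop)
        total += 1 + child_total
        deepest_child = max(deepest_child, child_depth)
    return total, 1 + deepest_child


def fits_limits(schema: dict, max_properties: int = 100, max_depth: int = 5) -> bool:
    """Check a schema against assumed provider limits before sending it."""
    total, depth = schema_stats(schema)
    return total <= max_properties and depth <= max_depth


# Hand-written equivalent of the three-level TreeNodeL1 -> L2 -> L3 models
leaf = {"type": "object", "properties": {"value": {"type": "string"}}}
mid = {
    "type": "object",
    "properties": {"value": {"type": "string"}, "children": {"type": "array", "items": leaf}},
}
root = {
    "type": "object",
    "properties": {"value": {"type": "string"}, "children": {"type": "array", "items": mid}},
}
print(schema_stats(root))  # (5, 3)
```

The depth-limited tree costs two properties per level, which makes it easy to see how quickly a deeper or wider hierarchy would approach the property budget.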

Dynamic Schemas

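If your schema changes based on runtime data (e.g., the user configures custom fields), keep in mind that OpenAI Structured Outputs processes each new schema on first use and caches the result, so the first request with a schema the API has not seen before incurs extra latency. Applications that generate many distinct schemas pay that cost repeatedly, which limits dynamic schema applications.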

For dynamic schemas, use Instructor (which sends the schema as a prompt each time) or Outlines (which recompiles the FSM for each new schema, with caching for repeated schemas).
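Whichever backend you pick, the schema itself has to be assembled at runtime. The sketch below builds a closed object schema from user-configured fields and derives a stable cache key for it, which is the kind of key a schema cache could use. `KIND_TO_TYPE`, `build_schema`, and `schema_cache_key` are hypothetical names for this example, and the set of field kinds is an assumption.

```python
import hashlib
import json

# Hypothetical mapping from user-facing field kinds to JSON Schema types.
KIND_TO_TYPE = {"text": "string", "number": "number", "integer": "integer", "flag": "boolean"}


def build_schema(fields: dict[str, str]) -> dict:
    """Build a closed object schema from {field_name: kind} configured at runtime."""
    properties = {}
    for field_name, kind in fields.items():
        if kind not in KIND_TO_TYPE:
            raise ValueError(f"unsupported field kind: {kind!r}")
        # Every field is required but nullable, so missing data is an explicit null.
        properties[field_name] = {"type": [KIND_TO_TYPE[kind], "null"]}
    return {
        "type": "object",
        "properties": properties,
        "required": sorted(properties),
        "additionalProperties": False,
    }


def schema_cache_key(schema: dict) -> str:
    """Stable key for caching per-schema work: SHA-256 of canonical JSON."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


custom = build_schema({"sku": "text", "quantity": "integer"})
print(schema_cache_key(custom)[:12], custom["required"])
```

Sorting keys before hashing means two users who configure the same fields in a different order share one cache entry.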

Production Usage Patterns

import asyncio
from openai import AsyncOpenAI
from pydantic import BaseModel
from typing import List, Optional
import time


class ProductReview(BaseModel):
    product_name: str
    rating: int
    pros: List[str]
    cons: List[str]
    would_recommend: bool
    target_audience: Optional[str] = None


client = AsyncOpenAI()


async def extract_review(review_text: str, semaphore: asyncio.Semaphore) -> ProductReview:
    """Extract structured review data with rate limiting."""
    async with semaphore:
        response = await client.beta.chat.completions.parse(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": "Extract product review data into a structured format.",
                },
                {"role": "user", "content": review_text},
            ],
            response_format=ProductReview,
            temperature=0,
        )

        message = response.choices[0].message
        if message.refusal:
            # Return a default for refused inputs
            return ProductReview(
                product_name="Unknown",
                rating=0,
                pros=[],
                cons=["Could not extract review"],
                would_recommend=False,
            )

        return message.parsed


async def process_reviews_batch(reviews: List[str]) -> List[ProductReview]:
    """Process many reviews concurrently with rate limiting."""
    # Limit to 20 concurrent API calls
    semaphore = asyncio.Semaphore(20)

    tasks = [extract_review(review, semaphore) for review in reviews]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Handle any exceptions from gather
    structured_results = []
    for result in results:
        if isinstance(result, Exception):
            print(f"Review extraction failed: {result}")
            structured_results.append(None)
        else:
            structured_results.append(result)

    return structured_results
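The batch function above records failures but never retries them. A minimal sketch of a retry wrapper with exponential backoff and jitter follows. `with_retries` is a hypothetical helper, and it catches every `Exception` only to stay dependency-free: in production you would likely narrow that to transient errors such as rate limits and timeouts, and check whether your SDK client already retries on its own.

```python
import asyncio
import random


async def with_retries(make_call, max_attempts: int = 3, base_delay: float = 0.5):
    """Await make_call(), retrying failures with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            await asyncio.sleep(delay)


async def demo() -> str:
    attempts = {"count": 0}

    async def flaky() -> str:
        # Stand-in for an API call that fails twice before succeeding
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RuntimeError("transient failure")
        return f"succeeded on attempt {attempts['count']}"

    return await with_retries(flaky, base_delay=0.01)


print(asyncio.run(demo()))
```

Inside `extract_review`, the API call would be wrapped as `await with_retries(lambda: client.beta.chat.completions.parse(...))`, keeping the semaphore held across retries so concurrency stays bounded.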

Common Mistakes

:::danger Confusing JSON Mode and Structured Outputs

JSON mode (response_format={"type": "json_object"}) guarantees valid JSON syntax but not schema conformance. Structured Outputs (response_format=Model with client.beta.chat.completions.parse()) guarantees JSON + exact schema conformance. These are entirely different reliability levels. Many engineers implement what they believe is "structured outputs" but are actually using JSON mode, then wonder why their pipeline still has field drift and type errors. Always verify which API you are calling.

:::

:::warning Not Checking for Refusals with Structured Outputs

When using client.beta.chat.completions.parse(), always check message.refusal before accessing message.parsed. If the model refuses (for safety reasons, or if the content doesn't match the schema's intent), message.parsed will be None and message.refusal will contain the refusal message. Code that reads fields from message.parsed without that check then fails with an AttributeError on None.

:::

:::warning Schema Too Complex for Structured Outputs

OpenAI Structured Outputs limit the total number of properties in a schema (100 at launch). Complex schemas with many nested objects can exceed this. The API rejects unsupported or oversized schemas with a 400 error, and the message does not always make the cause obvious. For schemas exceeding these limits, use Instructor (which validates and retries client-side rather than depending on server-side schema compilation) or split the extraction into multiple smaller steps.

:::

Interview Q&A

Q1: What is the difference between OpenAI JSON mode and Structured Outputs, and when would you use each?

JSON mode (response_format={"type": "json_object"}) uses a server-side instruction that tells the model to focus on producing valid JSON. It guarantees syntactically valid JSON but not schema conformance - field names can drift, types can be wrong, required fields can be missing. Structured Outputs (client.beta.chat.completions.parse(response_format=Model)) uses constrained decoding on OpenAI's infrastructure to enforce exact schema adherence. Use JSON mode when you only need valid JSON and your code can tolerate schema variations with Pydantic validation + Instructor retry. Use Structured Outputs when you need absolute guarantee of schema conformance (medical records, financial transactions) and can accept the schema complexity limits.

Q2: Explain how tool/function calling achieves structured output and why it often outperforms JSON mode.

Tool calling works by asking the model to "invoke a function" with arguments matching the function's parameter schema. The model outputs a JSON object in the tool_calls[i].function.arguments field (a JSON-encoded string in OpenAI's API). It often outperforms JSON mode for three reasons: (1) OpenAI and Anthropic models are specifically trained to produce well-formed tool arguments as a distinct task - the training signal is different from generic JSON generation; (2) The semantic framing ("fill in this function's parameters") may activate better attention to the schema constraints; (3) Tool arguments are a distinct API format that providers may enforce more strictly than response_format=json_object. The tool_choice="required" or tool_choice={"type": "function", "function": {"name": "..."}} parameters force a tool call, which guarantees that the response arrives in tool-call format. Exact schema conformance of the arguments additionally requires strict mode ("strict": True on the function definition) or client-side validation.

Q3: How do parallel tool calls work, and what are their advantages?

Parallel tool calls allow the model to invoke multiple tools in a single API response. Instead of making three separate API calls to extract entities, sentiment, and topics from a document, you define three tools and the model can call all three in one response. The parallel_tool_calls=True parameter enables this in OpenAI's API. Advantages: (1) Reduced latency - one API round trip instead of three; (2) Reduced cost - the input text is sent and processed once instead of three times; (3) Consistency - all three extractions happen over the same context with the same understanding; (4) Lower rate-limit pressure - one request counts against your request quota instead of three. The tradeoffs: response parsing is slightly more complex (iterate over message.tool_calls), and the model is not guaranteed to call every tool, so check which ones actually ran.

Q4: What schema design principles maximize extraction quality with tool calling?

Five key principles: (1) Use enum for bounded string values - never open-ended strings for fields with known valid options (sentiment, category, priority, status). The model's uncertainty is focused within the valid options rather than producing arbitrary text. (2) Write descriptions as instructions - field descriptions are part of the prompt. "The invoice total in USD as a decimal number, e.g. 450.00" is better than "The total." (3) Set additionalProperties: false - prevents the model from adding hallucinated fields. (4) Make all fields required with explicit null handling for optional data - a nullable type such as "type": ["string", "null"], with the field listed in required, is more reliable than leaving fields out of required. (5) Add examples in descriptions for ambiguous fields - "Start date in ISO format, e.g. '2024-01-15'" eliminates date format ambiguity.

Q5: When would you choose tool calling over Outlines/constrained decoding for structured output?

Use tool calling when: (1) You are using an API provider (OpenAI, Anthropic, Gemini) and don't want to run your own model; (2) The schema is well within complexity limits (under 100 fields); (3) A 99-99.9% reliability target is sufficient (not absolute zero); (4) You value the simplicity of the tool calling API over the setup cost of local model deployment + Outlines; (5) You need to use a frontier model (GPT-4o, Claude 3.5) that is unavailable as a local model. Choose Outlines when: (1) Running local models (cost, privacy, or performance reasons); (2) Absolute zero structural failure rate is required; (3) Schema complexity exceeds tool calling limits; (4) Schema changes dynamically at runtime (Outlines recompiles efficiently); (5) Retry latency from Instructor is unacceptable for your SLA.
