What is structured output?

Reliably extract structured data from LLMs using JSON mode, function calling, Pydantic validation, and constrained decoding - the backbone of production LLM pipelines.

Structured Output and JSON Mode

The Pipeline That Broke at 3 AM

It's 3 AM. Your on-call phone rings. The data pipeline that powers your company's daily intelligence report has been failing for 2 hours. 400 errors. All the same: json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes.

You pull the logs. The LLM is returning responses like:

Here's the extracted information:

{
  name: "OpenAI",
  category: "AI Research",
  // Headquarters removed for privacy
  "funding": "$10B"
}

Hope this helps!

Three problems in one response:

Unquoted keys (name: instead of "name":)
A comment inside the JSON (// Headquarters...)
Prose before and after the JSON block

Your json.loads() call fails on all three. Your pipeline is down. You're losing data that took weeks to collect.
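To see the failure concretely, the short sketch below feeds the response from the logs to `json.loads`. The `raw_response` string is copied from the example above; `try_parse` is a hypothetical helper introduced only for this illustration.

```python
import json

# The model response from the logs, reproduced as-is
raw_response = """Here's the extracted information:

{
  name: "OpenAI",
  category: "AI Research",
  // Headquarters removed for privacy
  "funding": "$10B"
}

Hope this helps!"""


def try_parse(text: str) -> str:
    """Return 'ok' if text parses as JSON, otherwise the decoder's error message."""
    try:
        json.loads(text)
        return "ok"
    except json.JSONDecodeError as exc:
        return f"{exc.msg} (line {exc.lineno}, column {exc.colno})"


# The full response is rejected because of the leading prose
print(try_parse(raw_response))

# Even with the surrounding prose stripped, the unquoted key is still rejected
json_only = raw_response[raw_response.index("{"): raw_response.rindex("}") + 1]
print(try_parse(json_only))
```

Stripping the prose alone is not enough: each of the three problems has to be fixed before the standard parser accepts the text.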

Sound familiar? Every engineer who's tried to use LLMs for data extraction has hit this wall. The model understands what you want. It often gets the content right. But the format is unreliable - and unreliable format breaks production pipelines.

This lesson is the complete guide to getting structured output you can actually use.

The Core Problem: LLMs Are Text Generators

Language models generate tokens probabilistically. They don't "know" that {name: "OpenAI"} is invalid JSON. They just know that in their training data, they've seen JSON written many different ways - with and without quoting keys, with comments, with trailing commas - and they sample from all of those patterns.

The problem compounds with complexity. Simple JSON outputs are usually fine. Deeply nested structures, arrays of objects, nullable fields, integer vs. string distinctions - these are where reliability degrades.

The solutions lie on a spectrum from "better prompting" to "mathematical guarantees". The approaches below move along that spectrum.

Approach 1: Prompt Engineering for JSON

The baseline approach - ask clearly and hope:

Weak prompt:

Extract company information from this text.

Text: "OpenAI, the AI research company founded in 2015, raised $10B from Microsoft."

Strong prompt for structured output:

Extract company information from the text below and return it as a JSON object.

Required JSON format:
{
  "name": "string - company name",
  "founded_year": "integer - year founded, null if unknown",
  "category": "string - business category",
  "funding": {
    "amount": "number - amount in USD, null if unknown",
    "investor": "string - main investor, null if unknown"
  }
}

Rules:
- Return ONLY the JSON object, no other text
- Use null (not "null" or empty string) for missing values
- All string values must use double quotes
- Do not include comments or trailing commas

Text: "OpenAI, the AI research company founded in 2015, raised $10B from Microsoft."

This works most of the time. But it's not guaranteed. For production pipelines, you need more.
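If you write many prompts of this shape, it can help to assemble them from a field specification rather than by hand. The following is a minimal sketch, not a prescribed method: `build_extraction_prompt` and its `fields` mapping are hypothetical names, and the rules are the ones listed in the strong prompt above.

```python
import json


def build_extraction_prompt(text: str, fields: dict[str, str]) -> str:
    """Assemble an extraction prompt from a mapping of field name -> description."""
    # Render the field descriptions as a JSON skeleton, like the strong prompt above
    skeleton = json.dumps(fields, indent=2)
    rules = [
        "Return ONLY the JSON object, no other text",
        'Use null (not "null" or empty string) for missing values',
        "All string values must use double quotes",
        "Do not include comments or trailing commas",
    ]
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    # json.dumps quotes and escapes the input text so it stays one delimited string
    quoted_text = json.dumps(text)
    return (
        "Extract company information from the text below and return it as a JSON object.\n\n"
        f"Required JSON format:\n{skeleton}\n\n"
        f"Rules:\n{rule_lines}\n\n"
        f"Text: {quoted_text}"
    )


fields = {
    "name": "string - company name",
    "founded_year": "integer - year founded, null if unknown",
}
prompt = build_extraction_prompt("OpenAI raised $10B.", fields)
print(prompt)
```

Keeping the rules in one place means every extraction prompt in the pipeline states the same formatting constraints.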

Approach 2: JSON Mode

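Several providers offer a "JSON mode" that constrains the output to syntactically valid, parseable JSON, as long as generation is not cut off by the token limit (OpenAI's response_format={"type": "json_object"} is one example). The example below does not use such a switch; it approximates the same result with careful prompting plus an assistant pre-fill: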

import anthropic
import json

client = anthropic.Anthropic()

def extract_with_json_prompt(text: str) -> dict:
    """
    Extract structured data using careful prompting.
    Works with any model but requires good prompt design.
    """
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        messages=[
            {
                "role": "user",
                "content": f"""Extract the following information from the text and return
as a valid JSON object with these exact fields:
- name: company name (string)
- founded_year: year founded (integer, null if unknown)
- category: business category (string)
- headquarters: city and country (string, null if unknown)
- funding_usd: total funding in USD (integer, null if unknown)

Return ONLY the JSON object, no markdown, no explanation.

Text: {text}"""
            },
            {
                # Pre-fill the assistant's response to force JSON start
                "role": "assistant",
                "content": "{"
            }
        ]
    )

    # The response starts with "{" because we pre-filled it
    raw = "{" + message.content[0].text
    return json.loads(raw)


# The assistant pre-fill trick: by starting the assistant's turn with "{",
# we force the model to continue completing a JSON object.
# This dramatically reduces prose-before-JSON failures.

result = extract_with_json_prompt(
    "OpenAI, the AI research company founded in 2015, has headquarters in San Francisco."
    " They raised $10 billion from Microsoft."
)
print(json.dumps(result, indent=2))

The Assistant Pre-Fill Trick

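One of the most reliable JSON prompting techniques for Claude: pre-fill the assistant's turn with {. This forces the model to continue completing a JSON object rather than starting with prose. Pre-fill support depends on the model and the features in use (it cannot be combined with extended thinking, for example), so confirm it in the provider's documentation.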

messages=[
    {"role": "user", "content": "Extract info and return as JSON: ..."},
    {"role": "assistant", "content": "{"}  # Force JSON start
]

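The model must now continue from {, typically with "name": ..., so it can't produce prose before the JSON. It can still append prose after the closing brace, so parse the result defensively.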
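A sketch of reassembling and parsing a pre-filled response follows. It relies on the standard library's `json.JSONDecoder.raw_decode`, which parses the first JSON value in a string and reports where it stopped, so trailing text is ignored. `parse_prefilled` is a hypothetical helper name, and the `continuation` string is made-up sample output rather than a real API response.

```python
import json


def parse_prefilled(continuation: str, prefill: str = "{") -> dict:
    """Rebuild a pre-filled response and parse the first JSON object in it.

    `continuation` is the text the model returned after the pre-filled `{`.
    Anything after the closing brace (for example "Hope this helps!") is ignored.
    """
    full_text = prefill + continuation
    # raw_decode returns the parsed value and the index where parsing stopped
    value, _end = json.JSONDecoder().raw_decode(full_text)
    if not isinstance(value, dict):
        raise ValueError("Expected a JSON object")
    return value


continuation = '"name": "OpenAI", "founded_year": 2015}\n\nHope this helps!'
print(parse_prefilled(continuation))
```

Plain `json.loads` would reject the same string because of the trailing sentence.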

Approach 3: Tool Use / Function Calling (Recommended for Production)

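The most reliable approach: define your schema as a tool, and let the model "call" it with structured arguments. The model is trained to produce arguments that follow the tool's schema, so you get output shaped like your specific schema rather than just valid JSON. Unless the provider offers a strict or constrained mode, the arguments are still model-generated and can occasionally omit a required field or use the wrong type, so validate them before use.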

import anthropic
import json
from typing import Optional

client = anthropic.Anthropic()

# Define the extraction schema as a tool
extraction_tool = {
    "name": "extract_company_info",
    "description": "Extract structured company information from the provided text",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {
                "type": "string",
                "description": "The company name"
            },
            "founded_year": {
                "type": "integer",
                "description": "Year the company was founded"
            },
            "category": {
                "type": "string",
                "description": "Business category (e.g., 'AI Research', 'E-commerce')"
            },
            "headquarters": {
                "type": "string",
                "description": "City and country of headquarters"
            },
            "funding_usd": {
                "type": "integer",
                "description": "Total funding raised in USD"
            },
            "notable_investors": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of notable investors"
            }
        },
        "required": ["name", "category"]  # These fields are always required
    }
}


def extract_company_info(text: str) -> dict:
    """
    Extract structured company information using tool calling.
    The model fills in the tool's schema - much more reliable than pure prompting,
    but the returned arguments should still be validated.
    """
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=[extraction_tool],
        tool_choice={"type": "tool", "name": "extract_company_info"},  # Force tool use
        messages=[
            {
                "role": "user",
                "content": f"Extract company information from this text:\n\n{text}"
            }
        ]
    )

    # Find the tool use block
    for block in message.content:
        if block.type == "tool_use" and block.name == "extract_company_info":
            return block.input

    raise ValueError("No tool use block found in response")


# Test with various inputs
texts = [
    "Stripe was founded in 2010 by Patrick and John Collison. Based in San Francisco, "
    "the fintech company has raised over $2.2 billion and processes billions in payments.",

    "Anthropic, an AI safety company, was started in 2021 by former OpenAI researchers. "
    "They've raised approximately $7.3 billion from investors including Google and Spark Capital.",
]

for text in texts:
    info = extract_company_info(text)
    print(json.dumps(info, indent=2))
    print()
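Because the tool arguments are generated by the model, a cheap sanity check before using them is worthwhile. The sketch below uses only the standard library and covers just the `required` list and the top-level `type` of each property; a full JSON Schema validator or Pydantic does much more. `check_tool_input` is a hypothetical helper, and treating unknown fields as problems is a design choice that is stricter than JSON Schema's default.

```python
# JSON Schema type name -> Python types accepted for it
_JSON_TYPES = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "array": (list,),
    "object": (dict,),
}


def check_tool_input(data: dict, schema: dict) -> list[str]:
    """Return a list of problems found in `data`; an empty list means it passed."""
    problems = []
    for field in schema.get("required", []):
        if field not in data:
            problems.append(f"missing required field: {field}")
    for field, value in data.items():
        spec = schema.get("properties", {}).get(field)
        if spec is None:
            problems.append(f"unexpected field: {field}")
            continue
        type_name = spec.get("type")
        expected = _JSON_TYPES.get(type_name) if isinstance(type_name, str) else None
        if expected is None:
            continue  # type missing or not covered by this sketch
        # bool is a subclass of int in Python, so reject it explicitly for numeric fields
        bool_as_number = isinstance(value, bool) and type_name in ("integer", "number")
        if bool_as_number or not isinstance(value, expected):
            problems.append(f"wrong type for {field}: expected {type_name}")
    return problems


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "founded_year": {"type": "integer"},
    },
    "required": ["name"],
}
print(check_tool_input({"name": "Stripe", "founded_year": 2010}, schema))
print(check_tool_input({"founded_year": "2010"}, schema))
```

The same function can be pointed at `extraction_tool["input_schema"]` and the `block.input` dict returned by the tool call.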

Multi-Entity Extraction

Tool calling also makes it easy to extract lists of structured objects:

import anthropic
import json

client = anthropic.Anthropic()

# Tool for extracting multiple entities
multi_entity_tool = {
    "name": "extract_entities",
    "description": "Extract all mentioned entities from the text",
    "input_schema": {
        "type": "object",
        "properties": {
            "companies": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "role": {"type": "string", "description": "Their role in the context (e.g., 'acquirer', 'acquired', 'investor')"},
                        "valuation_usd": {"type": "integer"}
                    },
                    "required": ["name", "role"]
                }
            },
            "transaction_type": {
                "type": "string",
                "enum": ["acquisition", "merger", "investment", "ipo", "other"]
            },
            "transaction_date": {
                "type": "string",
                "description": "Date in YYYY-MM-DD format"
            },
            "deal_value_usd": {
                "type": "integer"
            }
        },
        "required": ["companies", "transaction_type"]
    }
}


def extract_financial_event(text: str) -> dict:
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=[multi_entity_tool],
        tool_choice={"type": "tool", "name": "extract_entities"},
        messages=[{"role": "user", "content": f"Extract financial event details:\n\n{text}"}]
    )

    for block in message.content:
        if block.type == "tool_use":
            return block.input
    return {}


result = extract_financial_event(
    "Microsoft completed its acquisition of Activision Blizzard in October 2023 for $68.7 billion,"
    " the largest acquisition in gaming history."
)
print(json.dumps(result, indent=2))
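A `description` such as "Date in YYYY-MM-DD format" is a hint to the model, not a constraint, so it is worth checking the extracted values afterwards. The following sketch assumes the field names from the tool above; `check_financial_event` is a hypothetical helper and the sample dicts are invented for illustration, not real API output.

```python
import re
from datetime import date

ALLOWED_TRANSACTION_TYPES = {"acquisition", "merger", "investment", "ipo", "other"}
_DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def check_financial_event(event: dict) -> list[str]:
    """Return a list of problems in an extracted event; empty means it looks sane."""
    problems = []
    if event.get("transaction_type") not in ALLOWED_TRANSACTION_TYPES:
        problems.append(f"unknown transaction_type: {event.get('transaction_type')!r}")

    raw_date = event.get("transaction_date")
    if raw_date is not None:
        if not (isinstance(raw_date, str) and _DATE_RE.fullmatch(raw_date)):
            problems.append(f"transaction_date is not YYYY-MM-DD: {raw_date!r}")
        else:
            try:
                date.fromisoformat(raw_date)  # rejects impossible dates like 2023-02-30
            except ValueError:
                problems.append(f"transaction_date is not a real date: {raw_date!r}")

    for company in event.get("companies", []):
        if not company.get("name") or not company.get("role"):
            problems.append(f"company entry missing name or role: {company!r}")
    return problems


good = {
    "companies": [{"name": "Microsoft", "role": "acquirer"}],
    "transaction_type": "acquisition",
    "transaction_date": "2023-10-13",
}
bad = {**good, "transaction_date": "October 2023"}
print(check_financial_event(good))
print(check_financial_event(bad))
```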

Approach 4: Pydantic Integration

Pydantic provides Python-native schema validation and type coercion. Combine it with LLM extraction for a complete typed pipeline:

import anthropic
import datetime
import json
from pydantic import BaseModel, Field, field_validator  # Pydantic v2 API
from typing import Optional, List
from enum import Enum

client = anthropic.Anthropic()

class CompanyCategory(str, Enum):
    AI_RESEARCH = "ai_research"
    FINTECH = "fintech"
    ECOMMERCE = "ecommerce"
    SAAS = "saas"
    BIOTECH = "biotech"
    OTHER = "other"

class FundingRound(BaseModel):
    round_type: str = Field(description="e.g., 'Series A', 'Seed', 'IPO'")
    amount_usd: Optional[int] = Field(None, description="Amount in USD")
    date: Optional[str] = Field(None, description="YYYY-MM-DD format")
    lead_investor: Optional[str] = None

class Company(BaseModel):
    name: str
    category: CompanyCategory
    founded_year: Optional[int] = Field(None, ge=1900, le=2030)
    headquarters: Optional[str] = None
    total_funding_usd: Optional[int] = Field(None, ge=0)
    funding_rounds: List[FundingRound] = Field(default_factory=list)
    is_public: bool = False

    @field_validator('name')
    @classmethod
    def name_not_empty(cls, v):
        if not v.strip():
            raise ValueError('Company name cannot be empty')
        return v.strip()

    @field_validator('founded_year')
    @classmethod
    def reasonable_year(cls, v):
        # Compare against the current year rather than a hard-coded one
        if v is not None and v > datetime.date.today().year:
            raise ValueError('Founded year cannot be in the future')
        return v


def pydantic_tool_schema(model_class: type[BaseModel]) -> dict:
    """Convert a Pydantic model to a Claude tool schema."""
    schema = model_class.model_json_schema()  # Pydantic v1 called this .schema()
    return {
        "name": "extract_" + model_class.__name__.lower(),
        "description": f"Extract {model_class.__name__} information from text",
        "input_schema": schema
    }


def extract_typed(text: str, output_class: type[BaseModel]) -> BaseModel:
    """
    Extract structured data and validate with Pydantic.
    Returns a typed Python object.
    """
    tool = pydantic_tool_schema(output_class)
    tool_name = tool["name"]

    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=[tool],
        tool_choice={"type": "tool", "name": tool_name},
        messages=[{
            "role": "user",
            "content": f"Extract information from this text:\n\n{text}"
        }]
    )

    for block in message.content:
        if block.type == "tool_use":
            # Pydantic validates and coerces types
            return output_class(**block.input)

    raise ValueError("Extraction failed")


# Usage with full type safety
text = """
Anthropic was founded in 2021 in San Francisco by Dario Amodei and other former OpenAI
researchers. The AI safety company has raised approximately $7.3 billion in funding,
including a $4 billion investment from Amazon. They are not publicly traded.
"""

company = extract_typed(text, Company)
print(f"Name: {company.name}")
print(f"Category: {company.category.value}")
print(f"Founded: {company.founded_year}")
print(f"Funding: ${company.total_funding_usd:,}" if company.total_funding_usd else "Funding: Unknown")
print(f"Public: {company.is_public}")

Approach 5: Retry Logic and Partial Parse Recovery

Even with tool calling, you should handle failures gracefully:

import anthropic
import json
import re
from typing import TypeVar, Type

from pydantic import BaseModel  # needed for the TypeVar bound below

client = anthropic.Anthropic()
T = TypeVar('T', bound=BaseModel)


def extract_with_retry(
    text: str,
    output_class: Type[T],
    max_retries: int = 3
) -> T:
    """
    Extract with simple retry logic: repeat the same request up to max_retries times.
    """
    last_error = None

    for attempt in range(max_retries):
        try:
            result = extract_typed(text, output_class)
            return result

        except Exception as e:
            last_error = e

            if attempt < max_retries - 1:
                # This version resends the same request; no error feedback reaches the model
                print(f"Attempt {attempt + 1} failed: {e}. Retrying.")

    raise ValueError(f"Extraction failed after {max_retries} attempts: {last_error}")


def repair_json(malformed_json: str) -> str:
    """
    Attempt to repair common JSON formatting issues.
    Use this as a fallback when proper structured output isn't available.
    These are heuristics: unusual string values can still be damaged.
    """
    # Isolate the outermost {...} first so the fixes below never touch surrounding prose
    start = malformed_json.find('{')
    end = malformed_json.rfind('}')
    text = malformed_json[start:end + 1] if start != -1 and end > start else malformed_json

    # Remove JavaScript-style comments (the lookbehind skips "//" in URLs such as https://)
    text = re.sub(r'(?<!:)//.*?$', '', text, flags=re.MULTILINE)
    text = re.sub(r'/\*.*?\*/', '', text, flags=re.DOTALL)

    # Quote bare keys, but only where a key can start: after "{" or ","
    text = re.sub(r'([{,]\s*)([A-Za-z_]\w*)(\s*:)', r'\1"\2"\3', text)

    # Remove trailing commas before } or ]
    text = re.sub(r',(\s*[}\]])', r'\1', text)

    return text


def safe_json_parse(text: str) -> dict:
    """
    Try to parse JSON, attempting repair if initial parse fails.
    """
    # Try direct parse first
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Try repair
    try:
        repaired = repair_json(text)
        return json.loads(repaired)
    except json.JSONDecodeError:
        pass

    # Last resort: ask the model to fix it
    fix_message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1000,
        messages=[
            {
                "role": "user",
                "content": f"""The following JSON is malformed. Fix it and return only valid JSON.
Do not add, remove, or change any data - only fix the formatting.

Malformed JSON:
{text}"""
            },
            {
                "role": "assistant",
                "content": "{"
            }
        ]
    )
    fixed = "{" + fix_message.content[0].text
    return json.loads(fixed)
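A retry is more useful when the next attempt knows what went wrong and when attempts are spaced out. The sketch below illustrates both ideas under stated assumptions: `call_model` and `validate` are hypothetical callables supplied by the caller (for example, a wrapper around the API call and a Pydantic constructor), and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

R = TypeVar("R")


def extract_with_feedback(
    text: str,
    call_model: Callable[[str], dict],
    validate: Callable[[dict], R],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Retry extraction, telling the model what failed and backing off between tries."""
    prompt = f"Extract information from this text:\n\n{text}"
    last_error = None

    for attempt in range(max_retries):
        try:
            return validate(call_model(prompt))
        except Exception as exc:  # broad on purpose: API, parsing, and validation errors
            last_error = exc
            if attempt < max_retries - 1:
                # Include the error so the next attempt can correct it
                prompt = (
                    f"Extract information from this text:\n\n{text}\n\n"
                    f"A previous attempt was rejected with this error:\n{exc}\n"
                    "Return a corrected result."
                )
                # Exponential backoff: base_delay, then 2x, 4x, ...
                sleep(base_delay * (2 ** attempt))

    raise ValueError(f"Extraction failed after {max_retries} attempts: {last_error}")
```

In a real pipeline you would likely narrow the `except` clause and add jitter to the delay; both are left out here to keep the sketch short.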

Schema Design Principles

The schema you define directly affects extraction quality. Good schemas:

1. Use Flat Structures When Possible

Complex nested (harder for models):

{
  "company": {
    "info": {
      "basic": {
        "name": "...",
        "founded": "..."
      }
    }
  }
}

Flat (easier for models):

{
  "company_name": "...",
  "founded_year": "..."
}

2. Use Enums for Constrained Fields

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    MIXED = "mixed"

Enum fields dramatically reduce hallucination and variation in categorical outputs.
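As a small illustration, the enum can drive both the schema sent to the model and the parsing of its answer. This is a sketch: `enum_schema` and `parse_sentiment` are hypothetical helpers, and the `Sentiment` class is repeated so the block runs on its own.

```python
from enum import Enum


class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    MIXED = "mixed"


def enum_schema(enum_class: type[Enum]) -> dict:
    """Build a JSON Schema property that restricts a field to the enum's values."""
    return {"type": "string", "enum": [member.value for member in enum_class]}


def parse_sentiment(raw: str) -> Sentiment:
    """Map a model's answer onto the enum, tolerating case and whitespace differences.

    Raises ValueError for anything outside the allowed values.
    """
    return Sentiment(raw.strip().lower())


print(enum_schema(Sentiment))
print(parse_sentiment(" Positive "))
```

An answer such as "great" raises `ValueError` instead of silently entering the pipeline as a new category.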

3. Separate Mandatory from Optional

Make truly required fields required. Use Optional for fields that may not be in the text. Required fields that are often absent force the model to hallucinate values.
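In a hand-written JSON Schema, "optional" has two parts: the field is left out of `required`, and its type also allows null. The sketch below shows one way to express that; `make_optional` is a hypothetical helper, it expects the property to declare a `type`, and whether a given provider accepts type arrays such as ["string", "null"] is an assumption to verify against its documentation.

```python
import copy


def make_optional(schema: dict, field: str) -> dict:
    """Return a copy of an object schema in which `field` may be omitted or null."""
    result = copy.deepcopy(schema)
    # Optional: drop the field from the `required` list
    result["required"] = [name for name in result.get("required", []) if name != field]
    # Nullable: allow JSON null alongside the existing type
    prop = result["properties"][field]
    current = prop["type"] if isinstance(prop["type"], list) else [prop["type"]]
    if "null" not in current:
        prop["type"] = current + ["null"]
    return result


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "headquarters": {"type": "string"},
    },
    "required": ["name", "headquarters"],
}
relaxed = make_optional(schema, "headquarters")
print(relaxed["required"])
print(relaxed["properties"]["headquarters"])
```

With the field relaxed, the model has a legitimate way to say "not in the text" instead of inventing a value.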

4. Add Descriptions to All Fields

class Article(BaseModel):
    title: str = Field(description="Article headline exactly as written")
    publish_date: str = Field(description="Publication date in YYYY-MM-DD format")
    author: Optional[str] = Field(None, description="Author name, null if not bylined")
    word_count: Optional[int] = Field(None, description="Total word count, null if not available")
    topics: List[str] = Field(description="Main topics, 1-5 single-word or short-phrase topics")

Field descriptions significantly improve extraction accuracy for ambiguous fields.

Constrained Decoding: Mathematical Guarantees

Libraries like Outlines and LM-Format-Enforcer take a different approach: they constrain the token generation process itself, ensuring the output matches a grammar or schema.

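# With Outlines (for locally-run models)
# Written against the Outlines 0.x interface; later releases reorganized the API,
# so check the documentation for the version you have installed.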
import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")

schema = """{
    "title": "Company",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "founded_year": {"type": "integer"},
        "is_public": {"type": "boolean"}
    },
    "required": ["name"]
}"""

generator = outlines.generate.json(model, schema)
result = generator(
    "Extract company info from: 'Google was founded in 1998 and went public in 2004.'"
)
# result is GUARANTEED to match the schema
# This is not prompt engineering - it's constrained generation at the token level

How constrained decoding works: At each token generation step, a mask is applied that sets the probability of any token that would violate the schema to zero. The model can only generate tokens that advance toward a valid output.
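The masking step can be illustrated with a toy example. This is a sketch of the idea only, not how Outlines or any other library implements it: the four-token vocabulary, the scores, and the `allowed` set are all invented for the illustration.

```python
import math


def masked_softmax(logits: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Softmax over `logits` after masking every token not in `allowed` to -inf."""
    masked = {tok: (score if tok in allowed else -math.inf) for tok, score in logits.items()}
    peak = max(masked.values())
    if peak == -math.inf:
        raise ValueError("no allowed token in the vocabulary")
    # exp(-inf) is 0.0, so masked tokens end up with exactly zero probability
    exps = {tok: math.exp(score - peak) for tok, score in masked.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


# Toy next-token scores right after the model has emitted `{`
logits = {"Here": 3.0, '"': 2.0, "name": 1.5, "}": 0.5}
# In this toy vocabulary, only a quoted key or a closing brace keeps the JSON valid
allowed = {'"', "}"}

probs = masked_softmax(logits, allowed)
for token, p in probs.items():
    print(f"{token!r}: {p:.3f}")
```

Although "Here" had the highest raw score, it can no longer be sampled; the probability mass is redistributed over the tokens the grammar permits.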

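When to use it: Constrained decoding is ideal for production pipelines where schema compliance must be absolute (not just probable). It requires running your own model or using an API that supports it. The guarantee covers completed generations; output cut off by a token limit can still be incomplete.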

Production Engineering Notes

1. Validate All Model Outputs Before Use

import json

from pydantic import BaseModel, ValidationError

def safe_extract(text: str, model_class: type[BaseModel]) -> tuple[BaseModel | None, str | None]:
    """
    Safe extraction with explicit error handling.
    Returns (result, error_message) - never raises.
    """
    try:
        result = extract_typed(text, model_class)
        return result, None
    except ValidationError as e:
        return None, f"Schema validation failed: {e}"
    except json.JSONDecodeError as e:
        return None, f"JSON parsing failed: {e}"
    except Exception as e:
        return None, f"Extraction failed: {e}"

2. Log and Monitor Extraction Failures

from prometheus_client import Counter
from pydantic import BaseModel

extraction_attempts = Counter('llm_extraction_attempts_total', 'Total extraction attempts')
extraction_failures = Counter('llm_extraction_failures_total', 'Failed extractions', ['reason'])

def monitored_extract(text: str, model_class: type[BaseModel]) -> BaseModel | None:
    extraction_attempts.inc()
    result, error = safe_extract(text, model_class)
    if error:
        reason = error.split(':')[0]  # Text before the first colon, e.g. "Schema validation failed"
        extraction_failures.labels(reason=reason).inc()
    return result

3. Consider Output Token Cost

Complex schemas with many fields require more output tokens. A schema with 20 optional fields costs more than one with 5 required fields. Design schemas to match what you actually need.

Common Mistakes

:::danger Mistake 1: Parsing Raw LLM Text Without Validation

Always validate model output against your expected schema. Never pass raw model text directly to json.loads() without handling the exception - and never pass the parsed dict directly to downstream code without type validation.

:::

:::danger Mistake 2: No Retry Logic

A single extraction attempt failing is normal. Build retry logic with exponential backoff. For critical pipelines, implement a repair-and-retry strategy.

:::

:::warning Mistake 3: Overly Complex Schemas

A schema with 30 fields, deep nesting, and complex interdependencies will be extracted unreliably. Design the simplest schema that meets your needs. Split complex extractions into multiple simpler ones if necessary.

:::

:::warning Mistake 4: Required Fields for Optional Content

Making a field required when the information may not be present in the text forces the model to hallucinate. Use Optional for any field that may not always be available.

:::

:::warning Mistake 5: Using JSON Mode Without Schema Validation

JSON mode guarantees valid JSON. It does not guarantee your schema. {"status": "error", "message": "Could not extract"} is valid JSON. Always validate schema compliance after parsing.

:::

Interview Q&A

Q1: Why is getting consistent structured output from LLMs hard, and what are the main approaches?

LLMs generate tokens probabilistically - they don't inherently "know" JSON syntax rules. They've seen JSON written many ways in training data and sample from all patterns. The problem compounds with complex schemas. The main approaches in order of reliability: (1) Prompt engineering - asking clearly with format specifications; (2) JSON mode - API-level guarantee of valid JSON, but not schema compliance; (3) Tool/function calling - schema-enforced structured output, type-safe; (4) Constrained decoding (Outlines, LM-Format-Enforcer) - mathematical guarantee by constraining token generation at the model level.

Q2: What is the assistant pre-fill trick and how does it work?

You pre-populate the assistant's response with { before making the API call. This forces the model to continue completing a JSON object - it can't generate prose before the JSON because the response has already "started" with {. This eliminates the most common failure mode (the model starting with "Here's the extracted information:" before the JSON). It works because the model predicts the next token conditioned on all previous tokens, including the pre-filled {.

Q3: How does tool/function calling improve structured output reliability?

Tool calling defines a JSON schema in the API call that specifies the expected output structure (field names, types, required fields). The model is fine-tuned to produce tool call arguments that follow this schema, so in practice you get schema-shaped output, not just syntactic validity. Whether the API strictly enforces the schema depends on the provider and mode, so validate the arguments yourself. Additionally, with tool_choice: {"type": "tool", "name": "..."}, you force the model to use the tool - eliminating the possibility of it returning prose instead.

Q4: How do you integrate Pydantic with LLM extraction?

Define a Pydantic model that describes your expected output. Convert it to a tool schema using Pydantic's JSON Schema export (.model_json_schema() in Pydantic v2, .schema() in v1). Use tool calling to get the model to populate the schema. Pass the tool call's input dict to your Pydantic model's constructor - Pydantic validates types, applies validators, coerces compatible types, and raises ValidationError for any violations. This gives you a fully typed Python object with validated data, rather than a raw dict.

Q5: What is constrained decoding and when should you use it?

Constrained decoding (implemented by libraries like Outlines and LM-Format-Enforcer) works at the token generation level: at each step, a mask zeros out the probability of any token that would violate the target grammar or schema. The model can only generate tokens that advance toward a valid output. This gives a mathematical guarantee that a completed output matches the schema - not a probabilistic one. Use it when you need absolute schema compliance (e.g., generating code that must be syntactically valid, structured data that will be directly parsed without error handling), and when you're running your own model or using an API that supports it. Some hosted APIs now expose schema-constrained output modes (OpenAI's Structured Outputs is one example), but support and the accepted schema features vary by provider.

Q6: How do you design a schema for reliable LLM extraction?

Key principles: (1) Prefer flat over deeply nested structures - models handle shallow schemas better; (2) Use enums for categorical fields - reduces variation and hallucination; (3) Mark fields as Optional if they may not be present in the source text - required fields force hallucination when data is absent; (4) Add detailed descriptions to every field - the model uses these to decide what to extract and how to format it; (5) Use appropriate types - integers for counts and years (prevents "two thousand and twenty three"), booleans for yes/no, strings for free text; (6) Keep the schema to the minimum fields you actually need - more fields means more opportunities for error.
