
Instructor - Structured Outputs with Pydantic

Opening Scenario: The Right Tool for the API World

A data science team at a marketing company needs to extract structured insights from thousands of customer reviews daily. They use OpenAI's GPT-4o through the API. Their current approach - a custom prompt with json.loads() and manual retry logic - fails 8% of the time and has accumulated 400 lines of error-handling code over six months.

A new engineer on the team suggests Instructor. The migration takes four hours. The error-handling code shrinks from 400 lines to 30. The failure rate drops to under 0.5%. Complex features they had been deferring - nested objects, discriminated unions, custom validators - become trivial to add.

Instructor does not use constrained decoding (it works with black-box API providers where logit access is unavailable). Instead, it uses a different approach: inject the schema as a system prompt, use the provider's built-in JSON capabilities, validate the output with Pydantic, and automatically re-prompt with the validation error on failure. This retry-until-valid loop, combined with good prompt engineering, reaches very high reliability in practice.
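The retry-until-valid loop described above can be sketched with the standard library alone. This is a conceptual illustration, not Instructor's implementation: `check_output`, `extract_with_retry`, and `fake_model` are hypothetical names, and the hand-written checks stand in for Pydantic validation.

```python
import json
from typing import Callable, List, Optional, Tuple


def check_output(raw: str) -> Tuple[Optional[dict], List[str]]:
    """Parse and validate one model response; return (data, errors)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"Output was not valid JSON ({exc.msg})."]
    if not isinstance(data, dict):
        return None, ["Top-level value must be a JSON object."]
    errors = []
    if not isinstance(data.get("name"), str):
        errors.append("Field 'name' must be a string.")
    if not isinstance(data.get("age"), int):
        errors.append("Field 'age' must be an integer, for example 28.")
    return data, errors


def extract_with_retry(
    ask_model: Callable[[list], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call the model, validate, and re-prompt with the errors until valid."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_attempts):
        raw = ask_model(messages)
        data, errors = check_output(raw)
        if data is not None and not errors:
            return data
        # Feed the failed answer and the validation errors back to the model.
        messages.append({"role": "assistant", "content": raw})
        messages.append(
            {"role": "user", "content": "Fix these problems and answer again: " + " ".join(errors)}
        )
    raise ValueError(f"No valid output after {max_attempts} attempts")


def fake_model(messages: list) -> str:
    """Stand-in for an API call: wrong on the first try, corrected after feedback."""
    if len(messages) == 1:
        return '{"name": "John Smith", "age": "twenty-eight"}'
    return '{"name": "John Smith", "age": 28}'


result = extract_with_retry(fake_model, "Extract: John Smith, 28 years old")
print(result)
```

The second call succeeds only because the error text tells the model what to change, which is why the wording of validation errors matters so much later in this lesson.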

What Is Instructor?

Instructor (the instructor package on PyPI, created by Jason Liu) is a Python library that wraps LLM API clients to add:

  1. Pydantic model injection: Your Pydantic model is automatically converted to the appropriate API format (JSON Schema for tool calling, system prompt for JSON mode, etc.)
  2. Automatic retry: When Pydantic validation fails, Instructor automatically re-prompts with the validation error as context
  3. Multi-provider support: Works with OpenAI, Anthropic, Cohere, Gemini, Mistral, and more - same code, different backends
  4. Streaming support: Stream partial structured objects as they generate
  5. Hooks and callbacks: Observe retry attempts, log failures, measure latency

Installation

pip install instructor
# Or with specific provider:
pip install "instructor[anthropic]"
pip install "instructor[gemini]"

The Core Pattern

import instructor
from openai import OpenAI
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int
    email: str


# Patch the OpenAI client with Instructor
client = instructor.from_openai(OpenAI())

# Extract structured data - Instructor handles schema injection, validation, and retries
user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,  # <- Your Pydantic model
    messages=[
        {"role": "user", "content": "Extract: John Smith, 28 years old, john.smith@example.com"}
    ],
)

# No json.loads(), no try/except, no validation code
print(user.name)   # "John Smith"
print(user.age)    # 28
print(user.email)  # "john.smith@example.com"
print(type(user))  # <class '__main__.User'>

That is the entire API. The complexity is inside Instructor.

How Instructor Works Under the Hood

Instructor uses one of two mechanisms depending on the provider and configuration:

Tool call mode (used by default with GPT-4o and Claude): Instructor converts your Pydantic model to a tool/function definition and asks the model to "call" that function. The function arguments are the structured data. This tends to be more reliable than JSON mode because providers fine-tune their models specifically to emit well-formed tool calls.

JSON mode: For models that support JSON mode but not tool calls, or when explicitly configured, Instructor uses JSON mode with schema injection in the system prompt.
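To make the difference between the two mechanisms concrete, here is a rough sketch of what each one sends to the provider. The schema is written by hand to stand in for what Pydantic would generate from the `User` model, the tool wrapper follows the OpenAI Chat Completions `tools` shape, and the helper names and prompt wording are assumptions; the exact payloads Instructor builds may differ.

```python
import json

# Hand-written JSON Schema standing in for the schema Pydantic derives from User.
user_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
    },
    "required": ["name", "age", "email"],
}


def as_tool_definition(name: str, description: str, schema: dict) -> dict:
    """Tool call mode: the schema becomes the parameters of a callable tool."""
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": schema},
    }


def as_json_mode_system_prompt(schema: dict) -> str:
    """JSON mode: the schema is pasted into the system prompt as instructions."""
    return (
        "Respond only with a JSON object that matches this JSON Schema:\n"
        + json.dumps(schema, indent=2)
    )


tool = as_tool_definition("User", "Extract a user record from the text.", user_schema)
system_prompt = as_json_mode_system_prompt(user_schema)

print(json.dumps(tool, indent=2))
print(system_prompt)
```

In tool call mode the schema travels as a structured API parameter; in JSON mode it is only text in the prompt, which is one reason the first tends to be followed more closely.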

Multi-Provider Support

The same code works across providers with minimal changes:

import instructor
from pydantic import BaseModel, Field
from typing import List, Optional


class ArticleAnalysis(BaseModel):
    title: str
    topics: List[str] = Field(max_length=5)
    sentiment: str = Field(description="overall sentiment: positive/negative/neutral")
    key_claim: str = Field(max_length=300)
    confidence: float = Field(ge=0.0, le=1.0)


# Placeholder input shared by the provider examples below
article_text = "Paste the article to analyze here."


# ===== OpenAI =====
import instructor
from openai import OpenAI

openai_client = instructor.from_openai(OpenAI())
result = openai_client.chat.completions.create(
    model="gpt-4o",
    response_model=ArticleAnalysis,
    messages=[{"role": "user", "content": "Analyze this article: " + article_text}],
)


# ===== Anthropic =====
import instructor
from anthropic import Anthropic

anthropic_client = instructor.from_anthropic(Anthropic())
result = anthropic_client.messages.create(
    model="claude-3-5-sonnet-20241022",
    response_model=ArticleAnalysis,
    max_tokens=1024,  # Required by the Anthropic API
    messages=[{"role": "user", "content": "Analyze this article: " + article_text}],
)


# ===== Google Gemini =====
import instructor
import google.generativeai as genai

genai.configure(api_key="your-api-key")
gemini_client = instructor.from_gemini(
    client=genai.GenerativeModel(model_name="gemini-1.5-flash")
)
result = gemini_client.chat.completions.create(
    response_model=ArticleAnalysis,
    messages=[{"role": "user", "content": "Analyze this article: " + article_text}],
)


# ===== Mistral =====
# Note: MistralClient is the pre-1.0 mistralai client. mistralai 1.x replaced it
# with `from mistralai import Mistral`, so adjust the import to your installed version.
import instructor
from mistralai.client import MistralClient

mistral_client = instructor.from_mistral(MistralClient(api_key="your-api-key"))
result = mistral_client.chat.completions.create(
    model="mistral-large-latest",
    response_model=ArticleAnalysis,
    messages=[{"role": "user", "content": "Analyze this article: " + article_text}],
)

# result is always ArticleAnalysis regardless of which provider you used
print(result.topics)      # List[str]
print(result.confidence)  # float 0.0-1.0

Configuring Retry Behavior

Instructor's retry mechanism is highly configurable:

import instructor
from instructor.exceptions import InstructorRetryException
from openai import OpenAI
from pydantic import BaseModel, Field, field_validator
import logging

logger = logging.getLogger(__name__)


class ExtractedData(BaseModel):
    amount: float = Field(ge=0.01, description="Amount in dollars, must be positive")
    category: str = Field(description="One of: food, transport, entertainment, utilities")
    # Anchor the pattern: Pydantic v2 matches with search semantics, so an
    # unanchored pattern would accept any string that merely contains a date.
    date: str = Field(pattern=r"^\d{4}-\d{2}-\d{2}$", description="ISO date format")

    # Pydantic v2 style; the v1 @validator decorator is deprecated
    @field_validator("category")
    @classmethod
    def validate_category(cls, v: str) -> str:
        valid = {"food", "transport", "entertainment", "utilities"}
        if v.lower() not in valid:
            raise ValueError(
                f"Category must be one of {valid}, got '{v}'. "
                f"Map similar categories: 'groceries' -> 'food', 'taxi' -> 'transport'."
            )
        return v.lower()


client = instructor.from_openai(
    OpenAI(),
    mode=instructor.Mode.TOOLS,  # Explicit tool calling mode
)


def extract_transaction(text: str) -> ExtractedData:
    """
    Extract transaction data with retry and error logging.
    """
    try:
        return client.chat.completions.create(
            model="gpt-4o-mini",
            response_model=ExtractedData,
            max_retries=3,  # Retry up to 3 times on validation failure
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Extract transaction data. "
                        "Category must be: food, transport, entertainment, utilities."
                    ),
                },
                {"role": "user", "content": text},
            ],
        )
    except InstructorRetryException as e:
        # n_attempts and last_completion are attributes of the exception
        logger.error(
            "Extraction failed after %d attempts. Last completion: %s",
            e.n_attempts,
            str(e.last_completion),
        )
        raise


# The validator error message IS the retry prompt
# If model outputs category="groceries", validator raises:
# ValueError("Category must be one of {...}, got 'groceries'.
# Map similar categories: 'groceries' -> 'food'")
# Instructor sends this error to the model, which then outputs category="food"

Nested Models and Complex Structures

Instructor handles arbitrarily nested Pydantic models:

from pydantic import BaseModel, Field
from typing import List, Optional, Union
from enum import Enum
import instructor
from openai import OpenAI


class SkillLevel(str, Enum):
    BEGINNER = "beginner"
    INTERMEDIATE = "intermediate"
    ADVANCED = "advanced"
    EXPERT = "expert"


class Skill(BaseModel):
    name: str
    level: SkillLevel
    years_experience: Optional[float] = Field(None, ge=0)


class Education(BaseModel):
    degree: str
    field: str
    institution: str
    graduation_year: Optional[int] = Field(None, ge=1950, le=2030)


class WorkExperience(BaseModel):
    company: str
    title: str
    start_year: int = Field(ge=1980, le=2030)
    end_year: Optional[int] = Field(None, ge=1980, le=2030)
    is_current: bool = False
    responsibilities: List[str] = Field(max_length=5)


class ResumeExtract(BaseModel):
    full_name: str
    email: Optional[str] = None
    phone: Optional[str] = None
    skills: List[Skill] = Field(max_length=20)
    education: List[Education] = Field(max_length=5)
    experience: List[WorkExperience] = Field(max_length=10)
    summary: Optional[str] = Field(None, max_length=500)


client = instructor.from_openai(OpenAI())


def parse_resume(resume_text: str) -> ResumeExtract:
    """Parse a resume into structured data."""
    return client.chat.completions.create(
        model="gpt-4o",
        response_model=ResumeExtract,
        max_retries=2,
        messages=[
            {
                "role": "system",
                "content": "You are a resume parser. Extract all information from resumes into structured JSON.",
            },
            {"role": "user", "content": resume_text},
        ],
        temperature=0,  # Greedy for consistency
    )


# Usage
resume_text = """
Alice Chen
alice.chen@example.com | (555) 123-4567

EXPERIENCE
Senior Software Engineer at TechCorp (2021 - Present)
- Led migration to microservices architecture
- Mentored team of 5 engineers

Software Engineer at StartupXYZ (2018 - 2021)
- Built real-time data pipeline

EDUCATION
BS Computer Science, MIT, 2018

SKILLS
Python (Expert), Go (Advanced), Kubernetes (Intermediate)
"""

resume = parse_resume(resume_text)
print(resume.full_name) # "Alice Chen"
print(resume.skills[0].name) # "Python"
print(resume.skills[0].level) # SkillLevel.EXPERT
print(resume.experience[0].is_current) # True
print(resume.education[0].graduation_year) # 2018

Discriminated Unions: Dynamic Schema Selection

One of Instructor's more powerful features is discriminated unions - where the model determines which subtype to extract:

from pydantic import BaseModel, Field
from typing import Annotated, Literal, Optional, Union
import instructor
from openai import OpenAI


class EmailEvent(BaseModel):
    event_type: Literal["email"] = "email"
    sender: str
    subject: str
    body_preview: str = Field(max_length=200)


class CalendarEvent(BaseModel):
    event_type: Literal["calendar"] = "calendar"
    title: str
    start_time: str
    attendees: list[str]


class TaskEvent(BaseModel):
    event_type: Literal["task"] = "task"
    title: str
    due_date: Optional[str] = None
    priority: Literal["low", "medium", "high"] = "medium"


# Discriminated union - model picks which type to extract
WorkplaceEvent = Annotated[
    Union[EmailEvent, CalendarEvent, TaskEvent],
    Field(discriminator="event_type"),
]


client = instructor.from_openai(OpenAI())


def classify_and_extract(notification_text: str) -> WorkplaceEvent:
    """
    Classify and extract any workplace notification.
    The model determines event_type, which determines which fields to extract.
    """
    return client.chat.completions.create(
        model="gpt-4o",
        response_model=WorkplaceEvent,
        messages=[
            {
                "role": "system",
                "content": "Classify and extract workplace events from text.",
            },
            {"role": "user", "content": notification_text},
        ],
    )


# Test
texts = [
    "You received an email from sender@example.com: 'Q3 Review Meeting'",
    "Meeting: Product Launch Planning, 2pm tomorrow with Alice, Bob",
    "Task assigned: Fix login bug, due Friday, Priority: High",
]

for text in texts:
    event = classify_and_extract(text)
    print(f"Type: {event.event_type}")
    if isinstance(event, EmailEvent):
        print(f" From: {event.sender}")
    elif isinstance(event, CalendarEvent):
        print(f" Attendees: {event.attendees}")
    elif isinstance(event, TaskEvent):
        print(f" Priority: {event.priority}")
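Conceptually, a discriminator is a lookup from a tag value to the class that should parse the remaining fields. The sketch below shows that dispatch with plain dataclasses; the `*Sketch` classes, `EVENT_TYPES`, and `parse_event` are hypothetical names for illustration and are not how Pydantic implements the feature internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EmailEventSketch:
    sender: str
    subject: str


@dataclass
class CalendarEventSketch:
    title: str
    attendees: List[str]


# The discriminator value selects which class parses the remaining fields.
EVENT_TYPES = {"email": EmailEventSketch, "calendar": CalendarEventSketch}


def parse_event(payload: dict):
    """Dispatch on 'event_type', then build the matching event object."""
    tag = payload.get("event_type")
    cls = EVENT_TYPES.get(tag)
    if cls is None:
        raise ValueError(
            f"event_type must be one of {sorted(EVENT_TYPES)}, got {tag!r}."
        )
    fields = {key: value for key, value in payload.items() if key != "event_type"}
    return cls(**fields)


event = parse_event(
    {"event_type": "calendar", "title": "Launch planning", "attendees": ["Alice", "Bob"]}
)
print(type(event).__name__, event.attendees)
```

Because the tag is checked first, an unknown `event_type` produces one clear error instead of a pile of field errors from every member of the union, which also makes for a better retry prompt.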

Streaming Structured Outputs

For large extractions, you can display partial results as they stream:

import instructor
from openai import OpenAI
from pydantic import BaseModel
from typing import List

client = instructor.from_openai(OpenAI())


class ReportSection(BaseModel):
    title: str
    content: str
    key_points: List[str]


class FullReport(BaseModel):
    executive_summary: str
    sections: List[ReportSection]
    recommendations: List[str]


def stream_report_analysis(document: str):
    """
    Stream a structured report analysis.
    Yields partial FullReport objects as they build up.
    """
    # instructor.Partial allows streaming incomplete models
    stream = client.chat.completions.create_partial(
        model="gpt-4o",
        response_model=FullReport,
        messages=[
            {"role": "system", "content": "Generate a structured analysis report."},
            {"role": "user", "content": document},
        ],
        stream=True,
    )

    for partial_report in stream:
        # partial_report is a FullReport with some fields potentially None
        if partial_report.executive_summary:
            # Display as it streams
            yield partial_report

    # By the end, partial_report is a complete FullReport


# Usage with a display loop
# long_document is the input text to analyze; define it before running this loop
for partial in stream_report_analysis(long_document):
    if partial.executive_summary:
        print(f"\rSummary: {partial.executive_summary[:80]}...", end="")
    if partial.sections:
        print(f"\nSections found: {len(partial.sections)}", end="")
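One detail that is easy to miss when consuming a partial stream: each yielded object is a fresh snapshot of everything parsed so far, not a delta, and any field may still be empty. The simulation below makes that visible without calling an API; `PartialReportSketch` and `fake_partial_stream` are hypothetical stand-ins for the partial objects Instructor yields.

```python
from dataclasses import dataclass, field, replace
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class PartialReportSketch:
    """Every field is optional or empty until the stream has filled it in."""
    executive_summary: Optional[str] = None
    recommendations: List[str] = field(default_factory=list)


def fake_partial_stream() -> Iterator[PartialReportSketch]:
    """Yield progressively more complete snapshots, like a partial stream would."""
    snapshot = PartialReportSketch()
    yield snapshot
    snapshot = replace(snapshot, executive_summary="Revenue grew")
    yield snapshot
    snapshot = replace(snapshot, executive_summary="Revenue grew 12% year over year.")
    yield snapshot
    snapshot = replace(snapshot, recommendations=["Expand the sales team"])
    yield snapshot


snapshots = list(fake_partial_stream())
for snap in snapshots:
    # Guard against fields that have not arrived yet.
    summary = snap.executive_summary or "(waiting for summary)"
    print(f"{summary} | recommendations so far: {len(snap.recommendations)}")

final_report = snapshots[-1]
```

A display loop should therefore redraw from the latest snapshot rather than append to what it printed before, and only the last snapshot should be treated as the complete result.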

Instructor vs Outlines: When to Use Each

"""
Decision framework for choosing Instructor vs Outlines.
"""

def choose_structured_generation_approach(
    using_api_provider: bool,  # OpenAI, Anthropic, Cohere, Gemini
    need_zero_failures: bool,  # Absolute 100% structural guarantee
    need_streaming: bool,
    schema_complexity: str,  # "simple", "complex", "recursive"
    retry_latency_acceptable: bool,  # Can afford 2-10s retry overhead?
) -> str:

    if using_api_provider:
        # A retry loop cannot guarantee zero failures, so either requirement
        # points to schema enforcement on the provider side.
        if need_zero_failures or not retry_latency_acceptable:
            return (
                "Use OpenAI Structured Outputs or another provider's strict schema mode - "
                "provider-side constrained generation for API models"
            )
        elif schema_complexity == "recursive":
            return (
                "Use Instructor - provider strict modes restrict which JSON Schema features "
                "they accept; Instructor's retry loop validates against the full Pydantic model"
            )
        else:
            return "Use Instructor - validated retry with multi-provider support"

    else:  # Local model
        if need_zero_failures:
            return "Use Outlines - structure is guaranteed by constrained decoding, no retry needed"
        elif need_streaming:
            return "Use Outlines with vLLM streaming integration"
        elif schema_complexity == "recursive":
            return (
                "Use Instructor with local model client (ollama) - "
                "unbounded recursion is hard to express for regex-based constrained decoding"
            )
        else:
            return "Use Outlines - preferred for local models"


# Summary table:
print("""
Use Case                            | Recommended Tool
----------------------------------- | -------------------------
OpenAI/Anthropic API, simple schema | Instructor
OpenAI API, need 100% guarantee     | OpenAI Structured Outputs
Local model (Llama, Mistral)        | Outlines
Local model, recursive schema       | Instructor + Ollama
Streaming structured output (API)   | Instructor streaming
Streaming (local model)             | Outlines + vLLM
High-throughput local serving       | Outlines + vLLM
""")

Adding Observability to Instructor

import instructor
from openai import OpenAI
from pydantic import BaseModel
import time
import logging
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExtractionMetrics:
    total_requests: int = 0
    total_retries: int = 0
    failures: int = 0
    latencies: List[float] = field(default_factory=list)

    @property
    def retry_rate(self):
        return self.total_retries / max(self.total_requests, 1)

    @property
    def failure_rate(self):
        return self.failures / max(self.total_requests, 1)

    @property
    def avg_latency(self):
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0

    def report(self):
        print(f"Total requests: {self.total_requests}")
        print(f"Retry rate: {self.retry_rate:.1%}")
        print(f"Failure rate: {self.failure_rate:.1%}")
        print(f"Avg latency: {self.avg_latency:.3f}s")


metrics = ExtractionMetrics()

# Create patched client with hooks
raw_openai_client = OpenAI()
client = instructor.from_openai(raw_openai_client)


def on_parse_error(error: Exception) -> None:
    """Called on every validation failure; each one triggers a retry
    unless the attempts are already exhausted."""
    metrics.total_retries += 1
    logging.warning("Validation failed. Error: %s", str(error)[:200])


# Register the hook once. This assumes an Instructor release that ships the
# hooks API (client.on); a handler that is never registered is never called.
client.on("parse:error", on_parse_error)


class MyModel(BaseModel):
    name: str
    value: float


def extract_with_metrics(text: str) -> MyModel:
    """Extract with full observability."""
    metrics.total_requests += 1
    start_time = time.perf_counter()

    try:
        return client.chat.completions.create(
            model="gpt-4o-mini",
            response_model=MyModel,
            max_retries=3,
            messages=[{"role": "user", "content": text}],
        )

    except Exception:
        metrics.failures += 1
        raise

    finally:
        # Record latency for successes and failures alike
        elapsed = time.perf_counter() - start_time
        metrics.latencies.append(elapsed)

Common Mistakes

:::danger Relying on Instructor for Zero-Failure Guarantees

Instructor dramatically improves reliability - typically from 85-95% to 97-99.5% success rates. But it is not a zero-failure system. Instructor can still fail if: the model refuses to produce structured output (explicit refusal), the validation error message is unclear and the model cannot correct it, max_retries is exhausted on a particularly difficult input, or the provider returns an error.

For applications requiring absolute zero structural failures, use constrained decoding (Outlines for local models, OpenAI Structured Outputs for API usage).

:::
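A back-of-the-envelope model shows why retries help a lot but never reach zero. The sketch below assumes every attempt fails independently with the same probability, which is a simplification: retries carry error feedback, and genuinely hard inputs tend to fail repeatedly, so real residual rates are usually higher. The 8% figure is borrowed from the opening scenario as an illustrative input, and both function names are hypothetical.

```python
def residual_failure_rate(p_fail: float, attempts: int) -> float:
    """Probability that all `attempts` tries fail, assuming independent attempts."""
    return p_fail ** attempts


def expected_calls(p_fail: float, attempts: int) -> float:
    """Expected API calls per input: 1 + p + p^2 + ... + p^(attempts - 1)."""
    return sum(p_fail ** k for k in range(attempts))


p = 0.08  # assumed first-try failure rate, taken from the opening scenario
for attempts in (1, 2, 3):
    print(
        f"attempts={attempts} "
        f"residual={residual_failure_rate(p, attempts):.4%} "
        f"expected_calls={expected_calls(p, attempts):.4f}"
    )
```

Under these assumptions, three attempts bring the residual failure rate to roughly 0.05% at a cost of about 1.09 calls per input. The residual shrinks geometrically but stays above zero for any finite number of attempts, which is the point of the warning above.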

:::warning Not Customizing Validator Error Messages

Instructor's retry mechanism works by sending the Pydantic validation error back to the model. Default Pydantic error messages are terse and written for developers, for example: "1 validation error for User: age: Input should be a valid integer, unable to parse string as an integer". Custom error messages that state the constraint and the fix dramatically improve retry success rates: "Field 'age' must be a whole number between 1 and 120. You provided 'thirty-five' - please convert to an integer."

Always write the message in raise ValueError(...) as if you are instructing the model, not a human developer.

:::

:::warning Using Instructor With Temperature > 0 Without Setting max_retries

A retry is not a plain resample: Instructor appends the validation error to the conversation, so the model sees a different prompt on each attempt. At temperature = 0, that feedback is the only source of variation, so an unclear error message can make the model repeat the same invalid output and waste API calls. At temperature > 0, the model is also sampling, so retries may produce different (possibly valid) outputs even when the feedback is weak.

Set temperature=0 for extraction tasks (you want the most likely valid output) and max_retries=2-3 as a safety net for edge cases. At temperature > 0, keep max_retries=2 - the randomness means retries have a genuine chance of succeeding.

:::

Interview Q&A

Q1: How does Instructor achieve higher reliability than naive JSON prompting?

Instructor uses three mechanisms in combination: (1) Schema injection - the Pydantic model is automatically converted to an appropriate format (tool definition, JSON schema) and injected into the system prompt or as API parameters, giving the model precise structural guidance; (2) Validated parsing - Pydantic validation runs on every response, catching not just JSON parse errors but also type errors, missing fields, and custom business logic violations; (3) Automatic retry with error context - when validation fails, Instructor re-prompts with the specific validation error as context, effectively telling the model "you tried X, but that's wrong because Y. Try again." This retry-with-feedback loop is surprisingly effective because models can correct specific, explicit errors much more reliably than they can produce correct output on the first try.

Q2: What is the advantage of using tool call mode vs JSON mode in Instructor?

Tool call mode (used by default with GPT-4o and Claude) converts the Pydantic schema to a function/tool definition and asks the model to "call" the function with the extracted data as arguments. This mode tends to produce higher-quality structured output because: (1) Models like GPT-4o have been fine-tuned specifically to produce well-formed tool calls - it is a well-optimized code path in their training; (2) Tool call arguments arrive in a dedicated field of the API response rather than embedded in free text, so there is no surrounding prose or Markdown to strip. They can still be malformed or fail validation unless the provider enforces the schema (for example, OpenAI's strict mode), which is why the validation and retry step still matters; (3) The task framing is cleaner for the model - it is "filling in parameters" rather than "generating a JSON document." JSON mode is used as a fallback when tool calls aren't available, or when the tool definition adds too much to the prompt length.

Q3: How do you handle the case where Instructor exhausts all retries and still fails?

Instructor raises InstructorRetryException when max_retries is exhausted. This exception contains the number of attempts made and the last completion object. In production, handle this explicitly: log the failure with full context (input text, last model response, validation error), emit a metric for failure rate monitoring, and decide on a fallback strategy: (a) return None and handle downstream (appropriate for optional enrichment), (b) return a default/empty model (appropriate when partial data is acceptable), (c) route to a more capable model for retry (e.g., retry with GPT-4o when GPT-4o-mini fails), or (d) queue for human review (appropriate for high-stakes extractions like medical records). Never silently swallow this exception - the failure is information that should be observed and acted upon.
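Strategies (c) and (d) combine naturally into an escalation chain: try the cheap model, escalate to a stronger one, and queue the input for human review if every tier fails. The sketch below is one possible shape for that logic; `ExtractionFailed` is a hypothetical stand-in for InstructorRetryException, and the extractor functions are fakes so the example runs without an API.

```python
from typing import Callable, List, Optional, Sequence, Tuple


class ExtractionFailed(Exception):
    """Hypothetical stand-in for InstructorRetryException in this sketch."""


Extractor = Callable[[str], dict]


def extract_with_fallback(
    text: str,
    tiers: Sequence[Tuple[str, Extractor]],
    review_queue: List[dict],
) -> Optional[dict]:
    """Try each tier in order; if all fail, queue the input for human review."""
    errors = []
    for tier_name, extractor in tiers:
        try:
            return extractor(text)
        except ExtractionFailed as exc:
            # Record the failure instead of swallowing it silently.
            errors.append(f"{tier_name}: {exc}")
    review_queue.append({"text": text, "errors": errors})
    return None


def cheap_model(text: str) -> dict:
    raise ExtractionFailed("category failed validation")


def strong_model(text: str) -> dict:
    return {"category": "food", "source": "strong"}


queue: List[dict] = []
result = extract_with_fallback(
    "Lunch at cafe, $12", [("cheap", cheap_model), ("strong", strong_model)], queue
)
print(result, queue)
```

Keeping the per-tier error messages with the queued item gives the reviewer, and any later metrics job, the full failure history rather than only the last error.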

Q4: Can Instructor work with local models? What are the limitations?

Yes, Instructor works with local models through any server that exposes a compatible API: instructor.from_openai(AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="ollama")) for Ollama, instructor.from_openai(AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="unused")) for the vLLM OpenAI-compatible API, and instructor.from_anthropic(...) for locally-served Anthropic-compatible endpoints. The OpenAI client requires some api_key value even when the local server ignores it. Limitations: (1) Tool call support varies - not all local model servers implement the tool call API correctly; use JSON mode as fallback with mode=instructor.Mode.JSON; (2) Smaller local models (7B) have higher first-try failure rates than GPT-4o, so the retry mechanism matters more and is exhausted more often; (3) Local models may have different JSON output reliability than API models due to differences in training data and RLHF. For high-reliability local model structured output, Outlines (constrained decoding) is often preferable to Instructor.

Q5: Walk through a production design using Instructor for a document classification pipeline.

Design: (1) Schema - define a DocumentClassification(BaseModel) with fields: category: Literal[...] (bounded list of categories), confidence: float = Field(ge=0, le=1), reason: str = Field(max_length=300), requires_review: bool. (2) Client setup - instructor.from_openai(AsyncOpenAI(), mode=Mode.TOOLS) for async processing. (3) Retry config - max_retries=2, temperature=0. (4) Validator error messages - descriptive and actionable: raise ValueError(f"Category must be one of {VALID_CATEGORIES}. '{v}' is not valid. For ambiguous documents use 'other'."). (5) Failure handling - except InstructorRetryException: emit_metric('classification_failure'); return DocumentClassification(category='other', confidence=0.0, reason='extraction_failed', requires_review=True). (6) Observability - log every retry with input truncation and error type; monitor retry_rate and failure_rate dashboards. (7) Batch processing - use asyncio.gather with semaphore for concurrent API calls with rate limiting.
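Step (7) of the design can be sketched as follows: a semaphore caps how many classifications are in flight while asyncio.gather preserves input order. `classify_all` and `fake_classify` are hypothetical names; in a real pipeline the coroutine would wrap the async Instructor client call instead of sleeping.

```python
import asyncio
from typing import Awaitable, Callable, List


async def classify_all(
    docs: List[str],
    classify: Callable[[str], Awaitable[dict]],
    max_concurrency: int = 5,
) -> List[dict]:
    """Run classify over all docs with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(doc: str) -> dict:
        async with semaphore:
            return await classify(doc)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(bounded(doc) for doc in docs))


# Track how many fake calls overlap, to show that the limit is respected.
in_flight = {"current": 0, "peak": 0}


async def fake_classify(doc: str) -> dict:
    """Stand-in for an API call; sleeps briefly instead of hitting the network."""
    in_flight["current"] += 1
    in_flight["peak"] = max(in_flight["peak"], in_flight["current"])
    await asyncio.sleep(0.01)
    in_flight["current"] -= 1
    return {"doc": doc, "category": "other"}


documents = [f"doc-{i}" for i in range(10)]
results = asyncio.run(classify_all(documents, fake_classify, max_concurrency=3))
print(len(results), "results, peak concurrency:", in_flight["peak"])
```

A semaphore limits concurrency, not requests per second, so a provider with a strict per-minute quota may still need a separate rate limiter in front of it.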


© 2026 EngineersOfAI. All rights reserved.