Module 13: Structured Generation

The Problem This Module Solves

You have a 95% accurate extraction pipeline. Your LLM correctly identifies entities, relationships, and values from documents 95% of the time. You are proud of it. Then your engineering manager asks: "What happens to the 5% that fail?"

You explain: the model sometimes outputs malformed JSON. Sometimes it wraps the JSON in markdown code blocks. Sometimes it makes up field names that aren't in your schema. Sometimes it outputs a sentence instead of a structured object. Your downstream system crashes on these failures. You have a retry loop that adds 3 seconds of latency. You have a fallback parser with regexes that cover most cases. You have alerts for when the failure rate spikes above 8%.
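To make these failure modes concrete, here is a minimal sketch of the post-hoc triage the paragraph above describes. The field names in `ALLOWED_FIELDS` and the function name `classify_output` are hypothetical, chosen only for illustration. This is not a recommended parser, just a picture of what you end up maintaining when validity is checked after generation instead of during it.

```python
import json
import re

# Hypothetical schema for illustration: the field names downstream code expects.
ALLOWED_FIELDS = {"entity", "relationship", "value"}

# Build the markdown fence marker programmatically so this example stays readable.
FENCE = "`" * 3
# Matches a payload wrapped in a markdown code fence, with or without a "json" tag.
FENCE_RE = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)


def classify_output(raw: str, allowed_fields: set[str] = ALLOWED_FIELDS) -> str:
    """Label one raw model response with the failure mode it exhibits."""
    text = raw.strip()
    wrapped = FENCE_RE.match(text)
    if wrapped:
        text = wrapped.group(1)  # keep only the content inside the fence
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Covers both truncated/invalid JSON and plain narrative sentences.
        return "malformed_or_prose"
    if not isinstance(parsed, dict):
        return "not_an_object"
    if set(parsed) - allowed_fields:
        return "unknown_fields"  # the model invented field names
    return "markdown_wrapped" if wrapped else "ok"
```

Every branch here is a place where production code needs a retry, a fallback, or an alert. The rest of the module is about removing those branches rather than adding more of them.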

The engineering manager's follow-up: "Why aren't you getting 100% valid JSON?"

The answer to that question - and the path to guaranteed schema-conformant output - is what this module covers.

What You Will Learn

Module Concepts at a Glance

Lesson	Core Concept	Key Tools / Papers
1	Why free-text LLM output breaks production systems	Failure mode taxonomy
2	How constrained decoding guarantees valid output	Outlines paper, FSMs, token masking
3	Grammar-constrained generation with Outlines	dottxt-ai/outlines
4	Pydantic-based extraction with retry logic	Instructor (Jason Liu)
5	JSON mode, structured outputs, tool calling	OpenAI, Anthropic, Gemini APIs
6	Programmatic LLM control	Guidance (Microsoft), LMQL
7	Production reliability patterns	Retry, fallback, monitoring

Prerequisites

You should be comfortable with:

Python type hints and Pydantic v2 models
Making API calls to OpenAI, Anthropic, or similar LLM APIs
JSON and JSON Schema basics
Basic understanding of language model decoding (greedy search, sampling)

Why This Module Matters for Your Career

Nearly every production LLM application needs structured output. Chatbots that expose parsed information to downstream services, document extraction pipelines, classification systems, API response generators - all of them need the model to produce machine-readable output, not just fluent text.

The gap between "it usually works" and "it always works" in production systems is largely determined by how you approach structured generation. Engineers who understand constrained decoding, Instructor's retry patterns, and production monitoring can build systems that stay reliable, where others end up maintaining a growing pile of retries and fallback parsers.

The 30-Second Intuition

When an LLM generates text, it samples from a probability distribution over all possible next tokens at each step. Unguided, it can produce any token sequence - including malformed JSON, wrong field names, or narrative text when you wanted structured data.

Structured generation constrains this distribution at each step: instead of sampling from all tokens, you only allow tokens that are consistent with a valid completion of your target structure (a JSON schema, a regex pattern, a Python type). The model's probability distribution is masked: invalid tokens are assigned zero probability before sampling, and the remaining probabilities are renormalized. The result is guaranteed by construction to conform to the target structure, provided generation runs to completion rather than being cut off by a token limit.

This is the difference between "please give me JSON" (hoping the model cooperates) and "you are only allowed to generate valid JSON" (enforcing it at the generation level).
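The masking idea can be sketched in a few lines of plain Python. Everything below is a toy built on stated assumptions: the eight-token `VOCAB`, the hand-written `FSM` for the target `{"name":"Ada"}` or `{"name":"Bob"}`, and the `fake_logits` function standing in for a model forward pass are all hypothetical. Real libraries compile the state machine from a schema or regex and operate on the tokenizer's full vocabulary, so treat this as an illustration of the principle, not of any library's implementation.

```python
import math
import random

# Toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["Sure", "!", "{", "}", '"name"', ":", '"Ada"', '"Bob"']

# Hand-written finite-state machine for the target structure.
# Maps state -> {allowed token: next state}. State 5 is the accepting state.
FSM = {
    0: {"{": 1},
    1: {'"name"': 2},
    2: {":": 3},
    3: {'"Ada"': 4, '"Bob"': 4},
    4: {"}": 5},
}
ACCEPT = 5


def fake_logits(step: int) -> list[float]:
    """Stand-in for a model forward pass; it prefers chatty tokens at every step."""
    return [5.0, 4.0, 1.0, 1.0, 0.5, 0.5, 2.0, 1.5]


def masked_sample(logits: list[float], allowed, rng: random.Random) -> str:
    """Sample one token after setting every disallowed logit to -inf."""
    masked = [
        logit if token in allowed else -math.inf
        for token, logit in zip(VOCAB, logits)
    ]
    peak = max(masked)  # finite as long as at least one token is allowed
    # exp(-inf) is 0.0, so disallowed tokens get zero sampling weight.
    weights = [math.exp(logit - peak) for logit in masked]
    return rng.choices(VOCAB, weights=weights, k=1)[0]


def generate(seed: int = 0) -> str:
    """Generate until the FSM reaches its accepting state."""
    rng = random.Random(seed)
    state, tokens = 0, []
    while state != ACCEPT:
        allowed = FSM[state]
        token = masked_sample(fake_logits(len(tokens)), allowed, rng)
        tokens.append(token)
        state = allowed[token]
    return "".join(tokens)
```

Although the fake model would rather start with "Sure", that token never has nonzero weight, so every call to `generate` returns one of the two valid objects. The model still chooses between "Ada" and "Bob" according to its own preferences; the mask only removes choices that would break the structure.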

Proceed to Lesson 1 to understand the production failure modes that make structured generation essential.
