
Module 13: Structured Generation

The Problem This Module Solves

You have a 95% accurate extraction pipeline. Your LLM correctly identifies entities, relationships, and values from documents 95% of the time. You are proud of it. Then your engineering manager asks: "What happens to the 5% that fail?"

You explain: the model sometimes outputs malformed JSON. Sometimes it wraps the JSON in markdown code blocks. Sometimes it makes up field names that aren't in your schema. Sometimes it outputs a sentence instead of a structured object. Your downstream system crashes on these failures. You have a retry loop that adds 3 seconds of latency. You have a fallback parser with regexes that cover most cases. You have alerts for when the failure rate spikes above 8%.

The engineering manager's follow-up: "Why aren't you getting 100% valid JSON?"

The answer to that question - and the path to guaranteed schema-conformant output - is what this module covers.
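To make the failure modes above concrete, here is a minimal sketch of the kind of fallback parser the scenario describes. The field names in `EXPECTED_FIELDS` and the function name `parse_llm_output` are hypothetical, chosen only for illustration. It uses the standard library alone and is not a recommended production design.

```python
import json
import re

# Hypothetical schema: the field names the downstream system expects.
EXPECTED_FIELDS = {"entity", "value"}

# Build the markdown fence marker programmatically to keep this example readable.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_llm_output(raw):
    """Return a dict if `raw` can be salvaged, otherwise None."""
    text = raw.strip()

    # Failure mode: the model wrapped its JSON in a markdown code block.
    match = FENCE_RE.search(text)
    if match:
        text = match.group(1)

    # Failure modes: malformed JSON, or a sentence instead of an object.
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None

    # Failure mode: invented or missing field names.
    if not isinstance(data, dict) or set(data) != EXPECTED_FIELDS:
        return None
    return data
```

Every branch that returns `None` is a case the caller must handle with a retry or an alert. That after-the-fact repair work is what the rest of the module aims to make unnecessary.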

What You Will Learn

Module Concepts at a Glance

| Lesson | Core Concept | Key Tools / Papers |
|--------|--------------|--------------------|
| 1 | Why free-text LLM output breaks production systems | Failure mode taxonomy |
| 2 | How constrained decoding guarantees valid output | Outlines paper, FSMs, token masking |
| 3 | Grammar-constrained generation with Outlines | dottxt-ai/outlines |
| 4 | Pydantic-based extraction with retry logic | Instructor (Jason Liu) |
| 5 | JSON mode, structured outputs, tool calling | OpenAI, Anthropic, Gemini APIs |
| 6 | Programmatic LLM control | Guidance (Microsoft), LMQL |
| 7 | Production reliability patterns | Retry, fallback, monitoring |

Prerequisites

You should be comfortable with:

  • Python type hints and Pydantic v2 models
  • Making API calls to OpenAI, Anthropic, or similar LLM APIs
  • JSON and JSON Schema basics
  • Basic understanding of language model decoding (greedy search, sampling)

Why This Module Matters for Your Career

Every serious LLM application needs structured output. Chatbots that expose parsed information to downstream services, document extraction pipelines, classification systems, API response generators - all of them need the model to produce machine-readable output, not just fluent text.

The gap between "it usually works" and "it always works" in production systems comes down largely to how you approach structured generation. Engineers who understand constrained decoding, Instructor's retry patterns, and production monitoring build reliable systems where others struggle to keep theirs running.

The 30-Second Intuition

When an LLM generates text, it samples from a probability distribution over all possible next tokens at each step. Unguided, it can produce any token sequence - including malformed JSON, wrong field names, or narrative text when you wanted structured data.

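Structured generation constrains this distribution at each step: instead of sampling from all tokens, you allow only tokens that are consistent with a valid completion of your target structure (a JSON Schema, a regex pattern, a Python type). The model's probability distribution is masked: invalid tokens are assigned zero probability and the remaining probabilities are renormalized. As long as generation runs to completion rather than being cut off by a token limit, the output is guaranteed to conform to the target structure. The guarantee covers the shape of the output, not the correctness of the values in it.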

This is the difference between "please give me JSON" (hoping the model cooperates) and "you are only allowed to generate valid JSON" (enforcing it at the generation level).
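The masking idea can be sketched with a toy example. Everything here is an assumption made for illustration: the nine-token vocabulary, the hand-written state machine, and the `fake_logits` function that stands in for a real model's forward pass. Libraries such as Outlines derive the state machine from a schema or regex automatically, so this shows only the principle, not their implementation.

```python
import json
import math
import random

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = ['{', '}', '"ok"', ':', 'true', 'false', 'Sure', 'Here', '<eos>']

# Hand-written finite-state machine for the target {"ok":true} or {"ok":false}.
# Maps state -> {allowed token -> next state}.
FSM = {
    0: {'{': 1},
    1: {'"ok"': 2},
    2: {':': 3},
    3: {'true': 4, 'false': 4},
    4: {'}': 5},
    5: {'<eos>': 6},
}
FINAL_STATE = 6


def fake_logits(rng):
    """Stand-in for a model forward pass: one random score per vocabulary token."""
    return [rng.uniform(-5.0, 5.0) for _ in VOCAB]


def constrained_generate(seed):
    """Greedy decoding in which the FSM masks every structurally invalid token."""
    rng = random.Random(seed)
    state, out = 0, []
    while state != FINAL_STATE:
        logits = fake_logits(rng)
        allowed = FSM[state]
        # Mask: disallowed tokens get -inf, which is zero probability after softmax.
        masked = [
            score if token in allowed else -math.inf
            for token, score in zip(VOCAB, logits)
        ]
        # Greedy pick among the tokens that survive the mask.
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        token = VOCAB[best]
        state = allowed[token]
        if token != '<eos>':
            out.append(token)
    return ''.join(out)


if __name__ == "__main__":
    for seed in range(5):
        text = constrained_generate(seed)
        print(text, json.loads(text))  # json.loads never raises here
```

Although the scores are random and tokens like `Sure` often have the highest raw score, every output parses as JSON, because the mask removes invalid tokens before the choice is made. Which value appears (`true` or `false`) is still up to the model, which is why structural validity and correctness are separate concerns.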


Proceed to Lesson 1 to understand the production failure modes that make structured generation essential.

© 2026 EngineersOfAI. All rights reserved.