How Microsoft Guidance and LMQL extend structured generation to full programmatic control - interleaving generation with code, SQL-like constraints, token healing, and when each tool wins over Outlines and Instructor.

LMQL and Guidance - Programmatic LLM Control

Opening Scenario: When a Schema Isn't Enough

You are building a code generation system. The model needs to:

1. Decide which programming language to use (Python or JavaScript)
2. If Python: generate a function using specific allowed libraries
3. If JavaScript: generate a function using a different set of allowed patterns
4. Add a docstring that follows the language's convention
5. Add test cases that reference the function name generated in step 2 or 3

This is not a simple schema problem. The valid output at steps 2 and 3 depends on the decision at step 1. The valid content at step 5 depends on the function generated earlier. The constraints are interdependent and contextual - exactly the kind of problem that a single static Pydantic schema cannot express.

This is the domain of programmatic LLM control: systems that interleave code execution with generation, where each generation step can be constrained by the outputs of previous steps. Microsoft Guidance and LMQL (Language Model Query Language) are the two main tools in this space.
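
The idea can be sketched without any LLM library. The snippet below is a minimal illustration, not how Guidance or LMQL are implemented: `toy_scorer`, `constrained_select`, `run_program`, and the `ALLOWED_LIBRARIES` table are hypothetical names, and the "model" is a trivial string-matching score. What it shows is the control flow: the options offered at the second step are chosen by ordinary Python code based on the output of the first step.

```python
from typing import Callable, Dict, List

# Hypothetical allow-lists: the second step's options depend on the first step's output.
ALLOWED_LIBRARIES: Dict[str, List[str]] = {
    "Python": ["math", "itertools"],
    "JavaScript": ["lodash", "ramda"],
}

Scorer = Callable[[str, str], float]


def toy_scorer(context: str, option: str) -> float:
    """Stand-in for a model score: how often the option's name appears in the context."""
    return float(context.lower().count(option.lower()))


def constrained_select(context: str, options: List[str], scorer: Scorer) -> str:
    """Return the best-scoring option; the result is always a member of `options`."""
    return max(options, key=lambda option: scorer(context, option))


def run_program(task: str, scorer: Scorer = toy_scorer) -> Dict[str, str]:
    """Interleave ordinary Python control flow with constrained 'generation' steps."""
    context = f"Task: {task}\nLanguage: "
    language = constrained_select(context, list(ALLOWED_LIBRARIES), scorer)

    # Step 2 is constrained by the value produced in step 1.
    context += f"{language}\nLibrary: "
    library = constrained_select(context, ALLOWED_LIBRARIES[language], scorer)
    return {"language": language, "library": library}


if __name__ == "__main__":
    print(run_program("Write a JavaScript helper that uses lodash to sum a list"))
```

A real system would replace `toy_scorer` with model log-probabilities, but the shape stays the same: code decides which constraint applies next.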

Microsoft Guidance: Interleaving Code and Generation

Guidance is a Microsoft Research library for writing generation programs - documents where some parts are static text, some parts are generated by the model, and some parts are computed by Python code. Early releases used a Handlebars-inspired template syntax; current releases express the same programs directly in Python, as in the examples here.

# pip install guidance
import guidance
from guidance import models, gen


# Load model (Guidance works with local models and OpenAI)
llm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")


# Example 1: Simple constrained generation
with guidance.system():
    lm = llm + "You are a programming assistant."

with guidance.user():
    lm += "Generate a short function to add two numbers."

with guidance.assistant():
    # Generate function name as a regex-constrained string
    lm += "def " + gen(name="func_name", regex=r"[a-z_]+") + "("
    # Generate parameters (regex constrained)
    lm += gen(name="params", regex=r"[a-z, ]+") + "):"
    # Generate body (free generation with stop sequence)
    lm += "\n    " + gen(name="body", stop="\n\n")

# Access generated parts
print(lm["func_name"])  # e.g., "add_numbers"
print(lm["params"])     # e.g., "a, b"
print(lm["body"])       # e.g., "return a + b"

The Guidance Template Syntax

Guidance templates mix static text with generation calls:

import guidance
from guidance import models, gen, select


llm = models.Transformers("mistralai/Mistral-7B-Instruct-v0.2")


# gen() - generate free text with optional constraints
result = llm + "Name: " + gen(name="name", max_tokens=20, stop="\n")
name = result["name"]  # The generated name

# select() - select from a list of options
result = llm + "Sentiment: " + select(["positive", "negative", "neutral"], name="sentiment")
sentiment = result["sentiment"]  # Always one of the three options

# gen(regex=...) - generate text matching a pattern
result = llm + "Date: " + gen(name="date", regex=r"\d{4}-\d{2}-\d{2}")
date = result["date"]  # Always YYYY-MM-DD format

# Combining: conditional generation based on previous output
@guidance
def classify_and_explain(lm, text):
    lm += f"Text: {text}\n"
    lm += "Category: " + select(["urgent", "normal", "low"], name="priority")

    if lm["priority"] == "urgent":
        lm += "\nEscalation path: " + select(
            ["call_manager", "send_email", "create_ticket"],
            name="escalation",
        )
    else:
        lm += "\nResponse time: " + gen(name="response_time", regex=r"\d+ (hours?|days?)")

    return lm

# A @guidance function is applied by adding it to the model; lm is passed in automatically
result = llm + classify_and_explain("Production database is down!")
print(result["priority"])    # "urgent"
print(result["escalation"])  # "call_manager"

The Role of Token Healing in Guidance

Token healing addresses a subtle tokenization artifact. Consider this template:

lm + "The answer is: " + gen(name="answer", regex=r"\d+")

The string "The answer is: " ends with a space. Most BPE tokenizers attach a leading space to the token that follows it, so in training data the model usually sees the digit as a single token such as " 5" rather than a standalone space followed by "5". A prompt that ends with a bare space therefore forces generation to start from a token sequence the model has rarely seen, which can skew the next-token distribution.

Token healing in Guidance works by "rewinding" the last token (or last few tokens) of the prompt and regenerating it jointly with the constrained generation, requiring the new tokens to begin with the text that was removed. This ensures that generation starts on a natural token boundary, so the constraint is applied to the tokens the model would actually produce.

import guidance
from guidance import models, gen


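# Without token healing: might fail to apply regex correctly
# because the generation boundary falls inside a token
# NOTE: whether the constructor accepts a token_healing keyword depends on the
# Guidance version; check your installed release before relying on it
llm = models.Transformers("gpt2", token_healing=False)  # Explicit disable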

# With token healing (default): generation always starts at clean boundary
llm_healed = models.Transformers("gpt2", token_healing=True)  # Default

# In practice: always use token_healing=True (the default)
# The difference matters for complex regex patterns near prompt boundaries

Guidance for Structured Data Extraction

import guidance
from guidance import models, gen, select
import json


@guidance
def extract_structured_data(lm, document: str):
    """
    Extract structured data using Guidance's interleaved generation.
    The advantage: each field's generation can be constrained by previous fields.
    """
    lm += f"Document: {document}\n\n"
    lm += "Extraction:\n"

    # Extract entity type first
    lm += "Entity type: " + select(
        ["person", "organization", "location", "event"],
        name="entity_type",
    )

    # Now condition on entity type
    if lm["entity_type"] == "person":
        lm += "\nFirst name: " + gen(name="first_name", regex=r"[A-Z][a-z]+")
        lm += "\nLast name: " + gen(name="last_name", regex=r"[A-Z][a-z]+")
        lm += "\nAge: " + gen(name="age", regex=r"[0-9]{1,3}")
        full_name = lm["first_name"] + " " + lm["last_name"]

    elif lm["entity_type"] == "organization":
        lm += "\nOrganization name: " + gen(name="org_name", max_tokens=30, stop="\n")
        lm += "\nIndustry: " + select(
            ["technology", "finance", "healthcare", "education", "retail", "other"],
            name="industry",
        )

    elif lm["entity_type"] == "location":
        lm += "\nCity: " + gen(name="city", max_tokens=20, stop=",")
        lm += ", Country: " + gen(name="country", max_tokens=20, stop="\n")

    return lm


llm = models.Transformers("mistralai/Mistral-7B-Instruct-v0.2")
# Apply the @guidance function by adding it to the model
result = llm + extract_structured_data("Apple Inc. was founded in Cupertino, California.")

print(result["entity_type"])  # "organization"
print(result["org_name"])     # "Apple Inc."
print(result["industry"])     # "technology"

Guidance with OpenAI APIs

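Guidance can also target API providers, but with reduced control. Token-level masking and token healing need access to the model's logits, which hosted APIs generally do not expose, so support for constraints such as select() and regex depends on the provider and the Guidance version: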

import guidance
from guidance import models, gen, select

# Use OpenAI with Guidance
llm = models.OpenAI("gpt-4o-mini")

# Same syntax; whether select() is enforced depends on the provider and Guidance version
result = llm + "Classify: " + select(
    ["spam", "not_spam"],
    name="label",
)
print(result["label"])

LMQL: SQL-Like Constraints for LLMs

LMQL (Language Model Query Language) takes a different approach: a programming language with SQL-inspired syntax for expressing constrained generation as queries.

# pip install lmql
import lmql


# The core LMQL pattern: a decorated Python function whose docstring is the query,
# with a where clause for constraints and an optional distribution clause for choices

@lmql.query
async def classify_sentiment(text: str):
    '''lmql
    argmax
        "Sentiment of '{text}' is [SENTIMENT]"
    where
        SENTIMENT in ["positive", "negative", "neutral"]
    '''


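# Run the query (a bare top-level await only works in notebooks or an async REPL)
import asyncio

result = asyncio.run(classify_sentiment("I love this product!"))
# Some LMQL versions return a list of results; index [0] in that case
print(result.variables["SENTIMENT"])  # e.g., "positive"
# argmax returns only the chosen label; per-option probabilities need a distribution clause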

LMQL's Key Differentiators

1. argmax vs sample

LMQL supports both greedy decoding (argmax) and sampling (sample) as first-class operations:

@lmql.query
async def sample_with_constraint(prompt: str):
    '''lmql
    sample(temperature=0.7, n=3)   # Generate 3 different samples
        "{prompt} [RESPONSE]"
    from
        "openai/gpt-3.5-turbo"
    where
        len(TOKENS(RESPONSE)) < 50  # Fewer than 50 tokens
    '''

2. Distribution Output

LMQL can output the probability distribution over constrained choices:

@lmql.query
async def get_sentiment_distribution(text: str):
    '''lmql
    argmax
        "Text: {text}\n"
        "The sentiment is [LABEL]"
    distribution
        LABEL in ["positive", "negative", "neutral"]
    '''

# Returns a probability for each option, for example:
# {"positive": 0.72, "negative": 0.18, "neutral": 0.10}
# Unlike beam search, this scores every listed option for the final variable
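
Conceptually, a distribution over constrained choices comes from scoring each option and renormalizing over the allowed set. The sketch below assumes you already have a log-probability per option; the function name `option_distribution` and the example numbers are hypothetical and do not come from LMQL.

```python
import math
from typing import Dict


def option_distribution(option_logprobs: Dict[str, float]) -> Dict[str, float]:
    """Renormalize per-option log-probabilities so they sum to 1 over the allowed set."""
    top = max(option_logprobs.values())  # subtract the max for numerical stability
    weights = {name: math.exp(lp - top) for name, lp in option_logprobs.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    # Made-up log-probabilities for illustration only.
    scores = {"positive": -0.4, "negative": -1.8, "neutral": -2.4}
    for label, prob in option_distribution(scores).items():
        print(f"{label}: {prob:.3f}")
```

Because the normalization runs only over the listed options, the probabilities express relative preference among those options, not the model's probability of producing them unconstrained.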

3. Beam Search with Constraints

LMQL supports beam search under constraints, finding the highest-probability valid completion:

@lmql.query
async def beam_constrained_generation(text: str):
    '''lmql
    beam(n=5)  # 5-beam search
        "Summary of '{text}': [SUMMARY]"
    where
        len(TOKENS(SUMMARY)) in range(20, 50) and  # Length constraint
        STOPS_AT(SUMMARY, ".")                       # Stop at period
    '''
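
To see why beam search can beat greedy decoding under a constraint, consider this toy sketch. It is not LMQL's decoder: the `MODEL` table, `constrained_beam_search`, and the `no_hedging` constraint are hypothetical, and the probabilities are invented so that the greedy first token leads to a worse overall sequence.

```python
import math
from typing import Callable, Dict, List, Tuple

Beam = Tuple[List[str], float]  # (tokens so far, cumulative log-probability)

# Hypothetical toy model: log-probability of the next token given the previous token.
MODEL: Dict[str, Dict[str, float]] = {
    "<s>": {"good": math.log(0.6), "great": math.log(0.4)},
    "good": {"enough": math.log(0.5), ".": math.log(0.5)},
    "great": {".": math.log(0.9), "enough": math.log(0.1)},
    "enough": {".": 0.0},
}


def constrained_beam_search(
    width: int,
    is_allowed: Callable[[List[str], str], bool],
    max_steps: int = 4,
) -> List[Beam]:
    """Return finished sequences (ending in '.') sorted by log-probability."""
    beams: List[Beam] = [(["<s>"], 0.0)]
    finished: List[Beam] = []
    for _ in range(max_steps):
        expanded: List[Beam] = []
        for tokens, score in beams:
            for token, logprob in MODEL.get(tokens[-1], {}).items():
                if not is_allowed(tokens, token):
                    continue  # constraint: disallowed tokens are never expanded
                candidate = (tokens + [token], score + logprob)
                (finished if token == "." else expanded).append(candidate)
        beams = sorted(expanded, key=lambda beam: beam[1], reverse=True)[:width]
        if not beams:
            break
    return sorted(finished, key=lambda beam: beam[1], reverse=True)


def no_hedging(tokens: List[str], token: str) -> bool:
    """Example constraint: forbid the token 'enough'."""
    return token != "enough"


if __name__ == "__main__":
    print(constrained_beam_search(1, no_hedging)[0])  # width 1 behaves greedily
    print(constrained_beam_search(2, no_hedging)[0])  # width 2 keeps the 'great' path alive
```

With width 1 the search commits to "good" (probability 0.6) and ends with a sequence of probability 0.3; with width 2 it also keeps "great" and finds the sequence with probability 0.36.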

4. Multi-Variable Queries

LMQL naturally handles multi-step generation with dependencies:

@lmql.query
async def structured_extraction(document: str):
    '''lmql
    argmax
        "Document: {document}\n"
        "Type: [DOC_TYPE]\n"
        "Key finding: [FINDING]\n"
        "Action required: [ACTION]\n"
    where
        DOC_TYPE in ["invoice", "contract", "report", "email"] and
        len(TOKENS(FINDING)) < 30 and
        (ACTION in ["none", "review", "urgent_action"] if DOC_TYPE == "contract"
         else ACTION in ["none", "process"])
    '''
    # Note: ACTION constraint depends on DOC_TYPE - this is what makes LMQL powerful

LMQL for Token Probability Analysis

One unique LMQL capability is exposing the probability distribution at constrained positions:

import lmql
import asyncio


@lmql.query
async def get_next_token_probs(context: str):
    '''lmql
    argmax
        "{context}[NEXT_WORD]"
    distribution
        NEXT_WORD in ["the", "a", "an", "this", "that", "some", "many"]
    '''


async def analyze_context_preferences():
    """Analyze which determiners a model prefers in different contexts."""
    contexts = [
        "The engineer reviewed ",
        "A student submitted ",
        "The company announced ",
    ]

    for context in contexts:
        result = await get_next_token_probs(context)
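        # Assumption: probabilities are exposed as (value, probability) pairs under
        # result.variables["P(NEXT_WORD)"]; verify against your installed LMQL version
        sorted_probs = sorted(
            result.variables["P(NEXT_WORD)"],
            key=lambda x: x[1],
            reverse=True,
        )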
        print(f"\nContext: '{context}'")
        for word, prob in sorted_probs[:3]:
            print(f"  '{word}': {prob:.3f}")

asyncio.run(analyze_context_preferences())

Guidance vs LMQL vs Outlines: Choosing the Right Tool

A Practical Comparison

"""
The same task implemented in Outlines, Guidance, and LMQL.

Task: Extract a sentiment label and a brief reason from text.
Schema: {"label": "positive|negative|neutral", "reason": string (max 50 chars)}
"""

# ===== OUTLINES =====
import outlines
from typing import Literal

from pydantic import BaseModel, Field

class SentimentResult(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    reason: str = Field(max_length=50)

outlines_model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
outlines_gen = outlines.generate.json(outlines_model, SentimentResult)

def outlines_extract(text: str) -> SentimentResult:
    return outlines_gen(f"Analyze sentiment of: {text}")

# Pros: Simple, guaranteed structure, Pydantic integration, caches FSM
# Cons: Doesn't handle conditional logic between fields natively


# ===== GUIDANCE =====
import guidance
from guidance import models, gen, select

guidance_llm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")

@guidance
def guidance_extract(lm, text):
    lm += f"Text: {text}\n"
    lm += "Label: " + select(["positive", "negative", "neutral"], name="label")
    lm += "\nReason: " + gen(name="reason", max_tokens=50, stop="\n")
    return lm

def guidance_extract_wrapper(text: str) -> dict:
    # Apply the @guidance function by adding it to the model
    result = guidance_llm + guidance_extract(text)
    return {"label": result["label"], "reason": result["reason"]}

# Pros: Fine-grained control, template syntax, easy conditional logic
# Cons: Verbose, less standard API, reason length not strictly enforced


# ===== LMQL =====
import lmql

@lmql.query
async def lmql_extract(text: str):
    '''lmql
    argmax
        "Text: {text}\nLabel: [LABEL]\nReason: [REASON]"
    where
        LABEL in ["positive", "negative", "neutral"] and
        len(TOKENS(REASON)) <= 15  # ~50 chars
    '''

# Pros: SQL-like constraints, beam search support, probability distribution
# Cons: Async-first API, unusual syntax, smaller community

When to Use LMQL and Guidance in Production

The honest answer: most production structured generation needs are well-served by Outlines, Instructor, and the tool calling / structured outputs APIs from providers. LMQL and Guidance fill specific niches:

Use Guidance when:

Your generation logic has complex conditional branches based on earlier generated content
You are building a multi-step reasoning system where each step's constraints depend on previous steps
You need a template-based approach for prompt management (Guidance's template syntax is more readable than f-strings for complex prompts)

Use LMQL when:

You need the probability distribution over constrained choices (for uncertainty quantification, not just the argmax)
You need beam search under complex constraints
You are doing research on constrained decoding and need fine-grained control over the decoding process
You need the WHERE clause's expressive constraint language for complex multi-variable constraints

Do not use either when:

You only need simple schema extraction (Outlines is simpler and faster)
You are calling API-based providers (Instructor handles this better)
You need production reliability tooling (neither Guidance nor LMQL has mature retry/observability infrastructure)

Common Mistakes

:::warning Guidance and LMQL Are Research-Oriented Tools

Neither Guidance nor LMQL has the production maturity of Outlines or Instructor. Version stability, documentation quality, community support, and production deployment patterns are all less developed. For anything beyond prototyping or research, carefully evaluate maintenance status and issue tracker activity before adopting either tool in a production system.

:::

:::danger Token Healing Doesn't Solve All Tokenization Boundary Issues

Guidance's token healing addresses the most common tokenization artifact at the boundary between static template and generated content. But it does not handle all edge cases: tokenization artifacts in the middle of complex regex patterns, tokenizer-specific quirks with special characters, or cases where the healed region is too short to cover a complex token boundary. If you observe unexpected regex constraint failures, the cause may be a tokenization boundary issue that token healing doesn't cover. Inspect the actual token sequence at the constraint boundary (for example, by tokenizing the prompt yourself) to confirm.

:::

:::warning LMQL's Async-First API Has Integration Overhead

LMQL queries are typically written as async functions. If your application is synchronous (common in data processing scripts and CLI tools), you can wrap a query in asyncio.run() or use LMQL's synchronous entry points such as lmql.run_sync, but either route adds some complexity. Consider whether LMQL's features are worth this integration cost for your use case, or whether Outlines' synchronous API is a better fit.

:::
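
Bridging an async query into synchronous code can be sketched with the standard library alone. In this illustration `fake_query` is a hypothetical stand-in for an LMQL query function and `make_sync` is a helper name invented here; note that `asyncio.run()` raises an error if an event loop is already running, which is the usual situation inside notebooks.

```python
import asyncio
from typing import Any, Callable, Coroutine


def make_sync(async_fn: Callable[..., Coroutine[Any, Any, Any]]) -> Callable[..., Any]:
    """Wrap an async function so synchronous callers can use it directly."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Starts a fresh event loop per call; fails if a loop is already running.
        return asyncio.run(async_fn(*args, **kwargs))
    return wrapper


async def fake_query(text: str) -> str:
    """Hypothetical stand-in for an async LMQL query."""
    await asyncio.sleep(0)  # simulate awaiting a model call
    return "positive" if "love" in text.lower() else "neutral"


classify = make_sync(fake_query)

if __name__ == "__main__":
    print(classify("I love this product!"))
```

Creating an event loop per call is fine for scripts and CLI tools; a long-running service would usually stay async end to end instead.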

Interview Q&A

Q1: What is Microsoft Guidance and how does it differ from Outlines?

Guidance is a programmatic LLM control framework that interleaves Python code with generation (early versions used a Handlebars-like template syntax; current versions use plain Python). It allows conditional logic - the constraints applied to a generation step can depend on the output of a previous step. For example, if the model generates "Python" as the language, subsequent constraints apply Python-specific patterns; if it generates "JavaScript," different constraints apply. Outlines, in contrast, is schema-focused: you define a Pydantic model or JSON schema, and the entire output is constrained to that fixed structure. A single Outlines schema does not naturally express "if field A has value X, field B must match pattern Y." The difference: Outlines for static schemas, Guidance for dynamic conditional generation.

Q2: What is LMQL and what unique capabilities does it provide?

LMQL (Language Model Query Language) is a query language for LLMs with SQL-inspired syntax for expressing constraints as WHERE clauses. Unique capabilities: (1) Distribution clause - returns the probability distribution over constrained choices, not just the argmax, which enables uncertainty quantification ("how confident is the model in this classification?"); (2) Beam search under constraints - beam(n=k) keeps k candidate sequences during decoding and returns the most probable valid completions, useful when you want to explore multiple valid outputs; (3) Expressive constraint composition - WHERE clauses can combine multiple constraints with boolean operators, including constraints that depend on the values of other generated variables; (4) Native n parameter for sampling multiple completions in one query. This combination is not available in Outlines or Instructor.

Q3: What is token healing in Guidance and why is it necessary?

Token healing addresses the mismatch between static prompt text and the start of constrained generation. Tokenizers encode strings into tokens that may not align with the boundary where the template hands over to the model. When a Guidance template has "The answer is: " + gen(regex=r"\d+"), the trailing space becomes its own token, while in natural text that space would usually be merged into the first generated token (for example " 5"). Token healing "rewinds" the last token or tokens of the prompt and regenerates them jointly with the constrained sequence, requiring the new tokens to start with the removed text, so generation starts on a natural token boundary. Without token healing, the model is conditioned on an unusual token sequence and the constrained output can be skewed or fail to match what the model would otherwise produce.

Q4: In what scenarios does programmatic LLM control (Guidance/LMQL) provide clear value over simpler tools?

Three clear scenarios: (1) Multi-step generation with inter-step dependencies - building a structured report where section 3's constraints depend on what was generated in section 1; (2) Adaptive schema selection - when you need to select which Pydantic schema to use for extraction based on a preliminary classification step (Guidance handles this natively; with Outlines you'd need two separate inference calls); (3) Distribution analysis - when you need to know not just the most likely classification but the probability of each option (LMQL's distribution mode). For most production extraction pipelines, these use cases are rare; standard tool calling or Outlines suffices. The added complexity of Guidance/LMQL is only justified when the specific capabilities are genuinely needed.

Q5: Compare the maturity and production-readiness of Outlines, Instructor, Guidance, and LMQL.

Maturity ranking: (1) Instructor - most production-ready, extensive documentation, active community, widely used in production, multi-provider support, well-maintained by Jason Liu. (2) Outlines - production-ready, well-documented, integrated with vLLM and other serving frameworks, active development by dottxt-ai, used in production by several companies. (3) Guidance - research-oriented with growing production adoption; Microsoft Research provenance gives it credibility; the programming model is opinionated; production deployment patterns are less documented than Outlines/Instructor; may have breaking API changes between versions. (4) LMQL - primarily a research tool; academic project with limited commercial adoption; documentation is thorough for academic users but production deployment examples are scarce; the async-first API adds friction in synchronous codebases; fewer integrations with serving infrastructure. In production, prefer Instructor or Outlines except for use cases that specifically require Guidance or LMQL's distinctive capabilities.

Advanced Guidance Pattern: Multi-Step Chain of Thought with Constraints

One of Guidance's most powerful patterns is constraining a chain-of-thought reasoning process:

import guidance
from guidance import models, gen, select


llm = models.Transformers("mistralai/Mistral-7B-Instruct-v0.2")


@guidance
def constrained_chain_of_thought(lm, problem: str):
    """
    Generate a chain of thought where:
    - The reasoning can be free-form
    - But the final answer is constrained to specific options
    - The confidence must be a regex-constrained decimal
    """
    lm += f"Problem: {problem}\n\n"

    # Step 1: Free-form reasoning (unconstrained)
    lm += "Let me think through this step by step:\n"
    lm += gen(name="reasoning", max_tokens=200, stop="\n\nFinal")

    lm += "\n\nFinal answer: "
    # Step 2: Constrained final answer
    lm += select(
        ["yes", "no", "uncertain"],
        name="final_answer",
    )

    lm += "\nConfidence: "
    # Step 3: Regex-constrained confidence percentage
    lm += gen(name="confidence", regex=r"[0-9]{1,3}%")

    lm += "\nCategory: "
    # Step 4: Category depends on final answer
    if lm["final_answer"] == "yes":
        lm += select(["high-confidence-yes", "low-confidence-yes"], name="category")
    elif lm["final_answer"] == "no":
        lm += select(["clear-no", "borderline-no"], name="category")
    else:
        lm += select(["need-more-info", "genuinely-ambiguous"], name="category")

    return lm


# Apply the @guidance function by adding it to the model
result = llm + constrained_chain_of_thought(
    "Is this email likely to be spam? Subject: 'You have won $1,000,000!'"
)

print(f"Reasoning: {result['reasoning'][:200]}")
print(f"Final answer: {result['final_answer']}")
print(f"Confidence: {result['confidence']}")
print(f"Category: {result['category']}")

This pattern - free reasoning, then constrained conclusion - captures the best of both approaches: the model can reason naturally (improving quality of the conclusion) while the final answer is guaranteed to be one of the valid options.
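
One caveat worth illustrating: a regex constraint guarantees the shape of a value, not its range. The confidence pattern used above, `[0-9]{1,3}%`, also matches "999%", so a semantic check after generation is still useful. The helper below is a hypothetical post-processing sketch (the name `parse_confidence` is invented here) and is independent of Guidance.

```python
import re

# Same pattern as the confidence constraint in the example above.
CONFIDENCE_PATTERN = re.compile(r"[0-9]{1,3}%")


def parse_confidence(raw: str) -> float:
    """Convert a regex-constrained string like '87%' into a fraction in [0, 1]."""
    if not CONFIDENCE_PATTERN.fullmatch(raw):
        raise ValueError(f"not a percentage: {raw!r}")
    value = int(raw[:-1])
    if value > 100:
        # Valid for the regex, but not a meaningful confidence.
        raise ValueError(f"confidence out of range: {raw!r}")
    return value / 100


if __name__ == "__main__":
    print(parse_confidence("87%"))
```

An alternative is to tighten the regex itself (for example to allow only 0 to 100), at the cost of a less readable pattern.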

LMQL Advanced: Beam Search for Structured Generation

LMQL's beam search capability enables finding the highest-probability valid completion among multiple candidates:

import lmql
import asyncio


@lmql.query
async def best_category_extraction(text: str, categories: list[str]):
    '''lmql
    beam(n=3)
        "Classify the following text into the most appropriate category.\n"
        "Text: {text}\n"
        "Category: [CATEGORY]"
    where
        CATEGORY in categories
    '''


async def extract_with_beam(text: str):
    """
    Use beam search to find the most probable category assignment.
    Unlike argmax, beam search explores multiple paths and picks the
    globally best one, not just the locally greedy one.
    """
    categories = [
        "technology", "business", "science",
        "sports", "entertainment", "politics", "health"
    ]

    # beam(n=3) can return several candidates; normalize to a list
    results = await best_category_extraction(text, categories)
    candidates = results if isinstance(results, list) else [results]

    # Print each surviving beam's category (order and scoring depend on the LMQL version)
    print("Beam candidates:")
    for candidate in candidates:
        print(f"  {candidate.variables.get('CATEGORY')}")

    return candidates[0].variables.get("CATEGORY")


# Usage
category = asyncio.run(extract_with_beam(
    "Apple's new chip outperforms competitors in benchmark tests"
))
print(f"\nBest category: {category}")  # Expected: "technology"

Beam search is particularly valuable when:

Multiple valid categories are plausible (text about "a tech CEO's political donation" - is it technology, business, or politics?)
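You want several high-probability candidates to compare, not only the single greedy result (for calibrated per-option probabilities, use the distribution clause instead)
The greedy choice at an early token would lock in a completion whose overall probability is lower than that of an alternative path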

The Token Mask Visualization

Understanding which tokens are masked at each step builds intuition for constrained generation:

import torch
import json


def visualize_json_token_mask(
    tokenizer,
    partial_json: str,
    vocab_size: int = 200,  # Show first N tokens for illustration
) -> dict:
    """
    Show which tokens are valid at a given point in JSON generation.
    Illustrates what the FSM mask looks like in practice.
    """
    # Simplified valid character sets for different JSON positions
    def get_valid_chars_for_state(json_str: str) -> set:
        """Determine valid next characters based on partial JSON."""
        if not json_str or json_str == "":
            return {"{", "["}

        stripped = json_str.strip()

        if stripped.endswith("{"):
            return {'"', "}"}  # Start of key or empty object

        if stripped.endswith(":"):
            return {'"', "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
                    "-", "[", "{", "t", "f", "n", " "}

        if stripped.endswith(","):
            return {'"', " "}  # Next key

        if stripped.endswith("}"):
            return {",", "}", " ", "\n"}

        if json_str.count('"') % 2 == 1:
            # Inside a string
            return set("abcdefghijklmnopqrstuvwxyz "
                       "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._-@!,")

        return {'"', "}"}  # Default: next key or close

    valid_chars = get_valid_chars_for_state(partial_json)

    # Count valid tokens (simplified)
    total_tokens_shown = vocab_size
    valid_count = 0
    sample_valid = []
    sample_invalid = []

    for token_id in range(total_tokens_shown):
        token_str = tokenizer.decode([token_id])
        # A token is valid if its first character is a valid next char
        if token_str and token_str[0] in valid_chars:
            valid_count += 1
            if len(sample_valid) < 5:
                sample_valid.append(f"'{token_str.strip()}'")
        else:
            if len(sample_invalid) < 5 and token_str.strip():
                sample_invalid.append(f"'{token_str.strip()}'")

    return {
        "partial_json": partial_json,
        "valid_chars": sorted(valid_chars),
        "valid_tokens_in_first_200": valid_count,
        "invalid_tokens_in_first_200": total_tokens_shown - valid_count,
        "sample_valid": sample_valid,
        "sample_invalid": sample_invalid,
        "mask_density": valid_count / total_tokens_shown,
    }


# This shows how the mask becomes very sparse (few valid tokens)
# at highly constrained points like field name generation
example_positions = [
    "",                           # Start: only { or [
    '{"',                         # After open brace and quote: letters only
    '{"name": "',                 # Inside string value: many chars valid
    '{"name": "Alice", "age": ',  # After colon for number: digits and minus
]

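# Assumes the Hugging Face `transformers` package; any tokenizer with a decode() method works
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

for pos in example_positions:
    info = visualize_json_token_mask(tokenizer, pos)
    print(f"\nAt: {pos!r}")
    print(f"Valid chars: {info['valid_chars']}")
    print(f"Mask density: {info['mask_density']:.1%}")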

The visualization reveals an important property of constrained generation: the mask density varies dramatically depending on where you are in the JSON structure. Inside a string value, most printable characters are valid (high density). At the start of a field name, only characters matching known field names are valid (very sparse). This sparsity is what provides the guarantee - at each step, the choice space is constrained to valid completions.
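
How a mask turns into a guarantee can be shown in a few lines. The sketch below is a generic illustration of logit masking, not the code of any particular library; `masked_softmax` and the example logits are hypothetical. Invalid tokens get a logit of negative infinity, so they receive exactly zero probability after the softmax, regardless of how strongly the model preferred them.

```python
import math
from typing import Dict, Set


def masked_softmax(logits: Dict[str, float], valid: Set[str]) -> Dict[str, float]:
    """Softmax over tokens where every token outside `valid` gets probability 0."""
    masked = {tok: (lg if tok in valid else -math.inf) for tok, lg in logits.items()}
    top = max(masked.values())
    if top == -math.inf:
        raise ValueError("mask leaves no valid token")
    exps = {tok: math.exp(lg - top) for tok, lg in masked.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


if __name__ == "__main__":
    # Made-up logits: the model prefers "a", but only "{" and "[" may start a JSON document.
    logits = {"{": 1.0, "[": 0.5, "a": 3.0, "1": 2.0}
    print(masked_softmax(logits, valid={"{", "["}))
```

Sampling or taking the argmax from the masked distribution can then only ever select a valid token.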

Practical Tool Comparison: Decision Matrix

When evaluating tools for a new structured generation use case, use this decision matrix:

| Requirement | Outlines | Instructor | Guidance | LMQL | OpenAI Structured Outputs |
| --- | --- | --- | --- | --- | --- |
| Local model support | Yes | Yes (via Ollama) | Yes | Yes | No |
| API model support | No | Yes | Yes | Yes | Yes (OpenAI only) |
| 100% schema guarantee | Yes | No (99.5%+) | Yes (regex) | Yes | Yes |
| Multi-provider | No | Yes | Yes (limited) | Yes | OpenAI only |
| Conditional constraints | Limited | No | Yes | Yes | No |
| Probability distribution | No | No | No | Yes | No |
| Beam search | No | No | No | Yes | No |
| Streaming support | Yes (vLLM) | Yes | Limited | No | Yes |
| Production maturity | High | High | Medium | Low | High |
| Learning curve | Low | Low | Medium | High | Low |
| Schema complexity limit | High | High | Medium | Medium | ~100 fields |

This matrix should be your first reference when evaluating tools. For most production use cases, the decision simplifies to: Outlines for local models, Instructor for API models, with Guidance or LMQL added only when their specific features are genuinely required.
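
If you want to apply the matrix mechanically, it can be encoded as data and filtered by requirement. This is only a convenience sketch: the flags below are transcribed from a subset of the rows above, "Limited" is conservatively treated as False, and the names `MATRIX` and `shortlist` are hypothetical.

```python
from typing import Dict, List

# Capability flags transcribed from the decision matrix; "Limited" counts as False.
MATRIX: Dict[str, Dict[str, bool]] = {
    "Outlines": {"local": True, "api": False, "conditional": False,
                 "distribution": False, "beam_search": False},
    "Instructor": {"local": True, "api": True, "conditional": False,
                   "distribution": False, "beam_search": False},
    "Guidance": {"local": True, "api": True, "conditional": True,
                 "distribution": False, "beam_search": False},
    "LMQL": {"local": True, "api": True, "conditional": True,
             "distribution": True, "beam_search": True},
    "OpenAI Structured Outputs": {"local": False, "api": True, "conditional": False,
                                  "distribution": False, "beam_search": False},
}


def shortlist(*required: str) -> List[str]:
    """Return the tools whose flags are True for every required capability."""
    return [tool for tool, caps in MATRIX.items() if all(caps[name] for name in required)]


if __name__ == "__main__":
    print(shortlist("local", "conditional"))
    print(shortlist("distribution"))
```

A shortlist is a starting point; maturity and learning curve from the matrix still need to be weighed by hand.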
