
LMQL and Guidance - Programmatic LLM Control

Opening Scenario: When a Schema Isn't Enough

You are building a code generation system. The model needs to:

  1. Decide which programming language to use (Python or JavaScript)
  2. If Python: generate a function using specific allowed libraries
  3. If JavaScript: generate using different allowed patterns
  4. Add a docstring that follows the language's convention
  5. Add test cases that reference the function name generated in step 2

This is not a simple schema problem. The valid output at step 3 depends on the decision at step 1. The valid content at step 5 depends on the function name generated earlier. The constraints are interdependent and contextual - exactly the kind of problem that a single static schema, such as one Pydantic model, cannot express on its own.

This is the domain of programmatic LLM control: systems that interleave code execution with generation, where each generation step can be constrained by the outputs of previous steps. Microsoft Guidance and LMQL (Language Model Query Language) are the two main tools in this space.
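To make the dependency structure concrete, here is a minimal sketch of such a generation program in plain Python. It calls no real library: `fake_model`, `ALLOWED_LIBS`, and `DOC_STYLE` are hypothetical stand-ins invented for illustration. The only point is that each step's allowed set is computed from earlier outputs.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for a constrained model call: it may only
# return one of the allowed options (here, simply the first one).
def fake_model(prompt: str, allowed: Sequence[str]) -> str:
    return allowed[0]

# Illustrative values only; not taken from any real policy or spec.
ALLOWED_LIBS = {
    "Python": ["math", "itertools"],
    "JavaScript": ["lodash", "ramda"],
}
DOC_STYLE = {"Python": "triple-quoted docstring", "JavaScript": "JSDoc comment"}

def run_program(model: Callable[[str, Sequence[str]], str]) -> dict:
    out = {}
    # Step 1: the language decision constrains everything after it.
    out["language"] = model("Language: ", ["Python", "JavaScript"])
    # Steps 2/3: the allowed libraries depend on step 1's output.
    out["library"] = model("Library: ", ALLOWED_LIBS[out["language"]])
    out["func_name"] = model("Function name: ", ["add_numbers", "sum_two"])
    # Step 4: the docstring convention follows the language.
    out["doc_style"] = DOC_STYLE[out["language"]]
    # Step 5: the test must reference the name generated earlier.
    out["test_name"] = "test_" + out["func_name"]
    return out

result = run_program(fake_model)
print(result)
```

A static schema would have to list every field up front; here the options for `library` and the value of `test_name` do not exist until earlier steps have run.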

Microsoft Guidance: Interleaving Code and Generation

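Guidance is a library that originated at Microsoft Research for writing generation programs - documents where some parts are static text, some parts are generated by the model, and some parts are computed by Python code. Early releases used a Handlebars-inspired template syntax; the examples below use the newer Python API, in which a program is built by adding strings and calls such as gen() and select() to a model object.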

# pip install guidance
import guidance
from guidance import models, gen, select, substring, regex


# Load model (Guidance works with local models and OpenAI)
llm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")


# Example 1: Simple constrained generation
with guidance.system():
    lm = llm + "You are a programming assistant."

with guidance.user():
    lm += "Generate a short function to add two numbers."

with guidance.assistant():
    # Generate function name as a regex-constrained string
    lm += "def " + gen(name="func_name", regex=r"[a-z_]+") + "("
    # Generate parameters (regex constrained)
    lm += gen(name="params", regex=r"[a-z, ]+") + "):"
    # Generate body (free generation with stop sequence)
    lm += "\n " + gen(name="body", stop="\n\n")

# Access generated parts
print(lm["func_name"]) # e.g., "add_numbers"
print(lm["params"]) # e.g., "a, b"
print(lm["body"]) # e.g., "return a + b"

The Guidance Template Syntax

Guidance templates mix static text with generation calls:

import guidance
from guidance import models, gen, select


llm = models.Transformers("mistralai/Mistral-7B-Instruct-v0.2")


# gen() - generate free text with optional constraints
result = llm + "Name: " + gen(name="name", max_tokens=20, stop="\n")
name = result["name"] # The generated name

# select() - select from a list of options
result = llm + "Sentiment: " + select(["positive", "negative", "neutral"], name="sentiment")
sentiment = result["sentiment"] # Always one of the three options

# gen(regex=...) - generate text matching a pattern
result = llm + "Date: " + gen(name="date", regex=r"\d{4}-\d{2}-\d{2}")
date = result["date"] # Always YYYY-MM-DD format

# Combining: conditional generation based on previous output
@guidance
def classify_and_explain(lm, text):
    lm += f"Text: {text}\n"
    lm += "Category: " + select(["urgent", "normal", "low"], name="priority")

    if lm["priority"] == "urgent":
        lm += "\nEscalation path: " + select(
            ["call_manager", "send_email", "create_ticket"],
            name="escalation",
        )
    else:
        lm += "\nResponse time: " + gen(name="response_time", regex=r"\d+ (hours?|days?)")

    return lm

# A @guidance function is applied by adding its call to a model object;
# the lm argument is supplied automatically
result = llm + classify_and_explain("Production database is down!")
print(result["priority"]) # e.g., "urgent"
print(result["escalation"]) # e.g., "call_manager" (only set on the urgent branch)

The Role of Token Healing in Guidance

Token healing addresses a subtle tokenization artifact. Consider this template:

lm + "The answer is: " + gen(name="answer", regex=r"\d+")

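The string "The answer is: " ends with a space. Many tokenizers, however, attach the leading space to the token that follows it, so in ordinary text the digit would usually be encoded as a single token such as " 5" (space plus digit). Because the prompt already ends with a standalone space token, generation has to begin in the middle of what would normally be one token, a position the model has rarely seen during training.

Token healing in Guidance works by "rewinding" the end of the prompt (typically the last token) and regenerating it jointly with the constrained generation, restricting the first generated token to ones that start with the removed text. This ensures that generation starts on a clean token boundary, so the tokenization artifact neither skews the model's predictions nor interferes with how the constraint is applied.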
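The boundary problem and the "rewind" step can be illustrated with a toy tokenizer. This is a minimal sketch under stated assumptions: `VOCAB`, `tokenize`, and `heal` are hypothetical names, the vocabulary has ten entries instead of tens of thousands, and the code is not how Guidance implements healing internally.

```python
import re

# Toy vocabulary (an assumption for illustration only).
VOCAB = ["The", " answer", " is", ":", " ", " 5", " 42", "5", "4", "2"]

def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenization over the toy vocabulary."""
    tokens, i = [], 0
    while i < len(text):
        match = max((t for t in VOCAB if text.startswith(t, i)), key=len)
        tokens.append(match)
        i += len(match)
    return tokens

prompt = "The answer is: "
full_text = "The answer is: 42"

print(tokenize(prompt))     # the prompt ends with a lone " " token
print(tokenize(full_text))  # natural text uses the merged " 42" token

def heal(prompt: str, pattern: str = r"\d+") -> tuple[list[str], list[str]]:
    """Drop the last prompt token, then list the vocabulary entries that
    start with the dropped text and whose remainder fits the constraint."""
    tokens = tokenize(prompt)
    dropped = tokens[-1]
    candidates = [
        t for t in VOCAB
        if t.startswith(dropped) and re.fullmatch(pattern, t[len(dropped):])
    ]
    return tokens[:-1], candidates

kept, first_token_candidates = heal(prompt)
print(kept, first_token_candidates)
```

After healing, the first generated token is chosen among entries such as " 5" and " 42", which is the tokenization the model would have seen in ordinary text.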

import guidance
from guidance import models, gen


# NOTE: the token_healing keyword follows this article's original example.
# Whether models.Transformers accepts it depends on the installed Guidance
# version, so check the release you are using before relying on it.

# Without token healing: might fail to apply regex correctly
# because the generation boundary falls inside a token
llm = models.Transformers("gpt2", token_healing=False) # Explicit disable

# With token healing (default): generation always starts at clean boundary
llm_healed = models.Transformers("gpt2", token_healing=True) # Default

# In practice: keep token healing enabled (the default)
# The difference matters for complex regex patterns near prompt boundaries

Guidance for Structured Data Extraction

import guidance
from guidance import models, gen, select
import json


@guidance
def extract_structured_data(lm, document: str):
    """
    Extract structured data using Guidance's interleaved generation.
    The advantage: each field's generation can be constrained by previous fields.
    """
    lm += f"Document: {document}\n\n"
    lm += "Extraction:\n"

    # Extract entity type first
    lm += "Entity type: " + select(
        ["person", "organization", "location", "event"],
        name="entity_type",
    )

    # Now condition on entity type ("event" has no extra fields in this example)
    if lm["entity_type"] == "person":
        lm += "\nFirst name: " + gen(name="first_name", regex=r"[A-Z][a-z]+")
        lm += "\nLast name: " + gen(name="last_name", regex=r"[A-Z][a-z]+")
        lm += "\nAge: " + gen(name="age", regex=r"[0-9]{1,3}")

    elif lm["entity_type"] == "organization":
        lm += "\nOrganization name: " + gen(name="org_name", max_tokens=30, stop="\n")
        lm += "\nIndustry: " + select(
            ["technology", "finance", "healthcare", "education", "retail", "other"],
            name="industry",
        )

    elif lm["entity_type"] == "location":
        lm += "\nCity: " + gen(name="city", max_tokens=20, stop=",")
        lm += ", Country: " + gen(name="country", max_tokens=20, stop="\n")

    return lm


llm = models.Transformers("mistralai/Mistral-7B-Instruct-v0.2")
# Apply the function by adding its call to the model object
result = llm + extract_structured_data("Apple Inc. was founded in Cupertino, California.")

print(result["entity_type"]) # e.g., "organization"
print(result["org_name"]) # e.g., "Apple Inc."
print(result["industry"]) # e.g., "technology"

Guidance with OpenAI APIs

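Guidance can also target API providers such as OpenAI. Remote APIs generally do not expose token-level logits, so token healing and fine-grained constrained decoding are limited or unavailable there, and which constructs work depends on the provider and the Guidance version: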

import guidance
from guidance import models, gen, select

# Use OpenAI with Guidance
llm = models.OpenAI("gpt-4o-mini")

# Same template syntax works
result = llm + "Classify: " + select(
["spam", "not_spam"],
name="label",
)
print(result["label"])

LMQL: SQL-Like Constraints for LLMs

LMQL (Language Model Query Language) takes a different approach: a programming language with SQL-inspired syntax for expressing constrained generation as queries.

# pip install lmql
import lmql


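# The core LMQL pattern: a decorated Python function whose docstring holds the
# query, with a where clause for constraints (and, optionally, a distribution
# clause for scoring a fixed set of choices)

@lmql.query
async def classify_sentiment(text: str):
    '''lmql
    argmax
        "Sentiment of '{text}' is [SENTIMENT]"
    where
        SENTIMENT in ["positive", "negative", "neutral"]
    '''


# Run the query (top-level await works in a notebook or inside an async function;
# in a plain script, wrap the call in asyncio.run)
result = await classify_sentiment("I love this product!")
print(result.variables["SENTIMENT"]) # e.g., "positive"
# A where clause alone does not return probabilities; per-option probabilities
# require a distribution clause (see "Distribution Output" below)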

LMQL's Key Differentiators

1. argmax vs sample

LMQL supports both greedy decoding (argmax) and sampling (sample) as first-class operations:

@lmql.query
async def sample_with_constraint(prompt: str):
    '''lmql
    sample(temperature=0.7, n=3) # Generate 3 different samples
        "{prompt} [RESPONSE]"
    from
        "openai/gpt-3.5-turbo"
    where
        len(TOKENS(RESPONSE)) < 50 # Fewer than 50 tokens
    '''

2. Distribution Output

LMQL can output the probability distribution over constrained choices:

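@lmql.query
async def get_sentiment_distribution(text: str):
    '''lmql
    argmax
        "Review: {text}\n"
        "The sentiment is [LABEL]"
    distribution
        LABEL in ["positive", "negative", "neutral"]
    '''

# Scores each listed option and returns its probability, for example:
# {"positive": 0.72, "negative": 0.18, "neutral": 0.10}
# Assumption to verify against your LMQL version: the scores are exposed as
# result.variables["P(LABEL)"], a list of (option, probability) pairs.
# This is not beam search: every listed option is evaluated, so each one
# gets a probability.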
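Conceptually, a distribution over constrained choices comes from scoring each allowed option and renormalizing so the probabilities sum to one. The sketch below shows only that arithmetic; the log-probabilities are made-up numbers and `normalize_choices` is a hypothetical helper, not an LMQL function.

```python
import math

def normalize_choices(option_logprobs: dict[str, float]) -> dict[str, float]:
    """Softmax restricted to the allowed options."""
    peak = max(option_logprobs.values())  # subtract the max for numerical stability
    weights = {k: math.exp(v - peak) for k, v in option_logprobs.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Made-up scores a model might assign to each full option string.
scores = {"positive": -0.4, "negative": -2.1, "neutral": -1.6}
dist = normalize_choices(scores)
best = max(dist, key=dist.get)
print(best, {k: round(p, 3) for k, p in dist.items()})
```

The gap between the top two probabilities is a simple uncertainty signal: a narrow gap suggests the classification deserves a second look.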

3. Beam Search with Constraints

LMQL supports beam search under constraints, which keeps several candidate sequences alive and returns the highest-scoring valid completion it finds:

@lmql.query
async def beam_constrained_generation(text: str):
    '''lmql
    beam(n=5) # 5-beam search
        "Summary of '{text}': [SUMMARY]"
    where
        len(TOKENS(SUMMARY)) in range(20, 50) and # Length constraint
        STOPS_AT(SUMMARY, ".") # Stop at period
    '''

4. Multi-Variable Queries

LMQL naturally handles multi-step generation with dependencies:

@lmql.query
async def structured_extraction(document: str):
    '''lmql
    argmax
        "Document: {document}\n"
        "Type: [DOC_TYPE]\n"
        "Key finding: [FINDING]\n"
        "Action required: [ACTION]\n"
    where
        DOC_TYPE in ["invoice", "contract", "report", "email"] and
        len(TOKENS(FINDING)) < 30 and
        (ACTION in ["none", "review", "urgent_action"] if DOC_TYPE == "contract"
         else ACTION in ["none", "process"])
    '''
# Note: ACTION constraint depends on DOC_TYPE - this is what makes LMQL powerful

LMQL for Token Probability Analysis

One unique LMQL capability is exposing the probability distribution at constrained positions:

import lmql
import asyncio


@lmql.query
async def get_next_token_probs(context: str):
    '''lmql
    argmax
        "{context}[NEXT_WORD]"
    distribution
        NEXT_WORD in ["the", "a", "an", "this", "that", "some", "many"]
    '''


async def analyze_context_preferences():
    """Analyze which determiners a model prefers in different contexts."""
    contexts = [
        "The engineer reviewed ",
        "A student submitted ",
        "The company announced ",
    ]

    for context in contexts:
        result = await get_next_token_probs(context)
        # Assumption to verify against your LMQL version: the scores are exposed
        # as result.variables["P(NEXT_WORD)"], a list of (word, probability) pairs
        sorted_probs = sorted(
            result.variables["P(NEXT_WORD)"],
            key=lambda x: x[1],
            reverse=True,
        )
        print(f"\nContext: '{context}'")
        for word, prob in sorted_probs[:3]:
            print(f" '{word}': {prob:.3f}")

asyncio.run(analyze_context_preferences())

Guidance vs LMQL vs Outlines: Choosing the Right Tool

A Practical Comparison

"""
The same task implemented in Outlines, Guidance, and LMQL.

Task: Extract a sentiment label and a brief reason from text.
Schema: {"label": "positive|negative|neutral", "reason": string (max 50 chars)}
"""

# ===== OUTLINES =====
# Note: this uses the pre-1.0 Outlines API (outlines.models.transformers,
# outlines.generate.json); newer releases changed these entry points.
from typing import Literal

import outlines
from pydantic import BaseModel, Field

class SentimentResult(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    reason: str = Field(max_length=50)

outlines_model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
outlines_gen = outlines.generate.json(outlines_model, SentimentResult)

def outlines_extract(text: str) -> SentimentResult:
    return outlines_gen(f"Analyze sentiment of: {text}")

# Pros: Simple, guaranteed structure, Pydantic integration, caches FSM
# Cons: Doesn't handle conditional logic between fields natively


# ===== GUIDANCE =====
import guidance
from guidance import models, gen, select

guidance_llm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")

@guidance
def guidance_extract(lm, text):
    lm += f"Text: {text}\n"
    lm += "Label: " + select(["positive", "negative", "neutral"], name="label")
    lm += "\nReason: " + gen(name="reason", max_tokens=50, stop="\n")
    return lm

def guidance_extract_wrapper(text: str) -> dict:
    # Apply the function by adding its call to the model object
    result = guidance_llm + guidance_extract(text)
    return {"label": result["label"], "reason": result["reason"]}

# Pros: Fine-grained control, template syntax, easy conditional logic
# Cons: Verbose, less standard API, reason length not strictly enforced
#       (max_tokens caps tokens, not characters)


# ===== LMQL =====
import lmql

@lmql.query
async def lmql_extract(text: str):
    '''lmql
    argmax
        "Text: {text}\nLabel: [LABEL]\nReason: [REASON]"
    where
        LABEL in ["positive", "negative", "neutral"] and
        len(TOKENS(REASON)) <= 15 # ~50 chars
    '''

# Pros: SQL-like constraints, beam search support, probability distribution
# Cons: Async-first API, unusual syntax, smaller community

When to Use LMQL and Guidance in Production

The honest answer: most production structured generation needs are well-served by Outlines, Instructor, and the tool calling / structured outputs APIs from providers. LMQL and Guidance fill specific niches:

Use Guidance when:

  • Your generation logic has complex conditional branches based on earlier generated content
  • You are building a multi-step reasoning system where each step's constraints depend on previous steps
  • You need a template-based approach for prompt management (Guidance's template syntax is more readable than f-strings for complex prompts)

Use LMQL when:

  • You need the probability distribution over constrained choices (for uncertainty quantification, not just the argmax)
  • You need beam search under complex constraints
  • You are doing research on constrained decoding and need fine-grained control over the decoding process
  • You need the WHERE clause's expressive constraint language for complex multi-variable constraints

Do not use either when:

  • You only need simple schema extraction (Outlines is simpler and faster)
  • You are calling API-based providers (Instructor handles this better)
  • You need production reliability tooling (neither Guidance nor LMQL has mature retry/observability infrastructure)

Common Mistakes

:::warning Guidance and LMQL Are Research-Oriented Tools

Neither Guidance nor LMQL has the production maturity of Outlines or Instructor. Version stability, documentation quality, community support, and production deployment patterns are all less developed. For anything beyond prototyping or research, carefully evaluate maintenance status and issue tracker activity before adopting either tool in a production system.

:::

:::danger Token Healing Doesn't Solve All Tokenization Boundary Issues

Guidance's token healing addresses the most common tokenization artifact at the boundary between static template and generated content. But it does not handle all edge cases: tokenization artifacts in the middle of complex regex patterns, tokenizer-specific quirks with special characters, or cases where the healed region is too short to cover a complex token boundary. If you observe unexpected regex constraint failures, the cause may be a tokenization boundary issue that token healing doesn't cover. To check, inspect the actual token sequence at the constraint boundary, for example by tokenizing the prompt text yourself.

:::

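:::warning LMQL's Async-First API Has Integration Overhead

The LMQL queries in this guide are declared with async def and must be awaited. If your application is synchronous (common in data processing scripts, CLI tools), wrapping LMQL in asyncio.run() works but adds complexity, and it fails when an event loop is already running. Recent LMQL releases also accept queries declared with a plain def, which run synchronously; confirm this for the version you install. Consider whether LMQL's features are worth this integration cost for your use case, or whether Outlines' synchronous API is a better fit.

:::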
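A blocking wrapper around an async query can be sketched as follows. `classify` is a hypothetical stand-in for an awaited query function (no LMQL call is made), and `classify_sync` is an illustrative name.

```python
import asyncio

# Hypothetical stand-in for an async query function.
async def classify(text: str) -> str:
    await asyncio.sleep(0)  # placeholder for the real model call
    return "positive" if "love" in text else "neutral"

def classify_sync(text: str) -> str:
    """Blocking wrapper for synchronous callers (scripts, CLI tools).

    asyncio.run() raises RuntimeError if an event loop is already
    running, for example inside a notebook or an async web handler.
    """
    return asyncio.run(classify(text))

label = classify_sync("I love this product!")
print(label)
```

Each `asyncio.run()` call creates and tears down an event loop, so for batch work it is usually cheaper to gather many coroutines inside a single `asyncio.run()` than to call the wrapper in a loop.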

Interview Q&A

Q1: What is Microsoft Guidance and how does it differ from Outlines?

Guidance is a programmatic LLM control framework that interleaves Python code with generation (early versions used a Handlebars-like template syntax; current versions use a Python API). It allows conditional logic - the constraints applied to a generation step can depend on the output of a previous step. For example, if the model generates "Python" as the language, subsequent constraints apply Python-specific patterns; if it generates "JavaScript," different constraints apply. Outlines, in contrast, is schema-focused: you define a Pydantic model or JSON schema, and the entire output is constrained to that fixed structure. A single Outlines schema does not naturally express "if field A has value X, field B must match pattern Y." The difference: Outlines for static schemas, Guidance for dynamic conditional generation.

Q2: What is LMQL and what unique capabilities does it provide?

LMQL (Language Model Query Language) is a query language for LLMs with SQL-inspired syntax for expressing constraints as where clauses. Unique capabilities: (1) Distribution mode - returns the probability distribution over constrained choices, not just the argmax. This enables uncertainty quantification: "how confident is the model in this classification?" (2) Beam search under constraints - beam(n=k) keeps k candidate sequences and returns the most probable valid completions it finds, useful when you want to explore multiple valid outputs; (3) Expressive constraint composition - where clauses can combine multiple constraints with boolean operators, including constraints that depend on the values of other generated variables; (4) Native n parameter for sampling multiple completions in one query. Outlines and Instructor are not designed around this combination of capabilities.

Q3: What is token healing in Guidance and why is it necessary?

Token healing addresses the mismatch between static prompt text and the start of constrained generation. Tokenizers encode strings into tokens that may not align with character-level boundaries. When a Guidance template has "The answer is: " + gen(regex=r"\d+"), the space before the digit generation might be part of a token that includes the colon, or the space might be encoded as a prefix of the first generated token. Token healing "rewinds" to the last few tokens of the prompt and regenerates them jointly with the constrained sequence, ensuring the generation starts on a clean token boundary. Without token healing, the constrained regex might fail to apply correctly because the generation starts mid-token.

Q4: In what scenarios does programmatic LLM control (Guidance/LMQL) provide clear value over simpler tools?

Three clear scenarios: (1) Multi-step generation with inter-step dependencies - building a structured report where section 3's constraints depend on what was generated in section 1; (2) Adaptive schema selection - when you need to select which Pydantic schema to use for extraction based on a preliminary classification step (Guidance handles this natively; with Outlines you'd need two separate inference calls); (3) Distribution analysis - when you need to know not just the most likely classification but the probability of each option (LMQL's distribution mode). For most production extraction pipelines, these use cases are rare; standard tool calling or Outlines suffices. The added complexity of Guidance/LMQL is only justified when the specific capabilities are genuinely needed.

Q5: Compare the maturity and production-readiness of Outlines, Instructor, Guidance, and LMQL.

Maturity ranking:

  1. Instructor - most production-ready: extensive documentation, an active community, wide production use, multi-provider support, and active maintenance by Jason Liu.
  2. Outlines - production-ready: well-documented, integrated with vLLM and other serving frameworks, under active development by dottxt-ai, and used in production by several companies.
  3. Guidance - research-oriented with growing production adoption: its Microsoft Research provenance gives it credibility, but the programming model is opinionated, production deployment patterns are less documented than for Outlines or Instructor, and the API may break between versions.
  4. LMQL - primarily a research tool: an academic project with limited commercial adoption. Documentation is thorough for academic users, but production deployment examples are scarce, the async-first API adds friction, and there are fewer integrations with serving infrastructure.

In production, prefer Instructor or Outlines except for use cases that specifically require Guidance or LMQL's distinctive capabilities.

Advanced Guidance Pattern: Multi-Step Chain of Thought with Constraints

One of Guidance's most powerful patterns is constraining a chain-of-thought reasoning process:

import guidance
from guidance import models, gen, select


llm = models.Transformers("mistralai/Mistral-7B-Instruct-v0.2")


@guidance
def constrained_chain_of_thought(lm, problem: str):
    """
    Generate a chain of thought where:
    - The reasoning can be free-form
    - But the final answer is constrained to specific options
    - The confidence must be a regex-constrained percentage
    """
    lm += f"Problem: {problem}\n\n"

    # Step 1: Free-form reasoning (unconstrained)
    lm += "Let me think through this step by step:\n"
    lm += gen(name="reasoning", max_tokens=200, stop="\n\nFinal")

    lm += "\n\nFinal answer: "
    # Step 2: Constrained final answer
    lm += select(
        ["yes", "no", "uncertain"],
        name="final_answer",
    )

    lm += "\nConfidence: "
    # Step 3: Regex-constrained confidence percentage (0% to 100%;
    # the looser [0-9]{1,3}% would also admit values such as 999%)
    lm += gen(name="confidence", regex=r"(100|[1-9]?[0-9])%")

    lm += "\nCategory: "
    # Step 4: Category depends on final answer
    if lm["final_answer"] == "yes":
        lm += select(["high-confidence-yes", "low-confidence-yes"], name="category")
    elif lm["final_answer"] == "no":
        lm += select(["clear-no", "borderline-no"], name="category")
    else:
        lm += select(["need-more-info", "genuinely-ambiguous"], name="category")

    return lm


# Apply the function by adding its call to the model object
result = llm + constrained_chain_of_thought(
    "Is this email likely to be spam? Subject: 'You have won $1,000,000!'"
)

print(f"Reasoning: {result['reasoning'][:200]}")
print(f"Final answer: {result['final_answer']}")
print(f"Confidence: {result['confidence']}")
print(f"Category: {result['category']}")

This pattern - free reasoning, then constrained conclusion - captures the best of both approaches: the model can reason naturally (improving quality of the conclusion) while the final answer is guaranteed to be one of the valid options.
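Because the regex is the guarantee, it pays to test the pattern itself before handing it to the model. The sketch below uses only Python's `re` module to compare a loose percentage pattern with a tighter one; the names `LOOSE` and `TIGHT` are illustrative.

```python
import re

LOOSE = re.compile(r"[0-9]{1,3}%")         # any one to three digits
TIGHT = re.compile(r"(100|[1-9]?[0-9])%")  # 0% to 100%, no leading zeros

samples = ["0%", "7%", "85%", "100%", "101%", "999%", "007%"]
loose_ok = [s for s in samples if LOOSE.fullmatch(s)]
tight_ok = [s for s in samples if TIGHT.fullmatch(s)]

print("loose accepts:", loose_ok)
print("tight accepts:", tight_ok)
```

The loose pattern accepts every sample, including 999%, so a constrained decoder using it could legally emit an impossible confidence value. The tight pattern accepts only the first four samples.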

LMQL Advanced: Beam Search for Structured Generation

LMQL's beam search capability enables finding the highest-probability valid completion among multiple candidates:

import lmql
import asyncio


@lmql.query
async def best_category_extraction(text: str, categories: list[str]):
    '''lmql
    beam(n=3)
        "Classify the following text into the most appropriate category.\n"
        "Text: {text}\n"
        "Category: [CATEGORY]"
    where
        CATEGORY in categories
    '''


async def extract_with_beam(text: str):
    """
    Use beam search to find a high-probability category assignment.
    Unlike argmax, beam search keeps several candidate paths alive, so it
    can recover from a first token that only looked best locally.
    """
    categories = [
        "technology", "business", "science",
        "sports", "entertainment", "politics", "health"
    ]

    # Run the query with 3 beams
    result = await best_category_extraction(text, categories)

    # Depending on the LMQL version and decoder, the query may return a single
    # result or a list of results (one per beam); handle both.
    # Beam search does not return per-category probabilities; use a
    # distribution clause for that.
    best = result[0] if isinstance(result, list) else result

    return best.variables.get("CATEGORY")


# Usage
category = asyncio.run(extract_with_beam(
    "Apple's new chip outperforms competitors in benchmark tests"
))
print(f"\nBest category: {category}") # Expected: "technology"

Beam search is particularly valuable when:

  1. Multiple valid categories are plausible (text about "a tech CEO's political donation" - is it technology, business, or politics?)
  2. You want several high-scoring candidates to compare, not just a single greedy result (per-option probabilities come from the distribution clause, not from beam search)
  3. The greedy choice at an early token might look best locally but lead to a lower-probability sequence overall
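The third point is easiest to see on a toy model. In the sketch below, `MODEL` is a made-up table of next-token probabilities (not output from any real model), and `beam_search` is a minimal implementation written for illustration, with width 1 standing in for greedy decoding.

```python
import math

# Made-up next-token model: prefix (tuple of tokens) -> {token: probability}.
MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.5, "y": 0.5},
    ("B",): {"z": 0.9, "w": 0.1},
}

def beam_search(width: int, steps: int = 2) -> tuple[tuple[str, ...], float]:
    """Return the best (sequence, probability) found with the given beam width."""
    beams = [((), 0.0)]  # (sequence, log-probability)
    for _ in range(steps):
        expanded = [
            (seq + (tok,), lp + math.log(p))
            for seq, lp in beams
            for tok, p in MODEL[seq].items()
        ]
        # Keep only the `width` highest-scoring partial sequences.
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:width]
    seq, lp = beams[0]
    return seq, math.exp(lp)

greedy = beam_search(width=1)  # width 1 behaves like greedy decoding
beam = beam_search(width=2)
print("greedy:", greedy)
print("beam:  ", beam)
```

Greedy decoding commits to "A" because 0.6 beats 0.4, and can then reach at most 0.6 × 0.5 = 0.30. With two beams, the "B" prefix survives the first step and wins overall with 0.4 × 0.9 = 0.36.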

The Token Mask Visualization

Understanding which tokens are masked at each step builds intuition for constrained generation:

from transformers import AutoTokenizer # pip install transformers


def visualize_json_token_mask(
    tokenizer,
    partial_json: str,
    vocab_size: int = 200, # Show first N tokens for illustration
) -> dict:
    """
    Show which tokens are valid at a given point in JSON generation.
    Illustrates what the FSM mask looks like in practice.
    """
    # Simplified valid character sets for different JSON positions
    def get_valid_chars_for_state(json_str: str) -> set:
        """Determine valid next characters based on partial JSON."""
        if not json_str:
            return {"{", "["}

        # Check for an open string first, so that structural characters
        # inside a string value (e.g. a trailing ":") are not misread.
        # Simplification: escaped quotes are not handled.
        if json_str.count('"') % 2 == 1:
            # Inside a string
            return set("abcdefghijklmnopqrstuvwxyz "
                       "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._-@!,")

        stripped = json_str.strip()

        if stripped.endswith("{"):
            return {'"', "}"} # Start of key or empty object

        if stripped.endswith(":"):
            return {'"', "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
                    "-", "[", "{", "t", "f", "n", " "}

        if stripped.endswith(","):
            return {'"', " "} # Next key

        if stripped.endswith("}"):
            return {",", "}", " ", "\n"}

        return {'"', "}"} # Default: next key or close

    valid_chars = get_valid_chars_for_state(partial_json)

    # Count valid tokens (simplified)
    total_tokens_shown = vocab_size
    valid_count = 0
    sample_valid = []
    sample_invalid = []

    for token_id in range(total_tokens_shown):
        token_str = tokenizer.decode([token_id])
        # A token is valid if its first character is a valid next char
        if token_str and token_str[0] in valid_chars:
            valid_count += 1
            if len(sample_valid) < 5:
                sample_valid.append(f"'{token_str.strip()}'")
        else:
            if len(sample_invalid) < 5 and token_str.strip():
                sample_invalid.append(f"'{token_str.strip()}'")

    return {
        "partial_json": partial_json,
        "valid_chars": sorted(valid_chars),
        "valid_tokens_in_first_200": valid_count,
        "invalid_tokens_in_first_200": total_tokens_shown - valid_count,
        "sample_valid": sample_valid,
        "sample_invalid": sample_invalid,
        "mask_density": valid_count / total_tokens_shown,
    }


# This shows how the mask density changes with the position in the JSON
example_positions = [
    "", # Start: only { or [
    '{"', # After open brace and quote: inside a key string
    '{"name": "', # Inside string value: many chars valid
    '{"name": "Alice", "age": ', # After colon: start of any JSON value
]

# Any Hugging Face tokenizer works here; "gpt2" is used only because it is small
tokenizer = AutoTokenizer.from_pretrained("gpt2")

for pos in example_positions:
    info = visualize_json_token_mask(tokenizer, pos)
    print(f"\nAt: {pos!r}")
    print(f"Valid chars: {info['valid_chars']}")
    print(f"Mask density (first 200 tokens): {info['mask_density']:.2f}")

The visualization reveals an important property of constrained generation: the mask density varies dramatically depending on where you are in the JSON structure. Inside a string value, most printable characters are valid (high density). With a schema-aware constraint, which the simplified function above does not model, only characters matching known field names are valid at the start of a field name (very sparse). This sparsity is what provides the guarantee - at each step, the choice space is constrained to valid completions.
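The field-name case can be sketched with a toy vocabulary and a two-field schema. Everything here is an illustrative assumption: `VOCAB`, `FIELD_NAMES`, and both helper functions are hypothetical, and real libraries compile the schema into a state machine rather than scanning the vocabulary like this.

```python
# Toy vocabulary and schema; both are illustrative assumptions.
VOCAB = ["name", "age", "na", "me", "a", "ge", "x", "7",
         "{", "}", '"', ":", ",", " "]
FIELD_NAMES = ["name", "age"]

def valid_key_tokens(typed: str) -> list[str]:
    """Tokens that keep the partially typed key a prefix of some field name."""
    return [t for t in VOCAB
            if any(f.startswith(typed + t) for f in FIELD_NAMES)]

def valid_string_tokens() -> list[str]:
    """Inside a free-form string value, any token without a quote continues it."""
    return [t for t in VOCAB if '"' not in t]

key_mask = valid_key_tokens("")
value_mask = valid_string_tokens()
print(f"key start:    {len(key_mask)}/{len(VOCAB)} valid -> {key_mask}")
print(f"after 'na':   {valid_key_tokens('na')}")
print(f"string value: {len(value_mask)}/{len(VOCAB)} valid")
```

At the start of a key only 4 of 14 tokens survive, and after "na" a single token remains, while inside a string value 13 of 14 are allowed. The same contrast holds, at much larger scale, for a real tokenizer and schema.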

Practical Tool Comparison: Decision Matrix

When evaluating tools for a new structured generation use case, use this decision matrix:

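| Requirement | Outlines | Instructor | Guidance | LMQL | OpenAI Struct. |
|---|---|---|---|---|---|
| Local model support | Yes | Yes (via Ollama) | Yes | Yes | No |
| API model support | No | Yes | Yes | Yes | Yes (only) |
| 100% schema guarantee | Yes | No (99.5%+) | Yes (regex) | Yes | Yes |
| Multi-provider | No | Yes | Yes (limited) | Yes | OpenAI only |
| Conditional constraints | Limited | No | Yes | Yes | No |
| Probability distribution | No | No | No | Yes | No |
| Beam search | No | No | No | Yes | No |
| Streaming support | Yes (vLLM) | Yes | Limited | No | Yes |
| Production maturity | High | High | Medium | Low | High |
| Learning curve | Low | Low | Medium | High | Low |
| Schema complexity limit | High | High | Medium | Medium | ~100 fields |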

This matrix should be your first reference when evaluating tools. For most production use cases, the decision simplifies to: Outlines for local models, Instructor for API models, with Guidance or LMQL added only when their specific features are genuinely required.
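The simplified rule in the previous paragraph can be written down as a small helper. This is a sketch of that rule only: `recommend_tool` and its parameters are hypothetical names, and a real evaluation would weigh more rows of the matrix.

```python
def recommend_tool(
    local_model: bool,
    conditional_constraints: bool = False,
    needs_distribution: bool = False,
    needs_beam_search: bool = False,
) -> str:
    """Encode the simplified rule from the matrix; a starting point, not a verdict."""
    # Distribution output and beam search are the LMQL-specific rows.
    if needs_distribution or needs_beam_search:
        return "LMQL"
    # Branching on earlier generated content is the Guidance-specific row.
    if conditional_constraints:
        return "Guidance"
    # Default split: Outlines for local models, Instructor for API models.
    return "Outlines" if local_model else "Instructor"

print(recommend_tool(local_model=True))
print(recommend_tool(local_model=False))
print(recommend_tool(local_model=True, conditional_constraints=True))
print(recommend_tool(local_model=True, needs_distribution=True))
```

Checking the specialised requirements first mirrors the advice above: reach for Guidance or LMQL only when one of their specific features is required.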
