Handling Ambiguity and Clarification
The Instruction That Could Mean Anything
"Make the app faster."
An agent receives this instruction and must decide what to do next. Consider all the ways this is ambiguous:
- Which app? The backend API? The frontend React app? The mobile app? The data pipeline?
- Faster how? Reduce p50 latency? Reduce p99 latency? Increase throughput? Reduce startup time?
- By how much? 10% faster? 2x faster? Under 100ms?
- At what cost? Any spend is fine? Under $500/month? No new infrastructure?
- What can change? Just code? Architecture? Database schema? Adding caching? CDN?
- Measured how? Synthetic benchmarks? Real user metrics? Load tests?
An agent that charges ahead on any single interpretation will very likely do something the user did not want. Maybe it optimizes the wrong service. Maybe it makes an irreversible architectural change. Maybe it spends a week micro-optimizing while the real bottleneck is a missing database index.
But an agent that asks about every ambiguity is equally useless. Users hate chatbots that respond to every request with a barrage of clarification questions. The skill is knowing which ambiguities matter enough to ask about.
Types of Ambiguity
Not all ambiguity is the same. Understanding the type helps you decide what to do about it.
Goal Ambiguity
The intended outcome is unclear. "Make it better." "Fix the issues." "Update the system."
These are the most dangerous - the agent genuinely does not know what success looks like. Execution without clarification is likely to produce the wrong thing.
Constraint Ambiguity
The goal is clear but the boundaries are not. "Migrate the database" - but to what? On what timeline? Can the service be down during migration?
Often the agent can infer reasonable defaults but should document its assumptions.
Scope Ambiguity
Which parts of the system are in scope? "Improve the search functionality" - does this include the indexing pipeline? The relevance algorithm? The UI?
Priority Ambiguity
Several goals are requested and they may conflict. "Make it reliable and make it fast" - what if reliability and speed require different architectural choices? The agent needs to know which goal wins when they pull apart.
The Cost-Benefit Analysis of Asking
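Every clarification decision is a trade-off between the cost of interrupting the user and the expected cost of acting on the wrong interpretation. The decision to ask should depend on: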
- Cost of being wrong: irreversible actions cost more than reversible ones. Deleting a database vs. adding a log statement.
- Confidence in the assumption: if there's one obvious interpretation, proceed with it documented. If multiple interpretations are equally plausible, ask.
- Cost of asking: interrupting a human mid-flow has a cost. Batching clarification questions is better than asking one at a time.
- Task duration: for a 30-second task, proceed and let the user correct you, since redoing it is cheap. For a 2-hour task, confirm the interpretation before starting.
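One way to make these factors concrete is an expected-cost comparison. The sketch below is illustrative only: `AskDecisionInputs`, the minute-based cost units, the default interruption cost, and the irreversibility multiplier of 10 are assumptions chosen for the example, not values from this lesson.

```python
from dataclasses import dataclass


@dataclass
class AskDecisionInputs:
    """Hypothetical inputs for one clarification decision (costs in minutes)."""
    confidence: float                  # 0.0-1.0, chance our best guess is right
    rework_minutes: float              # time lost if we proceed on a wrong guess
    reversible: bool                   # can the wrong action be undone?
    interruption_minutes: float = 2.0  # assumed cost of interrupting the user


def expected_cost_of_proceeding(inputs: AskDecisionInputs) -> float:
    """Expected minutes lost by proceeding without asking."""
    penalty = 1.0 if inputs.reversible else 10.0  # assumed multiplier
    return (1.0 - inputs.confidence) * inputs.rework_minutes * penalty


def should_ask(inputs: AskDecisionInputs) -> bool:
    """Ask only when guessing is expected to cost more than the interruption."""
    return expected_cost_of_proceeding(inputs) > inputs.interruption_minutes


quick_fix = AskDecisionInputs(confidence=0.6, rework_minutes=0.5, reversible=True)
long_migration = AskDecisionInputs(confidence=0.6, rework_minutes=120, reversible=False)
print(should_ask(quick_fix))       # False
print(should_ask(long_migration))  # True
```

With the same confidence, the 30-second fix proceeds and the 2-hour irreversible migration asks first, which is the behavior the list above describes.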
Clarification Strategies
Ask Upfront
Gather all requirements before starting. Best for long tasks with high cost of redirection. Worst for conversational contexts where users want responsiveness.
Infer and Proceed (Document Assumptions)
Make the most reasonable assumption, document it clearly, proceed. "I'm assuming you mean the backend API (not the mobile app). If that's wrong, let me know and I'll redirect."
This is often the right call for low-stakes decisions where the assumption is obvious.
Bounded Execution
Start on the unambiguous parts. Stop and ask at the first decision point that requires a specific interpretation. "I've set up the project structure and started on the database schema. Before I continue, I need to know: should users be able to have multiple organizations, or is one org per user?"
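A minimal sketch of bounded execution, assuming the task has already been broken into ordered steps. The `Step` type and `run_until_blocked` helper are hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One unit of work in a hypothetical task plan."""
    name: str
    run: Callable[[], None]
    # Question that must be answered before this step can run (None = unambiguous)
    blocking_question: Optional[str] = None


def run_until_blocked(
    steps: list[Step], answers: dict[str, str]
) -> tuple[list[str], Optional[str]]:
    """Run steps in order; stop at the first one whose question is unanswered.

    Returns the names of completed steps and the question to ask (None if done).
    """
    completed: list[str] = []
    for step in steps:
        if step.blocking_question and step.blocking_question not in answers:
            return completed, step.blocking_question
        step.run()
        completed.append(step.name)
    return completed, None


plan = [
    Step("create project structure", run=lambda: None),
    Step("draft database schema", run=lambda: None),
    Step(
        "model user-organization relation",
        run=lambda: None,
        blocking_question="Can a user belong to multiple organizations?",
    ),
]
done, question = run_until_blocked(plan, answers={})
print(done)      # the two unambiguous steps
print(question)  # the question to put to the user
```

This sketch restarts from the first step on each call; a real agent would checkpoint completed steps and resume from the blocked one.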
Multiple Interpretations
Execute for each plausible interpretation (in parallel if fast enough), present the results side by side, and let the user choose. Works for short tasks with 2–3 interpretations.
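The fan-out can be sketched with a thread pool from the standard library. `run_interpretations` and the stand-in `summarize` task are hypothetical; the cap of three interpretations mirrors the guidance above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_interpretations(
    task: Callable[[str], str],
    interpretations: list[str],
    max_interpretations: int = 3,
) -> dict[str, str]:
    """Run `task` once per interpretation in parallel and collect the results."""
    if not interpretations:
        return {}
    if len(interpretations) > max_interpretations:
        raise ValueError("Too many interpretations; ask the user instead")
    with ThreadPoolExecutor(max_workers=len(interpretations)) as pool:
        results = list(pool.map(task, interpretations))  # order is preserved
    return dict(zip(interpretations, results))


def summarize(interpretation: str) -> str:
    """Stand-in for real work, such as profiling one service."""
    return f"plan for {interpretation}"


options = run_interpretations(summarize, ["backend API", "React frontend"])
for label, result in options.items():
    print(f"[{label}] {result}")
```

Only use this when every branch is cheap and free of side effects, since all but one result will be thrown away.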
The Overly Cautious Agent Anti-Pattern
This is a critical failure mode: the agent that asks about everything. Some engineers, worried about their agents doing the wrong thing, err toward constant clarification. The result is an agent that feels like filling out a form.
Signs you have an overly cautious agent:
- It asks about decisions that have obvious defaults
- It asks a barrage of questions per turn instead of one or two focused ones
- It asks about low-stakes, easily reversible actions
- It asks clarifying questions even when the user has already answered them
The right threshold: only ask when you genuinely cannot make a reasonable assumption, AND the cost of being wrong is significant. For everything else, infer and proceed.
Confidence-Based Asking
The most principled approach: compute a confidence score for your interpretation, and ask only when confidence falls below a threshold.
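Before the full system, the core rule fits in a few lines. This sketch assumes per-parameter confidences are already available and that some parameters deserve more weight than others; the parameter names and weights are made up for the example.

```python
def overall_confidence(
    param_confidences: dict[str, float],
    weights: dict[str, float],
) -> float:
    """Weighted average of per-parameter confidences (missing weights count as 1.0)."""
    total_weight = sum(weights.get(name, 1.0) for name in param_confidences)
    if total_weight == 0:
        return 1.0  # nothing ambiguous to weigh
    weighted = sum(
        conf * weights.get(name, 1.0) for name, conf in param_confidences.items()
    )
    return weighted / total_weight


def needs_clarification(confidence: float, threshold: float = 0.7) -> bool:
    """Ask only when confidence falls below the threshold."""
    return confidence < threshold


confidences = {"target_service": 0.3, "success_metric": 0.5, "budget": 0.9}
weights = {"target_service": 3.0}  # assumed: picking the wrong service costs most
score = overall_confidence(confidences, weights)
print(f"{score:.2f}", needs_clarification(score))  # 0.46 True
```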
Full Implementation: Ambiguity Detection and Clarification System
"""
ambiguity_handler.py

Complete ambiguity detection and clarification question generation system.
Analyzes incoming instructions, identifies unclear parameters,
and generates targeted, actionable clarification questions.

Requirements:
    pip install openai pydantic
"""
from __future__ import annotations

import json
import textwrap
from enum import Enum
from typing import Optional

from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()  # Reads OPENAI_API_KEY from the environment
# ─── Data Models ─────────────────────────────────────────────────────────────
class AmbiguityType(str, Enum):
    GOAL = "goal"                            # What is the intended outcome?
    SCOPE = "scope"                          # What parts of the system are affected?
    CONSTRAINT = "constraint"                # What are the boundaries/limits?
    PRIORITY = "priority"                    # Which of multiple goals matters most?
    TIMELINE = "timeline"                    # By when?
    SUCCESS_CRITERION = "success_criterion"  # How will we know it worked?


class AmbiguityLevel(str, Enum):
    HIGH = "high"      # Must ask - cannot proceed safely
    MEDIUM = "medium"  # Should ask unless task is low-stakes
    LOW = "low"        # Can infer and proceed with documented assumption


class Parameter(BaseModel):
    """A single parameter of the task that may or may not be specified."""
    name: str
    description: str
    ambiguity_type: AmbiguityType
    ambiguity_level: AmbiguityLevel
    confidence: float  # 0.0–1.0, how confident we are in our best guess
    best_guess: Optional[str] = None
    alternative_interpretations: list[str] = Field(default_factory=list)
    reversible_if_wrong: bool = True


class AmbiguityAnalysis(BaseModel):
    """Full analysis of an instruction's ambiguities."""
    instruction: str
    overall_confidence: float           # 0.0–1.0
    is_safe_to_proceed: bool            # Can we proceed without asking?
    parameters: list[Parameter]
    documented_assumptions: list[str]   # What we'll assume if we proceed
    clarification_questions: list[str]  # Questions to ask if we need clarity


class ConversationContext(BaseModel):
    """Track what has been asked and answered."""
    answered_parameters: dict[str, str] = Field(default_factory=dict)
    skipped_parameters: list[str] = Field(default_factory=list)
    interaction_count: int = 0
# ─── Ambiguity Analyzer ────────────────────────────────────────────────────────
# Literal JSON braces are doubled below because this template is filled in with
# str.format(); single braces would be parsed as placeholders and raise KeyError.
ANALYSIS_PROMPT = textwrap.dedent("""
    You are an expert at analyzing natural language instructions for ambiguity.
    Given an instruction, identify all parameters that could be interpreted in multiple ways.

    For each ambiguous parameter, assess:
    1. What type of ambiguity this is (goal, scope, constraint, priority, timeline, success_criterion)
    2. How severe the ambiguity is (high/medium/low)
    3. Your confidence that you know the correct interpretation (0.0–1.0)
    4. Your best guess interpretation
    5. Alternative interpretations
    6. Whether proceeding with the wrong interpretation is reversible

    Then determine:
    - overall_confidence: weighted average confidence across all parameters
    - is_safe_to_proceed: true if overall_confidence > 0.7 AND no HIGH ambiguity parameters exist
    - documented_assumptions: what you would assume if proceeding without asking
    - clarification_questions: specific, single-focus questions for HIGH ambiguity params only

    Output JSON matching this schema exactly:
    {{
      "overall_confidence": 0.45,
      "is_safe_to_proceed": false,
      "parameters": [
        {{
          "name": "target_service",
          "description": "Which service to optimize",
          "ambiguity_type": "scope",
          "ambiguity_level": "high",
          "confidence": 0.3,
          "best_guess": "the backend FastAPI service",
          "alternative_interpretations": ["the React frontend", "the PostgreSQL queries", "all services"],
          "reversible_if_wrong": true
        }}
      ],
      "documented_assumptions": [
        "Assuming target is backend API based on recent context",
        "Assuming 20% latency reduction as success threshold"
      ],
      "clarification_questions": [
        "Which service should I focus on - the backend API, the React frontend, or the database queries?",
        "What does 'faster' mean to you - lower response latency, higher throughput, or faster page load?"
      ]
    }}

    Instruction to analyze: {instruction}
    Context (may be empty): {context}
""").strip()
class AmbiguityAnalyzer:
    """Analyzes instructions for ambiguity and generates clarification questions."""

    def __init__(self, model: str = "gpt-4o", ask_threshold: float = 0.7):
        self.model = model
        self.ask_threshold = ask_threshold  # Ask if confidence < this threshold

    def analyze(self, instruction: str, context: str = "") -> AmbiguityAnalysis:
        """Analyze an instruction for ambiguities."""
        response = client.chat.completions.create(
            model=self.model,
            messages=[{
                "role": "user",
                "content": ANALYSIS_PROMPT.format(
                    instruction=instruction,
                    context=context or "(none)",
                ),
            }],
            temperature=0.1,
            response_format={"type": "json_object"},
        )
        data = json.loads(response.choices[0].message.content)

        # Deserialize parameters
        parameters = [Parameter(**p) for p in data.get("parameters", [])]

        return AmbiguityAnalysis(
            instruction=instruction,
            overall_confidence=data["overall_confidence"],
            is_safe_to_proceed=data["is_safe_to_proceed"],
            parameters=parameters,
            documented_assumptions=data.get("documented_assumptions", []),
            clarification_questions=data.get("clarification_questions", []),
        )

    def should_ask(self, analysis: AmbiguityAnalysis, task_duration_minutes: float = 5.0) -> bool:
        """
        Decide whether to ask clarification questions.

        Factors:
        - Overall confidence below threshold
        - Presence of HIGH ambiguity parameters
        - Task duration (longer tasks warrant more upfront clarity)
        - Reversibility of potential wrong actions
        """
        # Always ask if there are irreversible HIGH-ambiguity params
        irreversible_high = [
            p for p in analysis.parameters
            if p.ambiguity_level == AmbiguityLevel.HIGH and not p.reversible_if_wrong
        ]
        if irreversible_high:
            return True

        # Ask if confidence is below threshold AND task is non-trivial
        if analysis.overall_confidence < self.ask_threshold and task_duration_minutes > 1.0:
            return True

        # For very long tasks (> 30 min), require higher confidence before starting
        if task_duration_minutes > 30 and analysis.overall_confidence < 0.85:
            return True

        return False
    def format_clarification_request(
        self,
        analysis: AmbiguityAnalysis,
        max_questions: int = 2,
    ) -> str:
        """
        Format a clarification request to the user.
        Never ask more than max_questions questions at once.
        """
        # Prioritize by ambiguity level and confidence (lowest confidence first)
        urgent_params = sorted(
            [p for p in analysis.parameters if p.ambiguity_level == AmbiguityLevel.HIGH],
            key=lambda p: p.confidence,
        )

        questions = analysis.clarification_questions[:max_questions]
        if not questions and urgent_params:
            # Generate questions from parameters if the LLM didn't provide them
            questions = [
                self._generate_question(p)
                for p in urgent_params[:max_questions]
            ]

        lines = ["Before I proceed, I need to clarify a couple of things:\n"]
        for i, q in enumerate(questions, 1):
            lines.append(f"{i}. {q}")

        # Show what we'll assume for lower-priority ambiguities
        medium_params = [p for p in analysis.parameters if p.ambiguity_level == AmbiguityLevel.MEDIUM]
        if medium_params:
            lines.append("\nFor everything else, I'll assume:")
            for p in medium_params:
                if p.best_guess:
                    lines.append(f"  • {p.name}: {p.best_guess}")

        return "\n".join(lines)

    def _generate_question(self, param: Parameter) -> str:
        """Generate a specific clarification question for a parameter."""
        if param.alternative_interpretations:
            options = " / ".join(param.alternative_interpretations[:3])
            return f"For {param.name}: {options}?"
        return f"Could you clarify {param.description}?"

    def proceed_with_assumptions(self, analysis: AmbiguityAnalysis) -> str:
        """
        When we decide NOT to ask, document what we're assuming.
        Returns a string to show the user (or log internally).
        """
        if not analysis.documented_assumptions:
            return "Proceeding with the task."

        lines = ["Proceeding with these assumptions:"]
        for assumption in analysis.documented_assumptions:
            lines.append(f"  • {assumption}")
        lines.append("\nLet me know if any of these are wrong and I'll adjust.")
        return "\n".join(lines)
# ─── Multi-Turn Clarification Handler ─────────────────────────────────────────
class ClarificationHandler:
    """
    Manages a multi-turn clarification conversation.
    Tracks what has been asked, prevents repeat questions.
    """

    def __init__(self, analyzer: AmbiguityAnalyzer):
        self.analyzer = analyzer
        self.context = ConversationContext()
        self._initial_analysis: Optional[AmbiguityAnalysis] = None

    def start(self, instruction: str, context: str = "") -> str:
        """
        Initial analysis. Returns either a clarification request or a proceed message.
        """
        analysis = self.analyzer.analyze(instruction, context)
        self._initial_analysis = analysis
        self.context.interaction_count += 1

        print("\n📊 Ambiguity Analysis:")
        print(f"  Overall confidence: {analysis.overall_confidence:.0%}")
        print(f"  Safe to proceed: {analysis.is_safe_to_proceed}")
        print(f"  Ambiguous parameters: {len(analysis.parameters)}")
        for param in analysis.parameters:
            level_emoji = {"high": "🔴", "medium": "🟡", "low": "🟢"}[param.ambiguity_level.value]
            print(f"  {level_emoji} {param.name} ({param.confidence:.0%} confident): {param.best_guess}")

        if self.analyzer.should_ask(analysis):
            return self.analyzer.format_clarification_request(analysis)
        else:
            return self.analyzer.proceed_with_assumptions(analysis)

    def handle_response(self, user_response: str) -> str:
        """
        Process user's clarification response.
        Re-analyze with the new information and determine if more clarification is needed.
        """
        self.context.interaction_count += 1

        # Build updated context from the conversation so far
        updated_context = f"User clarification: {user_response}"
        if self.context.answered_parameters:
            updates = "; ".join(f"{k}={v}" for k, v in self.context.answered_parameters.items())
            updated_context = f"Previously established: {updates}. {updated_context}"

        # Record this answer so later rounds see it as established context
        round_key = f"round_{self.context.interaction_count - 1}"
        self.context.answered_parameters[round_key] = user_response

        # Re-analyze with the user's response as context
        if self._initial_analysis:
            new_analysis = self.analyzer.analyze(
                instruction=self._initial_analysis.instruction,
                context=updated_context,
            )

            if new_analysis.overall_confidence > 0.8 or not self.analyzer.should_ask(new_analysis):
                return self.analyzer.proceed_with_assumptions(new_analysis)
            elif self.context.interaction_count >= 3:
                # Limit clarification rounds - proceed after 3 exchanges
                return (
                    "I have enough to proceed. Here's what I'll do:\n" +
                    self.analyzer.proceed_with_assumptions(new_analysis)
                )
            else:
                return self.analyzer.format_clarification_request(new_analysis, max_questions=1)

        return "Ready to proceed."
# ─── Instruction Classifier ────────────────────────────────────────────────────
class InstructionClassifier:
    """
    Quickly classifies instructions by ambiguity level before full analysis.
    Uses keyword heuristics for speed - no LLM call needed.
    """

    HIGH_AMBIGUITY_PATTERNS = [
        "make it", "fix it", "improve it", "update it", "clean it up",
        "make it better", "make it faster", "make it work",
        "do something about", "sort out", "handle the",
    ]

    CLEARLY_SCOPED_PATTERNS = [
        "add a", "create a", "write a", "implement", "add test for",
        "rename", "delete", "move", "refactor", "extract",
    ]

    def quick_classify(self, instruction: str) -> AmbiguityLevel:
        """
        Fast heuristic classification. Use this to decide whether to
        run the full LLM analysis (expensive) or proceed directly.
        """
        lower = instruction.lower()

        for pattern in self.HIGH_AMBIGUITY_PATTERNS:
            if pattern in lower:
                return AmbiguityLevel.HIGH

        # Very short instructions are usually ambiguous
        word_count = len(instruction.split())
        if word_count < 5:
            return AmbiguityLevel.HIGH
        if word_count < 10:
            return AmbiguityLevel.MEDIUM

        for pattern in self.CLEARLY_SCOPED_PATTERNS:
            if lower.startswith(pattern) or f" {pattern}" in lower:
                return AmbiguityLevel.LOW

        return AmbiguityLevel.MEDIUM

    def needs_full_analysis(self, instruction: str) -> bool:
        """Should we run the full LLM analysis? Use for routing."""
        return self.quick_classify(instruction) != AmbiguityLevel.LOW
# ─── Demo ─────────────────────────────────────────────────────────────────────
def demo_ambiguity_detection():
    analyzer = AmbiguityAnalyzer(model="gpt-4o", ask_threshold=0.75)
    classifier = InstructionClassifier()

    test_instructions = [
        "Make the app faster",
        "Add a /health endpoint to the FastAPI app in src/main.py that returns {\"status\": \"ok\"}",
        "Fix the login bug",
        "Migrate the user data to the new schema",
        "Improve our test coverage",
    ]

    print("=" * 60)
    print("AMBIGUITY DETECTION DEMO")
    print("=" * 60)

    for instruction in test_instructions:
        print(f"\n{'─'*50}")
        print(f"Instruction: \"{instruction}\"")

        # Quick classify first (no LLM)
        quick_level = classifier.quick_classify(instruction)
        print(f"Quick classification: {quick_level.value.upper()}")

        if not classifier.needs_full_analysis(instruction):
            print("→ Clearly scoped, proceeding directly")
            continue

        # Full LLM analysis
        analysis = analyzer.analyze(instruction)
        if analyzer.should_ask(analysis, task_duration_minutes=30):
            print("\n" + analyzer.format_clarification_request(analysis))
        else:
            print("\n" + analyzer.proceed_with_assumptions(analysis))


def demo_multi_turn_clarification():
    """Simulate a multi-turn clarification conversation."""
    print("\n" + "=" * 60)
    print("MULTI-TURN CLARIFICATION DEMO")
    print("=" * 60)

    analyzer = AmbiguityAnalyzer(model="gpt-4o")
    handler = ClarificationHandler(analyzer)

    # Initial ambiguous instruction
    instruction = "Make the app faster"
    print(f"\nUser: {instruction}")
    response = handler.start(instruction)
    print(f"\nAgent: {response}")

    # User clarifies
    user_reply = "I mean the API response time - the /search endpoint is taking 3+ seconds"
    print(f"\nUser: {user_reply}")
    response = handler.handle_response(user_reply)
    print(f"\nAgent: {response}")


if __name__ == "__main__":
    demo_ambiguity_detection()
    # demo_multi_turn_clarification()  # Uncomment for conversation demo
Clarification Question Design
Good clarification questions have these properties:
- Single focus: ask about one thing at a time. "Which service, and what's your latency target?" is two questions.
- Specific, not open-ended: "Which service - the API, the frontend, or the database?" is better than "What did you mean?"
- Offer options when possible: giving options reduces cognitive load and surfaces your understanding.
- Order by impact: ask about the highest-stakes ambiguity first. If the user answers that and nothing else, you have the most important information.
Poor question: "Can you clarify what you mean?"
Better question: "Should I optimize the backend API endpoint response time, the database query time, or the frontend page render time?"
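These properties can be applied mechanically once the open ambiguities are listed. The sketch below uses a hypothetical `OpenQuestion` record and ranks impact by reversibility first, then by confidence; that ranking rule is an assumption, one of several reasonable choices.

```python
from dataclasses import dataclass, field


@dataclass
class OpenQuestion:
    """A hypothetical record for one unresolved ambiguity."""
    topic: str
    options: list[str] = field(default_factory=list)
    confidence: float = 0.5  # confidence in our best guess
    reversible: bool = True  # is a wrong guess easy to undo?


def render(q: OpenQuestion) -> str:
    """Build one single-focus question, offering options when we have them."""
    if not q.options:
        return f"Could you clarify {q.topic}?"
    if len(q.options) <= 2:
        listed = " or ".join(q.options)
    else:
        listed = ", ".join(q.options[:-1]) + f", or {q.options[-1]}"
    return f"For {q.topic}: {listed}?"


def pick_questions(questions: list[OpenQuestion], limit: int = 2) -> list[str]:
    """Highest impact first: irreversible before reversible, then lowest confidence."""
    ranked = sorted(questions, key=lambda q: (q.reversible, q.confidence))
    return [render(q) for q in ranked[:limit]]


open_questions = [
    OpenQuestion("the latency target", ["p50", "p99"], confidence=0.6),
    OpenQuestion("the migration cutover", ["in place", "blue-green"],
                 confidence=0.7, reversible=False),
    OpenQuestion("the target service",
                 ["the backend API", "the React frontend", "the database queries"],
                 confidence=0.3),
]
for line in pick_questions(open_questions):
    print(line)
```

The latency target is left out of the two questions asked; it would be inferred and listed as a documented assumption.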
Production Notes
:::warning The Annoyance Threshold
Research on conversational AI systems shows that users tolerate at most 2 clarification questions per task before frustration. Never ask more than 2 questions in a single turn. If you need to resolve 5 ambiguities, ask the 2 most critical ones, infer the rest, and document your assumptions.
:::
:::danger Clarification Loops
Do not let clarification questions loop indefinitely. If after 3 rounds of clarification the agent still has low confidence, proceed with assumptions rather than asking again. An agent that cannot make progress is worse than one that makes a reasonable assumption.
:::
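The round cap can be enforced with a plain bounded loop. In this sketch `get_confidence` and `ask_user` are hypothetical callbacks standing in for the analyzer and the chat interface.

```python
from typing import Callable


def clarify_with_cap(
    get_confidence: Callable[[list[str]], float],
    ask_user: Callable[[int], str],
    threshold: float = 0.7,
    max_rounds: int = 3,
) -> tuple[list[str], bool]:
    """Ask until confident or until max_rounds is reached.

    Returns the collected answers and whether we are proceeding on assumptions.
    """
    answers: list[str] = []
    for round_number in range(1, max_rounds + 1):
        if get_confidence(answers) >= threshold:
            return answers, False
        answers.append(ask_user(round_number))
    # Cap reached: proceed either way, flagging whether confidence is still low
    return answers, get_confidence(answers) < threshold


# Stubbed example: each answer adds only a little confidence, so the cap is hit
answers, on_assumptions = clarify_with_cap(
    get_confidence=lambda given: 0.3 + 0.1 * len(given),
    ask_user=lambda n: f"answer {n}",
)
print(len(answers), on_assumptions)  # 3 True
```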
Context as disambiguation: often the ambiguity resolves itself from context. An instruction sent in a thread about the API server almost certainly refers to the API server. Incorporate conversation history, the current working directory, recently modified files, and user preferences into the context before deciding whether to ask.
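A sketch of assembling that context into a single string that could be passed as the `context` argument of an analyzer. The helper names, the one-hour window, and the three-message history limit are assumptions for illustration.

```python
import time
from pathlib import Path


def recently_modified(root: Path, within_seconds: float = 3600, limit: int = 5) -> list[str]:
    """Relative paths of files under `root` modified recently, newest first."""
    cutoff = time.time() - within_seconds
    recent = [
        (p.stat().st_mtime, p)
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime >= cutoff
    ]
    recent.sort(key=lambda item: item[0], reverse=True)
    return [str(p.relative_to(root)) for _, p in recent[:limit]]


def build_context(history: list[str], root: Path, preferences: dict[str, str]) -> str:
    """Combine working directory, recent files, history, and preferences."""
    parts = [f"Working directory: {root}"]
    files = recently_modified(root)
    if files:
        parts.append("Recently modified files: " + ", ".join(files))
    if history:
        parts.append("Recent messages: " + " | ".join(history[-3:]))
    if preferences:
        prefs = "; ".join(f"{k}={v}" for k, v in sorted(preferences.items()))
        parts.append("User preferences: " + prefs)
    return "\n".join(parts)


print(build_context(["the /search endpoint is slow"], Path.cwd(), {"style": "terse"}))
```

Walking the whole tree with `rglob` is fine for a small project; a real agent would skip directories such as `.git` or read recent changes from version control instead.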
Interview Questions and Answers
Q: How does your agent decide when to ask for clarification vs. proceed?
A: I use a confidence-weighted decision with several factors: overall confidence in my interpretation (ask if < 70%), presence of irreversible actions (always ask if the wrong action cannot be undone), and task duration (longer tasks warrant higher confidence before starting). I also use quick heuristic classification - short or vague instructions get full LLM analysis; clearly scoped instructions proceed directly. The key principle is that the cost of asking must be less than the expected cost of doing the wrong thing.
Q: What are the failure modes of an overly cautious agent?
A: Several. First, it creates friction that makes users avoid the agent or override it with more explicit instructions. Second, it trains users to write exhaustive specifications upfront, eliminating the value of natural language interaction. Third, it wastes time on low-stakes decisions that obviously have a correct answer. Fourth, it signals that the agent does not understand context - a real assistant would infer "make the app faster" means the system we've been discussing, not every possible app. The solution is to set appropriate confidence thresholds, use context aggressively, and default to proceeding with documented assumptions for medium-confidence scenarios.
Q: How do you design good clarification questions?
A: Three principles: single focus (one question at a time), specific rather than open-ended (offer interpretations as options, not "what did you mean?"), and ordered by impact (most critical ambiguity first). I also limit to 2 questions per turn and explain what I'll assume for everything else, so the user knows what I'll do without their input. This gives them the choice to answer additional questions or let me proceed.
Q: How do you handle instruction ambiguity in a production system with thousands of users?
A: At scale, I build a feedback loop. Log every case where the agent asked for clarification and what the user answered. Use this to: (1) identify frequently ambiguous instruction patterns and build explicit handling for them, (2) calibrate confidence thresholds by measuring how often the agent's assumptions were correct when it did not ask, and (3) personalize - some users write terse instructions and prefer the agent to infer; others write detailed specs and expect precise execution. The system learns user preferences over time.
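Point (2) can be sketched as a small calibration routine over logged decisions. The `AssumptionLog` schema, the candidate thresholds, and the 90% accuracy target are assumptions for the example, and the log entries are made-up illustrative values.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class AssumptionLog:
    """One logged case where the agent proceeded without asking (hypothetical schema)."""
    confidence: float         # confidence reported at decision time
    assumption_correct: bool  # did the user accept the result without redirecting?


def accuracy_above(logs: list[AssumptionLog], threshold: float) -> Optional[float]:
    """Share of correct assumptions among cases at or above `threshold`."""
    selected = [log for log in logs if log.confidence >= threshold]
    if not selected:
        return None
    return sum(log.assumption_correct for log in selected) / len(selected)


def calibrate_threshold(
    logs: list[AssumptionLog],
    candidates: list[float],
    target_accuracy: float = 0.9,
) -> Optional[float]:
    """Lowest candidate threshold whose observed accuracy meets the target."""
    for threshold in sorted(candidates):
        accuracy = accuracy_above(logs, threshold)
        if accuracy is not None and accuracy >= target_accuracy:
            return threshold
    return None


# Illustrative data only
logs = [
    AssumptionLog(0.55, False), AssumptionLog(0.65, True), AssumptionLog(0.72, False),
    AssumptionLog(0.78, True), AssumptionLog(0.85, True), AssumptionLog(0.90, True),
    AssumptionLog(0.95, True),
]
print(calibrate_threshold(logs, candidates=[0.6, 0.7, 0.8]))  # 0.8
```

Because these logs only cover cases where the agent chose not to ask, they say little about confidence levels below the current threshold; lowering it safely needs deliberate sampling.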
Q: What is the bounded execution strategy and when is it better than asking upfront?
A: Bounded execution means starting on the unambiguous parts of a task, proceeding until you hit a decision point that requires clarification, then pausing and asking at that natural breakpoint. It is better than asking upfront when: (1) the ambiguous decision is deep in the task and may never be reached (the task might complete without needing that clarification), (2) by the time you reach the decision point you will have more context that makes the decision obvious, (3) the user prefers to see progress before being asked questions. It is worse than asking upfront when: the ambiguous decision affects the direction of all the work, so you could spend significant effort going the wrong direction before reaching the decision point.
