Module 05: Long-Horizon Planning
The Problem Most Agents Cannot Solve
Ask a single LLM to "build a REST API with authentication, tests, CI/CD, and documentation." It will produce a plausible-looking answer, but it will not actually build the software. It cannot, because a single context window and a single generation pass cannot hold the full complexity of a real engineering task.
Long-horizon tasks are multi-step, stateful, and require managing uncertainty over time. They are where most agents break down. They are also where the real value is.
A long-horizon task has several defining characteristics:
- Multiple steps with dependencies - step 5 cannot start until step 3 is complete
- Accumulated state - later steps depend on the outputs of earlier ones
- Emergent sub-problems - you discover new requirements as you execute
- Failure and recovery - something will go wrong; the agent must handle it
- Time horizon measured in minutes or hours, not seconds
This module teaches you to build agents that can operate reliably across all of it.
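To make the first characteristic concrete, here is a minimal sketch of dependency-ordered scheduling using Python's standard-library `graphlib`. The subtask names and the `execution_batches` helper are illustrative assumptions, not part of the module's code. Each batch contains tasks whose dependencies are already complete, so tasks within a batch could in principle run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks for the "build a REST API" goal.
# Each key maps to the set of tasks that must finish before it can start.
dependencies = {
    "design_schema": set(),
    "implement_auth": {"design_schema"},
    "implement_endpoints": {"design_schema"},
    "write_tests": {"implement_auth", "implement_endpoints"},
    "configure_ci": {"write_tests"},
    "write_docs": {"implement_endpoints"},
}


def execution_batches(deps):
    """Group tasks into batches; each batch depends only on earlier batches."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch complete to unlock dependents
    return batches


batches = execution_batches(dependencies)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

A real agent would execute each task between `get_ready()` and `done()`, and would have to decide what happens to downstream tasks when one fails; this sketch only computes the ordering.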
Why Long-Horizon Planning Is Hard
These failure modes are not bugs - they are structural properties of LLMs operating on long tasks. The fix is not a better prompt. It is a better architecture.
Module Map
| Lesson | Title | What You Learn |
|---|---|---|
| 01 | Task Decomposition | Break complex goals into dependency graphs; hierarchical decomposition; parallel vs sequential tasks |
| 02 | Planning with LLMs | Zero-shot, CoT, Tree of Thoughts, ReWOO planning; when plans fail and how to replan |
| 03 | Checkpointing and Recovery | Save agent state mid-run; resume after failures; idempotent actions; distributed checkpointing |
| 04 | Handling Ambiguity and Clarification | Detect ambiguous instructions; when to ask vs infer; clarification question design |
| 05 | Interruption and Human-in-the-Loop | Safe vs dangerous actions; async approval workflows; Slack-based HITL; resuming after pause |
| 06 | Evaluation of Long-Horizon Tasks | Trajectory evaluation; efficiency and path quality metrics; LLM-as-judge; benchmark datasets |
What You Will Build
By the end of this module, you will have implemented:
- A hierarchical task decomposer - takes a natural language goal, produces a dependency-ordered DAG of subtasks
- A Tree of Thoughts planner - explores multiple plan branches and selects the best path forward
- A checkpoint-and-recovery system - SQLite-backed state persistence that survives process crashes
- An ambiguity detector - confidence-scored analysis of goal clarity with targeted clarification questions
- A human-in-the-loop interrupt layer - configurable approval thresholds with async notification
- A trajectory evaluator - step-by-step scoring of agent runs across multiple quality dimensions
These components compose into a production-grade long-horizon agent framework.
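As a taste of the checkpoint-and-recovery idea, the sketch below persists the index of the next step and the accumulated state to SQLite after every step, then resumes from that point after a simulated crash. The `CheckpointStore` class, the `checkpoints` table layout, and the `run` helper are assumptions made for illustration; the system built in the lessons may look quite different.

```python
import json
import sqlite3


class CheckpointStore:
    """Hypothetical store: one row per run, holding the next step and the state."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the demo self-contained; pass a file path
        # if the checkpoint must survive a real process crash.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "run_id TEXT PRIMARY KEY, step INTEGER NOT NULL, state TEXT NOT NULL)"
        )
        self.conn.commit()

    def save(self, run_id, step, state):
        with self.conn:  # commits on success, rolls back on exception
            self.conn.execute(
                "INSERT OR REPLACE INTO checkpoints (run_id, step, state) "
                "VALUES (?, ?, ?)",
                (run_id, step, json.dumps(state)),
            )

    def load(self, run_id):
        row = self.conn.execute(
            "SELECT step, state FROM checkpoints WHERE run_id = ?", (run_id,)
        ).fetchone()
        if row is None:
            return 0, {}  # no checkpoint yet: start from the first step
        return row[0], json.loads(row[1])


def run(store, run_id, steps, fail_at=None):
    """Execute steps in order, checkpointing after each one."""
    next_step, state = store.load(run_id)
    for i in range(next_step, len(steps)):
        if i == fail_at:
            raise RuntimeError(f"simulated crash before step {i}")
        state[steps[i]] = "done"  # stand-in for real work
        store.save(run_id, i + 1, state)
    return state


steps = ["decompose", "plan", "execute", "evaluate"]
store = CheckpointStore()
try:
    run(store, "demo", steps, fail_at=2)
except RuntimeError:
    pass  # the first two steps are already checkpointed

resumed_from, _ = store.load("demo")
final_state = run(store, "demo", steps)  # picks up at step index 2
```

Resuming this way is only safe if re-running a partially completed step is harmless, which is why the module pairs checkpointing with idempotent actions.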
Prerequisites
You should have completed Module 04 (Tool Use and Function Calling) before this module. You will be calling tools inside planning loops, and the examples assume familiarity with tool schemas and the ReAct loop from Module 01.
The Mental Model
Think of a long-horizon agent as a hybrid of project manager and engineer.
The planner breaks work down. The executor does the work. The state manager ensures nothing is lost. The replanner handles the inevitable surprises. Human oversight gates the dangerous decisions.
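The five roles can be sketched as one control loop. Everything here is a hypothetical illustration: `run_agent`, `AgentState`, and the stub callables stand in for components built later in the module, and a real planner or replanner would call an LLM rather than return fixed lists.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """State manager stand-in: records what finished and what went wrong."""
    completed: list = field(default_factory=list)
    log: list = field(default_factory=list)


def run_agent(goal, plan, execute, replan, approve, is_dangerous, max_replans=3):
    state = AgentState()
    queue = list(plan(goal))  # planner: break the goal into steps
    replans = 0
    while queue:
        step = queue.pop(0)
        # Human oversight: dangerous steps need explicit approval.
        if is_dangerous(step) and not approve(step):
            state.log.append(f"skipped (not approved): {step}")
            continue
        try:
            execute(step)  # executor: do the work
            state.completed.append(step)
        except Exception as exc:
            state.log.append(f"failed: {step} ({exc})")
            if replans >= max_replans:
                raise
            replans += 1
            # Replanner: replace the remaining plan after a surprise.
            queue = list(replan(goal, state, step))
    return state


# Stub components for a tiny demonstration.
attempts = {}


def execute(step):
    attempts[step] = attempts.get(step, 0) + 1
    if step == "run tests" and attempts[step] == 1:
        raise RuntimeError("2 tests failed")


def replan(goal, state, failed_step):
    return ["fix failing tests", failed_step, "deploy to prod"]


result = run_agent(
    goal="ship the API",
    plan=lambda goal: ["write code", "run tests", "deploy to prod"],
    execute=execute,
    replan=replan,
    approve=lambda step: False,  # the human declines in this demo
    is_dangerous=lambda step: step.startswith("deploy"),
)
```

In this run the first test attempt fails, the replanner inserts a fix step, and the deployment is skipped because approval is withheld. Capping replans with `max_replans` is one simple way to keep a stuck agent from looping forever.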
Core Principle
Long-horizon agents do not succeed by being smarter. They succeed by being systematic. The agent that checkpoints, replans, and asks for help when stuck will consistently outperform the agent that charges ahead confidently.
Let's build that agent.
