Module 05: Long-Horizon Planning

The Problem Most Agents Cannot Solve

Ask a single LLM to "build a REST API with authentication, tests, CI/CD, and documentation." It will produce a plausible-looking answer, but it will not actually build the software. It cannot, because a single context window and a single generation pass cannot hold the full complexity of a real engineering task.

Long-horizon tasks are multi-step, stateful, and require managing uncertainty over time. They are where most agents break down. They are also where the real value is.

A long-horizon task has several defining characteristics:

Multiple steps with dependencies - step 5 cannot start until step 3 is complete
Accumulated state - later steps depend on the outputs of earlier ones
Emergent sub-problems - you discover new requirements as you execute
Failure and recovery - something will go wrong; the agent must handle it
Time horizon measured in minutes or hours, not seconds
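The first two characteristics, dependencies and accumulated state, can be made concrete with a small dependency graph. The sketch below uses Python's standard-library graphlib module (Python 3.9+) to compute one valid execution order. The subtask names are hypothetical, chosen to match the REST API example above; they are not taken from the lessons.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks for the REST API example. Each key maps to the set
# of subtasks that must finish before it can start.
dependencies = {
    "design_schema": set(),
    "implement_endpoints": {"design_schema"},
    "add_authentication": {"implement_endpoints"},
    "write_tests": {"implement_endpoints", "add_authentication"},
    "configure_ci": {"write_tests"},
    "write_docs": {"implement_endpoints"},
}


def execution_order(deps):
    """Return one valid order in which every task follows its prerequisites.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle,
    which is a useful early signal that a plan is not executable.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

More than one order can be valid here (for example, write_docs may come before or after add_authentication), which is what makes parallel execution of independent subtasks possible.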

This module teaches you to build agents that can operate reliably across all of it.

Why Long-Horizon Planning Is Hard

The failure modes agents hit on long tasks are not bugs - they are structural properties of LLMs operating over many steps. The fix is not a better prompt. It is a better architecture.

Module Map

Lesson	Title	What You Learn
01	Task Decomposition	Break complex goals into dependency graphs; hierarchical decomposition; parallel vs sequential tasks
02	Planning with LLMs	Zero-shot, CoT, Tree of Thoughts, ReWOO planning; when plans fail and how to replan
03	Checkpointing and Recovery	Save agent state mid-run; resume after failures; idempotent actions; distributed checkpointing
04	Handling Ambiguity and Clarification	Detect ambiguous instructions; when to ask vs infer; clarification question design
05	Interruption and Human-in-the-Loop	Safe vs dangerous actions; async approval workflows; Slack-based HITL; resuming after pause
06	Evaluation of Long-Horizon Tasks	Trajectory evaluation; efficiency and path quality metrics; LLM-as-judge; benchmark datasets

What You Will Build

By the end of this module, you will have implemented:

A hierarchical task decomposer - takes a natural language goal, produces a dependency-ordered DAG of subtasks
A Tree of Thoughts planner - explores multiple plan branches and selects the best path forward
A checkpoint-and-recovery system - SQLite-backed state persistence that survives process crashes
An ambiguity detector - confidence-scored analysis of goal clarity with targeted clarification questions
A human-in-the-loop interrupt layer - configurable approval thresholds with async notification
A trajectory evaluator - step-by-step scoring of agent runs across multiple quality dimensions

These components compose into a production-grade long-horizon agent framework.
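To give a feel for the checkpoint-and-recovery component listed above, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under stated assumptions, not the implementation from Lesson 03: the CheckpointStore class, the run_steps helper, and the table layout are all hypothetical names introduced for this example.

```python
import json
import sqlite3


class CheckpointStore:
    """Persist per-step outputs so a restarted run can skip finished steps."""

    def __init__(self, path=":memory:"):
        # ":memory:" is for demonstration only; pass a file path so that
        # checkpoints actually survive a process crash.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "run_id TEXT, step TEXT, output TEXT, "
            "PRIMARY KEY (run_id, step))"
        )
        self.conn.commit()

    def save(self, run_id, step, output):
        # INSERT OR REPLACE makes saving the same step twice harmless.
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
                (run_id, step, json.dumps(output)),
            )

    def load(self, run_id):
        rows = self.conn.execute(
            "SELECT step, output FROM checkpoints WHERE run_id = ?", (run_id,)
        )
        return {step: json.loads(output) for step, output in rows}


def run_steps(store, run_id, steps):
    """Run (name, fn) pairs in order, skipping any step already checkpointed.

    Each fn receives the dict of outputs accumulated so far.
    """
    done = store.load(run_id)
    for name, fn in steps:
        if name in done:
            continue  # finished in an earlier attempt; do not redo it
        done[name] = fn(done)
        store.save(run_id, name, done[name])
    return done
```

Calling run_steps a second time with the same run_id returns the stored outputs without re-executing any completed step, which is the behavior a resumed run depends on.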

Prerequisites

You should have completed Module 04 (Tool Use and Function Calling) before this module. You will be calling tools inside planning loops, and the examples assume familiarity with tool schemas and the ReAct loop from Module 01.

The Mental Model

Think of a long-horizon agent as a project manager + engineer hybrid:

The planner breaks work down. The executor does the work. The state manager ensures nothing is lost. The replanner handles the inevitable surprises. Human oversight gates the dangerous decisions.
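The five roles above can be sketched as a single control loop. This is a deliberately small illustration, assuming each role is a plain Python callable. The AgentLoop class and all of its field names are hypothetical, and a real agent would back these callables with LLM calls, tools, and persistent storage.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentLoop:
    plan: Callable[[str], list[str]]               # planner: goal -> ordered steps
    execute: Callable[[str], str]                  # executor: step -> result
    replan: Callable[[str, Exception], list[str]]  # replanner: failed step -> new steps
    is_dangerous: Callable[[str], bool]            # which steps need a human gate
    approve: Callable[[str], bool]                 # human oversight: step -> allowed?
    state: dict[str, str] = field(default_factory=dict)  # state manager: step -> result
    max_replans: int = 3                           # stop replanning after this many failures

    def run(self, goal):
        queue = self.plan(goal)
        replans = 0
        while queue:
            step = queue.pop(0)
            # Gate dangerous steps behind explicit approval.
            if self.is_dangerous(step) and not self.approve(step):
                self.state[step] = "skipped: not approved"
                continue
            try:
                self.state[step] = self.execute(step)
            except Exception as err:
                replans += 1
                if replans > self.max_replans:
                    raise  # give up rather than loop forever
                # Put the replacement steps ahead of the remaining work.
                queue = self.replan(step, err) + queue
        return self.state
```

The max_replans bound is a design choice worth keeping even in a sketch: without it, a step that always fails would be replanned indefinitely.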

Core Principle

Long-horizon agents do not succeed by being smarter. They succeed by being systematic. The agent that checkpoints, replans, and asks for help when stuck will consistently outperform the agent that charges ahead with confidence.

Let's build that agent.
