
Module 05: Long-Horizon Planning

The Problem Most Agents Cannot Solve

Ask a single LLM call to "build a REST API with authentication, tests, CI/CD, and documentation." It will hallucinate a plausible-looking answer, but it will not actually build the software. It cannot, because a single context window and a single generation pass cannot hold the full complexity of a real engineering task.

Long-horizon tasks are multi-step, stateful, and require managing uncertainty over time. They are where most agents break down. They are also where the real value is.

A long-horizon task has several defining characteristics:

  • Multiple steps with dependencies - step 5 cannot start until step 3 is complete
  • Accumulated state - later steps depend on the outputs of earlier ones
  • Emergent sub-problems - you discover new requirements as you execute
  • Failure and recovery - something will go wrong; the agent must handle it
  • Time horizon measured in minutes or hours, not seconds
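A constraint like "step 5 cannot start until step 3 is complete" turns a set of subtasks into a directed acyclic graph (DAG). The sketch below is a minimal illustration and assumes Python 3.9+ with its standard-library `graphlib` module. It groups subtasks into batches that respect those dependencies. The subtask names and the `execution_batches` helper are hypothetical examples, not part of any framework.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks for "build a REST API"; each key maps to the set of
# subtasks that must finish before it can start.
dependencies = {
    "design_schema": set(),
    "write_models": {"design_schema"},
    "add_auth": {"write_models"},
    "write_endpoints": {"write_models"},
    "write_tests": {"add_auth", "write_endpoints"},
    "setup_ci": {"write_tests"},
    "write_docs": {"write_endpoints"},
}


def execution_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into batches: every task in a batch has all of its
    dependencies satisfied by earlier batches, so a batch can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch as finished
    return batches


for i, batch in enumerate(execution_batches(dependencies)):
    print(i, batch)
```

With this graph, `add_auth` and `write_endpoints` land in the same batch and could run concurrently. `setup_ci` only becomes ready after `write_tests` has finished. If a decomposition accidentally contains a cycle, `prepare()` raises `graphlib.CycleError`, which gives an early signal that the plan itself is broken.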

This module teaches you to build agents that operate reliably under all of these conditions.


Why Long-Horizon Planning Is Hard

Agents break down on these tasks for reasons that are not bugs - they are structural properties of LLMs operating on long tasks. The fix is not a better prompt. It is a better architecture.


Module Map

Lesson | Title | What You Learn
-------|-------|---------------
01 | Task Decomposition | Break complex goals into dependency graphs; hierarchical decomposition; parallel vs sequential tasks
02 | Planning with LLMs | Zero-shot, CoT, Tree of Thoughts, ReWOO planning; when plans fail and how to replan
03 | Checkpointing and Recovery | Save agent state mid-run; resume after failures; idempotent actions; distributed checkpointing
04 | Handling Ambiguity and Clarification | Detect ambiguous instructions; when to ask vs infer; clarification question design
05 | Interruption and Human-in-the-Loop | Safe vs dangerous actions; async approval workflows; Slack-based HITL; resuming after pause
06 | Evaluation of Long-Horizon Tasks | Trajectory evaluation; efficiency and path quality metrics; LLM-as-judge; benchmark datasets

What You Will Build

By the end of this module, you will have implemented:

  1. A hierarchical task decomposer - takes a natural language goal, produces a dependency-ordered DAG of subtasks
  2. A Tree of Thoughts planner - explores multiple plan branches and selects the best path forward
  3. A checkpoint-and-recovery system - SQLite-backed state persistence that survives process crashes
  4. An ambiguity detector - confidence-scored analysis of goal clarity with targeted clarification questions
  5. A human-in-the-loop interrupt layer - configurable approval thresholds with async notification
  6. A trajectory evaluator - step-by-step scoring of agent runs across multiple quality dimensions

These components compose into a production-grade long-horizon agent framework.
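The following minimal sketch previews the third component: SQLite-backed checkpointing that uses only Python's built-in `sqlite3` and `json` modules. The table layout, the `CheckpointStore` class, the `run` helper and their method names are hypothetical choices for illustration. They are not the implementation built later in the module. The demo uses an in-memory database to stay self-contained. A file path is what actually lets state survive a process crash.

```python
import json
import sqlite3
from typing import Optional


class CheckpointStore:
    """Hypothetical minimal store: one JSON state snapshot per (run_id, step)."""

    def __init__(self, path: str = "checkpoints.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "run_id TEXT NOT NULL, step INTEGER NOT NULL, state TEXT NOT NULL, "
            "PRIMARY KEY (run_id, step))"
        )
        self.conn.commit()

    def save(self, run_id: str, step: int, state: dict) -> None:
        # INSERT OR REPLACE makes re-saving the same step harmless (idempotent).
        with self.conn:  # commits on success, rolls back if an exception is raised
            self.conn.execute(
                "INSERT OR REPLACE INTO checkpoints (run_id, step, state) VALUES (?, ?, ?)",
                (run_id, step, json.dumps(state)),
            )

    def latest(self, run_id: str) -> Optional[tuple[int, dict]]:
        """Return (step, state) for the newest checkpoint, or None for a fresh run."""
        row = self.conn.execute(
            "SELECT step, state FROM checkpoints WHERE run_id = ? "
            "ORDER BY step DESC LIMIT 1",
            (run_id,),
        ).fetchone()
        return None if row is None else (row[0], json.loads(row[1]))


def run(store: CheckpointStore, run_id: str, steps: list[str]) -> dict:
    """Run steps in order, resuming after the last saved checkpoint if any."""
    resumed = store.latest(run_id)
    start, state = (resumed[0] + 1, resumed[1]) if resumed else (0, {"done": []})
    for i in range(start, len(steps)):
        state["done"].append(steps[i])  # stand-in for real work on step i
        store.save(run_id, i, state)    # checkpoint after every completed step
    return state


# Simulate a crash after step 0, then resume. Use a file path in practice;
# ":memory:" keeps this demo self-contained.
store = CheckpointStore(":memory:")
store.save("run-1", 0, {"done": ["plan"]})
print(run(store, "run-1", ["plan", "build", "test"]))  # {'done': ['plan', 'build', 'test']}
```

A crash can land between finishing a step's work and saving its checkpoint. In that case, the step runs again on resume, so the steps themselves should be idempotent.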


Prerequisites

You should have completed Module 04 (Tool Use and Function Calling) before this module. You will be calling tools inside planning loops, and the examples assume familiarity with tool schemas and the ReAct loop from Module 01.


The Mental Model

Think of a long-horizon agent as a hybrid of project manager and engineer.

The planner breaks work down. The executor does the work. The state manager ensures nothing is lost. The replanner handles the inevitable surprises. Human oversight gates the dangerous decisions.
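The five roles can be sketched as a single control loop. This is a minimal illustration that assumes each role is supplied as a plain Python callable. The names `Step`, `AgentState`, `run_agent`, the `dangerous` flag and the fixed replan budget are all hypothetical. A real state manager would also persist `AgentState` rather than keep it in memory.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    dangerous: bool = False  # hypothetical flag: needs human approval before running


@dataclass
class AgentState:
    completed: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


def run_agent(
    goal: str,
    plan: Callable[[str], list[Step]],                       # planner
    execute: Callable[[Step], bool],                         # executor: True on success
    replan: Callable[[str, AgentState, Step], list[Step]],   # replanner
    approve: Callable[[Step], bool],                         # human oversight gate
    max_replans: int = 2,
) -> AgentState:
    """Hypothetical control loop wiring the five roles together."""
    state = AgentState()  # state manager: in memory here; persist it in practice
    queue = plan(goal)
    replans = 0
    while queue:
        step = queue.pop(0)
        if step.dangerous and not approve(step):
            state.log.append(f"halted: {step.name} not approved")
            break  # stop rather than guess when a human says no
        if execute(step):
            state.completed.append(step.name)
            state.log.append(f"done {step.name}")
            continue
        state.log.append(f"failed {step.name}")
        if replans >= max_replans:
            state.log.append("giving up: replan budget exhausted")
            break
        replans += 1
        queue = replan(goal, state, step)  # replace the remaining plan
    return state


# Usage with stub roles: tests fail once, and the replanner inserts a fix step.
attempts = {"run_tests": 0}


def flaky_execute(step: Step) -> bool:
    if step.name == "run_tests":
        attempts["run_tests"] += 1
        return attempts["run_tests"] > 1  # fails on the first attempt only
    return True


state = run_agent(
    "ship feature",
    plan=lambda goal: [Step("write_code"), Step("run_tests"), Step("deploy", dangerous=True)],
    execute=flaky_execute,
    replan=lambda goal, st, failed: [Step("fix_code"), failed, Step("deploy", dangerous=True)],
    approve=lambda step: True,  # stand-in for a real approval workflow
)
print(state.completed)  # ['write_code', 'fix_code', 'run_tests', 'deploy']
```

The bounded replan budget is a deliberate design choice. Once the budget is exhausted, the agent stops and leaves a log entry, so a human can step in. It does not keep retrying indefinitely.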


Core Principle

Long-horizon agents do not succeed by being smarter. They succeed by being systematic. The agent that checkpoints, replans, and asks for help when stuck will consistently outperform the agent that charges ahead confidently.

Let's build that agent.

© 2026 EngineersOfAI. All rights reserved.