Module 04: Coding Agents
Coding agents are among the most commercially successful forms of agentic AI deployed today. GitHub Copilot passed 1.3 million paid subscribers by early 2024. Cursor grew to tens of thousands of paying developers in months. Devin drew wide attention in March 2024 when Cognition reported that it resolved 13.86% of the issues in its SWE-bench evaluation (a sampled subset of the benchmark) without human assistance - one of the first demonstrations of an AI system autonomously fixing real GitHub issues at meaningful scale. Claude Code is used by engineers at Anthropic and externally to tackle complex multi-file engineering tasks that would take a human engineer hours.
These are not autocomplete tools. They are agents: systems that perceive their environment (the codebase), reason about a task, plan a sequence of actions, execute those actions using tools, observe the results, and iterate until the task is complete or they exhaust their context.
Understanding how coding agents work is one of the highest-leverage things you can learn right now in AI engineering. Most software organizations are likely to adopt them, and many will build their own, specialized for their stack.
What Makes Coding Agents Different
Standard code generation - classic Copilot completions or a one-off ChatGPT prompt - is single-shot. You give it a prompt, it gives you code. Done. The model keeps no record of earlier attempts, and it cannot run its own output to see whether it works.
Coding agents are stateful and multi-step:
- They read the codebase to understand context
- They plan a sequence of changes
- They execute edits using tools (read file, write file, run tests)
- They observe output (test results, errors, linter output)
- They iterate - adjusting their plan based on what they learn
This loop can run for dozens or hundreds of steps. The agent is not just generating code; it is engineering software.
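The loop described above can be sketched in a few lines. This is a minimal illustration, not how any particular product implements it: the `Action` type, `run_agent_loop`, the in-memory `files` dict, and the `scripted_policy` function are all hypothetical names, and the scripted policy stands in for the LLM call so the example runs without network access.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One step chosen by the policy: a tool name plus its arguments."""
    tool: str                      # tool to call, or "finish" to stop
    args: dict = field(default_factory=dict)


def run_agent_loop(decide, tools, task, max_steps=20):
    """Plan -> act -> observe until the policy finishes or steps run out.

    decide(history) returns an Action; tools maps a name to a callable
    that returns a string observation. Both are assumptions of this sketch.
    """
    history = [("task", task)]
    for _ in range(max_steps):
        action = decide(history)
        if action.tool == "finish":
            break
        try:
            observation = tools[action.tool](**action.args)
        except Exception as exc:   # errors are fed back as observations
            observation = f"error: {exc}"
        history.append((action.tool, observation))
    return history


# A toy "codebase" held in memory, with a deliberate bug in add().
files = {"calc.py": "def add(a, b):\n    return a - b\n"}


def read_file(path):
    return files[path]


def write_file(path, content):
    files[path] = content
    return f"wrote {path}"


def run_tests():
    namespace = {}
    exec(files["calc.py"], namespace)   # load the current version of the file
    return "PASS" if namespace["add"](2, 3) == 5 else "FAIL: add(2, 3) != 5"


def scripted_policy(history):
    """Hypothetical stand-in for the model: reacts to the last observation."""
    last_tool, last_obs = history[-1]
    if last_tool == "task":
        return Action("run_tests")
    if last_tool == "run_tests" and last_obs.startswith("FAIL"):
        return Action("read_file", {"path": "calc.py"})
    if last_tool == "read_file":
        fixed = last_obs.replace("a - b", "a + b")
        return Action("write_file", {"path": "calc.py", "content": fixed})
    if last_tool == "write_file":
        return Action("run_tests")
    return Action("finish")


tools = {"read_file": read_file, "write_file": write_file, "run_tests": run_tests}
history = run_agent_loop(scripted_policy, tools, "make add() return the sum")
print([tool for tool, _ in history])
# ['task', 'run_tests', 'read_file', 'write_file', 'run_tests']
```

In a real agent, `decide` would wrap a model call and the tools would touch the file system and a test runner. The control flow stays the same: each observation becomes input to the next decision.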
Lesson Map
Key Concepts in This Module
The coding loop - The core perception-reasoning-action cycle adapted for software engineering tasks. Understanding this loop is the foundation for everything else.
Context management - Large codebases do not fit in a context window. Coding agents use repo maps, embeddings, and retrieval to bring only relevant code into context.
Edit strategies - There are multiple ways to modify a file: replace the whole thing, apply a unified diff, use search-and-replace, or manipulate the AST. Each has trade-offs you need to understand deeply.
Tool design - A coding agent's capability ceiling is set by its tools. You will implement a complete tool set from scratch.
Test-driven iteration - Running tests after every edit and feeding the output back into the next LLM call is the most powerful technique available. Tests are ground truth.
SWE-bench - The standard evaluation framework for coding agents. You need to understand what these benchmark numbers mean and how to interpret claims about agent capability.
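Of the concepts above, the search-and-replace edit strategy is compact enough to sketch here. The version below is one possible design, not a reference implementation: the `EditError` class and `apply_search_replace` function are hypothetical names, and the rule that the search text must match exactly once is an assumption chosen to keep edits unambiguous.

```python
class EditError(ValueError):
    """Raised when an edit cannot be applied safely."""


def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Replace `search` with `replace`, but only if it matches exactly once.

    Rejecting zero or multiple matches gives the agent a clear error to
    react to instead of silently editing the wrong place.
    """
    if not search:
        raise EditError("search text must not be empty")
    count = source.count(search)
    if count == 0:
        raise EditError("search text not found")
    if count > 1:
        raise EditError(f"search text is ambiguous: {count} matches")
    return source.replace(search, replace, 1)


original = "def area(w, h):\n    return w + h\n"
patched = apply_search_replace(original, "return w + h", "return w * h")
print(patched)

try:
    apply_search_replace("x = 1\nx = 1\n", "x = 1", "x = 2")
except EditError as exc:
    print(f"rejected: {exc}")
```

The error message can be returned to the model as a tool result, so that a failed edit prompts a retry with more surrounding context rather than a corrupted file.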
Prerequisites
Before this module, you should have completed:
- Module 01: Foundations - Agent architectures, ReAct loop, tool use basics
- Module 02: Tools & Function Calling - Tool design patterns, Anthropic API tool use
- Module 03: Memory & Knowledge - Context management, retrieval
You should be comfortable with:
- Python at an intermediate level - classes, subprocess, file I/O
- The Anthropic Python SDK (anthropic.Anthropic().messages.create(...))
- Basic software engineering concepts (AST, grep, git, pytest)
What You Will Understand After This Module
By the end of this module, you will be able to:
- Explain the architecture behind tools such as Claude Code, Cursor, and Devin
- Interpret SWE-bench scores and understand what they actually measure
- Implement the four major edit strategies and know when to use each
- Build a complete, working tool set for a coding agent
- Implement a test-driven agent loop that uses pytest output as its feedback signal
- Build a functional coding agent from scratch using the Anthropic API
This is not theoretical. Every lesson includes complete, runnable Python code. By the end, you will have a coding agent you built yourself.
Commercial Landscape (2025–2026)
| Product | Approach | SWE-bench Verified |
|---|---|---|
| Claude Code | Terminal-native, multi-tool, extended thinking | ~57% |
| Devin (Cognition) | Browser + terminal + full VM | 13.86% (original SWE-bench subset, not Verified) |
| Cursor Agent | IDE-integrated, codebase indexing | Not published |
| Copilot Workspace | GitHub-native, plan-then-edit | Not published |
| OpenHands (formerly OpenDevin) | Open-source, sandboxed execution | ~35%+ |
| SWE-agent (Princeton) | Structured agent-computer interface | ~12-18% (original SWE-bench and SWE-bench Lite, not Verified) |
The field moves fast. Understanding the architecture means you can track these numbers intelligently rather than just watching headlines.
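To read these percentages critically, it helps to see how a SWE-bench-style score is computed. An instance counts as resolved only when the tests that were failing before the fix now pass and the tests that were already passing still pass. The sketch below illustrates that rule with made-up data. The `is_resolved` and `resolve_rate` functions, the dictionary keys, and the four demo instances are hypothetical and are not taken from the official evaluation harness.

```python
def is_resolved(fail_to_pass, pass_to_pass, passed_after_patch) -> bool:
    """True if every required test passes once the agent's patch is applied.

    fail_to_pass: tests the fix is supposed to turn green.
    pass_to_pass: tests that must keep passing (no regressions).
    """
    required = set(fail_to_pass) | set(pass_to_pass)
    return required.issubset(passed_after_patch)


def resolve_rate(results: list[dict]) -> float:
    """Percentage of instances resolved; 0.0 for an empty run."""
    if not results:
        return 0.0
    resolved = sum(
        is_resolved(r["fail_to_pass"], r["pass_to_pass"], r["passed"])
        for r in results
    )
    return 100.0 * resolved / len(results)


# Made-up results for four instances (illustrative only).
results = [
    # Fix works and nothing else broke: resolved.
    {"id": "demo-1", "fail_to_pass": ["test_fix"], "pass_to_pass": ["test_old"],
     "passed": ["test_fix", "test_old"]},
    # Fix works but a previously passing test now fails: not resolved.
    {"id": "demo-2", "fail_to_pass": ["test_fix"], "pass_to_pass": ["test_old"],
     "passed": ["test_fix"]},
    # Target test still fails: not resolved.
    {"id": "demo-3", "fail_to_pass": ["test_fix"], "pass_to_pass": ["test_old"],
     "passed": ["test_old"]},
    # Resolved.
    {"id": "demo-4", "fail_to_pass": ["test_a", "test_b"], "pass_to_pass": [],
     "passed": ["test_a", "test_b"]},
]

rate = resolve_rate(results)
print(f"{rate:.2f}% resolved")
```

Because the score is a simple ratio, it depends heavily on which instances are in the denominator. That is why results on the full benchmark, a sampled subset, SWE-bench Lite, and SWE-bench Verified are not directly comparable.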
Let's Build
Start with Lesson 01 to understand the complete architecture of how coding agents reason about and modify codebases. Then proceed through the lessons in order - each one builds on the last, culminating in a complete coding agent you built yourself.
