Module 04: Coding Agents

Coding agents are the most commercially successful form of agentic AI deployed today. GitHub Copilot passed 1.3 million paid subscribers by early 2024. Cursor grew to tens of thousands of paying developers in months. In March 2024, Cognition reported that Devin resolved 13.86% of SWE-bench issues end-to-end without human assistance (on a random 25% subset of the benchmark), far above the previous unassisted best of 1.96%. Engineers inside and outside Anthropic use Claude Code for complex multi-file engineering tasks that would take a human engineer hours.

These are not autocomplete tools. They are agents: systems that perceive their environment (the codebase), reason about a task, plan a sequence of actions, execute those actions using tools, observe the results, and iterate until the task is complete or they exhaust their context.

Understanding how coding agents work is one of the highest-leverage skills in AI engineering right now. Most software organizations will use them, and many will build their own, specialized for their stack.

What Makes Coding Agents Different

Standard code generation (classic Copilot completions, or a single prompt to ChatGPT) is one-shot. You give it a prompt, it gives you code, and the interaction is over. The model never runs its own output, so it cannot see whether the code works or fix it when it does not.

Coding agents are stateful and multi-step:

They read the codebase to understand context
They plan a sequence of changes
They execute edits using tools (read file, write file, run tests)
They observe output (test results, errors, linter output)
They iterate - adjusting their plan based on what they learn

This loop can run for dozens or hundreds of steps. The agent is not just generating code; it is engineering software.
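The read-plan-edit-run-observe cycle above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not how any particular product is built: `Action`, `agent_loop`, and the toy tools are hypothetical names, and `scripted_policy` stands in for the LLM call that a real agent would make at each step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Action:
    tool: str                                   # a tool name, or "finish"
    args: dict = field(default_factory=dict)    # keyword arguments for the tool

def agent_loop(
    policy: Callable[[List[str]], Action],      # stands in for the LLM call
    tools: Dict[str, Callable[..., str]],       # tool name -> implementation
    max_steps: int = 50,
) -> List[str]:
    """Repeat decide -> act -> observe until the policy chooses 'finish' or the step budget runs out."""
    history: List[str] = []                     # everything the agent has observed so far
    for step in range(max_steps):
        action = policy(history)                # reason and plan: choose the next action
        if action.tool == "finish":
            history.append(f"step {step}: finish")
            break
        observation = tools[action.tool](**action.args)                    # act
        history.append(f"step {step}: {action.tool} -> {observation}")     # observe
    return history

# --- Toy usage: a one-file "codebase" with a bug, and a scripted policy ---
files = {"calc.py": "def add(a, b):\n    return a - b\n"}

def run_tests() -> str:
    namespace: dict = {}
    exec(files["calc.py"], namespace)           # load the current version of the file
    return "PASS" if namespace["add"](2, 3) == 5 else "FAIL: add(2, 3) != 5"

def write_file(path: str, content: str) -> str:
    files[path] = content
    return f"wrote {path}"

def scripted_policy(history: List[str]) -> Action:
    # A real agent would send `history` to the model and parse a tool call from its reply.
    if history and history[-1].endswith("PASS"):
        return Action("finish")
    if history and "FAIL" in history[-1]:
        return Action("write_file", {"path": "calc.py", "content": "def add(a, b):\n    return a + b\n"})
    return Action("run_tests")

log = agent_loop(scripted_policy, {"run_tests": run_tests, "write_file": write_file})
print("\n".join(log))
```

The run takes four steps: the tests fail, the policy rewrites `calc.py`, the tests pass, and the policy finishes. The `max_steps` cap matters in practice, since a real model can loop indefinitely on a problem it cannot solve.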

Lesson Map

Key Concepts in This Module

The coding loop - The core perception-reasoning-action cycle adapted for software engineering tasks. Understanding this loop is the foundation for everything else.

Context management - Large codebases do not fit in a context window. Coding agents use repo maps, embeddings, and retrieval to bring only relevant code into context.
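One simple form of repo map is a compact outline of every file's classes and functions, cheap enough to put in context so the model can decide which files to open in full. The sketch below assumes a Python-only repository and uses the standard library `ast` module; `repo_map` is a hypothetical helper, and production tools may use richer parsers and rank what to include.

```python
import ast
import tempfile
from pathlib import Path
from typing import List, Union

def _signature(fn: Union[ast.FunctionDef, ast.AsyncFunctionDef]) -> str:
    """Render 'def name(arg1, arg2)' for a function node (positional args only, for brevity)."""
    args = ", ".join(a.arg for a in fn.args.args)
    return f"def {fn.name}({args})"

def repo_map(root: str) -> str:
    """List each .py file under root with its top-level classes, methods, and functions."""
    lines: List[str] = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue                                  # skip files we cannot parse
        lines.append(path.relative_to(root).as_posix())
        for node in tree.body:                        # top-level statements only
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                lines.append("    " + _signature(node))
            elif isinstance(node, ast.ClassDef):
                lines.append(f"    class {node.name}")
                for item in node.body:
                    if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                        lines.append("        " + _signature(item))
    return "\n".join(lines)

# Usage: build a throwaway repo and map it.
with tempfile.TemporaryDirectory() as d:
    Path(d, "pkg").mkdir()
    Path(d, "pkg", "shapes.py").write_text(
        "class Circle:\n"
        "    def __init__(self, r):\n"
        "        self.r = r\n"
        "    def area(self):\n"
        "        return 3.14159 * self.r ** 2\n"
        "\n"
        "def unit_circle():\n"
        "    return Circle(1)\n"
    )
    summary = repo_map(d)
print(summary)
```

The map for this toy repo is five lines instead of the full file body; on a real codebase the savings are what make it possible to show the model the shape of the whole project at once.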

Edit strategies - There are multiple ways to modify a file: replace the whole thing, apply a unified diff, use search-and-replace, or manipulate the AST. Each has trade-offs you need to understand deeply.
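To make one of these trade-offs concrete, here is a minimal sketch of the search-and-replace strategy, assuming the model emits a `search` string and a `replace` string per edit. The helper names are hypothetical. The key design choice is refusing to guess: an edit applies only when the search text matches exactly once. The standard library's `difflib.unified_diff` then renders the change as a unified diff for review.

```python
import difflib

def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Swap exactly one occurrence of `search` for `replace`.

    Zero matches usually means the model's view of the file is stale; several matches
    mean the edit is ambiguous. Both are reported back instead of guessed at.
    """
    count = source.count(search)
    if count == 0:
        raise ValueError("search text not found; re-read the file before editing")
    if count > 1:
        raise ValueError(f"search text matches {count} times; include more surrounding lines")
    return source.replace(search, replace, 1)

def render_diff(before: str, after: str, path: str) -> str:
    """Render an edit as a unified diff, e.g. for logs or human review."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    ))

original = "def area(r):\n    return 3.14 * r * r\n"
updated = apply_search_replace(original, "3.14 * r * r", "3.14159 * r * r")
diff_text = render_diff(original, updated, "geometry.py")
print(diff_text)

try:
    apply_search_replace("x = 1\nx = 1\n", "x = 1", "x = 2")
except ValueError as err:
    error_message = str(err)   # "search text matches 2 times; include more surrounding lines"
```

Compared with whole-file replacement, the model only has to reproduce the lines it changes, which saves tokens and avoids accidentally dropping untouched code. The error messages are written for the model, since they go back into its context.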

Tool design - A coding agent's capability ceiling is set by its tools. You will implement a complete tool set from scratch.
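In the Anthropic Messages API, tools are passed through the `tools` parameter of `messages.create`, each with a `name`, a `description`, and a JSON Schema `input_schema`. When the model wants a tool, it returns a `tool_use` content block carrying `name` and `input`, and your code sends the result back in a `tool_result` block. The sketch below covers the local side only: two tool definitions and a dispatcher that executes them. The `Workspace` class, the tool names, and the path guard are illustrative assumptions, not a prescribed design.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, List

# Tool definitions in the shape the Messages API expects for `tools=`:
# a name, a description the model reads, and a JSON Schema for the input.
TOOLS: List[Dict[str, Any]] = [
    {
        "name": "read_file",
        "description": "Return the full text of a file. Paths are relative to the repository root.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {
        "name": "write_file",
        "description": "Create or overwrite a file with the given text. Paths are relative to the repository root.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}, "content": {"type": "string"}},
            "required": ["path", "content"],
        },
    },
]

class Workspace:
    """Executes tool calls, confined to a single root directory (hypothetical helper)."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()

    def _resolve(self, rel: str) -> Path:
        path = (self.root / rel).resolve()
        if not path.is_relative_to(self.root):          # block "../" escapes (Python 3.9+)
            raise PermissionError(f"{rel} is outside the workspace")
        return path

    def read_file(self, path: str) -> str:
        return self._resolve(path).read_text(encoding="utf-8")

    def write_file(self, path: str, content: str) -> str:
        target = self._resolve(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        return f"wrote {len(content)} characters to {path}"

    def dispatch(self, name: str, tool_input: Dict[str, Any]) -> str:
        """Run one tool call. Errors come back as text so the model can see them and recover."""
        if name not in {tool["name"] for tool in TOOLS}:
            return f"error: unknown tool {name!r}"
        try:
            return getattr(self, name)(**tool_input)
        except Exception as exc:
            return f"error: {type(exc).__name__}: {exc}"

with tempfile.TemporaryDirectory() as d:
    ws = Workspace(d)
    write_result = ws.dispatch("write_file", {"path": "src/app.py", "content": "print('hi')\n"})
    read_result = ws.dispatch("read_file", {"path": "src/app.py"})
    escape_result = ws.dispatch("read_file", {"path": "../secrets.txt"})
    unknown_result = ws.dispatch("delete_repo", {})
print(write_result, read_result, escape_result, unknown_result, sep="\n")
```

Returning errors as strings rather than raising is deliberate: a failed tool call becomes an observation the model can react to, instead of a crash that ends the loop.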

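Test-driven iteration - Running tests after every edit and feeding the output back into the next LLM call is one of the most effective techniques available. Tests are the closest thing to ground truth the agent has: an objective pass/fail signal that does not depend on the model's own judgment.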
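A minimal sketch of the feedback step, assuming pytest as the runner: run the suite in a subprocess, treat exit code 0 as success, trim the output so it fits in the next prompt, and wrap it as the next message to the model. `run_tests`, `feedback_message`, the 4,000-character budget, and the `<test_output>` tag are hypothetical choices for illustration.

```python
import subprocess
import sys
from typing import List, Optional, Tuple

def run_tests(
    cmd: Optional[List[str]] = None,
    timeout: float = 300,
    max_chars: int = 4000,
) -> Tuple[bool, str]:
    """Run the test command; return (passed, output trimmed to fit in the next prompt)."""
    if cmd is None:
        cmd = [sys.executable, "-m", "pytest", "-x", "-q"]   # -x: stop at first failure; -q: terse output
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, f"test run timed out after {timeout} seconds"
    output = proc.stdout + proc.stderr
    if len(output) > max_chars:
        # keep the tail, where pytest prints the failure details and the final summary
        output = "[...output truncated...]\n" + output[-max_chars:]
    return proc.returncode == 0, output                     # exit code 0 means every test passed

def feedback_message(passed: bool, output: str) -> str:
    """Format the test result as the next user message to the model."""
    verdict = "All tests passed." if passed else "Tests failed. Read the output, fix the code, and run the tests again."
    return f"{verdict}\n\n<test_output>\n{output}\n</test_output>"

# Usage with stand-in commands, so the example runs without a real test suite:
ok, _ = run_tests([sys.executable, "-c", "print('1 passed')"])
passed, output = run_tests([sys.executable, "-c", "assert 1 + 1 == 3, 'math is broken'"])
print(feedback_message(passed, output))
```

Truncation is a real design decision: a full pytest log on a large suite can crowd out the code the model needs to see, so keeping only the most informative part of the output preserves context for the fix.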

SWE-bench - The standard evaluation framework for coding agents. You need to understand what these benchmark numbers mean and how to interpret claims about agent capability.
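Each SWE-bench task instance comes with two test lists: FAIL_TO_PASS tests, which fail before the reference fix and pass after it, and PASS_TO_PASS tests, which pass both before and after. An agent's patch resolves the instance only if every test in both lists passes, and the headline score is the fraction of instances resolved. The official harness applies patches and runs tests inside Docker containers; the sketch below models only the scoring bookkeeping, and its instance IDs and test names are made up.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class InstanceResult:
    instance_id: str
    fail_to_pass: Dict[str, bool]   # tests the fix must make pass -> did they pass after the patch?
    pass_to_pass: Dict[str, bool]   # tests that passed before and must still pass -> did they?

    @property
    def resolved(self) -> bool:
        # Resolved only if the bug is fixed AND nothing that used to work broke.
        return all(self.fail_to_pass.values()) and all(self.pass_to_pass.values())

def resolve_rate(results: List[InstanceResult]) -> float:
    """Fraction of instances resolved: the headline number in SWE-bench results."""
    return sum(r.resolved for r in results) / len(results) if results else 0.0

# Hypothetical results for four made-up instances:
results = [
    InstanceResult("demo__proj-1", {"test_fix": True}, {"test_existing": True}),   # resolved
    InstanceResult("demo__proj-2", {"test_fix": True}, {"test_existing": False}),  # fixed the bug, broke something else
    InstanceResult("demo__proj-3", {"test_fix": False}, {"test_existing": True}),  # bug not fixed
    InstanceResult("demo__proj-4", {"test_fix": True}, {"test_existing": True}),   # resolved
]
print(f"resolve rate: {resolve_rate(results):.1%}")   # resolve rate: 50.0%
```

Note that the second instance counts as a failure even though its target test passes. A score is also only meaningful alongside its denominator: which benchmark variant, and which subset of instances, it was computed over.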

Prerequisites

Before this module, you should have completed:

Module 01: Foundations - Agent architectures, ReAct loop, tool use basics
Module 02: Tools & Function Calling - Tool design patterns, Anthropic API tool use
Module 03: Memory & Knowledge - Context management, retrieval

You should be comfortable with:

Python at an intermediate level - classes, subprocess, file I/O
The Anthropic Python SDK (anthropic.Anthropic().messages.create(...))
Basic software engineering concepts (AST, grep, git, pytest)

What You Will Understand After This Module

By the end of this module, you will be able to:

Explain the architectures of Claude Code, Cursor, and Devin, to the extent they are publicly documented
Interpret SWE-bench scores and understand what they actually measure
Implement the four major edit strategies and know when to use each
Build a complete, working tool set for a coding agent
Implement a test-driven agent loop that uses pytest output as its feedback signal
Build a functional coding agent from scratch using the Anthropic API

This is not theoretical. Every lesson includes complete, runnable Python code. By the end, you will have a coding agent you built yourself.

Commercial Landscape (2025–2026)

Product	Approach	Reported SWE-bench score
Claude Code	Terminal-native, multi-tool, extended thinking	~57%
Devin (Cognition)	Browser + terminal + full VM	13.86% (original SWE-bench, 25% subset, 2024)
Cursor Agent	IDE-integrated, codebase indexing	Not published
Copilot Workspace	GitHub-native, plan-then-edit	Not published
OpenHands (OpenDevin)	Open-source, sandboxed execution	~35%+
SWE-agent (Princeton)	Structured agent-computer interface	~12-18%

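The field moves fast, and these numbers are not directly comparable: they come from different benchmark variants (the original SWE-bench, its Lite subset, and the human-validated Verified subset), different underlying models, and different dates. Understanding the architecture means you can track these numbers intelligently rather than just watching headlines.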

Let's Build

Start with Lesson 01 to understand the complete architecture of how coding agents reason about and modify codebases. Then work through the lessons in order; each one builds on the last, culminating in a complete, working coding agent.
