Module 09: Agent Safety
Why Safety Is an Engineering Problem
In 2024, British Columbia's Civil Resolution Tribunal held Air Canada liable after the airline's customer-service chatbot described a bereavement refund policy that did not exist. The chatbot did exactly what it was designed to do, answer questions helpfully, and that was enough to create real legal and financial harm.
Agent safety is not a philosophical concern about future superintelligence. It is an immediate engineering discipline. The agents you deploy today can take real actions in the world: send emails, call APIs, execute code, modify databases, charge credit cards, delete files. Each action carries risk. Safety engineering is how you keep that risk under control.
This module covers the seven topics you must understand to build agents you would be comfortable deploying in production.
Module Map
Lesson Guide
| Lesson | Topic | Key Skills |
|---|---|---|
| 01 | Risk Taxonomy | Identify and score agent risks before they cause harm |
| 02 | Minimal Footprint | Design agents with least privilege and reversibility |
| 03 | Prompt Injection | Detect and block injection attacks in agent pipelines |
| 04 | Guardrails | Build composable validation pipelines for agent actions |
| 05 | Human Oversight | Design meaningful oversight without creating bottlenecks |
| 06 | Sandboxing | Isolate agent execution environments at multiple levels |
| 07 | Responsible AI | Navigate regulation, accountability, and ethics in practice |
The Safety Mindset
Safe agent engineering requires a shift in how you think about failure. Traditional software fails by crashing or returning errors. Agents fail by doing the wrong thing convincingly. The agent that confidently books the wrong flight, deletes the wrong file, or sends the wrong email is more dangerous than one that errors out.
The tools in this module address this failure mode directly: limit what the agent can do, validate every action before execution, detect when something is going wrong, and always preserve the ability for humans to intervene.
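As a rough illustration of how these four controls can fit together, here is a minimal sketch in Python. Every name in it (`Action`, `GuardedExecutor`, `ActionBlocked`, `refund_cap`, the `issue_refund` tool, and the numeric thresholds) is hypothetical and chosen for this example; it is not the API of any particular agent framework, and the lessons in this module may structure things differently. The audit log stands in for detection, and the `approve` callback stands in for a real human review step.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


class ActionBlocked(Exception):
    """Raised when a proposed action is refused before execution."""


@dataclass
class Action:
    tool: str             # name of the tool the agent wants to call
    args: Dict[str, Any]  # arguments the agent proposed for that tool


@dataclass
class GuardedExecutor:
    # Limit: only tools registered here can ever be executed.
    tools: Dict[str, Callable[..., Any]]
    # Validate: each validator returns an error message, or None if the action is fine.
    validators: List[Callable[[Action], Optional[str]]]
    # Oversight: decides which actions need a human decision, and asks for it.
    needs_approval: Callable[[Action], bool]
    approve: Callable[[Action], bool]
    # Detect: every decision is recorded so that anomalies can be reviewed later.
    audit_log: List[str] = field(default_factory=list)

    def run(self, action: Action) -> Any:
        if action.tool not in self.tools:
            self._block(action, "tool not on allowlist")
        for check in self.validators:
            error = check(action)
            if error is not None:
                self._block(action, error)
        if self.needs_approval(action) and not self.approve(action):
            self._block(action, "human reviewer declined")
        self.audit_log.append(f"ALLOWED {action.tool}")
        return self.tools[action.tool](**action.args)

    def _block(self, action: Action, reason: str) -> None:
        self.audit_log.append(f"BLOCKED {action.tool}: {reason}")
        raise ActionBlocked(reason)


def refund_cap(action: Action) -> Optional[str]:
    """Hypothetical policy: refunds above 100 are never allowed."""
    if action.tool == "issue_refund" and action.args.get("amount", 0) > 100:
        return "refund exceeds cap of 100"
    return None


executor = GuardedExecutor(
    tools={"issue_refund": lambda amount: f"refunded {amount}"},
    validators=[refund_cap],
    # Hypothetical threshold: refunds above 50 require a human decision.
    needs_approval=lambda a: a.tool == "issue_refund" and a.args.get("amount", 0) > 50,
    # Stand-in for a real review step; it declines everything it is asked about.
    approve=lambda a: False,
)

small_refund = executor.run(Action("issue_refund", {"amount": 20}))
```

In this sketch a refund of 20 goes through, while a refund of 75 is stopped at the approval step, a refund of 500 is stopped by the validator, and a call to an unregistered tool such as `delete_files` is stopped at the allowlist. The checks run before the tool is invoked, so a blocked action has no side effects to undo.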
By the end of this module, you will have the vocabulary, frameworks, and code patterns to build agents that are both capable and safe.
