8 docs tagged with "agent-safety"

01 - Agent Risk Taxonomy

Eight categories of agent risk, the confused deputy problem, severity matrices, and a Python risk assessment module.

03 - Prompt Injection in Agents

Indirect prompt injection attacks, real-world examples, detection and defense strategies, and a Python injection defense system.

Human Oversight Mechanisms

Design human oversight that is meaningful, not performative: risk-based interruption, async approval queues, audit trails, and graduated autonomy.

Module 09: Agent Safety

Risk taxonomy, minimal footprint, prompt injection defense, guardrails, human oversight, sandboxing, and responsible deployment.

Responsible Agentic AI

Safety principles, EU AI Act compliance, accountability chains, bias, privacy, red-teaming, and building a safety review process for autonomous agent systems.

Sandboxing Agent Environments

Contain the blast radius of any agent failure: process isolation, Docker security hardening, network policy, E2B cloud sandboxes, and escape vector prevention.