8 docs tagged with "agent-safety"

01 - Agent Risk Taxonomy

Eight categories of agent risk, the confused deputy problem, severity matrices, and a Python risk assessment module.

02 - Minimal Footprint Principle

Least privilege, reversibility preference, scope confirmation, and a Python minimal-footprint agent wrapper.

03 - Prompt Injection in Agents

Indirect prompt injection attacks, real-world examples, detection and defense strategies, and a Python injection defense system.

04 - Guardrails and Action Validation

Pre- and post-action guardrails, composable validators, denylist enforcement, rate limiting, and a complete Python guardrail pipeline.

Human Oversight Mechanisms

Design human oversight that is meaningful, not performative: risk-based interruption, async approval queues, audit trails, and graduated autonomy.

Module 09: Agent Safety

Risk taxonomy, minimal footprint, prompt injection defense, guardrails, human oversight, sandboxing, and responsible deployment.

Responsible Agentic AI

Safety principles, EU AI Act compliance, accountability chains, bias, privacy, red-teaming, and building a safety review process for autonomous agent systems.

Sandboxing Agent Environments

Contain the blast radius of any agent failure: process isolation, Docker security hardening, network policy, E2B cloud sandboxes, and escape vector prevention.