Module 6 - AI Security
AI systems introduce a new class of security vulnerabilities that don't map cleanly onto traditional software security. A SQL injection attack exploits predictable, deterministic behavior. Prompt injection exploits the fact that LLMs cannot reliably distinguish between instructions and data. That fundamental ambiguity is what makes AI security uniquely difficult - and why every engineer building production AI systems needs to understand it deeply.
This module covers the full threat landscape: how attackers exploit AI systems, how defenders detect and mitigate attacks, and how to build AI systems that are secure by design. Each lesson pairs theoretical understanding with production-grade code.
Threat Landscape
Lessons in This Module
| # | Lesson | What You Will Learn |
|---|---|---|
| 01 | Prompt Injection | Direct vs indirect injection, instruction hierarchy attacks, real-world exploits (Bing Chat, ChatGPT plugins), detection and defense layers |
| 02 | Jailbreaks and Bypasses | DAN prompts, role-play exploits, token smuggling, many-shot jailbreaking, red-team taxonomy, alignment challenges |
| 03 | Data Poisoning | Backdoor attacks, trigger patterns, clean-label attacks, Hugging Face supply chain risks, RAG index poisoning |
| 04 | Model Extraction | Query-based extraction, distillation attacks, API rate limiting, watermarking, legal and IP considerations |
| 05 | Membership Inference | Shadow model attacks, likelihood-ratio tests, LLM memorization (Carlini et al.), differential privacy mitigations |
| 06 | Adversarial Examples | GCG and AutoDAN token attacks, embedding space attacks, adversarial suffixes, robustness evaluation |
| 07 | Red Teaming AI Systems | Manual vs automated red teaming, PyRIT, Garak, red team composition, operationalizing in SDLC |
| 08 | Securing RAG Systems | Prompt injection via documents, index poisoning, vector DB access control, PII leakage, multi-tenant security |
| 09 | AI Security Governance | NIST AI RMF, EU AI Act, model cards, responsible disclosure, AI bug bounties, SBOM for AI |
Why AI Security Is Different
Traditional application security has decades of established patterns: sanitize inputs, parameterize queries, enforce least privilege, patch known CVEs. AI security breaks every one of these assumptions:
- No clear input/instruction boundary - LLMs process instructions and data in the same token stream
- Non-deterministic behavior - the same attack may succeed 30% of the time, making testing harder
- Opaque internals - you cannot inspect what a model "knows" or "intends" at inference time
- Training data as attack surface - the model itself can be compromised before deployment
- Emergent capabilities - models do things they were not explicitly trained to do, including unsafe things
This module treats AI security as a first-class engineering discipline, not an afterthought.
