Skip to main content

Module 6 - AI Security

AI systems introduce a new class of security vulnerabilities that don't map cleanly onto traditional software security. A SQL injection attack exploits predictable, deterministic behavior. Prompt injection exploits the fact that LLMs cannot reliably distinguish between instructions and data. That fundamental ambiguity is what makes AI security uniquely difficult - and why every engineer building production AI systems needs to understand it deeply.

This module covers the full threat landscape: how attackers exploit AI systems, how defenders detect and mitigate attacks, and how to build AI systems that are secure by design. Each lesson pairs theoretical understanding with production-grade code.

Threat Landscape

Lessons in This Module

#LessonWhat You Will Learn
01Prompt InjectionDirect vs indirect injection, instruction hierarchy attacks, real-world exploits (Bing Chat, ChatGPT plugins), detection and defense layers
02Jailbreaks and BypassesDAN prompts, role-play exploits, token smuggling, many-shot jailbreaking, red-team taxonomy, alignment challenges
03Data PoisoningBackdoor attacks, trigger patterns, clean-label attacks, Hugging Face supply chain risks, RAG index poisoning
04Model ExtractionQuery-based extraction, distillation attacks, API rate limiting, watermarking, legal and IP considerations
05Membership InferenceShadow model attacks, likelihood-ratio tests, LLM memorization (Carlini et al.), differential privacy mitigations
06Adversarial ExamplesGCG and AutoDAN token attacks, embedding space attacks, adversarial suffixes, robustness evaluation
07Red Teaming AI SystemsManual vs automated red teaming, PyRIT, Garak, red team composition, operationalizing in SDLC
08Securing RAG SystemsPrompt injection via documents, index poisoning, vector DB access control, PII leakage, multi-tenant security
09AI Security GovernanceNIST AI RMF, EU AI Act, model cards, responsible disclosure, AI bug bounties, SBOM for AI

Why AI Security Is Different

Traditional application security has decades of established patterns: sanitize inputs, parameterize queries, enforce least privilege, patch known CVEs. AI security breaks every one of these assumptions:

  • No clear input/instruction boundary - LLMs process instructions and data in the same token stream
  • Non-deterministic behavior - the same attack may succeed 30% of the time, making testing harder
  • Opaque internals - you cannot inspect what a model "knows" or "intends" at inference time
  • Training data as attack surface - the model itself can be compromised before deployment
  • Emergent capabilities - models do things they were not explicitly trained to do, including unsafe things

This module treats AI security as a first-class engineering discipline, not an afterthought.

© 2026 EngineersOfAI. All rights reserved.