SQL Injection Prevention - Parameterize Everything
Before you read any further, study this FastAPI endpoint and predict what an attacker could extract:
from fastapi import FastAPI, Query
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession
app = FastAPI()
@app.get("/courses")
async def search_courses(
q: str = Query(...),
db: AsyncSession = Depends(get_db),
):
query = text(f"SELECT id, title, price FROM courses WHERE title LIKE '%{q}%'")
result = await db.execute(query)
return [dict(row._mapping) for row in result.fetchall()]
An attacker sends q = %' UNION SELECT username, password_hash, '' FROM users --. How many rows from the users table does the attacker get? All of them. By the end of this lesson, you will understand exactly how this works and how to make it impossible.
What You Will Learn
- How SQL injection works at the database driver level
- The three attack patterns: UNION, error-based, and blind injection
- How parameterized queries prevent injection at the protocol level
- SQLAlchemy safety guarantees and their limits
- The danger of
text()with string formatting - Second-order injection - when the attack payload comes from the database
- How to audit a codebase for SQL injection vulnerabilities
- How to use sqlmap for defensive testing
Prerequisites
- SQL fundamentals (SELECT, INSERT, UPDATE, WHERE, UNION)
- SQLAlchemy models and sessions (from Intermediate course)
- FastAPI dependency injection (from Intermediate course)
pip install sqlalchemy asyncpg
Part 1 - How SQL Injection Works
SQL injection exploits the boundary between code and data. When user input is concatenated into a SQL string, the database cannot distinguish between the developer's SQL and the attacker's SQL:
# What the developer intended:
# "Find courses with 'python' in the title"
query = f"SELECT * FROM courses WHERE title LIKE '%python%'"
# What the attacker sends:
user_input = "python%' UNION SELECT id, username, password_hash FROM users --"
query = f"SELECT * FROM courses WHERE title LIKE '%{user_input}%'"
# Resulting SQL:
# SELECT * FROM courses WHERE title LIKE '%python%'
# UNION SELECT id, username, password_hash FROM users --%'
#
# The -- comments out the trailing %'
# The UNION appends every row from the users table to the result
Why String Concatenation Is Fundamentally Broken
The database receives a single string. It cannot tell which parts were written by the developer and which came from the user:
Part 2 \text{---} The Three Attack Patterns
Pattern 1: UNION-Based Injection
UNION combines results from two SELECT statements. The attacker appends a second query to extract data from any table:
# Original query
query = f"SELECT id, title, price FROM courses WHERE title LIKE '\%{q}\%'"
# Attack: q = \%' UNION SELECT 1, username, password_hash FROM users --
# Result SQL:
# SELECT id, title, price FROM courses WHERE title LIKE '\%\%'
# UNION SELECT 1, username, password_hash FROM users --'
# The attacker sees:
# [
# {"id": 1, "title": "admin", "price": "$argon2id$v=19$..."},
# {"id": 2, "title": "alice", "price": "$argon2id$v=19$..."},
# ]
UNION injection requires the attacker to know the number of columns in the original query. They determine this by trial: ' UNION SELECT NULL--, ' UNION SELECT NULL,NULL--, ' UNION SELECT NULL,NULL,NULL-- until the query succeeds.
Pattern 2: Error-Based Injection
The attacker crafts input that causes the database to include sensitive data in error messages:
# PostgreSQL error-based extraction
# Attack: q = ' AND 1=CAST((SELECT password_hash FROM users LIMIT 1) AS INT) --
# Result SQL:
# SELECT * FROM courses WHERE title LIKE '\%' AND 1=CAST((SELECT password_hash FROM users LIMIT 1) AS INT) --'
# PostgreSQL error message:
# ERROR: invalid input syntax for type integer: "$argon2id$v=19$..."
# The password hash is leaked through the error message
Fix: Never expose database error messages to the client:
from fastapi import HTTPException
from sqlalchemy.exc import SQLAlchemyError
@app.get("/courses")
async def search_courses(q: str, db: AsyncSession):
try:
result = await db.execute(
text("SELECT id, title, price FROM courses WHERE title LIKE :q"),
{"q": f"%{q}%"},
)
return [dict(row._mapping) for row in result.fetchall()]
except SQLAlchemyError:
# Log the actual error internally
logger.exception("Database error in course search")
# Return a generic error to the client
raise HTTPException(status_code=500, detail="Internal server error")
Pattern 3: Blind Injection
The attacker cannot see query results or error messages, but can infer information from the application's behavior (response time, HTTP status codes, or content differences):
# Boolean-based blind injection
# Does the first character of the admin's password hash start with '$'?
# Attack: q = ' AND (SELECT SUBSTRING(password_hash,1,1) FROM users WHERE username='admin') = '$' --
# If the response returns results: the character is '$'
# If the response is empty: try the next character
# Time-based blind injection
# Attack: q = ' AND (SELECT CASE WHEN (SUBSTRING(password_hash,1,1)='$') THEN pg_sleep(5) ELSE pg_sleep(0) END FROM users WHERE username='admin') --
# If the response takes 5 seconds: the character is '$'
# If the response is instant: try the next character
An attacker can extract an entire database character by character. Automated tools like sqlmap can do this in minutes.
Part 3 \text{---} Parameterized Queries: The Definitive Fix
Parameterized queries send the SQL template and the parameter values separately to the database. The database compiles the SQL first, then binds the parameters as data values. The parameters are never interpreted as SQL code:
With asyncpg (Direct PostgreSQL Driver)
import asyncpg
async def get_user_safe(username: str):
conn = await asyncpg.connect("postgresql://localhost/mydb")
# The $1 placeholder is a parameter - NOT string interpolation
row = await conn.fetchrow(
"SELECT id, username, email FROM users WHERE username = $1",
username,
)
await conn.close()
return row
# Even with malicious input:
# username = "admin' OR '1'='1"
# asyncpg sends to PostgreSQL:
# SQL: SELECT id, username, email FROM users WHERE username = $1
# Param $1: "admin' OR '1'='1" (treated as a literal string value)
# Result: No rows (no user has that literal username)
With psycopg2 (Synchronous PostgreSQL Driver)
import psycopg2
def get_user_safe(username: str):
conn = psycopg2.connect("dbname=mydb")
cur = conn.cursor()
# %s is a parameter placeholder, NOT Python string formatting
cur.execute(
"SELECT id, username, email FROM users WHERE username = %s",
(username,), # Tuple of parameters
)
row = cur.fetchone()
conn.close()
return row
psycopg2 uses %s as the parameter placeholder, but this is NOT Python's % string formatting operator. The %s is handled by the database driver. Never do cursor.execute("... WHERE name = '%s'" % username) - that is string formatting, not parameterization. The correct syntax is cursor.execute("... WHERE name = %s", (username,)).
With SQLAlchemy text()
from sqlalchemy import text
# VULNERABLE - string formatting with text()
query = text(f"SELECT * FROM users WHERE username = '{username}'")
# SAFE - parameterized text()
query = text("SELECT * FROM users WHERE username = :username")
result = await db.execute(query, {"username": username})
Part 4 - SQLAlchemy ORM: Safety Guarantees and Their Limits
What the ORM Protects
SQLAlchemy's ORM automatically parameterizes all filter values:
from sqlalchemy import select
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
class Base(DeclarativeBase):
pass
class User(Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
username: Mapped[str]
email: Mapped[str]
# SAFE - the ORM parameterizes the filter value
async def get_user(username: str, db: AsyncSession):
stmt = select(User).where(User.username == username)
result = await db.execute(stmt)
return result.scalar_one_or_none()
# Under the hood, SQLAlchemy generates:
# SELECT users.id, users.username, users.email
# FROM users
# WHERE users.username = :username_1
# With parameter: username_1 = "admin' OR '1'='1"
# SAFE - multiple filters
async def search_users(name: str, role: str, db: AsyncSession):
stmt = (
select(User)
.where(User.username.like(f"%{name}%"))
.where(User.role == role)
)
result = await db.execute(stmt)
return result.scalars().all()
# Both name and role are parameterized automatically
Where the ORM Does NOT Protect You
from sqlalchemy import text, column, literal_column
# VULNERABLE - text() with f-string
async def search_vulnerable(q: str, db: AsyncSession):
stmt = select(User).where(text(f"username = '{q}'"))
result = await db.execute(stmt)
return result.scalars().all()
# VULNERABLE - literal_column with user input
async def sort_vulnerable(sort_col: str, db: AsyncSession):
stmt = select(User).order_by(literal_column(sort_col))
# Attack: sort_col = "1; DROP TABLE users --"
result = await db.execute(stmt)
return result.scalars().all()
# VULNERABLE - raw SQL execution
async def raw_vulnerable(username: str, db: AsyncSession):
await db.execute(text(f"DELETE FROM sessions WHERE username = '{username}'"))
Safe Alternatives
from sqlalchemy import asc, desc
# SAFE - whitelist column names for sorting
ALLOWED_SORT_COLUMNS = {
"username": User.username,
"email": User.email,
"created_at": User.created_at,
}
async def search_safe(
q: str,
sort_by: str = "username",
sort_order: str = "asc",
db: AsyncSession = None,
):
# Validate sort column against whitelist
sort_column = ALLOWED_SORT_COLUMNS.get(sort_by)
if sort_column is None:
raise HTTPException(status_code=400, detail="Invalid sort column")
order_func = asc if sort_order == "asc" else desc
stmt = (
select(User)
.where(User.username.like(f"%{q}%")) # Parameterized by ORM
.order_by(order_func(sort_column)) # Column object, not string
)
result = await db.execute(stmt)
return result.scalars().all()
Whitelist identifiers, parameterize values. Column names, table names, and sort orders cannot be parameterized - they are SQL identifiers, not values. Validate them against a whitelist of allowed options. Never pass user input as a column or table name.
Part 5 - The text() Pitfall in SQLAlchemy
text() is the most common source of SQL injection in SQLAlchemy codebases. It provides raw SQL power but removes ORM safety:
from sqlalchemy import text
# Pattern 1: VULNERABLE - f-string in text()
query = text(f"SELECT * FROM users WHERE id = {user_id}")
# Pattern 2: VULNERABLE - .format() in text()
query = text("SELECT * FROM users WHERE id = {}".format(user_id))
# Pattern 3: VULNERABLE - % formatting in text()
query = text("SELECT * FROM users WHERE id = %s" % user_id)
# Pattern 4: SAFE - named parameter in text()
query = text("SELECT * FROM users WHERE id = :user_id")
result = await db.execute(query, {"user_id": user_id})
# Pattern 5: SAFE - bound parameters
query = text("SELECT * FROM users WHERE id = :user_id").bindparams(user_id=user_id)
result = await db.execute(query)
Auditing for text() Vulnerabilities
# audit_sql.py - Simple static analysis script
import ast
import sys
from pathlib import Path
class SQLInjectionChecker(ast.NodeVisitor):
"""Find potential SQL injection via text() with string formatting."""
def __init__(self, filepath: str):
self.filepath = filepath
self.warnings: list[str] = []
def visit_Call(self, node: ast.Call):
# Look for calls to text()
if isinstance(node.func, ast.Name) and node.func.id == "text":
if node.args:
arg = node.args[0]
# Check if the argument is an f-string or format call
if isinstance(arg, ast.JoinedStr):
self.warnings.append(
f"{self.filepath}:{node.lineno} "
f"DANGER: text() with f-string - SQL injection risk"
)
elif isinstance(arg, ast.Call):
if hasattr(arg.func, "attr") and arg.func.attr == "format":
self.warnings.append(
f"{self.filepath}:{node.lineno} "
f"DANGER: text() with .format() - SQL injection risk"
)
self.generic_visit(node)
def audit_file(filepath: Path) -> list[str]:
source = filepath.read_text()
tree = ast.parse(source)
checker = SQLInjectionChecker(str(filepath))
checker.visit(tree)
return checker.warnings
# Usage:
# for path in Path("app").rglob("*.py"):
# warnings = audit_file(path)
# for w in warnings:
# print(w)
Part 6 - Second-Order Injection
Second-order injection occurs when the attack payload is stored safely in the database but triggers an injection when retrieved and used in a subsequent query:
# Step 1: Attacker registers with a malicious username
# (This is safely parameterized during INSERT)
username = "admin'--"
await db.execute(
text("INSERT INTO users (username) VALUES (:u)"),
{"u": username},
)
# The username "admin'--" is safely stored in the database
# Step 2: Later, another part of the application reads the username
# and uses it unsafely in a different query
user = await db.execute(
text("SELECT username FROM users WHERE id = :id"),
{"id": user_id},
)
username = user.scalar() # "admin'--"
# Step 3: VULNERABLE - the stored value is used in a string-formatted query
await db.execute(
text(f"UPDATE sessions SET active = TRUE WHERE username = '{username}'")
)
# Resulting SQL:
# UPDATE sessions SET active = TRUE WHERE username = 'admin'--'
# The -- comments out the rest, potentially affecting the wrong rows
# FIXED - parameterize EVERY query, even those using internal data
await db.execute(
text("UPDATE sessions SET active = TRUE WHERE username = :username"),
{"username": username},
)
# The username value is always treated as data, never as SQL
Data from your own database is not automatically safe. If a malicious value was stored earlier (through any path), it can trigger injection when used in a string-formatted query. Parameterize every query, even those that use data retrieved from your own tables.
Part 7 - IN Clauses and Dynamic Queries
Building IN (...) clauses safely requires special attention:
# VULNERABLE - joining user-controlled values into SQL
tags = ["python", "security"] # Could be ["python'; DROP TABLE courses --"]
tag_list = "', '".join(tags)
query = text(f"SELECT * FROM courses WHERE tag IN ('{tag_list}')")
# SAFE - SQLAlchemy's in_() operator
from sqlalchemy import select
async def get_courses_by_tags(tags: list[str], db: AsyncSession):
stmt = select(Course).where(Course.tag.in_(tags))
result = await db.execute(stmt)
return result.scalars().all()
# SQLAlchemy generates: WHERE tag IN (:tag_1, :tag_2)
# with parameters: tag_1='python', tag_2='security'
# SAFE - raw SQL with dynamically generated placeholders
async def get_courses_by_tags_raw(tags: list[str], db: AsyncSession):
# Generate named placeholders: :tag_0, :tag_1, :tag_2, ...
placeholders = ", ".join(f":tag_{i}" for i in range(len(tags)))
params = {f"tag_{i}": tag for i, tag in enumerate(tags)}
query = text(f"SELECT * FROM courses WHERE tag IN ({placeholders})")
result = await db.execute(query, params)
return result.fetchall()
# SQL: SELECT * FROM courses WHERE tag IN (:tag_0, :tag_1)
# Params: {tag_0: 'python', tag_1: 'security'}
# The placeholders are generated from trusted code, not user input
Part 8 - Defensive Testing with sqlmap
sqlmap is an open-source tool for automated SQL injection testing. Use it to verify your defenses:
# Install sqlmap
pip install sqlmap
# Test a single endpoint
sqlmap -u "http://localhost:8001/courses?q=test" --batch --level=3
# Test with authentication
sqlmap -u "http://localhost:8001/courses?q=test" \
--headers="Authorization: Bearer <token>" \
--batch --level=3
# Test POST endpoints
sqlmap -u "http://localhost:8001/users" \
--data='{"username":"test","email":"[email protected]"}' \
--headers="Content-Type: application/json" \
--batch --level=3
Interpreting Results
# SAFE output:
[WARNING] GET parameter 'q' does not seem to be injectable
[INFO] all tested parameters do not appear to be injectable
# VULNERABLE output:
[INFO] GET parameter 'q' is vulnerable. Do you want to keep testing? [y/N]
Type: UNION query
Payload: q=-1' UNION SELECT username,password_hash FROM users--
Only run sqlmap against your own applications in development or staging environments. Running it against third-party applications without permission is illegal in most jurisdictions.
Part 9 - Real-World: Auditing a FastAPI + SQLAlchemy Codebase
A systematic audit checklist for SQL injection in a Python web application:
"""
SQL Injection Audit Checklist for FastAPI + SQLAlchemy
=====================================================
1. Search for all uses of text():
grep -rn "text(" app/ --include="*.py"
For each match, verify the argument is a string literal
with :named parameters, NOT an f-string or .format().
2. Search for all uses of execute() with string formatting:
grep -rn "execute(f\"" app/ --include="*.py"
grep -rn "execute(\".*%s" app/ --include="*.py"
grep -rn "execute(\".*\.format" app/ --include="*.py"
Every match is a potential SQL injection vulnerability.
3. Search for literal_column() and column():
grep -rn "literal_column\|column(" app/ --include="*.py"
Verify the argument is NEVER user-controlled input.
4. Search for dynamic ORDER BY:
grep -rn "order_by" app/ --include="*.py"
Verify sort columns are validated against a whitelist.
5. Search for raw SQL in Alembic migrations:
grep -rn "op.execute" alembic/versions/ --include="*.py"
Migrations usually use hardcoded SQL, but verify no
dynamic input is used.
6. Check for second-order injection:
Find places where database query results are used in
subsequent queries. Verify the subsequent query is parameterized.
"""
# Automated audit function
import ast
from pathlib import Path
DANGEROUS_PATTERNS = [
"f-string in text()",
".format() in text()",
"% formatting in text()",
"f-string in execute()",
"literal_column with variable",
]
def audit_project(project_root: str) -> list[dict]:
"""Audit all Python files for SQL injection risks."""
findings = []
root = Path(project_root)
for py_file in root.rglob("*.py"):
try:
source = py_file.read_text()
except Exception:
continue
# Quick string-based scan for obvious patterns
for line_no, line in enumerate(source.splitlines(), 1):
stripped = line.strip()
# text() with f-string
if "text(f" in stripped or 'text(f"' in stripped or "text(f'" in stripped:
findings.append({
"file": str(py_file),
"line": line_no,
"pattern": "text() with f-string",
"severity": "CRITICAL",
"code": stripped,
})
# execute() with f-string
if "execute(f" in stripped:
findings.append({
"file": str(py_file),
"line": line_no,
"pattern": "execute() with f-string",
"severity": "CRITICAL",
"code": stripped,
})
# .format() in SQL context
if (".format(" in stripped and
any(kw in stripped.lower() for kw in ["select", "insert", "update", "delete"])):
findings.append({
"file": str(py_file),
"line": line_no,
"pattern": ".format() in SQL context",
"severity": "HIGH",
"code": stripped,
})
return findings
# Usage:
# findings = audit_project("/app")
# for f in findings:
# print(f"[{f['severity']}] {f['file']}:{f['line']} - {f['pattern']}")
# print(f" Code: {f['code']}")
Key Takeaways
- SQL injection works because the database cannot distinguish developer SQL from user data when they are concatenated
- Parameterized queries send SQL and data separately - the data is never interpreted as code
- SQLAlchemy ORM
.where(),.filter(), and.in_()are safe because they parameterize automatically text()with f-strings,.format(), or%is dangerous - always use:named_params- UNION injection extracts data from other tables; blind injection extracts data character by character
- Second-order injection occurs when safely stored data is later used in a string-formatted query
- Whitelist identifiers (column names, table names) and parameterize values (filter conditions, search terms)
- Never expose database error messages to clients - they can leak schema and data
- Use sqlmap defensively to verify your endpoints are not injectable
- Audit every
text(),execute(), andliteral_column()call in your codebase
Graded Practice Challenges
Level 1 - Identify the Vulnerability
Question 1: Is this query safe from SQL injection?
stmt = select(User).where(User.email == email).order_by(text(sort_column))
Answer
The where clause is safe - the ORM parameterizes email. But the order_by(text(sort_column)) is vulnerable if sort_column comes from user input. An attacker could pass sort_column = "1; DROP TABLE users --" to execute arbitrary SQL. Fix: validate sort_column against a whitelist of allowed column names and use the SQLAlchemy column object directly: order_by(asc(ALLOWED_COLUMNS[sort_column])).
Question 2: What is the difference between these two lines?
# Line A
cursor.execute("SELECT * FROM users WHERE id = %s" % user_id)
# Line B
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
Answer
Line A uses Python's % string formatting operator - it creates the full SQL string with the value embedded, then sends the complete string to the database. This is vulnerable to SQL injection. Line B passes the SQL template and parameters separately to the database driver. The %s is a parameter placeholder handled by psycopg2, not Python string formatting. This is safe - the database treats the parameter as data, never as SQL code.
Question 3: A developer argues "We use an ORM so SQL injection is impossible." When is this wrong?
Answer
The ORM protects standard query builder operations (.where(), .filter(), .in_()), but SQL injection is still possible through: (1) text() with string formatting. (2) literal_column() or column() with user input. (3) Raw session.execute() with concatenated strings. (4) Dynamic column names in order_by(). (5) Custom SQL fragments in hybrid properties. (6) Alembic migrations with dynamic SQL. The ORM reduces the attack surface but does not eliminate it.
Level 2 - Fix the Vulnerability
This search endpoint has SQL injection vulnerabilities. Fix all of them:
@app.get("/api/courses/search")
async def search_courses(
q: str = "",
category: str = "",
sort_by: str = "title",
min_price: float = 0,
db: AsyncSession = Depends(get_db),
):
where_clauses = []
if q:
where_clauses.append(f"title LIKE '%{q}%'")
if category:
where_clauses.append(f"category = '{category}'")
if min_price > 0:
where_clauses.append(f"price >= {min_price}")
where_sql = " AND ".join(where_clauses) if where_clauses else "1=1"
query = text(f"SELECT * FROM courses WHERE {where_sql} ORDER BY {sort_by}")
result = await db.execute(query)
return [dict(row._mapping) for row in result.fetchall()]
Solution
from sqlalchemy import select, asc, desc
from pydantic import BaseModel, Field, constr, confloat
class CourseSearchParams(BaseModel):
q: constr(max_length=100) = ""
category: constr(max_length=50) = ""
sort_by: str = "title"
min_price: confloat(ge=0, le=9999) = 0
ALLOWED_SORT_COLUMNS = {
"title": Course.title,
"price": Course.price,
"created_at": Course.created_at,
}
@app.get("/api/courses/search")
async def search_courses(
params: CourseSearchParams = Depends(),
db: AsyncSession = Depends(get_db),
):
stmt = select(Course)
if params.q:
stmt = stmt.where(Course.title.ilike(f"%{params.q}%"))
if params.category:
stmt = stmt.where(Course.category == params.category)
if params.min_price > 0:
stmt = stmt.where(Course.price >= params.min_price)
sort_column = ALLOWED_SORT_COLUMNS.get(params.sort_by)
if sort_column is None:
raise HTTPException(status_code=400, detail="Invalid sort column")
stmt = stmt.order_by(asc(sort_column))
result = await db.execute(stmt)
return [row.to_dict() for row in result.scalars().all()]
Fixes: (1) All filters use ORM query builder (parameterized). (2) Sort column validated against whitelist. (3) Input length limits via Pydantic. (4) Price range validation. (5) No text() or string concatenation anywhere.
Level 3 - Design a Secure System
Design a dynamic report builder where users can:
- Select which columns to display (from a predefined list)
- Add filter conditions (column, operator, value)
- Choose sort order (column, direction)
- Aggregate data (COUNT, SUM, AVG on specific columns)
- Export results as CSV
The system must be completely immune to SQL injection while supporting these dynamic queries. Document your approach for: building safe dynamic SELECT lists, safe WHERE clauses, safe ORDER BY, safe aggregate functions, and how you would test the implementation.
Design Hints
- Column selection: Map user-friendly names to SQLAlchemy column objects. Accept a list of strings, validate each against a whitelist, and build
select(col1, col2, col3). - Filters: Accept structured filter objects
{"column": "price", "operator": "gte", "value": 50}. Validate column against whitelist. Map operator strings to SQLAlchemy operators:{"eq": __eq__, "gte": __ge__, "lt": __lt__, "like": like}. The value is always parameterized by the ORM. - Sorting: Same whitelist approach as columns. Map direction to
asc()ordesc(). - Aggregates: Map function names to SQLAlchemy functions:
{"count": func.count, "sum": func.sum, "avg": func.avg}. Validate the column and function name against whitelists. - Testing: Write unit tests for every edge case. Use sqlmap against the endpoint. Test with SQL metacharacters in every parameter. Verify the generated SQL with
str(stmt.compile(compile_kwargs={"literal_binds": True}))to visually inspect the query.
What's Next
In the next lesson, Secrets Management, you will learn how to manage database credentials, API keys, and other secrets securely - using environment variables, python-dotenv, Pydantic SecretStr, and dedicated secrets management tools like HashiCorp Vault.
