Module 07 - Databases
Reading time: ~10 minutes | Level: Intermediate → Engineering
Before reading further, predict what this code does - and more importantly, what it silently destroys:
import psycopg2
conn = psycopg2.connect(dsn="postgresql://localhost/production")
cur = conn.cursor()
# A developer meant to delete only one user's sessions
# They forgot the WHERE clause
cur.execute("DELETE FROM user_sessions")
conn.commit()
cur.close()
conn.close()
Show Answer
This code deletes every row in the user_sessions table - not just one user's sessions. Every active login token, every "remember me" cookie, every OAuth session across the entire application is now gone.
There is no warning. No confirmation prompt. No partial failure. The database executes the DELETE faithfully and returns a row count. The conn.commit() makes it permanent. If there is no recent backup, this is unrecoverable.
The missing WHERE clause is one of the most common and most catastrophic database bugs in production. The database does exactly what you asked - which is exactly the problem.
This bug killed sessions at a startup once. They had 50,000 active users. Every single one was logged out simultaneously at 2pm on a Tuesday. Customer support was flooded. The root cause was identified in 45 seconds. Recovery required restoring from a 24-hour-old backup.
What would have prevented it:
- A database user with restricted DELETE permissions in production
- A pre-commit SQL linter that flags
DELETEwithoutWHERE - A staging environment that mirrors production data volume so the impact would have been visible before deployment
- Transaction wrapping with a
RETURNINGclause to verify the affected row count before commit
Every lesson in this module exists because databases don't protect you from yourself - unless you design the system to.
Databases sit at the center of nearly every production Python system. They are also the most common source of data loss, performance problems, and security vulnerabilities. This module teaches you to work with databases the way a production engineer does - not just "make it work" but "make it safe, fast, and maintainable."
Why Databases Matter for Python Engineers
Every meaningful Python application eventually needs to persist state. Files work for small data. In-memory data structures work for the lifetime of a process. But anything that needs to outlive a restart, be shared between processes or servers, be queried efficiently at scale, or survive a power failure requires a database.
Understanding databases at engineering depth means:
- Writing SQL queries that use indexes instead of full table scans
- Using transactions correctly so that partial failures don't leave data in an inconsistent state
- Designing schemas that handle millions of rows without requiring a full rewrite
- Choosing between raw SQL, query builders, and ORMs based on actual tradeoffs - not cargo cult
- Running migrations in production without downtime or data loss
- Debugging slow queries from first principles using
EXPLAIN ANALYZE
The engineering standard for this module connects to PostgreSQL (the dominant open-source relational database in Python production systems), SQLAlchemy (the most widely used Python ORM and SQL toolkit), and Alembic (the standard migration tool for SQLAlchemy projects).
What You Will Learn
This module covers seven lessons plus two projects:
Lesson 01 - SQL Fundamentals
SQL as a set language - not a procedural one. SELECT, WHERE, ORDER BY, LIMIT with engineering context (what the database actually does). INNER, LEFT, RIGHT, and FULL JOINs. GROUP BY, HAVING, and aggregate functions (COUNT, SUM, AVG, MIN, MAX). Subqueries vs JOINs and their performance implications. Window functions (ROW_NUMBER, RANK, LAG, LEAD) - the foundation of every analytics query. SQL injection: how it works and why parameterized queries are non-negotiable.
Lesson 02 - SQLite
SQLite as a Python-native, file-based database - no server required. The sqlite3 standard library module: creating databases, executing queries, managing cursors. Connection modes: WAL mode, :memory: for testing. SQLite's type system (type affinity). When SQLite is the right choice - embedded applications, testing, CLI tools, local data pipelines. Limitations: no concurrent writes, no network access, no row-level locking.
Lesson 03 - PostgreSQL
Connecting from Python with psycopg2 and psycopg3. Connection strings and environment-based configuration. The cursor API vs the modern execute() interface. Server-side cursors for streaming large result sets. PostgreSQL-specific types: JSONB, arrays, UUID, TIMESTAMP WITH TIME ZONE. Connection pooling with PgBouncer and psycopg3's built-in pool. Production connection management patterns.
Lesson 04 - Transactions
ACID properties - Atomicity, Consistency, Isolation, Durability - and what each one protects against. BEGIN, COMMIT, ROLLBACK at the Python level. Savepoints for partial rollback. Isolation levels: READ COMMITTED, REPEATABLE READ, SERIALIZABLE - what anomalies each level prevents. Deadlocks: how they happen and how to avoid them. Long-running transactions and lock contention in production.
Lesson 05 - Indexing
Why full table scans are slow and how B-tree indexes work. Creating and dropping indexes. Composite indexes and column ordering. Partial indexes and expression indexes. EXPLAIN and EXPLAIN ANALYZE - reading query plans. Index bloat and VACUUM. When indexes hurt (write-heavy tables, low-cardinality columns). The indexing decisions that separate fast queries from slow ones at scale.
Lesson 06 - ORM with SQLAlchemy
SQLAlchemy's two APIs: Core (SQL expression language) and ORM (mapper layer). Declarative model definition. Session lifecycle: add, flush, commit, rollback, expire. Relationships: ForeignKey, relationship(), backref. Lazy vs eager loading - the N+1 query problem and how to diagnose it. Bulk operations. Raw SQL escape hatch. When ORM abstractions leak and how to handle it.
Lesson 07 - Migrations
Why schema changes in production require a migration strategy. Alembic: alembic init, revision, upgrade, downgrade. Auto-generation from SQLAlchemy models. Migration best practices: additive changes first, never rename columns in a single migration, zero-downtime deployment patterns. The expand/contract pattern for safely changing column types in live systems.
Projects
Project A - CRUD App with Database
Build a complete CRUD application backed by SQLite (local development) and PostgreSQL (production): schema design, parameterized queries, transaction safety, connection management, and a test suite using an in-memory SQLite database. The foundation every backend service is built on.
Project B - Transaction-Safe Service
Implement a money-transfer service that demonstrates correct transaction semantics: atomic balance updates, rollback on failure, optimistic locking for concurrent transfers, and idempotency keys to prevent duplicate operations. The patterns that separate toy database code from production-grade data engineering.
The Data Layer Stack
The path from raw SQL to a production ORM is not a replacement hierarchy - it is a series of abstractions, each solving a specific problem:
Understanding each layer independently is what makes you effective at debugging slow queries, diagnosing ORM-generated SQL, and choosing the right abstraction for each problem.
:::tip Know All the Layers Engineers who only know the ORM are helpless when it generates a slow query. Engineers who only know raw SQL write boilerplate that a good ORM eliminates. The goal of this module is mastery of all layers so that you can move between them consciously, choosing the right tool for each problem. :::
Module Prerequisites
- Python fundamentals: functions, classes, context managers, exception handling
- Module 04 - Testing and Quality: you will write database tests using in-memory SQLite and pytest fixtures
- Module 05 - Packaging and Environments: you will install
psycopg2,sqlalchemy, andalembic - Module 06 - APIs and Web Basics (recommended but not required): several examples connect database code to API endpoints
You do not need prior database or SQL experience. The module builds from first principles, starting with SQL as a language before touching any Python library.
The Engineering Standard
Every concept in this module is grounded in how production database systems actually work:
- PostgreSQL is the default relational database for Python greenfield projects in 2025 - it is open source, ACID-compliant, has excellent Python driver support, and is available on every major cloud platform (RDS, Cloud SQL, Supabase, Neon)
- SQLAlchemy is the de facto standard Python database toolkit - it powers Django-adjacent applications, Flask/FastAPI backends, data pipelines, and CLI tools. Understanding SQLAlchemy Core separates engineers who can optimize queries from those who can only generate them
- Alembic is used by nearly every SQLAlchemy project in production - understanding migration semantics (expand/contract, zero-downtime, rollback safety) is required for any team that deploys database schema changes to live systems
- Transactions are the fundamental primitive of data integrity - every engineer who writes to a database must understand ACID and isolation levels, because the defaults are not always safe
:::warning Database Bugs Are Often Invisible Until Too Late A bad SQL query returns wrong data silently. A missing WHERE clause deletes everything. A missing index makes a query fast in development (100 rows) and catastrophically slow in production (10 million rows). A transaction isolation bug causes a race condition that only manifests under concurrent load. The gap between "works on my machine" and "safe in production" is larger for databases than for almost any other layer. This module closes that gap. :::
:::note This Module Uses Both SQLite and PostgreSQL Most lessons use SQLite for examples because it requires no setup - it is part of the Python standard library. PostgreSQL examples are clearly marked and require a local or cloud PostgreSQL instance. If you do not have PostgreSQL available, all concepts apply equally; run the examples against SQLite and read the PostgreSQL-specific notes for future reference. :::
How to Follow Along
Set up your environment once:
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install psycopg2-binary sqlalchemy alembic
pip install pytest pytest-mock
For PostgreSQL, the quickest local setup is Docker:
docker run --name pg-dev \
-e POSTGRES_PASSWORD=devpassword \
-e POSTGRES_DB=devdb \
-p 5432:5432 \
-d postgres:16
Connection string for the examples:
DATABASE_URL = "postgresql://postgres:devpassword@localhost:5432/devdb"
# SQLite equivalent (no server needed):
DATABASE_URL = "sqlite:///./dev.db"
:::tip Use SQLite for Learning, PostgreSQL for Production Patterns Run every example locally with SQLite first - it requires zero setup and the SQL is identical for most queries. Switch to PostgreSQL when you reach the PostgreSQL-specific lesson or when you want to validate production-realistic behavior (concurrent connections, index behavior at scale, PostgreSQL-specific types). :::
Key Takeaways
- Databases are the most consequential layer in a production system - bugs here cause data loss, not just downtime
- This module covers the full data layer stack: SQL fundamentals, SQLite, PostgreSQL, transactions, indexing, SQLAlchemy ORM, and Alembic migrations
- The module builds from raw SQL upward to ORM, so you understand every layer and can move between them consciously
- Two projects apply the material end-to-end: a CRUD application and a transaction-safe service
- Prerequisites are Python fundamentals and basic testing knowledge - no prior database experience required
- The engineering standard connects to PostgreSQL, SQLAlchemy, and Alembic - the production Python database stack
What's Next
Lesson 01 opens with SQL as a set language - a fundamentally different mental model from procedural Python. You will learn why SELECT * FROM orders WHERE user_id = NULL returns nothing (and why that surprises almost everyone), how JOINs work at the relational algebra level, why window functions exist, and how a single line of unsanitized SQL input can destroy a database. SQL is the query language you will use every day as a Python engineer - understanding it deeply is not optional.
