Module 08 - Concurrency

Reading time: ~12 minutes | Level: Intermediate → Engineering

Before reading further, predict what happens when two threads each increment a shared counter one million times:

import threading

counter = 0

def increment():
    global counter
    for _ in range(1_000_000):
        counter += 1

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)

t1.start()
t2.start()
t1.join()
t2.join()

print(counter)  # Expected: 2,000,000 - what do you actually get?

Run this on your machine. Run it three times. Note the result.

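On many setups you will get a different number every time - something like 1,283,419 or 1,847,003 - and rarely 2,000,000. On recent CPython releases (3.10 and later) the interpreter switches threads at fewer points, so this particular loop may well print exactly 2,000,000. That is an accident of the implementation, not a guarantee: the code is still incorrect.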

The reason is a race condition. The expression counter += 1 is not a single atomic operation. It compiles to several bytecode instructions that perform three separate steps:

LOAD - read the current value of counter onto the interpreter's evaluation stack
ADD - add 1 to that value
STORE - write the new value back to counter
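
To see these steps for yourself, the standard-library dis module can disassemble a function. The sketch below uses a hypothetical helper, increment_once, that is not part of the opening example. Exact opcode names differ between CPython versions, so no output is shown for the full listing.

```python
import dis

counter = 0

def increment_once():
    """Hypothetical helper: a single `counter += 1` on a global."""
    global counter
    counter += 1

# Collect the opcode names the function compiles to.
opnames = [ins.opname for ins in dis.get_instructions(increment_once)]
print(opnames)

# The read and the write are distinct instructions, so a thread switch
# can, in principle, occur between them.
load_index = opnames.index("LOAD_GLOBAL")
store_index = opnames.index("STORE_GLOBAL")
print(store_index > load_index)
```

The listing usually contains one or two extra instructions (for example, loading the constant 1), but the read, the add, and the write always appear as separate steps.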

Here is what goes wrong under concurrent execution:

Thread 1 reads counter = 100
Thread 2 reads counter = 100   ← reads the SAME value before Thread 1 writes
Thread 1 writes counter = 101
Thread 2 writes counter = 101  ← overwrites Thread 1's update - one increment is lost

Both threads read the same value, both add 1, and both write back 101. Two increments happened but the counter only advanced by one. This lost update can repeat many thousands of times over the run, producing a final value that is unpredictably less than 2,000,000.

This is not a Python bug. It is not a CPython bug. It is a fundamental property of any concurrent system where multiple agents modify shared state without coordination. The only fix is to ensure that the read-modify-write cycle is performed atomically - using a lock, a queue, or an atomic primitive.
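
As a minimal sketch of the lock-based fix, the function below wraps the increment in a threading.Lock. The name count_with_lock and the smaller iteration count are choices made for this illustration (the lower count just keeps the run short).

```python
import threading

def count_with_lock(n_threads=2, n_increments=100_000):
    """Increment a shared counter from several threads, guarded by a Lock."""
    total = 0
    lock = threading.Lock()

    def increment():
        nonlocal total
        for _ in range(n_increments):
            # Only one thread at a time can hold the lock, so the
            # read-modify-write below cannot interleave with another thread's.
            with lock:
                total += 1

    threads = [threading.Thread(target=increment) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

print(count_with_lock())  # 200000: 2 threads x 100,000 increments
```

The lock makes the result deterministic at the cost of some throughput, since every increment now acquires and releases the lock.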

Every concept in this module exists to help you write concurrent code where this kind of corruption is impossible.

Concurrency is the skill that separates engineers who can build systems that serve thousands of simultaneous users from engineers who can only build systems that serve one. Python has three distinct concurrency models - threading, multiprocessing, and asyncio - each designed for a different problem. Using the wrong model does not just mean slower code. It means code that subtly corrupts data, wastes resources, or never actually runs in parallel at all.

This module gives you the deep understanding to choose and apply the right model every time.

Why Concurrency Matters for Python Engineers

Nearly every production Python system does multiple things at once:

A web server handles hundreds of HTTP requests simultaneously
A data pipeline fetches from ten APIs while writing results to a database
A background worker processes a job queue while the main process serves traffic
A CLI tool downloads a thousand files over dozens of concurrent connections

Without concurrency, these tasks must happen sequentially. A web server that handles one request at a time cannot serve more than a handful of users. A data pipeline that fetches APIs one by one takes hours instead of minutes.

Understanding concurrency at engineering depth means:

Knowing when threading helps and when the GIL makes it useless
Knowing why multiprocessing achieves true parallelism and what that costs
Writing asyncio code that is correct, not just syntactically valid
Diagnosing race conditions before they reach production
Using locks, semaphores, and queues correctly to protect shared state
Building production async services with proper lifecycle management

The Three Concurrency Models

Python provides three distinct approaches to concurrency. Each solves a different problem. Understanding when to apply each one is the central engineering judgment of this module.

Threading - Concurrent I/O with Shared Memory

threading runs multiple threads within a single Python process. All threads share the same memory space. The Python interpreter's Global Interpreter Lock (GIL) ensures that only one thread executes Python bytecode at a time - but threads can release the GIL while waiting on I/O, which is why threading helps for I/O-bound workloads.

Use threading when: making network requests, reading/writing files, waiting on external services - any work where your code spends most of its time waiting, not computing.

Do not use threading when: performing CPU-intensive computation (image processing, data transformation, matrix math). The GIL ensures that adding threads adds overhead without adding parallelism.
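
The I/O-bound case can be sketched with a thread pool. Here fake_fetch is a hypothetical stand-in for a real network call (it only sleeps), and the example.com URLs are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(url):
    """Hypothetical stand-in for a network call; sleeping releases the GIL."""
    time.sleep(0.2)
    return f"response from {url}"

urls = [f"https://example.com/item/{i}" for i in range(5)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    # map() preserves input order even though the calls overlap in time.
    responses = list(pool.map(fake_fetch, urls))
elapsed = time.perf_counter() - start

print(responses[0])
print(f"{elapsed:.2f}s for {len(urls)} simulated requests")
```

Run sequentially, the five calls would take about one second. With five worker threads the waits overlap, so the total should be close to the duration of a single call.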

Multiprocessing - True Parallelism for CPU-Bound Work

multiprocessing spawns separate Python processes. Each process has its own GIL, its own memory space, and its own Python interpreter. The result is true parallelism: multiple CPU cores doing computation simultaneously.

Use multiprocessing when: the work is CPU-bound and needs to scale across cores - data transformation, image processing, scientific computing, or any job that multiprocessing.Pool can split across cpu_count() workers.

Do not use multiprocessing when: the work is I/O-bound (process startup overhead and inter-process communication cost will dominate), or when the data that must be shared between workers is large (serialization across process boundaries is expensive).
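
A minimal sketch of CPU-bound work spread across processes follows. It uses ProcessPoolExecutor from concurrent.futures instead of multiprocessing.Pool, and count_primes is a toy workload invented for this illustration.

```python
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    """CPU-bound toy task: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # The guard matters: with the "spawn" start method, each worker
    # re-imports this module and must not re-run the pool setup.
    limits = [20_000, 30_000, 40_000, 50_000]
    with ProcessPoolExecutor() as pool:  # defaults to one worker per CPU
        # Each limit is pickled, sent to a worker process, and computed there.
        results = list(pool.map(count_primes, limits))
    for limit, primes in zip(limits, results):
        print(f"primes below {limit}: {primes}")
```

Arguments and return values cross the process boundary by pickling, which is why the function lives at module level and why large payloads make this model expensive.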

Asyncio - Cooperative Concurrency for High-Throughput I/O

asyncio runs in a single thread with an event loop that switches between tasks when they are waiting on I/O. No threads, no processes - just cooperative multitasking managed by the Python runtime. The event loop can interleave thousands of concurrent network connections in a single thread.

Use asyncio when: building servers, APIs, or clients that handle many concurrent connections - anything where you control the entire async call stack. asyncio has the lowest overhead per concurrent task and scales to tens of thousands of simultaneous connections on a single core.

Do not use asyncio when: you are mixing with synchronous libraries that block (they will block the entire event loop), or when the code is CPU-bound (asyncio does not parallelize CPU work).
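
The difference between cooperative waiting and a blocking call can be sketched as follows. The coroutine names polite_task and blocking_task, and the timed helper, are hypothetical and exist only for this comparison.

```python
import asyncio
import time

async def polite_task():
    # await hands control back to the event loop while "waiting on I/O".
    await asyncio.sleep(0.2)

async def blocking_task():
    # time.sleep() never yields, so the whole event loop stalls.
    time.sleep(0.2)

async def timed(task_factory, n=3):
    """Run n copies of a coroutine concurrently and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(task_factory() for _ in range(n)))
    return time.perf_counter() - start

polite_elapsed = asyncio.run(timed(polite_task))
blocking_elapsed = asyncio.run(timed(blocking_task))

print(f"cooperative: {polite_elapsed:.2f}s")    # close to 0.2s: the waits overlap
print(f"blocking:    {blocking_elapsed:.2f}s")  # close to 0.6s: the waits run back to back
```

Both versions are syntactically valid async code. Only the first one is actually concurrent.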

The Decision Table

| Workload type | Right model | Why |
| --- | --- | --- |
| Network requests (HTTP, database, DNS) | `asyncio` or `threading` | I/O-bound; GIL is released during I/O |
| File I/O (reading/writing large files) | `threading` (or `asyncio` with a thread executor) | I/O-bound; OS handles the waiting, but the standard library has no native async file API |
| CPU-intensive computation (data processing) | `multiprocessing` | Needs true parallelism; GIL blocks threads |
| Mixed I/O + CPU (fetch data, then transform) | `multiprocessing` + `asyncio` | Use async for I/O, processes for CPU |
| High-concurrency server (thousands of connections) | `asyncio` | Lowest per-task overhead |
| Existing synchronous codebase, quick parallelism | `ThreadPoolExecutor` | Easiest integration with sync code |
| Scientific computing, image processing | `multiprocessing.Pool` | Bypasses GIL entirely |

The GIL - Python's Most Misunderstood Feature

The Global Interpreter Lock is a mutex inside the CPython interpreter. It ensures that only one thread at a time can execute Python bytecode, even on a multi-core machine.

Why the GIL Exists

CPython's memory management uses reference counting. Every Python object has a reference count - an integer that tracks how many references (names, container slots, attributes) point to that object. When the reference count reaches zero, the object is deallocated.

Reference counting is not thread-safe on its own. If two threads simultaneously increment or decrement the same object's reference count, updates can be lost and the count becomes wrong - an object may be freed while still in use, or never freed at all, and both failures are nearly impossible to debug. The GIL is CPython's solution: by allowing only one thread to run Python bytecode at a time, it ensures each reference count is modified by one thread at a time, which makes it safe without per-object locking.
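
Reference counts can be observed directly in CPython with sys.getrefcount. The sketch below is CPython-specific, and the variable names are arbitrary. It compares differences between counts, not absolute values, because the absolute numbers depend on interpreter internals.

```python
import sys

payload = ["any", "object"]

# getrefcount() reports one extra reference: its own argument.
baseline = sys.getrefcount(payload)

alias = payload                      # a second reference to the same list
with_alias = sys.getrefcount(payload)

del alias                            # dropping the reference decrements the count
after_del = sys.getrefcount(payload)

print(with_alias - baseline)  # 1
print(after_del - baseline)   # 0
```

Every assignment, argument pass, and deletion touches a counter like this one, which is why unsynchronized access from multiple threads would be so dangerous.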

What the GIL Actually Prevents

The GIL prevents multiple threads from executing Python bytecode simultaneously. It does not prevent:

Threads from running concurrently during I/O (file reads, network calls, database queries) - I/O operations release the GIL while waiting for the OS
NumPy, Pandas, and other C extensions from running in parallel - well-written C extensions release the GIL during computation
Multiprocessing - each process has its own GIL

The GIL in Practice

import threading
import time

def cpu_task():
    """Simulates CPU-bound work - the GIL prevents true parallelism."""
    x = 0
    for _ in range(50_000_000):
        x += 1

def io_task():
    """Simulates I/O-bound work - the GIL is released during the sleep."""
    time.sleep(1)  # time.sleep() releases the GIL

# CPU-bound: running the task in 2 threads takes about as long as running it twice in a row - the GIL serializes them
start = time.perf_counter()
t1 = threading.Thread(target=cpu_task)
t2 = threading.Thread(target=cpu_task)
t1.start(); t2.start()
t1.join(); t2.join()
print(f"2 CPU threads: {time.perf_counter() - start:.2f}s")
# Result: roughly 2x the time of a single cpu_task() call - the second thread adds no speedup

# I/O-bound: 2 threads CAN overlap because the GIL is released during sleep
start = time.perf_counter()
t1 = threading.Thread(target=io_task)
t2 = threading.Thread(target=io_task)
t1.start(); t2.start()
t1.join(); t2.join()
print(f"2 I/O threads: {time.perf_counter() - start:.2f}s")
# Result: ~1 second, not 2 - threads ran concurrently

:::note The GIL Is a CPython Detail
The GIL exists in CPython - the standard Python interpreter. Other implementations such as PyPy, Jython, and GraalPy make their own threading design choices (Jython, for example, has no GIL, while PyPy does). Python 3.13 introduced an experimental "free-threaded" build (PEP 703) that removes the GIL. In standard CPython, which is what the vast majority of production Python runs on, the GIL is a real constraint and must be understood.
:::

:::tip GIL Removal in Python 3.13+
Python 3.13 shipped an experimental free-threaded mode (python3.13t) that disables the GIL. As of 3.13 it is experimental and much of the ecosystem is still adapting, but it shows where CPython is heading. Understanding GIL constraints today makes the transition to no-GIL Python more straightforward: many multiprocessing patterns can become threading patterns, and your asyncio patterns remain unchanged.
:::

What You Will Learn

This module covers eight lessons plus two projects:

| # | Lesson | Core concept |
| --- | --- | --- |
| 01 | Threading | `Thread`, daemon threads, thread lifecycle, GIL impact on threading |
| 02 | Multiprocessing | `Process`, `Pool`, `ProcessPoolExecutor`, true parallelism |
| 03 | Asyncio | `async`/`await`, coroutines, `gather`, `create_task` |
| 04 | Event Loop | How the event loop works, `run_until_complete`, `get_event_loop` |
| 05 | Race Conditions | What they are, how to reproduce them, how to prevent them |
| 06 | Locks and Semaphores | `Lock`, `RLock`, `Semaphore`, `Event`, `Condition`, `Barrier` |
| 07 | ThreadPoolExecutor | `concurrent.futures`, `as_completed`, timeout, cancellation |
| 08 | Async API Service | End-to-end async FastAPI service with `asyncpg`, background tasks |

The Concurrency Landscape

The three models are not mutually exclusive - production systems often combine them. A FastAPI web server is a real example of this combined model:

The event loop handles incoming HTTP connections concurrently
A thread pool (run_in_executor) executes synchronous database calls without blocking the loop
A process pool handles CPU-intensive routes (image resizing, PDF generation)
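
The thread-pool half of this pattern can be sketched with the standard library alone. No FastAPI is involved here: blocking_query and handle_request are hypothetical names, and the "database call" is simulated with a sleep.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_query(user_id):
    """Hypothetical synchronous database call (simulated with sleep)."""
    time.sleep(0.2)
    return {"id": user_id, "name": f"user-{user_id}"}

async def handle_request(pool, user_id):
    loop = asyncio.get_running_loop()
    # The blocking call runs on a pool thread; the event loop stays free
    # to serve other requests while this coroutine awaits the result.
    return await loop.run_in_executor(pool, blocking_query, user_id)

async def main():
    with ThreadPoolExecutor(max_workers=4) as pool:
        start = time.perf_counter()
        users = await asyncio.gather(*(handle_request(pool, i) for i in range(4)))
        return users, time.perf_counter() - start

users, elapsed = asyncio.run(main())
print(users[0])
print(f"{elapsed:.2f}s for 4 simulated requests")
```

run_in_executor accepts any concurrent.futures executor, so passing a ProcessPoolExecutor instead is the analogous route for CPU-heavy work, provided the function and its arguments can be pickled.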

Understanding when and how to combine these tools is the capstone skill of this module.

Module Prerequisites

Python fundamentals: functions, classes, context managers (with statements), exception handling
Module 03 - Python Internals (recommended): the GIL Explained lesson provides deep background on reference counting and CPython internals
Module 04 - Testing and Quality: the projects require pytest and pytest-asyncio
Module 06 - APIs and Web Basics: Lesson 08 (Async API Service) builds on FastAPI

You do not need prior concurrency experience. The module builds from the simplest threading primitives to a production async service.

The Engineering Standard

Every concept in this module is grounded in how production Python systems handle concurrency:

asyncio underpins the modern Python web stack - FastAPI, Starlette, and aiohttp are all async-native. Any engineer building Python APIs in 2025 needs to understand async/await, the event loop, and how to avoid blocking it
ThreadPoolExecutor is the standard tool for parallelizing I/O-bound work in synchronous codebases - it is simpler than raw threading and plugs directly into asyncio via loop.run_in_executor()
ProcessPoolExecutor is the standard for CPU-bound parallelism in Python - it abstracts away the multiprocessing module's lower-level API and integrates with the concurrent.futures interface
Race conditions and locks are not optional knowledge - they appear in every concurrent system, and the bugs they produce (data corruption, deadlocks, intermittent failures) are among the hardest to debug in production
asyncpg and aiomysql are widely used async database drivers for production Python services - Lesson 08 demonstrates correct async database connection pooling, which is non-trivial to get right

:::warning Concurrency Bugs Are Timing-Dependent
The most dangerous property of concurrency bugs is that they are not reproducible on demand. A race condition might occur one time in ten thousand runs - or only under production load with real latency. This is why the lessons in this module emphasize writing provably correct concurrent code using proper primitives, not "it seems to work" threading code that happens to be correct under your test conditions.
:::

:::danger Never Use Mutable Shared State Without Protection
The opening puzzle in this overview demonstrates what happens with unprotected shared mutable state. In production, the consequences are worse than an incorrect counter - they include corrupted database records, partial writes, lost updates, and audit logs with missing entries. Every lesson that introduces shared state also introduces the primitive that protects it. Use them.
:::
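
One way to avoid unprotected shared state altogether is to pass data through a thread-safe queue. The sketch below is illustrative only: the worker function and the squaring workload are invented for this example, and queue.Queue does its own internal locking.

```python
import queue
import threading

def worker(jobs, results):
    """Pull numbers from `jobs`, push their squares to `results`."""
    while True:
        item = jobs.get()
        if item is None:          # sentinel: no more work for this worker
            break
        results.put(item * item)

jobs = queue.Queue()
results = queue.Queue()
n_workers = 3

threads = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(n_workers)]
for t in threads:
    t.start()

for n in range(10):
    jobs.put(n)
for _ in threads:
    jobs.put(None)                # one sentinel per worker

for t in threads:
    t.join()

# Drain the results; completion order is not guaranteed, so sort before use.
squares = sorted(results.get() for _ in range(10))
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

No thread ever mutates a variable that another thread reads directly. All communication goes through the two queues.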

How to Follow Along

Set up your environment once:

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

pip install fastapi uvicorn asyncpg aiohttp httpx
pip install pytest pytest-asyncio pytest-mock

No external services are required for the first seven lessons. Lesson 08 (Async API Service) optionally uses PostgreSQL via asyncpg - a Docker setup is provided in that lesson.

:::tip Run the Concurrency Examples Repeatedly
Concurrency bugs are non-deterministic. When a code example demonstrates a race condition or a lock, run it five or ten times. Notice whether the output changes. Understanding that concurrent behavior is inherently non-reproducible is as important as understanding the technical fix.
:::

Key Takeaways

Python has three concurrency models - threading (shared memory, I/O-bound), multiprocessing (separate memory, CPU-bound), and asyncio (event loop, high-concurrency I/O) - and each solves a different problem
The GIL prevents multiple threads from executing Python bytecode simultaneously, making threading ineffective for CPU-bound work but perfectly effective for I/O-bound work
Race conditions arise whenever multiple concurrent agents modify shared state without coordination - they produce non-deterministic, intermittent bugs that are among the hardest to diagnose in production
The correct model for the workload matters more than any implementation detail: the wrong model can produce code that is slower, more complex, and less safe than single-threaded code
This module covers the full concurrency stack: from raw Thread and Process primitives through concurrent.futures executors to a production asyncio service
Two projects apply the material end-to-end: a concurrent web scraper and a production async API system

What's Next

Lesson 01 opens with threading - the simplest concurrency primitive in Python. You will learn to create and manage threads, understand the thread lifecycle, discover exactly what the GIL does and does not protect, reproduce a race condition with shared mutable state, and build a concurrent file downloader. Threading is the foundation - even if you eventually write all new code with asyncio, you will work with threaded codebases constantly, and the concepts carry directly into every other concurrency model.
