
Module 08 - Concurrency

Reading time: ~12 minutes | Level: Intermediate → Engineering

Before reading further, predict what happens when two threads each increment a shared counter one million times:

import threading

counter = 0

def increment():
    global counter
    for _ in range(1_000_000):
        counter += 1

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)

t1.start()
t2.start()
t1.join()
t2.join()

print(counter) # Expected: 2,000,000 - what do you actually get?

Run this on your machine. Run it three times. Note the result.

Answer

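Depending on your interpreter, you will likely get a number below 2,000,000, and a different one on each run - something like 1,283,419 or 1,847,003. How often updates are lost depends on the CPython version: recent releases switch threads at fewer points, so this particular loop may even print 2,000,000 on your machine. That outcome is luck, not a guarantee; the code is still not thread-safe.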

The reason is a race condition. The expression counter += 1 is not a single atomic operation. CPython compiles it to several bytecode instructions that perform three distinct steps:

  1. LOAD - read the current value of counter onto the interpreter's evaluation stack
  2. ADD - add 1 to that value
  3. STORE - write the new value back to counter
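To check these steps on your own interpreter, the standard-library dis module can disassemble a function. This is a minimal sketch; increment_once is a hypothetical helper, and the exact opcode names vary between CPython versions (the add step, for example, is INPLACE_ADD in older releases and BINARY_OP in newer ones), so no expected output is shown.

```python
import dis

counter = 0

def increment_once():
    global counter
    counter += 1

# Collect the opcode names rather than printing the full listing.
opnames = [ins.opname for ins in dis.get_instructions(increment_once)]
print(opnames)
```

The list should contain a load of the global, an add, and a store of the global as separate entries. A thread switch can happen between any two of them.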

Here is what goes wrong under concurrent execution:

Thread 1 reads counter = 100
Thread 2 reads counter = 100 ← reads the SAME value before Thread 1 writes
Thread 1 writes counter = 101
Thread 2 writes counter = 101 ← overwrites Thread 1's update - one increment is lost

Both threads read the same value, both add 1, and both write back 101. Two increments happened but the counter only advanced by one. This lost update can repeat many thousands of times over the run, producing a final value that is unpredictably less than 2,000,000.
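The interleaving above can be replayed deterministically in ordinary single-threaded code. This is only an illustration: the variables t1_value and t2_value are hypothetical stand-ins for the value each thread holds between its read and its write.

```python
counter = 100

# Each "thread" keeps a private copy of the value it read.
t1_value = counter        # Thread 1 reads 100
t2_value = counter        # Thread 2 reads 100, before Thread 1 has written

counter = t1_value + 1    # Thread 1 writes 101
counter = t2_value + 1    # Thread 2 writes 101, overwriting Thread 1's update

print(counter)  # 101: two increments ran, but the counter advanced by one
```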

This is not a Python bug. It is not a CPython bug. It is a fundamental property of any concurrent system where multiple agents modify shared state without coordination. The only fix is to ensure that the read-modify-write cycle is performed atomically - using a lock, a queue, or an atomic primitive.
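As a minimal sketch of the lock-based fix, the version below wraps the read-modify-write in a threading.Lock. The names counter_lock and safe_increment are assumptions made for this example, and the iteration count is reduced so it runs quickly.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # hypothetical name; guards every update to counter

def safe_increment(times):
    global counter
    for _ in range(times):
        # Only one thread at a time can be inside this block, so the
        # load, add, and store happen as one uninterrupted unit.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000 on every run
```

The lock makes the result deterministic at the cost of some throughput, because the two threads now take turns on every increment.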

Every concept in this module exists to help you write concurrent code where this kind of corruption is impossible.

Concurrency is the skill that separates engineers who can build systems that serve thousands of simultaneous users from engineers who can only build systems that serve one. Python has three distinct concurrency models - threading, multiprocessing, and asyncio - each designed for a different problem. Using the wrong model does not just mean slower code. It means code that subtly corrupts data, wastes resources, or never actually runs in parallel at all.

This module gives you the deep understanding to choose and apply the right model every time.

Why Concurrency Matters for Python Engineers

Nearly every production Python system does multiple things at once:

  • A web server handles hundreds of HTTP requests simultaneously
  • A data pipeline fetches from ten APIs while writing results to a database
  • A background worker processes a job queue while the main process serves traffic
  • A CLI tool downloads a thousand files over dozens of simultaneous connections

Without concurrency, these tasks must happen sequentially. A web server that handles one request at a time cannot serve more than a handful of users. A data pipeline that fetches APIs one by one takes hours instead of minutes.

Understanding concurrency at engineering depth means:

  • Knowing when threading helps and when the GIL makes it useless
  • Knowing why multiprocessing achieves true parallelism and what that costs
  • Writing asyncio code that is correct, not just syntactically valid
  • Diagnosing race conditions before they reach production
  • Using locks, semaphores, and queues correctly to protect shared state
  • Building production async services with proper lifecycle management

The Three Concurrency Models

Python provides three distinct approaches to concurrency. Each solves a different problem. Understanding when to apply each one is the central engineering judgment of this module.

Threading - Concurrent I/O with Shared Memory

threading runs multiple threads within a single Python process. All threads share the same memory space. The Python interpreter's Global Interpreter Lock (GIL) ensures that only one thread executes Python bytecode at a time - but threads can release the GIL while waiting on I/O, which is why threading helps for I/O-bound workloads.

Use threading when: making network requests, reading/writing files, waiting on external services - any work where your code spends most of its time waiting, not computing.

Do not use threading when: performing CPU-intensive computation (image processing, data transformation, matrix math). The GIL ensures that adding threads adds overhead without adding parallelism.
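A minimal sketch of the I/O-bound case, assuming time.sleep as a stand-in for a network call. fake_fetch, results, and results_lock are hypothetical names for this example; the lock is there because all threads write to the same dictionary.

```python
import threading
import time

results = {}                      # shared memory: every thread sees this dict
results_lock = threading.Lock()   # protects writes to the shared dict

def fake_fetch(name, delay):
    """Stand-in for a network call; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    with results_lock:
        results[name] = f"{name} done"

start = time.perf_counter()
threads = [threading.Thread(target=fake_fetch, args=(f"job-{i}", 0.2)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(len(results), f"{elapsed:.2f}s")
```

Five waits of 0.2 seconds would take about one second back to back. Because the threads wait at the same time, the total should be close to 0.2 seconds.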

Multiprocessing - True Parallelism for CPU-Bound Work

multiprocessing spawns separate Python processes. Each process has its own GIL, its own memory space, and its own Python interpreter. The result is true parallelism: multiple CPU cores doing computation simultaneously.

Use multiprocessing when: the work is CPU-bound computation that needs to scale across cores - data transformation, image processing, scientific computing - typically by using multiprocessing.Pool to spread the work across cpu_count() workers.

Do not use multiprocessing when: the work is I/O-bound (process startup overhead and inter-process communication cost will dominate), or when the data that must be shared between workers is large (serialization across process boundaries is expensive).
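A minimal sketch of the CPU-bound case using multiprocessing.Pool. The function count_primes and the input sizes are assumptions chosen for illustration, not a benchmark. The main guard matters: on platforms that start workers by re-importing the module, code outside the guard would run again in every worker.

```python
import multiprocessing

def count_primes(limit):
    """CPU-bound work: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def main():
    limits = [20_000, 30_000, 40_000, 50_000]
    # Pool() defaults to one worker per available CPU; each worker is a
    # separate process with its own interpreter and its own GIL.
    with multiprocessing.Pool() as pool:
        totals = pool.map(count_primes, limits)
    for limit, total in zip(limits, totals):
        print(f"primes below {limit}: {total}")

if __name__ == "__main__":
    main()
```

Arguments and return values are pickled to cross the process boundary, which is the serialization cost mentioned above. Small integers are cheap to send; large datasets are not.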

Asyncio - Cooperative Concurrency for High-Throughput I/O

asyncio runs in a single thread with an event loop that switches between tasks when they are waiting on I/O. No threads, no processes - just cooperative multitasking managed by the Python runtime. The event loop can interleave thousands of concurrent network connections in a single thread.

Use asyncio when: building servers, APIs, or clients that handle many concurrent connections - anything where you control the entire async call stack. asyncio has the lowest overhead per concurrent task and scales to tens of thousands of simultaneous connections on a single core.

Do not use asyncio when: you are mixing with synchronous libraries that block (they will block the entire event loop), or when the code is CPU-bound (asyncio does not parallelize CPU work).
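A minimal sketch of cooperative concurrency, assuming asyncio.sleep as a stand-in for a network wait. fake_request is a hypothetical coroutine for this example.

```python
import asyncio
import time

async def fake_request(i, delay):
    """Stand-in for a network call; awaiting sleep hands control back to the event loop."""
    await asyncio.sleep(delay)
    return i * 2

async def main():
    # 100 coroutines wait concurrently on a single thread.
    return await asyncio.gather(*(fake_request(i, 0.1) for i in range(100)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results[:5], f"{elapsed:.2f}s")
```

Run one after another, the 100 waits would take about ten seconds. Here they overlap, so the total stays close to 0.1 seconds. Replacing asyncio.sleep with the blocking time.sleep would stall the loop and bring the ten seconds back, which is the failure mode described above.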

The Decision Table

| Workload type | Right model | Why |
| --- | --- | --- |
| Network requests (HTTP, database, DNS) | asyncio or threading | I/O-bound; GIL is released during I/O |
| File I/O (reading/writing large files) | threading or asyncio | I/O-bound; OS handles the waiting |
| CPU-intensive computation (data processing) | multiprocessing | Needs true parallelism; GIL blocks threads |
| Mixed I/O + CPU (fetch data, then transform) | multiprocessing + asyncio | Use async for I/O, processes for CPU |
| High-concurrency server (thousands of connections) | asyncio | Lowest per-task overhead |
| Existing synchronous codebase, quick parallelism | ThreadPoolExecutor | Easiest integration with sync code |
| Scientific computing, image processing | multiprocessing.Pool | Bypasses GIL entirely |

The GIL - Python's Most Misunderstood Feature

The Global Interpreter Lock is a mutex inside the CPython interpreter. It ensures that only one thread at a time can execute Python bytecode, even on a multi-core machine.

Why the GIL Exists

CPython's memory management uses reference counting. Every Python object has a reference count - an integer that tracks how many references point to that object. When the reference count reaches zero, the object is deallocated.

Reference counting is not thread-safe on its own. If two threads simultaneously increment or decrement the same object's reference count, the count can be corrupted - leading to objects freed while still in use, double-free errors, or memory leaks that are nearly impossible to debug. The GIL is CPython's solution: by allowing only one thread to execute Python bytecode at a time, the reference count is always modified by exactly one thread, making it safe without per-object locking.
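Reference counts can be observed with sys.getrefcount. This is a small sketch, assuming a standard CPython build; the absolute numbers vary by version, so the example prints only the differences. The names data and alias are arbitrary.

```python
import sys

data = []                          # a fresh object with one name bound to it
before = sys.getrefcount(data)     # includes a temporary reference held by the call itself

alias = data                       # bind a second name to the same object
after_alias = sys.getrefcount(data)

del alias                          # drop that reference again
after_del = sys.getrefcount(data)

print(after_alias - before, after_del - before)  # 1 0
```

Every assignment, argument pass, and container insertion adjusts a counter like this one, which is why unsynchronized updates from several threads would be so damaging.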

What the GIL Actually Prevents

The GIL prevents multiple threads from executing Python bytecode simultaneously. It does not prevent:

  • Threads from running concurrently during I/O (file reads, network calls, database queries) - I/O operations release the GIL while waiting for the OS
  • NumPy, Pandas, and other C extensions from running in parallel - well-written C extensions release the GIL during computation
  • Multiprocessing - each process has its own GIL

The GIL in Practice

import threading
import time

def cpu_task():
    """Simulates CPU-bound work - the GIL prevents true parallelism."""
    x = 0
    for _ in range(50_000_000):
        x += 1

def io_task():
    """Simulates I/O-bound work - the GIL is released during the sleep."""
    time.sleep(1)  # time.sleep() releases the GIL

# CPU-bound: 2 threads finish no sooner than running the two tasks back to back - the GIL serializes them
start = time.perf_counter()
t1 = threading.Thread(target=cpu_task)
t2 = threading.Thread(target=cpu_task)
t1.start()
t2.start()
t1.join()
t2.join()
print(f"2 CPU threads: {time.perf_counter() - start:.2f}s")
# Result: roughly 2x the time of a single cpu_task() call - no speedup, and often slightly slower due to GIL contention

# I/O-bound: 2 threads CAN overlap because the GIL is released during sleep
start = time.perf_counter()
t1 = threading.Thread(target=io_task)
t2 = threading.Thread(target=io_task)
t1.start(); t2.start()
t1.join(); t2.join()
print(f"2 I/O threads: {time.perf_counter() - start:.2f}s")
# Result: ~1 second, not 2 - threads ran concurrently

:::note The GIL Is a CPython Detail
The GIL exists in CPython - the standard Python interpreter. Other implementations such as PyPy, Jython, and GraalPy make their own choices (Jython, for example, has no GIL). Python 3.13 introduced an experimental "free-threaded" build (PEP 703) that removes the GIL. In standard CPython, which the vast majority of production Python runs on, the GIL is a real constraint and must be understood.
:::

:::tip GIL Removal in Python 3.13+
Python 3.13 shipped an experimental free-threaded mode (python3.13t) that disables the GIL. The ecosystem is still adapting to it, so treat it with caution in production, but it shows where CPython is heading. Understanding GIL constraints today eases the transition to no-GIL Python: many multiprocessing patterns can become threading patterns, and your asyncio patterns remain unchanged.
:::

What You Will Learn

This module covers eight lessons plus two projects:

| # | Lesson | Core concept |
| --- | --- | --- |
| 01 | Threading | Thread, daemon threads, thread lifecycle, GIL impact on threading |
| 02 | Multiprocessing | Process, Pool, ProcessPoolExecutor, true parallelism |
| 03 | Asyncio | async/await, coroutines, gather, create_task |
| 04 | Event Loop | How the event loop works, run_until_complete, get_event_loop |
| 05 | Race Conditions | What they are, how to reproduce them, how to prevent them |
| 06 | Locks and Semaphores | Lock, RLock, Semaphore, Event, Condition, Barrier |
| 07 | ThreadPoolExecutor | concurrent.futures, as_completed, timeout, cancellation |
| 08 | Async API Service | End-to-end async FastAPI service with asyncpg, background tasks |

The Concurrency Landscape

The three models are not mutually exclusive - production systems often combine them.

A FastAPI web server is a real example of this combined model:

  • The event loop handles incoming HTTP connections concurrently
  • Thread pool (run_in_executor) executes synchronous database calls without blocking the loop
  • Process pool handles CPU-intensive routes (image resizing, PDF generation)

Understanding when and how to combine these tools is the capstone skill of this module.
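The thread-pool half of that combination can be sketched with loop.run_in_executor. This is an illustration rather than FastAPI code: blocking_query and handler are hypothetical names, and time.sleep stands in for a synchronous database driver.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_query(n):
    """Hypothetical synchronous call, such as a blocking database driver."""
    time.sleep(0.2)
    return n * n

async def handler(pool, n):
    loop = asyncio.get_running_loop()
    # Run the blocking call on a worker thread so the event loop stays free.
    return await loop.run_in_executor(pool, blocking_query, n)

async def main():
    with ThreadPoolExecutor(max_workers=4) as pool:
        return await asyncio.gather(*(handler(pool, n) for n in range(4)))

start = time.perf_counter()
squares = asyncio.run(main())
elapsed = time.perf_counter() - start

print(squares, f"{elapsed:.2f}s")
```

The four blocking calls overlap on worker threads, so the total is close to 0.2 seconds rather than 0.8. For a CPU-intensive function, passing a ProcessPoolExecutor to run_in_executor follows the same shape, provided the function and its arguments can be pickled.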

Module Prerequisites

  • Python fundamentals: functions, classes, context managers (with statements), exception handling
  • Module 03 - Python Internals (recommended): the GIL Explained lesson provides deep background on reference counting and CPython internals
  • Module 04 - Testing and Quality: the projects require pytest and pytest-asyncio
  • Module 06 - APIs and Web Basics: Lesson 08 (Async API Service) builds on FastAPI

You do not need prior concurrency experience. The module builds from the simplest threading primitives to a production async service.

The Engineering Standard

Every concept in this module is grounded in how production Python systems handle concurrency:

  • asyncio underpins the async-native Python web stack - FastAPI, Starlette, and aiohttp are all built on it. Any engineer building Python APIs today needs to understand async/await, the event loop, and how to avoid blocking it
  • ThreadPoolExecutor is the standard tool for parallelizing I/O-bound work in synchronous codebases - it is simpler than raw threading and plugs directly into asyncio via loop.run_in_executor()
  • ProcessPoolExecutor is the standard for CPU-bound parallelism in Python - it abstracts away the multiprocessing module's lower-level API and integrates with the concurrent.futures interface
  • Race conditions and locks are not optional knowledge - they appear in every concurrent system, and the bugs they produce (data corruption, deadlocks, intermittent failures) are among the hardest to debug in production
  • asyncpg and aiomysql are widely used async database drivers for production Python services - Lesson 08 demonstrates correct async database connection pooling, which is non-trivial to get right
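A minimal sketch of the ThreadPoolExecutor pattern with concurrent.futures.as_completed. fake_download and the delays are hypothetical values chosen for this example; the sleep stands in for network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_download(name, delay):
    """Hypothetical I/O-bound job; the sleep stands in for network latency."""
    time.sleep(delay)
    return name

jobs = {"slow": 0.3, "fast": 0.1, "medium": 0.2}
finished = []

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fake_download, name, delay) for name, delay in jobs.items()]
    # as_completed yields futures as they finish, not in submission order.
    for future in as_completed(futures):
        finished.append(future.result())

print(finished)
```

With these delays the names usually arrive as fast, medium, slow, even though slow was submitted first. future.result() also re-raises any exception from the worker, so failures surface in the calling thread.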

:::warning Concurrency Bugs Are Timing-Dependent
The most dangerous property of concurrency bugs is that they are not reproducible on demand. A race condition might occur one time in ten thousand runs - or only under production load with real latency. This is why the lessons in this module emphasize writing provably correct concurrent code using proper primitives, not "it seems to work" threading code that happens to be correct under your test conditions.
:::

:::danger Never Use Mutable Shared State Without Protection
The opening puzzle in this overview demonstrates what happens with unprotected shared mutable state. In production, the consequences are worse than an incorrect counter - they include corrupted database records, partial writes, lost updates, and audit logs with missing entries. Every lesson that introduces shared state also introduces the primitive that protects it. Use them.
:::
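A lock is one way to protect shared state; another is to avoid sharing it at all. In this minimal sketch, producer threads send updates through a queue.Queue and a single consumer thread owns the total. The names updates, STOP, producer, and consumer are assumptions made for this example.

```python
import queue
import threading

updates = queue.Queue()  # thread-safe hand-off between threads
STOP = object()          # sentinel telling the consumer to finish

total = 0

def producer(times):
    for _ in range(times):
        updates.put(1)

def consumer():
    global total
    while True:
        item = updates.get()
        if item is STOP:
            break
        total += item  # only this thread ever modifies total

consumer_thread = threading.Thread(target=consumer)
producers = [threading.Thread(target=producer, args=(10_000,)) for _ in range(2)]

consumer_thread.start()
for p in producers:
    p.start()
for p in producers:
    p.join()
updates.put(STOP)  # all producers are done, so STOP is the last item in the queue
consumer_thread.join()

print(total)  # 20000
```

Because exactly one thread performs the read-modify-write, no increment can be lost, and the queue handles the synchronization internally.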

How to Follow Along

Set up your environment once:

python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate

pip install fastapi uvicorn asyncpg aiohttp httpx
pip install pytest pytest-asyncio pytest-mock

No external services are required for the first seven lessons. Lesson 08 (Async API Service) optionally uses PostgreSQL via asyncpg - a Docker setup is provided in that lesson.

:::tip Run the Concurrency Examples Repeatedly
Concurrency bugs are non-deterministic. When a code example demonstrates a race condition or a lock, run it five or ten times. Notice whether the output changes. Understanding that concurrent behavior is inherently non-reproducible is as important as understanding the technical fix.
:::

Key Takeaways

  • Python has three concurrency models - threading (shared memory, I/O-bound), multiprocessing (separate memory, CPU-bound), and asyncio (event loop, high-concurrency I/O) - and each solves a different problem
  • The GIL prevents multiple threads from executing Python bytecode simultaneously, making threading ineffective for CPU-bound work but perfectly effective for I/O-bound work
  • Race conditions arise whenever multiple concurrent agents modify shared state without coordination - they produce non-deterministic, intermittent bugs that are among the hardest to diagnose in production
  • The correct model for the workload matters more than any implementation detail: the wrong model produces code that is slower, more complex, and less safe than single-threaded code
  • This module covers the full concurrency stack: from raw Thread and Process primitives through concurrent.futures executors to a production asyncio service
  • Two projects apply the material end-to-end: a concurrent web scraper and a production async API system

What's Next

Lesson 01 opens with threading - the simplest concurrency primitive in Python. You will learn to create and manage threads, understand the thread lifecycle, discover exactly what the GIL does and does not protect, reproduce a race condition with shared mutable state, and build a concurrent file downloader. Threading is the foundation - even if you eventually write all new code with asyncio, you will work with threaded codebases constantly, and the concepts carry directly into every other concurrency model.

© 2026 EngineersOfAI. All rights reserved.