Module 03 - Advanced Async & Concurrency
Reading time: ~15 minutes | Level: Advanced
Before reading further, predict what happens here:
```python
import asyncio

async def fetch(url: str) -> str:
    await asyncio.sleep(1)
    if url == "https://api.example.com/3":
        raise ConnectionError(f"Failed to fetch {url}")
    return f"Data from {url}"

async def main():
    urls = [f"https://api.example.com/{i}" for i in range(5)]
    results = await asyncio.gather(
        *(fetch(url) for url in urls),
        return_exceptions=True,
    )
    for r in results:
        print(r)

asyncio.run(main())
```
Now answer these questions without running it:
- When the request to https://api.example.com/3 (the fourth URL in the list) raises ConnectionError, do the other four tasks get cancelled?
- If you remove return_exceptions=True, what happens to the results from the tasks that already completed?
- If two URLs raise different exceptions, which exception do you see?
- Are the tasks that were still running when the exception occurred cleaned up properly?
Answer
With return_exceptions=True: all five tasks run to completion. The ConnectionError appears as a value in the results list alongside the four successful strings. No task is cancelled. This is convenient but dangerous - you must manually iterate through results and check isinstance(r, BaseException) to find failures. Nothing forces you to handle them. If you forget, errors are silently swallowed.
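If you keep return_exceptions=True, you have to write that check by hand. Here is a minimal sketch. It reuses the puzzle's fetch() with a shorter sleep. partition() is a hypothetical helper name, not part of asyncio:

```python
import asyncio


async def fetch(url: str) -> str:
    # Same behaviour as the puzzle's fetch(), with a shorter sleep
    await asyncio.sleep(0.01)
    if url == "https://api.example.com/3":
        raise ConnectionError(f"Failed to fetch {url}")
    return f"Data from {url}"


async def partition(urls: list[str]) -> tuple[list[str], list[BaseException]]:
    # Hypothetical helper: run every fetch, then separate values from exceptions
    results = await asyncio.gather(*(fetch(u) for u in urls), return_exceptions=True)
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return ok, failed


async def main() -> None:
    urls = [f"https://api.example.com/{i}" for i in range(5)]
    ok, failed = await partition(urls)
    print(len(ok), "succeeded")
    for exc in failed:
        print("failed:", type(exc).__name__, exc)


asyncio.run(main())
```

The discipline is entirely on the caller. Nothing stops other code from calling gather(..., return_exceptions=True) and using results directly.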
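Without return_exceptions=True: gather() propagates the first exception raised straight to the awaiting code. The results of the tasks that already completed are lost. gather() raises instead of returning a list, and you hold no references to the tasks it created internally. The other tasks are not cancelled - they keep running as orphaned tasks. If they later raise, gather() marks those exceptions as retrieved and drops them, so you do not even get a "Task exception was never retrieved" warning. If they hold resources (database connections, file handles, network sockets), nothing in the calling code waits for them to finish and release those resources.
If two URLs raise exceptions and return_exceptions is not set, you see only the first one raised. The second is silently discarded. With return_exceptions=True, both appear as values in the results list.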
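A small experiment makes the orphaning visible. This sketch uses two hypothetical jobs: one fails fast, and the other is still sleeping when gather() raises. The module-level finished list records which jobs ran to completion:

```python
import asyncio

finished: list[str] = []  # names of jobs that ran to completion


async def job(name: str, delay: float, fail: bool = False) -> str:
    await asyncio.sleep(delay)
    if fail:
        raise ValueError(f"{name} failed")
    finished.append(name)
    return name


async def main() -> None:
    try:
        await asyncio.gather(job("fast-fail", 0.01, fail=True), job("slow", 0.05))
    except ValueError as exc:
        print("gather raised:", exc)
    # gather() has already raised, but "slow" was never cancelled
    print("finished right after the raise:", finished)
    await asyncio.sleep(0.1)  # give the orphaned task time to complete
    print("finished a moment later:", finished)


asyncio.run(main())
```

Run as written, the first print shows an empty list and the second shows ['slow']. The task kept running after its caller had moved on.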
This is the fundamental problem with asyncio.gather(): it provides no structured lifecycle for its child tasks. Tasks can outlive their parent, exceptions can be lost, and cleanup is not guaranteed. This is why Python 3.11 introduced asyncio.TaskGroup - and why this module exists.
You already know async/await. You can write coroutines, use asyncio.gather(), and build a basic async service. That puts you ahead of most Python developers. It also puts you exactly at the threshold where async code starts failing in ways that are invisible during development and catastrophic in production.
The Intermediate concurrency module taught you the mechanics - how the event loop works, what await does, how to use locks and semaphores. This module teaches you the engineering - how to build async systems that handle failure gracefully, manage resources correctly, and scale under real-world conditions.
Why "Intermediate Async" Is Not Enough
Most async bugs in production share a common pattern: the code works perfectly in the happy path and fails silently in every other path. Consider the real-world scenarios that basic asyncio.gather() and raw async/await cannot handle correctly:
Resource leaks under failure. An async function opens a database connection, starts a transaction, and awaits a network call. The network call times out. Without an async context manager wrapping the connection, the connection leaks. Under load, the connection pool exhausts. The database starts rejecting connections. The entire service goes down - not because of the network timeout, but because of a missing async with.
Uncontrolled concurrency. A scraper launches 10,000 concurrent requests using gather(). The target server rate-limits after 100 requests per second. Without bounded concurrency (an async semaphore), the scraper blasts all 10,000 requests, gets rate-limited, retries them all, gets rate-limited again, and eventually gets IP-banned. The fix is three lines of code - but only if you know async semaphores exist.
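Here is a minimal sketch of bounded concurrency with asyncio.Semaphore. The limit of 100, the simulated request and the stats dictionary are assumptions for illustration. A semaphore caps how many requests are in flight at once. On its own, it does not limit requests per second:

```python
import asyncio

MAX_CONCURRENT = 100  # assumed limit; tune it to what the target tolerates


async def fetch(url: str, sem: asyncio.Semaphore, stats: dict[str, int]) -> str:
    async with sem:  # at most MAX_CONCURRENT requests are inside this block
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await asyncio.sleep(0.001)  # simulated HTTP request
            return f"Data from {url}"
        finally:
            stats["active"] -= 1


async def main() -> dict[str, int]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    stats = {"active": 0, "peak": 0}
    urls = [f"https://api.example.com/{i}" for i in range(10_000)]
    results = await asyncio.gather(*(fetch(u, sem, stats) for u in urls))
    print(len(results), "fetched; peak concurrency:", stats["peak"])
    return stats


stats = asyncio.run(main())
```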
Orphaned tasks. A web handler spawns three background tasks with create_task(). The handler returns a response. One background task raises an exception. Because nothing awaits that task, the exception surfaces at best as a "Task exception was never retrieved" error log when the task object is garbage-collected, and is otherwise ignored. The other two tasks continue running, but their results are never collected. On shutdown, the event loop complains about pending tasks. Under load, hundreds of orphaned tasks accumulate, consuming memory and holding resources.
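The asyncio documentation advises keeping a reference to every task returned by create_task(), because the event loop holds only weak references to tasks. Here is a minimal sketch of that bookkeeping. spawn() and audit_log() are hypothetical names. The done callback retrieves each exception, so it is recorded instead of lost. Within a single scope, TaskGroup (Lesson 03) is the better tool. This pattern is for work that must outlive the handler:

```python
import asyncio

background_tasks: set[asyncio.Task] = set()  # strong references keep tasks alive
errors: list[BaseException] = []


def _on_done(task: asyncio.Task) -> None:
    background_tasks.discard(task)
    if not task.cancelled() and task.exception() is not None:
        errors.append(task.exception())  # retrieved and recorded, not silently lost


def spawn(coro) -> asyncio.Task:
    # Hypothetical helper: create_task() plus reference and error bookkeeping
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    task.add_done_callback(_on_done)
    return task


async def audit_log(ok: bool) -> None:
    await asyncio.sleep(0.01)
    if not ok:
        raise RuntimeError("audit log write failed")


async def handler() -> str:
    spawn(audit_log(True))
    spawn(audit_log(False))
    return "200 OK"  # the response goes out before the background work finishes


async def main() -> None:
    print(await handler())
    # On shutdown, wait for outstanding background work instead of abandoning it
    await asyncio.gather(*background_tasks, return_exceptions=True)
    print("pending:", len(background_tasks), "errors:", errors)


asyncio.run(main())
```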
Graceful shutdown failures. A service receives SIGTERM. It needs to finish processing in-flight requests, close database connections, flush caches, and exit cleanly. Without structured shutdown logic, the event loop is destroyed while tasks are still running. Connections are dropped mid-transaction. Data is partially written.
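Here is a minimal sketch of an ordered shutdown. It assumes a hypothetical in-process queue of work and three hypothetical workers. loop.add_signal_handler() is only available on Unix event loops, so the sketch skips it where it is unsupported. For the demonstration, it also sets the stop event itself after 50 ms:

```python
import asyncio
import signal


async def worker(queue: asyncio.Queue, processed: list[int]) -> None:
    while True:
        item = await queue.get()
        try:
            await asyncio.sleep(0.01)  # simulated request handling
            processed.append(item)
        finally:
            queue.task_done()


async def serve(items: list[int], stop: asyncio.Event) -> list[int]:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list[int] = []
    for item in items:
        queue.put_nowait(item)  # stand-in for incoming requests
    workers = [asyncio.create_task(worker(queue, processed)) for _ in range(3)]
    await stop.wait()  # run until shutdown is requested
    await queue.join()  # 1. finish in-flight and queued work
    for w in workers:  # 2. every worker is now idle in get(), so cancel them
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    # 3. close connections and flush caches here (omitted in this sketch)
    return processed


async def main() -> None:
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    try:
        loop.add_signal_handler(signal.SIGTERM, stop.set)  # Unix event loops only
    except (NotImplementedError, RuntimeError):
        pass  # e.g. Windows, or not running in the main thread
    loop.call_later(0.05, stop.set)  # demo only: simulate SIGTERM after 50 ms
    processed = await serve(list(range(20)), stop)
    print("processed", len(processed), "items before exit")


asyncio.run(main())
```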
Every lesson in this module addresses one or more of these failure modes.
The Async Maturity Model
Async proficiency develops in stages. This module takes you from Stage 2 to Stage 4:
You completed Stage 2 in the Intermediate course. This module covers Stages 3 and 4 - the patterns and production practices that separate async code that works in a notebook from async code that runs in production for months without intervention.
What You Will Learn
This module covers seven lessons plus two capstone projects:
| # | Lesson | Core concept |
|---|---|---|
| 01 | Async Generators and Async Iterators | async for, async yield, aiter/anext, async comprehensions, streaming data patterns |
| 02 | Async Context Managers | __aenter__/__aexit__, asynccontextmanager, resource management, connection pools |
| 03 | Structured Concurrency with TaskGroup | asyncio.TaskGroup, exception groups, cancellation semantics, why gather() is broken |
| 04 | Custom Awaitables | __await__ protocol, awaitable objects, coroutine wrappers, Future internals |
| 05 | Advanced Event Loop | Loop internals, custom policies, uvloop, run_in_executor patterns |
| 06 | Async Synchronization Patterns | Async locks, semaphores, events, conditions, bounded concurrency, rate limiting, circuit breakers |
| 07 | Production Async Architecture | Error strategies, graceful shutdown, health checks, backpressure, pytest-asyncio testing |
Projects
| # | Project | What you build |
|---|---|---|
| 01 | Async Web Scraper Pipeline | Rate-limited, fault-tolerant async scraper with backpressure and retry logic |
| 02 | Real-Time Data Stream Processor | Async pipeline processing multiple data streams with TaskGroup and graceful shutdown |
The Architecture of Advanced Async
The seven lessons build on each other in a deliberate sequence. Here is how the concepts connect:
Lessons 01-02 introduce the async protocol extensions - how to produce data asynchronously (generators) and manage resources asynchronously (context managers). These are the building blocks.
Lesson 03 is the centerpiece. asyncio.TaskGroup represents a fundamental shift in how Python handles concurrent tasks. It introduces structured concurrency - the guarantee that child tasks cannot outlive their parent scope. Every production async system should use TaskGroup over gather().
Lesson 04 goes deep into the __await__ protocol - how Python's await keyword actually works at the object level. This is the knowledge that lets you build custom abstractions, not just consume the standard library.
Lesson 05 covers the event loop itself - not as a black box you call asyncio.run() on, but as a programmable runtime you can customize, replace (with uvloop), and extend.
Lesson 06 returns to synchronization but at the async level - bounded concurrency with semaphores, rate limiting, circuit breakers. These are the patterns that prevent async code from overwhelming external services.
Lesson 07 ties everything together into production architecture - how to handle errors across hundreds of concurrent tasks, how to shut down gracefully, how to apply backpressure when producers outpace consumers, and how to test all of it.
What Changed Between Python 3.10 and 3.12 for Async
Several features in this module are relatively new. Understanding what shipped when helps you navigate documentation and legacy codebases:
| Feature | Python version | Significance |
|---|---|---|
| asyncio.TaskGroup | 3.11 | Structured concurrency - replaces most uses of gather() |
| Exception groups (ExceptionGroup) | 3.11 | Multiple exceptions from concurrent tasks, handled with except* |
| asyncio.Runner | 3.11 | Reusable alternative to asyncio.run() for managing the loop lifecycle |
| asyncio.eager_task_factory | 3.12 | Tasks start executing synchronously at creation and run until they first suspend - reduces scheduling overhead |
| asyncio.TaskGroup improvements | 3.12 | Better error messages and cancellation behavior |
| aiter() / anext() builtins | 3.10 | Async equivalents of iter() / next() - no more manual __aiter__() / __anext__() calls |
:::note Python Version Requirement
This module targets Python 3.11+ throughout. TaskGroup and ExceptionGroup are core to the material. If you are on 3.10 or earlier, install the exceptiongroup and taskgroup backport packages - but upgrading to 3.11+ is strongly recommended.
:::
The gather() Problem - Why This Module Exists
The opening puzzle demonstrates the core issue, but it is worth stating explicitly because gather() is so widely used.
asyncio.gather() has three design problems that make it unsuitable for production concurrent code:
1. No structured lifecycle. Tasks created by gather() are not bound to the calling scope. As soon as one child fails, gather() raises and hands control back to the caller while the remaining children are still running, so they can outlive the code that started them. Cancelling the gather() itself does cancel its unfinished children, as the asyncio documentation states; the gap is on the failure path.
2. Exception handling is broken. With return_exceptions=False (the default), the first exception propagates to the caller immediately, but the remaining tasks are not cancelled at all. They keep running, and any exceptions they raise later are discarded. With return_exceptions=True, exceptions are mixed into the results list as values, and nothing forces the caller to check for them.
3. No cancellation propagation between siblings. If one task in a gather() group fails or is cancelled, the others continue running. gather() has no built-in way to express "if any task fails, cancel all of them" - which is the correct behavior for most concurrent operations.
asyncio.TaskGroup solves all three problems:
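```python
import asyncio

# fetch() is the coroutine defined in the opening puzzle
async def main():
    async with asyncio.TaskGroup() as tg:
        task1 = tg.create_task(fetch("https://api.example.com/1"))
        task2 = tg.create_task(fetch("https://api.example.com/2"))
        task3 = tg.create_task(fetch("https://api.example.com/3"))
    # All tasks are GUARANTEED to be done here
    # If any task raised, the tasks still running are cancelled and an ExceptionGroup is raised
    # No orphaned tasks. No silent failures. Cancelled tasks finish their cleanup before the block exits.

asyncio.run(main())  # raises an ExceptionGroup, because the /3 request fails
```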
Lesson 03 covers TaskGroup in full depth. This overview only explains why this module focuses on patterns that supersede what you learned in the Intermediate course.
Module Prerequisites
This module assumes you have completed the Intermediate concurrency module (Module 08) or have equivalent experience:
- Threading and multiprocessing - you understand threads, processes, the GIL, and when to use each model
- asyncio fundamentals - you can write coroutines with async/await, use asyncio.gather() and asyncio.create_task(), and understand how the event loop schedules tasks
- Synchronization primitives - you have used Lock, Semaphore, and Event, and understand why shared mutable state requires protection
- ThreadPoolExecutor - you have used concurrent.futures for parallel I/O
- Basic async service - you have built or studied an async web service (FastAPI or similar)
If any of these are unfamiliar, review Module 08 (Concurrency) from the Intermediate course before continuing.
:::tip Test Your Readiness
If you can explain why asyncio.gather() does not provide true cancellation propagation, you are ready for this module. If that statement is surprising, review the Intermediate asyncio lessons first - the gap between "can use async/await" and "understands async lifecycle management" is where most production bugs hide.
:::
How to Follow Along
Set up your environment:
```bash
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Core async libraries
pip install aiohttp httpx aiofiles

# Production async tools
pip install uvloop  # Linux/macOS only - significant event loop speedup

# Testing
pip install pytest pytest-asyncio

# Optional: for the scraper project
pip install beautifulsoup4 lxml
```

All examples in this module require Python 3.11 or later. Verify your version:

```bash
python --version
# Must be 3.11.0 or later
```
:::danger Do Not Use Python 3.10 or Earlier
asyncio.TaskGroup and ExceptionGroup are not available before Python 3.11. The backport packages (exceptiongroup, taskgroup) provide partial compatibility but differ in edge-case behavior. The except* syntax cannot be backported at all: it is new grammar, so code that uses it is a SyntaxError on 3.10. The lessons in this module are written against the standard library implementations in 3.11+. Using older versions will produce different behavior in error handling and cancellation scenarios.
:::
Key Takeaways
- asyncio.gather() is not production-ready for most concurrent workloads - it cannot guarantee task cleanup, exception visibility, or cancellation propagation
- Structured concurrency (TaskGroup) is the correct model for concurrent async operations - child tasks are bound to a scope and cannot outlive it
- Async generators enable streaming data pipelines that process items as they arrive, rather than buffering everything in memory
- Async context managers are the only reliable way to manage resources (connections, files, locks) in async code - manual try/finally is error-prone at scale
- Production async systems require explicit strategies for error handling, graceful shutdown, backpressure, and bounded concurrency - the happy path is the easy part
- This module uses recent Python features throughout - TaskGroup, ExceptionGroup, except*, and asyncio.Runner (3.11+), plus eager_task_factory (3.12+)
What's Next
Lesson 01 opens with async generators and async iterators - the async equivalents of Python's iteration protocol. You will learn how to build async data producers that yield values across await boundaries, process streaming data with async for, and compose async generator pipelines. This is the foundation for every data-processing pattern in this module: rather than fetching all data into memory and then processing it, you will learn to process data as it arrives - one item at a time, across network boundaries, without blocking the event loop.
