Module 08 Projects - Concurrency

These projects are engineering specifications, not tutorials. Each spec defines exactly what the finished system must do - it is your job to figure out how to build it. Read every requirement carefully before writing a single line of code.

By the end of this module you will have built two independent, production-flavored concurrent systems. The first demonstrates thread-based and asyncio-based concurrency by crawling many URLs simultaneously. The second demonstrates async API design by aggregating live external data with caching, rate limiting, and fault tolerance.

Project Summary

#	Project	Concurrency Model	Key Skills	Difficulty
01	Concurrent Web Scraper	`ThreadPoolExecutor` + `asyncio` + `aiohttp`	Semaphores, retry with exponential backoff, timeout per request, domain rate limiting, structured output	Intermediate
02	Async Data Aggregation API	`asyncio` + FastAPI + `asyncpg`	`asyncio.gather`, `Semaphore`, `wait_for`, in-memory TTL cache, circuit breaker, background refresh, health endpoint	Intermediate–Advanced

What These Projects Test

Project 01 - Concurrent Web Scraper

A command-line scraper that fetches and parses a configurable list of URLs concurrently. You will implement it twice: once with ThreadPoolExecutor (threading model) and once with asyncio and aiohttp (cooperative concurrency model). Comparing the two implementations side by side is the core learning goal.

Skills assessed:

Controlling concurrency with a semaphore to avoid overwhelming servers
Retry with exponential backoff - how to handle transient failures without hammering a struggling server
Per-request timeouts - ensuring one slow URL cannot hold up the entire scrape
Domain-level rate limiting - staying polite by spacing out requests to the same domain
Graceful handling of connection errors, HTTP errors, and malformed HTML without crashing the scraper
Producing structured, machine-readable output (JSON or CSV) rather than raw print statements
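
The retry requirement above can be sketched as follows. This is a minimal illustration, not the required design: the function name `retry_with_backoff`, its parameters, the default delays, and the choice of "full jitter" are all assumptions made for the example. The `sleep` argument is injectable so that tests (such as a `tests/test_retry.py`) do not have to wait in real time.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,          # assumed default, not from the spec
    base_delay: float = 0.5,        # seconds before the first retry (upper bound)
    max_delay: float = 8.0,         # cap so delays do not grow without limit
    retry_on: tuple = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); on a retryable error, wait and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Exponential growth: base, 2*base, 4*base, ... capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: pick a random delay in [0, ceiling] so many workers
            # that failed together do not all retry at the same instant.
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("max_attempts must be at least 1")
```

Only errors listed in `retry_on` are retried; anything else propagates immediately, which keeps permanent failures (for example a malformed URL) from being retried pointlessly.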

Project 02 - Async Data Aggregation API

A FastAPI service that calls three or more external APIs concurrently and returns an aggregated response. You will add an in-memory cache with TTL, a background task that refreshes the cache proactively, a circuit breaker that detects failing upstreams, and a health check endpoint that exposes the status of every upstream.
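
As a rough sketch of the in-memory TTL cache mentioned above, assuming a single-process service and a simple key/value interface: the class name `TTLCache`, its methods, and the injectable `clock` are hypothetical choices for this example, not part of the spec.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        # Store the value together with the moment it stops being valid.
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller falls back to a fresh fetch.
            del self._entries[key]
            return None
        return value
```

`time.monotonic` is used rather than `time.time` so that wall-clock adjustments cannot make entries expire early or late. Because `get` and `set` contain no `await`, they run atomically with respect to other coroutines on the same event loop.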

Skills assessed:

asyncio.gather for concurrent outbound calls with per-task error isolation (return_exceptions=True)
asyncio.Semaphore to cap concurrent external connections
asyncio.wait_for to enforce per-call timeouts so slow upstreams cannot stall your handler
In-memory TTL cache - serving cached data that is still within its TTL while background tasks refresh it
Circuit breaker - stopping calls to a repeatedly failing upstream and recovering automatically
FastAPI BackgroundTasks for cache refresh without blocking the response
Lifespan events (asynccontextmanager) for creating and tearing down shared resources (DB pool, HTTP client)
Health endpoint that shows per-upstream circuit breaker state and DB connectivity
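
One possible shape for the circuit breaker is sketched below. The class name, the three state names, the thresholds, and the injectable `clock` are assumptions for illustration; the spec may define different states or recovery rules. Note one simplification: in the half-open state this sketch lets every request through, whereas many implementations allow only a single probe.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stops calls to an upstream after repeated failures, then retries later."""

    def __init__(self, failure_threshold: int = 3, recovery_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._recovery = recovery_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self._recovery:
            return "half_open"  # recovery window elapsed: allow a trial call
        return "open"

    def allow_request(self) -> bool:
        return self.state != "open"

    def record_success(self) -> None:
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        # A failed trial call re-opens immediately; otherwise open at threshold.
        if self.state == "half_open" or self._failures >= self._threshold:
            self._opened_at = self._clock()
```

A fetcher would call `allow_request()` before each upstream call and report the outcome with `record_success()` or `record_failure()`. The `state` property is also the kind of value a health endpoint could expose per upstream.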

How to Approach Each Project

Read the entire spec before writing any code.
Identify which concurrency primitive solves each sub-problem.
Build the smallest working version first - one URL, one upstream - before adding concurrency.
Add one layer at a time: timeout → retry → semaphore → structured output.
Test failure scenarios explicitly: kill a server, introduce latency, return 500 errors.
Run the acceptance criteria as a checklist - every item must pass.
Attempt at least one extension challenge.

Ground Rules

All external calls must have explicit timeouts. A call without a timeout is a hidden reliability bug.
Failures in one task must not crash sibling tasks. Use return_exceptions=True with gather, or try/except inside each worker.
Concurrency must be bounded. A semaphore or a configured max_workers is required - unbounded concurrency is a denial-of-service risk against the targets you are calling.
Structured output is required. Both projects produce machine-readable output (JSON or a well-defined dict structure), not raw print statements or ad-hoc strings.
No asyncio.sleep(0) as a substitute for real async operations. Use actual async I/O libraries (aiohttp, httpx, asyncpg) for network and database operations.
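
The first three rules can be combined in a few lines. The sketch below is illustrative only: `fake_fetch` is a hypothetical stand-in that simulates latency with `asyncio.sleep` so the example runs without a network, and a real project would replace it with an aiohttp or httpx call. The function names, job format, and result shape are assumptions.

```python
import asyncio


async def fake_fetch(name: str, delay: float) -> str:
    # Hypothetical stand-in for a real async HTTP call.
    await asyncio.sleep(delay)
    if name == "broken":
        raise ConnectionError(f"{name} is down")
    return f"{name}: ok"


async def bounded_fetch(sem: asyncio.Semaphore, name: str,
                        delay: float, timeout: float) -> str:
    async with sem:  # rule: concurrency must be bounded
        # rule: every external call has an explicit timeout
        return await asyncio.wait_for(fake_fetch(name, delay), timeout=timeout)


async def fetch_all(jobs: list, max_concurrency: int = 2,
                    timeout: float = 0.1) -> dict:
    sem = asyncio.Semaphore(max_concurrency)
    # rule: one failure must not crash siblings, so collect exceptions as values
    results = await asyncio.gather(
        *(bounded_fetch(sem, name, delay, timeout) for name, delay in jobs),
        return_exceptions=True,
    )
    report = {}
    for (name, _), result in zip(jobs, results):
        if isinstance(result, Exception):
            report[name] = {"ok": False, "error": type(result).__name__}
        else:
            report[name] = {"ok": True, "data": result}
    return report


if __name__ == "__main__":
    jobs = [("fast", 0.01), ("broken", 0.01), ("slow", 1.0)]
    print(asyncio.run(fetch_all(jobs)))
```

In this sketch the timeout clock starts only after the semaphore is acquired, so time spent waiting for a slot does not count against the call. Whether queueing time should count is a design decision worth stating in your own implementation.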

Folder Structure (Recommended)

Project 01

scraper/
├── scraper_threads.py      # ThreadPoolExecutor implementation
├── scraper_async.py        # asyncio + aiohttp implementation
├── parser.py               # shared HTML parsing logic (sync)
├── models.py               # ScrapeResult dataclass
├── retry.py                # exponential backoff utility
├── rate_limiter.py         # per-domain delay tracker
├── output.py               # JSON / CSV writers
└── tests/
    ├── test_retry.py
    └── test_parser.py

Project 02

aggregator/
├── main.py                 # FastAPI app, lifespan, routes
├── fetcher.py              # per-upstream fetch logic, circuit breakers
├── cache.py                # in-memory TTL cache
├── background.py           # cache refresh background task
├── models.py               # Pydantic request/response models
├── config.py               # upstream URLs, timeout config, semaphore sizes
└── tests/
    ├── test_fetcher.py
    ├── test_cache.py
    └── test_routes.py

Concurrency Model Comparison

Understanding when to choose threading vs asyncio is the meta-lesson of this module:

Dimension	`ThreadPoolExecutor`	`asyncio` + async I/O
Concurrency unit	OS thread	Coroutine
Memory per unit	~8 MB stack reserved by default on Linux (mostly virtual memory)	A few KB
Max practical concurrency	~50–200 threads	~10,000+ coroutines
GIL impact	Limits CPU parallelism	Irrelevant (single thread)
Good for	Wrapping legacy sync libraries	High-concurrency I/O-bound work
Blocking call in worker	Blocks only that thread	Blocks the ENTIRE event loop
Learning curve	Lower	Higher

Both projects require you to choose the right model and justify the choice. The scraper project has you implement both so you can compare them empirically.
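
The "Blocking call in worker" row is the one that most often bites in practice, so here is a small experiment that makes it visible. It is a sketch under assumptions: the function names and timings are invented for the demonstration, and `asyncio.to_thread` requires Python 3.9 or later. A heartbeat task records a timestamp every 20 ms, and the largest gap between timestamps shows whether the event loop was stalled.

```python
import asyncio
import time


def blocking_work(seconds: float) -> str:
    # A synchronous call, standing in for a legacy library or blocking driver.
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks: list, interval: float, count: int) -> None:
    # Records a timestamp at each tick; long gaps mean the loop was blocked.
    for _ in range(count):
        await asyncio.sleep(interval)
        ticks.append(time.monotonic())


async def max_heartbeat_gap(offload: bool) -> float:
    ticks = [time.monotonic()]
    hb = asyncio.create_task(heartbeat(ticks, 0.02, 5))
    if offload:
        # Runs in a worker thread; the event loop keeps servicing the heartbeat.
        await asyncio.to_thread(blocking_work, 0.2)
    else:
        # Runs on the event loop thread; nothing else can run for 0.2 s.
        blocking_work(0.2)
    await hb
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    print("blocking on the loop:", round(asyncio.run(max_heartbeat_gap(False)), 3))
    print("offloaded to thread: ", round(asyncio.run(max_heartbeat_gap(True)), 3))
```

When the blocking call runs directly on the loop, the largest gap is at least the full 0.2 s; when it is offloaded, the gap stays close to the 20 ms tick interval. The same reasoning applies to sync HTML parsing inside an async scraper if the documents are large.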
