
Microservices vs Monolith - Making the Right Choice

Two teams build the same product: an e-commerce platform with users, catalog, orders, and payments. Team A starts with microservices on day one. Team B starts with a monolith.

Six months later:

Team A (Microservices from Day 1):
- 4 services, 4 repos, 4 CI pipelines, 4 Docker images
- Spent 2 months on service discovery, API gateway, distributed tracing
- Network timeouts between services cause intermittent order failures
- Schema changes require coordinated deploys across 3 services
- 2 of 5 engineers spend most of their time on infrastructure
- 47 Kubernetes manifests

Team B (Monolith):
- 1 repo, 1 CI pipeline, 1 Docker image
- Shipped 3 major features in the time Team A spent on infrastructure
- Database transactions guarantee order consistency
- Refactoring is a simple find-and-replace across the codebase
- All 5 engineers work on product features
- 1 Dockerfile, 1 docker-compose.yml

Team A made a common mistake: choosing microservices before they understood their domain. This lesson will teach you when each architecture is appropriate and how to evolve between them.

What You Will Learn

  • The monolith-first approach and why Martin Fowler recommends it
  • When splitting into services is genuinely necessary
  • Communication patterns: REST, gRPC, and message queues (Celery + Redis)
  • Shared libraries and service contracts
  • Bounded contexts from Domain-Driven Design as service boundaries
  • Data ownership and the problems of shared databases
  • The eight fallacies of distributed computing
  • The modular monolith as a pragmatic middle ground

Prerequisites

  • Understanding of Clean Architecture and Hexagonal Architecture (Lessons 1-2)
  • Experience with FastAPI and SQLAlchemy
  • Familiarity with the 12-Factor App methodology (previous lesson)
  • Basic understanding of Docker and container orchestration

Part 1 - The Monolith-First Approach

Martin Fowler's advice: "Almost all the successful microservice stories have started with a monolith that got too big and was broken up."

Why Monoliths Are the Right Starting Point

| Advantage | Why It Matters Early On |
|---|---|
| Simple deployment | One artifact, one deploy pipeline |
| Easy refactoring | Rename a function, IDE handles it across all modules |
| ACID transactions | Orders and payments in a single database transaction |
| Low latency | Function calls, not network calls |
| Small team fit | 1-5 engineers can maintain one codebase efficiently |
| Domain discovery | You do not know your service boundaries yet |

The Premature Decomposition Trap

```python
# Team A's order creation - microservices
# Order Service calls User Service, then Catalog Service, then Payment Service
import httpx


async def create_order(user_id: int, items: list[dict]) -> Order:
    # Network call #1: verify user exists
    async with httpx.AsyncClient() as client:
        user_resp = await client.get(f"{USER_SERVICE_URL}/users/{user_id}")
    if user_resp.status_code != 200:
        raise ValueError("User not found")

    # Network call #2: verify item availability and prices
    async with httpx.AsyncClient() as client:
        catalog_resp = await client.post(
            f"{CATALOG_SERVICE_URL}/items/verify",
            json={"items": items},
        )
    if catalog_resp.status_code != 200:
        raise ValueError("Items unavailable")

    # Network call #3: process payment
    async with httpx.AsyncClient() as client:
        payment_resp = await client.post(
            f"{PAYMENT_SERVICE_URL}/charge",
            json={"user_id": user_id, "amount": catalog_resp.json()["total"]},
        )
    if payment_resp.status_code != 200:
        # What if payment fails after catalog reserved items?
        # Need a saga or compensation transaction...
        raise ValueError("Payment failed")

    # Now save the order - but what if THIS fails after the card was charged?
    order = Order(user_id=user_id, items=items, status="confirmed")
    await order_repo.save(order)
    return order
```

```python
# Team B's order creation - monolith
def create_order(user_id: int, items: list[dict], db: Session) -> Order:
    # All in one transaction - either everything succeeds or nothing does
    user = db.get(User, user_id)
    if not user:
        raise ValueError("User not found")

    # Index catalog rows by id so each price lines up with its requested item
    # (assumes each item dict looks like {"id": ..., "quantity": ...})
    item_ids = [i["id"] for i in items]
    catalog_items = {
        item.id: item
        for item in db.query(CatalogItem).filter(CatalogItem.id.in_(item_ids))
    }
    if len(catalog_items) != len(set(item_ids)):
        raise ValueError("Items unavailable")

    total = sum(catalog_items[i["id"]].price * i["quantity"] for i in items)

    payment = Payment(user_id=user_id, amount=total, status="charged")
    order = Order(user_id=user_id, items=items, payment=payment, status="confirmed")

    db.add(payment)
    db.add(order)
    db.commit()  # atomic - all or nothing
    return order
```

The monolith version is simpler, faster, and correct by default (database transactions). The microservices version requires distributed transactions or sagas to achieve the same consistency.
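A saga replaces the single database transaction with a sequence of local steps, each paired with a compensating action that undoes it if a later step fails. The sketch below is a minimal in-memory orchestrator; the `Saga` and `SagaStep` classes and the reserve/charge steps are hypothetical stand-ins for the network calls in Team A's code. A production saga would also persist its progress and make compensations idempotent so it can survive crashes.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]  # undoes a successful action


@dataclass
class Saga:
    steps: list[SagaStep] = field(default_factory=list)

    def run(self) -> None:
        completed: list[SagaStep] = []
        for step in self.steps:
            try:
                step.action()
            except Exception:
                # Undo only the steps that already succeeded, newest first
                for done in reversed(completed):
                    done.compensate()
                raise
            completed.append(step)


# Hypothetical order flow: reserve stock, then charge; the charge fails
log: list[str] = []


def charge_card() -> None:
    raise RuntimeError("payment declined")


saga = Saga([
    SagaStep("reserve", lambda: log.append("reserved"), lambda: log.append("released")),
    SagaStep("charge", charge_card, lambda: log.append("refunded")),
])

try:
    saga.run()
except RuntimeError:
    pass

print(log)  # ['reserved', 'released']
```

The failed charge is never compensated because it never completed; only the stock reservation is released. Compare this with `db.commit()` in Team B's version, where the database provides the same guarantee for free.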

:::tip The "MonolithFirst" Strategy
Build a well-structured monolith first. When (and if) a module needs to be extracted, the clean internal boundaries make extraction straightforward. You cannot easily merge two bad microservices, but you can always split a well-structured monolith.
:::

Part 2 - When to Split: Real Signals

Splitting makes sense when you have genuine, concrete problems that a monolith cannot solve.

Legitimate Reasons to Split

| Signal | What It Looks Like | Microservice Solution |
|---|---|---|
| Independent scaling | The catalog search handles 100x the traffic of order processing | Scale catalog independently |
| Different tech requirements | ML model serving needs GPU, web API does not | Separate services on different hardware |
| Team autonomy | 30+ engineers stepping on each other's code | Teams own services with clear contracts |
| Different deployment cadence | Payments team deploys monthly (compliance), catalog team deploys daily | Independent deploy pipelines |
| Fault isolation | A bug in recommendation crashes the entire checkout flow | Recommendation failure does not affect orders |

Invalid Reasons to Split

| Signal | Why It Is Not a Reason |
|---|---|
| "Microservices are modern" | Architecture should solve problems, not follow trends |
| "Our monolith is messy" | A messy monolith becomes messy microservices - fix the structure first |
| "We might need to scale someday" | A well-built monolith can handle millions of requests per day, and it scales horizontally by running more instances behind a load balancer |
| "Different modules = different services" | Modules within a monolith already provide separation |
| "It looks good on my resume" | Please do not inflict unnecessary complexity on your team |

Part 3 - Communication Patterns

When services do need to talk to each other, you have three primary options.

Pattern 1: REST (Synchronous HTTP)

```python
# service_a/client.py - calling another service via REST
from typing import Optional

import httpx


class ServiceUnavailableError(Exception):
    """Raised when a downstream service cannot be reached in time."""


class UserServiceClient:
    """REST client for the User Service."""

    def __init__(self, base_url: str, timeout: float = 5.0) -> None:
        self._base_url = base_url
        self._timeout = timeout

    async def get_user(self, user_id: int) -> Optional[dict]:
        async with httpx.AsyncClient(timeout=self._timeout) as client:
            try:
                response = await client.get(f"{self._base_url}/users/{user_id}")
                response.raise_for_status()
                return response.json()
            except httpx.TimeoutException as e:
                raise ServiceUnavailableError("User service timed out") from e
            except httpx.HTTPStatusError as e:
                if e.response.status_code == 404:
                    return None  # a missing user is a normal result, not an error
                raise
```

| Pros | Cons |
|---|---|
| Simple, well-understood | Synchronous - caller waits for response |
| Language-agnostic | Serialization overhead (JSON) |
| Easy to debug (curl) | Tight temporal coupling |
| Standard HTTP tools work | Cascading failures if service is slow |

Pattern 2: gRPC (Synchronous, Binary Protocol)

```protobuf
// user_service.proto
syntax = "proto3";

service UserService {
  rpc GetUser(GetUserRequest) returns (User);
  rpc ListUsers(ListUsersRequest) returns (stream User);
}

message GetUserRequest {
  int32 user_id = 1;
}

message User {
  int32 id = 1;
  string email = 2;
  string name = 3;
  bool is_active = 4;
}

message ListUsersRequest {
  int32 limit = 1;
  int32 offset = 2;
}
```

```python
# server.py - gRPC server
from concurrent import futures

import grpc
import user_service_pb2  # generated from user_service.proto by grpcio-tools
import user_service_pb2_grpc

# db and User come from this service's own data layer (not shown)


class UserServicer(user_service_pb2_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        user = db.query(User).get(request.user_id)
        if not user:
            context.set_code(grpc.StatusCode.NOT_FOUND)
            context.set_details(f"User {request.user_id} not found")
            return user_service_pb2.User()
        return user_service_pb2.User(
            id=user.id, email=user.email, name=user.name, is_active=user.is_active
        )


server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
user_service_pb2_grpc.add_UserServiceServicer_to_server(UserServicer(), server)
server.add_insecure_port("[::]:50051")
server.start()
server.wait_for_termination()  # start() does not block; keep the process alive
```

```python
# client.py - gRPC client
from typing import Optional

import grpc
import user_service_pb2
import user_service_pb2_grpc


def get_user(user_id: int) -> Optional[dict]:
    # The with-block closes the channel; long-running services usually
    # create one channel at startup and reuse it for every call
    with grpc.insecure_channel("user-service:50051") as channel:
        stub = user_service_pb2_grpc.UserServiceStub(channel)
        try:
            response = stub.GetUser(
                user_service_pb2.GetUserRequest(user_id=user_id),
                timeout=5.0,
            )
        except grpc.RpcError as e:
            if e.code() == grpc.StatusCode.NOT_FOUND:
                return None
            raise
    return {"id": response.id, "email": response.email, "name": response.name}
```

| Pros | Cons |
|---|---|
| Binary protocol - fast serialization | Requires code generation (protobuf) |
| Typed contracts (proto files) | Harder to debug (not human-readable) |
| Streaming support | Less ecosystem support than REST |
| Auto-generated clients | Additional tooling complexity |

Pattern 3: Message Queues (Asynchronous)

```python
# tasks.py - Celery + Redis for async communication
from celery import Celery

celery_app = Celery("myapp", broker="redis://redis:6379/0")

# send_email, EmailServiceError, create_certificate_pdf, upload_to_s3 and
# notify_user are the app's own helpers (not shown)


@celery_app.task(bind=True, max_retries=3, default_retry_delay=60)
def send_welcome_email(self, user_id: int, email: str):
    """Async task - runs on a worker, not the web process."""
    try:
        send_email(
            to=email,
            subject="Welcome!",
            body="Welcome to our platform!",
        )
    except EmailServiceError as e:
        raise self.retry(exc=e)  # re-queued; gives up after max_retries


@celery_app.task(bind=True, max_retries=3)
def generate_certificate(self, user_id: int, course_id: int):
    """Async task - heavy PDF generation runs on worker."""
    try:
        certificate = create_certificate_pdf(user_id, course_id)
        upload_to_s3(certificate)
        notify_user(user_id, "Your certificate is ready!")
    except Exception as e:
        raise self.retry(exc=e, countdown=120)


# routes.py - in the web process, fire and forget
from fastapi import Depends, HTTPException
from sqlalchemy.orm import Session


@app.post("/enroll")
def enroll(user_id: int, course_id: int, db: Session = Depends(get_db)):
    user = db.get(User, user_id)
    if user is None:
        raise HTTPException(status_code=404, detail="User not found")

    enrollment = Enrollment(user_id=user_id, course_id=course_id)
    db.add(enrollment)
    db.commit()

    # Dispatch async tasks after the commit - web process returns immediately
    send_welcome_email.delay(user_id, user.email)
    generate_certificate.delay(user_id, course_id)

    return {"status": "enrolled"}
```

| Pros | Cons |
|---|---|
| Decoupled in time - producer does not wait | Eventual consistency (not immediate) |
| Natural retry mechanism | Harder to debug (async flows) |
| Load leveling (workers process at their own pace) | Queue monitoring required |
| Fault tolerance (failed tasks can be retried; with `acks_late=True`, a task interrupted by a worker crash is redelivered) | More infrastructure (broker, workers) |
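The time decoupling and load leveling in the table above come from putting a queue between producer and consumer. The sketch below imitates that with Python's standard-library `queue.Queue` and a worker thread standing in for Redis and a Celery worker, an assumption made only so the example runs without a broker. The `enroll` function here is a hypothetical stand-in for the web handler: it enqueues and returns immediately, and the worker drains the backlog at its own pace.

```python
import queue
import threading
import time

jobs: queue.Queue = queue.Queue()
processed: list[int] = []


def worker() -> None:
    # Stand-in for a Celery worker: pull jobs until a None sentinel arrives
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        time.sleep(0.01)  # simulate slow work such as PDF generation
        processed.append(job)
        jobs.task_done()


def enroll(user_id: int) -> dict:
    # Stand-in for the web handler: enqueue the work and return at once
    jobs.put(user_id)
    return {"status": "enrolled"}


thread = threading.Thread(target=worker, daemon=True)
thread.start()

responses = [enroll(user_id) for user_id in range(20)]  # returns before the work is done

jobs.put(None)  # ask the worker to stop once the backlog is empty
jobs.join()     # block until every queued item has been marked done
thread.join()
print(len(processed), processed[:3])  # 20 [0, 1, 2]
```

The 20 responses are produced without waiting for the roughly 0.2 seconds of simulated work; adding more worker threads (or Celery workers) would drain the same queue faster without any change to `enroll`.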

When to Use Each Pattern

| Pattern | Use When |
|---|---|
| REST | Simple request/response, CRUD operations, public APIs |
| gRPC | Internal service-to-service, high throughput, streaming |
| Message Queue | Fire-and-forget, long-running tasks, event-driven workflows |

Part 4 - Bounded Contexts and Service Boundaries

Domain-Driven Design's concept of bounded contexts provides the best guidance for where to draw service boundaries.

A bounded context is a portion of the domain where a particular model applies. The same word ("user") can mean different things in different contexts.

The same physical person is a "User" in auth, a "Customer" in billing, an "Author" in catalog, and a "Student" in enrollment. Each context has its own model with only the attributes it needs.
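In code, each context keeps its own small model and the contexts share nothing but an identifier. The sketch below uses plain dataclasses; where the module tree below lists fields, the sketch reuses them, and the remaining fields (`display_name`, `bio`, `enrolled_course_ids`) are illustrative assumptions rather than a prescribed schema.

```python
from dataclasses import dataclass

# Each bounded context models the same person differently.
# They share only user_id; no context reads another context's fields.


@dataclass
class User:  # auth context
    user_id: int
    email: str
    password_hash: str
    mfa_enabled: bool = False


@dataclass
class Customer:  # billing context
    user_id: int
    stripe_id: str
    plan: str


@dataclass
class Author:  # catalog context
    user_id: int
    display_name: str
    bio: str = ""


@dataclass
class Student:  # enrollment context
    user_id: int
    enrolled_course_ids: list[int]


alice_id = 42
contexts = [
    User(alice_id, "alice@example.com", "<hash>", mfa_enabled=True),
    Customer(alice_id, stripe_id="cus_123", plan="pro"),
    Author(alice_id, display_name="Alice"),
    Student(alice_id, enrolled_course_ids=[7, 9]),
]
print([type(c).__name__ for c in contexts])  # ['User', 'Customer', 'Author', 'Student']
```

Billing never sees `password_hash`, and auth never sees `plan`, so either model can change without touching the other.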

In a Monolith: Modules as Bounded Contexts

```text
myapp/
├── auth/
│   ├── models.py      # User (email, password_hash, mfa_enabled)
│   ├── services.py    # AuthService
│   └── routes.py
├── billing/
│   ├── models.py      # Customer (stripe_id, plan, payment_method)
│   ├── services.py    # BillingService
│   └── routes.py
├── catalog/
│   ├── models.py      # Course, Author
│   ├── services.py    # CatalogService
│   └── routes.py
├── enrollment/
│   ├── models.py      # Enrollment, Progress, Certificate
│   ├── services.py    # EnrollmentService
│   └── routes.py
└── shared/
    └── events.py      # Internal event bus
```

Each module has its own models, services, and routes. Modules communicate through a well-defined internal API (function calls or an in-process event bus), not by reaching into each other's database tables.

```python
# shared/events.py - in-process event bus for the monolith
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """Simple in-process event bus for inter-module communication."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict[str, Any]) -> None:
        # Handlers run synchronously, in subscription order, in the caller's thread
        for handler in self._handlers[event_type]:
            handler(payload)


# Global event bus (or injected via DI)
event_bus = EventBus()


# auth/services.py
def register_user(email: str, password: str) -> None:
    # hash_password is a hypothetical helper wrapping a real password hash
    # (e.g. bcrypt or argon2) - never Python's built-in hash()
    user = User(email=email, password_hash=hash_password(password))
    db.add(user)
    db.commit()
    event_bus.publish("user.registered", {"user_id": user.id, "email": email})


# billing/services.py
event_bus.subscribe("user.registered", lambda payload: create_customer(payload["user_id"]))

# enrollment/services.py
event_bus.subscribe("user.registered", lambda payload: create_student_profile(payload["user_id"]))
```

:::note Extract Services Along Bounded Context Lines
When you eventually need microservices, each bounded context becomes a natural service boundary. The internal event bus becomes an external message queue. The function call interface becomes a REST/gRPC API.
:::

Part 5 - Data Ownership

The hardest part of microservices is data. Each service should own its data.

The Shared Database Anti-Pattern

Problems:

  • Schema changes in one service break others
  • No clear ownership of tables
  • Impossible to scale databases independently
  • Tight coupling through shared state

Database per Service

Each service owns its database schema. If the Order Service needs user information, it calls the User Service API - it does not query the users table directly.

Practical Compromise: Schema-per-Service

In early stages, you can use separate schemas within the same PostgreSQL instance:

```sql
-- Same PostgreSQL server, different schemas
CREATE SCHEMA auth;
CREATE SCHEMA billing;
CREATE SCHEMA catalog;
CREATE SCHEMA enrollment;

-- auth.users is only accessed by the auth module
CREATE TABLE auth.users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL
);

-- billing.customers references auth.users by ID only (no foreign key across schemas)
CREATE TABLE billing.customers (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,  -- no FK to auth.users
    stripe_id VARCHAR(255)
);
```
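
```python
# shared/db.py (hypothetical location for the shared declarative base)
from sqlalchemy.orm import declarative_base

Base = declarative_base()


# auth/models.py
from sqlalchemy import Column, Integer, String

from shared.db import Base


class User(Base):
    __tablename__ = "users"
    __table_args__ = {"schema": "auth"}

    id = Column(Integer, primary_key=True)
    email = Column(String, unique=True, nullable=False)
    password_hash = Column(String, nullable=False)


# billing/models.py
from sqlalchemy import Column, Integer, String

from shared.db import Base


class Customer(Base):
    __tablename__ = "customers"
    __table_args__ = {"schema": "billing"}

    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, nullable=False)  # no ForeignKey to auth.users
    stripe_id = Column(String)
```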

This gives you data isolation at the schema level while keeping operational simplicity (one database server). When a module needs to become a separate service, it takes its schema with it.

Part 6 - The Eight Fallacies of Distributed Computing

L Peter Deutsch and colleagues at Sun Microsystems listed seven assumptions that developers commonly make about distributed systems, and James Gosling later added an eighth. All eight are false.

| Fallacy | Reality | Python Impact |
|---|---|---|
| 1. The network is reliable | Packets drop, connections reset | Add timeouts, retries, circuit breakers |
| 2. Latency is zero | Network calls add 1-100ms | Batch requests, cache aggressively |
| 3. Bandwidth is infinite | Large payloads are slow | Pagination, compression, gRPC |
| 4. The network is secure | Traffic can be intercepted | TLS everywhere, even internal |
| 5. Topology doesn't change | Services move, IPs change | Service discovery, DNS-based routing |
| 6. There is one administrator | Multiple teams, multiple policies | Clear ownership, documented contracts |
| 7. Transport cost is zero | Serialization has CPU/memory cost | Choose efficient formats (protobuf vs JSON) |
| 8. The network is homogeneous | Different OSes, languages, versions | Standardize on protocols, not implementations |
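Fallacy 2 is easy to underestimate because each individual call looks cheap. The sketch below counts round trips instead of measuring real latency; the `FakeUserService` class, its batch method, and the 20 ms per-call cost are assumptions chosen for the arithmetic, not measurements. Fetching 100 users one at a time costs 100 round trips (about 2 seconds at 20 ms each), while a batch endpoint costs one.

```python
from dataclasses import dataclass

ROUND_TRIP_MS = 20  # assumed cost of one network call


@dataclass
class FakeUserService:
    """In-memory stand-in for a remote User Service that counts round trips."""

    round_trips: int = 0

    def get_user(self, user_id: int) -> dict:
        self.round_trips += 1  # one network call per user
        return {"id": user_id}

    def get_users(self, user_ids: list[int]) -> list[dict]:
        # One batched call, e.g. a hypothetical GET /users?ids=1,2,3
        self.round_trips += 1
        return [{"id": user_id} for user_id in user_ids]


ids = list(range(100))

one_by_one = FakeUserService()
users_a = [one_by_one.get_user(i) for i in ids]

batched = FakeUserService()
users_b = batched.get_users(ids)

assert users_a == users_b  # same data, very different cost
print(one_by_one.round_trips * ROUND_TRIP_MS, "ms vs", batched.round_trips * ROUND_TRIP_MS, "ms")
# 2000 ms vs 20 ms
```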

Implementing Resilience in Python

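```python
# Retry with exponential backoff
import httpx
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)

USER_SERVICE_URL = "http://user-service:8000"  # example value; load from config


@retry(
    # Retry transient network failures (timeouts, connection errors);
    # HTTP error responses such as 404 are raised immediately
    retry=retry_if_exception_type(httpx.TransportError),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    reraise=True,  # surface the last exception instead of tenacity.RetryError
)
async def call_user_service(user_id: int) -> dict:
    async with httpx.AsyncClient(timeout=5.0) as client:
        response = await client.get(f"{USER_SERVICE_URL}/users/{user_id}")
        response.raise_for_status()
        return response.json()
```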

```python
# Circuit breaker pattern
from enum import Enum
from time import monotonic


class CircuitState(Enum):
    CLOSED = "closed"        # normal operation
    OPEN = "open"            # failing, reject calls
    HALF_OPEN = "half_open"  # testing if service recovered


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 30.0,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._recovery_timeout = recovery_timeout
        self._failure_count = 0
        self._opened_at = 0.0
        self._state = CircuitState.CLOSED
        self._trial_in_flight = False

    @property
    def state(self) -> CircuitState:
        # After the recovery timeout, an OPEN circuit moves to HALF_OPEN
        if self._state == CircuitState.OPEN:
            if monotonic() - self._opened_at >= self._recovery_timeout:
                self._state = CircuitState.HALF_OPEN
                self._trial_in_flight = False
        return self._state

    def record_success(self) -> None:
        self._failure_count = 0
        self._state = CircuitState.CLOSED
        self._trial_in_flight = False

    def record_failure(self) -> None:
        self._failure_count += 1
        # A failed trial in HALF_OPEN re-opens the circuit immediately
        if (
            self._state == CircuitState.HALF_OPEN
            or self._failure_count >= self._failure_threshold
        ):
            self._state = CircuitState.OPEN
            self._opened_at = monotonic()
            self._trial_in_flight = False

    def can_execute(self) -> bool:
        state = self.state
        if state == CircuitState.CLOSED:
            return True
        if state == CircuitState.HALF_OPEN and not self._trial_in_flight:
            self._trial_in_flight = True  # allow exactly one trial request
            return True
        return False  # OPEN, or HALF_OPEN with a trial already running


# Usage
user_service_breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=30.0)


async def get_user_safe(user_id: int) -> dict | None:
    if not user_service_breaker.can_execute():
        return None  # fallback: return cached or default

    try:
        result = await call_user_service(user_id)
    except Exception:
        user_service_breaker.record_failure()
        return None
    user_service_breaker.record_success()
    return result
```

:::danger Every Network Call Can Fail
In a monolith, a function call either succeeds or raises an exception. In microservices, a call can also time out, return a malformed response, or succeed on the server while the response is lost on the way back, so the client never learns that the operation happened. Plan for all of these.
:::
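The `get_user_safe` example returns `None` where its comment says "return cached or default". One way to fill that in is a small time-bounded cache of the last good response, so an outage degrades to slightly stale data instead of no data. The sketch below is a stdlib-only assumption, not part of the lesson's code: `StaleCache` and `get_user_with_fallback` are hypothetical names, `fetch` stands for any synchronous remote call, and the circuit breaker is left out to keep it short.

```python
from __future__ import annotations

import time
from typing import Callable


class StaleCache:
    """Keeps the last good response per key for up to max_age seconds."""

    def __init__(self, max_age: float = 300.0) -> None:
        self._max_age = max_age
        self._entries: dict[int, tuple[float, dict]] = {}

    def put(self, key: int, value: dict) -> None:
        self._entries[key] = (time.monotonic(), value)

    def get(self, key: int) -> dict | None:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self._max_age:
            del self._entries[key]  # too stale to serve
            return None
        return value


def get_user_with_fallback(
    user_id: int, fetch: Callable[[int], dict], cache: StaleCache
) -> dict | None:
    try:
        user = fetch(user_id)
    except Exception:
        # Remote call failed: serve the last known good copy, if any
        return cache.get(user_id)
    cache.put(user_id, user)  # remember the latest good response
    return user


def healthy(user_id: int) -> dict:
    return {"id": user_id, "name": "Alice"}


def down(user_id: int) -> dict:
    raise ConnectionError("user service unavailable")


cache = StaleCache(max_age=60)
print(get_user_with_fallback(1, healthy, cache))  # {'id': 1, 'name': 'Alice'}
print(get_user_with_fallback(1, down, cache))     # {'id': 1, 'name': 'Alice'} (cached)
print(get_user_with_fallback(2, down, cache))     # None (nothing cached for user 2)
```

Whether stale data is acceptable is a product decision: a user's display name can be minutes old, an account balance usually cannot.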

Part 7 - The Modular Monolith: The Middle Ground

The modular monolith gives you the organizational benefits of microservices (clear boundaries, team ownership) without the operational complexity (network calls, distributed transactions).

Rules for the Modular Monolith

  1. Each module has a public API - other modules can only call functions in the public interface, never import internal classes.
  2. No cross-module database queries - modules own their tables and expose data through their public API.
  3. Communication through events for side effects - when a user registers, the auth module publishes an event; other modules react.
  4. Each module can be tested independently - mock the event bus and other module APIs.
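Rule 4 is what makes the boundaries pay off in tests. A minimal sketch, assuming a variant of the auth module's `register_user` that receives its repository and event bus as parameters (a hypothetical signature; the lesson's version uses module-level globals): the test replaces both with `unittest.mock.Mock` objects and checks only the event the module promises to publish, so no database or billing code is needed.

```python
from unittest.mock import Mock


def register_user(email: str, password_hash: str, user_repo, event_bus) -> int:
    """Hypothetical auth-module function with injected dependencies."""
    user_id = user_repo.add(email=email, password_hash=password_hash)
    event_bus.publish("user.registered", {"user_id": user_id, "email": email})
    return user_id


def test_register_user_publishes_event() -> None:
    user_repo = Mock()
    user_repo.add.return_value = 7  # pretend the database assigned id 7
    event_bus = Mock()

    user_id = register_user("alice@example.com", "<hash>", user_repo, event_bus)

    assert user_id == 7
    event_bus.publish.assert_called_once_with(
        "user.registered", {"user_id": 7, "email": "alice@example.com"}
    )
```

Run it with pytest as usual; the test passes or fails based only on the auth module's own behavior and its published event contract.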

Implementation

```python
# auth/public.py - the ONLY interface other modules can use
from typing import Optional
from uuid import UUID

from auth.services import AuthService  # auth may import its own internals


class AuthModuleAPI:
    """Public API for the auth module. Other modules call this, not internal services."""

    def __init__(self, auth_service: AuthService) -> None:
        self._service = auth_service

    def get_user_email(self, user_id: UUID) -> Optional[str]:
        user = self._service.get_user(user_id)
        return user.email if user else None

    def verify_user_exists(self, user_id: UUID) -> bool:
        return self._service.get_user(user_id) is not None
```

```python
# order/services.py - uses auth module's public API, not its internals
from uuid import UUID

from auth.public import AuthModuleAPI
from shared.events import EventBus

# Order and OrderRepository are the order module's own internals (not shown)


class OrderService:
    def __init__(
        self,
        order_repo: OrderRepository,
        auth_api: AuthModuleAPI,  # dependency on public API only
        event_bus: EventBus,
    ) -> None:
        self._order_repo = order_repo
        self._auth_api = auth_api
        self._event_bus = event_bus

    def create_order(self, user_id: UUID, items: list[dict]) -> Order:
        # Call auth module's public API - NOT a database query
        if not self._auth_api.verify_user_exists(user_id):
            raise ValueError("User not found")

        order = Order(user_id=user_id, items=items)
        self._order_repo.save(order)
        self._event_bus.publish("order.created", {
            "order_id": str(order.id),
            "user_id": str(user_id),
        })
        return order
```

Enforcing Module Boundaries

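```python
# tools/check_boundaries.py - a linting script for import violations
import ast
import sys
from pathlib import Path

# Define allowed cross-module imports per module
ALLOWED_IMPORTS = {
    "order": {"auth.public", "catalog.public", "shared"},
    "billing": {"auth.public", "shared"},
    "catalog": {"shared"},
    "auth": {"shared"},
}


def imported_modules(node: ast.AST) -> list[str]:
    """Return the absolute dotted names an import statement refers to."""
    if isinstance(node, ast.Import):
        return [alias.name for alias in node.names]
    if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
        # "from auth import models" reaches auth.models
        return [f"{node.module}.{alias.name}" for alias in node.names]
    return []  # relative imports stay inside the current module


def is_allowed(module: str, allowed: set[str]) -> bool:
    return any(module == a or module.startswith(f"{a}.") for a in allowed)


def check_module(module_name: str, module_path: Path) -> list[str]:
    violations = []
    allowed = ALLOWED_IMPORTS.get(module_name, set())

    for py_file in module_path.rglob("*.py"):
        tree = ast.parse(py_file.read_text(), filename=str(py_file))
        for node in ast.walk(tree):
            for module in imported_modules(node):
                top_level = module.split(".")[0]
                if top_level == module_name or top_level not in ALLOWED_IMPORTS:
                    continue  # own module, shared, stdlib, or third-party
                if not is_allowed(module, allowed):
                    violations.append(
                        f"{py_file}:{node.lineno}: "
                        f"'{module_name}' imports '{module}' "
                        f"(only {sorted(allowed)} allowed)"
                    )
    return violations


def main(root: Path = Path(".")) -> int:
    violations = []
    for module_name in ALLOWED_IMPORTS:
        violations.extend(check_module(module_name, root / module_name))
    for violation in violations:
        print(violation)
    return 1 if violations else 0


if __name__ == "__main__":
    # Run from the project root as: python tools/check_boundaries.py
    # A non-zero exit code fails the CI/CD pipeline step
    sys.exit(main())
```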

When to Graduate to Microservices

The modular monolith is designed for easy extraction. Because modules already communicate through a public API and events, extracting a module into a service requires three steps:

  1. Replace the in-process public API with an HTTP/gRPC client
  2. Replace the in-process event bus with a message queue
  3. Give the extracted module its own database

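The business logic inside the module should not need to change; the work is in the adapters at its edges (clients, event publishing, and the error handling that network calls require).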
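Step 1 works because callers depend on an interface rather than a transport. A minimal sketch, assuming a `typing.Protocol` named `AuthAPI` (not part of the lesson's code) that both an in-process adapter like `AuthModuleAPI` and a later network client satisfy. To keep it runnable without a network, `RemoteAuthAPI` takes an injected `get_json` function instead of a real HTTP client; in production that function would wrap `httpx` or similar, and the `/users/{id}` path is hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional, Protocol


class AuthAPI(Protocol):
    def verify_user_exists(self, user_id: int) -> bool: ...


class InProcessAuthAPI:
    """Modular monolith: a plain function call into the auth module."""

    def __init__(self, users: dict[int, str]) -> None:
        self._users = users  # stands in for AuthService and its tables

    def verify_user_exists(self, user_id: int) -> bool:
        return user_id in self._users


class RemoteAuthAPI:
    """After extraction: the same interface backed by a network call."""

    def __init__(self, get_json: Callable[[str], Optional[dict]]) -> None:
        self._get_json = get_json  # e.g. a thin wrapper around an HTTP client

    def verify_user_exists(self, user_id: int) -> bool:
        return self._get_json(f"/users/{user_id}") is not None


def create_order(user_id: int, auth_api: AuthAPI) -> str:
    # Business logic is identical whichever adapter is injected
    if not auth_api.verify_user_exists(user_id):
        raise ValueError("User not found")
    return f"order for user {user_id}"


local = InProcessAuthAPI({1: "alice@example.com"})
remote = RemoteAuthAPI(lambda path: {"id": 1} if path == "/users/1" else None)

print(create_order(1, local))   # order for user 1
print(create_order(1, remote))  # order for user 1
```

Only the composition root (where adapters are constructed and injected) changes when a module is extracted; `create_order` does not.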

Part 8 - Shared Libraries and Contracts

When you do have multiple services, you need shared contracts to keep them in sync.

Shared Pydantic Models

```python
# shared_contracts/myapp_contracts/user.py (published as the myapp-contracts pip package)
from uuid import UUID

from pydantic import BaseModel


class UserResponse(BaseModel):
    id: UUID
    email: str
    name: str
    is_active: bool


class UserCreatedEvent(BaseModel):
    user_id: UUID
    email: str
    timestamp: str
```

```toml
# shared_contracts/pyproject.toml
[project]
name = "myapp-contracts"
version = "1.0.0"
dependencies = ["pydantic>=2.0"]
```
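
```python
# In the User Service
from uuid import UUID

from myapp_contracts.user import UserResponse


@app.get("/users/{user_id}", response_model=UserResponse)
def get_user(user_id: UUID):
    ...


# In the Order Service
from uuid import UUID

import httpx
from myapp_contracts.user import UserResponse


async def get_user_from_service(user_id: UUID) -> UserResponse:
    async with httpx.AsyncClient(timeout=5.0) as client:
        response = await client.get(f"{USER_SERVICE_URL}/users/{user_id}")
        response.raise_for_status()
    # Raises a ValidationError if the response has drifted from the contract
    return UserResponse.model_validate(response.json())
```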

Contract Testing

```python
# tests/test_contracts.py
from myapp_contracts.user import UserResponse


def test_user_response_contract():
    """Verify that a recorded User Service response still matches the shared contract."""
    # A sample of the shape the User Service actually returns
    raw_response = {
        "id": "550e8400-e29b-41d4-a716-446655440000",
        "email": "alice@example.com",
        "name": "Alice",
        "is_active": True,
    }

    # If this fails, the contract is out of date
    user = UserResponse.model_validate(raw_response)
    assert user.email == "alice@example.com"
```

:::tip Version Your Contracts
When a service changes its API, bump the shared contract version. Consumer services can pin to the version they support and upgrade at their own pace, which removes the need for synchronized deploys.
:::
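The tip leaves open which changes need a new major version. A minimal sketch, assuming semantic versioning (the lesson does not prescribe a scheme) and describing a contract as a mapping of field names to type names: removing or retyping a field can break existing consumers and calls for a major bump, while adding a field only calls for a minor bump, because Pydantic models ignore unknown fields by default. `required_bump` and the `avatar_url` field are hypothetical.

```python
def required_bump(old: dict[str, str], new: dict[str, str]) -> str:
    """Return 'major', 'minor', or 'none' for a change between two contract versions."""
    removed = old.keys() - new.keys()
    retyped = {name for name in old.keys() & new.keys() if old[name] != new[name]}
    if removed or retyped:
        return "major"  # existing consumers may break
    if new.keys() - old.keys():
        return "minor"  # additive: older consumers ignore the new field
    return "none"


v1 = {"id": "UUID", "email": "str", "name": "str", "is_active": "bool"}

print(required_bump(v1, {**v1, "avatar_url": "str"}))                   # minor
print(required_bump(v1, {k: t for k, t in v1.items() if k != "name"}))  # major
print(required_bump(v1, {**v1, "id": "int"}))                           # major
```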

Part 9 - Decision Framework

| Architecture | Best For | Operational Cost |
|---|---|---|
| Monolith | Startups, small teams, MVPs, well-understood domains | Low |
| Modular Monolith | Growing teams, complex domains, preparing for potential split | Medium |
| Microservices | Large teams, independent scaling needs, polyglot requirements | High |

Key Takeaways

  • Start with a monolith: you do not know your service boundaries on day one. A monolith lets you discover them through real usage and evolving requirements.
  • The modular monolith is often the right answer: it gives you clean boundaries, team ownership, and testability without the operational overhead of distributed systems.
  • Split only when you have real signals: independent scaling, different deployment cadences, or team size exceeding what a single codebase can support.
  • Bounded contexts define service boundaries: do not split by technical layer (API service, database service). Split by business domain (auth, billing, catalog).
  • Each service owns its data: shared databases create invisible coupling. Use separate schemas as a stepping stone, separate databases as the goal.
  • Every network call can fail: microservices require retries, timeouts, circuit breakers, and fallback strategies that monoliths do not need.
  • Message queues (Celery + Redis) decouple services in time: use async communication for side effects (emails, notifications, report generation) even in monoliths.
  • Shared contracts keep services in sync: publish Pydantic models as a shared package and write contract tests to catch breaking changes.

Graded Practice Challenges

Level 1 - Identify the Issue

Question 1: A team has 3 microservices, each with its own database. The Order Service needs the user's email to send a confirmation. It queries the User Service's database directly. What is wrong?

Answer

This violates data ownership. The Order Service is reaching directly into the User Service's database, creating invisible coupling. If the User Service changes its schema (renames a column, changes the table structure), the Order Service breaks without any contract being violated. The correct approach is for the Order Service to call the User Service's API to get the email, or to receive the email as part of an event when the order is placed.

Question 2: A startup with 3 engineers decides to build with microservices "to be ready for scale." Is this a good decision?

Answer

No. With 3 engineers, the operational overhead of microservices (multiple repos, CI pipelines, service discovery, distributed tracing, network error handling) will consume a disproportionate amount of engineering time. A well-structured monolith or modular monolith would let the team ship features faster. They can extract services later when concrete scaling or organizational signals emerge.

Question 3: The billing module in a modular monolith imports from auth.models import User to check user roles. Is this acceptable?

Answer

No. This violates module boundaries. The billing module should only access the auth module through its public API (auth.public.AuthModuleAPI), not by importing internal models. If billing needs to know user roles, the auth module's public API should expose a method like get_user_role(user_id). Direct model imports create tight coupling that would prevent extracting either module into a separate service.

Level 2 - Refactoring Challenge

You have a monolith where the order processing code directly accesses user data and sends emails:

```python
import smtplib


def create_order(user_id: int, items: list, db: Session):
    user = db.query(UserModel).get(user_id)  # cross-module DB access
    order = OrderModel(user_id=user_id, items=items, total=calculate_total(items))
    db.add(order)
    db.commit()

    # Direct email sending in the request path
    smtp = smtplib.SMTP("smtp.company.com")
    smtp.sendmail("noreply@company.com", user.email, f"Order {order.id} confirmed")
    smtp.quit()
```

Refactor into a modular monolith with: (a) an auth module with a public API for getting user emails, (b) an order module that uses the auth public API, (c) an event bus that triggers email sending asynchronously, (d) clear module boundary rules.

Level 3 - Design Challenge

Design the architecture for the EngineersOfAI platform as it scales from a startup (3 engineers) to a growing company (20 engineers):

  • Phase 1 (3 engineers): Monolith with courses, user auth, and content delivery.
  • Phase 2 (8 engineers): Modular monolith with modules for auth, courses, certificates, and payments.
  • Phase 3 (20 engineers): Extract the certificate generation service (CPU-intensive PDF generation) and the payment service (PCI compliance requires isolation).

For each phase, produce: (a) the architecture diagram, (b) the communication patterns, (c) the data ownership model, (d) the deployment strategy. Explain what triggers the transition from each phase to the next.

What's Next

Congratulations on completing Module 5 - Architecture & Systems Design. You now have the knowledge to make informed architectural decisions, from code-level patterns (Clean Architecture, Hexagonal, DI) to system-level choices (monolith vs microservices, communication patterns, configuration management).

The next module explores Security Engineering, where you will learn to build Python applications that are secure by design - covering authentication, authorization, input validation, cryptography, and common vulnerability patterns.

© 2026 EngineersOfAI. All rights reserved.