Navigate the monolith-to-microservices spectrum with Python - bounded contexts, communication patterns, the modular monolith, and practical decision frameworks.

Microservices vs Monolith - Making the Right Choice

Two teams build the same product: an e-commerce platform with users, catalog, orders, and payments. Team A starts with microservices on day one. Team B starts with a monolith.

Six months later:

Team A (Microservices from Day 1):
- 4 services, 4 repos, 4 CI pipelines, 4 Docker images
- Spent 2 months on service discovery, API gateway, distributed tracing
- Network timeouts between services cause intermittent order failures
- Schema changes require coordinated deploys across 3 services
- 2 of 5 engineers spend most of their time on infrastructure
- 47 Kubernetes manifests

Team B (Monolith):
- 1 repo, 1 CI pipeline, 1 Docker image
- Shipped 3 major features in the time Team A spent on infrastructure
- Database transactions guarantee order consistency
- Refactoring is a simple find-and-replace across the codebase
- All 5 engineers work on product features
- 1 Dockerfile, 1 docker-compose.yml

Team A made a common mistake: choosing microservices before they understood their domain. This lesson will teach you when each architecture is appropriate and how to evolve between them.

What You Will Learn

- The monolith-first approach and why Martin Fowler recommends it
- When splitting into services is genuinely necessary
- Communication patterns: REST, gRPC, and message queues (Celery + Redis)
- Shared libraries and service contracts
- Bounded contexts from Domain-Driven Design as service boundaries
- Data ownership and the problems of shared databases
- The eight fallacies of distributed computing
- The modular monolith as a pragmatic middle ground

Prerequisites

- Understanding of Clean Architecture and Hexagonal Architecture (Lessons 1-2)
- Experience with FastAPI and SQLAlchemy
- Familiarity with the 12-Factor App methodology (previous lesson)
- Basic understanding of Docker and container orchestration

Part 1 - The Monolith-First Approach

Martin Fowler's advice: "Almost all the successful microservice stories have started with a monolith that got too big and was broken up."

Why Monoliths Are the Right Starting Point

Advantage	Why It Matters Early On
Simple deployment	One artifact, one deploy pipeline
Easy refactoring	Rename a function, IDE handles it across all modules
ACID transactions	Orders and payments in a single database transaction
Low latency	Function calls, not network calls
Small team fit	1-5 engineers can maintain one codebase efficiently
Domain discovery	You do not know your service boundaries yet

The Premature Decomposition Trap

# Team A's order creation - microservices
# Order Service calls User Service, then Catalog Service, then Payment Service
async def create_order(user_id: int, items: list[dict]) -> Order:
    # Network call #1: verify user exists
    async with httpx.AsyncClient() as client:
        user_resp = await client.get(f"{USER_SERVICE_URL}/users/{user_id}")
        if user_resp.status_code != 200:
            raise ValueError("User not found")

    # Network call #2: verify item availability and prices
    async with httpx.AsyncClient() as client:
        catalog_resp = await client.post(
            f"{CATALOG_SERVICE_URL}/items/verify",
            json={"items": items},
        )
        if catalog_resp.status_code != 200:
            raise ValueError("Items unavailable")

    # Network call #3: process payment
    async with httpx.AsyncClient() as client:
        payment_resp = await client.post(
            f"{PAYMENT_SERVICE_URL}/charge",
            json={"user_id": user_id, "amount": catalog_resp.json()["total"]},
        )
        if payment_resp.status_code != 200:
            # What if payment fails after catalog reserved items?
            # Need a saga or compensation transaction...
            raise ValueError("Payment failed")

    # Now save the order - but what if THIS fails?
    order = Order(user_id=user_id, items=items, status="confirmed")
    await order_repo.save(order)
    return order

# Team B's order creation - monolith
def create_order(user_id: int, items: list[dict], db: Session) -> Order:
    # All in one transaction - either everything succeeds or nothing does
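    user = db.get(User, user_id)  # Session.get replaces the legacy Query.get
    if not user:
        raise ValueError("User not found")

    # Assumption: each item dict has the shape {"id": ..., "quantity": ...}
    quantities = {i["id"]: i["quantity"] for i in items}
    catalog_items = db.query(CatalogItem).filter(
        CatalogItem.id.in_(list(quantities))
    ).all()
    if len(catalog_items) != len(quantities):
        raise ValueError("Items unavailable")

    # Pair each price with its quantity by ID; query results carry no ordering guarantee
    total = sum(item.price * quantities[item.id] for item in catalog_items)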

    payment = Payment(user_id=user_id, amount=total, status="charged")
    order = Order(user_id=user_id, items=items, payment=payment, status="confirmed")

    db.add(payment)
    db.add(order)
    db.commit()  # atomic - all or nothing
    return order

The monolith version is simpler, faster, and correct by default (database transactions). The microservices version requires distributed transactions or sagas to achieve the same consistency.
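To make the saga idea concrete, here is a minimal in-process sketch, assuming each step pairs an action with a compensating action that undoes it. The names `SagaStep` and `run_saga` and the four step functions are hypothetical stand-ins for real service calls, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]      # performs the step (e.g. reserve stock)
    compensate: Callable[[], None]  # undoes the step if a later one fails


def run_saga(steps: list[SagaStep]) -> bool:
    """Run steps in order; on failure, undo the completed steps in reverse."""
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


# Hypothetical stand-ins for calls to the catalog and payment services
log: list[str] = []


def reserve_items() -> None:
    log.append("reserved")


def release_items() -> None:
    log.append("released")


def charge_payment() -> None:
    raise RuntimeError("card declined")


def refund_payment() -> None:
    log.append("refunded")


ok = run_saga([
    SagaStep("reserve", reserve_items, release_items),
    SagaStep("charge", charge_payment, refund_payment),
])
```

Here the charge fails, so the reservation is released and no refund is issued. A real saga also has to survive a crash between steps and a compensation that itself fails, which is where most of the extra complexity comes from.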

:::tip The "MonolithFirst" Strategy
Build a well-structured monolith first. When (and if) a module needs to be extracted, the clean internal boundaries make extraction straightforward. You cannot easily merge two bad microservices, but you can always split a well-structured monolith.
:::

Part 2 - When to Split: Real Signals

Splitting makes sense when you have genuine, concrete problems that a monolith cannot solve.

Legitimate Reasons to Split

Signal	What It Looks Like	Microservice Solution
Independent scaling	The catalog search handles 100x the traffic of order processing	Scale catalog independently
Different tech requirements	ML model serving needs GPU, web API does not	Separate services on different hardware
Team autonomy	30+ engineers stepping on each other's code	Teams own services with clear contracts
Different deployment cadence	Payments team deploys monthly (compliance), catalog team deploys daily	Independent deploy pipelines
Fault isolation	A bug in recommendation crashes the entire checkout flow	Recommendation failure does not affect orders

Invalid Reasons to Split

Signal	Why It Is Not a Reason
"Microservices are modern"	Architecture should solve problems, not follow trends
"Our monolith is messy"	A messy monolith becomes messy microservices - fix the structure first
"We might need to scale someday"	A well-built monolith handles millions of requests
"Different modules = different services"	Modules within a monolith already provide separation
"It looks good on my resume"	Please do not inflict unnecessary complexity on your team

Part 3 - Communication Patterns

When services do need to talk to each other, you have three primary options.

Pattern 1: REST (Synchronous HTTP)

# service_a/client.py - calling another service via REST
import httpx
from typing import Optional


class UserServiceClient:
    """REST client for the User Service."""

    def __init__(self, base_url: str, timeout: float = 5.0) -> None:
        self._base_url = base_url
        self._timeout = timeout

    async def get_user(self, user_id: int) -> Optional[dict]:
        async with httpx.AsyncClient(timeout=self._timeout) as client:
            try:
                response = await client.get(f"{self._base_url}/users/{user_id}")
                response.raise_for_status()
                return response.json()
            except httpx.TimeoutException:
                raise ServiceUnavailableError("User service timed out")
            except httpx.HTTPStatusError as e:
                if e.response.status_code == 404:
                    return None
                raise


class ServiceUnavailableError(Exception):
    pass

Pros	Cons
Simple, well-understood	Synchronous - caller waits for response
Language-agnostic	Serialization overhead (JSON)
Easy to debug (curl)	Tight temporal coupling
Standard HTTP tools work	Cascading failures if service is slow

Pattern 2: gRPC (Synchronous, Binary Protocol)

// user_service.proto
syntax = "proto3";

service UserService {
    rpc GetUser(GetUserRequest) returns (User);
    rpc ListUsers(ListUsersRequest) returns (stream User);
}

message GetUserRequest {
    int32 user_id = 1;
}

message User {
    int32 id = 1;
    string email = 2;
    string name = 3;
    bool is_active = 4;
}

message ListUsersRequest {
    int32 limit = 1;
    int32 offset = 2;
}

# server.py - gRPC server
import grpc
from concurrent import futures
import user_service_pb2
import user_service_pb2_grpc


class UserServicer(user_service_pb2_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        user = db.query(User).get(request.user_id)
        if not user:
            context.set_code(grpc.StatusCode.NOT_FOUND)
            context.set_details(f"User {request.user_id} not found")
            return user_service_pb2.User()
        return user_service_pb2.User(
            id=user.id, email=user.email, name=user.name, is_active=user.is_active
        )


server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
user_service_pb2_grpc.add_UserServiceServicer_to_server(UserServicer(), server)
server.add_insecure_port("[::]:50051")
server.start()
server.wait_for_termination()  # block here; start() alone returns immediately

# client.py - gRPC client
import grpc
import user_service_pb2
import user_service_pb2_grpc


def get_user(user_id: int) -> dict | None:
    # The context manager closes the channel when the call is done
    with grpc.insecure_channel("user-service:50051") as channel:
        stub = user_service_pb2_grpc.UserServiceStub(channel)
        try:
            response = stub.GetUser(
                user_service_pb2.GetUserRequest(user_id=user_id),
                timeout=5.0,
            )
            return {"id": response.id, "email": response.email, "name": response.name}
        except grpc.RpcError as e:
            if e.code() == grpc.StatusCode.NOT_FOUND:
                return None  # mirror the REST client: a missing user is not an error
            raise

Pros	Cons
Binary protocol - fast serialization	Requires code generation (protobuf)
Typed contracts (proto files)	Harder to debug (not human-readable)
Streaming support	Less ecosystem support than REST
Auto-generated clients	Additional tooling complexity

Pattern 3: Message Queues (Asynchronous)

# tasks.py - Celery + Redis for async communication
from celery import Celery

celery_app = Celery("myapp", broker="redis://redis:6379/0")


@celery_app.task(bind=True, max_retries=3, default_retry_delay=60)
def send_welcome_email(self, user_id: int, email: str):
    """Async task - runs on a worker, not the web process."""
    try:
        send_email(
            to=email,
            subject="Welcome!",
            body="Welcome to our platform!",
        )
    except EmailServiceError as e:
        raise self.retry(exc=e)  # retry() raises Retry; re-raising makes that explicit


@celery_app.task(bind=True, max_retries=3)
def generate_certificate(self, user_id: int, course_id: int):
    """Async task - heavy PDF generation runs on worker."""
    try:
        certificate = create_certificate_pdf(user_id, course_id)
        upload_to_s3(certificate)
        notify_user(user_id, "Your certificate is ready!")
    except Exception as e:
        raise self.retry(exc=e, countdown=120)


# In the web process - fire and forget
@app.post("/enroll")
def enroll(user_id: int, course_id: int, db: Session = Depends(get_db)):
    user = db.get(User, user_id)  # needed below for the email address
    if user is None:
        raise HTTPException(status_code=404, detail="User not found")

    enrollment = Enrollment(user_id=user_id, course_id=course_id)
    db.add(enrollment)
    db.commit()

    # Dispatch async tasks - web process returns immediately
    send_welcome_email.delay(user_id, user.email)
    generate_certificate.delay(user_id, course_id)

    return {"status": "enrolled"}

Pros	Cons
Decoupled in time - producer does not wait	Eventual consistency (not immediate)
Natural retry mechanism	Harder to debug (async flows)
Load leveling (workers process at their own pace)	Queue monitoring required
Fault tolerance (if worker dies, task retries)	More infrastructure (broker, workers)

When to Use Each Pattern

Pattern	Use When
REST	Simple request/response, CRUD operations, public APIs
gRPC	Internal service-to-service, high throughput, streaming
Message Queue	Fire-and-forget, long-running tasks, event-driven workflows

Part 4 - Bounded Contexts and Service Boundaries

Domain-Driven Design's concept of bounded contexts provides the best guidance for where to draw service boundaries.

A bounded context is a portion of the domain where a particular model applies. The same word ("user") can mean different things in different contexts.

The same physical person is a "User" in auth, a "Customer" in billing, an "Author" in catalog, and a "Student" in enrollment. Each context has its own model with only the attributes it needs.
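As a small illustration, each context can define its own model that shares nothing with the others except an identifier. The class names and the `Student` fields below are assumptions made for the sketch; the auth and billing attributes follow the module layout in this lesson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthUser:  # auth context: credentials and login state
    user_id: int
    email: str
    mfa_enabled: bool


@dataclass(frozen=True)
class BillingCustomer:  # billing context: payment details only
    user_id: int
    stripe_id: str
    plan: str


@dataclass(frozen=True)
class Student:  # enrollment context: learning progress only
    user_id: int
    enrolled_course_ids: tuple[int, ...]


# One person, three views; only the identifier is shared
person_id = 42
auth_view = AuthUser(person_id, "alice@example.com", mfa_enabled=True)
billing_view = BillingCustomer(person_id, "cus_hypothetical", plan="pro")
student_view = Student(person_id, enrolled_course_ids=(101, 102))
```

Because `BillingCustomer` has no `email` field, billing code cannot come to depend on how auth stores email addresses.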

In a Monolith: Modules as Bounded Contexts

myapp/
├── auth/
│   ├── models.py       # User (email, password_hash, mfa_enabled)
│   ├── services.py     # AuthService
│   └── routes.py
├── billing/
│   ├── models.py       # Customer (stripe_id, plan, payment_method)
│   ├── services.py     # BillingService
│   └── routes.py
├── catalog/
│   ├── models.py       # Course, Author
│   ├── services.py     # CatalogService
│   └── routes.py
├── enrollment/
│   ├── models.py       # Enrollment, Progress, Certificate
│   ├── services.py     # EnrollmentService
│   └── routes.py
└── shared/
    └── events.py       # Internal event bus

Each module has its own models, services, and routes. Modules communicate through a well-defined internal API (function calls or an in-process event bus), not by reaching into each other's database tables.

# shared/events.py - in-process event bus for the monolith
from typing import Callable, Any
from collections import defaultdict


class EventBus:
    """Simple in-process event bus for inter-module communication."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict[str, Any]) -> None:
        for handler in self._handlers[event_type]:
            handler(payload)


# Global event bus (or injected via DI)
event_bus = EventBus()


# auth/services.py
def register_user(email: str, password: str):
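    # hash_password is a hypothetical helper wrapping a real password hash
    # (e.g. bcrypt or argon2); the built-in hash() is not suitable for passwords
    user = User(email=email, password_hash=hash_password(password))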
    db.add(user)
    db.commit()
    event_bus.publish("user.registered", {"user_id": user.id, "email": email})


# billing/services.py
event_bus.subscribe("user.registered", lambda payload: create_customer(payload["user_id"]))

# enrollment/services.py
event_bus.subscribe("user.registered", lambda payload: create_student_profile(payload["user_id"]))
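One caveat with the bus above: `publish` calls handlers synchronously, so an exception in any subscriber propagates back into `register_user`. A possible variant, sketched here under the assumption that side-effect handlers should not break the publisher, logs the failure and carries on. `SafeEventBus` is a hypothetical name.

```python
import logging
from collections import defaultdict
from typing import Any, Callable

logger = logging.getLogger("events")
Handler = Callable[[dict[str, Any]], None]


class SafeEventBus:
    """Variant of the in-process bus that isolates handler failures."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict[str, Any]) -> int:
        """Call every handler and return how many of them failed."""
        failures = 0
        for handler in self._handlers[event_type]:
            try:
                handler(payload)
            except Exception:
                failures += 1
                logger.exception("handler failed for %s", event_type)
        return failures


bus = SafeEventBus()
created: list[int] = []


def broken_handler(payload: dict[str, Any]) -> None:
    raise RuntimeError("billing is down")


bus.subscribe("user.registered", broken_handler)
bus.subscribe("user.registered", lambda p: created.append(p["user_id"]))
failed = bus.publish("user.registered", {"user_id": 7})
```

The second handler still runs even though the first one raised. Whether to swallow, retry, or propagate handler errors is a design decision; a message queue makes the same decision explicit through its retry settings.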

:::note Extract Services Along Bounded Context Lines
When you eventually need microservices, each bounded context becomes a natural service boundary. The internal event bus becomes an external message queue. The function call interface becomes a REST/gRPC API.
:::

Part 5 - Data Ownership

The hardest part of microservices is data. Each service should own its data.

The Shared Database Anti-Pattern

Problems:

- Schema changes in one service break others
- No clear ownership of tables
- Impossible to scale databases independently
- Tight coupling through shared state

Database per Service

Each service owns its database schema. If the Order Service needs user information, it calls the User Service API - it does not query the users table directly.
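An alternative to calling the User Service on every request is for the Order Service to keep its own copy of the few user fields it needs, updated from events. The sketch below assumes a `user.registered` payload with `user_id` and `email`, as published earlier in this lesson; the `user.email_changed` event and the class name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OrderServiceUserCache:
    """The Order Service's own copy of the user data it needs."""

    emails: dict[int, str] = field(default_factory=dict)

    def on_user_registered(self, event: dict) -> None:
        self.emails[event["user_id"]] = event["email"]

    def on_user_email_changed(self, event: dict) -> None:
        self.emails[event["user_id"]] = event["email"]

    def email_for(self, user_id: int) -> Optional[str]:
        # None means the event has not arrived yet (or the user does not exist)
        return self.emails.get(user_id)


cache = OrderServiceUserCache()
cache.on_user_registered({"user_id": 1, "email": "alice@example.com"})
cache.on_user_email_changed({"user_id": 1, "email": "alice@new.example.com"})
```

The copy can lag behind the User Service, so this suits data where brief staleness is acceptable, such as an address for a confirmation email.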

Practical Compromise: Schema-per-Service

In early stages, you can use separate schemas within the same PostgreSQL instance:

-- Same PostgreSQL server, different schemas
CREATE SCHEMA auth;
CREATE SCHEMA billing;
CREATE SCHEMA catalog;
CREATE SCHEMA enrollment;

-- auth.users is only accessed by the auth module
CREATE TABLE auth.users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL
);

-- billing.customers references auth.users by ID only (no foreign key across schemas)
CREATE TABLE billing.customers (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,  -- no FK to auth.users
    stripe_id VARCHAR(255)
);

# auth/models.py
class User(Base):
    __tablename__ = "users"
    __table_args__ = {"schema": "auth"}

    id = Column(Integer, primary_key=True)
    email = Column(String, unique=True, nullable=False)
    password_hash = Column(String, nullable=False)


# billing/models.py
class Customer(Base):
    __tablename__ = "customers"
    __table_args__ = {"schema": "billing"}

    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, nullable=False)  # no ForeignKey to auth.users
    stripe_id = Column(String)

This gives you data isolation at the schema level while keeping operational simplicity (one database server). When a module needs to become a separate service, it takes its schema with it.

Part 6 - The Eight Fallacies of Distributed Computing

L Peter Deutsch and colleagues at Sun Microsystems catalogued the first seven of these assumptions, and James Gosling later added the eighth. Developers routinely make them about distributed systems, and all eight are false.

Fallacy	Reality	Python Impact
1. The network is reliable	Packets drop, connections reset	Add timeouts, retries, circuit breakers
2. Latency is zero	Network calls add 1-100ms	Batch requests, cache aggressively
3. Bandwidth is infinite	Large payloads are slow	Pagination, compression, gRPC
4. The network is secure	Traffic can be intercepted	TLS everywhere, even internal
5. Topology doesn't change	Services move, IPs change	Service discovery, DNS-based routing
6. There is one administrator	Multiple teams, multiple policies	Clear ownership, documented contracts
7. Transport cost is zero	Serialization has CPU/memory cost	Choose efficient formats (protobuf vs JSON)
8. The network is homogeneous	Different OSes, languages, versions	Standardize on protocols, not implementations
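The first two rows can be put into numbers. This is back-of-the-envelope arithmetic with made-up latencies and availabilities, assuming failures are independent; it is not a measurement of any real system.

```python
import math


def sequential_latency_ms(call_latencies_ms: list[float]) -> float:
    """Calls made one after another: latencies add up."""
    return sum(call_latencies_ms)


def concurrent_latency_ms(call_latencies_ms: list[float]) -> float:
    """Independent calls made concurrently: bounded by the slowest call."""
    return max(call_latencies_ms, default=0.0)


def chained_availability(availabilities: list[float]) -> float:
    """A request that needs every service succeeds only if all of them do."""
    return math.prod(availabilities)


# Hypothetical figures for the user, catalog, and payment services
latencies = [20.0, 35.0, 50.0]
sequential = sequential_latency_ms(latencies)  # 105.0
concurrent = concurrent_latency_ms(latencies)  # 50.0

# Three services at 99.9% each give roughly 99.7% for the whole request
combined = chained_availability([0.999, 0.999, 0.999])
```

Concurrency only helps when the calls are independent. In the order flow shown earlier, the payment amount depends on the catalog response, so those two calls have to stay sequential.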

Implementing Resilience in Python

# Retry with exponential backoff
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential


@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
)
async def call_user_service(user_id: int) -> dict:
    async with httpx.AsyncClient(timeout=5.0) as client:
        response = await client.get(f"{USER_SERVICE_URL}/users/{user_id}")
        response.raise_for_status()
        return response.json()
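To show what a retry decorator does under the hood, here is a hand-rolled sketch of retry with capped exponential backoff. It is not tenacity's implementation; `backoff_delays` and `call_with_retry` are hypothetical helpers, and the `sleep` parameter exists only so the example can run without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 10.0) -> list[float]:
    """Delays between attempts: base * 2**n, capped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(attempts - 1)]


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base: float = 1.0,
    cap: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on ConnectionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts, base, cap)
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")


# A hypothetical call that fails twice, then succeeds
calls = {"n": 0}
slept: list[float] = []


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection reset")
    return "ok"


result = call_with_retry(flaky, attempts=3, sleep=slept.append)
```

Only `ConnectionError` is retried here. Retrying every exception, including a 404 or a validation error, wastes time on failures that will never succeed.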

# Circuit breaker pattern
from enum import Enum
from time import time


class CircuitState(Enum):
    CLOSED = "closed"      # normal operation
    OPEN = "open"          # failing, reject calls
    HALF_OPEN = "half_open"  # testing if service recovered


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 30.0,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._recovery_timeout = recovery_timeout
        self._failure_count = 0
        self._last_failure_time = 0.0
        self._state = CircuitState.CLOSED

    @property
    def state(self) -> CircuitState:
        if self._state == CircuitState.OPEN:
            if time() - self._last_failure_time > self._recovery_timeout:
                self._state = CircuitState.HALF_OPEN
        return self._state

    def record_success(self) -> None:
        self._failure_count = 0
        self._state = CircuitState.CLOSED

    def record_failure(self) -> None:
        self._failure_count += 1
        self._last_failure_time = time()
        if self._failure_count >= self._failure_threshold:
            self._state = CircuitState.OPEN

    def can_execute(self) -> bool:
        state = self.state
        if state == CircuitState.CLOSED:
            return True
        if state == CircuitState.HALF_OPEN:
            return True  # let trial requests through (this simple version does not cap them at one)
        return False  # OPEN - reject


# Usage
user_service_breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=30)

async def get_user_safe(user_id: int) -> dict | None:
    if not user_service_breaker.can_execute():
        return None  # fallback: return cached or default

    try:
        result = await call_user_service(user_id)
        user_service_breaker.record_success()
        return result
    except Exception:
        user_service_breaker.record_failure()
        return None

:::danger Every Network Call Can Fail
In a monolith, a function call either succeeds or raises an exception. In microservices, a call can also: time out, return garbage, succeed but the response is lost, or succeed on the server but the client never knows. Plan for all of these.
:::
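The "succeeded but the client never knows" case is why retries need idempotent operations. One common approach, sketched here with an in-memory store, is for the client to send an idempotency key that the server uses to deduplicate. `PaymentServer`, its fields, and the key format are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PaymentServer:
    """Toy in-memory payment handler that deduplicates by idempotency key."""

    charges: list[tuple[int, float]] = field(default_factory=list)
    seen: dict[str, str] = field(default_factory=dict)

    def charge(self, idempotency_key: str, user_id: int, amount: float) -> str:
        if idempotency_key in self.seen:
            return self.seen[idempotency_key]  # replay the earlier result
        self.charges.append((user_id, amount))
        receipt = f"receipt-{len(self.charges)}"
        self.seen[idempotency_key] = receipt
        return receipt


server = PaymentServer()

# The first call succeeds on the server, but assume its response was lost,
# so the client retries with the same key.
first = server.charge("order-123", user_id=1, amount=49.99)
retried = server.charge("order-123", user_id=1, amount=49.99)
```

The retry returns the original receipt and the customer is charged once. A production version would persist the keys and expire them, which this sketch leaves out.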

Part 7 - The Modular Monolith: The Middle Ground

The modular monolith gives you the organizational benefits of microservices (clear boundaries, team ownership) without the operational complexity (network calls, distributed transactions).

Rules for the Modular Monolith

- Each module has a public API - other modules can only call functions in the public interface, never import internal classes.
- No cross-module database queries - modules own their tables and expose data through their public API.
- Communication through events for side effects - when a user registers, the auth module publishes an event; other modules react.
- Each module can be tested independently - mock the event bus and other module APIs.

Implementation

# auth/public.py - the ONLY interface other modules can use
from typing import Optional
from uuid import UUID

from auth.services import AuthService  # internal to auth; wrapped, never re-exported


class AuthModuleAPI:
    """Public API for the auth module. Other modules call this, not internal services."""

    def __init__(self, auth_service: AuthService) -> None:
        self._service = auth_service

    def get_user_email(self, user_id: UUID) -> Optional[str]:
        user = self._service.get_user(user_id)
        return user.email if user else None

    def verify_user_exists(self, user_id: UUID) -> bool:
        return self._service.get_user(user_id) is not None

# order/services.py - uses auth module's public API, not its internals
class OrderService:
    def __init__(
        self,
        order_repo: OrderRepository,
        auth_api: AuthModuleAPI,  # dependency on public API only
        event_bus: EventBus,
    ) -> None:
        self._order_repo = order_repo
        self._auth_api = auth_api
        self._event_bus = event_bus

    def create_order(self, user_id: UUID, items: list[dict]) -> Order:
        # Call auth module's public API - NOT a database query
        if not self._auth_api.verify_user_exists(user_id):
            raise ValueError("User not found")

        order = Order(user_id=user_id, items=items)
        self._order_repo.save(order)
        self._event_bus.publish("order.created", {
            "order_id": str(order.id),
            "user_id": str(user_id),
        })
        return order
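Because `OrderService` depends only on the auth public API, a repository, and an event bus, it can be tested with simple fakes. The sketch below repeats a condensed `OrderService` so it runs on its own; `Order`, `FakeAuthAPI`, `InMemoryOrderRepo`, and `RecordingEventBus` are hypothetical test doubles.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class Order:
    user_id: UUID
    items: list[dict]
    id: UUID = field(default_factory=uuid4)


class FakeAuthAPI:
    """Stands in for AuthModuleAPI; knows a fixed set of user IDs."""

    def __init__(self, known_users: set[UUID]) -> None:
        self._known_users = known_users

    def verify_user_exists(self, user_id: UUID) -> bool:
        return user_id in self._known_users


class InMemoryOrderRepo:
    def __init__(self) -> None:
        self.saved: list[Order] = []

    def save(self, order: Order) -> None:
        self.saved.append(order)


class RecordingEventBus:
    def __init__(self) -> None:
        self.published: list[tuple[str, dict]] = []

    def publish(self, event_type: str, payload: dict) -> None:
        self.published.append((event_type, payload))


class OrderService:
    """Condensed copy of the service above, so the sketch runs on its own."""

    def __init__(self, order_repo, auth_api, event_bus) -> None:
        self._order_repo = order_repo
        self._auth_api = auth_api
        self._event_bus = event_bus

    def create_order(self, user_id: UUID, items: list[dict]) -> Order:
        if not self._auth_api.verify_user_exists(user_id):
            raise ValueError("User not found")
        order = Order(user_id=user_id, items=items)
        self._order_repo.save(order)
        self._event_bus.publish(
            "order.created", {"order_id": str(order.id), "user_id": str(user_id)}
        )
        return order


def test_known_user_gets_order_and_event() -> None:
    user_id = uuid4()
    repo, bus = InMemoryOrderRepo(), RecordingEventBus()
    service = OrderService(repo, FakeAuthAPI({user_id}), bus)

    order = service.create_order(user_id, [{"id": 1}])

    assert repo.saved == [order]
    assert bus.published[0][0] == "order.created"
    assert bus.published[0][1]["user_id"] == str(user_id)


def test_unknown_user_is_rejected() -> None:
    repo, bus = InMemoryOrderRepo(), RecordingEventBus()
    service = OrderService(repo, FakeAuthAPI(set()), bus)
    try:
        service.create_order(uuid4(), [])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    assert repo.saved == [] and bus.published == []
```

No database, HTTP server, or auth module is needed to run these tests.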

Enforcing Module Boundaries

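# tools/check_boundaries.py - a linting script for import violations
import ast
import sys
from pathlib import Path

# Define allowed imports per module
ALLOWED_IMPORTS = {
    "order": {"auth.public", "catalog.public", "shared"},
    "billing": {"auth.public", "shared"},
    "catalog": {"shared"},
    "auth": {"shared"},
}


def imported_names(node: ast.AST) -> list[str]:
    """Return the dotted names an import statement refers to."""
    if isinstance(node, ast.Import):  # import a.b, c
        return [alias.name for alias in node.names]
    # Relative imports (level > 0) stay inside the module, so skip them
    if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
        return [f"{node.module}.{alias.name}" for alias in node.names]
    return []


def is_within(name: str, package: str) -> bool:
    """True if `name` is `package` itself or something inside it."""
    return name == package or name.startswith(f"{package}.")


def check_module(module_name: str, module_path: Path) -> list[str]:
    violations = []
    allowed = ALLOWED_IMPORTS.get(module_name, set())
    others = [m for m in ALLOWED_IMPORTS if m != module_name]

    for py_file in module_path.rglob("*.py"):
        tree = ast.parse(py_file.read_text())
        for node in ast.walk(tree):
            for name in imported_names(node):
                crosses = any(is_within(name, other) for other in others)
                permitted = any(is_within(name, a) for a in allowed)
                if crosses and not permitted:
                    violations.append(
                        f"{py_file}:{node.lineno}: "
                        f"'{module_name}' imports '{name}' "
                        f"(only {sorted(allowed)} allowed)"
                    )
    return violations


if __name__ == "__main__":
    # Assumes the script runs from the project root, with one directory per module
    found = [v for name in ALLOWED_IMPORTS for v in check_module(name, Path(name))]
    print("\n".join(found))
    sys.exit(1 if found else 0)

# Run as: python tools/check_boundaries.py
# Integrate into CI/CD pipeline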

When to Graduate to Microservices

The modular monolith is designed for easy extraction. Because modules already communicate through a public API and events, extracting a module into a service takes three steps:

- Replace the in-process public API with an HTTP/gRPC client
- Replace the in-process event bus with a message queue
- Give the extracted module its own database

The business logic inside the module does not change at all.
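The first extraction step can be sketched with a shared interface and two implementations. Everything here is illustrative: `AuthAPI`, both classes, the `http://auth-service` URL, and the injected `http_get_status` callable (which stands in for a real HTTP client so the example needs no network) are assumptions.

```python
from typing import Callable, Protocol
from uuid import UUID, uuid4


class AuthAPI(Protocol):
    def verify_user_exists(self, user_id: UUID) -> bool: ...


class InProcessAuthAPI:
    """Monolith: answers from data held in the same process."""

    def __init__(self, users: dict[UUID, str]) -> None:
        self._users = users

    def verify_user_exists(self, user_id: UUID) -> bool:
        return user_id in self._users


class RemoteAuthAPI:
    """Extracted service: same interface, backed by an HTTP call."""

    def __init__(self, base_url: str, http_get_status: Callable[[str], int]) -> None:
        self._base_url = base_url
        self._http_get_status = http_get_status

    def verify_user_exists(self, user_id: UUID) -> bool:
        return self._http_get_status(f"{self._base_url}/users/{user_id}") == 200


def can_place_order(auth: AuthAPI, user_id: UUID) -> bool:
    """Caller-side logic: identical for both implementations."""
    return auth.verify_user_exists(user_id)


alice = uuid4()
in_process = InProcessAuthAPI({alice: "alice@example.com"})


def fake_transport(url: str) -> int:
    # Pretend the remote service knows only Alice
    return 200 if url.endswith(str(alice)) else 404


remote = RemoteAuthAPI("http://auth-service", fake_transport)
```

`can_place_order` does not change when the implementation is swapped. What does change is the failure surface: the remote version also needs the timeouts, retries, and circuit breaker from Part 6.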

Part 8 - Shared Libraries and Contracts

When you do have multiple services, you need shared contracts to keep them in sync.

Shared Pydantic Models

# shared_contracts/user.py (published as a pip package)
from pydantic import BaseModel
from uuid import UUID


class UserResponse(BaseModel):
    id: UUID
    email: str
    name: str
    is_active: bool


class UserCreatedEvent(BaseModel):
    user_id: UUID
    email: str
    timestamp: str

# shared_contracts/pyproject.toml
[project]
name = "myapp-contracts"
version = "1.0.0"
dependencies = ["pydantic>=2.0"]

# In the User Service
from uuid import UUID

from myapp_contracts.user import UserResponse

@app.get("/users/{user_id}", response_model=UserResponse)
def get_user(user_id: UUID):
    ...

# In the Order Service
from uuid import UUID

import httpx
from myapp_contracts.user import UserResponse

async def get_user_from_service(user_id: UUID) -> UserResponse:
    async with httpx.AsyncClient(timeout=5.0) as client:
        response = await client.get(f"{USER_SERVICE_URL}/users/{user_id}")
        response.raise_for_status()  # fail loudly before validating the body
        return UserResponse.model_validate(response.json())

Contract Testing

# tests/test_contracts.py
from myapp_contracts.user import UserResponse


def test_user_response_contract():
    """Verify the contract matches the actual API response."""
    # This is the shape the User Service actually returns
    raw_response = {
        "id": "550e8400-e29b-41d4-a716-446655440000",
        "email": "alice@example.com",
        "name": "Alice",
        "is_active": True,
    }

    # If this fails, the contract is out of date
    user = UserResponse.model_validate(raw_response)
    assert user.email == "alice@example.com"

:::tip Version Your Contracts
When a service changes its API, bump the shared contract version. Consumer services can pin to the version they support and upgrade at their own pace. This prevents synchronized deploys.
:::
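Pinning works best when additive changes do not break older consumers. The sketch below uses a standard-library dataclass as a stand-in for the Pydantic contract (Pydantic v2 models also ignore unknown fields by default); `UserResponseV1`, `parse_tolerant`, and the `avatar_url` field are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass(frozen=True)
class UserResponseV1:
    id: str
    email: str
    name: str
    is_active: bool


def parse_tolerant(cls: type, raw: dict[str, Any]):
    """Build `cls` from `raw`, ignoring keys this consumer does not know."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in raw.items() if k in known})


# A newer producer added a field; the v1 consumer keeps working
v2_payload = {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "email": "alice@example.com",
    "name": "Alice",
    "is_active": True,
    "avatar_url": "https://example.com/alice.png",
}
user = parse_tolerant(UserResponseV1, v2_payload)

# Removing a field is a breaking change: the constructor raises TypeError
try:
    parse_tolerant(UserResponseV1, {"id": "1", "email": "bob@example.com"})
    removal_detected = False
except TypeError:
    removal_detected = True
```

Adding a field is therefore a minor version bump, while removing or renaming one requires a major bump and a migration window.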

Part 9 - Decision Framework

Architecture	Best For	Operational Cost
Monolith	Startups, small teams, MVPs, well-understood domains	Low
Modular Monolith	Growing teams, complex domains, preparing for potential split	Medium
Microservices	Large teams, independent scaling needs, polyglot requirements	High

Key Takeaways

Start with a monolith: you do not know your service boundaries on day one. A monolith lets you discover them through real usage and evolving requirements.
The modular monolith is often the right answer: it gives you clean boundaries, team ownership, and testability without the operational overhead of distributed systems.
Split only when you have real signals: independent scaling, different deployment cadences, or team size exceeding what a single codebase can support.
Bounded contexts define service boundaries: do not split by technical layer (API service, database service). Split by business domain (auth, billing, catalog).
Each service owns its data: shared databases create invisible coupling. Use separate schemas as a stepping stone, separate databases as the goal.
Every network call can fail: microservices require retries, timeouts, circuit breakers, and fallback strategies that monoliths do not need.
Message queues (Celery + Redis) decouple services in time: use async communication for side effects (emails, notifications, report generation) even in monoliths.
Shared contracts keep services in sync: publish Pydantic models as a shared package and write contract tests to catch breaking changes.

Graded Practice Challenges

Level 1 - Identify the Issue

Question 1: A team has 3 microservices, each with its own database. The Order Service needs the user's email to send a confirmation. It queries the User Service's database directly. What is wrong?

Answer

This violates data ownership. The Order Service is reaching directly into the User Service's database, creating invisible coupling. If the User Service changes its schema (renames a column, changes the table structure), the Order Service breaks without any contract being violated. The correct approach is for the Order Service to call the User Service's API to get the email, or to receive the email as part of an event when the order is placed.

Question 2: A startup with 3 engineers decides to build with microservices "to be ready for scale." Is this a good decision?

Answer

No. With 3 engineers, the operational overhead of microservices (multiple repos, CI pipelines, service discovery, distributed tracing, network error handling) will consume a disproportionate amount of engineering time. A well-structured monolith or modular monolith would let the team ship features faster. They can extract services later when concrete scaling or organizational signals emerge.

Question 3: The billing module in a modular monolith imports User from auth.models to check user roles. Is this acceptable?

Answer

No. This violates module boundaries. The billing module should only access the auth module through its public API (auth.public.AuthModuleAPI), not by importing internal models. If billing needs to know user roles, the auth module's public API should expose a method like get_user_role(user_id). Direct model imports create tight coupling that would prevent extracting either module into a separate service.

Level 2 - Refactoring Challenge

You have a monolith where the order processing code directly accesses user data and sends emails:

def create_order(user_id: int, items: list, db: Session):
    user = db.query(UserModel).get(user_id)  # cross-module DB access
    order = OrderModel(user_id=user_id, items=items, total=calculate_total(items))
    db.add(order)
    db.commit()

    # Direct email sending in the request path
    smtp = smtplib.SMTP("smtp.company.com")
    smtp.sendmail("noreply@example.com", user.email, f"Order {order.id} confirmed")
    smtp.quit()

Refactor into a modular monolith with: (a) an auth module with a public API for getting user emails, (b) an order module that uses the auth public API, (c) an event bus that triggers email sending asynchronously, (d) clear module boundary rules.

Level 3 - Design Challenge

Design the architecture for the EngineersOfAI platform as it scales from a startup (3 engineers) to a growing company (20 engineers):

- Phase 1 (3 engineers): Monolith with courses, user auth, and content delivery.
- Phase 2 (8 engineers): Modular monolith with modules for auth, courses, certificates, and payments.
- Phase 3 (20 engineers): Extract the certificate generation service (CPU-intensive PDF generation) and the payment service (PCI compliance requires isolation).

For each phase, produce: (a) the architecture diagram, (b) the communication patterns, (c) the data ownership model, (d) the deployment strategy. Explain what triggers the transition from each phase to the next.

What's Next

Congratulations on completing Module 5 - Architecture & Systems Design. You now have the knowledge to make informed architectural decisions, from code-level patterns (Clean Architecture, Hexagonal, DI) to system-level choices (monolith vs microservices, communication patterns, configuration management).

The next module explores Security Engineering, where you will learn to build Python applications that are secure by design - covering authentication, authorization, input validation, cryptography, and common vulnerability patterns.
