01. Module 04: Real-Time ML Systems
Architecture patterns for real-time machine learning - from sub-10ms inference at scale to online learning, streaming inference pipelines, and ultra-low-latency optimization.

02. Real-Time Inference Design
Architecture for ML inference at 1M QPS with sub-10ms SLA - synchronous vs async real-time, circuit breakers, fallback models, and timeout budget management.

03. Stream Processing for ML Systems
Continuous feature computation on unbounded data streams using Apache Flink - windowing, watermarks, state management, and production ML feature pipelines.

04. Event-Driven ML Architecture
Designing ML systems around events - event sourcing, CQRS for feature stores, the outbox pattern, and how LinkedIn's unified messaging platform drives ML at scale.

05. Online Learning
Continuous learning in production - online learning vs mini-batch, concept drift adaptation, Vowpal Wabbit, streaming gradient descent, bandit algorithms, and preventing catastrophic forgetting.

06. Streaming Inference
Running ML inference on data streams - Kafka integration, Flink ML, stateful stream processing, windowed feature aggregations, exactly-once inference, and time semantics.

07. Low-Latency Optimization
Engineering for ultra-low latency inference - NUMA awareness, CPU affinity, memory pre-allocation, lock-free data structures, cache line optimization, zero-copy inference, CUDA streams, and kernel profiling.

08. Real-Time Feature Engineering at Scale
Computing ML features from raw events within milliseconds - Redis patterns, sliding window aggregations, session detection, and Uber's Michelangelo real-time pipeline.

09. Event-Driven Architecture for ML
Event sourcing and CQRS patterns for ML systems - event-driven state management, Kafka Streams for ML pipelines, event schema design, dead letter queues, and event replay for debugging.

10. Low-Latency Inference Patterns
Engineering ML predictions under 10ms p99 - hardware choices, model optimization, batching strategies, pre-computation, memory layout, and real production targets.

11. Edge ML Deployment
Deploying ML models to smartphones, IoT devices, and embedded systems - model compression, edge runtimes, OTA updates, federated learning, and real-world examples.

12. Temporal Features for Real-Time ML
Engineering time-based features for real-time ML - recency-weighted features, session features, sliding window aggregations, point-in-time joins, temporal leakage prevention, and clock skew in distributed systems.