Module 10: Real-Time Feature Engineering
Batch pipelines were the foundation of ML feature engineering for a decade. They work well - until the moment a signal changes and the model is still consuming yesterday's value. Real-time feature engineering is the discipline of computing ML features from live event streams with sub-second latency, where the cost of stale features is a measurably worse prediction.
This module covers the full stack: why online features exist, how to serve them within a sub-10ms latency budget, how to build stream-to-feature pipelines that write to online and offline stores simultaneously, how to guarantee consistency between training and serving, and how to operate these systems reliably at scale.
Module Map
Lessons
| # | Lesson | Key Topics | Read Time |
|---|---|---|---|
| 01 | Online vs Offline Features | Feature freshness spectrum, training-serving gap, hourly vs. daily approximations | 20 min |
| 02 | Low-Latency Feature Serving | Redis data structures, pipeline batching, DynamoDB, Bigtable, feature server architecture | 25 min |
| 03 | Stream-to-Feature Pipelines | Kafka → Flink → dual-write (Redis + Iceberg), watermarks, state management, DLQ | 25 min |
| 04 | Feature Consistency | Single-computation path, feature versioning, canary testing, consistency validator | 22 min |
| 05 | Embedding Stores | HNSW, IVF, Faiss, pgvector, managed vector databases, hybrid search | 25 min |
| 06 | Real-Time Aggregations | Sliding windows, two-level aggregation, HyperLogLog, Redis sorted sets, CUSUM | 25 min |
| 07 | Production Patterns | Connection pooling, backpressure, graceful degradation, multi-region serving, runbooks | 28 min |
Prerequisites
This module assumes familiarity with:
- Module 01 - Data pipeline fundamentals (batch vs. streaming distinctions)
- Module 02 - Storage systems (understanding Redis, DynamoDB, column-oriented stores)
- Module 03 - Streaming (Kafka, Flink operators, event time vs. processing time)
- Module 04 - Data modeling (schema design, key design patterns)
- Module 05 - Feature stores (Feast, Tecton - offline/online split, registry, serving APIs)
:::tip Start Here
If you haven't completed Module 03 (Streaming) and Module 05 (Feature Stores), complete those first. Real-time feature engineering sits at the intersection of streaming infrastructure and feature store design.
:::
Key Concepts
- Online features - features computed from live event data at request time, not pre-computed from historical snapshots
- Feature freshness - how recently a feature value was computed; the primary axis of the batch-vs-real-time tradeoff
- Stream-to-feature pipeline - the transformation path from a raw event stream to a queryable, versioned feature in the feature store
- Training-serving consistency - the guarantee that a feature value computed at training time and the same feature computed at serving time represent the same quantity, produced by the same logic
- Low-latency serving - retrieving feature values within the latency budget of the model endpoint (typically 10–70ms)
- Embedding store - a specialized index (HNSW, IVF) for serving dense vector similarity queries at sub-10ms latency
- Real-time aggregation - computing rolling statistics (counts, sums, cardinalities) over sliding event-time windows
- Feature consistency - systematic prevention of training-serving skew across code versions, timezones, and data sources
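To make the real-time aggregation concept concrete, here is a minimal in-memory sketch of a rolling count over a sliding event-time window, built from fixed-size buckets. The class name `SlidingWindowCounter`, its parameters, and the 5-minute / 1-minute defaults are assumptions chosen for illustration; a production pipeline would keep this state in Flink or Redis rather than in a Python dict.

```python
from collections import defaultdict


class SlidingWindowCounter:
    """Hypothetical helper: per-key event counts over a bucketed sliding window."""

    def __init__(self, window_seconds: int = 300, bucket_seconds: int = 60):
        if window_seconds % bucket_seconds != 0:
            raise ValueError("window_seconds must be a multiple of bucket_seconds")
        self.window_seconds = window_seconds
        self.bucket_seconds = bucket_seconds
        # key -> {bucket_start_ts: event_count}
        self._buckets = defaultdict(dict)

    def _bucket_start(self, ts: int) -> int:
        # Align an event timestamp (epoch seconds) to the start of its bucket.
        return ts - (ts % self.bucket_seconds)

    def add(self, key: str, event_ts: int) -> None:
        # Increment the bucket the event falls into, using event time.
        start = self._bucket_start(event_ts)
        per_key = self._buckets[key]
        per_key[start] = per_key.get(start, 0) + 1

    def count(self, key: str, now_ts: int) -> int:
        # The window spans the current (partial) bucket plus the preceding
        # full buckets, so the result is accurate to bucket granularity.
        oldest = self._bucket_start(now_ts) - self.window_seconds + self.bucket_seconds
        per_key = self._buckets.get(key, {})
        # Evict buckets that have slid out of the window.
        for start in [s for s in per_key if s < oldest]:
            del per_key[start]
        return sum(c for s, c in per_key.items() if s <= now_ts)


counter = SlidingWindowCounter(window_seconds=300, bucket_seconds=60)
for ts in (700, 730, 990, 995):
    counter.add("user_1", ts)
print(counter.count("user_1", now_ts=1000))
```

Bucketing trades a little precision at the window edge for bounded state: each key holds at most `window_seconds / bucket_seconds` counters no matter how many events arrive.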
Learning Outcomes
By the end of this module, you will be able to:
- Classify features by freshness requirement and choose the appropriate computation strategy (batch, near-real-time, or real-time)
- Design a Redis-backed online feature store with sub-10ms p99 retrieval latency
- Build a Flink stream-to-feature pipeline that writes simultaneously to an online store and an offline store
- Identify and eliminate sources of training-serving skew in a real-world feature pipeline
- Build and serve an approximate nearest neighbor index using Faiss for embedding-based retrieval at scale
- Implement efficient sliding-window aggregations using two-level bucketing and probabilistic data structures
- Operate a real-time feature system with proper fallbacks, circuit breakers, and observability
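As a small illustration of the fallback behavior mentioned in the last outcome, the sketch below serves an online feature only if it is fresh enough and otherwise degrades to a default. `FeatureValue`, `resolve_feature`, the source labels, and the `max_age_seconds` threshold are hypothetical names used for this example only, not part of any feature store API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FeatureValue:
    value: float
    computed_at: float  # epoch seconds when the value was written


def resolve_feature(
    online: Optional[FeatureValue],
    now: float,
    max_age_seconds: float,
    default: float,
) -> Tuple[float, str]:
    """Return (value, source) so callers can log which path served the feature."""
    if online is None:
        # Key missing from the online store: degrade to the default.
        return default, "default_missing"
    if now - online.computed_at > max_age_seconds:
        # Value exists but is older than the freshness budget.
        return default, "default_stale"
    return online.value, "online"


fresh = FeatureValue(value=3.0, computed_at=95.0)
print(resolve_feature(fresh, now=100.0, max_age_seconds=10.0, default=0.0))
```

Returning the source label alongside the value makes it straightforward to emit a metric for how often the system falls back, which is one signal a runbook might alert on.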
