Embedding Stores
Storing and serving dense embeddings at scale for real-time recommendation and search.
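As a rough illustration of the serving side, the sketch below keeps embeddings in a NumPy matrix and answers exact cosine top-k queries. The class name `InMemoryEmbeddingStore` and its methods are hypothetical. It uses brute-force search, which is only practical for small catalogues; large-scale systems usually put an approximate nearest-neighbour index in front of the vectors instead.

```python
import numpy as np


class InMemoryEmbeddingStore:
    """Hypothetical minimal store: exact cosine top-k over L2-normalised vectors."""

    def __init__(self, ids, vectors):
        v = np.asarray(vectors, dtype=np.float32)
        # Normalise once at load time so a dot product equals cosine similarity.
        norms = np.linalg.norm(v, axis=1, keepdims=True)
        self.vectors = v / np.maximum(norms, 1e-12)
        self.ids = list(ids)
        self.index = {item_id: i for i, item_id in enumerate(self.ids)}

    def get(self, item_id):
        # Point lookup of a stored (normalised) embedding by id.
        return self.vectors[self.index[item_id]]

    def top_k(self, query, k=5):
        q = np.asarray(query, dtype=np.float32)
        q = q / max(float(np.linalg.norm(q)), 1e-12)
        scores = self.vectors @ q
        k = min(k, len(self.ids))
        # argpartition finds the k best in O(n); only those k are then fully sorted.
        idx = np.argpartition(-scores, k - 1)[:k]
        idx = idx[np.argsort(-scores[idx])]
        return [(self.ids[i], float(scores[i])) for i in idx]


store = InMemoryEmbeddingStore(["a", "b", "c"], [[1, 0], [0, 1], [1, 1]])
results = store.top_k([1.0, 0.1], k=2)
```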
Ensuring identical features between training (offline) and serving (online).
How to redesign feature engineering pipelines for distributed compute when a 10 GB solution fails at 500 GB.
Monitoring features after deployment - PSI, KS tests, freshness monitoring, completeness tracking, and proving to a regulator that no feature's PSI exceeded 0.10.
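A minimal sketch of the Population Stability Index, PSI = Σ (a_i − e_i) · ln(a_i / e_i) over bins i, where e_i and a_i are the baseline and current proportions in each bin. The binning scheme (10 quantile bins taken from the baseline) and the epsilon floor for empty bins are assumptions; implementations differ on both. A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift.

```python
import numpy as np


def psi(expected, actual, n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` relative to the baseline `expected`."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior cut points at the baseline's quantiles: n_bins roughly equal-mass bins.
    cuts = np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    # Bin index 0..n_bins-1 for each value; out-of-range values fall into the end bins.
    e_counts = np.bincount(np.searchsorted(cuts, expected, side="right"), minlength=n_bins)
    a_counts = np.bincount(np.searchsorted(cuts, actual, side="right"), minlength=n_bins)
    # Convert counts to proportions, flooring at eps so empty bins never hit log(0).
    e_pct = np.clip(e_counts / e_counts.sum(), eps, None)
    a_pct = np.clip(a_counts / a_counts.sum(), eps, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))


rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)
same_dist = rng.normal(0.0, 1.0, 10_000)
shifted = rng.normal(1.0, 1.0, 10_000)
```

Comparing `psi(baseline, same_dist)` with `psi(baseline, shifted)` shows the index staying near zero for resampled data and jumping well past 0.25 for a one-standard-deviation shift.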
Reducing 500 features to 50 without losing model performance - filter, wrapper, and embedded methods, SHAP-based selection, and leakage detection.
Architecture and operations of feature stores - offline and online layers, point-in-time joins, and avoiding the training-serving skew that costs you accuracy.
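A point-in-time join attaches to each training label the feature value that was known at that moment, and nothing later. One way to sketch it for a single feature table is pandas `merge_asof` with `direction="backward"`; the table and column names below are hypothetical, and a real feature store would typically run the equivalent join in its offline layer.

```python
import pandas as pd

# Hypothetical label events: when each prediction would have been made.
labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_ts": pd.to_datetime(["2024-01-05", "2024-01-10", "2024-01-07"]),
    "label": [0, 1, 1],
})
# Hypothetical feature snapshots: the value as of each update time.
features = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "feature_ts": pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-06", "2024-01-09"]),
    "purchases_7d": [3, 5, 1, 4],
})


def point_in_time_join(labels: pd.DataFrame, features: pd.DataFrame) -> pd.DataFrame:
    # merge_asof requires both frames to be sorted on their time keys.
    return pd.merge_asof(
        labels.sort_values("event_ts"),
        features.sort_values("feature_ts"),
        left_on="event_ts",
        right_on="feature_ts",
        by="user_id",           # match within the same entity only
        direction="backward",   # take the latest feature at or before the event, never after
    )


training = point_in_time_join(labels, features)
```

User 2's label on 2024-01-07 picks up the value 1 from 2024-01-06, not the value 4 written on 2024-01-09; a plain join on `user_id` would leak that future value into training.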
Ensuring feature quality through schema validation, unit tests, integration tests, and monitoring - catching the NaN bug before it degrades your model for 3 weeks.
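A schema check that runs on every feature batch is one way to catch a NaN bug on the day it appears. This is a minimal pandas sketch; the schema format, column names and thresholds are hypothetical, and dedicated validation libraries offer richer versions of the same idea.

```python
import numpy as np
import pandas as pd

# Hypothetical schema: column -> (expected dtype kind, min, max, max allowed null fraction)
SCHEMA = {
    "age": ("f", 0.0, 120.0, 0.0),
    "session_count": ("i", 0, None, 0.0),
}


def validate_features(df: pd.DataFrame, schema: dict) -> list[str]:
    """Return human-readable violations; an empty list means the batch passed."""
    errors = []
    for col, (kind, lo, hi, max_null_frac) in schema.items():
        if col not in df.columns:
            errors.append(f"{col}: missing column")
            continue
        s = df[col]
        if s.dtype.kind != kind:
            errors.append(f"{col}: dtype kind {s.dtype.kind!r}, expected {kind!r}")
        # Fraction of NaN/None values in the column.
        null_frac = float(s.isna().mean())
        if null_frac > max_null_frac:
            errors.append(f"{col}: null fraction {null_frac:.2%} exceeds {max_null_frac:.2%}")
        # Range checks ignore nulls, which are handled above.
        values = s.dropna()
        if lo is not None and (values < lo).any():
            errors.append(f"{col}: values below {lo}")
        if hi is not None and (values > hi).any():
            errors.append(f"{col}: values above {hi}")
    return errors


good = pd.DataFrame({"age": [30.0, 41.5], "session_count": [3, 7]})
bad = pd.DataFrame({"age": [30.0, np.nan], "session_count": [3, -1]})
```

In a unit or integration test, asserting `validate_features(batch, SCHEMA) == []` turns a silent NaN into a failing build.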
Redis, Cassandra, and in-memory stores for sub-millisecond feature retrieval.
Feature engineering as an MLOps discipline - from raw data to production-grade feature pipelines, stores, and monitoring.
Systematic feature engineering for tabular data - transformations, encoding, imputation, and selection that lifted AUC from 0.71 to 0.84.
The fundamental split between pre-computed offline and real-time online features.
Overview of real-time feature engineering for low-latency ML systems.
Pandas for machine learning - efficient data loading, feature engineering, pipelines, memory optimisation, and common ML preprocessing patterns.
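One common pandas memory optimisation is downcasting numeric columns and storing low-cardinality strings as `category`. The helper name `shrink` and the 50% cardinality cutoff are assumptions for illustration; note that downcasting floats to `float32` trades precision for memory, which may not suit every feature.

```python
import numpy as np
import pandas as pd


def shrink(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: downcast numerics and categorise low-cardinality strings."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_integer_dtype(s):
            # Smallest signed integer type that holds every value.
            out[col] = pd.to_numeric(s, downcast="integer")
        elif pd.api.types.is_float_dtype(s):
            # float64 -> float32 where possible (loses precision).
            out[col] = pd.to_numeric(s, downcast="float")
        elif pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s):
            # Repeated strings are stored once; rows hold small integer codes.
            if s.nunique() < max_unique_ratio * len(s):
                out[col] = s.astype("category")
    return out


df = pd.DataFrame({
    "id": np.arange(1000, dtype="int64"),
    "price": np.linspace(0.0, 1.0, 1000),
    "country": ["DE", "FR"] * 500,
})
small = shrink(df)
```

Comparing `df.memory_usage(deep=True).sum()` with the same figure for `small` shows the saving.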
Case studies in real-time feature engineering from Uber, Twitter, and LinkedIn.
Windowed aggregations, sessionisation, and user behaviour features in real time.
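A sliding-window aggregation such as "purchase amount in the last 60 seconds per user" can be sketched as an in-memory aggregator that evicts expired events as time advances. The class and method names are hypothetical, and it assumes events arrive in timestamp order per key; stream processors add handling for late and out-of-order events that this sketch omits.

```python
from collections import defaultdict, deque


class SlidingWindowAggregator:
    """Per-key sum and count over events in the window (now - window_s, now]."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.events = defaultdict(deque)  # key -> deque of (timestamp, value)
        self.sums = defaultdict(float)    # key -> running sum of values in the window

    def _evict(self, key, now: float) -> None:
        # Drop events that have fallen out of the window and update the running sum.
        q = self.events[key]
        while q and q[0][0] <= now - self.window_s:
            _, value = q.popleft()
            self.sums[key] -= value

    def add(self, key, ts: float, value: float) -> None:
        # Assumes events for a given key arrive in non-decreasing timestamp order.
        self.events[key].append((ts, value))
        self.sums[key] += value
        self._evict(key, ts)

    def query(self, key, now: float) -> tuple[float, int]:
        # Returns (sum, count) as of `now`; `now` must not move backwards.
        self._evict(key, now)
        return self.sums[key], len(self.events[key])


agg = SlidingWindowAggregator(window_s=60)
agg.add("u1", 0, 10.0)
agg.add("u1", 30, 5.0)
```

Keeping a running sum makes each query cost proportional to the number of expired events rather than to the full window.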
Scikit-learn Pipeline, ColumnTransformer, custom transformers, feature unions, and production-ready ML workflows.
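A typical `Pipeline` plus `ColumnTransformer` setup for a mixed numeric and categorical table might look like the sketch below. The column names and toy data are hypothetical; the scikit-learn classes are standard ones. Because imputation, scaling and encoding live inside the pipeline, their statistics are learned only from the data passed to `fit`, and the same fitted object is reused at prediction time.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric = ["age", "income"]   # hypothetical numeric columns
categorical = ["city"]        # hypothetical categorical column

preprocess = ColumnTransformer([
    # Numeric: fill missing values with the median, then standardise.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    # Categorical: one-hot encode; categories unseen at fit time become all zeros.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])

X = pd.DataFrame({
    "age": [25, 32, np.nan, 51],
    "income": [40e3, 52e3, 61e3, np.nan],
    "city": ["a", "b", "a", "c"],
})
y = [0, 1, 0, 1]
model.fit(X, y)
```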
Computing features from event streams with Kafka and Flink.
Turning text into ML features - from TF-IDF baselines to embedding-based representations that improved e-commerce search NDCG by 18%.
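A TF-IDF baseline for product search can be as small as the sketch below: vectorise the catalogue, vectorise the query, and rank by cosine similarity. The product titles and the `search` helper are hypothetical, and the unigram-plus-bigram setting is one reasonable choice rather than the one behind the NDCG figure above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical product catalogue
products = [
    "red running shoes",
    "blue running shorts",
    "leather office shoes",
    "wireless headphones",
]

# Unigrams and bigrams, with log-scaled term frequency.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)
doc_matrix = vectorizer.fit_transform(products)


def search(query: str, k: int = 2):
    q = vectorizer.transform([query])
    # Rows are L2-normalised by default, so the linear kernel equals cosine similarity.
    scores = linear_kernel(q, doc_matrix).ravel()
    top = scores.argsort()[::-1][:k]
    return [(products[i], float(scores[i])) for i in top]
```

An embedding-based representation would replace `doc_matrix` with dense vectors while the ranking step stays the same.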
Feature engineering for temporal data - lag features, rolling statistics, Fourier seasonality, and preventing temporal leakage that destroys production forecasts.
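The core guard against temporal leakage in lag and rolling features is to shift before aggregating, so a feature for day t only sees days before t. A minimal pandas sketch on a hypothetical daily series (Fourier seasonality terms are not covered here):

```python
import pandas as pd

# Hypothetical daily sales series
sales = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=6, freq="D"),
    "units": [10, 12, 9, 15, 14, 20],
})


def add_lag_features(df: pd.DataFrame, col: str = "units") -> pd.DataFrame:
    out = df.sort_values("date").copy()
    # Yesterday's value: available at prediction time for today.
    out[f"{col}_lag1"] = out[col].shift(1)
    # Shift first, then roll: the 3-day mean covers days t-3..t-1, never day t itself.
    out[f"{col}_roll3_mean"] = out[col].shift(1).rolling(window=3).mean()
    return out


feats = add_lag_features(sales)
```

Calling `rolling` directly on `units` without the shift would fold the target day into its own feature, which looks excellent offline and fails in production.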