Anomaly Detection in Pipelines
Statistical anomaly detection for data drift, schema drift, and volume changes.
A deep engineering dive into the five dimensions of data quality (completeness, accuracy, consistency, timeliness, and uniqueness) and how each one silently corrupts AI systems in production.
How poor data quality degrades ML model performance, and how to detect and remediate it.
Defining data SLAs, monitoring, alerting, and runbooks for data incidents.
Schema tests, custom tests, and data quality gates in dbt pipelines.
Online store, offline store, feature registry, and the dual-write pattern.
Writing data expectations, validations, and building a data quality suite.
Training-serving skew, feature reuse, and the operational challenges feature stores solve.