Module 2 - Batch Processing for ML

Apache Spark architecture, distributed joins, partitioning strategies, PySpark best practices, and dbt for ML pipelines.