Apache Hudi
Hudi's copy-on-write vs. merge-on-read table types and upsert patterns.
Iceberg table format, ACID transactions, schema evolution, and time travel.
What column-level security, data lineage, and cataloguing do for AI systems, when they become necessary (for example, when regulated AI training data requires auditability and access controls across the lakehouse), and how to implement governance with Apache Atlas and Unity Catalog in production AI data pipelines.
What each storage architecture does for AI systems, when each one fits (for example, when ML teams need both raw unstructured data and structured query access on the same platform), and how to choose and implement the right architecture in production AI data pipelines.
Delta Lake on Databricks, merge operations, and Change Data Capture.
Storing training datasets, experiment artifacts, and model outputs in a lakehouse.
Trino, DuckDB, and Spark SQL: querying open table formats at scale.