7 docs tagged with "lakehouse"

Apache Hudi

Hudi's copy-on-write and merge-on-read table types, and upsert patterns.

Apache Iceberg

Iceberg table format, ACID transactions, schema evolution, and time travel.

Data Governance for AI Training Datasets

What column-level security, data lineage, and cataloguing do for AI systems; when regulated AI training data requires auditability and access controls across the lakehouse; and how to implement governance with Apache Atlas and Unity Catalog in production AI data pipelines.

Data Lake vs Warehouse vs Lakehouse for AI Workloads

What each storage architecture does for AI systems; when ML teams need both raw unstructured data and structured query access on the same platform; and how to choose and implement the right architecture in production AI data pipelines.

Delta Lake

Delta Lake on Databricks, merge operations, and Change Data Capture.

Lakehouse for ML Workflows

Storing training datasets, experiment artifacts, and model outputs in a lakehouse.

Lakehouse Query Engines

Trino, DuckDB, and Spark SQL: querying open table formats at scale.