Apache Flink Fundamentals
Apache Flink for stateful stream processing - DataStream API, windows, watermarks, state backends, checkpointing, and PyFlink for ML feature computation.
A deep dive into Kafka's distributed commit log, partitions, replication, consumer groups, compacted topics, and the architectural decisions that make it the standard event transport for production ML systems.
Using Apache Kafka as the backbone of production ML systems - schema registry, CDC, exactly-once semantics, and dead letter queues.
A comprehensive comparison of Kafka Streams, Faust, and Apache Flink for building real-time ML feature pipelines, with a production decision framework and working code examples.
Eight lessons covering Apache Kafka, Apache Flink, stream processing patterns, real-time feature computation, and production reliability for ML systems that cannot tolerate batch latency.
How to build streaming feature pipelines that compute fresh ML features at production scale, including dual-store architecture, training-serving skew prevention, and hot key mitigation.
Seven production design patterns for streaming ML pipelines - stream enrichment, stream-stream joins, CDC to feature store, streaming inference, feedback loops, and exactly-once end-to-end.
The fundamental theory of stream processing - event time, processing time, watermarks, windowing, delivery semantics, and backpressure - through the lens of ML systems that cannot afford batch latency.
How to build streaming ML pipelines that survive failures, handle schema changes, implement dead letter queues, replay events, and monitor themselves - so your fraud model never runs on 3-hour-old features again.