9 docs tagged with "stream-processing"

Apache Flink Fundamentals

Apache Flink for stateful stream processing - DataStream API, windows, watermarks, state backends, checkpointing, and PyFlink for ML feature computation.

Apache Kafka Architecture - The Nervous System of Real-Time ML

A deep dive into Kafka's distributed commit log, partitions, replication, consumer groups, compacted topics, and the architectural decisions that make it the standard event transport for production ML systems.

Kafka for ML Systems

Using Apache Kafka as the backbone of production ML systems - schema registry, CDC, exactly-once semantics, and dead letter queues.

Kafka Streams vs Apache Flink - The ML Pipeline Decision Guide

A comprehensive comparison of Kafka Streams, Faust, and Apache Flink for building real-time ML feature pipelines, with a production decision framework and working code examples.

Module 3: Stream Processing for Real-Time AI

Eight lessons covering Apache Kafka, Apache Flink, stream processing patterns, real-time feature computation, and production reliability for ML systems that cannot tolerate batch latency.

Real-Time Feature Computation for ML Inference

How to build streaming feature pipelines that compute fresh ML features at production scale, including dual-store architecture, training-serving skew prevention, and hot key mitigation.

Stream Processing Patterns for ML Pipelines

Seven production design patterns for streaming ML pipelines - stream enrichment, stream-stream joins, CDC to feature store, streaming inference, feedback loops, and exactly-once end-to-end.

Streaming Concepts - Why Batch Fails for Real-Time ML

The fundamental theory of stream processing - event time, processing time, watermarks, windowing, delivery semantics, and backpressure - through the lens of ML systems that cannot afford batch latency.

Streaming Pipeline Reliability for ML Systems

How to build streaming ML pipelines that survive failures, handle schema changes, implement dead letter queues, replay events, and monitor themselves - so your fraud model never runs on 3-hour-old features again.