7 docs tagged with "state-space-models"

Hybrid Architectures - Jamba and Beyond

How combining attention and Mamba layers can produce models that outperform pure-attention or pure-Mamba architectures - Jamba's design, the attention-to-Mamba ratio, MoE integration, and the emerging hybrid landscape.

Limitations of Attention at Scale

Why the quadratic complexity of self-attention creates real production bottlenecks - memory, latency, and cost - and why sparse attention approximations only partially solve the problem.

Mamba - Selective State Space Models

How Mamba's input-dependent SSM parameters, hardware-aware parallel scan, and selective gating mechanism achieve linear-time sequence modeling that is competitive with transformers.

Mamba vs Transformer - When Each Wins

A rigorous benchmark comparison: perplexity, throughput, recall tasks, in-context learning, and the fundamental trade-off between compressed state and full context access.

Module 12: State Space Models

A complete map of State Space Models - from the quadratic attention bottleneck to Mamba's selective recurrence, hybrid architectures, and production deployment.

State Space Model Foundations

How control theory's state space models became a competitive sequence modeling architecture - continuous-time SSMs, the S4 paper, HiPPO initialization, and the convolutional/recurrent duality.

When to Use SSMs in Production

A practical deployment guide: use cases where SSMs win, the streaming inference pattern, model availability on Hugging Face, fine-tuning SSMs, and a look ahead.