Module 6 - Sequences and Time Series
Most real-world data has an order. Log files, sensor readings, financial prices, user sessions, speech, text - the value of each data point usually depends on what came before it. Standard feedforward networks throw that order away. This module covers the architectures that don't.
What You'll Learn
Lessons in This Module
| # | Lesson | Core Concept |
|---|---|---|
| 01 | RNNs and Vanishing Gradients | Hidden state, BPTT, why gradients vanish over long sequences |
| 02 | LSTM and GRU Deep Dive | Forget/input/output gates, cell state, GRU simplification |
| 03 | Seq2Seq and Encoder-Decoder | Context vector, Bahdanau attention, teacher forcing |
| 04 | Time Series Forecasting Patterns | Decomposition, walk-forward validation, deep forecasting |
| 05 | Temporal Convolutional Networks | Causal + dilated convolutions, receptive field, WaveNet |
| 06 | Anomaly Detection in Sequences | Point vs contextual anomalies, LSTM autoencoder, thresholds |
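Walk-forward validation (lesson 04) is the standard guard against training on the future. The sketch below builds expanding-window splits: each fold trains on everything before a cutoff and tests on the next `horizon` steps, then moves the cutoff forward. `walk_forward_splits` and its parameters are hypothetical names chosen for illustration. scikit-learn's `TimeSeriesSplit` implements a similar scheme if you prefer a library version.

```python
def walk_forward_splits(n_samples, initial_train, horizon, step=None):
    """Yield (train_indices, test_indices) for expanding-window walk-forward validation.

    Hypothetical helper: every fold trains only on observations before the
    cutoff and tests on the `horizon` observations right after it.
    """
    if step is None:
        step = horizon  # non-overlapping test windows by default
    cutoff = initial_train
    while cutoff + horizon <= n_samples:
        train_idx = list(range(0, cutoff))
        test_idx = list(range(cutoff, cutoff + horizon))
        yield train_idx, test_idx
        cutoff += step  # the training window grows; the past is never dropped


splits = list(walk_forward_splits(n_samples=10, initial_train=4, horizon=2))
for train_idx, test_idx in splits:
    # last training index, then the test window
    print(train_idx[-1], test_idx)
# 3 [4, 5]
# 5 [6, 7]
# 7 [8, 9]
```

A random shuffle-and-split would leak future values into training; here every test index is strictly later than every training index in its fold.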
Key Concepts at a Glance
The core problem: sequences have temporal dependencies - the model needs memory.
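"Memory" in a recurrent network is just a hidden state that is fed back in at every step. The sketch below is a minimal vanilla (Elman) RNN forward pass in NumPy, assuming a `tanh` activation and randomly initialised weights; `rnn_forward` is a hypothetical helper, not a library API. Running the same inputs in reverse order gives a different final state, which is exactly the order sensitivity a feedforward network lacks.

```python
import numpy as np


def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return the hidden state at every step.

    xs:   (T, input_dim), one row per time step
    W_xh: (hidden_dim, input_dim) input-to-hidden weights
    W_hh: (hidden_dim, hidden_dim) recurrent hidden-to-hidden weights
    b_h:  (hidden_dim,) bias
    """
    h = np.zeros(W_hh.shape[0])  # memory starts empty
    states = []
    for x_t in xs:
        # New state = f(current input, previous state): this recurrence is the memory.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 3, 4
xs = rng.normal(size=(T, input_dim))
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

states = rnn_forward(xs, W_xh, W_hh, b_h)
reversed_states = rnn_forward(xs[::-1], W_xh, W_hh, b_h)
print(states.shape)  # (5, 4)
print(np.allclose(states[-1], reversed_states[-1]))  # False: order changes the final state
```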
Three architectural answers:
- RNNs/LSTMs - recurrent connections carry state forward through time
- TCNs - dilated causal convolutions capture long-range dependencies without recurrence
- Transformers - attention over all positions simultaneously (covered in Module 9)
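The TCN answer relies on two ideas: a causal convolution only looks at the current and past steps, and dilation spaces the kernel taps so stacked layers cover a long history cheaply. The NumPy sketch below assumes a single channel, zero left-padding, kernel size 2 and WaveNet-style doubling dilations (1, 2, 4, 8); `causal_dilated_conv1d` and `receptive_field` are hypothetical helpers. For kernel size k and dilations d_i, the receptive field of the stack is 1 + (k - 1) * sum(d_i), and pushing a unit impulse through the stack confirms it.

```python
import numpy as np


def causal_dilated_conv1d(x, kernel, dilation):
    """Single-channel causal convolution with dilation.

    Output at time t uses only x[t], x[t - d], x[t - 2d], ... (never future values).
    The input is left-padded with zeros so the output keeps the input's length.
    kernel[-1] weights the current step; kernel[0] weights the oldest tap.
    """
    k = len(kernel)
    pad = (k - 1) * dilation
    x_padded = np.concatenate([np.zeros(pad), x])
    offsets = dilation * np.arange(k - 1, -1, -1)  # [(k-1)d, ..., d, 0]
    out = np.empty(len(x))
    for t in range(len(x)):
        taps = x_padded[t + pad - offsets]  # original positions t-(k-1)d ... t
        out[t] = np.dot(kernel, taps)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input steps visible to one output of a stacked causal TCN."""
    return 1 + (kernel_size - 1) * sum(dilations)


x = np.arange(8, dtype=float)
y = causal_dilated_conv1d(x, kernel=np.array([1.0, 1.0]), dilation=2)
print(y.tolist())  # [0.0, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0], i.e. x[t] + x[t-2]

# Push a unit impulse through four layers: it reaches exactly 16 later outputs.
dilations = [1, 2, 4, 8]
signal = np.zeros(20)
signal[0] = 1.0
for d in dilations:
    signal = causal_dilated_conv1d(signal, np.array([1.0, 1.0]), d)
print(int(np.count_nonzero(signal)), receptive_field(2, dilations))  # 16 16
```

Four layers with kernel size 2 already see 16 steps; each extra doubling layer doubles that, while every output position can still be computed in parallel.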
When each wins in production:
| Scenario | Best Choice |
|---|---|
| Streaming / low latency | TCN or LSTM |
| Long sequences, parallelism | TCN |
| Variable-length translation | Seq2Seq + attention |
| Demand forecasting | Temporal Fusion Transformer |
| Real-time anomaly detection | LSTM autoencoder |
| Edge device deployment | GRU (fewer parameters than LSTM) |
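The edge-deployment row rests on simple arithmetic: an LSTM layer has four gate blocks (input, forget, output, cell candidate) and a GRU layer has three (reset, update, candidate), each with input weights, recurrent weights and a bias. The sketch below assumes one bias vector per gate block; some frameworks (PyTorch, for example) keep two bias vectors per gate, so their reported counts come out slightly higher. The helper names are hypothetical.

```python
def lstm_params(input_dim, hidden_dim):
    # 4 gate blocks, each: input weights + recurrent weights + one bias vector
    return 4 * (hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim)


def gru_params(input_dim, hidden_dim):
    # 3 gate blocks, same structure per block
    return 3 * (hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim)


print(lstm_params(32, 64), gru_params(32, 64))  # 24832 18624
print(gru_params(32, 64) / lstm_params(32, 64))  # 0.75
```

Under this counting a GRU layer always has 3/4 of the parameters of an LSTM layer of the same size, which translates directly into less memory and fewer multiply-adds per step on a constrained device.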
© 2026 EngineersOfAI. All rights reserved.
