Module 02: Pretraining and Fine-tuning
What This Module Covers
Most LLMs in wide use today, including GPT-4, Claude, Gemini, and LLaMA, went through broadly the same pipeline: pretrain on massive text corpora, fine-tune to follow instructions, then align with human preferences. This module covers each stage of that pipeline in depth.
You will learn why different training objectives produce different model capabilities, how the industry moved from full fine-tuning to parameter-efficient methods like LoRA, why RLHF mattered so much for instruction-following models, and why DPO is increasingly used in its place. By the end of this module, you will be able to make informed decisions about which training approach fits your use case.
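The Full Pipeline
Pretraining (lessons 01-04) -> supervised fine-tuning, instruction tuning, and PEFT (lessons 05-09) -> alignment with RLHF, DPO, and related methods (lessons 10-12)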
Lessons in This Module
| # | Lesson | What You Will Learn |
|---|---|---|
| 01 | Language Modeling Objectives | MLM vs CLM, cross-entropy loss, perplexity, why objective choice matters |
| 02 | Masked Language Modeling (BERT) | The 15% masking trick, BERT architecture, NSP debate, RoBERTa improvements |
| 03 | Causal Language Modeling (GPT) | Autoregressive training, GPT evolution, sampling strategies, in-context learning |
| 04 | Pretraining at Scale | Multi-node training, ZeRO optimizer, Flash Attention, training data, costs |
| 05 | Supervised Fine-Tuning | Fine-tuning on labeled data, catastrophic forgetting, hyperparameters, evaluation |
| 06 | Instruction Tuning | Teaching models to follow instructions, FLAN, chain-of-thought, open datasets |
| 07 | LoRA | Low-rank weight updates, rank selection, alpha scaling, merging, PEFT |
| 08 | QLoRA | 4-bit quantization + LoRA, NF4, double quantization, fine-tuning a 65B model on a single 48 GB GPU |
| 09 | Full Fine-Tuning vs PEFT | Decision framework, memory comparison, quality tradeoffs, practical guide |
| 10 | RLHF | Reward model training, PPO, KL penalty, reward hacking, InstructGPT results |
| 11 | DPO | Direct preference optimization, the math behind DPO, vs RLHF, TRL training |
| 12 | Modern Alignment Techniques | RLAIF, Constitutional AI, iterative DPO, process reward models, open questions |
Prerequisites
Before starting this module, you should have completed:
- Module 01: Transformer Architecture - attention mechanism, positional encoding, feed-forward layers
- Basic PyTorch - tensors, autograd, training loops
- Familiarity with tokenization - BPE, WordPiece, SentencePiece
Key Concepts You Will Master
Training Objectives
- Causal Language Modeling (CLM) - predict the next token
- Masked Language Modeling (MLM) - predict masked tokens using bidirectional context
- Cross-entropy loss and perplexity
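The three ideas above can be sketched in plain Python. This is only an illustration: the function names, the toy sentence, and the probabilities are hypothetical, and the masking shown here is simplified (BERT's actual recipe also sometimes keeps or randomly replaces the selected tokens).

```python
import math


def clm_examples(tokens):
    """Causal LM: every prefix is a context, the next token is the target."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def mlm_example(tokens, masked_positions, mask_token="[MASK]"):
    """Masked LM: hide chosen positions; targets exist only at those positions."""
    corrupted = [mask_token if i in masked_positions else t
                 for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in masked_positions}
    return corrupted, targets


def cross_entropy(correct_token_probs):
    """Mean negative log-likelihood, in nats, of the correct tokens."""
    return -sum(math.log(p) for p in correct_token_probs) / len(correct_token_probs)


def perplexity(correct_token_probs):
    """Perplexity is the exponential of the mean cross-entropy."""
    return math.exp(cross_entropy(correct_token_probs))


tokens = ["the", "cat", "sat", "down"]            # toy sentence (assumption)
clm = clm_examples(tokens)                        # 3 (context, target) pairs
corrupted, targets = mlm_example(tokens, {1})     # mask position 1 only

# Hypothetical probabilities a model assigned to the correct token at each step
probs = [0.5, 0.25, 0.125]
loss = cross_entropy(probs)
ppl = perplexity(probs)
```

With these made-up probabilities the perplexity works out to 4, which can be read as the model being about as uncertain as a uniform choice among four tokens. A model that assigns probability 1/V to every correct token has perplexity V.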
Pretraining Infrastructure
- Tensor, pipeline, and data parallelism
- ZeRO sharding of optimizer states, gradients, and parameters (DeepSpeed)
- Mixed precision training (BF16/FP16)
- Flash Attention
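To get a feel for why sharding matters, here is a rough back-of-the-envelope estimate of per-GPU model-state memory under mixed-precision Adam, following the accounting used in the ZeRO paper: 2 bytes per parameter for half-precision weights, 2 for gradients, and 12 for FP32 optimizer states. The function name is hypothetical, and the estimate ignores activations, buffers, and fragmentation, so treat it as a sketch rather than a sizing tool.

```python
def zero_model_state_gb(num_params, num_gpus, stage):
    """Approximate per-GPU memory (GB) for model states at a given ZeRO stage.

    stage 0: plain data parallelism (everything replicated)
    stage 1: optimizer states sharded across GPUs
    stage 2: optimizer states and gradients sharded
    stage 3: optimizer states, gradients, and parameters sharded
    """
    params_bytes = 2 * num_params       # FP16/BF16 weights
    grads_bytes = 2 * num_params        # FP16/BF16 gradients
    optim_bytes = 12 * num_params       # FP32 master weights + Adam moments

    if stage >= 1:
        optim_bytes /= num_gpus
    if stage >= 2:
        grads_bytes /= num_gpus
    if stage >= 3:
        params_bytes /= num_gpus

    return (params_bytes + grads_bytes + optim_bytes) / 1e9


# Example setting: a 7.5B-parameter model on 64 GPUs
for stage in range(4):
    print(f"ZeRO stage {stage}: {zero_model_state_gb(7.5e9, 64, stage):.1f} GB per GPU")
```

In this setting the replicated baseline needs about 120 GB per GPU for model states alone, while full sharding brings that under 2 GB, which is the basic argument for ZeRO at scale.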
Fine-tuning Methods
- Full fine-tuning - update all parameters
- LoRA - low-rank weight updates (Hu et al., 2021)
- QLoRA - 4-bit base model + LoRA (Dettmers et al., 2023)
- Prompt tuning and prefix tuning
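As a preview of the LoRA idea, the sketch below keeps a frozen weight matrix W and adds a trainable low-rank update scaled by alpha / r, so the layer computes W x + (alpha / r) B A x. It uses plain Python lists to stay dependency-free; the helper names and the toy matrices are hypothetical, and real training would go through a library such as PEFT rather than code like this.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x).

    W: frozen weights, shape d_out x d_in
    A: trainable, shape r x d_in
    B: trainable, shape d_out x r (initialised to zeros in LoRA)
    """
    r = len(A)                              # rank = number of rows of A
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))         # low-rank path, never forms B @ A
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def lora_param_counts(d_out, d_in, r):
    """Trainable parameters: full fine-tuning vs. LoRA for one weight matrix."""
    return d_out * d_in, r * (d_in + d_out)


# Toy example: 2x2 identity weights, rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
x = [1.0, 2.0]

B_init = [[0.0], [0.0]]                     # zero init: adapter is a no-op at start
y_init = lora_forward(W, A, B_init, x, alpha=2.0)

B_trained = [[1.0], [2.0]]                  # made-up values standing in for training
y_trained = lora_forward(W, A, B_trained, x, alpha=2.0)

full, lora = lora_param_counts(4096, 4096, r=8)
```

For a single 4096 x 4096 matrix at rank 8, the adapter trains 65,536 parameters instead of 16,777,216, a 256x reduction for that matrix. Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly.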
Alignment
- RLHF - Reinforcement Learning from Human Feedback
- DPO - Direct Preference Optimization (Rafailov et al., 2023)
- Constitutional AI and RLAIF
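The core of DPO fits in a few lines. The sketch below computes the per-example DPO loss from Rafailov et al. (2023) given summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model. The function name, argument names, and the numbers in the example are hypothetical; the beta value of 0.1 is just a commonly seen default, not a recommendation.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the total log-probability of a full response.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) == softplus(-z), written in a numerically stable form
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: no preference learned yet
loss_start = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# Policy has shifted probability toward the chosen response (made-up numbers)
loss_better = dpo_loss(-10.0, -18.0, -12.0, -15.0)

# Policy has shifted toward the rejected response instead
loss_worse = dpo_loss(-14.0, -13.0, -12.0, -15.0)
```

When the policy matches the reference, the loss equals log 2; it falls as the policy raises the chosen response relative to the rejected one, and rises in the opposite case. Note that no reward model or sampling loop appears here, which is the practical difference from PPO-based RLHF.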
How to Use This Module
The lessons build on each other. Start with lesson 01 and work through sequentially. Each lesson includes working code examples you can run, production notes from real deployments, and interview Q&A calibrated to ML engineering interviews at top companies.
The module assumes you are an engineer who wants to understand not just what these techniques are, but why they work, when to use them, and how to debug them in production.
