Module 02: Pretraining and Fine-tuning
How large language models learn from raw text, get aligned with human intent, and become the assistants we use today.

02. Language Modeling Objectives
Learn the training objectives that teach LLMs to understand language: causal language modeling, masked language modeling, cross-entropy loss, and perplexity.

03. Masked Language Modeling and BERT
Understand how BERT learns bidirectional language representations using masked language modeling, its architecture, and how to fine-tune it for downstream tasks.

04. Causal Language Modeling and GPT
Learn how GPT-style autoregressive models work, the evolution from GPT-1 to GPT-4, sampling strategies, and why causal LM became the dominant paradigm for LLMs.

05. Pretraining at Scale
The infrastructure, parallelism strategies, memory optimizations, and training data choices required to pretrain large language models on thousands of GPUs.

06. Supervised Fine-Tuning
Learn how to adapt pretrained LLMs to specific tasks through supervised fine-tuning: data preparation, hyperparameters, catastrophic forgetting, and evaluation.

07. Instruction Tuning
How instruction tuning transforms base LLMs into general-purpose assistants that can follow diverse instructions, reason step by step, and generalize to new tasks.

08. LoRA: Low-Rank Adaptation
Master LoRA, the parameter-efficient fine-tuning method that adds only 0.3% of parameters to GPT-3 while matching full fine-tuning quality, making LLM fine-tuning feasible on a single GPU.

09. QLoRA: Quantized Low-Rank Adaptation
Learn how QLoRA combines 4-bit quantization with LoRA to fine-tune 65B-parameter models on a single 48GB GPU, using NF4 quantization, double quantization, and paged optimizers.

10. Full Fine-Tuning vs PEFT: Decision Framework
A practical decision framework for choosing between full fine-tuning, LoRA, QLoRA, prompt tuning, and other PEFT methods based on your model size, data, and quality requirements.

11. RLHF: Reinforcement Learning from Human Feedback
Understand how RLHF aligns LLMs with human preferences through three phases (SFT, reward model training, and PPO), and why it produced InstructGPT's surprising result that smaller aligned models beat larger unaligned ones.

12. DPO: Direct Preference Optimization
Master DPO, the elegant insight that you can optimize LLMs for human preferences without training a reward model or running RL, derived directly from the optimal RLHF policy.

13. Modern Alignment Techniques
Survey the post-RLHF alignment landscape: RLAIF, Constitutional AI, rejection sampling fine-tuning, iterative DPO, process reward models, and the open questions shaping the next generation of aligned models.