Causal Language Modeling and GPT
Learn how GPT-style autoregressive models work, the evolution from GPT-1 to GPT-4, sampling strategies, and why causal LM became the dominant paradigm for LLMs.
Learn how GPT-style autoregressive models work, the evolution from GPT-1 to GPT-4, sampling strategies, and why causal LM became the dominant paradigm for LLMs.
Learn the training objectives that teach LLMs to understand language - causal language modeling, masked language modeling, cross-entropy loss, and perplexity.
Understand how BERT learns bidirectional language representations using masked language modeling, its architecture, and how to fine-tune it for downstream tasks.
The infrastructure, parallelism strategies, memory optimizations, and training data choices required to pretrain large language models on thousands of GPUs.