4 docs tagged with "pretraining"

Causal Language Modeling and GPT

Learn how GPT-style autoregressive models work, the evolution from GPT-1 to GPT-4, sampling strategies, and why causal LM became the dominant paradigm for LLMs.

Language Modeling Objectives

Learn the training objectives that teach LLMs to understand language (causal and masked language modeling), along with cross-entropy loss and perplexity.

Masked Language Modeling and BERT

Understand how BERT learns bidirectional language representations using masked language modeling, its architecture, and how to fine-tune it for downstream tasks.

Pretraining at Scale

Explore the infrastructure, parallelism strategies, memory optimizations, and training data choices required to pretrain large language models on thousands of GPUs.