Skip to main content

23 docs tagged with "fine-tuning"

View all tags

Advanced PEFT Methods

Beyond LoRA - Prefix Tuning, Prompt Tuning, IA3, AdaLoRA, VeRA, and LoftQ. When to reach for each method, how they compare on parameter count and quality, and practical implementation with the PEFT library.

Axolotl and TRL Training Frameworks

Using Axolotl and HuggingFace TRL for LoRA and QLoRA fine-tuning - configuration files, SFTTrainer, DPO training, and distributed multi-GPU fine-tuning setups.

Continual Learning and Domain Adaptation

Learn how to adapt open-source language models to specialized domains through continual pre-training, manage catastrophic forgetting with EWC and data mixing, and evaluate domain knowledge gain versus general capability loss.

Evaluating Fine-Tuned Models

Evaluation strategies for fine-tuned LLMs - held-out test sets, LLM-as-judge evaluation, perplexity measurement, task-specific benchmarks, and avoiding evaluation pitfalls.

Fine-Tuning Cost and ROI Analysis

Making the business case for LLM fine-tuning - calculating GPU compute costs, estimating break-even against API pricing, and deciding when fine-tuning beats prompt engineering on ROI.

Fine-Tuning Hyperparameter Search

Systematic hyperparameter optimization for LLM fine-tuning - learning rate, batch size, epochs, LoRA rank, warmup schedules, and efficient search strategies with Optuna and WandB sweeps.

Full Fine-Tuning vs PEFT

Decision framework for choosing between full fine-tuning and parameter-efficient methods like LoRA and QLoRA - covering compute requirements, quality ceilings, catastrophic forgetting, and when each approach wins.

Full Fine-Tuning vs PEFT: Decision Framework

A practical decision framework for choosing between full fine-tuning, LoRA, QLoRA, prompt tuning, and other PEFT methods based on your model size, data, and quality requirements.

Instruction Tuning

How instruction tuning transforms base LLMs into general-purpose assistants that can follow diverse instructions, reason step by step, and generalize to new tasks.

Instruction Tuning at Scale

How to instruction-tune open-source models at production scale - covering the FLAN insight, dataset construction principles, scaling laws for instruction data, multi-node training setup, and a complete pipeline for fine-tuning Llama 3 8B on a 2-node A100 cluster.

Legal LLM Fine-Tuning

Domain adaptation of LLMs for legal tasks - LegalBench evaluation, instruction tuning on legal data, and building legal AI models that outperform general-purpose LLMs on specific tasks.

LoRA Mathematics and Implementation

Learn how LoRA (Low-Rank Adaptation) decomposes weight updates into low-rank matrices, why this works mathematically, and how to implement it from scratch in PyTorch and with HuggingFace PEFT.

LoRA: Low-Rank Adaptation

Master LoRA - the parameter-efficient fine-tuning method that adds only 0.3% of parameters to GPT-3 while matching full fine-tuning quality, making LLM fine-tuning feasible on a single GPU.

Merging and Model Soup Techniques

Combining multiple fine-tuned models without retraining - LoRA adapter merging, SLERP, TIES-merging, DARE, and MergeKit for production model merging that unlocks capabilities no single training run achieves.

Monitoring and Debugging Fine-Tuning

How to monitor LLM fine-tuning runs and debug failures - tracking loss curves, gradient norms, GPU utilization, MFU, and diagnosing NaN loss, overfitting, and OOM errors in LoRA and full fine-tuning.

QLoRA: 4-Bit Fine-Tuning

Learn how QLoRA combines 4-bit NF4 quantization, double quantization, and paged optimizers to fine-tune 65B parameter models on a single GPU - covering the math, implementation, and production engineering.

QLoRA: Quantized Low-Rank Adaptation

Learn how QLoRA combines 4-bit quantization with LoRA to fine-tune 65B parameter models on a single consumer GPU, using NF4 quantization, double quantization, and paged optimizers.

Research Roadmap: RLHF & Alignment

From InstructGPT to DPO to ORPO. Read the 7 most important alignment papers in order — understanding how LLMs are made to follow human intent.

RLHF and DPO for Open Models

Learn how to align open-source language models with human preferences using RLHF and the simpler, more stable Direct Preference Optimization (DPO) approach with TRL.

Selecting Target Modules and Rank

Which layers to apply LoRA to and what rank to use - two of the most impactful fine-tuning decisions. Covers attention vs FFN targeting, rank selection from r=4 to r=64, RSLoRA, DoRA, LoRA+, and ablation strategies.

Supervised Fine-Tuning

Learn how to adapt pretrained LLMs to specific tasks through supervised fine-tuning - data preparation, hyperparameters, catastrophic forgetting, and evaluation.

Synthetic Data and Self-Improvement

Generating high-quality synthetic training data with LLMs using Evol-Instruct, Self-Instruct, Constitutional AI, rejection sampling, and self-play techniques to build data flywheels without expensive human annotation.

Training Data Preparation for Fine-Tuning

Building high-quality data pipelines for LoRA fine-tuning - chat templates, instruction masking, deduplication, quality filtering, synthetic data generation, and dataset formats that actually produce good models.