8 docs tagged with "model-merging"

DARE - Delta Weight Sparsification

How DARE (Drop And REscale) randomly drops delta weights and rescales the remainder to sharply reduce interference when merging multiple fine-tuned models.
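The drop-and-rescale step can be sketched in a few lines. This is a minimal sketch, assuming each model has been flattened into a Python list of floats with identical parameter order; real implementations would apply it tensor by tensor. The function name `dare` and its `drop_rate`/`seed` parameters are illustrative, not any library's API. In a multi-model merge, each model's delta would be sparsified this way before the deltas are combined (for example by task arithmetic or TIES).

```python
import random


def dare(base, finetuned, drop_rate, seed=0):
    """Drop And REscale one fine-tuned model's delta against the base.

    Hypothetical helper. base, finetuned: flat lists of floats in the same
    parameter order. drop_rate: probability p of zeroing each delta, 0 <= p < 1.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)          # seeded so the random mask is reproducible
    scale = 1.0 / (1.0 - drop_rate)    # rescale survivors so the expected delta is unchanged
    merged = []
    for b, f in zip(base, finetuned):
        delta = f - b                  # the change introduced by fine-tuning
        if rng.random() < drop_rate:
            delta = 0.0                # dropped: this parameter falls back to the base weight
        else:
            delta *= scale             # kept: amplified by 1 / (1 - p)
        merged.append(b + delta)
    return merged


# With p = 0.9, roughly 90% of the deltas are zeroed and the rest are scaled by about 10.
sparse = dare([0.0] * 10, [0.1] * 10, drop_rate=0.9)
```

The rescaling by 1 / (1 - p) is what lets DARE discard most of a delta without shifting its expected value, which is why the sparsified delta can still carry the fine-tuned behavior.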

Frankenmodels and Limitations of Model Merging

Layer grafting, depth upscaling, SOLAR 10.7B, and the fundamental limits of what model merging can and cannot achieve.
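Depth upscaling can be pictured as a plan of which source layer fills each slot of the new, deeper network. This is a minimal sketch, assuming the SOLAR 10.7B recipe from its paper: two copies of a 32-layer base model, with the last 8 layers removed from the first copy and the first 8 removed from the second, giving 48 layers. The function name is hypothetical, and it only computes the layer ordering; it does not copy any weights.

```python
def depth_upscale_plan(n_layers, m):
    """Return the source-layer index for each layer of a depth-upscaled stack.

    Hypothetical helper. Two copies of an n_layers model are stacked:
    the first copy loses its last m layers, the second loses its first m.
    """
    if not 0 <= m < n_layers:
        raise ValueError("m must be in [0, n_layers)")
    first_copy = list(range(0, n_layers - m))   # layers 0 .. n_layers - m - 1
    second_copy = list(range(m, n_layers))      # layers m .. n_layers - 1
    return first_copy + second_copy


plan = depth_upscale_plan(32, 8)
print(len(plan))      # 48
print(plan[22:26])    # [22, 23, 8, 9]: the seam where the second copy begins
```

At the seam, layer 23 of the first copy feeds directly into layer 8 of the second copy, a pairing the original model never saw. In the SOLAR paper the upscaled model is then continually pretrained, so this arrangement is a starting point rather than a finished model.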

Linear Interpolation and Model Soup

How averaging the weights of fine-tuned models can produce a model that is more accurate and more robust than any individual fine-tune, and the task arithmetic framework for composing capabilities.
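The three operations named here differ in how they combine weights. This is a minimal sketch, assuming every model is a flat list of floats with identical parameter order and that all fine-tunes start from the same base; the function names are illustrative. `uniform_soup` is the plain elementwise mean (the uniform variant of a model soup), and `task_arithmetic` adds each task vector (fine-tuned minus base) back onto the base, scaled by `scale`.

```python
def lerp(theta_a, theta_b, t):
    """Linear interpolation: (1 - t) * theta_a + t * theta_b, elementwise."""
    return [(1.0 - t) * a + t * b for a, b in zip(theta_a, theta_b)]


def uniform_soup(models):
    """Uniform model soup: the elementwise mean of several fine-tuned models."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def task_arithmetic(base, finetuned_models, scale=1.0):
    """Add scaled task vectors (finetuned - base) back onto the base weights."""
    merged = list(base)
    for ft in finetuned_models:
        for i, (b, f) in enumerate(zip(base, ft)):
            merged[i] += scale * (f - b)   # task-vector entry for parameter i
    return merged


base = [0.0, 0.0]
math_ft = [1.0, 0.0]   # hypothetical fine-tune that only moved parameter 0
code_ft = [0.0, 1.0]   # hypothetical fine-tune that only moved parameter 1
print(uniform_soup([math_ft, code_ft]))            # [0.5, 0.5]
print(task_arithmetic(base, [math_ft, code_ft]))   # [1.0, 1.0]
```

The toy example shows the design trade-off: averaging n models divides each task vector by n, while task arithmetic with `scale=1.0` keeps every task vector at full strength and leaves the scaling as an explicit knob.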

MergeKit - The Practical Toolkit

How to use arcee-ai/mergekit to merge language models with YAML configuration files, CPU-compatible layer-by-layer processing, and automated upload to the Hugging Face Hub.

Module 14 Overview - Model Merging

How to combine multiple fine-tuned language models into a single, more capable model without any additional training.

SLERP - Spherical Linear Interpolation

How spherical linear interpolation provides smoother, more geometrically faithful blending between two sets of model weights than simple linear averaging.
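The SLERP formula fits in one short function. This is a minimal sketch, assuming one weight tensor has been flattened into a list of floats; in practice it would be applied tensor by tensor. The fallback to linear interpolation for nearly parallel or zero vectors, and the `dot_threshold=0.9995` cutoff, are common implementation choices assumed here rather than values taken from this documentation.

```python
import math


def slerp(v0, v1, t, dot_threshold=0.9995):
    """Spherical linear interpolation between two flat weight vectors.

    Moves along the arc between the directions of v0 and v1. Falls back to
    linear interpolation when either vector is zero or the two are nearly
    (anti)parallel, where the arc formula is numerically unstable.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    lerp = [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
    if norm0 == 0.0 or norm1 == 0.0:
        return lerp
    # Cosine of the angle between the two weight directions, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    if abs(dot) > dot_threshold:
        return lerp
    omega = math.acos(dot)                       # angle between v0 and v1
    sin_omega = math.sin(omega)
    s0 = math.sin((1.0 - t) * omega) / sin_omega
    s1 = math.sin(t * omega) / sin_omega
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


a, b = [1.0, 0.0], [0.0, 1.0]
mid_slerp = slerp(a, b, 0.5)                        # about [0.7071, 0.7071], norm about 1.0
mid_lerp = [(x + y) / 2 for x, y in zip(a, b)]      # [0.5, 0.5], norm about 0.707
```

For two orthogonal unit vectors, the SLERP midpoint stays on the unit circle, while the linear average shrinks to a norm of about 0.707. That loss of magnitude under linear averaging is the geometric motivation for interpolating along the arc instead.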

TIES Merging - Resolving Sign Conflicts

How TIES-Merging reduces task interference by trimming small deltas, electing a sign for each parameter by magnitude-weighted vote, and merging only the values whose signs agree with the elected sign.
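The three TIES steps (trim, elect, disjoint merge) can be sketched directly. This is a minimal sketch, assuming flat lists of floats per model and a shared base; the helper names and the `density` and `scale` parameters are illustrative, with `density=0.2` meaning the top 20% of each task vector by magnitude is kept. The elected sign is the sign of the summed trimmed deltas, so larger deltas carry more weight in the vote.

```python
def trim(delta, density):
    """Keep the top `density` fraction of entries by magnitude; zero the rest."""
    if not delta:
        return []
    k = min(len(delta), max(1, int(density * len(delta))))
    threshold = sorted((abs(d) for d in delta), reverse=True)[k - 1]
    # Entries tied at the threshold are all kept, so slightly more than k may survive.
    return [d if abs(d) >= threshold else 0.0 for d in delta]


def ties_merge(base, finetuned_models, density=0.2, scale=1.0):
    """Trim, elect sign, disjoint merge, then add the result back onto the base."""
    # 1. Trim: each task vector keeps only its largest-magnitude deltas.
    deltas = [trim([f - b for b, f in zip(base, ft)], density)
              for ft in finetuned_models]
    merged = []
    for i, b in enumerate(base):
        column = [d[i] for d in deltas]
        # 2. Elect: the sign with the larger total mass wins.
        total = sum(column)
        if total == 0.0:
            merged.append(b)            # no net direction: keep the base weight
            continue
        elected = 1.0 if total > 0 else -1.0
        # 3. Disjoint merge: average only the nonzero deltas that agree with it.
        agreeing = [x for x in column if x * elected > 0]
        merged.append(b + scale * sum(agreeing) / len(agreeing))
    return merged


base = [0.0, 0.0, 0.0]
ft_a = [1.0, -2.0, 0.5]
ft_b = [3.0, 1.0, -0.5]
print(ties_merge(base, [ft_a, ft_b], density=1.0))   # [2.0, -2.0, 0.0]
```

A plain average of the same two models gives [2.0, -0.5, 0.0]: on the second parameter, the opposing deltas partly cancel. TIES instead elects the negative sign and averages only the agreeing delta, so the dominant direction survives at full strength.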

Why Model Merging Exists

The catastrophic forgetting problem, why naive ensembles are too expensive, and the surprising geometric insight that makes model merging possible.