DARE - Delta Weight Sparsification
How DARE randomly drops delta weights and rescales the remainder to dramatically reduce interference when merging multiple fine-tuned models.
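As a rough illustration of the drop-and-rescale step described above, here is a minimal NumPy sketch. The function names `dare_delta` and `dare_merge`, the default `drop_rate`, and the choice to sum the sparsified deltas onto the base weights are assumptions made for this example, not something taken from the lesson or from any particular library. In practice DARE is usually paired with a separate merging rule.

```python
import numpy as np

def dare_delta(base, finetuned, drop_rate, rng):
    """Sparsify one model's delta: drop entries at random, rescale the rest.

    Each delta entry is zeroed with probability `drop_rate`; survivors are
    divided by (1 - drop_rate) so the expected delta is unchanged.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    delta = finetuned - base                           # what fine-tuning changed
    keep_mask = rng.random(delta.shape) >= drop_rate   # True = keep this entry
    return delta * keep_mask / (1.0 - drop_rate)

def dare_merge(base, finetuned_models, drop_rate=0.9, seed=0):
    """Hypothetical helper: add the sparsified deltas of several models to base.

    Plain summation is an assumption chosen for brevity.
    """
    rng = np.random.default_rng(seed)
    merged_delta = sum(dare_delta(base, ft, drop_rate, rng) for ft in finetuned_models)
    return base + merged_delta

if __name__ == "__main__":
    base = np.zeros(8)
    model_a = base + 0.1   # toy "fine-tuned" weights
    model_b = base - 0.2
    print(dare_merge(base, [model_a, model_b], drop_rate=0.5))
```

Because each model keeps only a random subset of its delta entries, two sparsified deltas are less likely to overlap at the same position, which is the intuition behind the reduced interference.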
Layer grafting, depth upscaling, Solar 10.7B, and the fundamental limits of what model merging can and cannot achieve.
How weight averaging of fine-tuned models produces better, more robust models than any individual fine-tune - and the task arithmetic framework for composing capabilities.
How to use arcee-ai/mergekit to merge language models with YAML configuration, CPU-compatible layer-by-layer processing, and automated HuggingFace Hub upload.
How to combine multiple fine-tuned language models into a single, more capable model without any additional training.
How spherical linear interpolation provides smoother, geometrically correct blending between two model weight configurations than simple linear averaging.
How TIES-Merging eliminates task interference by trimming small deltas, electing signs by majority vote, and merging only aligned parameters.
The catastrophic forgetting problem, why naive ensembles are too expensive, and the surprising geometric insight that makes model merging possible.