On the Relationship Between Activation Outliers and Feature Death in Sparse Autoencoders

:::info Stub: Full Engineering Breakdown Coming
This paper was auto-fetched from arXiv on 2026-06-01. A full breakdown with production viability rating, implementation notes, and honest limitations is being written.
:::


| | |
|---|---|
| Authors | Elana Simon et al. |
| Year | 2026 |
| Field | Machine Learning |
| arXiv | 2605.31518 |
| Categories | cs.LG |

Abstract

Sparse autoencoders (SAEs) decompose neural network activations into interpretable features, but many learned features never activate, a problem called feature death that wastes dictionary capacity and can reintroduce superposition. Death rates vary dramatically between models: near-zero on GPT-2, over 70% on AlphaFold3 with identical configurations. We find that dimension-level activation outliers (dimensions whose mean magnitude is large relative to per-token variation) cause this by shifting pre-activations at initialization based on each feature's alignment with the activation mean. Features anti-aligned with the mean receive permanently negative pre-activations and never fire. We formalize outlier severity as $\gamma = \|\mu\| / \|\sigma\|$; it predicts initial death rates (Spearman $\rho = 0.89$ for dead-by-TopK, $0.82$ for dead-by-ReLU) across 454 model-layer combinations spanning language, vision, protein, and genomic models. Dead features can revive during training, but recovery requires the SAE bias to learn the activation mean, a process that is prohibitively slow at high $\gamma$. Mean-centering (subtracting the activation mean) sidesteps this and eliminates outlier-induced death across all tested models, confirming the mechanism and providing a principled basis for when and why this preprocessing step is necessary.

Engineering Breakdown

The Problem

Sparse autoencoders (SAEs) decompose neural network activations into interpretable features, but many learned features never activate, a problem called feature death that wastes dictionary capacity and can reintroduce superposition. Dead features can revive during training, but recovery requires the SAE bias to learn the activation mean, a process that is prohibitively slow at high $\gamma$, the outlier severity defined in the abstract as $\gamma = \|\mu\| / \|\sigma\|$.
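
One way to see why the bias matters is to split each activation into its mean and a per-token residual. The notation below is an assumption for illustration and does not come from the abstract: $x = \mu + \varepsilon$ is an activation, $w_i$ is the encoder direction of feature $i$, and $b$ is a bias subtracted from the input before encoding.

$$
z_i = w_i^\top (x - b) = \underbrace{w_i^\top (\mu - b)}_{\text{constant shift}} + \underbrace{w_i^\top \varepsilon}_{\text{per-token variation}}
$$

With $b = 0$ at initialization and $w_i$ anti-aligned with $\mu$, the constant shift is negative and, at high $\gamma$, larger than the per-token term can overcome, so $z_i$ stays below zero on every token. Under this sketch the shift disappears only once $b \approx \mu$, which is what mean-centering supplies up front.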

The Approach

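The starting observation is that death rates vary dramatically between models: near-zero on GPT-2, over 70% on AlphaFold3 with identical configurations. The paper traces this gap to dimension-level activation outliers (dimensions whose mean magnitude is large relative to per-token variation), which shift pre-activations at initialization according to each feature's alignment with the activation mean. Features anti-aligned with the mean receive permanently negative pre-activations and never fire. Outlier severity is formalized as $\gamma = \|\mu\| / \|\sigma\|$ and evaluated as a predictor of initial death rates on language, vision, protein, and genomic models.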
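
The outlier severity $\gamma = \|\mu\| / \|\sigma\|$ from the abstract can be estimated from a batch of activations. The sketch below is not the authors' code: it assumes $\mu$ and $\sigma$ are the per-dimension mean and standard deviation over tokens, that both norms are Euclidean, and it uses synthetic data. The function name `outlier_severity` is hypothetical.

```python
import numpy as np

def outlier_severity(acts: np.ndarray) -> float:
    """Estimate gamma = ||mu|| / ||sigma|| for a (tokens, dims) matrix.

    Assumption: mu and sigma are per-dimension statistics over tokens,
    and both norms are Euclidean.
    """
    mu = acts.mean(axis=0)      # per-dimension mean
    sigma = acts.std(axis=0)    # per-dimension standard deviation
    return float(np.linalg.norm(mu) / np.linalg.norm(sigma))

rng = np.random.default_rng(0)
n_tokens, d_model = 4096, 64

# Synthetic activations with no outlier dimensions: zero mean, unit variance.
clean = rng.standard_normal((n_tokens, d_model))

# Same data with a large constant offset on one dimension, i.e. a
# dimension whose mean is large relative to its per-token variation.
outlier = clean.copy()
outlier[:, 0] += 50.0

gamma_clean = outlier_severity(clean)
gamma_outlier = outlier_severity(outlier)
print(f"gamma without outlier dimension: {gamma_clean:.3f}")
print(f"gamma with outlier dimension:    {gamma_outlier:.3f}")
```

A single shifted dimension is enough to move $\gamma$ from near zero to well above one in this toy setup, because the offset dominates $\|\mu\|$ while leaving $\|\sigma\|$ unchanged.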

Key Results

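Outlier severity $\gamma$ predicts initial death rates (Spearman $\rho = 0.89$ for dead-by-TopK, $0.82$ for dead-by-ReLU) across 454 model-layer combinations. Mean-centering (subtracting the activation mean) sidesteps the slow bias-learning process and eliminates outlier-induced death across all tested models, confirming the mechanism and providing a principled basis for when and why this preprocessing step is necessary.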
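
The effect of mean-centering on features that are dead at initialization can be illustrated on synthetic data. This is a minimal sketch under stated assumptions, not the paper's experimental setup: the encoder uses random unit-norm rows with zero bias, a feature counts as dead when its ReLU pre-activation is non-positive on every token in the batch, and the helper `dead_fraction_relu` is hypothetical.

```python
import numpy as np

def dead_fraction_relu(acts: np.ndarray, W: np.ndarray, b: np.ndarray) -> float:
    """Fraction of features whose pre-activation is <= 0 on every token."""
    pre = acts @ W.T + b                     # shape: (tokens, features)
    return float(np.mean((pre <= 0).all(axis=0)))

rng = np.random.default_rng(0)
n_tokens, d_model, n_features = 2048, 64, 512

# Synthetic activations with one outlier dimension (large constant mean).
acts = rng.standard_normal((n_tokens, d_model))
acts[:, 0] += 50.0

# Assumed initialization: random unit-norm encoder rows, zero bias.
W = rng.standard_normal((n_features, d_model))
W /= np.linalg.norm(W, axis=1, keepdims=True)
b = np.zeros(n_features)

# Dead fraction on raw activations versus mean-centered activations.
dead_raw = dead_fraction_relu(acts, W, b)
centered = acts - acts.mean(axis=0, keepdims=True)
dead_centered = dead_fraction_relu(centered, W, b)

print(f"dead at init, raw activations:      {dead_raw:.2%}")
print(f"dead at init, centered activations: {dead_centered:.2%}")
```

In this toy setting the features that die on raw activations are the ones whose encoder row points against the outlier dimension's mean. After centering, every feature sees pre-activations of both signs, so none are dead by this criterion.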

Research Areas

This paper contributes to the following areas of AI/ML engineering:

Model training
Generalization
Optimization
Supervised learning
Deep learning
