LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training

:::info Stub: Full Engineering Breakdown Coming
This paper was featured on Hugging Face Daily Papers on 2026-05-28 with 24 upvotes. A full breakdown with production viability rating, implementation notes, and honest limitations is being written.
:::


Authors	Minju Gwak et al.
Year	2026
HF Upvotes	24
arXiv	2605.29888

Abstract

Reinforcement learning (RL) post-training has been shown to improve reasoning in large language models (LLMs). However, there has been little exploration of the problem of data contamination in RL post-training, which potentially undermines generalization and the evaluation reliability of the training process itself. Existing detection methods primarily rely on output-level signals such as likelihood or entropy, which become unreliable for RL-trained models since RL shapes behavior through trajectory-level rewards rather than token likelihoods. We propose LaRA, a layer-wise representation analysis framework for detecting contamination in RL post-trained LLMs. LaRA introduces three complementary metrics, measuring perturbation sensitivity, directional collapse, and local representation rigidity under controlled perturbations. We find that contamination produces progressive geometric deviations across layers, including amplified perturbation sensitivity, stronger directional collapse, and enhanced local rigidity. Based on our findings, we also develop a contamination detection protocol that aggregates representation-level deviations across layers and metrics. Experiments on RL-trained reasoning models show that our protocol outperforms existing output-level baselines for contamination detection.

Engineering Breakdown

The Problem

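Data contamination in RL post-training has received little attention, even though it can undermine both generalization and the reliability of evaluation. Existing detection methods rely mainly on output-level signals such as likelihood or entropy. These become unreliable for RL-trained models because RL shapes behavior through trajectory-level rewards rather than token likelihoods.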
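To make the output-level signals named in the abstract (likelihood and entropy) concrete, here is a minimal sketch of how such scores are typically computed from per-position logits. It is not taken from the paper: the function names and the toy inputs are hypothetical, and a real detector would read logits from the model under test.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one list of vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def output_level_signals(step_logits, target_ids):
    """Return (mean token log-likelihood, mean predictive entropy in nats).

    step_logits: one list of logits per sequence position (hypothetical input).
    target_ids:  the token id observed at each position.
    """
    total_ll, total_ent = 0.0, 0.0
    for logits, tok in zip(step_logits, target_ids):
        logp = log_softmax(logits)
        total_ll += logp[tok]                                 # log p(observed token)
        total_ent += -sum(math.exp(lp) * lp for lp in logp)   # Shannon entropy
    n = len(target_ids)
    return total_ll / n, total_ent / n

# Toy example: 2 positions, vocabulary of 3 tokens.
toy_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
mean_ll, mean_ent = output_level_signals(toy_logits, [0, 2])
```

Both numbers depend only on the token distribution the model emits. That is the property the abstract argues is unreliable after RL, since the reward is attached to whole trajectories and not to token likelihoods.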

The Approach

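The authors propose LaRA, a layer-wise representation analysis framework for detecting contamination in RL post-trained LLMs. LaRA introduces three complementary metrics, each computed under controlled perturbations: perturbation sensitivity, directional collapse, and local representation rigidity. A detection protocol then aggregates the representation-level deviations across layers and metrics.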
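The abstract names the quantities LaRA measures per layer (perturbation sensitivity, directional collapse, local rigidity) but this page does not give their definitions. The sketch below is therefore only an illustration of the general idea of comparing hidden states for a prompt and its perturbed variants, layer by layer. The two statistics, the function names, and the input layout are assumptions, not the paper's metrics. In practice the vectors could come from a model that exposes per-layer hidden states (for example, Hugging Face Transformers with `output_hidden_states=True`).

```python
import math

def _sub(a, b):
    return [x - y for x, y in zip(a, b)]

def _norm(a):
    return math.sqrt(sum(x * x for x in a))

def _cos(a, b):
    denom = _norm(a) * _norm(b)
    return sum(x * y for x, y in zip(a, b)) / denom if denom > 0 else 0.0

def layer_metrics(clean, perturbed):
    """Illustrative per-layer statistics (NOT the paper's definitions).

    clean:     clean[l] is the hidden vector at layer l for the original prompt.
    perturbed: perturbed[k][l] is the same vector for the k-th perturbed prompt.
    Assumes at least two perturbations and non-zero clean vectors.
    Returns one dict per layer with:
      sensitivity: mean relative displacement ||h' - h|| / ||h||
      alignment:   mean pairwise cosine between displacement vectors
                   (values near 1.0 mean the displacements share a direction)
    """
    results = []
    for l, h in enumerate(clean):
        deltas = [_sub(p[l], h) for p in perturbed]
        sens = sum(_norm(d) for d in deltas) / (len(deltas) * _norm(h))
        pairs = [(i, j) for i in range(len(deltas))
                 for j in range(i + 1, len(deltas))]
        align = sum(_cos(deltas[i], deltas[j]) for i, j in pairs) / len(pairs)
        results.append({"sensitivity": sens, "alignment": align})
    return results

# Toy example: 2 layers, 2-dimensional hidden states, 2 perturbations.
clean = [[1.0, 0.0], [0.0, 2.0]]
perturbed = [
    [[1.1, 0.0], [1.0, 2.0]],
    [[1.0, 0.1], [2.0, 2.0]],
]
metrics = layer_metrics(clean, perturbed)
```

In the toy data, the displacements at the second layer all point the same way, so `alignment` is at its maximum there. A rising profile like this across depth is the kind of layer-wise pattern a representation-level detector would look for, under the assumptions stated above.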

Key Results

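Contamination produces progressive geometric deviations across layers: amplified perturbation sensitivity, stronger directional collapse, and enhanced local rigidity. In experiments on RL-trained reasoning models, the detection protocol built on these deviations outperforms existing output-level baselines.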
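The abstract describes a detection protocol that aggregates representation-level deviations across layers and metrics, without giving the aggregation rule on this page. One simple way to aggregate, shown here purely as an assumption, is to standardize each (layer, metric) feature against a reference set believed to be clean and then average the z-scores. The function names and the toy numbers are hypothetical.

```python
from statistics import mean, pstdev

def fit_reference(reference):
    """Per-feature (mean, std) from samples assumed to be uncontaminated.

    reference: list of samples; each sample is a flat list with one value
    per (layer, metric) pair, in a fixed order.
    """
    columns = list(zip(*reference))
    # Fall back to 1.0 when a feature has zero spread, to avoid dividing by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]

def contamination_score(features, stats):
    """Mean signed z-score of one sample against the reference statistics."""
    return mean((x - mu) / sd for x, (mu, sd) in zip(features, stats))

# Toy example with two features per sample.
reference = [[0.10, 0.20], [0.12, 0.22], [0.08, 0.18]]
stats = fit_reference(reference)
typical = contamination_score([0.10, 0.20], stats)   # close to the reference mean
suspect = contamination_score([0.20, 0.30], stats)   # far above the reference mean
```

A signed average is used because the abstract reports that contamination pushes all three quantities in the same direction (amplified, stronger, enhanced). Any decision threshold on the score would need to be calibrated on held-out data.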

Research Areas

This paper contributes to the following areas of AI/ML engineering:

Machine learning
Deep learning
Neural networks
Model optimization
AI systems
Layerwise

