CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

:::info Stub: Full Engineering Breakdown Coming

This paper was featured on Hugging Face Daily Papers on 2026-05-25 with 3 upvotes. A full breakdown with production viability rating, implementation notes, and honest limitations is being written.

:::


| Field | Value |
| --- | --- |
| Authors | Mike Zhang et al. |
| Year | 2026 |
| HF Upvotes | 3 |
| arXiv | 2605.26293 |
| PDF | Download |
| HF Page | View on Hugging Face |

Abstract

Prior work establishes that controlled contrastiveness between self-generated responses from large language models, set via reward scores, improves downstream preference tuning in English. We extend this method to multiple languages and evaluate two models across a total of 14 high and low-resource languages on a diverse set of tasks. Our central finding is that cross-lingual contrastive preference tuning on self-generations (CroCo) transfers without language-specific preference annotation. A reward model trained on English preferences (atop a multilingual base) produces useful within-language rankings across most languages, and pairing in either a monolingual or multilingual setting improves over each model on the majority of setups while preventing the catastrophic forgetting of supervised fine-tuning. We observe that the gains require on-policy data. Off-policy responses reduce the benefit and online preference optimization fails to improve over the offline variant. Specifically, on structured tasks, our method matches or exceeds the base in 6/7 languages for EuroLLM-9B and 4/7 settings for Aya-3B. On open-ended generation, both tuned models win against their respective base across 11 evaluated languages. Overall, we show promising directions for multilingual preference tuning.

Engineering Breakdown

The Problem

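Prior work shows that controlling the contrastiveness between a model's self-generated responses, set via reward scores, improves downstream preference tuning in English. The open question is whether this carries over to other languages, including low-resource ones, without language-specific preference annotation.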

The Approach

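CroCo extends the English method to multiple languages. A reward model trained on English preferences, atop a multilingual base, ranks each model's own responses within a language, and preference pairs are then built in either a monolingual or a multilingual setting. The authors evaluate two models, EuroLLM-9B and Aya-3B, across a total of 14 high- and low-resource languages on a diverse set of tasks.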
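The pairing step described in the abstract (rank self-generated responses within a language by reward score, then pick a pair with controlled contrastiveness) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Candidate` type, the `build_pairs` and `monolingual_split` helpers, and the `target_gap` rule for choosing the rejected response are all assumptions made for this example, and the reward scores are taken as given rather than produced by a real reward model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    prompt_id: str
    language: str   # language code, e.g. "de" or "sw"
    text: str       # a response sampled from the model being tuned
    reward: float   # score from a reward model (assumed to be precomputed)


def build_pairs(candidates, target_gap=1.0):
    """Return at most one (chosen, rejected) pair per (language, prompt).

    Hypothetical selection rule: `chosen` is the highest-reward response,
    and `rejected` is the response whose reward gap to `chosen` is closest
    to `target_gap`. Responses are only compared within the same language.
    """
    groups = defaultdict(list)
    for cand in candidates:
        groups[(cand.language, cand.prompt_id)].append(cand)

    pairs = []
    for group in groups.values():
        if len(group) < 2:
            continue  # a pair needs at least two responses
        chosen = max(group, key=lambda c: c.reward)
        others = [c for c in group if c is not chosen]
        rejected = min(
            others,
            key=lambda c: abs((chosen.reward - c.reward) - target_gap),
        )
        if rejected.reward < chosen.reward:  # skip ties, they carry no signal
            pairs.append((chosen, rejected))
    return pairs


def monolingual_split(pairs):
    """Group pairs by language, for tuning one language at a time."""
    per_language = defaultdict(list)
    for chosen, rejected in pairs:
        per_language[chosen.language].append((chosen, rejected))
    return dict(per_language)


candidates = [
    Candidate("p1", "de", "Antwort A", 2.0),
    Candidate("p1", "de", "Antwort B", 1.1),
    Candidate("p1", "de", "Antwort C", -1.0),
    Candidate("p1", "sw", "Jibu A", 0.5),
    Candidate("p1", "sw", "Jibu B", 0.2),
]
pairs = build_pairs(candidates, target_gap=1.0)
by_language = monolingual_split(pairs)
```

In this sketch, the pooled `pairs` list stands in for the multilingual setting and `by_language` for the monolingual one. Both come from the same within-language ranking, so no cross-language reward comparison is needed.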

Key Results

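On structured tasks, CroCo matches or exceeds the base in 6/7 languages for EuroLLM-9B and 4/7 settings for Aya-3B. On open-ended generation, both tuned models win against their respective base across 11 evaluated languages. Pairing in either setting improves over each model on the majority of setups while preventing the catastrophic forgetting of supervised fine-tuning. The gains require on-policy data: off-policy responses reduce the benefit, and online preference optimization fails to improve over the offline variant.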
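The abstract contrasts offline and online preference optimization without naming the training objective. As a point of reference only, here is a minimal sketch of the DPO loss, a commonly used offline preference objective. Using DPO here is an assumption, not something the abstract confirms, and the function name `dpo_loss` and the default `beta` are illustrative.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the policy being tuned or the frozen reference model.
    """
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # -log(sigmoid(margin)), written in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the loss is log(2).
loss_at_init = dpo_loss(-2.0, -2.0, -2.0, -2.0)

# Policy has shifted probability toward the chosen response: lower loss.
loss_after_shift = dpo_loss(-1.0, -5.0, -2.0, -2.0)
```

In an offline setup the pairs are built once from the model's own generations and then reused. An online variant would resample and rescore responses as the policy changes during training.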

Research Areas

This paper contributes to the following areas of AI/ML engineering:

Machine learning
Deep learning
Neural networks
Model optimization
AI systems
Crosslingual
