Hugging Face Daily Papers · May 27, 2026 · 3 min read

CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.




arxiv:2605.26293


Published on May 25, 2026 · Submitted by Mike Zhang on May 27, 2026


Authors: Mike Zhang, Ali Basirat, Desmond Elliott

Abstract

AI-generated summary: Cross-lingual contrastive preference tuning enables multilingual language model improvement without language-specific annotations, achieving strong performance across diverse tasks and languages.

Prior work establishes that controlled contrastiveness between self-generated responses from large language models, set via reward scores, improves downstream preference tuning in English. We extend this method to multiple languages and evaluate two models across a total of 14 high- and low-resource languages on a diverse set of tasks. Our central finding is that cross-lingual contrastive preference tuning on self-generations (CroCo) transfers without language-specific preference annotation. A reward model trained on English preferences (atop a multilingual base) produces useful within-language rankings across most languages, and pairing in either a monolingual or a multilingual setting improves over each base model in the majority of setups while avoiding the catastrophic forgetting associated with supervised fine-tuning. We observe that the gains require on-policy data. Off-policy responses reduce the benefit, and online preference optimization fails to improve over the offline variant. Specifically, on structured tasks, our method matches or exceeds the base model in 6/7 languages for EuroLLM-9B and in 4/7 settings for Aya-3B. On open-ended generation, both tuned models win against their respective base models across 11 evaluated languages. Overall, our results point to promising directions for multilingual preference tuning.
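The abstract describes the recipe only at a high level: sample several responses from the model being tuned (on-policy), score them with a reward model, and train on pairs whose contrast is controlled through their reward scores. The following is a minimal sketch of that loop under stated assumptions, not the authors' implementation. Two pieces are our own assumptions: the pair-selection rule (pick the pair whose reward gap is closest to a target `target_gap`) and the offline objective, for which we substitute the standard DPO loss because the abstract says only "offline". All names (`Candidate`, `select_contrastive_pair`, `dpo_loss`, `target_gap`, `beta`) and all reward values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One self-generated response to a prompt, scored by a reward model."""
    text: str
    reward: float  # hypothetical score, e.g. from an English-trained RM on a multilingual base


def select_contrastive_pair(candidates, target_gap):
    """Return (chosen, rejected) whose reward gap is closest to target_gap.

    Assumption: "controlled contrastiveness" is modelled here as picking the
    pair whose reward difference is nearest a target value. All candidates
    should answer the same prompt in the same language, since the reward
    model is only relied on for within-language rankings.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    best_pair, best_dist = None, math.inf
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            # Order each pair so the higher-reward response is "chosen".
            chosen, rejected = (a, b) if a.reward >= b.reward else (b, a)
            dist = abs((chosen.reward - rejected.reward) - target_gap)
            if dist < best_dist:
                best_pair, best_dist = (chosen, rejected), dist
    return best_pair


def softplus(x):
    """log(1 + exp(x)), computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss (assumed objective): -log sigmoid(beta * margin).

    Inputs are summed sequence log-probabilities under the policy being tuned
    and under a frozen reference copy of the starting model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return softplus(-margin)  # equals -log(sigmoid(margin))


if __name__ == "__main__":
    # Hypothetical on-policy samples for one prompt, with made-up reward scores.
    samples = [Candidate("response a", 0.9),
               Candidate("response b", 0.1),
               Candidate("response c", 0.6)]
    chosen, rejected = select_contrastive_pair(samples, target_gap=0.45)
    print(chosen.text, "|", rejected.text)
    print(round(dpo_loss(-1.0, -3.0, -2.0, -2.0), 4))
```

In this sketch the pairing and the loss are language-agnostic; only the candidate pool is assumed to share one prompt and one language, since the abstract reports that the reward model gives useful within-language rankings. Sampling `samples` from the model being tuned, rather than from another model, is what keeps the data on-policy.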

Links: arXiv page · PDF · GitHub (https://github.com/jjzha/CroCo)

Community

jjzha (paper submitter):

Cross-lingual contrastive preference tuning on self-generated responses, using reward model scores, transfers across 14 languages without needing language-specific preference annotations.


Get this paper in your agent:

hf papers read 2605.26293

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 0

Datasets citing this paper: 0

Spaces citing this paper: 0

Collections including this paper: 0

Discussion: 0 comments
