Hugging Face Daily Papers · June 2, 2026 · 3 min read

Model-Based Quality Assessment for Massively Multilingual Parallel Data

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Authors: Abdelaziz M. A. Ibrahim, Zihao Li, Jörg Tiedemann, Shaoxiong Ji

arxiv:2606.00285

Published on May 29 · Submitted by Shaoxiong on Jun 2 · MaLA-LM

Abstract

Multilingual parallel-data assessment requires direction-specific approaches rather than universal metrics due to varying performance across language pairs.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Large-scale multilingual bitext often contains two distinct problems: non-parallel sentence pairs and low-quality translations. We decompose model-based assessment for such data into two independent components: parallelism assessment with multilingual embeddings and reference-free quality estimation (QE). For parallelism, we benchmark four embedding models on FLORES-200 and BOUQuET retrieval tasks, covering 6,654 source-target directions in our target language-pair inventory. For QE, we evaluate nine reference-free evaluators on professional FLORES-200 translations across 41,412 ordered source-target directions. Results show that no model is universally reliable across translation directions. Naive QE ensembles dilute strong model signals, while documented target-language coverage is strongly associated with higher QE scores. Overall, these findings suggest that multilingual parallel-data assessment is best approached as a direction-aware routing and calibration problem, where no single universal metric is expected to suffice across all languages.
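To make the two components in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the toy vectors, the evaluator names (`qe_a`, `qe_b`, `qe_c`), the language codes, and the routing table are all hypothetical and chosen only for illustration. The first part scores parallelism as retrieval accuracy, that is, how often a source embedding's nearest target by cosine similarity is its aligned target. The second part contrasts an unweighted QE ensemble with a per-direction lookup, one simple way to read "direction-aware routing".

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieval_accuracy(src_vecs, tgt_vecs):
    """Fraction of sources whose most similar target sits at the same index."""
    hits = 0
    for i, s in enumerate(src_vecs):
        best = max(range(len(tgt_vecs)), key=lambda j: cosine(s, tgt_vecs[j]))
        hits += int(best == i)
    return hits / len(src_vecs)

def naive_ensemble(scores):
    """Unweighted mean of every evaluator's score for one sentence pair."""
    return sum(scores.values()) / len(scores)

def routed_score(scores, direction, routing_table, default):
    """Score from the evaluator assigned to this direction, else from a default."""
    return scores[routing_table.get(direction, default)]

# Toy embeddings (assumption): index i in src_vecs is aligned with index i in tgt_vecs.
# The third target is deliberately a poor match for the third source.
src_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tgt_vecs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.7, 0.3, 0.0]]
accuracy = retrieval_accuracy(src_vecs, tgt_vecs)

# Toy QE scores (assumption): one evaluator is confident, two are not.
scores = {"qe_a": 0.9, "qe_b": 0.3, "qe_c": 0.3}
routing_table = {("eng", "fin"): "qe_a"}  # hypothetical per-direction choice

averaged = naive_ensemble(scores)
routed = routed_score(scores, ("eng", "fin"), routing_table, default="qe_b")
fallback = routed_score(scores, ("fin", "eng"), routing_table, default="qe_b")
```

In this toy setup the average pulls the one confident evaluator's score toward the two weaker ones, while the routed lookup keeps it intact for the direction where that evaluator was selected. How a real routing table would be built and calibrated is exactly the open question the abstract raises, and this sketch does not address it.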

