Model-Based Quality Assessment for Massively Multilingual Parallel Data
Authors: Abdelaziz M. A. Ibrahim, Zihao Li, Jörg Tiedemann, Shaoxiong Ji
Published: 2026-05-29
Abstract
Multilingual parallel-data assessment requires direction-specific approaches rather than universal metrics due to varying performance across language pairs.
Large-scale multilingual bitext often contains two distinct problems: non-parallel sentence pairs and low-quality translations. We decompose model-based assessment for such data into two independent components: parallelism assessment with multilingual embeddings and reference-free quality estimation (QE). For parallelism, we benchmark four embedding models on FLORES-200 and BOUQuET retrieval tasks, covering 6,654 source–target directions in our target language-pair inventory. For QE, we evaluate nine reference-free evaluators on professional FLORES-200 translations across 41,412 ordered source–target directions. Results show that no model is universally reliable across translation directions. Naive QE ensembles dilute strong model signals, while documented target-language coverage is strongly associated with higher QE scores. Overall, these findings suggest that multilingual parallel-data assessment is best approached as a direction-aware routing and calibration problem, where no single universal metric is expected to suffice across all languages.
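The two components described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors, the evaluator names (`qe_a`, `qe_b`, `qe_c`), the language codes, and the routing table are all hypothetical, and cosine-similarity top-1 retrieval is assumed as one common way to score parallelism with embeddings. The second half contrasts a naive ensemble mean with a direction-aware lookup, to show how averaging can dilute a strong evaluator's signal.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieval_accuracy(src_vecs, tgt_vecs):
    """Top-1 retrieval accuracy, assuming source i is parallel to target i."""
    hits = 0
    for i, s in enumerate(src_vecs):
        # Pick the target whose embedding is most similar to this source.
        best = max(range(len(tgt_vecs)), key=lambda j: cosine(s, tgt_vecs[j]))
        hits += best == i
    return hits / len(src_vecs)


def naive_ensemble(scores):
    """Unweighted mean over all evaluator scores for one sentence pair."""
    return sum(scores.values()) / len(scores)


def routed_score(scores, direction, routing_table, default):
    """Use the evaluator assigned to this direction, else a default one."""
    return scores[routing_table.get(direction, default)]


# Hypothetical toy embeddings standing in for a multilingual encoder's output.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.6]]
accuracy = retrieval_accuracy(src, tgt)

# Hypothetical QE scores for one pair: one evaluator is assumed to be
# reliable for this direction, the other two are not.
scores = {"qe_a": 0.9, "qe_b": 0.3, "qe_c": 0.3}
routing_table = {("eng", "fin"): "qe_a"}  # assumed per-direction choice

ensemble = naive_ensemble(scores)
routed = routed_score(scores, ("eng", "fin"), routing_table, default="qe_b")

print(f"retrieval accuracy: {accuracy:.2f}")
print(f"naive ensemble: {ensemble:.2f}, routed: {routed:.2f}")
```

In this toy setup the mean is pulled toward the two weaker evaluators, while the routed score keeps the signal of the evaluator assumed to be reliable for that direction. How such a routing table would be built and calibrated in practice is not specified in the abstract.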
arXiv: arxiv.org/abs/2606.00285