Hugging Face Daily Papers · 5 min read

MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocol

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.07249


Published on May 8 · Submitted by Youngjoon Jang on May 18
Authors: Youngjoon Jang, Seongtae Hong, Hyeonseok Moon, Heuiseok Lim

Abstract

MLAIRE, a multilingual information retrieval evaluation protocol, separates semantic retrieval accuracy from query-language preference to better assess retrieval utility across mixed-language corpora.

AI-generated summary

Multilingual Information Retrieval is increasingly important in real-world search settings, where users issue queries over mixed-language corpora. Existing evaluations mainly reward language-agnostic semantic relevance, treating relevant passages equally regardless of language. Yet retrieval utility also depends on the language of the retrieved passages: users may prefer results they can read and verify in the query language, and query-passage language mismatch can complicate downstream grounding and answer verification in Retrieval-Augmented Generation systems. To evaluate this language-aware dimension, we introduce MLAIRE, a Multilingual Language-Aware Information Retrieval Evaluation protocol that disentangles cross-lingual semantic retrieval from query-language preference. MLAIRE constructs controlled pools with parallel passages across languages, enabling measurement of semantic retrieval accuracy and query-language preference when equivalent translations are available. We propose language-aware metrics, including Language Preference Rate (LPR) and Lang-nDCG, together with a 4-way decomposition separating semantic and query-language preference failures. Evaluating 31 dense, sparse, and late-interaction retrievers, we show that standard metrics obscure distinct behaviors: semantically strong retrievers may return correct content in a non-query language, while retrievers with stronger query-language preference may retrieve less semantically relevant passages.
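The abstract names a Language Preference Rate and a 4-way decomposition but does not give their formulas, so the following is only an illustrative sketch under assumed definitions, not the authors' implementation. It assumes a top-1 setting, assumes the four buckets are the cross product of "right or wrong content" and "query language or other language", and assumes LPR is the share of semantically correct top-1 results that are in the query language. The names `Passage`, `classify_top1`, and `evaluate` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str  # hypothetical ID shared by all parallel translations of one passage
    lang: str    # language code of this particular translation


def classify_top1(query_lang: str, relevant_doc_id: str, top1: Passage) -> str:
    """Assign one top-1 result to one of four buckets (assumed decomposition)."""
    semantic_ok = top1.doc_id == relevant_doc_id
    language_ok = top1.lang == query_lang
    if semantic_ok and language_ok:
        return "right_content_query_lang"
    if semantic_ok:
        return "right_content_other_lang"
    if language_ok:
        return "wrong_content_query_lang"
    return "wrong_content_other_lang"


def evaluate(examples: list[tuple[str, str, Passage]]) -> dict:
    """Aggregate buckets over (query_lang, relevant_doc_id, top1) triples."""
    counts = Counter(classify_top1(q, rel, top1) for q, rel, top1 in examples)
    total = sum(counts.values())
    semantic_hits = counts["right_content_query_lang"] + counts["right_content_other_lang"]
    return {
        # Language-agnostic accuracy: any translation of the right passage counts.
        "semantic_accuracy": semantic_hits / total if total else 0.0,
        # Assumed LPR: among semantically correct hits, the share in the query language.
        "lpr": counts["right_content_query_lang"] / semantic_hits if semantic_hits else 0.0,
        "decomposition": dict(counts),
    }


examples = [
    ("ko", "d1", Passage("d1", "ko")),  # right content, query language
    ("ko", "d2", Passage("d2", "en")),  # right content, other language
    ("en", "d3", Passage("d9", "en")),  # wrong content, query language
    ("en", "d4", Passage("d8", "ko")),  # wrong content, other language
]
report = evaluate(examples)
```

In this toy input, a language-agnostic metric would score the first two results identically, while the bucket counts keep them apart, which is the kind of distinction the abstract says standard metrics obscure.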

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

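The following papers were recommended by the Semantic Scholar API:

- All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG (2026): https://huggingface.co/papers/2604.20199
- CLEAR: Cross-Lingual Enhancement in Alignment via Reverse-training (2026): https://huggingface.co/papers/2604.05821
- CORAL: Adaptive Retrieval Loop for Culturally-Aligned Multilingual RAG (2026): https://huggingface.co/papers/2604.25676
- Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers (2026): https://huggingface.co/papers/2604.17632
- From BM25 to Corrective RAG: Benchmarking Retrieval Strategies for Text-and-Table Documents (2026): https://huggingface.co/papers/2604.01733
- CroSearch-R1: Better Leveraging Cross-lingual Knowledge for Retrieval-Augmented Generation (2026): https://huggingface.co/papers/2604.25182
- Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment (2026): https://huggingface.co/papers/2604.05684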



Get this paper in your agent:

hf papers read 2605.07249
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
