Is Position Bias in Dense Retrievers Built In-or Learned from Data?
Authors: Daegon Yu, SeungYoon Han, Woomyoung Park
Published: 2026-05-26. Organization: sionic-ai
AI-generated summary
Training data position distribution significantly influences positional bias in dense retrievers, with balanced training reducing sensitivity by up to 87% while maintaining competitive retrieval performance.
Abstract
Dense retrievers exhibit positional bias, favoring documents whose query-relevant information appears near the beginning and degrading retrieval performance when the information appears later. While prior work on positional bias in dense retrievers has largely focused on architectural explanations, we study how the positional distribution of evidence in training data affects retrieval-level bias direction. To test this, we construct synthetic position-targeted training sets in which query-relevant evidence appears at the beginning, middle, or end of documents, and fine-tune eight architecturally diverse pretrained models under position-skewed and balanced training distributions. At the ranking level, we observe a strong directional pattern across the examined models: skewed training distributions favor evidence at the corresponding positions. Position-balanced training reduces positional sensitivity by 57-87% on position-aware benchmarks, with competitive mean retrieval performance in our controlled setting. Representation-level analyses further suggest that fine-tuning often reshapes learned positional preferences, although pre-existing architectural or pretraining-specific tendencies persist in some models. These results identify training-position distribution as a major controllable factor in retrieval-level position bias and suggest balanced data curation as a practical mitigation strategy.
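The position-targeted construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the filler sentences, the sampling weights, and the max-minus-min "spread" used as a sensitivity measure are all hypothetical choices made here for illustration, and the abstract does not specify how the paper defines positional sensitivity.

```python
import random
from collections import Counter

# The three evidence positions named in the abstract.
POSITIONS = ("beginning", "middle", "end")


def place_evidence(evidence, fillers, position):
    """Return a document with `evidence` inserted among `fillers` at `position`."""
    if position not in POSITIONS:
        raise ValueError(f"unknown position: {position}")
    index = {"beginning": 0, "middle": len(fillers) // 2, "end": len(fillers)}[position]
    sentences = fillers[:index] + [evidence] + fillers[index:]
    return " ".join(sentences)


def build_training_set(examples, weights, seed=0):
    """Give each (query, evidence, fillers) example a position drawn from `weights`."""
    rng = random.Random(seed)
    names = list(weights)
    probs = [weights[name] for name in names]
    rows = []
    for query, evidence, fillers in examples:
        position = rng.choices(names, weights=probs, k=1)[0]
        rows.append({
            "query": query,
            "position": position,
            "document": place_evidence(evidence, fillers, position),
        })
    return rows


def positional_spread(score_by_position):
    """Hypothetical sensitivity measure: best minus worst per-position score."""
    values = list(score_by_position.values())
    return max(values) - min(values)


# Hypothetical training distributions: fully skewed vs. position-balanced.
SKEWED_BEGINNING = {"beginning": 1.0, "middle": 0.0, "end": 0.0}
BALANCED = {"beginning": 1 / 3, "middle": 1 / 3, "end": 1 / 3}

# Toy examples; real data would pair queries with natural-language evidence.
examples = [
    (f"query {i}", f"EVIDENCE-{i}.", [f"Filler {j}." for j in range(4)])
    for i in range(300)
]
skewed = build_training_set(examples, SKEWED_BEGINNING)
balanced = build_training_set(examples, BALANCED)

print(Counter(row["position"] for row in skewed))
print(Counter(row["position"] for row in balanced))
```

Under these assumptions, a retriever fine-tuned on `skewed` would only ever see evidence at the start of a document, while `balanced` exposes it to all three positions in roughly equal proportion. Per-position retrieval scores from an evaluation run could then be passed to `positional_spread` to compare the two settings.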
Community
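This is an automated message from the Librarian Bot (https://huggingface.co/librarian-bots). I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- MeVer at CheckThat! 2026: Cluster-Aware Hard-Negative Mining for Multilingual Scientific-Source Retrieval (2026), https://huggingface.co/papers/2605.24236
- RARE: Redundancy-Aware Retrieval Evaluation Framework for High-Similarity Corpora (2026), https://huggingface.co/papers/2604.19047
- ARHN: Answer-Centric Relabeling of Hard Negatives with Open-Source LLMs for Dense Retrieval (2026), https://huggingface.co/papers/2604.11092
- Masking or Mitigating? Deconstructing the Impact of Query Rewriting on Retriever Biases in RAG (2026), https://huggingface.co/papers/2604.06097
- LLM-based Listwise Reranking under the Effect of Positional Bias (2026), https://huggingface.co/papers/2604.03642
- Lost in the Evidence? Reproducing Document Position and Context Size Effects in RAG (2026), https://huggingface.co/papers/2605.27105
- Layer-wise Token Compression for Efficient Document Reranking (2026), https://huggingface.co/papers/2605.20683
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend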
Cite arxiv.org/abs/2605.26578 in a model, dataset, or Space README.md to link it from this page.