Xetrieval: Mechanistically Explaining Dense Retrieval
Authors: Zhixin Cai, Jun Bai, Yang Liu, Jiaqi Li, Yichi Zhang, Taichuan Li, Zhuofan Chen, Zixia Jia, Zilong Zheng, Wenge Rong
Organization: Beihang University
Published: 2026-05-28 (arXiv 2605.29507)
Project page: https://hihiczx.github.io/Xetrieval/
Code: https://github.com/Hihiczx/Xetrieval
AI-generated summary: Xetrieval is a mechanistic framework that explains dense retrieval by enriching sentence embeddings with reasoning information and decomposing them into interpretable sparse features that explain individual retrieval decisions.
Abstract
Explaining why dense retrievers assign high relevance scores remains challenging because retrieval decisions are made through opaque high-dimensional embeddings. Existing explanations often focus on surface signals, such as lexical matches, token alignments, or post-hoc textual rationales, and thus provide limited insight into the latent factors that shape dense retrieval behavior at the embedding level. We propose Xetrieval, an embedding-level mechanistic framework for explaining dense retrieval. Xetrieval first introduces a lightweight reasoning internalizer that approximates Chain-of-Thought reasoning directly in the embedding space with a single forward pass, enriching sentence embeddings with reasoning-oriented information while avoiding expensive autoregressive generation. It then decomposes these reasoning-enhanced embeddings into sparse, human-interpretable features, each associated with a coherent natural language description. By aggregating sparse feature overlaps across multiple document-side views, Xetrieval provides feature-level explanations of individual retrieval decisions. Experiments on diverse retrievers and benchmarks show that Xetrieval uncovers coherent interpretable features, yields stronger pair-level intervention effects, and supports task-level feature steering. The project page and source code are available at https://hihiczx.github.io/Xetrieval.
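The abstract's last two steps (sparse decomposition, then aggregating feature overlaps across document-side views) can be illustrated with a toy sketch. This is not the authors' implementation: the random encoder weights, the top-k sparsification, the mean aggregation, and the names `topk_sparse_code` and `explain_pair` are all assumptions chosen for illustration, and the "views" here are simply noisy copies of the query embedding.

```python
import numpy as np


def topk_sparse_code(embedding, encoder_weights, k):
    """Project an embedding onto a feature dictionary and keep the k largest activations."""
    activations = np.maximum(encoder_weights @ embedding, 0.0)  # ReLU activations
    sparse = np.zeros_like(activations)
    top = np.argsort(activations)[-k:]  # indices of the k largest activations
    sparse[top] = activations[top]
    return sparse


def explain_pair(query_code, doc_view_codes, top_n=3):
    """Rank features by query-document overlap, averaged over document views."""
    overlaps = np.stack([query_code * view for view in doc_view_codes])
    scores = overlaps.mean(axis=0)  # hypothetical aggregation: plain mean
    ranked = np.argsort(scores)[::-1][:top_n]
    return [(int(i), float(scores[i])) for i in ranked if scores[i] > 0]


# Toy data: every value below is synthetic and purely illustrative.
rng = np.random.default_rng(0)
dim, n_features, k = 16, 64, 4
encoder_weights = rng.normal(size=(n_features, dim))  # stand-in for a trained dictionary
query = rng.normal(size=dim)
doc_views = [query + 0.1 * rng.normal(size=dim) for _ in range(3)]

query_code = topk_sparse_code(query, encoder_weights, k)
doc_view_codes = [topk_sparse_code(v, encoder_weights, k) for v in doc_views]
explanation = explain_pair(query_code, doc_view_codes)
print(explanation)  # list of (feature index, aggregated overlap score)
```

In a real system each feature index would map to a natural language description, so the ranked list would read as a feature-level explanation of why the pair scored highly.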