Hugging Face Daily Papers · May 29, 2026 · 5 min read

Xetrieval: Mechanistically Explaining Dense Retrieval

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.29507

Published on May 28

Submitted by Zhixin Cai on May 29

Beihang University

Authors:

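Zhixin Cai, Jun Bai, Yang Liu, Jiaqi Li, Yichi Zhang, Taichuan Li, Zhuofan Chen, Zixia Jia, Zilong Zheng, Wenge Rong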

AI-generated summary

Xetrieval is a mechanistic framework that explains dense retrieval by enriching sentence embeddings with reasoning information and decomposing them into sparse, interpretable features that explain individual retrieval decisions.

Abstract

Explaining why dense retrievers assign high relevance scores remains challenging because retrieval decisions are made through opaque high-dimensional embeddings. Existing explanations often focus on surface signals, such as lexical matches, token alignments, or post-hoc textual rationales, and thus provide limited insight into the latent factors that shape dense retrieval behavior at the embedding level. We propose Xetrieval, an embedding-level mechanistic framework for explaining dense retrieval. Xetrieval first introduces a lightweight reasoning internalizer that approximates Chain-of-Thought reasoning directly in the embedding space with a single forward pass, enriching sentence embeddings with reasoning-oriented information while avoiding expensive autoregressive generation. It then decomposes these reasoning-enhanced embeddings into sparse, human-interpretable features, each associated with a coherent natural language description. By aggregating sparse feature overlaps across multiple document-side views, Xetrieval provides feature-level explanations of individual retrieval decisions. Experiments on diverse retrievers and benchmarks show that Xetrieval uncovers coherent interpretable features, yields stronger pair-level intervention effects, and supports task-level feature steering. The project page and source code are available at https://hihiczx.github.io/Xetrieval.
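The abstract does not give the decomposition or aggregation formulas, so the following is only a toy sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a TopK sparse-autoencoder-style encoder (ReLU(W_enc x + b_enc), keeping the k largest activations) as the decomposition step, scores a feature's contribution to one query-view pair as the product of its two activations, and averages those contributions over document views. The names `encode_topk`, `explain_pair`, `ablation_effect`, `W_ENC` and `B_ENC` are hypothetical; the dictionary is hand-set rather than learned, and `ablation_effect` is a toy proxy on the overlap score, not the paper's intervention protocol.

```python
from typing import Dict, List, Sequence, Tuple

SparseCode = Dict[int, float]  # feature index -> activation


def encode_topk(embedding: Sequence[float], w_enc: List[Sequence[float]],
                b_enc: Sequence[float], k: int) -> SparseCode:
    """Sparse code ReLU(W_enc @ x + b_enc), keeping only the k largest activations."""
    acts: List[Tuple[int, float]] = []
    for idx, (row, bias) in enumerate(zip(w_enc, b_enc)):
        pre = sum(w * x for w, x in zip(row, embedding)) + bias
        if pre > 0.0:  # ReLU: non-positive pre-activations are dropped
            acts.append((idx, pre))
    acts.sort(key=lambda pair: pair[1], reverse=True)  # stable, so ties keep index order
    return dict(acts[:k])


def explain_pair(query_code: SparseCode, view_codes: List[SparseCode],
                 top_n: int) -> List[Tuple[int, float]]:
    """Rank features by query-document co-activation, averaged over all document views."""
    totals: Dict[int, float] = {}
    for code in view_codes:
        for feat, q_act in query_code.items():
            if feat in code:  # only features active on both sides contribute
                totals[feat] = totals.get(feat, 0.0) + q_act * code[feat]
    n_views = len(view_codes)
    ranked = sorted(((f, s / n_views) for f, s in totals.items()),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


def ablation_effect(query_code: SparseCode, view_codes: List[SparseCode],
                    feature: int) -> float:
    """Toy intervention: drop in total overlap score after zeroing one query feature."""
    def total(code: SparseCode) -> float:
        return sum(score for _, score in explain_pair(code, view_codes, top_n=len(code)))
    ablated = {f: a for f, a in query_code.items() if f != feature}
    return total(query_code) - total(ablated)


# Hypothetical 4-feature dictionary over 2-d embeddings (hand-set, for illustration only).
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
B_ENC = [0.0, 0.0, -0.5, 0.0]

if __name__ == "__main__":
    query = encode_topk([0.75, 0.25], W_ENC, B_ENC, k=2)
    # Two hypothetical document-side views of the same document.
    views = [encode_topk(v, W_ENC, B_ENC, k=2) for v in ([0.25, 0.75], [1.0, 0.25])]
    print(explain_pair(query, views, top_n=3))  # [(0, 0.375), (2, 0.3125)]
    print(ablation_effect(query, views, feature=0))  # 0.375
```

Averaging over all views, rather than only the views where a feature fires, favors features that recur across views; whether Xetrieval weights views this way is an assumption of this sketch.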