Hugging Face Daily Papers · · 6 min read

Retrieval from Within: An Intrinsic Capability of Attention-Based Models

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Papers
arxiv:2605.05806

Retrieval from Within: An Intrinsic Capability of Attention-Based Models

Published on May 8
· Submitted by Elad Hoffer on May 14
Authors: Elad Hoffer, Yochai Blau, Edan Kinderman, Ron Banner, Daniel Soudry, Boris Ginsburg

Abstract

INTRA demonstrates that attention-based models can perform retrieval directly from internal representations, unifying retrieval and generation while improving evidence recall and answer quality.

AI-generated summary

Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retrieval via Attention), a framework where decoder attention queries score pre-encoded evidence chunks that are then directly reused as context for generation. By construction, INTRA unifies retrieval and generation, eliminating the retriever-generator mismatch typical of RAG pipelines. This design also amortizes context encoding by reusing precomputed encoder states across queries. On question-answering benchmarks, INTRA outperforms strong engineered retrieval pipelines on both evidence recall and end-to-end answer quality. Our results demonstrate that attention-based models already possess a retrieval mechanism that can be elicited, rather than added as an external module.
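The core mechanism — decoder attention queries scoring pre-encoded evidence chunks, with the top-scoring chunks reused directly as generation context — can be sketched roughly as follows. This is an illustrative toy in numpy, not the paper's actual INTRA implementation; all shapes, names, and the mean-pooled chunk representation are assumptions.

```python
import numpy as np

# Hypothetical setup: 100 evidence chunks, each pre-encoded once into a
# d-dimensional state (e.g. mean-pooled encoder outputs). These states are
# computed offline and reused across queries, which is what amortizes
# context encoding.
rng = np.random.default_rng(0)
d = 64
chunk_states = rng.normal(size=(100, d))  # precomputed encoder states
query_state = rng.normal(size=(d,))       # decoder attention query for the question

# Score every chunk with scaled dot-product attention, the same operation
# the decoder already performs internally.
scores = chunk_states @ query_state / np.sqrt(d)
weights = np.exp(scores - scores.max())
weights /= weights.sum()                  # softmax over chunks

# "Retrieve" by selecting the top-k chunks; their precomputed states are
# then reused directly as cross-attention context for generation.
k = 5
top_idx = np.argsort(scores)[::-1][:k]
context_states = chunk_states[top_idx]
```

Because the same attention machinery produces both the retrieval scores and the generation context, there is no separate retriever whose notion of relevance can drift from the generator's — the retriever-generator mismatch the abstract refers to.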

Community

Paper submitter about 12 hours ago


This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

- Transforming External Knowledge into Triplets for Enhanced Retrieval in RAG of LLMs (2026) - https://huggingface.co/papers/2604.12610
- Latent Abstraction for Retrieval-Augmented Generation (2026) - https://huggingface.co/papers/2604.17866
- Bottleneck Tokens for Unified Multimodal Retrieval (2026) - https://huggingface.co/papers/2604.11095
- A Unified Model and Document Representation for On-Device Retrieval-Augmented Generation (2026) - https://huggingface.co/papers/2604.14403
- R$^3$AG: Retriever Routing for Retrieval-Augmented Generation (2026) - https://huggingface.co/papers/2604.22849
- From BM25 to Corrective RAG: Benchmarking Retrieval Strategies for Text-and-Table Documents (2026) - https://huggingface.co/papers/2604.01733
- SAGE: Selective Attention-Guided Extraction for Token-Efficient Document Indexing (2026) - https://huggingface.co/papers/2604.15583

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out the recommend_similar_papers Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend



Get this paper in your agent:

hf papers read 2605.05806
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2605.05806 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2605.05806 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2605.05806 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.

