OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources
Authors: Jinheon Baek, Soyeong Jeong, Sangwoo Park, Woongyeong Yeo, Minki Kang, Patara Trirat, Heejun Lee, Sung Ju Hwang
Organization: KAIST AI
Published: 2026-05-28 (arXiv 2605.29250)
Code: https://github.com/JinheonBaek/OmniRetrieval
AI-generated summary
OmniRetrieval is a framework that handles diverse knowledge sources by identifying appropriate repositories and dispatching native queries to their respective execution engines, outperforming single-source approaches across multiple dataset types.
Abstract
Real-world information needs require access to structurally diverse knowledge sources, from unstructured text and relational tables to knowledge graphs and property graphs. Existing retrievers, however, operate over one source at a time under a fixed query language, leaving the broader landscape of available knowledge fragmented behind incompatible interfaces. A natural attempt at unification would collapse these sources into a shared space, but this erases the structural affordances (such as schemas, ontologies, and compositional operators) that give each source its expressive power. Effective retrieval over diverse knowledge, therefore, requires not homogenization but an overarching layer that meets each source on its own terms. To achieve this, we present OmniRetrieval, a framework that takes any natural-language query, identifies appropriate knowledge sources, and dispatches source-native queries to their native execution engines. Across an extensive benchmark spanning 13 datasets and 309 distinct knowledge bases over text, relational, and graph-structured sources, OmniRetrieval exceeds single-source baselines, demonstrating that it can serve as a general-purpose interface to heterogeneous sources while preserving the structural distinctions that make each source valuable.
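The route-then-dispatch idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the function names (`route`, `to_native_query`, `execute`, `retrieve`), the keyword-based source selector, the hard-coded query templates, and the tiny in-memory sources are all hypothetical. A Python dict stands in for a graph engine, and the sketch picks a single source per question, whereas the abstract speaks of identifying appropriate sources in the plural.

```python
import sqlite3

# Hypothetical toy sources; none of this data comes from the paper.
DOCS = ["Seoul is the capital of South Korea.", "KAIST is located in Daejeon."]
GRAPH = {("KAIST", "located_in"): ["Daejeon"]}  # (subject, relation) -> objects


def make_db() -> sqlite3.Connection:
    """Build a small in-memory relational source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE papers (title TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO papers VALUES (?, ?)",
        [("A", 2024), ("B", 2026), ("C", 2026)],
    )
    return conn


def route(question: str) -> str:
    """Pick a source type with keyword rules (placeholder for a real selector)."""
    q = question.lower()
    if "how many" in q:
        return "relational"
    if "located" in q:
        return "graph"
    return "text"


def to_native_query(source: str, question: str):
    """Turn the question into a source-native query using fixed templates."""
    words = question.rstrip("?").split()
    if source == "relational":
        year = int(next(w for w in words if w.isdigit()))
        return ("SELECT COUNT(*) FROM papers WHERE year = ?", (year,))
    if source == "graph":
        subjects = {s for s, _ in GRAPH}
        entity = next(w for w in words if w in subjects)
        return (entity, "located_in")
    return {w.lower() for w in words}  # text source: bag of keywords


def execute(source: str, query, conn: sqlite3.Connection) -> list:
    """Run the query on the engine that owns the source."""
    if source == "relational":
        sql, params = query
        return [row[0] for row in conn.execute(sql, params)]
    if source == "graph":
        return GRAPH.get(query, [])
    # Text source: return the document with the largest keyword overlap.
    scored = [(len(query & set(d.lower().rstrip(".").split())), d) for d in DOCS]
    best_score, best_doc = max(scored)
    return [best_doc] if best_score > 0 else []


def retrieve(question: str, conn: sqlite3.Connection):
    """Route the question, build a native query, and dispatch it."""
    source = route(question)
    return source, execute(source, to_native_query(source, question), conn)


conn = make_db()
print(retrieve("How many papers were published in 2026?", conn))
print(retrieve("Where is KAIST located?", conn))
print(retrieve("What is the capital of South Korea?", conn))
```

The point of the sketch is only the shape of the pipeline: each source keeps its own query form (SQL, a graph lookup, keyword matching) instead of being flattened into one shared representation.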
Community
Real-world answers live in text, tables, and graphs. OmniRetrieval reaches them all through one natural-language interface, meeting each source on its own terms.
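This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
* Query Symbolically or Retrieve Semantically? A Dataset and Method for Semi-Structured Question Answering (2026), https://huggingface.co/papers/2605.27164
* Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets (2026), https://huggingface.co/papers/2604.22294
* Knowledge Distillation for Low-Resource Open-source Text-to-SQL Model (2026), https://huggingface.co/papers/2605.22843
* Retrieval as Reasoning: Self-Evolving Agent-Native Retrieval via LLM-Wiki (2026), https://huggingface.co/papers/2605.25480
* Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG (2026), https://huggingface.co/papers/2604.14572
* Graph Query Generation with Constraint-guided Large Language Agents (2026), https://huggingface.co/papers/2605.00845
* PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents (2026), https://huggingface.co/papers/2605.12260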
Cite arxiv.org/abs/2605.29250 in a model, dataset, or Space README.md to link it from this page.