Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation
Authors: Jiashuo Sun, Jimeng Shi, Yixuan Xie, Saizhuo Wang, Jash Rajesh Parekh, Pengcheng Jiang, Zhiyi Shi, Jiajun Fan, Qinglong Zheng, Peiran Li, Shaowen Wang, Ge Liu, Jiawei Han
Abstract
AI-generated summary
Multi-hop question answering is reformulated as program synthesis and execution, enabling structured reasoning, deterministic feedback, and improved performance over existing retrieval-augmented generation approaches.
Retrieval-Augmented Generation (RAG) has become a standard approach for knowledge-intensive question answering, but existing systems remain brittle on multi-hop questions, where solving the task requires chaining multiple retrieval and reasoning steps. The key challenges are that current methods represent reasoning through free-form natural language, so intermediate states are implicit, retrieval queries can drift from their intended entities, and errors are detected by the same model that produces them, making self-reflection an unreliable, ungrounded signal.
We observe that multi-hop question answering is a typical form of step-by-step computation, and that this structured process aligns closely with how code-specialized language models are trained to operate. Motivated by this, we introduce PyRAG, a framework that reformulates multi-hop RAG as program synthesis and execution. Instead of free-form reasoning trajectories, PyRAG represents the reasoning process as an executable Python program over retrieval and QA tools, exposing intermediate states as variables, producing deterministic feedback through execution, and yielding an inspectable trace of the entire reasoning process. This formulation further enables compiler-grounded self-repair and execution-driven adaptive retrieval without any additional training.
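To make the idea concrete, here is a minimal sketch of what "multi-hop RAG as an executable program" could look like. The `retrieve` and `qa` functions below are hypothetical stand-in stubs over a toy corpus, not the paper's actual tool interfaces; the point is that each hop's result is an ordinary Python variable, so the next query is constructed from it directly and the whole trace can be inspected or re-executed.

```python
# Illustrative sketch: a two-hop question ("Where was the director of
# Inception born?") expressed as an executable program over retrieval
# and QA tools. Tool names and the canned corpus are assumptions for
# demonstration only.

def retrieve(query: str) -> list[str]:
    """Stub retriever: look up passages for a query in a tiny canned corpus."""
    corpus = {
        "director of Inception": ["Inception was directed by Christopher Nolan."],
        "birthplace of Christopher Nolan": ["Christopher Nolan was born in London."],
    }
    return corpus.get(query, [])

def qa(question: str, passages: list[str]) -> str:
    """Stub reader: extract an answer from the passages via keyword patterns."""
    text = " ".join(passages)
    if "directed by" in text:
        return text.split("directed by ")[1].rstrip(".")
    if "born in" in text:
        return text.split("born in ")[1].rstrip(".")
    return ""

# Hop 1: resolve the bridge entity; the intermediate state is an explicit variable.
passages_1 = retrieve("director of Inception")
director = qa("Who directed Inception?", passages_1)

# Hop 2: the next query is built from the hop-1 variable, so the follow-up
# retrieval cannot drift away from the resolved entity.
passages_2 = retrieve(f"birthplace of {director}")
answer = qa(f"Where was {director} born?", passages_2)

print(answer)
```

Because the trajectory is a program rather than free-form text, a failed hop surfaces as an empty variable or a runtime error rather than a silently wrong sentence, which is the kind of deterministic, execution-grounded feedback the abstract describes.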
Experiments on five QA benchmarks (PopQA, HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle) show that PyRAG consistently outperforms strong baselines under both training-free and RL-trained settings, with especially large gains on compositional multi-hop datasets. Our code, data, and models are publicly available at https://github.com/GasolSun36/PyRAG.