Hugging Face Daily Papers · 7 min read

Verifiable Rewards Beyond Math and Code: Lightweight Corpus-Grounded Process Supervision for Factual Question Answering

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.29648

Published on May 28, 2026 · Submitted by Dehai Min on May 29, 2026
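Authors: Shicheng Fan, Haochang Hao, Dehai Min, Weihao Liu, Philip S. Yu, Lu Cheng
Code: https://github.com/shichengf/CorVer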

Abstract

AI-generated summary: CorVer, a corpus-grounded reward mechanism, enhances factual accuracy in question answering by providing efficient sentence-level feedback through Wikipedia co-occurrence statistics, outperforming neural verifiers while reducing training time.

Applying reinforcement learning to improve factual accuracy in knowledge-intensive question answering faces a reward design dilemma. Response-level rewards provide only coarse supervision and cannot distinguish correct from incorrect statements within a reasoning trace. Sentence-level alternatives offer finer-grained feedback, but typically rely on NLI verifiers, LLM judges, or knowledge-verification pipelines that are expensive to deploy at RL scale and often unreliable for rare-entity facts, where accurate reward signals are especially important. We propose CorVer (Corpus Verify), a lightweight, plug-in-ready process reward that replaces neural verifiers with a corpus-grounded signal derived from Wikipedia co-occurrence statistics. CorVer assigns sentence-level credit and maps it to token-level advantages via a simple alignment, requiring only a 0.5B extractor and a single corpus lookup per sentence. Across 30 (model, benchmark) cells spanning six instruction-tuned models (3B to 14B) and five QA benchmarks, CorVer improves over the raw baseline for every cell, with an average TriviaQA gain of +4.1 pp. It also outperforms four neural-verifier baselines in 18 of 20 cells under their feasible configurations, while training 4.8 to 8.4x faster.
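
To make the idea of sentence-level credit mapped to token-level advantages more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the scoring function, the threshold, or how the sentence signal is combined with a response-level advantage, so `TOY_COOCCURRENCE`, `sentence_reward`, `token_advantages`, `min_count`, and `weight` are all hypothetical names and choices made for illustration. Entity extraction (the 0.5B extractor in the paper) is skipped here; entity pairs are passed in by hand, and whitespace tokens stand in for model tokens.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical stand-in for a Wikipedia co-occurrence index.
# The count below is invented for illustration only.
TOY_COOCCURRENCE: Dict[FrozenSet[str], int] = {
    frozenset({"Marie Curie", "Warsaw"}): 120,
}


def sentence_reward(entity_pair: Tuple[str, str], min_count: int = 1) -> float:
    """Assumed scoring rule: +1.0 if the pair co-occurs at least
    `min_count` times in the toy index, otherwise -1.0."""
    count = TOY_COOCCURRENCE.get(frozenset(entity_pair), 0)
    return 1.0 if count >= min_count else -1.0


def token_advantages(
    sentences: List[str],
    rewards: List[float],
    response_advantage: float = 0.0,
    weight: float = 0.5,
) -> List[float]:
    """Broadcast each sentence's reward to every whitespace token in that
    sentence, added to a shared response-level advantage (an assumed
    combination rule, not taken from the paper)."""
    if len(sentences) != len(rewards):
        raise ValueError("need exactly one reward per sentence")
    advantages: List[float] = []
    for sentence, reward in zip(sentences, rewards):
        n_tokens = len(sentence.split())
        advantages.extend([response_advantage + weight * reward] * n_tokens)
    return advantages


if __name__ == "__main__":
    sentences = [
        "Marie Curie was born in Warsaw.",
        "Marie Curie studied in Vienna.",
    ]
    pairs = [("Marie Curie", "Warsaw"), ("Marie Curie", "Vienna")]
    rewards = [sentence_reward(pair) for pair in pairs]
    print(rewards)
    print(token_advantages(sentences, rewards))
```

In this toy setup, tokens in the sentence whose entity pair is found in the index receive a positive advantage, while tokens in the unsupported sentence receive a negative one, which is the kind of within-response distinction a single response-level reward cannot express.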
