Hugging Face Daily Papers · 5 min read

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.03980

Published on Jun 2 · Submitted by Siyuan Huang on Jun 9
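Authors: Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang, Yihao Liu, Jingwei Ni, Jiaqi Guo, Mengyu Zhou, Kai Tang, Junling Liu, Qinliang Su, Xiaoxi Jiang, Guanjun Jiang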

Abstract

Skill-RM presents a unified reward modeling framework that treats reward computation as a structured agentic task, enabling dynamic evidence aggregation and consistent evaluation across diverse applications.

Reward models (RMs) provide critical feedback signals for LLM post-training, notably in reinforced fine-tuning (RFT) and reinforcement learning (RL) pipelines. However, current reward evaluation relies on heterogeneous criteria such as rule-based verifiers, ground-truth references, procedural checklists, and complex rubrics, where a unified mechanism to integrate all types of evidence remains unexplored. To this end, we propose Skill Reward Model (Skill-RM), a unified framework that reformulates reward modeling as the execution of a reusable Reward-Evaluation Skill. By treating reward computation as a structured agentic task, Skill-RM provides a consistent interface to orchestrate heterogeneous resources, dynamically selecting and aggregating evidence tailored to the specific requirements of each input. This approach enables the reward model to move beyond static evaluation, ensuring consistency and transparency across diverse tasks. Extensive experiments on reward benchmarks and downstream applications, including best-of-N selection and reinforcement learning, demonstrate that Skill-RM consistently outperforms traditional judge baselines. Our findings suggest that Skill-RM not only provides a unified solution for reward modeling but also achieves superior performance through the strategic and dynamic orchestration of evidence. The code is at https://github.com/Qwen-Applications/Skill-RM.
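
To make the idea of orchestrating heterogeneous evidence concrete, here is a minimal Python sketch. It is not the authors' implementation: in the paper the Reward-Evaluation Skill is executed as an agentic task, whereas this toy version is fully deterministic. Every name in it (`Sample`, `Evidence`, `rule_verifier`, `reference_match`, `checklist_coverage`, `evaluate`, `best_of_n`) and every weight is a hypothetical chosen for illustration. It shows only the general pattern the abstract describes: each evidence source opts in when it applies to the input, and the applicable evidence is aggregated into one reward that can then drive best-of-N selection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    prompt: str
    response: str
    reference: Optional[str] = None   # ground-truth answer, if one exists
    checklist: tuple[str, ...] = ()   # toy checklist: phrases that must appear


@dataclass
class Evidence:
    source: str
    score: float    # normalized to [0, 1]
    weight: float   # hypothetical weight, not taken from the paper


def rule_verifier(s: Sample) -> Optional[Evidence]:
    # Toy rule: the response must be non-empty and end with a period.
    text = s.response.strip()
    return Evidence("rule", 1.0 if text.endswith(".") else 0.0, 1.0)


def reference_match(s: Sample) -> Optional[Evidence]:
    # Only applicable when a ground-truth reference is available.
    if s.reference is None:
        return None
    hit = s.reference.lower() in s.response.lower()
    return Evidence("reference", 1.0 if hit else 0.0, 2.0)


def checklist_coverage(s: Sample) -> Optional[Evidence]:
    # Only applicable when a checklist is supplied.
    if not s.checklist:
        return None
    hits = sum(item.lower() in s.response.lower() for item in s.checklist)
    return Evidence("checklist", hits / len(s.checklist), 1.0)


EVALUATORS = (rule_verifier, reference_match, checklist_coverage)


def evaluate(sample: Sample) -> tuple[float, list[Evidence]]:
    # Select the evidence sources that apply to this input, then aggregate
    # them with a weighted mean. The evidence list is returned alongside the
    # reward so the score stays inspectable.
    evidence = [e for f in EVALUATORS if (e := f(sample)) is not None]
    total = sum(e.weight for e in evidence)
    reward = sum(e.score * e.weight for e in evidence) / total
    return reward, evidence


def best_of_n(prompt: str, candidates: list[str], **criteria) -> str:
    # Best-of-N selection: keep the candidate with the highest reward.
    return max(candidates, key=lambda r: evaluate(Sample(prompt, r, **criteria))[0])
```

For example, `best_of_n("What is 6*7?", ["The answer is 42.", "I think it is 41"], reference="42")` scores each candidate with the rule and reference evidence and returns the first one. A weighted mean is only one possible aggregation choice; it is used here because it keeps the contribution of each evidence source easy to read off.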

Get this paper in your agent:

hf papers read 2606.03980
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
