Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill
Authors: Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang, Yihao Liu, Jingwei Ni, Jiaqi Guo, Mengyu Zhou, Kai Tang, Junling Liu, Qinliang Su, Xiaoxi Jiang, Guanjun Jiang
Published: June 2, 2026
arXiv: 2606.03980
Abstract
Skill-RM presents a unified reward modeling framework that treats reward computation as a structured agentic task, enabling dynamic evidence aggregation and consistent evaluation across diverse applications.
Reward models (RMs) provide critical feedback signals for LLM post-training, notably in reinforced fine-tuning (RFT) and reinforcement learning (RL) pipelines. However, current reward evaluation relies on heterogeneous criteria, such as rule-based verifiers, ground-truth references, procedural checklists, and complex rubrics, and a unified mechanism for integrating all of these types of evidence remains unexplored. To this end, we propose Skill Reward Model (Skill-RM), a unified framework that reformulates reward modeling as the execution of a reusable Reward-Evaluation Skill. By treating reward computation as a structured agentic task, Skill-RM provides a consistent interface to orchestrate heterogeneous resources, dynamically selecting and aggregating evidence tailored to the specific requirements of each input. This approach enables the reward model to move beyond static evaluation, ensuring consistency and transparency across diverse tasks. Extensive experiments on reward benchmarks and downstream applications, including best-of-N selection and reinforcement learning, demonstrate that Skill-RM consistently outperforms traditional judge baselines. Our findings suggest that Skill-RM not only provides a unified solution for reward modeling but also achieves superior performance through the strategic and dynamic orchestration of evidence. The code is available at https://github.com/Qwen-Applications/Skill-RM.
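The abstract describes reward computation as selecting, for each input, whichever evidence sources apply and then aggregating them, with the result used for tasks such as best-of-N selection. The minimal Python sketch below illustrates that idea under stated assumptions. The `Sample` fields, the `EVIDENCE` registry, the uniform averaging, and `best_of_n` are all hypothetical illustrations and are not the Skill-RM API. In the paper, an LLM agent executing the Reward-Evaluation Skill performs the selection. Here a fixed applicability check stands in for that agent, and rubric grading, which would need an LLM judge, is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Sample:
    """Hypothetical container: one response plus whatever evidence exists for it."""
    prompt: str
    response: str
    reference: Optional[str] = None                        # ground-truth answer, if any
    checklist: list[str] = field(default_factory=list)     # procedural items to cover
    verifier: Optional[Callable[[str], bool]] = None       # rule-based check, if any


def rule_check(s: Sample) -> float:
    # Rule-based verifier: binary pass/fail.
    return 1.0 if s.verifier(s.response) else 0.0


def exact_match(s: Sample) -> float:
    # Ground-truth reference: simplistic exact match after trimming whitespace.
    return 1.0 if s.response.strip() == s.reference.strip() else 0.0


def checklist_coverage(s: Sample) -> float:
    # Checklist: fraction of items mentioned (naive substring match).
    hits = sum(item.lower() in s.response.lower() for item in s.checklist)
    return hits / len(s.checklist)


# Hypothetical registry: (name, "does this evidence exist for the input?", scorer).
# A rubric judged by an LLM would be one more entry in this list.
EVIDENCE = [
    ("verifier", lambda s: s.verifier is not None, rule_check),
    ("reference", lambda s: s.reference is not None, exact_match),
    ("checklist", lambda s: bool(s.checklist), checklist_coverage),
]


def skill_reward(s: Sample) -> tuple[float, dict[str, float]]:
    """Score only the applicable evidence, then average (assumed aggregation rule)."""
    scores = {name: score(s) for name, applies, score in EVIDENCE if applies(s)}
    if not scores:
        return 0.0, scores  # assumption: no usable evidence yields zero reward
    return sum(scores.values()) / len(scores), scores


def best_of_n(prompt: str, candidates: list[str], **evidence) -> str:
    """Return the candidate with the highest reward (first one wins ties)."""
    return max(candidates, key=lambda r: skill_reward(Sample(prompt, r, **evidence))[0])


if __name__ == "__main__":
    is_integer = lambda r: r.strip().isdigit()
    print(skill_reward(Sample("What is 6*7?", "42", reference="42", verifier=is_integer)))
    print(best_of_n("What is 6*7?", ["The answer is 42.", "42"], reference="42", verifier=is_integer))
```

Returning the per-source scores alongside the scalar keeps the aggregation inspectable, which loosely mirrors the transparency goal stated in the abstract. Uniform averaging is only a placeholder; a weighted or input-dependent combination would be equally consistent with the description.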