Hugging Face Daily Papers · June 9, 2026 · 5 min read

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Reward models (RMs) provide critical feedback signals for LLM post-training, notably in reinforced fine-tuning (RFT) and reinforcement learning (RL) pipelines. However, current reward evaluation relies on heterogeneous criteria such as rule-based verifiers, ground-truth references, procedural checklists, and complex rubrics, and a unified mechanism to integrate all types of evidence remains unexplored. To this end, we propose Skill Reward Model (Skill-RM), a unified framework that reformulates reward modeling as the execution of a reusable Reward-Evaluation Skill. By treating reward computation as a structured agentic task, Skill-RM provides a consistent interface to orchestrate heterogeneous resources, dynamically selecting and aggregating evidence tailored to the specific requirements of each input. This approach enables the reward model to move beyond static evaluation, ensuring consistency and transparency across diverse tasks. Extensive experiments on reward benchmarks and downstream applications, including best-of-N selection and reinforcement learning, demonstrate that Skill-RM consistently outperforms traditional judge baselines. Our findings suggest that Skill-RM not only provides a unified solution for reward modeling but also achieves superior performance through the strategic and dynamic orchestration of evidence. The code is at https://github.com/Qwen-Applications/Skill-RM.
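To make the idea of selecting and aggregating heterogeneous evidence more concrete, here is a minimal Python sketch. It is not the authors' implementation (see the linked repository for that). Every name in it (`Sample`, `reference_match`, `checklist_coverage`, `non_empty_rule`, `evaluate`, `best_of_n`) is hypothetical, the "selection" step is simplified to "use whichever evidence sources apply to this input", and the equal-weight average is an assumption made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Sample:
    prompt: str
    response: str
    reference: Optional[str] = None   # ground-truth answer, if one exists
    checklist: tuple[str, ...] = ()   # required phrases, if any


# An evidence source returns a score in [0, 1], or None when it does not apply.
Evidence = Callable[[Sample], Optional[float]]


def reference_match(s: Sample) -> Optional[float]:
    """Ground-truth reference check (hypothetical: substring match)."""
    if s.reference is None:
        return None
    return 1.0 if s.reference.strip().lower() in s.response.lower() else 0.0


def checklist_coverage(s: Sample) -> Optional[float]:
    """Procedural checklist: fraction of required phrases present."""
    if not s.checklist:
        return None
    hits = sum(item.lower() in s.response.lower() for item in s.checklist)
    return hits / len(s.checklist)


def non_empty_rule(s: Sample) -> Optional[float]:
    """Rule-based verifier that always applies: response must be non-empty."""
    return 1.0 if s.response.strip() else 0.0


SOURCES: dict[str, Evidence] = {
    "reference": reference_match,
    "checklist": checklist_coverage,
    "rule": non_empty_rule,
}


def evaluate(s: Sample) -> tuple[float, dict[str, float]]:
    """Collect the evidence that applies to this input, then aggregate it.

    Returning the per-source scores alongside the reward keeps the
    evaluation inspectable. Equal weighting is an illustrative assumption.
    """
    evidence = {
        name: score
        for name, source in SOURCES.items()
        if (score := source(s)) is not None
    }
    reward = sum(evidence.values()) / len(evidence)
    return reward, evidence


def best_of_n(prompt: str, candidates: list[str], **criteria) -> str:
    """Pick the candidate response with the highest aggregated reward."""
    return max(candidates, key=lambda r: evaluate(Sample(prompt, r, **criteria))[0])
```

For example, `best_of_n("What is 2+2?", ["5", "It is 4", ""], reference="4")` scores each candidate with the reference check and the non-empty rule, and returns the candidate with the highest mean score.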

arxiv:2606.03980

Published on Jun 2

Submitted by Siyuan Huang on Jun 9

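Authors: Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang, Yihao Liu, Jingwei Ni, Jiaqi Guo, Mengyu Zhou, Kai Tang, Junling Liu, Qinliang Su, Xiaoxi Jiang, Guanjun Jiang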

Abstract

Skill-RM presents a unified reward modeling framework that treats reward computation as a structured agentic task, enabling dynamic evidence aggregation and consistent evaluation across diverse applications.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Get this paper in your agent:

hf papers read 2606.03980

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
