Hugging Face Daily Papers · May 29, 2026 · 6 min read

RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-training in Non-verifiable Domains

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Pointwise reward modeling offers critical signals for LLM post-training, yet struggles with absolute scoring in subjective, non-verifiable settings. Rubric-based methods address this by decomposing evaluation into explicit criteria, but existing approaches typically depend on frontier LLMs and suffer from ties caused by hard Boolean aggregation. We present RUBRIC-ARROW, an alternating framework that jointly trains a rubric generator and a rubric-conditioned judge, with its RL stage using only pairwise preference data. Our method couples a probability-based scoring rule that reduces ties with phase-specific preference-based rewards and an alternating GRPO scheme that together train the pointwise evaluator. Extensive experiments show that RUBRIC-ARROW achieves competitive reward-modeling accuracy and yields consistent gains for downstream policy post-training.
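To illustrate why hard Boolean aggregation produces ties and how a probability-based score can break them, here is a minimal sketch. It is not the paper's scoring rule, which the abstract does not spell out. It assumes a judge that returns, for each rubric criterion, a probability that the criterion is satisfied; the function names, the 0.5 threshold, and the example probabilities are all hypothetical.

```python
from typing import Sequence


def boolean_score(probs: Sequence[float], threshold: float = 0.5) -> float:
    """Hard aggregation: fraction of criteria counted as satisfied.

    Each probability is first collapsed to a yes/no verdict, so responses
    with different confidence levels can receive identical scores.
    """
    return sum(p >= threshold for p in probs) / len(probs)


def probability_score(probs: Sequence[float]) -> float:
    """Soft aggregation (assumed form): mean per-criterion probability."""
    return sum(probs) / len(probs)


# Hypothetical per-criterion "satisfied" probabilities for two responses
# evaluated against the same three-item rubric.
response_a = [0.90, 0.60, 0.20]
response_b = [0.70, 0.55, 0.40]

# Both responses pass two of three criteria, so the Boolean scores tie.
tie = boolean_score(response_a) == boolean_score(response_b)

# The soft scores differ, so the pair can still be ranked.
a_wins = probability_score(response_a) > probability_score(response_b)
```

Under these assumed numbers the Boolean scores coincide while the soft scores do not, which is the kind of tie the probability-based rule is meant to reduce.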

arxiv:2605.29156

Published on May 27 · Submitted by Tianci Liu on May 29 · OpenRubrics

Authors: Haoxiang Jiang, Zihan Dong, Tianci Liu, Wanying Wang, Ran Xu, Tony Yu, Linjun Zhang, Haoyu Wang

AI-generated summary

RUBRIC-ARROW presents an alternating framework for reward modeling that improves upon rubric-based methods by reducing ties and leveraging pairwise preference data for training.
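The idea of training a pointwise evaluator from pairwise preference data alone can be sketched as follows. This is an illustrative assumption, not the paper's phase-specific reward: each sampled judge rollout scores the preferred and the rejected response separately, earns a reward of 1 only if it ranks the preferred one strictly higher, and the rewards are then normalized within the group in the usual GRPO style. All names and numbers below are hypothetical.

```python
import statistics
from typing import List


def preference_reward(score_chosen: float, score_rejected: float) -> float:
    """Reward 1.0 if the pointwise scores agree with the preference label.

    Ties are treated as failures, so a rollout gains nothing by giving
    both responses the same score.
    """
    return 1.0 if score_chosen > score_rejected else 0.0


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize rewards within one rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of four rollouts for one preference pair; each tuple
# is (score given to the preferred response, score given to the rejected one).
rollouts = [(0.57, 0.55), (0.40, 0.62), (0.66, 0.66), (0.81, 0.30)]

rewards = [preference_reward(c, r) for c, r in rollouts]
advantages = group_advantages(rewards)
```

In an alternating scheme, one would presumably apply an update of this kind to the judge while the rubric generator is frozen, then swap roles; the exact schedule and reward definitions are left to the paper.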

Community

librarian-bot

about 13 hours ago

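This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* C2: Scalable Rubric-Augmented Reward Modeling from Binary Preferences (2026) https://huggingface.co/papers/2604.13618
* Prompt-Level Reward Specifications for Open-Ended Post-Training (2026) https://huggingface.co/papers/2605.29275
* Focal Reward: Balanced Reinforcement Learning under Rubric-Based Rewards (2026) https://huggingface.co/papers/2605.26579
* EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics (2026) https://huggingface.co/papers/2605.03871
* Rubric-Grounded RL: Structured Judge Rewards for Generalizable Reasoning (2026) https://huggingface.co/papers/2605.08061
* Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria (2026) https://huggingface.co/papers/2605.08354
* Tournament-GRPO: Group-Wise Tournament Rewards for Reinforcement Learning in Open-Ended Long-Form Generation (2026) https://huggingface.co/papers/2605.26958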

Models citing this paper 2

Datasets citing this paper 1
