AdaJudge: Adaptive Multi-Perspective Judging for Reward Modeling
Abstract: Reward modeling is essential for aligning large language models with human preferences, yet predominant architectures rely on a static pooling strategy to condense sequences into scalar scores. This paradigm suffers from two key limitations: a static inductive bias that misaligns with task-dependent preference signals, and a representational mismatch, since the backbone is optimized for generation and its representations are therefore ill-suited to fine-grained discrimination. To address both limitations, we propose AdaJudge, a unified framework that jointly adapts representation and aggregation. AdaJudge first maps backbone representations into a discrimination-oriented space via gated refinement blocks. It then replaces the static readout with an adaptive multi-view pooling module, which dynamically routes and combines evidence. Extensive experiments on RM-Bench and JudgeBench show that AdaJudge outperforms strong off-the-shelf reward models and traditional pooling baselines.
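To make the contrast between a static readout and adaptive multi-view pooling concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify which views are pooled or how routing is computed, so the three views (last token, mean, max), the linear router `router_w`, the scoring head `head_w`, and all function names are assumptions chosen purely for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pooled_views(hidden):
    """Return three static readouts of a [seq_len][dim] list of token vectors.

    The choice of views is hypothetical; each one alone corresponds to a
    conventional static pooling strategy.
    """
    dim = len(hidden[0])
    last = list(hidden[-1])
    mean = [sum(tok[d] for tok in hidden) / len(hidden) for d in range(dim)]
    peak = [max(tok[d] for tok in hidden) for d in range(dim)]
    return [last, mean, peak]


def adaptive_score(hidden, router_w, head_w):
    """Mix the views with input-dependent weights, then score the result.

    router_w: one weight row per view (assumed linear router on the mean view).
    head_w:   weights of an assumed linear scalar reward head.
    """
    views = pooled_views(hidden)
    summary = views[1]
    logits = [sum(w * s for w, s in zip(row, summary)) for row in router_w]
    weights = softmax(logits)  # one routing weight per view, summing to 1
    dim = len(summary)
    mixed = [
        sum(weights[k] * views[k][d] for k in range(len(views)))
        for d in range(dim)
    ]
    score = sum(w * x for w, x in zip(head_w, mixed))
    return score, weights


# Toy input: two tokens with two-dimensional hidden states.
hidden = [[1.0, 0.0], [0.0, 1.0]]
router_w = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # zero router gives uniform mixing
head_w = [1.0, 1.0]
score, weights = adaptive_score(hidden, router_w, head_w)
```

With a zero router the three views are weighted equally, which reduces to a fixed blend; a trained router would instead shift weight toward whichever view carries the preference signal for a given input.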
| Comments: | ACL 2026 |
| Subjects: | Computation and Language (cs.CL); Machine Learning (cs.LG) |
| Cite as: | arXiv:2601.08097 [cs.CL] (or arXiv:2601.08097v2 [cs.CL] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2601.08097 (arXiv-issued DOI via DataCite) |
Submission history
From: Mengnan Du
[v1] Tue, 13 Jan 2026 00:37:38 UTC (310 KB)
[v2] Fri, 5 Jun 2026 06:45:40 UTC (399 KB)