Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher
Authors: Arda Uzunoglu, Alvin Zhang, Daniel Khashabi
Organization: Center for Language and Speech Processing @ JHU
Published: 2026-05-31
arXiv: 2606.01000
Project page: https://ardauzunoglu.github.io/trust-functions/
Code: https://github.com/ardauzunoglu/trust-functions
Abstract
Trust functions enable effective weak-to-strong generalization by identifying reliable weak labels for training, achieving performance comparable to ground-truth supervision across multiple domains.
Weak-to-strong generalization studies how to improve a strong student using supervision from a weaker teacher when reliable labels are scarce. We view this primarily as a data selection problem, where the key challenge is to identify which weak labels are reliable enough to serve as a training signal. To address this, we introduce trust functions that assign each weak label a scalar trust score and use these scores to filter weak supervision. Across several domains, including world knowledge, quantitative reasoning, and strategy games, trust filtering yields students that match and sometimes surpass ground-truth supervision, achieving near-lossless weak-to-strong generalization. Moreover, trust functions enable an iterative weak-to-strong chain that compounds gains by training a student and reusing it as the next teacher. The advantage of trust functions can be attributed to several mechanisms.
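The two ideas in the abstract, filtering weak labels by a scalar trust score and chaining students as successive teachers, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how trust scores are computed or how the cutoff is chosen, so `trust_fn`, `threshold`, `train_student`, and `rounds` are all hypothetical placeholders.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]          # (input, weak_label)
Model = Callable[[str], str]       # maps an input to a predicted label


def filter_by_trust(
    weak_data: Sequence[Example],
    trust_fn: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[Example]:
    """Keep only weak labels whose trust score meets the threshold.

    Both trust_fn and threshold are assumptions made for illustration.
    """
    return [(x, y) for x, y in weak_data if trust_fn(x, y) >= threshold]


def weak_to_strong_chain(
    inputs: Sequence[str],
    teacher: Model,
    train_student: Callable[[List[Example]], Model],
    trust_fn: Callable[[str, str], float],
    rounds: int = 2,
    threshold: float = 0.5,
) -> Model:
    """Iterate: label with the teacher, filter by trust, train a student,
    then reuse that student as the teacher for the next round."""
    for _ in range(rounds):
        weak_data = [(x, teacher(x)) for x in inputs]
        trusted = filter_by_trust(weak_data, trust_fn, threshold)
        teacher = train_student(trusted)  # hypothetical training routine
    return teacher
```

In this sketch the student only ever sees the examples that pass the trust filter, which mirrors the data-selection framing of the abstract; any real trust function, training procedure, or stopping rule would need to come from the paper itself.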