Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.02290

Published on May 4, 2026 · Submitted by ytaewon on May 18, 2026
Authors: Taewon Yun, Jisu Shin, Jeonghwan Choi, Seunghwan Bang, Hwanjun Song

Abstract

AI-generated summary

CoRD is a collaborative multi-teacher decoding framework that synthesizes reasoning trajectories step by step, using predictive perplexity-based scoring and beam search. It distills large reasoning models into students that approach teacher-level performance and generalize to out-of-domain and open-ended settings.

Distilling large reasoning models is essential for making Long-CoT reasoning practical, as full-scale inference remains computationally prohibitive. Existing curation-based approaches select complete reasoning traces post-hoc, overlooking collaboration among heterogeneous teachers and lacking dynamic exploration, which leads to redundant sampling and missed complementary reasoning. We introduce CoRD, a collaborative multi-teacher decoding framework that performs step-wise reasoning synthesis guided by predictive perplexity-based scoring and beam search. This enables heterogeneous LRMs to jointly construct coherent reasoning trajectories while efficiently preserving diverse, high-potential hypotheses. Experiments show that CoRD produces higher-quality reasoning data and achieves near teacher-level student performance with fewer, structured supervision signals, without substantial efficiency overhead. CoRD further generalizes well to out-of-domain and open-ended settings. The dataset and model are available at https://github.com/DISL-Lab/CoRD.
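
To make the idea of step-wise multi-teacher decoding concrete, here is a minimal sketch, not the authors' implementation. The abstract does not define the "predictive" perplexity score, so this example swaps in plain per-step log-perplexity (the negative mean token log-probability) as the beam score. The `Teacher` type, the function names, and the two toy teachers are all hypothetical; a real teacher would be an LRM that returns a reasoning step along with its token log-probabilities.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a teacher reads the partial trace and proposes one
# next step, returned together with the log-probabilities of its tokens.
Step = Tuple[str, List[float]]
Teacher = Callable[[Sequence[str]], Step]
Beam = Tuple[List[str], float]  # (steps so far, accumulated log-perplexity)


def log_perplexity(token_logprobs: Sequence[float]) -> float:
    """Negative mean token log-probability; lower means more confident."""
    return -sum(token_logprobs) / len(token_logprobs)


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Standard perplexity, exp of the log-perplexity above."""
    return math.exp(log_perplexity(token_logprobs))


def stepwise_beam_search(
    teachers: Sequence[Teacher], num_steps: int, beam_width: int
) -> List[Beam]:
    """Extend every beam with one step from every teacher, keep the best."""
    beams: List[Beam] = [([], 0.0)]
    for _ in range(num_steps):
        candidates: List[Beam] = []
        for steps, cost in beams:
            for teacher in teachers:
                text, logprobs = teacher(steps)
                candidates.append((steps + [text], cost + log_perplexity(logprobs)))
        # Assumption: lower accumulated log-perplexity is the better hypothesis.
        candidates.sort(key=lambda beam: beam[1])
        beams = candidates[:beam_width]
    return beams


# Toy teachers (assumptions): each is confident only on alternating steps,
# so the best trajectory has to combine steps from both of them.
def teacher_a(steps: Sequence[str]) -> Step:
    lp = -0.1 if len(steps) % 2 == 0 else -2.0
    return (f"A{len(steps)}", [lp, lp])


def teacher_b(steps: Sequence[str]) -> Step:
    lp = -2.0 if len(steps) % 2 == 0 else -0.1
    return (f"B{len(steps)}", [lp, lp])


best_steps, best_cost = stepwise_beam_search([teacher_a, teacher_b], 3, 2)[0]
print(best_steps, round(best_cost, 3))
```

In this toy setup the top-ranked trajectory interleaves steps from both teachers, which is the kind of complementary reasoning that selecting complete single-teacher traces post-hoc cannot recover. Keeping `beam_width` greater than 1 is what preserves alternative hypotheses between steps.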

Community

This paper has been accepted at ACL 2026 (Findings, long). It addresses Long-CoT (long chain-of-thought) distillation from LRMs (Large Reasoning Models). If you have any questions, please feel free to contact us.

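This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* Confidence-Aware Alignment Makes Reasoning LLMs More Reliable (2026): https://huggingface.co/papers/2605.07353
* Student-in-the-Loop Chain-of-Thought Distillation via Generation-Time Selection (2026): https://huggingface.co/papers/2604.02819
* STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes (2026): https://huggingface.co/papers/2605.13165
* CoTEvol: Self-Evolving Chain-of-Thoughts for Data Synthesis in Mathematical Reasoning (2026): https://huggingface.co/papers/2604.14768
* Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories (2026): https://huggingface.co/papers/2604.11365
* HintMR: Eliciting Stronger Mathematical Reasoning in Small Language Models (2026): https://huggingface.co/papers/2604.12229
* SOD: Step-wise On-policy Distillation for Small Language Model Agents (2026): https://huggingface.co/papers/2605.07725

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend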


Get this paper in your agent:

hf papers read 2605.02290
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
