Hugging Face Daily Papers · May 18, 2026 · 5 min read

Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


This paper has been accepted to ACL 2026 (Findings, long paper). It addresses Long-CoT (chain-of-thought) distillation from large reasoning models (LRMs). If you have any questions, please feel free to contact us.


arxiv:2605.02290


Published on May 4

Submitted by ytaewon on May 18

Data Intelligence System Lab


Authors: Taewon Yun, Jisu Shin, Jeonghwan Choi, Seunghwan Bang, Hwanjun Song

AI-generated summary

CoRD is a collaborative multi-teacher decoding framework that synthesizes reasoning trajectories through predictive perplexity scoring and beam search, enabling efficient distillation of large reasoning models with high-quality outputs and generalized performance.

Abstract

Distilling large reasoning models is essential for making Long-CoT reasoning practical, as full-scale inference remains computationally prohibitive. Existing curation-based approaches select complete reasoning traces post-hoc, overlooking collaboration among heterogeneous teachers and lacking dynamic exploration, which leads to redundant sampling and missed complementary reasoning. We introduce CoRD, a collaborative multi-teacher decoding framework that performs step-wise reasoning synthesis guided by predictive perplexity-based scoring and beam search. This enables heterogeneous LRMs to jointly construct coherent reasoning trajectories while efficiently preserving diverse, high-potential hypotheses. Experiments show that CoRD produces higher-quality reasoning data and achieves near teacher-level student performance with fewer, structured supervision signals, without substantial efficiency overhead. CoRD further generalizes well to out-of-domain and open-ended settings. The dataset and model are available at https://github.com/DISL-Lab/CoRD.
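The abstract describes building a reasoning trace one step at a time: several teachers propose the next step, candidates are scored, and a beam search keeps the most promising partial trajectories. Below is a minimal Python sketch of that general loop, not the CoRD implementation (see the repository for that). It rests on stated assumptions. The teachers and scorer are hypothetical callables. Each candidate step is scored with ordinary token-level perplexity, standing in for the paper's "predictive perplexity", which may be defined differently. Beams are ranked by the running mean of their step perplexities.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical interfaces (not from the paper or its repository):
# a teacher proposes the next reasoning step given the question and the steps so far;
# the scorer returns per-token log-probabilities of a candidate step in that context.
Teacher = Callable[[str, List[str]], str]
Scorer = Callable[[str, List[str], str], List[float]]


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Token-level perplexity: exp of the negative mean log-probability."""
    if not token_logprobs:
        return float("inf")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


@dataclass
class Beam:
    steps: List[str] = field(default_factory=list)
    score: float = 0.0  # running mean of step perplexities (lower is better)
    done: bool = False


def multi_teacher_beam_search(
    question: str,
    teachers: List[Teacher],
    scorer: Scorer,
    beam_width: int = 2,
    max_steps: int = 8,
    is_final: Callable[[str], bool] = lambda step: step.startswith("ANSWER"),
) -> Beam:
    beams = [Beam()]
    for _ in range(max_steps):
        candidates: List[Beam] = []
        for beam in beams:
            if beam.done:
                candidates.append(beam)  # finished trajectories compete unchanged
                continue
            # Every teacher proposes one next step for this partial trajectory.
            for teacher in teachers:
                step = teacher(question, beam.steps)
                ppl = perplexity(scorer(question, beam.steps, step))
                n = len(beam.steps)
                mean_ppl = (beam.score * n + ppl) / (n + 1)
                candidates.append(Beam(beam.steps + [step], mean_ppl, is_final(step)))
        # Keep the beam_width lowest-perplexity trajectories (stable sort on ties).
        candidates.sort(key=lambda b: b.score)
        beams = candidates[:beam_width]
        if all(b.done for b in beams):
            break
    return beams[0]


# Toy usage with two stub "teachers" and a stub scorer (all hypothetical).
def teacher_a(question: str, steps: List[str]) -> str:
    return f"A{len(steps)}" if len(steps) < 2 else "ANSWER: 4"


def teacher_b(question: str, steps: List[str]) -> str:
    return f"B{len(steps)}" if len(steps) < 2 else "ANSWER: 5"


def toy_scorer(question: str, steps: List[str], step: str) -> List[float]:
    # Pretend the scorer finds teacher A's steps and the answer "4" more predictable.
    likely = step in {"A0", "A1", "ANSWER: 4"}
    return [-0.1, -0.1] if likely else [-1.0, -1.0]


best = multi_teacher_beam_search("What is 2 + 2?", [teacher_a, teacher_b], toy_scorer)
print(best.steps, round(best.score, 3))  # expected: ['A0', 'A1', 'ANSWER: 4'] 1.105
```

Ranking by the mean of step perplexities keeps trajectories of different lengths comparable, and mixing proposals from every teacher at each step is what lets a trajectory combine steps from different models. A practical system would also need step segmentation, deduplication of identical proposals, and batched scoring, all omitted here.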