Many-Shot CoT-ICL: Making In-Context Learning Truly Learn
Published on May 13
· Submitted by Cindy on May 14
AI-generated summary
Many-shot in-context learning for reasoning tasks exhibits different scaling behaviors than non-reasoning tasks, with demonstration ordering and selection significantly impacting performance.
Abstract
In-context learning (ICL) adapts large language models (LLMs) to new tasks by conditioning on demonstrations in the prompt without parameter updates. With long-context models, many-shot ICL can use dozens to hundreds of examples and achieve performance comparable to fine-tuning, yet current understanding of its scaling behavior is largely derived from non-reasoning tasks. We study many-shot chain-of-thought in-context learning (CoT-ICL) for reasoning and show that standard many-shot rules do not transfer. Across non-reasoning and reasoning-oriented LLMs and across non-reasoning and reasoning tasks, we find: (i) a setting-dependent scaling effect, where increasing the number of CoT demonstrations is unstable for non-reasoning LLMs and benefits mainly reasoning-oriented LLMs; (ii) similarity-based retrieval helps on non-reasoning tasks but fails on reasoning, since semantic similarity poorly predicts procedural (i.e., CoT) compatibility; and (iii) an order-scaling effect, where performance variance grows with more CoT demonstrations. We interpret these behaviors by viewing many-shot CoT-ICL as in-context test-time learning rather than scaled pattern matching, and suggest two principles: (i) demonstrations should be easy for the target model to understand, and (ii) they should be ordered to support a smooth conceptual progression. Guided by these principles, we propose Curvilinear Demonstration Selection (CDS), a simple ordering method that yields up to a 5.42 percentage-point gain on geometry with 64 demonstrations. Overall, our results reframe the long context window from a retrieval buffer into a structured curriculum for in-context test-time learning.
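The abstract does not spell out how CDS orders demonstrations, but the two stated principles (model-understandable demonstrations, smooth conceptual progression) suggest a curriculum-style arrangement. The sketch below is a hypothetical illustration of that idea only, not the paper's algorithm: it assumes each demonstration comes with a precomputed difficulty score (for example, derived from the target model's likelihood of the demonstration's CoT) and sorts easy-to-hard before assembling the many-shot prompt.

```python
# Hypothetical sketch of curriculum-style ordering for many-shot CoT-ICL.
# NOTE: this is NOT the paper's CDS method; difficulty scores are assumed
# to be supplied externally (e.g., from model log-likelihoods of each CoT).

from typing import List, Tuple


def order_demonstrations(demos: List[Tuple[str, float]]) -> List[str]:
    """Sort (demonstration, difficulty) pairs from easiest to hardest so
    consecutive demonstrations form a smooth conceptual progression."""
    return [text for text, _ in sorted(demos, key=lambda d: d[1])]


def build_prompt(ordered_demos: List[str], question: str) -> str:
    """Concatenate the ordered CoT demonstrations ahead of the test question."""
    return "\n\n".join(
        ordered_demos + [f"Q: {question}\nA: Let's think step by step."]
    )


# Toy usage: scores and demonstrations are illustrative placeholders.
demos = [
    ("Q: What is 2+2?\nA: 2+2 = 4.", 0.1),
    ("Q: Evaluate the integral of x^2.\nA: x^3/3 + C.", 0.9),
    ("Q: What is 12*3?\nA: 12*3 = 36.", 0.4),
]
prompt = build_prompt(order_demonstrations(demos), "What is 7*8?")
```

Under this reading, the long context window acts less like a retrieval buffer and more like a curriculum: the ordering, not just the selection, carries signal.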
Community
We believe this work takes a step toward reframing ICL, from pattern matching to in-context test-time learning, by proposing two principles: reasoning performance relies on demonstrations being both understandable to the model and smoothly sequenced to support conceptual progression.