Many-Shot CoT-ICL: Making In-Context Learning Truly Learn
Published on May 13
· Submitted by Cindy on May 14
AI-generated summary
Many-shot in-context learning for reasoning tasks exhibits different scaling behaviors than non-reasoning tasks, with demonstration ordering and selection significantly impacting performance.
Abstract
In-context learning (ICL) adapts large language models (LLMs) to new tasks by conditioning on demonstrations in the prompt without parameter updates. With long-context models, many-shot ICL can use dozens to hundreds of examples and achieve performance comparable to fine-tuning, yet current understanding of its scaling behavior is largely derived from non-reasoning tasks. We study many-shot chain-of-thought in-context learning (CoT-ICL) for reasoning and show that standard many-shot rules do not transfer. Across non-reasoning and reasoning-oriented LLMs and across non-reasoning and reasoning tasks, we find: (i) a setting-dependent scaling effect, where increasing the number of CoT demonstrations is unstable for non-reasoning LLMs and benefits mainly reasoning-oriented LLMs; (ii) similarity-based retrieval helps on non-reasoning tasks but fails on reasoning, since semantic similarity poorly predicts procedural (i.e., CoT) compatibility; and (iii) an order-scaling effect, where performance variance grows with more CoT demonstrations. We interpret these behaviors by viewing many-shot CoT-ICL as in-context test-time learning rather than scaled pattern matching, and suggest two principles: (i) demonstrations should be easy for the target model to understand, and (ii) they should be ordered to support a smooth conceptual progression. Guided by these principles, we propose Curvilinear Demonstration Selection (CDS), a simple ordering method that yields up to a 5.42 percentage-point gain on geometry with 64 demonstrations. Overall, our results reframe the long context window from a retrieval buffer into a structured curriculum for in-context test-time learning.
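The abstract does not spell out how CDS orders demonstrations, but the two stated principles (model-understandable demonstrations, smooth conceptual progression) suggest a curriculum-style arrangement. The sketch below is a hypothetical illustration of that idea only, not the paper's algorithm: it assumes each demonstration comes with a precomputed difficulty score (for example, derived from the target model's likelihood of the demonstration's CoT) and sorts easy-to-hard before assembling the many-shot prompt.

```python
# Hypothetical sketch of curriculum-style ordering for many-shot CoT-ICL.
# NOTE: this is NOT the paper's CDS method; difficulty scores are assumed
# to be supplied externally (e.g., from model log-likelihoods of each CoT).

from typing import List, Tuple


def order_demonstrations(demos: List[Tuple[str, float]]) -> List[str]:
    """Sort (demonstration, difficulty) pairs from easiest to hardest so
    consecutive demonstrations form a smooth conceptual progression."""
    return [text for text, _ in sorted(demos, key=lambda d: d[1])]


def build_prompt(ordered_demos: List[str], question: str) -> str:
    """Concatenate the ordered CoT demonstrations ahead of the test question."""
    return "\n\n".join(
        ordered_demos + [f"Q: {question}\nA: Let's think step by step."]
    )


# Toy usage: scores and demonstrations are illustrative placeholders.
demos = [
    ("Q: What is 2+2?\nA: 2+2 = 4.", 0.1),
    ("Q: Evaluate the integral of x^2.\nA: x^3/3 + C.", 0.9),
    ("Q: What is 12*3?\nA: 12*3 = 36.", 0.4),
]
prompt = build_prompt(order_demonstrations(demos), "What is 7*8?")
```

Under this reading, the long context window acts less like a retrieval buffer and more like a curriculum: the ordering, not just the selection, carries signal.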
Community
We believe this work takes a step toward reframing ICL, from pattern matching to in-context test-time learning, by proposing two principles: reasoning performance relies on demonstrations being both understandable to the model and smoothly sequenced to support conceptual progression.