Hugging Face Daily Papers · June 8, 2026 · 6 min read

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Language models can use verifiable rewards to improve at a wide variety of reasoning tasks. However, both parametric (e.g., RLVR) and non-parametric (e.g., prompt optimization) approaches to doing so typically require hundreds of training samples and thousands of model rollouts, making them expensive in the best case and intractable in the worst. To address this challenge, we introduce Contrastive Reflection (CORE), a non-parametric learning algorithm that compares past reasoning traces to generate insights: short natural-language descriptions of reasoning strategies and constraints that capture differences between successful and unsuccessful problem attempts. Across four reasoning tasks, we demonstrate that CORE enables more rapid improvement than both parametric (GRPO) and non-parametric (GEPA, episodic RAG, and MemRL) methods, while using fewer rollouts. Under fixed rollout budgets with as few as five training samples, CORE achieves the strongest performance in most task–data regimes. Finally, we highlight how CORE is substantially more context-efficient than non-parametric baselines, requiring fewer prompt tokens while storing learned knowledge as compact, interpretable natural-language insights.

Our results therefore suggest that distilling contrasts between successful and unsuccessful reasoning traces into abstract and useful insights can provide a more efficient and interpretable route to model self-improvement than weight updates, prompt optimization, or direct reuse of stored reasoning traces.
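The loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Trace`, `contrastive_reflection`, `solve`, `verify`, and `reflect` are hypothetical, and the toy solver, verifier, and reflector stand in for a language model, a verifiable reward, and an insight-writing model call. It only shows the general shape of the idea: sample several attempts per problem, contrast a successful attempt with an unsuccessful one, and carry the resulting natural-language insight into later prompts without touching any weights.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trace:
    """One attempt at a problem (hypothetical structure, for illustration)."""
    problem: str
    reasoning: str
    answer: str
    correct: bool


# Assumed interfaces; a real system would back these with model calls.
Solver = Callable[[str, List[str]], Tuple[str, str]]  # (problem, insights) -> (reasoning, answer)
Verifier = Callable[[str, str], bool]                 # (problem, answer) -> is the answer correct?
Reflector = Callable[[Trace, Trace], str]             # (success, failure) -> insight text


def contrastive_reflection(problems: List[str], solve: Solver, verify: Verifier,
                           reflect: Reflector, rollouts_per_problem: int = 4) -> List[str]:
    """Collect insights by contrasting successful and failed attempts."""
    insights: List[str] = []
    for problem in problems:
        traces = []
        for _ in range(rollouts_per_problem):
            reasoning, answer = solve(problem, insights)
            traces.append(Trace(problem, reasoning, answer, verify(problem, answer)))
        successes = [t for t in traces if t.correct]
        failures = [t for t in traces if not t.correct]
        # A contrast needs at least one attempt of each kind.
        if successes and failures:
            insight = reflect(successes[0], failures[0])
            if insight not in insights:
                insights.append(insight)
    return insights


# --- Toy stand-ins so the sketch runs without a model ---
def parse(problem: str) -> Tuple[int, int, int]:
    a, rest = problem.split("+")
    b, c = rest.split("*")
    return int(a), int(b), int(c)


def make_toy_solver() -> Solver:
    calls = 0

    def solve(problem: str, insights: List[str]) -> Tuple[str, str]:
        nonlocal calls
        calls += 1
        a, b, c = parse(problem)
        guided = any("multiply first" in s for s in insights)
        # Without guidance, every other attempt ignores operator precedence.
        if guided or calls % 2 == 0:
            return "multiply first, then add", str(a + b * c)
        return "evaluate left to right", str((a + b) * c)

    return solve


def toy_verify(problem: str, answer: str) -> bool:
    a, b, c = parse(problem)
    return answer == str(a + b * c)


def toy_reflect(success: Trace, failure: Trace) -> str:
    return f"Prefer '{success.reasoning}' over '{failure.reasoning}'."


learned = contrastive_reflection(["2+3*4", "1+2*3"], make_toy_solver(),
                                 toy_verify, toy_reflect)
print(learned)
```

In this toy run the first problem yields mixed outcomes, so one insight is stored; the second problem is then solved correctly on every rollout and contributes no new contrast. The learned state is a short list of readable strings, which is the property the abstract highlights as compact and interpretable.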

arxiv:2605.28742

Published on May 27, 2026 · Submitted by Linas Nasvytis on Jun 8, 2026 · Stanford University

Authors: Linas Nasvytis, Simon Jerome Han, Ben Prystawski, Satchel Grant, Noah D. Goodman, Judith E. Fan

Abstract

Contrastive Reflection (CORE) improves language model reasoning by contrasting successful and unsuccessful attempts and distilling the differences into concise, interpretable insights, enabling faster and more efficient self-improvement than traditional parametric and non-parametric approaches.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

arXiv: https://arxiv.org/abs/2605.28742 · Project page: https://linasnasvytis.com/core-reasoning/ · GitHub: https://github.com/LinasNas/core-reasoning

Get this paper in your agent:

hf papers read 2605.28742

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
