Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
Authors: Yafu Li, Runzhe Zhan, Haoran Zhang, Shunkai Zhang, Yizhuo Li, Zhilin Wang, Jiacheng Chen, Futing Wang, Xuyang Hu, Yuchen Fan, Bangjie Xu, Yucheng Su, Xinmiao Han, Chenxi Li, Haodi Lei, Yufeng Zhao, Zejin Lin, Qianjia Cheng, Tong Zhu, Xiaoye Qu, Ganqu Cui, Peng Ye, Yun Luo, Zhouchen Lin, Yu Qiao, Bowen Zhou, Ning Ding, Yu Cheng
Published on May 13, 2026 · arXiv: 2605.13301 · Project page: https://simplified-reasoning.github.io/SU-01 · GitHub: https://github.com/Simplified-Reasoning/SU-01
Abstract
AI-generated summary
A systematic approach transforms post-trained reasoning models into rigorous olympiad-level solvers through a reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling, achieving gold-medal performance on mathematics and physics competitions.
Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. In this paper, we introduce a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. The recipe first uses a reverse-perplexity curriculum for SFT to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage RL pipeline that progresses from RL with verifiable rewards to finer-grained proof-level RL, and finally boosts solving performance with test-time scaling. Applying this recipe, we train a 30B-A3B backbone with SFT on around 340K sub-8K-token trajectories, followed by 200 RL steps. The resulting model, SU-01, supports stable reasoning on difficult problems with trajectories exceeding 100K tokens, while achieving gold-medal-level performance on mathematics and physics olympiad competitions, including IMO 2025/USAMO 2026 and IPhO 2024/2025. It also demonstrates strong generalization of scientific reasoning to domains beyond mathematics and physics.
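The abstract only names the reverse-perplexity curriculum without spelling out how it is built. The sketch below is one minimal way such a curriculum could be constructed, not the SU-01 implementation: it scores each SFT trajectory by the backbone's perplexity on the target tokens and orders the data by that score. The placeholder backbone name, the helper functions, and the easiest-first ordering are all assumptions for illustration.

```python
# Illustrative sketch only (not the SU-01 implementation): build an SFT
# curriculum by scoring each trajectory with the backbone's perplexity on
# the target tokens and sorting the data by that score.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BACKBONE = "Qwen/Qwen2.5-7B-Instruct"  # placeholder backbone, not the paper's 30B-A3B model

tokenizer = AutoTokenizer.from_pretrained(BACKBONE)
model = AutoModelForCausalLM.from_pretrained(BACKBONE, torch_dtype=torch.bfloat16)
model.eval()

@torch.no_grad()
def target_perplexity(prompt: str, target: str) -> float:
    """Perplexity of the target trajectory given the prompt, masking out prompt tokens."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + target, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt tokens in the loss
    loss = model(input_ids=full_ids, labels=labels).loss  # mean NLL over target tokens
    return math.exp(loss.item())

def build_curriculum(pairs):
    """Sort (prompt, target) pairs by perplexity; lower-perplexity (easier) examples come first.
    Whether 'reverse' means easy-to-hard or the opposite is an assumption to verify against the repo."""
    return sorted(pairs, key=lambda pt: target_perplexity(*pt))
```

In practice the ordering direction and any staged bucketing would need to be checked against the released training code rather than inferred from the abstract alone.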
Community
Discussion (1)

Yafu Li (yaful):
We have open-sourced our code and model. Please check out our project page and GitHub repository:
https://simplified-reasoning.github.io/SU-01/
https://github.com/Simplified-Reasoning/SU-01
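For readers who want to try the released model, a minimal loading sketch follows; the Hugging Face model id `Simplified-Reasoning/SU-01` is a hypothetical placeholder, so check the GitHub repository above for the actual checkpoint name, chat template, and recommended generation settings.

```python
# Hypothetical usage sketch: the model id below is a placeholder guess, not
# confirmed by this page; see the GitHub repository for the real checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Simplified-Reasoning/SU-01"  # placeholder id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(input_ids.to(model.device), max_new_tokens=4096)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```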