Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
Authors: Yafu Li, Runzhe Zhan, Haoran Zhang, Shunkai Zhang, Yizhuo Li, Zhilin Wang, Jiacheng Chen, Futing Wang, Xuyang Hu, Yuchen Fan, Bangjie Xu, Yucheng Su, Xinmiao Han, Chenxi Li, Haodi Lei, Yufeng Zhao, Zejin Lin, Qianjia Cheng, Tong Zhu, Xiaoye Qu, Ganqu Cui, Peng Ye, Yun Luo, Zhouchen Lin, Yu Qiao, Bowen Zhou, Ning Ding, Yu Cheng
Published on May 13, 2026 · arXiv: 2605.13301 · Project page: https://simplified-reasoning.github.io/SU-01 · GitHub: https://github.com/Simplified-Reasoning/SU-01
Abstract
AI-generated summary
A systematic approach transforms post-trained reasoning models into rigorous olympiad-level solvers through a reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling, achieving gold-medal performance on mathematics and physics competitions.
Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. In this paper, we introduce a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. The recipe first uses a reverse-perplexity curriculum for SFT to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage RL pipeline that progresses from RL with verifiable rewards to finer-grained proof-level RL, and finally boosts solving performance with test-time scaling. Applying this recipe, we train a 30B-A3B backbone with SFT on around 340K sub-8K-token trajectories, followed by 200 RL steps. The resulting model, SU-01, supports stable reasoning on difficult problems with trajectories exceeding 100K tokens, while achieving gold-medal-level performance on mathematics and physics olympiad competitions, including IMO 2025/USAMO 2026 and IPhO 2024/2025. It also demonstrates strong generalization of scientific reasoning to domains beyond mathematics and physics.
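The abstract only names the reverse-perplexity curriculum without spelling out how it is built. The sketch below is one minimal way such a curriculum could be constructed, not the SU-01 implementation: it scores each SFT trajectory by the backbone's perplexity on the target tokens and orders the data by that score. The placeholder backbone name, the helper functions, and the easiest-first ordering are all assumptions for illustration.

```python
# Illustrative sketch only (not the SU-01 implementation): build an SFT
# curriculum by scoring each trajectory with the backbone's perplexity on
# the target tokens and sorting the data by that score.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BACKBONE = "Qwen/Qwen2.5-7B-Instruct"  # placeholder backbone, not the paper's 30B-A3B model

tokenizer = AutoTokenizer.from_pretrained(BACKBONE)
model = AutoModelForCausalLM.from_pretrained(BACKBONE, torch_dtype=torch.bfloat16)
model.eval()

@torch.no_grad()
def target_perplexity(prompt: str, target: str) -> float:
    """Perplexity of the target trajectory given the prompt, masking out prompt tokens."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + target, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt tokens in the loss
    loss = model(input_ids=full_ids, labels=labels).loss  # mean NLL over target tokens
    return math.exp(loss.item())

def build_curriculum(pairs):
    """Sort (prompt, target) pairs by perplexity; lower-perplexity (easier) examples come first.
    Whether 'reverse' means easy-to-hard or the opposite is an assumption to verify against the repo."""
    return sorted(pairs, key=lambda pt: target_perplexity(*pt))
```

In practice the ordering direction and any staged bucketing would need to be checked against the released training code rather than inferred from the abstract alone.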
Community
Discussion (1)

Yafu Li (yaful):
We have open-sourced our code and model. Please check out our project page and GitHub repository:
https://simplified-reasoning.github.io/SU-01/
https://github.com/Simplified-Reasoning/SU-01
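For readers who want to try the released model, a minimal loading sketch follows; the Hugging Face model id `Simplified-Reasoning/SU-01` is a hypothetical placeholder, so check the GitHub repository above for the actual checkpoint name, chat template, and recommended generation settings.

```python
# Hypothetical usage sketch: the model id below is a placeholder guess, not
# confirmed by this page; see the GitHub repository for the real checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Simplified-Reasoning/SU-01"  # placeholder id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(input_ids.to(model.device), max_new_tokens=4096)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```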