arXiv: https://arxiv.org/abs/2605.07237
Authors: Hyeon Hwang, Jiwoo Lee, Jaewoo Kang
Organization: Korea University
Published: 2026-05-11
Teaching Language Models to Think in Code
Abstract
AI-generated summary: The ThinC framework enables mathematical problem solving in which code serves as the primary reasoning mechanism rather than a verification tool, outperforming TIR baselines on competition-level math benchmarks.
Tool-integrated reasoning (TIR) has emerged as a dominant paradigm for mathematical problem solving in language models, combining natural language (NL) reasoning with code execution. However, this interleaved setup has three key limitations: code often acts as a post-hoc verifier, intermediate NL computations are error-prone, and NL and code play overlapping rather than clearly distinct roles. We propose ThinC (Thinking in Code), a framework in which code itself serves as the reasoner rather than as a tool invoked by NL. A ThinC trajectory begins with a brief NL planning step, after which all reasoning unfolds through code blocks connected only by their execution outputs. We distill 12.2k code-centric trajectories from a teacher model and train ThinC-1.7B and ThinC-4B with supervised fine-tuning followed by reinforcement learning. ThinC-4B consistently outperforms every TIR baseline on five competition-level math benchmarks and even surpasses the much larger Qwen3-235B-A22B-Thinking. Further analysis shows that ThinC reasons through code: 99.2% of its final answers are grounded in interpreter output, and the model recovers reliably from code execution failures without intermediate NL reasoning. Our code and models will be released soon.
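The trajectory shape described in the abstract (a brief NL plan, then code blocks linked only by their execution outputs) can be illustrated with a small sketch. This is not the authors' implementation, which has not been released; `execute`, `run_trajectory`, and `toy_policy` are hypothetical names, and the scripted `toy_policy` stands in for the trained model. It assumes each block runs in a shared Python namespace and that only the captured output, or the error message on failure, is passed forward.

```python
import contextlib
import io
import traceback
from typing import Callable, List, Optional, Tuple

# Hypothetical type for illustration: (code block, interpreter output).
Step = Tuple[str, str]


def execute(code: str, namespace: dict) -> str:
    """Run one code block; return its stdout, plus the error line on failure."""
    buffer = io.StringIO()
    error = ""
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception:
        # Keep only the final traceback line, e.g. "NameError: ...".
        error = traceback.format_exc().strip().splitlines()[-1]
    return (buffer.getvalue() + error).strip()


def run_trajectory(
    plan: str,
    policy: Callable[[str, List[Step]], Optional[str]],
    max_steps: int = 8,
) -> List[Step]:
    """Alternate between the policy and the interpreter with no NL in between."""
    namespace: dict = {}
    steps: List[Step] = []
    for _ in range(max_steps):
        code = policy(plan, steps)  # sees only the plan and prior (code, output) pairs
        if code is None:
            break
        steps.append((code, execute(code, namespace)))
    return steps


def toy_policy(plan: str, steps: List[Step]) -> Optional[str]:
    """Scripted stand-in for the model, including one deliberate failure."""
    if not steps:
        return "print(sum(k * k for k in rnge(1, 11)))"  # typo raises NameError
    if "Error" in steps[-1][1]:
        return "print(sum(k * k for k in range(1, 11)))"  # retry with the fix
    return None  # stop: the last interpreter output is the final answer


steps = run_trajectory("Sum the squares of 1..10 by direct enumeration.", toy_policy)
final_answer = steps[-1][1]
print(final_answer)
```

In this toy run the first block fails, the error text is the only signal passed forward, and the second block succeeds; the final answer is read directly from interpreter output rather than from an NL computation.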