FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale
Authors: Runyuan He, Qiuyang Mang, Shang Zhou, Kaiyuan Liu, Hanchen Li, Huanzhi Mao, Qizheng Zhang, Zerui Li, Bo Peng, Lufeng Cheng, Tianfu Fu, Yichuan Wang, Wenhao Chai, Jingbo Shang, Alex Dimakis, Joseph E. Gonzalez, Alvin Cheung
AI-generated summary
FrontierSmith automates the creation of open-ended coding problems from closed-ended tasks, improving LLM coding performance on benchmarks through diverse problem variants and enhanced agent interactions.
Abstract
Many real-world coding challenges are open-ended and admit no known optimal solution. Yet, recent progress in LLM coding has focused on well-defined tasks such as feature implementation, bug fixing, and competitive programming. Open-ended coding remains a weak spot for LLMs, largely because open-ended training problems are scarce and expensive to construct. Our goal is to synthesize open-ended coding problems at scale to train stronger LLM coders. We introduce FrontierSmith, an automated system for iteratively evolving open-ended problems from existing closed-ended coding tasks. Starting from competitive programming problems, FrontierSmith generates candidate open-ended variants by changing the problems' goals, restricting outputs, and generalizing inputs. It then uses a quantitative idea divergence metric to select problems that elicit genuinely diverse approaches from different solvers. Agents then generate test cases and verifiers for the surviving candidates. On two open-ended coding benchmarks, training on our synthesized data yields substantial gains over the base models: Qwen3.5-9B improves by +8.82 score on FrontierCS and +306.36 (Elo-rating-based performance) on ALE-bench; Qwen3.5-27B improves by +12.12 and +309.12, respectively. The synthesized problems also make agents take more turns and use more tokens, similar to human-curated ones, suggesting that closed-ended seeds can be a practical starting point for long-horizon coding data.
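The abstract describes the idea divergence metric only at a high level. A minimal sketch of how such a filter could work, assuming divergence is scored as the mean pairwise cosine distance between embeddings of different solvers' solutions (the embedding source, the `idea_divergence` / `select_open_ended` names, and the threshold value are all hypothetical, not from the paper):

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def idea_divergence(solution_embeddings):
    """Mean pairwise cosine distance over one problem's solver pool.

    Higher values mean the problem elicited more diverse approaches,
    which is the property the selection stage looks for.
    """
    pairs = list(combinations(solution_embeddings, 2))
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)

def select_open_ended(candidates, threshold=0.5):
    """Keep candidate problems whose solutions diverge enough.

    `candidates` is a list of (problem_name, solution_embeddings) pairs;
    the 0.5 threshold is an illustrative placeholder.
    """
    return [name for name, embeddings in candidates
            if idea_divergence(embeddings) >= threshold]

# Orthogonal embeddings (diverse ideas) pass; near-identical ones are dropped.
candidates = [
    ("diverse-variant", [[1.0, 0.0], [0.0, 1.0]]),
    ("similar-variant", [[1.0, 0.0], [1.0, 0.001]]),
]
print(select_open_ended(candidates))  # → ['diverse-variant']
```

In the system described by the abstract, the surviving candidates would then be handed to agents that generate test cases and verifiers; this sketch covers only the divergence-based selection step.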