Hugging Face Daily Papers · June 11, 2026 · 5 min read

Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


Authors: Hao Xiang, Qiaoyu Tang, Le Yu, Yaojie Lu, Xianpei Han, Ben He, Le Sun, Bowen Yu, Peng Wang, Hongyu Lin, Dayiheng Liu

Reinforcement Learning (RL) with verifiable environments has emerged as a powerful approach for enhancing the reasoning capabilities of Large Language Models (LLMs). While prior research demonstrates that scaling environment quantity improves RL performance, existing manual or individual construction methods suffer from linear scaling limits, thereby hindering scalable reasoning generalization. This paper introduces RACES (Recursive Automated Composition for Environment Scaling), a framework that conceptualizes verifiable environments as composable building blocks that can be recursively assembled. The key insight is that when the codomain (output type) of one environment matches the domain (input type) of another, they can be automatically fused into a new verifiable environment, enabling recursive composition. RACES is implemented with 300 individual environments and defines a set of composition operators (SEQUENTIAL, PARALLEL, SORT, and SELECT) that induce diverse reasoning patterns. Extensive experiments show that RL training on these composite environments consistently enhances reasoning generalization. Specifically, RACES improves DeepSeek-R1-Distill-Qwen-14B by an average of 3.1 points (from 48.2 to 51.3) and boosts Qwen3-14B performance from 58.8 to 61.1 on six benchmarks, which are unseen during the construction of training environments. Moreover, RACES achieves performance comparable to training on 300 individual environments using only 50 base environments, demonstrating significant efficiency in environment utilization.
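The type-matching idea in the abstract can be illustrated with a small sketch. This is not the RACES implementation: the `Env` class, the `sequential` function, and the three toy environments (`sum_list`, `digits`, `is_even`) are hypothetical names invented here, and the sketch assumes that SEQUENTIAL means feeding one environment's output into the next. The abstract does not define the semantics of PARALLEL, SORT, or SELECT, so they are left out.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Env:
    """Hypothetical verifiable environment: a typed task plus a reference solver."""
    name: str
    domain: type                 # input type
    codomain: type               # output type
    solve: Callable[[Any], Any]  # reference solution used for verification

    def verify(self, x: Any, answer: Any) -> bool:
        # An answer is accepted when it equals the reference solution.
        return self.solve(x) == answer


def sequential(f: Env, g: Env) -> Env:
    """Fuse f then g (assumed SEQUENTIAL semantics); requires f.codomain == g.domain."""
    if f.codomain is not g.domain:
        raise TypeError(
            f"cannot compose {f.name} -> {g.name}: "
            f"{f.codomain.__name__} does not match {g.domain.__name__}"
        )
    return Env(
        name=f"SEQUENTIAL({f.name}, {g.name})",
        domain=f.domain,
        codomain=g.codomain,
        solve=lambda x: g.solve(f.solve(x)),
    )


# Three toy base environments (illustrative only).
sum_list = Env("sum_list", list, int, sum)
digits = Env("digits", int, list, lambda n: [int(c) for c in str(abs(n))])
is_even = Env("is_even", int, bool, lambda n: n % 2 == 0)

# The result of a composition is itself an Env, so it can be composed again.
digit_sum = sequential(sequential(sum_list, digits), sum_list)  # list -> int
task = sequential(digit_sum, is_even)                           # list -> bool

print(task.name)
print(task.verify([12, 30, 7], False))  # sum 49 -> digits [4, 9] -> sum 13 -> odd
```

Because every composite carries its own `domain`, `codomain`, and `solve`, it stays verifiable and can serve as a building block for further compositions, which is the recursive property the abstract describes. A mismatched pair such as `sequential(is_even, sum_list)` raises `TypeError` in this sketch.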


arxiv:2606.12373


Published on Jun 10 · Submitted by xianghao on Jun 11

Qwen


Abstract

A recursive automated composition framework enables scalable reinforcement learning for language models by automatically combining verifiable environments through composition operators.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct
