Hugging Face Daily Papers · 5 min read

LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.29559


Published on May 28, 2026 · Submitted by Xinyu Lu on May 29, 2026
Authors: Xiaoxuan Peng, Kaiqi Zhang, Xinyu Lu, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun

AI-generated summary: LiteCoder-Terminal-Gen enables scalable training of language agents for terminal environments through synthetic, executable environments that outperform traditional methods.

Abstract

Mastering terminal environments requires language agents capable of multi-step planning, feedback-grounded execution, and dynamic state adaptation. However, training such agents is currently bottlenecked by a reliance on scraped external repositories, which limits domain diversity, environment controllability, and the targeting of specific capability deficits. We introduce LiteCoder-Terminal-Gen, a zero-dependency synthesis pipeline that autonomously generates executable and verifiable terminal training environments directly from domain specifications. Using this framework, we construct two large-scale resources: LiteCoder-Terminal-SFT, comprising 11,255 expert trajectories across 10 domains, and LiteCoder-Terminal-RL, featuring 602 verifiable environments for trajectory-level preference optimization. Supervised fine-tuning of Qwen-family models on our SFT dataset yields agents that significantly outperform their base counterparts. Notably, our 32B variant achieves 29.06%, 18.54%, and 34.00% pass@1 on Terminal Bench 1.0, 2.0, and Pro, respectively. Furthermore, applying Direct Multi-turn Preference Optimization (DMPO) on our RL environments yields additional performance gains. These results systematically demonstrate that fully synthetic, executable environments offer a scalable and verifiable supervision signal for mastering complex, real-world command-line workflows.
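
The pass@1 figures above are the share of benchmark tasks solved on a single attempt. The abstract does not say how many attempts were sampled per task or how scores were averaged, so the following is only a minimal sketch under stated assumptions: it uses the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), for a task with n attempts of which c passed, and averages it over tasks. The function names, task names, and verifier outcomes are hypothetical and do not come from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n attempts sampled, c of them passed."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a passing attempt.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: dict[str, list[bool]], k: int = 1) -> float:
    """Mean pass@k over tasks; results maps a task id to per-attempt verifier outcomes."""
    scores = [pass_at_k(len(attempts), sum(attempts), k) for attempts in results.values()]
    return sum(scores) / len(scores)


# Hypothetical verifier outcomes for three made-up terminal tasks (not paper data).
demo_results = {
    "fix-broken-cron": [True, False, True, True],
    "rotate-logs": [False, False, False, False],
    "build-from-source": [True, True, True, True],
}

print(f"pass@1 = {benchmark_pass_at_k(demo_results, k=1):.2%}")
```

For k = 1 the estimator reduces to c / n per task, so with one attempt per task pass@1 is simply the fraction of tasks whose verifier accepted the final environment state.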

Community


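Librarian Bot (automated comment): the following similar papers were recommended by the Semantic Scholar API.

* Terminal-World: Scaling Terminal-Agent Environments via Agent Skills (2026), https://huggingface.co/papers/2605.20876
* TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks (2026), https://huggingface.co/papers/2605.22535
* Toward Scalable Terminal Task Synthesis via Skill Graphs (2026), https://huggingface.co/papers/2604.25727
* The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence (2026), https://huggingface.co/papers/2605.26494
* ClawGym: A Scalable Framework for Building Effective Claw Agents (2026), https://huggingface.co/papers/2604.26904
* Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks? (2026), https://huggingface.co/papers/2605.03195
* Synthetic Sandbox for Training Machine Learning Engineering Agents (2026), https://huggingface.co/papers/2604.04872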


Get this paper in your agent:

hf papers read 2605.29559
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

