Tmax: A simple recipe for terminal agents
Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.
Abstract
Tmax is an open RL recipe for terminal agents that reaches 27% on Terminal-Bench 2.0 with a 9B-parameter model, using a simple outcome-only training recipe and a terminal dataset over 2.5x larger than previously released ones.
Terminal-using agents have quickly become the most popular downstream application of language models (LMs). Despite their prevalence, relatively little academic work has examined RL-based training of these models, likely due to difficult benchmarks, a lack of data, and a lack of simple baseline recipes. We present Tmax, the strongest open RL recipe for terminal agents to date, bringing open data recipes closer to the frontier. While simple, our recipe achieves 27% on Terminal-Bench 2.0 with only 9B parameters, outperforming much larger models from prior work. Concretely, we generate data using a novel taxonomy, combining difficulty control, personas, and verifier diversification, which allows us to cheaply generate large amounts of terminal environments for RL and SFT training. We open-source our terminal dataset, which is over 2.5x larger than previously released terminal-agent datasets. We then train open-weight models using RL with our data, using a simple, outcome-only recipe. We release our data, models, and code as a strong baseline for future open academic work on terminal agents at https://github.com/hamishivi/tmax.
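As a rough illustration of what "outcome-only" can mean for a terminal task, the sketch below scores an episode solely by whether a verifier command succeeds once the agent has finished, with no credit for intermediate steps. It is not taken from the Tmax codebase: the function name `outcome_reward`, the binary 0/1 reward, and the use of a process exit code as the verifier signal are all assumptions made for this example.

```python
import subprocess
import sys
from typing import Sequence


def outcome_reward(verifier_cmd: Sequence[str], timeout_s: float = 30.0) -> float:
    """Hypothetical outcome-only reward: 1.0 if the verifier exits with 0, else 0.0.

    Assumption: the verifier is an ordinary command run against the final
    state of the environment after the agent has stopped acting.
    """
    try:
        result = subprocess.run(
            list(verifier_cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A verifier that hangs or cannot be launched counts as a failure.
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0


if __name__ == "__main__":
    # Stand-in verifiers: one that passes and one that fails.
    passing = [sys.executable, "-c", "raise SystemExit(0)"]
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(outcome_reward(passing), outcome_reward(failing))
```

A reward of this shape depends only on the end state, so the same scoring function can be reused across environments as long as each one ships its own verifier command.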