TMax: A Simple Recipe for Terminal Agents

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

TMax is the strongest open RL recipe for terminal agents to date, bringing open data recipes closer to the frontier. We release two things. The first is TMax-15k, a dataset of 14,600 RL environments built from a compositional pipeline with explicit control over difficulty and diversity. It is over 2.5× larger than the next-largest open terminal dataset that releases full environment data. The second is a simple, outcome-only RL recipe (GRPO plus a few stability fixes), which we use to train a family of open models from 2B to 27B.
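For readers unfamiliar with GRPO, here is a minimal sketch of the group-relative advantage step that an outcome-only setup relies on. It assumes a binary pass/fail reward per rollout and the commonly used mean/std normalization within a group. The function name, the `eps` value, and the sample rewards are illustrative assumptions, not taken from the TMax release, and the "stability fixes" mentioned above are not described in this post, so they are not represented here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize outcome rewards within one group of rollouts.

    Hypothetical helper for illustration: each rollout's advantage is its
    reward minus the group mean, divided by the group standard deviation.
    `eps` avoids division by zero when every rollout gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up example: eight rollouts of the same terminal task,
# where 1.0 means the final outcome check passed and 0.0 means it failed.
outcomes = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(outcomes)

for reward, adv in zip(outcomes, advantages):
    print(f"reward={reward:.0f}  advantage={adv:+.3f}")
```

Successful rollouts get a positive advantage and failed ones a negative advantage, with no per-step reward involved. A group where every rollout has the same outcome yields all-zero advantages, so it contributes no learning signal under this scheme.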

TMax-9B reaches 27.2% on Terminal Bench 2.0. Under official Terminal Bench settings this is the strongest open-weights model under 10B we are aware of: it beats 32B terminal agents from prior work and approaches closed models like Claude Haiku 4.5 (29.8%). Scaling the same recipe up, TMax-27B improves to 42.7%, approaching models 10 to 40× its size like the 1T-parameter Kimi K2.5 (43.2%).

#JustSharing. I have no idea what to do with this.

submitted by /u/pmttyji
