Continual Harness: Online Adaptation for Self-Improving Foundation Agents [R]
Mirrored from r/MachineLearning for archival readability.
Sharing a new paper from the GPP and PokeAgent teams. Gemini Plays Pokémon (GPP) was the first AI system to complete Pokémon Blue, Yellow Legacy on hard mode, and Crystal without losing a battle. How? Early signs of iterative harness development. In the Blue era, a human watched the stream and edited the harness by hand. By Yellow Legacy and Crystal, the model itself was performing most of the editing through general meta-tools (define_agent, run_code, notepad edits). Our new paper, Continual Harness: Online Adaptation for Self-Improving Foundation Agents, formalizes this loop and automates the refining role end to end. We then carry the same loop into training, enabling model-harness co-learning. The takeaways are in the paper (arXiv): https://arxiv.org/abs/2605.09998
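The post describes a loop in which the model edits its own harness through meta-tools rather than waiting for a human operator. A minimal sketch of that idea is below; every name here (Harness, define_agent, run_code, refine) is hypothetical and illustrative only, not the paper's actual API.

```python
# Hypothetical sketch of a harness self-editing loop.
# All names are illustrative assumptions, not taken from the paper.
from dataclasses import dataclass, field


@dataclass
class Harness:
    """Scaffolding the agent runs inside: tools plus persistent notes."""
    tools: dict = field(default_factory=dict)
    notepad: list = field(default_factory=list)

    def define_agent(self, name, policy):
        # meta-tool: register a new sub-agent (a callable policy)
        self.tools[name] = policy

    def run_code(self, fn, *args):
        # meta-tool: execute model-authored code inside the harness
        return fn(*args)


def refine(harness, feedback):
    """The 'refining role': update the harness from episode feedback."""
    harness.notepad.append(feedback)          # notepad edit
    if feedback.get("stuck"):
        # if progress stalled, add a helper sub-agent for next episode
        harness.define_agent("unstuck_helper", lambda obs: "explore")
    return harness


def run_episode(harness, stuck):
    # stand-in for an actual gameplay episode; returns observed feedback
    return {"stuck": stuck, "steps": 100}


# Online adaptation: play, observe, refine, repeat.
h = Harness()
for stuck in [False, True, False]:
    feedback = run_episode(h, stuck)
    h = refine(h, feedback)
```

In this toy version the "model" is replaced by a fixed rule in `refine`; the point is only the shape of the loop, in which harness edits happen between episodes rather than offline between runs.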