Hugging Face Daily Papers · 4 min read

Playful Agentic Robot Learning

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.19419

Published on Jun 17, 2026 · Submitted by Junyi Zhang on Jun 19 · #2 Paper of the day

Authors: Junyi Zhang, Jiaxin Ge, Hanjun Yoo, Letian Fu, Zihan Yang, Yaowei Liu, Raj Saravanan, Shaofeng Yin, Justin Yu, Dantong Niu, Zirui Wang, Roei Herzig, Ken Goldberg, Yutong Bai, David M. Chan, Ion Stoica, Angjoo Kanazawa, Jiahui Lei, Haiwen Feng, Trevor Darrell

Organization: UC Berkeley
Project page: https://playful-rats.github.io/
Code: https://github.com/Playful-RATs/rats

Abstract

Embodied robots learn reusable skills through self-directed play and exploration, then apply these skills to improve performance on downstream tasks without additional training.

Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions. We study Playful Agentic Robot Learning, where an embodied coding agent uses self-directed play as a continual skill-learning stage before downstream tasks arrive. We introduce RATs, Robotics Agent Teams designed for play-time skill acquisition. During play, RATs proposes novel yet learnable exploratory tasks, plans and executes robot-code policies, verifies intermediate progress, diagnoses failures, retries with dense, step-level feedback, and distills successful executions into a persistent code skill library. At test time, the agent reuses relevant skills from this frozen library to help solve new tasks. Experiments in LIBERO-PRO and MolmoSpaces show that play-learned skills improve held-out downstream tasks over no-play and random-play baselines, with 20.6 and 17.0 percentage-point gains over CaP-Agent0 on LIBERO-PRO and MolmoSpaces, respectively. Moreover, the learned skills can be plugged into other inference-time Code-as-Policy agents by simply retrieving them into the context, improving RoboSuite and real-world transfer by 8.9 and 8.8 points, respectively, without finetuning the underlying model.
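The play-time loop described in the abstract (propose a task, write a policy, execute, retry with feedback, distill on success, then freeze the library) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `Skill`, `SkillLibrary`, and `play`, and the three agent callables, are hypothetical, and the LLM agents and robot environment are replaced by plain functions passed in by the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Skill:
    """A distilled, reusable policy snippet with a short description."""
    name: str
    description: str
    code: str


@dataclass
class SkillLibrary:
    """Persistent skill store; frozen once play ends (hypothetical design)."""
    skills: Dict[str, Skill] = field(default_factory=dict)
    frozen: bool = False

    def add(self, skill: Skill) -> None:
        if self.frozen:
            raise RuntimeError("library is frozen; no skills are added at test time")
        self.skills[skill.name] = skill


# Hypothetical agent roles, modeled as plain callables for illustration.
Proposer = Callable[[SkillLibrary], str]            # library -> task description
Coder = Callable[[str, Optional[str]], str]         # (task, feedback) -> policy code
Executor = Callable[[str], Tuple[bool, str]]        # code -> (success, step-level feedback)


def play(
    library: SkillLibrary,
    propose: Proposer,
    write_policy: Coder,
    execute: Executor,
    episodes: int = 3,
    max_retries: int = 2,
) -> SkillLibrary:
    """Run self-directed play, keep only successful policies, then freeze."""
    for _ in range(episodes):
        task = propose(library)
        feedback: Optional[str] = None
        for _attempt in range(max_retries + 1):
            # Each retry sees the feedback from the previous failed attempt.
            code = write_policy(task, feedback)
            success, feedback = execute(code)
            if success:
                library.add(Skill(name=task, description=task, code=code))
                break
    library.frozen = True
    return library
```

Passing the agents in as callables keeps the loop independent of any particular model or simulator; in this sketch, failed attempts leave no trace in the library, which mirrors the abstract's statement that only successful executions are distilled.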

Community

Paper submitter about 3 hours ago

RATs is a multi-agent Code-as-Policy system for lifelong robot skill learning. During free-form play, a team of LLM agents invents its own tasks, writes code-as-policy programs, and distills successful executions into a reusable skill library. At evaluation, those skills are reused as planner context: no gradients, no RL, all learning through structured natural-language feedback and code reuse.
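Reusing skills "as planner context" amounts to selecting relevant library entries and placing their code in the planner's prompt, with no weight updates. The sketch below illustrates that idea under loud assumptions: the source does not say how RATs retrieves skills, so simple word overlap stands in for the real retriever, and `Skill`, `retrieve_skills`, and `build_planner_context` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Set


class Skill(NamedTuple):
    name: str
    description: str
    code: str


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric word set; a crude stand-in for real retrieval."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_skills(task: str, library: List[Skill], k: int = 2) -> List[Skill]:
    """Return up to k skills whose descriptions share words with the task."""
    task_tokens = tokenize(task)
    scored = []
    for skill in library:
        overlap = len(task_tokens & tokenize(skill.description))
        if overlap > 0:  # skip skills with nothing in common with the task
            scored.append((overlap, skill))
    # Highest overlap first; ties broken by name so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [skill for _, skill in scored[:k]]


def build_planner_context(task: str, skills: List[Skill]) -> str:
    """Assemble prompt text: retrieved skill code first, then the new task."""
    parts = ["# Reusable skills retrieved from the frozen library:"]
    for skill in skills:
        parts.append(f"# {skill.name}: {skill.description}\n{skill.code}")
    parts.append(f"# New task: {task}")
    return "\n\n".join(parts)
```

Because the skills travel as text in the prompt, the same library could in principle be handed to a different Code-as-Policy agent, which matches the abstract's claim that skills transfer "by simply retrieving them into the context".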


Get this paper in your agent:

hf papers read 2606.19419
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

