Hugging Face Daily Papers · June 19, 2026 · 4 min read

Playful Agentic Robot Learning

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.



arxiv:2606.19419

Published on June 17, 2026 · Submitted by Junyi Zhang on June 19, 2026 · #2 Paper of the day · UC Berkeley


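Authors: Junyi Zhang, Jiaxin Ge, Hanjun Yoo, Letian Fu, Zihan Yang, Yaowei Liu, Raj Saravanan, Shaofeng Yin, Justin Yu, Dantong Niu, Zirui Wang, Roei Herzig, Ken Goldberg, Yutong Bai, David M. Chan, Ion Stoica, Angjoo Kanazawa, Jiahui Lei, Haiwen Feng, Trevor Darrell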

Abstract

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): Embodied robots learn reusable skills through self-directed play and exploration, then apply these skills to improve performance on downstream tasks without additional training.

Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions. We study Playful Agentic Robot Learning, where an embodied coding agent uses self-directed play as a continual skill-learning stage before downstream tasks arrive. We introduce RATs, Robotics Agent Teams designed for play-time skill acquisition. During play, RATs proposes novel yet learnable exploratory tasks, plans and executes robot-code policies, verifies intermediate progress, diagnoses failures, retries with dense, step-level feedback, and distills successful executions into a persistent code skill library. At test time, the agent reuses relevant skills from this frozen library to help solve new tasks. Experiments in LIBERO-PRO and MolmoSpaces show that play-learned skills improve held-out downstream tasks over no-play and random-play baselines, with 20.6 and 17.0 percentage-point gains over CaP-Agent0 on LIBERO-PRO and MolmoSpaces, respectively. Moreover, the learned skills can be plugged into other inference-time Code-as-Policy agents by simply retrieving them into the context, improving RoboSuite and real-world transfer by 8.9 and 8.8 points, respectively, without finetuning the underlying model.
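The abstract describes a persistent code skill library that is written during play, frozen before downstream tasks, and consulted at test time by retrieving relevant skills into the agent's context. The Python sketch below illustrates that contract under stated assumptions: the `Skill` and `SkillLibrary` classes, `build_planner_context`, the stop-word list, and the word-overlap relevance score are all hypothetical stand-ins, not the RATs implementation, and the abstract does not say how retrieval is actually scored.

```python
from dataclasses import dataclass, field

# Hypothetical stop-word list so that filler words do not count as evidence of relevance.
STOPWORDS = {"a", "an", "the", "in", "on", "of", "to", "by", "its", "into"}


def _tokens(text: str) -> set:
    """Lowercase content words; underscores in skill names become spaces."""
    return set(text.lower().replace("_", " ").split()) - STOPWORDS


@dataclass
class Skill:
    """A reusable code skill distilled from a successful play episode (hypothetical schema)."""
    name: str
    description: str
    code: str


@dataclass
class SkillLibrary:
    """Persistent code skill library: written during play, frozen before evaluation."""
    skills: list = field(default_factory=list)
    frozen: bool = False

    def add(self, skill: Skill) -> None:
        # Only play time may write; test time is read-only.
        if self.frozen:
            raise RuntimeError("library is frozen; skills cannot be added at test time")
        self.skills.append(skill)

    def freeze(self) -> None:
        self.frozen = True

    def retrieve(self, task: str, k: int = 2) -> list:
        """Return up to k skills that share at least one content word with the task."""
        task_words = _tokens(task)

        def score(skill: Skill) -> int:
            # Assumed relevance measure: number of shared content words.
            return len(task_words & _tokens(skill.name + " " + skill.description))

        ranked = sorted(self.skills, key=score, reverse=True)
        return [s for s in ranked[:k] if score(s) > 0]


def build_planner_context(task: str, library: SkillLibrary, k: int = 2) -> str:
    """Paste retrieved skill code into the planner prompt; no model weights change."""
    parts = [f"# Task: {task}", "# Reusable skills learned during play:"]
    for skill in library.retrieve(task, k):
        parts.append(f"# {skill.description}\n{skill.code}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    lib = SkillLibrary()
    lib.add(Skill("open_drawer", "open the top drawer by pulling its handle",
                  "def open_drawer(robot): ..."))
    lib.add(Skill("stack_blocks", "stack one block on another block",
                  "def stack_blocks(robot, a, b): ..."))
    lib.freeze()  # play is over; downstream tasks only read from the library
    print(build_planner_context("put the bowl in the top drawer", lib, k=1))
```

Because the library only changes the prompt, the same frozen skills could in principle be placed in another Code-as-Policy agent's context, which is the kind of plug-in transfer the abstract reports. A real system would likely use a stronger retriever (for example, embedding similarity) than word overlap.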

Project page: https://playful-rats.github.io/ · GitHub: https://github.com/Playful-RATs/rats (15 stars)

Community

Junyi42 (paper submitter)

RATs is a multi-agent Code-as-Policy system for lifelong robot skill learning. During free-form play, a team of LLM agents invents its own tasks, writes Code-as-Policy programs, and distills successful executions into a reusable skill library; at evaluation, those skills are reused as planner context. There are no gradients and no RL: all learning happens through structured natural-language feedback and code reuse.
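The submitter's summary stresses that nothing is learned by gradient updates: improvement comes from natural-language feedback between attempts and from storing code that worked. A minimal Python sketch of one play episode under that reading follows. The role names (`propose`, `write_code`, `execute`, `verify`), the retry budget `max_attempts`, and the toy robot stubs are hypothetical; in RATs these roles are played by LLM agents together with a simulator or real robot, whose interfaces are not described here.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical role signatures. In RATs these roles are LLM agents plus a robot or
# simulator; here they are plain Python callables so the loop can run on its own.
Proposer = Callable[[Dict[str, str]], str]                     # skill library -> new task
Coder = Callable[[str, Optional[str]], str]                    # (task, feedback) -> policy code
Executor = Callable[[str], Dict[str, bool]]                    # policy code -> step outcomes
Verifier = Callable[[str, Dict[str, bool]], Tuple[bool, str]]  # -> (success, feedback text)


def play_episode(propose: Proposer, write_code: Coder, execute: Executor,
                 verify: Verifier, library: Dict[str, str],
                 max_attempts: int = 3) -> Optional[str]:
    """Run one self-directed play episode and return the learned task, if any.

    Nothing is trained: the only state that changes is the feedback string passed
    back to the coder and the library dict that stores successful programs.
    """
    task = propose(library)
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        code = write_code(task, feedback)
        outcome = execute(code)
        success, feedback = verify(task, outcome)
        if success:
            library[task] = code  # distill the working program into a reusable skill
            return task
    return None  # retry budget exhausted; nothing is added to the library


# --- Toy stand-ins (hypothetical) so the loop can be exercised without a robot ---
def toy_propose(library: Dict[str, str]) -> str:
    # Prefer a task the library cannot already do.
    return "stack cube" if "lift cube" in library else "lift cube"


def toy_write_code(task: str, feedback: Optional[str]) -> str:
    # First attempt forgets to open the gripper; step-level feedback fixes it.
    if feedback and "open the gripper" in feedback:
        return "open_gripper(); move_to('cube'); close_gripper(); lift()"
    return "move_to('cube'); close_gripper(); lift()"


def toy_execute(code: str) -> Dict[str, bool]:
    return {"grasped": code.startswith("open_gripper")}


def toy_verify(task: str, outcome: Dict[str, bool]) -> Tuple[bool, str]:
    if outcome["grasped"]:
        return True, "all steps succeeded"
    return False, "step close_gripper failed: open the gripper before approaching"


if __name__ == "__main__":
    skills: Dict[str, str] = {}
    learned = play_episode(toy_propose, toy_write_code, toy_execute, toy_verify, skills)
    print(learned, skills)
```

In the toy run, `lift cube` is learned on the second attempt, after the verifier's step-level message tells the coder what went wrong. With `max_attempts=1` the episode fails and the library is left untouched, so only verified successes become skills.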


Get this paper in your agent:

hf papers read 2606.19419

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

