Hugging Face Daily Papers · 4 min read

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

A unified embodied foundation model achieving SOTA on embodied VLM and manipulation benchmarks with self-correction capabilities.

arxiv:2606.11324

Published on Jun 9, 2026 · Submitted by taesiri on Jun 11, 2026

Authors: Yifu Yuan, Yaoting Huang, Xianze Yao, Yutong Li, Shuoheng Zhang, Linqi Han, Pengyi Li, Jiangeng Sun, Wenting Jia, Zhao Zhang, Yuhao Liu, Ruihao Liao, Yucheng Hu, Qiyu Wu, Yuxiao Li, Zibin Dong, Fei Ni, Yan Zheng, Shuyang Gu, Yi Ma, Hongyao Tang, Han Hu, Jianye Hao

Project page: https://embodied-r.github.io/
Code: https://github.com/pickxiguapi/Embodied-R1.5

Abstract

AI-generated summary: Embodied-R1.5 is a unified embodied foundation model that integrates embodied reasoning capabilities and achieves state-of-the-art performance on embodied vision-language benchmarks through a multi-task balanced reinforcement learning approach.

We introduce Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates comprehensive embodied reasoning capabilities, spanning embodied cognition, task planning, correction, and pointing, within a single architecture toward general physical intelligence. Leveraging three automated data construction pipelines to significantly expand the data coverage of critical capabilities, we build a large-scale data system of over 15B tokens, and design a multi-task balanced RL recipe to alleviate heterogeneous task conflicts. We further introduce a Planner-Grounder-Corrector (PGC) closed-loop framework that enables a single model to autonomously execute and self-correct over long-horizon tasks. With only 8B parameters, Embodied-R1.5 achieves SOTA on 16 out of 24 embodied VLM benchmarks, surpassing leading models like Gemini-Robotics-ER-1.5 and GPT-5.4. Benefiting from the internalized embodied capabilities, Embodied-R1.5 can be fine-tuned into a VLA with only a small amount of data, outperforming leading VLA models like π_{0.5} across 4 popular manipulation benchmark suites. We further conduct extensive zero-shot real-robot experiments, validating performance in instruction following, affordance grounding, articulated object manipulation, and long-horizon complex tasks, demonstrating strong generalization to the physical world. We open-source model weights, datasets, training code, and EmbodiedEvalKit, an evaluation framework tailored for embodied tasks, to facilitate future research in EFMs.
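As a rough illustration of what a Planner-Grounder-Corrector closed loop could look like, here is a minimal Python sketch. The abstract names the three roles but does not specify their interfaces, so the class name `PGCAgent`, the four callables, the retry budget, and the stubbed behaviors in `demo` are all assumptions made for illustration and are not the authors' implementation. In the paper a single model plays all three roles; in this sketch the callables could all wrap the same model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical scaffolding: the abstract names the roles (planner, grounder,
# corrector) but not their signatures, so these types are assumptions.
Point = Tuple[int, int]


@dataclass
class PGCAgent:
    plan: Callable[[str], List[str]]        # instruction -> ordered subtasks
    ground: Callable[[str], Point]          # subtask -> pixel target to act on
    execute: Callable[[str, Point], bool]   # act; True if the step succeeded
    correct: Callable[[str], str]           # failed subtask -> revised subtask
    max_retries: int = 2                    # assumed retry budget per subtask
    log: List[str] = field(default_factory=list)

    def run(self, instruction: str) -> bool:
        """Plan once, then ground/execute each subtask, correcting on failure."""
        for subtask in self.plan(instruction):
            attempt = subtask
            for _ in range(self.max_retries + 1):
                target = self.ground(attempt)
                if self.execute(attempt, target):
                    self.log.append(f"ok: {attempt}")
                    break
                self.log.append(f"failed: {attempt}")
                attempt = self.correct(attempt)
            else:
                return False  # retry budget exhausted for this subtask
        return True


def demo() -> PGCAgent:
    """Run the loop with stubs where one subtask fails once before succeeding."""
    failures = {"grasp cup"}

    def execute(subtask: str, target: Point) -> bool:
        if subtask in failures:
            failures.discard(subtask)
            return False
        return True

    agent = PGCAgent(
        plan=lambda instr: ["locate cup", "grasp cup", "place cup on shelf"],
        ground=lambda subtask: (0, 0),
        execute=execute,
        correct=lambda subtask: f"retry {subtask}",
    )
    agent.run("put the cup on the shelf")
    return agent
```

Calling `demo().log` shows the trace of one failed grasp followed by a corrected retry. A real system would replace the stubs with model calls and a robot controller, and would likely feed fresh observations into the corrector rather than only the failed subtask string.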


Get this paper in your agent:

hf papers read 2606.11324
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
