Hugging Face Daily Papers · June 11, 2026 · 4 min read

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.




arxiv:2606.11324


Published on Jun 9

Submitted by taesiri on Jun 11


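Authors: Yifu Yuan, Yaoting Huang, Xianze Yao, Yutong Li, Shuoheng Zhang, Linqi Han, Pengyi Li, Jiangeng Sun, Wenting Jia, Zhao Zhang, Yuhao Liu, Ruihao Liao, Yucheng Hu, Qiyu Wu, Yuxiao Li, Zibin Dong, Fei Ni, Yan Zheng, Shuyang Gu, Yi Ma, Hongyao Tang, Han Hu, Jianye Hao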

Abstract

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): Embodied-R1.5 is a unified embodied foundation model that integrates embodied reasoning capabilities and achieves state-of-the-art performance on embodied vision-language benchmarks through a multi-task balanced reinforcement learning approach.

We introduce Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates comprehensive embodied reasoning capabilities, spanning embodied cognition, task planning, correction, and pointing, within a single architecture toward general physical intelligence. Using three automated data construction pipelines that substantially expand data coverage of critical capabilities, we build a large-scale data system of over 15B tokens and design a multi-task balanced RL recipe to alleviate conflicts between heterogeneous tasks. We further introduce a Planner-Grounder-Corrector (PGC) closed-loop framework that enables a single model to autonomously execute and self-correct over long-horizon tasks. With only 8B parameters, Embodied-R1.5 achieves SOTA on 16 out of 24 embodied VLM benchmarks, surpassing leading models such as Gemini-Robotics-ER-1.5 and GPT-5.4. Benefiting from these internalized embodied capabilities, Embodied-R1.5 can be fine-tuned into a VLA with only a small amount of data, outperforming leading VLA models such as π0.5 across 4 popular manipulation benchmark suites. We further conduct extensive zero-shot real-robot experiments that validate performance in instruction following, affordance grounding, articulated object manipulation, and long-horizon complex tasks, demonstrating strong generalization to the physical world. We open-source model weights, datasets, training code, and EmbodiedEvalKit, an evaluation framework tailored for embodied tasks, to facilitate future research in EFMs.
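The abstract describes the Planner-Grounder-Corrector (PGC) loop only at a high level. As a rough illustration of what a closed plan-ground-verify loop can look like, here is a minimal Python 3.9+ sketch under loudly stated assumptions: `ToyModel`, `ToyScene`, and `run_pgc` are hypothetical names; the three roles are plain methods on one object (mirroring the abstract's "single model" claim); a grounding is a single 2-D point; and the corrector's only recovery strategy is a retry. It is not the authors' implementation, which lives in their open-source repository.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[int, int]


@dataclass
class ToyScene:
    """Hypothetical tabletop: object name -> pixel location, plus placed objects."""
    objects: Dict[str, Point]
    placed: Set[str] = field(default_factory=set)
    slips: Set[str] = field(default_factory=set)  # objects whose first placement fails

    def act(self, point: Point) -> None:
        # Execute a pointing action: try to place whatever object sits at `point`.
        for name, loc in self.objects.items():
            if loc == point:
                if name in self.slips:
                    self.slips.discard(name)  # simulated slip, fails only once
                else:
                    self.placed.add(name)


class ToyModel:
    """Hypothetical stand-in for ONE model queried in three roles."""

    def plan(self, instruction: str) -> List[str]:
        # Planner: "put apple, cup in bin" -> ["place apple", "place cup"]
        items = instruction.removeprefix("put ").removesuffix(" in bin")
        return [f"place {item.strip()}" for item in items.split(",")]

    def ground(self, subtask: str, scene: ToyScene) -> Point:
        # Grounder: map the subtask's target object to a 2-D point.
        return scene.objects[subtask.split()[-1]]

    def correct(self, subtask: str, scene: ToyScene) -> Optional[str]:
        # Corrector: None if the subtask's goal holds, else a recovery subtask.
        target = subtask.split()[-1]
        return None if target in scene.placed else f"retry {target}"


def run_pgc(model: ToyModel, scene: ToyScene, instruction: str,
            max_corrections: int = 3) -> List[Tuple[str, bool]]:
    """Closed loop: plan once, then ground, execute, and verify each subtask."""
    queue = model.plan(instruction)
    trace: List[Tuple[str, bool]] = []
    corrections = 0
    while queue:
        subtask = queue.pop(0)
        scene.act(model.ground(subtask, scene))
        fix = model.correct(subtask, scene)
        trace.append((subtask, fix is None))
        if fix is not None:
            if corrections >= max_corrections:
                break  # give up rather than retry forever
            corrections += 1
            queue.insert(0, fix)  # run the recovery step next
    return trace


if __name__ == "__main__":
    scene = ToyScene(objects={"apple": (10, 20), "cup": (40, 5)}, slips={"cup"})
    print(run_pgc(ToyModel(), scene, "put apple, cup in bin"))
```

In this toy run the first placement of the cup slips, so the corrector schedules a retry and the trace is `[('place apple', True), ('place cup', False), ('retry cup', True)]`. The `max_corrections` cap is a design choice of the sketch that keeps a persistently failing subtask from looping forever.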

arXiv: arxiv.org/abs/2606.11324 · Project page: https://embodied-r.github.io/ · GitHub: https://github.com/pickxiguapi/Embodied-R1.5 (17 stars)

Community

taesiri (paper submitter): A unified embodied foundation model achieving SOTA on embodied VLM and manipulation benchmarks with self-correction capabilities.


Get this paper in your agent:

hf papers read 2606.11324

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models, datasets, and Spaces citing this paper: 0 each (cite arxiv.org/abs/2606.11324 in a model, dataset, or Space README.md to link it from the paper page).

Collections including this paper 1

Discussion (0)

