Hugging Face Daily Papers · June 12, 2026 · 6 min read

Revisiting Articulated Parts Perception in Robot Manipulation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Authors: Xiaoqian Wu, Yejie Guo, Xiaoyang Chen, Lixin Yang, Cewu Lu, Yong-Lu Li

We are surrounded by objects with movable, articulated parts, e.g., boxes, handles, and doors. Accurate and generalizable perception of articulated parts is essential for improving robotic manipulation. Recent work on articulated parts perception has followed two main directions: one line of work uses pose-based representations, which carry a high manual annotation cost; in parallel, affordance-based methods extract future object motion from point tracking without additional manual effort, but suffer from low-quality data. In this paper, we propose a new representation of articulated parts, Geometric Primary Structure (GPS), an abstraction of the part geometry structure that balances scalability and quality. For efficient and scalable data collection, GPS is integrated with a portable Virtual Reality (VR) device and requires only one minute to annotate one object sequence. This direct human annotation provides higher quality than estimated affordance. With this efficient VR-GPS system, we collect 41K frames for 234 objects across six part classes, and train a generalizable GPS model that takes a single RGB-D object image as input. For object manipulation, we deploy a heuristic policy based on GPS prediction. Without any in-domain fine-tuning, our method achieves a 73% success rate, covering 270 initial states for 9 objects. Our code, data, and reusable tool are available at https://enlighten0707.github.io/gps/ (GitHub repository: https://github.com/enlighten0707/Geometric_Primary_Structure).
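As a quick sanity check on the scale of the figures quoted in the abstract, the short sketch below derives a few per-object averages. It is only illustrative: it assumes that frames and initial states are spread evenly across objects, which the abstract does not state, and "41K" is a rounded figure, so the outputs are approximate. The function and variable names are our own.

```python
# Figures quoted in the abstract ("41K" is rounded in the source).
FRAMES = 41_000
ANNOTATED_OBJECTS = 234
INITIAL_STATES = 270
TEST_OBJECTS = 9
SUCCESS_RATE = 0.73


def derived_stats():
    """Return rough averages implied by the abstract's numbers.

    Assumption (not from the source): frames and initial states are
    distributed uniformly across objects.
    """
    return {
        "frames_per_object": FRAMES / ANNOTATED_OBJECTS,
        "states_per_object": INITIAL_STATES / TEST_OBJECTS,
        "approx_successes": round(SUCCESS_RATE * INITIAL_STATES),
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions, each annotated object contributes roughly 175 frames, each test object is evaluated from 30 initial states, and a 73% success rate corresponds to about 197 successful trials out of 270.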

arxiv:2606.08103

Published on Jun 6 · Submitted by Xiaoqian Wu on Jun 12

Shanghai Jiao Tong University

Abstract

A new geometric representation called Geometric Primary Structure (GPS) is introduced for articulated parts perception, enabling efficient data collection through VR annotation and achieving high manipulation success rates without fine-tuning.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Get this paper in your agent:

hf papers read 2606.08103

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
