Hugging Face Daily Papers · 4 min read

EgoForce: Forearm-Guided Camera-Space 3D Hand Pose from a Monocular Egocentric Camera

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.12498


Published on May 12
Submitted by Christen Miller on May 13
Authors: Christen Millerdurai, Shaoxiang Wang, Yaxu Xie, Vladislav Golyanik, Didier Stricker, Alain Pagani
Abstract

EgoForce is a monocular 3D hand reconstruction framework that uses a single unified network to recover robust, absolute hand pose and position across different camera models through a differentiable forearm representation, a unified arm-hand transformer, and a ray-space closed-form solver.

AI-generated summary

Reconstructing the absolute 3D pose and shape of the hands from the user's viewpoint using a single head-mounted camera is crucial for practical egocentric interaction in AR/VR, telepresence, and hand-centric manipulation tasks, where sensing must remain compact and unobtrusive. While monocular RGB methods have made progress, they remain constrained by depth-scale ambiguity and struggle to generalize across the diverse optical configurations of head-mounted devices. As a result, models typically require extensive training on device-specific datasets, which are costly and laborious to acquire. This paper addresses these challenges by introducing EgoForce, a monocular 3D hand reconstruction framework that recovers robust, absolute 3D hand pose and position from the user's (camera-space) viewpoint. EgoForce operates across fisheye, perspective, and distorted wide-FOV camera models using a single unified network. Our approach combines a differentiable forearm representation that stabilizes hand pose, a unified arm-hand transformer that predicts both hand and forearm geometry from a single egocentric view to mitigate depth-scale ambiguity, and a ray-space closed-form solver that enables absolute 3D pose recovery across diverse head-mounted camera models. Experiments on three egocentric benchmarks show that EgoForce achieves state-of-the-art 3D accuracy, reducing camera-space MPJPE (mean per-joint position error) by up to 28% on the HOT3D dataset compared to prior methods while maintaining consistent performance across camera configurations. For more details, visit the project page at https://dfki-av.github.io/EgoForce.
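The camera-space MPJPE reported above is simply the mean Euclidean distance between predicted and ground-truth 3D joints in the camera frame. A minimal sketch of the metric (our own illustration, not code from the paper; the 21-joint hand layout and metre units are assumptions):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance between
    predicted and ground-truth 3D joints, in the same units as the input."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Toy example: 21 hand joints in camera space (metres),
# with the prediction shifted by a uniform 1 cm depth offset.
gt = np.zeros((21, 3))
pred = gt + np.array([0.0, 0.0, 0.01])
print(mpjpe(pred, gt))  # ~0.01 m, i.e. 10 mm camera-space error
```

Because the error is measured in camera space (absolute position, not root-relative), a pure depth offset like the one above is fully penalized, which is exactly the failure mode that depth-scale ambiguity produces.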

Community

Paper submitter

Absolute 3D hand pose and shape reconstruction from a single head-mounted camera is essential for smart-glasses-based AR, telepresence, and hand-centric manipulation. However, monocular RGB methods suffer from depth–scale ambiguity and poor generalization across diverse head-mounted camera models, often requiring costly device-specific training data. We introduce EgoForce, a unified monocular 3D hand reconstruction framework that recovers robust camera-space hand pose and position across fisheye, perspective, and distorted wide-FOV cameras. EgoForce combines a differentiable forearm representation, a unified arm–hand transformer, and a ray-space closed-form solver to stabilize pose estimation and resolve absolute 3D geometry across camera models.
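The paper's ray-space closed-form solver is not spelled out on this page. As a rough illustration of the general idea, the sketch below recovers an absolute camera-space translation by least-squares alignment of root-relative 3D joints to per-joint camera rays (which a calibrated fisheye or perspective model can provide for any pixel). The formulation and all names here are our own assumptions, not the paper's method:

```python
import numpy as np

def solve_translation(joints_rel, rays):
    """Closed-form translation t minimising the summed squared point-to-ray
    distance ||(I - d d^T)(X_i + t)||^2 over all joints.

    joints_rel: (N, 3) root-relative 3D joint positions.
    rays:       (N, 3) unit ray directions from the camera centre.
    """
    A_sum = np.zeros((3, 3))
    b_sum = np.zeros(3)
    for X, d in zip(joints_rel, rays):
        A = np.eye(3) - np.outer(d, d)  # projector onto the plane normal to d
        A_sum += A
        b_sum += A @ X
    # Stationary point of the quadratic objective: A_sum @ t = -b_sum.
    return -np.linalg.solve(A_sum, b_sum)

# Synthetic check: rays cast through the true absolute joints
# recover the true translation exactly in the noise-free case.
rng = np.random.default_rng(0)
t_true = np.array([0.05, -0.02, 0.40])
X = rng.normal(scale=0.05, size=(21, 3))
P = X + t_true
rays = P / np.linalg.norm(P, axis=1, keepdims=True)
t_est = solve_translation(X, rays)
```

Working in ray space rather than pixel space is what makes a formulation like this camera-model-agnostic: fisheye, perspective, and distorted wide-FOV models all reduce to the same per-pixel ray representation.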


Get this paper in your agent:

hf papers read 2605.12498
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 1

Spaces citing this paper 1

Collections including this paper 0


