Hugging Face Daily Papers · 3 min read

Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.11354

Published on May 12 · Submitted by Zeyu Zhang on May 13
Authors: Haoyu Zhang, Zeyu Zhang, Zedong Zhou, Yang Zhao, Hao Tang

Abstract

AI-generated summary: Lite3R addresses efficiency challenges in transformer-based 3D reconstruction through sparse attention and low-precision quantization while maintaining geometric accuracy.

Transformer-based 3D reconstruction has emerged as a powerful paradigm for recovering geometry and appearance from multi-view observations, offering strong performance across challenging visual conditions. As these models scale to larger backbones and higher-resolution inputs, improving their efficiency becomes increasingly important for practical deployment. However, modern 3D transformer pipelines face two coupled challenges: dense multi-view attention creates substantial token-mixing overhead, and low-precision execution can destabilize geometry-sensitive representations and degrade depth, pose, and 3D consistency. To address the first challenge, we propose Lite3R, a model-agnostic teacher-student framework that replaces dense attention with Sparse Linear Attention to preserve important geometric interactions while reducing attention cost. To address the second challenge, we introduce a parameter-efficient FP8-aware quantization-aware training (FP8-aware QAT) strategy with partial attention distillation, which freezes the vast majority of pretrained backbone parameters and trains only lightweight linear-branch projection layers, enabling stable low-precision deployment while retaining pretrained geometric priors. We further evaluate Lite3R on two representative backbones, VGGT and DA3-Large, over BlendedMVS and DTU64, showing that it substantially reduces latency (1.7-2.0x) and memory usage (1.9-2.4x) while preserving competitive reconstruction quality overall. These results demonstrate that Lite3R provides an effective algorithm-system co-design approach for practical transformer-based 3D reconstruction. Code: https://github.com/AIGeeksGroup/Lite3R. Website: https://aigeeksgroup.github.io/Lite3R.
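The first ingredient, replacing dense softmax attention with a linear-complexity variant, can be illustrated with a short sketch. The paper's exact Sparse Linear Attention formulation is not reproduced on this page, so the snippet below shows only a generic kernelized linear attention in PyTorch with a hypothetical top-k key selection standing in for the sparse branch; the function name and the keep_ratio knob are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only -- NOT the paper's Sparse Linear Attention.
# Kernelized linear attention (phi(x) = elu(x) + 1) costs O(N) in the
# token count, versus O(N^2) for dense softmax attention over all
# multi-view tokens; a hypothetical top-k key selection mimics sparsity.
import torch
import torch.nn.functional as F

def sparse_linear_attention(q, k, v, keep_ratio=0.5, eps=1e-6):
    """q, k, v: (batch, heads, tokens, dim). keep_ratio is an assumed knob."""
    b, h, n, d = k.shape
    # Hypothetical sparsification: keep only the highest-norm keys.
    n_keep = max(1, int(n * keep_ratio))
    idx = k.norm(dim=-1).topk(n_keep, dim=-1).indices       # (b, h, n_keep)
    idx = idx.unsqueeze(-1).expand(-1, -1, -1, d)
    k_s = torch.gather(k, 2, idx)
    v_s = torch.gather(v, 2, idx)
    # Positive feature map so the unnormalized scores stay non-negative.
    q_f = F.elu(q) + 1.0
    k_f = F.elu(k_s) + 1.0
    # Aggregate key/value statistics once, then reuse them for every query
    # instead of materializing an n-by-n attention matrix.
    kv = torch.einsum("bhnd,bhne->bhde", k_f, v_s)
    z = k_f.sum(dim=2)                                      # (b, h, d)
    out = torch.einsum("bhnd,bhde->bhne", q_f, kv)
    denom = torch.einsum("bhnd,bhd->bhn", q_f, z).unsqueeze(-1) + eps
    return out / denom
```

Because the kv and z statistics are computed once and shared across all queries, no quadratic attention matrix is ever stored; this is the kind of saving the reported 1.7-2.0x latency and 1.9-2.4x memory reductions build on, together with FP8 execution.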

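The second ingredient, parameter-efficient FP8-aware QAT with partial attention distillation, can likewise be sketched. The snippet below fake-quantizes activations through an FP8 E4M3 round-trip with a straight-through estimator, freezes the pretrained backbone except for projection layers whose names contain a hypothetical "linear_proj" substring, and matches a subset of student attention outputs to the teacher's. The dtype torch.float8_e4m3fn requires a recent PyTorch build, and all module and parameter names here are assumptions for illustration, not the paper's code.

```python
# Illustrative sketch only -- assumes PyTorch >= 2.1 for torch.float8_e4m3fn.
import torch
import torch.nn.functional as F

def fake_quant_fp8_e4m3(x: torch.Tensor) -> torch.Tensor:
    # Simulate FP8 execution: scale into the E4M3 range (max normal ~448),
    # round-trip through float8 on a detached copy, then rescale. The
    # straight-through estimator passes gradients around the rounding.
    scale = x.detach().abs().amax().clamp(min=1e-8) / 448.0
    q = (x.detach() / scale).to(torch.float8_e4m3fn).to(x.dtype) * scale
    return x + (q - x).detach()

def freeze_all_but_linear_branch(student: torch.nn.Module) -> None:
    # Freeze the pretrained backbone; train only the lightweight
    # linear-branch projections ("linear_proj" is a hypothetical name).
    for name, p in student.named_parameters():
        p.requires_grad_("linear_proj" in name)

def partial_attention_distillation(student_attn, teacher_attn, layers=(0, 6, 11)):
    # Match student to teacher attention outputs on a subset of layers only,
    # so the student retains the teacher's geometric token interactions.
    return sum(
        F.mse_loss(fake_quant_fp8_e4m3(student_attn[i]), teacher_attn[i].detach())
        for i in layers
    )
```

In a training loop, one would run the mostly-frozen student under fake quantization, compute this distillation loss against the dense-attention teacher, and step only the unfrozen projection parameters, which is what keeps the pretrained geometric priors intact under low-precision deployment.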
Community

Paper author · Paper submitter · about 16 hours ago

The code has been open-sourced.


Get this paper in your agent:

hf papers read 2605.11354
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 0 · Datasets citing this paper: 0 · Spaces citing this paper: 0 · Collections including this paper: 0

No models, datasets, Spaces, or collections link this paper yet. Cite arxiv.org/abs/2605.11354 in a model, dataset, or Space README.md, or add the paper to a collection, to link it from this page.

