Hugging Face Daily Papers · May 29, 2026 · 6 min read

Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


arxiv:2605.25220

Published on May 24, 2026 · Submitted by Aviral Chharia on May 29 · Carnegie Mellon University

Authors: Aviral Chharia, Fernando De la Torre

AI-generated summary

A novel single-shot 3D Gaussian head avatar generation method called MVCHead uses hierarchical state space models and multi-view consistency enforcement to create high-fidelity 3D assets from 2D images without requiring multi-view data or 3D supervision.

Abstract

High-fidelity 3D Gaussian head avatar generation is critical for applications such as AR/VR, telepresence, and digital humans. Existing methods depend on multi-view datasets, 3D captures, or intermediate 2D view synthesis. In contrast, we learn both conditional and unconditional 3D head models from randomly sampled 2D images alone, without using multi-view data, 3D supervision, or intermediate view generation. We introduce MVCHead, a single-shot state space model that enforces multi-view consistency (MVC) directly in the 3D representation while regressing 3D Gaussians under these constraints. At its core, we propose a Hierarchical State Space (HiSS) block that progressively refines Gaussians from coarse to fine, while capturing long-range dependencies. Within each HiSS block, we modify Mamba's standard unidirectional scan with the proposed Hierarchical Bi-directional State Scan (HiBiSS) that aligns recurrence with the axes along which multi-view inconsistencies are strongest. Finally, we design an SE(3) Multi-view Critic that judges whether a set of self-renders arises from a single underlying 3D configuration, rewarding cross-view pixel alignment without observing real multi-view pairs. MVCHead achieves state-of-the-art perceptual quality, surpasses prior methods in both texture and geometric consistency, and maintains comparable shape consistency. To demonstrate scalability, we release FaceGS-10K, the first large-scale dataset of ready-to-use 3D Gaussian head assets for training and evaluation of 3D head models. Project Page and code: https://humansensinglab.github.io/MVCHead/
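
The abstract contrasts a unidirectional state scan with a bi-directional one. As a rough illustration of that general idea only, the sketch below runs a toy scalar linear recurrence forward and backward over a sequence and sums the two passes. This is not MVCHead's HiBiSS: the function names, the fixed scalar parameters `a` and `b`, and the summation used to merge the two directions are all assumptions made for illustration, and the paper's hierarchy, input-dependent (selective) parameters, and choice of scan axes are not reproduced here.

```python
from typing import List


def linear_scan(xs: List[float], a: float, b: float) -> List[float]:
    """Toy unidirectional scan: h_t = a * h_{t-1} + b * x_t, starting from h = 0."""
    h = 0.0
    out = []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def bidirectional_scan(xs: List[float], a: float, b: float) -> List[float]:
    """Run the same recurrence in both directions and sum the results.

    Summation is an assumed merge rule chosen for simplicity.
    """
    forward = linear_scan(xs, a, b)
    # Scan the reversed sequence, then flip the outputs back into input order.
    backward = linear_scan(xs[::-1], a, b)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    signal = [0.0, 0.0, 1.0]  # the only non-zero input is at the last position
    print(linear_scan(signal, a=0.5, b=1.0))         # first position sees nothing
    print(bidirectional_scan(signal, a=0.5, b=1.0))  # first position sees the later input
```

In the forward-only scan, the state at an early position cannot depend on later inputs; adding the backward pass lets every position receive information from both sides of the sequence, which is the property a bi-directional scan is meant to provide.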

Community

aviralchharia (paper author, paper submitter) · 1 day ago

Generating 3D Head Avatars from just 70K Random Internet Images! No 3D, no multi-view, no studio, no view synthesis at any stage of training or inference.

librarian-bot · about 13 hours ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

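The following papers were recommended by the Semantic Scholar API:

* Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures (2026): https://huggingface.co/papers/2605.04035
* FFAvatar: Few-Shot, Feed-Forward, and Generalizable Avatar Reconstruction (2026): https://huggingface.co/papers/2605.15320
* Real-Time Human Reconstruction and Animation using Feed-Forward Gaussian Splatting (2026): https://huggingface.co/papers/2604.10259
* Any3DAvatar: Fast and High-Quality Full-Head 3D Avatar Reconstruction from Single Portrait Image (2026): https://huggingface.co/papers/2604.13856
* SketchFaceGS: Real-Time Sketch-Driven Face Editing and Generation with Gaussian Splatting (2026): https://huggingface.co/papers/2604.19202
* Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images (2026): https://huggingface.co/papers/2604.10573
* ROAR-3D: Routing Arbitrary Views for High-Fidelity 3D Generation (2026): https://huggingface.co/papers/2605.21121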

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Get this paper in your agent:

hf papers read 2605.25220

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
