Generating 3D Head Avatars from just 70K Random Internet Images! No 3D, no multi-view, no studio, no view synthesis at any stage of training or inference.</p>\n","updatedAt":"2026-05-29T14:20:22.074Z","author":{"_id":"636459cae31159b7ca6db696","avatarUrl":"/avatars/bb076dad68477a3d16bf22bd7383e6e6.svg","fullname":"Aviral Chharia","name":"aviralchharia","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8292950987815857},"editors":["aviralchharia"],"editorAvatarUrls":["/avatars/bb076dad68477a3d16bf22bd7383e6e6.svg"],"reactions":[],"isReport":false}},{"id":"6a1a40d2b3470011372a7050","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":359,"isUserFollowing":false},"createdAt":"2026-05-30T01:43:46.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper. 
\n\nThe following papers were recommended by the Semantic Scholar API \n\n* [Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures](https://huggingface.co/papers/2605.04035) (2026)\n* [FFAvatar: Few-Shot, Feed-Forward, and Generalizable Avatar Reconstruction](https://huggingface.co/papers/2605.15320) (2026)\n* [Real-Time Human Reconstruction and Animation using Feed-Forward Gaussian Splatting](https://huggingface.co/papers/2604.10259) (2026)\n* [Any3DAvatar: Fast and High-Quality Full-Head 3D Avatar Reconstruction from Single Portrait Image](https://huggingface.co/papers/2604.13856) (2026)\n* [SketchFaceGS: Real-Time Sketch-Driven Face Editing and Generation with Gaussian Splatting](https://huggingface.co/papers/2604.19202) (2026)\n* [Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images](https://huggingface.co/papers/2604.10573) (2026)\n* [ROAR-3D: Routing Arbitrary Views for High-Fidelity 3D Generation](https://huggingface.co/papers/2605.21121) (2026)\n\n\n Please give a thumbs up to this comment if you found it helpful!\n\n If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space\n\n You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`","html":"<p>This is an automated message from the <a href=\"https://huggingface.co/librarian-bots\">Librarian Bot</a>. I found the following papers similar to this paper. 
</p>\n<p>The following papers were recommended by the Semantic Scholar API </p>\n<ul>\n<li><a href=\"https://huggingface.co/papers/2605.04035\">Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures</a> (2026)</li>\n<li><a href=\"https://huggingface.co/papers/2605.15320\">FFAvatar: Few-Shot, Feed-Forward, and Generalizable Avatar Reconstruction</a> (2026)</li>\n<li><a href=\"https://huggingface.co/papers/2604.10259\">Real-Time Human Reconstruction and Animation using Feed-Forward Gaussian Splatting</a> (2026)</li>\n<li><a href=\"https://huggingface.co/papers/2604.13856\">Any3DAvatar: Fast and High-Quality Full-Head 3D Avatar Reconstruction from Single Portrait Image</a> (2026)</li>\n<li><a href=\"https://huggingface.co/papers/2604.19202\">SketchFaceGS: Real-Time Sketch-Driven Face Editing and Generation with Gaussian Splatting</a> (2026)</li>\n<li><a href=\"https://huggingface.co/papers/2604.10573\">Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images</a> (2026)</li>\n<li><a href=\"https://huggingface.co/papers/2605.21121\">ROAR-3D: Routing Arbitrary Views for High-Fidelity 3D Generation</a> (2026)</li>\n</ul>\n<p> Please give a thumbs up to this comment if you found it helpful!</p>\n<p> If you want recommendations for any Paper on Hugging Face checkout <a href=\"https://huggingface.co/spaces/librarian-bots/recommend_similar_papers\">this</a> Space</p>\n<p> You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: <code><span class=\"SVELTE_PARTIAL_HYDRATER contents\" data-target=\"UserMention\" data-props=\"{"user":"librarian-bot"}\"><span class=\"inline-block\"><span class=\"contents\"><a href=\"/librarian-bot\">@<span class=\"underline\">librarian-bot</span></a></span> </span></span> 
recommend</code></p>\n","updatedAt":"2026-05-30T01:43:46.533Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":359,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7241173982620239},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[{"reaction":"👍","users":["aviralchharia"],"count":1}],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2605.25220","authors":[{"_id":"6a193fde56b4bb14ec65d19b","user":{"_id":"636459cae31159b7ca6db696","avatarUrl":"/avatars/bb076dad68477a3d16bf22bd7383e6e6.svg","isPro":false,"fullname":"Aviral Chharia","user":"aviralchharia","type":"user","name":"aviralchharia"},"name":"Aviral Chharia","status":"claimed_verified","statusLastChangedAt":"2026-05-29T08:48:57.537Z","hidden":false},{"_id":"6a193fde56b4bb14ec65d19c","name":"Fernando De la Torre","hidden":false}],"publishedAt":"2026-05-24T00:00:00.000Z","submittedOnDailyAt":"2026-05-29T00:00:00.000Z","title":"Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation","submittedOnDailyBy":{"_id":"636459cae31159b7ca6db696","avatarUrl":"/avatars/bb076dad68477a3d16bf22bd7383e6e6.svg","isPro":false,"fullname":"Aviral Chharia","user":"aviralchharia","type":"user","name":"aviralchharia"},"summary":"High-fidelity 3D Gaussian head avatar generation is critical for applications such as AR/VR, telepresence, and digital humans. Existing methods depend on multi-view datasets, 3D captures, or intermediate 2D view synthesis. 
In contrast, we learn both conditional and unconditional 3D head models from randomly sampled 2D images alone, without using multi-view data, 3D supervision, or intermediate view generation. We introduce MVCHead, a single-shot state space model that enforces multi-view consistency (MVC) directly in the 3D representation while regressing 3D Gaussians under these constraints. At its core, we propose a Hierarchical State Space (HiSS) block that progressively refines Gaussians from coarse to fine, while capturing long-range dependencies. Within each HiSS block, we modify Mamba's standard unidirectional scan with the proposed Hierarchical Bi-directional State Scan (HiBiSS) that aligns recurrence with the axes along which multi-view inconsistencies are strongest. Finally, we design an SE(3) Multi-view Critic that judges whether a set of self-renders arises from a single underlying 3D configuration, rewarding cross-view pixel alignment without observing real multi-view pairs. MVCHead achieves state-of-the-art perceptual quality, surpasses prior methods in both texture and geometric consistency, and maintains comparable shape consistency. To demonstrate scalability, we release FaceGS-10K, the first large-scale dataset of ready-to-use 3D Gaussian head assets for training and evaluation of 3D head models. 
Project Page and code: https://humansensinglab.github.io/MVCHead/","upvotes":3,"discussionId":"6a193fdf56b4bb14ec65d19d","projectPage":"https://humansensinglab.github.io/MVCHead/","githubRepo":"https://github.com/humansensinglab/MVCHead","githubRepoAddedBy":"user","ai_summary":"A novel single-shot 3D Gaussian head avatar generation method called MVCHead uses hierarchical state space models and multi-view consistency enforcement to create high-fidelity 3D assets from 2D images without requiring multi-view data or 3D supervision.","ai_keywords":["3D Gaussian head avatar","state space model","multi-view consistency","Hierarchical State Space","HiSS block","Mamba","Hierarchical Bi-directional State Scan","SE(3) Multi-view Critic","self-renders","3D representation","2D images","3D head models","3D Gaussian","multi-view data","3D supervision","3D asset generation"],"githubStars":9,"organization":{"_id":"691d9a1012cc4d473e1c862f","name":"CarnegieMellonU","fullname":"Carnegie Mellon University","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/68e396f2b5bb631e9b2fac9a/6I146aJvxxlRCEbYFFAeQ.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"636459cae31159b7ca6db696","avatarUrl":"/avatars/bb076dad68477a3d16bf22bd7383e6e6.svg","isPro":false,"fullname":"Aviral Chharia","user":"aviralchharia","type":"user"},{"_id":"646ff038799a974be31bb344","avatarUrl":"/avatars/d9dc17246fba8360e709235f55445ef5.svg","isPro":false,"fullname":"Yehonathan Litman","user":"thebluser","type":"user"},{"_id":"6351e5bb3734c6e8a5c1bec1","avatarUrl":"/avatars/a784a51b369b197398575c3afbd5ceab.svg","isPro":false,"fullname":"Han-Bit Kang","user":"hbkang","type":"user"}],"acceptLanguages":["en"],"dailyPaperRank":0,"organization":{"_id":"691d9a1012cc4d473e1c862f","name":"CarnegieMellonU","fullname":"Carnegie Mellon 
University","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/68e396f2b5bb631e9b2fac9a/6I146aJvxxlRCEbYFFAeQ.png"},"markdownContentUrl":"https://huggingface.co/buckets/huggingchat/papers-content/resolve/2605/2605.25220.md"}">
Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation
Abstract
A novel single-shot 3D Gaussian head avatar generation method called MVCHead uses hierarchical state space models and multi-view consistency enforcement to create high-fidelity 3D assets from 2D images without requiring multi-view data or 3D supervision.
AI-generated summary
High-fidelity 3D Gaussian head avatar generation is critical for applications such as AR/VR, telepresence, and digital humans. Existing methods depend on multi-view datasets, 3D captures, or intermediate 2D view synthesis. In contrast, we learn both conditional and unconditional 3D head models from randomly sampled 2D images alone, without using multi-view data, 3D supervision, or intermediate view generation. We introduce MVCHead, a single-shot state space model that enforces multi-view consistency (MVC) directly in the 3D representation while regressing 3D Gaussians under these constraints. At its core, we propose a Hierarchical State Space (HiSS) block that progressively refines Gaussians from coarse to fine while capturing long-range dependencies. Within each HiSS block, we replace Mamba's standard unidirectional scan with the proposed Hierarchical Bi-directional State Scan (HiBiSS), which aligns recurrence with the axes along which multi-view inconsistencies are strongest. Finally, we design an SE(3) Multi-view Critic that judges whether a set of self-renders arises from a single underlying 3D configuration, rewarding cross-view pixel alignment without observing real multi-view pairs. MVCHead achieves state-of-the-art perceptual quality, surpasses prior methods in both texture and geometric consistency, and maintains comparable shape consistency. To demonstrate scalability, we release FaceGS-10K, the first large-scale dataset of ready-to-use 3D Gaussian head assets for training and evaluation of 3D head models. Project page and code: https://humansensinglab.github.io/MVCHead/
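To make the difference between a unidirectional and a bi-directional state scan concrete, here is a minimal sketch in plain Python. It is not the authors' HiBiSS or Mamba's selective scan: it assumes a scalar state with fixed, hypothetical coefficients `a` and `b`, and the function names `linear_scan` and `bidirectional_scan` are illustrative only. It shows just the general idea that a forward-only recurrence cannot carry information from later positions to earlier ones, while summing a forward and a backward pass can.

```python
from typing import List


def linear_scan(xs: List[float], a: float, b: float) -> List[float]:
    """Run the recurrence h_t = a * h_{t-1} + b * x_t and return every h_t.

    `a` and `b` are fixed scalars here (an assumption for illustration);
    selective state space models make them input-dependent.
    """
    h = 0.0
    states = []
    for x in xs:
        h = a * h + b * x
        states.append(h)
    return states


def bidirectional_scan(xs: List[float], a: float, b: float) -> List[float]:
    """Sum a left-to-right scan and a right-to-left scan over the same sequence."""
    forward = linear_scan(xs, a, b)
    # Scan the reversed sequence, then flip the result back to the original order.
    backward = linear_scan(xs[::-1], a, b)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    # An impulse at the last position: the forward scan leaves earlier
    # positions at zero, the bi-directional scan propagates it backward.
    impulse_at_end = [0.0, 0.0, 1.0]
    print(linear_scan(impulse_at_end, a=0.5, b=1.0))
    print(bidirectional_scan(impulse_at_end, a=0.5, b=1.0))
```

With the impulse placed at the end of the sequence, the forward-only output is zero everywhere before the last position, whereas the bi-directional output is non-zero at every position. How the scan axes are chosen and ordered hierarchically in HiBiSS is described in the paper, not in this sketch.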
Community
Generating 3D Head Avatars from just 70K Random Internet Images! No 3D, no multi-view, no studio, no view synthesis at any stage of training or inference.
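This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
* Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures (2026): https://huggingface.co/papers/2605.04035
* FFAvatar: Few-Shot, Feed-Forward, and Generalizable Avatar Reconstruction (2026): https://huggingface.co/papers/2605.15320
* Real-Time Human Reconstruction and Animation using Feed-Forward Gaussian Splatting (2026): https://huggingface.co/papers/2604.10259
* Any3DAvatar: Fast and High-Quality Full-Head 3D Avatar Reconstruction from Single Portrait Image (2026): https://huggingface.co/papers/2604.13856
* SketchFaceGS: Real-Time Sketch-Driven Face Editing and Generation with Gaussian Splatting (2026): https://huggingface.co/papers/2604.19202
* Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images (2026): https://huggingface.co/papers/2604.10573
* ROAR-3D: Routing Arbitrary Views for High-Fidelity 3D Generation (2026): https://huggingface.co/papers/2605.21121
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend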
Cite arxiv.org/abs/2605.25220 in a model, dataset, or Space README.md to link it from this page.