Hugging Face Daily Papers · 3 min read

SpatialAvatar-0: High-Quality 4D Head Avatar with Multi-Stage Reconstruction

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.15659

Published on Jun 14, 2026
· Submitted by Zeyu Zhang on Jun 22, 2026
Authors: Yiran Wang, Zeyu Zhang, Yuanming Li, Ziming Wang, Yang Zhao

Abstract

SpatialAvatar-0 enables high-quality 4D head avatar generation by combining feed-forward prediction with per-subject refinement through a shared Gaussian representation, achieving superior performance across multiple benchmarks.

High-quality 4D head avatars from one or a few source portraits are central to telepresence, AR/VR, and digital-human interaction. 3D Gaussian Splatting (3DGS) has emerged as the dominant representation, with two complementary regimes (generalizable feed-forward predictors and per-subject refiners) maturing in parallel. However, existing feed-forward predictors are trained on a single dataset family with a hard-coded source count, inheriting the corresponding domain bias. Per-subject refiners require 300K to 600K iterations and rely on adaptive densification that destroys upstream Gaussian layouts, preventing the two regimes from sharing a representation end-to-end. To bridge both regimes, we propose SpatialAvatar-0, built on a shared FLAME-mesh-bound Gaussian representation: a feed-forward generator with a parameter-free K-source mean-pool and a two-phase schedule, monocular-temporal then multi-view-spatial, that anchors against identity-prior collapse onto the smaller multi-view set. We further introduce a 10K-iteration layout-preserving per-subject refinement loop that freezes the FLAME binding and the Gaussian count and replaces densification with a three-component anti-spike regularization. In cross-domain zero-shot evaluation on VFHQ/HDTF, we surpass the in-domain leader GAGAvatar by +1.5 dB PSNR despite never training on either test domain. On the SplattingAvatar monocular benchmark, we lead every reported metric, surpassing the 300K-iteration GeoAvatar by +1.3 dB PSNR with a per-subject schedule up to 60x shorter than common SOTA baselines. Website: https://spatialwalk.github.io/SpatialAvatar-0.
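
The abstract names a parameter-free K-source mean-pool but does not describe what is pooled or where in the network it sits. The sketch below only illustrates the general idea under assumptions: each source portrait is assumed to yield a fixed-length feature vector, and the hypothetical helper `mean_pool_sources` averages them element-wise. Because averaging has no learned weights and works for any K, the same function handles one source or several, which is one plausible reading of how a hard-coded source count can be avoided.

```python
from typing import List, Sequence


def mean_pool_sources(features: Sequence[Sequence[float]]) -> List[float]:
    """Average K per-source feature vectors into a single vector.

    Hypothetical helper for illustration; not the authors' implementation.
    There are no learnable parameters, so K may differ between calls.
    """
    if not features:
        raise ValueError("need at least one source feature vector")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all source features must share one dimension")
    k = len(features)
    # Element-wise mean over the K sources.
    return [sum(f[i] for f in features) / k for i in range(dim)]


# K = 1: the pooled vector is the single source feature itself.
one_source = mean_pool_sources([[1.0, 2.0, 3.0]])

# K = 3: same function, no change in parameters or code path.
three_sources = mean_pool_sources(
    [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0]]
)

print(one_source)     # [1.0, 2.0, 3.0]
print(three_sources)  # [2.0, 2.0, 2.0]
```

A mean is also invariant to the order of the sources, so shuffling the input portraits would leave the pooled vector unchanged in this sketch.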

Get this paper in your agent:

hf papers read 2606.15659
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
