Hugging Face Daily Papers · 3 min read

ZipSplat: Fewer Gaussians, Better Splats

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.05102

Published on Jun 3, 2026 · Submitted by Sunghwan Hong on Jun 4, 2026
Authors: Alexander Veicht, Sunghwan Hong, Dániel Baráth, Marc Pollefeys

Abstract

ZipSplat is a token-based feed-forward method that decouples 3D Gaussian placement from the pixel grid, enabling efficient scene reconstruction with fewer Gaussians and superior performance on pose-free reconstruction tasks.

Feed-forward 3D Gaussian Splatting methods reconstruct a scene from posed or pose-free images in a single forward pass, yet current approaches predict one Gaussian per input pixel, tying the representation budget to camera resolution rather than scene complexity. A flat wall and a richly textured object thus produce equally many Gaussians despite very different geometric needs. We propose ZipSplat, a token-based feed-forward model that decouples Gaussian placement from the pixel grid. A multi-view backbone extracts dense visual tokens, and k-means clustering compresses them into a compact set of scene tokens. Cross- and self-attention refine these tokens, and a lightweight MLP decodes each into a group of Gaussians with unconstrained 3D positions. Because clustering is applied at inference, a single trained model spans the quality-efficiency curve without retraining. ZipSplat operates without ground-truth poses or intrinsics, yet sets a new state of the art on DL3DV and RealEstate10K with ∼6× fewer Gaussians than pixel-aligned methods, surpassing the best pose-free baseline by 2.1 dB and 1.2 dB PSNR, respectively. It further generalizes zero-shot to Mip-NeRF360 and ScanNet++, outperforming all comparable baselines. Our project page is at https://veichta.com/zipsplat.
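To make the token-compression idea concrete, here is a minimal sketch of clustering dense tokens into a smaller set of scene tokens with plain Lloyd's k-means in NumPy. It is not the authors' implementation: the function names (`kmeans_compress`, `pixel_aligned_budget`, `token_budget`), the patch size, the feature dimension, and the number of Gaussians decoded per token are all assumptions chosen for illustration, and random features stand in for backbone output.

```python
import numpy as np


def kmeans_compress(tokens, k, iters=10, seed=0):
    """Cluster N dense tokens of shape (N, D) into k scene tokens of shape (k, D).

    Plain Lloyd's k-means; a stand-in for whatever clustering the paper uses.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct input tokens.
    centroids = tokens[rng.choice(len(tokens), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every token to every centroid: (N, k).
        d2 = ((tokens[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = tokens[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    # `assign` is the assignment used for the final centroid update.
    return centroids, assign


def pixel_aligned_budget(views, height, width):
    """One Gaussian per input pixel: the budget is fixed by image resolution."""
    return views * height * width


def token_budget(k, gaussians_per_token):
    """k scene tokens, each decoded into a group of Gaussians."""
    return k * gaussians_per_token


if __name__ == "__main__":
    # Hypothetical setup: 2 views at 256x256, 16x16 patches, 32-dim features.
    views, height, width, patch, dim = 2, 256, 256, 16, 32
    n_tokens = views * (height // patch) * (width // patch)
    dense = np.random.default_rng(0).standard_normal((n_tokens, dim))

    print("pixel-aligned Gaussians:", pixel_aligned_budget(views, height, width))
    # k is picked at inference time; no retraining is involved in this sketch.
    for k in (64, 128, 256):
        scene_tokens, _ = kmeans_compress(dense, k)
        print(k, scene_tokens.shape, "Gaussians:", token_budget(k, 64))
```

In this sketch the cluster count `k` is an ordinary function argument, which mirrors the abstract's point that the number of scene tokens, and therefore the Gaussian budget, can be changed at inference. The specific budgets printed depend entirely on the assumed values above and are not results from the paper.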

Community

Sunghwan Hong: Fewer 3D Gaussians, better performance.
