ZipSplat: Fewer Gaussians, Better Splats
Authors: Alexander Veicht, Sunghwan Hong, Dániel Baráth, Marc Pollefeys
Published: 2026-06-03 (arXiv:2606.05102)
Code: https://github.com/cvg/ZipSplat
Abstract
ZipSplat is a token-based feed-forward method that decouples 3D Gaussian placement from the pixel grid, enabling efficient scene reconstruction with fewer Gaussians and state-of-the-art results in pose-free reconstruction.
Feed-forward 3D Gaussian Splatting methods reconstruct a scene from posed or pose-free images in a single forward pass, yet current approaches predict one Gaussian per input pixel, tying the representation budget to camera resolution rather than scene complexity. A flat wall and a richly textured object thus produce equally many Gaussians despite very different geometric needs. We propose ZipSplat, a token-based feed-forward model that decouples Gaussian placement from the pixel grid. A multi-view backbone extracts dense visual tokens, and k-means clustering compresses them into a compact set of scene tokens. Cross- and self-attention refine these tokens, and a lightweight MLP decodes each into a group of Gaussians with unconstrained 3D positions. Because clustering is applied at inference, a single trained model spans the quality-efficiency curve without retraining. ZipSplat operates without ground-truth poses or intrinsics, yet sets a new state of the art on DL3DV and RealEstate10K with ~6× fewer Gaussians than pixel-aligned methods, surpassing the best pose-free baseline by 2.1 dB and 1.2 dB PSNR, respectively. It further generalizes zero-shot to Mip-NeRF360 and ScanNet++, outperforming all comparable baselines. Our project page is at https://veichta.com/zipsplat.
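To make the token-compression step concrete, here is a minimal sketch of compressing N dense tokens into k scene tokens with plain Lloyd's k-means. It is not the authors' implementation: it assumes each token is a raw feature vector, uses simple random initialization rather than k-means++, and takes the centroids as the scene tokens. The function name `kmeans_compress` is hypothetical.

```python
import random
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_tokens(tokens: List[Vector], centroids: List[Vector]) -> List[int]:
    # Assignment step: each token goes to its nearest centroid (ties go to the lower index).
    return [min(range(len(centroids)), key=lambda c: squared_distance(t, centroids[c]))
            for t in tokens]


def kmeans_compress(tokens: List[Vector], k: int, iters: int = 20,
                    seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Hypothetical helper: compress N dense tokens into k scene tokens (cluster centroids)."""
    if not 1 <= k <= len(tokens):
        raise ValueError("k must be between 1 and the number of tokens")
    rng = random.Random(seed)
    # Plain random initialization from k distinct tokens (an assumption, not k-means++).
    centroids = [list(t) for t in rng.sample(tokens, k)]
    for _ in range(iters):
        assign = assign_tokens(tokens, centroids)
        # Update step: each centroid becomes the mean of the tokens assigned to it.
        new_centroids = []
        for c in range(k):
            members = [t for t, a in zip(tokens, assign) if a == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    # Final assignment consistent with the returned centroids.
    return centroids, assign_tokens(tokens, centroids)
```

Because k is only an argument at call time, the same (trained) upstream tokens can be compressed to more or fewer scene tokens without changing any weights. This mirrors the abstract's point that inference-time clustering lets one model trade quality for efficiency.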
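The decoding step, where each scene token becomes a group of Gaussians with unconstrained 3D positions, can be sketched as follows. Everything here is an assumption made for illustration: the one-hidden-layer MLP (`TokenDecoder`), the 14-value per-Gaussian parameterization (position, log-scale, quaternion, opacity, RGB), and the activations are hypothetical, weights are random rather than trained, and the cross-/self-attention refinement is omitted. The two budget helpers show why token-based decoding decouples the Gaussian count from image resolution.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Gaussian:
    position: List[float]  # free xyz, not constrained to lie along a pixel ray
    scale: List[float]     # per-axis scale, kept positive via exp
    rotation: List[float]  # unit quaternion (w, x, y, z)
    opacity: float         # squashed into (0, 1) via sigmoid
    color: List[float]     # RGB squashed into (0, 1) via sigmoid


# Hypothetical layout: 3 position + 3 scale + 4 rotation + 1 opacity + 3 color.
PARAMS_PER_GAUSSIAN = 14


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TokenDecoder:
    """Hypothetical one-hidden-layer MLP mapping a single scene token to G Gaussians."""

    def __init__(self, dim: int, hidden: int, gaussians_per_token: int, seed: int = 0):
        rng = random.Random(seed)
        self.g = gaussians_per_token
        out_dim = gaussians_per_token * PARAMS_PER_GAUSSIAN
        # Random weights stand in for trained ones.
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
                   for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
                   for _ in range(out_dim)]

    def __call__(self, token: List[float]) -> List[Gaussian]:
        h = [max(0.0, sum(w * x for w, x in zip(row, token))) for row in self.w1]  # ReLU
        raw = [sum(w * v for w, v in zip(row, h)) for row in self.w2]               # linear head
        gaussians = []
        for i in range(self.g):
            p = raw[i * PARAMS_PER_GAUSSIAN:(i + 1) * PARAMS_PER_GAUSSIAN]
            q = p[6:10]
            norm = math.sqrt(sum(v * v for v in q))
            rotation = [v / norm for v in q] if norm > 0 else [1.0, 0.0, 0.0, 0.0]
            gaussians.append(Gaussian(
                position=p[0:3],  # used as-is: no depth-along-ray constraint
                scale=[math.exp(v) for v in p[3:6]],
                rotation=rotation,
                opacity=sigmoid(p[10]),
                color=[sigmoid(v) for v in p[11:14]],
            ))
        return gaussians


def pixel_aligned_budget(num_views: int, height: int, width: int) -> int:
    # One Gaussian per input pixel: the count grows with camera resolution.
    return num_views * height * width


def token_budget(num_scene_tokens: int, gaussians_per_token: int) -> int:
    # K scene tokens x G Gaussians each: the count is set by the clustering budget.
    return num_scene_tokens * gaussians_per_token
```

As a purely illustrative calculation (not the paper's configuration), two 256×256 views give `pixel_aligned_budget(2, 256, 256) = 131072` Gaussians, while 2,048 scene tokens decoding 10 Gaussians each give `token_budget(2048, 10) = 20480`, roughly 6.4× fewer.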
Community
Sunghwan Hong (hongsunghwan): Fewer 3D Gaussians, better performance.