Hugging Face Daily Papers · · 6 min read

FLAT: Feedforward Latent Triangle Splatting for Geometrically Accurate Scene Generation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.24876

Published on Jun 23 · Submitted by Orest Kupyn on Jun 24
Authors: Orest Kupyn, Goutam Bhat, Philipp Henzler, Fabian Manhardt, Christian Rupprecht, Federico Tombari

Abstract

Latents from a video diffusion model are decoded in a single feedforward pass into explicit surface primitives (triangle splats), enabling 3D scene generation with improved geometric accuracy and, after a lightweight refinement step, real-time rendering.

Generating explorable 3D scenes from a single image requires strong generative priors and accurate geometric representations suitable for downstream use. Current video diffusion models offer high-quality generation and implicitly encode multi-view geometric structure in latent space. However, existing feedforward latent scene decoders typically output volumetric 3D Gaussians that lack a well-defined surface, limiting their use in simulation or standard graphics pipelines. This motivates decoding surface-aligned primitives that are not only renderable but also closer to explicit geometric assets. We ask whether compressed video diffusion latents can be mapped directly to explicit surface primitives in a single pass. To this end, we introduce FLAT and, for the first time, show that triangle splats can be decoded directly from video diffusion latents. Compared with decoding 3D Gaussians, predicting flat primitives is notoriously more challenging due to high sensitivity to primitive orientations, often leading to poor gradient flow. FLAT addresses this with two key ingredients: a ray-centered rotation parameterization for triangle regression and a novel product window function that improves gradient flow during differentiable triangle rendering. On standard benchmarks, FLAT achieves significantly better geometric accuracy while maintaining competitive visual quality compared to state-of-the-art feedforward baselines. We further show that a lightweight test-time refinement step converts the predicted triangle soup into a fully opaque, game-engine-ready representation that supports real-time rendering. By evaluating 3DGS, 2DGS, and triangle splatting variants under an identical training setup, we provide the first systematic analysis of representation tradeoffs in feedforward scene generation. The project page is available at https://flat-splat.github.io
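The abstract names FLAT's two ingredients but gives no formulas, so what follows is only a minimal sketch under stated assumptions, not the authors' implementation. Part 1 assumes the max-based window used in earlier triangle splatting work, I(p) = ReLU(φ(p)/φ(s))^σ, where φ(p) is the largest of the three signed edge distances and s is the incenter. It compares that window with a hypothetical product window (here, 27 times the product of the barycentric coordinates, which equals 1 at the centroid and 0 on every edge). Part 2 assumes that "ray-centered" means expressing each triangle's rotation relative to a frame whose z-axis is the pixel's viewing ray, with the network predicting a residual axis-angle rotation. Every function name (`edge_distances`, `window_prod`, `ray_frame`, `triangle_rotation`, and so on) is hypothetical.

```python
import numpy as np

# --- Part 1: window functions for a 2D (screen-space) triangle -------------

def edge_distances(verts, p):
    """Signed distances from p to the three edge lines of a CCW triangle.
    Negative inside, zero on an edge, positive outside."""
    d = np.empty(3)
    for i in range(3):
        a, b = verts[i], verts[(i + 1) % 3]
        e = b - a
        n = np.array([e[1], -e[0]]) / np.linalg.norm(e)  # outward normal (CCW winding)
        d[i] = n @ (p - a)
    return d

def incenter(verts):
    """Incenter: vertices weighted by the length of the opposite side."""
    w = np.array([np.linalg.norm(verts[(i + 1) % 3] - verts[(i + 2) % 3]) for i in range(3)])
    return (w[:, None] * verts).sum(axis=0) / w.sum()

def window_max(d, phi_s, sigma=1.0):
    """ASSUMED max-based window: only the active (largest) edge distance enters."""
    return max(np.max(d) / phi_s, 0.0) ** sigma

def window_prod(d, h, sigma=1.0):
    """HYPOTHETICAL product window: -d/h are barycentric coordinates, and
    27 * b0 * b1 * b2 equals 1 at the centroid and 0 on every edge."""
    b = np.clip(-d / h, 0.0, None)
    return (27.0 * np.prod(b)) ** sigma

def num_grad(f, d, eps=1e-6):
    """Central finite differences of f with respect to each edge distance."""
    g = np.zeros_like(d)
    for i in range(len(d)):
        dp, dm = d.copy(), d.copy()
        dp[i] += eps
        dm[i] -= eps
        g[i] = (f(dp) - f(dm)) / (2 * eps)
    return g

# --- Part 2: HYPOTHETICAL ray-centered rotation -----------------------------

def ray_frame(ray_dir):
    """Right-handed orthonormal frame whose z-axis is the viewing ray."""
    z = ray_dir / np.linalg.norm(ray_dir)
    helper = np.array([0.0, 1.0, 0.0]) if abs(z[0]) > 0.9 else np.array([1.0, 0.0, 0.0])
    x = np.cross(helper, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return np.stack([x, y, z], axis=1)

def axis_angle_to_matrix(w):
    """Rodrigues' formula: rotation by |w| radians about the axis w/|w|."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K

def triangle_rotation(ray_dir, w_pred):
    """Orientation = ray frame composed with a predicted residual rotation.
    With w_pred = 0 the triangle's local normal (z) lies along the ray,
    so the triangle starts out facing the camera instead of edge-on."""
    return ray_frame(ray_dir) @ axis_angle_to_matrix(w_pred)

# --- Usage: which edges receive gradient at one pixel? ----------------------
verts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
p = np.array([0.2, 0.3])
d = edge_distances(verts, p)
phi_s = np.max(edge_distances(verts, incenter(verts)))  # equals -inradius
h = np.array([-edge_distances(verts, verts[(i + 2) % 3])[i] for i in range(3)])

print(num_grad(lambda x: window_max(x, phi_s), d))  # one nonzero entry
print(num_grad(lambda x: window_prod(x, h), d))     # three nonzero entries
```

Treating each window as a function of the three edge distances (with the normalizer held fixed), the max-based window passes a nonzero derivative only to the active edge at this pixel, while the product window passes one to all three edges. That is one plausible reading of "improves gradient flow"; the paper's actual product window may be defined differently. In Part 2, a zero residual leaves the triangle facing the camera, which would avoid starting from edge-on configurations where a flat primitive covers almost no pixels.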


