Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion
Authors: Ting-Hsuan Chen, Ying-Huan Chen, Tao Tu, Jie-Ying Lee, Cho-Ying Wu, Fangzhou Lin, Hengyuan Zhang, David Paz, Xinyu Huang, Yuliang Guo, Yu-Lun Liu, Yue Wang, Liu Ren
Published: 2026-05-25
Project page: https://koi953215.github.io/pantheon360_page/
Abstract
AI-generated summary: Pantheon360 enables high-fidelity 360° video generation for digital twins by combining 3D-aware diffusion with explicit geometric caching to ensure spatial-temporal consistency.
Generating complete digital twins from videos requires precise camera control, global scene coverage, and strict spatial-temporal consistency. These constraints remain challenging for perspective video generators because of their limited field of view (FoV): a narrow FoV forces long or multi-view trajectories, which amplifies cross-view inconsistency and temporal drift. We argue that 360° video generation offers a natural solution, since panoramic coverage simplifies trajectory design and provides strong global context for maintaining coherence. We introduce Pantheon360, a controllable 360° video generation framework that synthesizes high-fidelity videos from sparse 360° inputs. The key idea is an explicit 3D Cache, reconstructed from the input, which serves as a geometric scaffold for any user-defined camera path. This allows the diffusion model to focus on photorealistic texture refinement while the 3D Cache enforces global geometric consistency. Experiments show that Pantheon360 achieves superior visual quality and unmatched geometric coherence, enabling reliable and flexible 360° scene generation for downstream simulation and digital-twin applications.
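The abstract does not say how the 3D Cache is represented or rendered, so the following is only a minimal sketch of the general idea, assuming a colored point cloud lifted from one equirectangular panorama with per-pixel depth and a camera that translates without rotating. The function names (`pixel_to_direction`, `direction_to_pixel`, `build_cache`, `render_cache`) and the axis convention are hypothetical and are not taken from Pantheon360.

```python
import math

def pixel_to_direction(u, v, width, height):
    """Map the centre of equirectangular pixel (u, v) to a unit ray direction."""
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi    # longitude in [-pi, pi)
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi   # latitude, +pi/2 at the top row
    return (math.cos(lat) * math.sin(lon),                 # x
            math.sin(lat),                                 # y (up)
            math.cos(lat) * math.cos(lon))                 # z (forward)

def direction_to_pixel(d, width, height):
    """Inverse mapping: a (not necessarily unit) direction to continuous pixel coords."""
    x, y, z = d
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y / norm)))
    u = (lon + math.pi) / (2.0 * math.pi) * width - 0.5
    v = (math.pi / 2.0 - lat) / math.pi * height - 0.5
    return u, v

def build_cache(depth, colors):
    """Lift every pixel to a 3D point; depth[v][u] is the distance along the ray."""
    height, width = len(depth), len(depth[0])
    cache = []
    for v in range(height):
        for u in range(width):
            dx, dy, dz = pixel_to_direction(u, v, width, height)
            r = depth[v][u]
            cache.append(((dx * r, dy * r, dz * r), colors[v][u]))
    return cache

def render_cache(cache, cam_pos, width, height):
    """Splat cached points into a panorama seen from cam_pos (translation only).

    The nearest point wins each pixel. Pixels that stay None are holes that a
    generative model would have to fill in.
    """
    image = [[None] * width for _ in range(height)]
    zbuf = [[math.inf] * width for _ in range(height)]
    for (px, py, pz), color in cache:
        rel = (px - cam_pos[0], py - cam_pos[1], pz - cam_pos[2])
        dist = math.sqrt(rel[0] ** 2 + rel[1] ** 2 + rel[2] ** 2)
        if dist == 0.0:
            continue  # point coincides with the camera centre; direction undefined
        u, v = direction_to_pixel(rel, width, height)
        ui = int(round(u)) % width                        # wrap around horizontally
        vi = min(max(int(round(v)), 0), height - 1)       # clamp at the poles
        if dist < zbuf[vi][ui]:
            zbuf[vi][ui] = dist
            image[vi][ui] = color
    return image
```

Rendering the cache from the original camera position reproduces the input panorama, while a shifted `cam_pos` gives a geometrically warped view with disocclusion holes. In this reading, the warped view plays the role of the geometric scaffold and the diffusion model is responsible for the remaining texture.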
Community
Beyond single-image generation, we present the first video diffusion model to support 360° interpolation, enabling seamless chaining of video segments to produce extended, coherent long-form videos.
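How segments are chained is not described on this page. As a rough illustration only, the sketch below assumes that each segment is generated between two consecutive keyframes and that adjacent segments share their boundary frame. `lerp_segment` is a hypothetical placeholder (a plain linear blend) standing in for the interpolation model, and `chain_segments` is likewise an illustrative name.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # hypothetical stand-in for one panoramic frame

def lerp_segment(start: Frame, end: Frame, num_frames: int) -> List[Frame]:
    """Placeholder interpolator: linear blend from start to end, both inclusive."""
    if num_frames < 2:
        raise ValueError("a segment needs at least its two endpoint frames")
    segment = []
    for i in range(num_frames):
        t = i / (num_frames - 1)
        segment.append([(1 - t) * a + t * b for a, b in zip(start, end)])
    return segment

def chain_segments(
    keyframes: Sequence[Frame],
    num_frames: int,
    interpolate: Callable[[Frame, Frame, int], List[Frame]] = lerp_segment,
) -> List[Frame]:
    """Generate one segment per consecutive keyframe pair and concatenate them."""
    if len(keyframes) < 2:
        raise ValueError("need at least two keyframes")
    video: List[Frame] = []
    for start, end in zip(keyframes, keyframes[1:]):
        segment = interpolate(start, end, num_frames)
        # Every segment after the first starts on the previous segment's last
        # frame, so drop that duplicate to keep the sequence seamless.
        video.extend(segment if not video else segment[1:])
    return video
```

With K keyframes and N frames per segment, the chained video has (K - 1) * (N - 1) + 1 frames. Because both endpoints of every segment are pinned to keyframes, errors cannot accumulate across segment boundaries in this scheme.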
Paper: arxiv.org/abs/2605.25449