Hugging Face Daily Papers · May 26, 2026 · 5 min read

Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Generating complete digital twins from videos requires precise camera control, global scene coverage, and strict spatial-temporal consistency, constraints that remain challenging for perspective video generators due to their limited field of view (FoV). Their narrow FoV forces long or multi-view trajectories, amplifying cross-view inconsistency and temporal drift. We argue that 360° video generation offers a natural solution: panoramic coverage simplifies trajectory design and provides strong global context for maintaining coherence. We introduce Pantheon360, a controllable 360° video generation framework that synthesizes high-fidelity videos from sparse 360° inputs. The key idea is an explicit 3D Cache, reconstructed from the input, which serves as a geometric scaffold for any user-defined camera path. This allows the diffusion model to focus on photorealistic texture refinement while the 3D Cache enforces global geometric consistency. Beyond single-image generation, ours is the first video diffusion model to support 360° interpolation, which enables seamless chaining of video segments into extended, coherent long-form videos.
Experiments show that Pantheon360 achieves superior visual quality and unmatched geometric coherence, enabling reliable and flexible 360° scene generation for downstream simulation and digital-twin applications.

arxiv:2605.25449

Published on May 25 · Submitted by Ting-Hsuan Chen on May 26 · Bosch US

Authors:

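Ting-Hsuan Chen, Ying-Huan Chen, Tao Tu, Jie-Ying Lee, Cho-Ying Wu, Fangzhou Lin, Hengyuan Zhang, David Paz, Xinyu Huang, Yuliang Guo, Yu-Lun Liu, Yue Wang, Liu Ren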

Abstract

Pantheon360 enables high-fidelity 360° video generation for digital twins by combining 3D-aware diffusion with explicit geometric caching to ensure spatial-temporal consistency.

AI-generated summary

Generating complete digital twins from videos requires precise camera control, global scene coverage, and strict spatial-temporal consistency, constraints that remain challenging for perspective video generators due to their limited field of view (FoV). Their narrow FoV forces long or multi-view trajectories, amplifying cross-view inconsistency and temporal drift. We argue that 360° video generation offers a natural solution: panoramic coverage simplifies trajectory design and provides a strong global context for maintaining coherence. We introduce Pantheon360, a controllable 360° video generation framework that synthesizes high-fidelity videos from sparse 360° inputs. The key idea is an explicit 3D Cache, reconstructed from the input, which serves as a geometric scaffold for any user-defined camera path. This allows the diffusion model to focus on photorealistic texture refinement while the 3D Cache enforces global geometric consistency. Experiments show that Pantheon360 achieves superior visual quality and unmatched geometric coherence, enabling reliable and flexible 360° scene generation for downstream simulation and digital-twin applications.
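
To make the "geometric scaffold" idea more concrete, here is a minimal sketch of how a cached 3D reconstruction could be re-rendered from a new camera position. It is not the paper's implementation: the abstract does not say how the 3D Cache is represented, so this assumes a colored point cloud lifted from one equirectangular panorama with a known per-pixel range, and a camera that translates without rotating. All function names (`pixel_to_direction`, `direction_to_pixel`, `build_cache`, `render_cache`) are hypothetical.

```python
import math


def pixel_to_direction(row, col, height, width):
    """Unit ray through the center of an equirectangular pixel."""
    lon = (col + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (row + 0.5) / height * math.pi
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))


def direction_to_pixel(x, y, z, height, width):
    """Map a non-zero vector to (row, col) plus its length."""
    dist = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y / dist)))
    col = int((lon + math.pi) / (2.0 * math.pi) * width) % width
    row = min(height - 1, int((math.pi / 2.0 - lat) / math.pi * height))
    return row, col, dist


def build_cache(colors, depth):
    """Lift a panorama plus per-pixel range into a colored point list.

    Assumption: this point list stands in for the paper's 3D Cache.
    """
    height, width = len(depth), len(depth[0])
    cache = []
    for r in range(height):
        for c in range(width):
            dx, dy, dz = pixel_to_direction(r, c, height, width)
            d = depth[r][c]
            cache.append(((dx * d, dy * d, dz * d), colors[r][c]))
    return cache


def render_cache(cache, cam_pos, height, width):
    """Splat cached points into a panorama centered at cam_pos (no rotation).

    Returns (image, mask). mask[r][c] is False where no point landed, which
    is the region a generative model would have to fill in.
    """
    image = [[None] * width for _ in range(height)]
    nearest = [[math.inf] * width for _ in range(height)]
    for (px, py, pz), color in cache:
        x, y, z = px - cam_pos[0], py - cam_pos[1], pz - cam_pos[2]
        if x == 0.0 and y == 0.0 and z == 0.0:
            continue  # point coincides with the camera; direction undefined
        r, c, dist = direction_to_pixel(x, y, z, height, width)
        if dist < nearest[r][c]:  # z-buffer: keep the closest point per pixel
            nearest[r][c] = dist
            image[r][c] = color
    mask = [[pixel is not None for pixel in image_row] for image_row in image]
    return image, mask
```

Rendering from the original camera position reproduces the input panorama, while moving the camera typically leaves holes and coarse splats in the output. Under this reading of the abstract, those incomplete renders are what the diffusion model would refine into photorealistic frames, with the cache fixing where the geometry belongs.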