MoCam: Unified Novel View Synthesis via Structured Denoising Dynamics

Haofeng Liu, Yang Zhou, Ziheng Wang, Zhengbo Xu, Zhan Peng, Jie Ma, Jun Liang, Shengfeng He, Jing Li

Project page: https://orange-3dv-team.github.io/MoCam/ · Code: https://github.com/Orange-3DV-Team/MoCam · arXiv: 2605.12119
Abstract
AI-generated summary
MoCam addresses the challenge of generative novel view synthesis by dynamically coordinating geometric and appearance priors through structured denoising dynamics within a diffusion framework.
Generative novel view synthesis faces a fundamental dilemma: geometric priors provide spatial alignment but become sparse and inaccurate under view changes, while appearance priors offer visual fidelity but lack geometric correspondence. Existing methods either propagate geometric errors throughout generation or suffer from signal conflicts when fusing both statically. We introduce MoCam, which employs structured denoising dynamics to orchestrate a coordinated progression from geometry to appearance within the diffusion process. MoCam first leverages geometric priors in early stages to anchor coarse structures and tolerate their incompleteness, then switches to appearance priors in later stages to actively correct geometric errors and refine details. This design naturally unifies static and dynamic view synthesis by temporally decoupling geometric alignment and appearance refinement within the diffusion process. Experiments demonstrate that MoCam significantly outperforms prior methods, particularly when point clouds contain severe holes or distortions, achieving robust geometry-appearance disentanglement.
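The abstract describes a timestep-dependent handoff inside the diffusion process: geometric conditioning anchors coarse structure during early, high-noise steps, and appearance conditioning takes over in later steps to correct and refine. The sketch below illustrates one way such a schedule could look inside a generic denoising loop; the function names, conditioning inputs, hard switch point, and the placeholder update rule are assumptions for illustration only, not MoCam's actual implementation.

```python
# Minimal sketch (not the authors' code): a sampling loop that conditions on a
# geometry-derived signal in early (high-noise) steps and on an appearance-derived
# signal in late (low-noise) steps. All names (denoiser, geometry_cond,
# appearance_cond, switch_frac) are hypothetical placeholders.

import torch

def sample_with_prior_switch(
    denoiser,          # hypothetical callable: predicts noise from (x_t, t, cond)
    geometry_cond,     # assumed: e.g. features rendered from a warped point cloud
    appearance_cond,   # assumed: e.g. features extracted from the reference image
    num_steps=50,
    switch_frac=0.6,   # assumed fraction of steps driven by the geometric prior
    shape=(1, 4, 64, 64),
):
    x = torch.randn(shape)                         # start from pure noise
    switch_step = int(num_steps * switch_frac)
    for i in range(num_steps):
        t = torch.full((shape[0],), num_steps - i)  # large t = early, noisy step
        # Early steps: anchor coarse structure with the geometric prior,
        # tolerating holes or distortions in the point cloud.
        # Late steps: hand over to the appearance prior to correct geometric
        # errors and refine texture detail.
        cond = geometry_cond if i < switch_step else appearance_cond
        eps = denoiser(x, t, cond)
        x = x - eps / num_steps                    # placeholder update; a real
                                                   # sampler (DDIM, flow matching,
                                                   # ...) would go here
    return x
```

A toy denoiser such as `denoiser = lambda x, t, c: torch.zeros_like(x)` exercises the loop end to end; in practice the schedule could also blend the two conditioning signals gradually rather than switching at a single step.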
Community
This paper introduces MoCam, a unified framework for novel view synthesis that addresses the fundamental conflict between geometric and appearance priors in diffusion-based generation.