Hugging Face Daily Papers · 4 min read

Relit-LiVE: Relight Video by Jointly Learning Environment Video

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.06658

Relit-LiVE: Relight Video by Jointly Learning Environment Video

Published on May 7 · Submitted by weiqing xiao on May 13

Authors: Weiqing Xiao, Hong Li, Xiuyu Yang, Houyuan Chen, Wenyi Li, Tianqi Liu, Shaocong Xu, Chongjie Ye, Hao Zhao, Beibei Wang
AI-generated summary

A novel video relighting framework called Relit-LiVE is presented that produces physically consistent results without requiring camera pose information by incorporating raw reference images and using environment video prediction for joint relighting and environment map generation.

Abstract

Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decomposition, which remains highly unreliable for real-world videos and often leads to distorted appearances, broken materials, and accumulated temporal artifacts during relighting. In this work, we present Relit-LiVE, a novel video relighting framework that produces physically consistent, temporally stable results without requiring prior knowledge of camera pose. Our key insight is to explicitly introduce raw reference images into the rendering process, enabling the model to recover critical scene cues that are inevitably lost or corrupted in intrinsic representations. Furthermore, we propose a novel environment video prediction formulation that simultaneously generates relit videos and per-frame environment maps aligned with each camera viewpoint in a single diffusion process. This joint prediction enforces strong geometric-illumination alignment and naturally supports dynamic lighting and camera motion, significantly improving physical consistency in video relighting while easing the requirement of known per-frame camera pose. Extensive experiments demonstrate that Relit-LiVE consistently outperforms state-of-the-art video relighting and neural rendering methods across synthetic and real-world benchmarks. Beyond relighting, our framework naturally supports a wide range of downstream applications, including scene-level rendering, material editing, object insertion, and streaming video relighting. The project is available at https://github.com/zhuxing0/Relit-LiVE.

Community

Paper author · Paper submitter · about 9 hours ago

Relit-LiVE is a novel video relighting framework that produces physically consistent and temporally stable results without needing prior knowledge of camera pose. This is achieved by jointly generating the relit video and an environment video. Additionally, by integrating real-world lighting effects with intrinsic constraints, the relit videos exhibit remarkable physical plausibility, with realistic reflections and shadows.


Get this paper in your agent:

hf papers read 2605.06658
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2605.06658 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2605.06658 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.

