Hugging Face Daily Papers · 3 min read

Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.15182

Published on May 14 · Submitted by Tong He on May 15
Authors: Yifan Wang, Tong He

Abstract

AI-generated summary: A novel approach called Warp-as-History enables camera-controlled video generation by transforming camera-induced warps into pseudo-history representations, achieving zero-shot capability without training or test-time optimization.

Camera-controlled video generation has made substantial progress, enabling generated videos to follow prescribed viewpoint trajectories. However, existing methods usually learn camera-specific conditioning through camera encoders, control branches, or attention and positional-encoding modifications, which often require post-training on large-scale camera-annotated videos. Training-free alternatives avoid such post-training, but often shift the cost to test-time optimization or extra denoising-time guidance. We propose Warp-as-History, a simple interface that turns camera-induced warps into camera-warped pseudo-history with target-frame positional alignment and visible-token selection. Given a target camera trajectory, we construct camera-warped pseudo-history from past observations and feed it through the model's visual-history pathway. Crucially, we align its positional encoding with the target frames being denoised and remove warped-history tokens without valid source observations. Without any training, architectural modification, or test-time optimization, this interface reveals a non-trivial zero-shot capability of a frozen video generation model to follow camera trajectories. Moreover, lightweight offline LoRA finetuning on only one camera-annotated video further improves this capability and generalizes to unseen videos, improving camera adherence, visual quality, and motion dynamics without test-time optimization or target-video adaptation. Extensive experiments on diverse datasets confirm the effectiveness of our method.
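To make the interface concrete, here is a minimal sketch of the two ingredients the abstract names: warping past observations into the target camera view to form pseudo-history, and selecting only tokens with valid source observations. The depth-based nearest-pixel warp, the `patch` and `thresh` parameters, and all function names are illustrative assumptions, not the paper's implementation; the positional-alignment step is only noted in a comment, since it depends on the backbone's positional-encoding layout.

```python
# Illustrative sketch only: a nearest-pixel forward warp plus patch-level
# visibility masking, assuming per-frame depth maps and 4x4 camera-to-world poses.
import numpy as np

def warp_to_target(frame, depth, K, src_pose, tgt_pose):
    """Reproject a past frame into the target camera view; return the warped
    image and a per-pixel validity mask (True where a source pixel landed)."""
    H, W, _ = frame.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))               # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T.astype(np.float64)
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)        # unproject to 3D
    world = src_pose @ np.vstack([cam, np.ones((1, cam.shape[1]))])
    tgt = np.linalg.inv(tgt_pose) @ world                        # to target camera
    z = tgt[2]
    xy = (K @ tgt[:3]) / np.clip(z, 1e-6, None)                  # project to pixels
    x, y = np.round(xy[0]).astype(int), np.round(xy[1]).astype(int)
    ok = (x >= 0) & (x < W) & (y >= 0) & (y < H) & (z > 0)
    warped = np.zeros_like(frame)
    valid = np.zeros((H, W), dtype=bool)
    warped[y[ok], x[ok]] = frame.reshape(-1, 3)[ok]              # scatter pixels
    valid[y[ok], x[ok]] = True
    return warped, valid

def visible_token_mask(valid, patch=16, thresh=0.5):
    """Keep only pseudo-history tokens whose patch has enough valid warped
    pixels; tokens without valid source observations are dropped."""
    Hc = valid.shape[0] // patch * patch
    Wc = valid.shape[1] // patch * patch
    m = valid[:Hc, :Wc].reshape(Hc // patch, patch, Wc // patch, patch).mean((1, 3))
    return (m >= thresh).reshape(-1)   # boolean mask over flattened patch tokens

# The surviving warped-history tokens would then be fed through the model's
# visual-history pathway, with their positional encodings set to the indices
# of the target frames being denoised (the alignment step; backbone-specific
# and omitted here).
```

Under this reading, the frozen model treats the warped frames as if they were recent history already seen from the target viewpoint, which is what lets trajectory following emerge without any camera encoder or control branch.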

Community

Tong He (paper submitter):

Our method enables interactive camera trajectory following and viewpoint manipulation, similar to HappyOyster and Genie 3, using only a single camera-annotated training example.

Get this paper in your agent:

hf papers read 2605.15182
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2605.15182 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2605.15182 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2605.15182 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.

Discussion (0)

No comments yet.