Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them
Authors: Woojung Han, Seil Kang, Youngjun Jun, Min-Hung Chen, Fu-En Yang, Seong Jae Hwang
Published: 2026-06-04
arXiv: 2606.06361
Project page: https://dnwjddl.github.io/phaselock
Code: https://github.com/dnwjddl/phaselock
Abstract
PhaseLock is a training-free framework that improves physical consistency in image-to-video diffusion models by preserving motion priors from early-step inference throughout the denoising process.
Image-to-Video diffusion models leverage input images to generate visually stunning content, yet frequently produce motion that violates physical laws. We reveal a surprising finding: a 2-step generation often exhibits better physical consistency than a 50-step output from the same model. Through spectral analysis, we trace this to phase erosion during denoising: the phase degrades significantly (dropping by approximately 18% from step 2 to step 50), whereas the magnitude remains relatively stable. Building on this insight, we propose PhaseLock, a training-free framework that preserves the valid motion priors from few-step inference throughout the denoising trajectory. Rather than relying on full-step inference for physical consistency, PhaseLock extracts a motion prior from just 2 steps and enforces it onto high-fidelity generation via Latent Delta Guidance. Our approach effectively mitigates phase degradation, improving physical consistency by an average of 6.2 points across diverse models while largely maintaining visual fidelity, with negligible overhead (1.06× time, 1.02× memory) and reduced reliance on expensive external guidance methods (~5× time).
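To make the phase-versus-magnitude distinction concrete, here is a minimal NumPy sketch of a spectral decomposition on a toy video. It is an illustration only: the abstract does not specify the metric behind the 18% figure, so the function names (`split_spectrum`, `phase_coherence`, `magnitude_similarity`), the magnitude-weighted cosine score, and the synthetic data are all assumptions, not the authors' analysis code.

```python
import numpy as np

def split_spectrum(x):
    """Return (magnitude, phase) of the N-D FFT of a real array."""
    spec = np.fft.fftn(x)
    return np.abs(spec), np.angle(spec)

def phase_coherence(a, b):
    """Magnitude-weighted mean cosine of the phase difference (hypothetical score).

    1.0 means identical phase; values near 0 mean unrelated phase.
    """
    mag_a, ph_a = split_spectrum(a)
    _, ph_b = split_spectrum(b)
    weights = mag_a / mag_a.sum()
    return float(np.sum(weights * np.cos(ph_a - ph_b)))

def magnitude_similarity(a, b):
    """Cosine similarity between the two magnitude spectra."""
    mag_a, _ = split_spectrum(a)
    mag_b, _ = split_spectrum(b)
    return float(np.sum(mag_a * mag_b) / (np.linalg.norm(mag_a) * np.linalg.norm(mag_b)))

# Toy "video" of shape (T, H, W): a random texture shifted one pixel per frame.
rng = np.random.default_rng(0)
frame = rng.standard_normal((16, 16))
video = np.stack([np.roll(frame, shift=t, axis=1) for t in range(8)])

# Keep the magnitude spectrum but replace the phase with that of unrelated noise.
# Using the phase of a real noise array keeps the spectrum conjugate-symmetric,
# so the inverse transform is real up to rounding error.
mag, _ = split_spectrum(video)
_, noise_phase = split_spectrum(rng.standard_normal(video.shape))
scrambled = np.real(np.fft.ifftn(mag * np.exp(1j * noise_phase)))

print("magnitude similarity:", magnitude_similarity(video, scrambled))
print("phase coherence:", phase_coherence(video, scrambled))
```

The scrambled clip has the same magnitude spectrum as the original, yet its phase no longer matches, so the coherent left-to-right motion is lost. This is the sense in which motion structure can degrade while magnitude statistics stay stable.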
Community
TL;DR. A 2-step generation often has better physics than the full 50-step output. We trace this to phase erosion during denoising, and introduce PhaseLock — a training-free framework that locks the early motion prior into the final high-fidelity output via Latent Delta Guidance. +6.2 pts physical consistency, 1.06× time, 1.02× memory.