AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation
Authors: Haobo Li, Yanhong Zeng, Yunhong Lu, Jiapeng Zhu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Yujun Shen, Zhipeng Zhang
Organization: AutoLab (AutoLab-SJTU)
Published: 2026-06-02 (arXiv: 2606.03972)
Project page: https://aad-1.github.io/
Code: https://github.com/AutoLab-SAI-SJTU/AAD-1
Abstract
The AAD-1 framework improves one-step autoregressive image-to-video generation by breaking generator-discriminator symmetry and by using phased training to prevent motion collapse and training instability.
We present AAD-1, an Asymmetric Adversarial Distillation framework for One-step autoregressive image-to-video generation. State-of-the-art methods adopt adversarial distillation but suffer from motion collapse and training instability, resulting in static videos. AAD-1 addresses these challenges through two key designs in architecture and training strategy. Our key architectural insight is to break the symmetry between generator and discriminator. While the generator remains causal to preserve autoregressive sampling capability, the discriminator attends bidirectionally over the full spatiotemporal context and produces a single holistic realism score for the entire video sequence. This asymmetric design enables the discriminator to effectively detect global temporal failures and long-range drift that cause motion collapse in autoregressive generation. To stabilize training, we introduce a phased strategy that first uses distribution matching to bootstrap a stable one-step generator, providing a warm-up phase that brings the student distribution closer to the teacher before adversarial distillation begins. Extensive experiments on VBench demonstrate that AAD-1 achieves state-of-the-art performance in one-step autoregressive video generation.
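The asymmetry and the phased schedule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the frame-level (block-causal) granularity of the generator mask, the function names, and the `warmup_steps` parameter are all assumptions made for illustration, since the abstract states only that the generator is causal, that the discriminator attends bidirectionally, and that distribution matching precedes adversarial distillation.

```python
from typing import List


def causal_frame_mask(num_frames: int, tokens_per_frame: int) -> List[List[bool]]:
    """Generator-side mask (assumed block-causal at frame granularity).

    mask[q][k] is True when query token q may attend to key token k,
    i.e. when k belongs to the same frame as q or to an earlier frame.
    """
    n = num_frames * tokens_per_frame
    return [
        [(k // tokens_per_frame) <= (q // tokens_per_frame) for k in range(n)]
        for q in range(n)
    ]


def bidirectional_mask(num_frames: int, tokens_per_frame: int) -> List[List[bool]]:
    """Discriminator-side mask: every token attends to the full
    spatiotemporal context, past and future frames alike."""
    n = num_frames * tokens_per_frame
    return [[True] * n for _ in range(n)]


def training_phase(step: int, warmup_steps: int) -> str:
    """Hypothetical phase selector: distribution matching serves as the
    warm-up, after which training switches to adversarial distillation."""
    return "distribution_matching" if step < warmup_steps else "adversarial"


if __name__ == "__main__":
    gen_mask = causal_frame_mask(num_frames=3, tokens_per_frame=2)
    disc_mask = bidirectional_mask(num_frames=3, tokens_per_frame=2)
    # Token 0 (frame 0) cannot see token 2 (frame 1) in the generator...
    print(gen_mask[0][2])
    # ...but the discriminator sees every frame from every position.
    print(disc_mask[0][2])
    print(training_phase(step=0, warmup_steps=1000))
```

Under these assumptions, the generator mask keeps sampling autoregressive because no token depends on a future frame, while the discriminator mask lets a single score reflect the whole sequence, which is what would allow it to penalize long-range drift.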